In the last entry in this Machine Learning blog series, we discussed Supervised Learning and its use of labelled training data to deduce the correct outcome. What we didn’t really point out is that, to achieve decent levels of accuracy, these models rely heavily on access to large quantities of labelled training data. Getting your hands on suitable data of this magnitude is often quite the task in and of itself. If you get lucky, you may be able to source some off-the-shelf data sets that can be moulded to suit your purpose, but it is very unlikely that you will find a perfect fit. So what happens when the data you need just doesn’t exist?
The obvious answer would be to create a unique data set, specifically designed with your model in mind. However, manually preparing the quantity of data required would take a considerable amount of time, not to mention being particularly boring and repetitive. Ideally, we would get a machine to do it automatically, but since the whole point of the data is to train your model to do just that, you will quickly find yourself in a chicken-and-egg situation. So if machine power is out and manual labelling is far too time-consuming, is there any alternative? What’s needed in these cases is a robot simulator! A way of simulating the process of your Machine Learning model, so that the simulation can in turn be used to train the model.
Fortunately, it seems that Amazon have already thought ahead and created a kind of artificial Artificial Intelligence in the form of their Mechanical Turk (MTurk) service. This service uses a crowdsourcing network to connect a worldwide, 24/7 workforce of human beings with organisations and businesses that need to complete a lot of simple but repetitive tasks. As mentioned above, these tasks, such as survey participation, content moderation and of course data classification, tend to be difficult or, in certain situations, even impossible to complete with a software algorithm. MTurk even features an API so that software systems can interface with the ‘robot simulators’. So, potentially, there is the opportunity to create a machine that could autonomously use the service to train itself. Scary thought, I know!
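To give a rough feel for what talking to that API might look like, here is a minimal Python sketch that prepares an image-labelling task for MTurk. It assumes the third-party `boto3` library and Amazon's requester sandbox endpoint. The helper names (`build_hit_request`, `submit_hit`), the reward, the timings and the HTML form are all made up for illustration, and the form is deliberately simplified: a real task would also need MTurk's answer-submission plumbing.

```python
import html

# Amazon's sandbox endpoint, so that experiments do not cost real money.
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
HTML_QUESTION_XSD = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"
)


def build_hit_request(image_url, labels, reward_usd="0.05", workers_per_item=3):
    """Build the keyword arguments for one labelling task (a 'HIT').

    Hypothetical helper: every value here is an illustrative assumption.
    """
    # One radio button per candidate label.
    options = "".join(
        f'<label><input type="radio" name="label" value="{html.escape(label)}">'
        f" {html.escape(label)}</label><br>"
        for label in labels
    )
    # Simplified placeholder form; not a complete, submittable MTurk task.
    question_xml = (
        f'<HTMLQuestion xmlns="{HTML_QUESTION_XSD}">'
        "<HTMLContent><![CDATA[<!DOCTYPE html><html><body><form>"
        f'<img src="{html.escape(image_url)}" alt="item to label"><br>'
        f"{options}"
        "</form></body></html>]]></HTMLContent>"
        "<FrameHeight>450</FrameHeight>"
        "</HTMLQuestion>"
    )
    return {
        "Title": "Pick the label that best describes the image",
        "Description": "Choose one label for a single image.",
        "Keywords": "image, labelling, classification",
        "Reward": reward_usd,                # MTurk expects a string amount
        "MaxAssignments": workers_per_item,  # how many people label each item
        "LifetimeInSeconds": 24 * 60 * 60,   # task stays listed for one day
        "AssignmentDurationInSeconds": 5 * 60,
        "Question": question_xml,
    }


def submit_hit(request):
    """Send the task to the MTurk sandbox and return its HIT id."""
    import boto3  # third-party; also needs AWS credentials to be configured

    client = boto3.client(
        "mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT
    )
    response = client.create_hit(**request)
    return response["HIT"]["HITId"]
```

Calling `build_hit_request("https://example.com/cat.jpg", ["cat", "dog"])` only builds the request locally; nothing is sent until it is passed to `submit_hit`. Asking several workers to label each item, as assumed here, is one simple way of cross-checking the answers that come back.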