Artificial, Artificial Intelligence
In the last entry in this Machine Learning blog series, we discussed Supervised Learning and its use of labelled training data to deduce the correct outcome. However, we didn’t really point out that, to achieve decent levels of accuracy, these models rely heavily on access to large quantities of labelled training data. Getting your hands on suitable data of this magnitude is often quite the task in and of itself. If you get lucky, you may be able to source some off-the-shelf data sets that can be moulded to suit your purpose, but it is very unlikely that you will find a perfect fit. So what happens when the data you need just doesn’t exist?
The obvious answer would be to create a unique data set, specifically designed with your model in mind. However, manually preparing the quantity of data required would take a considerable amount of time, and would be boring and repetitive to boot. Ideally, we would get a machine to do it automatically, but since the whole point of the data is to train your model to do just that, you will quickly find yourself in a never-ending chicken-and-egg situation. So if machine power is out and manual labelling is far too time-consuming, is there any other alternative? What’s needed in these cases is a robot simulator: a way of simulating what your Machine Learning model is meant to do, so that the simulation can in turn be used to train the model.
Fortunately, it seems that Amazon have already thought ahead and created a kind of artificial Artificial Intelligence: their Mechanical Turk (MTurk) service. This service uses crowdsourcing to connect a worldwide, 24/7 workforce of human beings with organisations and businesses that need to complete large numbers of simple but repetitive tasks. These tasks tend to be difficult, or in some cases impossible, to complete with a software algorithm alone; examples include survey participation, content moderation and, of course, data classification. MTurk even provides an API so that software systems can interface with the ‘robot simulators’ directly. Potentially, then, a machine could use the service autonomously to train itself. Scary thought, I know!
For those interested in where the name Mechanical Turk comes from, it is based on an intimidating, but fake, chess-playing machine built in the late 18th century. This elaborate collection of cogs and levers appeared to be a mechanical automaton that players could compete against in a game of chess. In reality, a human chess player hid inside the cabinet and chose the moves, which, for fans of The Simpsons, may bring back memories of Homer’s attempt at robot wars.
Using services like MTurk enables researchers and developers to quickly build up large data sets to train their models, something that may be difficult to achieve within their own organisations. These services also make it possible to build models tailored to the specific problems they are trying to solve, rather than having to make do and mend with off-the-shelf data sets and models. Of course, as with much of Amazon’s business model, the MTurk service is not without controversy, especially with regard to the pay the ‘robot simulators’ receive for their input.
On reflection, maybe when the robots finally take over, they won’t be enslaving humanity as batteries, but rather as a source of the problem-solving they are not best placed to do themselves? With the appearance of services such as Amazon Mechanical Turk, perhaps, to some extent, this is already happening …
Img 1: The Mechanical Turk Hoax