Crowd Sourcing for Big Data Analytics

By Saurabh Shakyawar, 3rd Eye Advisory®

Crowd sourcing is the process of enlisting a crowd of people to solve a problem. The idea of crowd sourcing was first introduced by Jeff Howe in 2006. Since then, an enormous amount of effort from both academia and industry has been put into this area, and many crowd sourcing platforms and research prototypes have been introduced.

Amazon Mechanical Turk (MTurk), CrowdFlower, Wikipedia and Stack Overflow are examples of well-known crowd sourcing platforms. To crowd source a problem, the problem owner, also called the requester, prepares a request for the crowd's contributions and submits it to a crowd sourcing platform. This request, also referred to as the crowd sourcing task or simply the task, consists of a description of the problem to be solved, a set of requirements necessary for task accomplishment, possible criteria for evaluating the quality of crowd contributions, and any other information that can help workers produce higher-quality contributions.

People who are willing to contribute to the task, also called workers, select the task, if they are eligible to do so, and provide the requester with their contributions. The contributed content is sent to the requester either directly or via the crowd sourcing platform. The requester then evaluates the contributions, and workers whose contributions have been accepted are rewarded. Several dimensions characterize a crowd sourcing task, each of which impacts various aspects of the task, from outcome quality to execution time and cost.

Worker Selection:
The quality of the workers who contribute to a task can directly impact the quality of its outcome. Low-quality or malicious workers can produce low-quality contributions and consequently waste the resources of the requester. Research shows that recruiting suitable workers leads to receiving high-quality contributions. A suitable worker is one whose profile, history, experience and expertise closely match the requirements of the task.
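As an illustration, worker selection can be sketched as a matching problem between a task's required skills and each worker's profile. The data model, field names and thresholds below are assumptions made for the example, not part of any specific platform's API:

```python
def match_score(worker_skills, required_skills):
    """Fraction of the task's required skills that the worker covers."""
    worker_skills, required_skills = set(worker_skills), set(required_skills)
    if not required_skills:
        return 0.0
    return len(worker_skills & required_skills) / len(required_skills)

def select_workers(workers, required_skills, min_accuracy=0.8, top_k=3):
    """Filter out low-accuracy workers, then rank the rest by skill match."""
    eligible = [w for w in workers if w["accuracy"] >= min_accuracy]
    ranked = sorted(eligible,
                    key=lambda w: match_score(w["skills"], required_skills),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical worker profiles: skills plus an accuracy score from past tasks.
workers = [
    {"name": "A", "skills": {"labeling", "nlp"}, "accuracy": 0.92},
    {"name": "B", "skills": {"translation"},     "accuracy": 0.95},
    {"name": "C", "skills": {"labeling"},        "accuracy": 0.70},
]
print([w["name"] for w in select_workers(workers, {"labeling", "nlp"})])  # → ['A', 'B']
```

Worker C is filtered out despite a perfect skill match because past accuracy falls below the eligibility threshold, reflecting the point above that low-quality workers waste the requester's resources.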

Real-time Control and Support:
During the execution of the task, the requester may manually or automatically control the workflow of the task, or adjust the workflow or the list of workers involved, in order to increase the chance of receiving high-quality contributions. Moreover, workers may gain experience while contributing to a task by receiving real-time feedback from other workers or the requester. Feedback received in real time, before the final submission of a worker's contribution, can help her pre-assess the contribution and revise it so that it satisfies the task requirements. Real-time workflow control and feedback can directly impact the outcome quality, the execution time and the cost of the task, so they should be taken into account when studying crowd sourcing processes.

Quality Assessment:
Assessing the quality of contributions received from the crowd is another important aspect of a crowd sourcing process. Quality in crowd sourcing is always under question. The reason is that workers in crowd sourcing systems have different levels of expertise and experience; they contribute with different incentives and motivations; and they might even engage in collusive unfair activities. Several approaches have been proposed to assess the quality of workers' contributions, such as expert review, input agreement, output agreement, majority consensus and ground truth.
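Two of these approaches, majority consensus and ground truth (seeding the task with questions whose answers are already known), can be sketched in a few lines. The data shapes and the agreement threshold here are illustrative assumptions:

```python
from collections import Counter

def majority_consensus(answers, threshold=0.5):
    """Accept the most common answer only if its share exceeds the threshold."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > threshold else None

def ground_truth_accuracy(worker_answers, gold_answers):
    """Fraction of seeded gold questions the worker answered correctly."""
    correct = sum(1 for q, a in gold_answers.items()
                  if worker_answers.get(q) == a)
    return correct / len(gold_answers)

print(majority_consensus(["cat", "cat", "dog"]))  # → cat
print(majority_consensus(["cat", "dog"]))         # no majority → None
```

In practice a requester might combine the two: use gold questions to estimate each worker's reliability, then require consensus only among workers above some accuracy cut-off.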

Compensation Policy:
Rewarding the workers whose contributions have been accepted, or penalizing malicious or low-quality workers, can directly impact their chances, eligibility and motivation to contribute to future tasks. Rewards can be monetary (extrinsic) or non-monetary (intrinsic). Research shows that the impact of intrinsic rewards, e.g., altruism or recognition in the community, on the quality of workers' contributions is greater than that of monetary rewards. Choosing an adequate compensation policy can greatly impact the number of contributing workers as well as the quality of their contributions. Hence, the compensation policy is an important aspect of a crowd sourcing process.

Aggregation Technique:
A single crowd sourcing task might be assigned to several workers. The final outcome of such a task can be one or a few of the individual contributions received from workers, or an aggregation of all of them. Voting is an example of a task in which crowd contributions are aggregated to build up the final task outcome. In contrast, in competition tasks only one or a few workers' contributions are accepted and rewarded. Each individual contribution has its own characteristics, such as quality level, the worker's reputation and expertise, and many other attributes. Therefore, combining or even comparing these contributions is a challenging task, and choosing the wrong aggregation method can directly impact the quality of the task outcome.
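For labeling-style tasks, the two simplest aggregation techniques are a plain majority vote and a vote weighted by each worker's reputation. The sketch below is illustrative only; the reputation weights are assumed to come from some prior quality-assessment step:

```python
from collections import Counter

def majority_vote(contributions):
    """Aggregate labels from several workers by simple majority."""
    return Counter(contributions).most_common(1)[0][0]

def weighted_vote(contributions, weights):
    """Weight each worker's label by their (assumed) reputation score."""
    scores = {}
    for label, weight in zip(contributions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

print(majority_vote(["spam", "spam", "ham"]))   # → spam
print(weighted_vote(["spam", "ham"], [0.4, 0.9]))  # → ham (more reputable worker wins)
```

The second example shows why the choice of aggregation method matters: with the same contributions, a reputation-weighted vote can reverse the outcome of a plain majority vote.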

#ReadyBusinessPlan #ask3rdEyeAdvisory #LearnAt3rdEyeAdvisory #3rdEyeAdvisory
