R vs Python

By Sushmita Rai, 3EA

Python and R are considered the most popular programming languages for statistics. Experts in the field tend to credit R with better data visualization, whereas Python is often praised for its easy-to-understand syntax. For most data analysis projects, the primary goal is to produce the highest-quality analysis in the least amount of time. Once we understand the underlying concept of what we want to do with the data, we can adopt either language to perform the analysis. For example, a data analyst who understands the principles of natural language processing, data cleaning, and machine learning can implement an automated text summarizer in either R or Python.

Both Python and R are popular programming languages for statistics. R is the open-source counterpart of SAS, which has traditionally been used in academia and research. Python is often used for its easy-to-understand syntax.

The question of which programming language is better has been raised many times. Which language counts as one of the emerging global trends for the Indian analytics industry, which is still evolving? I will describe some points of differentiation between R and Python that help explain why the market is leaning slightly towards Python today. We sometimes get confused about whether to use R or Python for day-to-day data analysis tasks, but as experts have explained, the choice depends on the type of data analysis challenge we are facing.

There are many figures comparing the adoption and popularity of R and Python. It is hard to compare the two side by side, mainly because R is found mostly in data science environments, whereas Python is widely used as a general-purpose language in many fields, such as web development.

When and how to use R?
R is used for data analysis tasks, and it can handle almost any type of data analysis because its huge number of packages and readily usable tests often provide the necessary tools for solving big data problems. The following popular packages are recommended:

  • dplyr, plyr and data.table for data manipulation

  • stringr to manipulate strings

  • ggplot2, ggvis and lattice to visualize data

  • caret for machine learning

When and how to use Python?
Python is used when data analysis tasks need to be incorporated into web apps or when statistics code needs to be incorporated into a production database. It is often considered one of the best tools for implementing algorithms for production use. Underdeveloped Python packages for data analysis were an issue in the past, but this has improved significantly over the years. To perform basic analytical techniques, a programmer needs to install NumPy/SciPy (scientific computing) and pandas (data manipulation). Other useful packages include matplotlib for graphics and scikit-learn for machine learning.
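As a rough illustration of how pandas and NumPy divide the work, here is a minimal sketch. The sales figures, column names and variable names are all invented for this example and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, 15, 20, 5, 30],
    "price": [2.5, 3.0, 2.5, 3.0, 4.0],
})

# pandas (data manipulation): derive a column, then aggregate by group.
sales["revenue"] = sales["units"] * sales["price"]
revenue_by_region = sales.groupby("region")["revenue"].sum()

# NumPy (scientific computing): numerical summaries on the raw array.
units = sales["units"].to_numpy()
mean_units = float(np.mean(units))
std_units = float(np.std(units, ddof=1))  # ddof=1 gives the sample standard deviation

print(revenue_by_region)
print(f"mean units = {mean_units:.1f}, sample std = {std_units:.2f}")
```

The same grouped data could then be passed to matplotlib for plotting or to scikit-learn for modelling.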

The results of the KDnuggets poll show that Python was in second place in 2016 with a 34% share, which increased to 41% in 2017. The share of KDnuggets respondents who used both R and Python in significant ways also increased, from 8.5% in 2016 to 12% in 2017, while the share of other programming tools dropped.

Figure 1: Share of Python, R, Both, or Other platforms usage for Analytics, Data Science, Machine Learning, 2016 vs 2017
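When reading the poll numbers, it helps to separate a change in percentage points from relative growth. The short sketch below takes Python's share as moving from 34% to 41%, and the share of respondents using both languages as moving from 8.5% to 12%. The function name share_change is a hypothetical helper, not part of the poll.

```python
def share_change(old_share, new_share):
    """Return (percentage-point change, relative growth in percent)."""
    point_change = new_share - old_share
    relative_growth = point_change / old_share * 100
    return point_change, relative_growth


# Shares in percent, as quoted in the text above.
python_points, python_growth = share_change(34.0, 41.0)
both_points, both_growth = share_change(8.5, 12.0)

print(f"Python: +{python_points:.1f} points, {python_growth:.1f}% relative growth")
print(f"Both:   +{both_points:.1f} points, {both_growth:.1f}% relative growth")
```

A move from 34% to 41% is a gain of 7 percentage points, which corresponds to relative growth of roughly 21%, not 41%.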

The market is leaning slightly towards Python today, and after weighing a few factors an analyst can decide which data analysis tool is suitable. Here are a few specific scenarios and a recommendation for each:

  • A novice entering the analytics industry (particularly in India) is advised to learn SAS as a first language, as it holds the highest job market share.

  • Professionals who have already spent time in the industry should diversify their expertise by learning a new tool.

  • Experts in the industry should know at least two of these languages.

  • For start-ups and freelancing, R or Python is more useful.

If an organization is looking to purchase a tool, the following scores for specific differentiating parameters will be helpful:

Scores: 1 = low, 5 = high

Parameter                                   R      Python
Availability/Cost                           5      5
Ease of learning                            2.5    3.5
Data handling capabilities                  4      4
Graphical capabilities                      4.5    4.5
Advancement in tools                        4.5    4.5
Job scenario                                4.5    4.5
Customer service support and community      3.5    3.5
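One way an organization might combine the scores above into a single number is a weighted average. The sketch below is only an illustration: the function name weighted_scores and the custom weighting are assumptions made for this example, not part of the original comparison.

```python
# Scores copied from the table above as (R, Python); 1 = low, 5 = high.
SCORES = {
    "Availability/Cost": (5.0, 5.0),
    "Ease of learning": (2.5, 3.5),
    "Data handling capabilities": (4.0, 4.0),
    "Graphical capabilities": (4.5, 4.5),
    "Advancement in tools": (4.5, 4.5),
    "Job scenario": (4.5, 4.5),
    "Customer service support and community": (3.5, 3.5),
}


def weighted_scores(scores, weights=None):
    """Return (R, Python) weighted averages; equal weights if none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    r = sum(weights[name] * scores[name][0] for name in scores) / total
    py = sum(weights[name] * scores[name][1] for name in scores) / total
    return r, py


# Equal weighting of all seven parameters.
r_avg, py_avg = weighted_scores(SCORES)

# Hypothetical weighting: ease of learning counts twice as much as the rest.
custom = {name: 1.0 for name in SCORES}
custom["Ease of learning"] = 2.0
r_custom, py_custom = weighted_scores(SCORES, custom)

print(f"Equal weights:  R = {r_avg:.3f}, Python = {py_avg:.3f}")
print(f"Custom weights: R = {r_custom:.3f}, Python = {py_custom:.3f}")
```

Because the two languages differ only on ease of learning in this table, any weighting will favour Python, and the gap widens as that parameter is given more weight.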

The future looks bright for Python users, but we expect that R and other platforms will retain some share for the foreseeable future because of their large installed base.

