
Wednesday, August 7, 2019

What Skills Do Data Scientists Need



There is currently a huge demand for data scientists, a top-trending job with attractive salaries. But what are the skills and tools that employers are looking for?
It's a few years since we asked What is a Data Scientist and How Do I Become One? The answer given back in 2015 is still valid as a starting point:
Similar to a business/data analyst, data scientists combine knowledge of computer science and applications, modelling, statistics, analytics and math to uncover insights in data.
But what does this mean in terms of the skillset a data scientist should acquire? The question How to Become More Marketable as a Data Scientist has been tackled by the research team at CV Compiler, a company which provides guidance on creating a convincing resume to developers and others in the software industry. For an analysis of the skills required by data scientists, the CV Compiler team looked at 300 Data Science vacancies from Stack Overflow, AngelList, and similar websites. Then, using their own text analytics tool, they identified the terms that were mentioned most frequently and created this chart:

[Chart: data science skills and tools ranked by number of mentions in job vacancies]
It should be noted that the research represents the preferences of employers, rather than those of data scientists.
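CV Compiler's text analytics tool is not described in the report, so the following is only a minimal sketch of how this kind of mention counting could be done, not their method. The function name, the sample vacancy texts and the skill list are all hypothetical, invented for illustration. It counts how many vacancies mention each term at least once, which is the sort of figure the chart reports.

```python
import re
from collections import Counter


def count_skill_mentions(vacancies, skills):
    """Count how many vacancy texts mention each skill at least once."""
    counts = Counter()
    for text in vacancies:
        for skill in skills:
            # Match the skill as a whole term, ignoring case, so that
            # "R" does not match inside "Spark" or "Research".
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# Hypothetical vacancy snippets, invented for illustration only.
sample_vacancies = [
    "We need Python and SQL skills; experience with Spark is a plus.",
    "Strong R or Python background, plus SQL and Tableau.",
    "Research role using Scala and Apache Spark.",
]
skills = ["Python", "R", "SQL", "Spark", "Scala", "Tableau"]

mentions = count_skill_mentions(sample_vacancies, skills)
for skill, n in mentions.most_common():
    print(skill, n)
```

Counting each vacancy once per term, rather than every occurrence, keeps a single long job description from inflating a skill's total. A real analysis would also need to handle synonyms and spelling variants, which this sketch ignores.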
I would have expected to see "Machine Learning" near the top of the list because, looking at job descriptions, you discover that Machine Learning Engineers work in Data Science teams and that Data Science Interns can expect to "gain valuable AI/ML skills". Perhaps the two terms are so intertwined that knowledge of Machine Learning is assumed.
While R is frequently referred to as "the language of data science", Python outnumbering it in job vacancies makes sense in that Python is a general-purpose language and currently trending when it comes to popularity. I'm surprised to see Scala quite so high and by the complete absence of Julia, both from the table and from the blog write-up of the report, where other skills and tools that gained a substantial number of mentions are discussed. For example, while Big Data is in the table with 221 mentions, the term Data Mining, used for "collecting big data", isn't in the table, although the write-up reports that it had 128 mentions in job vacancies.
While SQL comes high in the list, and ETL (Extract, Transform, Load) is in the table, there's no mention anywhere of MongoDB or NoSQL. On the other hand, mentions of the open source Apache Spark outnumber those of Hadoop. Commenting on this, Andrew Stetsenko writes:
According to the 2018 Big Data Analytics Market Study, Big Data adoption in enterprises soared from 17% in 2015 to 59% in 2018. Thus the popularity of Big Data tools also grew. [In addition to Spark and Hadoop] the most popular ones are MapReduce (36), and Redshift (29) ... some employers still expect candidates to be familiar with Apache Pig (30), HBase (32), and similar technologies. HDFS (20) is still being mentioned in vacancies as well.
As with CV Compiler's earlier report on the skills needed by JavaScript developers, the figures in brackets are the number of mentions.
Stetsenko also stresses the importance of data visualization, which was mentioned in 55 job vacancies, and notes:
It’s crucial that you could represent the outcomes of your work in a format, understandable to any team member or a customer. As for the data visualization tools, employers prefer Tableau (54).
The fact that Computer Vision and NLP (Natural Language Processing) make it into the table serves to emphasize that AI and Data Science are inextricably linked and that knowledge of AI tools such as TensorFlow is well worth acquiring.

 Source: https://www.i-programmer.info/news/197-data-mining/12988-what-skills-do-data-scientists-need.html (Accessed on August 7, 2019)

