Data that reveals the reality of the data scientists who work with machine learning and big data


By Carlos Muza

"Kaggle", a platform where businesses and researchers post data and data scientists and analysts around the world compete on predictive modeling and analytical methods, has published data that clearly shows which tools people working in machine learning and data science were using as of 2017 and how much they are paid.

The State of ML and Data Science 2017 | Kaggle
https://www.kaggle.com/surveys/2017

Kaggle surveyed the users registered on its site to get an accurate picture of how data science and machine learning are being used. The survey reportedly drew more than 16,000 responses, gathering information from data scientists, analysts, developers using machine learning, and others all over the world.

First, the age of people involved in data science. The graph below shows the ages of the respondents; the overall median is 30 years old. This figure varies considerably from country to country: for example, the median age of data scientists in India is 25, while in Australia it is 34, a difference of 9 years. In Japan, the median was 33.


Next is data on the respondents' type of employment. 65.7% of the total work as full-time employees. 12.7% answered that they were looking for jobs, 8% worked as freelancers, 5.6% were not working, and 5.5% worked part-time.


Next are the answers to the question "What is your job title?" The most common is data scientist (24.4%), which means that roughly a quarter of respondents work as data scientists. This is followed by software developer / engineer at 12.3%, data analyst at 11.3%, scientist / researcher at 9.2%, and machine learning engineer at 5.9%.


The graph below shows the annual income of the respondents. 3,771 people responded, with a median of $54,441 (about 6.3 million yen). In the US, "machine learning engineer" was the occupation with the highest average salary.


The most common level of education among respondents is a master's degree. However, respondents who earn salaries of at least 150,000 dollars (about 17 million yen) are more likely to hold doctorates.


When asked which methods they use in data science, 63.5% answered that they use "logistic regression", followed by "decision tree" at 49.9%, "random forest" at 46.3%, and "neural network" at 37.6%. Logistic regression, the most widely used method, is said to be the most common in almost all industries and occupations, the exception being the military and security fields, where neural networks are more common.


According to the survey, the most commonly used programming language is "Python", with 76.3% answering that they use it. However, when the results are narrowed down to statisticians, "R" rises to 90.8%.


As for the types of data used at work, "relational data" accounts for 65.5%, "text data" for 53.0%, "image data" for 18.1%, "video data" for 5.1%, and others for 10.3%. Relational data is the most frequently used data type in every field except academia and the military / security field, where text data is used more often.


As for how code is shared at work, 58.4% of users responded that they use Git.


When asked about obstacles at work, 49.4%, or about half of respondents, cited "dirty data" containing incorrect information. According to Kaggle, dirty data appears to be one of the most common problems for people working in the field of data science. "Lack of talent in the field of data science" and "lack of management and financial support" were also mentioned.


For those who are about to step into the field of data science, the recommended answer to "What to study first" is "Python".


When asked which platforms they use to learn data science, both the "Employed in Field" group and the "Entering Field" group (those who will enter the field of data science in the future) answered that they studied on Kaggle. Various other study methods were also cited, such as online courses, blogs, and YouTube videos.


To the question of where they find open data for data science study and actual work, the most common answer, at 63.4%, was a "dataset aggregator" such as Kaggle. This was followed by "Google Search" at 33.3%, "University" at 26.6%, "Collect by yourself" at 23.7%, "GitHub" at 22.2%, "Government website" at 19.3%, and others at 3.7%.


Furthermore, on the question of how they found their jobs or are looking for one, many of those who are already employed used recruiters, while those currently looking say they search on company websites.

in Note, Posted by logu_ii