A Short Introduction to Data Science
Harvard Business Review calls data scientist the “sexiest” job of the 21st century, and the field shows huge promise for people looking to switch careers or start out in a rewarding position.
What Is Data Science?
Data science is the process of drawing useful conclusions from data using computers.
What Is A Data Scientist?
The following diagram illustrates the skills a data scientist is expected to have:
“Hacking skills” in the diagram means programming-related skills.
If you work in a team, you don't need every skill in the diagram. For example, you might be strong in hacking skills and statistics while a colleague has the required substantive expertise, so you complement each other.
Different Names
Data analysis has different names depending on the field it is used in. For example, it can be called:
- Bioinformatics: when dealing with medical data
- Data Science: for data from web analytics
- Machine Learning: for data in computer science or computer vision
- Natural Language Processing: when dealing with data in text form
- Signal Processing: when dealing with electrical signals
- Business Analytics: when dealing with data on customers
- Econometrics: when dealing with economic data
Where Does Data Come From?
Data can come from many sources. For example:
- Customer behavior records: what customers buy and when
- Text: text from books, magazines, and the Internet
- Video
- Climate records: data about weather for the past ten years
- Health records: hospital records about patients and their related data
- Government records: for example, crime records showing when and where crimes happen
These are only a few examples. Today, data can come from a multitude of sources.
Why Data Science?
Data is now everywhere, and in huge amounts: 400 hours of video are uploaded to YouTube every minute, and 350,000 tweets are sent on Twitter every minute. Many organizations also collect data. The United States government, for example, shares the data it collects in domains such as health, climate, and education on data.gov. With all of that data, and given the complexity of today's world, we need an efficient, fast, and reliable way to process data and draw useful conclusions from it. This is where data science comes in: it uses computing power rather than human effort, because humans would need a very long time to process and analyze the amounts of data available today, and they might overlook some aspects in their conclusions.
Where Is Data Science Used?
Data science is used in a growing number of fields, including healthcare, retail, mobile networks, economics and finance, and oil and gas.
Generally, data science uses data for two purposes:
- to describe some phenomenon
- to predict the future
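To make the two purposes concrete, here is a minimal sketch using only Python's standard library. The monthly sales figures and the function names `describe` and `fit_line` are made up for illustration. The first function summarizes what already happened, and the second fits a straight line that can be extended to guess the next value.

```python
from statistics import mean, stdev

# Hypothetical data: units sold in six consecutive months.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def describe(values):
    """Describe: summarize the data we already have."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Predict: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales)
slope, intercept = fit_line(months, sales)

# Extend the fitted line one month past the observed data.
prediction_month_7 = slope * 7 + intercept

print(summary)
print(prediction_month_7)
```

Real predictions usually rely on richer models and far more data; a straight line is just the simplest case that shows the idea.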
Data science has many interesting applications. For example, it is used to build recommendation systems, such as product recommendations on Amazon and connection recommendations on LinkedIn. It is also used in image and speech recognition, where a computer identifies the contents of an image or the words in speech. Companies use it to detect fraudulent activity, and it is used to build self-driving cars and robots and to predict medical diagnoses. The list goes on.
What Tools Do Data Scientists Use?
This question really needs a dedicated post, because there are a lot of tools to choose from. However, one of the most widely used tools in data science is Python. It has useful libraries such as NumPy, which makes it easier to work with multidimensional arrays and provides many mathematical functions, and Pandas, which handles data in a way suited to analysis and brings some of the best parts of the R programming language to Python.
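As a rough illustration of the recommendation idea, the sketch below suggests products that were often bought together with a given product. The baskets and the function names `build_cooccurrence` and `recommend` are hypothetical, and this is a toy co-occurrence count, not how Amazon or LinkedIn actually build their systems.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per order.
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "keyboard"},
    {"laptop", "mouse", "keyboard"},
]


def build_cooccurrence(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item, counts, top_n=2):
    """Return the products most often bought together with `item`."""
    neighbors = counts.get(item, Counter())
    # Sort by count (highest first), then by name so ties are stable.
    ranked = sorted(neighbors.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


counts = build_cooccurrence(baskets)
print(recommend("laptop", counts))
```

With these baskets, a laptop shares an order with a mouse three times and with a bag twice, so those two come out on top.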
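Here is a small taste of both libraries, assuming NumPy and Pandas are installed. The sensor readings and the purchase table are invented for the example; the calls themselves (`np.array`, `mean(axis=0)`, `pd.DataFrame`, `groupby(...).sum()`) are standard ones.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array (rows = days, columns = hypothetical sensors).
readings = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])
column_means = readings.mean(axis=0)  # average per sensor, across days
scaled = readings * 10                # arithmetic applies element by element

# Pandas: a labeled table that can be grouped and aggregated.
purchases = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat"],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
total_per_customer = purchases.groupby("customer")["amount"].sum()

print(column_means)
print(total_per_customer)
```

NumPy works on whole arrays at once, while Pandas adds row and column labels on top, which is what makes operations like "total amount per customer" a one-liner.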
Data Science Specialties
Some data science specialties are:
- Data Visualization: make infographics, which are great tools for visualizing data.
- Educator: start a blog or a YouTube channel. A lot of people want to learn about data science these days.
- Programmer: there is a lot of programming in data science.
- Researcher: going to graduate school is one of the best strategies here.
Sources
Demystifying Data Science (video)
23 Reasons to Get Excited About Data
Jeff Leek’s Data Analysis Coursera Class
Udacity course: Intro to Data Science