A Simple Introduction to Machine Learning
What Is Machine Learning?
Machine learning is a field of computer science that gives computer systems the ability to "learn" (i.e. gradually improve performance on a specific task) with data, without being explicitly programmed.
Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or impossible.
Machine learning, when used as a data-analytics tool, allows researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results, and uncover hidden insights through learning from historical relationships and trends in the data.
If machine learning isn't new, why is there so much interest today? Machine learning algorithms need a lot of data and computing power to produce useful results. Today, we have more data than ever, and computing power is common and cheap. Now we can automatically apply complex mathematical calculations to big data over and over, faster and faster.
Think of machine learning as a means of building models of data. Once these models have been fit to previously seen data, they can be used to predict and understand aspects of newly observed data.
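As a minimal sketch of this fit-then-predict idea, the snippet below fits a straight line to a handful of points with ordinary least squares and then uses it on a value it has not seen. The data (hours studied versus exam score) and the helper names `fit_line` and `predict` are made up for illustration; real projects would normally use a library rather than hand-written code.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up "previously seen" data: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(hours, scores)   # the "learning" step
print(predict(model, 6.0))        # prediction for newly observed data: 62.0
```

The model here is just two numbers, but the workflow (fit on known data, then predict on new data) is the same one that larger machine learning systems follow.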
Traditional programming faces a major limitation when it comes to extending human intelligence: we first need to explain to the computer how to perform the task we want to accomplish. For example, to create mathematical software, we must first write a program that tells the computer how to carry out each mathematical operation.
However, what if the task we want to program is more complex? Or what if we need to program a task but do not even know how the task is done?
Think about how we teach a child to identify different kinds of animals. You cannot start by describing the characteristics of every animal, such as: “If the animal is within this range of colors and has black vertical stripes with a slightly elliptical shape and has a nose like… then it is a tiger”. Can you imagine using that teaching strategy with children? It would be impossible and would take forever. In most cases you would not even be sure which features of each animal you are using to identify it. Instead, we show children pictures of animals together with some specific tips, and this way they unconsciously learn which features identify each animal.
The need to write a program that explains to the computer how to perform each task is the great limitation of traditional programming. It has prevented computers from further extending our intelligence to solve more complex tasks. This is where machine learning comes to the rescue.
For the task of teaching a computer to identify animals, we show the computer a set of labeled pictures (e.g. this picture is a tiger, this picture is a cat, etc.), the same way we do when we teach children. The machine learning algorithm uses these samples to identify the features that differentiate one animal from another and, with this information, it in effect writes its own program for identifying animals.
Methods of Machine Learning
There are two broad methods of machine learning, depending on whether feedback is available to the learning system:
Supervised learning
The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
Supervised learning involves somehow modeling the relationship between measured features of data and some label associated with the data; once this model is determined, it can be used to apply labels to new, unknown data. This is further subdivided into classification tasks and regression tasks: in classification, the labels are discrete categories, while in regression, the labels are continuous quantities.
For example, with supervised learning, an algorithm may be fed data with images of sharks labeled as “fish” and images of oceans labeled as “water”. By being trained on this data, the supervised learning algorithm should be able to later identify unlabeled shark images as “fish” and unlabeled ocean images as “water”.
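The fish and water example can be sketched with a very small classifier. This is only an illustration, assuming each image has already been reduced to two hypothetical numeric features (the fraction of blue pixels and the fraction of gray pixels); the values are invented, and nearest-neighbor is just one of many supervised algorithms that could be used.

```python
import math


def nearest_neighbor_label(training_examples, features):
    """Return the label of the training example closest to `features`."""
    closest = min(training_examples, key=lambda ex: math.dist(ex[0], features))
    return closest[1]


# Labeled training data: (hypothetical features, label given by the "teacher").
# Features: (fraction of blue pixels, fraction of gray pixels).
training_examples = [
    ((0.55, 0.40), "fish"),    # shark image
    ((0.60, 0.35), "fish"),    # shark image
    ((0.95, 0.02), "water"),   # ocean image
    ((0.90, 0.05), "water"),   # ocean image
]

# Unlabeled images the algorithm has not seen before.
print(nearest_neighbor_label(training_examples, (0.58, 0.38)))  # fish
print(nearest_neighbor_label(training_examples, (0.93, 0.03)))  # water
```

Because the labels are discrete categories, this is a classification task; predicting a continuous quantity from the same kind of labeled data would be regression.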
Unsupervised learning
No labels are given to the learning algorithm, leaving it on its own to explore the data and find structure in its input. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other.
Unsupervised learning involves modeling the features of a dataset without reference to any label, and is often described as "letting the dataset speak for itself." Typical tasks include clustering and dimensionality reduction. Clustering algorithms identify distinct groups of data, while dimensionality reduction algorithms search for more succinct representations of the data.
You may have a large dataset of customers and their purchases, but as a human you will likely not be able to make sense of what similar attributes can be drawn from customer profiles and their types of purchases. With this data fed into an unsupervised learning algorithm, it may be determined that women of a certain age range who buy unscented soaps are likely to be pregnant, and therefore a marketing campaign related to pregnancy and baby products can be targeted to this audience in order to increase their number of purchases.
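Customer segmentation of this kind can be sketched with k-means, one common clustering algorithm. The version below is deliberately tiny and makes several assumptions: the customer data (age, monthly spend) is invented, the number of clusters is fixed at two, and the starting centroids are simply the first two points, which a real implementation would choose more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Tiny k-means: returns (centroids, cluster index for each point)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Made-up customers: (age, monthly spend). No labels are provided.
customers = [(22, 30), (60, 210), (25, 35), (58, 200), (24, 28), (62, 220)]

centroids, assignments = k_means(customers, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The algorithm is never told which customers belong together; the two segments emerge from the data alone, which is the sense in which the dataset "speaks for itself."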
There are also other methods, such as semi-supervised learning and reinforcement learning.
Machine Learning Applications
There are a lot of interesting applications of machine learning. Some are:
- Self-driving Google car.
- Fraud detection.
- Web search results.
- Real-time ads on web pages and mobile devices.
- Text-based sentiment analysis.
- Prediction of equipment failure.
- Email spam filtering.
- A computer program, "AlphaGo", defeated one of the best human players at Go, a strategy board game. Many masters could not fathom how a machine could grasp the full nuance and complexity of this ancient Chinese war strategy game, with its 10¹⁷⁰ possible board positions. Watch a 3-minute video about this here.
- On August 11, 2017, OpenAI reached yet another incredible milestone by defeating the world’s top professionals in 1v1 matches of the online multiplayer game Dota 2. See the full match at The International 2017, with Dendi (human) vs. OpenAI (bot), on YouTube.
- In 2012, Vinod Khosla, co-founder of Sun Microsystems, predicted that 80% of medical doctors' jobs would be lost over the next two decades to automated machine learning medical diagnostic software.
- In 2014, it was reported that a machine learning algorithm had been applied in art history to study fine art paintings, and that it may have revealed previously unrecognized influences between artists.
- Facial recognition technology, which, for example, allows social media platforms to help users tag and share photos of friends.
- Recommendation engines, powered by machine learning, suggest what movies or television shows to watch next based on user preferences.
- Speech and handwriting recognition.
- Adaptive websites, which build a model of user activity and modify the information and/or its presentation in order to better address the user's needs.
- Affective computing, which is the study and development of systems and devices that can recognize, interpret, process, and simulate human emotions.
- Natural-language processing, which is a field concerned with the interactions between computers and human (natural) languages.
Programming Languages for Machine Learning
Data taken from job ads on indeed.com in December 2016 suggests that Python is the most sought-after programming language in the machine learning field, followed by Java, then R, then C++.
Python’s popularity may be due to the recent growth in deep learning frameworks available for the language, including TensorFlow, PyTorch, and Keras. With its readable syntax and its ability to be used as a scripting language, Python proves to be powerful and straightforward both for preprocessing data and for working with data directly.
Bias!
Although data and computational analysis may make us think that we are receiving objective information, this is not the case; being based on data does not mean that machine learning outputs are neutral. Human bias plays a role in how data is collected, organized, and ultimately in the algorithms that determine how machine learning will interact with that data.
If, for example, people provide images of “fish” as data to train an algorithm, and these people overwhelmingly select images of goldfish, a computer may not classify a shark as a fish. The skewed training data would create a bias against counting sharks as fish.
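The goldfish scenario can be reproduced in a toy setting. The sketch below assumes each animal is described by two invented features (body length in meters and how orange it is, from 0 to 1) and uses a simple nearest-neighbor rule; the numbers and the `classify` helper are hypothetical and only meant to show how the choice of training examples shapes the result.

```python
import math


def classify(training_examples, features):
    """1-nearest-neighbor: copy the label of the closest training example."""
    closest = min(training_examples, key=lambda ex: math.dist(ex[0], features))
    return closest[1]


# Hypothetical features: (body length in meters, how orange the animal is, 0 to 1).
# Every "fish" example is a goldfish, mirroring the skewed data collection.
skewed_training = [
    ((0.10, 0.90), "fish"),      # goldfish
    ((0.12, 0.95), "fish"),      # goldfish
    ((0.08, 0.85), "fish"),      # goldfish
    ((2.50, 0.05), "not fish"),  # dolphin
    ((2.00, 0.10), "not fish"),  # seal
]

shark = (3.00, 0.05)
print(classify(skewed_training, shark))   # not fish

# Adding more varied fish examples changes the outcome for the same input.
broader_training = skewed_training + [
    ((3.20, 0.05), "fish"),  # shark
    ((2.80, 0.04), "fish"),  # shark
]
print(classify(broader_training, shark))  # fish
```

Nothing in the algorithm changed between the two runs; only the examples people chose to provide did.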
Because human bias can negatively impact others, it is extremely important to be aware of it and to work towards eliminating it as much as possible. One way to work towards this is to ensure that diverse people are working on a project and that diverse people are testing and reviewing it.
References
- What is machine learning? - Cloudera
- Machine learning - Wikipedia
- Affective computing - Wikipedia
- Adaptive website
- Natural-language processing
- Machine Learning. What it is and why it matters - SAS
- Machine Learning for Humans🤖👶 - Medium
- Python Data Science Handbook
- What is machine learning? - Quora answer
- An Introduction to Machine Learning - Digital Ocean