Starting guide to artificial intelligence part 2

Apr 2, 2021

In Python machine learning, we have a lot of libraries that make the implementation process much easier.

First of all, the Pandas library is one of the most useful libraries in Python. Its main use is data preprocessing and manipulation. That starts with reading and loading data into Python from many formats such as csv, xlsx, html, json, etc. It continues with preprocessing the data, for example by dealing with missing values, and a lot more. In addition, the great documentation on the Pandas website makes preprocessing and cleaning the data easier and faster.
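To make the loading and missing-value steps concrete, here is a minimal sketch. The CSV content, the column names, and the choice of filling gaps with the column median are all assumptions made for illustration; a real project would read an actual file and pick a strategy that suits its data.

```python
import io

import pandas as pd

# A tiny made-up CSV held in memory so the example needs no file on disk.
# With a real file you would pass its path to pd.read_csv instead.
csv_text = """rooms,year_built,price
3,1995,250000
4,,310000
2,2005,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# One simple strategy (an assumption, not the only option):
# fill each missing numeric value with the median of its column.
df_filled = df.fillna(df.median(numeric_only=True))
print(df_filled)
```

Filling with the median is only one possibility. Dropping the incomplete rows with `df.dropna()` is another common choice when there is plenty of data.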

Second is NumPy. It is also one of the most significant libraries, not only in machine learning but in a lot of applications in the mathematical field. This library allows you to do complex mathematical operations and matrix manipulation. The most useful part of this library is its speed: operations on whole arrays run in optimized, compiled code instead of slow Python loops. So for example, if you want to build matrix-vector multiplication with average programming skills, you will write some for loops. NumPy does the same work in a single call, and for operations like matrix multiplication the linear algebra libraries it relies on can also take advantage of multiple cores in your processor, which makes the operation even faster.
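The matrix-vector example can be sketched as follows. The array sizes and the helper name `matvec_loops` are hypothetical and chosen only for the demonstration; the point is that the loop version and the NumPy version compute the same result.

```python
import numpy as np

# Random test data; the sizes are arbitrary and chosen for illustration.
rng = np.random.default_rng(0)
A = rng.random((200, 300))
x = rng.random(300)


def matvec_loops(A, x):
    """Matrix-vector product written with plain Python for loops."""
    n_rows, n_cols = A.shape
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(n_cols):
            y[i] += A[i, j] * x[j]
    return y


y_loops = matvec_loops(A, x)

# The same product as a single NumPy call, executed in compiled code.
y_numpy = A @ x

# Both approaches agree up to floating-point rounding.
print(np.allclose(y_loops, y_numpy))
```

If you time the two versions on your own machine, for example with the standard `timeit` module, the NumPy call is typically much faster, though the exact speedup depends on your hardware and NumPy build.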

Third is Scikit-learn. This library has a lot of functions and classes that help you build machine learning models, and they take a lot of parameters that allow you to control the properties of each model. The library also has extensive, useful documentation that simplifies the process of building models such as multilayer perceptrons, support vector machines, regressors and a lot more!
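As a rough sketch of what building a model with Scikit-learn looks like, the example below trains a support vector machine on synthetic data. The dataset, the parameter values, and the 75/25 split are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: 200 samples with 5 features each (generated, not real).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out a quarter of the data to check the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The constructor parameters control the model's properties.
# kernel="rbf" and C=1.0 are example values only.
model = SVC(kernel="rbf", C=1.0)
model.fit(X_train, y_train)

# score() returns the accuracy on the held-out data for classifiers.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-learn models follow the same pattern: create the model with its parameters, call `fit` on training data, then call `predict` or `score`.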

Our focus will be on supervised machine learning. In this type, the goal is to teach the machine by giving it a lot of data, so that the machine can build experience from this data. In machine learning, we call this experience a pattern.

Supervised machine learning can be divided into regression and classification. Regression is when the value the machine predicts is numeric, for example when you are predicting salaries or the price of a product. Classification is when the predicted value is a class, for example low, medium and high, or class A, B and C.
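The difference between the two kinds of target can be sketched with Pandas. The prices and the thresholds separating low, medium and high are made up for illustration.

```python
import pandas as pd

# Regression target: a numeric value for each example (made-up prices).
prices = pd.Series([120_000, 250_000, 480_000], name="price")

# Classification target: the same examples expressed as classes.
# The bin edges are arbitrary assumptions chosen for this sketch.
price_class = pd.cut(
    prices,
    bins=[0, 200_000, 400_000, float("inf")],
    labels=["low", "medium", "high"],
)

print(prices.tolist())
print(price_class.tolist())
```

A model trained on `prices` would be solving a regression problem, while a model trained on `price_class` would be solving a classification problem.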

We will build a complete, detailed and easy example of both of them in the next parts. For now, we will briefly explain the process and how it works.

Let's say our objective is to build a machine learning model that is able to predict the price of a house using different factors, which is a regression problem. The factors include the number of rooms, the year the house was built, the state, the number of parking spaces, and whether it has a backyard. These factors, and a lot more, control the price of the house. In machine learning, we call this type of data structured data or tabular data. Dealing with this type of data is a lot easier than dealing with unstructured data such as images and text. We will talk about unstructured data in detail later.

So getting back to the house price example, the data would look something like this:

PS: This data is randomly generated and is not real; the table is just for explanation.

In this case, the machine learning model will have 5 inputs and 1 output. Inputs are also called features or independent variables. The output is also called the dependent variable, because it depends on all the inputs together.
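In code, separating the 5 inputs from the 1 output could look like the sketch below. The column names and every value in the table are made up for illustration and are not real data.

```python
import pandas as pd

# Hypothetical house data: 5 feature columns and 1 target column.
houses = pd.DataFrame(
    {
        "rooms": [3, 4, 2, 5],
        "year_built": [1995, 2010, 1980, 2018],
        "state": ["Texas", "Ohio", "Texas", "Utah"],
        "parking_spaces": [1, 2, 0, 2],
        "has_backyard": [True, True, False, True],
        "price": [250_000, 340_000, 150_000, 520_000],
    }
)

# Features (independent variables): every column except the price.
X = houses.drop(columns="price")

# Target (dependent variable): the price we want to predict.
y = houses["price"]

print(X.shape)
print(y.name)
```

By convention, the features are usually stored in a variable named `X` and the target in a variable named `y`.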

In real-life problems, the data will not be this easy. It might have a lot of missing values and outliers. Pandas will help us handle these issues when we start solving this problem.
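As one possible way to spot outliers with Pandas, the sketch below uses the interquartile range (IQR) rule. The prices are made up, and the 1.5 multiplier is a common convention rather than a requirement.

```python
import pandas as pd

# Made-up prices; the last value is deliberately far from the rest.
prices = pd.Series([200_000, 220_000, 210_000, 230_000, 215_000, 2_000_000])

# Interquartile range: the spread of the middle half of the data.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
is_outlier = (prices < lower_bound) | (prices > upper_bound)

outliers = prices[is_outlier]
cleaned = prices[~is_outlier]

print(outliers.tolist())
print(cleaned.tolist())
```

Whether a flagged value should be removed, corrected or kept depends on the problem, so it is worth looking at each outlier before dropping it.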

In part 3, we will solve a complete regression example.

Written by Omar M. Atef