REGRESSION USING PYTHON

Anjan Parajuli
Published in Analytics Vidhya
4 min read · Jul 18, 2021


Well well well!!!!!!!!

We are going to discuss regression here in as easy a way as possible.

Let's start with the basics.

Regression is simply a tool that describes the relationship among variables and helps us estimate or predict from that relationship. Put simply, it is a way of predicting a single value that makes sense from a large amount of data.

Well, if it has not clicked yet, just wait a few seconds until you start coding. It is much easier to understand something through hands-on experience than by endlessly reading theory.

Caveat: We are doing simple linear regression here, which is the foundation for multiple linear regression. That will be covered in a later post.

So here we go folks. We are doing it step by step:

STEP 1: Open Google Colab or any platform you are familiar with, and import the required libraries.

A Jupyter notebook is used here, but everything works in Colab too.

Here NumPy is required to create the data, scikit-learn provides the model, and matplotlib is used to visualize the data. You only need to know NumPy and scikit-learn, but matplotlib helps you get the gist easily.
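The imports for this step might look like the sketch below. The aliases np and plt are common conventions and are an assumption here, not necessarily what the original notebook used.

```python
import numpy as np                                  # create and reshape the data
import matplotlib.pyplot as plt                     # scatter plot and fitted line
from sklearn.linear_model import LinearRegression   # the regression model
```

If any of these imports fail, the package is probably not installed in your environment.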

STEP 2: Get the data ready using NumPy

Here you just write your X data inside the first array and your Y data inside the second array.

In simple linear regression, Y = a + bX.

Y is the dependent variable, because it depends on X, and X is the independent variable.
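Here is a minimal sketch of this step. The numbers are made up for illustration and are not the data from the original notebook. One detail worth knowing: scikit-learn expects X as a 2-D array with one column per feature, so X is reshaped into a single column.

```python
import numpy as np

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)  # 10 rows, 1 column
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])             # one Y per row of X

print(X.shape)  # (10, 1)
print(Y.shape)  # (10,)
```

The reshape(-1, 1) call tells NumPy to work out the number of rows itself and keep exactly one column.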

STEP 3: Let’s visualize the data to check whether there is any pattern.

Woohh!! See, the scatter plot shows the pattern of Y increasing as X increases. This is why visualizing is really helpful.
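A scatter plot like the one described can be sketched as follows. The data is the same made-up example as before (an assumption, not the original values), and the axis labels and title are my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])

fig, ax = plt.subplots()
ax.scatter(X, Y)          # one dot per (X, Y) pair
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Y against X")
plt.show()
```

If the dots roughly follow a straight line, a linear model is a reasonable thing to try.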

STEP 4: Fit the data into our LinearRegression

Here the first line, model = LinearRegression(), means we are instantiating an object of the class LinearRegression.

Woooh…what does that even mean?

Well, I know it might be hard to understand for someone not familiar with object-oriented programming. Simply speaking, it means we have the equation

Y = a + bX, where a is the intercept and b is the coefficient of X,

and the model is asking us to place our data into that equation to find the values of a and b. So the first line gets Y = a + bX ready to receive data. Putting the data into the equation is done by the second line of code, which calls the model's fit method.
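The two lines discussed above can be sketched like this, again on made-up example data rather than the original values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])

model = LinearRegression()  # get the "empty" equation Y = a + bX ready
model.fit(X, Y)             # place the data into it and estimate a and b
```

After fit has run, the estimated values are stored on the model object itself.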

STEP 5: GETTING OUR REQUIRED VALUES OF a AND b FROM THE MODEL

Here we get our required values of a and b. It’s just a matter of reading them from the fitted model.

There is not much more to understand here, because a and b were explained in the previous step.
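As a sketch, a and b can be read from the fitted model through its intercept_ and coef_ attributes. The data is made up for illustration, so the numbers you see will differ from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])

model = LinearRegression().fit(X, Y)

a = model.intercept_  # the intercept in Y = a + bX
b = model.coef_[0]    # the coefficient of X (coef_ holds one entry per feature)

print("a (intercept):", a)
print("b (slope):", b)
```

Note that coef_ is an array even with a single feature, which is why the example takes its first element.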

STEP 6: VISUALIZING WHETHER OUR MODEL FITTED THE DATA OR NOT

Here we can see that the line we obtain from the equation, after plugging in the values of a and b, follows the data closely. This is exactly what our model should achieve.
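One way to draw the fitted line over the data is sketched below. It uses model.predict to compute a + bX for every X. The data, the red color, and the legend labels are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])

model = LinearRegression().fit(X, Y)
Y_pred = model.predict(X)  # a + b*X for each value of X

fig, ax = plt.subplots()
ax.scatter(X, Y, label="data")                          # the original points
ax.plot(X, Y_pred, color="red", label="fitted line")    # the regression line
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
plt.show()
```

If the line runs through the middle of the cloud of points, the fit is doing its job.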

STEP 7: TESTING R-squared

Here R-squared comes out to about 0.83. This means that about 83% of the variation in Y is explained by X, and the remaining 17% comes from other factors.
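A sketch of how R-squared can be computed: the model's score method returns it directly, and the manual calculation is included to show what the number means. The data is made up, so the result will not match the 83% from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example values, NOT the data from the original notebook.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([3, 4, 6, 7, 8, 11, 12, 13, 16, 17])

model = LinearRegression().fit(X, Y)

# R-squared as reported by scikit-learn.
r_squared = model.score(X, Y)

# The same quantity by hand: 1 - (unexplained variation / total variation).
Y_pred = model.predict(X)
ss_res = np.sum((Y - Y_pred) ** 2)    # variation the line does not explain
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation in Y
r_squared_manual = 1 - ss_res / ss_tot

print("R-squared:", r_squared)
print("R-squared (manual):", r_squared_manual)
```

Keep in mind that this is R-squared on the same data the model was fitted on. On new data it will usually be lower.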

Congrats. You have finally performed simple linear regression using Python.

You can also extend it to larger problems, even multiple linear regression. I hope to see you do that soon. For now, these are the basics you need to know to start out with regression.

I will be posting more on statistical methods using Python. Stay tuned.

Thank you for staying with me for so long. I appreciate it.💖
