Python: simple linear regression – two possibilities, either sklearn or NumPy's polyfit

Compared with NumPy's polyfit, getting a linear regression with sklearn takes a few more steps. This post walks through the sklearn route.

Step 1 – Get the data (normally from a .csv file)

# credit https://databasetown.com/machine-learning-with-python-a-real-life-example/
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

mango_data = {'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
            'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90]}
# This is the code that you would use if you already had a csv file:
# df = pd.read_csv('mangoes_price.csv')
# df
mango_prices = pd.DataFrame(data=mango_data)
mango_prices

Step 2 – Visualise the data with a scatter plot

plt.title('Our historic mango prices', fontsize=12)
plt.xlabel('Year')
plt.ylabel('Mango Price')
# linewidths takes a number, not a string
plt.scatter(mango_prices.Year, mango_prices.Mango_Price, color='blue', marker='.', linewidths=5)
plt.show()

Step 3 – Separate the feature (Year) from the target (Mango_Price)

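# features: a DataFrame (2D) that keeps only the Year column
new_df = mango_prices.drop('Mango_Price', axis='columns')
new_df
# target: a Series (1D) holding the prices
Mango_Price = mango_prices.Mango_Price
Mango_Price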
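
Why split the table at all? scikit-learn estimators expect the features as a 2D table of shape (n_samples, n_features) and the target as a 1D array. The following is a small sketch, using the same data as above, of how the two selections differ. The names X and y are illustrative and do not appear in the original code.

```python
import pandas as pd

mango_prices = pd.DataFrame({
    'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

# Double brackets keep a DataFrame (2D): one row per sample, one column per feature
X = mango_prices[['Year']]
# Attribute access returns a Series (1D): one target value per sample
y = mango_prices.Mango_Price

print(X.shape)  # (9, 1)
print(y.shape)  # (9,)
```

Selecting with double brackets gives the same table as dropping the Mango_Price column, so either form can be passed to fit().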

Step 4 – Use Sklearn fit() method to get Linear Regression

# to train the model, we create a LinearRegression object
# and call its fit() method with the features and the target
reg_model = linear_model.LinearRegression()
reg_model.fit(new_df, mango_prices.Mango_Price)
 LinearRegression() 

Step 5 – Use the sklearn model to predict the prices for two future years

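# predict the price of the mangoes in 2020 and 2021
# (passing a DataFrame with the same column name avoids sklearn's feature-name warning)
reg_model.predict(pd.DataFrame({'Year': [2020, 2021]}))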
 array([93.33333333, 99.        ]) 

Step 6 – Use Sklearn to get coefficient and intercept

# slope (coefficient) and intercept of the fitted line
print(reg_model.coef_)
print(reg_model.intercept_)
 [5.66666667] -11353.333333333334 
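
To see where these two numbers come from, here is a minimal sketch of the textbook least-squares formulas for a single feature, written with NumPy only. It is not how sklearn computes the fit internally; it is just a hand calculation on the same data, and the variable names are illustrative. The results should agree with the values printed above up to floating-point rounding.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

x_mean = years.mean()
y_mean = prices.mean()

# slope = sum of co-deviations of x and y / sum of squared deviations of x
slope = np.sum((years - x_mean) * (prices - y_mean)) / np.sum((years - x_mean) ** 2)
# the fitted line passes through the point (x_mean, y_mean)
intercept = y_mean - slope * x_mean

print(slope, intercept)
```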

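Step 7 – Check the prediction by plugging the coefficient and intercept into the line equation

# y = mx + b, where m is the slope (coefficient) and b is the intercept.
# Plug in the values from above for x = 2020. The slope is typed in rounded form,
# so the result differs slightly from the model's own prediction of 93.33333333.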
2020 * 5.66666667 + (-11353.333333333334)
 93.33334006666519 

Step 8 – Get the accuracy

# check the fit: score() returns R², the coefficient of determination (1.0 is a perfect fit)
reg_model.score(new_df, Mango_Price)
 0.9880341880341843 
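
The value returned by score() for a regression model is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean price. As a rough cross-check, it can be recomputed with NumPy alone. In this sketch the line is fitted with the textbook least-squares formulas instead of sklearn, and the variable names are illustrative.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

# fit the line with the single-feature least-squares formulas
slope = (np.sum((years - years.mean()) * (prices - prices.mean()))
         / np.sum((years - years.mean()) ** 2))
intercept = prices.mean() - slope * years.mean()
predicted = slope * years + intercept

# R² = 1 - (unexplained variation / total variation)
ss_res = np.sum((prices - predicted) ** 2)
ss_tot = np.sum((prices - prices.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(r_squared)
```

The printed value should match the score above to many decimal places; small differences in the last digits are floating-point noise.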

Step 9 – Visualise the linear regression in a scatter plot

# years to predict for: 2015 to 2025 (11 values)
year_df = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]
# reshape the 1D list of years into a 2D column, one row per sample, as the model expects
B = np.reshape(year_df, (-1, 1))
# mango price predictions
price = reg_model.predict(B)
plt.title('Our historic prices (blue) and our predictions (red)', fontsize=12)
# the predictions based on linear regression
plt.scatter(year_df,price, c='r')
# the actual prices in blue
plt.scatter(mango_prices.Year,mango_prices.Mango_Price, c='b')

plt.xlabel('Year')
plt.ylabel('Mango Price')
plt.show()
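
For comparison, here is a short sketch of the NumPy route mentioned in the introduction. np.polyfit with degree 1 fits a straight line and returns the coefficients with the highest power first, so slope comes before intercept. The variable names are illustrative, and the numbers should agree with the sklearn results above up to floating-point rounding.

```python
import numpy as np

years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
prices = [40, 50, 55, 60, 65, 70, 75, 80, 90]

# degree 1 = straight line; coefficients are returned highest power first
slope, intercept = np.polyfit(years, prices, 1)

# np.poly1d wraps the coefficients in a callable polynomial
line = np.poly1d([slope, intercept])

print(slope, intercept)
print(line(2020), line(2021))
```

This gives the slope, the intercept and the predictions in a few lines, which is why the polyfit way is the simpler of the two when only one feature is involved.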
