# Python: simple Linear Regression – two possibilities, either sklearn or NumPy's polyfit

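Compared with NumPy's polyfit, getting a linear regression with sklearn takes a few more steps. The data has to be split into a 2D feature table and a target, then a model object has to be created and fitted, and only then can it make predictions.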
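
For comparison, here is a minimal sketch of the NumPy route on the same mango data, assuming a reasonably recent NumPy. `np.polyfit(x, y, 1)` fits a degree-1 polynomial by least squares and returns the coefficients highest power first, so the slope comes before the intercept. `np.polyval` then evaluates that line at new x values.

```python
import numpy as np

# Same data as in Step 1 below, as plain NumPy arrays
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90])

# Fit a straight line (degree 1); coefficients come highest power first
slope, intercept = np.polyfit(years, prices, 1)
print(slope, intercept)  # about 5.6667 and -11353.33

# Evaluate the fitted line for 2020 and 2021
pred = np.polyval([slope, intercept], [2020, 2021])
print(pred)  # about [93.33, 99.0]
```

These are the same slope, intercept and predictions that the sklearn steps below arrive at.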

## Step 1 – Get the data (normally from a .csv file)

```python
# credit https://databasetown.com/machine-learning-with-python-a-real-life-example/
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

mango_data = {'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
              'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90]}
# If you already had a csv file, you would load it instead
# (the file name here is only a placeholder):
# mango_prices = pd.read_csv('mango_prices.csv')
mango_prices = pd.DataFrame(data=mango_data)
mango_prices
```
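
Because the data normally arrives as a .csv file, here is a small sketch of that route. To keep it self-contained, it reads the CSV text from an in-memory `io.StringIO` buffer instead of a real file. With a file on disk, you would pass its path to `pd.read_csv` instead, and the file name would be your own.

```python
import io
import pandas as pd

# Stand-in for a .csv file on disk, with the same columns as mango_data
csv_text = """Year,Mango_Price
2011,40
2012,50
2013,55
2014,60
2015,65
2016,70
2017,75
2018,80
2019,90
"""

# read_csv accepts a path or any file-like object
mango_prices = pd.read_csv(io.StringIO(csv_text))
print(mango_prices.head())
print(mango_prices.dtypes)  # both columns are parsed as integers
```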

## Step 2 – Visualise the data with a scatter plot

```python
plt.title('Our historic mango prices', fontsize=12)
plt.xlabel('Year')
plt.ylabel('Mango Price')
# linewidth should be a number, not a string
plt.scatter(mango_prices.Year, mango_prices.Mango_Price, color='blue', marker='.', linewidth=5)
plt.show()
```

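## Step 3 – Reformat the values into features and target

sklearn expects the input features as a 2D table (one row per sample, one column per feature) and the target values as a 1D series, so the DataFrame is split into two parts. `new_df` keeps every column except `Mango_Price`, and `Mango_Price` holds the prices.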

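```python
# features: every column except the price (here just 'Year'), still a 2D DataFrame
new_df = mango_prices.drop('Mango_Price', axis='columns')
new_df
```
```python
# target: the prices as a 1D Series
Mango_Price = mango_prices.Mango_Price
Mango_Price
```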
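
Why drop a column instead of just taking `mango_prices.Year`? A quick shape check shows the difference. `new_df` is 2D with shape (9, 1), which is what `fit()` expects for X even with a single feature. The year column on its own is a 1D Series, which sklearn would reject as X. The sketch rebuilds the Step 1 DataFrame so it runs on its own.

```python
import pandas as pd

mango_prices = pd.DataFrame({'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
                             'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90]})

new_df = mango_prices.drop('Mango_Price', axis='columns')  # features: DataFrame
Mango_Price = mango_prices.Mango_Price                    # target: Series

print(new_df.shape)             # (9, 1): 2D, what fit() expects for X
print(Mango_Price.shape)        # (9,): 1D is fine for y
print(mango_prices.Year.shape)  # (9,): 1D, so it would not work as X
```

`mango_prices[['Year']]` (double brackets) is an equivalent way to get the same 2D feature table.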

## Step 4 – Use sklearn's fit() method to train the linear regression model

```python
# to train the model, we create an object of the
# LinearRegression class and call its fit() method
reg_model = linear_model.LinearRegression()
reg_model.fit(new_df, Mango_Price)
```
` LinearRegression() `
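
Under the hood, `LinearRegression` fits ordinary least squares. It picks the slope and intercept that minimise the sum of squared differences between predicted and actual prices. As a rough illustration (not sklearn's internal code), the same fit can be reproduced with NumPy's `np.linalg.lstsq` by adding a column of ones for the intercept.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

# Design matrix: one column for Year, one column of ones for the intercept
A = np.column_stack([years, np.ones_like(years)])

# Least-squares solution of A @ [m, b] ~= prices
(m, b), residuals, rank, sv = np.linalg.lstsq(A, prices, rcond=None)
print(m, b)          # about 5.6667 and -11353.33
print(residuals[0])  # the minimised sum of squared errors, about 23.33
```

The resulting `m` and `b` correspond to what sklearn exposes as `coef_` and `intercept_`.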

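## Step 5 – Use the sklearn model to predict the prices for the next two years

```python
# predict the price of the mangoes in 2020 and 2021;
# passing a DataFrame with the same 'Year' column used in fit() avoids a feature-name warning
reg_model.predict(pd.DataFrame({'Year': [2020, 2021]}))
```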
` array([93.33333333, 99.        ]) `

## Step 6 – Use sklearn to get the coefficient and intercept

```python
# the slope (coefficient) and the intercept of the fitted line
print(reg_model.coef_)
print(reg_model.intercept_)
```
` [5.66666667] -11353.333333333334 `
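
For a single feature, these two numbers are the closed-form least-squares solution: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. A short plain-NumPy sketch of that formula reproduces them.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

x_mean, y_mean = years.mean(), prices.mean()  # 2015.0 and 65.0

# slope: co-variation of x and y divided by the variation of x (the 1/n factors cancel)
m = np.sum((years - x_mean) * (prices - y_mean)) / np.sum((years - x_mean) ** 2)
# intercept: the fitted line passes through the point (x_mean, y_mean)
b = y_mean - m * x_mean

print(m)  # 340 / 60 = 5.666...
print(b)  # 65 - 5.666... * 2015 = -11353.333...
```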

## Step 7 – Test the prediction by plugging the coefficient and intercept into the formula

```python
# y = mx + b  <-- m is the slope and b is the intercept.
# Plug the values of the coefficient and intercept from Step 6 into the equation
2020 * 5.66666667 + (-11353.333333333334)
```
` 93.33334006666519 `

The small difference from the 93.33333333 predicted in Step 5 comes from rounding the coefficient to eight decimal places.

## Step 8 – Get the accuracy

```python
# check model accuracy: for a regressor, score() returns R² (the coefficient of determination)
reg_model.score(new_df, Mango_Price)
```
` 0.9880341880341843 `
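
Strictly speaking, `score()` for a regressor returns R² rather than an accuracy percentage. It is defined as R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the prices from their mean. Here is a sketch of the same computation by hand. It gets the slope and intercept from `np.polyfit` so that it runs without the fitted sklearn model.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

# Slope and intercept of the least-squares line
m, b = np.polyfit(years, prices, 1)
predicted = m * years + b

ss_res = np.sum((prices - predicted) ** 2)      # unexplained variation, 70/3
ss_tot = np.sum((prices - prices.mean()) ** 2)  # total variation, 1950
r2 = 1 - ss_res / ss_tot
print(r2)  # about 0.98803
```

An R² close to 1 means the straight line explains almost all of the variation in the historic prices. It says nothing about how well the trend will hold in future years.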

## Step 9 – Visualise the linear regression in a scatter plot

```python
# an 11-year range, 2015 to 2025 (past and future years)
year_df = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]
# reshape that 1D list into a 2D, one-column table with the same
# 'Year' column name the model was trained on
B = pd.DataFrame(np.reshape(year_df, (-1, 1)), columns=['Year'])
# mango price predictions for every year in the range
price = reg_model.predict(B)
```
```python
plt.title('Our historic prices (blue) and our predictions (red)', fontsize=12)
# the predictions based on linear regression in red
plt.scatter(year_df, price, c='r')
# the actual prices in blue
plt.scatter(mango_prices.Year, mango_prices.Mango_Price, c='b')

plt.xlabel('Year')
plt.ylabel('Mango Price')
plt.show()
```
