Supervised Learning

Linear Regression

Learn the fundamental supervised learning algorithm for predicting continuous values.

Linear regression predicts a continuous output by finding the best-fit line through the training data.

Simple (one input feature): y = mx + b

Multiple (many features): y = w₁x₁ + w₂x₂ + ... + b
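
For the single-feature case, the slope m and intercept b have a closed-form least-squares solution, so no iterative training is needed. A minimal sketch with NumPy (the toy data below is illustrative, roughly following y = 2x + 1):

```python
import numpy as np

# Toy data that roughly follows y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Least-squares fit of y = mx + b (degree-1 polynomial)
m, b = np.polyfit(x, y, deg=1)
print(f"m = {m:.2f}, b = {b:.2f}")  # m = 1.96, b = 1.10
```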

How It Works

The model learns by minimizing the Mean Squared Error (MSE) — the average of squared differences between predictions and actual values.

This is done using gradient descent — iteratively adjusting the weights in the direction that reduces the error.
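
The loop above can be sketched by hand for one feature. This is a minimal illustration, not how scikit-learn fits the model internally; the learning rate and iteration count are arbitrary choices for this toy, noiseless dataset:

```python
import numpy as np

np.random.seed(0)
x = np.random.rand(50)
y = 3.0 * x + 1.0  # noiseless target: true slope 3, intercept 1

m, b = 0.0, 0.0  # start with zero weights
lr = 0.1         # learning rate (step size)

for _ in range(2000):
    error = (m * x + b) - y
    # Gradients of MSE = mean(error^2) with respect to m and b
    grad_m = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to reduce the error
    m -= lr * grad_m
    b -= lr * grad_b

print(f"m ≈ {m:.2f}, b ≈ {b:.2f}")  # converges toward m = 3, b = 1
```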

Evaluation Metrics

  • MSE (Mean Squared Error): Average of squared errors
  • RMSE (Root MSE): In original units
  • MAE (Mean Absolute Error): Average of absolute errors
  • R² (R-squared): 1 = perfect fit, 0 = no better than predicting the mean; can be negative
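
All four metrics can be computed directly from predictions, which makes their definitions concrete (the small arrays below are made-up values for illustration):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])  # each prediction is off by 0.5

mse = np.mean((y_true - y_pred) ** 2)    # 0.25
rmse = np.sqrt(mse)                      # 0.5, same units as y
mae = np.mean(np.abs(y_true - y_pred))   # 0.5

# R² = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot                 # 0.95

print(mse, rmse, mae, r2)
```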

Example

python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.flatten() + 5 + np.random.randn(100) * 2

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

print(f"Coefficient: {model.coef_[0]:.2f}")  # ~2.5
print(f"Intercept: {model.intercept_:.2f}")  # ~5.0

# Evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {mse**0.5:.2f}")
print(f"R²: {r2:.3f}")

# Multiple Linear Regression with pandas
df = pd.DataFrame({
    'size': [1500, 2000, 1200, 1800, 2500],
    'bedrooms': [3, 4, 2, 3, 5],
    'age': [10, 5, 20, 8, 2],
    'price': [300000, 450000, 200000, 380000, 550000]
})

X_multi = df[['size', 'bedrooms', 'age']]
y_multi = df['price']

multi_model = LinearRegression()
multi_model.fit(X_multi, y_multi)

# Feature importance (coefficients)
for feat, coef in zip(X_multi.columns, multi_model.coef_):
    print(f"{feat}: {coef:.2f}")
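
A fitted model is ultimately used to score new inputs. Continuing the house-price example above, a sketch of prediction on a hypothetical new listing (the feature values here are made up):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'size': [1500, 2000, 1200, 1800, 2500],
    'bedrooms': [3, 4, 2, 3, 5],
    'age': [10, 5, 20, 8, 2],
    'price': [300000, 450000, 200000, 380000, 550000]
})
model = LinearRegression().fit(df[['size', 'bedrooms', 'age']], df['price'])

# Hypothetical new listing, with columns matching the training features
new_house = pd.DataFrame({'size': [1600], 'bedrooms': [3], 'age': [7]})
predicted = model.predict(new_house)[0]
print(f"Predicted price: ${predicted:,.0f}")
```

Passing a DataFrame with the same column names as the training data avoids scikit-learn's feature-name mismatch warnings.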