Linear Regression
Learn the fundamental supervised learning algorithm for predicting continuous values.
Linear regression predicts a continuous output by finding the best-fit line through the training data.
- Simple: one input feature, y = mx + b
- Multiple: many features, y = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
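As a rough illustration of the two forms above, the sketch below writes each equation as a plain Python function. The function names and the parameter values are made up for this example and are not fitted to any data.

```python
def predict_simple(x, m, b):
    """Simple linear regression: one feature, slope m, intercept b."""
    return m * x + b


def predict_multiple(xs, ws, b):
    """Multiple linear regression: weighted sum of features plus intercept b."""
    return sum(w * x for w, x in zip(ws, xs)) + b


# Illustrative (made-up) parameters, not learned from data
print(predict_simple(4.0, m=2.5, b=5.0))                    # 15.0
print(predict_multiple([2.0, 3.0], ws=[1.5, -0.5], b=1.0))  # 2.5
```

Training a model amounts to choosing m and b (or the weights w₁...wₙ and b) so that these predictions match the training data as closely as possible.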
How It Works
The model learns by minimizing the Mean Squared Error (MSE) — the average of squared differences between predictions and actual values.
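One way to do this is gradient descent: iteratively adjusting the weights in the direction that reduces the error. Linear regression also has a closed-form solution (ordinary least squares), which is what scikit-learn's LinearRegression uses.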
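To make the update rule concrete, here is a minimal sketch of gradient descent on MSE for the one-feature case, written in plain Python. The synthetic data, the learning rate of 0.01, and the 5000 iterations are assumptions picked for this illustration, not recommended settings.

```python
import random

random.seed(0)

# Synthetic data from y = 2.5x + 5 plus Gaussian noise (assumed values)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2.5 * x + 5 + random.gauss(0, 2) for x in xs]

m, b = 0.0, 0.0        # initial slope and intercept
learning_rate = 0.01   # assumed step size; too large a value makes the updates diverge
n = len(xs)

for _ in range(5000):
    # Prediction error for every training point
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    # Partial derivatives of MSE with respect to m and b
    grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    grad_b = (2 / n) * sum(errors)
    # Step against the gradient to reduce the error
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

mse = sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"m={m:.2f}, b={b:.2f}, MSE={mse:.2f}")
```

With enough iterations the learned m and b should land close to the values used to generate the data (2.5 and 5), up to the effect of the noise.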
Evaluation Metrics
- MSE (Mean Squared Error): Average of squared errors
- RMSE (Root Mean Squared Error): Square root of MSE, in the same units as the target
- MAE (Mean Absolute Error): Average of absolute errors
- R² (R-squared): Proportion of variance explained; 1 = perfect, 0 = no better than predicting the mean, negative = worse than predicting the mean
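The four metrics above can be computed by hand, which may help show how they relate to each other. The sketch below uses a small set of hypothetical actual and predicted values chosen only for illustration.

```python
import math

# Hypothetical actual and predicted values, chosen for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(e ** 2 for e in errors) / n      # average squared error
rmse = math.sqrt(mse)                      # back in the target's units
mae = sum(abs(e) for e in errors) / n      # average absolute error

# R² compares the model's squared error with that of always predicting the mean
mean_y = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R²={r2:.3f}")
# MSE=0.375 RMSE=0.612 MAE=0.500 R²=0.925
```

Because MSE squares each error, the single error of 1.0 contributes more to MSE than the two errors of 0.5 combined, while MAE weights all errors in proportion to their size.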
Example
python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures
# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.flatten() + 5 + np.random.randn(100) * 2
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
print(f"Coefficient: {model.coef_[0]:.2f}") # ~2.5
print(f"Intercept: {model.intercept_:.2f}") # ~5.0
# Evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {mse**0.5:.2f}")
print(f"R²: {r2:.3f}")
# Multiple Linear Regression with pandas
df = pd.DataFrame({
    'size': [1500, 2000, 1200, 1800, 2500],
    'bedrooms': [3, 4, 2, 3, 5],
    'age': [10, 5, 20, 8, 2],
    'price': [300000, 450000, 200000, 380000, 550000]
})
X_multi = df[['size', 'bedrooms', 'age']]
y_multi = df['price']
multi_model = LinearRegression()
multi_model.fit(X_multi, y_multi)
# Coefficients: change in predicted price per unit change in each feature.
# They are not directly comparable across features unless the features are scaled.
for feat, coef in zip(X_multi.columns, multi_model.coef_):
    print(f"{feat}: {coef:.2f}")