Machine Learning Introduction

Understand what machine learning is, how it works, and the different types of learning.

What is Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence in which systems learn patterns from data to make predictions or decisions, rather than following rules explicitly programmed for each task.

> "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." (commonly attributed to Arthur Samuel, 1959)

Types of Machine Learning

Supervised Learning

The model learns from labeled training data. Examples:

Classification (spam/not spam, cat/dog)
Regression (predict house price, stock value)
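The regression case can be sketched in a few lines. This is a minimal illustration on synthetic data, not real house prices: the feature (floor area), the price formula, and the noise level are all assumptions made up for the example.

```python
# Supervised regression sketch: learn a price from a single labeled feature.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: floor area in square metres and a noisy price label.
area = rng.uniform(50, 200, size=200).reshape(-1, 1)
price = 3000 * area[:, 0] + 50_000 + rng.normal(0, 10_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    area, price, test_size=0.2, random_state=0
)

reg = LinearRegression()
reg.fit(X_train, y_train)  # learn from the labeled training examples

r2 = r2_score(y_test, reg.predict(X_test))
print(f"Learned slope: {reg.coef_[0]:.0f} per square metre")
print(f"R^2 on held-out data: {r2:.3f}")
```

The output is a continuous number rather than a class label, which is what separates regression from classification.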

Unsupervised Learning

The model finds patterns in unlabeled data. Examples:

Clustering (customer segments, topic modeling)
Dimensionality reduction (PCA, t-SNE)
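Both ideas can be tried on the Iris measurements with the labels deliberately left out. This is a rough sketch: the choice of three clusters and two components is an assumption for illustration, not something the data dictates.

```python
# Unsupervised sketch: cluster and compress Iris without using its labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # features only; the target column is never read
X_scaled = StandardScaler().fit_transform(X)

# Clustering: group samples by similarity (n_clusters=3 is an assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)

# Dimensionality reduction: project 4 features down to 2.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(f"Cluster sizes: {np.bincount(cluster_ids)}")
print(f"Reduced shape: {X_2d.shape}")  # (150, 2)
print(f"Variance kept: {pca.explained_variance_ratio_.sum():.2f}")
```

The cluster numbers are arbitrary identifiers. Nothing ties cluster 0 to a particular species, because the model never saw the species names.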

Reinforcement Learning

An agent learns by taking actions and receiving rewards. Examples:

Game playing (chess, Go, video games)
Robotics, autonomous driving
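The action-and-reward loop can be illustrated with a multi-armed bandit, a simplified setting with no state to track. The sketch below uses an epsilon-greedy strategy. The reward probabilities, the exploration rate, and the step count are hypothetical values chosen for the example.

```python
# Reinforcement learning sketch: epsilon-greedy agent on a 3-armed bandit.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: each action pays reward 1 with a hidden probability.
true_reward_prob = np.array([0.2, 0.5, 0.8])
n_actions = len(true_reward_prob)

value_estimate = np.zeros(n_actions)           # agent's estimate per action
action_count = np.zeros(n_actions, dtype=int)  # how often each action was tried
epsilon = 0.1                                  # chance of exploring at random

for step in range(5000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))    # explore
    else:
        action = int(np.argmax(value_estimate))  # exploit the best estimate
    reward = float(rng.random() < true_reward_prob[action])
    action_count[action] += 1
    # Incremental mean: nudge the estimate toward the observed reward.
    value_estimate[action] += (reward - value_estimate[action]) / action_count[action]

best_action = int(np.argmax(value_estimate))
print(f"Estimated values: {value_estimate.round(2)}")
print(f"Action the agent prefers: {best_action}")
```

The agent is never told which action is best. It works that out from the rewards alone, which is the key difference from supervised learning.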

The ML Workflow

Define the problem
Collect data
Explore the data (exploratory data analysis, EDA) and preprocess it
Choose and train a model
Evaluate the model
Deploy and monitor

Example

```python
# A classic introductory example: k-nearest neighbors on Iris with scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Load data
iris = load_iris()
X, y = iris.data, iris.target
print(f"Dataset shape: {X.shape}")  # (150, 4)
print(f"Classes: {iris.target_names}")

# 2. Split into train/test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Preprocess (scale features)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on training data only
X_test = scaler.transform(X_test)  # reuse the fitted scaler; never refit on test data

# 4. Train model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 5. Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# 6. Make a prediction for a new sample
# Feature order: sepal length, sepal width, petal length, petal width (cm)
sample = [[5.1, 3.5, 1.4, 0.2]]
sample_scaled = scaler.transform(sample)  # apply the same scaling as in training
prediction = model.predict(sample_scaled)
print(f"Predicted: {iris.target_names[prediction[0]]}")
```
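A single train/test split gives one accuracy figure, and that figure depends on which 30 samples happened to land in the test set. One common way to get a steadier estimate is k-fold cross-validation. The sketch below assumes 5 folds and wraps the scaler and classifier in a pipeline so the scaler is refit on each training fold. It is one reasonable setup, not the only one.

```python
# Evaluation sketch: 5-fold cross-validation with scaling inside a pipeline.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline fits the scaler on each training fold only,
# so no information from the validation fold leaks into preprocessing.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

scores = cross_val_score(pipeline, X, y, cv=5)  # one accuracy per fold
print(f"Fold accuracies: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

The spread across folds gives a rough sense of how much the score would vary with a different split.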
