Supervised Learning
Classification
Learn classification algorithms from logistic regression to random forests.
Classification predicts which category an input belongs to.
Binary: Two classes (yes/no, spam/ham)
Multi-class: Three or more classes (cat/dog/bird)
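The distinction matters mostly for how predictions and probabilities come back. A minimal sketch using scikit-learn's LogisticRegression (which handles both settings) on the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Multi-class: iris has three classes (setosa / versicolor / virginica)
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))        # one class label per sample
print(clf.predict_proba(X[:3]))  # one probability per class, per sample

# Binary: keep only two of the three classes
mask = y < 2
clf_bin = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
print(clf_bin.classes_)          # the two remaining class labels
```

The choice of dataset and model here is illustrative; any classifier with `predict_proba` behaves the same way.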
Common Algorithms
- Logistic Regression: Fast, interpretable, good baseline
- Decision Tree: Interpretable, prone to overfitting
- Random Forest: Ensemble of trees, robust
- SVM: Great for high-dimensional data
- K-Nearest Neighbors: Simple, no training phase
- Gradient Boosting (XGBoost, LightGBM): Often best for tabular data
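Every algorithm in the list above is available in scikit-learn with a near-identical interface, so they are easy to compare head to head. A minimal sketch (default hyperparameters and a synthetic dataset, both chosen for illustration):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    RandomForestClassifier(n_estimators=100),
    SVC(),
    KNeighborsClassifier(),
    GradientBoostingClassifier(),
]

# 5-fold cross-validated accuracy for each algorithm
for m in models:
    score = cross_val_score(m, X, y, cv=5).mean()
    print(f"{type(m).__name__}: {score:.3f}")
```

Relative rankings depend heavily on the dataset; treat this as a template for comparison, not a verdict on the algorithms.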
Evaluation Metrics
- Accuracy: % correct predictions (misleading for imbalanced classes)
- Precision: Of predicted positives, how many are actually positive
- Recall: Of actual positives, how many did we catch
- F1 Score: Harmonic mean of precision and recall
- ROC-AUC: Model's ability to discriminate between classes
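To make these definitions concrete, here is a small hand-computed example (the labels are made up for illustration) that derives precision, recall, and F1 directly from the confusion matrix and checks them against scikit-learn:

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of predicted positives, fraction actually positive
recall = tp / (tp + fn)     # of actual positives, fraction we caught
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)  # → 0.75 0.75 0.75
assert precision == precision_score(y_true, y_pred)
```

With 3 true positives, 1 false positive, and 1 false negative, precision and recall both come out to 3/4, so F1 (their harmonic mean) is also 0.75.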
Example
python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix, roc_auc_score
)
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Compare algorithms
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    print(f"\n{name}:")
    print(f"  Accuracy:  {accuracy_score(y_test, y_pred):.3f}")
    print(f"  Precision: {precision_score(y_test, y_pred):.3f}")
    print(f"  Recall:    {recall_score(y_test, y_pred):.3f}")
    print(f"  F1:        {f1_score(y_test, y_pred):.3f}")
    print(f"  ROC-AUC:   {roc_auc_score(y_test, y_prob):.3f}")

# Feature importance from Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
importances = rf.feature_importances_
sorted_idx = importances.argsort()[::-1]  # indices sorted by descending importance
for i in range(5):
    print(f"Feature {sorted_idx[i]}: {importances[sorted_idx[i]]:.3f}")