Phase 2 • EduArtha

Programming & Software Engineering

Python is the language of AI. You must also understand how to write efficient, scalable code. This book covers Python mastery, scientific computing, software engineering practices, and hardware fundamentals.

⏱ 2–4 months | 13 Chapters | 50+ Exercises

Part I

Python Mastery

Core language skills every AI engineer needs

Chapter 1

Data Structures & Algorithms

Learning Objectives

Choose the right data structure for each problem (lists, dicts, sets, tuples)
Implement stacks, queues, and linked lists
Understand Big-O notation and analyze algorithm complexity
Implement binary search, merge sort, and quicksort

Built-in Data Structures

Structure	Ordered	Mutable	Duplicates	Membership test	Best For
List	✓	✓	✓	O(n)	Ordered collections
Tuple	✓	✗	✓	O(n)	Immutable records
Set	✗	✓	✗	O(1) average	Membership testing
Dict	✓*	✓	Keys: ✗	O(1) average (keys)	Key-value mapping
* Dicts preserve insertion order in Python 3.7 and later.

Python
# Performance comparison: why choosing the right structure matters
import time

data_list = list(range(1_000_000))
data_set = set(data_list)

# Searching for an element
target = 999_999

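start = time.perf_counter()
_ = target in data_list   # O(n): target is the last element, so the whole list is scanned
print(f"List: {time.perf_counter()-start:.6f}s")

start = time.perf_counter()
_ = target in data_set    # O(1) on average: a single hash lookup
print(f"Set:  {time.perf_counter()-start:.6f}s")
# The set lookup is typically orders of magnitude faster here; the exact ratio depends on the machine.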

Big-O Notation

Notation	Name	Example	1M items
O(1)	Constant	Dict lookup	1 op
O(log n)	Logarithmic	Binary search	20 ops
O(n)	Linear	List scan	1M ops
O(n log n)	Linearithmic	Merge sort	20M ops
O(n²)	Quadratic	Bubble sort	1T ops

Searching & Sorting

Python
# Binary Search — O(log n)
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

# Quicksort: O(n log n) on average, O(n²) in the worst case (this version also uses O(n) extra space)
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([38, 27, 43, 3, 9, 82, 10]))

# Stack implementation
class Stack:
    def __init__(self): self.items = []
    def push(self, item): self.items.append(item)
    def pop(self): return self.items.pop()
    def peek(self): return self.items[-1]
    def is_empty(self): return len(self.items) == 0
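
The learning objectives also list queues and linked lists, which the listing above does not show. The following is a minimal sketch under stated assumptions: the class names `Queue`, `Node`, and `LinkedList` and their method names are illustrative choices, not taken from the original text. The queue wraps `collections.deque` because removing the first element of a plain list is O(n).

```python
from collections import deque

class Queue:
    """FIFO queue backed by collections.deque (O(1) at both ends)."""
    def __init__(self):
        self.items = deque()
    def enqueue(self, item):
        self.items.append(item)
    def dequeue(self):
        return self.items.popleft()  # raises IndexError if the queue is empty
    def is_empty(self):
        return not self.items

class Node:
    """One element of a singly linked list (illustrative name)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedList:
    """Singly linked list with O(1) insertion at the head."""
    def __init__(self):
        self.head = None
    def push_front(self, value):
        self.head = Node(value, self.head)
    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

q = Queue()
for job in ("a", "b", "c"):
    q.enqueue(job)
first = q.dequeue()          # first in, first out

ll = LinkedList()
for value in (3, 2, 1):
    ll.push_front(value)     # each push becomes the new head
values = list(ll)
print(first, values)
```

A linked list trades O(1) insertion at the head for O(n) access by position, which is the opposite of a Python list.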

Project: Task Scheduler with Priority Queue

Python
import heapq

class TaskScheduler:
    def __init__(self):
        self.heap = []
        self.counter = 0

    def add_task(self, task, priority):
        heapq.heappush(self.heap, (priority, self.counter, task))
        self.counter += 1

    def get_next(self):
        if self.heap:
            priority, _, task = heapq.heappop(self.heap)
            return task
        return None

scheduler = TaskScheduler()
scheduler.add_task("Fix critical bug", 1)
scheduler.add_task("Write docs", 5)
scheduler.add_task("Deploy to prod", 2)
scheduler.add_task("Code review", 3)

while (task := scheduler.get_next()):
    print(f"Executing: {task}")

Exercises

Exercise 1.1: Implement merge sort and explain its time complexity

def merge_sort(arr):
    if len(arr) <= 1: return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:]); result.extend(right[j:])
    return result

Time: O(n log n) always. Space: O(n). Divides array in half each time (log n levels), merges n elements at each level.
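
The "log n levels, n elements per level" argument can be checked empirically. The sketch below is an instrumented variant written for this illustration (the name `merge_sort_counting` is hypothetical): it counts how many elements are written during merges and compares the total with n·log₂(n) for an input whose length is a power of two.

```python
import math

def merge_sort_counting(arr):
    """Return (sorted_list, number_of_elements_written_during_merges)."""
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    left, left_moves = merge_sort_counting(arr[:mid])
    right, right_moves = merge_sort_counting(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    # Every merge writes len(merged) elements, i.e. n per recursion level in total
    return merged, left_moves + right_moves + len(merged)

n = 1024                                  # power of two, so log2(n) is exactly 10
data = list(range(n, 0, -1))              # reverse-sorted input
sorted_data, moves = merge_sort_counting(data)
print(moves, n * int(math.log2(n)))       # both values are 10240
```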

Exercise 1.2: When would you use a dict over a list?

Use dict when you need fast O(1) key-based lookup, counting occurrences, or mapping relationships. Use list when you need ordered elements, indexed access, or iteration in sequence. Example: counting word frequencies → dict. Storing sorted scores → list.
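
Both examples from the answer above can be sketched in a few lines. The sample sentence and scores are made up for illustration; `collections.Counter` is a dict subclass from the standard library.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
word_counts = Counter(text.split())        # dict subclass: word -> count
top_word, top_count = word_counts.most_common(1)[0]

scores = sorted([72, 95, 88, 61])          # list: order and position matter
highest = scores[-1]

print(top_word, top_count, highest)
```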

Exercise 1.3: Implement a queue using two stacks

class QueueFromStacks:
    def __init__(self):
        self.in_stack = []
        self.out_stack = []
    def enqueue(self, item):
        self.in_stack.append(item)
    def dequeue(self):
        if not self.out_stack:
            while self.in_stack:
                self.out_stack.append(self.in_stack.pop())
        return self.out_stack.pop()

Exercise 1.4: What is the time complexity of checking if an element exists in a list vs a set?

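List: O(n), because it must scan linearly. Set: O(1) on average, because it uses a hash table (the worst case, with many hash collisions, is O(n)). For 1M elements, a list takes up to ~1M comparisons while a set takes roughly one. Prefer a set for repeated membership tests on hashable items; for a single test, building the set costs O(n) itself.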

Chapter Summary

Choose data structures by access pattern: O(1) lookup → dict/set, ordered → list
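Binary search (O(log n)) requires sorted data; merge sort is O(n log n) in every case, quicksort is O(n log n) on average and O(n²) in the worst case
Big-O gives an upper bound on how cost grows with input size (most often quoted for the worst case), which is crucial for scalable code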
Stacks (LIFO) and queues (FIFO) solve specific ordering problems

Chapter 2

Object-Oriented Programming

Learning Objectives

Design classes with encapsulation, inheritance, and polymorphism
Implement magic/dunder methods for Pythonic objects
Apply abstract classes and design patterns
Build a complete OOP project

Classes & Objects

Python
class NeuralLayer:
    def __init__(self, input_size, output_size, activation='relu'):
        self.weights = [[0.0] * input_size for _ in range(output_size)]
        self.bias = [0.0] * output_size
        self.activation = activation
        self._name = f"Layer({input_size}→{output_size})"  # leading underscore: internal by convention, not enforced

    def __repr__(self):
        return f"NeuralLayer({self._name}, act={self.activation})"

    def __len__(self):
        return len(self.weights)

    def param_count(self):
        return len(self.weights) * len(self.weights[0]) + len(self.bias)

layer = NeuralLayer(128, 64)
print(layer)              # NeuralLayer(Layer(128→64), act=relu)
print(len(layer))         # 64
print(layer.param_count()) # 8256

Inheritance & Polymorphism

Python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self): pass

    @abstractmethod
    def perimeter(self): pass

    def describe(self):
        return f"{self.__class__.__name__}: area={self.area():.2f}"

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self): return math.pi * self.radius ** 2
    def perimeter(self): return 2 * math.pi * self.radius

class Rectangle(Shape):
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self): return self.w * self.h
    def perimeter(self): return 2 * (self.w + self.h)

# Polymorphism — same interface, different behavior
shapes = [Circle(5), Rectangle(4, 6), Circle(3)]
for s in shapes:
    print(s.describe())

Design Patterns

Python
# Singleton — only one instance ever
class DatabaseConnection:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

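# Factory: create objects without naming the exact class at the call site
# LinearModel and TreeModel are placeholder classes so the example runs
class LinearModel: pass
class TreeModel: pass

class ModelFactory:
    @staticmethod
    def create(model_type):
        models = {'linear': LinearModel, 'tree': TreeModel}
        if model_type not in models:
            raise ValueError(f"Unknown model type: {model_type!r}")
        return models[model_type]()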

Project: Library Management System

Python
from datetime import datetime, timedelta

class Book:
    def __init__(self, title, author, isbn):
        self.title, self.author, self.isbn = title, author, isbn
        self.is_available = True
    def __str__(self): return f"'{self.title}' by {self.author}"

class Member:
    def __init__(self, name, member_id):
        self.name, self.id = name, member_id
        self.borrowed = []

class Library:
    def __init__(self, name):
        self.name = name
        self.books, self.members, self.loans = {}, {}, []

    def add_book(self, book):
        self.books[book.isbn] = book

    def borrow(self, isbn, member_id):
        book = self.books.get(isbn)
        member = self.members.get(member_id)
        if book and member and book.is_available:
            book.is_available = False
            due = datetime.now() + timedelta(days=14)
            self.loans.append({'book': book, 'member': member, 'due': due})
            member.borrowed.append(book)
            print(f"✓ {member.name} borrowed {book}. Due: {due:%Y-%m-%d}")

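    def return_book(self, isbn):
        book = self.books.get(isbn)
        loan = next((entry for entry in self.loans if entry['book'] is book), None)
        if book and loan:
            book.is_available = True
            loan['member'].borrowed.remove(book)  # update the member's record
            self.loans.remove(loan)               # close the loan
            print(f"✓ {book} returned")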

lib = Library("City Library")
lib.add_book(Book("Deep Learning", "Goodfellow", "978-0"))
lib.members["M1"] = Member("Alice", "M1")
lib.borrow("978-0", "M1")

Exercises

Exercise 2.1: Implement __add__ and __eq__ for a Vector2D class

class Vector2D:
    def __init__(self, x, y): self.x, self.y = x, y
    def __add__(self, other): return Vector2D(self.x+other.x, self.y+other.y)
    def __eq__(self, other): return isinstance(other, Vector2D) and (self.x, self.y) == (other.x, other.y)
    def __repr__(self): return f"Vec({self.x},{self.y})"

v = Vector2D(1,2) + Vector2D(3,4)  # Vec(4,6)

Exercise 2.2: What is the difference between @staticmethod and @classmethod?

@staticmethod: Receives neither the instance nor the class automatically. It is a plain function namespaced inside the class, defined without self or cls, e.g. def method(x):

@classmethod: Receives the class as first arg (cls). Can access/modify class state. Used for alternative constructors like Date.from_string("2024-01-15").
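
A minimal sketch of both decorators side by side. The `Date` class here is hypothetical, written only to illustrate the `Date.from_string` idea mentioned above; `is_leap_year` is an added example of a helper that needs neither the class nor an instance.

```python
class Date:
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    @classmethod
    def from_string(cls, text):
        # cls is the class itself, so subclasses get instances of their own type
        year, month, day = map(int, text.split("-"))
        return cls(year, month, day)

    @staticmethod
    def is_leap_year(year):
        # No self or cls: just a function that lives in the Date namespace
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    def __repr__(self):
        return f"Date({self.year}, {self.month}, {self.day})"

d = Date.from_string("2024-01-15")
print(d, Date.is_leap_year(d.year))
```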

Exercise 2.3: Why is composition often preferred over inheritance?

"Favor composition over inheritance" — instead of Car extends Engine (a car IS an engine? No!), use Car has-a Engine. Composition is more flexible: you can swap components at runtime, avoid deep inheritance hierarchies, and follow the Single Responsibility Principle.

Chapter Summary

Classes encapsulate data + behavior; __init__ initializes state
Inheritance enables code reuse; ABC enforces interfaces
Dunder methods (__str__, __add__, __len__) make objects behave like built-ins
Design patterns (Singleton, Factory) solve recurring OOP problems

Chapter 3

Functional Programming Concepts

Learning Objectives

Use higher-order functions: map, filter, reduce, and lambdas
Build closures and understand their practical applications
Leverage functools for memoization and partial application
Write clean data processing pipelines in functional style

First-Class Functions & Lambdas

Python
# Functions are objects — can be passed, returned, stored
def apply_twice(func, value):
    return func(func(value))

print(apply_twice(lambda x: x * 2, 3))  # 12
print(apply_twice(lambda x: x + 10, 5)) # 25

# map, filter, reduce
nums = [1, 2, 3, 4, 5, 6, 7, 8]
squares = list(map(lambda x: x**2, nums))         # [1,4,9,16,25,36,49,64]
evens = list(filter(lambda x: x%2==0, nums))     # [2,4,6,8]

from functools import reduce
total = reduce(lambda a, b: a + b, nums)           # 36

# List comprehension alternative (more Pythonic)
squares = [x**2 for x in nums]
evens = [x for x in nums if x % 2 == 0]

Closures & functools

Python
# Closure — function that remembers its enclosing scope
def make_multiplier(factor):
    def multiply(x):
        return x * factor  # 'factor' is captured from outer scope
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5))  # 10
print(triple(5))  # 15

# Memoization with lru_cache
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2: return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(100))  # Instant; without the cache the naive recursion would not finish in any practical time

# Partial application
from functools import partial
import json

pretty_json = partial(json.dumps, indent=2, sort_keys=True)
print(pretty_json({"name": "AI", "type": "ML"}))

Project: Functional Data Pipeline

Python
from functools import reduce

# Pipeline: compose functions left-to-right
def pipeline(*funcs):
    return lambda x: reduce(lambda v, f: f(v), funcs, x)

# Data processing steps
clean = lambda data: [s.strip().lower() for s in data]
remove_empty = lambda data: [s for s in data if s]
remove_dupes = lambda data: list(dict.fromkeys(data))
sort_alpha = lambda data: sorted(data)

process = pipeline(clean, remove_empty, remove_dupes, sort_alpha)

raw = ["  Python ", "  java", "", "PYTHON", "  Go  ", "java"]
result = process(raw)
print(result)  # ['go', 'java', 'python']

Exercises

Exercise 3.1: Rewrite this list comprehension using map and filter: [x**2 for x in range(20) if x % 3 == 0]

result = list(map(lambda x: x**2, filter(lambda x: x%3==0, range(20))))
# [0, 9, 36, 81, 144, 225, 324]
# List comprehension is more Pythonic, but map/filter is more composable

Exercise 3.2: Implement a memoized factorial using lru_cache

from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    if n <= 1: return 1
    return n * factorial(n - 1)
print(factorial(100))  # The first call fills the cache, so later calls such as factorial(99) return immediately

Exercise 3.3: What is the advantage of closures over global variables?

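Closures encapsulate state without polluting the global scope. Each call to the enclosing function creates a fresh set of captured variables, so separate closures do not interfere with one another. This keeps dependencies explicit, which makes the code easier to test and compose. Note that a closure captures variables by reference, not by copy, and mutating captured state from several threads still needs a lock. Global variables, by contrast, create hidden coupling and make debugging harder.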
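
A short sketch of how closure state behaves in practice. The names `make_counter`, `late`, and `early` are illustrative. The first part shows that two closures created by separate calls keep independent state; the second shows that variables are captured by reference, which matters when closures are created in a loop.

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count        # rebind the captured variable instead of creating a local
        count += 1
        return count
    return increment

a = make_counter()
b = make_counter()
a(); a()
a_value = a()   # third call on a
b_value = b()   # first call on b, which has its own count

# Capture is by reference: all three lambdas see the final value of i
late = [f() for f in [lambda: i for i in range(3)]]
# Binding i as a default argument freezes the value at definition time
early = [f() for f in [lambda i=i: i for i in range(3)]]

print(a_value, b_value, late, early)
```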

Chapter Summary

Functions are first-class objects — pass, return, and store them
map/filter/reduce enable declarative data transformation
Closures capture enclosing scope — great for factories and callbacks
lru_cache adds memoization in one line for functions whose arguments are hashable and whose results depend only on those arguments

Chapter 4

Memory Management & Generators

Learning Objectives

Understand Python's memory model: reference counting and garbage collection
Build generators with yield for memory-efficient data processing
Use itertools for powerful lazy iteration patterns
Optimize memory with __slots__ and generator expressions

Generators: Lazy Evaluation

Python
import sys

# List vs Generator — memory comparison
big_list = [x**2 for x in range(1_000_000)]
big_gen = (x**2 for x in range(1_000_000))

print(f"List:      {sys.getsizeof(big_list):>10,} bytes")  # ~8 MB for the pointer array alone; the int objects are extra
print(f"Generator: {sys.getsizeof(big_gen):>10,} bytes")   # ~200 bytes!

# Custom generator with yield
def read_large_file(filepath):
    with open(filepath) as f:
        for line in f:
            yield line.strip()  # One line at a time, not whole file

# itertools — lazy iteration toolkit
from itertools import chain, islice, count, cycle

# Chain multiple iterables
combined = chain([1,2], [3,4], [5,6])  # 1,2,3,4,5,6

# Take first N from infinite generator
first_10_evens = list(islice((x for x in count() if x%2==0), 10))

# __slots__ — reduce memory per instance
class PointSlots:
    __slots__ = ['x', 'y']
    def __init__(self, x, y): self.x, self.y = x, y
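# Typically uses noticeably less memory per instance than a regular class with __dict__
# (about 40% is a commonly quoted figure; it varies by Python version)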
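
The learning objectives mention reference counting and garbage collection, which the listing above does not demonstrate. The following sketch is CPython-specific and written for illustration only; the `Node` class is hypothetical. It shows a reference count rising when a second name is bound, and the cycle collector reclaiming two objects that reference each other.

```python
import gc
import sys
import weakref

data = []
before = sys.getrefcount(data)
alias = data                      # a second name bound to the same list
after = sys.getrefcount(data)     # one higher than before

class Node:
    """Hypothetical class used only to build a reference cycle."""
    pass

a, b = Node(), Node()
a.other, b.other = b, a           # a and b now reference each other
probe = weakref.ref(a)            # observes a without keeping it alive
del a, b                          # the cycle keeps both refcounts above zero
gc.collect()                      # the cycle detector reclaims both objects
cycle_collected = probe() is None

print(after - before, cycle_collected)
```

Reference counting frees most objects the moment their last reference disappears; the `gc` module exists to handle the cycles that counting alone cannot.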

Project: Process 1M-Row CSV with Generators

Python
import csv

def read_csv_rows(filepath):
    with open(filepath, newline='') as f:  # newline='' is what the csv module expects
        reader = csv.DictReader(f)
        for row in reader:
            yield row

def filter_active(rows):
    for row in rows:
        if row.get('status') == 'active':
            yield row

def extract_emails(rows):
    for row in rows:
        yield row['email']

# Pipeline — processes 1M rows using constant memory!
# rows = read_csv_rows('users_1M.csv')
# active = filter_active(rows)
# emails = extract_emails(active)
# for email in emails:
#     send_newsletter(email)

Exercises

Exercise 4.1: Write a generator that yields Fibonacci numbers infinitely

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

from itertools import islice
print(list(islice(fibonacci(), 10)))  # [0,1,1,2,3,5,8,13,21,34]

Exercise 4.2: When would you use __slots__?

Use __slots__ when creating millions of instances of the same class (e.g., nodes in a graph, particles in a simulation). It eliminates the per-instance __dict__, which can save roughly 40% of memory per instance, though the exact figure depends on the Python version and the number of attributes. Don't use it for classes with few instances or when you need dynamic attributes.
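
Because the saving depends on the interpreter, it is worth measuring rather than assuming. A minimal sketch using `tracemalloc` from the standard library; the class names `PointDict` and `PointSlots` and the helper `allocated_bytes` are illustrative, and the printed numbers will differ between Python versions.

```python
import tracemalloc

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

def allocated_bytes(cls, n=50_000):
    """Bytes still allocated after building n instances of cls."""
    tracemalloc.start()
    points = [cls(i, i) for i in range(n)]
    current, _peak = tracemalloc.get_traced_memory()  # measured while points is alive
    tracemalloc.stop()
    return current

dict_bytes = allocated_bytes(PointDict)
slots_bytes = allocated_bytes(PointSlots)
print(f"with __dict__:  {dict_bytes:,} bytes")
print(f"with __slots__: {slots_bytes:,} bytes")
print(f"saving: {1 - slots_bytes / dict_bytes:.0%}")
```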

Exercise 4.3: What is the difference between yield and return?

return exits the function permanently and returns a value. yield pauses the function, returns a value, and remembers its state — next call resumes from where it paused. yield turns a function into a generator that produces values lazily, one at a time.
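
The difference is easiest to see with two functions that share the same loop. Both function names are made up for this sketch.

```python
def first_square(numbers):
    for n in numbers:
        return n * n          # exits on the first item; the rest is never visited

def all_squares(numbers):
    for n in numbers:
        yield n * n           # pauses here and resumes on the next request

only = first_square([1, 2, 3])

gen = all_squares([1, 2, 3])
a = next(gen)       # runs until the first yield
b = next(gen)       # resumes after that yield and runs to the second one
rest = list(gen)    # drains whatever is left

print(only, a, b, rest)
```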

Chapter Summary

Generators use yield for lazy evaluation — constant memory regardless of data size
Generator expressions (x for x in ...) are memory-efficient alternatives to list comprehensions
itertools provides powerful lazy tools: chain, islice, product, combinations
__slots__ reduces memory for classes with many instances

Chapter 5

Decorators & Context Managers

Learning Objectives

Build decorators from scratch and understand the decorator pattern
Use @property, @staticmethod, @classmethod effectively
Create context managers with __enter__/__exit__ and contextlib
Build a reusable timing and logging framework

Building Decorators

Python
import time
from functools import wraps

# Timer decorator
def timer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"⏱ {func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

# Retry decorator with arguments
def retry(max_attempts=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {attempt} failed: {e}")
                    if attempt == max_attempts:
                        # Chain the last error so the original cause stays visible
                        raise RuntimeError(f"Failed after {max_attempts} attempts") from e
                    time.sleep(delay)  # wait only if another attempt follows
        return wrapper
    return decorator

@timer
@retry(max_attempts=3)
def fetch_data(url):
    print(f"Fetching {url}...")
    return {"status": "ok"}

Context Managers

Python
# Class-based context manager
class DatabaseTransaction:
    def __enter__(self):
        print("BEGIN TRANSACTION")
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            print(f"ROLLBACK — Error: {exc_val}")
        else:
            print("COMMIT")
        return False

# contextlib — simpler decorator-based approach
from contextlib import contextmanager

@contextmanager
def temp_directory():
    import tempfile, shutil
    dirpath = tempfile.mkdtemp()
    try:
        yield dirpath
    finally:
        shutil.rmtree(dirpath)

with temp_directory() as tmpdir:
    print(f"Working in {tmpdir}")
# Directory is automatically cleaned up

Project: Timing & Logging Framework

Python
import time, logging
from functools import wraps

logging.basicConfig(level=logging.INFO)

def log_and_time(logger=None):
    def decorator(func):
        log = logger or logging.getLogger(func.__module__)
        @wraps(func)
        def wrapper(*args, **kwargs):
            log.info(f"▶ {func.__name__} started")
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                elapsed = time.perf_counter() - start
                log.info(f"✓ {func.__name__} completed in {elapsed:.3f}s")
                return result
            except Exception as e:
                elapsed = time.perf_counter() - start
                log.error(f"✗ {func.__name__} failed after {elapsed:.3f}s: {e}")
                raise
        return wrapper
    return decorator

@log_and_time()
def train_model(epochs):
    time.sleep(0.5)
    return {"accuracy": 0.95}

train_model(10)

Exercises

Exercise 5.1: Write a decorator that caches results in a dictionary (manual memoize)

def memoize(func):
    cache = {}
    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    wrapper.cache = cache
    return wrapper

Exercise 5.2: What does functools.wraps do and why is it important?

Without @wraps, the decorated function loses its original __name__, __doc__, and __module__. help(func) would show "wrapper" instead of the real function name. @wraps copies these attributes from the original function to the wrapper, preserving introspection and debugging ability.
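
The effect is easy to observe by decorating two functions, once without and once with `@wraps`. The decorator and function names below are illustrative.

```python
from functools import wraps

def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def with_wraps(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@without_wraps
def load_a():
    """Load dataset A."""

@with_wraps
def load_b():
    """Load dataset B."""

print(load_a.__name__, load_a.__doc__)   # wrapper None
print(load_b.__name__, load_b.__doc__)   # load_b Load dataset B.
```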

Exercise 5.3: Build a context manager that suppresses specific exceptions

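from contextlib import contextmanager

# Practice re-implementation; the standard library already provides contextlib.suppress
@contextmanager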
def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

with suppress(FileNotFoundError):
    open('nonexistent.txt')  # Silently ignored

Chapter Summary

Decorators wrap functions to add behavior: timing, logging, caching, retrying
Always use @functools.wraps to preserve function metadata
Context managers (with statement) ensure cleanup: files, connections, locks
contextlib.contextmanager simplifies context manager creation with yield

Part II

Scientific Computing

The data science toolkit

Chapter 6

NumPy — Array Operations

Learning Objectives

Create and manipulate NumPy arrays efficiently
Master broadcasting, vectorization, and boolean masking
Perform linear algebra operations for ML
Benchmark NumPy vs pure Python performance

Array Creation & Indexing

Python
import numpy as np

# Creation
a = np.array([1,2,3,4,5])
zeros = np.zeros((3,4))
rand = np.random.randn(3,3)
grid = np.arange(0, 100, 5)
space = np.linspace(0, 1, 50)

# Boolean masking — powerful filtering
data = np.random.randn(1000)
positives = data[data > 0]        # All positive values
outliers = data[np.abs(data) > 2]  # Beyond 2 std devs

# Vectorized operations: typically far faster than Python loops
import time
size = 1_000_000
a = np.random.randn(size)

start = time.perf_counter()
result_loop = [x**2 + 2*x + 1 for x in a]
print(f"Loop:  {time.perf_counter()-start:.3f}s")

start = time.perf_counter()
result_np = a**2 + 2*a + 1
print(f"NumPy: {time.perf_counter()-start:.3f}s")  # often ~50-100x faster; varies by machine

Broadcasting & Linear Algebra

Python
# Broadcasting: (3,3) + (3,1) → auto-expands
matrix = np.ones((3,3))
col_vec = np.array([[10],[20],[30]])
result = matrix + col_vec  # Each row gets different offset

# Linear algebra for ML
X = np.random.randn(100, 3)
w = np.random.randn(3)
y = X @ w  # Matrix-vector multiplication (predictions)

# Normal equation: w = (XᵀX)⁻¹Xᵀy
X_b = np.c_[np.ones((100,1)), X]  # Add bias column
# Solve (XᵀX) w = Xᵀy directly; this is more stable than forming the inverse explicitly
w_optimal = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

Project: Linear Regression from Scratch with NumPy

Python
import numpy as np

# Generate data: y = 3x₁ + 5x₂ + 7 + noise
np.random.seed(42)
X = np.random.randn(200, 2)
y = 3*X[:,0] + 5*X[:,1] + 7 + np.random.randn(200)*0.5

# Add bias, split data
X_b = np.c_[np.ones((200,1)), X]
X_train, X_test = X_b[:160], X_b[160:]
y_train, y_test = y[:160], y[160:]

# Gradient descent
w = np.zeros(3)
lr, epochs = 0.01, 1000
for i in range(epochs):
    preds = X_train @ w
    error = preds - y_train
    gradient = (2/len(y_train)) * X_train.T @ error
    w -= lr * gradient

print(f"Learned weights: bias={w[0]:.2f}, w1={w[1]:.2f}, w2={w[2]:.2f}")
# Expected: ~7, ~3, ~5
rmse = np.sqrt(np.mean((X_test @ w - y_test)**2))
print(f"Test RMSE: {rmse:.4f}")

Exercises

Exercise 6.1: Normalize a matrix so each column has mean=0 and std=1

X = np.random.randn(100, 5) * 10 + 50
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0).round(10))  # ~[0, 0, 0, 0, 0]
print(X_norm.std(axis=0).round(10))   # ~[1, 1, 1, 1, 1]

Exercise 6.2: Why is np.dot(a,b) faster than sum(a[i]*b[i] for i in range(n))?

NumPy uses compiled C/Fortran code (BLAS libraries) that operates on contiguous memory blocks with CPU SIMD instructions. Python loops have interpreter overhead per iteration, dynamic type checking, and poor cache utilization. For 1M elements, NumPy is often around 100x faster or more, depending on the hardware and the BLAS build.

Exercise 6.3: Explain broadcasting rules with an example of incompatible shapes

Rules: dimensions are compared right-to-left. Each must be equal OR one must be 1. (3,4) + (4,) works: (3,4)+(1,4)→(3,4). (3,4) + (3,) FAILS: 4≠3 and neither is 1. Fix: reshape to (3,1) to broadcast across columns.
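
The three cases described above can be reproduced directly. A minimal sketch; the variable names are illustrative.

```python
import numpy as np

a = np.ones((3, 4))
row = np.arange(4)        # shape (4,)
col = np.arange(3)        # shape (3,)

ok = a + row              # (3,4) + (4,) -> (3,4)

try:
    a + col               # (3,4) + (3,): trailing dimensions 4 and 3 do not match
    failed = False
except ValueError:
    failed = True

fixed = a + col.reshape(3, 1)   # (3,4) + (3,1) -> (3,4), one offset per row
print(ok.shape, failed, fixed.shape)
```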

Chapter Summary

NumPy arrays are often around 100x faster than Python lists for element-wise numerical operations on large data
Broadcasting auto-expands dimensions for element-wise operations
Boolean masking enables powerful data filtering without loops
Linear algebra functions (dot, inv, eig) are essential for ML implementations

Chapter 7

Pandas — Data Wrangling

Learning Objectives

Create, explore, and manipulate DataFrames
Handle missing data, merge datasets, and use groupby
Apply transformations with apply(), map(), and lambda
Perform a complete data analysis project

Python
import pandas as pd
import numpy as np

# Create from dict
df = pd.DataFrame({
    'name': ['Alice','Bob','Charlie','Diana','Eve'],
    'dept': ['ML','Web','ML','Data','Web'],
    'salary': [95000,82000,105000,78000,88000],
    'exp_years': [5, 3, 8, 2, 4]
})

# Selecting & filtering
ml_team = df[df['dept'] == 'ML']
senior = df.query('exp_years >= 5 and salary > 90000')

# GroupBy — split-apply-combine
dept_stats = df.groupby('dept').agg(
    avg_salary=('salary', 'mean'),
    headcount=('name', 'count'),
    max_exp=('exp_years', 'max')
)

# Missing data handling
df.loc[1, 'salary'] = np.nan
df['salary'] = df['salary'].fillna(df['salary'].median())

# Apply custom function
df['tax_bracket'] = df['salary'].apply(
    lambda s: 'High' if s > 90000 else 'Standard'
)

Project: Titanic Survival Analysis

Python
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

# Exploration
print(df.shape)                  # (891, 12)
print(df['Survived'].value_counts())
print(df.isnull().sum())          # Age: 177, Cabin: 687 missing

# Survival rate by class and gender
survival = df.groupby(['Pclass', 'Sex'])['Survived'].mean()
print(survival.round(2))
# 1st class females: 97% survived, 3rd class males: 14%

# Feature engineering
df['Age'] = df['Age'].fillna(df['Age'].median())
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
df['IsAlone'] = (df['FamilySize'] == 1).astype(int)

Exercises

Exercise 7.1: Merge two DataFrames on a common column

orders = pd.DataFrame({'id':[1,2,3], 'product':['A','B','A']})
prices = pd.DataFrame({'product':['A','B'], 'price':[100,200]})
merged = orders.merge(prices, on='product')
print(merged)

Exercise 7.2: Find the top 3 departments by average salary using groupby

top3 = df.groupby('dept')['salary'].mean().nlargest(3)

Exercise 7.3: What is the difference between loc and iloc?

loc: label-based indexing — df.loc[0:5, 'name':'salary'] includes endpoint. iloc: integer-based indexing — df.iloc[0:5, 0:3] excludes endpoint (like Python slicing). Use loc with column names, iloc with column positions.
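
The endpoint difference is clearest when the index labels are not 0, 1, 2, and so on. A small sketch with a made-up DataFrame whose index labels are 10 to 40:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie", "Diana"],
     "dept": ["ML", "Web", "ML", "Data"],
     "salary": [95000, 82000, 105000, 78000]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10:30, "name":"dept"]   # labels 10..30 and columns name..dept, both ends included
by_position = df.iloc[0:2, 0:2]           # positions 0 and 1 only, end excluded

print(by_label)
print(by_position)
```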

Chapter Summary

Pandas DataFrames are the standard for tabular data manipulation in Python
GroupBy enables split-apply-combine analysis patterns
Handle missing data with dropna/fillna before modeling
merge/join combines datasets; apply transforms columns with custom logic

Chapter 8

Matplotlib & Seaborn — Visualization

Learning Objectives

Create publication-quality plots with Matplotlib
Build statistical visualizations with Seaborn
Design multi-panel dashboards with subplots
Customize colors, labels, annotations, and themes

Python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Multi-panel dashboard
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Line plot
x = np.linspace(0, 10, 100)
axes[0,0].plot(x, np.sin(x), color='#6366f1', linewidth=2, label='sin')
axes[0,0].plot(x, np.cos(x), color='#f59e0b', linewidth=2, label='cos')
axes[0,0].legend(); axes[0,0].set_title('Trigonometric Functions')

# 2. Histogram
data = np.random.normal(0, 1, 1000)
axes[0,1].hist(data, bins=30, color='#10b981', edgecolor='white')
axes[0,1].set_title('Normal Distribution')

# 3. Scatter
x = np.random.randn(100)
y = 2*x + np.random.randn(100)*0.5
axes[1,0].scatter(x, y, alpha=0.7, c='#ec4899')
axes[1,0].set_title('Correlation')

# 4. Bar chart
categories = ['Python', 'JS', 'Java', 'C++']
values = [85, 72, 68, 55]
axes[1,1].barh(categories, values, color='#6366f1')
axes[1,1].set_title('Language Popularity')

plt.tight_layout()
plt.savefig('dashboard.png', dpi=150)

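# Seaborn: statistical plots (load_dataset downloads the data on first use)
sns.set_theme(style='whitegrid')
tips = sns.load_dataset('tips')
plt.figure(figsize=(8, 5))  # new figure, so the boxplot is not drawn onto the dashboard
sns.boxplot(data=tips, x='day', y='total_bill', hue='time')
plt.title('Bill Distribution by Day')
plt.show()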

Exercises

Exercise 8.1: Create a heatmap of a correlation matrix using Seaborn

df = sns.load_dataset('iris')
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)

Exercise 8.2: When would you use a violin plot instead of a box plot?

Violin plots show the full distribution shape (kernel density), while box plots only show quartiles. Use violin when you care about multimodality (e.g., bimodal distributions that box plots miss). Box plots are better for comparing medians across many groups quickly.

Exercise 8.3: How do you add annotations to highlight key data points?

plt.annotate('Peak', xy=(5, 100), xytext=(6, 120),
            arrowprops=dict(arrowstyle='->', color='red'),
            fontsize=12, color='red')

Chapter Summary

Matplotlib gives full control with figure/axes architecture
Seaborn provides high-level statistical plots with beautiful defaults
Always label axes, add titles, and use tight_layout for clean plots
Save with dpi=150 or higher for crisp output; print publications commonly ask for 300

Chapter 9

SciPy & Jupyter Notebooks

Learning Objectives

Use scipy.optimize for function minimization and curve fitting
Perform statistical tests with scipy.stats
Master Jupyter Notebook best practices and magic commands
Conduct an A/B test analysis

Python
from scipy import optimize, stats
import numpy as np

# Curve fitting — find best parameters
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

x_data = np.linspace(0, 4, 50)
y_data = model(x_data, 2.5, 1.3, 0.5) + np.random.normal(0, 0.1, 50)

params, cov = optimize.curve_fit(model, x_data, y_data)
print(f"Fitted: a={params[0]:.2f}, b={params[1]:.2f}, c={params[2]:.2f}")

# T-test: are two groups statistically different?
group_a = np.random.normal(100, 10, 50)
group_b = np.random.normal(105, 10, 50)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant!" if p_value < 0.05 else "Not significant")

Jupyter Magic Commands

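%timeit: benchmark code execution time
%matplotlib inline: show plots in the notebook
%%writefile: save the cell's contents to a file
%who: list variables
!pip install: run a shell command (inside a notebook, %pip install targets the running kernel's environment)
%load_ext autoreload, followed by %autoreload 2: automatically reload changed modules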

Project: A/B Test Analysis

Python
import numpy as np
from scipy import stats

# Control: old button, Treatment: new button
np.random.seed(42)
control_clicks = np.random.binomial(1, 0.12, 1000)   # 12% CTR
treatment_clicks = np.random.binomial(1, 0.15, 1000) # 15% CTR

print(f"Control CTR:   {control_clicks.mean():.1%}")
print(f"Treatment CTR: {treatment_clicks.mean():.1%}")

# Chi-squared test for proportions
from scipy.stats import chi2_contingency
table = np.array([[control_clicks.sum(), 1000-control_clicks.sum()],
                  [treatment_clicks.sum(), 1000-treatment_clicks.sum()]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"\nChi² = {chi2:.3f}, p-value = {p:.4f}")
print("→ Deploy new button!" if p < 0.05 else "→ Keep old button")

Exercises

Exercise 9.1: Minimize f(x) = (x-3)² + 2 using scipy.optimize

result = optimize.minimize(lambda x: (x[0]-3)**2 + 2, x0=0)  # x arrives as a 1-element array
print(f"Minimum at x={result.x[0]:.4f}, f(x)={result.fun:.4f}")

Exercise 9.2: What does a p-value of 0.03 mean?

If the null hypothesis (no difference) were true, there would be a 3% probability of observing results at least this extreme. At the conventional significance level of 0.05, 0.03 < 0.05, so we reject the null hypothesis and call the difference statistically significant. Note: the p-value is not the probability that the null hypothesis is true, and it does NOT tell you the magnitude of the effect; use an effect size for that.
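
One way to make this definition concrete is a permutation test, which estimates the p-value by simulation instead of a formula. This is a swapped-in technique for illustration, not the chi-squared test used in the project above; the function name `permutation_p_value` and the sample groups are made up. The idea: if the group labels do not matter (the null hypothesis), shuffling them should produce a difference at least as large as the observed one fairly often.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means."""
    rng = random.Random(seed)
    mean = lambda values: sum(values) / len(values)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                       # reassign labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:                      # at least as extreme as observed
            extreme += 1
    return extreme / n_permutations

p_same = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
p_diff = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
print(f"identical groups: p = {p_same:.3f}")
print(f"separated groups: p = {p_diff:.3f}")
```

Identical groups give a p-value of 1 because every shuffle is at least as extreme as a zero difference, while clearly separated groups give a small p-value because almost no shuffle reproduces the observed gap.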

Chapter Summary

SciPy extends NumPy with optimization, statistics, and signal processing
curve_fit finds optimal parameters for any model function
t-tests and chi-squared tests determine statistical significance
Jupyter notebooks are the standard interactive environment for data science

Part III

Software Engineering

Building production-ready code

Chapter 10

Version Control with Git

Learning Objectives

Master core Git commands: init, add, commit, branch, merge
Work with remote repositories: clone, push, pull
Handle merge conflicts and use feature branch workflows
Write good commit messages and maintain .gitignore

Bash
# Initialize & first commit
git init
git add .
git commit -m "Initial commit: project structure"

# Branching workflow
git checkout -b feature/add-login      # Create & switch
# ... make changes ...
git add -A
git commit -m "feat: add user login endpoint"
git checkout main
git merge feature/add-login            # Merge feature into main
git branch -d feature/add-login        # Clean up branch

# Working with remotes
git remote add origin https://github.com/user/repo.git
git push -u origin main
git pull origin main                   # Fetch + merge

# Undo mistakes & inspect history
git stash                              # Shelve uncommitted changes (restore with: git stash pop)
git reset --soft HEAD~1                # Undo last commit, keep changes staged
git log --oneline --graph -10          # Visual history of the last 10 commits

Commit Message Convention

feat: new feature, fix: bug fix, docs: documentation, refactor: code restructuring, test: adding tests, chore: maintenance. Example: feat: add batch prediction endpoint with caching
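
A convention is easier to keep when a script checks it. The sketch below validates the first line of a commit message against the prefixes listed above; the function name is_conventional and the exact regular expression are assumptions for illustration, not an official tool.

```python
import re

# Prefixes taken from the convention above; the pattern itself is an illustrative assumption
COMMIT_PATTERN = re.compile(r"^(feat|fix|docs|refactor|test|chore): \S.*$")

def is_conventional(message):
    """Return True if the first line of the message starts with an allowed prefix."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_PATTERN.match(lines[0]) is not None

print(is_conventional("feat: add batch prediction endpoint with caching"))  # True
print(is_conventional("updated stuff"))                                     # False
```

A check like this could be wired into a Git commit-msg hook so that badly formatted messages are rejected before they enter the history.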

Exercises

Exercise 10.1: What is the difference between git merge and git rebase?

Merge creates a new "merge commit" with two parents, preserving full history. Rebase replays your commits on top of the target branch, creating a linear history. Rebase is cleaner but rewrites history — never rebase shared branches. Use merge for team branches, rebase for local cleanup.

Exercise 10.2: How do you resolve a merge conflict?

Git marks conflicts with <<<<<<< HEAD, =======, and >>>>>>> branch-name markers. Open the file, choose the correct code (or combine both), remove the markers, then run git add and git commit. Use git diff to review, and test before committing.
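
Leftover markers are an easy mistake to commit. Git's conflict markers are seven characters long, so a small script can scan for them before you stage the file. This is a sketch only: the function name find_conflict_markers and the sample file content are hypothetical.

```python
def find_conflict_markers(text):
    """Return the 1-based line numbers that still contain Git conflict markers."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< ") or line.startswith(">>>>>>> ") or line == "=======":
            flagged.append(number)
    return flagged

# Hypothetical file content left behind by an unresolved merge
conflicted = "\n".join([
    "def greet():",
    "<<<<<<< HEAD",
    "    return 'hello'",
    "=======",
    "    return 'hi'",
    ">>>>>>> feature/add-login",
])
print(find_conflict_markers(conflicted))  # [2, 4, 6]
```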

Exercise 10.3: Write a .gitignore for a Python ML project

__pycache__/
*.pyc
.env
*.sqlite
node_modules/
dist/
*.egg-info/
.ipynb_checkpoints/
data/*.csv
models/*.pkl
wandb/

Chapter Summary

Git tracks every change — you can always undo mistakes
Feature branches isolate work; merge integrates it
Good commit messages document project evolution
.gitignore prevents secrets and large files from being tracked

Chapter 11

Testing & Debugging

Learning Objectives

Write unit tests with pytest and understand test types
Use fixtures, parametrize, and measure code coverage
Debug with pdb, breakpoints, and logging
Build a fully tested module

Python
# calculator.py
def add(a, b):
    return a + b

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

# test_calculator.py
import pytest
from calculator import add, divide

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

def test_divide_by_zero():
    with pytest.raises(ValueError):
        divide(10, 0)

# Parametrize — test multiple inputs at once
@pytest.mark.parametrize("a,b,expected", [
    (10, 2, 5), (9, 3, 3), (7, 2, 3.5)
])
def test_divide(a, b, expected):
    assert divide(a, b) == expected

# Fixtures — shared setup
@pytest.fixture
def sample_data():
    return [1, 2, 3, 4, 5]

def test_sum(sample_data):
    assert sum(sample_data) == 15

Bash
# Run tests with coverage
pytest test_calculator.py -v --cov=calculator --cov-report=term-missing

Debugging & Logging

Python
import logging

logging.basicConfig(level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger(__name__)

def train_model(data):
    logger.info("Training on %d samples", len(data))
    try:
        # Training logic
        logger.info("Training complete")
    except Exception:
        logger.exception("Training failed")  # logs the message plus the traceback
        raise

# Debug with breakpoint() — drops into pdb
def buggy_function(x):
    result = x * 2
    breakpoint()  # Execution pauses here → inspect variables
    return result + 1

Exercises

Exercise 11.1: What is the testing pyramid (unit vs integration vs E2E)?

Unit tests (base, most tests): Test individual functions in isolation. Fast, cheap. Integration tests (middle): Test components working together (API + database). E2E tests (top, fewest): Test full user workflows. Slow, expensive. The pyramid shape means: write many unit tests, fewer integration, fewest E2E.
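
The difference between the bottom two layers of the pyramid can be sketched with the standard library alone. Below, normalize_email is a pure function (unit test), while save_user talks to a database (integration test). Both functions and the users table are hypothetical examples, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

def normalize_email(email):
    """Pure function: easy to unit test in isolation."""
    return email.strip().lower()

def save_user(conn, email):
    """Touches a database, so testing it is an integration test."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (normalize_email(email),))
    conn.commit()

def test_normalize_email_unit():
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"

def test_save_user_integration():
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE users (email TEXT)")
    save_user(conn, "  Ada@Example.COM ")
    rows = conn.execute("SELECT email FROM users").fetchall()
    conn.close()
    assert rows == [("ada@example.com",)]
```

The unit test needs no setup and runs in microseconds; the integration test has to create a schema first, which is exactly why the pyramid recommends fewer of them.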

Exercise 11.2: Write a test for a function that reads a file (using tmp_path fixture)

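# Function under test (hypothetical): returns a file's text
def read_file(path):
    return path.read_text()

def test_read_file(tmp_path):
    f = tmp_path / "test.txt"          # tmp_path is a pathlib.Path to a fresh temp directory
    f.write_text("hello world")
    assert read_file(f) == "hello world"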

Exercise 11.3: Why use logging instead of print statements?

Logging provides: severity levels (DEBUG/INFO/WARNING/ERROR), timestamps, configurable output (file/console/remote), can be disabled in production without removing code, and supports structured formatting. Print statements must be manually removed and provide no filtering or context.
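
The filtering advantage is easiest to see in a small demo. The sketch below sends log output to an in-memory buffer (standing in for a file or console) and changes the level without touching the logging calls. The logger name demo.training and the messages are made up for illustration.

```python
import io
import logging

stream = io.StringIO()  # stands in for a log file or console
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("demo.training")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.setLevel(logging.WARNING)       # production-style setting
logger.debug("batch 3 loss=0.412")     # filtered out
logger.warning("learning rate is 0")   # kept

logger.setLevel(logging.DEBUG)         # switch on detail without editing call sites
logger.debug("batch 4 loss=0.398")     # now kept

captured = stream.getvalue()
print(captured)
```

With print statements, silencing the first debug message would have meant deleting or commenting out the line itself.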

Chapter Summary

pytest is the standard Python testing framework — simple, powerful, extensible
Parametrize tests multiple inputs; fixtures share setup code
Aim for 80%+ code coverage; test edge cases and error paths
Use logging over print; use breakpoint() for interactive debugging

Chapter 12

Code Modularization, REST APIs & Docker

Learning Objectives

Structure Python projects with modules and packages
Build REST APIs with Flask and FastAPI
Containerize applications with Docker
Deploy an ML prediction API

Clean Code & Modules

Python
# Project structure
# ml_project/
# ├── src/
# │   ├── __init__.py
# │   ├── model.py
# │   ├── preprocess.py
# │   └── api.py
# ├── tests/
# ├── Dockerfile
# └── requirements.txt

FastAPI — Modern Python API

Python
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import numpy as np

app = FastAPI(title="ML Prediction API")

class PredictionInput(BaseModel):
    features: list[float]

class PredictionOutput(BaseModel):
    prediction: str
    confidence: float

# Load model once at import time (only unpickle files you trust: pickle can execute arbitrary code)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.post("/predict", response_model=PredictionOutput)
async def predict(data: PredictionInput):
    X = np.array(data.features).reshape(1, -1)
    pred = model.predict(X)[0]
    proba = model.predict_proba(X).max()
    return PredictionOutput(prediction=str(pred), confidence=float(proba))

@app.get("/health")
async def health():
    return {"status": "healthy"}

Docker — Containerization

Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8000"]

Bash
# Build and run
docker build -t ml-api .
docker run -p 8000:8000 ml-api

# docker-compose for multi-service
# docker-compose.yml:
# services:
#   api:
#     build: .
#     ports: ["8000:8000"]
#   db:
#     image: postgres:15

Exercises

Exercise 12.1: What is the difference between Flask and FastAPI?

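Flask: WSGI-based and primarily synchronous, mature, huge ecosystem, minimal by design. FastAPI: async-first (ASGI), auto-generates OpenAPI docs, uses Pydantic for validation, and often handles more requests per second on I/O-bound workloads (the exact speedup depends on the workload and server setup). Use FastAPI for new APIs; Flask for existing projects or simpler needs.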

Exercise 12.2: Why containerize ML models with Docker?

Docker turns "it works on my machine" into "it works everywhere." It packages the Python version, libraries, system dependencies, and model files into one portable image. This avoids version conflicts between environments, makes horizontal scaling straightforward, and works with Kubernetes for orchestration.

Exercise 12.3: Explain SOLID principles with one-line examples

Single Responsibility: One class = one job. Open/Closed: Extend via inheritance, don't modify existing code. Liskov Substitution: Subclass should be usable wherever parent is. Interface Segregation: Many specific interfaces > one fat interface. Dependency Inversion: Depend on abstractions, not concrete classes.
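
Three of these principles fit in one short sketch. All class names below (Model, MeanModel, MaxModel, PredictionService) are hypothetical and exist only to illustrate the idea: the service depends on an abstraction, and a new model type is added without editing the service.

```python
from typing import Protocol

class Model(Protocol):
    """Abstraction the service depends on (Dependency Inversion)."""
    def predict(self, features: list[float]) -> float: ...

class MeanModel:
    """One concrete implementation: predicts the mean of the features."""
    def predict(self, features: list[float]) -> float:
        return sum(features) / len(features)

class MaxModel:
    """Added later without modifying PredictionService (Open/Closed)."""
    def predict(self, features: list[float]) -> float:
        return max(features)

class PredictionService:
    """Single Responsibility: only formats predictions, never trains or loads models."""
    def __init__(self, model: Model) -> None:
        self.model = model

    def describe(self, features: list[float]) -> str:
        return f"prediction={self.model.predict(features):.2f}"

print(PredictionService(MeanModel()).describe([1.0, 2.0, 3.0]))  # prediction=2.00
print(PredictionService(MaxModel()).describe([1.0, 2.0, 3.0]))   # prediction=3.00
```

Because either model can be passed wherever a Model is expected, the example also respects Liskov Substitution.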

Chapter Summary

Structure code into modules/packages for maintainability
FastAPI provides modern, fast, auto-documented REST APIs
Docker containerizes your app with all dependencies for portable deployment
SOLID principles guide clean, maintainable architecture

Part IV

Hardware & Computing

Understanding the machine beneath the code

Chapter 13

GPU vs CPU, CUDA & Distributed Computing

Learning Objectives

Understand CPU vs GPU architecture and when GPUs win
Learn CUDA programming basics and GPU-accelerated Python
Grasp distributed computing concepts: data and model parallelism
Understand memory bandwidth bottlenecks and mixed precision training

CPU vs GPU Architecture

Feature	CPU	GPU
Cores	4-64 complex cores	1000-16000 simple cores
Clock Speed	3-5 GHz	1-2 GHz
Strength	Sequential, complex logic	Massively parallel, simple ops
Memory	System RAM (64-512 GB)	VRAM (8-80 GB)
Best For	Control flow, I/O, OS	Matrix math, convolutions

A GPU's thousands of cores can execute the same operation on thousands of data elements simultaneously. NVIDIA calls this model SIMT (Single Instruction, Multiple Threads), a close relative of SIMD (Single Instruction, Multiple Data). Matrix multiplication, the core of neural networks, parallelizes extremely well: each output element is an independent dot product.
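
The independence of output elements can be demonstrated in plain Python. The sketch below computes every cell of a matrix product separately and shows that the order in which cells are visited does not matter, which is what lets a GPU assign each cell to its own thread. The helper names and the small matrices are illustrative assumptions.

```python
import random

def output_element(a, b, i, j):
    """One output element: the dot product of row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul_in_order(a, b, cell_order):
    """Fill the result one cell at a time, visiting cells in the given order."""
    rows, cols = len(a), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i, j in cell_order:
        result[i][j] = output_element(a, b, i, j)
    return result

a = [[1, 2], [3, 4], [5, 6]]      # 3x2
b = [[7, 8, 9], [10, 11, 12]]     # 2x3
cells = [(i, j) for i in range(3) for j in range(3)]

sequential = matmul_in_order(a, b, cells)
shuffled_cells = cells[:]
random.shuffle(shuffled_cells)    # any order works: no cell depends on another
shuffled = matmul_in_order(a, b, shuffled_cells)

print(sequential)  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
print(sequential == shuffled)  # True
```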

GPU-Accelerated Python

Python
# CuPy — NumPy on GPU (drop-in replacement)
import cupy as cp
import numpy as np
import time

size = 10000

# CPU (NumPy)
a_cpu = np.random.randn(size, size).astype(np.float32)
b_cpu = np.random.randn(size, size).astype(np.float32)
start = time.time()
c_cpu = a_cpu @ b_cpu
print(f"CPU: {time.time()-start:.3f}s")

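# GPU (CuPy)
a_gpu = cp.asarray(a_cpu)                 # copy host arrays to GPU memory
b_gpu = cp.asarray(b_cpu)
_ = a_gpu @ b_gpu                         # warm-up: the first call includes one-time setup cost
cp.cuda.Stream.null.synchronize()
start = time.time()
c_gpu = a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()         # GPU calls are asynchronous; wait before stopping the clock
print(f"GPU: {time.time()-start:.3f}s")
# The GPU is often 10-50x faster for large matrix multiplications; the exact ratio depends on the hardware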

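# Numba: JIT-compile Python functions into GPU kernels
from numba import cuda
import math

@cuda.jit
def vector_add_gpu(a, b, result):
    idx = cuda.grid(1)                 # global thread index in a 1D grid
    if idx < a.size:                   # guard: the grid may have more threads than elements
        result[idx] = a[idx] + b[idx]

# Launch: one thread per element, grouped into blocks of 256 threads
n = 1_000_000
a = np.arange(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
result = np.zeros(n, dtype=np.float32)
threads_per_block = 256
blocks_per_grid = math.ceil(n / threads_per_block)
vector_add_gpu[blocks_per_grid, threads_per_block](a, b, result)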

CUDA Concepts

CUDA organizes parallel execution into a hierarchy:

Level	Description	Analogy
Thread	Single execution unit	One worker
Block	Group of threads (up to 1024)	One team
Grid	Group of blocks	Entire workforce
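
How a thread finds "its" element follows directly from this hierarchy: in a 1D launch, the global index is the block index times the block size plus the thread's index within the block. The sketch below simulates that arithmetic on the CPU in plain Python. It is not GPU code, and the function names and the block size of 256 are illustrative assumptions.

```python
import math

def launch_config(n_elements, threads_per_block=256):
    """Number of blocks needed so every element gets its own thread."""
    return math.ceil(n_elements / threads_per_block)

def global_indices(n_elements, threads_per_block=256):
    """Simulate idx = blockIdx.x * blockDim.x + threadIdx.x for a 1D grid."""
    blocks_per_grid = launch_config(n_elements, threads_per_block)
    indices = []
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            idx = block_idx * threads_per_block + thread_idx
            if idx < n_elements:          # bounds guard, as in a real kernel
                indices.append(idx)
    return blocks_per_grid, indices

blocks, indices = global_indices(1000, threads_per_block=256)
print(blocks)                          # 4 blocks -> 1024 threads, 24 of them idle
print(indices == list(range(1000)))    # True: each element is covered exactly once
```

The bounds guard matters because the grid is rounded up to a whole number of blocks, so the last block usually contains threads with nothing to do.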

Distributed Computing

Python
# PyTorch Data Parallel — split batches across GPUs
import torch
import torch.nn as nn

model = nn.Linear(1000, 100)

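# Data Parallelism: same model, split data across GPUs
# (nn.DataParallel is the simplest option; PyTorch recommends
#  DistributedDataParallel for serious multi-GPU training)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)               # falls back to CPU when no GPU is present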

# Model Parallelism: split model layers across GPUs
# layer1 → GPU0, layer2 → GPU1 (for models too large for one GPU)

Memory Bottlenecks & Mixed Precision

Memory Bandwidth is the Real Bottleneck

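GPUs can compute faster than memory can feed data to them. Key optimizations: (1) Mixed precision (FP16): halves the memory per value and can roughly double throughput on GPUs with tensor cores. (2) Gradient checkpointing: recompute activations during the backward pass instead of storing them, trading extra compute for memory. (3) Data prefetching: load the next batch while computing the current one. (4) Model quantization: INT8 weights take a quarter of the space of FP32 and can speed up inference substantially on hardware with INT8 support, such as many edge devices.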
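
The memory side of these optimizations is simple arithmetic: bytes per value times number of values. The sketch below works it out for the weights of a hypothetical 1-billion-parameter model. The function name and parameter count are assumptions, and the figures cover weights only, not activations, gradients, or optimizer state.

```python
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(n_parameters, dtype):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_parameters * BYTES_PER_VALUE[dtype] / 1e9

n_parameters = 1_000_000_000  # hypothetical 1-billion-parameter model
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gb(n_parameters, dtype):.1f} GB")
# fp32: 4.0 GB
# fp16: 2.0 GB
# int8: 1.0 GB
```

Fewer bytes per value also means fewer bytes to move per operation, which is why lower precision helps with the bandwidth bottleneck and not just with capacity.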

Python
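# Mixed Precision Training with PyTorch
# (assumes model, optimizer, criterion, and dataloader already exist, with the model on a CUDA device)
import torch
from torch.cuda.amp import GradScaler  # newer PyTorch releases prefer torch.amp.GradScaler("cuda")

scaler = GradScaler()

for inputs, labels in dataloader:
    inputs, labels = inputs.cuda(), labels.cuda()
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):  # FP16 forward pass
        output = model(inputs)
        loss = criterion(output, labels)
    scaler.scale(loss).backward()      # backward pass on the scaled loss
    scaler.step(optimizer)             # unscales gradients; skips the step if they overflowed
    scaler.update()
# Often up to ~2x faster with noticeably less memory on tensor-core GPUs; results vary by model and hardware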

Project: Benchmark CPU vs GPU Matrix Operations

Python
import numpy as np
import time

sizes = [100, 500, 1000, 2000, 5000]
results = []

for n in sizes:
    A = np.random.randn(n, n).astype(np.float32)
    B = np.random.randn(n, n).astype(np.float32)

    start = time.time()
    C = A @ B
    cpu_time = time.time() - start

    gflops = (2 * n**3) / cpu_time / 1e9
    results.append((n, cpu_time, gflops))
    print(f"Size {n:>5}×{n:<5} → {cpu_time:.3f}s ({gflops:.1f} GFLOPS)")

# With GPU (CuPy):
# try:
#     import cupy as cp
#     A_gpu = cp.array(A); B_gpu = cp.array(B)
#     start = time.time()
#     C_gpu = A_gpu @ B_gpu
#     cp.cuda.Stream.null.synchronize()
#     gpu_time = time.time() - start
#     print(f"GPU: {gpu_time:.3f}s → {cpu_time/gpu_time:.1f}x speedup")
# except ImportError:
#     print("CuPy not installed; skipping GPU benchmark")

Exercises

Exercise 13.1: Why can't GPUs replace CPUs entirely?

GPUs excel at data parallelism — the same operation on many elements. But they are poor at: (1) complex branching/control flow, (2) sequential algorithms, (3) operating system tasks, (4) low-latency single-thread operations, (5) irregular memory access patterns. The CPU handles orchestration while the GPU handles computation.

Exercise 13.2: What is the difference between data parallelism and model parallelism?

Data parallelism: The same model is replicated across GPUs, and each replica processes a different slice of the batch. Gradients are averaged. Works for most models. Model parallelism: Different parts of the model live on different GPUs. Required when the model doesn't fit in one GPU's memory (e.g., large language models with hundreds of billions of parameters). More complex to implement.

Exercise 13.3: Why does mixed precision training work without losing accuracy?

FP16 has less precision, but neural networks are robust to small rounding errors. The trick: (1) Forward pass in FP16 (fast, small). (2) Loss scaling prevents tiny gradients from rounding to zero. (3) Weight updates in FP32 (full-precision master copy), because small updates added to much larger weights would otherwise be lost. Result: nearly identical accuracy, often with close to 2x speed and substantially less memory on supporting hardware.
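
Loss scaling can be seen at work without a GPU, because Python's struct module can round a number to half precision. In this sketch a tiny gradient underflows to zero in FP16, but survives when it is multiplied by a scale factor first and divided back afterwards in higher precision. The gradient value and the scale of 2**16 are illustrative assumptions.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

tiny_gradient = 1e-8          # hypothetical gradient, below FP16's smallest subnormal (~6e-8)
loss_scale = 65536.0          # 2**16, an illustrative scale factor

unscaled = to_fp16(tiny_gradient)                 # underflows to zero
scaled = to_fp16(tiny_gradient * loss_scale)      # survives in FP16
recovered = scaled / loss_scale                   # unscale in higher precision

print(unscaled)               # 0.0
print(recovered)              # close to 1e-08
```

Scaling the loss scales every gradient by the same factor (by the chain rule), so one multiplication before the backward pass protects all of them at once.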

Exercise 13.4: What is MapReduce and how does it relate to distributed ML?

Map: Apply a function to each data chunk independently (parallel). Reduce: Combine results into a single output. In ML: Map = compute gradients on each GPU's data batch. Reduce = average gradients across all GPUs (AllReduce). This is the foundation of distributed SGD used by PyTorch DDP and Horovod.
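
The Map/Reduce view of distributed SGD can be sketched in a few lines of plain Python. Each "worker" computes a gradient on its own shard (Map), and the results are averaged (Reduce). The one-weight linear model, the data, and the function names are assumptions for illustration; real frameworks perform the averaging with an AllReduce across devices.

```python
from functools import reduce

def gradient(weight, shard):
    """Map step: mean-squared-error gradient for y = weight * x on one shard."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(gradients):
    """Reduce step: combine per-worker gradients into one average."""
    return reduce(lambda total, g: total + g, gradients) / len(gradients)

# Hypothetical data following y = 3x, split into equal shards for two workers
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
shards = [data[:2], data[2:]]
weight = 1.0

local_gradients = list(map(lambda shard: gradient(weight, shard), shards))  # Map
averaged = all_reduce_mean(local_gradients)                                  # Reduce

print(local_gradients)              # [-10.0, -50.0]
print(averaged)                     # -30.0
print(gradient(weight, data))       # -30.0, same as the full-batch gradient
```

The averaged gradient equals the full-batch gradient here because the shards are the same size; with unequal shards the average would need to be weighted by shard size.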

Chapter Summary

GPUs have thousands of simple cores ideal for parallel matrix operations
CuPy and Numba bring GPU acceleration to Python with minimal code changes
Data parallelism splits batches across GPUs; model parallelism splits the model
Mixed precision (FP16) can roughly double throughput with minimal accuracy loss
Memory bandwidth, not compute, is often the real bottleneck

🎓 Congratulations!

You've completed Programming & Software Engineering. You now have the skills to write clean, efficient, production-ready Python code — from algorithms and OOP to APIs and GPU computing.