Time Series Analysis
& Forecasting
Master the art and science of predicting the future from historical temporal data โ from classical ARIMA to deep learning LSTMs.
Learning Objectives
By the end of this chapter, you will be able to:
- Decompose a time series into its core components: trend, seasonality, cyclical, and noise.
- Test for stationarity using the ADF (Augmented Dickey-Fuller) and KPSS tests, and apply differencing/transformations to achieve stationarity.
- Interpret ACF and PACF plots to identify appropriate model orders for AR, MA, and ARIMA models.
- Build and tune ARIMA(p,d,q) models including seasonal variants (SARIMA) using the Box-Jenkins methodology.
- Apply Exponential Smoothing methods: Simple, Holt's Linear Trend, and Holt-Winters Seasonal models.
- Use Facebook Prophet for time series forecasting with trend changepoints, holiday effects, and custom seasonality.
- Design LSTM networks using TensorFlow/Keras for sequence-to-one and sequence-to-sequence forecasting.
- Understand multivariate time series analysis using Vector Autoregression (VAR).
- Detect anomalies in time series data using statistical and ML-based approaches.
- Evaluate forecasts with MAPE, RMSE, SMAPE, and other metrics, knowing their tradeoffs.
- Apply time series methods to real-world Indian and global case studies, including Nifty50 forecasting, monsoon prediction, and energy demand.
- Build end-to-end mini projects โ Stock Price Forecaster and Weather Predictor.
University exams heavily test stationarity concepts, ARIMA order selection from ACF/PACF plots, and deriving forecast equations. Be ready to interpret plots and perform hand calculations for small series.
Introduction
Time series data is everywhere โ from the minute-by-minute stock prices on the Bombay Stock Exchange to daily temperatures recorded by the India Meteorological Department, monthly GST revenues, and the second-by-second sensor readings in a smart factory. Unlike cross-sectional data where observations are independent, time series data has temporal ordering, and that ordering contains crucial information.
Time series analysis is the discipline of extracting meaningful statistics and characteristics from time-ordered data, while time series forecasting uses historical observations to predict future values. The applications span virtually every industry:
- Finance: Stock prices, exchange rates, portfolio risk (VaR)
- Weather: Temperature, rainfall, cyclone paths
- Retail: Sales demand, inventory optimization
- Healthcare: Patient vitals monitoring, epidemic forecasting
- Energy: Electricity demand, solar/wind output prediction
- Telecom: Network traffic, call volume
- Manufacturing: Sensor data for predictive maintenance
India's UPI (Unified Payments Interface) processes over 10 billion transactions per month. Time series forecasting helps NPCI and banks predict transaction volumes, detect anomalies during festivals (Diwali spike!), and capacity plan server infrastructure. The Reserve Bank of India (RBI) uses time series models to forecast inflation (CPI) and GDP growth for monetary policy decisions.
This chapter walks through the entire landscape โ from classical decomposition and ARIMA to modern deep learning (LSTM) and Meta's Prophet. We adopt a practitioner's approach: understand the theory, implement from scratch, then use production-ready libraries.
The most common mistake students make is applying machine learning models directly to time series without checking for stationarity. A non-stationary series violates the fundamental assumptions of most models. Always start with visualization, stationarity tests, and decomposition before modeling.
Historical Background
The study of time series has deep roots in astronomy, economics, and meteorology.
Timeline of Key Developments
| Year | Milestone | Contributor |
|---|---|---|
| 1927 | Yule introduces autoregressive (AR) models for sunspot data | G. Udny Yule |
| 1937 | Slutzky formalizes the moving average (MA) process | Eugen Slutzky |
| 1938 | Wold's decomposition theorem | Herman Wold |
| 1957 | Exponential smoothing methods proposed | C.C. Holt, R.G. Brown |
| 1960 | Seasonal Holt-Winters method | Peter Winters |
| 1970 | Box-Jenkins methodology for ARIMA | George Box & Gwilym Jenkins |
| 1979 | Dickey-Fuller test for unit roots | David Dickey & Wayne Fuller |
| 1980 | Vector Autoregression (VAR) models | Christopher Sims |
| 1982 | ARCH models for volatility | Robert Engle |
| 1986 | GARCH models generalize ARCH | Tim Bollerslev |
| 1992 | KPSS stationarity test | Kwiatkowski et al. |
| 1997 | Long Short-Term Memory (LSTM) architecture | Hochreiter & Schmidhuber |
| 2017 | Facebook Prophet for scalable forecasting | Sean Taylor & Ben Letham |
| 2019 | Temporal Fusion Transformers | Lim et al. (Google) |
| 2020 | N-BEATS: Neural basis expansion for time series | Oreshkin et al. |
| 2023 | TimeGPT & foundation models for time series | Nixtla & others |
The Box-Jenkins methodology (1970) was the watershed moment. It provided a systematic approach โ identify, estimate, diagnose โ that remains the gold standard for classical time series modeling. George Box famously said, "All models are wrong, but some are useful," which perfectly captures the forecasting philosophy.
The India Meteorological Department (IMD), established in 1875, is one of the oldest meteorological organizations. Indian statistician P.C. Mahalanobis (of Mahalanobis distance fame) at the Indian Statistical Institute pioneered statistical approaches to monsoon forecasting in the 1930s. Today, IMD uses ensemble models combining statistical and dynamical methods to issue Long Range Forecasts (LRF) for the Indian monsoon season.
Conceptual Explanation
4.1 Time Series Components
Every time series Y(t) can be decomposed into four fundamental components:
- Trend (T): The long-term upward or downward movement. Example: India's GDP growth over decades.
- Seasonality (S): Regular, periodic patterns repeating at fixed intervals. Example: Ice cream sales peak every summer, Diwali shopping spikes every October/November.
- Cyclical (C): Longer-term fluctuations without fixed period, often linked to business cycles. Example: Economic booms and recessions lasting 5-10 years.
- Noise/Residual (ฮต): Random, unpredictable variation after removing other components.
Two decomposition models are common:
Use additive when seasonal fluctuations are roughly constant regardless of the level. Use multiplicative when seasonal variation grows proportionally with the level (e.g., airline passengers โ the seasonal swing is larger when overall traffic is higher).
4.2 Stationarity
A time series is stationary when its statistical properties (mean, variance, autocorrelation) do not change over time. Most forecasting models require stationarity.
Strict vs. Weak Stationarity
- Strict stationarity: The joint distribution of (Y(tโ), Y(tโ), ..., Y(tโ)) is identical to (Y(tโ+ฯ), Y(tโ+ฯ), ..., Y(tโ+ฯ)) for all time shifts ฯ.
- Weak (covariance) stationarity: Mean is constant, variance is finite and constant, covariance depends only on the lag (not time). This is the practical requirement.
Testing for Stationarity
Augmented Dickey-Fuller (ADF) Test:
- Hโ: Unit root exists (series is non-stationary)
- Hโ: No unit root (series is stationary)
- If p-value < 0.05 โ reject Hโ โ series IS stationary
KPSS Test:
- Hโ: Series is stationary
- Hโ: Series has a unit root (non-stationary)
- If p-value < 0.05 โ reject Hโ โ series is NOT stationary
ADF and KPSS have opposite null hypotheses! Use both as a confirmatory pair. If ADF says stationary AND KPSS says stationary, you're confident. If they disagree, the series might be "near unit root" โ consider differencing or fractional integration.
4.3 Differencing and Transformations
To make a non-stationary series stationary:
- First Differencing: Y'(t) = Y(t) - Y(t-1) removes linear trends.
- Second Differencing: Y''(t) = Y'(t) - Y'(t-1) removes quadratic trends.
- Seasonal Differencing: Y'(t) = Y(t) - Y(t-m) where m is the seasonal period.
- Log Transform: Stabilizes variance when it grows with the level.
- Box-Cox Transform: Generalized power transform Y(ฮป) = (Y^ฮป - 1)/ฮป.
4.4 ACF and PACF
Autocorrelation Function (ACF) measures the correlation between Y(t) and Y(t-k) for lag k. It includes indirect correlations through intermediate lags.
Partial Autocorrelation Function (PACF) measures the direct correlation between Y(t) and Y(t-k), removing the effect of all intermediate lags.
| Model | ACF Behavior | PACF Behavior |
|---|---|---|
| AR(p) | Decays gradually (exponential/oscillating) | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays gradually |
| ARMA(p,q) | Decays gradually | Decays gradually |
4.5 ARIMA Models
AR(p) โ Autoregressive: Current value depends on p past values.
MA(q) โ Moving Average: Current value depends on q past error terms.
ARIMA(p,d,q): Combines AR, differencing (d times), and MA. The I stands for "Integrated" โ it reverses differencing.
4.6 SARIMA
SARIMA extends ARIMA for seasonal data: SARIMA(p,d,q)(P,D,Q,m) where uppercase letters are seasonal orders and m is the seasonal period (e.g., 12 for monthly data with annual seasonality).
4.7 Exponential Smoothing
- SES (Simple Exponential Smoothing): For level-only data โ no trend, no seasonality.
- Holt's Linear: Adds a trend component.
- Holt-Winters: Adds both trend and seasonality. Comes in additive and multiplicative variants.
4.8 Prophet
Developed by Meta (Facebook), Prophet uses a decomposable additive model: y(t) = g(t) + s(t) + h(t) + ฮต(t) where g(t) is a piecewise-linear or logistic growth trend, s(t) is seasonality modeled with Fourier terms, and h(t) captures holiday effects. Prophet is designed to handle missing data, changepoints, and multiple seasonalities automatically.
4.9 LSTM for Time Series
Long Short-Term Memory networks are a type of recurrent neural network (RNN) that can learn long-range dependencies. They use a gating mechanism (forget, input, output gates) to selectively remember or forget information. For time series, the input is a sliding window of past observations, and the output is the forecast.
4.10 VAR โ Vector Autoregression
When you have multiple interrelated time series (e.g., Nifty50 + USD/INR + gold prices), VAR models capture the linear interdependencies. Each variable is a linear function of past lags of itself AND all other variables.
4.11 Anomaly Detection in Time Series
Anomalies are observations that deviate significantly from expected patterns. Methods include: statistical thresholds (z-score, IQR), forecast-based (flag points with large residuals), isolation forests, autoencoders, and Prophet's built-in uncertainty intervals.
In production systems, the choice between classical (ARIMA) and ML (LSTM, Prophet) depends on: (1) data volume โ LSTM needs thousands of data points, ARIMA works with fewer, (2) interpretability โ ARIMA provides coefficient-level insight, (3) automation โ Prophet is designed for "analyst in the loop" at scale, and (4) latency โ ARIMA is orders of magnitude faster to train than LSTM.
Mathematical Foundation
5.1 AR(p) Process
where c is a constant, ฯโ...ฯโ are AR coefficients, ฮต(t) ~ WN(0, ฯยฒ)
The process is stationary if all roots of the characteristic polynomial 1 - ฯโz - ฯโzยฒ - ... - ฯโzแต = 0 lie outside the unit circle.
5.2 MA(q) Process
where ฮผ is the mean, ฮธโ...ฮธโ are MA coefficients, ฮต(t) ~ WN(0, ฯยฒ)
An MA(q) process is always stationary (finite linear combination of white noise). It is invertible if all roots of 1 + ฮธโz + ฮธโzยฒ + ... + ฮธโzแต = 0 lie outside the unit circle.
5.3 ARIMA(p,d,q)
B is the backshift operator: BยทY(t) = Y(t-1)
ฯ(B) = 1 - ฯโB - ... - ฯโBแต (AR polynomial)
ฮธ(B) = 1 + ฮธโB + ... + ฮธโBแต (MA polynomial)
(1-B)แต is the differencing operator applied d times
5.4 SARIMA(p,d,q)(P,D,Q)โ
ฮฆ(Bแต) = 1 - ฮฆโBแต - ... - ฮฆโBแดพแต (seasonal AR)
ฮ(Bแต) = 1 + ฮโBแต + ... + ฮQBQแต (seasonal MA)
m = seasonal period
5.5 Exponential Smoothing โ State Space Form
Trend: b(t) = ฮฒ(โ(t) - โ(t-1)) + (1-ฮฒ)b(t-1)
Forecast: ลท(t+h) = โ(t) + hยทb(t)
Trend: b(t) = ฮฒ(โ(t) - โ(t-1)) + (1-ฮฒ)b(t-1)
Season: s(t) = ฮณ(y(t) - โ(t-1) - b(t-1)) + (1-ฮณ)s(t-m)
Forecast: ลท(t+h) = โ(t) + hยทb(t) + s(t+h-m)
5.6 Evaluation Metrics
MAPE has issues when actual values are near zero (division by zero). SMAPE is bounded [0%, 200%] and handles this better. RMSE penalizes large errors more heavily due to squaring. In exams, know which metric is appropriate: RMSE for point accuracy, MAPE for relative comparison, SMAPE for symmetry.
5.7 ADF Test Statistic
Test statistic: ฯ = ฮณฬ / SE(ฮณฬ)
Hโ: ฮณ = 0 (unit root โ non-stationary)
Formula Derivations
6.1 Deriving the AR(1) Autocorrelation Function
For AR(1): Y(t) = ฯY(t-1) + ฮต(t), with |ฯ| < 1 for stationarity.
Step 1: Variance โ ฮณ(0) = Var(Y(t)) = ฯยฒVar(Y(t-1)) + ฯยฒ = ฯยฒฮณ(0) + ฯยฒ
Therefore: ฮณ(0) = ฯยฒ / (1 - ฯยฒ)
Step 2: Autocovariance at lag k โ Multiply Y(t) = ฯY(t-1) + ฮต(t) by Y(t-k) and take expectations:
ฮณ(k) = E[Y(t)ยทY(t-k)] = ฯยทE[Y(t-1)ยทY(t-k)] + E[ฮต(t)ยทY(t-k)] = ฯยทฮณ(k-1) + 0 = ฯยทฮณ(k-1)
By recursion: ฮณ(k) = ฯแต ยท ฮณ(0)
Step 3: Autocorrelation: ฯ(k) = ฮณ(k)/ฮณ(0) = ฯแต
6.2 Deriving MA(1) Autocorrelation
For MA(1): Y(t) = ฮต(t) + ฮธฮต(t-1)
Step 1: ฮณ(0) = Var(ฮต(t) + ฮธฮต(t-1)) = (1 + ฮธยฒ)ฯยฒ
Step 2: ฮณ(1) = Cov(ฮต(t) + ฮธฮต(t-1), ฮต(t-1) + ฮธฮต(t-2)) = ฮธฯยฒ
Step 3: ฮณ(k) = 0 for k โฅ 2 (non-overlapping noise terms)
6.3 Deriving Simple Exponential Smoothing
The SES forecast ลท(t+1) = ฮฑy(t) + (1-ฮฑ)ลท(t) can be expanded recursively:
ลท(t+1) = ฮฑy(t) + (1-ฮฑ)[ฮฑy(t-1) + (1-ฮฑ)ลท(t-1)]
= ฮฑy(t) + ฮฑ(1-ฮฑ)y(t-1) + (1-ฮฑ)ยฒลท(t-1)
Continuing infinitely:
Weights decay geometrically โ recent data gets more weight. The weights sum to 1 since ฮฑ ฮฃ(1-ฮฑ)โฑ = ฮฑ ยท 1/(1-(1-ฮฑ)) = 1
6.4 Box-Cox Transformation
{ ln(y) if ฮป = 0
ฮป = 1: No transform | ฮป = 0.5: Square root | ฮป = 0: Log | ฮป = -1: Inverse
The optimal ฮป can be found by maximizing the log-likelihood of the transformed data. In practice, scipy's boxcox function handles this automatically. Always remember to inverse transform your forecasts back to the original scale!
Worked Numerical Examples
Example 1: AR(1) Forecast by Hand
Given: AR(1) model Y(t) = 5 + 0.7ยทY(t-1) + ฮต(t), with Y(100) = 20.
Task: Forecast Y(101), Y(102), Y(103).
Solution:
- ลท(101) = 5 + 0.7 ร 20 = 5 + 14 = 19.0
- ลท(102) = 5 + 0.7 ร 19.0 = 5 + 13.3 = 18.3
- ลท(103) = 5 + 0.7 ร 18.3 = 5 + 12.81 = 17.81
Note: Forecasts converge toward the long-run mean ฮผ = c/(1-ฯ) = 5/(1-0.7) = 16.67.
Example 2: Simple Exponential Smoothing
Given: Sales data: [100, 120, 110, 130, 125]. ฮฑ = 0.3. Initial forecast ลท(1) = 100.
Solution:
- ลท(2) = 0.3ร100 + 0.7ร100 = 30 + 70 = 100.0
- ลท(3) = 0.3ร120 + 0.7ร100 = 36 + 70 = 106.0
- ลท(4) = 0.3ร110 + 0.7ร106 = 33 + 74.2 = 107.2
- ลท(5) = 0.3ร130 + 0.7ร107.2 = 39 + 75.04 = 114.04
- ลท(6) = 0.3ร125 + 0.7ร114.04 = 37.5 + 79.83 = 117.33
Example 3: MAPE Calculation
Given: Actual = [100, 150, 200], Forecast = [110, 140, 180].
Solution:
- |100-110|/100 = 10/100 = 0.10
- |150-140|/150 = 10/150 = 0.0667
- |200-180|/200 = 20/200 = 0.10
MAPE = (100/3)(0.10 + 0.0667 + 0.10) = (100/3)(0.2667) = 8.89%
Example 4: Differencing by Hand
Given: Y = [10, 13, 18, 25, 34, 45]
First Difference: ฮY = [3, 5, 7, 9, 11] โ still shows a trend
Second Difference: ฮยฒY = [2, 2, 2, 2] โ constant! Series is now stationary.
Since d=2 differences were needed, the original series has a quadratic trend (Y โ tยฒ).
Example 5: ACF for MA(1)
Given: MA(1) model Y(t) = ฮต(t) + 0.6ฮต(t-1), ฯยฒ = 4.
Solution:
- ฮณ(0) = (1 + 0.6ยฒ) ร 4 = 1.36 ร 4 = 5.44
- ฮณ(1) = 0.6 ร 4 = 2.4
- ฮณ(k) = 0 for k โฅ 2
- ฯ(1) = 2.4/5.44 = 0.441
- ฯ(k) = 0 for k โฅ 2 โ ACF cuts off after lag 1 โ
For hand calculations, always show: (1) formula substitution, (2) intermediate arithmetic, (3) final answer with units/interpretation. For model identification, draw rough ACF/PACF sketches and match them to the table in Section 4.4.
Visual Diagrams
Original Series Y(t) Trend T(t) Seasonality S(t) Residual ฮต(t)
โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
โ โญโฎ โญโฎ โญโฎ โ โ โฑโฑ โ โ โญโฎโญโฎโญโฎโญโฎโญโฎโญโฎ โ โ โท โทโต โทโตโทโต โท โ
โ โญโฏ โฐโฎโฏ โฐโฎโฏ โฐโฎ โ โ โฑโฑโฑ โ โ โฏโฐโฏโฐโฏโฐโฏโฐโฏโฐโฏโฐโฏ โ โโต โตโท โท โต โตโท โ
โโญโฏ โฐ โฐ โฐโฎโ = โ โฑโฑโฑ โ + โ (repeating) โ + โ โท โตโท โตโท โท โต โ
โโฏ โฐโ โ โฑโฑโฑ โ โ โญโฎโญโฎโญโฎโญโฎโญโฎโญโฎ โ โโต โท โต โตโท โทโต โ
โ โ โโฑ โ โ โฏโฐโฏโฐโฏโฐโฏโฐโฏโฐโฏโฐโฏ โ โ โต โตโท โท โตโท โ
โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
Time โ Time โ Time โ Time โ
AR(2) Model MA(2) Model ARMA(1,1) Model โโโโโโโโโโโ โโโโโโโโโโโ โโโโโโโโโโโโโโโ ACF: Gradual Decay ACF: Cuts off at lag 2 ACF: Gradual Decay โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโ โ lag 1 โโโโโโโโโโโโโ โ lag 1 โโโโโโโโโโโ โ lag 1 โโโโโโโโ โ lag 2 โโโโโโโโโ โ lag 2 โโโโโโโโ โ lag 2 โโโโโ โ lag 3 โโ โ โ โ โ โ โ lag 3 (โ0) โโโโโโ โ lag 3 โโโ โ lag 4 โโ โ โ โ โ โ โ lag 4 (โ0) โโโโ โ lag 4 โโ โ lag 5 โโ โ โ โ โ โ โ lag 5 (โ0) โโโ โ lag 5 โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ PACF: Cuts off at lag 2 PACF: Gradual Decay PACF: Gradual Decay โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโ โ lag 1 โโโโโโโโโโโโโ โ lag 1 โโโโโโโโโโโ โ lag 1 โโโโโโโโ โ lag 2 โโโโโโโโ โ lag 2 โโโโโโ โ lag 2 โโ โ โ โ โ โ โ lag 3 (โ0) โโโโโ โ lag 3 โโโโ โ lag 3 โโ โ โ โ โ โ โ lag 4 (โ0) โโโ โ lag 4 โโโ โ lag 4 โโ โ โ โ โ โ โ lag 5 (โ0) โโ โ lag 5 โโ โ lag 5 โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ LSTM Cell โ
โ โ
c(t-1) โโโโโโโโโโโโโโโโโโโโคโโโ[ร]โโโโโโโโโโโ[+]โโโโโโโโโโโโโโโโโโโโโโโ c(t)
โ โ โ โ
โ โ โ โ
โ โโโดโโ โโโดโโ โ
โ โ fโ โ โ iโ โร[cฬโ] โ
โ โfor-โ โinp-โ โ
โ โget โ โut โ โ
โ โgateโ โgateโ โ
โ โโโฌโโ โโโฌโโ โ
h(t-1) โโโโโโโฌโโโโโโโโโโโโโค โ โ โ
โ โ โ โ โโโโโ โ
โ โ โโโโโโโโโโโโโโโโค tanhโโค โโโ[ร]โโโโโ h(t)
โ โ โ โ โโโโโ โ โ
โ โ โโโดโโโโโโโโโโโโโโโดโโ โโดโโ โ
โ โ โ ฯ ฯ tanh โ โoโโ โ
โ โ โ [fโ] [iโ] [cฬโ] โ โoutโ โ
โ โ โโโโโโโโโฌโโโโโโโโโโโ โputโ โ
โ โ โ โgteโ โ
โโโโโโโโโโโโโโผโโโโโโโโโโโ โโโโโ โ
โ โ
x(t) โโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
fโ = ฯ(Wfยท[h(t-1), x(t)] + bf) โ Forget gate
iโ = ฯ(Wiยท[h(t-1), x(t)] + bi) โ Input gate
cฬโ = tanh(Wcยท[h(t-1), x(t)] + bc) โ Candidate cell state
c(t) = fโ ร c(t-1) + iโ ร cฬโ โ New cell state
oโ = ฯ(Woยท[h(t-1), x(t)] + bo) โ Output gate
h(t) = oโ ร tanh(c(t)) โ Hidden state output
Original Series: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Window Size = 3, Horizon = 1
โโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโ
โ Features (X) โ Target โ
โโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโค
โ [10, 20, 30] โ 40 โ
โ [20, 30, 40] โ 50 โ
โ [30, 40, 50] โ 60 โ
โ [40, 50, 60] โ 70 โ
โ [50, 60, 70] โ 80 โ
โ [60, 70, 80] โ 90 โ
โ [70, 80, 90] โ 100 โ
โโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโ
โ Converts time series into supervised learning problem
โ For LSTM: X reshaped to (samples, timesteps, features) = (7, 3, 1)
Flowcharts
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ START: Raw Time Series Data โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 1: VISUALIZATION โ
โ โข Plot the series โข Look for trend, seasonality, outliers โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 2: STATIONARITY CHECK โ
โ โข ADF Test โข KPSS Test โข Visual inspection of mean/variance โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโดโโโโโโโโโ
โ Stationary? โ
โโโโโโโโโโฌโโโโโโโโโ
NO โฑ โ โฒ YES
โฑ โ โฒ
โผ โ โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Apply Transformations โ โ โ Step 3: MODEL IDENTIFICATION โ
โ โข Differencing (d) โ โ โ โข Plot ACF โ determine q โ
โ โข Log Transform โโโโโโ โ โข Plot PACF โ determine p โ
โ โข Seasonal diff (D) โ โ โข d = number of diffs needed โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 4: PARAMETER ESTIMATION โ
โ โข Fit ARIMA(p,d,q) or SARIMA(p,d,q)(P,D,Q,m) โ
โ โข Use MLE (Maximum Likelihood Estimation) โ
โ โข Compare AIC/BIC for candidate models โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 5: DIAGNOSTIC CHECKING โ
โ โข Ljung-Box test on residuals (should be white noise) โ
โ โข Residual ACF plot (no significant lags) โ
โ โข Q-Q plot (normality check) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโดโโโโโโโโโ
โ Residuals OK? โ
โโโโโโโโโโฌโโโโโโโโโ
NO โฑ โ โฒ YES
โฑ โ โฒ
โผ โ โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Revise model orders โ โ โ Step 6: FORECASTING โ
โ Try different (p,d,q) โโโโโโ โ โข Generate point forecasts โ
โ Consider SARIMA โ โ โข Compute confidence intervalsโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โข Monitor performance โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโ
โ Time Series Data โ
โโโโโโโโโโโโฌโโโโโโโโโโโโ
โ
โโโโโโโโโโโโดโโโโโโโโโโโโ
โ Univariate or โ
โ Multivariate? โ
โโโโโโโโโโโโฌโโโโโโโโโโโโ
Uni โฑ โฒ Multi
โฑ โฒ
โผ โผ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
โ Has โ โ VAR Model โ
โ Seasonality? โ โ or VECM โ
โโโโโโโโโฌโโโโโโโโ โโโโโโโโโโโโโโโโ
YES โฑ โฒ NO
โฑ โฒ
โผ โผ
โโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโ
โ Data Size? โ โ Data Size? โ
โโโโโโโโโฌโโโโโโโโโ โโโโโโโโโฌโโโโโโโโโ
<500โฑ โฒ>500 <500โฑ โฒ>500
โฑ โฒ โฑ โฒ
โผ โผ โผ โผ
โโโโโโโโโโโ โโโโโโโโโโ โโโโโโโโ โโโโโโโโโโโโ
โ SARIMA โ โ LSTM / โ โARIMA โ โ LSTM / โ
โ Holt- โ โProphet โ โ SES โ โ Prophet โ
โ Winters โ โ SARIMA โ โ Holt โ โ ARIMA โ
โโโโโโโโโโโ โโโโโโโโโโ โโโโโโโโ โโโโโโโโโโโโ
Raw Time Series
โ
โผ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โ Decompose โโโโโโถโ Model the โโโโโโถโ Compute โ
โ (STL / โ โ Expected โ โ Residuals โ
โ seasonal_ โ โ (ARIMA / โ โ r(t) = โ
โ decompose) โ โ Prophet) โ โ y(t) - ลท(t)โ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโฌโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโ
โ Flag where โ
โ |r(t)| > โ
โ k ร ฯ โ
โ (k=2 or 3) โ
โโโโโโโโฌโโโโโโโ
โ
โโโโโโโโโโดโโโโโโโโโ
โ Anomaly? โ
โโโโโโโโโโฌโโโโโโโโโ
YES โฑ โฒ NO
โฑ โฒ
โผ โผ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโ
โ Alert / โ โ Normal โ
โ Investigate โ โ Continue โ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโ
Python Implementation (From Scratch)
10.1 Simple Exponential Smoothing from Scratch
PYTHON
import numpy as np
class SimpleExponentialSmoothing:
"""Simple Exponential Smoothing from scratch."""
def __init__(self, alpha=0.3):
self.alpha = alpha
self.fitted_values = []
def fit(self, series):
"""Fit the SES model to a time series."""
self.series = np.array(series, dtype=float)
n = len(self.series)
self.fitted_values = np.zeros(n)
# Initialize: first forecast = first observation
self.fitted_values[0] = self.series[0]
for t in range(1, n):
self.fitted_values[t] = (
self.alpha * self.series[t - 1] +
(1 - self.alpha) * self.fitted_values[t - 1]
)
self.level = self.fitted_values[-1]
return self
def forecast(self, steps=1):
"""Forecast future values (SES gives flat forecast)."""
last_smoothed = (
self.alpha * self.series[-1] +
(1 - self.alpha) * self.level
)
return np.full(steps, last_smoothed)
def mape(self):
"""Calculate Mean Absolute Percentage Error."""
actual = self.series[1:]
forecast = self.fitted_values[1:]
return np.mean(np.abs((actual - forecast) / actual)) * 100
# Example usage
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
model = SimpleExponentialSmoothing(alpha=0.3)
model.fit(data)
print("Fitted:", np.round(model.fitted_values, 2))
print("Forecast (next 3):", np.round(model.forecast(3), 2))
print(f"MAPE: {model.mape():.2f}%")
10.2 AR(1) Model from Scratch
PYTHON
import numpy as np
class AR1Model:
"""AR(1) model: Y(t) = c + phi * Y(t-1) + epsilon."""
def __init__(self):
self.phi = None
self.c = None
self.sigma2 = None
def fit(self, series):
"""Estimate AR(1) parameters using OLS."""
y = np.array(series, dtype=float)
n = len(y)
# Set up regression: y(t) = c + phi * y(t-1)
Y = y[1:] # dependent variable
X = y[:-1] # lagged variable
# Add intercept
X_design = np.column_stack([np.ones(n - 1), X])
# OLS: beta = (X'X)^{-1} X'Y
beta = np.linalg.solve(X_design.T @ X_design, X_design.T @ Y)
self.c = beta[0]
self.phi = beta[1]
# Residuals
residuals = Y - X_design @ beta
self.sigma2 = np.var(residuals, ddof=2)
self.last_value = y[-1]
return self
def forecast(self, steps=5):
"""Generate multi-step ahead forecasts."""
predictions = []
current = self.last_value
for _ in range(steps):
next_val = self.c + self.phi * current
predictions.append(next_val)
current = next_val
return np.array(predictions)
def long_run_mean(self):
"""E[Y] = c / (1 - phi) for stationary process."""
if abs(self.phi) >= 1:
return float('inf')
return self.c / (1 - self.phi)
def is_stationary(self):
"""Check stationarity condition: |phi| < 1."""
return abs(self.phi) < 1
# Example: Fit AR(1) to synthetic data
np.random.seed(42)
n = 200
y = np.zeros(n)
y[0] = 10
for t in range(1, n):
y[t] = 3 + 0.7 * y[t-1] + np.random.normal(0, 1)
model = AR1Model()
model.fit(y)
print(f"Estimated: c={model.c:.4f}, phi={model.phi:.4f}")
print(f"True: c=3.0000, phi=0.7000")
print(f"Stationary: {model.is_stationary()}")
print(f"Long-run mean: {model.long_run_mean():.4f} (true: {3/(1-0.7):.4f})")
print(f"Forecast: {np.round(model.forecast(5), 2)}")
10.3 Augmented Dickey-Fuller Test from Scratch
PYTHON
import numpy as np
def adf_test_manual(series, max_lags=1):
"""
Manual ADF test implementation.
Tests H0: unit root (non-stationary) vs H1: stationary.
"""
y = np.array(series, dtype=float)
n = len(y)
# Compute first differences
dy = np.diff(y)
# Lagged level
y_lag = y[:-1]
# Trim to account for lags
if max_lags > 0:
# Include lagged differences
dy_lags = np.column_stack([
dy[max_lags - i - 1: n - 1 - i - 1]
for i in range(max_lags)
])
dy_trimmed = dy[max_lags:]
y_lag_trimmed = y_lag[max_lags:]
X = np.column_stack([
np.ones(len(dy_trimmed)), # intercept
y_lag_trimmed, # gamma * y(t-1)
dy_lags # lagged differences
])
else:
dy_trimmed = dy
y_lag_trimmed = y_lag
X = np.column_stack([np.ones(len(dy_trimmed)), y_lag_trimmed])
# OLS estimation
beta = np.linalg.solve(X.T @ X, X.T @ dy_trimmed)
residuals = dy_trimmed - X @ beta
# Standard error of gamma
sigma2 = np.sum(residuals**2) / (len(dy_trimmed) - len(beta))
cov_matrix = sigma2 * np.linalg.inv(X.T @ X)
se_gamma = np.sqrt(cov_matrix[1, 1])
# Test statistic
gamma = beta[1]
t_stat = gamma / se_gamma
# Critical values (approximate, for n > 100 with intercept)
critical_values = {
'1%': -3.43,
'5%': -2.86,
'10%': -2.57
}
print(f"ADF Test Statistic: {t_stat:.4f}")
print(f"Estimated gamma: {gamma:.6f}")
for level, cv in critical_values.items():
reject = "REJECT H0 (Stationary)" if t_stat < cv else "Fail to reject H0"
print(f" {level} critical value: {cv:.2f} โ {reject}")
return t_stat, gamma
# Test with a stationary series
np.random.seed(42)
stationary = np.random.normal(0, 1, 200).cumsum() # Random walk (non-stationary)
print("=== Random Walk (Non-Stationary) ===")
adf_test_manual(stationary)
print("\n=== White Noise (Stationary) ===")
white_noise = np.random.normal(0, 1, 200)
adf_test_manual(white_noise)
10.4 ACF/PACF Computation
PYTHON
import numpy as np
def compute_acf(series, max_lag=20):
"""Compute the Autocorrelation Function."""
y = np.array(series, dtype=float)
n = len(y)
mean = np.mean(y)
var = np.sum((y - mean) ** 2) / n
acf_values = []
for k in range(max_lag + 1):
if k == 0:
acf_values.append(1.0)
else:
cov = np.sum((y[k:] - mean) * (y[:-k] - mean)) / n
acf_values.append(cov / var)
return np.array(acf_values)
def compute_pacf(series, max_lag=20):
"""Compute PACF using the Durbin-Levinson algorithm."""
acf = compute_acf(series, max_lag)
n_lags = max_lag
pacf_values = np.zeros(n_lags + 1)
pacf_values[0] = 1.0
pacf_values[1] = acf[1]
phi = np.zeros((n_lags + 1, n_lags + 1))
phi[1, 1] = acf[1]
for k in range(2, n_lags + 1):
# Numerator
num = acf[k] - sum(phi[k-1, j] * acf[k-j] for j in range(1, k))
# Denominator
den = 1.0 - sum(phi[k-1, j] * acf[j] for j in range(1, k))
phi[k, k] = num / den
for j in range(1, k):
phi[k, j] = phi[k-1, j] - phi[k, k] * phi[k-1, k-j]
pacf_values[k] = phi[k, k]
return pacf_values
# Example
np.random.seed(42)
ar1_data = np.zeros(500)
for t in range(1, 500):
ar1_data[t] = 0.8 * ar1_data[t-1] + np.random.normal(0, 1)
acf = compute_acf(ar1_data, 10)
pacf = compute_pacf(ar1_data, 10)
print("Lag | ACF | PACF")
print("-" * 30)
for k in range(11):
print(f" {k:2d} | {acf[k]:6.3f} | {pacf[k]:6.3f}")
print("\nโ ACF decays exponentially, PACF cuts off after lag 1 โ AR(1) โ")
Extend the AR(1) model to AR(p). Implement an AR class that accepts a parameter p, fits using OLS with p lagged columns, and forecasts iteratively. Test with p=3 on synthetic data generated from a known AR(3) process.
TensorFlow Implementation
11.1 LSTM Time Series Forecaster
TENSORFLOW / KERAS
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler
# =========================================================
# 1. Generate / Load Data
# =========================================================
# Synthetic: trend + seasonality + noise
np.random.seed(42)
t = np.arange(0, 500)
series = 0.05 * t + 10 * np.sin(2 * np.pi * t / 50) + np.random.normal(0, 2, 500)
series = series.reshape(-1, 1)
# =========================================================
# 2. Scale Data
# =========================================================
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(series)
# =========================================================
# 3. Create Sliding Window Dataset
# =========================================================
def create_sequences(data, window_size=30):
X, y = [], []
for i in range(window_size, len(data)):
X.append(data[i - window_size:i, 0])
y.append(data[i, 0])
return np.array(X), np.array(y)
WINDOW = 30
X, y = create_sequences(scaled, WINDOW)
# Train-test split (80/20)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
# Reshape for LSTM: (samples, timesteps, features)
X_train = X_train.reshape(-1, WINDOW, 1)
X_test = X_test.reshape(-1, WINDOW, 1)
# =========================================================
# 4. Build LSTM Model
# =========================================================
model = Sequential([
LSTM(64, return_sequences=True, input_shape=(WINDOW, 1)),
Dropout(0.2),
LSTM(32, return_sequences=False),
Dropout(0.2),
Dense(16, activation='relu'),
Dense(1)
])
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss='mse',
metrics=['mae']
)
model.summary()
# =========================================================
# 5. Train
# =========================================================
early_stop = EarlyStopping(
monitor='val_loss', patience=10, restore_best_weights=True
)
history = model.fit(
X_train, y_train,
epochs=100,
batch_size=32,
validation_split=0.2,
callbacks=[early_stop],
verbose=1
)
# =========================================================
# 6. Evaluate
# =========================================================
y_pred_scaled = model.predict(X_test)
y_pred = scaler.inverse_transform(y_pred_scaled)
y_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))
mape = np.mean(np.abs((y_actual - y_pred) / y_actual)) * 100
print(f"\nRMSE: {rmse:.4f}")
print(f"MAPE: {mape:.2f}%")
# =========================================================
# 7. Multi-Step Forecast
# =========================================================
def forecast_multistep(model, last_window, steps, scaler):
"""Iteratively forecast multiple steps ahead."""
predictions = []
current = last_window.copy()
for _ in range(steps):
pred = model.predict(current.reshape(1, -1, 1), verbose=0)
predictions.append(pred[0, 0])
current = np.append(current[1:], pred[0, 0])
predictions = np.array(predictions).reshape(-1, 1)
return scaler.inverse_transform(predictions)
future = forecast_multistep(model, scaled[-WINDOW:, 0], 10, scaler)
print(f"\nNext 10 forecasts: {future.flatten().round(2)}")
11.2 Bidirectional LSTM with Attention
TENSORFLOW / KERAS
from tensorflow.keras.layers import (
Bidirectional, LSTM, Dense, Dropout,
Attention, Input, Concatenate, Flatten
)
from tensorflow.keras.models import Model
def build_attention_lstm(window_size, n_features=1):
"""LSTM with simple attention mechanism for time series."""
inp = Input(shape=(window_size, n_features))
# Bidirectional LSTM
lstm_out = Bidirectional(
LSTM(64, return_sequences=True)
)(inp)
lstm_out = Dropout(0.2)(lstm_out)
# Second LSTM layer
lstm_out2 = Bidirectional(
LSTM(32, return_sequences=True)
)(lstm_out)
# Simple attention: learn which timesteps matter most
attention = Dense(1, activation='tanh')(lstm_out2)
attention = Flatten()(attention)
attention = Dense(window_size, activation='softmax')(attention)
# Apply attention weights
# Reshape attention for element-wise multiplication
from tensorflow.keras.layers import RepeatVector, Permute, Multiply
attention = RepeatVector(64)(attention)
attention = Permute([2, 1])(attention)
context = Multiply()([lstm_out2, attention])
context = tf.keras.layers.GlobalAveragePooling1D()(context)
# Output
out = Dense(32, activation='relu')(context)
out = Dropout(0.2)(out)
out = Dense(1)(out)
model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
return model
# Build and display
attn_model = build_attention_lstm(window_size=30)
attn_model.summary()
LSTMs are powerful but come with caveats for time series: (1) They need thousands of data points โ don't use them on monthly data with only 60 observations. (2) They're a black box โ if interpretability matters (finance regulation), prefer ARIMA or Prophet. (3) Modern alternatives like Temporal Fusion Transformers and N-BEATS often outperform LSTMs. Always benchmark LSTM against a simple baseline like ARIMA first.
Scikit-Learn Pipeline
12.1 ARIMA with statsmodels + Prophet Pipeline
PYTHON
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')
# =========================================================
# 1. Load and Prepare Data
# =========================================================
# Using airline passengers dataset as example
from statsmodels.datasets import co2
data = co2.load().data
data = data.resample('M').mean().ffill()
data.columns = ['co2']
print(f"Data shape: {data.shape}")
print(data.head())
# =========================================================
# 2. Stationarity Tests
# =========================================================
def stationarity_report(series, name="Series"):
"""Run ADF and KPSS tests and print results."""
print(f"\n=== Stationarity Report: {name} ===")
# ADF Test
adf_result = adfuller(series.dropna(), autolag='AIC')
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"ADF p-value: {adf_result[1]:.6f}")
print(f"ADF โ {'Stationary' if adf_result[1] < 0.05 else 'Non-Stationary'}")
# KPSS Test
kpss_result = kpss(series.dropna(), regression='ct')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"KPSS p-value: {kpss_result[1]:.4f}")
print(f"KPSS โ {'Non-Stationary' if kpss_result[1] < 0.05 else 'Stationary'}")
stationarity_report(data['co2'], "CO2 Levels")
stationarity_report(data['co2'].diff().dropna(), "CO2 First Difference")
# =========================================================
# 3. Decomposition
# =========================================================
decomp = seasonal_decompose(data['co2'], model='additive', period=12)
# decomp.plot() # Uncomment in Jupyter
# =========================================================
# 4. Auto ARIMA using pmdarima
# =========================================================
# pip install pmdarima
import pmdarima as pm
auto_model = pm.auto_arima(
data['co2'],
seasonal=True,
m=12,
stepwise=True,
suppress_warnings=True,
trace=True,
error_action='ignore',
information_criterion='aic'
)
print(f"\nBest model: {auto_model.summary()}")
# =========================================================
# 5. Manual ARIMA Fit
# =========================================================
train = data['co2'][:'2000']
test = data['co2']['2001':]
model = ARIMA(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fitted = model.fit()
print(fitted.summary())
# Forecast
forecast = fitted.forecast(steps=len(test))
rmse = np.sqrt(np.mean((test.values - forecast.values) ** 2))
print(f"\nSARIMA RMSE: {rmse:.4f}")
# =========================================================
# 6. Exponential Smoothing
# =========================================================
from statsmodels.tsa.holtwinters import ExponentialSmoothing
hw_model = ExponentialSmoothing(
train,
trend='add',
seasonal='add',
seasonal_periods=12
).fit()
hw_forecast = hw_model.forecast(steps=len(test))
hw_rmse = np.sqrt(np.mean((test.values - hw_forecast.values) ** 2))
print(f"Holt-Winters RMSE: {hw_rmse:.4f}")
12.2 Prophet Pipeline
PYTHON
# pip install prophet
from prophet import Prophet
import pandas as pd
import numpy as np
# =========================================================
# Prophet requires columns 'ds' (date) and 'y' (value)
# =========================================================
# Prepare data
df = data.reset_index()
df.columns = ['ds', 'y']
# Train-test split
train_df = df[df['ds'] < '2001-01-01']
test_df = df[df['ds'] >= '2001-01-01']
# =========================================================
# Build Prophet Model
# =========================================================
prophet_model = Prophet(
yearly_seasonality=True,
weekly_seasonality=False,
daily_seasonality=False,
changepoint_prior_scale=0.05, # flexibility of trend
seasonality_prior_scale=10.0, # flexibility of seasonality
interval_width=0.95 # 95% prediction interval
)
# Add Indian holidays (example)
# prophet_model.add_country_holidays(country_name='IN')
prophet_model.fit(train_df)
# =========================================================
# Forecast
# =========================================================
future = prophet_model.make_future_dataframe(periods=len(test_df), freq='M')
forecast = prophet_model.predict(future)
# Evaluate
pred = forecast[forecast['ds'] >= '2001-01-01']['yhat'].values
actual = test_df['y'].values
rmse = np.sqrt(np.mean((actual - pred[:len(actual)]) ** 2))
mape = np.mean(np.abs((actual - pred[:len(actual)]) / actual)) * 100
print(f"Prophet RMSE: {rmse:.4f}")
print(f"Prophet MAPE: {mape:.2f}%")
# =========================================================
# Components
# =========================================================
# prophet_model.plot_components(forecast) # Uncomment in Jupyter
# =========================================================
# Anomaly Detection with Prophet
# =========================================================
anomalies = forecast[
(df['y'].values < forecast['yhat_lower'].values) |
(df['y'].values > forecast['yhat_upper'].values)
]
print(f"\nDetected {len(anomalies)} anomalies")
12.3 VAR Model for Multivariate Forecasting
PYTHON
from statsmodels.tsa.api import VAR
import pandas as pd
import numpy as np
# Simulated multivariate data: Nifty50 + USD/INR
np.random.seed(42)
n = 300
nifty = np.cumsum(np.random.normal(0.5, 50, n)) + 15000
usd_inr = np.cumsum(np.random.normal(0.01, 0.5, n)) + 75
gold = np.cumsum(np.random.normal(0.1, 20, n)) + 50000
df = pd.DataFrame({
'Nifty50': nifty,
'USD_INR': usd_inr,
'Gold': gold
})
# Differencing for stationarity
df_diff = df.diff().dropna()
# Fit VAR
var_model = VAR(df_diff)
# Select optimal lag order
lag_order = var_model.select_order(maxlags=15)
print("Optimal lag orders:")
print(lag_order.summary())
# Fit with optimal lag
fitted_var = var_model.fit(maxlags=5, ic='aic')
print(fitted_var.summary())
# Forecast
forecast_diff = fitted_var.forecast(df_diff.values[-5:], steps=10)
forecast_df = pd.DataFrame(forecast_diff, columns=df.columns)
# Invert differencing
last_values = df.iloc[-1]
forecast_levels = forecast_df.cumsum() + last_values.values
print("\nVAR Forecast (levels):")
print(forecast_levels.round(2))
# Granger Causality Test
from statsmodels.tsa.stattools import grangercausalitytests
print("\n=== Granger Causality: USD/INR โ Nifty50 ===")
gc_result = grangercausalitytests(
df_diff[['Nifty50', 'USD_INR']].dropna(),
maxlag=5, verbose=True
)
Quantitative Analyst (Quant): At firms like Citadel, DE Shaw, or Indian hedge funds like Edelweiss, quants build time series models for algorithmic trading. Skills needed: ARIMA/GARCH for volatility, LSTM for price prediction, and VAR for cross-asset analysis. Compensation in India: โน25-80 LPA for experienced roles.
Indian Case Studies
Case Study 1: Nifty50 Stock Index Forecasting
Context: The Nifty50 index represents the top 50 companies on the National Stock Exchange (NSE). Retail investors in India grew from 40 million in 2020 to over 130 million demat accounts by 2024.
Challenge: Build a daily Nifty50 closing price forecaster for short-term (5-day) predictions.
Approach:
- Data: 10 years of daily Nifty50 closing prices from NSE India (nse-india.com).
- Stationarity: ADF test on log-returns (stationary). Raw prices are non-stationary (unit root).
- Models tested: ARIMA(2,1,2), GARCH(1,1) for volatility, LSTM with 60-day window.
- Feature engineering: Add moving averages (20, 50, 200-day), RSI, Bollinger Bands.
Results:
- ARIMA MAPE: 1.8% (5-day forecast)
- LSTM MAPE: 1.2% (same horizon)
- Key insight: Stock prices follow a random walk โ even small improvements matter. LSTM captured regime changes (budget announcements, RBI rate decisions) better than ARIMA.
Lesson: Stock price prediction is extremely difficult (Efficient Market Hypothesis). Models are more useful for volatility estimation and risk management than price direction.
Case Study 2: IMD Monsoon Rainfall Prediction
Context: The Indian monsoon delivers ~70% of annual rainfall between June-September, critical for agriculture (which employs ~42% of the workforce). IMD issues Long Range Forecasts (LRF) in April.
Approach:
- Data: 150+ years of All-India monsoon rainfall (IITM Pune dataset).
- Predictors: ENSO (El Niรฑo), IOD (Indian Ocean Dipole), snow cover, SST.
- Models: SARIMA(1,0,1)(1,1,1,12), Multiple regression with lagged climate indices, Neural network ensemble.
- Evaluation: Categorical accuracy (normal/excess/deficit).
Results: IMD's statistical model correctly categorizes monsoon in ~70% of years. The 2023 El Niรฑo year was correctly predicted as "below normal" with a 3-month lead time.
Case Study 3: COVID-19 India Curve Forecasting
Context: During the 2020-2021 COVID-19 waves, ICMR and IIT research teams built forecasting models to predict daily cases and guide lockdown decisions.
Approach:
- Data: Daily confirmed cases, deaths, recoveries from covid19india.org API.
- Models: SIR/SEIR epidemiological models, ARIMA on log-transformed daily cases, LSTM with mobility data features, Prophet with custom changepoints for lockdown dates.
Key findings: Prophet with manual changepoints at lockdown start/end dates outperformed pure ARIMA. LSTM required careful windowing โ the second wave (Delta variant, April 2021) had fundamentally different dynamics from the first wave, causing models trained on Wave 1 to severely underestimate Wave 2 peaks.
Lesson: Time series models assume the future resembles the past. Black swan events (new variants, policy changes) break this assumption. Ensemble methods with scenario analysis work best for crisis forecasting.
Indian time series often have unique challenges: (1) Missing data during holidays/hartals, (2) Structural breaks from demonetization (Nov 2016) and GST rollout (Jul 2017), (3) Festival-driven seasonality that follows the lunar calendar (Diwali date varies by 15+ days). Always account for these in your models.
Global Case Studies
Case Study 1: Weather Forecasting (NOAA / ECMWF)
Context: Modern weather forecasting combines physics-based Numerical Weather Prediction (NWP) with statistical/ML post-processing. The ECMWF (European Centre for Medium-Range Weather Forecasts) produces the world's best 10-day forecasts.
ML Role:
- Post-processing NWP output using gradient boosting and neural networks.
- Google DeepMind's GraphCast (2023) uses GNNs to produce 10-day forecasts in under a minute (vs. hours for NWP), matching ECMWF accuracy.
- Time series methods (ARIMA, ETS) still used for localized short-term temperature/wind speed forecasting.
Metrics: RMSE for temperature (typically 1-2ยฐC for 24-hour forecasts), skill scores relative to climatological baselines.
Case Study 2: Energy Demand Forecasting (Electricity Grids)
Context: Power grid operators (like ISO New England, or POSOCO in India) must forecast electricity demand hourly to dispatch generation units efficiently. Over-forecasting wastes fuel; under-forecasting causes blackouts.
Approach:
- Features: Historical load, temperature, humidity, day of week, holidays, special events.
- Models: SARIMA for baseline, gradient boosting for complex patterns, LSTM for intra-day load shape.
- Scale: Forecasts generated for 15-minute, hourly, daily, and weekly horizons.
Results: Modern ensembles achieve MAPE of 1-3% for day-ahead forecasting. The challenge grows with renewable energy integration โ solar and wind output depends on weather, adding another layer of uncertainty.
Case Study 3: Retail Sales Forecasting (Walmart / Amazon)
Context: Walmart manages inventory for 4,700+ stores across the US. Accurate demand forecasting reduces waste, prevents stockouts, and saves billions in supply chain costs.
Approach:
- Walmart's M5 competition (2020) on Kaggle: 42,840 hierarchical time series for sales at item-store-state levels.
- Winning solutions used LightGBM with extensive feature engineering: lag features, rolling means, price changes, calendar events, SNAP benefits.
- Prophet used for interpretable decomposition of trends and holiday effects.
Results: Top solutions achieved WRMSSE (Weighted Root Mean Scaled Squared Error) of ~0.52. Key insight: simple lag features + gradient boosting beat deep learning in this competition.
Amazon's demand forecasting system (used in 2023) runs a combination of DeepAR (autoregressive RNN) and classical methods on millions of SKUs. They published the DeepAR paper showing that training a single global model across all related time series outperforms fitting individual ARIMA models to each series โ a paradigm shift called "cross-learning."
Startup Applications
15.1 FinTech โ Cash Flow Forecasting
Startups: Razorpay, CashFlo, Recko (India)
Fintech startups help SMEs forecast cash flows using transaction history. Prophet models decompose revenue into trend + weekly + monthly seasonality. Alert systems flag anomalies (sudden drops in UPI collections) using residual-based detection.
15.2 AgriTech โ Crop Yield Prediction
Startups: CropIn, Fasal, DeHaat (India)
These startups use satellite imagery time series (NDVI index over crop growth cycles) + weather time series to forecast crop yields. Models include: SARIMA for rainfall patterns, LSTM for multi-source sensor data fusion, and Prophet for price forecasting to advise farmers on optimal selling times.
15.3 HealthTech โ Patient Volume Forecasting
Startups: Practo, 1mg
Hospital bed occupancy, emergency room visits, and appointment demand follow time series patterns with weekly seasonality (higher on Mondays), annual seasonality (flu season), and pandemic-driven anomalies. Accurate forecasting enables efficient staff scheduling.
15.4 E-Commerce โ Demand Sensing
Startups: Meesho, Udaan, Flipkart
Indian e-commerce faces extreme demand spikes during sales events (Big Billion Days, Great Indian Festival). Time series models augmented with event indicators and real-time search trend data enable "demand sensing" โ adjusting forecasts hours/days ahead of traditional weekly cycles.
Data Scientist โ Forecasting: Specialized roles at companies like Amazon, Flipkart, Uber, and OLA focus exclusively on time series forecasting. Required skills: statsmodels, Prophet, deep learning, and strong statistical foundations. These roles are among the highest-paying DS positions due to direct revenue impact.
Government Applications
16.1 RBI โ Inflation Forecasting
The Reserve Bank of India uses time series models to forecast CPI inflation for its monetary policy decisions. The RBI's Quarterly Projection Model combines ARIMA with structural economic models. Accurate inflation forecasts determine repo rate changes that affect every Indian borrower.
16.2 NITI Aayog โ GDP Forecasting
India's GDP forecasting uses VAR models with high-frequency indicators (IIP, PMI, GST collections, electricity consumption) as predictors. The "nowcasting" approach uses real-time data to estimate current-quarter GDP before official statistics are released.
16.3 Ministry of Health โ Epidemic Surveillance
IDSP (Integrated Disease Surveillance Programme) monitors weekly disease incidence across India. Time series anomaly detection flags unusual spikes in dengue, malaria, and respiratory illness cases at the district level, triggering rapid response.
16.4 Smart Cities Mission โ Traffic Forecasting
Cities like Pune, Hyderabad, and Bengaluru use traffic sensor time series data to forecast congestion, optimize signal timings, and plan infrastructure. SARIMA models capture daily and weekly patterns, while LSTM models incorporate weather and event data.
India's GST (Goods and Services Tax) collections are a key economic time series. Monthly collections crossed โน2 lakh crore in April 2024. The Ministry of Finance uses time series analysis to forecast revenue and set fiscal targets for the Union Budget.
Industry Applications
17.1 Manufacturing โ Predictive Maintenance
Sensor time series (vibration, temperature, pressure) from machines are monitored using anomaly detection. LSTM autoencoders learn "normal" operating patterns and flag deviations. Companies like Tata Steel and Reliance Industries use this to prevent unplanned downtime worth crores per hour.
17.2 Telecom โ Network Traffic Forecasting
Jio, Airtel, and Vi forecast network traffic to plan capacity. Time series at the cell tower level show strong daily patterns (peaks at 8-10 PM) and weekly patterns (weekends vs. weekdays). SARIMA and Prophet handle this well, while LSTM captures special events like cricket match streaming.
17.3 Banking โ ATM Cash Demand
Banks forecast cash withdrawal patterns at each ATM to optimize replenishment schedules. The series shows strong patterns: salary days (1st and 15th), weekends, festivals, and even election days. Post-demonetization (Nov 2016), models had to rapidly adapt to fundamentally changed withdrawal patterns.
17.4 Pharma โ Drug Demand Forecasting
Pharmaceutical companies forecast drug demand considering: seasonal illness patterns (flu season), patent expiry impacts (generic entry), and supply chain lead times. Cipla and Sun Pharma use hierarchical time series models (bottom-up from SKU level to national).
17.5 Aviation โ Ticket Price Optimization
IndiGo, Air India, and SpiceJet use time series models for dynamic pricing. Historical booking curves (cumulative bookings vs. days before departure) are modeled per route-class combination. Forecasting residual demand enables revenue optimization (yield management).
| Industry | Typical Series | Preferred Model | Forecast Horizon |
|---|---|---|---|
| Finance | Stock prices, volatility | ARIMA + GARCH | 1-5 days |
| Retail | SKU-level sales | Prophet / LightGBM | 1-13 weeks |
| Energy | Electricity load | SARIMA + LSTM | 1-24 hours |
| Weather | Temperature, rainfall | NWP + ML post-processing | 1-10 days |
| Telecom | Network traffic | SARIMA + Prophet | Hours to days |
| Healthcare | Patient admissions | Holt-Winters | 1-4 weeks |
| Manufacturing | Sensor readings | LSTM Autoencoder | Real-time |
Mini Projects
๐๏ธ Mini Project 1: Stock Price Forecaster
Objective: Build an end-to-end system that forecasts the next 5 trading days of Nifty50/RELIANCE stock prices.
Steps:
- Data Collection: Use
yfinancelibrary to download 5 years of daily OHLCV data for RELIANCE.NS - EDA: Plot closing prices, compute rolling statistics, check stationarity (ADF, KPSS)
- Feature Engineering: Log returns, 20/50/200-day moving averages, RSI, MACD, Bollinger Bands
- Model 1 โ ARIMA: Fit ARIMA on log prices, use auto_arima for order selection
- Model 2 โ Prophet: Fit Prophet with custom holiday for Indian market holidays (Republic Day, Diwali, etc.)
- Model 3 โ LSTM: Window=60 days, 2-layer LSTM (64, 32 units), train on 80% data
- Comparison: Evaluate all models on test set using RMSE, MAPE, directional accuracy
- Deployment: Create a simple Streamlit app showing forecasts with confidence intervals
PYTHON
# Starter code for Mini Project 1
import yfinance as yf
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
# Download data
ticker = "RELIANCE.NS"
df = yf.download(ticker, start="2019-01-01", end="2024-12-31")
print(f"Downloaded {len(df)} rows for {ticker}")
# Basic analysis
df['Log_Return'] = np.log(df['Close'] / df['Close'].shift(1))
df['MA_20'] = df['Close'].rolling(20).mean()
df['MA_50'] = df['Close'].rolling(50).mean()
# Train-test split
train = df[:'2024-06-30']
test = df['2024-07-01':]
print(f"Train: {len(train)}, Test: {len(test)}")
# ARIMA on log prices
log_price = np.log(train['Close'])
model = ARIMA(log_price, order=(2, 1, 2))
fitted = model.fit()
forecast_log = fitted.forecast(steps=len(test))
forecast_price = np.exp(forecast_log)
rmse = np.sqrt(np.mean((test['Close'].values - forecast_price.values) ** 2))
print(f"ARIMA RMSE: โน{rmse:.2f}")
# TODO: Add Prophet and LSTM models, compare, build Streamlit app
Expected Deliverables: Jupyter notebook with visualizations, model comparison table, Streamlit dashboard, and a 2-page report discussing why stock prices are fundamentally hard to predict (EMH).
๐๏ธ Mini Project 2: Weather Predictor (Temperature Forecasting)
Objective: Forecast daily maximum temperature for an Indian city (Delhi/Mumbai/Bangalore) for the next 7 days.
Steps:
- Data: Download historical weather data from Open-Meteo API or Visual Crossing for 10+ years
- EDA: Annual seasonality (summer/winter), decompose into trend + season + residual
- Stationarity: After seasonal differencing (lag=365), check with ADF
- Model 1 โ SARIMA: SARIMA(p,d,q)(P,D,Q,365) โ challenging due to long seasonal period, consider seasonal_period=7 for weekly patterns
- Model 2 โ Holt-Winters: Additive seasonality with period=365
- Model 3 โ Prophet: Captures yearly + weekly seasonality automatically
- Model 4 โ LSTM: Window=90 days, include humidity and wind speed as extra features
- Ensemble: Simple average of top 2 models' forecasts
PYTHON
# Starter code for Mini Project 2
import pandas as pd
import numpy as np
import requests
# Fetch data from Open-Meteo API
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
"latitude": 28.6139, # Delhi
"longitude": 77.209,
"start_date": "2014-01-01",
"end_date": "2024-12-31",
"daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
"timezone": "Asia/Kolkata"
}
response = requests.get(url, params=params)
data = response.json()
df = pd.DataFrame({
'date': pd.to_datetime(data['daily']['time']),
'temp_max': data['daily']['temperature_2m_max'],
'temp_min': data['daily']['temperature_2m_min'],
'precip': data['daily']['precipitation_sum']
})
df.set_index('date', inplace=True)
print(f"Delhi weather data: {len(df)} days")
print(df.describe())
# Prophet model
from prophet import Prophet
prophet_df = df[['temp_max']].reset_index()
prophet_df.columns = ['ds', 'y']
model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(prophet_df)
future = model.make_future_dataframe(periods=7)
forecast = model.predict(future)
print("\nNext 7-day Temperature Forecast:")
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(7))
Expected Deliverables: Complete notebook, model comparison with RMSE/MAPE, visualization of forecast vs actual for test period, and discussion of which model handles Delhi's extreme summer (45ยฐC+) best.
๐๏ธ Mini Project 3: Anomaly Detection in Server Metrics
Objective: Build a real-time anomaly detection system for server CPU/memory time series.
Steps:
- Generate synthetic server metrics with injected anomalies (spikes, level shifts, trend changes)
- Implement statistical method: Z-score on rolling window residuals
- Implement Prophet-based method: Flag points outside prediction intervals
- Implement LSTM Autoencoder: Reconstruction error-based anomaly scoring
- Compare precision/recall/F1 on known anomalies
- Build a dashboard showing real-time metric with anomaly highlights
Exercises
Given the series [10, 12, 15, 19, 24, 30], compute the first and second differences by hand. Is the first difference stationary? Is the second?
For an AR(1) model Y(t) = 2 + 0.5Y(t-1) + ฮต(t), compute: (a) the long-run mean, (b) ฯ(1), ฯ(2), ฯ(3), (c) Y(t+1) if Y(t) = 8.
Explain the difference between ACF and PACF. How do you use them together to identify ARIMA orders?
For an MA(1) model Y(t) = 10 + ฮต(t) + 0.4ฮต(t-1) with ฯยฒ=9, compute: (a) E[Y(t)], (b) Var(Y(t)), (c) ฯ(1), (d) ฯ(2).
Apply SES with ฮฑ=0.2 to [200, 180, 190, 210, 195, 205, 220]. Start with ลท(1)=200. Compute all forecasts and the final forecast for period 8.
A series has ADF p-value = 0.34 and KPSS p-value = 0.01. What do you conclude? What action should you take?
You see ACF that decays gradually and PACF that cuts off sharply at lag 3. What model would you identify? Write the model equation.
Explain why a random walk Y(t) = Y(t-1) + ฮต(t) is non-stationary. What is its variance at time t?
Compute MAPE, RMSE, and SMAPE for: Actual = [50, 60, 70, 80, 90], Forecast = [52, 58, 73, 76, 95].
What is the difference between additive and multiplicative decomposition? Give a real-world Indian example where multiplicative is preferred.
Write Python code to perform seasonal decomposition on monthly airline passenger data using statsmodels. Plot and interpret each component.
For SARIMA(1,1,1)(1,1,1,12), how many parameters need to be estimated? Write out the full model equation using backshift notation.
Prove that the variance of an AR(1) process Y(t) = ฯY(t-1) + ฮต(t) is ฯยฒ/(1-ฯยฒ). What happens as |ฯ| โ 1?
Implement Holt-Winters (additive) from scratch in Python. Test on a series with clear trend and seasonality. Compare your output with statsmodels.
Build an LSTM model in TensorFlow for multi-step forecasting (predict 7 days ahead simultaneously). Compare seq2one (iterate) vs seq2seq (direct) approaches.
Use Prophet to forecast Indian GST monthly collections. Add custom holidays for Diwali, Eid, and Republic Day. How do holidays affect GST?
Implement a VAR(2) model for Nifty50 and USD/INR exchange rate. Perform Granger causality tests. Does one "cause" the other?
Create a time series anomaly detector using z-scores on rolling window residuals. Test it on synthetic data with 5 injected anomalies.
Compare AIC and BIC for model selection. Fit ARIMA(1,1,0), ARIMA(0,1,1), ARIMA(1,1,1), and ARIMA(2,1,1) on a dataset. Which does AIC prefer? Which does BIC prefer? Why might they differ?
What is the Box-Cox transformation? If the optimal ฮป=0.5, what transformation is applied? How do you inverse-transform forecasts?
Build an LSTM autoencoder for time series anomaly detection. Train on normal data only, then use reconstruction error to flag anomalies in a test set containing anomalies.
Explain cross-validation for time series. Why can't you use standard k-fold? Implement expanding window and sliding window CV schemes.
Multiple Choice Questions
Interview Questions
Model Answer: A stationary time series has constant mean, variance, and autocovariance over time. Most classical models (ARIMA, VAR) assume stationarity because their parameters are estimated from the entire series โ if statistical properties change over time, the estimated parameters are meaningless averages. Non-stationary series also lead to spurious regression (two unrelated trending series appear correlated). We achieve stationarity through differencing, log transforms, or detrending.
Model Answer: AR(p) models the current value as a linear function of p past values โ it captures momentum/persistence. MA(q) models the current value as a function of q past forecast errors โ it captures shocks. ARIMA(p,d,q) combines both with d rounds of differencing to handle non-stationarity. AR captures gradual effects (autocorrelation), MA captures sudden effects (error corrections), and the I (Integration) handles trends.
Model Answer: Options include: (1) SARIMA with seasonal differencing and seasonal AR/MA terms, (2) Holt-Winters with explicit trend and seasonal components, (3) Prophet which models trend (piecewise-linear) and seasonality (Fourier terms) separately, (4) STL decomposition to remove trend/season, then model residuals with ARIMA. For large datasets, LSTM with calendar features works well. I'd start with decomposition to understand the components, then choose the model based on data size and forecasting horizon.
Model Answer: Choose Prophet when: you have multiple seasonalities (daily + weekly + yearly), many missing values, you need to add holiday effects easily, or when you're forecasting at scale (thousands of series with minimal tuning). Choose ARIMA when: you have few data points (<100), you need a purely statistical approach with confidence intervals grounded in theory, or when the series is well-modeled by a low-order ARIMA (interpretable coefficients). ARIMA is also faster for single-series inference.
Model Answer: (1) Forward fill (locf) โ carry last observation forward, (2) Linear interpolation โ works for gradual changes, (3) Seasonal interpolation โ fill with value from same season last year, (4) Model-based imputation โ fit ARIMA on available data, fill gaps with predictions, (5) Prophet handles missing values natively. The key is understanding whether missingness is random (MCAR) or systematic (sensor failure = extended gaps). Never drop rows in time series โ it destroys temporal structure.
Model Answer: Granger Causality tests whether past values of series X improve the forecast of series Y beyond what past values of Y alone provide. If including lagged X significantly reduces the forecast error (F-test p-value < 0.05), we say X "Granger-causes" Y. However, it does NOT imply true causation โ it's about predictive information, not causal mechanism. Two series could be Granger-causal because of a confounding third variable. It's a statistical concept, not a causal one.
Model Answer: Standard k-fold CV randomly shuffles data, breaking temporal dependencies and causing data leakage (future data used to predict past). Instead, use: (1) Train-test split preserving order (last 20% as test), (2) Expanding window CV โ train on data up to time t, test on t+1 to t+h, then expand, (3) Sliding window CV โ fixed-size training window moves forward. Metrics: RMSE for magnitude, MAPE for relative error, SMAPE for symmetry, and directional accuracy for trading applications.
Model Answer: (1) Data hungry โ needs thousands of points, unsuitable for short series, (2) Black box โ no interpretable coefficients, problematic for regulated industries, (3) Training time and compute cost โ much slower than ARIMA, (4) Sensitive to hyperparameters โ window size, layers, units, learning rate, (5) Multi-step forecasting error compounds โ iterative forecasting accumulates errors, (6) Doesn't inherently model seasonality โ must be engineered as features, (7) Modern alternatives (Transformers, N-BEATS) often outperform on benchmarks.
Model Answer: At scale, I'd use a multi-tier approach: (1) Statistical baselines โ rolling z-score (fast, no training), flag |z| > 3, (2) Seasonal-aware โ STL decomposition to remove season, then z-score on residuals, (3) For important series โ Prophet with uncertainty intervals, or Isolation Forest on feature-engineered representation, (4) Training a single LSTM autoencoder on all "normal" series and using reconstruction error, (5) Streaming approaches โ online algorithms (ADWIN) for concept drift detection. The key is balancing precision (avoiding false alarms) with recall (catching real anomalies).
Model Answer: ADF p-value 0.6 โ fail to reject Hโ (unit root), so ADF says non-stationary. KPSS p-value 0.01 โ reject Hโ (stationary), so KPSS also says non-stationary. Both tests agree: the series is non-stationary. Action: apply differencing (d=1), then retest. If the disagreement case arose (ADF says stationary, KPSS says not, or vice versa), the series is "near unit root" โ consider fractional differencing or detrending as alternatives to full differencing.
Model Answer: Holt-Winters directly models level, trend, and seasonality with smoothing equations โ it's intuitive and computationally light. ARIMA models the differenced series through AR and MA terms โ more flexible, with formal diagnostic tools (Ljung-Box, AIC/BIC). Prefer Holt-Winters when: the series has clear trend + seasonality, you need quick results, or for production systems needing fast updates. Prefer ARIMA when: the series is complex (multiple differencing needed), you need formal statistical inference, or for academic rigor.
Research Problems
๐ฌ Research Problem 1: Foundation Models for Indian Time Series
Problem: Large language models have been adapted for time series (TimeGPT, Chronos). Can a foundation model pre-trained on diverse Indian time series (weather, stocks, agriculture, telecom) transfer knowledge across domains? How does cross-domain pre-training compare to domain-specific models?
Approach: Collect a large corpus of Indian time series data across 5+ domains. Pre-train a Transformer-based model. Evaluate zero-shot and few-shot performance on held-out series from each domain. Compare with domain-specific ARIMA and LSTM models.
References: Ansari et al. (2024) "Chronos: Learning the Language of Time Series"; Garza & Mergenthaler-Canseco (2023) "TimeGPT-1".
๐ฌ Research Problem 2: Causal Discovery in Multivariate Indian Economic Time Series
Problem: India's economic indicators (CPI, WPI, repo rate, GDP, Nifty50, USD/INR, crude oil) form a complex web of causal relationships. Can we move beyond Granger causality to discover true causal structure using modern causal inference methods (PCMCI, Convergent Cross-Mapping)?
Approach: Apply PCMCI (Runge et al., 2019) to monthly data from RBI, MOSPI, and NSE. Compare causal graphs with Granger causality results. Validate against known economic mechanisms (e.g., RBI rate โ lending rates โ GDP).
๐ฌ Research Problem 3: Forecasting Indian Monsoon Extremes with Deep Learning
Problem: Climate change is making Indian monsoon patterns more erratic. Can attention-based deep learning models (Temporal Fusion Transformers) improve prediction of extreme rainfall events (>100mm/day) at the district level, with lead times of 1-7 days?
Approach: Use IMD gridded rainfall data (0.25ยฐ resolution), ERA5 reanalysis data, and sea surface temperature as inputs. Train a TFT with custom loss function penalizing missed extremes. Compare with IMD's current NWP-based warnings.
๐ฌ Research Problem 4: Explainable Time Series Anomaly Detection for Smart Cities
Problem: Current anomaly detection in urban sensor networks (traffic, pollution, water quality) produces alerts but rarely explains WHY an anomaly occurred. Can we build interpretable anomaly detectors that provide natural language explanations linking anomalies to root causes?
Approach: Combine SHAP-based feature attribution on gradient boosting models with LLM-generated explanations. Train on labeled historical anomalies from Pune Smart City data where root causes are known (water main break, festival traffic, industrial pollution event). Evaluate explanation quality with domain experts.
Key Takeaways
References
Foundational Texts
- Box, G.E.P., Jenkins, G.M., Reinsel, G.C., & Ljung, G.M. (2015). Time Series Analysis: Forecasting and Control, 5th Edition. Wiley. โ The classic reference.
- Hyndman, R.J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd Edition. OTexts. Free online: otexts.com/fpp3 โ The best modern textbook.
- Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press. โ Graduate-level econometric treatment.
- Shumway, R.H. & Stoffer, D.S. (2017). Time Series Analysis and Its Applications, 4th Edition. Springer.
Key Research Papers
- Dickey, D.A. & Fuller, W.A. (1979). "Distribution of the Estimators for Autoregressive Time Series with a Unit Root." JASA, 74(366), 427-431.
- Kwiatkowski, D. et al. (1992). "Testing the null hypothesis of stationarity against the alternative of a unit root." J. of Econometrics, 54(1-3), 159-178.
- Hochreiter, S. & Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation, 9(8), 1735-1780.
- Taylor, S.J. & Letham, B. (2018). "Forecasting at Scale." The American Statistician, 72(1), 37-45. โ The Prophet paper.
- Salinas, D. et al. (2020). "DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks." International J. of Forecasting, 36(3), 1181-1191.
- Lim, B. et al. (2021). "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting." International J. of Forecasting, 37(4), 1748-1764.
- Oreshkin, B.N. et al. (2020). "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting." ICLR 2020.
- Lam, R. et al. (2023). "Learning Skillful Medium-Range Global Weather Forecasting." Science, 382(6677), 1416-1421. โ GraphCast.
- Ansari, A.F. et al. (2024). "Chronos: Learning the Language of Time Series." arXiv:2403.07815.
Indian Context References
- Rajeevan, M. et al. (2012). "Analysis of variability and trends of extreme rainfall events over India using 104 years of gridded daily rainfall data." Geophysical Research Letters, 39(6).
- Reserve Bank of India (2023). "Report on Currency and Finance 2022-23." RBI Publications. โ Contains RBI's forecasting methodology.
- Indian Meteorological Department (2024). "End of Season Report โ Southwest Monsoon 2023." IMD, Ministry of Earth Sciences.
- Sahoo, B.B. et al. (2019). "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting." Acta Geophysica, 67, 1471-1481.
Software & Libraries
- statsmodels: statsmodels.org โ Python statistical models including ARIMA, VAR.
- Prophet: facebook.github.io/prophet โ Meta's forecasting library.
- pmdarima: alkaline-ml.com/pmdarima โ Auto-ARIMA for Python.
- TensorFlow/Keras: tensorflow.org โ LSTM implementation.
- Nixtla: github.com/Nixtla โ TimeGPT and StatsForecast libraries.
- Darts: unit8co.github.io/darts โ Unified time series library supporting many models.
Online Resources
- Kaggle โ M5 Forecasting Competition: kaggle.com/c/m5-forecasting-accuracy
- IITM Pune โ Indian Rainfall Data: tropmet.res.in
- NSE India Historical Data: nseindia.com
Time series forecasting is where theory meets humility. George Box's famous quote โ "All models are wrong, but some are useful" โ is most true here. No model can predict a pandemic, a demonetization, or a war. The best forecasters combine statistical rigor with domain knowledge, scenario planning, and honest uncertainty quantification. As you build your career, remember: a simple model with good uncertainty intervals is more useful than a complex model that's overconfident.