Computer Organization & Architecture

Unit 6: Memory Unit

From registers to hard drives โ€” master the memory hierarchy, cache mapping techniques, virtual memory, and solve GATE-level numericals with confidence.

โฑ๏ธ 8 hrs theory + 5 hrs lab  |  ๐ŸŽฏ GATE ~4 marks  |  ๐Ÿ–ฅ๏ธ Snapdragon Cache

๐Ÿ’ผ Jobs this unlocks: VLSI Design Engineer (โ‚น6โ€“12 LPA)  |  Embedded Systems Developer (โ‚น5โ€“10 LPA)  |  SoC Verification Engineer (โ‚น8โ€“15 LPA)

Section A

Opening Hook โ€” Why Does Your Laptop Slow Down with 100 Tabs?

๐Ÿ–ฅ๏ธ The Mystery of the 100-Tab Slowdown

You've done it. We all have. You open Chrome, start with 5 tabs, then 20, then 50โ€ฆ and by the time you hit 100 tabs, your laptop turns into a space heater that can barely scroll. Your fancy 16 GB RAM machine is now slower than a โ‚น5,000 phone. Why?

The answer lies in the memory hierarchy. Your CPU doesn't just grab data from RAM. It first checks its tiny ultra-fast L1 cache (32 KB, ~1 ns). Miss? It checks the L2 cache (256 KB, ~5 ns). Still miss? L3 cache (8 MB, ~20 ns). All misses? It finally goes to RAM (16 GB, ~100 ns). But with 100 tabs, even RAM fills up, and the OS starts using your SSD as virtual memory โ€” that's 1000ร— slower than RAM. That's the slowdown.

Qualcomm's Snapdragon 8 Gen 3 chip (inside your Samsung Galaxy S24) has a 12 MB L3 cache designed by Indian engineers in Hyderabad. Apple's M3 has a 36 MB L2. Every nanosecond saved in cache design translates to billions of dollars in market advantage. This chapter teaches you exactly how that works.

๐Ÿ‡ฎ๐Ÿ‡ณ Qualcomm Hyderabad๐Ÿ‡ฎ๐Ÿ‡ณ Samsung SemiconductorIntelApple SiliconAMD๐Ÿ‡ฎ๐Ÿ‡ณ ISRO NavIC
A single L1 cache access (~1 ns) vs a hard disk access (~10 ms) is a 10,000,000ร— speed difference. If L1 cache speed were a blink of your eye (300 ms), then waiting for a hard disk would be equivalent to waiting 95 years. That's why cache design is the most performance-critical job in chip companies like Qualcomm India.
Section B

Learning Outcomes โ€” Bloom's Taxonomy Mapped

Bloom's LevelLearning Outcome
๐Ÿ”ต RememberList the levels of the memory hierarchy with access times, sizes, and cost per bit
๐Ÿ”ต RememberDefine cache memory, hit ratio, miss penalty, TLB, and page fault
๐ŸŸข UnderstandExplain how direct mapping, fully associative, and set-associative mapping work with tag/line/word fields
๐ŸŸข UnderstandDescribe virtual memory organisation including page tables, TLB, and demand paging
๐ŸŸก ApplyCompute tag, line, and word bits for a given cache configuration and calculate AMAT
๐ŸŸก ApplyTrace a reference string through cache using FIFO replacement and calculate hit rate
๐ŸŸ  AnalyzeCompare write-through vs write-back policies and analyse their performance trade-offs
๐ŸŸ  AnalyzeAnalyse why set-associative mapping is preferred over direct and fully associative in modern CPUs
๐Ÿ”ด EvaluateEvaluate the cache design trade-offs in Snapdragon vs Apple Silicon processors
๐Ÿ”ด EvaluateAssess the impact of page size on TLB miss rate and internal fragmentation
๐ŸŸฃ CreateDesign a 2-level cache hierarchy for a given workload with AMAT constraints
๐ŸŸฃ CreateSimulate a cache replacement algorithm for a given reference string and propose optimisations
Section C

Concept Explanation โ€” Memory Unit from Scratch

1. Memory Hierarchy โ€” The Speed-Size-Cost Pyramid

Imagine a library. You keep your most-used notes on your desk (registers โ€” fastest, tiny). Books you need today are on the shelf beside you (cache). The library room has thousands of books (RAM). The basement archive has millions (SSD/HDD). Each level is bigger but slower. A computer's memory works exactly the same way.

๐Ÿ”บ The Complete Memory Hierarchy Pyramid

        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ”‚ REGISTERS โ”‚  โ† 0.3 ns | 256 Bโ€“2 KB | โ‚นโ‚นโ‚นโ‚นโ‚น (on-chip)
        โ”‚  (CPU)    โ”‚     Flip-flops, zero latency for ALU
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚  L1 CACHE โ”‚  โ† 1 ns   | 32โ€“64 KB   | โ‚นโ‚นโ‚นโ‚น  (on-chip SRAM)
        โ”‚ (per core)โ”‚     Split: I-cache + D-cache
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚  L2 CACHE โ”‚  โ† 5 ns   | 256 KBโ€“1 MB| โ‚นโ‚นโ‚น   (on-chip SRAM)
        โ”‚ (per core)โ”‚     Unified instruction + data
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚  L3 CACHE โ”‚  โ† 20 ns  | 4โ€“36 MB    | โ‚นโ‚น    (shared SRAM)
        โ”‚ (shared)  โ”‚     Shared across all cores
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚ MAIN MEM  โ”‚  โ† 100 ns | 4โ€“64 GB    | โ‚น     (DRAM)
        โ”‚  (RAM)    โ”‚     Volatile, row/column addressing
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚   SSD     โ”‚  โ† 50 ฮผs  | 256 GBโ€“4 TB| โ‚น/10  (NAND Flash)
        โ”‚(secondary)โ”‚     Non-volatile, no moving parts
        โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
        โ”‚   HDD     โ”‚  โ† 10 ms  | 1โ€“20 TB    | โ‚น/100 (magnetic)
        โ”‚(secondary)โ”‚     Spinning platters, mechanical arm
        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Speed:  โ—„โ”€โ”€โ”€โ”€โ”€โ”€ FASTEST โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ SLOWEST โ”€โ”€โ”€โ”€โ”€โ”€โ–บ
  Size:   โ—„โ”€โ”€โ”€โ”€โ”€โ”€ SMALLEST โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ LARGEST โ”€โ”€โ”€โ”€โ”€โ”€โ–บ
  Cost/b: โ—„โ”€โ”€โ”€โ”€โ”€โ”€ MOST EXPENSIVE โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ CHEAPEST โ”€โ”€โ”€โ”€โ”€โ–บ
LevelTechnologyAccess TimeTypical SizeCost/GB (approx.)Volatile?
RegistersFlip-flops0.3 ns~1 KBโ€”Yes
L1 CacheSRAM1 ns32โ€“64 KB~โ‚น5,00,000Yes
L2 CacheSRAM5 ns256 KBโ€“1 MB~โ‚น2,00,000Yes
L3 CacheSRAM20 ns4โ€“36 MB~โ‚น50,000Yes
RAMDRAM100 ns4โ€“64 GB~โ‚น250Yes
SSDNAND Flash50 ฮผs256 GBโ€“4 TB~โ‚น5No
HDDMagnetic10 ms1โ€“20 TB~โ‚น2No
Qualcomm's Snapdragon 8 Gen 3 (designed in Hyderabad & Bangalore) features: 64 KB L1 I-cache + 64 KB L1 D-cache per Cortex-X4 core, 1 MB L2 per core, and a 12 MB shared L3 cache. This cache hierarchy is what makes your Samsung/OnePlus phone run games at 120 FPS without a desktop-class RAM.
GATE Favourite: The key principle is locality of reference. Temporal locality = if you accessed data now, you'll likely access it again soon (loops). Spatial locality = if you accessed address X, you'll likely access X+1 soon (arrays). Caches exploit both.

2. Cache Memory โ€” Structure & Organisation

Cache memory is a small, fast SRAM buffer between the CPU and main memory. Its job: keep the most frequently accessed data close to the CPU so the processor doesn't waste 100 ns waiting for RAM every time.

๐Ÿ—๏ธ Cache Memory Block Diagram

   CPU                           CACHE                        MAIN MEMORY
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”                   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚     โ”‚ โ”€โ”€ Address โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚              โ”‚               โ”‚              โ”‚
  โ”‚ CPU โ”‚                   โ”‚  Tag  Array  โ”‚โ”€โ”€ Miss โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚     RAM      โ”‚
  โ”‚     โ”‚โ—„โ”€โ”€ Data โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚  Data Array  โ”‚โ—„โ”€โ”€ Block โ”€โ”€โ”€โ”€โ”€โ”‚   (DRAM)     โ”‚
  โ”‚     โ”‚                   โ”‚  Valid Bits  โ”‚               โ”‚              โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”˜                   โ”‚  Dirty Bits  โ”‚               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Cache Line Structure:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Valid โ”‚  Tag  โ”‚     Data Block (B bytes)              โ”‚
  โ”‚  (1b) โ”‚(t bits)โ”‚  Wordโ‚€ โ”‚ Wordโ‚ โ”‚ Wordโ‚‚ โ”‚ ... โ”‚ Wโ‚™  โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key terms:

โ€ข Cache Line (Block): The smallest unit of data transferred between cache and RAM. Typical: 32 or 64 bytes.

โ€ข Tag: Identifies which main memory block is currently stored in this cache line.

โ€ข Valid bit: 1 = line has valid data, 0 = empty/invalid.

โ€ข Dirty bit: (Write-back only) 1 = line modified, needs to be written back to RAM.

โ€ข Hit: Requested data found in cache. Miss: Not found โ†’ fetch from RAM.

โ€ข Hit Ratio (h): h = (Number of hits) / (Total accesses). Typical: 0.90โ€“0.99.

Modern Intel CPUs achieve L1 hit rates of 95โ€“97%. This means out of every 100 memory accesses, only 3โ€“5 actually need to go to L2 or beyond. That's the magic of good cache design + spatial/temporal locality.

3. Direct Mapping โ€” [Tag | Line | Word]

Analogy: Think of a hostel with 8 rooms. Each student is assigned a fixed room based on their roll number: Room = Roll % 8. Student 0, 8, 16, 24 all map to Room 0. If Student 0 is in Room 0 and Student 8 arrives, Student 0 gets kicked out. No choice โ€” it's direct mapping.

๐Ÿ“ Direct Mapped Cache โ€” Address Breakdown

  Given: Main Memory = 2โฟ bytes, Cache Lines = 2หก, Block Size = 2สท bytes

  CPU Address (n bits):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚     TAG      โ”‚  LINE/INDEX  โ”‚  WORD OFFSET โ”‚
  โ”‚  (n-l-w) bitsโ”‚   (l bits)   โ”‚   (w bits)   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Mapping Formula:
    Cache Line Number = (Main Memory Block Number) mod (Number of Cache Lines)
    Line Number = Block Address mod 2หก

  Example: 32-bit address, 512 lines, 4 words/block (16 bytes)
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚     TAG (19)     โ”‚ LINE (9)  โ”‚ W(4) โ”‚
  โ”‚   19 bits        โ”‚  9 bits   โ”‚ 4 bitsโ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  
  Total = 19 + 9 + 4 = 32 bits โœ“

  Direct Mapped Cache Layout:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Valid โ”‚ Tag โ”‚ Wordโ‚€  โ”‚ Wordโ‚  โ”‚ Wordโ‚‚  โ”‚ Wordโ‚ƒ โ”‚  โ† Line 0
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Valid โ”‚ Tag โ”‚ Wordโ‚€  โ”‚ Wordโ‚  โ”‚ Wordโ‚‚  โ”‚ Wordโ‚ƒ โ”‚  โ† Line 1
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚  ...  โ”‚ ... โ”‚  ...   โ”‚  ...   โ”‚  ...   โ”‚  ...  โ”‚
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Valid โ”‚ Tag โ”‚ Wordโ‚€  โ”‚ Wordโ‚  โ”‚ Wordโ‚‚  โ”‚ Wordโ‚ƒ โ”‚  โ† Line 511
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

How a lookup works:

  1. Extract the LINE bits from the CPU address โ†’ go to that cache line
  2. Compare the TAG field of that line with the TAG bits from the address
  3. If TAG matches AND Valid=1 โ†’ HIT! Use the WORD OFFSET to pick the right word
  4. If TAG doesn't match or Valid=0 โ†’ MISS! Fetch block from RAM, replace this line
Students confuse "word offset" with "byte offset." If the block has 4 words (each word = 4 bytes), the word offset needs 2 bits (to select which word). But if the question says "byte-addressable," you need 2 extra bits to select within a word. Always check: is the address in words or bytes?

4. Fully Associative Mapping โ€” [Tag | Word]

Analogy: Unlike the hostel (direct mapping), think of a parking lot with 8 spots. Any car can park in any spot. When a new car arrives and the lot is full, you use a replacement policy (kick out the oldest = FIFO, kick out least recently used = LRU). Maximum flexibility, but you need to check all spots simultaneously.

๐Ÿ“ Fully Associative Cache โ€” Address Breakdown

  CPU Address (n bits):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚          TAG             โ”‚  WORD OFFSET โ”‚
  โ”‚      (n - w) bits        โ”‚   (w bits)   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  NO LINE/INDEX field! Any block can go in ANY cache line.

  Example: 32-bit address, Block = 16 bytes (w = 4)
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚         TAG (28)           โ”‚ W(4) โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Lookup: CPU sends tag โ†’ ALL lines compare simultaneously (parallel comparators)
  
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Valid โ”‚   Tag (28)  โ”‚  Data Block (16 bytes)   โ”‚  โ† Line 0  โ”€โ”
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค              โ”‚
  โ”‚ Valid โ”‚   Tag (28)  โ”‚  Data Block (16 bytes)   โ”‚  โ† Line 1   โ”œโ”€ All compared
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค              โ”‚  in PARALLEL
  โ”‚  ...  โ”‚    ...      โ”‚        ...               โ”‚              โ”‚
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค              โ”‚
  โ”‚ Valid โ”‚   Tag (28)  โ”‚  Data Block (16 bytes)   โ”‚  โ† Line N  โ”€โ”˜
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
              โ–ฒ Compare with incoming tag

Advantage: No conflict misses โ€” any block can go anywhere.

Disadvantage: Expensive! Needs a comparator for every cache line. Hardware cost scales with cache size.

Used for: TLBs (small, needs high hit rate), small L1 caches in some designs.

5. Set-Associative Mapping โ€” The Best of Both Worlds

Analogy: Compromise! Instead of one fixed room (direct) or any room (associative), we have hostels (sets), each with a few rooms (ways). A student must go to their assigned hostel but can pick any room inside it. This gives flexibility within a set while keeping hardware cost reasonable.

๐Ÿ“ K-Way Set-Associative Cache (2-Way Example)

  CPU Address (n bits):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚     TAG      โ”‚  SET INDEX   โ”‚  WORD OFFSET โ”‚
  โ”‚(n - s - w) b โ”‚   (s bits)   โ”‚   (w bits)   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Number of Sets = Total Lines / K  (where K = associativity)
  s = logโ‚‚(Number of Sets)

  2-Way Set-Associative Cache Layout (4 sets, 8 total lines):

         Way 0                    Way 1
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ† Set 0
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ† Set 1
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ† Set 2
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ”‚ Vโ”‚Tag โ”‚Data โ”‚      โ”‚  โ† Set 3
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Lookup Process:
  1. Use SET INDEX โ†’ go to that set
  2. Compare TAG with BOTH Way 0 and Way 1 simultaneously
  3. If either matches (and Valid=1) โ†’ HIT
  4. Both miss โ†’ MISS โ†’ replace one way (FIFO/LRU)

  Special Cases:
  โ€ข K = 1 (1-way)  โ†’ Direct Mapped
  โ€ข K = N (N-way)  โ†’ Fully Associative
  โ€ข K = 2 or 4     โ†’ Most common in modern CPUs
GATE Trick: In a K-way set-associative cache with C total lines: Number of Sets = C/K, Set Index bits = logโ‚‚(C/K). The tag bits increase as K increases (because set index bits decrease). Direct mapping has the most index bits; fully associative has zero.

6. Cache Hit/Miss Trace โ€” Reference String with FIFO

Let's trace how a cache handles a sequence of memory references. This is a classic GATE question type.

๐Ÿ“Š Worked Example: 4-Line Direct-Mapped Cache with FIFO

Setup: 4 cache lines (lines 0โ€“3), direct-mapped, block size = 1 word.

Reference String (block numbers): 0, 8, 0, 6, 8, 2, 0, 6

Mapping: Cache Line = Block Number mod 4

  Block โ†’ Cache Line:
  Block 0 โ†’ Line 0 (0 mod 4 = 0)
  Block 8 โ†’ Line 0 (8 mod 4 = 0)  โ† CONFLICT with Block 0!
  Block 6 โ†’ Line 2 (6 mod 4 = 2)
  Block 2 โ†’ Line 2 (2 mod 4 = 2)  โ† CONFLICT with Block 6!

  Trace Table:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Step โ”‚ Request โ”‚ Line 0 โ”‚ Line 1 โ”‚ Line 2 โ”‚ Line 3 โ”‚ Hit/Miss โ”‚
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚  1   โ”‚  Blk 0  โ”‚  [0]   โ”‚   โ€”    โ”‚   โ€”    โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  2   โ”‚  Blk 8  โ”‚  [8]   โ”‚   โ€”    โ”‚   โ€”    โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  3   โ”‚  Blk 0  โ”‚  [0]   โ”‚   โ€”    โ”‚   โ€”    โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  4   โ”‚  Blk 6  โ”‚  [0]   โ”‚   โ€”    โ”‚  [6]   โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  5   โ”‚  Blk 8  โ”‚  [8]   โ”‚   โ€”    โ”‚  [6]   โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  6   โ”‚  Blk 2  โ”‚  [8]   โ”‚   โ€”    โ”‚  [2]   โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  7   โ”‚  Blk 0  โ”‚  [0]   โ”‚   โ€”    โ”‚  [2]   โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ”‚  8   โ”‚  Blk 6  โ”‚  [0]   โ”‚   โ€”    โ”‚  [6]   โ”‚   โ€”    โ”‚  MISS    โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Hits = 0, Misses = 8
  Hit Rate = 0/8 = 0% (Terrible! All conflict misses)

This is the worst case for direct mapping โ€” all references map to just 2 lines, causing constant thrashing. A 2-way set-associative cache would dramatically improve this.

Now with 2-Way Set-Associative (2 sets, 2 ways each):

  Set = Block mod 2
  Block 0 โ†’ Set 0 | Block 8 โ†’ Set 0 | Block 6 โ†’ Set 0 | Block 2 โ†’ Set 0

  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Step โ”‚ Request โ”‚ Set 0 (W0, W1) โ”‚ Set 1 (W0, W1) โ”‚ Hit/Miss โ”‚
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚  1   โ”‚  Blk 0  โ”‚  [0, โ€”]        โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ”‚  2   โ”‚  Blk 8  โ”‚  [0, 8]        โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ”‚  3   โ”‚  Blk 0  โ”‚  [0, 8]        โ”‚  [โ€”, โ€”]        โ”‚  HIT โœ…  โ”‚
  โ”‚  4   โ”‚  Blk 6  โ”‚  [6, 8] FIFO   โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ”‚  5   โ”‚  Blk 8  โ”‚  [6, 8]        โ”‚  [โ€”, โ€”]        โ”‚  HIT โœ…  โ”‚
  โ”‚  6   โ”‚  Blk 2  โ”‚  [6, 2] FIFO   โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ”‚  7   โ”‚  Blk 0  โ”‚  [0, 2] FIFO   โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ”‚  8   โ”‚  Blk 6  โ”‚  [0, 6] FIFO   โ”‚  [โ€”, โ€”]        โ”‚  MISS    โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Hits = 2, Misses = 6
  Hit Rate = 2/8 = 25% (Better than 0% with direct mapping!)
In FIFO replacement, the block that entered earliest is replaced โ€” NOT the least recently used. FIFO and LRU can give different results. GATE often tests this distinction. With LRU, the block that was accessed least recently is replaced, even if it entered later.

7. Write-Through vs Write-Back โ€” Comparison

FeatureWrite-ThroughWrite-Back
MechanismEvery write updates both cache AND main memory simultaneouslyWrite only to cache; update main memory when line is evicted
SpeedSlower (every write goes to RAM)Faster (writes are buffered in cache)
ConsistencyCache and RAM always consistentCan be inconsistent; needs dirty bit tracking
Dirty BitNot neededRequired (1 = modified, needs writeback)
Write BufferOften uses a write buffer to avoid CPU stallsNot needed for writes
ComplexitySimpler hardwareMore complex (needs dirty bit logic + writeback)
Best ForMultiprocessor systems (coherency), I/O devicesSingle-processor, performance-critical systems
Used InL1 D-cache (some ARM designs)L2/L3 caches, most modern CPUs
Miss PolicyWrite-allocate or Write-no-allocateUsually write-allocate (fetch block then write)
GATE Tip: "Write-allocate" = on a write miss, fetch the block into cache first, then write. "Write-no-allocate (write-around)" = on a write miss, write directly to main memory, don't fetch into cache. Write-through usually pairs with write-no-allocate. Write-back usually pairs with write-allocate.

8. Virtual Memory โ€” Page Table, TLB, Demand Paging

Analogy: Imagine you're a teacher with 60 students but only 30 chairs. You give each student a "virtual seat number" (1โ€“60). When a student comes to class, you assign them a real chair. If all chairs are full, the least active student goes to the "waiting room" (disk). That's virtual memory โ€” every process gets its own full address space, but physical RAM is shared.

๐Ÿ“ Virtual Memory Address Translation

  Virtual Address (from CPU):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Virtual Page No. โ”‚  Page Offset โ”‚
  โ”‚    (VPN)          โ”‚   (d bits)   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
           โ”‚
           โ–ผ
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚       PAGE TABLE           โ”‚
  โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
  โ”‚ โ”‚Validโ”‚ Dirty โ”‚Frame No. โ”‚ โ”‚
  โ”‚ โ”‚  1  โ”‚   0   โ”‚  0x3A    โ”‚ โ”‚ โ† VPN 0
  โ”‚ โ”‚  1  โ”‚   1   โ”‚  0x2F    โ”‚ โ”‚ โ† VPN 1
  โ”‚ โ”‚  0  โ”‚   0   โ”‚   โ€”      โ”‚ โ”‚ โ† VPN 2 (PAGE FAULT!)
  โ”‚ โ”‚  1  โ”‚   0   โ”‚  0x71    โ”‚ โ”‚ โ† VPN 3
  โ”‚ โ”‚ ... โ”‚  ...  โ”‚   ...    โ”‚ โ”‚
  โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
           โ”‚
           โ–ผ
  Physical Address:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Physical Frame   โ”‚  Page Offset โ”‚
  โ”‚   Number (PFN)    โ”‚   (d bits)   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  Page Fault: Valid=0 โ†’ page not in RAM โ†’ OS fetches from disk (VERY slow: ~10 ms)

Translation Lookaside Buffer (TLB):

  CPU โ”€โ”€VPNโ”€โ”€โ–บ โ”Œโ”€โ”€โ”€โ”€โ”€โ” Hit โ”€โ”€PFNโ”€โ”€โ–บ Physical Address
               โ”‚ TLB โ”‚ (fast: ~1 ns, fully associative)
               โ””โ”€โ”€โ”€โ”€โ”€โ”˜
                 โ”‚ Miss
                 โ–ผ
               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
               โ”‚ Page Table  โ”‚ (in RAM: ~100 ns)
               โ”‚  Walk       โ”‚
               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                 โ”‚ Page Fault
                 โ–ผ
               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
               โ”‚    Disk     โ”‚ (10 ms โ€” catastrophic!)
               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

TLB is a small, fast cache (typically 32โ€“128 entries, fully associative) that stores recent VPNโ†’PFN translations. TLB hit rate is typically 99%+ in well-designed systems.

Demand Paging: Pages are loaded into RAM only when accessed (not pre-loaded). This saves RAM โ€” most of a process's pages are never touched.

A page fault takes ~10 ms. At a CPU clock of 3 GHz, that's ~30 million wasted cycles. If page faults happened on even 1% of accesses, your computer would be 100,000ร— slower. That's why the OS works incredibly hard to keep the page fault rate below 0.0001%.

9. Content Addressable Memory (CAM)

Normal memory (RAM): you give an address, it returns data. CAM is the reverse: you give data (a search key), it returns the address/location where that data is stored โ€” in a single clock cycle.

๐Ÿ“ CAM vs RAM โ€” Fundamental Difference

  RAM (Address โ†’ Data):               CAM (Data โ†’ Address):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Address โ”‚   Data   โ”‚              โ”‚ Search   โ”‚   Match?     โ”‚
  โ”‚    0    โ”‚  0xAB    โ”‚              โ”‚ Key:0xCD โ”‚              โ”‚
  โ”‚    1    โ”‚  0xCD โ—„โ”€โ”€โ”‚โ”€โ”€ Read       โ”‚          โ”‚ Line 0: No   โ”‚
  โ”‚    2    โ”‚  0xEF    โ”‚              โ”‚          โ”‚ Line 1: YES โ—„โ”คโ”€โ”€ Found!
  โ”‚    3    โ”‚  0x12    โ”‚              โ”‚          โ”‚ Line 2: No   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ”‚          โ”‚ Line 3: No   โ”‚
   Input: Address                     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
   Output: Data                        Input: Data (search key)
                                       Output: Location (address)

Where is CAM used?

  • TLB โ€” Search by VPN, get PFN in one cycle
  • Network routers โ€” Search by IP address for routing table lookup
  • Fully associative caches โ€” All tags compared simultaneously = CAM behaviour

TCAM (Ternary CAM): Each bit can be 0, 1, or X (don't care). Used in firewalls and routers for wildcard matching.

10. DRAM, SSD & HDD โ€” Main & Secondary Storage

DRAM (Dynamic RAM)

DRAM stores each bit as a charge on a tiny capacitor. The charge leaks, so DRAM needs periodic refresh (every ~64 ms). It's cheaper and denser than SRAM (1 transistor + 1 capacitor per bit vs 6 transistors for SRAM), which is why we use DRAM for main memory.

FeatureSRAM (Cache)DRAM (RAM)
Storage Element6 transistors (flip-flop)1 transistor + 1 capacitor
Speed~1โ€“20 ns~100 ns
Refresh Needed?NoYes (every ~64 ms)
DensityLow (6T per bit)High (1T1C per bit)
Cost/bitHighLow
Used ForL1/L2/L3 cache, registersMain memory (DDR4/DDR5)

SSD (Solid State Drive)

Uses NAND flash memory. No moving parts, so it's shock-resistant and faster than HDD. Data stored in floating-gate transistors that trap electrons. Typical read latency: ~50 ฮผs. Limited write endurance (cells wear out after ~3,000โ€“100,000 write cycles).

HDD (Hard Disk Drive)

Magnetic storage on spinning platters. A mechanical arm moves to the right track (seek time ~5 ms) and waits for the right sector to rotate under it (rotational latency ~4 ms at 7200 RPM). Total access time: ~10 ms. Cheapest โ‚น/GB but slowest.

  HDD Access Time Breakdown:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Seek Time        Rotational Latency     Transfer Time     โ”‚
  โ”‚  (~5 ms)          (~4.2 ms @ 7200 RPM)   (~0.01 ms)       โ”‚
  โ”‚  โ—„โ”€โ”€ Arm moves โ”€โ”€โ–บโ—„โ”€โ”€ Platter spins โ”€โ”€โ–บโ—„โ”€โ”€ Data read โ”€โ”€โ–บ  โ”‚
  โ”‚                                                             โ”‚
  โ”‚  Total โ‰ˆ 9โ€“10 ms per random access                         โ”‚
  โ”‚  Rotational Latency = (1/2) ร— (60/RPM) seconds             โ”‚
  โ”‚  For 7200 RPM: (1/2) ร— (60/7200) = 4.17 ms                โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Samsung Semiconductor India (Noida & Bangalore) employs 3,000+ engineers working on DRAM and NAND flash design. Samsung is the world's #1 memory chip maker. Their latest DDR5-7200 modules are designed in collaboration with Indian R&D teams and power servers in Google's, Amazon's, and Azure's Indian data centres.

๐Ÿ“ Worked Numerical โ€” Complete GATE-Style Problem

๐Ÿงฎ Problem: 512-Line Direct-Mapped Cache, 32-bit Address

Given:

  • Cache: 512 lines, direct-mapped
  • Block size: 4 words (1 word = 4 bytes โ†’ block = 16 bytes)
  • Address: 32-bit, byte-addressable
  • Cache hit time = 1 ns, miss penalty = 100 ns, hit rate = 0.95

Find: (a) Tag, Line, Word bits   (b) Cache data size   (c) Total cache size   (d) AMAT

Solution:

(a) Address Field Breakdown:

  Block size = 16 bytes โ†’ Word Offset = logโ‚‚(16) = 4 bits
  Lines = 512 = 2โน  โ†’ Line Index  = 9 bits
  Tag = 32 - 9 - 4   = 19 bits

  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Tag (19)  โ”‚ Line (9) โ”‚ Word (4) โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

(b) Cache Data Size:

  Data = Number of Lines ร— Block Size
       = 512 ร— 16 bytes
       = 8,192 bytes = 8 KB

(c) Total Cache Size (including overhead):

  Each line stores: 1 valid bit + 19 tag bits + 128 data bits (16 bytes)
                  = 1 + 19 + 128 = 148 bits per line
  Total = 512 ร— 148 = 75,776 bits = 9,472 bytes โ‰ˆ 9.25 KB

  Overhead = Total - Data = 9.25 KB - 8 KB = 1.25 KB (for tags + valid bits)

(d) Average Memory Access Time (AMAT):

  AMAT = Hit Time + Miss Rate ร— Miss Penalty
       = 1 + (1 - 0.95) ร— 100
       = 1 + 0.05 ร— 100
       = 1 + 5
       = 6 ns

  Without cache: 100 ns. With cache: 6 ns โ†’ 16.7ร— speedup!
Cache = Chai Shop at College Gate! โ˜• Think of it this way:
โ€ข Registers = The chai cup already in your hand (instant access)
โ€ข L1 Cache = The chai shop right at the college gate (10 seconds walk)
โ€ข L2 Cache = The canteen inside campus (2 minutes walk)
โ€ข L3 Cache = The CCD/Starbucks on the main road (10 minutes)
โ€ข RAM = Going home to make chai (30 minutes travel)
โ€ข HDD = Ordering chai leaves from Amazon and waiting 2 days
You always try the nearest shop first. If it has your favorite Cutting Chai โ€” HIT! If not โ€” MISS, go to the next level. That's exactly how CPU cache works!
Section D

Learn by Doing โ€” 3-Tier Lab Structure

๐ŸŸข Tier 1 โ€” GUIDED: Cache Address Decoder (Python)

โฑ๏ธ 45โ€“60 minutesBeginnerZero prior knowledge assumed

Objective:

Write a Python program that takes a memory address, cache configuration, and outputs the Tag, Line/Set, and Word offset fields.

Step 1: Get User Inputs

Python
# Cache Address Decoder
address_bits = int(input("Enter address width (bits): "))       # e.g., 32
num_lines    = int(input("Enter number of cache lines: "))       # e.g., 512
block_size   = int(input("Enter block size (bytes): "))          # e.g., 16
address_hex  = input("Enter memory address (hex, e.g. 0x1A3F): ")

Step 2: Calculate Bit Fields

Python
import math

word_bits = int(math.log2(block_size))
line_bits = int(math.log2(num_lines))
tag_bits  = address_bits - line_bits - word_bits

print(f"Tag: {tag_bits} bits | Line: {line_bits} bits | Word: {word_bits} bits")

Step 3: Decode the Address

Python
address = int(address_hex, 16)
word_offset = address & ((1 << word_bits) - 1)
line_index  = (address >> word_bits) & ((1 << line_bits) - 1)
tag_value   = address >> (word_bits + line_bits)

print(f"Address: {address_hex} โ†’ Tag={tag_value} | Line={line_index} | Word={word_offset}")
Enter address width (bits): 32 Enter number of cache lines: 512 Enter block size (bytes): 16 Enter memory address (hex, e.g. 0x1A3F): 0x0001CAFE Tag: 19 bits | Line: 9 bits | Word: 4 bits Address: 0x0001CAFE โ†’ Tag=0 | Line=458 | Word=14

๐ŸŸก Tier 2 โ€” SEMI-GUIDED: Cache Simulator with Hit/Miss Tracking

โฑ๏ธ 90โ€“120 minutesIntermediateHints provided

Mission:

Build a Python cache simulator that takes a reference string and reports hits, misses, and hit rate for direct-mapped and set-associative caches.

Hints:

  1. Create a list of None values to represent cache lines: cache = [None] * num_lines
  2. For each reference: compute line = ref % num_lines
  3. Check if cache[line] == ref โ†’ HIT, else โ†’ MISS and replace
  4. For set-associative: use a list of lists. Each set is a list with K slots
  5. Track hits and misses in counters. Print hit rate at the end
Python
# Skeleton โ€” fill in the blanks
def simulate_direct(refs, num_lines):
    cache = [None] * num_lines
    hits = 0
    for ref in refs:
        line = ref % num_lines
        if cache[line] == ref:
            hits += 1
            print(f"Ref {ref} โ†’ Line {line} โ†’ HIT")
        else:
            cache[line] = ref
            print(f"Ref {ref} โ†’ Line {line} โ†’ MISS")
    print(f"Hit Rate: {hits}/{len(refs)} = {hits/len(refs)*100:.1f}%")

refs = [0, 8, 0, 6, 8, 2, 0, 6]
simulate_direct(refs, 4)
Stretch Goal: Extend to 2-way set-associative with FIFO and LRU replacement. Compare hit rates for the same reference string.

๐Ÿ”ด Tier 3 โ€” OPEN CHALLENGE: Full Cache Hierarchy Analyzer

โฑ๏ธ 2โ€“3 hoursAdvancedNo instructions โ€” design from scratch

The Brief:

Build a complete cache hierarchy simulator that models L1 โ†’ L2 โ†’ RAM access with:

  1. L1 Cache: Direct-mapped, 64 lines, 4-word blocks
  2. L2 Cache: 4-way set-associative, 256 lines, 8-word blocks, LRU replacement
  3. Input: Read a reference string from a file (at least 100 addresses)
  4. Output: L1 hit rate, L2 hit rate, overall AMAT, total access time
  5. Bonus: Generate a visual trace table showing L1/L2 hits/misses per access

AMAT Formula for 2-level cache:

  AMAT = Hit_Time_L1 + Miss_Rate_L1 ร— (Hit_Time_L2 + Miss_Rate_L2 ร— Miss_Penalty_RAM)
This project is resume-worthy. A working cache simulator with trace output demonstrates deep understanding of computer architecture. Include it in your GitHub with a README and screenshots โ€” VLSI/embedded systems companies actively look for this.
Section E

Practice Problems โ€” Diagrams, Numericals, Industry & GATE

๐Ÿ“Š Diagram-Based Questions (3)

D1

Draw the complete memory hierarchy pyramid for a modern smartphone (Snapdragon 8 Gen 3). Label each level with: technology, size, access time, and one real-world example of data stored at that level.

RememberDiagram
Refer to Section C.1 pyramid. Registers: current instruction operands. L1: loop variable. L2: function's local arrays. L3: shared data structures. RAM: open application data. SSD: installed apps. HDD: movies/backups.
D2

Draw a detailed block diagram of a 2-way set-associative cache with 8 sets. Show the address field breakdown for a 32-bit address with 64-byte blocks. Label all comparators, MUX, valid bits, tag arrays, and data arrays.

ApplyDiagram
Word offset = logโ‚‚(64) = 6 bits. Sets = 8 โ†’ Set index = 3 bits. Tag = 32 - 3 - 6 = 23 bits. Two comparators (one per way), both compare tag with incoming 23 bits. Outputs go through OR โ†’ HIT signal. MUX selects data from the matching way.
D3

Draw the virtual memory address translation flow diagram showing: CPU โ†’ TLB โ†’ Page Table โ†’ Physical Memory, with the page fault handler path to disk. Include all timing labels.

UnderstandDiagram
CPU sends VPN โ†’ TLB check (~1 ns). TLB hit: PFN directly โ†’ physical address. TLB miss: Page table walk in RAM (~100 ns). Valid=1: get PFN, update TLB. Valid=0: Page fault โ†’ OS interrupt โ†’ fetch from disk (~10 ms) โ†’ update page table โ†’ update TLB โ†’ retry.

๐Ÿงฎ Numerical Problems (6)

N1

A direct-mapped cache has 1024 lines, block size = 8 words (1 word = 4 bytes), address = 32 bits. Find: (a) Tag, Line, and Byte Offset bits (b) Total cache data storage in KB (c) Total cache size including tag and valid bits.

ApplyIntermediate
Block = 8ร—4 = 32 bytes. Byte offset = logโ‚‚(32) = 5 bits. Lines = 1024 = 2ยนโฐ โ†’ Line = 10 bits. Tag = 32 - 10 - 5 = 17 bits. (b) Data = 1024 ร— 32 = 32,768 bytes = 32 KB. (c) Per line: 1 + 17 + 256 = 274 bits. Total = 1024 ร— 274 = 280,576 bits = 34.25 KB.
N2

A 4-way set-associative cache has 256 total lines, block size = 64 bytes, address = 32 bits. Find: (a) Number of sets (b) Tag, Set, Offset bits (c) Number of tag comparators needed.

ApplyIntermediate
(a) Sets = 256/4 = 64. (b) Offset = logโ‚‚(64) = 6 bits. Set index = logโ‚‚(64) = 6 bits. Tag = 32 - 6 - 6 = 20 bits. (c) 4 comparators (one per way, all compare in parallel within the selected set).
N3

A system has: L1 hit time = 1 ns, L1 miss rate = 5%, L2 hit time = 10 ns, L2 miss rate = 20%, RAM access time = 100 ns. Calculate the Average Memory Access Time (AMAT).

ApplyGATE
AMAT = 1 + 0.05 ร— (10 + 0.20 ร— 100) = 1 + 0.05 ร— (10 + 20) = 1 + 0.05 ร— 30 = 1 + 1.5 = 2.5 ns
N4

A virtual memory system has: virtual address = 32 bits, physical address = 28 bits, page size = 4 KB. Find: (a) Number of virtual pages (b) Number of physical frames (c) Page table entries (d) Size of page table if each entry is 4 bytes.

ApplyGATE
(a) Page offset = logโ‚‚(4K) = 12 bits. Virtual pages = 2^(32-12) = 2ยฒโฐ = 1,048,576. (b) Physical frames = 2^(28-12) = 2ยนโถ = 65,536. (c) Page table entries = 2ยฒโฐ (one per virtual page). (d) Size = 2ยฒโฐ ร— 4 = 4 MB.
N5

An HDD spins at 10,000 RPM. Average seek time = 4 ms. Sector size = 512 bytes, transfer rate = 200 MB/s. Calculate average access time for one sector.

ApplyIntermediate
Rotational latency = (1/2) ร— (60/10000) = 3 ms. Transfer time = 512 / (200 ร— 10โถ) โ‰ˆ 0.0025 ms โ‰ˆ 0. Access time = Seek + Rotational + Transfer = 4 + 3 + 0 โ‰ˆ 7 ms.
N6

A CPU generates 64-bit addresses. The cache is fully associative with 128 lines, block size = 32 bytes. (a) How many tag bits per line? (b) If hit rate = 0.92, hit time = 2 ns, miss penalty = 80 ns, find AMAT. (c) How many comparators are needed?

ApplyAdvanced
(a) Word offset = logโ‚‚(32) = 5 bits. No line bits (fully associative). Tag = 64 - 5 = 59 bits. (b) AMAT = 2 + 0.08 ร— 80 = 2 + 6.4 = 8.4 ns. (c) 128 comparators (one per cache line โ€” all compared in parallel).

๐Ÿญ Industry Application Questions (3)

I1

Qualcomm's Snapdragon 8 Gen 3 has a 12 MB L3 cache shared across 8 cores. If each core generates 2 billion memory accesses per second and the L3 hit rate is 70% (for accesses that miss L1+L2), calculate how many RAM accesses per second the L3 cache prevents.

AnalyzeIndustry
Total accesses reaching L3 = 8 ร— 2B = 16 billion/sec. L3 prevents: 0.70 ร— 16B = 11.2 billion RAM accesses/sec. Without L3, the memory bus would need to handle 16B accesses/sec instead of 4.8B โ€” it would be a bottleneck.
I2

Samsung's DDR5-7200 has a peak bandwidth of 57.6 GB/s per channel. A server motherboard has 8 channels. If a database workload requires 400 GB/s bandwidth, is this configuration sufficient? What would you recommend?

EvaluateIndustry
Total bandwidth = 8 ร— 57.6 = 460.8 GB/s. Yes, sufficient (400 < 460.8) with ~13% headroom. However, sustained bandwidth is typically 60-70% of peak, giving ~276โ€“322 GB/s. Recommendation: use 12 channels or add HBM (High Bandwidth Memory) for the most demanding queries.
I3

ISRO's NavIC satellite navigation system needs to store ephemeris data for 7 satellites with 1 ms update rate. Each update is 256 bytes. Design the cache requirements if data must be accessed within 10 ns with 99.9% hit rate.

CreateIndustry
Data rate per satellite: 256 bytes/ms = 256 KB/s. Total for 7 satellites: 1.75 MB/s. Working set: 7 ร— 256 = 1,792 bytes per update cycle. An L2-level SRAM cache of ~32 KB with fully associative mapping would easily hold all active ephemeris records. With 10 ns requirement โ†’ L2 SRAM is ideal. Set 7 fixed entries with pinning to guarantee 100% hit rate for primary data.

๐ŸŽฏ GATE Previous Year Style Questions (5)

G1 GATE

A direct-mapped cache has 2ยนโด bytes of data and 2โถ byte blocks. The address is 32 bits. What is the tag field size in bits?

  1. 12
  2. 14
  3. 18
  4. 20
ApplyGATE CS
โœ… Answer: (A) 12. Lines = 2ยนโด/2โถ = 2โธ = 256 lines. Byte offset = 6 bits. Line index = 8 bits. Tag = 32 - 8 - 6 = 18. Wait โ€” let me recheck: Data = 2ยนโด bytes, block = 2โถ bytes. Lines = 2ยนโด/2โถ = 2โธ. Offset = 6, Index = 8, Tag = 32-8-6 = 18. Answer: (C) 18.
G2 GATE

Consider a 2-way set-associative cache with 256 cache lines and block size of 4 words (word = 4 bytes). The address length is 32 bits. The size of the tag field is:

  1. 18 bits
  2. 19 bits
  3. 20 bits
  4. 21 bits
ApplyGATE CS
โœ… Answer: (C) 20. Block = 4ร—4 = 16 bytes โ†’ Offset = 4. Sets = 256/2 = 128 = 2โท โ†’ Set index = 7. Tag = 32 - 7 - 4 = 21. Corrected: Answer is (D) 21 bits.
G3 GATE

The effective access time of a memory system with cache hit rate h, cache access time tโ‚, and main memory access time tโ‚‚ (using simultaneous access) is:

  1. h ร— tโ‚ + (1-h) ร— tโ‚‚
  2. tโ‚ + (1-h) ร— tโ‚‚
  3. h ร— tโ‚ + (1-h) ร— (tโ‚ + tโ‚‚)
  4. h ร— (tโ‚ + tโ‚‚) + (1-h) ร— tโ‚‚
UnderstandGATE CS
โœ… Answer: (A). With simultaneous access (cache and memory accessed at the same time): if hit, time = tโ‚ (cancel memory). If miss, time = tโ‚‚. Effective = hร—tโ‚ + (1-h)ร—tโ‚‚. Note: with hierarchical access (check cache first, then memory), the formula is tโ‚ + (1-h)ร—tโ‚‚ โ€” option (B).
G4 GATE

In a virtual memory system with page size of 4 KB, a process has a virtual address space of 2ยณยฒ bytes. The physical memory is 2ยฒโธ bytes. How many entries does the page table have?

  1. 2ยนโถ
  2. 2ยฒโฐ
  3. 2ยฒโด
  4. 2ยฒโธ
ApplyGATE CS
โœ… Answer: (B) 2ยฒโฐ. Page table has one entry per virtual page. Pages = 2ยณยฒ/2ยนยฒ = 2ยฒโฐ = 1,048,576 entries. Physical memory size determines the number of bits in each entry (frame number), not the number of entries.
G5 GATE

A CPU generates 20-bit addresses. The main memory access time is 100 ns. The cache access time is 10 ns with a hit ratio of 0.9. Using hierarchical access, the effective memory access time is:

  1. 20 ns
  2. 19 ns
  3. 110 ns
  4. 91 ns
ApplyGATE CS
โœ… Answer: (A) 20 ns. Hierarchical: Effective = t_cache + (1-h) ร— t_memory = 10 + (1-0.9) ร— 100 = 10 + 10 = 20 ns.
Section F

MCQ Assessment Bank โ€” 30 Questions (Bloom's Mapped)

Remember / Identify (Q1โ€“Q5)

Q1

Which memory is fastest in the memory hierarchy?

  1. DRAM
  2. Cache (SRAM)
  3. CPU Registers
  4. SSD
Remember
โœ… Answer: (C) CPU Registers โ€” Access time ~0.3 ns, directly inside the CPU with zero latency for ALU operations.
Q2

SRAM is used in cache memory because:

  1. It is cheaper than DRAM
  2. It is faster and doesn't need refresh
  3. It has higher density
  4. It uses capacitors for storage
Remember
โœ… Answer: (B) โ€” SRAM uses flip-flops (6 transistors), doesn't need refresh, and is faster (~1-20 ns) than DRAM (~100 ns). It's more expensive but speed is the priority for cache.
Q3

In a direct-mapped cache, the address is divided into:

  1. Tag and Offset
  2. Tag, Line, and Word Offset
  3. Tag and Set
  4. Page Number and Offset
Remember
โœ… Answer: (B) โ€” Direct-mapped: [Tag | Line/Index | Word/Byte Offset]. The Line bits select which cache line, Tag identifies the block, Offset selects the byte within the block.
Q4

Which memory needs periodic refresh?

  1. SRAM
  2. DRAM
  3. ROM
  4. Flash
Remember
โœ… Answer: (B) DRAM โ€” Stores bits as charges on capacitors that leak over time. Must be refreshed every ~64 ms to retain data.
Q5

TLB stands for:

  1. Translation Lookaside Buffer
  2. Table Lookup Block
  3. Transfer Line Buffer
  4. Tag Line Base
Remember
โœ… Answer: (A) Translation Lookaside Buffer โ€” A small, fast cache that stores recent virtual-to-physical address translations to speed up memory access.

Understand / Explain (Q6โ€“Q10)

Q6

Why does increasing cache associativity reduce conflict misses?

  1. It increases cache size
  2. It allows a block to be placed in multiple locations
  3. It makes the cache faster
  4. It reduces the block size
Understand
โœ… Answer: (B) โ€” Higher associativity means more "ways" per set. A block has more placement options, reducing the chance of two blocks fighting for the same slot (conflict miss).
Q7

What is the principle of temporal locality?

  1. If a memory location is accessed, nearby locations will also be accessed
  2. If a memory location is accessed, it will likely be accessed again soon
  3. Memory should be accessed in sequential order
  4. Frequently accessed data should be stored on disk
Understand
โœ… Answer: (B) โ€” Temporal locality: recently accessed data will likely be accessed again soon (e.g., loop counters, frequently called functions). Option (A) describes spatial locality.
Q8

In write-back policy, when is data written to main memory?

  1. On every write operation
  2. Only when the cache line is evicted (replaced)
  3. At fixed time intervals
  4. When the CPU is idle
Understand
โœ… Answer: (B) โ€” Write-back writes to cache only. The dirty bit tracks modifications. Data is written to main memory only when the line is evicted and dirty bit = 1.
Q9

What happens during a page fault?

  1. Cache line is replaced
  2. TLB is flushed
  3. Required page is loaded from disk to RAM by the OS
  4. CPU clock speed is reduced
Understand
โœ… Answer: (C) โ€” Page fault = referenced page is not in RAM (valid bit = 0). OS handles the interrupt: finds the page on disk, loads it into a free frame in RAM, updates the page table, and resumes the instruction.
Q10

Why is fully associative mapping expensive to implement?

  1. It needs more cache lines
  2. It requires a comparator for every cache line
  3. It needs larger block sizes
  4. It requires more address bits
Understand
โœ… Answer: (B) โ€” Every incoming tag must be compared with ALL cache lines simultaneously, requiring N comparators for N lines. Hardware cost is proportional to cache size.

Apply / Calculate (Q11โ€“Q20)

Q11

A direct-mapped cache has 256 lines, block size = 32 bytes, address = 32 bits. The number of tag bits is:

  1. 17
  2. 19
  3. 21
  4. 23
Apply
โœ… Answer: (B) 19. Byte offset = logโ‚‚(32) = 5. Line index = logโ‚‚(256) = 8. Tag = 32 - 8 - 5 = 19 bits.
Q12

A cache has hit rate = 0.96, hit time = 2 ns, miss penalty = 50 ns. The AMAT is:

  1. 4 ns
  2. 3 ns
  3. 5 ns
  4. 4 ns
Apply
โœ… Answer: (A) 4 ns. AMAT = 2 + (1-0.96) ร— 50 = 2 + 0.04 ร— 50 = 2 + 2 = 4 ns.
Q13

In a 4-way set-associative cache with 512 total lines, the number of sets is:

  1. 64
  2. 128
  3. 256
  4. 512
Apply
โœ… Answer: (B) 128. Sets = Total Lines / Associativity = 512 / 4 = 128.
Q14

A virtual memory system has 20-bit virtual addresses, page size = 1 KB. The number of page table entries is:

  1. 512
  2. 1024
  3. 2048
  4. 4096
Apply
โœ… Answer: (B) 1024. Page offset = logโ‚‚(1K) = 10 bits. Virtual pages = 2^(20-10) = 2ยนโฐ = 1024.
Q15

A fully associative cache has 64 lines, block size = 16 bytes, address = 32 bits. The tag size is:

  1. 24 bits
  2. 26 bits
  3. 28 bits
  4. 30 bits
Apply
โœ… Answer: (C) 28 bits. Byte offset = logโ‚‚(16) = 4. No line index (fully associative). Tag = 32 - 4 = 28 bits.
Q16

A 2-level cache system has: L1 access = 1 ns (miss rate 10%), L2 access = 10 ns (miss rate 5%), RAM access = 200 ns. What is the AMAT?

  1. 2 ns
  2. 3 ns
  3. 2 ns
  4. 12 ns
ApplyGATE
โœ… Answer: (B) 3 ns. AMAT = 1 + 0.10 ร— (10 + 0.05 ร— 200) = 1 + 0.10 ร— (10+10) = 1 + 0.10 ร— 20 = 1 + 2 = 3 ns.
Q17

An HDD rotates at 7200 RPM. The average rotational latency is approximately:

  1. 2.08 ms
  2. 4.17 ms
  3. 8.33 ms
  4. 16.67 ms
Apply
โœ… Answer: (B) 4.17 ms. One rotation = 60/7200 = 8.33 ms. Average rotational latency = half a rotation = 8.33/2 = 4.17 ms.
Q18

Cache size = 64 KB, block size = 64 bytes. The number of cache lines is:

  1. 512
  2. 1024
  3. 2048
  4. 4096
Apply
โœ… Answer: (B) 1024. Lines = Cache Size / Block Size = 64 KB / 64 B = 65536/64 = 1024.
Q19

The effective memory access time with hit ratio h=0.9, cache time=10ns, memory time=100ns (hierarchical access) is:

  1. 19 ns
  2. 20 ns
  3. 28 ns
  4. 100 ns
Apply
โœ… Answer: (B) 20 ns. Hierarchical: T_eff = t_cache + (1-h)ร—t_memory = 10 + 0.1ร—100 = 10 + 10 = 20 ns.
Q20

A page table has 2ยฒโฐ entries, each entry is 4 bytes. The total page table size is:

  1. 1 MB
  2. 2 MB
  3. 4 MB
  4. 8 MB
Apply
โœ… Answer: (C) 4 MB. Size = 2ยฒโฐ ร— 4 bytes = 4 ร— 2ยฒโฐ = 4 MB.

Analyze / Compare (Q21โ€“Q25)

Q21

Which cache mapping has the highest conflict miss rate for a given cache size?

  1. Direct mapped
  2. 2-way set-associative
  3. 4-way set-associative
  4. Fully associative
Analyze
โœ… Answer: (A) Direct mapped โ€” each block has exactly one possible location, so conflict misses are maximum. Fully associative has zero conflict misses.
Q22

Increasing block size in a cache initially reduces miss rate but then increases it. This increase is due to:

  1. Increased hit time
  2. Increased conflict misses and reduced number of lines
  3. Decreased tag bits
  4. Increased write-back overhead
Analyze
โœ… Answer: (B) โ€” Larger blocks exploit spatial locality (reducing compulsory misses), but for a fixed cache size, fewer lines means more conflicts. Also, larger blocks mean more unused data brought in (pollution).
Q23

In a multiprocessor system, which write policy simplifies cache coherence?

  1. Write-back
  2. Write-through
  3. Write-allocate
  4. Write-no-allocate
Analyze
โœ… Answer: (B) Write-through โ€” main memory always has the latest data, making it easier for other processors to see consistent values. Write-back requires coherence protocols (MESI, MOESI).
Q24

Which replacement policy can suffer from Bรฉlรกdy's anomaly?

  1. LRU
  2. FIFO
  3. Optimal
  4. Random
AnalyzeGATE
โœ… Answer: (B) FIFO โ€” Bรฉlรกdy's anomaly: increasing the number of cache lines can actually increase the miss rate with FIFO replacement. LRU and Optimal are stack algorithms and immune to this anomaly.
Q25

Why is the TLB typically fully associative despite the high hardware cost?

  1. It needs to store large pages
  2. It has very few entries and must maximize hit rate
  3. It operates at disk speed
  4. It replaces the page table entirely
Analyze
โœ… Answer: (B) โ€” TLB is small (32โ€“128 entries), so the hardware cost of full associativity is manageable. But a TLB miss is extremely expensive (page table walk in RAM), so maximizing hit rate is critical.

Evaluate / Create (Q26โ€“Q30)

Q26

A system architect must choose between a 16 KB direct-mapped cache and an 8 KB 2-way set-associative cache. Assuming the workload has significant conflict misses, which is likely better?

  1. 16 KB direct-mapped (bigger is always better)
  2. 8 KB 2-way set-associative (less conflicts)
  3. Both perform identically
  4. Cannot determine without the workload
Evaluate
โœ… Answer: (D) โ€” While 2-way reduces conflicts, the 16 KB direct-mapped has 2ร— more lines. The answer depends on the specific access pattern. This is why cache simulation is essential in real design.
Q27

If a system's page fault rate increases from 0.001% to 0.01%, and each page fault costs 10 ms, the impact on effective access time is:

  1. Negligible (< 1% change)
  2. Significant (~10ร— increase in fault overhead)
  3. System crashes
  4. Only affects disk performance
Evaluate
โœ… Answer: (B) โ€” Page fault overhead: 0.001% ร— 10ms = 0.0001ms = 100ns per access. At 0.01%: 0.00001 ร— 10ms = 1000ns = 1ฮผs. That's a 10ร— increase โ€” very significant for high-performance systems.
Q28

To design a cache that eliminates all conflict misses, you would choose:

  1. Direct-mapped with large blocks
  2. Fully associative mapping
  3. Set-associative with 2 ways
  4. Write-back policy
Create
โœ… Answer: (B) โ€” Fully associative has zero conflict misses because any block can go anywhere. Only compulsory and capacity misses remain.
Q29

Which approach would most effectively reduce TLB misses for a workload with a 2 GB working set?

  1. Increase TLB entries from 64 to 128
  2. Use larger page sizes (2 MB instead of 4 KB)
  3. Add another TLB level
  4. Both (B) and (C)
EvaluateGATE
โœ… Answer: (D) โ€” Larger pages: 2 GB / 2 MB = 1024 pages vs 2 GB / 4 KB = 524,288 pages. Fewer pages = TLB can cover more of the working set. A second-level TLB catches misses from L1 TLB without going to the page table in RAM.
Q30

A chip designer has a transistor budget of 500K transistors for cache. SRAM uses 6 transistors/bit. What is the maximum data capacity of the cache?

  1. ~10 KB
  2. ~64 KB
  3. ~128 KB
  4. ~83 KB
Create
โœ… Answer: (A) ~10 KB. Total bits = 500,000 / 6 โ‰ˆ 83,333 bits โ‰ˆ 10,416 bytes โ‰ˆ ~10 KB. Note: this is data only โ€” tags, valid bits, and control logic need additional transistors.
Section G

Short Answer Questions (8)

SA1

Define the hit ratio and explain why a hit ratio of 0.95 vs 0.90 can make a significant difference in AMAT. Provide a numerical example.

Answer: Hit ratio h = (Number of cache hits) / (Total memory accesses). Example: Cache time = 1 ns, RAM = 100 ns. At h=0.90: AMAT = 1 + 0.10ร—100 = 11 ns. At h=0.95: AMAT = 1 + 0.05ร—100 = 6 ns. A 5% improvement in hit rate gives a 45% reduction in AMAT (11โ†’6 ns). This is because each miss is extremely expensive (100ร— the hit time), so even small improvements in hit rate yield large performance gains.
SA2

Explain the difference between compulsory, capacity, and conflict misses (the 3 C's of cache misses).

Answer: Compulsory (cold-start) misses: First access to a block โ€” it has never been in cache. Unavoidable. Capacity misses: The working set is larger than cache โ€” blocks get evicted and re-fetched. Solved by increasing cache size. Conflict misses: Multiple blocks map to the same line/set (in direct-mapped or set-associative). Solved by increasing associativity. A fully associative cache has zero conflict misses.
SA3

Distinguish between SRAM and DRAM with respect to storage cell structure, speed, cost, refresh requirement, and usage.

Answer: SRAM: 6 transistors (flip-flop), ~1-20 ns, expensive, no refresh, used for cache. DRAM: 1 transistor + 1 capacitor, ~100 ns, cheap, needs refresh every ~64 ms (charge leaks), used for main memory (RAM). SRAM is ~5-10ร— faster but ~10-20ร— more expensive per bit than DRAM.
SA4

What is a TLB and why is it typically fully associative? What happens on a TLB miss?

Answer: TLB (Translation Lookaside Buffer) is a small, fast cache (32-128 entries) that stores recent virtual page โ†’ physical frame translations. It's fully associative because: (1) it's small (manageable hardware cost), and (2) TLB misses are very expensive (page table walk in RAM: ~100 ns). Maximizing hit rate is critical. On TLB miss: the hardware/OS walks the page table in RAM to find the mapping. If the page is valid โ†’ PFN found, TLB updated. If page is invalid โ†’ page fault, OS loads page from disk.
SA5

Write the AMAT formula for a 2-level cache system and calculate AMAT for: L1 time=1ns (miss rate 8%), L2 time=10ns (miss rate 20%), RAM=200ns.

Answer: AMAT = T_L1 + MR_L1 ร— (T_L2 + MR_L2 ร— T_RAM) = 1 + 0.08 ร— (10 + 0.20 ร— 200) = 1 + 0.08 ร— (10 + 40) = 1 + 0.08 ร— 50 = 1 + 4 = 5 ns. Without any cache: 200 ns. The 2-level cache provides a 40ร— speedup.
SA6

Explain demand paging and how it differs from pre-paging. What are the advantages of demand paging?

Answer: Demand paging: Pages are loaded into RAM only when referenced (on first access โ†’ page fault โ†’ load). Pre-paging: Pages are loaded before they are needed (prefetching). Advantages of demand paging: (1) Saves memory โ€” only needed pages are in RAM, (2) Faster program startup โ€” don't wait to load entire program, (3) Allows running programs larger than physical memory. Disadvantage: initial page faults cause delays.
SA7

Compare write-through and write-back cache policies with respect to: speed, consistency, dirty bit usage, and suitability for multiprocessor systems.

Answer: Write-through: Writes to both cache and RAM simultaneously. Slower (every write hits RAM), but cache and RAM are always consistent. No dirty bit needed. Better for multiprocessor (coherence is simpler). Write-back: Writes only to cache. Faster (no RAM access on every write). Needs dirty bit to track modified lines. Inconsistent until eviction. Preferred for single-processor performance. Most modern CPUs use write-back for L1/L2 with coherence protocols (MESI) for multiprocessor.
SA8

What is Content Addressable Memory (CAM)? How does it differ from conventional RAM? Where is it used?

Answer: RAM: Input = address โ†’ Output = data at that address. CAM: Input = data (search key) โ†’ Output = address/location where that data exists. CAM compares the search key against all stored entries simultaneously (parallel search) in a single clock cycle. Used in: TLBs (search by virtual page number), fully associative caches (parallel tag comparison), network routers (IP lookup tables), and firewalls. TCAM (Ternary CAM) adds "don't care" bits for wildcard matching.
Section H

Long Answer Questions (3)

๐Ÿ“ LA1: Compare all three cache mapping techniques with diagrams, formulas, advantages, disadvantages, and real-world usage (15 marks)

Model Answer Structure:

1. Direct Mapping

Address: [Tag | Line Index | Word Offset]

Formula: Cache Line = Block Number mod (Number of Lines)

Each block has exactly ONE possible cache line.

Advantages: Simple hardware (1 comparator), fast lookup, cheap.

Disadvantages: High conflict miss rate โ€” two blocks mapping to the same line cause thrashing.

Used in: Simple embedded systems, L1 cache in some older designs.

2. Fully Associative Mapping

Address: [Tag | Word Offset] โ€” NO line index field.

Any block can go in ANY cache line.

Advantages: Zero conflict misses, highest flexibility.

Disadvantages: Needs N comparators (one per line), expensive hardware, slower for large caches.

Used in: TLBs (small, need high hit rate), small special-purpose caches.

3. Set-Associative Mapping

Address: [Tag | Set Index | Word Offset]

Formula: Set = Block Number mod (Number of Sets). Within a set, block can go in any way.

K-way: Each set has K lines. Needs K comparators (manageable).

Advantages: Balance of conflict reduction and hardware cost. Optimal for most workloads.

Disadvantages: More complex than direct, slightly slower than direct (K-way comparison).

Used in: L1/L2/L3 in ALL modern CPUs (2-way to 16-way).

  Comparison Summary:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Feature        โ”‚ Direct โ”‚ Fully Assoc. โ”‚ K-Way Set-Assoc.  โ”‚
  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
  โ”‚ Placement      โ”‚ Fixed  โ”‚ Anywhere     โ”‚ Within a set      โ”‚
  โ”‚ Comparators    โ”‚ 1      โ”‚ N            โ”‚ K (per set)       โ”‚
  โ”‚ Conflict Miss  โ”‚ High   โ”‚ None         โ”‚ Low               โ”‚
  โ”‚ Hardware Cost  โ”‚ Low    โ”‚ Very High    โ”‚ Medium            โ”‚
  โ”‚ Flexibility    โ”‚ Low    โ”‚ Very High    โ”‚ High              โ”‚
  โ”‚ Hit Rate       โ”‚ Lower  โ”‚ Highest      โ”‚ Near-highest      โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“ LA2: Explain virtual memory organisation with page table, TLB, and demand paging. Include a complete address translation diagram (15 marks)

Model Answer should cover:

  1. Virtual Memory Concept: Each process gets its own virtual address space (e.g., 4 GB for 32-bit). Physical RAM is shared. The OS + hardware translate virtual โ†’ physical addresses transparently.
  2. Page Table: A data structure (one per process) stored in RAM. Maps virtual page numbers (VPN) to physical frame numbers (PFN). Each entry has: valid bit, dirty bit, frame number, permission bits.
  3. TLB: A fast hardware cache (fully associative, ~32โ€“128 entries) that stores recent VPNโ†’PFN translations. Hit rate typically 99%+. Prevents expensive page table walks for most accesses.
  4. Demand Paging: Pages loaded only when accessed. Page fault โ†’ OS interrupt โ†’ load from disk โ†’ update page table โ†’ resume. Process starts with zero pages in RAM.
  5. Address Translation Flow: CPU sends virtual address โ†’ TLB check (1 ns). Hit โ†’ PFN directly. Miss โ†’ Page table walk (100 ns). Valid โ†’ get PFN, update TLB. Invalid โ†’ Page fault โ†’ Disk (10 ms) โ†’ load page โ†’ update page table โ†’ update TLB โ†’ retry.
  6. Page Replacement: When RAM is full: LRU, FIFO, or Clock algorithm selects a victim page. If dirty โ†’ write back to disk first.

Include the full address translation diagram from Section C.8.

๐Ÿ“ LA3: Solve a comprehensive cache design problem with AMAT calculation for a 2-level cache system (15 marks)

Problem: A processor has a 2-level cache system:

  • L1: Direct-mapped, 128 lines, 32-byte blocks, hit time = 1 ns, miss rate = 10%
  • L2: 4-way set-associative, 1024 lines, 64-byte blocks, hit time = 8 ns, miss rate = 5% (local)
  • RAM access time: 100 ns. Address: 32-bit, byte-addressable.

Find: (a) L1 address breakdown (tag/line/offset) (b) L2 address breakdown (c) AMAT (d) Speedup over no cache (e) If L1 miss rate improves to 5%, new AMAT and % improvement.

Solution:

(a) L1: Block = 32 bytes โ†’ offset = 5 bits
    Lines = 128 = 2โท โ†’ index = 7 bits
    Tag = 32 - 7 - 5 = 20 bits โ†’ [20|7|5]

(b) L2: Block = 64 bytes โ†’ offset = 6 bits
    Sets = 1024/4 = 256 = 2โธ โ†’ set index = 8 bits
    Tag = 32 - 8 - 6 = 18 bits โ†’ [18|8|6]

(c) AMAT = T_L1 + MR_L1 ร— (T_L2 + MR_L2 ร— T_RAM)
         = 1 + 0.10 ร— (8 + 0.05 ร— 100)
         = 1 + 0.10 ร— (8 + 5)
         = 1 + 0.10 ร— 13
         = 1 + 1.3 = 2.3 ns

(d) Speedup = RAM_time / AMAT = 100 / 2.3 = 43.5ร—

(e) New AMAT = 1 + 0.05 ร— 13 = 1 + 0.65 = 1.65 ns
    Improvement = (2.3 - 1.65) / 2.3 ร— 100 = 28.3%
Section I

Industry Spotlight โ€” A Day in the Life

๐Ÿ‘จโ€๐Ÿ’ป Vikram Sahu, 32 โ€” Cache Design Engineer at Samsung Semiconductor, Bangalore

Background: B.Tech (ECE) from NIT Bhopal. M.Tech from IIT Madras (VLSI). Joined Samsung Semiconductor India (SSIR) as a campus hire. Now leads a team of 6 engineers designing L2 cache controllers for Exynos mobile processors.

A Typical Day:

8:30 AM โ€” Morning sync with the Seoul (Korea) team. Review overnight simulation results for the new Exynos 2500 L2 cache design. A corner-case coherence bug was found โ€” discuss fix approaches.

9:30 AM โ€” Write RTL (Register Transfer Level) code in SystemVerilog for a new cache replacement algorithm. Samsung is exploring RRIP (Re-Reference Interval Prediction) to replace LRU.

11:00 AM โ€” Run synthesis and timing analysis using Synopsys Design Compiler. Target: L2 hit time โ‰ค 4 ns at 3.5 GHz. Current design meets timing with 200 ps slack.

1:00 PM โ€” Lunch at Samsung's Bangalore campus. Discuss power-performance trade-offs with the power management team. Every picojoule per cache access matters for phone battery life.

2:00 PM โ€” Run cache trace simulations using SPEC CPU2017 benchmarks. Compare 4-way vs 8-way L2 on workload mix: Chrome, WhatsApp, games, camera app. 8-way gives 2% higher hit rate but 15% more power.

4:30 PM โ€” Code review for a junior engineer's TLB prefetcher design. Suggest optimisations for reducing TLB miss penalty from 20 ns to 14 ns.

6:00 PM โ€” Write a technical report comparing the Exynos cache hierarchy with Snapdragon 8 Gen 3. Present findings to the architecture team in Seoul next week.

DetailInfo
Tools Used DailySystemVerilog, Synopsys VCS, Design Compiler, GEM5 simulator, Python (scripting), Perforce (version control)
Entry Salary (India)โ‚น10โ€“15 LPA (M.Tech) / โ‚น6โ€“8 LPA (B.Tech)
Mid-Level (5โ€“8 yrs)โ‚น20โ€“35 LPA
Senior (10+ yrs)โ‚น40โ€“80 LPA + RSUs
Companies Hiring (India)Samsung SSIR, Qualcomm Hyderabad, Intel Bangalore, AMD Hyderabad, ARM Bangalore, Texas Instruments, MediaTek Noida, NVIDIA
Section J

Earn With It โ€” Memory Optimization Skills

๐Ÿ’ฐ Your Earning Path After This Chapter

Portfolio Piece: A working cache simulator (Python) with trace output + a technical blog post explaining cache mapping with diagrams โ€” hosted on GitHub.

Skill Paths Unlocked:

โ€ข Embedded Systems (Immediate): Optimise memory usage in Arduino/ESP32 projects. Freelance IoT gigs: โ‚น3,000โ€“โ‚น10,000/project

โ€ข VLSI/SoC Design (After GATE/M.Tech): Cache controller design at Samsung, Qualcomm, Intel. Entry: โ‚น10โ€“15 LPA

โ€ข Systems Programming: Write cache-friendly C/C++ code. Performance optimisation gigs: โ‚น5,000โ€“โ‚น20,000/project

โ€ข Technical Content Writing: Write COA tutorials for GeeksforGeeks, Naukri, or Unstop. โ‚น500โ€“โ‚น2,000/article

OpportunitySkills NeededPlatformEarning Potential
COA Tutorial WriterCache concepts + writingGeeksforGeeks, Mediumโ‚น500โ€“โ‚น2,000/article
Embedded IoT ProjectsC/C++, memory optimizationFreelancer, Internshalaโ‚น3,000โ€“โ‚น10,000/project
GATE Coaching AssistantCOA + numerical solvingUnacademy, Physics Wallahโ‚น5,000โ€“โ‚น15,000/month
Performance TuningCache-aware codingUpwork, Toptal$25โ€“$75/hour
Start with technical writing. Write 5 well-explained COA articles with diagrams on Medium or Dev.to. Apply to GeeksforGeeks as a technical writer (โ‚น500โ€“โ‚น2,000/article). This builds your portfolio AND earns money while you're still learning.
Section K

Chapter Summary โ€” Memory Unit at a Glance

๐Ÿง  Key Takeaways

  1. Memory Hierarchy: Registers โ†’ L1 โ†’ L2 โ†’ L3 โ†’ RAM โ†’ SSD โ†’ HDD. Faster = smaller = costlier.
  2. Locality of Reference: Temporal (reuse recently accessed) + Spatial (access nearby addresses). The foundation of caching.
  3. Cache Mapping: Direct (simple, conflict-prone), Fully Associative (flexible, expensive), Set-Associative (practical balance).
  4. Address Fields: Direct: [Tag|Line|Offset]. Associative: [Tag|Offset]. Set-Assoc: [Tag|Set|Offset].
  5. AMAT = Hit Time + Miss Rate ร— Miss Penalty. For multi-level: recurse into each level.
  6. Write Policies: Write-through (consistent, slow) vs Write-back (fast, needs dirty bit).
  7. Virtual Memory: Gives each process its own address space. Page table maps VPNโ†’PFN. TLB caches translations.
  8. Page Fault: Referenced page not in RAM โ†’ OS loads from disk (~10 ms). Must be extremely rare (<0.001%).
  9. SRAM vs DRAM: SRAM = fast/expensive (cache). DRAM = slow/cheap/needs refresh (main memory).
  10. CAM: Searches by content, not address. Used in TLBs and fully associative caches.

๐Ÿ“‹ Essential Formulas

  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ AMAT = Hit_Time + Miss_Rate ร— Miss_Penalty                   โ”‚
  โ”‚                                                                โ”‚
  โ”‚ 2-Level AMAT = Tโ‚ + MRโ‚ ร— (Tโ‚‚ + MRโ‚‚ ร— T_RAM)               โ”‚
  โ”‚                                                                โ”‚
  โ”‚ Tag bits (Direct)    = n - logโ‚‚(Lines) - logโ‚‚(Block)          โ”‚
  โ”‚ Tag bits (Assoc)     = n - logโ‚‚(Block)                        โ”‚
  โ”‚ Tag bits (Set-Assoc) = n - logโ‚‚(Sets) - logโ‚‚(Block)           โ”‚
  โ”‚                                                                โ”‚
  โ”‚ Sets = Total_Lines / Associativity                             โ”‚
  โ”‚ Cache_Data = Lines ร— Block_Size                                โ”‚
  โ”‚                                                                โ”‚
  โ”‚ Virtual Pages = 2^(VA_bits - Offset_bits)                      โ”‚
  โ”‚ Physical Frames = 2^(PA_bits - Offset_bits)                    โ”‚
  โ”‚ Page Table Size = Num_Virtual_Pages ร— Entry_Size               โ”‚
  โ”‚                                                                โ”‚
  โ”‚ Rotational Latency = (1/2) ร— (60/RPM) seconds                 โ”‚
  โ”‚ Disk Access = Seek + Rotational_Latency + Transfer_Time        โ”‚
  โ”‚                                                                โ”‚
  โ”‚ Effective Access Time (Hierarchical) = t_c + (1-h) ร— t_m      โ”‚
  โ”‚ Effective Access Time (Simultaneous) = hร—t_c + (1-h) ร— t_m    โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Section L

Earning Checkpoint โ€” Self-Assessment

Skill / ConceptTool / MethodDeliverableEarning Ready?
Memory HierarchyConceptualโ€”โœ… Yes โ€” interview ready
Cache Address DecodingPython scriptAddress Decoder tool on GitHubโœ… Yes โ€” useful for GATE coaching content
Cache Mapping (3 types)Diagrams + calculationsBlog post with ASCII diagramsโœ… Yes โ€” technical writing gigs
Hit/Miss Trace SimulationPython simulatorCache Simulator on GitHubโœ… Yes โ€” portfolio piece
AMAT CalculationFormula applicationSolved numericals setโœ… Yes โ€” GATE coaching assistance
Virtual Memory ConceptsConceptual + calculationsโ€”โœ… Yes โ€” interview ready
Cache Hierarchy DesignFull simulator (Tier 3)L1+L2 Hierarchy Simulatorโœ… Yes โ€” resume-worthy project
VLSI/SoC Cache DesignSystemVerilog (beyond chapter)โ€”โฌœ Not yet โ€” needs M.Tech/advanced courses
Minimum Viable Earning Setup after this chapter: A GitHub profile with (1) Cache Address Decoder (Python), (2) Cache Hit/Miss Simulator, (3) 2โ€“3 well-written technical blog posts explaining cache mapping with diagrams = you can earn โ‚น3,000โ€“โ‚น10,000/month from technical writing + GATE coaching content while still in college.

โœ… Unit 6 complete. You've mastered the Memory Unit โ€” from registers to virtual memory!

[QR: Link to EduArtha video tutorial โ€” COA Unit 6: Memory Unit]