Computer Organization & Architecture
Unit 4: Central Processing Unit
From register files to interrupt handling — master CPU internals, instruction formats, addressing modes, and the RISC vs CISC battle that shapes every processor in your pocket.
⏱️ 7 hrs theory + 5 hrs lab | 🎯 GATE ~4 marks | 🖥️ ARM vs Intel
💼 Jobs this unlocks: VLSI Design Engineer (₹6–12 LPA) | Embedded Systems Engineer (₹5–10 LPA) | CPU Verification Engineer (₹8–18 LPA)
Opening Hook — Apple M4 vs Snapdragon X Elite: The CPU War
🔥 The RISC vs CISC Battle That Changed Computing Forever
In 2024, Apple unveiled the M4 chip, an ARM-based RISC processor that leads Intel's Core Ultra in performance-per-watt. Its Neural Engine is rated at 38 trillion operations per second, yet an M4 MacBook Pro sips battery like a phone. Meanwhile, Qualcomm's Snapdragon X Elite made ARM a serious contender in Windows laptops, threatening the 40-year dominance of x86 CISC processors in PCs.
Here's the twist: both M4 and Snapdragon X Elite are RISC processors, using a reduced instruction set with fixed-length instructions. Intel's Core Ultra and AMD's Ryzen are CISC processors, with complex instruction sets and variable-length instructions. For decades, everyone thought CISC had won the PC war. Now RISC is eating CISC's lunch.
Behind every chip is a CPU architecture built from register files, ALUs, instruction decoders, and interrupt controllers, which is exactly what this chapter teaches you. Understand this chapter, and you'll understand the engineering inside the chips that power a company valued at around $3 trillion.
Learning Outcomes — Bloom's Taxonomy Mapped
| Bloom's Level | Learning Outcome |
|---|---|
| 🔵 Remember | List all 8 addressing modes and define RISC vs CISC architectures |
| 🔵 Remember | Identify components of General Register Organization: register file, MUX, ALU, output bus |
| 🔵 Understand | Explain stack organization with PUSH/POP operations and Stack Pointer movement |
| 🔵 Understand | Describe how 3-address, 2-address, 1-address, and 0-address instruction formats encode operations |
| 🟢 Apply | Evaluate X=(A+B)×(C+D) using all four instruction formats with complete instruction sequences |
| 🟢 Apply | Trace flag changes (CF, ZF, SF, OF) in the PSW after arithmetic operations like ADD 0x7FFF+0x0001 |
| 🟠 Analyze | Compare RISC vs CISC across 12+ parameters with ARM vs x86 real-world examples |
| 🟠 Analyze | Determine effective addresses for all 8 addressing modes given memory contents and register values |
| 🔴 Evaluate | Justify when to use stack organization vs register organization for different application scenarios |
| 🔴 Evaluate | Assess interrupt priority handling schemes (daisy-chain vs parallel) for real-time embedded systems |
| 🟣 Create | Design a simple instruction set with 16 instructions supporting at least 4 addressing modes |
| 🟣 Create | Architect a CPU datapath for a given 3-address instruction format with control signals |
Concept Explanation — CPU Architecture from Scratch
1. General Register Organization
A CPU's register organization determines how data flows between registers, the ALU, and memory. In a general register organization, the CPU has a set of general-purpose registers (typically R0–R7), and any register can be used as a source or destination for any operation. This is the most flexible and common organization used in modern CPUs.
Components of General Register Organization
Register File (R0–R7): A set of 8 general-purpose registers, each capable of holding one data word. Any register can serve as source or destination. Registers are faster than memory because they are inside the CPU.
MUX A (Multiplexer A): Selects one register as the first source operand (input A to ALU). Controlled by SELA (3-bit select line).
MUX B (Multiplexer B): Selects one register as the second source operand (input B to ALU). Controlled by SELB (3-bit select line).
ALU (Arithmetic Logic Unit): Performs the actual operation (ADD, SUB, AND, OR, etc.) on the two inputs from MUX A and MUX B. Controlled by OPR (5-bit operation code).
Output Bus: Carries the ALU result back to the register file. The destination register is selected by SELD (3-bit select line).
ASCII DIAGRAM
┌──────────────────────────────────────────┐
│          REGISTER FILE (R0–R7)           │
│ ┌────┬────┬────┬────┬────┬────┬────┬────┐│
│ │ R0 │ R1 │ R2 │ R3 │ R4 │ R5 │ R6 │ R7 ││
│ └────┴────┴────┴────┴────┴────┴────┴────┘│
└────────┬───────────────────┬─────────────┘
         │                   │
    ┌────▼─────┐        ┌────▼─────┐
    │  MUX A   │        │  MUX B   │
    │  (SELA)  │        │  (SELB)  │
    └────┬─────┘        └────┬─────┘
         │ Input A           │ Input B
         ▼                   ▼
    ┌──────────────────────────────┐
    │            A L U             │
    │        (OPR: 5 bits)         │
    └──────────────┬───────────────┘
                   │
              Output Bus
                   │
              ┌────▼─────┐
              │   SELD   │ ← Destination register select
              │ (3 bits) │
              └────┬─────┘
                   │
    (writes back to Register File)
Control Word Format (14 bits)
| Field | Bits | Purpose | Range |
|---|---|---|---|
| SELA | 3 | Select source register A (MUX A input) | 000 (R0) to 111 (R7) |
| SELB | 3 | Select source register B (MUX B input) | 000 (R0) to 111 (R7) |
| SELD | 3 | Select destination register for result | 000 (R0) to 111 (R7) |
| OPR | 5 | ALU operation select | 00000 (Transfer A) to 11111 |
Example: R3 ← R1 + R2
| Field | Value | Meaning |
|---|---|---|
| SELA | 001 | Select R1 as source A |
| SELB | 010 | Select R2 as source B |
| SELD | 011 | Select R3 as destination |
| OPR | 00010 | ADD operation |
Complete control word: 001 010 011 00010 (14 bits)
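The control-word encoding above can be checked with a short Python sketch. It assumes the field order SELA | SELB | SELD | OPR (most significant first, as in the example), a hypothetical 16-bit register width, and models only the OPR codes quoted in this section (Transfer A, ADD, AND, XOR); a real ALU table has more entries.

```python
# Hypothetical model of the 14-bit control word: SELA | SELB | SELD | OPR.
# Only the OPR codes quoted in this section are modeled.
ALU_OPS = {
    0b00000: lambda a, b: a,      # Transfer A
    0b00010: lambda a, b: a + b,  # ADD
    0b01000: lambda a, b: a & b,  # AND
    0b01100: lambda a, b: a ^ b,  # XOR
}


def encode(sela: int, selb: int, seld: int, opr: int) -> int:
    """Pack 3 + 3 + 3 + 5 bits into one 14-bit control word."""
    return (sela << 11) | (selb << 8) | (seld << 5) | opr


def execute(word: int, regs: list, width: int = 16) -> None:
    """Decode one control word and perform R[SELD] <- ALU(R[SELA], R[SELB])."""
    sela = (word >> 11) & 0b111  # MUX A select
    selb = (word >> 8) & 0b111   # MUX B select
    seld = (word >> 5) & 0b111   # destination register on the output bus
    opr = word & 0b11111         # ALU operation code
    result = ALU_OPS[opr](regs[sela], regs[selb])
    regs[seld] = result & ((1 << width) - 1)  # truncate to the register width


regs = [0, 10, 20, 0, 0, 0, 0, 0]            # example values: R1 = 10, R2 = 20
word = encode(0b001, 0b010, 0b011, 0b00010)  # R3 <- R1 + R2
print(format(word, "014b"))                  # 00101001100010
execute(word, regs)
print(regs[3])                               # 30
```

Packing the fields with shifts mirrors how the hardware simply concatenates the select lines; decoding is the same shifts in reverse.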
2. Stack Organization
A stack is a last-in-first-out (LIFO) storage structure. In CPU architecture, stacks are used for subroutine calls (saving return addresses), expression evaluation, and interrupt handling. The stack is managed by a special register called the Stack Pointer (SP).
Memory Stack: PUSH and POP Operations
PUSH operation (add to stack):
Step 1: SP ← SP − 1 (decrement stack pointer — stack grows downward)
Step 2: M[SP] ← DR (write data register value to memory at SP)
POP operation (remove from stack):
Step 1: DR ← M[SP] (read value from memory at SP into data register)
Step 2: SP ← SP + 1 (increment stack pointer — stack shrinks upward)
MEMORY STACK
 Address    Memory       Notes
┌────────┬────────────┐
│  4000  │  (empty)   │  ← Initial SP (stack empty)
├────────┼────────────┤
│  3999  │  Data_1    │  ← First item pushed
├────────┼────────────┤
│  3998  │  Data_2    │
├────────┼────────────┤
│  3997  │  Data_3    │  ← SP after 3 PUSHes (Top of Stack)
├────────┼────────────┤
│  ...   │  ...       │
├────────┼────────────┤
│  3000  │  (limit)   │  ← Stack limit (overflow if SP < 3000)
└────────┴────────────┘
Stack grows DOWNWARD (address decreases on PUSH)
Stack shrinks UPWARD (address increases on POP)
FULL condition: SP = 3000 (stack limit reached)
EMPTY condition: SP = 4000 (initial value)
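A minimal Python model of the PUSH/POP micro-operations, assuming the bounds from the diagram (SP starts at 4000, lowest usable address 3000). `MemoryStack` is a hypothetical helper, and memory is a sparse dictionary rather than a real address space.

```python
class MemoryStack:
    """Downward-growing memory stack (hypothetical helper).

    SP == top means the stack is empty; limit is the lowest usable address.
    """

    def __init__(self, top: int = 4000, limit: int = 3000):
        self.top = top
        self.limit = limit
        self.sp = top   # initial SP: stack empty
        self.mem = {}   # sparse memory: address -> value

    def push(self, dr):
        if self.sp - 1 < self.limit:
            raise OverflowError("stack overflow: SP would fall below the limit")
        self.sp -= 1             # Step 1: SP <- SP - 1
        self.mem[self.sp] = dr   # Step 2: M[SP] <- DR

    def pop(self):
        if self.sp == self.top:
            raise IndexError("stack underflow: stack is empty")
        dr = self.mem[self.sp]   # Step 1: DR <- M[SP]
        self.sp += 1             # Step 2: SP <- SP + 1
        return dr


s = MemoryStack()
for item in ("Data_1", "Data_2", "Data_3"):
    s.push(item)
print(s.sp, s.mem[3999], s.mem[3997])  # 3997 Data_1 Data_3
print(s.pop(), s.sp)                   # Data_3 3998
```

The first item lands at the highest address and the most recent item sits at SP, which is why a POP always returns the last value pushed.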
0-Address Instructions & Stack-Based Expression Evaluation
In 0-address (stack) architecture, instructions like ADD don't specify operands — they implicitly pop two values from the stack, operate, and push the result. This is how the expression (A+B)×C is evaluated using Reverse Polish Notation (RPN).
Infix: (A + B) × C RPN (Postfix): A B + C ×
Assume A=3, B=5, C=4:
| Step | Instruction | Action | Stack (top →) | Result |
|---|---|---|---|---|
| 1 | PUSH A | Push 3 | 3 | — |
| 2 | PUSH B | Push 5 | 3, 5 | — |
| 3 | ADD | Pop 5,3; Push 3+5 | 8 | A+B = 8 |
| 4 | PUSH C | Push 4 | 8, 4 | — |
| 5 | MUL | Pop 4,8; Push 8×4 | 32 | (A+B)×C = 32 |
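When you write int x = a + b; in Java, javac compiles it to stack bytecode such as ILOAD a, ILOAD b, IADD, ISTORE x, which the JVM executes on its operand stack. IADD names no operands at all, exactly like the 0-address ADD above. PostScript printers and HP calculators also use stack-based evaluation.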
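The step-by-step stack evaluation can be reproduced with a tiny interpreter for a hypothetical 0-address machine. Instructions are strings such as "PUSH A" or "ADD", memory is a dictionary of named variables, and DIV uses true division here, which is a simplification (integer hardware would truncate).

```python
import operator

# ALU operations of the hypothetical stack machine; each pops two operands.
BINARY = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def run_stack_program(program, memory):
    """Execute 0-address code and return the final stack (bottom first)."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "PUSH":
            stack.append(memory[arg[0]])   # PUSH X: TOS <- M[X]
        elif op == "POP":
            memory[arg[0]] = stack.pop()   # POP X: M[X] <- TOS
        else:
            right = stack.pop()            # top of stack, pushed last
            left = stack.pop()             # value below it
            stack.append(BINARY[op](left, right))
    return stack


mem = {"A": 3, "B": 5, "C": 4}
print(run_stack_program(["PUSH A", "PUSH B", "ADD", "PUSH C", "MUL"], mem))  # [32]
```

Popping `right` before `left` is what makes SUB and DIV compute A − B and A ÷ B rather than the reverse.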
3. Instruction Formats
An instruction format defines the layout of bits in a machine instruction — how many addresses (operands) are specified, what fields are present, and how long each field is. The number of address fields determines the instruction type.
Evaluating X = (A + B) × (C + D) in All Four Formats
3-Address Format: OP DEST, SRC1, SRC2
3-ADDRESS
Instruction 1: ADD R1, A, B   ; R1 ← M[A] + M[B]
Instruction 2: ADD R2, C, D   ; R2 ← M[C] + M[D]
Instruction 3: MUL X, R1, R2  ; M[X] ← R1 × R2
Total instructions: 3
Advantage: Fewest instructions, most information per instruction
Disadvantage: Longest instruction word (3 address fields)
2-Address Format: OP DEST, SRC (DEST ← DEST op SRC)
2-ADDRESS
Instruction 1: MOV R1, A   ; R1 ← M[A]
Instruction 2: ADD R1, B   ; R1 ← R1 + M[B]  (R1 = A+B)
Instruction 3: MOV R2, C   ; R2 ← M[C]
Instruction 4: ADD R2, D   ; R2 ← R2 + M[D]  (R2 = C+D)
Instruction 5: MUL R1, R2  ; R1 ← R1 × R2    (R1 = (A+B)×(C+D))
Instruction 6: MOV X, R1   ; M[X] ← R1
Total instructions: 6
Advantage: Moderate instruction length
Disadvantage: One source is always overwritten (destructive)
1-Address Format: OP ADDR (uses Accumulator AC implicitly)
1-ADDRESS
Instruction 1: LOAD A   ; AC ← M[A]
Instruction 2: ADD B    ; AC ← AC + M[B]  (AC = A+B)
Instruction 3: STORE T  ; M[T] ← AC       (save A+B)
Instruction 4: LOAD C   ; AC ← M[C]
Instruction 5: ADD D    ; AC ← AC + M[D]  (AC = C+D)
Instruction 6: MUL T    ; AC ← AC × M[T]  (AC = (C+D)×(A+B))
Instruction 7: STORE X  ; M[X] ← AC
Total instructions: 7
Advantage: Short instruction word
Disadvantage: Needs temporary storage, more instructions
0-Address Format: OP (uses Stack implicitly)
0-ADDRESS
Instruction 1: PUSH A  ; TOS ← A
Instruction 2: PUSH B  ; TOS ← B
Instruction 3: ADD     ; Pop B, A; Push A+B
Instruction 4: PUSH C  ; TOS ← C
Instruction 5: PUSH D  ; TOS ← D
Instruction 6: ADD     ; Pop D, C; Push C+D
Instruction 7: MUL     ; Pop (C+D), (A+B); Push (A+B)×(C+D)
Instruction 8: POP X   ; M[X] ← TOS
Total instructions: 8
Advantage: Shortest instruction word, hardware-friendly
Disadvantage: Most instructions needed, stack management overhead
Comparison of Instruction Formats
| Parameter | 3-Address | 2-Address | 1-Address | 0-Address |
|---|---|---|---|---|
| Fields | OP + 3 addr | OP + 2 addr | OP + 1 addr | OP only |
| Instruction Length | Longest | Medium | Short | Shortest |
| Instructions for X=(A+B)×(C+D) | 3 | 6 | 7 | 8 |
| Program Size | Fewest instructions | Moderate | More | Most instructions |
| Memory Access | Multiple per instruction | 2 per instruction | 1 per instruction | Stack only |
| Register Usage | General purpose | General purpose | Accumulator | Stack |
| Example CPU | ARM, MIPS | x86 (MOV, ADD) | Early PDP-8 | JVM, HP calculators |
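To confirm that the 3-, 2- and 1-address sequences above all compute the same X, here is a sketch with two toy interpreters. The operand convention (names like R1 are registers, everything else is a memory variable) and the input values A=2, B=3, C=4, D=5 are assumptions chosen for illustration.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def is_reg(name):
    """Assumed convention: R0, R1, ... are registers; other names are memory variables."""
    return name[0] == "R" and name[1:].isdigit()


def run_register_program(program, mem):
    """Run 3-address (OP D, S1, S2), 2-address (OP D, S) and MOV D, S instructions."""
    regs = {}

    def read(x):
        return regs[x] if is_reg(x) else mem[x]

    def write(x, value):
        (regs if is_reg(x) else mem)[x] = value

    for op, *args in program:
        if op == "MOV":                      # DEST <- SRC
            dest, src = args
            write(dest, read(src))
        elif len(args) == 3:                 # DEST <- SRC1 op SRC2
            dest, s1, s2 = args
            write(dest, OPS[op](read(s1), read(s2)))
        else:                                # DEST <- DEST op SRC (destructive)
            dest, src = args
            write(dest, OPS[op](read(dest), read(src)))
    return mem


def run_accumulator_program(program, mem):
    """Run 1-address code; every instruction implicitly uses the accumulator AC."""
    ac = 0
    for op, addr in program:
        if op == "LOAD":
            ac = mem[addr]
        elif op == "STORE":
            mem[addr] = ac
        else:
            ac = OPS[op](ac, mem[addr])
    return mem


THREE = [("ADD", "R1", "A", "B"), ("ADD", "R2", "C", "D"), ("MUL", "X", "R1", "R2")]
TWO = [("MOV", "R1", "A"), ("ADD", "R1", "B"), ("MOV", "R2", "C"),
       ("ADD", "R2", "D"), ("MUL", "R1", "R2"), ("MOV", "X", "R1")]
ONE = [("LOAD", "A"), ("ADD", "B"), ("STORE", "T"), ("LOAD", "C"),
       ("ADD", "D"), ("MUL", "T"), ("STORE", "X")]

values = {"A": 2, "B": 3, "C": 4, "D": 5}    # X = (2+3) × (4+5) = 45
for label, prog, run in (("3-address", THREE, run_register_program),
                         ("2-address", TWO, run_register_program),
                         ("1-address", ONE, run_accumulator_program)):
    print(label, len(prog), run(prog, dict(values))["X"])
```

Each run prints the format, its instruction count (3, 6, 7) and the same X = 45, which is the trade-off in the table: fewer address fields per instruction means more instructions.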
4. Addressing Modes — All 8
An addressing mode specifies how the CPU calculates the effective address (EA) of an operand. Different modes provide different trade-offs between flexibility, speed, and code compactness. Mastering all 8 modes is essential for GATE and CPU design.
Mode 1: Immediate Addressing
Definition: The operand value is directly contained in the instruction itself. No memory access needed for operand.
EA: No effective address — operand is part of instruction.
IMMEDIATE
Instruction: [ OP | #25 ]
                    │
                    └── Operand = 25 (directly in instruction)
Example: MOV R1, #25 → R1 = 25
Use: Loading constants, initializing counters
Speed: ★★★★★ Fastest (no memory access for operand)
Mode 2: Direct Addressing
Definition: The address field contains the actual memory address of the operand.
EA = Address field of instruction
DIRECT
Instruction: [ OP | 500 ]
                    │
                    ▼
                    Memory[500] = 42  ← Operand
EA = 500, Operand = 42
Example: LOAD 500 → AC = M[500] = 42
Use: Accessing global variables
Speed: ★★★★ One memory access
Mode 3: Indirect Addressing
Definition: The address field points to a memory location that contains the effective address. Two memory accesses needed.
EA = M[Address field]
INDIRECT
Instruction: [ OP | 500 ]
                    │
                    ▼
                    Memory[500] = 800  ← This is the EA (pointer)
                    │
                    ▼
                    Memory[800] = 42   ← Actual operand
EA = 800, Operand = 42
Example: LOAD @500 → AC = M[M[500]] = M[800] = 42
Use: Pointers, dynamic memory access, linked lists
Speed: ★★★ Two memory accesses (slower)
Mode 4: Register Addressing
Definition: The operand is in a CPU register. No memory access needed.
EA: None — operand is in the specified register.
REGISTER
Instruction: [ OP | R3 ]
                    │
                    ▼
                    R3 = 42  ← Operand is in register R3
Operand = R3 = 42
Example: ADD R1, R3 → R1 = R1 + R3
Use: Fastest operations, loop variables
Speed: ★★★★★ No memory access
Mode 5: Register Indirect Addressing
Definition: The register contains the memory address of the operand.
EA = [Register] (contents of register is the address)
REGISTER INDIRECT
Instruction: [ OP | R3 ]
                    │
                    ▼
                    R3 = 800  ← R3 holds memory address
                    │
                    ▼
                    Memory[800] = 42  ← Actual operand
EA = 800 (value in R3), Operand = 42
Example: LOAD (R3) → AC = M[R3] = M[800] = 42
Use: Array access, pointer dereferencing
Speed: ★★★★ One memory access
Mode 6: Autoincrement Addressing
Definition: Like register indirect, but the register is automatically incremented after use. Perfect for array traversal.
EA = [R]; then R ← R + 1
AUTOINCREMENT
Before: R3 = 800
Instruction: [ OP | (R3)+ ]
                    │
                    ▼
                    R3 = 800 → Memory[800] = 42  ← Operand
                    │
                    ▼
                    R3 = 801  ← R3 auto-incremented after access
EA = 800, Operand = 42, R3 updated to 801
Example: LOAD (R3)+ → AC = M[R3]; R3 = R3 + 1
Use: Array traversal, sequential data processing
Speed: ★★★★ One memory access + register update
Mode 7: Displacement (Indexed) Addressing
Definition: EA is computed by adding a constant displacement in the instruction to the contents of a register.
EA = Address field + [R]
DISPLACEMENT / INDEXED
Instruction: [ OP | 100 | R2 ]
                    │     │
                    │   R2 = 500
                    │     │
                    └──+──┘
                       │
             EA = 100 + 500 = 600
                       │
                       ▼
               Memory[600] = 42  ← Operand
EA = 600, Operand = 42
Example: LOAD 100(R2) → AC = M[100 + R2] = M[600] = 42
Use: Accessing struct fields, array elements with base
Speed: ★★★★ One addition + one memory access
Mode 8: Relative Addressing
Definition: EA is computed by adding the address field (offset) to the Program Counter (PC). Used for branch instructions.
EA = PC + Address field
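RELATIVE
Instruction at PC=200: [ OP | +50 ]
                              │
                              ▼
              EA = PC + offset = 200 + 50 = 250  (target of branch)
EA = 250. This example adds the offset to the address of the branch itself; many CPUs add it to the already-incremented PC instead (see Problem N6).
Example: BEQ +50 → if zero flag set, jump to PC+50 = 250
Use: Branch/jump instructions, position-independent code
Speed: ★★★★ One addition (no memory access for address)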
Comparison of All 8 Addressing Modes
| Mode | EA Formula | Memory Accesses | Speed | Use Case |
|---|---|---|---|---|
| Immediate | Operand in instruction | 0 | ★★★★★ | Constants |
| Direct | EA = Addr | 1 | ★★★★ | Global variables |
| Indirect | EA = M[Addr] | 2 | ★★★ | Pointers |
| Register | Operand = R | 0 | ★★★★★ | Loop variables |
| Reg. Indirect | EA = [R] | 1 | ★★★★ | Array via pointer |
| Autoincrement | EA = [R]; R++ | 1 | ★★★★ | Array traversal |
| Displacement | EA = Addr + [R] | 1 | ★★★★ | Struct fields |
| Relative | EA = PC + Addr | 0 | ★★★★ | Branches/jumps |
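The effective-address rules in the table can be written as one Python function. This sketch uses the data from Problem N1 (R1=200, PC=500, M[300]=600, M[600]=42, M[700]=55) plus one hypothetical value, M[200]=99, so that the register-indirect and autoincrement modes have something to fetch. Autoincrement adds 1 (one word) to the register, matching the definition above; a byte-addressed CPU would add the operand size instead.

```python
def resolve(mode, field, regs, mem, pc):
    """Return (EA, operand) for one operand. EA is None when no memory address is formed."""
    if mode == "immediate":              # operand is the field itself
        return None, field
    if mode == "direct":                 # EA = address field
        return field, mem[field]
    if mode == "indirect":               # EA = M[address field]: two memory accesses
        ea = mem[field]
        return ea, mem[ea]
    if mode == "register":               # operand is in the register
        return None, regs[field]
    if mode == "register_indirect":      # EA = [R]
        ea = regs[field]
        return ea, mem[ea]
    if mode == "autoincrement":          # EA = [R]; then R <- R + 1
        ea = regs[field]
        regs[field] += 1
        return ea, mem[ea]
    if mode == "displacement":           # EA = displacement + [R]
        disp, reg = field
        ea = disp + regs[reg]
        return ea, mem[ea]
    if mode == "relative":               # EA = PC + offset (branch target, no fetch)
        return pc + field, None
    raise ValueError(f"unknown mode: {mode}")


regs = {"R1": 200}
mem = {200: 99, 300: 600, 600: 42, 700: 55}   # M[200] = 99 is a hypothetical extra value
pc = 500
for mode, field in [("immediate", 25), ("direct", 300), ("indirect", 300),
                    ("register", "R1"), ("register_indirect", "R1"),
                    ("displacement", (500, "R1")), ("relative", 50)]:
    print(mode, resolve(mode, field, regs, mem, pc))

r = dict(regs)                                 # copy so autoincrement does not alter regs
print("autoincrement", resolve("autoincrement", "R1", r, mem, pc), r["R1"])
```

The number of `mem[...]` reads in each branch matches the "Memory Accesses" column, which is where the speed ranking comes from.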
5. RISC vs CISC Architecture
The two dominant CPU design philosophies are RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer). This is one of the most important comparisons in computer architecture and a frequent GATE topic.
| Parameter | RISC | CISC |
|---|---|---|
| Full Form | Reduced Instruction Set Computer | Complex Instruction Set Computer |
| Instruction Set Size | Small (50–150 instructions) | Large (200–300+ instructions) |
| Instruction Length | Fixed (32-bit typically) | Variable (1–15 bytes in x86) |
| Execution Time | 1 clock cycle per instruction (mostly) | Multiple cycles per instruction |
| Addressing Modes | Few (3–5 modes) | Many (12–20+ modes) |
| Pipelining | Highly efficient (fixed-length helps) | Difficult (variable-length hinders) |
| Registers | Many (32โ64 general purpose) | Few (8โ16 general purpose) |
| Memory Access | Only LOAD/STORE access memory | Any instruction can access memory |
| Code Size | Larger (more instructions needed) | Smaller (complex instructions do more) |
| Hardware Complexity | Simple (hardwired control) | Complex (microprogrammed control) |
| Power Consumption | Lower (simpler circuits) | Higher (complex decode logic) |
| Examples | ARM, MIPS, RISC-V, SPARC, PowerPC | Intel x86, AMD x86-64, VAX, Motorola 68k |
| Compiler Complexity | More complex (must optimize simple ops) | Simpler (hardware does the work) |
| Use Cases | Mobile, embedded, IoT, laptops (Apple M4) | Desktops, servers, legacy PCs |
ARM (RISC) vs Intel x86 (CISC): Real World
| Parameter | ARM Cortex-A78 (RISC) | Intel Core i7-14700K (CISC) |
|---|---|---|
| Instruction Width | Fixed 32-bit (or 16-bit Thumb) | Variable 1–15 bytes |
| Registers | 31 general-purpose (AArch64) | 16 general-purpose (x86-64) |
| Power (TDP) | ~1–5W per core | ~125W package |
| Pipeline Stages | 11–13 stages | 14–19 stages |
| Market | 99% of smartphones, Apple MacBooks | Most desktops, laptops, and servers |
| India Usage | Every Indian smartphone, Raspberry Pi | Office PCs, data centers |
6. Data Transfer & Manipulation Instructions
Data Transfer Instructions
| Instruction | Operation | Example | Description |
|---|---|---|---|
| LOAD | AC ← M[addr] | LOAD 500 | Load memory into accumulator |
| STORE | M[addr] ← AC | STORE 600 | Store accumulator to memory |
| MOV | DEST ← SRC | MOV R1, R2 | Copy data between registers |
| PUSH | SP--; M[SP] ← R | PUSH R3 | Push register onto stack |
| POP | R ← M[SP]; SP++ | POP R3 | Pop stack top into register |
| XCHG | R1 ↔ R2 | XCHG R1, R2 | Exchange contents of two registers |
| IN | R ← Port | IN R1, PORT_A | Input from I/O port |
| OUT | Port ← R | OUT PORT_B, R1 | Output to I/O port |
Data Manipulation Instructions
| Category | Instruction | Operation | Example |
|---|---|---|---|
| Arithmetic | ADD | R1 ← R1 + R2 | ADD R1, R2 |
| | SUB | R1 ← R1 − R2 | SUB R1, R2 |
| | MUL | R1 ← R1 × R2 | MUL R1, R2 |
| | DIV | R1 ← R1 ÷ R2 | DIV R1, R2 |
| Logical | AND | R1 ← R1 AND R2 | AND R1, R2 |
| | OR | R1 ← R1 OR R2 | OR R1, R2 |
| | XOR | R1 ← R1 XOR R2 | XOR R1, R2 |
| | NOT | R1 ← complement of R1 | NOT R1 |
| Shift | SHL | Shift left logical | SHL R1, 1 |
| | SHR | Shift right logical | SHR R1, 1 |
| | ROL | Rotate left | ROL R1, 2 |
| | ROR | Rotate right | ROR R1, 2 |
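The shift and rotate rows differ only in what happens to the bits pushed out of the register. A sketch, assuming an 8-bit register (the table does not fix a width) and an input already within that width:

```python
def shl(x, n, width=8):
    """Shift left logical: zeros enter at the LSB, bits past the MSB are lost."""
    return (x << n) & ((1 << width) - 1)


def shr(x, n, width=8):
    """Shift right logical: zeros enter at the MSB."""
    return (x & ((1 << width) - 1)) >> n


def rol(x, n, width=8):
    """Rotate left: bits leaving the MSB re-enter at the LSB."""
    mask = (1 << width) - 1
    n %= width
    return ((x << n) | ((x & mask) >> (width - n))) & mask


def ror(x, n, width=8):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    mask = (1 << width) - 1
    n %= width
    return (((x & mask) >> n) | (x << (width - n))) & mask


x = 0b1001_0110
for name, fn, n in (("SHL", shl, 1), ("SHR", shr, 1), ("ROL", rol, 2), ("ROR", ror, 2)):
    print(name, n, format(fn(x, n), "08b"))
```

Shifts discard bits, so a shift followed by the opposite shift generally loses information; a rotate followed by the opposite rotate always restores the original value.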
7. Program Control & Interrupts
Program Control Instructions
| Instruction | Operation | Description |
|---|---|---|
| JMP addr | PC ← addr | Unconditional jump |
| BEQ addr | If ZF=1, PC ← addr | Branch if equal (zero flag set) |
| BNE addr | If ZF=0, PC ← addr | Branch if not equal |
| BGT addr | If ZF=0 and SF=OF, PC ← addr | Branch if greater than |
| CALL addr | PUSH PC; PC ← addr | Call subroutine (save return address) |
| RET | POP PC | Return from subroutine |
| NOP | No operation | Pipeline delay, alignment |
| HLT | Halt processor | Stop execution |
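CALL and RET are simply PUSH and POP applied to the PC. The toy model below assumes one word per instruction, a stack that grows downward from a hypothetical SP of 1000, and that PC has already been incremented when CALL saves it, so the saved value is the return address. `CallStack` is an illustrative name, not part of any real ISA.

```python
class CallStack:
    """Toy model of CALL/RET (hypothetical; 1 word per instruction, stack grows downward)."""

    def __init__(self, sp=1000):
        self.pc = 0
        self.sp = sp
        self.mem = {}

    def call(self, at, target):
        """CALL at address `at`: PC already points to at + 1, so that is the return address."""
        self.pc = at + 1
        self.sp -= 1
        self.mem[self.sp] = self.pc   # PUSH PC
        self.pc = target              # PC <- addr

    def ret(self):
        self.pc = self.mem[self.sp]   # POP PC
        self.sp += 1


cpu = CallStack()
cpu.call(100, 500)     # main calls a subroutine at 500
cpu.call(505, 800)     # that subroutine calls another at 800
cpu.ret()
print(cpu.pc)          # 506: back in the first subroutine
cpu.ret()
print(cpu.pc, cpu.sp)  # 101 1000: back in main, stack empty again
```

Because return addresses live on a LIFO stack, nested calls unwind in the reverse order they were made.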
Interrupts
An interrupt is a signal that diverts the CPU from its current program to execute a special routine called an Interrupt Service Routine (ISR). After handling the interrupt, the CPU returns to the original program.
| Type | Source | Example | Priority |
|---|---|---|---|
| Hardware External | I/O devices, timers | Keyboard press, disk ready | High |
| Hardware Internal | CPU itself | Division by zero, overflow | Highest |
| Software | Instruction in program | INT 21h (DOS), SVC (ARM) | Programmed |
| Non-Maskable (NMI) | Critical hardware | Power failure, memory parity error | Cannot be disabled |
| Maskable | Peripheral devices | Printer ready, serial data | Can be disabled via IF flag |
Interrupt Handling Cycle
INTERRUPT FLOW
       Program Execution
               │
               ▼
    ┌─────────────────────┐
    │  Interrupt Signal   │ ← Device sends interrupt request
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  Finish Current     │ ← CPU completes current instruction
    │  Instruction        │
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  Save Context       │ ← Push PC and PSW onto stack
    │  (PC, PSW, Regs)    │
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  Identify Source    │ ← Polling or vectored interrupt
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  Load ISR Address   │ ← From interrupt vector table
    │  into PC            │
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  Execute ISR        │ ← Handle the interrupt
    └──────────┬──────────┘
               ▼
    ┌─────────────────────┐
    │  RTI (Return        │ ← Pop PC and PSW from stack
    │  from Interrupt)    │
    └──────────┬──────────┘
               ▼
    Resume Original Program
Priority Interrupt Systems
| Method | How It Works | Pros | Cons |
|---|---|---|---|
| Daisy Chain | Devices connected in series; closest to CPU has highest priority | Simple hardware | Fixed priority, slow for many devices |
| Parallel Priority | Each device has dedicated line; priority encoder selects highest | Fast, flexible | More hardware, more wires |
| Software Polling | CPU checks each device in sequence via status registers | No extra hardware | Slow, wastes CPU cycles |
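The two hardware schemes can be compared with a small simulation. Both functions are illustrative sketches: device 0 is assumed to be wired closest to the CPU in the daisy chain, and bit 0 is assumed to be the highest-priority input of the priority encoder, with a mask register that can disable individual devices.

```python
def daisy_chain(requests):
    """Return (winner, devices_visited). requests[i] is True if device i is requesting;
    device 0 is closest to the CPU, and the acknowledge ripples outward until the
    first requesting device absorbs it."""
    for i, requesting in enumerate(requests):
        if requesting:
            return i, i + 1
    return None, len(requests)


def parallel_priority(request_reg, mask_reg, n_devices=4):
    """Priority encoder over (request AND mask); bit 0 is the highest priority."""
    active = request_reg & mask_reg
    for i in range(n_devices):
        if (active >> i) & 1:
            return i          # index used to form the interrupt vector
    return None


print(daisy_chain([False, False, True, True]))  # (2, 3)
print(parallel_priority(0b1100, 0b1111))        # 2
print(parallel_priority(0b1100, 0b1011))        # 3: device 2 is masked off
```

The daisy chain's visit count grows with the requester's position, which is why long chains are slow; the encoder examines all lines at once in hardware (the loop here only models its logic).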
8. Processor Status Word (PSW) & Flags
The Processor Status Word (PSW), also called the Flags Register or EFLAGS (in x86), is a special register that holds condition codes set by ALU operations. These flags are used by conditional branch instructions to make decisions.
| Flag | Full Name | Set When | Example |
|---|---|---|---|
| CF | Carry Flag | Unsigned operation produces carry/borrow out of MSB | 0xFFFF + 0x0001 โ CF=1 |
| ZF | Zero Flag | Result of operation is zero | 5 − 5 = 0 โ ZF=1 |
| SF | Sign Flag | Result is negative (MSB = 1 in signed representation) | 3 − 5 = −2 โ SF=1 |
| OF | Overflow Flag | Signed operation exceeds representable range | 0x7FFF + 0x0001 โ OF=1 |
| IF | Interrupt Flag | Set = interrupts enabled; Clear = interrupts disabled | CLI clears IF, STI sets IF |
Trace: ADD 0x7FFF + 0x0001 (16-bit signed)
FLAG TRACE
Operand A: 0x7FFF = 0111 1111 1111 1111 (+32767, max positive 16-bit)
Operand B: 0x0001 = 0000 0000 0000 0001 (+1)
Binary Addition:
0111 1111 1111 1111 (0x7FFF = +32767)
+ 0000 0000 0000 0001 (0x0001 = +1)
───────────────────
1000 0000 0000 0000 (0x8000 = -32768 in signed!)
Result = 0x8000
Flag Analysis:
| Flag | Value | Reason |
|---|---|---|
| CF | 0 | No carry out of bit 15 (unsigned OK) |
| ZF | 0 | Result ≠ 0 |
| SF | 1 | MSB = 1 (result appears negative) |
| OF | 1 | +ve + +ve = −ve → signed overflow! |
Explanation: Adding two positive numbers (0x7FFF + 0x0001) gave a
negative result (0x8000 = -32768). This is SIGNED OVERFLOW.
OF is set because the sign of the result doesn't match expected.
CF is NOT set because there's no carry out in unsigned addition.
More Flag Trace Examples
| Operation | Result (16-bit) | CF | ZF | SF | OF |
|---|---|---|---|---|---|
| 0x0005 + 0x0003 | 0x0008 | 0 | 0 | 0 | 0 |
| 0xFFFF + 0x0001 | 0x0000 | 1 | 1 | 0 | 0 |
| 0x8000 + 0x8000 | 0x0000 | 1 | 1 | 0 | 1 |
| 0x0005 − 0x0005 | 0x0000 | 0 | 1 | 0 | 0 |
| 0x0003 − 0x0005 | 0xFFFE | 1 | 0 | 1 | 0 |
| 0x7000 + 0x7000 | 0xE000 | 0 | 0 | 1 | 1 |
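The flag rules can be checked mechanically. This sketch computes CF, ZF, SF and OF for addition and subtraction at any word width, taking operands as unsigned bit patterns and using the borrow convention for CF on subtraction that the table above follows (x86 style). It also evaluates the BGT condition (ZF=0 and SF=OF) from the Program Control table.

```python
def flags_add(a, b, width=16):
    """Return (result, flags) for a + b on a `width`-bit ALU."""
    mask, sign = (1 << width) - 1, 1 << (width - 1)
    raw = a + b
    r = raw & mask
    same_sign_inputs = (a & sign) == (b & sign)
    return r, {
        "CF": int(raw > mask),                                     # carry out of the MSB
        "ZF": int(r == 0),
        "SF": int(bool(r & sign)),
        "OF": int(same_sign_inputs and (r & sign) != (a & sign)),  # sign flipped
    }


def flags_sub(a, b, width=16):
    """Return (result, flags) for a - b; CF=1 means a borrow was needed."""
    mask, sign = (1 << width) - 1, 1 << (width - 1)
    r = (a - b) & mask
    diff_sign_inputs = (a & sign) != (b & sign)
    return r, {
        "CF": int(a < b),                                          # unsigned borrow
        "ZF": int(r == 0),
        "SF": int(bool(r & sign)),
        "OF": int(diff_sign_inputs and (r & sign) != (a & sign)),
    }


def bgt_taken(flags):
    """BGT after CMP a, b (i.e. a - b): taken when ZF = 0 and SF = OF."""
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]


print(flags_add(0x7FFF, 0x0001))  # (32768, {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1})
print(flags_sub(0x0003, 0x0005))  # (65534, {'CF': 1, 'ZF': 0, 'SF': 1, 'OF': 0})
```

Overflow on addition needs same-sign inputs and a sign change; on subtraction it needs different-sign inputs and a result whose sign differs from the minuend.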
Learn by Doing — 3-Tier Lab Structure
🟢 Tier 1 (GUIDED): Instruction Format Converter (Pen & Paper)
Objective:
Convert the expression X = (A + B) × (C + D) into all 4 instruction formats by hand, showing every step.
Step 1: Write the Expression Tree
Draw the expression tree for X = (A + B) × (C + D):
          ×
        /   \
       +     +
      / \   / \
     A   B C   D
Step 2: 3-Address Format
Each instruction specifies: OP destination, source1, source2
Write three instructions: (1) ADD T1, A, B (2) ADD T2, C, D (3) MUL X, T1, T2
Step 3: 2-Address Format
One operand is both source and destination. You need MOV to copy initial values.
Write: MOV R1,A → ADD R1,B → MOV R2,C → ADD R2,D → MUL R1,R2 → MOV X,R1
Step 4: 1-Address Format (Accumulator)
All operations use the implicit accumulator AC. Need STORE for temporary results.
Write: LOAD A → ADD B → STORE T → LOAD C → ADD D → MUL T → STORE X
Step 5: 0-Address Format (Stack)
Convert to postfix (RPN): A B + C D + ×
Write: PUSH A → PUSH B → ADD → PUSH C → PUSH D → ADD → MUL → POP X
Step 6: Fill the Comparison Table
Count: instructions, memory accesses, and total bits needed for each format. Create a table comparing all four.
Deliverable: A clean, hand-written (or typed) comparison showing all 4 formats with instruction counts and analysis. Take a photo for your portfolio.
🟡 Tier 2 (SEMI-GUIDED): CPU Register Simulator in Python
Your Mission:
Build a Python simulator that models a simple CPU with 8 registers (R0–R7), an ALU, and a memory of 256 words. Implement LOAD, STORE, ADD, SUB, and MOV instructions.
Starter Code (you complete the TODOs):
Python
class SimpleCPU:
    def __init__(self):
        self.registers = [0] * 8                            # R0-R7
        self.memory = [0] * 256                             # 256-word memory
        self.flags = {'CF': 0, 'ZF': 0, 'SF': 0, 'OF': 0}

    def load(self, reg, addr):
        # TODO: Load memory[addr] into registers[reg]
        pass

    def store(self, reg, addr):
        # TODO: Store registers[reg] into memory[addr]
        pass

    def add(self, dest, src1, src2):
        # TODO: registers[dest] = registers[src1] + registers[src2]
        # TODO: Update ZF, SF, CF, OF flags
        pass

    def sub(self, dest, src1, src2):
        # TODO: registers[dest] = registers[src1] - registers[src2]
        # TODO: Update ZF, SF, CF, OF flags
        pass

    def mov(self, dest, src):
        # TODO: registers[dest] = registers[src]
        pass

    def display_state(self):
        # TODO: Print all register values and flags
        pass
Test Case:
Store A=5 at memory[100], B=3 at memory[101]. Execute: LOAD R1,100 → LOAD R2,101 → ADD R3,R1,R2 → STORE R3,102. Verify memory[102] = 8.
🔴 Tier 3 (OPEN CHALLENGE): Design a Custom CPU Instruction Set
The Brief:
Design a complete instruction set architecture (ISA) for a hypothetical 16-bit CPU called ARTHA-16. Your design must include:
- 16 instructions covering: data transfer (4), arithmetic (4), logical (3), control flow (3), stack (2)
- At least 4 addressing modes: Immediate, Direct, Register, Register Indirect
- Instruction encoding: 16-bit fixed format. Show the bit layout for each instruction type.
- 8 registers: R0–R5 (general), R6 (SP), R7 (PC)
- Sample program: Write a program to compute factorial of 5 using your ISA
- Documentation: Create a 2-page ISA reference card (like ARM's quick reference)
Deliverable: A PDF/Google Doc with your ISA specification, encoding tables, and sample program. This is a portfolio-worthy project for embedded systems roles.
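As one possible starting point for the encoding requirement, the sketch below packs a hypothetical ARTHA-16 instruction into 16 bits: a 4-bit opcode (exactly 16 instructions), a 2-bit addressing-mode field (the 4 required modes), a 3-bit register field (R0–R7) and a 7-bit operand field. The field layout, mode codes and the opcode used in the example are assumptions for illustration, not a prescribed answer.

```python
# One possible ARTHA-16 layout (an assumption, not a required answer):
#   bits 15-12 opcode | 11-10 mode | 9-7 register | 6-0 operand
MODES = {"IMM": 0b00, "DIR": 0b01, "REG": 0b10, "RIND": 0b11}
MODE_NAMES = {code: name for name, code in MODES.items()}


def encode(opcode, mode, reg, operand):
    """Pack one instruction into a 16-bit word, checking each field's range."""
    if not (0 <= opcode < 16 and 0 <= reg < 8 and 0 <= operand < 128):
        raise ValueError("field out of range")
    return (opcode << 12) | (MODES[mode] << 10) | (reg << 7) | operand


def decode(word):
    """Split a 16-bit word back into (opcode, mode, reg, operand)."""
    return ((word >> 12) & 0xF,
            MODE_NAMES[(word >> 10) & 0b11],
            (word >> 7) & 0b111,
            word & 0x7F)


w = encode(0b0001, "IMM", 1, 5)   # e.g. "LOAD R1, #5" if opcode 0001 were LOAD
print(hex(w), decode(w))          # 0x1085 (1, 'IMM', 1, 5)
```

A 7-bit operand limits immediates to 0–127 and direct addresses to the first 128 words; resolving that trade-off (for example with a second instruction word) is part of the challenge.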
Problem Bank — Diagrams, Numericals, Industry & GATE
Diagram-Based Problems (3)
Problem D1: Draw General Register Organization
Q: Draw the complete block diagram of a general register organization with 8 registers. Label MUX A, MUX B, ALU, output bus, and all control signals (SELA, SELB, SELD, OPR). Show the data flow for the operation R5 ← R2 AND R6.
Solution: Use the diagram from Topic 1 (General Register Organization) of the Concept Explanation. For R5 ← R2 AND R6: SELA=010 (R2), SELB=110 (R6), SELD=101 (R5), OPR=01000 (AND). The control word is: 010 110 101 01000.
Problem D2: Stack Trace for (A−B)×(C+D)÷E
Q: Show the complete stack trace (step-by-step) for evaluating (A−B)×(C+D)÷E using 0-address instructions. Use A=10, B=3, C=4, D=6, E=2.
Solution: Postfix: A B − C D + × E ÷
| Step | Instruction | Stack | Result |
|---|---|---|---|
| 1 | PUSH A | 10 | — |
| 2 | PUSH B | 10, 3 | — |
| 3 | SUB | 7 | 10−3=7 |
| 4 | PUSH C | 7, 4 | — |
| 5 | PUSH D | 7, 4, 6 | — |
| 6 | ADD | 7, 10 | 4+6=10 |
| 7 | MUL | 70 | 7×10=70 |
| 8 | PUSH E | 70, 2 | — |
| 9 | DIV | 35 | 70÷2=35 |
Final answer: 35
Problem D3: Draw the Interrupt Handling Flowchart
Q: Draw the complete flowchart for interrupt handling, showing the steps from interrupt request to resumption of the original program. Include context saving, ISR execution, and RTI.
Solution: Refer to the ASCII flowchart in Topic 7 (Program Control & Interrupts) of the Concept Explanation. The key steps are: (1) Complete current instruction, (2) Save PC and PSW to stack, (3) Identify interrupt source, (4) Load ISR address from IVT, (5) Execute ISR, (6) Execute RTI to restore PC and PSW, (7) Resume original program.
Numerical Problems (6)
🔢 Problem N1: Effective Address Calculation
Q: Given: R1=200, PC=500, Memory[300]=600, Memory[600]=42, Memory[700]=55. Calculate the effective address and operand for: (a) Direct 300, (b) Indirect 300, (c) Register R1, (d) Displacement 500(R1).
Solution:
(a) Direct 300: EA=300, Operand=M[300]=600
(b) Indirect 300: EA=M[300]=600, Operand=M[600]=42
(c) Register R1: Operand=R1=200 (no EA, value in register)
(d) Displacement 500(R1): EA=500+R1=500+200=700, Operand=M[700]=55
🔢 Problem N2: Control Word Generation
Q: For a general register organization with R0–R7, write the 14-bit control word for: (a) R4 ← R1 + R7, (b) R0 ← R3 XOR R5, (c) R6 ← R2 (transfer).
Solution:
(a) SELA=001, SELB=111, SELD=100, OPR=00010 (ADD) → 001 111 100 00010
(b) SELA=011, SELB=101, SELD=000, OPR=01100 (XOR) → 011 101 000 01100
(c) SELA=010, SELB=000 (don't care for a transfer), SELD=110, OPR=00000 (Transfer A) → 010 000 110 00000
🔢 Problem N3: Instruction Count Comparison
Q: For the expression Y = (P + Q) × (R − S) + T, determine the number of instructions needed in 3-address, 2-address, 1-address, and 0-address formats.
Solution:
3-address: ADD T1,P,Q → SUB T2,R,S → MUL T3,T1,T2 → ADD Y,T3,T = 4 instructions
2-address: MOV R1,P → ADD R1,Q → MOV R2,R → SUB R2,S → MUL R1,R2 → ADD R1,T → MOV Y,R1 = 7 instructions
1-address: LOAD P → ADD Q → STORE T1 → LOAD R → SUB S → MUL T1 → ADD T → STORE Y = 8 instructions
0-address: PUSH P → PUSH Q → ADD → PUSH R → PUSH S → SUB → MUL → PUSH T → ADD → POP Y = 10 instructions
🔢 Problem N4: Stack Operations Trace
Q: Initial SP=1000. Show the SP value after each operation: PUSH A, PUSH B, PUSH C, POP, PUSH D, POP, POP.
Solution: PUSH A: SP=999. PUSH B: SP=998. PUSH C: SP=997. POP: SP=998. PUSH D: SP=997. POP: SP=998. POP: SP=999. Stack grows downward; PUSH decrements SP, POP increments SP.
🔢 Problem N5: Flag Tracing
Q: Determine CF, ZF, SF, OF after each operation (8-bit signed): (a) ADD 127, 1 (b) SUB 0, 1 (c) ADD 128, 128.
Solution:
(a) 127 + 1 = 128 → 0x80 = 10000000. CF=0, ZF=0, SF=1, OF=1 (positive + positive = negative)
(b) 0 − 1 = −1 → 0xFF = 11111111. CF=1 (borrow), ZF=0, SF=1, OF=0
(c) 128 is not representable as an 8-bit signed value; the bit pattern 0x80 stands for −128. 0x80 + 0x80 = 0x100 → 0x00 (carry out of bit 7). CF=1, ZF=1, SF=0, OF=1 (negative + negative = zero)
🔢 Problem N6: Relative Address Calculation
Q: A branch instruction is at address 2050. The instruction is BEQ with an 8-bit signed offset of −30 (decimal). If the PC has already been incremented to 2052 when the offset is applied, what is the target address?
Solution: Target = PC + offset = 2052 + (−30) = 2052 − 30 = 2022. The branch goes backward 30 bytes from the next instruction address. This is how loops work — the branch target is before the branch instruction.
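The offset in N6 arrives as an 8-bit two's-complement field, so the hardware sign-extends it before adding it to the PC. A sketch; the raw byte 0xE2 is simply −30 written as an 8-bit two's-complement value, and `sign_extend` is an illustrative helper name.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):    # sign bit set: negative
        return value - (1 << bits)
    return value


offset = sign_extend(0xE2, 8)   # 0xE2 is -30 encoded as an 8-bit two's-complement byte
target = 2052 + offset          # the PC has already advanced past the branch
print(offset, target)           # -30 2022
```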
Industry Problems (3)
🏭 Problem I1: ARM Pipeline Analysis
Q: An ARM Cortex-A78 has a 13-stage pipeline. If the clock frequency is 3 GHz, what is the theoretical maximum throughput in MIPS? Why is actual throughput lower?
Solution: Theoretical: a scalar pipeline completes at most 1 instruction per cycle once it is full, so at 3 GHz the bound is 3 billion instructions per second = 3000 MIPS. Pipeline depth (13 stages) affects latency and the cost of a flush, not this ideal throughput. Actual throughput is lower due to: pipeline stalls (data hazards), branch mispredictions (flushing the pipeline), cache misses (memory latency), and dependencies between instructions. Real Cortex-A78 cores are superscalar and can issue several instructions per cycle, so their peak exceeds this scalar bound; the sustained IPC actually achieved depends heavily on the workload.
🏭 Problem I2: x86 Instruction Decode Challenge
Q: Intel x86 has variable-length instructions (1–15 bytes). Explain why this makes pipelining harder than ARM's fixed 32-bit instructions. How does Intel solve this problem?
Solution: With variable-length instructions, the decoder cannot know where the next instruction starts until it has determined the length of the current one. This creates a bottleneck at the decode stage. Intel solves this with: (1) Pre-decode buffers that scan ahead and mark instruction boundaries, (2) Micro-op translation, in which complex CISC instructions are broken internally into simpler, fixed-format RISC-like micro-operations (µops), (3) µop caches that store already-decoded instructions. Modern Intel CPUs are internally RISC-like despite their CISC ISA.
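A toy illustration of the decode bottleneck. The length encoding here is invented (the low 4 bits of an instruction's first byte give its length); real x86 length decoding depends on prefixes, opcode bytes, ModR/M and more. The point is structural: fixed-length starts are known up front, while variable-length starts must be found one after another.

```python
def fixed_starts(code: bytes, width: int = 4) -> list:
    """Fixed-length ISA: every instruction start is known without decoding anything."""
    return list(range(0, len(code), width))


def toy_length(first_byte: int) -> int:
    """Invented rule for this sketch: low 4 bits of the first byte = length (1-15)."""
    return (first_byte & 0x0F) or 1


def variable_starts(code: bytes) -> list:
    """Variable-length ISA: each start depends on decoding the previous instruction."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += toy_length(code[pc])   # serial dependency: cannot skip ahead
    return starts


print(fixed_starts(bytes(12)))                                      # [0, 4, 8]
print(variable_starts(bytes([0x02, 0xAA, 0x01, 0x03, 0xBB, 0xCC])))  # [0, 2, 3]
```

A fixed-length decoder can hand out instructions to several decode slots in parallel; the variable-length loop is inherently sequential, which is what pre-decode hardware works around.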
🏭 Problem I3: RISC-V in Indian Context
Q: Why is RISC-V strategically important for India's semiconductor mission? Compare the cost and licensing models of ARM vs RISC-V for an Indian startup designing a custom IoT chip.
Solution: An ARM license typically involves an upfront fee (figures of $1M–$10M+ are commonly cited) plus per-chip royalties (around 1–2%). RISC-V is an open, royalty-free ISA, so there is no ISA licensing cost. For an Indian IoT startup producing 100,000 chips, ARM licensing could cost roughly ₹8–80 crore, while RISC-V costs ₹0 in ISA licensing (a commercial RISC-V core bought from a vendor may still carry its own fees; open-source cores do not). This is why IIT Madras chose RISC-V for SHAKTI and C-DAC chose it for VEGA. Avoiding royalties to foreign companies for basic CPU IP is central to India's push for semiconductor self-reliance.
GATE-Style Problems (5)
GATE G1 (2-mark)
Q: A CPU has 16 general-purpose registers. How many bits are needed in the control word for the register selection fields (SELA + SELB + SELD)?
Solution: Each MUX needs log₂(16) = 4 bits. Three fields: SELA(4) + SELB(4) + SELD(4) = 12 bits.
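The field-width rule can be sketched in a few lines of Python. `(N - 1).bit_length()` is used as an integer-only way to compute ⌈log₂ N⌉ for N ≥ 1, so non-power-of-two register counts are also rounded up correctly.

```python
def select_bits(num_registers: int) -> int:
    """Bits needed to name one of N registers: ceil(log2(N)) for N >= 1."""
    return (num_registers - 1).bit_length()


def register_select_field_bits(num_registers: int, fields: int = 3) -> int:
    # SELA, SELB, and SELD each need their own register-number field.
    return fields * select_bits(num_registers)


print(register_select_field_bits(16))  # 12
print(register_select_field_bits(8))   # 9 (the R0-R7 organization)
```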
GATE G2 (2-mark)
Q: Consider a byte-addressable memory with 16-bit addresses. If displacement addressing is used with a 6-bit signed offset and a 16-bit base register, what is the addressable range relative to the base?
Solution: A 6-bit signed offset ranges from −32 to +31 (in 2's complement). So the EA ranges from [Base − 32] to [Base + 31]. This is a window of 2⁶ = 64 bytes around the base register value: 32 bytes below it, the base itself, and 31 bytes above it.
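The key step is sign-extending the 6-bit field before adding it to the base. The Python sketch below does exactly that. It assumes that addresses wrap modulo 2¹⁶, which the question does not state.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def displacement_ea(base: int, offset_field: int, offset_bits: int = 6, addr_bits: int = 16) -> int:
    # EA = base + sign-extended offset; wrapping at 2**addr_bits is an assumption.
    return (base + sign_extend(offset_field, offset_bits)) % (1 << addr_bits)


offsets = [sign_extend(v, 6) for v in range(64)]  # every possible 6-bit field value
print(min(offsets), max(offsets), len(offsets))   # -32 31 64
print(displacement_ea(0x1000, 0b100000))          # 4064, i.e. 0x1000 - 32
```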
GATE G3 (1-mark)
Q: In a stack-based CPU, the instruction sequence PUSH 5, PUSH 3, SUB, PUSH 2, MUL produces what result on top of stack?
Solution: PUSH 5 → [5]. PUSH 3 → [5,3]. SUB → Pop 3,5; Push 5−3=2 → [2]. PUSH 2 → [2,2]. MUL → Pop 2,2; Push 2×2=4 → [4]. Answer: 4.
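The trace above can be reproduced with a tiny 0-address interpreter, written here as a Python sketch. It uses the same convention as the solution: a binary operation computes (second-from-top) op (top), so SUB gives 5 − 3.

```python
import operator

# Binary ALU operations of the stack machine: each pops two values and pushes one.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack_program(program):
    """Execute 0-address instructions given as tuples, e.g. ("PUSH", 5) or ("SUB",)."""
    stack = []
    for opcode, *operand in program:
        if opcode == "PUSH":
            stack.append(operand[0])
        elif opcode in BINARY_OPS:
            top = stack.pop()      # popped first: the right-hand operand
            below = stack.pop()    # popped second: the left-hand operand
            stack.append(BINARY_OPS[opcode](below, top))
        else:
            raise ValueError(f"unknown opcode {opcode}")
        print(opcode, *operand, "->", stack)
    return stack


run_stack_program([("PUSH", 5), ("PUSH", 3), ("SUB",), ("PUSH", 2), ("MUL",)])
# PUSH 5 -> [5]
# PUSH 3 -> [5, 3]
# SUB -> [2]
# PUSH 2 -> [2, 2]
# MUL -> [4]
```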
GATE G4 (2-mark)
Q: A RISC machine has 32 registers and uses 3-address instructions. Each instruction has a 6-bit opcode and three register fields. What is the instruction length?
Solution: Opcode: 6 bits. Each register field: log₂(32) = 5 bits. Total = 6 + 5 + 5 + 5 = 21 bits. In practice, this would be padded to 32 bits with unused/extended fields.
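The same length calculation as a short Python sketch, including the spare bits left over when the 21-bit instruction is padded to a 32-bit word:

```python
def select_bits(num_registers: int) -> int:
    """ceil(log2(N)) bits to name one of N registers (N >= 1)."""
    return (num_registers - 1).bit_length()


def three_address_length(opcode_bits: int, num_registers: int) -> int:
    """Opcode plus three register fields (destination and two sources)."""
    return opcode_bits + 3 * select_bits(num_registers)


length = three_address_length(6, 32)
print(length)       # 21
print(32 - length)  # 11 spare bits when padded to a 32-bit word
```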
GATE G5 (2-mark)
Q: Which addressing mode is used to implement the branch instruction "if R1 == 0, jump to label L" where L is 40 bytes ahead of the current PC?
Solution: Relative addressing (PC-relative). The offset +40 is added to the current PC to compute the target address. This produces position-independent code. The instruction would be: BEQ +40 (if ZF=1 after comparing R1 with 0).
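Here is a minimal Python sketch of the branch. It assumes 4-byte instructions and that the offset is added to the address of the branch itself, as in the solution above. Both assumptions are hypothetical; many real ISAs add the offset to the address of the next instruction instead.

```python
def beq_next_pc(pc: int, r1: int, offset: int, instr_size: int = 4) -> int:
    """Next PC after 'BEQ offset' that follows a compare of R1 with 0."""
    zf = (r1 - 0) == 0         # CMP R1, #0 subtracts and sets ZF on a zero result
    if zf:
        return pc + offset     # taken: PC-relative target
    return pc + instr_size     # not taken: fall through to the next instruction


print(beq_next_pc(1000, 0, 40))  # 1040: branch taken
print(beq_next_pc(1000, 7, 40))  # 1004: falls through
print(beq_next_pc(5000, 0, 40))  # 5040: the same encoding still lands 40 bytes ahead after relocation
```

The last call shows why PC-relative code is position-independent: the encoded offset never changes, only the PC it is added to.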
MCQ Assessment Bank — 30 Questions (Bloom's Mapped)
Remember / Identify (Q1–Q5)
In General Register Organization, the component that selects the source operand for ALU input A is:
- ALU
- MUX A
- SELD
- Output Bus
The PUSH operation on a memory stack (growing downward) performs:
- SP ← SP + 1, then M[SP] ← DR
- SP ← SP − 1, then M[SP] ← DR
- DR ← M[SP], then SP ← SP + 1
- DR ← M[SP], then SP ← SP − 1
RISC stands for:
- Reduced Instruction Standard Computer
- Reduced Instruction Set Computer
- Register Instruction Set Computer
- Rapid Instruction Set Computing
The Overflow Flag (OF) in the PSW is set when:
- The result is zero
- There is a carry from the MSB in unsigned arithmetic
- A signed operation produces a result outside the representable range
- The stack is full
Which addressing mode uses the Program Counter (PC) to calculate the effective address?
- Direct
- Immediate
- Relative
- Register Indirect
Understand / Explain (Q6–Q10)
Why does a 0-address instruction format require more instructions than a 3-address format for the same expression?
- Because 0-address uses longer instructions
- Because it must explicitly push each operand and pop the result, using the stack for all operations
- Because it has fewer registers
- Because the ALU is slower
Why is pipelining more efficient in RISC than CISC architectures?
- RISC has more registers
- RISC uses fixed-length instructions, making fetch and decode stages predictable
- RISC has a larger instruction set
- CISC uses hardwired control
In indirect addressing, why are two memory accesses needed to fetch the operand?
- One to read the instruction, one to read the operand
- One to read the pointer address from memory, another to read the actual operand from that address
- One for the opcode, one for the address field
- One for the stack, one for the register
What is the purpose of the SELD field in the general register organization control word?
- Selects the ALU operation
- Selects the first source register
- Selects the destination register where the ALU result is stored
- Selects the memory address
Why do the Carry Flag (CF) and the Overflow Flag (OF) serve different purposes?
- CF is for addition, OF is for subtraction
- CF detects unsigned overflow (carry out), OF detects signed overflow (sign error)
- CF is set by the ALU, OF is set by the control unit
- They always have the same value
Apply / Solve (Q11–Q15)
For the expression Y = (A + B) × C using 0-address instructions, how many instructions are needed?
- 4
- 5
- 6
- 7
After executing ADD 0xFFFF + 0x0001 (16-bit), the flags are:
- CF=0, ZF=1, SF=0, OF=0
- CF=1, ZF=1, SF=0, OF=0
- CF=1, ZF=0, SF=0, OF=1
- CF=0, ZF=0, SF=1, OF=1
Given R2 = 400 and the instruction LOAD 100(R2), with displacement addressing, the effective address is:
- 100
- 400
- 500
- 300
The control word for R7 ← R0 OR R4 (given OPR for OR = 01010) is:
- 000 100 111 01010
- 111 000 100 01010
- 100 000 111 01010
- 000 111 100 01010
How many instructions are needed to compute X = (P − Q) × (R + S) in 2-address format?
- 5
- 6
- 7
- 8
Analyze / Compare (Q16–Q20)
Which of the following is NOT an advantage of RISC over CISC?
- Better pipelining efficiency
- Smaller code size for the same program
- Lower power consumption
- Simpler hardware design
A program uses autoincrement addressing to traverse an array of 100 integers. If plain register indirect addressing (without autoincrement) were used instead, how many additional instructions would be needed?
- 0 (same count)
- 100 (one extra increment per element)
- 99 (increment after each access except the last)
- 200 (one extra load and increment per element)
Why do modern Intel CPUs internally translate CISC instructions into RISC-like micro-operations (µops)?
- To save memory
- To enable efficient out-of-order execution and pipelining of fixed-size operations
- To reduce the number of registers
- To make software compatible with ARM
In a daisy-chain priority interrupt system, device D3 is connected between D2 and D4. If D2 and D4 both raise interrupts simultaneously, which gets serviced first?
- D4 (it's farther from CPU)
- D2 (it's closer to CPU, higher priority)
- Both serviced simultaneously
- Neither (deadlock occurs)
Compare the total memory bits for encoding "ADD R1, R2, R3" in a 3-address format (6-bit opcode, 4-bit registers) vs a stack-based 0-address equivalent. Which uses fewer total bits?
- 3-address: fewer bits
- 0-address: fewer bits
- Both use the same number of bits
- Cannot be determined without more information
Evaluate / Justify (Q21–Q25)
For an embedded real-time system controlling an automotive braking mechanism, which interrupt priority scheme is most appropriate?
- Software polling
- Daisy-chain priority
- Parallel priority with hardware encoder
- No interrupts (use busy waiting)
A student argues: "Register addressing is always better than direct addressing because it's faster." Is this correct?
- Yes: registers are always faster than memory
- No: register addressing cannot access large data structures in memory
- Yes: all modern CPUs use only register addressing
- No: direct addressing is faster for constants
A company is choosing between stack-based and register-based architecture for a new Java bytecode processor. Which is more suitable?
- Register-based: always better performance
- Stack-based: matches Java's stack-oriented bytecode natively
- Both are equally suitable
- Neither: a CISC approach is needed
Is it justified for India to invest in RISC-V over licensing ARM cores? Evaluate:
- No: ARM is industry-proven and reliable
- Yes: RISC-V eliminates licensing costs and enables sovereign chip design
- No: RISC-V has no ecosystem
- Yes, but only for military applications
A system has both maskable and non-maskable interrupts. The IF (Interrupt Flag) is cleared. Which statement is true?
- Both maskable and non-maskable interrupts are blocked
- Only maskable interrupts are blocked; NMI still fires
- Only NMI is blocked; maskable interrupts still fire
- Neither is affected: IF only controls software interrupts
Create / Design (Q26–Q30)
You are designing a 32-bit RISC CPU with 64 registers and need a 3-address instruction format. How many bits remain for the opcode if the instruction is 32 bits?
- 8 bits
- 14 bits
- 10 bits
- 12 bits
If you add autoincrement and autodecrement modes to a CPU with 8 registers, how many additional control bits are needed per instruction to specify the mode?
- 1 bit (auto-inc or auto-dec per operand)
- 2 bits per register field (4 modes: none, inc, dec, indirect)
- 3 bits total
- No additional bits needed
Design consideration: A new ISA needs to support both 16-bit and 32-bit instructions (like ARM's Thumb mode). What is the primary advantage?
- Faster execution speed
- Reduced code size while maintaining 32-bit capability for complex operations
- More addressing modes
- More registers
You need to design an interrupt controller supporting 8 devices with programmable priority. What minimum hardware is needed?
- 8 flip-flops and an 8-to-3 priority encoder
- 3 flip-flops and a 3-to-8 decoder
- 8 interrupt request lines, 8 mask flip-flops, an 8-to-3 priority encoder, and a priority register
- A single status register
If you architect a CPU with separate instruction and data caches (Harvard architecture), which stage of the pipeline benefits most?
- Execute stage
- Writeback stage
- Fetch stage: instruction fetch and data access can happen simultaneously
- Decode stage
Short Answer Questions (8)
SA1: What is General Register Organization? Describe its control word.
General Register Organization is a CPU architecture where multiple general-purpose registers (e.g., R0–R7) are connected through multiplexers to an ALU. Any register can serve as source or destination. The control word has four fields: SELA (selects source register A for MUX A), SELB (selects source register B for MUX B), SELD (selects destination register for ALU output), and OPR (selects the ALU operation). For 8 registers, each SEL field needs 3 bits, and OPR needs 5 bits, giving a 14-bit control word.
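The control word can be assembled field by field, as in this Python sketch. The OR code 01010 is the one given in the MCQ bank. The example micro-operation, R3 ← R1 OR R2, is chosen only for illustration.

```python
FIELD_WIDTHS = (("SELA", 3), ("SELB", 3), ("SELD", 3), ("OPR", 5))


def control_word(sela: int, selb: int, seld: int, opr: int) -> int:
    """Pack SELA | SELB | SELD | OPR (3 + 3 + 3 + 5 bits) into one 14-bit word."""
    word = 0
    for (name, width), value in zip(FIELD_WIDTHS, (sela, selb, seld, opr)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value  # append this field after the previous ones
    return word


def show(word: int) -> str:
    """Format the 14-bit word grouped by field."""
    bits = f"{word:014b}"
    return f"{bits[0:3]} {bits[3:6]} {bits[6:9]} {bits[9:14]}"


# R3 <- R1 OR R2: SELA = R1, SELB = R2, SELD = R3, OPR = 01010 (OR)
print(show(control_word(1, 2, 3, 0b01010)))  # 001 010 011 01010
```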
SA2: Explain PUSH and POP operations with a memory stack diagram.
PUSH adds data to the stack: SP is decremented first (SP ← SP − 1), then data is written to the memory location pointed to by SP (M[SP] ← DR). POP removes data: the value at SP is read into the data register (DR ← M[SP]), then SP is incremented (SP ← SP + 1). The stack grows downward in memory: PUSH moves SP to lower addresses, and POP moves it to higher addresses. Stack overflow occurs if SP goes below the lower limit, and underflow occurs if POP is attempted when the stack is empty.
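A downward-growing memory stack with the same PUSH/POP register transfers and overflow/underflow checks can be sketched in Python. The 16-word memory and the lower limit of address 8 are hypothetical sizes picked for the demo.

```python
class MemoryStack:
    """Downward-growing stack in a small word-addressed memory (sizes are hypothetical)."""

    def __init__(self, memory_size: int = 16, lower_limit: int = 8):
        self.memory = [0] * memory_size
        self.lower_limit = lower_limit  # lowest address the stack may occupy
        self.empty_sp = memory_size     # SP value when the stack holds nothing
        self.sp = memory_size           # start empty: SP sits just above the stack region

    def push(self, dr: int) -> None:
        if self.sp - 1 < self.lower_limit:
            raise OverflowError("stack overflow: SP would go below the lower limit")
        self.sp -= 1                    # SP <- SP - 1
        self.memory[self.sp] = dr       # M[SP] <- DR

    def pop(self) -> int:
        if self.sp == self.empty_sp:
            raise IndexError("stack underflow: POP on an empty stack")
        dr = self.memory[self.sp]       # DR <- M[SP]
        self.sp += 1                    # SP <- SP + 1
        return dr


stack = MemoryStack()
stack.push(10)
stack.push(20)
print(stack.sp)     # 14: two PUSHes moved SP down by 2
print(stack.pop())  # 20 (last in, first out)
print(stack.sp)     # 15
```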
SA3: Differentiate between Direct and Indirect addressing modes.
In Direct addressing, the address field in the instruction directly contains the effective address (EA) of the operand, so only one memory access is needed: EA = Address field. In Indirect addressing, the address field contains a pointer, that is, the memory address of a location that holds the actual EA. Two memory accesses are needed: first to read the pointer, then to read the operand at the pointed-to address. Indirect addressing is slower but more flexible, supporting pointers and dynamic memory allocation. Direct addressing is simpler and faster, but each instruction can reach only the one address encoded in it, and only addresses that fit in the address field.
SA4: What is the difference between 1-address and 0-address instruction formats?
In a 1-address format, instructions have one explicit operand address and use an implicit accumulator (AC) as the other operand and the destination. Example: ADD X means AC ← AC + M[X]. In a 0-address format, instructions have no explicit address fields and use an implicit stack: operations pop their operands from the stack, compute, and push the result. Example: ADD pops two values, adds them, and pushes the sum. 0-address instructions are the shortest but need more instructions per expression; 1-address instructions are slightly longer but require fewer instructions.
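To make the accumulator side concrete, here is a Python sketch of a 1-address machine evaluating Y = (A + B) × C. Memory locations are named symbolically (A, B, C, Y) for readability; that naming, and the sample values, are illustrative assumptions.

```python
def run_accumulator(program, memory):
    """1-address machine: each instruction names one memory operand; AC is implicit."""
    ac = 0
    for op, addr in program:
        if op == "LOAD":
            ac = memory[addr]        # AC <- M[addr]
        elif op == "ADD":
            ac = ac + memory[addr]   # AC <- AC + M[addr]
        elif op == "MUL":
            ac = ac * memory[addr]   # AC <- AC * M[addr]
        elif op == "STORE":
            memory[addr] = ac        # M[addr] <- AC
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


mem = {"A": 2, "B": 3, "C": 4, "Y": 0}
prog = [("LOAD", "A"), ("ADD", "B"), ("MUL", "C"), ("STORE", "Y")]
print(run_accumulator(prog, mem)["Y"], len(prog))  # 20 4
```

The stack version of the same expression (PUSH A, PUSH B, ADD, PUSH C, MUL, POP Y) needs six shorter instructions instead of four.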
SA5: List 6 key differences between RISC and CISC.
(1) RISC has a small instruction set (50–150); CISC has a large set (200–300+). (2) RISC uses fixed-length instructions; CISC uses variable-length. (3) RISC executes most instructions in 1 clock cycle; CISC takes multiple cycles. (4) RISC uses load/store architecture (only LOAD/STORE access memory); CISC allows any instruction to access memory. (5) RISC has many registers (32–64); CISC has fewer (8–16). (6) RISC uses hardwired control; CISC uses microprogrammed control. Examples: ARM, MIPS (RISC) vs Intel and AMD x86 (CISC).
SA6: What are the different types of interrupts? Give examples.
Hardware External: Generated by I/O devices (keyboard press, timer tick). Hardware Internal (Traps): Generated by CPU errors (division by zero, invalid opcode). Software Interrupts: Triggered by instructions (INT 21h in DOS, SVC in ARM) for system calls. Non-Maskable Interrupt (NMI): Cannot be disabled, used for critical events (power failure, memory parity error). Maskable Interrupt: Can be enabled/disabled via the IF flag in PSW, used for peripheral devices. The CPU handles interrupts by saving context (PC, PSW), executing the ISR, then restoring context via RTI.
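The maskable/non-maskable distinction and the save-and-restore sequence can be modelled in a few lines of Python. In this sketch, device 0 is treated as the highest priority (a hypothetical fixed ordering, like the device nearest the CPU in a daisy chain). Only the PC and the IF bit stand in for the full saved context.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    pc: int = 0x100
    if_flag: bool = True                        # IF bit of the PSW: enables maskable interrupts
    saved: list = field(default_factory=list)   # stands in for the stack used to save context


def select_interrupt(cpu, nmi_pending, requests, mask):
    """Return "NMI", the index of the device to service, or None.

    requests and mask are lists of booleans, one per maskable device;
    index 0 is treated as the highest priority (hypothetical ordering).
    """
    if nmi_pending:
        return "NMI"                            # non-maskable: ignores IF and the mask
    if not cpu.if_flag:
        return None                             # IF cleared: every maskable request waits
    for device, (requested, enabled) in enumerate(zip(requests, mask)):
        if requested and enabled:
            return device                       # first active, unmasked line wins
    return None


def enter_isr(cpu, isr_address):
    cpu.saved.append((cpu.pc, cpu.if_flag))     # save context (PC and the IF part of the PSW)
    cpu.if_flag = False                         # block further maskable interrupts
    cpu.pc = isr_address


def rti(cpu):
    cpu.pc, cpu.if_flag = cpu.saved.pop()       # restore context and resume


cpu = Cpu(if_flag=False)
print(select_interrupt(cpu, False, [False, True, True], [True, True, True]))  # None
print(select_interrupt(cpu, True, [False, True, True], [True, True, True]))   # NMI
cpu.if_flag = True
print(select_interrupt(cpu, False, [False, True, True], [True, True, True]))  # 1
enter_isr(cpu, 0x200)
rti(cpu)
print(hex(cpu.pc), cpu.if_flag)  # 0x100 True
```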
SA7: Explain the flags CF, ZF, SF, and OF with one example each.
CF (Carry Flag): Set when unsigned arithmetic produces a carry/borrow. Example: 0xFF + 0x01 = 0x00 with CF=1. ZF (Zero Flag): Set when the result is zero. Example: 5 − 5 = 0, ZF=1. SF (Sign Flag): Set when the result's MSB is 1 (negative in signed representation). Example: 3 − 5 = −2, SF=1. OF (Overflow Flag): Set when signed arithmetic exceeds the representable range. Example: 0x7F + 0x01 = 0x80 (8-bit: +127 + 1 = −128), OF=1. CF and OF are independent: CF checks unsigned overflow, and OF checks signed overflow.
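The four flag rules can be checked mechanically with the Python sketch below. It works on 8-bit values by default. After subtraction, CF follows the x86 borrow convention; ARM's C flag is the inverse. The OF test is the same-sign-inputs, different-sign-result rule for addition, with the adjusted form for subtraction.

```python
def add_flags(a: int, b: int, bits: int = 8):
    """a + b on `bits`-wide registers; returns (result, CF, ZF, SF, OF)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    cf = int(full > mask)                 # carry out of the MSB (unsigned overflow)
    zf = int(result == 0)
    sf = int((result & sign) != 0)        # MSB of the result
    # Signed overflow: both inputs share a sign and the result's sign differs.
    of = int((a & sign) == (b & sign) and (result & sign) != (a & sign))
    return result, cf, zf, sf, of


def sub_flags(a: int, b: int, bits: int = 8):
    """a - b; CF is treated as a borrow flag (x86 convention; ARM inverts it)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    cf = int(a < b)                       # borrow needed in unsigned subtraction
    zf = int(result == 0)
    sf = int((result & sign) != 0)
    # Signed overflow: inputs differ in sign and the result's sign differs from a.
    of = int((a & sign) != (b & sign) and (result & sign) != (a & sign))
    return result, cf, zf, sf, of


print(add_flags(0xFF, 0x01))  # (0, 1, 1, 0, 0)
print(sub_flags(5, 5))        # (0, 0, 1, 0, 0)
print(sub_flags(3, 5))        # (254, 1, 0, 1, 0): 0xFE is -2
print(add_flags(0x7F, 0x01))  # (128, 0, 0, 1, 1)
```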
SA8: What is autoincrement addressing mode? Where is it used?
In autoincrement addressing, the effective address is the content of a specified register. After the operand is accessed, the register is automatically incremented by the operand size d (e.g., +1 for bytes, +4 for 32-bit words in byte-addressable memory). Formula: EA = [R]; then R ← R + d (d = 1 when each memory address holds one whole operand). This eliminates the need for a separate increment instruction when traversing arrays or sequential data structures. It is widely used in loop-based array processing, string operations, and stack implementations (POP on a downward-growing stack uses autoincrement). ARM supports post-increment addressing: LDR R0, [R1], #4 loads from [R1] then adds 4 to R1.
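This Python sketch models one autoincrement access as a single step that returns both the operand and the advanced pointer, mirroring ARM's LDR R0, [R1], #4. The memory contents and addresses are hypothetical. Memory is byte-addressable with 32-bit words, so d = 4.

```python
def ldr_post_increment(memory, r, d):
    """One autoincrement access: operand = M[R], then R <- R + d (returns both)."""
    return memory[r], r + d


def sum_array(memory, base, count, d):
    r1 = base                                          # R1 points at the first element
    total = 0
    for _ in range(count):
        value, r1 = ldr_post_increment(memory, r1, d)  # one instruction: load and advance
        total += value
    return total, r1


# Hypothetical byte-addressable memory holding three 32-bit words (d = 4).
memory = {100: 7, 104: 8, 108: 9}
print(sum_array(memory, 100, 3, 4))  # (24, 112)
```

After the loop, R1 points one element past the end of the array. This is the side effect that lets the next access continue without an explicit increment.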
Long Answer Questions (3)
LA1: ARM vs Intel, a Comprehensive Architecture Case Study
Question: Compare ARM (RISC) and Intel x86 (CISC) architectures across at least 10 parameters. Analyse why ARM dominates smartphones while Intel dominates desktops. Include the recent shift with Apple M-series and Qualcomm Snapdragon X Elite.
Answer:
1. Historical Context: ARM's first processor was produced in 1985 by Acorn Computers (UK) for Acorn's own computers. Its simple design happened to draw very little power, which later made it the natural fit for battery-powered devices. Intel x86 was introduced in 1978 (the 8086) for general-purpose computing. Their design philosophies diverged fundamentally: ARM prioritized simplicity and power efficiency, while x86 prioritized backward compatibility and computational power.
2. Architecture Comparison:
| Parameter | ARM (RISC) | Intel x86 (CISC) |
|---|---|---|
| Instruction Set | ~150 simple instructions | ~1500+ complex instructions |
| Instruction Length | Fixed 32-bit (or 16-bit Thumb) | Variable 1-15 bytes |
| Registers | 31 GP registers (AArch64) | 16 GP registers (x86-64) |
| Memory Access | Load/Store only | Any instruction can access memory |
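| Pipeline Efficiency | Excellent (fixed-length) | Complex (needs µop translation) |
| Power Consumption | 0.5–5W per core | 15–125W per package |
| Performance/Watt | Industry-leading | Improving but behind ARM |
| Control Unit | Hardwired | Microcoded for complex instructions; simple ones decoded directly in hardware |
| Conditional Execution | Most instructions conditional in 32-bit ARM; AArch64 keeps conditional branches and conditional-select instructions | Conditional branches plus conditional move/set (CMOVcc, SETcc) |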
| Software Ecosystem | Android, iOS, embedded | Windows, Linux desktop, server |
3. Why ARM Dominates Smartphones: Smartphones are battery-powered devices where power efficiency is paramount. ARM's simple instruction set needs less decode logic, which means less transistor switching, lower heat generation, and longer battery life. A Snapdragon 8 Gen 3 runs its fastest core at up to 3.3 GHz within a phone power budget of roughly 5W, while a desktop Intel Core i7 is designed for power levels of around 125W. ARM's licensing model also allows chip companies (Qualcomm, Samsung, MediaTek) to customize cores for specific needs.
4. Why Intel Dominated Desktops: The x86 ecosystem has 40+ years of software compatibility. Windows, Office, games, and enterprise software were compiled for x86. Switching would break trillions of dollars of existing software. Intel's high power consumption was acceptable because desktops/laptops have wall power and active cooling.
5. The 2020s Shift (Apple M-Series): Apple's M1 (2020) proved ARM can match or beat Intel in laptops. The M4 (2024) delivers desktop-class performance at laptop power levels. Apple achieved this by designing custom ARM cores (not off-the-shelf), integrating the CPU, GPU, and Neural Engine on one chip (SoC), using TSMC's advanced 3nm process, and optimizing macOS for ARM.
6. Qualcomm Snapdragon X Elite: Qualcomm brought ARM to Windows PCs in 2024. It runs native ARM apps directly and x86 apps through emulation, and it delivers competitive performance at a fraction of Intel's power. This threatens Intel's last stronghold: the PC market.
7. India Connect: Every smartphone in India runs ARM. India's SHAKTI (IIT Madras) and VEGA (C-DAC) processors use RISC-V, an open, royalty-free RISC instruction set that competes with ARM. India's semiconductor mission aims to manufacture ARM-based chips domestically by 2028.
LA2: Design a CPU Datapath for 3-Address Instructions
Question: Design a complete CPU datapath that can execute 3-address register-to-register instructions of the form OP RD, RS1, RS2. Include the register file, ALU, control signals, and show the data flow for ADD R3, R1, R2.
Answer:
Datapath Components:
1. Instruction Register (IR): Holds the current instruction. Fields: Opcode (6 bits), RD (3 bits), RS1 (3 bits), RS2 (3 bits).
2. Register File: 8 registers (R0–R7), two read ports (Port A, Port B) and one write port (Port W). Port A outputs R[RS1], Port B outputs R[RS2], and Port W writes to R[RD].
3. ALU: Takes two inputs (from read ports), performs operation based on Opcode, produces Result and Flags.
4. Control Unit: Decodes Opcode and generates: ALU_OP (selects ALU function), RegWrite (enables write to register file), FlagWrite (enables PSW update).
DATAPATH (block diagram):
- INSTRUCTION REGISTER: [Opcode | RD | RS1 | RS2], 6 + 3 + 3 + 3 bits.
- Opcode → Control Unit, which drives ALU_OP into the ALU and RegWrite into the register file's Write Enable.
- RS1 → Read Port A → Bus A; RS2 → Read Port B → Bus B; RD → Write Port.
- Bus A and Bus B → ALU (function chosen by ALU_OP) → Result.
- Result → FLAGS (PSW: CF ZF SF OF), and the Result Bus carries it back to the Register File's Write Port.
Data Flow for ADD R3, R1, R2:
- IR contains: Opcode=ADD, RD=011(R3), RS1=001(R1), RS2=010(R2)
- Control Unit decodes ADD โ sets ALU_OP=ADD, RegWrite=1, FlagWrite=1
- Register File Read Port A outputs R1 value onto Bus A
- Register File Read Port B outputs R2 value onto Bus B
- ALU receives Bus A and Bus B, performs addition, outputs Result
- Flags (CF, ZF, SF, OF) are updated based on the addition result
- Result Bus carries the sum back to Register File Write Port
- R3 is updated with the ALU result (since RD=011 selects R3 and RegWrite=1)
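The data flow above can be traced with a small Python model of the datapath. The IR layout Opcode(6) | RD(3) | RS1(3) | RS2(3) comes from the design. The opcode encodings, the 16-bit register width, and the omission of CF/OF are hypothetical simplifications made for the sketch.

```python
ALU = {  # function selected by ALU_OP
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}
OPCODES = {0b000000: "ADD", 0b000001: "SUB", 0b000010: "AND", 0b000011: "OR"}  # hypothetical encodings
WIDTH = 16  # hypothetical register width in bits


def encode(op: str, rd: int, rs1: int, rs2: int) -> int:
    """Build a 15-bit IR value laid out as Opcode(6) | RD(3) | RS1(3) | RS2(3)."""
    opcode = {name: code for code, name in OPCODES.items()}[op]
    return (opcode << 9) | (rd << 6) | (rs1 << 3) | rs2


def step(ir: int, regs: list) -> dict:
    """Execute one register-to-register instruction; return the updated ZF/SF."""
    opcode, rd = (ir >> 9) & 0b111111, (ir >> 6) & 0b111
    rs1, rs2 = (ir >> 3) & 0b111, ir & 0b111
    alu_op = OPCODES[opcode]             # control unit: decode the opcode into ALU_OP
    reg_write = True                     # ...and RegWrite = 1 for every ALU instruction here
    bus_a, bus_b = regs[rs1], regs[rs2]  # read ports A and B drive the two buses
    result = ALU[alu_op](bus_a, bus_b) & ((1 << WIDTH) - 1)
    flags = {"ZF": int(result == 0), "SF": (result >> (WIDTH - 1)) & 1}  # CF/OF omitted
    if reg_write:
        regs[rd] = result                # result bus -> write port, selected by RD
    return flags


regs = [0, 5, 7, 0, 0, 0, 0, 0]          # R0..R7, with R1 = 5 and R2 = 7
ir = encode("ADD", rd=3, rs1=1, rs2=2)
print(f"{ir:015b}")                      # 000000011001010
print(step(ir, regs), regs[3])           # {'ZF': 0, 'SF': 0} 12
```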
LA3: Comprehensive Addressing Modes with EA Calculations
Question: Given the following machine state, calculate the effective address and operand value for all 8 addressing modes:
Registers: R1=500, R2=100, PC=3000. Memory: M[200]=500, M[400]=700, M[500]=800, M[600]=42, M[700]=99, M[800]=55, M[3050]=25.
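Instruction address field = 200 (the Relative row uses an offset field of 50). The register field points to R1 (value 500); Displacement mode uses R2 (value 100) as its base register.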
Answer:
| Mode | EA Formula | EA Calculation | EA | Operand |
|---|---|---|---|---|
| Immediate | Operand = Addr field | Operand = 200 | N/A | 200 |
| Direct | EA = Addr | EA = 200 | 200 | M[200] = 500 |
| Indirect | EA = M[Addr] | EA = M[200] = 500 | 500 | M[500] = 800 |
| Register | Operand = R1 | Operand = R1 = 500 | N/A | 500 |
| Reg. Indirect | EA = [R1] | EA = R1 = 500 | 500 | M[500] = 800 |
| Autoincrement | EA = [R1]; R1++ | EA = 500; R1 ← 501 | 500 | M[500] = 800 |
| Displacement | EA = Addr + [R2] | EA = 200 + 100 = 300 | 300 | M[300] (not given) |
| Relative | EA = PC + Addr | EA = 3000 + 50 = 3050 | 3050 | M[3050] = 25 |
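The whole table can be regenerated from the machine state given in the question, as in this Python sketch. As in the table, it passes 50 as the offset for the Relative row and uses R2 as the displacement base. Each mode works on a fresh copy of the registers, so the autoincrement side effect does not leak into other rows.

```python
REGS = {"R1": 500, "R2": 100}
MEM = {200: 500, 400: 700, 500: 800, 600: 42, 700: 99, 800: 55, 3050: 25}
PC = 3000


def effective_address(mode, field, regs, mem, pc):
    """Return (EA, operand); EA is None when the operand is not fetched from memory."""
    if mode == "Immediate":
        return None, field           # operand is the address field itself
    if mode == "Register":
        return None, regs["R1"]      # operand is the register's content
    if mode == "Direct":
        ea = field
    elif mode == "Indirect":
        ea = mem[field]              # first memory access reads the pointer
    elif mode == "Reg. Indirect":
        ea = regs["R1"]
    elif mode == "Autoincrement":
        ea = regs["R1"]
        regs["R1"] += 1              # side effect after the access
    elif mode == "Displacement":
        ea = field + regs["R2"]      # R2 is the base register here
    elif mode == "Relative":
        ea = pc + field
    else:
        raise ValueError(f"unknown mode {mode}")
    return ea, mem.get(ea)           # None when M[EA] is not given


for mode in ["Immediate", "Direct", "Indirect", "Register", "Reg. Indirect",
             "Autoincrement", "Displacement", "Relative"]:
    field = 50 if mode == "Relative" else 200  # the table uses offset 50 for Relative
    ea, operand = effective_address(mode, field, dict(REGS), MEM, PC)
    print(f"{mode:14} EA={ea} operand={operand}")
```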
Key Observations:
- Immediate and Register modes don't access memory for the operand, so they are the fastest
- Indirect mode requires two memory accesses, so it is the slowest
- Register Indirect and Indirect can produce the same EA if the register and memory location hold the same value
- Autoincrement has a side effect: it modifies R1 for the next instruction (useful for array traversal)
- Relative addressing makes the code position-independent: the target moves with the code
Industry Spotlight — A Day in the Life
👨‍💻 Deepak Verma, 31, CPU Verification Engineer at Samsung Semiconductor, Noida
Background: B.Tech in Electronics from MNNIT Allahabad (2015). No GATE coaching, no IIT background. Joined as a fresher at a small Noida VLSI startup. Self-taught SystemVerilog and UVM during evenings. Moved to Samsung Semiconductor R&D after 3 years.
A Typical Day:
9:00 AM: Team standup. Review overnight regression results. 3 out of 1,200 test cases failed on the ARM Cortex-A core being verified.
10:00 AM: Debug failing test case #847, a corner case in the Load-Store Unit where back-to-back PUSH operations with interrupts cause a pipeline stall that isn't handled correctly.
12:00 PM: Write a new SystemVerilog assertion to catch this bug in future regressions. Run targeted simulation on the modified RTL.
1:30 PM: Lunch at Samsung's Noida campus cafeteria. Discuss ARM Cortex-X4 micro-architecture with the design team.
2:30 PM: Write coverage analysis report: 94.7% code coverage, 89.3% functional coverage. Identify 12 uncovered scenarios related to interrupt nesting.
4:30 PM: Review a colleague's UVM testbench for the branch prediction unit. Suggest improvements to constrained-random stimulus generation.
6:00 PM: Learning hour. Study the ARM Architecture Reference Manual (ARM ARM) for ARMv9 security extensions (Realm Management Extension). Samsung is implementing this for the next Galaxy flagship's chip.
| Detail | Info |
|---|---|
| Tools Used Daily | SystemVerilog, UVM, Synopsys VCS, Verdi (waveform debugger), ARM Fast Models, Git, Jira |
| Entry Salary (2024) | ₹6–9 LPA + benefits |
| Mid-Level (3–5 yrs) | ₹12–20 LPA |
| Senior (7+ yrs) | ₹25–45 LPA |
| Companies Hiring (India) | Samsung Semiconductor, Qualcomm Hyderabad, Intel Bangalore, AMD Hyderabad, Texas Instruments, MediaTek Noida, ARM India, Synopsys, Cadence, NXP |
| Required Skills | Verilog/SystemVerilog, UVM, CPU architecture knowledge, ARM/RISC-V ISA, digital design fundamentals |
Earn With It — Embedded Projects & CPU Design
Your Earning Path After This Chapter
Portfolio Piece: A documented ISA design (Tier 3 lab) + instruction format comparison + flag tracing analysis. This demonstrates CPU architecture knowledge to employers.
Beginner Gig Ideas:
• Arduino/ESP32 embedded programming projects: ₹3,000–₹10,000/project
• Assembly language tutoring for B.Tech students: ₹500–₹1,000/session
• Technical content writing (COA topics for edtech platforms): ₹1,500–₹5,000/article
• FPGA project implementation for final-year students: ₹5,000–₹15,000/project
| Opportunity | Platform | Earning Potential |
|---|---|---|
| Embedded Systems Freelancing | Upwork, Freelancer, Fiverr | ₹5,000–₹25,000/project |
| FPGA/Verilog Projects | Direct college outreach | ₹5,000–₹15,000/project |
| ARM Assembly Tutoring | Superprof, Chegg, local | ₹500–₹1,500/hour |
| Technical Blog Writing | Medium, GeeksforGeeks, EduArtha | ₹1,500–₹5,000/article |
| Raspberry Pi Projects | IoT project contracts | ₹3,000–₹12,000/project |
| VLSI Internships | Samsung, Qualcomm, Intel via Internshala | ₹15,000–₹40,000/month |
Chapter Summary
Key Concepts Covered in Unit 4
1. General Register Organization: CPU with R0–R7 register file, MUX A/B for source selection, ALU for computation, output bus for result writeback. 14-bit control word: SELA(3) + SELB(3) + SELD(3) + OPR(5).
2. Stack Organization: LIFO structure managed by the Stack Pointer (SP). PUSH: SP ← SP − 1, M[SP] ← DR. POP: DR ← M[SP], SP ← SP + 1. Used in expression evaluation (postfix/RPN), subroutine calls, and interrupt handling.
3. Instruction Formats: 3-address (OP D,S1,S2), 2-address (OP D,S), 1-address (accumulator), 0-address (stack). Trade-off: fewer addresses → shorter instructions but more of them.
4. Addressing Modes (8): Immediate, Direct, Indirect, Register, Register Indirect, Autoincrement, Displacement, Relative. Each offers different EA calculation, speed, and flexibility trade-offs.
5. RISC vs CISC: RISC = small instruction set, fixed-length, load/store, many registers, efficient pipelining (ARM, MIPS). CISC = large instruction set, variable-length, memory-to-memory, complex decode (x86). Modern Intel CPUs internally convert CISC instructions into RISC-like µops.
6. Data Transfer & Manipulation: LOAD, STORE, MOV, PUSH, POP (transfer). ADD, SUB, MUL, DIV (arithmetic). AND, OR, XOR, NOT (logical). SHL, SHR, ROL, ROR (shift/rotate).
7. Program Control & Interrupts: JMP, CALL, RET for control flow. Interrupts: hardware (external/internal), software, maskable, non-maskable. Priority schemes: daisy chain, parallel, software polling.
8. PSW/Flags: CF (carry), ZF (zero), SF (sign), OF (overflow), IF (interrupt enable). CF detects unsigned overflow; OF detects signed overflow. They are independent.
Key Formulas for Quick Revision
| Formula | Description |
|---|---|
| Control word bits = 3(SELA) + 3(SELB) + 3(SELD) + 5(OPR) = 14 | For 8-register organization |
| Register select bits = ⌈log₂(N)⌉ | N = number of registers |
| EA (Direct) = Address field | One memory access |
| EA (Indirect) = M[Address field] | Two memory accesses |
| EA (Displacement) = Addr + [R] | Base + offset |
| EA (Relative) = PC + offset | Position-independent code |
| OF = 1 when sign(A) = sign(B) ≠ sign(Result) | Signed overflow detection (addition A + B) |
Earning Checkpoint
| Skill | Tool / Method | Portfolio Artifact | Can You Earn? |
|---|---|---|---|
| General Register Organization | Pen & paper / diagrams | Annotated register org diagram | Yes: interview preparation asset |
| Stack Operations & Expression Eval | Manual tracing / Python | Stack trace tables for complex expressions | Yes: tutoring & content writing |
| Instruction Formats (3/2/1/0-addr) | Manual conversion | Instruction format comparison document | Yes: educational content creation |
| Addressing Modes (All 8) | EA calculation practice | Addressing modes cheat sheet | Yes: GATE coaching material |
| RISC vs CISC Comparison | Research & analysis | ARM vs Intel comparison report | Yes: technical blog writing |
| Assembly/Embedded Programming | ARM on Raspberry Pi | 3 embedded projects on GitHub | Yes: ₹3,000–₹12,000/project |
| Interrupt Handling | Conceptual + coding | Interrupt priority simulation | Not yet: need practical exposure |
| Flag/PSW Tracing | Manual bit-level tracing | Flag trace examples document | Yes: tutoring sessions |
Unit 4 complete. Ready for Unit 5: Pipeline & Vector Processing!
[QR: Link to EduArtha video tutorial: Central Processing Unit]