Computer Organization & Architecture

Unit 4: Central Processing Unit

From register files to interrupt handling — master CPU internals, instruction formats, addressing modes, and the RISC vs CISC battle that shapes every processor in your pocket.

⏱️ 7 hrs theory + 5 hrs lab | 🎯 GATE ~4 marks | 🖥️ ARM vs Intel

💼 Jobs this unlocks: VLSI Design Engineer (₹6–12 LPA) | Embedded Systems Engineer (₹5–10 LPA) | CPU Verification Engineer (₹8–18 LPA)

Section A

Opening Hook — Apple M4 vs Snapdragon X Elite: The CPU War

🔥 The RISC vs CISC Battle That Changed Computing Forever

In 2024, Apple unveiled the M4 chip, an ARM-based RISC processor that Apple pitches as far ahead of x86 laptop chips such as Intel's Core Ultra in performance-per-watt. The M4's Neural Engine alone is rated at 38 trillion operations per second, and a MacBook Pro built around it still sips battery like a phone. Meanwhile, Qualcomm's Snapdragon X Elite brought ARM to Windows laptops, threatening the 40-year dominance of x86 CISC chips on PCs.

Here's the twist: both M4 and Snapdragon X Elite are RISC processors — they use a reduced instruction set with fixed-length instructions. Intel's Core Ultra and AMD's Ryzen are CISC processors — complex instruction sets with variable-length instructions. For decades, everyone thought CISC won the PC war. Now RISC is eating CISC's lunch.

Behind every chip is a CPU architecture built from register files, ALUs, instruction decoders, and interrupt controllers, which is exactly what this chapter teaches you. Understand this chapter, and you'll understand the engineering behind a company valued at around $3 trillion.

🍎 Apple | 📱 Qualcomm | 💻 Intel | 🔴 AMD | 🇬🇧 ARM Holdings | 🇰🇷 Samsung

India is designing its own CPU! IIT Madras developed the SHAKTI processor — India's first indigenous RISC-V CPU. RISC-V is an open-source instruction set architecture (no licensing fees, unlike ARM). India's MeitY is investing ₹76,000 crore in semiconductor manufacturing under the India Semiconductor Mission. By 2030, Indian engineers may be designing CPUs rivalling Qualcomm and MediaTek.

Section B

Learning Outcomes — Bloom's Taxonomy Mapped

Bloom's Level	Learning Outcome
🔵 Remember	List all 8 addressing modes and define RISC vs CISC architectures
🔵 Remember	Identify components of General Register Organization: register file, MUX, ALU, output bus
🔵 Understand	Explain stack organization with PUSH/POP operations and Stack Pointer movement
🔵 Understand	Describe how 3-address, 2-address, 1-address, and 0-address instruction formats encode operations
🟢 Apply	Evaluate X=(A+B)*(C+D) using all four instruction formats with complete instruction sequences
🟢 Apply	Trace flag changes (CF, ZF, SF, OF) in the PSW after arithmetic operations like ADD 0x7FFF+0x0001
🟠 Analyze	Compare RISC vs CISC across 12+ parameters with ARM vs x86 real-world examples
🟠 Analyze	Determine effective addresses for all 8 addressing modes given memory contents and register values
🔴 Evaluate	Justify when to use stack organization vs register organization for different application scenarios
🔴 Evaluate	Assess interrupt priority handling schemes (daisy-chain vs parallel) for real-time embedded systems
🟣 Create	Design a simple instruction set with 16 instructions supporting at least 4 addressing modes
🟣 Create	Architect a CPU datapath for a given 3-address instruction format with control signals

Section C

Concept Explanation — CPU Architecture from Scratch

1. General Register Organization

A CPU's register organization determines how data flows between registers, the ALU, and memory. In a general register organization, the CPU has a set of general-purpose registers (typically R0–R7), and any register can be used as a source or destination for any operation. This is the most flexible and common organization used in modern CPUs.

🔧 Components of General Register Organization

Register File (R0–R7): A set of 8 general-purpose registers, each capable of holding one data word. Any register can serve as source or destination. Registers are faster than memory because they are inside the CPU.

MUX A (Multiplexer A): Selects one register as the first source operand (input A to ALU). Controlled by SELA (3-bit select line).

MUX B (Multiplexer B): Selects one register as the second source operand (input B to ALU). Controlled by SELB (3-bit select line).

ALU (Arithmetic Logic Unit): Performs the actual operation (ADD, SUB, AND, OR, etc.) on the two inputs from MUX A and MUX B. Controlled by OPR (5-bit operation code).

Output Bus: Carries the ALU result back to the register file. The destination register is selected by SELD (3-bit select line).

ASCII DIAGRAM
     ┌──────────────────────────────────────────┐
     │          REGISTER FILE (R0 – R7)         │
     │  ┌────┬────┬────┬────┬────┬────┬────┬────┐│
     │  │ R0 │ R1 │ R2 │ R3 │ R4 │ R5 │ R6 │ R7 ││
     │  └────┴────┴────┴────┴────┴────┴────┴────┘│
     └─────────┬──────────────────┬──────────────┘
               │                  │
          ┌────▼─────┐      ┌────▼─────┐
          │  MUX  A  │      │  MUX  B  │
          │ (SELA)   │      │ (SELB)   │
          └────┬─────┘      └────┬─────┘
               │   Input A       │  Input B
               ▼                 ▼
          ┌──────────────────────────┐
          │      A   L   U          │
          │     (OPR — 5 bits)      │
          └────────────┬────────────┘
                       │
                  Output Bus
                       │
                  ┌────▼─────┐
                  │  SELD    │ ← Destination register select
                  │ (3 bits) │
                  └──────────┘
                       │
               (writes back to Register File)

Control Word Format (14 bits)

Field	Bits	Purpose	Range
SELA	3	Select source register A (MUX A input)	000 (R0) to 111 (R7)
SELB	3	Select source register B (MUX B input)	000 (R0) to 111 (R7)
SELD	3	Select destination register for result	000 (R0) to 111 (R7)
OPR	5	ALU operation select	00000 (Transfer A) to 11111

Example: R3 ← R1 + R2

Field	Value	Meaning
SELA	001	Select R1 as source A
SELB	010	Select R2 as source B
SELD	011	Select R3 as destination
OPR	00010	ADD operation

Complete control word: 001 010 011 00010 (14 bits)

Students often confuse SELA/SELB with SELD. SELA and SELB select the source registers (inputs to ALU). SELD selects the destination register (where the result is stored). The output bus always writes to the register selected by SELD.
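
The control word layout above can be checked mechanically. The sketch below packs the four fields into one 14-bit word. The function names `control_word` and `format_word` are illustrative (they do not come from any library), and the OPR table contains only the operation codes quoted in this unit.

```python
# Illustrative OPR codes: only the ones quoted in this unit.
OPR_CODES = {
    "TRANSFER_A": 0b00000,
    "ADD": 0b00010,
    "AND": 0b01000,
    "XOR": 0b01100,
}


def control_word(sela, selb, seld, opr):
    """Pack SELA, SELB, SELD (3 bits each) and OPR (5 bits) into 14 bits."""
    for name, value in (("SELA", sela), ("SELB", selb), ("SELD", seld)):
        if not 0 <= value <= 7:
            raise ValueError(f"{name} must be a register number 0-7")
    if not 0 <= opr <= 31:
        raise ValueError("OPR must fit in 5 bits")
    # Field order, most significant first: SELA | SELB | SELD | OPR
    return (sela << 11) | (selb << 8) | (seld << 5) | opr


def format_word(word):
    """Show the word as 'SELA SELB SELD OPR' bit groups."""
    bits = format(word, "014b")
    return f"{bits[0:3]} {bits[3:6]} {bits[6:9]} {bits[9:14]}"


# R3 <- R1 + R2
print(format_word(control_word(1, 2, 3, OPR_CODES["ADD"])))  # 001 010 011 00010
```

Swapping the first two arguments changes which register feeds input A and input B of the ALU, while the third argument alone decides where the result lands.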

ARM Cortex-M4 has 16 core registers (R0–R15): R0–R12 are general-purpose, R13 is the Stack Pointer, R14 is the Link Register, and R15 is the Program Counter. The concept is the same as R0–R7 above, just scaled up, so this 8-register model is good preparation for real ARM programming.

2. Stack Organization

A stack is a last-in-first-out (LIFO) storage structure. In CPU architecture, stacks are used for subroutine calls (saving return addresses), expression evaluation, and interrupt handling. The stack is managed by a special register called the Stack Pointer (SP).

📚 Memory Stack — PUSH and POP Operations

PUSH operation (add to stack):

Step 1: SP ← SP − 1 (decrement stack pointer — stack grows downward)

Step 2: M[SP] ← DR (write data register value to memory at SP)

POP operation (remove from stack):

Step 1: DR ← M[SP] (read value from memory at SP into data register)

Step 2: SP ← SP + 1 (increment stack pointer — stack shrinks upward)

MEMORY STACK
    Address    Memory        Notes
    ┌────────┬────────────┐
    │  4000  │  (empty)   │  ← Initial SP (stack empty)
    ├────────┼────────────┤
    │  3999  │  Data_1    │  ← First item pushed
    ├────────┼────────────┤
    │  3998  │  Data_2    │
    ├────────┼────────────┤
    │  3997  │  Data_3    │  ← SP after 3 PUSHes (Top of Stack)
    ├────────┼────────────┤
    │  ...   │   ...      │
    ├────────┼────────────┤
    │  3000  │  (limit)   │  ← Stack limit (overflow if SP < 3000)
    └────────┴────────────┘

    Stack grows DOWNWARD (address decreases on PUSH)
    Stack shrinks UPWARD (address increases on POP)
    FULL condition:  SP = 3000 (stack limit reached)
    EMPTY condition: SP = 4000 (initial value)
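
The two-step PUSH and POP sequences can be sketched in Python as follows. This is a minimal model, not a hardware description: the class name `MemoryStack` is hypothetical, memory is a plain dictionary, and the base (4000) and limit (3000) are the values used in this section.

```python
class MemoryStack:
    """Descending memory stack: PUSH decrements SP, POP increments SP."""

    def __init__(self, base=4000, limit=3000):
        self.base = base      # SP value when the stack is empty
        self.limit = limit    # lowest address the stack may use
        self.sp = base
        self.memory = {}      # sparse memory: address -> word

    def push(self, value):
        if self.sp == self.limit:
            raise OverflowError("stack full")
        self.sp -= 1                  # Step 1: SP <- SP - 1
        self.memory[self.sp] = value  # Step 2: M[SP] <- DR

    def pop(self):
        if self.sp == self.base:
            raise IndexError("stack empty")
        value = self.memory[self.sp]  # Step 1: DR <- M[SP]
        self.sp += 1                  # Step 2: SP <- SP + 1
        return value


stack = MemoryStack()
for item in ("Data_1", "Data_2", "Data_3"):
    stack.push(item)
print(stack.sp)     # 3997
print(stack.pop())  # Data_3
print(stack.sp)     # 3998
```

Because SP is decremented before the write, the first item pushed lands at address 3999 and the most recent item always sits at the lowest used address.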

0-Address Instructions & Stack-Based Expression Evaluation

In 0-address (stack) architecture, instructions like ADD don't specify operands — they implicitly pop two values from the stack, operate, and push the result. This is how the expression (A+B)×C is evaluated using Reverse Polish Notation (RPN).

Infix: (A + B) × C

RPN (Postfix): A B + C ×

Assume A=3, B=5, C=4:

Step	Instruction	Action	Stack (top →)	Result
1	PUSH A	Push 3	3	—
2	PUSH B	Push 5	3, 5	—
3	ADD	Pop 5,3; Push 3+5	8	A+B = 8
4	PUSH C	Push 4	8, 4	—
5	MUL	Pop 4,8; Push 8×4	32	(A+B)×C = 32
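
The trace above can be reproduced with a short postfix evaluator. This is a sketch under stated assumptions: the function name `eval_rpn` is illustrative, operators are written as ASCII symbols, and division is integer division to mimic whole-word arithmetic.

```python
import operator

# ASCII operator symbols mapped to integer operations (an assumption of this sketch).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}


def eval_rpn(tokens, values):
    """Evaluate a postfix token list using an explicit stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()  # top of stack is the second operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(values[token])  # PUSH the named operand
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_rpn("A B + C *".split(), {"A": 3, "B": 5, "C": 4}))  # 32
```

The order of the two pops matters for SUB and DIV: the value popped first is the right-hand operand.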

Java's JVM is a stack-based virtual machine! Java bytecode instructions operate on an operand stack. When you write int x = a + b; in Java, the JVM internally executes: ILOAD a, ILOAD b, IADD, ISTORE x. These are pure stack operations, and IADD takes no operands at all. PostScript printers and HP calculators also use stack-based evaluation.

3. Instruction Formats

An instruction format defines the layout of bits in a machine instruction — how many addresses (operands) are specified, what fields are present, and how long each field is. The number of address fields determines the instruction type.

Evaluating X = (A + B) × (C + D) in All Four Formats

3-Address Format: OP DEST, SRC1, SRC2

3-ADDRESS
Instruction 1:  ADD  R1, A, B      ; R1 ← M[A] + M[B]
Instruction 2:  ADD  R2, C, D      ; R2 ← M[C] + M[D]
Instruction 3:  MUL  X,  R1, R2    ; M[X] ← R1 × R2

Total instructions: 3
Advantage: Fewest instructions, most information per instruction
Disadvantage: Longest instruction word (3 address fields)

2-Address Format: OP DEST, SRC (DEST ← DEST op SRC)

2-ADDRESS
Instruction 1:  MOV  R1, A      ; R1 ← M[A]
Instruction 2:  ADD  R1, B      ; R1 ← R1 + M[B]        → R1 = A+B
Instruction 3:  MOV  R2, C      ; R2 ← M[C]
Instruction 4:  ADD  R2, D      ; R2 ← R2 + M[D]        → R2 = C+D
Instruction 5:  MUL  R1, R2     ; R1 ← R1 × R2          → R1 = (A+B)×(C+D)
Instruction 6:  MOV  X,  R1     ; M[X] ← R1

Total instructions: 6
Advantage: Moderate instruction length
Disadvantage: One source is always overwritten (destructive)

1-Address Format: OP ADDR (uses Accumulator AC implicitly)

1-ADDRESS
Instruction 1:  LOAD  A        ; AC ← M[A]
Instruction 2:  ADD   B        ; AC ← AC + M[B]        → AC = A+B
Instruction 3:  STORE T        ; M[T] ← AC             → save A+B
Instruction 4:  LOAD  C        ; AC ← M[C]
Instruction 5:  ADD   D        ; AC ← AC + M[D]        → AC = C+D
Instruction 6:  MUL   T        ; AC ← AC × M[T]        → AC = (C+D)×(A+B)
Instruction 7:  STORE X        ; M[X] ← AC

Total instructions: 7
Advantage: Short instruction word
Disadvantage: Needs temporary storage, more instructions

0-Address Format: OP (uses Stack implicitly)

0-ADDRESS
Instruction 1:  PUSH  A        ; TOS ← A
Instruction 2:  PUSH  B        ; TOS ← B
Instruction 3:  ADD             ; Pop B,A; Push A+B
Instruction 4:  PUSH  C        ; TOS ← C
Instruction 5:  PUSH  D        ; TOS ← D
Instruction 6:  ADD             ; Pop D,C; Push C+D
Instruction 7:  MUL             ; Pop (C+D),(A+B); Push (A+B)×(C+D)
Instruction 8:  POP   X        ; M[X] ← TOS

Total instructions: 8
Advantage: Shortest instruction word, hardware-friendly
Disadvantage: Most instructions needed, stack management overhead

Comparison of Instruction Formats

Parameter	3-Address	2-Address	1-Address	0-Address
Fields	OP + 3 addr	OP + 2 addr	OP + 1 addr	OP only
Instruction Length	Longest	Medium	Short	Shortest
Instructions for X=(A+B)×(C+D)	3	6	7	8
Program Size	Fewest instructions	Moderate	More	Most instructions
Memory Access	Multiple per instruction	2 per instruction	1 per instruction	Stack only
Register Usage	General purpose	General purpose	Accumulator	Stack
Example CPU	ARM, MIPS	x86 (MOV, ADD)	Early PDP-8	JVM, HP calculators

Don't confuse instruction count with program efficiency. 3-address has fewer instructions but each is longer (more bits). 0-address has more instructions but each is very short. Total program size in bits may be similar. GATE questions often ask you to compare total memory needed, not just instruction count.
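
To make the size comparison concrete, the sketch below totals the bits for the four programs listed earlier. The field widths are assumptions chosen for illustration only (8-bit opcode, 16 bits for every operand field, registers included); a real ISA would encode register operands in far fewer bits.

```python
OPCODE_BITS = 8    # assumed opcode width
ADDRESS_BITS = 16  # assumed width of every operand field

PROGRAMS = {
    "3-address": ["ADD R1, A, B", "ADD R2, C, D", "MUL X, R1, R2"],
    "2-address": ["MOV R1, A", "ADD R1, B", "MOV R2, C", "ADD R2, D",
                  "MUL R1, R2", "MOV X, R1"],
    "1-address": ["LOAD A", "ADD B", "STORE T", "LOAD C", "ADD D",
                  "MUL T", "STORE X"],
    "0-address": ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D",
                  "ADD", "MUL", "POP X"],
}


def operand_count(instruction):
    """Count the comma-separated operands after the mnemonic."""
    parts = instruction.split(None, 1)
    return 0 if len(parts) == 1 else len(parts[1].split(","))


def program_bits(program):
    """Total size of a program in bits under the assumed field widths."""
    return sum(OPCODE_BITS + ADDRESS_BITS * operand_count(i) for i in program)


for name, program in PROGRAMS.items():
    print(f"{name}: {len(program)} instructions, {program_bits(program)} bits")
```

With these widths the 3-address and 1-address programs both come to 168 bits, even though one has 3 instructions and the other 7. Note that PUSH and POP still carry one address each, so a "0-address" program is not made of opcode-only instructions.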

4. Addressing Modes — All 8

An addressing mode specifies how the CPU calculates the effective address (EA) of an operand. Different modes provide different trade-offs between flexibility, speed, and code compactness. Mastering all 8 modes is essential for GATE and CPU design.

Mode 1: Immediate Addressing

Definition: The operand value is directly contained in the instruction itself. No memory access needed for operand.

EA: No effective address — operand is part of instruction.

IMMEDIATE
  Instruction: [ OP | #25 ]
                      │
                      └── Operand = 25 (directly in instruction)

  Example: MOV R1, #25    → R1 = 25
  Use: Loading constants, initializing counters
  Speed: ★★★★★ Fastest (no memory access for operand)

Mode 2: Direct Addressing

Definition: The address field contains the actual memory address of the operand.

EA = Address field of instruction

DIRECT
  Instruction: [ OP | 500 ]
                      │
                      ▼
               Memory[500] = 42   ← Operand

  EA = 500, Operand = 42
  Example: LOAD 500        → AC = M[500] = 42
  Use: Accessing global variables
  Speed: ★★★★ One memory access

Mode 3: Indirect Addressing

Definition: The address field points to a memory location that contains the effective address. Two memory accesses needed.

EA = M[Address field]

INDIRECT
  Instruction: [ OP | 500 ]
                      │
                      ▼
               Memory[500] = 800   ← This is the EA (pointer)
                      │
                      ▼
               Memory[800] = 42    ← Actual operand

  EA = 800, Operand = 42
  Example: LOAD @500       → AC = M[M[500]] = M[800] = 42
  Use: Pointers, dynamic memory access, linked lists
  Speed: ★★★ Two memory accesses (slower)

Mode 4: Register Addressing

Definition: The operand is in a CPU register. No memory access needed.

EA: None — operand is in the specified register.

REGISTER
  Instruction: [ OP | R3 ]
                      │
                      ▼
                 R3 = 42   ← Operand is in register R3

  Operand = R3 = 42
  Example: ADD R1, R3      → R1 = R1 + R3
  Use: Fastest operations, loop variables
  Speed: ★★★★★ No memory access

Mode 5: Register Indirect Addressing

Definition: The register contains the memory address of the operand.

EA = [Register] (contents of register is the address)

REGISTER INDIRECT
  Instruction: [ OP | R3 ]
                      │
                      ▼
                 R3 = 800   ← R3 holds memory address
                      │
                      ▼
               Memory[800] = 42   ← Actual operand

  EA = 800 (value in R3), Operand = 42
  Example: LOAD (R3)       → AC = M[R3] = M[800] = 42
  Use: Array access, pointer dereferencing
  Speed: ★★★★ One memory access

Mode 6: Autoincrement Addressing

Definition: Like register indirect, but the register is automatically incremented after use. Perfect for array traversal.

EA = [R]; then R ← R + 1

AUTOINCREMENT
  Before: R3 = 800

  Instruction: [ OP | (R3)+ ]
                      │
                 R3 = 800 → Memory[800] = 42   ← Operand
                      │
                 R3 = 801   ← R3 auto-incremented after access

  EA = 800, Operand = 42, R3 updated to 801
  Example: LOAD (R3)+      → AC = M[R3]; R3 = R3 + 1
  Use: Array traversal, sequential data processing
  Speed: ★★★★ One memory access + register update

Mode 7: Displacement (Indexed) Addressing

Definition: EA is computed by adding a constant displacement in the instruction to the contents of a register.

EA = Address field + [R]

DISPLACEMENT / INDEXED
  Instruction: [ OP | 100 | R2 ]
                      │      │
                      │   R2 = 500
                      │      │
                      └──+───┘
                         │
                    EA = 100 + 500 = 600
                         │
                  Memory[600] = 42   ← Operand

  EA = 600, Operand = 42
  Example: LOAD 100(R2)    → AC = M[100 + R2] = M[600] = 42
  Use: Accessing struct fields, array elements with base
  Speed: ★★★★ One addition + one memory access

Mode 8: Relative Addressing

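Definition: EA is computed by adding the address field (a signed offset) to the Program Counter (PC). Used for branch instructions. On most real CPUs the PC already points to the next instruction when the offset is added; the example below uses the branch instruction's own address to keep the arithmetic simple.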

EA = PC + Address field

RELATIVE
  Instruction at PC=200: [ OP | +50 ]
                                 │
                      PC = 200 + 50 = 250
                                 │
                          EA = 250 (target of branch)

  EA = 250
  Example: BEQ +50          → if zero flag set, jump to PC+50 = 250
  Use: Branch/jump instructions, position-independent code
  Speed: ★★★★ One addition (no memory access for address)

Comparison of All 8 Addressing Modes

Mode	EA Formula	Memory Accesses	Speed	Use Case
Immediate	Operand in instruction	0	★★★★★	Constants
Direct	EA = Addr	1	★★★★	Global variables
Indirect	EA = M[Addr]	2	★★★	Pointers
Register	Operand = R	0	★★★★★	Loop variables
Reg. Indirect	EA = [R]	1	★★★★	Array via pointer
Autoincrement	EA = [R]; R++	1	★★★★	Array traversal
Displacement	EA = Addr + [R]	1	★★★★	Struct fields
Relative	EA = PC + Addr	0	★★★★	Branches/jumps

GATE shortcut: If a question asks "how many memory accesses to fetch operand?" remember: Immediate=0, Register=0, Direct=1, Register Indirect=1, Autoincrement=1, Displacement=1, Relative=0 (for address calculation), Indirect=2. Don't forget the instruction fetch itself is also a memory access!
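
The eight EA rules can be collected into one small resolver. This is a sketch with hypothetical names (`resolve`, the mode strings) and made-up sample values; registers and memory are plain dictionaries, and the relative mode returns only the branch target.

```python
def resolve(mode, field, regs, memory, pc=0):
    """Return (effective_address, operand); EA is None when no address is formed."""
    if mode == "immediate":
        return None, field                 # operand is inside the instruction
    if mode == "direct":
        return field, memory[field]
    if mode == "indirect":
        ea = memory[field]                 # first memory access fetches the pointer
        return ea, memory[ea]              # second access fetches the operand
    if mode == "register":
        return None, regs[field]
    if mode == "register_indirect":
        ea = regs[field]
        return ea, memory[ea]
    if mode == "autoincrement":
        ea = regs[field]
        regs[field] = ea + 1               # register updated after the access
        return ea, memory[ea]
    if mode == "displacement":
        offset, reg = field
        ea = offset + regs[reg]
        return ea, memory[ea]
    if mode == "relative":
        return pc + field, None            # branch target, no operand fetch
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 200, "R3": 600}
memory = {300: 600, 600: 42, 700: 55}
print(resolve("direct", 300, regs, memory))                 # (300, 600)
print(resolve("indirect", 300, regs, memory))               # (600, 42)
print(resolve("displacement", (500, "R1"), regs, memory))   # (700, 55)
```

Counting the `memory[...]` lookups in each branch gives the "memory accesses" column of the comparison table.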

5. RISC vs CISC Architecture

The two dominant CPU design philosophies are RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer). This is one of the most important comparisons in computer architecture and a frequent GATE topic.

Parameter	RISC	CISC
Full Form	Reduced Instruction Set Computer	Complex Instruction Set Computer
Instruction Set Size	Small (50–150 instructions)	Large (200–300+ instructions)
Instruction Length	Fixed (32-bit typically)	Variable (1–15 bytes in x86)
Execution Time	1 clock cycle per instruction (mostly)	Multiple cycles per instruction
Addressing Modes	Few (3–5 modes)	Many (12–20+ modes)
Pipelining	Highly efficient (fixed-length helps)	Difficult (variable-length hinders)
Registers	Many (32–64 general purpose)	Few (8–16 general purpose)
Memory Access	Only LOAD/STORE access memory	Any instruction can access memory
Code Size	Larger (more instructions needed)	Smaller (complex instructions do more)
Hardware Complexity	Simple (hardwired control)	Complex (microprogrammed control)
Power Consumption	Lower (simpler circuits)	Higher (complex decode logic)
Examples	ARM, MIPS, RISC-V, SPARC, PowerPC	Intel x86, AMD x86-64, VAX, Motorola 68k
Compiler Complexity	More complex (must optimize simple ops)	Simpler (hardware does the work)
Use Cases	Mobile, embedded, IoT, laptops (Apple M4)	Desktops, servers, legacy PCs

ARM (RISC) vs Intel x86 (CISC) — Real World

Parameter	ARM Cortex-A78 (RISC)	Intel Core i7-14700K (CISC)
Instruction Width	Fixed 32-bit (or 16-bit Thumb)	Variable 1–15 bytes
Registers	31 general-purpose (AArch64)	16 general-purpose (x86-64)
Power (TDP)	~1–5W per core	~125W package
Pipeline Stages	11–13 stages	14–19 stages
Market	99% of smartphones, Apple MacBooks	~75% of desktops/servers
India Usage	Every Indian smartphone, Raspberry Pi	Office PCs, data centers

IIT Madras developed SHAKTI — India's first home-grown processor based on RISC-V. The C-class core runs Linux and targets IoT and edge computing. RISC-V is open-source (no ARM licensing fees), which makes it strategic for India's semiconductor independence. The VEGA processor by C-DAC is another Indian RISC-V initiative targeting HPC applications.

6. Data Transfer & Manipulation Instructions

Data Transfer Instructions

Instruction	Operation	Example	Description
LOAD	AC ← M[addr]	LOAD 500	Load memory into accumulator
STORE	M[addr] ← AC	STORE 600	Store accumulator to memory
MOV	DEST ← SRC	MOV R1, R2	Copy data between registers
PUSH	SP--; M[SP] ← R	PUSH R3	Push register onto stack
POP	R ← M[SP]; SP++	POP R3	Pop stack top into register
XCHG	R1 ↔ R2	XCHG R1, R2	Exchange contents of two registers
IN	R ← Port	IN R1, PORT_A	Input from I/O port
OUT	Port ← R	OUT PORT_B, R1	Output to I/O port

Data Manipulation Instructions

Category	Instruction	Operation	Example
Arithmetic	ADD	R1 ← R1 + R2	ADD R1, R2
	SUB	R1 ← R1 − R2	SUB R1, R2
	MUL	R1 ← R1 × R2	MUL R1, R2
	DIV	R1 ← R1 ÷ R2	DIV R1, R2
Logical	AND	R1 ← R1 AND R2	AND R1, R2
	OR	R1 ← R1 OR R2	OR R1, R2
	XOR	R1 ← R1 XOR R2	XOR R1, R2
	NOT	R1 ← complement of R1	NOT R1
Shift	SHL	Shift left logical	SHL R1, 1
	SHR	Shift right logical	SHR R1, 1
	ROL	Rotate left	ROL R1, 2
	ROR	Rotate right	ROR R1, 2
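
The difference between a shift and a rotate is easiest to see on a fixed-width word. The helpers below are a sketch assuming a 16-bit register; the function names simply mirror the mnemonics in the table.

```python
def _mask(bits):
    return (1 << bits) - 1


def shl(value, count, bits=16):
    """Logical shift left: bits pushed past the MSB are lost."""
    return (value << count) & _mask(bits)


def shr(value, count, bits=16):
    """Logical shift right: zeros enter from the MSB side."""
    return (value & _mask(bits)) >> count


def rol(value, count, bits=16):
    """Rotate left: bits leaving the MSB re-enter at the LSB."""
    count %= bits
    value &= _mask(bits)
    return ((value << count) | (value >> (bits - count))) & _mask(bits)


def ror(value, count, bits=16):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    count %= bits
    value &= _mask(bits)
    return ((value >> count) | (value << (bits - count))) & _mask(bits)


x = 0x8001
print(hex(shl(x, 1)), hex(rol(x, 1)))  # 0x2 0x3
print(hex(shr(x, 1)), hex(ror(x, 1)))  # 0x4000 0xc000
```

A shift discards the bit that falls off the end, while a rotate feeds it back in at the other side, so rotating by the word width returns the original value.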

7. Program Control & Interrupts

Program Control Instructions

Instruction	Operation	Description
JMP addr	PC ← addr	Unconditional jump
BEQ addr	If ZF=1, PC ← addr	Branch if equal (zero flag set)
BNE addr	If ZF=0, PC ← addr	Branch if not equal
BGT addr	If ZF=0 and SF=OF, PC ← addr	Branch if greater than
CALL addr	PUSH PC; PC ← addr	Call subroutine (save return address)
RET	POP PC	Return from subroutine
NOP	No operation	Pipeline delay, alignment
HLT	Halt processor	Stop execution

Interrupts

An interrupt is a signal that diverts the CPU from its current program to execute a special routine called an Interrupt Service Routine (ISR). After handling the interrupt, the CPU returns to the original program.

Type	Source	Example	Priority
Hardware External	I/O devices, timers	Keyboard press, disk ready	High
Hardware Internal	CPU itself	Division by zero, overflow	Highest
Software	Instruction in program	INT 21h (DOS), SVC (ARM)	Programmed
Non-Maskable (NMI)	Critical hardware	Power failure, memory parity error	Cannot be disabled
Maskable	Peripheral devices	Printer ready, serial data	Can be disabled via IF flag

Interrupt Handling Cycle

INTERRUPT FLOW
  Program Execution
        │
        ▼
  ┌─────────────────┐
  │ Interrupt Signal │ ← Device sends interrupt request
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ Finish Current  │ ← CPU completes current instruction
  │ Instruction     │
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ Save Context    │ ← Push PC and PSW onto stack
  │ (PC, PSW, Regs) │
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ Identify Source  │ ← Polling or vectored interrupt
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ Load ISR Address │ ← From interrupt vector table
  │ into PC         │
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ Execute ISR     │ ← Handle the interrupt
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ RTI (Return     │ ← Pop PC and PSW from stack
  │ from Interrupt) │
  └────────┬────────┘
           ▼
  Resume Original Program
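
The save, vector, and restore steps of this flow can be sketched in Python. Everything here is hypothetical: the class `TinyCPU`, the source names, and the ISR addresses in the vector table are invented for illustration, and only PC and PSW are saved.

```python
class TinyCPU:
    """Toy model of interrupt entry and RTI."""

    def __init__(self, vector_table):
        self.pc = 0
        self.psw = 0
        self.stack = []
        self.vector_table = vector_table  # interrupt source -> ISR start address

    def enter_interrupt(self, source):
        # Save context: push PC, then PSW
        self.stack.append(self.pc)
        self.stack.append(self.psw)
        # Load the ISR address from the interrupt vector table
        self.pc = self.vector_table[source]

    def rti(self):
        # Restore in reverse order: PSW first, then PC
        self.psw = self.stack.pop()
        self.pc = self.stack.pop()


cpu = TinyCPU({"timer": 0x40, "keyboard": 0x48})
cpu.pc, cpu.psw = 120, 0b0101
cpu.enter_interrupt("timer")
print(hex(cpu.pc))       # 0x40
cpu.rti()
print(cpu.pc, cpu.psw)   # 120 5
```

Because the context lives on a stack, a second interrupt arriving inside the ISR can be handled the same way, and each RTI unwinds exactly one level.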

Priority Interrupt Systems

Method	How It Works	Pros	Cons
Daisy Chain	Devices connected in series; closest to CPU has highest priority	Simple hardware	Fixed priority, slow for many devices
Parallel Priority	Each device has dedicated line; priority encoder selects highest	Fast, flexible	More hardware, more wires
Software Polling	CPU checks each device in sequence via status registers	No extra hardware	Slow, wastes CPU cycles
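
The first two schemes can be contrasted with a small model. This is a sketch only: the function names are hypothetical, requests are lists of booleans, and the mask list in the parallel version is an assumption added to illustrate why that scheme counts as more flexible.

```python
def daisy_chain_grant(requests):
    """requests[0] is the device closest to the CPU (highest priority).

    The acknowledge signal passes down the chain and stops at the
    first device that is requesting service.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position
    return None


def parallel_priority_grant(requests, mask):
    """Priority encoder over dedicated request lines.

    Line 0 has the highest priority; a False mask entry disables that line.
    """
    for line, (requesting, enabled) in enumerate(zip(requests, mask)):
        if requesting and enabled:
            return line
    return None


print(daisy_chain_grant([False, True, True]))                              # 1
print(parallel_priority_grant([True, True, False], [False, True, True]))   # 1
```

In real hardware the parallel encoder resolves all lines at once, whereas the daisy-chain acknowledge ripples through the devices one after another; the loops here model only the outcome, not the timing.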

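The ARM Cortex-M series has the NVIC (Nested Vectored Interrupt Controller), which supports up to 240 interrupt sources and up to 256 priority levels (the exact numbers depend on the chip). When a microcontroller is busy in a low-priority ISR, say streaming audio samples, and a higher-priority interrupt such as a radio packet arrives, the NVIC lets the new ISR pre-empt the running one within roughly a dozen clock cycles. Phone application processors use Cortex-A cores with a different controller, the GIC (Generic Interrupt Controller), but the priority idea is the same.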

8. Processor Status Word (PSW) & Flags

The Processor Status Word (PSW), also called the Flags Register or EFLAGS (in x86), is a special register that holds condition codes set by ALU operations. These flags are used by conditional branch instructions to make decisions.

Flag	Full Name	Set When	Example
CF	Carry Flag	Unsigned operation produces carry/borrow out of MSB	0xFFFF + 0x0001 → CF=1
ZF	Zero Flag	Result of operation is zero	5 − 5 = 0 → ZF=1
SF	Sign Flag	Result is negative (MSB = 1 in signed representation)	3 − 5 = −2 → SF=1
OF	Overflow Flag	Signed operation exceeds representable range	0x7FFF + 0x0001 → OF=1
IF	Interrupt Flag	Set = interrupts enabled; Clear = interrupts disabled	CLI clears IF, STI sets IF

Trace: ADD 0x7FFF + 0x0001 (16-bit signed)

FLAG TRACE
  Operand A:  0x7FFF  =  0111 1111 1111 1111  (+32767, max positive 16-bit)
  Operand B:  0x0001  =  0000 0000 0000 0001  (+1)

  Binary Addition:
    0111 1111 1111 1111   (0x7FFF = +32767)
  + 0000 0000 0000 0001   (0x0001 = +1)
  ─────────────────────
    1000 0000 0000 0000   (0x8000 = -32768 in signed!)

  Result = 0x8000

  Flag Analysis:
  ┌──────┬────────┬──────────────────────────────────────────┐
  │ Flag │ Value  │ Reason                                   │
  ├──────┼────────┼──────────────────────────────────────────┤
  │  CF  │   0    │ No carry out of bit 15 (unsigned OK)     │
  │  ZF  │   0    │ Result ≠ 0                               │
  │  SF  │   1    │ MSB = 1 (result appears negative)        │
  │  OF  │   1    │ +ve + +ve = −ve → signed overflow!       │
  └──────┴────────┴──────────────────────────────────────────┘

  Explanation: Adding two positive numbers (0x7FFF + 0x0001) gave a
  negative result (0x8000 = -32768). This is SIGNED OVERFLOW.
  OF is set because both operands have sign bit 0 but the result has
  sign bit 1. CF is NOT set because the unsigned sum (32768) still fits
  in 16 bits, so there is no carry out of bit 15.

More Flag Trace Examples

Operation	Result (16-bit)	CF	ZF	SF	OF
0x0005 + 0x0003	0x0008	0	0	0	0
0xFFFF + 0x0001	0x0000	1	1	0	0
0x8000 + 0x8000	0x0000	1	1	0	1
0x0005 − 0x0005	0x0000	0	1	0	0
0x0003 − 0x0005	0xFFFE	1	0	1	0
0x7000 + 0x7000	0xE000	0	0	1	1

CF and OF are different! CF detects unsigned overflow (carry out of MSB). OF detects signed overflow (when the sign of the result is wrong). A single ADD can set CF=1, OF=0 or CF=0, OF=1 or both. They are independent flags checking different things.
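
The flag rules can be written out as two short functions. This is a sketch, not the definition used by any particular CPU: the names `add_flags` and `sub_flags` are illustrative, and CF after subtraction follows the x86-style borrow convention used in the table above.

```python
def add_flags(a, b, bits=16):
    """Return (result, flags) for an unsigned/two's-complement ADD."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "CF": int(full > mask),            # carry out of the MSB
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        # Overflow: operands share a sign, result has the other sign
        "OF": int(bool(~(a ^ b) & (a ^ result) & sign)),
    }
    return result, flags


def sub_flags(a, b, bits=16):
    """Return (result, flags) for a - b; CF=1 means a borrow was needed."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    flags = {
        "CF": int(a < b),                  # borrow (x86-style convention)
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        # Overflow: operands differ in sign, result sign differs from a
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),
    }
    return result, flags


result, flags = add_flags(0x7FFF, 0x0001)
print(hex(result), flags)  # 0x8000 {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1}
```

Passing `bits=8` reuses the same functions for byte-sized problems.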

Section D

Learn by Doing — 3-Tier Lab Structure

🟢 Tier 1 — GUIDED: Instruction Format Converter (Pen & Paper)

⏱️ 60–90 minutes | Beginner | Zero prior knowledge assumed

Objective:

Convert the expression X = (A + B) × (C + D) into all 4 instruction formats by hand, showing every step.

Step 1: Write the Expression Tree

Draw the expression tree for X = (A + B) × (C + D):

         ×
        / \
       +   +
      / \ / \
     A  B C  D

Step 2: 3-Address Format

Each instruction specifies: OP destination, source1, source2

Write three instructions: (1) ADD T1, A, B (2) ADD T2, C, D (3) MUL X, T1, T2

Step 3: 2-Address Format

One operand is both source and destination. You need MOV to copy initial values.

Write: MOV R1,A → ADD R1,B → MOV R2,C → ADD R2,D → MUL R1,R2 → MOV X,R1

Step 4: 1-Address Format (Accumulator)

All operations use the implicit accumulator AC. Need STORE for temporary results.

Write: LOAD A → ADD B → STORE T → LOAD C → ADD D → MUL T → STORE X

Step 5: 0-Address Format (Stack)

Convert to postfix (RPN): A B + C D + ×

Write: PUSH A → PUSH B → ADD → PUSH C → PUSH D → ADD → MUL → POP X

Step 6: Fill the Comparison Table

Count: instructions, memory accesses, and total bits needed for each format. Create a table comparing all four.

🎉 Deliverable: A clean, hand-written (or typed) comparison showing all 4 formats with instruction counts and analysis. Take a photo for your portfolio.

🟡 Tier 2 — SEMI-GUIDED: CPU Register Simulator in Python

⏱️ 90–120 minutes | Intermediate | Basic Python knowledge assumed

Your Mission:

Build a Python simulator that models a simple CPU with 8 registers (R0–R7), an ALU, and a memory of 256 words. Implement LOAD, STORE, ADD, SUB, and MOV instructions.

Starter Code (you complete the TODOs):

Python
class SimpleCPU:
    def __init__(self):
        self.registers = [0] * 8       # R0-R7
        self.memory = [0] * 256        # 256-word memory
        self.flags = {'CF':0, 'ZF':0, 'SF':0, 'OF':0}

    def load(self, reg, addr):
        # TODO: Load memory[addr] into registers[reg]
        pass

    def store(self, reg, addr):
        # TODO: Store registers[reg] into memory[addr]
        pass

    def add(self, dest, src1, src2):
        # TODO: registers[dest] = registers[src1] + registers[src2]
        # TODO: Update ZF, SF, CF, OF flags
        pass

    def display_state(self):
        # TODO: Print all register values and flags
        pass

Test Case:

Store A=5 at memory[100], B=3 at memory[101]. Execute: LOAD R1,100 → LOAD R2,101 → ADD R3,R1,R2 → STORE R3,102. Verify memory[102] = 8.

Stretch Goal: Add AND and OR instructions, update the flags after every ALU operation (not just ADD), and add a simple instruction parser that reads assembly-like text files and executes them.

🔴 Tier 3 — OPEN CHALLENGE: Design a Custom CPU Instruction Set

⏱️ 2–3 hours | Advanced | No instructions, real-world design project

The Brief:

Design a complete instruction set architecture (ISA) for a hypothetical 16-bit CPU called ARTHA-16. Your design must include:

16 instructions covering: data transfer (4), arithmetic (4), logical (3), control flow (3), stack (2)
At least 4 addressing modes: Immediate, Direct, Register, Register Indirect
Instruction encoding: 16-bit fixed format. Show the bit layout for each instruction type.
8 registers: R0–R5 (general), R6 (SP), R7 (PC)
Sample program: Write a program to compute factorial of 5 using your ISA
Documentation: Create a 2-page ISA reference card (like ARM's quick reference)

Deliverable: A PDF/Google Doc with your ISA specification, encoding tables, and sample program. This is a portfolio-worthy project for embedded systems roles.

This is exactly what CPU architects do at Qualcomm, ARM, and Intel. Companies like Samsung Semiconductor Noida hire freshers who can demonstrate ISA design skills. This project, polished well, can be the centrepiece of your resume for VLSI/embedded roles at ₹6–12 LPA.

Section E

Problem Bank — Diagrams, Numericals, Industry & GATE

Diagram-Based Problems (3)

📐 Problem D1: Draw General Register Organization

Q: Draw the complete block diagram of a general register organization with 8 registers. Label MUX A, MUX B, ALU, output bus, and all control signals (SELA, SELB, SELD, OPR). Show the data flow for the operation R5 ← R2 AND R6.

Solution: Use the diagram from Section C, Topic 1. For R5 ← R2 AND R6: SELA=010 (R2), SELB=110 (R6), SELD=101 (R5), OPR=01000 (AND). The control word is: 010 110 101 01000.

📐 Problem D2: Stack Trace for (A−B)×(C+D)÷E

Q: Show the complete stack trace (step-by-step) for evaluating (A−B)×(C+D)÷E using 0-address instructions. Use A=10, B=3, C=4, D=6, E=2.

Solution: Postfix: A B − C D + × E ÷

Step	Instruction	Stack	Result
1	PUSH A	10	—
2	PUSH B	10, 3	—
3	SUB	7	10−3=7
4	PUSH C	7, 4	—
5	PUSH D	7, 4, 6	—
6	ADD	7, 10	4+6=10
7	MUL	70	7×10=70
8	PUSH E	70, 2	—
9	DIV	35	70÷2=35

Final answer: 35

📐 Problem D3: Draw the Interrupt Handling Flowchart

Q: Draw the complete flowchart for interrupt handling, showing the steps from interrupt request to resumption of the original program. Include context saving, ISR execution, and RTI.

Solution: Refer to the ASCII flowchart in Section C, Topic 7. The key steps are: (1) Complete current instruction, (2) Save PC and PSW to stack, (3) Identify interrupt source, (4) Load ISR address from IVT, (5) Execute ISR, (6) Execute RTI to restore PC and PSW, (7) Resume original program.

Numerical Problems (6)

🔢 Problem N1: Effective Address Calculation

Q: Given: R1=200, PC=500, Memory[300]=600, Memory[600]=42, Memory[700]=55. Calculate the effective address and operand for: (a) Direct 300, (b) Indirect 300, (c) Register R1, (d) Displacement 500(R1).

Solution:

(a) Direct 300: EA=300, Operand=M[300]=600

(b) Indirect 300: EA=M[300]=600, Operand=M[600]=42

(c) Register R1: no effective address (the operand is in the register itself), Operand=R1=200

(d) Displacement 500(R1): EA=500+R1=500+200=700, Operand=M[700]=55

🔢 Problem N2: Control Word Generation

Q: For a general register organization with R0–R7, write the 14-bit control word for: (a) R4 ← R1 + R7, (b) R0 ← R3 XOR R5, (c) R6 ← R2 (transfer).

Solution:

(a) SELA=001, SELB=111, SELD=100, OPR=00010 (ADD) → 001 111 100 00010

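(b) SELA=011, SELB=101, SELD=000, OPR=01100 (XOR) → 011 101 000 01100

(c) SELA=010, SELB=000 (unused, don't care), SELD=110, OPR=00000 (transfer A, assuming the same OPR encoding table as the other operations) → 010 000 110 00000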

🔢 Problem N3: Instruction Count Comparison

Q: For the expression Y = (P + Q) × (R − S) + T, determine the number of instructions needed in 3-address, 2-address, 1-address, and 0-address formats.

Solution:

3-address (4 instructions): ADD T1,P,Q → SUB T2,R,S → MUL T3,T1,T2 → ADD Y,T3,T

2-address (7 instructions): MOV R1,P → ADD R1,Q → MOV R2,R → SUB R2,S → MUL R1,R2 → ADD R1,T → MOV Y,R1

1-address (8 instructions): LOAD P → ADD Q → STORE T1 → LOAD R → SUB S → MUL T1 → ADD T → STORE Y

0-address (10 instructions): PUSH P → PUSH Q → ADD → PUSH R → PUSH S → SUB → MUL → PUSH T → ADD → POP Y
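
For the 0-address case, the count follows directly from the postfix form: one PUSH per operand, one instruction per operator, and a final POP. A minimal Python sketch of that translation is shown here; the helper name `zero_address_code` and the token spelling (`*` for ×) are assumptions made for illustration.

```python
def zero_address_code(postfix, result):
    """Translate a postfix token list into 0-address instructions ending with POP result."""
    mnemonics = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    code = []
    for token in postfix:
        if token in mnemonics:
            code.append(mnemonics[token])   # operator: pops two, pushes one
        else:
            code.append(f"PUSH {token}")    # operand: pushed onto the stack
    code.append(f"POP {result}")            # store the final value
    return code


# Y = (P + Q) * (R - S) + T  has postfix  P Q + R S - * T +
code = zero_address_code(["P", "Q", "+", "R", "S", "-", "*", "T", "+"], "Y")
print(len(code), code)
```

With 5 operands and 4 operators this gives 5 + 4 + 1 = 10 instructions, matching the count above.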

🔢 Problem N4: Stack Operations Trace

Q: Initial SP=1000. Show the SP value after each operation: PUSH A, PUSH B, PUSH C, POP, PUSH D, POP, POP.

Solution: PUSH A: SP=999. PUSH B: SP=998. PUSH C: SP=997. POP: SP=998. PUSH D: SP=997. POP: SP=998. POP: SP=999. Stack grows downward; PUSH decrements SP, POP increments SP.
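
The SP sequence can be traced with a few lines of Python. This sketch assumes a word-addressed stack (SP changes by 1 per operation, as in the solution) and treats the initial SP as the empty-stack position; `trace_sp` is an illustrative name.

```python
def trace_sp(initial_sp, operations):
    """Return the SP value after each PUSH/POP for a downward-growing stack."""
    sp, history = initial_sp, []
    for op in operations:
        if op == "PUSH":
            sp -= 1  # SP <- SP - 1, then M[SP] <- DR
        elif op == "POP":
            if sp >= initial_sp:
                raise RuntimeError("stack underflow: POP on an empty stack")
            sp += 1  # DR <- M[SP], then SP <- SP + 1
        else:
            raise ValueError(f"unknown operation: {op}")
        history.append(sp)
    return history


ops = ["PUSH", "PUSH", "PUSH", "POP", "PUSH", "POP", "POP"]
sp_history = trace_sp(1000, ops)
print(sp_history)
```

One item (A) is still on the stack at the end, which is why the final SP is 999 rather than 1000.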

🔢 Problem N5: Flag Tracing

Q: Determine CF, ZF, SF, OF after each operation (8-bit signed): (a) ADD 127, 1 (b) SUB 0, 1 (c) ADD 128, 128.

Solution:

(a) 127+1 = 128 → 0x80 = 10000000. CF=0, ZF=0, SF=1, OF=1 (positive+positive=negative)

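(b) 0−1 = −1 → 0xFF = 11111111. CF=1 (borrow), ZF=0, SF=1, OF=0

(c) 128+128: the 8-bit pattern for 128 is 0x80, which is −128 when read as signed. 0x80+0x80 = 0x100, truncated to 0x00. CF=1 (carry out), ZF=1, SF=0, OF=1 (negative+negative=non-negative)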
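
The flag rules used in these answers can be written down as a short Python sketch. It assumes the convention used in this unit that a borrow on subtraction sets CF=1; the function name `alu_flags` and its `width` parameter are illustrative.

```python
def alu_flags(a, b, op="ADD", width=8):
    """Return (result, flags) for a + b or a - b on `width`-bit operands."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    if op == "ADD":
        raw = a + b
        carry = raw > mask                               # carry out of the MSB
        result = raw & mask
        # Signed overflow: operands share a sign, result has the other sign.
        overflow = bool(~(a ^ b) & (a ^ result) & sign)
    elif op == "SUB":
        raw = a - b
        carry = a < b                                    # borrow reported as CF=1
        result = raw & mask
        # Signed overflow: operands differ in sign, result sign differs from a.
        overflow = bool((a ^ b) & (a ^ result) & sign)
    else:
        raise ValueError(f"unsupported operation: {op}")
    flags = {
        "CF": int(carry),
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(overflow),
    }
    return result, flags


print(alu_flags(127, 1, "ADD"))
print(alu_flags(0, 1, "SUB"))
print(alu_flags(128, 128, "ADD"))
```

Passing `width=16` applies the same rules to 16-bit operands such as 0xFFFF + 0x0001.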

🔢 Problem N6: Relative Address Calculation

Q: A branch instruction is at address 2050. The instruction is BEQ with an 8-bit signed offset of −30 (decimal). If the PC has already been incremented to 2052 when the offset is applied, what is the target address?

Solution: Target = PC + offset = 2052 + (−30) = 2052 − 30 = 2022. The branch goes backward 30 bytes from the next instruction address. This is how loops work — the branch target is before the branch instruction.
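
In hardware the offset is stored as an 8-bit two's-complement field and sign-extended before the addition. A minimal Python sketch of that step, with illustrative helper names `sign_extend` and `branch_target`:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: the value is negative
        return value - (1 << bits)
    return value


def branch_target(next_pc, offset_field, bits=8):
    """Target address = already-incremented PC + sign-extended offset."""
    return next_pc + sign_extend(offset_field, bits)


offset_field = -30 & 0xFF              # the 8-bit encoding of -30
print(hex(offset_field), branch_target(2052, offset_field))
```

The same `sign_extend` helper gives the range of any signed offset field, for example −32 to +31 for a 6-bit field.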

Industry Problems (3)

🏭 Problem I1: ARM Pipeline Analysis

Q: An ARM Cortex-A78 has a 13-stage pipeline. If the clock frequency is 3 GHz, what is the theoretical maximum throughput in MIPS? Why is actual throughput lower?

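Solution: Theoretical: treating the core as a scalar pipeline that completes 1 instruction per cycle, 3 GHz = 3000 MIPS. The 13-stage depth affects the latency of each instruction, not this peak rate. Actual throughput is lower due to: pipeline stalls (data hazards), branch mispredictions (flushing pipeline), cache misses (memory latency), and dependencies between instructions. Modern ARM cores are also superscalar (they can issue several instructions per cycle), which raises the peak above 1 instruction per cycle, but the same hazards keep sustained IPC below the issue width.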

🏭 Problem I2: x86 Instruction Decode Challenge

Q: Intel x86 has variable-length instructions (1–15 bytes). Explain why this makes pipelining harder than ARM's fixed 32-bit instructions. How does Intel solve this problem?

Solution: Variable-length makes it impossible to know where the next instruction starts without decoding the current one. This creates a bottleneck at the decode stage. Intel solves this with: (1) Pre-decode buffers that scan ahead and mark instruction boundaries, (2) Micro-op translation — complex CISC instructions are broken into fixed-length RISC-like micro-operations (µops) internally, (3) µop caches that store decoded instructions. Modern Intel CPUs are internally RISC-like despite their CISC ISA.

🏭 Problem I3: RISC-V in Indian Context

Q: Why is RISC-V strategically important for India's semiconductor mission? Compare the cost and licensing models of ARM vs RISC-V for an Indian startup designing a custom IoT chip.

Solution: ARM requires licensing fees: $1M–$10M+ upfront + per-chip royalties (1–2%). RISC-V is open-source — zero licensing cost. For an Indian IoT startup producing 100,000 chips, ARM licensing could cost ₹8–80 crore, while RISC-V costs ₹0 in licensing. This is why IIT Madras chose RISC-V for SHAKTI and C-DAC chose it for VEGA. India's semiconductor independence depends on not paying royalties to foreign companies for basic CPU IP.

GATE-Style Problems (5)

🎓 GATE G1 (2-mark)

Q: A CPU has 16 general-purpose registers. How many bits are needed in the control word for the register selection fields (SELA + SELB + SELD)?

Solution: Each MUX needs log₂(16) = 4 bits. Three fields: SELA(4) + SELB(4) + SELD(4) = 12 bits.

🎓 GATE G2 (2-mark)

Q: Consider a byte-addressable memory with 16-bit addresses. If displacement addressing is used with a 6-bit signed offset and a 16-bit base register, what is the addressable range relative to the base?

Solution: 6-bit signed offset: range is −32 to +31 (in 2's complement). So EA ranges from [Base − 32] to [Base + 31]. The addressable range is a 64-byte window around the base register value: 32 bytes below it, the base itself, and 31 bytes above it.

🎓 GATE G3 (1-mark)

Q: In a stack-based CPU, the instruction sequence PUSH 5, PUSH 3, SUB, PUSH 2, MUL produces what result on top of stack?

Solution: PUSH 5 → [5]. PUSH 3 → [5,3]. SUB → Pop 3,5; Push 5−3=2 → [2]. PUSH 2 → [2,2]. MUL → Pop 2,2; Push 2×2=4 → [4]. Answer: 4.

🎓 GATE G4 (2-mark)

Q: A RISC machine has 32 registers and uses 3-address instructions. Each instruction has a 6-bit opcode and three register fields. What is the instruction length?

Solution: Opcode: 6 bits. Each register field: log₂(32) = 5 bits. Total = 6 + 5 + 5 + 5 = 21 bits. In practice, this would be padded to 32 bits with unused/extended fields.
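
Field-width questions like G1 and G4 all reduce to the same calculation: a field that selects one of N registers needs ⌈log₂ N⌉ bits. A small Python sketch, with illustrative function names:

```python
def field_bits(count):
    """Bits needed to select one of `count` items, i.e. ceil(log2(count))."""
    return max(1, (count - 1).bit_length())


def three_address_length(opcode_bits, registers):
    """Instruction length for an opcode plus three register fields."""
    return opcode_bits + 3 * field_bits(registers)


print(three_address_length(6, 32))   # G4: 6-bit opcode, 32 registers
print(3 * field_bits(16))            # G1: SELA + SELB + SELD for 16 registers
```

Using `bit_length` keeps the arithmetic in integers and avoids floating-point rounding from `math.log2`.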

🎓 GATE G5 (2-mark)

Q: Which addressing mode is used to implement the branch instruction "if R1 == 0, jump to label L" where L is 40 bytes ahead of the current PC?

Solution: Relative addressing (PC-relative). The offset +40 is added to the current PC to compute the target address. This produces position-independent code. The instruction would be: BEQ +40 (if ZF=1 after comparing R1 with 0).

Section F

MCQ Assessment Bank — 30 Questions (Bloom's Mapped)

Remember / Identify (Q1–Q5)

In General Register Organization, the component that selects the source operand for ALU input A is:

ALU
MUX A
SELD
Output Bus

Remember

✅ Answer: (B) MUX A — MUX A (Multiplexer A) selects one of the registers as the first source operand for the ALU, controlled by the SELA field.

The PUSH operation on a memory stack (growing downward) performs:

SP ← SP + 1, then M[SP] ← DR
SP ← SP − 1, then M[SP] ← DR
DR ← M[SP], then SP ← SP + 1
DR ← M[SP], then SP ← SP − 1

Remember

✅ Answer: (B) — For a downward-growing stack, PUSH first decrements SP (to point to the next free location) and then writes the data register value to that location.

RISC stands for:

Reduced Instruction Standard Computer
Reduced Instruction Set Computer
Register Instruction Set Computer
Rapid Instruction Set Computing

Remember

✅ Answer: (B) — RISC = Reduced Instruction Set Computer. It uses a small, highly optimized set of instructions.

The Overflow Flag (OF) in the PSW is set when:

The result is zero
There is a carry from the MSB in unsigned arithmetic
A signed operation produces a result outside the representable range
The stack is full

Remember

✅ Answer: (C) — OF is set when the signed result cannot be represented in the given number of bits. For example, adding two positive numbers and getting a negative result.

Which addressing mode uses the Program Counter (PC) to calculate the effective address?

Direct
Immediate
Relative
Register Indirect

Remember

✅ Answer: (C) Relative — In relative addressing, EA = PC + offset. This is primarily used for branch/jump instructions and produces position-independent code.

Understand / Explain (Q6–Q10)

Why does a 0-address instruction format require more instructions than a 3-address format for the same expression?

Because 0-address uses longer instructions
Because it must explicitly push each operand and pop the result, using the stack for all operations
Because it has fewer registers
Because the ALU is slower

Understand

✅ Answer: (B) — In 0-address format, every operand must be explicitly pushed onto the stack, and every operation implicitly pops operands and pushes the result. This requires separate PUSH/POP instructions that 3-address encodes within the instruction itself.

Why is pipelining more efficient in RISC than CISC architectures?

RISC has more registers
RISC uses fixed-length instructions, making fetch and decode stages predictable
RISC has a larger instruction set
CISC uses hardwired control

Understand

✅ Answer: (B) — Fixed-length instructions in RISC allow the CPU to know exactly where each instruction starts, enabling efficient pipeline filling. CISC's variable-length instructions create decode bottlenecks.

In indirect addressing, why are two memory accesses needed to fetch the operand?

One to read the instruction, one to read the operand
One to read the pointer address from memory, another to read the actual operand from that address
One for the opcode, one for the address field
One for the stack, one for the register

Understand

✅ Answer: (B) — The address field points to a memory location containing the actual effective address (a pointer). First access reads the pointer, second access reads the operand at the pointed-to address.

What is the purpose of the SELD field in the general register organization control word?

Selects the ALU operation
Selects the first source register
Selects the destination register where the ALU result is stored
Selects the memory address

Understand

✅ Answer: (C) — SELD (Select Destination) determines which register receives the output from the ALU via the output bus.

Q10

Why do the Carry Flag (CF) and Overflow Flag (OF) serve different purposes?

CF is for addition, OF is for subtraction
CF detects unsigned overflow (carry out), OF detects signed overflow (sign error)
CF is set by the ALU, OF is set by the control unit
They always have the same value

Understand

✅ Answer: (B) — CF indicates an overflow in unsigned arithmetic (carry/borrow out of MSB). OF indicates an overflow in signed arithmetic (result sign doesn't match expected sign). They are independent flags.

Apply / Solve (Q11–Q15)

Q11

For the expression Y = (A + B) × C using 0-address instructions, how many instructions are needed?

Apply

✅ Answer: (C) — PUSH A, PUSH B, ADD, PUSH C, MUL, POP Y = 6 instructions. Postfix: A B + C × → 3 PUSHes + 2 operations + 1 POP = 6.

Q12

After executing ADD 0xFFFF + 0x0001 (16-bit), the flags are:

CF=0, ZF=1, SF=0, OF=0
CF=1, ZF=1, SF=0, OF=0
CF=1, ZF=0, SF=0, OF=1
CF=0, ZF=0, SF=1, OF=1

Apply

✅ Answer: (B) — 0xFFFF + 0x0001 = 0x10000, but in 16-bit: result = 0x0000. CF=1 (carry out), ZF=1 (result is zero), SF=0 (MSB=0), OF=0 (−1 + 1 = 0, sign is correct for signed).

Q13

Given R2 = 400 and the instruction LOAD 100(R2), with displacement addressing, the effective address is:

Apply

✅ Answer: (C) — EA = displacement + [R2] = 100 + 400 = 500. The operand is fetched from Memory[500].

Q14

The control word for R7 ← R0 OR R4 (given OPR for OR = 01010) is:

000 100 111 01010
111 000 100 01010
100 000 111 01010
000 111 100 01010

Apply

✅ Answer: (A) — SELA=000 (R0 as source A), SELB=100 (R4 as source B), SELD=111 (R7 as destination), OPR=01010 (OR). Control word: 000 100 111 01010.

Q15

How many instructions are needed to compute X = (P − Q) × (R + S) in 2-address format?

Apply

✅ Answer: (B) 6 instructions. MOV R1,P (1), SUB R1,Q (2), MOV R2,R (3), ADD R2,S (4), MUL R1,R2 (5), MOV X,R1 (6). Each 2-address arithmetic instruction overwrites its first operand, so P and R are first copied into registers, and a final MOV stores the product in X.

Analyze / Compare (Q16–Q20)

Q16

Which of the following is NOT an advantage of RISC over CISC?

Better pipelining efficiency
Smaller code size for the same program
Lower power consumption
Simpler hardware design

Analyze

✅ Answer: (B) — RISC actually produces LARGER code (more instructions needed). CISC has smaller code size because complex instructions encode more work per instruction. All other options are genuine RISC advantages.

Q17

A program uses autoincrement addressing to traverse an array of 100 integers. If using direct addressing instead, how many additional instructions would be needed?

0 — same count
100 — one extra increment per element
99 — increment after each access except last
200 — one extra load and increment per element

Analyze

✅ Answer: (C) — Autoincrement automatically updates the pointer register after each access. Without it, you need an explicit ADD instruction to increment the pointer. For 100 elements, you need 99 extra increment instructions (no increment needed after the last element access).

Q18

Why do modern Intel CPUs internally translate CISC instructions into RISC-like micro-operations (µops)?

To save memory
To enable efficient out-of-order execution and pipelining of fixed-size operations
To reduce the number of registers
To make software compatible with ARM

Analyze

✅ Answer: (B) — Variable-length CISC instructions are difficult to pipeline. By converting them to fixed-size µops, Intel can use RISC-style execution engines with efficient pipelining, out-of-order execution, and superscalar dispatch.

Q19

In a daisy-chain priority interrupt system, device D3 is connected between D2 and D4. If D2 and D4 both raise interrupts simultaneously, which gets serviced first?

D4 (it's farther from CPU)
D2 (it's closer to CPU, higher priority)
Both serviced simultaneously
Neither — deadlock occurs

Analyze

✅ Answer: (B) — In a daisy chain, devices closer to the CPU have higher priority. D2 is closer than D4, so D2's interrupt acknowledge signal reaches it first, blocking the signal from reaching D4 until D2 is serviced.
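
The acknowledge propagation can be modelled in a few lines of Python. This is a sketch under an assumed chain order of D1 (nearest the CPU) to D4; the function name `daisy_chain_grant` is illustrative.

```python
def daisy_chain_grant(chain, requesting):
    """Return the device that captures the acknowledge signal, or None.

    `chain` lists devices from nearest the CPU to farthest.
    """
    for device in chain:          # the acknowledge enters the nearest device first
        if device in requesting:
            return device         # this device keeps the signal and blocks those behind it
    return None                   # nobody requested service


chain = ["D1", "D2", "D3", "D4"]  # assumed order, D1 closest to the CPU
print(daisy_chain_grant(chain, {"D2", "D4"}))
```

Once D2 has been serviced and withdraws its request, calling the function again with only D4 requesting lets the acknowledge reach D4.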

Q20

Compare the total memory bits for encoding "ADD R1, R2, R3" in a 3-address format (6-bit opcode, 4-bit registers) vs a stack-based 0-address equivalent. Which uses fewer total bits?

3-address: fewer bits
0-address: fewer bits
Both use the same number of bits
Cannot be determined without more information

Analyze

✅ Answer: (D). 3-address: 6+4+4+4 = 18 bits for 1 instruction. The 0-address equivalent of R1 ← R2 + R3 needs 4 instructions (PUSH R2, PUSH R3, ADD, POP R1), and the PUSH/POP instructions still carry an operand field, so the total depends on the opcode and operand widths of the stack machine. Without those widths, we can't compare total bits.

Evaluate / Justify (Q21–Q25)

Q21

For an embedded real-time system controlling an automotive braking mechanism, which interrupt priority scheme is most appropriate?

Software polling
Daisy-chain priority
Parallel priority with hardware encoder
No interrupts — use busy waiting

Evaluate

✅ Answer: (C) — Parallel priority with hardware encoder provides the fastest response time. In safety-critical automotive systems, microsecond-level response to brake sensor interrupts is essential. Software polling and daisy chain are too slow; busy waiting wastes CPU cycles.

Q22

A student argues: "Register addressing is always better than direct addressing because it's faster." Is this correct?

Yes — registers are always faster than memory
No — register addressing cannot access large data structures in memory
Yes — all modern CPUs use only register addressing
No — direct addressing is faster for constants

Evaluate

✅ Answer: (B) — While register access is faster, registers are limited in number (8–32 typically). Large data structures (arrays, databases) must reside in memory and need direct/indirect addressing. Speed isn't the only consideration — addressability and flexibility matter too.

Q23

A company is choosing between stack-based and register-based architecture for a new Java bytecode processor. Which is more suitable?

Register-based — always better performance
Stack-based — matches Java's stack-oriented bytecode natively
Both are equally suitable
Neither — a CISC approach is needed

Evaluate

✅ Answer: (B) — Java's JVM uses stack-based bytecode. A stack-based hardware processor can execute JVM bytecodes directly without translation, reducing overhead. This is why picoJava (Sun's Java processor) used stack architecture.

Q24

Is it justified for India to invest in RISC-V over licensing ARM cores? Evaluate:

No — ARM is industry-proven and reliable
Yes — RISC-V eliminates licensing costs and enables sovereign chip design
No — RISC-V has no ecosystem
Yes — but only for military applications

Evaluate

✅ Answer: (B) — ARM licensing costs $1M–$10M+ per design plus per-chip royalties. For India's goal of semiconductor self-reliance, RISC-V provides a zero-cost ISA that Indian institutions (IIT Madras, C-DAC) can freely customize. The ecosystem is rapidly growing with SiFive, Alibaba T-Head, and others.

Q25

A system has both maskable and non-maskable interrupts. The IF (Interrupt Flag) is cleared. Which statement is true?

Both maskable and non-maskable interrupts are blocked
Only maskable interrupts are blocked; NMI still fires
Only NMI is blocked; maskable interrupts still fire
Neither is affected — IF only controls software interrupts

Evaluate

✅ Answer: (B) — When IF=0, maskable interrupts are disabled. But Non-Maskable Interrupts (NMI) cannot be disabled by software — they always interrupt the CPU. This ensures critical events like power failure are always handled.

Create / Design (Q26–Q30)

Q26

You are designing a 32-bit RISC CPU with 64 registers and need a 3-address instruction format. How many bits remain for the opcode if the instruction is 32 bits?

8 bits
14 bits
10 bits
12 bits

Create

✅ Answer: (B) — 64 registers need log₂(64)=6 bits each. Three register fields: 6×3 = 18 bits. Remaining for opcode: 32 − 18 = 14 bits, allowing up to 16,384 distinct instructions.

Q27

If you add autoincrement and autodecrement modes to a CPU with 8 registers, how many additional control bits are needed per instruction to specify the mode?

1 bit (auto-inc or auto-dec per operand)
2 bits per register field (4 modes: none, inc, dec, indirect)
3 bits total
No additional bits needed

Create

✅ Answer: (B) — Each register field needs a 2-bit mode specifier to distinguish: (00) register, (01) register indirect, (10) autoincrement, (11) autodecrement. With two source registers, that's 4 additional bits.

Q28

Design consideration: A new ISA needs to support both 16-bit and 32-bit instructions (like ARM's Thumb mode). What is the primary advantage?

Faster execution speed
Reduced code size while maintaining 32-bit capability for complex operations
More addressing modes
More registers

Create

✅ Answer: (B) — ARM Thumb mode uses 16-bit instructions for common operations (reducing code size by ~30%) while allowing switch to 32-bit for complex operations. This saves memory in embedded systems with limited storage.

Q29

You need to design an interrupt controller supporting 8 devices with programmable priority. What minimum hardware is needed?

8 flip-flops and an 8-to-3 priority encoder
3 flip-flops and a 3-to-8 decoder
8 interrupt request lines, 8 mask flip-flops, an 8-to-3 priority encoder, and a priority register
A single status register

Create

✅ Answer: (C) — Programmable priority needs: 8 IRQ lines (input from devices), 8 mask flip-flops (to enable/disable individual interrupts), an 8-to-3 priority encoder (to select highest priority), and a priority register (to store/change priority levels). This is how ARM's NVIC works.

Q30

If you architect a CPU with separate instruction and data caches (Harvard architecture), which stage of the pipeline benefits most?

Execute stage
Writeback stage
Fetch stage — instruction fetch and data access can happen simultaneously
Decode stage

Create

✅ Answer: (C) — Harvard architecture allows the CPU to fetch the next instruction while simultaneously reading/writing data for the current instruction. This eliminates the structural hazard of competing for a single memory port, directly benefiting pipeline throughput at the fetch stage.

Section G

Short Answer Questions (8)

SA1: What is General Register Organization? Describe its control word.

General Register Organization is a CPU architecture where multiple general-purpose registers (e.g., R0–R7) are connected through multiplexers to an ALU. Any register can serve as source or destination. The control word has four fields: SELA (selects source register A for MUX A), SELB (selects source register B for MUX B), SELD (selects destination register for ALU output), and OPR (selects the ALU operation). For 8 registers, each SEL field needs 3 bits, and OPR needs 5 bits, giving a 14-bit control word.

SA2: Explain PUSH and POP operations with a memory stack diagram.

PUSH adds data to the stack: SP is decremented first (SP ← SP−1), then data is written to the memory location pointed to by SP (M[SP] ← DR). POP removes data: the value at SP is read into the data register (DR ← M[SP]), then SP is incremented (SP ← SP+1). The stack grows downward in memory — PUSH moves SP to lower addresses, POP moves it to higher addresses. Stack overflow occurs if SP goes below the lower limit, and underflow occurs if POP is attempted when the stack is empty.

SA3: Differentiate between Direct and Indirect addressing modes.

In Direct addressing, the address field in the instruction directly contains the effective address (EA) of the operand. Only one memory access is needed: EA = Address field. In Indirect addressing, the address field contains a pointer — the memory address of a location that holds the actual EA. Two memory accesses are needed: first to read the pointer, then to read the operand at the pointed address. Indirect addressing is slower but more flexible, supporting pointers and dynamic memory allocation. Direct is simpler and faster but limited to a fixed address space.

SA4: What is the difference between 1-address and 0-address instruction formats?

In a 1-address format, instructions have one explicit operand address and use an implicit accumulator (AC) as the other operand and destination. Example: ADD X means AC ← AC + M[X]. In a 0-address format, instructions have no explicit address fields and use an implicit stack. Operations pop operands from the stack, compute, and push the result. Example: ADD pops two values, adds them, and pushes the sum. 0-address instructions are shortest but need more instructions per expression; 1-address instructions are slightly longer but require fewer instructions.

SA5: List 6 key differences between RISC and CISC.

(1) RISC has a small instruction set (50–150); CISC has a large set (200–300+). (2) RISC uses fixed-length instructions; CISC uses variable-length. (3) RISC executes most instructions in 1 clock cycle; CISC instructions often take multiple cycles. (4) RISC uses load/store architecture (only LOAD/STORE access memory); CISC allows arithmetic and logic instructions to operate directly on memory operands. (5) RISC has many registers (32–64); CISC has fewer (8–16). (6) RISC uses hardwired control; CISC uses microprogrammed control. Examples: ARM, MIPS (RISC) vs x86 processors from Intel and AMD (CISC).

SA6: What are the different types of interrupts? Give examples.

Hardware External: Generated by I/O devices (keyboard press, timer tick). Hardware Internal (Traps): Generated by CPU errors (division by zero, invalid opcode). Software Interrupts: Triggered by instructions (INT 21h in DOS, SVC in ARM) for system calls. Non-Maskable Interrupt (NMI): Cannot be disabled, used for critical events (power failure, memory parity error). Maskable Interrupt: Can be enabled/disabled via the IF flag in PSW, used for peripheral devices. The CPU handles interrupts by saving context (PC, PSW), executing the ISR, then restoring context via RTI.

SA7: Explain the flags CF, ZF, SF, and OF with one example each.

CF (Carry Flag): Set when unsigned arithmetic produces a carry/borrow. Example: 0xFF + 0x01 = 0x00 with CF=1. ZF (Zero Flag): Set when the result is zero. Example: 5 − 5 = 0, ZF=1. SF (Sign Flag): Set when the result's MSB is 1 (negative in signed representation). Example: 3 − 5 = −2, SF=1. OF (Overflow Flag): Set when signed arithmetic exceeds the representable range. Example: 0x7F + 0x01 = 0x80 (8-bit: +127 + 1 = −128), OF=1. CF and OF are independent — CF checks unsigned overflow, OF checks signed overflow.

SA8: What is autoincrement addressing mode? Where is it used?

In autoincrement addressing, the effective address is the content of a specified register, and after the operand is accessed, the register is automatically incremented by the operand size (e.g., +1 for bytes, +4 for 32-bit words). Formula: EA = [R]; then R ← R + d, where d is the operand size (d = 1 on a word-addressed machine). This eliminates the need for a separate increment instruction when traversing arrays or sequential data structures. It is widely used in loop-based array processing, string operations, and stack implementations (POP uses autoincrement). ARM supports post-increment addressing: LDR R0, [R1], #4 loads from [R1] then adds 4 to R1.
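
A minimal Python sketch of the idea, using a dictionary as memory. The addresses and values are hypothetical, and `load_autoincrement` is an illustrative helper rather than any real instruction-set API.

```python
def load_autoincrement(memory, regs, reg, step=1):
    """Return M[R], then advance R by `step` (the operand size)."""
    ea = regs[reg]            # EA = [R]
    operand = memory[ea]      # operand fetched from the effective address
    regs[reg] = ea + step     # side effect: R <- R + step
    return operand


# Hypothetical array of three 32-bit values starting at byte address 100.
memory = {100: 7, 104: 11, 108: 13}
regs = {"R1": 100}

total = 0
for _ in range(3):
    total += load_autoincrement(memory, regs, "R1", step=4)
print(total, regs["R1"])
```

The loop body contains no separate pointer-update step, which is the saving that autoincrement provides.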

Section H

Long Answer Questions (3)

📋 LA1: ARM vs Intel — A Comprehensive Architecture Case Study

Question: Compare ARM (RISC) and Intel x86 (CISC) architectures across at least 10 parameters. Analyse why ARM dominates smartphones while Intel dominates desktops. Include the recent shift with Apple M-series and Qualcomm Snapdragon X Elite.

Answer:

1. Historical Context: The ARM architecture was designed in 1985 by Acorn Computers (UK), originally for Acorn's own desktop computers; its simple, low-power design later made it the standard choice for embedded and mobile devices. Intel's x86 dates from 1978 (the 8086) and targeted general-purpose computing. Their design philosophies diverged fundamentally: ARM prioritized simplicity and power efficiency; x86 prioritized backward compatibility and computational power.

2. Architecture Comparison:

Parameter	ARM (RISC)	Intel x86 (CISC)
Instruction Set	~150 simple instructions	~1500+ complex instructions
Instruction Length	Fixed 32-bit (or 16-bit Thumb)	Variable 1-15 bytes
Registers	31 GP registers (AArch64)	16 GP registers (x86-64)
Memory Access	Load/Store only	Any instruction can access memory
Pipeline Efficiency	Excellent (fixed-length)	Complex (needs µop translation)
Power Consumption	0.5–5W per core	15–125W per package
Performance/Watt	Industry-leading	Improving but behind ARM
Control Unit	Hardwired	Microprogrammed
Conditional Execution	Most instructions conditional (32-bit ARM)	Conditional branches, plus CMOVcc/SETcc
Software Ecosystem	Android, iOS, embedded	Windows, Linux desktop, server

3. Why ARM Dominates Smartphones: Smartphones are battery-powered devices where power efficiency is paramount. ARM's simple instruction set means less transistor switching, lower heat generation, and longer battery life. A Snapdragon 8 Gen 3 runs at 3.3 GHz within a power budget of roughly 5W, whereas a desktop Intel Core i7 package is rated for up to 125W. ARM's licensing model also allows chip companies (Qualcomm, Samsung, MediaTek) to customize cores for specific needs.

4. Why Intel Dominated Desktops: The x86 ecosystem has 40+ years of software compatibility. Windows, Office, games, and enterprise software were compiled for x86. Switching would break trillions of dollars of existing software. Intel's high power consumption was acceptable because desktops/laptops have wall power and active cooling.

5. The 2020s Shift — Apple M-Series: Apple's M1 (2020) proved ARM can match or beat Intel in laptops. The M4 (2024) delivers desktop-class performance at laptop power levels. Apple achieved this by: designing custom ARM cores (not off-the-shelf), integrating CPU/GPU/Neural Engine on one chip (SoC), using TSMC's advanced 3nm process, and optimizing macOS for ARM.

6. Qualcomm Snapdragon X Elite: Qualcomm brought ARM to Windows PCs in 2024. Running Windows via emulation (for x86 apps) and native ARM apps, it delivers competitive performance at a fraction of Intel's power. This threatens Intel's last stronghold — the PC market.

7. India Connect: Every smartphone in India runs ARM. India's SHAKTI (IIT Madras) and VEGA (C-DAC) processors use RISC-V, an open-standard RISC instruction set that, unlike ARM, carries no licensing fees. India's semiconductor mission aims to manufacture ARM-based chips domestically by 2028.

📋 LA2: Design a CPU Datapath for 3-Address Instructions

Question: Design a complete CPU datapath that can execute 3-address register-to-register instructions of the form OP RD, RS1, RS2. Include the register file, ALU, control signals, and show the data flow for ADD R3, R1, R2.

Answer:

Datapath Components:

1. Instruction Register (IR): Holds the current instruction. Fields: Opcode (6 bits), RD (3 bits), RS1 (3 bits), RS2 (3 bits).

2. Register File: 8 registers (R0–R7), two read ports (Port A, Port B) and one write port (Port W). Port A outputs R[RS1], Port B outputs R[RS2], Port W writes to R[RD].

3. ALU: Takes two inputs (from read ports), performs operation based on Opcode, produces Result and Flags.

4. Control Unit: Decodes Opcode and generates: ALU_OP (selects ALU function), RegWrite (enables write to register file), FlagWrite (enables PSW update).

DATAPATH
  ┌──────────────────────────────────────────┐
  │          INSTRUCTION REGISTER            │
  │  [Opcode|  RD  |  RS1  |  RS2  ]        │
  │   6 bits  3 bits  3 bits  3 bits         │
  └────┬───────┬───────┬────────┬────────────┘
       │       │       │        │
       ▼       │       ▼        ▼
  ┌─────────┐  │  ┌──────────────────┐
  │ Control │  │  │   REGISTER FILE  │
  │  Unit   │  │  │  Read Port A←RS1 │──→ Bus A
  │         │  │  │  Read Port B←RS2 │──→ Bus B
  │ ALU_OP  │  │  │  Write Port←RD   │←── Result Bus
  │ RegWrite│  │  │  Write Enable    │←── RegWrite
  └────┬────┘  │  └──────────────────┘
       │       │           │         │
       │       │      Bus A│    Bus B│
       ▼       │           ▼         ▼
       │       │     ┌─────────────────┐
       ├───────┼────→│      A L U      │
       │       │     │  (ALU_OP input)  │
       │       │     └───────┬─────────┘
       │       │             │ Result
       │       │             ▼
       │       │     ┌───────────────┐
       │       │     │  FLAGS (PSW)  │
       │       │     │ CF ZF SF OF   │
       │       └─────┤  Result Bus   │──→ back to Register File
       │             └───────────────┘

Data Flow for ADD R3, R1, R2:

IR contains: Opcode=ADD, RD=011(R3), RS1=001(R1), RS2=010(R2)
Control Unit decodes ADD → sets ALU_OP=ADD, RegWrite=1, FlagWrite=1
Register File Read Port A outputs R1 value onto Bus A
Register File Read Port B outputs R2 value onto Bus B
ALU receives Bus A and Bus B, performs addition, outputs Result
Flags (CF, ZF, SF, OF) are updated based on the addition result
Result Bus carries the sum back to Register File Write Port
R3 is updated with the ALU result (since RD=011 selects R3 and RegWrite=1)
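
The data flow above can be mirrored in a short Python sketch. The opcode encodings below are hypothetical (the text does not define them), registers are assumed to be 8 bits wide, and flag generation is left out to keep the example short.

```python
# Hypothetical 6-bit opcode encodings, chosen only for this illustration.
OPCODES = {"ADD": 0b000001, "SUB": 0b000010, "AND": 0b000011}
ALU = {
    0b000001: lambda a, b: a + b,
    0b000010: lambda a, b: a - b,
    0b000011: lambda a, b: a & b,
}


def encode(op, rd, rs1, rs2):
    """Pack [Opcode(6) | RD(3) | RS1(3) | RS2(3)] into a 15-bit instruction."""
    return (OPCODES[op] << 9) | (rd << 6) | (rs1 << 3) | rs2


def execute(instruction, regs, width=8):
    """Decode the IR fields, read two registers, run the ALU, write back to RD."""
    opcode = (instruction >> 9) & 0x3F
    rd = (instruction >> 6) & 0x7
    rs1 = (instruction >> 3) & 0x7
    rs2 = instruction & 0x7
    bus_a, bus_b = regs[rs1], regs[rs2]                       # read ports A and B
    result = ALU[opcode](bus_a, bus_b) & ((1 << width) - 1)   # ALU output on the result bus
    regs[rd] = result                                         # write port, RegWrite = 1
    return result


regs = [0] * 8
regs[1], regs[2] = 20, 22
ir = encode("ADD", 3, 1, 2)
execute(ir, regs)
print(format(ir, "015b"), regs[3])
```

The shifts and masks in `execute` play the role of the wires that route the RD, RS1 and RS2 fields from the IR to the register file ports.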

📋 LA3: Comprehensive Addressing Modes with EA Calculations

Question: Given the following machine state, calculate the effective address and operand value for all 8 addressing modes:

Registers: R1=500, R2=100, PC=3000. Memory: M[200]=500, M[400]=700, M[500]=800, M[600]=42, M[700]=99, M[800]=55, M[3050]=25.

Instruction address field = 200. Register field points to R1 (value 500).

Answer:

Mode	EA Formula	EA Calculation	EA	Operand
Immediate	Operand = Addr field	Operand = 200	N/A	200
Direct	EA = Addr	EA = 200	200	M[200] = 500
Indirect	EA = M[Addr]	EA = M[200] = 500	500	M[500] = 800
Register	Operand = R1	Operand = R1 = 500	N/A	500
Reg. Indirect	EA = [R1]	EA = R1 = 500	500	M[500] = 800
Autoincrement	EA = [R1]; R1++	EA = 500; R1→501	500	M[500] = 800
Displacement	EA = Addr + [R2]	EA = 200 + 100 = 300	300	M[300] (not given)
Relative	EA = PC + Offset	EA = 3000 + 50 = 3050 (offset field taken as 50 for this mode)	3050	M[3050] = 25

Key Observations:

Immediate and Register modes don't access memory for the operand — fastest
Indirect mode requires two memory accesses — slowest
Register Indirect and Indirect can produce the same EA if the register and memory location hold the same value
Autoincrement has a side effect — modifying R1 for the next instruction (useful for array traversal)
Relative addressing makes the code position-independent — the target moves with the code
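
The table can be recomputed from the given machine state with a short Python sketch. The names `resolve` and `REL_OFFSET` are illustrative; the relative offset of 50 is taken from the table row, and memory locations that the question does not specify are reported as `None`.

```python
memory = {200: 500, 400: 700, 500: 800, 600: 42, 700: 99, 800: 55, 3050: 25}
regs = {"R1": 500, "R2": 100, "PC": 3000}
ADDR = 200          # instruction address field
REL_OFFSET = 50     # offset used in the Relative row of the table


def resolve(mode):
    """Return (EA, operand); EA is None when the operand does not come from memory."""
    if mode == "immediate":
        return None, ADDR
    if mode == "register":
        return None, regs["R1"]
    if mode == "direct":
        ea = ADDR
    elif mode == "indirect":
        ea = memory[ADDR]             # first memory access reads the pointer
    elif mode == "register_indirect":
        ea = regs["R1"]
    elif mode == "autoincrement":
        ea = regs["R1"]
        regs["R1"] += 1               # side effect after the address is taken
    elif mode == "displacement":
        ea = ADDR + regs["R2"]
    elif mode == "relative":
        ea = regs["PC"] + REL_OFFSET
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, memory.get(ea)         # None when the contents are not given


modes = ["immediate", "direct", "indirect", "register", "register_indirect",
         "autoincrement", "displacement", "relative"]
results = {mode: resolve(mode) for mode in modes}
for mode, (ea, operand) in results.items():
    print(f"{mode:18} EA={ea} operand={operand}")
```

The modes are evaluated in table order, so the autoincrement side effect on R1 does not disturb the register and register-indirect rows.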

Section I

Industry Spotlight — A Day in the Life

👨‍💻 Deepak Verma, 31 — CPU Verification Engineer at Samsung Semiconductor, Noida

Background: B.Tech in Electronics from MNNIT Allahabad (2015). No GATE coaching, no IIT background. Joined as a fresher at a small Noida VLSI startup. Self-taught SystemVerilog and UVM during evenings. Moved to Samsung Semiconductor R&D after 3 years.

A Typical Day:

9:00 AM — Team standup. Review overnight regression results. 3 out of 1,200 test cases failed on the ARM Cortex-A core being verified.

10:00 AM — Debug failing test case #847: a corner case in the Load-Store Unit where back-to-back PUSH operations with interrupts cause a pipeline stall that isn't handled correctly.

12:00 PM — Write a new SystemVerilog assertion to catch this bug in future regressions. Run targeted simulation on the modified RTL.

1:30 PM — Lunch at Samsung's Noida campus cafeteria. Discuss ARM Cortex-X4 micro-architecture with the design team.

2:30 PM — Write coverage analysis report: 94.7% code coverage, 89.3% functional coverage. Identify 12 uncovered scenarios related to interrupt nesting.

4:30 PM — Review a colleague's UVM testbench for the branch prediction unit. Suggest improvements to constrained-random stimulus generation.

6:00 PM — Learning hour: study ARM Architecture Reference Manual (ARM ARM) for ARMv9 security extensions (Realm Management Extension). Samsung is implementing this for the next Galaxy flagship's chip.

Detail	Info
Tools Used Daily	SystemVerilog, UVM, Synopsys VCS, Verdi (waveform debugger), ARM Fast Models, Git, Jira
Entry Salary (2024)	₹6–9 LPA + benefits
Mid-Level (3–5 yrs)	₹12–20 LPA
Senior (7+ yrs)	₹25–45 LPA
Companies Hiring (India)	Samsung Semiconductor, Qualcomm Hyderabad, Intel Bangalore, AMD Hyderabad, Texas Instruments, MediaTek Noida, ARM India, Synopsys, Cadence, NXP
Required Skills	Verilog/SystemVerilog, UVM, CPU architecture knowledge, ARM/RISC-V ISA, digital design fundamentals

Deepak's advice to students: "You don't need to be from IIT to work in chip design. I learned UVM from YouTube (Verification Academy channel) and practiced on EDA Playground (free online simulator). Understanding COA concepts — register organization, instruction formats, pipelining, interrupts — is the foundation. Every interview I've had asked COA questions."

Section J

Earn With It — Embedded Projects & CPU Design

💰 Your Earning Path After This Chapter

Portfolio Piece: A documented ISA design (Tier 3 lab) + instruction format comparison + flag tracing analysis. This demonstrates CPU architecture knowledge to employers.

Beginner Gig Ideas:

• Arduino/ESP32 embedded programming projects — ₹3,000–₹10,000/project

• Assembly language tutoring for B.Tech students — ₹500–₹1,000/session

• Technical content writing (COA topics for edtech platforms) — ₹1,500–₹5,000/article

• FPGA project implementation for final-year students — ₹5,000–₹15,000/project

Opportunity	Platform	Earning Potential
Embedded Systems Freelancing	Upwork, Freelancer, Fiverr	₹5,000–₹25,000/project
FPGA/Verilog Projects	Direct college outreach	₹5,000–₹15,000/project
ARM Assembly Tutoring	Superprof, Chegg, local	₹500–₹1,500/hour
Technical Blog Writing	Medium, GeeksforGeeks, EduArtha	₹1,500–₹5,000/article
Raspberry Pi Projects	IoT project contracts	₹3,000–₹12,000/project
VLSI Internships	Samsung, Qualcomm, Intel via Internshala	₹15,000–₹40,000/month

Fastest path to earning: Learn ARM assembly on Raspberry Pi (₹3,500 for Pi 4). Build 3 embedded projects (LED matrix, sensor data logger, motor controller). Document them on GitHub. Apply to embedded systems internships on Internshala/LinkedIn. Students with GitHub portfolios showing real hardware projects get 3× more interview calls than those with only theoretical knowledge.

Section K

Chapter Summary

📝 Key Concepts Covered in Unit 4

1. General Register Organization: CPU with R0–R7 register file, MUX A/B for source selection, ALU for computation, output bus for result writeback. 14-bit control word: SELA(3) + SELB(3) + SELD(3) + OPR(5).

2. Stack Organization: LIFO structure managed by Stack Pointer (SP). PUSH: SP−−, M[SP]←DR. POP: DR←M[SP], SP++. Used in expression evaluation (postfix/RPN), subroutine calls, and interrupt handling.

3. Instruction Formats: 3-address (OP D,S1,S2), 2-address (OP D,S), 1-address (accumulator), 0-address (stack). Trade-off: fewer addresses → shorter instructions but more of them.

4. Addressing Modes (8): Immediate, Direct, Indirect, Register, Register Indirect, Autoincrement, Displacement, Relative. Each offers different EA calculation, speed, and flexibility trade-offs.

5. RISC vs CISC: RISC = small instruction set, fixed-length, load/store, many registers, efficient pipelining (ARM, MIPS). CISC = large instruction set, variable-length, instructions that can operate directly on memory operands, complex decode (x86). Modern Intel CPUs internally translate x86 instructions into simpler RISC-like µops.

6. Data Transfer & Manipulation: LOAD, STORE, MOV, PUSH, POP (transfer). ADD, SUB, MUL, DIV (arithmetic). AND, OR, XOR, NOT (logical). SHL, SHR, ROL, ROR (shift/rotate).

7. Program Control & Interrupts: JMP, CALL, RET for control flow. Interrupts: hardware (external/internal), software, maskable, non-maskable. Priority schemes: daisy chain, parallel, software polling.

8. PSW/Flags: CF (carry), ZF (zero), SF (sign), OF (overflow), IF (interrupt enable). CF detects unsigned overflow; OF detects signed overflow. They are independent.
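
To make the stack rules from item 2 concrete, here is a minimal Python sketch of a descending memory stack that follows the same convention (PUSH: SP−−, M[SP]←DR; POP: DR←M[SP], SP++) and uses it to evaluate a postfix expression. The class name `MemoryStack`, the starting address 4000, and the helper `eval_postfix` are hypothetical choices made for illustration only.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class MemoryStack:
    """Descending stack: PUSH is SP--, M[SP] <- DR; POP is DR <- M[SP], SP++."""

    def __init__(self, base=4000):
        self.base = base     # hypothetical address just above the first stack cell
        self.sp = base       # empty stack: SP sits at the base
        self.memory = {}

    def push(self, value):
        self.sp -= 1
        self.memory[self.sp] = value

    def pop(self):
        if self.sp >= self.base:
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1
        return value

    def contents(self):
        """Stack contents listed from the top of the stack down to the bottom."""
        return [self.memory[a] for a in range(self.sp, self.base)]


def eval_postfix(tokens):
    """Evaluate a postfix token list; return (result, trace).

    Each trace row is (token, SP after the step, stack contents top-first).
    """
    stack = MemoryStack()
    trace = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()      # the operand pushed last is the right-hand one
            left = stack.pop()
            stack.push(OPS[tok](left, right))
        else:
            stack.push(int(tok))
        trace.append((tok, stack.sp, stack.contents()))
    return stack.pop(), trace


# (3 + 4) * (5 - 2) written in postfix (RPN)
result, trace = eval_postfix("3 4 + 5 2 - *".split())
for row in trace:
    print(row)
print("result =", result)
```

The printed rows form the kind of stack trace table used when evaluating expressions by hand, with SP moving down on every PUSH and back up on every POP.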

Key Formulas for Quick Revision

Formula	Description
Control word bits = 3(SELA) + 3(SELB) + 3(SELD) + 5(OPR) = 14	For 8-register organization
Register select bits = ⌈log₂(N)⌉	N = number of registers (8 registers → 3 bits)
EA (Direct) = Address field	One memory access
EA (Indirect) = M[Address field]	Two memory accesses
EA (Displacement) = Addr + [R]	Base + offset
EA (Relative) = PC + offset	Position-independent code
OF = 1 when sign(A) = sign(B) ≠ sign(Result)	Signed overflow detection for the addition A + B
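
The overflow rule in the last row, and the claim that CF and OF are independent, can be checked with a small 8-bit addition sketch in Python. The function name `add8` and the dictionary layout of the flags are assumptions made for this example, and the 8-bit width is chosen only to keep the numbers short.

```python
def sign(x):
    """Sign bit (bit 7) of an 8-bit value."""
    return (x >> 7) & 1


def add8(a, b):
    """Add two 8-bit values and return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b               # up to 9 bits wide
    result = full & 0xFF       # what actually fits in the 8-bit register
    flags = {
        "CF": int(full > 0xFF),                                   # carry out of bit 7
        "ZF": int(result == 0),
        "SF": sign(result),
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


for a, b in [(0x7F, 0x01), (0xFF, 0x01), (0x80, 0x80), (0x10, 0x20)]:
    result, flags = add8(a, b)
    print(f"{a:#04x} + {b:#04x} = {result:#04x}  {flags}")
```

The four test pairs cover every CF/OF combination: 0x7F + 0x01 sets OF only, 0xFF + 0x01 sets CF only, 0x80 + 0x80 sets both, and 0x10 + 0x20 sets neither.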

Section L

Earning Checkpoint

Skill	Tool / Method	Portfolio Artifact	Can You Earn?
General Register Organization	Pen & paper / diagrams	Annotated register org diagram	✅ Yes — interview preparation asset
Stack Operations & Expression Eval	Manual tracing / Python	Stack trace tables for complex expressions	✅ Yes — tutoring & content writing
Instruction Formats (3/2/1/0-addr)	Manual conversion	Instruction format comparison document	✅ Yes — educational content creation
Addressing Modes (All 8)	EA calculation practice	Addressing modes cheat sheet	✅ Yes — GATE coaching material
RISC vs CISC Comparison	Research & analysis	ARM vs Intel comparison report	✅ Yes — technical blog writing
Assembly/Embedded Programming	ARM on Raspberry Pi	3 embedded projects on GitHub	✅ Yes — ₹3,000–₹12,000/project
Interrupt Handling	Conceptual + coding	Interrupt priority simulation	⬜ Not yet — need practical exposure
Flag/PSW Tracing	Manual bit-level tracing	Flag trace examples document	✅ Yes — tutoring sessions
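
The interrupt priority simulation named as a portfolio artifact in the table above could start from something like the following Python sketch of a daisy chain. It is a simplified model under stated assumptions: device 0 is the one wired closest to the CPU, no new requests arrive while pending ones are being serviced, and the function names `daisy_chain_grant` and `service_order` are hypothetical.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that captures the acknowledge, or None.

    requests[i] is True when device i is asserting its interrupt request.
    Device 0 is closest to the CPU, so it has the highest priority.
    """
    if not any(requests):
        return None                  # no request, so the CPU sends no acknowledge
    for index, requesting in enumerate(requests):
        if requesting:
            return index             # this device blocks the acknowledge from going further
        # a device that is not requesting passes the acknowledge to the next one
    return None


def service_order(requests):
    """Order in which the pending requests would be serviced."""
    pending = list(requests)
    order = []
    while True:
        granted = daisy_chain_grant(pending)
        if granted is None:
            return order
        order.append(granted)
        pending[granted] = False     # request is cleared once it has been serviced


# Devices 1 and 3 request at the same time; device 1 wins because it is nearer the CPU.
print(daisy_chain_grant([False, True, False, True]))
print(service_order([False, True, False, True]))
```

A natural next step for the portfolio version would be to let new requests arrive between services and observe how low-priority devices can be starved.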

Minimum Viable Earning Setup after this chapter: An understanding of CPU architecture + ARM assembly basics on Raspberry Pi + 2–3 embedded projects on GitHub = ready for embedded systems internships at ₹15,000–₹40,000/month while still in college. Companies actively hiring: Samsung, Qualcomm, Texas Instruments, NXP.

✅ Unit 4 complete. Ready for Unit 5: Pipeline & Vector Processing!
