Computer Organization & Architecture

Unit 4: Central Processing Unit

From register files to interrupt handling — master CPU internals, instruction formats, addressing modes, and the RISC vs CISC battle that shapes every processor in your pocket.

⏱️ 7 hrs theory + 5 hrs lab  |  🎯 GATE ~4 marks  |  🖥️ ARM vs Intel

💼 Jobs this unlocks: VLSI Design Engineer (₹6–12 LPA)  |  Embedded Systems Engineer (₹5–10 LPA)  |  CPU Verification Engineer (₹8–18 LPA)

Section A

Opening Hook — Apple M4 vs Snapdragon X Elite: The CPU War

🔥 The RISC vs CISC Battle That Changed Computing Forever

In 2024, Apple unveiled the M4 chip, an ARM-based RISC processor that beats Intel's Core Ultra on performance-per-watt. Its Neural Engine is rated at 38 trillion operations per second, yet a MacBook Pro with M4 sips battery like a phone. Meanwhile, Qualcomm's Snapdragon X Elite brought ARM to Windows laptops, threatening the x86 CISC dominance Intel has held on PCs for some 40 years.

Here's the twist: both M4 and Snapdragon X Elite are RISC processors, using a reduced instruction set with fixed-length instructions. Intel's Core Ultra and AMD's Ryzen are CISC processors, with complex instruction sets and variable-length instructions. For decades, everyone thought CISC won the PC war. Now RISC is eating CISC's lunch.

Behind every chip is a CPU architecture built from register files, ALUs, instruction decoders, and interrupt controllers, which is exactly what this chapter teaches you. Understand this chapter, and you'll understand why Apple's market value has reached $3 trillion.

🍎 Apple  |  📱 Qualcomm  |  💻 Intel  |  🔴 AMD  |  🇬🇧 ARM Holdings  |  🇰🇷 Samsung
India is designing its own CPU! IIT Madras developed the SHAKTI processor, India's first indigenous RISC-V CPU. RISC-V is an open-source instruction set architecture (no licensing fees, unlike ARM). India's MeitY is investing ₹76,000 crore in semiconductor manufacturing under the India Semiconductor Mission. By 2030, Indian engineers may be designing CPUs rivalling Qualcomm and MediaTek.
Section B

Learning Outcomes — Bloom's Taxonomy Mapped

Bloom's Level | Learning Outcome
🔵 Remember | List all 8 addressing modes and define RISC vs CISC architectures
🔵 Remember | Identify components of General Register Organization: register file, MUX, ALU, output bus
🔵 Understand | Explain stack organization with PUSH/POP operations and Stack Pointer movement
🔵 Understand | Describe how 3-address, 2-address, 1-address, and 0-address instruction formats encode operations
🟢 Apply | Evaluate X = (A+B)×(C+D) using all four instruction formats with complete instruction sequences
🟢 Apply | Trace flag changes (CF, ZF, SF, OF) in the PSW after arithmetic operations like ADD 0x7FFF + 0x0001
🟠 Analyze | Compare RISC vs CISC across 12+ parameters with ARM vs x86 real-world examples
🟠 Analyze | Determine effective addresses for all 8 addressing modes given memory contents and register values
🔴 Evaluate | Justify when to use stack organization vs register organization for different application scenarios
🔴 Evaluate | Assess interrupt priority handling schemes (daisy-chain vs parallel) for real-time embedded systems
🟣 Create | Design a simple instruction set with 16 instructions supporting at least 4 addressing modes
🟣 Create | Architect a CPU datapath for a given 3-address instruction format with control signals
Section C

Concept Explanation — CPU Architecture from Scratch

1. General Register Organization

A CPU's register organization determines how data flows between registers, the ALU, and memory. In a general register organization, the CPU has a set of general-purpose registers (R0–R7 in the model used here), and any register can be used as a source or destination for any operation. This is the most flexible and common organization used in modern CPUs.

🔧 Components of General Register Organization

Register File (R0–R7): A set of 8 general-purpose registers, each capable of holding one data word. Any register can serve as source or destination. Registers are faster than memory because they are inside the CPU.

MUX A (Multiplexer A): Selects one register as the first source operand (input A to ALU). Controlled by SELA (3-bit select line).

MUX B (Multiplexer B): Selects one register as the second source operand (input B to ALU). Controlled by SELB (3-bit select line).

ALU (Arithmetic Logic Unit): Performs the actual operation (ADD, SUB, AND, OR, etc.) on the two inputs from MUX A and MUX B. Controlled by OPR (5-bit operation code).

Output Bus: Carries the ALU result back to the register file. The destination register is selected by SELD (3-bit select line).

ASCII DIAGRAM
     ┌────────────────────────────────────────────┐
     │          REGISTER FILE (R0 – R7)           │
     │  ┌────┬────┬────┬────┬────┬────┬────┬────┐ │
     │  │ R0 │ R1 │ R2 │ R3 │ R4 │ R5 │ R6 │ R7 │ │
     │  └────┴────┴────┴────┴────┴────┴────┴────┘ │
     └──────────┬──────────────────┬──────────────┘
                │                  │
           ┌────▼─────┐       ┌────▼─────┐
           │  MUX  A  │       │  MUX  B  │
           │  (SELA)  │       │  (SELB)  │
           └────┬─────┘       └────┬─────┘
                │ Input A          │ Input B
                ▼                  ▼
           ┌─────────────────────────────┐
           │           A  L  U           │
           │        (OPR: 5 bits)        │
           └──────────────┬──────────────┘
                          │
                     Output Bus
                          │
                     ┌────▼─────┐
                     │   SELD   │ ← Destination register select
                     │ (3 bits) │
                     └────┬─────┘
                          │
                (writes back to Register File)

Control Word Format (14 bits)

Field | Bits | Purpose | Range
SELA | 3 | Select source register A (MUX A input) | 000 (R0) to 111 (R7)
SELB | 3 | Select source register B (MUX B input) | 000 (R0) to 111 (R7)
SELD | 3 | Select destination register for result | 000 (R0) to 111 (R7)
OPR | 5 | ALU operation select | 00000 (Transfer A) to 11111

Example: R3 ← R1 + R2

Field | Value | Meaning
SELA | 001 | Select R1 as source A
SELB | 010 | Select R2 as source B
SELD | 011 | Select R3 as destination
OPR | 00010 | ADD operation

Complete control word: 001 010 011 00010 (14 bits)
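Packing the four fields into one word can be sketched in Python. This is a minimal illustration that assumes the 14-bit layout in the table above, with SELA in the most significant bits. `encode_control_word`, `format_control_word` and `OPR_CODES` are hypothetical names, and only the three OPR codes quoted in this unit (Transfer A, ADD, AND) are filled in.

```python
# Hypothetical helper names; field widths follow the 14-bit table above:
# SELA (3) | SELB (3) | SELD (3) | OPR (5), most significant field first.
OPR_CODES = {"TRANSFER_A": 0b00000, "ADD": 0b00010, "AND": 0b01000}

FIELDS = (("SELA", 3), ("SELB", 3), ("SELD", 3), ("OPR", 5))


def encode_control_word(sela: int, selb: int, seld: int, opr: int) -> int:
    """Pack the four fields into a single 14-bit control word."""
    word = 0
    for (name, width), value in zip(FIELDS, (sela, selb, seld, opr)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value   # shift earlier fields left, append this one
    return word


def format_control_word(word: int) -> str:
    """Show the word grouped as SELA SELB SELD OPR, like the example table."""
    bits = f"{word:014b}"
    return f"{bits[0:3]} {bits[3:6]} {bits[6:9]} {bits[9:]}"


# R3 <- R1 + R2: SELA = R1, SELB = R2, SELD = R3, OPR = ADD
word = encode_control_word(1, 2, 3, OPR_CODES["ADD"])
print(format_control_word(word))   # 001 010 011 00010
```

Decoding runs the same steps in reverse: mask out each field and shift it back down. A 16-register file would need 4-bit select fields instead of 3.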

Students often confuse SELA/SELB with SELD. SELA and SELB select the source registers (inputs to ALU). SELD selects the destination register (where the result is stored). The output bus always writes to the register selected by SELD.
ARM Cortex-M4 has 16 core registers (R0–R15). R0–R12 are general-purpose, R13 is the Stack Pointer, R14 is the Link Register, and R15 is the Program Counter. The concept is the same as the R0–R7 model above, just scaled up, so understanding this 8-register model is good preparation for real ARM programming.

2. Stack Organization

A stack is a last-in-first-out (LIFO) storage structure. In CPU architecture, stacks are used for subroutine calls (saving return addresses), expression evaluation, and interrupt handling. The stack is managed by a special register called the Stack Pointer (SP).

📚 Memory Stack: PUSH and POP Operations

PUSH operation (add to stack):

Step 1: SP ← SP − 1   (decrement stack pointer — stack grows downward)

Step 2: M[SP] ← DR   (write data register value to memory at SP)

POP operation (remove from stack):

Step 1: DR ← M[SP]   (read value from memory at SP into data register)

Step 2: SP ← SP + 1   (increment stack pointer — stack shrinks upward)

MEMORY STACK
    Address    Memory        Notes
    ┌────────┬────────────┐
    │  4000  │  (empty)   │  ← Initial SP (stack empty)
    ├────────┼────────────┤
    │  3999  │  Data_1    │  ← First item pushed
    ├────────┼────────────┤
    │  3998  │  Data_2    │
    ├────────┼────────────┤
    │  3997  │  Data_3    │  ← SP after 3 PUSHes (Top of Stack)
    ├────────┼────────────┤
    │  ...   │   ...      │
    ├────────┼────────────┤
    │  3000  │  (limit)   │  ← Lowest usable address (overflow if a PUSH takes SP below 3000)
    └────────┴────────────┘

    Stack grows DOWNWARD (address decreases on PUSH)
    Stack shrinks UPWARD (address increases on POP)
    FULL condition:  SP = 3000 (lowest usable address is occupied)
    EMPTY condition: SP = 4000 (initial value)
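The PUSH and POP micro-operations, together with the FULL and EMPTY tests, can be modelled in a few lines. This sketch follows the diagram's assumptions: SP starts at 4000, 3000 is the lowest usable address, and there is one word per address. The `MemoryStack` class and its method names are illustrative only and do not describe any real CPU's interface.

```python
class MemoryStack:
    """Downward-growing memory stack: PUSH decrements SP, then writes."""

    def __init__(self, initial_sp: int = 4000, limit: int = 3000):
        self.memory = {}              # sparse model of main memory: address -> word
        self.initial_sp = initial_sp  # SP value when the stack is empty
        self.limit = limit            # lowest usable address
        self.sp = initial_sp

    def is_empty(self) -> bool:
        return self.sp == self.initial_sp

    def is_full(self) -> bool:
        return self.sp == self.limit

    def push(self, dr: int) -> None:
        if self.is_full():
            raise OverflowError("stack overflow")
        self.sp -= 1                  # Step 1: SP <- SP - 1
        self.memory[self.sp] = dr     # Step 2: M[SP] <- DR

    def pop(self) -> int:
        if self.is_empty():
            raise IndexError("stack underflow")
        dr = self.memory[self.sp]     # Step 1: DR <- M[SP]
        self.sp += 1                  # Step 2: SP <- SP + 1 (old value stays in memory)
        return dr


stack = MemoryStack()
for value in (11, 22, 33):
    stack.push(value)
print(stack.sp, stack.memory)   # 3997 {3999: 11, 3998: 22, 3997: 33}
print(stack.pop(), stack.sp)    # 33 3998
```

POP does not erase the memory word. It only moves SP, which is also how a hardware memory stack behaves.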

0-Address Instructions & Stack-Based Expression Evaluation

In 0-address (stack) architecture, instructions like ADD don't specify operands — they implicitly pop two values from the stack, operate, and push the result. This is how the expression (A+B)×C is evaluated using Reverse Polish Notation (RPN).

Infix: (A + B) × C     RPN (Postfix): A B + C ×

Assume A=3, B=5, C=4:

Step | Instruction | Action | Stack (bottom → top) | Result
1 | PUSH A | Push 3 | 3 |
2 | PUSH B | Push 5 | 3, 5 |
3 | ADD | Pop 5, 3; Push 3+5 | 8 | A+B = 8
4 | PUSH C | Push 4 | 8, 4 |
5 | MUL | Pop 4, 8; Push 8×4 | 32 | (A+B)×C = 32
Java's JVM is a stack-based virtual machine! Most Java bytecode instructions take their operands from, and push their results onto, an operand stack. When you write int x = a + b; in Java, the compiler emits bytecode equivalent to ILOAD a, ILOAD b, IADD, ISTORE x. ILOAD and ISTORE name a local variable, but IADD names no operands at all: it is a pure 0-address instruction. PostScript printers and HP calculators also use stack-based evaluation.
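The trace above can be reproduced with a small evaluator that behaves like a 0-address machine. Operand names are pushed, and each operator pops two values and pushes the result. This sketch assumes ASCII `*` and `/` stand in for × and ÷. `eval_rpn` is a hypothetical helper name.

```python
import operator

# Operators a 0-address ALU might support (assumed set for illustration)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_rpn(tokens, values):
    """Evaluate postfix tokens the way a 0-address machine does.

    Operand names are looked up in `values` and pushed. Each operator pops
    the top two entries (the right operand first) and pushes the result.
    Returns the final value and the stack after every step.
    """
    stack, trace = [], []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()          # top of stack is the second operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(values[tok])    # PUSH operand
        trace.append((tok, list(stack)))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0], trace


result, trace = eval_rpn("A B + C *".split(), {"A": 3, "B": 5, "C": 4})
for tok, contents in trace:
    print(f"{tok:>2}  {contents}")
print(result)   # 32
```

The printed stack after each token matches the Stack column in the table: [3], [3, 5], [8], [8, 4], [32]. Division uses Python's true division, so 70 / 2 gives 35.0.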

3. Instruction Formats

An instruction format defines the layout of bits in a machine instruction — how many addresses (operands) are specified, what fields are present, and how long each field is. The number of address fields determines the instruction type.

Evaluating X = (A + B) × (C + D) in All Four Formats

3-Address Format: OP DEST, SRC1, SRC2
3-ADDRESS
Instruction 1:  ADD  R1, A, B      ; R1 ← M[A] + M[B]
Instruction 2:  ADD  R2, C, D      ; R2 ← M[C] + M[D]
Instruction 3:  MUL  X,  R1, R2    ; M[X] ← R1 × R2

Total instructions: 3
Advantage: Fewest instructions, most information per instruction
Disadvantage: Longest instruction word (3 address fields)
2-Address Format: OP DEST, SRC (DEST ← DEST op SRC)
2-ADDRESS
Instruction 1:  MOV  R1, A      ; R1 ← M[A]
Instruction 2:  ADD  R1, B      ; R1 ← R1 + M[B]        → R1 = A+B
Instruction 3:  MOV  R2, C      ; R2 ← M[C]
Instruction 4:  ADD  R2, D      ; R2 ← R2 + M[D]        → R2 = C+D
Instruction 5:  MUL  R1, R2     ; R1 ← R1 × R2          → R1 = (A+B)×(C+D)
Instruction 6:  MOV  X,  R1     ; M[X] ← R1

Total instructions: 6
Advantage: Moderate instruction length
Disadvantage: One source is always overwritten (destructive)
1-Address Format: OP ADDR (uses Accumulator AC implicitly)
1-ADDRESS
Instruction 1:  LOAD  A        ; AC ← M[A]
Instruction 2:  ADD   B        ; AC ← AC + M[B]        → AC = A+B
Instruction 3:  STORE T        ; M[T] ← AC             → save A+B
Instruction 4:  LOAD  C        ; AC ← M[C]
Instruction 5:  ADD   D        ; AC ← AC + M[D]        → AC = C+D
Instruction 6:  MUL   T        ; AC ← AC × M[T]        → AC = (C+D)×(A+B)
Instruction 7:  STORE X        ; M[X] ← AC

Total instructions: 7
Advantage: Short instruction word
Disadvantage: Needs temporary storage, more instructions
0-Address Format: OP (uses Stack implicitly)
0-ADDRESS
Instruction 1:  PUSH  A        ; TOS ← A
Instruction 2:  PUSH  B        ; TOS ← B
Instruction 3:  ADD             ; Pop B,A; Push A+B
Instruction 4:  PUSH  C        ; TOS ← C
Instruction 5:  PUSH  D        ; TOS ← D
Instruction 6:  ADD             ; Pop D,C; Push C+D
Instruction 7:  MUL             ; Pop (C+D),(A+B); Push (A+B)×(C+D)
Instruction 8:  POP   X        ; M[X] ← TOS

Total instructions: 8
Advantage: Shortest instruction word, hardware-friendly
Disadvantage: Most instructions needed, stack management overhead

Comparison of Instruction Formats

Parameter | 3-Address | 2-Address | 1-Address | 0-Address
Fields | OP + 3 addr | OP + 2 addr | OP + 1 addr | OP only (PUSH/POP carry 1 addr)
Instruction Length | Longest | Medium | Short | Shortest
Instructions for X=(A+B)×(C+D) | 3 | 6 | 7 | 8
Program Size | Fewest instructions | Moderate | More | Most instructions
Memory Access | Up to 3 operand addresses per instruction | Up to 2 per instruction | 1 per instruction | Only PUSH/POP name memory
Register Usage | General purpose | General purpose | Accumulator | Stack
Example CPU | ARM, MIPS (register operands) | x86 (MOV, ADD) | PDP-8 | JVM, HP calculators
Don't confuse instruction count with program efficiency. 3-address has fewer instructions but each is longer (more bits). 0-address has more instructions but each is very short. Total program size in bits may be similar. GATE questions often ask you to compare total memory needed, not just instruction count.
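To see why instruction count alone can mislead, the four programs above can be sized in bits. The field widths used here are assumptions chosen only for illustration: an 8-bit opcode, a 16-bit memory address, and a 3-bit register field for R0–R7. Different widths shift the totals, so treat the output as one example, not a general ranking.

```python
# Assumed field widths (hypothetical, for illustration only)
OP, MEM, REG = 8, 16, 3

# Each instruction is written as its operand fields:
# "m" = memory address field, "r" = register field, "" = no operand.
PROGRAMS = {
    # ADD R1,A,B / ADD R2,C,D / MUL X,R1,R2
    "3-address": ["rmm", "rmm", "mrr"],
    # MOV R1,A / ADD R1,B / MOV R2,C / ADD R2,D / MUL R1,R2 / MOV X,R1
    "2-address": ["rm", "rm", "rm", "rm", "rr", "mr"],
    # LOAD A / ADD B / STORE T / LOAD C / ADD D / MUL T / STORE X
    "1-address": ["m"] * 7,
    # PUSH A / PUSH B / ADD / PUSH C / PUSH D / ADD / MUL / POP X
    "0-address": ["m", "m", "", "m", "m", "", "", "m"],
}


def instruction_bits(fields: str) -> int:
    """Opcode width plus one field width per operand."""
    return OP + sum(MEM if f == "m" else REG for f in fields)


for name, program in PROGRAMS.items():
    total = sum(instruction_bits(f) for f in program)
    print(f"{name}: {len(program)} instructions, {total} bits")
```

With these widths the programs need 116, 149, 168 and 144 bits. The 3-address version comes out smallest despite its long instructions because its register fields are short. A fixed-length encoding would instead pad every instruction to the longest format in the ISA.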

4. Addressing Modes — All 8

An addressing mode specifies how the CPU calculates the effective address (EA) of an operand. Different modes provide different trade-offs between flexibility, speed, and code compactness. Mastering all 8 modes is essential for GATE and CPU design.

Mode 1: Immediate Addressing

Definition: The operand value is directly contained in the instruction itself. No memory access needed for operand.

EA: No effective address — operand is part of instruction.

IMMEDIATE
  Instruction: [ OP | #25 ]
                      │
                      └── Operand = 25 (directly in instruction)

  Example: MOV R1, #25    → R1 = 25
  Use: Loading constants, initializing counters
  Speed: ★★★★★ Fastest (no memory access for operand)

Mode 2: Direct Addressing

Definition: The address field contains the actual memory address of the operand.

EA = Address field of instruction

DIRECT
  Instruction: [ OP | 500 ]
                      │
                      ▼
               Memory[500] = 42   ← Operand

  EA = 500, Operand = 42
  Example: LOAD 500        → AC = M[500] = 42
  Use: Accessing global variables
  Speed: ★★★★ One memory access

Mode 3: Indirect Addressing

Definition: The address field points to a memory location that contains the effective address. Two memory accesses needed.

EA = M[Address field]

INDIRECT
  Instruction: [ OP | 500 ]
                      │
                      ▼
               Memory[500] = 800   ← This is the EA (pointer)
                      │
                      ▼
               Memory[800] = 42    ← Actual operand

  EA = 800, Operand = 42
  Example: LOAD @500       → AC = M[M[500]] = M[800] = 42
  Use: Pointers, dynamic memory access, linked lists
  Speed: ★★★ Two memory accesses (slower)

Mode 4: Register Addressing

Definition: The operand is in a CPU register. No memory access needed.

EA: None — operand is in the specified register.

REGISTER
  Instruction: [ OP | R3 ]
                      │
                      ▼
                 R3 = 42   ← Operand is in register R3

  Operand = R3 = 42
  Example: ADD R1, R3      → R1 = R1 + R3
  Use: Fastest operations, loop variables
  Speed: ★★★★★ No memory access

Mode 5: Register Indirect Addressing

Definition: The register contains the memory address of the operand.

EA = [Register] (contents of register is the address)

REGISTER INDIRECT
  Instruction: [ OP | R3 ]
                      │
                      ▼
                 R3 = 800   ← R3 holds memory address
                      │
                      ▼
               Memory[800] = 42   ← Actual operand

  EA = 800 (value in R3), Operand = 42
  Example: LOAD (R3)       → AC = M[R3] = M[800] = 42
  Use: Array access, pointer dereferencing
  Speed: ★★★★ One memory access

Mode 6: Autoincrement Addressing

Definition: Like register indirect, but the register is automatically incremented after use. Perfect for array traversal.

EA = [R]; then R ← R + 1

AUTOINCREMENT
  Before: R3 = 800

  Instruction: [ OP | (R3)+ ]
                      │
                 R3 = 800 → Memory[800] = 42   ← Operand
                      │
                 R3 = 801   ← R3 auto-incremented after access

  EA = 800, Operand = 42, R3 updated to 801
  Example: LOAD (R3)+      → AC = M[R3]; R3 = R3 + 1
  Use: Array traversal, sequential data processing
  Speed: ★★★★ One memory access + register update

Mode 7: Displacement (Indexed) Addressing

Definition: EA is computed by adding a constant displacement in the instruction to the contents of a register.

EA = Address field + [R]

DISPLACEMENT / INDEXED
  Instruction: [ OP | 100 | R2 ]
                      │      │
                      │   R2 = 500
                      │      │
                      └──+───┘
                         │
                    EA = 100 + 500 = 600
                         │
                  Memory[600] = 42   ← Operand

  EA = 600, Operand = 42
  Example: LOAD 100(R2)    → AC = M[100 + R2] = M[600] = 42
  Use: Accessing struct fields, array elements with base
  Speed: ★★★★ One addition + one memory access

Mode 8: Relative Addressing

Definition: EA is computed by adding the address field (offset) to the Program Counter (PC). Used for branch instructions.

EA = PC + Address field

RELATIVE
  Instruction at PC=200: [ OP | +50 ]
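                                 │
                      PC = 200 + 50 = 250
                                 │
                          EA = 250 (target of branch)

  EA = 250
  Example: BEQ +50          → if zero flag set, jump to PC+50 = 250
  Use: Branch/jump instructions, position-independent code
  Speed: ★★★★ One addition (no memory access for address)
  Note: PC is taken as 200 here for simplicity. On most real CPUs the PC
  has already been incremented past the branch during fetch, so the target
  is (address of the next instruction) + 50. Check which convention a
  GATE question assumes.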

Comparison of All 8 Addressing Modes

Mode | EA Formula | Memory Accesses (operand) | Speed | Use Case
Immediate | Operand in instruction | 0 | ★★★★★ | Constants
Direct | EA = Addr | 1 | ★★★★ | Global variables
Indirect | EA = M[Addr] | 2 | ★★★ | Pointers
Register | Operand = R | 0 | ★★★★★ | Loop variables
Reg. Indirect | EA = [R] | 1 | ★★★★ | Array via pointer
Autoincrement | EA = [R]; R++ | 1 | ★★★★ | Array traversal
Displacement | EA = Addr + [R] | 1 | ★★★★ | Struct fields
Relative | EA = PC + Addr | 0 | ★★★★ | Branches/jumps
GATE shortcut: If a question asks "how many memory accesses to fetch operand?" remember: Immediate=0, Register=0, Direct=1, Register Indirect=1, Autoincrement=1, Displacement=1, Relative=0 (for address calculation), Indirect=2. Don't forget the instruction fetch itself is also a memory access!
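The EA formulas and access counts in the table can be checked mechanically. The sketch below uses a hypothetical helper, `operand_fetch`, which returns the effective address, the operand, and the number of memory accesses needed for the operand (the instruction fetch is not counted). It assumes word addressing, a post-increment of 1 for autoincrement, and, for relative mode, that the PC value passed in is the one the offset is added to. The sample data is taken from Problem N1 in Section E.

```python
def operand_fetch(mode, field, regs, mem, pc=0):
    """Return (EA, operand, memory accesses for the operand) for one mode.

    `field` is the instruction's address/operand field. Register-based modes
    take a register name, and displacement takes (offset, register).
    EA is None when no address is formed. Autoincrement updates `regs`.
    """
    if mode == "immediate":
        return None, field, 0                  # operand is the field itself
    if mode == "register":
        return None, regs[field], 0
    if mode == "direct":
        return field, mem[field], 1
    if mode == "indirect":
        ea = mem[field]                        # 1st access reads the pointer
        return ea, mem[ea], 2                  # 2nd access reads the operand
    if mode == "register_indirect":
        ea = regs[field]
        return ea, mem[ea], 1
    if mode == "autoincrement":
        ea = regs[field]
        regs[field] += 1                       # incremented after use
        return ea, mem[ea], 1
    if mode == "displacement":
        offset, reg = field
        ea = offset + regs[reg]
        return ea, mem[ea], 1
    if mode == "relative":
        return pc + field, None, 0             # branch target, nothing fetched
    raise ValueError(f"unknown addressing mode: {mode}")


# Data from Problem N1: R1 = 200, M[300] = 600, M[600] = 42, M[700] = 55
regs = {"R1": 200}
mem = {300: 600, 600: 42, 700: 55}
for mode, field in [("direct", 300), ("indirect", 300),
                    ("register", "R1"), ("displacement", (500, "R1"))]:
    print(mode, operand_fetch(mode, field, regs, mem))
print("relative", operand_fetch("relative", 50, regs, mem, pc=200))
```

The printed tuples reproduce answers (a) to (d) of Problem N1 and the relative-mode example (EA = 250).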

5. RISC vs CISC Architecture

The two dominant CPU design philosophies are RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer). This is one of the most important comparisons in computer architecture and a frequent GATE topic.

Parameter | RISC | CISC
Full Form | Reduced Instruction Set Computer | Complex Instruction Set Computer
Instruction Set Size | Small (50–150 instructions) | Large (200–300+ instructions)
Instruction Length | Fixed (32-bit typically) | Variable (1–15 bytes in x86)
Execution Time | 1 clock cycle per instruction (mostly) | Multiple cycles per instruction
Addressing Modes | Few (3–5 modes) | Many (12–20+ modes)
Pipelining | Highly efficient (fixed length helps) | Difficult (variable length hinders)
Registers | Many (32–64 general purpose) | Few (8–16 general purpose)
Memory Access | Only LOAD/STORE access memory | Any instruction can access memory
Code Size | Larger (more instructions needed) | Smaller (complex instructions do more)
Hardware Complexity | Simple (hardwired control) | Complex (microprogrammed control)
Power Consumption | Lower (simpler circuits) | Higher (complex decode logic)
Examples | ARM, MIPS, RISC-V, SPARC, PowerPC | Intel x86, AMD x86-64, VAX, Motorola 68k
Compiler Complexity | More complex (must optimize simple ops) | Simpler (hardware does the work)
Use Cases | Mobile, embedded, IoT, laptops (Apple M4) | Desktops, servers, legacy PCs

ARM (RISC) vs Intel x86 (CISC): Real World

Parameter | ARM Cortex-A78 (RISC) | Intel Core i7-14700K (CISC)
Instruction Width | Fixed 32-bit (or 16-bit Thumb) | Variable 1–15 bytes
Registers | 31 general-purpose (AArch64) | 16 general-purpose (x86-64)
Power (TDP) | ~1–5W per core | ~125W package
Pipeline Stages | 11–13 stages | 14–19 stages
Market (whole ISA family) | ARM: 99% of smartphones; Apple MacBooks (Apple's own ARM cores) | x86: ~75% of desktops/servers
India Usage (whole ISA family) | Every Indian smartphone, Raspberry Pi | Office PCs, data centers
IIT Madras developed SHAKTI, India's first home-grown processor based on RISC-V. The C-class core runs Linux and targets IoT and edge computing. RISC-V is open-source (no ARM licensing fees), which makes it strategic for India's semiconductor independence. The VEGA processor family by C-DAC is another Indian RISC-V initiative.

6. Data Transfer & Manipulation Instructions

Data Transfer Instructions

Instruction | Operation | Example | Description
LOAD | AC ← M[addr] | LOAD 500 | Load memory into accumulator
STORE | M[addr] ← AC | STORE 600 | Store accumulator to memory
MOV | DEST ← SRC | MOV R1, R2 | Copy data between registers
PUSH | SP--; M[SP] ← R | PUSH R3 | Push register onto stack
POP | R ← M[SP]; SP++ | POP R3 | Pop stack top into register
XCHG | R1 ↔ R2 | XCHG R1, R2 | Exchange contents of two registers
IN | R ← Port | IN R1, PORT_A | Input from I/O port
OUT | Port ← R | OUT PORT_B, R1 | Output to I/O port

Data Manipulation Instructions

Category | Instruction | Operation | Example
Arithmetic | ADD | R1 ← R1 + R2 | ADD R1, R2
Arithmetic | SUB | R1 ← R1 − R2 | SUB R1, R2
Arithmetic | MUL | R1 ← R1 × R2 | MUL R1, R2
Arithmetic | DIV | R1 ← R1 ÷ R2 | DIV R1, R2
Logical | AND | R1 ← R1 AND R2 | AND R1, R2
Logical | OR | R1 ← R1 OR R2 | OR R1, R2
Logical | XOR | R1 ← R1 XOR R2 | XOR R1, R2
Logical | NOT | R1 ← complement of R1 | NOT R1
Shift | SHL | Shift left logical | SHL R1, 1
Shift | SHR | Shift right logical | SHR R1, 1
Shift | ROL | Rotate left | ROL R1, 2
Shift | ROR | Rotate right | ROR R1, 2
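Shifts and rotates differ only in what fills the vacated bit positions. The sketch below works on an assumed 16-bit register. The function names mirror the mnemonics but are hypothetical helpers, not any real instruction-set API.

```python
WIDTH = 16                       # assumed register width
MASK = (1 << WIDTH) - 1          # 0xFFFF


def shl(x: int, n: int) -> int:
    """Logical shift left: zeros enter at the LSB, bits past the MSB are lost."""
    return (x << n) & MASK


def shr(x: int, n: int) -> int:
    """Logical shift right: zeros enter at the MSB."""
    return (x & MASK) >> n


def rol(x: int, n: int) -> int:
    """Rotate left: bits leaving the MSB re-enter at the LSB."""
    x &= MASK
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def ror(x: int, n: int) -> int:
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    x &= MASK
    n %= WIDTH
    return ((x >> n) | (x << (WIDTH - n))) & MASK


x = 0x8001                       # 1000 0000 0000 0001
print(hex(shl(x, 1)), hex(shr(x, 1)), hex(rol(x, 2)), hex(ror(x, 2)))
# 0x2 0x4000 0x6 0x6000
```

With 0x8001, SHL loses the top 1 bit, while ROL by 2 carries it around to bit 1, giving 0x0006.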

7. Program Control & Interrupts

Program Control Instructions

Instruction | Operation | Description
JMP addr | PC ← addr | Unconditional jump
BEQ addr | If ZF=1, PC ← addr | Branch if equal (zero flag set)
BNE addr | If ZF=0, PC ← addr | Branch if not equal
BGT addr | If ZF=0 and SF=OF, PC ← addr | Branch if greater than
CALL addr | PUSH PC; PC ← addr | Call subroutine (save return address)
RET | POP PC | Return from subroutine
NOP | No operation | Pipeline delay, alignment
HLT | Halt processor | Stop execution
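The conditional branches in the table read only the PSW flags. This hypothetical `branch_taken` helper shows why BGT tests SF = OF rather than SF = 0: after a signed overflow, the sign bit alone gives the wrong answer. The flag values in the example assume the comparison is done by a 16-bit subtraction.

```python
def branch_taken(mnemonic: str, flags: dict) -> bool:
    """Decide whether JMP/BEQ/BNE/BGT transfers control, per the table above."""
    zf, sf, of = flags["ZF"], flags["SF"], flags["OF"]
    if mnemonic == "JMP":
        return True                        # unconditional
    if mnemonic == "BEQ":
        return zf == 1
    if mnemonic == "BNE":
        return zf == 0
    if mnemonic == "BGT":
        return zf == 0 and sf == of        # signed "greater than"
    raise ValueError(f"branch not modelled here: {mnemonic}")


# 16-bit compare 0x7FFF - 0xFFFF, i.e. 32767 - (-1): the result 0x8000 looks
# negative (SF=1), but the subtraction overflowed (OF=1), so 32767 > -1 holds.
flags = {"CF": 1, "ZF": 0, "SF": 1, "OF": 1}
print(branch_taken("BGT", flags))   # True
```

A test of SF = 0 alone would wrongly conclude 32767 < -1 here. Because SF = OF, the branch is correctly taken.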

Interrupts

An interrupt is a signal that diverts the CPU from its current program to execute a special routine called an Interrupt Service Routine (ISR). After handling the interrupt, the CPU returns to the original program.

Type | Source | Example | Priority / Maskability
Hardware External | I/O devices, timers | Keyboard press, disk ready | High
Hardware Internal | CPU itself | Division by zero, overflow | Highest
Software | Instruction in program | INT 21h (DOS), SVC (ARM) | Programmed
Non-Maskable (NMI) | Critical hardware | Power failure, memory parity error | Cannot be disabled
Maskable | Peripheral devices | Printer ready, serial data | Can be disabled via IF flag

Interrupt Handling Cycle

INTERRUPT FLOW
  Program Execution
        │
        ▼
  ┌──────────────────┐
  │ Interrupt Signal │ ← Device sends interrupt request
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ Finish Current   │ ← CPU completes current instruction
  │ Instruction      │
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ Save Context     │ ← Push PC and PSW onto stack
  │ (PC, PSW, Regs)  │
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ Identify Source  │ ← Polling or vectored interrupt
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ Load ISR Address │ ← From interrupt vector table
  │ into PC          │
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ Execute ISR      │ ← Handle the interrupt
  └────────┬─────────┘
           ▼
  ┌──────────────────┐
  │ RTI (Return      │ ← Pop PC and PSW from stack
  │ from Interrupt)  │
  └────────┬─────────┘
           ▼
  Resume Original Program
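The hardware steps of this flow (save context, vector to the ISR, restore on RTI) can be sketched as follows. This is a simplified model built on several assumptions: there is a single interrupt with a made-up vector number and ISR address, the CPU has already finished its current instruction, and only PC and PSW are saved automatically, with general registers left to the ISR. `CPU`, `enter_interrupt` and `rti` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    pc: int
    psw: int
    stack: list = field(default_factory=list)   # models the memory stack


def enter_interrupt(cpu: CPU, vector: int, ivt: dict) -> None:
    """Runs after the current instruction has completed."""
    cpu.stack.append(cpu.pc)    # save return address
    cpu.stack.append(cpu.psw)   # save flags
    cpu.pc = ivt[vector]        # load ISR address from the vector table


def rti(cpu: CPU) -> None:
    """Return from interrupt: pop in the reverse order of the pushes."""
    cpu.psw = cpu.stack.pop()
    cpu.pc = cpu.stack.pop()


ivt = {1: 0x0400}                    # hypothetical: vector 1 -> ISR at 0x0400
cpu = CPU(pc=0x0123, psw=0b0001)
enter_interrupt(cpu, 1, ivt)
print(hex(cpu.pc), cpu.stack)        # 0x400 [291, 1]
cpu.psw = 0b0100                     # the ISR's own arithmetic changes the flags
rti(cpu)
print(hex(cpu.pc), cpu.psw)          # 0x123 1
```

Because PSW is restored along with PC, the interrupted program sees exactly the flags it had before the interrupt, even though the ISR changed them.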

Priority Interrupt Systems

Method | How It Works | Pros | Cons
Daisy Chain | Devices connected in series; closest to CPU has highest priority | Simple hardware | Fixed priority, slow for many devices
Parallel Priority | Each device has dedicated line; priority encoder selects highest | Fast, flexible | More hardware, more wires
Software Polling | CPU checks each device in sequence via status registers | No extra hardware | Slow, wastes CPU cycles
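The difference between the first two schemes can be sketched as two selection functions. Both are hypothetical helpers under stated assumptions. For the daisy chain, index 0 is the device nearest the CPU. For parallel priority, an enable (mask) bit of 1 lets a request through, and the highest-numbered active line is taken as the highest priority.

```python
from typing import Optional, Sequence


def daisy_chain_grant(requests: Sequence[bool]) -> Optional[int]:
    """Daisy chain: the acknowledge passes from device to device in chain order.

    The first requesting device it reaches keeps the acknowledge and blocks
    it from travelling further, so position alone fixes the priority.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position
    return None


def parallel_priority(request_bits: int, enable_bits: int, width: int = 4) -> Optional[int]:
    """Parallel priority: each device has its own request line.

    Requests are ANDed with an enable (mask) register, then a priority
    encoder picks the highest-numbered active line.
    """
    active = request_bits & enable_bits & ((1 << width) - 1)
    if active == 0:
        return None
    return active.bit_length() - 1      # index of the most significant 1


print(daisy_chain_grant([False, True, True]))   # 1
print(parallel_priority(0b0110, 0b1111))        # 2
print(parallel_priority(0b0110, 0b1011))        # 1 (line 2 is masked off)
```

The daisy chain's priority is wired into the device order, whereas the parallel scheme can change it at run time just by rewriting the enable register.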
The NVIC (Nested Vectored Interrupt Controller) in ARM Cortex-M3/M4 cores supports up to 240 interrupt sources and up to 256 priority levels, although chip vendors usually implement fewer. When a higher-priority interrupt arrives while a lower-priority ISR is running, the NVIC lets it pre-empt that ISR in hardware. Entry latency is about 12 clock cycles, or roughly 100 ns at a 120 MHz clock.

8. Processor Status Word (PSW) & Flags

The Processor Status Word (PSW), also called the Flags Register or EFLAGS (in x86), is a special register that holds condition codes set by ALU operations. These flags are used by conditional branch instructions to make decisions.

Flag | Full Name | Set When | Example
CF | Carry Flag | Unsigned operation produces carry/borrow out of MSB | 0xFFFF + 0x0001 → CF=1
ZF | Zero Flag | Result of operation is zero | 5 − 5 = 0 → ZF=1
SF | Sign Flag | Result is negative (MSB = 1 in signed representation) | 3 − 5 = −2 → SF=1
OF | Overflow Flag | Signed operation exceeds representable range | 0x7FFF + 0x0001 → OF=1
IF | Interrupt Flag | Set = interrupts enabled; Clear = interrupts disabled | CLI clears IF, STI sets IF

Trace: ADD 0x7FFF + 0x0001 (16-bit signed)

FLAG TRACE
  Operand A:  0x7FFF  =  0111 1111 1111 1111  (+32767, max positive 16-bit)
  Operand B:  0x0001  =  0000 0000 0000 0001  (+1)

  Binary Addition:
    0111 1111 1111 1111   (0x7FFF = +32767)
  + 0000 0000 0000 0001   (0x0001 = +1)
  ─────────────────────
    1000 0000 0000 0000   (0x8000 = -32768 in signed!)

  Result = 0x8000

  Flag Analysis:
  ┌──────┬────────┬──────────────────────────────────────────┐
  │ Flag │ Value  │ Reason                                   │
  ├──────┼────────┼──────────────────────────────────────────┤
  │  CF  │   0    │ No carry out of bit 15 (unsigned OK)     │
  │  ZF  │   0    │ Result ≠ 0                               │
  │  SF  │   1    │ MSB = 1 (result appears negative)        │
  │  OF  │   1    │ +ve + +ve = −ve → signed overflow!       │
  └──────┴────────┴──────────────────────────────────────────┘

  Explanation: Adding two positive numbers (0x7FFF + 0x0001) gave a
  negative result (0x8000 = -32768). This is SIGNED OVERFLOW.
  OF is set because the result's sign differs from the common sign
  of the two operands. CF is NOT set because, read as unsigned,
  32767 + 1 = 32768 still fits in 16 bits, so nothing carries out
  of bit 15.

More Flag Trace Examples

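Operation | Result (16-bit) | CF | ZF | SF | OF
0x0005 + 0x0003 | 0x0008 | 0 | 0 | 0 | 0
0xFFFF + 0x0001 | 0x0000 | 1 | 1 | 0 | 0
0x8000 + 0x8000 | 0x0000 | 1 | 1 | 0 | 1
0x0005 − 0x0005 | 0x0000 | 0 | 1 | 0 | 0
0x0003 − 0x0005 | 0xFFFE | 1 | 0 | 1 | 0
0x7000 + 0x7000 | 0xE000 | 0 | 0 | 1 | 1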
CF and OF are different! CF detects unsigned overflow (carry out of MSB). OF detects signed overflow (when the sign of the result is wrong). A single ADD can set CF=1, OF=0 or CF=0, OF=1 or both. They are independent flags checking different things.
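Every row of the flag table above can be regenerated from the flag definitions. The sketch below is an illustration only, not any particular CPU's ALU. It assumes 16-bit operands and treats CF as a borrow on subtraction, the x86 convention this table follows (ARM inverts the carry's meaning for subtraction). `flags_after` is a hypothetical helper name.

```python
BITS = 16
MASK = (1 << BITS) - 1        # 0xFFFF
SIGN = 1 << (BITS - 1)        # 0x8000, the MSB


def flags_after(a: int, b: int, op: str = "add") -> tuple:
    """Return (16-bit result, flags dict) for a + b or a - b."""
    a &= MASK
    b &= MASK
    if op == "add":
        raw = a + b
        cf = int(raw > MASK)                          # carry out of bit 15
        result = raw & MASK
        # signed overflow: operands share a sign, result has the other sign
        of = int((~(a ^ b) & (a ^ result) & SIGN) != 0)
    elif op == "sub":
        raw = a - b
        cf = int(raw < 0)                             # borrow into bit 15
        result = raw & MASK
        # signed overflow: operands differ in sign, result's sign differs from a
        of = int(((a ^ b) & (a ^ result) & SIGN) != 0)
    else:
        raise ValueError(f"unsupported op: {op}")
    flags = {"CF": cf, "ZF": int(result == 0),
             "SF": int((result & SIGN) != 0), "OF": of}
    return result, flags


result, flags = flags_after(0x7FFF, 0x0001, "add")
print(f"0x{result:04X}", flags)
# 0x8000 {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1}
```

CF and OF are computed from different information here: CF from the unsigned magnitude of the raw sum or difference, OF only from the sign bits. This is why the two flags can disagree.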
Section D

Learn by Doing — 3-Tier Lab Structure

🟢 Tier 1 (GUIDED): Instruction Format Converter (Pen & Paper)

⏱️ 60–90 minutes  |  Beginner  |  Zero prior knowledge assumed

Objective:

Convert the expression X = (A + B) × (C + D) into all 4 instruction formats by hand, showing every step.

Step 1: Write the Expression Tree

Draw the expression tree for X = (A + B) × (C + D):

         ×
        / \
       +   +
      / \ / \
     A  B C  D

Step 2: 3-Address Format

Each instruction specifies: OP destination, source1, source2

Write three instructions: (1) ADD T1, A, B (2) ADD T2, C, D (3) MUL X, T1, T2

Step 3: 2-Address Format

One operand is both source and destination. You need MOV to copy initial values.

Write: MOV R1,A → ADD R1,B → MOV R2,C → ADD R2,D → MUL R1,R2 → MOV X,R1

Step 4: 1-Address Format (Accumulator)

All operations use the implicit accumulator AC. Need STORE for temporary results.

Write: LOAD A → ADD B → STORE T → LOAD C → ADD D → MUL T → STORE X

Step 5: 0-Address Format (Stack)

Convert to postfix (RPN): A B + C D + ×

Write: PUSH A → PUSH B → ADD → PUSH C → PUSH D → ADD → MUL → POP X
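You can check Step 5 with a short converter. It applies the shunting-yard algorithm to get postfix, then emits one 0-address instruction per token. This is a sketch under stated assumptions: operands are single letters, ASCII `*` and `/` stand in for × and ÷, all four operators are left-associative, and SUB/DIV are the mnemonics from the data manipulation table. `to_postfix` and `zero_address_code` are hypothetical names.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}          # * and / bind tighter than + and -
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def to_postfix(expr: str) -> list:
    """Shunting-yard: infix with single-letter operands -> postfix tokens."""
    out, ops = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalpha():
            out.append(ch)                          # operands go straight out
        elif ch in PREC:
            # left-associative: pop operators of equal or higher precedence
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[ch]:
                out.append(ops.pop())
            ops.append(ch)
        elif ch == "(":
            ops.append(ch)
        elif ch == ")":
            while ops and ops[-1] != "(":
                out.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced ')'")
            ops.pop()                               # discard the matching "("
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if "(" in ops:
        raise ValueError("unbalanced '('")
    return out + ops[::-1]                          # flush remaining operators


def zero_address_code(expr: str, dest: str) -> list:
    """PUSH each operand, emit one 0-address op per operator, then POP dest."""
    code = [f"PUSH {t}" if t.isalpha() else MNEMONIC[t] for t in to_postfix(expr)]
    return code + [f"POP {dest}"]


print(" ".join(to_postfix("(A+B)*(C+D)")))   # A B + C D + *
print(zero_address_code("(A+B)*(C+D)", "X"))
```

The second line prints the same eight instructions as the hand-written sequence above. Running it on "(A-B)*(C+D)/E" yields the postfix used in Problem D2.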

Step 6: Fill the Comparison Table

Count: instructions, memory accesses, and total bits needed for each format. Create a table comparing all four.

🎉 Deliverable: A clean, hand-written (or typed) comparison showing all 4 formats with instruction counts and analysis. Take a photo for your portfolio.

🟡 Tier 2 (SEMI-GUIDED): CPU Register Simulator in Python

⏱️ 90–120 minutes  |  Intermediate  |  Basic Python knowledge assumed

Your Mission:

Build a Python simulator that models a simple CPU with 8 registers (R0–R7), an ALU, and a memory of 256 words. Implement LOAD, STORE, ADD, SUB, and MOV instructions.

Starter Code (you complete the TODOs):

Python
class SimpleCPU:
    def __init__(self):
        self.registers = [0] * 8       # R0-R7
        self.memory = [0] * 256        # 256-word memory
        self.flags = {'CF':0, 'ZF':0, 'SF':0, 'OF':0}

    def load(self, reg, addr):
        # TODO: Load memory[addr] into registers[reg]
        pass

    def store(self, reg, addr):
        # TODO: Store registers[reg] into memory[addr]
        pass

    def add(self, dest, src1, src2):
        # TODO: registers[dest] = registers[src1] + registers[src2]
        # TODO: Update ZF, SF, CF, OF flags
        pass

    def mov(self, dest, src):
        # TODO: Copy registers[src] into registers[dest]
        pass

    def display_state(self):
        # TODO: Print all register values and flags
        pass

Test Case:

Store A=5 at memory[100], B=3 at memory[101]. Execute: LOAD R1,100 → LOAD R2,101 → ADD R3,R1,R2 → STORE R3,102. Verify memory[102] = 8.

Stretch Goal: Add SUB, AND, OR instructions. Implement flag updates. Add a simple instruction parser that reads assembly-like text files and executes them.

🔴 Tier 3 (OPEN CHALLENGE): Design a Custom CPU Instruction Set

⏱️ 2–3 hours  |  Advanced  |  No instructions: real-world design project

The Brief:

Design a complete instruction set architecture (ISA) for a hypothetical 16-bit CPU called ARTHA-16. Your design must include:

  1. 16 instructions covering: data transfer (4), arithmetic (4), logical (3), control flow (3), stack (2)
  2. At least 4 addressing modes: Immediate, Direct, Register, Register Indirect
  3. Instruction encoding: 16-bit fixed format. Show the bit layout for each instruction type.
  4. 8 registers: R0–R5 (general), R6 (SP), R7 (PC)
  5. Sample program: Write a program to compute factorial of 5 using your ISA
  6. Documentation: Create a 2-page ISA reference card (like ARM's quick reference)

Deliverable: A PDF/Google Doc with your ISA specification, encoding tables, and sample program. This is a portfolio-worthy project for embedded systems roles.

This is exactly what CPU architects do at Qualcomm, ARM, and Intel. Companies like Samsung Semiconductor Noida hire freshers who can demonstrate ISA design skills. This project, polished well, can be the centrepiece of your resume for VLSI/embedded roles at ₹6–12 LPA.
Section E

Problem Bank — Diagrams, Numericals, Industry & GATE

Diagram-Based Problems (3)

๐Ÿ“ Problem D1: Draw General Register Organization

Q: Draw the complete block diagram of a general register organization with 8 registers. Label MUX A, MUX B, ALU, output bus, and all control signals (SELA, SELB, SELD, OPR). Show the data flow for the operation R5 ← R2 AND R6.

Solution: Use the diagram from Section C, Topic 1. For R5 ← R2 AND R6: SELA=010 (R2), SELB=110 (R6), SELD=101 (R5), OPR=01000 (AND). The control word is: 010 110 101 01000.

๐Ÿ“ Problem D2: Stack Trace for (A−B)×(C+D)÷E

Q: Show the complete stack trace (step-by-step) for evaluating (A−B)×(C+D)÷E using 0-address instructions. Use A=10, B=3, C=4, D=6, E=2.

Solution: Postfix: A B − C D + × E ÷

Step | Instruction | Stack | Result
1 | PUSH A | 10 |
2 | PUSH B | 10, 3 |
3 | SUB | 7 | 10−3=7
4 | PUSH C | 7, 4 |
5 | PUSH D | 7, 4, 6 |
6 | ADD | 7, 10 | 4+6=10
7 | MUL | 70 | 7×10=70
8 | PUSH E | 70, 2 |
9 | DIV | 35 | 70÷2=35

Final answer: 35
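To check a trace like this mechanically, here is a minimal Python sketch of a 0-address machine. The tuple encoding of instructions, the `run_stack_program` name and the `env` dictionary standing in for memory are illustrative assumptions. SUB and DIV treat the top of stack as the right-hand operand, as in the trace, and DIV uses integer division.

```python
from typing import Dict, List, Sequence, Union

Operand = Union[str, int]

# ALU operations: `a` is the deeper stack item, `b` is the top of stack.
OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
    "DIV": lambda a, b: a // b,  # integer division, as on a simple integer ALU
}


def run_stack_program(program: Sequence[tuple], env: Dict[str, int]) -> List[int]:
    """Execute a 0-address program, print each step, and return the final stack."""
    stack: List[int] = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            operand: Operand = instr[1]
            # A name is read from "memory" (env); an int is pushed as-is.
            stack.append(env[operand] if isinstance(operand, str) else operand)
        elif op in OPS:
            b = stack.pop()  # top of stack = right-hand operand
            a = stack.pop()  # next item = left-hand operand
            stack.append(OPS[op](a, b))
        else:
            raise ValueError(f"unknown instruction: {op}")
        print(f"{op:<4} {instr[1] if len(instr) > 1 else '':<2} -> {stack}")
    return stack


# Problem D2: (A - B) x (C + D) / E with A=10, B=3, C=4, D=6, E=2
d2 = [("PUSH", "A"), ("PUSH", "B"), ("SUB",), ("PUSH", "C"), ("PUSH", "D"),
      ("ADD",), ("MUL",), ("PUSH", "E"), ("DIV",)]
d2_stack = run_stack_program(d2, {"A": 10, "B": 3, "C": 4, "D": 6, "E": 2})
print("Final answer:", d2_stack[-1])
```

The same function also replays the literal-operand sequence in GATE G3 (PUSH 5, PUSH 3, SUB, PUSH 2, MUL), which leaves [4] on the stack.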

๐Ÿ“ Problem D3: Draw the Interrupt Handling Flowchart

Q: Draw the complete flowchart for interrupt handling, showing the steps from interrupt request to resumption of the original program. Include context saving, ISR execution, and RTI.

Solution: Refer to the ASCII flowchart in Section C, Topic 7. The key steps are: (1) Complete current instruction, (2) Save PC and PSW to stack, (3) Identify interrupt source, (4) Load ISR address from IVT, (5) Execute ISR, (6) Execute RTI to restore PC and PSW, (7) Resume original program.
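The seven steps can be modelled with a small, hypothetical `ToyCPU` class. This is a sketch under simplifying assumptions: the IVT maps a vector number to an ISR start address plus a Python function that stands in for the ISR's code, and the stack is a Python list rather than memory addressed by SP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Handler = Callable[["ToyCPU"], None]


@dataclass
class ToyCPU:
    """Hypothetical CPU model, used only to walk through the interrupt sequence."""
    pc: int = 0
    psw: int = 0
    stack: List[int] = field(default_factory=list)
    # IVT: interrupt vector -> (ISR start address, Python stand-in for the ISR body)
    ivt: Dict[int, Tuple[int, Handler]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def interrupt(self, vector: int) -> None:
        # (1) The caller only invokes this between instructions, so the
        #     current instruction has already completed.
        # (2) Save context: PC first, then PSW.
        self.stack.append(self.pc)
        self.stack.append(self.psw)
        self.log.append(f"saved PC={self.pc}, PSW={self.psw}")
        # (3)-(4) The vector identifies the source; fetch the ISR from the IVT.
        isr_address, isr_body = self.ivt[vector]
        self.pc = isr_address
        # (5) Execute the ISR; it is free to change PSW (e.g. clear interrupt enable).
        isr_body(self)
        # (6) RTI: restore in the reverse order of saving.
        self.rti()
        # (7) Execution resumes at the restored PC.
        self.log.append(f"resumed at PC={self.pc}")

    def rti(self) -> None:
        self.psw = self.stack.pop()
        self.pc = self.stack.pop()


def timer_isr(cpu: ToyCPU) -> None:
    cpu.log.append(f"timer ISR running at PC={cpu.pc}")
    cpu.psw = 0  # the ISR clobbers the flags; RTI must restore them


cpu = ToyCPU(pc=260, psw=0b1010, ivt={3: (0x8000, timer_isr)})
cpu.interrupt(3)
print("\n".join(cpu.log))
```

Because PSW is pushed after PC, RTI must pop PSW first; getting that order wrong is a classic bug in hand-written interrupt handlers.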

Numerical Problems (6)

🔢 Problem N1: Effective Address Calculation

Q: Given: R1=200, PC=500, Memory[300]=600, Memory[600]=42, Memory[700]=55. Calculate the effective address and operand for: (a) Direct 300, (b) Indirect 300, (c) Register R1, (d) Displacement 500(R1).

Solution:

(a) Direct 300: EA=300, Operand=M[300]=600

(b) Indirect 300: EA=M[300]=600, Operand=M[600]=42

(c) Register R1: Operand=R1=200 (no EA, value in register)

(d) Displacement 500(R1): EA=500+R1=500+200=700, Operand=M[700]=55
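The same four calculations as a Python sketch. `effective_address`, the mode names and the `REGS`/`MEM` dictionaries are hypothetical; unlisted memory reads as 0 in this toy model.

```python
from typing import Dict, Optional, Tuple

# Machine state from Problem N1.
REGS: Dict[str, int] = {"R1": 200, "PC": 500}
MEM: Dict[int, int] = {300: 600, 600: 42, 700: 55}


def read(addr: int) -> int:
    """Read one memory word; unlisted addresses read as 0 here."""
    return MEM.get(addr, 0)


def effective_address(mode: str, addr_field: int = 0, reg: str = "") -> Tuple[Optional[int], int]:
    """Return (EA, operand) for a few classic addressing modes."""
    if mode == "immediate":
        return None, addr_field          # operand is inside the instruction
    if mode == "direct":
        return addr_field, read(addr_field)  # one memory access
    if mode == "indirect":
        ea = read(addr_field)            # 1st access fetches the pointer
        return ea, read(ea)              # 2nd access fetches the operand
    if mode == "register":
        return None, REGS[reg]           # operand is already in a register
    if mode == "register_indirect":
        ea = REGS[reg]
        return ea, read(ea)
    if mode == "displacement":
        ea = addr_field + REGS[reg]      # EA = displacement + [reg]
        return ea, read(ea)
    raise ValueError(f"unsupported mode: {mode}")


for label, args in [("(a) Direct 300", ("direct", 300)),
                    ("(b) Indirect 300", ("indirect", 300)),
                    ("(c) Register R1", ("register", 0, "R1")),
                    ("(d) Displacement 500(R1)", ("displacement", 500, "R1"))]:
    print(label, effective_address(*args))
```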

🔢 Problem N2: Control Word Generation

Q: For a general register organization with R0–R7, write the 14-bit control word for: (a) R4 ← R1 + R7, (b) R0 ← R3 XOR R5, (c) R6 ← R2 (transfer).

Solution:

(a) SELA=001, SELB=111, SELD=100, OPR=00010 (ADD) → 001 111 100 00010

(b) SELA=011, SELB=101, SELD=000, OPR=01100 (XOR) → 011 101 000 01100

(c) SELA=010, SELB=000, SELD=110, OPR=00000 (Transfer A) → 010 000 110 00000
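A short Python sketch that packs the four fields and reproduces the words above, plus the D1 word. The OPR table holds only the five codes quoted in this unit; the name `TSFA` for "transfer A" and the `control_word` function are labels chosen for illustration.

```python
# OPR codes quoted in D1, N2 and Q14 (only these five appear in this unit).
OPR = {"TSFA": 0b00000, "ADD": 0b00010, "AND": 0b01000, "OR": 0b01010, "XOR": 0b01100}


def control_word(sela: int, selb: int, seld: int, op: str) -> str:
    """Pack SELA | SELB | SELD | OPR (3+3+3+5 = 14 bits) for an 8-register CPU."""
    for name, value in (("SELA", sela), ("SELB", selb), ("SELD", seld)):
        if not 0 <= value <= 7:
            raise ValueError(f"{name} must select R0-R7, got {value}")
    word = (sela << 11) | (selb << 8) | (seld << 5) | OPR[op]
    bits = f"{word:014b}"
    # Group the bits as SELA SELB SELD OPR for readability.
    return f"{bits[0:3]} {bits[3:6]} {bits[6:9]} {bits[9:]}"


print(control_word(1, 7, 4, "ADD"))   # N2(a): R4 <- R1 + R7
print(control_word(3, 5, 0, "XOR"))   # N2(b): R0 <- R3 XOR R5
print(control_word(2, 0, 6, "TSFA"))  # N2(c): R6 <- R2
print(control_word(2, 6, 5, "AND"))   # D1:    R5 <- R2 AND R6
```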

🔢 Problem N3: Instruction Count Comparison

Q: For the expression Y = (P + Q) × (R − S) + T, determine the number of instructions needed in 3-address, 2-address, 1-address, and 0-address formats.

Solution:

3-address (4 instructions): ADD T1,P,Q → SUB T2,R,S → MUL T3,T1,T2 → ADD Y,T3,T

2-address (7 instructions): MOV R1,P → ADD R1,Q → MOV R2,R → SUB R2,S → MUL R1,R2 → ADD R1,T → MOV Y,R1

1-address (8 instructions): LOAD P → ADD Q → STORE T1 → LOAD R → SUB S → MUL T1 → ADD T → STORE Y

0-address (10 instructions): PUSH P → PUSH Q → ADD → PUSH R → PUSH S → SUB → MUL → PUSH T → ADD → POP Y

🔢 Problem N4: Stack Operations Trace

Q: Initial SP=1000. Show the SP value after each operation: PUSH A, PUSH B, PUSH C, POP, PUSH D, POP, POP.

Solution: PUSH A: SP=999. PUSH B: SP=998. PUSH C: SP=997. POP: SP=998. PUSH D: SP=997. POP: SP=998. POP: SP=999. Stack grows downward; PUSH decrements SP, POP increments SP.
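A minimal sketch of the same trace, assuming one address per stack item (word-addressed memory) and modelling only SP, not the stored data. `trace_sp` is a hypothetical helper.

```python
from typing import List


def trace_sp(ops: List[str], sp: int = 1000) -> List[int]:
    """Return SP after each PUSH/POP on a downward-growing stack."""
    history = []
    for op in ops:
        if op.startswith("PUSH"):
            sp -= 1          # SP <- SP - 1, then M[SP] <- DR
        elif op == "POP":
            sp += 1          # DR <- M[SP], then SP <- SP + 1
        else:
            raise ValueError(f"unknown stack op: {op}")
        history.append(sp)
    return history


ops = ["PUSH A", "PUSH B", "PUSH C", "POP", "PUSH D", "POP", "POP"]
for op, sp in zip(ops, trace_sp(ops)):
    print(f"{op:<7} SP={sp}")
```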

🔢 Problem N5: Flag Tracing

Q: Determine CF, ZF, SF, OF after each operation (8-bit signed): (a) ADD 127, 1 (b) SUB 0, 1 (c) ADD 128, 128.

Solution:

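(a) 127 + 1 = 128 → 0x80 = 10000000. CF=0, ZF=0, SF=1, OF=1 (positive + positive gave a negative result)

(b) 0 − 1 = −1 → 0xFF = 11111111. CF=1 (borrow), ZF=0, SF=1, OF=0

(c) As a signed byte, the bit pattern 0x80 is −128, so this is −128 + (−128) = −256. In binary, 0x80 + 0x80 = 0x100, which truncates to 0x00 in 8 bits. CF=1, ZF=1, SF=0, OF=1 (negative + negative gave a non-negative result)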
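These flag rules can be checked with a small Python function. `alu_flags` is a hypothetical helper; operands are raw bit patterns, and CF on SUB follows the borrow convention used in (b). The overflow tests use the standard sign-bit identities: for ADD, both operands differ in sign from the result; for SUB, the operands differ in sign and the result's sign differs from the minuend's.

```python
from typing import Dict


def alu_flags(a: int, b: int, op: str = "ADD", bits: int = 8) -> Dict[str, int]:
    """Compute the result and CF, ZF, SF, OF for an n-bit ADD or SUB."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    if op == "ADD":
        raw = a + b
        cf = int(raw > mask)                 # carry out of the MSB
        result = raw & mask
        of = int(((a ^ result) & (b ^ result) & sign) != 0)
    elif op == "SUB":
        raw = a - b
        cf = int(raw < 0)                    # borrow into the MSB
        result = raw & mask
        of = int(((a ^ b) & (a ^ result) & sign) != 0)
    else:
        raise ValueError(f"unsupported op: {op}")
    return {"result": result, "CF": cf, "ZF": int(result == 0),
            "SF": int((result & sign) != 0), "OF": of}


print(alu_flags(127, 1))          # (a)
print(alu_flags(0, 1, "SUB"))     # (b)
print(alu_flags(128, 128))        # (c)
```

Passing `bits=16` gives the 16-bit case used in Q12 of Section F (0xFFFF + 0x0001).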

🔢 Problem N6: Relative Address Calculation

Q: A branch instruction is at address 2050. The instruction is BEQ with an 8-bit signed offset of −30 (decimal). If the PC has already been incremented to 2052 when the offset is applied, what is the target address?

Solution: Target = PC + offset = 2052 + (−30) = 2052 − 30 = 2022. The branch goes backward 30 bytes from the next instruction address. This is how loops work — the branch target is before the branch instruction.
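How the −30 is actually stored and applied: a sketch, assuming the offset field is 8-bit two's complement and the PC has already advanced past the branch, as the problem states. `sign_extend` and `branch_target` are hypothetical names.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return value - (1 << bits) if value & sign else value


def branch_target(next_pc: int, offset_field: int, bits: int = 8) -> int:
    """EA = PC + sign-extended offset, where PC already points past the branch."""
    return next_pc + sign_extend(offset_field, bits)


offset_field = (-30) & 0xFF          # how -30 is stored in an 8-bit field
print(hex(offset_field), sign_extend(offset_field, 8))
print(branch_target(2052, offset_field))
```

Without the sign extension, the field 0xE2 would be read as +226 and the branch would land at 2278 instead of 2022.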

Industry Problems (3)

๐Ÿญ Problem I1: ARM Pipeline Analysis

Q: An ARM Cortex-A78 has a 13-stage pipeline. If the clock frequency is 3 GHz, what is the theoretical maximum throughput in MIPS? Why is actual throughput lower?

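Solution: Assuming a scalar pipeline that completes at most one instruction per cycle, the peak is 1 × 3 GHz = 3000 MIPS. The pipeline depth (13 stages) does not change this peak; it only lengthens the time to fill the pipeline and the penalty for refilling it after a flush. Actual throughput is lower due to: pipeline stalls (data hazards), branch mispredictions (flushing the pipeline), cache misses (memory latency), and dependencies between instructions. Real cores such as the Cortex-A78 are superscalar and can issue several instructions per cycle, so their peak exceeds this scalar bound, but sustained IPC on real workloads stays well below that peak for the same reasons.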
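The MIPS arithmetic behind this answer, as a sketch. The stall figure and the 2-IPC case are hypothetical numbers chosen to show the trend, not measurements of the Cortex-A78.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS = f / (CPI x 10^6), where CPI is average clock cycles per instruction."""
    return clock_hz / (cpi * 1e6)


f = 3e9                     # 3 GHz clock from the problem
print(mips(f, 1.0))         # scalar pipeline, no stalls (CPI = 1): 3000.0
print(mips(f, 1.0 + 0.5))   # hypothetical 0.5 stall cycles per instruction: 2000.0
print(mips(f, 0.5))         # hypothetical superscalar core sustaining 2 IPC: 6000.0
```

Working in CPI makes the effect of stalls additive: every average stall cycle per instruction is simply added to the ideal CPI.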

๐Ÿญ Problem I2: x86 Instruction Decode Challenge

Q: Intel x86 has variable-length instructions (1–15 bytes). Explain why this makes pipelining harder than ARM's fixed 32-bit instructions. How does Intel solve this problem?

Solution: With variable-length encoding, the decoder cannot know where the next instruction starts until it has at least partly decoded the current one. This creates a bottleneck at the decode stage. Intel solves this with: (1) Pre-decode buffers that scan ahead and mark instruction boundaries, (2) Micro-op translation — complex CISC instructions are broken into fixed-length RISC-like micro-operations (µops) internally, (3) µop caches that store already-decoded instructions. Modern Intel CPUs are internally RISC-like despite their CISC ISA.

๐Ÿญ Problem I3: RISC-V in Indian Context

Q: Why is RISC-V strategically important for India's semiconductor mission? Compare the cost and licensing models of ARM vs RISC-V for an Indian startup designing a custom IoT chip.

Solution: ARM requires licensing fees: $1M–$10M+ upfront + per-chip royalties (1–2%). The RISC-V ISA is open and royalty-free, so there is no ISA licensing cost (commercial RISC-V core designs bought from vendors can still carry fees). For an Indian IoT startup producing 100,000 chips, ARM's upfront fees alone could cost ₹8–80 crore before royalties, while the RISC-V ISA costs ₹0 in licensing. This is why IIT Madras chose RISC-V for SHAKTI and C-DAC chose it for VEGA. India's semiconductor independence depends on not paying royalties to foreign companies for basic CPU IP.

GATE-Style Problems (5)

🎓 GATE G1 (2-mark)

Q: A CPU has 16 general-purpose registers. How many bits are needed in the control word for the register selection fields (SELA + SELB + SELD)?

Solution: Each MUX needs log₂(16) = 4 bits. Three fields: SELA(4) + SELB(4) + SELD(4) = 12 bits.

🎓 GATE G2 (2-mark)

Q: Consider a byte-addressable memory with 16-bit addresses. If displacement addressing is used with a 6-bit signed offset and a 16-bit base register, what is the addressable range relative to the base?

Solution: 6-bit signed offset: range is −32 to +31 (in 2's complement). So EA ranges from [Base − 32] to [Base + 31]. The window covers 64 bytes around the base register value: 32 below it and 31 above it.
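A sketch of the same range calculation for any field width. `signed_range` and `reachable_window` are hypothetical helpers; the wrap-around modulo 2^16 is an assumption about how a 16-bit address adder behaves.

```python
from typing import Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Two's-complement range of a `bits`-wide field: -2^(bits-1) .. 2^(bits-1) - 1."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def reachable_window(base: int, offset_bits: int, addr_bits: int = 16) -> Tuple[int, int]:
    """Lowest and highest EA reachable from `base`, wrapped to the address width."""
    lo, hi = signed_range(offset_bits)
    size = 1 << addr_bits
    return (base + lo) % size, (base + hi) % size


print(signed_range(6))              # (-32, 31)
print(reachable_window(0x1000, 6))  # (4064, 4127)
```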

🎓 GATE G3 (1-mark)

Q: In a stack-based CPU, the instruction sequence PUSH 5, PUSH 3, SUB, PUSH 2, MUL produces what result on top of stack?

Solution: PUSH 5 → [5]. PUSH 3 → [5,3]. SUB → Pop 3,5; Push 5−3=2 → [2]. PUSH 2 → [2,2]. MUL → Pop 2,2; Push 2×2=4 → [4]. Answer: 4.

🎓 GATE G4 (2-mark)

Q: A RISC machine has 32 registers and uses 3-address instructions. Each instruction has a 6-bit opcode and three register fields. What is the instruction length?

Solution: Opcode: 6 bits. Each register field: log₂(32) = 5 bits. Total = 6 + 5 + 5 + 5 = 21 bits. In practice, this would be padded to 32 bits with unused/extended fields.
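Field-width questions of this kind (G1, G4, and Q26 in Section F) all reduce to ceil(log₂ n) bits per field. A hypothetical pair of helpers, using Python's `int.bit_length` to get the ceiling without floating point:

```python
def field_bits(n_items: int) -> int:
    """Bits needed to select one of n_items (n_items >= 1): ceil(log2(n_items))."""
    return (n_items - 1).bit_length()


def three_address_length(opcode_bits: int, n_registers: int) -> int:
    """Opcode plus three register fields (RD, RS1, RS2)."""
    return opcode_bits + 3 * field_bits(n_registers)


print(three_address_length(6, 32))   # G4: 6 + 3*5 = 21
print(3 * field_bits(16))            # G1: SELA + SELB + SELD = 12
print(32 - 3 * field_bits(64))       # Q26: opcode bits left in a 32-bit word = 14
```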

🎓 GATE G5 (2-mark)

Q: Which addressing mode is used to implement the branch instruction "if R1 == 0, jump to label L" where L is 40 bytes ahead of the current PC?

Solution: Relative addressing (PC-relative). The offset +40 is added to the current PC to compute the target address. This produces position-independent code. The instruction would be: BEQ +40 (if ZF=1 after comparing R1 with 0).

Section F

MCQ Assessment Bank — 30 Questions (Bloom's Mapped)

Remember / Identify (Q1–Q5)

Q1

In General Register Organization, the component that selects the source operand for ALU input A is:

  A. ALU
  B. MUX A
  C. SELD
  D. Output Bus
Remember
✅ Answer: (B) MUX A. MUX A (Multiplexer A) selects one of the registers as the first source operand for the ALU, controlled by the SELA field.
Q2

The PUSH operation on a memory stack (growing downward) performs:

  A. SP ← SP + 1, then M[SP] ← DR
  B. SP ← SP − 1, then M[SP] ← DR
  C. DR ← M[SP], then SP ← SP + 1
  D. DR ← M[SP], then SP ← SP − 1
Remember
✅ Answer: (B). For a downward-growing stack, PUSH first decrements SP (so that it points to the next free location) and then writes the data register value to that location.
Q3

RISC stands for:

  A. Reduced Instruction Standard Computer
  B. Reduced Instruction Set Computer
  C. Register Instruction Set Computer
  D. Rapid Instruction Set Computing
Remember
✅ Answer: (B). RISC = Reduced Instruction Set Computer. It uses a small, highly optimized set of instructions.
Q4

The Overflow Flag (OF) in the PSW is set when:

  A. The result is zero
  B. There is a carry from the MSB in unsigned arithmetic
  C. A signed operation produces a result outside the representable range
  D. The stack is full
Remember
✅ Answer: (C). OF is set when the signed result cannot be represented in the given number of bits, for example when adding two positive numbers produces a negative result.
Q5

Which addressing mode uses the Program Counter (PC) to calculate the effective address?

  A. Direct
  B. Immediate
  C. Relative
  D. Register Indirect
Remember
✅ Answer: (C) Relative. In relative addressing, EA = PC + offset. This is primarily used for branch/jump instructions and produces position-independent code.

Understand / Explain (Q6–Q10)

Q6

Why does a 0-address instruction format require more instructions than a 3-address format for the same expression?

  A. Because 0-address uses longer instructions
  B. Because it must explicitly push each operand and pop the result, using the stack for all operations
  C. Because it has fewer registers
  D. Because the ALU is slower
Understand
✅ Answer: (B). In 0-address format, every operand must be explicitly pushed onto the stack, and every operation implicitly pops its operands and pushes the result. These separate PUSH/POP instructions do work that a 3-address instruction encodes directly in its address fields.
Q7

Why is pipelining more efficient in RISC than CISC architectures?

  A. RISC has more registers
  B. RISC uses fixed-length instructions, making fetch and decode stages predictable
  C. RISC has a larger instruction set
  D. CISC uses hardwired control
Understand
✅ Answer: (B). Fixed-length instructions in RISC let the CPU know exactly where each instruction starts, so the pipeline can be kept full. CISC's variable-length instructions create decode bottlenecks.
Q8

In indirect addressing, why are two memory accesses needed to fetch the operand?

  A. One to read the instruction, one to read the operand
  B. One to read the pointer address from memory, another to read the actual operand from that address
  C. One for the opcode, one for the address field
  D. One for the stack, one for the register
Understand
✅ Answer: (B). The address field points to a memory location that holds the actual effective address (a pointer). The first access reads the pointer; the second reads the operand at the pointed-to address.
Q9

What is the purpose of the SELD field in the general register organization control word?

  A. Selects the ALU operation
  B. Selects the first source register
  C. Selects the destination register where the ALU result is stored
  D. Selects the memory address
Understand
✅ Answer: (C). SELD (Select Destination) determines which register receives the ALU output via the output bus.
Q10

Why do the Carry Flag (CF) and the Overflow Flag (OF) serve different purposes?

  A. CF is for addition, OF is for subtraction
  B. CF detects unsigned overflow (carry out), OF detects signed overflow (sign error)
  C. CF is set by the ALU, OF is set by the control unit
  D. They always have the same value
Understand
✅ Answer: (B). CF indicates overflow in unsigned arithmetic (carry/borrow out of the MSB). OF indicates overflow in signed arithmetic (the result's sign does not match the expected sign). They are independent flags.

Apply / Solve (Q11–Q15)

Q11

For the expression Y = (A + B) × C using 0-address instructions, how many instructions are needed?

  A. 4
  B. 5
  C. 6
  D. 7
Apply
✅ Answer: (C). PUSH A, PUSH B, ADD, PUSH C, MUL, POP Y = 6 instructions. Postfix: A B + C × → 3 PUSHes + 2 operations + 1 POP = 6.
Q12

After executing ADD 0xFFFF + 0x0001 (16-bit), the flags are:

  A. CF=0, ZF=1, SF=0, OF=0
  B. CF=1, ZF=1, SF=0, OF=0
  C. CF=1, ZF=0, SF=0, OF=1
  D. CF=0, ZF=0, SF=1, OF=1
Apply
✅ Answer: (B). 0xFFFF + 0x0001 = 0x10000, which is 0x0000 in 16 bits. CF=1 (carry out), ZF=1 (result is zero), SF=0 (MSB=0), OF=0 (as signed values this is −1 + 1 = 0, which is representable).
Q13

Given R2 = 400 and the instruction LOAD 100(R2), with displacement addressing, the effective address is:

  A. 100
  B. 400
  C. 500
  D. 300
Apply
✅ Answer: (C). EA = displacement + [R2] = 100 + 400 = 500. The operand is fetched from Memory[500].
Q14

The control word for R7 ← R0 OR R4 (given OPR for OR = 01010) is:

  A. 000 100 111 01010
  B. 111 000 100 01010
  C. 100 000 111 01010
  D. 000 111 100 01010
Apply
✅ Answer: (A). SELA=000 (R0 as source A), SELB=100 (R4 as source B), SELD=111 (R7 as destination), OPR=01010 (OR). Control word: 000 100 111 01010.
Q15

How many instructions are needed to compute X = (P − Q) × (R + S) in 2-address format?

  A. 5
  B. 6
  C. 7
  D. 8
Apply
✅ Answer: (B). MOV R1,P (1) → SUB R1,Q (2) → MOV R2,R (3) → ADD R2,S (4) → MUL R1,R2 (5) → MOV X,R1 (6) = 6 instructions.

Analyze / Compare (Q16–Q20)

Q16

Which of the following is NOT an advantage of RISC over CISC?

  A. Better pipelining efficiency
  B. Smaller code size for the same program
  C. Lower power consumption
  D. Simpler hardware design
Analyze
✅ Answer: (B). RISC actually produces LARGER code (more instructions are needed). CISC has smaller code size because complex instructions encode more work per instruction. All other options are genuine RISC advantages.
Q17

A program uses autoincrement addressing to traverse an array of 100 integers. If it used register indirect addressing instead (advancing the pointer with an explicit ADD), how many additional instructions would be needed?

  A. 0 (same count)
  B. 100 (one extra increment per element)
  C. 99 (increment after each access except the last)
  D. 200 (one extra load and increment per element)
Analyze
✅ Answer: (C). Autoincrement updates the pointer register automatically after each access. Without it, you need an explicit ADD instruction to advance the pointer. For 100 elements, that is 99 extra increment instructions (no increment is needed after the last element access).
Q18

Why do modern Intel CPUs internally translate CISC instructions into RISC-like micro-operations (µops)?

  A. To save memory
  B. To enable efficient out-of-order execution and pipelining of fixed-size operations
  C. To reduce the number of registers
  D. To make software compatible with ARM
Analyze
✅ Answer: (B). Variable-length CISC instructions are difficult to pipeline. By converting them into fixed-size µops, Intel can use RISC-style execution engines with efficient pipelining, out-of-order execution, and superscalar dispatch.
Q19

In a daisy-chain priority interrupt system, device D3 is connected between D2 and D4. If D2 and D4 both raise interrupts simultaneously, which gets serviced first?

  A. D4 (it's farther from CPU)
  B. D2 (it's closer to CPU, higher priority)
  C. Both serviced simultaneously
  D. Neither: deadlock occurs
Analyze
✅ Answer: (B). In a daisy chain, devices closer to the CPU have higher priority. D2 is closer than D4, so the interrupt acknowledge signal reaches D2 first, and D2 blocks it from propagating to D4 until D2 has been serviced.
Q20

Compare the total memory bits for encoding "ADD R1, R2, R3" in a 3-address format (6-bit opcode, 4-bit registers) vs a stack-based 0-address equivalent. Which uses fewer total bits?

  A. 3-address: fewer bits
  B. 0-address: fewer bits
  C. Both use the same number of bits
  D. Cannot be determined without more information
Analyze
✅ Answer: (D). 3-address: 6+4+4+4 = 18 bits for one instruction. The 0-address equivalent needs 4 instructions (PUSH R1, PUSH R2, ADD, POP R3), and their total width depends on the 0-address opcode size and on the operand field PUSH/POP still need, neither of which is given. So the totals cannot be compared.

Evaluate / Justify (Q21–Q25)

Q21

For an embedded real-time system controlling an automotive braking mechanism, which interrupt priority scheme is most appropriate?

  A. Software polling
  B. Daisy-chain priority
  C. Parallel priority with hardware encoder
  D. No interrupts: use busy waiting
Evaluate
✅ Answer: (C). Parallel priority with a hardware encoder provides the fastest response time. In safety-critical automotive systems, microsecond-level response to brake sensor interrupts is essential. Software polling and the daisy chain are too slow; busy waiting wastes CPU cycles.
Q22

A student argues: "Register addressing is always better than direct addressing because it's faster." Is this correct?

  A. Yes: registers are always faster than memory
  B. No: register addressing cannot access large data structures in memory
  C. Yes: all modern CPUs use only register addressing
  D. No: direct addressing is faster for constants
Evaluate
✅ Answer: (B). While register access is faster, registers are limited in number (8–32 typically). Large data structures (arrays, databases) must reside in memory and need direct/indirect addressing. Speed isn't the only consideration: addressability and flexibility matter too.
Q23

A company is choosing between stack-based and register-based architecture for a new Java bytecode processor. Which is more suitable?

  A. Register-based: always better performance
  B. Stack-based: matches Java's stack-oriented bytecode natively
  C. Both are equally suitable
  D. Neither: a CISC approach is needed
Evaluate
✅ Answer: (B). Java's JVM uses stack-based bytecode. A stack-based hardware processor can execute JVM bytecodes directly without translation, reducing overhead. This is why picoJava (Sun's Java processor) used a stack architecture.
Q24

Is it justified for India to invest in RISC-V over licensing ARM cores? Evaluate:

  A. No: ARM is industry-proven and reliable
  B. Yes: RISC-V eliminates licensing costs and enables sovereign chip design
  C. No: RISC-V has no ecosystem
  D. Yes, but only for military applications
Evaluate
✅ Answer: (B). ARM licensing costs $1M–$10M+ per design plus per-chip royalties. For India's goal of semiconductor self-reliance, RISC-V provides a royalty-free ISA that Indian institutions (IIT Madras, C-DAC) can freely customize. The ecosystem is growing rapidly with SiFive, Alibaba T-Head, and others.
Q25

A system has both maskable and non-maskable interrupts. The IF (Interrupt Flag) is cleared. Which statement is true?

  A. Both maskable and non-maskable interrupts are blocked
  B. Only maskable interrupts are blocked; NMI still fires
  C. Only NMI is blocked; maskable interrupts still fire
  D. Neither is affected: IF only controls software interrupts
Evaluate
✅ Answer: (B). When IF=0, maskable interrupts are disabled. Non-Maskable Interrupts (NMI) cannot be disabled by software and always interrupt the CPU. This ensures critical events like power failure are always handled.

Create / Design (Q26–Q30)

Q26

You are designing a 32-bit RISC CPU with 64 registers and need a 3-address instruction format. How many bits remain for the opcode if the instruction is 32 bits?

  A. 8 bits
  B. 14 bits
  C. 10 bits
  D. 12 bits
Create
✅ Answer: (B). 64 registers need log₂(64) = 6 bits each. Three register fields: 6×3 = 18 bits. Remaining for the opcode: 32 − 18 = 14 bits, allowing up to 16,384 distinct opcodes.
Q27

If you add autoincrement and autodecrement modes to a CPU with 8 registers, how many additional control bits are needed per instruction to specify the mode?

  A. 1 bit (auto-inc or auto-dec per operand)
  B. 2 bits per register field (4 modes: none, inc, dec, indirect)
  C. 3 bits total
  D. No additional bits needed
Create
✅ Answer: (B). Each register field needs a 2-bit mode specifier to distinguish: (00) register, (01) register indirect, (10) autoincrement, (11) autodecrement. With two source registers, that's 4 additional bits.
Q28

Design consideration: A new ISA needs to support both 16-bit and 32-bit instructions (like ARM's Thumb mode). What is the primary advantage?

  A. Faster execution speed
  B. Reduced code size while maintaining 32-bit capability for complex operations
  C. More addressing modes
  D. More registers
Create
✅ Answer: (B). ARM Thumb mode uses 16-bit instructions for common operations (reducing code size by ~30%) while allowing a switch to 32-bit instructions for complex operations. This saves memory in embedded systems with limited storage.
Q29

You need to design an interrupt controller supporting 8 devices with programmable priority. What minimum hardware is needed?

  A. 8 flip-flops and an 8-to-3 priority encoder
  B. 3 flip-flops and a 3-to-8 decoder
  C. 8 interrupt request lines, 8 mask flip-flops, an 8-to-3 priority encoder, and a priority register
  D. A single status register
Create
✅ Answer: (C). Programmable priority needs: 8 IRQ lines (inputs from devices), 8 mask flip-flops (to enable/disable individual interrupts), an 8-to-3 priority encoder (to select the highest-priority request), and a priority register (to store/change priority levels). ARM's NVIC applies the same ideas (per-interrupt enable bits and programmable priority levels) on a much larger scale.
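A behavioural sketch of such a controller in Python, assuming bit i of each 8-bit register belongs to device i and modelling the priority register as a list of device numbers ordered highest first. It illustrates the idea only and is not a model of the NVIC; `highest_pending` is a hypothetical name.

```python
from typing import List, Optional


def highest_pending(irq: int, mask: int, priority: List[int]) -> Optional[int]:
    """Pick the device to service.

    irq      -- 8-bit request register (bit i set = device i is requesting)
    mask     -- 8-bit mask register (bit i set = device i is enabled)
    priority -- programmable order of device numbers, highest priority first
    Returns the 3-bit device number (0-7), or None if nothing is pending.
    """
    pending = irq & mask & 0xFF      # the mask flip-flops gate each request line
    for device in priority:          # the priority encoder's job, in software
        if pending & (1 << device):
            return device
    return None


fixed = list(range(8))                   # device 0 highest, like a fixed encoder
rotated = [5, 6, 7, 0, 1, 2, 3, 4]       # priority register reprogrammed
requests = 0b0010_0100                   # devices 2 and 5 are requesting
print(highest_pending(requests, 0xFF, fixed))         # 2
print(highest_pending(requests, 0xFF, rotated))       # 5
print(highest_pending(requests, 0b1111_1011, fixed))  # device 2 masked -> 5
```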
Q30

If you architect a CPU with separate instruction and data caches (Harvard architecture), which stage of the pipeline benefits most?

  A. Execute stage
  B. Writeback stage
  C. Fetch stage: instruction fetch and data access can happen simultaneously
  D. Decode stage
Create
✅ Answer: (C). Harvard architecture allows the CPU to fetch the next instruction while simultaneously reading/writing data for the current instruction. This eliminates the structural hazard of competing for a single memory port, directly benefiting pipeline throughput at the fetch stage.
Section G

Short Answer Questions (8)

SA1: What is General Register Organization? Describe its control word.

General Register Organization is a CPU architecture where multiple general-purpose registers (e.g., R0–R7) are connected through multiplexers to an ALU. Any register can serve as source or destination. The control word has four fields: SELA (selects source register A for MUX A), SELB (selects source register B for MUX B), SELD (selects destination register for ALU output), and OPR (selects the ALU operation). For 8 registers, each SEL field needs 3 bits, and OPR needs 5 bits, giving a 14-bit control word.

SA2: Explain PUSH and POP operations with a memory stack diagram.

PUSH adds data to the stack: SP is decremented first (SP ← SP−1), then data is written to the memory location pointed to by SP (M[SP] ← DR). POP removes data: the value at SP is read into the data register (DR ← M[SP]), then SP is incremented (SP ← SP+1). The stack grows downward in memory: PUSH moves SP to lower addresses, POP moves it to higher addresses. Stack overflow occurs if SP goes below the lower limit, and underflow occurs if POP is attempted when the stack is empty.

SA3: Differentiate between Direct and Indirect addressing modes.

In Direct addressing, the address field in the instruction directly contains the effective address (EA) of the operand. Only one memory access is needed: EA = Address field. In Indirect addressing, the address field contains a pointer: the memory address of a location that holds the actual EA. Two memory accesses are needed: first to read the pointer, then to read the operand at the pointed-to address. Indirect addressing is slower but more flexible, supporting pointers and dynamic memory allocation. Direct addressing is simpler and faster but can only reach addresses that fit in the instruction's address field.

SA4: What is the difference between 1-address and 0-address instruction formats?

In a 1-address format, instructions have one explicit operand address and use an implicit accumulator (AC) as the other operand and destination. Example: ADD X means AC ← AC + M[X]. In a 0-address format, instructions have no explicit address fields and use an implicit stack. Operations pop operands from the stack, compute, and push the result. Example: ADD pops two values, adds them, and pushes the sum. 0-address instructions are shortest but need more instructions per expression; 1-address instructions are slightly longer but require fewer instructions.

SA5: List 6 key differences between RISC and CISC.

(1) RISC has a small instruction set (50–150); CISC has a large set (200–300+). (2) RISC uses fixed-length instructions; CISC uses variable-length. (3) RISC executes most instructions in 1 clock cycle; CISC takes multiple cycles. (4) RISC uses a load/store architecture (only LOAD/STORE access memory); CISC lets ordinary arithmetic instructions operate directly on memory operands. (5) RISC has many registers (32–64); CISC has fewer (8–16). (6) RISC uses hardwired control; CISC uses microprogrammed control. Examples: ARM, MIPS (RISC) vs Intel and AMD x86 processors (CISC).

SA6: What are the different types of interrupts? Give examples.

Hardware External: Generated by I/O devices (keyboard press, timer tick). Hardware Internal (Traps): Generated by CPU errors (division by zero, invalid opcode). Software Interrupts: Triggered by instructions (INT 21h in DOS, SVC in ARM) for system calls. Non-Maskable Interrupt (NMI): Cannot be disabled, used for critical events (power failure, memory parity error). Maskable Interrupt: Can be enabled/disabled via the IF flag in PSW, used for peripheral devices. The CPU handles interrupts by saving context (PC, PSW), executing the ISR, then restoring context via RTI.

SA7: Explain the flags CF, ZF, SF, and OF with one example each.

CF (Carry Flag): Set when unsigned arithmetic produces a carry/borrow. Example: 0xFF + 0x01 = 0x00 with CF=1. ZF (Zero Flag): Set when the result is zero. Example: 5 − 5 = 0, ZF=1. SF (Sign Flag): Set when the result's MSB is 1 (negative in signed representation). Example: 3 − 5 = −2, SF=1. OF (Overflow Flag): Set when signed arithmetic exceeds the representable range. Example: 0x7F + 0x01 = 0x80 (8-bit: +127 + 1 gives −128), OF=1. CF and OF are independent: CF checks unsigned overflow, OF checks signed overflow.

SA8: What is autoincrement addressing mode? Where is it used?

In autoincrement addressing, the effective address is the content of a specified register, and after the operand is accessed, the register is automatically incremented by the operand size d (e.g., d = 1 for bytes, d = 4 for 32-bit words). Formula: EA = [R]; then R ← R + d. This eliminates the need for a separate increment instruction when traversing arrays or sequential data structures. It is widely used in loop-based array processing, string operations, and stack implementations (POP uses autoincrement). ARM supports post-increment addressing: LDR R0, [R1], #4 loads from [R1] then adds 4 to R1.
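A minimal sketch of post-increment traversal, assuming 32-bit words (d = 4) and a dictionary standing in for byte-addressed memory; `sum_with_postincrement` is a hypothetical name.

```python
from typing import Dict


def sum_with_postincrement(mem: Dict[int, int], start: int, count: int, size: int = 4) -> int:
    """Sum `count` words using post-increment: EA = [R]; then R <- R + size."""
    r1 = start                 # pointer register
    total = 0
    for _ in range(count):
        ea = r1                # EA is the register's current contents
        total += mem[ea]       # models a load (like LDR R0, [R1], #4) plus an ADD
        r1 += size             # the hardware bumps the pointer; no separate ADD needed
    return total


words = [10, 20, 30, 40]
mem = {0x100 + 4 * i: w for i, w in enumerate(words)}   # words at 0x100, 0x104, ...
print(sum_with_postincrement(mem, 0x100, len(words)))   # 100
```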

Section H

Long Answer Questions (3)

📋 LA1: ARM vs Intel: A Comprehensive Architecture Case Study

Question: Compare ARM (RISC) and Intel x86 (CISC) architectures across at least 10 parameters. Analyse why ARM dominates smartphones while Intel dominates desktops. Include the recent shift with Apple M-series and Qualcomm Snapdragon X Elite.

Answer:

1. Historical Context: ARM was designed in 1985 by Acorn Computers (UK) for low-power embedded use. Intel x86 was designed in 1978 for general-purpose computing. Their design philosophies diverged fundamentally: ARM prioritized simplicity and power efficiency; x86 prioritized backward compatibility and computational power.

2. Architecture Comparison:

Parameter | ARM (RISC) | Intel x86 (CISC)
Instruction Set | ~150 simple instructions | ~1500+ complex instructions
Instruction Length | Fixed 32-bit (or 16-bit Thumb) | Variable, 1–15 bytes
Registers | 31 GP registers (AArch64) | 16 GP registers (x86-64)
Memory Access | Load/Store only | Any instruction can access memory
Pipeline Efficiency | Excellent (fixed-length) | Complex (needs µop translation)
Power Consumption | 0.5–5W per core | 15–125W per package
Performance/Watt | Industry-leading | Improving but behind ARM
Control Unit | Hardwired | Microprogrammed
Conditional Execution | Most instructions conditional in AArch32; AArch64 keeps conditional branches and conditional-select instructions | Conditional branches, plus CMOVcc/SETcc
Software Ecosystem | Android, iOS, embedded | Windows, Linux desktop, server

3. Why ARM Dominates Smartphones: Smartphones are battery-powered devices where power efficiency is paramount. ARM's simpler instruction set allows simpler decode hardware, which means fewer transistors switching, lower heat generation, and longer battery life. A Snapdragon 8 Gen 3 reaches 3.3 GHz within a phone's power budget of roughly 5W, whereas desktop Intel Core i7 chips carry base power ratings as high as 125W. ARM's licensing model also allows chip companies (Qualcomm, Samsung, MediaTek) to customize cores for specific needs.

4. Why Intel Dominated Desktops: The x86 ecosystem has 40+ years of software compatibility. Windows, Office, games, and enterprise software were compiled for x86. Switching would break trillions of dollars of existing software. Intel's high power consumption was acceptable because desktops/laptops have wall power and active cooling.

5. The 2020s Shift (Apple M-Series): Apple's M1 (2020) proved ARM can match or beat Intel in laptops. The M4 (2024) delivers desktop-class performance at laptop power levels. Apple achieved this by: designing custom ARM cores (not off-the-shelf), integrating CPU/GPU/Neural Engine on one chip (SoC), using TSMC's advanced 3nm process, and optimizing macOS for ARM.

6. Qualcomm Snapdragon X Elite: Qualcomm pushed ARM into mainstream Windows PCs in 2024. It runs native ARM apps directly and x86 apps through emulation, and delivers competitive performance at a fraction of Intel's power. This threatens Intel's stronghold: the PC market.

7. India Connect: Every smartphone in India runs ARM. India's SHAKTI (IIT Madras) and VEGA (C-DAC) processors use RISC-V, the open-source cousin of ARM. India's semiconductor mission aims to manufacture ARM-based chips domestically by 2028.

📋 LA2: Design a CPU Datapath for 3-Address Instructions

Question: Design a complete CPU datapath that can execute 3-address register-to-register instructions of the form OP RD, RS1, RS2. Include the register file, ALU, control signals, and show the data flow for ADD R3, R1, R2.

Answer:

Datapath Components:

1. Instruction Register (IR): Holds the current instruction. Fields: Opcode (6 bits), RD (3 bits), RS1 (3 bits), RS2 (3 bits).

2. Register File: 8 registers (R0–R7), two read ports (Port A, Port B) and one write port (Port W). Port A outputs R[RS1], Port B outputs R[RS2], Port W writes to R[RD].

3. ALU: Takes two inputs (from read ports), performs operation based on Opcode, produces Result and Flags.

4. Control Unit: Decodes Opcode and generates: ALU_OP (selects ALU function), RegWrite (enables write to register file), FlagWrite (enables PSW update).
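A Python sketch of these components executing one register-to-register instruction. The 6|3|3|3 field layout comes from item 1; the opcode numbers, the 16-bit register width and the function names are assumptions, and flag updates (FlagWrite) are left out to keep it short.

```python
from typing import Dict, List

# Hypothetical opcode numbers; the answer above fixes only the field widths.
OPCODES: Dict[int, str] = {0b000001: "ADD", 0b000010: "SUB", 0b000011: "AND"}
ALU = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b, "AND": lambda a, b: a & b}
WORD_MASK = 0xFFFF  # assume 16-bit registers for this sketch


def decode(ir: int) -> Dict[str, int]:
    """Split a 15-bit IR into Opcode(6) | RD(3) | RS1(3) | RS2(3)."""
    return {
        "opcode": (ir >> 9) & 0b111111,
        "rd": (ir >> 6) & 0b111,
        "rs1": (ir >> 3) & 0b111,
        "rs2": ir & 0b111,
    }


def execute(ir: int, regs: List[int]) -> None:
    f = decode(ir)
    alu_op = OPCODES[f["opcode"]]                  # control unit: Opcode -> ALU_OP
    bus_a, bus_b = regs[f["rs1"]], regs[f["rs2"]]  # read ports A and B
    result = ALU[alu_op](bus_a, bus_b) & WORD_MASK
    regs[f["rd"]] = result                         # RegWrite = 1: write port <- Result Bus


regs = [0, 7, 5, 0, 0, 0, 0, 0]                    # R1 = 7, R2 = 5
ir = (0b000001 << 9) | (3 << 6) | (1 << 3) | 2     # ADD R3, R1, R2
execute(ir, regs)
print(decode(ir), regs[3])
```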

DATAPATH
  ┌──────────────────────────────────────────┐
  │          INSTRUCTION REGISTER            │
  │  [Opcode|  RD  |  RS1  |  RS2  ]         │
  │   6 bits  3 bits  3 bits  3 bits         │
  └────┬───────┬───────┬────────┬────────────┘
       │       │       │        │
       ▼       │       ▼        ▼
  ┌─────────┐  │  ┌──────────────────┐
  │ Control │  │  │   REGISTER FILE  │
  │  Unit   │  │  │  Read Port A←RS1 │──→ Bus A
  │         │  │  │  Read Port B←RS2 │──→ Bus B
  │ ALU_OP  │  │  │  Write Port←RD   │←── Result Bus
  │ RegWrite│  │  │  Write Enable    │←── RegWrite
  └────┬────┘  │  └──────────────────┘
       │       │           │         │
       │       │      Bus A│    Bus B│
       ▼       │           ▼         ▼
       │       │     ┌─────────────────┐
       ├───────┼────→│      A L U      │
       │       │     │  (ALU_OP input) │
       │       │     └───────┬─────────┘
       │       │             │ Result
       │       │             ▼
       │       │     ┌───────────────┐
       │       │     │  FLAGS (PSW)  │
       │       │     │ CF ZF SF OF   │
       │       └─────┤  Result Bus   │──→ back to Register File
       │             └───────────────┘

Data Flow for ADD R3, R1, R2:

  1. IR contains: Opcode=ADD, RD=011(R3), RS1=001(R1), RS2=010(R2)
  2. Control Unit decodes ADD → sets ALU_OP=ADD, RegWrite=1, FlagWrite=1
  3. Register File Read Port A outputs R1 value onto Bus A
  4. Register File Read Port B outputs R2 value onto Bus B
  5. ALU receives Bus A and Bus B, performs addition, outputs Result
  6. Flags (CF, ZF, SF, OF) are updated based on the addition result
  7. Result Bus carries the sum back to Register File Write Port
  8. R3 is updated with the ALU result (since RD=011 selects R3 and RegWrite=1)
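The eight steps above can be traced in a short Python simulation of one ADD pass through the datapath. The question does not give a register width, so this sketch assumes 8-bit registers; the operand values in R1 and R2 and the function names are hypothetical. Flags follow the usual definitions: CF is the carry out of the MSB, and OF is set when both operands share a sign that differs from the result's sign.

```python
WIDTH = 8                       # assumed register width (not given in the question)
MASK = (1 << WIDTH) - 1         # 0xFF
SIGN = 1 << (WIDTH - 1)         # 0x80, the sign bit


def alu_add(a: int, b: int) -> tuple[int, dict]:
    """Add two WIDTH-bit values; return the truncated sum and the CF/ZF/SF/OF flags."""
    raw = a + b
    result = raw & MASK
    flags = {
        "CF": int(raw > MASK),                     # carry out of the MSB (unsigned overflow)
        "ZF": int(result == 0),                    # result is all zeros
        "SF": int(bool(result & SIGN)),            # copy of the result's sign bit
        # signed overflow: both operands share a sign that the result does not
        "OF": int((a & SIGN) == (b & SIGN) and (result & SIGN) != (a & SIGN)),
    }
    return result, flags


def execute_add(regs: list[int], rd: int, rs1: int, rs2: int) -> dict:
    """One ADD RD, RS1, RS2 pass with ALU_OP = ADD, RegWrite = 1, FlagWrite = 1."""
    bus_a = regs[rs1]                   # Read Port A drives Bus A
    bus_b = regs[rs2]                   # Read Port B drives Bus B
    result, flags = alu_add(bus_a, bus_b)
    regs[rd] = result                   # Result Bus -> Write Port (RegWrite = 1)
    return flags                        # FlagWrite = 1: caller stores these in the PSW


regs = [0] * 8                          # R0..R7
regs[1], regs[2] = 100, 50              # hypothetical values in R1 and R2
psw = execute_add(regs, rd=3, rs1=1, rs2=2)
print(regs[3], psw)   # → 150 {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1}
```

With R1 = 100 and R2 = 50, the 8-bit sum 150 sets OF (and SF) but not CF: the signed result overflowed the range −128 to 127 while the unsigned result still fit in 0 to 255, which is why CF and OF are tracked as separate flags.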

📋 LA3: Comprehensive Addressing Modes with EA Calculations

Question: Given the following machine state, calculate the effective address and operand value for all 8 addressing modes:

Registers: R1=500, R2=100, PC=3000. Memory: M[200]=500, M[400]=700, M[500]=800, M[600]=42, M[700]=99, M[800]=55, M[3050]=25.

Instruction address field = 200. Register field points to R1 (value 500).

Answer:

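| Mode | EA Formula | EA Calculation | EA | Operand |
|---|---|---|---|---|
| Immediate | Operand = Addr field | Operand = 200 | N/A | 200 |
| Direct | EA = Addr | EA = 200 | 200 | M[200] = 500 |
| Indirect | EA = M[Addr] | EA = M[200] = 500 | 500 | M[500] = 800 |
| Register | Operand = R1 | Operand = R1 = 500 | N/A | 500 |
| Reg. Indirect | EA = [R1] | EA = R1 = 500 | 500 | M[500] = 800 |
| Autoincrement | EA = [R1]; R1++ | EA = 500; R1 → 501 | 500 | M[500] = 800 |
| Displacement | EA = Addr + [R1] | EA = 200 + 500 = 700 | 700 | M[700] = 99 |
| Relative | EA = PC + Addr | EA = 3000 + 200 = 3200 | 3200 | M[3200] (not given) |

Note: with the stated address field of 200, relative mode gives EA = 3200, which the given memory does not cover; M[3050] = 25 would be the operand only if the offset field were 50.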

Key Observations:

  • Immediate and Register modes need no memory access for the operand, so they are the fastest
  • Indirect mode needs two memory accesses (one to fetch the EA, one to fetch the operand), so it is the slowest
  • Register Indirect and Indirect can produce the same EA if the register and memory location hold the same value
  • Autoincrement has a side effect: it modifies R1 for the next instruction (useful for array traversal)
  • Relative addressing makes the code position-independent: the target moves with the code
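The table can be cross-checked with a small Python sketch that resolves each mode against the given machine state. It follows the question's statement that the register field names R1, so displacement mode adds the address field (200) to R1, and relative mode adds the same address field to PC; autoincrement is assumed to step R1 by one word. The `resolve` helper is hypothetical, and `None` marks a value the question does not supply.

```python
# Machine state from the LA3 question.
REGS = {"R1": 500, "R2": 100}
PC = 3000
MEM = {200: 500, 400: 700, 500: 800, 600: 42, 700: 99, 800: 55, 3050: 25}
ADDR = 200      # instruction address field
REG = "R1"      # register named by the register field


def resolve(mode: str, reg_file: dict) -> tuple:
    """Return (EA, operand) for one addressing mode.

    EA is None for modes that form no memory address; operand is None when
    the question gives no value for M[EA]. reg_file may be modified
    (autoincrement), so pass a copy.
    """
    if mode == "immediate":
        return None, ADDR                  # operand is the address field itself
    if mode == "register":
        return None, reg_file[REG]         # operand is the register's contents
    if mode == "direct":
        ea = ADDR
    elif mode == "indirect":
        ea = MEM[ADDR]                     # extra memory access to fetch the EA
    elif mode == "register_indirect":
        ea = reg_file[REG]
    elif mode == "autoincrement":
        ea = reg_file[REG]
        reg_file[REG] += 1                 # side effect, assumed one-word step
    elif mode == "displacement":
        ea = ADDR + reg_file[REG]
    elif mode == "relative":
        ea = PC + ADDR
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, MEM.get(ea)


MODES = ["immediate", "direct", "indirect", "register",
         "register_indirect", "autoincrement", "displacement", "relative"]

for mode in MODES:
    ea, operand = resolve(mode, dict(REGS))    # fresh register copy per mode
    print(f"{mode:<18} EA={ea!s:<5} operand={operand}")
```

Indirect returns (500, 800) after two lookups, displacement returns (700, 99), and relative returns (3200, None) because M[3200] is not part of the given state.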
Section I

Industry Spotlight — A Day in the Life

👨‍💻 Deepak Verma, 31, CPU Verification Engineer at Samsung Semiconductor, Noida

Background: B.Tech in Electronics from MNNIT Allahabad (2015). No GATE coaching, no IIT background. Joined as a fresher at a small Noida VLSI startup. Self-taught SystemVerilog and UVM during evenings. Moved to Samsung Semiconductor R&D after 3 years.

A Typical Day:

9:00 AM: Team standup. Review overnight regression results. 3 out of 1,200 test cases failed on the ARM Cortex-A core being verified.

10:00 AM: Debug failing test case #847, a corner case in the Load-Store Unit where back-to-back PUSH operations with interrupts cause a pipeline stall that isn't handled correctly.

12:00 PM: Write a new SystemVerilog assertion to catch this bug in future regressions. Run targeted simulation on the modified RTL.

1:30 PM: Lunch at Samsung's Noida campus cafeteria. Discuss ARM Cortex-X4 micro-architecture with the design team.

2:30 PM: Write a coverage analysis report (94.7% code coverage, 89.3% functional coverage). Identify 12 uncovered scenarios related to interrupt nesting.

4:30 PM: Review a colleague's UVM testbench for the branch prediction unit. Suggest improvements to constrained-random stimulus generation.

6:00 PM: Learning hour. Study the ARM Architecture Reference Manual (ARM ARM) for the ARMv9 security extensions (Realm Management Extension). Samsung is implementing this for the next Galaxy flagship's chip.

| Detail | Info |
|---|---|
| Tools Used Daily | SystemVerilog, UVM, Synopsys VCS, Verdi (waveform debugger), ARM Fast Models, Git, Jira |
| Entry Salary (2024) | ₹6–9 LPA + benefits |
| Mid-Level (3–5 yrs) | ₹12–20 LPA |
| Senior (7+ yrs) | ₹25–45 LPA |
| Companies Hiring (India) | Samsung Semiconductor, Qualcomm Hyderabad, Intel Bangalore, AMD Hyderabad, Texas Instruments, MediaTek Noida, ARM India, Synopsys, Cadence, NXP |
| Required Skills | Verilog/SystemVerilog, UVM, CPU architecture knowledge, ARM/RISC-V ISA, digital design fundamentals |
Deepak's advice to students: "You don't need to be from IIT to work in chip design. I learned UVM from YouTube (Verification Academy channel) and practiced on EDA Playground (free online simulator). Understanding COA concepts — register organization, instruction formats, pipelining, interrupts — is the foundation. Every interview I've had asked COA questions."
Section J

Earn With It — Embedded Projects & CPU Design

💰 Your Earning Path After This Chapter

Portfolio Piece: A documented ISA design (Tier 3 lab) + instruction format comparison + flag tracing analysis. This demonstrates CPU architecture knowledge to employers.

Beginner Gig Ideas:

• Arduino/ESP32 embedded programming projects: ₹3,000–₹10,000/project

• Assembly language tutoring for B.Tech students: ₹500–₹1,000/session

• Technical content writing (COA topics for edtech platforms): ₹1,500–₹5,000/article

• FPGA project implementation for final-year students: ₹5,000–₹15,000/project

| Opportunity | Platform | Earning Potential |
|---|---|---|
| Embedded Systems Freelancing | Upwork, Freelancer, Fiverr | ₹5,000–₹25,000/project |
| FPGA/Verilog Projects | Direct college outreach | ₹5,000–₹15,000/project |
| ARM Assembly Tutoring | Superprof, Chegg, local | ₹500–₹1,500/hour |
| Technical Blog Writing | Medium, GeeksforGeeks, EduArtha | ₹1,500–₹5,000/article |
| Raspberry Pi Projects | IoT project contracts | ₹3,000–₹12,000/project |
| VLSI Internships | Samsung, Qualcomm, Intel via Internshala | ₹15,000–₹40,000/month |
Fastest path to earning: Learn ARM assembly on Raspberry Pi (₹3,500 for Pi 4). Build 3 embedded projects (LED matrix, sensor data logger, motor controller). Document them on GitHub. Apply to embedded systems internships on Internshala/LinkedIn. Students with GitHub portfolios showing real hardware projects get 3× more interview calls than those with only theoretical knowledge.
Section K

Chapter Summary

📝 Key Concepts Covered in Unit 4

1. General Register Organization: CPU with R0–R7 register file, MUX A/B for source selection, ALU for computation, output bus for result writeback. 14-bit control word: SELA(3) + SELB(3) + SELD(3) + OPR(5).

2. Stack Organization: LIFO structure managed by Stack Pointer (SP). PUSH: SP−−, M[SP]←DR. POP: DR←M[SP], SP++. Used in expression evaluation (postfix/RPN), subroutine calls, and interrupt handling.

3. Instruction Formats: 3-address (OP D,S1,S2), 2-address (OP D,S), 1-address (accumulator), 0-address (stack). Trade-off: fewer addresses → shorter instructions but more of them.

4. Addressing Modes (8): Immediate, Direct, Indirect, Register, Register Indirect, Autoincrement, Displacement, Relative. Each offers different EA calculation, speed, and flexibility trade-offs.

5. RISC vs CISC: RISC = small instruction set, fixed-length, load/store, many registers, efficient pipelining (ARM, MIPS). CISC = large instruction set, variable-length, ALU instructions that can operate directly on memory operands, complex decode (x86). Modern x86 CPUs from Intel and AMD internally decode CISC instructions into RISC-like micro-operations (µops).

6. Data Transfer & Manipulation: LOAD, STORE, MOV, PUSH, POP (transfer). ADD, SUB, MUL, DIV (arithmetic). AND, OR, XOR, NOT (logical). SHL, SHR, ROL, ROR (shift/rotate).

7. Program Control & Interrupts: JMP, CALL, RET for control flow. Interrupts: hardware (external/internal), software, maskable, non-maskable. Priority schemes: daisy chain, parallel, software polling.

8. PSW/Flags: CF (carry), ZF (zero), SF (sign), OF (overflow), IF (interrupt enable). CF detects unsigned overflow; OF detects signed overflow. They are independent.
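The PUSH/POP rules in item 2 can be exercised with a short Python sketch that evaluates a postfix (RPN) expression the way a 0-address machine would. It assumes a 16-word memory stack that grows toward lower addresses, with SP starting one past the highest word; `MemoryStack` and `eval_postfix` are hypothetical names, and only integer +, -, and * are supported.

```python
class MemoryStack:
    """Memory stack growing toward lower addresses.

    PUSH: SP <- SP - 1, M[SP] <- DR
    POP:  DR <- M[SP],  SP <- SP + 1
    """

    def __init__(self, size: int = 16) -> None:
        self.mem = [0] * size
        self.sp = size                  # empty: SP sits one past the highest word

    def push(self, dr: int) -> None:
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                    # SP-- first ...
        self.mem[self.sp] = dr          # ... then M[SP] <- DR

    def pop(self) -> int:
        if self.sp == len(self.mem):
            raise IndexError("stack underflow")
        dr = self.mem[self.sp]          # DR <- M[SP] first ...
        self.sp += 1                    # ... then SP++
        return dr


def eval_postfix(tokens: list[str]) -> int:
    """Evaluate integer postfix: operands are pushed; an operator pops two and pushes one."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = MemoryStack()
    for tok in tokens:
        if tok in ops:
            b = stack.pop()             # right operand is on top
            a = stack.pop()
            stack.push(ops[tok](a, b))
        else:
            stack.push(int(tok))
    return stack.pop()


# (3 + 4) * (5 - 2) written in postfix
print(eval_postfix("3 4 + 5 2 - *".split()))   # → 21
```

Popping the right operand first matters for non-commutative operators: for `5 2 -` the top of stack is 2, so the machine computes 5 − 2, not 2 − 5.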

Key Formulas for Quick Revision

| Formula | Description |
|---|---|
| Control word bits = 3(SELA) + 3(SELB) + 3(SELD) + 5(OPR) = 14 | For 8-register organization |
| Register select bits = ⌈log₂ N⌉ | N = number of registers |
| EA (Direct) = Address field | One memory access (for the operand) |
| EA (Indirect) = M[Address field] | Two memory accesses (address, then operand) |
| EA (Displacement) = Addr + [R] | Base + offset |
| EA (Relative) = PC + offset | Position-independent code |
| OF = 1 when sign(A) = sign(B) ≠ sign(Result) | Signed overflow detection for addition A + B |
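The first two formulas can be checked with a small Python sketch that packs a control word and counts select bits. It uses the SELA/SELB/SELD/OPR field order from the table and assumes register Rn is selected by the binary value of n; the OPR code 00010 for ADD is an assumption borrowed from the common textbook encoding, so check it against your course's ALU function table. The helper names are hypothetical.

```python
def select_bits(n_registers: int) -> int:
    """ceil(log2 N), computed exactly with integers (valid for N >= 1)."""
    return (n_registers - 1).bit_length()


def control_word(sela: int, selb: int, seld: int, opr: int) -> int:
    """Pack SELA(3) | SELB(3) | SELD(3) | OPR(5) into a 14-bit word, MSB first."""
    for value, width in ((sela, 3), (selb, 3), (seld, 3), (opr, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (sela << 11) | (selb << 8) | (seld << 5) | opr


OPR_ADD = 0b00010   # assumed OPR code for A + B; check your course's ALU table

# Micro-operation R1 <- R2 + R3
cw = control_word(sela=2, selb=3, seld=1, opr=OPR_ADD)
print(f"{cw:014b}")        # → 01001100100010
print(select_bits(8))      # → 3
```

Reading the output in groups of 3, 3, 3 and 5 bits gives 010 (R2), 011 (R3), 001 (R1) and 00010 (ADD), which is exactly the 3 + 3 + 3 + 5 = 14 split in the first row.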
Section L

Earning Checkpoint

| Skill | Tool / Method | Portfolio Artifact | Can You Earn? |
|---|---|---|---|
| General Register Organization | Pen & paper / diagrams | Annotated register org diagram | ✅ Yes: interview preparation asset |
| Stack Operations & Expression Eval | Manual tracing / Python | Stack trace tables for complex expressions | ✅ Yes: tutoring & content writing |
| Instruction Formats (3/2/1/0-addr) | Manual conversion | Instruction format comparison document | ✅ Yes: educational content creation |
| Addressing Modes (All 8) | EA calculation practice | Addressing modes cheat sheet | ✅ Yes: GATE coaching material |
| RISC vs CISC Comparison | Research & analysis | ARM vs Intel comparison report | ✅ Yes: technical blog writing |
| Assembly/Embedded Programming | ARM on Raspberry Pi | 3 embedded projects on GitHub | ✅ Yes: ₹3,000–₹12,000/project |
| Interrupt Handling | Conceptual + coding | Interrupt priority simulation | ⬜ Not yet: need practical exposure |
| Flag/PSW Tracing | Manual bit-level tracing | Flag trace examples document | ✅ Yes: tutoring sessions |
Minimum Viable Earning Setup after this chapter: An understanding of CPU architecture + ARM assembly basics on Raspberry Pi + 2–3 embedded projects on GitHub = ready for embedded systems internships at ₹15,000–₹40,000/month while still in college. Companies actively hiring: Samsung, Qualcomm, Texas Instruments, NXP.

✅ Unit 4 complete. Ready for Unit 5: Pipeline & Vector Processing!

[QR: Link to EduArtha video tutorial: Central Processing Unit]