
DDR Protocol Interview Questions for VLSI Interviews

By Raju Gorla | Published 29 February 2024 | Updated 20 March 2026

I’ve compiled these 40 DDR memory interview questions from real silicon validation and design interviews over the past decade. If you’re interviewing for a memory IP role, a controller design position, or a validation job, you’ll see variations of almost every question here. The key isn’t memorizing answers—it’s understanding the physics and the practical constraints that drive every design decision in DRAM.

💡 This is for: Silicon engineers, hardware designers, and IC validation engineers interviewing for memory IP, controller, or PHY roles at companies like Samsung, SK Hynix, Micron, Qualcomm, Apple, or in-house ASIC teams.

Table of Contents

  • Quick Navigation
  • Section 1: DDR Fundamentals (Q1–Q10)
    • Q1. How does a DRAM cell work?
    • Q2. Why is DDR called “double data rate”?
    • Q3. What is the evolution from DDR1 to DDR5? Key improvements?
    • Q4. DDR4 vs DDR5—what are the key differences?
    • Q5. What is a memory rank? A channel? A bank? A bank group?
    • Q6. What is LPDDR? How does LPDDR5 compare to DDR5 for mobile?
    • Q7. What is burst length and prefetch architecture?
    • Q8. DIMM types—UDIMM, RDIMM, LRDIMM—when is each used?
    • Q9. What is the DDR naming convention?
    • Q10. How do you calculate DDR bandwidth?
  • Section 2: DDR Timing Parameters (Q11–Q20)
    • Q11. What are the “big 4” timing parameters?
    • Q12. What is CAS latency? Why does it matter for performance?
    • Q13. What is tRFC? Why does it increase with density?
    • Q14. What do tWR, tRTW, and tWTR govern?
    • Q15. How do you interpret DDR timing notation?
    • Q16. How does DRAM refresh work? What is REFpb?
    • Q17. What is tRAS_min and why does violating it corrupt data?
    • Q18. What is rank-to-rank switching overhead?
    • Q19. What is tAA? How do you calculate it?
    • Q20. Open vs closed page policy—what are the trade-offs?
  • Section 3: DDR Controller & PHY (Q21–Q30)
    • Q21. What is the DFI interface?
    • Q22. What is write leveling? What does it compensate for?
    • Q23. What is read leveling?
    • Q24. What is ZQ calibration? When does it run?
    • Q25. What is ODT (On-Die Termination)? Why was it introduced?
    • Q26. What is the DQ/DQS relationship? How does the controller use DQS?
    • Q27. What does a DDR memory controller scheduler do?
    • Q28. What is CA training in LPDDR4/5?
    • Q29. What are DDR power states?
    • Q30. What are tphy_wrlat and tphy_rdlat (DFI timing parameters)?
  • Section 4: Advanced DDR Topics (Q31–Q40)
    • Q31. What are the standout features of DDR5?
    • Q32. What is LPDDR5 DVFS?
    • Q33. How do HBM, GDDR, and DDR compare architecturally?
    • Q34. What is memory interleaving? Why does it improve bandwidth?
    • Q35. What is ECC DRAM? How does SECDED work?
    • Q36. What are the top 5 DDR debug issues?
    • Q37. How is a DDR memory controller verified?
    • Q38. DDR in FPGAs vs ASICs—what’s different?
    • Q39. What is per-bit deskew? Why is it needed?
    • Q40. What are DDR power optimization techniques?
  • Most-Asked Topics by Company
  • Resources & Further Reading


Section 1: DDR Fundamentals (Q1–Q10)

Q1. How does a DRAM cell work?

A DRAM cell is a single capacitor (which stores charge) connected to a transistor (which controls access). To read the cell, you assert the wordline; the transistor conducts, and the capacitor shares its charge with the precharged bitline. A sense amplifier detects the tiny voltage shift, resolves it to a full logic level, and writes it back into the cell (DRAM reads are destructive). To store a bit, you charge the capacitor (1) or discharge it (0).

In my experience, the critical insight interviewers want is why DRAM needs refresh: the capacitor leaks charge over time (worst-case retention is in the tens of milliseconds, depending on temperature and process), so without periodic refresh cycles, stored data vanishes. That’s why DDR includes automatic refresh mechanisms and why LPDDR added per-bank refresh (REFpb) to cut refresh power. Here’s the basic cell structure:

        Bitline (BL)
             |
    WL ------| T (access transistor)
             |
             C (storage capacitor)
             |
            GND

📌 Note: Modern DRAM cells (6F² footprint) are at the physical limit of what’s manufacturable. Refresh is now the dominant power consumer in LPDDR, which is why Samsung, SK Hynix, and Micron are all investing in alternative refresh schemes.
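To see why the refresh interval works out, here’s a back-of-the-envelope Python sketch modeling the cell as a leaking capacitor. The RC time constant is purely illustrative (real retention varies wildly across cells, temperature, and process), not a datasheet value:

```python
import math

def retention_time_s(v_full=1.1, v_sense_min=0.55, rc_s=0.2):
    """Time until a leaking cell decays below the sense threshold, modeling
    the cell as a simple RC discharge: v(t) = v_full * exp(-t / RC).
    rc_s is an illustrative leakage time constant, not a datasheet value."""
    return rc_s * math.log(v_full / v_sense_min)

t = retention_time_s()
# JEDEC mandates refresh every 64 ms, which only works because even the
# worst-case cell retains a sensable level for longer than that
print(f"modeled retention ~{t * 1e3:.0f} ms vs the 64 ms refresh interval")
```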

Q2. Why is DDR called “double data rate”?

DDR transfers data on both the rising and falling edge of the clock, while SDR (Single Data Rate) transfers only on the rising edge. This effectively doubles the data throughput without doubling the clock frequency.

Practically speaking: if the bus clock is 200 MHz, DDR transfers data on both edges, so the effective data rate is 400 MT/s. Naming follows the transfer rate: DDR3-1600 means 1600 MT/s (megatransfers per second) on an 800 MHz bus clock; thanks to 8n prefetch, the internal memory array runs at only 200 MHz. Here’s the timing:

Clock:     _|‾|_|‾|_|‾|_|‾|_
DQ_SDR:    <D0----><D1---->      one transfer per cycle (rising edge only)
DQ_DDR:    <D0><D1><D2><D3>      one transfer per edge (rising and falling)

💡 Tip: Interviewers often follow this with “how is this different in DDR5?” The answer: DDR5 moves to BL16 with burst chop (you can cut a burst down to 8 words), but the underlying mechanism is unchanged: one transfer per clock edge.

Q3. What is the evolution from DDR1 to DDR5? Key improvements?

Each DDR generation doubles speed, lowers voltage, and adds architectural features. Here’s the progression:

Generation Clock (MHz) Data Rate (MT/s) Voltage Key Features
DDR1 100–200 200–400 2.5V 2n prefetch, DQS strobe
DDR2 200–533 400–1066 1.8V 4n prefetch, ODT, posted CAS
DDR3 400–1067 800–2133 1.5V 8n prefetch, fly-by topology, write leveling
DDR4 800–1600 1600–3200 1.2V Bank groups, POD signaling, gear-down
DDR5 1600–3200 3200–6400 1.1V On-die ECC, sub-channels, DFE, burst chop

📌 Note: What interviewers actually care about: each voltage drop cuts dynamic power quadratically (switching power scales with CV²f). The prefetch increase (2n→4n→8n→16n) is what lets data rates keep doubling while the core array speed stays roughly flat.

Q4. DDR4 vs DDR5—what are the key differences?

DDR5 is not just a faster DDR4. It’s a fundamentally different approach to power and error correction:

Voltage: DDR4 is 1.2V, DDR5 is 1.1V—an 8% drop. Burst length: DDR4 is BL8 fixed; DDR5 is BL16 with burst chop (it can run BL8 too). Banks: DDR4 has 16 banks in 4 bank groups; DDR5 moves to 32 banks in 8 bank groups—better independent access. On-die ECC: This is huge. DDR5 corrects single-bit errors on each 128-bit word (with 8 check bits) right on the die. It complements system-level ECC rather than replacing it, since it cannot see errors on the link. Sub-channels: DDR5 splits a 64-bit channel into two 32-bit sub-channels, each with independent refresh and power management. From a design perspective, this changes the entire controller architecture—you’re managing two smaller channels instead of one wide one.

💡 Tip: If the interviewer is at Samsung or SK Hynix, they’ll ask follow-ups about why sub-channels matter. The answer: independent refresh scheduling, independent power down, and parallel access to both channels. This is what makes DDR5 scale better than DDR4 at the same clock frequency.

Q5. What is a memory rank? A channel? A bank? A bank group?

These are hierarchical levels of organization in a DRAM system. Let me break it down from largest to smallest:

Channel: The highest level—a complete, independent interface (64-bit in DDR4, two 32-bit sub-channels in DDR5) to a group of DIMMs. A typical system has 2–4 channels, each accessible in parallel, so 2 channels = 2× potential bandwidth. Rank: A set of DRAM chips (typically one side of a DIMM) that share a chip select and respond together to fill the full data width. Multiple ranks share the command/address bus, and the controller can interleave commands between them to improve throughput. Bank: Within each rank, there are 16 (DDR4) or 32 (DDR5) independent banks. You can have a row open in one bank while reading from another, hiding precharge latency. Bank group: DDR4 and later cluster banks into groups; back-to-back accesses to different groups can be issued faster (tCCD_S) than within one group (tCCD_L), so spreading traffic across groups keeps the pipeline full.

📌 Note: Interviewers expect you to know that a DIMM is not a rank. A DIMM can contain multiple ranks, and modern server DIMMs (RDIMMs, LRDIMMs) have a register or buffer that further complicates rank timing.

Q6. What is LPDDR? How does LPDDR5 compare to DDR5 for mobile?

LPDDR (Low-Power DDR) is optimized for mobile and embedded, where power is the constraint, not bandwidth. LPDDR4 runs at 1.1V (LPDDR4X drops the I/O supply to 0.6V), and LPDDR5 lowers supplies further (roughly 1.05V core, 0.5V I/O). The architecture also differs: LPDDR typically uses 16-bit or 32-bit channels (not 64-bit), temperature-compensated refresh, aggressive low-power states, and DVFS (Dynamic Voltage/Frequency Scaling).

LPDDR5 leans heavily on per-bank refresh (REFpb), which refreshes one bank at a time so the other banks stay accessible and peak refresh current drops sharply—critical for mobile. It also decouples the high-speed data clock (WCK) from the command clock (CK) and adds deeper low-power states such as Deep Sleep Mode. DDR5, by contrast, is designed for data centers and high-performance systems where bandwidth and performance consistency matter most, and absolute power is less critical. In my experience, if you’re interviewing for Apple, Samsung mobile, or Qualcomm, they’ll grill you on LPDDR5 specifically.

Q7. What is burst length and prefetch architecture?

When you issue a read command, the DRAM doesn’t return just one word—it returns a burst of data. Burst length (BL) is how many words come out in one command. Prefetch tells you how many internal transfers are needed to produce that burst.

In DDR4, BL8 is standard: 8 words come out on the interface, but internally a single core access fetches all 8 words in parallel (8n prefetch), and they stream out over 4 interface clock cycles at double data rate. DDR5 uses BL16 with burst chop—you can command BL16 or chop to BL8; the chip internally fetches 16 words but stops driving after 8. This flexibility lets controllers trade latency against throughput.

💡 Tip: A common mistake: confusing burst length with prefetch. BL is what the user sees on the interface; prefetch is internal. DDR3 and DDR4 are both 8n prefetch (one internal array access fetches 8 words per column command), while DDR5 moves to 16n internally to support higher data rates without speeding up the core.

Q8. DIMM types—UDIMM, RDIMM, LRDIMM—when is each used?

UDIMM (Unbuffered DIMM) is the simplest: each chip connects directly to the module traces. Good for low-cost systems (laptops, desktops). Voltage regulation happens on the board. RDIMM (Registered DIMM) adds a register on the address/command bus, slowing it slightly but allowing longer traces and more reliable signals in server systems. The register isolates the controller from the capacitive load of the chips. LRDIMM (Load-Reduced DIMM) goes further: a buffer chip isolates both address and data buses, allowing even more DIMMs per channel without signal integrity collapse. LRDIMMs are server-only and expensive.

In practice: UDIMMs for consumer/edge, RDIMMs for enterprise servers, LRDIMMs for hyperscale (Google, AWS, etc.). If you’re interviewing at a hyperscaler, expect questions about rank multiplication and how LRDIMMs reduce the effective rank loading on the bus.

Q9. What is the DDR naming convention?

There are two ways to name DDR: DDR speed tier (e.g., DDR4-3200) refers to the data rate in MT/s. JEDEC speed (e.g., PC4-25600) refers to the bandwidth in MB/s. They’re related: DDR4-3200 means 3200 MT/s. With 64-bit (8 bytes), that’s 3200 × 8 = 25,600 MB/s, hence PC4-25600.

Breakdown: DDR4-3200 = DDR4 generation, 3200 MT/s. PC4 = DDR4 (generation 4). 25600 = 25,600 MB/s. The timing part (CL16-18-18-38) describes tCL-tRCD-tRP-tRAS, which we’ll cover in the next section.

Q10. How do you calculate DDR bandwidth?

Bandwidth = Data Rate (MT/s) × Bus Width (bytes). Formula: BW = DR × BW_bytes = (Freq_MHz × 2) × BW_bytes.

Example: DDR4-3200 with 64-bit interface. Data rate = 3200 MT/s. Bus width = 64 bits = 8 bytes. Bandwidth = 3200 × 8 = 25,600 MB/s = 25.6 GB/s. In a dual-channel system, 2 × 25.6 = 51.2 GB/s. This is why high-frequency DDR is so critical for GPUs and data centers—a single-channel connection becomes the bottleneck.

📌 Note: Actual bandwidth depends on access patterns and refresh overhead. A well-optimized sequential access can nearly saturate the channel, but random access with precharge delays will hit only 60–70% of peak.
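The arithmetic above is easy to wrap in a small helper. A sketch (the efficiency derate is an illustrative parameter, not a JEDEC figure):

```python
def ddr_bandwidth_gbs(data_rate_mts, bus_bits=64, channels=1, efficiency=1.0):
    """Peak DDR bandwidth in GB/s.
    data_rate_mts: transfers per second in MT/s (already 2x the bus clock).
    efficiency: derate for refresh and turnaround overhead (illustrative)."""
    return data_rate_mts * (bus_bits / 8) * channels * efficiency / 1000

print(ddr_bandwidth_gbs(3200))                                # DDR4-3200 -> 25.6
print(ddr_bandwidth_gbs(3200, channels=2))                    # dual channel -> 51.2
print(ddr_bandwidth_gbs(3200, channels=2, efficiency=0.65))   # random-access derate
```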

Section 2: DDR Timing Parameters (Q11–Q20)

Q11. What are the “big 4” timing parameters?

The four most important DDR timings are: tCL (CAS Latency), tRCD (RAS-to-CAS Delay), tRP (RAS Precharge), tRAS (RAS Active Time). They define the sequence for accessing a row and reading/writing data.

Here’s the sequence: You issue an ACTIVATE command (assert RAS), which opens a row. You must wait tRCD clock cycles before you can issue a READ/WRITE command (assert CAS). You must wait tCL cycles after the READ command before data appears on the bus. When you’re done with the row, you must wait tRP cycles after PRECHARGE before you can open a different row in the same bank. And the row must stay open for at least tRAS before it may be precharged—an independently specified minimum, typically a bit more than tRCD + tCL. Here’s a timing diagram:

Command:   ACT--------RD-------------PRE--------ACT (same bank)
           |---tRCD---|---tCL----|   |---tRP----|
Data:                            <D0 D1 D2 D3>
Row open:  |---------tRAS(min)-------|

💡 Tip: What interviewers actually test: understanding that tRAS is not simply tRCD + tCL. It’s an independently specified minimum row-active time, and violating it (precharging too early) corrupts data because the sense amplifiers haven’t finished restoring charge to the cells. There’s a maximum too (tRAS_max): a row left open too long loses data to leakage, and an open row also burns more power.
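The ACT → READ → data → PRECHARGE sequence can be sketched as a small timeline calculator (simplified: it ignores secondary constraints like tRTP and tWR):

```python
def access_timeline(tRCD, tCL, tRAS, tRP, burst_clocks=4):
    """Earliest legal command times (in clock cycles) for one full row access,
    following the ACT -> READ -> data -> PRECHARGE sequence described above.
    Simplified: ignores secondary constraints like tRTP and tWR."""
    act = 0
    read = act + tRCD                # READ must wait tRCD after ACTIVATE
    data = read + tCL                # first data beat appears tCL after READ
    # PRECHARGE may not close the row before tRAS has elapsed since ACT,
    # and (in this sketch) not before the burst has drained the bus
    pre = max(act + tRAS, data + burst_clocks)
    next_act = pre + tRP             # same bank reopens tRP after PRECHARGE
    return {"ACT": act, "READ": read, "DATA": data, "PRE": pre, "next_ACT": next_act}

# DDR4-3200 at CL16-18-18-38 (all values in clock cycles)
print(access_timeline(tRCD=18, tCL=16, tRAS=38, tRP=18))
```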

Q12. What is CAS latency? Why does it matter for performance?

CAS latency (tCL) is the number of clock cycles between a READ command and when data appears on the bus. In DDR4-3200 at CL16, that’s 16 cycles × 0.625 ns per cycle (1600 MHz clock) = 10 nanoseconds. It’s a hard latency you can’t hide.

In my experience, here’s what people miss: CAS latency is less about raw bandwidth and more about responsiveness. A single-threaded application that issues one memory request and waits stalls for the full tCL. But a well-threaded server application that has 100+ outstanding requests can hide this latency by overlapping operations. That’s why server DRAM optimizes for throughput (wider buses, more parallelism) while mobile optimizes for latency (lower CL for responsiveness).

Q13. What is tRFC? Why does it increase with density?

tRFC (Refresh Cycle Time) is the minimum time you must wait after a REFRESH command before you can issue the next ACTIVATE (row open) command to that rank. It grows with density: typical values run around 110–160 ns in DDR3, 260 ns for 4 Gb DDR4, 350 ns for 8 Gb DDR4, and longer still in DDR5. Why? Refresh rewrites entire rows, and higher-density die have more rows to cover per refresh command.

Here’s the physics: each REFRESH command internally activates and restores a batch of rows across the banks—every touched cell is read by the sense amps and written back at full level. The larger the chip capacity, the more rows each refresh must cover, and the longer it takes. In high-density DDR5 (16Gb+), tRFC can exceed 400 ns—hundreds of clock cycles during which the rank can’t serve requests. This is why Micron and others optimize refresh scheduling: staggering refresh across ranks reduces peak current and hides part of the penalty.

Q14. What do tWR, tRTW, and tWTR govern?

tWR (Write Recovery) is how long you must wait after a WRITE before you can PRECHARGE the row. tRTW (Read-to-Write) is the gap between read data and write command on the same DQ bus—it’s about not colliding on the output driver. tWTR (Write-to-Read) is the gap between write command and read command, allowing the DQ bus to change direction.

In a controller, these are critical for scheduling. If you issue a read, the data driver is active for multiple cycles. You can’t drive the bus immediately after, so tRTW stalls. Similarly, tWTR prevents read and write commands from thrashing the bus direction. Modern controllers schedule around these transparently, but if you violate them, data gets corrupted.

Q15. How do you interpret DDR timing notation?

The notation CL16-18-18-38 for DDR4-3200 means: tCL=16, tRCD=18, tRP=18, tRAS=38 (all in clock cycles). The timings are always spec’d in cycles, not nanoseconds, because the actual time depends on the frequency. DDR4-3200 at 1600 MHz means each cycle is 0.625 ns, so tCL=16 × 0.625 = 10 ns.

Tighter timings (lower numbers) mean faster access but harder to achieve (requires better signaling, more power). Looser timings (higher numbers) are more forgiving. A DDR4-3200 with CL16 is tighter and faster than CL18; it’ll have slightly lower latency but might cost more or run hotter.
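A quick converter for this notation (a sketch; the helper name is mine):

```python
def timings_ns(timing_str, data_rate_mts):
    """Convert 'CL16-18-18-38' style timings (clock cycles) to nanoseconds.
    The clock runs at half the data rate, so tCK = 2000 / data_rate_mts ns."""
    tck_ns = 2000 / data_rate_mts
    cycles = [int(field.lstrip("CL")) for field in timing_str.split("-")]
    names = ["tCL", "tRCD", "tRP", "tRAS"]
    return {name: n * tck_ns for name, n in zip(names, cycles)}

print(timings_ns("CL16-18-18-38", 3200))
# tCL works out to 16 x 0.625 ns = 10 ns, matching the worked example above
```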

Q16. How does DRAM refresh work? What is REFpb?

Every DRAM cell must be refreshed periodically (the JEDEC baseline is every 64 ms) to restore charge lost to leakage. The standard approach is auto-refresh: the controller issues a REFRESH command every tREFI ≈ 7.8 µs (64 ms / 8192), and the DRAM refreshes rows at an internal address counter, incrementing it each time; after 8192 refresh commands, every row has been refreshed. Each refresh busies the device for tRFC (~350 ns for 8 Gb DDR4)—a noticeable latency hit for any request that arrives mid-refresh.

LPDDR4/5 support REFpb (per-bank refresh), which refreshes a single bank at a time instead of all banks at once. The other banks remain accessible during the refresh and the peak refresh current drops sharply—valuable for both latency and battery life. The cost is more refresh commands to issue and more bookkeeping in the controller.

📌 Note: Interviewers at Micron or SK Hynix will follow up with “how do you handle refresh scheduling in a controller?” The answer: interleave refresh with regular commands, use rank/bank refresh to hide it, or accept a small throughput hit for correctness.
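The refresh cost is easy to quantify: the fraction of time a rank is unavailable is roughly tRFC / tREFI. A sketch, with typical (datasheet-dependent) tRFC values:

```python
def refresh_overhead(tRFC_ns, tREFI_ns=7800.0):
    """Fraction of time a rank is busy refreshing.
    tREFI defaults to 64 ms / 8192 = 7.8 us (the standard 1x refresh rate)."""
    return tRFC_ns / tREFI_ns

# Illustrative tRFC values; always check the actual datasheet
for part, tRFC in [("4Gb DDR4", 260), ("8Gb DDR4", 350), ("16Gb DDR5", 410)]:
    print(f"{part}: {refresh_overhead(tRFC):.1%} of time spent refreshing")
```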

Q17. What is tRAS_min and why does violating it corrupt data?

tRAS_min is the absolute minimum time a row must stay open (activated). If you precharge too quickly, the data gets corrupted. Why? When a row is activated, the sense amplifiers first sense the tiny charge on each cell and then drive full logic levels back into the cells—restoration, necessary because DRAM reads are destructive. If you precharge before restoration completes, the cells are left partially charged, and subsequent reads can return wrong data.

In modern DRAM, tRAS_min is set conservatively (above the worst-case restoration time) to account for process variation and temperature. Violating it doesn’t physically damage the cell, but it silently corrupts the row’s data—exactly the kind of intermittent failure that’s hardest to debug.

Q18. What is rank-to-rank switching overhead?

When you switch from a command to Rank 0 to a command to Rank 1, the shared data bus must be handed over: the first rank’s output drivers turn off, ODT settings switch, and the second rank’s drivers turn on. Controllers insert a rank-to-rank turnaround gap (often called tRTRS or a programmable rank-switch delay, typically 1–4 cycles) between the data bursts. This is overhead that interleaved multi-rank systems must manage.

In a well-designed controller, commands to Rank 0 and Rank 1 can still be pipelined (via separate chip selects), but there’s always some minimum data-bus gap. Don’t confuse this with tRRD: DDR4’s tRRD_L (same bank group) and tRRD_S (different bank group) constrain ACTIVATE-to-ACTIVATE spacing between banks, not ranks.

Q19. What is tAA? How do you calculate it?

tAA is the read latency expressed in nanoseconds rather than cycles: per JEDEC, it is the time from the internal READ command to the first data bit, i.e., tAA = tCL × tCK. Interviewers usually also want the full activation-to-data latency, from ACTIVATE to first data on DQ, which is (tRCD + tCL) × tCK.

Formula: latency = (tRCD + tCL) × (1000 / Freq_MHz) ns. Example: DDR4-3200 (1600 MHz clock), tRCD=18, tCL=16: (18 + 16) × 0.625 = 21.25 ns from ACTIVATE to data, while tAA alone is 16 × 0.625 = 10 ns. This is the absolute latency your system must tolerate on a row miss. If your processor needs sub-20 ns activation-to-data latency, DDR4-3200 at these timings won’t meet it; you’d need a tighter speed bin (smaller tRCD/tCL) or a cache in front.

Q20. Open vs closed page policy—what are the trade-offs?

Open-page policy keeps a row open after a read/write, hoping the next command accesses the same row (a row hit). If it does, you save tRCD latency. If it doesn’t (a row miss), you pay tRP before opening the new row. Closed-page policy precharges immediately after every access, eliminating row misses but paying tRP every time.

In my experience, open-page works better for sequential access (where row hits are common) and closed-page works better for random access (where row hits are rare). Most modern controllers use a hybrid: keep the page open for a window of time, then precharge speculatively if no new command arrives. High-performance systems tend toward open-page because average-case latency is better.
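The trade-off can be made concrete with an expected-latency model (a simplification: fixed costs per hit and miss, no queuing effects):

```python
def avg_access_latency(hit_rate, tCL, tRCD, tRP, policy="open"):
    """Expected access latency in clock cycles under each page policy.
    open:   a row hit costs tCL; a row miss costs tRP + tRCD + tCL.
    closed: every access reopens its row (tRCD + tCL), with the precharge
            assumed hidden in the idle gap after the previous access."""
    if policy == "open":
        hit_cost = tCL
        miss_cost = tRP + tRCD + tCL
        return hit_rate * hit_cost + (1 - hit_rate) * miss_cost
    return tRCD + tCL

# Sweep the row-hit rate to find the crossover between the two policies
for hr in (0.2, 0.5, 0.8):
    o = avg_access_latency(hr, tCL=16, tRCD=18, tRP=18)
    c = avg_access_latency(hr, tCL=16, tRCD=18, tRP=18, policy="closed")
    print(f"hit rate {hr:.0%}: open-page {o:.1f} cycles, closed-page {c} cycles")
```

With these timings the break-even hit rate is tRP / (tRP + tRCD) = 50%; above that, open-page wins, which is why sequential workloads favor it.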

Section 3: DDR Controller & PHY (Q21–Q30)

Q21. What is the DFI interface?

DFI (DDR PHY Interface) is the standardized interface between the DDR controller and the PHY (physical layer). It abstracts away electrical details—the controller issues logical commands (READ, WRITE, ACTIVATE) and the PHY handles electrical signals (drive strength, slew rate, termination).

Key DFI signal groups: command (dfi_address, dfi_bank, dfi_cs_n), write data (dfi_wrdata, dfi_wrdata_en), and read data (dfi_rddata, dfi_rddata_valid), plus training and update handshakes. The PHY drives the actual DRAM pins (CA, CS_n, CK, DQ, DQS), handles calibration (write leveling, read leveling, ZQ), and reports status back to the controller. Here’s a simplified block diagram:

[Controller]
    |
    | DFI (logical commands: read_en, write_en, cas, ras, etc.)
    |
[PHY]
    | ca/cs/clk/dq/dqs (electrical signals)
    |
[DRAM Chips]

📌 Note: DFI is an industry specification maintained by a multi-vendor working group (driven by Cadence and other IP vendors); it is not a JEDEC standard. Successive DFI versions add DDR4, LPDDR4/5, and DDR5 support. Major IPs (Xilinx MIG, Synopsys DesignWare DDR) use DFI as the controller-PHY interface. If you’re building a custom controller, DFI compliance ensures your design will work with off-the-shelf PHYs.

Q22. What is write leveling? What does it compensate for?

Write leveling aligns the DQS strobe the controller drives with the clock (CK) as it arrives at each DRAM. DDR3+ modules route clock and command in a fly-by topology (the signal daisy-chains past each chip), so CK reaches every DRAM at a slightly different time, while DQS routes point-to-point. Write leveling measures and compensates this per-byte-lane skew.

In practice: the controller puts the DRAM in write-leveling mode and pulses DQS; the DRAM samples the state of CK on each DQS rising edge and returns the sampled value on DQ. The PHY sweeps its DQS output delay until the feedback flips from 0 to 1, which marks alignment with the CK edge. This is done during initialization (boot) and can be re-run if skew drifts (temperature, voltage).

Q23. What is read leveling?

Read leveling (also called gate training or read eye training) aligns the read clock and data strobe (DQS) coming from the DRAM. The DRAM sends DQS transitions with each read, but trace delays mean DQS might arrive early or late relative to the controller’s sample clock. Read leveling adjusts the DQS phase (or sample clock phase) to center the eye—the window where you can safely sample DQ.

The algorithm: the controller issues repeated reads and shifts the DQS phase, measuring data validity each shift. It finds the window where data is correct and centers the phase in that window, maximizing timing margins.
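The window-centering step can be sketched in a few lines: given pass/fail results per phase tap from repeated reads, find the widest passing window and pick its middle (the sweep data below is hypothetical):

```python
def center_dqs_phase(phase_ok):
    """Given per-phase-tap pass/fail results (True = read data valid),
    find the widest contiguous passing window and return its center tap.
    This is the window-centering step of the read-leveling algorithm."""
    best_start, best_len = None, 0
    start = None
    for i, ok in enumerate(phase_ok + [False]):   # sentinel closes a trailing window
        if ok and start is None:
            start = i                             # window opens
        elif not ok and start is not None:
            if i - start > best_len:              # window closes; keep the widest
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        raise RuntimeError("no passing window found - training failed")
    return best_start + best_len // 2

# Sweep of 16 phase taps; the eye is open at taps 5..11 (hypothetical data)
sweep = [False] * 5 + [True] * 7 + [False] * 4
print(center_dqs_phase(sweep))  # -> 8, the eye center
```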

Q24. What is ZQ calibration? When does it run?

ZQ calibration tunes the DRAM’s output driver and on-die termination (ODT) impedances against an external precision resistor (RZQ = 240 Ω, connected to the ZQ pin) so they stay accurate across process, voltage, and temperature. The controller triggers it with calibration commands (ZQCL at initialization and ZQCS periodically in DDR3/4; DDR5 starts and latches calibration via MPC commands).

It runs at initialization and periodically during operation (every few milliseconds) because temperature and voltage affect ODT strength. Without ZQ calibration, signal reflections and ringing degrade signal integrity, causing bit errors.

Q25. What is ODT (On-Die Termination)? Why was it introduced?

ODT is a termination network integrated on the DRAM die—switchable resistor legs on each DQ input (terminating to VDDQ in DDR4’s pseudo-open-drain signaling)—that terminates the bus at the receiver. Without termination, signals reflect off the end of the line, creating ringing and overshoot. With ODT, the line is properly terminated, suppressing reflections.

ODT was introduced in DDR2 because clock frequencies got high enough that motherboard stub resistors no longer gave clean signals at the die. DDR3 added dynamic ODT (a different termination value that kicks in during writes), and DDR4 added further flexibility such as Park ODT (a default termination while a rank sits idle). This trades power (ODT draws current) for signal integrity.

💡 Tip: A classic interview question: “If you enable ODT on all DQ lines, what happens?” Answer: power increases (each active ODT is a current sink), and signal swing might be reduced (less voltage headroom). You must balance signal integrity vs power. Smart controllers enable ODT only when needed.

Q26. What is the DQ/DQS relationship? How does the controller use DQS?

DQ is the data line, and DQS is the data strobe—a clock that toggles with every DQ transition. Instead of relying on a global reference clock (which has skew), the controller samples DQ synchronous to DQS. This is called forwarded-clock operation: the DRAM sends clock and data together, eliminating long-trace clock skew.

On reads, DQS comes from the DRAM; the controller aligns its sample clock to DQS (via phase shift). On writes, the controller sends DQS, and the DRAM samples DQ with DQS. This is why DQS is so critical—lose it or corrupt it, and all data corruption follows.

Q27. What does a DDR memory controller scheduler do?

The scheduler decides the order of read and write commands to optimize throughput and latency. Key decisions: bank open/close (when to precharge?), priority aging (old requests get boosted priority to prevent starvation), write-read ordering (minimize bus turnaround).

A good scheduler maximizes row hits, respects every timing parameter by construction, and balances reads against writes to prevent write-queue overflow (which causes read latency spikes). There is research on learned scheduling policies (including published datacenter DRAM work), but production schedulers mostly use heuristics: request age, bank affinity, and limited lookahead.
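A minimal FR-FCFS-style pick function illustrates the two core heuristics, row-hit-first plus priority aging (field names and the age threshold are illustrative):

```python
def pick_next(requests, open_rows, age_limit=50):
    """FR-FCFS-style arbitration: prefer row hits (request row matches the
    bank's open row), but force the oldest request once it exceeds age_limit
    to prevent starvation.
    requests: list of dicts with 'bank', 'row', 'age'; open_rows: bank -> row."""
    oldest = max(requests, key=lambda r: r["age"])
    if oldest["age"] > age_limit:
        return oldest                              # priority aging wins
    for r in sorted(requests, key=lambda r: -r["age"]):
        if open_rows.get(r["bank"]) == r["row"]:
            return r                               # row hit: no tRP/tRCD cost
    return oldest                                  # no hits: fall back to oldest

reqs = [{"bank": 0, "row": 7, "age": 10},
        {"bank": 1, "row": 3, "age": 30},
        {"bank": 0, "row": 9, "age": 5}]
print(pick_next(reqs, open_rows={0: 9}))  # row hit on bank 0, row 9, despite low age
```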

Q28. What is CA training in LPDDR4/5?

CA (Command/Address) training aligns the command/address bus timing between controller and DRAM, similar to write leveling for DQ. LPDDR buses are narrower (e.g., CA[9:0]) and higher speed, making skew more problematic. CA training runs at boot and uses a loopback mechanism to measure and correct delays.

Q29. What are DDR power states?

DDR supports multiple low-power states to save energy: Active (full power, ready to access), Active Power-Down (rows may remain open; clocks and input buffers are gated), Precharge Power-Down (all banks precharged; lower power still), Self-Refresh (the DRAM refreshes itself internally and the external clock can stop—the lowest power state that retains data), and MPSM (Maximum Power Saving Mode, DDR5-specific, deeper still at the cost of a long exit latency).

The controller transitions between states based on idle time. Aggressive power gating (quick transition to Self-Refresh) saves power but increases wake-up latency. Conservative gating (longer idle before transition) reduces latency but wastes power. It’s a trade-off that depends on workload.

Q30. What are tphy_wrlat and tphy_rdlat (DFI timing parameters)?

These are DFI-level latency parameters that bind the controller’s data timing to the PHY. tphy_wrlat is the number of DFI clocks from a write command to when the controller must start driving write data (asserting dfi_wrdata_en). tphy_rdlat is the maximum number of DFI clocks from the read-data enable to when the PHY returns dfi_rddata_valid.

In simple controllers, these are 1–2 cycles. In complex PHYs with leveling and calibration, they might be 10+ cycles. The controller must account for these delays when scheduling commands and data transfers.

Section 4: Advanced DDR Topics (Q31–Q40)

Q31. What are the standout features of DDR5?

DDR5 brings four major innovations: on-die ECC (single-error correction on each 128-bit word inside the die, complementing rather than replacing system ECC), sub-channels (two 32-bit channels per 64-bit DIMM, independent refresh/power), DFE (Decision Feedback Equalization, a receiver-side signal processing technique that cancels inter-symbol interference at high data rates), and burst chop (flexible burst length, BL16 or chopped to BL8).

From an interviewer’s perspective, sub-channels are the big story because they change the entire controller architecture. Instead of one 64-bit channel, you’re managing two 32-bit channels with independent scheduling. This adds complexity but enables better power and performance isolation.

Q32. What is LPDDR5 DVFS?

DVFS (Dynamic Voltage/Frequency Scaling) allows the DRAM to change its operating voltage and clock frequency on the fly. LPDDR5 supports multiple frequency points (e.g., 200 MHz, 533 MHz, 1066 MHz, 2133 MHz) and corresponding voltages. The controller can scale frequency down during idle periods to save power.

Why this matters for mobile: a smartphone in standby can clock LPDDR5 down to 200 MHz, dropping power consumption dramatically. Under load, it scales up to 2133 MHz. This is much more aggressive than DDR’s fixed-frequency operation.

Q33. How do HBM, GDDR, and DDR compare architecturally?

Aspect DDR4/5 GDDR6/6X HBM2/3
Bandwidth (GB/s) 25–50 per channel 360–960 per GPU 400–1000 per stack
Interface 64-bit parallel 32-bit per chip, very high clock 1024-bit per stack
Clock (MHz) 800–1600 1500–2500 500–1000
Latency ~20 ns ~25 ns ~15 ns
Energy per bit Moderate High Lowest
Use Case Servers, PCs, mobile Gaming GPUs AI accelerators (TPU, H100)

DDR is the workhorse for CPU systems (servers, PCs, mobile). GDDR is tuned for GPUs, prioritizing bandwidth and clock frequency over latency. HBM stacks multiple dies vertically via microbumps, achieving massive bandwidth with a small footprint. If you’re interviewing for GPU or AI chip roles, expect deeper GDDR/HBM questions.

Q34. What is memory interleaving? Why does it improve bandwidth?

Memory interleaving spreads consecutive addresses across different banks or channels so that accesses don’t serialize. Example: address 0 → Bank 0, address 1 → Bank 1, address 2 → Bank 0, etc. If you access sequentially, each bank gets time to recover (tRCD, tRP) while other banks service requests, hiding latency.

Without interleaving, sequential accesses hit the same bank, causing stalls (wait for tRCD, then tRP before next row). With interleaving, the controller can keep all banks busy, saturating the interface bandwidth. It’s why rank/channel/bank interleaving is standard in modern systems.
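Here’s a sketch of one common address-mapping choice: placing the bank bits just above the cache-line offset so consecutive lines interleave across banks (the exact bit layout and field sizes vary by controller; these are small illustrative values):

```python
def decode_address(addr, bank_bits=2, col_bits=10, line_bytes=64):
    """Split a physical address into (bank, row, column) with the bank bits
    placed just above the cache-line offset, so consecutive lines land in
    different banks. Field sizes here are small and illustrative.
    Layout, low to high: [line offset][bank][column][row]."""
    a = addr // line_bytes               # drop the within-line offset
    bank = a & ((1 << bank_bits) - 1)
    a >>= bank_bits
    col = a & ((1 << col_bits) - 1)
    row = a >> col_bits
    return bank, row, col

# Four consecutive 64B lines map to four different banks: no bank conflicts
for addr in range(0, 4 * 64, 64):
    print(hex(addr), "->", decode_address(addr))
```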

Q35. What is ECC DRAM? How does SECDED work?

ECC DRAM adds error-correcting codes (SEC = Single Error Correct, DED = Double Error Detect) to each data word. SECDED can correct any 1-bit error and detect (but not correct) 2-bit errors. DDR5 on-die ECC uses 8 check bits per 128-bit word, allowing single-bit correction within the die.

Server systems also use Chipkill-class ECC: data and check symbols are striped across DRAM chips so that the failure of an entire chip remains correctable. DDR5’s on-die ECC is complementary rather than a replacement—it scrubs single-bit array errors before they leave the die but cannot see errors on the link, so servers still layer system-level ECC on top.

📌 Note: Interviewers at data centers care deeply about ECC. Server DRAM must have ECC; even 1-bit error rates add up with billions of bits. On-die ECC (DDR5) is a huge win.
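To make SECDED concrete, here’s a toy extended-Hamming encoder/decoder on a 4-bit word (a sketch; DDR5’s on-die code covers 128 data bits with 8 check bits, but the correct-one/detect-two principle is the same):

```python
def is_pow2(x):
    return x > 0 and (x & (x - 1)) == 0

def secded_encode(data):
    """Encode a list of 0/1 data bits with an extended Hamming (SECDED) code.
    Position 0 holds overall parity, power-of-two positions hold Hamming
    parity bits, and all other positions hold data bits."""
    cw, bits, placed, pos = [0], iter(data), 0, 1
    while placed < len(data):
        if is_pow2(pos):
            cw.append(0)                       # parity placeholder
        else:
            cw.append(next(bits))
            placed += 1
        pos += 1
    p = 1
    while p < len(cw):                         # parity p covers every position
        x = 0                                  # whose index has bit p set
        for i in range(1, len(cw)):
            if i & p:
                x ^= cw[i]
        cw[p] = x                              # makes the covered group XOR to 0
        p <<= 1
    cw[0] = sum(cw[1:]) % 2                    # overall parity enables DED
    return cw

def secded_decode(cw):
    """Return (data, status). Corrects any single-bit error; a nonzero
    syndrome with even overall parity flags an uncorrectable double error."""
    syndrome = 0
    for i in range(1, len(cw)):
        if cw[i]:
            syndrome ^= i
    overall = sum(cw) % 2
    if syndrome and overall:                   # single error at position `syndrome`
        cw = cw[:]
        cw[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        return None, "double error detected"
    elif overall:
        status = "corrected"                   # the overall parity bit itself flipped
    else:
        status = "ok"
    data = [cw[i] for i in range(3, len(cw)) if not is_pow2(i)]
    return data, status

# 4 data bits -> 8-bit codeword; flip one bit and watch it get corrected
cw = secded_encode([1, 0, 1, 1])
print(cw, secded_decode(cw))
```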

Q36. What are the top 5 DDR debug issues?

Issue Symptoms Root Cause
Leveling Failure Training hangs, bootloader times out DQS/DQ not aligned, loopback fails
Bit Errors Memtest fails after boot Setup/hold violation, eye center off
Refresh Timeout Data corruption after idle, CRC errors Refresh commands not issued, timing violation
Thermal Random resets after 1 hour Thermal throttling, ODT power dissipation
tRAS Violation Intermittent errors, data ghosting Scheduler precharges row too early

Q37. How is a DDR memory controller verified?

DDR verification typically includes: protocol checking (do you issue valid command sequences?), timing compliance (do you meet all tCL, tRCD, etc.?), self-checking testbenches (write patterns, read them back, compare), and real chip validation (on silicon with actual DRAM).

Simulators such as VCS, Questa/ModelSim, and Xcelium run directed and randomized tests. Formal tools (like JasperGold) can prove invariants (e.g., “never violate tRAS”). Toward tapeout, you run full-chip regressions (which can take weeks) with millions of test cases. This is why DDR controllers take months to verify—the margin for error is near zero.
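A timing invariant like “never precharge before tRAS” is often also watched by a simulation scoreboard. A minimal sketch in Python (the tRAS=32-cycle value and the method names are illustrative, not from any real VIP):

```python
class TrasChecker:
    """Scoreboard: flags any precharge issued before tRAS cycles after ACT."""
    def __init__(self, tras_cycles: int = 32):
        self.tras = tras_cycles
        self.open_row = {}            # bank -> cycle of last ACTIVATE
        self.violations = []

    def activate(self, bank: int, t: int):
        self.open_row[bank] = t

    def precharge(self, bank: int, t: int):
        t_act = self.open_row.pop(bank, None)
        if t_act is not None and t - t_act < self.tras:
            self.violations.append((bank, t_act, t))

chk = TrasChecker(tras_cycles=32)
chk.activate(bank=0, t=100)
chk.precharge(bank=0, t=120)          # only 20 cycles after ACT -> violation
chk.activate(bank=1, t=100)
chk.precharge(bank=1, t=140)          # 40 cycles -> legal
print(chk.violations)                 # [(0, 100, 120)]
```

In a real flow the same check would live in a SystemVerilog assertion or a UVM monitor, but the bookkeeping is exactly this: remember the ACT time per bank, compare on PRE.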

Q38. DDR in FPGAs vs ASICs—what’s different?

In FPGAs (Xilinx MIG, Intel EMIF), the DRAM controller and PHY are generated by IP wizards. You specify frequency, DIMM type, and constraints; the tool generates the RTL, which you then place-and-route (P&R). The advantage: fast prototyping. The disadvantage: you can’t customize beyond the wizard’s options.

In ASICs, you design the controller from scratch (or license an IP core). You have full control over scheduling, calibration logic, and power optimization. You can implement advanced features (like per-bit deskew, multi-rank scheduling) that FPGAs can’t. If you’re at a hyperscaler (Google, Meta, AWS) or a memory vendor, you’re building custom ASIC controllers.

Q39. What is per-bit deskew? Why is it needed?

In wide DDR interfaces (64 bits or more), different DQ lines have different delays due to trace routing. Without correction, some bits arrive before others at the DRAM, violating setup time. Per-bit deskew phase-shifts each DQ line individually (via delay cells in the PHY) to align them to a common reference. This centers the read/write eye and maximizes timing margins.

It’s a power and silicon cost (delay cells, calibration logic), so narrower systems (16-bit, 32-bit) often skip it. Wide systems (64–128 bit) almost always have it.
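The per-lane calibration itself is conceptually simple: sweep the delay taps, record pass/fail at each tap, and park the lane at the center of its longest passing window. A sketch under illustrative assumptions (32 taps, a synthetic eye):

```python
def center_tap(passes):
    """Return the midpoint of the longest contiguous run of passing taps."""
    best = (0, 0, -1)                          # (run length, start, end)
    start = None
    for t, ok in enumerate(passes + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = t
        elif not ok and start is not None:
            if t - start > best[0]:
                best = (t - start, start, t - 1)
            start = None
    length, lo, hi = best
    return (lo + hi) // 2 if length else None  # None: lane never passed

# Synthetic eye: this DQ lane passes only for delay taps 10..20 of 32.
eye = [10 <= t <= 20 for t in range(32)]
print(center_tap(eye))                         # 15
```

Hardware runs this per DQ bit during training, with the pass/fail coming from a known pattern written and read back at each tap setting.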

Q40. What are DDR power optimization techniques?

Key techniques: rank-level power gating (shut off unused ranks to save leakage), partial array self-refresh (PASR, refresh only part of the DRAM array during idle), ODT scheduling (enable ODT only when needed, disable otherwise), clock gating (stop internal clocks during idle), and DVFS (scale voltage/frequency down at low load).

The trade-off is always latency vs power. Aggressive power gating (clock gating, PASR) saves power but increases wake latency. Conservative gating reduces latency but wastes power. Hyperscale datacenters often disable some power optimizations (keep rank clocks running) to prioritize latency consistency.
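The latency-vs-power trade-off is typically implemented as idle-timer thresholds in the controller. A minimal policy sketch (the threshold and wake-latency numbers are made-up illustrations, not spec values):

```python
class DramPowerPolicy:
    """Idle-timer policy: deeper sleep after longer idle, at the cost of a
    larger wake penalty on the next access."""
    def __init__(self, pd_after=16, sr_after=256, pd_wake=6, sr_wake=200):
        self.pd_after, self.sr_after = pd_after, sr_after
        self.wake = {"ACTIVE": 0, "POWERDOWN": pd_wake, "SELF_REFRESH": sr_wake}
        self.state, self.idle = "ACTIVE", 0

    def tick_idle(self):
        self.idle += 1
        if self.idle >= self.sr_after:
            self.state = "SELF_REFRESH"   # deep sleep (PASR-style)
        elif self.idle >= self.pd_after:
            self.state = "POWERDOWN"      # clocks gated, fast exit

    def access(self):
        penalty = self.wake[self.state]   # extra cycles to wake the rank
        self.state, self.idle = "ACTIVE", 0
        return penalty

p = DramPowerPolicy()
for _ in range(16):
    p.tick_idle()
print(p.state, p.access())                # POWERDOWN 6
for _ in range(256):
    p.tick_idle()
print(p.state, p.access())                # SELF_REFRESH 200
```

Tuning the thresholds is exactly the datacenter trade-off described above: raise them (or disable the deep state) for latency consistency, lower them to save idle power.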

Most-Asked Topics by Company

Company Focus Areas
Samsung, SK Hynix, Micron Refresh (REFpb), on-die ECC, sub-channels, power optimization
Apple, Qualcomm LPDDR5, DVFS, mobile power, latency optimization
Google, Meta, AWS Controller scheduling, power/perf tradeoffs, ECC, high-density
Xilinx, Intel MIG/EMIF usage, constraints, migration (DDR4→DDR5)
NVIDIA, AMD (GPU) HBM, GDDR, bandwidth optimization, interleaving

Resources & Further Reading

  • JEDEC Standards: JESD79-4 (DDR4), JESD79-5 (DDR5), JESD209-5 (LPDDR5) — official specs
  • Books: “DRAM Circuit Design: Fundamental and High-Speed Topics” by Keeth, Baker, Johnson, and Lin covers physical design
  • Tools: Synopsys DesignWare DDR IP, Xilinx MIG, Cadence memory models and VIP for custom controllers
  • Validation: Micron’s tools, Samsung datasheets, real chip bring-up
  • Interview Prep: Solve timing constraint problems, understand refresh power tradeoffs

Last updated: 2026. Keep these concepts fresh—DDR standards evolve every 3–4 years, and interviewers love asking about the latest generation.
