Design For Test is what separates a chip that works from a chip you can actually manufacture and verify. Every test engineer will tell you: a design that ships without robust DFT is a lawsuit waiting to happen. Whether you’re implementing scan chains, debugging ATPG coverage, or dealing with X-states in compression, understanding DFT deeply is what gets you hired at companies pushing silicon to the limits. In my experience, candidates often know the theory but freeze when asked practical questions like “how do you handle shift clock timing in a scan chain?” or “what’s the difference between EDT and MBIST?”
💡 Tip: This guide is for design engineers, DFT architects, CAD engineers, and test/validation engineers interviewing at Qualcomm, NVIDIA, Intel, TI, Synopsys, and semiconductor services firms. If you work on scan chains, ATPG, or test compression, this hits your interview hard.
Quick Navigation:
- Fundamentals (Scan, ATPG, Fault Models)
- Intermediate (Compression, MBIST, X-States)
- Advanced (Cell-Aware ATPG, Safety Standards)
- Practical & Tools (DFT Compiler, TetraMAX, Tessent)
- Interview Cheatsheet by Company
Fundamentals: Scan Chains, Fault Models & ATPG
Q1. Explain scan-based testing. What is a scan chain and how does it differ from normal operation?
Scan-based testing adds a second mode to flip-flops: scan mode (test) and functional mode (normal). In scan mode, flip-flops are chained together (scan chain) so you can shift test patterns in via a serial input (SI), then capture responses. In functional mode, flip-flops operate normally, breaking the chain.
The key benefit: without scan, testing a flip-flop deep inside logic is nearly impossible—you’d need the exact sequence of inputs to reach it. With scan, you serialize all flip-flops, shift in any test values you want, run one functional clock to capture the response, then shift out the result. This is why scan is called “design for testability.” In my experience, modern designs have 80%+ of flip-flops in scan chains because the coverage benefit far outweighs the area cost. The misconception: scan doesn’t exist to test the flip-flops themselves; its purpose is to make the combinational logic around them testable. The flops do get exercised, though: every pattern set typically begins with chain-integrity (“flush”) tests, shifting a repeating sequence like 0011 through the chain to verify every scan cell shifts correctly.
Scan Mode (Shift):

    TCK: __|‾‾|___|‾‾|___|‾‾|___
    SI:  ___X_bit1_X_bit2_X_bit3_X___
    SO:  _____X_bit0_X_bit1_X_bit2_X__

Each clock edge shifts data one stage: SO = previous flip-flop's Q.
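The shift/capture sequence can be mimicked with a toy Python model (purely illustrative: a chain modeled as a list of bits, not any tool's representation):

```python
# Minimal scan-chain model: a list of flip-flop states, shifted
# serially in scan mode, captured in one functional clock.

def scan_shift(chain, bits):
    """Shift `bits` in at SI (index 0); return bits observed at SO."""
    out = []
    for b in bits:
        out.append(chain[-1])      # SO = last flop's Q before the edge
        chain[1:] = chain[:-1]     # each flop takes the previous flop's Q
        chain[0] = b               # first flop takes SI
    return out

def capture(chain, comb_logic):
    """One functional clock: every flop captures its combinational D."""
    chain[:] = comb_logic(chain)

# Example: 3-flop chain, logic that inverts every bit during capture.
chain = [0, 0, 0]
scan_shift(chain, [1, 0, 1])              # load pattern
capture(chain, lambda q: [1 - x for x in q])
response = scan_shift(chain, [0, 0, 0])   # unload response, load zeros
```

Note the overlap trick in the last line: unloading one pattern's response simultaneously loads the next pattern, which is exactly how ATE minimizes shift cycles.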
Q2. What are the main fault models used in ATPG? (Stuck-at, Transition, Bridging)
Stuck-at fault: a net is permanently 0 (stuck-at-0 / SA0) or 1 (stuck-at-1 / SA1). Transition fault: a net switches slower than expected (doesn’t transition within clock period). Bridging fault: two nets are shorted together, causing voltage conflicts. Stuck-at is most common and simplest to model; transition and bridging are more realistic but harder to detect.
ATPG (Automatic Test Pattern Generation) targets these faults by generating test patterns that make faults observable. For a stuck-at-1 fault on a net, ATPG tries to: (1) justify the net to opposite state (0), then (2) propagate that difference to a primary output where it’s observable. If ATPG can’t justify/propagate, the fault is undetectable. Here’s something most candidates miss: not all faults are detectable—sometimes logic structure prevents you from distinguishing a fault from correct behavior. You can only control so much from inputs, and not all effects reach outputs. This is where scan helps: by controlling flip-flop states directly, you reduce un-detectable faults.
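The justify/propagate idea can be made concrete with a brute-force sketch on a hypothetical three-input circuit y = (a AND b) OR c, with the AND-output net stuck-at-1 (circuit and names are made up for illustration; real ATPG uses structural algorithms like D or FAN, not exhaustive search):

```python
from itertools import product

def good_ckt(a, b, c):
    n1 = a & b           # internal net n1
    return n1 | c

def faulty_ckt(a, b, c):
    n1 = 1               # fault injected: n1 stuck-at-1
    return n1 | c

def find_test(good, faulty, n_inputs):
    """Return the first input vector whose outputs differ, else None."""
    for vec in product([0, 1], repeat=n_inputs):
        if good(*vec) != faulty(*vec):
            return vec   # this vector detects the fault
    return None          # fault is undetectable

test = find_test(good_ckt, faulty_ckt, 3)
```

The vector found sets a=b=0 (justifying n1 to 0, the opposite of the fault) and c=0 (so the OR gate propagates n1's value to the output): exactly the justify-then-propagate steps described above.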
Q3. Define fault coverage. What’s a typical coverage target, and why?
Fault coverage is the percentage of detectable faults that your test patterns actually detect. If 1000 faults exist and your test patterns detect 950, coverage is 95%. Typical targets: 90-95% for mature nodes, 95%+ for advanced nodes, sometimes 98%+ in safety-critical designs (automotive, medical).
Why we can’t reach 100%: some faults are undetectable due to logic structure, others are “redundant” (two faults produce the same failing behavior, so you can’t distinguish them). Reaching 95% is usually straightforward with good scan coverage. Getting to 98%+ requires heroic effort—adding extra logic to observe otherwise-hidden states, using multiple-capture patterns, or formal analysis. In my experience, the last 5% of coverage often costs 20% of the test pattern volume and design effort. Companies choose coverage targets based on defect risk tolerance. A microcontroller might accept 92% coverage; a safety-critical automotive chip demands 98%+. This is where knowing the application context matters in interviews.
Q4. Explain controllability and observability in DFT.
Controllability: how easily you can set a net to 0 or 1 from primary inputs (or scan). Observability: how easily you can observe a net’s value at primary outputs (or scan chain). Both are essential for testability. A net buried deep in logic with many reconverging paths has low controllability and observability.
ATPG implicitly measures both—if a net has low controllability, ATPG may struggle to justify test patterns. Observability limits which faults you can detect; if a faulty net’s effect never reaches an observable point, the fault is undetectable. In my experience, when designers say “we can’t get coverage above 85%,” it’s usually an observability problem. The fix: add additional flops in scan chains at strategic points, or use LBIST/MBIST to observe hard-to-reach areas. Here’s the key: scan chains improve both. By directly controlling flip-flop states, you increase controllability for downstream logic. By observing flip-flop values, you increase observability of upstream logic.
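Controllability can be quantified SCOAP-style. This sketch (a simplification of the full SCOAP rules, shown for AND/OR only) illustrates how the cost of setting a net to 1 grows through a stack of AND gates:

```python
# Simplified SCOAP controllability: CC0/CC1 of a primary input is 1;
# each gate adds 1 plus the cost of the input values needed.

def and_gate(a, b):
    return {"cc0": min(a["cc0"], b["cc0"]) + 1,  # one 0 input forces 0
            "cc1": a["cc1"] + b["cc1"] + 1}      # both inputs must be 1

def or_gate(a, b):
    return {"cc0": a["cc0"] + b["cc0"] + 1,      # both inputs must be 0
            "cc1": min(a["cc1"], b["cc1"]) + 1}  # one 1 input forces 1

pi = {"cc0": 1, "cc1": 1}      # primary input
n1 = and_gate(pi, pi)          # CC1 climbs with each AND level
n2 = and_gate(n1, pi)          # deep AND tree = hard to set to 1
y  = or_gate(n2, pi)           # OR output easy to set to 1 via the PI
```

High CC numbers flag exactly the nets where ATPG will struggle, and where adding a scan-accessible control point pays off.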
Q5. What is JTAG (IEEE 1149.1)? Explain the TAP controller and key instructions.
JTAG is a standard serial test interface with four mandatory pins: TCK (test clock), TDI (data in), TDO (data out), TMS (mode select), plus an optional TRST reset. The TAP controller is a state machine driven by TMS that sequences through states: reset, idle, shift-DR (shift data register), shift-IR (shift instruction register), capture, update. Key instructions: IDCODE (read chip ID), BYPASS (minimal path for daisy-chaining), EXTEST (external boundary scan), INTEST (internal scan).
JTAG enables testing without dedicated test pins—you can control scan chains, apply boundary scan, even debug silicon via the test interface. In my experience, JTAG is mandatory on modern chips. It’s used not just for production test but also for silicon debug, firmware loading, and field updates. The challenge: JTAG timing is tight—you must ensure TCK frequency doesn’t violate setup/hold on internal scan chain clocks. Also, some instructions (like EXTEST) can stress I/O drivers, so you need switching current analysis. Most candidates know JTAG conceptually but forget the practical details: how you clock the scan chain relative to TCK, how you avoid race conditions, how you synchronize test modes.
📌 Note: JTAG is daisy-chainable—multiple chips can share one test interface, with each chip’s TAP controller in series. The bypass instruction allows short-circuiting unused chips.
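The TAP controller is small enough to model directly. This sketch encodes the IEEE 1149.1 next-state table (state names abbreviated) and demonstrates the classic facts that five TCK cycles with TMS=1 reach Test-Logic-Reset from anywhere, and TMS = 0,1,0,0 then lands in Shift-DR:

```python
# IEEE 1149.1 TAP state machine: next state is a function of the
# current state and TMS, sampled on each rising edge of TCK.

NEXT = {  # state: (next if TMS=0, next if TMS=1)
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":  ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR": ("SHIFT_DR",   "EXIT1_DR"),
    "SHIFT_DR":   ("SHIFT_DR",   "EXIT1_DR"),
    "EXIT1_DR":   ("PAUSE_DR",   "UPDATE_DR"),
    "PAUSE_DR":   ("PAUSE_DR",   "EXIT2_DR"),
    "EXIT2_DR":   ("SHIFT_DR",   "UPDATE_DR"),
    "UPDATE_DR":  ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":  ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR",   "EXIT1_IR"),
    "SHIFT_IR":   ("SHIFT_IR",   "EXIT1_IR"),
    "EXIT1_IR":   ("PAUSE_IR",   "UPDATE_IR"),
    "PAUSE_IR":   ("PAUSE_IR",   "EXIT2_IR"),
    "EXIT2_IR":   ("SHIFT_IR",   "UPDATE_IR"),
    "UPDATE_IR":  ("RUN_TEST_IDLE", "SELECT_DR"),
}

def tap_walk(state, tms_bits):
    for tms in tms_bits:
        state = NEXT[state][tms]
    return state

state = tap_walk("SHIFT_IR", [1, 1, 1, 1, 1])  # force reset from anywhere
state = tap_walk(state, [0, 1, 0, 0])          # idle -> ... -> Shift-DR
```

Interviewers like the five-ones reset property because it means a host can synchronize with an unknown TAP without a TRST pin.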
Q6. What is boundary scan? How does it test I/O and inter-chip connections?
Boundary scan adds a scan chain around the chip’s I/O pins. Before each I/O pad, a mux lets you either drive (in test mode) or let normal logic drive the pin. This allows testing I/O cells and board-level interconnects without functional patterns. Example: you can force an output to 0 or 1, then observe what external logic does.
Boundary scan was revolutionary for board-level test—before it, you had to physically probe pins with oscilloscopes. With boundary scan, you test interconnects purely through JTAG. In my experience, boundary scan is critical for multi-chip modules. I’ve debugged field failures where boundary scan showed a trace was open on a PCB—without it, diagnosis would’ve required destructive analysis. The cost: boundary scan adds a mux at every I/O, which adds delay and area. Modern designs balance this by using selective boundary scan (only on critical I/O) or compressing boundary chains.
Q7. Explain the difference between stuck-at (SA) and transition delay faults (TDF).
Stuck-at assumes a net is permanently 0 or 1, regardless of input changes. Transition delay fault (TDF) assumes a net switches correctly but slowly—it reaches the correct final value but takes too long. TDF is more realistic (actual manufacturing defects often cause delay, not complete stuck-at), but harder to detect.
ATPG for TDF requires two patterns: launch pattern (set the net to one state) and capture pattern (force a transition, then capture the result at the next clock edge). If the net is slow, it won’t transition in time for capture, and the flip-flop captures the old value instead. In my experience, TDF is underutilized. Many designers focus on stuck-at coverage because it’s simpler, but TDF catches real manufacturing defects (process variations causing slower gates). Advanced nodes (7nm and below) are pushing TDF adoption because parametric failures (slow gates due to Vth variation) are more likely than hard faults. The challenge: TDF ATPG uses much more complex algorithms than stuck-at, so runtimes are longer.
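The launch/capture idea reduces to a simple race between a net's switching delay and the clock period; this toy model (illustrative only, no real timing engine) shows why the same slow net passes a relaxed test and fails at-speed:

```python
# Transition-fault toy model: a net needs `delay` time units to switch
# after launch; the capture edge arrives `period` units later.

def capture_value(old, new, delay, period):
    """Value latched at the capture edge after a launch transition."""
    return new if delay <= period else old

# A defective net needing 3 units to rise, launched 0 -> 1:
at_speed   = capture_value(0, 1, delay=3, period=2)   # capture too early
slow_speed = capture_value(0, 1, delay=3, period=10)  # plenty of margin
```

The at-speed capture latches the stale value (detecting the fault), while the slow-speed test latches the correct final value and lets the defect escape.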
Q8. What are scan DFFs? How do they differ from normal flip-flops?
A scan DFF (scan D flip-flop) is a standard flip-flop with a 2:1 multiplexer added at its D input. In normal mode, the mux selects the functional D. In scan mode, the mux selects SI (scan-in from the previous flip-flop). The flip-flop clocks normally in both modes, but the data source changes. This adds minimal logic (just a 2:1 mux) compared to specialized test flip-flops.
The mux control is SE (scan enable) signal. When SE=0, normal mode. When SE=1, scan mode. In my experience, the SE signal is often the highest-fanout signal on the chip—it must reach every flip-flop simultaneously to avoid skew. Poor SE distribution can cause timing issues (some flip-flops see SE late, creating race conditions). Also, SE transitions at clock edges can consume significant power. This is why careful clock domain design and gating of SE transitions matter. Candidates often underestimate the complexity of SE distribution in large designs.
Q9. Define test coverage and how it relates to yield/quality.
Test coverage measures how thoroughly tests detect manufacturing defects. 100% fault coverage means every detectable fault has a test pattern. But real yield impact depends on actual defect distribution. If tests focus on one area and miss defects elsewhere, coverage looks good but yield suffers.
The relationship: higher coverage generally correlates with higher yield, but it’s not linear. Going from 85% to 90% coverage catches more defects. Going from 95% to 98% catches even more, but fewer per percentage point. Yield ramp on new technology nodes is heavily influenced by test quality. If you ship products with inadequate tests, warranty costs and field returns skyrocket. In my experience, I’ve seen test escapes (defects that pass test but fail in the field) blamed on insufficient coverage. The fix: increase pattern count, switch to transition-delay testing, add MBIST/LBIST for hard-to-reach areas. This is why test development takes significant engineering effort and isn’t an afterthought.
Q10. What is LBIST (Logic Built-In Self-Test)? How does it differ from scan-based testing?
LBIST uses embedded logic (linear feedback shift registers, or LFSRs) to generate pseudo-random test patterns on-chip, then compresses responses into a signature. Unlike scan-based testing (which requires external ATE and detailed test vectors), LBIST runs autonomously. The challenge: LBIST is only pseudo-random; some faults might not be hit by the PRNG pattern sequence.
LBIST is excellent for: (1) at-speed testing (patterns run at functional clock frequency), (2) reducing test data volume, (3) field testing (chip tests itself). The downside: LBIST doesn’t guarantee 100% coverage for all faults. You need to carefully design the LFSR and signature compressor to avoid correlated patterns that miss certain faults. In my experience, LBIST is perfect for logic blocks (datapath, control) but less effective for distributed logic. Modern designs often use hybrid: LBIST for high-coverage quick tests, scan for detailed diagnosis.
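The LFSR-plus-MISR structure can be sketched in a few lines (toy 4-bit widths, an arbitrary circuit-under-test, and a made-up stuck-at defect; real LBIST uses much wider registers chosen for maximal-length polynomials):

```python
# LBIST sketch: a 4-bit LFSR generates pseudo-random stimulus and a
# 4-bit MISR compacts circuit responses into a signature.

def lfsr_step(state, taps=(3, 2)):     # maximal-length: period 15
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & 0xF

def misr_step(state, data, taps=(3, 2)):
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return (((state << 1) | fb) ^ data) & 0xF   # fold response into state

def run_lbist(circuit, n_patterns, seed=0b1000):
    lfsr, misr = seed, 0
    for _ in range(n_patterns):
        misr = misr_step(misr, circuit(lfsr))   # compact this response
        lfsr = lfsr_step(lfsr)                  # next pseudo-random pattern
    return misr

good = run_lbist(lambda p: p ^ 0b0101, 15)               # fault-free logic
bad  = run_lbist(lambda p: (p ^ 0b0101) | 0b0001, 15)    # bit 0 stuck-at-1
```

A single signature compare at the end replaces bit-by-bit response checking; the price is aliasing risk, since a faulty response sequence can, with small probability, fold into the fault-free signature.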
Q11. What are X-states? Why are they problematic in test compression?
X-states are unknown/don’t-care values. In simulation, X propagates: if any input to a gate is X, the output is often X, unless another input carries the gate’s controlling value (like a 0 on an AND gate, which forces the output to 0 regardless). During test, X-states cause problems: if you can’t determine the value of an internal net, you can’t reliably predict flip-flop captures. This breaks pattern generation.
In test compression (discussed later), X-states are especially problematic: an X entering the output compactor can corrupt the compacted value of every chain XORed with it, destroying observability of known bits. ATPG tools handle X several ways: masking X-capturing flops at the compactor, constraining patterns so X never propagates, or bounding the X sources. In my experience, X-states cause many ATPG pattern generation failures. The fix: ensure all flip-flops are initialized (no X), use reset sequences before test, or add initialization logic to force known states. This is a practical detail many candidates overlook but is critical when debugging failed ATPG runs.
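Three-valued (0/1/X) gate evaluation is the mechanism behind all of this; a minimal sketch of the AND/OR rules shows when X is blocked and when it escapes:

```python
# Three-valued logic: X propagates unless another input carries the
# gate's controlling value.

X = "X"

def and3(a, b):
    if a == 0 or b == 0:       # 0 is controlling for AND
        return 0
    if a == 1 and b == 1:
        return 1
    return X

def or3(a, b):
    if a == 1 or b == 1:       # 1 is controlling for OR
        return 1
    if a == 0 and b == 0:
        return 0
    return X

blocked = and3(X, 0)   # controlling 0 masks the X
escaped = or3(X, 0)    # non-controlling 0 lets the X through
```

ATPG exploits exactly this: assigning a controlling value on a side input "blocks" an X source, while non-controlling values let it propagate toward (and corrupt) capture points.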
Q12. Explain scan insertion. What’s the flow and what are the challenges?
Scan insertion converts normal flip-flops to scan flip-flops and chains them. The flow: (1) Mark which flip-flops get scan capability, (2) Replace with scan DFFs or add mux to D input, (3) Route scan chains (SI->D1, Q1->SI of next, etc.), (4) Add SE (scan enable) signal, (5) Verify chain connectivity, no loops. Challenges: (1) Area overhead from scan muxes, (2) Timing (scan mux adds delay), (3) Power (SE has high fanout), (4) Clock domain complexity (scan chains span multiple clock domains).
In my experience, scan insertion is usually automated by EDA tools (DFT Compiler, Tessent). But you need to specify which flip-flops scan, which aren’t testable, exceptions. Bad specifications lead to incomplete chains or untestable logic. Modern tools also handle clock domain crossing—if a scan chain spans two clock domains, you need synchronizers at the boundary. Getting this right is crucial; incomplete or malformed scan chains make ATPG impossible.
Intermediate: Compression, MBIST & Advanced Techniques
Q13. What is test compression? How does EDT (Embedded Deterministic Test) work?
Test compression reduces the volume of test patterns (and therefore test time and cost). EDT (Embedded Deterministic Test), developed by Mentor Graphics (now Siemens EDA) and shipped as Tessent TestKompress, works by: (1) an on-chip decompressor expands compressed stimulus from a few ATE channels onto many short internal scan chains, (2) an output compactor condenses the chain responses back down to a few channels. Instead of shifting full patterns into scan chains, you shift compressed patterns, decompress on-chip, run functional clocks, compact results, shift out. This reduces data volume by 10-100x.
Why it matters: on large designs (billions of transistors), full-scan testing would require exabytes of test data and days of test time. Compression brings it to manageable levels (hours). The cost: adding decompressors/compressors consumes area and power, and the ATPG process is more complex. In my experience, compression is essential for modern chips. Without it, test cost alone would kill the product economics. The challenge: compression only works because most pattern bits are don’t-cares—ATPG must solve for compressed inputs that reproduce the deterministic care bits after decompression. And the decompressor/compactor is itself logic that must be defect-free, so it gets its own integrity checks before pattern application.
💡 Tip: Compression ratio is roughly (number of internal chains) / (number of external scan channels). In practice it is limited by care-bit density: patterns with many specified bits give the decompressor fewer degrees of freedom and compress poorly.
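Back-of-envelope math makes the benefit tangible (illustrative numbers, not from any real design): splitting the same flops into many short internal chains behind a decompressor cuts both shift cycles and ATE data volume.

```python
# Test-data arithmetic for flat scan vs. EDT-style compression.

def scan_stats(n_flops, n_chains, n_channels, n_patterns):
    chain_len = -(-n_flops // n_chains)              # ceiling division
    shift_cycles = chain_len * n_patterns            # shift time per test
    data_bits = n_channels * chain_len * n_patterns  # bits the ATE drives
    return shift_cycles, data_bits

# Flat scan: 1M flops, 8 chains driven directly by 8 ATE channels.
flat_cycles, flat_bits = scan_stats(1_000_000, 8, 8, 10_000)

# EDT-style: same flops in 800 short chains behind an 8-channel decompressor.
edt_cycles, edt_bits = scan_stats(1_000_000, 800, 8, 10_000)

ratio = flat_bits / edt_bits   # ~ internal chains / external channels
```

With 800 internal chains on 8 channels the data volume and shift time both drop by 100x, matching the chains-over-channels rule of thumb.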
Q14. What is MBIST (Memory Built-In Self-Test)? How is it different from logic BIST?
MBIST is embedded logic to test memories (SRAM, DRAM, ROM) on-chip without external test vectors. MBIST generates address sequences, write/read patterns, and checks results against expected values. Common algorithms are March tests (March C-, March A, March B, March X), which sweep all addresses in ascending and descending order with interleaved read and write elements; checkerboard and pseudorandom patterns supplement them.
MBIST is essential because: (1) Memory defects are common (bit-flips, stuck bits, weak cells), (2) External testing is slow (billions of bits in modern chips), (3) MBIST can run at-speed (functional clock). Unlike logic BIST (which tests gates), MBIST specifically targets memory faults: stuck-at, transition delay, coupling faults (writes in one cell disturb nearby cells). In my experience, MBIST often catches 95%+ of memory defects. The challenge: MBIST area overhead (address generators, pattern generators, checkers) can be significant for small memories. For memories larger than ~64KB, MBIST is almost always cost-effective.
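A March algorithm is easy to demonstrate on a toy memory model. This sketch (assumed word-level stuck-at fault at one address, a simplification of real bit-level fault models) runs the six March C- elements — w0 up; r0,w1 up; r1,w0 up; r0,w1 down; r1,w0 down; r0 — and flags the failing address:

```python
# March C- over a toy memory with an injected stuck-at fault.

class FaultyMem:
    def __init__(self, size, stuck_addr=None, stuck_val=0):
        self.cells = [0] * size
        self.stuck_addr, self.stuck_val = stuck_addr, stuck_val

    def write(self, a, v):
        # The stuck cell ignores writes and keeps its stuck value.
        self.cells[a] = self.stuck_val if a == self.stuck_addr else v

    def read(self, a):
        return self.cells[a]

def march_c_minus(mem, size):
    fails = []
    up, down = range(size), range(size - 1, -1, -1)
    elements = [(up, None, 0), (up, 0, 1), (up, 1, 0),
                (down, 0, 1), (down, 1, 0), (down, 0, None)]
    for order, expect_rd, wr in elements:
        for a in order:
            if expect_rd is not None and mem.read(a) != expect_rd:
                fails.append(a)              # miscompare: log address
            if wr is not None:
                mem.write(a, wr)
    return sorted(set(fails))

bad  = march_c_minus(FaultyMem(16, stuck_addr=5, stuck_val=0), 16)
good = march_c_minus(FaultyMem(16), 16)
```

The read-then-write-opposite structure is what catches not just stuck bits but address-decoder and coupling faults; the ascending/descending sweeps distinguish faults sensitized by neighbors written before versus after the victim.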
Q15. Explain the difference between EDT and TestKompress (or similar decompression)
A trick lurking in this question: TestKompress is Mentor’s (now Siemens EDA’s) commercial implementation of EDT, so they are not competing algorithms. The meaningful comparison is between EDT-style deterministic compression and other schemes, such as Synopsys DFTMAX adaptive scan or LFSR-reseeding approaches. EDT feeds a handful of external channels through a linear (XOR-network) decompressor into many short internal chains; ATPG solves linear equations so the decompressed stream reproduces every deterministic care bit, while don’t-care bits get pseudo-random fill.
Deterministic linear decompression gives precise control for hard-to-hit faults; reseeding-style schemes trade some of that control for simpler hardware and faster pattern generation. In my experience, the compression scheme in practice follows the tool flow—Tessent designs use EDT/TestKompress, Synopsys flows use DFTMAX—and both reach comparable coverage when properly configured. The real differentiators are X-tolerance of the compactor, diagnosis support, and ATPG runtime.
Q16. What is OCC (On-Chip Clock)? Why is it used in advanced DFT?
OCC (On-Chip Clocking controller) hardware lets test use the chip’s own PLL-derived clocks for capture. During shift, flops run on the slow ATE-supplied clock; during capture, the OCC gates out a short programmable burst (typically two pulses, launch and capture) from the fast functional clock. This enables at-speed test without requiring the ATE to generate fast, precisely timed clocks.
Why it matters: ATE can’t always generate precisely-timed clocks at high speed. OCC solves this by generating launch and capture clocks on-chip, with exact delays. This is critical for transition-delay testing at-speed. In my experience, OCC is becoming standard on advanced nodes. It’s especially valuable when test frequency is higher than ATE frequency (e.g., test at 2GHz, but ATE only drives 100MHz clock).
Q17. Explain scan segmentation. When is it necessary?
Scan segmentation splits long scan chains into shorter chains, either by partition or by use of scan multiplexers. One long chain of 10,000 flip-flops requires 10,000 shift clocks. Multiple shorter chains (say, 4 chains of 2,500 each) shift in parallel, reducing test time by 4x.
Segmentation is necessary when: (1) Chain length exceeds tester capabilities, (2) ATE I/O bandwidth is limited, (3) Power/thermal limits prevent shifting at high frequency. The cost: more SI/SO pins (tester I/O), more routing complexity. In my experience, modern designs have 4-16 independent scan chains to balance test time and pin count. The tradeoff: shorter chains shift faster (less capacitive load), but you need more I/O. Modern testers support dozens of simultaneous scan inputs, so segmentation is less of a bottleneck than it once was.
Q18. What is at-speed capture in DFT? How does it relate to timing faults?
At-speed capture is when you launch a test pattern, apply functional clocks at rated frequency (not slow-speed), and capture the response. This stresses timing paths and detects delay faults (transition faults) that slow-speed tests miss. Slow-speed tests relax timing margins, allowing marginal delays to pass.
At-speed testing is critical on modern nodes where timing margins are tight (5-10%). A gate that’s 5% slower due to process variation passes a slow-speed test (say, a relaxed 10 ns clock period) but fails at-speed (a 2 ns period). In my experience, at-speed testing is mandatory for yield ramp on advanced nodes. The challenge: at-speed capture requires precise clock generation and synchronization. If your ATE can’t deliver synchronized launch/capture clocks, or if clock distribution in your design has skew, at-speed results become unreliable.
Q19. Explain the scan shift and capture clock terminology.
Shift clock: drives the flops during shift operations to move data through scan chains. It may come from TCK, a dedicated scan clock pin, or a divided functional clock, and usually runs slowly (10-100 MHz) to limit power and IR drop across long chains. Capture clock: clocks the flops during the capture phase so they latch functional logic outputs. Often high frequency (functional rate) for at-speed testing. Both clocks are specified in the DFT protocol.
Clock control logic (or an OCC) determines which clock reaches the flops: the shift clock during shift, the functional clock during capture. If you specify wrong frequencies or don’t properly gate clocks, you get race conditions. I’ve debugged designs where the shift clock accidentally fed the capture path, causing bit-flips during shift operations. Getting the clock architecture right is subtle but critical.
📌 Note: Some designs use the same clock for both shift and capture but at different frequencies (gated/divided). This requires careful synchronization to avoid glitches.
Q20. What is hierarchical DFT? How do you approach DFT in modular designs?
Hierarchical DFT applies DFT at module level, then integrates modules. Each module has its own scan chains, JTAG, BIST. At the top level, you chain together module test interfaces. This allows testing each module independently and the full chip together.
Hierarchical DFT is essential for: (1) Large designs with multiple teams—teams can develop module DFT independently, (2) Reusable IP blocks—each block includes its own test logic, (3) Power management—different power domains can test independently. The challenge: integrating module scan chains is complex. If Module A’s output scan chain feeds Module B’s input, you need careful ordering to avoid deadlocks or race conditions. Also, some paths cross module boundaries; you need to ensure test coverage at boundaries. This is advanced stuff; most candidates don’t encounter it unless in large-design teams, but interviewers at big companies love testing your understanding of hierarchical thinking.
Q21. What is IEEE 1500? When would you use it instead of scan?
IEEE 1500 is a standard for testing embedded cores (pre-designed IP blocks). It defines a wrapper around each core with a WIR (wrapper instruction register), a wrapper boundary register (WBR) on the core’s ports, a wrapper bypass register (WBY), and a serial port (WSI/WSO plus control signals). Unlike 1149.1, there is no per-core TAP state machine; wrappers are typically controlled from the chip-level JTAG TAP or a test access manager. This allows modular test—you can test the core independently of the rest of the chip.
You’d use IEEE 1500 when: (1) You’re using multiple third-party IP cores (memory compiler, CPU), (2) Each core has its own test requirements, (3) You want to debug cores independently. The advantage: standard interface—if you have an ARM CPU core with IEEE 1500 wrapper, you know exactly how to test it without needing core-specific knowledge. The disadvantage: wrapper adds area and complexity. In my experience, IEEE 1500 is becoming more common as IP reuse increases. However, most designs still use simpler approaches (custom scan chains) unless they have complex IP integration.
Advanced: Cell-Aware ATPG & Functional Safety
Q22. What is cell-aware ATPG? How does it differ from gate-level ATPG?
Gate-level ATPG models cells as black boxes (truth tables). Cell-aware ATPG models internal transistor-level behavior of cells, detecting intra-cell faults that gate-level misses. Example: in a NAND gate, a transistor-level short might not cause the output to get stuck (gate-level fault) but instead cause a slow transition (intra-cell delay fault).
Cell-aware ATPG requires: (1) Detailed cell models (transistor layouts), (2) SPICE-level characterization of faults, (3) More complex ATPG algorithms. The benefit: 5-15% additional fault coverage compared to gate-level. In my experience, cell-aware ATPG is increasingly important on advanced nodes where intra-cell effects dominate. However, it adds significant ATPG complexity and runtime. Most companies use cell-aware only on critical paths or power domains where yield risk is highest.
Q23. Explain IJTAG (IEEE 1687). What’s the difference from traditional JTAG?
IJTAG (IEEE 1687, “Internal JTAG”) standardizes access to embedded instruments (BIST engines, sensors, monitors) through reconfigurable scan networks. Traditional JTAG is flat: one TAP with fixed-length data registers, everything daisy-chained. IJTAG inserts Segment Insertion Bits (SIBs) that open and close sub-networks on the fly, so the active scan path contains only the instruments you’re addressing. Instruments are described in ICL (Instrument Connectivity Language) and operated through PDL (Procedural Description Language), with access usually via the existing 1149.1 TAP.
Why it matters: on SoCs with many IP blocks, hierarchical JTAG is cleaner. Instead of one massive JTAG daisy-chain, each subsystem has its own TAP, and the top-level TAP orchestrates testing. This improves: (1) Modularity—test each subsystem independently, (2) Bandwidth—parallel test paths, (3) Power management—control per-subsystem during test. In my experience, IJTAG is becoming standard for complex SoCs. However, tool support is still maturing—not all ATPG/DFT tools fully support IJTAG yet.
Q24. What is FMEDA (Failure Mode, Effects & Diagnostic Analysis)? How does it relate to functional safety (ISO 26262)?
FMEDA is a systematic analysis of failure modes and their safety impact. In automotive safety (ISO 26262), you must ensure that faults don’t cause unsafe behavior. FMEDA identifies: what faults can occur, what happens if they occur (diagnostic coverage), whether tests can detect them.
ISO 26262 requires: fault detection via diagnostic coverage (DC). For example, if a logic error could cause unsafe steering command, your test must detect that logic with high probability. DC = (faults that can be diagnosed) / (total faults). Achieving 99%+ DC requires aggressive testing—sometimes more aggressive than traditional test coverage targets. In my experience, safety-critical chips (automotive, medical) require functional safety analysis alongside DFT. This is beyond traditional DFT—you’re not just maximizing coverage, you’re ensuring critical faults are caught. Candidates familiar with both DFT and safety standards stand out in interviews for safety-critical companies.
Q25. How do you handle timing-dependent faults in test? (Timing-aware ATPG)
Timing-dependent faults occur only at certain frequencies or clock alignments. Example: a marginal hold path corrupts data only under particular skew and data conditions, so a pattern must both sensitize the path and be applied with realistic timing. Timing-aware ATPG correlates ATPG patterns with STA constraints so the generated patterns exercise the truly critical paths.
This requires: (1) Integration with STA tools to know timing constraints, (2) ATPG understanding of which patterns stress timing, (3) Synchronized ATE clocking. In my experience, timing-aware ATPG is a frontier area. Some modern tools (Tessent, TetraMAX) support it, but many designs still use separate timing verification and test generation. Ideally, they’d be unified—test patterns that catch timing faults are worth their weight in silicon.
Q26. Explain in-system test and field testing. When is it necessary?
In-system test runs on-chip during operation (after shipping), typically using LBIST or MBIST. Field testing is when a chip tests itself periodically in the customer’s system. This catches defects that escaped manufacturing test (rare but catastrophic).
In-system test is critical for: (1) Long-running systems (servers, network equipment)—periodic self-test catches aging failures, (2) Safety-critical (automotive, medical)—continuous monitoring ensures no silent failures, (3) Space/harsh environments—where field repair is impossible. The challenge: in-system test competes for processor cycles and power. You must balance thoroughness against performance impact. In my experience, automotive and aerospace mandate in-system test. Consumer products rarely use it (cost/power not justified). This is a practical distinction—interviewers care about understanding when in-system test is worth the investment.
Q27. What are the main challenges in scan chain debugging?
Scan chain debugging happens when: (1) Expected patterns don’t shift correctly, (2) Captured data is corrupted, (3) Signature mismatches in BIST. Root causes: (1) Incomplete chains (SI/SO not connected properly), (2) Timing violations (shift clock too fast), (3) Reset issues (flip-flops not in known state), (4) SE signal glitches or skew.
Debugging approach: (1) Verify netlist—is chain connected correctly? (2) Run static checks—any floating inputs, undriven nets? (3) Simulate shift operation—does it work in simulation? (4) If hardware fails, use logic analyzer to trace SI/SO signals. In my experience, scan chain debugging is tedious but straightforward if you’re systematic. The most common issue: SE signal not reaching some flip-flops correctly (timing, gating, missing driver).
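The systematic approach above can be illustrated with a flush-pattern sketch (toy chain model with a hypothetical stuck link between two flops): shift a known 0011 sequence through and compare what comes out.

```python
# Chain-integrity ("flush") debug: a chain model with one stuck
# SI-to-SO link; downstream flops never see real data.

def shift_chain(length, stuck_after, pattern):
    """Shift `pattern` through; link after flop `stuck_after` stuck at 0."""
    flops = [0] * length
    out = []
    for b in pattern:
        out.append(flops[-1])                 # sample SO before the edge
        for i in range(length - 1, 0, -1):    # shift toward SO
            src = flops[i - 1]
            if stuck_after is not None and i - 1 == stuck_after:
                src = 0                       # broken/stuck connection
            flops[i] = src
        flops[0] = b                          # SI feeds the first flop
    return out

flush = [0, 0, 1, 1] * 4
good = shift_chain(8, None, flush)   # flush emerges after 8-cycle latency
bad  = shift_chain(8, 3, flush)      # link after flop 3 stuck: SO stays 0
```

On a healthy 8-flop chain the flush pattern appears at SO after exactly 8 shift cycles; a constant SO, or a pattern that arrives with the wrong latency, tells you roughly where the chain is broken, which is exactly how chain diagnosis tools localize defects.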
Q28. How do you measure and improve yield using DFT data?
Yield data from test: pass/fail patterns, defect locations (if available), signatures. Analysis: cluster failures by die location, compare to known defect models (opens, shorts in specific layers), identify systematic defects (yield killers). If many chips fail at one location, suspect a process issue (e.g., via yield problem in metal layer 2).
Improvement: work with process team to fix identified yield killers. Use in-situ test data to inform design changes—if a particular logic block has high defect density, consider guard banding (more margin, stronger cells) or design changes. In my experience, the feedback loop (test data → yield analysis → design changes) is critical for mature nodes. First silicon usually has 70-80% yield; with three to four iterations, it reaches 95%+. DFT data is invaluable for these iterations because it pinpoints where defects occur.
Practical & Tools: DFT Compiler, TetraMAX & Tessent
Q29. What is DFT Compiler? Explain the basic workflow.
DFT Compiler (Synopsys) automates scan insertion and DFT optimization. Workflow: (1) Read netlist and specify DFT rules (which flip-flops scan, exclusions), (2) Identify logic that needs testing, (3) Insert scan muxes and create chains, (4) Optimize for area/timing, (5) Generate test mode logic (SE distribution, reset), (6) Output results (netlist, test ports).
Key features: automatic scan insertion, scan chain ordering (for power, timing, or other constraints), clock domain handling, test port generation. In my experience, DFT Compiler is straightforward to use for standard flows. Challenges arise with: (1) Complex designs—multiple clock domains, power gating, resets, (2) Constraints—if you specify conflicting requirements, tool may fail, (3) Coverage optimization—getting the last few percent of coverage often requires manual intervention.
📌 Note: DFT Compiler output is a gate-level netlist with scan inserted. You must then pass this to place-and-route and ATPG tools.
Q30. What is TetraMAX? Explain the ATPG flow.
TetraMAX (Synopsys; rebranded TestMAX ATPG in newer releases) is an automatic test pattern generation (ATPG) tool. Flow: (1) Read netlist and libraries, (2) Build fault model (stuck-at, TDF, etc.), (3) Generate patterns to detect faults—this is the heavy-lifting step, (4) Compact patterns (remove redundant patterns), (5) Output test vectors in STIL format.
TetraMAX uses sophisticated algorithms (D-algorithm, FAN algorithm, or proprietary) to justify and propagate faults. It can handle complex designs (billions of transistors) by using hierarchical analysis and abstraction. In my experience, TetraMAX is the industry standard for ATPG. Key challenges: (1) Runtime—large designs take hours to ATPG, (2) Coverage—not all faults are detectable; TetraMAX reports undetectable faults, (3) X-handling—must properly manage unknown values. Advanced users exploit: (1) Test patterns from design (user-specified patterns), (2) Compression integration (EDT, TestKompress), (3) Power optimization (reducing peak test power).
Q31. What is Tessent? When would you use it instead of TetraMAX?
Tessent (Mentor, now part of Siemens) is an alternative ATPG/DFT tool. Like TetraMAX, it generates test patterns and optimizes DFT. Some say Tessent is faster for very large designs; others prefer TetraMAX’s pattern quality. Both are industry-leading.
Tool choice often depends on: (1) Company tool ecosystem—if you use other Synopsys tools (Design Compiler, PrimeTime), TetraMAX integrates seamlessly, (2) Specific features—each tool has unique capabilities (Tessent excels at certain compression schemes, TetraMAX at certain ATPG techniques), (3) Support/training. In my experience, both tools produce equivalent results if properly configured. The real difference is integration and workflow. Most large companies support both and let teams choose based on preference. If you’re interviewing and not experienced with the company’s tool, saying “I know TetraMAX well and can learn Tessent quickly” is fine.
Q32. Explain SDC DFT commands for scan insertion and ATPG setup.
SDC (Synopsys Design Constraints) includes DFT-specific commands: `set_scan_configuration` (specify scan chain length, multiplexing), `set_scan_signal` (mark signals as scan-related), `set_active_scan_mode`, `set_test_hold`. These constraints guide DFT tool behavior.
Example: you might specify “exclude these flip-flops from scan,” “use 4 independent scan chains,” “apply this reset sequence before test.” DFT Compiler reads these constraints and optimizes accordingly. In my experience, most teams use default SDC settings, which usually work fine. Advanced teams customize SDC to: (1) improve pattern coverage, (2) reduce power during shift, (3) optimize for compression ratios. Understanding SDC DFT commands separates junior from senior designers.
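As an illustrative sketch of the "4 independent chains, exclude these flops" example above (commands as commonly used in DFT Compiler Tcl flows; exact options vary by tool version and the instance names are hypothetical, so treat this as a shape rather than a verbatim script):

```
# Ask for 4 balanced chains and forbid mixing clock domains within a chain
set_scan_configuration -chain_count 4 -clock_mixing no_mix

# Exclude flip-flops that must stay out of scan (hypothetical instances)
set_scan_element false [get_cells U_ANALOG/sync_ff*]
```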
Q33. What is a STIL file? How is it used in test?
STIL (Structured Test Input Language) is a format for test patterns and procedures. STIL files specify: (1) I/O timing (setup/hold relative to clocks), (2) Test vectors (values for each scan operation), (3) Specifications (which signals are clocks, scans, resets), (4) Procedures (sequences of scan shift/capture operations).
STIL is the bridge between ATPG tools (TetraMAX) and ATE (test equipment). ATPG generates patterns, exports to STIL, then ATE reads STIL and executes test. In my experience, STIL files can be large (gigabytes for complex designs with high coverage). Compression helps—compressed STIL files are much smaller. Working with STIL requires understanding timing specs (when to clock, when to sample), signal definitions, and the tester’s capabilities.
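A heavily abridged fragment showing the flavor of the format (real STIL files also carry `Timing`, `PatternBurst`, and `Pattern` blocks; the signal names here are hypothetical):

```
STIL 1.0;
Signals      { "clk" In; "SE" In; "SI" In; "SO" Out; }
SignalGroups { "scan_io" = '"SI" + "SO"'; }
Procedures {
  "load_unload" {
    // one Shift vector per scan cell; '#' marks per-pattern scan data
    Shift { V { "SI" = #; "SO" = #; "clk" = P; } }
  }
}
```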
Q34. Explain the role of ATE (Automatic Test Equipment) in production testing.
ATE is the hardware that physically executes test patterns on manufactured chips. ATE supplies: (1) Digital I/O channels (drive pins, measure pins), (2) Clocks at precise frequencies and phases, (3) Analog resources (power supply, current measurement). Modern ATE can test multiple chips in parallel (“tester sites”).
ATE capabilities constrain DFT design: (1) Pin count—ATE has limited I/O channels; if you have 32 scan inputs but ATE has only 16 channels, you need multiple test runs, (2) Frequency—ATE might support up to 500MHz, but your chip runs at 2GHz; you need special test modes (OCC, on-chip clock controller), (3) Timing precision—ATE timing is ±nanoseconds; if your test margins are ±picoseconds, you need on-chip synchronization. In my experience, ATE is a major cost factor. Designs that minimize ATE requirements (fewer I/O, slower shift clocks, less parallel testing) reduce test cost significantly.
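The shift-clock and chain-length trade-offs above are easy to quantify. A back-of-the-envelope Python sketch (all numbers are assumed for illustration; real testers add per-pattern and setup overhead):

```python
def scan_test_time(patterns, chain_len, shift_mhz, capture_cycles=1):
    """Rough scan test time: (shift + capture) cycles per pattern,
    divided by the shift clock rate. Ignores tester overhead."""
    cycles = patterns * (chain_len + capture_cycles)
    return cycles / (shift_mhz * 1e6)   # seconds

# e.g. 10k patterns, 2000-bit chains, 50 MHz shift clock (assumed figures)
t = scan_test_time(10_000, 2_000, 50)
print(f"{t * 1000:.0f} ms of scan time per die")
```

Halving chain length (by doubling chain count, if ATE channels allow) or doubling the shift clock each roughly halves this number, which is exactly why pin count and shift power become the binding constraints.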
Q35. How do you debug low fault coverage? Walk through the process.
Low coverage diagnosis: (1) ATPG reports show undetectable faults and hard-to-detect faults, (2) Analyze patterns—why can’t certain faults be hit? (3) Check observability—can the fault effect reach a primary output or flip-flop? (4) Check controllability—can you set up the circuit to sensitize the path? (5) Add test logic if needed (extra flip-flops, probe points, LBIST).
Root causes of low coverage: (1) Internal logic—some blocks (e.g., parity checkers, CRC generators) are inherently hard to test, (2) Redundancy—some logic is intentionally redundant for reliability, so faults don’t manifest, (3) Missing scan—if certain flip-flops aren’t in scan, downstream logic becomes harder to test. The fix: add observation points (extra flip-flops in scan), increase scan chain density, or formally verify that undetectable faults are safe. In my experience, getting from 90% to 95% coverage is hard—it requires this kind of detailed analysis.
Q36. Explain power and thermal management during scan-based testing.
Scan shift operations consume significant power: shifting long chains at high frequency causes high switching activity. This generates heat and can exceed power supply margins, causing test failures. Thermal stress can also cause parametric failures (timing margins shrink at elevated temperatures).
Mitigation: (1) reduce shift clock frequency (slower shift = less power, but longer test time), (2) segment scan chains (distribute power across multiple chains), (3) low-power X-fill (fill don’t-care bits in patterns to minimize transitions, e.g., adjacent fill), (4) power-aware ATPG (generate patterns with lower power peaks). In my experience, power during test is a real challenge on advanced nodes. I’ve seen test failures due to Vdd sag during heavy shifting. The trade-off: you want fast test (higher clock), but power constraints limit frequency. Careful engineering of shift clock distribution and power delivery is critical.
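Shift power is often estimated with the Weighted Transition Metric (WTM): a transition shifted in early ripples through more chain cells, so it is weighted more heavily. A small Python sketch (the weight convention depends on bit ordering; this assumes the first list element enters the chain first):

```python
def weighted_transitions(pattern):
    """WTM for one scan-in vector: each adjacent-bit transition is weighted
    by how many shift cycles it will ripple through the chain."""
    n = len(pattern)
    return sum(n - 1 - i
               for i in range(n - 1)
               if pattern[i] != pattern[i + 1])

print(weighted_transitions([0, 1, 0, 1]))  # dense transitions -> high WTM
print(weighted_transitions([0, 0, 1, 1]))  # adjacent-fill style -> low WTM
```

This is why adjacent fill helps: assigning each don't-care the value of its neighbor removes transitions and directly lowers the metric.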
Q37. What is multi-clock ATPG? How do you set it up?
Multi-clock ATPG handles designs with multiple clock domains. ATPG must respect: (1) Each clock domain’s constraints (clock frequency, phase, skew), (2) CDC synchronizers (no direct paths from one clock to another), (3) Clock domain crossing rules (don’t apply setup/hold across domains). Setup requires: defining each clock, specifying domain boundaries, marking asynchronous paths.
Multi-clock ATPG is complex because: (1) Patterns must synchronize across clock domains (requires careful launch/capture timing), (2) Some clock combinations might be invalid (e.g., two clocks can’t launch/capture simultaneously if domains are asynchronous), (3) Coverage goals might conflict between domains. In my experience, multi-clock designs are increasingly common (SoCs with multiple power domains, clock gating). Setting up ATPG correctly for multi-clock designs is critical. Mistakes lead to invalid patterns that either fail on ATE or introduce race conditions.
Q38. Explain the difference between test compression and test data reduction.
Test compression reduces pattern volume on-the-fly by expanding compressed data on-chip (EDT, TestKompress). Test data reduction reduces volume through ATPG: by generating minimal patterns (e.g., 5000 patterns instead of 10000), you naturally reduce data. Both reduce test time and cost, but via different mechanisms.
Compression is usually more effective (10-100x reduction), but adds hardware (decompressor/compressor). Data reduction is simpler (just fewer patterns), but not as dramatic. In my experience, most modern designs use compression because the hardware cost is small compared to test time savings. Data reduction is a secondary optimization—you compress first, then optimize pattern count within the compressed space.
Q39. What’s the relationship between DFT and power management (power gating, voltage scaling)?
Power gating can isolate logic blocks from supply voltage (turning off power domains). During test, you must either: (1) keep all power domains on (simpler test, but defeats power gating test benefit), or (2) test each domain independently (requires domain-specific test logic). Voltage scaling similarly complicates test—patterns valid at nominal Vdd might fail at reduced voltage.
Handling power domains in DFT: (1) scan chains don’t cross power domain boundaries (unless isolation cells and level shifters are in place), (2) each domain has independent reset/clock distribution, (3) test orchestration must power-gate domains in correct sequence. In my experience, power-gated designs are increasingly common for power efficiency. DFT must handle this gracefully—poor integration of power and DFT leads to complex test sequences and sometimes untestable logic.
Q40. Explain the concept of test pattern grading and its impact on yield.
Test pattern grading prioritizes patterns by fault coverage effectiveness. Some patterns detect many faults (high-value), others detect few (low-value). If test time is limited, you run high-value patterns first; if the test-time budget allows, you add the marginal patterns. Grading helps: (1) reduce test time by eliminating redundant patterns, (2) ensure critical faults are tested early.
Grading impact on yield: if you grade and run only top-50% patterns due to time constraint, you might hit 85% coverage. Running all patterns gives 95% coverage. The question: is the extra 10% coverage worth the test time cost? For safety-critical chips, yes. For consumer products, maybe not. In my experience, grading is a practical tool for managing test cost vs. quality trade-offs. Most production test uses graded patterns—you test the most critical faults thoroughly, then add additional faults if time permits.
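Grading is essentially a greedy set-cover over the fault lists each pattern detects. A minimal Python sketch (pattern names and fault IDs are invented for illustration):

```python
def grade_patterns(detects):
    """Greedy pattern grading: repeatedly pick the pattern that detects
    the most not-yet-covered faults. detects maps pattern -> fault-id set."""
    remaining = dict(detects)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break                      # remaining patterns add no coverage
        covered |= remaining.pop(best)
        order.append(best)
    return order

detects = {"p1": {1, 2, 3, 4}, "p2": {3, 4}, "p3": {5}, "p4": {1, 5}}
print(grade_patterns(detects))  # p2 and p4 turn out to be fully redundant
```

The returned order is exactly the "run high-value patterns first" list: truncating it at any point gives the best coverage achievable in that many patterns under the greedy heuristic.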
| Company / Domain | Key Topics Emphasized | Difficulty Level |
|---|---|---|
| Qualcomm / NVIDIA | Advanced DFT (compression, cell-aware ATPG, multi-clock ATPG), MBIST, LBIST, power-aware testing, at-speed testing at 2GHz+, hierarchical DFT, yield correlation. | Hard |
| Intel / AMD | Scan insertion, ATPG basics, coverage analysis, timing-aware test, JTAG, boundary scan, test compression, ECO test updates. | Medium-Hard |
| TI / NXP / Infineon | Functional safety (ISO 26262), DFT for safety, FMEDA, fault grading for safety, LBIST for continuous monitoring, power-aware testing. | Hard |
| Synopsys / Cadence / Siemens | DFT tool internals (TetraMAX/Tessent algorithms), test compression (EDT/TestKompress), ATPG algorithm (D-algorithm, FAN), tool flow integration. | Hard |
| Broadcom / Marvell | Basic scan, fault coverage, ATPG patterns, JTAG, clock domain crossing, test compression basics, yield debugging. | Medium |
| Design Services / Freshers | Scan chains, fault models (stuck-at), ATPG overview, JTAG basics, test coverage, DFT Compiler basics, basic debugging. | Easy-Medium |
Resources & Further Learning
Key References:
- Test Compression and Silicon Debug by Rajski & Tyszer — comprehensive DFT and compression fundamentals.
- Digital Systems Testing and Testable Design by Bushnell & Agrawal — classic DFT textbook, excellent for ATPG theory.
- IEEE 1149.1 (JTAG) and IEEE 1687 (IJTAG) standards — formal specifications.
- Synopsys DFT Compiler and TetraMAX documentation — practical tool references.
- Mentor Tessent documentation — alternative DFT/ATPG tool reference.
Practice Tips:
- Study open-source DFT examples (OpenCores, GitHub) and understand scan chain implementations.
- Run ATPG on small test designs using open-source tools (e.g., ATAAS, academic tools) to understand pattern generation.
- Read published VLSI papers on DFT innovations (compression, at-speed testing, safety)—companies love when candidates cite recent work.
- In interviews, ask specifics: “What’s your current test coverage target? What compression scheme do you use? How do you handle multiple clock domains?” Show systematic thinking.
💡 Tip: Before your interview, check if the company has published any test-related papers or IEEE papers. If they mention innovations in compression or coverage techniques, knowing those details gives you huge credibility with interviewers.
