VLSI Web
Interview Questions

RTL Synthesis Interview Questions and Answers

By Raju Gorla | 4 March 2024 | Updated: 21 March 2026 | 47 Mins Read

I’ve compiled 40 RTL synthesis interview questions from years of tapeouts, IP handoffs, and design closures. These questions cover everything from the basics of what synthesis actually does, to the nasty timing violations and power optimization tricks that separate engineers who’ve shipped silicon from those who’ve only read the datasheet. Whether you’re prepping for Synopsys Design Compiler interviews or just want to understand synthesis deeply, this guide walks through real scenarios you’ll hit in the lab.

💡 Who This Is For: RTL design engineers, verification leads, and junior designers prepping for synthesis-heavy roles at companies like Synopsys, Intel, AMD, TSMC, or ARM. If you’ve built RTL but never babysat a synthesis run while the slack reports rolled in, you need these answers.

Table of Contents

  • Quick Navigation
  • Section 1: Synthesis Basics (Q1–Q10)
    • Q1. What is RTL synthesis? What’s the difference between a generic netlist and a technology-mapped netlist?
    • Q2. What files does a synthesis tool need? (RTL, .lib, .sdc, .svf) — Explain each
    • Q3. What is a Liberty (.lib) file? What does it contain?
    • Q4. Walk through the synthesis flow step by step (read → elaborate → map → optimize → write)
    • Q5. What is check_design? What errors does it catch?
    • Q6. What is compile vs compile_ultra in Design Compiler?
    • Q7. What is a timing path? How does synthesis identify the worst path?
    • Q8. What is technology mapping? How does the tool select cells?
    • Q9. What are tie cells? Why are they needed?
    • Q10. What is the difference between area optimization and timing optimization? Can you do both?
  • Section 2: SDC & Timing Constraints (Q11–Q20)
    • Q11. What is SDC? Why is it separate from the RTL? (with example SDC snippet)
    • Q12. Explain create_clock — period, waveform, source port
    • Q13. What is clock uncertainty? What contributes to it?
    • Q14. set_input_delay vs set_output_delay — How do you calculate the correct values? (timing diagram showing I/O path timing)
    • Q15. What is a false path? Give 3 real examples (mode select, test path, async interface)
    • Q16. What is a multicycle path? When do you use it? (example: 2-cycle arithmetic operation)
    • Q17. What is set_max_delay -datapath_only? How is it different from a multicycle path?
    • Q18. What is a generated clock? Give a real example (clock divider, PLL output)
    • Q19. What is a virtual clock? When do you use it?
    • Q20. How do you analyze a timing report? (anatomy of report_timing output with annotated example)
  • Section 3: Area, Power & Optimization (Q21–Q30)
    • Q21. What is resource sharing in synthesis? Give an example (shared adder)
    • Q22. What is retiming? When is it beneficial?
    • Q23. What are multi-threshold cells (HVT/LVT/SVT)? When do you use each?
    • Q24. How does clock gating reduce power? What percentage is typical?
    • Q25. What is DesignWare? What are some DesignWare components?
    • Q26. What is boundary optimization? When should you disable it?
    • Q27. What is dont_use? Give examples of when you’d dont_use certain cells
    • Q28. How does synthesis handle unresolved module references (black boxes)?
    • Q29. What is size_cell? When do you manually size cells?
    • Q30. Scan insertion in synthesis — what happens to the netlist? (DFT_Compiler flow)
  • Section 4: Advanced Synthesis (Q31–Q40)
    • Q31. What is physical-aware synthesis? How does it differ from flat synthesis?
    • Q32. What is hierarchical synthesis? What is an Interface Logic Model (ILM)?
    • Q33. What is incremental synthesis? When is it used?
    • Q34. What is an ECO (Engineering Change Order)? Metal ECO vs functional ECO
    • Q35. How do you handle hold violations in synthesis?
    • Q36. What does write_sdf produce? How is it used downstream?
    • Q37. What is check_timing? What warnings are dangerous vs ignorable?
    • Q38. Context-dependent synthesis — what does it mean and why does it matter for hierarchy?
    • Q39. How do synthesis results change between corner libraries (WC/BC/TT)?
    • Q40. What are the most common synthesis failure modes? (top 5 with how to fix each)
  • Interview Cheatsheet: RTL Synthesis Most-Asked Topics by Company
  • Resources & Further Reading

Quick Navigation

Section 1: Synthesis Basics (Q1–Q10) | Section 2: SDC & Timing Constraints (Q11–Q20) | Section 3: Area, Power & Optimization (Q21–Q30) | Section 4: Advanced Synthesis (Q31–Q40) | Interview Cheatsheet

Section 1: Synthesis Basics (Q1–Q10)

Q1. What is RTL synthesis? What’s the difference between a generic netlist and a technology-mapped netlist?

RTL synthesis is the automated process that converts human-readable Hardware Description Language (Verilog, VHDL) into gates and flip-flops that can be placed on silicon. It’s the bridge between logic design and physical design.

A generic netlist contains only idealized logic gates (AND, OR, NOT, flip-flops) with no knowledge of the actual manufacturing process. It’s like describing a circuit with perfect gates that have zero delay and infinite drive strength. A technology-mapped netlist is the real netlist: it uses actual cells from the foundry’s Liberty library (LVT, SVT, HVT versions of AND2, NOR3, DFF, etc.) with real timing arcs, power characteristics, and pin capacitances. The technology mapper is the step that walks through the generic netlist and replaces each ideal gate with an actual cell from the .lib file that meets area and timing constraints.

📌 Note: In every tapeout I’ve been part of, someone always forgets that the generic netlist is just an intermediate. What actually goes to P&R and silicon is the technology-mapped netlist. If you see timing issues in PrimeTime post-route, the root cause almost always traces back to how the mapper chose cells during synthesis.
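The distinction shows up directly in a Design Compiler session. Here’s a minimal sketch (the commands are standard dc_shell; the design and file names are made-up placeholders):

```tcl
# Hypothetical dc_shell session showing the two netlist stages.
read_verilog my_block.v
elaborate my_block     ;# builds the generic netlist: GTECH placeholder gates, no real timing
report_cell            ;# cells listed here are technology-independent GTECH_* primitives

source constraints.sdc
compile                ;# technology mapping: GTECH gates become actual .lib cells
report_cell            ;# now you see AND2_X1, DFF_X2, etc. from the target library
write -format verilog -hierarchy -output my_block_mapped.v   ;# this is what goes to P&R
```

Running report_cell before and after compile is a quick way to see the mapper’s choices for yourself.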

Q2. What files does a synthesis tool need? (RTL, .lib, .sdc, .svf) — Explain each

A synthesis tool needs four inputs to produce a useful design: RTL source files, Liberty libraries, SDC constraints, and optionally an .svf setup file.

RTL files (Verilog/VHDL) contain the design logic. .lib files (Liberty) contain the cell definitions—every gate’s timing arcs, power characteristics, pin capacitances, and metal layer info. .sdc files (Synopsys Design Constraints) tell the tool what you care about: clock periods, input/output delays, false paths, and optimization priorities. .svf (the Synopsys automated setup file) is optional but valuable for verification—the tool writes it during synthesis to record the transformations it performed (register renaming, retiming, boundary optimization) so that Formality can later prove the netlist is equivalent to the RTL. Without the .lib, the tool has no idea what gates exist. Without the .sdc, it optimizes for area instead of your actual timing goals. This is why SDC is often the most critical file you write.

💡 Tip: Keep your .lib files organized by corner (WC, BC, TT). I’ve seen synthesis runs fail silently because someone pointed to the TT library when they meant WC. Always sanity-check that you’re compiling against the worst-case corner for timing closure.
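A minimal setup script ties the four inputs together. This is a hedged sketch—the library and file names are placeholders for your actual PDK deliverables, not a real flow:

```tcl
# Sketch of a minimal Design Compiler setup (names are illustrative).
set target_library "stdcell_wc.db"       ;# worst-case Liberty, compiled to binary .db
set link_library   "* stdcell_wc.db"     ;# "*" also searches already-loaded designs

set_svf top.svf                          ;# record transformations for Formality

read_verilog {top.v submodule.v}         ;# RTL sources
current_design top
link                                     ;# resolve all module references

source top_constraints.sdc               ;# clocks, I/O delays, exceptions
```

Setting the libraries before reading RTL, and set_svf before any optimization, keeps the downstream equivalence check clean.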

Q3. What is a Liberty (.lib) file? What does it contain?

A Liberty file is the foundry’s formal specification of every logic cell available for that process technology, including timing, power, and physical properties.

Inside a .lib you’ll find: cell definitions (AND2_X1, DFF_X2, NOR3_X4, etc.), pin-level attributes (pin capacitance, max transition), timing arcs (propagation delay from each input to output across corners), leakage power per cell, dynamic power (switching energy), threshold voltage info, and metal layer constraints. Liberty files are text-based (human-readable), though Synopsys flows typically compile them into a binary .db format for faster loading. The synthesis tool reads the timing arcs to understand how long signals take to propagate through each cell, so it can calculate slack and choose the right-sized cells. If you see a synthesis result with unexpectedly fast timing, it’s often because the .lib you used is overly optimistic—check the corner.

📌 Note: Liberty files are process/voltage/temperature (PVT) specific. A 5nm Liberty file won’t work for 28nm. Always confirm you’re using the right generation .lib for your target node.
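To make this concrete, here is an abbreviated sketch of what a cell entry looks like inside a .lib. The attribute names follow the Liberty syntax; the numbers are illustrative, not from any real PDK:

```
/* Abbreviated Liberty cell sketch (values are made up for illustration). */
cell (AND2_X1) {
  area : 1.064;
  cell_leakage_power : 23.5;
  pin (A1) { direction : input;  capacitance : 0.0021; }
  pin (A2) { direction : input;  capacitance : 0.0021; }
  pin (ZN) {
    direction : output;
    function  : "(A1 * A2)";
    timing () {
      related_pin : "A1";
      cell_rise (delay_template) { values ("0.021, 0.034, 0.058"); }
    }
  }
}
```

Everything the mapper knows about AND2_X1—its size, leakage, input loading, logic function, and delay arcs—comes from an entry like this.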

Q4. Walk through the synthesis flow step by step (read → elaborate → map → optimize → write)

The synthesis flow has five main stages, each building on the previous: read, elaborate, map, optimize, and write.

Read: The tool parses your Verilog/VHDL and builds an Abstract Syntax Tree (AST). Elaborate: Expands submodules, resolves port connections, and builds a hierarchical design database. Some tools also do simple optimizations here (constant propagation, dead-code removal). Map: The technology mapper replaces generic logic with actual cells from the .lib. This is where timing starts to matter—the mapper chooses cell sizes and versions (LVT/SVT/HVT, X1/X2/X4) to meet timing and area constraints. Optimize: Final passes to reduce slack violations, optimize for area, balance power, and handle special constructs (memory compilers, special I/O cells). Write: Outputs the netlist in Verilog (typically used by P&R teams), SDF timing data for simulation, and reports for analysis. The netlist at the end is what goes downstream to place-and-route.

💡 Tip: Most silicon failures trace back to a synthesis step done wrong. I always run check_design immediately after read, before you waste an hour debugging elaboration errors. Catch mistakes early.
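The five stages map almost one-to-one onto dc_shell commands. A hedged sketch of the whole flow (file and design names are placeholders):

```tcl
# Sketch of the read -> elaborate -> map -> optimize -> write flow in dc_shell.
read_verilog top.v                ;# 1. read: parse RTL into the tool database
current_design top
elaborate top                     ;# 2. elaborate: expand hierarchy, resolve ports
check_design                      ;# sanity check before burning compile time

source top.sdc                    ;# constraints drive mapping and optimization
compile_ultra                     ;# 3+4. map to library cells and optimize

report_timing -max_paths 10 > timing.rpt                   ;# reports for analysis
write -format verilog -hierarchy -output top_netlist.v     ;# 5. write netlist for P&R
write_sdf top.sdf                 ;# timing data for gate-level simulation
write_sdc top_out.sdc             ;# constraints snapshot handed to P&R
```

Each output at the end (netlist, SDF, SDC) goes to a different downstream consumer: P&R, gate-level sim, and timing signoff respectively.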

Q5. What is check_design? What errors does it catch?

check_design is your first sanity check—it validates that the design hierarchy is correct, ports are properly connected, and there are no obvious logical inconsistencies.

check_design catches missing module definitions, undriven nets, multi-driven nets, floating pins, port width mismatches, and clock tree issues. It also flags design rule violations specific to the tool (e.g., a flop with no clock connection, a logic loop that shouldn’t exist). Running check_design is non-negotiable before you start a timing-driven compile. In my experience, about 30% of RTL issues show up here before wasting CPU time in the mapping step. If check_design passes but synthesis still fails mysteriously, the error is usually in your SDC constraints or library setup, not the design itself.

📌 Note: check_design is your best friend before you start sweating over timing reports. A design that fails check_design will cause headaches for days downstream in P&R and signoff.

Q6. What is compile vs compile_ultra in Design Compiler?

compile is the standard optimization pass; compile_ultra is Synopsys’s premium optimization engine with advanced algorithms for area, power, and timing.

compile uses faster heuristics and is suitable for most designs under tight time-to-compile constraints. compile_ultra applies advanced techniques like sequential optimization (retiming, gate duplication), multi-level boolean optimization, and cross-hierarchical optimization that compile skips. The tradeoff: compile_ultra takes 2–10x longer wall-clock time but often closes timing or hits area targets that compile cannot reach. In real projects, if your design closes with compile, ship it—the extra hours of compile_ultra might save 2% area but cost days on the project schedule’s critical path. Reserve compile_ultra for blocks that truly cannot close with compile, like highly datapath-intensive modules (FIR filters, ALUs) where the extra algorithms actually help.

💡 Tip: Try compile first on your full design. If you have critical timing failures, then try incremental compile_ultra on just the failing paths with set_critical_range. That’s often faster than full compile_ultra.
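The escalation strategy from the tip looks roughly like this in dc_shell (a sketch, not a tuned recipe—the 0.5 ns range is an arbitrary illustration):

```tcl
# Sketch: escalate optimization effort only where it's needed.
compile                                   ;# standard pass first
report_qor                                ;# check worst slack and total area

# If only a handful of paths fail, concentrate effort near the worst slack:
set_critical_range 0.5 [current_design]   ;# treat paths within 0.5 ns of worst as critical
compile_ultra -incremental                ;# heavier algorithms, preserves prior results
```

The incremental flag keeps the tool from throwing away the first compile’s work, which is usually much faster than a full compile_ultra from scratch.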

Q7. What is a timing path? How does synthesis identify the worst path?

A timing path is any sequence of gates and interconnect that a signal must traverse from a source (input pin or flop Q output) to a sink (flop D input or output pin), passing through combinational logic.

The synthesis tool builds a timing graph where each net and gate has a delay. For each path, it calculates arrival time at the sink: arrival = source time + sum(gate delays) + sum(net delays). It compares this to the required time (deadline set by your SDC constraint). The worst path (most negative slack, i.e., the one that most overshoots its deadline) is the critical path. Synthesis then focuses optimization effort on reducing delays in that path: upsizing gates, duplicating drivers, removing unnecessary logic. If the critical path has 50 gates and you only optimize 3 of them, you won’t close timing—this is why hierarchical design matters. Tools like Synopsys DC show you the critical path chain in report_timing, highlighting exactly which gates contribute most to delay.

📌 Note: The critical path often changes as you optimize. A path that’s critical at the start might not be critical after upsizing three gates. This is why iterative synthesis is real—you run compile, check results, adjust constraints, and compile again.
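Interrogating the timing graph is a one-liner. A sketch using standard report_timing options (the pin names are hypothetical):

```tcl
# Five worst setup (max-delay) paths in the design:
report_timing -delay_type max -max_paths 5

# Drill into a specific path group between two registers (hypothetical names):
report_timing -from [get_pins u_alu/a_reg*/Q] -to [get_pins u_alu/out_reg*/D]
```

Re-running these after each compile is how you watch the critical path move as optimization reshapes the design.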

Q8. What is technology mapping? How does the tool select cells?

Technology mapping is the automated process of replacing abstract logic gates (AND, OR, NOR) with actual cells from the Liberty library, considering timing, area, power, and drive strength.

The mapper receives a generic netlist from the logic synthesis step. It then faces a choice: for an AND gate, use AND2_X1 (smallest, slowest) or AND2_X4 (bigger, faster)? The mapper consults the .lib timing data—if this AND gate is on the critical path, the mapper chooses X4 to reduce delay. If it’s non-critical, the mapper chooses X1 to minimize area. This happens across the entire design millions of times per compile run. Modern mappers also consider pin capacitance—if a gate drives many fanout connections, upsizing it increases its drive strength so it can switch that load faster (at the cost of presenting more input capacitance to its own driver). If the mapper makes poor choices (too aggressive in non-critical areas, too timid on critical paths), the final netlist will have wasted area or unmet timing. This is why good SDC constraints are essential—they guide the mapper’s decisions.

💡 Tip: If you see a report showing your design is 10x larger than expected but timing closes, the mapper went haywire upsizing everything. Check your SDC—are you missing false_path declarations that made the tool think everything was critical?

Q9. What are tie cells? Why are they needed?

Tie cells are special utility cells that drive a constant value: TIE_HI drives ‘1’, TIE_LO drives ‘0’. They exist because a gate input cannot simply be wired to VDD or VSS.

In RTL, you might write something like: assign my_signal = 1’b1; The synthesis tool must map this constant to something physical—just wiring the net to VDD is illegal (bad for signal integrity, loading, and manufacturing). Instead, the tool inserts a TIE_HI cell that has a real output pin (with proper drive strength and capacitance) connected to that net. This ensures the constant is driven by a real gate, not a raw supply rail. Tie cells are also used when you have unused pins that must be tied high or low per foundry design rules. They’re tiny cells (usually 1-2x minimum gate size) and cheap in area, but they’re essential for correctness.

📌 Note: Always check your synthesis reports for the number of tie cells. If you see thousands of them in a small design, something is wrong—maybe you have a wide bus that’s all zeros, or unused logic that wasn’t optimized away.

Q10. What is the difference between area optimization and timing optimization? Can you do both?

Area optimization minimizes gate count and capacitance; timing optimization minimizes delay on critical paths. They’re fundamentally opposed: bigger gates are faster, smaller gates save area.

Design Compiler lets you balance the two with set_cost_priority and by using compile with different flags. True simultaneous optimization is hard—upsizing a gate to meet timing adds area, and downsizing to save area adds delay. In practice, engineers prioritize: first, close timing (because timing failures kill silicon), then optimize for area (because area costs yield). Many teams use a two-pass approach: compile with high effort to close timing, then a power/area optimization pass where timing slack is intentionally sacrificed. The goal is not to optimize both equally, but to hit timing *and* meet an area budget. Some tools like Design Compiler support multi-objective optimization where you can weight timing vs. area, but in real projects, a binary hierarchy works: timing is the hard constraint, area is the soft constraint you minimize subject to timing closure.

💡 Tip: I’ve shipped designs where synthesis hit timing but blew area budget because I didn’t apply aggressive set_max_transition and set_max_capacitance constraints. Apply all three (timing, transition, capacitance) constraints from day one—they work together.
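Applying all three constraint types from day one looks like this (a sketch—the values are illustrative, not a recommendation for any particular node):

```tcl
# Sketch: timing, transition, and capacitance constraints working together.
create_clock -name clk -period 5.0 [get_ports clk]   ;# timing: 200 MHz target

set_max_transition  0.5 [current_design]             ;# cap slow edges (signal integrity, power)
set_max_capacitance 0.2 [current_design]             ;# cap heavy loads (forces buffering/upsizing)
```

The transition and capacitance limits act as design-rule constraints: the tool treats them as hard requirements alongside timing, which keeps non-critical logic from being downsized into sluggish, heavily loaded nets.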

Section 2: SDC & Timing Constraints (Q11–Q20)

Q11. What is SDC? Why is it separate from the RTL? (with example SDC snippet)

SDC (Synopsys Design Constraints) is a declarative language that specifies timing, area, and power goals separately from RTL logic. It’s separate because constraints are a property of how the design is used, not how it’s built.

Your RTL defines *what* the design does; your SDC defines the *requirements*—clock frequencies, input arrival times, output deadlines, which paths don’t matter (false paths). Keeping them separate is powerful: the same RTL can be compiled multiple times with different SDC files for different use cases (high-performance mode vs. low-power mode, different clock speeds, different external interfaces). If you baked constraints into Verilog as comments, you’d duplicate RTL for each use case—terrible engineering. SDC is a standard format, readable by every EDA tool in the flow, from synthesis through timing signoff.

Example SDC:

# Define the main clock: 200 MHz
create_clock -name clk -period 5.0 -waveform {0 2.5} [get_ports clk]

# Input constraint: data arrives 3.5 ns after clock
set_input_delay -clock clk -max 3.5 [get_ports {data_in*}]

# Output constraint: data must be ready 4.0 ns after clock
set_output_delay -clock clk -max 4.0 [get_ports {data_out*}]

# False path: this path is never exercised, ignore it
set_false_path -from [get_pins reset_sync_reg/Q] -to [get_pins arbiter/*]

# Multi-cycle path: this operation takes 3 clocks
set_multicycle_path 3 -from [get_pins mult_a_reg/Q] -to [get_pins mult_result_reg/D]

📌 Note: SDC is your contract with the place-and-route team. If you make a mistake in SDC, they inherit it—the design will be routed to meet your wrong constraints. Always review SDC with the full team before locking it in.

Q12. Explain create_clock — period, waveform, source port

create_clock defines a clock signal in your design: its period (in nanoseconds), duty cycle (waveform), and which port/pin it drives.

The period is the reciprocal of frequency: a 200 MHz clock has period 5.0 ns. Waveform is a list {rising_time falling_time} relative to the period start; {0 2.5} means the clock rises at time 0, falls at time 2.5, then repeats every 5.0 ns (50% duty cycle). If you need a 40% duty-cycle clock, use {0 2.0}—rises at 0, falls at 2.0, repeats every 5.0 ns. The source_port is critical: [get_ports clk] anchors the definition to an input port, or [get_pins pll/clk_out] can anchor to an internal clock source (like a PLL output). Tools use this to identify where clock skew and jitter originate. If you create a clock on the wrong port, synthesis won’t optimize correctly because it doesn’t know which signal is the actual clock.

Example:

# 333 MHz clock with 50% duty cycle
create_clock -name clk_333 -period 3.0 -waveform {0 1.5} [get_ports clk]

# 200 MHz clock with 40% duty cycle (specialized circuits)
create_clock -name clk_200 -period 5.0 -waveform {0 2.0} [get_ports clk_200]

💡 Tip: If you have a 400 MHz system clock and a 100 MHz reference clock, create both. The tool needs to know about both to handle clock domain crossings and constrain datapath delays correctly.

Q13. What is clock uncertainty? What contributes to it?

Clock uncertainty is the tool’s margin for timing guardbanding—it accounts for clock skew and jitter that will exist in silicon but are hard to predict at synthesis time.

Clock skew is the difference in clock arrival times at different flops (caused by interconnect delay, buffer delays in the clock tree); clock jitter is the cycle-to-cycle variation in clock period (caused by PLL noise, phase-frequency detectors, substrate noise). Together, they create a timing margin that synthesis must respect. If your clock period is 5.0 ns and uncertainty is 0.4 ns, the effective period for timing analysis is 4.6 ns. Typical values: low-skew on-chip clocks might have 0.1–0.3 ns uncertainty; external input clocks might have 0.5–1.0 ns. A PLL-derived clock might have 0.2 ns jitter alone. You set uncertainty using set_clock_uncertainty; if you set it too low, your design will fail on silicon (real skew/jitter worse than assumed). Too high, and you waste design margin unnecessarily.

Example:

# Setup uncertainty (affects setup time check): 0.3 ns
set_clock_uncertainty 0.3 -setup [get_clocks clk]

# Hold uncertainty (affects hold time check): 0.1 ns (usually less than setup)
set_clock_uncertainty 0.1 -hold [get_clocks clk]

📌 Note: Clock uncertainty is a conservative estimate made before clock tree synthesis (CTS). After CTS, the actual clock skew is measured; then P&R updates constraints with tighter uncertainty for the final timing closure. Synthesis uses a pessimistic number to be safe.

Q14. set_input_delay vs set_output_delay — How do you calculate the correct values? (timing diagram showing I/O path timing)

set_input_delay defines when external inputs are valid relative to your clock edge; set_output_delay defines when outputs must be stable relative to the receiving clock. These constraints model the I/O timing of external interfaces.

set_input_delay tells synthesis: “by the time this signal enters my chip, it’s already delayed by X nanoseconds from the source clock, so I have (period – X) nanoseconds to process it before the next clock edge.” set_output_delay tells synthesis: “the outside world consumes X nanoseconds after my output pin, so my design must have data at the pin by (period – X).” To calculate: you need the external path delay. For inputs, add the driving chip’s clock-to-Q delay and the PCB trace delay: if the external flop’s Tco is 1.5 ns and the PCB adds 2 ns, input_delay is 1.5 + 2 = 3.5 ns; your internal logic then gets whatever remains of the period, minus your capture flop’s setup time. For outputs, add the PCB delay and the receiving chip’s setup requirement: with 2 ns of PCB delay and 1 ns of receiver setup, output_delay is 2 + 1 = 3.0 ns, meaning your chip must drive valid data onto the pin no later than (period – 3.0) ns after the clock edge—your own flop’s Tco and output path consume part of that budget.

I/O Path Timing Diagram:

INPUT TIMING (set_input_delay):
External Clk: |_|‾|_|‾|_|‾|_|‾|_|‾|_|‾
Data (PCB):  _____|‾‾‾‾‾‾‾‾|_____
                  <--2ns-->
My Input Pin:  ________|‾‾‾‾‾|____
               <--------3.5ns-------> set_input_delay = 3.5 ns
My Clock:   |_|‾|_|‾|_|‾|_|‾|_|‾|_|

OUTPUT TIMING (set_output_delay):
My Clock:   |_|‾|_|‾|_|‾|_|‾|_|‾|_|
My Flop Q:     |_|‾‾‾‾‾‾‾|_____
              <--1.5ns-->
PCB Delay:        |_|‾‾‾‾‾|_______
                     <-2ns->
Rx Setup:                  |<-1ns->|

set_output_delay = 2 + 1 = 3.0 ns
(Chip must drive valid data within period − 3.0 ns of the clock edge;
 the flop's 1.5 ns Tco eats into that internal budget)

💡 Tip: I’ve debugged many timing mismatches where the input_delay was set pessimistically (too large a value, leaving the internal logic too little of the cycle), burning precious setup time. Work with the board team to nail down PCB delay measurements, then add only the real setup margin you need—not a blanket safety margin.
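One way to keep these numbers honest is to derive the delays from the board-level quantities instead of hardcoding magic values. A sketch (variable names and values are hypothetical):

```tcl
# Sketch: derive I/O delays from datasheet/board numbers.
set ext_tco   1.5   ;# driving chip's clock-to-Q (from its datasheet)
set pcb_delay 2.0   ;# measured trace delay
set ext_setup 1.0   ;# receiving chip's setup requirement

# Input: external Tco + board delay has already elapsed when data reaches the pin.
set_input_delay  -clock clk -max [expr {$ext_tco + $pcb_delay}]   [get_ports {data_in*}]

# Output: board delay + receiver setup must fit after data leaves the pin.
set_output_delay -clock clk -max [expr {$pcb_delay + $ext_setup}] [get_ports {data_out*}]
```

When the board team updates a trace-delay measurement, you change one variable instead of hunting through the SDC for every affected constraint.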

Q15. What is a false path? Give 3 real examples (mode select, test path, async interface)

A false path is a timing path that logically never executes, so synthesis should ignore its timing slack. Declaring false paths prevents the tool from wasting resources optimizing non-critical paths.

Example 1 — Mode Select: Your design has two modes controlled by mode_sel: “fast path” and “slow path” that cannot both be active simultaneously. The slow path might not meet timing if optimized normally, but it’s guaranteed to be inactive when fast_clk is running. You declare the cross-path timing as false so synthesis focuses on the active fast path. Example 2 — Test Mode: You have a test data path that’s only used during manufacturing (test_en = 1). In normal operation (test_en = 0), this path is disabled. Synthesis doesn’t need to meet timing on the test path—declare it false. Example 3 — Async Interface (Reset): An asynchronous reset signal that reaches multiple flops takes different paths with different delays. You don’t care about the relative timing (one flop gets reset 10 ns before another)—the reset is asynchronous anyway. Declare those cross-flop reset paths as false.

Example SDC:

# Mode select: fast path and slow path never both active
set_false_path -from [get_pins mode_sel_reg/Q] -to [get_pins slow_datapath/*]

# Test mode path (disabled in normal operation)
set_false_path -from [get_pins test_en_reg/Q] -to [get_pins test_mux_out/*]

# Async reset paths (relative timing doesn't matter)
set_false_path -from [get_ports async_reset] -to [get_pins arbiter/*]

📌 Note: Aggressive false pathing is a common mistake. If you declare a path as false and it actually executes, synthesis won’t optimize it—your silicon fails. Only mark paths as false when you’re 100% sure they never run simultaneously with the critical paths.

Q16. What is a multicycle path? When do you use it? (example: 2-cycle arithmetic operation)

A multicycle path tells synthesis: “this path has N clock cycles to settle, not just 1 clock cycle,” allowing longer combinational delays than normal. Use it when an operation intentionally takes multiple cycles and can wait.

Example: a 32×32-bit multiplier that takes 3 clocks to produce a result. The path from multiplicand_reg to result_reg can wait 3 cycles, so you can use smaller, slower gates than if it had to complete in 1 cycle. set_multicycle_path 3 tells synthesis the data has 3 clock periods to propagate. Synthesis then calculates required time as: (current_clock_edge + 3*period – uncertainty) instead of (current_clock_edge + period – uncertainty). This reduces pressure on the mapper to upsize gates, saving area and power. Multicycle paths are very common in datapath designs: multiply-accumulate, divide operations, filter calculations, any arithmetic that’s pipelined or takes multiple clocks internally.

Example:

# 32x32 multiplier takes 3 cycles
set_multicycle_path 3   -from [get_pins mult_a_reg/Q]   -to [get_pins mult_result_reg/D]

# Same for the B operand
set_multicycle_path 3   -from [get_pins mult_b_reg/Q]   -to [get_pins mult_result_reg/D]

💡 Tip: Multicycle paths are powerful but subtle. If you declare a path as 2-cycle and data actually arrives in 1 cycle, hold time violations will appear in P&R (the receiving flop captures data too early). Verify datapath delays with PrimeTime before setting multicycle values.
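A multicycle setup constraint is usually paired with a hold adjustment. By default, relaxing setup to edge N leaves the hold check at edge N−1, which over-constrains hold; setting the hold multiplier to N−1 moves the hold check back to the launch edge. A sketch using the same hypothetical pin names as above:

```tcl
# Sketch: the standard setup/hold multicycle pair (pin names are illustrative).
set_multicycle_path 3 -setup -from [get_pins mult_a_reg/Q] -to [get_pins mult_result_reg/D]
set_multicycle_path 2 -hold  -from [get_pins mult_a_reg/Q] -to [get_pins mult_result_reg/D]
```

Forgetting the -hold companion is one of the most common multicycle mistakes—it shows up later as a pile of spurious hold violations in P&R.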

Q17. What is set_max_delay -datapath_only? How is it different from a multicycle path?

set_max_delay -datapath_only specifies an absolute maximum delay on a path, ignoring normal clock-to-clock timing checks. It’s different from multicycle because it doesn’t assume any specific clock relationship—it just says “this path must propagate within X nanoseconds.”

Multicycle paths assume regular clock edges with a defined period; they’re for synchronous datapaths. set_max_delay -datapath_only is for asynchronous or semi-synchronous interfaces where you can’t reference a clock. Example: a synchronizer output that must settle before the next flop samples it, but the timing isn’t tied to any regular clock edge. Or a one-time pulse that must propagate through combinational logic within a certain window. The -datapath_only flag tells the tool to exclude the clock network latency and skew from the check and constrain only the data path itself. This is less common in synthesis—most synchronous designs use multicycle or false_path. But for mixed-domain designs or asynchronous protocols, it’s essential.

📌 Note: set_max_delay is a blunt tool. It doesn’t distinguish between setup and hold—it just says “make this path fast enough.” Use multicycle or set_false_path when you can, because they’re more explicit about timing intent.
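A typical use is bounding the data path into a clock-domain-crossing synchronizer. A sketch (the pin names are hypothetical; -datapath_only is Design Compiler syntax):

```tcl
# Sketch: cap the CDC data path at 5 ns regardless of clock relationships.
set_max_delay 5.0 -datapath_only -from [get_pins tx_data_reg*/Q] -to [get_pins sync_ff1_reg*/D]
```

This keeps the crossing fast enough to behave well without pretending the two clock domains have a meaningful edge relationship.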

Q18. What is a generated clock? Give a real example (clock divider, PLL output)

A generated clock is a clock derived from another clock via logic (divider, mux, PLL), not an external input. You define it so synthesis understands the relationship between source and derived clock.

Example 1: Clock divider. Your design has a 400 MHz main clock, but a sub-block needs 100 MHz. You have a divide-by-4 counter: its output flop toggles every 2 input clocks, creating a 100 MHz clock. Without declaring this as a generated clock, the tool sees the divider output as random logic with no clock properties. You use create_generated_clock to say: “this clock is derived from main_clk, divided by 4, so its period is 10 ns (4 × 2.5 ns).” Example 2: PLL output. Your PLL multiplies an external 25 MHz ref clock to 400 MHz. You create_generated_clock with -multiply_by 16 to tell synthesis the output is a real clock with known frequency and phase. Generated clocks are essential for hierarchical designs where different blocks run at different frequencies. Without them, constraint propagation breaks.

Example:

# Divide-by-4 clock divider
create_generated_clock   -name clk_div4   -source [get_ports main_clk]   -divide_by 4   [get_pins div_counter_reg/Q]

# PLL output: multiply by 16
create_generated_clock   -name clk_pll_400   -source [get_ports ref_clk_25]   -multiply_by 16   [get_pins pll/clk_out]

💡 Tip: Always verify generated_clock definitions by running report_clocks. I’ve seen designs fail because a generated clock was defined on the wrong pin or with wrong multiply/divide factors.

Q19. What is a virtual clock? When do you use it?

A virtual clock is a fictitious clock that exists only in your SDC for constraint purposes—it’s not actually used by any flop in the design. Use virtual clocks to constrain I/O timing when your input/output clock source is off-chip and you want timing to be relative to that external clock, not any internal clock.

Example: Your chip receives data synchronized to an external clock (on PCB board). The data enters your chip 3.5 ns after that external clock edge. You can’t connect the external clock directly to synthesis (it’s not a port), so you create a virtual clock with the same period as the external clock, then use it as the reference for set_input_delay and set_output_delay. This way, all I/O timing is relative to the external synchronization point, decoupling it from your internal clock tree. Virtual clocks are common in interfaces like LPDDR, Ethernet, SPI, where the external clock and data must be timed together. They simplify I/O constraints because you specify timing relative to a known external clock, not guessing at internal propagation delays.

Example:

# Virtual clock representing the external interface clock (not in RTL)
# Omitting the source object (no port or pin argument) is what makes the clock virtual
create_clock -name virtual_ext_clk -period 10.0

# Now constrain I/O relative to this virtual clock
set_input_delay -clock virtual_ext_clk -max 3.5 [get_ports {ext_data*}]
set_output_delay -clock virtual_ext_clk -max 4.0 [get_ports {ext_data_out*}]

📌 Note: Virtual clocks are purely for constraint definition. The tool never uses them to gate logic or drive flops. They’re a notational convenience for I/O timing.

Q20. How do you analyze a timing report? (anatomy of report_timing output with annotated example)

A timing report shows slack (the margin between required and actual arrival time), the critical path, and per-gate delays. You analyze it by identifying the worst slack and tracing the path to find optimization opportunities.

The report lists each path element: the net name, the gate that drives it, the gate delay, the net (interconnect) delay, and the cumulative arrival time at each node. Slack = required_time – arrival_time. Positive slack means you’re safe; negative slack (violation) means the path is too slow. The report usually highlights the worst few paths and annotates with cell names, so you can see which gates are slowest. You look for opportunities: Can you upsize gates on the critical path? Can you reduce fanout? Are there surprising delays (like a long interconnect net that implies far placement)? The arrival time column shows cumulative delay; sharp jumps indicate bottleneck gates.

Example report_timing output (annotated):

Path: U1/Q -> U8/Z
                          Incr      Path
Instance  Pin    Net      Time      Time  Description
----------------------------------------------------------
input     clk    clk      0.00      0.00  clk (clock edge)
U1        Q      q1       0.35      0.35  DFF_X2 clk->Q, tco = 0.35
U2        A1     u2_a     0.08      0.43  AND2_X1 (gate delay)
U2        Z      u2_z     0.22      0.65  AND2_X1 (interconnect: 0.05, gate: 0.17)
                                           ↑ Gate delay is high! Consider upsizing
U3        A      u3_a     0.04      0.69  BUF_X1
U3        Z      u3_z     0.15      0.84  BUF_X1
...
U8        A      u8_a     0.02      1.95  NOR3_X2
U8        Z      result   0.18      2.13  NOR3_X2 (this is the final result)

Required Time:  5.00 ns     (clock period minus setup time and clock uncertainty)
Arrival Time:   2.13 ns
SLACK:         +2.87 ns     ✓ PASS (positive slack)

💡 Tip: When reading reports, look for paths with small slack (< 0.5 ns margin). Even though they pass, they're at risk if the library models prove optimistic or P&R adds interconnect delay and clock jitter. Focus optimization on the top 5–10 paths by slack; optimizing the 100th path is rarely worth the effort.
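
A report like the one above is produced with standard report_timing options (these flags exist in both Design Compiler and PrimeTime):

# Worst 10 setup paths, 2 paths per endpoint, with pin, net, and cap detail
report_timing -delay_type max -max_paths 10 -nworst 2 -input_pins -nets -capacitance

# Hold analysis uses the min delay type
report_timing -delay_type min -max_paths 10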

Section 3: Area, Power & Optimization (Q21–Q30)

Q21. What is resource sharing in synthesis? Give an example (shared adder)

Resource sharing is an optimization where a single hardware resource (like an adder) is reused for multiple operations in different clock cycles, instead of instantiating separate hardware for each operation.

Example: Your RTL has two additions: sum1 = a + b (in cycle 1) and sum2 = c + d (in cycle 3). Naively, synthesis creates two separate adders. With resource sharing, you reuse one adder: multiplex {a,b} into the adder in cycle 1, multiplex {c,d} into it in cycle 3, and register each result. This reduces area significantly (one adder instead of two), but adds muxes and result registers, and the mux delay lands on the shared path, so timing must be re-checked. Modern high-level synthesis tools do this automatically if you write RTL that’s structurally sharable (separate operations at different times). Combinational synthesis tools like Design Compiler also recognize sharing opportunities, typically for mutually exclusive operations in the branches of the same conditional. The tradeoff: less area, but added mux delay and potentially more power (muxes and registers add overhead).

📌 Note: Resource sharing works best in highly pipelined designs or designs with low duty cycle operations (operations that don’t happen every cycle). If your adder must produce a result every cycle, sharing doesn’t help.

Q22. What is retiming? When is it beneficial?

Retiming is an optimization where the synthesis tool moves flip-flops from one side of combinational logic to the other, preserving functionality while reducing critical path delay or register count.

Example: You have flop A -> 10-gate chain -> flop B. If the 10-gate chain has high delay, retiming can push a flop deeper into the logic (after gate 5), creating two shorter paths: flop A -> 5-gate -> flop C -> 5-gate -> flop B. Total delay is now the worst of the two paths, often much less than the original 10-gate delay. The number of flops stays the same, but they’re better positioned. Retiming is most beneficial in datapaths with multiple stages (pipelined multipliers, dividers, accumulators) where long combinational chains can be broken by inserted flip-flops. It’s less useful in control logic where chains are short and interconnect dominates. Modern tools can retime during compile_ultra (via the -retime option); many teams leave it off if they have tight datapath control (e.g., if moving pipeline stages would break a cycle-accurate interface).

💡 Tip: Retiming can increase power and area slightly (registers may be duplicated across fanout branches as they move). If you have tight power/area budgets, disable retiming (omit -retime, or apply set_dont_retime to sensitive blocks) and manually pipeline critical paths in RTL instead.
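
In Design Compiler, retiming is opt-in rather than fully automatic; a minimal sketch (block names are hypothetical):

# Enable register retiming during optimization
compile_ultra -retime

# Exclude blocks whose pipeline structure must not change
set_dont_retime [get_cells u_latency_critical] true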

Q23. What are multi-threshold cells (HVT/LVT/SVT)? When do you use each?

Multi-threshold cells are different versions of the same logic function with different threshold voltages (Vt), trading off speed, leakage power, and area. HVT (high Vt) cells are slow and low-leakage; LVT (low Vt) cells are fast and high-leakage; SVT (standard Vt) is the middle ground.

Cell Type | Speed | Leakage Power | Area | When to Use
HVT (High Vt) | Slow (10–15% slower) | Very Low (50% less) | Same footprint | Non-critical paths, always-on domains, leakage-sensitive designs
SVT (Standard Vt) | Typical | Typical | Same footprint | General-purpose logic, default choice
LVT (Low Vt) | Fast (10–15% faster) | Very High (2–3x more) | Same footprint | Critical paths only, timing-critical blocks, performance-driven SoCs

(Multi-Vt variants of a cell normally share the same layout footprint; only the channel implant differs. That is why Vt swaps are popular late-stage ECOs.)

In power-constrained designs (mobile, IoT), you use HVT for the bulk of the design and LVT only where absolutely necessary for timing closure. In performance-driven designs (data center, high-end processors), you might use more LVT to hit high frequencies. SVT is the default; start there and switch only when needed.

💡 Tip: Be aggressive with HVT cells in non-critical areas. I’ve seen designs ship with 30% higher leakage power than needed because synthesis was too conservative and used LVT everywhere. Make HVT the default (for example, set_dont_use on the LVT group), then allow LVT only where critical paths need it.
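
One way to bias synthesis toward HVT (a sketch; exact set_multi_vth_constraint options vary by tool version, and lvt_lib is a hypothetical library name):

# Limit LVT usage to at most 10% of cells (soft constraint)
set_multi_vth_constraint -lvth_groups {lvt_lib} -lvth_percentage 10 -type soft

# Or exclude LVT entirely at first, relaxing it later only for timing closure
set_dont_use [get_lib_cells lvt_lib/*]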

Q24. How does clock gating reduce power? What percentage is typical?

Clock gating uses a latch and AND gate to prevent the clock from toggling flip-flops in unused blocks, eliminating dynamic power dissipation in those flops and their logic when inactive.

When a clock is free-running, every flop in that domain toggles its clock pin every cycle, consuming dynamic power even when the block is not in use. Clock gating inserts a circuit that stops the clock when a control signal (enable) is low. The enable is captured by a level-sensitive latch (transparent while the clock is low) so that glitches on the enable cannot produce runt clock pulses. When enable is low, the AND gate outputs a constant 0, preventing clock pulses. This eliminates all dynamic power in the gated domain (except the gating logic itself, which is tiny). Typical power savings: 15–40% of total chip power in real designs, depending on how much of the design is clock-gated. Data centers and AI chips often reach 40%+ savings because they can gate entire compute units when not needed. Mobile designs might see 20–30%.

📌 Note: Clock gating has a cost: setup time (the latch must settle before the clock is re-enabled) and area overhead. If you gate a small block (< 100 flops), the overhead might outweigh benefits. Gating is most effective on medium-to-large blocks with clear on/off control.
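
In a Power Compiler flow, gating is typically configured and inserted like this (the 4-bit threshold is illustrative):

# Latch-based (glitch-safe) gating, only for register banks 4 bits and wider
set_clock_gating_style -sequential_cell latch -minimum_bitwidth 4

# Insert clock gates during compile, then check coverage
compile_ultra -gate_clock
report_clock_gating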

Q25. What is DesignWare? What are some DesignWare components?

DesignWare is Synopsys’s library of predesigned, optimized macrocells for common functions like adders, multipliers, dividers, memories, and multiplexers. Instead of synthesizing these from scratch (which is inefficient), you instantiate DesignWare components in your RTL.

Common DesignWare components: DW01_add (adder), DW02_mult (multiplier), dividers (DW_div), comparators, priority encoders (DW01_prienc), plus FIFOs, RAM wrappers, and arithmetic pipelines. These are pre-optimized for area, delay, and power. When synthesis infers an operator (or you instantiate a DW component with parameters such as operand width), the tool selects among pre-characterized architectures — ripple-carry vs. carry-lookahead adders, Wallace-tree vs. Booth multipliers — to best fit your constraints. DesignWare parts are not black boxes: they are mapped through the synthetic library, so the tool can swap implementations during optimization. Hand-rolling these datapath blocks is rarely worth it; DesignWare usually wins on area, speed, and bug count. The main exception is a very specialized operation that DesignWare doesn’t cover.

💡 Tip: If you’re writing your own adder or multiplier in RTL, stop. Use DesignWare. You’ll get better area, speed, and fewer bugs. The only exception is if you have a very specialized operation that DesignWare doesn’t cover.
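
DesignWare is enabled through the synthetic library in your tool setup; synthesis then maps inferred +, *, and comparison operators onto DW implementations:

# Make DesignWare available for operator inference
set synthetic_library dw_foundation.sldb
set link_library      [list * $target_library $synthetic_library]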

Q26. What is boundary optimization? When should you disable it?

Boundary optimization is a synthesis feature that optimizes gates at the boundaries between modules, allowing cross-hierarchical optimization. Without it, each module is optimized in isolation, leaving inefficiencies at module interfaces.

Example: Module A drives a long chain of gates in Module B. Without boundary optimization, the tool optimizes A’s outputs for typical fanout, and B’s inputs for typical drive strength. But combined, they might be suboptimal. Boundary optimization allows the tool to upsize A’s drivers to match B’s input capacitance and vice versa, reducing delay across the boundary. Disable boundary optimization if: (1) you have a hierarchical design with multiple teams—each team owns a module and wants to optimize in isolation without cross-team interference; (2) you have a black-box module that will be replaced later—optimizing across its boundary would break when the box changes; (3) you have timing constraints between module interfaces that you want to enforce strictly. In flat designs (single team, no plan to replace modules), enable boundary optimization—it typically saves 2–5% area and reduces critical path delay by 1–3%.

📌 Note: Boundary optimization in hierarchical designs can cause subtle bugs: the tool optimizes based on assumed fanout and timing at a module boundary, but if a different version of that module is used elsewhere, optimization might be wrong. Always run full-chip timing checks after enabling boundary optimization in hierarchical flows.
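
Boundary optimization is controlled per design or instance (block names below are hypothetical):

# Freeze the interface of a block owned by another team
set_boundary_optimization [get_designs third_party_mac] false

# Allow cross-boundary optimization elsewhere (the compile_ultra default)
set_boundary_optimization [get_designs my_top] true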

Q27. What is dont_use? Give examples of when you’d dont_use certain cells

dont_use is a constraint that tells synthesis: “never use this cell, even if it would be optimal.” You use it to exclude problematic cells from the synthesis library.

Examples: (1) A clock gate cell (CG_X4) that has a known bug in the foundry—you dont_use it to force the tool to use CG_X2 or CG_X8 instead. (2) A high-power latch that’s overkill for your design—dont_use forces the tool to use a simpler latch and save area. (3) A specialized cell designed for a different process corner—if you’re compiling for worst-case but a cell is only characterized for typical case, dont_use it to avoid optimism. (4) Cells from an older generation that you’re phasing out (old AND2 with poor timing is replaced by new AND2_FAST). You can dont_use at fine granularity: dont_use AND2_X1 but allow AND2_X2 and AND2_X4, forcing the tool to upsize instead of use the smallest weak gate.

💡 Tip: Use dont_use sparingly. It’s a workaround, not a solution. If a cell is broken, fix it in the .lib or work with the foundry. Over-constraining the tool with dont_use can force it into suboptimal choices and waste area/power.
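
A sketch of fine-grained exclusion (library and cell names are hypothetical):

# Exclude a buggy clock-gate cell and the weakest AND gate
set_dont_use [get_lib_cells my_lib/CG_X4]
set_dont_use [get_lib_cells my_lib/AND2_X1]

# Re-allow a cell later for a targeted fix
remove_attribute [get_lib_cells my_lib/AND2_X1] dont_use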

Q28. How does synthesis handle unresolved module references (black boxes)?

When synthesis encounters a module that’s not elaborated (a black box), it treats it as a fixed timing and area component: no optimization inside, just buffer the inputs and outputs.

Black boxes are common when: (1) a module is a memory compiler output (RAM, ROM) that’s pre-optimized; (2) a module is from a third-party IP vendor; (3) a module is under separate design/timing closure (hierarchical design with multiple teams). The tool reads the black box’s timing characterization from a Liberty file or interface specification (pin delays, pin capacitances), then builds timing paths through it without trying to optimize internally. The tool might optimize logic driving the black box’s inputs or driven by its outputs, but not inside. If a black box path is on the critical path, you can’t close timing in synthesis—you’re stuck with whatever timing the black box has. You’d need to work with the IP owner to improve it. Black boxes are practical in hierarchical designs but create limitations—always characterize your black boxes thoroughly and provide good timing/area specs to downstream teams.

📌 Note: Missing black box timing specs cause cascading problems. If a team forgets to provide a Liberty file for their black box, synthesis will assume zero delay and timing will be wildly optimistic. Then P&R fails because real silicon is much slower.

Q29. What is size_cell? When do you manually size cells?

size_cell is a manual override command in some EDA tools that lets you directly set a cell instance to a specific size (X1, X2, X4, etc.), overriding the tool’s automated sizing.

You manually size cells when: (1) the tool made a wrong choice (over-conservative or too aggressive); (2) you have a timing bug that points to a specific gate needing a larger version to close slack; (3) you’re doing post-synthesis optimization and don’t want to re-run the full compile (too slow). Example: a critical path has 50 gates, but slack is still -0.2 ns. Upsizing a mid-path buffer from X1 to X4 might save 0.1 ns, closing the margin. Manual sizing is usually a last resort—if you need it, it means your compilation strategy was suboptimal. In modern flows, you’d fix the issue in SDC (tighten critical path constraints) and re-compile rather than manually tweaking cells. But for quick fixes or targeted optimization in hierarchical designs, size_cell is useful.

💡 Tip: Don’t rely on manual sizing. It’s a sign your synthesis flow or constraints need rework. But when you’re under deadline and the chip ships in 48 hours, size_cell is your friend.
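
A targeted manual fix might look like this (instance and cell names are hypothetical):

# Upsize one buffer on the failing path to a stronger drive
size_cell [get_cells u_core/u_buf_12] my_lib/BUF_X4

# Re-check just the affected path
report_timing -through [get_cells u_core/u_buf_12]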

Q30. Scan insertion in synthesis — what happens to the netlist? (DFT_Compiler flow)

Scan insertion is a design-for-test step that modifies the netlist to convert functional flip-flops into scannable flops, allowing test vectors to be shifted in and out for manufacturing test.

Synopsys DFT Compiler performs scan insertion: it replaces each DFF (D flip-flop) with a scan-equivalent flop that has two modes: (1) functional mode (scan_enable low): the flop captures its normal D input and the design operates normally; (2) shift mode (scan_enable high): the flop captures its scan input (SI) instead, so data shifts through a chain. All flops are connected into a scan chain: the Q output of one flop connects to the scan input (SI) of the next, forming a long serial shift register. For test, you shift in a test pattern through the scan chain, clock the design once (capture cycle) to measure outputs, then shift out results through the chain. This allows stuck-at and transition fault testing of the entire design. Scan insertion adds: (1) area (~2–5% typically, depending on scan depth); (2) power (shift activity consumes power during test); (3) timing overhead (a scan flop has slightly higher delay, plus extra load on Q from the chain). The benefit: manufacturability — without scan, only logic accessible from primary inputs can be tested, leaving internal logic untested. With scan, 95%+ of faults can be tested.

📌 Note: Scan insertion usually happens after functional synthesis, in a separate DFT compile step. If you enable scan insertion during functional compilation, it can mess up timing closure—separate the concerns.
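
A typical DFT Compiler sequence looks like this (port names and chain count are illustrative; option details vary by version):

set_scan_configuration -style multiplexed_flip_flop -chain_count 8
set_dft_signal -view existing_dft -type ScanClock -port clk -timing {45 55}
set_dft_signal -view spec -type ScanEnable -port scan_en
create_test_protocol
dft_drc
insert_dft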

Section 4: Advanced Synthesis (Q31–Q40)

Q31. What is physical-aware synthesis? How does it differ from flat synthesis?

Physical-aware synthesis (PAS) incorporates placement and timing information during synthesis, making better decisions about cell selection and sizing. Flat synthesis ignores physical layout and assumes ideal interconnect.

In traditional (flat, wire-load-model) synthesis, the tool has no idea where cells will be placed or how long interconnect will be — it estimates net delay from statistical wire-load models based on fanout, which correlate poorly with real routes. Then P&R comes along, places cells, routes wires, and measures real delays. Timing often fails because interconnect delays were underestimated. Physical-aware synthesis works with P&R tools (like Synopsys Fusion Compiler) to have a preliminary placement in mind during synthesis. The tool estimates interconnect delays based on expected placement and chooses cell sizes considering realistic wire delays. The payoff: fewer timing iterations between synthesis and P&R, tighter closure. The cost: slower synthesis runtime and required placement-tool integration. PAS is most valuable for high-frequency designs (>2 GHz) or ultra-dense designs where interconnect dominates delay. For simpler designs, flat synthesis is often sufficient.

💡 Tip: PAS is gaining adoption, but many companies still use flat synthesis + P&R iteration. It’s a flow choice—if you have a tight timeline, PAS might save a week. If you can afford P&R iterations, flat is simpler to manage.
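
In the Design Compiler Graphical flow, physical guidance is a sketch like this (the floorplan file name is hypothetical):

# Read placement/floorplan information, then compile with physical guidance
extract_physical_constraints my_floorplan.def
compile_ultra -spg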

Q32. What is hierarchical synthesis? What is an Interface Logic Model (ILM)?

Hierarchical synthesis synthesizes a design module-by-module, each team owning their block without seeing others’ internals. An ILM (Interface Logic Model) is a timing model of a synthesized block that downstream teams use without elaborating the full RTL.

In a 500M-gate SoC, one team can’t synthesize the entire design—runtime and memory explode. Instead, the chip is divided: CPU team owns the core, memory team owns the L3 cache, NOC team owns the interconnect, etc. Each team synthesizes their block and publishes an ILM: a reduced model that keeps only the interface logic (the boundary registers and the logic between them and the ports) and discards the block’s core, characterizing pin-to-pin timing without exposing internals. Another team building the memory controller doesn’t need the full CPU RTL; they use the CPU’s ILM to constrain their timing. This enables parallel development and design reuse. The downside: a module owner can’t see global optimization opportunities (a gate at a module boundary might be sub-optimal but appears optimal locally). ILMs are conservative (they over-estimate delays) to be safe for downstream users. Creating good ILMs requires discipline; a poorly modeled interface can propagate errors through the entire design.

📌 Note: ILM generation and validation is a critical path item on large SoC projects. If an ILM is wrong, multiple teams’ work is at risk. Allocate time for ILM review and independent verification.

Q33. What is incremental synthesis? When is it used?

Incremental synthesis is re-running synthesis on a design that’s already been synthesized, keeping most of it unchanged and only re-optimizing changed or affected logic.

Use cases: (1) RTL bug fix—a small module has an RTL change, re-synthesize only that module, not the whole chip. (2) Constraint update—you tighten timing on one path, re-compile to optimize just that path. (3) Post-P&R physical feedback—P&R reports timing issues in specific regions, bring the netlist back to synthesis for localized re-optimization. Incremental synthesis is much faster than full synthesis (minutes instead of hours) but requires the synthesis tool to track which gates were affected by changes and only re-optimize their neighborhoods. Design Compiler has ECO (Engineering Change Order) capabilities for this. In realistic project schedules, incremental synthesis is essential—you can’t afford to wait 4 hours for each small RTL change. The challenge: ensuring incremental results are consistent with full re-synthesis (no regressions in unrelated paths). Many teams run both incremental and full compile weekly for validation.

💡 Tip: Incremental synthesis can introduce subtle bugs if the tool’s change-tracking is wrong. Always compare incremental results to a baseline full synthesis before committing. Don’t trust incremental blindly.
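
In Design Compiler the incremental pass is a single option; only violating or changed logic is re-optimized while the rest of the netlist is preserved:

# Re-optimize affected logic only, after a small RTL or constraint change
compile_ultra -incremental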

Q34. What is an ECO (Engineering Change Order)? Metal ECO vs functional ECO

An ECO is a small design change made after tapeout, either at the netlist level (functional ECO) or at the metal layers post-fabrication (metal ECO).

Functional ECO: A bug is found in the synthesized netlist (pre-silicon or in simulation). Instead of re-synthesizing the entire 500M-gate design, an ECO applies a surgical fix: remove a few gates, add new logic, rewire connections. The rest of the design is untouched. Functional ECOs must preserve functionality and not break timing elsewhere—this requires careful analysis and simulation. Metal ECO: The netlist is locked, but post-silicon debug reveals a bug (a failing critical path, a logic error). A metal ECO modifies only the metal interconnect layers (no transistor layout change). This is much cheaper than a full re-spin—you only re-make the affected metal masks, saving months and millions of dollars. Metal ECOs are constrained: you can only rewire existing transistors, which is why teams scatter pre-placed spare cells across the die during P&R so new logic can be stitched in with metal alone. The ultimate goal: catch bugs before silicon, but if they slip through, ECOs are your lifeline.

📌 Note: Metal ECO planning should start in synthesis. Identify critical signals that might need re-routing, leave spare wiring, and document ECO risks. Late-stage ECO discovery often means re-tapeout is the only option.
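
Functional ECOs are often proposed in PrimeTime and replayed in the implementation tool (a sketch; option names vary by tool version):

# In PrimeTime: propose minimal fixes for remaining setup violations
fix_eco_timing -type setup -methods {size_cell insert_buffer}

# Write the change list for the P&R tool to implement
write_changes -format icctcl -output eco_fixes.tcl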

Q35. How do you handle hold violations in synthesis?

Hold violations occur when a signal arrives at a flip-flop input too early (before the hold time window closes), causing the flop to capture the wrong value. Synthesis traditionally focuses on setup time (meeting the clock period deadline) and leaves hold fixing to place-and-route.

However, synthesis can introduce hold violations by making paths too fast. If you upsize gates aggressively to meet setup, you might create a path so fast that the receiving flop captures data before its hold time is satisfied. Modern synthesis tools have set_min_delay constraints to prevent this: they ensure paths are not faster than a minimum threshold. In practice, hold violations are rare in synthesis (synthesis usually assumes ideal clocks with zero skew, so flop-to-flop hold rarely fails; real violations surface after clock-tree synthesis introduces skew). But in high-frequency designs or when retiming is aggressive, you might see hold problems in synthesis. The fix: add delay (use set_min_delay), insert buffers, or adjust your optimization strategy to not be hyper-aggressive on non-critical paths. Most teams leave hold fixing to P&R, where the physical layout is known. Synthesis teams just ensure they don’t make things worse.

💡 Tip: If you see hold violations in a timing report from synthesis, it’s usually a red flag—it means your constraints are poorly tuned or the optimization got too aggressive. Investigate before moving to P&R.
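
Classic DC controls for hold in synthesis (pin names in the set_min_delay example are hypothetical):

# Ask the tool to pad short paths during compile
set_fix_hold [all_clocks]
compile_ultra -incremental

# Or bound one specific fast path explicitly
set_min_delay 0.5 -from [get_pins u_a_reg/CK] -to [get_pins u_b_reg/D]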

Q36. What does write_sdf produce? How is it used downstream?

write_sdf (Standard Delay Format) is a file output from synthesis containing gate and net delays extracted from the netlist and Liberty library. Downstream tools (simulator, timing analyzer) use it for accurate delay modeling without needing the Liberty file.

After synthesis writes the netlist in Verilog, it also generates an SDF file with entries like: (DELAY (ABSOLUTE (IOPATH D Q (0.32:0.35:0.38)))) — meaning the delay from D to Q is 0.32/0.35/0.38 ns for the min/typ/max conditions of that corner. Simulators annotate this SDF onto the gate-level netlist: when you simulate, gates have real delays instead of zero delay. This enables gate-level simulation that matches synthesis timing. For timing correlation, STA tools can back-annotate the SDF instead of re-deriving every delay from the Liberty file. Since an SDF is tied to the corner it was written from (WC, BC, TT), downstream tools must apply the right SDF for the right corner.

📌 Note: SDF is corner-specific. If you synthesize multiple corners (WC, BC, TT), generate SDF for each. Using the wrong SDF in P&R causes timing mismatches.
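
Generating the file is a one-liner per corner; simulators then apply it with the Verilog $sdf_annotate system task:

# One SDF per corner, matching the corner the netlist was compiled against
write_sdf -version 2.1 my_design_wc.sdf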

Q37. What is check_timing? What warnings are dangerous vs ignorable?

check_timing is a constraint-validation command that reports timing setup problems: undefined clocks, unconstrained endpoints, combinational loops, and inconsistent clock definitions. Some warnings are critical; others are benign.

Dangerous warnings: no clocks defined (the design has no timing reference, so optimization is meaningless), unconstrained paths (logic with no input/output delays is never optimized for timing), timing loops (combinational feedback the analyzer must break arbitrarily). Ignorable warnings: virtual clocks with no source (expected, they’re on-purpose fictitious), black-box timing not available (fine if the box is constrained at its boundary), generated clock source undefined (OK if the source is internal). Many teams filter check_timing output in scripts so the critical warnings stand out. Running check_timing before every synthesis compile is good practice—it catches constraint bugs early, saving hours of debugging downstream. If check_timing reports a dangerous warning, don’t proceed: fix the constraint before compiling.

💡 Tip: I’ve seen many projects waste a week because someone ignored a “no clocks defined” warning from check_timing. The tool compiled the design (without constraints), P&R ran (using zero constraints), and timing failed on silicon. Run check_timing religiously.
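
Capturing the report keeps dangerous warnings from scrolling past in the compile log:

# Save check_timing output to a file before every compile
redirect -file reports/check_timing.rpt {check_timing}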

Q38. Context-dependent synthesis — what does it mean and why does it matter for hierarchy?

Context-dependent synthesis means the tool optimizes a module differently depending on how it’s used in the larger design (its context). Without context, a module is optimized in isolation with pessimistic assumptions.

Example: A multiplier that’s used in two places. In context A, the multiplier is on the critical path and every nanosecond matters. In context B, the multiplier has 10 ns of slack and doesn’t matter. Context-dependent optimization would create two versions of the multiplier: one aggressive (fast, big), one conservative (small). Without context awareness, you synthesize the multiplier once conservatively (to be safe), then both usages are sub-optimal—context A is slow, context B wastes area. In hierarchical designs, teams often synthesize each block independently (no context) because they don’t know how others will use them. This leads to over-optimization in non-critical blocks and under-optimization in critical ones. Some flows (like Synopsys Fusion Compiler) support context-aware synthesis where the hierarchy is flattened locally around critical areas to enable better decisions. For large teams, context-dependent synthesis is hard to coordinate, so many accept the inefficiency and focus on correctness instead.

📌 Note: Context-aware synthesis is a research direction, not yet standard practice. Most real designs use hierarchy with local optimization per block, accepting some inefficiency for design scalability.

Q39. How do synthesis results change between corner libraries (WC/BC/TT)?

Process corners (WC/BC/TT) represent manufacturing variations and temperature/voltage extremes. Synthesizing against different corners produces different results: WC (worst-case) is slowest but safest; BC (best-case) is fastest but risky; TT (typical) is a middle ground.

When you compile against WC (worst-case: slow process, low voltage, high temperature), the tool assumes every gate is slow. It upsizes aggressively to meet timing, resulting in a larger, higher-power design. This design is guaranteed to work on silicon—even if the fab is worst-case, you’re covered. When you compile against TT (typical), gate delays are faster, so less upsizing is needed—you get a smaller design, but if silicon is worse than TT, timing fails. BC (best-case) produces the smallest design but is very risky. Standard practice: compile against WC for setup, sanity-check TT for typical behavior, and analyze BC for hold (fast silicon is where hold violations appear). Many SoCs compile against WC for the main datapath and TT for non-critical areas to balance safety and area. The corner choice is a business decision: do you want safety margin (WC, bigger chip) or aggressive optimization (TT, risk of failures)? Most teams choose WC for production chips.

💡 Tip: Always compile the same design against all three corners and document the results. If WC is 50% larger than TT, that’s a red flag—maybe your TT library is optimistic or your constraints are wrong. Cross-check corner results to build confidence.
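
Corner selection happens in library setup, not in the SDC (library file names below are hypothetical):

set target_library stdcell_wc_0p72v_125c.db   ;# worst-case: sign-off compile
# set target_library stdcell_tt_0p80v_25c.db  ;# typical: sanity comparison
set link_library   [list * $target_library]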

Q40. What are the most common synthesis failure modes? (top 5 with how to fix each)

Synthesis is robust, but certain mistakes are predictable. Here are the top failure modes I’ve seen:

Failure Mode | Root Cause | Fix
Timing Cannot Close | Clock period too tight, RTL is inherently slow, critical path has 20+ gates | Increase clock period, pipeline the RTL, restructure the 3–5 bottleneck gates, apply set_critical_range to focus compile effort
Area Explosion | Over-tight timing constraints, over-upsizing due to missing false_path declarations, HVT cells not used | Review and tighten SDC constraints, add set_false_path for non-critical paths, default to HVT with LVT only on critical paths, run compile with -area_effort high
Compilation Timeout | Design too large to synthesize in one pass, compile_ultra on 500M gates, hierarchy flattened | Break design into smaller blocks, use hierarchical synthesis, use compile instead of compile_ultra, increase server memory
Unmet Library Requirements | Wrong .lib file (TT instead of WC), missing DesignWare library, black-box Liberty not found | Verify the target_library corner in your setup scripts, ensure DesignWare is licensed and installed, provide Liberty for all black boxes, run check_library to validate the library
Netlist Verification Failure | check_design reports floating pins, undriven nets, clock loops, multi-driven nets | Ensure RTL is elaborated correctly (check module hierarchy), tie off all unused pins with tie cells, verify the clock tree is connected, trace and fix multi-driven nets

📌 Note: The best defense against synthesis failures is a checklist: (1) run check_design immediately after read, (2) validate SDC with check_timing before compile, (3) audit compile logs for warnings, (4) cross-check timing results against PrimeTime, (5) regenerate netlists monthly to catch tool version issues. Prevention beats firefighting.

Interview Cheatsheet: RTL Synthesis Most-Asked Topics by Company

Company / Team | Most-Asked Topics | Key Questions
Synopsys (Design Compiler) | Compile algorithms, multi-level optimization, constraint semantics | Q6, Q11, Q20, Q37
Intel (P-Core Design) | Timing closure, retiming, physical synthesis, corner analysis | Q7, Q22, Q31, Q39, Q40
ARM (RTL Design) | Hierarchical synthesis, ILM generation, area/power tradeoffs | Q32, Q21, Q25, Q26
AMD (Timing Closure) | False paths, multicycle paths, optimization strategy, incremental flows | Q15, Q16, Q33, Q35
TSMC (IP & Foundry) | Liberty files, process corners, cell characterization, DesignWare | Q3, Q23, Q25, Q39
Google (Chip Design) | Tool flow integration, open-source synthesis, power optimization | Q24, Q27, Q28, Q30

Resources & Further Reading

  • Synopsys Design Compiler Documentation — The official reference for DC, compile_ultra, and SDC syntax. Focus on the “Optimization Strategy” section.
  • IEEE/ACM Papers on Synthesis Algorithms — Search for “technology mapping,” “retiming,” and “multi-level boolean optimization” for deeper theory.
  • Design Compiler Tutorial (Synopsys) — Many companies have internal versions. Find the “SDC Constraints” chapter and the “Timing Analysis” guide.
  • Liberty Format Manual — Understand what’s in a .lib file. Essential for debugging library issues.
  • PrimeTime User Guide — Post-synthesis timing verification. Run PrimeTime on your synthesized netlist to cross-check DC reports.
  • On-the-Job Practice — Nothing beats real synthesis runs. Volunteer for the synthesis rotations on your team’s next project.

Final Interview Tip: When an interviewer asks a synthesis question, they want to hear that you understand *why* a decision matters, not just *what* the tool does. Mention a real project, a failure you debugged, or a trade-off you evaluated. “I’ve shipped three tapeouts…” beats “The textbook says…” every single time.
