AXI Interview Questions for VLSI Interviews

By Raju Gorla | 28 February 2024 | Updated: 20 March 2026

I’ve pulled together 40 AXI protocol interview questions based on design work with ARM, Xilinx, Microchip, and companies building custom interconnects. AXI is the backbone of most modern SoCs — if you’re designing with ARM IP, you’re working with AXI. The tricky part isn’t memorizing the spec; it’s understanding why certain design decisions were made and how they interact with your interconnect and system performance.

💡 Who This Is For: Digital and SoC design engineers interviewing at ARM, Qualcomm, Nvidia, AMD, or companies designing interconnects and memory controllers. If you’ve built AXI slaves, masters, or crossbars, or debugged performance issues with outstanding transactions, this guide is directly relevant to your work.

Table of Contents

  • Quick Navigation
  • Section 1: AXI Fundamentals
    • Q1. What is AMBA? Where does AXI fit in the AMBA family?
    • Q2. What are the 5 AXI channels? What does each carry? Show ASCII block diagram.
    • Q3. Explain the VALID/READY handshake. What are the rules? Show timing diagram.
    • Q4. Walk through a complete AXI write transaction — all 3 channels involved.
    • Q5. Walk through a complete AXI read transaction — both channels.
    • Q6. What is AxID? How does out-of-order completion work? Show interleaved transactions.
    • Q7. Burst types — FIXED, INCR, WRAP — when is each used? Show address sequence.
    • Q8. What is WSTRB? What's the minimum transaction unit in AXI?
    • Q9. What is AxLen, AxSize, AxBurst? How do you calculate the number of beats?
    • Q10. What are AXI4 differences from AXI3?
  • Section 2: Transactions & Performance
    • Q11. How do outstanding transactions improve throughput? How many can be in flight?
    • Q8. AxLEN and AxSIZE — how do you calculate total bytes transferred?
    • Q9. WSTRB (write strobe) — what is it? Give an example of a 32-bit write with only upper 2 bytes valid.
    • Q10. RRESP/BRESP — what do OKAY/EXOKAY/SLVERR/DECERR mean?
    • Q11. What are outstanding transactions? How do they improve throughput?
    • Q12. How does AXI handle out-of-order responses? What is the ID matching rule?
    • Q13. What is write data interleaving? (AXI3 vs AXI4 difference)
    • Q14. What is AxLOCK (exclusive access)? How does it implement atomic operations?
    • Q15. AXI ordering model — what guarantees does AXI make about transaction ordering?
    • Q16. How do you calculate AXI bus bandwidth? Give a formula and example.
    • Q17. What is AXI back-pressure? How does a slave signal it? Show timing diagram.
    • Q18. Write data before address — is it allowed? (AXI4 rules)
    • Q19. AXI QoS signals — what are they used for?
    • Q20. What is address alignment in AXI bursts? What happens with an unaligned start address?
  • Section 3: AXI Variants
    • Q21. AXI4 vs AXI3 — what changed? (burst interleaving, ID width, WID removal)
    • Q22. AXI-Lite — what are the restrictions vs AXI4? When do you use it?
    • Q23. AXI-Stream — what channels does it have? Show signal table.
    • Q24. AXI-Stream packet framing — how does TLAST work? TKEEP vs TSTRB?
    • Q25. ACE (AXI Coherency Extensions) — what cache states does it add?
    • Q26. ACE-Lite — how is it different from full ACE? When does a master use ACE-Lite?
    • Q27. What is an AXI crossbar/interconnect? How does it arbitrate multiple masters?
    • Q28. What is an AXI register slice? Why insert one? (pipelining, timing closure)
    • Q29. AXI CDC (clock domain crossing bridge) — how is it implemented?
    • Q30. CHI (Coherent Hub Interface) vs AXI — what problems does CHI solve?
  • Section 4: Design & Verification
    • Q31. How do you design an AXI slave interface? Show state machine for read channel.
    • Q32. Common AXI implementation bugs (top 5) — table with bug/impact/fix.
    • Q33. How does an AXI arbiter work? Show round-robin vs priority arbitration.
    • Q34. AXI protocol checker — what violations does it catch?
    • Q35. AXI performance bottlenecks — how do you identify them?
    • Q36. AXI in low-power designs — clock gating AXI channels
    • Q37. AXI debug — how do you debug a hung AXI transaction? Checklist.
    • Q38. AXI in FPGA (Xilinx SmartConnect / Microchip CoreAXI) vs ASIC
    • Q39. NIC-400 / CoreLink — what does an AXI interconnect IP provide?
    • Q40. AXI-Stream FIFO design — how do you handle backpressure from the slave?
  • Interview Cheatsheet: AXI by Company
  • Key Resources

Quick Navigation

  • Section 1: AXI Fundamentals (Q1–Q10)
  • Section 2: Transactions & Performance (Q11–Q20)
  • Section 3: AXI Variants (Q21–Q30)
  • Section 4: Design & Verification (Q31–Q40)
  • Interview Cheatsheet

Section 1: AXI Fundamentals

Q1. What is AMBA? Where does AXI fit in the AMBA family?

AMBA (Advanced Microcontroller Bus Architecture) is ARM’s ecosystem of on-chip interconnect protocols. It includes APB (Advanced Peripheral Bus — simple, low-bandwidth), AHB (Advanced High-performance Bus — predecessor to AXI), and AXI (Advanced eXtensible Interface — modern, high-performance).

AXI is the workhorse of modern SoCs. It supports multiple outstanding transactions, out-of-order completion, and wide data paths (up to 1024 bits). If you’re building a data center SoC or GPU, you’re using AXI or a variant (ACE for coherent, AXI-Stream for data-flow, CHI for next-gen coherence).

Q2. What are the 5 AXI channels? What does each carry? Show ASCII block diagram.

AXI is a 5-channel interface between a Master (like a CPU) and a Slave (like memory). Each channel is independent and operates via a VALID/READY handshake:

Master                                Slave
  |------- Write Address Channel ------>|
  |     (AWADDR, AWID, AWLEN, ...)      |
  |------- Write Data Channel --------->|
  |     (WDATA, WSTRB, WLAST)           |
  |<------ Write Response Channel ------|
  |            (BRESP, BID)             |
  |                                     |
  |------- Read Address Channel ------->|
  |     (ARADDR, ARID, ARLEN, ...)      |
  |<------ Read Data Channel -----------|
  |     (RDATA, RRESP, RID, RLAST)      |

Channels (independent, can flow in parallel):
1. Write Address (AW)  → address, length, width, ID
2. Write Data (W)      → data payload, write strobes, last flag
3. Write Response (B)  → status (OKAY/EXOKAY/SLVERR/DECERR)
4. Read Address (AR)   → address, length, width, ID
5. Read Data (R)       → data payload, status, last flag

The key insight: address and data are decoupled. You can send an address and then data on different cycles. This allows pipelines and efficient buffering.

Q3. Explain the VALID/READY handshake. What are the rules? Show timing diagram.

VALID and READY are the core AXI handshake signals. On each channel, the transmitter asserts VALID when it has valid data; the receiver asserts READY when it can accept data. When both are high on a clock edge, a transfer occurs. This is sometimes called "ready-valid" or "acknowledge-enabled" handshaking.

Clock:    _|‾|_|‾|_|‾|_|‾|_|‾|_|‾|
VALID:    _|‾‾‾‾‾‾|_______|‾‾‾|_
READY:    _______|‾‾|_|‾‾‾‾‾|___
Data:     xxxx[A][A][B][C][C][D]xxxx
              ↑     ↑ ↑ ↑   ↑
            xfer  wait xfers xfer

Rules:
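- VALID can be high for multiple cycles while READY is low
- When VALID & READY are both high on a clock edge → one transfer
- Once asserted, VALID must stay high and the payload must stay stable until the transfer occurs
- The sender must not wait for READY before asserting VALID (this avoids deadlock)
- The receiver may wait for VALID before asserting READY, or assert READY in advance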

In practice, READY is usually a combinational function of FIFO occupancy or pipeline depth, allowing the slave to assert READY when it has buffer space.
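The handshake rules can be checked against a recorded trace. The following is a minimal Python sketch, not a protocol checker: `count_transfers` and `valid_is_stable` are hypothetical helper names, and each trace is assumed to be a per-cycle list of 0/1 samples taken at the rising clock edge.

```python
def count_transfers(valid, ready):
    """Return the cycle indices where a transfer occurs (VALID and READY both high)."""
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


def valid_is_stable(valid, ready):
    """Check that VALID, once asserted, stays high until the handshake completes."""
    for cycle in range(len(valid) - 1):
        if valid[cycle] and not ready[cycle] and not valid[cycle + 1]:
            return False  # VALID dropped before READY was seen
    return True


# Hypothetical trace: the sender holds VALID while the receiver stalls for 2 cycles
valid = [1, 1, 1, 0, 1, 1]
ready = [0, 0, 1, 1, 1, 0]
print(count_transfers(valid, ready))  # [2, 4]
print(valid_is_stable(valid, ready))  # True
```

Dividing the number of transfers by the trace length gives the channel utilization for that window.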

💡 Tip: Many candidates get the handshake backwards. A transmitter can hold VALID high indefinitely; the receiver's READY controls flow. If you design a slave that waits for VALID to deassert before accepting the next transaction, you've missed the entire point of the handshake.

Q4. Walk through a complete AXI write transaction — all 3 channels involved.

Let's say a master wants to write 8 bytes to address 0x1000:

Cycle 0:
  AW Channel: AWVALID=1, AWADDR=0x1000, AWLEN=0 (1 beat), AWSIZE=3 (8 bytes), AWID=1
  W Channel:  WVALID=1, WDATA=0xDEADBEEFCAFEBABE, WSTRB=0xFF, WLAST=1
  Slave:      AWREADY=1, WREADY=1 (can accept)
  → Both channels transfer

Cycle 1:
  AW & W channels idle (transferred)
  B Channel:  BVALID=1, BRESP=OKAY, BID=1
  Master:     BREADY=1 (ready for response)
  → Write response returns with matching ID

Complete write transaction requires:
1. Address (AW) transmitted
2. Data (W) transmitted (before, with, or after the address)
3. Response (B) received (only after the slave has accepted the address and the last data beat)

The key point: address and data travel on separate channels and either may be accepted first, while the response carries the ID so the master knows which write it's acknowledging.

Q5. Walk through a complete AXI read transaction — both channels.

A master reads 4 beats (16 bytes) from address 0x2000:

Cycle 0:
  AR Channel: ARVALID=1, ARADDR=0x2000, ARLEN=3 (4 beats), ARSIZE=2 (4 bytes), ARID=2
  Slave:      ARREADY=1
  → Read address accepted

Cycles 1-4:
  R Channel:  RVALID=1, RDATA=[beat0], RLAST=0, RID=2
              RVALID=1, RDATA=[beat1], RLAST=0, RID=2
              RVALID=1, RDATA=[beat2], RLAST=0, RID=2
              RVALID=1, RDATA=[beat3], RLAST=1, RID=2
  Master:     RREADY=1 for all beats

Read transactions can have multiple beats:
- ARLEN specifies number of beats - 1
- Each beat returns with ID and RLAST indicating final beat
- Slave determines beat pace (pipelining allowed)

Q6. What is AxID? How does out-of-order completion work? Show interleaved transactions.

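AxID (AWID/ARID) is a tag the master attaches to each transaction. It need not be unique: transactions that share an ID must complete in order, while transactions with different IDs may complete in any order. In a system with multiple outstanding transactions, the ID lets the master match responses to requests even if they complete out-of-order.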

Master sends:
  Cycle 0: ARADDR=0x1000, ARID=1, ARLEN=0 (1 beat)
  Cycle 1: ARADDR=0x2000, ARID=2, ARLEN=0 (1 beat)

Slave (out-of-order) returns:
  Cycle 3: RDATA=data_from_2000, RID=2 ← completed in different order!
  Cycle 4: RDATA=data_from_1000, RID=1

Master matches RID to original request:
  RID=2 → data for 0x2000
  RID=1 → data for 0x1000

Key rule: within a single ID, responses must be in-order.
Different IDs can be out-of-order. This allows efficiency:
if ID=1 hits slow memory, ID=2 can complete from fast cache.

This is powerful for SoCs where the interconnect has multiple slave devices with different latencies.
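A master-side view of ID matching can be sketched as one queue per ID. This is an illustrative Python model, assuming single-beat reads; `ReadTracker`, `issue`, and `complete` are hypothetical names, not part of any AXI library.

```python
from collections import defaultdict, deque


class ReadTracker:
    """Tracks outstanding reads per ARID so responses can be matched by RID."""

    def __init__(self):
        self.pending = defaultdict(deque)  # ARID -> addresses in issue order

    def issue(self, arid, addr):
        self.pending[arid].append(addr)

    def complete(self, rid):
        """Return the address that the response with this RID belongs to."""
        if not self.pending[rid]:
            raise ValueError(f"response with RID={rid} has no outstanding request")
        # Same-ID responses are in order, so the oldest request is the match
        return self.pending[rid].popleft()


tracker = ReadTracker()
tracker.issue(1, 0x1000)
tracker.issue(2, 0x2000)
print(hex(tracker.complete(2)))  # 0x2000 (returned first, out of order)
print(hex(tracker.complete(1)))  # 0x1000
```

Because each ID has its own FIFO, the model also covers several outstanding transactions that share one ID: they are simply matched oldest-first.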

Q7. Burst types — FIXED, INCR, WRAP — when is each used? Show address sequence.

AXI supports three burst types, controlling how the address increments:

INCR (Incrementing): Address increments for each beat (standard mode). For a 64-bit bus, burst of 4 at 0x1000 goes to 0x1000, 0x1008, 0x1010, 0x1018.

FIXED: Address stays the same. Used for FIFO-like interfaces where you write to the same address multiple times (e.g., pushing multiple items to a queue register).

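WRAP (Wrapping): Address increments, but wraps at a boundary equal to the total burst size (beats × bytes per beat). For a 4-beat burst of 4-byte beats starting at 0x1004, the boundary is 16 bytes, so the sequence wraps back to 0x1000: 0x1004, 0x1008, 0x100C, 0x1000. The start address must be aligned to the beat size.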

Burst Type | Use Case | Example (4 beats, 8 bytes per beat)
INCR | Standard memory, DMA | 0x1000, 0x1008, 0x1010, 0x1018
FIXED | FIFO register access | 0x1000, 0x1000, 0x1000, 0x1000
WRAP | Cache line fill | 0x1010, 0x1018, 0x1000, 0x1008 (wraps at the 32-byte boundary)
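The three address sequences can be generated with a few lines of Python. This is a sketch that follows the address rules described above; `burst_addresses` is a hypothetical helper, and its arguments mirror AxADDR, AxLEN, AxSIZE, and AxBURST.

```python
def burst_addresses(start, length, size, burst):
    """Beat addresses for an AXI burst.

    start: AxADDR, length: AxLEN (beats - 1), size: AxSIZE (log2 of bytes
    per beat), burst: "FIXED", "INCR" or "WRAP".
    """
    beats = length + 1
    nbytes = 1 << size
    if burst == "FIXED":
        return [start] * beats
    aligned = start & ~(nbytes - 1)
    if burst == "INCR":
        # The first beat uses the start address as given; later beats are aligned
        return [start] + [aligned + i * nbytes for i in range(1, beats)]
    if burst == "WRAP":
        if start != aligned or beats not in (2, 4, 8, 16):
            raise ValueError("WRAP needs a size-aligned start and 2, 4, 8 or 16 beats")
        total = beats * nbytes
        lower = start & ~(total - 1)  # wrap boundary
        return [lower + (start - lower + i * nbytes) % total for i in range(beats)]
    raise ValueError(f"unknown burst type: {burst}")


# 4 beats of 4 bytes starting at 0x1004 wrap inside the 16-byte window at 0x1000
print([hex(a) for a in burst_addresses(0x1004, 3, 2, "WRAP")])
# ['0x1004', '0x1008', '0x100c', '0x1000']
```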

Q8. What is WSTRB? What's the minimum transaction unit in AXI?

WSTRB (Write Strobe) is a byte-enable mask: each bit corresponds to a byte in WDATA. Bit 0 enables byte 0, bit 7 enables byte 7 (for 64-bit bus). If WSTRB bit is 0, the slave ignores that byte. This allows byte-granular writes without reading first.

Minimum transaction unit is 1 byte (byte-enable = 1 bit), but typically the bus uses 32-bit or 64-bit alignment. You can write a single byte by setting only one WSTRB bit.

Q9. What is AxLen, AxSize, AxBurst? How do you calculate the number of beats?

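AxLen = number of beats - 1. In AXI3 the field is 4 bits (0–15, up to 16 beats); in AXI4 it is 8 bits (0–255, up to 256 beats) for INCR bursts, while FIXED and WRAP bursts stay limited to 16 beats. So AxLen=3 means 4 beats. AxSize = log2(bytes per beat): 0 for 1 byte, 1 for 2 bytes, 3 for 8 bytes. AxBurst = burst type (FIXED=0, INCR=1, WRAP=2).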

Total bytes transferred = (AxLen + 1) × 2^AxSize. Example: ARLEN=7, ARSIZE=3 → (7+1) × 8 = 64 bytes.

Q10. What are AXI4 differences from AXI3?

AXI4 (released 2010, still dominant) makes these changes relative to AXI3: longer bursts (INCR up to 256 beats vs 16 in AXI3), no write data interleaving (so the WID signal is gone, which simplifies hardware), added QoS signals, no locked transactions (AxLOCK shrinks to one bit and only exclusive access remains), and clarified 4KB boundary rules. AXI3 let data from multiple write transactions interleave on the W channel, which is complex to build and verify. For most designs, AXI4 is the better choice for its simplicity.

Section 2: Transactions & Performance

Q11. How do outstanding transactions improve throughput? How many can be in flight?

An outstanding transaction is one that has been sent but whose response hasn't been received yet. AXI allows multiple transactions to be in flight simultaneously, so the master doesn't have to wait for each to complete before sending the next. This is the core advantage of AXI over AHB.

For a system with 100 ns memory latency and 1 ns clock period, a single-transaction system is limited to 10 million transactions/second. With 100 outstanding transactions, you saturate the bus and achieve near-peak bandwidth. Modern ARM cores have 16–32+ outstanding read transactions (and fewer write) to hide this latency.
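The latency-hiding argument is simple arithmetic. The sketch below assumes single-beat transactions and a fixed round-trip latency; `bus_utilization` and `outstanding_to_saturate` are hypothetical names chosen for illustration.

```python
import math


def bus_utilization(outstanding, latency_ns, clock_ns):
    """Fraction of peak throughput reachable with a given number of
    outstanding single-beat transactions (capped at 1.0)."""
    return min(1.0, outstanding * clock_ns / latency_ns)


def outstanding_to_saturate(latency_ns, clock_ns):
    """Smallest number of in-flight transactions that keeps the bus busy."""
    return math.ceil(latency_ns / clock_ns)


print(bus_utilization(1, 100, 1))       # 0.01 (10 million of 1 billion per second)
print(bus_utilization(100, 100, 1))     # 1.0
print(outstanding_to_saturate(100, 1))  # 100
```

With bursts, each transaction occupies the data channel for several cycles, so fewer outstanding transactions are needed to reach the same utilization.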

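📌 Note: The number of outstanding transactions is limited by buffer depth and by how many transactions each master, slave, and interconnect port is built to track. The ID space limits something different: how many independently ordered streams you can have. With only 8 IDs you can still have more than 8 transactions in flight, but transactions that share an ID must complete in order. This is why many high-performance systems use 8–16 bit IDs.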


Q8. AxLEN and AxSIZE — how do you calculate total bytes transferred?

AxLEN (0–255 in AXI4) specifies the number of beats minus 1. AxSIZE (0–7) specifies the bytes per beat as a power of two: 0=1B, 1=2B, 2=4B, 3=8B, up to 7=128B.

Total bytes = (AxLEN + 1) × (2^AxSIZE)

Example: ARLEN=7, ARSIZE=3 (8 bytes per beat) → (7+1) × 8 = 64 bytes.

This decoupling allows flexible transaction sizes. A system might have a 64-bit data bus, but you can specify single-byte accesses (AxSIZE=0) or burst entire cache lines (AxSIZE=3 with AxLEN=7 for a 512-bit transaction).
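The size formula, together with the AXI rule that a burst must not cross a 4KB address boundary, can be sketched in Python as follows. `burst_bytes` and `crosses_4kb` are hypothetical helper names, and the boundary check assumes an INCR burst.

```python
def burst_bytes(axlen, axsize):
    """Total bytes in a burst: (AxLEN + 1) * 2**AxSIZE."""
    return (axlen + 1) * (1 << axsize)


def crosses_4kb(start, axlen, axsize):
    """True if an INCR burst starting at `start` would cross a 4KB boundary."""
    nbytes = 1 << axsize
    aligned = start & ~(nbytes - 1)
    last_byte = aligned + burst_bytes(axlen, axsize) - 1
    return (start >> 12) != (last_byte >> 12)


print(burst_bytes(7, 3))          # 64
print(crosses_4kb(0x0FC0, 7, 3))  # False: ends at 0x0FFF
print(crosses_4kb(0x0FC8, 7, 3))  # True: runs into 0x1000
```

A DMA engine would typically split a transfer that fails this check into two bursts at the boundary.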

Q9. WSTRB (write strobe) — what is it? Give an example of a 32-bit write with only upper 2 bytes valid.

WSTRB (Write Strobe) is a per-byte enable signal. Each bit corresponds to one byte of WDATA. If WSTRB[3:0] = 0xC (binary 1100), only bytes [31:24] and [23:16] are written; bytes [15:8] and [7:0] are ignored.

32-bit write to address 0x1000, write only upper 2 bytes:
WDATA     = 0xDEADBEEF
WSTRB[3:0] = 0xC (binary 1100)
                  ↑↑   ↑↑
            bytes [31:24] and [23:16] written
            bytes [15:8] and [7:0] NOT written

This is crucial for sub-aligned writes. If you want to write a 16-bit value to an unaligned address, the slave can update only the relevant bytes without corrupting others.
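On the slave side, a strobed write amounts to a per-byte merge of new and old data. The Python sketch below models that merge for one word; `apply_wstrb` is a hypothetical helper, and the old memory content 0x11223344 is an assumed value for illustration.

```python
def apply_wstrb(old_word, wdata, wstrb, nbytes=4):
    """Merge wdata into old_word, byte lane by byte lane, under control of wstrb."""
    result = old_word
    for lane in range(nbytes):
        if (wstrb >> lane) & 1:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (wdata & mask)
    return result


# Upper two bytes come from WDATA, lower two bytes keep the old content
print(hex(apply_wstrb(0x11223344, 0xDEADBEEF, 0xC)))  # 0xdead3344
```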

Q10. RRESP/BRESP — what do OKAY/EXOKAY/SLVERR/DECERR mean?

Response status codes indicate transaction outcome:

Code | Meaning | Recoverable?
OKAY | Success | Yes (normal completion)
EXOKAY | Exclusive access succeeded | Yes (atomic op succeeded)
SLVERR | Slave error (bad address, access denied) | No (retry won't help)
DECERR | Decode error (interconnect couldn't route) | No (address unmapped)


Q11. What are outstanding transactions? How do they improve throughput?

Outstanding transactions are multiple transactions in flight simultaneously. For example, a master can send Read Request 1, then Read Request 2, before receiving Data 1. The slave processes both concurrently and returns data in any order (matched by ID).

Without outstanding transactions (blocking mode): Master sends request, waits for response, then sends next request. Throughput = 1 transaction per round-trip latency. With outstanding transactions: Master pipelines requests, hiding latency. Throughput = N transactions per round-trip, where N = number of outstanding transactions allowed.

This is why high-performance SoCs allow dozens or hundreds of outstanding transactions — it maximizes interconnect utilization and hides memory latency.

Q12. How does AXI handle out-of-order responses? What is the ID matching rule?

Responses are matched to requests using the ID field. When the master sends a read with ARID=3, it expects to get back RDATA with RID=3. Within a single ID, responses must be in-order (beat 0, beat 1, beat 2), but different IDs can return in any order.

This is enforced at protocol level — a slave cannot send RID=3 responses out-of-order with respect to other RID=3 responses, but it can interleave RID=3 and RID=4 responses.

📌 Note: This rule is crucial for interconnect design. Some interconnects (like crossbars) naturally preserve ID-based ordering; others require specific routing policies to ensure it.

Q13. What is write data interleaving? (AXI3 vs AXI4 difference)

In AXI3, write data beats from different write commands could be interleaved on the W channel, with the WID signal identifying which command each beat belonged to. This allowed complex pipelining but was hard to verify. AXI4 removed interleaving and WID: all beats for a write command must come as a contiguous group on the W channel, in the same order as the addresses on AW.

Example (AXI3 allowed this, AXI4 does not):

AXI3 (interleaved allowed):
  Cycle 0: W beat0_cmd1, WLAST=0
  Cycle 1: W beat0_cmd2, WLAST=0  ← different command
  Cycle 2: W beat1_cmd1, WLAST=1
  Cycle 3: W beat1_cmd2, WLAST=1

AXI4 (no interleaving):
  Cycle 0: W beat0_cmd1, WLAST=0
  Cycle 1: W beat1_cmd1, WLAST=1  ← must finish before cmd2
  Cycle 2: W beat0_cmd2, WLAST=0
  Cycle 3: W beat1_cmd2, WLAST=1

AXI4 simplified this significantly, though it reduced pipelining flexibility. Most modern designs use AXI4.

Q14. What is AxLOCK (exclusive access)? How does it implement atomic operations?

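AxLOCK marks an exclusive access, the AXI mechanism behind atomic read-modify-write sequences. It does not lock the memory location. Instead, when a master issues a read with ARLOCK=1, an exclusive monitor at the slave (or in the interconnect) records the address and the master's ID, and other masters remain free to access that location.

Typical flow: the master performs an exclusive read (ARLOCK=1), modifies the data locally, then performs an exclusive write (AWLOCK=1) to the same address. If no other master has written the location in between, the write updates memory and the slave returns EXOKAY. If another master did write it, the exclusive write is not performed and the slave returns OKAY, which tells the master the read-modify-write failed and must be retried.

In practice this is what backs load-exclusive/store-exclusive instruction pairs (LDREX/STREX on ARM), from which software builds locks with acquire and release semantics. AXI3 also defined locked transactions (AxLOCK=2'b10) that did block other masters; AXI4 removed them.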
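The behavior of an exclusive monitor, as the AXI specification describes it, can be modeled in a few lines: an exclusive write succeeds (EXOKAY, memory updated) only if the master's reservation is still intact, and otherwise fails with OKAY and leaves memory untouched. This Python sketch is a simplification: `ExclusiveMonitor` is a hypothetical class, reservations are tracked per exact address, and each master holds at most one reservation.

```python
class ExclusiveMonitor:
    """Simplified model of a slave-side exclusive access monitor."""

    def __init__(self):
        self.reserved = {}  # master ID -> address it has reserved
        self.memory = {}    # address -> value

    def exclusive_read(self, master, addr):
        self.reserved[master] = addr
        return self.memory.get(addr, 0), "EXOKAY"

    def write(self, master, addr, value, exclusive=False):
        if exclusive and self.reserved.get(master) != addr:
            return "OKAY"  # exclusive write failed: memory is NOT updated
        self.memory[addr] = value
        # Any successful write to addr clears every reservation on it
        self.reserved = {m: a for m, a in self.reserved.items() if a != addr}
        return "EXOKAY" if exclusive else "OKAY"


mon = ExclusiveMonitor()
mon.exclusive_read("cpu0", 0x100)
mon.exclusive_read("cpu1", 0x100)
print(mon.write("cpu1", 0x100, 5, exclusive=True))  # EXOKAY: cpu1 wins
print(mon.write("cpu0", 0x100, 7, exclusive=True))  # OKAY: cpu0 must retry
```

The losing master sees OKAY rather than an error response, so software retries the whole read-modify-write loop.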

Q15. AXI ordering model — what guarantees does AXI make about transaction ordering?

Within the same ID, responses are in-order. Different IDs can be out-of-order. Writes to the same address might not be serialized if they have different IDs, because the interconnect can forward them concurrently. To ensure strict ordering, the master should use the same ID for dependent transactions, wait for the first response before issuing the second, or rely on system-level barriers (memory fences).

AXI doesn't provide "memory fence" primitives directly — that's handled at the system level (CPU instruction or software barrier).

Q16. How do you calculate AXI bus bandwidth? Give a formula and example.

Formula: Bandwidth = (Data Width in bits / 8) × Clock Frequency × Utilization

Example: 128-bit AXI bus at 500 MHz with 80% utilization:

Bandwidth = (128 / 8) × 500 MHz × 0.8 = 16 × 500M × 0.8 = 6.4 GB/s

Key factors:
- Data width (bits): bigger = more throughput
- Clock frequency: faster = more throughput
- Utilization: VALID & READY not always high → reduced effective bandwidth
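The formula translates directly into code. This Python sketch simply restates it; `axi_bandwidth_bytes_per_s` is a hypothetical helper name.

```python
def axi_bandwidth_bytes_per_s(data_width_bits, clock_hz, utilization):
    """Effective bandwidth = bytes per beat * beats per second * utilization."""
    return (data_width_bits / 8) * clock_hz * utilization


bw = axi_bandwidth_bytes_per_s(128, 500e6, 0.8)
print(f"{bw / 1e9:.1f} GB/s")  # 6.4 GB/s
```

Read and write data channels are independent, so this figure applies per direction.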

Real designs often measure this with VIP (verification IP) to understand bottlenecks.

Q17. What is AXI back-pressure? How does a slave signal it? Show timing diagram.

Back-pressure occurs when a slave is busy and cannot accept new transactions. It signals this by holding READY low. The master, seeing READY=0, stalls and waits.

Clock:     _|‾|_|‾|_|‾|_|‾|_|‾|_
AWVALID:   _|‾‾‾‾‾‾‾‾‾‾|_
AWREADY:   _|‾|_|‾‾‾‾|_|‾
           Transfer  pause   transfer

Slave reasons for back-pressure:
- Write buffer full
- FIFO full
- Processing transaction in progress
- Memory controller busy

Q18. Write data before address — is it allowed? (AXI4 rules)

AXI4 allows write data to arrive before write address. The slave can buffer the data and process it once the address arrives. The protocol sets no ordering requirement between the AW and W handshakes of the same transaction.

However, the WLAST signal must match the address's AxLEN — the slave knows when a write is complete by counting beats and seeing WLAST. If address and data are out-of-order, the slave must correlate them properly, which adds complexity.

Most designs don't exploit this because it complicates the slave; instead, they keep address and data in-order by gating the W channel until after AW.

Q19. AXI QoS signals — what are they used for?

AxQOS (4-bit Quality of Service) allows the master to specify transaction priority. Higher AxQOS values indicate higher priority. Interconnects and crossbars can use AxQOS to arbitrate between multiple masters — high-priority transactions get through first.

For example, in a GPU SoC, the GPU's memory requests might have AxQOS=15 (highest), while the CPU has AxQOS=8 (medium). The interconnect prioritizes GPU traffic.

This is optional (not all interconnects implement AxQOS arbitration) and is system-specific.

Q20. What is address alignment in AXI bursts? What happens with an unaligned start address?

AXI doesn't require address alignment for INCR bursts (WRAP bursts must start at an address aligned to the beat size). You can issue a read at address 0x1003 and read 4 beats of 8 bytes each. The slave interprets the start address modulo the beat size to determine which bytes are valid on the first beat.

Example: ARADDR=0x1003, ARSIZE=3 (8B), ARLEN=3 (4 beats):

Beat 0: Address 0x1003 (unaligned)
        Only bytes [7:3] returned (5 bytes)
Beat 1: Address 0x1008 (aligned)
        All 8 bytes
Beat 2: Address 0x1010
        All 8 bytes
Beat 3: Address 0x1018
        All 8 bytes

Many slaves require aligned addresses; checking the datasheet is essential.
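The byte lanes used on each beat of an unaligned INCR burst follow from the start address and the beat size. The Python sketch below reproduces the example above; `active_byte_lanes` is a hypothetical helper, and `bus_bytes` is the data bus width in bytes.

```python
def active_byte_lanes(start, axlen, axsize, bus_bytes):
    """For an INCR burst, return (address, low_lane, high_lane) for each beat."""
    nbytes = 1 << axsize
    aligned = start & ~(nbytes - 1)
    beats = []
    for i in range(axlen + 1):
        # Only the first beat keeps the unaligned address
        addr = start if i == 0 else aligned + i * nbytes
        low = addr % bus_bytes
        high = (addr & ~(nbytes - 1)) % bus_bytes + nbytes - 1
        beats.append((addr, low, high))
    return beats


for addr, low, high in active_byte_lanes(0x1003, 3, 3, 8):
    print(hex(addr), f"bytes [{high}:{low}]")
# 0x1003 bytes [7:3]
# 0x1008 bytes [7:0]
# 0x1010 bytes [7:0]
# 0x1018 bytes [7:0]
```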

Section 3: AXI Variants

Q21. AXI4 vs AXI3 — what changed? (burst interleaving, ID width, WID removal)

Key differences:

Feature | AXI3 | AXI4
Write data interleaving | Allowed (complex) | Not allowed (simpler)
WID signal | Yes (write ID on W channel) | No (removed, not needed)
Max burst length | 16 beats | 256 beats (INCR only)
ID width | Implementation-defined | Implementation-defined (same)
Max data width | 1024 bits | 1024 bits (same)

AXI4 is cleaner and easier to implement. AXI3 is legacy (rarely used in new designs).

Q22. AXI-Lite — what are the restrictions vs AXI4? When do you use it?

AXI-Lite is a simplified version of AXI4 for low-bandwidth control interfaces:

Restriction | AXI4 | AXI-Lite
Burst length | 1–256 beats | 1 beat only
Burst types | INCR, FIXED, WRAP | None (no burst signals; every access is a single beat)
Data width | 8 to 1024 bits | 32 or 64 bits
Outstanding trans. | Multiple, out-of-order across IDs | In-order only (simple slaves often accept one at a time)
ID signals | Yes, multiple IDs | No ID

Use AXI-Lite for register access (control interfaces, status registers). Use AXI4 for high-throughput data paths (memory, caches, DMA).

Q23. AXI-Stream — what channels does it have? Show signal table.

AXI-Stream is for unidirectional data flow (no address, just data). It's ideal for DSP pipelines, video processing, or any scenario where data flows in one direction without random access.


Signal | Width | Purpose
TDATA | User-defined | Payload data
TVALID | 1 | Data valid
TREADY | 1 | Ready to accept
TLAST | 1 | Last beat of packet
TKEEP | TDATA width / 8 | Byte validity (which bytes are part of the stream; used for variable-length packets)
TSTRB | TDATA width / 8 | Byte type (data byte vs position byte)
TID | User-defined | Stream ID (multiplex streams)
TDEST | User-defined | Destination routing

Q24. AXI-Stream packet framing — how does TLAST work? TKEEP vs TSTRB?

TLAST marks the last beat of a packet. For example, a 100-byte Ethernet frame might be sent as 13 beats of 8 bytes, with TLAST high only on beat 13.

TKEEP: Per-byte valid flag. If TKEEP[2]=0, byte 2 is not part of the packet (used for variable-length last beats). If a 100-byte packet is split into 12 beats of 8 bytes + 1 beat of 4 bytes, the last beat has TKEEP=0x0F (only 4 bytes valid).
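The TLAST and TKEEP values for a packet follow directly from its length and the bus width. This Python sketch assumes packed data where only the final beat may be partial; `stream_beats` is a hypothetical helper name.

```python
def stream_beats(packet_bytes, bus_bytes):
    """Return a list of (TKEEP, TLAST) pairs, one per beat of the packet."""
    if packet_bytes <= 0:
        raise ValueError("packet must contain at least one byte")
    full, rem = divmod(packet_bytes, bus_bytes)
    full_keep = (1 << bus_bytes) - 1
    beats = [(full_keep, 0)] * full
    if rem:
        beats.append(((1 << rem) - 1, 1))  # partial last beat
    else:
        beats[-1] = (full_keep, 1)         # last beat is a full beat
    return beats


beats = stream_beats(100, 8)
print(len(beats))          # 13
print(hex(beats[-1][0]))   # 0xf (TKEEP=0x0F on the last beat)
print(beats[-1][1])        # 1  (TLAST)
```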

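TSTRB: Per-byte qualifier that distinguishes data bytes from position bytes. With TKEEP=1 and TSTRB=1 the byte is valid data; with TKEEP=1 and TSTRB=0 it is a position byte that holds a place in the stream but carries no data. TSTRB is less common in AXI-Stream.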

Q25. ACE (AXI Coherency Extensions) — what cache states does it add?

ACE extends AXI with coherency support for multi-master caching. It adds cache-related signals and allows masters to share memory coherently.

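ACE defines five cache line states: UniqueDirty, SharedDirty, UniqueClean, SharedClean, and Invalid, which map roughly onto MOESI (Modified, Owned, Exclusive, Shared, Invalid). On top of the AXI channels it adds snoop channels (AC, CR, CD) and extra address-channel signals (AxDOMAIN, AxSNOOP, AxBAR) that describe the shareability domain and the type of coherent transaction. The coherent interconnect uses snoops to enforce that if one master modifies a line, other caches invalidate their copies.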

ACE is used in systems with multiple CPUs or CPU+GPU caches that must remain coherent.

Q26. ACE-Lite — how is it different from full ACE? When does a master use ACE-Lite?

ACE-Lite is a simplified coherency interface for devices that don't need full cache coherency. An ACE-Lite master can issue coherent transactions but doesn't handle snoop responses (another master invalidating its cache).

For example, a GPU might use ACE-Lite to read cached data from main memory coherently, but it doesn't need to handle snoops from other GPUs.

Q27. What is an AXI crossbar/interconnect? How does it arbitrate multiple masters?

An AXI crossbar is a matrix of multiplexers that routes transactions from N masters to M slaves. Each master can target any slave independently.

Arbitration: When multiple masters want to access the same slave, the crossbar uses an arbiter (round-robin, fixed priority, or weighted). The selected master gets access; others stall (their AWREADY/ARREADY stay low).

Modern crossbars (like Xilinx SmartConnect) implement sophisticated arbitration, AxQOS priority, and bandwidth limiting per master.

Q28. What is an AXI register slice? Why insert one? (pipelining, timing closure)

An AXI register slice is a pipeline stage inserted between master and slave, breaking combinational paths. It adds latency (+1 cycle per slice) but improves timing closure (frequency) and allows pipelining.

Use register slices: (1) when timing is critical, (2) when you want to decouple master and slave domains, or (3) to balance latency in a complex interconnect.

Q29. AXI CDC (clock domain crossing bridge) — how is it implemented?

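An AXI CDC bridge transfers AXI transactions across clock domains. Each of the five channels is treated independently, typically with an asynchronous FIFO whose read and write pointers are Gray-coded and passed through 2-flop synchronizers; the multi-bit payload itself is never synchronized bit by bit. The FIFO full and empty flags drive READY and VALID, so if the slave's clock is slow the FIFO fills and the master side stalls.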

Implementing CDC for AXI is non-trivial because you must synchronize VALID/READY pairs without deadlock. Most teams use off-the-shelf CDC IP from Xilinx or Synopsys.

Q30. CHI (Coherent Hub Interface) vs AXI — what problems does CHI solve?

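CHI is ARM's next-generation coherency protocol, designed for many-core and data center SoCs. Unlike AXI and ACE, which are signal-level channel interfaces, CHI is a layered, packet-based protocol: requesters send messages across the interconnect to a home node, which keeps track of sharers (typically with a snoop filter or directory) and sends snoops only to the caches that may hold the line.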

CHI solves: (1) scalability — AXI ACE becomes complex at many cores, (2) fabric efficiency — CHI reduces duplicate traffic, (3) mixed-coherency — CHI and non-coherent devices can coexist.

CHI is already used in ARM's server-class designs, and AXI, ACE-Lite, and CHI commonly coexist on the same SoC.

Section 4: Design & Verification

Q31. How do you design an AXI slave interface? Show state machine for read channel.

A simple AXI slave read interface state machine:

State machine (simplified, one burst at a time):

Idle:
  Assert ARREADY
  if (ARVALID && ARREADY) → capture ARADDR, ARLEN, ARSIZE, ARID; beat_count = 0; go to Fetch

Fetch:
  Deassert ARREADY
  Read data from memory at current_addr
  When the data is available → go to Send

Send:
  Assert RVALID; drive RDATA, RID, RRESP; RLAST = (beat_count == ARLEN)
  If (RVALID && RREADY && beat_count < ARLEN) → beat_count++, advance current_addr, go to Fetch
  If (RVALID && RREADY && beat_count == ARLEN) → go to Idle

Key points:
- Decouple address acceptance (Idle) from data return (Fetch/Send)
- Only assert RVALID when data is ready, and hold RDATA stable until RREADY
- Only move to next beat when RREADY & RVALID both high
- Always return matching RID for traceability
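A cycle-based software model is a quick way to sanity-check a read-channel state machine before writing RTL. The Python sketch below is an assumption-laden simplification: `ReadSlave` is a hypothetical class, memory has zero latency (so the Fetch state is folded into Send), and the slave handles one burst at a time.

```python
class ReadSlave:
    """Cycle-based model of a simple AXI read slave (Idle and Send states only)."""

    def __init__(self, memory, beat_bytes=4):
        self.memory = memory          # dict: address -> data word
        self.beat_bytes = beat_bytes
        self.state = "IDLE"

    def step(self, arvalid=0, araddr=0, arlen=0, arid=0, rready=0):
        """Advance one clock; return the outputs driven during this cycle."""
        if self.state == "IDLE":
            out = {"arready": 1, "rvalid": 0}
            if arvalid:  # AR handshake: ARVALID && ARREADY
                self.addr, self.len, self.id, self.beat = araddr, arlen, arid, 0
                self.state = "SEND"
            return out
        out = {"arready": 0, "rvalid": 1, "rid": self.id,
               "rdata": self.memory[self.addr + self.beat * self.beat_bytes],
               "rlast": int(self.beat == self.len)}
        if rready:  # R handshake: RVALID && RREADY
            if self.beat == self.len:
                self.state = "IDLE"
            else:
                self.beat += 1
        return out


mem = {0x2000 + 4 * i: 0xA0 + i for i in range(4)}
slave = ReadSlave(mem)
slave.step(arvalid=1, araddr=0x2000, arlen=3, arid=2)
for _ in range(4):
    beat = slave.step(rready=1)
    print(hex(beat["rdata"]), beat["rlast"])
# 0xa0 0
# 0xa1 0
# 0xa2 0
# 0xa3 1
```

Driving `rready=0` for a cycle leaves the beat counter unchanged, which is the back-pressure behavior the key points describe.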

Q32. Common AXI implementation bugs (top 5) — table with bug/impact/fix.

Bug | Impact | Fix
Wrong RID on response | Master routes data to wrong transaction | Always capture and return same ID
READY/VALID handshake backwards | Transactions stall or deadlock, throughput drops | Never make VALID wait for READY; READY indicates receiver capacity, not sender intent
Address/data ordering assumption | Slave deadlocks or pairs data with the wrong address when W arrives before AW | Accept AW and W independently; buffer whichever arrives first
WLAST mismatch | Slave thinks write incomplete, blocks next trans. | Count beats: WLAST high only on beat number AWLEN+1
Credit starvation (flow control) | Slave can't accept new requests | Allocate enough buffer; never starve READY

Q33. How does an AXI arbiter work? Show round-robin vs priority arbitration.

An arbiter selects among multiple masters requesting the same slave. Two common strategies:

Round-Robin Arbitration:
  Last_grant = M0
  This cycle: requests = {M2, M1, M0}
  Next after M0: M1 → grant M1
  Next cycle: requests = {M2, M1}
  Next after M1: M2 → grant M2
  Fairness: all masters get equal turns

Priority Arbitration:
  M0 (priority 3), M1 (priority 2), M2 (priority 1)
  Always grant highest priority requesting
  M1 & M2 requesting → grant M1
  Only M2 requesting → grant M2
  Risk: starvation if M0 always requesting

Most systems use a hybrid: weighted round-robin with AxQOS priority.
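Both policies fit in a few lines of Python. In this sketch `round_robin_grant` and `fixed_priority_grant` are hypothetical helpers; requests are a list of booleans indexed by master number, and for fixed priority the lowest index is assumed to be the highest priority.

```python
def round_robin_grant(requests, last_grant):
    """Grant the first requesting master after last_grant, wrapping around."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None  # nobody is requesting


def fixed_priority_grant(requests):
    """Grant the requesting master with the lowest index (highest priority)."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


print(round_robin_grant([True, True, True], last_grant=0))   # 1
print(round_robin_grant([False, True, True], last_grant=1))  # 2
print(fixed_priority_grant([False, True, True]))             # 1
```

Note that round-robin checks the previously granted master last, so it is served again only when no other master is requesting.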

Q34. AXI protocol checker — what violations does it catch?

AXI checkers (built into VIP) catch:

  • ID mismatch (response doesn't match request)
  • WLAST miscount (doesn't match AWLEN)
  • RLAST miscount (doesn't match ARLEN)
  • VALID/READY protocol violations (e.g., data changes when VALID high but READY low)
  • Address/data out-of-order (address never received for data)
  • Burst type errors (e.g., WRAP with unaligned address)
  • Illegal responses (e.g., EXOKAY returned for a non-exclusive access, or X/Z on RRESP/BRESP)
  • Timeout (transaction never completes)

Q35. AXI performance bottlenecks — how do you identify them?

Use VIP to measure:

  • Utilization: % of cycles where VALID & READY both high
  • Latency: Time from request to response (should match slave latency)
  • Throughput: Transactions per cycle (accounting for burst length)
  • Stall cycles: Cycles where READY or VALID is low
  • Buffer depth: Peak outstanding transactions

If utilization is low, check: (1) Is the slave stalling (READY low)? (2) Is the master stalling (VALID low)? (3) Do you have enough buffer depth?

Q36. AXI in low-power designs — clock gating AXI channels

Clock gating reduces power by disabling the clock when there's no activity. For AXI, you can gate clocks on idle channels:

Gate W channel clock if no WVALID for N cycles
Gate AR channel clock if no ARVALID for N cycles
Gate R channel clock if no RVALID for N cycles

Caveats:
- Don't gate if there's back-pressure (READY=0)
- Ensure synchronizers can detect gated signal deassertion
- Test carefully; clock gating bugs are subtle

Q37. AXI debug — how do you debug a hung AXI transaction? Checklist.

Debug checklist for hung transaction:
□ Is ARVALID/AWVALID asserted? If not, master didn't send address
□ Is ARREADY/AWREADY asserted? If not, slave won't accept
□ Does RID/BID match an outstanding request ID? Trace each response back to its request
□ Are there enough write data beats? Count W transfers vs AWLEN
□ Is WLAST high on final beat?
□ Is there deadlock? (Slave waiting for data, master waiting for response)
□ Did timeout occur? (Transaction takes too long)
□ Check protocol checker output for violations

Use waveform viewer to trace from request to response.
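
When the waveform is large, it helps to first narrow down which transaction is stuck. The sketch below assumes you have exported a list of request/response handshake events from the simulation; the event format and the function name are hypothetical.

```python
def find_hung(events, now, timeout):
    """List transactions that have waited longer than `timeout` cycles.

    events: list of (cycle, kind, txn_id) with kind 'req' or 'resp'.
    now: current simulation cycle.
    Returns a sorted list of (txn_id, issue_cycle).
    """
    pending = {}
    for cycle, kind, txn_id in sorted(events):
        if kind == "req":
            pending.setdefault(txn_id, []).append(cycle)
        elif kind == "resp" and pending.get(txn_id):
            # Responses with the same ID return in request order.
            pending[txn_id].pop(0)
    return sorted(
        (txn_id, cycles[0])
        for txn_id, cycles in pending.items()
        if cycles and now - cycles[0] > timeout
    )


events = [(0, "req", 1), (2, "req", 2), (5, "resp", 1)]
print(find_hung(events, now=100, timeout=50))  # [(2, 2)]
```

Here ID 2 was issued at cycle 2 and never answered, so that is the request to trace in the waveform.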

Q38. AXI in FPGA (Xilinx SmartConnect / Microchip CoreAXI) vs ASIC

FPGAs use parameterized AXI IP (SmartConnect for Xilinx, CoreAXI for Microchip) with configurable data width, clock ratio, and arbitration. ASIC designs often use custom interconnects optimized for specific use cases.

FPGA pros: fast to integrate, no RTL design. FPGA cons: fixed latency, limited customization. ASIC pros: optimized for the SoC's specific needs. ASIC cons: more design/verification effort.

Q39. NIC-400 / CoreLink — what does an AXI interconnect IP provide?

These are ARM's reference interconnect IPs. They provide:

  • AXI crossbar matrix (N masters to M slaves)
  • Priority/QoS arbitration
  • Configurable address decoding
  • Optional CDC bridges (clock domain crossing)
  • Optional register slices for pipelining
  • Protocol checking and monitoring

Most SoCs either use these IPs or implement similar functionality in custom RTL.
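
Address decoding is the easiest of these features to illustrate. The sketch below uses an invented address map (the base addresses, sizes, and slave names are purely hypothetical) and returns DECERR for an address that no slave claims.

```python
# Hypothetical address map: (base, size, slave name)
ADDRESS_MAP = [
    (0x0000_0000, 0x1000_0000, "ddr"),
    (0x4000_0000, 0x0001_0000, "uart"),
    (0x4001_0000, 0x0001_0000, "gpio"),
]


def decode(addr):
    """Return the slave that owns addr, or 'DECERR' if none does."""
    for base, size, name in ADDRESS_MAP:
        if base <= addr < base + size:
            return name
    return "DECERR"


print(decode(0x0000_1000))  # ddr
print(decode(0x4000_0004))  # uart
print(decode(0x8000_0000))  # DECERR
```

In hardware this is a set of parallel range comparators feeding the crossbar's routing logic, with a default slave generating the DECERR response.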

Q40. AXI-Stream FIFO design — how do you handle backpressure from the slave?

An AXI-Stream FIFO buffers data between upstream and downstream. When the downstream (slave) deasserts TREADY, the FIFO stops popping but keeps accepting upstream data until it is full; only then does it deassert its upstream TREADY.

FIFO state machine:
  If (TVALID_in && TREADY_in) → push to FIFO
  If (TVALID_out && TREADY_out) → pop from FIFO
  TREADY_in = FIFO not full
  TVALID_out = FIFO not empty

Backpressure flow:
  Slave slow → TREADY_out low → FIFO fills → TREADY_in low
  Master sees TREADY_in low → stalls

Key insight: derive TREADY_in from FIFO occupancy only, and never make TVALID_out wait for TREADY_out. A sender that waits for READY before asserting VALID can deadlock.
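
The push/pop rules and the backpressure flow can be checked with a small cycle-based model. This is a behavioral sketch, not synthesizable RTL; the class and method names are assumptions for the example, and both handshakes are evaluated against the occupancy before the clock edge.

```python
from collections import deque


class AxisFifo:
    """Cycle-based model of an AXI-Stream FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.q = deque()

    @property
    def tready_in(self):
        return len(self.q) < self.depth  # FIFO not full

    @property
    def tvalid_out(self):
        return len(self.q) > 0  # FIFO not empty

    def cycle(self, tvalid_in, tdata_in, tready_out):
        """Simulate one clock edge. Returns (accepted, popped_data)."""
        push = bool(tvalid_in) and self.tready_in
        pop = self.tvalid_out and bool(tready_out)
        popped = self.q.popleft() if pop else None
        if push:
            self.q.append(tdata_in)
        return push, popped


fifo = AxisFifo(depth=2)
print(fifo.cycle(True, "a", False))  # (True, None): accepted, slave stalled
print(fifo.cycle(True, "b", False))  # (True, None): FIFO is now full
print(fifo.cycle(True, "c", False))  # (False, None): master must hold "c"
print(fifo.cycle(True, "c", True))   # (False, 'a'): slave drains one entry
print(fifo.cycle(True, "c", True))   # (True, 'b'): space freed, "c" accepted
```

The master only stalls once the FIFO is actually full, which is the whole point of putting a buffer between the two sides.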

Interview Cheatsheet: AXI by Company

Company | Most-Asked Topics | Why
ARM | AXI fundamentals (Q1–Q5), variants (Q21–Q30), ACE/CHI (Q25–Q30) | ARM owns the AXI spec; coherency and next-gen protocols are key
Qualcomm | Outstanding transactions (Q11), out-of-order completion (Q12), flow control (Q16–Q17) | SoCs need high throughput; performance and correctness are critical
Nvidia | Interconnect design (Q27–Q28), bandwidth (Q16), bottleneck identification (Q35) | GPUs are throughput machines; interconnect performance is critical
Xilinx (FPGA) | SmartConnect usage (Q38), AXI-Lite (Q22), protocol checkers (Q34) | FPGA tools focus; practical IP integration knowledge needed
AMD / Intel | Custom interconnects (Q33), arbitration (Q33), clock gating (Q36) | High-end SoCs use custom interconnects; deep implementation knowledge needed

Key Resources

  • AMBA AXI4 / AXI5 Specifications — Available from ARM (free registration or IP license)
  • ACE / CHI Specifications — For coherency details
  • Xilinx SmartConnect / IP Integrator User Guides — Practical FPGA integration examples
  • Cadence AXI VIP / Synopsys VC VIP for AMBA - Verification tools; learn what they check
  • Your company's AXI guidelines — Most large SoC teams have internal playbooks
  • Open-source AXI testbenches — GitHub has many; study for verification patterns

📌 Final Note: AXI is vast, and the specification is 600+ pages. Interviewers don't expect you to memorize it all. They test your understanding of the core concepts: VALID/READY handshakes, ID-based out-of-order completion, flow control, and design tradeoffs. Focus on the "why" — why does AXI have 5 channels? Why are address and data decoupled? Why is ID-based ordering sufficient? You'll stand out by showing intuition, not memorization. Good luck!

Cycle:    0    1    2    3    4    5    6
VALID:    0    1    1    1    1    1    0
READY:    0    0    1    1    0    1    1
Data:     x    A    A    B    C    C    x
               wait xfer xfer wait xfer idle

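Rules:
- VALID can stay high for multiple cycles while READY is low; the sender simply waits
- A transfer occurs on each rising clock edge where VALID and READY are both high
- Once VALID is asserted, it must stay asserted and the payload must stay stable until the handshake completes
- READY may wait for VALID, but VALID must never wait for READY (that can deadlock)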

In practice, READY is usually a combinational function of FIFO occupancy or pipeline depth, allowing the slave to assert READY when it has buffer space.

💡 Tip: Many candidates get the handshake backwards. A transmitter can hold VALID high indefinitely; the receiver's READY controls flow. If you design a slave that waits for VALID to deassert before accepting the next transaction, you've missed the entire point of the handshake.
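
To make the transfer rule concrete, the snippet below picks out the completed transfers from a per-cycle trace. It is a toy model with an invented trace; the function name is an assumption for the example.

```python
def completed_transfers(trace):
    """Return the payloads transferred in a trace.

    trace: list of (valid, ready, data) sampled at each rising clock edge.
    A transfer happens only in cycles where VALID and READY are both high.
    """
    return [data for valid, ready, data in trace if valid and ready]


trace = [
    (0, 0, None),  # idle
    (1, 0, "A"),   # sender offers A, receiver not ready: wait
    (1, 1, "A"),   # handshake: A transfers
    (1, 1, "B"),   # back-to-back transfer of B
    (1, 0, "C"),   # C is held stable while stalled
    (1, 1, "C"),   # C transfers
    (0, 1, None),  # READY high with no VALID: nothing transfers
]
print(completed_transfers(trace))  # ['A', 'B', 'C']
```

Note that A and C each appear in two cycles but transfer only once, and that VALID never drops between transfers.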

Q4. Walk through a complete AXI write transaction — all 3 channels involved.

Let's say a master wants to write 8 bytes to address 0x1000:

Cycle 0:
AW Channel: AWVALID=1, AWADDR=0x1000, AWLEN=0 (1 beat), AWSIZE=3 (8 bytes per beat), AWID=1
W Channel: WVALID=1, WDATA=0xDEADBEEFCAFEBABE, WSTRB=0xFF, WLAST=1
Slave: AWREADY=1, WREADY=1 (can accept)
→ Both channels transfer

Cycle 1:
AW & W channels idle (transferred)
B Channel: BVALID=1, BRESP=OKAY, BID=1
Master: BREADY=1 (ready for response)
→ Write response returns with matching ID

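A complete write transaction requires:
1. Address (AW) transmitted
2. Data (W) transmitted (may arrive before, with, or after the address; AXI4 has no WID, so write data must follow address order)
3. Response (B) received (the slave issues it only after accepting the address and the final WLAST beat)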
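
The ordering between the three channels can be captured in a tiny slave-side model. This sketch assumes AXI4 rules and a single outstanding write; the class and method names are hypothetical.

```python
class WriteTracker:
    """Tracks one write transaction as seen by the slave."""

    def __init__(self):
        self.awid = None
        self.awlen = None
        self.beats = 0
        self.wlast_seen = False

    def accept_aw(self, awid, awlen):
        """Record a completed AW handshake."""
        self.awid, self.awlen = awid, awlen

    def accept_w(self, wlast):
        """Record a completed W handshake."""
        self.beats += 1
        self.wlast_seen = self.wlast_seen or bool(wlast)

    def response(self):
        """Return (BID, BRESP) once B may be driven, else None."""
        if self.awid is None or not self.wlast_seen:
            return None
        return (self.awid, "OKAY")


txn = WriteTracker()
txn.accept_w(wlast=True)          # data may arrive before the address
print(txn.response())             # None: address not yet accepted
txn.accept_aw(awid=1, awlen=0)
print(txn.response())             # (1, 'OKAY')
```

The model returns a response only when both the address and the last data beat have been accepted, regardless of which arrived first.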
