I’ve pulled together 40 AXI protocol interview questions based on design work with ARM, Xilinx, Microchip, and companies building custom interconnects. AXI is the backbone of most modern SoCs — if you’re designing with ARM IP, you’re working with AXI. The tricky part isn’t memorizing the spec; it’s understanding why certain design decisions were made and how they interact with your interconnect and system performance.
💡 Who This Is For: Digital and SoC design engineers interviewing at ARM, Qualcomm, Nvidia, AMD, or companies designing interconnects and memory controllers. If you’ve built AXI slaves, masters, or crossbars, or debugged performance issues with outstanding transactions, this guide is directly relevant to your work.
Table of Contents
Quick Navigation
- Section 1: AXI Fundamentals (Q1–Q10)
- Section 2: Transactions & Performance (Q11–Q20)
- Section 3: AXI Variants (Q21–Q30)
- Section 4: Design & Verification (Q31–Q40)
- Interview Cheatsheet
Section 1: AXI Fundamentals
Q1. What is AMBA? Where does AXI fit in the AMBA family?
AMBA (Advanced Microcontroller Bus Architecture) is ARM’s ecosystem of on-chip interconnect protocols. It includes APB (Advanced Peripheral Bus — simple, low-bandwidth), AHB (Advanced High-performance Bus — predecessor to AXI), and AXI (Advanced eXtensible Interface — modern, high-performance).
AXI is the workhorse of modern SoCs. It supports multiple outstanding transactions, out-of-order completion, and wide data paths (up to 1024 bits). If you’re building a data center SoC or GPU, you’re using AXI or a variant (ACE for coherency, AXI-Stream for data flow, CHI for next-generation coherent fabrics).
Q2. What are the 5 AXI channels? What does each carry? Show ASCII block diagram.
AXI is a 5-channel interface between a Master (like a CPU) and a Slave (like memory). Each channel is independent and operates via a VALID/READY handshake:
```
Master                                   Slave
  |------- Write Address Channel ------>|
  |        (AxAddr, AxID, AxLen, ...)   |
  |------- Write Data Channel --------->|
  |        (WData, WStrb, WLast)        |
  |<------ Write Response Channel ------|
  |        (BResp, BID)                 |
  |                                     |
  |------- Read Address Channel ------->|
  |        (AxAddr, AxID, AxLen, ...)   |
  |<------ Read Data Channel -----------|
  |        (RData, RResp, RID, RLast)   |
```
Channels (independent, can flow in parallel):
1. Write Address (AW) → address, length, width, ID
2. Write Data (W) → data payload, write strobes, last flag
3. Write Response (B) → status (OKAY/EXOKAY/SLVERR/DECERR)
4. Read Address (AR) → address, length, width, ID
5. Read Data (R) → data payload, status, last flag
The key insight: address and data are decoupled. You can send an address and then data on different cycles. This allows pipelines and efficient buffering.
Q3. Explain the VALID/READY handshake. What are the rules? Show timing diagram.
VALID and READY are the core AXI handshake signals. On each channel, the transmitter asserts VALID when it has valid data; the receiver asserts READY when it can accept data. When both are high on a rising clock edge, a transfer occurs. This is commonly called "ready-valid" handshaking.
```
Clock: _|‾|_|‾|_|‾|_|‾|_|‾|_|‾|
VALID: _|‾‾‾‾‾‾|_______|‾‾‾|_
READY: _______|‾‾|_|‾‾‾‾‾|___
Data:  xxxx[A][A][B][C][C][D]xxxx
            ↑      ↑   ↑  ↑  ↑
          xfer   wait  xfers xfer
```
Rules:
- Once VALID is asserted, it must remain high (with stable payload) until the handshake completes
- READY can toggle freely and may stay low for any number of cycles
- When VALID & READY are both high on a rising clock edge → exactly one transfer
- READY may depend combinationally on VALID, but VALID must never wait for READY — that dependency order is what prevents deadlock
In practice, READY is usually a combinational function of FIFO occupancy or pipeline depth, allowing the slave to assert READY when it has buffer space.
💡 Tip: Many candidates get the handshake backwards. A transmitter can hold VALID high indefinitely; the receiver's READY controls flow. If you design a slave that waits for VALID to deassert before accepting the next transaction, you've missed the entire point of the handshake.
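The transfer rule is easy to check in a few lines. Below is a behavioral Python sketch (not RTL — `count_transfers` is an illustrative helper, not from any spec or IP) that counts transfers from per-cycle VALID/READY traces:

```python
# Behavioral sketch: a transfer happens on exactly those cycles where
# VALID and READY are both high. Holding VALID high while READY is low
# simply stalls; no data is lost and no transfer occurs.
def count_transfers(valid_bits, ready_bits):
    """Count handshake transfers given per-cycle VALID/READY traces."""
    return sum(1 for v, r in zip(valid_bits, ready_bits) if v and r)

# Example trace: VALID waits out two stalled cycles before the first transfer.
valid = [0, 1, 1, 1, 0, 1, 1, 0]
ready = [0, 0, 0, 1, 1, 1, 1, 1]
assert count_transfers(valid, ready) == 3  # cycles 3, 5, and 6
```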
Q4. Walk through a complete AXI write transaction — all 3 channels involved.
Let's say a master wants to write 8 bytes to address 0x1000:
```
Cycle 0:
  AW Channel: AWVALID=1, AWADDR=0x1000, AWLEN=0 (1 beat), AWID=1
  W Channel:  WVALID=1, WDATA=0xDEADBEEFCAFEBABE, WSTRB=0xFF, WLAST=1
  Slave:      AWREADY=1, WREADY=1 (can accept)
  → Both channels transfer

Cycle 1:
  AW & W channels idle (transferred)
  B Channel:  BVALID=1, BRESP=OKAY, BID=1
  Master:     BREADY=1 (ready for response)
  → Write response returns with matching ID
```
A complete write transaction requires:
1. Address (AW) transmitted
2. Data (W) transmitted (may precede or follow the address)
3. Response (B) received (only after the slave has accepted the address and all data)
The key point: address and data can arrive in different orders, but the response must include the ID so the master knows which write it's acknowledging.
Q5. Walk through a complete AXI read transaction — both channels.
A master reads 4 beats (16 bytes) from address 0x2000:
```
Cycle 0:
  AR Channel: ARVALID=1, ARADDR=0x2000, ARLEN=3 (4 beats), ARID=2
  Slave:      ARREADY=1
  → Read address accepted

Cycles 1-4:
  R Channel: RVALID=1, RDATA=[beat0], RLAST=0, RID=2
             RVALID=1, RDATA=[beat1], RLAST=0, RID=2
             RVALID=1, RDATA=[beat2], RLAST=0, RID=2
             RVALID=1, RDATA=[beat3], RLAST=1, RID=2
  Master:    RREADY=1 for all beats
```
Read transactions can have multiple beats:
- ARLEN specifies number of beats - 1
- Each beat returns with ID and RLAST indicating final beat
- Slave determines beat pace (pipelining allowed)
Q6. What is AxID? How does out-of-order completion work? Show interleaved transactions.
AxID (Address ID) is a unique identifier for each transaction. In a system with multiple outstanding transactions, AxID allows the master to match responses to requests even if they complete out-of-order.
```
Master sends:
  Cycle 0: ARADDR=0x1000, ARID=1, ARLEN=0 (1 beat)
  Cycle 1: ARADDR=0x2000, ARID=2, ARLEN=0 (1 beat)

Slave (out-of-order) returns:
  Cycle 3: RDATA=data_from_2000, RID=2  ← completed in a different order!
  Cycle 4: RDATA=data_from_1000, RID=1

Master matches RID to the original request:
  RID=2 → data for 0x2000
  RID=1 → data for 0x1000
```
Key rule: within a single ID, responses must be in order; different IDs can complete out of order. This allows efficiency: if ID=1 hits slow memory, ID=2 can complete from fast cache.
This is powerful for SoCs where the interconnect has multiple slave devices with different latencies.
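The master-side bookkeeping implied above can be sketched in Python (a behavioral model; `IdMatcher` and its method names are illustrative). Because same-ID responses are in order, each response completes the oldest outstanding request carrying that ID:

```python
# Behavioral sketch of master-side ID matching (illustrative, not RTL):
# pending requests are queued per ID; a response with RID=n completes the
# oldest outstanding request with ID n.
from collections import defaultdict, deque

class IdMatcher:
    def __init__(self):
        self.pending = defaultdict(deque)  # ARID -> queue of request addresses

    def issue(self, arid, araddr):
        self.pending[arid].append(araddr)

    def complete(self, rid):
        # Same-ID ordering rule: a response matches the oldest request.
        return self.pending[rid].popleft()

m = IdMatcher()
m.issue(1, 0x1000)
m.issue(2, 0x2000)
assert m.complete(2) == 0x2000  # ID 2 finished first — out of order is fine
assert m.complete(1) == 0x1000
```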
Q7. Burst types — FIXED, INCR, WRAP — when is each used? Show address sequence.
AXI supports three burst types, controlling how the address increments:
INCR (Incrementing): Address increments for each beat (standard mode). For a 64-bit bus, burst of 4 at 0x1000 goes to 0x1000, 0x1008, 0x1010, 0x1018.
FIXED: Address stays the same. Used for FIFO-like interfaces where you write to the same address multiple times (e.g., pushing multiple items to a queue register).
WRAP (Wrapping): Address increments, but wraps at a boundary equal to the total burst size. For a 4-beat burst of 4-byte transfers starting at 0x1004, the address wraps at the 16-byte boundary: 0x1004, 0x1008, 0x100C, 0x1000. WRAP is what enables critical-word-first cache line fills.
| Burst Type | Use Case | Example (4 beats, 8B) |
|---|---|---|
| INCR | Standard memory, DMA | 0x1000, 0x1008, 0x1010, 0x1018 |
| FIXED | FIFO register access | 0x1000, 0x1000, 0x1000, 0x1000 |
| WRAP | Cache line fill | 0x1008, 0x1010, 0x1018, 0x1000 (wraps at 32B) |
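The three address patterns can be generated with a short Python sketch (an illustrative helper, assuming a size-aligned start address for WRAP as the protocol requires):

```python
def burst_addresses(start, size, length, burst):
    """Per-beat addresses; bytes-per-beat = 2**size (AxSIZE), beats = AxLEN+1."""
    nbytes = 1 << size
    beats = length + 1
    if burst == "FIXED":
        return [start] * beats                      # FIFO-style: same address
    if burst == "INCR":
        return [start + i * nbytes for i in range(beats)]
    if burst == "WRAP":
        span = nbytes * beats                       # wrap boundary = burst size
        base = (start // span) * span               # lower wrap boundary
        return [base + (start - base + i * nbytes) % span for i in range(beats)]
    raise ValueError(f"unknown burst type {burst!r}")

# 4-beat WRAP of 4-byte transfers starting at 0x1004 wraps at the 16B boundary:
assert burst_addresses(0x1004, 2, 3, "WRAP") == [0x1004, 0x1008, 0x100C, 0x1000]
assert burst_addresses(0x1000, 3, 3, "INCR") == [0x1000, 0x1008, 0x1010, 0x1018]
```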
Q8. AxLEN and AxSIZE — how do you calculate total bytes transferred?
AxLEN specifies the number of beats minus 1 (0–255 for INCR bursts in AXI4; 0–15 in AXI3). AxSIZE (0–7) specifies bytes per beat: 0=1B, 1=2B, 2=4B, 3=8B, up to 7=128B. AxBURST selects the burst type (FIXED=0, INCR=1, WRAP=2).
Total bytes = (AxLEN + 1) × (2^AxSIZE)
Example: ARLEN=7, ARSIZE=3 (8 bytes per beat) → (7+1) × 8 = 64 bytes.
This decoupling allows flexible transaction sizes. A system might have a 64-bit data bus, but you can specify single-byte accesses (AxSIZE=0) or burst entire cache lines (AxSIZE=3 with AxLEN=7 for a 512-bit transaction).
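As a quick sanity check, here is the formula in Python (trivial, but handy at a whiteboard; `burst_bytes` is an illustrative name):

```python
def burst_bytes(axlen, axsize):
    """Total bytes = beats x bytes-per-beat = (AxLEN + 1) * 2**AxSIZE."""
    return (axlen + 1) * (1 << axsize)

assert burst_bytes(7, 3) == 64   # 8 beats of 8 bytes, as in the example above
assert burst_bytes(0, 0) == 1    # a single 1-byte transfer
```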
Q9. WSTRB (write strobe) — what is it? Give an example of a 32-bit write with only upper 2 bytes valid.
WSTRB (Write Strobe) is a per-byte enable signal. Each bit corresponds to one byte of WDATA. If WSTRB[3:0] = 0xC (binary 1100), bytes [31:24] and [23:16] are written; bytes [15:8] and [7:0] are ignored.
32-bit write to address 0x1000, write only upper 2 bytes:
WDATA = 0xDEADBEEF
WSTRB[3:0] = 0xC (binary 1100)
↑↑ ↑↑
bytes [31:24] and [23:16] written
bytes [15:8] and [7:0] NOT written
This is crucial for sub-aligned writes. If you want to write a 16-bit value to an unaligned address, the slave can update only the relevant bytes without corrupting others.
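A behavioral sketch of how a slave might apply the strobes (a Python model, not RTL; `apply_wstrb` is an illustrative helper):

```python
def apply_wstrb(old, wdata, wstrb, width_bytes=4):
    """Merge WDATA into the existing word byte-by-byte, under the WSTRB mask."""
    result = old
    for i in range(width_bytes):
        if (wstrb >> i) & 1:                       # byte lane i enabled
            byte = (wdata >> (8 * i)) & 0xFF
            result = (result & ~(0xFF << (8 * i))) | (byte << (8 * i))
    return result

# WSTRB=0xC: only byte lanes 2 and 3 (the upper half-word) are written.
assert apply_wstrb(0x11223344, 0xDEADBEEF, 0xC) == 0xDEAD3344
```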
Q10. RRESP/BRESP — what do OKAY/EXOKAY/SLVERR/DECERR mean?
Response status codes indicate transaction outcome:
| Code | Meaning | Recoverable? |
|---|---|---|
| OKAY | Success | Yes (normal completion) |
| EXOKAY | Exclusive access succeeded | Yes (atomic op succeeded) |
| SLVERR | Slave error (bad address, access denied) | No (retry won't help) |
| DECERR | Decode error (interconnect couldn't route) | No (address unmapped) |
Section 2: Transactions & Performance
Q11. What are outstanding transactions? How do they improve throughput?
Outstanding transactions are multiple transactions in flight simultaneously. For example, a master can send Read Request 1, then Read Request 2, before receiving Data 1. The slave processes both concurrently and returns data in any order (matched by ID).
Without outstanding transactions (blocking mode), the master sends a request, waits for the response, then sends the next: throughput = 1 transaction per round-trip latency. For a system with 100 ns memory latency, that caps a blocking master at 10 million transactions/second regardless of clock speed. With N outstanding transactions, the master pipelines requests and hides the latency: throughput approaches N transactions per round-trip.
This is why high-performance SoCs allow dozens of outstanding transactions — modern ARM cores support 16–32+ outstanding reads (fewer writes) — to maximize interconnect utilization and hide memory latency.
📌 Note: The number of outstanding transactions is bounded by your ID space and buffer depth. If you only track 8 IDs, you can only keep 8 reorderable transactions in flight; this is why high-performance systems budget their ID widths carefully.
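A back-of-the-envelope model of this latency hiding (a Python sketch; `throughput_tps` and its assumptions — fixed latency, one issue slot per nanosecond — are illustrative, not a simulator):

```python
def throughput_tps(latency_ns, max_outstanding, issue_period_ns=1.0):
    """Transactions/sec when N outstanding requests hide a fixed latency."""
    # Steady state: a response returns every max(latency/N, issue_period) ns.
    effective_period = max(latency_ns / max_outstanding, issue_period_ns)
    return 1e9 / effective_period

# 100 ns latency, 1 ns clock: blocking vs 100 outstanding transactions.
assert throughput_tps(100, 1) == 10_000_000        # blocking: 10M trans/s
assert throughput_tps(100, 100) == 1_000_000_000   # pipelined: issue-rate bound
```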
Q12. How does AXI handle out-of-order responses? What is the ID matching rule?
Responses are matched to requests using the ID field. When the master sends a read with ARID=3, it expects RDATA back with RID=3. Transactions with the same ID must complete in the order they were issued; transactions with different IDs can complete in any order.
This is enforced at protocol level — a slave cannot send RID=3 responses out-of-order with respect to other RID=3 responses, but it can interleave RID=3 and RID=4 responses.
📌 Note: This rule is crucial for interconnect design. Some interconnects (like crossbars) naturally preserve ID-based ordering; others require specific routing policies to ensure it.
Q13. What is write data interleaving? (AXI3 vs AXI4 difference)
In AXI3, write data from different write transactions could be interleaved on the W channel, with the WID signal identifying which transaction each beat belonged to. This allowed complex pipelining but was hard to verify. AXI4 removed interleaving (and the WID signal) — all beats for a write must arrive as a contiguous group on the W channel.
Example (AXI3 allowed this, AXI4 does not):
```
AXI3 (interleaving allowed):
  Cycle 0: W beat0_cmd1, WLAST=0
  Cycle 1: W beat0_cmd2, WLAST=0  ← different command
  Cycle 2: W beat1_cmd1, WLAST=1
  Cycle 3: W beat1_cmd2, WLAST=1

AXI4 (no interleaving):
  Cycle 0: W beat0_cmd1, WLAST=0
  Cycle 1: W beat1_cmd1, WLAST=1  ← must finish before cmd2
  Cycle 2: W beat0_cmd2, WLAST=0
  Cycle 3: W beat1_cmd2, WLAST=1
```
AXI4 simplified this significantly, though it reduced pipelining flexibility. Most modern designs use AXI4.
Q14. What is AxLOCK (exclusive access)? How does it implement atomic operations?
AxLOCK signals an exclusive access, used to build atomic read-modify-write sequences. The slave (or interconnect) implements an exclusive monitor that records the address and master of an exclusive read — the location is not actually locked against other masters.
Typical flow: the master reads with ARLOCK=1 (monitor armed), modifies the value locally, then writes back with AWLOCK=1. If no other master has written the location in between, the write succeeds and returns EXOKAY. If the monitor was cleared by an intervening access, the exclusive write fails, memory is not updated, and the slave returns OKAY (not an error) — the master must retry the whole sequence.
This is the mechanism behind CPU primitives such as ARM's LDREX/STREX. Note that AXI3's separate locked transfer type, which really did lock the interconnect, was removed in AXI4; only exclusive access remains.
Q15. AXI ordering model — what guarantees does AXI make about transaction ordering?
Within the same ID, responses are in-order. Different IDs can be out-of-order. Writes to the same address might not be serialized if they have different IDs — the interconnect can forward them concurrently. To ensure strict ordering, the master should use the same ID for dependent transactions, or use explicit barriers (memory fences).
AXI doesn't provide "memory fence" primitives directly — that's handled at the system level (CPU instruction or software barrier).
Q16. How do you calculate AXI bus bandwidth? Give a formula and example.
Formula: Bandwidth = (Data Width in bits / 8) × Clock Frequency × Utilization
Example: 128-bit AXI bus at 500 MHz with 80% utilization:
```
Bandwidth = (128 / 8) × 500 MHz × 0.8 = 16 B × 500M × 0.8 = 6.4 GB/s
```
Key factors:
- Data width (bits): wider = more throughput
- Clock frequency: faster = more throughput
- Utilization: VALID & READY are not always both high → reduced effective bandwidth
Real designs often measure this with VIP (verification IP) to understand bottlenecks.
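The formula above as a one-liner you can check numbers with (an illustrative Python helper):

```python
def axi_bandwidth_gbps(data_width_bits, clock_mhz, utilization):
    """Effective bandwidth in GB/s = bytes/beat x beats/sec x utilization."""
    return (data_width_bits / 8) * (clock_mhz * 1e6) * utilization / 1e9

# The worked example: 128-bit bus at 500 MHz, 80% utilization -> 6.4 GB/s.
assert abs(axi_bandwidth_gbps(128, 500, 0.8) - 6.4) < 1e-9
```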
Q17. What is AXI back-pressure? How does a slave signal it? Show timing diagram.
Back-pressure occurs when a slave is busy and cannot accept new transactions. It signals this by holding READY low. The master, seeing READY=0, stalls and waits.
```
Clock:   _|‾|_|‾|_|‾|_|‾|_|‾|_
AWVALID: _|‾‾‾‾‾‾‾‾‾‾|_
AWREADY: _|‾|_|‾‾‾‾|_|‾
          transfer  pause  transfer
```
Slave reasons for back-pressure:
- Write buffer full
- FIFO full
- Processing transaction in progress
- Memory controller busy
Q18. Write data before address — is it allowed? (AXI4 rules)
AXI4 allows write data to arrive before the write address. The slave can buffer the data and process it once the address arrives.
However, the WLAST signal must match the address's AxLEN — the slave knows when a write is complete by counting beats and seeing WLAST. If address and data are out-of-order, the slave must correlate them properly, which adds complexity.
Most designs don't exploit this because it complicates the slave; instead, they keep address and data in-order by gating the W channel until after AW.
Q19. AXI QoS signals — what are they used for?
AxQOS (4-bit Quality of Service) allows the master to specify transaction priority. Higher AxQOS values indicate higher priority. Interconnects and crossbars can use AxQOS to arbitrate between multiple masters — high-priority transactions get through first.
For example, in a GPU SoC, the GPU's memory requests might have AxQOS=15 (highest), while the CPU has AxQOS=8 (medium). The interconnect prioritizes GPU traffic.
This is optional (not all interconnects implement AxQOS arbitration) and is system-specific.
Q20. What is address alignment in AXI bursts? What happens with an unaligned start address?
AXI doesn't require the start address of an INCR burst to be aligned to the transfer size (WRAP bursts, by contrast, must start size-aligned). You can issue a read at address 0x1003 for 4 beats of 8 bytes each; the low address bits determine which byte lanes carry valid data on the first beat.
Example: ARADDR=0x1003, ARSIZE=3 (8B), ARLEN=3 (4 beats):
```
Beat 0: Address 0x1003 (unaligned)
        Only byte lanes [7:3] are valid (5 bytes)
Beat 1: Address 0x1008 (aligned)
        All 8 bytes
Beat 2: Address 0x1010
        All 8 bytes
Beat 3: Address 0x1018
        All 8 bytes
```
Many slaves require aligned addresses; checking the datasheet is essential.
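The first-beat lane calculation can be sketched in Python (illustrative helper; lanes are byte indices within the beat):

```python
def valid_lanes(araddr, arsize):
    """Byte lanes carrying valid data on the first beat of an unaligned burst."""
    nbytes = 1 << arsize            # bytes per beat
    offset = araddr % nbytes        # distance from the size-aligned boundary
    return list(range(offset, nbytes))  # lanes [offset .. nbytes-1]

# ARADDR=0x1003, ARSIZE=3 (8-byte beats): only lanes 3..7 are valid (5 bytes).
assert valid_lanes(0x1003, 3) == [3, 4, 5, 6, 7]
```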
Section 3: AXI Variants
Q21. AXI4 vs AXI3 — what changed? (burst interleaving, ID width, WID removal)
Key differences:
| Feature | AXI3 | AXI4 |
|---|---|---|
| Write data interleaving | Allowed (complex) | Not allowed (simpler) |
| WID signal | Yes (write ID on W channel) | No (removed, not needed) |
| Max INCR burst length | 16 beats | 256 beats |
| QoS / REGION signals | No | Yes (AxQOS, AxREGION) |
| ID width | Implementation-defined | Implementation-defined |
| Max data width | 1024 bits | 1024 bits (same) |
AXI4 is cleaner and easier to implement. AXI3 is legacy (rarely used in new designs).
Q22. AXI-Lite — what are the restrictions vs AXI4? When do you use it?
AXI-Lite is a simplified version of AXI4 for low-bandwidth control interfaces:
| Restriction | AXI4 | AXI-Lite |
|---|---|---|
| Burst length | 1–256 beats | 1 beat only |
| Burst types | INCR, FIXED, WRAP | N/A (single transfer, no burst signals) |
| Data width | 8–1024 bits | 32 or 64 bits only |
| Outstanding trans. | Multiple, per-ID reordering | Typically 1 at a time |
| ID signals | Yes, multiple IDs | No ID (single transaction) |
Use AXI-Lite for register access (control interfaces, status registers). Use AXI4 for high-throughput data paths (memory, caches, DMA).
Q23. AXI-Stream — what channels does it have? Show signal table.
AXI-Stream is for unidirectional data flow (no address, just data). It's ideal for DSP pipelines, video processing, or any scenario where data flows in one direction without random access.
| Signal | Width | Purpose |
|---|---|---|
| TDATA | User-defined | Payload data |
| TVALID | 1 | Data valid |
| TREADY | 1 | Ready to accept |
| TLAST | 1 | Last beat of packet |
| TKEEP | TDATA width / 8 | Per-byte validity (variable-length packets) |
| TSTRB | TDATA width / 8 | Per-byte type: data byte vs position byte |
| TID | User-defined | Stream ID (multiplex streams) |
| TDEST | User-defined | Destination routing |
Q24. AXI-Stream packet framing — how does TLAST work? TKEEP vs TSTRB?
TLAST marks the last beat of a packet. For example, a 100-byte Ethernet frame might be sent as 13 beats of 8 bytes, with TLAST high only on beat 13.
TKEEP: Per-byte valid flag. If TKEEP[2]=0, byte 2 is not part of the packet (used for variable-length last beats). If a 100-byte packet is split into 12 beats of 8 bytes + 1 beat of 4 bytes, the last beat has TKEEP=0x0F (only 4 bytes valid).
TSTRB: Per-byte strobe (like AXI WSTRB). TKEEP=1 with TSTRB=1 marks a data byte; TKEEP=1 with TSTRB=0 marks a position byte — a placeholder that occupies a lane but carries no data. TSTRB is rarely used in practice; most streams tie it to TKEEP.
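Computing the last-beat TKEEP for a packet is a small calculation (a Python sketch with illustrative names, assuming TKEEP bit i maps to byte lane i):

```python
def last_beat_tkeep(packet_bytes, bus_bytes=8):
    """TKEEP mask for the final beat of a packet (1 bit per valid byte lane)."""
    rem = packet_bytes % bus_bytes
    valid = rem if rem else bus_bytes    # a full last beat keeps every byte
    return (1 << valid) - 1

# 100-byte packet on an 8-byte bus: 12 full beats + 4 bytes -> TKEEP=0x0F.
assert last_beat_tkeep(100) == 0x0F
assert last_beat_tkeep(64) == 0xFF      # exact multiple: all lanes valid
```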
Q25. ACE (AXI Coherency Extensions) — what cache states does it add?
ACE extends AXI with coherency support for multi-master caching. It adds cache-related signals and allows masters to share memory coherently.
Cache states (MOESI-like): Modified, Owned, Exclusive, Shared, Invalid. ACE transactions include AxCache signals indicating the transaction's intent (cacheable, write-back, write-through, etc.). Coherency controllers enforce that if one master modifies a line, other caches invalidate their copies.
ACE is used in systems with multiple CPUs or CPU+GPU caches that must remain coherent.
Q26. ACE-Lite — how is it different from full ACE? When does a master use ACE-Lite?
ACE-Lite is a simplified coherency interface for devices that don't need full cache coherency. An ACE-Lite master can issue coherent transactions but doesn't handle snoop responses (another master invalidating its cache).
For example, a GPU might use ACE-Lite to read cached data from main memory coherently, but it doesn't need to handle snoops from other GPUs.
Q27. What is an AXI crossbar/interconnect? How does it arbitrate multiple masters?
An AXI crossbar is a matrix of multiplexers that routes transactions from N masters to M slaves. Each master can target any slave independently.
Arbitration: When multiple masters want to access the same slave, the crossbar uses a priority arbiter (round-robin, fixed priority, or weighted). The selected master gets access; others stall (AWREADY/ARREADY go low).
Modern crossbars (like Xilinx SmartConnect) implement sophisticated arbitration, AxQOS priority, and bandwidth limiting per master.
Q28. What is an AXI register slice? Why insert one? (pipelining, timing closure)
An AXI register slice is a pipeline stage inserted between master and slave, breaking combinational paths. It adds latency (+1 cycle per slice) but improves timing closure (frequency) and allows pipelining.
Use register slices: (1) when timing is critical, (2) when you want to decouple master and slave domains, or (3) to balance latency in a complex interconnect.
Q29. AXI CDC (clock domain crossing bridge) — how is it implemented?
An AXI CDC bridge transfers AXI transactions across clock domains. Each channel is treated independently; the payload typically crosses through asynchronous FIFOs with Gray-coded pointers (simple 2-flop synchronizers are only safe for single-bit signals). The CDC logic also manages flow control — if the slave's domain is slower, the master side stalls via READY.
Implementing CDC for AXI is non-trivial because you must synchronize VALID/READY pairs without deadlock. Most teams use off-the-shelf CDC IP from Xilinx or Synopsys.
Q30. CHI (Coherent Hub Interface) vs AXI — what problems does CHI solve?
CHI is ARM's next-generation coherency protocol, designed for future data center SoCs. Unlike AXI, CHI uses a snooping-based coherency model similar to PCIe or CXL, where coherent transactions are automatically broadcast and snooped.
CHI solves: (1) scalability — AXI ACE becomes complex at many cores, (2) fabric efficiency — CHI reduces duplicate traffic, (3) mixed-coherency — CHI and non-coherent devices can coexist.
CHI is not yet widespread but is expected to dominate in the 2025+ timeframe.
Section 4: Design & Verification
Q31. How do you design an AXI slave interface? Show state machine for read channel.
A simple AXI slave read interface state machine:
```
State machine (simplified):
  IDLE:  if (ARVALID && ARREADY) → capture ARADDR, ARLEN, ARID; go to FETCH
  FETCH: read data from memory at current_addr; go to SEND
  SEND:  assert RVALID with RDATA, RID, and RLAST = (beat_count == ARLEN)
         if (RREADY && beat_count < ARLEN) → advance beat; back to FETCH
         if (RREADY && beat_count == ARLEN) → go to IDLE
```
Key points:
- Decouple address acceptance (FETCH) from data return (SEND)
- Only assert RVALID when data is actually ready
- Only advance to the next beat when RVALID & RREADY are both high
- Always return the captured ID as RID for traceability
Q32. Common AXI implementation bugs (top 5) — table with bug/impact/fix.
| Bug | Impact | Fix |
|---|---|---|
| Wrong RID on response | Master routes data to wrong transaction | Always capture and return same ID |
| READY/VALID handshake backwards | Transactions stall, throughput drops | READY indicates slave capacity, not master intent |
| AW/W correlation bug | Data applied to the wrong address | Track AW/W pairing (e.g., gate W until AW is accepted) |
| WLAST mismatch | Slave thinks write incomplete, blocks next trans. | Count beats: assert WLAST on beat AWLEN (the (AWLEN+1)-th beat) |
| Credit starvation (flow control) | Slave can't accept new requests | Allocate enough buffer; never starve READY |
Q33. How does an AXI arbiter work? Show round-robin vs priority arbitration.
An arbiter selects among multiple masters requesting the same slave. Two common strategies:
Round-Robin Arbitration:
Last_grant = M0
This cycle: requests = {M2, M1, M0}
Next after M0: M1 → grant M1
Next cycle: requests = {M2, M1}
Next after M1: M2 → grant M2
Fairness: all masters get equal turns
Priority Arbitration:
M0 (priority 3), M1 (priority 2), M2 (priority 1)
Always grant highest priority requesting
M1 & M2 requesting → grant M1
Only M2 requesting → grant M2
Risk: starvation if M0 always requesting
Most systems use a hybrid: weighted round-robin with AxQOS priority.
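The round-robin search can be sketched as a pure function (Python, behavioral; in RTL this is typically a rotate-and-priority-encode circuit):

```python
def round_robin_grant(requests, last_grant, n):
    """Grant the first requester after last_grant, wrapping around (fair)."""
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None  # nobody requesting this cycle

# 3 masters, M0 granted last, M1 and M2 both requesting -> M1 goes next.
assert round_robin_grant([0, 1, 1], last_grant=0, n=3) == 1
assert round_robin_grant([1, 0, 1], last_grant=1, n=3) == 2
```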
Q34. AXI protocol checker — what violations does it catch?
AXI checkers (built into VIP) catch:
- ID mismatch (response doesn't match request)
- WLAST miscount (doesn't match AWLEN)
- RLAST miscount (doesn't match ARLEN)
- VALID/READY protocol violations (e.g., data changes when VALID high but READY low)
- Address/data out-of-order (address never received for data)
- Burst type errors (e.g., WRAP with unaligned address)
- Unknown response codes (not OKAY/EXOKAY/SLVERR/DECERR)
- Timeout (transaction never completes)
Q35. AXI performance bottlenecks — how do you identify them?
Use VIP to measure:
- Utilization: % of cycles where VALID & READY both high
- Latency: Time from request to response (should match slave latency)
- Throughput: Transactions per cycle (accounting for burst length)
- Stall cycles: Cycles where READY or VALID is low
- Buffer depth: Peak outstanding transactions
If utilization is low, check: (1) Is the slave stalling (READY low)? (2) Is the master stalling (VALID low)? (3) Do you have enough buffer depth?
Q36. AXI in low-power designs — clock gating AXI channels
Clock gating reduces power by disabling the clock when there's no activity. For AXI, you can gate clocks on idle channels:
- Gate the W channel clock if no WVALID for N cycles
- Gate the AR channel clock if no ARVALID for N cycles
- Gate the R channel clock if no RVALID for N cycles
Caveats:
- Don't gate while back-pressure is pending (VALID high with READY=0)
- Ensure the wake-up logic detects incoming VALID on a gated channel
- Test carefully; clock gating bugs are subtle
Q37. AXI debug — how do you debug a hung AXI transaction? Checklist.
Debug checklist for a hung transaction:
□ Is ARVALID/AWVALID asserted? If not, the master never sent the address
□ Is ARREADY/AWREADY asserted? If not, the slave won't accept
□ Is RID/BID correct? Trace the response back to the request
□ Are there enough write data beats? Count W transfers vs AWLEN+1
□ Is WLAST high on the final beat?
□ Is there a deadlock? (Slave waiting for data, master waiting for response)
□ Did a timeout occur? (Transaction takes too long)
□ Check the protocol checker output for violations
Use a waveform viewer to trace from request to response.
Q38. AXI in FPGA (Xilinx SmartConnect / Microchip CoreAXI) vs ASIC
FPGAs use parameterized AXI IP (SmartConnect for Xilinx, CoreAXI for Microchip) with configurable data width, clock ratio, and arbitration. ASIC designs often use custom interconnects optimized for specific use cases.
FPGA pros: fast to integrate, no RTL design. FPGA cons: fixed latency, limited customization. ASIC pros: optimized for the SoC's specific needs. ASIC cons: more design/verification effort.
Q39. NIC-400 / CoreLink — what does an AXI interconnect IP provide?
These are ARM's reference interconnect IPs. They provide:
- AXI crossbar matrix (N masters to M slaves)
- Priority/QoS arbitration
- Configurable address decoding
- Optional CDC bridges (clock domain crossing)
- Optional register slices for pipelining
- Protocol checking and monitoring
Most SoCs either use these IPs or implement similar functionality in custom RTL.
Q40. AXI-Stream FIFO design — how do you handle backpressure from the slave?
An AXI-Stream FIFO buffers data between upstream and downstream. When the downstream slave deasserts TREADY, the FIFO absorbs beats until it fills, then must stop accepting data by deasserting its own upstream TREADY.
```
FIFO logic:
  If (TVALID_in && TREADY_in)   → push to FIFO
  If (TVALID_out && TREADY_out) → pop from FIFO
  TREADY_in  = FIFO not full
  TVALID_out = FIFO not empty

Backpressure flow:
  Slave slow → TREADY_out low → FIFO fills → TREADY_in low
  Master sees TREADY_in low → stalls
```
Key insight: TREADY_in must track FIFO occupancy — combinationally, or registered with an extra buffer entry (a skid buffer) to absorb the in-flight beat — otherwise the FIFO overflows or deadlocks.
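This flow control can be modeled behaviorally in Python (a sketch, not synthesizable RTL; `StreamFifo` and its method names are illustrative). It uses the conservative registered-READY scheme: a full FIFO refuses new data even on a cycle where it also drains.

```python
# Behavioral model of an AXI-Stream FIFO with upstream backpressure.
from collections import deque

class StreamFifo:
    def __init__(self, depth):
        self.depth = depth
        self.q = deque()

    @property
    def tready_in(self):    # backpressure to the upstream master
        return len(self.q) < self.depth

    @property
    def tvalid_out(self):   # data available for the downstream slave
        return len(self.q) > 0

    def cycle(self, tvalid_in, tdata_in, tready_out):
        """One clock: sample READY, pop on downstream handshake, then push."""
        accept = tvalid_in and self.tready_in   # sample READY before the pop
        popped = self.q.popleft() if (self.tvalid_out and tready_out) else None
        if accept:
            self.q.append(tdata_in)
        return popped

f = StreamFifo(depth=2)
f.cycle(1, "A", 0)                 # push A; downstream stalled
f.cycle(1, "B", 0)                 # push B; FIFO now full
assert not f.tready_in             # backpressure propagates upstream
assert f.cycle(0, None, 1) == "A"  # downstream drains; room frees up
assert f.tready_in
```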
Interview Cheatsheet: AXI by Company
| Company | Most-Asked Topics | Why |
|---|---|---|
| ARM | AXI fundamentals (Q1–Q5), variants (Q21–Q30), ACE/CHI (Q25–Q30) | ARM owns AXI spec; coherency and next-gen protocols are key |
| Qualcomm | Outstanding trans. (Q11), out-of-order completion (Q12), flow control (Q16–Q17) | SoCs need high throughput; performance and correctness critical |
| Nvidia | Interconnect design (Q27–Q28), bandwidth (Q16), bottleneck identification (Q35) | GPUs are throughput machines; interconnect performance critical |
| Xilinx (FPGA) | SmartConnect usage (Q38), AXI-Lite (Q22), protocol checkers (Q34) | FPGA tools focus; practical IP integration knowledge needed |
| AMD / Intel | Custom interconnects (Q33), arbitration (Q33), clock gating (Q36) | High-end SoCs use custom interconnects; deep implementation knowledge |
Key Resources
- AMBA AXI4 / AXI5 Specifications — Available from ARM (free registration or IP license)
- ACE / CHI Specifications — For coherency details
- Xilinx SmartConnect / IP Integrator User Guides — Practical FPGA integration examples
- Cadence AXI VIP / Synopsys Protocol Compiler — Verification tools; learn what they check
- Your company's AXI guidelines — Most large SoC teams have internal playbooks
- Open-source AXI testbenches — GitHub has many; study for verification patterns
📌 Final Note: AXI is vast, and the specification is 600+ pages. Interviewers don't expect you to memorize it all. They test your understanding of the core concepts: VALID/READY handshakes, ID-based out-of-order completion, flow control, and design tradeoffs. Focus on the "why" — why does AXI have 5 channels? Why are address and data decoupled? Why is ID-based ordering sufficient? You'll stand out by showing intuition, not memorization. Good luck!