RISC-V Custom Silicon: 10x Energy-Efficient AI Accelerator Enabling Always-On Intelligence in Battery-Powered Devices
Design and tape-out of a custom RISC-V based AI accelerator SoC achieving 10 TOPS/W efficiency for TinyML workloads, enabling always-on voice, vision, and sensor fusion in battery-powered wearables and IoT devices.
The Challenge
A consumer electronics company developing next-generation smart wearables and hearables needed a custom AI accelerator that could run sophisticated ML models continuously while maintaining week-long battery life on a coin cell battery.
Power Budget Constraints
Existing AI accelerators consumed 50-200mW during inference, far exceeding the 5mW budget required for always-on operation from a coin cell. Duty cycling to save power degraded the user experience.
Impact: Target of <5mW continuous inference
Vendor Lock-in
Available solutions from major vendors came with restrictive licensing, high royalties per unit, and limited customization options. The client needed full IP ownership for differentiation.
Impact: 15-20% cost in licensing fees
Model Flexibility
Fixed-function accelerators couldn't adapt to evolving ML models. The client needed to update algorithms post-deployment without hardware changes for competitive advantage.
Impact: 6-month hardware refresh cycles
Integration Complexity
Off-the-shelf solutions required external memory, PMICs, and supporting chips, increasing BOM cost, PCB area, and power consumption for their compact form factor.
Impact: 4-chip solution increasing size by 3x
Our Solution
We designed a fully custom RISC-V based SoC with an integrated neural processing unit (NPU) optimized for TinyML workloads, featuring aggressive power gating, in-memory computing elements, and a flexible dataflow architecture.
System Architecture
Heterogeneous architecture combining a RISC-V application processor with custom neural accelerator blocks and comprehensive power management.
Application Processor
- Dual-core RISC-V RV32IMC (custom microarchitecture)
- 16KB I-cache, 16KB D-cache per core
- Hardware floating-point unit
- Custom DSP extensions for signal processing
- Secure boot and hardware root of trust
Neural Processing Unit
- 256 MAC units in systolic array
- Support for INT4/INT8/INT16 precision
- On-chip SRAM with in-memory computing
- Flexible dataflow (weight/output stationary)
- Hardware activation functions (ReLU, Sigmoid, Softmax)
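The MAC-array behavior above can be sketched in a few lines of C. This is an illustrative model only (function and parameter names are invented, not the actual RTL): it computes one output of an INT8 weight-stationary row, with a wide accumulator and the hardware ReLU stage applied at the end.

```c
#include <stdint.h>

/* Illustrative only: models one output channel of an INT8
 * weight-stationary MAC row.  Weights stay resident ("stationary")
 * while activations stream past; a 32-bit accumulator prevents
 * overflow, and a hardwired ReLU clamps the result. */
static int32_t mac_row_int8(const int8_t *weights,      /* stationary */
                            const int8_t *activations,  /* streamed  */
                            int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)weights[i] * (int32_t)activations[i];
    return acc > 0 ? acc : 0;   /* hardware ReLU stage */
}
```

An output-stationary dataflow would instead keep `acc` pinned in the array while both weights and activations stream; the arithmetic is identical, only the movement pattern changes.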
Memory Subsystem
- 1MB unified on-chip SRAM
- Intelligent memory controller with compression
- 4MB external QSPI flash interface
- DMA engine for zero-copy data movement
- Memory protection unit for security
Sensor Hub & I/O
- Always-on sensor processor (separate power domain)
- PDM microphone interface (up to 4 channels)
- I2S for audio codec
- SPI/I2C/UART for sensors
- 12-bit ADC for analog sensors
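PDM microphones deliver a 1-bit oversampled stream, so the sensor hub must decimate it to PCM. The sketch below shows only the principle (PCM amplitude tracks the density of 1s); real hardware would use cascaded CIC/FIR filters, and the window size and scaling here are arbitrary choices for illustration.

```c
#include <stdint.h>

/* Illustrative boxcar decimator: count the 1-bits in a 64-sample PDM
 * window and map the count to a signed PCM value centered at zero.
 * Real PDM front-ends use CIC + FIR filter chains instead. */
static int16_t pdm_to_pcm(uint64_t pdm_bits)
{
    int ones = 0;
    for (int i = 0; i < 64; i++)
        ones += (int)((pdm_bits >> i) & 1u);
    /* 0..64 ones -> roughly -16384..+16384 */
    return (int16_t)((ones - 32) * 512);
}
```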
Power Management
- Integrated PMIC with multiple LDOs
- Dynamic voltage and frequency scaling
- Power gating for 8 independent domains
- Ultra-low-power RTC and wake-up controller
- Battery fuel gauge integration
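Eight independent power domains map naturally onto one control register with a bit per domain. The sketch below models that idea in software; the register, bit assignments, and domain names are all hypothetical, since on silicon this would be a memory-mapped PMU register with handshaking for safe power-up.

```c
#include <stdint.h>

/* Hypothetical power-domain control register, modeled as a plain
 * variable.  Bit i gates domain i (1 = powered).  Names and bit
 * positions are illustrative, not the actual register map. */
static uint8_t pmu_domain_ctrl;   /* 8 independent domains */

enum { DOMAIN_SENSOR_HUB = 0, DOMAIN_NPU = 3, DOMAIN_CPU = 7 };

static void pmu_domain_on(int d)    { pmu_domain_ctrl |=  (uint8_t)(1u << d); }
static void pmu_domain_off(int d)   { pmu_domain_ctrl &= (uint8_t)~(1u << d); }
static int  pmu_domain_is_on(int d) { return (pmu_domain_ctrl >> d) & 1; }
```

This is what lets the always-on sensor processor run with the NPU and application cores fully gated until a wake event arrives.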
Chip Specifications
| Parameter | Value |
| --- | --- |
| Process Node | 12nm FinFET (TSMC) |
| Die Size | 9mm² (3x3mm) |
| Package | WLCSP 4x4mm, 81 balls |
| NPU Performance | 1 TOPS @ 100MHz |
| Power Efficiency | 10 TOPS/W (INT8) |
| Always-On Power | < 5mW (voice wake + basic inference) |
| Deep Sleep | < 1µA with RTC |
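The efficiency figure in the table translates directly into energy per inference: 10 TOPS/W is 10^13 ops per joule, or 0.1 pJ per op. The model size below (10 MOps per inference) is an assumed figure purely for illustration.

```c
/* Back-of-envelope energy math from the spec table: at T TOPS/W the
 * chip delivers T*1e12 ops per joule, so an inference of `ops`
 * operations costs ops / (T*1e12) joules. */
static double joules_per_inference(double ops, double tops_per_watt)
{
    return ops / (tops_per_watt * 1e12);
}
```

At an assumed 10 MOps per inference this gives 1 µJ per inference, so even ten inferences per second costs only ~10 µW of compute energy, consistent with the <5mW always-on budget.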
Software Stack
- Custom LLVM toolchain with RISC-V extensions
- Lightweight RTOS optimized for power management
- TensorFlow Lite Micro with custom kernels
- Model compiler with quantization support
- Power-aware scheduling runtime
- Secure OTA update mechanism
- HAL with power state management APIs
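The model compiler's quantization support reduces to the standard affine INT8 scheme used by TensorFlow Lite-style models: q = round(x / scale) + zero_point, clamped to [-128, 127]. A minimal sketch of the per-tensor transform:

```c
#include <stdint.h>

/* Standard affine INT8 quantization: q = round(x/scale) + zero_point,
 * clamped to the int8 range.  A sketch of what the quantizer applies
 * per tensor; rounding mode here is round-half-away-from-zero. */
static int8_t quantize_int8(float x, float scale, int zero_point)
{
    float q = x / scale;
    int r = (int)(q >= 0.0f ? q + 0.5f : q - 0.5f) + zero_point;
    if (r < -128) r = -128;
    if (r >  127) r =  127;
    return (int8_t)r;
}

static float dequantize_int8(int8_t q, float scale, int zero_point)
{
    return scale * (float)(q - zero_point);
}
```

The NPU's INT4/INT8/INT16 modes trade this quantization error against energy per MAC, which is why the model compiler and hardware precision support are designed together.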
TinyML Model Support
The NPU architecture was optimized for common TinyML workloads while maintaining flexibility for model updates.
Keyword Spotting
DS-CNN (Depthwise Separable CNN)
96% accuracy on custom vocabulary
8ms inference, <3mW power
Voice Activity Detection
RNN-based classifier
98% detection accuracy
Always-on at 0.8mW
Person Detection
MobileNetV3-Small variant
92% accuracy at 96x96 resolution
45ms inference, <15mW power
Gesture Recognition
1D CNN on accelerometer data
94% on 12 gesture classes
5ms inference, <1mW power
Sensor Fusion
Multi-input neural network
Activity recognition, context awareness
Continuous at 2mW
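The DS-CNN and MobileNet variants above are chosen because depthwise-separable convolution slashes the MAC count: one K×K×Cin×Cout convolution is replaced by a K×K depthwise pass plus a 1×1 pointwise pass. A sketch of the arithmetic (layer dimensions in the test are illustrative):

```c
#include <stdint.h>

/* MAC counts for one conv layer over hw output pixels.
 * Standard conv: every output channel sees a K*K*Cin filter.
 * Depthwise-separable: K*K per input channel, then 1x1 channel mixing.
 * Savings factor is roughly 1/Cout + 1/(K*K). */
static uint64_t macs_standard(uint64_t hw, uint64_t k,
                              uint64_t cin, uint64_t cout)
{
    return hw * k * k * cin * cout;
}

static uint64_t macs_separable(uint64_t hw, uint64_t k,
                               uint64_t cin, uint64_t cout)
{
    return hw * k * k * cin      /* depthwise pass */
         + hw * cin * cout;      /* pointwise 1x1 pass */
}
```

For a 3×3 layer with 32 input and 64 output channels this is roughly an 8x reduction, which is what lets these models fit the milliwatt budgets quoted above.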
Implementation Timeline
Phase 1: Architecture & Specification
Duration: 12 weeks
- Workload analysis and benchmarking
- Architecture exploration and trade-off studies
- RTL microarchitecture specification
- Power and performance modeling
Phase 2: RTL Design & Verification
Duration: 32 weeks
- RTL implementation (Verilog)
- Comprehensive UVM testbench development
- Formal verification for critical paths
- Power intent specification (UPF)
Phase 3: Physical Design & Tape-out
Duration: 24 weeks
- Synthesis and floorplanning
- Place and route optimization
- Sign-off (DRC, LVS, timing, power)
- Tape-out to foundry
Phase 4: Silicon Bring-up & Productization
Duration: 16 weeks
- First silicon validation
- Characterization across PVT corners
- SDK and documentation completion
- Production test development
Results & Impact
The custom RISC-V AI accelerator exceeded all specifications, enabling a new category of always-on intelligent devices with week-long battery life and sophisticated on-device AI capabilities.
- Energy Efficiency: 10 TOPS/W (INT8)
- Always-On Power: <5mW (voice wake + basic inference)
- Inference Latency: single-digit milliseconds for voice and gesture models
- BOM Cost: reduced by single-chip integration (previously a 4-chip solution)
- PCB Area: reduced alongside the chip-count reduction
- Licensing Costs: eliminated through full IP ownership
Return on Investment
- Implementation Cost: multi-year silicon development investment
- Annual Savings
- Payback Period
- 5-Year ROI
“Rapid Circuitry delivered exactly what we needed - a custom AI chip that lets us differentiate in a crowded market. The 10x efficiency improvement enabled features our competitors simply cannot match. Our devices now have always-on AI with week-long battery life, and we own the IP completely.”
CTO
Client Consumer Electronics Company
Awards & Recognition
RISC-V Summit Innovation Award 2025
Best Commercial RISC-V Implementation
Embedded Computing Design Award
Most Innovative AI Processor
IEEE Solid-State Circuits Best Demo
Ultra-Low-Power AI Accelerator