Designing With ASICs for Machine Learning in Embedded Systems

The ASIC revolution is here to stay: ASICs give designers the most direct route to quickly implementing highly integrated, highly standardized features in a new system. They eliminate much of the work involved in developing specialty logic and analog interfaces by packaging everything into a single component. Since 2019, the ASIC revolution has naturally moved into an important new area of technology: machine learning and AI.

While a full "AI brain" type of ASIC is not available (and it’s unclear whether one ever will be), there are very useful ASICs that include machine learning features and can be used as peripherals in an embedded system, implemented just like any other type of ASIC. Designers who want to use ASICs with machine learning in their end devices will need to make some smart floorplanning and component selection choices to support their designs.

Systems Architecture with ASICs for Machine Learning

Using ASICs to implement machine learning in an embedded system requires interfacing with a host processor and a set of peripherals, as shown in the block diagram below. In this diagram, the host processor is typically an MCU that will interface with a small ASIC over USB or another high data rate differential signaling standard. This is appropriate for small IoT products, sensors, and many other smart devices. Larger intelligent systems, like AI/ML-capable single-board computers, will use an MPU or other processor. The remaining peripherals are connected and configured over standard interfaces.

[Figure: ASIC machine learning system architecture. Example system architecture with a large processor interfaced to an ASIC to access AI/ML capabilities.]
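
As a concrete illustration of the host-to-ASIC link in this architecture, the short Python sketch below checks whether a USB-attached accelerator has enumerated on the host. This is a minimal sketch, assuming the pyusb library is installed; the vendor/product IDs are the ones published for the Google Coral USB Accelerator and should be treated as placeholders for whatever accelerator your design actually uses.

# Minimal host-side check that a USB-attached ML accelerator is present.
# Requires pyusb (pip install pyusb). The IDs below are the published
# Coral USB Accelerator IDs; substitute the IDs for your own device.
import usb.core

ACCELERATOR_IDS = [
    (0x1A6E, 0x089A),  # Coral USB Accelerator before its runtime loads firmware
    (0x18D1, 0x9302),  # Coral USB Accelerator after firmware is loaded
]

def find_accelerator():
    """Return the first matching USB device, or None if nothing enumerated."""
    for vid, pid in ACCELERATOR_IDS:
        dev = usb.core.find(idVendor=vid, idProduct=pid)
        if dev is not None:
            return dev
    return None

if __name__ == "__main__":
    dev = find_accelerator()
    if dev is None:
        print("No ML accelerator found on the USB bus")
    else:
        print(f"Accelerator found: {dev.idVendor:04x}:{dev.idProduct:04x}")

On an MCU host the same check would be done with the vendor's USB host stack rather than Python, but the architecture is identical: the accelerator is just another enumerated peripheral.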

There are several benefits to instantiating machine learning functions in an external ASIC rather than in the host processor. The main goal is to offload AI/ML tasks onto the ASIC, which will be much more compute-efficient at these workloads than the general-purpose sequential/combinational logic in a conventional processor. RISC-V based processors and ASICs may change this in the future, but for a huge range of smaller systems, the standard MCU/MPU/FPGA + ASIC architecture is here to stay.

Unless you’re using a GPU or FPGA, you likely won’t have enough compute to perform training on the end device except for the simplest models. Newer neural network architectures that provide highly accurate inference in specialized fields are constantly being researched, but architectures optimized for deployment on end devices receive less attention because it is generally assumed models will be trained and served in the cloud. Some researchers argue that the next generation of AI-specialized processors and ASICs will need to move beyond traditional Si-based digital logic and instead operate transistors in the analog domain to reach the required efficiency.
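
To make the training constraint concrete, it helps to count multiply-accumulate (MAC) operations. The back-of-the-envelope Python below estimates the compute in a single 3x3 convolutional layer; the layer dimensions and the MCU throughput figure are illustrative assumptions, not taken from any particular model or part.

# Back-of-the-envelope compute estimate for one 3x3 convolutional layer.
# One forward pass costs roughly K*K*Cin*Cout*H*W multiply-accumulates
# (MACs); a training step costs about 3x that (forward pass plus the two
# backward-pass products). All dimensions here are illustrative.
kernel = 3            # 3x3 kernel
c_in, c_out = 32, 64  # input/output channel counts
h, w = 112, 112       # output feature map size

macs_inference = kernel * kernel * c_in * c_out * h * w
macs_training_step = 3 * macs_inference

print(f"Inference: {macs_inference / 1e6:.0f} M MACs per image")
print(f"Training:  {macs_training_step / 1e6:.0f} M MACs per image per step")

# An MCU sustaining ~100 M MACs/s would need several seconds per training
# step for this one layer alone, and a full training run requires millions
# of steps across many layers, which is why training stays in the cloud.

At roughly 231 million MACs per image for this single layer, even inference is a stretch for a bare MCU, which is exactly the gap the accelerator ASIC fills.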

Designing PCBs for Machine Learning in ASICs

PCBs that use ASICs for machine learning are high-speed digital systems, and they may include an analog front-end for wireless communication or interfacing with sensors. These devices should follow the standard set of high speed PCB design guidelines to ensure signal integrity, power integrity, and EMI/EMC compliance. Some of the main guidelines to follow in these devices include:

  • Stackup: The stackup for the board must support high speed signaling with controlled impedance. ASICs with machine learning capabilities will interface with the main processor primarily over a moderate speed data connection like PCIe Gen 2 or USB; a MIPI standard like CSI-2 is also a possibility.

  • Routing: This is more of a general guideline for high speed digital systems than something specific to systems with ASICs. Keep impedance controlled along each route, maintain continuous return paths, and be mindful of where everything is routed to ensure channel compliance and low noise.

  • Analog isolation: Systems that will use an ASIC with machine learning capabilities need to have the high speed section separated from the analog section to ensure analog signals are not corrupted. One common application of these devices is sensor fusion and inference with a machine learning model, so there should be a dedicated area for sensor interfaces.

  • Power integrity: This is another area of stackup design and PDN impedance design that should be considered in these high speed digital systems. Make sure you allocate enough plane layers and space for decoupling capacitors to ensure low PDN impedance; a short worked example of the target impedance calculation follows this list.
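
For the power integrity point above, the usual starting point is a target impedance: divide the allowed ripple voltage by the worst-case transient current. The Python sketch below runs that arithmetic for an illustrative 1.0 V core rail; every number is an assumption to be replaced with values from your ASIC's datasheet.

# Target PDN impedance estimate: Z_target = (V_rail * ripple) / I_transient.
# All numbers are illustrative; pull real values from the ASIC datasheet.
import math

v_rail = 1.0        # core rail voltage in volts
ripple = 0.05       # allowed ripple as a fraction of the rail (5%)
i_transient = 2.0   # worst-case transient current step in amps

z_target = (v_rail * ripple) / i_transient
print(f"Target PDN impedance: {z_target * 1000:.1f} mOhm")

# A single capacitor only holds the PDN below Z_target up to roughly
# f = 1 / (2*pi*Z_target*C); above that, smaller capacitors and
# low-inductance mounting have to take over.
c_bulk = 100e-6     # 100 uF bulk capacitor
f_corner = 1 / (2 * math.pi * z_target * c_bulk)
print(f"A 100 uF bulk cap covers the PDN up to ~{f_corner / 1e3:.0f} kHz")

The 25 mOhm result in this example is why these boards need multiple plane layers and dense decoupling: a single bulk capacitor only covers the PDN up to tens of kHz, and everything above that falls to smaller capacitors and the planes themselves.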

More of these designs are packing high functionality into smaller areas, requiring an HDI approach and very tight component placement on small boards. Today’s machine learning-capable ASICs come in small quad flat packages, but newer devices will come in BGA packages that require fanout routing so all pins can be accessed. Make sure to plan for these points in your stackup design and routing strategy.

[Figure: Google Coral footprint. The Google Coral Edge TPU is one popular ASIC with machine learning capabilities that can be deployed in small devices like IoT products.]

Many smaller systems, like IoT products that include an AI accelerator ASIC, will just use a small MCU to implement host controller functions, especially if USB is the main interface on the ASIC to access inference data. More sophisticated systems are taking a different approach with larger MPUs or FPGAs.
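
On a Linux-capable host (an MPU-based single-board computer rather than a bare-metal MCU), accessing the accelerator's inference capabilities can be as simple as handing a compiled model to a delegate. The sketch below follows the tflite_runtime flow documented for the Coral Edge TPU; the model filename is a placeholder, and the delegate library name assumes a Linux system with the Edge TPU runtime installed.

# Offloading inference to a USB-attached Edge TPU via TensorFlow Lite.
# 'model_edgetpu.tflite' is a placeholder for a model compiled with the
# Edge TPU compiler; the delegate library name is Linux-specific.
import numpy as np
import tflite_runtime.interpreter as tflite

delegate = tflite.load_delegate("libedgetpu.so.1")
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input with the model's expected shape and dtype; replace this
# with real sensor or camera data in a production system.
data = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], data)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])
print("Inference output shape:", scores.shape)

The host never sees the network's internals; it just moves tensors over USB, which is what keeps the MCU/MPU requirements modest.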

Going Big With an FPGA

Smaller devices, such as mobile or IoT products, will use the common architecture described above, with an MCU as the host controller and an ASIC for machine learning as a peripheral. In contrast, an FPGA gives you much more flexibility to implement advanced features and fully customize your system, although the development cost is greater. With an FPGA as the host processor, you can take a different approach to your system architecture:

  1. Instantiate machine learning capabilities in logic directly on the device
  2. Implement the required high-speed interfaces for accessing your AI/ML ASICs

As you scale up the capabilities of your device, you’ll need more ASICs as peripherals for machine learning tasks, larger FPGAs/GPUs, or both. Large FPGAs and banks of GPUs are probably the only subsystems that will provide training in a reasonable amount of time, but their physical size makes them unwieldy just about everywhere except in a data center. However, even on smaller devices, FPGAs are still useful for their full reconfigurability, as well as their high compute capabilities that allow neural network deployment directly on the device.

Large companies are implementing the same types of processor cores on custom silicon SoCs to run AI/ML tasks directly on the device without the need for a peripheral. These SoCs target more advanced applications like equipment for autonomous vehicles, advanced manufacturing, robotics, and even commercial space. With all the possible combinations of processors and AI accelerator chips, systems designers can now target a greater breadth of AI/ML tasks directly on the device.

 

Whether you’re designing an ultra-rugged aerospace system or feature-rich IoT platform, you can add intelligence to your design using ASICs with machine learning capabilities. NWES is an experienced design firm that develops IoT platforms, RF systems, data center products, aerospace systems, and much more. NWES helps aerospace OEMs, defense primes, and private companies in multiple industries design modern PCBs and create cutting-edge embedded technology. We've also partnered directly with EDA companies and advanced ITAR-compliant PCB manufacturers, and we'll make sure your next high speed digital system is fully manufacturable at scale. Contact NWES for a consultation.
