The future of AI is decidedly embedded; at least, that's what all signs point to. Everyone is probably familiar with AI running on software platforms like web services and in scripts on your desktop; you'll find plenty of example scripts on StackExchange, Medium, and other boutique blogs. At the device level, however, there are still some challenges to making embedded AI more efficient and scalable.
How do we know there is a shift towards greater embedding at the device level, and what does it look like? The move towards embedded AI at the device level is reflected in the deployment of machine learning/AI models on SBCs and GPUs. Going forward, more artificial intelligence SoC options will hit the market, and we'll see an offloading from the module level to the chip level. In fact, a number of embedded AI SoC startups are already making this move. Here's what the future of embedded AI will look like and how designers can prepare for the upcoming shift to greater computing power embedded at the chip level.
Embedded AI is just what it sounds like: AI capabilities are embedded at the device level so that specific tasks can be performed on-device. These embedded systems are designed to capture data and process it locally. Embedding AI capabilities at the device level has become more common over the last several years. Some mobile apps can perform simple AI tasks, while dedicated embedded systems are needed for more complex tasks like computer vision.
Until recently, computationally intensive AI tasks (e.g., recommendation engines) needed to be performed in the cloud (i.e., at a data center). In other words, data was sent to a cloud server, and the server environment would feed that data into an AI model. The server would then use the results as part of some larger task, or the results would be sent back to the edge device for further processing.
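This cloud round trip can be sketched in a few lines of Python. The sketch below is purely illustrative: `run_model_in_cloud` is a hypothetical stand-in for a remote inference endpoint (no real network call is made), and the linear "model" and its weights are invented for the example.

```python
import json


def run_model_in_cloud(payload: str) -> str:
    """Stand-in for a cloud inference endpoint: deserialize the request,
    run the model, and serialize a response. In a real deployment this
    would sit behind an HTTP API in a data center."""
    features = json.loads(payload)["features"]
    # Toy "model": a fixed linear scoring function with made-up weights.
    score = sum(w * x for w, x in zip([0.4, -0.2, 0.1], features))
    return json.dumps({"score": score})


def edge_device_task(sensor_reading):
    """What the edge device does when inference happens in the cloud."""
    # 1. Serialize the captured data and send it over the network.
    request = json.dumps({"features": sensor_reading})
    # 2. The cloud server runs the model (the network hop is simulated here).
    response = run_model_in_cloud(request)
    # 3. The device can only resume its task once the result comes back.
    return json.loads(response)["score"]


score = edge_device_task([1.0, 2.0, 3.0])
```

Every inference pays for serialization and a network round trip, which is exactly the overhead embedded AI removes.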
Typical interaction between edge devices and the cloud.
Embedded AI changes this dynamic by eliminating the need to send data back to the cloud to run an AI model. Data is captured by the device, and any AI models are run at the device level. The results can then be used by the device directly as part of some dedicated task. Data may be stored at the device level temporarily, or it can be sent back to a cloud server for storage.
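The on-device flow can be sketched the same way. Everything here is hypothetical for illustration: the weights, the threshold, and the idea that the device queues raw data for optional upload are assumptions, not a specific product's behavior.

```python
def run_model_on_device(features):
    """Toy on-device model: the weights live in local memory (e.g., flash),
    so inference needs no serialization and no network round trip."""
    weights = [0.4, -0.2, 0.1]  # hypothetical values, loaded once at boot
    return sum(w * x for w, x in zip(weights, features))


def device_task(sensor_reading, threshold=0.25):
    """Capture, infer, and act entirely at the device level."""
    score = run_model_on_device(sensor_reading)
    # The result drives the device's dedicated task directly...
    actuate = score > threshold
    # ...while the raw data can optionally be queued for later cloud storage.
    upload_queue = [sensor_reading] if actuate else []
    return actuate, upload_queue
```

The only remaining cloud traffic is optional storage, which can happen whenever connectivity allows rather than on the inference critical path.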
Chipmakers have been moving things in this direction for some time. NVIDIA has become the dominant player in the space by adapting its GPU architecture specifically for AI tasks at the device level. Google Coral's TPU (Tensor Processing Unit) was built specifically to accelerate TensorFlow models at the device level. These modules can be easily brought into embedded products (e.g., SBCs) as they connect to other devices using standard routing and layout protocols. The main reasons for moving AI away from the cloud and dedicating processing power at the device level to AI/ML models are lower latency, reduced bandwidth usage, continued operation when connectivity is lost, and better data privacy.
Currently, the best-in-class embedded AI options are found on modules, where a general-purpose chip has its architecture optimized for AI tasks. The alternative is NVIDIA's Jetson Nano system, where a massive number of cores are packed into a single module without the architecture being designed specifically for neural networks. In the not-too-distant future, the best embedded AI capabilities will be found on fully customized SoCs.
Embedding AI capabilities onto custom SoCs allows the chip's architecture to be optimized for the specific computing application. Rather than using standard ALUs for the repetitive matrix calculations in neural networks, the architecture of an SoC can be customized to support a specific set of mathematical operations. This reduces instruction counts, which in turn reduces power consumption and overall calculation time. When multiple cores are placed on a single SoC, the product becomes massively parallelizable and scalable.
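The instruction-count argument can be made concrete with a back-of-the-envelope model. The counts below are hypothetical and ignore loop overhead and the memory hierarchy; they simply contrast issuing separate load/multiply/add instructions per inner-loop step on a general-purpose ALU against retiring one fused multiply-accumulate (MAC) per step on dedicated hardware.

```python
def matmul_instruction_counts(n):
    """Illustrative instruction counts for an n x n matrix multiply.

    A general-purpose ALU is assumed to issue three instructions per
    inner-loop step (load, multiply, add), while a fused MAC unit
    retires one multiply-accumulate per step. These per-step costs
    are assumptions for illustration, not measured values.
    """
    steps = n ** 3          # inner-loop iterations in a naive matmul
    general_alu = steps * 3 # separate load, multiply, add per step
    fused_mac = steps * 1   # one fused MAC instruction per step
    return general_alu, fused_mac


alu_count, mac_count = matmul_instruction_counts(64)
```

Even under these simple assumptions, the dedicated datapath cuts the instruction count by a constant factor on every layer of a network, and an array of MAC units multiplies that gain by the degree of parallelism.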
Established chipmakers and innovative startups are already moving in this direction. Names like Intel, Arteris, and fabless startups are developing IP for new SoCs or their own fully-custom SoCs for embedded AI devices. As an example application, repetitive matrix operations in neural network models can be executed quickly with an optimized cascaded architecture at the chip level. This type of matrix computing architecture is being implemented by Ambient Scientific in their IP.
One remaining challenge involves model training. Normally, training is done in the cloud, and the trained model is then sent back to the edge device and stored in memory; alternatively, the model is flashed onto the device manually by an engineer. The parallelization provided by custom SoCs would allow training to be performed on-device as part of supervised learning in an asynchronous manner. This is a much more scalable and efficient solution than relying on the cloud.
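The per-sample math behind on-device training is small enough to sketch. The example below shows a single stochastic-gradient update for a toy linear model with squared error; the samples and learning rate are invented, and on a parallel SoC many such updates could run concurrently across cores rather than serially as written here.

```python
def sgd_update(weights, features, label, lr=0.01):
    """One stochastic-gradient step for a linear model with squared error.

    This is the per-sample work an on-device trainer would perform as
    labeled data arrives asynchronously from the device's own sensors.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - label
    # Gradient of 0.5 * error**2 w.r.t. each weight is error * x.
    return [w - lr * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
# Hypothetical labeled samples trickling in from the device over time.
for features, label in [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]:
    weights = sgd_update(weights, features, label)
```

Because each update touches only one sample, updates can be interleaved with the device's normal duties, which is what makes asynchronous on-device learning practical.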
The repetitive matrix calculations in a convolutional neural network can be optimized for minimum power and minimum instruction count when implemented at the hardware level.
If you’re looking for a PCB design and embedded systems design firm, look no further than NWES. We can help you design your next embedded AI product around standard COMs, a custom SBC, or custom SoC. We’re also a digital marketing firm, and we can help you market your new product and engage with your target audience. Contact NWES today for a consultation.