Voice-Controlled In-Car Infotainment System with Full Hands-Free Operation

Built an embedded infotainment and HMI system for a vehicle program — delivering 100% voice-controlled operation for navigation, media, climate, and phone, with a touch fallback that respected driver attention requirements and passed OEM integration testing.

100% of primary functions operable by voice — navigation, media, climate, phone

Sub-400ms wake-word-to-response latency on embedded hardware

Passed OEM integration and EMC testing on first submission

Touch interaction design certified under the OEM's driver attention requirements

The Problem

An automotive Tier 1 supplier needed an infotainment and HMI system for a vehicle program targeting the European mid-market segment. The OEM specification required 100% hands-free operation, with every primary function reachable by voice without the driver taking their eyes off the road, in addition to a touch interface that met EU driver attention guidelines under the EU General Safety Regulation (GSR) framework.

The program had a constrained hardware budget. The infotainment unit used an ARM Cortex-A series SoC rather than the high-performance compute platforms available in premium vehicles. The voice processing pipeline had to run on-device with acceptable latency, on hardware too constrained for large language models and in a real-time driving context where cloud-dependent processing was not acceptable.

The Constraints

On-device NLP with sub-400ms response. The program specification required wake-word detection and command recognition to complete within 400ms of utterance end — fast enough to feel responsive to a driver in motion, without requiring a cloud round-trip that would add unpredictable latency and create a hard dependency on cellular connectivity.

Command coverage across four domains. Navigation (destination entry, point-of-interest search, route change), media (source selection, playback control, station search), climate control (temperature, fan, seat heating), and phone (contacts, recent calls, DTMF input). Each domain had OEM-provided vocabulary and a defined test suite.

Driver attention compliance. The touch interface could not require sequences of more than two taps to reach any primary function from the home screen. Eyes-off-road duration per interaction was tested against NHTSA/EU driver attention guidelines — failing these tests would block OEM type approval.

EMC and automotive integration testing. The system needed to pass electromagnetic compatibility testing under UNECE R10 and integrate with the vehicle CAN bus for climate control commands without generating interference. Software changes affecting the hardware abstraction layer required re-test submissions.

Our Approach

The voice processing pipeline runs in stages on-device. Wake-word detection uses a compact keyword spotting model (Porcupine) running continuously at low CPU load. On wake-word detection, a domain-classification model routes the utterance to the appropriate handler (navigation, media, climate, or phone) before full NLP processing begins. Because of this staging, the main NLP pass runs only on utterances that have already been routed to a single domain, which keeps the compute load low enough for sub-400ms end-to-end response on the target SoC.
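
The routing step described above can be illustrated with a minimal Python sketch. It is not the production pipeline: the keyword lists stand in for the trained domain-classification model, the handler functions stand in for the per-domain intent models, and every name here (DOMAIN_KEYWORDS, classify_domain, handle_utterance) is hypothetical. Wake-word detection is assumed to have already fired before handle_utterance is called.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical keyword lists standing in for a trained domain classifier.
DOMAIN_KEYWORDS = {
    "navigation": ("navigate", "route", "directions"),
    "media": ("play", "radio", "station"),
    "climate": ("temperature", "fan", "seat"),
    "phone": ("call", "dial", "redial"),
}


@dataclass
class Result:
    domain: str
    action: str
    latency_ms: float
    within_budget: bool


def classify_domain(utterance: str) -> str:
    """Cheap routing step: pick the first domain with a matching keyword."""
    words = utterance.lower().split()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return domain
    return "unknown"


# Placeholder handlers; a real system would run a per-domain intent model here.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "navigation": lambda utterance: "navigation.handle",
    "media": lambda utterance: "media.handle",
    "climate": lambda utterance: "climate.handle",
    "phone": lambda utterance: "phone.handle",
}


def handle_utterance(utterance: str, budget_ms: float = 400.0) -> Result:
    """Route an utterance to one domain handler and record elapsed time."""
    start = time.perf_counter()
    domain = classify_domain(utterance)
    handler = HANDLERS.get(domain)
    # Unrouted utterances are rejected without running any domain model.
    action = handler(utterance) if handler else "reject"
    latency_ms = (time.perf_counter() - start) * 1000.0
    return Result(domain, action, latency_ms, latency_ms <= budget_ms)
```

Calling handle_utterance("set temperature to 21") routes to the climate handler only, so the other three domain models never run for that utterance. The budget_ms default mirrors the 400ms target; where the timer starts and stops in the real system is not specified here.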

Natural language understanding for each domain uses intent classification models fine-tuned on the OEM-provided vocabulary and test set. The models were trained and quantized for on-device inference using ONNX Runtime for ARM — trading some accuracy at the tail of the distribution for consistent latency on constrained hardware.
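
To make the accuracy-for-latency trade concrete, here is a small pure-Python sketch of weight quantization. The quantization scheme actually used on the program is not stated above; this assumes symmetric per-tensor int8 quantization purely for illustration, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Illustrative weights only; not taken from any real model.
weights = [0.02, -0.5, 1.27, -1.0, 0.333]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, and integer arithmetic tends to be cheaper on ARM cores, at the cost of a rounding error of at most scale / 2 per weight. Small errors like this are one plausible source of the tail-of-distribution accuracy loss mentioned above.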

The touch interface was built to the OEM’s HMI design language in Qt/QML with custom components for the vehicle’s display resolution and aspect ratio. Navigation depth (number of taps to reach each function) was tracked explicitly during development and verified against the OEM specification before submission. The interaction design went through three cycles of driver attention testing before the final sign-off.
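
Tracking navigation depth can be automated with a breadth-first search over the screen graph. The sketch below is an assumption about how such a check might look, not the tooling used on the program; the screen names and the SCREEN_GRAPH structure are hypothetical.

```python
from collections import deque

# Hypothetical screen graph: each key is a screen, each value lists the
# screens reachable from it with a single tap.
SCREEN_GRAPH = {
    "home": ["navigation", "media", "climate", "phone"],
    "navigation": ["destination_entry", "poi_search"],
    "media": ["source_select", "station_search"],
    "climate": ["seat_heating"],
    "phone": ["contacts", "recent_calls"],
}


def tap_depths(graph, root="home"):
    """Return the minimum number of taps from root to every reachable screen."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        screen = queue.popleft()
        for target in graph.get(screen, []):
            if target not in depths:
                depths[target] = depths[screen] + 1
                queue.append(target)
    return depths


def violations(graph, primary_functions, max_taps=2, root="home"):
    """List primary functions that are unreachable or deeper than max_taps."""
    depths = tap_depths(graph, root)
    return sorted(
        screen
        for screen in primary_functions
        if depths.get(screen, float("inf")) > max_taps
    )
```

Running violations(SCREEN_GRAPH, ["seat_heating", "contacts"]) returns an empty list for this graph. A check like this can run in CI, so that a layout change pushing a primary function to a third tap fails the build before any driver attention test is scheduled.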

CAN bus integration for climate commands used a thin abstraction layer over SocketCAN, mapping HMI actions to the OEM-specified CAN message IDs and data bytes. The layer was designed to isolate HMI software from the CAN physical layer — enabling software testing and CI without hardware, and simplifying the re-test scope when software changes did not touch the CAN abstraction.
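
The shape of such an abstraction layer can be sketched as follows. This is an illustration in Python rather than the production code, and every specific in it is an assumption: the CAN IDs, the 0.5 °C-per-bit temperature encoding, and the class and function names are all hypothetical stand-ins for the OEM-specified values. The only real-world detail is the 16-byte classic CAN frame layout that SocketCAN expects, packed here with the format string used in the Python socket documentation.

```python
import struct
from typing import List, NamedTuple


class CanFrame(NamedTuple):
    can_id: int
    data: bytes


# Hypothetical message IDs; real values come from the OEM specification.
CLIMATE_TEMP_ID = 0x3A0
CLIMATE_FAN_ID = 0x3A1


def encode_set_temperature(celsius: float) -> CanFrame:
    """Assumed encoding: one byte, 0.5 degrees Celsius per bit."""
    raw = round(celsius * 2)
    if not 0 <= raw <= 0xFF:
        raise ValueError(f"temperature out of range: {celsius}")
    return CanFrame(CLIMATE_TEMP_ID, bytes([raw]))


def encode_set_fan(level: int) -> CanFrame:
    """Assumed encoding: one byte, fan level 0 to 7."""
    if not 0 <= level <= 7:
        raise ValueError(f"fan level out of range: {level}")
    return CanFrame(CLIMATE_FAN_ID, bytes([level]))


def to_socketcan_bytes(frame: CanFrame) -> bytes:
    """Pack a frame into the 16-byte classic CAN frame layout (id, dlc, pad, data)."""
    return struct.pack(
        "=IB3x8s", frame.can_id, len(frame.data), frame.data.ljust(8, b"\x00")
    )


class FakeBus:
    """In-memory bus for tests and CI; records frames instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[CanFrame] = []

    def send(self, frame: CanFrame) -> None:
        self.sent.append(frame)


class ClimateController:
    """HMI-facing API; depends only on an object with a send(frame) method."""

    def __init__(self, bus) -> None:
        self._bus = bus

    def set_temperature(self, celsius: float) -> None:
        self._bus.send(encode_set_temperature(celsius))

    def set_fan(self, level: int) -> None:
        self._bus.send(encode_set_fan(level))
```

In a test, ClimateController(FakeBus()) lets the HMI logic be exercised with no CAN hardware attached. On the target, a bus object whose send method writes to_socketcan_bytes(frame) to a raw CAN socket would take the place of FakeBus, leaving the HMI-facing code unchanged.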

The Outcome

100% voice coverage across all four domains — navigation, media, climate, and phone — with the OEM test suite passing on the production hardware configuration
Sub-400ms end-to-end response from wake word to action confirmation, measured on the production SoC under thermal load
OEM integration testing passed on first submission — no rework required after the final hardware abstraction layer revision
Touch interaction design certified under the OEM’s driver attention requirements — all primary functions reachable in ≤2 taps from the home screen

Team

Engagement: 8 months, 4 engineers (1 embedded systems, 1 NLP/ML, 1 HMI/Qt, 1 integration/testing).

Stack: C++17, Qt/QML, Python (model training and quantization), ONNX Runtime (ARM), Porcupine (wake-word), SocketCAN, Yocto Linux (BSP layer), Jenkins (CI with hardware-in-the-loop targets)