As an engineer who relies heavily on the Mijia (Xiaomi Smart Home) ecosystem, I need a hub gateway with infrared support in my room. Weighing ecosystem integration against cost, I bought the Xiaomi Smart Speaker Pro for that role. Using it only as a smart home hub, however, wastes its audio hardware. The speaker does not natively support AirPlay, a protocol I depend on daily, so I started this hardcore modification project to get high-quality streaming audio playback out of it.
After some hardware selection and low-level tinkering, I ultimately utilized an idle Cudy TR3000 (MT7981) router, running OpenWrt / ImmortalWrt, to successfully build a zero-cost, low-latency AirPlay 2 receiver. This article documents the architectural thoughts during hardware selection and the various pitfalls I encountered in the Linux ALSA audio layer and the network multicast layer.

1. Motivation: Why AirPlay?
Many people might ask, isn’t direct Bluetooth connection enough? After extensive use, the experiential flaws of Bluetooth audio become glaringly obvious:
- The Disaster of Mixed Notification Sounds: Bluetooth is essentially a system-wide audio output. When you are enjoying immersive music, a sudden WeChat notification or scrolling through TikTok / Reels will rudely interrupt the music and blast through the speaker, causing a terrible experience.
- Cumbersome Device Switching: Repeatedly pairing and switching the Bluetooth speaker among a MacBook, iPhone, and iPad is an enduring and inelegant pain point.
- Audio Compression: Conventional Bluetooth codecs (SBC/AAC) are lossy, so the audio is re-encoded before transmission and loses a non-negligible amount of quality.
The advantage of AirPlay is that it is a streaming protocol running over the IP network (TCP/UDP) rather than a system-wide audio output. It completely separates media streams from system notification sounds (music plays from the speaker, while notifications only ring on the phone). As long as devices are on the same local network (discovered via mDNS), any Apple device can seamlessly switch and cast audio with a single click, all while natively supporting ALAC lossless transmission.
2. Hardware Selection: Why the Xiaomi Smart Speaker Pro?
The market is flooded with passive and active speakers, but I ultimately chose this Xiaomi speaker as the physical output terminal, primarily based on its potential as a “Smart Home Hub”:
- Native Support for USB-C Digital Audio Input: This is the physical prerequisite for tinkering with Linux USB Audio (UAC).
- Mijia (Xiaomi) Ecosystem Hub: It features a built-in Bluetooth Mesh gateway, easily connecting various temperature/humidity sensors and smart switches in the house.
- Infrared Remote Core: Equipped with an infrared blaster, it can directly integrate the room’s old, non-smart air conditioner into the Mijia system for control.
It forms a perfect Smart Home Node, with its only flaw being the lack of native AirPlay support. We need an “external brain” to help it receive and decode the AirPlay audio stream.
3. Solution Research: Choosing the External Brain
During the initial selection phase, I intended to use the classic AirConnect solution. Its principle is to discover DLNA speakers on the LAN, encapsulate them, and broadcast them as AirPlay devices. However, after purchasing, I realized that the new Xiaomi Smart Speakers have completely stripped out the DLNA protocol. With this path blocked, directly receiving the AirPlay protocol and outputting it via USB Audio to the speaker became the only viable breakthrough.
To achieve this goal, I reviewed several mainstream, low-cost hardware solutions:
- Old Android Phone + AeroPlay App:
  - Pros: Zero learning curve, mature App ecosystem (some support UAPP to bypass Android’s SRC).
  - Cons: Running permanently plugged in as a server poses a serious battery swelling/fire risk; even capping the charge level after rooting is not an elegant fix.
- Raspberry Pi Zero 2 W + Shairport Sync:
  - Pros: The “standard answer” in the Linux audio circle, with abundant documentation.
  - Cons: Requires an additional purchase, supply is currently unstable, and it adds one more device to the system.
- ESP32-S3 DIY Modification:
  - Pros: A geek’s favorite, extremely low cost, with the S3 natively supporting USB Host.
  - Cons: Too hardcore. Most community AirPlay receiver libraries only support AirPlay 1, and using TinyUSB to drive specific USB-C speakers has extremely poor UAC compatibility, requiring extensive low-level C debugging.
- Idle OpenWrt Router (The Winner):
  - Hardware: An idle Cudy TR3000 (equipped with the MT7981 SoC, significantly overpowered for this task).
  - Pros: It needs continuous power anyway, so running one extra background daemon costs nothing; the Linux (ImmortalWrt) system comes with a complete ALSA architecture and USB drivers, and its network processing capabilities far outclass the ESP32.
4. Troubleshooting Record: The Audio Journey on ImmortalWrt
I originally thought that checking a few packages in make menuconfig on the router would suffice, but I ended up running into a chain reaction of pitfalls, spanning from low-level drivers to network architectures.
Pitfall 1: The “Illusion” of Package Names and Dependencies
While compiling ImmortalWrt, finding the USB audio drivers took some time.
- Misconception: I assumed USB device drivers were all under Kernel modules > USB Support.
- Solution: The UAC driver actually lives under Sound Support. You must select kmod-usb-audio (the core driver) and kmod-usb3 (xHCI host controller support for the MT7981’s USB port). On the server side, simply choose shairport-sync-openssl, along with alsa-utils for troubleshooting.
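Because the selections are spread across different menuconfig menus, it is easy to miss one. The following is a minimal sketch of a sanity check on the build's .config file, assuming each package appears as a CONFIG_PACKAGE_<name>=y line when it is built into the image. The function name check_airplay_config is a hypothetical helper, not part of OpenWrt.

```shell
#!/bin/sh
# Hypothetical helper: report which of the packages named in this section
# are selected (=y) in an OpenWrt/ImmortalWrt build .config file.
check_airplay_config() {
    config_file="$1"
    missing=0
    for pkg in kmod-usb-audio kmod-usb3 shairport-sync-openssl alsa-utils; do
        if grep -q "^CONFIG_PACKAGE_${pkg}=y" "$config_file"; then
            echo "ok: $pkg"
        else
            echo "missing: $pkg"
            missing=1
        fi
    done
    # Non-zero exit status when at least one package is not selected
    return "$missing"
}
```

From the buildroot directory this could be run as check_airplay_config .config before starting a long compile.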
Open Source Assets: The source code and compilation configuration for the entire project have been open-sourced on my GitHub: t0saki/openwrt-personal. If you happen to own a Cudy TR3000 (v1), you can directly download my compiled AirPlay 2 Dedicated Release Firmware to skip the hassle of compiling.
Pitfall 2: Process Crash and TCP Port 7000 Conflict
After flashing the firmware, aplay -l recognized the hardware perfectly. However, starting shairport-sync manually made it exit immediately with a fatal error.
- Debug: Checking the logs via shairport-sync -vvv showed the error: unable to listen on IPv4 port 7000. The error is: "Address in use".
- Root Cause: AirPlay 2 heavily relies on TCP port 7000 for the RTSP handshake. Meanwhile, the frps (internal network penetration server) running on my router also binds to port 7000 by default.
- Fix: Modified frps.ini to bind to a different port, freeing up port 7000.
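To confirm which process is holding a port before editing any configuration, the listener table can be filtered. The sketch below assumes the netstat on the image supports the -p flag and prints the usual seven-column layout; port_owner is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tlnp` style output on stdin and print
# the PID/program column of every TCP listener bound to the given port.
port_owner() {
    port="$1"
    awk -v p=":$port" '
        # $4 is the local address (e.g. 0.0.0.0:7000), $6 the state,
        # $7 the PID/program name. Match the port only at the end of $4.
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (p "$") { print $7 }
    '
}

# Usage on the router (not executed here):
#   netstat -tlnp | port_owner 7000
```

In the legacy INI format of frp, the listening port is set by bind_port in the [common] section of frps.ini, so moving it to any free port (the exact number is up to you) releases port 7000 for shairport-sync.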
Pitfall 3: OpenWrt’s Phantom UID/GID Bug (Avahi Daemon Crash)
After freeing the port, the iPhone still couldn’t detect the device. Checking the logs revealed that shairport-sync couldn’t find the mDNS backend, whilst the underlying avahi-daemon was in a frantic Crash Loop.
- Root Cause: After ImmortalWrt introduced the apk package manager, the postinst scripts for some packages had a bug. When the system installed dbus and avahi, it failed to automatically create the corresponding system users and user groups. Both daemons are designed to drop root privileges to their dedicated user, so with that user missing they refused to start and crashed.
- Fix: Directly modify /etc/passwd and /etc/group to manually inject the users and grant socket permissions:
    echo "dbus:x:81:" >> /etc/group
    echo "dbus:x:81:81:dbus:/var/run/dbus:/bin/false" >> /etc/passwd
    echo "avahi:x:84:" >> /etc/group
    echo "avahi:x:84:84:avahi:/var/run/avahi-daemon:/bin/false" >> /etc/passwd
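Plain appends like the echo commands above add a duplicate line every time they are re-run (for example from a first-boot script). A slightly more defensive sketch, using the same user and group entries, only appends when no entry with that name exists yet. The function names ensure_entry and add_daemon_users and the optional root-prefix argument are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append a passwd/group style entry only if no line
# for that name exists in the file yet.
ensure_entry() {
    file="$1"
    entry="$2"
    name="${entry%%:*}"    # text before the first colon
    if ! grep -q "^${name}:" "$file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$file"
    fi
}

# Adds the dbus and avahi users/groups from this section.
# $1 is an optional root prefix (e.g. a staging directory); when empty,
# the live /etc/passwd and /etc/group are modified.
add_daemon_users() {
    root="${1:-}"
    ensure_entry "$root/etc/group"  "dbus:x:81:"
    ensure_entry "$root/etc/passwd" "dbus:x:81:81:dbus:/var/run/dbus:/bin/false"
    ensure_entry "$root/etc/group"  "avahi:x:84:"
    ensure_entry "$root/etc/passwd" "avahi:x:84:84:avahi:/var/run/avahi-daemon:/bin/false"
}
```

Calling add_daemon_users with no argument on the router has the same effect as the four echo lines, but is safe to run more than once.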
Pitfall 4: Progress Bar Moving but No Sound, The Paradox of ALSA Passthrough vs. Resampling
The phone finally found the device and connected successfully. The playback progress bar was moving, but the speaker remained dead silent.
- Root Cause: Initially, I specified the output device as hw:0,0 in /etc/shairport-sync.conf. In the ALSA architecture, hw means direct, bit-perfect access to the hardware with no format conversion. The AirPlay stream is strictly 44.1kHz / 16-bit audio, but the Xiaomi speaker’s built-in, cheap DAC interface only accepts 48kHz input. The hardware parameter negotiation failed, and the audio frames were silently dropped.
- Fix: Changed the device from hw:0,0 to plughw:0,0. The plughw plugin automatically handles resampling (Sample Rate Conversion), dynamically converting the 44.1kHz stream to a DAC-compatible 48kHz. After restarting the service, the sound finally blasted out!
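The choice between hw and plughw can be made before any silent playback by looking at the rates the USB audio driver reports. The sketch below assumes the card exposes a /proc/asound/cardN/stream0 file with a Playback section containing "Rates:" lines listing discrete values (a continuous range such as "8000 - 48000" is not handled). The function names playback_rates and pick_alsa_device are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "Rates:" values listed under the Playback
# section of a /proc/asound/cardN/stream0 style file.
playback_rates() {
    awk '
        /^Playback:/ { in_playback = 1; next }
        /^Capture:/  { in_playback = 0 }
        in_playback && /Rates:/ { sub(/.*Rates:[ \t]*/, ""); print }
    ' "$1"
}

# Hypothetical helper: suggest an ALSA device name for a 44.1kHz source.
# Direct hw access only works if the DAC lists 44100 itself; otherwise
# fall back to plughw so ALSA resamples.
pick_alsa_device() {
    stream_file="$1"
    card="${2:-0}"
    if playback_rates "$stream_file" | grep -qw 44100; then
        echo "hw:${card},0"
    else
        echo "plughw:${card},0"
    fi
}

# Usage on the router (not executed here):
#   pick_alsa_device /proc/asound/card0/stream0 0
```

The printed name is what goes into the output_device setting of the alsa section in /etc/shairport-sync.conf.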
Pitfall 5: mDNS and Multicast Blackhole caused by L3 NAT
Everything worked perfectly when tested under the TR3000’s own subnet. However, under the main router (parent network), the device “lost connection” again, or spun indefinitely when clicked.
- Root Cause: The TR3000 was connected to the main router as a Wireless Client, meaning NAT isolation was present.
  - mDNS is confined to its subnet: Under the default configuration, the parent network cannot receive the TR3000’s _raop._tcp announcements.
  - PTP Clock Synchronization Failure: AirPlay 2 strictly relies on UDP ports 319/320 and the 224.0.1.129 multicast address for clock synchronization. By default, the firewall drops multicast packets from the WAN zone.
- Fix: Rather than writing complex firewall rules and Avahi reflector settings, it was better to wage a “dimensionality reduction attack” at the network architecture level. I modified the firewall zones in LuCI, moving wwan (the uplink interface) into the lan zone (the green one), achieving L3 flattening (bypassing NAT altogether). I also constrained Avahi to advertise only on phy1-sta0. The network barrier was completely broken, and connections were instantaneous.
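The same two changes can be made from the shell instead of LuCI. This is a rough sketch under several assumptions: the firewall uses the default layout where zone[0] is lan and zone[1] is wan, the uplink network is named wwan as above, and Avahi reads /etc/avahi/avahi-daemon.conf. The function names move_wwan_to_lan and set_avahi_interface are hypothetical. Check the zone indices with uci show firewall before running anything like this.

```shell
#!/bin/sh
# Assumption: zone[0] is lan and zone[1] is wan (default OpenWrt layout).
# Moves the wwan network out of the wan zone and into the lan zone.
move_wwan_to_lan() {
    uci del_list "firewall.@zone[1].network=wwan"
    uci add_list "firewall.@zone[0].network=wwan"
    uci commit firewall
    /etc/init.d/firewall restart
}

# Hypothetical helper: make avahi-daemon use a single interface by writing
# an allow-interfaces line directly under the [server] section header.
# Any existing (or commented-out) allow-interfaces line is dropped first,
# so the function can be re-run safely.
set_avahi_interface() {
    conf="$1"
    iface="$2"
    tmp="${conf}.tmp"
    awk -v iface="$iface" '
        /^#?allow-interfaces=/ { next }
        { print }
        /^\[server\]/ { print "allow-interfaces=" iface }
    ' "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Usage on the router (not executed here):
#   move_wwan_to_lan
#   set_avahi_interface /etc/avahi/avahi-daemon.conf phy1-sta0
#   /etc/init.d/avahi-daemon restart
```

Treating the uplink as a trusted zone is only reasonable because the parent network here is my own home LAN; on an untrusted uplink, targeted firewall rules would be the safer route.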
5. Summary
This solution ultimately met all my expectations:
- Zero Additional Hardware Cost: Fully maximized the residual value of the idle router.
- Extremely High Stability: The MT7981’s computing power and the Linux network stack easily handled AirPlay 2’s Jitter Buffer over WiFi.
- Seamless Ecosystem Integration: The speaker remains the Mijia/Infrared hub, while now boasting Apple’s native streaming cast experience.
As an engineer, the joy derived from tinkering often outweighs the final result itself. Navigating from kernel drivers and the ALSA architecture to network-layer mDNS and multicast packet capturing, the sense of accomplishment in solving this cascade of problems is something that simply buying a HomePod cannot replace. If you happen to have a dusty OpenWrt router and a USB audio-capable speaker, I strongly recommend trying this out.
Of course, if you own a development board like a Raspberry Pi, many of the specific OpenWrt hurdles mentioned above (such as firewalls, NAT isolation, and package manager bugs) can be easily circumvented. However, the software stack configurations regarding the ALSA audio layer and the network layer in this guide remain highly relevant.