COCO
A cute, interactive AI companion robot in Phase 3 development — embedded systems (ESP32), voice interaction, and cloud AI (OpenAI + Next.js) for learning, emotional regulation, and stress relief.
My Role
Embedded Systems, Backend & Frontend
Core Stack
ESP32, Arduino, Next.js, TypeScript, OpenAI
Timeline
2025.11 — Present
Live Project
—
The Challenge
The core problem
Students and young users need emotional support and learning help, but existing AI interfaces feel cold and screen-only. There was no single system that combined physical presence, voice, and cloud AI in one companion.
Product vision
A warm, interactive AI companion: physical robot (ESP32, sensors, servos, OLED expressions), voice-based interaction, and cloud AI for learning, emotional regulation, and stress relief.
What I owned
Embedded systems, backend, and frontend: ESP32 firmware (I2S audio, Wi‑Fi, HTTP, OLED, camera) coordinating 9+ hardware components (mic, speaker, ultrasonic sensor, servos, 4WD drive, power); a Next.js backend with OpenAI (GPT + Whisper) for speech recognition and responses; a REST API between device and cloud; and real-time audio pipelines. Plus protocol design, power management, and keeping voice response latency under 2s.
Results
Phase 3 prototype: physical presence, voice interaction, and cloud AI in one device. Sub-2s voice response; coordinated hardware and software stack.
System Architecture
ESP32 coordinates sensors, actuators, and display; Next.js cloud handles AI and conversation. Client–server with Wi‑Fi, REST, and real-time audio.
Hardware Components
- ESP32 USB-C dev board as central MCU; ESP32-CAM (OV2640) for facial expression recognition.
- INMP441 I2S mic, PAM8403 amp + speaker; 0.96" OLED (I2C) for expressions and status.
- HC-SR04P ultrasonic proximity; SG-90 servos ×2; 4WD chassis + L298N driver; power management.
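To make the wiring concrete, here is a minimal pin-map sketch for the components above. Every GPIO number and constant name is a hypothetical placeholder, not the board's actual wiring.

```cpp
// pins.h: hypothetical ESP32 pin map for the components listed above.
// All GPIO numbers are illustrative placeholders, not the real wiring.

// INMP441 I2S microphone
constexpr int PIN_I2S_WS  = 25;   // word select (L/R clock)
constexpr int PIN_I2S_SCK = 26;   // bit clock
constexpr int PIN_I2S_SD  = 33;   // serial data in

// 0.96" OLED over I2C
constexpr int PIN_I2C_SDA = 21;   // ESP32 default SDA
constexpr int PIN_I2C_SCL = 22;   // ESP32 default SCL

// HC-SR04P ultrasonic
constexpr int PIN_US_TRIG = 5;
constexpr int PIN_US_ECHO = 18;

// SG-90 servos (PWM) and L298N motor driver (4WD)
constexpr int PIN_SERVO_L   = 13;
constexpr int PIN_SERVO_R   = 12;
constexpr int PIN_MOTOR_IN1 = 14;
constexpr int PIN_MOTOR_IN2 = 27;
```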
Software Stack
- ESP32 firmware (Arduino): I2S, Wi-Fi, HTTP, OLED driver, camera libs.
- Next.js 15+ with TypeScript; OpenAI GPT + Whisper for voice and responses.
- RESTful API for ESP32 ↔ backend; audio streaming and conversation history (device-side sketch below).
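The device side of that REST exchange might look like the sketch below, using the ESP32 Arduino core's HTTPClient. The /api/voice endpoint, host, and response format are assumptions for illustration, not the project's actual API.

```cpp
#include <WiFi.h>
#include <HTTPClient.h>

// Hypothetical endpoint; the real route and payload format may differ.
const char* VOICE_ENDPOINT = "http://coco-backend.example.com/api/voice";

// POST a recorded audio clip to the backend and return the reply text.
// `audio` is raw PCM captured from the I2S mic; the backend is assumed
// to run Whisper + GPT and answer with plain text.
String sendVoiceClip(uint8_t* audio, size_t len) {
  if (WiFi.status() != WL_CONNECTED) return "";

  HTTPClient http;
  http.begin(VOICE_ENDPOINT);
  http.addHeader("Content-Type", "application/octet-stream");

  int status = http.POST(audio, len);            // blocks until reply
  String reply = (status == 200) ? http.getString() : "";
  http.end();
  return reply;
}
```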
Communication & Flows
- I2S (audio), I2C (display), Wi-Fi 802.11; HTTP/REST, PWM (servos), GPIO.
- Voice flow: mic → ESP32 → Next.js → OpenAI → ESP32 → audio + OLED + servo (HTTP leg sketched above).
- Proximity flow: HC-SR04P detect → wake-up expression → listening mode (sketched after this list).
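The proximity flow, sketched under the same assumptions (pin names from the hypothetical pin map above; showWakeExpression() and startListening() are stand-in helpers):

```cpp
const float WAKE_DISTANCE_CM = 40.0;  // assumed wake-up range

// Measure distance with the HC-SR04P: 10 µs trigger pulse, time the echo.
float readDistanceCm() {
  digitalWrite(PIN_US_TRIG, LOW);
  delayMicroseconds(2);
  digitalWrite(PIN_US_TRIG, HIGH);
  delayMicroseconds(10);
  digitalWrite(PIN_US_TRIG, LOW);

  long us = pulseIn(PIN_US_ECHO, HIGH, 30000);  // 30 ms timeout
  if (us == 0) return -1.0;                     // no echo received
  return us / 58.0;                             // µs to cm
}

void loop() {
  float d = readDistanceCm();
  if (d > 0 && d < WAKE_DISTANCE_CM) {
    showWakeExpression();  // stand-in: draw wake-up face on the OLED
    startListening();      // stand-in: begin I2S capture
  }
  delay(100);              // poll at roughly 10 Hz
}
```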

COCO Live
The real-time emotional interface where users talk to COCO and see its live state (listening, thinking, speaking), making the robot feel present, responsive, and alive through voice interaction and synced expressions.
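One plausible way to model those live states on the firmware side is a small state machine that drives the OLED and reports to the web UI; drawExpression() and reportStateToBackend() are hypothetical helpers, not the project's actual code:

```cpp
// Live interaction states mirrored between firmware and the COCO Live UI.
enum class CocoState { Idle, Listening, Thinking, Speaking };

CocoState state = CocoState::Idle;

void setState(CocoState next) {
  state = next;
  switch (state) {
    case CocoState::Listening: drawExpression("ears_up");    break;
    case CocoState::Thinking:  drawExpression("eyes_roll");  break;
    case CocoState::Speaking:  drawExpression("mouth_open"); break;
    default:                   drawExpression("blink");      break;
  }
  reportStateToBackend(state);  // hypothetical: keep the web UI in sync
}
```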

Insights & Control
Usage insights, conversation history, and device controls that let users understand and manage COCO’s behavior, adding transparency and system-level control behind the companion experience.

System structure
ESP32-based hardware layout: mic, speaker, OLED, sensors, servos, and 4WD. Central to voice I/O, expression feedback, and physical interaction.
Project impact
Phase 3 full-stack integration: hardware, firmware, backend, frontend, and UI/UX. Companion-focused design with reliable Wi-Fi and automatic reconnection (see the sketch after the metrics below).
<2s
Voice-to-AI latency
9+
Hardware components
99%+
Wi-Fi reliability
10+
OLED expression patterns
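As a sketch of the automatic reconnection mentioned under Project impact (credentials are placeholders and the retry policy is an assumption, not the shipped logic):

```cpp
#include <WiFi.h>

const char* WIFI_SSID = "your-ssid";      // placeholder credentials
const char* WIFI_PASS = "your-password";

// Reconnect with a capped exponential backoff; call from the main loop.
void ensureWifi() {
  static uint32_t backoffMs = 500;
  if (WiFi.status() == WL_CONNECTED) {
    backoffMs = 500;                      // reset once the link is healthy
    return;
  }
  WiFi.disconnect();
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  delay(backoffMs);
  backoffMs = min(backoffMs * 2, (uint32_t)8000);  // cap retries at 8 s
}
```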
Reflection
Bridging embedded protocols (I2S, I2C, PWM) with cloud AI while keeping latency under 2s was a great learning experience. Next, I’d add offline fallbacks for when Wi-Fi drops and more rigorous power profiling for battery use.