COCO
A cute, interactive AI companion robot in Phase 3 development — embedded systems (ESP32), voice interaction, and cloud AI (OpenAI + Next.js) for learning, emotional regulation, and stress relief.
My Role
Embedded Systems, Backend & Frontend
Core Stack
ESP32, Arduino, Next.js, TypeScript, OpenAI
Timeline
2025.11 — Present
Live Project
—
The Challenge
The core problem
Students and young users need emotional support and learning help, but existing AI interfaces feel cold and screen-only. There was no single system that combined physical presence, voice, and cloud AI in one companion.
Product vision
A warm, interactive AI companion: physical robot (ESP32, sensors, servos, OLED expressions), voice-based interaction, and cloud AI for learning, emotional regulation, and stress relief.
What I owned
Embedded systems, backend, and frontend: ESP32 firmware (I2S audio, Wi‑Fi, HTTP, OLED, camera) coordinating 9+ hardware components (mic, speaker, ultrasonic sensor, servos, 4WD drive, power); a Next.js backend with OpenAI (GPT + Whisper) for speech recognition and responses; a REST API between device and cloud; and real-time audio pipelines. Plus protocol design, power management, and keeping voice response latency under 2s.
Results
Phase 3 prototype: physical presence, voice interaction, and cloud AI in one device. Sub-2s voice response; coordinated hardware and software stack.
System Architecture
ESP32 coordinates sensors, actuators, and display; Next.js cloud handles AI and conversation. Client–server with Wi‑Fi, REST, and real-time audio.
Hardware Components
- ESP32 USB-C dev board as central MCU; ESP32-CAM (OV2640) for facial expression recognition.
- INMP441 I2S mic, PAM8403 amp + speaker; 0.96" OLED (I2C) for expressions and status.
- HC-SR04P ultrasonic proximity; SG-90 servos ×2; 4WD chassis + L298N driver; power management.
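To make the wiring concrete, here is a minimal pin-map sketch for the components above. Every GPIO number and constant name is a hypothetical placeholder, not the board's actual wiring.

```cpp
// pins.h: hypothetical ESP32 pin map for the components listed above.
// All GPIO numbers are illustrative placeholders, not the real wiring.

// INMP441 I2S microphone
constexpr int PIN_I2S_WS  = 25;   // word select (L/R clock)
constexpr int PIN_I2S_SCK = 26;   // bit clock
constexpr int PIN_I2S_SD  = 33;   // serial data in

// 0.96" OLED over I2C
constexpr int PIN_I2C_SDA = 21;   // ESP32 default SDA
constexpr int PIN_I2C_SCL = 22;   // ESP32 default SCL

// HC-SR04P ultrasonic
constexpr int PIN_US_TRIG = 5;
constexpr int PIN_US_ECHO = 18;

// SG-90 servos (PWM) and L298N motor driver (4WD)
constexpr int PIN_SERVO_L   = 13;
constexpr int PIN_SERVO_R   = 12;
constexpr int PIN_MOTOR_IN1 = 14;
constexpr int PIN_MOTOR_IN2 = 27;
```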
Software Stack
- ESP32 firmware (Arduino): I2S, Wi-Fi, HTTP, OLED driver, camera libs.
- Next.js 15+ with TypeScript; OpenAI GPT + Whisper for voice and responses.
- RESTful API for ESP32 ↔ backend; audio streaming and conversation history (device-side sketch below).
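The device side of that REST exchange might look like the sketch below, using the ESP32 Arduino core's HTTPClient. The /api/voice endpoint, host, and response format are assumptions for illustration, not the project's actual API.

```cpp
#include <WiFi.h>
#include <HTTPClient.h>

// Hypothetical endpoint; the real route and payload format may differ.
const char* VOICE_ENDPOINT = "http://coco-backend.example.com/api/voice";

// POST a recorded audio clip to the backend and return the reply text.
// `audio` is raw PCM captured from the I2S mic; the backend is assumed
// to run Whisper + GPT and answer with plain text.
String sendVoiceClip(uint8_t* audio, size_t len) {
  if (WiFi.status() != WL_CONNECTED) return "";

  HTTPClient http;
  http.begin(VOICE_ENDPOINT);
  http.addHeader("Content-Type", "application/octet-stream");

  int status = http.POST(audio, len);            // blocks until reply
  String reply = (status == 200) ? http.getString() : "";
  http.end();
  return reply;
}
```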
Communication & Flows
- I2S (audio), I2C (display), Wi-Fi 802.11; HTTP/REST, PWM (servos), GPIO.
- Voice flow: mic → ESP32 → Next.js → OpenAI → ESP32 → audio + OLED + servo (HTTP leg sketched above).
- Proximity flow: HC-SR04P detect → wake-up expression → listening mode (sketched after this list).
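The proximity flow, sketched under the same assumptions (pin names from the hypothetical pin map above; showWakeExpression() and startListening() are stand-in helpers):

```cpp
const float WAKE_DISTANCE_CM = 40.0;  // assumed wake-up range

// Measure distance with the HC-SR04P: 10 µs trigger pulse, time the echo.
float readDistanceCm() {
  digitalWrite(PIN_US_TRIG, LOW);
  delayMicroseconds(2);
  digitalWrite(PIN_US_TRIG, HIGH);
  delayMicroseconds(10);
  digitalWrite(PIN_US_TRIG, LOW);

  long us = pulseIn(PIN_US_ECHO, HIGH, 30000);  // 30 ms timeout
  if (us == 0) return -1.0;                     // no echo received
  return us / 58.0;                             // µs to cm
}

void loop() {
  float d = readDistanceCm();
  if (d > 0 && d < WAKE_DISTANCE_CM) {
    showWakeExpression();  // stand-in: draw wake-up face on the OLED
    startListening();      // stand-in: begin I2S capture
  }
  delay(100);              // poll at roughly 10 Hz
}
```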

COCO Live
The real-time emotional interface where users talk to COCO and see its live state (listening, thinking, speaking), making the robot feel present, responsive, and alive through voice interaction and synced expressions.
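One plausible way to model those live states on the firmware side is a small state machine that drives the OLED and reports to the web UI; drawExpression() and reportStateToBackend() are hypothetical helpers, not the project's actual code:

```cpp
// Live interaction states mirrored between firmware and the COCO Live UI.
enum class CocoState { Idle, Listening, Thinking, Speaking };

CocoState state = CocoState::Idle;

void setState(CocoState next) {
  state = next;
  switch (state) {
    case CocoState::Listening: drawExpression("ears_up");    break;
    case CocoState::Thinking:  drawExpression("eyes_roll");  break;
    case CocoState::Speaking:  drawExpression("mouth_open"); break;
    default:                   drawExpression("blink");      break;
  }
  reportStateToBackend(state);  // hypothetical: keep the web UI in sync
}
```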

Insights & Control
Usage insights, conversation history, and device controls that let users understand and manage COCO’s behavior, adding transparency and system-level control behind the companion experience.

System structure
ESP32-based hardware layout: mic, speaker, OLED, sensors, servos, and 4WD. Central to voice I/O, expression feedback, and physical interaction.
Project impact
Phase 3 full-stack integration: hardware, firmware, backend, frontend, and UI/UX. Companion-focused design with reliable Wi-Fi and automatic reconnection (see the sketch after the metrics below).
<2s
Voice-to-AI latency
9+
Hardware components
99%+
Wi-Fi reliability
10+
OLED expression patterns
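As a sketch of the automatic reconnection mentioned under Project impact (credentials are placeholders and the retry policy is an assumption, not the shipped logic):

```cpp
#include <WiFi.h>

const char* WIFI_SSID = "your-ssid";      // placeholder credentials
const char* WIFI_PASS = "your-password";

// Reconnect with a capped exponential backoff; call from the main loop.
void ensureWifi() {
  static uint32_t backoffMs = 500;
  if (WiFi.status() == WL_CONNECTED) {
    backoffMs = 500;                      // reset once the link is healthy
    return;
  }
  WiFi.disconnect();
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  delay(backoffMs);
  backoffMs = min(backoffMs * 2, (uint32_t)8000);  // cap retries at 8 s
}
```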
Reflection
Bridging embedded protocols (I2S, I2C, PWM) with cloud AI while keeping latency under 2s was a great learning experience. Next, I’d add offline fallbacks for when Wi-Fi drops and more rigorous power profiling for battery use.