TAS Lab Develops IoT Cam — A Browser-Based AI-Powered Webcam Viewer with Real-Time Face and Object Detection

Date: February 18, 2026

HONG KONG — February 18, 2026 — The Trustworthy AI and Autonomous Systems Laboratory (TAS LAB) at The Hong Kong Polytechnic University has developed IoT Cam, a modern browser-based webcam application that integrates real-time AI-powered face detection, face recognition, and object detection capabilities — all running entirely in the browser with zero server-side processing.

Abstract

The rapid proliferation of Internet of Things (IoT) devices and edge computing has created new opportunities for deploying artificial intelligence (AI) capabilities directly on end-user devices without reliance on cloud-based infrastructure. This project presents IoT Cam, a browser-based intelligent webcam application that integrates real-time face detection, face recognition, object detection, and facial attribute analysis, all executed entirely on-device through modern web technologies. Unlike conventional AI-powered surveillance systems that require dedicated GPU servers or cloud API calls, IoT Cam uses TensorFlow.js and client-side inference to deliver a fully functional computer vision pipeline within a standard web browser. Because no frame leaves the device, processing incurs no network round-trip latency and all captured data stays on the user's machine.

The system architecture comprises three core AI modules. First, a face detection and analysis module built on the @vladmandic/face-api.js library employs a TinyFaceDetector for real-time face localization, a 68-point facial landmark model for geometric feature extraction, and dedicated neural networks for expression classification (seven categories), age regression, and gender estimation. Second, an in-browser face recognition module lets users register known individuals by capturing their face descriptors (128-dimensional feature vectors). Faces seen later are matched against this locally stored database by Euclidean distance, with a configurable similarity threshold to balance precision and recall. Third, a COCO-SSD object detection module based on MobileNet v2 provides real-time classification across 80 object categories with adjustable confidence thresholds and YOLO-style visual overlays.
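
The descriptor matching step in the second module can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the project's code: the function names, the shape of the registered-face records, and the 0.6 default distance threshold are all assumptions made for the example.

```javascript
// Illustrative sketch only. Function names, record shape ({ label, descriptor })
// and the 0.6 default threshold are assumptions, not taken from IoT Cam.

/** Euclidean distance between two equal-length descriptors. */
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Descriptors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

/**
 * Find the registered face closest to the query descriptor.
 * Returns label "unknown" when the closest entry is farther than maxDistance.
 */
function findBestMatch(query, registered, maxDistance = 0.6) {
  let best = { label: "unknown", distance: Infinity };
  for (const { label, descriptor } of registered) {
    const distance = euclideanDistance(query, descriptor);
    if (distance < best.distance) {
      best = { label, distance };
    }
  }
  return best.distance <= maxDistance
    ? best
    : { label: "unknown", distance: best.distance };
}
```

Lowering `maxDistance` makes matching stricter (fewer false identifications, more faces reported as unknown); raising it does the opposite, which is the precision/recall trade-off described above.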

IoT Cam is implemented using pure HTML, CSS, and JavaScript — requiring no frameworks, build tools, or server-side components. The application utilizes the browser MediaDevices API for camera access and the MediaRecorder API for video recording, supporting resolutions from 480p to 4K. A comprehensive detection output dashboard provides session-level analytics including cumulative detection counts, average inference times, object class distributions, and facial attribute histograms, with CSV export functionality for offline analysis. Additional features include real-time image adjustment filters, visual presets, snapshot capture with a lightbox gallery, and keyboard shortcuts for efficient operation.
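
As an illustration of the CSV export, the sketch below serializes a list of detection log entries into CSV text. The column names and the record shape are hypothetical; the article does not specify the layout of the exported file.

```javascript
// Illustrative sketch only. The column names and log-entry shape are
// assumptions; the real export format of IoT Cam is not described here.

/** Quote a field when it contains a comma, double quote, or line break. */
function csvEscape(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

/** Turn an array of detection records into CSV text with a header row. */
function detectionsToCsv(rows, columns = ["timestamp", "type", "label", "confidence"]) {
  const header = columns.join(",");
  const lines = rows.map((row) =>
    columns.map((column) => csvEscape(row[column])).join(",")
  );
  return [header, ...lines].join("\r\n");
}
```

In a browser, the resulting string could be wrapped in a `Blob` and offered as a download through an object URL, which keeps the export entirely client-side.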

Experimental evaluation on consumer-grade hardware shows that the system renders video at a stable rate of approximately 48 frames per second, while each detection inference cycle takes approximately 500 milliseconds, so detection results refresh about twice per second. The privacy-by-design architecture ensures that no images, video frames, or biometric descriptors are transmitted to external servers, with all persistent data stored exclusively in the browser’s localStorage. IoT Cam serves as both a practical prototype for privacy-preserving IoT surveillance and an educational tool for courses in computer vision, embedded AI, and IoT systems at The Hong Kong Polytechnic University.
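
One common way to keep rendering smooth while inference is much slower is to gate inference separately from the render loop. The sketch below shows that idea under stated assumptions: the `createInferenceGate` helper is hypothetical, and the article does not say how IoT Cam schedules its detection cycles.

```javascript
// Illustrative sketch only. createInferenceGate is a hypothetical helper;
// IoT Cam's actual scheduling logic is not described in the article.

/**
 * Create a gate that allows at most one inference per intervalMs and
 * never starts a new inference while the previous one is still running.
 */
function createInferenceGate(intervalMs = 500) {
  let lastRun = -Infinity;
  let busy = false;
  return {
    /** Call once per rendered frame with the frame timestamp in ms. */
    shouldRun(nowMs) {
      if (busy || nowMs - lastRun < intervalMs) {
        return false;
      }
      lastRun = nowMs;
      busy = true;
      return true;
    },
    /** Call when the inference started by shouldRun() has finished. */
    done() {
      busy = false;
    },
  };
}

// Browser-side usage sketch (not executed here; detect() and drawFrame() are placeholders):
// const gate = createInferenceGate(500);
// function loop(now) {
//   drawFrame();
//   if (gate.shouldRun(now)) detect().finally(() => gate.done());
//   requestAnimationFrame(loop);
// }
// requestAnimationFrame(loop);
```

With frames arriving at 48 per second and a 500 ms interval, such a gate lets roughly one frame in 24 through to the detectors, and every other frame reuses the most recent overlay.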

Keywords: IoT, edge AI, face detection, face recognition, object detection, TensorFlow.js, browser-based inference, privacy-preserving AI, computer vision

System Overview

The figure below illustrates the system architecture of IoT Cam. The application runs entirely client-side within a standard web browser, receiving real-time video streams via the MediaDevices API. Input frames are processed by three core AI modules: (1) a Face Detection & Analysis Module powered by @vladmandic/face-api.js for localization, landmark extraction, expression classification, age regression, and gender estimation; (2) a Face Recognition Module that extracts 128-dimensional descriptors and performs Euclidean distance matching against a locally stored database; and (3) an Object Detection Module using COCO-SSD (MobileNet v2) for 80-category real-time classification. All detection results are rendered as real-time overlays and fed into an analytics dashboard with cumulative statistics, histograms, and CSV export. The entire pipeline adheres to a privacy-by-design principle — zero data is transmitted to cloud or external servers.

IoT Cam System Overview — Architecture diagram showing the three AI modules, input processing, visualization engine, and privacy-by-design approach
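
The cumulative statistics that feed the analytics dashboard can be pictured as a small accumulator updated after every inference cycle. The following is a sketch under assumptions: the field names, the `record` input shape, and the `createSessionStats` helper are invented for illustration.

```javascript
// Illustrative sketch only. Field names and the shape passed to record()
// are assumptions, not the project's actual data model.

function createSessionStats() {
  const stats = {
    framesAnalyzed: 0,
    totalFaces: 0,
    totalObjects: 0,
    totalInferenceMs: 0,
    objectClasses: {}, // class name -> count
    expressions: {},   // expression name -> count
  };

  const bump = (histogram, key) => {
    histogram[key] = (histogram[key] || 0) + 1;
  };

  return {
    /** Record the results of one inference cycle. */
    record({ inferenceMs, faces = [], objects = [] }) {
      stats.framesAnalyzed += 1;
      stats.totalInferenceMs += inferenceMs;
      stats.totalFaces += faces.length;
      stats.totalObjects += objects.length;
      for (const object of objects) bump(stats.objectClasses, object.class);
      for (const face of faces) bump(stats.expressions, face.expression);
    },
    /** Return a snapshot including the average inference time. */
    summary() {
      return {
        ...stats,
        objectClasses: { ...stats.objectClasses },
        expressions: { ...stats.expressions },
        averageInferenceMs:
          stats.framesAnalyzed > 0 ? stats.totalInferenceMs / stats.framesAnalyzed : 0,
      };
    },
  };
}
```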

Project Overview

IoT Cam is a pure HTML/CSS/JavaScript application that leverages cutting-edge browser APIs and on-device AI models to deliver a comprehensive webcam viewer with intelligent detection features. The system is designed for IoT and smart surveillance research, demonstrating how advanced AI capabilities can be deployed directly on edge devices through standard web technologies.

Key Features

Real-Time Face Detection & Recognition — Powered by @vladmandic/face-api.js (a maintained fork compatible with TensorFlow.js 3.x), the system detects faces in real time, identifies facial landmarks (68-point model), recognizes expressions (happy, sad, neutral, angry, surprised, etc.), and estimates age and gender.
Face Registration & Identification — Users can register known faces by name directly in the browser. The system then identifies registered individuals in real time, displaying match confidence and highlighting unregistered persons with visual warnings. All face descriptor data is stored locally in the browser — nothing is sent to any server.
Object Detection (COCO-SSD / YOLO-style) — Using TensorFlow.js and the COCO-SSD model (MobileNet v2), the application detects and classifies 80 object categories in real time, with adjustable confidence thresholds and color-coded bounding boxes with corner accents.
Detection Output Dashboard — A comprehensive analytics panel provides session statistics (total faces/objects detected, average inference time, frames analyzed), object class breakdowns, face detail breakdowns (gender distribution, average age, expression histogram), and a real-time scrolling detection log with export-to-CSV functionality.
Camera Controls & Image Processing — Supports multiple camera sources, resolution selection (480p to 4K), real-time image adjustments (brightness, contrast, saturation, hue), visual presets (Night Vision, Grayscale, Sepia, High Contrast, etc.), and mirror/flip transforms.
Snapshot & Video Recording — Take snapshots with a built-in lightbox gallery, or record video clips (WebM format) for later review.
Privacy by Design — All AI inference runs on-device using TensorFlow.js. No images, video frames, or face descriptors are ever transmitted to external servers. Face registration data persists only in the browser’s localStorage.
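
To make the local-only persistence concrete, the sketch below shows one way face descriptors could be written to and read from a `localStorage`-like object. The storage key, the record shape, and both function names are hypothetical. Descriptors are converted to plain arrays because typed arrays do not survive `JSON.stringify` as arrays.

```javascript
// Illustrative sketch only. The key name, record shape, and function names
// are assumptions; they are not taken from the IoT Cam source code.

const STORAGE_KEY = "iotcam.registeredFaces"; // hypothetical key

/** Save registered faces; `storage` is localStorage or any object with the same API. */
function saveRegisteredFaces(storage, faces) {
  const plain = faces.map(({ label, descriptor }) => ({
    label,
    descriptor: Array.from(descriptor), // Float32Array -> plain array for JSON
  }));
  storage.setItem(STORAGE_KEY, JSON.stringify(plain));
}

/** Load registered faces, restoring each descriptor as a Float32Array. */
function loadRegisteredFaces(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  if (!raw) {
    return [];
  }
  return JSON.parse(raw).map(({ label, descriptor }) => ({
    label,
    descriptor: new Float32Array(descriptor),
  }));
}
```

In a browser the first argument would be `window.localStorage`; passing the storage object in explicitly also makes the functions easy to exercise with an in-memory stand-in.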

Technical Stack

The application is built with pure web technologies — no frameworks or build tools required:

TensorFlow.js 3.x for on-device AI inference
COCO-SSD 2.2.3 for 80-class object detection
@vladmandic/face-api 1.7.14 for face detection, landmarks, expressions, age/gender, and recognition
MediaDevices API (getUserMedia) for camera access
MediaRecorder API for video recording
Modern CSS with responsive design and dark theme
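
As a rough illustration of how resolution selection maps onto the MediaDevices API, the sketch below builds a constraints object for `getUserMedia`. The preset table is an assumption: the article states only the 480p to 4K range, so the intermediate presets and the exact pixel sizes are illustrative, as is the `buildVideoConstraints` helper itself.

```javascript
// Illustrative sketch only. The preset table and helper name are assumptions;
// the article states only that resolutions range from 480p to 4K.

const RESOLUTIONS = {
  "480p": { width: 640, height: 480 },
  "720p": { width: 1280, height: 720 },
  "1080p": { width: 1920, height: 1080 },
  "4K": { width: 3840, height: 2160 },
};

/** Build a MediaStreamConstraints object for the chosen preset and camera. */
function buildVideoConstraints(resolution = "720p", deviceId) {
  const size = RESOLUTIONS[resolution];
  if (!size) {
    throw new Error(`Unknown resolution: ${resolution}`);
  }
  // "ideal" lets the browser fall back to the closest supported mode.
  const video = {
    width: { ideal: size.width },
    height: { ideal: size.height },
  };
  if (deviceId) {
    video.deviceId = { exact: deviceId };
  }
  return { video, audio: false };
}

// Browser-only usage (not executed here; videoElement is a placeholder):
// const stream = await navigator.mediaDevices.getUserMedia(buildVideoConstraints("1080p"));
// videoElement.srcObject = stream;
```

Using `ideal` rather than `exact` for the dimensions is a deliberate choice in this sketch: a camera that cannot deliver 4K still yields a working stream at its best available mode instead of a rejected request.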

Results

The screenshot below shows the IoT Cam system in action, demonstrating simultaneous person detection (81% confidence), face recognition (75% match for a registered user “weisong”), age/gender estimation (♂ 20y), and expression analysis (Neutral 100%), all running at 48 FPS in the browser.

IoT Cam — Real-time AI detection demo showing face recognition, object detection, and expression analysis

Significance

This project demonstrates the feasibility of deploying sophisticated AI perception pipelines entirely within a web browser, eliminating the need for dedicated GPU servers or cloud-based inference. It serves as a prototype for privacy-preserving IoT surveillance systems and a teaching tool for courses related to computer vision, IoT, and embedded AI at PolyU.

The IoT Cam project was developed at the TAS LAB under the supervision of Prof. Weisong Wen, supporting the lab’s mission of building trustworthy and accessible AI systems for autonomous applications.

🔗 Source Code: https://github.com/weisongwen/iotProjectCam

Dr. Weisong Wen