OPEN SOURCE

Pipeline & Tools

Open-source annotation pipeline for egocentric household activity data

How It Works

Raw GoPro Video
Quality Check → Report JSON
Hand Pose → Keypoints JSON
Segmentation → SAM2 Masks
Transcription → Narration JSON
Combined Full JSON

Tools

QUALITY CHECK

HomeHands-QC

Automated video quality assessment for egocentric GoPro footage. Checks resolution, FPS, hand detection rate, and blur score.

Input Raw .mp4 video
Output Quality report JSON
Model MediaPipe Hand Landmarker
pip install mediapipe opencv-python
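
The source does not say which blur metric HomeHands-QC uses. As a hedged illustration, the sketch below computes the variance of the Laplacian, a common sharpness heuristic, in plain Python over a small grayscale grid. The function name `laplacian_variance` is hypothetical, and a real implementation would more likely run on OpenCV image arrays.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Return the variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a list of equal-length rows of grayscale intensities.
    Higher values suggest a sharper frame; values near zero suggest blur.
    Assumption: this metric is illustrative, not confirmed as the tool's own.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# A flat frame has no edges; a checkerboard has strong ones.
flat = [[10] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
print(laplacian_variance(flat), laplacian_variance(checker))
```

A frame would then be flagged as blurry when its score falls below a threshold chosen for the footage; no threshold is given in the source.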

HAND TRACKING

HomeHands-Pose

21-point hand skeleton detection per frame. Detects the left and right hands and reports a confidence score and pixel coordinates for each.

Input Raw .mp4 video
Output hand_pose.json per clip
Model MediaPipe Hand Landmarker
pip install mediapipe
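
As a rough sketch of how the 21 keypoints might be turned into the `x`, `y`, `z`, `px`, `py` fields shown in the output format, the example below maps normalized landmark coordinates to pixel coordinates. It assumes `x` and `y` are normalized to the frame size, as MediaPipe reports them, and that `px`/`py` are the corresponding pixel positions. The function name `keypoints_to_pixels` and the clamping behavior are assumptions, not taken from the repository.

```python
# The 21 landmark names of MediaPipe's hand model, in index order.
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def keypoints_to_pixels(landmarks, width, height):
    """Map 21 normalized (x, y, z) landmarks to named keypoints with pixel coordinates."""
    if len(landmarks) != len(HAND_LANDMARKS):
        raise ValueError("expected 21 landmarks")
    keypoints = {}
    for name, (x, y, z) in zip(HAND_LANDMARKS, landmarks):
        # Clamp so landmarks predicted slightly outside the frame stay valid pixel indices.
        px = min(max(int(x * width), 0), width - 1)
        py = min(max(int(y * height), 0), height - 1)
        keypoints[name] = {"x": x, "y": y, "z": z, "px": px, "py": py}
    return keypoints


# Usage with made-up landmarks on a 1920x1080 frame.
example = keypoints_to_pixels([(0.5, 0.25, 0.0)] * 21, 1920, 1080)
print(example["WRIST"])
```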

SEGMENTATION

HomeHands-Seg

Pixel-level hand segmentation using SAM2. Wrist coordinates from hand_pose.json are used as point prompts. Right-hand masks are colored purple and left-hand masks red.

Input Video + hand_pose.json
Output Colored PNG frames per clip
Model SAM2 tiny (Meta AI)
pip install sam2
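
The prompt construction and mask bookkeeping described above can be sketched without loading the model. The example below builds a single positive point prompt from a wrist keypoint, colors a binary mask by hand label, and derives the `pixel_count` and `coverage_pct` fields from the output format. The helper names and the exact RGB values are assumptions (the source names the colors but not their shades), and the plain lists would need converting to arrays before being passed to a SAM2 predictor.

```python
# Assumed RGB values for the two hand colors named in the description.
HAND_COLORS = {"Right": (128, 0, 128), "Left": (255, 0, 0)}


def wrist_prompt(hand):
    """Build one positive point prompt from a hand's wrist pixel coordinates."""
    wrist = hand["keypoints"]["WRIST"]
    point_coords = [[wrist["px"], wrist["py"]]]
    point_labels = [1]  # 1 marks a foreground (positive) point
    return point_coords, point_labels


def mask_stats(mask):
    """Count foreground pixels in a binary mask (rows of 0/1) and their share of the frame."""
    total = sum(len(row) for row in mask)
    pixel_count = sum(1 for row in mask for value in row if value)
    coverage_pct = 100.0 * pixel_count / total if total else 0.0
    return {"method": "SAM2", "pixel_count": pixel_count,
            "coverage_pct": round(coverage_pct, 2)}


def colorize(mask, label):
    """Paint mask pixels with the hand's color on a black background."""
    color = HAND_COLORS[label]
    return [[color if value else (0, 0, 0) for value in row] for row in mask]


# Usage with a made-up hand record and a tiny 2x2 mask.
hand = {"label": "Right", "keypoints": {"WRIST": {"px": 10, "py": 20}}}
print(wrist_prompt(hand))
print(mask_stats([[0, 1], [1, 1]]))
```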

AUDIO

HomeHands-Audio

Speech transcription from egocentric narration. Generates subtitles and timestamped narration JSON locally.

Input Raw .mp4 video with audio
Output .srt + narration.json + subtitled .mp4
Model OpenAI Whisper base
pip install openai-whisper
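
Whisper's transcription result includes a list of segments, each with `start`, `end`, and `text`. The sketch below shows one way such segments could be turned into SRT subtitle text and into entries shaped like the `narrations` array in the output format. The function names are hypothetical, and the repository may format its files differently.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


def segments_to_narrations(segments):
    """Convert segments to narration entries with 1-based ids."""
    return [
        {"id": i, "start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for i, seg in enumerate(segments, start=1)
    ]


# Usage with a made-up segment list.
demo = [{"start": 0.0, "end": 1.5, "text": " I pick up the cup. "}]
print(segments_to_srt(demo))
```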

FULL PIPELINE

HomeHands-Pipeline

Master script that runs all modules automatically on every video in a folder. Produces one combined annotation JSON per clip.

Input Folder of .mp4 videos
Output Full annotation JSON per clip
Model All of the above
Run python pipeline/run_pipeline.py
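
To make the "every video in a folder" behavior concrete, here is a minimal sketch of how a master script might enumerate clips and decide where each combined annotation goes. It is not the contents of `run_pipeline.py`. The sequential `HH_001`-style ids assigned in sorted filename order, the `{clip_id}.json` naming, and both function names are assumptions.

```python
from pathlib import Path


def assign_clip_ids(filenames, prefix="HH"):
    """Give each .mp4 file a sequential id such as HH_001, in sorted order."""
    videos = sorted(name for name in filenames if name.lower().endswith(".mp4"))
    return [(f"{prefix}_{i:03d}", name) for i, name in enumerate(videos, start=1)]


def plan_outputs(video_dir="assets/videos", out_root="assets/processed"):
    """List each clip with its input path and assumed annotation output path."""
    folder = Path(video_dir)
    names = [p.name for p in folder.iterdir()] if folder.is_dir() else []
    plan = []
    for clip_id, name in assign_clip_ids(names):
        plan.append({
            "clip_id": clip_id,
            "video": str(folder / name),
            # Assumed file naming; the source only names the output folder.
            "annotation": str(Path(out_root) / "annotations" / f"{clip_id}.json"),
        })
    return plan


# Usage with made-up filenames; non-video files are ignored.
print(assign_clip_ids(["kitchen.mp4", "notes.txt", "dishes.mp4"]))
```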

Quick Start

# 1. Clone the repository
git clone https://github.com/aneessaheba/Egocentric_Homes
cd Egocentric_Homes

# 2. Install dependencies
pip install mediapipe sam2 openai-whisper opencv-python
brew install ffmpeg  # macOS (Homebrew); on other systems, install ffmpeg with your package manager

# 3. Add your videos
cp your_videos/*.mp4 assets/videos/

# 4. Run full pipeline
python pipeline/run_pipeline.py

# Output per video:
# Hand pose JSON     → assets/processed/hand_pose/
# Segmentation PNGs  → assets/processed/segmented/
# Narration JSON     → assets/processed/narrations/
# Combined JSON      → assets/processed/annotations/

Models Used

Model            Task              Made by   Size     License
MediaPipe Hands  Hand tracking     Google    8 MB     Apache 2.0
SAM2 tiny        Segmentation      Meta AI   155 MB   Apache 2.0
Whisper base     Transcription     OpenAI    145 MB   MIT
ffmpeg           Audio extraction  OSS       --       LGPL
OpenCV           Video processing  OSS       --       Apache 2.0

Output Format

{
  "clip_id": "HH_001",
  "filename": "[video_name].mp4",
  "task": "[task_name]",
  "duration_sec": "--",
  "total_frames": "--",
  "fps": 30,
  "resolution": {
    "width": "--",
    "height": "--"
  },
  "hand_detection_rate": "--",
  "narrations": [
    {
      "id": 1,
      "start": "--",
      "end": "--",
      "text": "[narration text]"
    }
  ],
  "frames": [
    {
      "frame_id": 0,
      "timestamp_sec": "--",
      "hands_detected": "--",
      "hands": [
        {
          "label": "Right",
          "confidence": "--",
          "keypoints": {
            "WRIST": {
              "x": "--", "y": "--",
              "z": "--", "px": "--", "py": "--"
            }
          },
          "segmentation": {
            "method": "SAM2",
            "pixel_count": "--",
            "coverage_pct": "--"
          }
        }
      ],
      "narration": "[narration text at this timestamp]"
    }
  ]
}
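
Several fields in this format can be derived from others. The sketch below shows one plausible way to compute a frame's `timestamp_sec`, look up the narration active at that time, and compute `hand_detection_rate`. It assumes `hands_detected` is an integer count per frame and that the rate is the fraction of frames with at least one hand. Neither is confirmed by the source, and the function names are hypothetical.

```python
def frame_timestamp(frame_id, fps):
    """Timestamp in seconds of a zero-based frame index."""
    return frame_id / fps


def narration_at(narrations, timestamp_sec):
    """Return the text of the first narration whose span contains the timestamp."""
    for entry in narrations:
        if entry["start"] <= timestamp_sec <= entry["end"]:
            return entry["text"]
    return ""  # no narration is active at this time


def hand_detection_rate(frames):
    """Fraction of frames in which at least one hand was detected."""
    if not frames:
        return 0.0
    detected = sum(1 for frame in frames if frame["hands_detected"] > 0)
    return detected / len(frames)


# Usage with made-up values.
narrations = [{"id": 1, "start": 0.0, "end": 2.0, "text": "I open the drawer."}]
frames = [{"frame_id": 0, "hands_detected": 2}, {"frame_id": 1, "hands_detected": 0}]
print(narration_at(narrations, frame_timestamp(30, 30)))
print(hand_detection_rate(frames))
```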