Pipeline & Tools
Open-source annotation pipeline for egocentric household activity data
Tools
HomeHands-QC
Automated video quality assessment for egocentric GoPro footage. Checks resolution, FPS, hand detection rate, and blur score.
pip install mediapipe opencv-python
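As an illustration of how the four checks might combine into a single pass/fail decision, here is a minimal sketch. The names `QCThresholds` and `qc_report` and every threshold value are assumptions made for this example, not settings taken from HomeHands-QC. The blur score is assumed to be a sharpness measure where higher means sharper (for example, variance of the Laplacian).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    # All limits are illustrative assumptions, not project defaults.
    min_width: int = 1920
    min_height: int = 1080
    min_fps: float = 29.0
    min_hand_detection_rate: float = 0.5  # fraction of frames with at least one hand
    min_blur_score: float = 100.0         # higher = sharper (assumed convention)


def qc_report(width, height, fps, hand_detection_rate, blur_score,
              limits=QCThresholds()):
    """Return the result of each check plus an overall pass flag."""
    checks = {
        "resolution": width >= limits.min_width and height >= limits.min_height,
        "fps": fps >= limits.min_fps,
        "hand_detection_rate": hand_detection_rate >= limits.min_hand_detection_rate,
        "blur_score": blur_score >= limits.min_blur_score,
    }
    return {"checks": checks, "passed": all(checks.values())}


# Hypothetical measurements for two clips.
report = qc_report(1920, 1080, 30, 0.82, 240.5)
failed = qc_report(1280, 720, 30, 0.82, 240.5)
```

Keeping the per-check results alongside the overall flag makes it easy to see why a clip was rejected.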
HomeHands-Pose
21-point hand skeleton detection per frame. Detects left and right hand with confidence scores and pixel coordinates.
pip install mediapipe
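MediaPipe Hands reports each landmark with `x` and `y` normalized to the image size, so pixel coordinates have to be derived from the frame dimensions. The sketch below shows one way to build a keypoint dictionary with both forms. The helper name `to_keypoints` and the clamping behaviour are assumptions for this example; the landmark names follow MediaPipe's 21-point hand model.

```python
# The 21 landmark names of MediaPipe's hand model, in index order.
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def to_keypoints(landmarks, width, height):
    """Convert 21 normalized (x, y, z) tuples into named keypoints.

    Hypothetical helper: px/py are clamped to the frame because a
    landmark can fall slightly outside the [0, 1] range.
    """
    if len(landmarks) != len(HAND_LANDMARKS):
        raise ValueError(f"expected {len(HAND_LANDMARKS)} landmarks, got {len(landmarks)}")
    keypoints = {}
    for name, (x, y, z) in zip(HAND_LANDMARKS, landmarks):
        px = min(max(int(x * width), 0), width - 1)
        py = min(max(int(y * height), 0), height - 1)
        keypoints[name] = {"x": x, "y": y, "z": z, "px": px, "py": py}
    return keypoints


# Made-up input: every landmark at the same normalized position.
keypoints = to_keypoints([(0.5, 0.25, 0.0)] * 21, width=1920, height=1080)
```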
HomeHands-Seg
Pixel-level hand segmentation using SAM2. Wrist coordinates are used as point prompts. The right hand is shown in purple and the left hand in red.
pip install sam2
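To make the prompt and the per-mask statistics concrete, here is a small dependency-free sketch. It assumes the usual SAM-style convention of `(x, y)` point coordinates with label `1` for foreground, and it assumes the mask arrives as rows of 0/1 values. The helper names and the exact RGB shades are illustrative; the source only specifies "purple" and "red".

```python
# Overlay colours per hand label as RGB tuples (shades are assumptions).
HAND_COLORS = {"Right": (128, 0, 128), "Left": (255, 0, 0)}


def wrist_prompt(wrist_px, wrist_py):
    """Build a single foreground point prompt from the wrist pixel position."""
    return {"point_coords": [[wrist_px, wrist_py]], "point_labels": [1]}


def mask_stats(mask):
    """Summarize a binary mask given as a list of rows of 0/1 values."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    pixel_count = sum(1 for row in mask for value in row if value)
    total = width * height
    coverage_pct = round(100.0 * pixel_count / total, 2) if total else 0.0
    return {"method": "SAM2", "pixel_count": pixel_count, "coverage_pct": coverage_pct}


# Tiny made-up mask: 3 hand pixels in a 4 x 2 frame.
prompt = wrist_prompt(960, 270)
stats = mask_stats([[0, 1, 1, 0],
                    [0, 1, 0, 0]])
```

With real masks a NumPy array would be the natural container; plain lists are used here only to keep the example free of dependencies.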
HomeHands-Audio
Speech transcription from egocentric narration. Generates subtitles and timestamped narration JSON locally.
pip install openai-whisper
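Whisper's `transcribe` call returns a dictionary whose `segments` entries carry `start`, `end`, and `text`. The sketch below shows how such segments could be turned into numbered narration entries and an SRT subtitle string. The function names and the sample segments are made up for this example; they are not taken from HomeHands-Audio.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm for an SRT file."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_narrations(segments):
    """Map Whisper-style segments to narration entries numbered from 1."""
    return [
        {
            "id": index,
            "start": round(seg["start"], 2),
            "end": round(seg["end"], 2),
            "text": seg["text"].strip(),
        }
        for index, seg in enumerate(segments, start=1)
    ]


def to_srt(narrations):
    """Render narration entries as SRT subtitle text."""
    blocks = [
        f'{n["id"]}\n{srt_timestamp(n["start"])} --> {srt_timestamp(n["end"])}\n{n["text"]}\n'
        for n in narrations
    ]
    return "\n".join(blocks)


# Made-up segments shaped like the "segments" list in a Whisper result.
segments = [
    {"start": 0.0, "end": 2.5, "text": " I pick up the cup."},
    {"start": 2.5, "end": 5.0, "text": " Then I rinse it."},
]
narrations = to_narrations(segments)
subtitles = to_srt(narrations)
```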
HomeHands-Pipeline
Master script that runs all modules automatically on every video in a folder. Produces one combined annotation JSON per clip.
python pipeline/run_pipeline.py
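The folder-level loop can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of `run_pipeline.py`: the `run_folder` function, the idea of passing stages as callables, and numbering clips in sorted filename order are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_folder(video_dir, out_dir, stages):
    """Run every stage on every .mp4 and write one combined JSON per clip.

    stages: list of callables taking a video Path and returning a dict
    that is merged into the clip's annotation (hypothetical interface).
    """
    video_dir, out_dir = Path(video_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, video in enumerate(sorted(video_dir.glob("*.mp4")), start=1):
        annotation = {"clip_id": f"HH_{index:03d}", "filename": video.name}
        for stage in stages:
            annotation.update(stage(video))
        out_path = out_dir / f"{video.stem}.json"
        out_path.write_text(json.dumps(annotation, indent=2))
        written.append(out_path)
    return written


# Demo with empty placeholder files and a stub stage instead of real models.
with tempfile.TemporaryDirectory() as tmp:
    videos = Path(tmp) / "videos"
    videos.mkdir()
    (videos / "b_clip.mp4").touch()
    (videos / "a_clip.mp4").touch()
    outputs = run_folder(videos, Path(tmp) / "annotations", [lambda v: {"fps": 30}])
    output_names = [p.name for p in outputs]
    first = json.loads(outputs[0].read_text())
```

Treating each module as a callable that returns a dictionary keeps the loop independent of any particular model, which makes it easy to test with stubs as above.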
Quick Start
    # 1. Clone the repository
    git clone https://github.com/aneessaheba/Egocentric_Homes
    cd Egocentric_Homes

    # 2. Install dependencies
    pip install mediapipe sam2 openai-whisper opencv-python
    brew install ffmpeg

    # 3. Add your videos
    cp your_videos/*.mp4 assets/videos/

    # 4. Run full pipeline
    python pipeline/run_pipeline.py

    # Output per video:
    # Hand pose JSON → assets/processed/hand_pose/
    # Segmentation PNGs → assets/processed/segmented/
    # Narration JSON → assets/processed/narrations/
    # Combined JSON → assets/processed/annotations/
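Before running the pipeline it can help to confirm that step 2 succeeded. The following sketch checks whether each Python package is importable and whether `ffmpeg` is on the `PATH`. The `check_environment` helper is hypothetical and not part of the repository; the import names (`whisper` for `openai-whisper`, `cv2` for `opencv-python`) are the ones those packages install.

```python
import importlib.util
import shutil

# pip package name -> module name it provides
REQUIRED_MODULES = {
    "mediapipe": "mediapipe",
    "sam2": "sam2",
    "openai-whisper": "whisper",
    "opencv-python": "cv2",
}


def check_environment():
    """Return a mapping of dependency name -> True if it appears to be installed."""
    status = {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


status = check_environment()
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("Missing dependencies:", ", ".join(missing))
```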
Models Used
| Model | Task | Made by | Size | License |
|---|---|---|---|---|
| MediaPipe Hands | Hand tracking | Google | 8 MB | Apache 2.0 |
| SAM2 tiny | Segmentation | Meta AI | 155 MB | Apache 2.0 |
| Whisper base | Transcription | OpenAI | 145 MB | MIT |
| ffmpeg | Audio extraction | OSS | — | LGPL |
| OpenCV | Video processing | OSS | — | Apache 2.0 |
Output Format
    {
      "clip_id": "HH_001",
      "filename": "[video_name].mp4",
      "task": "[task_name]",
      "duration_sec": "--",
      "total_frames": "--",
      "fps": 30,
      "resolution": {
        "width": "--",
        "height": "--"
      },
      "hand_detection_rate": "--",
      "narrations": [
        {
          "id": 1,
          "start": "--",
          "end": "--",
          "text": "[narration text]"
        }
      ],
      "frames": [
        {
          "frame_id": 0,
          "timestamp_sec": "--",
          "hands_detected": "--",
          "hands": [
            {
              "label": "Right",
              "confidence": "--",
              "keypoints": {
                "WRIST": {
                  "x": "--", "y": "--",
                  "z": "--", "px": "--", "py": "--"
                }
              },
              "segmentation": {
                "method": "SAM2",
                "pixel_count": "--",
                "coverage_pct": "--"
              }
            }
          ],
          "narration": "[narration text at this timestamp]"
        }
      ]
    }
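The per-frame `narration` field and the clip-level `hand_detection_rate` can both be derived from the other fields in this structure. The sketch below shows one plausible derivation. The helper names, the half-open `[start, end)` matching rule, and the sample values are assumptions for this example rather than documented pipeline behaviour.

```python
def frame_timestamp(frame_id, fps):
    """Timestamp in seconds of a frame, given a constant frame rate."""
    return round(frame_id / fps, 3)


def narration_at(narrations, timestamp_sec):
    """Return the narration text whose [start, end) span covers the timestamp."""
    for entry in narrations:
        if entry["start"] <= timestamp_sec < entry["end"]:
            return entry["text"]
    return None


def hand_detection_rate(frames):
    """Fraction of frames in which at least one hand was detected."""
    if not frames:
        return 0.0
    with_hands = sum(1 for frame in frames if frame["hands_detected"] > 0)
    return round(with_hands / len(frames), 3)


# Made-up values shaped like the annotation JSON above.
narrations = [
    {"id": 1, "start": 0.0, "end": 2.5, "text": "I pick up the cup."},
    {"id": 2, "start": 2.5, "end": 5.0, "text": "Then I rinse it."},
]
frames = [
    {"frame_id": 0, "hands_detected": 2},
    {"frame_id": 1, "hands_detected": 0},
    {"frame_id": 2, "hands_detected": 1},
    {"frame_id": 3, "hands_detected": 1},
]
text_at_3s = narration_at(narrations, frame_timestamp(90, fps=30))
rate = hand_detection_rate(frames)
```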