{ "cells": [ { "cell_type": "markdown", "id": "0fb63c15-f349-47ae-8d27-e1429884f4c8", "metadata": {}, "source": [ "# Visualizing Video Embeddings with Feluda and t-SNE\n", "This notebook demonstrates how to use [`feluda`](https://github.com/tattle-made/feluda) to extract video embeddings and visualize them with t-SNE, overlaying a thumbnail for each video. It covers:\n", "- Setting up Feluda and its operators.\n", "- Loading video data from a subset of the [UCF101 video dataset](https://huggingface.co/datasets/sayakpaul/ucf101-subset). We use it here for demonstration, but any video dataset can be substituted.\n", "- Generating video embeddings using the Feluda CLIP [video operator](https://pypi.org/project/feluda-vid-vec-rep-clip/).\n", "- Reducing the embedding dimensions using t-SNE.\n", "- Visualizing the reduced embeddings with video thumbnails." ] }, { "cell_type": "markdown", "id": "bf0eabb5", "metadata": {}, "source": [ "[![GitHub](https://img.shields.io/badge/GitHub-View%20Source-blue?logo=github)](https://github.com/tattle-made/feluda/blob/main/docs/examples/plot_tsne_videos.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tattle-made/feluda/blob/main/docs/examples/plot_tsne_videos.ipynb)" ] }, { "cell_type": "markdown", "id": "63936c9b-ca4e-4aab-a3f8-93b4ae193f2c", "metadata": {}, "source": [ "Install dependencies conditionally based on whether the notebook is running in Colab or locally." 
] }, { "cell_type": "code", "execution_count": null, "id": "ca273961-21a0-44c3-9936-bd5f9209e1d4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running Notebook locally\n", "\u001b[2mUsing Python 3.10.12 environment at: /home/aatman/Aatman/Tattle/feluda/.venv\u001b[0m\n", "\u001b[2mAudited \u001b[1m5 packages\u001b[0m \u001b[2min 9ms\u001b[0m\u001b[0m\n", "CPU times: user 5.31 ms, sys: 4.17 ms, total: 9.48 ms\n", "Wall time: 132 ms\n" ] } ], "source": [ "%%time\n", "import sys\n", "\n", "IN_COLAB = \"google.colab\" in sys.modules\n", "print(\"Running Notebook in Google Colab\" if IN_COLAB else \"Running Notebook locally\")\n", "\n", "if IN_COLAB:\n", "    # Google Colab ships with preinstalled libraries such as tensorflow and numba, so we install\n", "    # the dependencies into a separate folder, feluda_custom_venv, to avoid conflicts with them.\n", "    %pip install uv\n", "    !mkdir -p /content/feluda_custom_venv\n", "    !uv pip install --target=/content/feluda_custom_venv --prerelease allow feluda feluda-vid-vec-rep-clip feluda-dimension-reduction opencv-python matplotlib > /dev/null 2>&1\n", "\n", "    sys.path.insert(0, \"/content/feluda_custom_venv\")\n", "else:\n", "    !uv pip install feluda feluda-vid-vec-rep-clip feluda-dimension-reduction opencv-python matplotlib > /dev/null 2>&1" ] }, { "cell_type": "code", "execution_count": 2, "id": "071c28fb-542e-4f94-986d-932da58a3404", "metadata": {}, "outputs": [], "source": [ "import os\n", "import tarfile\n", "from pathlib import Path\n", "\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from huggingface_hub import hf_hub_download\n", "from matplotlib.offsetbox import AnnotationBbox, OffsetImage\n", "from tqdm.notebook import tqdm\n", "\n", "from feluda.factory import VideoFactory" ] }, { "cell_type": "markdown", "id": "bf74a578-a69f-43ba-8740-3219b747e380", "metadata": {}, "source": [ "We'll use two operators in this example: one to extract video embeddings and another to reduce their dimensions." ] }, { "cell_type": "code", "execution_count": null, "id": "71e62b81-db4e-43a5-afbe-de5e44c2c55e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "t-SNE model successfully initialized\n" ] } ], "source": [ "from feluda.operators import DimensionReduction, VidVecRep\n", "\n", "vid_vec_operator = VidVecRep()\n", "dim_red_operator = DimensionReduction(model_type=\"tsne\")" ] }, { "cell_type": "markdown", "id": "320d3505-5d83-422c-a714-84fa0fa6c940", "metadata": {}, "source": [ "Data Preparation" ] }, { "cell_type": "code", "execution_count": 5, "id": "d10d570b-7a11-41c6-a51d-70689b9c06c1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset downloaded at /Users/omkarkabde/.cache/huggingface/hub/datasets--sayakpaul--ucf101-subset/snapshots/b9984b8d2a95e4a1879e1b071e9433858d0bc24a/UCF101_subset.tar.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/4p/bw6h5x8x1nb_17vsgfc12dz00000gn/T/ipykernel_56034/3928610610.py:12: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. 
Use the filter argument to control this behavior.\n", " t.extractall(\".\")\n" ] } ], "source": [ "# Download and extract the UCF101 subset dataset\n", "def download_and_extract_dataset() -> str | None:\n", "    \"\"\"Download and extract the UCF101 subset dataset from Hugging Face.\n", "\n", "    Returns the path to the downloaded archive, or None if UCF101_subset already exists.\n", "    \"\"\"\n", "    if not os.path.exists(\"UCF101_subset\"):\n", "        file_path = hf_hub_download(\n", "            repo_id=\"sayakpaul/ucf101-subset\",\n", "            filename=\"UCF101_subset.tar.gz\",\n", "            repo_type=\"dataset\",\n", "        )\n", "\n", "        with tarfile.open(file_path) as t:\n", "            t.extractall(\".\")\n", "\n", "        return file_path\n", "    return None\n", "\n", "\n", "file_path = download_and_extract_dataset()\n", "if file_path is None:\n", "    print(\"Dataset already extracted in UCF101_subset\")\n", "else:\n", "    print(f\"Dataset downloaded at {file_path}\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "9dfc4d65-08a0-41db-be6c-f03b51f1c105", "metadata": {}, "outputs": [], "source": [ "# Create thumbnails directory\n", "thumbnail_dir = Path(\"thumbnails\")\n", "thumbnail_dir.mkdir(exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 7, "id": "fd38f0a0-13cb-4817-9138-9e6a09426eb0", "metadata": {}, "outputs": [], "source": [ "def get_video_thumbnail(video_path: str) -> str | None:\n", "    \"\"\"Extract and save thumbnail from video.\n", "\n", "    Args:\n", "        video_path: Path to the video file\n", "\n", "    Returns:\n", "        Path to the saved thumbnail image, or None if the first frame could not be read\n", "\n", "    \"\"\"\n", "    thumbnail_path = thumbnail_dir / f\"{Path(video_path).stem}_thumbnail.jpg\"\n", "\n", "    # Return existing thumbnail if available\n", "    if thumbnail_path.exists():\n", "        return str(thumbnail_path)\n", "\n", "    # Read the first frame from video\n", "    cap = cv2.VideoCapture(str(video_path))\n", "    ret, frame = cap.read()\n", "    cap.release()\n", "\n", "    if ret:\n", "        cv2.imwrite(str(thumbnail_path), frame)\n", "        return str(thumbnail_path)\n", "    return None" ] }, { "cell_type": "code", "execution_count": 8, "id": "28cd0abc-44a8-4d57-bfeb-ea96404a0dad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 405 
videos to process\n" ] } ], "source": [ "def get_all_video_paths(base_dir: str = \"UCF101_subset\") -> list[str]:\n", "    \"\"\"Get all video paths recursively from the dataset directory.\n", "\n", "    Args:\n", "        base_dir: Base directory containing videos\n", "\n", "    Returns:\n", "        List of video file paths\n", "\n", "    \"\"\"\n", "    video_paths = [str(path) for path in Path(base_dir).rglob(\"*.avi\")]\n", "    return video_paths\n", "\n", "\n", "# Get all video paths\n", "video_paths = get_all_video_paths()\n", "print(f\"Found {len(video_paths)} videos to process\")" ] }, { "cell_type": "markdown", "id": "0ed93fcc-a66c-43ea-90fc-00fcd00853f8", "metadata": {}, "source": [ "Video Processing and Embedding Extraction" ] }, { "cell_type": "code", "execution_count": 9, "id": "73ed73d1-e680-42a6-93e7-ad24154e347d", "metadata": {}, "outputs": [], "source": [ "# Process videos and collect embeddings\n", "def process_videos(video_paths: list[str]):\n", "    \"\"\"Process videos and extract embeddings using Feluda.\n", "\n", "    Args:\n", "        video_paths: List of paths to video files\n", "\n", "    Returns:\n", "        List of dictionaries containing video path, embedding, and thumbnail path\n", "\n", "    \"\"\"\n", "    operator_parameters = []\n", "\n", "    for video_path in tqdm(video_paths, desc=\"Processing videos\"):\n", "        try:\n", "            # Get video thumbnail\n", "            thumbnail_path = get_video_thumbnail(video_path)\n", "\n", "            # Get video embedding\n", "            video = VideoFactory.make_from_file_on_disk(video_path)\n", "            embedding = vid_vec_operator.run(video)\n", "            average_vector = next(embedding)\n", "\n", "            operator_parameters.append(\n", "                {\n", "                    \"payload\": video_path,\n", "                    \"embedding\": average_vector.get(\"vid_vec\"),\n", "                    \"thumbnail_path\": thumbnail_path,\n", "                }\n", "            )\n", "        except Exception as e:\n", "            print(f\"Error processing {video_path}: {e}\")\n", "            continue\n", "\n", "    print(f\"Successfully processed {len(operator_parameters)} videos\")\n", "    return operator_parameters" ] }, { 
"cell_type": "code", "execution_count": 10, "id": "cff38abe-6934-4b85-8ebd-70a27d1f4297", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0f467180b895458bb95da80e138800cc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing videos: 0%| | 0/405 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# `data` is assumed to hold the dimension-reduction results: a list of dicts with\n", "# \"payload\" (video path) and \"reduced_embedding\" (2D coordinates) keys.\n", "x = np.array([item[\"reduced_embedding\"][0] for item in data])\n", "y = np.array([item[\"reduced_embedding\"][1] for item in data])\n", "\n", "\n", "def jitter(arr, jitter_amount=0.02):\n", "    \"\"\"Add uniform noise in [-jitter_amount, jitter_amount] so overlapping points separate.\"\"\"\n", "    return arr + np.random.uniform(-jitter_amount, jitter_amount, arr.shape)\n", "\n", "\n", "x_jittered = jitter(x, jitter_amount=0.3)\n", "y_jittered = jitter(y, jitter_amount=0.3)\n", "\n", "plt.figure(figsize=(20, 16))\n", "plt.scatter(x_jittered, y_jittered)\n", "\n", "\n", "def load_thumbnail(payload):\n", "    \"\"\"Load the thumbnail from the pre-saved thumbnail folder.\"\"\"\n", "    video_filename = os.path.basename(payload)\n", "    thumbnail_filename = video_filename.replace(\".avi\", \"_thumbnail.jpg\")\n", "    thumbnail_path = os.path.join(thumbnail_dir, thumbnail_filename)\n", "\n", "    if os.path.exists(thumbnail_path):\n", "        return cv2.imread(thumbnail_path)\n", "    else:\n", "        print(f\"Thumbnail not found for {video_filename}\")\n", "        return None\n", "\n", "\n", "for i, item in enumerate(data):\n", "    video_thumbnail = load_thumbnail(item[\"payload\"])\n", "\n", "    if video_thumbnail is not None:\n", "        video_thumbnail = cv2.resize(video_thumbnail, (100, 100))  # Smaller thumbnails\n", "        video_thumbnail = cv2.cvtColor(video_thumbnail, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB\n", "\n", "        # Draw the thumbnail at half scale, centered on the jittered point\n", "        imagebox = OffsetImage(video_thumbnail, zoom=0.5)\n", "        ab = AnnotationBbox(\n", "            imagebox, (x_jittered[i], y_jittered[i]), frameon=False\n", "        )\n", "\n", "        plt.gca().add_artist(ab)\n", "\n", "# Set the title\n", "plt.title(\"t-SNE Reduced Embeddings 
with Video Thumbnails\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "6fb113a8-a0cd-45e4-bfba-780e2d519ffc", "metadata": {}, "source": [ "Cleanup" ] }, { "cell_type": "code", "execution_count": 13, "id": "17456d96-e3a2-4984-a0a9-bf34107e1084", "metadata": {}, "outputs": [], "source": [ "import shutil\n", "\n", "shutil.rmtree(\"thumbnails\")\n", "shutil.rmtree(\"UCF101_subset\")\n", "\n", "dim_red_operator.cleanup()\n", "vid_vec_operator.cleanup()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }