Benchmarking

Feluda includes a benchmarking system that measures the execution time, CPU time, and memory usage of each operator, helping developers find optimization targets and hold operators to consistent performance standards.

Quick Start

Run All Benchmarks

cd benchmark
python main.py

Generates timestamped reports in benchmark/results/.

Run Specific Operator

from benchmark.operators import cluster_embeddings
from benchmark.report import BenchmarkReport

results = cluster_embeddings.benchmark()
report = BenchmarkReport()
report.extend(results)
report.save_json("results.json")

System Components

1. Profiler Engine

Measures execution time, memory usage, and CPU time:

from benchmark.profiler import Profiler

# Profile a function
@Profiler.profile
def my_function():
    # Your code here
    pass

# Benchmark an operator (MyOperator is a placeholder for your operator class)
result = Profiler.benchmark_operator(
    operator_class=MyOperator,
    operator_name="my_operator",
    runtime_kwargs={"input": "test_data"}
)

Features:

System information capture (CPU, memory, platform)
Memory usage tracking with peak detection
Wall clock and CPU time measurement
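
To make the three measurements concrete, here is a minimal sketch of a profiling decorator built only on the standard library. It is not Feluda's Profiler: the decorator name, the returned tuple, and the metric keys (execution_time, cpu_time, peak_memory_mb) are assumptions for illustration. Note that tracemalloc only sees allocations made through Python's allocator, so a profiler based on psutil or memory_profiler may report different numbers.

```python
import time
import tracemalloc
from functools import wraps


def profile(func):
    """Hypothetical decorator: run func once and return (result, metrics)."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        wall_start = time.perf_counter()   # wall clock time
        cpu_start = time.process_time()    # CPU time of this process
        try:
            result = func(*args, **kwargs)
        finally:
            cpu_end = time.process_time()
            wall_end = time.perf_counter()
            # get_traced_memory() returns (current, peak) in bytes
            _, peak_bytes = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        metrics = {
            "execution_time": wall_end - wall_start,
            "cpu_time": cpu_end - cpu_start,
            "peak_memory_mb": peak_bytes / (1024 * 1024),
        }
        return result, metrics

    return wrapper


@profile
def build_list(n):
    """Toy workload that allocates a list of n integers."""
    return [0] * n


result, metrics = build_list(100_000)
print(metrics)
```

Wall clock time includes time spent waiting (I/O, sleeping), while CPU time does not, so a large gap between the two usually points to a workload that is not compute-bound.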

2. Data Generator

Creates synthetic datasets for testing:

from benchmark.data_generator import DataGenerator

# Generate embeddings
embeddings = DataGenerator.generate_embeddings(1000, 512)

# Generate clustered data
embeddings, labels = DataGenerator.generate_embeddings_with_clusters(
    num_clusters=5, samples_per_cluster=200, dim=512
)

# Generate test images
images = DataGenerator.generate_test_images("test_images/")

Dataset Types:

Embeddings: 100 to 100,000 samples, 256 to 2048 dimensions
Images: 128x128 to 4K resolutions, RGB/grayscale modes
Clustered Data: Known cluster structures for validation
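
As an illustration of what "known cluster structures" means, the sketch below generates labelled synthetic embeddings: each cluster gets a random center, and samples are drawn around it with Gaussian noise. This is not the DataGenerator implementation. The function name and the spread and seed parameters are assumptions, and plain lists are used to keep the sketch dependency-free (the real generator may well return NumPy arrays).

```python
import random


def make_clustered_embeddings(num_clusters, samples_per_cluster, dim,
                              spread=0.05, seed=0):
    """Hypothetical helper: return (embeddings, labels) with known clusters."""
    rng = random.Random(seed)  # seeded so repeated runs see identical data
    embeddings, labels = [], []
    for cluster_id in range(num_clusters):
        # One random center per cluster, inside the [-1, 1] hypercube
        center = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(samples_per_cluster):
            # Each sample is the center plus small Gaussian noise
            embeddings.append([c + rng.gauss(0.0, spread) for c in center])
            labels.append(cluster_id)
    return embeddings, labels


embeddings, labels = make_clustered_embeddings(
    num_clusters=3, samples_per_cluster=5, dim=8
)
print(len(embeddings), len(embeddings[0]), sorted(set(labels)))
```

Because the labels are known in advance, the output of a clustering operator can be checked against them instead of only being timed.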

3. Operator Benchmarks

Each operator has a dedicated benchmark module that exposes a benchmark() function returning a list of result dicts:

def benchmark() -> list[dict]:
    """Benchmark the ClusterEmbeddings operator."""
    results = []

    # Vary the cluster count; at 200 samples per cluster this also scales the dataset size
    for n_clusters in [3, 5, 10]:
        embeddings, labels = DataGenerator.generate_embeddings_with_clusters(
            num_clusters=n_clusters, samples_per_cluster=200
        )

        result = Profiler.benchmark_operator(
            operator_class=ClusterEmbeddings,
            operator_name=f"cluster_embeddings_{n_clusters}clusters",
            runtime_kwargs={"input_data": embeddings}
        )

        result["data_description"] = f"{n_clusters} clusters, 200 samples each"
        results.append(result)

    return results

4. Report Generation

Collects benchmark results and writes them out as JSON and Markdown reports:

from benchmark.report import BenchmarkReport

report = BenchmarkReport()
# benchmark_results is the list of result dicts returned by an operator's benchmark()
report.extend(benchmark_results)
report.save_json("results.json")
report.save_markdown("results.md")

Example: Creating a New Benchmark

1. Create Benchmark File

# benchmark/operators/my_operator.py
from benchmark.data_generator import DataGenerator
from benchmark.profiler import Profiler
from feluda.operators import MyOperator

def benchmark() -> list[dict]:
    """Benchmark MyOperator."""
    results = []

    # Test configurations
    configs = [
        {"param1": "value1", "param2": 10},
        {"param1": "value2", "param2": 20}
    ]

    for config in configs:
        result = Profiler.benchmark_operator(
            operator_class=MyOperator,
            operator_name=f"my_operator_{config['param1']}",
            runtime_kwargs={"input": "test_data", **config}
        )

        result["data_description"] = f"config: {config}"
        results.append(result)

    return results

2. Register the Operator

Add to benchmark/operators/__init__.py:

from . import my_operator

all_operators = [
    # ... existing operators
    my_operator,
]
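
Registration matters because a runner can then discover every benchmark through all_operators. The sketch below shows one way such a loop might look. It is an assumption about how main.py could work, not a copy of it: run_all, the success and error keys, and the stand-in modules built with SimpleNamespace are all hypothetical.

```python
from types import SimpleNamespace


def run_all(operator_modules):
    """Call benchmark() on each registered module and flatten the results."""
    all_results = []
    for module in operator_modules:
        try:
            all_results.extend(module.benchmark())
        except Exception as exc:
            # Record the failure so one broken benchmark does not stop the rest
            all_results.append({
                "operator_name": getattr(module, "__name__", repr(module)),
                "success": False,
                "error": str(exc),
            })
    return all_results


def _broken_benchmark():
    raise RuntimeError("boom")


# Stand-ins for real benchmark modules, for demonstration only
fake_ok = SimpleNamespace(
    __name__="fake_ok",
    benchmark=lambda: [{"operator_name": "fake_ok", "success": True}],
)
fake_broken = SimpleNamespace(__name__="fake_broken", benchmark=_broken_benchmark)

results = run_all([fake_ok, fake_broken])
print(results)
```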

Performance Metrics

Execution Metrics

Execution Time: Total wall clock time
CPU Time: Actual CPU processing time
Memory Change: Net change in memory usage over the run (allocations minus deallocations)
Peak Memory: Maximum memory usage

Quality Metrics

Success Rate: Percentage of successful runs
Error Details: Specific failure reasons
Consistency: Standard deviation of the performance measurements across runs
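
The metrics above can be derived from a list of per-run results roughly as follows. This is a sketch under assumptions: the per-run keys success and execution_time, and the summarize helper itself, are hypothetical and may not match the fields the Profiler actually emits.

```python
from statistics import mean, pstdev


def summarize(runs):
    """Aggregate per-run dicts into success rate, mean, and spread."""
    successful = [run for run in runs if run.get("success")]
    times = [run["execution_time"] for run in successful]
    return {
        # Fraction of runs that completed without error
        "success_rate": len(successful) / len(runs) if runs else 0.0,
        # Timing statistics only consider successful runs
        "avg_execution_time": mean(times) if times else None,
        "std_execution_time": pstdev(times) if times else None,
    }


runs = [
    {"success": True, "execution_time": 0.04},
    {"success": True, "execution_time": 0.06},
    {"success": False, "error": "invalid input"},
]
summary = summarize(runs)
print(summary)
```

Failed runs are excluded from the timing statistics here so that an early crash does not make an operator look faster than it is.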

Understanding Results

JSON Report Structure

{
  "system_info": {
    "platform": "macOS-14.6.0-x86_64",
    "cpu_count": 8,
    "total_memory_gb": 16.0
  },
  "statistics": {
    "cluster_embeddings_kmeans": {
      "avg_execution_time": 0.045,
      "avg_memory_change": 2.45,
      "success_rate": 1.0
    }
  }
}
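
A report with this structure is easy to post-process. The sketch below parses the sample above and ranks operators by average execution time. The two helper functions are hypothetical and not part of the benchmark package; only the key names come from the structure shown.

```python
import json

# Sample report text mirroring the documented structure
REPORT_TEXT = """
{
  "system_info": {
    "platform": "macOS-14.6.0-x86_64",
    "cpu_count": 8,
    "total_memory_gb": 16.0
  },
  "statistics": {
    "cluster_embeddings_kmeans": {
      "avg_execution_time": 0.045,
      "avg_memory_change": 2.45,
      "success_rate": 1.0
    }
  }
}
"""


def slowest_first(report):
    """Return (name, avg_execution_time) pairs, slowest operator first."""
    pairs = [
        (name, stats["avg_execution_time"])
        for name, stats in report["statistics"].items()
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def failing_operators(report, min_success_rate=1.0):
    """Return names of operators whose success rate is below the threshold."""
    return sorted(
        name
        for name, stats in report["statistics"].items()
        if stats["success_rate"] < min_success_rate
    )


report = json.loads(REPORT_TEXT)
print(slowest_first(report))
print(failing_operators(report))
```

For a report saved on disk, replace json.loads(REPORT_TEXT) with json.load on the opened file.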

Markdown Report

Provides human-readable summaries with:

System configuration details
Per-operator performance statistics
Success rates and error information

Dependencies

pip install psutil memory_profiler numpy pillow