ImageVecRep-Resnet Operator

Description

The ImageVecRep-Resnet operator extracts vector representations from images using the ResNet18 model. It generates a 512-dimensional feature vector from input images, enabling downstream tasks such as image similarity, clustering, and classification. The operator uses the pre-trained ResNet18 model with ImageNet weights and extracts features from the average pooling layer.

Model Information

Model: ResNet18
Source: PyTorch Vision Models
Vector Size: 512
Usage: Generates image embeddings for downstream tasks such as image similarity, clustering, and classification.

System Dependencies

Python >= 3.10

How to Run the Tests

Ensure that you are in the root directory of the feluda project.

Install dependencies (in your virtual environment):

```
uv pip install "./operators/image_vec_rep"
uv pip install "feluda[dev]"
```

Run the tests:
```
pytest operators/image_vec_rep/test.py
```

Usage

```
from feluda.factory import ImageFactory
from feluda.operators import ImageVecRep

# Initialize the operator
operator = ImageVecRep()

# Load an image
image_obj = ImageFactory.make_from_url("https://example.com/image.jpg")

# Extract features
features = operator.run(image_obj)
print(f"Feature vector shape: {features.shape}")  # (512,)
print(f"Feature vector dtype: {features.dtype}")  # float16

# Cleanup
operator.cleanup()
```
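
For the image similarity use case, two feature vectors can be compared with cosine similarity. The sketch below is one possible way to do that, using only the standard library; `cosine_similarity` is a hypothetical helper, not part of feluda, and returning 0.0 for an all-zero vector is a convention chosen for this example. Values are converted to Python floats first because accumulating in float16 can lose precision.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Convert to Python floats so float16 inputs are accumulated in float64.
    xs = [float(x) for x in a]
    ys = [float(y) for y in b]
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_a = math.sqrt(sum(x * x for x in xs))
    norm_b = math.sqrt(sum(y * y for y in ys))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)
```

Given two vectors returned by `operator.run(...)`, `cosine_similarity(features_a, features_b)` yields a score in [-1, 1], where higher values suggest more similar images (`features_a` and `features_b` are placeholder names).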

class operators.image_vec_rep.image_vec_rep.ImageVecRep

Bases: Operator

Operator to extract image vector representations using ResNet18.

__init__() → None: Initializes the ImageVecRep operator with a pre-trained ResNet18 model.

extract_feature(img: PIL.Image.Image) → numpy.ndarray

Extracts a 512-dimensional feature vector from a PIL Image using ResNet18.

Parameters: img (Image.Image) – Input image (must be a PIL Image).
Returns: 512-dimensional feature vector (float16).
Return type: np.ndarray

run(image_obj: ImageFactory) → numpy.ndarray

Runs the operator on an image object produced by ImageFactory.

Parameters: image_obj (dict) – Dictionary produced by ImageFactory, with key 'image' containing a PIL Image.
Returns: 512-dimensional feature vector (float16).
Return type: np.ndarray

state() → dict

Returns the current state of the operator.

Returns: State of the operator
Return type: dict

cleanup() → None: Cleans up resources used by the operator.
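
Because `cleanup()` has to be called explicitly, one option is to wrap the operator in a context manager so that cleanup still runs if `run` raises. The sketch below is an assumption about how this could be done, not a feluda feature: `managed_operator` is a hypothetical helper that only requires its argument to have a `cleanup()` method.

```python
from contextlib import contextmanager


@contextmanager
def managed_operator(operator):
    """Yield the operator and always call its cleanup() on exit."""
    try:
        yield operator
    finally:
        # Runs on normal exit and when the body raises.
        operator.cleanup()
```

With this helper, usage could look like `with managed_operator(ImageVecRep()) as op: features = op.run(image_obj)`.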