Dimension Reduction Operator

Description

The Dimension Reduction operator reduces the dimensionality of high-dimensional embeddings using either t-SNE or UMAP. It is typically used to project embeddings down to 2D or 3D for visualization and analysis while preserving as much of their neighborhood structure as possible.

Model Information

t-SNE: t-Distributed Stochastic Neighbor Embedding for non-linear dimensionality reduction
UMAP: Uniform Manifold Approximation and Projection for scalable dimension reduction
Vector Size: Configurable (typically 2-3 dimensions for visualization)
Usage: Reduces high-dimensional embeddings to lower dimensions for visualization, clustering, and analysis

Dependencies

scikit-learn >= 1.6.1
numpy >= 1.26,<2.2.0
umap-learn >= 0.5.0

How to Run the Tests

Ensure that you are in the root directory of the feluda project.

Install dependencies (in your virtual environment):

uv pip install "./operators/dimension_reduction"
uv pip install "feluda[dev]"

Run the tests:

pytest operators/dimension_reduction/test.py

Usage

from feluda.operators import DimensionReduction

# Initialize with t-SNE
operator = DimensionReduction("tsne", {
    "n_components": 2,
    "perplexity": 2,
    "random_state": 42
})

# Prepare input data
input_data = [
    {"payload": "sample_1", "embedding": [1.0, 2.0, 3.0, 4.0, 5.0]},
    {"payload": "sample_2", "embedding": [2.0, 3.0, 4.0, 5.0, 6.0]},
    {"payload": "sample_3", "embedding": [3.0, 4.0, 5.0, 6.0, 7.0]}
]


# Run dimension reduction
result = operator.run(input_data)

# Access results
for item in result:
    print(f"Payload: {item['payload']}")
    print(f"Reduced embedding: {item['reduced_embedding']}")
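
The example above sets perplexity to 2 because scikit-learn's t-SNE requires perplexity to be strictly less than the number of samples, and the input has only three. A minimal sketch of a helper that picks a valid value for small datasets follows. The name safe_perplexity is hypothetical and not part of feluda; the preferred value of 30.0 mirrors scikit-learn's default.

```python
def safe_perplexity(n_samples: int, preferred: float = 30.0) -> float:
    """Return a perplexity strictly below n_samples (hypothetical helper)."""
    if n_samples < 2:
        # With fewer than 2 samples no positive value below n_samples exists
        raise ValueError("need at least 2 samples to choose a perplexity")
    # Cap the preferred value so it stays below the sample count
    return float(min(preferred, n_samples - 1))


params = {
    "n_components": 2,
    "perplexity": safe_perplexity(n_samples=3),
    "random_state": 42,
}
print(params["perplexity"])  # 2.0
```

The resulting dictionary can be passed as the params argument when initializing the operator with "tsne".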

class operators.dimension_reduction.dimension_reduction.DimensionReduction(model_type: str, params: dict[str, Any] | None = None)

Bases: Operator

Main interface for dimensionality reduction.

__init__(model_type: str, params: dict[str, Any] | None = None) → None

Initialize the dimension reduction operator.

Parameters:

model_type – Type of model to use ("tsne" or "umap")
params – Optional dictionary of parameters for the model

Raises:

ValueError – If the model type is not supported or initialization fails

static get_reduction_model(model_type: str, params: dict[str, Any]) → ReductionModel

Create a dimension reduction model based on the model type.

Parameters:

model_type – Type of model ("tsne" or "umap")
params – Dictionary of parameters for the model

Returns:

A dimension reduction model instance
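
One way this kind of model selection could work is a small factory that dispatches on model_type. The sketch below is an assumption-laden illustration, not feluda's actual implementation: make_reducer is a hypothetical name, and it returns the raw scikit-learn or umap-learn estimator rather than feluda's ReductionModel wrapper.

```python
from typing import Any, Dict, Optional


def make_reducer(model_type: str, params: Optional[Dict[str, Any]] = None):
    """Hypothetical factory: return an estimator for 'tsne' or 'umap'."""
    params = params or {}
    kind = model_type.lower()
    if kind == "tsne":
        # Imported lazily so scikit-learn is only needed when selected
        from sklearn.manifold import TSNE
        return TSNE(**params)
    if kind == "umap":
        # Imported lazily so umap-learn is only needed when selected
        from umap import UMAP
        return UMAP(**params)
    raise ValueError(f"Unsupported model type: {model_type!r}")
```

Both estimators expose fit_transform, which takes an array of shape (n_samples, n_features) and returns the reduced array. An unsupported name such as "pca" raises ValueError in this sketch.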

static gen_data(payloads: list, reduced_embeddings: numpy.ndarray) → list[dict]

Generates the formatted output.

Parameters:

payloads (list) – List of payloads.
reduced_embeddings (numpy.ndarray) – An array of reduced embeddings.

Returns:

A list of dictionaries, each containing a payload and its corresponding reduced embedding.

Return type:

list
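
The pairing step described here can be sketched in a few lines. This is an illustration only: pair_payloads is a hypothetical name, and the real method may differ in how it converts array rows.

```python
def pair_payloads(payloads, reduced_embeddings):
    """Pair each payload with its reduced embedding (illustrative sketch)."""
    return [
        {
            "payload": payload,
            # numpy rows expose tolist(); plain sequences are copied to lists
            "reduced_embedding": row.tolist() if hasattr(row, "tolist") else list(row),
        }
        for payload, row in zip(payloads, reduced_embeddings)
    ]


formatted = pair_payloads(["123", "124"], [[1.0, 2.0], [1.0, 0.0]])
print(formatted[0])  # {'payload': '123', 'reduced_embedding': [1.0, 2.0]}
```

Converting each row to a plain list keeps the output easy to serialize, for example as JSON.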

run(input_data: list[dict]) → list[dict]

Reduce the dimensionality of the provided embeddings using the initialized model.

Parameters:

input_data (list) – A list of dictionaries, each containing a payload and the embedding to be reduced.

Example

[
    {"payload": "123", "embedding": [1, 2, 3]},
    {"payload": "124", "embedding": [1, 0, 1]}
]

Returns:

The reduced embeddings and their corresponding payloads as a list of dictionaries.

Return type:

list

Example

[
    {"payload": "123", "reduced_embedding": [1, 2]},
    {"payload": "124", "reduced_embedding": [1, 0]}
]

Raises:

ValueError – If input_data is not a non-empty list.
KeyError – If an item in input_data is missing a required key (payload or embedding).
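
The two error cases above could be enforced by a validation step along these lines. This is a minimal sketch under assumptions: split_input is a hypothetical name, and the error messages are illustrative rather than the ones feluda emits.

```python
def split_input(input_data):
    """Validate input and split it into payloads and embeddings (sketch)."""
    if not isinstance(input_data, list) or len(input_data) == 0:
        raise ValueError("input_data must be a non-empty list")
    payloads, embeddings = [], []
    for index, item in enumerate(input_data):
        try:
            payloads.append(item["payload"])
            embeddings.append(item["embedding"])
        except KeyError as exc:
            # Report which item is malformed and which key is missing
            raise KeyError(f"item {index} is missing key {exc}") from exc
    return payloads, embeddings


payloads, embeddings = split_input([
    {"payload": "123", "embedding": [1, 2, 3]},
    {"payload": "124", "embedding": [1, 0, 1]},
])
print(payloads)  # ['123', '124']
```

Validating before calling the reduction model means a malformed item is reported by its index instead of surfacing as a less specific error from the underlying library.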

cleanup() → None

Cleans up resources used by the operator.

state() → dict

Returns the current state of the operator.

Returns:

State of the operator

Return type:

dict