Dimension Reduction Operator
Description
The Dimension Reduction operator reduces the dimensionality of high-dimensional embeddings using either t-SNE or UMAP. It supports reducing embeddings to 2D or 3D for visualization and analysis while preserving as much of the data's local structure as the chosen algorithm allows.
Model Information
t-SNE: t-Distributed Stochastic Neighbor Embedding for non-linear dimensionality reduction
UMAP: Uniform Manifold Approximation and Projection for scalable dimension reduction
Vector Size: Configurable (typically 2-3 dimensions for visualization)
Usage: Reduces high-dimensional embeddings to lower dimensions for visualization, clustering, and analysis
Dependencies
scikit-learn >= 1.6.1
numpy >= 1.26,<2.2.0
umap-learn >= 0.5.0
How to Run the Tests
Ensure that you are in the root directory of the feluda project.
Install dependencies (in your virtual environment):
uv pip install "./operators/dimension_reduction"
uv pip install "feluda[dev]"
Run the tests:
pytest operators/dimension_reduction/test.py
Usage
from feluda.operators import DimensionReduction
# Initialize with t-SNE
operator = DimensionReduction("tsne", {
    "n_components": 2,
    "perplexity": 2,
    "random_state": 42
})
# Prepare input data
input_data = [
    {"payload": "sample_1", "embedding": [1.0, 2.0, 3.0, 4.0, 5.0]},
    {"payload": "sample_2", "embedding": [2.0, 3.0, 4.0, 5.0, 6.0]},
    {"payload": "sample_3", "embedding": [3.0, 4.0, 5.0, 6.0, 7.0]}
]
# Run dimension reduction
result = operator.run(input_data)
# Access results
for item in result:
    print(f"Payload: {item['payload']}")
    print(f"Reduced embedding: {item['reduced_embedding']}")
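The t-SNE example above sets perplexity to 2 because it has only three samples: scikit-learn's TSNE raises a ValueError unless perplexity is strictly less than the number of samples. A minimal sketch of a guard for small inputs follows. The helper name safe_perplexity and the clamping rule (n_samples - 1) are illustrative assumptions, not part of the operator.

```python
def safe_perplexity(n_samples: int, requested: float = 30.0) -> float:
    """Return a perplexity that is strictly smaller than n_samples.

    Hypothetical helper: scikit-learn's TSNE rejects perplexity >= n_samples,
    so small inputs need a smaller value than the library default of 30.
    """
    if n_samples < 2:
        raise ValueError("t-SNE needs at least 2 samples")
    # Clamp to n_samples - 1, the largest integer still below n_samples.
    return float(min(requested, n_samples - 1))


params = {
    "n_components": 2,
    "perplexity": safe_perplexity(n_samples=3),
    "random_state": 42,
}
print(params["perplexity"])  # 2.0
```

The resulting params dictionary has the same shape as the one passed to DimensionReduction in the usage example.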
- class operators.dimension_reduction.dimension_reduction.DimensionReduction(model_type: str, params: dict[str, Any] | None = None)
Bases: Operator
Main interface for dimensionality reduction.
- __init__(model_type: str, params: dict[str, Any] | None = None) -> None
Initialize the dimension reduction operator.
- Parameters:
model_type – Type of model to use (‘tsne’ or ‘umap’)
params – Optional dictionary of parameters for the model
- Raises:
ValueError – If the model type is not supported or initialization fails
- static get_reduction_model(model_type: str, params: dict[str, Any]) -> ReductionModel
Create a dimension reduction model based on the model type.
- Parameters:
model_type – Type of model (‘tsne’ or ‘umap’)
params – Dictionary of parameters for the model
- Returns:
A dimension reduction model instance
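One way such a factory could dispatch on model_type is sketched below. This is an assumption about the shape of the logic, not the operator's actual code: the function name make_reducer is hypothetical, and it returns the underlying scikit-learn or umap-learn estimator directly instead of a ReductionModel wrapper. The libraries are imported lazily so that only the selected backend needs to be installed.

```python
from typing import Any, Optional


def make_reducer(model_type: str, params: Optional[dict] = None) -> Any:
    """Hypothetical factory: map a model name to a configured estimator."""
    params = dict(params or {})
    name = model_type.lower()
    if name == "tsne":
        from sklearn.manifold import TSNE  # imported only when requested
        return TSNE(**params)
    if name == "umap":
        from umap import UMAP  # provided by the umap-learn package
        return UMAP(**params)
    raise ValueError(f"Unsupported model type: {model_type!r}")
```

With scikit-learn installed, make_reducer("tsne", {"n_components": 2, "perplexity": 2, "random_state": 42}).fit_transform(X) would return an array of shape (n_samples, 2). Any other model name raises ValueError, which matches the error documented for __init__.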
- static gen_data(payloads: list, reduced_embeddings: numpy.ndarray) -> list[dict]
Generates the formatted output by pairing each payload with its reduced embedding.
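A minimal sketch of how this formatting step might look, assuming the output keys "payload" and "reduced_embedding" shown in the run() example. The function name format_output and the length check are illustrative and not taken from the operator's source.

```python
def format_output(payloads, reduced_embeddings):
    """Hypothetical stand-in for gen_data: pair payloads with reduced vectors."""
    if len(payloads) != len(reduced_embeddings):
        raise ValueError("payloads and reduced_embeddings differ in length")
    return [
        # list(...) turns a NumPy row (or any sequence) into a plain list
        {"payload": payload, "reduced_embedding": list(row)}
        for payload, row in zip(payloads, reduced_embeddings)
    ]


rows = [[0.1, 0.2], [0.3, 0.4]]  # a NumPy array of shape (2, 2) works too
print(format_output(["123", "124"], rows))
```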
- run(input_data: list[dict]) -> list[dict]
Reduce the dimensionality of the provided embeddings using the initialized model.
- Parameters:
input_data (list) – A list of dictionaries containing payload and embeddings to be reduced.
Example:
[
    {"payload": "123", "embedding": [1, 2, 3]},
    {"payload": "124", "embedding": [1, 0, 1]}
]
- Returns:
The reduced embeddings and the corresponding payload as a list of dictionaries.
- Return type:
list[dict]
Example:
[
    {"payload": "123", "reduced_embedding": [1, 2]},
    {"payload": "124", "reduced_embedding": [1, 0]}
]
- Raises:
ValueError – If the input is not a list or is empty.
KeyError – If the input data is invalid (for example, an item lacks the "payload" or "embedding" key).
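The error conditions listed under Raises could be checked before any model is fitted. The sketch below shows one plausible pre-check. The function name split_input and the exact error messages are hypothetical, and the real operator may validate its input differently.

```python
def split_input(input_data):
    """Hypothetical pre-check mirroring the documented ValueError and KeyError."""
    if not isinstance(input_data, list) or not input_data:
        raise ValueError("input_data must be a non-empty list")
    payloads, embeddings = [], []
    for index, item in enumerate(input_data):
        if "payload" not in item or "embedding" not in item:
            raise KeyError(f"item {index} needs 'payload' and 'embedding' keys")
        payloads.append(item["payload"])
        embeddings.append(item["embedding"])
    return payloads, embeddings


payloads, embeddings = split_input([
    {"payload": "123", "embedding": [1, 2, 3]},
    {"payload": "124", "embedding": [1, 0, 1]},
])
print(payloads)    # ['123', '124']
print(embeddings)  # [[1, 2, 3], [1, 0, 1]]
```

Separating payloads from embeddings up front keeps the two lists aligned, so the reduced vectors can be matched back to their payloads by position.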