SigLIP

This example demonstrates how to use the SigLIP model to perform zero-shot image classification.

Parameters

model_name (str, optional): The name of the pre-trained model to use. Defaults to "google/siglip-base-patch16-224".
device (Device, optional): The device to run the model on. If not specified, it will default to GPU if available, otherwise CPU.
image (Image.Image): The image to classify.
labels (List[str]): The candidate labels to score against the image.

Perform Zero-Shot Image Classification

The model takes an image and a list of candidate labels and returns a classification score for each label. Each score represents the model's confidence that the label is the correct classification for the image. To perform zero-shot image classification with the SigLIP model, follow these steps:

Initialize the Model:

from vision_agent_tools.models.siglip import Siglip

model = Siglip()

Prepare the Image and Labels:

from PIL import Image

image = Image.open("path_to_your_image.jpg")
labels = ["cat", "dog", "bird"]

Run the Classification:

results = model(
    image=image,
    labels=labels,
)

Interpret the Results:

# Print each candidate label alongside its score.
for label, score in zip(results["labels"], results["scores"]):
    print(f"Label: {label}, Score: {score:.2f}")

The higher the score, the more confident the model is that the label is the correct classification. Use the scores to see which labels the model prefers for an image and, on images whose correct label is known, to evaluate its performance.

Example output:

Label: A, Score: 0.85
Label: B, Score: 0.10
Label: C, Score: 0.05

In this example, the model is most confident that the image belongs to Label A.
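To pick the most confident label programmatically, the scores can be ranked. The snippet below is a minimal sketch, assuming the result is a dictionary with parallel "labels" and "scores" lists, as in the loop shown earlier. rank_labels is a hypothetical helper, not part of vision_agent_tools, and example_results is a hand-written stand-in that reuses the example values rather than real model output.

```python
from typing import Dict, List, Tuple


def rank_labels(results: Dict[str, list]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from highest to lowest score.

    Hypothetical helper: assumes `results` has parallel "labels" and
    "scores" lists, matching the shape used in the example above.
    """
    pairs = zip(results["labels"], results["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written stand-in for a model result (not real model output).
example_results = {"labels": ["B", "A", "C"], "scores": [0.10, 0.85, 0.05]}

ranked = rank_labels(example_results)
top_label, top_score = ranked[0]
print(f"Top label: {top_label} ({top_score:.2f})")  # prints "Top label: A (0.85)"
```

Sorting rather than taking only the maximum keeps the runner-up labels available, which helps when the top two scores are close.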