# Florence-2
This example demonstrates how to use the Florencev2 tool, which interprets simple text prompts to perform tasks such as captioning, object detection, and segmentation.
NOTE: The Florence-2 model can only be used in GPU environments.
```python
from PIL import Image

from vision_agent_tools.models.florencev2 import Florencev2, PromptTask

# Path to the image to analyze (replace this path with your own!)
test_image = "path/to/your/image.jpg"

# Choose the task you plan to run
task_prompt = PromptTask.CAPTION

# Load the image and initialize the Florencev2 model
image = Image.open(test_image)
run_florence = Florencev2()

# Time to put Florencev2 to work! Let's see what it finds...
results = run_florence(image=image, task=task_prompt)

# Print the output for the chosen task
print(f"The image contains: {results[task_prompt]}")
```
## Florencev2

Bases: `BaseMLModel`
Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation.
NOTE: The Florence-2 model can only be used in GPU environments.
### `__call__(task, image=None, images=None, video=None, prompt='', batch_size=5, nms_threshold=1.0)`
Runs inference with the Florence-2 model using the provided task, input (a single image, a list of images, or a video), and optional prompt.
Florence-2 is a sequence-to-sequence architecture that performs well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.
For more examples and details, refer to the Florence-2 sample usage.
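The parameter descriptions suggest that exactly one of `image`, `images`, or `video` should be supplied, and that multiple images or video frames are processed `batch_size` at a time. The sketch below illustrates that input handling in plain Python under those assumptions. `collect_frames` and `make_batches` are hypothetical helpers, not part of `vision_agent_tools`, and the real method may validate or batch its inputs differently.

```python
from typing import Any, List, Optional, Sequence


def collect_frames(
    image: Optional[Any] = None,
    images: Optional[Sequence[Any]] = None,
    video: Optional[Sequence[Any]] = None,
) -> List[Any]:
    """Hypothetical helper: normalize the three input options into one list of frames."""
    provided = [arg is not None for arg in (image, images, video)]
    if sum(provided) != 1:
        raise ValueError("Provide exactly one of 'image', 'images', or 'video'.")
    if image is not None:
        return [image]
    if images is not None:
        return list(images)
    # Assumption: iterating over a video yields its frames (for a NumPy array, along axis 0).
    return list(video)


def make_batches(frames: Sequence[Any], batch_size: int = 5) -> List[List[Any]]:
    """Hypothetical helper: split frames into consecutive chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer.")
    return [list(frames[i : i + batch_size]) for i in range(0, len(frames), batch_size)]


# Usage: a 12-frame "video" (placeholder strings) with the default batch size of 5.
frames = collect_frames(video=[f"frame_{i}" for i in range(12)])
print([len(batch) for batch in make_batches(frames, batch_size=5)])  # [5, 5, 2]
```

Chunking consecutive frames keeps temporal order intact, so per-frame results can be concatenated back in the same order as the input.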
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `PromptTask` | The specific task to be performed. | *required* |
| `image` | `Image` | A single image for the model to process. `None` if using a video or a list of images. | `None` |
| `images` | `List[Image]` | A list of images for the model to process. `None` if using a video or a single image. | `None` |
| `video` | `VideoNumpy` | A NumPy representation of the video for inference. `None` if using images. | `None` |
| `prompt` | `str` | An optional text prompt to complement the task. | `''` |
| `batch_size` | `int` | The batch size used for processing multiple images or video frames. | `5` |
| `nms_threshold` | `float` | The IoU threshold used to apply a dummy agnostic Non-Maximum Suppression (NMS). | `1.0` |
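`nms_threshold` is an IoU threshold for agnostic Non-Maximum Suppression, meaning boxes are suppressed based on overlap alone, regardless of their labels. Here is a minimal greedy NMS sketch in plain Python. It assumes boxes are `(x1, y1, x2, y2)` tuples and, like `torchvision.ops.nms`, suppresses a box only when its IoU with an already-kept box is strictly greater than the threshold; under that convention the default of `1.0` suppresses nothing. Reading "dummy" as "every box gets the same placeholder score" is also an assumption, reflected in the optional `scores` argument. `iou` and `agnostic_nms` are hypothetical names, not the library's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agnostic_nms(
    boxes: Sequence[Box],
    iou_threshold: float = 1.0,
    scores: Optional[Sequence[float]] = None,
) -> List[int]:
    """Greedy NMS that ignores labels; returns indices of kept boxes in input order.

    Assumption: without scores, every box gets the same placeholder score,
    so earlier boxes win ties.
    """
    if scores is None:
        scores = [1.0] * len(boxes)
    # sorted() is stable, so boxes with equal scores keep their original order.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept: List[int] = []
    for i in order:
        # Suppress a box only if it overlaps a kept box by MORE than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return sorted(kept)


# Usage: two heavily overlapping boxes (IoU = 0.81) and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(agnostic_nms(boxes, iou_threshold=1.0))  # [0, 1, 2]
print(agnostic_nms(boxes, iou_threshold=0.5))  # [0, 2]
```

Lowering the threshold makes suppression more aggressive; at `1.0` every box survives because IoU can never exceed 1.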
Returns:

| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | The output of the Florence-2 model based on the provided task, images/video, and prompt. The output type can vary depending on the chosen task. |
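Because the return type depends on the task, downstream code usually needs to branch on the shape of the result. The following sketch assumes the output is a dict keyed by the task (as in the quick-start example) and that values follow the Hugging Face Florence-2 post-processing style: a string for captioning tasks and a dict with `bboxes` and `labels` for detection tasks. Those shapes, the `<OD>` key, the made-up sample values, and the `describe_result` helper are assumptions for illustration, not documented behavior of `Florencev2`.

```python
from typing import Any, Dict, List


def describe_result(results: Dict[Any, Any], task: Any) -> List[str]:
    """Hypothetical helper: turn one task's output into printable lines.

    Assumes the output is a dict keyed by task, and that detection-style
    values look like {"bboxes": [...], "labels": [...]}.
    """
    value = results[task]
    if isinstance(value, str):
        # Captioning-style tasks: the value is plain text.
        return [value]
    if isinstance(value, dict) and "bboxes" in value and "labels" in value:
        # Detection-style tasks: pair each label with its box.
        return [f"{label}: {box}" for label, box in zip(value["labels"], value["bboxes"])]
    # Anything else (e.g. segmentation output): fall back to its repr.
    return [repr(value)]


# Usage with a made-up detection result.
sample = {"<OD>": {"bboxes": [[0, 0, 10, 10]], "labels": ["cat"]}}
for line in describe_result(sample, "<OD>"):
    print(line)  # cat: [0, 0, 10, 10]
```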
### `__init__(device=Device.GPU)`
Initializes the Florence-2 model.