Florence-2

This example shows how to use the Florencev2 tool, which interprets simple text prompts to perform tasks such as captioning, object detection, and segmentation.

NOTE: The Florence-2 model can only be used in GPU environments.

from PIL import Image

from vision_agent_tools.models.florencev2 import Florencev2, PromptTask

# (replace this path with your own!)
test_image = "path/to/your/image.jpg"

# Choose the task that you are planning to use
task_prompt = PromptTask.CAPTION

# Load the image and initialize the Florencev2 model
image = Image.open(test_image)
run_florence = Florencev2()

# Time to put Florencev2 to work! Let's see what it finds...
results = run_florence(image=image, task=task_prompt)

# Print the output result
print(f"The image contains: {results[task_prompt]}")

`Florencev2`

Bases: BaseMLModel

Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation.

NOTE: The Florence-2 model can only be used in GPU environments.

`__call__(task, image=None, images=None, video=None, prompt='', batch_size=5, nms_threshold=1.0)`

Runs inference with the Florence-2 model for the given task on a single image, a list of images, or a video, optionally guided by a text prompt.

Florence-2 is a sequence-to-sequence architecture excelling in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.

For more examples and details, refer to the Florence-2 sample usage.

Parameters:

Name	Type	Description	Default
`task`	`PromptTask`	The specific task to be performed.	required
`image`	`Image`	A single image for the model to process. None if using video or a list of images.	`None`
`images`	`List[Image]`	A list of images for the model to process. None if using video or a single image.	`None`
`video`	`VideoNumpy`	A NumPy representation of the video for inference. None if using images.	`None`
`prompt`	`str`	An optional text prompt to complement the task.	`''`
`batch_size`	`int`	The batch size used for processing multiple images or video frames.	`5`
`nms_threshold`	`float`	The IoU threshold used when applying a basic class-agnostic Non-Maximum Suppression (NMS) to the results.	`1.0`
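
To illustrate what an IoU threshold means for `nms_threshold`, here is a minimal sketch of greedy class-agnostic NMS in plain Python. It is not the library's implementation: the helper names `iou` and `agnostic_nms`, the `(x1, y1, x2, y2)` box format, and the per-box scores are all assumptions made for the example. In this sketch a box is dropped only when its IoU with an already kept box is strictly greater than the threshold, so a threshold of `1.0` keeps every box.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agnostic_nms(boxes: List[Box], scores: List[float], iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, ignoring class labels (hypothetical helper)."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep the box unless it overlaps a kept box by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept_strict = agnostic_nms(boxes, scores, iou_threshold=0.5)
kept_default = agnostic_nms(boxes, scores, iou_threshold=1.0)
```

Lowering the threshold makes suppression more aggressive; the first two boxes above overlap with an IoU of roughly 0.68, so only one of them survives at `0.5`.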

Returns:

Name	Type	Description
`Any`	`Any`	The output of the Florence-2 model based on the provided task, images/video, and prompt. The output type can vary depending on the chosen task.
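
The `batch_size` parameter controls how many images or video frames are processed together. How the library batches its inputs internally is not described here, so the following is only a sketch of what a batch size of 5 implies for a longer input; `iter_batches` is a hypothetical helper, not part of `vision_agent_tools`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Twelve placeholder "frames" split with the default batch size of 5.
frames = list(range(12))
batch_sizes = [len(batch) for batch in iter_batches(frames)]
```

With twelve frames and a batch size of 5, the input is split into two full batches and one final batch of two frames.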

`__init__(device=Device.GPU)`

Initializes the Florence-2 model.
