# InternLM-XComposer-2.5
This example demonstrates how to use the InternLM-XComposer-2.5 tool to answer questions about images or videos.
NOTE: The InternLM-XComposer-2.5 model should be used in GPU environments.
```python
import cv2
import numpy as np

from vision_agent_tools.models.internlm_xcomposer2 import InternLMXComposer2

# (replace this path with your own!)
video_path = "path/to/your/my_video.mp4"

# Load the video into frames
cap = cv2.VideoCapture(video_path)
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Stack the frames into a single numpy array of shape (frames, height, width, channels)
video = np.stack(frames)

# Initialize the InternLMXComposer2 model
run_inference = InternLMXComposer2()

prompt = "Here are some frames of a video. Describe this video in detail"

# Time to put InternLMXComposer2 to work!
answer = run_inference(video=video, prompt=prompt)

# Print the output answer
print(answer)
```
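Long videos can contain far more frames than a model can usefully consume. As a rough illustration of how a frame budget like `frames` might be honored client-side, here is a minimal numpy-only sketch of uniform subsampling. The helper name `sample_frames` is illustrative only and is not part of the library:

```python
import numpy as np


def sample_frames(video: np.ndarray, max_frames: int) -> np.ndarray:
    """Uniformly subsample a (T, H, W, C) video array to at most max_frames frames."""
    if len(video) <= max_frames:
        return video
    # Pick evenly spaced frame indices across the whole clip
    indices = np.linspace(0, len(video) - 1, max_frames).astype(int)
    return video[indices]


# Example: a dummy 100-frame "video" reduced to 16 frames
video = np.zeros((100, 32, 32, 3), dtype=np.uint8)
print(sample_frames(video, 16).shape)  # (16, 32, 32, 3)
```

The resulting array can be passed as the `video` argument in the example above.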
## InternLMXComposer2

Bases: `BaseMLModel`
InternLM-XComposer-2.5 is a tool that excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities.
NOTE: The InternLM-XComposer-2.5 model should be used in GPU environments.
### `__call__(prompt, image=None, video=None, frames=MAX_NUMBER_OF_FRAMES, chunk_length=None)`
InternLMXComposer2 model answers questions about a video or image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The prompt with the question to be answered. | *required* |
| `image` | `Image \| None` | The image to be analyzed. | `None` |
| `video` | `VideoNumpy \| None` | A numpy array containing the different images, representing the video. | `None` |
| `frames` | `int` | The number of frames to be used from the video. | `MAX_NUMBER_OF_FRAMES` |
| `chunk_length` | `int` | The number of frames for each chunk of video to analyze. The last chunk may have fewer frames. | `None` |
Returns:

| Type | Description |
|---|---|
| `list[str]` | The answers to the prompt. |
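To make the `chunk_length` behavior concrete, here is a minimal numpy-only sketch of how a video of `T` frames partitions into consecutive chunks, with the last chunk possibly shorter, as the parameter description notes. The helper name `chunk_video` is illustrative only, not library code:

```python
import numpy as np


def chunk_video(video: np.ndarray, chunk_length: int) -> list[np.ndarray]:
    """Split a (T, H, W, C) video into consecutive chunks of chunk_length frames.

    The last chunk may contain fewer than chunk_length frames.
    """
    return [video[i:i + chunk_length] for i in range(0, len(video), chunk_length)]


# Example: 10 frames with chunk_length=4 -> chunks of 4, 4, and 2 frames
video = np.zeros((10, 32, 32, 3), dtype=np.uint8)
chunks = chunk_video(video, 4)
print([len(c) for c in chunks])  # [4, 4, 2]
```

Analyzing in chunks keeps per-call frame counts bounded, which is useful for long videos on memory-constrained GPUs.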
### `__init__()`

Initializes the InternLM-XComposer-2.5 model.