# InternLM-XComposer-2.5
This example demonstrates how to use the InternLM-XComposer-2.5 tool to answer questions about images or videos.
NOTE: The InternLM-XComposer-2.5 model should be used in GPU environments.
```python
import cv2
import numpy as np

from vision_agent_tools.models.internlm_xcomposer2 import InternLMXComposer2

# (replace this path with your own!)
video_path = "path/to/your/my_video.mp4"

# Load the video into frames
cap = cv2.VideoCapture(video_path)
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Stack the frames into a single numpy array of shape (frames, height, width, channels)
video = np.stack(frames)

# Initialize the InternLMXComposer2 model
run_inference = InternLMXComposer2()

prompt = "Here are some frames of a video. Describe this video in detail"

# Time to put InternLMXComposer2 to work!
answer = run_inference(video=video, prompt=prompt)

# Print the output answer
print(answer)
```
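Long videos can contain far more frames than a model can usefully consume. As a rough illustration of how a frame budget like `frames` might be honored client-side, here is a minimal numpy-only sketch of uniform subsampling. The helper name `sample_frames` is illustrative only and is not part of the library:

```python
import numpy as np


def sample_frames(video: np.ndarray, max_frames: int) -> np.ndarray:
    """Uniformly subsample a (T, H, W, C) video array to at most max_frames frames."""
    if len(video) <= max_frames:
        return video
    # Pick evenly spaced frame indices across the whole clip
    indices = np.linspace(0, len(video) - 1, max_frames).astype(int)
    return video[indices]


# Example: a dummy 100-frame "video" reduced to 16 frames
video = np.zeros((100, 32, 32, 3), dtype=np.uint8)
print(sample_frames(video, 16).shape)  # (16, 32, 32, 3)
```

The resulting array can be passed as the `video` argument in the example above.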
## InternLMXComposer2

Bases: `BaseMLModel`
InternLM-XComposer-2.5 is a tool that excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities.
NOTE: The InternLM-XComposer-2.5 model should be used in GPU environments.
### `__call__(prompt, image=None, video=None, frames=MAX_NUMBER_OF_FRAMES, chunk_length=None)`
InternLMXComposer2 model answers questions about a video or image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The prompt with the question to be answered. | *required* |
| `image` | `Image \| None` | The image to be analyzed. | `None` |
| `video` | `VideoNumpy \| None` | A numpy array containing the different images, representing the video. | `None` |
| `frames` | `int` | The number of frames to be used from the video. | `MAX_NUMBER_OF_FRAMES` |
| `chunk_length` | `int` | The number of frames for each chunk of video to analyze. The last chunk may have fewer frames. | `None` |
Returns:

| Type | Description |
|---|---|
| `list[str]` | The answers to the prompt. |
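To make the `chunk_length` behavior concrete, here is a minimal numpy-only sketch of how a video of `T` frames partitions into consecutive chunks, with the last chunk possibly shorter, as the parameter description notes. The helper name `chunk_video` is illustrative only, not library code:

```python
import numpy as np


def chunk_video(video: np.ndarray, chunk_length: int) -> list[np.ndarray]:
    """Split a (T, H, W, C) video into consecutive chunks of chunk_length frames.

    The last chunk may contain fewer than chunk_length frames.
    """
    return [video[i:i + chunk_length] for i in range(0, len(video), chunk_length)]


# Example: 10 frames with chunk_length=4 -> chunks of 4, 4, and 2 frames
video = np.zeros((10, 32, 32, 3), dtype=np.uint8)
chunks = chunk_video(video, 4)
print([len(c) for c in chunks])  # [4, 4, 2]
```

Analyzing in chunks keeps per-call frame counts bounded, which is useful for long videos on memory-constrained GPUs.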
### `__init__()`

Initializes the InternLM-XComposer-2.5 model.