vision_agent.agent
vision_agent.agent.agent.Agent
Bases: ABC
log_progress
abstractmethod
Log the progress of the agent. This is a hook that is intended for reporting the progress of the agent.
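A minimal sketch of how a subclass might implement this hook. The `Agent` class below is a local stand-in rather than the library class, `RecordingAgent` is hypothetical, and the single `data` dictionary argument is an assumption, since the hook's signature is not shown on this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Agent(ABC):
    """Local stand-in for the abstract base described above (not the library class)."""

    @abstractmethod
    def log_progress(self, data: Dict[str, Any]) -> None:
        """Report progress; the `data` argument is an assumed signature."""


class RecordingAgent(Agent):
    """Hypothetical subclass that keeps progress events in memory."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def log_progress(self, data: Dict[str, Any]) -> None:
        # Store each event so a caller can inspect or forward it later.
        self.events.append(data)


agent = RecordingAgent()
agent.log_progress({"type": "log", "status": "started"})
```

Because `log_progress` is abstract, a subclass that does not override it cannot be instantiated.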
vision_agent.agent.vision_agent.VisionAgent
Bases: Agent
Vision Agent is an agent that can chat with the user and call tools or other agents to generate code for it. Vision Agent uses Python code to execute actions for the user. Vision Agent is inspired by OpenDevin https://github.com/OpenDevin/OpenDevin and CodeAct https://arxiv.org/abs/2402.01030
Example
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> resp.append({"role": "user", "content": "Can you write a function that counts dogs?", "media": ["dog.jpg"]})
>>> resp = agent(resp)
Initialize the VisionAgent.
PARAMETER | DESCRIPTION |
---|---|
agent | The agent to use for conversation and orchestration of other agents. |
verbosity | The verbosity level of the agent. |
callback_message | Callback function to send intermediate update messages. |
code_sandbox_runtime | For string values it can be one of: None, "local" or "e2b". If None, it will read from the environment variable "CODE_SANDBOX_RUNTIME". |
Source code in vision_agent/agent/vision_agent.py
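The fallback described for code_sandbox_runtime can be sketched as below. This is an illustration only: `resolve_runtime` and `ALLOWED_RUNTIMES` are hypothetical names, and both the validation step and the behavior when the environment variable is unset are assumptions rather than the library's documented behavior.

```python
import os
from typing import Optional

# The two string values named in the parameter description.
ALLOWED_RUNTIMES = ("local", "e2b")


def resolve_runtime(code_sandbox_runtime: Optional[str] = None) -> Optional[str]:
    """Use the explicit value if given, else fall back to CODE_SANDBOX_RUNTIME."""
    runtime = code_sandbox_runtime
    if runtime is None:
        runtime = os.environ.get("CODE_SANDBOX_RUNTIME")
    # Assumption: reject anything other than the documented string values.
    if runtime is not None and runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"Unsupported runtime: {runtime!r}")
    return runtime
```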
chat
Chat with VisionAgent. It uses code to execute actions to accomplish its tasks.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of: [{"role": "user", "content": "describe your task here..."}] or if it contains media files, it should be in the format of: [{"role": "user", "content": "describe your task here...", "media": ["image1.jpg", "image2.jpg"]}] |
RETURNS | DESCRIPTION |
---|---|
List[Message] | The conversation response. |
Source code in vision_agent/agent/vision_agent.py
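The conversation format above can be built with a small helper. This is a sketch: `user_message` is a hypothetical helper and the `Message` alias is an assumed stand-in for the library's own message type.

```python
from typing import Any, Dict, List, Optional

# Assumed shape; the library defines its own Message type.
Message = Dict[str, Any]


def user_message(content: str, media: Optional[List[str]] = None) -> Message:
    """Build one user turn, attaching media paths only when provided."""
    msg: Message = {"role": "user", "content": content}
    if media:
        msg["media"] = list(media)
    return msg


chat = [
    user_message("Can you write a function that counts dogs?", media=["dog.jpg"]),
]
```

The resulting list has the same shape as the conversation passed to the agent in the class example.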
chat_with_artifacts
Chat with VisionAgent. It uses code to execute actions to accomplish its tasks.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of: [{"role": "user", "content": "describe your task here..."}] or if it contains media files, it should be in the format of: [{"role": "user", "content": "describe your task here...", "media": ["image1.jpg", "image2.jpg"]}] |
artifacts | The artifacts to use in the task. |
test_multi_plan | If True, it will test tools for multiple plans and pick the best one based on the tool results. If False, it will go with the first plan. |
custom_tool_names | A list of customized tools for the agent to pick from and use. If not provided, defaults to the full tool set from vision_agent.tools. |
RETURNS | DESCRIPTION |
---|---|
Tuple[List[Message], Artifacts] | The conversation response, returned together with the artifacts. |
Source code in vision_agent/agent/vision_agent.py
streaming_message
vision_agent.agent.vision_agent_coder.VisionAgentCoder
VisionAgentCoder(
    planner=None,
    coder=None,
    tester=None,
    debugger=None,
    verbosity=0,
    report_progress_callback=None,
    code_interpreter=None,
)
Bases: Agent
Vision Agent Coder is an agentic framework that can output code based on a user request. It can plan tasks, retrieve relevant tools, write code, write tests, and reflect on failed test cases to debug code. It is inspired by AgentCoder https://arxiv.org/abs/2312.13010 and Data Interpreter https://arxiv.org/abs/2402.18679
Example
>>> import vision_agent as va
>>> agent = va.agent.VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
Initialize the Vision Agent Coder.
PARAMETER | DESCRIPTION |
---|---|
planner | The planner model to use. Defaults to AnthropicVisionAgentPlanner. |
coder | The coder model to use. Defaults to AnthropicLMM. |
tester | The tester model to use. Defaults to AnthropicLMM. |
debugger | The debugger model to use. Defaults to AnthropicLMM. |
verbosity | The verbosity level of the agent. Defaults to 0. The highest verbosity level is 2, which outputs all intermediate debugging code. |
report_progress_callback | A callback to report the progress of the agent. This is useful for streaming logs in a web application where multiple VisionAgentCoder instances are running in parallel. The callback ensures that progress from different instances is not mixed up. |
code_interpreter | For string values it can be one of: None, "local" or "e2b". If None, it will read from the environment variable "CODE_SANDBOX_RUNTIME". If a CodeInterpreter object is provided, it will use that. |
Source code in vision_agent/agent/vision_agent_coder.py
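One way to keep progress from parallel instances separate, as the report_progress_callback description suggests, is to give each instance its own callback bound to a session key. This is a sketch under assumptions: `make_progress_callback` and the session identifiers are hypothetical, and the single-dictionary callback signature is assumed rather than taken from the library.

```python
from typing import Any, Callable, Dict, List

# Assumed event shape; the library may pass a different payload.
ProgressEvent = Dict[str, Any]


def make_progress_callback(
    session_id: str, sink: Dict[str, List[ProgressEvent]]
) -> Callable[[ProgressEvent], None]:
    """Return a callback that files every event under its own session key."""

    def callback(event: ProgressEvent) -> None:
        sink.setdefault(session_id, []).append(event)

    return callback


logs: Dict[str, List[ProgressEvent]] = {}
on_progress_a = make_progress_callback("session-a", logs)
on_progress_b = make_progress_callback("session-b", logs)

# Interleaved events from two sessions end up in separate lists.
on_progress_a({"type": "plans", "status": "started"})
on_progress_b({"type": "code", "status": "running"})
on_progress_a({"type": "plans", "status": "completed"})
```

Each callback could then be passed as report_progress_callback when constructing the corresponding agent.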
planner
instance-attribute
debugger
instance-attribute
debugger = (
    AnthropicLMM(temperature=0.0)
    if debugger is None
    else debugger
)
generate_code_from_plan
Generates code and other intermediate outputs from a chat input and a plan. The plan includes:
- plans: The plans generated by the planner.
- best_plan: The best plan selected by the planner.
- plan_thoughts: The thoughts of the planner, including any modifications to the plan.
- tool_doc: The tool documentation for the best plan.
- tool_output: The tool output from the tools used by the best plan.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of [{"role": "user", "content": "describe your task here..."}]. |
plan_context | The context of the plan, including the plans, best_plan, plan_thoughts, tool_doc, and tool_output. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing the code output by the VisionAgentCoder and other intermediate outputs. |

The dictionary includes:
- status (str): Whether the agent completed or failed generating the code.
- code (str): The code output by the VisionAgentCoder.
- test (str): The test output by the VisionAgentCoder.
- test_result (Execution): The result of the test execution.
- plans (Dict[str, Any]): The plans generated by the planner.
- plan_thoughts (str): The thoughts of the planner.
- working_memory (List[Dict[str, str]]): The working memory of the agent.
Source code in vision_agent/agent/vision_agent_coder.py
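A caller might summarize the returned dictionary as sketched below. The keys come from the return description above. The `summarize_result` helper is hypothetical, and the sample values, including the "completed" status string, are assumptions made for illustration.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Summarize a result dictionary using the keys listed in the return description."""
    status = result.get("status", "unknown")
    code_lines = len(result.get("code", "").splitlines())
    test_lines = len(result.get("test", "").splitlines())
    return f"status={status}, code_lines={code_lines}, test_lines={test_lines}"


# Hand-written sample; the values are illustrative, not real agent output.
example = {
    "status": "completed",
    "code": "def f():\n    return 1\n",
    "test": "assert f() == 1\n",
    "plan_thoughts": "",
    "working_memory": [],
}
summary = summarize_result(example)
```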
generate_code
Generates code and other intermediate outputs from a chat input.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of [{"role": "user", "content": "describe your task here..."}]. |
test_multi_plan | Whether to test multiple plans or just the best plan. |
custom_tool_names | A list of custom tool names to use for the planner. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing the code output by the VisionAgentCoder and other intermediate outputs. |

The dictionary includes:
- status (str): Whether the agent completed or failed generating the code.
- code (str): The code output by the VisionAgentCoder.
- test (str): The test output by the VisionAgentCoder.
- test_result (Execution): The result of the test execution.
- plans (Dict[str, Any]): The plans generated by the planner.
- plan_thoughts (str): The thoughts of the planner.
- working_memory (List[Dict[str, str]]): The working memory of the agent.
Source code in vision_agent/agent/vision_agent_coder.py
chat
vision_agent.agent.vision_agent_coder.AzureVisionAgentCoder
AzureVisionAgentCoder(
    planner=None,
    coder=None,
    tester=None,
    debugger=None,
    verbosity=0,
    report_progress_callback=None,
    code_interpreter=None,
)
Bases: VisionAgentCoder
VisionAgentCoder that uses Azure OpenAI APIs for planning, coding, testing.
Pre-requisites:
1. Set the environment variable AZURE_OPENAI_API_KEY to your Azure OpenAI API key.
2. Set the environment variable AZURE_OPENAI_ENDPOINT to your Azure OpenAI endpoint.
Example
>>> import vision_agent as va
>>> agent = va.agent.AzureVisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
Initialize the Vision Agent Coder.
PARAMETER | DESCRIPTION |
---|---|
planner | The planner model to use. Defaults to AzureVisionAgentPlanner. |
coder | The coder model to use. Defaults to OpenAILMM. |
tester | The tester model to use. Defaults to OpenAILMM. |
debugger | The debugger model to use. |
verbosity | The verbosity level of the agent. Defaults to 0. The highest verbosity level is 2, which outputs all intermediate debugging code. |
report_progress_callback | A callback to report the progress of the agent. This is useful for streaming logs in a web application where multiple VisionAgentCoder instances are running in parallel. The callback ensures that progress from different instances is not mixed up. |
Source code in vision_agent/agent/vision_agent_coder.py
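The two prerequisites for AzureVisionAgentCoder can be checked before constructing the agent. This is a minimal sketch: `missing_azure_vars` is a hypothetical helper, not part of the library, and treating an empty value as missing is an assumption.

```python
import os
from typing import List, Mapping

# The two environment variables named in the prerequisites.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Set these variables before creating the agent:", ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.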
planner
instance-attribute
debugger
instance-attribute
debugger = (
    AnthropicLMM(temperature=0.0)
    if debugger is None
    else debugger
)
log_progress
generate_code_from_plan
Generates code and other intermediate outputs from a chat input and a plan. The plan includes:
- plans: The plans generated by the planner.
- best_plan: The best plan selected by the planner.
- plan_thoughts: The thoughts of the planner, including any modifications to the plan.
- tool_doc: The tool documentation for the best plan.
- tool_output: The tool output from the tools used by the best plan.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of [{"role": "user", "content": "describe your task here..."}]. |
plan_context | The context of the plan, including the plans, best_plan, plan_thoughts, tool_doc, and tool_output. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing the code output by the VisionAgentCoder and other intermediate outputs. |

The dictionary includes:
- status (str): Whether the agent completed or failed generating the code.
- code (str): The code output by the VisionAgentCoder.
- test (str): The test output by the VisionAgentCoder.
- test_result (Execution): The result of the test execution.
- plans (Dict[str, Any]): The plans generated by the planner.
- plan_thoughts (str): The thoughts of the planner.
- working_memory (List[Dict[str, str]]): The working memory of the agent.
Source code in vision_agent/agent/vision_agent_coder.py
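The fields that plan_context is described as carrying can be sketched as a typed dictionary. This is illustrative only: `PlanContextSketch` and `missing_fields` are hypothetical names, the library's actual plan context type is not shown on this page, and every field type other than those stated in the return description (plans, plan_thoughts) is an assumption.

```python
from typing import Any, Dict, List, Mapping, TypedDict


class PlanContextSketch(TypedDict):
    """Hypothetical shape mirroring the five fields named in the description."""

    plans: Dict[str, Any]
    best_plan: str  # assumed type
    plan_thoughts: str
    tool_doc: str  # assumed type
    tool_output: str  # assumed type


REQUIRED_FIELDS = set(PlanContextSketch.__annotations__)


def missing_fields(ctx: Mapping[str, Any]) -> List[str]:
    """Return the names of expected fields that are absent from `ctx`, sorted."""
    return sorted(REQUIRED_FIELDS - set(ctx))


ctx: PlanContextSketch = {
    "plans": {},
    "best_plan": "",
    "plan_thoughts": "",
    "tool_doc": "",
    "tool_output": "",
}
```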
generate_code
Generates code and other intermediate outputs from a chat input.
PARAMETER | DESCRIPTION |
---|---|
chat | A conversation in the format of [{"role": "user", "content": "describe your task here..."}]. |
test_multi_plan | Whether to test multiple plans or just the best plan. |
custom_tool_names | A list of custom tool names to use for the planner. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing the code output by the VisionAgentCoder and other intermediate outputs. |

The dictionary includes:
- status (str): Whether the agent completed or failed generating the code.
- code (str): The code output by the VisionAgentCoder.
- test (str): The test output by the VisionAgentCoder.
- test_result (Execution): The result of the test execution.
- plans (Dict[str, Any]): The plans generated by the planner.
- plan_thoughts (str): The thoughts of the planner.
- working_memory (List[Dict[str, str]]): The working memory of the agent.