Lab 5c: Multimodal Agents#

About This Lab#

Throughout this lab, you will encounter two types of interactive elements:

![Activity](images/mlu-activity.png)

No coding is needed for an activity. You try to understand a concept, answer questions, or run a code cell.

![Challenge](images/mlu-challenge.png)

Challenges are where you test your understanding by implementing something new or taking a short quiz.

Please work through this notebook from top to bottom to avoid errors due to missing code or context.

1. Installing dependencies#

In this lab, we will develop custom multimodal tools and agents for two tasks: generating movie posters with matching stories, and checking the authenticity of a product from its image.

%%capture
!pip install -q -r ../requirements.txt

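Let’s import the libraries and modules required for this lab. From mlu_utils.multimodal_utils we import the helper functions invoke_nova_lite_multimodal, prepare_image, and get_base64_encoded_image; invoke_nova_lite_multimodal and get_base64_encoded_image were defined and used in previous labs.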

import sys
sys.path.append('..')

import boto3
import base64
import json
from IPython.display import JSON
import time
from tqdm import tqdm
from botocore.exceptions import ClientError
from IPython.display import Image, display, Markdown, IFrame
from langchain.tools import tool, BaseTool
import io
from PIL import Image as pil_image,  ImageDraw, ImageFont

from mlu_utils.multimodal_utils import invoke_nova_lite_multimodal, prepare_image, get_base64_encoded_image

2. Multimodal agent for image generation and description#

Let’s see how we can develop a multimodal agent with custom tools for an engaging movie poster and story generation application.

Movie poster generation: You provide a prompt or concept for a movie, and the application uses image_generator_tool to create an initial movie poster from it. This tool calls the Stability AI Stable Diffusion 3.5 Large model through Amazon Bedrock.
Poster variation: Once the first poster is generated, image_variation_tool creates a second poster for the same movie. As implemented below, the tool sends a reworded prompt (asking for a different style, color palette, and composition) to the same Stable Diffusion 3.5 Large model, so the second poster may suggest a different genre, mood, or style.
Story generation: The application then uses image_to_story_tool, which is powered by the Amazon Nova Lite multimodal model. The tool takes the path of one poster, analyzes its visual elements, and writes a story or plot synopsis based on those visual cues.
Output: The final output is a set of two visually distinct movie posters and a story or plot synopsis that reflects the imagery. Combining image generation with language generation lets you explore how a creative concept might translate into a movie idea.

The application combines two models: Stability AI Stable Diffusion 3.5 Large for image generation, and Amazon Nova Lite for visual understanding and language generation. Together they let you explore movie ideas and see how visual elements can inspire and shape narratives.

2.1 Custom multimodal tools for image generation and description#

@tool
def image_to_story_tool(image_path: str):
    """Use this tool to generate a story related to a given image. The input of the tool is the path of the image."""
    # Normalize the raw agent input: drop newlines, surrounding whitespace, and quotes
    image_path_clean = image_path.replace("\n", "").strip().strip('"').strip("'")
    # Remove keyword argument syntax like image_path="..."
    if '=' in image_path_clean:
        image_path_clean = image_path_clean.split('=', 1)[1].strip().strip('"').strip("'")
    # Remove any trailing ReAct artifacts the agent may append
    for suffix in ['Observation', 'Thought', 'Action', 'Final Answer']:
        if suffix in image_path_clean:
            image_path_clean = image_path_clean[:image_path_clean.index(suffix)].strip().strip('"').strip("'")
    image_binary, image_type = prepare_image(image_path_clean)
    prompt = "Write an interesting story related to the given image. Produce the response without a preamble. Just write the story."
    response = invoke_nova_lite_multimodal(prompt=prompt, images=image_binary, image_types=image_type) 
    return response

@tool
def add_text_to_image(image_path_and_title: str) -> str:
    """Use this tool to add the title to the movie poster. The input is a string with the image_path and title separated by comma."""
    # Split the single input string into the image path and the title
    image_path, _, text = image_path_and_title.partition(",")
    image_path = image_path.strip().strip('"').strip("'")
    text = text.strip().strip('"').strip("'")

    # Open the image
    image = pil_image.open(image_path)

    # Create a drawing object
    draw = ImageDraw.Draw(image)

    # Define the font and its properties
    font_path = "data/lab4/Agents/FranklinGothic.ttf"  # Replace this with the path to your desired font file
    font_size = 80  # Adjust the font size as needed
    font = ImageFont.truetype(font_path, font_size)

    # Calculate the text position
    text_width = draw.textlength(text, font=font)
    image_width, image_height = image.size
    text_x = (image_width - text_width) / 2  # Center the text horizontally
    text_y = image_height - font_size - 50  # Position the text near the bottom

    # Draw the text on the image
    draw.text((text_x, text_y), text, font=font, fill=(255, 255, 255))  # White text color

    # Save the modified image. The file name is an assumption: the original cell
    # never defined output_path, so we follow the fixed-name pattern of the other tools.
    output_path = "titled_image.png"
    image.save(output_path)

    return output_path

@tool
def image_generator_tool(prompt: str) -> str:
    """
    Generate an image using a text prompt.
    
    Args:
        prompt (str): The text prompt for image generation.
        
    Returns:
        str: The file path of the generated image.
    """
    
    # Initialize AWS client for Bedrock Runtime
    client = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")
    
    # Set request headers
    accept = "application/json"
    content_type = "application/json"
    
    # Set model ID for image generation
    model_id = 'stability.sd3-5-large-v1:0'
    
    # Prepare request body
    body = json.dumps({
        "prompt": prompt
    })
    
    # Invoke the model
    response = client.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    
    # Parse the response
    response_body = json.loads(response.get("body").read())
    finish_reason = response_body.get('finish_reasons', [None])[0]
    if finish_reason is not None:
        raise Exception(f"Image generation error: {finish_reason}")
    img = response_body.get('images')[0]
    base64_bytes = img.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)
    
    # Save the generated image
    image_path = "generated_image.png"
    pil_image.open(io.BytesIO(image_bytes)).save(image_path)
    
    return image_path

@tool
def image_variation_tool(prompt: str):
    """
    Generate a second, alternative version of a movie poster. Use this tool AFTER image_generator_tool to create a different poster for the same movie. The input is the text prompt describing the movie poster.

    Args:
        prompt (str): The text prompt for the alternative poster.

    Returns:
        str: The file path of the generated variation image.
    """

    # Initialize the AWS Bedrock Runtime client
    client = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")

    # Set the request headers and the model ID
    accept = "application/json"
    content_type = "application/json"
    model_id = 'stability.sd3-5-large-v1:0'

    # Create the request body - generate a variation using a modified prompt
    variation_prompt = f"A different artistic interpretation of: {prompt}. Alternative style, different color palette, unique composition."
    body = json.dumps({
        "prompt": variation_prompt
    })

    # Invoke the Bedrock Runtime model for image variation
    response = client.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    # Extract the generated image from the response
    finish_reason = response_body.get('finish_reasons', [None])[0]
    if finish_reason is not None:
        raise Exception(f"Image variation error: {finish_reason}")
    img = response_body.get('images')[0]
    base64_bytes = img.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    # Save the generated image to a file
    output_path = "variation_image.png"
    pil_image.open(io.BytesIO(image_bytes)).save(output_path)

    return output_path
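The input clean-up at the top of image_to_story_tool is easier to follow in isolation. The following is a minimal sketch that copies the same steps into a standalone helper; the name clean_tool_input and the constant REACT_MARKERS are hypothetical and are not part of the lab's utilities.

```python
# Hypothetical helper mirroring the clean-up steps inside image_to_story_tool.
REACT_MARKERS = ("Observation", "Thought", "Action", "Final Answer")


def clean_tool_input(raw: str) -> str:
    """Normalize a single-string tool input produced by a ReAct agent."""
    cleaned = raw.replace("\n", "").strip().strip('"').strip("'")
    # Drop keyword-argument syntax such as image_path="poster.png"
    if "=" in cleaned:
        cleaned = cleaned.split("=", 1)[1].strip().strip('"').strip("'")
    # Cut off any ReAct keyword the model appended after the value
    for marker in REACT_MARKERS:
        if marker in cleaned:
            cleaned = cleaned[: cleaned.index(marker)].strip().strip('"').strip("'")
    return cleaned


print(clean_tool_input('image_path="generated_image.png"\nObservation'))
# prints: generated_image.png
```

One limitation of this approach is that it truncates legitimate paths that contain "=" or one of the ReAct keywords. For example, "posters/Action_hero.png" is cut down to "posters/", so keep file names free of those words.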

2.2 Agentic application for image generation and description#

Let’s define the custom agent that selects the best tool at each planning step using the ReAct logic. The application uses LangChain’s AgentExecutor to orchestrate a workflow built on three tools: an image generator, an image variation tool, and an image-to-story tool.

The agent workflow proceeds as follows:

1. Ingest the user’s movie concept or prompt.
2. Invoke the image generator tool to create an initial movie poster.
3. Call the image variation tool to generate a second poster.
4. Use the image-to-story tool (powered by a multimodal model) to analyze the visual elements of a poster and generate a corresponding plot synopsis. The tool accepts one image path per call.
5. Collate and present the two movie posters and the generated plot synopsis as the final output.

The agent executor acts as the central coordinator: at each step it passes the model’s chosen action to the matching tool, feeds the tool’s result back to the model as an observation, and repeats until the model returns a final answer or the iteration limit is reached.

# define custom agent
def create_custom_agent(tools):
    """
    Creates a custom agent with the given tools and a specific prompt template.

    Args:
        tools (list): A list of tools to be used by the agent.

    Returns:
        AgentExecutor: An instance of the AgentExecutor class with the custom agent.
    """
    import re
    from langchain_aws import ChatBedrockConverse
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain_core.prompts.chat import ChatPromptTemplate
    from langchain.agents.output_parsers import ReActSingleInputOutputParser
    from langchain_core.agents import AgentAction, AgentFinish
    
    class FixedReActOutputParser(ReActSingleInputOutputParser):
        """Custom parser that handles function-call-style actions like tool_name(args)."""
        def parse(self, text: str):
            # Fix function-call-style actions: tool_name(args) -> tool_name + args
            func_call_pattern = r'Action:\s*(\w+)\((.*)\)'
            match = re.search(func_call_pattern, text, re.DOTALL)
            if match:
                tool_name = match.group(1).strip()
                tool_input = match.group(2).strip().strip('"').strip("'")
                # Remove keyword argument syntax like key="value"
                if '=' in tool_input and not any(c in tool_input.split('=')[0] for c in ' /\\.'):
                    tool_input = tool_input.split('=', 1)[1].strip().strip('"').strip("'")
                text = text[:match.start()] + f'Action: {tool_name}\nAction Input: {tool_input}'
            return super().parse(text)
    
    # Initialize the large language model (LLM) with the specified model ID and temperature
    llm = ChatBedrockConverse(
        model="amazon.nova-pro-v1:0",
        temperature=0
    )

    # Define the prompt template for the agent
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                """Answer the following questions as best you can. You have access to the following tools:

                {tools}

                Use the following format:

                Question: the input question you must answer
                Thought: you should always think about what to do
                Action: the action to take, should be one of [{tool_names}]
                Action Input: the input to the action
                Observation: the result of the action
                ... (this Thought/Action/Action Input/Observation can repeat N times)
                Thought: I now know the final answer
                Final Answer: the final answer to the original input question""",
            ),
            ("user", "Begin!\n\nQuestion: {input}\nThought:{agent_scratchpad}")
        ]
    )

    #########################

    # Create the custom agent using the LLM, tools, and prompt
    agent = create_react_agent(llm, tools, prompt, output_parser=FixedReActOutputParser())

    # Create an instance of the AgentExecutor with the custom agent
    return AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True, handle_parsing_errors=True, max_iterations=10)

# Create a list of all relevant tools
tools = [image_to_story_tool, image_generator_tool, image_variation_tool]

# Define the custom agent and provide the agent access to the above tools
movie_agent = create_custom_agent(tools)
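To see what FixedReActOutputParser does before it hands the text to the standard ReAct parser, the rewriting step can be run on its own. This is a minimal sketch that reuses the same regular expression and string handling outside LangChain; the function name normalize_action and the sample model output are made up for illustration.

```python
import re

# Same pattern as in FixedReActOutputParser: matches "Action: tool_name(args)"
FUNC_CALL_PATTERN = r'Action:\s*(\w+)\((.*)\)'


def normalize_action(text: str) -> str:
    """Rewrite 'Action: tool(args)' into the two-line ReAct format."""
    match = re.search(FUNC_CALL_PATTERN, text, re.DOTALL)
    if not match:
        return text  # Already in the expected format
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip().strip('"').strip("'")
    # Remove keyword-argument syntax such as prompt="...", unless the text
    # before '=' looks like a path or a sentence
    if '=' in tool_input and not any(c in tool_input.split('=')[0] for c in ' /\\.'):
        tool_input = tool_input.split('=', 1)[1].strip().strip('"').strip("'")
    return text[:match.start()] + f'Action: {tool_name}\nAction Input: {tool_input}'


# Illustrative model output in function-call style
llm_output = 'Thought: I should draw the poster.\nAction: image_generator_tool(prompt="A sci-fi poster")'
print(normalize_action(llm_output))
```

Text that already follows the "Action:" / "Action Input:" layout does not match the pattern and passes through unchanged.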

Activity: Try it yourself!#

    Try different prompts and observe the posters and the plots of the movies using the custom agent.

# Test out the agent with a simple example
prompt = """Draw a poster for a sci-fi PG-13 movie called 'Paradox' using the image_generator_tool, \
then create a second different poster for the same movie using the image_variation_tool, \
and finally write the story of the movie based on the first poster using the image_to_story_tool."""

response_movie = movie_agent.invoke({"input": prompt})

> Entering new AgentExecutor chain...
Thought: I need to first generate a poster for the sci-fi movie 'Paradox' using the image_generator_tool. After that, I will create a second different poster for the same movie using the image_variation_tool. Finally, I will write the story of the movie based on the first poster using the image_to_story_tool.

Action: image_generator_tool
Action Input: "A sci-fi PG-13 movie poster for 'Paradox' featuring a futuristic cityscape with towering skyscrapers, flying vehicles, and a mysterious protagonist in the foreground. The sky is filled with neon lights and holographic advertisements. The title 'Paradox' is prominently displayed in bold, futuristic font."

Observation

---------------------------------------------------------------------------
AccessDeniedException                     Traceback (most recent call last)
Cell In[18], line 6
# Test out the agent with a simple example
prompt = """Draw a poster for a sci-fi PG-13 movie called 'Paradox' using the image_generator_tool, \
then create a second different poster for the same movie using the image_variation_tool, \
and finally write the story of the movie based on the first poster using the image_to_story_tool."""
----> 6 response_movie = movie_agent.invoke({"input": prompt})

Cell In[14], line 29, in image_generator_tool(prompt)
body = json.dumps({
   "prompt": prompt
})
# Invoke the model
---> 29 response = client.invoke_model(
   body=body, modelId=model_id, accept=accept, contentType=content_type
)
# Parse the response
response_body = json.loads(response.get("body").read())

AccessDeniedException: An error occurred (AccessDeniedException) when calling the InvokeModel operation: Model access is denied due to IAM user or service role is not authorized to perform the required AWS Marketplace actions (aws-marketplace:ViewSubscriptions, aws-marketplace:Subscribe) to enable access to this model. Refer to the Amazon Bedrock documentation for further details. Your AWS Marketplace subscription for this model cannot be completed at this time. If you recently fixed this issue, try again after 5 minutes.

Here’s the first movie poster:#

Image("generated_image.png", width=300)

Here’s the second movie poster:#

Image("variation_image.png", width=300)

And here’s the plot of the movie:#

Markdown("<i>"+ response_movie['intermediate_steps'][-1][1] +"</i>")
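Indexing intermediate_steps with [-1] assumes that the story tool was the last tool the agent called. A slightly more defensive option, sketched below, is to look up the observation by tool name. Each entry in intermediate_steps is an (action, observation) pair whose action exposes a tool attribute; the sketch uses a namedtuple as a stand-in for that action object so that it runs without LangChain, and the helper name last_observation_for and the sample steps are hypothetical.

```python
from collections import namedtuple

# Stand-in for the agent action object; only the fields used here are modeled
FakeAction = namedtuple("FakeAction", ["tool", "tool_input"])


def last_observation_for(intermediate_steps, tool_name):
    """Return the most recent observation produced by tool_name, or None."""
    for action, observation in reversed(intermediate_steps):
        if action.tool == tool_name:
            return observation
    return None


# Placeholder steps shaped like the agent's intermediate_steps
steps = [
    (FakeAction("image_generator_tool", "poster prompt"), "generated_image.png"),
    (FakeAction("image_variation_tool", "poster prompt"), "variation_image.png"),
    (FakeAction("image_to_story_tool", "generated_image.png"), "placeholder story text"),
]

print(last_observation_for(steps, "image_to_story_tool"))
```

With a real run, the same helper could be called as last_observation_for(response_movie['intermediate_steps'], "image_to_story_tool"), and a None result signals that the agent never reached the story step.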

Challenge: Use all the tools#

    In the example above, we did not need to use the add_text_to_image tool, and it was not in the list of tools given to the agent. Add it to the tools list, recreate the agent, and craft a prompt so that this tool is also used.

### Enter your code below

###

3. Multimodal agent for retrieval-based responses#

This multimodal agent helps check the authenticity of a physical product by combining web search with multimodal image comparison. The agent has access to three custom tools, discussed in the next section.

3.1 Custom multimodal tools for retrieval#

Let’s define three custom tools that allow the agent to retrieve results from websites to authenticate the product in the image and to suggest websites where the authentic product may be purchased.

Image comparison (image_comparison): This tool uses the multimodal capabilities of Amazon Nova Lite to compare two product images. It receives both image paths or URLs in one comma-separated string and returns an analysis of whether the images are identical, similar, or distinct, based on visual properties such as shape, color, and design details.
Product web search (product_web_search): This tool runs a DuckDuckGo text search for a short query containing the product ID and brand name, and returns up to five results that the agent can use to find a website about the product.
Image web search (image_web_search): This tool runs a DuckDuckGo image search for a text prompt and returns the URL of the first image result, with any query string removed, so that the agent has a reference image to compare against.

@tool
def image_comparison(image_paths:str):
    """Use this tool to compare and contrast two images. The input of the tool is a string consisting of both image paths separated by comma. Image paths can be local paths or urls."""
    image_paths_arr = [f.strip().replace("\n", "").strip('"').strip("'") for f in image_paths.split(',')]
    image_binary, image_type = prepare_image(image_paths_arr)
    prompt = "Compare and contrast the two images. Share insights on if they are completely identical, similar or distinct. Produce the response without a preamble. Just write the analysis."
    response = invoke_nova_lite_multimodal(prompt=prompt, images=image_binary, image_types=image_type)

    return response

from ddgs import DDGS

@tool
def product_web_search(prompt:str):
    """Search online for a website about a product. The input is the prompt with the product ID and the brand name. The prompt needs to be under 45 characters."""
    search_tool = DDGS()
    time.sleep(1)  # Add delay between requests
    response = search_tool.text(query=prompt, max_results=5, region='us-en', safesearch='on')
    
    return response


@tool
def image_web_search(prompt:str):
    """Search the web for images of a product based on the prompt. The input is the prompt or query used to search for images of a product."""
    search_tool = DDGS()
    time.sleep(10)  # Add delay between requests
    response = search_tool.images(query=prompt, region='us-en', max_results=1)[0]['image'].partition("?")[0]
    time.sleep(10)  # Add delay between requests
    return response

Challenge: Create custom tools#

    The agent's ability to pick the appropriate tool is crucial in achieving the desired response. Let's explore this ability by providing the agent with many more tools. Create a few more useful tools in the cell below and test the agent's response when it has to select from many more tools.

### Enter your code below

###

3.2 Agentic application based on retrieval#

Let’s define the custom agent, similar to the previous example, that will select the best tool at each planning step using the ReAct logic.

The multimodal agent’s workflow is as follows:

1. When presented with an image of a product, the agent uses the image web search tool to find a reference image of the product online.
2. Using the image comparison tool, the agent compares the reference image from the web with the image of the physical product in question.
3. Based on the comparison results, the agent assesses whether the physical product is likely to be authentic or an imitation.
4. Finally, it searches online for a website where the user may purchase the authentic product.

This multimodal agent combines web search with multimodal image comparison to support product authentication. Its assessment depends on the quality of the search results and on the model’s reading of the two images, so treat it as an aid for verification rather than a definitive verdict.

retrieval_tools = [product_web_search, image_web_search, image_comparison]
retrieval_agent = create_custom_agent(retrieval_tools)

Image("content/Agents/nike.jpg", width=300)

Activity: Try it yourself!#

    Try different images of products and observe how the agent authenticates the product using the tools.

If you get a RateLimitException, wait a few minutes before trying again. DuckDuckGo search is free to use and limits the number of requests.

prompt = """I have an image of a shoe at "./content/Agents/nike.jpg".
Can you check if they are Nike Air Jordan 1 Low SE FN5214-131?
If they are, find the link to the website where I can find the product. Do not generate clickable links in the output."""
try:
    response_shoes = retrieval_agent.invoke({"input": prompt})
except Exception as e:
    # Fall back to an empty response so that the cells below still run
    response_shoes = {}
    if "403" in str(e):
        print("\nRateLimitException raised. Wait a few minutes before trying again.")
    else:
        print(f"\nAn unexpected error occurred: {e}")
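Instead of rerunning the cell by hand after a rate limit, the waiting can be automated. The following is a minimal sketch of a generic retry helper with exponential backoff. The name call_with_backoff and its parameters are hypothetical, and it retries on any exception for simplicity; in practice you would narrow the except clause to the rate-limit error raised by your search library.

```python
import time


def call_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the last attempt has failed
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a function that fails twice before succeeding
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("403 Ratelimit")
    return "search results"


print(call_with_backoff(flaky_search, base_delay=0.01))
# prints: search results
```

Wrapping only the search call inside a tool, rather than the whole agent run, avoids repeating model invocations that already succeeded.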

Here’s the response about the authenticity of the product:#

⚠️ IMPORTANT SECURITY NOTICE ⚠️#

The URLs displayed in this educational notebook are REAL URLs. While they are shown for educational purposes, we strongly advise:

DO NOT click on or visit these URLs
DO NOT use them for further exploration
DO NOT assume they are safe or vetted

This notebook is for demonstration purposes only. Visiting unknown URLs can expose you to security risks, malware, or inappropriate content. Always practice safe browsing habits and only visit trusted, verified websites.

If you need to explore web resources, please use official documentation and trusted sources.

if 'output' in response_shoes:
    result = Markdown("<i>"+response_shoes['output'] + "</i>")
else:
    result = Markdown("No field `output` found in response data")
result

Here’s the response about the webpage to purchase the original product.#

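if 'intermediate_steps' in response_shoes and response_shoes['intermediate_steps']:
    # Observation of the last tool call: a string, or a list of search results
    url = response_shoes['intermediate_steps'][-1][1]
    # Expressions inside an if block are not rendered automatically, so call display()
    if isinstance(url, str):
        display(Markdown(f"<i>{url}</i>"))
    else:
        display(JSON(url))
else:
    display(Markdown("No intermediate steps found in response data"))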

4. Quizzes#

    ![Challenge](images/mlu-challenge.png)

Challenge: Try it Yourself!#

    Answer the following questions to test your understanding of building multimodal agents with custom tools.

from mlu_utils.quiz_questions import lab5c_question1, lab5c_question2

lab5c_question1.display()
lab5c_question2.display()

Conclusion#

In this lab, you have:

    Developed custom multimodal tools for image generation and description
    Created an agentic application for movie poster generation and storytelling
    Built custom multimodal tools for retrieval-based responses
    Implemented a product authentication agent using web search and image comparison

Additional Resources#

    LangChain Agents Documentation
    Amazon Bedrock Documentation

Thank you!#