AutoGPT: What is it?
If you haven’t read the recent articles, AutoGPT is an open-source project that chains together multiple agents to let an AI work independently toward a prescribed objective. If you aren’t sure what that means, watch the entertaining video below and keep reading!
What’s an Agent?
In ChatGPT, the interaction pattern is a direct call and response: the user enters a prompt and the AI returns a direct response. Agents build a layer of reasoning on top of that, and are well explained in the LangChain library.
In the example below (from LangChain), the agent uses the ReAct framework for reasoning, described in this paper. The agents are equipped with “tools” that allow them to call APIs or other functions under predefined scenarios.
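To make the reasoning loop concrete, here is a minimal sketch of a ReAct-style agent in plain Python. All names here are illustrative, not LangChain’s actual API, and the “LLM” is mocked so the control flow is visible: the model emits Thought/Action lines, the loop runs the chosen tool, and the Observation is fed back until a Final Answer appears.

```python
def mock_llm(transcript: str) -> str:
    # Stand-in for a real language model call.
    if "Observation:" not in transcript:
        return "Thought: I need current information.\nAction: search[latest AutoGPT release]"
    return "Thought: I have what I need.\nFinal Answer: AutoGPT is under active development."

def search_tool(query: str) -> str:
    # Stand-in for a real search API call.
    return f"Top result for '{query}'"

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        output = mock_llm(transcript)
        transcript += "\n" + output
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        if "Action: search[" in output:
            # Parse the tool invocation and append its result as an Observation.
            query = output.split("Action: search[", 1)[1].rstrip("]")
            transcript += f"\nObservation: {search_tool(query)}"
    return "No answer within step budget."

print(run_agent("What is the latest AutoGPT release?"))
```

The key design point is that the tool result is appended to the transcript as an Observation, so the model’s next “thought” is grounded in real data rather than its own guesses.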
For a little more depth on this example: the search tool can use Bing Search and is defined like this:
"""Tool that adds the capability to query the Bing search API."""
name = "Bing Search"
description = (
"A wrapper around Bing Search. "
"Useful for when you need to answer questions about current events. "
"Input should be a search query."
)
api_wrapper: BingSearchAPIWrapper
So when the LLM determines it needs information about current events, it can select this tool. When selected, it runs this block to execute an API-based search call:
def _bing_search_results(self, search_term: str, count: int) -> List[dict]:
    headers = {"Ocp-Apim-Subscription-Key": self.bing_subscription_key}
    params = {
        "q": search_term,
        "count": count,
        "textDecorations": True,
        "textFormat": "HTML",
    }
    response = requests.get(
        self.bing_search_url, headers=headers, params=params  # type: ignore
    )
    response.raise_for_status()
    search_results = response.json()
    return search_results["webPages"]["value"]
These results are then fed back to the language model to form an observation and, eventually, a final answer that can be presented to a user (or used for something else, in the case of AutoGPT).
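To show that hand-off, here is a hedged sketch of how the raw result dictionaries might be condensed into an observation string before being handed back to the model. The exact formatting in LangChain differs; this only illustrates the idea. The "name" and "snippet" fields match what Bing's webPages results contain.

```python
def results_to_observation(results: list[dict], limit: int = 3) -> str:
    # Condense the top results into a compact string for the LLM.
    lines = []
    for item in results[:limit]:
        lines.append(f"- {item['name']}: {item['snippet']}")
    return "Observation:\n" + "\n".join(lines)

sample = [{"name": "AutoGPT repo", "snippet": "An autonomous GPT-4 agent."}]
print(results_to_observation(sample))
```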
The three agents of AutoGPT
The agent above executes search, but agents can be equipped with many kinds of tools. Writesonic does a great job describing and visualizing the three agents of AutoGPT:
Task Creation Agent: When you enter your goals in AutoGPT, the first AI agent you interact with is the task creation agent. Based on your goals, it will create a list of tasks with steps to achieve them and send it to the prioritization agent.
Task Prioritization Agent: After receiving the list of tasks, the prioritization AI agent ensures the sequence is correct and makes logical sense before sending it to the execution agent.
Task Execution Agent: Once prioritization is done, the execution agent completes one task after another. This involves tapping into GPT-4, the Internet, and other resources to get results.
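The create → prioritize → execute pipeline described above can be sketched in a few lines. The function bodies here are stubs (real AutoGPT delegates each step to an LLM call); the point is the data flow between the three agents.

```python
def create_tasks(goal: str) -> list[str]:
    # Task Creation Agent: turn a goal into candidate tasks.
    return [f"Research: {goal}", f"Summarize findings on: {goal}"]

def prioritize_tasks(tasks: list[str]) -> list[str]:
    # Task Prioritization Agent: reorder so prerequisites come first.
    return sorted(tasks, key=lambda t: 0 if t.startswith("Research") else 1)

def execute_task(task: str) -> str:
    # Task Execution Agent: would tap GPT-4, the internet, and other resources.
    return f"done: {task}"

goal = "AutoGPT architecture"
results = [execute_task(t) for t in prioritize_tasks(create_tasks(goal))]
print(results)
```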
What can we learn from AutoGPT?
The nice thing about all this development happening in the open is that we can dig into the guts a bit and see how the pieces function. For example, even at the base prompt level you can dig through the libraries and see pressure-tested prompts used to avoid hallucination in question answering, like the one below:
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
Given that, let’s take a look at the three agents of AutoGPT and see what we can learn:
Task Creation Agent & Task Prioritization Agent
Both of these seem to be handled as instances of the base Agent class, which carries this commented description:
class Agent:
    """Agent class for interacting with Auto-GPT.

    Attributes:
        ai_name: The name of the agent.
        memory: The memory object to use.
        full_message_history: The full message history.
        next_action_count: The number of actions to execute.
        system_prompt: The system prompt is the initial prompt that defines everything
            the AI needs to know to achieve its task successfully.
            Currently, the dynamic and customizable information in the system prompt are
            ai_name, description and goals.

        triggering_prompt: The last sentence the AI will see before answering.
            For Auto-GPT, this prompt is:
            Determine which next command to use, and respond using the format specified
            above:
            The triggering prompt is not part of the system prompt because between the
            system prompt and the triggering prompt we have contextual information that
            can distract the AI and make it forget that its goal is to find the next
            task to achieve.

            SYSTEM PROMPT
            CONTEXTUAL INFORMATION (memory, previous conversations, anything relevant)
            TRIGGERING PROMPT

            The triggering prompt reminds the AI about its short term meta task
            (defining the next task)
    """
The actual code leaves most of the prioritization up to the AI, an area where specialized agents could be equipped with a more specific prioritization mechanism.
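As a thought experiment, a more specific prioritization mechanism could score tasks with an explicit heuristic instead of asking the LLM to order them. This is a hypothetical sketch, not AutoGPT code; the "depends_on" and "effort" fields are made up for illustration.

```python
def prioritize(tasks: list[dict]) -> list[dict]:
    # Order deterministically: fewest unmet dependencies first,
    # then lowest estimated effort.
    return sorted(tasks, key=lambda t: (len(t["depends_on"]), t["effort"]))

tasks = [
    {"name": "write report", "depends_on": ["gather data"], "effort": 3},
    {"name": "gather data", "depends_on": [], "effort": 2},
]
print([t["name"] for t in prioritize(tasks)])
```

A deterministic pass like this is cheaper and more predictable than an LLM call, at the cost of flexibility; a hybrid (heuristic first, LLM tie-breaking) is another option.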
triggering_prompt = (
    "Determine which next command to use, and respond using the"
    " format specified above:"
)

agent = Agent(
    # ...other constructor arguments...
    system_prompt=system_prompt,
    triggering_prompt=triggering_prompt,
)
Execution Agent
Here’s the function the execution agent uses to run a command and return its result. There are also a number of useful utilities in the codebase to make sure this works, or to repair the process if it doesn’t:
"""Execute the command and return the result
Args:
command_name (str): The name of the command to execute
arguments (dict): The arguments for the command
Returns:
str: The result of the command
"""
Agent Manager
While not illustrated above, the system also contains an agent manager which allows for communication across agents. This is a useful technique for multi-agent systems.
@command("message_agent", "Message GPT Agent", '"key": "<key>", "message": "<message>"')
def message_agent(key: str, message: str) -> str:
"""Message an agent with a given key and message"""
@command("delete_agent", "Delete GPT Agent", '"key": "<key>"')
def delete_agent(key: str) -> str:
"""Delete an agent with a given key
What’s next?
The ongoing debate revolves around whether AutoGPT-like structures can lead to artificial general intelligence (AGI). Whether or not this is the path to AGI, we can expect more specialized goal-driven agent systems to be developed on this model, and with further tools and enhancements to large language models (LLMs) it may get closer. One way to extend the experiment would be to add tools that let the AI compile training data sets, fine-tune new models, and replace its core LLM with a self-created model.
Update: Just after posting, I saw this self-fine-tuning system in the works:
https://twitter.com/danielgross/status/1648728342940758016?s=20
Nevertheless, in the short term, while AGI captures the imagination, companies can generate immediate revenue by using high-performing AI agents designed for specific use cases.
Generative Post produced by Gen AI Partners