opensource functionsmith command-line agent
PiperOrigin-RevId: 714292658
The Google Earth Engine Community Authors authored and copybara-github committed Jan 11, 2025
1 parent 4649d8c commit f5d8bb3
Showing 7 changed files with 1,204 additions and 0 deletions.
55 changes: 55 additions & 0 deletions experimental/functionsmith/README.md
@@ -0,0 +1,55 @@
<!--
Copyright 2024 The Google Earth Engine Community Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Functionsmith

## Overview

Functionsmith is a general-purpose problem-solving agent.

USING THIS AGENT IS UNSAFE. It directly runs LLM-produced code, and thus
should only be used for demonstration purposes.

This agent uses free-form function calling: instead of relying on a fixed set
of tools predefined in the agent, as in
[normal LLM function calling](https://ai.google.dev/gemini-api/docs/function-calling),
we let the agent write all the functions it needs itself.

The functionsmith system prompt asks the agent to first write any low-level
functions it needs, along with tests for them. The agent loop tries to run
these functions and asks the LLM to make corrections if necessary. Once all
the functions are ready, the agent writes and runs the code that solves the
actual user task.
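
The shape of this loop can be sketched in a few lines. This is only a toy
illustration: the LLM is replaced by canned responses, and `re`/`exec` stand in
for the real `llm`, `code_parser`, and `executor` modules used by `agent.py`.

```python
import re

# Canned "LLM" responses standing in for real model calls: first define and
# test a helper function, then use it to answer the actual question.
canned_responses = iter([
    "```python\n"
    "import math\n"
    "def factorial(x):\n"
    "  return math.factorial(x)\n"
    "def test_factorial():\n"
    "  assert factorial(4) == 24, 'factorial(4) should be 24'\n"
    "  print('success')\n"
    "test_factorial()\n"
    "```",
    "```python\nprint(factorial(10))\n```",
])

def fake_llm(prompt):
  # The stub ignores the prompt; a real agent would send it to an LLM.
  return next(canned_responses)

accumulated_code = []  # source of everything the "LLM" has written so far
question = 'Please compute the factorial of 10.'
for _ in range(2):
  response = fake_llm(question)
  snippet = re.search(r'```python\n(.*?)```', response, re.DOTALL).group(1)
  accumulated_code.append(snippet)
  # Re-run all accumulated code so earlier definitions stay available, then
  # feed the output (or any error) back to the model as the next question.
  exec('\n'.join(accumulated_code), {})
  question = 'The code ran without errors.'
```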

The agent does not use the function-calling features of LLM clients. Instead,
it simply tries to parse all the ```python or ```tool_use sections present in
the raw LLM output. It keeps all function definitions, as well as their source
code, in memory. Each call to the LLM is preceded by the function definitions
so that the LLM knows which functions are available locally.
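
As a rough sketch of that parsing step (an illustration only, not the actual
`code_parser` implementation), the fenced blocks can be pulled out with a
regular expression and each function's signature recorded with `ast` so it can
be prepended to the next prompt:

```python
import ast
import re

fence = '`' * 3  # built indirectly so this README code block stays intact
llm_output = (
    'Here is a helper function.\n'
    f'{fence}python\n'
    'def greet(name):\n'
    '  """Returns a greeting for the given name."""\n'
    '  return "Hello, " + name\n'
    f'{fence}\n'
)

# Extract every fenced ```python block from the raw LLM text.
blocks = re.findall(r'```python\n(.*?)```', llm_output, re.DOTALL)

# Record the name, arguments, and docstring of each function definition so the
# next prompt can tell the LLM which functions are already available locally.
signatures = []
for block in blocks:
  for node in ast.walk(ast.parse(block)):
    if isinstance(node, ast.FunctionDef):
      args = ', '.join(arg.arg for arg in node.args.args)
      signatures.append(f'{node.name}({args}): {ast.get_docstring(node)}')

print('The following functions are available:\n' + '\n'.join(signatures))
```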

The functions are not saved permanently, though this feature can be added.
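
If persistence were added, one simple approach would be to dump the
accumulated function source to a local module and read it back on startup.
This is purely a sketch: the file name and the plain name-to-source mapping
are assumptions, and the real agent keeps richer parsed objects in memory.

```python
import pathlib

TOOLS_PATH = pathlib.Path('functionsmith_tools.py')  # assumed file name

def save_functions(functions: dict[str, str]) -> None:
  """Writes the source of every known function to a local file."""
  TOOLS_PATH.write_text('\n\n'.join(functions.values()))

def load_functions() -> str:
  """Returns previously saved function source, or '' if none was saved."""
  return TOOLS_PATH.read_text() if TOOLS_PATH.exists() else ''

# Example round trip.
save_functions({'double': 'def double(x):\n  return 2 * x'})
print(load_functions())
```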

<!-- TODO(simonf): add a notebook version of the general-purpose agent,
as well as an Earth Engine-specific notebook agent. -->

## Attribution

Functionsmith was written by Simon Ilyushchenko (simonf@google.com).
I am grateful to Renee Johnston and other Googlers for implementation advice,
as well as to Earth Engine expert advisors Jeffrey Cardille, Erin Trochim,
Morgan Crowley, and Samapriya Roy, who helped me choose the right training
tasks.
239 changes: 239 additions & 0 deletions experimental/functionsmith/agent.py
@@ -0,0 +1,239 @@
"""Functionsmith is a general-purpose problem-solving agent.
It writes functions for its own future use, tests the functions, and then uses
them to solve a user-specified problem.
USING THIS AGENT IS UNSAFE. It directly runs LLM-produced code, and thus
should only be used for demonstration purposes.
To get started, get a data file:
```
wget https://raw.githubusercontent.com/davidmegginson/ourairports-data/refs/heads/main/airports.csv
```
then run
```
python3 agent.py
```
See README.md for more information.
To execute the sample task analyzing airports, get the airports.csv file from
https://raw.githubusercontent.com/davidmegginson/ourairports-data/refs/heads/main/airports.csv
The agent will ask the LLM to find "something interesting" about the data
given its schema. Then the LLM will probably create one or two sets of
low-level functions with tests, then actually analyze the file, then stop
and ask the user if they want to do anything else.
Before each code execution phase, the agent will print the code and ask
the user to hit "enter" to confirm the code looks safe to run.
"""

import copy
import inspect
import logging
import os
import sys

import code_parser
import executor
import llm


STARS = '*' * 20 + '\n'


class CustomLoggingHandler(logging.Handler):

  def emit(self, record):
    msg = self.format(record)
    print(msg)


system_prompt = """
To solve the task given below, first write low-level Python functions, with
tests for each of them, in a ```python block. Include all the necessary imports.
The tests should be as simple as possible and not rely on anything external.
All asserts in tests should have an error message to make sure their failure is
easy to detect.
In later responses, never omit parts of the code by referring to earlier
output - if you need to do this, define a function and then call it later.
I will save the functions locally, and you can write higher-level code that
will invoke them later. I will pass you the output from the code or any error
messages.
Call the task_done() function when you consider the task done.
Ask the user questions if you need additional input.
If I ask you to compute the factorial of 10 and then prompt the user if they
want more factorials computed, your responses should look like the example
chat session below (return one response at a time, each in a separate answer):
Question 1:
Please compute the factorial of 10
Response 1:
Let's define the requested function and test it.
```python
import math
def factorial(x):
  return math.factorial(x)

def test_factorial():
  assert factorial(3) == 6, 'factorial(3) should be 6'
  assert factorial(4) == 24, 'factorial(4) should be 24'
  print('success')

test_factorial()
```
Question 2:
The code output was "success"
Response 2:
Now let's call the previously defined function to solve the user task.
```python
print(factorial(10))
```
Question 3:
The code output was "3628800"
Response 3:
The computed answer looks reasonable. Please enter a number if you want another factorial to be computed, or instruct me to exit.
Question 4:
You can exit here
Response 4:
```python
task_done('We can exit')
```
"""
if not os.path.exists('airports.csv'):
  print(
      """Download
https://raw.githubusercontent.com/davidmegginson/ourairports-data/refs/heads/main/airports.csv
if you'd like to run this task.
""",
      file=sys.stderr,
  )
  sys.exit(1)

schema = """
"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
"""

task = f"""
Please explore a local file airports.csv. First, make some hypotheses about the
data, and then write code to test them to learn something interesting about the
data. By 'interesting', I mean something you wouldn't have guessed from first
principles - eg, finding that the largest countries have the most airports is
not interesting. Explain why what you discovered seems interesting. When done,
ask the user if they want to find out something else about this file. Output
findings in text form, not as images or plots.
Do not overwrite the original file in your code or tests.
The file has the following schema: {schema}"""

# If you need to debug the agent, use this simple task.
# task = """
# Compute the factorial of 20. When done, ask the user in a chat response
# if they want to compute another factorial and compute it if they give you
# a new value"""

# This code works with several different LLMs. Uncomment the one you
# have access to. Make sure to set the API key in the appropriate
# environment variable (GOOGLE_API_KEY, ANTHROPIC_API_KEY, or OPENAI_API_KEY).
agent = llm.Gemini(system_prompt, model_name='gemini-2.0-flash-exp')
# agent = llm.Claude(system_prompt, model_name='claude-3-5-sonnet-20241022')
# agent = llm.ChatGPT(system_prompt, model_name='o1-mini')


def task_done(agent_message: str) -> None:
  """Returns control back to the user when the agent thinks the task is done.

  This function must always be invoked in a separate response, not at the end
  of a code snippet doing something else.

  Args:
    agent_message(str): the message that the agent wants to print before exit.
  """
  print(agent_message)
  import sys  # pylint:disable=g-import-not-at-top,redefined-outer-name,reimported

  sys.exit(0)


syscalls = {}

# Set up a custom logger to be passed to helper objects.
# This is an overkill for the command-line agent, but makes more sense
# for the notebook version of this agent.
logger = logging.getLogger('functionsmith')
logger.handlers = []
logger.addHandler(CustomLoggingHandler())
logger.propagate = False

# Create the object that parses Python code out of LLM responses.
parser = code_parser.Parser(logger)
# Create the object that runs the LLM-generated Python code.
code_executor = executor.Executor(logger)

# 'Syscalls' are functions for which stdout/stderr won't be intercepted.
# For now we only have one syscall, 'task_done'.
for f in [task_done]:
  starting_tools = parser.extract_functions(inspect.getsource(f))
  syscalls.update(starting_tools.functions)

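# Functions the LLM has defined so far; they are kept in memory only.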
tools = {}

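# The first prompt sent to the LLM is the user task itself; later iterations
# feed back code output, parser errors, or user input.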
question = task

while True:
  print(STARS)
  all_tools = copy.deepcopy(tools)
  all_tools.update(syscalls)
  question_with_tools = (
      question
      + 'The following functions are available:\n'
      + '\n'.join([str(x) for x in all_tools.values()])
  )
  response = agent.chat(question_with_tools)
  print(response)

  parsed_response = parser.extract_functions(response)
  if not parsed_response.code and not parsed_response.functions:
    if parsed_response.error:
      question = parsed_response.error
      continue
    question = input('> ')
    continue

  tools.update(parsed_response.functions)

  if parsed_response.code:
    # Concatenate all known source code together.
    # Functions that were defined in the most recent response will be
    # repeated, which is okay.
    code_with_tools = (
        '\n'.join([x.code for x in tools.values()])
        + '\n'
        + parsed_response.code
    )

    print(STARS)
    input('HIT ENTER TO RUN THIS CODE')
    print(STARS)
    question = code_executor.run_code(
        code_with_tools, {'task_done': task_done}
    )
  else:
    # The response had functions but no code. The agent wanted to define them.
    # We tell it to go on (that is, to keep writing code).
    question = 'go on'
