Building an AI-Safe Tool-Calling Proxy with FastAPI

A friend of mine works at a software company. Last month, someone asked the company’s AI agent to get rid of test accounts. The agent interpreted that broadly and started firing off delete requests against the customer database. Two minutes and forty-seven deletions later, someone pulled the plug. The article “The New Insider Threat Is Your Own AI Agent” expands on the same threat.

Luckily, the records were recoverable, and nobody outside the team found out. Still, nobody can rely on a system where an AI agent holds the keys to the live database with nothing in between. As discussed in this blog on HackerNoon, the real culprit is not the AI but the people who failed to supervise it and set no rules for the agent.

In this tutorial, I will show you how to build guardrails for an AI agent. You will end up with a FastAPI proxy that sits between your AI agent and your APIs. The proxy checks whether a tool is allowed, enforces per-user rate limits, tracks a daily request budget, and keeps a record of every call, whether it went through or got blocked.

Why Direct AI-to-API Access Is Risky

By giving an AI agent access to a real API, you trust that it will interpret every prompt exactly the way you intended. That assumption holds in controlled testing (your local machine or a staging setup). In production, it breaks down because real users phrase things in unpredictable ways, and there are edge cases the model was never prompted for.

Three failure modes show up repeatedly:

Runaway actions: the agent misinterprets a prompt and hammers a destructive endpoint (the deleted-accounts scenario)
Runaway volume: one user asks the agent to “check every customer,” and it starts looping through thousands of records
Scope creep: the agent calls a tool outside its intended role because permissions are only defined in the system prompt, not enforced at the infrastructure level

A proxy fixes all three by intercepting every tool call before it reaches the real API, checking whether it’s allowed, and logging what happened regardless of outcome.

The Architecture

The pattern is straightforward:

AI Agent → FastAPI Guardrail Proxy → Real APIs (CRM, billing, ticketing)

The agent never talks to the downstream APIs directly. The proxy holds the credentials, which is useful in practice: if you need to change your CRM API key, you only have to do it in the proxy.

Project Setup

You need to install the following dependencies. FastAPI is the web framework the proxy is built on. Pydantic validates that incoming data has the expected fields and types. Uvicorn is the server that runs the FastAPI app on your computer.

pip install fastapi pydantic uvicorn

Structure

ai_safe_proxy/
├── main.py
├── tool_registry.py
├── audit_log.py
└── requirements.txt

tool_registry.py defines which tools the AI agent can use and how often it can use them. audit_log.py makes sure that every tool call gets written to disk. main.py is where the FastAPI app lives. requirements.txt lists the three dependencies installed above.

Step 1: Tool Registry

The registry is the proxy’s source of truth. Tools are the functions the AI agent can use to act on the outside world. If a tool is not in the registry, the proxy will not let the AI agent call it.

method is the HTTP method the tool uses (GET to read something, POST to create something, PATCH to update, and DELETE to delete). risk is the danger level of the action. max_calls_per_minute sets how many calls per minute are allowed before the proxy interferes and stops the agent.

The important point is that delete_customer has risk set to “blocked” and max_calls_per_minute set to 0. The proxy sees this and refuses to let the AI agent call delete_customer.

# tool_registry.py
ALLOWED_TOOLS = {
    "get_customer": {
        "method": "GET",
        "risk": "low",
        "max_calls_per_minute": 30,
    },
    "create_ticket": {
        "method": "POST",
        "risk": "medium",
        "max_calls_per_minute": 10,
    },
    "update_lead_status": {
        "method": "PATCH",
        "risk": "medium",
        "max_calls_per_minute": 10,
    },
    "delete_customer": {
        "method": "DELETE",
        "risk": "blocked",
        "max_calls_per_minute": 0,
    },
}
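As a quick sanity check on the registry, a small helper can list which tools the agent may actually call. The sketch below uses a trimmed copy of the registry so it runs on its own; callable_tools is a hypothetical helper for illustration, not part of the proxy code in this tutorial.

```python
# Trimmed copy of the registry so the snippet is self-contained.
ALLOWED_TOOLS = {
    "get_customer": {"method": "GET", "risk": "low", "max_calls_per_minute": 30},
    "create_ticket": {"method": "POST", "risk": "medium", "max_calls_per_minute": 10},
    "delete_customer": {"method": "DELETE", "risk": "blocked", "max_calls_per_minute": 0},
}


def callable_tools(registry: dict) -> list:
    """Return the names of tools that are registered and not blocked (hypothetical helper)."""
    return sorted(
        name
        for name, spec in registry.items()
        if spec["risk"] != "blocked" and spec["max_calls_per_minute"] > 0
    )


print(callable_tools(ALLOWED_TOOLS))  # ['create_ticket', 'get_customer']
```

delete_customer is registered but never appears in the output, which is the behaviour the proxy should enforce.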

Step 2: Validation

Validation precedes tool execution. Every incoming request is checked against a Pydantic model. If the data does not match, FastAPI rejects it with a 422 (Unprocessable Entity) error before any of the proxy’s own checks run.

ToolCallRequest is the Pydantic model that every incoming request is validated against. user_id identifies the user making the request. tool_name is the name of the tool the AI agent wants to call. arguments is a dictionary holding the data passed to the tool (empty by default). request_id is a unique ID per request that ties the call back to the AI conversation that triggered it. If something goes wrong, you can search the audit log for that ID to see which tool the agent decided to call and whether the proxy let the call through, then look up the matching conversation to see what the user typed. Without request_id, debugging an AI incident is hard because you would have to guess which conversation produced which call.

# main.py
from typing import Any, Dict

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()


class ToolCallRequest(BaseModel):
    user_id: str = Field(..., min_length=1, max_length=64)
    tool_name: str = Field(..., min_length=1, max_length=64)
    arguments: Dict[str, Any] = Field(default_factory=dict)
    request_id: str = Field(..., min_length=1, max_length=128)
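To see what the model accepts and rejects without starting a server, you can validate payloads directly. This is a minimal sketch that assumes Pydantic v2 (the same version that provides model_dump); invalid_fields is a hypothetical helper added for the demonstration.

```python
from typing import Any, Dict

from pydantic import BaseModel, Field, ValidationError


class ToolCallRequest(BaseModel):
    user_id: str = Field(..., min_length=1, max_length=64)
    tool_name: str = Field(..., min_length=1, max_length=64)
    arguments: Dict[str, Any] = Field(default_factory=dict)
    request_id: str = Field(..., min_length=1, max_length=128)


def invalid_fields(payload: dict) -> list:
    """Return the names of fields that fail validation (hypothetical helper)."""
    try:
        ToolCallRequest.model_validate(payload)
        return []
    except ValidationError as exc:
        # Each error carries a "loc" tuple whose first item is the field name.
        return sorted({str(err["loc"][0]) for err in exc.errors()})


good = {"user_id": "u1", "tool_name": "get_customer", "request_id": "req-001"}
bad = {"user_id": "", "tool_name": "get_customer"}  # empty user_id, missing request_id

print(invalid_fields(good))  # []
print(invalid_fields(bad))   # ['request_id', 'user_id']
```

When the same bad payload arrives over HTTP, FastAPI turns these validation errors into the 422 response.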

Step 3: Permissions and Rate Limits

The check_tool_permission() function checks whether the tool is in the registry. It raises a 403 error if the tool isn’t registered or its risk is “blocked” (like delete_customer). The check_rate_limit() function tracks how many calls a user has made in the last minute and compares that count with the tool’s max_calls_per_minute. If a user’s agent goes into a runaway loop, that user’s requests are throttled with a 429 error. The other users remain unaffected.

Note: RATE_LIMIT_STORE is a plain Python variable, so it lives in process memory and is lost when the server restarts. That is okay for a tutorial. For production, use Redis instead. Redis offers persistence mechanisms that let the data survive restarts or crashes.

# main.py (continued)
from collections import defaultdict
from datetime import datetime, timedelta, timezone

from tool_registry import ALLOWED_TOOLS

# In-memory store: {user_id: [timestamp, timestamp, ...]}
# Timestamps are kept per user, across all tools.
RATE_LIMIT_STORE = defaultdict(list)


def check_tool_permission(tool_name: str) -> dict:
    # Reject tools that are missing from the registry.
    tool = ALLOWED_TOOLS.get(tool_name)
    if not tool:
        raise HTTPException(
            status_code=403,
            detail=f"Tool '{tool_name}' is not registered"
        )
    # Reject tools that are registered but explicitly blocked.
    if tool["risk"] == "blocked":
        raise HTTPException(
            status_code=403,
            detail=f"Tool '{tool_name}' is blocked from AI use"
        )
    return tool


def check_rate_limit(user_id: str, tool: dict):
    # Timezone-aware timestamp; datetime.utcnow() is deprecated since Python 3.12.
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=1)
    # Keep only the calls made within the last minute.
    recent = [t for t in RATE_LIMIT_STORE[user_id] if t > cutoff]
    if len(recent) >= tool["max_calls_per_minute"]:
        raise HTTPException(
            status_code=429,
            detail=f"Rate limit: {tool['max_calls_per_minute']}/min"
        )
    recent.append(now)
    RATE_LIMIT_STORE[user_id] = recent
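One thing to notice is that check_rate_limit() keys its store on user_id alone, so all of a user’s tools share one window. If you would rather count each tool separately, a variation along these lines might work. It is a sketch, not part of the tutorial’s proxy: SlidingWindowLimiter and its clock parameter are hypothetical names, and the state is still in-memory only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-(user, tool) sliding-window limiter (illustrative, in-memory only)."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.calls = defaultdict(deque)  # {(user_id, tool_name): deque of timestamps}

    def allow(self, user_id: str, tool_name: str, max_calls: int) -> bool:
        now = self.clock()
        window = self.calls[(user_id, tool_name)]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= max_calls:
            return False
        window.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("u1", "create_ticket", max_calls=2) for _ in range(3)]
print(results)  # [True, True, False]
print(limiter.allow("u1", "get_customer", max_calls=30))  # True: separate counter
```

Inside the proxy, a False result would translate into the same 429 response that check_rate_limit() raises.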

Step 4: Budgeting Daily Requests

There are 1,440 minutes in a day, so a per-minute rate limit alone can’t stop a user from making more than a thousand calls a day, even at a steady one call per minute. Every user has a counter in DAILY_BUDGET, which increments on every call that passes the check. The counter is compared to the user’s limit in DAILY_BUDGET_LIMITS. Upon hitting the limit, check_daily_budget() raises a 429 error before the request reaches the tool. These limits keep a handful of heavy users from running up an enormous monthly bill.

# main.py (continued)
from datetime import datetime, timezone

# {(user_id, "YYYY-MM-DD"): calls made on that UTC day}
# Keying on the date means each user's count starts from zero every day.
DAILY_BUDGET = defaultdict(int)
DAILY_BUDGET_LIMITS = {"default": 500}


def check_daily_budget(user_id: str):
    today = datetime.now(timezone.utc).date().isoformat()
    used = DAILY_BUDGET[(user_id, today)]
    limit = DAILY_BUDGET_LIMITS.get(
        user_id, DAILY_BUDGET_LIMITS["default"]
    )
    if used >= limit:
        raise HTTPException(
            status_code=429,
            detail=f"Daily budget exceeded: {used}/{limit} calls"
        )
    DAILY_BUDGET[(user_id, today)] = used + 1
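To make the arithmetic concrete, the snippet below works out how many calls per day each per-minute limit from the registry would permit if an agent stayed just under it all day, and compares that with the default budget of 500. The numbers come from the tutorial’s registry; the variable names are only for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# Per-minute limits taken from the tutorial's registry.
per_minute_limits = {"get_customer": 30, "create_ticket": 10, "update_lead_status": 10}
DAILY_LIMIT = 500  # the "default" entry in DAILY_BUDGET_LIMITS

for tool, per_minute in per_minute_limits.items():
    worst_case = per_minute * MINUTES_PER_DAY
    print(f"{tool}: up to {worst_case} calls/day without a budget, {DAILY_LIMIT} with it")
```

A loop that respects the 30-per-minute limit on get_customer could still make 43,200 calls in a day, which is why the daily cap is checked separately.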

Step 5: Auditing Logs

The log_tool_call() function records every call, whether approved or rejected, in the file at LOG_PATH. These records are what separate an auditable system from one where whoever is debugging has to guess at the cause of a problem. The log is stored in audit_log.jsonl. In JSONL (JSON Lines) format, each line is a complete JSON entry, unlike a regular JSON file, where everything is wrapped in one top-level array or object. A JSONL file looks like this:

{"timestamp": "...", "user_id": "u1", "tool_name": "get_customer", ...}
{"timestamp": "...", "user_id": "u2", "tool_name": "delete_record", ...}
{"timestamp": "...", "user_id": "u1", "tool_name": "send_email", ...}

Many logging and analytics tools, including Datadog, ClickHouse, and Grafana Loki, can ingest logs written as one JSON object per line. JSONL is also cheap to write. To add an entry to a JSON array in a JSON file, you have to read the whole file, append the new entry, and write everything back. If two requests arrive simultaneously, one of them might overwrite the other. With JSONL, each request just appends a line, so one log entry is far less likely to interfere with another.

# audit_log.py
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("audit_log.jsonl")


def log_tool_call(request: dict, result: str, reason: str = ""):
    entry = {
        # Timezone-aware UTC timestamp in ISO 8601 format.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": request["user_id"],
        "tool_name": request["tool_name"],
        "arguments": request["arguments"],
        "request_id": request["request_id"],
        "result": result,
        "reason": reason,
    }
    # Append one JSON object per line; the newline is what makes the file JSONL.
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
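Reading the log back is just as simple, which is much of the appeal of JSONL. The sketch below pulls out the rejected calls for one user. rejected_calls is a hypothetical helper, and the demo writes to a throwaway file so it does not touch a real audit log.

```python
import json
import tempfile
from pathlib import Path


def rejected_calls(log_path: Path, user_id: str) -> list:
    """Return rejected entries for one user from a JSONL audit log (hypothetical helper)."""
    matches = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry["user_id"] == user_id and entry["result"] == "rejected":
                matches.append(entry)
    return matches


# Demo entries in the same shape that log_tool_call() writes.
sample = [
    {"user_id": "u1", "tool_name": "get_customer", "request_id": "r1",
     "result": "approved", "reason": ""},
    {"user_id": "u1", "tool_name": "delete_customer", "request_id": "r2",
     "result": "rejected", "reason": "Tool 'delete_customer' is blocked from AI use"},
]
demo_path = Path(tempfile.mkdtemp()) / "audit_log.jsonl"
with open(demo_path, "a", encoding="utf-8") as f:
    for entry in sample:
        f.write(json.dumps(entry) + "\n")

for entry in rejected_calls(demo_path, "u1"):
    print(entry["request_id"], entry["reason"])
```

Because each line parses on its own, the file can be scanned line by line without loading the whole log into memory.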

Step 6: Tie It All Together

# main.py (continued)
from audit_log import log_tool_call


@app.post("/tool-call")
def execute_tool_call(request: ToolCallRequest):
    request_dict = request.model_dump()
    try:
        tool = check_tool_permission(request.tool_name)
        check_rate_limit(request.user_id, tool)
        check_daily_budget(request.user_id)
        # In production, this is where you'd call the real downstream API.
        # For the tutorial, we confirm the call would have been made.
        log_tool_call(request_dict, result="approved")
        return {
            "status": "allowed",
            "tool": request.tool_name,
            "request_id": request.request_id,
        }
    except HTTPException as e:
        log_tool_call(
            request_dict,
            result="rejected",
            reason=str(e.detail),
        )
        raise

The function execute_tool_call() runs every time an AI agent tries to call a tool. request.model_dump() converts the incoming request into a plain dictionary so it can be passed to the logger. Three checks run back to back, and if all three pass, the call is logged as “approved” and the proxy returns an allowed response. Code comments mark where the real API call goes in production.

If any check fails, it raises an HTTPException. The except block catches it, logs the call as “rejected” with the reason, and re-raises the exception so the caller gets the proper response.
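As a rough idea of what could replace the placeholder comment in production, the sketch below builds the downstream request with a credential that only the proxy knows. Everything specific here is an assumption for illustration: the DOWNSTREAM_URLS mapping, the example.com endpoints, the DOWNSTREAM_API_KEY variable name, and the bearer-token scheme. Your real APIs will dictate the actual URLs and authentication.

```python
import json
import os
import urllib.request

# Hypothetical downstream endpoints; replace with your real API routes.
DOWNSTREAM_URLS = {
    "get_customer": "https://crm.example.com/api/customers",
    "create_ticket": "https://tickets.example.com/api/tickets",
}


def build_downstream_request(tool_name: str, method: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) the request the proxy would forward."""
    # Assumed variable name; the key lives only in the proxy's environment.
    api_key = os.environ["DOWNSTREAM_API_KEY"]
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    # For GET, arguments would normally go in the path or query string; omitted here.
    body = None if method == "GET" else json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        DOWNSTREAM_URLS[tool_name], data=body, headers=headers, method=method
    )


os.environ.setdefault("DOWNSTREAM_API_KEY", "demo-key")  # demo value only
req = build_downstream_request("create_ticket", "POST", {"subject": "Refund request"})
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req, timeout=10)
```

The agent never sees the key. It only knows the tool name and arguments it sent to the proxy.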

Run it with:

uvicorn main:app --reload

The command breaks down as follows:

uvicorn is the server that runs your FastAPI app locally. main refers to the file main.py. app is the FastAPI instance defined in the code as app = FastAPI(). --reload restarts the server automatically every time you save. Skip it and your code needs a manual restart after every change.

Send a valid POST /tool-call request; you will get an allowed response. Send a delete_customer call; you will get a 403. Send more than 30 get_customer calls in under a minute; you will get a 429. Every one of those outcomes lands in audit_log.jsonl with enough context to reconstruct what happened.
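If you want to try those requests from Python, a small client along these lines should do. It is a sketch using only the standard library and assumes the server is listening on uvicorn’s default address, 127.0.0.1:8000. build_tool_call is a hypothetical helper, and the customer_id argument is made up for the example.

```python
import json
import urllib.request

PROXY_URL = "http://127.0.0.1:8000/tool-call"  # uvicorn's default host and port


def build_tool_call(user_id: str, tool_name: str, request_id: str, arguments=None) -> urllib.request.Request:
    """Build a POST request whose body matches the ToolCallRequest model."""
    payload = {
        "user_id": user_id,
        "tool_name": tool_name,
        "arguments": arguments or {},
        "request_id": request_id,
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tool_call("u1", "get_customer", "req-001", {"customer_id": "c-42"})
print(req.get_method(), req.full_url)

# With the server running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
# A blocked tool such as delete_customer raises urllib.error.HTTPError with code 403.
```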

Figure: The FastAPI proxy protects APIs from anomalous AI agent behaviour.

What This Looks Like in Production

The tutorial proxy is a starting point. A production version adds:

Postgres or ClickHouse for persistent audit logs
Redis for rate limits and budgets that survive restarts and work across multiple instances
Datadog or Grafana hooks for real-time monitoring and alerting
Human-approval workflows for high-risk tools before execution
Per-customer configuration if you’re building a multi-tenant product
Read-only vs. write tool tagging to make permission tiers explicit
Signed request IDs to make audit logs tamper-evident

This tutorial stores rate-limit counters and budgets in Python dictionaries. For local testing that is convenient: the counters reset on every restart, which keeps things clean during development. Before going to production, replace the dictionaries with Redis so the data persists across restarts and is shared between instances, and move the audit log into a proper database such as Postgres or ClickHouse.
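Of the items in the list above, signed request IDs are the least self-explanatory, so here is one possible reading of the idea: attach an HMAC-SHA256 signature to each request ID so that an altered ID no longer verifies. This is a sketch under that assumption. SECRET_KEY, sign_request_id, and verify_request_id are hypothetical names, and a real deployment would load the key from a secret store.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: loaded from a secret store in practice


def sign_request_id(request_id: str, key: bytes = SECRET_KEY) -> str:
    """Return 'request_id.signature' using HMAC-SHA256."""
    digest = hmac.new(key, request_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{request_id}.{digest}"


def verify_request_id(signed: str, key: bytes = SECRET_KEY) -> bool:
    """Check that the signature matches the request ID it is attached to."""
    request_id, _, signature = signed.rpartition(".")
    expected = hmac.new(key, request_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(signature, expected)


signed = sign_request_id("req-001")
print(verify_request_id(signed))  # True
print(verify_request_id("req-002." + signed.split(".")[-1]))  # False: ID was altered
```

Note that this only protects the ID itself. Making whole log entries tamper-evident would mean signing the full entry, which is a larger design decision.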

None of these add-ons is complicated to build, but they are what separate a prototype that works from a system that handles real customer traffic. The core principle doesn’t change whether you connect your AI agent to a CRM, a billing system, or a data integration pipeline: don’t let AI agents talk directly to your production systems. Put a proxy in between. Log everything. Make it possible to kill access without touching the agent.

Where This Fits in an AI Integration Strategy

This kind of safety infrastructure is increasingly what separates an AI demo from an AI product. A demo doesn’t need rate limits; there’s one user and the tools are mocked. In production, thousands of users interact with the AI assistant, and it calls real APIs. Finance teams want to know what each user costs. Building the proxy before the AI hits production is faster and cheaper than rebuilding trust after an incident.

If you’re connecting an LLM to a real API today, wire up something like this proxy first. The setup takes a few hours. The savings, the first time something goes wrong, can be significant.