Agents have the power to take real-world actions — which means they can also cause real-world harm. Without guardrails, an agent might delete critical files, send embarrassing emails, spend thousands of dollars on API calls, or get stuck in an infinite loop. This lesson covers input/output validation, action sandboxing, human-in-the-loop patterns, rate limiting, and preventing runaway agents.
| Risk | Example |
|---|---|
| Data loss | Agent deletes production database records |
| Financial damage | Agent makes unlimited API calls, costing thousands |
| Security breach | Agent exfiltrates sensitive data via a tool call |
| Reputation damage | Agent sends incorrect emails to customers |
| Infinite loops | Agent calls the same tool repeatedly with no progress |
| Prompt injection | Malicious input causes agent to bypass instructions |
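Several of these rows come down to the same missing control: a hard ceiling on what one run may do. Rate limiting and runaway agents are covered in depth later in this lesson; as a minimal sketch of the idea (the `RunGuard` name and its thresholds are invented for illustration, not part of the lesson's later code), a guard can count steps and abort when the agent loops on an identical call:

```python
# Illustrative sketch only: names and thresholds here are assumptions.
class RunGuard:
    """Abort a run that exceeds its step budget or loops on one call."""

    def __init__(self, max_steps: int = 20, max_repeats: int = 3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.last_call = None
        self.repeats = 0

    def check(self, tool_name: str, arguments: dict) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError(f"Run aborted: exceeded {self.max_steps} steps")
        call = (tool_name, repr(sorted(arguments.items())))
        if call == self.last_call:
            self.repeats += 1
            if self.repeats >= self.max_repeats:
                raise RuntimeError(f"Run aborted: {tool_name} repeated {self.repeats} times")
        else:
            self.last_call, self.repeats = call, 0
```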
Validate all inputs before they reach the agent:
```python
from pydantic import BaseModel, validator  # Pydantic v1 style; on v2, use field_validator
import re

class AgentInput(BaseModel):
    task: str
    user_id: str
    max_steps: int = 10

    @validator("task")
    def task_not_empty(cls, v):
        if len(v.strip()) < 5:
            raise ValueError("Task must be at least 5 characters.")
        if len(v) > 5000:
            raise ValueError("Task must be under 5000 characters.")
        return v

    @validator("max_steps")
    def reasonable_step_limit(cls, v):
        if v < 1 or v > 50:
            raise ValueError("max_steps must be between 1 and 50.")
        return v

def detect_injection(text: str) -> bool:
    """Detect common prompt injection patterns."""
    suspicious_patterns = [
        r"ignore (?:all )?previous instructions",
        r"you are now",
        r"system:\s",
        r"forget (?:everything|all|your)",
        r"new persona",
    ]
    for pattern in suspicious_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return True
    return False
```
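Combining the two checks at the entry point might look like this (a sketch; the sample task and `user_id` are invented):

```python
# Usage sketch: screen raw text first, then let Pydantic enforce field constraints.
raw_task = "Summarise the Q3 sales report and draft a short update email."

if detect_injection(raw_task):
    raise ValueError("Input rejected: possible prompt injection")

validated = AgentInput(task=raw_task, user_id="user-123")  # raises ValidationError on bad fields
```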
Validate what the agent produces before returning it to the user:
```python
class OutputValidator:
    """Validate agent outputs before returning to the user."""

    def __init__(self):
        self.blocked_patterns = [
            r"\b(?:password|secret|api.?key)\s*[:=]\s*\S+",
            r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
            r"\b\d{16}\b",  # Credit card pattern
        ]

    def validate(self, output: str) -> tuple[bool, str]:
        """Returns (is_safe, sanitised_output)."""
        for pattern in self.blocked_patterns:
            if re.search(pattern, output, re.IGNORECASE):
                return False, "[Output blocked: contains sensitive information]"
        if len(output) > 50000:
            return False, output[:50000] + "\n[Output truncated]"
        return True, output

# Usage (`agent_result` and `log_blocked_output` are placeholders)
output_validator = OutputValidator()  # renamed to avoid shadowing pydantic's `validator`
is_safe, clean_output = output_validator.validate(agent_result)
if not is_safe:
    log_blocked_output(agent_result)
```
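Two caveats worth noting: these regexes are best-effort (the 16-digit pattern misses card numbers written with spaces or dashes, and will flag any unformatted 16-digit number), and the truncation branch returns `is_safe=False` even though the truncated text is still passed back, so callers that log blocked outputs will also log truncations.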
Restrict which actions an agent can take:
```python
from enum import Enum

class ActionPermission(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"
    ADMIN = "admin"

class ActionSandbox:
    """Restrict agent actions based on permission level."""

    TOOL_PERMISSIONS = {
        "web_search": ActionPermission.READ_ONLY,
        "read_file": ActionPermission.READ_ONLY,
        "write_file": ActionPermission.READ_WRITE,
        "delete_file": ActionPermission.ADMIN,
        "run_code": ActionPermission.READ_WRITE,
        "send_email": ActionPermission.ADMIN,
        "database_query": ActionPermission.READ_ONLY,
        "database_write": ActionPermission.READ_WRITE,
        "database_delete": ActionPermission.ADMIN,
    }

    PERMISSION_HIERARCHY = {
        ActionPermission.READ_ONLY: 1,
        ActionPermission.READ_WRITE: 2,
        ActionPermission.ADMIN: 3,
    }

    def __init__(self, user_permission: ActionPermission):
        self.user_level = self.PERMISSION_HIERARCHY[user_permission]

    def is_allowed(self, tool_name: str) -> bool:
        required = self.TOOL_PERMISSIONS.get(tool_name)
        if required is None:
            return False  # Unknown tools are blocked
        return self.user_level >= self.PERMISSION_HIERARCHY[required]

    def filter_tools(self, tools: list[str]) -> list[str]:
        return [t for t in tools if self.is_allowed(t)]

# Usage
sandbox = ActionSandbox(ActionPermission.READ_WRITE)
sandbox.is_allowed("web_search")   # True
sandbox.is_allowed("delete_file")  # False (requires ADMIN)
sandbox.is_allowed("send_email")   # False (requires ADMIN)
```
For high-risk actions, require human approval:
```python
import json

class HumanApprovalGate:
    """Require human approval for specified actions."""

    REQUIRES_APPROVAL = {
        "send_email", "delete_file", "database_delete",
        "make_payment", "deploy_code",
    }

    def check(self, tool_name: str, arguments: dict) -> bool:
        """Returns True if approved, False if rejected."""
        if tool_name not in self.REQUIRES_APPROVAL:
            return True  # Auto-approve safe actions
        print(f"\n{'=' * 60}")
        print("APPROVAL REQUIRED")
        print(f"Action: {tool_name}")
        print(f"Arguments: {json.dumps(arguments, indent=2)}")
        print(f"{'=' * 60}")
        # Block until a human decides; a console prompt is the simplest gate,
        # swap in your own review UI in production.
        response = input("Approve this action? [y/N]: ")
        return response.strip().lower() == "y"
```