Chapter 8
Coming Soon

AI Agents & Tool Use

From chatbots to autonomous agents: building AI that can think, plan, and act

The Evolution: From Chat to Agents

2020-2022

Chatbots

LLMs respond to questions. No memory, no tools, no autonomy.

"What's the weather?" → "I don't have access to current weather data."
2023

Tool-Using Models

LLMs can call functions to access external data and services.

"What's the weather?" → calls weather API → "It's 72°F and sunny."
2024-2025

Autonomous Agents

LLMs plan multi-step tasks, use multiple tools, and operate autonomously.

"Book me a flight to Paris" → searches flights, compares prices, books ticket, adds to calendar

What You'll Learn

AI agents represent the next frontier: systems that can reason, plan, use tools, and take actions to accomplish complex goals. This chapter covers the complete agent stack:

01

Function Calling & Tool Use

The foundation: teaching LLMs to use external tools and APIs

  • What is function calling? LLM decides which tool to use and with what parameters
  • How it works: Tool schema definition, call detection, tool execution, and returning results to the model
  • OpenAI function calling, Anthropic tool use, LangChain tools
  • Real examples: Weather API, database queries, web search
  • Structured outputs: JSON mode, typed responses
1. User: "What's the weather in SF?"
2. LLM thinks: "I need to call get_weather(location='SF')"
3. System: Executes API → Returns {temp: 72, condition: "sunny"}
4. LLM responds: "It's 72°F and sunny in San Francisco."
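
Here is one way that four-step flow might look in code. This is a minimal sketch using the OpenAI Python SDK's chat-completions tools interface; the `get_weather` function and its hard-coded return value are stand-ins for a real weather API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Stand-in for a real weather API call
def get_weather(location: str) -> dict:
    return {"temp": 72, "condition": "sunny"}

# 1. Describe the tool so the model knows when and how to call it
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in SF?"}]

# 2. The model decides to call get_weather(location="SF")
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# 3. We execute the tool and feed the result back
result = get_weather(**args)
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})

# 4. The model turns the raw result into a natural-language answer
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)  # e.g. "It's 72°F and sunny in San Francisco."
```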
02

ReAct: Reasoning + Acting

The pattern that powers most modern agents

  • The ReAct loop: Thought → Action → Observation → Repeat
  • Why explicit reasoning improves reliability
  • Implementing ReAct from scratch
  • When ReAct breaks down: failure modes
  • Alternatives: Plan-and-Execute, Tree-of-Thoughts
Thought: I need to find out when Python 3.12 was released
Action: search("Python 3.12 release date")
Observation: Python 3.12.0 was released on October 2, 2023
Thought: I now know the answer
Answer: Python 3.12 was released on October 2, 2023
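
A trace like the one above can be driven by a surprisingly small loop. The sketch below is a from-scratch illustration rather than any library's API: it assumes a hypothetical `llm(prompt)` completion helper and a single stubbed `search` tool, and simply alternates between asking the model for the next Thought/Action and appending the Observation.

```python
import re

def search(query: str) -> str:
    """Stubbed search tool; a real one would hit a search API."""
    return "Python 3.12.0 was released on October 2, 2023"

TOOLS = {"search": search}

PROMPT = """Answer the question by alternating Thought / Action / Observation lines.
Available actions: search("query")
When you know the answer, write: Answer: <final answer>

Question: {question}
{scratchpad}"""

def react_agent(question: str, llm, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Ask the model for its next Thought and Action
        output = llm(PROMPT.format(question=question, scratchpad=scratchpad))
        scratchpad += output
        if "Answer:" in output:
            return output.split("Answer:")[-1].strip()
        # Parse a line like:  Action: search("Python 3.12 release date")
        match = re.search(r'Action:\s*(\w+)\("?(.*?)"?\)', output)
        if not match:
            break
        tool, arg = match.groups()
        # Run the tool and append the Observation so the model sees it on the next turn
        scratchpad += f"\nObservation: {TOOLS[tool](arg)}\n"
    return "Agent stopped without reaching an answer."
```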
03

Planning & Decomposition

Breaking complex goals into achievable steps

  • Task decomposition: "Book a vacation" → 20 subtasks
  • Plan-and-Execute pattern: Create plan first, then execute
  • Hierarchical planning: High-level goals → Mid-level tasks → Low-level actions
  • Replanning: What to do when plans fail
  • Examples: AutoGPT, BabyAGI, GPT-Engineer
Goal: "Research competitors and create a comparison report"
Generated Plan:
  1. Identify top 5 competitors
  2. For each competitor: gather info on pricing, features, market share
  3. Compile data into structured format
  4. Analyze strengths/weaknesses
  5. Generate comparison table and summary report
The agent executes each step, using search, scraping, and analysis tools.
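
A bare-bones Plan-and-Execute sketch, again assuming a hypothetical `llm(prompt)` helper: the planner turns the goal into a numbered list of steps, and the executor works through them one at a time, feeding earlier results back in as context.

```python
def plan_and_execute(goal: str, llm) -> str:
    # Planner: ask for a short numbered list of steps
    plan_text = llm(f"Break this goal into a short numbered list of steps:\n{goal}")
    steps = [line.split(".", 1)[1].strip()
             for line in plan_text.splitlines()
             if line.strip()[:1].isdigit() and "." in line]

    # Executor: run each step, carrying prior results forward as context
    results = []
    for step in steps:
        context = "\n".join(results)
        results.append(llm(f"Goal: {goal}\nCompleted so far:\n{context}\n\nDo this step: {step}"))

    # Synthesize all step results into one report
    return llm(f"Combine these results into a final report for '{goal}':\n" + "\n".join(results))
```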
04

Memory Systems

How agents remember past interactions and learn over time

  • Short-term memory: Conversation history within context window
  • Long-term memory: Vector databases, summary storage
  • Episodic memory: Remembering specific past events
  • Semantic memory: Learned facts and knowledge
  • Memory retrieval strategies: recency, relevance, importance
💭
Short-term

Current conversation (last 10 messages)

💾
Long-term

Past conversations stored in vector DB

🎯
Semantic

Learned facts: "User prefers Python over JavaScript"
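
Retrieval typically blends recency, relevance, and importance into a single score per memory. The sketch below shows one common scheme (an equal-weighted sum with exponentially decaying recency); it assumes each stored memory already carries an embedding, an importance rating, and a timestamp.

```python
import math
import time
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(memories: list[dict], query_embedding: np.ndarray, k: int = 5) -> list[dict]:
    """Each memory: {'text', 'embedding', 'importance' in [0, 1], 'timestamp'}."""
    now = time.time()

    def score(m: dict) -> float:
        recency = math.exp(-(now - m["timestamp"]) / 3600)   # decays over hours
        relevance = cosine(m["embedding"], query_embedding)   # similarity to the query
        importance = m["importance"]                          # assigned when the memory was stored
        return recency + relevance + importance               # equal weights, for simplicity

    return sorted(memories, key=score, reverse=True)[:k]
```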

05

Multi-Agent Systems

Multiple AI agents collaborating to solve problems

  • Specialist agents: Researcher, Coder, Critic, each with specific roles
  • Communication protocols: How agents talk to each other
  • Coordination patterns: Sequential, parallel, hierarchical
  • Consensus mechanisms: When agents disagree
  • Examples: AutoGen, CrewAI, MetaGPT
Researcher

Gathers information

Coder

Writes implementation

Critic

Reviews & suggests improvements
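
A Researcher → Coder → Critic pipeline like this can be prototyped without any framework. In the sketch below, each "agent" is just a role-specific system prompt plus a shared, hypothetical `llm(system, prompt)` helper; the output of one agent becomes the input of the next.

```python
# Each "agent" is a role prompt; llm(system, prompt) is a hypothetical helper
AGENTS = {
    "researcher": "You are a researcher. Gather and summarize the relevant facts.",
    "coder": "You are a software engineer. Write a clean implementation from the research notes.",
    "critic": "You are a code reviewer. Point out bugs, risks, and concrete improvements.",
}

def crew(task: str, llm) -> dict:
    research = llm(AGENTS["researcher"], f"Task: {task}")
    code = llm(AGENTS["coder"], f"Task: {task}\n\nResearch notes:\n{research}")
    review = llm(AGENTS["critic"], f"Task: {task}\n\nProposed implementation:\n{code}")
    # A fuller system would loop: send the review back to the coder until the critic approves
    return {"research": research, "code": code, "review": review}
```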

06

Agent Frameworks & Tools

The ecosystem for building production agents

  • LangChain: One of the earliest and most widely used agent frameworks (agents, chains, tools)
  • LlamaIndex: Data-centric agents, retrieval focus
  • AutoGPT/BabyAGI: Autonomous task executors
  • CrewAI: Multi-agent orchestration
  • OpenAI Assistants API: Managed agent infrastructure
  • Anthropic Claude with tools: Native tool use support
07

Agent Reliability & Safety

Making agents trustworthy and production-ready

  • Failure modes: Infinite loops, hallucinated actions, cost explosions
  • Guardrails: Action approval, cost limits, time limits
  • Human-in-the-loop: When to ask for confirmation
  • Monitoring: Logging actions, tracking success rates
  • Fallback strategies: What to do when agents get stuck
Max iterations limit (prevent infinite loops)
Cost budget per session ($5 max)
Human approval for destructive actions
Comprehensive logging and observability
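
These guardrails are easy to enforce inside the agent loop itself. The sketch below uses illustrative limits matching the list above; the `DESTRUCTIVE` action names are hypothetical placeholders.

```python
class GuardrailError(Exception):
    pass

class GuardedAgent:
    # Hypothetical action names that should never run without human approval
    DESTRUCTIVE = {"delete_file", "send_email", "execute_payment"}

    def __init__(self, max_iterations: int = 15, max_cost_usd: float = 5.00):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost = 0.0
        self.log = []  # every checked action is recorded for observability

    def check(self, action: str, estimated_cost: float) -> None:
        """Call before executing each action; raises if a guardrail is violated."""
        self.iterations += 1
        self.cost += estimated_cost
        self.log.append({"step": self.iterations, "action": action, "cost": round(self.cost, 4)})
        if self.iterations > self.max_iterations:
            raise GuardrailError("Max iterations exceeded (possible infinite loop)")
        if self.cost > self.max_cost_usd:
            raise GuardrailError(f"Cost budget of ${self.max_cost_usd} exhausted")
        if action in self.DESTRUCTIVE and input(f"Approve '{action}'? [y/N] ").lower() != "y":
            raise GuardrailError(f"Human rejected destructive action: {action}")
```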
08

Real-World Agent Applications

How companies are deploying agents in production

  • Customer support: Autonomous ticket resolution
  • Data analysis: SQL query generation, visualization
  • Code generation: GitHub Copilot Workspace, Cursor
  • Research assistants: Literature review, summarization
  • Task automation: Email management, scheduling, CRM updates

Agent Architecture Blueprint

1. Perception

User input → LLM understands intent

2. Planning

Decompose goal into subtasks

3. Tool Selection

Choose appropriate tools/APIs

4. Action Execution

Call functions, get results

5. Reflection

Did it work? Replan if needed

6. Response

Return result to user
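
Put together, the six stages map onto a single control loop. The skeleton below is a sketch under the same assumptions as the earlier examples (a hypothetical `llm` helper and a `tools` registry keyed by name), with each stage marked in the comments.

```python
def run_agent(user_input: str, llm, tools: dict, max_steps: int = 10) -> str:
    # 1. Perception: understand what the user wants
    goal = llm(f"Restate the user's goal in one sentence: {user_input}")

    # 2. Planning: decompose the goal into subtasks
    plan = [s.strip() for s in llm(f"List the subtasks needed to achieve: {goal}").splitlines() if s.strip()]

    results = []
    for subtask in plan[:max_steps]:
        # 3. Tool selection: pick a tool (or 'none') for this subtask
        tool_name = llm(
            f"Which of {list(tools)} best fits this subtask: {subtask}? "
            "Answer with one tool name, or 'none'."
        ).strip()

        # 4. Action execution: call the tool, or fall back to the model itself
        result = tools[tool_name](subtask) if tool_name in tools else llm(subtask)

        # 5. Reflection: check the result and retry once if it missed the mark
        verdict = llm(f"Did this result accomplish '{subtask}'? Answer 'ok' or 'retry'.\n{result}")
        if "retry" in verdict.lower():
            result = llm(f"Try a different approach to: {subtask}")
        results.append(str(result))

    # 6. Response: summarize everything for the user
    return llm("Summarize these results for the user:\n" + "\n".join(results))
```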

Agent Patterns Compared

Simple Function Calling

Complexity: ⭐

LLM calls one function, returns result

Use for: Single-step tasks (weather, calculator)
Example: "What's 25% of 80?" → calculate(0.25 * 80)

ReAct Agent

Complexity: ⭐⭐⭐

LLM reasons, acts, observes in loop until goal met

Use for: Multi-step reasoning with tools
Example: "Find cheapest flight to Paris" → search → compare → recommend

Plan-and-Execute

Complexity: ⭐⭐⭐⭐

Create full plan upfront, then execute steps

Use for: Complex, long-running tasks with many steps
Example: "Prepare quarterly report" → 15-step plan → execute each

Multi-Agent System

Complexity: ⭐⭐⭐⭐⭐

Multiple specialized agents collaborate

Use for: Complex projects requiring diverse expertise
Example: "Build a web app" → Researcher + Designer + Coder + Tester

Current Challenges & Limitations

🔄

Reliability

Agents can get stuck in loops, hallucinate actions, or fail unexpectedly

Status: Active research area, improving with better prompts & scaffolding
💰

Cost

Complex agents make many LLM calls, and costs can spiral ($10+ per task)

Solution: Set budgets, use cheaper models for simple steps
⏱️

Latency

Multi-step reasoning takes time (30s-5min for complex tasks)

Trade-off: Accuracy vs. speed; streaming intermediate output helps
🎯

Evaluation

Hard to measure agent quality objectively

Emerging: AgentBench, WebArena, SWE-bench for standardized testing

The Future: What's Next for Agents?

🧠 Better Reasoning Models

Reasoning-focused models in the style of o1 and o3 should make agents more reliable at planning and tool use

🔧 Agentic Operating Systems

Platforms like Anthropic's Computer Use: agents that can control your computer

🤝 Human-Agent Collaboration

Co-pilots that work alongside humans, not replace them

📊 Agent Marketplaces

Buying/selling specialized agents for specific tasks (like app stores)

Coming Soon!

This chapter will include hands-on tutorials for building your first agent, complete with code examples in Python using LangChain and OpenAI. You'll build a practical agent that can research topics and generate reports autonomously.
