Chapter 8
Coming Soon

AI Agents & Tool Use

From chatbots to autonomous agents: building AI that can think, plan, and act

The Evolution: From Chat to Agents

2020-2022

Chatbots

LLMs respond to questions. No memory, no tools, no autonomy.

"What's the weather?" → "I don't have access to current weather data."
2023

Tool-Using Models

LLMs can call functions to access external data and services.

"What's the weather?" → calls weather API → "It's 72°F and sunny."
2024-2025

Autonomous Agents

LLMs plan multi-step tasks, use multiple tools, and operate autonomously.

"Book me a flight to Paris" → searches flights, compares prices, books ticket, adds to calendar

What You'll Learn

AI agents represent the next frontier: systems that can reason, plan, use tools, and take actions to accomplish complex goals. This chapter covers the complete agent stack:

01

Function Calling & Tool Use

The foundation: teaching LLMs to use external tools and APIs

  • What is function calling? LLM decides which tool to use and with what parameters
  • How it works: Tool schema definition, call detection, tool execution, and returning results to the model
  • OpenAI function calling, Anthropic tool use, LangChain tools
  • Real examples: Weather API, database queries, web search
  • Structured outputs: JSON mode, typed responses
1. User: "What's the weather in SF?"
2. LLM thinks: "I need to call get_weather(location='SF')"
3. System: Executes API → Returns {temp: 72, condition: "sunny"}
4. LLM responds: "It's 72°F and sunny in San Francisco."
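
Here is one way that four-step flow might look in code. This is a minimal sketch using the OpenAI Python SDK's chat-completions tools interface; the `get_weather` function and its hard-coded return value are stand-ins for a real weather API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Stand-in for a real weather API call
def get_weather(location: str) -> dict:
    return {"temp": 72, "condition": "sunny"}

# 1. Describe the tool so the model knows when and how to call it
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in SF?"}]

# 2. The model decides to call get_weather(location="SF")
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# 3. We execute the tool and feed the result back
result = get_weather(**args)
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})

# 4. The model turns the raw result into a natural-language answer
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)  # e.g. "It's 72°F and sunny in San Francisco."
```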
02

ReAct: Reasoning + Acting

The pattern that powers most modern agents

  • The ReAct loop: Thought → Action → Observation → Repeat
  • Why explicit reasoning improves reliability
  • Implementing ReAct from scratch
  • When ReAct breaks down: failure modes
  • Alternatives: Plan-and-Execute, Tree-of-Thoughts
Thought: I need to find out when Python 3.12 was released
Action: search("Python 3.12 release date")
Observation: Python 3.12.0 was released on October 2, 2023
Thought: I now know the answer
Answer: Python 3.12 was released on October 2, 2023
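
A trace like the one above can be driven by a surprisingly small loop. The sketch below is a from-scratch illustration rather than any library's API: it assumes a hypothetical `llm(prompt)` completion helper and a single stubbed `search` tool, and simply alternates between asking the model for the next Thought/Action and appending the Observation.

```python
import re

def search(query: str) -> str:
    """Stubbed search tool; a real one would hit a search API."""
    return "Python 3.12.0 was released on October 2, 2023"

TOOLS = {"search": search}

PROMPT = """Answer the question by alternating Thought / Action / Observation lines.
Available actions: search("query")
When you know the answer, write: Answer: <final answer>

Question: {question}
{scratchpad}"""

def react_agent(question: str, llm, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Ask the model for its next Thought and Action
        output = llm(PROMPT.format(question=question, scratchpad=scratchpad))
        scratchpad += output
        if "Answer:" in output:
            return output.split("Answer:")[-1].strip()
        # Parse a line like:  Action: search("Python 3.12 release date")
        match = re.search(r'Action:\s*(\w+)\("?(.*?)"?\)', output)
        if not match:
            break
        tool, arg = match.groups()
        # Run the tool and append the Observation so the model sees it on the next turn
        scratchpad += f"\nObservation: {TOOLS[tool](arg)}\n"
    return "Agent stopped without reaching an answer."
```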
03

Planning & Decomposition

Breaking complex goals into achievable steps

  • Task decomposition: "Book a vacation" → 20 subtasks
  • Plan-and-Execute pattern: Create plan first, then execute
  • Hierarchical planning: High-level goals → Mid-level tasks → Low-level actions
  • Replanning: What to do when plans fail
  • Examples: AutoGPT, BabyAGI, GPT-Engineer
Goal: "Research competitors and create a comparison report"
Generated Plan:
  1. Identify top 5 competitors
  2. For each competitor: gather info on pricing, features, market share
  3. Compile data into structured format
  4. Analyze strengths/weaknesses
  5. Generate comparison table and summary report
The agent executes each step, using search, scraping, and analysis tools.
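
A bare-bones Plan-and-Execute sketch, again assuming a hypothetical `llm(prompt)` helper: the planner turns the goal into a numbered list of steps, and the executor works through them one at a time, feeding earlier results back in as context.

```python
def plan_and_execute(goal: str, llm) -> str:
    # Planner: ask for a short numbered list of steps
    plan_text = llm(f"Break this goal into a short numbered list of steps:\n{goal}")
    steps = [line.split(".", 1)[1].strip()
             for line in plan_text.splitlines()
             if line.strip()[:1].isdigit() and "." in line]

    # Executor: run each step, carrying prior results forward as context
    results = []
    for step in steps:
        context = "\n".join(results)
        results.append(llm(f"Goal: {goal}\nCompleted so far:\n{context}\n\nDo this step: {step}"))

    # Synthesize all step results into one report
    return llm(f"Combine these results into a final report for '{goal}':\n" + "\n".join(results))
```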
04

Memory Systems

How agents remember past interactions and learn over time

  • Short-term memory: Conversation history within context window
  • Long-term memory: Vector databases, summary storage
  • Episodic memory: Remembering specific past events
  • Semantic memory: Learned facts and knowledge
  • Memory retrieval strategies: recency, relevance, importance
💭
Short-term

Current conversation (last 10 messages)

💾
Long-term

Past conversations stored in vector DB

🎯
Semantic

Learned facts: "User prefers Python over JavaScript"
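
Retrieval typically blends recency, relevance, and importance into a single score per memory. The sketch below shows one common scheme (an equal-weighted sum with exponentially decaying recency); it assumes each stored memory already carries an embedding, an importance rating, and a timestamp.

```python
import math
import time
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(memories: list[dict], query_embedding: np.ndarray, k: int = 5) -> list[dict]:
    """Each memory: {'text', 'embedding', 'importance' in [0, 1], 'timestamp'}."""
    now = time.time()

    def score(m: dict) -> float:
        recency = math.exp(-(now - m["timestamp"]) / 3600)   # decays over hours
        relevance = cosine(m["embedding"], query_embedding)   # similarity to the query
        importance = m["importance"]                          # assigned when the memory was stored
        return recency + relevance + importance               # equal weights, for simplicity

    return sorted(memories, key=score, reverse=True)[:k]
```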

05

Multi-Agent Systems

Multiple AI agents collaborating to solve problems

  • Specialist agents: Researcher, Coder, Critic, each with specific roles
  • Communication protocols: How agents talk to each other
  • Coordination patterns: Sequential, parallel, hierarchical
  • Consensus mechanisms: When agents disagree
  • Examples: AutoGen, CrewAI, MetaGPT
Researcher

Gathers information

Coder

Writes implementation

Critic

Reviews & suggests improvements
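
A Researcher → Coder → Critic pipeline like this can be prototyped without any framework. In the sketch below, each "agent" is just a role-specific system prompt plus a shared, hypothetical `llm(system, prompt)` helper; the output of one agent becomes the input of the next.

```python
# Each "agent" is a role prompt; llm(system, prompt) is a hypothetical helper
AGENTS = {
    "researcher": "You are a researcher. Gather and summarize the relevant facts.",
    "coder": "You are a software engineer. Write a clean implementation from the research notes.",
    "critic": "You are a code reviewer. Point out bugs, risks, and concrete improvements.",
}

def crew(task: str, llm) -> dict:
    research = llm(AGENTS["researcher"], f"Task: {task}")
    code = llm(AGENTS["coder"], f"Task: {task}\n\nResearch notes:\n{research}")
    review = llm(AGENTS["critic"], f"Task: {task}\n\nProposed implementation:\n{code}")
    # A fuller system would loop: send the review back to the coder until the critic approves
    return {"research": research, "code": code, "review": review}
```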

06

Agent Frameworks & Tools

The ecosystem for building production agents

  • LangChain: One of the earliest and most widely used agent frameworks (agents, chains, tools)
  • LlamaIndex: Data-centric agents, retrieval focus
  • AutoGPT/BabyAGI: Autonomous task executors
  • CrewAI: Multi-agent orchestration
  • OpenAI Assistants API: Managed agent infrastructure
  • Anthropic Claude with tools: Native tool use support
07

Agent Reliability & Safety

Making agents trustworthy and production-ready

  • Failure modes: Infinite loops, hallucinated actions, cost explosions
  • Guardrails: Action approval, cost limits, time limits
  • Human-in-the-loop: When to ask for confirmation
  • Monitoring: Logging actions, tracking success rates
  • Fallback strategies: What to do when agents get stuck
Max iterations limit (prevent infinite loops)
Cost budget per session ($5 max)
Human approval for destructive actions
Comprehensive logging and observability
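
These guardrails are easy to enforce inside the agent loop itself. The sketch below uses illustrative limits matching the list above; the `DESTRUCTIVE` action names are hypothetical placeholders.

```python
class GuardrailError(Exception):
    pass

class GuardedAgent:
    # Hypothetical action names that should never run without human approval
    DESTRUCTIVE = {"delete_file", "send_email", "execute_payment"}

    def __init__(self, max_iterations: int = 15, max_cost_usd: float = 5.00):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost = 0.0
        self.log = []  # every checked action is recorded for observability

    def check(self, action: str, estimated_cost: float) -> None:
        """Call before executing each action; raises if a guardrail is violated."""
        self.iterations += 1
        self.cost += estimated_cost
        self.log.append({"step": self.iterations, "action": action, "cost": round(self.cost, 4)})
        if self.iterations > self.max_iterations:
            raise GuardrailError("Max iterations exceeded (possible infinite loop)")
        if self.cost > self.max_cost_usd:
            raise GuardrailError(f"Cost budget of ${self.max_cost_usd} exhausted")
        if action in self.DESTRUCTIVE and input(f"Approve '{action}'? [y/N] ").lower() != "y":
            raise GuardrailError(f"Human rejected destructive action: {action}")
```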
08

Real-World Agent Applications

How companies are deploying agents in production

  • Customer support: Autonomous ticket resolution
  • Data analysis: SQL query generation, visualization
  • Code generation: GitHub Copilot Workspace, Cursor
  • Research assistants: Literature review, summarization
  • Task automation: Email management, scheduling, CRM updates

Agent Architecture Blueprint

1. Perception

User input → LLM understands intent

2. Planning

Decompose goal into subtasks

3. Tool Selection

Choose appropriate tools/APIs

4. Action Execution

Call functions, get results

5. Reflection

Did it work? Replan if needed

6. Response

Return result to user
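
Put together, the six stages map onto a single control loop. The skeleton below is a sketch under the same assumptions as the earlier examples (a hypothetical `llm` helper and a `tools` registry keyed by name), with each stage marked in the comments.

```python
def run_agent(user_input: str, llm, tools: dict, max_steps: int = 10) -> str:
    # 1. Perception: understand what the user wants
    goal = llm(f"Restate the user's goal in one sentence: {user_input}")

    # 2. Planning: decompose the goal into subtasks
    plan = [s.strip() for s in llm(f"List the subtasks needed to achieve: {goal}").splitlines() if s.strip()]

    results = []
    for subtask in plan[:max_steps]:
        # 3. Tool selection: pick a tool (or 'none') for this subtask
        tool_name = llm(
            f"Which of {list(tools)} best fits this subtask: {subtask}? "
            "Answer with one tool name, or 'none'."
        ).strip()

        # 4. Action execution: call the tool, or fall back to the model itself
        result = tools[tool_name](subtask) if tool_name in tools else llm(subtask)

        # 5. Reflection: check the result and retry once if it missed the mark
        verdict = llm(f"Did this result accomplish '{subtask}'? Answer 'ok' or 'retry'.\n{result}")
        if "retry" in verdict.lower():
            result = llm(f"Try a different approach to: {subtask}")
        results.append(str(result))

    # 6. Response: summarize everything for the user
    return llm("Summarize these results for the user:\n" + "\n".join(results))
```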

Agent Patterns Compared

Simple Function Calling

Complexity: ⭐

LLM calls one function, returns result

Use for: Single-step tasks (weather, calculator)
Example: "What's 25% of 80?" → calculate(0.25 * 80)

ReAct Agent

Complexity: ⭐⭐⭐

LLM reasons, acts, observes in loop until goal met

Use for: Multi-step reasoning with tools
Example: "Find cheapest flight to Paris" → search → compare → recommend

Plan-and-Execute

Complexity: ⭐⭐⭐⭐

Create full plan upfront, then execute steps

Use for: Complex, long-running tasks with many steps
Example: "Prepare quarterly report" → 15-step plan → execute each

Multi-Agent System

Complexity: ⭐⭐⭐⭐⭐

Multiple specialized agents collaborate

Use for: Complex projects requiring diverse expertise
Example: "Build a web app" → Researcher + Designer + Coder + Tester

Current Challenges & Limitations

🔄

Reliability

Agents can get stuck in loops, hallucinate actions, or fail unexpectedly

Status: Active research area, improving with better prompts & scaffolding
💰

Cost

Complex agents make many LLM calls, and costs can spiral ($10+ per task)

Solution: Set budgets, use cheaper models for simple steps
⏱️

Latency

Multi-step reasoning takes time (30s-5min for complex tasks)

Trade-off: Accuracy vs. speed; streaming intermediate output helps
🎯

Evaluation

Hard to measure agent quality objectively

Emerging: AgentBench, WebArena, SWE-bench for standardized testing

The Future: What's Next for Agents?

🧠 Better Reasoning Models

Reasoning-focused models in the style of o1 and o3 should make agents more reliable at planning and tool use

🔧 Agentic Operating Systems

Platforms like Anthropic's Computer Use: agents that can control your computer

🤝 Human-Agent Collaboration

Co-pilots that work alongside humans, not replace them

📊 Agent Marketplaces

Buying/selling specialized agents for specific tasks (like app stores)

Coming Soon!

This chapter will include hands-on tutorials for building your first agent, complete with code examples in Python using LangChain and OpenAI. You'll build a practical agent that can research topics and generate reports autonomously.
