
Advanced Tutorial: GEPA-Based Prompt Optimization for Customer Support Agents

Implementation Repository

View the complete implementation on GitHub: raphaelmansuy/adk_training/tutorial_implementation/tutorial_gepa_optimization

This includes working code, tests, Makefile, and both simulated and real GEPA demos.

Why: The Problem with Manual Prompt Engineering​

You spend hours tweaking your agent's prompt:

# Version 1: Too vague
"Help customers with refunds"
→ Agent processes refunds without checking identity ❌

# Version 2: Added one rule
"Help customers with refunds. Verify identity first."
→ Agent forgets to check 30-day return policy ❌

# Version 3: Added another rule...
# Version 4: Fixed edge case...
# Version 5: Still failing...

The cycle never ends. Each fix breaks something else. You're guessing what works.

What: GEPA Breeds Better Prompts​

Think of GEPA like breeding dogs. You don't manually design every trait; you let evolution do the work:

  1. Start with a basic prompt (mixed breed dog)
  2. Test it on real scenarios (dog show competitions)
  3. See what fails (doesn't retrieve, barks too much)
  4. Create variations addressing failures (breed for specific traits)
  5. Test again, keep the best, repeat

Result: In the demo, your prompt evolves from 0% to 100% success on the test scenarios, automatically.

Try It (Choose Your Path)​

Quick Demo (2 minutes - Simulated):

cd tutorial_implementation/tutorial_gepa_optimization
make setup && make demo

Real GEPA (5-10 minutes - Actual LLM Calls):

cd tutorial_implementation/tutorial_gepa_optimization
make setup
export GOOGLE_API_KEY="your-api-key" # Get free key from https://aistudio.google.com/app/apikey
make real-demo

What you'll see:

Simulated demo shows the concept (instant, free):

Iteration 1: COLLECT → Seed prompt 0/5 passed
Iteration 2: REFLECT → LLM identifies missing security rules
Iteration 3: EVOLVE → Generate improved prompt
Iteration 4: EVALUATE → Evolved prompt 5/5 passed
Result: 0% → 100% improvement ✅

Real GEPA demo shows actual evolution (uses LLM, costs $0.05-0.10):

Iteration 1: COLLECT → Agent runs with seed prompt, collects actual results
REFLECT → Gemini LLM analyzes failures
EVOLVE → Gemini generates improved prompt based on insights
EVALUATE → Test improved prompt
SELECT → Compare and choose better version

Iteration 2: Repeat with new baseline - improve further

Result: Real optimization with actual LLM reflection!

How: The 5-Step Evolution Loop​

GEPA is simple: just 5 steps that repeat.

Step 1: Collect (Gather Evidence)​

Run your agent with the current prompt. Track what fails:

Test 1: Customer with wrong email → Agent approved anyway ❌
Test 2: Purchase 45 days ago → Agent ignored policy ❌
Test 3: Valid request → Agent asked unnecessary questions ❌

Like: Recording which puppies can't retrieve balls.
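The collect step fits in a few lines of Python. Everything here (`Scenario`, `collect`, the stand-in agent) is a hypothetical simplification of the tutorial's harness, not its actual API:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One evaluation case: a customer request plus the actions a correct agent takes."""
    request: str
    expected_actions: list

def collect(agent, scenarios):
    """Step 1: run the agent on every scenario and record what failed and why."""
    failures = []
    for s in scenarios:
        actions = agent(s.request)  # the agent reports which tools it called
        missing = [a for a in s.expected_actions if a not in actions]
        if missing:
            failures.append({"request": s.request, "missing": missing})
    return failures

# A stand-in agent that, like the seed prompt, jumps straight to the refund:
weak_agent = lambda request: ["process_refund"]
scenarios = [Scenario("Refund ORD-12345", ["verify_identity", "process_refund"])]
failures = collect(weak_agent, scenarios)  # records that verify_identity never ran
```

The failure records, not raw transcripts, are what the next step feeds to the reflection LLM.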

Step 2: Reflect (Understand Why)​

An LLM analyzes the failures:

"The prompt doesn't say to verify email BEFORE approving refunds.
The prompt doesn't mention the 30-day policy.
The prompt is too vague about when to ask questions."

Like: Understanding retriever dogs need strong jaw muscles and swimming ability.
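In code, reflection amounts to packaging the failure evidence into a prompt for a second LLM. A minimal sketch (the function name and format are illustrative; the real implementation sends something richer to Gemini):

```python
def build_reflection_prompt(current_prompt, failures):
    """Step 2: give an LLM the current prompt plus evidence of how it fails."""
    lines = [
        "You are improving an agent's system prompt.",
        f"Current prompt:\n{current_prompt}",
        "Observed failures:",
    ]
    for f in failures:
        lines.append(f"- Request: {f['request']!r} | Missing actions: {f['missing']}")
    lines.append("Explain which rules the prompt is missing and why.")
    return "\n".join(lines)

reflection_prompt = build_reflection_prompt(
    "Help customers with refunds",
    [{"request": "Refund ORD-12345", "missing": ["verify_identity"]}],
)
# Send reflection_prompt to your reflection model (e.g. Gemini via google-genai).
```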

Step 3: Evolve (Create Variations)​

Generate new prompts fixing the issues:

Variant A: Added "Always verify identity first"
Variant B: Added "Check 30-day return window"
Variant C: Combined both improvements

Like: Breeding puppies with stronger jaws AND better swimming.
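A naive way to produce those variants is to graft each candidate rule (distilled from the reflection step) onto the baseline, alone and in combination. A toy sketch, not GEPA's actual mutation operator:

```python
from itertools import combinations

def evolve(base_prompt, candidate_rules):
    """Step 3: one variant per rule, plus every pairwise combination of rules."""
    variants = []
    for n in (1, 2):
        for combo in combinations(candidate_rules, n):
            variants.append(base_prompt + "\n" + "\n".join(combo))
    return variants

rules = [
    "Always verify identity first.",
    "Check the 30-day return window.",
]
variants = evolve("Help customers with refunds.", rules)
# Variant A (identity), Variant B (30-day window), Variant C (both)
```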

Step 4: Evaluate (Test Performance)​

Run all variants against your test scenarios:

Seed prompt: 0/10 passed (0%)
Variant A:   4/10 passed (40%)
Variant B:   6/10 passed (60%)
Variant C:   9/10 passed (90%) ← Winner!

Like: Dog show results - Variant C wins.
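Steps 4 and 5 reduce to scoring each variant and taking the max. The sketch below wires in a toy agent whose behavior depends on which rules its prompt mentions, so the pass rates are deterministic; none of these names come from the tutorial's codebase:

```python
def evaluate(agent_factory, prompt, scenarios):
    """Step 4: fraction of scenarios where the prompted agent takes every required action."""
    agent = agent_factory(prompt)
    passed = sum(
        all(action in agent(s["request"]) for action in s["expected"])
        for s in scenarios
    )
    return passed / len(scenarios)

def select_best(agent_factory, prompts, scenarios):
    """Step 5: keep the highest-scoring prompt as the new baseline."""
    return max((evaluate(agent_factory, p, scenarios), p) for p in prompts)

def toy_agent_factory(prompt):
    """Stand-in agent: it follows exactly the rules its prompt mentions."""
    def agent(request):
        actions = ["process_refund"]
        if "verify identity" in prompt.lower():
            actions.insert(0, "verify_identity")
        if "30-day" in prompt:
            actions.insert(0, "check_return_policy")
        return actions
    return agent

scenarios = [
    {"request": "Refund ORD-1", "expected": ["verify_identity", "process_refund"]},
    {"request": "Refund ORD-2, bought 45 days ago", "expected": ["check_return_policy"]},
]
prompts = [
    "Help with refunds.",                                                    # seed
    "Help with refunds. Verify identity first.",                             # Variant A
    "Help with refunds. Verify identity first. Enforce the 30-day window.",  # Variant C
]
best_score, best_prompt = select_best(toy_agent_factory, prompts, scenarios)
# The variant carrying both rules wins with a perfect score
```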

Step 5: Select (Keep the Best)​

Variant C becomes your new baseline. Repeat from Step 1 with tougher tests.

Iteration 1: 0% → 90%
Iteration 2: 90% → 95%
Iteration 3: 95% → 98%
...converges at 99%

Like: Each generation of puppies gets better at the specific task.
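Put together, the whole loop fits in one function. The five helpers are passed in as parameters; the stubs at the bottom are deliberately trivial stand-ins (each "evolution" appends a rule, each rule is worth 50%) so the control flow is visible:

```python
def gepa_optimize(seed_prompt, scenarios, collect, reflect, evolve, evaluate,
                  max_iterations=5, target=0.99):
    """Repeat the 5-step loop until the score converges or the budget runs out."""
    best_prompt = seed_prompt
    best_score = evaluate(best_prompt, scenarios)
    for _ in range(max_iterations):
        if best_score >= target:
            break                                                 # converged
        failures = collect(best_prompt, scenarios)                # Step 1: Collect
        insights = reflect(best_prompt, failures)                 # Step 2: Reflect
        variants = evolve(best_prompt, insights)                  # Step 3: Evolve
        scored = [(evaluate(v, scenarios), v) for v in variants]  # Step 4: Evaluate
        top_score, top_prompt = max(scored)                       # Step 5: Select
        if top_score > best_score:
            best_score, best_prompt = top_score, top_prompt
    return best_prompt, best_score

best, score = gepa_optimize(
    seed_prompt="base prompt",
    scenarios=[],
    collect=lambda p, s: ["failure evidence"],
    reflect=lambda p, failures: "add a rule",
    evolve=lambda p, insights: [p + " rule"],
    evaluate=lambda p, s: min(1.0, 0.5 * p.count("rule")),
)
# Two iterations add two rules, then the loop stops at the target score
```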

Quick Start (5 Minutes)​

# 1. Setup
cd tutorial_implementation/tutorial_gepa_optimization
make setup

# 2. See evolution in action
make demo

# 3. (Optional) Try it yourself
export GOOGLE_API_KEY="your-key"
make dev # Open localhost:8000

That's it! You've seen GEPA work and can now experiment.

Tutorial Includes Both Simulated and Real GEPA

Simulated Demo (make demo - 2 minutes):

  • Shows GEPA concepts without LLM calls
  • Instant results, no API costs
  • Great for understanding the algorithm
  • Uses pattern matching to simulate agent behavior

Real GEPA (make real-demo - 5-10 minutes):

  • ✨ NEW: Uses actual LLM reflection with google-genai
  • Gemini LLM analyzes real failures
  • Generates truly optimized prompts
  • Costs $0.05-$0.10 per run
  • Production-ready implementation

What this tutorial provides:

  • ✅ Complete GEPA implementation (both simulated and real)
  • ✅ Working code for actual LLM-based optimization
  • ✅ Testable examples with real evaluation
  • ✅ Clear learning progression

For production GEPA optimization:

  • See the full research implementation in google/adk-python
  • Read comprehensive guides in research/gepa/ directory
  • Install DSPy: pip install dspy-ai
  • Reference the GEPA paper for methodology

Performance metrics cited (10-20% improvement, 35x fewer rollouts) are from the original research paper and represent results from the full research implementation, not this simplified tutorial.

From Tutorial to Production

Learning Path:

  1. ✅ Complete this tutorial (2 minutes) - understand concepts
  2. 📚 Read research/gepa/README.md (10 minutes) - full overview
  3. 🔬 Run research implementation (30-90 minutes) - real optimization
  4. 🚀 Deploy optimized prompt to production

The research implementation includes 640+ lines of production code with tau-bench integration, LLM-based reflection, Pareto frontier selection, and parallel execution. See google/adk-python for the full implementation.

Under the Hood (For the Curious)​

The demo uses a customer support agent with 3 simple tools:

  1. verify_customer_identity - Checks order ID + email match
  2. check_return_policy - Validates 30-day return window
  3. process_refund - Generates transaction ID
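The first two tools are a few lines each. A plausible sketch (the `ORDERS` dict stands in for whatever data store the real implementation uses; signatures are illustrative):

```python
from datetime import date, timedelta

# Hypothetical order store standing in for a real database lookup.
ORDERS = {
    "ORD-12345": {
        "email": "jane@example.com",
        "purchased": date.today() - timedelta(days=45),
    }
}

def verify_customer_identity(order_id: str, email: str) -> bool:
    """Tool 1: the order exists and the email on file matches."""
    order = ORDERS.get(order_id)
    return order is not None and order["email"] == email

def check_return_policy(order_id: str, window_days: int = 30) -> bool:
    """Tool 2: the purchase is still inside the return window."""
    order = ORDERS.get(order_id)
    return order is not None and (date.today() - order["purchased"]).days <= window_days

# For this 45-day-old order, identity can check out but the policy check fails:
verify_customer_identity("ORD-12345", "jane@example.com")  # True
check_return_policy("ORD-12345")                           # False
```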

The Seed Prompt (intentionally weak):

"You are a helpful customer support agent.
Help customers with their requests.
Be professional and efficient."

The Evolved Prompt (after GEPA):

"You are a professional customer support agent.

CRITICAL: Always follow this security protocol:
1. ALWAYS verify customer identity FIRST (order ID + email)
2. NEVER process any refund without identity verification
3. Only process refunds for orders within the 30-day return window

[...detailed procedures and policies...]"

Why It Works: The evolved prompt has explicit rules the seed prompt lacked.

Run the Tests​

make test  # 34 tests validate everything works

Try It Yourself​

cd tutorial_implementation/tutorial_gepa_optimization
make setup && make demo

What You'll See:

6 phases showing the complete evolution cycle:

  • Phase 1: Weak seed prompt
  • Phase 2: Tests fail (0/5 scenarios pass)
  • Phase 3: LLM reflects on failures
  • Phase 4: Evolved prompt generated
  • Phase 5: Tests pass (5/5 scenarios pass)
  • Phase 6: Results show 0% → 100% improvement

Want Interactive Mode?

make dev  # Opens ADK web interface on http://localhost:8000

Test these scenarios yourself:

  • "I bought a laptop but it broke, I want a refund" (valid request)
  • "Give me a refund for ORD-12345" (missing identity verification)
  • "I want my money back for the phone I bought 45 days ago" (outside window)

Common Issues​

Import Errors?

pip install --upgrade "google-genai>=1.15.0"

GOOGLE_API_KEY Not Set?

export GOOGLE_API_KEY=your_actual_api_key_here

Tests Failing?

make clean && make setup && make test

Key Takeaways​

1. GEPA Works Because:

  • Explores many prompt variations systematically
  • Uses real performance data to guide evolution
  • Combines successful elements from variants
  • Iterates until convergence

2. Seed Prompt Matters:

  • Too specific β†’ limited evolution
  • Too generic β†’ slow convergence
  • Start with reasonable baseline

3. Evaluation Dataset Quality:

  • Representative scenarios = robust improvements
  • Edge cases matter
  • Test on new data to validate

4. Avoid These Mistakes:

  • ❌ Over-fitting to test scenarios
  • ❌ Stopping too early
  • ❌ Ignoring edge cases
  • ❌ Not validating on fresh data
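The simplest guard against the first and last mistakes is a holdout split: evolve on one slice of scenarios and report the final score only on scenarios GEPA never saw. A minimal sketch (the seeded shuffle keeps the split reproducible):

```python
import random

def split_scenarios(scenarios, holdout_fraction=0.3, seed=42):
    """Hold out a slice of scenarios that GEPA never sees during evolution."""
    shuffled = scenarios[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]  # (evolution set, validation set)

scenarios = [f"scenario-{i}" for i in range(10)]
train, holdout = split_scenarios(scenarios)
# Evolve on `train`; quote the final score only on `holdout`.
```

If the holdout score lags far behind the evolution score, the prompt has overfit to the test scenarios and needs a broader evaluation set.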

Next Steps​

Apply GEPA to Your Own Agents​

Use the same pattern from this tutorial:

  1. Define your evaluation scenarios (real-world test cases)
  2. Create a weak seed prompt
  3. Run GEPA evolution
  4. Measure improvement

Validate with Standard Benchmarks​

Instead of only custom test scenarios, validate your GEPA-optimized prompts against established benchmarks:

HELM (Holistic Evaluation of Language Models)

  • Stanford's comprehensive evaluation framework
  • Measures accuracy, efficiency, bias, toxicity
  • 100+ scenarios across diverse domains
  • Install: pip install crfm-helm

# Evaluate your agent with HELM
helm-run --run-entries mmlu:subject=customer_service,model=your-agent \
  --suite gepa-validation --max-eval-instances 100
helm-summarize --suite gepa-validation

DSPy Evaluation Suite

  • Built-in prompt optimization metrics
  • Compare GEPA results against DSPy optimizers
  • GEPA is part of the DSPy ecosystem

Why standardized benchmarks matter:

  • Objective comparison against baselines
  • Reproducible results across teams
  • Track improvements over time
  • Validate GEPA gains on industry-standard tasks

Track Metrics Over Time​

  • Version control your evolved prompts
  • A/B test in production (seed vs evolved)
  • Monitor real-world performance
  • Re-run GEPA when metrics drop

Deploy to Production​

Once validated:

  • Use evolved prompt as your production baseline
  • Set up monitoring dashboards
  • Schedule periodic GEPA optimization
  • Continuously improve based on real user data

Additional Resources​

Official Research & Documentation​

  • GEPA Research Paper - Lakshya A Agrawal et al., Stanford NLP (July 2025)

    • Full paper: "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning"
    • Demonstrates 10-20% improvement over GRPO with 35x fewer rollouts
    • Comprehensive methodology and evaluation results
  • DSPy Framework - Stanford NLP (29.9k+ stars)

    • GEPA is part of the DSPy ecosystem
    • Documentation: dspy.ai
    • Install: pip install dspy-ai
    • Community: Discord Server

Evaluation Benchmarks​

  • HELM - Holistic Evaluation of Language Models

    • Stanford CRFM's comprehensive evaluation framework
    • 100+ scenarios across diverse domains
    • Leaderboards: crfm.stanford.edu/helm
  • BIG-bench - Beyond the Imitation Game

    • Google's diverse task evaluation suite
    • Collaborative benchmark with 200+ tasks

Community & Support​


💬 Join the Discussion

Have questions or feedback? Discuss this tutorial with the community on GitHub Discussions.