LLM Integration
π― Purpose: Master prompting, grounding, and reasoning techniques to get the best performance from language models.
π Source of Truth: google/adk-python/src/google/adk/models/ (ADK 1.15) + google/adk-python/src/google/adk/planners/ (ADK 1.15)
π Prompt Engineering Fundamentalsβ
The Prompt = Program Modelβ
Mental Model: A prompt is like programming an LLM:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PROMPT ANATOMY β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β [INSTR] SYSTEM/INSTRUCTION (Operating System) β
β "You are a helpful research assistant..." β
β β Defines agent personality and behavior β
β β Sets constraints and rules β
β β Provides role context β
β β
β [MEM] CONTEXT (Program Data) β
β "Current state: {topic}, Previous: {history}" β
β β Injected from session state β
β β Dynamic data via {key} syntax β
β β Tools available list β
β β
β [USER] USER MESSAGE (Function Call) β
β "Research quantum computing trends" β
β β The actual task/query β
β β Can be text, image, audio, video β
β β
β [TOOLS] TOOL RESULTS (Return Values) β
β "search_result: {...}" β
β β Feedback from tool execution β
β β Multi-turn conversation β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Instruction Patternsβ
Role-Based Instructions:
# Clear role definition
instruction = """
You are a senior software engineer who:
- Writes clean, maintainable code
- Follows Python best practices
- Adds helpful comments
- Explains complex concepts clearly
"""
# Task-specific role
instruction = """
You are a research analyst specializing in:
- Data-driven insights
- Statistical analysis
- Clear visualization recommendations
- Executive summary writing
"""
Constraint-Based Instructions:
# Behavioral constraints
instruction = """
Rules you must follow:
- Always cite sources for facts
- If unsure, say "I don't know" instead of guessing
- Keep responses under 3 paragraphs unless asked for detail
- Use examples when explaining concepts
- Ask clarifying questions when input is ambiguous
"""
# Output format constraints
instruction = """
Response format:
1. Executive summary (2-3 sentences)
2. Key findings (bullet points)
3. Recommendations (numbered list)
4. Next steps (if applicable)
"""
State-Aware Instructions:
# Dynamic context injection
instruction = """
You are helping {user:name} with their {current_task}.
Previous context: {conversation_summary}
Available tools: {tool_list}
Guidelines:
- Reference previous work when relevant
- Use appropriate tools for the task
- Maintain consistency with prior responses
"""
π Grounding Techniquesβ
Web Grounding (Real-World Facts)β
Mental Model: Grounding connects LLM imagination to reality:
# Automatic grounding with Gemini 2.0+
grounded_agent = Agent(
name="researcher",
model="gemini-2.0-flash", # Built-in search
instruction="Research current information using web search"
)
# Explicit search integration
search_agent = Agent(
name="web_researcher",
model="gemini-2.5-flash",
tools=[google_search],
instruction="Find and analyze current information"
)
Grounding Patterns:
- Factual Questions: "What is the current population of Tokyo?"
- Current Events: "What happened in the news today?"
- Real-time Data: "What's the current stock price of AAPL?"
- Local Information: "What's the weather like in San Francisco?"
Data Grounding (Internal Knowledge)β
Mental Model: Connect to your own data sources:
# Database grounding
db_agent = Agent(
name="data_analyst",
model="gemini-2.5-flash",
tools=[database_tool, analysis_tool],
instruction="Analyze company data to answer questions"
)
# Document grounding
doc_agent = Agent(
name="knowledge_assistant",
model="gemini-2.5-flash",
tools=[document_search_tool],
instruction="Find relevant information in company documents"
)
Location Grounding (Spatial Intelligence)β
Mental Model: Geographic awareness and spatial reasoning:
# Maps integration
location_agent = Agent(
name="location_assistant",
model="gemini-2.5-flash",
tools=[google_maps_grounding],
instruction="Help with location-based queries and directions"
)
# Capabilities:
# - Address resolution
# - Distance calculations
# - Points of interest
# - Route optimization
[BRAIN] Thinking & Reasoning Frameworksβ
Built-in Thinking (Native Model Capability)β
Mental Model: Model thinks step-by-step internally:
# Gemini 2.0+ thinking
thinking_agent = Agent(
name="reasoning_assistant",
model="gemini-2.0-flash",
instruction="Solve complex problems with clear reasoning",
thinking_config=ThinkingConfig(
include_thoughts=True, # Show reasoning in response
max_thoughts=10
)
)
# Thinking appears in response:
# "Let me think about this step by step:
# 1. First, I need to understand the problem...
# 2. Then, I should consider the constraints...
# 3. Finally, I'll provide the solution..."
Plan-ReAct Pattern (Structured Reasoning)β
Mental Model: Explicit planning and action framework:
from google.adk.planners import PlanReActPlanner
# Structured reasoning agent
reasoning_agent = Agent(
name="strategic_planner",
model="gemini-2.5-flash",
planner=PlanReActPlanner(),
tools=[research_tool, analysis_tool],
instruction="Plan and execute complex multi-step tasks"
)
# Execution pattern:
# [PLANNING] 1. Research topic 2. Analyze data 3. Create report
# [REASONING] I should start with research to gather facts...
# [ACTION] Call research_tool("quantum computing")
# [OBSERVATION] Found 15 relevant papers...
# [REPLANNING] Now analyze the data...
When to Use Each Approachβ
| Scenario | Built-in Thinking | Plan-ReAct | None |
|---|---|---|---|
| Complex reasoning needed | β | β | β |
| Want visible reasoning | β | β | β |
| Multi-step problems | β | β | π€ |
| Need replanning | β | β | β |
| Simple queries | β | β | β |
| Speed critical | β | β | β |
[FLOW] Multi-Turn Conversationsβ
Context Managementβ
Mental Model: Maintain conversation state across turns:
# State-aware agent
conversational_agent = Agent(
name="assistant",
model="gemini-2.5-flash",
instruction="""
You are a helpful assistant. Previous conversation:
{conversation_history}
Current user: {user:name}
Current task: {current_task}
""",
output_key="response"
)
# State tracks conversation flow
state['conversation_history'] += f"User: {user_input}\nAssistant: {response}\n"
state['current_task'] = extract_task(user_input)
Tool Call Chainsβ
Mental Model: LLMs can call multiple tools in sequence:
# Multi-tool agent
research_agent = Agent(
name="comprehensive_researcher",
model="gemini-2.5-flash",
tools=[web_search, database_query, analysis_tool],
instruction="""
Research thoroughly using all available tools:
1. Search the web for current information
2. Query internal databases for company data
3. Analyze and synthesize findings
"""
)
# LLM can generate multiple tool calls:
# 1. web_search("topic overview")
# 2. database_query("internal data")
# 3. analysis_tool("combine results")
Error Recoveryβ
Mental Model: Handle failures gracefully and recover:
# Robust agent with error handling
robust_agent = Agent(
name="reliable_assistant",
model="gemini-2.5-flash",
instruction="""
If a tool fails, try alternative approaches:
- Web search if database is down
- Simplified analysis if detailed data unavailable
- Clear explanation of limitations
Always provide value even with partial information.
""",
tools=[primary_tool, fallback_tool]
)
π¨ Advanced Prompting Techniquesβ
Chain of Thought Promptingβ
Mental Model: Guide step-by-step reasoning:
# Explicit reasoning steps
reasoning_instruction = """
Solve this problem step by step:
1. Understand the problem: What is being asked?
2. Identify key information: What data do I have?
3. Consider approaches: What methods could work?
4. Evaluate options: Which approach is best?
5. Execute solution: Implement the chosen approach
6. Verify result: Does this make sense?
Show your work at each step.
"""
Few-Shot Learningβ
Mental Model: Learn from examples in the prompt:
# Example-based instruction
few_shot_instruction = """
Classify the sentiment of text as positive, negative, or neutral.
Examples:
Text: "I love this product!" β positive
Text: "This is terrible quality" β negative
Text: "It's okay, nothing special" β neutral
Now classify: "{user_text}"
"""
Meta-Promptingβ
Mental Model: Prompt about how to prompt:
# Self-improving prompts
meta_instruction = """
First, analyze what type of question this is:
- Factual: Look for specific information
- Analytical: Break down components
- Creative: Generate novel ideas
- Advisory: Provide recommendations
Then, choose the appropriate response strategy:
- Factual: Cite sources, be precise
- Analytical: Structure with sections
- Creative: Brainstorm multiple options
- Advisory: Consider pros/cons, provide rationale
Question: {user_question}
"""
π Performance Optimizationβ
Model Selection Strategyβ
Mental Model: Right model for the right task:
# Model routing based on complexity
def select_model(query):
if len(query.split()) < 10: # Simple
return "gemini-2.5-flash" # Fast, cheap
elif "analyze" in query.lower(): # Complex
return "gemini-2.5-pro" # Better reasoning
else:
return "gemini-2.0-flash" # Balanced
# Dynamic model selection
agent = Agent(
name="adaptive_assistant",
model=select_model(user_query), # Dynamic selection
instruction="Provide the best response for this query type"
)
Context Window Managementβ
Mental Model: Keep relevant information, discard irrelevant:
# Context pruning
def prune_context(history, max_tokens=4000):
"""Keep recent and relevant messages"""
relevant = []
total_tokens = 0
for msg in reversed(history):
if total_tokens + len(msg) > max_tokens:
break
if is_relevant(msg): # Custom relevance function
relevant.insert(0, msg)
total_tokens += len(msg)
return relevant
# Efficient context usage
agent = Agent(
name="efficient_assistant",
model="gemini-2.5-flash",
instruction="Use context efficiently: {pruned_history}"
)
Caching Strategiesβ
Mental Model: Reuse computations for common queries:
# Response caching
cache = {}
def get_cached_response(query):
key = hash(query)
if key in cache:
return cache[key]
response = agent.run(query)
cache[key] = response
return response
# Semantic caching
def semantic_cache(query):
"""Cache based on meaning, not exact text"""
intent = extract_intent(query)
if intent in cache:
return adapt_cached_response(cache[intent], query)
π Debugging LLM Integrationβ
Response Analysisβ
Track LLM Behavior:
# Enable detailed logging
import logging
logging.getLogger('google.adk.models').setLevel(logging.DEBUG)
# Inspect LLM calls
result = await runner.run_async(query)
for event in result.events:
if event.type == 'llm_request':
print(f"Prompt: {event.prompt}")
print(f"Model: {event.model}")
elif event.type == 'llm_response':
print(f"Response: {event.response}")
print(f"Tokens: {event.token_count}")
Prompt Iterationβ
A/B Testing Prompts:
# Test different prompt versions
prompts = {
'concise': "Answer briefly: {query}",
'detailed': "Provide comprehensive answer: {query}",
'structured': "Structure answer with sections: {query}"
}
for version, prompt in prompts.items():
agent = Agent(model="gemini-2.5-flash", instruction=prompt)
result = agent.run(test_query)
score = evaluate_response(result)
print(f"{version}: {score}")
Common Issues & Solutionsβ
| Issue | Symptom | Solution |
|---|---|---|
| Hallucinations | Makes up facts | Add grounding tools, fact-checking |
| Verbose responses | Too much text | Set length limits, structured formats |
| Off-topic answers | Ignores constraints | Clear instructions, role definitions |
| Inconsistent style | Varies between responses | Consistent persona, examples |
| Tool misuse | Wrong tool for task | Better tool descriptions, examples |
| Context loss | Forgets conversation | State management, context injection |
π Related Topicsβ
- Tools & Capabilities β: What LLMs can call
- Workflows & Orchestration β: How LLMs coordinate tasks
- Decision Frameworks β: When to use different techniques
π― Key Takeawaysβ
- Prompting: Clear instructions, constraints, and examples
- Grounding: Connect to real world via search, data, location
- Thinking: Use built-in or explicit reasoning frameworks
- Context: Manage conversation state and relevance
- Optimization: Right model, efficient context, caching
- Debugging: Monitor LLM behavior and iterate prompts
π Next: Learn about Production & Deployment to run these integrations at scale.