# claude-py: Python SDK for Claude API with Streaming Support

*Reverse-engineered from Claude Code CLI v2.1.7 with exact API replication.*
## ✨ Features

- 🚀 **Streaming by default** - lower latency, better user experience
- ⚡ **Real-time text display** - see responses as they are generated
- 🔄 **Non-streaming mode** - available for batch processing
- 🛠️ **Full tool support** - all 17 tools from Claude Code
- 💾 **Prompt caching** - up to 90% cost reduction on multi-turn conversations
- 🎯 **Exact API replication** - same headers, metadata, and behavior as Claude Code
- 🔒 **OAuth support** - both OAuth tokens and API keys
- 📊 **HTTP/2** - modern, efficient protocol
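The prompt-caching figure refers to Anthropic's discount on cached prompt reads. As a loose illustration of the mechanism (the `cache_control` field comes from the public Messages API; whether this SDK exposes it directly is not shown here), a system block can be marked cacheable like so:

```python
# Illustrative only: how the Anthropic Messages API marks a prompt
# block as cacheable. Field names come from the public API docs,
# not from this SDK's internals.
def build_cached_system(system_text: str) -> list[dict]:
    """Wrap a system prompt in a block carrying a cache_control marker."""
    return [
        {
            "type": "text",
            "text": system_text,
            # "ephemeral" is the cache type the public API accepts
            "cache_control": {"type": "ephemeral"},
        }
    ]

blocks = build_cached_system("You are a helpful assistant.")
```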
## 🚀 Quick Start

### Installation

```bash
# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```
### Basic Usage - Streaming (Recommended)

```python
import asyncio
from claude import ClaudeAgentClient

async def main():
    async with ClaudeAgentClient() as client:
        # Stream the response in real time
        async for chunk in client.send_message_stream("Tell me a joke"):
            if chunk.text_delta:
                print(chunk.text_delta, end='', flush=True)
        print()

asyncio.run(main())
```

Output:

```
Why did the programmer quit his job?
Because he didn't get arrays! 😄
```

↑ Text appears character by character as it's generated.
### Non-Streaming Mode

```python
import asyncio
from claude import ClaudeAgentClient, AgentOptions

async def main():
    options = AgentOptions(stream=False)
    async with ClaudeAgentClient(options=options) as client:
        # Get the complete response at once
        response = await client.send_message("What is 2+2?")
        print(response.content[0]['text'])

asyncio.run(main())
```
### ChatClient (Simple Chat Completion API)

`ChatClient` provides a lightweight, stateless interface similar to OpenAI's chat
completions. It does not replicate Claude Code's full agent protocol (no tools,
no session history, no system prompt injection), but it carries the same
fingerprinting headers, so the requests look identical on the wire.

```python
import asyncio
from claude import ChatClient

async def main():
    async with ChatClient() as c:
        # One-shot (non-streaming)
        r = await c.chat("Explain monads in one sentence")
        print(r.content[0]["text"])

        # Streaming
        async for chunk in c.stream("Write a haiku about recursion"):
            if chunk.text_delta:
                print(chunk.text_delta, end="", flush=True)
        print()

        # Streaming, return text only
        text = await c.collect("Say hello")
        print(text)

        # Per-call overrides
        r = await c.chat(
            "Who are you?",
            system="You are a pirate.",
            model="claude-haiku-4-5-20251001",
            max_tokens=256,
        )
        print(r.content[0]["text"])

        # Multi-turn via an explicit message list
        r = await c.chat([
            {"role": "user", "content": "My name is Alice"},
            {"role": "assistant", "content": "Hi Alice!"},
            {"role": "user", "content": "What is my name?"},
        ])
        print(r.content[0]["text"])

asyncio.run(main())
```
`ChatClient` reads `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` from the
environment, just like `ClaudeAgentClient`.
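For reference, that environment lookup can be sketched offline; the helper below is hypothetical, and the OAuth-token-first precedence is an assumption, not taken from the SDK source:

```python
import os

def resolve_credential(env=None):
    """Return (variable_name, value) for the first credential found.

    Hypothetical helper: the OAuth-token-first precedence is an
    assumption, not taken from the SDK source.
    """
    env = os.environ if env is None else env
    for name in ("CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY"):
        value = env.get(name)
        if value:
            return name, value
    return None

cred = resolve_credential({"ANTHROPIC_API_KEY": "sk-ant-example"})
```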
## 📖 Usage Examples

### 1. Simple Query

```python
from claude import query

# One-shot query
result = await query("Explain quantum computing in one sentence")
print(result.content[0]['text'])
```
### 2. Multi-Turn Conversation

```python
async with ClaudeAgentClient() as client:
    # First message
    async for chunk in client.send_message_stream("What's the capital of France?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()

    # Second message (continues the conversation)
    async for chunk in client.send_message_stream("What's the population?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()
```
### 3. Streaming with Callback

```python
def on_text(text: str):
    """Process text as it arrives."""
    # Update UI, log, analyze, etc.
    print(text, end='', flush=True)

async for chunk in client.send_message_stream("Write a haiku", on_text=on_text):
    # Text is already printed by the callback;
    # chunks could also be processed here
    pass
```
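Since `on_text` is just a callable, one stream can drive several consumers. A self-contained sketch (fake chunks, no network; `FakeChunk` is a stand-in for `StreamChunk`):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FakeChunk:
    """Stand-in for StreamChunk; only the field used here."""
    text_delta: Optional[str]

def fan_out(*handlers: Callable[[str], None]) -> Callable[[str], None]:
    """Build one on_text callback that forwards to several handlers."""
    def on_text(text: str) -> None:
        for handler in handlers:
            handler(text)
    return on_text

ui_log, file_log = [], []
on_text = fan_out(ui_log.append, file_log.append)

# Simulate a stream; non-text events carry text_delta=None
for chunk in [FakeChunk("Hel"), FakeChunk(None), FakeChunk("lo")]:
    if chunk.text_delta:
        on_text(chunk.text_delta)
```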
### 4. Tool Execution

```python
from claude import ToolResult

async with ClaudeAgentClient() as client:
    # The assistant requests a tool use
    async for chunk in client.send_message_stream("Read the file data.txt"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
        # Detect tool uses in chunks...

    # Execute the tool (simplified)
    with open('data.txt', 'r') as f:
        content = f.read()

    # Send the tool result back
    tool_result = ToolResult(
        type="tool_result",
        tool_use_id="toolu_xxx",
        content=content,
    )
    async for chunk in client.send_message_stream("", tool_results=[tool_result]):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
```
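The crux of the tool loop is echoing the assistant's `tool_use_id` back with the result. A standalone sketch of that payload shape, using an illustrative dataclass rather than the SDK's own `ToolResult`:

```python
from dataclasses import dataclass, asdict

@dataclass
class ToolResultSketch:
    """Illustrative stand-in for the SDK's ToolResult."""
    tool_use_id: str
    content: str
    type: str = "tool_result"

# Pair the result with the id the assistant sent in its tool_use block
result = ToolResultSketch(tool_use_id="toolu_xxx", content="file contents here")
payload = asdict(result)
```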
## 📊 Streaming vs Non-Streaming

### Streaming (Default)

✅ Advantages:

- Lower perceived latency - text appears immediately
- Better user experience - users can read while the model generates
- Real-time feedback - see progress as it happens
- Cancelable - can interrupt mid-generation

❌ Trade-offs:

- Slightly more complex to handle
- Requires async iteration

Use cases: interactive applications, chat UIs, real-time analysis.

### Non-Streaming

✅ Advantages:

- Simpler code - a single `await`
- Complete message at once
- Easier to cache or store

❌ Trade-offs:

- Higher perceived latency - the user waits for the complete response
- No progress indication
- Cannot cancel mid-generation

Use cases: batch processing, automated scripts, testing.
## ⚙️ Configuration

### AgentOptions

```python
from claude import AgentOptions

options = AgentOptions(
    # Model selection
    model="claude-sonnet-4-5-20250929",  # sonnet, opus, haiku

    # Token limit
    max_tokens=32000,  # Claude Code default

    # Streaming (default: True)
    stream=True,

    # Tool filtering
    allowed_tools=["Read", "Write", "Bash"],
    disallowed_tools=["WebSearch"],

    # Custom system prompt
    system_prompt="You are a helpful assistant.",
    append_system_prompt="Additional instructions...",

    # Session management
    session_id="custom-uuid",  # Or None for auto-generated

    # Working directory
    cwd="/path/to/workspace",
)

client = ClaudeAgentClient(options=options)
```
## 🔧 Advanced Features

### Session Persistence

```python
# Sessions are client-side - just keep the client instance alive
async with ClaudeAgentClient() as client:
    # Message 1
    await client.send_message("Hello")

    # Message 2 (continues the conversation)
    await client.send_message("How are you?")

    # All messages are stored in client.messages
    print(f"Conversation has {len(client.messages)} messages")
```
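Because history lives client-side, persisting a session is just serializing the message list. A minimal sketch, assuming messages follow the standard role/content shape:

```python
import json

# A history shaped like client.messages (standard role/content pairs)
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "How are you?"},
]

# Round-trip through JSON; a real app would write to disk or a database
saved = json.dumps(history)
restored = json.loads(saved)
```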
### Token Counting

```python
# Count tokens for the current conversation
token_count = await client.count_tokens()
print(f"Current conversation uses {token_count} tokens")
```
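`count_tokens()` requires an API round-trip. For a quick offline ballpark, a common rule of thumb for English text is roughly four characters per token (an approximation, not the model's real tokenizer):

```python
def rough_token_estimate(messages: list) -> int:
    """Very rough offline estimate: ~4 characters per token for English.
    For exact counts, use count_tokens(), which asks the API."""
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    return max(1, chars // 4)

estimate = rough_token_estimate([{"role": "user", "content": "Hello, Claude!"}])
```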
### Custom Headers

The SDK uses the exact headers sent by the Claude Code CLI:

- `anthropic-beta`: `claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14`
- `x-stainless-*`: all Stainless SDK headers
- `user-agent`: `claude-cli/2.1.7 (external, cli)`

All captured from mitmproxy traffic analysis.
## 📁 Examples & Tests

See the `.agents/tests/` directory:

- `streaming_basic.py` - simple streaming demo
- `streaming_with_callback.py` - using callbacks for processing
- `compare_streaming_vs_non_streaming.py` - performance comparison
- `streaming_chat.py` - full interactive chat app with tools
- `debug_streaming_chat.py` - comprehensive debug tool

Run examples:

```bash
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Run examples
python .agents/tests/streaming_basic.py
python .agents/tests/streaming_chat.py
python .agents/tests/debug_streaming_chat.py
```
## 🔐 Authentication

### API Keys

```bash
export ANTHROPIC_API_KEY="sk-ant-api03-YOUR_KEY_HERE"
```

### OAuth Tokens

```bash
export ANTHROPIC_API_KEY="sk-ant-oat01-YOUR_TOKEN_HERE"
```

Both work identically - no code changes needed!
## 🏗️ Architecture

### Streaming Flow

```
User Request
    ↓
Client.send_message_stream()
    ↓
POST /v1/messages?beta=true
{
  "stream": true,
  "messages": [...],
  ...
}
    ↓
Server-Sent Events (SSE) stream
    ↓
event: message_start
event: content_block_start
event: content_block_delta   ← text chunks arrive
event: content_block_delta   ← more text
event: content_block_stop
event: message_stop
    ↓
StreamParser accumulates chunks
    ↓
Yields StreamChunk objects
    ↓
User processes in real time
```
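The accumulation step above can be sketched with plain SSE lines; the event shapes follow the public Messages API stream format, and this is not the SDK's actual parser:

```python
import json

def accumulate_sse(lines: list) -> str:
    """Collect text_delta payloads from a raw SSE line stream."""
    text, event = [], None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: ") and event == "content_block_delta":
            delta = json.loads(line[len("data: "):]).get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)

# Sample stream shaped like the flow diagram above
sample = [
    "event: message_start",
    'data: {"type": "message_start"}',
    "event: content_block_delta",
    'data: {"delta": {"type": "text_delta", "text": "Hel"}}',
    "event: content_block_delta",
    'data: {"delta": {"type": "text_delta", "text": "lo"}}',
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
streamed = accumulate_sse(sample)
```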
### Non-Streaming Flow

```
User Request
    ↓
Client.send_message()
    ↓
POST /v1/messages?beta=true
{
  "stream": false,
  "messages": [...],
  ...
}
    ↓
Complete JSON response
    ↓
AssistantMessage returned
```
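The two flows differ only in the `stream` flag of the request body. A sketch of the shared skeleton (only the fields shown in the diagrams; everything else is elided):

```python
def build_body(messages: list, *, stream: bool,
               model: str = "claude-sonnet-4-5-20250929",
               max_tokens: int = 32000) -> dict:
    """Request body skeleton shared by both flows; only `stream` differs."""
    return {"model": model, "max_tokens": max_tokens,
            "stream": stream, "messages": messages}

msgs = [{"role": "user", "content": "Hello"}]
streaming_body = build_body(msgs, stream=True)
batch_body = build_body(msgs, stream=False)
```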
## 📚 API Reference

### ClaudeAgentClient

```python
class ClaudeAgentClient:
    async def send_message_stream(
        self,
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
        on_text: Optional[Callable[[str], None]] = None,
    ) -> AsyncIterator[StreamChunk]:
        """Stream response chunks."""

    async def send_message(
        self,
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
    ) -> AssistantMessage:
        """Get the complete response."""

    async def count_tokens(self) -> int:
        """Count tokens in the conversation."""

    async def close(self):
        """Close the HTTP client."""
```

### StreamChunk

```python
class StreamChunk:
    event_type: str                # "message_start", "content_block_delta", etc.
    data: Dict[str, Any]           # Raw event data
    text_delta: Optional[str]      # Text for this chunk (if a text event)
    content_block: Optional[Dict]  # Content block data (if applicable)
```
## 🎯 Streaming Best Practices

1. **Always use streaming for interactive UIs**

   ```python
   # Good - streaming
   async for chunk in client.send_message_stream(user_input):
       display_text(chunk.text_delta)

   # Bad - non-streaming (user waits)
   response = await client.send_message(user_input)
   display_text(response.content[0]['text'])
   ```

2. **Use callbacks for real-time processing**

   ```python
   def process_chunk(text):
       update_ui(text)
       log_to_file(text)
       analyze_sentiment(text)

   async for chunk in client.send_message_stream(msg, on_text=process_chunk):
       pass
   ```

3. **Handle errors gracefully**

   ```python
   try:
       async for chunk in client.send_message_stream(msg):
           if chunk.text_delta:
               print(chunk.text_delta, end='')
   except httpx.HTTPError as e:
       print(f"\nError: {e}")
   ```

4. **Accumulate for the complete message**

   ```python
   from claude.streaming import StreamParser

   parser = StreamParser()
   async for chunk in client.send_message_stream(msg):
       parser.add_chunk(chunk)
       print(chunk.text_delta, end='')

   complete_message = parser.to_dict()
   save_to_database(complete_message)
   ```
## 🚦 Performance

### Latency Comparison

| Mode | Time to First Byte | Total Time | User Experience |
|---|---|---|---|
| Streaming | ~0.2 s | ~2.5 s | ✅ Sees text immediately |
| Non-streaming | ~2.5 s | ~2.5 s | ❌ Waits 2.5 s for anything |

Streaming wins for UX: users can start reading while the rest generates. (Timings are illustrative; actual latency depends on the model and prompt.)

### Cost

Both modes cost the same per token, so streaming gives a better experience at no extra cost.
## 🐛 Troubleshooting

### Import Error: No module named 'httpx_sse'

```bash
# Install with HTTP/2 and SSE support
uv pip install "httpx[http2]>=0.27.0" "httpx-sse>=0.4.0"
```

### Streaming Not Working

Check that `stream=True` in the options:

```python
options = AgentOptions(stream=True)
```

### Text Not Appearing in Real Time

Make sure you flush the output:

```python
print(chunk.text_delta, end='', flush=True)  # flush=True is important!
```
## 📁 Project Structure

```
claude-py/
├── src/claude/                  # Core SDK implementation
│   ├── client.py                # Main client with streaming support
│   ├── streaming.py             # SSE parser and StreamParser
│   ├── types.py                 # Type definitions
│   ├── tools.json               # All 17 tools from Claude Code
│   └── system_prompt.json       # 13 KB system prompt from Claude Code
├── .agents/                     # Self-generated documentation & tests
│   ├── *.md                     # All internal documentation
│   ├── INDEX.md                 # Directory navigation guide
│   ├── verify_installation.py   # Verify SDK setup
│   ├── reinstall.sh             # Reinstall with latest fixes
│   ├── data/                    # Analysis assets
│   │   ├── *.mitm               # mitmproxy traffic captures
│   │   ├── real_system_prompt.json
│   │   └── real_tools.json
│   ├── research/                # Research notes & analysis
│   └── tests/                   # All examples, tests & debug tools
│       ├── streaming_*.py       # Streaming examples
│       ├── debug_*.py           # Debug utilities
│       ├── test_*.py            # Test scripts
│       └── README.md            # Test documentation
├── README.md                    # This file (user documentation)
└── pyproject.toml               # Package configuration
```
## 📖 Further Reading

### User Documentation

- `.agents/QUICKSTART.md` - 60-second quick start guide
- `.agents/STREAMING_GUIDE.md` - deep dive into the streaming implementation
- `.agents/INSTALL.md` - installation & troubleshooting
- Anthropic API documentation

### Development & Internal Docs

- `.agents/INDEX.md` - navigation guide for the `.agents/` directory
- `.agents/tests/README.md` - test & example documentation
- `.agents/FINGERPRINTING_FIX.md` - how API fingerprinting was resolved
- `.agents/IMPLEMENTATION_SUMMARY.md` - technical architecture
- `.agents/research/SESSION_MANAGEMENT_REPORT.md` - session/forking architecture
## 🎉 Migration from v0.1 (Non-Streaming)

```python
# Old (v0.1) - non-streaming only
response = await client.send_message("Hello")
print(response.content[0]['text'])

# New (v0.2) - streaming by default
async for chunk in client.send_message_stream("Hello"):
    if chunk.text_delta:
        print(chunk.text_delta, end='', flush=True)

# Or keep the non-streaming behavior
options = AgentOptions(stream=False)
client = ClaudeAgentClient(options=options)
response = await client.send_message("Hello")  # Same as v0.1
```
## ⚖️ License

This SDK is a reverse-engineered implementation based on Claude Code CLI traffic analysis. Use at your own discretion.

## 🙏 Credits

Built through analysis of Claude Code CLI v2.1.7 using mitmproxy traffic captures.

---

**Ready to stream?** Check out `.agents/tests/streaming_chat.py` for a complete working chat app! 🚀

**Need help?** See `.agents/INDEX.md` for complete documentation navigation.
