# claude-py: Python SDK for Claude API with Streaming Support

**Reverse engineered from Claude Code CLI v2.1.7 with exact API replication**

## ✨ Features

- **🚀 Streaming by default** - Lower latency, better user experience
- **⚡ Real-time text display** - See responses as they're generated
- **🔄 Non-streaming mode** - Available for batch processing
- **🛠️ Full tool support** - All 17 tools from Claude Code
- **💾 Prompt caching** - 90% cost reduction on multi-turn conversations
- **🎯 Exact API replication** - Same headers, metadata, and behavior as Claude Code
- **🔒 OAuth support** - Both OAuth tokens and API keys
- **📊 HTTP/2** - Modern, efficient protocol

## 🚀 Quick Start

### Installation

```bash
# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```

### Basic Usage - Streaming (Recommended)

```python
import asyncio
from claude import ClaudeAgentClient

async def main():
    async with ClaudeAgentClient() as client:
        # Stream the response in real time
        async for chunk in client.send_message_stream("Tell me a joke"):
            if chunk.text_delta:
                print(chunk.text_delta, end='', flush=True)
        print()

asyncio.run(main())
```

**Output:**

```
Why did the programmer quit his job?
Because he didn't get arrays! 😄
```

↑ _Text appears character-by-character as it's generated_

### Non-Streaming Mode

```python
import asyncio
from claude import ClaudeAgentClient, AgentOptions

async def main():
    options = AgentOptions(stream=False)
    async with ClaudeAgentClient(options=options) as client:
        # Get the complete response at once
        response = await client.send_message("What is 2+2?")
        print(response.content[0]['text'])

asyncio.run(main())
```

### ChatClient (Simple Chat Completion API)

`ChatClient` provides a lightweight, stateless interface similar to OpenAI's chat completions.
It does **not** replicate Claude Code's full agent protocol (no tools, no session history, no system prompt injection), but it carries the same fingerprinting headers, so the requests look identical on the wire.

```python
import asyncio
from claude import ChatClient

async def main():
    async with ChatClient() as c:
        # One-shot (non-streaming)
        r = await c.chat("Explain monads in one sentence")
        print(r.content[0]["text"])

        # Streaming
        async for chunk in c.stream("Write a haiku about recursion"):
            if chunk.text_delta:
                print(chunk.text_delta, end="", flush=True)
        print()

        # Streaming, return text only
        text = await c.collect("Say hello")
        print(text)

        # Per-call overrides
        r = await c.chat(
            "Who are you?",
            system="You are a pirate.",
            model="claude-haiku-4-5-20251001",
            max_tokens=256,
        )
        print(r.content[0]["text"])

        # Multi-turn via an explicit message list
        r = await c.chat([
            {"role": "user", "content": "My name is Alice"},
            {"role": "assistant", "content": "Hi Alice!"},
            {"role": "user", "content": "What is my name?"},
        ])
        print(r.content[0]["text"])

asyncio.run(main())
```

`ChatClient` reads `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` from the environment, just like `ClaudeAgentClient`.

## 📖 Usage Examples

### 1. Simple Query

```python
from claude import query

# One-shot query
result = await query("Explain quantum computing in one sentence")
print(result.content[0]['text'])
```

### 2. Multi-Turn Conversation

```python
async with ClaudeAgentClient() as client:
    # First message
    async for chunk in client.send_message_stream("What's the capital of France?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()

    # Second message (continues the conversation)
    async for chunk in client.send_message_stream("What's the population?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()
```

### 3. Streaming with Callback

```python
def on_text(text: str):
    """Process text as it arrives."""
    # Update UI, log, analyze, etc.
    print(text, end='', flush=True)

async for chunk in client.send_message_stream("Write a haiku", on_text=on_text):
    # Text is already printed by the callback;
    # you could also process chunks here
    pass
```

### 4. Tool Execution

```python
from claude import ToolResult

async with ClaudeAgentClient() as client:
    # Assistant requests tool use
    async for chunk in client.send_message_stream("Read the file data.txt"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
        # Detect tool uses in chunks...

    # Execute the tool (simplified)
    with open('data.txt', 'r') as f:
        content = f.read()

    # Send the tool result back
    tool_result = ToolResult(
        type="tool_result",
        tool_use_id="toolu_xxx",
        content=content,
    )
    async for chunk in client.send_message_stream("", tool_results=[tool_result]):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
```

## 📊 Streaming vs Non-Streaming

### Streaming (Default)

✅ **Advantages:**

- Lower perceived latency - text appears immediately
- Better user experience - users can read while the model generates
- Real-time feedback - see progress as it happens
- Cancelable - can interrupt mid-generation

❌ **Trade-offs:**

- Slightly more complex to handle
- Requires async iteration

**Use cases:** Interactive applications, chat UIs, real-time analysis

### Non-Streaming

✅ **Advantages:**

- Simpler code - a single await
- Complete message at once
- Easier to cache/store

❌ **Trade-offs:**

- Higher perceived latency - the user waits for the complete response
- No progress indication
- Can't cancel mid-generation

**Use cases:** Batch processing, automated scripts, testing

## ⚙️ Configuration

### AgentOptions

```python
from claude import AgentOptions

options = AgentOptions(
    # Model selection
    model="claude-sonnet-4-5-20250929",  # sonnet, opus, haiku

    # Token limit
    max_tokens=32000,  # Claude Code default

    # Streaming (default: True)
    stream=True,

    # Tool filtering
    allowed_tools=["Read", "Write", "Bash"],
    disallowed_tools=["WebSearch"],

    # Custom system prompt
    system_prompt="You are a helpful assistant.",
    append_system_prompt="Additional instructions...",

    # Session management
    session_id="custom-uuid",  # Or None for auto-generated

    # Working directory
    cwd="/path/to/workspace",
)

client = ClaudeAgentClient(options=options)
```

## 🔧 Advanced Features

### Session Persistence

```python
# Sessions are client-side - just keep the client instance alive
async with ClaudeAgentClient() as client:
    # Message 1
    await client.send_message("Hello")

    # Message 2 (continues the conversation)
    await client.send_message("How are you?")

    # All messages are stored in client.messages
    print(f"Conversation has {len(client.messages)} messages")
```

### Token Counting

```python
# Count tokens for the current conversation
token_count = await client.count_tokens()
print(f"Current conversation uses {token_count} tokens")
```

### Custom Headers

The SDK sends the exact headers used by the Claude Code CLI:

- `anthropic-beta`: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14
- `x-stainless-*`: All Stainless SDK headers
- `user-agent`: claude-cli/2.1.7 (external, cli)

All captured from mitmproxy traffic analysis.

## 📁 Examples & Tests

See the `.agents/tests/` directory:

- `streaming_basic.py` - Simple streaming demo
- `streaming_with_callback.py` - Using callbacks for processing
- `compare_streaming_vs_non_streaming.py` - Performance comparison
- `streaming_chat.py` - Full interactive chat app with tools
- `debug_streaming_chat.py` - Comprehensive debug tool

Run the examples:

```bash
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Run examples
python .agents/tests/streaming_basic.py
python .agents/tests/streaming_chat.py
python .agents/tests/debug_streaming_chat.py
```

## 🔐 Authentication

### API Keys

```bash
export ANTHROPIC_API_KEY="sk-ant-api03-YOUR_KEY_HERE"
```

### OAuth Tokens

```bash
export ANTHROPIC_API_KEY="sk-ant-oat01-YOUR_TOKEN_HERE"
```

Both work identically - no code changes needed! (OAuth tokens can also be supplied via `CLAUDE_CODE_OAUTH_TOKEN`.)
## 🏗️ Architecture

### Streaming Flow

```
User Request
  ↓
Client.send_message_stream()
  ↓
POST /v1/messages?beta=true
  { "stream": true, "messages": [...], ... }
  ↓
Server-Sent Events (SSE) Stream
  ↓
event: message_start
event: content_block_start
event: content_block_delta   ← Text chunks arrive
event: content_block_delta   ← More text
event: content_block_stop
event: message_stop
  ↓
StreamParser accumulates chunks
  ↓
Yields StreamChunk objects
  ↓
User processes in real time
```

### Non-Streaming Flow

```
User Request
  ↓
Client.send_message()
  ↓
POST /v1/messages?beta=true
  { "stream": false, "messages": [...], ... }
  ↓
Complete JSON Response
  ↓
AssistantMessage returned
```

## 📚 API Reference

### ClaudeAgentClient

```python
class ClaudeAgentClient:
    async def send_message_stream(
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
        on_text: Optional[Callable[[str], None]] = None,
    ) -> AsyncIterator[StreamChunk]:
        """Stream response chunks."""

    async def send_message(
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
    ) -> AssistantMessage:
        """Get the complete response."""

    async def count_tokens() -> int:
        """Count tokens in the conversation."""

    async def close():
        """Close the HTTP client."""
```

### StreamChunk

```python
class StreamChunk:
    event_type: str                # "message_start", "content_block_delta", etc.
    data: Dict[str, Any]           # Raw event data
    text_delta: Optional[str]      # Text for this chunk (if text event)
    content_block: Optional[Dict]  # Content block data (if applicable)
```

## 🎯 Streaming Best Practices

1. **Always use streaming for interactive UIs**

   ```python
   # Good - streaming
   async for chunk in client.send_message_stream(user_input):
       display_text(chunk.text_delta)

   # Bad - non-streaming (the user waits)
   response = await client.send_message(user_input)
   display_text(response.content[0]['text'])
   ```

2. **Use callbacks for real-time processing**

   ```python
   def process_chunk(text):
       update_ui(text)
       log_to_file(text)
       analyze_sentiment(text)

   async for chunk in client.send_message_stream(msg, on_text=process_chunk):
       pass
   ```

3. **Handle errors gracefully**

   ```python
   import httpx

   try:
       async for chunk in client.send_message_stream(msg):
           if chunk.text_delta:
               print(chunk.text_delta, end='')
   except httpx.HTTPError as e:
       print(f"\nError: {e}")
   ```

4. **Accumulate for the complete message**

   ```python
   from claude.streaming import StreamParser

   parser = StreamParser()
   async for chunk in client.send_message_stream(msg):
       parser.add_chunk(chunk)
       if chunk.text_delta:
           print(chunk.text_delta, end='')

   complete_message = parser.to_dict()
   save_to_database(complete_message)
   ```

## 🚦 Performance

### Latency Comparison

| Mode | Time to First Byte | Total Time | User Experience |
|------|--------------------|------------|-----------------|
| Streaming | ~0.2s | 2.5s | ✅ Sees text immediately |
| Non-streaming | 2.5s | 2.5s | ❌ Waits 2.5s for anything |

**Streaming wins for UX!** Users can start reading while the rest generates.

### Cost

Both modes cost the same per token. Use streaming for better UX at no extra cost!

## 🐛 Troubleshooting

### Import Error: No module named 'httpx_sse'

```bash
# Install with HTTP/2 and SSE support
uv pip install "httpx[http2]>=0.27.0" "httpx-sse>=0.4.0"
```

### Streaming Not Working

Check that `stream=True` in options:

```python
options = AgentOptions(stream=True)
```

### Text Not Appearing in Real Time

Ensure you're flushing output:

```python
print(chunk.text_delta, end='', flush=True)  # flush=True is important!
```

## 📁 Project Structure

```
claude-py/
├── src/claude/                  # Core SDK implementation
│   ├── client.py                # Main client with streaming support
│   ├── streaming.py             # SSE parser and StreamParser
│   ├── types.py                 # Type definitions
│   ├── tools.json               # All 17 tools from Claude Code
│   └── system_prompt.json       # 13KB system prompt from Claude Code
├── .agents/                     # Self-generated documentation & tests
│   ├── *.md                     # All internal documentation
│   ├── INDEX.md                 # Directory navigation guide
│   ├── verify_installation.py   # Verify SDK setup
│   ├── reinstall.sh             # Reinstall with latest fixes
│   ├── data/                    # Analysis assets
│   │   ├── *.mitm               # mitmproxy traffic captures
│   │   ├── real_system_prompt.json
│   │   └── real_tools.json
│   ├── research/                # Research notes & analysis
│   └── tests/                   # All examples, tests & debug tools
│       ├── streaming_*.py       # Streaming examples
│       ├── debug_*.py           # Debug utilities
│       ├── test_*.py            # Test scripts
│       └── README.md            # Test documentation
├── README.md                    # This file (user documentation)
└── pyproject.toml               # Package configuration
```

## 📖 Further Reading

### User Documentation

- [.agents/QUICKSTART.md](.agents/QUICKSTART.md) - 60-second quick start guide
- [.agents/STREAMING_GUIDE.md](.agents/STREAMING_GUIDE.md) - Deep dive into the streaming implementation
- [.agents/INSTALL.md](.agents/INSTALL.md) - Installation & troubleshooting
- [Anthropic API Documentation](https://docs.anthropic.com/claude/reference/messages-streaming)

### Development & Internal Docs

- [.agents/INDEX.md](.agents/INDEX.md) - Navigation guide for the .agents/ directory
- [.agents/tests/README.md](.agents/tests/README.md) - Test & example documentation
- [.agents/FINGERPRINTING_FIX.md](.agents/FINGERPRINTING_FIX.md) - How API fingerprinting was resolved
- [.agents/IMPLEMENTATION_SUMMARY.md](.agents/IMPLEMENTATION_SUMMARY.md) - Technical architecture
- [.agents/research/SESSION_MANAGEMENT_REPORT.md](.agents/research/SESSION_MANAGEMENT_REPORT.md) - Session/forking architecture

## 🎉 Migration from v0.1 (Non-Streaming)

The snippets below assume an async context with an existing `client = ClaudeAgentClient()` in scope.

```python
# Old (v0.1) - non-streaming only
response = await client.send_message("Hello")
print(response.content[0]['text'])

# New (v0.2) - streaming by default
async for chunk in client.send_message_stream("Hello"):
    if chunk.text_delta:
        print(chunk.text_delta, end='', flush=True)

# Or keep the non-streaming behavior
options = AgentOptions(stream=False)
client = ClaudeAgentClient(options=options)
response = await client.send_message("Hello")  # Same as v0.1
```

## ⚖️ License

This SDK is a reverse-engineered implementation based on Claude Code CLI traffic analysis. Use at your own discretion.

## 🙏 Credits

Built through analysis of Claude Code CLI v2.1.7 using mitmproxy traffic captures.

---

**Ready to stream?** Check out `.agents/tests/streaming_chat.py` for a complete working chat app! 🚀

**Need help?** See [.agents/INDEX.md](.agents/INDEX.md) for complete documentation navigation.