# claude-py: Python SDK for Claude API with Streaming Support

*Reverse-engineered from Claude Code CLI v2.1.7 with exact API replication.*
## ✨ Features

- 🚀 **Streaming by default** - lower latency, better user experience
- ⚡ **Real-time text display** - see responses as they are generated
- 🔄 **Non-streaming mode** - available for batch processing
- 🛠️ **Full tool support** - all 17 tools from Claude Code
- 💾 **Prompt caching** - up to 90% cost reduction on multi-turn conversations
- 🎯 **Exact API replication** - same headers, metadata, and behavior as Claude Code
- 🔒 **OAuth support** - both OAuth tokens and API keys
- 📊 **HTTP/2** - modern, efficient protocol
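The prompt-caching figure refers to Anthropic's discount on cached prompt reads. As a loose illustration of the mechanism (the `cache_control` field comes from the public Messages API; whether this SDK exposes it directly is not shown here), a system block can be marked cacheable like so:

```python
# Illustrative only: how the Anthropic Messages API marks a prompt
# block as cacheable. Field names come from the public API docs,
# not from this SDK's internals.
def build_cached_system(system_text: str) -> list[dict]:
    """Wrap a system prompt in a block carrying a cache_control marker."""
    return [
        {
            "type": "text",
            "text": system_text,
            # "ephemeral" is the cache type the public API accepts
            "cache_control": {"type": "ephemeral"},
        }
    ]

blocks = build_cached_system("You are a helpful assistant.")
```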
## 🚀 Quick Start

### Installation

```bash
# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```
### Basic Usage - Streaming (Recommended)

```python
import asyncio
from claude import ClaudeAgentClient

async def main():
    async with ClaudeAgentClient() as client:
        # Stream the response in real time
        async for chunk in client.send_message_stream("Tell me a joke"):
            if chunk.text_delta:
                print(chunk.text_delta, end='', flush=True)
        print()

asyncio.run(main())
```

Output:

```
Why did the programmer quit his job?
Because he didn't get arrays! 😄
```

↑ Text appears character by character as it's generated.
### Non-Streaming Mode

```python
import asyncio
from claude import ClaudeAgentClient, AgentOptions

async def main():
    options = AgentOptions(stream=False)
    async with ClaudeAgentClient(options=options) as client:
        # Get the complete response at once
        response = await client.send_message("What is 2+2?")
        print(response.content[0]['text'])

asyncio.run(main())
```
### ChatClient (Simple Chat Completion API)

`ChatClient` provides a lightweight, stateless interface similar to OpenAI's chat
completions. It does not replicate Claude Code's full agent protocol (no tools,
no session history, no system prompt injection), but it carries the same
fingerprinting headers, so the requests look identical on the wire.

```python
import asyncio
from claude import ChatClient

async def main():
    async with ChatClient() as c:
        # One-shot (non-streaming)
        r = await c.chat("Explain monads in one sentence")
        print(r.content[0]["text"])

        # Streaming
        async for chunk in c.stream("Write a haiku about recursion"):
            if chunk.text_delta:
                print(chunk.text_delta, end="", flush=True)
        print()

        # Streaming, return text only
        text = await c.collect("Say hello")
        print(text)

        # Per-call overrides
        r = await c.chat(
            "Who are you?",
            system="You are a pirate.",
            model="claude-haiku-4-5-20251001",
            max_tokens=256,
        )
        print(r.content[0]["text"])

        # Multi-turn via an explicit message list
        r = await c.chat([
            {"role": "user", "content": "My name is Alice"},
            {"role": "assistant", "content": "Hi Alice!"},
            {"role": "user", "content": "What is my name?"},
        ])
        print(r.content[0]["text"])

asyncio.run(main())
```
`ChatClient` reads `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` from the
environment, just like `ClaudeAgentClient`.
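For reference, that environment lookup can be sketched offline; the helper below is hypothetical, and the OAuth-token-first precedence is an assumption, not taken from the SDK source:

```python
import os

def resolve_credential(env=None):
    """Return (variable_name, value) for the first credential found.

    Hypothetical helper: the OAuth-token-first precedence is an
    assumption, not taken from the SDK source.
    """
    env = os.environ if env is None else env
    for name in ("CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY"):
        value = env.get(name)
        if value:
            return name, value
    return None

cred = resolve_credential({"ANTHROPIC_API_KEY": "sk-ant-example"})
```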
## 📖 Usage Examples

### 1. Simple Query

```python
from claude import query

# One-shot query
result = await query("Explain quantum computing in one sentence")
print(result.content[0]['text'])
```
### 2. Multi-Turn Conversation

```python
async with ClaudeAgentClient() as client:
    # First message
    async for chunk in client.send_message_stream("What's the capital of France?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()

    # Second message (continues the conversation)
    async for chunk in client.send_message_stream("What's the population?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()
```
### 3. Streaming with Callback

```python
def on_text(text: str):
    """Process text as it arrives."""
    # Update UI, log, analyze, etc.
    print(text, end='', flush=True)

async for chunk in client.send_message_stream("Write a haiku", on_text=on_text):
    # Text is already printed by the callback;
    # chunks could also be processed here
    pass
```
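Since `on_text` is just a callable, one stream can drive several consumers. A self-contained sketch (fake chunks, no network; `FakeChunk` is a stand-in for `StreamChunk`):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FakeChunk:
    """Stand-in for StreamChunk; only the field used here."""
    text_delta: Optional[str]

def fan_out(*handlers: Callable[[str], None]) -> Callable[[str], None]:
    """Build one on_text callback that forwards to several handlers."""
    def on_text(text: str) -> None:
        for handler in handlers:
            handler(text)
    return on_text

ui_log, file_log = [], []
on_text = fan_out(ui_log.append, file_log.append)

# Simulate a stream; non-text events carry text_delta=None
for chunk in [FakeChunk("Hel"), FakeChunk(None), FakeChunk("lo")]:
    if chunk.text_delta:
        on_text(chunk.text_delta)
```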
### 4. Tool Execution

```python
from claude import ToolResult

async with ClaudeAgentClient() as client:
    # The assistant requests a tool use
    async for chunk in client.send_message_stream("Read the file data.txt"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
        # Detect tool uses in chunks...

    # Execute the tool (simplified)
    with open('data.txt', 'r') as f:
        content = f.read()

    # Send the tool result back
    tool_result = ToolResult(
        type="tool_result",
        tool_use_id="toolu_xxx",
        content=content,
    )
    async for chunk in client.send_message_stream("", tool_results=[tool_result]):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
```
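The crux of the tool loop is echoing the assistant's `tool_use_id` back with the result. A standalone sketch of that payload shape, using an illustrative dataclass rather than the SDK's own `ToolResult`:

```python
from dataclasses import dataclass, asdict

@dataclass
class ToolResultSketch:
    """Illustrative stand-in for the SDK's ToolResult."""
    tool_use_id: str
    content: str
    type: str = "tool_result"

# Pair the result with the id the assistant sent in its tool_use block
result = ToolResultSketch(tool_use_id="toolu_xxx", content="file contents here")
payload = asdict(result)
```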
## 📊 Streaming vs Non-Streaming

### Streaming (Default)

✅ Advantages:

- Lower perceived latency - text appears immediately
- Better user experience - users can read while the model generates
- Real-time feedback - see progress as it happens
- Cancelable - can interrupt mid-generation

❌ Trade-offs:

- Slightly more complex to handle
- Requires async iteration

Use cases: interactive applications, chat UIs, real-time analysis.

### Non-Streaming

✅ Advantages:

- Simpler code - a single `await`
- Complete message at once
- Easier to cache or store

❌ Trade-offs:

- Higher perceived latency - the user waits for the complete response
- No progress indication
- Cannot cancel mid-generation

Use cases: batch processing, automated scripts, testing.
## ⚙️ Configuration

### AgentOptions

```python
from claude import AgentOptions

options = AgentOptions(
    # Model selection
    model="claude-sonnet-4-5-20250929",  # sonnet, opus, haiku

    # Token limit
    max_tokens=32000,  # Claude Code default

    # Streaming (default: True)
    stream=True,

    # Tool filtering
    allowed_tools=["Read", "Write", "Bash"],
    disallowed_tools=["WebSearch"],

    # Custom system prompt
    system_prompt="You are a helpful assistant.",
    append_system_prompt="Additional instructions...",

    # Session management
    session_id="custom-uuid",  # Or None for auto-generated

    # Working directory
    cwd="/path/to/workspace",
)

client = ClaudeAgentClient(options=options)
```
## 🔧 Advanced Features

### Session Persistence

```python
# Sessions are client-side - just keep the client instance alive
async with ClaudeAgentClient() as client:
    # Message 1
    await client.send_message("Hello")

    # Message 2 (continues the conversation)
    await client.send_message("How are you?")

    # All messages are stored in client.messages
    print(f"Conversation has {len(client.messages)} messages")
```
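Because history lives client-side, persisting a session is just serializing the message list. A minimal sketch, assuming messages follow the standard role/content shape:

```python
import json

# A history shaped like client.messages (standard role/content pairs)
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "How are you?"},
]

# Round-trip through JSON; a real app would write to disk or a database
saved = json.dumps(history)
restored = json.loads(saved)
```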
### Token Counting

```python
# Count tokens for the current conversation
token_count = await client.count_tokens()
print(f"Current conversation uses {token_count} tokens")
```
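`count_tokens()` requires an API round-trip. For a quick offline ballpark, a common rule of thumb for English text is roughly four characters per token (an approximation, not the model's real tokenizer):

```python
def rough_token_estimate(messages: list) -> int:
    """Very rough offline estimate: ~4 characters per token for English.
    For exact counts, use count_tokens(), which asks the API."""
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    return max(1, chars // 4)

estimate = rough_token_estimate([{"role": "user", "content": "Hello, Claude!"}])
```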
### Custom Headers

The SDK uses the exact headers sent by the Claude Code CLI:

- `anthropic-beta`: `claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14`
- `x-stainless-*`: all Stainless SDK headers
- `user-agent`: `claude-cli/2.1.7 (external, cli)`

All captured from mitmproxy traffic analysis.
## 📁 Examples & Tests

See the `.agents/tests/` directory:

- `streaming_basic.py` - simple streaming demo
- `streaming_with_callback.py` - using callbacks for processing
- `compare_streaming_vs_non_streaming.py` - performance comparison
- `streaming_chat.py` - full interactive chat app with tools
- `debug_streaming_chat.py` - comprehensive debug tool

Run examples:

```bash
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Run examples
python .agents/tests/streaming_basic.py
python .agents/tests/streaming_chat.py
python .agents/tests/debug_streaming_chat.py
```
## 🔐 Authentication

### API Keys

```bash
export ANTHROPIC_API_KEY="sk-ant-api03-YOUR_KEY_HERE"
```

### OAuth Tokens

```bash
export ANTHROPIC_API_KEY="sk-ant-oat01-YOUR_TOKEN_HERE"
```

Both work identically - no code changes needed!
## 🏗️ Architecture

### Streaming Flow

```
User Request
    ↓
Client.send_message_stream()
    ↓
POST /v1/messages?beta=true
{
  "stream": true,
  "messages": [...],
  ...
}
    ↓
Server-Sent Events (SSE) stream
    ↓
event: message_start
event: content_block_start
event: content_block_delta   ← text chunks arrive
event: content_block_delta   ← more text
event: content_block_stop
event: message_stop
    ↓
StreamParser accumulates chunks
    ↓
Yields StreamChunk objects
    ↓
User processes in real time
```
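The accumulation step above can be sketched with plain SSE lines; the event shapes follow the public Messages API stream format, and this is not the SDK's actual parser:

```python
import json

def accumulate_sse(lines: list) -> str:
    """Collect text_delta payloads from a raw SSE line stream."""
    text, event = [], None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: ") and event == "content_block_delta":
            delta = json.loads(line[len("data: "):]).get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)

# Sample stream shaped like the flow diagram above
sample = [
    "event: message_start",
    'data: {"type": "message_start"}',
    "event: content_block_delta",
    'data: {"delta": {"type": "text_delta", "text": "Hel"}}',
    "event: content_block_delta",
    'data: {"delta": {"type": "text_delta", "text": "lo"}}',
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
streamed = accumulate_sse(sample)
```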
### Non-Streaming Flow

```
User Request
    ↓
Client.send_message()
    ↓
POST /v1/messages?beta=true
{
  "stream": false,
  "messages": [...],
  ...
}
    ↓
Complete JSON response
    ↓
AssistantMessage returned
```
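The two flows differ only in the `stream` flag of the request body. A sketch of the shared skeleton (only the fields shown in the diagrams; everything else is elided):

```python
def build_body(messages: list, *, stream: bool,
               model: str = "claude-sonnet-4-5-20250929",
               max_tokens: int = 32000) -> dict:
    """Request body skeleton shared by both flows; only `stream` differs."""
    return {"model": model, "max_tokens": max_tokens,
            "stream": stream, "messages": messages}

msgs = [{"role": "user", "content": "Hello"}]
streaming_body = build_body(msgs, stream=True)
batch_body = build_body(msgs, stream=False)
```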
## 📚 API Reference

### ClaudeAgentClient

```python
class ClaudeAgentClient:
    async def send_message_stream(
        self,
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
        on_text: Optional[Callable[[str], None]] = None,
    ) -> AsyncIterator[StreamChunk]:
        """Stream response chunks."""

    async def send_message(
        self,
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
    ) -> AssistantMessage:
        """Get the complete response."""

    async def count_tokens(self) -> int:
        """Count tokens in the conversation."""

    async def close(self):
        """Close the HTTP client."""
```

### StreamChunk

```python
class StreamChunk:
    event_type: str                # "message_start", "content_block_delta", etc.
    data: Dict[str, Any]           # Raw event data
    text_delta: Optional[str]      # Text for this chunk (if a text event)
    content_block: Optional[Dict]  # Content block data (if applicable)
```
## 🎯 Streaming Best Practices

1. **Always use streaming for interactive UIs**

   ```python
   # Good - streaming
   async for chunk in client.send_message_stream(user_input):
       display_text(chunk.text_delta)

   # Bad - non-streaming (user waits)
   response = await client.send_message(user_input)
   display_text(response.content[0]['text'])
   ```

2. **Use callbacks for real-time processing**

   ```python
   def process_chunk(text):
       update_ui(text)
       log_to_file(text)
       analyze_sentiment(text)

   async for chunk in client.send_message_stream(msg, on_text=process_chunk):
       pass
   ```

3. **Handle errors gracefully**

   ```python
   try:
       async for chunk in client.send_message_stream(msg):
           if chunk.text_delta:
               print(chunk.text_delta, end='')
   except httpx.HTTPError as e:
       print(f"\nError: {e}")
   ```

4. **Accumulate for the complete message**

   ```python
   from claude.streaming import StreamParser

   parser = StreamParser()
   async for chunk in client.send_message_stream(msg):
       parser.add_chunk(chunk)
       print(chunk.text_delta, end='')

   complete_message = parser.to_dict()
   save_to_database(complete_message)
   ```
## 🚦 Performance

### Latency Comparison

| Mode | Time to First Byte | Total Time | User Experience |
|---|---|---|---|
| Streaming | ~0.2 s | ~2.5 s | ✅ Sees text immediately |
| Non-streaming | ~2.5 s | ~2.5 s | ❌ Waits 2.5 s for anything |

Streaming wins for UX: users can start reading while the rest generates. (Timings are illustrative; actual latency depends on the model and prompt.)

### Cost

Both modes cost the same per token, so streaming gives a better experience at no extra cost.
## 🐛 Troubleshooting

### Import Error: No module named 'httpx_sse'

```bash
# Install with HTTP/2 and SSE support
uv pip install "httpx[http2]>=0.27.0" "httpx-sse>=0.4.0"
```

### Streaming Not Working

Check that `stream=True` in the options:

```python
options = AgentOptions(stream=True)
```

### Text Not Appearing in Real Time

Make sure you flush the output:

```python
print(chunk.text_delta, end='', flush=True)  # flush=True is important!
```
## 📁 Project Structure

```
claude-py/
├── src/claude/                  # Core SDK implementation
│   ├── client.py                # Main client with streaming support
│   ├── streaming.py             # SSE parser and StreamParser
│   ├── types.py                 # Type definitions
│   ├── tools.json               # All 17 tools from Claude Code
│   └── system_prompt.json       # 13 KB system prompt from Claude Code
├── .agents/                     # Self-generated documentation & tests
│   ├── *.md                     # All internal documentation
│   ├── INDEX.md                 # Directory navigation guide
│   ├── verify_installation.py   # Verify SDK setup
│   ├── reinstall.sh             # Reinstall with latest fixes
│   ├── data/                    # Analysis assets
│   │   ├── *.mitm               # mitmproxy traffic captures
│   │   ├── real_system_prompt.json
│   │   └── real_tools.json
│   ├── research/                # Research notes & analysis
│   └── tests/                   # All examples, tests & debug tools
│       ├── streaming_*.py       # Streaming examples
│       ├── debug_*.py           # Debug utilities
│       ├── test_*.py            # Test scripts
│       └── README.md            # Test documentation
├── README.md                    # This file (user documentation)
└── pyproject.toml               # Package configuration
```
## 📖 Further Reading

### User Documentation

- `.agents/QUICKSTART.md` - 60-second quick start guide
- `.agents/STREAMING_GUIDE.md` - deep dive into the streaming implementation
- `.agents/INSTALL.md` - installation & troubleshooting
- Anthropic API documentation

### Development & Internal Docs

- `.agents/INDEX.md` - navigation guide for the `.agents/` directory
- `.agents/tests/README.md` - test & example documentation
- `.agents/FINGERPRINTING_FIX.md` - how API fingerprinting was resolved
- `.agents/IMPLEMENTATION_SUMMARY.md` - technical architecture
- `.agents/research/SESSION_MANAGEMENT_REPORT.md` - session/forking architecture
## 🎉 Migration from v0.1 (Non-Streaming)

```python
# Old (v0.1) - non-streaming only
response = await client.send_message("Hello")
print(response.content[0]['text'])

# New (v0.2) - streaming by default
async for chunk in client.send_message_stream("Hello"):
    if chunk.text_delta:
        print(chunk.text_delta, end='', flush=True)

# Or keep the non-streaming behavior
options = AgentOptions(stream=False)
client = ClaudeAgentClient(options=options)
response = await client.send_message("Hello")  # Same as v0.1
```
## ⚖️ License

This SDK is a reverse-engineered implementation based on Claude Code CLI traffic analysis. Use at your own discretion.

## 🙏 Credits

Built through analysis of Claude Code CLI v2.1.7 using mitmproxy traffic captures.

---

**Ready to stream?** Check out `.agents/tests/streaming_chat.py` for a complete working chat app! 🚀

**Need help?** See `.agents/INDEX.md` for complete documentation navigation.
