# claude-py: Python SDK for Claude API with Streaming Support

**Reverse engineered from Claude Code CLI v2.1.7 with exact API replication**

## ✨ Features

- **🚀 Streaming by default** - Lower latency, better user experience
- **⚡ Real-time text display** - See responses as they're generated
- **🔄 Non-streaming mode** - Available for batch processing
- **🛠️ Full tool support** - All 17 tools from Claude Code
- **💾 Prompt caching** - 90% cost reduction on multi-turn conversations
- **🎯 Exact API replication** - Same headers, metadata, and behavior as Claude Code
- **🔒 OAuth support** - Both OAuth tokens and API keys
- **📊 HTTP/2** - Modern, efficient protocol

## 🚀 Quick Start

### Installation

```bash
# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```

### Basic Usage - Streaming (Recommended)

```python
import asyncio
from claude import ClaudeAgentClient

async def main():
    async with ClaudeAgentClient() as client:
        # Stream the response in real time
        async for chunk in client.send_message_stream("Tell me a joke"):
            if chunk.text_delta:
                print(chunk.text_delta, end='', flush=True)
        print()

asyncio.run(main())
```

**Output:**

```
Why did the programmer quit his job?
Because he didn't get arrays! 😄
```

↑ _Text appears character-by-character as it's generated_

### Non-Streaming Mode

```python
import asyncio
from claude import ClaudeAgentClient, AgentOptions

async def main():
    options = AgentOptions(stream=False)
    async with ClaudeAgentClient(options=options) as client:
        # Get the complete response at once
        response = await client.send_message("What is 2+2?")
        print(response.content[0]['text'])

asyncio.run(main())
```

### ChatClient (Simple Chat Completion API)

`ChatClient` provides a lightweight, stateless interface similar to OpenAI's chat completions.
It does **not** replicate Claude Code's full agent protocol (no tools, no session history, no system prompt injection), but it carries the same fingerprinting headers, so the requests look identical on the wire.

```python
import asyncio
from claude import ChatClient

async def main():
    async with ChatClient() as c:
        # One-shot (non-streaming)
        r = await c.chat("Explain monads in one sentence")
        print(r.content[0]["text"])

        # Streaming
        async for chunk in c.stream("Write a haiku about recursion"):
            if chunk.text_delta:
                print(chunk.text_delta, end="", flush=True)
        print()

        # Streaming, return text only
        text = await c.collect("Say hello")
        print(text)

        # Per-call overrides
        r = await c.chat(
            "Who are you?",
            system="You are a pirate.",
            model="claude-haiku-4-5-20251001",
            max_tokens=256,
        )
        print(r.content[0]["text"])

        # Multi-turn via an explicit message list
        r = await c.chat([
            {"role": "user", "content": "My name is Alice"},
            {"role": "assistant", "content": "Hi Alice!"},
            {"role": "user", "content": "What is my name?"},
        ])
        print(r.content[0]["text"])

asyncio.run(main())
```

`ChatClient` reads `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` from the environment, just like `ClaudeAgentClient`.

## 📖 Usage Examples

### 1. Simple Query

```python
from claude import query

# One-shot query
result = await query("Explain quantum computing in one sentence")
print(result.content[0]['text'])
```

### 2. Multi-Turn Conversation

```python
async with ClaudeAgentClient() as client:
    # First message
    async for chunk in client.send_message_stream("What's the capital of France?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()

    # Second message (continues the conversation)
    async for chunk in client.send_message_stream("What's the population?"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
    print()
```

### 3. Streaming with Callback

```python
def on_text(text: str):
    """Process text as it arrives."""
    # Update UI, log, analyze, etc.
    print(text, end='', flush=True)

async for chunk in client.send_message_stream("Write a haiku", on_text=on_text):
    # Text is already printed by the callback;
    # you could also process chunks here
    pass
```

### 4. Tool Execution

```python
from claude import ToolResult

async with ClaudeAgentClient() as client:
    # Assistant requests tool use
    async for chunk in client.send_message_stream("Read the file data.txt"):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
        # Detect tool uses in chunks...

    # Execute the tool (simplified)
    with open('data.txt', 'r') as f:
        content = f.read()

    # Send the tool result back
    tool_result = ToolResult(
        type="tool_result",
        tool_use_id="toolu_xxx",
        content=content,
    )
    async for chunk in client.send_message_stream("", tool_results=[tool_result]):
        if chunk.text_delta:
            print(chunk.text_delta, end='')
```

## 📊 Streaming vs Non-Streaming

### Streaming (Default)

✅ **Advantages:**

- Lower perceived latency - text appears immediately
- Better user experience - users can read while the model generates
- Real-time feedback - see progress as it happens
- Cancelable - can interrupt mid-generation

❌ **Trade-offs:**

- Slightly more complex to handle
- Requires async iteration

**Use cases:** Interactive applications, chat UIs, real-time analysis

### Non-Streaming

✅ **Advantages:**

- Simpler code - a single await
- Complete message at once
- Easier to cache/store

❌ **Trade-offs:**

- Higher perceived latency - the user waits for the complete response
- No progress indication
- Can't cancel mid-generation

**Use cases:** Batch processing, automated scripts, testing

## ⚙️ Configuration

### AgentOptions

```python
from claude import AgentOptions

options = AgentOptions(
    # Model selection
    model="claude-sonnet-4-5-20250929",  # sonnet, opus, haiku

    # Token limit
    max_tokens=32000,  # Claude Code default

    # Streaming (default: True)
    stream=True,

    # Tool filtering
    allowed_tools=["Read", "Write", "Bash"],
    disallowed_tools=["WebSearch"],

    # Custom system prompt
    system_prompt="You are a helpful assistant.",
    append_system_prompt="Additional instructions...",

    # Session management
    session_id="custom-uuid",  # Or None for auto-generated

    # Working directory
    cwd="/path/to/workspace",
)

client = ClaudeAgentClient(options=options)
```

## 🔧 Advanced Features

### Session Persistence

```python
# Sessions are client-side - just keep the client instance alive
async with ClaudeAgentClient() as client:
    # Message 1
    await client.send_message("Hello")

    # Message 2 (continues the conversation)
    await client.send_message("How are you?")

    # All messages are stored in client.messages
    print(f"Conversation has {len(client.messages)} messages")
```

### Token Counting

```python
# Count tokens for the current conversation
token_count = await client.count_tokens()
print(f"Current conversation uses {token_count} tokens")
```

### Custom Headers

The SDK sends the exact headers used by the Claude Code CLI:

- `anthropic-beta`: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14
- `x-stainless-*`: All Stainless SDK headers
- `user-agent`: claude-cli/2.1.7 (external, cli)

All captured from mitmproxy traffic analysis.

## 📁 Examples & Tests

See the `.agents/tests/` directory:

- `streaming_basic.py` - Simple streaming demo
- `streaming_with_callback.py` - Using callbacks for processing
- `compare_streaming_vs_non_streaming.py` - Performance comparison
- `streaming_chat.py` - Full interactive chat app with tools
- `debug_streaming_chat.py` - Comprehensive debug tool

Run the examples:

```bash
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Run examples
python .agents/tests/streaming_basic.py
python .agents/tests/streaming_chat.py
python .agents/tests/debug_streaming_chat.py
```

## 🔐 Authentication

### API Keys

```bash
export ANTHROPIC_API_KEY="sk-ant-api03-YOUR_KEY_HERE"
```

### OAuth Tokens

```bash
export ANTHROPIC_API_KEY="sk-ant-oat01-YOUR_TOKEN_HERE"
```

Both work identically - no code changes needed! (OAuth tokens can also be supplied via `CLAUDE_CODE_OAUTH_TOKEN`.)
## 🏗️ Architecture

### Streaming Flow

```
User Request
  ↓
Client.send_message_stream()
  ↓
POST /v1/messages?beta=true
  { "stream": true, "messages": [...], ... }
  ↓
Server-Sent Events (SSE) Stream
  ↓
event: message_start
event: content_block_start
event: content_block_delta   ← Text chunks arrive
event: content_block_delta   ← More text
event: content_block_stop
event: message_stop
  ↓
StreamParser accumulates chunks
  ↓
Yields StreamChunk objects
  ↓
User processes in real time
```

### Non-Streaming Flow

```
User Request
  ↓
Client.send_message()
  ↓
POST /v1/messages?beta=true
  { "stream": false, "messages": [...], ... }
  ↓
Complete JSON Response
  ↓
AssistantMessage returned
```

## 📚 API Reference

### ClaudeAgentClient

```python
class ClaudeAgentClient:
    async def send_message_stream(
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
        on_text: Optional[Callable[[str], None]] = None,
    ) -> AsyncIterator[StreamChunk]:
        """Stream response chunks."""

    async def send_message(
        prompt: str,
        tool_results: Optional[List[ToolResult]] = None,
    ) -> AssistantMessage:
        """Get the complete response."""

    async def count_tokens() -> int:
        """Count tokens in the conversation."""

    async def close():
        """Close the HTTP client."""
```

### StreamChunk

```python
class StreamChunk:
    event_type: str                # "message_start", "content_block_delta", etc.
    data: Dict[str, Any]           # Raw event data
    text_delta: Optional[str]      # Text for this chunk (if text event)
    content_block: Optional[Dict]  # Content block data (if applicable)
```

## 🎯 Streaming Best Practices

1. **Always use streaming for interactive UIs**

   ```python
   # Good - streaming
   async for chunk in client.send_message_stream(user_input):
       display_text(chunk.text_delta)

   # Bad - non-streaming (the user waits)
   response = await client.send_message(user_input)
   display_text(response.content[0]['text'])
   ```

2. **Use callbacks for real-time processing**

   ```python
   def process_chunk(text):
       update_ui(text)
       log_to_file(text)
       analyze_sentiment(text)

   async for chunk in client.send_message_stream(msg, on_text=process_chunk):
       pass
   ```

3. **Handle errors gracefully**

   ```python
   import httpx

   try:
       async for chunk in client.send_message_stream(msg):
           if chunk.text_delta:
               print(chunk.text_delta, end='')
   except httpx.HTTPError as e:
       print(f"\nError: {e}")
   ```

4. **Accumulate for the complete message**

   ```python
   from claude.streaming import StreamParser

   parser = StreamParser()
   async for chunk in client.send_message_stream(msg):
       parser.add_chunk(chunk)
       if chunk.text_delta:
           print(chunk.text_delta, end='')

   complete_message = parser.to_dict()
   save_to_database(complete_message)
   ```

## 🚦 Performance

### Latency Comparison

| Mode | Time to First Byte | Total Time | User Experience |
|------|--------------------|------------|-----------------|
| Streaming | ~0.2s | 2.5s | ✅ Sees text immediately |
| Non-streaming | 2.5s | 2.5s | ❌ Waits 2.5s for anything |

**Streaming wins for UX!** Users can start reading while the rest generates.

### Cost

Both modes cost the same per token. Use streaming for better UX at no extra cost!

## 🐛 Troubleshooting

### Import Error: No module named 'httpx_sse'

```bash
# Install with HTTP/2 and SSE support
uv pip install "httpx[http2]>=0.27.0" "httpx-sse>=0.4.0"
```

### Streaming Not Working

Check that `stream=True` in options:

```python
options = AgentOptions(stream=True)
```

### Text Not Appearing in Real Time

Ensure you're flushing output:

```python
print(chunk.text_delta, end='', flush=True)  # flush=True is important!
```

## 📁 Project Structure

```
claude-py/
├── src/claude/                  # Core SDK implementation
│   ├── client.py                # Main client with streaming support
│   ├── streaming.py             # SSE parser and StreamParser
│   ├── types.py                 # Type definitions
│   ├── tools.json               # All 17 tools from Claude Code
│   └── system_prompt.json       # 13KB system prompt from Claude Code
├── .agents/                     # Self-generated documentation & tests
│   ├── *.md                     # All internal documentation
│   ├── INDEX.md                 # Directory navigation guide
│   ├── verify_installation.py   # Verify SDK setup
│   ├── reinstall.sh             # Reinstall with latest fixes
│   ├── data/                    # Analysis assets
│   │   ├── *.mitm               # mitmproxy traffic captures
│   │   ├── real_system_prompt.json
│   │   └── real_tools.json
│   ├── research/                # Research notes & analysis
│   └── tests/                   # All examples, tests & debug tools
│       ├── streaming_*.py       # Streaming examples
│       ├── debug_*.py           # Debug utilities
│       ├── test_*.py            # Test scripts
│       └── README.md            # Test documentation
├── README.md                    # This file (user documentation)
└── pyproject.toml               # Package configuration
```

## 📖 Further Reading

### User Documentation

- [.agents/QUICKSTART.md](.agents/QUICKSTART.md) - 60-second quick start guide
- [.agents/STREAMING_GUIDE.md](.agents/STREAMING_GUIDE.md) - Deep dive into the streaming implementation
- [.agents/INSTALL.md](.agents/INSTALL.md) - Installation & troubleshooting
- [Anthropic API Documentation](https://docs.anthropic.com/claude/reference/messages-streaming)

### Development & Internal Docs

- [.agents/INDEX.md](.agents/INDEX.md) - Navigation guide for the .agents/ directory
- [.agents/tests/README.md](.agents/tests/README.md) - Test & example documentation
- [.agents/FINGERPRINTING_FIX.md](.agents/FINGERPRINTING_FIX.md) - How API fingerprinting was resolved
- [.agents/IMPLEMENTATION_SUMMARY.md](.agents/IMPLEMENTATION_SUMMARY.md) - Technical architecture
- [.agents/research/SESSION_MANAGEMENT_REPORT.md](.agents/research/SESSION_MANAGEMENT_REPORT.md) - Session/forking architecture

## 🎉 Migration from v0.1 (Non-Streaming)

The snippets below assume an async context with an existing `client = ClaudeAgentClient()` in scope.

```python
# Old (v0.1) - non-streaming only
response = await client.send_message("Hello")
print(response.content[0]['text'])

# New (v0.2) - streaming by default
async for chunk in client.send_message_stream("Hello"):
    if chunk.text_delta:
        print(chunk.text_delta, end='', flush=True)

# Or keep the non-streaming behavior
options = AgentOptions(stream=False)
client = ClaudeAgentClient(options=options)
response = await client.send_message("Hello")  # Same as v0.1
```

## ⚖️ License

This SDK is a reverse-engineered implementation based on Claude Code CLI traffic analysis. Use at your own discretion.

## 🙏 Credits

Built through analysis of Claude Code CLI v2.1.7 using mitmproxy traffic captures.

---

**Ready to stream?** Check out `.agents/tests/streaming_chat.py` for a complete working chat app! 🚀

**Need help?** See [.agents/INDEX.md](.agents/INDEX.md) for complete documentation navigation.