Diffstat (limited to 'packages')
-rw-r--r--  packages/multillm-agentwrap/README.md                           349
-rw-r--r--  packages/multillm-agentwrap/pyproject.toml                        16
-rw-r--r--  packages/multillm-agentwrap/src/multillm_agentwrap/__init__.py    20
-rw-r--r--  packages/multillm-agentwrap/src/multillm_agentwrap/provider.py   281
-rw-r--r--  packages/multillm-claude/README.md                               188
-rw-r--r--  packages/multillm-claude/src/multillm_claude/provider.py          97
-rw-r--r--  packages/multillm-cli/README.md                                   31
-rw-r--r--  packages/multillm-cli/src/multillm_cli/main.py                   252
-rw-r--r--  packages/multillm/src/multillm/client.py                          89
9 files changed, 1257 insertions, 66 deletions
diff --git a/packages/multillm-agentwrap/README.md b/packages/multillm-agentwrap/README.md
new file mode 100644
index 0000000..2e0c27c
--- /dev/null
+++ b/packages/multillm-agentwrap/README.md
@@ -0,0 +1,349 @@
+# multillm-agentwrap
+
+Agent wrapper provider for multillm - wraps chat providers with agentic capabilities.
+
+## Overview
+
+The `agentwrap` provider allows you to use any chat provider (OpenAI, Google, Anthropic, etc.) with agentic capabilities including:
+
+- **Tool execution loop**: Automatically executes tools and sends results back
+- **Conversation history management**: Maintains context across tool calls
+- **Multi-turn interactions**: Continues until task is complete or max turns reached
+
+## Installation
+
+```bash
+pip install multillm-agentwrap
+```
+
+Or with uv in a workspace:
+
+```bash
+uv add multillm-agentwrap
+```
+
+## Usage
+
+### Basic Usage
+
+```python
+import asyncio
+import multillm
+
+async def main():
+ client = multillm.Client()
+
+ # Wrap any chat model with agentic capabilities
+ async for msg in client.run("agentwrap/openai/gpt-4", "Hello!"):
+ if msg.type == "text":
+ print(msg.content)
+
+asyncio.run(main())
+```
+
+### With Tools
+
+```python
+import asyncio
+import multillm
+
+# Define a custom tool
+calculate_tool = multillm.Tool(
+ name="calculate",
+ description="Perform a calculation",
+ parameters={
+ "type": "object",
+ "properties": {
+ "expression": {"type": "string", "description": "Math expression"}
+ },
+ "required": ["expression"]
+ },
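+    # eval() here is for demo brevity only; never eval untrusted input in production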
+ handler=lambda args: {"result": eval(args["expression"])}
+)
+
+async def main():
+ client = multillm.Client()
+
+ # Use with tools
+ async for msg in client.run(
+ "agentwrap/google/gemini-pro",
+ "What's 25 * 4?",
+ tools=[calculate_tool]
+ ):
+ if msg.type == "text":
+ print(msg.content)
+ elif msg.type == "tool_use":
+ print(f" → Using tool: {msg.tool_name}")
+ elif msg.type == "tool_result":
+ print(f" ← Result: {msg.tool_result}")
+
+asyncio.run(main())
+```
+
+### With Options
+
+```python
+import asyncio
+import multillm
+from multillm import AgentOptions
+
+async def main():
+ client = multillm.Client()
+
+ options = AgentOptions(
+ max_turns=5,
+ system_prompt="You are a helpful assistant.",
+ temperature=0.7
+ )
+
+ async for msg in client.run(
+ "agentwrap/anthropic/claude-3-5-sonnet-20241022",
+ "Explain quantum computing",
+ options=options
+ ):
+ if msg.type == "text":
+ print(msg.content)
+
+asyncio.run(main())
+```
+
+## Supported Chat Providers
+
+Any chat provider supported by multillm can be wrapped:
+
+- `agentwrap/openai/gpt-4` - OpenAI GPT-4
+- `agentwrap/openai/gpt-4-turbo` - OpenAI GPT-4 Turbo
+- `agentwrap/openai/gpt-3.5-turbo` - OpenAI GPT-3.5 Turbo
+- `agentwrap/gemini/gemini-pro` - Google Gemini Pro
+- `agentwrap/gemini/gemini-1.5-pro` - Google Gemini 1.5 Pro
+- `agentwrap/anthropic/claude-3-5-sonnet-20241022` - Anthropic Claude 3.5 Sonnet
+- `agentwrap/openrouter/...` - Any OpenRouter model
+
+## Model Format
+
+The model string follows the format:
+
+```
+agentwrap/<chat-provider>/<model-name>
+```
+
+Where:
+- `agentwrap` - The agent wrapper provider
+- `<chat-provider>` - The chat provider to wrap (openai, gemini, anthropic, openrouter)
+- `<model-name>` - The specific model from that provider
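+
+The client splits on the first `/`, so everything after `agentwrap/` is handed
+to the wrapped chat provider unchanged. Illustratively:
+
+```python
+model = "agentwrap/openai/gpt-4"
+provider, _, wrapped_model = model.partition("/")
+# provider == "agentwrap", wrapped_model == "openai/gpt-4"
+```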
+
+## How It Works
+
+1. **Receives prompt**: User sends initial message
+2. **Calls chat API**: Uses the wrapped chat provider via `chat_complete()`
+3. **Returns response**: If no tool calls, returns text and stops
+4. **Executes tools**: If tool calls present, executes them with provided handlers
+5. **Continues loop**: Sends tool results back and gets next response
+6. **Repeats**: Steps 3-5 until no more tool calls or max turns reached
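+
+Condensed into a sketch (not the provider source verbatim; it assumes
+OpenAI-style responses from `chat_complete()`, and `run_handler()` stands in
+for the provider's tool dispatch):
+
+```python
+async def agent_loop(client, wrapped_model, prompt, tools, max_turns=10):
+    """Condensed version of the agentwrap tool-execution loop."""
+    messages = [{"role": "user", "content": prompt}]
+    for _ in range(max_turns):
+        response = await client.chat_complete(wrapped_model, messages, tools=tools)
+        reply = response.choices[0].message
+        messages.append({"role": "assistant", "content": reply.content,
+                         "tool_calls": reply.tool_calls})
+        if not reply.tool_calls:
+            return reply.content  # plain text answer: done
+        for call in reply.tool_calls:
+            result = run_handler(call, tools)  # execute the matching Tool.handler
+            messages.append({"role": "tool", "tool_call_id": call["id"],
+                             "name": call["function"]["name"],
+                             "content": str(result)})
+```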
+
+## Configuration
+
+Configure the wrapped provider via multillm config:
+
+```python
+config = {
+ "openai": {"api_key": "sk-..."},
+ "google": {"api_key": "..."},
+ "agentwrap": {
+ "max_turns": 10 # Default max turns if not specified in options
+ }
+}
+
+client = multillm.Client(config)
+```
+
+## Agent Options
+
+All `AgentOptions` are supported:
+
+```python
+from multillm import AgentOptions
+
+options = AgentOptions(
+ system_prompt="Custom system prompt",
+ max_turns=15, # Max tool execution iterations
+ temperature=0.8, # Sampling temperature
+ max_tokens=2000, # Max tokens to generate
+)
+```
+
+## Message Types
+
+The agent yields different message types during execution:
+
+### System Message
+```python
+AgentMessage(
+ type="system",
+ content="Agentic session started",
+)
+```
+
+### Text Message
+```python
+AgentMessage(
+ type="text",
+ content="The answer is 42",
+ raw=<original response object>
+)
+```
+
+### Tool Use Message
+```python
+AgentMessage(
+ type="tool_use",
+ tool_name="calculate",
+ tool_input={"expression": "6*7"},
+ raw=<tool call object>
+)
+```
+
+### Tool Result Message
+```python
+AgentMessage(
+ type="tool_result",
+ tool_name="calculate",
+ tool_result="42",
+ raw=<result dict>
+)
+```
+
+### Result Message
+```python
+AgentMessage(
+ type="result",
+ content="Final answer",
+)
+```
+
+## Comparison with Native Agent Providers
+
+### AgentWrap (This Provider)
+- ✅ Works with any chat provider
+- ✅ Simple tool execution loop
+- ✅ Full control over chat API settings
+- ❌ No built-in tools (must provide custom tools)
+- ❌ No file system access
+- ❌ More basic agentic capabilities
+
+### Native Agent Providers (e.g., Claude)
+- ✅ Advanced agentic capabilities
+- ✅ Built-in tools (Bash, Read, Write, etc.)
+- ✅ File system access
+- ✅ Plan mode, interactive sessions
+- ❌ Limited to specific providers
+
+## Use Cases
+
+### When to Use AgentWrap
+
+- **Different models**: Want agentic behavior with OpenAI, Google, or other chat models
+- **Custom tools**: Need specific tool implementations
+- **Simple workflows**: Basic tool calling without file system access
+- **Cost optimization**: Use cheaper chat models with agentic capabilities
+
+### When to Use Native Agents
+
+- **File operations**: Need to read/write files, run commands
+- **Complex workflows**: Multi-step tasks requiring planning
+- **Built-in tools**: Want Bash, Read, Write, Grep, etc.
+- **Claude-specific**: Need Claude's advanced agentic features
+
+## Limitations
+
+1. **No built-in tools**: Must provide all tools yourself (unlike Claude agent which has Bash, Read, Write, etc.)
+2. **No file system access**: Can't read/write files unless you implement those tools
+3. **No interactive mode**: Single-shot sessions only (no `run_interactive`)
+4. **Tool handlers required**: Tools must have Python handler functions
+
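+Limitations 1 and 2 can be worked around by supplying custom tools. A minimal
+sketch of a read-only file tool (`read_file` is illustrative, not a multillm
+built-in):
+
+```python
+import pathlib
+import multillm
+
+read_file = multillm.Tool(
+    name="read_file",
+    description="Read a UTF-8 text file and return its contents",
+    parameters={
+        "type": "object",
+        "properties": {"path": {"type": "string"}},
+        "required": ["path"]
+    },
+    handler=lambda args: {"content": pathlib.Path(args["path"]).read_text()}
+)
+```
+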
+## Examples
+
+### Calculator Agent
+
+```python
+import asyncio
+import multillm
+
+calculate = multillm.Tool(
+ name="calculate",
+ description="Evaluate a mathematical expression",
+ parameters={
+ "type": "object",
+ "properties": {
+ "expression": {"type": "string"}
+ },
+ "required": ["expression"]
+ },
+ handler=lambda args: {"result": eval(args["expression"])}
+)
+
+async def main():
+ client = multillm.Client()
+
+ async for msg in client.run(
+ "agentwrap/openai/gpt-4",
+ "What's (125 + 75) * 3?",
+ tools=[calculate]
+ ):
+ if msg.type == "text":
+ print(msg.content)
+
+asyncio.run(main())
+```
+
+### Multi-Tool Agent
+
+```python
+import asyncio
+import multillm
+from datetime import datetime
+
+get_time = multillm.Tool(
+ name="get_current_time",
+ description="Get the current time",
+ parameters={"type": "object", "properties": {}},
+ handler=lambda args: {"time": datetime.now().isoformat()}
+)
+
+get_weather = multillm.Tool(
+ name="get_weather",
+ description="Get weather for a location",
+ parameters={
+ "type": "object",
+ "properties": {
+ "location": {"type": "string"}
+ },
+ "required": ["location"]
+ },
+ handler=lambda args: {"temp": 72, "condition": "sunny"}
+)
+
+async def main():
+ client = multillm.Client()
+
+ async for msg in client.run(
+ "agentwrap/google/gemini-pro",
+ "What time is it and what's the weather in Tokyo?",
+ tools=[get_time, get_weather]
+ ):
+ if msg.type == "text":
+ print(msg.content)
+
+asyncio.run(main())
+```
+
+## License
+
+MIT
+
+## Contributing
+
+Contributions welcome! Please see the main multillm repository for guidelines.
+
+## See Also
+
+- [multillm](https://github.com/yourusername/multillm) - Main library
+- [multillm-claude](https://github.com/yourusername/multillm-claude) - Claude agent provider
diff --git a/packages/multillm-agentwrap/pyproject.toml b/packages/multillm-agentwrap/pyproject.toml
new file mode 100644
index 0000000..3713db2
--- /dev/null
+++ b/packages/multillm-agentwrap/pyproject.toml
@@ -0,0 +1,16 @@
+[project]
+name = "multillm-agentwrap"
+version = "0.1.0"
+description = "Agent wrapper provider for multillm - wraps chat providers with agentic capabilities"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+ "multillm>=0.1.0",
+]
+
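+# Exposes AgentWrapProvider through the "multillm.providers" entry-point group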
+[project.entry-points."multillm.providers"]
+agentwrap = "multillm_agentwrap:AgentWrapProvider"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
diff --git a/packages/multillm-agentwrap/src/multillm_agentwrap/__init__.py b/packages/multillm-agentwrap/src/multillm_agentwrap/__init__.py
new file mode 100644
index 0000000..64e7203
--- /dev/null
+++ b/packages/multillm-agentwrap/src/multillm_agentwrap/__init__.py
@@ -0,0 +1,20 @@
+"""
+Agent wrapper provider for multillm.
+
+Wraps chat providers with agentic capabilities including:
+- Tool execution loop
+- Conversation history management
+- Multi-turn interactions
+
+Usage:
+ # Wrap any chat provider with agentic capabilities
+ client = multillm.Client()
+
+ # Use agentwrap/ prefix to make any chat model agentic
+ async for msg in client.run("agentwrap/google/gemini", "Hello"):
+ print(msg)
+"""
+
+from .provider import AgentWrapProvider
+
+# "Provider" is the name the multillm client looks up when loading this module
+Provider = AgentWrapProvider
+
+__all__ = ["Provider", "AgentWrapProvider"]
diff --git a/packages/multillm-agentwrap/src/multillm_agentwrap/provider.py b/packages/multillm-agentwrap/src/multillm_agentwrap/provider.py
new file mode 100644
index 0000000..52f9ff7
--- /dev/null
+++ b/packages/multillm-agentwrap/src/multillm_agentwrap/provider.py
@@ -0,0 +1,281 @@
+"""
+Agent wrapper provider implementation.
+
+Wraps chat providers to provide agentic capabilities.
+"""
+
+import sys
+from typing import Any, AsyncIterator
+
+from multillm import (
+ BaseAgentProvider,
+ AgentMessage,
+ AgentOptions,
+ Tool,
+ ProviderError,
+ load_provider_config,
+ merge_config,
+)
+
+
+class AgentWrapProvider(BaseAgentProvider):
+ """
+ Agent wrapper provider that wraps chat providers with agentic capabilities.
+
+ The model parameter should be the chat provider and model to wrap.
+    For example, when using "agentwrap/gemini/gemini-pro":
+    - Provider: "agentwrap"
+    - Model: "gemini/gemini-pro" (passed to this provider)
+
+ This provider will:
+ 1. Use the specified chat provider internally via chat_complete()
+ 2. Implement tool execution loop
+ 3. Manage conversation history
+ 4. Provide agentic multi-turn interactions
+
+ Usage:
+ # Via client
+ client = multillm.Client()
+        async for msg in client.run("agentwrap/gemini/gemini-pro", "Hello"):
+ print(msg)
+
+ # With tools
+        async for msg in client.run(
+ "agentwrap/openai/gpt-4",
+ "What's 2+2?",
+ tools=[calculate_tool],
+ ):
+ print(msg)
+ """
+
+ PROVIDER_NAME = "agentwrap"
+
+ def __init__(self, config: dict[str, Any] | None = None):
+ super().__init__(config)
+ self._client = None
+
+ def _get_client(self):
+ """Get or create client instance for making chat API calls."""
+ if self._client is None:
+ # Import here to avoid circular dependency
+ from multillm import Client
+ self._client = Client()
+ return self._client
+
+ def _build_options(self, options: AgentOptions | None) -> dict[str, Any]:
+ """Build options dict for wrapped provider."""
+ if options is None:
+ return {}
+
+        # Note: the system prompt is prepended to the message history in
+        # run(), so it is deliberately not passed to chat_complete() here;
+        # doing both would send it to the wrapped provider twice.
+        opts = {}
+
+        # Merge extra options (temperature, max_tokens, etc.)
+ if options.extra:
+ opts.update(options.extra)
+
+ return opts
+
+ async def _execute_tool(
+ self,
+ tool_call: dict,
+ tools: list[Tool] | None,
+ ) -> dict:
+ """
+ Execute a tool call and return the result.
+
+ Args:
+ tool_call: Tool call from chat response (OpenAI format)
+ tools: List of available tools with handlers
+
+ Returns:
+ Tool result dict with 'content' key
+ """
+ function_name = tool_call["function"]["name"]
+ function_args = tool_call["function"].get("arguments", {})
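+        # NOTE: raw OpenAI responses carry "arguments" as a JSON string; this
+        # assumes the chat layer has already decoded it into a dict.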
+
+ # Find the tool with matching name
+ if tools:
+ for tool in tools:
+ if tool.name == function_name:
+ # Execute the tool handler
+ try:
+ result = tool.handler(function_args)
+ # Handle async handlers
+ if hasattr(result, "__await__"):
+ result = await result
+
+ # Return formatted result
+ return {"content": str(result)}
+
+ except Exception as e:
+ return {
+ "content": f"Error executing tool: {e}",
+ "is_error": True
+ }
+
+ # Tool not found or no handlers
+ return {
+ "content": f"Tool '{function_name}' not found",
+ "is_error": True
+ }
+
+ async def run(
+ self,
+ prompt: str,
+ options: AgentOptions | None = None,
+ tools: list[Tool] | None = None,
+ ) -> AsyncIterator[AgentMessage]:
+ """
+ Run agentic workflow with the wrapped chat provider.
+
+ Args:
+ prompt: User message to send
+ options: Agent options (max_turns, system_prompt, etc.)
+ tools: Optional tools the agent can use
+
+ Yields:
+ AgentMessage objects representing the agent's actions and responses
+ """
+ # Yield session start message
+ yield AgentMessage(
+ type="system",
+ content="Agentic session started",
+ raw=None,
+ )
+
+ # Get wrapped model from config
+        # When the client routes "agentwrap/gemini/gemini-pro", we receive "gemini/gemini-pro" as the model
+ file_config = load_provider_config(self.PROVIDER_NAME)
+ merged_config = merge_config(file_config, self.config, {})
+ wrapped_model = merged_config.get("wrapped_model")
+
+ if not wrapped_model:
+ raise ProviderError(
+ "AgentWrap provider requires 'wrapped_model' in config. "
+ "When using via client, the model should be specified as 'agentwrap/provider/model'."
+ )
+
+ # Build options for chat API
+ chat_options = self._build_options(options)
+
+        # Get max turns (explicit option, then provider config, then default 10)
+        max_turns = (
+            options.max_turns
+            if options and options.max_turns
+            else merged_config.get("max_turns", 10)
+        )
+
+ # Initialize conversation history
+ messages = []
+
+ # Add system prompt if provided
+ if options and options.system_prompt:
+ messages.append({
+ "role": "system",
+ "content": options.system_prompt
+ })
+
+ # Add user message
+ messages.append({
+ "role": "user",
+ "content": prompt
+ })
+
+ # Get client
+ client = self._get_client()
+
+ # Tool execution loop
+ final_text = ""
+ for turn in range(max_turns):
+ try:
+ # Call chat_complete with wrapped model
+ response = await client.chat_complete(
+ wrapped_model,
+ messages,
+ tools=tools,
+ **chat_options
+ )
+
+ # Get text from response
+ text = response.choices[0].message.content or ""
+ tool_calls = response.choices[0].message.tool_calls or []
+
+ # Add assistant message to history
+ messages.append({
+ "role": "assistant",
+ "content": text,
+ "tool_calls": tool_calls if tool_calls else None
+ })
+
+ # Yield text message if present
+ if text:
+ final_text = text
+ yield AgentMessage(
+ type="text",
+ content=text,
+ raw=response,
+ )
+
+ # Check if we're done (no tool calls)
+ if not tool_calls:
+ break
+
+ # Process tool calls
+ for tool_call in tool_calls:
+ # Yield tool use message
+ yield AgentMessage(
+ type="tool_use",
+ tool_name=tool_call["function"]["name"],
+ tool_input=tool_call["function"].get("arguments", {}),
+ raw=tool_call,
+ )
+
+ # Execute tool if handler available
+ tool_result = await self._execute_tool(tool_call, tools)
+
+ # Yield tool result message
+ yield AgentMessage(
+ type="tool_result",
+ tool_name=tool_call["function"]["name"],
+ tool_result=tool_result["content"],
+ raw=tool_result,
+ )
+
+ # Add tool result to message history
+ messages.append({
+ "role": "tool",
+ "tool_call_id": tool_call["id"],
+ "name": tool_call["function"]["name"],
+ "content": tool_result["content"]
+ })
+
+ except Exception as e:
+ # Yield error and stop
+ error_msg = f"Error in agentic loop: {e}"
+ print(f"\n{error_msg}", file=sys.stderr)
+ yield AgentMessage(
+ type="error",
+ content=error_msg,
+ raw=e,
+ )
+ raise ProviderError(error_msg) from e
+
+ # Yield final result
+ yield AgentMessage(
+ type="result",
+ content=final_text,
+ raw=None,
+ )
+
+ async def run_interactive(
+ self,
+ options: AgentOptions | None = None,
+ tools: list[Tool] | None = None,
+ ):
+ """
+ Interactive sessions not yet implemented for agentwrap.
+
+ Use multiple calls to run() instead.
+ """
+ raise NotImplementedError(
+ "Interactive sessions not yet implemented for agentwrap provider. "
+ "Use multiple calls to run() for multi-turn conversations."
+ )
diff --git a/packages/multillm-claude/README.md b/packages/multillm-claude/README.md
index a2e242b..e5a47ad 100644
--- a/packages/multillm-claude/README.md
+++ b/packages/multillm-claude/README.md
@@ -112,6 +112,128 @@ Specify the model after the provider prefix:
- `claude/claude-sonnet-4-20250514` - Claude Sonnet
- `claude/claude-opus-4-20250514` - Claude Opus
+## Tool Support
+
+### Supported Tools
+
+These tools work correctly with the Claude Agent SDK provider:
+
+| Tool | Description | Permission Required |
+|------|-------------|---------------------|
+| `Bash` | Execute bash commands | Yes |
+| `Read` | Read files from filesystem | No |
+| `Write` | Create or overwrite files | Yes |
+| `Edit` | Edit existing files | Yes |
+| `Glob` | Find files by pattern | No |
+| `Grep` | Search file contents | No |
+| `Task` | Launch sub-agents | Varies |
+| `WebFetch` | Fetch web content | No |
+| `WebSearch` | Search the web | No |
+| `NotebookEdit` | Edit Jupyter notebooks | Yes |
+| `KillShell` | Kill background shells | Yes |
+| `EnterPlanMode` | Enter planning mode | No |
+| `ExitPlanMode` | Exit planning mode | No |
+
+### Interactive Tools
+
+| Tool | CLI Support | Notes |
+|------|-------------|-------|
+| `AskUserQuestion` | ✅ **Auto-converts** | CLI automatically uses custom `ask_user` tool |
+| Custom tools | ✅ Full support | Provide Tool objects with interactive handlers |
+
+**About Interactive Tools with Claude:**
+
+The Claude Agent SDK's built-in `AskUserQuestion` runs in a subprocess and can't access our terminal's stdin/stdout for interactive prompting. To solve this, **multillm-cli automatically provides a custom interactive tool** when you request `AskUserQuestion`.
+
+**What happens:**
+
+When you use `--allowed-tools AskUserQuestion`, the CLI:
+1. Removes the built-in AskUserQuestion (which doesn't work interactively)
+2. Adds a custom `ask_user` tool with an interactive handler
+3. The agent uses this tool instead, with full interactive support
+
+**Usage:**
+
+```bash
+# Request AskUserQuestion - CLI auto-provides working alternative
+multillm -m claude/default \
+ -p "Ask me about my preferences and create a summary" \
+ --allowed-tools AskUserQuestion \
+ --permission-mode acceptEdits
+```
+
+**What you'll see:**
+```
+ℹ️ Using custom 'ask_user' tool instead of AskUserQuestion for interactive prompting
+
+======================================================================
+❓ QUESTION FROM ASSISTANT
+======================================================================
+
+What is your favorite programming language?
+
+Suggested options:
+ 1. Python
+ 2. JavaScript
+ 3. Rust
+
+Your answer: 1
+======================================================================
+```
+
+**Programmatic usage:**
+
+For programmatic use, provide your own `ask_user` tool with an interactive handler:
+
+```python
+import multillm
+
+# Define interactive ask_user tool
+ask_user_tool = multillm.Tool(
+ name="ask_user",
+ description="Ask the user a question",
+ parameters={
+ "type": "object",
+ "properties": {
+ "question": {"type": "string"},
+ "options": {"type": "array", "items": {"type": "string"}}
+ },
+ "required": ["question"]
+ },
+ handler=lambda args: {
+ "answer": input(f"\n{args['question']}\nYour answer: ")
+ }
+)
+
+# Use with Claude
+async for msg in client.run(
+ "claude/default",
+ "Ask me questions",
+ options=multillm.AgentOptions(max_turns=10),
+ tools=[ask_user_tool] # Provide custom tool
+):
+ if msg.type == "text":
+ print(msg.content)
+```
+
+**Why not use the built-in AskUserQuestion?**
+
+The SDK's built-in `AskUserQuestion` is designed for Claude Code CLI's interactive mode where the subprocess has special stdin/stdout handling. In multillm, this doesn't work because:
+- The SDK subprocess can't access our terminal
+- We can't intercept and respond to built-in tool calls
+- The tool returns an error instead of prompting
+
+**Solution:** Use custom tools (which the CLI provides automatically!)
+
+**Comparison:**
+
+| Approach | Works? | How |
+|----------|--------|-----|
+| `--allowed-tools AskUserQuestion` | ✅ Yes | CLI auto-converts to custom tool |
+| Custom `ask_user` Tool | ✅ Yes | Provide Tool object with handler |
+| SDK built-in (direct) | ❌ No | Subprocess can't access stdin/stdout |
+| Chat provider `ask_user` | ✅ Yes | Via agentwrap |
+
## Agent Options
```python
@@ -135,3 +257,69 @@ When streaming with `client.run()`:
| `tool_result` | Result from a tool |
| `result` | Final result |
| `system` | System messages |
+
+## Debugging
+
+### Enable Debug Mode
+
+Set the `MULTILLM_DEBUG` environment variable to see detailed error information and SDK options:
+
+```bash
+export MULTILLM_DEBUG=1
+python your_script.py
+```
+
+This will show:
+- Detailed error messages with full stderr/stdout
+- SDK configuration options
+- Full Python tracebacks
+
+### Common Issues
+
+**Error: "Command failed with exit code 1"**
+
+The provider captures and prints all available error information to stderr. Look for:
+
+```
+======================================================================
+CLAUDE AGENT SDK ERROR
+======================================================================
+Error: <detailed error message>
+
+Error Details:
+ stderr: <actual error from subprocess>
+ stdout: <subprocess output>
+ exit_code: 1
+======================================================================
+```
+
+**Authentication Errors:**
+```bash
+# Check authentication status
+claude login --check
+
+# Re-authenticate if needed
+claude login
+```
+
+**Permission Errors:**
+
+Always specify `permission_mode` when using tools:
+
+```python
+result = await client.single(
+ "claude/default",
+ "List files",
+ allowed_tools=["Bash"],
+ permission_mode="acceptEdits" # Required!
+)
+```
+
+### Full Debugging Guide
+
+See [DEBUGGING.md](./DEBUGGING.md) for comprehensive debugging information, including:
+- How to read error messages
+- Common issues and solutions
+- Debug mode usage
+- Testing error handling
+- Reporting bugs
diff --git a/packages/multillm-claude/src/multillm_claude/provider.py b/packages/multillm-claude/src/multillm_claude/provider.py
index 051f845..f74564e 100644
--- a/packages/multillm-claude/src/multillm_claude/provider.py
+++ b/packages/multillm-claude/src/multillm_claude/provider.py
@@ -1,4 +1,5 @@
import os
+import sys
from typing import Any, AsyncIterator
from claude_agent_sdk import (
@@ -67,6 +68,17 @@ class ClaudeAgentProvider(BaseAgentProvider):
if merged_config.get("api_key"):
env["ANTHROPIC_API_KEY"] = merged_config["api_key"]
+ # Enable debug mode if requested
+ if os.environ.get("MULTILLM_DEBUG") or os.environ.get("DEBUG"):
+ env["DEBUG"] = "1"
+ print(f"[DEBUG] Claude Agent SDK options:", file=sys.stderr)
+ print(f" System prompt: {options.system_prompt if options else None}", file=sys.stderr)
+ print(f" Max turns: {options.max_turns if options else None}", file=sys.stderr)
+ print(f" Allowed tools: {options.allowed_tools if options else None}", file=sys.stderr)
+ print(f" Permission mode: {options.permission_mode if options else None}", file=sys.stderr)
+ print(f" Working dir: {options.working_directory if options else None}", file=sys.stderr)
+ print(f" Environment vars: {list(env.keys())}", file=sys.stderr)
+
if options is None:
return ClaudeAgentOptions(env=env) if env else ClaudeAgentOptions()
@@ -170,16 +182,85 @@ class ClaudeAgentProvider(BaseAgentProvider):
yield parsed
except ProcessError as e:
- error_msg = f"Claude Agent SDK process error: {e}"
- if hasattr(e, 'stderr') and e.stderr:
- error_msg += f"\nStderr: {e.stderr}"
- if hasattr(e, 'stdout') and e.stdout:
- error_msg += f"\nStdout: {e.stdout}"
- raise ProviderError(error_msg) from e
+ # Build detailed error message
+ error_parts = [f"Claude Agent SDK process error: {e}"]
+
+ # Print to stderr immediately so user sees it
+ print(f"\n{'='*70}", file=sys.stderr)
+ print("CLAUDE AGENT SDK ERROR", file=sys.stderr)
+ print(f"{'='*70}", file=sys.stderr)
+ print(f"Error: {e}", file=sys.stderr)
+
+ # Collect all available error information
+ error_info = {}
+ for attr in ['stderr', 'stdout', 'exit_code', 'command', 'output', 'message', 'args']:
+ if hasattr(e, attr):
+ val = getattr(e, attr)
+ if val:
+ error_info[attr] = val
+
+ # Print all error details to stderr
+ if error_info:
+ print("\nError Details:", file=sys.stderr)
+ for key, val in error_info.items():
+ print(f" {key}: {val}", file=sys.stderr)
+ # Also add to error message
+ error_parts.append(f"{key}: {val}")
+
+ # Check exception's __dict__ for any other attributes
+ if hasattr(e, '__dict__'):
+ other_attrs = {k: v for k, v in e.__dict__.items() if k not in error_info and not k.startswith('_')}
+ if other_attrs:
+ print("\nAdditional Info:", file=sys.stderr)
+ for key, val in other_attrs.items():
+ print(f" {key}: {val}", file=sys.stderr)
+ error_parts.append(f"{key}: {val}")
+
+ print(f"{'='*70}\n", file=sys.stderr)
+
+ raise ProviderError("\n".join(error_parts)) from e
+
except ClaudeSDKError as e:
- raise ProviderError(f"Claude Agent SDK error: {e}") from e
+ # Print to stderr immediately
+ print(f"\n{'='*70}", file=sys.stderr)
+ print("CLAUDE SDK ERROR", file=sys.stderr)
+ print(f"{'='*70}", file=sys.stderr)
+ print(f"Error: {e}", file=sys.stderr)
+
+ # Get all attributes from the error
+ error_parts = [f"Claude Agent SDK error: {e}"]
+ if hasattr(e, '__dict__'):
+ for key, val in e.__dict__.items():
+ if not key.startswith('_') and val:
+ print(f" {key}: {val}", file=sys.stderr)
+ error_parts.append(f"{key}: {val}")
+
+ print(f"{'='*70}\n", file=sys.stderr)
+
+ raise ProviderError("\n".join(error_parts)) from e
+
except Exception as e:
- raise ProviderError(f"Unexpected error: {e}") from e
+ # Print unexpected errors to stderr
+ print(f"\n{'='*70}", file=sys.stderr)
+ print("UNEXPECTED ERROR IN CLAUDE PROVIDER", file=sys.stderr)
+ print(f"{'='*70}", file=sys.stderr)
+ print(f"Type: {type(e).__name__}", file=sys.stderr)
+ print(f"Error: {e}", file=sys.stderr)
+
+ if hasattr(e, '__dict__'):
+ print("\nError attributes:", file=sys.stderr)
+ for key, val in e.__dict__.items():
+ if not key.startswith('_'):
+ print(f" {key}: {val}", file=sys.stderr)
+
+ # Print full traceback
+ import traceback
+ print("\nTraceback:", file=sys.stderr)
+ traceback.print_exc(file=sys.stderr)
+
+ print(f"{'='*70}\n", file=sys.stderr)
+
+ raise ProviderError(f"Unexpected error: {type(e).__name__}: {e}") from e
async def run_interactive(
self,
diff --git a/packages/multillm-cli/README.md b/packages/multillm-cli/README.md
index 5f9ec94..3de6cc9 100644
--- a/packages/multillm-cli/README.md
+++ b/packages/multillm-cli/README.md
@@ -115,6 +115,37 @@ When using chat providers (OpenAI, Anthropic, Gemini, OpenRouter), you can enabl
| `calculate` | Perform mathematical calculations | `--use-tools calculate` |
| `get_current_time` | Get current date and time | `--use-tools get_current_time` |
| `get_weather` | Get weather information (mock data) | `--use-tools get_weather` |
+| `ask_user` | Ask the user a question interactively | `--use-tools ask_user` |
+
+### Interactive Tools
+
+The `ask_user` tool allows the model to ask you questions during execution and collect your responses. This enables truly interactive conversations where the model can clarify requirements, gather preferences, or get additional information.
+
+**Example:**
+```bash
+multillm -m openai/gpt-4o \
+ -p "Help me choose a programming language for my project by asking about my requirements" \
+ --use-tools ask_user
+```
+
+When the model calls `ask_user`, you'll see:
+```
+======================================================================
+❓ QUESTION FROM ASSISTANT
+======================================================================
+
+What type of project are you building?
+
+Suggested options:
+ 1. Web application
+ 2. Desktop application
+ 3. Data science
+ 4. Command-line tool
+
+You can select a number or provide your own answer.
+
+Your answer: _
+```
### Tool Output
diff --git a/packages/multillm-cli/src/multillm_cli/main.py b/packages/multillm-cli/src/multillm_cli/main.py
index b450b71..a38ebe6 100644
--- a/packages/multillm-cli/src/multillm_cli/main.py
+++ b/packages/multillm-cli/src/multillm_cli/main.py
@@ -65,6 +65,28 @@ BUILTIN_TOOLS = {
"required": ["location"]
}
}
+ },
+ "ask_user": {
+ "type": "function",
+ "function": {
+ "name": "ask_user",
+ "description": "Ask the user a question and get their response. Use this when you need user input or clarification.",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "question": {
+ "type": "string",
+ "description": "The question to ask the user"
+ },
+ "options": {
+ "type": "array",
+ "items": {"type": "string"},
+ "description": "Optional list of suggested answers (user can still provide their own)"
+ }
+ },
+ "required": ["question"]
+ }
+ }
}
}
@@ -118,10 +140,59 @@ def get_weather(location: str, unit: str = "celsius") -> dict:
}
+def ask_user(question: str, options: list[str] | None = None) -> dict:
+ """
+ Ask the user a question and collect their response.
+
+ This is an interactive tool that displays a question to the user
+ and waits for their input.
+ """
+ print("\n" + "=" * 70, file=sys.stderr)
+ print("❓ QUESTION FROM ASSISTANT", file=sys.stderr)
+ print("=" * 70, file=sys.stderr)
+ print(f"\n{question}\n", file=sys.stderr)
+
+ if options:
+ print("Suggested options:", file=sys.stderr)
+ for i, opt in enumerate(options, 1):
+ print(f" {i}. {opt}", file=sys.stderr)
+ print("\nYou can select a number or provide your own answer.", file=sys.stderr)
+
+ print("\nYour answer: ", file=sys.stderr, end="", flush=True)
+
+ try:
+ # Read from stdin
+ answer = input()
+
+ # If user entered a number and we have options, use that option
+ if options and answer.strip().isdigit():
+ idx = int(answer.strip()) - 1
+ if 0 <= idx < len(options):
+ answer = options[idx]
+
+ print("=" * 70 + "\n", file=sys.stderr)
+
+ return {
+ "question": question,
+ "answer": answer,
+ "selected_from_options": answer in options if options else False
+ }
+
+ except (EOFError, KeyboardInterrupt):
+ print("\n", file=sys.stderr)
+ print("=" * 70 + "\n", file=sys.stderr)
+ return {
+ "question": question,
+ "answer": None,
+ "error": "User cancelled input"
+ }
+
+
TOOL_FUNCTIONS = {
"get_current_time": get_current_time,
"calculate": calculate,
"get_weather": get_weather,
+ "ask_user": ask_user,
}
@@ -220,10 +291,78 @@ async def run_with_tools(
return "Maximum tool calling iterations reached"
-async def run_single(model: str, prompt: str, **kwargs) -> multillm.SingleResponse:
- """Run a single query against the specified model."""
+async def run_agentic(
+ model: str,
+ prompt: str,
+ tools: list[multillm.Tool] | None = None,
+ options: multillm.AgentOptions | None = None,
+ verbose: bool = False
+) -> str:
+ """
+ Run a query using the agentic API.
+
+ Uses agentwrap for chat providers, native agent API for agent providers.
+ """
client = multillm.Client()
- return await client.single(model, prompt, **kwargs)
+
+ # For Claude, if AskUserQuestion is requested, provide custom ask_user tool instead
+ provider_name = model.split("/")[0]
+ if provider_name == "claude" and options and options.allowed_tools:
+ if "AskUserQuestion" in options.allowed_tools:
+ # Remove AskUserQuestion (SDK built-in doesn't work interactively)
+ options.allowed_tools = [t for t in options.allowed_tools if t != "AskUserQuestion"]
+
+ # Add our custom ask_user tool
+ if not tools:
+ tools = []
+
+ # Create ask_user tool for Claude
+ ask_user_claude = multillm.Tool(
+ name="ask_user",
+ description="Ask the user a question and get their response. Use this when you need user input or clarification.",
+ parameters={
+ "type": "object",
+ "properties": {
+ "question": {
+ "type": "string",
+ "description": "The question to ask the user"
+ },
+ "options": {
+ "type": "array",
+ "items": {"type": "string"},
+ "description": "Optional suggested answers"
+ }
+ },
+ "required": ["question"]
+ },
+            handler=lambda args: ask_user(**args)  # same handler as chat providers, unpacking the args dict
+ )
+ tools.append(ask_user_claude)
+
+ print("ℹ️ Using custom 'ask_user' tool instead of AskUserQuestion for interactive prompting", file=sys.stderr)
+
+ # Collect text responses
+ text_parts = []
+ tool_uses = []
+
+ async for msg in client.run(model, prompt, options=options, tools=tools):
+ if msg.type == "text":
+ text_parts.append(msg.content)
+ elif msg.type == "tool_use":
+ tool_uses.append(msg)
+ if verbose:
+ print(f" → {msg.tool_name}({json.dumps(msg.tool_input)})", file=sys.stderr)
+ else:
+ print(f" → {msg.tool_name}", file=sys.stderr)
+ elif msg.type == "tool_result":
+ if verbose:
+ print(f" ← {msg.tool_result}", file=sys.stderr)
+
+ # Show tool usage summary if any tools were used
+ if tool_uses and not verbose:
+ print(f"\n[Used {len(tool_uses)} tool(s)]\n", file=sys.stderr)
+
+ return " ".join(text_parts)
async def run_with_chat_tools(
@@ -232,19 +371,34 @@ async def run_with_chat_tools(
enabled_tools: list[str],
verbose: bool = False
) -> str:
- """Run with chat provider tools."""
- client = multillm.Client()
-
- # Build tool list from enabled tools
- tools = [BUILTIN_TOOLS[name] for name in enabled_tools if name in BUILTIN_TOOLS]
-
- if not tools:
- # No valid tools, just run normally
- result = await run_single(model, prompt)
- return result.text
+ """
+ Run with chat provider tools using agentwrap.
- # Run with tool loop
- return await run_with_tools(client, model, prompt, tools, verbose)
+ Converts built-in tools to Tool objects and uses agentwrap for execution.
+ """
+ # Build Tool objects from enabled tools
+ tool_objects = []
+ for name in enabled_tools:
+ if name in BUILTIN_TOOLS:
+ tool_def = BUILTIN_TOOLS[name]
+ tool_objects.append(multillm.Tool(
+ name=tool_def["function"]["name"],
+ description=tool_def["function"]["description"],
+ parameters=tool_def["function"]["parameters"],
+                    # Handlers receive a single args dict; unpack it into the
+                    # underlying function's keyword parameters
+                    handler=lambda args, fn=TOOL_FUNCTIONS[name]: fn(**args)
+ ))
+
+ if not tool_objects:
+ # No valid tools, run without tools
+ return await run_agentic(f"agentwrap/{model}", prompt, verbose=verbose)
+
+ # Run with agentwrap and tools
+ return await run_agentic(
+ f"agentwrap/{model}",
+ prompt,
+ tools=tool_objects,
+ verbose=verbose
+ )
def main():
@@ -254,18 +408,21 @@ def main():
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
- # Chat providers (simple queries)
+ # Chat providers (simple queries) - uses agentwrap internally
multillm -m openai/gpt-4o -p "What is 2+2?"
multillm -m anthropic/claude-sonnet-4-20250514 -p "Explain async/await"
multillm -m gemini/gemini-2.0-flash-exp -p "What is Python?"
- # With built-in tools (for chat providers)
+ # With built-in tools (for chat providers) - uses agentwrap with tool execution
multillm -m openai/gpt-4o -p "What time is it?" --use-tools get_current_time
multillm -m openai/gpt-4o -p "Calculate 15 * 23" --use-tools calculate
multillm -m openai/gpt-4o -p "What's the weather in Tokyo?" --use-tools get_weather
multillm -m openai/gpt-4o -p "What's 5+5 and the current time?" --use-tools calculate get_current_time
- # Agent providers (with tools)
+ # Interactive tools (ask user questions)
+ multillm -m openai/gpt-4o -p "Ask me about my preferences and create a summary" --use-tools ask_user
+
+ # Native agent providers (Claude with built-in tools)
multillm -m claude/default -p "What Python version?" --allowed-tools Bash
multillm -m claude/default -p "List files" --allowed-tools Bash Glob --max-turns 5
multillm -m claude/default -p "Read README.md" --allowed-tools Read
@@ -273,16 +430,23 @@ Examples:
# With stdin
cat file.txt | multillm -m openai/gpt-4o -p "Summarize:" --with-stdin
- # Permission modes
+ # Permission modes (for native agents)
multillm -m claude/default -p "Create hello.py" --allowed-tools Write --permission-mode acceptEdits
# Verbose mode
multillm -m openai/gpt-4o -p "Calculate 5*5" --use-tools calculate --verbose
+Note:
+  - Chat providers (openai, gemini, anthropic, etc.) are automatically wrapped with
+ agentic capabilities using the 'agentwrap' provider
+ - Native agent providers (claude) use their built-in agentic features
+ - Use --use-tools for chat providers, --allowed-tools for native agents
+
Available Built-in Tools (for chat providers with --use-tools):
get_current_time Get current date and time
calculate Perform mathematical calculations
get_weather Get weather information (mock data)
+ ask_user Ask the user a question and get their response (interactive)
Available Tools (for agent providers with --allowed-tools):
Read, Write, Edit, Bash, Glob, Grep, Task, WebFetch, WebSearch,
@@ -346,37 +510,39 @@ Available Tools (for agent providers with --allowed-tools):
prompt = f"{prompt}\n--- USER STDIN BEGIN ---\n{stdin_content}"
try:
- # Check if this is a chat provider with tools
+ # Determine if this is a chat or agent provider
+ provider_name = args.model.split("/")[0]
+        is_agent_provider = provider_name in ["claude", "agentwrap"]  # Agentic providers used as-is (not re-wrapped)
+
if args.use_tools:
- # Use tool calling workflow for chat providers
+ # Use tool calling workflow for chat providers with agentwrap
result_text = asyncio.run(
run_with_chat_tools(args.model, prompt, args.use_tools, args.verbose)
)
print(result_text)
else:
- # Build kwargs for agent options
- kwargs = {}
- if args.max_turns is not None:
- kwargs["max_turns"] = args.max_turns
- if args.allowed_tools:
- kwargs["allowed_tools"] = args.allowed_tools
- if args.permission_mode:
- kwargs["permission_mode"] = args.permission_mode
-
- # Use single() for normal queries or agent providers
- result = asyncio.run(run_single(args.model, prompt, **kwargs))
-
- # Show tool usage for agent providers (inline)
- if result.tool_calls:
- print(f"\n[Agent used {len(result.tool_calls)} tool(s)]")
- for tc in result.tool_calls:
- if args.verbose:
- print(f" → {tc['function']['name']}({json.dumps(tc['function'].get('arguments', {}))})")
- else:
- print(f" → {tc['function']['name']}")
- print()
+ # Build agent options
+ options = None
+ if args.max_turns is not None or args.allowed_tools or args.permission_mode:
+ options = multillm.AgentOptions(
+ max_turns=args.max_turns,
+ allowed_tools=args.allowed_tools,
+ permission_mode=args.permission_mode,
+ )
+
+ # Determine which model string to use
+ if is_agent_provider:
+ # Use agent provider directly (claude)
+ model_to_use = args.model
+ else:
+ # Use agentwrap for chat providers
+ model_to_use = f"agentwrap/{args.model}"
- print(result.text)
+ # Run with agentic API
+ result_text = asyncio.run(
+ run_agentic(model_to_use, prompt, options=options, verbose=args.verbose)
+ )
+ print(result_text)
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
diff --git a/packages/multillm/src/multillm/client.py b/packages/multillm/src/multillm/client.py
index 4a108d7..12eb651 100644
--- a/packages/multillm/src/multillm/client.py
+++ b/packages/multillm/src/multillm/client.py
@@ -1,4 +1,5 @@
import importlib
+import warnings
from typing import Any, AsyncIterator
from .base import BaseProvider, Response, SingleResponse
@@ -6,7 +7,7 @@ from .agent import BaseAgentProvider, AgentMessage, AgentOptions, Tool
from .exceptions import ProviderNotFoundError, InvalidModelFormatError
CHAT_PROVIDERS = ["anthropic", "openai", "gemini", "openrouter"]
-AGENT_PROVIDERS = ["claude"]
+AGENT_PROVIDERS = ["claude", "agentwrap"]
SUPPORTED_PROVIDERS = CHAT_PROVIDERS + AGENT_PROVIDERS
@@ -119,10 +120,19 @@ class Client:
self._chat_providers[provider_name] = provider
return provider
- def _get_agent_provider(self, provider_name: str) -> BaseAgentProvider:
- """Get or create an agent provider instance."""
- if provider_name in self._agent_providers:
- return self._agent_providers[provider_name]
+ def _get_agent_provider(self, provider_name: str, wrapped_model: str | None = None) -> BaseAgentProvider:
+ """
+ Get or create an agent provider instance.
+
+ Args:
+ provider_name: Name of the provider
+            wrapped_model: For agentwrap, the model to wrap (e.g., "gemini/gemini-pro")
+ """
+ # For agentwrap, use a unique key per wrapped model
+ cache_key = f"{provider_name}:{wrapped_model}" if provider_name == "agentwrap" and wrapped_model else provider_name
+
+ if cache_key in self._agent_providers:
+ return self._agent_providers[cache_key]
if provider_name not in AGENT_PROVIDERS:
raise ProviderNotFoundError(provider_name)
@@ -134,8 +144,12 @@ class Client:
**self.config.get(provider_name, {}),
}
+ # For agentwrap, inject the wrapped_model into config
+ if provider_name == "agentwrap" and wrapped_model:
+ provider_config["wrapped_model"] = wrapped_model
+
provider = module.Provider(provider_config)
- self._agent_providers[provider_name] = provider
+ self._agent_providers[cache_key] = provider
return provider
async def single(
@@ -148,13 +162,40 @@ class Client:
"""
Send a single message and get a response.
+ .. deprecated:: 0.2.0
+ Use :meth:`run` with ``agentwrap/<provider>/<model>`` instead for unified agentic API.
+ The single() method will be removed in version 1.0.0.
+
+ Migration examples:
+
+ Instead of::
+
+ result = await client.single("openai/gpt-4", "Hello")
+ print(result.text)
+
+ Use::
+
+ async for msg in client.run("agentwrap/openai/gpt-4", "Hello"):
+ if msg.type == "text":
+ print(msg.content)
+
+ For tool calling::
+
+ # Old way
+ result = await client.single("openai/gpt-4", "Calculate 5+3", tools=tools)
+
+ # New way
+ async for msg in client.run("agentwrap/openai/gpt-4", "Calculate 5+3", tools=tools):
+ if msg.type == "text":
+ print(msg.content)
+
This interface allows using both chat and agent providers for single-turn
interactions. It returns both the text response and any tool calls made.
Interface concepts:
- chat_complete(): Takes full conversation history, returns completion
- agent API: Maintains history internally, takes only newest user message
- - single(): Unified interface for both, handles single message/response
+ - single(): DEPRECATED - Unified interface for both, handles single message/response
Args:
model: Model identifier (e.g., "openai/gpt-4o", "claude/sonnet")
@@ -171,6 +212,14 @@ class Client:
- text: The text response
- tool_calls: List of tool calls made (if any)
"""
+ warnings.warn(
+ "single() is deprecated and will be removed in version 1.0.0. "
+ "Use run() with 'agentwrap/<provider>/<model>' instead for unified agentic API. "
+ "Example: client.run('agentwrap/openai/gpt-4', prompt)",
+ DeprecationWarning,
+ stacklevel=2
+ )
+
provider_name, model_name = self._parse_model(model)
if self._is_agent_provider(provider_name):
@@ -298,20 +347,22 @@ class Client:
prompt: str,
options: AgentOptions | None = None,
tools: list[Tool] | None = None,
+ wrapped_model: str | None = None,
) -> AsyncIterator[AgentMessage]:
"""
Run an agent with the given prompt.
Args:
- provider: Provider name (e.g., "claude")
+ provider: Provider name (e.g., "claude", "agentwrap")
prompt: The task or query for the agent
options: Agent execution options
tools: Custom tools available to the agent
+ wrapped_model: For agentwrap, the chat model to wrap
Yields:
AgentMessage objects as the agent works
"""
- agent = self._get_agent_provider(provider)
+ agent = self._get_agent_provider(provider, wrapped_model=wrapped_model)
async for msg in agent.run(prompt, options, tools):
yield msg
@@ -326,7 +377,9 @@ class Client:
Run an agent using model string format.
Args:
- model: Model identifier (e.g., "claude/sonnet", "claude/default")
+ model: Model identifier
+ - Agent: "claude/sonnet", "claude/default"
+              - Agentwrap: "agentwrap/openai/gpt-4", "agentwrap/gemini/gemini-pro"
prompt: The task or query for the agent
options: Agent execution options (model from string takes precedence)
tools: Custom tools available to the agent
@@ -341,10 +394,16 @@ class Client:
f"'{provider_name}' is a chat provider. Use chat_complete() instead."
)
- if options is None:
- options = self._build_agent_options(model_name)
- elif model_name and model_name != "default":
- options.extra["model"] = model_name
+ # For agentwrap, pass model_name as wrapped_model
+ wrapped_model = None
+ if provider_name == "agentwrap":
+ wrapped_model = model_name
+ else:
+ # For other agent providers, add model to options
+ if options is None:
+ options = self._build_agent_options(model_name)
+ elif model_name and model_name != "default":
+ options.extra["model"] = model_name
- async for msg in self.agent_run(provider_name, prompt, options, tools):
+ async for msg in self.agent_run(provider_name, prompt, options, tools, wrapped_model=wrapped_model):
yield msg