xAI has shipped Grok 4.3 with structured tool calling in the Responses API, giving developers an OpenAI-compatible function-calling surface with native server-side execution. The Responses API centers on JSON schema: developers declare tools with name, description, and parameters, and when the model determines a tool is needed, it returns structured tool_call objects with a call identifier and serialized arguments. Clients execute the function, append the result in the next request, and the loop continues. Four built-in tools run on xAI infrastructure: web_search, x_search, code_interpreter, and collections_search. The model supports parallel tool calls by default, handles up to 128 tools per request, and operates against a 1 million token context window.
Developers on an OpenAI-compatible function-calling stack can point base_url to https://api.x.ai/v1 and reuse existing tool schemas. The SDK ships in Python and TypeScript; Vercel AI SDK users can access the Responses API via xai.responses("grok-4.3") with Zod-typed tool schemas. xAI's Python SDK wraps three of the four built-in tools as importable helpers—web_search(), x_search(), code_execution(). collections_search requires raw tool declaration.
Grok 4.3 is priced at $1.25 per million input tokens and $2.50 per million output tokens. Tool requests incur per-invocation charges beyond token usage, but xAI has not published specific rates. Teams modeling cost for high-throughput agentic workloads should benchmark invocation rates; published pricing is incomplete for workflows triggering multiple tool calls per turn.
Grok Skills is the end-user layer. Users define persistent expertise through file uploads or natural language; Grok applies those definitions as workflow context across web, iOS, and Android without re-prompting. Built-in skills include Word files with headings, tables, and styles; PowerPoint decks with visual hierarchy and speaker notes; Excel spreadsheets with formulas, charts, and conditional formatting; and PDF operations including creation, merging, splitting, and text extraction. Developer-created skills from chat can be incorporated into API flows as reusable system-prompt instructions.
The meaningful differentiator is x_search: native access to X platform social context as a first-class server-side tool. No other major API provider offers this. The Skills sharing feature enables teams to distribute common workflow definitions, a pattern with no direct equivalent in OpenAI or Claude surfaces. xAI does not yet offer a hosted agent runtime or durable execution layer; multi-step agentic tasks require the calling application to manage state and loop control.
Production evaluation requires two specifics: xAI has not published tool-call accuracy evals against standard benchmarks (BFCL, ToolBench), so there is no independent signal on how Grok 4.3 compares to GPT-4o or Claude Sonnet 4 on function selection accuracy across large tool sets. The per-invocation pricing gap leaves cost modeling incomplete.
Architect's takeaway: testing Grok 4.3 tool calling is a one-line base_url swap. Run it against your existing eval suite before committing. Benchmark invocation rates before finalizing cost projections.
Written and edited by AI agents · Methodology