Full reference for the POST /v1/chat/completions endpoint — request parameters, response format, streaming, and tool calling.
The Chat Completions endpoint generates model responses from a conversation. It follows the OpenAI Chat Completions format, so any library or tool that works with OpenAI also works with the Cline API.
model: Model ID in provider/model-name format, for example anthropic/claude-sonnet-4-6, openai/gpt-4o, or google/gemini-2.5-pro. See Models for available options.
stream: When true, the response is delivered as a stream of Server-Sent Events. When false, a single JSON object is returned after the model finishes generating.
When stream: true (the default), the response is a series of Server-Sent Events. Each line starts with data: and contains a JSON chunk. The stream ends with data: [DONE].
data: {"id":"gen-abc123","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}

data: {"id":"gen-abc123","choices":[{"delta":{"content":"The capital"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}

data: {"id":"gen-abc123","choices":[{"delta":{"content":" of France is Paris."},"index":0,"finish_reason":"stop"}],"model":"anthropic/claude-sonnet-4-6","usage":{"prompt_tokens":14,"completion_tokens":8,"cost":0.000066}}

data: [DONE]
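Mid-stream errors do not change the HTTP status code, because the 200 OK status has already been sent by the time the error occurs. Always check finish_reason on each chunk in your streaming handler rather than relying on the status code alone. See Errors for details.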
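As a rough illustration of the streaming format described above, the following Python sketch parses data: lines, stops at [DONE], and records the last finish_reason it sees. The helper names iter_chunks and collect_text are hypothetical, and the sample lines are shortened copies of the example stream; a real client would feed in lines read from the HTTP response instead. Which finish_reason values signal an error is not assumed here; compare the returned value against what the Errors page documents.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate content deltas and return (text, last finish_reason seen)."""
    parts = []
    finish_reason = None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Shortened copy of the example stream, used here in place of a live response.
sample = [
    'data: {"id":"gen-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    "",
    'data: {"id":"gen-abc123","choices":[{"delta":{"content":"The capital"},"index":0}]}',
    "",
    'data: {"id":"gen-abc123","choices":[{"delta":{"content":" of France is Paris."},"index":0,"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, reason = collect_text(sample)
print(text, reason)
```

If the stream ends without any finish_reason, collect_text returns None for it, which a caller may want to treat as an incomplete response.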
Include previous messages in the messages array to maintain context across turns:
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a closure in JavaScript?"},
    {"role": "assistant", "content": "A closure is a function that retains access to its outer scope..."},
    {"role": "user", "content": "Can you show me an example?"}
  ]
}
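One way to maintain such a history in client code is to append each turn to a list and serialize the whole list on every request. The sketch below is a minimal Python illustration; add_turn is a hypothetical helper, and the code only builds the request body without sending it.

```python
import json


def add_turn(messages, role, content):
    """Return a new message list with one more turn; the input is not mutated."""
    return messages + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful coding assistant."}]
history = add_turn(history, "user", "What is a closure in JavaScript?")
history = add_turn(
    history,
    "assistant",
    "A closure is a function that retains access to its outer scope...",
)
history = add_turn(history, "user", "Can you show me an example?")

# The full history is sent each time; earlier turns are what give the model context.
body = json.dumps({"model": "anthropic/claude-sonnet-4-6", "messages": history})
print(body)
```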
Define tools that the model can call using the OpenAI function-calling format. When the model decides to use a tool, it responds with a tool_calls array instead of a text reply.
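The sketch below shows what a tool definition and a tool_calls handler might look like in Python, assuming the Cline API matches the OpenAI function-calling shapes exactly (a tools array of type "function" entries, and tool_calls entries whose function.arguments is a JSON-encoded string). The get_weather tool, its schema, and the sample assistant message are invented for illustration and do not come from this reference.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short weather summary for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city):
    """Stand-in implementation; a real tool would query a weather service."""
    return f"Sunny in {city}"


HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each requested tool and build the role=tool reply messages."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = HANDLERS[fn["name"]](**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results


# Invented example of an assistant message that requests a tool call.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

tool_messages = run_tool_calls(sample_message)
print(tool_messages)
```

In the OpenAI convention, the assistant message and the resulting tool messages are then appended to the messages array and sent in a follow-up request so the model can produce its final reply.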
Some models support extended thinking, where the model reasons through a problem before generating a reply. Reasoning content streams in the delta.reasoning field:
{"choices":[{"delta":{"reasoning":"Let me think about this step by step..."}}]}
Reasoning tokens are counted separately from output tokens. Not all models support reasoning — check model capabilities before using this feature.
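A streaming handler may want to keep reasoning text apart from the visible reply. The Python sketch below does this by reading delta.reasoning and delta.content separately; split_reasoning is a hypothetical helper, and the sample chunks are invented, modeled on the chunk shape shown above.

```python
import json


def split_reasoning(lines):
    """Return (reasoning_text, reply_text) accumulated from SSE data lines."""
    reasoning, reply = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning"):
                reasoning.append(delta["reasoning"])
            if delta.get("content"):
                reply.append(delta["content"])
    return "".join(reasoning), "".join(reply)


# Invented sample chunks for illustration only.
sample = [
    'data: {"choices":[{"delta":{"reasoning":"Let me think about this "}}]}',
    'data: {"choices":[{"delta":{"reasoning":"step by step..."}}]}',
    'data: {"choices":[{"delta":{"content":"Paris."}}]}',
    "data: [DONE]",
]

thoughts, answer = split_reasoning(sample)
print(thoughts)
print(answer)
```

Models without reasoning support would simply never populate the reasoning field, in which case the first returned string stays empty.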