Chat completions

The Chat Completions endpoint generates model responses from a conversation. It follows the OpenAI Chat Completions format, so libraries and tools built for the OpenAI API generally work with the Cline API once they point at its base URL.

Endpoint

POST https://api.cline.bot/api/v1/chat/completions

Request headers

Authorization (string, required)
Bearer token for authentication. Format: Bearer YOUR_API_KEY.

Content-Type (string, required)
Must be application/json.

(string, optional)
Your application’s URL. Used for usage tracking in logs.

(string, optional)
Your application’s name. Appears in usage logs.
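As a minimal sketch, the two required headers can be assembled in Python as shown below. The helper name build_headers is hypothetical; the header names Authorization and Content-Type are the ones used in the curl example in this document. The optional application headers are left out.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the two required request headers for the endpoint."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format described above: "Bearer YOUR_API_KEY"
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_API_KEY")
print(headers["Authorization"])  # prints: Bearer YOUR_API_KEY
```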

Request body

model (string, required)
Model ID in provider/model-name format. For example: anthropic/claude-sonnet-4-6, openai/gpt-4o, google/gemini-2.5-pro. See Models for available options.

messages (object[], required)
Array of conversation messages. Each message has a role and content.

messages[].role (string, required)
One of system, user, assistant, or tool. Use tool only when returning a tool result.

messages[].content (string | object[], required)
The message text, or an array of content parts for multimodal messages (text + images).

stream (boolean, optional, default: true)
When true, the response is delivered as a stream of Server-Sent Events. When false, a single JSON object is returned after the model finishes generating.

tools (object[], optional)
Tool definitions in OpenAI function-calling format. When provided, the model may respond with a tool_calls array instead of a text reply.

tools[].type (string, required)
Must be "function".

tools[].function (object, required)
Contains the name, description, and parameters fields listed below.

tools[].function.name (string, required)
The function name.

tools[].function.description (string, optional)
What the function does. The model uses this to decide when to call it.

tools[].function.parameters (object, optional)
JSON Schema describing the function’s parameters.

temperature (number, optional, default: model default)
Sampling temperature between 0.0 and 2.0. Lower values produce more deterministic output; higher values produce more varied output.
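The body fields above can be assembled with a small helper. This is a sketch, not part of any SDK: build_request is a hypothetical name, the keys model, messages, stream, and tools are taken from the JSON examples in this document, and temperature is assumed to use the same key as the OpenAI format.

```python
from typing import Any, Optional


def build_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    stream: bool = True,
    temperature: Optional[float] = None,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a request body, checking the constraints described above."""
    if "/" not in model:
        raise ValueError("model must use the provider/model-name format")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    # Optional fields are omitted entirely so the server-side defaults apply.
    if temperature is not None:
        body["temperature"] = temperature
    if tools:
        body["tools"] = tools
    return body


body = build_request(
    "anthropic/claude-sonnet-4-6",
    [{"role": "user", "content": "Explain what an API is."}],
    stream=False,
    temperature=0.2,
)
print(sorted(body))  # prints: ['messages', 'model', 'stream', 'temperature']
```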

Message roles

Role	Purpose
`system`	Sets the model’s behavior and persona. Place first in the array.
`user`	The human’s input.
`assistant`	Previous model responses, used to maintain conversation context.
`tool`	The result of a tool call, returned after the model calls a function.

Streaming response

When stream: true (the default), the response is a series of Server-Sent Events. Each line starts with data: and contains a JSON chunk. The stream ends with data: [DONE].

data: {"id":"gen-abc123","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}

data: {"id":"gen-abc123","choices":[{"delta":{"content":"The capital"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}

data: {"id":"gen-abc123","choices":[{"delta":{"content":" of France is Paris."},"index":0,"finish_reason":"stop"}],"model":"anthropic/claude-sonnet-4-6","usage":{"prompt_tokens":14,"completion_tokens":8,"cost":0.000066}}

data: [DONE]
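One way to consume this stream is to strip the data: prefix from each line, stop at [DONE], and concatenate the delta.content values. The sketch below works on an iterable of already-decoded lines, so it can be fed from any HTTP client; collect_stream_text is a hypothetical helper name and the sample lines are the ones shown above.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Accumulate delta.content across Server-Sent Event lines."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"id":"gen-abc123","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}',
    "",
    'data: {"id":"gen-abc123","choices":[{"delta":{"content":"The capital"},"index":0}],"model":"anthropic/claude-sonnet-4-6"}',
    "",
    'data: {"id":"gen-abc123","choices":[{"delta":{"content":" of France is Paris."},"index":0,"finish_reason":"stop"}],"model":"anthropic/claude-sonnet-4-6","usage":{"prompt_tokens":14,"completion_tokens":8,"cost":0.000066}}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints: The capital of France is Paris.
```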

Streaming chunk fields

id (string)
Generation ID, consistent across all chunks in a stream.

model (string)
The model ID that generated the response.

choices (object[])
Each entry carries a delta with the new output and, on the last chunk, a finish_reason.

choices[].delta.content (string)
The new text in this chunk. Accumulate these across chunks to get the full response.

choices[].delta.reasoning (string)
Reasoning/thinking content for models that support extended thinking.

(object)
Encrypted reasoning blocks from some providers. Can be passed back in subsequent requests to preserve the reasoning trace.

choices[].finish_reason (string)
One of: stop when generation is complete, tool_calls when the model wants to call a tool, error when a mid-stream error occurs.

usage (object)
Token counts and cost. Included in the final chunk only.

usage.prompt_tokens (number)
Total input tokens.

usage.completion_tokens (number)
Total output tokens.

(number)
Tokens served from cache. Cached tokens reduce cost.

usage.cost (number)
Total cost of the request in USD.

Mid-stream errors do not produce an HTTP error status, because the response has already started with 200 OK. Always check finish_reason in your streaming handler. See Errors for details.
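A streaming handler might run a check like the following on every parsed chunk. This is a sketch under assumptions: MidStreamError and check_finish_reason are hypothetical names, and the only documented signal used here is finish_reason being "error"; any extra error detail in the chunk is not assumed.

```python
from typing import Optional


class MidStreamError(RuntimeError):
    """Raised when a chunk reports finish_reason == "error"."""


def check_finish_reason(chunk: dict) -> Optional[str]:
    """Return the chunk's finish_reason, raising if it signals an error."""
    for choice in chunk.get("choices", []):
        reason = choice.get("finish_reason")
        if reason == "error":
            gen_id = chunk.get("id", "<unknown>")
            raise MidStreamError(f"generation {gen_id} failed mid-stream")
        if reason is not None:
            return reason
    return None  # generation is still in progress


final_chunk = {
    "id": "gen-abc123",
    "choices": [{"delta": {"content": "Hi"}, "index": 0, "finish_reason": "stop"}],
}
print(check_finish_reason(final_chunk))  # prints: stop
```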

Non-streaming response

When stream: false, the API waits for the model to finish and returns a single JSON object:

{
  "id": "gen-abc123",
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 8
  }
}
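Reading this object in Python takes only a few lookups. The sketch below uses a hypothetical helper, extract_reply, and operates on the parsed JSON as a plain dict; it returns the reply text together with the sum of prompt and completion tokens.

```python
def extract_reply(response: dict) -> tuple[str, int]:
    """Return (reply text, total tokens) from a non-streaming response."""
    choice = response["choices"][0]
    # content can be absent when the model answers with tool_calls instead.
    text = choice["message"].get("content") or ""
    usage = response.get("usage", {})
    total = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    return text, total


response = {
    "id": "gen-abc123",
    "model": "anthropic/claude-sonnet-4-6",
    "choices": [
        {
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
            "index": 0,
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 8},
}
print(extract_reply(response))  # prints: ('The capital of France is Paris.', 22)
```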

Multi-turn conversations

Include previous messages in the messages array to maintain context across turns:

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a closure in JavaScript?"},
    {"role": "assistant", "content": "A closure is a function that retains access to its outer scope..."},
    {"role": "user", "content": "Can you show me an example?"}
  ]
}
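Because the API is stateless with respect to context, the client has to keep the history itself. One possible way to do that is a small wrapper such as the hypothetical Conversation class below, which appends each turn and produces a request body in the shape shown above.

```python
from typing import Optional


class Conversation:
    """Keeps the running message list for a multi-turn exchange."""

    def __init__(self, system_prompt: Optional[str] = None) -> None:
        self.messages: list[dict[str, str]] = []
        if system_prompt:
            # The system message goes first in the array.
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def to_request(self, model: str) -> dict:
        # Copy the list so later turns do not mutate an already-built request.
        return {"model": model, "messages": list(self.messages)}


chat = Conversation("You are a helpful coding assistant.")
chat.add_user("What is a closure in JavaScript?")
chat.add_assistant("A closure is a function that retains access to its outer scope...")
chat.add_user("Can you show me an example?")
request = chat.to_request("anthropic/claude-sonnet-4-6")
print([m["role"] for m in request["messages"]])
# prints: ['system', 'user', 'assistant', 'user']
```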

Tool calling

Define tools that the model can call using the OpenAI function-calling format. When the model decides to use a tool, it responds with a tool_calls array instead of a text reply.

1. Define tools in the request

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

2. Receive the tool call

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

3. Return the tool result

Send the result back as a tool role message to continue the conversation:

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"},
    {
      "role": "assistant",
      "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"San Francisco, CA\"}"}}]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temperature\": 62, \"condition\": \"foggy\"}"
    }
  ]
}
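Steps 2 and 3 can be bridged by a small dispatcher that parses each call's arguments string, runs the matching local function, and builds the tool role message. The sketch below is illustrative only: get_weather is a stub returning fixed data, and TOOLS and run_tool_calls are hypothetical names.

```python
import json


def get_weather(location: str) -> dict:
    """Stub tool: a real implementation would query a weather service."""
    return {"temperature": 62, "condition": "foggy"}


# Maps tool names from the request's tools array to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool and return the tool role messages."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        func = TOOLS[call["function"]["name"]]
        # arguments arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        output = func(**args)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # must match the id from step 2
                "content": json.dumps(output),
            }
        )
    return results


assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"San Francisco, CA\"}",
            },
        }
    ],
}
print(run_tool_calls(assistant_message)[0]["content"])
# prints: {"temperature": 62, "condition": "foggy"}
```

Append the assistant message and the returned tool messages to the messages array, in that order, before sending the follow-up request.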

Reasoning models

Some models support extended thinking, where the model reasons through a problem before generating a reply. Reasoning content streams in the delta.reasoning field:

{"choices":[{"delta":{"reasoning":"Let me think about this step by step..."}}]}

Reasoning tokens are counted separately from output tokens. Not all models support reasoning, so check a model’s capabilities before using this feature.
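When handling a stream from a reasoning model, it can be useful to keep the reasoning text apart from the visible reply. The sketch below assumes the chunks have already been parsed into dicts; split_reasoning is a hypothetical helper name, and the second sample chunk is invented for illustration.

```python
from typing import Iterable


def split_reasoning(chunks: Iterable[dict]) -> tuple[str, str]:
    """Return (reasoning text, reply text) accumulated from stream chunks."""
    reasoning: list[str] = []
    reply: list[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning"):
                reasoning.append(delta["reasoning"])
            if delta.get("content"):
                reply.append(delta["content"])
    return "".join(reasoning), "".join(reply)


chunks = [
    {"choices": [{"delta": {"reasoning": "Let me think about this step by step..."}}]},
    {"choices": [{"delta": {"content": "The answer is 4."}}]},
]
thoughts, answer = split_reasoning(chunks)
print(answer)  # prints: The answer is 4.
```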

Image input

Models that support images accept them as base64-encoded data URLs inside image_url content parts in the messages array:

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}
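The data URL in the example can be produced from raw image bytes with the standard library. This is a minimal sketch: image_message is a hypothetical helper, and the bytes used below are only the PNG file signature standing in for a real image.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with one text part and one base64 image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes: in practice, read them from a file opened in binary mode.
message = image_message("What's in this image?", b"\x89PNG\r\n\x1a\n")
url = message["content"][1]["image_url"]["url"]
print(url.startswith("data:image/png;base64,"))  # prints: True
```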

Complete example

curl -X POST https://api.cline.bot/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {"role": "system", "content": "You are a concise assistant. Answer in one sentence."},
      {"role": "user", "content": "Explain what an API is."}
    ],
    "stream": false
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cline.bot/api/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise assistant. Answer in one sentence."},
        {"role": "user", "content": "Explain what an API is."},
    ],
    stream=False,  # the API streams by default; request a single JSON response
)
print(response.choices[0].message.content)

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.cline.bot/api/v1",
  apiKey: "YOUR_API_KEY",
})

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    { role: "system", content: "You are a concise assistant. Answer in one sentence." },
    { role: "user", content: "Explain what an API is." },
  ],
  stream: false, // the API streams by default; request a single JSON response
})
console.log(response.choices[0].message.content)
