FreeLLMAPI - Free LLM API Gateway

🚀 Quick Start

Use the unified endpoint to chat with AI models. The system automatically routes your request to the best available provider.

Endpoint

POST /api/chat.php

Authentication

Authorization: Bearer freellmapi-0b44f2978581c439944104f86b529bbd

💬 Chat Completion

Send a message and receive an AI response. Supports streaming for real-time output.

Request Format

Example Request

{
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "model_id": null,
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1024
}

Parameters

Parameter	Type	Required	Description
`messages`	array	Yes	Array of message objects with `role` and `content`
`model_id`	integer|null	No	Specific model ID from database, or null for auto-routing
`stream`	boolean	No	Enable streaming response (default: true)
`temperature`	float	No	Response creativity (0.0-2.0, default: 0.7)
`max_tokens`	integer	No	Maximum tokens in response (default: 1024)
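
The parameter table can be turned into a small client-side helper that fills in the documented defaults and rejects out-of-range values before a request is sent. This is only a sketch: `build_chat_payload` is a hypothetical name, and the checks mirror the table above rather than any validation the server is known to perform.

```python
from typing import Dict, List, Optional


def build_chat_payload(
    messages: List[Dict[str, str]],
    model_id: Optional[int] = None,
    stream: bool = True,
    temperature: float = 0.7,
    max_tokens: int = 1024,
) -> dict:
    """Build a request body using the documented parameters and defaults."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "messages": messages,
        "model_id": model_id,  # None is serialized as JSON null (auto-routing)
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

Calling `build_chat_payload([{"role": "user", "content": "Hello!"}])` yields a body equivalent to the example request above, with `model_id` left as null.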

Streaming Response Format

Server-Sent Events

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: [DONE]
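
The event format above can be consumed with a small generator that takes already-split lines and yields only the text fragments. This is a minimal sketch, assuming each event fits on one `data:` line as in the sample; `iter_deltas` is a hypothetical helper name, not part of the API.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the content fragments from a sequence of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        event = json.loads(payload)
        choices = event.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":"!"}}]}',
    "data: [DONE]",
]
full_text = "".join(iter_deltas(sample))  # "Hello!"
```

Keeping the parser separate from the HTTP call makes it easy to test against recorded events.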

Example Code

JavaScript Fetch

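async function chat(message) {
  // Demo only: in production, route requests through a backend proxy
  // so the API key is never shipped to the browser.
  const response = await fetch('/api/chat.php', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer freellmapi-0b44f2978581c439944104f86b529bbd'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds a partial line that spans two network chunks

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') return;

      const parsed = JSON.parse(data);
      const delta = parsed.choices?.[0]?.delta?.content || '';
      console.log(delta);
    }
  }
}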

Python Example

import requests
import json

url = "http://localhost:3001/api/chat.php"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer freellmapi-0b44f2978581c439944104f86b529bbd"
}

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)
response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body

for line in response.iter_lines():
    if not line.startswith(b'data: '):
        continue
    event = line[6:].decode('utf-8')
    if event == '[DONE]':
        break
    try:
        chunk = json.loads(event)
        content = chunk['choices'][0]['delta'].get('content', '')
    except (json.JSONDecodeError, KeyError, IndexError):
        continue  # skip malformed or unexpected events
    print(content, end='', flush=True)

cURL Example

curl -X POST http://localhost:3001/api/chat.php \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer freellmapi-0b44f2978581c439944104f86b529bbd" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

🤖 List Available Models

Get a list of all available models with their capabilities and limits.

Request

GET /api/models.php

Response

[
  {
    "id": 1,
    "platform": "groq",
    "model_id": "llama-3.3-70b-versatile",
    "display_name": "Llama 3.3 70B",
    "intelligence_rank": 9,
    "speed_rank": 2,
    "size_label": "Medium",
    "rpm_limit": 30,
    "tpm_limit": 6000,
    "context_window": 131072,
    "enabled": true
  }
]
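
A client might filter this list before choosing a value for `model_id`. The sketch below keeps only enabled models with a large enough context window. It assumes the integer `id` field is what the chat endpoint expects as `model_id`; the parameter table describes an integer database ID but does not state the mapping explicitly. `usable_models` is a hypothetical helper.

```python
from typing import Dict, List


def usable_models(models: List[Dict], min_context: int = 0) -> List[Dict]:
    """Return enabled models whose context window is at least min_context."""
    return [
        m for m in models
        if m.get("enabled") and m.get("context_window", 0) >= min_context
    ]


# Sample entry copied from the response above.
models = [
    {
        "id": 1,
        "platform": "groq",
        "model_id": "llama-3.3-70b-versatile",
        "display_name": "Llama 3.3 70B",
        "intelligence_rank": 9,
        "speed_rank": 2,
        "size_label": "Medium",
        "rpm_limit": 30,
        "tpm_limit": 6000,
        "context_window": 131072,
        "enabled": True,
    }
]

candidates = usable_models(models, min_context=100_000)
# Assumption: the integer "id" is the value to send as model_id.
chosen_id = candidates[0]["id"] if candidates else None
```

If no model qualifies, `chosen_id` stays `None`, which corresponds to auto-routing.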

📊 Get Statistics

Retrieve usage statistics for monitoring and analytics.

Request

GET /api/stats.php

Response

{
  "requests_today": 150,
  "tokens_used": 45000,
  "avg_latency": 234,
  "success_rate": 0.98,
  "active_providers": 8
}
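
For monitoring, the statistics payload can be checked against simple thresholds. This is an illustrative sketch only: `health_warnings` and the default 0.95 success-rate threshold are assumptions chosen for the example, not values the API defines.

```python
from typing import Dict, List


def health_warnings(stats: Dict, min_success_rate: float = 0.95) -> List[str]:
    """Return human-readable warnings derived from a stats payload."""
    warnings = []
    if stats.get("active_providers", 0) == 0:
        warnings.append("no active providers")
    success_rate = stats.get("success_rate", 1.0)
    if success_rate < min_success_rate:
        warnings.append(f"success rate {success_rate:.2f} is below {min_success_rate:.2f}")
    return warnings


# Sample payload copied from the response above.
stats = {
    "requests_today": 150,
    "tokens_used": 45000,
    "avg_latency": 234,
    "success_rate": 0.98,
    "active_providers": 8,
}
warnings = health_warnings(stats)  # empty list for the sample payload
```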

⚠️ Error Handling

The API returns standard HTTP status codes with detailed error messages.

Status Code	Meaning
`200`	Success
`400`	Bad Request - Invalid parameters
`401`	Unauthorized - Invalid or missing API key
`429`	Rate Limited - Too many requests
`500`	Internal Server Error
`503`	Service Unavailable - No providers available

Error Response Format

{
  "error": "Rate limit exceeded",
  "code": 429,
  "retry_after": 60
}
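
On the client, an error body in this shape can be converted into an exception that carries the status and the optional `retry_after` value. This is a minimal sketch: `ApiError` and `parse_error` are hypothetical names, and the fallback for non-JSON bodies is an assumption about how a client may want to behave.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Error returned by the API, with an optional retry hint in seconds."""

    def __init__(self, status: int, message: str, retry_after: Optional[int] = None):
        super().__init__(message)
        self.status = status
        self.message = message
        self.retry_after = retry_after


def parse_error(status: int, body_text: str) -> ApiError:
    """Build an ApiError from an HTTP status and a raw response body."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    message = body.get("error") or f"HTTP {status}"
    return ApiError(status, message, body.get("retry_after"))


err = parse_error(429, '{"error": "Rate limit exceeded", "code": 429, "retry_after": 60}')
# err.retry_after is 60, so the caller can wait before retrying
```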

📋 Rate Limits

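Rate limits vary by provider and model. The system automatically falls back to another provider when a limit is reached. Limits are listed as requests per minute (RPM), tokens per minute (TPM), requests per day (RPD), and tokens per day (TPD).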

Note: These are the free tier limits imposed by each provider. Our system aggregates multiple providers to maximize your effective rate limit.

Groq: 30 RPM, 6,000 TPM
Cerebras: 30 RPM, 60,000 TPM
SambaNova: 20 RPM, 200,000 TPD
OpenRouter: 20 RPM (varies by model)
Mistral: 2 RPM, 500,000 TPM
GitHub: 10 RPM, 50 RPD
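
As a rough worked example of what aggregation could offer, the per-provider RPM figures above can be summed. The result is only a theoretical ceiling under ideal routing, not a guaranteed rate: OpenRouter's limit varies by model, and token or daily caps (such as GitHub's 50 RPD) may be reached first.

```python
# Per-provider free-tier request limits copied from the list above.
FREE_TIER_RPM = {
    "Groq": 30,
    "Cerebras": 30,
    "SambaNova": 20,
    "OpenRouter": 20,  # varies by model
    "Mistral": 2,
    "GitHub": 10,      # also capped at 50 requests per day
}

# Upper bound on combined requests per minute if every provider were usable.
combined_rpm = sum(FREE_TIER_RPM.values())  # 112
```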

💡 Best Practices

🔄 Implement Retry Logic

Although the system handles provider fallback automatically, implement client-side retries for transient errors such as `429` and `503`.
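
One possible shape for client-side retries is exponential backoff that honors `retry_after` when the error body provides it. The sketch below is transport-agnostic: `send` is a hypothetical callable returning a `(status, body)` tuple, and treating `429`, `500`, and `503` as retryable is an assumption rather than documented behavior.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 503}  # assumed transient; adjust to your needs


def with_retries(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body
        retry_after = body.get("retry_after") if isinstance(body, dict) else None
        # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
        delay = retry_after if retry_after is not None else base_delay * 2 ** (attempt - 1)
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.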

📦 Use Streaming

Enable streaming for better UX, especially for long responses.

🎯 Specify Model When Needed

Use the `model_id` parameter when you need a specific model's capabilities; otherwise leave it null for auto-routing.

🔐 Secure Your Key

Never expose your unified API key in client-side code. Use a backend proxy.
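
On the backend side, one common approach is to read the key from the server's environment instead of hard-coding it, so it never appears in source control or in code shipped to browsers. The sketch below assumes a hypothetical environment variable named `FREELLMAPI_KEY`; the name is illustrative and not defined by the API.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "FREELLMAPI_KEY") -> Dict[str, str]:
    """Build request headers using a key stored in the server's environment."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

A backend proxy would call `auth_headers()` when forwarding a browser request to `/api/chat.php`, so the client only ever talks to your own server.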
