API Documentation
Integrate FreeLLMAPI into your applications
🚀 Quick Start
Use the unified endpoint to chat with AI models. The system automatically routes your request to the best available provider.
POST /api/chat.php
Authorization: Bearer freellmapi-0b44f2978581c439944104f86b529bbd
💬 Chat Completion
Send a message and receive an AI response. Supports streaming for real-time output.
Request Format
```json
{
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "model_id": null,
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1024
}
```
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects, each with a role and content |
| model_id | integer \| null | No | Specific model ID from the database, or null for auto-routing |
| stream | boolean | No | Enable streaming response (default: true) |
| temperature | float | No | Response creativity (0.0-2.0, default: 0.7) |
| max_tokens | integer | No | Maximum tokens in the response (default: 1024) |
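As an illustration, the table can be mirrored in a small client-side helper. The helper fills in the documented defaults and rejects values outside the stated ranges before a request is sent. This is a sketch under assumptions. The function name build_chat_payload is hypothetical. The checks that messages is non-empty and that max_tokens is positive are guesses at what the server accepts, not documented rules.

```python
def build_chat_payload(messages, model_id=None, stream=True,
                       temperature=0.7, max_tokens=1024):
    """Assemble a /api/chat.php request body using the documented defaults.

    Hypothetical helper: the range checks mirror the parameter table, while
    the non-empty and positive checks are assumptions.
    """
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if not isinstance(m, dict) or "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if model_id is not None and (isinstance(model_id, bool) or not isinstance(model_id, int)):
        raise TypeError("model_id must be an integer or None (auto-routing)")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "messages": messages,
        "model_id": model_id,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

Called with only messages, it produces the same body as the request format example, including "model_id": null for auto-routing.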
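Streaming Response Format
With "stream": true, the response arrives as a sequence of data: lines. Each line carries a JSON chunk whose choices[0].delta.content holds the next piece of text, and the stream ends with data: [DONE].
```text
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: [DONE]
```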
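Network reads do not necessarily align with these lines. One read can end in the middle of a data: line, and another can carry several lines at once. Below is a minimal Python sketch of a parser that buffers partial lines and stops at [DONE]. The name iter_sse_deltas is hypothetical, and the sketch assumes each event fits on a single data: line, as in the sample above.

```python
import json


def iter_sse_deltas(chunks):
    """Yield content deltas from an iterable of raw byte chunks.

    Partial lines are buffered until their newline arrives, and iteration
    stops at the [DONE] sentinel. Splitting bytes on b"\\n" is safe for
    UTF-8 because no multi-byte character contains the newline byte.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Process every complete line; keep the trailing partial line.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            line = raw.strip()
            if not line.startswith(b"data: "):
                continue  # skip blank separators and other fields
            data = line[len(b"data: "):].decode("utf-8")
            if data == "[DONE]":
                return
            event = json.loads(data)
            choice = (event.get("choices") or [{}])[0]
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

With requests, the parser could be fed from response.iter_content(chunk_size=None) on a response opened with stream=True.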
Example Code
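```javascript
// Streams a chat completion and logs each content delta as it arrives.
// Note: this embeds the API key; in a browser, call a backend proxy instead
// (see "Secure Your Key" below).
async function chat(message) {
  const response = await fetch('/api/chat.php', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer freellmapi-0b44f2978581c439944104f86b529bbd'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps multi-byte characters split across chunks intact.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep an incomplete trailing line for the next read
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') return; // end of stream
      const parsed = JSON.parse(data);
      const delta = parsed.choices?.[0]?.delta?.content || '';
      console.log(delta);
    }
  }
}
```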
```python
import json

import requests

url = "http://localhost:3001/api/chat.php"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer freellmapi-0b44f2978581c439944104f86b529bbd"
}
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)
response.raise_for_status()  # surface 4xx/5xx errors instead of parsing them

for line in response.iter_lines():
    if not line.startswith(b'data: '):
        continue  # skip blank keep-alive lines
    data = line[6:].decode('utf-8')
    if data == '[DONE]':
        break
    try:
        chunk = json.loads(data)
        content = chunk['choices'][0]['delta'].get('content') or ''
    except (json.JSONDecodeError, KeyError, IndexError):
        continue  # ignore malformed or non-content events
    print(content, end='', flush=True)
```
```bash
curl -X POST http://localhost:3001/api/chat.php \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer freellmapi-0b44f2978581c439944104f86b529bbd" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```
🤖 List Available Models
Get a list of all available models with their capabilities and limits.
GET /api/models.php
```json
[
  {
    "id": 1,
    "platform": "groq",
    "model_id": "llama-3.3-70b-versatile",
    "display_name": "Llama 3.3 70B",
    "intelligence_rank": 9,
    "speed_rank": 2,
    "size_label": "Medium",
    "rpm_limit": 30,
    "tpm_limit": 6000,
    "context_window": 131072,
    "enabled": true
  }
]
```
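The chat endpoint's model_id parameter is described as an integer "model ID from database". That makes it likely to correspond to the id field here rather than to the provider's model_id string. The sketch below uses a hypothetical pick_model_ids helper to filter this list by enabled status, minimum context window, and optionally platform. It deliberately ignores intelligence_rank and speed_rank, because the page does not say whether a lower or a higher rank is better.

```python
# Sample entry copied from the /api/models.php response above.
MODELS = [
    {
        "id": 1,
        "platform": "groq",
        "model_id": "llama-3.3-70b-versatile",
        "display_name": "Llama 3.3 70B",
        "intelligence_rank": 9,
        "speed_rank": 2,
        "size_label": "Medium",
        "rpm_limit": 30,
        "tpm_limit": 6000,
        "context_window": 131072,
        "enabled": True,
    }
]


def pick_model_ids(models, min_context=0, platform=None):
    """Return the database ids of enabled models that meet the requirements.

    Hypothetical helper; the returned ids are assumed to be valid values
    for the chat endpoint's model_id parameter.
    """
    chosen = []
    for m in models:
        if not m.get("enabled", False):
            continue
        if m.get("context_window", 0) < min_context:
            continue
        if platform is not None and m.get("platform") != platform:
            continue
        chosen.append(m["id"])
    return chosen


print(pick_model_ids(MODELS, min_context=100_000))  # [1]
```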
📊 Get Statistics
Retrieve usage statistics for monitoring and analytics.
GET /api/stats.php
```json
{
  "requests_today": 150,
  "tokens_used": 45000,
  "avg_latency": 234,
  "success_rate": 0.98,
  "active_providers": 8
}
```
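For monitoring, a client could turn this payload into simple alerts. The sketch assumes success_rate is a fraction between 0 and 1, as the sample value 0.98 suggests. The 0.95 threshold and the stats_alerts name are hypothetical choices. avg_latency is left out because its unit is not documented.

```python
STATS = {
    "requests_today": 150,
    "tokens_used": 45000,
    "avg_latency": 234,
    "success_rate": 0.98,
    "active_providers": 8,
}


def stats_alerts(stats, min_success_rate=0.95):
    """Return human-readable warnings for a /api/stats.php payload (hypothetical)."""
    alerts = []
    if stats.get("active_providers", 0) == 0:
        alerts.append("no active providers")
    rate = stats.get("success_rate", 1.0)
    if rate < min_success_rate:
        # Rough estimate: failures implied by the success rate today.
        failed = round(stats.get("requests_today", 0) * (1 - rate))
        alerts.append(f"success rate {rate:.0%} (~{failed} failed requests today)")
    return alerts


print(stats_alerts(STATS))  # []
```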
⚠️ Error Handling
The API returns standard HTTP status codes with detailed error messages.
| Status Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 429 | Rate Limited - Too many requests |
| 500 | Internal Server Error |
| 503 | Service Unavailable - No providers available |
Example error response (HTTP 429):
```json
{
  "error": "Rate limit exceeded",
  "code": 429,
  "retry_after": 60
}
```
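One way to act on these codes on the client is to retry only the transient ones and to prefer the server's retry_after hint when it is present. The transient codes are 429 and 503, plus 500 as a judgment call. This sketch assumes retry_after is given in seconds, which the page does not state. The retry_delay name, the exponential fallback, and the 60-second cap are hypothetical defaults.

```python
# Assumption: 500 is treated as retryable; 400 and 401 are not.
RETRYABLE = {429, 500, 503}


def retry_delay(status, error_body=None, attempt=0, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None  # 400/401 need a corrected request or key, not a retry
    hint = (error_body or {}).get("retry_after")
    if isinstance(hint, (int, float)):
        return float(hint)  # assumed to be seconds
    # No hint: exponential backoff (1s, 2s, 4s, ...) capped at `cap`.
    return min(cap, base * 2 ** attempt)
```

For the sample 429 body above, retry_delay(429, body) returns 60.0.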
📋 Rate Limits
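Rate limits vary by provider and model. When a limit is reached, the system automatically falls back to another provider. Limits are listed in requests per minute (RPM), tokens per minute (TPM), tokens per day (TPD), and requests per day (RPD).
Note: These are the free-tier limits imposed by each provider. Our system aggregates multiple providers to maximize your effective rate limit.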
- Groq: 30 RPM, 6,000 TPM
- Cerebras: 30 RPM, 60,000 TPM
- SambaNova: 20 RPM, 200,000 TPD
- OpenRouter: 20 RPM (varies by model)
- Mistral: 2 RPM, 500,000 TPM
- GitHub: 10 RPM, 50 RPD
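Summing the per-minute figures gives a rough ceiling on the throughput that aggregating these providers could offer. This is an upper bound only. It assumes traffic spreads perfectly across providers, treats OpenRouter's variable limit as a flat 20 RPM, and ignores the token and daily caps. Those caps bind quickly in practice. GitHub's 50 requests per day, for instance, would be used up after five minutes at its 10 RPM rate.

```python
# Free-tier requests-per-minute figures from the list above.
PROVIDER_RPM = {
    "Groq": 30,
    "Cerebras": 30,
    "SambaNova": 20,
    "OpenRouter": 20,  # varies by model; 20 is the listed figure
    "Mistral": 2,
    "GitHub": 10,
}


def combined_rpm(limits):
    """Upper bound on requests per minute if traffic is spread across all providers."""
    return sum(limits.values())


print(combined_rpm(PROVIDER_RPM))  # 112
```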
💡 Best Practices
🔄 Implement Retry Logic
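Although our system handles provider fallback automatically, you should still implement client-side retries for transient errors such as 429 and 503 responses. When the error body includes retry_after, honor it.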
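Below is a minimal sketch of such a client-side retry wrapper, using exponential backoff with full jitter. TransientError is a hypothetical exception that your HTTP layer would raise for retryable responses such as 429 or 503. The attempt count and base delay are arbitrary defaults. The sleep function is injectable, so the logic can be tested without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for a retryable failure (e.g. HTTP 429 or 503)."""


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on TransientError, back off exponentially with jitter and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Jitter spreads retries from many clients over time, so they do not all hit a rate-limited provider again at the same moment.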
📦 Use Streaming
Enable streaming for better UX, especially for long responses.
🎯 Specify Model When Needed
Use the model_id parameter when you need a specific model's capabilities. Otherwise, leave it null for auto-routing.
🔐 Secure Your Key
Never expose your unified API key in client-side code. Route browser requests through a backend proxy that attaches the key on the server.
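The core of such a proxy is small. It accepts the browser's JSON body, keeps only the documented request fields, and attaches the key on the server. The sketch below only builds the upstream request with Python's standard library. The build_upstream_request name and the upstream URL (taken from the examples above) are assumptions. The surrounding web server, authentication of your own users, and streaming pass-through are left out.

```python
import json
import urllib.request

# Assumption: upstream URL taken from the Python and cURL examples above.
UPSTREAM = "http://localhost:3001/api/chat.php"

# Request fields documented in the parameter table.
ALLOWED_FIELDS = {"messages", "model_id", "stream", "temperature", "max_tokens"}


def build_upstream_request(client_body: bytes, server_key: str) -> urllib.request.Request:
    """Forward a browser's JSON body upstream with the key added server-side.

    The browser never sees server_key. Undocumented fields are dropped.
    """
    payload = json.loads(client_body)
    forwarded = {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}
    return urllib.request.Request(
        UPSTREAM,
        data=json.dumps(forwarded).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {server_key}",
        },
        method="POST",
    )
```

Your server-side handler can then send the returned request with urllib.request.urlopen(req) and relay the response to the browser.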