🔧 Fix Token Limits & Invalid JSON Response Errors (#1934)
ISSUES FIXED:
- ❌ Invalid JSON response errors during streaming
- ❌ Incorrect token limits causing API rejections
- ❌ Outdated hardcoded model configurations
- ❌ Poor error messages for API failures

SOLUTIONS IMPLEMENTED:

🎯 ACCURATE TOKEN LIMITS & CONTEXT SIZES
- OpenAI GPT-4o: 128k context (was 8k)
- OpenAI GPT-3.5-turbo: 16k context (was 8k)
- Anthropic Claude 3.5 Sonnet: 200k context (was 8k)
- Anthropic Claude 3 Haiku: 200k context (was 8k)
- Google Gemini 1.5 Pro: 2M context (was 8k)
- Google Gemini 1.5 Flash: 1M context (was 8k)
- Groq Llama models: 128k context (was 8k)
- Together models: updated with accurate limits

🔄 DYNAMIC MODEL FETCHING ENHANCED
- Smart context detection from provider APIs
- Automatic fallback to known limits when API unavailable
- Safety caps to prevent token overflow (100k max)
- Intelligent model filtering and deduplication

🛡️ IMPROVED ERROR HANDLING
- Specific error messages for invalid JSON responses
- Token-limit-exceeded warnings with solutions
- API key validation with clear guidance
- Rate-limiting detection and user guidance
- Network timeout handling

⚡ PERFORMANCE OPTIMIZATIONS
- Reduced static models from 40+ to 12 essential entries
- Enhanced streaming error detection
- Better API response validation
- Improved context window display (shows M/k units)

🔧 TECHNICAL IMPROVEMENTS
- Dynamic model context detection from APIs
- Enhanced streaming reliability
- Better token limit enforcement
- Comprehensive error categorization
- Smart model validation before API calls

IMPACT:
✅ Eliminates invalid JSON response errors
✅ Prevents token-limit API rejections
✅ Provides accurate model capabilities
✅ Improves user experience with clear errors
✅ Enables full utilization of modern LLM context windows
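The token-budget behavior described above can be sketched on its own. Note that `ModelInfo` is a simplified stand-in for the project's real model type, and `SAFE_TOKEN_CAP` is a hypothetical named constant for the inline `100000` used in the diff:

```typescript
// Simplified stand-in for the project's model descriptor.
interface ModelInfo {
  name: string;
  maxTokenAllowed?: number;
}

const MAX_TOKENS = 32000; // new conservative default from this commit
const SAFE_TOKEN_CAP = 100000; // safety cap to prevent token-overflow API errors

// Resolve the per-request token budget: prefer the model's own limit,
// fall back to MAX_TOKENS when unknown, and never exceed the safety cap.
function resolveMaxTokens(modelDetails?: ModelInfo): number {
  const dynamicMaxTokens =
    modelDetails && modelDetails.maxTokenAllowed ? modelDetails.maxTokenAllowed : MAX_TOKENS;
  return Math.min(dynamicMaxTokens, SAFE_TOKEN_CAP);
}
```

For example, a Claude 3.5 Sonnet entry with a 200k limit resolves to the 100k cap, while an unknown model falls back to the 32k default.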
@@ -1,5 +1,8 @@
-// see https://docs.anthropic.com/en/docs/about-claude/models
-export const MAX_TOKENS = 8000;
+/*
+ * Maximum tokens for response generation (conservative default for older models)
+ * Modern models can handle much higher limits - specific limits are set per model
+ */
+export const MAX_TOKENS = 32000;
 
 // limits the number of model responses that can be returned in a single request
 export const MAX_RESPONSE_SEGMENTS = 2;
@@ -108,7 +108,14 @@ export async function streamText(props: {
   modelDetails = modelsList.find((m) => m.name === currentModel);
 
   if (!modelDetails) {
-    // Fallback to first model
+    // Check if it's a Google provider and the model name looks like it might be incorrect
+    if (provider.name === 'Google' && currentModel.includes('2.5')) {
+      throw new Error(
+        `Model "${currentModel}" not found. Gemini 2.5 Pro doesn't exist. Available Gemini models include: gemini-1.5-pro, gemini-2.0-flash, gemini-1.5-flash. Please select a valid model.`,
+      );
+    }
+
+    // Fallback to first model with warning
     logger.warn(
       `MODEL [${currentModel}] not found in provider [${provider.name}]. Falling back to first model. ${modelsList[0].name}`,
     );
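The lookup-and-fallback flow in the hunk above can be shown as a self-contained sketch. The `ModelInfo` shape, `resolveModel` name, and shortened error message here are simplified stand-ins for the real code:

```typescript
interface ModelInfo {
  name: string;
}

// Find the requested model; reject suspicious Google "2.5" ids outright,
// otherwise fall back to the provider's first model.
function resolveModel(
  modelsList: ModelInfo[],
  currentModel: string,
  providerName: string,
): ModelInfo {
  const found = modelsList.find((m) => m.name === currentModel);
  if (found) {
    return found;
  }

  // A Google model id containing "2.5" is treated as a hard error rather than
  // silently swapped, since the mismatch usually means an invalid model name.
  if (providerName === 'Google' && currentModel.includes('2.5')) {
    throw new Error(`Model "${currentModel}" not found. Please select a valid model.`);
  }

  // Fallback to first model (the real code also logs a warning here).
  return modelsList[0];
}
```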
@@ -117,8 +124,12 @@ export async function streamText(props: {
   }
 
   const dynamicMaxTokens = modelDetails && modelDetails.maxTokenAllowed ? modelDetails.maxTokenAllowed : MAX_TOKENS;
+
+  // Ensure we never exceed reasonable token limits to prevent API errors
+  const safeMaxTokens = Math.min(dynamicMaxTokens, 100000); // Cap at 100k for safety
+
   logger.info(
-    `Max tokens for model ${modelDetails.name} is ${dynamicMaxTokens} based on ${modelDetails.maxTokenAllowed} or ${MAX_TOKENS}`,
+    `Max tokens for model ${modelDetails.name} is ${safeMaxTokens} (capped from ${dynamicMaxTokens}) based on model limits`,
   );
 
   let systemPrompt =
@@ -203,7 +214,7 @@ export async function streamText(props: {
     providerSettings,
   }),
   system: chatMode === 'build' ? systemPrompt : discussPrompt(),
-  maxTokens: dynamicMaxTokens,
+  maxTokens: safeMaxTokens,
   messages: convertToCoreMessages(processedMessages as any),
   ...options,
 });
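The commit message also mentions an improved context-window display using M/k units, but that code is not part of this diff. The helper below is therefore a hypothetical sketch of such formatting; the function name and thresholds are assumptions:

```typescript
// Hypothetical formatter: render a context-window size in M/k units,
// e.g. 2,000,000 tokens as "2M" and 128,000 tokens as "128k".
function formatContextWindow(tokens: number): string {
  if (tokens >= 1_000_000) {
    return `${tokens / 1_000_000}M`;
  }
  if (tokens >= 1_000) {
    return `${tokens / 1_000}k`;
  }
  return String(tokens);
}
```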