🔧 Fix Token Limits & Invalid JSON Response Errors (#1934)

ISSUES FIXED:
- "Invalid JSON response" errors during streaming
- Incorrect token limits causing API rejections
- Outdated hardcoded model configurations
- Poor error messages for API failures

SOLUTIONS IMPLEMENTED:

🎯 ACCURATE TOKEN LIMITS & CONTEXT SIZES
- OpenAI GPT-4o: 128k context (was 8k)
- OpenAI GPT-3.5-turbo: 16k context (was 8k)
- Anthropic Claude 3.5 Sonnet: 200k context (was 8k)
- Anthropic Claude 3 Haiku: 200k context (was 8k)
- Google Gemini 1.5 Pro: 2M context (was 8k)
- Google Gemini 1.5 Flash: 1M context (was 8k)
- Groq Llama models: 128k context (was 8k)
- Together AI models: updated with accurate per-model limits
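
A minimal sketch of how these limits might map onto the static model table (the ModelInfo shape and exact model IDs below are assumptions for illustration, not necessarily the code's real definitions):

```ts
// Illustrative shape for a static model entry; field names are assumed.
interface ModelInfo {
  name: string; // provider-side model ID
  label: string; // display name
  provider: string;
  maxTokenAllowed: number; // context window in tokens
}

const STATIC_MODELS: ModelInfo[] = [
  { name: 'gpt-4o', label: 'GPT-4o', provider: 'OpenAI', maxTokenAllowed: 128000 },
  { name: 'gpt-3.5-turbo', label: 'GPT-3.5 Turbo', provider: 'OpenAI', maxTokenAllowed: 16384 },
  { name: 'claude-3-5-sonnet-latest', label: 'Claude 3.5 Sonnet', provider: 'Anthropic', maxTokenAllowed: 200000 },
  { name: 'gemini-1.5-pro', label: 'Gemini 1.5 Pro', provider: 'Google', maxTokenAllowed: 2097152 },
];
```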

🔄 DYNAMIC MODEL FETCHING ENHANCED
- Smart context detection from provider APIs
- Automatic fallback to known limits when API unavailable
- Safety caps to prevent token overflow (100k max)
- Intelligent model filtering and deduplication
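
The fallback-and-cap behavior above could look roughly like this (function and field names are assumptions; providers differ in how they report context length):

```ts
// Sketch: prefer the provider-reported context length, cap API-derived values
// at a safe ceiling, and fall back to known static limits otherwise.
const SAFE_MAX_CONTEXT = 100_000; // safety cap for dynamically fetched models

function resolveContextWindow(
  apiContextLength: number | undefined, // e.g. context_length from a /models response
  knownLimits: Record<string, number>, // hardcoded per-model fallbacks
  modelId: string,
): number {
  if (apiContextLength !== undefined) {
    // Clamp API-reported values so a mis-reported number can't overflow token math.
    return Math.min(apiContextLength, SAFE_MAX_CONTEXT);
  }

  // API unavailable or silent about limits: use the known limit, else a conservative default.
  return knownLimits[modelId] ?? 8000;
}
```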

🛡️ IMPROVED ERROR HANDLING
- Specific error messages for "Invalid JSON response" failures
- Token-limit-exceeded warnings with suggested fixes
- API key validation with clear guidance
- Rate limiting detection and user guidance
- Network timeout handling
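
A sketch of what that categorization might look like (the status codes checked and the message text are illustrative, not the exact strings shipped):

```ts
// Map a failed provider response to an actionable message.
function describeApiError(status: number | undefined, body: string): string {
  if (body.trimStart().startsWith('<')) {
    // HTML instead of JSON usually means a gateway page or a bad base URL.
    return 'Invalid JSON response: the provider returned non-JSON. Check the base URL and provider status.';
  }

  if (status === 401 || status === 403) {
    return 'API key rejected: verify the key configured for this provider.';
  }

  if (status === 429) {
    return 'Rate limited by the provider: wait a moment or switch models.';
  }

  if (/maximum context length|max_tokens/i.test(body)) {
    return 'Token limit exceeded: shorten the conversation or choose a model with a larger context window.';
  }

  return `Request failed${status ? ` (HTTP ${status})` : ''}: ${body.slice(0, 200)}`;
}
```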

⚡ PERFORMANCE OPTIMIZATIONS
- Reduced the static model list from 40+ entries to 12 essential models
- Enhanced streaming error detection
- Better API response validation
- Improved context window display (shows M/k units)
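
The M/k display could be a small formatter along these lines (name and rounding are assumptions):

```ts
// Render a context window in compact units: 2097152 -> "2.1M", 128000 -> "128k".
function formatContextWindow(tokens: number): string {
  if (tokens >= 1_000_000) {
    const millions = tokens / 1_000_000;
    return `${Number.isInteger(millions) ? millions : millions.toFixed(1)}M`;
  }

  if (tokens >= 1_000) {
    return `${Math.round(tokens / 1_000)}k`;
  }

  return String(tokens);
}
```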

🔧 TECHNICAL IMPROVEMENTS
- Dynamic model context detection from APIs
- Enhanced streaming reliability
- Better token limit enforcement
- Comprehensive error categorization
- Smart model validation before API calls
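
For instance, pre-call enforcement could clamp the requested completion budget to what the selected model can still accept (a hypothetical helper, not the code's actual API):

```ts
// Clamp the requested completion tokens to the model's remaining capacity.
function clampCompletionTokens(requested: number, maxTokenAllowed: number, promptTokens: number): number {
  const available = maxTokenAllowed - promptTokens;

  if (available <= 0) {
    throw new Error('Prompt already exceeds the model context window; trim the conversation first.');
  }

  return Math.min(requested, available);
}
```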

IMPACT:
✅ Eliminates "Invalid JSON response" errors
✅ Prevents token-limit API rejections
✅ Provides accurate model capabilities
✅ Improves the user experience with clear error messages
✅ Enables full utilization of modern LLM context windows

Author: Stijnus
Date: 2025-08-29 20:53:57 +02:00
Committed by: GitHub
Parent commit: 85ce6af7b4
Commit: b5d9055851
9 changed files with 229 additions and 126 deletions

```diff
@@ -1,5 +1,8 @@
 // see https://docs.anthropic.com/en/docs/about-claude/models
-export const MAX_TOKENS = 8000;
+/*
+ * Maximum tokens for response generation (conservative default for older models)
+ * Modern models can handle much higher limits - specific limits are set per model
+ */
+export const MAX_TOKENS = 32000;
 // limits the number of model responses that can be returned in a single request
 export const MAX_RESPONSE_SEGMENTS = 2;
```
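
The raised default is only a fallback; per the comment above, callers would typically prefer a model-specific limit when one is known, roughly:

```ts
// Hypothetical helper: prefer the per-model limit, fall back to the default above.
function maxTokensFor(model?: { maxTokenAllowed: number }): number {
  return model?.maxTokenAllowed ?? MAX_TOKENS;
}
```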