Overview
The iCendant Speech API converts text messages into emotionally nuanced, natural-sounding speech. The API processes text through an AI system that adds emotional context, pacing, and natural pauses before generating high-quality audio output.Endpoint
Authentication
Both the API key and Account ID are required in headers for all requests:Basic Usage
Notes
- The
async
parameter can be set to “stream”, “await”, or “poll”. See the Processing Modes section below for detailed information on when to use each option. - The voice_id can be set to any of the available voices. The default is “ignacio”, but you can choose from over 500 voices. See the Speech dashboard for a full list of available voices with sample audio. The id is just the name in lower case.
- If you send a message history only the last 4 items in history impact the emotional processing. This allows the AI to maintain context without overwhelming it with too much information. Hence, don’t send more than 4, you are wasting bandwidth!
Request Structure
Required Fields
- messages (array): Array of message objects containing the text to convert to speech
- voice_options (object): Voice configuration options (required)
Message Object
- role (string): Must be “user” or “assistant”
- content (array): Array of content objects
Content Object
- type (string): Must be “text”
- text (string): The text content to convert to speech
Optional Fields
voice_options (required)
- voice_id (string): Voice identifier (default: “ignacio”)
- async (string): Processing mode - “await”, “poll”, or “stream” (default: “await”)
- persona (string): Optional persona context to influence emotional processing
parameters
- temperature (number): Controls randomness in AI processing (0.0-1.0, default: 0.7)
Processing Modes
Await Mode (Default)
Returns complete audio data in the response. Pros:- Simple to implement
- Takes 2-20 seconds depending on content size
- Client must wait for entire processing to complete
- Not suitable for real-time applications
Successful Response (Await Mode)
Stream Mode (Recommended)
Returns audio data as a streaming response that typically begins within ~2 seconds. Pros:- Responsive
- Best user experience for real-time applications
- Almost immediate audio playback possible
- Lowest perceived latency
- Requires specialized client-side coding to handle streaming audio
- More complex implementation
Successful Response (Stream Mode)
Poll Mode
Returns a URL to poll for the completed audio. Pros:- Extremely fast initial response time
- Non-blocking for client applications
- Useful when user may not need the audio
- Requires additional polling requests
- More complex implementation
- Best suited for shorter content
- Poll mode results are only available for 60 minutes even though charges will have accrued
Successful Response (Poll Mode)
Advanced Features
Persona Context
Thepersona
parameter provides additional context that influences emotional processing. The AI prioritizes persona-based emotional cues over the literal content of the text, understanding that humans don’t always say what they mean.
Emotional Intelligence
The API automatically analyzes text and adds appropriate emotional context:- Meditation/Breathing: Uses calm, relaxed, warm, or sad emotions
- General Content: Can use angry, cheerful, surprised, assertive, energetic, direct, fearful, or bright emotions
Prosodic Control
The system automatically varies:- Rate: Speech speed (-15% to +8%)
- Volume: Loudness (-25% to +25%)
- Pauses: Context-aware breaks at punctuation and natural pause points
Natural Punctuation Handling
- Commas: 0.25s pause
- Colons/Semicolons: 0.5s pause
- Periods: 0.75s pause
- Ellipses: 1.0s pause
Validation Mode
Test your request format without generating audio:Error Handling
Common Error Responses
Invalid JSON
Missing Messages
Authentication Error
Response Headers, Rate Limiting & Credits
The API uses a credit-based system where usage is calculated based on character count:- Cost: 0.008 credits per 1,000 characters
- Credit balance is tracked per account
- Balance information is returned in response headers
- 1,000 characters cost 0.008 USD
- credit balance of 50,000 credits is worth $50 USD
Headers
x-icendant-account-credit-balance: ACCOUNT_CREDIT_BALANCE x-icendant-account-id: YOUR_ACCOUNT_ID x-icendant-response-id: UUID of the response x-icendant-voice-character-count: CHARACTER_COUNTBest Practices
- Text Length: Keep individual messages reasonably sized for optimal processing
- Error Handling: Always check for error responses and handle them appropriately
- Validation: Use
validate_only: true
to test request format during development - Voice Selection: Choose appropriate voice_id for your use case
- Processing Mode: Use “stream” for real-time applications, “await” for simple use cases, “poll” for long-running processes