What is SMS Encoding?
Every SMS is encoded before transmission. The encoding determines how many characters fit in a single SMS segment (160 for GSM-7, 70 for UCS-2) and therefore how many segments a message needs, which directly affects cost. There are two primary encodings: GSM-7 and UCS-2.
GSM-7 Encoding
GSM-7 (also called GSM 03.38) is the default encoding for SMS. It uses 7 bits per character, allowing 128 basic characters plus a 10-character extended table. A single GSM-7 SMS segment holds 160 characters. For multipart messages, each segment holds 153 characters, because the 6-byte User Data Header that links the segments together takes up the space of 7 characters.
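The segment limits follow from the 140-byte SMS payload. The sketch below works through that arithmetic, assuming the common 6-byte concatenation header; the `capacity` function and constant names are illustrative and not part of any API.

```python
PAYLOAD_BYTES = 140  # user-data size of a single SMS
UDH_BYTES = 6        # assumed concatenation header size in multipart messages


def capacity(bits_per_char: int, multipart: bool) -> int:
    """Return how many characters fit in one segment."""
    header = UDH_BYTES if multipart else 0
    usable_bits = (PAYLOAD_BYTES - header) * 8
    # Only whole characters fit, so round down.
    return usable_bits // bits_per_char


gsm7_single = capacity(7, multipart=False)
gsm7_multi = capacity(7, multipart=True)
ucs2_single = capacity(16, multipart=False)
ucs2_multi = capacity(16, multipart=True)
```

Under these assumptions the four values come out as 160, 153, 70, and 67, matching the limits quoted in this guide.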
The GSM-7 character set includes: uppercase and lowercase Latin letters (A–Z, a–z), digits (0–9), common punctuation, and some special characters like @, £, $, ¥, è, é, ù, ì, ò, Ç, Ø, ø, Å, å, Δ, _, Φ, Γ, Λ, Ω, Π, Ψ, Σ, Θ, Ξ.
Extended GSM-7 characters (each is sent as an escape code followed by the character, so it counts as 2 characters): [, ], {, }, \, ^, ~, |, €.
Using any character NOT in the GSM-7 set will trigger UCS-2 encoding for the entire message.
| Property | GSM-7 Single | GSM-7 Multipart (each) |
|---|---|---|
| Bits per char | 7 | 7 |
| Max chars/segment | 160 | 153 |
| Typical use | Latin text | Long Latin messages |
| Cost | 1 segment | N segments |
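As a rough illustration of the rules above, the following sketch checks whether a message fits GSM-7 and counts how many 7-bit slots it uses, with extended characters counted twice. The function names are hypothetical, and the character table is transcribed from the GSM 03.38 default alphabet rather than taken from the API.

```python
# GSM 03.38 basic character set (the ESC code itself is not a printable character).
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extended table: each character is sent as ESC + code, so it costs 2 slots.
GSM7_EXTENDED = frozenset("\f^{}\\[~]|€")


def gsm7_septets(text: str):
    """Return the number of 7-bit slots used, or None if UCS-2 is required."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2
        else:
            return None
    return total


def non_gsm7_chars(text: str):
    """List the distinct characters that would force UCS-2, sorted by code point."""
    return sorted({ch for ch in text
                   if ch not in GSM7_BASIC and ch not in GSM7_EXTENDED})
```

For example, `gsm7_septets("Price: 5€")` counts 8 basic characters plus 2 slots for the euro sign, while any text containing an emoji returns `None`.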
UCS-2 Encoding
UCS-2 (Universal Character Set, 2 bytes per character) covers the Unicode Basic Multilingual Plane. This allows sending Arabic, Chinese, Japanese, Korean, Cyrillic, Greek, Hindi (Devanagari), and thousands of other characters. Most emoji lie outside that plane; in practice they are sent as UTF-16 surrogate pairs and count as 2 characters each.
The tradeoff: 2 bytes per character means only 70 characters per segment (or 67 per segment in multipart messages). That is less than half the GSM-7 capacity, so it more than halves the characters-per-pound efficiency, and long messages need roughly 2.3 times as many segments for equivalent content.
UCS-2 is triggered automatically by our API when your message content contains any non-GSM-7 character. You don't need to specify encoding — it's detected and applied transparently.
| Property | UCS-2 Single | UCS-2 Multipart (each) |
|---|---|---|
| Bits per char | 16 | 16 |
| Max chars/segment | 70 | 67 |
| Typical use | Arabic, Chinese, emoji | Long Unicode messages |
| Cost vs GSM-7 | ~2.3x segments | ~2.3x segments |
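To make the segment arithmetic concrete, here is a minimal sketch that estimates segment counts from the limits in the tables above. The helper names are hypothetical. It assumes UCS-2 content is counted in UTF-16 code units, and it ignores the edge case where a segment boundary would split a surrogate pair or escape sequence, which can push the real count slightly higher.

```python
import math


def ucs2_units(text: str) -> int:
    """Count 16-bit units; characters outside the BMP (most emoji) take two."""
    return len(text.encode("utf-16-be")) // 2


def segments_needed(units: int, single_limit: int, multipart_limit: int) -> int:
    """One segment if the message fits whole, otherwise split at the multipart limit."""
    if units <= single_limit:
        return 1
    return math.ceil(units / multipart_limit)


def gsm7_segments(septets: int) -> int:
    return segments_needed(septets, 160, 153)


def ucs2_segments(text: str) -> int:
    return segments_needed(ucs2_units(text), 70, 67)
```

A 300-character Latin message needs `gsm7_segments(300)`, which is 2 segments, while 300 Cyrillic characters need `ucs2_segments("я" * 300)`, which is 5.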
Common Characters That Trigger UCS-2
These characters look harmless but are NOT in GSM-7 and will force UCS-2 encoding for the whole message, which can more than double your segment count and cost:
| Character | Description | Alternative |
|---|---|---|
| “ ” | Smart/curly double quotes | " (straight) |
| ‘ ’ | Smart/curly single quotes | ' (straight apostrophe) |
| – — | En dash / Em dash | - (hyphen) |
| … | Ellipsis character | ... (three dots) |
| • · | Bullet points | * or - |
| ™ © ® | Trademark, copyright, and registered symbols | (TM), (C), (R) |
| Any emoji 🎉 | All emojis | Remove or replace with plain text |
| Arabic, Chinese, etc. | Non-Latin scripts | No alternative |
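The substitutions in the table can be applied automatically before sending. The sketch below uses Python's built-in `str.translate`; the `to_gsm7_friendly` name and the exact mapping are illustrative choices based on the table, not a feature of the API.

```python
# Map common UCS-2 triggers to the GSM-7 alternatives from the table above.
REPLACEMENTS = str.maketrans({
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
    "\u2022": "*",    # bullet
    "\u00b7": "*",    # middle dot
    "\u2122": "(TM)",
    "\u00a9": "(C)",
    "\u00ae": "(R)",
})


def to_gsm7_friendly(text: str) -> str:
    """Replace typographic characters with GSM-7 equivalents."""
    return text.translate(REPLACEMENTS)
```

Some replacements are longer than the character they replace (for example, the ellipsis becomes three dots), so it is worth re-counting the message length after cleaning. Emoji and non-Latin scripts are left untouched and will still trigger UCS-2.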
How Our API Handles Encoding
The BulkSMSRates API automatically detects the required encoding and reports it in the API response. You can also request a specific encoding.
POST /v1/send
{
  "to": "+447700900000",
  "from": "MyBrand",
  "body": "Hello! Your order is ready 🎉"
}

// Response:
{
  "data": {
    "message_id": "msg_abc123",
    "encoding": "UCS-2",     // 🎉 triggered UCS-2
    "segments": 1,
    "body_length": 29,       // 27 characters + 🎉 (2 UTF-16 code units)
    "chars_remaining": 41,   // 70 - 29
    "cost": 0.0300,
    "currency": "GBP"
  }
}

Best Practices to Minimise Segment Count
1. Avoid smart quotes: Configure your CMS or word processor to use straight quotes.
2. Avoid em dashes: Use a hyphen instead of — (em dash).
3. Avoid emoji in cost-sensitive campaigns: If you must use emoji, know that a single one cuts your capacity from 160 to 70 characters per segment.
4. Test before sending: Use our character counter tool to preview segment count.
5. Use message templates: Pre-validated templates ensure you never accidentally include characters that trigger UCS-2.
6. Review copy-pasted text: Text copied from Word, Google Docs, or websites often contains smart quotes and other UCS-2 triggers.