Technical · 8 min read · Updated 2025-12-01

SMS Encoding: GSM-7 vs UCS-2 Explained

Learn how GSM-7 and UCS-2 encoding affect your SMS character limits and segment costs, and which characters trigger UCS-2 mode automatically.

What is SMS Encoding?

Every SMS must be encoded before transmission. The encoding determines how many characters fit in a single SMS segment (160 or 70 characters) and therefore how many segments are needed — which directly affects cost. There are two primary encodings: GSM-7 and UCS-2.

GSM-7 Encoding

GSM-7 (defined in GSM 03.38) is the default encoding for SMS. It uses 7 bits per character, which gives 128 basic characters plus a 10-character extension table. A single GSM-7 SMS segment holds 160 characters. In a multipart message each segment holds 153 characters, because the User Data Header that links the parts together takes up the space of 7 characters.

The GSM-7 character set includes uppercase and lowercase Latin letters (A–Z, a–z), digits (0–9), common punctuation, and some special characters such as @, £, $, ¥, è, é, ù, ì, ò, Ç, Ø, ø, Å, å, Δ, _, Φ, Γ, Λ, Ω, Π, Ψ, Σ, Θ, Ξ.

Extended GSM-7 characters are sent with an escape prefix, so each one counts as 2 characters: [, ], {, }, \, ^, ~, |, €.

Using any character NOT in the GSM-7 set triggers UCS-2 encoding for the entire message.

| Property | GSM-7 Single | GSM-7 Multipart (each) |
| --- | --- | --- |
| Bits per char | 7 | 7 |
| Max chars/segment | 160 | 153 |
| Typical use | Latin text | Long Latin messages |
| Cost | 1 segment | N segments |
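
The counting rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the API's own implementation: the names `gsm7_septets` and `gsm7_segments` are hypothetical, and the character tables are transcribed from the GSM 03.38 default alphabet and its extension table.

```python
import math

# GSM 03.38 default alphabet (the escape code itself is omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as escape + code, so it costs 2.
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def gsm7_septets(text):
    """Return the GSM-7 length of text, or None if it is not GSM-7 encodable."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2
        else:
            return None  # this character would force UCS-2
    return total


def gsm7_segments(text):
    """Return the number of GSM-7 segments, or None if UCS-2 is required."""
    septets = gsm7_septets(text)
    if septets is None:
        return None
    if septets <= 160:
        return 1
    # Multipart: each segment carries 153 characters after the header.
    return math.ceil(septets / 153)
```

One simplification to be aware of: an escaped character cannot be split across two segments, so a real encoder may occasionally need one more segment than this estimate when such a character lands exactly on a boundary.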

UCS-2 Encoding

UCS-2 (Universal Character Set, 2 bytes per character) covers the Unicode Basic Multilingual Plane (BMP). This allows sending Arabic, Chinese, Japanese, Korean, Cyrillic, Greek, Hindi (Devanagari), and thousands of other characters. Most emoji sit outside the BMP and are in practice carried as UTF-16 surrogate pairs, so a single emoji usually takes up two or more character slots.

The tradeoff: 2 bytes per character means only 70 characters per segment (or 67 per segment in multipart messages). Compared with GSM-7, that more than halves the characters you get per segment, and per pound spent, and increases the number of segments needed for equivalent content by roughly 2.3x.

UCS-2 is triggered automatically by our API when your message content contains any non-GSM-7 character. You don't need to specify the encoding; it is detected and applied transparently.

| Property | UCS-2 Single | UCS-2 Multipart (each) |
| --- | --- | --- |
| Bits per char | 16 | 16 |
| Max chars/segment | 70 | 67 |
| Typical use | Arabic, Chinese, emoji | Long Unicode messages |
| Cost vs GSM-7 | ~2.3x segments | ~2.3x segments |
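
To make the 70 and 67 limits concrete, here is a minimal Python sketch that estimates the segment count of a message sent as UCS-2. It assumes length is measured in 16-bit UTF-16 code units, which is the common convention for SMS; whether the API counts in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def utf16_units(text):
    """Count 16-bit code units; characters outside the BMP count as 2."""
    return len(text.encode("utf-16-be")) // 2


def ucs2_segments(text):
    """Estimate how many segments text needs when sent as UCS-2."""
    units = utf16_units(text)
    if units <= 70:
        return 1
    # Multipart: each segment carries 67 units after the header.
    return math.ceil(units / 67)
```

For example, 160 Cyrillic letters fit in 3 UCS-2 segments (160 / 67, rounded up), whereas 160 Latin letters fit in a single GSM-7 segment. The sketch ignores the edge case where a surrogate pair would straddle a segment boundary.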

Common Characters That Trigger UCS-2

These characters look harmless but are NOT in GSM-7. A single one forces UCS-2 encoding for the whole message, which can more than double your segment count and cost:

| Character | Description | Alternative |
| --- | --- | --- |
| “ ” | Smart/curly double quotes | " (straight) |
| ‘ ’ | Smart/curly single quotes | ' (straight apostrophe) |
| – — | En dash / Em dash | - (hyphen) |
| … | Ellipsis character | ... (three dots) |
| • · | Bullet points | * or - |
| ™ © ® | Trademark, copyright, and registered symbols | (TM), (C), (R) |
| Any emoji 🎉 | All emojis | Remove or replace with GSM-7 text |
| Arabic, Chinese, etc. | Non-Latin scripts | No alternative |
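
The substitutions in the table are easy to automate before a message is submitted. The following Python sketch applies them with `str.translate`; the mapping mirrors the table above, and the function name `to_gsm7_friendly` is hypothetical rather than part of any SDK.

```python
# Map each UCS-2 trigger from the table to its GSM-7 alternative.
REPLACEMENTS = {
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
    "\u2022": "*",    # bullet
    "\u00b7": "*",    # middle dot
    "\u2122": "(TM)",
    "\u00a9": "(C)",
    "\u00ae": "(R)",
}
_TABLE = str.maketrans(REPLACEMENTS)


def to_gsm7_friendly(text):
    """Replace common UCS-2 triggers with their GSM-7 alternatives."""
    return text.translate(_TABLE)
```

Some replacements make the text slightly longer (the ellipsis becomes three characters, the trademark sign four), so it is worth re-checking the length afterwards. Emoji and non-Latin scripts are deliberately left untouched, since the table offers no automatic alternative for them.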

How Our API Handles Encoding

The BulkSMSRates API automatically detects the required encoding and reports it back in the API response. You can also request a specific encoding.
POST /v1/send
{
  "to": "+447700900000",
  "from": "MyBrand",
  "body": "Hello! Your order is ready 🎉"
}

// Response:
{
  "data": {
    "message_id": "msg_abc123",
    "encoding": "UCS-2",     // 🎉 triggered UCS-2
    "segments": 1,
    "body_length": 29,       // 27 characters + 2 UTF-16 units for 🎉
    "chars_remaining": 41,
    "cost": 0.0300,
    "currency": "GBP"
  }
}
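
Because the encoding is reported back in the response, a client can flag messages that unexpectedly went out as UCS-2. The Python sketch below assumes the response body is plain JSON shaped like the sample above (without the inline comment, which is not valid JSON) and that the `encoding` field reads "UCS-2" in that case. The function name `encoding_warning` is hypothetical.

```python
import json


def encoding_warning(response_body):
    """Return a warning string if the message was sent as UCS-2, else None."""
    data = json.loads(response_body)["data"]
    if data["encoding"] == "UCS-2":
        return (
            f"Message {data['message_id']} was sent as UCS-2 "
            f"in {data['segments']} segment(s)"
        )
    return None
```

Logging this warning during testing is one way to catch a stray smart quote or emoji in a template before it reaches a large campaign.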

Best Practices to Minimise Segment Count

1. Avoid smart quotes: Configure your CMS or word processor to use straight quotes.
2. Avoid em dashes: Use a hyphen instead of — (em dash).
3. Avoid emoji in cost-sensitive campaigns: If you must use emoji, know that it will more than halve your chars/segment (from 160 to 70).
4. Test before sending: Use our character counter tool to preview segment count.
5. Use message templates: Pre-validated templates ensure you never accidentally include UCS-2 characters.
6. Review copy-pasted text: Text copied from Word, Google Docs, or websites often contains smart quotes and other UCS-2 triggers.
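
When reviewing templates or copy-pasted text, it helps to see exactly which characters would trigger UCS-2 and where they are. The Python sketch below is one way to do that. It is not the character counter tool mentioned above; the function name is hypothetical, and the character set is transcribed from the GSM 03.38 default alphabet plus its extension table.

```python
import unicodedata

# GSM 03.38 default alphabet plus the extension table.
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€\f"
)


def find_ucs2_triggers(text):
    """List (index, character, Unicode name) for every non GSM-7 character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch not in GSM7_CHARS
    ]
```

An empty list means the text can be sent as GSM-7. Otherwise each entry points at a character to replace or remove, and the Unicode name makes look-alikes such as a curly apostrophe easy to spot.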
