SMS Encoding: GSM-7 vs UCS-2 Explained

What is SMS Encoding?

Every SMS must be encoded before transmission. The encoding determines how many characters fit in a single SMS segment (160 for GSM-7, 70 for UCS-2) and therefore how many segments are needed, which directly affects cost. There are two primary encodings: GSM-7 and UCS-2.

GSM-7 Encoding

GSM-7 (also called GSM 03.38) is the default encoding for SMS. It uses 7 bits per character, giving a basic table of 128 characters plus an extension table of 10 escaped characters. A single GSM-7 SMS segment holds 160 characters. In a multipart (concatenated) message, each segment holds 153 characters, because the User Data Header that links the parts takes up the space of 7 characters.

The GSM-7 character set includes uppercase and lowercase Latin letters (A–Z, a–z), digits (0–9), common punctuation, and special characters such as @, £, $, ¥, è, é, ù, ì, ò, Ç, Ø, ø, Å, å, _, and the Greek capitals Δ, Φ, Γ, Λ, Ω, Π, Ψ, Σ, Θ, Ξ.

Extended GSM-7 characters are sent with an escape prefix, so each counts as 2 characters: [, ], {, }, \, ^, ~, |, €.

Using any character that is NOT in the GSM-7 set will trigger UCS-2 encoding for the entire message.

Property	GSM-7 Single	GSM-7 Multipart (each)
Bits per char	7	7
Max chars/segment	160	153
Typical use	Latin text	Long Latin messages
Cost	1 segment	N segments
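
The counting rules above can be sketched in a few lines of Python. This is a minimal illustration, not our API's implementation: the function names are hypothetical, and the character tables are typed out from the GSM 03.38 default alphabet. It counts escaped characters as 2 and never splits an escape pair across two segments.

```python
# GSM 03.38 basic table. The ESC code point is left out because it only
# introduces the extension table and is never typed as a character.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is sent as ESC + character (2 septets).
GSM7_EXTENDED = set("\f^{}\\[~]|€")


def gsm7_septets(text):
    """Return the GSM-7 length of text, or None if it needs UCS-2."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2
        else:
            return None
    return total


def gsm7_segments(text):
    """Return the number of GSM-7 segments, or None if text needs UCS-2."""
    septets = gsm7_septets(text)
    if septets is None:
        return None
    if septets <= 160:
        return 1  # fits in a single segment, no User Data Header needed
    segments, used = 1, 0
    for ch in text:
        width = 2 if ch in GSM7_EXTENDED else 1
        if used + width > 153:  # start a new part rather than split an escape pair
            segments += 1
            used = 0
        used += width
    return segments
```

For example, `gsm7_septets("€5")` returns 3 because € counts double, and a plain 161-character message needs 2 segments.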

UCS-2 Encoding

UCS-2 (2-byte Universal Character Set) uses 16 bits per character and covers the Unicode Basic Multilingual Plane. This allows sending Arabic, Chinese, Japanese, Korean, Cyrillic, Greek, Hindi (Devanagari), and thousands of other characters. Most emoji sit outside the Basic Multilingual Plane; in practice they are sent as UTF-16 surrogate pairs and count as 2 characters each.

The tradeoff: 2 bytes per character means only 70 characters per segment (or 67 per segment in multipart messages). That is less than half the GSM-7 capacity, so the same content needs roughly 2.3 times as many segments.

UCS-2 is triggered automatically by our API when your message content contains any non-GSM-7 character. You don't need to specify the encoding; it is detected and applied transparently.

Property	UCS-2 Single	UCS-2 Multipart (each)
Bits per char	16	16
Max chars/segment	70	67
Typical use	Arabic, Chinese, emoji	Long Unicode messages
Cost vs GSM-7	~2.3x segments	~2.3x segments
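
As a rough sketch of the UCS-2 arithmetic, the Python below counts 16-bit code units and applies the 70/67 limits from the table. The function names are hypothetical. It assumes characters outside the Basic Multilingual Plane are counted as two UTF-16 code units, and it simplifies by not checking whether a surrogate pair would straddle a segment boundary.

```python
def ucs2_units(text):
    """Number of 16-bit code units when text is encoded as UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def ucs2_segments(text):
    """Segments needed under the 70 (single) / 67 (multipart) limits."""
    units = ucs2_units(text)
    if units <= 70:
        return 1
    return -(-units // 67)  # ceiling division
```

Under these assumptions, `ucs2_units("🎉")` is 2, and a 71-character Cyrillic message already needs 2 segments.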

Common Characters That Trigger UCS-2

These characters look harmless but are NOT in GSM-7 and will force UCS-2 encoding for the whole message, which can more than double your segment count and cost:

Character	Description	Alternative
“ ”	Smart/curly double quotes	" (straight)
‘ ’	Smart/curly single quotes	' (straight apostrophe)
– —	En dash / Em dash	- (hyphen)
…	Ellipsis character	... (three dots)
• ·	Bullet points	* or -
™ © ®	Trademark symbols	(TM), (C), (R)
Any emoji 🎉	All emojis	Remove or replace with GSM-7 text
Arabic, Chinese, etc.	Non-Latin scripts	No alternative
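
The substitutions in the table can be applied automatically before sending. The sketch below is one possible approach using Python's built-in `str.translate`; the mapping and the function name are illustrative assumptions, and it picks `*` for bullets where the table allows either `*` or `-`. Note that some replacements are longer than the original character, so the message may grow slightly.

```python
# Map each UCS-2 trigger from the table to a GSM-7 alternative.
GSM7_REPLACEMENTS = {
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
    "\u2022": "*",    # bullet
    "\u00b7": "*",    # middle dot
    "\u2122": "(TM)",
    "\u00a9": "(C)",
    "\u00ae": "(R)",
}
_TRANSLATION_TABLE = str.maketrans(GSM7_REPLACEMENTS)


def to_gsm7_friendly(text):
    """Replace common UCS-2 triggers with their GSM-7 alternatives."""
    return text.translate(_TRANSLATION_TABLE)
```

Emoji and non-Latin scripts are left untouched, since the table offers no safe automatic replacement for them.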

How Our API Handles Encoding

The BulkSMSRates API automatically detects the required encoding and reports it back in the API response. You can also request a specific encoding.

POST /v1/send
{
  "to": "+447700900000",
  "from": "MyBrand",
  "body": "Hello! Your order is ready 🎉"
}

// Response:
{
  "data": {
    "message_id": "msg_abc123",
    "encoding": "UCS-2",     // 🎉 triggered UCS-2
    "segments": 1,
    "body_length": 29,
    "chars_remaining": 41,
    "cost": 0.0300,
    "currency": "GBP"
  }
}
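
A client-side sketch of this exchange in Python, using only the standard library, might look like the following. The `/v1/send` path and the JSON field names come from the example above; the base URL, any authentication headers, and both function names are placeholders and assumptions, so check them against your account settings before use. The request is only built here, not sent.

```python
import json
import urllib.request


def build_send_request(base_url, to, sender, body, headers=None):
    """Build (but do not send) a POST /v1/send request.

    base_url and headers are assumptions supplied by the caller;
    this document does not specify the host or the auth scheme.
    """
    payload = {"to": to, "from": sender, "body": body}
    # ensure_ascii=False keeps emoji and non-Latin text readable in the body.
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    all_headers = {"Content-Type": "application/json; charset=utf-8"}
    all_headers.update(headers or {})
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/send",
        data=data,
        headers=all_headers,
        method="POST",
    )


def encoding_summary(raw_json):
    """Extract (encoding, segments) from a response shaped like the sample."""
    data = json.loads(raw_json)["data"]
    return data["encoding"], data["segments"]
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Checking `encoding` and `segments` in each response is a cheap way to notice when a message has unexpectedly switched to UCS-2.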

Best Practices to Minimise Segment Count

1. Avoid smart quotes: Configure your CMS or word processor to use straight quotes.
2. Avoid em dashes: Use a hyphen instead of — (em dash).
3. Avoid emoji in cost-sensitive campaigns: If you must use emoji, know that it cuts your capacity from 160 to 70 characters per segment.
4. Test before sending: Use our character counter tool to preview segment count.
5. Use message templates: Pre-validated templates ensure you never accidentally include characters that trigger UCS-2.
6. Review copy-pasted text: Text copied from Word, Google Docs, or websites often contains smart quotes and other UCS-2 triggers.
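
For teams that want to check copy or templates in their own pipeline, a pre-send check could be sketched as below. This is an illustrative assumption, not the character counter tool mentioned above: the function name is hypothetical and the character set is typed out from the GSM 03.38 tables. It reports every character that would force UCS-2, with its position and Unicode name.

```python
import unicodedata

# GSM 03.38 basic table plus the escaped extension characters.
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "\f^{}\\[~]|€"
)


def find_ucs2_triggers(text):
    """Return (index, character, Unicode name) for each non-GSM-7 character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch not in GSM7_CHARS
    ]
```

An empty list means the text can be sent as GSM-7. Anything else points to the exact character to fix, which is especially useful for pasted text where a curly apostrophe is hard to spot by eye.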
