A Guide to Diagnosing Character Display Problems (2025)

When Bad Things Happen to Good Characters

Get to Know a Character

It can be useful to know your characters, but it is more practically useful to know one character well. My character is an "e" with an acute accent (é), character code 233 decimal or E9 hex in both Latin-1 and Unicode.
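The numbers behind this character can be checked with a few lines of Python. This is only a small sketch; the variable names are illustrative and not part of the original guide.

```python
# The character used throughout this guide: e with an acute accent.
E_ACUTE = "\u00e9"

code_point = ord(E_ACUTE)                 # Unicode code point
latin1_bytes = E_ACUTE.encode("latin-1")  # one byte, same value as the code point
utf8_bytes = E_ACUTE.encode("utf-8")      # two bytes in UTF-8

print(code_point, hex(code_point))  # 233 0xe9
print(latin1_bytes.hex())           # e9
print(utf8_bytes.hex())             # c3a9
```

The code point is the same in Latin-1 and Unicode, but the bytes that end up in a file differ between Latin-1 and UTF-8, which is where the trouble described below starts.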

Inserting Characters

There are many ways it can be inserted into a document:
  • On Windows, I hold down the Alt key, type 0233 on the numeric keypad, and release the Alt key. I could use the charmap program, too. Or I could copy and paste it (e.g., é). But entering the code directly is risky because, if the character encoding changes, e.g., from Latin-1 to UTF-8, then the meaning of code 233 changes.
  • In an HTML document, I can enter these magical incantations, which are displayed correctly regardless of encoding:
    • &#233; (decimal) ⇒ é
    • &#xE9; (hex) ⇒ é
    • &eacute; (mnemonic) ⇒ é
    Note: HTML/XHTML validation programs might not be acquainted with these and complain.
  • In Microsoft Word, I type an accent code followed by the letter to be accented. On Windows, Ctrl+quote, then 'e'. On Mac, Option+quote, then 'e'. Accent codes include: grave=backquote, acute=quote, circumflex=hat, umlaut=colon, cedilla=comma, tilde=tilde, slash=slash, and perhaps others.
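The three HTML character references for é (decimal, hex, and mnemonic) can be checked outside a browser. The sketch below uses `html.unescape` from the Python standard library; the list name `REFERENCES` is just an illustrative choice.

```python
from html import unescape

# Decimal, hexadecimal, and named character references for e-acute.
REFERENCES = ["&#233;", "&#xE9;", "&eacute;"]

# unescape() turns each reference into the character it stands for.
decoded = [unescape(ref) for ref in REFERENCES]

for ref, char in zip(REFERENCES, decoded):
    # Print the code point in hex rather than the character itself,
    # so the output does not depend on the terminal's encoding.
    print(ref, "->", hex(ord(char)))
```

All three lines report 0xe9, because the references are plain ASCII and name the character rather than carrying its bytes.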

What Could Possibly Go Wrong?

If é is UTF-8 encoded, but displayed without decoding, it looks like this:

Ã©
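This symptom is easy to reproduce. The following is a minimal sketch, assuming that "displayed without decoding" means the UTF-8 bytes are read as Latin-1; the function name `view_utf8_as_latin1` is made up for illustration.

```python
def view_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread the resulting bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


garbled = view_utf8_as_latin1("\u00e9")  # start from e-acute

# Two characters come back: U+00C3 (A with tilde) and U+00A9 (copyright sign).
print([hex(ord(ch)) for ch in garbled])  # ['0xc3', '0xa9']
```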

The first 128 characters in the Latin-1 character set (the same as ASCII) are simply represented as themselves in UTF-8. The second half of the Latin-1 characters is split in two. The first half of the non-ASCII Latin-1 characters (codes 128 to 191) are represented by themselves, preceded by code 194 decimal or C2 hex, so the UTF-8 encoding for character code 191 (decimal), ¿, viewed as Latin-1, is

Â¿
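The two-byte rule for Latin-1 characters can be written out by hand and compared with Python's own UTF-8 codec. This is a sketch only: `latin1_char_to_utf8` is a hypothetical helper, and it covers both halves of the non-ASCII range (lead byte C2 for codes 128 to 191, lead byte C3 with a second byte 64 lower for codes 192 to 255).

```python
def latin1_char_to_utf8(code: int) -> bytes:
    """Hand-rolled UTF-8 encoding for a Latin-1 character code (0..255)."""
    if not 0 <= code <= 255:
        raise ValueError("not a Latin-1 character code")
    if code < 128:
        return bytes([code])          # ASCII: represented as itself
    if code < 192:
        return bytes([0xC2, code])    # 128..191: C2, then the same value
    return bytes([0xC3, code - 64])   # 192..255: C3, then the value minus 64


# The hand-written rule agrees with the built-in codec for every Latin-1 code.
assert all(latin1_char_to_utf8(c) == chr(c).encode("utf-8") for c in range(256))

print(latin1_char_to_utf8(191).hex())  # c2bf
print(latin1_char_to_utf8(233).hex())  # c3a9
```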

The second half of the non-ASCII Latin-1 characters (codes 192 to 255) are represented by a different character, the one 64 positions lower, preceded by code 195 decimal or C3 hex. That is why é (233) shows up as Ã followed by © (169). So, when looking at UTF-8 encodings of Latin-1 characters, if you see Â or Ã where you do not expect it, there are probably too many UTF-8 encodings. Multiple extra encodings have a pattern to them:

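0 é
1 Ã©
2 ÃÂ©
3 ÃÂÃÂ©
4 ÃÂÃÂÃÂÃÂ©
5 you get the idea

Note: From level 2 on, the result also contains control characters (codes 128 to 159) that have no visible form in Latin-1; they are omitted from the list above.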
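The pattern of repeated extra encodings can be generated rather than memorized. The sketch below assumes each extra round means "encode as UTF-8, then misread as Latin-1"; `over_encode` and `visible` are hypothetical helper names. Because every non-ASCII Latin-1 character becomes two bytes, the string doubles in length with each round.

```python
def over_encode(text: str, times: int) -> str:
    """Apply 'encode as UTF-8, misread as Latin-1' the given number of times."""
    for _ in range(times):
        text = text.encode("utf-8").decode("latin-1")
    return text


def visible(text: str) -> str:
    """Drop control characters U+0080..U+009F, which have no visible form."""
    return "".join(ch for ch in text if not 0x80 <= ord(ch) <= 0x9F)


for level in range(5):
    result = over_encode("\u00e9", level)
    # ascii() keeps the output printable on any terminal.
    print(level, len(result), ascii(visible(result)))
```

The lengths printed are 1, 2, 4, 8, and 16 characters, and the visible part of each result is a growing run of Ã and Â ending in ©.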

Note: If you see boxes in the characters above, it is because the font used is missing that character. The only fix is to install or switch to a font that has it. Often, the fonts used in a window title or status bar or JavaScript are more limited than those used elsewhere, so the "alert", "title", and "status" buttons in the Character Conversion Corner can be used to test characters in those contexts.

Too few encodings can have a bad effect that looks different.When é is not UTF-8 encoded, it can appear like this very high numbered character:

Progressive under-encoding can result in a question mark being displayed.
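Both under-encoding symptoms can be reproduced with Python's `errors="replace"` handler. This is one plausible path to each symptom, not the only one; the variable names are illustrative.

```python
# e-acute stored as a single Latin-1 byte (0xE9), i.e. not UTF-8 encoded.
latin1_bytes = "\u00e9".encode("latin-1")

# Decoding that byte as UTF-8 fails, so the decoder substitutes U+FFFD,
# the replacement character (65533 decimal, hence "very high numbered").
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(hex(ord(as_utf8)))  # 0xfffd

# Forcing the result into a smaller character set loses even that marker
# and leaves a plain question mark.
as_ascii = as_utf8.encode("ascii", errors="replace")
print(as_ascii)  # b'?'
```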

Diagnostic Reference

You are now ready to diagnose UTF-8 encoding problems (e.g., with é):

SymptomDiagnosis
é no problems
é too much UTF-8 encoding, or viewing UTF-8 encoded text with Latin-1 encoding
é much too much UTF-8 encoding
too little UTF-8 encoding
? something bad happened to this character
wild animals have eaten this character
𐀓 if you see a box, the font in use is missing this character.Firefox 3's boxes contain the hexadecimal value for the missing character,but it's still just a missing character.
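The "too much UTF-8 encoding" rows can be checked mechanically by trying to reverse the mistake. The sketch below is a heuristic under stated assumptions, not a general repair tool: `undo_over_encoding` is a hypothetical helper, it assumes the damage came from UTF-8 bytes being read as Latin-1, and it would wrongly "repair" text in which a sequence such as Ã© was intended literally.

```python
def undo_over_encoding(text: str, max_rounds: int = 5):
    """Reverse 'UTF-8 read as Latin-1' until it no longer applies.

    Returns (repaired_text, rounds), where rounds counts the extra
    encodings that were removed.
    """
    rounds = 0
    while rounds < max_rounds:
        try:
            candidate = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not reversible: the text is not over-encoded in this way
        if candidate == text:
            break  # pure ASCII: nothing to undo
        text = candidate
        rounds += 1
    return text, rounds


# Build over-encoded samples from e-acute and repair them again.
once = "\u00e9".encode("utf-8").decode("latin-1")
twice = once.encode("utf-8").decode("latin-1")

for sample in ("\u00e9", once, twice):
    repaired, rounds = undo_over_encoding(sample)
    print(rounds, ascii(repaired))
```

A result of 0 rounds corresponds to the "no problems" row, 1 to "too much", and 2 or more to "much too much". The replacement character and question mark rows cannot be repaired this way, because the original bytes are already gone.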



Author: Trent Wehner
