Tool: Fix broken unicode characters

I’ve had many a nightmare with garbled Unicode characters arriving in an email or sitting in a poorly encoded database field. So I’ve written a basic tool that works with some of the most common Western European characters and converts them back correctly (or, optionally, to their closest matching standard ASCII character).

I am going to try to work on a full set (that can be incrementally loaded), but this has been sufficient for my needs so far.
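To give a rough idea of the kind of conversion involved, here is a minimal sketch. It is not the tool's actual code: the function names `fixMojibake` and `toClosestAscii` are made up for illustration, and it assumes the garbling came from UTF-8 bytes being misread as Latin-1, which is only one of several ways text gets mangled.

```javascript
// Hypothetical helper: assumes the input is UTF-8 that was misread as Latin-1.
function fixMojibake(text) {
  // Every character must fit in one byte for this repair to make sense.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return text;
  }
  // Turn each character back into the byte it originally was.
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  try {
    // Re-read those bytes as UTF-8; fatal mode rejects invalid sequences.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Not valid UTF-8, so the text was probably not garbled this way.
    return text;
  }
}

// Hypothetical helper: strips combining accents after decomposition.
// Letters with no decomposition (such as ß or ø) are left untouched.
function toClosestAscii(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(fixMojibake('caf\u00c3\u00a9'));
console.log(toClosestAscii('caf\u00e9'));
```

Returning the input unchanged when the repair does not apply keeps the sketch safe to run over text that was never broken in the first place.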

Use it here!

Edit: I’ve updated the tool to support several functions, input via escape literals (\x0d or \u02f4), output back to multibyte characters in the text area, and other fixes. The same link still works.
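For anyone curious, literal input of that kind can be handled along these lines. This is only a sketch, assuming just the two-digit `\x` and four-digit `\u` forms are accepted; `decodeEscapes` is an illustrative name, not a function from the tool.

```javascript
// Hypothetical helper: expands \xHH and \uHHHH literals typed as plain text.
function decodeEscapes(input) {
  return input.replace(
    /\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})/g,
    // Exactly one of the two capture groups is set for each match.
    (match, hex2, hex4) => String.fromCharCode(parseInt(hex2 || hex4, 16))
  );
}

// The typed text "\u02f4" becomes the single character U+02F4.
console.log(decodeEscapes('\\u02f4'));
```

Anything that does not match one of the two patterns is passed through as typed.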

This link was incredibly helpful with my investigation and testing!

4 Replies to “Tool: Fix broken unicode characters”

  1. Nice tool!

    I might be missing something, but I think it could be done in just a few lines of code instead of using your Unicode.js library.

    For example:

    decodeURIComponent(escape('cafÃ©')); // 'café'
    1. Hi Mathias,

      You are totally right! I never even thought about doing that. Next time I get five minutes I’ll rework the bulk of the script and use that method as it makes much more sense! I think I’ll still keep the rest of the library in the unminified version as it was quite interesting to work through the various chunks of Unicode.

      Thanks for your comment!


  2. This doesn’t work for Greek.

    For instance, “Áìáëßá Ãéáííáêïðïýëïõ” should be converted to “Αμαλία Γιαννακοπούλου”.
