How to convert Unicode to a string in PHP?

"%uXXXX" is a non-standard scheme for URL-encoding Unicode characters. Apparently it was proposed but never really used. As such, there's hardly any standard function that can decode it into an actual UTF-8 sequence.

It's not too difficult to do it yourself though:

$string = '%u05E1%u05E2';
$string = preg_replace('/%u([0-9A-F]+)/', '&#x$1;', $string);
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

This converts the %uXXXX notation to HTML entity notation &#xXXXX;, which can be decoded to actual UTF-8 by html_entity_decode. The above outputs the characters "סע" in UTF-8 encoding.
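
The same idea can be wrapped in a small reusable function. This is only a sketch: the name decode_percent_u is hypothetical (not a PHP built-in), it assumes exactly four hex digits per escape, and it does not combine surrogate pairs. It decodes each match on its own, so entities already present in the input, such as &amp;, are left untouched.

```php
<?php
// Hypothetical helper, not part of PHP: decodes %uXXXX escapes to UTF-8.
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})/',
        // Build the numeric entity &#xXXXX; for this match only and decode it.
        fn(array $m): string => html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        $input
    );
}

echo decode_percent_u('%u05E1%u05E2'), "\n";
```

Accepting lowercase hex digits as well is a design choice made here; the pattern in the answer above only matches uppercase.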

Created: February-27, 2022

  1. Use utf8_encode() and utf8_decode() to Encode and Decode Strings in PHP
  2. Use iconv() to Convert a String to UTF-8

UTF-8 is a way to encode Unicode characters, using one to four bytes per character.

It can represent special characters and characters from languages other than English.

PHP offers several ways to convert text to UTF-8.

Use utf8_encode() and utf8_decode() to Encode and Decode Strings in PHP

Both utf8_encode() and utf8_decode() are built-in functions in PHP.

utf8_encode() converts an ISO-8859-1 string to UTF-8, and utf8_decode() converts a UTF-8 string back to ISO-8859-1. Each function takes the string as its only parameter.

See the example below:
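
A minimal sketch of such a round trip, assuming the input is the ISO-8859-1 byte sequence for àéí (the variable names are illustrative). Both functions are deprecated as of PHP 8.2.0, so newer versions also print deprecation notices.

```php
<?php
// Assumption: the source bytes are ISO-8859-1 ("\xE0\xE9\xED" is àéí in that encoding).
$latin1 = "\xE0\xE9\xED";

// ISO-8859-1 -> UTF-8: each of these characters becomes a two-byte sequence.
$utf8 = utf8_encode($latin1);
echo "UTF-8 Encoded String: ", $utf8, "\n";

// UTF-8 -> ISO-8859-1: back to one byte per character.
$decoded = utf8_decode($utf8);
echo "UTF-8 Decoded String: ", $decoded, "\n";

// Encoding the decoded bytes again restores the UTF-8 form.
$reencoded = utf8_encode($decoded);
echo "UTF-8 Encoded String from the decoded: ", $reencoded, "\n";
```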


The code above encodes an ISO-8859-1 string to UTF-8 and then decodes the output again. The input string is ISO-8859-1 encoded.

Output:

UTF-8 Encoded String: àéí
UTF-8 Decoded String: ���
UTF-8 Encoded String from the decoded: àéí

utf8_decode() converts a UTF-8 string containing ISO-8859-1 characters to single-byte ISO-8859-1.

When ISO-8859-1 encoded text is read as UTF-8, bytes above 0x7F do not form valid sequences and are usually displayed as a question-mark replacement character, as in the decoded line of the output.

Use iconv() to Convert a String to UTF-8

iconv() is another built-in PHP function, used to convert a string from one character encoding to another.

It takes three parameters: the string's current encoding, the encoding you want to convert to, and the string itself.

See the example below:
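
A minimal sketch of an explicit conversion, assuming the input is the ISO-8859-1 byte sequence for àéí and that the iconv extension is available (the variable names are illustrative):

```php
<?php
// Assumption: the source bytes are ISO-8859-1 ("\xE0\xE9\xED" is àéí in that encoding).
$latin1 = "\xE0\xE9\xED";

// iconv(from_encoding, to_encoding, string)
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo "The UTF-8 String is: ", $utf8, "\n";
```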


The code above takes three parameters and converts the text to UTF-8.

Output:

The UTF-8 String is: àéí
The UTF-8 String with auto detection is: àéí

PHP also offers other functions like recode_string() or mb_convert_encoding(), which work similarly to iconv(); they convert a string to the requested encoding.
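
As a rough illustration of mb_convert_encoding(), assuming the mbstring extension is enabled and the input is ISO-8859-1 (the variable names are illustrative). Note that its argument order differs from iconv(): the string comes first, then the target encoding, then the source encoding.

```php
<?php
// Assumption: the mbstring extension is enabled.
$latin1 = "\xE0\xE9\xED"; // àéí in ISO-8859-1

// mb_convert_encoding(string, to_encoding, from_encoding)
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n"; // c3a0c3a9c3ad
```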

    utf8_decode (PHP 4, PHP 5, PHP 7, PHP 8)

    utf8_decode: Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters

    Warning

    This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

    Description

    utf8_decode(string $string): string

    Note:

    Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
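
One possible "different function" is mb_convert_encoding(). The following is a sketch only, assuming the mbstring extension is enabled; the variable names are illustrative.

```php
<?php
// Assumption: the mbstring extension is enabled.

// UTF-8 Euro sign (U+20AC) -> Windows-1252, where it is the single byte 0x80.
$euroCp1252 = mb_convert_encoding("\xE2\x82\xAC", 'Windows-1252', 'UTF-8');
echo bin2hex($euroCp1252), "\n"; // 80

// Windows-1252 curly quotes (0x93, 0x94) -> UTF-8.
$quotedUtf8 = mb_convert_encoding("\x93Hi\x94", 'UTF-8', 'Windows-1252');
echo $quotedUtf8, "\n";
```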

    Parameters

    string

    A UTF-8 encoded string.

    Return Values

    Returns the ISO-8859-1 translation of string.

    Changelog

    Version  Description
    8.2.0    This function has been deprecated.
    7.2.0    This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.

    Examples

    Example #1 Basic examples
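
A short sketch of the behaviour described above, written for illustration (the variable names are assumptions). On PHP 8.2.0 and later each call also prints a deprecation notice.

```php
<?php
// 'Zoë' as UTF-8 bytes; ë (U+00EB) is the two-byte sequence C3 AB.
$utf8 = "\x5A\x6F\xC3\xAB";
$latin1 = utf8_decode($utf8);
echo bin2hex($latin1), "\n"; // 5a6feb

// A truncated (invalid) UTF-8 sequence is replaced with '?'.
$invalid = utf8_decode("\xC3");

// The Euro sign (U+20AC) does not exist in ISO-8859-1, so it also becomes '?'.
$euro = utf8_decode("\xE2\x82\xAC");

var_dump($invalid, $euro);
```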

    thierry.bo # netcourrier point com

    16 years ago

    In response to fhoech (22-Sep-2005 11:55), I just tried a simultaneous test with the file UTF-8-test.txt using your regexp, the 'j dot dittmer' (20-Sep-2005 06:30) regexp (message #56962), the `php-note-2005` (17-Feb-2005 08:57) regexp from his message on the `mb-detect-encoding` page (//us3.php.net/manual/en/function.mb-detect-encoding.php#50087), which uses a regexp from the W3C (//w3.org/International/questions/qa-forms-utf-8.html), and the PHP mb_detect_encoding function.

    Here is a summary of the results:

    201 lines are valid UTF8 strings using phpnote regexp
    203 lines are valid UTF8 strings using j.dittmer regexp
    200 lines are valid UTF8 strings using fhoech regexp
    239 lines are valid UTF8 strings using mb_detect_encoding

    Here are the lines with differences (left to right: phpnote, j.dittmer and fhoech):

    Line #70 : NOT UTF8|IS UTF8!|IS UTF8! :2.1.1 1 byte (U-00000000): ""
    Line #79 : NOT UTF8|IS UTF8!|IS UTF8! :2.2.1 1 byte (U-0000007F): ""
    Line #81 : IS UTF8!|IS UTF8!|NOT UTF8 :2.2.3 3 bytes (U-0000FFFF): "￿" |
    Line #267 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.1 U+FFFE = ef bf be = "￾" |
    Line #268 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.2 U+FFFF = ef bf bf = "￿" |

    Interestingly, you said that your regexp corrected the j.dittmer regexp, which failed on section 5.3, but in my test I get the opposite result?!

    I ran this test on Windows XP with PHP 4.3.11dev. Maybe these differences come from the operating system or the PHP version.

    For mb_detect_encoding I used the command:

    mb_detect_encoding($line, 'UTF-8, ISO-8859-1, ASCII');
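
For readers who want to try a per-line check of the kind compared above, here is a sketch using a pattern along the lines of the W3C regexp. The function name looks_like_utf8 is hypothetical, and this is not necessarily the exact pattern any of the commenters used. Its ASCII branch excludes most control characters, which would be consistent with U+0000 and U+007F being reported as "NOT UTF8" in the first column.

```php
<?php
// Hypothetical helper: true if every byte sequence in $line matches the pattern.
function looks_like_utf8(string $line): bool
{
    $pattern = '/\A(?:
          [\x09\x0A\x0D\x20-\x7E]            # printable ASCII plus tab, LF, CR
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        |  \xE0[\xA0-\xBF][\x80-\xBF]        # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        |  \xED[\x80-\x9F][\x80-\xBF]        # 3-byte, excluding surrogates
        |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
    )*\z/x';

    return preg_match($pattern, $line) === 1;
}

var_dump(looks_like_utf8("\xC3\xA9")); // valid two-byte sequence (é)
var_dump(looks_like_utf8("\xE9"));     // lone ISO-8859-1 byte
```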

    visus at portsonline dot net

    15 years ago

    The following code helped me with mixed (UTF8+ISO-8859-1(x)) encodings. In this case, I have template files made and maintained by designers who do not care about encoding, and MySQL data in utf8_binary_ci encoded tables.
