"%uXXXX" is a non-standard scheme for URL-encoding Unicode characters. Apparently it was proposed but never really used. As such, there's hardly any standard function that can decode it into an actual UTF-8 sequence.
It's not too difficult to do it yourself though:
$string = '%u05E1%u05E2';
$string = preg_replace('/%u([0-9A-F]+)/', '&#x$1;', $string);
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');
This converts the %uXXXX notation to the HTML entity notation &#xXXXX;, which html_entity_decode can decode to actual UTF-8. The above outputs the characters "סע" in UTF-8 encoding.
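As an alternative to going through HTML entities, the same decoding can be sketched by treating each %uXXXX group as a UTF-16 code unit. This is only a sketch under assumptions: the function name decode_percent_u is hypothetical, the mbstring extension is assumed to be loaded, and every group is assumed to have exactly four hex digits. Converting whole runs of groups at once lets surrogate pairs decode together.

```php
<?php
// Hypothetical helper: decode "%uXXXX" sequences into UTF-8.
// Assumes the mbstring extension is available.
function decode_percent_u(string $input): string
{
    $result = preg_replace_callback(
        // Match a run of consecutive %uXXXX groups (hex digits in either case).
        '/(?:%u[0-9A-Fa-f]{4})+/',
        static function (array $m): string {
            // Drop the "%u" prefixes so only the hex digits remain.
            $hex = str_replace('%u', '', $m[0]);
            // Interpret the digits as big-endian UTF-16 code units.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $input;
}

echo decode_percent_u('%u05E1%u05E2'), "\n"; // prints סע
```

Unlike the entity-based version, text outside the %uXXXX groups is left untouched, so literal "&" characters in the input are not reinterpreted as entities.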
Created: February-27, 2022
UTF-8 is a way to encode Unicode characters, using between one and four bytes per character. It is used to handle special characters and characters from languages other than English. PHP has different ways to convert text into UTF-8.
Use utf8_encode() and utf8_decode() to Encode and Decode Strings in PHP
Both utf8_encode() and utf8_decode() are built-in functions in PHP. They convert strings between ISO-8859-1 and UTF-8, and each takes a string as its only parameter.
The example for these functions encodes an ISO-8859-1 string to UTF-8 and then decodes the output again. The input string is ISO-8859-1 encoded.
Output:
UTF-8 Encoded String: àéí
UTF-8 Decoded String: ���
UTF-8 Encoded String from the decoded: àéí
utf8_decode() converts a string whose ISO-8859-1 characters are encoded in UTF-8 back to single-byte ISO-8859-1. When ISO-8859-1 encoded text is read as UTF-8, you will often see that question mark.
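The round trip described above can be sketched as follows. Because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.0, this sketch swaps in mb_convert_encoding() with explicit encoding names. It assumes the mbstring extension is loaded, and it prints hex bytes so the result does not depend on the terminal's encoding.

```php
<?php
// "àéí" as raw ISO-8859-1 bytes: one byte per character.
$latin1 = "\xE0\xE9\xED";

// ISO-8859-1 -> UTF-8 (the job utf8_encode() used to do).
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// UTF-8 -> ISO-8859-1 (the job utf8_decode() used to do).
$back = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($latin1), "\n"; // e0e9ed
echo bin2hex($utf8), "\n";   // c3a0c3a9c3ad
echo bin2hex($back), "\n";   // e0e9ed

// The single-byte form is not valid UTF-8, which is why a UTF-8
// display shows replacement characters for it.
var_dump(mb_check_encoding($back, 'UTF-8')); // bool(false)
```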
Use iconv() to Convert a String to UTF-8
iconv() is another built-in PHP function, used to convert a string from one character encoding to another.
It takes three parameters: the first is the string's current encoding, the second is the encoding you want to convert to, and the third is the string itself.
The example for this function passes these three parameters and converts the text to UTF-8.
Output:
The UTF-8 String is: àéí
The UTF-8 String with auto detection is: àéí
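A minimal sketch of the iconv() call described above, assuming the iconv extension is available and the input is known to be ISO-8859-1. It covers only the explicit source encoding case, not the auto-detection variant, and the variable names are illustrative.

```php
<?php
// "àéí" as raw ISO-8859-1 bytes.
$latin1 = "\xE0\xE9\xED";

// Argument order: source encoding, target encoding, then the string.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

if ($utf8 === false) {
    // iconv() returns false when the conversion fails.
    echo "Conversion failed\n";
} else {
    echo 'The UTF-8 String is: ', $utf8, "\n";
}
```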
PHP also offers other functions, such as recode_string() and mb_convert_encoding(), which work similarly to iconv(): they convert a string to the requested encoding.
(PHP 4, PHP 5, PHP 7, PHP 8)
utf8_decode — Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
Warning
This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.
Description
utf8_decode(string $string): string
Note:
Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
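To illustrate the note above, here is a hedged sketch that uses mb_convert_encoding() as one possible "different function". It assumes the mbstring extension is loaded, and it sets the substitution character explicitly so the result for an unrepresentable character is predictable.

```php
<?php
// Use '?' for characters the target encoding cannot represent.
mb_substitute_character(0x3F);

$euro = "\u{20AC}"; // Euro sign as a UTF-8 string (bytes e2 82 ac)

// ISO-8859-1 has no Euro sign, so the character is substituted.
$asLatin1 = mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8');

// Windows-1252 maps the Euro sign to the single byte 0x80.
$asCp1252 = mb_convert_encoding($euro, 'Windows-1252', 'UTF-8');

echo bin2hex($asLatin1), "\n"; // 3f
echo bin2hex($asCp1252), "\n"; // 80
```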
Parameters
string
A UTF-8 encoded string.
Return Values
Returns the ISO-8859-1 translation of string.
Changelog
Version | Description
8.2.0 | This function has been deprecated.
7.2.0 | This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.
Examples
Example #1 Basic examples
thierry.bo # netcourrier point com
16 years ago
In response to fhoech (22-Sep-2005 11:55), I just ran a simultaneous test on the file UTF-8-test.txt using your regexp, the 'j dot dittmer' (20-Sep-2005 06:30) regexp (message #56962), the `php-note-2005` (17-Feb-2005 08:57) regexp from his message on the `mb-detect-encoding` page (//us3.php.net/manual/en/function.mb-detect-encoding.php#50087), which uses a regexp from the W3C (//w3.org/International/questions/qa-forms-utf-8.html), and the PHP mb_detect_encoding function.
Here is a summary of the results:
201 lines are valid UTF8 strings using phpnote regexp
203 lines are valid UTF8 strings using j.dittmer regexp
200 lines are valid UTF8 strings using fhoech regexp
239 lines are valid UTF8 strings using mb_detect_encoding
Here are the lines with differences (left to right: phpnote, j.dittmer and fhoech):
Line #70 : NOT UTF8|IS UTF8!|IS UTF8! :2.1.1 1 byte (U-00000000): ""
Line #79 : NOT UTF8|IS UTF8!|IS UTF8! :2.2.1 1 byte (U-0000007F): ""
Line #81 : IS UTF8!|IS UTF8!|NOT UTF8 :2.2.3 3 bytes (U-0000FFFF): ""
Line #267 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.1 U+FFFE = ef bf be = ""
Line #268 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.2 U+FFFF = ef bf bf = ""
It is interesting that you said your regexp corrected the j.dittmer regexp that failed on section 5.3, but in my test I get the opposite result?!
I ran this test on Windows XP with PHP 4.3.11dev. Maybe these differences come from the operating system or the PHP version.
For mb_detect_encoding I used the command:
mb_detect_encoding($line, 'UTF-8, ISO-8859-1, ASCII');
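For comparison, here is a minimal sketch of two other per-line checks for valid UTF-8. These are not the regexps compared in this note: the function names are hypothetical, the first check assumes the mbstring extension, and the two checks may disagree with each other or with the results above on edge cases such as U+FFFE.

```php
<?php
// Hypothetical helper: validate with mbstring.
function is_valid_utf8(string $line): bool
{
    return mb_check_encoding($line, 'UTF-8');
}

// Hypothetical helper: validate with PCRE. With the /u modifier the
// subject must be valid UTF-8, otherwise preg_match() returns false.
function is_valid_utf8_pcre(string $line): bool
{
    return preg_match('//u', $line) === 1;
}

var_dump(is_valid_utf8("\xC3\xA9"));      // valid two-byte sequence for "é"
var_dump(is_valid_utf8("\xE9"));          // lone ISO-8859-1 byte
var_dump(is_valid_utf8_pcre("\xC0\xAF")); // overlong encoding of "/"
```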
visus at portsonline dot net
15 years ago
The following code helped me with mixed (UTF8+ISO-8859-1(x)) encodings. In this case, I have template files made and maintained by designers who do not care about encoding, and MySQL data in utf8_binary_ci encoded tables.