Decoding \x{ZZZZ}-escaped UTF-8 strings in Perl

Published on Author gryzli

We have some cPanel accounts with Cyrillic set as the default language, which makes cPanel return escaped UTF-8 messages when you work with the API.

One of the messages I was getting looked like this:

\x{437}\x{430}\x{43f}\x{438}\x{441} \x{437}\x{430} \x{434}\x{43e}\x{43c}\x{435}\x{439}\x{43d} \x{201c} \x{201d} \x{432}\x{435}\x{447}\x{435} \x{441}\x{44a}\x{449}\x{435}

which is not very eye-friendly.


Using simple Perl code without additional modules

The simplest way to decode the string is a substitution that looks like this:


# Decode the string
my $string='\x{437}\x{430}\x{43f}\x{438}\x{441}\x{437}\x{430} \x{434}\x{43e}\x{43c}\x{435}\x{439}\x{43d} \x{201c}\x{201d} \x{432}\x{435}\x{447}\x{435} \x{441}\x{44a}\x{449}\x{435}\x{441}\x{442}\x{432}\x{443}\x{432}\x{430}.'  =~ s/(\\x\{[0-9a-z]+\})/qq{"$1"}/eerig; 

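# Declare STDOUT as UTF-8, otherwise perl warns about "Wide character in print"
binmode(STDOUT, ':encoding(UTF-8)');
print "Decoded string: $string\n";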


The magic happens inside the substitution (=~ s/(\\x\{[0-9a-z]+\})/qq{"$1"}/eerig). The flags do the following:

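‘ee’ -> evaluate the replacement as Perl code, then eval the resulting string. The first pass turns qq{"$1"} into the text "\x{YYY}" (quotes included); the second pass evals that text as a double-quoted string literal, which yields the actual character.

‘i’ -> case-insensitive matching, so uppercase hex digits such as \x{43F} are matched too

‘g’ -> replace all matches, not only the first one

‘r’ -> non-destructive substitution: the modified copy is returned (and assigned to $string) instead of the original being changed in place. This flag needs Perl 5.14 or newer.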
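
If you would rather not eval the matched text at all, the same decoding can be done with chr() and hex(). This is only a minimal sketch under my own assumptions: the helper name decode_x_escapes is made up for this example, and it handles nothing except \x{HEX} sequences.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): replace every
# literal \x{HEX} sequence with the character that has that code point.
sub decode_x_escapes {
    my ($text) = @_;
    # A single /e evaluates chr(hex($1)) as an expression, so the matched
    # text itself is never eval'ed. /r returns the modified copy (Perl 5.14+).
    return $text =~ s/\\x\{([0-9a-f]+)\}/chr(hex($1))/gier;
}

# Declare STDOUT as UTF-8 so the decoded characters print without warnings
binmode(STDOUT, ':encoding(UTF-8)');
print decode_x_escapes('\x{434}\x{43e}\x{43c}\x{435}\x{439}\x{43d}'), "\n";
```

Because only hex digits are captured and passed to hex(), a single ‘e’ is enough here and no input text ever reaches eval.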


You could also use this as a Perl one-liner:

(here the ‘r’ flag is removed, because -p prints $_ after every line, so the substitution has to modify $_ in place)

# -CS marks STDIN/STDOUT/STDERR as UTF-8, which avoids the "Wide character in print" warning
echo '\x{434}\x{43e}\x{43c}\x{435}\x{439}\x{43d}' | perl -CS -pe 's/(\\x\{[0-9a-z]+\})/qq{"$1"}/eeig'


Using String::Unescape to do the conversion

You could also use the String::Unescape module from CPAN to do the job.

In this case your script will look something like this: 

use String::Unescape; 

my $string='\x{437}\x{430}\x{43f}\x{438}\x{441}\x{437}\x{430} \x{434}\x{43e}\x{43c}\x{435}\x{439}\x{43d} \x{201c}\x{201d} \x{432}\x{435}\x{447}\x{435} \x{441}\x{44a}\x{449}\x{435}\x{441}\x{442}\x{432}\x{443}\x{432}\x{430}.' ;
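# Declare STDOUT as UTF-8, otherwise perl warns about "Wide character in print"
binmode(STDOUT, ':encoding(UTF-8)');
print String::Unescape->unescape($string) . "\n";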
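
Since String::Unescape is not a core module, a script that has to run on several servers may want a fallback. The following is a rough sketch, not a tested recipe: the wrapper name unescape_message is hypothetical, and the fallback branch only understands \x{HEX}, while the module also interprets other escapes such as \t and \n.

```perl
use strict;
use warnings;

# Try to load String::Unescape at run time; $have_module is false if it is missing
my $have_module = eval { require String::Unescape; 1 };

# Hypothetical wrapper: prefer the module, otherwise decode \x{HEX} by hand
sub unescape_message {
    my ($text) = @_;
    return String::Unescape->unescape($text) if $have_module;
    # Fallback: chr(hex(...)) on each escape, /r returns the copy (Perl 5.14+)
    return $text =~ s/\\x\{([0-9a-f]+)\}/chr(hex($1))/gier;
}

# Declare STDOUT as UTF-8 so the decoded characters print without warnings
binmode(STDOUT, ':encoding(UTF-8)');
print unescape_message('\x{432}\x{435}\x{447}\x{435}'), "\n";
```

For strings that contain only \x{...} escapes, like the cPanel messages above, both branches should give the same result.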