Hi folks
C# has three ways to declare Unicode literals.
See "Character literals" and "Unicode character escape sequences" in the C# language specification:
\x hex-digit hex-digit-opt hex-digit-opt hex-digit-opt
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit
For example:
char x1Upper = '\xA';
char x2Lower = '\xab';
char x3Upper = '\xABC';
char x4Mixed = '\xaBcD';
char uUpper = '\uABCD';
string UMixed = "\U000abcD1"; // U+ABCD1 is above U+FFFF, so it needs a string, not a char
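But none of these escape sequences is available in Windows PowerShell. (PowerShell 6 and later add a `u{...} escape sequence; the techniques below work in any version.)
The \xnnnn and \unnnn literals can be expressed by simply casting a hexadecimal integer to [char]: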
$x1Upper = [char] 0xA
$x2Lower = [char] 0xab
$x3Upper = [char] 0xABC
$x4Mixed = [char] 0xaBcD
$uUpper = [char] 0xABCD
\Unnnnnnnn literals require a slightly more sophisticated approach:
$UMixed = [char]::ConvertFromUtf32(0x000abcD1)
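The plain cast is not enough here because [char] holds a single 16-bit UTF-16 code unit, so [char] 0x000abcD1 throws a conversion error. A code point above 0xFFFF has to be stored as a surrogate pair inside a string, which is what ConvertFromUtf32 returns. A small check, as a sketch (the variable name $s is just for illustration):

```powershell
# ConvertFromUtf32 returns a [string], not a [char]
$s = [char]::ConvertFromUtf32(0x000abcD1)

$s.GetType().Name                  # String
$s.Length                          # 2 (two UTF-16 code units)
[char]::IsHighSurrogate($s[0])     # True
[char]::IsLowSurrogate($s[1])      # True

# Round-trip back to the original code point
[char]::ConvertToUtf32($s, 0) -eq 0x000abcD1   # True
```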
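The last approach is the most general: it accepts code points from both ranges, although it returns a string rather than a char.
Declaring a string that contains Unicode characters requires more complex syntax, because each character has to be embedded as a subexpression: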
$str = "xyz$([char] 0xA)klm$([char]::ConvertFromUtf32(0x000abcD1))"
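If the nested $(...) subexpressions get hard to read, one possible alternative is the -f format operator, which keeps the template and the characters apart. This is only a sketch; the variable names $lf and $sup are made up for the example:

```powershell
# Build the characters first, then substitute them into a template
$lf  = [char] 0xA
$sup = [char]::ConvertFromUtf32(0x000abcD1)

$str = 'xyz{0}klm{1}' -f $lf, $sup

$str.Length   # 9: 'xyz' + 1 code unit + 'klm' + 2 code units (surrogate pair)
```

The result is the same string as the interpolated version above.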
If we need to deal with many Unicode strings, we can declare a helper function:
function U
{
    param
    (
        [int] $Code
    )

    [char]::ConvertFromUtf32($Code)
}
And then we can use it like this:
$str = "xyz$(U 0xA)klm$(U 0x000abcD1)"
UPD: I just found that my implementation has an issue with surrogate code points (0xD800 to 0xDFFF):
U 0xd800
fails with
Exception calling "ConvertFromUtf32" with "1" argument(s): "A valid UTF32 value is between 0x000000 and 0x10ffff, inclusive, and should not include surrogate codepoint values (0x00d800 ~ 0x00dfff).
To fix this, we need to extend the implementation:
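function U
{
    param
    (
        [int] $Code
    )

    # 0x0000..0xFFFF (including lone surrogates) fits in a single [char]
    if ((0 -le $Code) -and ($Code -le 0xFFFF))
    {
        return [char] $Code
    }

    # 0x10000..0x10FFFF needs a surrogate pair, so the result is a string
    if ((0x10000 -le $Code) -and ($Code -le 0x10FFFF))
    {
        return [char]::ConvertFromUtf32($Code)
    }

    throw "Invalid character code $Code"
}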
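One thing to keep in mind: this version returns a [char] for codes up to 0xFFFF and a [string] for anything above. That makes no difference inside "$(...)" interpolation, but it can matter if the result is used on its own. Below is a sketch of a variant that always returns a string; the name ConvertTo-UnicodeString is hypothetical and not part of the original helper:

```powershell
function ConvertTo-UnicodeString   # hypothetical name, for illustration only
{
    param
    (
        [int] $Code
    )

    if ((0 -le $Code) -and ($Code -le 0xFFFF))
    {
        # Wrap the single [char] (lone surrogates included) in a string
        return [string] [char] $Code
    }

    if ((0x10000 -le $Code) -and ($Code -le 0x10FFFF))
    {
        return [char]::ConvertFromUtf32($Code)
    }

    throw "Invalid character code $Code"
}

(ConvertTo-UnicodeString 0x41).GetType().Name         # String
(ConvertTo-UnicodeString 0x000abcD1).GetType().Name   # String
(ConvertTo-UnicodeString 0xd800).Length               # 1 (lone surrogate, no exception)
```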