squeek502/zig-unidecode
Very approximate UTF-8 to ASCII transliterator (a Zig implementation of the Text::Unidecode Perl module)
A Zig implementation of the Text::Unidecode Perl module to convert UTF-8 text into a (very) approximate ASCII-only transliteration. That is, this is "meant to be a transliterator of last resort."
For a more detailed description, including motivation and caveats, see:
https://metacpan.org/pod/Text::Unidecode
| UTF-8 | Transliterated ASCII |
|---|---|
| "ÿéáh" | "yeah" |
| "北亰" | "Bei Jing " |
| "Славься" | "Slav'sia" |
| "[██ ] 50%" | "[## ] 50%" |
- ASCII range (`0x00`-`0x7F`).
- Code points above `0x7F` will never be transliterated to include any ASCII control characters except `\n`.
- `[?]`.

unidecodeAlloc
Takes an allocator so that input of any size can be handled safely. This should be used for most use cases.
unidecodeBuf
Takes a `dest` slice that must be large enough to hold the transliterated ASCII. Because the output size can vary greatly depending on the input, this is unsafe unless it is known ahead of time that the transliterated output will fit (i.e. comptime).
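
The trade-off between the allocating and the buffer-based variants can be illustrated with a small, self-contained sketch. This is not the library's implementation and does not use its API: `toyLookup`, `toyLen`, `toyBuf`, and `toyAlloc` are hypothetical names, the four-entry table is a stand-in for the real transliteration data, and the `[?]` fallback is simply what this toy does for anything outside its table. It assumes a recent Zig (0.11 or later, for the two-argument `@memcpy` and single-argument `@intCast`).

```zig
const std = @import("std");

/// Hypothetical, tiny stand-in for the real transliteration tables.
fn toyLookup(cp: u21) []const u8 {
    return switch (cp) {
        0xE1 => "a", // á
        0xE9 => "e", // é
        0xFF => "y", // ÿ
        0x2588 => "#", // █
        else => "[?]", // toy fallback for anything not in this table
    };
}

/// Counts how many ASCII bytes the toy transliteration of `utf8` needs.
fn toyLen(utf8: []const u8) !usize {
    var len: usize = 0;
    var it = (try std.unicode.Utf8View.init(utf8)).iterator();
    while (it.nextCodepoint()) |cp| {
        len += if (cp <= 0x7F) 1 else toyLookup(cp).len;
    }
    return len;
}

/// Buffer variant: the caller must guarantee that `dest` is large enough.
/// Nothing here checks the size, which is why this style is only safe when
/// the output length is known in advance.
fn toyBuf(dest: []u8, utf8: []const u8) ![]u8 {
    var i: usize = 0;
    var it = (try std.unicode.Utf8View.init(utf8)).iterator();
    while (it.nextCodepoint()) |cp| {
        if (cp <= 0x7F) {
            dest[i] = @intCast(cp); // ASCII is copied through unchanged
            i += 1;
        } else {
            const s = toyLookup(cp);
            @memcpy(dest[i..][0..s.len], s);
            i += s.len;
        }
    }
    return dest[0..i];
}

/// Allocating variant: sizes the output first, so any input length is safe.
/// The caller owns the returned slice and must free it.
fn toyAlloc(allocator: std.mem.Allocator, utf8: []const u8) ![]u8 {
    const dest = try allocator.alloc(u8, try toyLen(utf8));
    errdefer allocator.free(dest);
    return toyBuf(dest, utf8);
}

test "allocating variant handles any input size" {
    const out = try toyAlloc(std.testing.allocator, "ÿéáh");
    defer std.testing.allocator.free(out);
    try std.testing.expectEqualStrings("yeah", out);
}
```

In this sketch the allocating variant pays for an extra pass over the input to size the output exactly, while the buffer variant skips that work and relies on the caller to have sized `dest` correctly.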
unidecodeStringLiteral
A way to transliterate a UTF-8 string literal into ASCII at compile time.