FOREWORD:
The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP’s original question and paste it into the “new question” above. Yes… this is a bit odd 🙂
I think what you may be looking for is a CLI utility called iconv. Inconveniently, iconv expects you to declare the “from” and “to” encoding types (ref man iconv), e.g. UTF-8, ASCII, Unicode… and AFAIK, “shady” is not a recognized encoding type 🙂 However, the encoding type may be determined with another CLI utility called file. Still more inconveniently, both iconv and file expect their input to be contained in a file :/
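For reference, the general shape of an iconv call is sketched below, along with a quick way to see which encoding names your build of iconv accepts. The helper name has_encoding is hypothetical (it is not part of iconv); it just greps the output of iconv -l, whose layout differs between GNU libiconv (macOS) and glibc (Linux), so the match is deliberately loose.

```shell
#!/bin/sh
# General shape of an iconv call:
#   iconv -f FROM_ENCODING -t TO_ENCODING [input-file ...]
# With no input file, iconv reads standard input.

# has_encoding NAME (hypothetical helper): succeed if this iconv build
# lists NAME among its known encodings. The -l output format differs
# between implementations, so this is a loose, case-insensitive,
# whole-word match rather than an exact lookup.
has_encoding() {
    iconv -l | grep -qiw -- "$1"
}

for enc in UTF-8 ASCII UTF8-MAC; do
    if has_encoding "$enc"; then
        echo "$enc: listed"
    else
        echo "$enc: not listed"
    fi
done
```

On a typical Linux (glibc) system, UTF8-MAC will usually come back as not listed, since it is an Apple addition.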
Your question intrigued me, as it seems a reasonable thing to do, i.e. C&P from a PDF to the CLI. So I spent a few minutes wrangling with iconv and file to get the following answer, one that does not require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it’s been tested nowhere else.</caveat>
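Since the whole point is to avoid a temporary file, here is a small sketch of feeding pasted text straight into iconv. The function name to_ascii is hypothetical; pbpaste is the standard macOS command that writes the clipboard to stdout, and printf stands in for it on other systems.

```shell
#!/bin/sh
# to_ascii (hypothetical helper): read UTF-8 text on stdin and write an
# ASCII approximation to stdout, using the iconv options from this answer.
to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# On macOS, after copying a string out of the PDF:
#   pbpaste | to_ascii
# In zsh or bash, a here-string also works:
#   to_ascii <<< 'Numéro de boucle'
# Portable equivalent with printf:
printf '%s\n' 'Numéro de boucle' | to_ascii
```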
You’ve not provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So – instead, I found this string in a French-language PDF on Python programming:
print(“Numéro de boucle”, i)
So – first we’ll need to run this string through file to determine the encoding (note the use of the “dash” -, a reference to stdin in lieu of a proper filename):
echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text
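The file output above is meant for humans, but file can also print just the charset name with --mime-encoding, which makes it possible to chain the two steps. This is a sketch, assuming the pasted text is plain text that file can classify; if file answers with something like binary or unknown-8bit, iconv will reject that as a “from” encoding and the conversion fails. The helper name convert_detected is hypothetical.

```shell
#!/bin/sh
# convert_detected (hypothetical helper): let file(1) name the input's
# encoding, then pass that name to iconv as the "from" encoding.
# The body runs in a subshell ( ... ) so its variables stay local.
convert_detected() (
    input=$(cat)    # read stdin once; it is needed twice below
    # -b omits the "/dev/stdin:" prefix and --mime-encoding prints only
    # the charset name, e.g. "utf-8" or "us-ascii"
    enc=$(printf '%s\n' "$input" | file -b --mime-encoding -)
    printf '%s\n' "$input" | iconv -f "$enc" -t ASCII//TRANSLIT
)

printf '%s\n' 'print("Numéro de boucle", i)' | convert_detected
```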
So – the string is encoded in UTF-8. Now let’s convert the string from UTF-8 to ASCII using iconv:
NOTE: The //translit option is not addressed in the macOS version of man iconv, but it still works (?!). Appended to the target encoding, it tells iconv to transliterate: any character that has no equivalent in the target encoding is approximated with one or more similar-looking characters. Another option is to ignore (i.e. drop) the non-ASCII character(s) with //ignore.
echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i)
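To see the difference between the two suffixes mentioned in the NOTE, it helps to run them side by side. A minimal sketch follows; the exact //TRANSLIT output depends on the iconv implementation (and, on Linux, on the locale), so no expected output is given for it. //IGNORE should simply drop the é. Some implementations may still report an error and exit non-zero with //IGNORE even though the output is produced, so stderr is discarded here.

```shell
#!/bin/sh
s='print("Numéro de boucle", i)'

# //TRANSLIT: approximate each character ASCII lacks with look-alikes
printf '%s\n' "$s" | iconv -f UTF-8 -t ASCII//TRANSLIT

# //IGNORE: drop characters ASCII lacks. Some iconv builds still print
# an error and exit non-zero in this case, so stderr is silenced.
printf '%s\n' "$s" | iconv -f UTF-8 -t ASCII//IGNORE 2>/dev/null
```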
And so you may be wondering, “Why did it add the extra ' character?” That’s a good question, and I think the answer has already been supplied here: Apple may be using utf-8-mac instead of utf-8. That would be OK, I guess, if they had bothered to reflect it in their implementation of iconv! In fact, there is a UTF8-MAC encoding listed in the output of iconv --list, but it doesn’t improve the transliteration!
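For context, UTF8-MAC (an Apple addition to libiconv) refers to the decomposed form Apple uses for HFS+ file names, in which an accented letter is stored as the base letter followed by a combining accent. The sketch below assumes nothing beyond printf and od and shows the two byte sequences for é; piping a pasted string through od -An -tx1 the same way is a quick check of which form you actually have.

```shell
#!/bin/sh
# Two valid UTF-8 spellings of "é", written as octal escapes so the
# bytes do not depend on how this script file was saved:
precomposed=$(printf '\303\251')   # U+00E9 -> bytes c3 a9
decomposed=$(printf 'e\314\201')   # U+0065 "e" + U+0301 combining acute -> 65 cc 81

printf 'precomposed:'; printf '%s' "$precomposed" | od -An -tx1
printf 'decomposed: '; printf '%s' "$decomposed"  | od -An -tx1
```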
As implemented, the macOS iconv utility cannot properly convert every utf-8-mac character to ASCII: it converts the ones it can and issues an error for the others. To get a “best effort” from iconv, you can add the -c option, which causes iconv to simply drop the characters it cannot convert. If you have a reasonably current Linux box handy, you can verify that its iconv does a correct and proper ‘transliteration’ (//TRANSLIT) of the example used in this answer, i.e. without the extra ' character.
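A sketch of the “best effort” variant: -c combined with //TRANSLIT approximates what it can and silently drops the rest. On Linux, glibc’s transliteration depends on the current locale; in the plain C/POSIX locale accented letters tend to come out as ?, so the second command runs iconv under a UTF-8 locale. The locale name C.UTF-8 is an assumption; substitute whatever UTF-8 locale locale -a lists on your box.

```shell
#!/bin/sh
s='print("Numéro de boucle", i)'

# -c: silently discard characters that cannot be converted instead of
# stopping with an error; with //TRANSLIT, look-alikes are tried first.
printf '%s\n' "$s" | iconv -c -f UTF-8 -t ASCII//TRANSLIT

# Linux/glibc: transliteration tables come from the locale, so run
# iconv under a UTF-8 locale (C.UTF-8 assumed to be installed).
printf '%s\n' "$s" | LC_ALL=C.UTF-8 iconv -c -f UTF-8 -t ASCII//TRANSLIT
```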
And so, iconv seems to work at least some of the time on macOS… hope this helps.