Locale for Sorting
Today I ran afoul of a problem that I first encountered over 20 years ago: non-portable sort.
Back in the early aughts the symptom was that on Linux with an en_US.UTF-8 locale, i < j < k did not hold in sort. We worked around this by setting the locale to C for sort.
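A minimal sketch of that workaround, assuming a POSIX shell: putting the variable assignment in front of the command scopes the override to that single sort invocation, so the rest of the session keeps its locale. The helper name sort_c is hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: run sort in the C locale for this invocation only.
# The assignment prefix applies to the sort process, not to the calling shell.
sort_c() {
    LC_ALL=C sort "$@"
}

# In the C locale, lines are compared byte by byte, so i < j < k holds.
printf '%s\n' k j i | sort_c
```

Because the override lives inside the helper, other commands in the same script still see the user's locale for messages, dates, and so on.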
Today en_US.UTF-8 no longer has such issues, but it sorts in an order that is not ASCII compatible:
web@b01:~$ printf '%s\n' Apple apple Banana banana | LC_ALL=en_US.UTF-8 sort
apple
Apple
banana
Banana
Case comparison evidently is not the first sorting criterion.
There is now a C.UTF-8 locale that produces ASCII-compatible output, as does the C locale:
web@b01:~$ printf '%s\n' Apple apple Banana banana | LC_ALL=C.UTF-8 sort
Apple
Banana
apple
banana
There is also a POSIX locale equivalent to C. This treats only the English alphabet as letters with known upper and lower case variants. Furthermore, the POSIX recommendation is to set LC_ALL=POSIX for utilities such as sort and ls.
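For a script that needs portable ordering throughout, one option is to pin the locale once at the top rather than on each command. This is only a sketch under the assumption that nothing later in the script depends on the user's locale:

```shell
# Sketch: pin the locale for everything this script runs from here on.
# Assumes no later command needs locale-aware messages or collation.
LC_ALL=POSIX
export LC_ALL

# sort now compares bytes, so uppercase ASCII letters come before lowercase.
printf '%s\n' Apple apple Banana banana | sort
```

The trade-off is scope: every child process inherits the setting, which makes output reproducible across machines but also switches off locale-specific behavior everywhere in the script.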
Plus ça change, plus c’est la même chose.