Locale for Sorting

Today I ran afoul of a problem that I first encountered over 20 years ago: non-portable sort.

Back in the early aughts the symptom was that on Linux with an en_US.UTF-8 locale, i < j < k did not hold in sort output. We worked around this by setting the locale to C for sort.

Today en_US.UTF-8 no longer has such issues, but it sorts in an order that is not ASCII-compatible:

web@b01:~$ printf '%s\n' Apple apple Banana banana | LC_ALL=en_US.UTF-8 sort
apple
Apple
banana
Banana

Evidently case is not the first sorting criterion: the words are ordered by letter first, and case only breaks ties, with lowercase ahead of uppercase.

There is now a C.UTF-8 locale that produces ASCII-compatible output, as does the C locale:

web@b01:~$ printf '%s\n' Apple apple Banana banana | LC_ALL=C.UTF-8 sort
Apple
Banana
apple
banana

There is also a POSIX locale, equivalent to C. It treats only the letters of the English alphabet as letters with known upper- and lowercase variants. Furthermore, the POSIX recommendation is to set LC_ALL=POSIX for utilities such as sort and ls.
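
The workaround of pinning the locale just for sort can be wrapped in a small shell function, so that scripts do not depend on whatever locale the caller happens to have. This is only a sketch: sort_bytes is a made-up name, not a standard utility, and it assumes nothing beyond a POSIX shell and sort. LC_ALL=POSIX should serve equally well in place of C.

```shell
# Hypothetical helper (the name is an assumption, not a standard tool):
# sort in byte order no matter what locale the caller has set.
sort_bytes() {
    # The assignment applies only to this one sort invocation;
    # the rest of the script keeps its own locale.
    LC_ALL=C sort "$@"
}

# Uppercase ASCII letters sort ahead of lowercase ones in the C locale.
printf '%s\n' Apple apple Banana banana | sort_bytes
# Apple
# Banana
# apple
# banana
```

Setting the variable on the command rather than exporting it keeps the change local, so other tools in the same script still see the user's locale for messages and character handling.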

Plus ça change, plus c’est la même chose.