> Unix paths don’t need to be valid UTF-8 and most programs happily pipe the mess through into text that should be valid
How about a new mount option, utf8_only? When it is set on a volume, the VFS would reject any attempt to create a new file or directory whose name isn't valid UTF-8. (Pre-existing files and directories with invalid UTF-8 names could still be accessed.) Distributions could set it by default on all filesystems, but a user could turn it off if it caused a problem for them (which in practice is probably going to be rare).
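To make the creation-time check concrete, here is a rough userspace sketch in Python (the real thing would live in the kernel, in C). The names `is_valid_utf8_name` and `may_create` are made up for illustration, and the sketch only models the rule described above: new names are checked, existing ones are left alone.

```python
def is_valid_utf8_name(name: bytes) -> bool:
    """Return True if the raw filename bytes are well-formed UTF-8."""
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong encodings, and encoded surrogates.
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def may_create(name: bytes, utf8_only: bool) -> bool:
    """Hypothetical creation-time check; lookups of existing names skip it."""
    return not utf8_only or is_valid_utf8_name(name)
```

With `utf8_only` off, `may_create` accepts anything, which matches today's behaviour.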
One could also have a flag on the filesystem itself (e.g. in the superblock), similar to utf8_only, which could only be set at filesystem creation time. If it is set, then any invalid UTF-8 in a filename counts as filesystem corruption, which fsck could repair. A filesystem with such a flag set would ban invalid UTF-8 irrespective of any utf8_only mount option.
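A sketch of how the two settings might combine, plus a userspace stand-in for the scan fsck would do. Both `enforce_utf8` and `find_invalid_names` are hypothetical names, and a real fsck would read on-disk directory entries rather than walk a mounted tree; this is only meant to show the idea.

```python
import os
from typing import Iterator


def enforce_utf8(sb_flag: bool, mount_utf8_only: bool) -> bool:
    """The on-disk flag wins: if it is set, the mount option cannot relax it."""
    return sb_flag or mount_utf8_only


def find_invalid_names(root: bytes) -> Iterator[bytes]:
    """Yield paths under root whose final component is not valid UTF-8."""
    # Passing a bytes path makes os.walk return raw, undecoded names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8", errors="strict")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

On a filesystem with the flag set, anything `find_invalid_names` yields would be what fsck reports as corruption; what "repair" means (rename, escape, move to lost+found) is left open here, as it is above.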
If we are going to ban invalid UTF-8, it would be a good idea for security reasons to ban C0 controls as well (i.e. all characters in the range U+0001 to U+001F; NUL can't appear in a filename anyway), see [1]. This could be included in the utf8_only mount option / filesystem flag, or be an independent mount option / filesystem flag. If going with the same flag for both, "sane_filenames_only" might be a better name.
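The C0 check is simple enough to do on raw bytes. A minimal sketch, with `has_c0_control` as a made-up name, covering exactly the U+0001 to U+001F range given above:

```python
def has_c0_control(name: bytes) -> bool:
    """Return True if the name contains any byte in 0x01..0x1F."""
    # In UTF-8, bytes below 0x80 only ever encode themselves, so a
    # byte-level scan cannot misfire inside a multi-byte sequence.
    return any(0x01 <= b <= 0x1F for b in name)
```

This catches newlines, tabs, and ESC (0x1B), which are the usual culprits when filenames end up in shell scripts or on a terminal.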
(Actually, for security, one should ban the UTF-8 encodings of the C1 controls as well... the CSI character U+009B might be interpreted as an ESC[ by some applications, which could have nefarious consequences. Likewise, the APC (application program command) and OSC (operating system command) characters could cause security issues, although in practice support for them is rather limited, which limits the scope of the security issues they pose.)
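The C1 controls (U+0080 to U+009F) all encode in UTF-8 as the byte 0xC2 followed by a byte in 0x80..0x9F, so they can also be spotted at the byte level. A sketch, assuming the name has already passed a UTF-8 validity check; `has_c1_control` is a hypothetical name:

```python
def has_c1_control(name: bytes) -> bool:
    """Return True if the name contains a UTF-8-encoded C1 control."""
    # Assumes `name` is already known to be valid UTF-8, so 0xC2 is
    # always a lead byte here and never a continuation byte.
    return any(
        lead == 0xC2 and 0x80 <= trail <= 0x9F
        for lead, trail in zip(name, name[1:])
    )
```

CSI (U+009B), OSC (U+009D), and APC (U+009F) all fall in this range, while ordinary two-byte characters such as U+00A0 or "é" do not.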
[1] https://www.austingroupbugs.net/view.php?id=251