Re: Relaxing the restrictions for store item names
From: Kaelyn
Subject: Re: Relaxing the restrictions for store item names
Date: Fri, 25 Aug 2023 16:32:41 +0000
Hi,
A couple of small early-morning (for me) comments below... not for or against
the idea of percent encoding, but as a little bit of food for thought while
pondering how to handle Unicode in package names and/or store paths.
On Friday, August 25th, 2023 at 2:01 PM, Eidvilas Markevičius
<markeviciuseidvilas@gmail.com> wrote:
> Although now, just a few hours later, I'm having second thoughts on
> this. When you really think about it, it's very unlikely that some
> user would prefer typing something like
>
> guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
>
> over
>
> guix install имагинари-програм
I imagine that, for usability, the percent encoding (or other encoding or
transliteration) of non-ASCII characters could be handled transparently, i.e.
for "guix install имагинари-програм", guix would translate "имагинари-програм"
to the encoded form for operations. And if the escape character (e.g. the "%"
in percent encoding) isn't itself a valid character in store or package names,
then decoding is unambiguous. For example, "guix install git", "guix install
%67%69%74", and "guix install g%69t" would all install git.
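To make the idea concrete, here is a minimal sketch in Python (not Guile, and not anything Guix actually implements) of the transparent handling described above. The helper names normalize_package_name and encode_for_store are made up for illustration; only urllib.parse.quote and urllib.parse.unquote are real library calls.

```python
from urllib.parse import quote, unquote


def normalize_package_name(user_input: str) -> str:
    """Decode percent escapes so every spelling maps to one name.

    Hypothetical helper: assumes "%" is never a literal character in
    a package name, so decoding cannot be ambiguous.
    """
    return unquote(user_input)


def encode_for_store(name: str) -> str:
    """Percent-encode every byte outside ASCII letters, digits, "_.-~".

    Hypothetical helper: non-ASCII characters are encoded as UTF-8
    first, then each byte becomes "%XX".
    """
    return quote(name, safe="")


# All three spellings resolve to the same package name.
for spelling in ("git", "%67%69%74", "g%69t"):
    print(spelling, "->", normalize_package_name(spelling))

# A name typed in Cyrillic is turned into the encoded form.
print(encode_for_store("имагинари-програм"))
```

With something like this in front of the package lookup, the user could type either form and the tooling would only ever see one canonical spelling.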
> even if they don't have the Russian (or whatever other language)
> keyboard layout set up on their system, so just for accessibility
> purposes, the solution wouldn't be all that great.
> It would also make
> store name unnecessarily long (they're already long as is), and
> there's a 255 char limit for filenames that we have to keep in mind as
> well. Searching the store using standard utilities such as find and
> grep would too, as a consequence,
I split out the quote above as a bit of reference. While I agree that we have
to keep in mind the 255 char limit for filenames, 255 chars is still quite a
bit, even with percent encoding turning each encoded byte into 3 bytes (iirc
most non-Latin characters are multi-byte in UTF-8, so one such character
becomes 6 or more bytes) and the store hash taking up a 33 byte prefix
(counting the dash).
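As a rough back-of-the-envelope check of that budget, here is a small Python sketch. The function name store_filename_lengths is hypothetical, the 33 byte prefix is the figure from this message, and treating the 255 limit as a byte count is an assumption on my part.

```python
from urllib.parse import quote

STORE_PREFIX_LEN = 33  # hash plus the dash, as counted in this message
NAME_MAX = 255         # filename limit, assumed here to be in bytes


def store_filename_lengths(name: str) -> tuple[int, int]:
    """Return (raw UTF-8 length, percent-encoded length) in bytes,
    both including the store hash prefix."""
    raw = STORE_PREFIX_LEN + len(name.encode("utf-8"))
    encoded = STORE_PREFIX_LEN + len(quote(name, safe=""))
    return raw, encoded


raw, encoded = store_filename_lengths("имагинари-програм")
print(f"raw UTF-8:       {raw} bytes, {NAME_MAX - raw} left")
print(f"percent-encoded: {encoded} bytes, {NAME_MAX - encoded} left")
```

For the example name from the thread this gives 66 bytes unencoded and 130 bytes percent-encoded, so even the encoded form uses roughly half of the limit (a version suffix would add a little more).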
Specifically, the extracted quote above--without the "> " prefixes and with
line breaks treated as single characters--is exactly 255 characters. (I find a
bit of readable text to be helpful for wrapping my brain around a value like
"255 characters".)
Cheers,
Kaelyn
> break... There's just too many
> problems with this.
>
> I believe what Julien proposed is the most reasonable solution:
> unrestrict unicode characters in the store and (maybe) make it a
> project policy to not put unicode characters inside package names
> (however, personally I wouldn't be against that either).
>
> Now ensuring that URIs don't break, especially for substitute
> provision, should also be taken into consideration, but this can be
> handled separately.
>
> On Fri, Aug 25, 2023 at 12:14 PM Eidvilas Markevičius
> markeviciuseidvilas@gmail.com wrote:
>
> > On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel ncdehnel@gmail.com wrote:
> >
> > > What you could do is implement percent encoding:
> > > https://en.wikipedia.org/wiki/Percent-encoding
> > > -Allows you to store package titles in any language in an encoded form
> > > -Allows the titles to be typed on latin keyboards
> > > -Allows the packages to be accessed through URIs in the future without
> > > causing problems
> >
> > Now that's an idea. I hadn't really thought of that. Although it'd
> > probably be trickier to implement in order to make all the tooling
> > compatible. I think that might be a good solution nonetheless.