[db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes
Job Snijders
job at fastly.com
Fri Nov 24 10:21:30 CET 2023
Dear Edward,

On Fri, Nov 24, 2023 at 10:03:15AM +0100, Edward Shryane via db-wg wrote:
> Currently the RIPE database only allows a subset of ASCII characters
> in the "org-name:", "person:" and "role:" attributes, for a few
> reasons including:
>
> * These attributes are also a look-up key and the Whois protocol does
>   not allow specifying character sets in queries.
> * RPSL names are ASCII according to RFC2622
> * Using a normalised name makes the object easier to query
> * Reading a normalised name is easier to interpret
>
> However there are some drawbacks to forcing names to only use a subset
> of ASCII characters:
>
> * Organisations, roles and persons cannot use their actual name if it
>   includes characters outside this subset.
> * Normalisation is not standard, but is an interpretation done by each
>   maintainer, e.g. characters could be excluded or converted in
>   different ways.

The above two points are key in making the RIPE database useful and
accessible to everyone, I too would love to see those points addressed.

> Since we support the Latin-1 character set in the RIPE database, I
> propose we also allow non-ASCII Latin-1 characters in these
> attributes.
>
> Querying for a name can be done either using the latin-1 characters
> (proposed) or a normalised, ASCII representation (currently). The
> normalised version will be generated by Whois and stored in a database
> index for querying. The primary key will also be generated from the
> normalised version.
>
> Please let me know your feedback.

Wouldn't it be an opportune time to support UTF-8 instead of LATIN-1?
As I understand it, through the use of UTF-8 more languages could be
supported. UTF-8 seems to be the preferred character encoding in any new
IETF work (for good reason). Have the effects of LATIN-1 on downstream
applications such as NRTM v3 and NRTM v4 been considered?
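To make the difference concrete, here is a rough Python sketch of which names fit in LATIN-1 and what a simple ASCII normalisation might do to them. The helper names `fits_latin1` and `ascii_fold` are made up for illustration, and the folding rule (NFKD decomposition, then dropping whatever is not ASCII) is only one possible interpretation. It is not how Whois generates its normalised index.

```python
import unicodedata


def fits_latin1(name: str) -> bool:
    """Return True if every character of the name exists in ISO-8859-1."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def ascii_fold(name: str) -> str:
    """Hypothetical normalisation: strip accents, drop anything else non-ASCII.

    This is an assumed rule for illustration only, not the Whois algorithm.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with "ignore" silently discards the combining marks
    # and every character that has no ASCII decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


for name in ["José Müller", "Łukasz", "Αθήνα"]:
    print(name, fits_latin1(name), repr(ascii_fold(name)))
```

Under this assumed rule, "José Müller" fits in LATIN-1 and folds cleanly, while "Łukasz" and "Αθήνα" do not fit in LATIN-1 at all and lose characters when folded. All three can be encoded in UTF-8.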
You indicate that LATIN-1 already is supported in the RIPE database, so
I imagine you and the team already deliberated on the pros and cons of
UTF-8 vs LATIN-1; and as such concluded with this particular
recommendation. I just wanted to make sure to raise these questions. :-)

Some interesting reading material on UTF-8: https://utf8everywhere.org/

Kind regards,

Job