bug-gnu-emacs




From: Lars Ingebrigtsen
Subject: bug#40794: 26.3; HTML entities ☆ and ★ (inter alia) are not parsed by libxml-parse-html-region
Date: Wed, 29 Jul 2020 07:35:51 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

I had a look at the libxml2 sources.  The logic isn't really explained,
but apparently they include all the entities with values up to 255, and
then a selected number of the other entities (about 160 of them).

I have no idea what the logic behind this is...  perhaps they've just
forgotten to add the new ones?  Which makes me think that this is really
a libxml2 bug, and you should report it there instead.

Excerpt:

/************************************************************************
 *                                                                      *
 *      The list of HTML predefined entities                            *
 *                                                                      *
 ************************************************************************/

static const htmlEntityDesc  html40EntitiesTable[] = {
/*
 * the 4 absolute ones, plus apostrophe.
 */
{ 34,   "quot", "quotation mark = APL quote, U+0022 ISOnum" },
{ 38,   "amp",  "ampersand, U+0026 ISOnum" },
{ 39,   "apos", "single quote" },
{ 60,   "lt",   "less-than sign, U+003C ISOnum" },
{ 62,   "gt",   "greater-than sign, U+003E ISOnum" },

/*
 * A bunch still in the 128-255 range
 * Replacing them depend really on the charset used.
 */
{ 160,  "nbsp", "no-break space = non-breaking space, U+00A0 ISOnum" },
{ 161,  "iexcl","inverted exclamation mark, U+00A1 ISOnum" },
{ 162,  "cent", "cent sign, U+00A2 ISOnum" },

[...]

{ 376,  "Yuml", "latin capital letter Y with diaeresis, U+0178 ISOlat2" },

/*
 * Anything below should really be kept as entities references
 */
{ 402,  "fnof", "latin small f with hook = function = florin, U+0192 ISOtech" },

{ 710,  "circ", "modifier letter circumflex accent, U+02C6 ISOpub" },
{ 732,  "tilde","small tilde, U+02DC ISOdia" },

{ 913,  "Alpha","greek capital letter alpha, U+0391" },
{ 914,  "Beta", "greek capital letter beta, U+0392" },
{ 915,  "Gamma","greek capital letter gamma, U+0393 ISOgrk3" },
{ 916,  "Delta","greek capital letter delta, U+0394 ISOgrk3" },

[...]

{ 9824, "spades","black spade suit, U+2660 ISOpub" },
{ 9827, "clubs","black club suit = shamrock, U+2663 ISOpub" },
{ 9829, "hearts","black heart suit = valentine, U+2665 ISOpub" },
{ 9830, "diams","black diamond suit, U+2666 ISOpub" },
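
To illustrate why a name missing from such a table goes unparsed, here is a
minimal, self-contained sketch of a name lookup over a few of the entries
quoted above.  The type `entity_desc` and the helper `lookup_entity` are
hypothetical names made up for this illustration and are not libxml2's actual
code; the sketch also assumes the entity in the report is spelled `starf`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the three-field entries in the excerpt; the type name is
 * hypothetical. */
typedef struct {
    unsigned int value;   /* Unicode code point */
    const char  *name;    /* entity name without '&' and ';' */
    const char  *desc;    /* human-readable description */
} entity_desc;

/* Only entries quoted in the excerpt; the real table is much longer. */
static const entity_desc entity_table[] = {
    { 34,   "quot",   "quotation mark = APL quote, U+0022 ISOnum" },
    { 38,   "amp",    "ampersand, U+0026 ISOnum" },
    { 60,   "lt",     "less-than sign, U+003C ISOnum" },
    { 62,   "gt",     "greater-than sign, U+003E ISOnum" },
    { 9824, "spades", "black spade suit, U+2660 ISOpub" },
    { 9829, "hearts", "black heart suit = valentine, U+2665 ISOpub" },
};

/* Hypothetical helper: linear search by name.  Returns NULL when the
 * name is not in the table. */
static const entity_desc *lookup_entity(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(entity_table) / sizeof(entity_table[0]); i++) {
        if (strcmp(entity_table[i].name, name) == 0)
            return &entity_table[i];
    }
    return NULL;
}
```

With this table, `lookup_entity("hearts")` finds the 9829 entry, while
`lookup_entity("starf")` returns NULL, so a parser built on a table like
this has nothing to substitute for the reference.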


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





