bug-wget
Re: [PATCH] Support for <meta charset=...> tag


From: Sho Amano
Subject: Re: [PATCH] Support for <meta charset=...> tag
Date: Tue, 28 Jul 2020 19:25:26 +0900

Hi Tim! Thanks for your reply.

Yes, sure, please find it attached. I created it against the latest
`master` branch, so I hope it applies now.

Best Regards,
Sho

On Sun, 26 Jul 2020 at 20:52, Tim Rühsen <tim.ruehsen@gmx.de> wrote:
>
> Ah, sorry, just saw this email with your patch :-)
>
> Could you send your patch as an attachment? Git can't am/apply the
> inline patch here.
>
> Regards, Tim
>
> On 15.07.20 12:55, Sho Amano wrote:
> > Hi! I've been using the first version of wget for a long time and,
> > first of all, I want to say thank you to all of the maintainers and
> > contributors of this project!
> >
> > I was looking at the code recently and found that it doesn't support
> > the "<meta charset=...>" tag yet.
> > I don't see any issues in the bug tracker related to this, so I created
> > a patch. I hope it helps.
> >
> > I also attach two HTML files for verification. One of them specifies a
> > Japanese path in UTF-8; the other does so in Shift-JIS. Serve these
> > files on localhost:8080 and let wget follow the link, e.g.
> > `wget -d --recursive --level=2 http://localhost:8080/charset_test_shift_jis.html`.
> > Verify that in both cases wget tries to download
> > http://localhost:8080/%E6%97%A5%E6%9C%AC%E8%AA%9E.html.
> >
> > Thanks!
> > Sho Amano
> >
> > ---
> >  src/html-url.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/html-url.c b/src/html-url.c
> > index b80cf269..5324d244 100644
> > --- a/src/html-url.c
> > +++ b/src/html-url.c
> > @@ -182,6 +182,7 @@ static const char *additional_attributes[] = {
> >    "http-equiv",                 /* used by tag_handle_meta  */
> >    "name",                       /* used by tag_handle_meta  */
> >    "content",                    /* used by tag_handle_meta  */
> > +  "charset",                    /* used by tag_handle_meta  */
> >    "action",                     /* used by tag_handle_form  */
> >    "style",                      /* used by check_style_attr */
> >    "srcset",                     /* used by tag_handle_img */
> > @@ -191,7 +192,7 @@ static struct hash_table *interesting_tags;
> >  static struct hash_table *interesting_attributes;
> >
> >  /* Will contains the (last) charset found in 'http-equiv=content-type'
> > -   meta tags  */
> > +   or 'charset' meta tags  */
> >  static char *meta_charset;
> >
> >  static void
> > @@ -574,6 +575,7 @@ tag_handle_meta (int tagid _GL_UNUSED, struct taginfo *tag, struct map_context *
> >  {
> >    char *name = find_attr (tag, "name", NULL);
> >    char *http_equiv = find_attr (tag, "http-equiv", NULL);
> > +  char *charset = find_attr (tag, "charset", NULL);
> >
> >    if (http_equiv && 0 == c_strcasecmp (http_equiv, "refresh"))
> >      {
> > @@ -673,6 +675,20 @@ tag_handle_meta (int tagid _GL_UNUSED, struct taginfo *tag, struct map_context *
> >              }
> >          }
> >      }
> > +  else if (charset)
> > +    {
> > +      /* Handle stuff like:
> > +         <meta charset="CHARSET">
> > +         If charset is acquired from http-equiv then it is overwritten. */
> > +
> > +      /* Do a minimum check on the charset value */
> > +      if (check_encoding_name (charset))
> > +        {
> > +          char *mcharset = xstrdup (charset);
> > +          xfree (meta_charset);
> > +          meta_charset = mcharset;
> > +        }
> > +    }
> >  }
> >
> >  /* Handle the IMG tag.  This requires special handling for the srcset attr,
> >
>
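
The new `else if (charset)` branch in the quoted patch validates the attribute value and, if it passes, replaces whatever charset was remembered before. A minimal standalone sketch of that replace-on-valid-value pattern follows. It is not wget code: `page_charset`, `plausible_charset_name` and `remember_charset` are hypothetical names, and the check shown here (ASCII only, no whitespace or control characters) is only an assumption about what a "minimum check" might look like, not the behaviour of wget's `check_encoding_name`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the file-scope `meta_charset` in the patch. */
static char *page_charset;

/* Hypothetical minimum check (an assumption, not wget's implementation):
   accept a non-empty, ASCII-only name without whitespace or control
   characters.  The caller guarantees NAME is non-NULL, as the patch's
   `else if (charset)` test does. */
static bool
plausible_charset_name (const char *name)
{
  assert (name != NULL);
  if (*name == '\0')
    return false;
  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    if (*p >= 0x80 || isspace (*p) || iscntrl (*p))
      return false;
  return true;
}

/* Remember NAME as the page charset, replacing any earlier value.
   The copy is made before the old value is freed, so a failed
   allocation leaves the previous charset untouched.  Returns true
   if NAME was stored. */
static bool
remember_charset (const char *name)
{
  if (!plausible_charset_name (name))
    return false;

  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (copy == NULL)
    return false;
  memcpy (copy, name, len);

  free (page_charset);
  page_charset = copy;
  return true;
}
```

Calling `remember_charset ("Shift_JIS")` and then `remember_charset ("UTF-8")` leaves `page_charset` pointing at "UTF-8", which mirrors the "last valid value wins" behaviour the patch comment describes.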

Attachment: 0001-Support-charset-through-meta-charset-CHARSET.patch
Description: Binary data
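
For the verification step in the quoted message, the expected URL ends in `%E6%97%A5%E6%9C%AC%E8%AA%9E.html`, which is the percent-encoded UTF-8 form of the Japanese file name. The sketch below shows only that final encoding step, assuming the name has already been converted to UTF-8 (for the Shift-JIS test page that conversion has to happen first and is not shown). `percent_encode` is a hypothetical helper written for illustration, not a wget function; it keeps the RFC 3986 unreserved characters and escapes every other byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of wget: percent-encode every byte of
   SRC that is not an RFC 3986 "unreserved" character (ALPHA, DIGIT,
   '-', '.', '_', '~').  Writes a NUL-terminated result into DST and
   returns its length, or -1 if DST_SIZE is too small. */
static int
percent_encode (const char *src, char *dst, size_t dst_size)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;

  assert (src != NULL && dst != NULL && dst_size > 0);
  for (const unsigned char *p = (const unsigned char *) src; *p; p++)
    {
      bool keep = (*p < 0x80 && isalnum (*p)) || strchr ("-._~", *p) != NULL;
      size_t need = keep ? 1 : 3;

      /* Leave room for this byte's output plus the trailing NUL. */
      if (n + need + 1 > dst_size)
        return -1;

      if (keep)
        dst[n++] = (char) *p;
      else
        {
          dst[n++] = '%';
          dst[n++] = hex[*p >> 4];
          dst[n++] = hex[*p & 0x0F];
        }
    }
  dst[n] = '\0';
  return (int) n;
}
```

Passing the nine UTF-8 bytes `E6 97 A5 E6 9C AC E8 AA 9E` followed by `.html` yields the path component shown in the expected URL.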

