[Bug-wget] Another css url bug


From: x86
Subject: [Bug-wget] Another css url bug
Date: Thu, 07 Oct 2010 10:25:04 +0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7) Gecko/20100811 Thunderbird/3.1.1

In crawler mode, wget exits with "bad buffer" when it encounters the following page:
<html>
<font style=i></font>

This can be fixed with the following patch:
--- html-url.c  2009-09-22 07:00:12.000000000 +0400
+++ html-url.c2 2010-10-07 10:13:11.000000000 +0400
@@ -350,12 +350,23 @@
 check_style_attr (struct taginfo *tag, struct map_context *ctx)
 {
   int attrind;
+  int raw_start;
+  int raw_len;
   char *style = find_attr (tag, "style", &attrind);
   if (!style)
     return;

   /* raw pos and raw size include the quotes, hence the +1 -2 */
- get_urls_css (ctx, ATTR_POS(tag,attrind,ctx)+1, ATTR_SIZE(tag,attrind)-2);
+  raw_start = ATTR_POS(tag,attrind,ctx);
+  raw_len   = ATTR_SIZE(tag,attrind);
+  if( *(char *)(ctx->text + raw_start) == '\'' ||
+      *(char *)(ctx->text + raw_start) == '"'){
+       raw_start += 1;
+       raw_len   -= 2;
+  }
+  if(raw_len <= 0)
+       return;
+  get_urls_css (ctx, raw_start, raw_len);
 }

 /* All the tag_* functions are called from collect_tags_mapper, as
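
For readers who want to try the quote handling outside of wget, here is a minimal standalone sketch of the same idea. The helper name trim_attr_quotes and its signature are hypothetical and are not part of wget. It also goes slightly beyond the patch by assuming that quotes should only be stripped when the closing quote matches the opening one. With an unquoted value such as style=i the raw length is 1, so an unconditional "+1 -2" would leave a length of -1, which is presumably what triggers the "bad buffer" exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not wget code.  TEXT is the document buffer;
   *START and *LEN describe the raw attribute value, which may or may
   not be wrapped in quotes.  On return the span excludes a matching
   pair of surrounding quotes.  Returns 1 if a non-empty span remains,
   0 if there is nothing left to scan.  */
static int
trim_attr_quotes (const char *text, int *start, int *len)
{
  assert (text != NULL && start != NULL && len != NULL);

  if (*len <= 0)
    return 0;

  char first = text[*start];
  char last = text[*start + *len - 1];

  /* Strip only when the value really is quoted: it begins with a quote
     character and ends with the same one (assumption beyond the patch).  */
  if ((first == '\'' || first == '"') && *len >= 2 && last == first)
    {
      *start += 1;
      *len -= 2;
    }

  return *len > 0;
}
```

A caller would pass the raw position and size of the attribute value and skip the CSS scan whenever the function returns 0, in the same way the patch returns early when raw_len <= 0.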



