
[Bug-wget] RE: Bug-wget Digest, Vol 11, Issue 21


From: e-soft Helbok Hermann
Subject: [Bug-wget] RE: Bug-wget Digest, Vol 11, Issue 21
Date: Sat, 26 Sep 2009 22:22:15 +0200

Dear Sirs,

Today I found what is probably the reason for the crashes:

The crashes come from the cookie handling, somewhere in that part of the source.

If I call your program with the flag --no-cookies,
then no crash occurs.

Maybe this helps (I made about 10,000 calls with this
flag and didn't get any access violation using wget).



Company: e-soft Helbok
SOFTWARE DEVELOPMENT, CONSULTING, MANAGEMENT CONSULTING
Postal address: Postfach 267, A-6010 Innsbruck
Mobile: ++43-699-14106694 Skype ID: Hermann.Helbok
Reachable by phone: Mon – Fri 10:00 – 12:00
e-mail: address@hidden
Web: http://www.esoftic.com/
Managing director: Mr. Hermann Helbok



-----Original Message-----
From: address@hidden
[mailto:address@hidden On Behalf Of
address@hidden
Sent: Saturday, 26 September 2009 18:03
To: address@hidden
Subject: Bug-wget Digest, Vol 11, Issue 21


Today's Topics:

   1. Re: Thoughts on regex support (Matthew Woehlke)
   2. Re: Re: Thoughts on regex support (Micah Cowan)
   3. RE: Re: Thoughts on regex support (Tony Lewis)
   4. please support MacOS's "Cookies.plist" (Jamie Zawinski)
   5. Re: Wget 1.12 v. VMS (Steven M. Schweda)


----------------------------------------------------------------------

Message: 1
Date: Fri, 25 Sep 2009 12:43:25 -0500
From: Matthew Woehlke <address@hidden>
Subject: [Bug-wget] Re: Thoughts on regex support
To: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset=UTF-8; format=flowed

Tony Lewis wrote:
> Micah Cowan wrote:
>> Tony Lewis wrote:
>>> Given that the most common use case is to match against suffixes in the
>>> path, perhaps ':path/i:^.*\.' and '$' should be implied so that
>>> --traverse '(html?|php)' is interpreted as ':path/i:^.*\.(html?|php)$'.
>> Again, I really want consistency with the regex rules.
> 
> OK. So how about adding :suffix: to the mix. Then one can say --traverse
> ':suffix/i:(html?|php)'.

I don't think this will work very well. What is the suffix of 
'vacation_plans.odt.bak'?

> In all the places that I work with regular expressions, anchors are
> explicitly specified so *I* would be most surprised by having implicit
> anchors.

find(1) :-). But that's the /only/ example of implicit anchoring I can 
think of (and actually, Micah pointed it out, I don't know that I have 
ever used regex with find).

> What about the possibility of including multiple components in the same
> argument to match?

If we do that, better to just implement full Boolean logic IMO. Of 
course I think PCRE allows toggling case sensitivity for parts of the 
regex, which would solve this.

Um... if we require PCRE, we might not need flags at all. And we can 
drop them safely, because the syntax was such that they could still be 
re-added later.

> In your proposal am I allowed to supply two --match parameters that are
> OR'ed together?

URLs are accepted iff:
[ANY match evaluates true] AND NOT [ANY no-match evaluates true]
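
A minimal sketch of that acceptance rule in C, using POSIX <regex.h>
rather than PCRE. The function names are hypothetical and are not part of
Wget. The message does not say what should happen when no --match is
given at all; this sketch follows the rule literally, so an empty match
list accepts nothing.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if any pattern in pats[0..n-1] matches s.
   Patterns are POSIX extended regexes; a pattern that fails to
   compile is treated as a non-match in this sketch. */
static bool any_match(const char *const *pats, size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, pats[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        int rc = regexec(&re, s, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: accept iff [ANY match] AND NOT [ANY no-match]. */
bool url_accepted(const char *url,
                  const char *const *match, size_t n_match,
                  const char *const *no_match, size_t n_no_match)
{
    return any_match(match, n_match, url)
        && !any_match(no_match, n_no_match, url);
}
```

Two --match patterns are therefore OR'ed together, and a single
--no-match hit vetoes the URL regardless of how many --match patterns hit.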

-- 
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
-- 
I want to vote for a Conservative Democrat. Too bad they're about as 
rare as an Honest Politician. Maybe I'll get lucky and someone will come 
along that's both.





------------------------------

Message: 2
Date: Fri, 25 Sep 2009 11:11:18 -0700
From: Micah Cowan <address@hidden>
Subject: Re: [Bug-wget] Re: Thoughts on regex support
To: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset=UTF-8

Matthew Woehlke wrote:
> Micah Cowan wrote:
>> Tony Lewis wrote:
>>>> If the components aren't specified, it would default to matching just
>>>> the pathname portion of the URL.
>>> I'm not sure this is the obvious behavior, but I would get used to it.
>>
>> It's open for discussion. What do you think the most obvious behavior
>> would be? Full-url? I'm currently trying to aim for
>> most-frequently-used, over most-obvious, so if you think that'd be a
>> different component (or slice of components), lemme know.
> 
> I'd missed this point in the original message. I would think full url is
> most obvious. I'd be hesitant to guess what 'most used' would be; that
> tends to be a failing proposition for at least some audiences. Ergo
> since no solution is best from 'most used' standpoint, 'most sensible'
> wins out IMHO.
> 
> (And I personally think url is more obvious than ':s-p:'...)

I'm not so sure. I think a lot of people who did --match '\.html$' would
be miffed when they discover that the match fails on
"index.html?foo=bar". It depends somewhat on the site, but I think in
general :s-p: will be less surprising than :url:.
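
The difference can be sketched in C with POSIX <regex.h>. This assumes
that :s-p: means "scheme through path", approximated here as everything
before the first '?' or '#'; the function names are hypothetical and the
fixed buffer size is a simplification.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the POSIX extended regex `pattern` matches the first
   `len` bytes of `s`.  Returns false on compile failure or if the slice
   does not fit the local buffer (a simplification for this sketch). */
static bool matches_slice(const char *pattern, const char *s, size_t len)
{
    char buf[2048];
    regex_t re;
    bool ok;

    if (len >= sizeof buf)
        return false;
    memcpy(buf, s, len);
    buf[len] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, buf, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Match against the whole URL, query string and fragment included. */
bool match_full_url(const char *pattern, const char *url)
{
    return matches_slice(pattern, url, strlen(url));
}

/* Match against scheme through path only: everything before the first
   '?' or '#'.  This approximates the ":s-p:" slice from the thread. */
bool match_scheme_to_path(const char *pattern, const char *url)
{
    return matches_slice(pattern, url, strcspn(url, "?#"));
}
```

With the pattern '\.html$' and a URL ending in "index.html?foo=bar", the
full-URL match fails because '$' anchors after "bar", while the
scheme-to-path match succeeds.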

BTW, I'm starting to think "-" was a poor choice... it'd be nice to
reserve that for possible use in component names. Maybe :s..p: is the
best way to go after all.

(mine:)
>> As already discussed, --match and --no-match would be analogs to -A and
>> -R; they'd just use regexes rather than wildcards (and have wider
>> options for what portions you're matching against).
> 
> Thought: is it possible to alter the syntax of -A/-R to tell these that
> you are matching a regex rather than a glob? Maybe by requiring the '::'?

I'm not crazy about that. It would save us the consumption of new
short-options, but...

I'm not sure I can identify what my objection is; obviously it'd break
any previous scripts that happen to match that pattern, but that has
gotta be pretty frikkin' rare.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/




------------------------------

Message: 3
Date: Fri, 25 Sep 2009 17:26:19 -0700
From: "Tony Lewis" <address@hidden>
Subject: RE: [Bug-wget] Re: Thoughts on regex support
To: "'Micah Cowan'" <address@hidden>,   <address@hidden>
Message-ID: <address@hidden>
Content-Type: text/plain;       charset="UTF-8"

Micah Cowan wrote:

>Matthew Woehlke wrote:
>> Thought: is it possible to alter the syntax of -A/-R to tell these that
>> you are matching a regex rather than a glob? Maybe by requiring the '::'?
>
>I'm not crazy about that. It would save us the consumption of new
>short-options, but...
>
>I'm not sure I can identify what my objection is; obviously it'd break
>any previous scripts that happen to match that pattern, but that has
>gotta be pretty frikkin' rare.

Especially since, AFAIK, '::' is not valid anywhere within a URL; the way I
read RFC 1630, the only place an unescaped ':' can appear is after the
scheme.

Tony





------------------------------

Message: 4
Date: Fri, 25 Sep 2009 15:44:31 -0700
From: Jamie Zawinski <address@hidden>
Subject: [Bug-wget] please support MacOS's "Cookies.plist"
To: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes

When I want to do cookie-ish things with wget, it sure is a hassle
that I first have to convert ~/Library/Cookies/Cookies.plist into a
Netscape-4 style cookie file.  (Even Firefox doesn't support
that format of cookies.txt any more!)

Library/Cookies/Cookies.plist is shared by all HTTP-using applications  
on MacOS (it's a WebKit thing, not a Safari thing.)  It's a trivial  
XML format:

<array>
   <dict>
     <key>Domain</key>
     <string>.foo.com</string>

     <key>Name</key>
     <string>...</string>

     <key>Path</key>
     <string>/</string>

     <key>Value</key>
     <string>...</string>
   </dict>
...
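
For reference, a sketch in C of the conversion target: one line of a
Netscape-style cookies.txt built from the four keys shown above. Parsing
the plist itself is not covered. The struct and function names are
hypothetical, and the `secure` and `expires` fields are assumptions,
since the excerpt elides whatever other keys the file carries.

```c
#include <stdbool.h>
#include <stdio.h>

/* Fields mirroring the keys in the plist excerpt.  `secure` and
   `expires` are assumed; the excerpt does not show such keys. */
struct cookie {
    const char *domain;
    const char *name;
    const char *path;
    const char *value;
    bool secure;
    long expires;       /* seconds since the Unix epoch */
};

/* Write one tab-separated cookies.txt record into buf:
   domain, subdomain flag, path, secure flag, expiry, name, value.
   Returns what snprintf returns. */
int format_netscape_cookie(char *buf, size_t size, const struct cookie *c)
{
    /* A leading dot on the domain means the cookie covers subdomains. */
    bool subdomains = c->domain[0] == '.';

    return snprintf(buf, size, "%s\t%s\t%s\t%s\t%ld\t%s\t%s\n",
                    c->domain, subdomains ? "TRUE" : "FALSE", c->path,
                    c->secure ? "TRUE" : "FALSE", c->expires,
                    c->name, c->value);
}
```

A file of such lines is the kind of input that wget --load-cookies reads.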

--
Jamie Zawinski       address@hidden                  http://www.jwz.org/
                     address@hidden      http://www.dnalounge.com/
                                          http://jwz.livejournal.com/





------------------------------

Message: 5
Date: Fri, 25 Sep 2009 21:40:17 -0500 (CDT)
From: address@hidden (Steven M. Schweda)
Subject: Re: [Bug-wget] Wget 1.12 v. VMS
To: address@hidden
Message-ID: <address@hidden>

From: Micah Cowan

> Steven M. Schweda wrote:
> >    7a.  Lacking Unix-like builders, I stuck some place-holders for
> > "compilation_string" and "link_string" into "src/vms.c":
> 
> Is there no equivalent that could be used in the VMS builders? If not,
> the NULL check seems reasonable to me.

   I know of nothing.  The compiler options are revealed in a compiler
listing file, and the linker options may be revealed in a linker map
file, but none of that stuff would be easy to obtain, or, I claim, of
particular interest to the user.

> >    And, speaking of build-related info, would it be useful on other
> > system types to include an indicator of large-file support somewhere in
> > this stuff?  (Or has large-file support become so common that no one
> > else worries about it now?)  A "sizeof off_t" test would be good enough
> > for me.
> 
> It can't hurt.

   You say that now.

   And, as the author of much sloppy code, I hate to complain -- Well, I
enjoy it, but I'm reluctant -- but in "src/main.c":

      [...]
      /* defined in version.c */
      extern char *version_string;
      extern char *compilation_string;
      extern char *system_getrc;
      extern char *link_string;
      /* defined in build_info.c */
      extern char *compiled_features[];
      [...]

Isn't this why the Great Spirit gave us header files?  Especially, for
example, when the real "compiled_features" is declared differently
("src/build_info.c"):
      const char* (compiled_features[])
(And do those parentheses really do anything?  My "cdecl" says no.)
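
   A small C sketch of the point, with a hypothetical header name and
placeholder array contents. The first declaration is what a shared
header would carry; the second is the parenthesized spelling from
"src/build_info.c". A compiler accepts both in one translation unit,
which is only possible if they declare the same type, so the parentheses
change nothing.

```c
#include <stddef.h>

/* What a shared header (hypothetical name: build_info.h) would declare. */
extern const char *compiled_features[];

/* The definition spelled as in src/build_info.c, parentheses included.
   It is accepted after the declaration above, so the two spellings
   name the same type.  The entries here are placeholders. */
const char* (compiled_features[]) = { "+digest", "-nls", NULL };

/* Count entries up to the terminating NULL. */
size_t count_features(void)
{
    size_t n = 0;
    while (compiled_features[n] != NULL)
        n++;
    return n;
}
```

   Had the first line read "extern char *compiled_features[];", the
compiler would reject the pair as conflicting types. Split across two
source files without a common header, the same mismatch goes
undiagnosed, which is the case for using a header.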

   At the moment, what I have looks like this:

[...]
extern const char *compiled_features[];  /* I fixed only this one. */
[...]
  int i;
                /* Changes begin. */
  int j;
  int line_length;
  const char *runtime_features[] = { "/", NULL, NULL };
  const char **features[] = { compiled_features, runtime_features, NULL };

  /* Fill runtime_features[].  Be sure to leave an array-terminating NULL. */
  if (sizeof (off_t) >= 8)
    runtime_features[1] = "+large-file";
  else
    runtime_features[1] = "-large-file";

  printf (_("GNU Wget %s built on %s.\n\n"), version_string, OS_TYPE);
  /* compiled_features is a char*[]. We limit the characters per
     line to MAX_CHARS_PER_LINE and prefix each line with a constant
     number of spaces for proper alignment. */
  printf (_("Compile-time/Run-time features:\n"));
  line_length = MAX_CHARS_PER_LINE - TABULATION;
  printf ("%*c", TABULATION, ' ');
  for (j = 0; features[j] != NULL; )
    {
      for (i = 0; features[j][i] != NULL; )
        {
          int len = strlen (features[j][i]) + 1;
          line_length -= len;
          if (line_length < 0)
            {
              printf ("\n%*c", TABULATION, ' ');
              line_length = MAX_CHARS_PER_LINE - TABULATION - len;
            }
          printf ("%s ", features[j][i]);
          i++;
        }
      j++;
    }
  printf ("\n");


Here, the result looks like this:

ALP $ wgxl --version
GNU Wget 1.12 built on VMS Alpha V7.3-2.

Compile-time/Run-time features:
    +digest +ipv6 -nls +ntlm +opie +md5/builtin +https -gnutls +openssl
    -iri / +large-file
Wgetrc:
    SYS$LOGIN:.wgetrc (user)

Copyright (C) 2009 Free Software Foundation, Inc.
[...]

   I haven't checked it for portability problems (what could go wrong?)
but I claim that this code actually does limit the line length properly
(although I didn't trim that trailing space), and it really is indented,
as the comment suggests.  It was so easy to toss in that "/" separator
between the two classes of items, that I couldn't resist, but it's easy
to remove, if desired.  Lots of formatting changes could be made, of
course.  And the new array could be relegated to some other backwater,
like "src/build_info.c", or some new module.
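
   For anyone who wants to try the wrapping logic outside Wget, here is
a self-contained sketch of the same loop that writes into a buffer
instead of calling printf. The function names are hypothetical, and the
values 72 and 4 for MAX_CHARS_PER_LINE and TABULATION are assumptions,
chosen because they reproduce the line break shown in the output above.
It also assumes a POSIX <sys/types.h> for off_t.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>          /* off_t (POSIX) */

#define MAX_CHARS_PER_LINE 72   /* assumed value */
#define TABULATION 4            /* assumed value */

/* Report large-file support the way the message proposes. */
const char *large_file_feature(void)
{
    return sizeof(off_t) >= 8 ? "+large-file" : "-large-file";
}

/* Write both NULL-terminated feature lists into out, wrapped and
   indented.  Returns the length written, or -1 if out is too small. */
int format_features(char *out, size_t size,
                    const char *const *compiled,
                    const char *const *runtime)
{
    const char *const *lists[] = { compiled, runtime, NULL };
    int line_length = MAX_CHARS_PER_LINE - TABULATION;
    size_t used;
    int n;

    n = snprintf(out, size, "%*c", TABULATION, ' ');
    if (n < 0 || (size_t) n >= size)
        return -1;
    used = (size_t) n;

    for (size_t j = 0; lists[j] != NULL; j++) {
        for (size_t i = 0; lists[j][i] != NULL; i++) {
            int len = (int) strlen(lists[j][i]) + 1;  /* word plus space */

            line_length -= len;
            if (line_length < 0) {
                /* Word does not fit: start a new indented line. */
                n = snprintf(out + used, size - used,
                             "\n%*c", TABULATION, ' ');
                if (n < 0 || (size_t) n >= size - used)
                    return -1;
                used += (size_t) n;
                line_length = MAX_CHARS_PER_LINE - TABULATION - len;
            }
            n = snprintf(out + used, size - used, "%s ", lists[j][i]);
            if (n < 0 || (size_t) n >= size - used)
                return -1;
            used += (size_t) n;
        }
    }
    return (int) used;
}
```

   As in the original loop, each line keeps its trailing space, and the
"/" separator is simply the first entry of the run-time list.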

   I figured that I'd suck down a fresh source kit soon, and whip up
some patches against the current stuff, and also assemble a kit of new
VMS-specific files (source and builders) for incorporation.  More VMS
builder changes may be needed, but I think that I have something usable
now, and it might be nice to get it into the mainstream, so that an
adventurous VMS user could try it.  (I assume that a normal victim will
have trouble where I don't.)

   I'd also like to see if there's an easy way to auto-generate a VMS
help file from the "wget -h" output, but that may be a while.

   Opinions welcome, as always.

   SMS.




------------------------------

_______________________________________________
Bug-wget mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/bug-wget


End of Bug-wget Digest, Vol 11, Issue 21
****************************************




