Re: Excluding all-numeric fields

From: Göran Uddeborg
Subject: Re: Excluding all-numeric fields
Date: Mon, 1 Oct 2012 23:28:25 +0200

Dominik Stadler:
> ah, ok, but websec runs the regex on the HTML source, not on the way the data
> displays in a browser.

Sure.  I'm talking about paragraphs that consist entirely of
numeric data in the HTML source.

> So you got all the HTML markup in between the actual
> text, which likely makes your regex not match at all here.

But there isn't any additional markup.  According to the ignore.list
manual page, the ignore patterns are applied to "paragraphs".  By
experimentation, I've figured out that e.g. table data cells are
considered paragraphs for this purpose.  The cells I'm trying to
ignore don't contain any additional markup, only the numbers,
sometimes beginning with a minus sign.

> Looking at the source of the page, I cannot find any line with just
> numbers in there. Which exact item would you like to ignore?

There are four differences in the example I provided at
ftp://ftp.uddeborg.se/pub/webdiff, each in its own table data cell:

    23:17 mimmi$ diff a+science.old.html a+science.html 
    <           <td align="right">-3.37</td>
    >           <td align="right">-3.60</td>
    <           <td align="right">-7.37</td>
    >           <td align="right">-7.62</td>
    <           <td align="right">-24.16</td>
    >           <td align="right">-24.44</td>
    <           <td align="right">2.04</td>
    >           <td align="right">1.81</td>

If I run webdiff without any ignore patterns, all four of those fields
are marked.  If I add my ignore pattern, ^-?[0-9.]+$, the change in
the positive value, from 2.04 to 1.81, is ignored as expected.  But the
three changes involving negative values remain marked as differences.
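
As a quick sanity check outside websec, the pattern can be tested against the eight cell values from the diff above. This is a minimal sketch, not part of websec: it uses Python's re module as a stand-in for websec's own Perl regex engine (the two agree on this simple syntax), it assumes the "paragraph" websec tests is exactly the text between <td ...> and </td>, and the helper names cell_text and is_ignored are hypothetical.

```python
import re

# The ignore pattern from the message, anchored at both ends.
IGNORE = re.compile(r"^-?[0-9.]+$")

# Assumption: the text websec tests is whatever sits between <td ...> and </td>.
CELL = re.compile(r"<td[^>]*>(.*?)</td>")

# Changed lines copied from the diff output above (old/new pairs).
DIFF_LINES = [
    '<           <td align="right">-3.37</td>',
    '>           <td align="right">-3.60</td>',
    '<           <td align="right">-7.37</td>',
    '>           <td align="right">-7.62</td>',
    '<           <td align="right">-24.16</td>',
    '>           <td align="right">-24.44</td>',
    '<           <td align="right">2.04</td>',
    '>           <td align="right">1.81</td>',
]


def cell_text(line: str) -> str:
    """Return the text between <td ...> and </td> on one diff line (hypothetical helper)."""
    m = CELL.search(line)
    if m is None:
        raise ValueError(f"no table cell in {line!r}")
    return m.group(1)


def is_ignored(text: str) -> bool:
    """True if the whole cell text matches the ignore pattern (hypothetical helper)."""
    return IGNORE.search(text) is not None


if __name__ == "__main__":
    for line in DIFF_LINES:
        text = cell_text(line)
        print(f"{text:>7}  ignored={is_ignored(text)}")
```

Every value, negative ones included, matches when tested in isolation, which suggests the discrepancy lies in what text websec actually hands to the pattern rather than in the pattern itself.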
