gnumed-devel

Re: [Gnumed-devel] help with trimming $


From: catmat
Subject: Re: [Gnumed-devel] help with trimming $
Date: Wed, 02 Mar 2005 02:27:47 +1100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041231

Horst Herb wrote:

On Tue, 1 Mar 2005 14:05, Richard Terry wrote:
e.g. 10/12/2004 Hypertension to return just Hypertension
or 10/2000 Appendix
or 2001 Cancer Bowel to return Cancer of the bowel

Would anyone volunteer the Python code to use?

For this I wouldn't even use regular expressions (which cost a lot of processor time and memory).
All the examples given start with a number, so I would just parse:
if (line starts with number):
1.) read characters until a character is not in [0-9, '.', '/', '-'], appending each one to parsestr
2.) try splitting parsestr on the date separators ('.', '/', '-')
3.) if there is only one split string and its length == 4: this is the date (check plausibility of date)
4.) else, if there are two split strings: assume the first chunk is the month and the last is the year, check for plausibility
5.) else, ...

hey, it's quick and easy to do in Python, and since you are learning Python just now, ... a good exercise in string manipulation / parsing?
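
The steps above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name split_leading_date, the 1800-2100 year range, and the day/month/year handling for three-part dates (step 5 is left open above) are all assumptions. It also only strips the date; it does not attempt rewordings such as "Cancer Bowel" to "Cancer of the bowel".

```python
from typing import Optional, Tuple

# Characters allowed in the leading date, as listed in step 1.
DATE_CHARS = set("0123456789./-")


def split_leading_date(line: str) -> Tuple[Optional[str], str]:
    """Return (date_prefix, remaining_text).

    date_prefix is None when the line does not start with a plausible date,
    in which case the text is returned unchanged (apart from stripping).
    """
    text = line.strip()
    if not text or not text[0].isdigit():
        return None, text

    # Step 1: read characters for as long as they can belong to a date.
    i = 0
    while i < len(text) and text[i] in DATE_CHARS:
        i += 1
    parsestr, rest = text[:i], text[i:].strip()

    # Step 2: split on any of the date separators.
    parts = parsestr.replace(".", "/").replace("-", "/").split("/")
    if not all(p.isdigit() for p in parts):
        return None, text  # e.g. a trailing or doubled separator
    nums = [int(p) for p in parts]

    # Plausibility check; the year range is an assumption.
    year_ok = len(parts[-1]) == 4 and 1800 <= nums[-1] <= 2100
    if len(parts) == 1:        # step 3: year only
        plausible = year_ok
    elif len(parts) == 2:      # step 4: month and year
        plausible = year_ok and 1 <= nums[0] <= 12
    elif len(parts) == 3:      # assumed step 5: day, month, year
        plausible = year_ok and 1 <= nums[0] <= 31 and 1 <= nums[1] <= 12
    else:
        plausible = False

    return (parsestr, rest) if plausible else (None, text)
```

With the examples from the original question, split_leading_date("10/12/2004 Hypertension")[1] gives "Hypertension", and a line such as "3 fractures" is left alone because "3" is not a plausible year.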

Horst

I initially tried to use regex for parsing twiki markup, but in the end I found
it was easier just to think about characters and lines, using a state machine
with a state variable (e.g. INIT=0, IN_NUMBERS=1, AFTER_NUMBERS=2)
and a big if statement.
e.g.
      state = INIT
      accumulators = ....

      for x in line:
          if state == INIT:
              do ...
              if condition: state = IN_NUMBERS
          elif state == IN_NUMBERS:
              do ...
              if condition: state = ....
          elif state == AFTER_NUMBERS:
              do ...

      process_accumulators
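
Applied to the date-trimming question, the outline above might be filled in like this. It is a sketch under assumptions: the function name strip_leading_date, the two list accumulators, and the choice of which characters count as part of the date are illustrative, and no plausibility check is done on the collected date.

```python
# States of the scanner, named as in the outline above.
INIT, IN_NUMBERS, AFTER_NUMBERS = range(3)

DIGITS = "0123456789"
DATE_CHARS = DIGITS + "./-"  # digits plus the usual date separators


def strip_leading_date(line):
    """Return (date, text) by scanning the line one character at a time."""
    state = INIT
    date_chars = []  # accumulator for the leading date
    text_chars = []  # accumulator for everything after it

    for ch in line:
        if state == INIT:
            if ch.isspace():
                continue  # skip leading whitespace
            if ch in DIGITS:
                date_chars.append(ch)
                state = IN_NUMBERS
            else:
                text_chars.append(ch)
                state = AFTER_NUMBERS  # no leading date at all
        elif state == IN_NUMBERS:
            if ch in DATE_CHARS:
                date_chars.append(ch)
            else:
                if not ch.isspace():
                    text_chars.append(ch)
                state = AFTER_NUMBERS
        else:  # AFTER_NUMBERS: everything else is plain text
            text_chars.append(ch)

    # Process the accumulators.
    return "".join(date_chars), "".join(text_chars).strip()
```

For example, strip_leading_date("2001 Cancer Bowel") returns ("2001", "Cancer Bowel"), and a line with no leading number comes back with an empty date string.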






