Problem:
I have a string containing different numbers, math signs, and words, e.g.
str = ".1**2 + x/(10.0 - 2.e-4)*n_elts"
I want to extract the numbers but keep the parts between them, so that I can put everything back together later (after working on the numbers).
lst = [".1", "**", "2", " + ", "x/(", "10.0", " - ", "2.e-4", ")*n_elts"]
would be one of many acceptable results. The elements that are not numbers can be split further in any arbitrary way, since the next step will be
"".join(process(l) for l in lst)
where process looks like this (suggestions for a better way to check whether l is a number are welcome):
def process(l):
    try:
        n = float(l)
    except ValueError:
        return l
    else:
        return work_on_it(l)
Current state:
From this answer I figured out how to keep the delimiters, and worked my way to
lst = re.split('( |\+|\-|\*|/)', ".1**2 + x/(10.0 - 2.e-4)*n_elts")
Now I need to somehow avoid splitting 2.e-4.
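To make the problem concrete, here is a minimal sketch that runs the operator-based split on the example string. The variable name expr is only for illustration, and the pattern is written as a raw string here to avoid escape warnings on newer Python versions.

```python
import re

expr = ".1**2 + x/(10.0 - 2.e-4)*n_elts"

# Split on spaces and arithmetic operators, keeping them via the capturing group.
lst = re.split(r'( |\+|\-|\*|/)', expr)

print(lst)
print("2.e-4" in lst)        # False: the minus sign in the exponent splits the number
print("".join(lst) == expr)  # True: nothing is lost, the pieces are just too small
```

The pieces still join back to the original string, so the only defect is that 2.e-4 ends up spread over several elements.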
I tried to work out a regex (in vi syntax, which I hope is universal) that covers all the numbers that can possibly appear, and I think
\d*\.\d*[E|e]*[|+|-]*\d*
should be OK.
One strategy would be to somehow do this with re.
I found a related answer that seems to cover the number-matching part. It might be a bit more complex than I need, and I do not know how to combine it with the keeping-delimiters bit.
One general note: inside character classes you don't use |, because it is treated as a literal character to be matched. Inside character classes, the allowed characters are simply listed one after another.
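A quick way to check this is to test both spellings of the class against a literal pipe. The names with_pipe and without_pipe are just illustrative.

```python
import re

# Inside a character class, "|" is an ordinary character, not alternation.
with_pipe = re.compile(r'[E|e]')
without_pipe = re.compile(r'[Ee]')

print(bool(with_pipe.fullmatch('|')))     # True: the pipe itself is accepted
print(bool(without_pipe.fullmatch('|')))  # False
print(bool(without_pipe.fullmatch('E')))  # True
print(bool(without_pipe.fullmatch('e')))  # True
```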
To solve the problem: since you are keeping the delimiters anyway, it doesn't matter whether you are matching numbers or non-numbers, right? Just use
lst = re.split(r'(\d*\.\d*[eE]*[+-]*\d*)', ".1**2 + x/(10.0 - 2.e-4)*n_elts")
You might want to improve on the number regex a bit though:
lst = re.split(r'((?:\d+\.\d*|\.?\d+)(?:[eE][+-]?\d+)?)', ".1**2 + x/(10.0 - 2.e-4)*n_elts")
This way, you make the decimal point optional, but require at least one digit before or after it. It also makes the exponential part optional, but ensures that it is well-formed if present. The ?: suppresses capturing. Otherwise the inner groups would behave the same as the outer set of parentheses and add the parts they matched to the result of the split as well. You don't want that, because it would give you the complete number, the part before the exponential, and the exponential separately. So you need ?: to suppress capturing (which is a good habit in general unless you explicitly need capturing).
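Here is a small end-to-end sketch of the improved split, including the join step from the question. work_on_it is a hypothetical stand-in that just wraps the number in angle brackets. Since the pattern has exactly one capturing group, the matched numbers sit at the odd indices of the result, so the float() test is not needed to tell numbers from the rest.

```python
import re

NUMBER = re.compile(r'((?:\d+\.\d*|\.?\d+)(?:[eE][+-]?\d+)?)')

def work_on_it(num):
    # Hypothetical placeholder for the real processing of a number string.
    return "<" + num + ">"

expr = ".1**2 + x/(10.0 - 2.e-4)*n_elts"
parts = NUMBER.split(expr)

# Even indices: text between numbers (possibly empty). Odd indices: the numbers.
processed = "".join(work_on_it(p) if i % 2 else p for i, p in enumerate(parts))

print(parts)
# ['', '.1', '**', '2', ' + x/(', '10.0', ' - ', '2.e-4', ')*n_elts']
print(processed)
# <.1>**<2> + x/(<10.0> - <2.e-4>)*n_elts
```

One caveat: the pattern also matches digits that are part of an identifier (the 2 in a name like x2), so you may need to tighten it if your variable names can contain digits.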
Finally, note the use of raw strings (the r preceding the string literal). Without them, escaping can get ugly (in that you may have to double-escape regex meta-characters). In Python, you should always use raw strings to denote regex patterns.
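To illustrate the difference, here is a short sketch. The sample text and variable names are made up for the example.

```python
import re

# A raw string and its double-escaped equivalent are the same pattern.
raw = r'\d+\.\d*'
escaped = '\\d+\\.\\d*'
print(raw == escaped)  # True

# Some escapes silently change meaning without the r prefix:
# '\b' is a single backspace character, r'\b' is the regex word boundary.
print(len('\b'), len(r'\b'))  # 1 2

text = "x/(10.0 - 2.e-4)"
print(re.findall(r'\bx\b', text))  # ['x']
print(re.findall('\bx\b', text))   # []: this looks for backspace characters
```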