python - Pandas: extract and select data from columns using a pattern


My data has a structure similar to the following (reduced to 2 elements here; there are actually tens):

variable        elem_1_pre    elem_1_post   elem_2_pre    elem_2_post
observation1    present       absent        absent        present
observation2    absent        present       present       absent

The ultimate objective is to select the observations (and possibly the associated column names) where an element is present in pre and absent in post, or vice versa.

In other words, the operation is (pseudocode):

("present" in *_pre and "absent" in *_post) or ("present" in *_post and "absent" in *_pre)

I'm thinking groupby could be used for this. Is such a thing possible in pandas?

If the values in the DataFrame are the strings 'present' and 'absent', first convert the string values to boolean values:

In [17]: df.values == 'present'
Out[17]:
array([[ True, False, False,  True],
       [False,  True,  True, False]], dtype=bool)

Once you have boolean values, you can use the XOR operator, ^ (which acts as a logical XOR on boolean values), to combine the two columns into the desired value:

import pandas as pd

df = pd.DataFrame(['present absent absent present'.split(),
                   'absent present present absent'.split()],
                  columns='elem_1_pre elem_1_post elem_2_pre elem_2_post'.split(),
                  index='observation1 observation2'.split())
# Convert the strings to booleans, keeping the row and column labels
df = pd.DataFrame(df.values == 'present',
                  columns=df.columns,
                  index=df.index)
print(df)
#               elem_1_pre  elem_1_post  elem_2_pre  elem_2_post
# observation1        True        False       False         True
# observation2       False         True        True        False

for i in range(1, 3):
    elem = ['elem_{i}_{s}'.format(i=i, s=suf) for suf in ('pre', 'post')]
    change = 'elem_{i}_change'.format(i=i)
    # True where exactly one of pre/post is 'present'
    df[change] = df[elem[0]] ^ df[elem[1]]
# .loc replaces the older .ix indexer, which recent pandas versions no longer provide
print(df.loc[:, 'elem_1_change elem_2_change'.split()])

yields

              elem_1_change  elem_2_change
observation1           True           True
observation2           True           True
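Since the real data has tens of elements, the hard-coded range(1, 3) may not scale well. The following is a rough sketch of one way to generalize the same XOR idea. It assumes every column name ends in exactly _pre or _post and that each _pre column has a matching _post column. The names is_present, changed, selected and changed_elems are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    ['present absent absent present'.split(),
     'absent present present absent'.split()],
    columns='elem_1_pre elem_1_post elem_2_pre elem_2_post'.split(),
    index='observation1 observation2'.split(),
)

# Boolean frame that keeps the row and column labels.
is_present = df.eq('present')

# Split by suffix (assumed naming scheme) and strip the suffix so that
# both frames share the same column names, e.g. 'elem_1', 'elem_2'.
pre = (is_present.filter(regex=r'_pre$')
       .rename(columns=lambda c: c[:-len('_pre')]))
post = (is_present.filter(regex=r'_post$')
        .rename(columns=lambda c: c[:-len('_post')]))

# XOR element-wise: True where exactly one of pre/post is 'present'.
changed = pre ^ post[pre.columns]

# Observations where at least one element changed.
selected = changed.index[changed.any(axis=1)].tolist()

# For each observation, the names of the elements that changed.
changed_elems = {obs: row[row].index.tolist()
                 for obs, row in changed.iterrows()}

print(selected)
print(changed_elems)
```

If the direction of the change matters, pre & ~post would pick out only elements that were present before and absent after, and ~pre & post the reverse.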
