My data contains a structure similar to this (reduced to 2 elements; there are tens of them):
variable      elem_1_pre  elem_1_post  elem_2_pre  elem_2_post
observation1  present     absent       absent      present
observation2  absent      present      present     absent

The ultimate objective is to select the observations (and possibly the associated column names) that are present in pre and absent in post, or vice versa.
In other words, the operation is (pseudocode):

    ("present" in *_pre and "absent" in *_post) or ("present" in *_post and "absent" in *_pre)

I'm thinking groupby could be used for this. Is such a thing possible in pandas?
If the values in the DataFrame are the strings 'present' and 'absent', first convert the string values to boolean values:
    In [17]: df.values == 'present'
    Out[17]:
    array([[ True, False, False,  True],
           [False,  True,  True, False]], dtype=bool)

Once you have boolean values, you can use the element-wise XOR operator, ^, to combine the 2 columns of each element into the desired value:
    import pandas as pd

    df = pd.DataFrame(
        ['present absent absent present'.split(),
         'absent present present absent'.split()],
        columns='elem_1_pre elem_1_post elem_2_pre elem_2_post'.split(),
        index='observation1 observation2'.split())

    # Convert the strings to booleans: True where the value is 'present'
    df = pd.DataFrame(df.values == 'present', columns=df.columns, index=df.index)
    print(df)
    #               elem_1_pre  elem_1_post  elem_2_pre  elem_2_post
    # observation1        True        False       False         True
    # observation2       False         True        True        False

    # For each element, XOR its pre and post columns:
    # True means the element is present in exactly one of the two
    for i in range(1, 3):
        elem = ['elem_{i}_{s}'.format(i=i, s=suf) for suf in ('pre', 'post')]
        change = 'elem_{i}_change'.format(i=i)
        df[change] = df[elem[0]] ^ df[elem[1]]

    print(df.loc[:, 'elem_1_change elem_2_change'.split()])

yields

                  elem_1_change  elem_2_change
    observation1           True           True
    observation2           True           True
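With tens of elements, hard-coding the range in the loop gets awkward, and the question also asks for the observations and the associated element names. Here is a minimal sketch of one way to get both without a loop. The helper name `find_changes` and its parameters are made up for illustration, and it assumes every element has both a `_pre` and a `_post` column and that any value other than 'present' counts as absent.

```python
import pandas as pd


def find_changes(df, present="present", pre_suffix="_pre", post_suffix="_post"):
    """Return a boolean frame: True where an element differs between pre and post.

    Hypothetical helper; assumes each element has both a pre and a post column.
    """
    is_present = df == present
    pre_cols = [c for c in df.columns if c.endswith(pre_suffix)]
    post_cols = [c for c in df.columns if c.endswith(post_suffix)]
    # Strip the suffixes so the two frames share column names (elem_1, elem_2, ...)
    pre = is_present[pre_cols].rename(columns=lambda c: c[: -len(pre_suffix)])
    post = is_present[post_cols].rename(columns=lambda c: c[: -len(post_suffix)])
    # XOR aligns on index and columns: True when exactly one side is present
    return pre ^ post


df = pd.DataFrame(
    [["present", "absent", "absent", "present"],
     ["absent", "present", "present", "absent"]],
    columns=["elem_1_pre", "elem_1_post", "elem_2_pre", "elem_2_post"],
    index=["observation1", "observation2"],
)

changed = find_changes(df)

# Observations with at least one changed element
selected = changed.index[changed.any(axis=1)].tolist()

# Element names that changed, per observation
elements = {
    obs: changed.columns[row.to_numpy()].tolist()
    for obs, row in changed.iterrows()
}

print(selected)
print(elements)
```

For the sample data every element changes in both observations, so `selected` contains both observation names and each one maps to `['elem_1', 'elem_2']`. No groupby is needed here, since the pre/post pairing is done by renaming the columns so that pandas aligns them.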