r - Split and unsplit a dataframe in four parts


I'd like to split a data frame into 4 equal parts, because I'd like to use 4 cores of my computer.

I did this:

    df2 <- split(df, 1:4)
    unsplit(df2, f = 1:4)

and this:

    df2 <- split(df, 1:4)
    unsplit(df2, f = c('1', '2', '3', '4'))

but the unsplit function did not work. I get these warning messages:

    1: In split.default(seq_along(x), f, drop = drop, ...) :
      data length is not a multiple of split variable
    ...

Do you have any idea what the reason is?

How many rows are in df? You will get that warning if the number of rows in your table is not divisible by 4. I think you are using the split factor f incorrectly, unless what you want is to put each subsequent row into a different split data.frame.

If you want to split your data into 4 data frames, taking one row after the other, then make your splitting factor the same length as the number of rows in your data frame using rep_len, like this:

    ## Split like this:
    split( df , f = rep_len( 1:4 , nrow(df) ) )

    ## Unsplit like this:
    unsplit( split( df , f = rep_len( 1:4 , nrow(df) ) ) , f = rep_len( 1:4 , nrow(df) ) )
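The rep_len factor above deals rows out in round-robin order (row 1 to part 1, row 2 to part 2, and so on). If contiguous blocks of rows are preferred instead, one possible variation is to sort the recycled labels first. This is only a sketch: the data frame below is made-up example data, and the names `f_block`, `parts`, `sizes` and `restored` are illustrative, not from the question.

```r
# Hypothetical example data; 'df' stands in for the asker's data frame.
df <- data.frame(id = 1:10, x = (1:10)^2)

# One label per row. Sorting the recycled labels groups equal labels together,
# so each part is a contiguous block and part sizes differ by at most one row.
f_block <- sort(rep_len(1:4, nrow(df)))

parts <- split(df, f_block)
sizes <- vapply(parts, nrow, integer(1))

# unsplit needs the same per-row factor that was used to split.
restored <- unsplit(parts, f_block)
```

The key point is the same as in the answer: the factor handed to split and to unsplit has one entry per row of the data frame, not one entry per part.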

Hopefully this example illustrates why the error occurs and how to avoid it (i.e. use a proper splitting factor!).

    ## We want to split our data.frame into 2 halves, but the rows are not divisible by 2
    df <- data.frame( x = runif(5) )
    df

    ## Splitting still works, but...
    ## we get a warning because the split factor 'f' was not recycled as a multiple of its length
    split( df , f = 1:2 )
    #$`1`
    #          x
    #1 0.6970968
    #3 0.5614762
    #5 0.5910995
    #
    #$`2`
    #          x
    #2 0.6206521
    #4 0.1798006
    #
    #Warning message:
    #In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
    #  data length is not a multiple of split variable

    ## Instead, let's use the same split levels (1:2)...
    ## but make the factor equal in length to the number of rows in the table:
    splt <- rep_len( 1:2 , nrow(df) )
    splt
    #[1] 1 2 1 2 1

    ## Split works, and f is not recycled because there are
    ## the same number of values in 'f' as rows in the table
    split( df , f = splt )
    #$`1`
    #          x
    #1 0.6970968
    #3 0.5614762
    #5 0.5910995
    #
    #$`2`
    #          x
    #2 0.6206521
    #4 0.1798006

    ## And unsplitting then works as expected and reconstructs our original data.frame
    unsplit( split( df , f = splt ) , f = splt )
    #          x
    #1 0.6970968
    #2 0.6206521
    #3 0.5614762
    #4 0.1798006
    #5 0.5910995
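Since the question is about spreading work over several cores, here is a rough sketch of the full split, process, unsplit round trip. Everything in it is an assumption for illustration: the data values, the doubled column `x2`, and the names `chunks`, `processed` and `result` are hypothetical. Plain `lapply` stands in for a parallel apply; `parallel::mclapply` takes the same list and function, though on Windows it cannot use more than one core.

```r
# Hypothetical data and per-chunk work; neither comes from the question.
df <- data.frame(x = c(0.5, 1.5, 2.5, 3.5, 4.5))
splt <- rep_len(1:2, nrow(df))

chunks <- split(df, f = splt)

# lapply is a stand-in for a parallel apply over the chunks.
# Each chunk gains the same derived column, so all chunks keep identical columns.
processed <- lapply(chunks, function(part) {
  part$x2 <- part$x * 2
  part
})

# Reassemble in the original row order using the same factor.
result <- unsplit(processed, f = splt)
```

Because unsplit places rows back according to `splt`, the rows of `result` come out in the original order even though each chunk was handled separately.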
