R: how to remove the first row of duplicate values from a big column


In R I have a file (df) consisting of 2 big columns, a and b (approx. 1,000,000 elements each). I know I have many duplicate values in a. I know how to remove duplicates (removing the second row of each duplicate):

df1 = df[!duplicated(df$a), ]  

but I want to remove the first row of each duplicate and keep the second row instead. For instance, in the following example, I want to remove 71 t and keep 71 c, not the other way around:

a   b
4
8   c
21  t
71  t
71  c
74  c
75  g
78  c
86  t

Thanks in advance.

Using dplyr, you can do this:

library(dplyr)

df %>%
  group_by(a) %>%
  slice(-1)
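A small runnable check of this approach (the toy data below is made up to mirror part of the example in the question):

```r
library(dplyr)

# Toy data modeled on the example in the question
df <- data.frame(a = c(4, 8, 21, 71, 71, 74),
                 b = c("c", "c", "t", "t", "c", "c"))

result <- df %>%
  group_by(a) %>%
  slice(-1) %>%
  ungroup()

# Only the second occurrence of the duplicated value 71 survives.
print(result)
```

One caveat worth knowing: slice(-1) removes the first row of every group, so values of a that appear only once are dropped entirely, not kept.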

If you need to arrange the column in a specific way first, you can incorporate arrange into the mix as follows:

library(dplyr)

df %>%
  arrange(a) %>%   # sorts in ascending order
  group_by(a) %>%
  slice(-1)
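If rows with unique values in a should be kept as well, a base-R alternative is to reuse the duplicated() call from the question with fromLast = TRUE, which marks earlier occurrences as duplicates so the last row of each value survives (toy data again made up for illustration):

```r
# Toy data modeled on the example in the question
df <- data.frame(a = c(4, 8, 21, 71, 71, 74),
                 b = c("c", "c", "t", "t", "c", "c"))

# fromLast = TRUE scans from the end, so for each duplicated value
# of a the last occurrence is kept; unique values are kept too.
df1 <- df[!duplicated(df$a, fromLast = TRUE), ]
print(df1)
```

For the duplicated value 71, this keeps the row with b == "c" and drops the earlier row with b == "t".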
