R: how to remove the first row of duplicate values from a big column
In R I have a file (df) consisting of 2 big columns, a and b (approx. 1,000,000 elements each). I know I have many duplicate values in a. I know how to remove duplicates (this removes the second row of each duplicate):
df1 = df[!duplicated(df$a), ]
But I want to remove the first row of each duplicate and keep the second row. For instance, in the following example, I want to remove 71 t and keep 71 c, not the other way around:
a   b
4
8   c
21  t
71  t
71  c
74  c
75  g
78  c
86  t
Thanks in advance.
Using dplyr, you can do this:
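library(dplyr)
df %>%
  group_by(a) %>%
  filter(n() == 1 | row_number() > 1)  # keep values that occur once; drop the first row of each duplicated value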
If you need to arrange the column in a specific way first, you can incorporate arrange into the mix as follows:
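library(dplyr)
df %>%
  arrange(a) %>%  # sorts a in ascending order
  group_by(a) %>%
  filter(n() == 1 | row_number() > 1)  # keep values that occur once; drop the first row of each duplicated value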
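If you would rather stay in base R, a possible alternative is to flag duplicates starting from the end of the column with duplicated(..., fromLast = TRUE). The sketch below is only an illustration: the small data frame is built from a few rows of the example in the question, and df2 is a made-up name. It keeps the last row of each value of a, which is the same as "keep the second row" as long as no value appears more than twice.

```r
# Illustrative data only: a few rows taken from the example in the question
df <- data.frame(
  a = c(21, 71, 71, 74),
  b = c("t", "t", "c", "c"),
  stringsAsFactors = FALSE
)

# fromLast = TRUE scans from the bottom, so the last row of each value is
# treated as the original and any earlier rows are flagged as duplicates
df2 <- df[!duplicated(df$a, fromLast = TRUE), ]
print(df2)
```

Values of a that occur only once are kept, and for 71 the row with c is kept instead of the row with t. If a value can appear three or more times, this keeps only its last row, so check that this is what you want.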