r - Query from dataset -
i have 2 data sets following.
dat1 <- read.table(header=true, text=" id log dist ab7 1.1 2 ab8 1.6 1.5 ab21 3 1 ab3 2.05 1.09 ab300 1.5 0.45 ab4 1.78 1.11 ab10 1.9 2 ab501 1.5 0.2 ") dat1 id log dist 1 ab7 1.10 2.00 2 ab8 1.60 1.50 3 ab21 3.00 1.00 4 ab3 2.05 1.09 5 ab300 1.50 0.45 6 ab4 1.78 1.11 7 ab10 1.90 2.00 8 ab501 1.50 0.20 dat2 <- read.table(header=true, text=" id lfrom lto it1 it2 it3 it4 ab7 1 1.05 47 152 259 140 ab7 1.05 1.96 29 45 39 30 ab7 1.96 2.35 59 65 47 40 ab7 2.35 4.45 27 36 31 37 ab7 4.45 5 58 60 60 56 ab8 1.1 2.1 88 236 251 145 ab8 2.1 3.1 51 66 47 43 ab8 3.1 3.5 31 63 46 37 ab8 3.5 3.8 58 35 31 51 ab8 3.8 3.9 29 40 30 48 ab21 1.2 2.1 72 263 331 147 ab3 1 2 71 207 290 242 ab3 2 3 22 38 64 46 ab3 3 4 35 35 43 61 ab3 4 4.5 42 37 33 53 ab300 1 2 54 65 51 67 ab4 1.2 2.1 67 38 54 24 ab4 2.1 2.3 67 30 20 50 ab4 2.3 9.1 67 27 34 39 ab10 1.1 2 64 56 21 34 ab501 1 2 47 152 259 140 ") dat2 id lfrom lto it1 it2 it3 it4 1 ab7 1.00 1.05 47 152 259 140 2 ab7 1.05 1.96 29 45 39 30 3 ab7 1.96 2.35 59 65 47 40 4 ab7 2.35 4.45 27 36 31 37 5 ab7 4.45 5.00 58 60 60 56 6 ab8 1.10 2.10 88 236 251 145 7 ab8 2.10 3.10 51 66 47 43 8 ab8 3.10 3.50 31 63 46 37 9 ab8 3.50 3.80 58 35 31 51 10 ab8 3.80 3.90 29 40 30 48 11 ab21 1.20 2.10 72 263 331 147 12 ab3 1.00 2.00 71 207 290 242 13 ab3 2.00 3.00 22 38 64 46 14 ab3 3.00 4.00 35 35 43 61 15 ab3 4.00 4.50 42 37 33 53 16 ab300 1.00 2.00 54 65 51 67 17 ab4 1.20 2.10 67 38 54 24 18 ab4 2.10 2.30 67 30 20 50 19 ab4 2.30 9.10 67 27 34 39 20 ab10 1.10 2.00 64 56 21 34 21 ab501 1.00 2.00 47 152 259 140
i need query data set match id first , check second , third columns of dat1
perform query. condition check whether log
data position in between lfrom
, lto
in dat2
. example, ab3
, log
2.05 , dist
1.09. so, range (2.05, [2.05+1.09=]3.14)
. row 13
in dat2
: (lfrom, lto)=(2,3)
, row 14
in dat2
: (lfrom, lto)=(3,4)
. range of dat1
(2.05, 3.14)
inside range (2,4)
of these 2 rows. satisfies condition. final data set this:
id log dist lfrom lto it1 it2 it3 it4 1 ab7 1.10 2.00 1.05 1.96 29 45 39 30 2 ab7 1.10 2.00 1.96 2.35 59 65 47 40 3 ab7 1.10 2.00 2.35 4.45 27 36 31 37 4 ab8 1.60 1.50 1.10 2.10 88 236 251 145 5 ab8 1.60 1.50 2.10 3.10 51 66 47 43 6 ab3 2.05 1.09 2.00 3.00 22 38 64 46 7 ab3 2.05 1.09 3.00 4.00 35 35 43 61 8 ab300 1.50 0.45 1.00 2.00 54 65 51 67 9 ab4 1.78 1.11 1.20 2.10 67 38 54 24 10 ab4 1.78 1.11 2.10 2.30 67 30 20 50 11 ab4 1.78 1.11 2.30 9.10 67 27 34 39 12 ab501 1.50 0.20 1.00 2.00 47 152 259 140
dplyr solution:
dat2 %>% group_by(id) %>% mutate(log = dat1$log[dat1$id == unique(id)]) %>% mutate(dist = dat1$dist[dat1$id == unique(id)]) %>% mutate(lfrommin = min(lfrom)) %>% mutate(ltomax = max(lto)) %>% mutate(upper = log+dist) %>% filter(log > lfrommin & upper<ltomax) %>% filter(lfrom >= lfrom[which(lfrom-log== max((lfrom-log)[lfrom-log < 0]))]) %>% filter(lto <= lto[which(lto-upper == min((lto - upper)[lto-upper>0]))]) %>% select(c(id, log, dist, lfrom, lto, it1, it2, it3, it4))
Comments
Post a Comment