How to merge two data frames in R by a common column with mismatched date/time values

I wish to merge two datasets using the 'date time' columns present in both (POSIXct format: dd/mm/yyyy hh:mm). Below is example data for the two datasets:

    # dataset 1 (dts1)
               datetime count period
    1  30/03/2011 02:32    27    561
    2  30/03/2011 02:42     3    600
    3  30/03/2011 02:52     0    574
    4  30/03/2011 03:02     1    550
    5  30/03/2011 03:12    15    600
    6  30/03/2011 03:22     0    597

    # dataset 2 (dts2)
              datetime  dist car satd   alt
    1 30/03/2011 01:59  23.9   1    3  1.76
    2 30/03/2011 02:58  14.7   1    7  6.36
    3 30/03/2011 03:55  10.4   2    9 -0.34
    4 30/03/2011 04:53  35.4   1    3  3.55
    5 30/03/2011 05:52  56.1   1    7 -0.91
    6 30/03/2011 06:48  12.3   1    4  6.58
    7 30/03/2011 07:48  10.7   1    5  4.18

If this were a simple case of merging matching rows from the two data frames, the basic merge(data1, data2, by="datetime") or rbind() function could be used.

However, my problem is more complicated because the time intervals in the two datasets are not equal. Dataset 1 contains data at precise 10-minute intervals (each row incorporates information on the 10-minute block that ends at the specified date/time), while dataset 2 contains data at approximate 1-hour intervals (each row incorporates information from the 1-hour block that ends at the specified date/time).

To make things harder still, there is a time mismatch between the start times of the rows in the two datasets (i.e. dts1: 01/03/2013 10:00:00, dts2: 01/03/2012 09:58:12). The dts2 intervals also vary throughout the dataset (± a few minutes either side of 1 hour). I want to link every 10-minute data row in dataset 1 with the hour block that it fits within in dataset 2 (along with all the associated column values from dts2). Some rows could fit into 2 different hour blocks (i.e. 30/03/2011 03:02), but I only need these rows to be linked to one of the hour blocks.

I would like to end up with something like this:

         datetime_dts1 count period   datetime2_dts2 dist car satd   alt
    1 30/03/2011 02:32    27    561 30/03/2011 02:58 14.7   1    7  6.36
    2 30/03/2011 02:42     3    600 30/03/2011 02:58 14.7   1    7  6.36
    3 30/03/2011 02:52     0    574 30/03/2011 02:58 14.7   1    7  6.36
    4 30/03/2011 03:02     1    550 30/03/2011 02:58 14.7   1    7  6.36
    5 30/03/2011 03:12    15    600 30/03/2011 03:55 10.4   2    9 -0.34
    6 30/03/2011 03:22     0    597 30/03/2011 03:55 10.4   2    9 -0.34

I have searched for an answer to this problem but have not been able to solve it, and my R abilities are not advanced. If anyone could give me a direction or provide a solution, I would be extremely grateful.

After first converting your datetime character strings to POSIXt classes, some combination of rounding and truncating those times should give you something you can use as the basis of a merge.

First read in your data, and create the corresponding POSIXt datetimes:

    dts1 <- structure(list(datetime = structure(1:6,
        .Label = c("30/03/2011 02:32", "30/03/2011 02:42",
        "30/03/2011 02:52", "30/03/2011 03:02", "30/03/2011 03:12",
        "30/03/2011 03:22"), class = "factor"), count = c(27L, 3L,
        0L, 1L, 15L, 0L), period = c(561L, 600L, 574L, 550L, 600L,
        597L)), .Names = c("datetime", "count", "period"),
        class = "data.frame", row.names = c(NA, -6L))
    dts2 <- structure(list(datetime = structure(1:7,
        .Label = c("30/03/2011 01:59", "30/03/2011 02:58",
        "30/03/2011 03:55", "30/03/2011 04:53", "30/03/2011 05:52",
        "30/03/2011 06:48", "30/03/2011 07:48"), class = "factor"),
        dist = c(23.9, 14.7, 10.4, 35.4, 56.1, 12.3, 10.7), car =
        c(1L, 1L, 2L, 1L, 1L, 1L, 1L), satd = c(3L, 7L, 9L, 3L, 7L,
        4L, 5L), alt = c(1.76, 6.36, -0.34, 3.55, -0.91, 6.58,
        4.18)), .Names = c("datetime", "dist", "car", "satd",
        "alt"), class = "data.frame", row.names = c(NA, -7L))

    # create corresponding POSIXlt vectors
    # (you could update the 'datetime' columns in-place if you prefer)
    datetime1 <- strptime(dts1$datetime, format="%d/%m/%Y %H:%M")
    datetime2 <- strptime(dts2$datetime, format="%d/%m/%Y %H:%M")
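As a quick sanity check of the format string, here is a minimal sketch that parses one of the timestamps above. The format codes are case-sensitive: %Y is the four-digit year, %H the hour and %M the minute, whereas lowercase %y and %m mean two-digit year and month. The tz = "UTC" argument is an assumption added here for reproducibility; the code above relies on the local time zone instead.

```r
# Parse a day-first timestamp; format codes are case-sensitive.
# tz = "UTC" is an assumption of this sketch, not part of the original code.
x <- strptime("30/03/2011 02:32", format = "%d/%m/%Y %H:%M", tz = "UTC")

class(x)                      # "POSIXlt" "POSIXt"
format(x, "%Y-%m-%d %H:%M")   # "2011-03-30 02:32"

# as.POSIXct() gives the compact numeric representation, which is
# usually more convenient to store in a data frame column.
x_ct <- as.POSIXct(x)
```

If the parsed value comes back as NA, the format string most likely does not match the input.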

The following code produces a merged table based on the nearest hour in all cases. Inside the merge it simply prepends a column with the rounded times to each of your data frames, merges based on that (i.e., column number 1), then uses the -1 index to remove that column at the end:

    # merge based on nearest hour
    merge(
        cbind(round(datetime1, "hours"), dts1),
        cbind(round(datetime2, "hours"), dts2),
        by=1, suffixes=c("_dts1", "_dts2")
    )[-1]

         datetime_dts1 count period    datetime_dts2 dist car satd  alt
    1 30/03/2011 02:32    27    561 30/03/2011 02:58 14.7   1    7 6.36
    2 30/03/2011 02:42     3    600 30/03/2011 02:58 14.7   1    7 6.36
    3 30/03/2011 02:52     0    574 30/03/2011 02:58 14.7   1    7 6.36
    4 30/03/2011 03:02     1    550 30/03/2011 02:58 14.7   1    7 6.36
    5 30/03/2011 03:12    15    600 30/03/2011 02:58 14.7   1    7 6.36
    6 30/03/2011 03:22     0    597 30/03/2011 02:58 14.7   1    7 6.36

Same as above, but this time just truncating on the hour:

    merge(
        cbind(trunc(datetime1, "hours"), dts1),
        cbind(trunc(datetime2, "hours"), dts2),
        by=1, suffixes=c("_dts1", "_dts2")
    )[-1]

         datetime_dts1 count period    datetime_dts2 dist car satd   alt
    1 30/03/2011 02:32    27    561 30/03/2011 02:58 14.7   1    7  6.36
    2 30/03/2011 02:42     3    600 30/03/2011 02:58 14.7   1    7  6.36
    3 30/03/2011 02:52     0    574 30/03/2011 02:58 14.7   1    7  6.36
    4 30/03/2011 03:02     1    550 30/03/2011 03:55 10.4   2    9 -0.34
    5 30/03/2011 03:12    15    600 30/03/2011 03:55 10.4   2    9 -0.34
    6 30/03/2011 03:22     0    597 30/03/2011 03:55 10.4   2    9 -0.34
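To see why rounding and truncating assign records to different hours, the following small sketch applies both to two of the dts1 times. The tz = "UTC" argument is an assumption made here for reproducibility.

```r
# Compare rounding and truncating to the hour for two dts1 times.
# tz = "UTC" is an assumption of this sketch.
x <- as.POSIXct(c("2011-03-30 02:52", "2011-03-30 03:02"), tz = "UTC")

rounded   <- format(round(x, "hours"), "%H:%M")   # nearest hour
truncated <- format(trunc(x, "hours"), "%H:%M")   # start of the current hour

data.frame(time = format(x, "%H:%M"), rounded, truncated)
#    time rounded truncated
# 1 02:52   03:00     02:00
# 2 03:02   03:00     03:00
```

Rounding puts 02:52 and 03:02 under the same key, while truncating separates them at the hour boundary.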

As above, but for dts1 treat each record as belonging to the previous hour until 10 minutes past the hour, by subtracting 10*60 seconds before truncating. This one produces the same output you specified, though without more information I'm not sure that it's the exact rule you want.

    merge(
        cbind(trunc(datetime1 - 10*60, "hours"), dts1),
        cbind(trunc(datetime2, "hours"), dts2),
        by=1, suffixes=c("_dts1", "_dts2")
    )[-1]

         datetime_dts1 count period    datetime_dts2 dist car satd   alt
    1 30/03/2011 02:32    27    561 30/03/2011 02:58 14.7   1    7  6.36
    2 30/03/2011 02:42     3    600 30/03/2011 02:58 14.7   1    7  6.36
    3 30/03/2011 02:52     0    574 30/03/2011 02:58 14.7   1    7  6.36
    4 30/03/2011 03:02     1    550 30/03/2011 02:58 14.7   1    7  6.36
    5 30/03/2011 03:12    15    600 30/03/2011 03:55 10.4   2    9 -0.34
    6 30/03/2011 03:22     0    597 30/03/2011 03:55 10.4   2    9 -0.34

You can tweak the details of which ones you round, which ones you truncate, and whether you first subtract/add some time, depending on your specific rule.
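One way to make that tweaking easier is to wrap the key computation in a small function. The sketch below is only an illustration: hour_key() and its arguments are hypothetical names introduced here, not part of base R, and tz = "UTC" is an assumption for reproducibility.

```r
# hour_key() is a hypothetical helper, not part of base R: it shifts the
# times by `offset_secs` seconds, then rounds or truncates to the hour.
hour_key <- function(x, offset_secs = 0, how = c("trunc", "round")) {
    how <- match.arg(how)
    shifted <- as.POSIXct(x) + offset_secs
    key <- if (how == "trunc") trunc(shifted, "hours") else round(shifted, "hours")
    as.POSIXct(key)
}

x <- as.POSIXct(c("2011-03-30 02:52", "2011-03-30 03:02", "2011-03-30 03:12"),
                tz = "UTC")

# Records up to 10 minutes past the hour fall into the previous hour.
keys <- format(hour_key(x, offset_secs = -10 * 60), "%H:%M")
keys   # "02:00" "02:00" "03:00"
```

The returned key could stand in for the trunc(...) or round(...) expression inside cbind() in the merges above, with a different offset or method for each data frame.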

Edit:

Not the most elegant, but here is a different approach that accommodates the more complicated conditional rule you described in your comments. It leans heavily on na.locf from the zoo package to first determine which dts2 times come before and after each dts1 record. With those in hand, it's just a matter of applying the rule to select the desired dts2 time, matching back to the original dts1 table, then merging.

    library(zoo)

    # create ordered list of all datetimes, using names to keep
    # track of which ones come from each data frame
    alldts <- sort(c(
        setNames(datetime1, rep("dts1", length(datetime1))),
        setNames(datetime2, rep("dts2", length(datetime2)))))
    is.dts1 <- names(alldts)=="dts1"

    # for each dts1 record, get the previous closest dts2 time
    dts2.prev <- alldts
    dts2.prev[is.dts1] <- NA
    dts2.prev <- na.locf(dts2.prev, na.rm=FALSE)[is.dts1]

    # for each dts1 record, get the next closest dts2 time
    dts2.next <- alldts
    dts2.next[is.dts1] <- NA
    dts2.next <- na.locf(dts2.next, na.rm=FALSE, fromLast=TRUE)[is.dts1]

    # for each dts1 record, apply the rule to choose which dts2 time to use
    # (the subtraction yields a difftime whose units R picks automatically;
    # for this data they are minutes)
    use.prev <- !is.na(dts2.prev) & (alldts[is.dts1] - dts2.prev < 5)
    dts2.to.use <- ifelse(use.prev, as.character(dts2.prev),
        as.character(dts2.next))

    # merge based on the chosen dts2 times, prepended as a character vector
    # for the purpose of merging
    merge(
        cbind(.dt=dts2.to.use[match(datetime1, alldts[is.dts1])], dts1),
        cbind(.dt=as.character(datetime2), dts2),
        by=".dt", all.x=TRUE, suffixes=c("_dts1", "_dts2")
    )[-1]
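For comparison, the same "previous versus next dts2 time" lookup can be sketched in base R with findInterval(). This is an alternative illustration rather than the approach above: it uses a three-row subset of the example data, assumes tz = "UTC", and assumes the rule is "stay with the previous dts2 time if it is less than 5 minutes old", mirroring the < 5 test in the code above.

```r
# Base-R sketch of the previous/next lookup using findInterval().
# The data are a subset of the example tables; tz = "UTC" is an assumption.
t1   <- as.POSIXct(c("2011-03-30 02:52", "2011-03-30 03:02", "2011-03-30 03:12"),
                   tz = "UTC")                      # dts1 times
ends <- as.POSIXct(c("2011-03-30 01:59", "2011-03-30 02:58", "2011-03-30 03:55"),
                   tz = "UTC")                      # dts2 times, sorted

# Index of the latest dts2 time strictly before each dts1 time (0 if none).
i_prev <- findInterval(as.numeric(t1), as.numeric(ends), left.open = TRUE)
i_next <- i_prev + 1L   # exceeds length(ends) if no later dts2 time exists

# Minutes elapsed since the previous dts2 time; Inf where there is none.
mins_since_prev <- rep(Inf, length(t1))
has_prev <- i_prev > 0
mins_since_prev[has_prev] <-
    as.numeric(difftime(t1[has_prev], ends[i_prev[has_prev]], units = "mins"))

# Assumed rule: stay with the previous block if it ended < 5 minutes ago.
tolerance_mins <- 5
i_use <- ifelse(mins_since_prev < tolerance_mins, i_prev, i_next)

data.frame(dts1 = format(t1, "%H:%M"), dts2 = format(ends[i_use], "%H:%M"))
#    dts1  dts2
# 1 02:52 02:58
# 2 03:02 02:58
# 3 03:12 03:55
```

Here i_use holds row numbers of dts2, so it could index that data frame directly. Specifying units = "mins" in difftime() keeps the comparison independent of the units R would otherwise choose automatically.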
