Author: Ildi Czeller, @czeildi on Twitter and Github
It caused me a great deal of headache to work around as.difftime to get what I wanted, but I also learned much more than I expected along the way, so I wanted to share the journey with you.
Initial experience
I had a data frame with two columns containing timestamps and I wanted to calculate the difference between them.
## send_time open_time
## 1: 2018-04-01 00:01:00 2018-04-01 00:07:00
## 2: 2018-04-02 00:01:00 2018-04-02 02:01:00
In the first row, there are 6 minutes between send and open time and in the second row there are 2 hours. The essence of the unexpected behavior I experienced:
dt[, open_time - send_time][2]
## Time difference of 120 mins
dt[2, open_time - send_time]
## Time difference of 2 hours
In other words, the unit reported for one row depends on the other rows in the data frame.
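The post does not show how dt was built, so here is a minimal sketch that reproduces the same effect with plain base R vectors instead of a data.table. The variable names all_rows and one_row are made up for illustration, and the time zone is set to UTC as an assumption to keep the example deterministic.

```r
# Base R only; the original used a data.table, which is not needed to see the effect.
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

all_rows <- open_time - send_time        # unit chosen by looking at both differences
one_row  <- open_time[2] - send_time[2]  # unit chosen from this one difference only

units(all_rows)  # "mins"
units(one_row)   # "hours"

# The underlying duration is the same; only the reported unit differs.
as.numeric(all_rows[2], units = "secs") == as.numeric(one_row, units = "secs")
```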
Finding a workaround
My first idea was to explicitly provide the units argument. However, it did not help.
dt[, as.difftime(open_time - send_time, units = "hours")][2]
## Time difference of 120 mins
I found that if I create the difftime object with difftime rather than with subtraction, it works as I expect.
dt[, difftime(open_time, send_time, units = "hours")][2]
## Time difference of 2 hours
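If the difference has already been computed by subtraction, base R also lets you change its unit afterwards. The sketch below is an alternative to the difftime call above, not what the original analysis used; it relies on the units<- replacement function and on as.numeric with a units argument, both of which work on difftime objects. The vectors stand in for the two columns of dt.

```r
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

elapsed <- open_time - send_time  # auto-selected unit: "mins"
units(elapsed) <- "hours"         # convert the existing difftime in place
units(elapsed)                    # "hours"

# When only plain numbers are needed, ask for the unit during the conversion.
elapsed_hours <- as.numeric(open_time - send_time, units = "hours")
```

Either way the unit is stated explicitly, so the result no longer depends on which rows happen to be in the data.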
Digging deeper
How does timestamp subtraction actually work?
base_time <- lubridate::ymd_hms("2018-04-01 00:00:00")
later_times <- base_time + c(
  lubridate::period(1, "month"),
  lubridate::period(1, "day"),
  lubridate::period(1, "minute"),
  lubridate::period(1, "second")
)
later_times[1:1] - base_time
## Time difference of 30 days
later_times[1:2] - base_time
## Time differences in days
## [1] 30 1
later_times[1:3] - base_time
## Time differences in mins
## [1] 43200 1440 1
later_times[1:4] - base_time
## Time differences in secs
## [1] 2592000 86400 60 1
Based on this, I am fairly confident of two things: first, the unit is chosen to suit the smallest difference and is then used for all values; second, subtraction never produces a unit larger than days.
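The second observation can be checked with differences that are all longer than a week. This is a small sketch using base R only (no lubridate), with UTC assumed as the time zone; far_times is a name invented for the example.

```r
base_time <- as.POSIXct("2018-04-01 00:00:00", tz = "UTC")
far_times <- as.POSIXct(c("2018-05-01 00:00:00", "2018-06-01 00:00:00"), tz = "UTC")

auto_diff <- far_times - base_time
units(auto_diff)  # "days", even though every difference exceeds a week

# Weeks are only used when requested explicitly.
week_diff <- difftime(far_times, base_time, units = "weeks")
units(week_diff)  # "weeks"
```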
Why does units not help?
It still puzzled me why the explicit units argument did not help. Time to look at the source code at last!
as.difftime
function (tim, format = "%X", units = "auto")
{
    if (inherits(tim, "difftime"))
        return(tim)
    if (is.character(tim)) {
        difftime(strptime(tim, format = format), strptime("0:0:0",
            format = "%X"), units = units)
    }
    else {
        # ...
    }
}
Just by skimming the code we can see that it behaves differently depending on the type of its argument. What did we pass?
class(dt[2, open_time - send_time])
## [1] "difftime"
The first two lines of the function body give us the answer: if the argument to as.difftime is already a difftime object, it is returned unchanged, so the units argument has no effect.
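This early return is easy to confirm directly. The sketch below assumes UTC timestamps and uses made-up variable names; it shows that as.difftime hands back a difftime input untouched, while a plain number is converted using the requested unit.

```r
two_hours <- as.POSIXct("2018-04-02 02:01:00", tz = "UTC") -
  as.POSIXct("2018-04-02 00:01:00", tz = "UTC")

# Already a difftime: returned as-is, the units argument is ignored.
unchanged <- as.difftime(two_hours, units = "mins")
identical(unchanged, two_hours)  # TRUE
units(unchanged)                 # "hours"

# A plain number takes the other branch, where units is honoured.
from_number <- as.difftime(120, units = "mins")
units(from_number)               # "mins"
```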
units="auto"
My next question was: where is the code that determines which unit is used? Since as.difftime makes no transformation, it must happen when we subtract the two vectors.
Now it is useful to know that - is an S3 generic, which means it selects which S3 method to call based on the type (S3 class) of its argument(s). In base R, subtraction is defined specifically for timestamp objects (of class POSIXt). Note that you can call S3 methods directly with [generic_name].[class_name].
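To make the [generic_name].[class_name] convention concrete, here is a short sketch. The first part calls the subtraction method for POSIXt directly; the second part defines a toy generic named describe, which is purely hypothetical and exists only to show how dispatch follows the class vector.

```r
t1 <- as.POSIXct("2018-04-02 00:01:00", tz = "UTC")
t2 <- as.POSIXct("2018-04-02 02:01:00", tz = "UTC")

via_operator <- t2 - t1              # dispatches on the class of the operands
via_method   <- `-.POSIXt`(t2, t1)   # the same method, called by its full name
identical(via_operator, via_method)  # TRUE

# A hypothetical generic with two methods, to show the naming scheme.
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "something else"
describe.POSIXt  <- function(x) "a timestamp"

class(t1)     # "POSIXct" "POSIXt"
describe(t1)  # no describe.POSIXct exists, so describe.POSIXt is used
describe(42)  # falls back to describe.default
```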
`-.POSIXt`
function (e1, e2)
{
    # ...
    if (!inherits(e1, "POSIXt"))
        stop("can only subtract from \"POSIXt\" objects")
    if (nargs() == 1)
        stop("unary '-' is not defined for \"POSIXt\" objects")
    if (inherits(e2, "POSIXt"))
        return(difftime(e1, e2))
    # if ...
    # ...
}
We subtract two values of class POSIXt, so the relevant part for us is return(difftime(e1, e2)). Let’s look at difftime now.
difftime
function (time1, time2, tz, units = c("auto", "secs", "mins",
    "hours", "days", "weeks"))
{
    # ...
    z <- unclass(time1) - unclass(time2)
    attr(z, "tzone") <- NULL
    units <- match.arg(units)
    if (units == "auto")
        units <- if (all(is.na(z)))
            "secs"
        else {
            zz <- min(abs(z), na.rm = TRUE)
            if (!is.finite(zz) || zz < 60)
                "secs"
            else if (zz < 3600)
                "mins"
            else if (zz < 86400)
                "hours"
            else "days"
        }
    # ...
}
Without understanding every detail, we can see that if units is not specified, as in our case, the unit depends on the smallest absolute difference: below 60 seconds it is secs, below an hour mins, below a day hours, and days otherwise. What makes these computations work is that unclass() on a POSIXct object gives the number of seconds since 1970-01-01 (UTC).
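The rule can be restated as a small stand-alone function. The sketch below mirrors the excerpt of difftime shown above; pick_auto_unit is a name invented here and is not part of base R. It takes differences already expressed in seconds, which is what subtracting two unclassed POSIXct values yields.

```r
# unclass() on a POSIXct value exposes the seconds since 1970-01-01 UTC.
as.numeric(unclass(as.POSIXct("1970-01-01 00:01:00", tz = "UTC")))  # 60

# Hypothetical helper reproducing the "auto" branch of difftime.
pick_auto_unit <- function(secs) {
  if (all(is.na(secs))) {
    return("secs")
  }
  smallest <- min(abs(secs), na.rm = TRUE)
  if (!is.finite(smallest) || smallest < 60) {
    "secs"
  } else if (smallest < 3600) {
    "mins"
  } else if (smallest < 86400) {
    "hours"
  } else {
    "days"
  }
}

pick_auto_unit(c(360, 7200))  # "mins": the 6-minute row decides for both rows
pick_auto_unit(7200)          # "hours": the 2-hour row on its own
```

With the two rows from the beginning of the post, the 6-minute difference is the smallest, which is why the 2-hour difference is reported as 120 mins when both rows are processed together.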
Finally, I took the time to read the longish help file of difftime, which contains most of this information. Still, I would go this way again, as I believe I learned more.
Conclusions
Look at the source code earlier and always be explicit with conversions if possible.