Investigating difftime behavior

Author: Ildi Czeller, @czeildi on Twitter and Github

Working around as.difftime to get the result I wanted caused me a great deal of headache, but I learned much more than I expected along the way, so I want to share the journey with you.

Initial experience

I had a data frame with two columns containing timestamps and I wanted to calculate the difference between them.

##              send_time           open_time
## 1: 2018-04-01 00:01:00 2018-04-01 00:07:00
## 2: 2018-04-02 00:01:00 2018-04-02 02:01:00

In the first row there are 6 minutes between send and open time, and in the second row there are 2 hours. The essence of the unexpected behavior I experienced:

dt[, open_time - send_time][2]
## Time difference of 120 mins
dt[2, open_time - send_time]
## Time difference of 2 hours

The underlying duration is the same, but the unit in which one row's result is reported depends on the other rows in the data frame.
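The same effect can be reproduced without data.table. The following is a minimal sketch in base R; the vector names and the UTC time zone are my assumptions, chosen only to mirror the two rows above.

```r
# Base R reproduction of the two-row example (no data.table needed).
# The time zone is fixed to UTC (an assumption) so results do not depend on locale.
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

# Subtracting the whole vectors picks ONE unit for both elements.
whole <- open_time - send_time
units(whole)    # "mins"
whole[2]        # Time difference of 120 mins

# Subtracting only the second pair lets R pick the unit for that value alone.
single <- open_time[2] - send_time[2]
units(single)   # "hours"
single          # Time difference of 2 hours
```

Both results describe the same two-hour gap; only the reported unit differs.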

Finding a workaround

My first idea was to explicitly provide the units argument. However, it did not help.

dt[, as.difftime(open_time - send_time, units = "hours")][2]
## Time difference of 120 mins

I found that if I create the difftime object with difftime rather than with subtraction, it works as I expect.

dt[, difftime(open_time, send_time, units = "hours")][2]
## Time difference of 2 hours
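If the difftime object already exists, its units can also be changed after the fact. The sketch below uses base R vectors (my own stand-ins for the two columns, with UTC assumed) and shows two options: the units<- replacement function, and as.numeric with a units argument.

```r
# Stand-in vectors for the two columns; tz = "UTC" is an assumption.
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

d <- open_time - send_time       # units chosen automatically: "mins"

# Option 1: change the units of an existing difftime; the values are rescaled.
d_hours <- d
units(d_hours) <- "hours"        # now 0.1 and 2 hours

# Option 2: drop the class and get plain numbers in the unit you ask for.
hours_numeric <- as.numeric(d, units = "hours")
```

The second option is handy when the result feeds into further arithmetic, because a plain number cannot silently switch units.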

Digging deeper

How does timestamp subtraction actually work?

base_time <- lubridate::ymd_hms("2018-04-01 00:00:00")
later_times <- base_time + c(
    lubridate::period(1, "month"),
    lubridate::period(1, "day"),
    lubridate::period(1, "minute"),
    lubridate::period(1, "second")
)
later_times[1:1] - base_time
## Time difference of 30 days
later_times[1:2] - base_time
## Time differences in days
## [1] 30  1
later_times[1:3] - base_time
## Time differences in mins
## [1] 43200  1440     1
later_times[1:4] - base_time
## Time differences in secs
## [1] 2592000   86400      60       1

Based on this, I am fairly confident that, first, a single unit is used for all values, and it is the one that fits the smallest difference, and second, subtraction never results in units larger than days.
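The second observation can be checked with a gap of a full year. This is a small base R sketch with dates I picked for illustration (UTC assumed to avoid daylight saving effects).

```r
# Two timestamps exactly one (non-leap) year apart; tz = "UTC" is an assumption.
t0 <- as.POSIXct("2018-04-01 00:00:00", tz = "UTC")
t1 <- as.POSIXct("2019-04-01 00:00:00", tz = "UTC")

year_gap <- t1 - t0
units(year_gap)        # "days", even though the gap spans many weeks
as.numeric(year_gap)   # 365

# Weeks are only used when requested explicitly through difftime.
week_gap <- difftime(t1, t0, units = "weeks")
units(week_gap)        # "weeks"
```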

Why does units not help?

It still puzzled me why the explicit units argument did not help. Time to look at the source code at last!

as.difftime
function (tim, format = "%X", units = "auto")
{
    if (inherits(tim, "difftime"))
        return(tim)
    if (is.character(tim)) {
        difftime(strptime(tim, format = format), strptime("0:0:0",
            format = "%X"), units = units)
    }
    else {
        # ...
    }
}

Just by skimming the code we can see that it behaves differently depending on the type of its argument. What did we pass?

class(dt[2, open_time - send_time])
## [1] "difftime"

The first two lines of the function body give us the answer: if the argument to as.difftime is already a difftime object, it is returned unchanged and the units argument has no effect.
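A quick way to convince yourself is to compare the input and the output of as.difftime directly. This is a sketch with base R vectors standing in for the two columns (names and UTC time zone are my assumptions); it also shows that the numeric branch does honour units.

```r
# Stand-in vectors for the two columns; tz = "UTC" is an assumption.
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

d <- open_time - send_time                  # a difftime in "mins"

# A difftime input is returned as-is, so units = "hours" is silently ignored.
unchanged <- as.difftime(d, units = "hours")
identical(unchanged, d)                     # TRUE
units(unchanged)                            # "mins"

# With a plain number, the units argument is used (and must be given).
from_number <- as.difftime(120, units = "mins")
units(from_number)                          # "mins"
```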

units="auto"

My next question was: where is the code that determines which unit is used? Since as.difftime does no transformation here, it must happen when we subtract the two vectors.

Now it is useful to know that - is an S3 generic, which means it selects which S3 method to call based on the type (S3 class) of its argument(s). In base R, subtraction is defined specifically for timestamp objects (of class POSIXt, which POSIXct objects inherit from). Note that you can call S3 methods directly with [generic_name].[class_name].

`-.POSIXt`
function (e1, e2)
{
    # ...
    if (!inherits(e1, "POSIXt"))
        stop("can only subtract from \"POSIXt\" objects")
    if (nargs() == 1)
        stop("unary '-' is not defined for \"POSIXt\" objects")
    if (inherits(e2, "POSIXt"))
        return(difftime(e1, e2))
    # if ...
    # ...
}

We subtract two values of class POSIXt so the relevant part for us is return(difftime(e1, e2)). Let’s look at difftime now.
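To see the dispatch in action, the method can be called by name and compared with the operator. A minimal sketch, with two timestamps of my own choosing (UTC assumed):

```r
# Two example timestamps; tz = "UTC" is an assumption.
t1 <- as.POSIXct("2018-04-02 00:01:00", tz = "UTC")
t2 <- as.POSIXct("2018-04-02 02:01:00", tz = "UTC")

class(t1)                            # "POSIXct" "POSIXt"

via_operator <- t2 - t1              # dispatches on the class of the operands
via_method   <- `-.POSIXt`(t2, t1)   # the S3 method called directly
via_difftime <- difftime(t2, t1)     # what the method delegates to

identical(via_operator, via_method)    # TRUE
identical(via_operator, via_difftime)  # TRUE
```

Note the backticks around the method name: they are needed because - is not a syntactically valid name on its own.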

difftime
function (time1, time2, tz, units = c("auto", "secs", "mins",
    "hours", "days", "weeks"))
{
    # ...
    z <- unclass(time1) - unclass(time2)
    attr(z, "tzone") <- NULL
    units <- match.arg(units)
    if (units == "auto")
        units <- if (all(is.na(z)))
            "secs"
        else {
            zz <- min(abs(z), na.rm = TRUE)
            if (!is.finite(zz) || zz < 60)
                "secs"
            else if (zz < 3600)
                "mins"
            else if (zz < 86400)
                "hours"
            else "days"
        }
    # ...
}

Without understanding every detail, we can see that if units is not specified, as in our case, the unit depends on the smallest absolute difference. What makes these computations work is that unclass() on a POSIXct object gives the number of seconds since 1970-01-01 (UTC).
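The selection rule is short enough to restate as a standalone function. The helper below, pick_auto_units, is a hypothetical name of mine and not part of base R; it simply mirrors the excerpt above so the thresholds can be tried on raw second counts.

```r
# Hypothetical helper mirroring the "auto" branch of difftime; not a base R function.
pick_auto_units <- function(seconds) {
  if (all(is.na(seconds))) return("secs")
  smallest <- min(abs(seconds), na.rm = TRUE)   # the smallest gap decides
  if (!is.finite(smallest) || smallest < 60) {
    "secs"
  } else if (smallest < 3600) {
    "mins"
  } else if (smallest < 86400) {
    "hours"
  } else {
    "days"
  }
}

# Stand-in vectors for the two columns; tz = "UTC" is an assumption.
send_time <- as.POSIXct(c("2018-04-01 00:01:00", "2018-04-02 00:01:00"), tz = "UTC")
open_time <- as.POSIXct(c("2018-04-01 00:07:00", "2018-04-02 02:01:00"), tz = "UTC")

# unclass() exposes the underlying seconds since 1970-01-01 UTC.
as.numeric(unclass(as.POSIXct("1970-01-01 00:01:00", tz = "UTC")))   # 60

gaps <- as.numeric(unclass(open_time) - unclass(send_time))   # 360 and 7200 seconds
pick_auto_units(gaps)       # "mins": the 6-minute gap is the smallest
pick_auto_units(gaps[2])    # "hours": the 2-hour gap on its own
```

The two calls at the end reproduce the puzzle from the beginning: the same 7200 seconds are reported in minutes or in hours depending on what else is in the vector.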

Finally, I took the time to read the longish help file of difftime, which contained most of this information. Still, I would go this way again, as I believe I learned more.

Conclusions

Look at the source code earlier and always be explicit with conversions if possible.