ISO 8601 and Nanosecond Precision Across Languages

Date formats are like fruit: sometimes you don’t know they’re rotten until you cut them open

Introduction

Given a date format that supports arbitrary precision, the question of how best to represent the date in a language of choice is generally tough to answer. In situations like these, I like to look at current standards and see how their implementations work. As an example, I will use ISO 8601, a standard way to represent a datetime with arbitrary precision (obligatory xkcd).

Before we get to the examples: the impetus for this post comes from an emerging healthcare standard called fhir (it’s pronounced “fire” and is an acronym, but I prefer writing it in lowercase). One of fhir’s jobs is to define models described in JSON to improve interoperability, and part of these models include dates. JSON does not specify how dates should be formatted, so fhir defines it for us as a regular expression (visualized):

-?[0-9]{4}(-(0[1-9]|1[0-2])(-(0[0-9]|[1-2][0-9]|3[0-1])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00)))?)?)?

It’s not pretty, and it appears to overlap heavily with valid ISO 8601 dates. We’ll have to see if ISO 8601 compliance status gets added to the docs.
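As quoted, the pattern has no `^`/`$` anchors, so whatever applies it must match the whole string. The following minimal Python sketch does that with `re.fullmatch`. The name `is_fhir_datetime` is made up for illustration. It offers a quick way to probe which strings the pattern accepts:

```python
import re

# The fhir dateTime pattern quoted above, copied verbatim (split for width).
FHIR_DATETIME = re.compile(
    r'-?[0-9]{4}(-(0[1-9]|1[0-2])(-(0[0-9]|[1-2][0-9]|3[0-1])'
    r'(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?'
    r'(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00)))?)?)?')


def is_fhir_datetime(s):
    # fullmatch anchors the pattern at both ends of the string
    return FHIR_DATETIME.fullmatch(s) is not None


print(is_fhir_datetime('2016-06'))                         # partial dates are allowed
print(is_fhir_datetime('2016-06-10T21:42:24.760738998Z'))  # any number of fraction digits
print(is_fhir_datetime('2016-06-10T21:42:24'))             # a time without an offset is not
```

Note that a time of day is accepted only together with `Z` or a numeric offset. Also, the day group `0[0-9]` as written admits a day of `00`.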

Anyways, the author of hapi fhir (“happy fire”) was looking for tips on implementing these dates in Java, with an emphasis on Java 6, precision, and data representation. For comparison, I decided to look at ISO 8601 implementations.

Python

The standard library’s datetime object supports microsecond precision. datetime.isoformat() creates a string in the following ISO 8601 format:

YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS
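That "if microsecond is 0" clause means the shape of the output depends on the value being formatted. This can surprise a strict consumer. Since Python 3.6, `isoformat` takes a `timespec` argument that pins the shape. A small illustration:

```python
from datetime import datetime

whole = datetime(2016, 6, 10, 21, 42, 24)
fractional = datetime(2016, 6, 10, 21, 42, 24, 760738)

# Default: the fractional part disappears when microsecond == 0
default_whole = whole.isoformat()
# timespec='microseconds' always emits exactly six fractional digits
fixed_whole = whole.isoformat(timespec='microseconds')
fixed_frac = fractional.isoformat(timespec='microseconds')

print(default_whole)  # 2016-06-10T21:42:24
print(fixed_whole)    # 2016-06-10T21:42:24.000000
print(fixed_frac)     # 2016-06-10T21:42:24.760738
```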

If the payload carries more than microsecond precision, parsing starts to fail:

from datetime import datetime

datetime.strptime('2016-06-10T21:42:24.760738', '%Y-%m-%dT%H:%M:%S.%f')
# datetime.datetime(2016, 6, 10, 21, 42, 24, 760738)

datetime.strptime('2016-06-10T21:42:24.76073899', '%Y-%m-%dT%H:%M:%S.%f')
# raises ValueError: unconverted data remains: 99

Popular Python libraries don’t do much better. Lacking a higher-precision type to parse into, they simply truncate the trailing digits:

from dateutil.parser import parse
parse('2016-06-10T21:42:24.76073899')
# datetime.datetime(2016, 6, 10, 21, 42, 24, 760738)

from iso8601 import parse_date
parse_date('2016-06-10T21:42:24.76073899')
# datetime.datetime(2016, 6, 10, 21, 42, 24, 760738, tzinfo=<iso8601.Utc>)

import arrow
arrow.get('2016-06-10T21:42:24.76073899')
# <Arrow [2016-06-10T21:42:24.760738+00:00]>

If you want higher resolution, you’ll have to use NumPy’s datetime64, which stores a 64-bit integer count of a unit of your choosing. The unit can be anything from years down to attoseconds:

import numpy
numpy.datetime64('2016-06-10T21:42:24.76073899')
# numpy.datetime64('2016-06-10T21:42:24.760738990+0000')
# Created a datetime with nanosecond resolution

numpy.datetime64('2016-06-10T21:42:24.7607389988')
# numpy.datetime64('1970-04-07T02:09:22.937684421136+0000')
# Created a datetime with picosecond resolution but
# unfortunately 64bits of picoseconds can't represent
# today's date with the default offset of Jan 1, 1970

So if one wants ISO 8601 strings down to the nanosecond, a single NumPy datetime64 suffices. Any higher precision will require coordination between two datetime64s.

.NET

Eric Lippert wrote a great article on the precision and accuracy of DateTime. The article is specific to .NET’s DateTime, but his central point applies across all languages: accuracy and precision are distinct.

DateTime has a resolution of 100 nanoseconds (known as a tick).

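// Requires: using System; using System.Globalization;
// Parses fine: seven fractional digits = 100 nanosecond resolution
DateTime.ParseExact("2016-06-10T21:42:24.7607389", "yyyy-MM-dd'T'HH:mm:ss.fffffff", CultureInfo.InvariantCulture);

// Throws FormatException: custom format strings allow at most seven 'f' specifiers
DateTime.ParseExact("2016-06-10T21:42:24.76073899", "yyyy-MM-dd'T'HH:mm:ss.ffffffff", CultureInfo.InvariantCulture);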
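Because a tick is 100 ns, a DateTime can hold at most seven fractional-second digits. The arithmetic is easy to check. Below is a small Python sketch that turns up to nine fractional digits into whole ticks, plus the nanoseconds a DateTime would have to drop. The helper `split_ticks` is hypothetical and is not a .NET API:

```python
TICK_NS = 100  # one .NET DateTime tick is 100 nanoseconds


def split_ticks(fraction_digits):
    """Split fractional-second digits into (whole ticks, leftover nanoseconds)."""
    if not fraction_digits.isdigit() or len(fraction_digits) > 9:
        raise ValueError('expected 1 to 9 digits: %r' % fraction_digits)
    # Right-pad to nine digits so the string reads as nanoseconds: '76' -> 760000000
    nanos = int(fraction_digits.ljust(9, '0'))
    return divmod(nanos, TICK_NS)
```

For the two strings above, `split_ticks("7607389")` gives `(7607389, 0)`. By contrast, `split_ticks("76073899")` gives `(7607389, 90)`, and the trailing 90 ns has nowhere to go.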

NodaTime does not change this fact.

I’m not familiar with any other date libraries for .NET. So if you need higher precision than 100 nanoseconds, you may have to consider rolling your own object, as suggested by a (non-accepted) Stack Overflow answer, though that answer could certainly be improved upon.

Javascript

Javascript may be the worst off when it comes to sub-millisecond resolution, much less parsing ISO 8601 to arbitrary precision. The built-in Date object only goes down to milliseconds. The popular moment.js library is just a wrapper around the native Date type, so it offers no additional granularity. I’m not aware of any javascript library that offers a datetime data structure with nano (or micro) second precision, and Google searches proved fruitless.

For additional precision, Node offers process.hrtime(), which returns a [seconds, nanoseconds] array, where the nanoseconds are those elapsed since the last whole second. However, the seconds are counted from an arbitrary point in time, not the Unix epoch, so the function isn’t applicable to this discussion. I mention it only for completeness’ sake.

The popular ELK frontend, Kibana, seems afflicted with this limited granularity. This is partly because the underlying Elasticsearch (written in Java) suffers the same problem: dates are recorded only down to milliseconds.

In the end, if we were working with javascript, we’d have to roll our own datetime class as well as our own ISO 8601 parser. Seems discouraging.
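Here is one way such a roll-your-own type might look, sketched in Python for brevity. It stores whole seconds since the Unix epoch plus an integer nanosecond count, which is essentially the layout Java 8’s `Instant` uses. The `Instant` name, the UTC-only (`Z`) input, and the nine-digit cap are all assumptions of this sketch, not any JavaScript library’s API:

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: UTC input only ('Z'), with 0 to 9 fractional digits.
_ISO_UTC = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?Z')


@dataclass(frozen=True, order=True)
class Instant:
    """Hypothetical nanosecond timestamp: whole seconds plus nanoseconds."""
    seconds: int  # whole seconds since 1970-01-01T00:00:00Z
    nanos: int    # 0 to 999_999_999 within that second

    @classmethod
    def parse(cls, s):
        m = _ISO_UTC.fullmatch(s)
        if m is None:
            raise ValueError('expected YYYY-MM-DDTHH:MM:SS[.fffffffff]Z: %r' % s)
        # datetime() rejects impossible fields such as month 13 or Feb 30
        whole = datetime(*(int(g) for g in m.groups()[:6]), tzinfo=timezone.utc)
        seconds = (whole - _EPOCH) // timedelta(seconds=1)
        # Right-pad the fraction to nine digits: '76' -> 760000000 ns
        nanos = int((m.group(7) or '0').ljust(9, '0'))
        return cls(seconds, nanos)

    def isoformat(self):
        whole = _EPOCH + timedelta(seconds=self.seconds)
        return '%s.%09dZ' % (whole.strftime('%Y-%m-%dT%H:%M:%S'), self.nanos)
```

The dataclass orders instances by `(seconds, nanos)`, so comparisons stay exact at nanosecond resolution. A single floating-point count of milliseconds, which is what a JavaScript number would hold, cannot guarantee that.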

Rust

Rust is a breath of fresh air. Datetimes are handled by the chrono crate (rust-chrono). It is precise to the nanosecond and can parse ISO 8601 formatted strings, truncating digits more granular than nanoseconds.

// Assumes chrono 0.4 in Cargo.toml; the UTC type is spelled Utc in current releases
use chrono::{DateTime, Utc};

fn main() {
    let s = "2016-06-10T21:42:24.760738998Z";
    let dt = s.parse::<DateTime<Utc>>().unwrap();
    println!("{:?}", dt);

    // Prints 2016-06-10T21:42:24.760738998Z
}

The team’s research is documented on their wiki.

Go

Go is in the same boat as Rust: support for nanosecond-resolution ISO 8601 dates is readily available in the standard library. Strictly speaking, Go implements RFC 3339, a profile of ISO 8601. Pretty straightforward.

package main

import (
	"fmt"
	"time"
)

func main() {
	dt := "2016-06-10T21:42:24.760738998Z"
	res, err := time.Parse(time.RFC3339Nano, dt)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(res)

	// prints 2016-06-10 21:42:24.760738998 +0000 UTC
}

Java

Until Java 8’s java.time.LocalDateTime, there was no built-in datetime class with sub-millisecond resolution (lookin’ at you, util.Date). sql.Timestamp doesn’t count, because it inherits from util.Date only for implementation and not for semantics (makes one almost wish that Java had C++’s private inheritance).

Looking outside the standard library, the venerable JodaTime only supports millisecond resolution and this fact will never change.

Another alternative is date4j, a pretty straightforward (and small) library that can parse ISO 8601 dates down to nanoseconds:

import hirondelle.date4j.DateTime;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class TestDates {
    @Test
    public void date4jParsing() {
        final DateTime date = new DateTime("2016-06-10T21:42:24.760738998");
        assertThat(date.toString())
                .isEqualTo("2016-06-10T21:42:24.760738998");
    }
}

Only ISO 8601 format is supported, but since this library is open source there isn’t a reason why it couldn’t support other formats.

Probably the newest and best date library for pre-Java 8 code is ThreeTen, which backports Java 8’s time API to Java 6. Quick example:

import org.junit.Test;
import org.threeten.bp.LocalDateTime;
import org.threeten.bp.format.DateTimeFormatter;

import static org.assertj.core.api.Assertions.assertThat;

public class TestDates {
    @Test
    public void threeTenRoundTripParsing() {
        final String s = "2016-06-10T21:42:24.760738998";
        final DateTimeFormatter formatter = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
        final LocalDateTime date = LocalDateTime.parse(s, formatter);
        assertThat(formatter.format(date))
                .isEqualTo("2016-06-10T21:42:24.760738998");
    }
}

Anyone familiar with Java 8’s time package will feel at home with ThreeTen. If the only reason for not using the Java 8 time classes is that they require Java 8, then ThreeTen seems an appropriate fit.

Tracking Precision

Sometimes one wants to keep track of the precision of the data. For instance, 2016 would be tagged with a precision of Year, and 2016-01 would be tagged with Month. Here are a couple of ways to represent this model (I’m using F# because the implementation is the shortest!)

open System

type Precision = Year | Month | Day | Hour | Minute | Second | Nanosecond
type ParsedDate = { precision: Precision; date: DateTime }

Alternatively, one can use a discriminated union of tuples to ensure the data can never be misinterpreted. For example, a 2016 parsed into a DateTime of 2016-01-01T00:00:00 could be misinterpreted as having second precision.

type ParsedDate =
| Year of year: int
| Month of year: int * month: int
| Day of year: int * month: int * day: int
// ...

I believe it may not be a good idea to keep track of the original date’s precision, as programmers may erroneously treat the precision as the source of truth. For example, imagine two systems recording the time of an operation that happened at 10:15am. System A has hour precision, so it records 10am. System B has half-hour precision, so it records 10:30am. System B has more precision, but System A is just as accurate in this instance. A downstream recipient of these events could be misled if it used the parsed precision in any meaningful business logic. Besides, most of the time when grouping, filtering, and selecting dates, they are truncated anyway. For instance, the business logic may check whether two dates occurred on the same day:

final DateTimeFormatter formatter = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
final LocalDateTime hour = LocalDateTime.parse("2016-06-10T10:00", formatter);
final LocalDateTime minute = LocalDateTime.parse("2016-06-10T10:30", formatter);

assertThat(hour.truncatedTo(ChronoUnit.DAYS))
        .isEqualTo(minute.truncatedTo(ChronoUnit.DAYS));

By not retaining the parsed precision, it’s easier to support standard parsing mechanisms. Notice that none of the date parsing libraries showcased, apart from date4j (and arguably NumPy), kept track of precision.

But I can understand situations where precision is wanted. If someone sends in “2016-06” for a birthdate, the client should have the opportunity to reject it because the date is not precise enough. This may be the strongest argument for writing a custom parser. Still, it is certainly possible to determine precision with custom Java 8 (and by extension ThreeTen) datetime formatters.
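One way to recover precision without a fully custom parser is to try a list of formats, from most to least precise, and record which one succeeds. Here is a Python sketch of that idea. It uses a hypothetical `Precision` enum loosely modeled on the F# type above and limited to the shapes the fhir regex allows. It relies on Python 3.7+ `strptime`, so fractions are limited to six digits, and `%z` accepts `Z` or `+hh:mm`:

```python
from datetime import datetime
from enum import IntEnum


class Precision(IntEnum):
    """Hypothetical precision levels, ordered from coarsest to finest."""
    YEAR = 1
    MONTH = 2
    DAY = 3
    SECOND = 4
    FRACTION = 5  # any fractional seconds (strptime's %f allows 1 to 6 digits)


# Most precise first, so the first format that parses tells us the precision.
# Time-of-day formats require an offset, mirroring the fhir pattern.
_FORMATS = [
    ('%Y-%m-%dT%H:%M:%S.%f%z', Precision.FRACTION),
    ('%Y-%m-%dT%H:%M:%S%z', Precision.SECOND),
    ('%Y-%m-%d', Precision.DAY),
    ('%Y-%m', Precision.MONTH),
    ('%Y', Precision.YEAR),
]


def parse_with_precision(s):
    """Return (datetime, Precision) for the first format that parses s."""
    for fmt, precision in _FORMATS:
        try:
            return datetime.strptime(s, fmt), precision
        except ValueError:
            continue
    raise ValueError('unsupported date: %r' % s)


def require_precision(s, minimum):
    """Parse s, rejecting it if it is coarser than `minimum`."""
    value, precision = parse_with_precision(s)
    if precision < minimum:
        raise ValueError('%r is only precise to the %s' % (s, precision.name.lower()))
    return value
```

With this sketch, `require_precision("2016-06", Precision.DAY)` raises `ValueError`, which covers the birthdate case. `strptime` is also more lenient than the fhir regex (it accepts single-digit months, for instance), so it is best run on strings that have already passed the regex.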

Recommendation

  1. Before committing to a custom date format, check whether ISO 8601 already covers it.

  2. Nanosecond precision is as granular as one should go. All the code examples show that even though ISO 8601 supports arbitrary precision, the libraries around it don’t. Part of the reason is that a 1 GHz processor has a clock period of 1 ns, so recording anything finer is rare and unlikely to be meaningful. An arbitrarily precise date class would therefore only hurt interoperability. Other systems will either truncate the data or meet the requirement with home-grown solutions of questionable quality.

  3. If what you care about is supporting Java 6, nanosecond granularity, and the ability to parse strings in multiple formats, consider using the ThreeTen library. Precision can be determined by which parser succeeds.

  4. If you’re still stuck using util.Date and building a custom parser for multiple formats, then at least store an int of nanoseconds since the last millisecond. util.Date already holds milliseconds, so a fractional-seconds field would overlap with it. What you want is the fraction of a millisecond that util.Date can’t represent.
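The arithmetic behind recommendation 4 can be sketched in Python. `MillisPlusNanos` is a hypothetical stand-in for a util.Date (epoch milliseconds) paired with an int of nanoseconds since the last millisecond. Because the two fields never overlap, converting to and from a single nanosecond count is a plain `divmod`:

```python
from dataclasses import dataclass

NANOS_PER_MILLI = 1_000_000


@dataclass(frozen=True)
class MillisPlusNanos:
    """Hypothetical pairing of util.Date's millis with the sub-millisecond rest."""
    epoch_millis: int    # what java.util.Date's getTime() returns
    nanos_of_milli: int  # 0 to 999_999: only what Date cannot represent

    def __post_init__(self):
        if not 0 <= self.nanos_of_milli < NANOS_PER_MILLI:
            raise ValueError('nanos_of_milli out of range: %d' % self.nanos_of_milli)

    @classmethod
    def from_epoch_nanos(cls, epoch_nanos):
        # divmod floors, so pre-1970 values still get a non-negative remainder
        millis, nanos = divmod(epoch_nanos, NANOS_PER_MILLI)
        return cls(millis, nanos)

    def to_epoch_nanos(self):
        return self.epoch_millis * NANOS_PER_MILLI + self.nanos_of_milli
```

For 2016-06-10T21:42:24.760738998Z, the sketch puts 1465594944760 in the Date and 738998 in the extra field, and no digit is stored twice.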
