XBRL and large instances - enabling stream-based processing

I’ve had a lot of positive and interesting feedback on a previous post on XBRL and very large instances. One aspect which I didn’t cover, and which has been raised by Michele Romanelli, amongst others, is the problem created by the lack of any constraint on ordering in an XBRL instance document.

In this post, I’ll explain the problem, and propose a possible solution, in the form of a Syntax for Stream Processing of XBRL.

XBRL instances are made up of various components. For our purposes, the interesting ones are facts, contexts and units. Every fact in an XBRL instance is associated with a context, and all numeric facts are associated with a unit. The associations are made by means of contextRef and unitRef attributes on the facts, referencing ID attributes on the context and unit elements.

The XBRL specification places no constraints on the order in which these components appear. Contexts and units can be used before or after they are declared. This freedom of ordering creates a barrier to efficient stream-based processing of an XBRL document. To consume a document as a stream, a processor needs to remember every context or unit that it encounters, in case a later fact uses it. If it encounters a fact before its associated unit or context, it must retain details of that fact until it reaches the declaration. In the worst case, almost the entire document has to be held in memory before it can be processed, which rather defeats the point of stream-based processing.
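To make the cost concrete, here is a minimal Python sketch of a processor that tolerates any ordering. It is an illustration under simplifying assumptions rather than a real XBRL processor: element names are un-namespaced as in the examples in this post, the wrapper element `<xbrl>` and the class name `BufferingHandler` are hypothetical, and only contexts are tracked (units would need the same treatment).

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Pair facts with contexts when declarations may appear anywhere."""

    def __init__(self):
        super().__init__()
        self.contexts = set()   # every context seen must be remembered
        self.pending = {}       # context id -> facts seen before that context
        self.resolved = []      # (fact name, context id) pairs, once matched
        self.peak_pending = 0   # most facts held back at any one time

    def startElement(self, name, attrs):
        if name == "context":
            cid = attrs.get("id")
            self.contexts.add(cid)
            # Release any facts that were waiting for this declaration.
            for fact in self.pending.pop(cid, []):
                self.resolved.append((fact, cid))
        elif "contextRef" in attrs:
            ref = attrs.get("contextRef")
            if ref in self.contexts:
                self.resolved.append((name, ref))
            else:
                # The context has not been declared yet, so hold the fact.
                self.pending.setdefault(ref, []).append(name)
                waiting = sum(len(v) for v in self.pending.values())
                self.peak_pending = max(self.peak_pending, waiting)


doc = (b'<xbrl><fact1 contextRef="c1"/><fact2 contextRef="c1"/>'
       b'<context id="c1"/><fact3 contextRef="c1"/></xbrl>')
handler = BufferingHandler()
xml.sax.parseString(doc, handler)
```

Both `contexts` and `pending` grow with the document: nothing tells the processor when a context is finished with, or how long a fact will have to wait.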

Proposal: Syntax for stream-based processing of XBRL

The amount that a processor needs to store in memory could be drastically reduced if it could rely on contexts and units appearing before the facts that reference them, and if it could know when it had seen all the facts that reference them.

This could be achieved by following two simple rules:

Contexts and units must appear before the facts that reference them.
Facts may only reference the most recently declared context and unit.

This would give rise to instance documents that look something like this:


  <unit id="u1" />
  <context id="c1" />
  <fact1 contextRef="c1" unitRef="u1" />
  <fact2 contextRef="c1" unitRef="u1" />
  <fact3 contextRef="c1" unitRef="u1" />

  <context id="c2" />
  <fact1 contextRef="c2" unitRef="u1" />
  <fact2 contextRef="c2" unitRef="u1" />
  <fact3 contextRef="c2" unitRef="u1" />

The benefit to a consuming processor is that it only needs to hold one context and one unit in memory at any given time – as soon as it encounters another context or unit declaration, it can forget about the previous one.
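A consumer relying on both rules might look something like the following Python sketch. The names `ImmediateHandler` and `on_fact` are hypothetical, the elements are un-namespaced and wrapped in an assumed `<xbrl>` root, and only the ids are tracked (a real context also carries entity and period details).

```python
import xml.sax


class ImmediateHandler(xml.sax.ContentHandler):
    """Process facts while remembering only the latest context and unit."""

    def __init__(self, on_fact):
        super().__init__()
        self.on_fact = on_fact    # callback receiving (fact, context, unit)
        self.context_id = None
        self.unit_id = None

    def startElement(self, name, attrs):
        if name == "context":
            self.context_id = attrs.get("id")   # previous context is forgotten
        elif name == "unit":
            self.unit_id = attrs.get("id")      # previous unit is forgotten
        elif "contextRef" in attrs:
            if attrs.get("contextRef") != self.context_id:
                raise ValueError(name + " does not use the latest context")
            unit_ref = attrs.get("unitRef")     # absent on non-numeric facts
            if unit_ref is not None and unit_ref != self.unit_id:
                raise ValueError(name + " does not use the latest unit")
            self.on_fact(name, self.context_id, unit_ref)


doc = (b'<xbrl><unit id="u1"/><context id="c1"/>'
       b'<fact1 contextRef="c1" unitRef="u1"/>'
       b'<context id="c2"/>'
       b'<fact1 contextRef="c2" unitRef="u1"/></xbrl>')
seen = []
xml.sax.parseString(doc, ImmediateHandler(lambda *fact: seen.append(fact)))
```

The handler's state is two ids, however large the document gets. A document that breaks either rule is rejected as soon as the offending fact is read.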

As it stands, the approach is a bit flawed. Where facts are reported against multiple contexts and multiple units, it results in a large amount of duplication of unit or context declarations, leading to an instance document that is unnecessarily large. Whilst real-world instance documents often have many contexts (due to the use of dimensions), it’s rare to see more than a handful of units. For units, then, the benefit of applying the second constraint is limited: holding a handful of units in memory costs very little.

To address this, my proposed solution would allow you to select one of three different serialisation conventions for each of units and contexts. The three options are:

None – no constraint on ordering, as per standard XBRL v2.1.
Pre-declare – units/contexts must be declared before they are used.
Immediate pre-declare – the referenced unit/context must be the most recent declaration.
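The three options form a strict ordering, so a document can be classified in a single pass. The Python sketch below works out the strictest convention a document satisfies for contexts and for units independently. The names `ConventionChecker` and `strictest_conventions` are hypothetical, and the elements are un-namespaced with an assumed `<xbrl>` root.

```python
import xml.sax


class ConventionChecker(xml.sax.ContentHandler):
    """Find the strictest convention met by contexts and by units."""

    KINDS = (("context", "contextRef"), ("unit", "unitRef"))

    def __init__(self):
        super().__init__()
        self.declared = {"context": set(), "unit": set()}
        self.latest = {"context": None, "unit": None}
        # Start from the strictest level and downgrade on each violation.
        self.level = {"context": "immediate", "unit": "immediate"}

    def startElement(self, name, attrs):
        if name in self.declared:
            self.declared[name].add(attrs.get("id"))
            self.latest[name] = attrs.get("id")
            return
        for kind, ref_attr in self.KINDS:
            ref = attrs.get(ref_attr)
            if ref is None:
                continue
            if ref not in self.declared[kind]:
                self.level[kind] = "none"          # used before declaration
            elif ref != self.latest[kind] and self.level[kind] == "immediate":
                self.level[kind] = "predeclare"    # declared, but not latest


def strictest_conventions(xml_bytes):
    checker = ConventionChecker()
    xml.sax.parseString(xml_bytes, checker)
    return checker.level


doc = (b'<xbrl><unit id="u1"/><unit id="u2"/><context id="c1"/>'
       b'<fact1 contextRef="c1" unitRef="u1"/>'
       b'<fact2 contextRef="c1" unitRef="u2"/>'
       b'<context id="c2"/>'
       b'<fact1 contextRef="c2" unitRef="u1"/></xbrl>')
result = strictest_conventions(doc)
# result == {"context": "immediate", "unit": "predeclare"}
```

Note that the checker remembers every id it has seen, which is acceptable for a validation tool but is exactly the overhead that a consumer trusting the conventions would avoid.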

An instance document would declare which serialisation conventions it adhered to by including a couple of additional attributes, e.g.:

contextSerialisationConvention="none|predeclare|immediate"
unitSerialisationConvention="none|predeclare|immediate"

These could take the form of either custom attributes on the xbrli:xbrl element or, perhaps more appropriately, a processing instruction.

The combination of contextSerialisationConvention="immediate" and unitSerialisationConvention="predeclare" is likely to be the most useful for typical documents.

Implementation

It should be noted that the approach described here is completely backwards compatible. Documents conforming to this proposal would be completely valid XBRL v2.1, and could be consumed by any XBRL v2.1 processor (provided that it could cope with the document size).

Where a reporting regime is likely to encounter large documents, it would be open to receivers to specify a minimum level of “streamability”. For example, they could insist that units are at least “pre-declared” and that contexts are “immediately pre-declared”.
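A receiver's check could be as simple as the following Python sketch. It assumes the conventions are declared as un-namespaced attributes on the root element, which is only one of the forms this proposal allows; the function names and the `RANK` table are hypothetical, and a missing attribute is treated as "none".

```python
import io
import xml.etree.ElementTree as ET

# Higher rank means a stricter, more streamable convention.
RANK = {"none": 0, "predeclare": 1, "immediate": 2}


def declared_conventions(xml_bytes):
    """Read the declared conventions from the root element's start tag."""
    for _, root in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        # The first start event is the root, so stop without going further.
        return {
            "context": root.get("contextSerialisationConvention", "none"),
            "unit": root.get("unitSerialisationConvention", "none"),
        }


def meets_minimum(declared, minimum):
    """True if every declared convention is at least as strict as required."""
    return all(RANK[declared[kind]] >= RANK[minimum[kind]] for kind in minimum)


doc = (b'<xbrl contextSerialisationConvention="immediate" '
       b'unitSerialisationConvention="predeclare"><unit id="u1"/></xbrl>')
minimum = {"context": "immediate", "unit": "predeclare"}
declared = declared_conventions(doc)
accepted = meets_minimum(declared, minimum)
```

This only tests what the document claims. A receiver would still need to detect documents that break their declared convention while processing them.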

