HXL tagging conventions (version 1.1)
Release 1.1, 2018-04-30
Part of the HXL 1.1 standard.
1. Introduction
This document is part of the Humanitarian Exchange Language (HXL) version 1.1, a standard for increasing the efficiency and effectiveness of data exchange during humanitarian crises. This new version is fully backwards-compatible with data produced using HXL 1.0 (released 18 March 2016), and adds several new features, including JSON-based encodings and a standard way to refer to taxonomies/controlled vocabularies. There are also several new hashtags and attributes in the hashtag dictionary.

The intended audience for this specification is information-management professionals and software developers who require a formal definition of the HXL syntax. Most users who simply want to add hashtags to their data may prefer the HXL postcards and the tutorial information at hxlstandard.org, as well as interactive HXL tool support under development at the Humanitarian Data Exchange (HDX).

The HXL standard consists of two normative parts:
- HXL tagging conventions (this document): instructions for adding HXL hashtags to spreadsheets.
- HXL hashtag dictionary: a list of hashtags for identifying humanitarian data fields.

1.1. Design philosophy
HXL is a lightweight standard by design. Most data standards dictate to users how they should collect and format their data; HXL, on the other hand, encourages organisations to add hashtags to their existing datasets, without requiring new skills or software tools, and interferes as little as possible in their current ways of working. The primary focus of HXL is tabular-style data such as spreadsheets or API output from database tables, which represents the vast majority of the operational data collected in the humanitarian sphere; however, HXL hashtags can potentially have other applications, including labelling attributes for map layers or identifying data types in SMS messages.

1.2. Target audience
The standard’s primary audiences are information-management specialists who are familiar with spreadsheets or relational databases, and computer programmers and database specialists looking to consume data produced by those information-management specialists.

1.3. Terms of use
HXL is available as an open standard: the working groups have designed it for use with humanitarian data, but people and organisations are welcome to use it for any purpose they choose. Note, however, that users may not claim support or endorsement from any members of the HXL working group or the organisations for which they work. The authors offer no warranty of any kind, so implementors use the standard at their own risk. The text of the standard itself is released into the public domain.

2. Adding HXL hashtags to data
2.1 Spreadsheet data (e.g. CSV, XLS, XLSX)
Consider the following simple spreadsheet:

LOCATION NAME | LOCATION CODE | NUMBER AFFECTED |
---|---|---|
Camp A | 01000001 | 2000 |
Camp B | 01000002 | 750 |
Camp C | 01000003 | 1920 |

Datasets like this (longer, of course, and with more columns) are the backbone of humanitarian information management, and they provide the input for most reports, maps, and visualisations coming out of a crisis. Unfortunately, creating those data products is time-consuming, and responders have to duplicate the work from crisis to crisis and even dataset to dataset, because it is hard to build reusable software tools that can understand the many different ways responders may choose to label their data. For example, the text header of the last column could have appeared in dozens of variants, and in several different languages:
- Number affected
- Affected
- People affected
- # de personnes concernées
- Afectadas/os
- عدد الأشخاص المتضررين

HXL addresses this problem by adding a row of hashtags between the text headers and the data:

LOCATION NAME | LOCATION CODE | NUMBER AFFECTED |
---|---|---|
#loc +name | #loc +code | #affected |
Camp A | 01000001 | 2000 |
Camp B | 01000002 | 750 |
Camp C | 01000003 | 1920 |

Now, whether the text at the top of the column reads “Number affected” or “عدد الأشخاص المتضررين”, software for cleaning, validating, analysing, mapping, or visualising the data can automatically recognise the hashtag #affected and use the figures below accordingly.

More than one row of headers may appear above the HXL hashtag row; the hashtags themselves act as a marker to show automated systems where the headers end and the data begins:

CAMP INFORMATION | | NEEDS |
---|---|---|
LOCATION NAME | LOCATION CODE | NUMBER AFFECTED |
#loc +name | #loc +code | #affected |
Camp A | 01000001 | 2000 |
Camp B | 01000002 | 750 |
Camp C | 01000003 | 1920 |

HXL software should expect to find the hashtag row anywhere within the first 25 rows of a dataset and should assume that all rows below the hashtag row contain data.
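The rule that the hashtag row may appear anywhere within the first 25 rows, with everything below it treated as data, can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the simplified hashtag pattern, and the test "every non-empty cell must look like a hashtag" are assumptions made for this example and are not defined by the standard.

```python
import csv
import io
import re

# Loose pattern for a cell such as "#loc +name" (hashtag plus optional attributes).
# Assumption: this approximates the HXL syntax; it is not the formal grammar.
HASHTAG_CELL = re.compile(
    r"^\s*#[A-Za-z][A-Za-z0-9_]*(\s*\+[A-Za-z][A-Za-z0-9_]*)*\s*$"
)

MAX_SCAN_ROWS = 25  # the hashtag row is expected within the first 25 rows


def find_hashtag_row(rows):
    """Return the index of the first row that looks like a hashtag row, or None."""
    for index, row in enumerate(rows[:MAX_SCAN_ROWS]):
        cells = [cell for cell in row if cell.strip()]
        # Assumption: a hashtag row has at least one non-empty cell,
        # and every non-empty cell matches the hashtag pattern.
        if cells and all(HASHTAG_CELL.match(cell) for cell in cells):
            return index
    return None


def split_dataset(rows):
    """Split rows into (header rows, hashtag row, data rows)."""
    index = find_hashtag_row(rows)
    if index is None:
        raise ValueError("no HXL hashtag row found in the first 25 rows")
    return rows[:index], rows[index], rows[index + 1:]


SAMPLE = """CAMP INFORMATION,,NEEDS
LOCATION NAME,LOCATION CODE,NUMBER AFFECTED
#loc +name,#loc +code,#affected
Camp A,01000001,2000
Camp B,01000002,750
Camp C,01000003,1920
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
headers, hashtags, data = split_dataset(rows)
print(hashtags)   # the hashtag row
print(len(data))  # number of data rows below it
```

Note that a text header such as "# de personnes concernées" does not match the pattern, because the "#" is followed by a space rather than a tag name, so it is not mistaken for a hashtag.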
2.2 JSON data
It is becoming increasingly common for organisations to share data through APIs. HXL is well placed to add interoperability to that data through its support for JSON, the format most commonly used by APIs. HXL is purposely restricted to a simplified subset of the full JSON specification. In this simplified subset, the data must be laid out in a non-hierarchical, tabular form. Two such forms are currently supported.

2.2.1. Array of objects JSON style
This is a very common way to present data: each row is an object that maps hashtag keys to values:

    [
      { "#hashtag": value, "#hashtag": value },
      { "#hashtag": value, "#hashtag": value }
    ]

An example of this is shown below:

    [
      {
        "#event+id": 1,
        "#affected+killed": 1,
        "#region": "Mediterranean",
        "#meta+source+reliability": "Verified",
        "#date+reported": "05/11/2015",
        "#geo+lat": 36.891500,
        "#geo+lon": 27.287700
      },
      {
        "#event+id": 3,
        "#affected+killed": 1,
        "#region": "Central America incl. Mexico",
        "#meta+source+reliability": "Partially Verified",
        "#date+reported": "03/11/2015",
        "#geo+lat": 15.956400,
        "#geo+lon": -93.663100
      }
    ]

Where a spreadsheet would repeat the same hashtag, e.g. several #sector columns to express multiple sectors, the equivalent in this format is a comma-separated list of sectors (see 4.1.1. The +list attribute), e.g.

    "#sector": "WASH,health"

Note that the array of objects form does not allow for human-readable headers. If there is demand for these, and the array of arrays form outlined below does not suffice, they will appear in a future version of the standard.

Note: HXL allows hashtag attributes to appear in any order, case-insensitive, with or without whitespace separating them, so these are all considered equivalent: “#affected+f+children”, “#affected +children +f”, and “#affected+Children+F”. In JSON objects, it is essential that the property names be consistent, so you should take the following steps when converting a HXL hashtag specification (hashtag and attributes) for use as a JSON object property:
- Convert to lowercase.
- Remove all whitespace.
- Present the attributes in US-ASCII alphabetical order.
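The three steps above can be sketched in Python as follows. The function name `normalise_hashtag_spec` is hypothetical, and the sketch assumes the input is already a syntactically valid hashtag specification.

```python
import re


def normalise_hashtag_spec(spec):
    """Normalise a hashtag specification for use as a JSON property name."""
    # Steps 1 and 2: convert to lowercase and remove all whitespace.
    compact = re.sub(r"\s+", "", spec.lower())
    # Split into the hashtag itself and its attributes.
    tag, *attributes = compact.split("+")
    # Step 3: sorting lowercase ASCII strings yields US-ASCII alphabetical order.
    return "+".join([tag] + sorted(attributes))


for spec in ["#affected+f+children", "#affected +children +f", "#affected+Children+F"]:
    print(normalise_hashtag_spec(spec))  # prints "#affected+children+f" each time
```

All three equivalent spellings collapse to the same property name, which is what makes lookups by key reliable in the array of objects form.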
2.2.2. Array of arrays JSON style
Although not widely used, this form is well suited to visualisations because it is significantly more compact than the array of objects form, since the hashtags are defined only once, in the first element of the array:

    [
      ["#hashtag", "#hashtag"],
      [value, value],
      [value, value]
    ]

Below is an example:

    [
      ["#event+id", "#affected+killed", "#region", "#meta+source+reliability", "#date+reported", "#geo+lat", "#geo+lon"],
      [1, 1, "Mediterranean", "Verified", "2015-11-05", 36.891500, 27.287700],
      [3, 1, "Central America incl. Mexico", "Partially Verified", "2015-11-03", 15.956400, -93.663099]
    ]

If headers are needed, they can be added as an extra array before the hashtags.
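Because both JSON forms carry the same tabular data, one can be derived from the other. The following Python sketch converts the array of arrays form into the array of objects form, merging repeated hashtags into a comma-separated list as described in 2.2.1. The function name, the way the hashtag row is detected, and the #sector sample values are assumptions for illustration only; property names are assumed to be normalised already.

```python
import json


def arrays_to_objects(table):
    """Convert array-of-arrays HXL JSON into the array-of-objects form."""
    # Assumption: the hashtag row is the first row in which every cell is a
    # string starting with "#"; any rows before it are headers and are dropped.
    for index, row in enumerate(table):
        if row and all(isinstance(c, str) and c.startswith("#") for c in row):
            hashtags, data = row, table[index + 1:]
            break
    else:
        raise ValueError("no hashtag row found")

    objects = []
    for row in data:
        # Group the values of each row by hashtag, keeping column order.
        grouped = {}
        for tag, value in zip(hashtags, row):
            grouped.setdefault(tag, []).append(value)
        obj = {}
        for tag, values in grouped.items():
            if len(values) == 1:
                obj[tag] = values[0]
            else:
                # Repeated hashtag: join the non-empty values with commas.
                obj[tag] = ",".join(str(v) for v in values if v not in ("", None))
        objects.append(obj)
    return objects


# Illustrative data: the #sector columns are made up for this example.
table = [
    ["Event ID", "Region", "Sector", "Sector"],  # optional header array
    ["#event+id", "#region", "#sector", "#sector"],
    [1, "Mediterranean", "WASH", "health"],
    [3, "Central America incl. Mexico", "health", ""],
]

objects = arrays_to_objects(table)
print(json.dumps(objects, indent=2))
```

The header array is discarded during conversion because the array of objects form has no place for human-readable headers.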