OpenCodeList Specification
Version 0.1.3
The key words "MUST", "REQUIRED", "SHOULD", and "MAY" in this document are to be interpreted as described in RFC2119 and RFC8174, when, and only when, they appear in all capitals, as shown here.
This specification is licensed under the Apache License, Version 2.0.
Introduction
The Open Code List Representation Format (OpenCodeList) defines a generic standard data format for representing code lists. Based on the JSON Standard, this format can be easily generated and read by almost any programming language. Documents in the OpenCodeList format can be validated for syntactic correctness using the OpenCodeList Document Schema.
OpenCodeList can be used for exchanging code lists between services or applications, as a representation format for official code lists, or as a response format for API requests (e.g., for RESTful web services).
What are Code Lists?
Code lists are structured collections of codes or keys used for the identification and classification of data. These code lists are essential in numerous fields such as databases, management systems, scientific research, and industrial applications. They help in organising, storing, and retrieving data consistently and efficiently.
Code lists play a crucial role in the standardisation and harmonisation of data. By using standardised codes, different systems and organisations can interpret and exchange data uniformly.
Examples of code lists include:
-
International codes for countries, languages, and currencies by the International Organization for Standardization (ISO).
-
National area codes (e.g., municipal codes of the Federal Republic of Germany).
-
National and subnational codes in the public sector (e.g., statistical codes from statistical authorities).
In principle, any data selection can be mapped to a code list; even a simple Boolean value can be represented by the codes No and Yes.
How are Code Lists structured?
The simplest form of a code list consists of two columns: key and description.
Here is an example of a country code list:
Code | Description |
---|---|
AT | Austria |
CH | Switzerland |
DE | Germany |
Code lists can potentially contain an unlimited number of columns. Here is an example of a country code list with three columns:
Code | Short Name | Long Name |
---|---|---|
AT | Austria | Republic of Austria |
CH | Switzerland | Swiss Confederation |
DE | Germany | Federal Republic of Germany |
Code lists can refer to each other. Here is an example of a country code list referring to another continents code list:
Code | Short Name | Long Name | Continent |
---|---|---|---|
AT | Austria | Republic of Austria | EU |
CH | Switzerland | Swiss Confederation | EU |
DE | Germany | Federal Republic of Germany | EU |
MA | Morocco | Kingdom of Morocco | AF |
The corresponding continents code list might look like this:
Code | Name |
---|---|
AF | Africa |
AM | Americas |
EU | Europe |
OC | Oceania |
Code lists can also have more than one key. Here is an example of a country code list with different ISO3166 codes:
Alpha2Code | Alpha3Code | NumericCode | Name |
---|---|---|---|
AT | AUT | 040 | Austria |
CH | CHE | 756 | Switzerland |
DE | DEU | 276 | Germany |
Let’s go a step further with this multilingual country code list:
Code | Language | Name |
---|---|---|
AT | en | Austria |
AT | de | Österreich |
CH | en | Switzerland |
CH | de | Schweiz |
DE | en | Germany |
DE | de | Deutschland |
Here, it is the combination of key and language (also represented by a language key) that defines uniqueness.
The structure of code lists can thus be simple or somewhat complex.
Are There Already Standards for Code Lists?
Yes, there are: The Organization for the Advancement of Structured Information Standards (OASIS) has defined an XML-based standard for code lists with Code List Representation (genericode).
The OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information. The XML format is designed to support interchange or distribution of machine-readable code list information between systems. Note that genericode is not designed as a run-time format for accessing code list information, and is not optimized for such usage. Rather, it is designed as an interchange format that can be transformed into formats suitable for run-time usage, or loaded into systems that perform run-time processing using code list information.
This sounds good, but there’s a catch. There is no JSON representation of "genericode".
Recognising the custom use of JSON in a tight binding between user-defined processes, the committee sees no purpose served by standardising a JSON syntax for the genericode vocabulary.
This means there is no official support for JSON as a data format and thus no official JSON schema.
So, if you want to represent code lists in XML format, the OASIS standard is recommended. However, if you want to work with JSON, OpenCodeList provides a suitable alternative. OpenCodeList is significantly influenced by the OASIS Code List Representation Format and strives to be largely compatible with it.
Definitions
OpenCodeList Document
An OpenCodeList document is a self-contained resource that defines and describes either a code list or a collection of code lists. It MUST contain at least the opencodelist
field and either codeList
or codeListSet
, but not both. An OpenCodeList document uses and conforms to the OpenCodeList specification.
Code List
A code list is a classic relational table with columns and data rows, where at least one column should serve as the key (code). OpenCodeList allows for defining generic code lists.
Code List Collection
A code list collection is a list of references to external code lists. A code list collection can be used to group different versions of a code list.
Specification
Versioning
The OpenCodeList specification is versioned according to the major.minor.patch
scheme. The major-minor part of the version number (e.g., 0.1
) MUST indicate the functional set of the specification. Patch versions address errors in this document or provide clarifications to this document, not to the functionality. Tools that support OpenCodeList version 0.1
MUST be compatible with all 0.1.*
versions of OpenCodeList. The patch version SHOULD NOT be considered by the tools, so that no distinction is made between 0.1.0
and 0.1.1
.
An OpenCodeList document always contains a mandatory opencodelist
field that specifies the version of the OpenCodeList specification being used.
Format
An OpenCodeList document that conforms to the OpenCodeList specification is itself a JSON object that can be represented in JSON format.
All field names in the specification are case-sensitive. The schema defines two types of fields: fixed fields that have a declared name, and free fields whose names MUST conform to a specific pattern. Additional fields MUST have unique names within the contained JSON object.
JSON Schema
JSON Schema is a specification for defining JSON data structures. A JSON schema itself is expressed declaratively using JSON. The OpenCodeList Document Schema is a JSON schema for OpenCodeList documents.
Multilingual Support
OpenCodeList supports multilingualism, meaning text strings can optionally be provided with one or more IETF BCP 47 language tags. Each language tag can then be followed by the corresponding translation.
Example:
"description": [
{
"language": "en",
"text": "Currency codes of the International Standard ISO 4217"
},
{
"language": "de",
"text": "Währungscodes der internationalen Norm ISO 4217"
}
]
Language Tags
IETF BCP 47 (Best Current Practice 47) is a document that defines the rules and procedures for creating and using language tags. These tags are used to identify the language of content on the Internet. BCP 47 is maintained by the Internet Engineering Task Force (IETF) and consists of two RFCs (Request for Comments): RFC 5646 and RFC 4647.
Language tags are necessary to ensure proper localisation and internationalisation of web content and software. They allow applications and services to provide the correct language content to users and support the accurate display of language-specific data such as date formats, numbers, and text direction.
A BCP 47 language tag consists of a series of subtags separated by hyphens. These subtags define various aspects of the language and its variants. The most basic components are:
-
Primary language subtag: This is a mandatory subtag consisting of a two- or three-letter code that denotes the main language (e.g.,
en
for English,de
for German). -
Region subtag: An optional subtag that specifies a country or region (e.g.,
US
for the United States inen-US
). -
Script subtag: An optional subtag that indicates the writing system (e.g.,
Latn
for the Latin alphabet inzh-Latn
). -
Variant subtag: An optional subtag that describes language variants or dialects (e.g.,
1901
for traditional German orthography inde-1901
). -
Extension subtag: Extensions for specifying additional information.
-
Private use subtag: A range reserved for private use and not standardised.
Dates and Times
The formatting of dates and times in OpenCodeList, as specified by JSON Schema, is defined by RFC 3339, Section 5.6.
Examples:
date-time
: Date and time together, e.g.,2024-11-13T20:20:39
or2024-11-13T20:20:39+00:00
.time
: Time only, e.g.,20:20:39
or20:20:39+00:00
.date
: Date only, e.g.,2024-11-13
.
URIs
When a Uniform Resource Identifier (URI) is required, it is important to distinguish between two JSON string formats depending on the context:
-
The
uri
format expects the JSON string to be an absolute URI. An absolute URI contains all the necessary information to identify the resource independently of its context. This means auri
:- begins with a scheme (like http, https, etc.),
- may contain a hostname or an IP address,
- and may optionally include a path, a query, and a fragment.
-
The
uri-reference
format is more flexible and allows both absolute and relative URIs.uri-reference
can be a complete URI as described above, or it can be a relative reference that needs to be interpreted in a specific context. A relative URI might include only a path or even just a fragment.
Examples:
uri
:https://example.com/path/to/resource?query=param#fragment
uri-reference
:https://example.com/path/to/resource?query=param#fragment
/path/to/resource
#fragment
Schema
OpenCodeList Object
This is the root object of an OpenCodeList document and contains the following fields:
opencodelist
-
A JSON string with the version number of the OpenCodeList specification. This field is REQUIRED.
codeList
-
A
codeList
object that includes the column definitions and data content of a code list. codeListSet
-
A
codeListSet
object that defines a collection of code lists.
The codeList
and codeListSet
fields are mutually exclusive. One of these fields is REQUIRED.
codeList Object
The codeList
object defines a complete code list along with its data:
info
-
A
codeListInfo
object containing metadata about the code list. This field is REQUIRED. columnSet
-
A
columnSet
object that defines the columns and unique keys of the code list. This field is REQUIRED. dataSet
-
A
dataSet
object that contains the data rows of the code list.
codeListInfo Object
The codeListInfo
object contains metadata about the code list:
shortName
-
A JSON string with the short name of the code list.
longName
-
The long name of the code list. This can be either a regular JSON string or a JSON array of
localizableString
objects. description
-
A detailed description of the code list. This can be either a regular JSON string or a JSON array of
localizableString
objects. notes
-
Additional notes about the code list. This can be either a regular JSON string or a JSON array of
localizableString
objects. changeLog
-
Documents the changes compared to previous versions of this code list. It MUST be a JSON array of JSON strings.
agency
-
An
agency
object with information about the organisation responsible for the publication and/or maintenance of the codes. version
-
A JSON string with the version of the code list.
validFrom
-
A JSON string in
date-time
format that defines the start date from which this code list is valid. validTo
-
A JSON string in
date-time
format that defines the end date up to which this code list is valid. canonicalUri
-
A JSON string in
uri
format. This URI uniquely identifies all versions (together) of this code list. This field is REQUIRED. canonicalVersionUri
-
A JSON string in
uri
format. This URI uniquely identifies a specific version of this code list. locationUri
-
Either a JSON string in
uri
format or a JSON array with JSON string values inuri
format. These URId are suggested retrieval locations for the referenced code list, in OpenCodeList format. publishedAt
-
A JSON string in
date-time
format with the publication date of this code list. publishedFrom
-
A
publisher
object with information about the publisher of this code list. language
-
A JSON string with the language of this code list. This MUST be an IETF BCP 47 language tag.
This object MAY be extended.
Example:
"info": {
"shortName": "GermanFederalStateCodes",
"longName": "ISO 3166-2 Codes for Germany",
"description": "ISO 3166-2 Codes for the federal states of Germany",
"agency": {
"shortName": "ISO",
"longName": "International Organization for Standardization",
"url": "https://www.iso.org/"
},
"version": "2024-07-12",
"canonicalUri": "urn:iso:std:iso:3166-2",
"canonicalVersionUri": "urn:iso:std:iso:3166-2:2024-07-12",
"locationUri": "https://raw.githubusercontent.com/openpotato/opencodelist/main/samples/germany.federal-state-codes-2024-07-12.json",
"publishedAt": "2024-07-12T10:00:00",
"publishedFrom": {
"shortName": "OpenCodeList",
"url": "https://openpotato.github.io/opencodelist/"
},
"language": "en"
}
codeListSet Object
The codeListSet
object defines a collection of code lists:
info
-
A
codeListSetInfo
object containing metadata about the code list collection. This field is REQUIRED. codeLists
-
A list of code lists. It MUST be a JSON array of
codeListRef
objects. This field is REQUIRED.
codeListSetInfo Object
The codeListSetInfo
object contains metadata about a code list collection:
shortName
-
A JSON string with the short name of the code list collection.
longName
-
The long name of the code list collection. This can be either a regular JSON string or a JSON array of
localizableString
objects. description
-
A short description of the code list collection. This can be either a regular JSON string or a JSON array of
localizableString
objects. summary
-
A longer summary of the content of the code list collection. This can be either a regular JSON string or a JSON array of
localizableString
objects. language
-
A JSON string with the language of this code list collection. This MUST be an IETF BCP 47 language tag.
This object MAY be extended.
Example:
"info": {
"shortName": "GermanFederalStateCodeListSet",
"longName": "ISO 3166-2 Code Lists for Germany",
"description": "All versions of the ISO 3166-2 Code Lists for the federal states of Germany",
"language": "en"
}
codeListRef Object
The codeListRef
object defines a reference to an external code list:
canonicalUri
-
A JSON string in
uri
format. This URI uniquely identifies all versions (together) of the referenced code list. This field is REQUIRED. canonicalVersionUri
-
A JSON string in
uri
format. This URI uniquely identifies a specific version of the referenced code list. locationUri
-
Either a JSON string in
uri
format or a JSON array with JSON string values inuri
format. These URId are suggested retrieval locations for the referenced code list, in OpenCodeList format.
columnSet Object
The columnSet
object defines columns and unique keys of a code list:
columns
-
Defines the columns of the code list. It MUST be a JSON array of
column
objects. This field is REQUIRED. keys
-
Defines the unique keys of the code list. It MUST be a JSON array of
key
objects. defaultKey
-
A JSON string with an ID that refers to a
key
object. This can be used to define the default key, that is, the preferred key fromkeys
to be used as the code source. references
-
Defines internal references as well as external references to other code lists. It MUST be a JSON array of
reference
objects.
column Object
The column
object defines a column for a code list:
id
-
A JSON string with the ID of the column. This field is REQUIRED.
name
-
The name of the column. This can be either a regular JSON string or a JSON array of
localizableString
objects. description
-
A short description of the column. This can be either a regular JSON string or a JSON array of
localizableString
objects. type
-
Defines the data type of the column. This field is REQUIRED. It MUST be a JSON string with one of these values:
string
enum
enum-set
integer
number
bool
time
date
date-time
object
nullable
-
A JSON boolean that defines whether this column can also contain JSON NULLs.
Depending on the value in type
, further properties are available.
string
Represents a JSON string. The following additional schema properties are available:
minLength
: A JSON number ininteger
format with the minimum allowable number of charactersmaxLength
: A JSON number ininteger
format with the maximum allowable number of characterspattern
: A JSON string with a regular expression that must always match the values in this column.language
: A JSON string with the language for the contents of this column. This MUST be an IETF BCP 47 language tag.
enum
A JSON string representing an enumeration. The following additional schema properties are available:
members
: Defines the possible values of the enumeration. It MUST be a JSON array ofenumMember
objects. This field is REQUIRED.
enum-set
A JSON string representing an enumeration set, formatted as a CSV string. The following additional schema properties are available:
delimiterChar
: A JSON string defining the CSV delimiter.quoteChar
: A JSON string defining the CSV quote character.members
: Defines the possible values of the enumeration. It MUST be a JSON array ofenumMember
objects. This field is REQUIRED.
integer
Represents a JSON number in integer
format. The following additional schema properties are available:
minValue
: A JSON number ininteger
format with the minimum allowable valuemaxValue
: A JSON number ininteger
format with the maximum allowable value
number
Represents a JSON number in number
format. The following additional schema properties are available:
minValue
: A JSON number innumber
format with the minimum allowable valuemaxValue
: A JSON number innumber
format with the maximum allowable value
date
Represents a JSON string in date
format. The following additional schema properties are available:
minValue
: A JSON string indate
format with the minimum allowable valuemaxValue
: A JSON string indate
format with the minimum allowable value
Here’s the translation into British English:
date-time
Represents a JSON string in the date-time
format. The following additional schema properties are available:
minValue
: A JSON string in thedate-time
format specifying the allowable minimum value.maxValue
: A JSON string in thedate-time
format specifying the allowable maximum value.
time
Represents a JSON string in the time
format. The following additional schema properties are available:
minValue
: A JSON string in thetime
format specifying the allowable minimum value.maxValue
: A JSON string in thetime
format specifying the allowable maximum value.
object
Represents a JSON object. The following additional schema properties are available:
schemaUri
: A JSON string in theuri
format. This URI is the location for retrieving the JSON schema for this column.
enumMember Object
The enumMember
object defines a value in an enumeration:
value
-
A JSON string with the enumeration value. This field is REQUIRED.
description
-
The description of the enumeration value. This can be either a standard JSON string or a JSON array with
localizableString
objects.
key Object
The key
object defines a unique key for the code list:
id
-
A JSON string with the unique ID of the key. This field is REQUIRED.
name
-
The name of the key. This can be either a standard JSON string or a JSON array with
localizableString
objects. description
-
A brief description of the key. This can be either a standard JSON string or a JSON array with
localizableString
objects. columnRefs
-
A JSON array of strings with IDs that each reference a
column
object. This field is REQUIRED.
reference Object
The reference
object defines a reference to either an internal or external code list:
codeListRef
-
A
codeListRef
object that points to an external code list. IfcodeListRef
is defined,keyRef
refers to akey
object from the referenced code list; otherwise, it refers to akey
object from this code list. keyRef
-
A JSON string with an ID that references a
key
object. This field is REQUIRED.
dataSet Object
The dataSet
object contains the data of a code list:
rows
-
Defines the data rows of the code list. It MUST be a JSON array of
row
objects. This field is REQUIRED.
row Object
The row
object defines a data row in a code list:
cells
-
Defines the data cells of the data row. It MUST be a JSON array of
cell
objects. This field is REQUIRED.
cell Object
The cell
object defines the content of a data cell:
columnRef
-
A reference to a code list column. A JSON string with an ID that references a
column
object. value
-
Defines the actual value of the data cell. Depending on the column definition, this can be a JSON null value, a JSON string, a JSON number, a JSON object, or a JSON array of
localizableString
objects. This field is REQUIRED.
localizableString Object
The localizableString
object is a text string associated with a language code:
language
-
A JSON string with a language tag. This MUST be an IETF BCP 47 language tag. This field is REQUIRED.
text
-
The actual text. This field is REQUIRED.
Example:
"longName": [
{
"language": "en",
"text": "ISO 3166-2 Codes for Germany"
},
{
"language": "de",
"text": "ISO 3166-2 Codes für Deutschland"
}
]
Extension of the Specification
The OpenCodeList specification can be extended with additional data at certain points.
The properties of these extensions are implemented as free fields, which must always be prefixed with x-
(e.g., x-external-id
). The value can be a string, a number, a boolean value, null, an object, or an array.
The extensions may or may not be supported by the available tools. Ideally, these tools can be extended to add the desired support (e.g., in Open Source projects).
Example:
"info": {
"shortName": "GermanFederalStateCodes",
"agency": {
"shortName": "ISO",
"longName": "International Organization for Standardization",
"x-contact-name": "ISO Central Secretariat",
"x-contact-address": "Chemin de Blandonnet 8, 1214 Geneva, Switzerland",
"x-contact-email": "central@iso.org"
}
}