Organizations outside of HL7 need guidance on assigning canonical URLs to the code systems, code system supplements, and sometimes value sets that they publish and that are (or are expected to be) used in FHIR-based implementations. Any terminological content that is licensed for redistribution or republishing (through agreements negotiated by the HL7 Terminology Authority) is published and distributed via HL7 Terminology, maintained through the UTG process; this also requires every code system and value set to have a canonical URL. There are also organizations whose terminological content is widely used in healthcare IT systems but has no canonical URL, or where the responsible organization chooses not to assign one.

The goal for a Code System canonical URL is that it is:

  1. Stable over time. The URL identifies the code system of captured information in numerous clinical and administrative healthcare systems (potentially tens or hundreds of thousands of installations), so a change to the URL would cause both hardship for the industry and a risk of patient errors in longitudinal data look-ups and searches. Once established and in use, the canonical URL should never be changed.
    1. That is, even if the organization responsible for managing the code system, the preferred descriptive name for the code system, or other key metadata changes, the canonical URL should remain unchanged.
  2. Human readable and recognizable. Experience has shown that using meaningless identifiers for things like code systems slows development and debugging of implementations and makes them more expensive, and the current sense is that the advantages of meaningless identifiers do not make up for the increased cost of system development. Meaningful identifiers are also slightly less susceptible to keying errors that result in the wrong URL being used.
  3. Preferably a resolvable endpoint, i.e. pasting it into a browser does something more than return a "404 - not found" error. In some cases this might be a terminology server endpoint or a formal FHIR resource definition, but in many cases it will just point to a documentation page about the code system.
  4. Machine processable by XML tools, i.e. the syntax of the URL must contain no special characters or embedded active script commands. It should also not be specific to the format of the page (i.e. it shouldn't end in .pdf, .html, .htm, etc.).
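
The syntax-related goals above can be checked mechanically. The following is a minimal sketch, not an HL7-defined rule set: the function name `check_canonical_url`, the list of format suffixes, the whitelist of path characters, and the decision to flag query strings and fragments are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Page-format suffixes the guidance says a canonical URL should not end in.
FORMAT_SUFFIXES = (".pdf", ".html", ".htm")

# Assumption of this sketch: a conservative whitelist of path characters
# (letters, digits, and a few separators). This is not an HL7 rule.
SAFE_PATH = re.compile(r"^[A-Za-z0-9._~/-]*$")


def check_canonical_url(url: str) -> list:
    """Return a list of problems found; an empty list means none detected."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not an absolute http(s) URL")
    if parts.query or parts.fragment:
        # Assumption: query strings and fragments are treated as unwanted.
        problems.append("contains a query string or fragment")
    if not SAFE_PATH.match(parts.path):
        problems.append("path contains special characters")
    if parts.path.lower().endswith(FORMAT_SUFFIXES):
        problems.append("ends in a page-format suffix")
    return problems
```

For example, `check_canonical_url("http://snomed.info/sct")` reports no problems, while a URL ending in `.html` is flagged for its page-format suffix. Stability and human readability cannot be checked this way and still need editorial review.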

Discussion Points:

  1. Organizations have the flexibility to name their servers, domains, and folders in whatever way makes the most sense to their business. If, as time goes on, the original canonical URL no longer closely matches what the organization considers to be its primary key words, this is not a problem: all servers that support DNS and internet browsing can perform a simple redirect to a browsable web page for their domain (usually the root of their main server), or to any folder under that domain on that server. This is a server configuration parameter, supported by all server technologies and products. It permits the entry page for the code system canonical URL to be anything the publisher wants it to be; all that is required is the server configuration parameter that makes the canonical URL redirect to that page.
    1. That is, the canonical URL should be a 'permalink'.
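
In practice the redirect described above would normally be a one-line setting in the web server's configuration. To show the idea in a self-contained form, here is a rough sketch using Python's standard `http.server` module. The paths, the `www.example.org` target, and the names `REDIRECTS`, `resolve_redirect`, `PermalinkHandler`, and `serve` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from permanent canonical paths to wherever the
# publisher currently keeps the documentation page. Only the right-hand
# side ever changes; the canonical path stays fixed.
REDIRECTS = {
    "/codes/example-system": "https://www.example.org/products/terminology/overview",
}


def resolve_redirect(path: str):
    """Return the current landing page for a canonical path, or None."""
    return REDIRECTS.get(path.rstrip("/") or "/")


class PermalinkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve_redirect(self.path)
        if target is None:
            self.send_error(404, "Unknown canonical URL")
            return
        # 302 signals that clients should keep using the canonical URL
        # rather than replacing it with the current landing page.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the redirect server until interrupted (not called here)."""
    HTTPServer(("localhost", port), PermalinkHandler).serve_forever()
```

When the publisher reorganizes its website, only the entry in `REDIRECTS` is updated; every system that stored the canonical URL is unaffected.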

Recommendations:

12 Comments

  1. Do we have formal definitions for "redistribution or republishing"?  I am sure we have discussed them, but my memory fades at the conclusions.

    Also, somewhere in this policy I'd like to see either a statement or a hyperlink to a formal statement elsewhere that describes the purpose of the canonical URL for the code system.  It is only against the purpose that the goals listed above can be judged or fulfilled.  I have searched a bit in the FHIR spec, and found something I could paste in here as a suggestion, but there may be others better placed to do that than me.  

    The opening paragraph mentions value sets, but the title and the listed goals refer to code systems.  I think we should be clear that this is code systems only.

  2. Julie, I will dig up the references to the need for unique stable identifiers for code systems.  This has been well known and well documented since the days of the Desiderata.

    Although the immediate and most pressing need is to identify Code Systems, the same requirement exists for Value Sets.    At this time, HL7/UTG does not have any value sets curated by organizations outside of HL7.    Very soon, we will have maybe half a dozen of them (ones from CDC and some other places) that are used very widely in HL7 IGs.  Note that ALL implementations using FHIR resources and tools REQUIRE unique URLs for BOTH code systems and value sets. This goes for FHIR terminology services as well.

  3. We also need to add that we prefer http:// over https://.

  4. You may wish to review the identifier best practices in this three-year community manifesto: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2001414

    Many of the recommendations there also apply to use of terminologies here. 

    In the semantic web world, we ideally recommend release URIs that are resolvable to a file, and term URIs that are resolvable to a term landing page. 

    1. Thanks for this.  Lots of good thoughts, though I only had time to skim.

      Our 'ideal' is that the page will resolve to XML, JSON, RDF or HTML based on the 'Accept' header, with a default of HTML.  That's something we can pull off for URLs in the HL7 space, but is unlikely to be something that URLs elsewhere can be expected to do.  (Which is yet one more reason to prefer HL7-assigned SIDs to potentially transient website URLs assigned by the owner of the code system/identifier space.)

      Lesson 4 about avoiding embedded meaning or requiring uniqueness definitely doesn't hold for our use.  We want these URIs to be meaningful to the software developers who have to work with them.  Meaningless identifiers are extremely error prone and hard to debug.  Also, for interoperability, we need each namespace identifier to be consistently used by all systems and to always mean the same thing.

      Lesson 6 also doesn't apply for code systems as versioning of code systems is handled outside the URL.  For identifier systems, it only comes into play if identifiers within the system get recycled over time.

      We don't have to worry about local URIs as canonical references must always be absolute.
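
The 'Accept'-header behaviour described in this reply could look roughly like the sketch below. It is an illustration only: the function name `choose_format` and the table of media types are assumptions, and a production server would use its framework's content negotiation rather than hand-parsing the header.

```python
# Assumed mapping from media types to the four representations mentioned
# in the reply (XML, JSON, RDF, HTML).
SUPPORTED = {
    "application/fhir+json": "json",
    "application/json": "json",
    "application/fhir+xml": "xml",
    "application/xml": "xml",
    "text/turtle": "rdf",
    "text/html": "html",
}


def choose_format(accept_header: str) -> str:
    """Pick a representation for an Accept header, defaulting to HTML."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0  # unparsable weight: treat as refused
        if media_type in SUPPORTED and quality > 0:
            # Highest quality wins; ties go to the earlier entry.
            candidates.append((-quality, position, SUPPORTED[media_type]))
    return min(candidates)[2] if candidates else "html"
```

A browser, which typically asks for `text/html` first, gets the HTML page, while a client sending `Accept: application/fhir+json` gets JSON. Anything unrecognized, including wildcards such as `*/*`, falls back to HTML.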

  5. I think it is reasonable that HL7 requires that there is a release URI that resolves to a file rather than a website. 

    A fairly large portion of the ontology community will never use labels in their URIs, so it would certainly be a disservice to HL7 not to leverage such terminology resources (in fact the Vulcan accelerator will require we solve this problem). IMHO this is a tooling issue: the devs need tools that retrieve the labels in a standardized manner. I also feel strongly that code systems are versioned and that there should be resolvable URIs to each release; I'm not sure if HL7 could or should require this, but we usually do.



    1. We don't want a release URI.  The URIs that we're talking about are for identifying code systems and identifier systems.  These can't change by release.  The URI for SNOMED will be the same forevermore, no matter how many releases SNOMED puts out.  We have an alternate way of specifying a particular SNOMED release that doesn't involve changing the code system's URI.  Similarly, the URI for "Michigan state driver's license" will be immutable - unless Michigan recycles their driver's license identifiers, in which case we'd ideally want distinct URIs for particular time-periods to ensure uniqueness.

    2. FHIR actually doesn't care whether the URIs resolve or not – within the spec they are treated purely as Identifiers.

      If they resolve (as JSON, XML, HTML, etc) then that can be very convenient and useful for people, but it's not relevant to how the terminology API operates.

      Having said that, "resolve" is an overloaded word.  One can think of it as direct resolution (e.g. a GET request on http://snomed.info/sct) or as logical resolution by a resolving service such as the FHIR terminology API.  In the latter case, that would be a GET on [base]/CodeSystem?url=http://snomed.info/sct&version=http://snomed.info/sct/123456789/version/20210309
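
As a small illustration of the logical-resolution request above, the search URL can be assembled as follows. This is a sketch: `code_system_search_url` is a hypothetical helper, and the base address passed to it stands in for `[base]`. The `url` and `version` parameter names are taken from the example in the comment. Note that the parameter values are themselves URLs, so they are percent-encoded when placed in the query string.

```python
from typing import Optional
from urllib.parse import urlencode


def code_system_search_url(base: str, system: str,
                           version: Optional[str] = None) -> str:
    """Build a CodeSystem search URL for a canonical URL and optional version."""
    params = {"url": system}
    if version is not None:
        params["version"] = version
    # urlencode percent-encodes ':' and '/' inside the parameter values.
    return f"{base.rstrip('/')}/CodeSystem?{urlencode(params)}"
```

Calling it with a base of `https://tx.example.org/fhir` (a made-up server), `http://snomed.info/sct`, and the versioned URI from the comment yields the same request as the GET shown above, in its encoded form.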


  6. We require both the parent URI of the code system itself, as well as the versioned URI. That way everyone can have what they need (smile).  For rapidly changing code systems that are used in translational research and in diagnostics, it is important evidence-wise to know what version was used. The option to include the version number as part of the provenance is really important.

    1. Not sure what you mean when you say "we require".  For FHIR, you're not allowed to have a versioned URI for the code system.  Instead, you must send the URL using the canonical syntax where you have the URL followed by "|" followed by the version number as exposed in the code system.  That's how the standard works, and there's no way to be conformant using a different mechanism.  I.e. we fully support referring to a specific release, but we don't do that by changing the URI, instead the release is communicated via a distinct element.

      1. If we look at SNOMED, then it has "both the parent URI of the code system itself, as well as the versioned URI".  The parent URI is http://snomed.info/sct and http://snomed.info/sct/123456789/version/20210309 is an example versioned URI.

        The key is that using them in FHIR you always use the "parent URI" as the CodeSystem url and the versioned URI as the version "string".

        In FHIR, as Lloyd McKenzie says, when you need to convey the pair (because we need to handle non-URI versions) you join them with a "|" and the resulting type is known as a "canonical".  For example http://snomed.info/sct|http://snomed.info/sct/123456789/version/20210309 and http://loinc.org|2.68

        I don't believe there's a major conflict here.
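
To make the "|" convention in this thread concrete, here is a minimal sketch of splitting and joining such a reference. The names `CanonicalRef`, `split_canonical`, and `join_canonical` are hypothetical, and the sketch ignores any other parts a canonical reference might carry (for instance a fragment).

```python
from typing import NamedTuple, Optional


class CanonicalRef(NamedTuple):
    url: str                 # the stable code system URL
    version: Optional[str]   # version string, which may itself be a URI


def split_canonical(canonical: str) -> CanonicalRef:
    """Split 'url|version' at the first '|'; version is None if absent."""
    url, sep, version = canonical.partition("|")
    return CanonicalRef(url, version if sep else None)


def join_canonical(url: str, version: Optional[str] = None) -> str:
    """Join a URL and an optional version into a single reference."""
    return url if version is None else f"{url}|{version}"
```

With the examples from the thread, `split_canonical("http://loinc.org|2.68")` separates the stable URL `http://loinc.org` from the version `2.68`, and the SNOMED reference splits into the parent URI and the versioned URI. In both cases the code system URL stays the same from release to release.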

  7. I'd be happy as a clam if every URL for a value set or code system gets to something, rather than giving a 404 error.