Getting started with the public ORCID API using swagger – quickstart guide

ORCID recently implemented a swagger definition file for it’s v2.0 API, which means it’s now even easier to access the public ORCID API from your website.  Just use swagger.js.  It’s Super.  And Easy.

Let’s give it a go.

First, clone swagger onto your machine.  Either use the git desktop client, click the button on the repository or fetch it like this if you’re on Linux or OSX:

Next, create a simple webpage called orcid.html in the swagger-js directory.  This is just so we can play around, if you move to production you’ll want to organise your code differently.  Something like this will work fine:

Load the webpage in your browser, then pat yourself on the back.  You’ve just used the ORCID API and written the JSON response to the web page!


That’s not very user friendly though, so let’s not stop there.  Let’s use the data for something useful and make it fancy.  This time we’re going to extract a list of titles for all of the works in the ORCID record.  Create another .html file and paste this into it:

Fantastic.  It should look something like this:


That’s fine, as far as it goes, but we’re not using the real power of identifiers here.  Let’s put a bit more effort in and create some links.  The code below does two things; first it restructures the JSON into something more useful, second it checks to see if we’ve got some kind of link we can apply (from a DOI, Handle or URI).  I’ve only pasted in the main function for brevity.

Which gives you some lovely extra info:


Hopefully that’s got you up and running with a small insight into what it possible.  Next time I’ll run through using the member API (including updating records) in the same way.

Differences between ORCID and DataCite Metadata

(written by Martin Fenner and cross posted from the Datacite blog – I’m one of the co-authors of the report)

One of the first tasks for DataCite in the European Commission-funded THOR project that started in June was to contribute to a comparison of the ORCID and DataCite metadata standards. Together with ORCID, CERN, the British Library and Dryad we looked at how contributors, organizations and artefacts – and the relations between them – are described in the respective metadata schemata, and how they are implemented in two example data repositories, Archaeology Data Service and Dryad Digital Repository. The focus of our work was on identifying major gaps. Our report was finished and made publicly available via last week . The key findings are summarized below:

  • Common Approach to Personal Names
  • Standardized Contributor Roles
  • Standardized Relation Types
  • Metadata for Organisations
  • Persistent Identifiers for Projects
  • Harmonization of ORCID and DataCite Metadata

Common Approach to Personal Names

While a single input field for contributor names is common, separate fields for given and family names are required for proper formatting of citations. As long as citations to scholarly content rely on properly formatted text rather than persistent identifiers, services holding bibliographic information have to support these separate fields. Further work is needed to help with the transition to separate input fields for given and famliy names, and to handle contributors that are organizations or groups of people.

Standardized Contributor Roles

The currently existing vocabularies for contributor type (DataCite) andcontributor role (ORCID) provide a high-level description, but fall short when trying to describe the author/creator contribution in more detail. Project CRediT is a multi-stakeholder initiative that has developed a common vocabulary with 14 different contributor roles, and this vocabulary can be used to provide this detail, e.g. who provided resources such as reagents or samples, who did the statistical analysis, or who contributed to the methodology of a study.

CRediT is complementary to existing contributor role vocabularies such as those by ORCID and DataCite. For contributor roles it is particularly important that the same vocabulary is used across stakeholders, so that the roles described in the data center can be forwarded first to DataCite, then to ORCID, and then also to other places such as institutional repositories.

Standardized Relation Types

Capturing relations between scholarly works such as datasets in a standardized way is important, as these relations are used for citations and thus the basis for many indicators of scholarly impact. Currently used vocabularies for relation types between scholarly works, e.g. by CrossRef and DataCite, only partly overlap. In addition we see differences in community practices, e.g. some scholars but not others reserve the term citation for links between two scholarly articles. The term data citation is sometimes used for all links from scholarly works to datasets, but other times reserved for formal citations appearing in reference lists.

Metadata for Organisations

Both ORCID and DataCite not only provide persistent identifiers for people and data, but they also collect metadata around these persistent identifiers, in particular links to other identifiers. The use of persistent identifiers for organisations lags behind the use of persistent identifiers for research outputs and people, and more work is needed.

Persistent Identifiers for Projects

Research projects are collaborative activities among contributors that may change over time. Projects have a start and end date and are often funded by a grant. The existing persistent identifier (PID) infrastructure does support artefacts, contributors and organisations, but there is no first-class PID support for projects. This creates a major gap that becomes obvious when we try to describe the relationships between funders, contributors and research outputs.

Both the ORCID and DataCite metadata support funding information, but only as direct links to contributors or research outputs, respectively. This not only makes it difficult to exchange funding information between DataCite and ORCID, but also fails to adequately model the sometimes complex relationships, e.g. when multiple funders and grants were involved in supporting a research output. We therefore not only need persistent identifiers for projects, but also infrastructure for collecting and aggregating links to contributors and artefacts.

Harmonization of ORCID and DataCite Metadata

We identified significant differences between the ORCID and DataCite metadata schema, and these differences hinder the flow of information between the two services. Several different approaches to overcome these differences are conceivable:

  1. only use a common subset, relying on linked persistent identifiers to get the full metadata
  2. harmonize the ORCID and DataCite metadata schemata
  3. common API exchange formats for metadata

The first approach is the linked open data approach, and was designed specifically for scenarios like this. One limitation is that it requires persistent identifiers for all relevant attributes (e.g. for every creator/contributor in the DataCite metadata). One major objective for THOR is therefore to increase the use of persistent identifiers, both by THOR partners, and by the community at large.

A common metadata schema between ORCID and DataCite is neither feasible nor necessarily needed. In addition, we have to also consider interoperability with other metadata standards (e.g. CASRAI, OpenAIRE, COAR), and with other artefacts, such as those having CrossRef DOIs. What is more realistic is harmonization across a limited set essential metadata.

The third approach to improve interoperability uses a common API format that includes all the metadata that need to be exchanged, but doesn’t require the metadata schema itself to change. This approach was taken by DataCite and CrossRef a few years ago to provide metadata for DOIs in a consistent way despite significant differences in the CrossRef and DataCite metadata schema. Using HTTP content negotiation, metadata are provided in a variety of formats.

C# FluentValidation – why we’re using it

A bit of background

I’ve been working in the C# world for a few months now. While the code is very similar to Java,the culture around open source could not be more different.  Where open source as business as usual in the Java & Javascript worlds, it’s very much exceptional circumstances only in the .Net one.  It took a bit of an extra push from myself to convince a client they should be looking at Fluent Validation for their validation needs, but I’m fairly certain (and more importantly they’re fairly certain) it was the right idea.

Fluent Validation

Fluent validation is an apache 2 licensed library that’s just moved from codeplex to github.  It enabled developers to code validation rules in a fluent manner and apply them to models.  In contrast to many modern validation approaches where the rules are declaratively mixed up with the models themselves using attributes (or annotations in Java parlance) fluent validation very firmly separates the models from the rules and has you define Validator classes instead.  Like this:


The scenario

Their requirements are not exactly simple, but not particularly odd either.  They’ve got a large legacy code base that encapsulated a bunch of business logic, validation and data access.  They’re gradually migrating away from it because it’s slow and difficult to maintain, which is a pretty common scenario.  Due to the way the code base has evolved and the varying requirements they’re attempting to fulfill there are now four different ways into the database. To be clear, that’s four different implementations of their domain model and bindings to the underlying SQL schema.   Oh, and of course there are also custom validation rules coded by individual clients in various places and ways, not to mention that many of the rules for one entity depend on data from another. Currently, the default ‘always-apply’ rules are only in one place – the slow and inflexible legacy components.

The solution to all this is to extract the validation rules out of the legacy components and create a modern, fast and re-usable validation library that is data layer agnostic and can support configurable rule-sets.

Why Fluent Validation?

The main reason was flexibility.  We needed something that could attack such a wide variety of use cases it had to be able to validate anything in any context.  Fluent Validation fitted the bill

Other reasons included:

  • Speed of development – it’s really easy to work with.
  • Decoupled rules and models – meaning we could write multiple (and possibly conflicting, don’t ask) rule-sets against the same models.  In fact, enabling customers to write their own validators was pretty trivial exercise (see a future post on this)
  • Speed of execution – the rules are immutable objects meaning you can create them once and cache them.  When the rules are part of the model you’re needlessly setting them up every time.  This is not a problem when working with one simple entity, but quickly becomes a performance problem when you’re working with multiple complex validators over 50,000 rows. Benchmarking showed that we saved considerable time doing this – one import went from 12 minutes to 1 minute when we started caching validators.
  • Ability to work with odd relationships and dependent entities – many of the entities we work with have dependencies on other things in the system, even though they’re not ‘formally’ related.  With fluent validation we could write validators that handle this easily.

ORCiD Java Client now supports schema version 1.2!

Thanks to the hard work of Stephan Windmüller (@Stovocor) the ORCiD client library now supports version 1.2 of the ORCiD schema.  He’s also updated the companion ORCiD profile updater web app to use the new library.

Users have probably noticed that github have siliently dropped support for the way we were hosting the maven repository and we’re now looking to move it to Maven Central.  Not only is that  great news for anyone out there who was thinking of upgrading, it’s great news for the library’s future.

Goodbye to the British Library, hello corporate life.

I’ve moved on.  I had a great couple of years working at the library and met a ton of really enthusiastic folk.  The ODIN project came to an end and there was little left for me to do, so I’ve found myself a new workplace more local to home.

I’m a delivery engineer, apparently.  I’ve been here a couple of days and I’m still not sure what I’ll be delivering, but otherwise first impressions are very good.  I’m currently sat in a room full of graduates writing some sort of game and they seem to be enjoying themselves.  There’s a Waitrose round the corner meaning I get a couple of free coffees a day and soon I’ll be able to cycle to work.  Working from home has piled on the pounds so burning it off with cycle power would be fantastic.  It’s a corporate code shop, but seems agile and modern in it’s approach.  Time will tell, but all my kit is top notch and they support staff and HR have been great.

I’ll continue posting technical blog posts on whatever I’m working on as and when I encounter new stuff myself, which I think is going to be rather a lot in the coming months 🙂

eta: I’m now a Tech lead!

ETD2014 slides

I’m having a great time attending the ETD2014 conference.  There’s been lots of lively discussion around ORCiD and DOIs and it’s been fantastic to gather wider perspectives.  It’s also been great to get some coding in adapting the import tool to work with the Leicester institutional repository.

For those that are interested in the ORCiD integration I was discussing earlier, the live application can be found at and the code at I’ve popped the slides on figshare (