Getting started with the public ORCID API using Swagger – quickstart guide

ORCID recently published a Swagger definition file for its v2.0 API, which means it's now even easier to access the public ORCID API from your website. Just use swagger.js. It's Super. And Easy.

Let’s give it a go.

First, clone swagger-js onto your machine. Either use the GitHub desktop client, click the download button on the repository page, or fetch it like this if you're on Linux or OS X:
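    # clone the swagger-js repository from GitHub, then move into it
    git clone https://github.com/swagger-api/swagger-js.git
    cd swagger-js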

Next, create a simple webpage called orcid.html in the swagger-js directory. This is just so we can play around; if you move to production you'll want to organise your code differently. Something like this will work fine:
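(A sketch rather than the definitive version: the URL of ORCID's definition file, the "Public API v2.0" tag and the viewRecord operation name are all assumptions – log client.apis to the console to see what the definition really exposes. 0000-0002-1825-0097 is ORCID's well-known test record.)

    <!DOCTYPE html>
    <html>
      <head>
        <!-- the browser build that ships in the swagger-js repository -->
        <script src="browser/swagger-client.min.js"></script>
      </head>
      <body>
        <pre id="output"></pre>
        <script>
          var client = new SwaggerClient({
            // assumed location of the definition file – check ORCID's API docs
            url: "https://pub.orcid.org/v2.0/swagger.json",
            success: function () {
              client.apis["Public API v2.0"].viewRecord(
                { orcid: "0000-0002-1825-0097" },
                { responseContentType: "application/json" }, // ORCID also serves XML
                function (response) {
                  // response.obj is the parsed JSON body
                  document.getElementById("output").textContent =
                    JSON.stringify(response.obj, null, 2);
                });
            }
          });
        </script>
      </body>
    </html>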

Load the webpage in your browser, then pat yourself on the back.  You’ve just used the ORCID API and written the JSON response to the web page!

[Screenshot: the raw JSON response rendered in the browser]

That's not very user-friendly though, so let's not stop there. Let's use the data for something useful and make it fancy. This time we're going to extract a list of titles for all of the works in the ORCID record. Create another .html file and paste this into it:
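(Same assumptions as before about the definition URL and operation names; this sketch also assumes the v2.0 record JSON nests titles under activities-summary → works → group → work-summary.)

    <!DOCTYPE html>
    <html>
      <head>
        <script src="browser/swagger-client.min.js"></script>
      </head>
      <body>
        <ul id="works"></ul>
        <script>
          var client = new SwaggerClient({
            url: "https://pub.orcid.org/v2.0/swagger.json",
            success: function () {
              client.apis["Public API v2.0"].viewRecord(
                { orcid: "0000-0002-1825-0097" },
                { responseContentType: "application/json" },
                function (response) {
                  // each group collects summaries of the same work from
                  // different sources; take the first summary's title
                  var groups =
                    response.obj["activities-summary"]["works"]["group"] || [];
                  var list = document.getElementById("works");
                  groups.forEach(function (group) {
                    var li = document.createElement("li");
                    li.textContent =
                      group["work-summary"][0]["title"]["title"]["value"];
                    list.appendChild(li);
                  });
                });
            }
          });
        </script>
      </body>
    </html>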

Fantastic.  It should look something like this:

[Screenshot: a plain list of work titles]

That's fine, as far as it goes, but we're not using the real power of identifiers here. Let's put a bit more effort in and create some links. The code below does two things: first, it restructures the JSON into something more useful; second, it checks to see whether we've got some kind of link we can apply (from a DOI, Handle or URI). I've only pasted in the main function for brevity.
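(Again a sketch: the worksToLinks name is mine, and the external-ids layout is an assumption about the v2.0 JSON.)

    // Restructure the record's works into {title, url} pairs, building a
    // link from the external identifiers: prefer a DOI, otherwise take
    // the first Handle or URI we see.
    function worksToLinks(record) {
      var groups = record["activities-summary"]["works"]["group"] || [];
      return groups.map(function (group) {
        var work = {
          title: group["work-summary"][0]["title"]["title"]["value"],
          url: null
        };
        var ids = (group["external-ids"] || {})["external-id"] || [];
        ids.forEach(function (id) {
          var type = id["external-id-type"];
          var value = id["external-id-value"];
          if (type === "doi") {
            work.url = "https://doi.org/" + value;
          } else if (type === "handle" && !work.url) {
            work.url = "https://hdl.handle.net/" + value;
          } else if (type === "uri" && !work.url) {
            work.url = value;
          }
        });
        return work;
      });
    }

Feed the output of worksToLinks into the same list-building loop as before, wrapping the title in an <a> element whenever work.url isn't null.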

Which gives you some lovely extra info:

[Screenshot: the work titles, now rendered as links]

Hopefully that's got you up and running with a small insight into what is possible. Next time I'll run through using the member API (including updating records) in the same way.

C# FluentValidation – why we’re using it

A bit of background

I've been working in the C# world for a few months now. While the code is very similar to Java, the culture around open source could not be more different. Where open source is business as usual in the Java & JavaScript worlds, it's very much exceptional circumstances only in the .NET one. It took a bit of an extra push on my part to convince a client they should be looking at Fluent Validation for their validation needs, but I'm fairly certain (and more importantly they're fairly certain) it was the right idea.

Fluent Validation

Fluent Validation is an Apache 2-licensed library that's just moved from CodePlex to GitHub. It enables developers to code validation rules in a fluent manner and apply them to models. In contrast to many modern validation approaches, where the rules are declaratively mixed up with the models themselves using attributes (or annotations in Java parlance), Fluent Validation very firmly separates the models from the rules and has you define Validator classes instead. Like this:
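(A minimal sketch – the Customer model and the particular rules are illustrative, not from the client's code.)

    using FluentValidation;

    public class Customer
    {
        public string Name { get; set; }
        public string Email { get; set; }
        public int Age { get; set; }
    }

    // all the rules live here, not on the model
    public class CustomerValidator : AbstractValidator<Customer>
    {
        public CustomerValidator()
        {
            RuleFor(c => c.Name).NotEmpty().WithMessage("Name is required");
            RuleFor(c => c.Email).NotEmpty().EmailAddress();
            RuleFor(c => c.Age).InclusiveBetween(18, 120);
        }
    }

Applying it is a one-liner: new CustomerValidator().Validate(customer) hands back a result you can inspect for IsValid and Errors.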

Simple!

The scenario

Their requirements are not exactly simple, but not particularly odd either. They've got a large legacy code base that encapsulates a bunch of business logic, validation and data access. They're gradually migrating away from it because it's slow and difficult to maintain, which is a pretty common scenario. Due to the way the code base has evolved and the varying requirements they're attempting to fulfil, there are now four different ways into the database. To be clear, that's four different implementations of their domain model and bindings to the underlying SQL schema. Oh, and of course there are also custom validation rules coded by individual clients in various places and ways, not to mention that many of the rules for one entity depend on data from another. Currently, the default ‘always-apply' rules are only in one place – the slow and inflexible legacy components.

The solution to all this is to extract the validation rules out of the legacy components and create a modern, fast and re-usable validation library that is data-layer agnostic and can support configurable rule-sets.

Why Fluent Validation?

The main reason was flexibility. We needed something that could attack such a wide variety of use cases that it had to be able to validate anything in any context. Fluent Validation fitted the bill.

Other reasons included:

  • Speed of development – it’s really easy to work with.
  • Decoupled rules and models – meaning we could write multiple (and possibly conflicting, don't ask) rule-sets against the same models. In fact, enabling customers to write their own validators was a pretty trivial exercise (see a future post on this).
  • Speed of execution – the rules are immutable objects, meaning you can create them once and cache them. When the rules are part of the model you're needlessly setting them up every time. This is not a problem when working with one simple entity, but quickly becomes a performance problem when you're working with multiple complex validators over 50,000 rows. Benchmarking showed that we saved considerable time doing this – one import went from 12 minutes to 1 minute when we started caching validators (see the sketch after this list).
  • Ability to work with odd relationships and dependent entities – many of the entities we work with have dependencies on other things in the system, even though they're not ‘formally' related. With Fluent Validation we could write validators that handle this easily.
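To put the caching point in concrete terms, here's a rough sketch (reusing the illustrative Customer/CustomerValidator from above, not the client's actual code):

    using System.Collections.Generic;
    using FluentValidation;
    using FluentValidation.Results;

    public static class Validators
    {
        // Built once and shared. The validator holds no per-call state,
        // so a single instance can be reused for every row instead of
        // being constructed 50,000 times.
        public static readonly IValidator<Customer> Customer = new CustomerValidator();
    }

    public class Importer
    {
        public IList<ValidationResult> ValidateAll(IEnumerable<Customer> rows)
        {
            var results = new List<ValidationResult>();
            foreach (var row in rows)
            {
                // no validator construction inside the loop
                results.Add(Validators.Customer.Validate(row));
            }
            return results;
        }
    }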

ORCID tools – who's claiming what?

As part of my work with data-centres and ORCID I've put together a tool that lets you see where works claimed within ORCID have been published. Start typing a publisher into the search box and it'll look up the DOI prefix (or other identifier prefix) for that publisher from a list of nearly 4,000. Current highlights include the American Helicopter Society with 9 ORCID iDs.

See it in action at http://ethos-orcid.appspot.com/search

Why?

One of the things that's repeatedly come up when talking to data-centres about author metadata is that while it's easy to push data in, it's pretty hard to get it back out. Recent changes in the ORCID API have made this easier, hence this tool.

Many datasets have scant author metadata, with little more than an institution name attributed to them. Data-centres can now pull the claims information out of ORCID and use it to selectively enrich their own metadata, completing a “virtuous circle” of integration.

Source code?

This code is available as part of the orcid-update-java application, which uses the orcid-java-client library.