Mar 6 2014

Generating POJOs from XML schemas using JAXB XJC

A little bit of history

XML processing in Java has come a long way in the last ten years.  Back in the old days mapping XML to Java was a bit of a nightmare: deserialising usually meant pulling the DOM apart bit by bit to get at the interesting parts. Serialisation was worse – the Java DOM API is truly horrific.  Helper libraries existed (JDOM, XMLBeans and so on), but while they made things somewhat easier they never made it easy.  Add to that the problems introduced by having various versions of Xerces, various implementations of the org.w3c.dom API, and versions of the DOM API itself (urgh – DOM Level 2 anyone?) and it added up to development hell.

Introducing XJC

Nowadays there’s a plethora of libraries and methods to choose from, and some of them are baked right into the core Java class libraries.  JAXB is one of them.  JAXB comes with a handy little command line tool called xjc, which takes a schema and spits out an annotated class hierarchy.   The class hierarchy behaves in a sane manner (unlike the weird stuff XMLBeans used to make) and can be used for marshalling and unmarshalling XML to POJOs.

Generating POJOs

Generating Java classes from an XML schema is easy.  It goes a little something like this:

bash> xjc my-schema.xsd

Yup, it’s usually that simple.  To generate the classes in the package you want, use the -p option:

bash> xjc -p my.pkg my-schema.xsd

Place the generated classes somewhere in your project and you’re ready to go.

Using JAXB

In the simplest cases you can do something as easy as:

While there’s loads more to JAXB than this example shows, this will certainly get you started.  I wrote the ORCiD Java Library using xjc and JAXB, so if you’re interested, check it out.

Feb 11 2014

The age of GeoCities – Bubba says HOWDY!!!

There’s a fantastic project out there that’s taking screenshots of random GeoCities pages as they would have appeared when they were live.

It’s strangely compelling viewing.

bubba says howdy!!!

Sites like these showcase an important aspect of our cultural heritage.  Back when the internet was called the “information superhighway” and people were still talking about the “digital frontier”, GeoCities was where you could stake your claim.

I did it myself once.  I set up my own little homestead that hosted a Java Applet I’d written – a Java version of the classic game Elite.  I couldn’t get the flight engine quite right, but you could trade from Lave to Diso, view things on the radar, travel between systems and view all the original craft in their rotating vector glory.  The site is sadly lost but the memories remain.

Grab yourself a bit of nostalgic indulgence here: http://oneterabyteofkilobyteage.tumblr.com/

More background of the project can be found here: http://rhizome.org/editorial/2014/feb/10/authenticity-access-digital-preservation-geocities/

Feb 10 2014

ORCiD tools – who’s claiming what?

As part of my work with data-centres and ORCiD I’ve put together a tool that lets you see where works claimed within ORCiD have been published.   Start typing a publisher into the search box and it’ll look up the DOI prefix (or other identifier prefix) for that publisher from a list nearly 4000 long.  Current highlights include the American Helicopter Society with 9 ORCiDs.

See it in action at http://ethos-orcid.appspot.com/search

Why?

One of the things that’s repeatedly come up when talking to data-centres about author metadata is that while it’s easy to push data in, it’s pretty hard to get it back out.  Recent changes in the ORCiD API have made this easier, hence this tool.

Many datasets have scant author metadata, with little more than an institution name attributed to them.  Data-centres can now pull the claims information out of ORCiD and use it to selectively enrich their own metadata, completing a “virtuous circle” of integration.

Source code?

This code is available as part of the orcid-update-java application, which uses the orcid-java-client library.

Feb 7 2014

I didn’t go to university to get myself a job

Chris Bourg has written a great piece about the insidiousness of neo-liberalism and education-as-an-investment over at her blog.  Check it out here: The Neoliberal Library: Resistance is not futile

I am one of those hopeless idealists who still believes that education is – or should be – a social and public good rather than a private one, and that the goal of higher education should be to promote a healthy democracy and an informed citizenry. And I believe libraries play a critical role in contributing to that public good of an informed citizenry.

In the neoliberal university, students are individual customers, looking to acquire marketable skills. Universities (and teachers and libraries) are evaluated on clearly defined outcomes, and on how efficiently they achieve those outcomes.  Sound familiar?

I’ve managed to make this blog of mine really dull tech stuff and zero politics for a while now, probably out of a desire to keep myself sane.  That said, the almost inevitable (and widely ignored in the press) move in the UK towards a for-profit education system should strike fear into the hearts of anyone who stops to think about it.

There are two sides to this, education-as-an-investment and the for-profit education system, and they go hand in hand. Ever since the introduction of tuition fees in the UK, the ideology of “investing in your education” has gained a lot of traction here.  The next stop will almost certainly be the ramping up of the for-profit private education market, starting with the lifting of tuition fee caps and ending with a two-tier education system that pumps out workers and perpetuates inherited privilege.

Chris also talks a lot of sense about the ridiculous focus on the personal within politics, the focus on individualism at the expense of the wider movement.  Check out her blog, it’s a refreshing blast and a welcome change from the celebrity twitterati politics of ME that seems to pass for political discourse nowadays.  Sure, it’s important to understand that other people have different experiences than you.  Essential in fact.  But just understanding gets us nowhere and changes nothing.  It’s Acting, Doing.  That’s what we need.

For more info on these topics, and to actually help do something, take a look at campaign group Public University https://twitter.com/public_uni and the UCU http://www.ucu.org.uk/index.cfm

Jan 21 2014

ORCID Open Source Java Client – I made this!

Update: ORCiD client now available as a Maven dependency!

I’ve just open sourced a Java application I’ve been working on at the British Library.  It’s a RESTlet server and jQuery/Bootstrap client that enables people to claim a work from a remote service, log into ORCID using OAuth and add the work to their profile.

It was built to work with a British Library service called Ethos (http://ethos.bl.uk), but is easily customizable for use with other metadata providers: integrators simply implement an interface to fetch metadata from their own backend and update the configuration to use it.

You can find the ORCiD client library source (with examples) here: https://github.com/TomDemeranville/orcid-java-client

You can find the application source code here: https://github.com/TomDemeranville/orcid-update-java

You can see it in action here: http://ethos-orcid.appspot.com/

Jan 18 2014

Controlling the cache headers for a RESTlet directory

My previous post described how to serve webjars with RESTlet.  This post will describe how to add caching so that users don’t swamp your servers with requests.

Put simply, you put a filter in the chain before the directory and modify the HTTP headers of successful requests.  Like so:

 

Jan 16 2014

Using Webjars without Servlet 3 on Google App Engine (GAE)

I recently stumbled upon what is just about the best thing since I first discovered the wonders of Maven – the ability to add JavaScript dependencies in your pom.xml using webjars.  Like this:

Maven will manage their sub-dependencies and your web pages can access the scripts direct from the jars.  Lovely.  Sign me up!  Except wait, what?  Webjars requires Servlet 3 and GAE only supports 2.5.  Boo.  Thanks Google.

Google have been unable to deal with this at all.  There’s a fairly funny unsolved support request from way back in 2010 about it here.

Frustrated about this, and really surprised there was no existing workaround anywhere, I braved about eleventy million not particularly helpful bits of restlet documentation and finally came up with a workaround.  Yay!

You’ll need RESTlet for this.  I expect you could use RESTlet just for this alongside the rest of your <insert framework here> app, but as I’m using RESTlet anyway it works a treat.

This took me ages because the RESTlet website has been utterly broken for months and RESTlet behaves really rather oddly sometimes.  It’s impossible to get hold of the application within the router and add the CLAP protocol (getApplication() is returning null) and if you attempt to add the correct client connector when building the application itself it somehow forgets by the time you get to the router.  I’ve still no idea why.

What you can do is either (a) create a new component with a new context and add the CLAP protocol to it, or (b) modify your init-params in the web.xml to contain CLAP, like so:

The “add a new component” workaround will log “SEVERE don’t do this” style warnings when you start up, so it’s best to do it in the web.xml if possible.

Oct 28 2013

Battle of the tokenizers – delimited text parser performance

An interesting question about StringTokenizer popped up on StackOverflow the other day.  It was essentially about how to optimise reading delimited data, in this case lines of integers separated by spaces.

It demonstrated three things.

  1. Don’t fixate on micro-optimisations when you probably have big bottlenecks elsewhere
  2. String.split() is really slow
  3. The difference is pretty negligible even over millions of rows.

I ran some benchmarks to see how different methods compared.  I’m not claiming these are great benchmarks, but they’re good enough to demonstrate general performance of various solutions.   The results of parsing 100000 rows of “12 34” on my MacBook were:

  • Split: 366ms
  • IndexOf: 50ms
  • StringTokenizer: 89ms
  • GuavaSplit: 109ms
  • IndexOf2 (some super optimised solution given in the above question): 14ms
  • CsvMapperSplit (mapping row by row): 326ms
  • CsvMapperSplit_DOC (building one doc and mapping all rows in one go): 177ms

The surprising thing here is how slow split is compared to other solutions.  I really had no idea until I ran these tests. You can map to Objects twice as fast using Jackson CsvMapper than split and three times as fast using the Guava Splitter.   Building your own using indexOf is even faster, but of course harder to code and maintain.  I wouldn’t recommend anyone use the super optimised version unless it’s really, really important to shave off those milliseconds – one reason it’s faster is that it implements its own String->Int conversion.
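To give a feel for what is being compared, here is a minimal sketch of three of the approaches (split, StringTokenizer and indexOf) parsing a “12 34” style row.  It is not the original benchmark: the class and method names are made up for illustration, and the timing loop is a crude one-shot measurement, so expect the numbers to differ from those above.

```java
import java.util.StringTokenizer;

// Illustrative sketch only: class and method names are hypothetical,
// not taken from the original benchmark.
class TokenizerSketch {

    // Approach 1: String.split on a single space
    static int[] withSplit(String line) {
        String[] parts = line.split(" ");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Approach 2: StringTokenizer with a space delimiter
    static int[] withTokenizer(String line) {
        StringTokenizer tokens = new StringTokenizer(line, " ");
        return new int[] { Integer.parseInt(tokens.nextToken()), Integer.parseInt(tokens.nextToken()) };
    }

    // Approach 3: find the separator by hand and take substrings
    static int[] withIndexOf(String line) {
        int space = line.indexOf(' ');
        return new int[] {
            Integer.parseInt(line.substring(0, space)),
            Integer.parseInt(line.substring(space + 1))
        };
    }

    public static void main(String[] args) {
        String line = "12 34";
        int rows = 100000;
        long checksum = 0; // results are summed and printed so the loops do real work

        long start = System.nanoTime();
        for (int i = 0; i < rows; i++) checksum += withSplit(line)[1];
        System.out.println("Split: " + (System.nanoTime() - start) / 1_000_000 + "ms");

        start = System.nanoTime();
        for (int i = 0; i < rows; i++) checksum += withTokenizer(line)[1];
        System.out.println("StringTokenizer: " + (System.nanoTime() - start) / 1_000_000 + "ms");

        start = System.nanoTime();
        for (int i = 0; i < rows; i++) checksum += withIndexOf(line)[1];
        System.out.println("IndexOf: " + (System.nanoTime() - start) / 1_000_000 + "ms");

        System.out.println("checksum: " + checksum);
    }
}
```

All three methods return the same pair of ints for a well-formed row; the only difference is how they find the separator.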

Here’s the complete JUnit test case I hacked together:

 

Oct 24 2013

Formatting file sizes – making bytes human readable in GWT

It’s one of those simple UI things that make a real difference.   You have to be pretty special to be able to instantly convert 45742364 bytes into 43.6MB in your head.

Enter some great code snippets I’ve discovered on StackOverflow that do this.  This four line wonder is the top answer to that question (from Mr Ed.):

There’s another similar answer to this question, that can also produce binary units (from aioobe):

Unfortunately, neither of these will work on the client side of GWT.  String.format and DecimalFormat both give it a Sadface.
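For reference, a plain-Java version of this kind of formatter might look like the sketch below.  It is not the code from either answer: the class name, method name and the `si` flag are assumptions made for illustration, and it relies on String.format, which is exactly the sort of call that is unavailable on the GWT client side.

```java
import java.util.Locale;

// Illustrative sketch only: names are hypothetical, not from the StackOverflow answers.
class ByteFormatSketch {

    // si = true  -> powers of 1000 (kB, MB, ...)
    // si = false -> powers of 1024 (KiB, MiB, ...)
    static String humanReadable(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) {
            return bytes + " B";
        }
        String prefixes = si ? "kMGTPE" : "KMGTPE";
        double value = bytes;
        int index = -1;
        // Divide down until the value fits the largest applicable unit
        while (value >= unit && index < prefixes.length() - 1) {
            value /= unit;
            index++;
        }
        // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale
        return String.format(Locale.ROOT, "%.1f %s%sB", value, prefixes.charAt(index), si ? "" : "i");
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(45742364L, false));
        System.out.println(humanReadable(45742364L, true));
    }
}
```

With binary units, the 45742364 bytes from the top of this post come out as 43.6 MiB.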

How do we make it work with GWT?

We drop in the excellent GWT NumberFormat. With very little effort we end up with this fully GWT compatible version:

Which is lovely.  Happyface once more.

Oct 23 2013

Deserialising JSON or XML to a Map using Java

Well, here’s a thing.  Imagine you have some XML or JSON that looks like a map, only you don’t know the names or number of the properties in advance.  For example:

Or some JSON that looks like this:

How can you do it?  Using Jackson @JsonAnyGetter and @JsonAnySetter.  All you need is the XML root element name (if you’re using XML).  The XmlMapper also forces us to wrap the Map in a POJO; with JSON it’s possible to simply read into a Map instance.  Here’s a JUnit test showing how it can be done for both JSON and XML using the same POJO:

Easy eh?