I’ve been working on a new project at work for a couple of months now. It’s another of those informatics projects that I seem to be doing more of nowadays. Obviously much time was spent coming up with the acronym: Biomedical Research Infrastructure Software Service kit (BRISSkit). It is based up at the Glenfield Hospital with the Cardiovascular Biomedical Research Unit.
The main idea behind the project is to plug together a bunch of software packages and stick them on the cloud. We will then let other researchers come along and have an instance of our software stack at the click of a button (well, a few keystrokes really). The obvious benefit is that it will take a few minutes to set up, as opposed to the few years it has taken to get the software to where it is now, and it will be centrally maintained, so no specialist IT skills will be needed.
We are currently going down the route of putting it on a VMware-backed cloud infrastructure, so my main responsibility (on account of being a Linux geek) is to get it all into the cloud. Then we need to be able to monitor and manage it, so that’s in my domain also.
For the last six months or so I have been working on a project called HALOGEN. This was a venture to bring together various geospatial data sets into one unified database. By doing this we can start to ask cross-data-set queries, something that you certainly couldn’t do when one source is in an Excel file at one university and the other in an Access database at another!
One of the advantages of this approach is that we can then stick a web front end on the database and let the general public look at the data. It is this bit that we launched yesterday. If you head on over to halogen.le.ac.uk and do a query then you should be able to find the derivation of your favourite English place name, or where your ancestors lived during 1881, or even what and where treasure has been found!
I am finishing up on a project at work called HALOGEN. It’s a cool geospatial platform that I’ve been developing to help researchers store and use their data more efficiently. At its core, HALOGEN has a MySQL database that stores several different geospatial data sets. Each data set is generally made up of several tables and has a coordinate for each data point. Now most of the geo-folk at work like to use ArcGIS to do their analysis, and since we have it (v9.3) installed system-wide I thought I would plug it into our database. Simple.
As it happens the two don’t like to play nicely at all.
To get the ball rolling I installed the MySQL ODBC driver so they could communicate. That worked pretty well, with ArcGIS being able to see the database and the tables in it. However, trying to do anything with the data was close to impossible. Taking the simplest data set, which consisted of one table, I could not get it to plot as a map. The problem was the way ArcGIS was interpreting the data types from MySQL; each and every one was being interpreted as a text field. This meant that it couldn’t use the coordinates to plot the data. I would have thought that the ODBC driver would have given ArcGIS something it could understand, but I guess not. The workaround I used for this was to change the data types at the database level to INTs (they were stored as MEDIUMINTs on account of being BNG coordinates). I know this is overkill, and a poor use of storage etc, but as a first attempt at a fix it worked.
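For anyone wanting to try the same workaround, the type change would look something like the sketch below. The table and column names (`places`, `easting`, `northing`) are made up for illustration and are not the real HALOGEN schema.

```sql
-- Hypothetical table and column names, for illustration only.
-- Widen the BNG coordinate columns from MEDIUMINT to INT.
ALTER TABLE places
    MODIFY easting INT,
    MODIFY northing INT;
```

Bear in mind that MODIFY restates the whole column definition, so any NOT NULL or DEFAULT on the original columns would need repeating here.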
Then I moved on to the more complex data sets made up of several tables with rather complex JOINs needed to properly describe the data. This posed a new problem, since I couldn’t work out how to JOIN the data ArcGIS side to a satisfactory level. So the solution I implemented here was to create a VIEW in the database that fully denormalized the data set. This gave ArcGIS all the data it needed in one table (well, not a real table, but you get the idea).
If we take a step back and look at the two ‘fixes’ so far, you can see that they can be easily combined into one ‘fix’. By recasting the different integers in the original data in the VIEW, I can keep the data types I want in the source data and make ArcGIS think it is seeing what it wants.
And then in steps the last of the little annoyances that got in my way. ArcGIS likes to have an index on the source data. When you create a VIEW there is no index information cascaded through, so again ArcGIS just croaks and you can’t do anything with the data. The rather ugly hack I made to fix this (and if anyone has a better idea I will be glad to hear it) was to create a new table that has the same data types as those presented by the VIEW and populate it with
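As a rough sketch of that combined ‘fix’, a denormalising VIEW that also recasts the coordinates might look like this. The source tables (`finds`, `places`) and their columns are hypothetical stand-ins, not the real schema. Note that MySQL’s CAST takes SIGNED rather than INT as its target here, and whether ArcGIS is happy with the type that comes out the other side of the ODBC driver is something you would need to check.

```sql
-- Hypothetical source tables: places(id, name, easting, northing)
-- and finds(id, place_id, description). Names are illustrative only.
CREATE OR REPLACE VIEW the_view AS
SELECT
    f.id                        AS find_id,
    p.name                      AS place_name,
    f.description               AS description,
    -- Recast in the view so the source columns can stay as MEDIUMINT.
    CAST(p.easting  AS SIGNED)  AS easting,
    CAST(p.northing AS SIGNED)  AS northing
FROM finds AS f
JOIN places AS p ON p.id = f.place_id;
```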
INSERT INTO new_table SELECT * FROM the_view
That leaves me with a fully denormalised real table with data types that ArcGIS can understand, albeit at the price of having a lot of duplicate data hanging around.
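Spelled out a little more, the hack could look like the sketch below. The column list is an assumption made up for illustration; the point is simply that a real table, unlike a VIEW, can carry the index ArcGIS wants.

```sql
-- Hypothetical columns; the real table would mirror whatever the view presents.
CREATE TABLE new_table (
    find_id     INT NOT NULL,
    place_name  VARCHAR(255),
    description TEXT,
    easting     INT,
    northing    INT,
    PRIMARY KEY (find_id)   -- a real index, which a VIEW cannot provide
);

-- Copy the denormalised rows out of the view into the real table.
INSERT INTO new_table (find_id, place_name, description, easting, northing)
SELECT find_id, place_name, description, easting, northing
FROM the_view;
```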
Ultimately, if I can’t find a better solution, I will probably have a trigger of some description that copies the data into the new real table when the source data is edited. This would give the researchers real-time access to the most up-to-date data as it is updated by others. Let’s face it, it’s a million times better than the many different Excel spreadsheets that were floating around campus!
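A trigger along those lines might be sketched as follows. This is an untested outline with hypothetical names: it assumes a source table called `finds` with an `id` column, and that `the_view` and `new_table` share a `find_id` key plus the other columns listed.

```sql
-- Hypothetical names throughout; a sketch rather than a finished solution.
-- After each new row lands in the source table, refresh the matching row
-- in the real table from the view.
CREATE TRIGGER finds_after_insert
AFTER INSERT ON finds
FOR EACH ROW
    REPLACE INTO new_table (find_id, place_name, description, easting, northing)
    SELECT find_id, place_name, description, easting, northing
    FROM the_view
    WHERE find_id = NEW.id;
```

A complete version would also need matching AFTER UPDATE and AFTER DELETE triggers, and the same again for every other source table that feeds the view.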
It just occurred to me that I have been in my new job for almost a month and have not really told anyone what it is about, so here goes.
I am still at Leicester. I got ‘redeployed’ at the last minute as I am awesome and they couldn’t bear the thought of me leaving. I now work in the research computing support team in IT services. Like all my jobs it is split into a couple of different roles, although this time it is more of an equal split. First of all I am working on the HALOGEN project. This aims to bring together a collection of different spatial data sets and allow correlations to be found between them, hopefully in the guise of a web front end that can do all sorts of pretty plotting etc. In that sense it is very similar to the work I have done in the past with the astronomy data sets.
The second aspect of the job is to help the RCS guys deploy their new LAMP stack to users. This means that I am going to have to deal with people, shudder.
It’s a fairly short-term position, but at least it will keep me off the streets for a while.
I had an interview for a position at Leicester University today, and guess what, I got it 🙂
The position is an archive scientist, working with the LEDAS stuff and the SuperWASP stuff. The LEDAS side is looking after the hardware and software of the X-ray archive server at Leicester, so lots of web interface stuff to huge X-ray databases 🙂 The SuperWASP stuff is along similar lines: looking after the hardware and software of the archive. This is fast becoming one of the biggest astronomical databases in the world, so lots of things for me to break. Then there is the research side, where I get to do independent research too!
So all in all it really is an awesome job for me. Since I do the web stuff as a hobby, this is kind of a merging of my job and my hobby 🙂
They said I can start at the beginning of November, so by then I need to have finished decorating the house and sold it. Then I can go and move up to Leicester. Oh yeah, and I need to finish that minor task also known as my thesis….
So I have been upgrading my computer recently, the main reason being that I spend a lot of time playing Second Life and I thought it would be nice if I could run it in its prettiest mode. So this is what I have been doing:
- First up I had to figure out what sort of motherboard I have. I am sure there is an easy way of doing that, but I couldn’t figure it out, so I popped the box open and read the serial number off it, how technical of me.
- One of the key things I wanted to do was get a better graphics card, so I had to familiarize myself with them. Turns out I have an AGP slot, which is being phased out, ho hum. Anyway I found a half-decent card on Overclockers; I went for the Radeon 2600 XT 256MB GDDR3.
- I also wanted more RAM, you can never have too much, well actually you can, but I still wanted more, so I bought a Gig, bringing me up to 1.5GB.
- Then the waiting began. Then woo, it turned up. Then sigh, the fools had sent the wrong graphics card: they stuck the PCI-E one in instead of the AGP version. Is it really that hard to pick something off a shelf?
- Then comes installation time. RAM: easy, just slot it in. Well, actually I had to take the 512MB out and put the Gig in the first slot, so really just unslot, slot, slot. Graphics card: not so easy. This is the first graphics card I have owned that needs its own power supply, so I had to jump it off the hard drive.
- I had also decided to rebond my heat sink as my CPU had been getting to about 90 degrees, so I took it off and cleaned it up with some surgical spirit (rubbing alcohol for all you Americans) and cotton wool buds. It was covered in the bubble gum type stuff that’s really messy. It’s amazing how clean you can get it with a bit of patience, see….
- Add to this a longer cable for my keyboard and a legal(!) copy of Norton and I am done upgrading things for now.
So there we have it. It’s now running much faster and cooler (more like 60 degrees), and it makes things look so pretty. And I have learnt never to buy anything from Overclockers again. Trouble is, now I am thinking that I could do with a bigger hard drive, and a better DVD drive. Oh and I could do with some more USB ports……