Clojure at a Newspaper – Cluster Killers

July 7, 2014 § 1 Comment

My last blog post covered building a newspaper website with Clojure. I thought it’d be useful to go deeper and write about the problems we suffered when the newly built Clojure cluster occasionally went down. We referred to the issues causing these outages as Cluster Killers.

Cluster Killer #1 – Avout on Zookeeper

Avout is a distributed STM; giving you Atoms and Refs whereby the state is managed in a centralised store such as ZooKeeper. I’m fan of the ideas behind Avout and we used it to store all our config. Config changes in ZooKeeper would be delivered out to cluster nodes on the fly and the relevant STM watchers fired accordingly.

Our bug stemmed from our underlying network having some sporadic temporal issues. To make Avout work, the refs and atoms need to be backed by data in ZooKeeper which in turn meant that each web-app node in our cluster needed a persistent ZooKeeper connection, to keep a watch on the data. This is OK, but occasionally the persistent connection would be dropped and so each node would then need to re-establish the connection. This is still OK, and reconnection is handled automatically by the Java ZooKeeper client. So what was the problem?

The root-cause issue is tricky to explain and it took us a while to diagnose and to successfully reproduce. ZooKeeper has the concept of watchers that watch for state changes. When disconnection occurs between client and server the client-side watchers are fired but not removed. When this happens Avout re-adds the watchers. Hence if you have 5 refs backed by 5 watchers, then after a disconnect this becomes 10 watchers, then 20 etc. Eventually we had over 100K watchers on our nodes and so occasionally they self-destructed. In the middle of the night and at Christmas.

I’m aware that I should probably have fixed the problem in Avout¬†rather than write this post. We tried switching to Atoms in Avout rather than Refs and ran into a completely different bug, and so we ended up dropping the technology altogether and rolled out our own less ambitious ZK/watcher wrapper code.

Personally I wouldn’t advise using Avout again until all the GitHub issues have been fixed and there are reports of others successfully using it in production. It looks like this will happen now as the library has gained a contributor, and so I’m hopeful.

Cluster Killer #2 – Using Riemann out the box

Riemann has achieved cult status in the Clojuresphere and deservedly so. In the past we’ve been tempted to faff around with tangental event handling logic directly in application code, but Riemann gives us a sensible place where we can put this fluidic configuration. I.e. I want to email out an alert only when X happens Y amount of times, but only at a constrained rate of Z per hour.

Riemann is a handy tool to have, but on our project it became a victim of it’s own success. We initially used it for dispatching metrics to graphite and for alerting, yet after a time other teams wanted in on the action. We took our eye of the ball as we welcomed Clojure in through the back door onto other projects. Soon our Riemann config became enriched with rules keeping so much state that the Riemann server suffered an out-of-memory.

OK, surely the Riemann server going down is not a Cluster Killer? Well, by default the Riemann Clojure client uses TCP connections¬†and communicates synchronously with the server waiting for acks. It’s up to the code-users of the client to ensure that they handle the back-pressure building up inside the application when the Riemann client/server pipe becomes blocked.

We hadn’t deeply considered this in our rush to exploit Riemann funkage, and so when our Riemann server went down all our Clojure apps talking to Riemann went down too.

By all means use the Riemann Clojure client, but wrap it with fault tolerant code.

Cluster Killer #3 and #4 – Caching

We were building a newspaper website. As is common we didn’t serve all realtime traffic directly to the users as that would be challenging in the extreme. Instead we made use of cloud caching providers. Occasionally we tweaked the caching timeout rules so that we could serve more traffic dynamically (as was the ultimate goal), but often the wrong tweak in the wrong place could put massively more load on the server pool and the cluster would struggle. I figured that caching configuration was like walking a tightrope; even parts art and science.

On a related note we were using Mustache for our template rendering. Mustache templates need to be compiled ahead of being used for actual HTML rendering. Once a config change went live that set the Mustache compiled template cache time to zero. The CPUs went mental.

What Cluster Killers have taught me

Firstly, is that shit will aways happen. It was embarrassing when our new shit did happen as we built the system to stop the old shit from happening. Worse is that our new shit was shit that people weren’t used to seeing.

I’ve now got more respect for all kinds of monitoring. We had our own diagnostic framework that fired tracer bullets into the live environment and used Riemann to alert on errors. We were quite happy with this, as it meant that many problems were seen only by us, and we could act. We also had fantastic Ops personnel who delivered a Nagios/Graphite solution that gave us a whole manner of metrics on CPUs, heap-sizes etc for which I was grateful. We got skilled up on using AWK and friends to slice up logs files into something meaningful, and profiling the JVM was essential.

Performance testing is key as is A/B testing. I hear about companies (i.e. USwitch) doing great things in this space that I would hope to emulate at some stage.

Lastly I’m now a touch more conservative about using bleeding edge technology choices. Best to spend time browsing the project pages on github before adding to your lein deps.

Any other Cluster Killer stories people want to share?

Managing Multiple REPLs with Emacs Cider – Screencast

February 27, 2014 § Leave a Comment

This screencast demonstrates some recent updates to Emacs Cider around working with multiple REPLs. This covers what’s in the relevant section in the Cider README.

Hope it’s useful.

Clojure at a Newspaper

February 24, 2014 § 4 Comments

Let’s be frank about it; the MailOnline isn’t to everyone’s taste. As the worlds biggest newspaper website it is a guilty past time for many. It has some decent editorial content but it can also be distressingly shallow.

I worked for there for a year. I was lured by the opportunity to rebuild the old website system in Clojure. Whilst some in my circle have been furthering mankind over at the Guardian, I’ve been working for alternative forces.

There is a thread of arrogance in my desire to join a big media company and petition the building of a new system. The developers I found there humbled me in being some of the most personable and most talented I’ve had the pleasure of working with. Not to mention diverse.

When I first joined this project Fred George had been extolling the values of Programmer Anarchy. I don’t want to get bogged down in discussing the particulars of this software methodology, suffice to say that it is not a software methodology. Programmer Anarchy was Fred’s way of putting developers more in charge where a command and control regime had previously existed. It was a means to shake up an establishment which has since successfully led to durable self organising teams leveraging modern technologies.

I arrived as “anarchy” was rife and the continents had yet to form. The rewriting of the public facing newspaper website had yet to begin and so there was a clear opportunity to put a stake in the ground.

The reality was that I was fortuitous. The incumbent CTO had already manifested a data pipeline where all the data needed for the website went from the old system straight into an ElasticSearch instance. He knew that only way to begin slaying the old publishing/editing system was to free up the data. Future developers could now build a new Eden by exploiting this easily accessibly NoSQL store.

Before Clojure (BC) the CTO had the idea that competing teams would trial a Node Vs Ruby Vs X solution as to best appropriate the data for building the next generation website. I was joined on my first day by Antonio Terreno and we soon started hacking on Clojure to see what we could do.

After a couple of weeks we presented a prototype of the website back to the wider team. During the presentation we demonstrated how easy it is to hit ElasticSearch from the REPL and to mash Clojure data structures against HTML templates as to get the desired output. We showed how you could easily arrange sequences of articles using functions such as partition-by to get table layouts. With website content being broken up into simple data structures along with Clojure’s stupendously powerful sequence library, it all becomes very easy.

After the presentation the wasn’t much appetite to build similar efforts in Ruby or Node*, and so Clojure was declared as the winner.

There was nothing onerous about what we had achieved. We used some functional programming to get the data into a form that it could be easily rendered, and we were organised about how we pulled back data from ES as to get good performance.

Yet fundamentally we hadn’t written much code. I raised this sheepishly with the CTO and his response was: “that’s how I know it’s the right solution”. Clojure is the winner here.

A year later and 20k LOC the website is predominantly Clojure based. The home pages, main channel pages, and articles are served through Clojure, along with a mobile version of each. The old platform will linger on serving legacy parts of the website that need to be updated, but its day are numbered. Killing off the various dusty corners of the website requires business stakeholder input as well as bullish developers.

The bulk of the work was (and still is) around having a production ready platform; monitoring, diagnostics, regression testing, metrics gathering, performance tuning and configuration management**. We have a variety of back-end services which interact with systems such as Facebook and Twitter, and we’ve upgraded significantly the data-pipe that carries data from the old system into the new. We also have a significant web-application that proxies the main web application as to manipulate the HTML using Enlive as to feed it into a CMS for editors to use.

All this is really only phase one. There is a pent up demand from the business for new functionality and many separate sub-systems that need to be rewritten.

For me, the real win has been how the wider development team has embraced Clojure. By the end of my year there are devs hacking on elisp as to make their development environments more personable. We have vim warriors doing their vim warrior thing. There have been numerous Clojure open source contributions and involvements on the London Clojure community scene. We’ve been able to spawn Clojurians in various countries outside the UK. We had a couple of presentations at the Berlin EuroClojure and I had a world beating hangover afterwards for the plane ride home.

I’m sad to move on but I leave behind a strong team who have fully embraced Clojure and love the technology. My immediate future is working for JUXT full time (juxt.pro).

* Node.js is a dominant choice for various self-contained public facing web applications.
** I plan to do a couple of future posts about the technical particulars.

REPL bootstrap pimpage

May 2, 2013 § 4 Comments

The last couple of Clojure projects I’ve been on the team has invested a bit of time in pimping the REPL. Here are some of the things we’ve got happening when a custom bootstrap REPL namespace is loaded:

Dev of the day

Because it’s nice to make people feel special, show the dev of the day:

Screen Shot 2013-05-02 at 08.58.13

This can be pulled from GIT:

This code needs the command line tool figlet installed.

Print last few GIT commits.

Make the project feel alive with some recent commits

Screen Shot 2013-05-02 at 09.12.20

May as well show the GIT branch whilst you’re at it:

Architecture Diagrams

A dev on our team wanted to draw a picture. Rather than using something like Omnigraph he stuck to the principal of always using Emacs for everything, no matter how painful or irrational. So he fired up artist-mode and starting sketching. In one of our projects we’ve now got a nice ASCII sequence diagram explaining the flow to the wandering REPL dev. The ascii itself is stored as a txt file, slurped in and displayed upon bootstrap.

Random Stuff

We once had a sequence of project quotes stored in a namespace and the REPL displayed one of them using rand-nth. It’s a nice way to start the day being confronted with messages like ‘The Horror.. The Horror!’, or quotes from Macbeth, Forest Gump and various gangster movies. If you get sick of the same quotes then add a new one.

Since my current project is for a newspaper company we’ve got a random headline showing up. You could in theory sit down and read the news (a flavour of) by repeatedly recompiling the REPL namespace.

Other stuff:

  • If you’ve got a web-app, launch it.
  • Display what is running, ports etc.
  • Use stuff – (:use [clojure.pprint] [clojure.repl])
  • Config – show what’s in ZooKeeper or whatever.

Useful REPL functions: Our bootstrap code has a bunch of handy functions in for making life easier for the dev. For example pre-loaded queries against some data-store or helper functions for data-structures you’re always munging. We have a separate repl-helpers NS that contains them and we list the contents using ns-publics.

I’m interested in what other people/teams have done to make life more interesting in the REPL. What’s cool and fun?

Five reasons to learn Clojure and Emacs together

December 27, 2012 § 12 Comments

There’s often debate about whether newcomers should try to learn both Clojure and Emacs at the same time. My take on it is that yes they should and here’s five reasons why.

1) Liberation

Smart IDEs ares like powerful drugs. They treat the sick and the dying wonderfully well but tend to concentrate on the symptoms more than the problem, that of bloated code. They are great for refactoring around masses of static types, but you do so inside of a walled garden in that the features you want are built into the IDE itself, and therefore you have to rely mostly on the IDE developers to make things better. The modern powerhouse IDEs are complected.

Emacs at heart is a basic text editor, one that has been around for decades with extensibility at it’s core. Because of this, and the fact it’s open source, the plugins for Emacs are essentially limitless in what they can do. There are vast amounts of major and minor modes and colour themes. Org mode awaits you.

It’s this ability to tailor Emacs so much that makes it feel liberating; you can do what you want with it. And if you really don’t enjoy a spot of configuration-masochism then they are many ‘starter kits’ on the web that will quickly get you going, see the ‘Emacs Starter Kit’ for starting out with Clojure. You could postpone engaging with Paredit mode if you wish, and make use of Cua mode to bring the sanity back to copy and pasting. The feeling of liberation can always be throttled.

2) Change

Moving from a statically typed language to Clojure is not easy. With this in mind are we really making things that much easier by encouraging devs to stick things out in a familiar IDE? Without a pertinent and dramatic sense of change are we not just making things harder? Why try to disguise the fact that Clojure is a different development world away from Java. You can’t refactor or navigate around in the same way and the debugging just isn’t as good. Developing in a sophisticated IDE without 80% of the IDEs usefulness would likely be incredibly frustrating and a turnoff. You’d be dressed up for the wrong gig.

Change also begets change. Moving to Clojure is a chance to review a lot of things; the VCS used, testing tools, approach to CI, artefact management, anything could be up for grabs. Better to make a clean break from the past.

3) Support

Following on from 2 there is the issue of support. You could be ambivalent about devs on your team using Clojure with their favourite Java IDE but then you must consider what the support cost will be. They may stumble across some half hearted wiki page about getting a certain Clojure plugin working but then would it work with Lein 2 and not just Lein 1? Is it using the deprecated Swank-Clojure or NRepl? The Clojure landscape is still shifting, newbies should probably be using the best lifeboats.

4) Simple and Easy

Like with Clojure learning Emacs from scratch isn’t excessively easy but then in the long run it’s simpler. Why would we suggest newbies to stick with their current IDEs when there’s a danger that they might not move on afterwards? Clojure and Emacs are a package of simplicity, they go well together.

And we know that simplicity is not always for free. There has to be some learning-pain before you can reap the rewards. If a developer is truly unwilling to switch to a lighter weight, more extensible dev environment then they may also be equally unwilling to make the paradigm shift to functional programming.

Of course the flip side to this is that it’s possible a developer may go to lengths to make coding even more of a pleasurable experience in the smart IDE landscape. If so, then hats off to them.

5) Mass Adoption

Trying to spread Clojure whilst recommending an IDE change sounds very counterintuitive, but then is Clojure itself ripe for mass adoption across the industry? Is this the end goal? These questions are beyond the scope of this post, but suffice to say that I’m not sure it would be a good thing if all people at the large institutions I’ve worked at started coding everything in Clojure. Emacs certainly adds to the learning divide, and makes it more explicit.

Please Vim warriors take note that these arguments could just as easily pertain to Vim.

Clojure at a Bank – Freeing the Rules

November 29, 2012 § 1 Comment

I’ve written previous posts about a team at an investment bank making the switch from Java to Clojure. In this post I’d like to focus on the business rules being moved in the process.

Rules/Memes

I’ve found that business rules at large institutions tend to resemble viral memes and genes. This is to say that they are regularly duplicated amongst systems, manual or automated, and most will persist long after the people maintaining them move on. The hardy ones manage to jump from dying systems into new ones.

In our case the rules are migrating from a large monolithic Java stack into a new Clojure code base. We wanted to give them a reformed existence that would be more streamlined, free of elaborately crafted OO structures, and where they would not be hemmed in by overly enthusiastic and rigidly defined tests.

Tags rather than Types

One of the first things we did differently was to use tags rather than types for modeling the business domain. In the old world a financial product would immediately be given its own Java class and its similarities with other products expressed through inheritance and interfaces, perhaps with some mixin styled object composition thrown in. This approach has failed our system. Sure that with the right people, time and foresight, most systems in most languages can be beautiful, but in my opinion the generalistic approach of modeling the problem with static types opens the door wide open to epic waste and codebases with limited options.

For our Clojure codebase we instead simply treated products and trades as maps of immutable data and we built up a corresponding set of keywords – ‘tags’ – to describe what we know about them.

Imagine that we’re processing orders for cups of coffee. Without breaking the problem domain down to the nth degree, we can start by introspecting and tagging whatever data comes our way with business significant terms such as :hot, :milky, and :has-froth. Predicates to drive business rules become trivial to write, i.e. (when (superset? tags #{:sprinkles :caffiene :large}) (do-something)). The code matching up data to tags would then become an all-important kernel of the system.

CSS styled JSON building

We used tags for a range of things, one of the prime cases being to build up JSON for sending to a downstream system. We used tags to determine a) JSON schemas and b) JSON values.

Taking values first, imagine that we’ve built up a JSON structure but with the values omitted, i.e:

{:beverage {:type _ :temperature _ }}

A simple DSL for defining values may then look like:

(values/add :type #{:cappucino} "Coffee")

and:

(values/add :temperature #{:cappucino} "Hot")

Here the rules look a lot like CSS. We use path selectors such as :type and :temperature and instead of matching on HTML class attributes we simply match on tags; #{:cappucino}.

The rules for building up the schema could work in a similar way. A :frappuccino may need a different outgoing data structure than for a :cappucino, perhaps to do with some inherent complexity of iced drinks.

What’s covered here is fairly trivial stuff but it’s a starting point for more. You could extend this DSL as to allow multiple value-rules to be defined at once, to add predicates, and to make the values Clojure functions that work off the input data. You do of course need some core boilerplate code for mashing the JSON schema and value rules together but this should be simple to write.

Like CSS there are some pros and cons. Pros are that adding new rules becomes trivial and that the rules themselves are inherently reusable. Cons are of having to manage a large set of flattened rules.

Rules as Data

On our project we’ve got a fight on our hands in that the number of business rules is large whilst having a wide amount of variance across them. Instead of spreading the rules out we have now a few thousand – and growing – number of rules living in a concentrated handful of Clojure namespaces. At first glance this looks like poor design, this is fugly and bad right? What about storing the rules in a DB or reapplying a fair bit of OO styled modeling?

First because all Clojure code is data then we absolutely do have a rule DB, and not just a pile of namespace code editable in Emacs. By thinking of the rules entirely as data we’ve opened options for ourselves.

For example we’ve built a UI that allows non-technical users to directly browse the rules and to walk the hierarchies between them. Users can play around with input data and tags to see what rules are used for a given context. They can view information about why a certain rule is selected above another, the CSS-like specificity behavour. We also show the source code, namespace and line number behind each rule. A team member recently added the ability for devs to click on an html link that opens up Emacs at exactly the right point where the rule-code is.

The rules form an audit trail of how payloads were crafted. Since we also use them for data reconciliation purposes users can now write comments on data mismatches, directly in the context of a rule.

REPL, Builds and Tests

The tags and rules-as-data approach has allowed us to build up a set of tools for experimenting in the REPL. For example it’s common to query an ElasticSearch instance from the REPL to bring back a certain population of test-data and then to throw it at the rule engine to see what happens. We’ve got build-server agents doing a similar thing to alert us when changes in the rules bring about unexpected consequences.

As a more immediate line of defense we’ve got unit tests utilising core.logic to make sure that the rules are sane and that repetition and redundancy are kept to a minimum if not prohibited. We’ve added to the UI the ability to highlight sets of rules that are too close to each other, where a revision of tags might be applied to clean things up.

And if one day we decided that our rules needed to be stored in a completely different way, then we could always write some more Clojure code to read them back in and to spit them out again as something different. The door to moving them into a graph DB, RDF, or into a fact based DB like Datomic is not closed.

Wrap up

We’ve learnt that when you’re working with rules as data then many more possibilities open up as to what you can do with them. Quite the opposite to what you typically see at large institutions where the rules are tucked away like diamonds in the earth inside of type-heavy OO modeled systems. For us Clojure has been a great emancipator of business rules.

Clojure at a Bank – Clojure Code Immaturity

November 4, 2012 § 13 Comments

I’ve posted recently about a team at an investment bank wanting to make the transition from Java to Clojure. Here I want to write about some of the issues around our Clojure code being ‘immature’. Before I do though it’s only fair I state up front that not all of our early code was terrible, Clojure is indeed a pragmatic language where you can write decent and understandable code relatively easily. Still..

Comments and LOC

Most of the devs on our team have a TDD/DDD/BDD background with half having once plied a trade as XP consultants. Our approach to writing beautiful Java code was to make it flow and to tell a story. Expressive names for classes, methods and variables, each chosen to convey clarity and meaning to the fortunate reader.

Therefore when we jumped both feet into Clojure we unconsciously brought with us the belief that comments just weren’t needed. Add in to the mix that we gave our args the shortest possible names – most of the time just single characters – one could argue that we purposefully went about trying to obfuscate what we were writing.

Then to make our code more fugly, we executed the common newbie sin of not really knowing what’s in Clojure.core but churning out bucket-loads of FP code anyway. For example we had a brave early attempt to get around assoc-in not being able to work with nested vectors as well as maps when it actually could (assoc-in m [:key1 0 :bar]). This led to some funky code existing in functions with interesting sounding names like ‘weave-in-vectors’ – the choice of naming being a sure fire smell that there must be a more idiomatic way of doing things. Then there’s the little stuff: ‘if’ and ‘let’ vs ‘if-let’, ‘let’ as its own form rather than embedded into a ‘for’ as ‘:let’. Then there’s zipmap, juxt, mapcat, group-by… a considerable list of helper functions that avoids us having to write our own cruft.

I also have to own up to having a personal fetish for a low LOC wanting it to compare ever so favourably to the old Java stack that preceded it. The cost was that some people wondered wtf some of my code did but at least there were few lines of it. There’s got to be some prize for that, right?

We matured past these issues by communicating amongst ourselves as we found better ways of doing things and thankfully we had a team where criticism was generally well received. Clojure itself is an opinionated language and when you’re coding in a more idiomatic way the pieces of the language tend to fit more easily together. Idiomatic = more graceful/simplistic. Don’t say to a colleague: “Your code is shit”, do say: “There’s a more idiomatic way of doing this”. StackOverFlow and blog posts are full of examples of how to write more idiomatic code for particular use-cases and the Clojure Google Groups are good also.

Namespaces

In Java/.NET development we now have extremely powerful tools that help you to navigate your way around a large code-base – i.e. Eclipse/Intellij for Java. As the amount of class files inexorably grows it never really seems to matter and you just get used to it. (Here’s a controversial Recursivity blog post entitled “IDEs Are a Language Smell”).

In FP a single namespace will nearly always contain much more logic than compared to the average OO class. Since you’ll be spending more focused time in fewer files this then creates a need for namespaces to be presentable. Comments at the top can be helpful, they should have fewer public functions, and they should be split up if they get too large – we’ve occasionally split out the cruft from a namespace into an accompanying ns-utils.clj to make the main one clean. We’ve also reapplied various bits of DDD and OO to model namespaces around business domain concepts and to keep them well encapsulated.

Then there’s (:require) vs (:use). (:require) is much better as each dependency usage is clearly marked with a prefix so you can clearly see where dependencies are used. This is kind of obvious but in the early days we used (:use) in most places – without only – and now we’re having to go back over and correct. Note that we did play around with using lein-slamhound for optimising our namespace declarations but then we found that the kind of namespaces you typically want to use this on need restructuring anyway.

Macros, Protocols, Defrecords

Having fun with macros is a right of passage. Some people passionately detest them whilst others enjoy using complicated solutions to solve complicated problems. I’ve learned that if I’m going to build a macro then it helps to keep it minimal and to delegate out the logic into a separate function relatively quickly. We once had some special code eval’ing deftest forms to generate tests based on some data that we had saved up. The idea was that the auto-generated tests would then play nicely with lein-test and consequently our build server. The trouble is that arguments to macros have to be serialized and you’re limited in by this, not to mention that the code can become that much harder to follow. By looking under the hood at what the deftest macro actually did – basically registering test functions – we replaced this little mini-framework of macros and evals without much fuss and it gave better performance in return (we stopped reloading the same data twice, once to register the tests and once to run them). Macros are helpful and powerful but they come with a cost.

Protocols are like having an extremely awkward member of the team around; someone who can get stuff done with an air of awesomeness yet at the same time you’re wondering if there is just isn’t a simpler way. They have some quirks setting them up and it’s added complexity, but on the whole I have to say that Protocols have been good for us. We had some hairy areas of the codebase where we were doing concurrent operations passing around a lot of functions, partially built up or otherwise. Introducing Protocols allowed us to pass a family of related functions around together as one thing with some immutable state. In a different area of the code-base they also laid down some hard interface definitions. We didn’t strictly need them for this purpose but enforcing a little of compiler-checked OO felt like a good thing.

Defrecords were primarily introduced for performance reasons. Where we were creating lots of little map instances we switched to using simple Defrecords instead. I would also argue that they forced us to think a bit harder about our modeling of data and use of simple data structures, leading to cleaner code in some areas.

Wrap Up

In my opinion figuring how to write Clojure code that is more idiomatic and simplistic is what makes Clojure fun, along with the fact that Clojure is blistering pragmatic. The learning never stops.

Clojure at a Bank – Support

October 24, 2012 § 2 Comments

My last post covered the rationale behind our team at an investment bank wanting to make a switch to Clojure from Java. I want to write in this post about how we were supported in our efforts by those at the bank.

Aircover from Within

First not all the team jumped in at once to use Clojure. We created what we called a ‘pod’ within our team. Invitation to the pod was extended to all but to join you had to get the Emacs/Swank/Lein environment setup working first by reading our projects github README page. People joined the pod at varying times with their own personal approach. I.e. some devs wanted some space to try out their own thing in the Clojure codebase whilst others wanted to pair up. Some devs wanted to install the very latest version of Emacs themselves and understand all of our Emacs config whilst others took a more direct route through to the code. At one point we had an Emacs versioning arms race of which I dropped out.

Some of the team didn’t join the pod initially and stayed back to provide air-cover to those within the pod. For the system we had to maintain/develop they stepped up to satisfy whatever business requirements came our way as well as handling any emergent technical problems. Gratitude is owed and deserved.

Technical Leadership Vs Bottom Up, Organic Adoption

I’ve heard of a case at a different investment bank of someone fairly senior decreeing that development going forward needs to be done using Clojure (I disclose now that I don’t have all the facts). I’m not sure how I feel about the idea of this. On the one had it feels good that someone high up has got a kick-ass mentality to drag up the quality of the tools that people in the trenches are using to eliminate waste and to speed up development. On the other hand I can see some people getting frustrated – i.e. not everyone wants to get bloody on the cutting edge and to call themselves Lisp hackers, having to revert from the cosy wizardry of professional refactoring powerhouse tools like Eclipse/Intellji to having to use Emacs or Vim, tools of old.

Changing from a statically typed OO language to a dynamically typed FP one is a fundamental change requiring lots of organic support and I’d be interested to hear if it can be forced or not. Maybe it can be. Maybe this is good. There’s also still immaturity around the Clojure tools ecosystem to grapple with. Nrepl is replacing Swank-Clojure, Lein 2 is about to be released which is a major upgrade. Although you can get copious amounts done using Clojure one has to say that the ground still has the occasional tremor and this may affect bringing onboard the masses. I would expect that there is lots of devil in the detail around Clojure mass adoption.

At our bank we were given some room and trust to take technical decisions so long as our cost/benefit case was solid. It felt right for our team at the time to start using Clojure – we were not instructed to do so.

Support from the Business

The Business were typically not as patiently supportive as our technical stakeholders. However pleasant our business customers are there’s always an element of ‘why we have got X amount of the team diverted away from serving the direct requirements of the business to do tech stuff?’

It may sound obvious but the only effective way I’ve found to deal with this is get a synergy between the ‘tech stuff’ and needs of the business as soon as is reasonable. Our pod of Clojure devs made a switch from straightforward 100% reengineering to doing a mixed blend of new requirements for the business too. Now the business are happier and we get some relief. Since development is more productive around our new infrastructure we can turn around long standing requirements for them faster and it’s fun to do. They will reap the rewards from here on in.

This does however mean that we’ve increased the amount of work we’ve signed up for and overall pressure on ourselves, this has affected our morale. We are currently sailing quite close to the wind with anti-patterns ‘scope creep’ and even at times ‘death march’ hovering. There’s no doubt that we’ve had our share of stresses and that there are more to come. I can though take some comfort that with hindsight the thought of sticking with the old status quo of a monolithic Java app does not become any more appealing. Eventually the waste will crush no matter what the resources of an institution.

Clojure at a Bank – Moving from Java

October 23, 2012 § 7 Comments

An increasing number of large institutions are now wanting to leverage Clojure. Large institutions are inherently complex and can often be dysfunctional in various areas. They do however attract lots of very good people and good people often bring with them the best tools.

Investment banks in particular have hard problems to solve. The financial instruments themselves have varying degrees of complexity, although whatever complexity they have often pales in comparison against the systems ecosystem that maintains them. I said once to a senior manager at one place that the number of systems I was seeing in various strates reminded me of Darwin’s survival of the fittest, beasts coming in all shapes and sizes. He said I was wrong because in the wild animals actually die rather than live on to relentlessly cause problems.

Over a year and half ago I was a team lead on a medium sized development team running a strategically important middle-office system. We were 100% a Java team and we had some pretty good devs onboard with decent backgrounds. We had plans to split up our Spring/Hibernate 1 million LOC behemoth and were getting in some decent wins to make this happen. Then I persuaded a Swedish colleague I’d worked with before to come work with us. I had the feeling the team needed someone to come in to and give us a kick and to challenge our general direction – he certainly lived up to this. Large scale refactorings I was fairly proud of he denounced as ‘just moving shit around’. If we were serious about changing our system then the only way to do this was to change our course radically. Although he was challenged fairly robustly – to the extent that at times I know he got quite pissed off – a few things we were already conscious of.

Firstly Java encourages lots of modelling. “Let’s sit down with all the power of OO and create of lots of classes that attempt to model the problem at hand. We won’t use UML but instead we’ll evolve it in an agile way.” The model in the investment banking world is actually quite complicated so this OO graph grows quite large. Abstraction is brought in do deal with variance; abstract classes and interfaces proliferate. The type system encourages this. I had used some dynamical languages before and it was quite obvious that we were essentially forcing lots of schema and type definition on to a problem domain that just didn’t want or need it. Sure you can minimize the pain with patterns such as composition over inheritance, better abstractions etc, but we had already admitted to ourselves in some quarters that we were fighting the wrong battle.

And soon it becomes like wading in cement. Our system was predominately conceived around 2007 when it was the heyday of using Test Driven Development (TDD) and to have a heavy reliance on dependency-injection frameworks like Spring. I should state now that I don’t actually dislike Spring with any religious fervour – it just looks redundant in retrospect now that I’m working 95% with FP code. Our app back then also had lots of Mock framework usage – we had both JMock and Mockito flavours. Before our big move to Clojure I’d already decided I just didn’t get on with mock frameworks. They slow down development as you constantly find yourself fighting against lots of incomprehensible tests written dogmatically that rarely test anything useful. I really can’t really think of many situations right now when using Mocking would be a good idea, even in a Java/OO setting.

I had also picked up a consensus view amongst industry peers that the only way to truly break up a big monolithic system like ours was to enforce hardline decoupling around service boundaries. The best way to achieve this is naturally to use different languages. If we’d have stuck with Java we’d never have ‘gone for broke’ to drop a crowbar into our system and to wrestle it apart. Now we’re using Clojure this is essentially what we’re doing, and not in a bad way.

You also need to ask yourself when contemplating a fairly seismic shift of tool usage what about the business? What about the stakeholders? Who is going to pay for this kind of re-engineering and why?

In the end there are a lot of answers to these questions. My answer of choice tends to be waste. Waste on the scale lots of project have should not be tolerated. Waste in terms of time spent wading through cement-styled Java, time spent hunting through layers upon layers of indirection, interfaces, unit/integration/acceptance tests, Spring XML files, just to find the one if-statement that actually has some business significance, Eureka! We had/have waste on a fairly large scale and to start developing our system in a fundamentally different way almost seems the only sensible option especially if you want people to stick around.

Here are some specifics – our new code is going to be an order of magnitude less in volume than the old and this is being conservative. With the new code you can use your REPL to navigate in and to run any little bit of it you like. You can change any code and see the changes immediately in your browser (our app lives inside of web-server instances) or REPL. Instead of modelling the world, every man and his dog, we’re going to instead write code that operates against simple data structures. If then we need to be able to determine what is actually going on – wtf – we’ll add lots of introspective abilities to our code. In short, to replace copious amounts of Java code along with built up DSLs that only devs use, we’re going to build a simple rule engine. And we’re going to provide some visualisation tools so that you can actually see the rules themselves. A rule editor may follow.

Sure we had lots of debates about why we couldn’t do what we need to do in Java, and in the end it became clear that you couldn’t really have the debate on these terms. The decision of switching to Clojure fell in to three natural brackets. 1) What’s good for the long term nature of the project? 2) What’s good for this present team? and 3) What’s good for the individuals involved? 1) FP seemed a better fit for our system – it’s basically functional blocks of code hanging of financial market events. Our system also suits a dynamic language making all the rigid schema definition less of a problem. On a different tack surely the long term view project is best served by using the best tools and being able to attract the best people. 2) The team will be motivated learning something new and we’ll likely get better retention. This is ultimately good for the costs of the business. 3) The majority of the team wanted to try Clojure. As Hakan made the point – we don’t get paid to enjoy ourselves at work, but it doesn’t hurt. It’s an obvious point – a happier team is more productive.

In the end at some point a team just has to take a jump. Certainly there’s been problems and long standing issues in our move to Clojure and there are some significant questions still yet to be resolved. We’re a global team and this brings with it a wealth of complexities. A year ago we were also a bunch of Clojure newbies finding our way and some of our code and decisions reflect this. I’ll write about these in a subsequent post.

Year of Clojure at a Bank

October 22, 2012 § 2 Comments

We’re roughly coming up to a year anniversary of when myself and a ragtag bunch of pan European devs started to use Clojure for our day jobs at an investment bank working on on a strategically important middle-office platform. Hakan Raberg is the change agent that when joining the team argued the case for moving to Clojure to whom I am indebted. Hakan and I did a talk at EuroClojure talking about the project – video here.

We’re also going to talk this week at FP Day in Cambridge. Here we’re going to address ‘Clojure after the honeymoon’. Here we want to give an honest account of the problems we faced introducing Clojure inside somewhat of a conversative environment, and of what is the current state of Clojure is. We also want to talk about some of the technical challenges we had particularly around making the jump from Java to Clojure. Hakan left the project to rewrite Emacs in Clojure and to widen his brain neural pathways even further, I’m still on this very project dealing with the fallout.

Time permitting I hope to write up some of our experiences talking about some of the cultural shifts and also about some of the technical stuff we did – i.e. our approach to regression testing, our Clojure rule engine, and the continued battle inside and out of a large Java monolithic beast. I’ll do this in a short series of posts.

Where Am I?

You are currently browsing the General category at Pithering About.