Peas – The Modern Dataist

“I eat my peas with honey;

I’ve done it all my life.

It makes the peas taste funny,

But it keeps them on the knife.” – Anonymous

I’m the sort of person who hoards random leftover stuff in the toolshed because reasons. One reason is that they’re useful (wrenches). Another is that I use them a lot (screwdrivers). Yet another is that I know other people use them, so I’ve got one in case I need one (chisels).

Perhaps the most maddening reason is that there’s some stuff I have just because someone offered it to me and I couldn’t say no, like the bench-mounted double grinder (I don’t have a bench) and a mitre box. When I think something might be cool or useful, it goes in the shed, perhaps never to be used, but I’d rather have a tool that I might need than not have a tool I do need.

With data, though, there are just so many tools! Knowing which one to use is difficult at the best of times – they all do one thing very well and other things besides – and it sets off my anxiety just thinking about the range of possibilities.

Helpfully, there’s someone in the data community (isn’t there always) who takes it upon themselves to document all the available options to make choosing a tool easier. In this case, it’s Matt Turck. Every year for the past five years, maybe longer, Matt has produced a handy infographic of the Big Data Landscape, accompanied by a very insightful blog post which you should make a point of reading. This is the picture for 2016, and this is the one for 2021. Please take a look at them before reading further…

The thing that’s most striking about these images is how much more there is in the later one. The explosion in data and analytics over the last few years, coupled with the availability of more open-source software, has led (in my view) to a bewildering array of options for data – so much so that it’s probably easier to stick with what you know.

Making the right technology choice for your thing is important. Making the right tooling choice for manipulating, moving, transforming, aggregating, tagging, reporting, analysing, and visualising your data is also very important. And just as hard, if not harder.

Imagine designing a system in a NoSQL datastore like Apache Cassandra, only to realise that reporting is *not* as simple as a SELECT * FROM table JOIN otherTable ON commonColumn, and you should really redesign your entire keyspace to get that kind of data materialised up front (please, no comments on the relative merits of Materialised Views in Cassandra – that can only end badly).
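To make that a bit more concrete, here’s a rough sketch in plain Python (no Cassandra driver involved) of what “materialised up front” means: do the join once, when the data is written, into a table shaped for the report you want to run. The customers, orders, and the names `build_orders_by_customer` and `report_for` are all made up for illustration, and a real keyspace design would need a lot more thought than this.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
customers = {1: "Ada", 2: "Grace"}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 10.0},
]


def build_orders_by_customer(customers, orders):
    """Do the 'join' once, at write time, into a table keyed by the query."""
    table = defaultdict(list)
    for order in orders:
        name = customers[order["customer_id"]]
        # Each row carries the customer name, so a read never needs a join.
        table[order["customer_id"]].append(
            {
                "order_id": order["order_id"],
                "customer_name": name,
                "total": order["total"],
            }
        )
    return dict(table)


# The denormalised, query-shaped "table".
orders_by_customer = build_orders_by_customer(customers, orders)


def report_for(customer_id):
    """A read is a single lookup by key; unknown keys give an empty list."""
    return orders_by_customer.get(customer_id, [])
```

The trade-off is that the customer name is now copied into every order row, so a change of name means rewriting those rows. That’s the price of not having a JOIN at read time.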

My point here is that sometimes a plethora of choices can lead to suboptimal decisions. Tooling for the data professional is not ‘When all you’ve got is a hammer, everything starts to look like a nail’ – it’s a delicate balancing act between function and usability. Not everyone is going to know every tool. No one expects them to, except possibly recruiters, and nor should anyone denigrate one tool in favour of another to a potential user.

It’s a personal choice, unless your org has a specific toolset, but I’m guessing that most larger orgs won’t have one that covers all of the current tech stack (or if they do, it’s so out of date it’s not funny any more).

The choice comes down to whether you continue to eat peas with honey, buy a spoon, borrow a fork, or stop eating peas in the first place.

Back soon….
