scabl: Two Major Design Flaws

Throughout the time I've spent developing longevity, I've made any number of major design mistakes. The vision I started out with has changed drastically over time, evolving as I experimented with the longevity API. While my core guiding principles have stayed in place, the early versions of the library are nearly unrecognizable compared to the latest version.

Since the beginning, I've had a comprehensive test suite that has allowed me to correct mistakes and simplify my API with ease and assurance of correctness. And working with the API as it has grown has allowed me to spot problems as I moved forward. But two major design flaws remain in place. Both have been nagging me for many months now, and over that time they have become clear in my head. Now I'm ready to admit and address these two errors: the failure to use shapeless, and the failure to store persistent entities in columns on back ends that organize their data into columns, such as Cassandra and SQLite.

Early on in development, I was considering whether or not to use shapeless. I decided against it, and I've been regretting this decision for a while now. It would be easy to blame myself for it in hindsight, but at the time, I think it was not an unreasonable decision. This was back in the latter part of 2014. While I was an experienced Scala developer by this time, I was still inexperienced with shapeless and other typelevel libraries. I still relied heavily on my Java background, and I still thought of Scala as a sort of Java++. I just didn't get shapeless. I tried to understand it, but it was a bit beyond my abilities. And it was hard to find reading materials back then that would help me understand it. Since then, I've experimented with cats, shapeless, and other typelevel libraries, and I am more comfortable with them and more accepting of them. The Type Astronaut's Guide to Shapeless came out, which I read, and for the first time, I felt like I vaguely understood how shapeless actually worked. I've since put in a minor shapeless-related PR to frameless, and adopted shapeless in longevity for test data generation. All in all, I'm much more comfortable with shapeless and category theory now, even though I still consider myself a beginner in these areas.

My own unfamiliarity and reluctance to jump up to the next level of functional programming weren't the only things that caused me to decide against using shapeless in longevity. I wanted to make a persistence library for Scala that was accessible to Java programmers, and at the time, I felt that libraries like Scalaz and shapeless had the potential to alienate Java expats. The use of Scala macros would also complicate the builds of people using longevity, which I wanted to avoid at the time, even though I adopted macros later on myself.

By this point, I recognize my avoidance of shapeless as a major architectural mistake. Adopting shapeless would have improved my codebase by letting me rely on a tried and true library, rather than building my own homegrown reflective tools. The benefits of pure compile-time reflection over mostly run-time reflection would have been significant. And it would probably have generated more interest among the potential user community. I'm now ready to migrate the entire library to shapeless, although of course, this will take some time.
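To give a feel for what compile-time reflection with shapeless looks like, here is a minimal sketch in the style of The Type Astronaut's Guide to Shapeless. It assumes shapeless 2.3.x is on the classpath. The ColumnNames type class and the User case class are hypothetical names I made up for illustration; they are not part of longevity's API.

```scala
import shapeless._
import shapeless.labelled.FieldType

// Hypothetical entity, used only for this illustration.
final case class User(username: String, email: String, age: Int)

// Hypothetical type class: the field names of A, resolved by the compiler.
trait ColumnNames[A] {
  def names: List[String]
}

object ColumnNames {
  // Summoner, so callers can write ColumnNames[User].
  def apply[A](implicit c: ColumnNames[A]): ColumnNames[A] = c

  // Base case: the empty record has no fields.
  implicit val hnilNames: ColumnNames[HNil] =
    new ColumnNames[HNil] {
      val names: List[String] = Nil
    }

  // Inductive case: take the name of the head field from its singleton
  // key type K, then recurse on the tail of the record.
  implicit def hconsNames[K <: Symbol, H, T <: HList](implicit
    witness: Witness.Aux[K],
    tail: ColumnNames[T]
  ): ColumnNames[FieldType[K, H] :: T] =
    new ColumnNames[FieldType[K, H] :: T] {
      val names: List[String] = witness.value.name :: tail.names
    }

  // Bridge from a case class A to its labelled record representation R.
  implicit def genericNames[A, R <: HList](implicit
    gen: LabelledGeneric.Aux[A, R],
    recordNames: ColumnNames[R]
  ): ColumnNames[A] =
    new ColumnNames[A] {
      val names: List[String] = recordNames.names
    }
}
```

With these definitions, ColumnNames[User].names evaluates to List("username", "email", "age"). Asking for ColumnNames[String] fails at compile time with a missing implicit, which is the kind of early feedback that run-time reflection cannot give.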

I want to talk a bit about the second major longevity design flaw that I have yet to correct. When I started writing longevity, the only back end I supported (aside from an in-memory back end that is intended for use in testing only) was MongoDB. Naturally, I was storing the persistent entities in a JSON format. I found myself working on a contract where we were using Cassandra, and storing entities as JSON in a Cassandra column. I immediately saw how I could add a Cassandra back end to longevity that matched this style. The entities are stored as JSON in a column, and other columns are added for things like indexing and optimistic concurrency control. Later came the SQLite back end, and it followed the same approach.

Over time, it became more and more clear that this was not the best solution. One important piece of feedback in this regard came whenever I was on a job that required the use of a Scala database integration layer. Every time this happened, I ruled out longevity as a possibility. Why? Because the database that longevity was reading from and writing to had to be used by other tools within the environment. And these other tools were expecting the data to be represented in a columnar format! Obviously, I had made a mistake here. It's certainly correctable, and I intend to correct it. But again, it will take some time.
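The difference between the two layouts can be sketched in a few lines of plain Scala. This is only an illustration: the User entity, the column names, and the hand-built JSON string are all assumptions of mine, and none of it reflects how longevity actually serializes entities.

```scala
object StorageLayouts {

  // Hypothetical entity, used only for this illustration.
  final case class User(username: String, email: String, age: Int)

  // JSON-in-a-column layout: the whole entity lives in one opaque text
  // column, and extra columns are added for indexing and for optimistic
  // concurrency control. Column names here are assumed.
  // Note: the JSON is built by hand with no escaping, to keep the sketch short.
  def jsonRow(u: User, rowVersion: Long): Map[String, Any] = Map(
    "username"    -> u.username, // duplicated outside the JSON so it can be indexed
    "row_version" -> rowVersion,
    "entity_json" ->
      s"""{"username":"${u.username}","email":"${u.email}","age":${u.age}}"""
  )

  // Columnar layout: every property of the entity gets its own column,
  // so other tools can read the data without parsing JSON.
  def columnarRow(u: User, rowVersion: Long): Map[String, Any] = Map(
    "username"    -> u.username,
    "email"       -> u.email,
    "age"         -> u.age,
    "row_version" -> rowVersion
  )
}
```

In the first layout, a reporting tool that wants every user's age has to parse entity_json row by row. In the second, age is an ordinary column that any tool speaking the database's query language can select, filter, and index.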

For the first three years I was working on longevity, I was largely working on it full time. I would take a contract here or there to pay the bills, but longevity always remained a focus. Then last year, I decided to go back to the workplace full time. I needed a bit of a change of pace, and having a steady, well-paying job was also a very welcome development. For the first six months or so on the job, I pretty much set longevity entirely to the side. Eventually, it started creeping back into my head. I had actually started working on my first foray into shapeless before I took the job, and I started getting curious as to what the state of that branch was. I had identified the two major design flaws I had to work with, and I had a plan for how to address them. Here it is.

I love PostgreSQL. I think it's about the best relational database ever. One nice thing about Postgres is that it supports arrays and composite types. And I mean fully supports them, including indexing, full query support, etc. This would make a columnar back end particularly easy. Correcting either one of my major design flaws would be a major challenge, so why not address them both at once, and save some hassle? On top of that, it would probably be easier to do this work in a greenfield back end, rather than migrate an existing back end. I would implement it entirely with shapeless, and entirely in columns, without any JSON. Once I have this under my belt, it will be much easier to go back and migrate the existing back ends.

So this is what I plan to do next. Many things have changed over the past year or so, and longevity is now a hobby project, and no longer my primary focus. This is freeing in a lot of ways. It takes the pressure off. It also means development will go slowly, and developing the new back end will take some time. But I'm really excited about it! It's going to be a lot of fun.

I'm also glad to be done with this little essay. It means the next time I have an hour to poke at longevity, I'll be able to dig right into the code!

2018-07-04