I wasn’t sure at what point I would be ready to write a “how I approach onboarding” post, and then I took dev offline (it’s dev; no customers were harmed). I think that means I’m sufficiently embedded that I can talk about my process of getting there.
Week 1
Week one is about information gathering: how much data, system architecture, and user flow can I shove into my head in as short a time as possible? Everything from “what are we selling?” to “how does this thing work?”
I ask a lot of questions in this time, both synchronously and asynchronously. I find the questions in a Slack thread are super informative, because people pop in with extra answers later on, and sometimes the rest of the team learns something new too. Huddles or calls in which I can ask “but why?” to people’s faces are a great way of getting to know one another as well as building context.
Local Development
If you’re lucky you get a single repository to pull and play with. It might have tests, it might have a README.md, it might even have some architecture diagrams. But most people don’t know which diagrams are helpful, so you might have something that looks like a five-year-old was let loose with a pencil. It’s accurate, but not super helpful.
Step one is always getting the local setup at least testable. That means I have dependencies installed and I can run whatever level of unit tests exist. I don’t really care what the coverage percentage is. We aim for 100% on our backend, but that’s only realistic because we outsource writing the really dumb ones to the AI. This time around I was completely discombobulated because it turned out I needed gcloud auth to run the frontend tests. They won’t work if I’m offline.
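If I were fixing that offline failure rather than just grumbling about it, I’d start with a guard along these lines. A minimal sketch, assuming a Node test suite and google-auth-library; `hasCloudCredentials` is a name I made up, not something in our code base:

```typescript
// Detect whether Google Application Default Credentials are available,
// so cloud-backed tests can be skipped offline instead of failing.
import { GoogleAuth } from "google-auth-library";

export async function hasCloudCredentials(): Promise<boolean> {
  try {
    // Throws if `gcloud auth application-default login` hasn't been run
    // and no service-account key is configured.
    await new GoogleAuth().getApplicationDefault();
    return true;
  } catch {
    return false;
  }
}
```

Pair that with a test runner that supports conditional skips (Vitest’s `describe.skipIf`, for example) and the offline part of the suite stays green.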
When I started working, that would have been terrifying. That’s a rant for another day, but let’s take a moment to appreciate the stable internet connections we now take for granted, and remember with a grin that we will tell our grandchildren about the days of dial-up, hot-spots, and load-shedding.
Pictures Please
I mentioned architecture diagrams. There are a handful of different ones which I find immensely valuable. If they don’t already exist, I will use up my LLM credits having a model read the entire code base and draw the pictures for me. I also like to put people on the spot and ask them to draw these things from memory, but that’s hard mode.
Sequence Diagrams are great for understanding user flows and data flows. I like to have them for the most critical APIs, not necessarily the most complex ones. They can be replaced or augmented with flow charts, depending on how much space and patience you have.
Entity-Relationship Diagrams (ERDs) are usually used to describe a complex relational database. A couple of things here: if your relational database is complex enough that the ERD turns spidery, you should consider whether you’re using the right data store. And if you are having issues with transactions, or forcing non-relational data into a relational datastore, think twice. That being said, truly relational data can be shown really nicely in an ERD, though these should be regenerated with every DB migration you run rather than left to an LLM to make up. They give you a visual of your data modelling which you really won’t find anywhere else.
Cloud-Architecture diagrams didn’t exist when all the modelling and diagramming languages were written, and our forebears would be horrified at all the pretty little icons we use for networks, compute, schedulers, etc. They are really helpful though. I don’t know of a Terraform plugin which can draw a proper diagram from your definitions (though `terraform graph` will at least emit a DOT dependency graph you can render), and that would be another great way to understand your dependencies and what’s happening. These diagrams also go out of date really fast if you add too much detail, but the high-level version is super useful.
Week 2
Yep, by week two the basics are in the head and you’re ready to start making simple low-risk changes, right? For me, this is important. I don’t learn how things work until I use them and try them. I’m an inside-out learner, not an outside-in learner. I need the theory eventually, but first show me that it works, then use the theory to explain what’s going on, so that I can expand on it later.
For me that’s frequently changes to logging and observability. It’s adding metrics. It’s small bug fixes. Things which add value, and which will keep adding value for ages to come, but which aren’t going to break anything. And which no one is going to mind too much if they take a while to figure out. Developer experience improvements, minor version upgrades, things like that. They let me get my hands on the code, see where the critical pathways are, and iterate on improvements the team can see and use.
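To make “adding metrics” concrete, here’s the shape of change I mean. A minimal sketch, assuming a Node service instrumented with prom-client; the metric name and label are invented for illustration:

```typescript
// Count failed checkout attempts, labelled by failure reason.
// Low-risk: purely additive, no behaviour change on the hot path.
import client from "prom-client";

const checkoutFailures = new client.Counter({
  name: "checkout_failures_total",
  help: "Failed checkout attempts, labelled by reason",
  labelNames: ["reason"],
});

export function recordCheckoutFailure(reason: string): void {
  checkoutFailures.inc({ reason });
}
```

The counter itself is almost beside the point; writing it forces you to find the critical path, the failure modes, and wherever the team already looks when something goes wrong.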
These changes eat up a week in no time, but by the end of them we are all super happy.
Week 3
Starting to be more useful. I am still looking for small isolated changes, but in general they are going to be a little more far-reaching. Still focused on developer experience, I might be making a number of iterative changes to dashboards, logging, minor version updates, tooling, etc. I will be taking a stronger stance in code reviews, giving opinions on designs, and generally getting in the middle of things.
I didn’t mean to, but one of the things I approached this week was upgrading a NextJS app from 14 to 16. The final review had 82 files changed. It was a refactor handled mostly by automation, and with a lot of AI. Why? Well, there are published tools for doing this safely (the official `@next/codemod` transforms, for instance), and there are these LLMs which can read type-check failures and test failures and fix them a lot faster than I can. I just have to keep an eye on the quality of the changes being made.
No, that’s not how I took down dev.
I took down dev by trying to speed up our deployment process. We were only deploying to dev when a release was approved and looked good. But that’s what staging is for. We should be sending code to dev every time we push to main on the repository. Which is all very well and good, until you don’t realise there is a release trigger which builds the container images you need to deploy, the entire team reviews the PR and approves it (also forgetting the release trigger), and you push a change that just… can’t work. Then someone sends a screenshot of the dev site returning a 404 Not Found and you have to debug and revert.
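In hindsight, the missing piece was a pre-deploy sanity check. A sketch only, assuming the gcloud CLI and a GCR-style registry; every name here is made up:

```typescript
// Refuse to roll dev forward onto an image that was never built.
import { execSync } from "node:child_process";

function imageExists(image: string): boolean {
  try {
    // `gcloud container images describe` exits non-zero if the tag is absent.
    execSync(`gcloud container images describe ${image}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

// GIT_SHA is a placeholder for however your pipeline identifies the build.
const image = `gcr.io/my-project/frontend:${process.env.GIT_SHA}`;
if (!imageExists(image)) {
  throw new Error(`Refusing to deploy: ${image} has not been built yet`);
}
```

A check like that turns “the site 404s and someone sends a screenshot” into “the pipeline fails loudly before anything ships”.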
That’s how we learn
It’s a sign of a good team and a healthy culture that at no point was anyone being mean about things breaking. We simply team up on the debugging, figure out a path forwards, and go for it. Yes, there is a little bit of ribbing about reviewing PRs and Pippa finally breaking something. Given that it is all in good fun, I consider that healthy.
Overall, taking down dev after three weeks is about right for me. I know a lot more about my systems than I did when I started, and I am very happy with the decisions I’ve made. Learning is happening at a sustainable pace. This is good.
