
Unexpected interface outcomes per hour

Oct 22, 2024

It's intuitive that tools sometimes fail. You're often not surprised when they do. It's an ingrained experience in using any tool. When I swing a hammer, I expect to miss. In my mind I chalk that up mostly to myself (I know that I don't always swing the handle straight, and then the head will miss by a centimeter or two).

I don't typically blame the hammer for that, even though when you think about it you definitely could. A heavy hammer for driving framing nails is a terrible tool for driving picture frame finishing nails into your drywall, and if you use it, you'll probably end up with some angled nicks to the side. So right now, because I don't have a nice light finishing hammer, I am extra careful and slow with the heavy hammer when putting up pictures, and I even grip it higher up the handle towards the head so that I have less leverage and less acceleration.

So users know the tool doesn't always fit the job perfectly, and they habitually modify their behavior to reduce their exposure to mistaken uses of the tool.

Users have the same unconscious negotiation with software tools. When you see a spinner for a certain length of time, you realize the request is not coming back. And you know that you probably don't even need to change your request to something else; it probably isn't a bad request but some transient server error.

Most people don't have those words for it, but they do have that feeling. Some of that is association born of convention (the spinner), but some of it is generic tool feeling, the same as with positioning a hammer.

Usually when there's a discussion of software reliability, it starts with the quantitative, the "how many 9's are enough" variety of drilling into precision about behavior, and off-loads the qualitative onto very artificial perspectives like service level agreements/objectives (SLAs/SLOs).

I'm bringing up the discussion around tools, how they feel, and how users develop natural compensating behaviors because I think you can start with the long tail of user experience when thinking about reliability. Things build up over time, see: Microsoft.

In a product like Linear, for instance, architecting for optimistic interactions that produce high immediate reliability makes for a really different user experience. When I'm using Zendesk or Intercom or Jira or whatever, I'm literally afraid, I know it's going to fail, and I'm praying that it only vaporizes an unimportant note or results in double-sending an innocuous message. Compare that to when I'm using Linear: there's a kind of flow mode and actually a feeling of safety and even trust. I know this software sometimes fails (e.g. it does get a bit weird during network troubles in a way that makes me afraid of, like, how many minutes/hours of my changes might get erased? But then it recovers). But overall I feel comfortable in a way that's much different than with most products. I think you could say the same thing about Apple / Microsoft as a comparison.
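To make "optimistic interactions" concrete, here's a minimal sketch of the pattern I mean. This is not Linear's actual sync engine; the issue store and the `save` function are hypothetical. The idea is just: apply the change locally first so the UI responds instantly, sync in the background, and only roll back if the server rejects it.

```typescript
// Minimal sketch of an optimistic update with rollback.
// The in-memory issue store and save() endpoint are hypothetical.

type Issue = { id: string; title: string };

const issues = new Map<string, Issue>();

async function renameIssue(
  id: string,
  newTitle: string,
  save: (id: string, title: string) => Promise<void>
): Promise<void> {
  const current = issues.get(id);
  if (!current) throw new Error(`unknown issue ${id}`);

  const previousTitle = current.title;

  // 1. Apply the change locally right away -- the UI re-renders immediately,
  //    so from the user's perspective the action already "worked".
  issues.set(id, { ...current, title: newTitle });

  try {
    // 2. Sync to the server in the background.
    await save(id, newTitle);
  } catch (err) {
    // 3. Only on failure do we roll back and surface the error,
    //    instead of making the user wait on every round trip.
    issues.set(id, { ...current, title: previousTitle });
    console.error(`rename failed, reverted "${newTitle}" -> "${previousTitle}"`, err);
  }
}
```

The failure case still exists, but it becomes the exception the user occasionally notices rather than a latency tax on every single interaction.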

Maybe you call this user-centric reliability. Whereas most reliability engineering starts with how much load a failure places on developers, or what's the mean time to recovery, etc., in a way that makes sense if you're processing dollars per hour or something. It's much harder to capture trust, like -- unexpected outcomes per hour? User-facing error budgets? Software trust scores in place of NPS?
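If you wanted to actually instrument something like that, a crude version might look like the sketch below. The event shape is hypothetical, and the hard part is deciding what counts as "unexpected" from the user's side (a lost edit, a double-send, a spinner that never resolves), but the arithmetic itself is just surprises divided by hours of use.

```typescript
// Back-of-the-envelope sketch of "unexpected outcomes per hour".
// The UiEvent shape is hypothetical; real instrumentation would need a
// careful, user-facing definition of "unexpected".

type UiEvent = {
  timestampMs: number;
  // true when the interface did something the user didn't ask for or expect
  unexpected: boolean;
};

function unexpectedOutcomesPerHour(events: UiEvent[]): number {
  if (events.length === 0) return 0;

  const unexpectedCount = events.filter(e => e.unexpected).length;

  // Approximate "active time" as the span between first and last event,
  // floored at one minute to avoid dividing by ~zero.
  const first = Math.min(...events.map(e => e.timestampMs));
  const last = Math.max(...events.map(e => e.timestampMs));
  const activeHours = Math.max((last - first) / 3_600_000, 1 / 60);

  return unexpectedCount / activeHours;
}
```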