In a previous blog post we saw how we can detect common code smells using NDepend. In that version, I implemented the detection strategies with CQLinq. This worked great, but it did have some caveats:
It was hard to run the analysis across several Visual Studio solutions, as I needed to open each solution, load the NDepend project and the rule file and run the NDepend analysis.
If you wanted to process the NDepend results, you’d need to first export them as an XML file. Then you could write some custom code to transform from XML to whatever format you needed.
Using the NDepend.API
So, for a more automated analysis of large codebases (that might contain many solutions), I decided to migrate the detection strategies from CQLinq to the NDepend.API.
The translation went quite smooth and you can find the code here. The Getting Started documentation and the existing open source Power Tools (that come with the NDepend installation) helped me hit the ground running. The only thing I missed was the default notmycode queries. These come out of the box with the default CQLinq ruleset and help you exclude generated code. If you use the NDepend.API, you won’t have the notmycode prefix, so you need to re-implement the exclusions in code.
Now, I can point it to a folder that contains several Visual Studio solutions. It will create NDepend projects for all solutions, run the NDepend analysis, run the custom detection strategies and export the results as JSON. Of course there are trade-offs. With the NDepend.API, I didn’t have the quick feedback that you get when writing CQLinq rules.
In this microservices era, many teams are building messaging solutions. What starts as a simple solution with 5 deployment units can quickly grow to tens or even hundreds. I have worked on such a solution for several years and experienced both the advantages and disadvantages of a Service-Oriented Architecture. We used NServiceBus (part of Particular’s Service Platform).
The solution started out simple, but, as the number of features grew, so did the complexity. With tens of message endpoints, it was hard to see both the big picture and the details. For the big picture, we could rely on some manually created diagrams (e.g. a Context View in Simon Brown’s C4 model). But things got trickier when we wanted to understand the details. When I talk about details, I mean answers to specific questions. For example:
What messages does endpoint X send/receive?
What endpoints are coupled to the X endpoint?
What messages are part of the Y business flow?
What messages is service Z sending?
What messages trigger message W to be sent?
Show me the entire message flow that starts with message W.
While I was thinking about this, I saw this interesting tweet from Jack Kleeman that showed the communication paths between microservices at Monzo:
Now, the system I worked on was nowhere near this complex, but it made me wonder: how can you answer the questions above when working on such a system? In this blog post we’ll explore some options. To keep things simple, in this blog post we’ll use a sample eCommerce solution (that I’ve also used in my article series about Designing long-running processes in distributed systems).
I’ve been working with NServiceBus (part of the Particular platform) for the past 5 years. In all this time, I’ve always been impressed with the community around it. This article is my experience report. I’ll go over some of the highlights of working with this great piece of technology:
In the previous posts in this series, we’ve seen some examples of long running processes, how to model them and where to store the state. But building distributed systems is hard. And if we are aware of the fallacies of distributed systems, then we know that things fail all the time. So how can we ensure that our long running process doesn’t get into an inconsistent state if something fails along the way?
Let’s see some strategies for dealing with failure in the Shipping service. First, let’s have another looks at the shipping policy defined in the previous post:
First, attempt to ship with Fan Courier.
If cannot ship with Fan Courier, attempt to ship with Urgent Cargus.
If we did not receive a response from Fan Courier within the agreed SLA, cancel the Fan Courier shipment and attempt to ship with Urgent Cargus.
If we cannot ship with Urgent Cargus or did not receive a response within the agreed SLA, notify the IT department.
Retries
The Fan Courier Gateway handles the ShipWithFanCourierRequest message and calls the Fan Courier HTTP API. What happens if we get an Internal Server Error?
The simplest thing we could do would be to retry. What if it still fails? Then we can wait a bit, then retry again. For example, we can retry after 10 seconds. If it still fails, retry after 20 and so on. These Delayed Retries are a very useful strategy for getting over transient errors (like a deadlock in the database). We could even increase the time between retries exponentially, using an exponential backoff strategy.
Idempotent Receiver
One thing that you need to be mindful when retrying is message idempotency. What happens if we get an HTTP timeout when calling the Fan Courier HTTP API, but our shipment request was actually processed successfully, we just didn’t get the response back? When we retry, we don’t want to send a new shipment. This is why the Fan Courier Gateway needs to be an Idempotent Receiver. This means that it doesn’t matter if it processes the same message only once or 5 times, the result will always be the same: a single shipment request. There are several ways of implementing an idempotent receiver, but these are outside of the scope of this article.
Timeouts
But what if the Fan Courier API is down? Retrying won’t help. So what can we do? When we send the ShipWithFanCourierRequest we can also raise a timeout within 30 minutes (at line 8). When we receive the timeout message (line 13) we can take some mitigating actions. The shipping policy states that we’d like to attempt to ship with Urgent Cargus. In order to do that, we’ll want to first cancel the Fan Courier shipment (line 17). This is what’s called a compensating transaction because it will undo the effects of the initial transaction. Then, we’ll send a ShipWithUrgentCargusRequest.
What happens if the UrgentCargus API is down too? We can send the message to an error queue. This is an implementation of the Dead Letter Channel pattern. A message arriving in the error queue can trigger an alert and the support team can decide what to do. And this is important: you don’t need to automate all edge cases in your business process. What’s the point in spending a sprint to automate this case, if it only happens once every two years? The costs will definitely outweigh the benefits. Instead, we can define a manual business process for handling these edge cases.
In our example, if Bob from IT sees a message in the error queue, he can inspect it and see that it failed with a CannotShipOrderException. In this case he can notify the Shipping department and they can use another shipment provider. But all of this happens outside of the system, so the system is less complex and easier to build.
Saga
Another failure management pattern is the Saga pattern. Let’s see an example.
Requirement
The Product Owner would like to introduce a new feature – the ability to ship high volume orders. But there’s a catch: high volume orders are too large to ship in a single shipment. We need to split them in batches. But, we only want to ship complete orders. This means that if we cannot ship one batch, we don’t want to ship any batch.
The Saga pattern advocates splitting the big transaction (ship all batches) into smaller transactions (one per batch). But since these transactions are not isolated, we need to be able to compensate them:
The ShipHighVolumeOrderSaga in the sample code base shows how to use the Saga pattern to implement this feature.
Benefits
Avoids Distributed Locks
By using the Saga pattern you avoid using distributed locks and two-phase commits. This means that you avoid the single point of failure – the distributed transaction coordinator – and it’s more performant.
Atomic, Consistent, Durable
If you implement this pattern correctly, you can get Atomicity, Consistency and Durability guarantees.
Drawbacks
Lack of Isolation
The lack of isolation can cause anomalies. If between T1 and T2 you get a T4, you need to decide what to do. You can easily get into an inconsistent state.
Complex
Handling these cases and all the different orders that messages can arrive can introduce complexity.
In this article we’ve seen some patterns for handling failures in long running processes. We started with the easier ones: retries and delayed retries, timeouts, compensating transactions and dead letter channels. Then we’ve briefly covered a more complex pattern – the saga pattern. I keep the saga pattern at the bottom of my toolbox and I avoid it if possible. Many times, you can get around it by using simpler patterns.
In this article series we’ve seen how we can use different patterns to implement long running processes. To showcase the patterns, we’ve used a sample eCommerce product that looks like this:
If you want to have a look at the code, you can find it on my github account.
Identities are the defining characteristic of an entity in Domain-Driven Design. And as soon as the Id is public and leaves its immediate context, other components might use it. For example if service A references by Id an entity from service B, changing the Id of the entity will have a knock-on effect of service A. This is why its important to have several tools in the toolbox. In this blog post we’ll discuss 7 strategies for assigning Ids and their trade-offs.
Testing is an important part of building a product right. Continuous Delivery makes that more explicit by building quality in. In this blog post we’ll see how you can start off testing on the wrong foot. Then we’ll see how asking basic questions like Why, What, How and Where can help you define a sound test strategy in a Continuous Delivery context.
The Deployment Pipeline
Most of the teams nowadays think about Continuous Delivery. Continuous Delivery means automating the release process, from code merge to production release. How do you do that? By using the deployment pipeline pattern. The deployment pipeline models and automates the release process. Here is an example:
I think that everybody agrees that testing is required in order to build a quality product. But there’s also a lot of confusion about the boundaries of each test type. What’s the scope of a unit test? What’s the difference between an integration test, an integrated test and a contract test? If you ask 3 developers about test boundaries, you’ll most likely get 3 different answers. For example, I still talk to people who consider that a unit test should test a single class/method.
What’s clear is that most teams don’t have a consensus on what’s the scope of the different types of automated tests and the differences between them. Getting to a universal consensus might be hard, but getting to a consensus inside the team should be easy enough. In this blog post we’ll see an example of how to do that.
Have you ever been on a project where, because of the tight schedule or tight budget, the focus was only on delivering business stories? How did that work for you? Did you ever manage to pay all the technical debt incurred? What about the process debt?
I think that in most cases, this is a false economy. We’re gaining a small benefit now (maybe not event that), but we’re paying a much larger cost in the future. This is because, as Mike Rother says, a process that does not improve, degrades over time. For example, if you’re not continuously improving the feedback through the deployment pipeline, your 20 minute test suite will grow into an 1 hour test suite. If you’re not constantly fixing brittle tests, people will get used to ignoring them. This is not a people problem. It’s a system problem. It’s much harder to make people do something. It’s easier to put the required controls in the process. As an example, fail the build if the tests take longer than 30 minutes.
We have all used code analysis tools on our projects and these are useful for identifying some code smells. The issue is that most of them treat metrics in isolation and isolated metrics can’t tell you if the design is good or bad. You need more context.
In this blog post we’ll see how to go beyond code smells. We’ll see how to identify design smells and inappropriate coupling in the technical architecture. We’ll define detection strategies for common design smells (like God Class and Feature Envy) and implement them using NDepend. Last but not least, we’ll see how we can define fitness functions that detect dependency violations in our application’s architecture.