Complex mixes of monoliths, microservices, databases, data centers, networking, and cloud providers give your services a dizzying array of opportunities to fail. No one has perfect failover, so you have to be prepared to play defense.

We will look at two categories of failure, how to recognize them before they hit, and how to keep the carnage they cause from spreading to the other services you provide.
* Zombies: Long-running but abandoned requests that eat up memory and crash the system long after the users who conjured them gave up.
* Black Holes: Dependent services that take connections but never give them back, or that perform so poorly for a time that all your attention eventually gets focused on that one thing.

You can expect to see:
* Novel concurrent data structures for tracking ongoing work, used as a basic building block for recognizing Zombies and Black Holes.
* Circuit Breaking: Statistics used for recognizing what normal looks like and when to stop using a dependency.
* Dealing with the behavior differences between heavily and lightly used dependencies, and between fast and slow ones.
* Coordinating circuit breakers among many nodes so that they all get the message when something is wrong and when it clears.
* Considerations for overriding automatic decisions with human intervention.
* A demonstration of a working system.
* Sci-fi references and Horror Stories.
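For readers who have not met circuit breaking before, here is a minimal single-node sketch of the idea. It is not the system demonstrated in the talk: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are all assumptions made for illustration, and a simple count of consecutive failures stands in for the richer statistics the talk describes. It is also not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of feeding more work to a Black Hole.
            raise CircuitOpenError("dependency is currently bypassed")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the breaker.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each use of the dependency, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for your own client function, and treat `CircuitOpenError` as a signal to fall back or degrade gracefully.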