A circuit breaker is an automatically operated electrical switch designed to
protect an electrical circuit from damage caused by overcurrent, overload, or a
short circuit. Its basic function is to interrupt current flow after protective
relays detect a fault. A circuit breaker can be reset (either manually or
automatically) to resume normal operation.

The software analogue, as described in Release It! (chapter 5.2), can prevent
repeated calls to a failing service by detecting issues and providing a
fallback. Using this pattern makes it possible to avoid cascading failures.

Sample implementation in Go

Potentially failing calls are wrapped in a circuit breaker to limit the impact
of failures.

Imagine the following weather information service (an unreliable source):

func fetchWeather(location string) string {
    time.Sleep(time.Duration(rand.Intn(5)) * time.Second)
    return "sunny"
}

A potential circuit breaker function signature, using higher-order functions:

WithCircuit(call func() error, fallback func())

We attempt the call function first; if it returns an error or the circuit is
Open, the fallback function is invoked instead.
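A minimal sketch of this first version, before timeouts and state tracking are added. The `SimpleCircuit` type and its hand-set `Open` flag are hypothetical stand-ins for the state handling developed later, and the definition of `ErrorCircuitTripped` is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorCircuitTripped is an assumed sentinel error returned whenever the
// fallback is used.
var ErrorCircuitTripped = errors.New("circuit tripped")

// SimpleCircuit is a hypothetical, stripped-down breaker whose state is a
// single flag set by hand.
type SimpleCircuit struct {
	Open bool
}

// WithCircuit runs call unless the circuit is open. If the circuit is open or
// call returns an error, fallback runs instead and an error is returned.
func (c *SimpleCircuit) WithCircuit(call func() error, fallback func()) error {
	if !c.Open {
		if err := call(); err == nil {
			return nil
		}
	}
	fallback()
	return ErrorCircuitTripped
}

func main() {
	c := &SimpleCircuit{}
	weather := ""

	err := c.WithCircuit(
		func() error { return errors.New("service unavailable") },
		func() { weather = "raining" },
	)
	fmt.Println(weather, err)
}
```

The caller cannot tell from the returned error whether the call failed or was skipped; in both cases the fallback has already run.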

Usually circuit breakers are used with another stability pattern, timeouts.

Instead of waiting indefinitely until the service returns the expected answer,
we set a deadline. If the answer takes longer than that, we give up and return
an error.
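The timeout idea can be sketched on its own, separate from the breaker. The names `runWithTimeout`, `ErrTimeout`, `shortTimeout`, `fastCall` and `slowCall` are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTimeout is a hypothetical sentinel error for calls that take too long.
var ErrTimeout = errors.New("call timed out")

// shortTimeout is an arbitrary deadline chosen for this demonstration.
const shortTimeout = 20 * time.Millisecond

// runWithTimeout runs call in its own goroutine and waits at most d for it.
// The channel is buffered so the goroutine can finish even after a timeout.
func runWithTimeout(call func() error, d time.Duration) error {
	done := make(chan error, 1)
	go func() { done <- call() }()

	select {
	case err := <-done:
		return err
	case <-time.After(d):
		return ErrTimeout
	}
}

// fastCall and slowCall stand in for a healthy and a sluggish service.
func fastCall() error { return nil }

func slowCall() error {
	time.Sleep(500 * time.Millisecond)
	return nil
}

func main() {
	fmt.Println(runWithTimeout(fastCall, shortTimeout)) // prints "<nil>"
	fmt.Println(runWithTimeout(slowCall, shortTimeout)) // prints "call timed out"
}
```

Note that the timed-out call is not cancelled; it keeps running in the background until it returns. Passing a `context.Context` to the call would be one way to stop it early.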

Improving the circuit breaker:

func (c *Circuit) WithCircuit(call func() error, fallback func()) error {
	if !c.isOpen() {
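		// buffered so the goroutine can still send and exit after a timeout
		wait := make(chan error, 1)

		// run the call in its own goroutine so it can be timed out
		go func() { wait <- call() }()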

		select {
		case err := <-wait:
			if err != nil {
				break
			}
			return nil
		case <-time.After(time.Second * c.TimeOut):
			break
		}
	}

	fallback()
	return ErrorCircuitTripped
}

The only bit missing is the state machine controlling the transitions between
the circuit states: Open, Closed, Half-Open, and Half-Closed (to keep it simple
I will ignore the half states).

type Circuit struct {
	TimeOut         time.Duration // seconds to wait for the call (WithCircuit multiplies it by time.Second)
	FailThreshold   uint32        // how many failures until the circuit is tripped
	RetryThreshold  uint32        // how many failed or blocked calls before a retry is allowed
	failuresCounter uint32
	retriesCounter  uint32
}

To recover from failures automatically I will use a retry mechanism: every N
calls in an open state, the circuit will allow one request through to see
whether the issue still persists.

The retry logic:

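// shouldRetry reports whether enough calls have gone by since the last retry
// to let one request through; it resets the retry counter when it does.
func (c *Circuit) shouldRetry() bool {
	if c.retriesCounter > c.RetryThreshold {
		c.retriesCounter = 0
		return true
	}
	return false
}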

And open circuit logic:

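// isOpen reports whether calls should be blocked. A due retry always lets
// the call through, even if the failure threshold has been exceeded.
func (c *Circuit) isOpen() bool {
	if c.shouldRetry() {
		return false
	}
	return c.failuresCounter > c.FailThreshold
}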
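To see how the two counters interact, the following sketch replays a run in which every call fails and records which attempts are allowed to reach the service. It copies the retry and open-circuit logic above, assumes a helper that bumps both counters on every failed or blocked call, and uses arbitrary thresholds of 2; `simulateFailures` is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

type Circuit struct {
	FailThreshold   uint32
	RetryThreshold  uint32
	failuresCounter uint32
	retriesCounter  uint32
}

func (c *Circuit) shouldRetry() bool {
	if c.retriesCounter > c.RetryThreshold {
		c.retriesCounter = 0
		return true
	}
	return false
}

func (c *Circuit) isOpen() bool {
	if c.shouldRetry() {
		return false
	}
	return c.failuresCounter > c.FailThreshold
}

// incrementFailures bumps both counters; it is called for failed calls and
// for calls blocked by an open circuit alike.
func (c *Circuit) incrementFailures() {
	c.failuresCounter++
	c.retriesCounter++
}

// simulateFailures (hypothetical) pretends every call fails and returns, for
// each attempt, whether the circuit let it reach the service.
func simulateFailures(c *Circuit, attempts int) []bool {
	reached := make([]bool, 0, attempts)
	for i := 0; i < attempts; i++ {
		reached = append(reached, !c.isOpen())
		c.incrementFailures()
	}
	return reached
}

func main() {
	c := &Circuit{FailThreshold: 2, RetryThreshold: 2}
	for i, ok := range simulateFailures(c, 10) {
		fmt.Printf("attempt %d reached service: %v\n", i+1, ok)
	}
}
```

With both thresholds set to 2, attempts 1 to 4 reach the service, after which only every third attempt (7, 10, and so on) is let through as a retry while the rest go straight to the fallback.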

An improved version of the circuit breaker main block:

func (c *Circuit) WithCircuit(call func() error, fallback func()) error {
	if !c.isOpen() {
		wait := make(chan error, 1)

		// run function with timeout
		go func() { wait <- call() }()

		select {
		case err := <-wait:
			if err != nil {
				break
			}
			c.close()
			return nil
		case <-time.After(time.Second * c.TimeOut):
			break
		}
	}
	c.incrementFailures()
	fallback()
	return ErrorCircuitTripped
}

And the helpers:

// close resets both counters, returning the circuit to the Closed state
func (c *Circuit) close() {
	c.failuresCounter = 0
	c.retriesCounter = 0
}

// incrementFailures records a failed or blocked call in both counters
func (c *Circuit) incrementFailures() {
	c.failuresCounter++
	c.retriesCounter++
}

Using the circuit breaker:

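// Imports needed to build the snippets in this article as one program.
import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// ErrorCircuitTripped is returned by WithCircuit whenever the fallback is
// used. The snippets reference it without defining it, so this definition
// is an assumption.
var ErrorCircuitTripped = errors.New("circuit tripped")

// fetchWeather simulates an unreliable service that takes 0 to 9 seconds to answer.
func fetchWeather(location string) string {
	time.Sleep(time.Duration(rand.Intn(10)) * time.Second)
	return "sunny"
}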

func fallbackWeather() string {
	return "raining"
}

func main() {
	fuse := &Circuit{
		TimeOut:        3,
		FailThreshold:  5,
		RetryThreshold: 5,
	}

	var localWeather string

	for {
		fuse.WithCircuit(func() error {
			localWeather = fetchWeather("London")
			return nil
		},
			func() {
				localWeather = fallbackWeather()
			})

		fmt.Println(localWeather)
	}
}

This example shows the concepts behind the circuit breaker pattern. Although
it is easy to understand, as it matches a real-life example, a “production
ready” circuit breaker is far more complex and should include additional
functionality such as:

  1. Logging
  2. Manual Override
  3. Concurrency support

A perfect example to follow is Netflix Hystrix.
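Of the items listed above, concurrency support is the easiest to illustrate. The counters in this article are plain fields, so calling the breaker from several goroutines at once would race. One possible approach, sketched here with hypothetical names (`SafeCounters`, `recordConcurrently`), is to guard them with a `sync.Mutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounters is a hypothetical, mutex-guarded version of the two counters
// kept by the circuit.
type SafeCounters struct {
	mu       sync.Mutex
	failures uint32
	retries  uint32
}

// IncrementFailures records one failed or blocked call.
func (s *SafeCounters) IncrementFailures() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.failures++
	s.retries++
}

// Reset clears both counters, as closing the circuit would.
func (s *SafeCounters) Reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.failures = 0
	s.retries = 0
}

// Failures returns the current failure count.
func (s *SafeCounters) Failures() uint32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.failures
}

// recordConcurrently (hypothetical) records n failures from n goroutines and
// waits for all of them to finish.
func recordConcurrently(s *SafeCounters, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.IncrementFailures()
		}()
	}
	wg.Wait()
}

func main() {
	var counters SafeCounters
	recordConcurrently(&counters, 100)
	fmt.Println(counters.Failures()) // prints 100
}
```

A real implementation would also need the check-and-reset in the retry logic to happen under the same lock, so that only one goroutine is granted the retry.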