2016

Kafka — Accelerating!

Quick tips and insights on how to make Apache Kafka
work faster!

Hardware

  • CPU doesn’t matter that much.
  • Memory helps a lot (a lot) with performance, since Kafka leans heavily on
    the file system cache.
  • SSDs are not required, since most operations are sequential reads and writes.
  • If possible, run on bare metal.

Linux

  • Tune the virtual memory settings, for example:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 80
vm.swappiness = 1
  • Assuming you are using ext4, don’t waste space with reserved blocks:
tune2fs -m 0 -i 0 -c -1 /dev/device
  • Mount with noatime:
/dev/device       /mountpoint       ext4    defaults,noatime
  • Keep an eye on the number of free inodes:
tune2fs -l /dev/device | grep -i inode
  • Increase limits, for example, using systemd:
$ cat /etc/systemd/system/kafka.service.d/limits.conf
[Service]
LimitNOFILE=10000
  • Tweak your network settings, for example:
net.core.somaxconn = 1024
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
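
These kernel settings can be persisted so they survive a reboot; a minimal
sketch, assuming an arbitrary file name:

# Save the vm.* and net.* settings above in /etc/sysctl.d/99-kafka.conf
# (the file name is arbitrary), then load them without rebooting:
sysctl --system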

Kafka

  • log.dirs accepts a comma-separated list of disks and will distribute
    partitions across them; however:
  1. It doesn’t rebalance: some disks could be full while others are empty.
  2. It doesn’t tolerate any disk failure; more info in KIP-18.
  3. RAID 10 is probably the best middle ground between performance and
    reliability.
  • num.io.threads: the number of I/O threads that the server uses for executing
    requests. You should have at least as many threads as you have disks.
  • num.network.threads: the number of network threads that the server uses for
    handling network requests. Increase it based on the number of
    producers/consumers and the replication factor.
  • Use Java 1.8 and the G1 garbage collector:
-XX:MetaspaceSize=96m
-XX:+UseG1GC            # use G1
-XX:MaxGCPauseMillis=20 # target max GC pause
-XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M
-XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
  • KAFKA_HEAP_OPTS: a 5–8 GB heap should be enough for most deployments; the
    file system cache is way more important. LinkedIn runs a 5 GB heap on
    servers with 32 GB of RAM (see the example after this list).
  • pcstat can help you understand how well the system is caching:
./pcstat /kafka/data/*
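
As a sketch of how the heap and GC flags above can be wired in, Kafka’s startup
scripts read them from environment variables (the sizes here are illustrative,
not a recommendation):

# Illustrative sizes; tune to your hardware.
export KAFKA_HEAP_OPTS="-Xms5g -Xmx5g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M"
bin/kafka-server-start.sh config/server.properties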

Any comments or suggestions are welcome!

Stability Patterns — Circuit Breaker

A circuit breaker is an automatically operated electrical switch designed to
protect an electrical circuit from damage caused by overcurrent, overload, or a
short circuit. Its basic function is to interrupt current flow after protective
relays detect a fault. A circuit breaker can be reset (either manually or
automatically) to resume normal operation.

The software analogue, as described in Release It! chapter 5.2, can prevent
repeated calls to a failing service by detecting issues and providing a
fallback; by using this pattern it is possible to avoid cascading failures.
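
A minimal sketch of the pattern in shell, wrapping a hypothetical health check
(the endpoint, thresholds, and state file below are all arbitrary choices):

#!/usr/bin/env bash
# Circuit-breaker sketch: after THRESHOLD consecutive failures the circuit
# "opens" and callers fail fast until OPEN_FOR seconds have passed.
STATE=/tmp/breaker.failures
THRESHOLD=5
OPEN_FOR=30

failures=$(cat "$STATE" 2>/dev/null || echo 0)
if [ "$failures" -ge "$THRESHOLD" ]; then
    age=$(( $(date +%s) - $(stat -c %Y "$STATE") ))
    if [ "$age" -lt "$OPEN_FOR" ]; then
        echo "circuit open, using fallback" >&2
        exit 1   # the caller falls back to a cache, a default, etc.
    fi
    # State file is old enough: fall through and allow one trial call
    # (the "half-open" state).
fi

if curl -sf https://service.example.com/health > /dev/null; then  # hypothetical service
    echo 0 > "$STATE"                  # success closes the circuit
else
    echo $((failures + 1)) > "$STATE"  # record another failure
fi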

Ansible vs PCI

If you need to comply with PCI requirements like:

Requirement 2: Maintain an inventory of system components in scope for PCI DSS
to support effective scoping practices.

You will find that using public-key authentication is sometimes forbidden, as
it’s almost impossible to ensure employees are rotating their keys, keeping the
private key safe, and protecting it with a strong password.

Using Ansible without SSH key-based authentication is painful if you need to
run a playbook against hundreds of servers, as you will need to enter your
password ad nauseam.
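
One workaround, sketched below (the playbook and inventory names are
hypothetical), is letting Ansible prompt for the passwords once per run:

# Prompts once per playbook run for the SSH and privilege-escalation passwords:
ansible-playbook -i inventory/production site.yml --ask-pass --ask-become-pass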

Docker: Configuration files

Things no one tells you about.

One of Docker’s killer features is environment parity, yet it feels like one
little detail was left untold: how to handle configuration files.

Unless you are using the same configuration across development, quality,
production, etc., you will end up with different endpoints, API keys, secret
tokens, and feature switches for each environment.

Available Options

There are a couple of different ways to handle configuration in Docker. Below,
you will find a non-exhaustive list.
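
For instance, one common option is injecting settings as environment variables
at run time; a sketch, where the image name and variables are hypothetical:

# Hypothetical image and variables:
docker run \
    -e API_ENDPOINT=https://api.example.com \
    -e FEATURE_X_ENABLED=true \
    myapp:latest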

Collectd plugins made easy

Collectd is a Unix daemon that collects, transfers, and stores performance
data from computers and network equipment. The acquired data is meant to help
system administrators maintain an overview of available resources to detect
existing or looming bottlenecks.

Collectd provides a long list of plugins available out of the box. However, if
you need to collect additional metrics, one of the easiest ways to do so is
using the exec plugin.

To use the exec plugin, create a Collectd configuration file:
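
A minimal sketch (the user, script path, and metric below are examples, not
fixed names):

LoadPlugin exec
<Plugin exec>
    # The exec plugin refuses to run scripts as root; use an unprivileged user.
    Exec "nobody:nogroup" "/usr/local/bin/my-metric.sh"
</Plugin>

The script itself loops forever and prints values in the PUTVAL format that
Collectd understands:

#!/usr/bin/env bash
# Collectd exports these variables to exec scripts.
HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
while sleep "$INTERVAL"; do
    value=$(wc -l < /var/spool/myapp/queue)   # hypothetical metric source
    echo "PUTVAL \"$HOSTNAME/exec-myapp/gauge-queue_size\" interval=$INTERVAL N:$value"
done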

Chef Custom Resources

Lately I’ve been writing some Chef code. One of the best things about Chef
is custom resources:
https://docs.chef.io/custom_resources.html

Let’s see an example of how to create a Kafka topic using Chef and how to make
it idempotent.

Before writing any Chef code, it is important to understand how to manage a
topic (a grouping of messages of a similar type).

From the Kafka install directory, first check if the topic already exists:

bin/kafka-topics.sh \
    --zookeeper localhost:2181 \
    --describe \
    --topic <name>

If it doesn’t exist, create it:
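
A sketch of the create call (the partition and replication counts here are
illustrative; pick values that fit your cluster):

# Partition/replication counts below are illustrative.
bin/kafka-topics.sh \
    --zookeeper localhost:2181 \
    --create \
    --topic <name> \
    --partitions 3 \
    --replication-factor 2

Guarding this create with the describe check above is what makes the Chef
custom resource idempotent: the topic is only created when it does not already
exist.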