April 29, 2024

Twemproxy Is A Proxy Developed At Twitter

Partitioning: how to split data among multiple Redis instances.

Partitioning is the process of splitting your data into multiple Redis instances, so that every instance will only contain a subset of your keys. The first part of this document will introduce you to the concept of partitioning, the second part will show you the alternatives for Redis partitioning.
*Why partitioning is useful
Partitioning in Redis serves two main goals:
It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
*Partitioning basics
There are different partitioning criteria. Imagine we have four Redis instances R0, R1, R2, R3, and many keys representing users like user:1, user:2, and so forth. There are different ways to select in which instance we store a given key; in other words, there are different systems to map a given key to a given Redis server.
One of the simplest ways to perform partitioning is range partitioning, accomplished by mapping ranges of objects into specific Redis instances. For example, I could say users from ID 0 to ID 10000 will go into instance R0, while users from ID 10001 to ID 20000 will go into instance R1, and so forth.
This system works and is actually used in practice; however, it has the disadvantage of requiring a table that maps ranges to instances. The table needs to be managed, and a separate table is needed for every kind of object, so range partitioning is often undesirable for Redis compared with the alternatives described below.
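To make the cost concrete, here is a minimal sketch of such a range table in Python; the boundaries and instance names are hypothetical, mirroring the example above:

import bisect

# Hypothetical range table: the upper bound of each user-ID range and
# the Redis instance that owns it. A table like this must be maintained
# by hand (or by tooling) for every kind of object you partition.
RANGE_UPPER_BOUNDS = [10000, 20000, 30000, 40000]
INSTANCES = ["R0", "R1", "R2", "R3"]

def instance_for_user(user_id: int) -> str:
    # Binary search for the first range whose upper bound covers the ID.
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    return INSTANCES[idx]

print(instance_for_user(42))     # R0 (IDs 0..10000)
print(instance_for_user(10001))  # R1 (IDs 10001..20000)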
An alternative to range partitioning is hash partitioning. This scheme works with any key, without requiring a key in the form object_name:<id>, and is as simple as:
Take the key name and use a hash function (e.g., the crc32 hash function) to turn it into a number. For example, if the key is foobar, crc32(foobar) will output something like 93024922.
Use a modulo operation with this number in order to turn it into a number between 0 and 3, so that this number can be mapped to one of my four Redis instances. 93024922 modulo 4 equals 2, so I know my key foobar should be stored into the R2 instance. Note: the modulo operation returns the remainder from a division operation, and is implemented with the % operator in many programming languages.
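The same two steps in Python, using zlib.crc32 as the hash function. This is a minimal sketch; the exact hash value differs from the illustrative number above, but the key-to-instance mapping is just as deterministic:

from zlib import crc32

INSTANCES = ["R0", "R1", "R2", "R3"]

def instance_for_key(key: str) -> str:
    # Hash the key name to a number, then reduce it modulo the number
    # of instances to pick a server deterministically.
    return INSTANCES[crc32(key.encode()) % len(INSTANCES)]

print(instance_for_key("foobar"))  # always the same instance for "foobar"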
There are many other ways to perform partitioning, but with these two examples you should get the idea. One advanced form of hash partitioning is called consistent hashing and is implemented by a few Redis clients and proxies.
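For illustration, here is a toy consistent hashing ring in Python; it is a sketch of the general technique, not the code of any particular client or proxy. Each node is hashed to many points on a circle, and a key belongs to the first node found clockwise from the key's own hash, so adding or removing one node only remaps the keys on that node's arcs:

import bisect
from hashlib import md5

class HashRing:
    """Minimal consistent-hash ring (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas        # virtual points per node
        self._ring = []                 # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def _hash(self, value: str) -> int:
        return int(md5(value.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place `replicas` virtual points for this node on the circle.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's hash, wrapping around.
        point = self._hash(key)
        idx = bisect.bisect(self._ring, (point, "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["R0", "R1", "R2", "R3"])
owner = ring.node_for("user:1000")
ring.remove(owner)  # only keys that lived on the removed node move
print(owner, "->", ring.node_for("user:1000"))

With plain modulo hashing, changing the number of instances remaps almost every key; with the ring, only roughly 1/N of the keys move, which is what makes consistent hashing attractive for caches, as discussed below.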
*Different implementations of partitioning
Partitioning can be the responsibility of different parts of a software stack.
Client side partitioning means that the clients directly select the right node to write to or read from for a given key. Many Redis clients implement client side partitioning.
Proxy assisted partitioning means that our clients send requests to a proxy that is able to speak the Redis protocol, instead of sending requests directly to the right Redis instance. The proxy will make sure to forward our request to the right Redis instance according to the configured partitioning schema, and will send the replies back to the client. The Redis and Memcached proxy Twemproxy implements proxy assisted partitioning.
Query routing means that you can send your query to a random instance, and the instance will make sure to forward your query to the right node. Redis Cluster implements a hybrid form of query routing, with the help of the client (the request is not directly forwarded from one Redis instance to another; instead the client gets redirected to the right node).
*Disadvantages of partitioning
Some features of Redis don’t play very well with partitioning:
Operations involving multiple keys are usually not supported. For instance you can’t perform the intersection between two sets if they are stored in keys that are mapped to different Redis instances (actually there are ways to do this, but not directly).
Redis transactions involving multiple keys cannot be used.
The partitioning granularity is the key, so it is not possible to shard a dataset with a single huge key like a very big sorted set.
When partitioning is used, data handling is more complex, for instance you have to handle multiple RDB / AOF files, and to make a backup of your data you need to aggregate the persistence files from multiple instances and hosts.
Adding and removing capacity can be complex. For instance Redis Cluster supports mostly transparent rebalancing of data with the ability to add and remove nodes at runtime, but other systems like client side partitioning and proxies don’t support this feature. However a technique called Pre-sharding helps in this regard.
*Data store or cache?
Although partitioning in Redis is conceptually the same whether using Redis as a data store or as a cache, there is a significant limitation when using it as a data store. When Redis is used as a data store, a given key must always map to the same Redis instance. When Redis is used as a cache, if a given node is unavailable it is not a big problem if a different node is used, altering the key-instance map as we wish to improve the availability of the system (that is, the ability of the system to reply to our queries).
Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly if you add a new node, part of the new keys will start to be stored on the new node.
The main concept here is the following:
If Redis is used as a cache, scaling up and down using consistent hashing is easy.
If Redis is used as a store, a fixed keys-to-nodes map is used, so the number of nodes must be fixed and cannot vary. Otherwise, a system is needed that is able to rebalance keys between nodes when nodes are added or removed, and currently only Redis Cluster is able to do this – Redis Cluster is generally available and production-ready as of April 1st, 2015.
*Presharding
We learned that a problem with partitioning is that, unless we are using Redis as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed keys-to-instances map.
However, data storage needs may vary over time. Today I can live with 10 Redis nodes (instances), but tomorrow I may need 50 nodes.
Since Redis has an extremely small footprint and is lightweight (a spare instance uses 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the start. Even if you start with just one server, you can decide to live in a distributed world from day one, and run multiple Redis instances in your single server, using partitioning.
And you can select this number of instances to be quite big from the start. For example, 32 or 64 instances could do the trick for most users, and will provide enough room for growth.
In this way as your data storage needs increase and you need more Redis servers, what you do is simply move instances from one server to another. Once you add the first additional server, you will need to move half of the Redis instances from the first server to the second, and so forth.
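A sketch of what this looks like in practice, with a fixed key-to-instance mapping and a movable instance-to-host placement; the host names and ports here are hypothetical:

from zlib import crc32

# Fixed forever: 64 preshard instances. A key always maps to the same
# instance index, no matter which physical host the instance lives on.
N_INSTANCES = 64

# Only this placement table changes when you add hardware. Day one,
# all 64 instances run on one box, on consecutive ports.
PLACEMENT = {i: ("host-a", 6379 + i) for i in range(N_INSTANCES)}
# After adding a second server, you would move half the instances, e.g.:
# PLACEMENT.update({i: ("host-b", 6379 + i) for i in range(32, 64)})

def address_for_key(key: str) -> tuple:
    instance = crc32(key.encode()) % N_INSTANCES
    return PLACEMENT[instance]

print(address_for_key("user:1000"))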
Using Redis replication, you will likely be able to do the move with minimal or no downtime for your users (a code sketch follows the list):
Start empty instances in your new server.
Move your data by configuring these new instances as replicas of your source instances.
Stop your clients.
Update the configuration of the moved instances with the new server IP address.
Send the REPLICAOF NO ONE command to the replicas in the new server.
Restart your clients with the new updated configuration.
Finally shut down the no longer used instances in the old server.
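For illustration, here is a hedged sketch of steps 2 and 5 using the redis-py client. The host addresses are hypothetical, and in a real migration you would verify that the initial synchronization has completed before cutting clients over:

import redis

# Hypothetical addresses: the source instance and its replacement.
OLD_HOST, OLD_PORT = "10.0.0.1", 6379
NEW_HOST, NEW_PORT = "10.0.0.2", 6379

new = redis.Redis(host=NEW_HOST, port=NEW_PORT)

# Step 2: make the empty new instance replicate the source instance.
new.execute_command("REPLICAOF", OLD_HOST, OLD_PORT)

# ... wait for the initial sync to finish, stop your clients, and
# repoint their configuration at NEW_HOST:NEW_PORT ...

# Step 5: promote the replica to a standalone master.
new.execute_command("REPLICAOF", "NO", "ONE")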
So far we covered Redis partitioning in theory, but what about practice? What system should you use?
*Redis Cluster
Redis Cluster is the preferred way to get automatic sharding and high availability.
It is generally available and production-ready as of April 1st, 2015.
You can get more information about Redis Cluster in the Cluster tutorial.
If a Redis Cluster compliant client is available for your language, Redis Cluster is the de facto standard for Redis partitioning.
Redis Cluster is a mix between query routing and client side partitioning.
*Twemproxy
Twemproxy is a proxy developed at Twitter for the Memcached ASCII and the Redis protocol. It is single threaded, written in C, and extremely fast. It is open source software released under the terms of the Apache 2.0 license.
Twemproxy supports automatic partitioning among multiple Redis instances, with optional node ejection if a node is not available (this will change the keys-instances map, so you should use this feature only if you are using Redis as a cache).
It is not a single point of failure since you can start multiple proxies and instruct your clients to connect to the first that accepts the connection.
Basically Twemproxy is an intermediate layer between clients and Redis instances, that will reliably handle partitioning for us with minimal additional complexities.
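Because the proxy speaks the Redis protocol, no special client is required; pointing an ordinary client at the proxy is enough. A minimal sketch with the redis-py client (the address is hypothetical; port 22121 matches the alpha pool in the sample configuration later on this page):

import redis

# Point a standard Redis client at twemproxy instead of a Redis
# instance; the proxy picks the backend server for each key.
r = redis.Redis(host="127.0.0.1", port=22121)
r.set("user:1000", "alice")
print(r.get("user:1000"))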
You can read more about Twemproxy in this antirez blog post.
*Clients supporting consistent hashing
An alternative to Twemproxy is to use a client that implements client side partitioning via consistent hashing or other similar algorithms. There are multiple Redis clients with support for consistent hashing, notably Redis-rb, Predis and Jedis.
Please check the full list of Redis clients to see whether there is a mature client with a consistent hashing implementation for your language.

twitter/twemproxy: A fast, light-weight proxy for … – GitHub

twemproxy (pronounced “two-em-proxy”), aka nutcracker, is a fast and lightweight proxy for the memcached and redis protocols. It was built primarily to reduce the number of connections to the caching servers on the backend. This, together with protocol pipelining and sharding, enables you to horizontally scale your distributed caching architecture.
Build
To build twemproxy 0.5.0+ from distribution tarball:
$ ./configure
$ make
$ sudo make install
To build twemproxy 0.5.0+ from distribution tarball in debug mode:
$ CFLAGS="-ggdb3 -O0" ./configure --enable-debug=full
To build twemproxy from source with debug logs enabled and assertions enabled:
$ git clone https://github.com/twitter/twemproxy.git
$ cd twemproxy
$ autoreconf -fvi
$ ./configure --enable-debug=full
$ make
$ src/nutcracker -h
A quick checklist:
Use a newer version of gcc (older versions of gcc have problems)
Use CFLAGS="-O1" ./configure && make
Use CFLAGS="-O3 -fno-strict-aliasing" ./configure && make
autoreconf -fvi && ./configure needs automake and libtool to be installed
make check will run unit tests.
Older Releases
Distribution tarballs for older twemproxy releases (<= 0.4.1) can be found on Google Drive. The build steps are the same (./configure; make; sudo make install).
Features
Fast.
Lightweight.
Maintains persistent server connections.
Keeps connection count on the backend caching servers low.
Enables pipelining of requests and responses.
Supports proxying to multiple servers.
Supports multiple server pools simultaneously.
Shard data automatically across multiple servers.
Implements the complete memcached ascii and redis protocol.
Easy configuration of server pools through a YAML file.
Supports multiple hashing modes including consistent hashing and distribution.
Can be configured to disable nodes on failures.
Observability via stats exposed on the stats monitoring port.
Works with Linux, *BSD, OS X and SmartOS (Solaris).
Help
Usage: nutcracker [-?hVdDt] [-v verbosity level] [-o output file]
                  [-c conf file] [-s stats port] [-a stats addr]
                  [-i stats interval] [-p pid file] [-m mbuf size]
Options:
-h, --help: this help
-V, --version: show version and exit
-t, --test-conf: test configuration for syntax errors and exit
-d, --daemonize: run as a daemon
-D, --describe-stats: print stats description and exit
-v, --verbose=N: set logging level (default: 5, min: 0, max: 11)
-o, --output=S: set logging file (default: stderr)
-c, --conf-file=S: set configuration file (default: conf/nutcracker.yml)
-s, --stats-port=N: set stats monitoring port (default: 22222)
-a, --stats-addr=S: set stats monitoring ip (default: 0.0.0.0)
-i, --stats-interval=N: set stats aggregation interval in msec (default: 30000 msec)
-p, --pid-file=S: set pid file (default: off)
-m, --mbuf-size=N: set size of mbuf chunk in bytes (default: 16384 bytes)
Zero Copy
In twemproxy, all the memory for incoming requests and outgoing responses is allocated in mbuf. Mbuf enables zero-copy because the same buffer on which a request was received from the client is used for forwarding it to the server. Similarly, the same mbuf on which a response was received from the server is used for forwarding it to the client.
Furthermore, memory for mbufs is managed using a reuse pool. This means that once an mbuf is allocated, it is not deallocated, but just put back into the reuse pool. By default each mbuf chunk is set to 16K bytes in size. There is a trade-off between the mbuf size and the number of concurrent connections twemproxy can support. A large mbuf size reduces the number of read syscalls made by twemproxy when reading requests or responses. However, with a large mbuf size, every active connection would use up 16K bytes of buffer, which might be an issue when twemproxy is handling a large number of concurrent connections from clients. When twemproxy is meant to handle a large number of concurrent client connections, you should set the chunk size to a small value like 512 bytes using the -m or --mbuf-size=N argument.
Configuration
Twemproxy can be configured through a YAML file specified by the -c or --conf-file command-line argument on process start. The configuration file is used to specify the server pools and the servers within each pool that twemproxy manages. The configuration file parses and understands the following keys:
listen: The listening address and port (name:port or ip:port) or an absolute path to a sock file (e.g. /var/run/nutcracker.sock) for this server pool.
client_connections: The maximum number of connections allowed from redis clients. Unlimited by default, though OS-imposed limitations will still apply.
hash: The name of the hash function.
Possible values are: one_at_a_time, md5, crc16, crc32 (crc32 implementation compatible with libmemcached), crc32a (correct crc32 implementation as per the spec), fnv1_64, fnv1a_64 (default), fnv1_32, fnv1a_32, hsieh, murmur, jenkins.
hash_tag: A two character string that specifies the part of the key used for hashing, e.g. "{}" or "$$". Hash tags enable mapping different keys to the same server as long as the part of the key within the tag is the same.
distribution: The key distribution mode for choosing backend servers based on the computed hash value. Possible values are: ketama (default, recommended; an implementation of consistent hashing), modula (use hash modulo number of servers to choose the backend), random (choose a random backend for each key of each request).
timeout: The timeout value in msec that we wait for to establish a connection to the server or receive a response from a server. By default, we wait indefinitely.
backlog: The TCP backlog argument. Defaults to 512.
tcpkeepalive: A boolean value that controls if tcp keepalive is enabled for connections to servers. Defaults to false.
preconnect: A boolean value that controls if twemproxy should preconnect to all the servers in this pool on process start. Defaults to false.
redis: A boolean value that controls if a server pool speaks redis or memcached protocol. Defaults to false.
redis_auth: Authenticate to the Redis server on connect.
redis_db: The DB number to use on the pool servers. Defaults to 0. Note: Twemproxy will always present itself to clients as DB 0.
server_connections: The maximum number of connections that can be opened to each server. By default, we open at most 1 server connection.
auto_eject_hosts: A boolean value that controls if a server should be ejected temporarily when it fails consecutively server_failure_limit times. See liveness recommendations for information. Defaults to false.
server_retry_timeout: The timeout value in msec to wait for before retrying on a temporarily ejected server, when auto_eject_hosts is set to true. Defaults to 30000 msec.
server_failure_limit: The number of consecutive failures on a server that would lead to it being temporarily ejected when auto_eject_hosts is set to true. Defaults to 2.
servers: A list of server address, port and weight (name:port:weight or ip:port:weight) for this server pool.
For example, the configuration file in conf/nutcracker.yml, also shown below, configures 5 server pools with the names alpha, beta, gamma, delta and omega. Clients that intend to send requests to one of the 10 servers in pool delta connect to port 22124 on 127.0.0.1. Clients that intend to send requests to one of the 2 servers in pool omega connect to the unix path /tmp/gamma. Requests sent to pools alpha and omega have no timeout and might require timeout functionality to be implemented on the client side. On the other hand, requests sent to pools beta, gamma and delta time out after 400 msec, 400 msec and 100 msec respectively when no response is received from the server. Of the 5 server pools, only pools alpha, gamma and delta are configured to use server ejection and hence are resilient to server failures. All 5 server pools use ketama consistent hashing for key distribution, with the key hasher for pools alpha, beta, gamma and delta set to fnv1a_64, while that for pool omega is set to hsieh. Also, only pool beta uses node names for consistent hashing, while pools alpha, gamma, delta and omega use 'host:port:weight' for consistent hashing.
Finally, only pools alpha and beta can speak the redis protocol, while pools gamma, delta and omega speak the memcached protocol.

alpha:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:
   - 127.0.0.1:6379:1

beta:
  listen: 127.0.0.1:22122
  hash: fnv1a_64
  hash_tag: "{}"
  distribution: ketama
  auto_eject_hosts: false
  timeout: 400
  redis: true
  servers:
   - 127.0.0.1:6380:1 server1
   - 127.0.0.1:6381:1 server2
   - 127.0.0.1:6382:1 server3
   - 127.0.0.1:6383:1 server4

gamma:
  listen: 127.0.0.1:22123
  hash: fnv1a_64
  distribution: ketama
  timeout: 400
  backlog: 1024
  preconnect: true
  auto_eject_hosts: true
  server_failure_limit: 3
  servers:
   - 127.0.0.1:11212:1
   - 127.0.0.1:11213:1

delta:
  listen: 127.0.0.1:22124
  hash: fnv1a_64
  distribution: ketama
  timeout: 100
  auto_eject_hosts: true
  servers:
   - 127.0.0.1:11214:1
   - 127.0.0.1:11215:1
   - 127.0.0.1:11216:1
   - 127.0.0.1:11217:1
   - 127.0.0.1:11218:1
   - 127.0.0.1:11219:1
   - 127.0.0.1:11220:1
   - 127.0.0.1:11221:1
   - 127.0.0.1:11222:1
   - 127.0.0.1:11223:1

omega:
  listen: /tmp/gamma 0666
  hash: hsieh
  distribution: ketama
  auto_eject_hosts: false
  servers:
   - 127.0.0.1:11214:100000
   - 127.0.0.1:11215:1

Finally, to make writing a syntactically correct configuration file easier, twemproxy provides a command-line argument -t or --test-conf that can be used to test the YAML configuration file for syntax errors.
Observability
Observability in twemproxy is through logs and stats. Twemproxy exposes stats at the granularity of server pool and servers per pool through the stats monitoring port by responding with the raw data over TCP. The stats are essentially JSON formatted key-value pairs, with the keys corresponding to counter names. By default stats are exposed on port 22222 and aggregated every 30 seconds. Both these values can be configured on program start using the -s or --stats-port and -i or --stats-interval command-line arguments respectively. You can print the description of all stats exported by using the -D or --describe-stats command-line argument.
$ nutcracker --describe-stats
pool stats:
client_eof "# eof on client connections"
client_err "# errors on client connections"
client_connections "# active client connections"
server_ejects "# times backend server was ejected"
forward_error "# times we encountered a forwarding error"
fragments "# fragments created from a multi-vector request"
server stats:
server_eof "# eof on server connections"
server_err "# errors on server connections"
server_timedout "# timeouts on server connections"
server_connections "# active server connections"
requests "# requests"
request_bytes "total request bytes"
responses "# responses"
response_bytes "total response bytes"
in_queue "# requests in incoming queue"
in_queue_bytes "current request bytes in incoming queue"
out_queue "# requests in outgoing queue"
out_queue_bytes "current request bytes in outgoing queue"
See notes/ for examples of how to read the stats from the stats port.
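Since the stats endpoint replies with a single JSON document over a plain TCP connection, a few lines of Python are enough to read it. A minimal sketch, assuming twemproxy runs locally with the default stats port:

import json
import socket

# Connect to the stats monitoring port (default 22222); twemproxy
# writes one JSON document with the current counters and then closes
# the connection.
with socket.create_connection(("127.0.0.1", 22222)) as sock:
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)

stats = json.loads(b"".join(chunks))
print(sorted(stats))  # global fields plus one entry per server pool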
Logging in twemproxy is only available when twemproxy is built with logging enabled. By default logs are written to stderr. Twemproxy can also be configured to write logs to a specific file through the -o or --output command-line argument. On a running twemproxy, we can turn log levels up and down by sending it SIGTTIN and SIGTTOU signals respectively, and reopen log files by sending it a SIGHUP signal.
Pipelining
Twemproxy enables proxying multiple client connections onto one or few server connections. This architectural setup makes it ideal for pipelining requests and responses and hence saving on the round trip time. For example, if twemproxy is proxying three client connections onto a single server and we get the requests get key\r\n, set key 0 0 3\r\nval\r\n and delete key\r\n on these three connections respectively, twemproxy would try to batch these requests and send them as a single message onto the server connection as get key\r\nset key 0 0 3\r\nval\r\ndelete key\r\n. Pipelining is the reason why twemproxy ends up doing better in terms of throughput even though it introduces an extra hop between the client and server.
Deployment
If you are deploying twemproxy in production, you might consider reading through the recommendation document to understand the parameters you could tune in twemproxy to run it efficiently in the production environment.
Utils
collectd-plugin
munin-plugin
twemproxy-ganglia-module
nagios checks
circonus puppet module
nutcracker-web
redis-twemproxy agent
sensu-metrics
redis-mgr
smitty for twemproxy failover
Beholder, a Python agent for twemproxy failover
chef cookbook
twemsentinel
Companies using Twemproxy in Production
Twitter, Wikimedia, Pinterest, Snapchat, Flickr, Yahoo!, Tumblr, Vine, Wayfair, Kiip, Wanelo, Kontera, Bright, Digg, Gawkermedia, Ooyala, Twitch, Socrata, Hootsuite, Trivago, Machinezone, Path, AOL, Soysuper, Vinted, Poshmark, FanDuel, Bloomreach, Tradesy, Uber (details), Greta
Issues and Support
Have a bug or a question? Please create an issue here on GitHub!
Committers
Manju Rajashekhar (@manju)
Lin Yang (@idning)
Tyson Andre (@TysonAndre)
Thank you to all of our contributors!
License
Copyright 2012 Twitter, Inc.
Licensed under the Apache License, Version 2.0.

Twemproxy: A fast, light-weight proxy for memcached – Twitter …

Today, we’re open sourcing Twemproxy, a fast, light-weight proxy for the memcached protocol. Twemproxy was primarily developed to reduce open connections to our cache servers. At Twitter, we have thousands of front-end application servers, and we run multiple Unicorn instances on each one. Each instance establishes at least one connection to every cache server. As we add front-ends and increase the number of Unicorns per front-end, this connection count continues to increase. As a result the performance of the cache server degrades due to per-connection overhead.
We use Twemproxy to bound the number of connections on the cache servers by deploying it as a local proxy on each of the front-ends. The Unicorns on each front-end connect to the cache servers through the local proxy, which multiplexes and pipelines the front-end requests over a single connection to each cache server. This reduces the number of connections to each cache server by a significant factor.
We have been running Twemproxy in production for six months, during which time we’ve increased the number of front-ends by 30%. Twemproxy has been instrumental in allowing front-end and cache server pools to scale independently as capacity requirements change.
Twemproxy can:
Maintain persistent server connections
Keep connection count on cache servers low
Pipeline requests and responses
Use multiple server pools simultaneously
Proxy the complete memcached ASCII protocol
Easily configure server pools using a YAML file
Use multiple hashing modes, including consistent hashing
Increase observability using stats exposed on a monitoring port
Feel free to use, fork and contribute back to Twemproxy if you can. If you have any questions, feel free to file any issues on GitHub.

Frequently Asked Questions about Twemproxy

What is twemproxy?

Twemproxy is a proxy server that allows you to reduce the number of open connections to your Memcached or Redis server.

Why twemproxy?

Twemproxy was primarily developed to reduce open connections to our cache servers. At Twitter, we have thousands of front-end application servers, and we run multiple Unicorn instances on each one. … This reduces the number of connections to each cache server by a significant factor.

What is Redis proxy?

Redis Cluster Proxy is a proxy for Redis Clusters. Redis has the ability to run in Cluster mode, where a set of Redis instances will take care of failover and partitioning. … MULTI transactions or blocking commands), the multiplexing gets disabled and the client will have its own cluster connection.
