HAProxy, Varnish and the single hostname website

As explained in a previous article, HAProxy and Varnish are two great open-source software products which aim to improve the performance, resilience and scalability of web applications.
We also saw that they are not competitors. On the contrary, they work well together, each one bringing the other its own features, making any web infrastructure more agile and robust at the same time.

In the current article, I’m going to explain how to use both of them on a web application hosted on a single domain name.

Main advantages of each software


As a reminder, here are the main features of each product.

HAProxy


HAProxy‘s main features:

  • Real load-balancer with smart persistence
  • Request queueing
  • Transparent proxy

Varnish


Varnish‘s main features:

  • Cache server with stale content delivery
  • Content compression
  • Edge Side Includes

Common features


HAProxy and Varnish both have the features below:

  • Content switching
  • URL rewriting
  • DDOS protection

So if we need any of these features, we could use either HAProxy or Varnish.

Why a single domain

In a web application, there are two types of content: static and dynamic.

By dynamic, I mean content which is generated on the fly and which is dedicated to a single user, based on that user’s current browsing of the application. Anything which is not in this category can be considered static, even a page generated by PHP whose content changes every minute or every few seconds (like pages from the CMS WordPress or Drupal). I call these pages “pseudo-static”.

The biggest strength of Varnish is that it can cache static objects, delivering them on behalf of the server, offloading most of the traffic from the server.



An object is identified by its Host header and its URL. When you have a single domain name, you have a single Host header for all your requests: static, pseudo-static or dynamic.

You can’t split your traffic: all requests must arrive on a single type of device: the load-balancer, the cache, etc.

A good practice to split dynamic from static content is to use one domain name per type of object: http://www.domain.tld for dynamic content and static.domain.tld for static content. This way, you could forward dynamic traffic to the load-balancers and static traffic directly to the caches.
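With two hostnames, the split can happen right at the edge. A minimal HAProxy sketch of such Host-based routing (the frontend and backend names are illustrative, not taken from the article's configuration):

```haproxy
# route by Host header: static hostname goes to the cache layer,
# everything else to the application load-balancing backend
frontend ft_split
  bind :80
  acl host_static hdr(host) -i static.domain.tld
  use_backend bk_caches if host_static
  default_backend bk_appsrv
```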



Now, I guess you understand that the naming of the web application’s hosts can have an impact on the platform you’re going to build.

In the current article, I’ll only focus on applications using a single domain name. We’ll see how we can route traffic to the right product despite the limitation of the single domain name.



Don’t worry, I’ll write another article later about the fun we can have when building a platform for an application hosted on multiple domain names.

Available architectures

If we summarize the “web application” as a single brick called “APPSERVER“, we have two main architectures available:

  1. CLIENT ==> HAPROXY ==> VARNISH ==> APPSERVER
  2. CLIENT ==> VARNISH ==> HAPROXY ==> APPSERVER

Pros and cons of HAProxy in front of Varnish


Pros:

  • Use HAProxy‘s smart load-balancing algorithms, such as uri or url_param, to make Varnish caching more efficient and improve the hit rate
  • Make the Varnish layer scalable, since it is load-balanced
  • Protect Varnish during its ramp-up at startup (related to thread pool creation)
  • HAProxy can protect against DDOS and slowloris
  • Varnish can be used as a WAF

Cons:

  • No easy way to do application-layer persistence
  • HAProxy‘s queueing system can hardly protect the application hidden behind Varnish
  • The client IP must be forwarded in the X-Forwarded-For header (or any header you want)

Pros and cons of Varnish in front of HAProxy


Pros:

  • Smart layer 7 persistence with HAProxy
  • HAProxy layer scalable (with persistence preserved) since load-balanced by Varnish
  • APPSERVER protection through HAProxy request queueing
  • Varnish can be used as a WAF
  • HAProxy can use the client IP address (provided by Varnish in an HTTP header) to do transparent proxying (connecting to APPSERVER with the client IP)

Cons:

  • HAProxy can’t protect against DDOS; Varnish has to do it on its own
  • Cache size must be big enough to store all objects
  • Varnish layer not scalable

Finally, which is the best architecture?


There’s no need to choose which of the two architectures above is the least bad for you.

It would be better to build a platform which has none of these negative points.

The Architecture


The diagram below shows the architecture we’re going to work on.
haproxy_varnish
Legend:

  • H: HAProxy load-balancers (could be ALOHA Load-Balancers or any home-made setup)
  • V: Varnish servers
  • S: Web application servers, whatever product is used here (Tomcat, JBoss, etc.)
  • C: Client or end user

Main roles of each layer:

  • HAProxy: Layer 7 traffic routing, first line of protection against DDOS (SYN flood, slowloris, etc.), application request flow optimization
  • Varnish: Caching, compression. Could be used later as a WAF to protect the application
  • Server: hosts the application and the static content
  • Client: browse and use the web application

Traffic flow


Basically, the client sends all its requests to HAProxy; then HAProxy, based on the URL or the file extension, takes a routing decision:

  • If the request looks like it is for a (pseudo-)static object, then forward it to Varnish.
    If Varnish misses the object, it will use HAProxy to get the content from the server.
  • Send all other requests to the appserver. If we’ve done our job properly, there should be only dynamic traffic here.

I don’t want to use Varnish as the default option in the flow, because dynamic content could then be cached, which could lead to one user’s personal information being sent to everybody.

Furthermore, in case of massive misses or requests purposely built to bypass the caches, I don’t want the servers to be hammered by Varnish, so HAProxy protects them with tight traffic regulation between Varnish and the appservers.

Dynamic traffic flow


The diagram below shows how the request requiring dynamic content should be ideally routed through the platform:
haproxy_varnish_dynamic_flow
Legend:

  1. The client sends its request to HAProxy
  2. HAProxy chooses a server based on cookie persistence, or on the load-balancing algorithm if there is no cookie.
    The server processes the request and sends the response back to HAProxy, which forwards it to the client

Static traffic flow


The diagram below shows how the request requiring static content should be ideally routed through the platform:
haproxy_varnish_static_flow

  1. The client sends its request to HAProxy, which sees that it asks for static content
  2. HAProxy forwards the request to Varnish. If Varnish has the object in cache (a HIT), it delivers it directly to HAProxy.
  3. If Varnish doesn’t have the object in cache, or if the cached copy has expired, then Varnish forwards the request to HAProxy
  4. HAProxy randomly chooses a server. The response goes back to the client through Varnish.

In case of a MISS, the flow looks heavy 🙂 I do it that way in order to use HAProxy’s traffic-regulation features to prevent Varnish from flooding the servers. Furthermore, since Varnish sees only static content, its HIT rate is over 98%, so the overhead is very low and the protection is improved.
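To put rough numbers on that claim, here is a back-of-the-envelope sketch. The traffic mix (1000 req/s, 70% static) is an illustrative assumption; only the 98% hit rate comes from the text above:

```python
# Back-of-the-envelope load estimate for the MISS path.
# All inputs below are illustrative assumptions, not measurements.
total_rps = 1000        # requests per second arriving at HAProxy
static_share = 0.70     # fraction of requests routed to the Varnish layer
hit_rate = 0.98         # Varnish HIT rate on (pseudo-)static objects

static_rps = total_rps * static_share         # traffic handled by Varnish
miss_rps = static_rps * (1 - hit_rate)        # MISSes taking the "heavy" path
dynamic_rps = total_rps * (1 - static_share)  # traffic sent straight to appservers

# Load actually reaching the application tier:
print(round(dynamic_rps + miss_rps))  # → 314
```

So out of 1000 req/s, only 14 req/s follow the heavy MISS path, and the app tier sees less than a third of the total traffic.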

Pros of such architecture

  • Use smart load-balancing algorithms such as uri or url_param to make Varnish caching more efficient and improve the hit rate
  • Make the Varnish layer scalable, since load-balanced
  • Startup protection for Varnish and APPSERVER, allowing server reboot or farm expansion even under heavy load
  • HAProxy can protect against DDOS and slowloris
  • Smart layer 7 persistence with HAProxy
  • APPSERVER protection through HAProxy request queueing
  • HAProxy can use the client IP address to do transparent proxying (connecting to APPSERVER with the client IP)
  • Cache farm failure detection and routing to application servers (worst case management)
  • Can load-balance any type of TCP based protocol hosted on APPSERVER

Cons of such architecture


To be totally fair, there are a few “non-blocking” issues:

  • HAProxy layer is hardly scalable (must use 2 crossed Virtual IPs declared in the DNS)
  • Varnish can’t be used as a WAF, since it will only see static traffic passing through. This can be changed very easily, though

Configuration

HAProxy Configuration

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin
  log 10.0.1.10 local3

# default options
defaults
  option http-server-close
  mode http
  log global
  option httplog
  timeout connect 5s
  timeout client 20s
  timeout server 15s
  timeout check 1s
  timeout http-keep-alive 1s
  timeout http-request 10s  # slowloris protection
  default-server inter 3s fall 2 rise 2 slowstart 60s

# HAProxy's stats
listen stats
  bind 10.0.1.3:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

# main frontend dedicated to end users
frontend ft_web
  bind 10.0.0.3:80
  acl static_content path_end .jpg .gif .png .css .js .htm .html
  acl pseudo_static path_end .php ! path_beg /dynamic/
  acl image_php path_beg /images.php
  acl varnish_available nbsrv(bk_varnish_uri) ge 1
  # Caches health detection + routing decision
  use_backend bk_varnish_uri if varnish_available static_content
  use_backend bk_varnish_uri if varnish_available pseudo_static
  use_backend bk_varnish_url_param if varnish_available image_php
  # dynamic content or all caches are unavailable
  default_backend bk_appsrv

# appsrv backend for dynamic content
backend bk_appsrv
  balance roundrobin
  # app servers must say if everything is fine on their side
  # and they can process requests
  option httpchk
  option httpchk GET /appcheck
  http-check expect rstring [oO][kK]
  cookie SERVERID insert indirect nocache
  # Transparent proxying using the client IP from the TCP connection
  source 10.0.1.1 usesrc clientip
  server s1 10.0.1.101:80 cookie s1 check maxconn 250
  server s2 10.0.1.102:80 cookie s2 check maxconn 250

# static backend with balance based on the uri, including the query string
# to avoid caching an object on several caches
backend bk_varnish_uri
  balance uri # in latest HAProxy version, one can add 'whole' keyword
  # Varnish must tell it's ready to accept traffic
  option httpchk HEAD /varnishcheck
  http-check expect status 200
  # client IP information
  option forwardfor
  # avoid request redistribution when the number of caches changes (crash or start up)
  hash-type consistent
  server varnish1 10.0.1.201:80 check maxconn 1000
  server varnish2 10.0.1.202:80 check maxconn 1000

# cache backend with balance based on the value of the URL parameter called "id"
# to avoid caching an object on several caches
backend bk_varnish_url_param
  balance url_param id
  # client IP information
  option forwardfor
  # avoid request redistribution when the number of caches changes (crash or start up)
  hash-type consistent
  server varnish1 10.0.1.201:80 maxconn 1000 track bk_varnish_uri/varnish1
  server varnish2 10.0.1.202:80 maxconn 1000 track bk_varnish_uri/varnish2

# frontend used by Varnish servers when updating their cache
frontend ft_web_static
  bind 10.0.1.3:80
  monitor-uri /haproxycheck
  # Tells Varnish to stop asking for static content when servers are dead
  # Varnish would deliver staled content
  monitor fail if nbsrv(bk_appsrv_static) eq 0
  default_backend bk_appsrv_static

# appsrv backend used by Varnish to update their cache
backend bk_appsrv_static
  balance roundrobin
  # anything different than a status code 200 on the URL /staticcheck.txt
  # must be considered as an error
  option httpchk
  option httpchk HEAD /staticcheck.txt
  http-check expect status 200
  # Transparent proxying using the client IP provided by X-Forwarded-For header
  source 10.0.1.1 usesrc hdr_ip(X-Forwarded-For)
  server s1 10.0.1.101:80 check maxconn 50 slowstart 10s
  server s2 10.0.1.102:80 check maxconn 50 slowstart 10s

Varnish Configuration

backend bk_appsrv_static {
        .host = "10.0.1.3";
        .port = "80";
        .connect_timeout = 3s;
        .first_byte_timeout = 10s;
        .between_bytes_timeout = 5s;
        .probe = {
                .url = "/haproxycheck";
                .expected_response = 200;
                .timeout = 1s;
                .interval = 3s;
                .window = 2;
                .threshold = 2;
                .initial = 2;
        }
}

acl purge {
        "localhost";
}

sub vcl_recv {
### Default options

        # Health Checking
        if (req.url == "/varnishcheck") {
                error 751 "health check OK!";
        }

        # Set default backend
        set req.backend = bk_appsrv_static;

        # grace period (stale content delivery while revalidating)
        set req.grace = 30s;

        # Purge request
        if (req.request == "PURGE") {
                if (!client.ip ~ purge) {
                        error 405 "Not allowed.";
                }
                return (lookup);
        }

        # Accept-Encoding header clean-up
        if (req.http.Accept-Encoding) {
                # use gzip when possible, otherwise use deflate
                if (req.http.Accept-Encoding ~ "gzip") {
                        set req.http.Accept-Encoding = "gzip";
                } elsif (req.http.Accept-Encoding ~ "deflate") {
                        set req.http.Accept-Encoding = "deflate";
                } else {
                        # unknown algorithm, remove accept-encoding header
                        unset req.http.Accept-Encoding;
                }

                # Microsoft Internet Explorer 6 is well know to be buggy with compression and css / js
                if (req.url ~ "\.(css|js)" && req.http.User-Agent ~ "MSIE 6") {
                        remove req.http.Accept-Encoding;
                }
        }

### Per host/application configuration
        # bk_appsrv_static
        # Stale content delivery
        if (req.backend.healthy) {
                set req.grace = 30s;
        } else {
                set req.grace = 1d;
        }

        # Cookie ignored in these static pages
        unset req.http.cookie;

### Common options
         # Static objects are first looked up in the cache
        if (req.url ~ "\.(png|gif|jpg|swf|css|js)(\?.*|)$") {
                return (lookup);
        }

        # if we arrive here, we look for the object in the cache
        return (lookup);
}

sub vcl_hash {
        hash_data(req.url);
        if (req.http.host) {
                hash_data(req.http.host);
        } else {
                hash_data(server.ip);
        }
        return (hash);
}

sub vcl_hit {
        # Purge
        if (req.request == "PURGE") {
                set obj.ttl = 0s;
                error 200 "Purged.";
        }

        return (deliver);
}

sub vcl_miss {
        # Purge
        if (req.request == "PURGE") {
                error 404 "Not in cache.";
        }

        return (fetch);
}

sub vcl_fetch {
        # Stale content delivery
        set beresp.grace = 1d;

        # Hide Server information
        unset beresp.http.Server;

        # Store compressed objects in memory
        # They would be uncompressed on the fly by Varnish if the client doesn't support compression
        if (beresp.http.content-type ~ "(text|application)") {
                set beresp.do_gzip = true;
        }

        # remove any cookie on static or pseudo-static objects
        unset beresp.http.set-cookie;

        return (deliver);
}

sub vcl_deliver {
        unset resp.http.via;
        unset resp.http.x-varnish;

        # could be useful to know if the object was in cache or not
        if (obj.hits > 0) {
                set resp.http.X-Cache = "HIT";
        } else {
                set resp.http.X-Cache = "MISS";
        }

        return (deliver);
}

sub vcl_error {
        # Health check
        if (obj.status == 751) {
                set obj.status = 200;
                return (deliver);
        }
}
  


About Baptiste Assmann

Aloha Product Manager HAProxy consultant
This entry was posted in Aloha, architecture, HAProxy, performance. Bookmark the permalink.

32 Responses to HAProxy, Varnish and the single hostname website

  1. Cyril Bonté says:

    Hi Baptiste,
    there’s a small typo hidden in the sections : the 2nd “Pro and cons of HAProxy in front of Varnish” should be “Pro and cons of Varnish in front of HAProxy” 😉

  2. Hi Baptiste,

    nice useful article as usual. I’d like to mention that when there are more than 2 caches, it’s worth using “hash-type consistent” in the backend and a slowstart
    parameter on server lines. This avoids redistributing all objects across remaining servers. The slowstart also allows the cache to slowly take traffic, leaving some time to get objects and avoid starting with 100% misses.

    • Hi Willy,

      Thank for your comment.
      Note the slowstart is already present in the defaults section, on the default-server line 😉
      Let me update the conf with the hash-type parameter.

      cheers

  3. Pingback: HAProxy and Varnish comparison | Exceliance – Aloha Load Balancer

  4. Thanks Baptiste for this article.
    I’m using similar setup since 1 year now on a big e-commerce website, and it’s works very fine.
    I was using only 1 varnish (active-passive) but hash-type consistent should help me.

    Thanks Again !

  5. Pingback: Application Delivery Controller and ecommerce websites | Exceliance – Aloha Load Balancer

  6. Pingback: Scalable WAF protection with HAProxy and Apache with modsecurity | Exceliance – Aloha Load Balancer

  7. Pingback: high performance WAF platform with Naxsi and HAProxy | Exceliance – Aloha Load Balancer

  8. APZ says:

    Hi Bapsite,
    I need a little help in understanding few things. I am using HAProxy to load balance a bunch of PHP servers and want to introduce Varnish in the scene now.
    HAP sends the request to Varnish iff the app-login cookie is not available. Varnish doesn’t have to do anything here except serve the request (cache HIT) or send it back to HAP in case of a cache MISS; HAP then selects a PHP server, resources are fetched from it and served to the client by HAP through Varnish. I have the following config file.
    I don’t understand 2 things here;

    in cache miss scenario varnish gives request back to HAP which checks for the app-login cookie, doesn’t find it and sends it back to varnish (end less loop situation), I can have Varnish set a cookie and make HAP to check for that and select PHP servers backend on this basis(suggestions welcome).
    Secondly, how can I achieve that when resources are fetched by HAP in case of MISS it sends to varnish and then it is served to client , this way Varnish builds up the cache eventually.
    Please also let me know if some thing critical is missed out here.

    Thanks in advance

    Config File

    #BE for Varnish is HAP in this machine
    backend default {
    .host = "127.0.0.1";
    .port = "80";
    }

    sub vcl_recv {
    # HAP sends request to Varnish iff app-login cookie is not available
    # Varnish doesnt have to do anything here except to serve request(Cache HIT) or
    # send it back to HAP incase of Cache MISS, resouces are then fetched from PHP servers
    # and served to client by HAP through Varnish

    # We unset the cookies here as they dont affect the response
    unset req.http.cookie;
    # Lighttpd is already compressing resources, so we dont do it here.
    return (lookup); # Control is passed to vcl_hit or vcl_miss
    }
    sub vcl_hit {
    return (deliver);
    }

    sub vcl_miss {
    return (fetch);
    }

    sub vcl_fetch {
    set obj.ttl = 1m;
    return (deliver);
    }

    sub vcl_deliver {
    if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
    } else {
    set resp.http.X-Cache = "MISS";
    }
    return (deliver);
    }

    sub vcl_init {
    return (ok);
    }

    sub vcl_fini {
    return (ok);
    }

  9. Hi,

    The response is in your HAProxy configuration: since you know your varnish IPs, you can use an ACL to force routing the traffic to your PHP farm, breaking the loop.
    Once the PHP server answers HAProxy, the response follows the reverse path: it passes through Varnish (so it gets cached), then back through HAProxy to the client.
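    A sketch of such an ACL (the IP addresses are borrowed from the article's configuration and the backend name is illustrative):

```haproxy
frontend ft_web
  bind :80
  # requests coming back from the Varnish servers skip the cache
  # routing rules and go straight to the PHP farm, breaking the loop
  acl from_varnish src 10.0.1.201 10.0.1.202
  use_backend bk_php if from_varnish
```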

    I hope this helps.

    cheers

  10. APZ says:

    Thanks Bapsite, that was certainly helpful, one thing I realized that loop condition I mentioned is not possible as in vcl_recv I unset the cookie so when cache miss happens HAP gets request from Varnish, fetches resources and as you mentioned follows the reverse path and goes to HAP via Varnish and finally to user.
    I tested this condition and HAP logs clearly states that it’s working:

    108.108.108.108:21478 [08/Jan/2013:11:58:16.637] inbound varnish/varnish0 1/0/0/1803/2182 200 45872 – – —- 2/2/0/1/0 0/0 {abc.xxx.com} “GET / HTTP/1.1”

    192.168.1.1:37029 [08/Jan/2013:11:58:16.639] inbound worker/worker0 0/0/0/1796/1802 200 45776 – – –NI 2/2/0/1/0 0/0 {abc.xxx..com} “GET / HTTP/1.1”

    Call comes to HAP, goes to varnish, cache miss happens and worker0 is used to get resources by HAP and the next call Varnish serves everything from cache(varnishlog told me this) and PHP servers are not contacted.

    Thanks again 🙂

  11. APZ says:

    Correction – You were right Baptiste, I had to put a ACL in order to avoid the loop situation and force traffic to PHP Servers in case of Cache MISS, when I posted my earlier comment I was testing the wrong condition, putting a ACL did help me attain the desired result.

    Thanks

  12. Matteo says:

    Hello boys,
    Thanks in advance for your help. I want to know if the HAProxy configuration described in this article can be set up on pfSense with HAProxy. Have a nice day. Regards. Matteo

  13. Matteo says:

    Hello,
    I’ve installated HAproxy on Debian with kernel Linux 2.6.32+.
    When HAproxy starts, I’ve the following error:
    Starting haproxy: haproxy

    [ALERT] 024/152707 (1802) : parsing [/etc/haproxy/haproxy.cfg:54] : ‘usesrc’ not allowed here because support for TPROXY was not compiled in.

    [ALERT] 024/152707 (1802) : parsing [/etc/haproxy/haproxy.cfg:92] : error detected while parsing a ‘monitor fail’ condition.

    [ALERT] 024/152707 (1802) : parsing [/etc/haproxy/haproxy.cfg:104] : ‘usesrc’ not allowed here because support for TPROXY was not compiled in.

    [ALERT] 024/152707 (1802) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg

    [ALERT] 024/152707 (1802) : Fatal errors found in configuration.

    May you help me? Have any suggestions?
    Best regards.
    Matteo

    • Matteo says:

      Hi all,
      With new installation I’ve only one error:
      [ALERT] 024/152707 (1802) : parsing [/etc/haproxy/haproxy.cfg:92] : error detected while parsing a ‘monitor fail’ condition.

      monitor fail if nbsrv(bk_appsrv_static) eq 0

      Thanks. Waiting a feedback.
      Best regards.

  14. dtorgy says:

    I have been playing around with a setup like this:
    L4 -> haproxy (scale group) -> caching servers (scale group) -> back to the haproxy that it came from -> app servers.
    This allows us to not send all traffic through the caching servers because for some paths/domains the haproxy servers behind the L4 load balancer will send traffic directly to the app servers. We can also handle the failure case of the caching servers being down because the app servers are in a “backup backend”.

    I am wanting to implement this same type of setup in our production environments but I am curious if anyone else has played around with a setup like this. Thoughts?

  15. timworking says:

    I know this is way late and you might never come back to this, but your IPs in this config are very confusing. The images at the top show 2 HAProxy servers, but the config only seems to use one. You use 10.0.0.3 as your “frontend ft_web”, what IP is this? It’s not the HAProxy server.

    Also, how are you setting all these check urls? How are you setting them in HAProxy or Varnish? There’s no “web root” that I can figure out on these apps, so how is Varnish checking on the HAProxy server and vice versa?

    • Hi Tim,

      10.0.0.3 is a Virtual IP address hosted by VRRP protocol and shared by both HAProxy servers.
      Note that only one server can own the IP at a time.

      Concerning health checking, it’s in the varnish configuration: you can force varnish to forge a response on specific URLs.

      Baptiste

  16. Mike Sirs says:

    Hello
    do you have any news about configuration for multiple domain names?
    Thank you in advance!

  17. Mike Sirs says:

    hello
    did you write anything new about haproxy + varnish 🙂 ?

  18. Pingback: Scalable WAF protection with HAProxy and Apache with modsecurity | HAProxy Technologies – Aloha Load Balancer

  19. Pingback: high performance WAF platform with Naxsi and HAProxy | HAProxy Technologies – Aloha Load Balancer

  20. Pingback: Application Delivery Controller and ecommerce websites | HAProxy Technologies – Aloha Load Balancer

  21. Thomas Decaux says:

    What about HTTPS endpoint? Since Varnish doesnt talk SSL, I think this is the most important check point to consider, so HAProxy in first!
