Who would not want a fast service? No matter how good your web service is, if it takes 5 seconds to load a page, people will dislike using it. Even search engines dislike slow servers and rank them lower. Faster is always better. In an article a few months ago we asked which web server is the fastest in the world. The results, combined with other arguments (open source, ease of use, security), led us to choose Nginx as our preferred general-purpose web server for new web services. However, choosing the software is only the first step on the path to blazing fast web services. Here are some tips on how to optimize Nginx for serving static files and dynamic PHP content.

Metrics and validated learning

When trying to improve something, it is essential to find a way to measure it first. How can you be sure there was any improvement if there is no metric to prove it? Trying to improve something without metrics is, most of the time, just random and wasted work, and at best merely the application of old knowledge. Metrics also enable the people doing the measurement to learn from each iteration, perhaps discover something new, and take the world one small step forward.

In the context of web server speed, our primary metric is server response time, that is, the time it takes for the server to start sending content in reply to the visitor's request. A good tool for testing just that is Apache Bench (the command ‘ab’), and as it is rather old and mature, it is available from pretty much any Linux distribution's repository.

In our case there are two distinct scenarios for response time. The simple one is when a static file is requested (e.g. CSS, image, JavaScript) and the web server only has to parse the request URI, fetch the file from the file system and send it away. The second and more complex case is dynamic content: the web server parses the URI, notices that it is meant for a PHP file and passes the request via FastCGI to the PHP processor. The PHP processor can in turn do more complex things, like query a database for information to be included in the response. Finally, when the PHP processor is done, the web server passes the result back to the browser that requested it.
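The decision between the two scenarios can be sketched in a few lines of Python. This is only an illustration assuming a plain suffix check; in a real Nginx configuration the same decision is made by a location block matching \.php$ together with fastcgi_pass, and applications such as WordPress also route "pretty" URLs to index.php, which this sketch does not model. The example paths are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def route(request_uri: str, index: str = "index.php") -> str:
    """Return "fastcgi" for requests that end up at a PHP file, "static" otherwise."""
    path = urlsplit(request_uri).path  # drop the query string, e.g. "?p=42"
    if path.endswith("/"):
        path += index  # a directory request falls back to its index file
    if PurePosixPath(path).suffix == ".php":
        return "fastcgi"  # hand over to PHP, which may in turn query the database
    return "static"  # read the file from disk and send it as-is


print(route("/wp-content/themes/style.css"))  # static
print(route("/index.php?p=42"))  # fastcgi
```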

Using Apache Bench, we benchmarked the server by requesting a single CSS file thousands of times in succession while varying the number of concurrent connections. An example command that downloads a CSS file 8000 times using 100 concurrent connections is:
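Such a run has two knobs: the total number of requests and how many of them are in flight at once (‘ab’ calls them -n and -c). The following Python sketch mimics just those two knobs to make the idea concrete; it is not a replacement for ‘ab’, which is written in C and adds far less overhead per request. The fetch function is swappable so the harness can be exercised without a server.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url: str) -> None:
    """Download the whole response body and discard it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()


def benchmark(url: str, total: int, concurrency: int, fetch=http_get) -> list:
    """Issue `total` requests with at most `concurrency` in flight; return latencies in ms."""

    def timed(_: int) -> float:
        start = time.perf_counter()
        fetch(url)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total)))
```

For example, benchmark("http://localhost/style.css", 8000, 100) corresponds to the 8000-request, 100-connection run described above (the URL is hypothetical).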

The process for any static file is the same, so there is not much point in benchmarking many different static files. In the second scenario, however, the response varies a lot depending on what the PHP code in question executes, so to benchmark it we chose to measure the response time of the main page and our blog page.

This illustrates very well how much heavier serving PHP pages is. With 100 concurrent requests arriving all the time, the server resources are quickly saturated, so we ran another benchmark using only 10 concurrent requests and 80 requests in total:

Comparing the 6 milliseconds for a static file to the average of 300 milliseconds for a PHP page tells us that serving PHP is about 50 times heavier (300 / 6 = 50), which makes it an obvious target for optimization.
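The comparison is simple arithmetic on mean response times. A small helper makes it repeatable for future benchmark runs; the input lists below are made-up values chosen to reproduce the 6 ms and 300 ms averages, not our raw measurements.

```python
from statistics import mean, median


def summarize(latencies_ms):
    """Mean and median of per-request latencies in milliseconds."""
    return {"mean": mean(latencies_ms), "median": median(latencies_ms)}


def relative_cost(dynamic_ms: float, static_ms: float) -> float:
    """How many times heavier the dynamic request is than the static one."""
    return dynamic_ms / static_ms


static = summarize([5, 6, 7])
php = summarize([250, 300, 350])
print(relative_cost(php["mean"], static["mean"]))  # 50.0
```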

Nginx general settings

As the virtual server in question has two CPU cores, the first thing to do was to match the Nginx worker process count to them, which we changed in /etc/nginx/nginx.conf:
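The rule is one worker process per CPU core. As a sketch, the matching directive can be derived from the core count Python reports. Newer Nginx versions also accept "worker_processes auto;", which does the same detection inside Nginx. Note that os.cpu_count() reports the logical CPUs of the system, which inside a container may be more than the process is actually allowed to use.

```python
import os


def worker_processes_line(cores=None) -> str:
    """Build the nginx.conf directive with one worker per CPU core."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() returns None if it cannot tell
    return f"worker_processes {cores};"


print(worker_processes_line(2))  # worker_processes 2;
```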

For visitor statistics we use an external service that tracks the pages mostly via JavaScript, so a separate access log is not needed:

Nginx has the option to cache file descriptors, meaning that if the same file is accessed many times, Nginx is able to fetch it from the file system faster.
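In Nginx this is the open_file_cache directive, whose max= and inactive= parameters bound how many entries are kept and how long an unused entry survives. The Python class below is a simplified sketch of the same idea, not Nginx's implementation: it keeps descriptors open in least-recently-used order and closes those left unused too long. Nginx also revalidates cached entries periodically (open_file_cache_valid), which this sketch leaves out.

```python
import os
import time
from collections import OrderedDict


class OpenFileCache:
    """Keep up to max_entries descriptors open; close ones unused for `inactive` seconds."""

    def __init__(self, max_entries=1000, inactive=20.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.inactive = inactive
        self.clock = clock
        self._entries = OrderedDict()  # path -> (fd, size, last_used)
        self.opens = 0  # real open() calls, to show the saving

    def get(self, path):
        now = self.clock()
        self._expire(now)
        entry = self._entries.pop(path, None)
        if entry is None:  # miss: go to the file system
            fd = os.open(path, os.O_RDONLY)
            self.opens += 1
            size = os.fstat(fd).st_size
        else:  # hit: reuse the already open descriptor
            fd, size, _ = entry
        self._entries[path] = (fd, size, now)  # mark as most recently used
        while len(self._entries) > self.max_entries:
            _, (old_fd, _, _) = self._entries.popitem(last=False)
            os.close(old_fd)
        return fd, size

    def _expire(self, now):
        stale = [p for p, (_, _, used) in self._entries.items() if now - used > self.inactive]
        for p in stale:
            os.close(self._entries.pop(p)[0])

    def close(self):
        for fd, _, _ in self._entries.values():
            os.close(fd)
        self._entries.clear()
```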

All of these optimizations are likely to affect the total request time by just a few milliseconds, so we used the static file benchmark, where small changes are more prominent. Even so, changes on the scale of 5 ms to 4 ms are really tiny:

The option multi_accept makes a worker process accept all new connections at once instead of one at a time:
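With multi_accept off, a worker takes one new connection per wake-up; with it on, the worker drains everything waiting in the listen queue. The Python sketch below contrasts the two on a non-blocking listening socket. It illustrates the concept only and is not how Nginx is implemented internally.

```python
import socket


def accept_all(listener: socket.socket) -> list:
    """multi_accept on: accept every connection already waiting on a non-blocking socket."""
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # listen queue is empty, go back to waiting for events
        accepted.append(conn)
    return accepted


def accept_one(listener: socket.socket) -> list:
    """multi_accept off: at most one connection per wake-up."""
    try:
        conn, _addr = listener.accept()
    except BlockingIOError:
        return []
    return [conn]
```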

A very long keepalive timeout, in turn, makes the server keep all connections open and ready for consecutive requests:
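Keep-alive pays off because a browser can send several requests over one TCP connection instead of opening a new one each time. In Nginx the idle time is controlled by keepalive_timeout, which we assume is the setting meant here. The Python sketch below shows the client side of the idea with the standard library's http.client, sending several requests over a single persistent HTTP/1.1 connection.

```python
import http.client


def get_many(host: str, port: int, paths: list) -> list:
    """GET several paths over one persistent HTTP/1.1 connection and return the bodies."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            bodies.append(resp.read())  # read fully before reusing the connection
    finally:
        conn.close()
    return bodies
```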

Again, the effects are minimal. The jumps between 4000 and 5000 for the two changes reflect the point where the response time is rounded to 5 instead of 4 milliseconds.

Gzip: no compression, on-the-fly compression and pre-compression

It costs some CPU for the web server to compress the output with gzip before sending it to the client, so we might want to disable it; on the other hand, compressed data in transit is such a big benefit. To save the web server from gzipping the same content over and over on a per-request basis, Nginx has an option that makes the server check whether a version of the file ending in .gz exists. If it does, it is sent as the response instead. This enables us to pre-compress static files (but that has to be done with a separate program).
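The Nginx option in question is gzip_static from ngx_http_gzip_static_module, which looks for a file with .gz appended to the requested name. The separate program therefore only has to write those .gz files. A minimal Python sketch of such a program follows; the list of suffixes and the compression level are assumptions to adjust, and it should be rerun whenever the static files change.

```python
import gzip
import os
import shutil

# Assumed set of text-based static file types worth pre-compressing.
SUFFIXES = (".css", ".js", ".html", ".svg", ".txt")


def precompress(root: str, suffixes=SUFFIXES, level: int = 9) -> list:
    """Write name.ext.gz next to each matching file under root; return the files written."""
    written = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(suffixes):
                continue
            src = os.path.join(dirpath, name)
            dst = src + ".gz"
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue  # up to date from an earlier run
            with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
                shutil.copyfileobj(fin, fout)
            shutil.copystat(src, dst)  # same permissions and timestamps as the original
            written.append(dst)
    return written
```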

At the moment we have configured this:

To find out the cost of these options, we benchmarked the situation with either or both of them turned off:

All we can conclude from this is that the differences are so small they are irrelevant. For now we'll keep both options enabled, and later we should run benchmarks with different file types and sizes to determine optimal gzip usage on a larger scale.
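Such a follow-up benchmark could start with something as simple as the Python sketch below, which measures for a given payload how much gzip shrinks it and how much CPU time that takes at a few compression levels. The levels and the sample payload are arbitrary examples; in Nginx the on-the-fly level is set with gzip_comp_level.

```python
import gzip
import time


def gzip_tradeoff(data: bytes, levels=(1, 6, 9), repeat: int = 20) -> dict:
    """Map each gzip level to (compressed size / original size, ms of CPU per compression)."""
    results = {}
    for level in levels:
        start = time.process_time()
        for _ in range(repeat):
            compressed = gzip.compress(data, compresslevel=level)
        cpu_ms = (time.process_time() - start) * 1000.0 / repeat
        results[level] = (len(compressed) / len(data), cpu_ms)
    return results


sample = b"body { margin: 0; padding: 0; font-family: sans-serif; }\n" * 500
for level, (ratio, ms) in gzip_tradeoff(sample).items():
    print(f"level {level}: {ratio:.1%} of original size, {ms:.3f} ms CPU")
```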

Nginx, PHP-FPM, APC, FastCGI and FastCGI cache

Nowadays the only viable option to run PHP on Nginx is via FastCGI using the PHP FastCGI Process Manager. For PHP acceleration there are still some other options too, but APC is officially endorsed by the core PHP developers and will be built-in as of PHP6. For optimization, APC and the FastCGI cache are the most interesting.

Depending on your distribution, you should be able to copy the file /usr/share/doc/php-apc/apc.php (or .gz) to your web server root and then view it to see how your PHP object cache performs:

By default the cache was very small and quickly filled up, leading to a poor cache hit/miss ratio. We first set the cache size to about 1 gigabyte with the option apc.shm_size=1000 in php.ini and then ran some load on the server. This showed that there were about 60 MB of cacheable objects, and the miss ratio was then less than 1%. In the end, setting the cache size to 100 MB was the optimal solution in this case, as we don't want to waste RAM either.
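The hit and miss counters shown by apc.php are enough to compute the miss ratio, and the observed working set gives a lower bound for apc.shm_size. In the Python sketch below, the 1.5x headroom factor is a hypothetical rule of thumb, not something we measured; it merely happens to accept the 100 MB we settled on for a 60 MB working set.

```python
def miss_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that missed."""
    total = hits + misses
    return misses / total if total else 0.0


def cache_size_ok(working_set_mb: float, shm_size_mb: float, headroom: float = 1.5) -> bool:
    """Hypothetical rule of thumb: leave room for the working set to grow."""
    return shm_size_mb >= working_set_mb * headroom


print(miss_ratio(hits=9950, misses=50))  # 0.005
print(cache_size_ok(60, 100))  # True
print(cache_size_ok(60, 64))  # False
```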

FastCGI cache in Nginx

Last, but certainly not least, is the most amazing optimization available for PHP in Nginx: the FastCGI cache. With it enabled, Nginx skips executing PHP altogether if the requested URL has been requested recently and that response contained headers that allowed caching. As our earlier article on web server speed showed, Nginx serves static files faster than e.g. Varnish, and with this built-in proxy feature available, there is no real need to put Varnish in front of Nginx. In fact, Varnish as an extra step would only slow things down and add another point of failure.
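Conceptually the cache sits in front of PHP like the simplified Python model below: a fresh stored response is returned without calling the backend at all, and a response is stored only if its headers allow it. This is not Nginx's code. Nginx keys entries on fastcgi_cache_key rather than only the URL, takes caching times from response headers or from fastcgi_cache_valid (the role default_ttl plays here), and does not cache responses that set cookies unless told to ignore that header.

```python
import time


def cacheable_ttl(headers: dict, default_ttl: float = 0) -> float:
    """Seconds a response may be cached, or 0. Simplified header rules."""
    h = {k.lower(): v for k, v in headers.items()}
    if "set-cookie" in h:
        return 0  # per-visitor response, never share it
    directives = [d.strip() for d in h.get("cache-control", "").lower().split(",")]
    if any(d in ("private", "no-cache", "no-store") for d in directives):
        return 0
    for d in directives:
        if d.startswith("max-age="):
            try:
                return max(0, int(d.split("=", 1)[1]))
            except ValueError:
                return 0
    return default_ttl  # no explicit lifetime from the application


class MicroCache:
    """Return a stored response while fresh; only call the backend (PHP) on a miss."""

    def __init__(self, backend, default_ttl: float = 0, clock=time.monotonic):
        self.backend = backend  # callable: url -> (status, headers, body)
        self.default_ttl = default_ttl
        self.clock = clock
        self._store = {}  # url -> (expires_at, response)

    def get(self, url: str):
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: PHP is not executed at all
        response = self.backend(url)
        ttl = cacheable_ttl(response[1], self.default_ttl)
        if ttl > 0:
            self._store[url] = (now + ttl, response)
        return response
```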

To enable it, first add the following line to the main nginx.conf:

Also make sure that the defined path exists and that the user running the web server has write access to it.
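Two small checks help here. Nginx names each cache file after the MD5 hash of the cache key, and with levels=1:2 the subdirectory names are taken from the end of that hash, so knowing the layout makes it easy to find or delete a cached page. The Python sketch below computes that path and checks that a directory is usable. The key string in the usage line is hypothetical, since the real one depends on your fastcgi_cache_key setting, and os.access only checks the user running the script, so run it as the web server user.

```python
import hashlib
import os


def cache_file_path(cache_root: str, key: str, levels=(1, 2)) -> str:
    """Path of a cached response: md5(key), nested in dirs cut from the end of the hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    parts, end = [], len(digest)
    for width in levels:  # levels=1:2 -> last hex char, then the 2 chars before it
        parts.append(digest[end - width:end])
        end -= width
    return os.path.join(cache_root, *parts, digest)


def cache_dir_usable(path: str) -> bool:
    """True if path is a directory the current user can create files in."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


print(cache_file_path("/var/cache/nginx", "httpGETexample.com/blog/"))
```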

Then enable the cache in the site configuration files with options like these:

The results speak for themselves:

The difference is so big that the cached page speeds are barely visible at the bottom of the graph.

This graph shows that the FastCGI cache scales as well as if Nginx was serving static files.

More optimization

There is still a lot more to optimize. We could tune the network stack parameters of Linux. We could mount the www and cache directories as RAM disks using tmpfs, so that all files would reside in RAM all of the time. Using 32-bit binaries would lower memory usage. Some PHP apps could be precompiled into bytecode. We could fine-tune the settings of PHP-FPM and, most importantly, the settings of the database server that PHP uses to store and retrieve data. We are likely to return to these later, so stay tuned!

All of the components mentioned above constitute the infrastructure part, and any application will benefit from optimized infrastructure, be it WordPress, Drupal, Joomla, Moodle, MediaWiki, Roundcube, Magento, SugarCRM, Kolab Groupware or whatever. Still, it is the application itself that has the biggest influence on its speed and performance. If it generates big outputs, parses and traverses complex structures, makes hundreds of database queries etc., it will stay slow. For the FastCGI cache (or any cache, actually) to work, the application needs to send sane headers with expiration times and set no unnecessary cookies.

In the above example the application is WordPress, and there are some WordPress-specific options. For WordPress there are also plugins available, like W3 Total Cache, which makes the output from WordPress smaller and easier to cache.

Finally, to make sure that your web server stays fast and to spot any sudden changes, use some kind of monitoring solution that loads several subpages of your site at regular intervals. At Seravo we use Zabbix.
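Zabbix covers this and much more. For a quick start, a check could look like the Python sketch below, which loads a list of pages once and reports any that are slower than a threshold or fail. Running it at regular intervals (for example from cron) and alerting on its output is left to the surrounding setup; the URLs and threshold in the usage note are hypothetical.

```python
import time
import urllib.request


def http_get(url: str) -> None:
    """Download the whole page and discard it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def check_pages(urls, threshold_ms: float, fetch=http_get) -> dict:
    """Load each page once; return {url: ms} for slow pages and {url: "error: ..."} for failures."""
    problems = {}
    for url in urls:
        start = time.perf_counter()
        try:
            fetch(url)
        except Exception as exc:  # timeouts, HTTP errors, refused connections
            problems[url] = f"error: {exc}"
            continue
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > threshold_ms:
            problems[url] = round(elapsed_ms, 1)
    return problems
```

For example, check_pages(["https://example.com/", "https://example.com/blog/"], threshold_ms=500) returns an empty dict when every page loads within half a second.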

This content was originally published here.