URLs Without Trailing Slash or Extension (Nginx, Apache)

I want clean URLs. And by clean I mean URLs without noise: URLs that contain the fewest characters needed to describe the resource they point to. Just as I want URLs without ports, I want URLs without a trailing slash and without a file extension!

OK, I admit it: folder and folder/ are different things, on the technical side. file and file.html are also different, again on the technical side. But the technical side is an implementation detail. As a user, especially a non-technical one, I just want my pretty web page. I don’t care whether it’s a static .html file or was generated by a .php script. I just want the content, I want it to look slick, and the URL is part of that.

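But, of course, from a technical perspective we need to take care of some things, precisely because file and file.html are different. This has implications for SEO: if the same page can be reached as /page, /page/ and /page.html, search engines see several URLs with duplicate content. We need to make sure every valid path to a resource is redirected to the one and only canonical URL.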

Let me clear up one more thing: I’m solely talking about serving HTML content to a browser. If it’s a PDF or an image, it should of course have an extension. Scripts, stylesheets and other assets have a right to their extensions, too. This is just about HTML pages.

On we go to the configuration part…

Nginx Configuration

First, we tell nginx which files are candidates for serving an incoming request, in the order it should try them:

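http {
    server {
        # Serve /page from the file page, page.html or page/index.html
        # (relative to root), whichever exists first; otherwise return 404.
        try_files $uri $uri.html $uri/index.html =404;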

Then, we redirect URLs with an extension or trailing slash to clean ones:

        # Redirect (301) to the canonical URL; the first matching rule wins.
        rewrite ^/index(?:\.html|/)?$ / permanent;
        rewrite ^/(.*)/index(?:\.html|/)?$ /$1 permanent;
        rewrite ^/(.*)(?:\.html|/)$ /$1 permanent;
    }
}

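The first rule redirects /index, /index.html and /index/ to /. The second one redirects /some/path/index, /some/path/index.html and /some/path/index/ to /some/path. The third redirects /some/path/page.html and /some/path/page/ to /some/path/page. A rewrite with the permanent flag ends processing and returns the 301 immediately, so the order matters: without the first rule, /index.html would fall through to the third one and be redirected to /index instead of /.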
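To check which URL each rule produces, the three rules can be replayed outside nginx. The following is a minimal Python sketch, assuming Python’s re module treats these patterns the same way nginx’s PCRE does (they only use features both support). The function name canonical_redirect is made up for illustration and is not part of nginx.

```python
import re
from typing import Optional

# The three rewrite rules, in the order nginx evaluates them. A rewrite
# with the "permanent" flag answers with a 301 immediately, so only the
# first matching rule is ever applied.
RULES = [
    (re.compile(r"^/index(?:\.html|/)?$"), "/"),
    (re.compile(r"^/(.*)/index(?:\.html|/)?$"), r"/\1"),
    (re.compile(r"^/(.*)(?:\.html|/)$"), r"/\1"),
]


def canonical_redirect(uri: str) -> Optional[str]:
    """Return the 301 target for uri, or None if uri is already clean."""
    for pattern, replacement in RULES:
        m = pattern.match(uri)
        if m:
            return m.expand(replacement)  # nginx's $1 becomes \1 here
    return None


for uri in ["/index.html", "/some/path/index/", "/some/path/page.html", "/some/path/page"]:
    print(uri, "->", canonical_redirect(uri))
```

A clean URL such as /some/path/page (or / itself) matches none of the rules and is left to try_files.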

And that’s all there is to it!

Apache Configuration

First, to serve a request, we internally rewrite it to the corresponding .html file:

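# configuration block, e.g. in .htaccess
    RewriteEngine On
    RewriteBase /

    # Serve "request" from request.html if that file exists
    RewriteCond %{REQUEST_FILENAME}.html -f
    RewriteRule ^(.*)$ $1.html [L]
    # Otherwise serve it from request/index.html if that file exists
    RewriteCond %{REQUEST_FILENAME}/index.html -f
    RewriteRule ^(.*)$ $1/index.html [L]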

For an incoming request, if request.html exists, serve it. If it doesn’t, check the directory variant request/index.html and serve that if it exists. That covers the basic case, in which the request already has a clean URL.
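The lookup order described above can be modeled in a few lines of Python. This is a toy sketch: the hypothetical files set stands in for the document root (paths relative to it, without a leading slash), and it ignores everything else Apache takes into account, such as directories, symlinks and permissions.

```python
from typing import Optional, Set


def resolve(path: str, files: Set[str]) -> Optional[str]:
    """Return the file served for a clean request path, mirroring the two rules."""
    # Rule 1: RewriteCond %{REQUEST_FILENAME}.html -f
    if f"{path}.html" in files:
        return f"{path}.html"
    # Rule 2: RewriteCond %{REQUEST_FILENAME}/index.html -f
    # (the empty path stands for the document root itself)
    index = f"{path}/index.html" if path else "index.html"
    if index in files:
        return index
    # Neither rule applied: Apache serves the path unchanged, or a 404
    return path if path in files else None


site = {"index.html", "about.html", "blog/index.html", "style.css"}
print(resolve("about", site))      # about.html
print(resolve("blog", site))       # blog/index.html
print(resolve("style.css", site))  # style.css (assets keep their extension)
```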

Next, we want to redirect URLs with a trailing slash or extension to clean ones:

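    # Only redirect the request the client sent (REDIRECT_STATUS is empty),
    # never the result of the internal rewrites above
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^(.*)/index(?:\.html)?$ $1 [R=301,L]
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^(.*)(?:\.html|/)$ $1 [R=301,L]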

The first rule redirects request/index and request/index.html to request. The second one redirects request.html and request/ to request. Take note of the RewriteCond: it makes sure the redirect is only applied if we are not coming from one of the internal rewrites above. After an internal redirect, Apache sets REDIRECT_STATUS, so the pattern ^$, which only matches an empty value, no longer matches. Without that check we would get an infinite redirect loop: request is internally rewritten to request.html, to which the external redirect to request then applies, which is internally rewritten to request.html again, which is redirected externally again, and so on.
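The loop, and how the REDIRECT_STATUS check breaks it, can be reproduced with a small simulation. This is a toy model under stated assumptions: paths are plain strings relative to the document root, files is a hypothetical set of existing files, and the guard parameter (not an Apache setting) switches the RewriteCond on or off. It does not model how Apache maps URLs to the filesystem.

```python
import re
from typing import Optional, Set, Tuple

# The two external redirect rules from the .htaccess, in order.
REDIRECT_RULES = [
    re.compile(r"^(.*)/index(?:\.html)?$"),
    re.compile(r"^(.*)(?:\.html|/)$"),
]


def internal_rewrite(path: str, files: Set[str]) -> Optional[str]:
    """The two internal rewrites: request -> request.html or request/index.html."""
    if f"{path}.html" in files:
        return f"{path}.html"
    index = f"{path}/index.html" if path else "index.html"
    if index in files:
        return index
    return None


def handle(path: str, files: Set[str], guard: bool = True) -> Tuple[int, str]:
    """Run one client request through the rules; return (301, target) or (200, path served)."""
    redirect_status = ""  # empty for the request the client actually sent
    while True:
        rewritten = internal_rewrite(path, files)
        if rewritten is not None:
            # In .htaccess, an internal rewrite makes Apache run the rules again
            # on the new path, this time with REDIRECT_STATUS set.
            path, redirect_status = rewritten, "200"
            continue
        # RewriteCond %{ENV:REDIRECT_STATUS} ^$ (skipped when guard is False)
        if redirect_status == "" or not guard:
            for rule in REDIRECT_RULES:
                m = rule.match(path)
                if m:
                    return 301, m.group(1)
        return 200, path


def fetch(path: str, files: Set[str], guard: bool = True, max_hops: int = 10) -> Optional[str]:
    """Follow 301s like a browser; return the served file, or None if it loops."""
    for _ in range(max_hops):
        status, target = handle(path, files, guard)
        if status == 200:
            return target
        path = target
    return None


files = {"index.html", "about.html", "blog/index.html"}
print(fetch("about.html", files))              # redirected to about, serves about.html
print(fetch("about", files, guard=False))      # None: the redirect loop
```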

And that does what we intended to achieve. Well, almost…

By default, Apache’s mod_dir appends a trailing slash to requests for directories: a request to folder is redirected to folder/. Since our rules strip that slash again, this leads to another infinite redirect loop. We need to turn that behavior off:

    DirectorySlash Off
# end of configuration block in .htaccess
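Why the two redirects cannot coexist is easy to see in a toy model that only looks at them and ignores the internal rewrites entirely. It assumes, for illustration, that mod_dir’s redirect is checked before our rule; directories is a hypothetical set of directory paths, and directory_slash mimics the DirectorySlash setting.

```python
import re
from typing import List, Optional, Set

# The trailing-slash/extension redirect from the .htaccess above.
STRIP = re.compile(r"^(.*)(?:\.html|/)$")


def next_redirect(path: str, directories: Set[str], directory_slash: bool) -> Optional[str]:
    """Return where path would be 301-redirected to, or None if it is served."""
    if directory_slash and path in directories:
        return path + "/"      # mod_dir: "folder" -> "folder/"
    m = STRIP.match(path)
    if m:
        return m.group(1)      # our rule: "folder/" -> "folder"
    return None


def redirect_chain(path: str, directories: Set[str], directory_slash: bool) -> Optional[List[str]]:
    """Follow redirects from path; return the visited URLs, or None on a loop."""
    chain = [path]
    while True:
        target = next_redirect(path, directories, directory_slash)
        if target is None:
            return chain
        if target in chain:
            return None        # a URL came back around: infinite redirect loop
        chain.append(target)
        path = target


print(redirect_chain("folder", {"folder"}, directory_slash=True))   # None
print(redirect_chain("folder/", {"folder"}, directory_slash=False)) # ['folder/', 'folder']
```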

Now, we are done for good.