I want clean URLs. And by clean I mean URLs without noise: URLs that contain the fewest characters needed to describe the resource they point to. Just as I want URLs without ports, I want URLs without a trailing slash and without a file extension!
Ok, I admit it, folder and folder/ are different things, on the technical side. file and file.html are also different, again on the technical side. But the technical side is an implementation detail. As a user, especially a non-technical one, I just want my pretty web page. I don’t care if it’s a static .html file or if it was generated by a .php script. I just want the content, I want it to look slick, and the URL is a part of that.
But, of course, from a technical perspective, we need to take care of some things, exactly because variants such as file and file.html are different. This has implications for SEO: search engines may treat each variant as a separate page. We need to make sure all valid paths to a resource are rewritten to the one and only URL.
Let me clear up another thing: I’m solely talking about serving HTML content to a browser. If it’s a PDF or an image, it should, of course, have an extension. Scripts, stylesheets and other assets have a right to their extensions, too. This is just about HTML pages.
On we go to the configuration part…
First, we tell nginx which files are suitable candidates to serve an incoming request:
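A minimal sketch of such a candidate list, assuming a static site; the host name and document root are placeholders, not taken from the original setup:

```nginx
server {
    listen 80;
    server_name example.com;      # placeholder host name
    root /var/www/example;        # placeholder document root

    location / {
        # For a request to /request, try in order:
        #   1. the exact path (covers assets such as /style.css)
        #   2. /request.html
        #   3. /request/index.html
        # and answer 404 if none of them exists.
        try_files $uri $uri.html $uri/index.html =404;
    }
}
```

The exact path comes first so that images, scripts and stylesheets keep working under their full names.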
Then, we redirect URLs with an extension or trailing slash to clean ones:
Each of the three rules redirects one noisy variant of a URL to its clean form.
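One way such redirects could look, as a sketch rather than a definitive rule set. The three if blocks are assumed to sit inside the server block. They match against $request_uri, the URI exactly as the client sent it, so nginx’s own internal redirects cannot trigger them. The patterns and their order are assumptions.

```nginx
# Inside the server { ... } block.

# 1. /folder/index.html or /folder/index -> /folder/
#    (rule 3 then strips the slash); /index.html -> /
if ($request_uri ~ ^([^?]*)/index(?:\.html)?(\?.*)?$) {
    return 301 $1/$2;
}

# 2. /request.html -> /request
if ($request_uri ~ ^([^?]+)\.html(\?.*)?$) {
    return 301 $1$2;
}

# 3. /request/ -> /request (the bare root "/" is left alone)
if ($request_uri ~ ^([^?]+)/(\?.*)?$) {
    return 301 $1$2;
}
```

The second capture group carries the query string along, so /request.html?a=b ends up at /request?a=b. A nested index file takes two hops (rule 1, then rule 3), which keeps the patterns short at the cost of one extra redirect.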
And that’s it already!
Apache works along the same lines. First, to serve a request, we issue an internal redirect to the corresponding file. For an incoming request: if request.html exists, serve it. If it doesn’t, check the directory variant request/index.html and serve that, if it exists. That covers the basic case, where the request already has a clean URL.
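A minimal sketch of these two internal rewrites, assuming an .htaccess file in the document root and mod_rewrite being enabled. The patterns are illustrative and not necessarily the original ones.

```apache
RewriteEngine On

# /request -> request.html, if that file exists.
# The pattern deliberately skips paths that end in a slash.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.*[^/])$ $1.html [L]

# Otherwise /request -> request/index.html, if that file exists.
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule ^(.*[^/])$ $1/index.html [L]
```

Neither rule sets the R flag, so the browser never sees the rewritten path; the address bar keeps showing the clean URL.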
Next, we want to redirect URLs with a trailing slash or extension to clean ones:
The first rule redirects request/ to request. The second one redirects request.html to request. Take note of the RewriteCond. It makes sure the redirect is only applied if we are not coming from one of the internal redirects above. Not checking that would lead to an infinite redirection loop. For example: request is internally redirected to request.html, for which the external redirect to request would then be applied, which is internally redirected to request.html again, which is externally redirected again, and so on.
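The two redirects and their guard might be sketched as follows, again assuming an .htaccess file in the document root. One common way to write the guard is to test the REDIRECT_STATUS environment variable, which Apache sets once a request has been internally redirected. Whether this matches the original condition is an assumption.

```apache
# /request/ -> /request
# REDIRECT_STATUS is empty only for the original client request.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.+)/$ /$1 [R=301,L]

# /request.html -> /request
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
```

A RewriteCond applies only to the RewriteRule that directly follows it, which is why the guard appears before each rule.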
And that does what we intended to achieve. Well, almost…
By default, Apache’s mod_dir automatically appends a trailing slash to directory requests. That is, a request to folder is redirected to folder/. With our rewrite rules, that again leads to an infinite redirection loop. We need to turn that off:
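In configuration terms this is mod_dir’s DirectorySlash directive. A short sketch follows; the Options line is an optional precaution added here, not part of the original setup.

```apache
# Stop mod_dir from redirecting /folder to /folder/
DirectorySlash Off

# Optional precaution (an addition, not part of the original setup):
# with DirectorySlash Off, a slashless directory request could otherwise
# yield an auto-generated listing if mod_autoindex is active.
Options -Indexes
```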
Now, we are done for good.