URLs Without Trailing Slash or Extension (Nginx, Apache)
I want clean URLs. And by clean I mean URLs without noise: URLs that contain the fewest characters needed to describe the resource they point to. Just as I want URLs without ports, I want URLs without a trailing slash and without a file extension!
Ok, I admit it: folder and folder/ are different things, on the technical side. Likewise, file and file.html are different, again on the technical side. But the technical side is an implementation detail. As a user, especially a non-technical one, I just want my pretty web page. I don't care whether it's a static .html file or was generated by a .php script. I just want the content, I want it to look slick, and the URL is part of that.
But, of course, from a technical perspective we need to take care of some things, precisely because file and file.html are different. This has implications for SEO: if the same page is reachable under several URLs, search engines may treat them as duplicate content. We need to make sure every valid path to a resource is redirected to the one and only URL.
Let me clear up one more thing: I'm talking solely about serving HTML content to a browser. A PDF or an image should, of course, have an extension. Scripts, stylesheets, and other assets have a right to their extensions, too. This is just about HTML pages.
On we go to the configuration part…
Nginx Configuration
First, we tell nginx which files are suitable candidates to serve an incoming request:
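The candidate lookup can be modeled in a few lines of Python. This is a simplified sketch, not nginx's implementation: it assumes a directive along the lines of try_files $uri $uri.html $uri/ =404; with index.html as the index file, and it represents the document root as a hypothetical set of paths. The idea is an ordered list of candidates where the first file that exists wins.

```python
from typing import Optional, Set


def resolve(uri: str, existing_files: Set[str]) -> Optional[str]:
    """Return the file that would answer `uri`, or None (a 404).

    Simplified, hypothetical model of an ordered candidate list like
    `try_files $uri $uri.html $uri/ =404;` with `index index.html;`:
    candidates are tried in order and the first existing file wins.
    """
    stem = uri.rstrip("/")  # "/" -> "", "/blog/" -> "/blog"
    candidates = [uri]  # $uri: the path exactly as requested (e.g. /style.css)
    if stem:
        candidates.append(stem + ".html")  # $uri.html: /about -> /about.html
    candidates.append(stem + "/index.html")  # $uri/ plus the index file
    for candidate in candidates:
        if candidate in existing_files:
            return candidate
    return None


# A hypothetical document root, as a set of paths.
site = {"/index.html", "/about.html", "/blog/index.html", "/style.css"}
print(resolve("/about", site))    # /about.html
print(resolve("/blog", site))     # /blog/index.html
print(resolve("/missing", site))  # None
```

Because the clean URL is tried as a page first and as a directory second, /about is answered by about.html even if a directory of the same name exists.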
Then, we redirect URLs with an extension or trailing slash to clean ones:
The first rule redirects /index.html and /index/ to /. The second one redirects /some/path/index.html and /some/path/index/ to /some/path. The third redirects /some/path/page.html and /some/path/page/ to /some/path/page.
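To check the mapping, the three rules can be written as regular expressions and tried in order, with the first match winning; that mirrors an nginx rewrite with the permanent flag, which returns the 301 redirect immediately. The patterns below are an illustrative assumption that reproduces the examples above, not necessarily the exact expressions of the nginx config, and query strings are ignored.

```python
import re
from typing import Optional

# Hypothetical regex versions of the three redirect rules, tried in order.
REDIRECT_RULES = [
    # /index.html or /index/  ->  /
    (re.compile(r"^/index(?:\.html|/)$"), "/"),
    # /some/path/index.html or /some/path/index/  ->  /some/path
    (re.compile(r"^(/.+)/index(?:\.html|/)$"), r"\1"),
    # /some/path/page.html or /some/path/page/  ->  /some/path/page
    (re.compile(r"^(/.+?)(?:\.html|/)$"), r"\1"),
]


def redirect_target(path: str) -> Optional[str]:
    """Return the clean URL to redirect `path` to, or None if it is already clean."""
    for pattern, replacement in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            return match.expand(replacement)
    return None


print(redirect_target("/index/"))                # /
print(redirect_target("/some/path/index.html"))  # /some/path
print(redirect_target("/some/path/page/"))       # /some/path/page
print(redirect_target("/some/path/page"))        # None
```

The order matters: the more specific index rules have to come before the generic third rule, otherwise /some/path/index.html would only be shortened to /some/path/index.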
And that’s it already!
Apache Configuration
First, to serve a request, we issue an internal redirect to the corresponding .html file:
For an incoming request, if request.html exists, serve it. If it doesn't, check the directory variant request/index.html and serve that if it exists. That covers the basic case, where the request already has a clean URL.
Next, we want to redirect URLs with a trailing slash or extension to clean ones:
The first rule redirects request/index or request/index.html to request. The second one redirects request.html or request/ to request. Take note of the RewriteCond: it makes sure the redirect is only applied if we are not coming from one of the internal redirects above. Without that check, we would get an infinite redirection loop. For example, request is internally redirected to request.html, to which the external redirect to request then applies; request is then internally redirected to request.html again, which is again externally redirected, and so on.
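The loop, and how the condition prevents it, can be traced with a small simulation. It is a hypothetical Python model, not Apache: files is a set of paths under the document root, and redirect_status stands in for the REDIRECT_STATUS environment variable that Apache sets after an internal redirect (assuming the RewriteCond tests something like %{ENV:REDIRECT_STATUS}). The rule order (internal rewrites first, then external redirects) and the regular expressions are assumptions chosen to match the behavior described above.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical external redirect rules, tried in order.
EXTERNAL_RULES = [
    (re.compile(r"^(.*)/index(?:\.html)?$"), r"\1"),  # request/index(.html) -> request
    (re.compile(r"^(.+?)(?:\.html|/)$"), r"\1"),      # request.html, request/ -> request
]


def internal_rewrite(path: str, files: Set[str]) -> Optional[str]:
    """Map a clean URL to request.html, else request/index.html, if either exists."""
    # The root "/" is left to DirectoryIndex and not modeled here.
    if path == "/" or path.endswith("/") or path.endswith(".html"):
        return None
    for candidate in (path + ".html", path + "/index.html"):
        if candidate in files:
            return candidate
    return None


def external_redirect(path: str) -> Optional[str]:
    for pattern, replacement in EXTERNAL_RULES:
        match = pattern.match(path)
        if match:
            return match.expand(replacement) or "/"  # "/index" -> "" -> "/"
    return None


def handle(url: str, files: Set[str], guard: bool) -> Tuple[str, str]:
    """Process one request: ("redirect", url), ("serve", file) or ("404", path)."""
    path, redirect_status = url, ""  # empty on the first pass, like REDIRECT_STATUS
    while True:
        target = internal_rewrite(path, files)
        if target is not None:
            # An internal redirect re-runs the rules with REDIRECT_STATUS set.
            path, redirect_status = target, "200"
            continue
        # guard=True models the RewriteCond: skip external redirects
        # when we arrived here through an internal redirect.
        if not (guard and redirect_status):
            target = external_redirect(path)
            if target is not None:
                return ("redirect", target)
        return ("serve", path) if path in files else ("404", path)


def browse(url: str, files: Set[str], guard: bool, max_hops: int = 5) -> List[str]:
    """Follow external redirects like a browser, giving up after max_hops."""
    chain = [url]
    for _ in range(max_hops):
        kind, value = handle(chain[-1], files, guard)
        if kind != "redirect":
            break
        chain.append(value)
    return chain


files = {"/about.html", "/blog/index.html"}
print(browse("/about.html", files, guard=True))  # ['/about.html', '/about']
print(browse("/about", files, guard=False))      # '/about' repeated: a redirect loop
```

With the guard, /about is served from about.html without any redirect; without it, every response is a redirect back to /about.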
And that does what we intended to achieve. Well, almost…
By default, Apache's mod_dir automatically appends a trailing slash to directory requests. That is, a request to folder is redirected to folder/. Combined with our rewrite rules, that again leads to an infinite redirection loop, so we need to turn it off:
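The two competing redirects can be seen in a deliberately tiny model. In Apache, this mod_dir behavior is controlled by the DirectorySlash directive, which defaults to On; DirectorySlash Off disables it. The Python below is a hypothetical illustration only: is_directory is an assumed callback, and the only rewrite rule modeled is the one that strips a trailing slash.

```python
from typing import Callable, List


def redirect_chain(url: str, is_directory: Callable[[str], bool],
                   directory_slash: bool, max_hops: int = 6) -> List[str]:
    """Follow redirects until no rule applies or max_hops is reached.

    Toy model: with directory_slash on, mod_dir adds a slash to directory
    URLs, while our rewrite rule strips trailing slashes again.
    """
    chain = [url]
    for _ in range(max_hops):
        current = chain[-1]
        if directory_slash and is_directory(current) and not current.endswith("/"):
            chain.append(current + "/")        # mod_dir: folder -> folder/
        elif current != "/" and current.endswith("/"):
            chain.append(current.rstrip("/"))  # our rule: folder/ -> folder
        else:
            break                              # no rule applies: URL is final
    return chain


def is_dir(path: str) -> bool:
    return path.rstrip("/") == "/folder"  # hypothetical: only /folder is a directory


print(redirect_chain("/folder", is_dir, directory_slash=True))   # ping-pongs until max_hops
print(redirect_chain("/folder", is_dir, directory_slash=False))  # ['/folder']
```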
Now, we are done for good.