Simulated Virtual Hosts with Apache Server

By Randall E. Krause
Last Updated December 13, 2006


I. Introduction.

The following document describes how to use the directives of Apache's mod_rewrite and mod_setenvif modules to simulate virtual hosts on a shared Web-hosting platform. Installation instructions and configuration examples are included with each method.

II. Table of Contents.

  1. Wildcard Subdomain Virtual Hosts
  2. Wildcard Domain Virtual Hosts
  3. Directory-Mapped Virtual Hosts
  4. Customized Error Responses

III. Prerequisites.

This document assumes the following minimum system requirements:

  • Apache Server 1.3 or higher
  • Apache Rewrite Module
  • Apache SetEnvIf Module
  • Apache Includes Module (for the SSI examples)
  • Perl 5.6.1 or higher (for the Perl examples)

A basic knowledge of both mod_rewrite and POSIX-compliant regular expressions is also highly recommended.

The examples have been thoroughly tested under two separate installs of the FreeBSD operating system. Compatibility with Microsoft Windows or with any other variant of UNIX may be limited or non-existent.

IV. Terms of Use.

This document represents ongoing research and development. It is a recommendation and a work-in-progress. It should not be construed as a standard.

Copyright © 2006, Randall E. Krause. Some Rights Reserved.

All software documentation and source-code contained herein may be freely distributed and/or modified under the terms and conditions of The Artistic License as published by Larry Wall, either version 2.0, or any later revision.

http://www.perlfoundation.org/legal/licenses/artistic-2_0.html

V. Troubleshooting.

The examples should work "out of the box," with minimal changes. Be sure to read the Reminder and Caveat notices to avoid common pitfalls. If you continue to experience technical difficulties, ask your hosting provider to enable rewrite logging.

For feature requests, bug reports, or portability issues, do not hesitate to contact me. All contributions are appreciated.



1. Wildcard Subdomain Virtual Hosts

This first method provides a simple, robust means of achieving mass virtual hosting via wildcard subdomains. There is no need for specially-named subdirectories. Additionally, an environment variable is available for later use by dynamically-generated pages.

1.1. Source Code Framework

Complete source code for this method is shown in Figure 1.1. It is imperative that this text be inserted near the top of the .htaccess file located at the document root. Placing it elsewhere may cause undesirable results.

Figure 1.1

Source Code Snippet of /.htaccess

RewriteEngine On
RewriteBase / 

####################################################################
##                                                                ##
## Wildcard Subdomain Virtual Hosts (v2.01)                       ##
## Copyright (c) 2006, Randall E. Krause                          ##
## http://www.perlfoundation.org/legal/licenses/artistic-2_0.html ##
##                                                                ##
####################################################################




The RewriteEngine directive need only appear once in a per-directory configuration file, so duplicate instances can be removed. Also, make sure the RewriteBase is set properly; otherwise, all requests will most likely fail.

1.2. Directory Occlusion Side-Effect

Nominally, if the request URI references a directory path that mirrors the current subdomain's directory path, that directory should still be processed by the rewrite rule.

http://images.example.com/subdomain_images/

This URL, despite its redundancy, is supposed to reference the subdomain_images directory after an internal redirection to the subdomain's subdomain_images directory:

/subdomain_images/subdomain_images/

However, with more commonplace methods, this internal redirect does not occur: the subdomain_images directory is effectively occluded by its parent directory, so the client is directed to

/subdomain_images/

To avoid this side-effect, we employ an environment variable named SUBDOMAIN to prevent unwanted recursion during the rewriting process. This is shown in Figure 1.2.

Figure 1.2:

RewriteCond %{ENV:REDIRECT_SUBDOMAIN} =""

1.3. Strict Input-Field Validation

The Host header can be forged with arbitrary or corrupt data, so it is important to verify that it complies with the HTTP/1.1 specification. If your primary site is on its own IP address, Apache has most likely not validated the Host header at all. We therefore apply strict validation to this input field before proceeding, as shown in Figure 1.3.

Figure 1.3

RewriteCond %{HTTP_HOST} ^(www\.)?([a-z0-9][-a-z0-9]+)\.example\.org\.?(:80)?$ [NC] 

Reminder: You will want to replace "example\.org" with your actual domain name. Be sure to escape each dot with a corresponding backslash.
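The pattern in Figure 1.3 can be exercised outside Apache before it goes live. The Perl sketch below reuses it verbatim, with the /i modifier standing in for mod_rewrite's [NC] flag; the helper name subdomain_of() is hypothetical, and example.org is the same placeholder domain. Note that a bare www.example.org also matches, with "www" captured as the subdomain, which is one reason Figure 1.4 screens out reserved names.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Same pattern as Figure 1.3; /i plays the role of mod_rewrite's [NC].
# "example.org" is a placeholder for your own domain.
my $HostPattern = qr/^(www\.)?([a-z0-9][-a-z0-9]+)\.example\.org\.?(:80)?$/i;

# Hypothetical helper: returns the captured subdomain (%2 in
# mod_rewrite terms), or undef when the Host header is rejected.
sub subdomain_of {
    my ($host) = @_;
    return undef unless defined $host && $host =~ $HostPattern;
    return $2;
}

for my $host ( 'images.example.org', 'WWW.Images.Example.org:80',
               'bad_name.example.org', 'images.example.org:8080' ) {
    my $sub = subdomain_of($host);
    printf( "%-28s => %s\n", $host, defined $sub ? $sub : '(rejected)' );
}
```

Underscores, single-character labels, and any port other than 80 are rejected, while the case of the captured subdomain is preserved exactly as the user-agent sent it.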


1.4. Verifying the Subdomain Directory

It is important to verify that the subdomain has its own directory. If one is not found (or if the subdomain is reserved), the request should immediately default to the primary site. This is shown in Figure 1.4.

Figure 1.4

RewriteCond %2 !^(www|ftp|mail|pop3|localhost)$
RewriteCond %{DOCUMENT_ROOT}/subdomains/%2 -d
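In a regular expression, | has the lowest precedence, so the reserved names in Figure 1.4 must be enclosed in a group, as in ^(www|ftp|mail|pop3|localhost)$. Without the parentheses, ^www|ftp|mail|pop3|localhost$ anchors only its first and last alternatives, so any subdomain beginning with www, ending in localhost, or containing ftp, mail, or pop3 anywhere (such as "email") would be treated as reserved and sent to the primary site. This Perl sketch compares the two forms:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Ungrouped: ^ binds only to "www" and $ only to "localhost",
# so ftp, mail, and pop3 may match anywhere in the name.
my $Ungrouped = qr/^www|ftp|mail|pop3|localhost$/;

# Grouped: the whole name must equal one of the reserved words.
my $Grouped = qr/^(www|ftp|mail|pop3|localhost)$/;

for my $name (qw( www ftp wwwshop myftp email images )) {
    printf( "%-8s ungrouped=%-8s grouped=%s\n", $name,
            ( $name =~ $Ungrouped ? 'reserved' : 'allowed' ),
            ( $name =~ $Grouped   ? 'reserved' : 'allowed' ) );
}
```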

Reminder: If your subdomain directories are located someplace other than the subdomains directory, replace "%{DOCUMENT_ROOT}/subdomains/%2" with the actual path. For example, if they are based directly off of the document root, use "%{DOCUMENT_ROOT}/%2" by itself.

Caveat: On some shared-hosting configurations, the special DOCUMENT_ROOT variable may not be correct until the rewriting phase is complete (for whatever reason). Therefore it may be necessary to refer to ENV:DOCUMENT_ROOT instead. If that still does not work, then as a last resort you can hard-code the path as shown in Figure 1.5.

Figure 1.5

RewriteCond /usr/local/apache/htdocs/subdomains/%2 -d

1.5. Environment Variables for CGI Scripts

Within a CGI script or SSI page, it should be possible to determine whether a page was accessed via a subdomain. Thus, we produce an environment variable as shown in Figure 1.6.

Figure 1.6:

RewriteRule ^(.*) subdomains/%2/$1 [E=SUBDOMAIN:%2,L] 

Reminder: As before, if your subdomain directories are located someplace other than the subdomains directory, replace "subdomains/%2/$1" with the actual path. For example, if they are based directly off of the document root, use "%2/$1" by itself.
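To see how the conditions in Figures 1.2 through 1.6 combine, the following Perl sketch models the rule chain as an ordinary function. map_subdomain() is a hypothetical helper, not part of Apache; it assumes the subdomains/ layout and the example.org domain used above, and it strips the leading slash from the request path the way a per-directory rule at the document root sees it.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Path qw( make_path );

# Hypothetical model of Figures 1.2 to 1.6 (not part of Apache).
# Returns ( rewritten path, SUBDOMAIN ) or an empty list when the
# request falls through to the primary site.
sub map_subdomain {
    my ( $doc_root, $host, $uri_path, $redirect_subdomain ) = @_;

    # Figure 1.2: a request that was already rewritten is left alone.
    return () if defined $redirect_subdomain && $redirect_subdomain ne '';

    # Figure 1.3: strict Host validation; group 2 is the subdomain.
    return () unless defined $host
        && $host =~ /^(www\.)?([a-z0-9][-a-z0-9]+)\.example\.org\.?(:80)?$/i;
    my $sub = $2;

    # Figure 1.4: reserved names and missing directories fall through.
    return () if $sub =~ /^(www|ftp|mail|pop3|localhost)$/;
    return () unless -d "$doc_root/subdomains/$sub";

    # Figure 1.6: in a per-directory rule, ^(.*) sees the path without
    # its leading slash.
    ( my $rest = $uri_path ) =~ s{^/}{};
    return ( "subdomains/$sub/$rest", $sub );
}

# Usage: a throwaway document root with one subdomain directory.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/subdomains/images");
my ( $path, $sub ) = map_subdomain( $root, 'images.example.org', '/logo.gif', '' );
print "$path (SUBDOMAIN=$sub)\n";
```

The usage lines print "subdomains/images/logo.gif (SUBDOMAIN=images)". Passing a non-empty fourth argument imitates the second pass after the internal redirect, where REDIRECT_SUBDOMAIN is set and the rule must not fire again.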

The variable, SUBDOMAIN, will always contain the subdomain obtained from the request, or an empty string if the directory couldn't be located (see Section 1.4). Due to how Apache manages existing environment variables during internal redirects (by prepending REDIRECT_ to the name), both of these requirements may be accomplished in one step, as shown in Figure 1.7.

Figure 1.7:

RewriteRule ^ - [E=SUBDOMAIN:%{ENV:REDIRECT_SUBDOMAIN}] 

While the variable is guaranteed to be valid, its case is entirely dependent on the user-agent. You should typically use the lc( ) function in Perl to ensure output consistency. This is detailed in Figure 1.8.

Figure 1.8:

#!/usr/local/bin/perl

print( "Content-type: text/html\n\n" );
if( $ENV{ 'SUBDOMAIN' } ne "" )
{
	printf( qq[<P>You requested the %s subdomain</P>], lc( $ENV{ 'SUBDOMAIN' } ) );
}
exit( 0 );



2. Wildcard Domain Virtual Hosts

This second method is an alternate means of achieving mass virtual hosting via wildcard domains. Again, there is no need for specially-named subdirectories. Additionally, an environment variable is available for later use by dynamically-generated pages.

2.1. Source Code Framework

Complete source code for this method is shown in Figure 2.1. It is imperative that this text be inserted near the top of the .htaccess file located at the document root. Placing it elsewhere may cause undesirable results.

Figure 2.1:

Source Code Snippet of /.htaccess

RewriteEngine On
RewriteBase /

####################################################################
##                                                                ##
## Wildcard Domain Virtual Hosts (v2.01)                          ##
## Copyright (c) 2006, Randall E. Krause                          ##
## http://www.perlfoundation.org/legal/licenses/artistic-2_0.html ##
##                                                                ##
####################################################################



Update: This version is no longer supported. A new version is currently in development that will be upward-compatible with the directory-mapped virtual hosts described in Section 3 and the customized error responses described in Section 4.

The RewriteEngine directive need only appear once in a per-directory configuration file, so duplicate instances can be removed. Also, make sure the RewriteBase is set properly; otherwise, all requests will most likely fail.

2.2. Directory Occlusion Side-Effect

Nominally, if the request URI references a directory path that mirrors the current domain's directory path, that directory should still be processed by the rewrite rule.

http://www.example.com/domain_example/

This URL, despite its redundancy, is supposed to reference the domain_example directory after an internal redirection to the domain's domain_example directory:

/domain_example/domain_example/

However, with more commonplace methods, this internal redirect does not occur: the domain_example directory is effectively occluded by its parent directory, so the client is directed to

/domain_example/

To avoid this side-effect, we employ an environment variable named DOMAIN to prevent unwanted recursion during the rewriting process. This is shown in Figure 2.2.

Figure 2.2:

RewriteCond %{ENV:REDIRECT_DOMAIN} =""

2.3. Strict Input-Field Validation

The Host header can be forged with arbitrary or corrupt data, so it is important to verify that it complies with the HTTP/1.1 specification. If your primary site is on its own IP address, Apache has most likely not validated the Host header at all. We therefore apply strict validation to this input field before proceeding, as shown in Figure 2.3.

Figure 2.3

RewriteCond %{HTTP_HOST} \.?(([a-z][-a-z0-9]+)\.[a-z]+)\.?(:80)?$ [NC]
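Unlike Figure 1.3, the pattern in Figure 2.3 has no leading ^ anchor, so it constrains only the last two labels of the Host header; anything before them is accepted. It also treats a two-part suffix such as co.uk as a domain named "co". The Perl sketch below (domain_groups() is a hypothetical name) shows what %1 and %2 receive for a few host names, with /i standing in for [NC]:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Figure 2.3's pattern; /i stands in for [NC]. There is no ^ anchor,
# so only the last two labels of the host name are checked.
my $DomainPattern = qr/\.?(([a-z][-a-z0-9]+)\.[a-z]+)\.?(:80)?$/i;

# Hypothetical helper: returns ( %1, %2 ) or an empty list.
sub domain_groups {
    my ($host) = @_;
    return () unless defined $host && $host =~ $DomainPattern;
    return ( $1, $2 );
}

for my $host ( 'www.example.com', 'example.info:80', 'shop.example.co.uk' ) {
    my ( $full, $name ) = domain_groups($host);
    print "$host => %1=$full, %2=$name\n";
}
```

If full-length validation matters for your setup, adding a ^ anchor and an explicit prefix group would be the natural tightening, at the cost of shifting the back-reference numbers.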

2.4. Verifying the Domain Directory

It is important to verify that the domain has its own directory. If one is not found, the request needs to default to the document root. This is shown in Figure 2.4.

Figure 2.4

RewriteCond %{DOCUMENT_ROOT}/domains/%2 -d 

Reminder: If your domain directories are located someplace other than the domains directory, replace "%{DOCUMENT_ROOT}/domains/%2" with the actual path. For example, if they are based directly off of the document root, use "%{DOCUMENT_ROOT}/%2" by itself.

Reminder: This method assumes the TLD extension is insignificant. In other words, requests for both example.info and example.com will be associated with the same domain directory, "example", which is normally what you want. If, however, TLD extensions should not be grouped in this manner, then replace "%2" with "%1".

Caveat: On some shared-hosting configurations, the special DOCUMENT_ROOT variable may not be correct until the rewriting phase is complete (for whatever reason). Therefore it may be necessary to refer to ENV:DOCUMENT_ROOT instead. If that still does not work, then as a last resort you can hard-code the path as shown in Figure 2.5.

Figure 2.5

RewriteCond /usr/local/apache/htdocs/domains/%2 -d

2.5. Environment Variables for CGI Scripts

Within a CGI script or SSI page, it should be possible to determine whether a page was accessed via a domain. Thus, we produce an environment variable as shown in Figure 2.6.

Figure 2.6:

RewriteRule ^(.*) domains/%2/$1 [E=DOMAIN:%1,L] 

Reminder: As before, if your domain directories are located someplace other than the domains directory, replace "domains/%2/$1" with the actual path. For example, if they are based directly off of the document root, use "%2/$1" by itself.

Reminder: As before, this method assumes the TLD extension is insignificant. In other words, requests for example.org and example.com will be associated with the same domain directory, "example", which is normally what you want. If, however, the TLD extension is significant, then replace "%2" with "%1".
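The effect of choosing %2 or %1 can be seen end to end in the following Perl model of Figures 2.2 through 2.6. map_domain() is a hypothetical helper, not part of Apache; its $group_tlds flag stands in for the %2/%1 substitution described in the Reminders, and it assumes the domains/ layout used above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Path qw( make_path );

# Hypothetical model of Figures 2.2 to 2.6 (not part of Apache).
# $group_tlds true: directory named by %2 (TLD ignored);
# $group_tlds false: directory named by %1 (TLD kept).
sub map_domain {
    my ( $doc_root, $host, $uri_path, $redirect_domain, $group_tlds ) = @_;

    # Figure 2.2: already rewritten once, so leave the request alone.
    return () if defined $redirect_domain && $redirect_domain ne '';

    # Figure 2.3: %1 is the name plus TLD, %2 the name only.
    return () unless defined $host
        && $host =~ /\.?(([a-z][-a-z0-9]+)\.[a-z]+)\.?(:80)?$/i;
    my ( $full, $name ) = ( $1, $2 );
    my $dir = $group_tlds ? $name : $full;

    # Figure 2.4: no matching directory means the document root serves it.
    return () unless -d "$doc_root/domains/$dir";

    # Figure 2.6: internal rewrite; DOMAIN receives %1.
    ( my $rest = $uri_path ) =~ s{^/}{};
    return ( "domains/$dir/$rest", $full );
}

# Usage: one grouped directory serves every TLD of "example".
my $root = tempdir( CLEANUP => 1 );
make_path("$root/domains/example");
my ( $path, $domain ) = map_domain( $root, 'www.example.info', '/index.html', '', 1 );
print "$path (DOMAIN=$domain)\n";
```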


The variable, DOMAIN, will always contain the domain obtained from the request, or an empty string if the directory couldn't be located (see Section 2.4). Due to how Apache manages existing environment variables during internal redirects (by prepending REDIRECT_ to the name), both of these requirements may be accomplished in one step, as shown in Figure 2.7.

Figure 2.7:

RewriteRule ^ - [E=DOMAIN:%{ENV:REDIRECT_DOMAIN}]

While the variable is guaranteed to be valid, its case is entirely dependent on the user-agent. You should typically use the lc( ) function in Perl to ensure output consistency. This is detailed in Figure 2.8.

Figure 2.8:

#!/usr/local/bin/perl

print( "Content-type: text/html\n\n" );
if( $ENV{ 'DOMAIN' } ne "" )
{
	printf( qq[<P>You requested the %s domain</P>], lc( $ENV{ 'DOMAIN' } ) );
}
exit( 0 );



3. Directory-Mapped Virtual Hosts

There are situations when mass virtual hosting via wildcard domains and subdomains is either infeasible or impractical. This is particularly true if one or more sites exist within a unique directory structure or if several sites coexist under a single directory.

Directory-mapping makes it possible for system administrators to maintain a centralized table of site properties, thus eliminating the need to restrict domains and subdomains to a logical file hierarchy. By far, this method is the most powerful and flexible way to simulate virtual hosts when limited to a per-directory configuration file.

3.1. Source Code Framework

Complete source code for this method is shown in Figure 3.1. It is imperative that this text be inserted near the top of the .htaccess file located at the document root. Placing it elsewhere may cause undesirable results.

Figure 3.1:

Source Code Snippet of /.htaccess

RewriteEngine On
RewriteBase /

####################################################################
##                                                                ##
## Directory-Mapped Virtual Hosts (v3.02)                         ##
## Copyright (c) 2006, Randall E. Krause                          ##
## http://www.perlfoundation.org/legal/licenses/artistic-2_0.html ##
##                                                                ##
####################################################################



Caveat: On some shared-hosting configurations, the special DOCUMENT_ROOT variable may not be correct until the rewriting phase is complete (for whatever reason). Therefore it may be necessary to refer to ENV:DOCUMENT_ROOT instead. If that still does not work, then as a last resort you can hard-code the path.

The RewriteEngine directive need only appear once in a per-directory configuration file, so duplicate instances can be removed. Also, make sure the RewriteBase is set properly; otherwise, all requests will most likely fail.

3.2. Defining the Site Properties

A table of SetEnvIf directives is used to populate the environment with the properties for each site. This information will later be used in the rewriting process. Every record provides a single pattern match against the HOST header, and a list of one or more attributes that pertain to the corresponding site.

Caveat: Use of back-references in the SetEnvIf directive is, unfortunately, not supported prior to Apache 2.0. If you really need this capability or any more complicated logic, you can use the URL rewriting engine instead. See Figure 3.2.

Figure 3.2:

RewriteCond %{HTTP_HOST}  ^(www\.)?example\.(com|net|org)\.?(:80)?$
RewriteRule ^ - [E=SITE_ADMIN:admin@example.%2,E=SITE_NAME:example.%2,E=SITE_ROOT:/example/%2.html]
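In the condition of Figure 3.2, the first parenthesized group is the optional "www." prefix, so the TLD arrives as the second back-reference. The Perl sketch below (site_properties() is a hypothetical helper) mirrors that rule and builds the same three properties; like the RewriteCond as written, it is case-sensitive.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical model of Figure 3.2. Group 1 is the optional "www."
# prefix, so the TLD is group 2 (%2). Case-sensitive, like the
# RewriteCond as written (no [NC] flag).
sub site_properties {
    my ($host) = @_;
    return () unless defined $host
        && $host =~ /^(www\.)?example\.(com|net|org)\.?(:80)?$/;
    my $tld = $2;
    return (
        SITE_ADMIN => "admin\@example.$tld",
        SITE_NAME  => "example.$tld",
        SITE_ROOT  => "/example/$tld.html",
    );
}

my %props = site_properties('www.example.net');
print "$_=$props{$_}\n" for sort keys %props;
```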

The complete list of configurable and non-configurable properties is given in Figure 3.3.

Figure 3.3:

Read-only properties:

  • SITE_TYPE
    Indicates whether this site's root behaves as a "node" (e.g. non-executable files) or a "tree" (e.g. executable files, directories, and external URLs).

Read-and-write properties:

  • SITE_NAME
    Host name of this site. This defaults to the host name of the server.
  • SITE_PORT
    Port number of this site. This defaults to the port number of the server.
  • SITE_ROOT
    Location of this site's root file or directory. This defaults to the top-level public directory of the server.
  • SITE_ADMIN
    Email address of this site's administrator. This defaults to the administrator email address of the server.

Since records are not mutually exclusive (every record whose pattern matches is applied), it is always possible to assign site properties to several sites at once via a less restrictive regular-expression pattern. One possibility is shown in Figure 3.4.

Figure 3.4:

SetEnvIf HOST ^(www\.)?example\.net\.?(:80)?$            SITE_NAME=example.net
SetEnvIf HOST ^(www\.)?example\.com\.?(:80)?$            SITE_NAME=example.com
SetEnvIf HOST ^(www\.)?example\.org\.?(:80)?$            SITE_NAME=example.org
SetEnvIf HOST ^(www\.)?example\.(net|com|org)\.?(:80)?$  SITE_ADMIN=admin@example.com  SITE_ROOT=/example.cgi/
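Because every matching record is applied in order, the table in Figure 3.4 behaves like a sequence of independent assignments. The Perl sketch below models that behavior; @Records and apply_records() are hypothetical names, and it assumes that a later assignment to the same variable replaces an earlier one.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical copy of the Figure 3.4 table: [ Host pattern, variables ].
my @Records = (
    [ qr/^(www\.)?example\.net\.?(:80)?$/, { SITE_NAME => 'example.net' } ],
    [ qr/^(www\.)?example\.com\.?(:80)?$/, { SITE_NAME => 'example.com' } ],
    [ qr/^(www\.)?example\.org\.?(:80)?$/, { SITE_NAME => 'example.org' } ],
    [ qr/^(www\.)?example\.(net|com|org)\.?(:80)?$/,
      { SITE_ADMIN => 'admin@example.com', SITE_ROOT => '/example.cgi/' } ],
);

# Applies every matching record in order; a later assignment to the
# same variable replaces an earlier one (assumed SetEnvIf behavior).
sub apply_records {
    my ( $host, @records ) = @_;
    my %env;
    for my $record (@records) {
        my ( $pattern, $vars ) = @$record;
        next unless $host =~ $pattern;
        @env{ keys %$vars } = values %$vars;
    }
    return %env;
}

my %env = apply_records( 'www.example.net', @Records );
print "$_=$env{$_}\n" for sort keys %env;
```

The narrow record supplies SITE_NAME and the broad one adds SITE_ADMIN and SITE_ROOT, so www.example.net ends up with all three properties.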

Alternatively, there may be times when mutual exclusion proves more beneficial, particularly for denying requests for unknown host names (and from HTTP/1.0 clients, which do not even send a Host header). See Figure 3.5 for a demonstration.

Figure 3.5:

SetEnvIf HOST ^(www\.)?example\.(com|net|org)\.?(:80)?$  SITE_NAME=example.com      SITE_ROOT=/example.cgi/
SetEnvIf SITE_ROOT ^$                                    SITE_ROOT=/unwelcome.html
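The second record in Figure 3.5 relies on a missing header or variable being tested as an empty string, so ^$ matches every request that no earlier record assigned a SITE_ROOT, including requests with no Host header at all. A Perl sketch of the same two-step logic, with resolve_site_root() as a hypothetical helper:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical model of Figure 3.5. A missing Host header or an unset
# SITE_ROOT is tested as an empty string, so ^$ catches both cases.
sub resolve_site_root {
    my ($host) = @_;
    my %env;
    $host = '' unless defined $host;

    # First record: known host names get a name and a root.
    if ( $host =~ /^(www\.)?example\.(com|net|org)\.?(:80)?$/ ) {
        @env{qw( SITE_NAME SITE_ROOT )} = ( 'example.com', '/example.cgi/' );
    }

    # Second record: anything still lacking a root is turned away.
    my $current = defined $env{SITE_ROOT} ? $env{SITE_ROOT} : '';
    $env{SITE_ROOT} = '/unwelcome.html' if $current =~ /^$/;
    return %env;
}

for my $host ( 'www.example.org', 'unknown.test', undef ) {
    my %env = resolve_site_root($host);
    printf( "%-16s => %s\n", defined $host ? $host : '(no Host)', $env{SITE_ROOT} );
}
```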

All fields are completely optional. This includes SITE_ROOT, which automatically directs requests through the main site when not defined (thus bypassing the rewriting phase for maximum performance).

While the fields SITE_ADMIN and SITE_PORT don't perform any rewriting functions, they are available for later use in CGI scripts. Refer to Section 4 for a practical example.

Reminder: The value of SITE_NAME does not have to coincide with the host name specified in the request. Be careful to avoid this shortcut for root directories, however, because an external redirect is necessary to enforce trailing directory slashes.

3.3. Root Files and Directories

A site can have either a non-executable file, an executable file, a directory, or an external URL as its root. Each has its own advantages, depending on how the request should be processed.

The behavior of the root in all four cases is summarized in Figure 3.6.

Figure 3.6:

  • Directory
    The root behaves as a "tree" including directories and symbolic links. The request URI is retained as the logical pathname. This type of site is suitable for most applications.
  • Non-Executable File
    The root behaves as a "node" including static files with extensions such as html, txt, jpg, and gif. If the request URI does not name a particular resource (for example, "/"), it is ignored; otherwise, it is rejected. This type of site is suitable for landing-pages.
  • Executable File
    The root behaves as a "tree" including dynamic files with extensions such as cgi, php, pl, and shtml. The request URI is made available via the PATH_INFO variable. This type of site is suitable for templating-engines.
  • External URL
    The root behaves as a "tree". An external redirect is performed and the request URI is retained as the logical pathname. This type of site is suitable for resource-shortcuts.

While a directory cannot serve as a "node" and a non-executable file cannot serve as a "tree", executable files are the notable exception: They can serve as a "node" or a "tree". The former case offers the convenience of a landing-page with the functionality of a templating-engine, which is particularly useful for parked domains. See Figure 3.7 for a simple implementation using SSI.

Figure 3.7:

Source Code of /misc/parked.shtml

<HTML>
<HEAD>
<TITLE>The future home of <!--#echo var="SITE_NAME" --> is coming soon!</TITLE>
<STYLE>
  P, H1 { font-family: arial,helvetica; }
  H1    { font-size: 28px; }
  P     { font-size: 16px; }
</STYLE>
</HEAD>
<BODY BGCOLOR="#BFBFDD" LINK=black VLINK=black>
<CENTER>
<TABLE BGCOLOR=white BORDER="1" BORDERCOLOR="#444477" WIDTH="600" CELLPADDING="20" CELLSPACING="0">
<TR><TD>
<H1 ALIGN=center>Welcome to <!--#echo var="SITE_NAME" --></H1>
<P ALIGN=center>
This domain is parked until we get our new site online!<BR><BR>
Email us at <A HREF="mailto:<!--#echo var="SITE_ADMIN" -->"><!--#echo var="SITE_ADMIN" --></A>
</P>
</TD></TR>
</TABLE>
</CENTER>
</BODY>
</HTML>

Source Code Snippet of /.htaccess

SetEnvIf HOST ^(www\.)?example\.(com|net|org)\.?(:80)?$  SITE_ADMIN=admin@example.com  SITE_NAME=example.com  SITE_ROOT=/misc/parked.shtml

Rendering of http://www.example.net/


Caveat: Since Apache 2.0, it is possible to append a path-info string to both static and dynamic files. While this could be used to give a non-executable file "tree-like" behavior, it is not recommended.

For the sake of server performance, the physical root file or directory is never checked at run-time. No such check is needed, because the behavior can easily be inferred from the SITE_ROOT field: a "tree" always ends in a slash, while a "node" does not. See Figure 3.8 for a summary of these important distinctions.

Figure 3.8:

Tree (Directory)
SetEnvIf HOST ^(www\.)?example\.com\.?(:80)?$   SITE_ROOT=/domains/example/

Tree (Executable File)
SetEnvIf HOST ^browse\.example\.com\.?(:80)?$   SITE_ROOT=/domains/example/browse.cgi/

Node (Executable File)
SetEnvIf HOST ^search\.example\.com\.?(:80)?$   SITE_ROOT=/domains/example/search.cgi

Node (Non-Executable File)
SetEnvIf HOST ^welcome\.example\.com\.?(:80)?$  SITE_ROOT=/domains/example/welcome.html

Tree (External URL)
SetEnvIf HOST ^(www\.)?example\.net\.?(:80)?$   SITE_ROOT=//www.example.com/


Update: In a forthcoming release, it will be possible to give a directory or an external URL "node-like" behavior by appending a dot after the trailing slash. Stay tuned.

Caveat: A hard-coded query string can be appended to the value of SITE_ROOT, but only in the case of an executable file acting as a "node". This feature is not officially supported yet, so please use it with caution. See Figure 3.9.

Figure 3.9:

SetEnvIf HOST ^(www\.)?example\.net\.?(:80)?$  SITE_NAME=example.com  SITE_ROOT=/home.php?template=sample1.html
SetEnvIf HOST ^(www\.)?example\.com\.?(:80)?$  SITE_NAME=example.com  SITE_ROOT=/home.php?template=sample2.html
SetEnvIf HOST ^(www\.)?example\.org\.?(:80)?$  SITE_NAME=example.com  SITE_ROOT=/home.php?template=sample3.html
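A node root with a hard-coded query string divides into a file path and a query at the first question mark. The Perl sketch below uses the same pattern that error.cgi applies to SITE_ROOT in Section 4; split_root() is a hypothetical helper, and how the framework itself passes the query string on is not shown in this document.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits a node root into ( file path, query ),
# using the pattern error.cgi applies to SITE_ROOT. The leading slash
# is dropped, as in error.cgi, and the "?" is removed from the query.
sub split_root {
    my ($site_root) = @_;
    return () unless defined $site_root && $site_root =~ m/^\/(.+?)(\?.*)?$/;
    my $file  = $1;
    my $query = defined $2 ? substr( $2, 1 ) : '';
    return ( $file, $query );
}

for my $root ( '/home.php?template=sample1.html', '/domains/example/search.cgi' ) {
    my ( $file, $query ) = split_root($root);
    print "$root => file=$file query=$query\n";
}
```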



4. Customized Error Responses

Due to the way that Apache reports certain errors, the internal redirection is often revealed to the end-user. In such situations, you may want to generate your own error responses. A drop-in replacement, fully compatible with directory-mapped virtual hosts, is provided here for your convenience.

First, create a file named "error.cgi" within the cgi-bin directory and insert the Perl source code shown in Figure 4.1. Afterwards, be sure to enable the file's execute permissions.

Figure 4.1:

Source Code of /cgi-bin/error.cgi

#!/usr/local/bin/perl
use strict;

####################################################################
##                                                                ##
## Customized Error Responses (v1.1)                              ##
## Copyright (c) 2006, Randall E. Krause                          ##
## http://www.perlfoundation.org/legal/licenses/artistic-2_0.html ##
##                                                                ##
####################################################################

our( $Output, $ErrCode );
our( $SiteType, $SiteRoot, $SiteAdmin ) =
(
        $ENV{ 'SITE_TYPE' },
        $ENV{ 'SITE_ROOT' },
        $ENV{ 'SITE_ADMIN' }
);
our( %StatusCodes ) =
(
        403 => "Forbidden",
        404 => "Not Found",
        410 => "Gone",
        500 => "Internal Server Error"
);

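# A missing root file or directory is a configuration error (500);
# otherwise, report the status that triggered the ErrorDocument.
if(	$SiteType eq "node" && $SiteRoot =~ m/^\/(.+?)(\?.*)?$/ &&
	!-f( "$ENV{ 'DOCUMENT_ROOT' }/$1" ) ||
	$SiteType eq "tree" && $SiteRoot =~ m/^\/(.+)\/$/ &&
	!-d( "$ENV{ 'DOCUMENT_ROOT' }/$1" ) && !-f( _ ) )
{
        $ErrCode = 500;
}
else
{
        $ErrCode = $ENV{ 'REDIRECT_STATUS' };
}

# Escape request-derived text before embedding it in the HTML output.
sub HtmlEscape
{
        my( $Text ) = @_;
        $Text = "" unless defined( $Text );
        $Text =~ s/&/&amp;/g;
        $Text =~ s/</&lt;/g;
        $Text =~ s/>/&gt;/g;
        $Text =~ s/"/&quot;/g;
        return( $Text );
}

our( $RequestURL, $AdminAddr ) =
(
        HtmlEscape( $ENV{ 'REDIRECT_SCRIPT_URL' } ),
        HtmlEscape( $SiteAdmin )
);

$Output = qq[<HTML>
<HEAD>
<TITLE>$ErrCode $StatusCodes{ $ErrCode }</TITLE>
</HEAD>
<BODY>
<H1>$StatusCodes{ $ErrCode }</H1>];

if( $ErrCode == 500 )
{
        $Output .= qq[
<P>The configured site is either not available or not accessible. Please notify the administrator.</P>];
}
else
{
        $Output .= qq[
<P>The specified resource "$RequestURL" is either not available or not accessible.</P>];
}

$Output .= qq[
<HR>
<ADDRESS>For more information, contact &lt;<A HREF="mailto:$AdminAddr">$AdminAddr</A>&gt;</ADDRESS>
</BODY>
</HTML>];

# Send the computed status rather than REDIRECT_STATUS, so a 500 is reported as such.
printf( "Status: %d\nContent-Type: %s\nContent-Length: %d\n\n",
        $ErrCode, "text/html", length( $Output ) );
print( $Output );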

exit( 0 );
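The availability test at the top of Figure 4.1 decides whether the script reports a 500 or the original status. Pulled out into a function (root_is_broken() is a hypothetical name) and pointed at a temporary directory instead of DOCUMENT_ROOT, it can be tried without Apache:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Path qw( make_path );

# Hypothetical extraction of the test at the top of Figure 4.1.
# Returns true when the configured root is missing, which is the case
# error.cgi reports as a 500 Internal Server Error.
sub root_is_broken {
    my ( $doc_root, $site_type, $site_root ) = @_;
    return 0 unless defined $site_type && defined $site_root;

    # A "node" must be an existing file; any query string is ignored.
    if ( $site_type eq 'node' && $site_root =~ m/^\/(.+?)(\?.*)?$/ ) {
        return !-f "$doc_root/$1";
    }

    # A "tree" may be a directory or an executable file.
    if ( $site_type eq 'tree' && $site_root =~ m/^\/(.+)\/$/ ) {
        return !-d "$doc_root/$1" && !-f _;
    }
    return 0;
}

# Usage: a temporary document root with one directory and one script.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/domains/example");
open( my $fh, '>', "$root/home.php" ) or die "cannot create home.php: $!";
close($fh);

printf( "%s\n", root_is_broken( $root, 'tree', '/domains/missing/' ) ? 'broken' : 'ok' );
```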

Second, insert the appropriate ErrorDocument directives near the top of the .htaccess file located at the document root. You may override any or all of the 403, 404, and 410 status codes as depicted in Figure 4.2. The order is not significant.

Figure 4.2:

Source Code Snippet of /.htaccess

ErrorDocument 410 /cgi-bin/error.cgi
ErrorDocument 404 /cgi-bin/error.cgi
ErrorDocument 403 /cgi-bin/error.cgi

Now, whenever the given error conditions occur, the end-user will always be provided a friendly response and one that does not divulge the internal rewriting process.

Figure 4.3:

Rendering of /images/missing.gif



For questions or comments, please contact randall@searstower.org .