GitHub - strlcat/ryshttpd: ryshttpd -- simple filesharing http server.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
.gitattributes		.gitattributes
.githooks.tar.gz		.githooks.tar.gz
.gitignore		.gitignore
COPYRIGHT		COPYRIGHT
Makefile		Makefile
README		README
VERSION		VERSION
args.c		args.c
client.c		client.c
clientlist.c		clientlist.c
conf.c		conf.c
config.h		config.h
conv.c		conv.c
date.c		date.c
dir.c		dir.c
env.c		env.c
error.c		error.c
fnmatch.c		fnmatch.c
getpass.c		getpass.c
getpasswd.c		getpasswd.c
getpasswd.h		getpasswd.h
headers.c		headers.c
htaccess.c		htaccess.c
htcrypt.c		htcrypt.c
httpd.c		httpd.c
httpd.h		httpd.h
htupload.c		htupload.c
htupload.conf		htupload.conf
htupload.html		htupload.html
index.c		index.c
io.c		io.c
io_stream.c		io_stream.c
log.c		log.c
memory.c		memory.c
mime.c		mime.c
mimedb.h		mimedb.h
misc.c		misc.c
netaddr.c		netaddr.c
netio.c		netio.c
pwdb.c		pwdb.c
regexmatch.c		regexmatch.c
resolve.c		resolve.c
resource.c		resource.c
resources.h		resources.h
response.c		response.c
response_codes.h		response_codes.h
rsrc_about_uuid.h		rsrc_about_uuid.h
rsrc_about_uuid.html		rsrc_about_uuid.html
rsrc_download.h		rsrc_download.h
rsrc_download.png		rsrc_download.png
rsrc_error.h		rsrc_error.h
rsrc_error.html		rsrc_error.html
rsrc_favicon.h		rsrc_favicon.h
rsrc_favicon.ico		rsrc_favicon.ico
rsrc_robots.h		rsrc_robots.h
rsrc_robots.txt		rsrc_robots.txt
rsrc_style.css		rsrc_style.css
rsrc_style.h		rsrc_style.h
rsrc_view.h		rsrc_view.h
rsrc_view.png		rsrc_view.png
say.c		say.c
signal.c		signal.c
skein.c		skein.c
skein.h		skein.h
str.c		str.c
strxstr.c		strxstr.c
tfcore.h		tfcore.h
tfctrapi.c		tfctrapi.c
tfdec.c		tfdec.c
tfdef.h		tfdef.h
tfenc.c		tfenc.c
tfxtsdec.c		tfxtsdec.c
tfxtsenc.c		tfxtsenc.c
url.c		url.c
usage.c		usage.c
xmalloc.c		xmalloc.c
xmalloc.h		xmalloc.h
xmemmem.c		xmemmem.c
xstrlcpy.c		xstrlcpy.c

Repository files navigation

ryshttpd -- simple filesharing http server.

"Sometimes you just need a web server. Not an Apache, and not even nginx.
A thing which you may easily build for any existing Unix system, just
remember some easy cmdline arguments and have it share a directory."

ryshttpd is a simple forking Unix HTTP server designed to be:
- portable, so it does not require any special features from OS kernel,
- file oriented, quickly gives out files as soon as possible,
- small, so it does have minimal dependenices to setup, and not a memory hog,
- plain, so it is VERY easy to build and configure it.

It supports:
- full and partial file transfers of almost any size (up to 2^64 bytes),
- attempts to read file anyway, even if it is "virtual" or "special",
- good directory listings,
- running CGI/1.1 scripts and programs and piping data to them,
- determining arbitrary MIME types by using libmagic (optional),
- following symlinks, even to outside of http root directory.

ryshttpd is considered a modern successor of such small servers
as mini_httpd and darkhttpd, but with more functionality.

Features:
- small, simple, lightweight both in binary and memory sizes, self hosted,
- single executable: little or no dependencies (only libmagic), no installation needed,
- works completely without configuration file involved, or any other filesystem entries,
configured purely by command line arguments,
- emits directory listings, or index file,
- direct read() calls to read a file,
- no special APIs involved, minimum build time configuration,
- is able to run CGI 1.1 scripts, piping input to it and output to client,
- multiple CGI modes: CGI emits it's own headers, CGI appends it's own headers,
- is able to run single CGI script for every request (a so called "CGI server" mode),
- configurable HTTP keep alive support,
- supports IPv6,
- limits number of concurrent client connections,
- limits IPv6 connections to a /64 IPv6 subnet, not to a /128 single IPv6 address,
to prevent possible DoS/resource exhaustion attacks,
- htaccess support (see below) with good enough homebrew rewrites,
- xrealip support (setting "admin" proxy address and listening for X-Real-IP header),
- simple forking server: creates a child per each connected client,
- supports switching privileges, tuning them down to setuid/setgid and grouplist,
- supports switching file system root (chroot), and dropping privileges after such a switch,
- lookups user info at fs root, then chroots, then switches privileges to a previously looked up information,
- has embedded resources such as error page template, style.css, favicon and robots.txt,
- has network speed limit mechanism, both for download and upload (headers are not limited),
which is also may be configured inside .htaccess per directory or on rewrite rule match,
- supports specifying partial file transfers as a parameter to a file in form
like '/file.jpg?range=128-1024',
- special parameters to (attempt) force browser to display content inline with
'/file.jpg?vi=1' to view inline and '/file.jpg?dl=1' to force show a download box,
- path restoration support, so directory listings behind a path rewriting frontend
are not messed up,
- Simple, yet extendable codebase, accurate IO layer, source of reusable well commented code.

Security:
- parses client request after privilege switch,
- uses it's own memory allocation routines designed to prevent buffer overflows,
to catch programming bugs and to ease debugging a bit,
- filters incoming data to match pure ASCII text or urlencoded data,
- filters unsafe path parts which try to break http root (such as "/../"),
- ensures that request path must start from a '/' character,
- escapes HTML characters that came from user to prevent CSS/XSS attacks,
- filters attempts to include %00 NUL bytes,
- filters logs so tty control characters are escaped,
- drops stale or stuck clients and their connections,
- listening master server is not connected to a client in any way,
other than logging pipe and SIGCHLD handling and pid resolving function,

Standards:
- conforms to HTTP 0.9/1.0/1.1 specs.
- currently supports only HTTP GET, HEAD and POST (for CGI) requests,
- POSIX.1-2001, C99 conformant, gcc 4.2.1 compatible, no special apis or syscalls used,
- POSIX.1-2001 extended regex matching engine,
- At least builds fine on musl system with only _BSD_SOURCE defined.

To be considered (long term):
- authenticated (somehow; over TLS?) file uploads using HTTP PUT request,
- FastCGI/SCGI support,
- Move to poll() instead of forking,
- HTTP/2.

Not TODO (harmful or junk stuff):
- virtual hosts of any form (other than restricting to regex),
- Pure dialogue HTTP authentication of any form / 401 HTTP code,
- Modules of any form and any other pluggable in code,
- Server side includes.

RATIONALE

ryshttpd is written in spare time since Jan2018 by Andrey Rys. It's primary goal is to
replace mini_httpd instances in the author's home and office infrastructure.

ryshttpd was written to serve Really Large Files such as disk images. It also must
serve partial file transfers of any length, such as big internet radio captures.

While mini_httpd served fine for many years, author did not looked into it. It lacked
support for things like X-Real-IP header, tried to enforce sendfile() usage with 2003
incompatible semantics and by 2018 it was awkward to use. When it was patched out of
these fallacies, author discovered that it's unable to transfer a 1GB file with
mini_httpd: it simply tried to read the _whole file_ into memory before sending it
to a client. This resulted in an OOM Killer action.

ryshttpd must be small such as busybox httpd or mini_httpd, really, but must be flexible enough.
I gained great experience during writing my own access(8) program, and I wanted ryshttpd
to be the same flexible and reusable thing. I took much code from access(8) to form it.

I also dreamed about writing my own httpd since even 2007. But I lacked experience for years,
and only now I ready to write one.

ryshttpd also is a future teststand for my experiments with it. Things like FastCGI or
TLS support may end up there, and will form a basis for a more advanced HTTP server.

ryshttpd is going to be maintained for a long time, unlike mini_httpd, which looks like
abandoned. Even if there are releases, internally code is still very messy, 2003 style.

HTACCESS

There is the only filesystem thing on which ryshttpd optionally relies: .htaccess files.
ryshttpd chose not to have an individual configuration file of it's own, so altering
certain (specific!) behavior of ryshttpd requires them to be present.

The .htaccess file format is NOT compatible with any of existing implementations.
Nor Apache or any other syntax is supported. They're have my own syntax.
But .htaccess file name is matched recursively (Apache style).
Of course, .htaccess file is forbidden from accessing outside.
Naming of .htaccess file may be changed with command line option (see `ryshttpd -h` for help).

Note that htupload.cgi component included does NOT check for a forbidden
.htaccess file name to upload. You should check it yourself!

"allow IP/subnet/all": permit access to a directory,
IP[/subnet] specifies IP address or subnet to which rule applies,
while "all" specifies that anyone may have access.
"deny IP/subnet/all": deny access to a directory.
The syntax is same as for "allow".
"done": stop further parsing htaccess file at this line. May be applied inside
"rewrite"/"rematch"/"matchip" rules, see their description below.
"httproot /path/to/newroot": change virtual HTTP root to arbitrary location pointed to
by /path/to/newroot. The location must be accessible within the current rootfs tree,
and permissions must allow access to it to the current privileges of ryshttpd process.
"secure_httproot yes": lock further httproot calls inside, disallowing them to ascend out.
Cannot be unset (so values other than "yes" do not work here).
"return HTTPcode": immediately return an HTTP code.
If such code is not supported by ryshttpd (see response_codes.h),
then a generic 500 Server Error code will be returned instead.
"header Name Value": append custom header or replace existing one.
While "Name" cannot contain spaces, "Value" can.
if a "Value" is not specified, this instructs httpd to remove such header permanently.
"redirect URL": immediately redirect user to a URL, with 302 status code.
URL may be absolute (begin with protocol and such), or relative.
redirect returns a 302 Found code and proper Location header.
"movedto URL": the same as redirect above, but it returns 301 status code.
301 status code usually means that resource was renamed or moved
permanently and the accessing URL should not be referred as it was before,
while 302 status code just does temporarily redirection.
These two are just to provide an ability to administrator to choose from.
"noindex yes": restrict indexing of directory, if it has no index file.
If this directive is applied, an attempt to index directory will return 403 error.
This is applied to all indexing modes: both HTML and plain text (with "txt=1" query string).
However, all files inside that directory will be accessible (if client knows their names).
The proper Unix way maybe is to put index.html into that directory, or
set access mode so that http server cannot get file list for it.
This is just another way to disable directory indexing.
Note that this directive is NOT recursive: subdirectories will be indexed.
"tar yes": allow to download directory as a whole TAR archive.
By default, ryshttpd does not allow this because code which emits tar archives eats
some more memory than the usual, and it's currently recursive (as busybox tar).
This is embedded feature. It does not require external tools.
Please see more info in "DIRECTORY DOWNLOADING" section of this document.
"cryptpw string": set symmetric cryptographic password for filesystem files or TAR archive dumps.
Any files served then become encrypted, using password string supplied.
See further CRYPTOGRAPHY section for details about how to decrypt received files.
"regex_no_case yes": turn off regex sensitivity, so that you'll able to match both
"Apple" and "apple" with the same "/(apple)(|/.*)" regex pattern.
Note that if you will give the same command with opposite direction, then
case sensitivity matching will return back to normal.
-O regex_no_case affects this, but until this command is given.
This command works recursively for all the htaccess lookup tree.
"hideindex regex/none": hide these items from directory listings. regex specifies all the
patterns which files or directories must match to be hidden.
The patterns are recursive for child directories. They also implicitly inherited
through pattern string concatenation, but this inheritance may be turned off by first
specifying a "hideindex none" rule, then specifying overriding rule.
To hide files behind 404 error, you should write appropriate
rewrite/rematch rule based on req_path instead.
"ratelimit number": limit network speed (rate), both upload and download, by this number.
The number may be specified as a raw number of bytes (for example, 1048576),
or as a simple, non fractioned, human readable number (for example, 80k).
The speed will be forcibly limited to this value, but you should get the fact
that it depends on a network read/write buffer size, so small numbers up to 10M
will work flawlessly but larger ones may require you to extend the buffer size
with -O rdwr_bufsize option up to 131072 bytes.
"ratelimit_up number": same as ratelimit, but applied only to upload (from client to server) speed.
"ratelimit_down number": same as ratelimit, but applied only to download
(from server to client) speed.
"rewrite spec [!]pattern destination": the experimental rewrite service. It will match pattern
directed by spec, parse destination string and overwrite it as the new pathspec for
request to continue, then it will partially restart the request again without
incrementing internal state. Please see htaccess.c source file for more info.
An example is something like that: `rewrite req_path "/w00tw00t.*" "/bots.html"`
Rewriting to internal resources is permitted too. FS lookups are restarted for
single matched rewrite. Note that rewrite is NOT recursive at this time!
Quotes surrounding pattern and destination are mandatory.
If '!' is specified before pattern, then match is inverted.
"rematch spec [!]pattern destination": same as "rewrite", but works recursively.
CAUTION: improperly or vulnerable rematch rules may hang your httpd child
into a infinite loop! You must think three times before placing one.
"matchip [!]netaddr htaccess_cmd": match client IP address by specified netaddr subnetwork,
then execute any htaccess_cmd htaccess command listed above. The command permits
executing rewrite rule. Also, rewrite rule permits executing matchip command.
They even can go recursively.
If '!' is specified before netaddr, then match is inverted.

Comments are supported, but only at the beginning of the line.
Inline comments starting after a config item are NOT supported.

Please note that, as with access(8), .htaccess files are considered an admin maintained thing.
The same rules apply here too: if a syntax error causes ryshttpd to fail or even segfault,
it's an admin duty to fix it and know a valid syntax. ryshttpd will NEVER hint you about a
valid syntax or point to a syntax errors because that's an added unnecessary code.

LIBMAGIC

ryshttpd is able to resolve file types very accurately, but using only libmagic.
if built-in statically, this bloats ryshttpd binary somewhere up by 200k, and requires
libmagic database containing mime types of desired files. Although MIME lookups then
are very accurate (if using an official `File` package database), space requirements,
especially on embedded systems arise.

For this reason it was considered to make it optional and provide a static MIME
database (by default) as well. But then memory requirements shift from storage to runtime:
regex engine typically wants it's own largy memory allocations while compiling patterns.
For this reason the provided default MIME database (mimedb.h) is not much big.

So _accurate_ MIME lookup always requires space, either storage or runtime.

RATE LIMITING

ryshttpd supports simple (wget style) network rate limiting by accurate userspace sleeping.
Because each client connected gains it's own process, the sleeping strategy is adequeate.

Both download and upload rates may be limited, and both globally, or via .htaccess rule.

Rate limit numbers are expressed in bytes per second (kilo-, mega-, giga-), these numbers
are expressed as powers of two numbers instead of powers of 10 as casual user may think.
(hence the 256k is 256 * 1024 but not 256 * 1000).

The intermediate I/O buffer (so called workbuf in client.c) gets divided into smaller chunks,
and the whole second of transfer is also gets divided into fractions. The number of chunks
is not fixed: it starts from 32 and multiplies by two if rate limit is higher. The core idea
is that the workbuf shall never be increased: instead, time (second) gets fractioned to fit
the workbuf. Then when sending a file, ryshttpd works with workbuf in a chunk idea: it sends
each chunk separately, measuring time spent in sending, then calculates the difference. If
client retrieved chunk faster than desired, then ryshttpd sleeps the remaining time,
effectively limiting network speed. This happens for each complete chunk. On incomplete chunk
the same strategy is applied, only calculating remaining time for the size of that chunk.
Because workbuf can make incomplete tail chunk, it is also required to calculate a time value
of that last chunk. It maybe slightly inaccurate due to dividing and multiplying, but in
practice it shown great accuracy.

The only "bug" hides when admin requests rate limit of 10M(bytes) or more. The small workbuf
causes the second of time to be fractioned even more (2048 or 4096 times or more), so ryshttpd
starts spending time inside nanosleep "borders" / clock_gettime syscalls instead of normal sleep.
This does not give a user retrieving a file any privilege, instead, the speed is limited
more than specified (tests shown with 32k workbuf a 100M limit resulted in 70M limit). This is
_easily_ solvable by increasing workbuf size with -O rdwr_bufsize= cmdline option. The default
value now is 32k, but it can be extended to 128k and 100M rate limit will work accurately.

REWRITE SERVICE

ryshttpd provides an incompatible rewrite for those who like to "beautifulise" their URLs.
But rewrite service provides more than simple URL rewriter. It embeds regex matches and
some builtin variables to match against.

The syntax is plain and might be confusing from the beginning:

rewrite <varspec> "regex" "<replacer/.htaccess rule>"

, where <varspec> is a single or multiple specificators of ryshttpd recognised variables,
divided by comma (multiple specificators values are concatenated into single string),
"regex" is a total regex pattern to match specificators built string against,
and "<replacer/.htaccess rule>" is either replaces matched string with parsed one,
or applies a .htaccess rule from the list above.

Take this rule for example:

rewrite req_path "/data/images/[^/]*\.jpg" "/data/images/lynx.jpg"

It will replace every "/data/images/image.jpg" with "/data/images/lynx.jpg" and it will continue
executing the request with new, rewritten request value, which will become "/data/images/lynx.jpg".
The rule matches "/data/images/image.jpg", "/data/images/image2.jpg" but not "/data/images/img/image.jpg".

A rate limiting rule example:

rewrite req_path "/cgi/upload\.pl" "ratelimit_up 100k"

It will apply "ratelimit_up 100k" .htaccess rule to anything which request path matches
the specified "/cgi/upload.pl". Another example:

rewrite clinfo_proto "http" "movedto https://%{hdr_host}%{req_request}"

does a modern HTTPS everywhere redirects to a secure website location, if a client
is matched to be attempting browsing with plain HTTP protocol.

More complex example:

rewrite hdr_user_agent,%{@},req_path "(.*(w|W)get.*)%\{@\}(/data(|/.*))" "return 403"

denies any client identified itself as wget of any kind to retrieve things from /data.
Note "%{@}" between two commas. It is a separator that gets added into pattern as is.
It also echoed in regex. Because client cannot introduce %{} fmtstr templates into
supplied data, it is to be sure that there was no violation from the client side
(however in most cases, a single '@' is enough).

Regex machine permit you to build more sophisticated rules, not only examples I provided here.
It must be unlimited in ryshttpd, so you're free to implement as much as complex rules.

There is also recursive rewrite available. Just use "rematch" instead of "rewrite" keyword.

HTUPLOAD

ryshttpd includes a handy htupload program which downloads file from client, stores it
into specified directory by configuration and shows a message, or sends him a webpage.
The program is a CGI compatible. It is created to fill the gap of missing HTTP PUT
method which ryshttpd currently does not implement.
It reads a configuration file htupload.conf which must reside in the directory where it
was called from (hence, $PWD). The following parameters it understands:

"upload_dir /path": absolute or relative path to a directory where to put the uploaded files.
It should exist. htupload will never try to create directories or directory trees.
"max_file_size fsize": an exact size in bytes to limit the uploading file size.
Please note that if size is exceeded (as reported to program by CONTENT_LENGTH
CGI environment variable), the program will interrupt a POST upload _immediately_,
leaving connection and your browser will probably display an "Connection was reset" error.
There is NO any way to "successfully" interrupt a progressing upload. Sorry.
The same behavior will be there is any other error will occur.
"allow_overwrite": allow to overwrite files (uploading them again over). This may help
to "resume" interrupted transfers, but also this may lead to abuse. By default,
htupload denies to overwrite existing files.
"log /path": path to a writable log file. It will contain date of upload, IP address of
uploader, file name and it's size. By default htupload never logs anything.
Failure to write the log item will yield an error to uploader.
"success_page /path": read and send this webpage to the uploader. The path is absolute or relative
to an existing file (maybe outside of server root). Currently there are no any
templates residing in webpage are replaced with file names, sizes etc.
This is to be implemented in future versions.
"success_message message": display this message to uploader. The message is sent as a plain string.
The parameter may contain any characters you like, including space.
"forbidden_filenames regex": a regex string denying uploading certain file names, types etc.
If those file names are matched, the transfer is interrupted with an message.
"allowed_filenames regex": an exclusion to forbidden_filenames parameter. You may deny all
files with forbidden_filenames and permit a subset of them here.
It's not effective without forbidden_filenames previously set!

DIRECTORY DOWNLOADING

ryshttpd supports downloading the directories as a single file (currently, a POSIX TAR archive).
You can trigger such a download by adding a "?tar" single argument to a URL query string.
Note that no other arguments are listened to, and feature must be enabled from .htaccess file
with a "tar yes" string (you can use rewrite/matchip rules together too).

What is supported and what is not:
- Long file/directory names are SUPPORTED,
- Unicode or other arbitrary encodings are SUPPORTED,
- Hardlinks are NOT SUPPORTED. There is a much memory hog to collect all of them and compare,
- Files bigger than 4/8G are SUPPORTED natively by ryshttpd, but somewhat is broken with
busybox tar, which is buggy about base256 file size field encoding,
- Archive items sorting currently NOT supported, but it could be easily implemented,
- Keep-Alive requests to tar archives are NOT supported. Built-in tar archiver works just like an
ordinary CGI script, and (currently) should be treated as such,
- Owner and group info is currently forged, as well as file and directory modes,
- No support to store items others than directory, file (regular) and long filename,
- Compatible with GNU tar, busybox tar and bsdtar/libarchive,
- By default it is disabled at runtime and must be enabled from .htaccess file.

CRYPTOGRAPHY

Starting from Rel.116, ryshttpd now includes a lightweight Threefish512/768 symmetric encryption algorithm.
The purpose of it is to enable private file transfer, to which password (key) is known only by authorized persons.

Unlike traditional authentication methods like WWW-Authenticate, forms, cookies or other "guarding fence" methods,
ryshttpd simply encrypts the file it serves without asking for password or any other sensitive data to be transmitted
or at least assumed by both parties, then proceeding to sending file data in cleartext.

The burden of decrypting such file contents once sent to client lies purely on client itself.

Here is specification and encrypted file format:

- Block size, bytes: 64 (512 bits)
- IV size, bytes: 64 (512 bits)
- Internal master key size, bytes: 96 (768 bits), two keys for XTS mode
- XTS key is derived from master key

- Threefish512/768 is used in single block XTS mode. IV is derived from the key itself.
- Server side key is derived from password specified either globally or in .htaccess file.
- Password can be of any length, is not salted.

Please note that encryption facility here does not do a strong password to key conversion,
to prevent possible DoS attacks on server itself. Encryption key is derived from password
just by doing a single hash iteration over it with Skein hash function.
It is believed that underlying cipher is strong enough to provide required brute force resistance.

The embedded cryptography is subject to change at any time and now it is not considered stable.
Hence, here is a program included, htcrypt, is to help to decrypt current protocol encrypted files.

Please also note that partial transfers with encrypted files are problematic. You always shall
align to a cipher block size boundary when continuing transfer, otherwise, server will refuse
such a transfer to unaligned boundary. This will only happen once server is required to encrypt a file.
Use "truncate" Unix tool to strip unaligned tail and continue download as usual.

PORTABILITY

It was confirmed that ryshttpd runs on Linux and it probably would work on other (modern)
Unices too, such as flavors of BSD. It was also successfully built and running under
Cygwin (but not Mingw). The code however aims to be as portable as it possible.

WORK MODEL

Currently, ryshttpd works as a simple "forking" server: it accepts a connection and creates
a separate process for the single client. This has benefits and drawbacks, notably:
Benefits:
- forking is slightly more secure, as the client task is separated from server (master) task,
- since forking creates processes, the child can change credentials,
- forking is easy to implement (at the beginning),
- forking ensures that all available hardware is used ("native" threading),
so, for example, encryption is run on each core per each child process separately,
- forking maybe more portable, especially to very old platforms,
- administrator can limit total amount of clients simply with resource limits.
Drawbacks:
- forking is slow if you need a really high performance service (like nginx),
- forking wastes more resources per each child process,
- forking does not randomise child address space again, and by design never will.

ryshttpd currently targets itself as an easy replacement for such simple servers as mini_httpd,
offering saner and cleaner code, more features and not taking much space in return.

As the author will learn more efficient request processing techniques, he will implement them
in ryshttpd sooner or later, and this server will not be the same through the time.
So expect changes.