Building a CGI-powered app on moontowercomputing.club!
In the past week I've learned about CGI and generally how web servers work. I've put that knowledge together into a pretty cool webapp and along the way I've found some "best practices" that I think is worth sharing!
Motivation/Pictochat
When I was a nerdy middle schooler, I used to sneak my DS to school.
During classes, I'd talk to my other sneaky friends with pictochat! It
was such a fun and thrilling experience, and I've never really felt that
excitement with any other chat platform. I figured that building an app
similar to pictochat would be a great way to stress test the CGI server
and iron out any kinks. And there were many kinks indeed - mostly around
process management and caching/streaming, but maybe that's for a
separate blog post! However, I'm now pretty confident that other users
on moontowercomputer.club
are able to build anything they
want with CGI, and I think I also made a fun little toy to boot!
What is CGI?
Common Gateway Interface is a family of protocols for webservers to
communciate with arbitrary applications to service HTTP requests -
basically, instead of serving a static file for a request, or routing to
a full-fledged HTTP server, we can have small, individual
scripts/programs that run when an end point is accessed. For example, if
you go to
https://moontowercomputer.club/~USER/cgi-bin/foo.sh
,
instead of giving you the contents of the file
/home/USER/public/cgi-bin/foo.sh
, foo.sh
is
executed and the output is sent to the browser. It is assumed that
foo.sh
will output a stream where any HTTP headers are sent
(e.g. Content-Type: text/html
) followed by two blank lines,
and then the remainder of the stream is passed along to the browser
directly (notably, it isn't assumed that the stream contents are ASCII
after the headers - allowing scripts to send back images/media, or any
binary data).
On moontowercomputer.club
CGI scripts receive a number
of variables in the environment, including any query parameters
($QUERY_STRING
) and for POST/PUT
requests, the
request body can be read from STDIN, allowing programs to fully
implement a sort of ad-hoc server. If you've ever worked with
"serverless" applications before, this is vaguely similar.
App Architecture
The app is relatively straightfoward. Broadly, there are three main operations we need to implement in the backend:
- Take in new messages and save them to a database
- Provide a stream of new messages being added to the database to send to the frontend
- Delete old messages
For 1
- we can have a POST endpoint that consumes a
messages and saves it to a sqlite file. For 2
- we can use
inotify
events to detect updates to the database and use an
event stream to send new events back to the frontend. And finally for
3
, a simple GET endpoint that triggers a script to delete
rows from the database will suffice.
This isn't the best architecture and there's so many ways it can be made better. But the point of this project wasn't the build the best pictochat clone, it was just to build a pictochat clone that would let me find out if everything in the CGI path worked.
The most interesting part of this was figuring out how to get
2
to work. I learned about
Content-Type: text/event-stream
, which is used for
streaming text to a client. On the javascript side, this can be
connected to with an EventStream
, as long as your data is
in the form data: {data}\n\n
for each message. I realized
that there could be instances where the connection drops, so I added an
additional query parameter on the endpoint so the client can provide a
timestamp to indicate that they only want to subscribe to notifications
newer than some known timestamp. This is needed because when a new
client connects, the default behavior is to send all currently known
messages to the client before waiting for new events.
It's also important to note that when working with streams, it's
probably best to also sent the header X-Accel-Buffering: no
and Cache-Control: no-cache
to disable caching (in
nginx
and the browser respectively) and ensure timely and
reliable delivery of messages.
Additionally, you also need to send some kind of heartbeat to prevent the connection from being closed.
Putting it all together, the CGI side of my app can be found at https://github.com/aneeshdurg/moontower/blob/main/.nosrv/apps/chat/cgi-bin/chat.py
Process Management
If you look at the code you may wonder how processes are managed in
the event of things like client disconnects when listening to a stream.
Essentially, the CGI server detects that the connection is closed, sends
a SIGTERM
to the CGI script, waits for a little while, then
sends a SIGKILL
if the process is still alive. For my
application, this terminates the CGI script, but leaves zombie processes
for any children I spawned. To get around that I added a bit of
code:
def kill_on_exit(p: subprocess.Popen):
p.terminate()try:
=1)
p.wait(timeoutexcept subprocess.TimeoutExpired:
p.kill()
p.wait()
# This is the method that services the streaming request
def listener():
print("Content-Type: text/event-stream")
print("X-Accel-Buffering: no")
print("Cache-Control: no-cache")
print("\n\n", end="")
...
= subprocess.Popen(
p "inotifywait", "-m", db, "-e", "modify"],
[=subprocess.PIPE,
stdout=subprocess.DEVNULL,
stderr
)lambda: kill_on_exit(p)) atexit.register(
The atexit
callback will now ensure that my child
process is cleaned up correctly when the client disconnects.
File structure
Serving files can be daunting - especially when you want to ensure
that you don't accidentally make you database downloadable. While I
could keep my app, the database, and any resources that don't need to be
served outside of the ~/public
directory, it makes
development and deployment messy. To combat that, I found a nice method
that I wanted to share with everyone.
Here is a simplified view of what my ~/public
directory
looks like:
.
├── .git
│ └── ...
├── .nosrv
│ ├── .gitignore
│ ├── apps
│ │ ├── ...
│ │ └── chat
│ │ ├── app.py
│ │ ├── cgi-bin
│ │ │ └── chat.py
│ │ ├── chat.sqlite3
│ │ └── static
│ │ ├── index.html
│ │ ├── script.js
│ │ └── style.css
│ ├── pydb
│ │ └── ...
│ └── setupvenv.sh
├── cgi-bin
│ ├── ...
│ └── chat -> ../.nosrv/apps/chat/cgi-bin/
├── chat -> .nosrv/apps/chat/static/
└── index.html
My entire ~/public
directory is a git repo (which you
can see publicly at https://github.com/aneeshdurg/moontower). The
nginx
config is set to never serve any hidden
files or directories, so things in .git
and
.nosrv
are not publicly accessible via https://moontowercomputer.club/. This means that I can
keep things like my database inside .nosrv
without the
possibility of users downloading the entire database. To make something
visible publicly, I symlink it into
somewhere in ~/public
.
The scripts that manage the database all run inside of a venv to make
dependency management easy. I've documented the command I used to create
the venv here.
The CGI script itself just uses the system python since it spawns a
subprocess that calls my database scripts, but I could have used the
venv
interpreter by pointing to it in the shebang line. In
a way, my current approach naturally led me to rediscover the MVC
pattern and gain a greater appreciation for it.
This system has been working for me pretty well. I now have three
different "services" in .nosrv/apps
, and organizing files
and resources for each "service" has been pretty easy and
self-documenting with the symlinks!
Conclusion
I think before I started on this journey, I would have felt like
building an app like this was best done by spinning up a webserver like
flask
and hosting on the free tier of some cloud provider
like AWS. However, I think nginx+cgi+self-hosting
is an
underrated approach for building fun weekend projects and making cool
things to share with the world around us. CGI scripts are awesomely
powerful as they let users build complex services without needing to
reserve ports, manage long-lived server processes, or modify the global
webserver config. I definitely learned a lot - during my debugging
sessions when the CGI stuff wasn't working, I even ended up cloning and
modify the nginx source code to better understand how the fastcgi model
works lol! Hopefully this post helps/inspires others to build as well.
Happy Hacking!