Chrysanthemum, an overengineered Discord bot
My partner co-runs the unofficial Roblox Discord server, one of the largest Discord servers in existence, with almost 800,000 members as of this writing. Managing a server this large requires a lot of assistance from bots for moderation. In an effort to help her out, I’ve spent the past six months on and off writing perhaps the most overengineered Discord bot around: Chrysanthemum, a content filtering bot.
Goals
Chrysanthemum has three goals that I was trying to meet while developing it:
- Performance. Given the sheer scale of the Roblox server, Chrysanthemum needs to be as performant as possible. At busier times the server sees multiple messages per second, which is a per-server load that you don’t really see anywhere else.
- Stability. This bot is the first line of defence against a wide array of malicious content, including phishing links, slurs, and abusive behavior. Downtime needs to be minimized and the bot needs to be able to recover from crashes.
- Configurability. The filter settings for the Roblox Discord change on a daily basis. We need to be able to handle this without taking the bot down every time we update the filter.
Technology
Chrysanthemum, like most of my hobby projects, is written in Rust. I initially started off trying to write my own wrapper for the Discord API, but I gave up on that - it was just too much work for me to do on my own, and I wanted the bot to see the light of day sometime. I switched to Twilight instead, which does everything I want without me having to write the API interactions 😅.
I use serde_yaml for configuration file loading, and regex as my regex crate of choice. Serde allows for a very rich configuration file parser that gives reasonable errors. regex’s subset of regex syntax promises linear-time matching in all cases, which is important when we’re evaluating patterns against user-provided text.
Architecture
Most of Chrysanthemum is stateless; filters apply to messages regardless of other messages sent. This means that Chrysanthemum can handle message events on multiple threads without worrying about having to interact with global state. The one exception to this is spam filtering. It’s not sufficient to detect spam on a per-message basis; in order to detect things like duplicates, you also need to check the past messages of a user.
Spam history is stored in the following types:
pub struct SpamRecord {
    content: String,
    emoji: u8,
    links: u8,
    attachments: u8,
    spoilers: u8,
    mentions: u8,
    sent_at: u64,
}

pub type SpamHistory = HashMap<UserId, Arc<Mutex<VecDeque<SpamRecord>>>>;
SpamHistory is further wrapped in Arc<Mutex<SpamHistory>> for storage in Chrysanthemum’s application state. This nested-mutex approach lets Chrysanthemum alter multiple users’ spam records independently and at the same time. The outer lock only needs to be held long enough to look up a user’s queue. We don’t want to hold a global lock on the main map for the whole time a message is being processed; that would force Chrysanthemum to handle messages essentially single-threaded.
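To make the locking pattern concrete, here is a minimal sketch using only the standard library. It is not Chrysanthemum’s actual code: UserId is replaced by a plain u64, SpamRecord is trimmed to two fields, sent_at is assumed to be in seconds, and record_message, MAX_RECORDS, and the window parameter are hypothetical names made up for this example.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

// Assumption: a plain integer stands in for the Discord user ID type.
pub type UserId = u64;

// Trimmed-down record; the real struct also tracks emoji, links, and so on.
#[derive(Clone, Debug)]
pub struct SpamRecord {
    pub content: String,
    pub sent_at: u64, // assumed to be seconds; the real unit is not stated
}

pub type SpamHistory = HashMap<UserId, Arc<Mutex<VecDeque<SpamRecord>>>>;

// Hypothetical cap on how many records are kept per user.
const MAX_RECORDS: usize = 10;

/// Records a message and returns how many earlier messages from the same
/// user, sent within the last `window` seconds, had identical content.
pub fn record_message(
    history: &Mutex<SpamHistory>,
    user: UserId,
    content: &str,
    now: u64,
    window: u64,
) -> usize {
    // Hold the outer lock only long enough to fetch (or create) the
    // user's queue, then release it explicitly.
    let mut map = history.lock().unwrap();
    let queue = map.entry(user).or_default().clone();
    drop(map);

    // From here on, only this one user's queue is locked.
    let mut records = queue.lock().unwrap();

    // Discard records that have aged out of the window.
    while records
        .front()
        .map_or(false, |r| now.saturating_sub(r.sent_at) > window)
    {
        records.pop_front();
    }

    let duplicates = records.iter().filter(|r| r.content == content).count();

    records.push_back(SpamRecord {
        content: content.to_owned(),
        sent_at: now,
    });
    if records.len() > MAX_RECORDS {
        records.pop_front();
    }

    duplicates
}
```

Two handlers working on different users contend only for the short map lookup; the scan and update of each user’s queue happen under separate locks.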
Performance
Chrysanthemum is incredibly performant, consuming less than 50MB of memory and using barely 1% of a CPU core when running it locally. It’s an event-driven application; the more events that come in, the harder it has to work, and the volume of events that the Roblox Discord generates isn’t enough to cause any serious performance bottlenecks. There are a couple of cases where we run into Discord rate limits, but the bot handles those reasonably gracefully.
Configuration management
Chrysanthemum doesn’t provide a web UI for configuring the bot. At the moment, configuration is done through a GitHub repository that authorized users have access to. A cron job pulls down the latest configuration every few minutes, and the bot will load configurations from disk every few minutes as well. This allows the filter configuration to change without having to restart the bot and cause downtime.
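The reload step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the bot’s real implementation: FilterConfig, parse_config, SharedConfig, and reload are hypothetical names, and the parser is a stand-in that reads one blocked word per line so the example needs nothing beyond the standard library (the real bot parses YAML with serde_yaml). The property it shows is that a failed read or parse leaves the running configuration untouched.

```rust
use std::fs;
use std::path::Path;
use std::sync::{Arc, RwLock};

// Hypothetical, heavily simplified configuration: a list of blocked words.
#[derive(Debug, PartialEq)]
pub struct FilterConfig {
    pub blocked_words: Vec<String>,
}

// Stand-in parser: one entry per non-empty line. An empty file is treated
// as an error so that a bad checkout cannot silently disable filtering.
pub fn parse_config(text: &str) -> Result<FilterConfig, String> {
    let blocked_words: Vec<String> = text
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(str::to_owned)
        .collect();
    if blocked_words.is_empty() {
        return Err("configuration contains no entries".to_owned());
    }
    Ok(FilterConfig { blocked_words })
}

// Readers clone the inner Arc to get a cheap, consistent snapshot.
pub type SharedConfig = Arc<RwLock<Arc<FilterConfig>>>;

/// Re-reads the configuration from disk and swaps it in. If reading or
/// parsing fails, the current configuration is left in place.
pub fn reload(shared: &SharedConfig, path: &Path) -> Result<(), String> {
    let text = fs::read_to_string(path).map_err(|e| e.to_string())?;
    let fresh = Arc::new(parse_config(&text)?);
    // The write lock is held only for the pointer swap.
    *shared.write().unwrap() = fresh;
    Ok(())
}
```

A timer task would call reload every few minutes, while message handlers take a snapshot with Arc::clone(&shared.read().unwrap()) and keep filtering against it even if a swap happens mid-message.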
Future plans
There are a number of future improvements I want to make, including:
- A web UI for editing the configuration file in a more graphical way
- Better scoping (by category, etc.)
- Better metrics and logging
- Some quality-of-life features to reduce config bloat