Please add categories to blog feeds

by Tim Severien

I’ve parsed RSS and Atom feeds in several recent projects, and I keep running into the same issue: the absence of categories.

In one project, I compiled a list of people that communicate about web technologies. The project is not live, but I have mentioned some stats in a two-toot thread on Mastodon (throot, anyone?). I want to aggregate these feeds per topic, like a list of Accessibility posts by the likes of Erik Kroes, Hidde de Vries, Scott O’Hara, Sara Soueidan, and more.

I also created a Slack workspace that I want to feed blog posts into. Ideally, posts would end up in relevant channels. Given the example above, accessibility posts would end up in the #accessibility channel.

A lot of bloggers don’t stick to a single topic. Assigning authors to a specific channel would make my Accessibility blog feed and #accessibility channel include Hidde’s post It’s pretty rude of OpenAI to make their use of your content opt-out, which I’d rather see in the #ai channel. Some stick to a niche, like how Dr Axel Rauschmayer’s blog and my ijsjes.dev are mainly about JavaScript and TypeScript, but those are the exception.

Finally, as a consumer, I sometimes don’t want to subscribe to everything on a blog. If I could somehow filter posts based on topics, I would. To my knowledge, few feed readers support filtering based on categories, but it would be neat. Perhaps someone has built a feed proxy that can.

At one point, I considered analysing the content. The first problem is that not all feeds contain the body of each post. We can work around that by fetching the content from the entry’s URL, which is where we get to the hard part. How can we tell what the content is about? I’ve mentioned accessibility in this post; does that make this post about accessibility? Distilling categories out of content is a non-trivial natural language processing problem. I’m sure someone has published an open-source AI model on Hugging Face that could be of use, though I doubt it’ll perform well for niche topics.

It turns out RSS and Atom feeds support categories. See the category element in the RSS specification and the atom:category element in the Atom specification. Why not use those instead?

<!-- Atom -->
<entry>
	<title>...</title>
	<category term="accessibility" />
	<category term="web" />
	<!-- ... -->
</entry>
<!-- RSS -->
<item>
	<title>...</title>
	<category>accessibility</category>
	<category>web</category>
	<!-- ... -->
</item>
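
Reading those elements back out takes little code. The following is a minimal sketch, assuming Python and only its standard library. The function name `extract_categories` and the two sample feeds are made up for illustration. Note that real Atom feeds declare the `http://www.w3.org/2005/Atom` namespace on the root element, which the snippet above omits, so the sketch looks up Atom elements under that namespace.

```python
import xml.etree.ElementTree as ET

# Namespace that Atom feeds declare on their root <feed> element.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Made-up sample feeds, for illustration only.
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Post A</title>'
    '<category term="accessibility" /><category term="web" /></entry></feed>'
)
RSS_SAMPLE = (
    '<rss version="2.0"><channel><item><title>Post B</title>'
    "<category>accessibility</category><category>web</category>"
    "</item></channel></rss>"
)


def extract_categories(feed_xml: str) -> list[tuple[str, list[str]]]:
    """Return (title, categories) pairs for every post in an Atom or RSS feed."""
    root = ET.fromstring(feed_xml)
    posts = []

    # Atom: categories live in the "term" attribute of <category>.
    for entry in root.iter(f"{ATOM_NS}entry"):
        title = entry.findtext(f"{ATOM_NS}title", default="")
        terms = [c.get("term", "") for c in entry.findall(f"{ATOM_NS}category")]
        posts.append((title, [t for t in terms if t]))

    # RSS 2.0: categories are the text content of <category>.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        names = [(c.text or "").strip() for c in item.findall("category")]
        posts.append((title, [n for n in names if n]))

    return posts


print(extract_categories(ATOM_SAMPLE))
print(extract_categories(RSS_SAMPLE))
```

A feed without any category elements simply yields an empty list for each post, so the same function works for feeds that have not adopted categories yet.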

I skimmed 30 different feeds, and about a third of them have categories defined. As one might expect, these are non-standardised string values that can be set to whatever the author wants. Consequently, they’re inconsistent: we find variants of the same word, like accessibility and a11y, or JavaScript and JS. Posts may be tagged with concepts on different levels in a hierarchy, like web and WebGL, or CSS and Flexbox. Finally, some feeds have very generic categories. One feed I checked has all tech-related posts filed under blog.
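
A consumer could smooth over some of that inconsistency with a small lookup table. This is a rough sketch rather than a complete solution: the `ALIASES` and `PARENTS` tables are hypothetical and only cover the examples mentioned above, and a real aggregator would need to curate and maintain much larger ones.

```python
# Hypothetical alias table: lowercase variant -> canonical tag.
ALIASES = {
    "a11y": "accessibility",
    "js": "javascript",
}

# Hypothetical hierarchy: specific tag -> broader parent topic.
PARENTS = {
    "webgl": "web",
    "flexbox": "css",
}


def normalise(category: str) -> str:
    """Lowercase and trim a category, then map known variants to one tag."""
    key = category.strip().lower()
    return ALIASES.get(key, key)


def normalise_all(categories: list[str]) -> list[str]:
    """Normalise a post's categories, adding parent topics and dropping duplicates."""
    result: list[str] = []
    for raw in categories:
        tag = normalise(raw)
        # Add the tag itself, then its broader parent topic if one is known.
        for candidate in (tag, PARENTS.get(tag)):
            if candidate and candidate not in result:
                result.append(candidate)
    return result


print(normalise_all(["WebGL", "JS", "a11y", "JavaScript"]))
```

Very generic categories such as blog are not helped by this approach; they carry no topic information to begin with.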

Time and time again, I’m surprised by the web development blogging community and the power of feeds. Feed categories are another strength I feel we should tap into more. In web development, we have a well-defined vocabulary of technologies and concepts, allowing us to use semi-consistent tags or categories in our feeds. Although ambitious, filtering (and/or aggregating) based on categories feels feasible.
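
To illustrate how little is needed once categories are present, here is a small sketch of per-topic aggregation in Python. The post titles and the `group_by_category` helper are hypothetical; each post is assumed to already be a (title, categories) pair taken from a feed.

```python
from collections import defaultdict

# Hypothetical posts as (title, categories) pairs, for illustration only.
POSTS = [
    ("Post A", ["accessibility", "web"]),
    ("Post B", ["ai"]),
    ("Post C", ["Accessibility"]),
]


def group_by_category(posts: list[tuple[str, list[str]]]) -> dict[str, list[str]]:
    """Group post titles under each lowercased category they are tagged with."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title, categories in posts:
        for category in categories:
            groups[category.strip().lower()].append(title)
    return dict(groups)


groups = group_by_category(POSTS)
# Each key could map to a topic feed or a channel such as #accessibility.
print(groups["accessibility"])
```

Filtering is the same idea in reverse: keep only the posts whose categories include the topic a reader asked for.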