Building a markup language: part one

User-generated content sites like Wikipedia and Whirlpool rely heavily on users being able to submit richly and contextually marked-up content: everything from references to quotations, from links to lists. And this ability needs to be given to a large array of users who don't know HTML, or worse still, know too much. So we build syntax that lets users provide rich content within the parameters we set, without opening the door to unwanted, nuisance or malicious mark-up.
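
One concrete form of "malicious mark-up" is a link whose target runs script, such as a javascript: URL. The following is a minimal sketch of one common defence: accept a user-supplied link only if it parses as an absolute URL with a whitelisted scheme. It is not Whirlpool's code. It relies on the standard URL class that browsers and Node.js provide, and the names ALLOWED_SCHEMES and safeHref are hypothetical.

```javascript
// Hypothetical sketch (not Whirlpool's code): only let user-supplied links
// through if they use an allowed scheme, so a "javascript:" URL can't be
// smuggled into an href.
const ALLOWED_SCHEMES = new Set(["http:", "https:", "mailto:"]);

function safeHref(raw) {
  let url;
  try {
    // The URL parser lowercases the scheme and strips tabs/newlines,
    // so tricks like "JavaScript:" or "java\tscript:" are normalised.
    url = new URL(raw.trim());
  } catch {
    return null; // not an absolute URL, so reject it
  }
  // Note: the returned string must still be HTML-escaped when it is
  // written into an href attribute.
  return ALLOWED_SCHEMES.has(url.protocol) ? url.href : null;
}

console.log(safeHref("javascript:alert(1)")); // null
console.log(safeHref("http://example.com"));  // http://example.com/
```

A whitelist of schemes is used rather than a blacklist of dangerous ones, because any scheme you forget to block stays open.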

Wikipedia's answer is wikitext, a comprehensive (though occasionally bewildering) syntax that gives most articles a generally consistent feel. It still requires authors to follow the guidelines, but a lot of the heavy lifting and boilerplate is handled by the parser.

Whirlpool has a similar need. We've got thousands of new posts every day, hundreds of private messages, a job board and our own Wiki — all needing rich text handling of one sort or another. It's an important component to get right.

Up till now, Whirlpool's answer was Whirlcode. It's a fairly simple mark-up that is also difficult to trigger accidentally. Tags like [*bold*], [/italics/] and ["quotes"] aren't necessarily intuitive, but once seen they are easily learned and retyped. However, over the past couple of weeks I've been hard at work on Whirlcode 2, a major rewrite of the parser that handles this mark-up. And yes, I've written it in JavaScript.
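
To make the tag syntax concrete, here is a minimal JavaScript sketch that turns those three inline tags into HTML with one regular expression per tag. It is not the Whirlcode 2 parser. The HTML elements each tag maps to (strong, em, q) and the names INLINE_RULES, escapeText and renderInline are assumptions chosen for illustration.

```javascript
// Hypothetical sketch of Whirlcode-style inline tags (not the Whirlcode 2
// parser). Each rule maps a bracketed tag to an assumed HTML element.
const INLINE_RULES = [
  { pattern: /\[\*(.+?)\*\]/g, tag: "strong" }, // [*bold*]
  { pattern: /\[\/(.+?)\/\]/g, tag: "em" },     // [/italics/]
  { pattern: /\["(.+?)"\]/g, tag: "q" },        // ["quotes"]
];

function escapeText(text) {
  // User text never ends up inside an attribute here, so escaping
  // &, < and > (ampersand first) is enough to neutralise raw HTML.
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderInline(source) {
  // Escape first, then let the rules emit the only tags that are allowed.
  let html = escapeText(source);
  for (const { pattern, tag } of INLINE_RULES) {
    html = html.replace(pattern, (_, inner) => `<${tag}>${inner}</${tag}>`);
  }
  return html;
}

console.log(renderInline("[*bold*] and [/italics/]"));
// <strong>bold</strong> and <em>italics</em>
```

Properly nested tags such as [*a [/b/] c*] come out correctly. Overlapping tags such as [*a [/b*] c/] produce mis-nested HTML, though. Handling cases like that is one reason a real parser tends to use a tokenizer and a tag stack rather than a chain of regular expressions.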

What features could a parser support? Here's a list of the ones I've specifically dealt with in the development of Whirlcode 2:


I will be writing more about the specific algorithms developed as part of Whirlcode 2 in a future post.

13 comments

I must admit, I've never really been a fan of Whirlcode. I use lots of different sites and forums and for the most part, their markup systems are all BBCode compatible in some form or they have a basic visual editor.

It irks me when I want to format something on WP because it's all different and I couldn't be bothered learning it because it's only one website.

Is the heading feature only available for wiki use?

Interesting read Simon. It's nice to see some more upfront musings with a personal touch.

Not that it matters in the grand scheme of Whirlpool, but a lot of older (and other) members know nothing other than Whirlcode.

Regards

Whirlpool just keeps getting better. The auto-preview is handy, nice work.
I actually find Whirlcode better than BBCode; I quite often find myself hitting Ctrl + Enter on BB sites too... lol

Cheers.

The problem with auto-hyphenation is that it is wrong too often. You can use the TeX algorithm, but you need a way to manually specify the hyphenation of some words. It also breaks when you use languages other than English.

I'm not talking about ridiculously long words like antidisestablishmentarianism (which I oppose), but rather the utterly stupid repetition of letters such as a bawling git saying "waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaahhh!!!!!". You don't want that breaking your layout.
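
One way a parser could stop a run like that from breaking the layout, without hyphenating anything, is to insert invisible line-break opportunities. Below is a minimal JavaScript sketch of that approach. The function name breakLongRuns and the 30-character limit are assumptions, not Whirlcode 2's actual behaviour.

```javascript
// Hypothetical sketch (not Whirlpool's implementation): give the browser
// somewhere to wrap an unbroken run by inserting a zero-width space (U+200B)
// every `maxRun` characters. Meant for plain text before it is escaped or
// converted to HTML; it does not avoid splitting markup tokens or URLs.
const ZWSP = "\u200B";

function breakLongRuns(text, maxRun = 30) {
  // Match runs longer than maxRun of characters that are neither whitespace
  // nor an existing ZWSP. The "u" flag makes the count use code points.
  const longRun = new RegExp(`[^\\s\\u200B]{${maxRun + 1},}`, "gu");
  return text.replace(longRun, (run) => {
    const chars = Array.from(run); // split by code point so emoji stay whole
    const chunks = [];
    for (let i = 0; i < chars.length; i += maxRun) {
      chunks.push(chars.slice(i, i + maxRun).join(""));
    }
    return chunks.join(ZWSP);
  });
}

const wail = "w" + "a".repeat(45) + "hh!!!!!";
console.log(breakLongRuns(wail).split(ZWSP).length);
```

Because existing zero-width spaces are excluded from the match, running the function twice gives the same result as running it once. An alternative that leaves the text untouched is to style the post container with the CSS overflow-wrap: break-word property.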

I'd just like to say that the auto numbering system has worked wonders for the PC Hardware benchmarks wiki! Saves having to change a heap of numbers every time someone has a new score.

Thanks. :-)

So, you are anti-antidisestablishmentarianism?

Are you considering WYSIWYG posting?

Excellent stuff Simon.
Even a non-coder/programmer like me can use it.
To the detractors I say "If you want a site using the code you want, go set it up yourself!"
I for one appreciate the time & effort you have dedicated, and still dedicate, to Whirlpool for our benefit.
THANK YOU!

Some minor graphical smilies would be nice too ;-)
Not many, just the basics like tongue, smile and frown would be good. Might help remove some of the undetected sarcasm.

Headings aren't supported yet :-(

Ref:
http://forums.whirlpool.net.au/forum-replies.cfm?t=1006623&p=9#r169
http://forums.whirlpool.net.au/forum-replies.cfm?t=1006623&p=9#r175

Good to have a little Whirlpool sandbox to play in too. ;