Regex of the Day

Better Quote Excerpts for Templates and CMSs

Content is random, so there's not a great way to create an excerpt by character counts if you don't want words to get sliced up awkwardly in th... ...the middle ;) Here's a better way.

Jess Brown

02 May 2023 — 2 min read

The Problem

Content is random, so there's not a great way to cut off a quote or an excerpt by character counts if you don't want words to get sliced up awkwardly in th...

...the middle ;)


The character-count approach looks like this:

quote = "Some quote here..."
print(quote[:100])

This will print out the first 100 characters of the quote. Not too bad for the short quote we have hard-coded, but once this is in a production environment, you can't rely on the quote being short or conforming perfectly to any of your ideal conventions.
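To see the awkward cut in action, here's a small sketch. The sample quote and the char_excerpt helper are made up for illustration; they aren't part of the original snippet.

```python
quote = "Regular expressions make it easy to trim long content down to a readable excerpt."

def char_excerpt(text, limit):
    # Plain slice: counts characters and ignores word boundaries.
    return text[:limit]

# With a limit of 30, the slice ends partway through the word "easy".
print(char_excerpt(quote, 30))
```

A limit of 30 lands two letters into "easy", so the excerpt ends with "ea".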

Regex to the Rescue

Instead of boxing in your content with a character count, you can use a word count to keep the excerpt readable. We'll use a regular expression to match the words and return as many of them as you want.

Here it is in a Python function:

import re

quote = "Some quote here..."

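def excerpt(quote):
    # Match 1 to 15 words, each followed by any spaces or punctuation
    word_count = re.compile(r'(\b\w+\b[\s\W]*){1,15}')
    match = word_count.search(quote)
    # search() returns None when the quote contains no words at all
    return match.group() if match else ''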
    
print(excerpt(quote))

This regex looks for words: a word boundary \b on either side of one or more word characters, \w+.

We've added the [\s\W]* to include the punctuation and spaces that follow each word, so that all the words land in a single match and we don't have to count them in Python. The range is defined inside the regex itself. (Strictly speaking, \W already covers whitespace, so the \s is redundant but harmless.) Here's an illustration of the difference between matching each word individually and matching them all in one group:

Without capturing the space/punctuation, it matches words individually. (Thanks to Regex101.com for this screenshot; it's my favorite tool for understanding regexes.)

Including the spaces/punctuation, we get a single match with the number of words we're trying to use as the excerpt (12 in this case).
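If the screenshots aren't handy, the same comparison can be sketched in Python. The sample text and the limit of 5 words are assumptions chosen to keep the output short, and the group is written as non-capturing (?:...) since only the full match is needed here.

```python
import re

sample = "Lorem ipsum dolor sit amet, consectetur adipisicing elit."

# Words only: every word comes back as its own separate match.
words_only = re.findall(r'\b\w+\b', sample)

# Words plus trailing spaces/punctuation, repeated 1 to 5 times: one match.
single = re.search(r'(?:\b\w+\b[\s\W]*){1,5}', sample).group()

print(words_only)
print(repr(single))
```

The first pattern gives a list of eight separate words. The second gives one string holding the first five words, including the comma and space that trail "amet".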

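The range notation {1,15} matches anywhere from 1 to 15 words, so a long quote gets cut off after its first 15. Because everything lands in a single match, calling .group() on the result of .search() returns the whole excerpt. Note that there's no space after the comma: Python's re module treats {1, 15} as literal text rather than a quantifier.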
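If you'd rather not hard-code 15, one option is to build the range from a parameter. This is only a sketch: word_excerpt and max_words are hypothetical names, and stripping the trailing whitespace is my own addition rather than something the original function does.

```python
import re

def word_excerpt(text, max_words=15):
    # Hypothetical helper: the upper bound of the range comes from max_words.
    if max_words < 1:
        return ''
    # Doubled braces are literal braces inside an f-string,
    # so this builds a pattern ending in {1,15} by default.
    pattern = re.compile(rf'(?:\b\w+\b[\s\W]*){{1,{max_words}}}')
    match = pattern.search(text)
    if match is None:
        return ''
    # Drop the spaces that follow the last matched word.
    return match.group().rstrip()

print(word_excerpt("One two three", 2))
print(word_excerpt("One two three"))
```

Because the range starts at 1, a quote with fewer words than the limit still matches and comes back whole.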

After running this function, the output would be:

There are actually 15 words here... "adipisicing" is one word, despite how the terminal displays it in this shot ;)

Extra Flair

If you know for certain that your excerpt will be much shorter than the full content (a blog post excerpt, perhaps), you can add an ellipsis (...) to the end of the excerpt that gets returned, like this:

def excerpt(quote):
    ...
    return word_count.search(quote).group() + '...'

Now when you've got a good, long quote, the excerpt will stop after 15 words and show the ellipsis so that it's clear there's more to the content.
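If you can't be sure the content is longer than the excerpt, a possible variation is to add the ellipsis only when something was actually left out. This is a sketch, not the original function: excerpt_with_ellipsis and WORDS are hypothetical names, and it checks whether the match stopped before the end of the text.

```python
import re

# Same idea as the pattern above, with a non-capturing group.
WORDS = re.compile(r'(?:\b\w+\b[\s\W]*){1,15}')

def excerpt_with_ellipsis(text):
    match = WORDS.search(text)
    if match is None:
        return ''
    snippet = match.group().rstrip()
    # Text remaining after the match means the excerpt was truncated.
    if match.end() < len(text):
        return snippet + '...'
    return snippet

short = "Just a few words."
long_text = " ".join(f"word{i}" for i in range(1, 21))

print(excerpt_with_ellipsis(short))
print(excerpt_with_ellipsis(long_text))
```

The short quote comes back unchanged, while the 20-word one stops after the 15th word and gets the ellipsis.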
