The Best Ways to Separate Filename and Extension in Python

There is a post here on NDT that can help you get the file extension separated from the filename. One problem: how do we preserve the full filename when there are multiple periods?

Let's back up. There are wise Internet people out there who have made the argument that we shouldn't think of URLs (and I'll add file paths) as simply strings.

If we lived in a perfectly standardized world where everyone followed conventions and there were no exceptions or secret use cases, thinking of paths as strings would be fine. However, case sensitivity is real, people and programs do add multiple periods to filenames, spaces are technically allowed, and all of this complexity makes "parsing" (cutting paths up into specific parts) pretty tough to hand-code.

Conceptually, a path should look like this: folder(s)/filename.extension. So let's use that as the basis for how we separate the parts of a path (and make our code work in far more situations than we can think of off the top of our heads).

Here's where we start running into trouble when we treat the filename as a string and simply cut it into parts:

>>> filename = "this.something.png"

>>> name_fragments = filename.split('.')
>>> name_fragments
['this', 'something', 'png']

>>> ".".join(name_fragments[:-1])
'this.something'

This works, but it also assumes a lot. If you know you're working with filenames that have multiple periods, this approach could work for splitting the filename from the extension so you can use both separately. The problem is that it takes several steps, and there are use cases we might have overlooked (for example, how do you handle files that start with a period, like .gitignore and other hidden files?).
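To make those hidden assumptions concrete, here's a minimal sketch that wraps the split-and-join approach above in a small function (the name naive_split is just for illustration) and feeds it a few inputs it doesn't handle well:

```python
def naive_split(filename):
    """Split on every period, then treat the last piece as the extension."""
    parts = filename.split('.')
    return '.'.join(parts[:-1]), parts[-1]

# The case it was written for works fine
print(naive_split('this.something.png'))   # ('this.something', 'png')

# No extension at all: the whole name ends up in the "extension" slot
print(naive_split('README'))                # ('', 'README')

# Hidden file: the name is lost entirely
print(naive_split('.gitignore'))            # ('', 'gitignore')

# A period in a folder name splits the path in the wrong place
print(naive_split('archive/v1.2/notes'))    # ('archive/v1', '2/notes')
```

Each of these would need its own special case if you kept hand-coding the logic.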

Use Libraries

Enter the built-in os module from Python's standard library.

It was built to help you interact with the operating system, including the file system. Let's take a look at one of the functions you can use to separate an extension from a filename:

os.path.splitext(your_path_here)

It's pretty simple: just import splitext and try it out with some filenames or paths:

>>> from os.path import splitext

>>> splitext('filename.jpg')
('filename', '.jpg')

>>> splitext('filename.cool.jpg')
('filename.cool', '.jpg')

>>> splitext('.gitignore')
('.gitignore', '')

>>> splitext('/foo/bar.exe')
('/foo/bar', '.exe')
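Two more behaviors are worth knowing, shown in a short sketch (the paths are made up for illustration): splitext only looks for an extension in the last component of the path, so periods in folder names don't confuse it, and it only splits off the final extension.

```python
from os.path import splitext

# A period in a folder name is not treated as an extension
print(splitext('archive/v1.2/notes'))   # ('archive/v1.2/notes', '')

# Only the last extension is split off
print(splitext('archive.tar.gz'))       # ('archive.tar', '.gz')
```

That second result is why compound extensions like .tar.gz need a little extra work.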

The cool thing about this function is that you can separate the extension from the filename (or path) and assign each part to an appropriately named variable in one step!

>>> path, extension = splitext('filename.txt')


# For clarity, here's what gets stored:
>>> path
'filename'
>>> extension
'.txt'

This could be a great way to swap out extensions if, for example, you were converting a .jpg into a .webp copy with the same path and filename but the correct extension.
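Here's a minimal sketch of that idea. It only builds the new filename; actually converting the image data would need an imaging library, which is outside the scope here. The helper name replace_extension is hypothetical, and the pathlib version uses Path.with_suffix(), which does the same job in one call.

```python
import os.path
from pathlib import Path

def replace_extension(path, new_ext):
    """Return path with its final extension swapped for new_ext (hypothetical helper)."""
    root, _old_ext = os.path.splitext(path)
    return root + new_ext

print(replace_extension('photos/cat.jpg', '.webp'))             # photos/cat.webp

# pathlib equivalent; as_posix() keeps forward slashes on every OS
print(Path('photos/cat.jpg').with_suffix('.webp').as_posix())  # photos/cat.webp
```

Note that with_suffix() expects the new suffix to start with a period and raises ValueError otherwise, while the string-concatenation helper will happily glue on whatever you pass it.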

Down the Rabbit Hole

Here's another standard-library module, pathlib, that you can use to accomplish the same thing:

>>> import pathlib
>>> pathlib.Path('/some/path/here/filename.txt').suffix
'.txt'

Arguably, the pathlib library might be even more useful, simply because it has quite a few awesome features. For example, if you have to pass in a full path but need to work with the filename and extension separately (without the rest of the path), you can use the .stem and .suffix properties to get those specific parts.

>>> pathlib.Path('/some/path/here/filename.txt').stem
'filename'
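To see how this differs from splitext on a full path, here's a quick side-by-side using the same example path: splitext keeps the directories in the first element, so you'd need os.path.basename() to get the bare name, while pathlib hands you each part directly.

```python
import os.path
from pathlib import Path

full_path = '/some/path/here/filename.txt'

# os.path: the first element still includes the directories
root, ext = os.path.splitext(full_path)
print(root, ext)                    # /some/path/here/filename .txt

# os.path: strip the directories first to get just the name
name, ext = os.path.splitext(os.path.basename(full_path))
print(name, ext)                    # filename .txt

# pathlib: each part is a separate property
p = Path(full_path)
print(p.stem, p.suffix)             # filename .txt
print(p.parent.as_posix())          # /some/path/here
```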

Another great use case, as seen in this Stack Overflow thread, is handling compound extensions, where there are multiple periods and you need to know the full "extension":

>>> ''.join(pathlib.Path('somedir/file.tar.gz').suffixes)
'.tar.gz'
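One caveat: .suffixes returns every period-separated piece after the first part of the name, so a file called q1.summary.tar.gz gives '.summary.tar.gz'. If you know which compound extensions you care about, one option is to check for them explicitly. Here's a sketch under that assumption; the KNOWN_COMPOUND list and the split_compound name are made up for illustration, and the comparison is case-insensitive since, as mentioned earlier, case sensitivity is real.

```python
from pathlib import Path

# Hypothetical allow-list of multi-part extensions to keep together
KNOWN_COMPOUND = ('.tar.gz', '.tar.bz2', '.tar.xz')

def split_compound(path):
    """Return (name_without_ext, ext), keeping known compound extensions whole."""
    name = Path(path).name
    lowered = name.lower()
    for ext in KNOWN_COMPOUND:
        # Require something before the extension so '.tar.gz' alone isn't emptied
        if lowered.endswith(ext) and len(name) > len(ext):
            return name[:-len(ext)], name[-len(ext):]
    # Fall back to pathlib's single-suffix split
    p = Path(name)
    return p.stem, p.suffix

print(''.join(Path('reports/q1.summary.tar.gz').suffixes))  # .summary.tar.gz
print(split_compound('reports/q1.summary.tar.gz'))          # ('q1.summary', '.tar.gz')
print(split_compound('BACKUP.TAR.GZ'))                      # ('BACKUP', '.TAR.GZ')
print(split_compound('photo.final.jpg'))                    # ('photo.final', '.jpg')
```

The trade-off is that you have to maintain the list yourself, but in exchange a name with extra periods won't have them swallowed into the "extension".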

Which one should I use?

Choose the library that makes the most sense to you, because you'll use it more and understand what's going on in your code. Beyond that, you may want to choose pathlib.Path over os.path if you want simpler return values: .stem and .suffix each give you a single string, whereas splitext() returns a tuple of two strings.