Audiobook Splitter

In this post I will cover how to split large audiobook files into smaller chunks using a Python script generated with ChatGPT.

I recently discovered the Project Gutenberg Open Audiobook Collection through a post on Hacker News. For this site, Project Gutenberg, Microsoft, and MIT have worked together to create thousands of free and open audiobooks using new neural text-to-speech technology and Project Gutenberg’s large open-access collection of e-books. Unfortunately, the audiobooks are rather long. “Wuthering Heights” by Emily Brontë, for example, runs 11:23:18. If you play a file that large on your phone, chances are you will not finish it in one go, and finding the position where you left off can be tedious.

To make the file more manageable, I generated a Python script using ChatGPT which splits it into 30-minute chunks.
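
To get a feeling for what 30-minute chunks mean for a book of that length, here is a small back-of-the-envelope sketch. The helper name `count_chunks` is my own invention and not part of the splitter script; it only does the arithmetic on a duration given as HH:MM:SS.

```python
import math


def count_chunks(duration, chunk_minutes=30):
    """Return (number of chunks, length of the last chunk in seconds)."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    if total_seconds == 0:
        return 0, 0
    chunk_seconds = chunk_minutes * 60
    chunks = math.ceil(total_seconds / chunk_seconds)
    # The last chunk holds whatever is left after the full chunks.
    last = total_seconds - (chunks - 1) * chunk_seconds
    return chunks, last


print(count_chunks("11:23:18"))  # (23, 1398)
```

So “Wuthering Heights” ends up as 23 files, the last one being 23 minutes and 18 seconds long.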

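Before executing the script, it is advisable to set up an Anaconda virtual environment for Python. After that, install the dependencies of “audiobook-splitter.py”: pydub via pip, and ffmpeg via Chocolatey. Chocolatey is a package manager for Windows; on other systems, install ffmpeg with your platform's package manager instead.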

pip install pydub

choco install ffmpeg
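
If you are unsure whether the ffmpeg installation worked, a quick check like the following can help. This is only a sketch: the function name `ffmpeg_available` is made up for this post, and it merely looks for an `ffmpeg` executable on the PATH, which is where I assume pydub will look for it.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found on PATH; reading mp3 files will probably fail")
```

If ffmpeg is reported as missing right after installing it, opening a new terminal so that the updated PATH is picked up is usually the first thing to try.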

Create a new Python file and paste in the following contents, or clone my Python utility scripts repository:

# audiobook-splitter.py
import os
import argparse
from pydub import AudioSegment


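def split_audio(input_file, output_dir):
    # Load and decode the whole input file (pydub relies on ffmpeg for this).
    audio = AudioSegment.from_file(input_file)
    segment_length = 30 * 60 * 1000  # 30 minutes in milliseconds

    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)

    # Strip the extension properly instead of assuming it is three characters long.
    base_name = os.path.splitext(os.path.basename(input_file))[0]

    for i, start_time in enumerate(range(0, len(audio), segment_length)):
        end_time = start_time + segment_length
        # Slices are given in milliseconds; the last segment may be shorter.
        segment = audio[start_time:end_time]
        output_file = os.path.join(output_dir, f"{base_name}_part_{i + 1}.mp3")
        segment.export(output_file, format="mp3")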


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Split an audio file into 30-minute segments.")
    parser.add_argument("input_file", help="Path to the input audio file.")
    parser.add_argument("--output-dir", default="output",
                        help="Directory to save the segmented audio files.")
    args = parser.parse_args()

    input_file = args.input_file
    output_dir = args.output_dir

    split_audio(input_file, output_dir)
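
The heart of the script is the loop that walks through the audio in steps of 30 minutes. To see which boundaries that loop produces without having to load an actual audio file, the same logic can be sketched as a standalone generator. The name `segment_bounds` is hypothetical and not part of the script; unlike the script, it clamps the end of the last segment explicitly rather than relying on the slice to do so.

```python
def segment_bounds(total_ms, segment_ms=30 * 60 * 1000):
    """Yield (part_number, start_ms, end_ms) for each segment."""
    for i, start in enumerate(range(0, total_ms, segment_ms)):
        # The last segment ends at the end of the audio, not after it.
        end = min(start + segment_ms, total_ms)
        yield i + 1, start, end


# A 70-minute recording gives two full segments and a 10-minute remainder.
for part, start, end in segment_bounds(70 * 60 * 1000):
    print(part, start // 60000, end // 60000)
# 1 0 30
# 2 30 60
# 3 60 70
```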

The script can be executed from the command line with the following syntax:

python audiobook-splitter.py input_audio.mp3 --output-dir output_directory

Have fun with the audiobook splitter and happy listening to the Project Gutenberg Open Audiobook Collection!