
Python 3.14 and the End of the GIL

Python 3.14, one of the most eagerly awaited releases in recent times, is finally here. The excitement is well founded: several major enhancements have been implemented in this release, including:

Sub-interpreters. These have been available in Python for 20 years, but to use them, you had to drop down to coding in C. Now they can be used straight from Python itself.

T-Strings. Template strings are a new method for custom string processing. They use the familiar syntax of f-strings, but, unlike f-strings, they return an object representing both the static and interpolated parts of the string, instead of a simple string.

A just-in-time compiler. This is still an experimental feature and should not be used in production systems; however, it promises a performance boost for specific use cases.

There are many more enhancements in Python 3.14, but this article is not about those or the ones we mentioned above. 

Instead, we will be discussing what is probably the most anticipated feature in this release: free-threaded Python, also known as GIL-free Python. Note that regular Python 3.14 will still run with the GIL enabled, but you can download (or build) a separate, free-threaded version. I’ll show you how to download and install it, and through several coding examples, demonstrate a comparison of run times between regular and GIL-free Python 3.14.

What is the GIL?

Many of you will be aware of the Global Interpreter Lock (GIL) in Python. The GIL is a mutex—a locking mechanism—used to synchronise access to shared resources; in Python, it ensures that only one thread executes bytecode at a time.

On the one hand, this has several advantages, including making it easier to perform thread and memory management, avoiding race conditions, and integrating Python with C/C++ libraries. 

On the other hand, the GIL can stifle parallelism. With the GIL in place, true parallelism for CPU-bound tasks across multiple CPU cores within a single Python process is not possible.
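A small experiment makes this constraint concrete. The snippet below (my own illustration, not from the release notes) times the same CPU-bound function run twice sequentially and then twice across two threads. On a GIL-enabled interpreter, the threaded run typically takes about as long as the sequential one, because only one thread can execute bytecode at any moment.

```python
import threading
import time

def count_down(n):
    """Pure-Python busy loop: CPU-bound, no I/O, so the GIL dominates."""
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two calls, one after the other.
t0 = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - t0

# Threaded: the same two calls, one per thread.
t0 = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

With the GIL enabled, expect the two timings to be roughly equal; on a free-threaded build, the threaded run should approach half the sequential time.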

Why this matters

In a word, “performance”.

Because free-threaded execution can use all the available cores on your system simultaneously, code will often run faster. If you’re a data scientist, ML engineer, or data engineer, this applies not only to your own code but also to the code behind the systems, frameworks, and libraries you rely on.

Many machine learning and data science tasks are CPU-intensive, particularly during model training and data preprocessing. The removal of the GIL could lead to significant performance improvements for these CPU-bound tasks.

A lot of popular Python libraries face constraints because they have had to work around the GIL. Its removal could lead to:

  • Simplified and potentially more efficient implementations of these libraries
  • New optimisation opportunities in existing libraries
  • Development of new libraries that can take full advantage of parallel processing

Installing the free-threaded Python version

If you’re a Linux user, python.org doesn’t provide installers, so you’ll typically need to build free-threaded Python yourself (or obtain it from a package manager that offers it). If, like me, you’re on Windows (or macOS), you can install it using the official installers from the Python website. During the process, you’ll have an option to customise your installation. Look for a checkbox to include the free-threaded binaries. This will install a separate interpreter that you can use to run your code without the GIL. I’ll demonstrate how the installation works on a 64-bit Windows system.

To get started, click the following URL:

https://www.python.org/downloads/release/python-3140

Scroll down until you see a table that looks like this.

Image from Python website

Now, click on the Windows Installer (64-bit) link. Once the executable has downloaded, open it and, on the first installation screen, click the Customize Installation link. Note that I also ticked the Add python.exe to PATH checkbox.

On the next screen, select the optional extras you want to add to the installation, then click Next again. At this point, you should see a screen like this,

Image from Python installer

Ensure the checkbox next to Download free-threaded binaries is selected. I also checked the Install Python 3.14 for all users option.

Click the Install button.

Once the installation has finished, look in the install folder for a second Python executable with a ‘t’ at the end of its name. This is the GIL-free version of Python; the executable called python is the regular interpreter. In my case, the GIL-free one was called python3.14t. You can check that it’s been installed correctly by typing this at a command line.

C:\Users\thoma>python3.14t

Python 3.14.0 free-threading build (tags/v3.14.0:ebf955d, Oct  7 2025, 10:13:09) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 

If you see this, you’re all set. Otherwise, check that the installation location has been added to your PATH environment variable and/or double-check your installation steps.

As we’ll be comparing GIL-free Python runtimes with regular Python runtimes, we should verify that the regular interpreter is installed correctly too.

C:\Users\thoma>python
Python 3.14.0 (tags/v3.14.0:ebf955d, Oct  7 2025, 10:15:03) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
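You can also check which build you’re running from within Python itself. A short sketch: `sys._is_gil_enabled()` was only added in Python 3.13, so the code below guards with `getattr` in case it runs on an older interpreter.

```python
import sys
import sysconfig

# sys._is_gil_enabled() was added in Python 3.13; fall back gracefully
# on older interpreters where the attribute doesn't exist.
gil_check = getattr(sys, "_is_gil_enabled", None)
if gil_check is None:
    print("Pre-3.13 interpreter: the GIL is always enabled.")
else:
    print(f"GIL enabled at runtime: {gil_check()}")

# Py_GIL_DISABLED is 1 if the interpreter was *built* with free-threading
# support; note a free-threaded build can still re-enable the GIL at
# runtime (e.g. via the PYTHON_GIL environment variable).
print("Free-threaded build:", sysconfig.get_config_var("Py_GIL_DISABLED") == 1)
```

Running this under python3.14t should report a free-threaded build; under regular python it should not.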

GIL vs GIL-free Python

Example 1 — Finding prime numbers

Type the following into a Python code file, e.g. example1.py

#
# example1.py
#

import threading
import time
import multiprocessing

def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes(start, end):
    """Find all prime numbers in the given range."""
    primes = []
    for num in range(start, end + 1):
        if is_prime(num):
            primes.append(num)
    return primes

def worker(worker_id, start, end):
    """Worker function to find primes in a specific range."""
    print(f"Worker {worker_id} starting")
    primes = find_primes(start, end)
    print(f"Worker {worker_id} found {len(primes)} primes")

def main():
    """Main function to coordinate the multi-threaded prime search."""
    start_time = time.time()

    # Get the number of CPU cores
    num_cores = multiprocessing.cpu_count()
    print(f"Number of CPU cores: {num_cores}")

    # Define the range for prime search
    total_range = 2_000_000
    chunk_size = total_range // num_cores

    threads = []
    # Create and start threads equal to the number of cores
    for i in range(num_cores):
        start = i * chunk_size + 1
        end = (i + 1) * chunk_size if i < num_cores - 1 else total_range
        thread = threading.Thread(target=worker, args=(i, start, end))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    # Calculate and print the total execution time
    end_time = time.time()
    total_time = end_time - start_time
    print(f"All workers completed in {total_time:.2f} seconds")

if __name__ == "__main__":
    main()

The is_prime function checks if a given number is prime.

The find_primes function finds all prime numbers within a given range.

The worker function is the target for each thread, finding primes in a specific range.

The main function coordinates the multi-threaded prime search:

  • It divides the total range into chunks, one per CPU core (32 in my case).
  • Creates and starts 32 threads, each searching a small part of the range.
  • Waits for all threads to complete.
  • Calculates and prints the total execution time.
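As an aside, the workers above only print their counts and discard the primes themselves. If you actually need the results, a concurrent.futures variant can collect and merge each chunk’s list. This is my own sketch (the name chunked_primes is illustrative, not part of the example above):

```python
import concurrent.futures
import os

def is_prime(n):
    """Check if a number is prime (same logic as example1.py)."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes(start, end):
    """Find all prime numbers in the given inclusive range."""
    return [n for n in range(start, end + 1) if is_prime(n)]

def chunked_primes(total, workers=None):
    """Split [1, total] into per-worker chunks and merge the results."""
    workers = workers or os.cpu_count()
    chunk = total // workers
    ranges = [(i * chunk + 1, (i + 1) * chunk if i < workers - 1 else total)
              for i in range(workers)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda r: find_primes(*r), ranges)
    return [p for sub in results for p in sub]

primes = chunked_primes(100, workers=4)
print(len(primes))  # 25 primes below 100
```

On a GIL-enabled interpreter this variant is just as serialised as the threading version; the payoff from collecting results this way only shows up on the free-threaded build.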

Timing results

Let’s see how long it takes to run using regular Python.

C:\Users\thoma\projects\python-gil>python example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 0 found 6275 primes
Worker 2 starting
Worker 3 starting
Worker 1 found 5459 primes
Worker 4 starting
Worker 2 found 5230 primes
Worker 3 found 5080 primes
...
...
Worker 27 found 4346 primes
Worker 15 starting
Worker 22 found 4439 primes
Worker 30 found 4338 primes
Worker 28 found 4338 primes
Worker 31 found 4304 primes
Worker 11 found 4612 primes
Worker 15 found 4492 primes
Worker 25 found 4346 primes
Worker 26 found 4377 primes
All workers completed in 3.70 seconds

Now, with the GIL-free version:

C:\Users\thoma\projects\python-gil>python3.14t example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 2 starting
Worker 3 starting
...
...
Worker 19 found 4430 primes
Worker 29 found 4345 primes
Worker 30 found 4338 primes
Worker 18 found 4520 primes
Worker 26 found 4377 primes
Worker 27 found 4346 primes
Worker 22 found 4439 primes
Worker 23 found 4403 primes
Worker 31 found 4304 primes
Worker 28 found 4338 primes
All workers completed in 0.35 seconds

That’s an impressive start. A 10x improvement in runtime.

Example 2 — Reading multiple files simultaneously

In this example, we’ll use the concurrent.futures module to read multiple text files simultaneously, counting and displaying the number of lines and words in each.

Before we do that, we need some data files to process. You can use the following Python code to generate them. It creates 20 separate text files, sentences_01.txt, sentences_02.txt, and so on, each containing 1,000,000 random, nonsensical sentences.

import os
import random
import time

# --- Configuration ---
NUM_FILES = 20
SENTENCES_PER_FILE = 1_000_000
WORDS_PER_SENTENCE_MIN = 8
WORDS_PER_SENTENCE_MAX = 20
OUTPUT_DIR = "fake_sentences" # Directory to save the files

# --- 1. Generate a pool of words ---
# Using a small list of common words for variety.
# In a real scenario, you might load a much larger dictionary.
word_pool = [
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
    "this", "but", "his", "by", "from", "they", "we", "say", "her", "she",
    "or", "an", "will", "my", "one", "all", "would", "there", "their", "what",
    "so", "up", "out", "if", "about", "who", "get", "which", "go", "me",
    "when", "make", "can", "like", "time", "no", "just", "him", "know", "take",
    "people", "into", "year", "your", "good", "some", "could", "them", "see", "other",
    "than", "then", "now", "look", "only", "come", "its", "over", "think", "also",
    "back", "after", "use", "two", "how", "our", "work", "first", "well", "way",
    "even", "new", "want", "because", "any", "these", "give", "day", "most", "us",
    "apple", "banana", "car", "house", "computer", "phone", "coffee", "water", "sky", "tree",
    "happy", "sad", "big", "small", "fast", "slow", "red", "blue", "green", "yellow"
]

# Ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Starting to generate {NUM_FILES} files, each with {SENTENCES_PER_FILE:,} sentences.")
print(f"Total sentences to generate: {NUM_FILES * SENTENCES_PER_FILE:,}")
start_time = time.time()

for file_idx in range(NUM_FILES):
    file_name = os.path.join(OUTPUT_DIR, f"sentences_{file_idx + 1:02d}.txt")
    
    print(f"\nGenerating and writing to {file_name}...")
    file_start_time = time.time()
    
    with open(file_name, 'w', encoding='utf-8') as f:
        for sentence_idx in range(SENTENCES_PER_FILE):
            # 2. Construct fake sentences
            num_words = random.randint(WORDS_PER_SENTENCE_MIN, WORDS_PER_SENTENCE_MAX)
            
            # Randomly pick words
            sentence_words = random.choices(word_pool, k=num_words)
            
            # Join words, capitalize first, add a period
            sentence = " ".join(sentence_words).capitalize() + ".\n"
            
            # 3. Write to file
            f.write(sentence)
            
            # Optional: Print progress for large files
            if (sentence_idx + 1) % 100_000 == 0:
                print(f"  {sentence_idx + 1:,} sentences written to {file_name}...")
                
    file_end_time = time.time()
    print(f"Finished {file_name} in {file_end_time - file_start_time:.2f} seconds.")

total_end_time = time.time()
print(f"\nAll files generated! Total time: {total_end_time - start_time:.2f} seconds.")
print(f"Files saved in the '{OUTPUT_DIR}' directory.")

Here is what the start of sentences_01.txt looks like,

New then coffee have who banana his their how year also there i take.
Phone go or with over who one at phone there on will.
With or how my us him our sad as do be take well way with green small these.
Not from the two that so good slow new.
See look water me do new work new into on which be tree how an would out sad.
By be into then work into we they sky slow that all who also.
Come use would have back from as after in back he give there red also first see.
Only come so well big into some my into time its banana for come or what work.
How only coffee out way to just tree when by there for computer work people sky by this into.
Than say out on it how she apple computer us well then sky sky day by other after not.
You happy know a slow for for happy then also with apple think look go when.
As who for than two we up any can banana at.
Coffee a up of up these green small this us give we.
These we do because how know me computer banana back phone way time in what.

OK, now we can time how long it takes to read those files. Here is the code we’ll be testing. It simply reads each file, counts the lines and words, and outputs the results.

import concurrent.futures
import os
import time

def process_file(filename):
    """
    Process a single file, returning its line count and word count.
    """
    try:
        with open(filename, 'r') as file:
            content = file.read()
            lines = content.split('\n')
            words = content.split()
            return filename, len(lines), len(words)
    except Exception as e:
        return filename, -1, -1  # Return -1 for both counts if there's an error

def main():
    start_time = time.time()  # Start the timer

    # List to hold our files
    files = [f"./data/sentences_{i:02d}.txt" for i in range(1, 21)]  # The 20 generated files, here assumed to be in ./data

    # Use a ThreadPoolExecutor to process files in parallel
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Submit all file processing tasks
        future_to_file = {executor.submit(process_file, file): file for file in files}

        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_file):
            file = future_to_file[future]
            try:
                filename, line_count, word_count = future.result()
                if line_count == -1:
                    print(f"Error processing {filename}")
                else:
                    print(f"{filename}: {line_count} lines, {word_count} words")
            except Exception as exc:
                print(f'{file} generated an exception: {exc}')

    end_time = time.time()  # End the timer
    print(f"Total execution time: {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()

Timing results

Regular Python first.

C:\Users\thoma\projects\python-gil>python example2.py

./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
Total execution time: 18.77 seconds

Now for the GIL-free version

C:\Users\thoma\projects\python-gil>python3.14t example2.py

./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
Total execution time: 5.13 seconds

Not quite as impressive as our first example, but still very good, showing a more than 3x improvement.
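One caveat about process_file: read() slurps each large file into memory in one go. If memory is a concern, the counts can be accumulated line by line instead. A sketch (the function name is mine):

```python
import os
import tempfile

def count_streaming(filename):
    """Count lines and words without loading the whole file into memory."""
    line_count = 0
    word_count = 0
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:          # iterating a file object reads one line at a time
            line_count += 1
            word_count += len(line.split())
    return line_count, word_count

# Quick self-check on a small temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("one two three\nfour five\n")
    path = tmp.name
counts = count_streaming(path)
print(counts)  # (2, 5)
os.remove(path)
```

Incidentally, this also explains the 1000001 line counts in the output above: split('\n') on content ending in a newline produces a trailing empty string, whereas line-by-line iteration would report 1,000,000.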

Example 3 — matrix multiplication

We’ll use the threading module for this. Here is the code we’ll be running.

import threading
import time
import os

def multiply_matrices(A, B, result, start_row, end_row):
    """Multiply a submatrix of A and B and store the result in the corresponding submatrix of result."""
    for i in range(start_row, end_row):
        for j in range(len(B[0])):
            sum_val = 0
            for k in range(len(B)):
                sum_val += A[i][k] * B[k][j]
            result[i][j] = sum_val

def main():
    """Main function to coordinate the multi-threaded matrix multiplication."""
    start_time = time.time()

    # Define the size of the matrices
    size = 1000
    A = [[1 for _ in range(size)] for _ in range(size)]
    B = [[1 for _ in range(size)] for _ in range(size)]
    result = [[0 for _ in range(size)] for _ in range(size)]

    # Get the number of CPU cores to decide on the number of threads
    num_threads = os.cpu_count()
    print(f"Number of CPU cores: {num_threads}")

    chunk_size = size // num_threads

    threads = []
    # Create and start threads
    for i in range(num_threads):
        start_row = i * chunk_size
        end_row = size if i == num_threads - 1 else (i + 1) * chunk_size
        thread = threading.Thread(target=multiply_matrices, args=(A, B, result, start_row, end_row))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    end_time = time.time()

    # Just print a small corner to verify
    print("Top-left 5x5 corner of the result matrix:")
    for r_idx in range(5):
        print(result[r_idx][:5])

    print(f"Total execution time (matrix multiplication): {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()

The code performs matrix multiplication of two 1000×1000 matrices in parallel using multiple CPU cores. It divides the result matrix into row chunks, assigns each chunk to a separate thread (one per CPU core), and each thread calculates its assigned portion of the matrix multiplication independently. Finally, it waits for all threads to finish and reports the total execution time, demonstrating how threading can speed up CPU-bound tasks once the GIL is out of the way.

Timing results

Regular Python:

C:\Users\thoma\projects\python-gil>python example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 43.95 seconds

GIL-free Python:

C:\Users\thoma\projects\python-gil>python3.14t example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.56 seconds

Once again, we get almost a 10x improvement using GIL-free Python. Not too shabby.

GIL-free is not always better

An interesting side note: I also ran a multiprocessing version of this last test. It turned out that regular Python was significantly faster (by roughly 28%) than GIL-free Python. I won’t present the code, just the results,

Timings

Regular Python first (multiprocessing).

C:\Users\thoma\projects\python-gil>python example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.49 seconds

GIL-free version (multiprocessing)

C:\Users\thoma\projects\python-gil>python3.14t example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 6.29 seconds

As always in these situations, it’s important to test thoroughly.

Bear in mind that these last examples are just tests to showcase the difference between GIL and GIL-free Python. Using an external library, such as NumPy, to perform matrix multiplication would be at least an order of magnitude faster than either.
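To put that claim in perspective, here is the same multiplication in NumPy (assuming NumPy is installed), which dispatches to optimised, often multithreaded, BLAS routines:

```python
import time
import numpy as np

size = 1000
A = np.ones((size, size))
B = np.ones((size, size))

t0 = time.perf_counter()
C = A @ B  # delegates to the underlying BLAS library
elapsed = time.perf_counter() - t0

print(f"{elapsed:.3f} seconds")
print(C[0][:5])  # every entry of C is 1000.0
```

On my understanding, this typically completes in well under a second, far faster than either pure-Python version above.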

One other point to note, if you decide to use free-threaded Python in your workloads, is that not all the third-party libraries you might want to use are compatible with it yet. The list of incompatible libraries is small and shrinking with each release, but it’s something to keep in mind. To view the list, follow the link below.

https://ft-checker.com

Summary

In this article, we discussed a potentially groundbreaking feature of the latest Python 3.14 release: the introduction of an optional “free-threaded” build, which removes the Global Interpreter Lock (GIL). The GIL is a mechanism in standard Python that simplifies memory management by ensuring only one thread executes Python bytecode at a time. Whilst that simplification is useful in some cases, it prevents true parallel processing on multi-core CPUs for CPU-intensive tasks.

The removal of the GIL in the free-threaded build is primarily aimed at enhancing performance. This can be especially useful for data scientists and machine learning engineers whose work often involves CPU-bound operations, such as model training and data preprocessing. This change allows Python code to utilise all available CPU cores simultaneously within a single process, potentially leading to significant speed improvements. 

To demonstrate the impact, the article presents several performance comparisons:

  • Finding prime numbers: A multi-threaded script saw a dramatic 10x performance increase, with execution time dropping from 3.70 seconds in standard Python to just 0.35 seconds in the GIL-free version.
  • Reading multiple files simultaneously: An I/O-bound task using a thread pool to process 20 large text files was over 3 times faster, completing in 5.13 seconds compared to 18.77 seconds with the standard interpreter.
  • Matrix multiplication: A custom, multi-threaded matrix multiplication code also experienced a nearly 10x speedup, with the GIL-free version finishing in 4.56 seconds, compared to 43.95 seconds for the standard version.

However, I also explained that the GIL-free version is not a panacea for Python code development. In a surprising turn, a multiprocessing version of the matrix multiplication code ran faster with standard Python (4.49 seconds) than with the GIL-free build (6.29 seconds). This highlights the importance of testing and benchmarking specific applications, as the overhead of process management in the GIL-free version can sometimes negate its benefits.

I also mentioned the caveat that not all third-party Python libraries are compatible with GIL-free Python and gave a URL where you can view a list of incompatible libraries.



Thomas Reid