5 Python Built-in Functions and Modules That Save Time

I consider myself quite lazy, and I recommend every programmer embrace laziness as well. Not in the sense of staying in bed all day and binge-watching a full season of Frieren (though that's exactly what I did last weekend...), but in terms of following the DRY principle, taking your time before starting to code, and avoiding reinventing the wheel.

Sometimes the problem with finding the shortest path is that "You don't know what you don't know," and you either need to search for it or have someone show it to you.
I've been lucky to work with amazing people I could learn from, but I know that not everyone has been that fortunate.

So I thought I would share a few things I like to use to save time, so now "You can know what you don't know" 😊

And as always, the source code is available here if you need it!

Shelve

shelve is a simple persistent storage option for Python objects; it acts like a dictionary but is stored on disk.

I find this especially useful when exploring third-party APIs. Imagine you have a limited number of requests and want to save not just the data from a request but the entire object for future analysis (headers, status, etc.). Without persistence, continuing your investigation later means keeping the notebook's kernel alive or sending the request all over again.

And this is where shelve comes in handy! You can easily save any picklable object with it:

import shelve
import requests

# Fetching data from an API and storing it using shelve
response = requests.get('https://example.com')

with shelve.open('myTest.db') as db:
    db['example_response'] = response

# Later access the data without needing to fetch it again
with shelve.open('myTest.db') as db:
    stored_response = db['example_response']
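
And since a shelf behaves like a regular dictionary, you can also check for keys, iterate over entries, and delete them. A minimal sketch, reusing the myTest.db file from above:

import shelve

with shelve.open('myTest.db') as db:
    # Membership checks work like on a dict
    if 'example_response' in db:
        print(db['example_response'].status_code)

    # Iterate over all stored keys
    for key in db:
        print(key)

    # Delete entries like on a dict
    del db['example_response']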

Partial

functools.partial lets you fix some of a function's arguments, returning a partial object that behaves like a function.

I often use it when fetching data from APIs. For example, with GraphQL, you typically have just one endpoint to call.

So, instead of repeatedly using session.post(url, json=body), I find it faster to use partial in these situations:

from functools import partial

from requests import Session

# Create session
session = Session()

graphql_url = 'https://countries.trevorblades.com/'

# creates a function `post` that calls session.post with url=graphql_url every time
post = partial(session.post, url=graphql_url)

body = """
query {
    continents{
        name
    }
}
"""

# Now I can just call post instead of session.post(graphql_url, json={"query": body})
resp = post(json={"query": body})
print("List of continents: ", resp.json(), sep="\n", end="\n\n")

body = """
query {
  country (code: "PL") {
    name,
    awsRegion
  }
}
"""

resp = post(json={"query": body})
print("AWS region of Poland: ", resp.json(), sep="\n")

Another example involving endpoints is when an API has a base URL and you need to append the path to an endpoint using urljoin(base_url, endpoint_path). In such cases, using partial with the base_url fixed is quicker.
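
A minimal sketch of that idea, with a made-up base URL:

from functools import partial
from urllib.parse import urljoin

# Hypothetical base URL, just for illustration
base_url = 'https://api.example.com/v1/'

# Fix the base URL so only the endpoint path needs to be passed
to_endpoint = partial(urljoin, base_url)

print(to_endpoint('users'))     # https://api.example.com/v1/users
print(to_endpoint('projects'))  # https://api.example.com/v1/projects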

It might not save a lot of time, but it's definitely useful to know.

Batched

itertools.batched divides an iterable into batches of a specified size, which makes efficient batch processing convenient.

Finally! It was introduced in Python 3.12. I can't even count how many times I've written this function in utils, so for me, this is a real time saver.

I find it especially helpful for I/O-bound workloads. Sure, you could run each item in a separate thread, but that might lead to issues with thread overhead.
In such cases, it's better to send a batch of items to each thread and process them iteratively within that thread.

try:
    # itertools.batched was added in Python 3.12
    from itertools import batched
except ImportError:
    # For older versions, install the backport with `pip install more_itertools`
    # or write your own function (a simple version is sketched at the end of this section)
    from more_itertools import batched

from concurrent.futures import ThreadPoolExecutor
from time import sleep, time
from timeit import Timer

DATA = [1] * 20000
MAX_WORKERS = 50
BATCH_SIZE = 400
SLEEP_TIME = 0.001
TIMEIT_REPEAT = 10

# Sleep to simulate an I/O operation
def simulate_io_operation(*args, **kwargs):
    sleep(SLEEP_TIME)

# process single item
def process_data(item):
    simulate_io_operation(item)

# process batch of items
def process_data_in_batch(items):
    for item in items:
        simulate_io_operation(item)

# Send each item to a different thread
def test_single():
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        results = executor.map(process_data, DATA)

# Send batch of items to a different thread
def test_batch():
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        results = executor.map(process_data_in_batch, batched(DATA, BATCH_SIZE))

print(
    f"Number of elements in list: {len(DATA)} | "
    f"batch size: {BATCH_SIZE} | "
    f"max workers: {MAX_WORKERS} | "
    f"times repeated: {TIMEIT_REPEAT}"
)

time_of_single = Timer(test_single).timeit(TIMEIT_REPEAT)
print(f"One thread per item ran {TIMEIT_REPEAT} times: {time_of_single} seconds")

time_of_batch = Timer(test_batch).timeit(TIMEIT_REPEAT)
print(f"One thread per {BATCH_SIZE} items ran {TIMEIT_REPEAT} times: {time_of_batch} seconds")

Of course, the right batch size depends on many factors, so you'll need to experiment to find the best option for your case.
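
For completeness, if you're on an older Python and don't want the extra dependency, a hand-rolled batched can be as small as this (roughly the recipe from the itertools docs):

from itertools import islice

def batched(iterable, n):
    # Yield successive tuples of up to n items
    if n < 1:
        raise ValueError('n must be at least one')
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        yield batch

print(list(batched('ABCDEFG', 3)))  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]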

Single dispatch

functools.singledispatch enables you to create generic functions that behave differently based on the type of their first argument.

Sometimes you don't have full control over the data coming into the APIs you implement, or the data you extract, but you need to clean it or apply business logic based on its type. You can write a long chain of if/elif/else checks on the variable's type, or just use singledispatch.

from functools import singledispatch

@singledispatch
def process(value):
    raise NotImplementedError("Unsupported type")

@process.register
def _(value: int):
    return value + 10

@process.register
def _(value: str):
    return value.upper()

# Usage
print(process(10))  # Output: 20

print(process('hello'))  # Output: 'HELLO'

try:
    process(['lama'])
except NotImplementedError as err:
    print(f"NotImplementedError error caught: {err}")

Trust me, if you ever need different logic for different types, this approach will make your code much easier to read when you come back later and try to figure out what you wrote.
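
By the way, register can also be called as a plain function with the type passed explicitly, which is handy when annotations aren't an option. A quick sketch that handles the list case from above:

# Register a handler for lists by passing the type explicitly
process.register(list, lambda value: [process(item) for item in value])

print(process([1, 'hello']))  # Output: [11, 'HELLO']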

Closing

contextlib.closing serves as a utility to ensure that resources are properly released after their use is complete.

Context managers in Python are usually used to manage resources that need to be set up and then properly closed or cleaned up after use. However, some third-party code might not implement them.

If that's the case, we can't use the with statement directly, but if the object implements a close method, we can use contextlib.closing.
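
Conceptually, closing is just a small try/finally wrapper; the sketch below is roughly what it does under the hood:

from contextlib import contextmanager

@contextmanager
def closing(thing):
    # Roughly equivalent to contextlib.closing
    try:
        yield thing
    finally:
        thing.close()

A good real-world example is sqlite3: cursors have a close method but no context manager support of their own: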

import sqlite3
from contextlib import closing

# Note: "with" on a sqlite3 connection manages the transaction
# (commit/rollback on exit); it does not close the connection
with sqlite3.connect('myOtherTest') as connection:
    with closing(connection.cursor()) as cursor:
        cursor.execute("SELECT 'HELLO WORLD'")
        print(cursor.fetchall())

Sometimes I forget to close resources after using them (I know, bad habit), which is why I always prefer the with statement, and why closing comes in handy.


And that's all for now. I hope you find something useful here. I would love to hear about any time savers you've discovered while working with Python, so please don't hesitate to leave a comment below. 😉