Python multiprocessing pool global variable

Last Updated on September 12, 2022

You can share a global variable with all child worker processes in the multiprocessing pool by defining it in the worker process initialization function.

In this tutorial you will discover how to share global variables with all workers in the Python process pool.

Let’s get started.

  • Need To Share Global Variable With All Workers in Process Pool
  • How to Share a Global Variable With All Workers
  • Example of Sharing a Global Variable With All Workers
  • Further Reading
  • Takeaways

Need To Share Global Variable With All Workers in Process Pool

The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.

A process pool can be configured when it is created, which will prepare the child workers.

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

— multiprocessing — Process-based parallelism

We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map().

Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
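
For instance, here is a minimal sketch of these calls, using a hypothetical square() task function (the names and values are illustrative only):

# example of issuing tasks to the process pool (illustrative sketch)
from multiprocessing.pool import Pool

# hypothetical task function
def square(x):
    return x * x

# protect the entry point
if __name__ == '__main__':
    # create a process pool with default configuration
    with Pool() as pool:
        # issue a one-off task synchronously
        result = pool.apply(square, args=(3,))
        # apply the function to each item, waiting for all results
        results = pool.map(square, range(5))
        # issue the same tasks asynchronously and retrieve results later
        async_result = pool.map_async(square, range(5))
        print(result, results, async_result.get())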

When using the process pool, we may need to share a global variable with all child worker processes in the process pool.

This would allow all tasks executed in the process pool to use the shared global variable.

We may need this capability for many reasons, such as:

  • Allow all tasks to use a shared log.
  • Allow all tasks to use a shared queue or pipe.
  • Allow all tasks to use a shared synchronization primitive like a lock, semaphore, or event.

The process pool does not provide this capability.

How can we share a global variable with all child worker processes?

Or, put another way:

How can a shared global variable be accessed by all tasks executed by the process pool in Python?

We can share a global variable with all child worker processes in the process pool.

This can be achieved by configuring the process pool to initialize each worker process using a custom function.

For example:

...
# create a process pool with custom initialization
pool = Pool(initializer=init_worker, initargs=(data,))

The global variable data required by each child worker process can be passed as an argument to the initialization function. It can then be stored in a global variable. This will make it available to each child worker process.

Recall that declaring a variable “global” in a function defines a global variable for the process, rather than a local variable for the function.
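
As a quick standalone illustration of this behavior (a toy sketch, separate from the process pool example):

# toy sketch of the "global" keyword
def set_value():
    # declare that 'value' is a global variable, not a local
    global value
    # this defines the variable for the whole process
    value = 42

set_value()
# prints 42; the variable outlives the function call
print(value)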

You may also recall that the worker initialization function is executed by the main thread of each new worker process. Therefore, a global variable defined in the initialization function will be available to the process later.

For example:

# initialize worker processes
def init_worker(data):
    # declare scope of a new global variable
    global shared_data
    # store argument in the global variable for this process
    shared_data = data

Because each child worker process in the process pool is initialized using the same function, the global variable (or variables) will be accessible by all child worker processes in the process pool.

This means that any tasks executed in the process pool can access the global variable, such as custom functions executed as tasks in the process pool.

For example:

# task executed in a worker process
def task():
    # access the global variable
    print(shared_data)

You can learn more about configuring the child worker process initialization function in the tutorial:

  • Process Pool Initializer in Python

Now that we know how to share global variables with all worker processes, let’s look at a worked example.

Example of Sharing a Global Variable With All Workers

We can explore how to share a global variable with all child worker processes.

In this example, we will define a shared multiprocessing queue. We will then share this queue with each child worker process via its initialization function. Each child worker will store a reference to the queue in a global variable so that all tasks executed by each worker can access it. We will then execute tasks in the process pool that put task results into the shared queue. The main process will then read results as they become available via the shared queue.

Firstly, we need to define the custom function used to initialize the child worker processes.

The initialization function must take the shared queue as an argument. It will then declare a new global variable for the child process and store a reference to the shared queue in the global variable.

The init_worker() function below implements this.

# initialize worker processes
def init_worker(shared_queue):
    # declare scope of a new global variable
    global queue
    # store argument in the global variable for this process
    queue = shared_queue

Next, we can define a custom task function to execute in the process pool.

The task function will take an integer identifier as an argument. It will then generate a random number between 0 and 5 and block for that many seconds to simulate a variable amount of computational effort. Finally, it will put the generated number and the integer identifier into the shared queue as a tuple.

The task() function below implements this.

Note that we explicitly define the scope of the queue global variable. This is technically not required, but I believe it helps make the code more readable.

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random() * 5
    # block for a moment
    sleep(value)
    # declare scope of shared queue
    global queue
    # send result using shared queue
    queue.put((identifier, value))

Next, in the main process we can first create the shared multiprocessing queue. We will use a multiprocessing.SimpleQueue in this case.

...
# create a shared queue
shared_queue = SimpleQueue()

You can learn more about the multiprocessing.SimpleQueue in the tutorial:

  • Multiprocessing SimpleQueue in Python

Next, we can create and configure the process pool.

In this case, we will configure it so that each worker process is initialized using our init_worker() custom initialization function and pass the shared queue as an argument.

We will use the context manager interface so that the process pool is closed for us automatically once we are finished with it.

...
# create and configure the process pool
with Pool(initializer=init_worker, initargs=(shared_queue,)) as pool:
    # ...

You can learn more about the context manager interface in the tutorial:

  • Process Pool Context Manager

Next, we will issue 10 calls to our custom task function asynchronously using the map_async() function.

...
# issue tasks into the process pool
_ = pool.map_async(task, range(10))

We will then consume the results of the tasks as they become available (e.g. simulating the imap_unordered() function). This can be achieved by iterating over the expected number of results and calling get() on the shared queue for each task result.

...
# read results from the queue as they become available
for i in range(10):
    result = shared_queue.get()
    print(f'Got {result}', flush=True)

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of sharing a global variable among all workers
from random import random
from time import sleep
from multiprocessing import SimpleQueue
from multiprocessing.pool import Pool

# initialize worker processes
def init_worker(shared_queue):
    # declare scope of a new global variable
    global queue
    # store argument in the global variable for this process
    queue = shared_queue

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random() * 5
    # block for a moment
    sleep(value)
    # declare scope of shared queue
    global queue
    # send result using shared queue
    queue.put((identifier, value))

# protect the entry point
if __name__ == '__main__':
    # create a shared queue
    shared_queue = SimpleQueue()
    # create and configure the process pool
    with Pool(initializer=init_worker, initargs=(shared_queue,)) as pool:
        # issue tasks into the process pool
        _ = pool.map_async(task, range(10))
        # read results from the queue as they become available
        for i in range(10):
            result = shared_queue.get()
            print(f'Got {result}', flush=True)

Running the example first creates the shared queue.

Next, the process pool is created and configured to use the custom initialization function.

Each worker process is created and started, then initialized with the custom initialization function. Each worker creates a new global variable named “queue” and stores the passed-in shared queue in it. This makes “queue” available to all tasks executed by the worker process, and all worker processes are initialized the same way.

Next, 10 tasks are issued into the process pool.

The main process then iterates the 10 results, calling get() on the queue which will block and not return until a result is available.

Each task first generates a random number between 0 and 5, then blocks for that many seconds to simulate computational effort. The “queue” global variable for the process is declared explicitly, then accessed. The result for the task is put on the queue and the task completes.

Results are reported in the main process as they become available.

After all 10 results are retrieved from the shared queue, the main process continues on; the context manager then closes the process pool automatically and the application exits.

Note, the specific results will differ each time the program is run due to the use of random numbers.

Got (2, 0.38947694846648895)

Got (3, 0.7665425799985037)

Got (4, 1.6182597482880667)

Got (6, 2.912364034686572)

Got (7, 3.0557058569816458)

Got (8, 2.6846243338785)

Got (0, 3.589396223885189)

Got (5, 4.1921930714219116)

Got (1, 4.282642869898409)

Got (9, 4.827317385371338)
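
The same pattern works for the other use cases listed earlier, such as sharing a synchronization primitive. For example, here is a sketch (illustrative only, mirroring the structure of the example above) that shares a multiprocessing.Lock with all workers so that tasks can serialize their printing:

# example of sharing a lock among all workers (illustrative sketch)
from multiprocessing import Lock
from multiprocessing.pool import Pool

# initialize worker processes with the shared lock
def init_worker(shared_lock):
    # declare scope of a new global variable
    global lock
    # store argument in the global variable for this process
    lock = shared_lock

# task executed in a worker process
def task(identifier):
    # acquire the shared lock before printing
    with lock:
        print(f'Task {identifier} done', flush=True)

# protect the entry point
if __name__ == '__main__':
    # create the shared lock
    lock = Lock()
    # create and configure the process pool
    with Pool(initializer=init_worker, initargs=(lock,)) as pool:
        # issue tasks and wait for them all to complete
        pool.map(task, range(5))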



Further Reading

This section provides additional resources that you may find helpful.

  • multiprocessing — Process-based parallelism
  • Multiprocessing Pool: The Complete Guide
  • Pool Class API Cheat Sheet
  • Multiprocessing API Interview Questions
  • Multiprocessing Pool Jump-Start (my 7-day course)

Takeaways

You now know how to share global variables with all workers in the Python process pool.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
