Combining Multiprocessing and Asyncio in Python for Performance Boosts | by Peng Qian | May, 2023


Thanks to the GIL, using multiple threads to perform CPU-bound tasks has never been a viable option in Python. With the popularity of multicore CPUs, Python offers the multiprocessing solution for CPU-bound tasks. But there are still some problems with using the multiprocessing-related APIs directly.

Before we start, here is a small piece of code to aid the demonstration:

The method takes one argument, accumulates from 0 up to that argument, prints the method's execution time, and returns the result.
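The original snippet is not reproduced in this copy of the article; a minimal sketch matching that description (the name `sum_to_num` is an assumption) could look like this:

```python
import time


def sum_to_num(final_num: int) -> int:
    """Accumulate from 0 up to final_num, print the elapsed time, return the sum."""
    start = time.monotonic()
    result = 0
    for i in range(final_num + 1):
        result += i
    print(f"sum_to_num({final_num}) took {time.monotonic() - start:.2f} second(s).")
    return result
```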

As the code shows, we directly create multiple processes and call the start and join methods of each. However, there are some problems here:

  1. The join method cannot return the result of task execution.
  2. The join method blocks the main process and executes sequentially.

This holds even if a later task finishes faster than an earlier one, as shown in the following figure:

Using the Process class directly, you can’t get the result of the process execution, and the join method blocks the main process. Image by Author
Although process_b finishes executing first, it still has to wait for process_a. Image by Author
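The direct-Process approach described above can be sketched as follows (the function name and argument values are assumptions; any CPU-bound callable shows the same behavior):

```python
import time
from multiprocessing import Process


def sum_to_num(final_num: int) -> int:
    start = time.monotonic()
    result = sum(range(final_num + 1))
    print(f"sum_to_num({final_num}) took {time.monotonic() - start:.2f}s")
    return result


if __name__ == "__main__":
    # Create and start two processes directly.
    processes = [Process(target=sum_to_num, args=(n,)) for n in (2_000_000, 1_000_000)]
    for p in processes:
        p.start()
    # join blocks the main process and waits in creation order,
    # and the return value of sum_to_num is lost entirely.
    for p in processes:
        p.join()
```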

Problems with using Pool

If we use multiprocessing.Pool, there are also some problems:

As the code shows, Pool’s apply method is synchronous, which means you have to wait for the previously applied task to finish before the next apply call can start executing.
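A sketch of the synchronous apply call (function name and inputs are assumptions):

```python
import time
from multiprocessing import Pool


def sum_to_num(final_num: int) -> int:
    return sum(range(final_num + 1))


if __name__ == "__main__":
    start = time.monotonic()
    with Pool() as pool:
        # apply blocks until each task finishes, so the tasks
        # run one after another despite the pool of workers.
        results = [pool.apply(sum_to_num, (n,)) for n in (2_000_000, 1_000_000)]
    print(results, f"in {time.monotonic() - start:.2f}s")
```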

multiprocessing.Pool.apply method is synchronous. Image by Author

Of course, we can use the apply_async method to create the task asynchronously. But then you need to call the blocking get method to retrieve the result, which brings us back to the same problem as the join method:
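A sketch of the apply_async variant (names and values are assumptions). Because we call get on each AsyncResult in submission order, the main process still waits for the first task even if the second finishes earlier:

```python
from multiprocessing import Pool


def sum_to_num(final_num: int) -> int:
    return sum(range(final_num + 1))


if __name__ == "__main__":
    with Pool() as pool:
        # apply_async submits the tasks without blocking...
        async_results = [pool.apply_async(sum_to_num, (n,)) for n in (2_000_000, 1_000_000)]
        # ...but get() blocks, and we call it in submission order,
        # so we wait sequentially just like with join.
        for r in async_results:
            print(r.get())
```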

Although apply_async is asynchronous, get will still block and execute sequentially. Image by Author

So, what if we use concurrent.futures.ProcessPoolExecutor to execute our CPU-bound tasks?

As the code shows, everything looks great, and the call pattern resembles asyncio.as_completed. But look at the results: they are still fetched in startup order. This is not at all the same as asyncio.as_completed, which yields results in the order in which tasks complete:
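One plausible version of this code (names and values are assumptions) uses executor.map; iterating a list of submitted futures and calling result() on each in order behaves the same way:

```python
from concurrent.futures import ProcessPoolExecutor


def sum_to_num(final_num: int) -> int:
    return sum(range(final_num + 1))


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # map yields results in submission order, even when
        # a later task finishes before an earlier one.
        for result in executor.map(sum_to_num, (2_000_000, 1_000_000)):
            print(result)
```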

Using ProcessPoolExecutor directly, the results are fetched in startup order. Image by Author
The result of the iteration still maintains the call order and blocks. Image by Author

Fortunately, we can use asyncio to handle IO-bound tasks, and its run_in_executor method lets us invoke multiprocess tasks in the same way as asyncio tasks. This not only unifies the concurrent and parallel APIs but also solves the various problems we encountered above:

Combining asyncio and ProcessPoolExecutor. Image by Author
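A self-contained sketch of this combination (function name and argument values are assumptions). Wrapping the executor tasks in asyncio futures lets asyncio.as_completed yield each result as soon as its task finishes:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def sum_to_num(final_num: int) -> int:
    return sum(range(final_num + 1))


async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        # run_in_executor wraps each process task in an awaitable future.
        tasks = [loop.run_in_executor(executor, sum_to_num, n)
                 for n in (2_000_000, 1_000_000)]
        # as_completed yields results in completion order, not submission order.
        for coro in asyncio.as_completed(tasks):
            print(await coro)


if __name__ == "__main__":
    asyncio.run(main())
```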

Since the sample code in the previous article only simulated how we should call the methods of concurrent processes, many readers still found it hard to apply in real-world coding. So, now that we understand why we need to run CPU-bound parallel tasks in asyncio, today we will use a real-world example to explain how to use asyncio to handle IO-bound and CPU-bound tasks simultaneously, and appreciate the efficiency asyncio brings to our code.

Note: Before continuing, if you are interested in the practice of using asyncio.gather and asyncio.as_completed, you can read this article of mine:
