
Tips for Python Multiprocessing on Windows

2019/05/23

Damn! The pool won’t join!

As a beginner, I started learning how to use multiprocessing with a Pool by copying snippets from a tutorial into PyCharm.

import multiprocessing


def task(num):
    print("{} started".format(num))
    msg = "Success"
    return num, msg


print("Now the show begins!")
pool = multiprocessing.Pool()
tasks = 100
results = []
for i in range(0, tasks):
    pool.apply_async(task, args=(i,), callback=results.append)
pool.close()
pool.join()
for num, msg in results:
    print(num, msg)

But the output I got was:

Now the show begins!

The program hangs here and no task is executed at all.

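After some googling, I realised that to make the code work on Windows I had to wrap everything in main() and add an if __name__ == "__main__": entry point, as the multiprocessing docs explain. Windows has no fork(), so multiprocessing uses the "spawn" start method there: each worker is a fresh Python interpreter that imports the main module. Without the guard, that import runs the module-level code again in every worker, including the lines that create the pool, so the workers never get to run task().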
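To see the re-import in action, here is a minimal sketch (the task body is made up for illustration) that has each worker report the name under which it imported the script. It forces the "spawn" start method through multiprocessing.get_context("spawn"), an assumption added so it behaves on Linux and macOS the way it does on Windows. Save it as a file and run it as a script; each worker should report __mp_main__ rather than __main__, which is why anything under the guard never runs in the workers.

```python
import multiprocessing


def task(num):
    # Executed in a worker: __name__ is the name this script was imported under there.
    return num, __name__


if __name__ == "__main__":
    # Force "spawn", the start method Windows uses, so the result is the same on any OS.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        results = pool.map(task, range(4))
    for num, module_name in results:
        print(num, module_name)  # module_name is "__mp_main__" in spawned workers
```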

Okay then, I modified my code like this:

import multiprocessing


def task(num):
    print("{} started".format(num))
    msg = "Success"
    return num, msg


def main():
    print("Now the show begins!")
    pool = multiprocessing.Pool()
    tasks = 100
    results = []
    for i in range(0, tasks):
        pool.apply_async(task, args=(i,), callback=results.append)
    pool.close()
    pool.join()
    for num, msg in results:
        print(num, msg)


# Only the parent runs this; spawned workers import the module as __mp_main__.
if __name__ == '__main__':
    main()

Now everything should have worked, but the program still hung after I pressed the magic button.

In my setup, the problem was that multiprocessing would not work with the venv interpreter PyCharm had created for the project. There are two ways around it:

  • Run the program from cmd instead. (Extremely painful)
  • In Run/Debug Configurations, change the interpreter from the venv to the Python installed on the system. (You will need to install the required modules into it.)
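Before switching interpreters, it may help to confirm which interpreter PyCharm is actually launching. The check below relies on the fact that inside a venv sys.prefix points at the venv directory while sys.base_prefix still points at the base installation; running_in_venv is a hypothetical helper name, and the snippet only diagnoses the setup, it does not fix anything.

```python
import sys


def running_in_venv():
    # Inside a venv, sys.prefix is the venv directory while
    # sys.base_prefix is the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", running_in_venv())
```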

Uh oh~ logging not working!

While print() always prints, logging can be troublesome with multiprocessing.
In the following example, logging works without any problem:

import multiprocessing
import logging

logger = None


def task(num):
    msg = "Success"
    return num, msg


def main():
    logger.info("Now the show begins!")
    pool = multiprocessing.Pool()
    tasks = 100
    results = []
    for i in range(0, tasks):
        pool.apply_async(task, args=(i,), callback=results.append)
    pool.close()
    pool.join()
    for num, msg in results:
        print(num, msg)


if __name__ == '__main__':
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    # Attach a handler; without one, INFO messages are silently dropped.
    logging.basicConfig(level=logging.DEBUG)
    main()
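Note that logger.setLevel() on its own does not make INFO messages visible. When no handler is configured anywhere, the logging module falls back to logging.lastResort, a stderr handler that only lets WARNING and above through, so some handler (for example via logging.basicConfig()) is needed. A quick standalone check, using a made-up logger name:

```python
import logging

logger = logging.getLogger("lastresort_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

# The logger itself accepts INFO records...
print("logger accepts INFO:", logger.isEnabledFor(logging.INFO))
# ...but with no handlers configured they go to logging.lastResort,
# whose level is WARNING, so only the warning below reaches stderr.
print("lastResort level:", logging.getLevelName(logging.lastResort.level))
logger.info("dropped: INFO is below lastResort's level")
logger.warning("printed to stderr by lastResort")
```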

But when I tried to log something inside task(), for example:

def task(num):
    logger.info("{} started".format(num))
    msg = "Success"
    return num, msg

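This breaks the workers. logger is only assigned inside the if __name__ == '__main__': block (not in main() itself), and a spawned worker imports the module under the name __mp_main__, so that block never runs there and logger is still None when task() calls logger.info(). Every task therefore raises AttributeError; since apply_async() keeps that exception in its AsyncResult and nobody ever calls get() on it, the failure is silent and results simply stays empty.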
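An exception raised in a worker by an apply_async() task is only reported if you call get() on the returned AsyncResult or pass an error_callback. Below is a minimal sketch that collects those errors with error_callback; it forces the "spawn" start method (an assumption, so it reproduces off Windows), and the results/errors lists and the 4-task workload are just for illustration.

```python
import logging
import multiprocessing

logger = None  # assigned only under the __main__ guard, as in the broken example


def task(num):
    # In a spawned worker the guard below never runs, so logger is still None here.
    logger.info("%s started", num)
    return num, "Success"


if __name__ == "__main__":
    logger = logging.getLogger(__name__)
    # Force "spawn" so Linux and macOS reproduce the Windows behaviour.
    ctx = multiprocessing.get_context("spawn")
    results, errors = [], []
    pool = ctx.Pool(2)
    for i in range(4):
        pool.apply_async(task, args=(i,),
                         callback=results.append,       # called only on success
                         error_callback=errors.append)  # receives the worker's exception
    pool.close()
    pool.join()  # after join(), every callback has run
    print(len(results), "results,", len(errors), "errors")
    for exc in errors:
        print(type(exc).__name__, exc)
```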

One way to solve this is to create and configure the logger at module level, outside main() and the __main__ guard, for example:

import multiprocessing
import logging

# Module-level setup: runs in the parent and again in every worker when it imports this file.
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s [%(levelname)s] (%(process)d):\t%(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S')


def task(num):
    logger.info("{} started".format(num))
    msg = "Success"
    return num, msg


def main():
    logger.info("Now the show begins!")
    pool = multiprocessing.Pool()
    tasks = 100
    results = []
    for i in range(0, tasks):
        pool.apply_async(task, args=(i,), callback=results.append)
    pool.close()
    pool.join()
    for num, msg in results:
        print(num, msg)


if __name__ == '__main__':
    main()

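This works fine: every worker imports the module when it starts, so the module-level getLogger() and basicConfig() calls run in each worker too, and logger is a real Logger by the time task() uses it.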
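If you would rather keep all setup under the guard, Pool's initializer argument runs a function once in every worker process as it starts. This is an alternative sketch, not what the code above does; init_worker and the 4-task workload are hypothetical, and "spawn" is forced so it behaves the same off Windows. Each task reports whether INFO logging is enabled in its worker.

```python
import logging
import multiprocessing

FORMAT = "%(asctime)s [%(levelname)s] (%(process)d):\t%(message)s"


def init_worker(level):
    # Hypothetical helper: runs once in each worker process as it starts.
    logging.basicConfig(level=level, format=FORMAT)


def task(num):
    logger = logging.getLogger(__name__)
    logger.info("%s started", num)
    # Report whether INFO is enabled here, showing the initializer took effect.
    return num, logger.isEnabledFor(logging.INFO)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format=FORMAT)  # parent process only
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2, initializer=init_worker, initargs=(logging.DEBUG,)) as pool:
        results = pool.map(task, range(4))
    for num, enabled in results:
        print(num, enabled)
```

Without the initializer, a spawned worker's root logger stays at its default WARNING level and every task would report False.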

Logging to a single file from several processes is not safe, but I won't cover that here since there are already plenty of good StackOverflow answers on it.

EOF
