This repository was archived by the owner on May 31, 2021. It is now read-only.

Commit 290791b

Add description for non-blocking version.
1 parent 3393316 commit 290791b

webscraper.rst

Lines changed: 76 additions & 0 deletions
@@ -272,9 +272,85 @@ Let's see if we can do better.
Getting Multiple Pages Asynchronously - With Time Savings
---------------------------------------------------------

We want to take advantage of the asynchronous nature of ``get_page()``
and save time.
We modify our client to use a list with four instances of
a :term:`task <task>`.
This allows us to send out requests for all pages we want to retrieve without
waiting for the answer before asking for the next page:

.. literalinclude:: examples/async_client_nonblocking.py

The interesting part is in this loop:

.. code-block:: python

    with closing(asyncio.get_event_loop()) as loop:
        for wait in waits:
            tasks.append(get_page(host, port, wait))
        pages = loop.run_until_complete(asyncio.gather(*tasks))

We append the return value of each ``get_page()`` call to our list of tasks.
These calls return immediately, so we can queue all requests, in our case
four, without waiting for any answer.
After queuing all of them, we run them and wait for the answers, using:

.. code-block:: python

    loop.run_until_complete(asyncio.gather(*tasks))

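
To see why appending to ``tasks`` does not block, here is a minimal sketch.
It assumes a hypothetical stand-in, ``fake_get_page()``, that sleeps instead
of fetching a page, and it uses ``asyncio.run()`` (Python 3.7 or later)
rather than the event loop handling shown above.
The point is only that calling a coroutine function creates a coroutine
object; its body does not start until the event loop runs it:

.. code-block:: python

    import asyncio


    async def fake_get_page(wait, log):
        """Hypothetical stand-in for get_page(); sleeps instead of fetching."""
        log.append('started {}'.format(wait))
        await asyncio.sleep(wait)
        return wait


    async def main():
        log = []
        # Calling the coroutine function only builds coroutine objects.
        tasks = [fake_get_page(wait, log) for wait in (0.02, 0.01)]
        # Snapshot of the log before anything was handed to the event loop.
        started_before_gather = list(log)
        await asyncio.gather(*tasks)
        return started_before_gather, log


    started_before_gather, log = asyncio.run(main())
    print(started_before_gather)  # nothing has started at this point
    print(log)                    # both coroutines started inside gather()

The first list is empty, while the second one contains an entry for each
coroutine, which suggests that the work only begins once
``asyncio.gather()`` is awaited.
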
We used ``loop.run_until_complete()`` already for each call to ``get_page()``
in the previous section.
The difference here is the use of ``asyncio.gather()``, which is called with
all our tasks in the list ``tasks`` as arguments.
For our example with four list entries, ``asyncio.gather(*tasks)`` means:

.. code-block:: python

    asyncio.gather(tasks[0], tasks[1], tasks[2], tasks[3])

So, for a list with 100 tasks it would mean:

.. code-block:: python

    asyncio.gather(tasks[0], tasks[1], tasks[2],
                   # 96 more tasks here
                   tasks[99])


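
The unpacking can be tried in isolation with the following sketch.
It is not part of the example client: ``fake_get_page()`` is a hypothetical
stand-in that sleeps instead of requesting a page, the waits are shortened,
and ``asyncio.run()`` assumes Python 3.7 or later.
It compares the starred call with the spelled-out call and also shows that
``asyncio.gather()`` returns results in the order of its arguments, not in
the order of completion:

.. code-block:: python

    import asyncio


    async def fake_get_page(wait):
        """Hypothetical stand-in for get_page(); returns a label after sleeping."""
        await asyncio.sleep(wait)
        return 'page after {:.2f}'.format(wait)


    async def main():
        waits = [0.03, 0.01, 0.02]
        # Starred form: the list is unpacked into positional arguments.
        unpacked = await asyncio.gather(*[fake_get_page(w) for w in waits])
        # Spelled-out form with one argument per list entry.
        explicit = await asyncio.gather(fake_get_page(waits[0]),
                                        fake_get_page(waits[1]),
                                        fake_get_page(waits[2]))
        return unpacked, explicit


    unpacked, explicit = asyncio.run(main())
    print(unpacked == explicit)
    print(unpacked)

Although the second coroutine finishes first, its result stays in the second
position of the returned list.
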
Let's see if we got any faster::

    async_client_nonblocking.py
    It took 5.08 seconds for a total waiting time of 11.00.
    Waited for 1.00 seconds.
    That's all.
    Waited for 5.00 seconds.
    That's all.
    Waited for 3.00 seconds.
    That's all.
    Waited for 2.00 seconds.
    That's all.

Yes! It works.
The total run time is about five seconds, which is the duration of the
longest wait.
We no longer have to wait for ``sum(waits)`` but only for roughly
``max(waits)``.

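
The same effect can be reproduced without a server.
The sketch below is only an approximation of the client: the hypothetical
``fake_get_page()`` replaces the network request with ``asyncio.sleep()``,
the waits are scaled down by a factor of ten, and ``asyncio.run()`` assumes
Python 3.7 or later:

.. code-block:: python

    import asyncio
    import time


    async def fake_get_page(wait):
        """Hypothetical stand-in for get_page(); only sleeps."""
        await asyncio.sleep(wait)
        return wait


    async def fetch_all(waits):
        """Run all waits concurrently and collect the results."""
        return await asyncio.gather(*(fake_get_page(w) for w in waits))


    waits = [0.1, 0.5, 0.3, 0.2]  # the waits 1, 5, 3, 2 scaled by 0.1
    start = time.perf_counter()
    asyncio.run(fetch_all(waits))
    elapsed = time.perf_counter() - start

    print('elapsed:    {:.2f} s'.format(elapsed))
    print('max(waits): {:.2f} s'.format(max(waits)))
    print('sum(waits): {:.2f} s'.format(sum(waits)))

The elapsed time should land close to ``max(waits)`` and well below
``sum(waits)``; the exact value depends on the machine.
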
We did quite a bit of work: sending a request and scanning the answer,
including finding out the encoding.
There should be a shorter way, as these steps always seem to be necessary for
getting the page content with the right encoding.
Therefore, in the next section, we will have a look at the high-level library
``aiohttp``, which can help to make our code shorter.

Exercise
++++++++

Add more waiting times to the list ``waits`` and see how this impacts
the run times of the blocking and the non-blocking implementation.
Try (positive) numbers that are all less than five.
Try numbers greater than five.

High-Level Approach with ``aiohttp``
------------------------------------
