This repository was archived by the owner on May 31, 2021. It is now read-only.

Commit 7110119

Add details about getting one page asynchronously.
1 parent fd65a9b commit 7110119

2 files changed: 60 additions, 5 deletions

examples/async_page.py

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
+# file: async_page.py
+
 """Get a "web page" asynchronously.
 """
 
@@ -27,8 +29,8 @@ async def get_page(host, port, wait=0):
         b'Connection: close',
         b'', b''
     ]))
-    msg_lines = []
     header = []
+    msg_lines = []
     async for raw_line in reader:
         line = raw_line.decode(ENCODING).strip()
         if not line.strip():

webscraper.rst

Lines changed: 57 additions & 4 deletions
@@ -46,11 +46,12 @@ response::
 
 Use different numbers and see how long it takes until the server responds.
 
-The implementation looks like this:
+The full implementation looks like this:
 
 .. literalinclude:: examples/simple_server.py
     :language: python
 
+Let's have a look into the details.
 This provides a simple multi-threaded web server:
 
 .. literalinclude:: examples/simple_server.py
@@ -80,10 +81,12 @@ The rest of the method contains the HTTP header and message.
 A Synchronous Client
 --------------------
 
-Our first attempt is synchronous:
+Our first attempt is synchronous.
+This is the full implementation:
 
 .. literalinclude:: examples/synchronous_client.py
 
+Again, we go through it step by step.
 
 While about 80 % of the websites use ``utf-8`` as encoding
 (provided by the default in ``ENCODING``), it is a good idea to actually use
@@ -145,11 +148,61 @@ That is the sum of all waiting times.
 Let's see if we can do better going asynchronously.
 
 
-Getting a Page Asynchronously
------------------------------
+Getting One Page Asynchronously
+-------------------------------
+
+This module contains a function that reads a page asynchronously,
+using the new Python 3.5 keywords ``async`` and ``await``:
+
+.. literalinclude:: examples/async_page.py
+
+As with the synchronous example, finding out the encoding of the page
+is a good idea.
+This function helps here by going through the lines of the HTTP header,
+which it gets as an argument, searching for ``charset`` and returning its value
+if found.
+Again, the default encoding is ``ISO-8859-1``:
 
 .. literalinclude:: examples/async_page.py
+    :language: python
+    :start-after: ENCODING = 'ISO-8859-1'
+    :end-before: async def get_page
+
+The next function is way more interesting because it actually works
+asynchronously:
+
+.. literalinclude:: examples/async_page.py
+    :language: python
+    :start-after: return ENCODING
 
+The function ``asyncio.open_connection()`` opens a connection to the given
+host and port.
+Calling it returns a coroutine.
+Using ``await``, which had to be ``yield from`` in Python versions prior
+to 3.5, it yields an instance of a ``StreamReader`` and one of a
+``StreamWriter``.
+These only work within the event loop.
+
+Now, we can send a ``GET`` request, supplying our waiting time by
+writing to the ``StreamWriter`` instance ``writer``.
+The request has to be in bytes.
+Therefore, we need to convert our strings into bytestrings.
+
+Next, we read header and message from the reader, which is a ``StreamReader``
+instance.
+We need to iterate over the reader by using the asynchronous variant of the
+``for`` loop::
+
+    async for raw_line in reader:
+
+
+Header and message are divided by an empty line.
+We just stop the iteration as soon as we find an empty line.
+Handing the header over to ``get_encoding()`` provides the encoding
+of the retrieved page.
+The ``.decode()`` method uses this encoding to convert the read bytes
+into strings.
+After closing the writer, we can return the message lines joined by newline
+characters.
 
 Getting Multiple Pages Asynchronously - Without Time Savings
 ------------------------------------------------------------
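
The steps described in the added ``webscraper.rst`` text (open a connection, send the request as bytes, read the header up to the empty line, determine the encoding, decode the message) can be sketched roughly as follows. This is a minimal reconstruction, not the contents of ``examples/async_page.py``: the body of ``get_encoding()`` is not shown in this commit and is an assumption, ``await writer.drain()`` is an addition, and the tiny local server (``_handle``), the ``demo()`` helper, the address and the response body are hypothetical stand-ins so that the sketch runs on its own.

```python
import asyncio

ENCODING = 'ISO-8859-1'


def get_encoding(header):
    """Return the charset named in the header lines, else the default.

    Assumed implementation; the commit only describes its behavior.
    """
    for line in header:
        if line.lstrip().lower().startswith('content-type'):
            for entry in line.split(';'):
                if entry.strip().lower().startswith('charset'):
                    return entry.split('=')[1].strip()
    return ENCODING


async def get_page(host, port, wait=0):
    """Request ``/<wait>`` from host:port and return the message as text."""
    reader, writer = await asyncio.open_connection(host, port)
    # The request has to be bytes, so every string is encoded first.
    writer.write(b'\r\n'.join([
        'GET /{} HTTP/1.0'.format(wait).encode(ENCODING),
        b'Host: ' + host.encode(ENCODING),
        b'Connection: close',
        b'', b''
    ]))
    await writer.drain()
    header = []
    msg_lines = []
    # Header lines end at the first empty line.
    async for raw_line in reader:
        line = raw_line.decode(ENCODING).strip()
        if not line:
            break
        header.append(line)
    encoding = get_encoding(header)
    # Everything after the empty line is the message.
    async for raw_line in reader:
        msg_lines.append(raw_line.decode(encoding).strip())
    writer.close()
    return '\n'.join(msg_lines)


async def _handle(reader, writer):
    """Hypothetical one-shot server: answer any request with a fixed page."""
    await reader.readuntil(b'\r\n\r\n')
    writer.write(b'HTTP/1.0 200 OK\r\n'
                 b'Content-Type: text/html; charset=utf-8\r\n'
                 b'\r\n'
                 b'<html>\r\n<body>gr\xc3\xbc\xc3\x9f</body>\r\n</html>\r\n')
    await writer.drain()
    writer.close()


async def demo():
    """Start the stand-in server on a free local port and fetch one page."""
    server = await asyncio.start_server(_handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await get_page('127.0.0.1', port)
    finally:
        server.close()


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

Because the stand-in server declares ``charset=utf-8``, the message is decoded as UTF-8 rather than with the ``ISO-8859-1`` default, which is the reason for reading the header before decoding the body.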
