Commit 455a767

Improve spelling, grammar, and style.
1 parent 9ed51f2 commit 455a767

File tree: 1 file changed (+16 −16 lines)


webscraper.rst

Lines changed: 16 additions & 16 deletions
@@ -4,13 +4,13 @@ Web Scraping
 
 Web scraping means downloading multiple web pages, often from different
 servers.
-Typically, there is a considerable waiting time involved between sending a
-request and receiving the answer.
+Typically, there is a considerable waiting time between sending a request and
+receiving the answer.
 Using a client that always waits for the server to answer before sending
-the next request, means spending most of time waiting.
-Here ``asyncio`` can help to send many request without waiting for a response
+the next request can lead to spending most of the time waiting.
+Here ``asyncio`` can help to send many requests without waiting for a response
 and collecting the answers later.
-The next examples show how a synchronous client spends most of the
+The following examples show how a synchronous client spends most of the time
 waiting and how to use ``asyncio`` to write an asynchronous client that
 can handle many requests concurrently.
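The idea this hunk describes, sending many requests at once and collecting the answers later, can be sketched with the standard library alone. This is an illustration, not the tutorial's code: ``fake_request`` and the ``waits`` values are made up, with ``asyncio.sleep()`` standing in for network latency.

```python
import asyncio
import time

async def fake_request(wait: float) -> float:
    # Stand-in for a request whose answer takes ``wait`` seconds to arrive.
    await asyncio.sleep(wait)
    return wait

async def main(waits):
    # Start all "requests" at once; gather collects the answers later.
    return await asyncio.gather(*(fake_request(w) for w in waits))

waits = [0.1, 0.2, 0.3]
start = time.perf_counter()
results = asyncio.run(main(waits))
elapsed = time.perf_counter() - start
```

Because the sleeps overlap, ``elapsed`` is close to the longest single wait rather than the sum of all waits.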

@@ -75,7 +75,7 @@ The request handler only has a ``GET`` method:
 It takes the last entry in the paths with ``self.path[1:]``, i.e.
 our ``2.5``, and tries to convert it into a floating point number.
 This will be the time the function is going to sleep, using ``time.sleep()``.
-This means waits 2.5 seconds until it answers.
+This means waiting 2.5 seconds until it answers.
 The rest of the method contains the HTTP header and message.
 
 A Synchronous Client
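A handler of the kind this hunk describes might look as follows. This is a hedged sketch, not the tutorial's actual code: ``SleepHandler``, ``parse_wait``, and the response message are assumed names chosen for illustration.

```python
import time
from http.server import BaseHTTPRequestHandler

def parse_wait(path: str) -> float:
    # "/2.5" -> 2.5: drop the leading "/" and convert to a float
    return float(path[1:])

class SleepHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sleep_time = parse_wait(self.path)
        time.sleep(sleep_time)  # wait the requested time before answering
        self.send_response(200)
        self.send_header('Content-type', 'text/html; charset=utf-8')
        self.end_headers()
        msg = f'Waited for {sleep_time} seconds.'
        self.wfile.write(msg.encode('utf-8'))
```

Requesting ``/2.5`` from such a server would therefore block the connection for 2.5 seconds before any body is sent.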
@@ -86,11 +86,11 @@ This is the full implementation:
 
 .. literalinclude:: examples/synchronous_client.py
 
-Again, we go through step-by-step.
+Again, we go through it step-by-step.
 
 While about 80 % of the websites use ``utf-8`` as encoding
 (provided by the default in ``ENCODING``), it is a good idea to actually use
-the encoding of that is specified by ``charset``.
+the encoding specified by ``charset``.
 This is our helper to find out what the encoding of the page is:
 
 .. literalinclude:: examples/synchronous_client.py
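A helper along the lines this hunk describes pulls the ``charset`` out of a ``Content-Type`` value. The function name and the ``utf-8`` fallback are assumptions based on the surrounding text, not the tutorial's actual implementation.

```python
DEFAULT_ENCODING = 'utf-8'  # assumption: mirrors the tutorial's ENCODING default

def get_encoding(content_type: str) -> str:
    """Return the charset from a Content-Type value, or the default."""
    # e.g. "text/html; charset=ISO-8859-1" -> "ISO-8859-1"
    for part in content_type.split(';'):
        name, _, value = part.strip().partition('=')
        if name == 'charset' and value:
            return value.strip()
    return DEFAULT_ENCODING
```

If no ``charset`` entry is present, the helper falls back to the default rather than guessing.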
@@ -120,8 +120,8 @@ Now, we want multiple pages:
 We just iterate over the waiting times and call ``get_page()`` for all
 of them.
 The function ``time.perf_counter()`` provides a time stamp.
-Taking two time stamps a different and calculating their difference
-provides the elapsed run time.
+Taking two time stamps at different points in time and calculating their
+difference provides the elapsed run time.
 
 Finally, we can run our client::
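The timing pattern the hunk describes is just two ``time.perf_counter()`` stamps around the work; the short ``time.sleep()`` here merely stands in for the page requests being measured.

```python
import time

start = time.perf_counter()      # first time stamp
time.sleep(0.01)                 # stand-in for the work being measured
elapsed = time.perf_counter() - start  # difference = elapsed run time
```

``perf_counter()`` is preferable to ``time.time()`` for this, since it is a monotonic clock with high resolution.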

@@ -145,7 +145,7 @@ and get this output::
 Because we wait for each call to ``get_page()`` to complete, we need to
 wait about 11 seconds.
 That is the sum of all waiting times.
-Let's see see if we can do better going asynchronously.
+Let's see if we can do better by going asynchronous.
 
 
 Getting One Page Asynchronously
@@ -159,7 +159,7 @@ using the new Python 3.5 keywords ``async`` and ``await``:
 As with the synchronous example, finding out the encoding of the page
 is a good idea.
 This function helps here by going through the lines of the HTTP header,
-which it gets as an argument, searching for ``charset`` and returning is value
+which it gets as an argument, searching for ``charset`` and returning its value
 if found.
 Again, the default encoding is ``ISO-8859-1``:

@@ -189,7 +189,7 @@ Therefore, we need to convert our strings into bytestrings.
 
 Next, we read header and message from the reader, which is a ``StreamReader``
 instance.
-We need to iterate over the reader by using the specific for loop for
+We need to iterate over the reader by using a special for loop for
 ``asyncio``:
 
 .. code-block:: python
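The special for loop in question is ``async for``, which awaits each line from the ``StreamReader``. A self-contained sketch, feeding the reader by hand instead of from a real socket (an illustration, not the tutorial's client):

```python
import asyncio

async def read_lines(reader: asyncio.StreamReader):
    lines = []
    # ``async for`` awaits each line until the reader signals EOF.
    async for raw_line in reader:
        lines.append(raw_line.decode('utf-8').strip())
    return lines

async def demo():
    reader = asyncio.StreamReader()
    # Feed data manually; with a real server this would come from the socket.
    reader.feed_data(b'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n')
    reader.feed_eof()
    return await read_lines(reader)

lines = asyncio.run(demo())
```

With a live connection, the same loop works on the reader returned by ``asyncio.open_connection()``.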
@@ -350,7 +350,7 @@ Exercise
 Add more waiting times to the list ``waits`` and see how this impacts
 the run times of the blocking and the non-blocking implementation.
 Try (positive) numbers that are all less than five.
-Try numbers greater than five.
+Then try numbers greater than five.
 
 High-Level Approach with ``aiohttp``
 ------------------------------------
@@ -376,8 +376,8 @@ The function to get one page is asynchronous, because of the ``async def``:
    :start-after: import aiohttp
    :end-before: def get_multiple_pages
 
-The arguments are the same as for the previous function to retrieve one page
-plus the additional argument ``session``.
+The arguments are the same as those for the previous function to retrieve one
+page plus the additional argument ``session``.
 The first task is to construct the full URL as a string from the given
 host, port, and the desired waiting time.
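Constructing the full URL from host, port, and waiting time, as the last sentence describes, amounts to simple string formatting. The helper name and parameter names here are assumptions for illustration; the tutorial may inline this instead.

```python
def make_url(host: str, port: int, wait: float) -> str:
    # Build e.g. "http://localhost:8000/2.5" from its parts; the path
    # component is the waiting time the server should sleep for.
    return f'http://{host}:{port}/{wait}'

url = make_url('localhost', 8000, 2.5)
```

The resulting string is what gets passed to ``session.get()`` in the ``aiohttp`` version.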
