This article shows you, step by step, how to use the Python requests module to retrieve a web page's content by its URL and then save that content to a local file.
1. Steps To Use The Python Requests Module To Get Web Page Content By URL.
- Open a terminal and run the command
pip show requests
to make sure the Python requests module has been installed.
$ pip show requests
Name: requests
Version: 2.22.0
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: me@kennethreitz.org
License: Apache 2.0
Location: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires: chardet, idna, urllib3, certifi
Required-by:
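The same check can also be made from Python code. The sketch below assumes Python 3.8 or later, where the standard-library importlib.metadata module exists (on the Python 3.7 interpreter shown in this article, requests.__version__ gives the version after import requests). The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        # Reads the installed distribution's metadata, similar to `pip show`.
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


if __name__ == "__main__":
    # Prints the version string, or None if requests is not installed.
    print(installed_version("requests"))
```

If it prints None, install the module as described in the next step; running pip as python -m pip install requests makes sure the package goes into the same interpreter that will import it.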
- If your Python environment does not contain the Python requests module, you can run the command
pip install requests
to install it.
- Now run the command python in the terminal to enter the Python interactive console.
$ python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
- Import the requests module.
>>> import requests
- Define a variable to contain a web page URL.
>>> web_page_url = "http://www.google.com"
- Add a User-Agent header so that the request looks like it comes from a real web browser. Store the header in a dictionary object.
>>> headers = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36' }
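To see why the custom header matters, the sketch below, which assumes requests is installed, prints the User-Agent that requests sends by default, obtained from requests.utils.default_user_agent() as a string of the form python-requests/&lt;version&gt;. It then shows one option not covered in this article: storing the browser-like header on a requests.Session, so every request sent through that session carries it without passing headers= each time. The constant name BROWSER_HEADERS is illustrative.

```python
import requests

# The browser-like User-Agent from the step above (BROWSER_HEADERS is an illustrative name).
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

# The header requests sends when you pass none yourself: "python-requests/<version>".
print(requests.utils.default_user_agent())

# A Session merges its default headers into every request it sends.
session = requests.Session()
session.headers.update(BROWSER_HEADERS)

# Session headers are case-insensitive, so 'user-agent' would work here too.
print(session.headers['User-Agent'])
```

A later session.get(web_page_url) would then send the Mozilla User-Agent automatically.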
- Send an HTTP GET request to the web server with the requests module's get method, which returns a Response object.
>>> response = requests.get(url=web_page_url, headers=headers)
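The call above sets no timeout and does not check the HTTP status code, so a slow server can keep it waiting indefinitely and an error page would be treated as the real content. The sketch below wraps the request in a hypothetical helper, fetch_page, with an assumed 10-second default timeout. It passes timeout= to requests.get and calls response.raise_for_status(), which raises requests.HTTPError for 4xx and 5xx responses.

```python
from typing import Dict, Optional

import requests


def fetch_page(url: str,
               headers: Optional[Dict[str, str]] = None,
               timeout: float = 10.0) -> str:
    """Download `url` and return its decoded text (hypothetical helper)."""
    # Without `timeout`, requests waits for the server with no time limit.
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

For example, fetch_page(web_page_url, headers=headers) returns the same kind of text that response.text gives in the next step, or raises an exception if the download fails.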
- Get the requested web page's text content from the response.text attribute.
>>> page_content = response.text
- Print out the text to verify it is correct.
>>> print(page_content)
- Write the web page content to a local file to save it.
>>> with open('./google.html', 'w', encoding='utf8') as fp:
...     fp.write(page_content)
...
131502
>>> print('Saved web page content ' + web_page_url + ' successfully.')
Saved web page content http://www.google.com successfully.
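The save step can also be written as a reusable function. This is a sketch using pathlib in place of open(); the helper name save_page and the creation of missing parent folders are illustrative additions. Like fp.write() above, Path.write_text() returns the number of characters written.

```python
from pathlib import Path


def save_page(text: str, path: str) -> int:
    """Write `text` to `path` as UTF-8 and return the number of characters written.

    `save_page` is a hypothetical helper written for this example.
    """
    target = Path(path)
    # Create any missing parent folders, e.g. "pages" in "./pages/google.html".
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_text(text, encoding="utf-8")
```

For example, save_page(page_content, './google.html') writes the same file as the with block above. To keep the server's original bytes without decoding and re-encoding them, Path('./google.html').write_bytes(response.content) is an alternative.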
- Read the local file back to verify that it contains the web page content.
>>> with open('./google.html', 'r', encoding='utf8') as fp:  # Open the local file in read mode.
...     line = fp.readline()  # Read one line of text.
...     # Quit the loop only when the read-out text's length is 0, which means end of file.
...     while len(line) > 0:
...         print(line, end='')  # The line already ends with a newline character.
...         line = fp.readline()  # Read the next line.
...
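Iterating directly over the file object is the more common way to read a text file line by line, because the loop stops by itself at end of file. The sketch below uses a hypothetical helper name, print_file; it prints each line with end='' because every line read from the file already ends with its newline, and it returns the number of lines printed so the result is easy to check.

```python
def print_file(path: str) -> int:
    """Print a UTF-8 text file line by line and return how many lines were printed.

    `print_file` is a hypothetical helper written for this example.
    """
    count = 0
    with open(path, "r", encoding="utf-8") as fp:
        # The file object yields one line per iteration and stops at end of file.
        for line in fp:
            print(line, end="")  # `line` already ends with "\n"
            count += 1
    return count
```

For example, print_file('./google.html') prints the saved page and returns its line count.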