reading-notes

Project maintained by will-ing Hosted on GitHub Pages — Theme by mattgraham

Web scraping

Web scraping is a technique to automatically access and extract large amounts of information from a website, which can save a huge amount of time and effort.

Most sites prohibit you from using the data for commercial purposes.
Make sure you are not downloading data at too rapid a rate, because a flood of requests can overload the website.
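One way to keep the download rate low is to pause between requests. The sketch below is only an illustration: the names fetch_url and fetch_politely and the one-second default delay are assumptions made for this example, not part of any library, and the right delay depends on the site.

```python
import time
import urllib.request


def fetch_url(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_politely(urls, delay_seconds=1.0, fetch=fetch_url, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    fetch and sleep are parameters (hypothetical, for this sketch) so the
    loop can be tried out without touching the network.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

Calling fetch_politely(list_of_urls, delay_seconds=2.0) would then download the pages one at a time with a two-second gap between them.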

Steps for scraping

Inspect the website and look for the links to the data you want.
Import the libraries: requests, urllib.request, time and Beautiful Soup (bs4).
Fetch the page.
Extract the data from it.
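The fetching and extracting steps might look roughly like the sketch below. It uses urllib.request from the standard library for the download and Beautiful Soup for the parsing. The function names fetch_html and extract_links are made up for this example, and it assumes the data of interest is reachable through ordinary <a href="..."> links.

```python
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_html(url):
    """Fetching: download the page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def extract_links(html, base_url):
    """Extracting: parse the HTML and return an absolute URL for every <a href>."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

For a real page the two would be chained, for example extract_links(fetch_html(url), url), where url is whatever page was inspected in the first step.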

Beautiful Soup provides a few useful methods for navigating, searching, and modifying the parse tree.
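A small illustration of the three kinds of operation, using a made-up HTML snippet (the markup and variable names are invented for this example):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Reading notes</title></head>
<body>
<p class="intro">First <b>bold</b> paragraph.</p>
<p>Second paragraph with a <a href="/notes">link</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigating: move through the tree by tag name and by parent/child relations.
title_text = soup.title.string        # text inside <title>
bold_parent = soup.b.parent.name      # name of the tag that contains <b>

# Searching: find_all returns every match, find returns the first one.
paragraph_count = len(soup.find_all("p"))
intro = soup.find("p", class_="intro")
intro_text = intro.get_text()
link_target = soup.find("a")["href"]

# Modifying: change attributes and text in place.
soup.a["href"] = "/reading-notes"
soup.b.string = "strong"
```

After the last two lines the tree itself has changed, so intro.get_text() reflects the new text inside <b>.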

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
Beautiful Soup sits on top of popular Python parsers like lxml and html5lib.
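The encoding behaviour and the choice of parser can be seen in a short sketch. The sample bytes are invented, and the encoding is passed explicitly with from_encoding so the example does not depend on automatic detection. The parser is chosen by the second argument: "html.parser" ships with Python, while "lxml" and "html5lib" only work if those packages are installed.

```python
from bs4 import BeautifulSoup

# Bytes encoded as Latin-1, as they might arrive from an older web server.
raw = "<p>café</p>".encode("iso-8859-1")

# The second argument names the underlying parser.
soup = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")

text = soup.p.string          # incoming document converted to a Unicode str
utf8_bytes = soup.p.encode()  # outgoing bytes, UTF-8 by default
```

Here text is an ordinary Python string, and utf8_bytes holds the same paragraph re-encoded as UTF-8.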
