reading-notes


Project maintained by will-ing Hosted on GitHub Pages — Theme by mattgraham

Web scraping

Web scraping is a technique to automatically access and extract large amounts of information from a website, which can save a huge amount of time and effort.

Two cautions before you start:

  1. Most sites prohibit you from using the data for commercial purposes.
  2. Do not download data too rapidly; too many requests in a short time can overload the website.
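One way to keep the download rate low is to pause between requests. The sketch below is only an illustration: the function name `fetch_politely`, the one-second default delay, and the optional `fetch` parameter are assumptions made for this example, not anything a particular site requires.

```python
import time
import urllib.request


def fetch_politely(urls, delay_seconds=1.0, fetch=None):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page body.
    It defaults to a plain urllib download (hypothetical helper for this sketch).
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()

    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request after the first one.
            time.sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

Calling `fetch_politely(list_of_urls, delay_seconds=2.0)` would download the pages one at a time with a two-second gap. A suitable delay depends on the site, so treat the default as a placeholder.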

Steps for scraping

  1. Inspect the website, looking for the links you want.
  2. Import the libraries: requests, urllib.request, time, and Beautiful Soup (bs4).
  3. Fetch the page.
  4. Extract the data.
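The fetch and extract steps above can be sketched roughly as follows. This is a minimal example under some assumptions: it uses `urllib.request` rather than `requests`, the helper names `fetch_html` and `extract_links` are made up for illustration, and the `example.com` URLs and the inline `SAMPLE_HTML` page are placeholders so the sketch runs without a network connection.

```python
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 3: download the page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Step 4: return the absolute URL of every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Inline stand-in for a downloaded page (placeholder content).
SAMPLE_HTML = """
<html><body>
  <a href="/data/2020.csv">2020</a>
  <a href="https://example.com/data/2021.csv">2021</a>
  <a name="top">no href</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
print(links)
```

On a real site, the HTML would come from `fetch_html(url)` instead of `SAMPLE_HTML`, with a pause from the `time` module between requests.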

Beautiful Soup provides a few useful methods for navigating, searching, and modifying the parse tree.
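A small sketch of each of the three operations, using a made-up HTML snippet (the tag contents and the `intro` class are placeholders chosen for this example):

```python
from bs4 import BeautifulSoup

html = (
    "<html><head><title>Notes</title></head><body>"
    "<p class='intro'>Hello</p><p>World</p>"
    "<a href='/next'>Next</a>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk down the tree by tag name.
title_text = soup.title.string
first_paragraph_text = soup.body.p.get_text()

# Searching: find one tag or all matching tags.
paragraphs = soup.find_all("p")
intro = soup.find("p", class_="intro")
link_href = soup.find("a")["href"]

# Modifying: change text and attributes in place.
intro.string = "Hello, scraper"
soup.a["href"] = "/previous"

print(soup.prettify())
```

Navigation by attribute (`soup.title`, `soup.body.p`) returns only the first matching tag, while `find_all` returns every match.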
