Web scraping
Web scraping is a technique for automatically accessing and extracting large amounts of information from websites, which can save a great deal of time and effort compared with copying it by hand.
- Many sites' terms of service prohibit using their data for commercial purposes, so check them before you scrape.
- Don't download pages too quickly: a high request rate can overload the website's server and may get you blocked.
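A minimal sketch of throttling, assuming a fixed pause between requests is acceptable for the target site. It uses only the standard library (urllib.request and time); the names fetch and fetch_politely and the 2-second default delay are illustrative choices, not a recommended rate for any particular site.

```python
import time
import urllib.request


def fetch(url, timeout=10.0):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_politely(urls, delay=2.0, fetcher=fetch):
    """Fetch each URL in turn, sleeping `delay` seconds between requests.

    `fetcher` is injectable so the throttling logic can be tested offline.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause before every request except the first
        pages[url] = fetcher(url)
    return pages
```

Usage would look like fetch_politely(["https://example.com/a", "https://example.com/b"]), where the URLs are placeholders. A fixed delay is the simplest policy; a site may state its own preferred rate, which should take precedence.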
Steps for scraping
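- Inspect the website (for example with the browser's developer tools) to find the links and HTML elements that hold the data
- Import the libraries:
  requests, urllib.request, time, Beautiful Soup (bs4)
- Fetch the page
- Extract the data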
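The fetch and extract steps can be sketched with requests and Beautiful Soup, assuming the data you want sits in <a href> links found while inspecting the page. The function names get_page and extract_links are hypothetical; a real scraper would target whatever tags or CSS classes the inspection step turned up.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def get_page(url, timeout=10):
    """Fetch step: download the page, raising an exception on 4xx/5xx responses."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_links(html, base_url):
    """Extract step: return absolute URLs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin turns relative hrefs such as "/docs" into full URLs.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

Keeping fetching and extraction in separate functions means the extraction logic can be tested against saved HTML without touching the network, e.g. extract_links(get_page(url), url) in production and extract_links(saved_html, url) in tests.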
Beautiful Soup provides a few simple methods for navigating, searching, and modifying the parse tree.
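The three kinds of operation can be seen on a small, made-up page. This sketch assumes Beautiful Soup 4 (the bs4 package) with Python's built-in html.parser; select() relies on the soupsieve package, which recent bs4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

# A small hypothetical page.
html = """
<html><body>
  <h1>Products</h1>
  <ul id="items">
    <li class="item">Apple <span class="price">1.20</span></li>
    <li class="item">Pear <span class="price">0.90</span></li>
  </ul>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Navigating: follow tag names through the tree.
heading = soup.h1.get_text()                 # text of the first <h1>
first_item = soup.find("ul", id="items").li  # first <li> inside the list

# Searching: find_all() filters by tag name and attributes; select() takes CSS selectors.
prices = [float(s.get_text()) for s in soup.find_all("span", class_="price")]
names = [li.contents[0].strip() for li in soup.select("ul#items > li.item")]

# Modifying: delete the price tags and replace the heading text in place.
for span in soup.find_all("span", class_="price"):
    span.decompose()
soup.h1.string = "Products (no prices)"

print(heading, names, prices)
```

Note that the search results were collected before the tree was modified; after decompose() the price tags are gone from soup.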
- Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
- Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, letting you try different parsing strategies or trade speed for flexibility.
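Both points can be sketched briefly. The first part passes raw UTF-8 bytes with a <meta charset> declaration and shows that the text comes back as a Python str and goes out as UTF-8 bytes. The second part is a hypothetical helper, make_soup, that tries lxml, then html5lib, then Python's built-in html.parser, assuming the first two may not be installed; Beautiful Soup raises FeatureNotFound when a requested parser is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Unicode in: pass raw bytes and Beautiful Soup decodes them.
# The <meta charset> declaration tells it which encoding the bytes use.
raw = b'<html><head><meta charset="utf-8"></head><body><p>caf\xc3\xa9</p></body></html>'
soup = BeautifulSoup(raw, "html.parser")
text = soup.p.get_text()       # a Python str (Unicode)
print(soup.original_encoding)  # the encoding Beautiful Soup detected
output = soup.encode()         # UTF-8 out: bytes, UTF-8 by default

# Parser choice: lxml and html5lib are separate installs; html.parser ships with Python.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup, parsers=PREFERRED_PARSERS):
    """Return (soup, parser_name) using the first parser that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            continue  # parser not installed; try the next one
    raise RuntimeError(f"none of these parsers is installed: {parsers}")
```

The parsers can build different trees from badly formed HTML, so it is worth pinning one parser explicitly once a scraper works rather than relying on whichever happens to be installed.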