Description: With more than a couple of billion web pages on the Internet, it is tempting to see how much of this information one can mine for fun or for profit. In this video, I walk you through programming a web crawler that fetches pages and parses their content so it can be converted into a useful format.

The web crawler we create in this tutorial consists of two main parts:
Document fetching engine: fetches the raw HTML page data from a website.
Document parsing engine: uses an HTML DOM parser to parse the page and extract useful information from it.
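The two parts above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it assumes Python 3, the standard-library `urllib`, and the `bs4` package (BeautifulSoup 4) with its built-in `html.parser` backend. The function names `fetch_page` and `parse_page`, the User-Agent string, and the choice to extract the title and links are all hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_page(url, timeout=10):
    """Document fetching engine: return the raw HTML of `url` as text."""
    # The User-Agent value is an arbitrary placeholder chosen for this sketch.
    request = Request(url, headers={"User-Agent": "crawler-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_page(html, base_url):
    """Document parsing engine: extract the title and absolute link URLs."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Only anchors that carry an href attribute are considered links.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return {"title": title, "links": links}
```

A crawl step would then be something like `parse_page(fetch_page(url), url)`; the returned links could be queued for fetching in turn. Keeping the parser independent of the network call makes it easy to try on a saved HTML string first.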
Once you have learned how to parse the data, the next step is to store it in a database. This will allow you to run further analysis on the data and derive interesting insights.

We shall use the Python language and the BeautifulSoup DOM parser to pull this off. The video is very interactive, and I use a "type as you go" approach to help you understand the programming techniques.

The code for this tutorial is available for download.
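As a rough idea of what the storage step might look like, here is a small sketch using SQLite from Python's standard library. The description does not say which database the video uses, so SQLite is an assumption, as are the `pages` table layout and the helper names `open_store`, `save_page`, and `pages_by_link_count`.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the pages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, link_count INTEGER)"
    )
    return conn


def save_page(conn, url, title, links):
    """Insert one crawled page, overwriting any earlier row for the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, link_count) VALUES (?, ?, ?)",
        (url, title, len(links)),
    )
    conn.commit()


def pages_by_link_count(conn):
    """Example analysis query: pages ordered by how many links they contain."""
    return conn.execute(
        "SELECT url, title, link_count FROM pages ORDER BY link_count DESC, url"
    ).fetchall()
```

Keying the table on the URL means that re-crawling a page updates its row instead of duplicating it. Passing a file path to `open_store` instead of the default `":memory:"` keeps the data between runs.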
Disclaimer: We are an infosec video aggregator, and this video is linked from an external website. The original author may be different from the user re-posting or linking it here. Please do not assume they are the same without verifying.
Very nice tutorial.