SecurityTube

Crawling The Web For Fun And Profit

Description: With more than a couple of billion web pages on the Internet, it is tempting to see how one can mine this information for fun or for profit. In this video, I walk you through how to program a web crawler that fetches pages and parses their content so it can be converted into a useful format. The web crawler we create in this tutorial consists mainly of two parts:

Document fetching engine: This fetches the raw HTML page data from a website.
Document parsing engine: This uses an HTML DOM parser to parse the page and extract useful information from it.
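The two engines above can be sketched roughly as follows. The video uses BeautifulSoup; this sketch swaps in the standard library's `html.parser` (and `urllib.request` for fetching) so it runs without extra packages. The names `fetch_page`, `parse_page`, `crawl`, and the User-Agent string are hypothetical and are not taken from the tutorial's downloadable code.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Document fetching engine: download the raw HTML of one page."""
    # Hypothetical User-Agent; some servers reject urllib's default one.
    req = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class PageParser(HTMLParser):
    """Document parsing engine: collect the page title and absolute links."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html: str, base_url: str) -> dict:
    """Turn raw HTML into a small dict with the page URL, title, and links."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "title": parser.title.strip(), "links": parser.links}


def crawl(start_url: str, fetch=fetch_page, max_pages: int = 10) -> list:
    """Breadth-first crawl: fetch a page, parse it, then queue unseen links."""
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        page = parse_page(html, url)
        pages.append(page)
        for link in page["links"]:
            # Only follow http(s) links we have not queued before.
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    html = '<title>Demo</title><a href="/a">A</a><a href="http://x.org/b">B</a>'
    print(parse_page(html, "http://example.com/index.html"))
```

The fetch function is passed into `crawl` as a parameter, so the crawl loop can be exercised against canned HTML without touching the network.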

Once you have learned how to parse the data, the next step is to store it in a database. This lets you run further analysis on the data and derive interesting insights. We shall use the Python language and the BeautifulSoup DOM parser to pull this off. The video is very interactive, and I use a "type as you go" methodology to help you understand the programming techniques. The code for this tutorial is available for download.
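For the storage step, a minimal sketch using Python's built-in `sqlite3` module is shown below. It assumes each parsed page is a dict with `url`, `title`, and `links` keys; that shape, the table layout, and the helper names `init_db`, `save_page`, and `most_linked` are hypothetical and not necessarily what the video uses.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for crawled pages and the links between them."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS pages (
            url   TEXT PRIMARY KEY,
            title TEXT
        );
        CREATE TABLE IF NOT EXISTS links (
            src TEXT NOT NULL,
            dst TEXT NOT NULL,
            PRIMARY KEY (src, dst)
        );
        """
    )


def save_page(conn: sqlite3.Connection, page: dict) -> None:
    """Store one parsed page; re-crawling the same URL overwrites its title."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
            (page["url"], page["title"]),
        )
        # Duplicate links from the same page are ignored by the primary key.
        conn.executemany(
            "INSERT OR IGNORE INTO links (src, dst) VALUES (?, ?)",
            [(page["url"], dst) for dst in page["links"]],
        )


def most_linked(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Example analysis: URLs linked to from the most distinct pages."""
    return conn.execute(
        "SELECT dst, COUNT(*) AS n FROM links "
        "GROUP BY dst ORDER BY n DESC, dst LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    save_page(conn, {"url": "http://a/", "title": "A", "links": ["http://c/", "http://b/"]})
    save_page(conn, {"url": "http://b/", "title": "B", "links": ["http://c/"]})
    print(most_linked(conn))
```

Counting inbound links per URL is only one illustrative query; once pages and links are in tables, other analyses are ordinary SQL.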

Tags: programming

Disclaimer: We are an infosec video aggregator, and this video is linked from an external website. The original author may be different from the user re-posting or linking it here. Please do not assume the authors are the same without verifying.

Comments:

Evildethow1983 on Tue 05 Feb 2013

Very nice tutorial.

Video Posted By: SecurityTube_Bot
17281 Views, Posted Mon 21 Feb 2011