Step 1: Install Required Libraries
First, you need to install the necessary libraries if you haven't already. You can do this using pip:
```sh
pip install requests beautifulsoup4
```
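If you want to confirm the installation worked, a quick check (a minimal sketch; it simply imports both packages and prints their versions) is:

```python
import requests
import bs4

# Both packages expose a __version__ attribute
print(requests.__version__, bs4.__version__)
```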
Step 2: Import the Libraries
```python
import requests
from bs4 import BeautifulSoup
```
Step 3: Fetch the Web Page
Use the `requests` library to fetch the content of the web page.
```python
url = 'http://example.com'
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    page_content = response.text
else:
    print(f"Failed to retrieve the web page. Status code: {response.status_code}")
```
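As an alternative to checking the status code by hand, a common pattern is to let `requests` raise an exception for error responses. A minimal sketch (the `timeout` value here is an illustrative choice, not a requirement):

```python
import requests

url = 'http://example.com'
try:
    # timeout prevents the request from hanging indefinitely
    response = requests.get(url, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    page_content = response.text
except requests.RequestException as exc:
    print(f"Failed to retrieve the web page: {exc}")
```

This catches network failures (timeouts, DNS errors) as well as HTTP error codes in one place.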
Step 4: Parse the HTML Content
Use `BeautifulSoup` to parse the HTML content.
```python
soup = BeautifulSoup(page_content, 'html.parser')
```
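To confirm parsing worked, you can inspect a few well-known parts of the tree, for example (assuming the page has a `<title>` tag):

```python
# Tags are accessible as attributes of the parsed tree
if soup.title is not None:
    print(soup.title.string)  # the text inside <title>
```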
Step 5: Extract the Data
Find the elements you want to scrape. This step will vary depending on the structure of the HTML content.
```python
# Example: extract all the <h1> headings
headings = soup.find_all('h1')
for heading in headings:
    print(heading.text.strip())
```
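The same pattern works for any tag. For instance, a short sketch of collecting every link target on the page using `find_all('a')`:

```python
# Example: extract all hyperlink targets
for link in soup.find_all('a'):
    href = link.get('href')  # returns None if the <a> tag has no href
    if href:
        print(href)
```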
Step 6: Putting It All Together
Here is a complete example script that fetches a web page and prints all the headings (`<h1>` tags).
```python
import requests
from bs4 import BeautifulSoup


def fetch_web_page(url):
    """Fetch a web page and return its HTML, or None on failure."""
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to retrieve the web page. Status code: {response.status_code}")
        return None


def parse_html(html_content):
    """Parse raw HTML into a BeautifulSoup tree."""
    return BeautifulSoup(html_content, 'html.parser')


def extract_headings(soup):
    """Return the stripped text of every <h1> tag."""
    headings = soup.find_all('h1')
    return [heading.text.strip() for heading in headings]


def main():
    url = 'http://example.com'
    html_content = fetch_web_page(url)
    if html_content:
        soup = parse_html(html_content)
        headings = extract_headings(soup)
        for heading in headings:
            print(heading)


if __name__ == '__main__':
    main()
```
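Some servers reject requests that carry no User-Agent header, so in practice you may want to send one. A sketch of a variant of `fetch_web_page` (the header value is an illustrative placeholder, not a required string):

```python
def fetch_web_page(url):
    # Identify the client; some servers block requests without a User-Agent
    headers = {'User-Agent': 'my-scraper/0.1'}
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    print(f"Failed to retrieve the web page. Status code: {response.status_code}")
    return None
```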