How to Scrape Headlines from a News Website Using Beautiful Soup in Python

Scrape headline text from a news website using requests and Beautiful Soup with a CSS selector.

Medium | Python 3.9+ | Jun 27, 2026 | Files & data

web scraping beautifulsoup requests news html parsing

Requires third-party packages — install first

pip install requests beautifulsoup4

Python code

import requests
from bs4 import BeautifulSoup

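def scrape_headlines(url: str, selector: str) -> list[str]:
    """
    Scrape headlines from a news website using Beautiful Soup.

    Args:
        url: The URL of the news website.
        selector: CSS selector for headline elements.

    Returns:
        List of headline texts.
    """
    # Set a timeout: without one, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=10)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # select() returns every element matching the CSS selector, in document order.
    headline_elements = soup.select(selector)
    headlines = [element.get_text(strip=True) for element in headline_elements]
    return headlines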

if __name__ == "__main__":
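    # Example: scrape BBC News top stories.
    # The selector reflects the page markup at the time of writing; if nothing
    # is printed, inspect the page and update the selector.
    url = "https://www.bbc.com/news"
    selector = "h3.gs-c-promo-heading"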
    headlines = scrape_headlines(url, selector)
    for i, headline in enumerate(headlines[:10], 1):
        print(f"{i}. {headline}")

Output

stdout

1. Ukraine war: Three years on
2. Trump's trade tariffs: What do they mean?
3. The tech billionaires shaping AI policy
4. Global temperatures hit record high
5. Inside the world's largest refugee camp
6. How to spot AI-generated images
7. The rise of electric cars in developing nations
8. New study reveals benefits of meditation
9. Why are bees disappearing?
10. The future of space exploration

How it works

The requests.get call fetches the page HTML, and raise_for_status raises an exception on a 4xx or 5xx response, so the script stops instead of parsing an error page. BeautifulSoup(response.text, 'html.parser') parses the markup into a searchable tree using Python's built-in HTML parser. soup.select(selector) returns every element matching the CSS selector, in document order; in the example, h3.gs-c-promo-heading matches h3 elements that carry the class gs-c-promo-heading. A list comprehension then calls get_text(strip=True) on each element, which trims leading and trailing whitespace from every piece of text inside it. The result is a list of headline strings ready for display or further processing.
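
The parsing step can be tried without any network access. The sketch below runs the same select and get_text calls against a small inline HTML string; the markup is made up for illustration and is not the real BBC page. It also shows one behavior worth knowing: with strip=True and no separator, text from nested tags is joined without a space, so passing " " as the separator is often the safer choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched news page.
SAMPLE_HTML = """
<main>
  <h3 class="gs-c-promo-heading">  First headline  </h3>
  <h3 class="gs-c-promo-heading"><a href="/news/2">Second <em>headline</em></a></h3>
  <h3 class="other">Not a match</h3>
</main>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
elements = soup.select("h3.gs-c-promo-heading")

# strip=True trims each text fragment, then joins the fragments with "".
plain = [el.get_text(strip=True) for el in elements]
# A " " separator keeps words from nested tags apart.
spaced = [el.get_text(" ", strip=True) for el in elements]

print(plain)   # ['First headline', 'Secondheadline']
print(spaced)  # ['First headline', 'Second headline']
```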

Common mistakes

Forgetting to install both requests and beautifulsoup4 via pip before importing.
Using a generic or outdated CSS selector that doesn't match the current site structure.
Not calling response.raise_for_status(), which lets error pages (such as a 404 or 500 response) be parsed as if they were real content.
Assuming the website allows scraping; always check robots.txt and terms of service.
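
The robots.txt check can be automated with the standard library's urllib.robotparser. This is a minimal sketch: the rules, the "my-scraper" user agent, and the example.com URLs are placeholders, and the rules are parsed from a string so the example runs offline. It does not cover a site's terms of service, which still need to be read by a person.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules; a real site's robots.txt will differ.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/news"))          # True
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/page"))  # False
```

Against a live site, the parser can download the file itself: call parser.set_url() with the site's robots.txt address, then parser.read(), before calling can_fetch.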

Variations

Use `soup.find_all('h2', class_='headline')` instead of a CSS selector when matching on one tag name and one class is enough.
Pass a `User-Agent` header with `requests.get(url, headers=...)`; some sites block the default python-requests user agent.
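
Both variations can be sketched together. The header value, the function names, and the sample markup below are illustrative assumptions; the h2 tag and headline class come from the variation above and will need adjusting for a real site. The fetch function is defined but not called, so the example runs offline.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string; identify your own tool and contact details here.
HEADERS = {"User-Agent": "headline-scraper/0.1 (contact: you@example.com)"}


def fetch_html(url: str) -> str:
    """Fetch a page with a custom User-Agent header and a timeout."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def headlines_with_find_all(html: str) -> list[str]:
    """Extract text from <h2> elements whose class list includes 'headline'."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("h2", class_="headline")
    return [tag.get_text(" ", strip=True) for tag in tags]


# Made-up markup for an offline demonstration.
sample = (
    '<h2 class="headline">Alpha</h2>'
    '<h2 class="teaser">Beta</h2>'
    '<h2 class="headline lead">Gamma</h2>'
)
print(headlines_with_find_all(sample))  # ['Alpha', 'Gamma']
```

Note that class_='headline' matches any element that has headline among its classes, which is why the element with class "headline lead" is included.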

Real-world use cases

Monitoring competitor news or industry trends by automatically collecting headlines daily.
Building a personal news aggregator that pulls top stories from multiple sources.
Populating a dataset of news article titles for natural language processing or sentiment analysis.
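
For the aggregator and dataset use cases, scraped headlines usually need to be merged and stored. The sketch below is one possible approach, not part of the original snippet: the function name, the column layout, and the source labels are assumptions. It writes headlines from several sources to CSV and skips case-insensitive duplicates, using an in-memory buffer so it runs without touching the file system.

```python
import csv
import io
from datetime import date


def write_headlines(headlines_by_source: dict[str, list[str]], out, day: date) -> int:
    """Write unique headlines as CSV rows (date, source, headline); return the row count."""
    writer = csv.writer(out)
    writer.writerow(["date", "source", "headline"])
    seen = set()
    count = 0
    for source, headlines in headlines_by_source.items():
        for headline in headlines:
            key = headline.casefold()  # treat "Beta" and "beta" as the same headline
            if key in seen:
                continue
            seen.add(key)
            writer.writerow([day.isoformat(), source, headline])
            count += 1
    return count


# Hypothetical data standing in for the results of several scrape_headlines calls.
buffer = io.StringIO()
written = write_headlines(
    {"site-a": ["Alpha", "Beta"], "site-b": ["beta", "Gamma"]},
    buffer,
    date(2026, 6, 27),
)
print(written)  # 3
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="", encoding="utf-8") in place of the buffer.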
