
How to Scrape Headlines from a News Website Using Beautiful Soup in Python

Scrape headline text from a news website using requests and Beautiful Soup with a CSS selector.

Medium Python 3.9+ Jun 27, 2026 Files & data

Requires third-party packages — install first
pip install requests beautifulsoup4

Python code

import requests
from bs4 import BeautifulSoup

def scrape_headlines(url: str, selector: str) -> list[str]:
    """
    Scrape headlines from a news website using Beautiful Soup.
    
    Args:
        url: The URL of the news website.
        selector: CSS selector for headline elements.
    
    Returns:
        List of headline texts.
    """
    # requests has no default timeout; set one so the call cannot hang indefinitely.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    headline_elements = soup.select(selector)
    headlines = [element.get_text(strip=True) for element in headline_elements]
    return headlines

if __name__ == "__main__":
    # Example: scrape BBC News top stories.
    # The selector is site-specific and may be outdated; inspect the live page to confirm it.
    url = "https://www.bbc.com/news"
    selector = "h3.gs-c-promo-heading"
    headlines = scrape_headlines(url, selector)
    for i, headline in enumerate(headlines[:10], 1):
        print(f"{i}. {headline}")

Output

Example output (actual headlines depend on the live page at the time of the request):
1. Ukraine war: Three years on
2. Trump's trade tariffs: What do they mean?
3. The tech billionaires shaping AI policy
4. Global temperatures hit record high
5. Inside the world's largest refugee camp
6. How to spot AI-generated images
7. The rise of electric cars in developing nations
8. New study reveals benefits of meditation
9. Why are bees disappearing?
10. The future of space exploration

How it works

requests.get fetches the page HTML, and raise_for_status raises an exception on 4xx and 5xx responses so the script stops instead of parsing an error page. BeautifulSoup(response.text, 'html.parser') parses the markup into a searchable tree. soup.select(selector) returns every element matching the CSS selector; in the example, h3.gs-c-promo-heading targets the h3 headline elements on BBC News, and it returns an empty list if the site's markup no longer uses that class. The list comprehension extracts each element's text with get_text(strip=True), which strips leading and trailing whitespace from each text fragment. The result is a list of headline strings ready for display or further processing.
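The parsing step can be tried without any network access. The following is a minimal sketch, assuming a small inline HTML string (SAMPLE_HTML and the helper extract_headlines are hypothetical names, not part of the original code). It also passes a space as the separator to get_text, because with strip=True alone the text fragments of nested tags are joined with no space between them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded news page.
SAMPLE_HTML = """
<html><body>
  <h3 class="headline">  First story  </h3>
  <h3 class="headline"><a href="/b">Second <em>story</em></a></h3>
  <h3 class="other">Not a headline</h3>
</body></html>
"""

def extract_headlines(html: str, selector: str) -> list[str]:
    """Return the text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # The " " separator keeps words from nested tags apart; with strip=True
    # alone, "Second " and "story" would be joined as "Secondstory".
    return [el.get_text(" ", strip=True) for el in soup.select(selector)]

headlines = extract_headlines(SAMPLE_HTML, "h3.headline")
print(headlines)

# A selector that matches nothing yields an empty list, not an error.
print(extract_headlines(SAMPLE_HTML, "h3.missing"))
```

Testing a selector against saved HTML like this makes it easier to tell a selector problem apart from a network or blocking problem.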

Common mistakes

  • Forgetting to install both requests and beautifulsoup4 via pip before importing.
  • Using a generic or outdated CSS selector that doesn't match the current site structure; soup.select then returns an empty list rather than raising an error.
  • Not calling response.raise_for_status(), so 4xx and 5xx responses are parsed as if they were valid pages.
  • Assuming the website allows scraping; always check robots.txt and terms of service.
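The robots.txt check from the last point can be automated with the standard library. This is a minimal sketch, assuming hypothetical rules and a hypothetical user agent name ("headline-scraper"); the rules are parsed from a string so the example runs offline. It does not cover a site's terms of service, which still have to be read separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed offline so the sketch needs no network.
# For a real site, point the parser at its robots.txt with set_url() and call read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(SAMPLE_ROBOTS_TXT, "headline-scraper", "https://example.com/news"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "headline-scraper", "https://example.com/private/page"))
```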

Variations

  1. Use `soup.find_all('h2', class_='headline')` instead of a CSS selector for more explicit targeting.
  2. Pass a `User-Agent` header through the `headers` argument of `requests.get`, since some sites block the library's default one.
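Both variations can be sketched together. The User-Agent string, the contact URL in it, the sample markup, and the helper names fetch_html and headlines_with_find_all are all assumptions made for illustration; the tag and class follow the `find_all('h2', class_='headline')` example above.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical identifying User-Agent; replace it with one describing your own script.
HEADERS = {"User-Agent": "headline-scraper/0.1 (+https://example.com/contact)"}

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, sending the custom User-Agent header."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text

def headlines_with_find_all(html: str, tag: str = "h2", css_class: str = "headline") -> list[str]:
    """Collect headline text with find_all instead of a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any element whose class list contains css_class.
    return [el.get_text(" ", strip=True) for el in soup.find_all(tag, class_=css_class)]

# Offline demonstration on hypothetical markup; fetch_html is not called here.
sample = '<h2 class="headline">Alpha</h2><h2 class="headline lead">Beta</h2><h2>Gamma</h2>'
print(headlines_with_find_all(sample))
```

To scrape a live page, the two helpers would be combined as `headlines_with_find_all(fetch_html(url))`, with the tag and class adjusted to the target site's markup.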

Real-world use cases

  • Monitoring competitor news or industry trends by automatically collecting headlines daily.
  • Building a personal news aggregator that pulls top stories from multiple sources.
  • Populating a dataset of news article titles for natural language processing or sentiment analysis.


Run locally

This sample needs third-party packages, so it cannot run in the browser IDE. Copy the code above, install the packages shown at the top, then run it in your own Python environment.
