How to Compare Two GitHub Repositories and Highlight Differences in Python
Fetch metadata from two GitHub repositories using the GitHub API and compare key attributes like stars, forks, license, and language, printing any differences.
pip install requests
Python code
import json

import requests


def fetch_repo_data(owner, repo_name):
    """Fetch repository metadata from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo_name}"
    # A timeout keeps the script from hanging on a stalled connection.
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise HTTPError on 4xx/5xx responses
    return response.json()


def compare_repos(repo1_data, repo2_data):
    """Compare repository data and return the differing fields."""
    diff = {}
    keys = ('stargazers_count', 'forks_count', 'open_issues_count',
            'language', 'description', 'size')
    for key in keys:
        val1, val2 = repo1_data.get(key), repo2_data.get(key)
        if val1 != val2:
            diff[key] = {'repo1': val1, 'repo2': val2}
    # 'license' is a nested object or null, so compare SPDX identifiers.
    lic1 = repo1_data.get('license')
    lic2 = repo2_data.get('license')
    spdx1 = lic1['spdx_id'] if lic1 else None
    spdx2 = lic2['spdx_id'] if lic2 else None
    if spdx1 != spdx2:
        diff['license'] = {'repo1': spdx1, 'repo2': spdx2}
    return diff


def main():
    try:
        # Example pair; substitute any two owner/repository names.
        repo1 = fetch_repo_data("psf", "requests")
        repo2 = fetch_repo_data("requests", "requests")
        differences = compare_repos(repo1, repo2)
        if differences:
            print("Differences found:")
            print(json.dumps(differences, indent=2))
        else:
            print("No differences")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching repository data: {e}")


if __name__ == "__main__":
    main()
Output (illustrative; live values will differ)
Differences found:
{
  "stargazers_count": {
    "repo1": 52768,
    "repo2": 5485
  },
  "forks_count": {
    "repo1": 8966,
    "repo2": 830
  },
  "open_issues_count": {
    "repo1": 103,
    "repo2": 91
  },
  "description": {
    "repo1": "A simple, yet elegant, HTTP library.",
    "repo2": "Requests is a simple, yet elegant, HTTP library."
  },
  "license": {
    "repo1": "Apache-2.0",
    "repo2": null
  }
}
How it works
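The script uses the requests library to call the GitHub REST API endpoint /repos/{owner}/{repo} once for each repository. It takes the full JSON metadata and compares a fixed set of keys (stargazers_count, forks_count, open_issues_count, language, description, size), recording every key whose values differ. The license field needs separate handling because the API returns it as a nested object, or null when GitHub detects no license, so the script reports its spdx_id or None. The result is a dictionary of differences, printed as formatted JSON. raise_for_status() turns 4xx and 5xx responses into exceptions, and the try/except block in main() catches these together with connection errors, since all of them derive from requests.exceptions.RequestException.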
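To see the comparison step in isolation, the sketch below runs the same kind of logic on two hand-written dictionaries instead of live API responses. The sample values and the helper names `diff_fields` and `license_id` are made up for illustration; only the key names mirror fields in GitHub's repository JSON.

```python
def diff_fields(a, b, keys):
    """Return {key: {'repo1': ..., 'repo2': ...}} for keys whose values differ."""
    return {
        key: {"repo1": a.get(key), "repo2": b.get(key)}
        for key in keys
        if a.get(key) != b.get(key)
    }


def license_id(repo):
    """Return the SPDX id of a repo's license, or None when 'license' is null."""
    lic = repo.get("license")
    return lic["spdx_id"] if lic else None


# Hypothetical payloads: invented values, trimmed to the fields of interest.
upstream = {"stargazers_count": 120, "language": "Python",
            "license": {"spdx_id": "MIT"}}
fork = {"stargazers_count": 3, "language": "Python", "license": None}

result = diff_fields(upstream, fork, ("stargazers_count", "language"))
if license_id(upstream) != license_id(fork):
    result["license"] = {"repo1": license_id(upstream),
                         "repo2": license_id(fork)}

print(result)
```

Here `language` matches in both dictionaries, so only `stargazers_count` and `license` end up in `result`.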
Common mistakes
- Using incorrect repository owner/name format (e.g., including 'https://').
- Forgetting that the GitHub API rate-limits unauthenticated requests (60 per hour per IP address at the time of writing).
- Not handling the case where the 'license' key is null in one or both repos.
- Comparing keys that change constantly (like 'updated_at'), which makes the output noisy.
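Two of these mistakes can be guarded against with small helpers. The sketch below is one possible approach, not part of the original script: `parse_repo` and `remaining_requests` are hypothetical names, and `remaining_requests` assumes it receives the `response.headers` mapping from requests (which is case-insensitive) or a plain dict that uses the exact header name shown.

```python
from urllib.parse import urlparse


def parse_repo(identifier):
    """Turn 'owner/repo' or a github.com URL into an (owner, repo) tuple."""
    text = identifier.strip()
    if "://" in text:
        # Keep only the path part of a full URL, e.g. '/psf/requests'.
        text = urlparse(text).path
    parts = [p for p in text.split("/") if p]
    # Tolerate a bare host prefix such as 'github.com/psf/requests'.
    if parts and parts[0].lower() in ("github.com", "www.github.com"):
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"expected 'owner/repo', got {identifier!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def remaining_requests(headers):
    """Read the remaining request quota from GitHub's rate-limit header."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None else None


print(parse_repo("https://github.com/psf/requests"))
print(remaining_requests({"X-RateLimit-Remaining": "57"}))
```

The tuple returned by `parse_repo` can be unpacked straight into `fetch_repo_data(owner, repo_name)`, and checking `remaining_requests(response.headers)` before a batch of calls gives early warning that the quota is nearly used up.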
Variations
- Use `os.environ['GITHUB_TOKEN']` to authenticate and increase the API rate limit.
- Implement a CLI with `argparse` to accept repo pairs dynamically.
- Store results in a CSV file for historical tracking using the `csv` module.
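The three variations above could be sketched roughly as follows. All function names here (`build_headers`, `parse_args`, `write_diff_rows`) and the `--csv` option are assumptions made for illustration, and the sketch uses `os.environ.get` rather than `os.environ[...]` so that a missing token falls back to unauthenticated requests instead of raising `KeyError`.

```python
import argparse
import csv
import io
import os


def build_headers():
    """Build request headers, adding a token from GITHUB_TOKEN when it is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def parse_args(argv=None):
    """Parse two 'owner/repo' arguments (argv=None reads the command line)."""
    parser = argparse.ArgumentParser(
        description="Compare two GitHub repositories")
    parser.add_argument("repo1", help="first repository as owner/repo")
    parser.add_argument("repo2", help="second repository as owner/repo")
    parser.add_argument("--csv", dest="csv_path",
                        help="file to which differences are written as CSV")
    return parser.parse_args(argv)


def write_diff_rows(diff, stream):
    """Write one CSV row per differing field: field, repo1 value, repo2 value."""
    writer = csv.writer(stream)
    for field, values in diff.items():
        writer.writerow([field, values["repo1"], values["repo2"]])


# Demo with an explicit argument list and an in-memory buffer (no network).
args = parse_args(["psf/requests", "octocat/Hello-World"])
buffer = io.StringIO()
write_diff_rows({"forks_count": {"repo1": 10, "repo2": 2}}, buffer)
print(args.repo1, args.repo2, repr(buffer.getvalue()))
```

To use the token, pass `headers=build_headers()` to `requests.get` inside `fetch_repo_data`. For historical tracking, open the CSV file in append mode with `newline=""` and hand the file object to `write_diff_rows`.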
Real-world use cases
- Auditing forked repositories to see how they've diverged from the original upstream.
- Automated monitoring of competitor open-source projects for significant changes in stars or forks.
- Checking that two deployed microservices are pointing at the same version of a shared library repository.