Find Most Active Contributors in a Repository with Python
Filter recent commits by date and count the most active contributors using Counter and datetime.
Python code
from collections import Counter
from datetime import datetime, timedelta
# Simulated commit data
commits = [
    {"author": "Alice", "timestamp": datetime.now() - timedelta(days=1)},
    {"author": "Bob", "timestamp": datetime.now() - timedelta(days=2)},
    {"author": "Alice", "timestamp": datetime.now() - timedelta(days=3)},
    {"author": "Charlie", "timestamp": datetime.now() - timedelta(days=5)},
    {"author": "Bob", "timestamp": datetime.now() - timedelta(days=7)},
    {"author": "Alice", "timestamp": datetime.now() - timedelta(days=10)},
    {"author": "David", "timestamp": datetime.now() - timedelta(days=15)},
]
def find_most_active_contributors(commit_list, days=30, top_n=3):
    # Keep only commits made within the last `days` days
    cutoff = datetime.now() - timedelta(days=days)
    recent_commits = [c for c in commit_list if c["timestamp"] >= cutoff]
    # Tally commits per author and return the top_n (author, count) pairs
    contributor_counts = Counter(c["author"] for c in recent_commits)
    return contributor_counts.most_common(top_n)

if __name__ == "__main__":
    active = find_most_active_contributors(commits)
    print("Most active contributors (last 30 days):")
    for contributor, count in active:
        print(f" {contributor}: {count} commits")
Output
Most active contributors (last 30 days):
Alice: 3 commits
Bob: 2 commits
Charlie: 1 commits
How it works
The function takes a list of commit dictionaries, each with an 'author' and a 'timestamp' field. It calculates a cutoff by subtracting the specified number of days from the current time. A list comprehension keeps the commits whose timestamp is at or after the cutoff. A Counter then tallies each author's commits, and most_common(top_n) returns up to top_n (author, count) pairs sorted by count, highest first. Authors with equal counts keep the order in which they were first encountered. The approach is concise because Counter handles the grouping and most_common handles the sorting, so no manual dictionary bookkeeping is needed.
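The three steps above (cutoff, filter, tally) can be traced with a fixed reference time so the result does not depend on when the script runs. This is a minimal sketch, not the function from the listing: the name top_contributors and the now parameter are assumptions added here for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_contributors(commit_list, days=30, top_n=3, now=None):
    """Hypothetical variant with an injectable clock (the `now` parameter)."""
    if now is None:
        now = datetime.now()
    cutoff = now - timedelta(days=days)
    # Step 1: keep commits at or after the cutoff.
    recent = [c for c in commit_list if c["timestamp"] >= cutoff]
    # Step 2: tally commits per author.
    counts = Counter(c["author"] for c in recent)
    # Step 3: return up to top_n (author, count) pairs, highest count first.
    return counts.most_common(top_n)


# Fixed reference time so the example is reproducible.
fixed_now = datetime(2024, 6, 30, 12, 0, 0)
sample = [
    {"author": "Alice", "timestamp": fixed_now - timedelta(days=1)},
    {"author": "Bob", "timestamp": fixed_now - timedelta(days=2)},
    {"author": "Alice", "timestamp": fixed_now - timedelta(days=3)},
    {"author": "David", "timestamp": fixed_now - timedelta(days=15)},
]

# With a 7-day window, David's commit (15 days old) is filtered out.
week = top_contributors(sample, days=7, now=fixed_now)
print(week)  # [('Alice', 2), ('Bob', 1)]
```

Passing the clock in as an argument also makes the function easier to unit test, since the cutoff no longer shifts between runs.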
Common mistakes
- Forgetting to import Counter from collections and timedelta from datetime.
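- Comparing naive and timezone-aware datetimes, which raises TypeError. Normalize all timestamps (for example to UTC) before filtering.
- Assuming commits are already sorted by date and slicing the list instead of filtering against the cutoff.
- Leaving timestamps as strings or in mixed formats. Comparing a string with a datetime raises TypeError, so parse them first. An empty list is safe and simply returns an empty result.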
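One way to guard against the timezone and format pitfalls listed above is to normalize every timestamp before comparing. The sketch below is illustrative only: the helper to_utc, the function name find_most_active_contributors_utc, and the choice to treat naive datetimes as UTC are assumptions, not part of the original listing.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def to_utc(ts):
    """Return an aware UTC datetime from an ISO 8601 string or a datetime.

    Assumption for this sketch: naive datetimes are interpreted as UTC.
    """
    if isinstance(ts, str):
        ts = datetime.fromisoformat(ts)  # raises ValueError on malformed input
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def find_most_active_contributors_utc(commit_list, days=30, top_n=3, now=None):
    """Hypothetical timezone-safe variant of the function in the listing."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = to_utc(now) - timedelta(days=days)
    counts = Counter(
        c["author"] for c in commit_list if to_utc(c["timestamp"]) >= cutoff
    )
    return counts.most_common(top_n)


reference = datetime(2024, 6, 30, tzinfo=timezone.utc)
mixed = [
    {"author": "Alice", "timestamp": "2024-06-29T10:00:00+02:00"},  # aware ISO string
    {"author": "Bob", "timestamp": datetime(2024, 6, 28, 9, 0)},  # naive datetime
    {"author": "Alice", "timestamp": datetime(2024, 6, 27, tzinfo=timezone.utc)},
    {"author": "Charlie", "timestamp": "2024-01-01T00:00:00+00:00"},  # outside window
]

result = find_most_active_contributors_utc(mixed, now=reference)
print(result)  # [('Alice', 2), ('Bob', 1)]
```

If your naive timestamps are actually in local time rather than UTC, adjust to_utc accordingly; guessing the wrong zone shifts the cutoff by several hours.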
Variations
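- Replace simulated data with actual Git log parsing using subprocess to run 'git log --format=%an'. That format prints author names only, so add a date placeholder such as %at, or pass --since to git, if you still need the date filter.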
- Use a pandas DataFrame to filter and group commits if already working in a data pipeline.
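The Git log variation might look like the sketch below. It assumes git is installed and on the PATH, and it uses the format string %at%x09%an (author date as a UNIX timestamp, a tab, then the author name) so each commit carries both fields. The function names parse_git_log, read_commits, and most_active are hypothetical. Parsing is kept separate from the subprocess call so it can be tried on sample text without a repository.

```python
import subprocess
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_git_log(text):
    """Parse lines of '<unix timestamp><TAB><author name>' into commit dicts."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        ts, author = line.split("\t", 1)
        commits.append({
            "author": author,
            "timestamp": datetime.fromtimestamp(int(ts), tz=timezone.utc),
        })
    return commits


def read_commits(repo_path="."):
    """Run git log in repo_path and return the parsed commits."""
    completed = subprocess.run(
        ["git", "log", "--format=%at%x09%an"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_git_log(completed.stdout)


def most_active(commits, days=30, top_n=3, now=None):
    """Same filter-and-count logic as the listing, using UTC-aware datetimes."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    counts = Counter(c["author"] for c in commits if c["timestamp"] >= cutoff)
    return counts.most_common(top_n)


# Sample text in the same shape git would print, built from a fixed clock.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
sample_log = "\n".join(
    f"{int((now - timedelta(days=d)).timestamp())}\t{name}"
    for d, name in [(1, "Alice"), (2, "Bob"), (3, "Alice"), (90, "David")]
)
print(most_active(parse_git_log(sample_log), now=now))
```

On a real repository, most_active(read_commits("path/to/repo")) would replace the sample text; check=True makes the call raise CalledProcessError if git exits with an error, for example outside a repository.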
Real-world use cases
- Generating weekly team reports by analyzing recent commit activity in a shared repository.
- Identifying top contributors for recognition or sprint review dashboards.
- Finding stale contributors to guide repository maintenance or onboarding outreach.
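The stale-contributor use case inverts the filter: instead of counting recent commits, it looks for authors whose latest commit falls before the cutoff. A minimal sketch, assuming the same commit dictionaries as the listing; the function name find_stale_contributors and the now parameter are hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_contributors(commit_list, days=30, now=None):
    """Return authors whose most recent commit is older than `days` days."""
    if now is None:
        now = datetime.now()
    cutoff = now - timedelta(days=days)
    # Track the latest commit timestamp seen for each author.
    last_seen = {}
    for c in commit_list:
        author, ts = c["author"], c["timestamp"]
        if author not in last_seen or ts > last_seen[author]:
            last_seen[author] = ts
    # An author is stale if even their latest commit predates the cutoff.
    return sorted(a for a, ts in last_seen.items() if ts < cutoff)


fixed_now = datetime(2024, 6, 30)
history = [
    {"author": "Alice", "timestamp": fixed_now - timedelta(days=1)},
    {"author": "Alice", "timestamp": fixed_now - timedelta(days=40)},
    {"author": "Bob", "timestamp": fixed_now - timedelta(days=45)},
    {"author": "David", "timestamp": fixed_now - timedelta(days=60)},
]

stale = find_stale_contributors(history, days=30, now=fixed_now)
print(stale)  # ['Bob', 'David']
```

Alice is not reported even though one of her commits is 40 days old, because her most recent commit is inside the window.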