Can GitHub Activity Predict the Next Startup Fundraise?

Six months ago I started watching public GitHub for a leading indicator that VCs were missing.

The thesis was simple. Hedge funds spent the last decade extracting alpha from satellite imagery, credit-card panels, and shipping data. The venture-capital equivalent — public engineering activity on GitHub — has been sitting in plain sight, ignored by most institutional sourcing teams, who still rely on Crunchbase, warm intros, and Twitter.

I built a crawler. I pointed it at 4,200 startup organizations across 20 sectors. I let it run weekly for six months.

Here is what I learned, what failed, and the public watchlist I am putting on the line.

What the signal actually looks like

The single most predictive indicator is not commit volume. It is commit velocity change.

A startup that ships 200 commits per week and continues to ship 200 commits per week tells you nothing. A startup that goes from 80 commits to 240 commits inside 14 days tells you something is happening. Often it is a fundraise close. Often it is a launch. Sometimes it is both.

The composite signal is three measurements run together:

  • Commit velocity (14-day window) — total commits to the org’s most active repo
  • Contributor growth (30-day window) — change in the count of unique contributors
  • New repo creation (30-day window) — fresh repos appearing in the org

When all three accelerate inside the same two-week window, I classify the org as “accelerating.” Across Q3 and Q4 of 2025, roughly 70% of accelerating startups announced a fundraise within six weeks of the signal firing. The lead time was 3 to 6 weeks ahead of any public source.
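
One way to read the "same two-week window" rule is as a co-occurrence check on the dates each measurement first crossed its threshold. The sketch below assumes you already record those dates per org. The name is_accelerating, the signal keys, and the example dates are all hypothetical, not the crawler's actual code.

```python
from datetime import date, timedelta
from typing import Dict, Optional

# Hypothetical keys for the three measurements named in the list above.
SIGNALS = ("commit_velocity", "contributor_growth", "new_repos")


def is_accelerating(fired_on: Dict[str, Optional[date]], window_days: int = 14) -> bool:
    """Return True when all three signals fired and their dates fit in one window."""
    dates = [fired_on.get(name) for name in SIGNALS]
    if any(d is None for d in dates):
        return False  # a missing signal means the org is not "accelerating"
    return max(dates) - min(dates) <= timedelta(days=window_days)


# Made-up dates for illustration: the three signals fire 11 days apart.
example = {
    "commit_velocity": date(2025, 9, 1),
    "contributor_growth": date(2025, 9, 8),
    "new_repos": date(2025, 9, 12),
}
print(is_accelerating(example))  # True
```

Keeping the firing dates rather than plain booleans makes it possible to tell a tight cluster from three unrelated bumps spread over a quarter.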

The four signal types

Not every acceleration looks the same. Four pre-fundraise patterns repeated often enough to name:

Engineering hiring burst. Contributor count jumps 40% or more inside 30 days. Often pre-Series A — the company has signed term sheets, started hiring, and the new engineers are pushing first commits. This shows up earlier than LinkedIn employee counts because contributions land before “Senior Engineer at Acme” gets posted.

Infrastructure buildout. Commits to ops, infra, deploy, and observability repos spike. The company is preparing to scale. This usually accompanies a Series A or Series B that funds go-to-market expansion.

Deploy frequency spike. Commits per day double. Often a sign of a launch run-up — the team is shipping fixes and features fast. Sometimes followed by a product launch and a press cycle, sometimes by a fundraise where the metrics make the deck.

Framework migration. The team is migrating to a new stack — Next.js, Bun, a new ORM, fresh CI. This often happens 60 to 120 days before a Series A. It is the engineering equivalent of cleaning the apartment before parents visit. The hypothesis: founders know due diligence is coming and want a clean codebase to walk an investor through.

Where the signal fails

Honest accounting: the signal is bad for pure-AI startups. They commit constantly regardless of stage, so signal-to-noise is poor. I exclude AI-only orgs from the strongest classification tier and weight other features more heavily.

The signal is also useless for stealth startups that do not open-source. If your dream company is in stealth, GitHub gives you nothing.

And the signal is not investment advice. It tells you who to talk to. It does not tell you who to wire money to. The decision still requires founder conversations, product evaluation, market analysis, and competitive teardown. Engineering velocity is a sourcing signal, not an investment thesis.

What does not predict a round

Things that look meaningful in the data but turn out to be noise:

  • Star count. Vanity. A 30,000-star repo means it had a viral moment. It does not mean revenue.
  • Fork count. Lagging indicator at best. By the time forks accumulate, the round is closed.
  • Single-repo commit volume. Founders cosplay productivity. One person committing 400 times a week to one repo is a productivity tell, not a fundraise tell.
  • The GitHub trending tab. If a project is trending, the round is already 80% allocated. You are too late.

The signal is in change, not in level.

The public watchlist (Q3 2026 predictions)

Today I am publishing the 10 startups the model predicts will raise in Q3 2026. The list is dated, bookmarkable, and falsifiable. Come back in six months and check.

Each entry on the watchlist links to the underlying GitHub org so anyone can audit the signal. I am publishing this in public on purpose. Backtests are easy to fake. Live forward-tested predictions are hard to fake. The only way to build trust in an alternative data source is to show the work and let the future judge it.

The noise I had to filter out before any of this worked

Half of building this dataset was throwing data away. A naive crawl of commit counts will produce a confidently wrong watchlist on day one. What I had to subtract:

  • Bot commits. Dependabot, Renovate, GitHub Actions, semantic-release. They generate hundreds of commits per week per org and have nothing to do with engineering activity. I filter on author email patterns and on commits where the message matches chore(deps): and similar autogenerated prefixes.
  • Vendored code drops. Some startups push their entire monorepo or a vendored dependency tree in a single commit. That registers as a 50-file change with one author, and would otherwise blow out the velocity score. I cap per-commit file change weight.
  • Mirror repos. A startup pushes a public mirror of an internal repo and back-dates the history. Looks like a sudden 10,000-commit week. Detected by checking commit-date variance vs. push-date.
  • Personal projects on the org. Many startup orgs host an unrelated side project from one engineer. Excluded by tracking which repos receive multi-contributor activity.

After filtering, the working signal sits on roughly the top 1-2 repos per org by genuine multi-contributor activity in the last 90 days.
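
A minimal sketch of three of those filters follows. The regular expressions, the 20-file cap, and the 30-day threshold are assumptions picked for illustration; the real rules would need tuning against your own data. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Assumed patterns: the bots are the ones named above, the exact regexes are a guess.
BOT_AUTHOR = re.compile(r"\[bot\]@|dependabot|renovate|semantic-release|github-actions", re.I)
BOT_MESSAGE = re.compile(r"^chore\((deps|release)[^)]*\):", re.I)


def is_bot_commit(author_email: str, message: str) -> bool:
    """Flag commits whose author email or message prefix looks autogenerated."""
    return bool(BOT_AUTHOR.search(author_email) or BOT_MESSAGE.match(message))


def commit_weight(files_changed: int, cap: int = 20) -> int:
    """Cap the per-commit weight so one huge code drop cannot dominate the score."""
    return min(files_changed, cap)


def looks_like_mirror(commit_dates: List[datetime], pushed_at: datetime,
                      max_age_days: int = 30) -> bool:
    """Flag a push whose oldest commit is far older than the push itself."""
    if not commit_dates:
        return False
    return pushed_at - min(commit_dates) > timedelta(days=max_age_days)


print(is_bot_commit("dev@example.com", "chore(deps): bump lodash"))  # True
print(commit_weight(50_000))  # 20
```

The mirror check is deliberately crude: a long-lived feature branch pushed late would also trip it, so treat a hit as a reason to inspect the repo, not as proof.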

How to compute this yourself for a 50-startup watchlist

You do not need 4,200 orgs to start using this. Pick the 50 startups you actually care about and run the following:

  1. For each org, hit GET /orgs/{org}/repos?sort=pushed&per_page=10 and grab the top 3 most recently pushed repos.
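  2. For each of those repos, hit GET /repos/{owner}/{repo}/stats/commit_activity (returns the last 52 weeks of weekly commit totals). The first request for a repo may return 202 while GitHub computes the statistics; retry after a short wait.
  3. Compute a 14-day rolling sum (two consecutive weekly totals, since the endpoint reports by week). Compare the most recent 14-day sum against the median 14-day sum over the trailing 90 days. A ratio above 2.0× is your shortlist.
  4. For each shortlisted org, hit GET /repos/{owner}/{repo}/contributors and compare unique-contributor count today vs. 30 days ago. The endpoint only returns the current all-time list, so the comparison needs a snapshot you saved 30 days earlier; without one, count distinct authors from GET /repos/{owner}/{repo}/commits using its since and until parameters. A 40% jump is the second confirmation.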
  5. List the orgs that pass both. That is your weekly call list.
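
The steps above could be sketched roughly as follows. This version uses the standard library's urllib instead of requests to stay dependency-free. It treats "trailing 90-day median" as the median of the two-week sums over the 13 weeks before the current window, which is one reasonable reading and not the only one. The contributor sets are supplied by the caller, for example from saved snapshots. All function names are hypothetical.

```python
import json
import urllib.request
from statistics import median
from typing import List, Optional, Set

API = "https://api.github.com"


def get_json(path: str, token: str) -> Optional[list]:
    """GET a GitHub REST path; return None if the response is not a 200."""
    req = urllib.request.Request(
        API + path,
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Stats endpoints can answer 202 while GitHub is still computing them.
        return json.load(resp) if resp.status == 200 else None


def weekly_totals(owner: str, repo: str, token: str) -> List[int]:
    """Step 2: weekly commit totals for the last year, oldest first."""
    weeks = get_json(f"/repos/{owner}/{repo}/stats/commit_activity", token) or []
    return [week["total"] for week in weeks]


def velocity_ratio(totals: List[int]) -> float:
    """Step 3: latest two-week sum over the median two-week sum of the prior 13 weeks."""
    if len(totals) < 15:
        return 0.0  # not enough history to form a baseline
    recent = totals[-2] + totals[-1]
    prior = totals[-15:-2]  # 13 weeks, roughly 90 days
    baseline = median(prior[i] + prior[i + 1] for i in range(len(prior) - 1))
    if baseline == 0:
        return float("inf") if recent else 0.0
    return recent / baseline


def contributor_growth(now: Set[str], before: Set[str]) -> float:
    """Step 4: fractional change in unique contributors (0.4 means a 40% jump)."""
    if not before:
        return float("inf") if now else 0.0
    return (len(now) - len(before)) / len(before)


def on_call_list(totals: List[int], now: Set[str], before: Set[str]) -> bool:
    """Step 5: pass both the 2.0x velocity test and the 40% contributor test."""
    return velocity_ratio(totals) > 2.0 and contributor_growth(now, before) >= 0.4


# Made-up numbers: 13 quiet weeks at 50 commits, then two weeks at 120.
print(velocity_ratio([50] * 13 + [120, 120]))  # 2.4
```

Only the pure functions are exercised in the example; the two network helpers need a valid token and should retry when get_json returns None.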

A single Python script with requests and a 5,000-call/hour authenticated GitHub token gets you through 50 orgs in under a minute. The hard part is not the code — it is the noise filtering above, and the discipline of running it every Monday morning whether or not you feel like it.

How to use this in practice

If you are a VC, scout, or angel: monitor a small set of GitHub orgs in the sectors you cover. Compute commit-velocity change on a 14-day window. Watch for contributor count jumps and new-repo creation. When you see at least two of the three together, that is your call list for the week; all three together is the stronger “accelerating” classification.

If you are a founder: run the same calculation on your own org before your next pitch. Investors who use alt-data are already reading the public part. Know what your engineering signal looks like to them, and fix it before you start the round if it is steady or decelerating.

If you are in stealth and have not pushed any public code yet, this signal will not see you. That is the trade-off you are making for OPSEC.

What is next

I refresh the dataset every Monday at 09:00 UTC. New watchlists every quarter. The four signal types are not the final taxonomy — I expect at least one more (an “open-source go-to-market” signal that fires before PMF is reached) once the next two quarters of data are in.

The first VC firm to systematically use engineering-velocity data as a sourcing input will win the next decade of seed-stage deal flow. The question is who that is going to be.
