SEOs have a way of ruining everything.
That’s the long arc, and it’s been proven repeatedly since the internet landed in your home.
In the 90s, “on-page SEO” was the whole game. Website builders realized that keyword density and meta-tag stuffing would lead to major results on Yahoo and Google. In other words, “the more a single word or phrase appears, the more credible that page must be about the topic.” SEO operators abused this in an almost-comical way, including hidden white-text-on-white-background paragraphs at the bottom of every page.
The gatekeepers (search engines) shifted, and so did the operators.
Off-page took over for the next twenty years. Backlinks. Domain authority. Anchor text. Link farms. Guest-post networks. Private blog networks where you’d create 10 websites on 10 different domains all with the goal of driving authority to one site of your own.
The cat and mouse game continued to take many twists and turns.
When AI tools became cheap and accessible, the latest in “SEOs ruining the internet” took hold with “programmatic SEO.”
The concept was simple: spin up thousands of long-tail-targeted, low-substance pages and flood Google with structurally clean, technically sound content that was all interlinked. With this, one would expect to capture the long-tail keywords around their business (like “best TV for video games” vs. “best TV”). Founders pitched it as a go-to-market strategy. Investors funded it.
It worked for a minute.
Then on-page programmatic SEO failed. Major search engines re-aligned the weighting of internal linking, off-page references, and the structural quality of the content graph itself. The pivot wasn’t subtle if you were watching: a lot of ugly, “down and to the right” traffic charts, and domains that ended up with “manual actions” from Google, which at worst means being de-indexed entirely.
That also created a new market of websites that exist primarily to sell guest posts, where a business operator pays for placement, gets the backlink, and counts on the host’s domain authority to do the lifting for the target page. The 2026 guest-post host site is what the PBN or link farm was in 2010. Same mechanic, different paint job.
But what many in the market started to see was that LLMs initially seemed to struggle to differentiate between the New York Times and a page built from four or five lines of prompt.
I didn’t believe people (who were selling this as a solution) when they said this was the case, so I tried it out myself.
I launched more than a dozen “news sites” with the purpose of measuring indexing rates, rankings on search, and referrals by LLMs.
It was uncomfortably easy to launch a news site that looked credible enough, requiring only $10 and a 4-5 line prompt.
I’m sharing my findings in detail below, but want to say up top that the gatekeepers held. The discourse around these programmatic news sites is mostly wrong and the data is more reassuring than you’ve been told.
And I also want to flag that this is simply a snapshot of January through May 2026. Online discoverability is a never-ending game of cat and mouse, and the observations I share here aren’t meant to be a be-all-and-end-all guide to anything. Just some stuff I learned as we conducted the (many) experiments.
There’s also a part where the news isn’t reassuring at all, and that’s where the social layer comes in.
We’ll get to that.
What we tested
The setup is a series of experiments we started in January 2026 and wrapped this week. The metric stack was tracked throughout. The reportable data window covers the most recent 90 days of indexed activity.
We built test sites across multiple verticals, geographies, and structural configurations. The breadth of the test matrix matters because the question wasn’t does this work? It was where does this work, where does it break, and what specifically catches it? One site running one configuration teaches almost nothing. More than a dozen running across a structured matrix teaches a lot.
Here’s what we varied:
| Dimension | Variants tested |
|---|---|
| Vertical | Local general news. Business. Sports. Lifestyle. Niche-political. Medical. Trade-publication-style verticals alongside general-news framing, to test whether industry-specific positioning performed differently. |
| Geography | Northeast, Midwest, South, West Coast, plus niche-state-targeting variants. Some were specific to large cities, small cities, or specific states (again, both large and small). |
| Content type | Hard news write-ups. Evergreen guides. Listicles. Business profiles. How-tos. Deep-dive long-form. |
| Content length | Short blurbs (200-400 words). Mid-length (600-1,000). Long-form (1,500-3,000+). |
| Content origin | Direct LLM output. Humanized until an AI-writing-detection API scored it under 30%. Humanized to under 1%. Fully human-written control. |
| Author signal | Individual personas with bios and (AI-generated) headshots. “Staff” desk byline. Unattributed. |
| Schema markup | Full structured data (NewsArticle, author, organization, breadcrumb); see the sketch after the table. Versus minimal. |
| Internal linking | Topic clustering with cross-links. Versus flat structure. |
| Source attribution | Cited and linked. Cited without link. Paraphrased without citation. Fully unsourced. |
| Analytics | 50% using Google Analytics, others using none or open-source analytics. |
| Distribution | RSS aggregation pickup channels. Versus organic-discovery only. |
| Platform | Astro static, WordPress, Cloudflare Pages, multiple deploy patterns. |
| Visual design | Image-forward. Text-first. Hybrid. |
| Social signal | LinkedIn and Facebook business pages set up for some sites, not others. |
| Bought traffic | Small batches of click-through traffic piped through Google for selected domains. |
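For the “full structured data” variant above, here’s a minimal sketch of the kind of NewsArticle JSON-LD a page template can emit. The field names follow schema.org, but the helper, values, and site name are illustrative rather than our production templates, and the breadcrumb and organization details are omitted for brevity.

```python
import json

def newsarticle_jsonld(headline, author, publisher, url, published_iso):
    """Build a minimal schema.org NewsArticle payload for one article page."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": published_iso,
        "mainEntityOfPage": url,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
    }

# Hypothetical values for illustration only.
payload = newsarticle_jsonld(
    headline="Example headline",
    author="Jane Doe",                      # persona-byline variant
    publisher="Example Gazette",            # made-up site name
    url="https://example.com/example-post",
    published_iso="2026-01-15T09:00:00-05:00",
)
print(f'<script type="application/ld+json">{json.dumps(payload)}</script>')
```

The point of splitting this dimension was to see whether markup like this moved anything on its own; as the findings below note, it mostly got pages crawled, not ranked.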
We also ran a parallel arm. Instead of building authority from scratch, we paid for article placements on existing third-party publisher sites that already had domain authority. Targets were news items and individuals.
Every post placement was fact-checked for accuracy before going live. We struggled initially with fabricated quotes and credentials, but straightened that out quickly. No false claims about real people or events. The discipline mattered. We were adding noise to the web (and for that I apologize), but if we were going to be noisy, we had to make sure the noise wasn’t *bad* noise.
And importantly, as this was a research project, there were some rules: committing to a window, tracking metrics from day one, and publishing findings whether the outcome was good or bad. We did all three. This essay is the publication.
What actually happened
Indexation. Across the network, 17 sites generated about 3,560 URLs in their sitemaps over the test window. Google indexed roughly 107 of them.
That’s a network indexation rate of 3 percent.
Per-site indexation typically ran from 4 to 20 URLs each (usually including the homepage, some category pages, and the most recent post or two on the news-sitemap.xml) against sitemap counts ranging from 115 to 895.
The largest sitemap in the network had 30 URLs indexed out of 895. One site had 6 URLs indexed against an empty sitemap, meaning Google had crawled and indexed pages we hadn’t even formally submitted, then decided most of the rest weren’t worth keeping.
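For readers unfamiliar with the news-sitemap.xml mentioned above, here’s a rough sketch of generating one, assuming the standard Google News sitemap namespace. The URLs and publication name are placeholders, and this isn’t the exact generator any of the test sites used.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)

def build_news_sitemap(entries):
    """entries: dicts with loc, title, pub_date (ISO date), and pub_name."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = entry["pub_name"]
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = "en"
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = entry["pub_date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = entry["title"]
    return ET.tostring(urlset, encoding="unicode")

# Placeholder entry; a real feed would list only the most recent articles.
print(build_news_sitemap([{
    "loc": "https://example.com/news/example-post",
    "title": "Example headline",
    "pub_date": "2026-01-15",
    "pub_name": "Example Gazette",
}]))
```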
Traffic. Total network pageviews across the test window came in at 65,722. Visitors: 44,546. Visits: 47,963. Network bounce rate: 88.9 percent.
The top three sites by traffic accounted for around 64 percent of the total. Two sites recorded zero (real) pageviews across their entire lifetime. A bounce rate near 89 percent means readers landed and left. Whatever they came looking for, they didn’t find it.
Ranking. Across all domains, the network achieved approximately 467 ranked keywords in the top 100 of Google’s US results.
Top-10 placements: 5. Top-3 placements: 2. Number-one placements: 1.
The single number-one was for a branded query. The site was ranking for its own name. Across 17 sites, the full program (~3,500 URLs published, eleven content-origin variants, full schema, varied author signal, social scaffolding, and a parasite-SEO arm) produced exactly one number-one ranking, and it was a domain ranking for itself.
The AI-detection scores didn’t predict anything. We enforced a 30 percent “AI-generated” maximum across all generated content. Many runs came in under 1 percent. Interestingly, though the sample size limits how much we can read into it, the detection score did not predict ranking outcome. Sub-1 percent pages ranked indistinguishably from under-30 percent pages. Both populations ranked badly. ==Passing the AI detector is necessary in 2026, but it’s far from sufficient when it comes to ranking.==
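To make the 30 percent ceiling concrete, here’s the shape of the gate we’re describing. The `ai_probability` function is a hypothetical stand-in for whatever detection API you use (this is not any vendor’s client code), and the revision loop is a sketch rather than our exact pipeline.

```python
def ai_probability(text: str) -> float:
    """Hypothetical stand-in: return the detection API's estimate (0.0-1.0)
    that `text` is AI-generated. Swap in your provider's client here."""
    raise NotImplementedError

def humanize(draft: str, revise, ceiling: float = 0.30, max_passes: int = 3) -> str:
    """Keep revising a draft until it scores at or under the ceiling.
    Anything still over the ceiling after max_passes gets flagged for a human editor."""
    for _ in range(max_passes):
        if ai_probability(draft) <= ceiling:
            return draft
        draft = revise(draft)  # caller-supplied rewrite step (LLM or human)
    return draft  # still over the ceiling: route to manual review
```

The finding above is that clearing this gate, even down to sub-1 percent scores, told us nothing about how the page would rank.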
The parasite arm was directionally consistent with the rest. Rented authority did not meaningfully outperform built authority. The same gatekeepers that filtered the network filtered the placements. The parasite results only reinforce what we found building from scratch: the systems are working at the algorithmic layer, and where you start from doesn’t matter as much as the discourse implies.
The headline finding
Across the full program window (~3,500 published URLs, eleven content-origin variants, full schema, varied author signal, social scaffolding, RSS aggregation, and a parallel paid-placement arm), the network produced one branded number-one, two top-three placements, an 88.9 percent bounce rate, and a 3 percent indexation rate.
The gatekeepers caught it. Specifically:
- Google’s helpful-content systems filtered most of the content out of the index. The 3 percent indexation rate is what catching looks like at scale, even when we thought we’d done a nice job generating E-E-A-T content.
- The LLM grounding layer (Perplexity, ChatGPT search, Gemini grounding) didn’t surface or cite the network in any meaningful volume. The agentic-search infrastructure that’s supposed to be the next ranking battleground treated this network the way the legacy SERP did.
- Google’s medical-content and legal-content quality systems did the heaviest filtering. The medical-vertical site in our network was suppressed harder than any other category across every metric. That filtering worked, and it should have. Medical content has the highest harm vector in the consumer-search stack. The fact that Google treats it accordingly is reassuring.
- The user signal, that 88.9 percent bounce rate, is a gatekeeper the algorithms can measure secondhand (at least on the 50 percent of sites where Google Analytics was installed). Readers landed and left. The model held up where it counted, which is in front of actual human attention.
==If you’ve been told the LLM era is going to flood Google with AI slop and that nothing can be done, this is the news: something has been done. The systems caught most of it. Not all. Most.==
What didn’t work, and why it matters
Six findings worth carrying out of the data.
1. Our author persona system was detailed, and no one bought it. Detailed bios. Headshots. Minor (though inactive) social-account scaffolding. None of it correlated with ranking. Generic “staff” bylines performed indistinguishably from full-bio personas. The AI-content tooling industry is selling persona infrastructure that the algorithm already discounts.
2. The aggregation layer was the weakest link. Sites that pulled from RSS sources without linking back to them performed worst across every metric. ==Cite-or-be-filtered.== The pattern of summarizing public-record reporting without attribution to keep traffic on your own pages is the exact pattern that trips Google’s helpful-content suppression hardest. Practitioners who skip source attribution to keep readers on-site are paying with their own indexation.
3. AI-detection scores under 1 percent did not predict ranking success. An Originality pass is not a Google quality pass. The two systems measure different things. The detection layer scores synthetic-pattern matching against known LLM output. Google’s quality systems score relevance, retention, expertise signals, behavior. They’ve decoupled. A page can score 0 percent on AI-detection and still be useless to a reader. The model knows that.
4. Vertical-specific outcomes diverged sharply. General-news verticals ranked weakly. Business and sports ranked moderately on long-tail. The medical and legal verticals were suppressed harder than any other category, which is appropriate behavior given the harm vector in medical-content errors. The takeaway for practitioners isn’t don’t build medical sites with AI. It’s don’t expect Google to treat your medical content the way it treats your sports content. The quality bar is higher than ever and it’s calibrated for a reason.
5. Bought traffic did move the needle. Briefly. We piped a small volume of click-through traffic from Google’s SERP into selected test domains. The affected sites lifted in ranking for several days. Then they settled back to where they had been before the spike.
The behavioral signal is real. The ranker reads it. But bought traffic isn’t backed by anything durable, and the system corrects fast. Practitioners who use traffic-purchase services to bootstrap rankings are paying for short windows that don’t compound. If you’re working on a site where the next week matters more than the next year (a launch, a press cycle, a one-time event), there’s a use case. For long-horizon ranking, the signal evaporates.
What this confirms is that user-behavior signals carry real weight in 2026. The technical-SEO discourse undersells how much. Schema markup gets you crawled. Behavior gets you ranked. And behavior is harder to fake at scale than the SEO industry pretends.
6. Social-platform scaffolding didn’t move ranking. We set up LinkedIn and Facebook business pages for several test sites and skipped them entirely for others. We tried a few additional social approaches: cross-posting, profile presence on adjacent platforms. The variation didn’t show up in the ranking data. Sites with full social scaffolding ranked indistinguishably from sites with none.
This doesn’t mean social is irrelevant. It means social-signal scaffolding (pages, profiles, perimeter you set up because best-practice listicles tell you to) isn’t what social-driven authority looks like in 2026. Real social spread, where actual people share the work and cite the byline by name, probably is. We didn’t test that. But the algorithms don’t reward scaffolding.
What’s next
The test sites came down on April 30 and May 1. The LinkedIn and Facebook business pages we’d stood up for several of them came down with the rest of the perimeter. There are some other minor references and pages that are on their way down.
:::info
Here’s what should bother you about the experiment. It took roughly $10 in domain registration and 20 minutes on Claude Code to spin up a site that looked plausible. The algorithmic layer caught it, and I’d like to believe most human readers would too, but that second part isn’t guaranteed.
:::
When I look at sites similar to the ones I built, many of our observations are proving true there too, but some operators are doing well by ignoring search and AI entirely in favor of discovery through the many social media algorithms that run our lives.
In other words, the bad guys are bad guy-ing over on Facebook and getting away with it. (And the traffic that funnels in from social signals does appear to then give them some additional love on search and LLMs.)
The high-authority publisher sites that accept paid article placements (the hosts for the entire guest-post and parasite-SEO economy) get most of their traffic from social and the LLM grounding layer is now (well, for now) pulling from socially-amplified content. So a paid guest post on a social-distributed host can end up cited inside ChatGPT or Perplexity not because it ranked, but because it traveled. That’s a different gatekeeping problem.
That means we’re stuck in this reality with AI-generated influencer accounts on Instagram and TikTok at industrial scale.
Bot networks on X amplifying narratives in coordinated waves.
Reddit astroturfing sophisticated enough that the platform’s own moderators are flagging it in pinned posts.
Meta has its own bureaucratic name for this. They call it coordinated inauthentic behavior, which translates roughly to fake humans pushing narratives at real humans. And these gatekeepers on the social platforms aren’t doing that good of a job.
I’m not running social campaigns and I don’t have the stomach to test and measure this sort of thing – there’s only so much time in a day – but I’d wager the findings will be different and much worse for the human internet.
My next tests are more about substrate than amplification: if social is the warzone, the structural-quality layer is where the gatekeepers and the LLMs both already trust the signal without it having to travel.
So how well do they do that?
One, which I will share in time, relates to how Wikipedia does seem to rule everything around us. There’s no doubt that’s partly because of the active participants and long-standing off-page signals that make it a trusted source of information. Wikipedia ranks because the wiki structure is recognized by every gatekeeper as a quality pattern. Long-form. Sourced. Internally linked. Version-controlled. Community-edited. But how much of that is off-page, and how much is on-page and down in the technical layer beneath the surface? We’re trying to figure that out, especially as it relates to discovery beyond Wikipedia’s typical subject matter (people, places) and toward things like podcasts and other rich media.
We’ll see.
Like above, I’ll publish what we learn in an effort to help people and early-stage businesses figure out how to show up where they need to online.
And then it’ll evolve again.