
Web data: A reliable source for company analysis?

Company websites are generally reliable because firms benefit from sharing accurate, up-to-date information. Their data is not perfect, but it becomes highly valuable when combined with additional sources and validation, as ISTARI.AI does.

How reliable is information from company websites?

A question I encounter repeatedly in conversations with customers and research partners is: “Can we really trust your web data? Companies can write whatever they want on their websites.”
It’s a valid question.

The honest answer upfront: No, we don’t have a built-in lie detector. If a company claims to be ISO 27001-certified but isn’t, we cannot automatically disprove that at first glance. But after years of working with web-based indicators, both in research and in everyday practice, I’ve developed a more nuanced picture that I’d like to share here.

What information do companies publish?

At ISTARI.AI, we use AI agents to extract structured information from the websites of millions of companies around the world. We’ve been doing this since 2019, long before ChatGPT made AI accessible to a wider public—and I’ve been researching what web data can reveal about companies for almost 10 years.

My experience and various scientific studies have shown: companies have a strong self-interest in presenting predominantly accurate information on their websites.

The logic is simple:
Every website visitor is a potential customer. Companies want people to understand what they offer. Naturally, this happens with positive framing. But the core information about products and services is usually communicated willingly, in detail, and kept up to date. This is where companies have the strongest incentive for completeness.

Companies also communicate strategic signals and internal developments when they support a positive narrative. Examples include topics like sustainability (“We run on 100% green energy”), innovation (“Our new technology X is a world first”), and growth (new locations, expansion plans). This is classic signaling—but the information is actionable and often highly valuable for market analysis.

In addition to actively communicated information (the new CEO presenting themselves and their vision), there is also passively observable information (changes in managing directors in the legal notice). Job postings, in particular, are an often underestimated treasure trove. The roles and skills a company is hiring for reveal a lot about its technological orientation, internal growth areas, and strategic priorities. Following job ads over time allows you to detect developments even before they are communicated publicly—if at all.

The “social proof mechanism” also works in our favor: companies gladly present customers and partners to build credibility. Not all customers appear, and practices differ by industry, but the relational information that does appear is often highly valuable.

The ISTARI approach: A comprehensive picture through data fusion

One often overlooked but very pragmatic point in favor of using company websites as a data source: for the vast majority of companies worldwide, their own website is the only substantial public source of information. Social media profiles can complement it, but they are curated differently: less structured, less detailed, and optimized for engagement rather than information.

Yes, there are commercial registers and mandatory publications that provide company information. However, these typically contain only the bare essentials: legal form, managing directors, and address. Traditional company databases are often insufficiently detailed or outdated (for more on this, read my blog post on obsolete industry codes). Extensive annual reports are almost exclusively available for publicly listed companies.

Company websites, therefore, are almost always the primary source for detailed product information, technical documentation, annual reports (for non-listed companies), press releases, and company news.

At ISTARI, however, we do not rely solely on company websites. Our agent system systematically integrates additional sources such as public registers, patent databases, news portals, and industry magazines. I find the network perspective particularly interesting: if a company claims to be a market leader, we can check whether it appears as a relevant actor on other companies’ websites. Self-presentation can often be externally validated in this way.
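
To make the network check concrete, here is a minimal sketch in Python. It is not ISTARI's implementation: the company names, the toy page texts, and the function `count_external_mentions` are all hypothetical, and plain substring matching stands in for whatever entity matching a production system would use. The idea is simply to count how many other companies' websites mention a given company, ignoring what the company says about itself.

```python
from collections import Counter


def count_external_mentions(pages: dict[str, str], companies: list[str]) -> Counter:
    """Count, for each company, how many OTHER companies' sites mention it."""
    mentions = Counter()
    for site_owner, text in pages.items():
        lowered = text.lower()
        for company in companies:
            # Skip self-mentions: only third-party references count as validation.
            if company != site_owner and company.lower() in lowered:
                mentions[company] += 1
    return mentions


# Hypothetical toy data: site owner -> text found on that company's website.
pages = {
    "Acme Robotics": "We are the market leader in warehouse robots.",
    "Beta Logistics": "Our warehouses run on systems from Acme Robotics.",
    "Gamma Retail": "Partners: Acme Robotics, Beta Logistics.",
    "Delta Drones": "Delta Drones is the market leader in delivery drones.",
}
companies = list(pages)
mentions = count_external_mentions(pages, companies)

for name in companies:
    # A Counter returns 0 for companies nobody else mentions.
    print(f"{name}: mentioned on {mentions[name]} other site(s)")
```

In this toy data, both "Acme Robotics" and "Delta Drones" call themselves market leaders, but only the former is referenced on other companies' sites. A count like this does not prove or disprove the claim; it is one external signal to weigh against the self-presentation.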

What research shows

As a researcher, one aspect is particularly important to me: we regularly validate our data against established sources and traditional indicators.

Through our ISTARI Research Partner Program, research groups from renowned institutions work with ISTARI data to conduct independent studies. These studies frequently compare our web-based indicators with traditional data sources such as patent data, register data, or survey results.

The consistent outcome:
Web indicators are highly robust, and their advantages in timeliness and detail are significant.
Feel free to take a look at our latest whitepaper.
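
For readers who want to picture what such a comparison involves, the following is a minimal sketch, not the method used in the studies mentioned above. It assumes a binary web-based indicator (for example, "website signals innovation activity") and a binary reference indicator (for example, "company holds a patent"). The function name `compare_indicators` and all data are hypothetical, and the resulting numbers are purely illustrative, not research findings.

```python
def compare_indicators(web: dict[str, bool], reference: dict[str, bool]) -> dict[str, float]:
    """Compare a web-based indicator with a reference indicator on shared companies."""
    shared = sorted(web.keys() & reference.keys())
    tp = sum(web[c] and reference[c] for c in shared)          # both say yes
    fp = sum(web[c] and not reference[c] for c in shared)      # only web says yes
    fn = sum(not web[c] and reference[c] for c in shared)      # only reference says yes
    tn = sum(not web[c] and not reference[c] for c in shared)  # both say no
    return {
        "agreement": (tp + tn) / len(shared) if shared else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: company ID -> indicator value.
web_indicator = {"A": True, "B": True, "C": False, "D": False, "E": True}
reference_indicator = {"A": True, "B": False, "C": False, "D": True, "E": True}

result = compare_indicators(web_indicator, reference_indicator)
for metric, value in result.items():
    print(f"{metric}: {value:.2f}")
```

Disagreements are not automatically errors in the web data: a company can be innovative without holding a patent, which is one reason such comparisons are usually read as a robustness check rather than a ground-truth test.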

The limits of web data - and how we handle them

I don’t want to imply that web data is perfect. It’s not.

Quality varies. Some companies maintain their websites meticulously; others don’t.

False information is possible. A company could theoretically list a certification it does not have. We cannot disprove that in the first step.

Not everything is communicated. Companies share what benefits them, not everything that might be interesting to outsiders.

My conclusion on the reliability of web data

Company websites are a reliable source of up-to-date corporate information—more reliable than many initially assume. This is not due to exceptional honesty but to economic incentives. Companies benefit from being transparent and current in certain areas.


The combination of these intrinsic incentives, systematic multi-source validation, and scientific verification results in an approach trusted even by demanding users—from research institutions to government ministries.

For us at ISTARI, this is not just a technical topic. It is the foundation of what we do: generating Trusted Market Intelligence from public data.
