The illusion of data
There is no magical shortcut to knowledge
Let me share a few observations.
Venture capital funds have invested billions of dollars in ESG data start-ups. However, I have never (!) come across a surprising AI-generated insight. These tools are fine for compliance and reporting.
When I wrote my doctoral thesis, one of the data sets consisted of ~250 investments. It took me a few weeks to put together, but I still know the details of this data set 15 years later.
For a current research project, I let an AI agent run for a few afternoons and ended up with more than 10,000 data points. Some of the first results were interesting if you already know the field in detail (in the sense of a heatmap or sketch), but I still had to put in the effort to get more meaningful and less noisy results.
Last summer, I was analyzing the loan portfolio of a Romanian financial intermediary. I went through the Google Maps entries and the websites of a few hundred lenders and gained a very good understanding of the underlying portfolio.
Let me share a few thoughts around these observations.
The essence of research
There are many accounts of innovations where people were looking for one thing and discovered something groundbreaking instead.
Fleming left some petri dishes unattended and found penicillin after his holidays. There is also a (now debunked) story about German soldiers observing Bedouins using camel dung as an antibiotic. Viagra was initially aimed at hypertension and angina pectoris. Goodyear accidentally discovered vulcanization, which made rubber commercially usable. The glue used for Post-its was initially a failure because it was too weak, but it succeeded in another application.
You also see many of these cases in the social sciences, where people notice something puzzling and want to understand the details. Examples are the Broken Windows theory, which started with an abandoned car, or the Weak Ties theory, which started with the question of how people found jobs:
In a survey he conducted of how 282 men in the United States got their jobs, Granovetter found that a person’s weak ties – their casual connections and loose acquaintances – were more helpful than their strong ones in securing employment.
The point is that results often start with random observations, and it will probably stay this way … as it is hard to imagine it working the other way around.
Research is the compression of knowledge
When I was studying mechanical engineering, I was often amazed at how well you can approximate real-life problems. Let us say that you are interested in the speed of water in a river. Does water in a concrete channel run faster or slower than in a rough natural mountain river of the same dimensions? Why do we make concrete riverbeds like the one below?

To find out, researchers ran experiments with different riverbeds. To calculate the mean flow velocity, you use the so-called Strickler coefficient together with the slope, the total cross-sectional area, and the wetted perimeter.
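This is the Manning-Strickler formula from open-channel hydraulics. A minimal sketch in Python (the coefficient values below are illustrative textbook ranges, not measured data):

```python
import math

def mean_flow_velocity(k_st, slope, area, perimeter):
    """Manning-Strickler formula: v = k_st * R_h^(2/3) * sqrt(slope),
    where R_h = area / perimeter is the hydraulic radius."""
    hydraulic_radius = area / perimeter
    return k_st * hydraulic_radius ** (2 / 3) * math.sqrt(slope)

# Illustrative Strickler coefficients (assumed, typical textbook ranges):
# smooth concrete channel k_st ~ 85, rough natural mountain bed k_st ~ 20.
# Same geometry and slope for both channels, so only roughness differs.
v_concrete = mean_flow_velocity(85, slope=0.001, area=10.0, perimeter=12.0)
v_natural = mean_flow_velocity(20, slope=0.001, area=10.0, perimeter=12.0)
```

With identical geometry and slope, the velocities scale directly with the Strickler coefficient, so the smooth concrete channel comes out several times faster than the rough natural bed.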
That means we can compress the knowledge into a simple formula with three parameters and one coefficient … and yes: water runs up to 10x faster when the riverbed is made of concrete.
This compression of knowledge is the reason why today’s high school students know more than the geniuses of a few hundred years ago. A high school student – most likely – knows more than Newton did in his day.
That is the case because we have compressed centuries of knowledge into single formulas, of which E = mc² is the most famous. David Deutsch has written beautifully about this in “The Fabric of Reality”.
The difference between applied research and more theoretical research
During my studies I spent one year at the INSA Rouen, which stands for Institut National des Sciences Appliquées de Rouen. Coming from TU Vienna, I was often surprised, or humbled, to see how good the other students were at modelling computational problems like fluid dynamics.
One off-hand remark in a lecture (“that is too academic”) helped me understand the difference. Theory-focused universities work on the description of a problem (i.e., formalizing a puzzling observation), while universities with a focus on applied research take that description and solve it.
There is nothing wrong with either approach because both are needed. However, it shows that “problem solving” involves multiple steps each having a different focus.
The illusion of data
That brings me to my main point. I regularly see pitches where companies sell something along the following lines:
“We have millions of research articles and give you a complete view of everything.”
“We have access to 40,000 newspapers and give you the ultimate overview of what is happening.”
Many of the business models are quite fine. I am a big fan of ontology-driven approaches where you have a data model or a knowledge graph which gives you consistent and auditable results.
I also like those approaches where you get a quick overview of research fields you are not entirely familiar with (“what are researchers working on in the field of microplastic substitution?”). You get surprisingly good results upon which you can build.
However, I am skeptical of promises to describe a messy problem (and there are many), put it into an appropriate framework (which is complicated), and run analyses (not all of which are appropriate) to provide meaningful insights. That matches my observation about the ESG data start-ups: fine for reporting and compliance, but not a source of new insights.
AI tools are also unlikely to notice an accidental discovery, and even if they did, the user would probably not take advantage of it. It is hard to imagine Claude or Gemini writing “Look at this puzzling observation”.