The Difficulty Of Obtaining Good Data

Are you also annoyed by the emails you get after you have stayed at a hotel or booked a rental car? For the hundredth time you are asked: “How satisfied were you with our service? Please help us by answering a little survey.” Oh, dear. It gets worse when the “little survey” turns into a never-ending chain of questions, capped by a hint to write yet another rating on Tripadvisor. I’m already thinking about creating a blacklist of companies that send out surveys. Why? Because my effort is high, but there is no visible benefit for me.

But that is exactly my job now: getting data from customers who use our software. And this data, this feedback, is important. It’s how we ensure that we keep developing our solutions in the right direction. It’s all about collecting data while providing benefits for customers.

Automated Or Manual Data Collection

In many cases data is collected automatically. When you install software, you are asked whether you want to send data to the developers. Do you allow this? Privacy experts often advise against it. However, I do allow it in individual cases, if the software is important to me and I trust the vendor. Whether this trust is justified is a completely different question.

But how relevant is such automated data collection? Especially in the B2B environment, I hear time and again that data collection is not allowed on production systems, only on test systems. But data from test systems can be misleading. Whenever data is collected automatically, you should check carefully how relevant it actually is.

You can force the collection of data from production systems, for example by tying it to support contracts: good support is only available if data is sent. More and more companies are making this a requirement. Via the Internet of Things, machine parts send information to servers to improve maintenance. I find this very interesting, but in the data center environment I still see a lot of skepticism among customers.

For me, email surveys are also part of automated data collection. But the response rate can be low, and the questions must be phrased so that they cannot be misinterpreted. We use email surveys partly because they scale well. At the same time, data quality suffers because relatively few answers come in, and sometimes the answers do not fit the questions.

I currently collect most of my data manually, through interviews. This takes more effort, but the quality of the data is extremely high. The biggest hurdle is finding the right people: those who know how to use our software and are willing to answer questions. We try to scale this through our sales teams and product champions (our fans). You have to remember that it is a give and take. Just asking for information without delivering anything in return is not good. We directly provide special technical information and tips, and indirectly provide better software once the feedback has been implemented. In this partnership we are making progress.

Connecting The Worlds

We plan to link automatic and manual data collection. The automated side should collect as much relevant technical information as possible. This data can then be supplemented manually with ratings or with information about priorities on the customer side, as in the sketch below. Let’s see where this journey takes us.
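What could such a link look like? Here is a minimal sketch in Python with pandas, purely illustrative: all field names and values are invented, not taken from our actual tooling.

```python
import pandas as pd

# Hypothetical automated telemetry, one record per customer.
telemetry = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "feature_x_enabled": [True, False, True],
    "api_calls_per_day": [1200, 15, 340],
})

# Hypothetical manual feedback from interviews: ratings and priorities.
interviews = pd.DataFrame({
    "customer_id": ["c1", "c3"],
    "satisfaction": [4, 2],               # 1 (low) to 5 (high)
    "priority": ["stability", "performance"],
})

# A left join keeps every telemetry record; the manual fields stay
# empty (NaN) until an interview fills them in.
combined = telemetry.merge(interviews, on="customer_id", how="left")
print(combined)
```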

Soil Samples, Power Generation, and IT

Sometimes I am surprised by how seemingly completely different areas are closely connected. When I built the first dashboards from my data on how our solutions are used, I saw that the quality of the data had room for improvement. Some data was missing or outdated. And yet I was already able to present insights and results to our product management.

Last week, I spoke to a friend of mine who works as a chemical assistant at a government agency and analyses soil samples. She also collects data and provides dashboards and overviews. Her data, too, is sometimes incomplete or incorrect. But she can still generate valuable insights.

This was followed by a conversation with another friend who collects data on electricity generation at an energy producer and prepares it for reports to management. He, too, is struggling with data quality.

I find it very exciting that all these different fields end up facing the same challenges. And the same two strategies always emerge:

  1. Show the best you can with the data you have.
  2. Work on the quality of your data.

But don’t try to boil the ocean. You will never have 100% correct data. Make sure the errors don’t distort the message of your dashboards. I work with two views of our data: an internal view that reveals gaps in quality, and an external view that helps our stakeholders make decisions. As long as there are gaps in data quality, we have to provide interpretation aids.

And most importantly, we need to clearly state the quality of the data when we build dashboards.
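One way to state data quality explicitly is to compute a small completeness report alongside every dashboard. A minimal sketch, again with invented example data:

```python
import pandas as pd

def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Report per-column completeness so a dashboard can state
    how trustworthy each metric is."""
    return pd.DataFrame({
        "non_null": df.notna().sum(),
        "completeness_pct": (df.notna().mean() * 100).round(1),
    })

# Hypothetical usage data with the kinds of gaps described above.
usage = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4"],
    "version": ["7.0", None, "6.7", "7.0"],
    "last_seen": ["2020-04-01", "2020-03-15", None, None],
})

print(quality_summary(usage))
```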

I’ll Be A Kind Of Data Scientist

I have recently started working as a “Senior Customer Adoption Engineer”. This is a kind of data scientist who helps gather information about how modern technologies are used, so that better decisions can be made about future software development. This in turn benefits our customers. I am very happy about this new task and think it is very important. Let me explain why.

Data can save lives

When we hear about large amounts of data, this is often accompanied by concerns about our privacy. But as is so often the case, there is another side to it. Large amounts of data can help us make vital decisions. This is particularly visible during the current corona pandemic. The more information we have about the status of infections, and the more accurate and reliable this data is, the better the decisions we can make. Decisions that can save many lives.

That’s why scientists are working feverishly to increase testing capacities for the coronavirus. That’s why politicians and decision-makers in Germany and many other countries listen to scientists. The scientists’ recommendations rest on a solid foundation of data.

Data in product development and new technologies

My employer VMware delivers new technologies to better create, run, and use applications. Product managers and business units must decide which features are needed and which improvements are important or urgently needed. Some decisions are made on instinct, but they are often based on existing data.

There are business figures such as revenue, licenses sold per solution, pipeline, and other relatively easy-to-obtain information. But there is also another side: What are customers doing with the technologies? How are they used? What benefits are particularly important, and where are the difficulties? We have a lot of customers, so we analyze support requests as well as anonymized information sent by our products, if the customers allow it. You can get interesting data through these automated channels.
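As an illustration of the opt-in principle, here is a minimal sketch of what such an anonymized usage event could look like. This is not VMware’s actual telemetry format; the event fields and the simple hashing are assumptions for the example.

```python
import hashlib
import json
import uuid
from typing import Optional

def build_event(customer_id: str, feature: str, opted_in: bool) -> Optional[dict]:
    """Build an anonymized usage event, or nothing at all if the
    customer has not consented to data collection."""
    if not opted_in:
        return None
    return {
        "event_id": str(uuid.uuid4()),
        # A one-way hash stands in for the real customer identifier.
        # (Real anonymization would need salting and more care.)
        "installation": hashlib.sha256(customer_id.encode()).hexdigest()[:16],
        "feature": feature,
    }

event = build_event("customer-42", "feature-x", opted_in=True)
print(json.dumps(event, indent=2))
```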

But the really valuable data is the data that reflects the users’ subjective perception. How well do the new technologies and solutions perform the tasks for which they are intended? I am now working on improving this kind of research in my new team. There’s a wonderful book on this topic that I recommend if you are interested in gathering customer information: “Lean Customer Development” by Cindy Alvarez. The subtitle is “Build Products Your Customers Will Buy”, and it’s full of insights and practical tips on how to approach customers, be it via email, phone interviews, or on-site interviews. The book is valuable for start-ups as well as established companies.

I am working on ways to provide useful information that generates valuable insights. Our customers should benefit from this because they get even better solutions, but it should also make the work of our product management easier. It should help them to make the right decisions.

The Eel and The Discipline of Small Steps

Have you ever tried to hold on to a live eel? You’ll hardly ever succeed. I grew up in Northern Germany at the Steinhuder Meer, and there are eels there. With my school class, I went to an eel smokehouse once, and we were allowed to try to hold an eel. It slips through your fingers. Nobody could hold it for more than a few seconds.

Sometimes it seems to me that the business value of a new technical solution is like an eel. You’ve invested millions in new software or services, and in the end you’re not sure whether this investment has delivered measurable added value for your own company. This seems to be a trend across all industries, but IT departments have an especially hard time with modern IT such as cloud computing. Vendors are reacting with new roles such as Customer Success Manager; a search on LinkedIn for this job title yields 65,671 hits today. These people help customers realize the added value of a solution.

In an ideal world, a product delivers its business value right after installation and everyone is happy. But the world is not ideal. Solutions that change operational processes, precisely the ones expected to deliver particularly high added value, require a change in user behavior. That starts at the Apple Retail Store, where you can get a demonstration of how to make the transition to Apple products work, and it is even more true in large companies. This is often referred to as operational transformation: the change in IT operations.

That’s why I’m a big fan of small steps: Think big and start small. If the value of a great idea is visible in a first implementation after a short time, then the IT manager can provide management with more reliable predictions about future business value.

Look for a concrete use case that is close to the business. Define how you want to measure success, paying special attention to how you will measure the success of a business transformation. And don’t wait too long for the first milestone.

When there is a special relationship of trust between customer and supplier, sometimes very large projects are initiated and new investments are made before the previous project has delivered measurable business value. This can work, but in the long run it is a risk for both sides. Think about the discipline of small steps.

An AI Playground

This week I was invited to the official opening ceremony of the ARIC (Artificial Intelligence Center) Hamburg. The ARIC brings together companies, start-ups, research institutes, banks, and politics to initiate AI-based projects and establish AI solutions on the market. Besides good conversations, I heard interesting presentations introducing AI projects.

A very large established finance company uses AI in two ways. There are short-term projects (1 to 3 years) in which modern applications and new user interfaces are developed. In the long term, in cooperation with ARIC, completely new business areas are tackled and old processes are fundamentally improved, e.g. in the analysis of legal documents.

A communications company presented how they use AI to evaluate and optimize the efficiency and reach of marketing methods. A consulting firm showed how AI in image analysis can be used to categorize defects in aircraft engines much faster.

There are many ideas on how AI can drive new business, and yet it seemed a bit like a playground to me. This is not meant negatively. It’s about playful experimentation. There will be many more experiments to try. And it’s about starting on a small scale and proving the value of AI solutions, as I wrote earlier.

The more AI-based business models succeed, the more new ideas come up. I can imagine that AI will become much more interesting for many companies. And faster than you might think.

Truly Intelligent Machines

The definition of artificial intelligence can be vague. Sometimes it seems to be just brute-force number crunching, with ever more computing power used to create behavior that appears intelligent. But if we look behind the scenes of Deep Blue and other supercomputers that master games like chess or Go, these are special cases where knowledge is optimized within a clearly defined domain.

Human intelligence is much more creative and adaptable. It is prepared for every eventuality in our lives, much more than any computer.

And this is exactly where the 15-year-old classic by Jeff Hawkins and Sandra Blakeslee comes in: “On Intelligence” is a book in which we learn in great detail how the human brain works, how the neocortex is structured, how we use it to remember things, and how we make decisions. And it is precisely this biological template that the authors use to give us clues as to how to build truly intelligent computers.

A colleague and friend recommended this book to me, and I can only pass on the recommendation. Even if the predictions of 15 years ago did not really come true, it is still an enlightening read.

“The most powerful things are simple,” Jeff writes in the prologue. He’s right; just think of the iPhone. So this book presents a simple and straightforward theory of intelligence. It gets very deep when it explains how the individual cells and cell regions in the brain interact and how information is stored and retrieved. Yes, you should concentrate while reading, but it is also understandable for non-neuroscientists.

Now, if a machine mimics this behavior of the human brain, then it is really intelligent. Jeff assumes in the book that such intelligent machines will exist in 10 years (that would be 2015), but in the next sentence he gets more cautious because it might take longer.

Jeff calls for the construction of such machines, modeled on the human neocortex. The book gives some examples, e.g. how such machines could communicate and capture the world’s weather at a level of detail that seems impossible today. Do we really want that? I’m not sure it’s a good idea. And I haven’t heard anything more about such machines.

Anyway, I recommend the book “On Intelligence” to anyone interested in intelligent computers. You’ll have more respect for your brain after reading it.

Investing in AI and The Role of VMware

At the NORTEC 2020 trade fair for manufacturers, I was invited to a round-table discussion about the introduction of AI in the manufacturing industry. Large companies, universities, and local business leaders explored how to use AI to drive innovation and how to build a business plan for it. I was asked to present the role of virtualization for AI/ML projects, and a data scientist was interested in (and surprised by) the performance benefits of virtualization as described in the VROOM blog article How Does Project Pacific Deliver 8% Better Performance Than Bare Metal?

Several representatives and local executives from private and family businesses discussed their business. Small and private companies are driving the economy in Northern Germany, where there is not a single DAX company but many small and medium-sized enterprises. I was surprised to learn that these smaller companies grow their revenue much faster and more strongly than the large public companies. The consensus was that long-term investments pay off better than short-term ones. Public companies must take shareholder value into account and provide quarterly figures, so many decisions are made to increase short-term revenues. Smaller companies have a time horizon of 10 to 20 years for their investments, resulting in a more stable and reliable business. They operate across many generations.

This has an interesting influence on their AI strategy. These entrepreneurs cannot risk investing large sums of money because they have to control their risks, so we discussed joint projects with students from local universities. They are very interested in AI, and the first companies are starting to get value out of it, but they are only at the beginning. Another stumbling block is their concern about using the public cloud for AI projects, especially in terms of compliance and intellectual-property protection. As a result, they want to run AI/ML software in their local data centers or locations. The hardware investment is often only around €10,000 to €15,000, so at first I thought this was too small to be interesting for cloud infrastructure providers like VMware, who tend to support larger projects. But I was asked about the virtualization of AI/ML workloads because almost everyone has had good experiences with VMware vSphere (or VMware Workstation). In addition, universities and research institutions like DESY operate at completely different scales, which can make infrastructure projects with virtualization interesting.

Unexpected Side Effects

In the podcast Die Maschine: Kontrolle ist gut, KI ist besser (in German, by the radio station Deutschlandfunk), a scary fictional story is told from the 21st minute on:

An artificial intelligence has been developed that controls and executes all drug shipments worldwide. Because this is so critical, a special algorithm was chosen to guarantee that no population group is disadvantaged; an algorithm that is always 100% politically correct.

When the artificial intelligence was activated, things went well at first, but then the number of deaths among diabetics increased in the rich countries. Insulin was suddenly lacking in the hospitals of the industrialized countries. How could this happen?

Well, the system worked exactly as designed. The artificial intelligence took into account the need for drugs worldwide, but there were not enough drugs like insulin for everyone on earth. Underserved areas, especially in Asia and Africa, received more drugs, while the rich countries received less. Thus the shortage was distributed evenly across the globe, as the sketch below illustrates.
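The allocation logic behind the story can be illustrated in a few lines. The numbers here are invented; the point is only that a globally proportional split necessarily takes supply away from regions that were fully served before:

```python
# Hypothetical demand per region (units of insulin) versus a
# global supply that cannot cover everyone.
demand = {"rich_countries": 100, "asia": 300, "africa": 250}
supply = 400  # total available units; total demand is 650

total_demand = sum(demand.values())

# Globally "fair" proportional allocation: every region receives
# the same fraction (~62%) of its demand.
allocation = {region: supply * d / total_demand for region, d in demand.items()}

for region, units in allocation.items():
    print(f"{region}: {units:.0f} of {demand[region]} units")

# rich_countries get ~62 of 100 units: previously fully supplied,
# now suddenly short, exactly as in the story.
```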

It is a dilemma similar to two burning houses with people trapped inside, when you only have enough helpers to fight one fire and save its occupants, not both. What do you do?

These are ethical questions that an artificial intelligence cannot answer automatically. So when artificial intelligence is used to sustain life, we should look very closely. And well-intentioned is certainly not always well-executed, as the story of the drugs shows.

ML Job Interview

There are many variants of this joke floating around. It is so cool. Here is my favorite, which I found on Twitter:

Interviewer: What’s your biggest strength?

Me: Expert in Machine learning

Interviewer: What’s 9 + 10?

Me: 5.

Interviewer: Nope. 19.

Me: It’s 14.

Interviewer: Wrong. 19.

Me: It’s 19.

Interviewer: What’s 2 + 2?

Me: 19.

Interviewer: You’ll overfit right in!
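For anyone who wants to see the punchline in code: a minimal sketch of overfitting, where a model with too many parameters memorizes its noisy training data perfectly and fails everywhere else. The data and model choice are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.1, size=8)  # noisy samples of y = 2x

# A degree-7 polynomial has enough parameters to memorize all 8 points.
coeffs = np.polyfit(x_train, y_train, deg=7)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# On unseen points, the memorized wiggles no longer match y = 2x.
x_test = np.linspace(0, 1, 100)
test_mse = np.mean((np.polyval(coeffs, x_test) - 2 * x_test) ** 2)

print(f"train MSE: {train_mse:.6f}")  # essentially zero
print(f"test  MSE: {test_mse:.6f}")   # noticeably larger
```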

Like a Broken Marriage

There are cases where I think IT and business are like a broken marriage, and my work is that of a family therapist. What makes me think so?

Well, at a meeting of IT specialists, I once asked which of the IT experts felt pressure to provide infrastructure faster. No one came forward. They all said their work was fine.

A week later, at the Machine Learning Conference, I asked a few people where they run their applications. They said in the public cloud. When I asked whether they were considering doing the same in their own data center, they just stared at me: “I would never ask my own IT department to run machine learning applications. They are way too slow to deploy.”

No wonder IT staff feel no pressure: the business doesn’t even ask for faster deployment because it has given up.

It’s like in a marriage where the spouses have given up communicating. If you want to solve this, it’s hard work.

Now there are certainly IT departments that are working well with their customers in their respective business areas. But, dear IT people, are you sure that you know all the requirements of the business units? Are they still talking to you, or have they given up? It might be a good idea to validate assumptions explicitly. Maybe there is still unused potential for improvement. And dear business departments, have you asked your IT department lately if they could react faster? Perhaps you have overlooked potential in your own company?

If IT reacts too slowly to business requirements, it has very little to do with technology; it’s all about processes and team structures. And above all it is about communication. Maybe you should get help to bring communication back on track.