Mon. Apr 15th, 2024

We spoke with Ram Venkatesh, Global CTO at Cloudera, about the growing importance of data in companies' digital transformation. The company's top technology executive also spoke about the revolution that generative AI could bring to business operations.

Interview with Ram Venkatesh, Global CTO of Cloudera

Where do you see the main challenges facing organizations in the coming months?

Data is the fuel for the transformation of the digital economy we are currently immersed in. That is why getting the maximum value from it is so important, and it is one of the main challenges for organizations. Technologically, they will have to adapt to modern data architectures such as data lakehouses, data fabrics and data meshes. In the coming months, striking a balance between decentralization with data mesh and centralization through data lakes will also be key to advancing enterprise data management. It is crucial to promote data-driven decision-making from top management, while helping every area of the organization work efficiently, systematically eliminating bottlenecks and driving successful results.

ChatGPT has made the general public aware of what AI can do. What do you think this implies?

It's amazing how quickly these tools have become part of many people's lives. This way of interacting with a device or machine is very new and impressive, and it feels very natural, so I think conversational analytics is an interesting path that ChatGPT has opened up. For businesses, tools like ChatGPT have raised a lot of awareness and fueled conversations about AI and its potential business benefits. Large language models are broadening access to data for everyone, but in doing so they also raise concerns about regulatory compliance and intellectual property. Businesses need AI solutions they can trust, and trusting AI starts with trusting the data. Large language models will only be as good as the data they were trained on.

Since ChatGPT emerged, there has been a lot of talk about apocalyptic scenarios such as the elimination of jobs or even the annihilation of humanity by AI. However, really important issues such as data protection or the development of legislation to protect sources are being left aside. How do you think AI will evolve?

For us, data analytics and machine learning go hand in hand, so we think customers will want to use these tools with all of their data. What's interesting in the data management arena is that we used to focus heavily on structured data, such as point-of-sale transactions and similar activities, and companies analyzed that data with SQL queries. But increasingly, the majority of our customers' data, around 70% to 80%, is unstructured. It can be anything from documents and tweets to Zoom recordings, audio and video: all the kinds of data that make up the information available in a company, whether it is structured in some other way or has no clear structure at all. That is the data found in any company's systems. When we think about data this way, it's reasonable to expect machine learning-based analytics, and our customers can already do that on our data platform. We show people what is possible with AI today in a way that is easy to understand and use.

Without data there is no artificial intelligence. What does Cloudera propose in this regard? If efficient data management is already difficult, what data management challenges does AI bring?

Our customers have told us that while AI services like ChatGPT are attractive, they would love to build similar interactive experiences using their own data to improve their business intelligence and relevance. For AI to effectively support critical business decisions, data sets must be complete, accurate and updated in real time. But it's not just about aggregating data; you also need to prepare and analyze it, because AI models are only as good as the data they learn from. As I mentioned before, the problem is that enterprise data is often messy and made up of different types of data, each requiring separate analysis. The data is also stored in different places, such as data centers, private clouds, the edge or various public clouds. At Cloudera we offer an Applied Machine Learning Prototype (AMP) that lets companies use a ChatGPT-like chatbot with their own corporate data, avoiding strange responses caused by missing context or unvalidated data sources. Businesses need answers they can trust without compromising data compliance or intellectual property.

Technology has always evolved. Could the problem now be that it is evolving so fast that neither IT departments nor people themselves have time to adapt?

It's true that six months ago we would not have been having a conversation about generative AI or its impact on data. But from my point of view, that is what is really exciting about data: there are so many possibilities and so much value to be gained from it that we should see this evolution as something positive. IT professionals and data experts need to stay up to date with market trends and new technological developments. Only companies committed to providing continuous training will succeed in the data industry.

Beyond the impact of AI and data, what other trends do you think will be truly disruptive in the medium term?

From my point of view, modern data architectures such as data fabric, data mesh and data lakehouse are evolving to reflect the needs and capabilities required to manage and leverage data effectively. Integrating data into modern architectures is essential for organizations to take full advantage of its potential: it provides accurate insights, improves operational efficiency, optimizes customer experiences, and enables organizations to be agile and competitive in today's data-driven landscape. Another relevant trend is public cloud transformation. I think the public cloud is an important part of the data lake landscape, which is also something that has emerged in the last three to five years, and it can be an effective way to enable secure sharing and collaboration around data access.

As a cloud company, what has the cloud meant for data management?

Migrating data to the cloud improves data accessibility, optimizes storage and backup, increases scalability and flexibility, and enables a faster environment for innovation. In fact, better accessibility is the main reason Spanish companies would migrate to the cloud, according to a recent Cloudera study in which we surveyed 850 IT managers in Europe. The same study found that 81% of respondents had moved more data to the public cloud in the last 12 months, a figure that reaches 87% in Spain.

Many CIOs say that the cloud is the new on-premises. How much truth is there in that statement?

Although 87% of IT managers in Spain said their company had moved more data to the public cloud in the last 12 months, 72% plan to repatriate data to on-premises environments. According to the study, the main reasons for not migrating to the public cloud are concerns about data governance and regulatory compliance, cybersecurity, and fear of cloud lock-in, which would make it difficult to change platforms in the future. The reality is that hybrid architectures are the default standard for most companies. According to our analysis, 71% of Spanish organizations use a hybrid environment to store their data, while only 4% use the public cloud exclusively.

What mistakes are being made in cloud migration, and why are many companies moving data back to on-premises environments?

Data is a highly valuable asset and demands its own strategy. The cloud is ultimately a flexible, agile and scalable delivery model. A cloud-only approach without an enterprise data strategy makes it difficult to manage, access, secure and govern data, and to obtain insights from it. This is exactly what early cloud adopters experienced: the move to the public cloud created new data and analytics silos that were harder and more expensive to manage, which is leading them to re-evaluate where some workloads should reside. Decisions about whether a workload is best suited to cloud-native deployment, whether in a shared public cloud or on-premises, must be driven by trusted data. Workload analysis allows companies to observe how a workload performs before deciding in one direction or the other. Workloads that are more predictable and consume a relatively stable level of resources are often cheaper to run on-premises, whereas a customer-facing service with more variable demand may work better in the cloud thanks to its elasticity. With the emergence of modern data architectures, organizations can get more value from their data and optimize their cloud costs at the same time.

By Alvaro Rivers
