---
title: "Council Post: How Data Observability Solves Data Scientists’ Biggest Pains"
description: "Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption."
type: "news"
locale: "en"
url: "https://longbridge.com/en/news/46600643.md"
published_at: "2021-09-27T13:35:29.000Z"
---

# Council Post: How Data Observability Solves Data Scientists’ Biggest Pains

> Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption.

*Founder and CEO of Acceldata, bringing data observability to data-driven enterprises.*

Data scientists used to be the company nerds. But data scientists, along with data analysts and their slightly older siblings, business intelligence (BI) analysts, have “glowed up.” Today, data scientists and analysts are heroes and MVPs, with the power to transform your business with near real-time analyses and spookily accurate predictions that improve decision making, reduce risks and boost revenues.

Companies have invested millions of dollars in cutting-edge data science platforms chock-full of capabilities in order to support their data scientists and accelerate their transformation into data-driven businesses. So why do so many data scientists still complain about pain points in their jobs? Ironically, the complaints all revolve around the same thing: data. More specifically, data scientists say they encounter:

• Difficulty in finding the right data sets.
• Unreliable training data for their machine learning models.
• Data sets that change continuously in both volume and structure.
• Outcomes and predictions that drift as the underlying data changes.
• Inadequate visibility while executing their models, jobs and SQL queries.
• Tremendous challenges in maintaining high performance.
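To make the "changing volume and structure" complaint concrete, here is a minimal sketch of how two snapshots of the same data set might be compared. It is an illustration only, not how any particular observability product works. The function names (`infer_schema`, `compare_snapshots`), the `volume_tolerance` parameter and the sample rows are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def infer_schema(rows: list[Row]) -> dict[str, str]:
    """Map each column name to the sorted, '|'-joined type names seen in it."""
    seen: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            seen.setdefault(column, set()).add(type(value).__name__)
    return {column: "|".join(sorted(types)) for column, types in seen.items()}


def compare_snapshots(
    old: list[Row], new: list[Row], volume_tolerance: float = 0.5
) -> list[str]:
    """Report structural changes and large row-count swings between snapshots.

    `volume_tolerance` is a hypothetical knob: the relative row-count change
    (0.5 means 50%) above which a volume finding is reported.
    """
    findings: list[str] = []
    old_schema, new_schema = infer_schema(old), infer_schema(new)

    for column in sorted(old_schema.keys() - new_schema.keys()):
        findings.append(f"column removed: {column}")
    for column in sorted(new_schema.keys() - old_schema.keys()):
        findings.append(f"column added: {column}")
    for column in sorted(old_schema.keys() & new_schema.keys()):
        if old_schema[column] != new_schema[column]:
            findings.append(
                f"type changed: {column} "
                f"({old_schema[column]} -> {new_schema[column]})"
            )

    if old:  # avoid dividing by zero when there is no baseline
        change = abs(len(new) - len(old)) / len(old)
        if change > volume_tolerance:
            findings.append(f"row count changed by {change:.0%}")
    return findings


# Made-up sample data: `id` silently became a string, a column appeared,
# and half of the rows went missing.
yesterday = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
today = [{"id": "1", "amount": 9.5, "region": "EU"}]

for finding in compare_snapshots(yesterday, today, volume_tolerance=0.25):
    print(finding)
```

Even a check this small surfaces the kind of silent change that can break a model downstream. A real deployment would presumably read schemas from the warehouse catalog rather than infer them from sample rows.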
**Driving Blind**

It shouldn’t be a surprise. Companies that went big on data science platforms failed to invest in tools that grant visibility and control over the data itself. That’s like buying a sports car that can go from zero to 100 miles per hour in four seconds flat but has no windshield, windows or dashboard. In the automotive equivalent of a black box, you have no idea where you’re going, how fast you’re traveling, how hard your engine is revving or whether your tires are about to blow.

Companies can’t take all of the blame for driving blind. There simply weren’t good data observability tools around.

So what is data observability? It is a 360-degree view into data health, processing and pipelines. Data observability tools collect a diverse range of performance metrics and analyze them so that you can be alerted to problems and predict, prevent and fix them. In other words, data observability focuses on visibility, control and optimization of modern data pipelines built using diverse data technologies across hybrid data lakes and warehouses.

**False Promises**

In the past, there have been tools that claimed to deliver observability for data-intensive applications. Many were half-baked extensions of application performance management (APM) platforms, some of which have been around for almost two decades. That means these APM platforms, by and large, predate the rise of data-intensive applications. Moreover, they remain firmly rooted in an application-centric view of the enterprise technology back end.

Consequently, their visibility into the modern data infrastructure tends to be shallow or outdated. When data workers need help finding data and validating its quality, troubleshooting why the data pipelines feeding their analytics jobs are slowing down, or identifying what is causing data anomalies and where schemas have drifted, APM-based observability can’t answer their questions.
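As a rough illustration of what "analyzing performance metrics to raise an alert" can mean for a slowing pipeline, the sketch below flags a run whose duration sits far above its own history. It uses a simple z-score rule chosen for brevity, not a method the article attributes to any tool. The function name `is_slowdown`, the `z_threshold` default and the sample durations are hypothetical.

```python
from statistics import mean, stdev


def is_slowdown(
    history: list[float], latest: float, z_threshold: float = 3.0
) -> bool:
    """Return True when `latest` is unusually slow compared with `history`.

    "Unusually slow" here means more than `z_threshold` sample standard
    deviations above the historical mean run duration.
    """
    if len(history) < 2:
        return False  # not enough runs to estimate the spread
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past run took exactly the same time; any increase stands out.
        return latest > baseline
    return (latest - baseline) / spread > z_threshold


# Made-up run durations, in minutes, for one nightly pipeline.
durations_minutes = [12.0, 11.5, 12.4, 11.9, 12.2]

print(is_slowdown(durations_minutes, 12.6))  # within normal variation
print(is_slowdown(durations_minutes, 19.0))  # far above the baseline
```

A fixed threshold on a single metric is the crudest possible version of this idea. The point is only that the alert is derived from the pipeline's own history rather than from application-level signals.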
Similarly, there are one-dimensional point solutions promising to provide data observability. Some work for only one platform, such as Hadoop. These tend to be primitive and also lock you into a single vendor. Others focus on only one task, usually data monitoring. Neither kind provides the single-pane-of-glass visibility or the predictive and automation capabilities that modern heterogeneous data infrastructures require and today’s data teams need. And like the APM-based tools above, they are weak at the data discovery, pipeline management and reliability capabilities that data scientists need to keep their work on track to meet their companies’ business goals.

**Automated Data Reliability**

For data scientists, data reliability is an important aspect of data observability. It enables data scientists and other members of a data team to diagnose if and when reliability problems can affect the business outcomes they are trying to achieve. Such problems are common because of the volume of external, unstructured data that is ingested into data repositories today.

According to Gartner, data drift and other symptoms of poor data quality cost organizations an average of $12.9 million per year. In our view, this is a gross underestimate. Moreover, data, schema and model drift can wreak havoc on your machine learning initiatives.

Data-driven organizations can act today to ensure that data lives up to its promise and delivers the expected ROI from their data initiatives by doing the following:

**• Establish The Right Data Requirements:** It’s not enough to simply want better quality in your data. The first step is to define clear requirements for the data sets needed and to identify where the data sources lie. Once that is defined, data scientists can determine the features, files and tables that need to be included, the expected data types and how to extract and integrate the necessary data.
These requirements will provide a framework to ensure the data team is working with the right sources and is getting the right data into the correct pipelines.

**• Emphasize Data Orchestration:** Compared to five years ago, the enterprise environment looks chaotic. There are more apps, sources, use cases and users, and because all of these elements operate continuously, some of them quickly fall out of sync. Inter-team communication, transactions and delivery must be aligned for rapid delivery with high accuracy.

**• Automation And Systems:** Data observability is an important step toward making data reliable and data scientists effective, because it provides an automated set of modern data management capabilities. That includes AI-powered data reliability, data discovery and data optimization capabilities that ensure data is accurate, reliable and complete throughout the entire data pipeline, without heavy labor by the data science or engineering teams.

Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption. They outperform classic data quality tools, which were built for an earlier era of structured data in relational databases.

* * *

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.