---
title: "Council Post: How Data Observability Solves Data Scientists' Biggest Pain Points"
description: "Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption."
type: "news"
locale: "zh-HK"
url: "https://longbridge.com/zh-HK/news/46600643.md"
published_at: "2021-09-27T13:35:29.000Z"
---

# Council Post: How Data Observability Solves Data Scientists' Biggest Pain Points

> Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption.

*Founder and CEO of Acceldata, bringing data observability to data-driven enterprises.*

Data scientists used to be the company nerds. But data scientists — or data analysts, as well as their slightly older siblings, business intelligence (BI) analysts — have “glowed up.” Today, data scientists and analysts are heroes and MVPs, with the power to transform your business with near real-time analyses and spookily accurate predictions that improve decision making, reduce risks and boost revenues.

Companies have invested millions of dollars in cutting-edge data science platforms chock full of capabilities in order to support their data scientists and accelerate their transformation into data-driven businesses. So why do so many data scientists still have so many complaints about pain points in their job? Ironically, the complaints all revolve around the same thing — data. More specifically, data scientists say they encounter:

• Difficulty in finding the right data sets.

• Unreliable training data for their machine learning models.

• Data sets that change continuously in both volume and structure.

• Drifting outcomes and predictions as the underlying data changes.

• Inadequate visibility while executing their models, jobs and SQL queries.

• Tremendous challenges in maintaining high performance.

**Driving Blind**

It shouldn’t be a surprise.
Companies that went big on data science platforms failed to invest in tools that granted visibility and control over the data itself. That’s like buying a sports car that can go from zero to 100 miles per hour in four seconds flat ... that also happens to have no windshield, windows or dashboard. In the automotive equivalent of a black box, you have no idea where you’re going or how fast, how fast your engine is revving or whether your tires are about to blow.

Companies can’t take all of the blame for driving blind. Until recently, there simply weren’t good data observability tools around.

So what is data observability? It is a 360-degree view into data health, processing and pipelines. Data observability tools collect a wide range of performance metrics and analyze them to predict, prevent and fix problems before they disrupt the business. In other words, data observability focuses on visibility, control and optimization of modern data pipelines built using diverse data technologies across hybrid data lakes and warehouses.

**False Promises**

In the past, there have been tools that claimed to deliver observability for data-intensive applications. Many were half-baked extensions of application performance management (APM) platforms, which have been around in some cases for almost two decades. That means these APM platforms, by and large, predate the rise of data-intensive applications. Moreover, they remain firmly rooted in an application-centric view of the enterprise technology back end. Consequently, their visibility into the modern data infrastructure tends to be shallow or outdated.

When data workers need help finding and validating the quality of data, troubleshooting why the data pipelines feeding their analytics jobs are slowing down, or identifying what’s causing their data anomalies or where schemas drift, APM-based observability can’t answer their questions.

Similarly, there are one-dimensional point solutions promising to provide data observability.
Some work for only one platform, such as Hadoop. These tend to be primitive and also lock you into a single vendor. Others focus on only one task, usually data monitoring. Neither type provides the single-pane-of-glass visibility, prediction and automation capabilities that modern heterogeneous data infrastructures require and today’s data teams need. And like the APM-based tools above, they are weak at the data discovery, pipeline management and reliability capabilities that data scientists need to keep their work on track to meet their companies’ business goals.

**Automated Data Reliability**

For data scientists, data reliability is a critical aspect of data observability. It lets data scientists and other members of a data team diagnose if and when reliability problems are affecting the business outcomes they are working toward. Such reliability issues are common, given the volume of external, unstructured data ingested into data repositories today. According to Gartner, data drift and other symptoms of poor data quality cost organizations an average of $12.9 million per year. In our view, that is a gross underestimate. Moreover, data, schema and model drift can wreak havoc on your machine learning initiatives.

Data-driven organizations can act today to ensure that data lives up to its promise and delivers the expected ROI from their data initiatives by doing the following:

**• Establish The Right Data Requirements:** It’s not enough to simply want better quality in your data. The first step is to build clear requirements for the data sets you need and to identify where those data sources lie. Once that is defined, data scientists can determine the features, files and tables that need to be included, the expected data types, and how to extract and integrate the necessary data.
These requirements provide a framework to ensure the data team is working with the right sources and getting the right data into the correct pipelines.

**• Emphasize Data Orchestration:** Compared to five years ago, the enterprise environment looks chaotic. There are more apps, sources, use cases and users, and the continuous nature of how all these elements operate means some of them quickly fall out of sync. Inter-team communication, transactions and delivery must be aligned for rapid delivery with high accuracy.

**• Automate Data Management:** To make data reliable and data scientists effective, data observability must supply an automated set of modern data management capabilities. That includes AI-powered data reliability, data discovery and data optimization capabilities that ensure data is accurate, reliable and complete throughout the entire data pipeline, without heavy labor by the data science or engineering teams.

Data observability tools reconcile data across the modern distributed data fabric, preventing and healing problematic issues when it comes to data at rest, data in motion and data for consumption. They outperform classic data quality tools, which were built for an era of structured data centered on relational databases.

* * *

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
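The automated reliability checks described above — declared data requirements, completeness, and volume drift against a baseline — can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any vendor's implementation; the schema, function names and thresholds (`EXPECTED_SCHEMA`, `check_batch`, the 5% null-rate limit, the 3-sigma volume band) are assumptions chosen purely for the example.

```python
"""Sketch of automated data reliability checks on one pipeline batch.

Hypothetical example: a real data observability platform runs checks like
these continuously across distributed pipelines, not in-process on raw rows.
"""
from statistics import mean, pstdev

# The "data requirements": expected columns and their value types.
EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}


def check_schema(rows):
    """Flag schema drift: missing columns or unexpected value types."""
    issues = []
    for i, row in enumerate(rows):
        for col, typ in EXPECTED_SCHEMA.items():
            if col not in row:
                issues.append(f"row {i}: missing column '{col}'")
            elif row[col] is not None and not isinstance(row[col], typ):
                issues.append(
                    f"row {i}: '{col}' is {type(row[col]).__name__}, "
                    f"expected {typ.__name__}"
                )
    return issues


def check_completeness(rows, max_null_rate=0.05):
    """Flag columns whose null rate exceeds the allowed threshold."""
    issues = []
    for col in EXPECTED_SCHEMA:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_null_rate:
            issues.append(
                f"column '{col}': null rate {nulls / len(rows):.0%} "
                f"exceeds {max_null_rate:.0%}"
            )
    return issues


def check_volume(row_count, history, n_sigma=3.0):
    """Flag volume drift: a batch size far outside the historical baseline."""
    if len(history) < 2:
        return []  # not enough history to form a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma and abs(row_count - mu) > n_sigma * sigma:
        return [f"volume {row_count} deviates from baseline mean {mu:.0f}"]
    return []


def check_batch(rows, history):
    """Run all reliability checks on one batch; return a list of issues."""
    return (check_schema(rows)
            + check_completeness(rows)
            + check_volume(len(rows), history))
```

The design choice mirrors the article's prescription: expectations are declared up front (the requirements step), then evaluated automatically on every batch with statistical baselines, so the data team is alerted to drift instead of discovering it downstream in model outputs.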
---

> **Disclaimer**: This content is for reference only and does not constitute any investment advice.