BOOKS - Data Virtualization in the Cloud Era: Data Lakes and Data Federation at Scale
US $5.56
343023
Data Virtualization in the Cloud Era: Data Lakes and Data Federation at Scale
Author: Daniel Abadi, Andrew Mott
Published: 2024-07-03
Number of pages: 184
Format: PDF | EPUB | MOBI
File size: 10.1 MB
Language: ENG
For decades, data virtualization has been little more than a dream. How nice it would be if we could ignore all the details of where data is located and how it is stored, and simply access all data within an organization from a single unified interface! Unfortunately, this dream was held back by fundamental hardware limitations and the complexity of the necessary software, so data virtualization remained a niche technology. In the last decade, however, advances in networking hardware and machine learning have started to transform data virtualization from dream to reality.

One barrier is that data sources expose different interfaces: one system may have an SQL interface, another GraphQL, and a third may support only text search. A client who wishes to pose a question to these differing systems must learn the language that each system supports as its interface. The goal of data virtualization (DV) is to eliminate or alleviate such barriers.

A DV System creates a central interface through which data can be accessed no matter where it is located, how it is stored, or how it is organized. The most complex part of a DV System is the data virtualization engine (DV Engine), which receives requests from clients (generated using the client interface) and performs whatever processing those requests require. This typically involves communicating with the specific underlying data sources that contain the relevant data, so the DV Engine needs to know how to communicate with the many different types of systems whose data it virtualizes. In general, the goal of data virtualization is to let clients express requests over datasets without having to worry about how the underlying source systems store the data.
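To make the role of the DV Engine more concrete, here is a minimal Python sketch of a single client interface routed to two very different sources: one that speaks SQL and one that supports only text search. It is an illustration under stated assumptions, not the design described in the book. The names `SourceAdapter`, `SqlAdapter`, `TextSearchAdapter`, `DVEngine`, and the sample datasets are all hypothetical.

```python
import sqlite3
from typing import Protocol


class SourceAdapter(Protocol):
    """Hypothetical contract: each source translates a uniform request."""

    def fetch(self, dataset: str, where: dict) -> list: ...


class SqlAdapter:
    """Wraps a source that speaks SQL (here an in-memory SQLite database)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # rows become name-addressable

    def fetch(self, dataset: str, where: dict) -> list:
        # Assumption: dataset and column names come from a trusted catalog.
        # Values are always bound as parameters, never interpolated.
        clause = " AND ".join(f"{col} = ?" for col in where) or "1 = 1"
        cursor = self.conn.execute(
            f"SELECT * FROM {dataset} WHERE {clause}", tuple(where.values())
        )
        return [dict(row) for row in cursor]


class TextSearchAdapter:
    """Wraps a source that supports only keyword search over documents."""

    def __init__(self, documents: dict) -> None:
        self.documents = documents

    def fetch(self, dataset: str, where: dict) -> list:
        # Field names are ignored; every value is treated as a search term.
        terms = [str(value).lower() for value in where.values()]
        return [
            doc
            for doc in self.documents.get(dataset, [])
            if all(term in doc["text"].lower() for term in terms)
        ]


class DVEngine:
    """Central interface: routes each request to the adapter that owns the data."""

    def __init__(self) -> None:
        self.catalog: dict = {}

    def register(self, dataset: str, adapter: SourceAdapter) -> None:
        self.catalog[dataset] = adapter

    def query(self, dataset: str, **where) -> list:
        return self.catalog[dataset].fetch(dataset, where)


# Set up two unrelated sources with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EU"), (2, "US")])

tickets = {"tickets": [{"text": "Login fails in EU"}, {"text": "Billing question"}]}

engine = DVEngine()
engine.register("customers", SqlAdapter(conn))
engine.register("tickets", TextSearchAdapter(tickets))

# The client uses one call shape regardless of how each source stores its data.
eu_customers = engine.query("customers", region="EU")
login_tickets = engine.query("tickets", keyword="login")
```

The client never writes SQL or a search query directly; it only names a dataset and some conditions. A real engine would also have to handle query planning, joins across sources, and pushing work down to each source, none of which this sketch attempts.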