The amount of data available in the world today is staggering. A 2011 Digital Universe Study from IDC and EMC estimated it at around 1.8 zettabytes (1.8 trillion gigabytes) and projected that it would more than double every two years. The study also found that the cost to create, capture, store, and manage data is about one-sixth of what it was in 2005, which means that companies and enterprises large and small will continue to capture and store more and more data at lower cost.
So how much data is out there? According to data gathered from IBM and compiled on the social media blog ViralHeat in October 2012, nearly 3 million emails are sent every second around the world, 20 hours of video are uploaded to YouTube every minute, and 50 million tweets are sent every day. In addition, Google processes 24 petabytes of data every day (1 petabyte equals 1 quadrillion bytes, or 1 × 10¹⁵ bytes).
With so much data around, it can be difficult to access the information you need and then harness it in a way that benefits your organization. Data virtualization is the process of bringing together information from several different systems (including databases, applications, files, websites, and vendors) into a universal format that can be accessed from anywhere in the world, without the need to know where the original file is located or how it is formatted. Effective data virtualization tools present the data from several disparate systems in one cohesive, usable format, giving you the ability to access, view, and transform data that would otherwise have been impossible to aggregate.
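To make the idea of a universal format more concrete, here is a minimal Python sketch, assuming two hypothetical sources: a CSV export and a JSON payload that describe customers using different field names and structures. Each source gets a small adapter that maps its records into one shared `Customer` type, so the consumer works with a single list without knowing where each record came from or how it was originally formatted. The field names and the `Customer` type are illustrative assumptions, not part of any particular data virtualization product.

```python
import csv
import io
import json
from dataclasses import dataclass


# Hypothetical "universal" record shape that every source is mapped into.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    country: str


def from_csv(text: str) -> list[Customer]:
    # Source A (hypothetical): a flat CSV export with its own column names.
    reader = csv.DictReader(io.StringIO(text))
    return [Customer(row["id"], row["full_name"], row["country_code"]) for row in reader]


def from_json(text: str) -> list[Customer]:
    # Source B (hypothetical): a JSON payload with nested, differently named fields.
    payload = json.loads(text)
    return [
        Customer(str(item["customerId"]), item["profile"]["name"], item["profile"]["country"])
        for item in payload["customers"]
    ]


def unified_view(csv_text: str, json_text: str) -> list[Customer]:
    # The consumer receives one list of Customer records, regardless of
    # which source each record came from or how it was stored there.
    return from_csv(csv_text) + from_json(json_text)


if __name__ == "__main__":
    csv_source = "id,full_name,country_code\n1,Ada Lovelace,GB\n"
    json_source = '{"customers": [{"customerId": 2, "profile": {"name": "Grace Hopper", "country": "US"}}]}'
    for customer in unified_view(csv_source, json_source):
        print(customer)
```

Adding a new source under this design means writing one more adapter; the consumer-facing `Customer` type stays the same.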
Data virtualization systems place a virtual layer, also called an abstraction layer, between the original data sources and the consumer, which eliminates the need to physically consolidate and store the information on your own servers. This layer also provides on-demand access to the most up-to-date information in the data sources (including transaction systems, big data stores, data warehouses, and more) without the need to run consolidation or batch processes within each data repository.
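The following Python sketch illustrates the abstraction layer under simplified assumptions: each source is a hypothetical connector function that returns its current rows when called, and the hypothetical `VirtualLayer` class queries every registered source at request time instead of copying data into its own store. Because nothing is cached, a change made directly in a source shows up on the next query with no batch reload. Real products add query pushdown, caching policies, and optimization that this sketch leaves out.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical connector type: calling it returns the source's *current* rows.
Source = Callable[[], Iterable[dict]]


class VirtualLayer:
    """Toy abstraction layer: registers sources by name and queries them on demand."""

    def __init__(self) -> None:
        self._sources: dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def query(self, predicate: Callable[[dict], bool] = lambda row: True) -> list[dict]:
        # Pull live rows from every source at request time; nothing is stored here.
        results = []
        for name, source in self._sources.items():
            for row in source():
                if predicate(row):
                    results.append({**row, "_source": name})  # tag each row's origin
        return results


def sqlite_source(conn: sqlite3.Connection) -> Source:
    # Connector for a hypothetical "inventory" table in a SQLite database.
    def fetch() -> list[dict]:
        cursor = conn.execute("SELECT sku, qty FROM inventory")
        return [{"sku": sku, "qty": qty} for sku, qty in cursor]
    return fetch


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
    warehouse.execute("INSERT INTO inventory VALUES ('A-1', 5)")
    store_stock = [{"sku": "B-2", "qty": 3}]  # stands in for a second, unrelated system

    layer = VirtualLayer()
    layer.register("warehouse", sqlite_source(warehouse))
    layer.register("store", lambda: list(store_stock))
    print(layer.query())

    # Update the source directly; the next query sees it without any reload step.
    warehouse.execute("UPDATE inventory SET qty = 0 WHERE sku = 'A-1'")
    print(layer.query(lambda row: row["qty"] == 0))
```

Running it prints the rows from both sources tagged with their origin, then only the warehouse row after its quantity is updated in place.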
Regulating Data with Virtualization
Data is a wonderful tool for businesses, but with the volume of digital information that exists today, companies can quickly become overwhelmed if they do not have a way to manage it. Many companies collect and store information internally in multiple repositories, including individual files and computers, servers, and databases, and they also draw on external information from data warehouses, transaction systems, and more. For large corporations, the information on their internal servers and computers alone can amount to millions of gigabytes of data.
To use this information effectively, there must be a way to aggregate it into one system that is useful and accessible to everyone. Before data virtualization, you had to access each data source directly, which presented several challenges. When you accessed data remotely, you could spend a long time waiting to download the information you needed, and integrating data from several disparate sources into one system could introduce errors. In addition, allowing many people to access and manipulate the original source data opened the door to someone corrupting the original files. Because virtualization provides a map to the data through the virtual (abstraction) layer, waiting time is greatly reduced, you get access to the most up-to-date information, and you reduce or eliminate the risk of damaging the original files.
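One way a virtual layer can protect the original files is by reaching each source only through a read-only path. The Python sketch below assumes the source is a SQLite database file and opens it with SQLite's `mode=ro` URI parameter, so queries issued through the layer succeed while any attempt to modify the data is rejected by the database itself. The table and file names are hypothetical; other database systems offer comparable read-only roles or connections.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    # SQLite's "mode=ro" URI parameter opens the file read-only, so any write
    # issued through this connection fails instead of touching the original data.
    return sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "orders.db"  # hypothetical system of record
        owner = sqlite3.connect(path)
        owner.execute("CREATE TABLE orders (id INTEGER, total REAL)")
        owner.execute("INSERT INTO orders VALUES (1, 99.5)")
        owner.commit()
        owner.close()

        view = open_read_only(path)  # the path the virtual layer would use
        print(view.execute("SELECT * FROM orders").fetchall())
        try:
            view.execute("DELETE FROM orders")
        except sqlite3.OperationalError as exc:
            print("write rejected:", exc)
        view.close()
```

The protection here comes from the database connection rather than from application code, so it still holds if a consumer issues an unexpected statement.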
The Costs of Data Virtualization
To have an effective data virtualization system, companies need the right middleware platform, one that provides the necessary support and functionality while reliably delivering instant access to all available data. Such platforms include three key components:
• An integrated environment that grants access to all the key users and defines security controls, data standards, quality requirements, and data validation rules.
• The data virtualization server, where users input their queries and the system aggregates all the information into a format that is easy for the user to view and manipulate. This requires the ability to collect and transform the information from several different systems into a single format for consumption. These servers must also include validation and authentication to ensure data security.
• The ability to manage the servers and keep them running reliably at all times. One key to a quality data virtualization system is real-time access to high-quality information, so there must be tools in place to support integration, security, and access to the system, and to monitor the system logs for usage levels and other key indicators that can help improve access.
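The three components above can be sketched together as a toy Python server. This is a simplified illustration under stated assumptions, not a production design: callers are assumed to have been authenticated upstream, a hypothetical role-to-view permission map stands in for access control, one example data-quality rule (amounts must be present and non-negative) stands in for validation, and a per-view counter plus log lines stand in for the monitoring an administrator would use to track usage levels. All names (`VirtualizationServer`, the `sales` view, the `amount` field) are illustrative.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dv-server")  # hypothetical logger name


@dataclass
class VirtualizationServer:
    # Hypothetical access policy: which roles may read which virtual views.
    permissions: dict[str, set[str]]
    # Each virtual view returns rows already mapped into one common format.
    views: dict[str, Callable[[], list[dict]]]
    # Usage counter per view, for monitoring.
    usage: Counter = field(default_factory=Counter)

    def query(self, user: str, role: str, view: str) -> list[dict]:
        # 1. Access control: the caller is assumed authenticated already;
        #    here we only check that their role is granted this view.
        if view not in self.permissions.get(role, set()):
            log.warning("denied user=%s role=%s view=%s", user, role, view)
            raise PermissionError(f"role {role!r} may not read view {view!r}")

        rows = self.views[view]()

        # 2. Validation: drop rows that break an example quality rule.
        valid = [r for r in rows if r.get("amount") is not None and r["amount"] >= 0]

        # 3. Monitoring: count usage and log what was served and rejected.
        self.usage[view] += 1
        log.info("served user=%s view=%s rows=%d rejected=%d",
                 user, view, len(valid), len(rows) - len(valid))
        return valid


if __name__ == "__main__":
    server = VirtualizationServer(
        permissions={"analyst": {"sales"}},
        views={"sales": lambda: [{"region": "EU", "amount": 120.0},
                                 {"region": "US", "amount": None}]},
    )
    print(server.query("alice", "analyst", "sales"))
    print(server.usage.most_common())
```

Denied requests are logged but not counted as usage, which keeps the counter a measure of data actually served.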
While the initial cost to set up this type of system can be high, the return on investment a company can achieve through strategic use of the data it gathers can more than offset that cost.
Case Studies in Data Virtualization
There are hundreds of examples of companies today, from large corporations to small- and medium-sized businesses, that are using data virtualization to improve the way they collect, maintain, and utilize information stored in databases and systems throughout the world.
For example, Chevron was recently recognized for implementing data virtualization in a project; by adding the virtual layer to several systems that had to be aggregated, project managers were able to cut the total time to migrate systems almost in half, and lower the risk of losing critical data.
Companies like Franklin Templeton, which rely on data to deliver results to investors, use data virtualization to manage databases more efficiently, eliminate data duplication within the system, and shorten the amount of time it takes to bring new products to the market, increasing their competitive edge.
For large corporations that aggregate data from several different data marts and high-volume warehouses, the ability to consolidate that information into a usable format that drives sales and customer retention strategies is a critical competitive advantage. Companies like AT&T are using data virtualization to consolidate hundreds of data sources into one real-time system that informs everything from R&D to marketing and sales.
Whether you are a small- or medium-sized business that is struggling with time-consuming routine IT tasks, such as manually managing several systems and databases, or you are a large corporation trying to access and aggregate billions of pieces of information every day, data virtualization can help you view and manipulate information that will give you the competitive edge you need. Every company has data, but without the ability to safely and reliably access and organize the key pieces that you need from all your disparate systems, you will never realize all the benefits that information can offer.