10 Data Virtualization Myths

Thursday, July 27, 2017

When server virtualization first emerged, it was a hard sell to many administrators. Consolidating all of your servers into an emulated software environment was unappealing for several reasons. Building a server through software sounded like science fiction at the time, and consolidating servers onto fewer physical platforms increased the size of the failure domain. Today, server virtualization is pervasive throughout IT infrastructures.

Virtualized computing has been growing in popularity for over a decade, but what about virtualized data? Let’s examine some of the myths associated with data virtualization and try to find some clarification.

1. We don’t need to virtualize our data - we already have a data warehouse.

The sources of unstructured data increase every day. For example, social media data can now be collected by interfacing directly with the sites. The Internet of Things (IoT) also generates an enormous amount of raw data, such as telemetry, that must be collected and analyzed. You can still use your data warehouse, but virtualization allows you to tie in these new sources of data to produce better information and leverage it to provide a competitive advantage for your business processes.

2. Implementing new data technology isn’t cost effective.

Data virtualization software costs have become comparable to building a custom data center. Additionally, far fewer IT personnel are needed to process Agile Business Intelligence tasks. Scripting through applications like Puppet or Chef allows administrators to automate repetitive tasks. Tools like Microsoft’s Power BI give BI professionals the ability to incorporate multiple data sources into meaningful reports quickly without the need for highly technical personnel.

3. Querying virtual data can’t perform like physical data queries.

The progression of computing platform technologies, including faster network connectivity, advancements in processor density, and the introduction of new technologies such as Storage Class Memory (SCM), means virtualization software can process queries across multiple unconnected data sources at near real-time speeds. This myth stems more from a lack of experience with and knowledge of virtualization software than from actual performance data. Switching to data virtualization will likely improve your query processes when compared to physical data queries.

4. Data virtualization is too complex.

This myth is propagated by a lack of knowledge of, or exposure to, tools specifically designed to query virtualized data. Software is available that allows users to query multiple sources of data, including several emerging ones. Most virtualized BI software is straightforward enough to be used by non-technical personnel. Implementing data virtualization will allow you to reduce your network administration labor costs.

5. The purpose of data virtualization is to emulate a virtual data warehouse.

Data virtualization can be used as a data warehouse, but it is more beneficial when data marts are connected to existing data warehouses to augment them. These data marts can be added after the fact and can change fluidly. There is no need to incorporate them all into one siloed data source. The flexibility of data virtualization allows you to customize a data structure that fits your business without completely disrupting your current data solution.

6. Data virtualization and data federation are the same thing.

Early virtualization vendors called their offerings data federation products. Later, new concepts were added that evolved these products into complete virtualization solutions. Data federation is only one aspect of data virtualization. Data federation can be helpful for your business by homogenizing data stored on different servers, in different access languages, or with different APIs. This homogeneous view allows data to be mined successfully from a myriad of sources, so the patterns found in it can be put to effective, customized use.
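As a rough illustration of the homogenizing idea described above, the sketch below presents two differently shaped sources through one common record layout. Everything in it is hypothetical: the table, the field names, and the JSON payload (which stands in for a web API response) are invented for the example, and real data virtualization products do far more than this.

```python
import json
import sqlite3


def read_sql_customers(conn):
    """Adapter for a relational source: maps its columns to the common layout."""
    rows = conn.execute(
        "SELECT cust_id, full_name, city FROM customers ORDER BY cust_id"
    )
    return [{"id": r[0], "name": r[1], "city": r[2]} for r in rows]


def read_api_customers(payload):
    """Adapter for a JSON source (a stand-in for an API response)."""
    return [
        {
            "id": item["customerId"],
            "name": item["displayName"],
            "city": item["location"]["city"],
        }
        for item in json.loads(payload)
    ]


def federated_customers(conn, payload):
    """Presents both sources as one list of records in the common layout."""
    return read_sql_customers(conn) + read_api_customers(payload)


# Hypothetical sample data for the two sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "Denver"), (2, "Grace", "Boulder")],
)
api_payload = json.dumps(
    [{"customerId": 3, "displayName": "Linus", "location": {"city": "Denver"}}]
)

# One query over both sources, written against the common layout only.
denver = [c["name"] for c in federated_customers(conn, api_payload) if c["city"] == "Denver"]
print(denver)
```

The consumer of `federated_customers` never sees the differing column names or access methods, which is the point of the federation layer in this sketch.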

7. Data virtualization can only provide limited data cleansing because of real-time transformation.

This claim can be made about any data query software. It is always best to clean up data in the system natively rather than tax query software with the burden of transforming data as part of the display process. The scope and velocity of raw data collection typically lead to the injection of data that has no value. Native tool sets within the data collection application can cleanse data during ingestion. These dimension reduction tools help manage the heterogeneous nature of raw data as it is acquired. While data virtualization won’t necessarily improve your data cleansing processes, it also will not inhibit your current system.

8. Data virtualization requires shared storage.

Data virtualization works with direct attached storage (DAS) as well as many other types of data storage. DAS can be presented as unique data stores to the hypervisor and used the same way as any other data store. The emergence of hyperconverged infrastructure, or HCI, provides another high-performance alternative to SAN-based storage. HCI platforms can be customized to include caching tiers and capacity tiers so your data can reside on the appropriate storage resource based on criticality and access requirements. The versatility of virtualization allows you to build a custom data solution within the parameters of your company’s specific needs.

9. Data virtualization can't perform as fast as ETL.

By extracting only the pieces of data used for analytics instead of a full copy of the data (data reduction), data virtualization is faster than ETL. Operations run at higher speeds because the raw data is presented in a more concise form through compression, algorithmic selection, and redundancy elimination. Switching to data virtualization minimizes the time you spend waiting for data queries to complete, giving you additional time to address or expand deliverables from the data set.
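The difference between copying everything and extracting only what the analysis needs can be sketched with a small in-memory SQLite table. The `orders` table, its columns, and the two helper functions are assumptions made up for this illustration; the count of values moved is only a crude proxy for transfer cost, not a benchmark.

```python
import sqlite3

# Hypothetical source system with 100 orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, "west" if i % 4 == 0 else "east", 10.0 * i, "free text") for i in range(1, 101)],
)


def full_copy_total(conn, region):
    """Copy-everything style: pull every row and column, then filter locally."""
    rows = conn.execute("SELECT id, region, amount, notes FROM orders").fetchall()
    total = sum(r[2] for r in rows if r[1] == region)
    values_moved = len(rows) * 4  # four columns per row
    return total, values_moved


def pushdown_total(conn, region):
    """Selective style: ask the source for only the column and rows needed."""
    rows = conn.execute(
        "SELECT amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return sum(r[0] for r in rows), len(rows)


print(full_copy_total(source, "west"))
print(pushdown_total(source, "west"))
```

Both functions return the same total, but the second moves 25 values out of the source instead of 400, which is the kind of reduction the paragraph above describes.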

10. Data virtualization can't provide real-time data.

Virtualization sources are updated with live connections instead of snapshot data, which could be out of date. This brings data virtualization closer to providing real-time data, and it is faster than other approaches that have to maintain persistent connections. Using real-time data in your analysis provides your business with the most current results, facilitating accurate and effective solutions such as machine learning or artificial intelligence capabilities.
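The contrast between a snapshot and a live connection can be shown in miniature with SQLite, where a copied table plays the role of the snapshot and a view plays the role of the live, virtual source. The `stock` table and its contents are hypothetical, and a database view is only an analogy for what a data virtualization layer does across systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (sku TEXT, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('A-100', 5)")

# Snapshot: a physical copy taken at one point in time.
db.execute("CREATE TABLE stock_snapshot AS SELECT sku, qty FROM stock")
# Live: a view that reads the source every time it is queried.
db.execute("CREATE VIEW stock_live AS SELECT sku, qty FROM stock")

# The source changes after the snapshot was taken.
db.execute("UPDATE stock SET qty = 2 WHERE sku = 'A-100'")

snapshot_qty = db.execute("SELECT qty FROM stock_snapshot").fetchone()[0]
live_qty = db.execute("SELECT qty FROM stock_live").fetchone()[0]
print(snapshot_qty, live_qty)  # prints: 5 2
```

The snapshot still reports the old quantity, while the view reflects the update without any reload step.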

There is no need for data virtualization to be complex or daunting with the help of companies like Sanity Solutions. Contact us today to find out what data virtualization can do for you. We can help dispel the myths of the unknown when it comes to the application and maintenance of your valuable data.

