Tuesday, August 09, 2011

MS BI and Hadoop Integration using Hadoop Connectors for SQL Server and Parallel Data Warehouse to analyze structured and unstructured data

Not-only SQL (NoSQL) is ruling the world of unstructured data for storage, warehousing and analytics, with Hadoop being the most successful and widely used technology. There are two choices you can make when something is gaining immense acceptance: either you can ignore it and keep competing within your own league, or you can partner with it and extend your reach deeper. Microsoft is without doubt one of the leaders in database management, data warehousing and analytics, alongside IBM, Oracle and Teradata, but on structured data only. Microsoft Research is trying to churn out its own set of products to deal with big data and unstructured data challenges, using federated databases capable of MPP (massively parallel processing). But Hadoop has already earned a proven reputation and acceptance in this world of unstructured data.

The good news is that Microsoft is slowly embracing Hadoop environments and adopting a symbiotic policy. No organization has exclusively structured or exclusively unstructured data; it's always a combination of both. The Azure platform already supports Hadoop implementations. Recently Microsoft announced an upcoming CTP release of two new Hadoop connectors, one for SQL Server and one for Parallel Data Warehouse. Many visionary DW players are already offering hybrid BI implementations that allow MapReduce (used to query data from Hadoop environments) and SQL to be used together. With the release of the Hadoop connector for SQL Server, it's highly probable that SQL Server becomes a source for Hadoop environments rather than vice versa, as the ocean of unstructured data sitting in Hadoop environments is far more than SQL Server could accommodate.

Still, the interoperability facilitated by this connector would empower SQL Server to extract data of interest from the ocean of data hosted in Hadoop environments, making the MS BI stack even more powerful. Database engines, ETL tools and OLAP engines would see bigger challenges than ever when clients start using Hadoop as a source for SQL Server, but my viewpoint is that it would mostly work the other way round. These connectors open the door to SQL Server based databases and data warehouses being used in combination with Hadoop and MapReduce, effectively creating new opportunities for the entire database community, from clients to technicians.
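To make the idea of combining MapReduce and SQL a little more concrete, here is a minimal Python sketch of the pattern: a map step and a reduce step summarize raw text, and the much smaller result is loaded into a relational table and queried with SQL. Everything in it is an assumption made for illustration. The log lines are made up, plain Python functions stand in for a Hadoop job, and an in-memory SQLite database stands in for SQL Server. It does not use or represent the actual connectors.

```python
import sqlite3
from collections import defaultdict

# Hypothetical unstructured input: raw log lines of the kind that might sit in Hadoop.
LOG_LINES = [
    "ERROR disk full on node7",
    "INFO job finished",
    "ERROR timeout on node3",
    "WARN slow response",
    "INFO job started",
]


def map_phase(line):
    """Map step: emit (log level, 1), where the level is the first token of the line."""
    level = line.split(maxsplit=1)[0]
    yield level, 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_mapreduce(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(pairs)


def load_and_query(counts):
    """Load the reduced output into a relational table and query it with SQL.

    SQLite is only a stand-in here for a relational engine such as SQL Server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_summary (level TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany("INSERT INTO log_summary VALUES (?, ?)", counts.items())
    rows = conn.execute(
        "SELECT level, hits FROM log_summary ORDER BY hits DESC, level"
    ).fetchall()
    conn.close()
    return rows


counts = run_mapreduce(LOG_LINES)
rows = load_and_query(counts)
for level, hits in rows:
    print(level, hits)
```

The point of the sketch is the division of labour: the bulky, unstructured side is reduced to a small structured summary, and only that summary reaches the relational side where SQL takes over.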

It's too early to know the taste of the food before you actually taste it, but you can make a guess from the aroma, and that's what I am trying to do as of now. You can read the announcement about these connectors from here.

1 comment:

Carla Sabotta said...

This white paper explores how SSIS can be used to automate Hadoop and non-Hadoop job executions, and manage data transfers between Hadoop and other data sources.

http://msdn.microsoft.com/en-us/library/jj720569.aspx
