Summary: Presented by Dr. David DeWitt and Dr. Rimma Nehme.

Today, many organizations have data sets in both a relational DBMS and Hadoop. Systems like Hive, Impala (Cloudera), and Stinger (Hortonworks) can query Hadoop-resident data. However, they cannot answer queries that combine structured (relational) data with the semi-structured data stored in Hadoop. In this talk we will describe Polybase, a capability that we introduced as part of the SQL Server PDW V2 release. It is now available in the CTP3 release of SQL16 (SQL Server 2016).

With Polybase, users can create external tables for data sets stored in HDFS. The software currently supports the Hadoop-standard text, RC, ORC, and Parquet file formats. It runs against both Linux and Windows Hadoop clusters and works with both the Hortonworks and Cloudera distributions. Once an external table has been defined, users can run standard SQL queries over it without any knowledge of the Hadoop ecosystem (no MapReduce programming is required). Queries can also span arbitrary combinations of relational tables stored in SQL Server and external tables stored in HDFS. All of this is transparent to the user, so existing BI tools and applications can operate on HDFS-resident data without modification.

SQL16 also goes beyond a single SQL Server instance querying HDFS-resident data. Users will be able to easily deploy clusters of SQL Server instances that operate together as a parallel database system when executing queries over HDFS-resident data.

About David: Prior to joining Microsoft as a Technical Fellow in March 2008, David DeWitt was a professor at the University of Wisconsin-Madison. He served as chair of the Computer Sciences Department at Wisconsin from July 1999 to July 2004. Professor DeWitt is a member of the National Academy of Engineering (1998), a fellow of the American Academy of Arts and Sciences (2007), and an ACM Fellow (1995). In 2008 he received the ACM Software System Award for his pioneering work on parallel relational database systems. He has authored over 120 technical publications.


