
SQL Pivot: Converting Rows to Columns – The Databricks Blog

October 14, 2018 - MorningStar


Try this notebook in Databricks

Pivot was first introduced in Apache Spark 1.6 as a new DataFrame feature that allows users to rotate a table-valued expression by turning the unique values from one column into individual columns.

The upcoming Apache Spark 2.4 release extends this powerful functionality of pivoting data to our SQL users as well. In this blog, using temperature recordings in Seattle, we'll show how we can use this common SQL Pivot feature to achieve complex data transformations.

Suppose we have a second table of daily low temperatures:

Date          Temp (°F)
…             …
08-01-2018    59
08-02-2018    58
08-03-2018    59
08-04-2018    58
08-05-2018    59
08-06-2018    59
…             …

To combine this table with the previous table of daily high temperatures, we could join the two tables on the "Date" column. However, since we are going to use pivot, which performs grouping on the dates, we can simply concatenate the two tables using UNION ALL. As you'll see later, this approach also gives us more flexibility:

SELECT date, temp, 'H' as flag
FROM high_temps
UNION ALL
SELECT date, temp, 'L' as flag
FROM low_temps
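In procedural terms, this UNION ALL is just a concatenation of the two row sets, with each row tagged by its flag. A minimal pure-Python sketch, using hypothetical sample rows in place of the high_temps and low_temps tables:

```python
# Hypothetical sample rows standing in for the high_temps and low_temps tables.
high_temps = [("2018-08-01", 71), ("2018-08-02", 72)]
low_temps = [("2018-08-01", 59), ("2018-08-02", 58)]

# UNION ALL keeps every row from both inputs, so we simply concatenate,
# tagging each row with the 'H' or 'L' flag exactly as the query does.
combined = [(date, temp, "H") for date, temp in high_temps] + \
           [(date, temp, "L") for date, temp in low_temps]

print(combined)
# [('2018-08-01', 71, 'H'), ('2018-08-02', 72, 'H'),
#  ('2018-08-01', 59, 'L'), ('2018-08-02', 58, 'L')]
```

Note that, unlike a join on "Date", the concatenated form keeps high and low readings as separate rows, which is exactly the shape the pivot's grouping step expects.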

Now let’s try our pivot query with the new combined table:


SELECT * FROM (
  SELECT year(date) year, month(date) month, temp, flag `H/L`
  FROM (
    SELECT date, temp, 'H' as flag
    FROM high_temps
    UNION ALL
    SELECT date, temp, 'L' as flag
    FROM low_temps
  )
  WHERE date BETWEEN DATE '2015-01-01' AND DATE '2018-08-31'
)
PIVOT (
  CAST(avg(temp) AS DECIMAL(4, 1))
  FOR month IN (6 JUN, 7 JUL, 8 AUG, 9 SEP)
)
ORDER BY year DESC, `H/L` ASC

As a result, we get the average high and the average low temperatures for each month over the past four years in one table. Note that we need to include the column flag in the pivot query; otherwise the expression avg(temp) would be computed over a mix of high and low temperatures.

year  H/L  JUN   JUL   AUG   SEP
2018  H    71.9  82.8  79.1  NULL
2018  L    53.4  58.5  58.5  NULL
2017  H    72.1  78.3  81.5  73.8
2017  L    53.7  56.3  59.0  55.6
2016  H    73.1  76.0  79.5  69.9
2016  L    53.9  57.6  59.9  52.9
2015  H    78.9  82.6  79.0  68.5
2015  L    56.4  59.9  58.5  52.5

You might have noticed that now we have two rows for each year, one for the high temperatures and the other for low temperatures. That’s because we have included one more column, flag, in the pivot input, which in turn becomes another implicit grouping column in addition to the original column year.
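The semantics of this pivot can be sketched in plain Python: group rows by the non-pivot columns (year and flag), then average temp into one output column per month value. The rows below are hypothetical samples, not the real Seattle data:

```python
from collections import defaultdict

# Hypothetical (year, month, temp, flag) rows standing in for the subquery output.
rows = [
    (2018, 6, 71, "H"), (2018, 6, 73, "H"),
    (2018, 6, 53, "L"), (2018, 6, 54, "L"),
    (2018, 7, 82, "H"), (2018, 7, 58, "L"),
]

# Group by the implicit grouping columns (year, flag); month is the pivot column.
groups = defaultdict(lambda: defaultdict(list))
for year, month, temp, flag in rows:
    groups[(year, flag)][month].append(temp)

# One output row per (year, flag) pair; one column per pivoted month value,
# mirroring the FOR month IN (6 JUN, 7 JUL, 8 AUG, 9 SEP) clause.
months = [6, 7, 8, 9]
pivoted = {
    key: {m: round(sum(v[m]) / len(v[m]), 1) if m in v else None for m in months}
    for key, v in groups.items()
}

print(pivoted[(2018, "H")])  # {6: 72.0, 7: 82.0, 8: None, 9: None}
```

Because flag is part of the grouping key here, each year produces two output rows, just as in the table above; months with no input rows come out as None, matching the NULLs SQL produces.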

Alternatively, instead of being a grouping column, the flag can also serve as a pivot column. So now we have two pivot columns, month and flag:

SELECT * FROM (
  SELECT year(date) year, month(date) month, temp, flag
  FROM (
    SELECT date, temp, 'H' as flag
    FROM high_temps
    UNION ALL
    SELECT date, temp, 'L' as flag
    FROM low_temps
  )
  WHERE date BETWEEN DATE '2015-01-01' AND DATE '2018-08-31'
)
PIVOT (
  CAST(avg(temp) AS DECIMAL(4, 1))
  FOR (month, flag) IN (
    (6, 'H') JUN_hi, (6, 'L') JUN_lo,
    (7, 'H') JUL_hi, (7, 'L') JUL_lo,
    (8, 'H') AUG_hi, (8, 'L') AUG_lo,
    (9, 'H') SEP_hi, (9, 'L') SEP_lo
  )
)
ORDER BY year DESC

This query presents us with a different layout of the same data, with one row for each year, but two columns for each month.

year  JUN_hi  JUN_lo  JUL_hi  JUL_lo  AUG_hi  AUG_lo  SEP_hi  SEP_lo
2018  71.9    53.4    82.8    58.5    79.1    58.5    NULL    NULL
2017  72.1    53.7    78.3    56.3    81.5    59.0    73.8    55.6
2016  73.1    53.9    76.0    57.6    79.5    57.9    69.6    52.9
2015  78.9    56.4    82.6    59.9    79.0    58.5    68.5    52.5
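With (month, flag) as a composite pivot key, only year remains as the grouping column, and each (month, flag) pair becomes its own output column. A pure-Python sketch of that variant, again over hypothetical sample rows:

```python
from collections import defaultdict

# Hypothetical (year, month, temp, flag) rows, as before.
rows = [
    (2018, 6, 71, "H"), (2018, 6, 73, "H"),
    (2018, 6, 53, "L"), (2018, 6, 54, "L"),
    (2018, 7, 82, "H"), (2018, 7, 58, "L"),
]

# Group by year only; (month, flag) together form the pivot key.
groups = defaultdict(lambda: defaultdict(list))
for year, month, temp, flag in rows:
    groups[year][(month, flag)].append(temp)

# The column list mirrors the FOR (month, flag) IN (...) clause:
# JUN_hi, JUN_lo, JUL_hi, JUL_lo, AUG_hi, AUG_lo, SEP_hi, SEP_lo.
cols = [(m, f) for m in (6, 7, 8, 9) for f in ("H", "L")]
pivoted = {
    year: {c: round(sum(v[c]) / len(v[c]), 1) if c in v else None for c in cols}
    for year, v in groups.items()
}

print(pivoted[2018][(6, "H")], pivoted[2018][(6, "L")])  # 72.0 53.5
```

The same aggregates are computed either way; moving flag from the grouping key to the pivot key only changes whether the high/low split shows up as extra rows or extra columns.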

What’s Next

To run the query examples used in this blog, please check out the pivot SQL examples in the accompanying notebook.

Thanks to the Apache Spark community contributors for their contributions!



