Transpose rows into columns in BigQuery (Pivot implementation) [duplicate] - sql

I want to generate a new table and place all key value pairs with keys as column names and values as their respective values using BigQuery.
Example:
Key            Value
channel_title  Mahendra Guru
youtube_id     ugEGMG4-MdA
channel_id     UCiDKcjKocimAO1tV
examId         72975611-4a5e-11e5
postId         1189e340-b08f
channel_title  Ab Live
youtube_id     3TNbtTwLY0U
channel_id     UCODeKM_D6JLf8jJt
examId         72975611-4a5e-11e5
postId         0c3e6590-afeb
I want to convert it to:
channel_title  youtube_id   channel_id         examId              postId
Mahendra Guru  ugEGMG4-MdA  UCiDKcjKocimAO1tV  72975611-4a5e-11e5  1189e340-b08f
Ab Live        3TNbtTwLY0U  UCODeKM_D6JLf8jJt  72975611-4a5e-11e5  0c3e6590-afeb
How to do it using BigQuery?

When this answer was written, BigQuery had no pivot function (standard SQL has since gained a PIVOT operator), but you can still do this with the approach below, which is written in legacy SQL.
First, in addition to the two columns in the input data, you need one more column that identifies which input rows should be combined into one output row.
So I assume your input table (yourTable) looks like this:
id  Key            Value
1   channel_title  Mahendra Guru
1   youtube_id     ugEGMG4-MdA
1   channel_id     UCiDKcjKocimAO1tV
1   examId         72975611-4a5e-11e5
1   postId         1189e340-b08f
2   channel_title  Ab Live
2   youtube_id     3TNbtTwLY0U
2   channel_id     UCODeKM_D6JLf8jJt
2   examId         72975611-4a5e-11e5
2   postId         0c3e6590-afeb
Step 1: run the query below to generate the pivot query:
SELECT 'SELECT id, ' +
GROUP_CONCAT_UNQUOTED(
'MAX(IF(key = "' + key + '", value, NULL)) as [' + key + ']'
)
+ ' FROM yourTable GROUP BY id ORDER BY id'
FROM (
SELECT key
FROM yourTable
GROUP BY key
ORDER BY key
)
The result of the above query is a string that, once formatted, looks like this:
SELECT
id,
MAX(IF(key = "channel_id", value, NULL)) AS [channel_id],
MAX(IF(key = "channel_title", value, NULL)) AS [channel_title],
MAX(IF(key = "examId", value, NULL)) AS [examId],
MAX(IF(key = "postId", value, NULL)) AS [postId],
MAX(IF(key = "youtube_id", value, NULL)) AS [youtube_id]
FROM yourTable
GROUP BY id
ORDER BY id
Step 2: copy that result and run it as a normal query (you don't need to format it; I did that only for presentation).
The result will be what you expect:
id  channel_id         channel_title  examId              postId         youtube_id
1   UCiDKcjKocimAO1tV  Mahendra Guru  72975611-4a5e-11e5  1189e340-b08f  ugEGMG4-MdA
2   UCODeKM_D6JLf8jJt  Ab Live        72975611-4a5e-11e5  0c3e6590-afeb  3TNbtTwLY0U
Please note: you can skip Step 1 if you can write the Step 2 query yourself, for example when the number of fields is small and constant, or when it is a one-time job. Step 1 is just a helper that builds the query for you, so you can regenerate it quickly at any time.
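The two-step idea (generate the query from the distinct keys, then run it) can be tried locally. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, replaces the legacy-SQL IF(...) with the portable CASE WHEN form, and the helper name build_pivot_sql is hypothetical.

```python
import sqlite3

# In-memory stand-in for the BigQuery table `yourTable` (assumption: SQLite, not BigQuery).
rows = [
    (1, "channel_title", "Mahendra Guru"),
    (1, "youtube_id", "ugEGMG4-MdA"),
    (1, "channel_id", "UCiDKcjKocimAO1tV"),
    (1, "examId", "72975611-4a5e-11e5"),
    (1, "postId", "1189e340-b08f"),
    (2, "channel_title", "Ab Live"),
    (2, "youtube_id", "3TNbtTwLY0U"),
    (2, "channel_id", "UCODeKM_D6JLf8jJt"),
    (2, "examId", "72975611-4a5e-11e5"),
    (2, "postId", "0c3e6590-afeb"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, key TEXT, value TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)


def build_pivot_sql(connection):
    """Step 1: read the distinct keys and emit one MAX(CASE ...) column per key."""
    keys = [k for (k,) in connection.execute(
        "SELECT DISTINCT key FROM yourTable ORDER BY key")]
    columns = ", ".join(
        # Keys are pasted into the SQL text, so this is only safe for trusted key names.
        "MAX(CASE WHEN key = '{0}' THEN value END) AS \"{0}\"".format(k)
        for k in keys
    )
    return "SELECT id, " + columns + " FROM yourTable GROUP BY id ORDER BY id"


# Step 2: run the generated query.
pivot_sql = build_pivot_sql(conn)
cursor = conn.execute(pivot_sql)
header = [d[0] for d in cursor.description]
pivoted = cursor.fetchall()

print(header)
for row in pivoted:
    print(row)
```

Because the column list is rebuilt from the data each time, a new key in the input simply becomes a new output column on the next run.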
If you are interested, you can read more about pivoting in my other posts:
How to scale Pivoting in BigQuery?
Please note: there is a limit of 10K columns per table, so the number of pivoted values (organizations, in that example) is limited to 10K.
You can also look at these simpler examples (if the one above is too complex or verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery


SQL - Query to split original sort

I hope my title is OK, as I really don't know what to call it.
Anyway, I have a table with the following columns:
ID - Num (Primary Key)
Category - VarChar
Name - VarChar
DateForName - Date
The data looks like this:
1 100 111 31/12/2017
2 101 210 30/12/2017
3 100 112 29/12/2017
4 101 203 27/12/2017
5 100 117 20/12/2017
6 103 425 08/12/2017
To generate this table, I just sorted by date DESC.
Is there a way to add a new column with the order per Category, like this:
1 100|1
2 101|1
3 100|2
4 101|2
5 100|3
6 103|1
You want the analytic function row_number():
select t.*
from (select *, row_number() over (partition by Category order by DateForName desc) as Seq
from TableName
) t
order by ID;
Yes, SQL has a couple of options for adding a column that ranks the rows within each category by date.
If you just want to add a column to the select statement, I recommend the RANK() function. Note that RANK() gives rows with tied dates the same number, whereas ROW_NUMBER() always numbers rows consecutively.
See more details here:
https://learn.microsoft.com/en-us/sql/t-sql/functions/rank-transact-sql?view=sql-server-2017
For your current table, try the following select statement:
SELECT
[ID],
[Category],
[Name],
[DateForName],
RANK() OVER (PARTITION BY [Category] ORDER BY [DateForName] DESC) AS [CategoryOrder]
FROM [TableName]
Alternatively, if you want to add a permanent column (a field) to the existing table, consider treating it as a computed column. See more information here:
https://learn.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
Because the new column would be derived entirely from two existing columns, SQL Server could in principle maintain it for you. Be aware, though, that a computed column expression cannot contain a window function such as RANK() directly, so a view over the select statement above is usually the simpler way to keep the ranking current.
Hope this helps!
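The two answers only differ when dates tie within a category. The following sketch emulates both numbering schemes in plain Python on the question's data; the function names and the extra tied row (ID 7) are hypothetical and added purely to show the difference.

```python
from collections import defaultdict
from datetime import date

# (ID, Category, Name, DateForName) as in the question.
rows = [
    (1, "100", "111", date(2017, 12, 31)),
    (2, "101", "210", date(2017, 12, 30)),
    (3, "100", "112", date(2017, 12, 29)),
    (4, "101", "203", date(2017, 12, 27)),
    (5, "100", "117", date(2017, 12, 20)),
    (6, "103", "425", date(2017, 12, 8)),
]


def number_within_category(data):
    """Emulate ROW_NUMBER() OVER (PARTITION BY Category ORDER BY DateForName DESC)."""
    counters = defaultdict(int)
    result = {}
    # Walk the rows newest first and count up separately for each category.
    for row_id, category, _name, _day in sorted(data, key=lambda r: r[3], reverse=True):
        counters[category] += 1
        result[row_id] = counters[category]
    return result


def rank_within_category(data):
    """Emulate RANK() with the same window: 1 + number of strictly newer rows."""
    result = {}
    for row_id, category, _name, day in data:
        newer = sum(1 for r in data if r[1] == category and r[3] > day)
        result[row_id] = newer + 1
    return result


# Hypothetical extra row that ties with ID 6 on category and date.
tied = rows + [(7, "103", "426", date(2017, 12, 8))]

print(number_within_category(rows))
print(rank_within_category(tied))
```

On the original six rows both functions agree. With the tied row, RANK gives IDs 6 and 7 the same value, while ROW_NUMBER gives them 1 and 2 in an order that a real database does not guarantee.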

SQL Server How To Transpose Rows To Columns, without PIVOT or UNPIVOT or Aggregation [duplicate]

EDIT 1: Both solutions and the duplicate links work, but none of them retains the column order I want. All solutions sort the resulting column names alphabetically. If anyone has a solution to that, please post it in the comments.
EDIT 2: @taryn has posted a SQL Fiddle in the comments that also handles the column sort :)
I've seen countless answers for doing something like this, but none are what I am looking to achieve. Almost all involve doing an aggregate, or grouping. So before you rush to flag this as a DUPE, please read the question fully first.
All I'm looking to do is Transpose the rows into columns, with the column names of the original resultset becoming the row values for the 1st column of the new resultset.
Here's what my data looks like, and how I want to transform / transpose it.
I've color coded it so you can quickly and clearly understand this.
In excel, I would do this by selecting the 1st table, copying it, then right-clicking and pasting it as Paste Special and check the Transpose checkbox.
I've tried PIVOT and UNPIVOT and neither seems to give me what I want. I'm likely not using it correctly, but I've spent more time than I anticipated trying to figure this out.
I've created a SQL Fiddle with the source table, sample data, and what I expect here, so you have something to start with => http://www.sqlfiddle.com/#!18/56afd/10
Here's also the code pasted inline.
IF OBJECT_ID ('dbo.Players') IS NOT NULL
DROP TABLE dbo.Players;
CREATE TABLE dbo.Players
(
PlayerID INT
, Win INT
, Defeat INT
, StandOff INT
, CONSTRAINT PK_Players PRIMARY KEY CLUSTERED (PlayerID) ON [PRIMARY]
);
INSERT INTO dbo.Players (PlayerID, Win, Defeat, StandOff)
VALUES
(1, 7, 6, 9),
(2, 12, 5, 0),
(3, 3, 11, 1);
And here's the expected output
SELECT * FROM dbo.Players;
-- Need to Transpose above results, into the following.
-- -------------------------------------------------------------------
-- | Stat_Type | Player_1 | Player_2 | Player_3 |
-- -------------------------------------------------------------------
-- | Win | 7 | 12 | 3 |
-- -------------------------------------------------------------------
-- | Defeat | 6 | 5 | 11 |
-- -------------------------------------------------------------------
-- | StandOff | 9 | 0 | 1 |
-- -------------------------------------------------------------------
-- Column Names become Row values for 1st column
-- PlayerId becomes column names
Using UNPIVOT/PIVOT:
WITH unpiv AS (
SELECT PlayerId, col, value
FROM dbo.Players
UNPIVOT (VALUE FOR col IN (Win, Defeat, StandOff)) unpiv
)
SELECT col AS Stat_Type, [1] AS Player_1, [2] AS Player_2, [3] AS Player_3
FROM unpiv
PIVOT (MAX(value) FOR PlayerId IN ([1], [2], [3])) piv
ORDER BY CASE col WHEN 'Win' THEN 1 WHEN 'Defeat' THEN 2 ELSE 3 END;
DBFiddle Demo
In SQL Server, I would do:
select v.outcome,
max(case when v.playerId = 1 then val end) as playerId_1,
max(case when v.playerId = 2 then val end) as playerId_2,
max(case when v.playerId = 3 then val end) as playerId_3
from players cross apply
(values (playerId, win, 'win', 1),
(playerId, defeat, 'defeat', 2),
(playerId, standoff, 'standoff', 3)
) v(playerId, val, outcome, ordering)
group by v.outcome
order by max(ordering);
Here is the SQL Fiddle.
You should be able to do the same thing with pivot/unpivot. If you don't know the full list of players or outcomes, then you will need dynamic SQL.
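When the list of players is not known in advance, the query text has to be built from the data. Here is a minimal sketch of that dynamic-SQL idea, assuming SQLite (via Python's sqlite3) instead of SQL Server, with UNION ALL standing in for UNPIVOT / CROSS APPLY; build_transpose_sql is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Players (PlayerID INTEGER PRIMARY KEY, "
    "Win INTEGER, Defeat INTEGER, StandOff INTEGER)"
)
conn.executemany(
    "INSERT INTO Players VALUES (?, ?, ?, ?)",
    [(1, 7, 6, 9), (2, 12, 5, 0), (3, 3, 11, 1)],
)

# Listed in the desired output order, so the result is not sorted alphabetically.
stat_columns = ["Win", "Defeat", "StandOff"]


def build_transpose_sql(connection, stats):
    """Build the transposing query from the player ids currently in the table."""
    player_ids = [pid for (pid,) in connection.execute(
        "SELECT PlayerID FROM Players ORDER BY PlayerID")]
    # Unpivot: one SELECT per stat column, tagged with its position for sorting.
    unpivot = " UNION ALL ".join(
        "SELECT PlayerID, '{0}' AS Stat_Type, {0} AS val, {1} AS ordering "
        "FROM Players".format(col, pos)
        for pos, col in enumerate(stats, start=1)
    )
    # Pivot: one conditional aggregate per player.
    pivot_cols = ", ".join(
        "MAX(CASE WHEN PlayerID = {0} THEN val END) AS Player_{0}".format(pid)
        for pid in player_ids
    )
    return ("SELECT Stat_Type, " + pivot_cols + " FROM (" + unpivot + ") AS u "
            "GROUP BY Stat_Type ORDER BY MAX(ordering)")


cursor = conn.execute(build_transpose_sql(conn, stat_columns))
header = [d[0] for d in cursor.description]
transposed = cursor.fetchall()

print(header)
for row in transposed:
    print(row)
```

The ordering column plays the same role as in the CROSS APPLY answer: it carries the original column position through the aggregation so the rows come back as Win, Defeat, StandOff.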
UPDATE: Here's a SQL Fiddle with a custom sort order. Credit: @Taryn

Alternative for GROUP BY and STUFF in SQL

I am writing some SQL queries in AWS Athena. I have three tables: search, retrieval and intent. The search table has two columns, id and term:
id term
1 abc
1 bcd
2 def
1 ghd
What I want is to write a query to get:
id term
1 abc, bcd, ghd
2 def
I know this can be done with STUFF and FOR XML PATH, but Athena does not yet support all SQL features. Is there another way to achieve this? My current query is:
select search.id , STUFF(
(select ',' + search.term
from search
FOR XML PATH('')),1,1,'')
FROM search
group by search.id
Also, I have one more question. My retrieval table consists of three columns:
id time term
1 0 abc
1 20 bcd
1 100 gfh
2 40 hfg
2 60 lkf
What I want is:
id time term
1 100 gfh
2 60 lkf
I want to write a query to get the id and term on the basis of max value of time. Here is my current query:
select retrieval.id, max(retrieval.time), retrieval.term
from search
group by retrieval.id, retrieval.term
order by max(retrieval.time)
I am getting duplicate ids along with the term. I think this is because I am grouping by both id and term, but I am not sure how to achieve it without using group by.
The FOR XML PATH trick is a workaround specific to SQL Server; there is no reason to attempt it in any other database.
One method uses arrays:
select s.id, array_agg(s.term)
from search s
group by s.id;
Because the database supports arrays, you should learn to use them. You can convert the array to a string:
select s.id, array_join(array_agg(s.term), ',') as terms
from search s
group by s.id;
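To experiment with this kind of per-group string aggregation without an Athena account, a rough local equivalent can be run with Python's sqlite3. This is an assumption-laden stand-in: SQLite has no array_agg or array_join, so its group_concat function is used instead, and it does not guarantee the order of the joined terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search (id INTEGER, term TEXT)")
conn.executemany(
    "INSERT INTO search VALUES (?, ?)",
    [(1, "abc"), (1, "bcd"), (2, "def"), (1, "ghd")],
)

# group_concat(term, ', ') plays the role of array_join(array_agg(term), ', ').
terms_by_id = dict(
    conn.execute("SELECT id, group_concat(term, ', ') FROM search GROUP BY id")
)

for search_id, terms in sorted(terms_by_id.items()):
    print(search_id, terms)
```

If the order of terms matters, it has to be requested explicitly in whichever engine is used; neither this sketch nor a plain array_agg promises one.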
GROUP BY is a grouping operation: think of it as clubbing rows together and then computing min, max, count and so on for each group.
I am answering only question 2; the same way of thinking should help with question 1.
For question 2:
select search_2.id, search_2.time, search_2.term
from (select id, max(time) as time
from retrieval
group by id
) search_1, retrieval as search_2
where search_1.id = search_2.id
and search_1.time = search_2.time
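The join-back idea for question 2 (find the maximum time per id, then fetch the row that carries it) can be checked on the sample data. A minimal sketch, assuming SQLite through Python's sqlite3 rather than Athena, and using an explicit JOIN; the alias names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retrieval (id INTEGER, time INTEGER, term TEXT)")
conn.executemany(
    "INSERT INTO retrieval VALUES (?, ?, ?)",
    [(1, 0, "abc"), (1, 20, "bcd"), (1, 100, "gfh"), (2, 40, "hfg"), (2, 60, "lkf")],
)

query = """
SELECT r.id, r.time, r.term
FROM retrieval AS r
JOIN (
    -- One row per id, holding that id's largest time.
    SELECT id, MAX(time) AS max_time
    FROM retrieval
    GROUP BY id
) AS latest
  ON latest.id = r.id AND latest.max_time = r.time
ORDER BY r.id
"""

latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

Grouping only by id is what removes the duplicates; if two rows of the same id share the maximum time, both are returned.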

How to scale Pivoting in BigQuery?

Let's say, I have music video play stats table mydataset.stats for a given day (3B rows, 1M users, 6K artists).
Simplified schema is:
UserGUID String, ArtistGUID String
I need to pivot/transpose artists from rows to columns, so the schema will be:
UserGUID String, Artist1 Int, Artist2 Int, … Artist8000 Int
where each Artist column holds that artist's play count for the respective user.
An approach was suggested in How to transpose rows to columns with large amount of the data in BigQuery/SQL? and How to create dummy variable columns for thousands of categories in Google BigQuery?, but it looks like it doesn't scale to the numbers in my example.
Can this approach be scaled for my example?
I tried the approach below for up to 6000 features and it worked as expected. I believe it will work up to 10K features, which is the hard limit on the number of columns in a table.
STEP 1 - Aggregate plays by user / artist
SELECT userGUID as uid, artistGUID as aid, COUNT(1) as plays
FROM [mydataset.stats] GROUP BY 1, 2
STEP 2 – Normalize uid and aid so they are consecutive numbers 1, 2, 3, …
We need this for at least two reasons: a) to make the dynamically generated SQL as compact as possible, and b) to get more usable/friendly column names.
Combined with the first step, it becomes:
SELECT u.uid AS uid, a.aid AS aid, plays
FROM (
SELECT userGUID, artistGUID, COUNT(1) AS plays
FROM [mydataset.stats]
GROUP BY 1, 2
) AS s
JOIN (
SELECT userGUID, ROW_NUMBER() OVER() AS uid FROM [mydataset.stats] GROUP BY 1
) AS u ON u.userGUID = s.userGUID
JOIN (
SELECT artistGUID, ROW_NUMBER() OVER() AS aid FROM [mydataset.stats] GROUP BY 1
) AS a ON a.artistGUID = s.artistGUID
Write the output to the table mydataset.aggs.
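Steps 1 and 2 amount to counting plays per (user, artist) pair and replacing each GUID with a small consecutive integer. The sketch below shows that on a tiny made-up play log in plain Python. The sample GUIDs and the function name are hypothetical, and numbering GUIDs in sorted order is an assumption made only to keep the output reproducible (ROW_NUMBER() OVER() in the query assigns numbers in no particular order).

```python
from collections import Counter

# Hypothetical play log: one (UserGUID, ArtistGUID) pair per play.
play_log = [
    ("user-b", "artist-x"),
    ("user-a", "artist-x"),
    ("user-b", "artist-x"),
    ("user-a", "artist-y"),
]


def aggregate_and_normalize(log):
    """Count plays per pair, then map each GUID to a consecutive integer."""
    plays = Counter(log)  # STEP 1: COUNT(1) ... GROUP BY userGUID, artistGUID
    # STEP 2: dense numbering 1, 2, 3, ... for users and for artists.
    uid_of = {guid: n for n, guid in enumerate(sorted({u for u, _ in log}), start=1)}
    aid_of = {guid: n for n, guid in enumerate(sorted({a for _, a in log}), start=1)}
    aggs = sorted((uid_of[u], aid_of[a], count) for (u, a), count in plays.items())
    return aggs, uid_of, aid_of


aggs, uid_of, aid_of = aggregate_and_normalize(play_log)
print(aggs)
```

The short integer ids are what later allow column names such as a1, a2, ... and keep the generated query text small.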
STEP 3 – Use the approach already suggested in the questions mentioned above, for N features (artists) at a time.
In my particular example, by experimenting, I found that the basic approach works well for between 2000 and 3000 features.
To be on the safe side, I decided to use 2000 features at a time.
The script below dynamically generates a query, which is then run to create the partitioned tables:
SELECT 'SELECT uid,' +
GROUP_CONCAT_UNQUOTED(
'SUM(IF(aid=' + STRING(aid) + ',plays,NULL)) as a' + STRING(aid)
)
+ ' FROM [mydataset.aggs] GROUP EACH BY uid'
FROM (SELECT aid FROM [mydataset.aggs] GROUP BY aid HAVING aid > 0 and aid < 2001)
Above query produces yet another query like below:
SELECT uid,SUM(IF(aid=1,plays,NULL)) a1,SUM(IF(aid=3,plays,NULL)) a3,
SUM(IF(aid=2,plays,NULL)) a2,SUM(IF(aid=4,plays,NULL)) a4 . . .
FROM [mydataset.aggs] GROUP EACH BY uid
This should be run and its result written to mydataset.pivot_1_2000.
Executing STEP 3 two more times (adjusting HAVING aid > NNNN and aid < NNNN), we get two more tables: mydataset.pivot_2001_4000 and mydataset.pivot_4001_6000.
As you can see, mydataset.pivot_1_2000 has the expected schema, but only for features with aid from 1 to 2000; mydataset.pivot_2001_4000 has only features with aid from 2001 to 4000; and so on.
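The chunking in STEP 3 can also be scripted outside BigQuery. As a rough sketch, the two hypothetical helpers below compute the aid ranges and build the legacy-SQL text for one chunk; the query shape follows the generated query shown above, while the function names and default table name are assumptions.

```python
def chunk_bounds(max_aid, chunk_size):
    """Return inclusive (low, high) aid ranges that cover 1..max_aid."""
    return [
        (low, min(low + chunk_size - 1, max_aid))
        for low in range(1, max_aid + 1, chunk_size)
    ]


def build_chunk_query(aids, low, high, table="[mydataset.aggs]"):
    """Build the pivot query for the aids that fall inside one chunk."""
    columns = ",".join(
        "SUM(IF(aid={0},plays,NULL)) AS a{0}".format(aid)
        for aid in aids
        if low <= aid <= high
    )
    return "SELECT uid," + columns + " FROM " + table + " GROUP EACH BY uid"


bounds = chunk_bounds(6000, 2000)
print(bounds)
print(build_chunk_query([1, 2, 3], 1, 2))
```

Each (low, high) pair corresponds to one destination table, for example mydataset.pivot_1_2000 for the first range.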
STEP 4 – Merge all the partitioned pivot tables into a final pivot table with all features represented as columns in one table.
As in the steps above, we first generate a query and then run it.
Initially we “stitch” together mydataset.pivot_1_2000 and mydataset.pivot_2001_4000, and then stitch the result with mydataset.pivot_4001_6000.
SELECT 'SELECT x.uid uid,' +
GROUP_CONCAT_UNQUOTED(
'a' + STRING(aid)
)
+ ' FROM [mydataset.pivot_1_2000] AS x
JOIN EACH [mydataset.pivot_2001_4000] AS y ON y.uid = x.uid
'
FROM (SELECT aid FROM [mydataset.aggs] GROUP BY aid HAVING aid < 4001 ORDER BY aid)
Output string from above should be run and result written to mydataset.pivot_1_4000
Then we repeat STEP 4 like below
SELECT 'SELECT x.uid uid,' +
GROUP_CONCAT_UNQUOTED(
'a' + STRING(aid)
)
+ ' FROM [mydataset.pivot_1_4000] AS x
JOIN EACH [mydataset.pivot_4001_6000] AS y ON y.uid = x.uid
'
FROM (SELECT aid FROM [mydataset.aggs] GROUP BY aid HAVING aid < 6001 ORDER BY aid)
Result to be written to mydataset.pivot_1_6000
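The stitching queries in STEP 4 follow one pattern, so they can be generated by a small helper. The sketch below is a hypothetical Python equivalent of the generator queries above. Note that every chunk table was produced by grouping the whole of mydataset.aggs by uid, so each chunk contains every uid and the inner join does not drop users.

```python
def build_stitch_query(left_table, right_table, aids):
    """Build the legacy-SQL query that joins two pivot tables on uid."""
    columns = ",".join("a{0}".format(aid) for aid in sorted(aids))
    return (
        "SELECT x.uid uid," + columns
        + " FROM [" + left_table + "] AS x"
        + " JOIN EACH [" + right_table + "] AS y ON y.uid = x.uid"
    )


stitch_sql = build_stitch_query(
    "mydataset.pivot_1_2000", "mydataset.pivot_2001_4000", [2001, 1]
)
print(stitch_sql)
```

Calling it again with the merged table on the left and the next chunk on the right reproduces the repeated step.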
The resulting table has the following schema:
uid int, a1 int, a2 int, a3 int, . . . , a5999 int, a6000 int
NOTE:
a. I tried this approach only up to 6000 features, and it worked as expected.
b. Run time for the second (main) queries in steps 3 and 4 varied from 20 to 60 minutes.
c. IMPORTANT: the billing tier in steps 3 and 4 varied from 1 to 90. The good news is that the respective tables are relatively small (30-40MB), and so are the billed bytes. For “before 2016” projects everything is billed as tier 1, but after October 2016 this can be an issue.
For more information, see Timing in High-Compute queries
d. The above example shows the power of large-scale data transformation with BigQuery! Still, I think (but I may be wrong) that storing a materialized feature matrix is not the best idea.

Need to Transform a Rows of Data into a single Row

I have a set of data in MS Access:
Number Owner
1 Heelo
1 Hi
1 There
2 What
2 Up
This needs to be transformed into:
Number Owner1 Owner2 Owner3 Owner4
1 Heelo Hi There -
2 What Up - -
Any idea how to go about this?
The crux in this case is we don't have a third column from where we can pivot the data.
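You could add a third column with a sequence number within each Number group. The subquery needs a tiebreaker, otherwise it returns the group size for every row; here rows are numbered by Owner, which assumes owners are unique within a Number:
SELECT t.Number, (select count(*)
from YourTable as s
where s.Number = t.Number
and s.Owner <= t.Owner) as sequence, t.Owner
from YourTable as t
Then apply this solution to the results: SQL to transpose row pairs to columns in MS ACCESS database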
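The overall transformation (number the owners within each Number, then spread them across Owner1..OwnerN) can be illustrated in plain Python. This is only a sketch with a hypothetical function name; unlike a SQL table, the list has an inherent order, so the sequence here simply follows input order.

```python
from itertools import groupby

rows = [(1, "Heelo"), (1, "Hi"), (1, "There"), (2, "What"), (2, "Up")]


def pivot_owners(data, width):
    """Return one tuple per Number: (Number, Owner1, ..., Owner<width>)."""
    result = []
    # sorted() is stable, so owners keep their input order within each Number.
    ordered = sorted(data, key=lambda r: r[0])
    for number, group in groupby(ordered, key=lambda r: r[0]):
        owners = [owner for _, owner in group]
        # Pad with "-" (or truncate) so that every row has exactly `width` owners.
        padded = (owners + ["-"] * width)[:width]
        result.append((number, *padded))
    return result


pivoted = pivot_owners(rows, 4)
for row in pivoted:
    print(row)
```

The position of an owner inside its group is exactly the sequence column that the SQL approach has to compute first.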