Transpose in Google BigQuery/Excel - sql

I have a question about transposing data in BQ (or exporting it and doing the transpose in Excel). I am trying to get this result, where Variant1 and Variant2 are two separate columns:
ClientID Date Variant1 Variant2
AB 12/2 123 456
My current query gives this output:
ClientID Date Variant
AB 12/1 123
AB 12/2 456
SELECT DISTINCT
case when (hits.ecommerceAction.action_type = '3') then date end date,
clientId AS client_id,
page.pagepath as pagepath,
product.productVariant as variant
FROM
`xxxx.ga_sessions_`,
UNNEST(hits) AS hits, unnest(hits.product) as product
Is there any way to achieve the transpose step? My current output is more like master data: all the product-related information sits in one column. I would appreciate any thoughts you can share!

Consider the approach below:
select * from (
select ClientID, Variant, Pagepath,
max(Date) over win Date,
row_number() over (win order by Date) pos
from your_current_output
window win as (partition by ClientID)
)
pivot (any_value(Variant) as Variant, any_value(Pagepath) as Pagepath for pos in (1,2,3))
Applied to the sample data in your question:
with your_current_output as (
select '12/1' Date, 123 ClientID, 'abc' Variant, 'fis.com' Pagepath union all
select '12/2', 123, 'efg', 'fere.com'
)
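the output is a single row per ClientID, holding the latest Date plus one Variant/Pagepath column pair for each position listed in the pivot (1, 2, 3).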
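To check the reshaping locally without BigQuery, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. SQLite has no PIVOT operator, so this swaps in conditional aggregation (MAX(CASE WHEN pos = n ...)) as a stand-in; the column names Variant_1, Pagepath_1 and so on are my own naming for the illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_current_output (Date TEXT, ClientID INTEGER, Variant TEXT, Pagepath TEXT);
INSERT INTO your_current_output VALUES
  ('12/1', 123, 'abc', 'fis.com'),
  ('12/2', 123, 'efg', 'fere.com');
""")

# Step 1: number each client's rows by date and carry the latest date along.
# Step 2: collapse to one row per client, picking values by position.
# Note: the dates are plain text here, so they compare lexicographically.
PIVOT_SQL = """
WITH numbered AS (
  SELECT ClientID, Variant, Pagepath,
         MAX(Date) OVER (PARTITION BY ClientID) AS LastDate,
         ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY Date) AS pos
  FROM your_current_output
)
SELECT ClientID, LastDate,
       MAX(CASE WHEN pos = 1 THEN Variant END)  AS Variant_1,
       MAX(CASE WHEN pos = 1 THEN Pagepath END) AS Pagepath_1,
       MAX(CASE WHEN pos = 2 THEN Variant END)  AS Variant_2,
       MAX(CASE WHEN pos = 2 THEN Pagepath END) AS Pagepath_2
FROM numbered
GROUP BY ClientID, LastDate
"""

rows = conn.execute(PIVOT_SQL).fetchall()
print(rows)  # [(123, '12/2', 'abc', 'fis.com', 'efg', 'fere.com')]
```

As with the pivot, the number of positions has to be fixed up front; a third variant would need another pair of CASE expressions.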

Related

Window Function - date ranges

I'm trying to calculate the duration between different statuses, which works for the most part.
I have this table
Table
For id = 102, I was able to calculate the duration of each status:
with ab as (
select id,
status,
max(updated_time) as end_time,
min(updated_time) as updated_time
from Table
group by id, status
)
select *,
lead(updated_time) over (partition by id order by updated_time) - updated_time as duration,
extract(epoch from duration) as duration_seconds
from ab
Output for id = 102
But for id = 101, the status moved from 'IN_PROGRESS' to 'BLOCKED' and back to 'IN_PROGRESS'.
Here I need the result below so that I can get the correct IN_PROGRESS duration:
Expected
One way to do this is to track every change of STATUS for a given ID, ordered by VERSION. The query below produces the desired output. I favored multiple steps that show each transformation over brevity. The UNIXTIME column can easily be converted to a human-readable timestamp with whatever function your database provides. The sample table definition and data file are shared below.
Query
WITH VW_STATUS_CHANGE AS
(
SELECT ID, STATUS, LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) LAG_STATUS, VERSION, UNIXTIME,
CASE WHEN LAG (STATUS) OVER (PARTITION BY ID ORDER BY VERSION) <> STATUS THEN 1 ELSE 0 END STATUS_CHANGE
FROM STACKOVERFLOWSQL
),
VW_CREATE_SYNTHETIC_PARTITION AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME,STATUS_CHANGE,
SUM(STATUS_CHANGE) OVER (ORDER BY ID, VERSION) AS ROWNUMBER
FROM VW_STATUS_CHANGE
) ,
VW_RESULTS_INTERMEDIATE AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION"
) "TIME_FIRST_VALUE",
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION" DESC
) "TIME_LAST_VALUE"
FROM VW_CREATE_SYNTHETIC_PARTITION
ORDER BY ID, VERSION
)
SELECT DISTINCT ID, STATUS, TIME_FIRST_VALUE, TIME_LAST_VALUE
FROM VW_RESULTS_INTERMEDIATE
ORDER BY TIME_FIRST_VALUE
AWS Athena table used, along with sample data:
CREATE EXTERNAL TABLE STACKOVERFLOWSQL (
ID INT,
STATUS STRING,
VERSION INT,
UNIXTIME INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ','
)
STORED AS TEXTFILE
LOCATION 's3://<S3BUCKETNAME>/'
TBLPROPERTIES ('skip.header.line.count' = '1');
Dataset Used:
ID,STATUS,VERSION,UNIXTIME
101,NOT_ASSIGNED,1,1668124141
101,IN_PROGRESS,2,1668124143
101,IN_PROGRESS,3,1668124146
101,IN_PROGRESS,4,1668124150
101,IN_PROGRESS,5,1668124155
101,BLOCKED,6,1668124161
101,BLOCKED,7,1668124168
101,IN_PROGRESS,8,1668124176
101,IN_PROGRESS,9,1668124185
101,IN_PROGRESS,10,1668124195
101,COMPLETED,11,1668124206
105,NOT_ASSIGNED,1,1668124207
105,IN_PROGRESS,2,1668124209
105,IN_PROGRESS,3,1668124212
105,IN_PROGRESS,4,1668124216
105,IN_PROGRESS,5,1668124221
105,IN_PROGRESS,6,1668124227
105,COMPLETED,7,1668124234
Result from the View
ID STATUS TIME_FIRST_VALUE TIME_LAST_VALUE
101 NOT_ASSIGNED 1668124141 1668124141
101 IN_PROGRESS 1668124143 1668124155
101 BLOCKED 1668124161 1668124168
101 IN_PROGRESS 1668124176 1668124195
101 COMPLETED 1668124206 1668124206
105 NOT_ASSIGNED 1668124207 1668124207
105 IN_PROGRESS 1668124209 1668124227
105 COMPLETED 1668124234 1668124234
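The same "islands" idea can be tried out without Athena. This is a minimal sketch in Python's sqlite3 module (SQLite 3.25 or newer for window functions), using the ID 101 rows from the dataset above. It is a slightly shorter variant rather than the exact query from the answer: the running sum is partitioned per ID and the first/last times come from MIN/MAX with GROUP BY instead of FIRST_VALUE plus DISTINCT. The table name status_log and the column names is_new and grp are made up for the example.

```python
import sqlite3

ROWS = [
    (101, "NOT_ASSIGNED", 1, 1668124141),
    (101, "IN_PROGRESS", 2, 1668124143),
    (101, "IN_PROGRESS", 3, 1668124146),
    (101, "IN_PROGRESS", 4, 1668124150),
    (101, "IN_PROGRESS", 5, 1668124155),
    (101, "BLOCKED", 6, 1668124161),
    (101, "BLOCKED", 7, 1668124168),
    (101, "IN_PROGRESS", 8, 1668124176),
    (101, "IN_PROGRESS", 9, 1668124185),
    (101, "IN_PROGRESS", 10, 1668124195),
    (101, "COMPLETED", 11, 1668124206),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (id INTEGER, status TEXT, version INTEGER, unixtime INTEGER)")
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?, ?)", ROWS)

ISLANDS_SQL = """
WITH flagged AS (
  -- is_new = 1 on the first row of an id and whenever the status differs from the previous row
  SELECT id, status, version, unixtime,
         CASE WHEN LAG(status) OVER (PARTITION BY id ORDER BY version) = status
              THEN 0 ELSE 1 END AS is_new
  FROM status_log
),
grouped AS (
  -- the running sum of the flags gives every consecutive run its own group number
  SELECT id, status, unixtime,
         SUM(is_new) OVER (PARTITION BY id ORDER BY version) AS grp
  FROM flagged
)
SELECT id, status, MIN(unixtime) AS time_first_value, MAX(unixtime) AS time_last_value
FROM grouped
GROUP BY id, grp, status
ORDER BY id, time_first_value
"""

result = conn.execute(ISLANDS_SQL).fetchall()
for row in result:
    print(row)
```

The two IN_PROGRESS runs for ID 101 come out as separate rows, so each run's duration is simply time_last_value minus time_first_value.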

Get ID from rows with repeated values on two columns to update them later

I have the following table:
ID, campaign, merchant, date.
I need to know which rows are repeated, meaning rows that have the same campaign for the same merchant. For example:
ID, campaign, merchant, date
1, "hello", 260, 01/01/21
2, "hello", 260, 01/01/21
I can do this with this:
select campaign, merchant, count(*)
from public.my_table
group by campaign, merchant
HAVING count(*) > 1
That works, but I haven't found a way to use this query to rename the repeated campaigns to "hello day/month/year hour:minute:seconds".
So what I want is to change the repeated campaigns (for the same merchant) like this:
ID, campaign, merchant, date
1, "hello 01/01/2021 02:22:22", 260
2, "hello 01/01/2021 02:22:32", 260
It's been hours with no luck yet. Thank you!
I think I found a solution minutes after writing the post:
UPDATE public.my_table
SET campaign = campaign || date
where id in (
SELECT id FROM
(SELECT *, count(*)
OVER
(PARTITION BY
campaign,
merchant
) AS count
FROM public.my_table) tableWithCount
WHERE tableWithCount.count > 1
)
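For anyone who wants to try the pattern before touching real data, here is a minimal sketch in Python's sqlite3 module (SQLite 3.25 or newer). The three sample rows, including the non-duplicated 'bye' campaign, are invented for the illustration, and unlike the statement above a space is added between the campaign and the date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, campaign TEXT, merchant INTEGER, date TEXT);
INSERT INTO my_table VALUES
  (1, 'hello', 260, '01/01/21'),
  (2, 'hello', 260, '01/01/21'),
  (3, 'bye',   260, '01/01/21');
""")

# Rename only rows whose (campaign, merchant) pair occurs more than once.
conn.execute("""
UPDATE my_table
SET campaign = campaign || ' ' || date
WHERE id IN (
  SELECT id FROM (
    SELECT id, COUNT(*) OVER (PARTITION BY campaign, merchant) AS cnt
    FROM my_table
  ) AS counted
  WHERE cnt > 1
)
""")

rows = conn.execute("SELECT id, campaign FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 'hello 01/01/21'), (2, 'hello 01/01/21'), (3, 'bye')]
```

Note that when the duplicated rows share the same date value, as in this sample, the renamed campaigns are still identical to each other; appending something unique per row (the id, or a timestamp with seconds) would be needed to tell them apart.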

SQL - Compare 2 different ranges of date

The table has these columns:
DATA, CODE and so on..
I need to display two different date ranges side by side, each with its code, like:
|data|code|data2|code|
My query is:
SELECT DATA,CODE
FROM people
WHERE DATA >= ${data1} AND DATA <= ${data2}
GROUP BY DATA
ORDER BY DATA
I tried running two queries with different variables, but both always return the same range of data.
So I did something like:
SELECT DATA,CODE
FROM people
WHERE DATA >= ${d1} AND DATA <= ${d2}
GROUP BY DATA
ORDER BY DATA
and tried to assign four different dates in order to get two periods. Say data1='01-01-2001' and data2='31-12-2001', while d1='01-01-2002' and d2='31-12-2002'.
When I assigned the dates, both queries returned only the last range.
So instead of getting |2001|code|2002|code| I got |2002|code|2002|code|
I need this for a comparison: every day of 2001 on the left, next to every day of 2002 on the right.
Using the bind variables :start1 and :end1 as the bounds for the first range and :start2 and :end2 as the bounds for the second range:
SELECT d1.data AS data1,
d1.code AS code1,
d2.data AS data2,
d2.code AS code2
FROM (
SELECT data,
code,
ROW_NUMBER() OVER ( ORDER BY data ) AS rn
FROM people
WHERE data BETWEEN :start1 AND :end1
) d1
FULL OUTER JOIN
(
SELECT data,
code,
ROW_NUMBER() OVER ( ORDER BY data ) AS rn
FROM people
WHERE data BETWEEN :start2 AND :end2
) d2
ON ( d1.rn = d2.rn )
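What the join on rn does is pair the n-th row of the first range with the n-th row of the second, padding with NULLs when one range is shorter. A minimal sketch of that pairing logic in plain Python, using itertools.zip_longest in place of the FULL OUTER JOIN; the sample rows and the helper names rows_in_range and pair_ranges are invented for the illustration.

```python
from datetime import date
from itertools import zip_longest

# Hypothetical (data, code) rows standing in for the people table.
people = [
    (date(2001, 1, 1), "A"),
    (date(2001, 1, 2), "B"),
    (date(2001, 1, 3), "C"),
    (date(2002, 1, 1), "X"),
    (date(2002, 1, 2), "Y"),
]

def rows_in_range(rows, start, end):
    """Rows whose date lies in [start, end], sorted by date (the ROW_NUMBER order)."""
    return sorted(r for r in rows if start <= r[0] <= end)

def pair_ranges(rows, start1, end1, start2, end2):
    """Pair the n-th row of each range; the shorter side is padded with None."""
    left = rows_in_range(rows, start1, end1)
    right = rows_in_range(rows, start2, end2)
    return [
        (l or (None, None)) + (r or (None, None))
        for l, r in zip_longest(left, right)
    ]

paired = pair_ranges(
    people,
    date(2001, 1, 1), date(2001, 12, 31),
    date(2002, 1, 1), date(2002, 12, 31),
)
for row in paired:
    print(row)
```

The third 2001 row has no partner in 2002, so its right-hand columns come back as None, which mirrors the NULLs a full outer join would produce.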

SQL Server select max date per ID

I am trying to select the record with the max date for each finance_charge_id of a given service_user_id, along with the amount linked to that latest date:
select distinct
s.Finance_Charge_ID, MAX(s.start_date), s.Amount
from
Service_User_Finance_Charges s
where
s.Service_User_ID = '156'
group by
s.Finance_Charge_ID, s.Amount
The issue is that I receive multiple entries where the amount is different. I only want the amount on the latest date for each finance_charge_id.
At the moment I receive the result below, which is incorrect (the third line should not appear, as the first line has a later date):
Finance_Charge_ID (No column name) Amount
2 2014-10-19 1.00
3 2014-10-16 500.00
2 2014-10-01 1000.00
Remove the Amount column from the group by to get the correct rows. You can then join that query onto the table again to get all the data you need. Here is an example using a CTE to get the max dates:
WITH MaxDates_CTE (Finance_Charge_ID, MaxDate) AS
(
select s.Finance_Charge_ID,
MAX(s.start_date) MaxDate
from Service_User_Finance_Charges s
where s.Service_User_ID = '156'
group by s.Finance_Charge_ID
)
SELECT *
FROM Service_User_Finance_Charges
JOIN MaxDates_CTE
ON MaxDates_CTE.Finance_Charge_ID = Service_User_Finance_Charges.Finance_Charge_ID
AND MaxDates_CTE.MaxDate = Service_User_Finance_Charges.start_date
This can be done using a window function which removes the need for a self join on the grouped data:
select Finance_Charge_ID,
start_date,
amount
from (
select s.Finance_Charge_ID,
s.start_date,
max(s.start_date) over (partition by s.Finance_Charge_ID) as max_date,
s.Amount
from Service_User_Finance_Charges s
where s.Service_User_ID = 156
) t
where start_date = max_date;
As the window function does not require you to use group by you can add any additional column you need in the output.
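A quick way to see the window-function version in action is to run it against the three rows from the question. This is a minimal sketch using Python's sqlite3 module rather than SQL Server (SQLite 3.25 or newer is assumed); storing start_date as ISO text is my simplification so that MAX compares the dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Service_User_Finance_Charges (
  Service_User_ID INTEGER, Finance_Charge_ID INTEGER, start_date TEXT, Amount REAL
);
INSERT INTO Service_User_Finance_Charges VALUES
  (156, 2, '2014-10-19', 1.00),
  (156, 3, '2014-10-16', 500.00),
  (156, 2, '2014-10-01', 1000.00);
""")

# Attach each charge's latest start_date to every row, then keep the rows that match it.
LATEST_SQL = """
SELECT Finance_Charge_ID, start_date, Amount
FROM (
  SELECT s.Finance_Charge_ID, s.start_date, s.Amount,
         MAX(s.start_date) OVER (PARTITION BY s.Finance_Charge_ID) AS max_date
  FROM Service_User_Finance_Charges s
  WHERE s.Service_User_ID = 156
) t
WHERE start_date = max_date
ORDER BY Finance_Charge_ID
"""

rows = conn.execute(LATEST_SQL).fetchall()
print(rows)  # [(2, '2014-10-19', 1.0), (3, '2014-10-16', 500.0)]
```

If two rows of the same charge share the latest date, both are returned; ROW_NUMBER() with ORDER BY start_date DESC is a common alternative when exactly one row per charge is required.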

SQL find min & max range within dataset

I have a table with the following columns:
contactId (int)
interval (int)
date (smalldate)
small sample data:
1,120,'12/02/2010'
1,121,'12/02/2010'
1,122,'12/02/2010'
1,123,'12/02/2010'
1,145,'12/02/2010'
1,146,'12/02/2010'
1,147,'12/02/2010'
2,122,'12/02/2010'
2,123,'12/02/2010'
2,124,'12/02/2010'
2,320,'12/02/2010'
2,321,'12/02/2010'
2,322,'12/02/2010'
2,450,'12/02/2010'
2,451,'12/02/2010'
Is it possible, and if so how, to get SQL to return the columns "contactId, minInterval, maxInterval, date", e.g.:
1,120,123,'12/02/2010'
1,145,147,'12/02/2010'
2,122,124,'12/02/2010'
2,320,322,'12/02/2010'
2,450,451,'12/02/2010'
Hopefully this makes sense. Basically, I'm looking to work out the min/max range of the intervals by provider and date, for runs where the interval increments by one. Once there is a break in the sequence (a step of more than one), a new min/max range begins.
Any help is greatly appreciated :)
Here is my exact SQL table setup:
create table availability
(
Id int,
ProviderId int,
IntervalId int,
Date date
)
sample data
providerid,intervalid,date
1128,108,2010-12-27
1128,109,2010-12-27
1128,110,2010-12-27
1128,111,2010-12-27
1128,112,2010-12-27
1128,113,2010-12-27
1128,114,2010-12-27
1128,120,2010-12-27
1128,121,2010-12-27
1128,122,2010-12-27
1128,123,2010-12-27
1128,124,2010-12-27
1128,125,2010-12-27
1213,108,2010-12-27
1213,109,2010-12-27
1213,110,2010-12-27
1213,111,2010-12-27
1213,112,2010-12-27
1213,113,2010-12-27
1213,114,2010-12-27
1213,115,2010-12-27
1213,232,2010-12-27
1213,233,2010-12-27
1213,234,2010-12-27
3954,198,2010-12-27
3954,199,2010-12-27
3954,200,2010-12-27
3954,201,2010-12-27
3954,202,2010-12-27
3954,203,2010-12-27
3954,204,2010-12-27
3954,205,2010-12-27
3954,206,2010-12-27
3954,207,2010-12-27
3954,208,2010-12-27
3954,209,2010-12-27
3954,210,2010-12-27
3954,211,2010-12-27
3954,212,2010-12-27
3954,213,2010-12-27
3954,214,2010-12-27
3954,215,2010-12-27
3954,216,2010-12-27
3954,217,2010-12-27
3954,218,2010-12-27
3954,229,2010-12-27
3954,230,2010-12-27
3954,231,2010-12-27
3954,232,2010-12-27
3954,233,2010-12-27
3954,234,2010-12-27
1128,108,2010-12-28
1128,109,2010-12-28
1128,110,2010-12-28
1128,111,2010-12-28
1128,112,2010-12-28
1128,113,2010-12-28
1128,114,2010-12-28
1128,115,2010-12-28
1128,116,2010-12-28
3954,186,2010-12-28
3954,187,2010-12-28
3954,188,2010-12-28
3954,189,2010-12-28
3954,190,2010-12-28
3954,213,2010-12-28
3954,214,2010-12-28
3954,215,2010-12-28
3954,216,2010-12-28
3954,217,2010-12-28
3954,218,2010-12-28
3954,219,2010-12-28
3954,220,2010-12-28
3954,221,2010-12-28
3954,222,2010-12-28
sample result using current sql within answers:
1062,180,180,2010-12-20
1062,179,179,2010-12-20
1062,178,178,2010-12-20
1062,177,177,2010-12-20
1062,176,176,2010-12-20
1062,175,175,2010-12-20
1062,174,174,2010-12-20
1062,173,173,2010-12-20
1062,172,172,2010-12-20
1062,171,171,2010-12-20
1062,170,170,2010-12-20
1062,169,169,2010-12-20
1062,168,168,2010-12-20
1062,167,167,2010-12-20
1062,166,166,2010-12-20
1062,165,165,2010-12-20
1062,164,164,2010-12-20
1062,163,163,2010-12-20
1062,162,162,2010-12-20
1062,161,161,2010-12-20
1062,160,160,2010-12-20
1062,159,159,2010-12-20
1062,158,158,2010-12-20
1062,157,157,2010-12-20
1062,156,156,2010-12-20
1062,155,155,2010-12-20
1062,154,154,2010-12-20
1062,153,153,2010-12-20
1062,152,152,2010-12-20
1062,151,151,2010-12-20
1062,150,150,2010-12-20
1062,149,149,2010-12-20
1062,148,148,2010-12-20
1062,147,147,2010-12-20
1062,146,146,2010-12-20
1062,145,145,2010-12-20
1062,144,144,2010-12-20
1062,143,143,2010-12-20
1062,142,142,2010-12-20
1062,141,141,2010-12-20
1062,140,140,2010-12-20
1062,139,139,2010-12-20
1062,138,138,2010-12-20
1062,137,137,2010-12-20
1062,136,136,2010-12-20
1062,135,135,2010-12-20
1062,134,134,2010-12-20
1062,133,133,2010-12-20
1062,132,132,2010-12-20
1062,131,131,2010-12-20
1062,130,130,2010-12-20
1062,129,129,2010-12-20
1062,128,128,2010-12-20
1062,127,127,2010-12-20
1062,126,126,2010-12-20
1062,125,125,2010-12-20
1062,124,124,2010-12-20
1062,123,123,2010-12-20
1062,122,122,2010-12-20
1062,121,121,2010-12-20
1062,120,120,2010-12-20
1062,119,119,2010-12-20
1062,118,118,2010-12-20
1062,117,117,2010-12-20
1062,116,116,2010-12-20
1062,115,115,2010-12-20
1062,114,114,2010-12-20
1062,113,113,2010-12-20
1062,112,112,2010-12-20
In SQL Server, Oracle and PostgreSQL:
WITH q AS
(
SELECT t.*, interval - ROW_NUMBER() OVER (PARTITION BY contactID, date ORDER BY interval) AS sr
FROM mytable t
)
SELECT contactID, date, MIN(interval), MAX(interval)
FROM q
GROUP BY
date, contactID, sr
ORDER BY
date, contactID, sr
Update:
With your test data I get this output:
WITH mytable (providerId, intervalId, date) AS
(
SELECT 1128,108,'2010-12-27' UNION ALL
SELECT 1128,109,'2010-12-27' UNION ALL
SELECT 1128,110,'2010-12-27' UNION ALL
SELECT 1128,111,'2010-12-27' UNION ALL
SELECT 1128,112,'2010-12-27' UNION ALL
SELECT 1128,113,'2010-12-27' UNION ALL
SELECT 1128,114,'2010-12-27' UNION ALL
SELECT 1128,120,'2010-12-27' UNION ALL
SELECT 1128,121,'2010-12-27' UNION ALL
SELECT 1128,122,'2010-12-27' UNION ALL
SELECT 1128,123,'2010-12-27' UNION ALL
SELECT 1128,124,'2010-12-27' UNION ALL
SELECT 1128,125,'2010-12-27'
),
q AS
(
SELECT t.*, intervalId - ROW_NUMBER() OVER (PARTITION BY providerId, date ORDER BY intervalId) AS sr
FROM mytable t
)
SELECT providerId, date, MIN(intervalId), MAX(intervalId)
FROM q
GROUP BY
date, providerId, sr
ORDER BY
date, providerId, sr
1128 2010-12-27 108 114
1128 2010-12-27 120 125
which is exactly what you were after.
Are you sure you are using the query correctly? Do you have duplicates on (providerId, intervalId, date)?
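The trick is that within a run of consecutive intervals, interval minus ROW_NUMBER() stays constant, so that difference (sr) can serve as a group key. Here is a minimal sketch that checks this against the small contactId sample from the question, using Python's sqlite3 module (SQLite 3.25 or newer); the column names intervalId and day are my renaming to avoid the reserved-looking names interval and date.

```python
import sqlite3

# (contactId, intervalId) pairs from the question's small sample; all share one date.
SAMPLE = [
    (1, 120), (1, 121), (1, 122), (1, 123), (1, 145), (1, 146), (1, 147),
    (2, 122), (2, 123), (2, 124), (2, 320), (2, 321), (2, 322), (2, 450), (2, 451),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (contactId INTEGER, intervalId INTEGER, day TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, '12/02/2010')", SAMPLE)

RANGES_SQL = """
WITH q AS (
  -- sr is constant inside each run of consecutive intervalId values
  SELECT contactId, day, intervalId,
         intervalId - ROW_NUMBER() OVER (PARTITION BY contactId, day ORDER BY intervalId) AS sr
  FROM mytable
)
SELECT contactId, MIN(intervalId) AS minInterval, MAX(intervalId) AS maxInterval, day
FROM q
GROUP BY day, contactId, sr
ORDER BY day, contactId, sr
"""

ranges = conn.execute(RANGES_SQL).fetchall()
for row in ranges:
    print(row)
```

Duplicate (contactId, intervalId, day) rows would break the constant-difference property, since ROW_NUMBER() keeps advancing while the interval does not; that would produce single-value ranges like the ones shown in the question's sample result.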
It's probably possible to do this with a SQL query alone, but it will likely be a bit mind-boggling: basically a subquery to find the places where the interval increments by one, joined to the original dataset, with a lot of other logic on top. That's my impression, at least.
If I were you:
If this is a one-time deal, don't worry about performance; just iterate over the rows and do the calculation 'manually'.
If this is a production dataset and you need to run this operation frequently, automatically, or in a performance-sensitive setting, then rearrange the original dataset to make this kind of query easier.
Hope one of those options is available to you.