Turn columns into values in BigQuery

I have the following data:
date_ country category_A category_B
2022-12-11 USA 100 200
2022-12-11 Canada 2000 400
which is generated by this query:
select
date('2022-12-11') as date_, 'USA' as country, 100 as category_A, 200 as category_B,
union all select
date('2022-12-11') as date_, 'Canada' as country, 2000 as category_A, 400 as category_B
What I would like to do is turn only the last two columns into rows, with one column holding the original column name and a single column holding the values, like this table:
date_ country category value
2022-12-11 USA category_A 100
2022-12-11 USA category_B 200
2022-12-11 Canada category_A 2000
2022-12-11 Canada category_B 400

Maybe it's not perfect, but you can use the following query:
with countries AS
(select
date('2022-12-11') as date_, 'USA' as country, 100 as category_A, 200 as category_B,
union all select
date('2022-12-11') as date_, 'Canada' as country, 2000 as category_A, 400 as category_B
)
select
date_,
country,
category.category_name as category,
category.category_value as value
from countries,
unnest(array[struct('category_A' AS category_name, category_A AS category_value), struct('category_B' AS category_name, category_B AS category_value)]) as category;
In the UNNEST, you need to list every category column that should become a row.
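For comparison, here is a minimal portable sketch of the same reshaping without arrays or structs: one SELECT per category column, stacked with UNION ALL. It assumes the date can stay a plain string so the sketch also runs outside BigQuery; in BigQuery you could keep date('2022-12-11').

```sql
-- Sample data from the question; the date is a plain string here (assumption,
-- for portability). In BigQuery, date('2022-12-11') works as well.
WITH countries AS (
  SELECT '2022-12-11' AS date_, 'USA' AS country, 100 AS category_A, 200 AS category_B
  UNION ALL
  SELECT '2022-12-11', 'Canada', 2000, 400
)
-- One SELECT per column being turned into rows; UNION ALL stacks them.
SELECT date_, country, 'category_A' AS category, category_A AS value FROM countries
UNION ALL
SELECT date_, country, 'category_B' AS category, category_B AS value FROM countries
-- Sort to match the expected table: USA first, then Canada.
ORDER BY country DESC, category;
```

BigQuery also has a built-in UNPIVOT operator for exactly this, e.g. `SELECT * FROM countries UNPIVOT(value FOR category IN (category_A, category_B))`, which produces the same date_, country, category and value columns.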

Related

Leave only the MAX value per group

I have a table like this:
TransactionID … Cost MaxCostPerGroup
1234 ... 1550 1550
2342 ... 1950 2000
2342 ... 2000 2000
4444 ... 600 600
4444 ... 400 600
4444 ... 500 600
TransactionID – not unique
… – a lot of columns (30+)
Cost – can differ between rows with the same TransactionID
The MaxCostPerGroup column shows the max value for each TransactionID.
To keep working with the data, I need to bring the table into the following form:
TransactionID … Cost MaxCostPerGroup
1234 ... 1550 1550
2342 ... 1950 null
2342 ... 2000 2000
4444 ... 600 600
4444 ... 400 null
4444 ... 500 null
Then I want to sum MaxCostPerGroup by date (for example). The problem is that I must keep every row, so I cannot just use GROUP BY. The ‘…’ columns contain a lot of unique information, which is why I want to leave only one value per TransactionID in the last column. How can I do this with SQL?
Many thanks.
Using your data, I got the max value by partitioning by TransactionId and then used an IF expression to keep it only on rows whose Cost equals that max, returning NULL on the others.
See query below:
WITH sample_data as(
select '1234' as TransactionID, 1550 as Cost, 1550 as MaxCostPerGroup,
union all select '2342' as TransactionID, 1950 as Cost, 2000 as MaxCostPerGroup,
union all select '2342' as TransactionID, 2000 as Cost, 2000 as MaxCostPerGroup,
union all select '4444' as TransactionID, 600 as Cost, 600 as MaxCostPerGroup,
union all select '4444' as TransactionID, 400 as Cost, 600 as MaxCostPerGroup,
union all select '4444' as TransactionID, 500 as Cost, 600 as MaxCostPerGroup
),
get_max as (
select TransactionId,
Cost,
max(MaxCostPerGroup) OVER (PARTITION BY TransactionId) as max_per_id
from sample_data
),
add_null as (
select TransactionId,
Cost,
max_per_id,
if (Cost = max_per_id, max_per_id, NULL) as MaxCostPerGroup
from get_max
)
select TransactionId,Cost,MaxCostPerGroup from add_null
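The IF approach keeps the max on every row whose Cost equals it, so if two rows of the same TransactionID tie for the highest Cost, both keep the value and a later SUM counts it twice. A minimal sketch, assuming you want exactly one non-NULL value per TransactionID even with ties, replaces the comparison with the standard ROW_NUMBER() window function (sample rows from the question, with the "…" columns left out):

```sql
-- Sample rows from the question (the 30+ "…" columns are omitted).
WITH sample_data AS (
  SELECT '1234' AS TransactionID, 1550 AS Cost, 1550 AS MaxCostPerGroup
  UNION ALL SELECT '2342', 1950, 2000
  UNION ALL SELECT '2342', 2000, 2000
  UNION ALL SELECT '4444', 600, 600
  UNION ALL SELECT '4444', 400, 600
  UNION ALL SELECT '4444', 500, 600
),
ranked AS (
  SELECT TransactionID,
         Cost,
         MaxCostPerGroup,
         -- rn = 1 marks exactly one highest-Cost row per TransactionID;
         -- if two rows tie on Cost, which one gets rn = 1 is arbitrary.
         ROW_NUMBER() OVER (PARTITION BY TransactionID ORDER BY Cost DESC) AS rn
  FROM sample_data
)
SELECT TransactionID,
       Cost,
       -- CASE without ELSE yields NULL on every other row of the group.
       CASE WHEN rn = 1 THEN MaxCostPerGroup END AS MaxCostPerGroup
FROM ranked
ORDER BY TransactionID, rn;
```

Summing MaxCostPerGroup over this output counts each TransactionID once (1550 + 2000 + 600 = 4150 for these rows). Add a tiebreaker column to the ORDER BY inside OVER if the choice among tied rows must be deterministic.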

Splitting column values into groups and summing up values from a linked table

I have two tables in SQL, linked by customer_id.
customers
customer_id; account_created; company_name; city;
1 11/10/2011 abc new york
2 1/1/2018 xyz los angeles
3 11/10/2012 finance new jersey
4 21/04/2013 juices san francisco
orders
order_id; customer_id; order_date; shipping date; order_value; currency;
100 1 19/10/2019 20/10/2019 4000 USD
101 3 1/10/2019 2/10/2019 300 USD
102 2 13/11/2019 15/11/2019 7000 USD
103 4 12/9/2019 20/9/2019 100 USD
104 1 10/11/2019 12/11/2019 3000 USD
I would like to divide the orders into two regions, East (New York, Boston and New Jersey) and West (Los Angeles and San Francisco), and then show the sum of order_value for each region, like this:
Region sum of order_value
East 10000
West 20000
It seems really weird to call "New Jersey" a city. In any case, you want a case expression of some sort to assign the region, and then aggregation:
-- lower() makes the match work even though the sample data stores cities in lower case
select (case when lower(c.city) in ('new york', 'boston', 'new jersey') then 'East'
when lower(c.city) in ('los angeles', 'san francisco') then 'West'
else '???'
end) as region,
sum(o.order_value) as sum_order_value
from customers c join
orders o
on o.customer_id = c.customer_id
group by (case when lower(c.city) in ('new york', 'boston', 'new jersey') then 'East'
when lower(c.city) in ('los angeles', 'san francisco') then 'West'
else '???'
end)
Alternatively, add a third table with the fields City | Region.
Then join the three tables, group by Region, and sum order_value.
That way the city-to-region mapping lives in the data, and no CASE logic is needed in the query.
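A minimal sketch of the mapping-table approach. The table name city_region is hypothetical, the setup statements rebuild the question's sample rows (dates and shipping columns omitted) in SQLite-flavored syntax, and cities are stored in lower case to match the sample customers data:

```sql
-- Hypothetical mapping table: one row per city.
CREATE TABLE city_region (city TEXT PRIMARY KEY, region TEXT NOT NULL);
INSERT INTO city_region (city, region) VALUES
  ('new york', 'East'), ('boston', 'East'), ('new jersey', 'East'),
  ('los angeles', 'West'), ('san francisco', 'West');

-- Sample data from the question (only the columns needed here).
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, company_name TEXT, city TEXT);
INSERT INTO customers VALUES
  (1, 'abc', 'new york'), (2, 'xyz', 'los angeles'),
  (3, 'finance', 'new jersey'), (4, 'juices', 'san francisco');

CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_value INTEGER, currency TEXT);
INSERT INTO orders VALUES
  (100, 1, 4000, 'USD'), (101, 3, 300, 'USD'), (102, 2, 7000, 'USD'),
  (103, 4, 100, 'USD'), (104, 1, 3000, 'USD');

-- Join orders -> customers -> city_region, then aggregate per region.
SELECT r.region, SUM(o.order_value) AS sum_order_value
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN city_region r ON r.city = c.city
GROUP BY r.region
ORDER BY r.region;
```

With these rows the query returns East 7300 and West 7100. Adding a new city then only means inserting a row into city_region, not editing the query.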

How does GROUP BY work with multiple columns, like GROUP BY column_name1, column_name2, column_name3?

When I tried it, duplicate values were not removed. Why?
For example: GROUP BY a versus GROUP BY a, b, c.
Is there a difference between GROUP BY a and GROUP BY a, b, c?
I wrote a SQL query like this:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
Result:
Table: Customers
COUNT(CustomerID) Country
---------------------------------
3 Argentina
2 Austria
2 Belgium
9 Brazil
3 Canada
2 Denmark
2 Finland
Then I changed it to:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country, CustomerID;
Table: Customers
COUNT(CustomerID) Country
---------------------------------
1 Germany
1 Mexico
1 Mexico
1 UK
1 Sweden
1 Germany
1 France
Why doesn't the changed query group identical Country values together?
It displays every value of the column.
I wonder how this works. Thank you.
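A minimal sketch of why this happens, using a hypothetical three-row Customers table. GROUP BY forms one group per distinct combination of all the listed columns, so GROUP BY Country gives one row per country, while GROUP BY Country, CustomerID gives one row per (Country, CustomerID) pair. Because CustomerID is unique, every such group holds exactly one customer, so each count is 1 and a country appears once per customer:

```sql
-- Hypothetical miniature Customers table.
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
INSERT INTO Customers VALUES (1, 'Mexico'), (2, 'Mexico'), (3, 'Germany');

-- One group per distinct Country.
SELECT COUNT(CustomerID) AS n, Country
FROM Customers
GROUP BY Country
ORDER BY Country;

-- One group per distinct (Country, CustomerID) pair; CustomerID is unique,
-- so every group contains exactly one row.
SELECT COUNT(CustomerID) AS n, Country
FROM Customers
GROUP BY Country, CustomerID
ORDER BY Country, CustomerID;
```

The first query returns Germany 1 and Mexico 2; the second returns three rows, each with a count of 1, and Mexico appears twice.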

Ordering Query for Country & City data for the scenario given

My input data is below:
Country city
Australia Sydney
Australia melbourne
India Delhi
India Chennai
India Bangalore
Afghanistan Kabul
The expected output is:
Afghanistan
Kabul
Australia
melbourne
Sydney
India
Bangalore
Chennai
Delhi
The data in both columns should be sorted alphabetically (at both the country and the city level), and the result should be a single column with the values above. The countries should be in alphabetical order, with each country's cities listed below it, also in alphabetical order.
How can this be done without using an intermediate table in a single query?
You need a UNION ALL query to get one row per country and one row per city in your result:
select coalesce(city, country) as location
from
(
select distinct country, null as city from mytable
union all
select country, city from mytable
)
order by country, city nulls first;
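Note that the expected output lists melbourne before Sydney, which is a case-insensitive order; the UNPIVOT answer's results put Sydney first because uppercase letters sort before lowercase ones under a binary comparison. A minimal sketch of the UNION ALL approach that sorts case-insensitively with LOWER() and uses a CASE expression instead of NULLS FIRST to put each country before its cities. The sample-data CTE is written without FROM DUAL, so as shown it runs on engines such as PostgreSQL or SQLite; in Oracle, add FROM DUAL to each SELECT in it.

```sql
-- Sample data from the question.
WITH mytable (country, city) AS (
  SELECT 'Australia', 'Sydney' UNION ALL
  SELECT 'Australia', 'melbourne' UNION ALL
  SELECT 'India', 'Delhi' UNION ALL
  SELECT 'India', 'Chennai' UNION ALL
  SELECT 'India', 'Bangalore' UNION ALL
  SELECT 'Afghanistan', 'Kabul'
),
locations AS (
  -- One header row per country (city is NULL) ...
  SELECT DISTINCT country, NULL AS city FROM mytable
  UNION ALL
  -- ... plus one row per city.
  SELECT country, city FROM mytable
)
SELECT COALESCE(city, country) AS location
FROM locations
ORDER BY LOWER(country),
         CASE WHEN city IS NULL THEN 0 ELSE 1 END,  -- country header first
         LOWER(city);                               -- then cities, ignoring case
```

This returns Afghanistan, Kabul, Australia, melbourne, Sydney, India, Bangalore, Chennai, Delhi, matching the expected output.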
This has a single table scan and also does not need to use UNION to get distinct results:
Oracle 11g R2 Schema Setup:
CREATE TABLE cities ( Country, city ) AS
SELECT 'Australia', 'Sydney' FROM DUAL UNION ALL
SELECT 'Australia', 'melbourne' FROM DUAL UNION ALL
SELECT 'India', 'Delhi' FROM DUAL UNION ALL
SELECT 'India', 'Chennai' FROM DUAL UNION ALL
SELECT 'India', 'Bangalore' FROM DUAL UNION ALL
SELECT 'Afghanistan', 'Kabul' FROM DUAL;
Query 1:
SELECT value
FROM (
SELECT c.*,
country AS ctry,
ROW_NUMBER() OVER ( PARTITION BY Country ORDER BY city ) AS rn
FROM cities c
)
UNPIVOT( value FOR key IN ( Country AS 1, City AS 2 ) )
WHERE rn = 1 OR key = 2
ORDER BY ctry, rn, key
Results:
| VALUE |
|-------------|
| Afghanistan |
| Kabul |
| Australia |
| Sydney |
| melbourne |
| India |
| Bangalore |
| Chennai |
| Delhi |

SQL: Find max number of rows

I am totally new to coding, so my question might be silly; sorry about that in advance.
I have a database in which CUST_REFERRED holds the CUST_NUM of the customer who made the referral:
CUST_NUM NAME_S NAME_F ADDRESS Z_CODE CUST_REFERRED
1001 MORALES BONITA P.O. BOX 651 32328
1002 THOMPSON RYAN P.O. BOX 9835 90404
1003 SMITH LEILA P.O. BOX 66 32306
1004 PIERSON THOMAS 69821 SOUTH AVENUE 83707
1005 GIRARD CINDY P.O. BOX 851 98115
1006 CRUZ MESHIA 82 DIRT ROAD 12211
1007 GIANA TAMMY 9153 MAIN STREET 78710 1003
1008 JONES KENNETH P.O. BOX 137 82003
1009 PEREZ JORGE P.O. BOX 8564 91510 1003
1010 LUCAS JAKE 114 EAST SAVANNAH 30314
1011 MCGOVERN REESE P.O. BOX 18 60606
1012 MCKENZIE WILLIAM P.O. BOX 971 02110
1013 NGUYEN NICHOLAS 357 WHITE EAGLE AVE 34711 1006
1014 LEE JASMINE P.O. BOX 2947 82414
1015 SCHELL STEVE P.O. BOX 677 33111
1016 DAUM MICHELL 9851231 LONG ROAD 91508 1010
1017 NELSON BECCA P.O. BOX 563 49006
1018 MONTIASA GREG 1008 GRAND AVENUE 31206
1019 SMITH JENNIFER P.O. BOX 1151 07962 1003
1020 FALAH KENNETH P.O. BOX 335 08607
My goal is to find the customer who made the most referrals. As you can see, customer 1003, LEILA SMITH, made three referrals.
I tried this code:
SELECT
CUST_REFERRED,
COUNT(*)
FROM
CUSTOMER
GROUP BY
CUST_REFERRED
ORDER BY CUST_REFERRED ASC;
This code gives me:
1003 3
1006 1
1010 1
First, I could not use LIMIT to get only the maximum count.
Second, how can I add more information about the customer?
SELECT NAME_F,
NAME_S,
ADDRESS,
CUST_REFERRED
FROM CUSTOMER
WHERE CUST_NUM IN (SELECT MOST_CUS_REF
FROM (SELECT CUST_REFERRED MOST_CUS_REF, COUNT(CUST_REFERRED)
MOST_CUS_REF_COUNT
FROM (SELECT CUST_REFERRED
FROM customer
WHERE cust_referred IS NOT NULL
)
GROUP BY CUST_REFERRED
HAVING COUNT(CUST_REFERRED) = (SELECT MAX (cust_ref_num)
FROM (SELECT CUST_REFERRED,
COUNT(CUST_REFERRED) cust_ref_num
FROM (SELECT CUST_REFERRED
FROM customer
WHERE cust_referred IS NOT NULL
)
GROUP BY CUST_REFERRED
)
)
)
)
;
Try this:
Select CUST_REFERRED, z.cnt from
(SELECT CUST_REFERRED, COUNT(*) cnt
FROM CUSTOMER where CUST_REFERRED is Not null
GROUP BY CUST_REFERRED) Z
where z.cnt =
(select Max(cnt) from
(SELECT COUNT(*) cnt
FROM CUSTOMER where CUST_REFERRED is Not null
GROUP BY CUST_REFERRED) ZZ)
Try this query (SQL Server syntax):
;WITH CTE
AS (
SELECT CUST_REFERRED, COUNT(*) AS REF_COUNT
FROM CUSTOMER
GROUP BY CUST_REFERRED
)
SELECT TOP 1 C2.CUST_NUM
,C2.NAME_F
,C2.NAME_S
,REF_COUNT
FROM CTE C1
INNER JOIN CUSTOMER C2
ON C1.CUST_REFERRED = C2.CUST_NUM
ORDER BY REF_COUNT DESC
Edited to add the referred customer details.
with data (cust_num, name_s, name_f, addr, code, cust_referred) as
(
/* begin: test data */
select 1001 ,'MORALES ','BONITA ','P.O. BOX 651 ',32328, null from dual union all
select 1002 ,'THOMPSON ','RYAN ','P.O. BOX 9835 ',90404, null from dual union all
select 1003 ,'SMITH ','LEILA ','P.O. BOX 66 ',32306, null from dual union all
select 1004 ,'PIERSON ','THOMAS ','69821 SOUTH AVENUE ',83707, null from dual union all
select 1005 ,'GIRARD ','CINDY ','P.O. BOX 851 ',98115, null from dual union all
select 1006 ,'CRUZ ','MESHIA ','82 DIRT ROAD ',12211, null from dual union all
select 1007 ,'GIANA ','TAMMY ','9153 MAIN STREET ',78710, 1003 from dual union all
select 1008 ,'JONES ','KENNETH ','P.O. BOX 137 ',82003, null from dual union all
select 1009 ,'PEREZ ','JORGE ','P.O. BOX 8564 ',91510, 1003 from dual union all
select 1010 ,'LUCAS ','JAKE ','114 EAST SAVANNAH ',30314, null from dual union all
select 1011 ,'MCGOVERN ','REESE ','P.O. BOX 18 ',60606, null from dual union all
select 1012 ,'MCKENZIE ','WILLIAM ','P.O. BOX 971 ',02110, null from dual union all
select 1013 ,'NGUYEN ','NICHOLAS ','357 WHITE EAGLE AVE ',34711, 1006 from dual union all
select 1014 ,'LEE ','JASMINE ','P.O. BOX 2947 ',82414, null from dual union all
select 1015 ,'SCHELL ','STEVE ','P.O. BOX 677 ',33111, null from dual union all
select 1016 ,'DAUM ','MICHELL ','9851231 LONG ROAD ',91508, 1010 from dual union all
select 1017 ,'NELSON ','BECCA ','P.O. BOX 563 ',49006, null from dual union all
select 1018 ,'MONTIASA ','GREG ','1008 GRAND AVENUE ',31206, null from dual union all
select 1019 ,'SMITH ','JENNIFER ','P.O. BOX 1151 ',07962, 1003 from dual union all
select 1020 ,'FALAH ','KENNETH ','P.O. BOX 335 ',08607, null from dual
/* end: test data */
-- replace the above block with your table
-- eg. select * from customers_table
)
,
max_referred as
(
-- just interested in the first row after sorting by
-- the count of referred column values
select rownum, cust_referred, cnt from
(
select cust_referred, count(cust_referred) cnt from data group by cust_referred order by 2 desc
)
where rownum = 1
)
-- joining on cust_referred column in *data* and *max_referred* tables to get the customer details
-- and joining again to the *data* table for fetching the referred customer name
select
cust.cust_num, cust.name_s, cust.name_f, cust.addr, cust.code, cust.cust_referred, ms.name_f || ms.name_s as "Referred Customer"
from
data cust
join
max_referred mr on (cust.cust_referred = mr.cust_referred)
join
data ms
on (mr.cust_referred = ms.cust_num)
;
You can do it in a single table scan (i.e. without any self joins) using analytic functions:
SELECT *
FROM (
SELECT t.*,
MIN( CUST_REFERRED )
KEEP ( DENSE_RANK FIRST ORDER BY num_referrals DESC )
OVER ()
AS best_referrer
FROM (
SELECT c.*,
COUNT( CUST_REFERRED )
OVER ( PARTITION BY CUST_REFERRED )
AS num_referrals
FROM CUSTOMER c
) t
)
WHERE cust_num = best_referrer;
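A portable sketch that covers both follow-up questions, assuming an engine with standard window functions (Oracle, SQL Server, PostgreSQL and SQLite 3.25+ all provide RANK()). It finds the top referral count without LIMIT (Oracle, for example, has no LIMIT clause; from 12c it offers FETCH FIRST n ROWS ONLY / WITH TIES), keeps every referrer tied for first place, and joins back to CUSTOMER for the referrer's details. The setup rows are a subset of the question's data, and the CREATE TABLE/INSERT syntax is SQLite-flavored, so adapt it or drop it and use your real table.

```sql
-- Subset of the question's CUSTOMER rows (only the columns needed here).
CREATE TABLE CUSTOMER (CUST_NUM INTEGER PRIMARY KEY, NAME_S TEXT, NAME_F TEXT,
                       CUST_REFERRED INTEGER);
INSERT INTO CUSTOMER VALUES
  (1003, 'SMITH', 'LEILA', NULL), (1006, 'CRUZ', 'MESHIA', NULL),
  (1007, 'GIANA', 'TAMMY', 1003), (1009, 'PEREZ', 'JORGE', 1003),
  (1010, 'LUCAS', 'JAKE', NULL), (1013, 'NGUYEN', 'NICHOLAS', 1006),
  (1016, 'DAUM', 'MICHELL', 1010), (1019, 'SMITH', 'JENNIFER', 1003);

WITH referral_counts AS (
  -- Count referrals per referrer, skipping customers nobody referred,
  -- and rank the counts so ties share rank 1.
  SELECT CUST_REFERRED,
         COUNT(*) AS num_referrals,
         RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
  FROM CUSTOMER
  WHERE CUST_REFERRED IS NOT NULL
  GROUP BY CUST_REFERRED
)
-- Join back to CUSTOMER to show the referrer's own details.
SELECT c.CUST_NUM, c.NAME_F, c.NAME_S, r.num_referrals
FROM referral_counts r
JOIN CUSTOMER c ON c.CUST_NUM = r.CUST_REFERRED
WHERE r.rnk = 1;
```

With these rows it returns a single row: 1003, LEILA, SMITH, 3. If two referrers shared the top count, both would be returned.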