Grouping repeated fields in BQ - google-bigquery

I have the following table where "product" is a repeated field.
How can I group by "id" and merge the repeated field, summing the quantities, so that the output looks like this?
I am trying to find an elegant solution that does not unnest.

Consider below
select id, array(
select as struct sku, sum(quantity) quantity
from t.product
group by sku
) product
from (
select id, array_concat_agg(product) product
from your_table
group by id
) t
if applied to sample data in your question - output is
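The sample data was posted as an image, so here is a small Python sketch of what the query computes, run on made-up rows (the ids, SKUs and quantities are assumptions, not the asker's data). It first concatenates the `product` arrays per `id`, which is what `array_concat_agg` does, and then sums `quantity` per `sku` inside each merged array.

```python
from collections import defaultdict

# Hypothetical rows: each row has an id and a repeated "product" field.
rows = [
    {"id": 1, "product": [{"sku": "a", "quantity": 2}, {"sku": "b", "quantity": 1}]},
    {"id": 1, "product": [{"sku": "a", "quantity": 3}]},
    {"id": 2, "product": [{"sku": "c", "quantity": 5}]},
]


def merge_products(rows):
    # Step 1: concatenate the product arrays per id
    # (mirrors: select id, array_concat_agg(product) ... group by id).
    concatenated = defaultdict(list)
    for row in rows:
        concatenated[row["id"]].extend(row["product"])

    # Step 2: within each id, sum quantity per sku
    # (mirrors the inner: select as struct sku, sum(quantity) ... group by sku).
    result = []
    for id_, products in concatenated.items():
        totals = defaultdict(int)
        for item in products:
            totals[item["sku"]] += item["quantity"]
        result.append(
            {"id": id_, "product": [{"sku": s, "quantity": q} for s, q in totals.items()]}
        )
    return result


merged = merge_products(rows)
```

With these rows, id 1 ends up with one entry per SKU ("a" with quantity 5, "b" with quantity 1) and id 2 keeps its single entry.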

Related

Using Derby SQL to calculate value for histogram

I have a table with various SKU in totes.
The table is totecontents with below columns:
ToteID
SKU
Each Tote can contain a maximum of 6 SKUs. (programmatically constrained)
select toteid, count(*) as qtypertote
from totecontents
group by toteid;
gives me a list of totes with the number of SKUs in each.
I now want a table with the following result:
SkuCount Occurrences
where each row has the ordinal value (1 through 6) and the number of occurrences of that value.
My efforts included the following approach:
select count(*)
from
( select toteid, count(*) as qtypertote
from totecontents
group by toteid)
group by qtypertote;
Stung by the comments I performed more research. This works:
SELECT CountOfskus, COUNT(1) groupedCount
FROM
( SELECT COUNT(*) as countofskus, toteid
FROM totecontents
Group By toteid
) MyTable
GROUP BY countofskus;
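To check the two-level grouping on toy data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Derby here, and the tote contents are invented for illustration. The inner query counts SKUs per tote; the outer query counts how many totes share each count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totecontents (toteid INTEGER, sku TEXT)")

# Hypothetical contents: totes 1 and 3 hold two SKUs, totes 2 and 4 hold one.
conn.executemany(
    "INSERT INTO totecontents VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (3, "C"), (3, "D"), (4, "E")],
)

histogram = conn.execute(
    """
    SELECT countofskus, COUNT(*) AS groupedcount
    FROM (
        SELECT toteid, COUNT(*) AS countofskus
        FROM totecontents
        GROUP BY toteid
    ) AS mytable
    GROUP BY countofskus
    ORDER BY countofskus
    """
).fetchall()

for sku_count, occurrences in histogram:
    print(sku_count, occurrences)
```

Note that counts with no totes (for example 3 through 6 here) produce no row at all; they would need a separate list of ordinals to join against.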

Convert column values to columns

Currently the database looks like below :
I am trying to convert it as below :
The best way I could come up with was a SQL pivot. But that groups by Product ID and gives only one of the three 330 rows that we see above. I cannot think of any other way to approach this. If anyone can think of a way to solve it, could you please share your thoughts?
You can use conditional aggregation:
select productid,
max(case when description = 'Part No' then unitdesc end) as partno,
. . . -- and so on for the other columns
from t
group by productid;
EDIT:
I see, you have multiple rows per product. You have a problem, because SQL tables represent unordered sets. There is no ordering unless you have a column that specifies it, and no such column is obvious in your data.
So, the following will create single rows, but not necessarily combined as you would like:
select productid,
max(case when description = 'Part No' then unitdesc end) as partno,
. . . -- and so on for the other columns
from (select t.*, row_number() over (partition by productid, description order by productid) as seqnum
from t
) t
group by productid, seqnum;
If you have a column that does capture the ordering of the rows, then use that column in the order by.
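Here is a small runnable sketch of the row_number() variant, using Python's sqlite3 module (window functions need SQLite 3.25 or later). The table contents and the `line_no` ordering column are assumptions made up for illustration; only two of the description values are pivoted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (line_no INTEGER, productid INTEGER, description TEXT, unitdesc TEXT)"
)

# Hypothetical data: product 330 has two Part No / Price Now pairs.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 330, "Part No", "P-1"),
        (2, 330, "Price Now", "10"),
        (3, 330, "Part No", "P-2"),
        (4, 330, "Price Now", "20"),
        (5, 331, "Part No", "P-9"),
        (6, 331, "Price Now", "5"),
    ],
)

pivoted = conn.execute(
    """
    SELECT productid,
           MAX(CASE WHEN description = 'Part No' THEN unitdesc END) AS partno,
           MAX(CASE WHEN description = 'Price Now' THEN unitdesc END) AS pricenow
    FROM (
        -- seqnum pairs the n-th 'Part No' with the n-th 'Price Now' per product,
        -- using line_no (an assumed column) as the ordering.
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY productid, description
                                  ORDER BY line_no) AS seqnum
        FROM t
    ) AS s
    GROUP BY productid, seqnum
    ORDER BY productid, seqnum
    """
).fetchall()
```

Because the window is ordered by a real ordering column, the pairing of values within a product is deterministic, which is the point made above.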
You can use LEFT JOIN to retrieve the corresponding values:
select
p.product_id,
n.unit_desc as part_no,
d.unit_desc as description,
pn.unit_desc as price_now,
u.unit_desc as unit
from (select distinct product_id from t) p
left join (select product_id, unit_desc from t where description = 'Part No') n
on n.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Description') d
on d.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Price Now') pn
on pn.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Unit') u
on u.product_id = p.product_id

Finding the maximum price for a given customer id

I need to write a hive query. I am working on a data set that has three columns : Customer ID, Product ID and the Price. I need to write a query which outputs the columns Customer ID and Product ID for the maximum item bought by the customer.
SELECT o.customer, o.product FROM table o WHERE o.price = (SELECT MAX(t.price)
FROM table t WHERE t.customer = o.customer)
It could be something like this, if you want to find the most expensive item that a customer has purchased. I'm unsure if the syntax is 100% correct, but it should give you something to go from. I've added a cheat sheet for Hive below just in case.
Hive Cheat Sheet
Using row_number():
select Customer_ID, Product_ID
from
(select Customer_ID,
Product_ID,
row_number () over ( partition by Customer_ID order by Price desc) rn
from table
where customer_id=given_customer_id --add filter if necessary
)s
where rn=1;
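Both approaches can be compared on toy data. The sketch below uses Python's sqlite3 module rather than Hive (so it shows the logic, not Hive-specific syntax), and the `purchases` table and its rows are invented for illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product_id TEXT, price REAL)"
)

# Hypothetical purchases for two customers.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "p1", 10.0), (1, "p2", 25.0), (2, "p3", 7.5), (2, "p1", 3.0)],
)

# Approach 1: rank each customer's rows by price and keep the top one.
top_by_rank = conn.execute(
    """
    SELECT customer_id, product_id
    FROM (
        SELECT customer_id, product_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY price DESC) AS rn
        FROM purchases
    ) AS s
    WHERE rn = 1
    ORDER BY customer_id
    """
).fetchall()

# Approach 2: correlated subquery comparing each row to its customer's maximum.
top_by_subquery = conn.execute(
    """
    SELECT o.customer_id, o.product_id
    FROM purchases AS o
    WHERE o.price = (SELECT MAX(i.price)
                     FROM purchases AS i
                     WHERE i.customer_id = o.customer_id)
    ORDER BY o.customer_id
    """
).fetchall()
```

The two agree here, but they differ on ties: row_number() keeps exactly one row per customer, while the subquery version returns every product that shares the maximum price.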

How to formulate a conditional sum in PostgreSQL?

I have a table containing id, category, noofquestions and company. I want a query that returns the sum of the noofquestions values across two or more rows that have the same category. I'm trying this query, but it only adds up rows whose category and noofquestions are both equal, which is wrong. It should not check noofquestions.
SELECT id , category, SUM(NULLIF(noofquestions, '')::int), company
FROM tableName
WHERE id=1
GROUP BY id, category, noofquestions, company;
You should not group by noofquestions:
SELECT id, category, SUM(NULLIF(noofquestions, '')::int), company
FROM tableName
WHERE id = 1
GROUP BY id, category, company;
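A quick way to see the difference is to run both groupings on toy data. This sketch uses Python's sqlite3 module, so the PostgreSQL cast `::int` is written as `CAST(... AS INTEGER)`; the rows are made up for illustration, with noofquestions stored as text as the NULLIF in the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, category TEXT, noofquestions TEXT, company TEXT)"
)

# Hypothetical rows; one has an empty string instead of a number.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, "math", "5", "acme"),
        (1, "math", "3", "acme"),
        (1, "math", "", "acme"),
        (1, "logic", "4", "acme"),
    ],
)

# Grouping without noofquestions: one row per category, values summed.
fixed = conn.execute(
    """
    SELECT id, category,
           SUM(CAST(NULLIF(noofquestions, '') AS INTEGER)) AS total,
           company
    FROM tablename
    WHERE id = 1
    GROUP BY id, category, company
    ORDER BY category
    """
).fetchall()

# Grouping with noofquestions: each distinct value forms its own group.
original = conn.execute(
    """
    SELECT id, category,
           SUM(CAST(NULLIF(noofquestions, '') AS INTEGER)) AS total,
           company
    FROM tablename
    WHERE id = 1
    GROUP BY id, category, noofquestions, company
    """
).fetchall()
```

With noofquestions in the GROUP BY, the three "math" rows stay in three separate groups, which is why only rows with equal values were being added together.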

How to use distinct clause only for one column

I want to use a distinct clause for only one column. I have a query like this:
select id, brandname from brand
Here brandname has the same entry multiple times. I want to select the distinct brandname values along with an id.
You have to pick some way of getting only one ID, e.g.,
select max(id) , brandname
from brand
group by brandname
If you wanted more than one column and the data in them was the same, you could just keep adding them to the group by. However, if the extra columns had varying data, you could use a slightly different strategy:
select * from brand
where id in
(
select max(id)
from brand
group by brandname
)
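Both queries can be tried on toy data with Python's sqlite3 module. The `country` column and the rows are assumptions added to show the case where an extra column varies within a brand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (id INTEGER, brandname TEXT, country TEXT)")

# Hypothetical rows: 'acme' appears twice with different countries.
conn.executemany(
    "INSERT INTO brand VALUES (?, ?, ?)",
    [(1, "acme", "UK"), (2, "acme", "US"), (3, "zenith", "DE")],
)

# One id per brand name: pick the largest id in each group.
one_id_per_brand = conn.execute(
    """
    SELECT MAX(id), brandname
    FROM brand
    GROUP BY brandname
    ORDER BY brandname
    """
).fetchall()

# Full rows for those ids, so varying extra columns come from a single row.
full_rows = conn.execute(
    """
    SELECT * FROM brand
    WHERE id IN (SELECT MAX(id) FROM brand GROUP BY brandname)
    ORDER BY id
    """
).fetchall()
```

The second query returns the whole row belonging to the chosen id, so `country` is taken from that row instead of being aggregated separately.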
You can do this:
select Id,BrandName from brand group by BrandName,Id