BigQuery Join Array in standard SQL

I am using standard SQL and I have a table Order:
"Order" table
and I am trying to join it with the table MenuItem
"MenuItem" table
on the Order item_ids array and the MenuItem __id__ integer column, to get an array of MenuItem prices, but I am getting an error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
How can I avoid this error?
Query:
WITH menu_items AS (
  SELECT
    __id__,
    price
  FROM
    `potykion.MenuItem`
)
SELECT
  *,
  ARRAY(
    SELECT
      price
    FROM
      UNNEST(item_ids) AS id
    JOIN
      menu_items
    ON
      id = menu_items.__id__
  )
FROM
  `potykion.Order`

Try below (BigQuery Standard SQL)
WITH Orders AS (
SELECT 1 AS id, ARRAY[1,2,3] AS item_ids UNION ALL
SELECT 2 AS id, ARRAY[4,5] AS item_ids UNION ALL
SELECT 3 AS id, ARRAY[1,4,6] AS item_ids
),
MenuItems AS (
SELECT 1 AS __id__, 1.1 AS price UNION ALL
SELECT 2 AS __id__, 1.2 AS price UNION ALL
SELECT 3 AS __id__, 1.3 AS price UNION ALL
SELECT 4 AS __id__, 1.4 AS price UNION ALL
SELECT 5 AS __id__, 1.5 AS price UNION ALL
SELECT 6 AS __id__, 1.6 AS price UNION ALL
SELECT 7 AS __id__, 1.7 AS price
)
SELECT
  *,
  ARRAY(
    SELECT price
    FROM UNNEST(item_ids) AS id
    JOIN MenuItems
    ON __id__ = id
  ) AS prices
FROM Orders
Table Orders:
Table MenuItems:
Result:

The solution with a join inside the array creation expression is correct, but it doesn't work with separate tables. An alternative solution is array aggregation:
WITH Orders AS (
SELECT 1 AS id, ARRAY[1,2,3] AS item_ids UNION ALL
SELECT 2 AS id, ARRAY[4,5] AS item_ids UNION ALL
SELECT 3 AS id, ARRAY[1,4,6] AS item_ids
),
MenuItems AS (
SELECT 1 AS __id__, 1.1 AS price UNION ALL
SELECT 2 AS __id__, 1.2 AS price UNION ALL
SELECT 3 AS __id__, 1.3 AS price UNION ALL
SELECT 4 AS __id__, 1.4 AS price UNION ALL
SELECT 5 AS __id__, 1.5 AS price UNION ALL
SELECT 6 AS __id__, 1.6 AS price UNION ALL
SELECT 7 AS __id__, 1.7 AS price
)
SELECT
  id, ARRAY_AGG(price) AS prices
FROM Orders
JOIN MenuItems ON __id__ IN UNNEST(item_ids)
GROUP BY id
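Applied back to the tables from the question, the same aggregation looks roughly like this (a sketch only: it assumes `potykion.Order` has an id column to group by, like the sample Orders CTE above, alongside the item_ids array, and that `potykion.MenuItem` has __id__ and price):
-- sketch: id is assumed to exist on the Order table, as in the sample data above
SELECT
  o.id,
  ARRAY_AGG(m.price) AS prices
FROM `potykion.Order` o
JOIN `potykion.MenuItem` m
  ON m.__id__ IN UNNEST(o.item_ids)
GROUP BY o.id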

Related

Subqueries with Group By

I have looked around on the internet for a while for a way to make this query work, but I am unable to work it out so far. I am trying to return the item descriptions and the quantity of items sold. This is what I have got at the moment:
SELECT itemdesc, quantity, (SELECT COUNT(quantity) FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
GROUP BY COUNT(invoiceitem.quantity)) Quantity
FROM item;
I am very lost at the moment and not sure if I am even linking the right tables together. I can provide an ER diagram if it helps. Any help would be greatly appreciated, thank you.
ANSWER:
SELECT item.itemdesc, (SELECT SUM(invoiceitem.quantity) FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
GROUP BY item.itemdesc) Quantity
FROM item
ORDER BY quantity;
Thank you all!
Your outer query:
SELECT itemdesc,
quantity
/* ignoring the subquery */
FROM item;
Will not work as the item table does not have a quantity column.
If you intended to use the itemprice column then your query would be:
SELECT itemdesc,
itemprice,
( SELECT COUNT(quantity)
FROM invoiceitem
WHERE invoiceitem.itemno = item.itemno
) AS Quantity
FROM item
Which, for the sample data:
CREATE TABLE item (
itemno PRIMARY KEY,
itemdesc,
itemprice
) AS
SELECT 1, 'ItemA', 1 FROM DUAL UNION ALL
SELECT 2, 'ItemB', 2 FROM DUAL UNION ALL
SELECT 3, 'ItemC', 3 FROM DUAL UNION ALL
SELECT 4, 'ItemD', 4 FROM DUAL UNION ALL
SELECT 5, 'ItemE', 5 FROM DUAL;
CREATE TABLE invoiceitem( itemno, quantity ) AS
SELECT 1, 10 FROM DUAL UNION ALL
SELECT 1, 20 FROM DUAL UNION ALL
SELECT 1, 30 FROM DUAL UNION ALL
SELECT 3, 40 FROM DUAL UNION ALL
SELECT 4, 50 FROM DUAL UNION ALL
SELECT 5, NULL FROM DUAL;
Outputs:
ITEMDESC | ITEMPRICE | QUANTITY
-------- | --------- | --------
ItemA    |         1 |        3
ItemB    |         2 |        0
ItemC    |         3 |        1
ItemD    |         4 |        1
ItemE    |         5 |        0
And equivalent query using a join would be (assuming that item.itemno is a primary key):
SELECT MAX( i.itemdesc ) AS itemdesc,
MAX( i.itemprice ) AS itemprice,
COUNT(ii.quantity) AS Quantity
FROM item i
LEFT OUTER JOIN invoiceitem ii
ON ( ii.itemno = i.itemno )
GROUP BY i.itemno
You need to use LEFT OUTER JOIN rather than INNER JOIN so that items with zero corresponding rows in the invoiceitem table are still included.
db<>fiddle here
A GROUP BY like the one below should work. Please check.
select
item.itemdesc, count(invoiceitem.quantity) Quantity
from
item item
join invoiceitem invoiceitem on item.itemno = invoiceitem.itemno
group by
item.itemdesc
Thanks
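Note that, as pointed out in the previous answer, an inner join drops items that have no rows in invoiceitem. A LEFT JOIN variant of the same query (just a sketch) keeps them with a count of 0:
select
    item.itemdesc, count(invoiceitem.quantity) Quantity  -- count() ignores NULLs, so unmatched items show 0
from
    item item
    left join invoiceitem invoiceitem on item.itemno = invoiceitem.itemno
group by
    item.itemdesc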

How to bucketize a column using another table in Bigquery SQL?

I have a column grams in the table info which can be any positive integer.
Also, I have a table map which has two columns, price and grams, in which grams can take some discrete values (let's say ~50 of them) and is in ascending order.
I want to add a column named cost to the table info by fetching price from the table map for the smallest map.grams such that info.grams <= map.grams. In other words, I want to bucketize my info.grams based on the discrete values of map.grams and fetch the corresponding price.
What I know?
I can use CASE WHEN to bucketize info.grams like below and then join the two tables to fetch price. But since the discrete values are not fixed, I want to find a clean way of doing it without making my query a mess.
CASE WHEN grams<=1 THEN 1
WHEN grams<=5 THEN 5
WHEN grams<=10 THEN 10
WHEN grams<=20 THEN 20
WHEN grams<=30 THEN 30
...
Below is for BigQuery Standard SQL
You can use RANGE_BUCKET function for this
#standardSQL
SELECT i.*,
price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,
(
SELECT AS STRUCT
ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
ARRAY_AGG(price ORDER BY grams) AS price_map
FROM `project.dataset.map`
)
You can test and play with the above using sample data, as in the example below:
#standardSQL
WITH `project.dataset.info` AS (
SELECT 1 AS grams UNION ALL
SELECT 3 UNION ALL
SELECT 5 UNION ALL
SELECT 7 UNION ALL
SELECT 10 UNION ALL
SELECT 13 UNION ALL
SELECT 15
), `project.dataset.map` AS (
SELECT 5 AS grams, 0.99 price UNION ALL
SELECT 10, 1.99 UNION ALL
SELECT 15, 2.99
)
SELECT i.*,
price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,
(
SELECT AS STRUCT
ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
ARRAY_AGG(price ORDER BY grams) AS price_map
FROM `project.dataset.map`
)
with result
Row grams price
1 1 0.99
2 3 0.99
3 5 0.99
4 7 1.99
5 10 1.99
6 13 2.99
7 15 2.99
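One detail worth noting: if grams is larger than the largest map.grams, RANGE_BUCKET returns the length of grams_map, SAFE_OFFSET falls off the end of price_map, and the price comes back NULL. If a fallback to the largest price is wanted in that case (an assumption, not part of the original answer), the projection can be wrapped in IFNULL:
#standardSQL
SELECT i.*,
  IFNULL(
    price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))],
    price_map[SAFE_OFFSET(ARRAY_LENGTH(price_map) - 1)]  -- assumed fallback: largest price
  ) AS price
FROM `project.dataset.info` i,
(
  SELECT AS STRUCT
    ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
    ARRAY_AGG(price ORDER BY grams) AS price_map
  FROM `project.dataset.map`
)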
Oh, it would be nice to use standard SQL for this, with lead() and join:
select i.*, m.*
from info i left join
(select m.*, lead(grams) over (order by grams) as next_grams
from map m
) m
on i.grams >= m.grams and
(i.grams < next_grams or next_grams is null);
However, one limitation of BigQuery is that it does not support non-equi outer joins. So, you can convert the map table to an array and use unnest() to do what you want:
with info as (
select 1 as grams union all select 5 union all select 10 union all select 15
),
map as (
select 5 as grams, 'a' as bucket union all
select 10 as grams, 'b' as bucket union all
select 15 as grams, 'c' as bucket
)
select i.*,
(select map
from unnest(m.map) map
where map.grams >= i.grams
order by map.grams
limit 1
) m
from info i cross join
(select array_agg(map order by grams) as map
from map
) m;
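To return the price from the question rather than a bucket label, the same pattern works unchanged when map carries a price column (a sketch, reusing the question's column names):
-- sketch: assumes map has (grams, price), as described in the question
select i.*,
       (select map.price
        from unnest(m.map) map
        where map.grams >= i.grams
        order by map.grams
        limit 1
       ) as price
from info i cross join
     (select array_agg(map order by grams) as map
      from map
     ) m;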
In addition to Gordon's and Mikhail's answers, I would like to suggest a third alternative, using FIRST_VALUE(), which is a built-in function in BigQuery, together with window frames.
The starting point is that if we LEFT JOIN the info and map tables on grams, we get null values for each gram that is not present in the map table. We can then use this result (with the null values) to price all the grams with the next available price. In order to achieve that, we will use FIRST_VALUE(). According to the documentation:
Returns the value of the value_expression for the first row in the
current window frame.
Thus, for each row where price is null, we select the first non-null value between the current row and the following rows. The syntax is as follows:
#sample data info
WITH info AS (
SELECT 1 AS grams UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 13 UNION ALL
SELECT 15 UNION ALL
SELECT 16 UNION ALL
SELECT 18 UNION ALL
SELECT 19 UNION ALL
SELECT 20
),
#sample data map
map AS (
SELECT 5 AS grams, 1.99 price UNION ALL
SELECT 10, 2.99 UNION ALL
SELECT 15, 3.99 UNION ALL
SELECT 20, 4.99
),
#using left join, so there are rows with price = null
t AS (
SELECT i.grams, price
FROM info i LEFT JOIN map USING(grams)
ORDER BY grams
)
SELECT grams, FIRST_VALUE(price IGNORE NULLS) OVER (ORDER BY grams ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS price
FROM t ORDER BY grams
and the output:
Row grams price
1 1 1.99
2 2 1.99
3 3 1.99
4 4 1.99
5 5 1.99
6 6 2.99
7 7 2.99
8 8 2.99
9 9 2.99
10 10 2.99
11 11 3.99
12 13 3.99
13 15 3.99
14 16 4.99
15 18 4.99
16 19 4.99
17 20 4.99
The last SELECT statement performs the action described above. In addition, I would like to point out that:
UNBOUNDED FOLLOWING: The window frame ends at the end of the
partition.
And
CURRENT ROW: The window frame starts at the current row.

How to use SUM DISTINCT when the order has the same qty of items

I'm working on a query to show me the total number of orders sent and the qty of items sent in a day. Due to lots of joins, I have duplicate rows. It looks like this:
DispatchDate Order Qty
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 3 5
2019-07-02 3 5
2019-07-02 3 5
I'm using this query:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate
Obviously, on this date there are 3 orders with a total item count of 9.
However, the query is returning:
3 orders and 7 items
I don't have a clue how to resolve this issue. How can I sum the quantities for each order instead of simply removing duplicates from only one column, like SUM DISTINCT does?
You could do a CTE:
with cte1 as (
SELECT Order AS Order
, DispatchDate
, MAX(QTY) as QTY
FROM TABLE1
GROUP BY Order
, DispatchDate
)
SELECT DispatchDate
, COUNT(Order)
, SUM(Qty)
FROM cte1
GROUP BY DispatchDate
You have major problems with your data model, if the data is stored this way. If this is the case, you need a table with one row per order.
If this is the result of a query, you can probably fix the underlying query so you are not getting duplicates.
If you need to work with the data in this format, then extract a single row for each group. I think that row_number() is quite appropriate for this purpose:
select count(*), sum(qty)
from (select t.*, row_number() over (partition by dispatchdate, corder order by corder) as seqnum
from t
) t
where seqnum = 1
Here is a db<>fiddle.
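If the totals are also needed per day, as in the question, the same idea can simply be grouped by dispatchdate (a sketch; the output aliases are illustrative):
select dispatchdate, count(*) as orders, sum(qty) as total_qty
from (select t.*,
             row_number() over (partition by dispatchdate, corder order by corder) as seqnum
      from t
     ) t
where seqnum = 1
group by dispatchdate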
First of all, you should avoid multiplying the rows while joining, for example by using LEFT JOIN instead of JOIN. But, as we are where we are:
SELECT DispatchDate, sum( Qty)
FROM (
SELECT distinct DispatchDate, Order, Qty
FROM TABLE1 )T
GROUP BY DispatchDate
You typed SUM(DISTINCT Qty), which sums up the distinct values of Qty, that is 2 and 5. That makes 7, doesn't it?
Due to the lots of joins I have duplicate rows.
IMHO, you should fix your primary data first. Probably the Qty column is a function of the unique (DispatchDate, Order) combination. Remove the duplicates in the primary data source and ensure there cannot be different Qty values for two rows with the same (DispatchDate, Order). Then go back to your task and you'll find your SQL much simpler. No offense regarding the other answers, but they just mask the mess in the primary data source and are unclear about how to choose Qty for duplicate (DispatchDate, Order) rows (some take the max, some sum).
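A minimal sketch of that cleanup, assuming the duplicate rows are exact copies (the order column is renamed corder here, since order cannot be used as an identifier, as another answer notes; the table names are illustrative):
-- keep exactly one row per (dispatchdate, corder); assumes the duplicates are exact copies
create table table1_clean as
select distinct dispatchdate, corder, qty
from table1;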
Try this:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate, Order
I think you need the sum of distinct quantity per dispatch date and order.
How about this? Check comments within the code.
(I renamed the order column to corder; order can't be used as an identifier).
WITH test (dispatchdate, corder, qty)
     -- your sample data
     AS (SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         --
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         --
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL),
     -- compute sum of distinct qty per BOTH dispatchdate AND corder
     temp
     AS (SELECT t1.dispatchdate,
                t1.corder,
                SUM(DISTINCT t1.qty) qty
           FROM test t1
          GROUP BY t1.dispatchdate,
                   t1.corder
        )
-- the final result is then simple
SELECT t.dispatchdate,
       COUNT(*) cnt,
       SUM(qty) qty
  FROM temp t
 GROUP BY t.dispatchdate;

DISPATCHDA        CNT        QTY
---------- ---------- ----------
02.07.2019          3          9

Pivot table with multi values of one column

Everything is fine with one value for each column, but does it support multiple values?
Example for my query:
WITH
INPUT_LIST AS
(SELECT 1 PRODUCT_ID, 1 TYPE_ID, 1000 PRICE FROM DUAL
UNION ALL
SELECT 2 PRODUCT_ID, 1 TYPE_ID, 1500 PRICE FROM DUAL
UNION ALL
SELECT 3 PRODUCT_ID, 2 TYPE_ID, 500 PRICE FROM DUAL
UNION ALL
SELECT 4 PRODUCT_ID, 3 TYPE_ID, 2000 PRICE FROM DUAL
UNION ALL
SELECT 1 PRODUCT_ID, 4 TYPE_ID, 1000 PRICE FROM DUAL
UNION ALL
SELECT 2 PRODUCT_ID, 5 TYPE_ID, 1500 PRICE FROM DUAL
UNION ALL
SELECT 3 PRODUCT_ID, 2 TYPE_ID, 500 PRICE FROM DUAL
UNION ALL
SELECT 2 PRODUCT_ID, 3 TYPE_ID, 2000 PRICE FROM DUAL
)
SELECT * FROM
(SELECT PRODUCT_ID, TYPE_ID, SUM(PRICE) TOTAL FROM INPUT_LIST GROUP BY PRODUCT_ID, TYPE_ID)
PIVOT (SUM(TOTAL) FOR TYPE_ID IN (1 AS "FIRST_TYPE", 2 AS "SECOND_TYPE", 3 AS "THIRD_TYPE", 4 AS "FOURTH_TYPE", 5 AS "FIFTH"))
ORDER BY PRODUCT_ID;
By multi value I mean I want to map TYPE_ID IN (3,4,5) to "OTHER_TYPE". Something like:
PIVOT (SUM(TOTAL) FOR TYPE_ID IN (1 AS "FIRST_TYPE", 2 AS "SECOND_TYPE", (3,4,5) AS "OTHER_TYPE"))
I can query this another way, but I want to know: can PIVOT do that?
No, the PIVOT clause does not have such a feature.
But you can still do a pivot the old-fashioned way:
SELECT PRODUCT_ID,
sum( case when type_id = 1 then PRICE end ) As FIRST_TYPE,
sum( case when type_id = 2 then PRICE end ) As SECOND_TYPE,
sum( case when type_id in ( 3,4,5) then PRICE end ) ANOTHER_TYPE
FROM INPUT_LIST
GROUP BY PRODUCT_ID
ORDER BY PRODUCT_ID;
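If 0 is preferred over NULL for products that have no rows of a given type, each conditional sum can be wrapped in NVL (a sketch of the same query):
SELECT PRODUCT_ID,
       NVL( SUM( CASE WHEN type_id = 1 THEN PRICE END ), 0 ) AS FIRST_TYPE,
       NVL( SUM( CASE WHEN type_id = 2 THEN PRICE END ), 0 ) AS SECOND_TYPE,
       NVL( SUM( CASE WHEN type_id IN (3,4,5) THEN PRICE END ), 0 ) AS OTHER_TYPE
FROM INPUT_LIST
GROUP BY PRODUCT_ID
ORDER BY PRODUCT_ID;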
Just group the types in the sub-query first:
SELECT *
FROM (
SELECT PRODUCT_ID,
CASE
WHEN TYPE_ID IN (1,2)
THEN TYPE_ID
ELSE 3
END AS TYPE_ID,
PRICE
FROM INPUT_LIST
)
PIVOT (
SUM(PRICE) FOR TYPE_ID IN (
1 AS "FIRST_TYPE",
2 AS "SECOND_TYPE",
3 AS "OTHER_TYPE"
)
)
ORDER BY PRODUCT_ID;

Oracle SUM returns a wrong sum when identical values are returned from a SELECT UNION

I am facing a problem with the SUM function.
This query returns MY_ID = 1 and QTY = 7
select my_id, sum(qty) qty
from
(
select 1 my_id ,2 qty from dual
union
select 1 my_id, 5 qty from dual
)
group by my_id;
But this one returns MY_ID = 1 and QTY = 5 instead of QTY = 10.
select my_id, sum(qty) qty
from
(
select 1 my_id ,5 qty from dual
union
select 1 my_id, 5 qty from dual
)
group by my_id;
How can I sum the two quantities when the two values are the same?
Use union all:
select my_id, sum(qty) qty
from
(
select 1 my_id ,5 qty from dual
union all
select 1 my_id, 5 qty from dual
)
group by my_id;
Try using union all:
The below works:
select my_id, sum(qty) qty
from
(
select 1 my_id ,5 qty from dual
union all
select 1 my_id, 5 qty from dual
)
group by my_id;
This is because 5 UNION 5 is always 5. If you do UNION ALL, it includes everything, irrespective of rows being the same!
In the second query, the two rows in the union are identical.
There are two forms of UNION: UNION ALL, which keeps duplicates, and plain UNION (UNION DISTINCT in some dialects), which removes them. Your query uses a plain UNION, and since the two (1, 5) rows are the same, only one of them is returned. Change it to:
select my_id, sum(qty) qty
from
(
select 1 my_id ,5 qty from dual
union ALL
select 1 my_id, 5 qty from dual
)
group by my_id;
That should give you what you want: (1, 10).
EDIT: Briefly I had union DISTINCT in the query which was wrong! Now corrected....
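If, for some reason, the query must stay a plain UNION, another way to keep both rows is to carry a column that makes them distinct (a sketch reusing the question's sample data; the src column is made up purely for illustration):
select my_id, sum(qty) qty
from
(
  select 1 my_id, 5 qty, 1 src from dual   -- src exists only to keep the two rows distinct
  union
  select 1 my_id, 5 qty, 2 src from dual
)
group by my_id;
-- returns MY_ID = 1, QTY = 10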