How to bucketize a column using another table in BigQuery SQL?

I have a column grams in the table info which can be any positive integer.
Also, I have a table map with two columns, price and grams, in which grams takes a set of discrete values (let's say about 50) in ascending order.
I want to add a column named cost to table info by fetching price from table map, using the smallest map.grams that satisfies info.grams <= map.grams. In other words, I want to bucketize info.grams based on the discrete values of map.grams and fetch the matching price.
What I know?
I can use CASE WHEN to bucketize info.grams like below, then join the two tables and fetch price. But since the number of discrete values is not limited, I want to find a clean way of doing it without making my query a mess.
CASE WHEN grams<=1 THEN 1
WHEN grams<=5 THEN 5
WHEN grams<=10 THEN 10
WHEN grams<=20 THEN 20
WHEN grams<=30 THEN 30
...

Below is for BigQuery Standard SQL
You can use RANGE_BUCKET function for this
#standardSQL
SELECT i.*,
price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,
(
SELECT AS STRUCT
ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
ARRAY_AGG(price ORDER BY grams) AS price_map
FROM `project.dataset.map`
)
You can test and play with the above using sample data, as in the example below
#standardSQL
WITH `project.dataset.info` AS (
SELECT 1 AS grams UNION ALL
SELECT 3 UNION ALL
SELECT 5 UNION ALL
SELECT 7 UNION ALL
SELECT 10 UNION ALL
SELECT 13 UNION ALL
SELECT 15
), `project.dataset.map` AS (
SELECT 5 AS grams, 0.99 price UNION ALL
SELECT 10, 1.99 UNION ALL
SELECT 15, 2.99
)
SELECT i.*,
price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,
(
SELECT AS STRUCT
ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
ARRAY_AGG(price ORDER BY grams) AS price_map
FROM `project.dataset.map`
)
with result
Row grams price
1 1 0.99
2 3 0.99
3 5 0.99
4 7 1.99
5 10 1.99
6 13 2.99
7 15 2.99
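To check the lookup logic outside BigQuery, here is a minimal Python sketch of the same idea. It assumes that RANGE_BUCKET on a sorted array behaves like bisect_right (it counts the boundaries that are <= the value), and the names price_for and prices are made up for illustration.

```python
from bisect import bisect_right

# Sample map table from the answer above, sorted by grams.
map_rows = [(5, 0.99), (10, 1.99), (15, 2.99)]

# Mirror ARRAY_AGG(grams + 1 ORDER BY grams) and ARRAY_AGG(price ORDER BY grams).
grams_map = [g + 1 for g, _ in map_rows]
price_map = [p for _, p in map_rows]

def price_for(grams):
    """Price of the smallest bucket with map.grams >= grams, or None."""
    # Assumption: bisect_right stands in for RANGE_BUCKET on a sorted array.
    idx = bisect_right(grams_map, grams)
    # Emulate SAFE_OFFSET: an out-of-range index gives None, not an error.
    return price_map[idx] if idx < len(price_map) else None

# Hypothetical driver: the info.grams values from the sample, plus 16.
prices = {g: price_for(g) for g in [1, 3, 5, 7, 10, 13, 15, 16]}
```

The grams + 1 shift is what makes a value equal to a boundary (5, 10, 15) stay in that boundary's bucket. A value above the largest boundary, such as 16, gets no price.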

Oh, it would be nice to use standard SQL for this, with lead() and join:
select i.*, m.*
from info i left join
(select m.*, lead(grams) over (order by grams) as next_grams
from map m
) m
on i.grams >= m.grams and
(i.grams < next_grams or next_grams is null);
However, one limitation of BigQuery is that it does not support non-equi outer joins. So, you can convert the map table to an array and use unnest() to do what you want:
with info as (
select 1 as grams union all select 5 union all select 10 union all select 15
),
map as (
select 5 as grams, 'a' as bucket union all
select 10 as grams, 'b' as bucket union all
select 15 as grams, 'c' as bucket
)
select i.*,
(select map
from unnest(m.map) map
where map.grams >= i.grams
order by map.grams
limit 1
) m
from info i cross join
(select array_agg(map order by grams) as map
from map
) m;

In addition to Gordon's and Mikhail's answers, I would like to suggest a third alternative, using FIRST_VALUE(), which is a built-in function in BigQuery, together with a window frame.
The starting point is that if we LEFT JOIN the info and map tables using grams as the join key, we get null prices for every grams value that is not specified in the map table. We will then use this table (with the null values) to price all the grams with the next available price. In order to achieve that, we will use FIRST_VALUE(). According to the documentation:
Returns the value of the value_expression for the first row in the
current window frame.
Thus, for each row where price is null, we will select the first non-null value between the current row and the next non-null row. The syntax will be as follows:
#sample data info
WITH info AS (
SELECT 1 AS grams UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 13 UNION ALL
SELECT 15 UNION ALL
SELECT 16 UNION ALL
SELECT 18 UNION ALL
SELECT 19 UNION ALL
SELECT 20
),
#sample data map
map AS (
SELECT 5 AS grams, 1.99 price UNION ALL
SELECT 10, 2.99 UNION ALL
SELECT 15, 3.99 UNION ALL
SELECT 20, 4.99
),
#using left join, so there are rows with price = null
t AS (
SELECT i.grams, price
FROM info i LEFT JOIN map USING(grams)
ORDER BY grams
)
SELECT grams, first_value(price IGNORE NULLS)OVER (ORDER BY grams ASC ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING) AS price
FROM t ORDER BY grams
and the output,
Row grams price
1 1 1.99
2 2 1.99
3 3 1.99
4 4 1.99
5 5 1.99
6 6 2.99
7 7 2.99
8 8 2.99
9 9 2.99
10 10 2.99
11 11 3.99
12 13 3.99
13 15 3.99
14 16 4.99
15 18 4.99
16 19 4.99
17 20 4.99
The last SELECT statement performs the action described above. In addition, I would like to point out that:
UNBOUNDED FOLLOWING: The window frame ends at the end of the
partition.
And
CURRENT ROW :The window frame starts at the current row.
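For anyone who wants to trace the frame logic by hand, here is a small Python sketch of what FIRST_VALUE(price IGNORE NULLS) over "current row to unbounded following" does on the sample data. The function name fill_with_next_price is hypothetical, and a dictionary lookup stands in for the LEFT JOIN.

```python
def fill_with_next_price(rows):
    """rows: list of (grams, price_or_None), sorted by grams ascending."""
    filled = []
    next_price = None
    # Walk backwards so each row sees the nearest non-null price at or after it.
    for grams, price in reversed(rows):
        if price is not None:
            next_price = price
        filled.append((grams, next_price))
    filled.reverse()
    return filled

# Sample data from the answer above.
map_prices = {5: 1.99, 10: 2.99, 15: 3.99, 20: 4.99}
info_grams = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 18, 19, 20]

# Stand-in for: info LEFT JOIN map USING(grams)
joined = [(g, map_prices.get(g)) for g in info_grams]
result = fill_with_next_price(joined)
```

One thing this makes visible: the approach relies on every map.grams value also appearing in info. A boundary that is missing from info never enters the joined rows, so the rows below it would pick up the price of a later boundary instead.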

Related

Return distinct rows based on only one column in oracle sql

I want to return an n number of distinct rows. The distinct rows should be based on one column (SN) only.
I have the query below which is expected to return 4 rows where the serial number is greater than 2 and no rows with similar SN column values are returned.
Table
SN letter value
1 test 25
1 bread 26
3 alpha 43
4 beta 23
4 gamma 5
5 omega 60
6 omega 60
Expected Result
SN letter value
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
This is the query I have. It does not work correctly: it returns the duplicates because it filters distinct values by all the columns combined instead of just the single column, SN.
SELECT * FROM (SELECT a.*, row_number() over(order by SN) rowRank
FROM (SELECT distinct SN, letter, value from table where SN > 2 order by SN) a)
WHERE rowRank BETWEEN 1 AND 4
You do not need to use DISTINCT before trying to filter out your results. You can modify the ORDER BY clause of the ROW_NUMBER analytic function (aliased row_rank below) if you need to change which duplicate of an SN is returned. Right now it returns the first LETTER value alphabetically, since that matches your example result.
Query
WITH
some_table (sn, letter, VALUE)
AS
(SELECT 1, 'test', 25 FROM DUAL
UNION ALL
SELECT 1, 'bread', 26 FROM DUAL
UNION ALL
SELECT 3, 'alpha', 43 FROM DUAL
UNION ALL
SELECT 4, 'beta', 23 FROM DUAL
UNION ALL
SELECT 4, 'gamma', 5 FROM DUAL
UNION ALL
SELECT 5, 'omega', 60 FROM DUAL
UNION ALL
SELECT 6, 'omega', 60 FROM DUAL)
--Above is to set up the sample data. Use the query below with your real table
SELECT sn, letter, VALUE
FROM (SELECT sn,
letter,
VALUE,
ROW_NUMBER () OVER (PARTITION BY sn ORDER BY letter) AS row_rank
FROM some_table
WHERE sn > 2)
WHERE row_rank = 1
ORDER BY sn
FETCH FIRST 4 ROWS ONLY;
Result
SN LETTER VALUE
_____ _________ ________
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
SELECT * FROM
(
SELECT
t.*
,ROW_NUMBER() OVER (PARTITION BY sn ORDER BY value ) rn
FROM
t
WHERE sn > 2
) t1
WHERE t1.rn = 1
ORDER BY sn;
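The two queries above differ only in the ORDER BY inside ROW_NUMBER, and that choice decides which duplicate survives. Here is a rough Python sketch of the "keep row 1 per sn" logic to show the difference; first_per_sn and its parameters are invented names for illustration.

```python
# Sample rows from the question: (sn, letter, value).
rows = [
    (1, "test", 25), (1, "bread", 26), (3, "alpha", 43),
    (4, "beta", 23), (4, "gamma", 5), (5, "omega", 60), (6, "omega", 60),
]

def first_per_sn(rows, min_sn=2, limit=4, key=lambda r: r[1]):
    """Keep one row per sn (the smallest by `key`), like row_number() = 1."""
    best = {}
    for row in rows:
        sn = row[0]
        if sn <= min_sn:  # WHERE sn > 2
            continue
        if sn not in best or key(row) < key(best[sn]):
            best[sn] = row
    # ORDER BY sn, then FETCH FIRST `limit` ROWS ONLY
    return [best[sn] for sn in sorted(best)][:limit]

by_letter = first_per_sn(rows)                       # ORDER BY letter
by_value = first_per_sn(rows, key=lambda r: r[2])    # ORDER BY value
```

Ordering by letter keeps (4, 'beta', 23), which matches the expected result in the question. Ordering by value keeps (4, 'gamma', 5) instead.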

How to get all sums values with out each element using BigQuery?

I have a table in BigQuery. I want to compute the sum of a column's values while leaving out each row in turn, by id. As output I want to see the removed id and the sum of the other values.
WITH t as (SELECT 1 AS id, "LY" as code, 34 AS value
UNION ALL
SELECT 2, "LY", 45
UNION ALL
SELECT 3, "LY", 23
UNION ALL
SELECT 4, "LY", 5
UNION ALL
SELECT 5, "LY", 54
UNION ALL
SELECT 6, "LY", 78)
SELECT lv id, SUM(lag) sum_wo_id
FROM
(SELECT *, FIRST_VALUE(id) OVER (ORDER BY id DESC) lv, LAG(value) OVER (Order by id) lag from t)
GROUP BY lv
In the example above I can see the sum of values without id = 6. How can I modify this query to get the sums without each of the other ids as well (i.e. for 12346, 12356, 12456, 13456, 23456) and see which one was removed?
Below is for BigQuery Standard SQL
Assuming ids are distinct - you can simply use below
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM t
if applied to sample data from your question - output is
Row removed_id sum_wo_id
1 1 205
2 2 194
3 3 216
4 4 234
5 5 185
6 6 161
In case id is not unique, you can first group by id as in the example below
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM (
SELECT id, SUM(value) AS value
FROM t
GROUP BY id
)
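The trick in this answer is plain arithmetic: the sum without a given id is the grand total minus that id's value. A quick Python check on the sample data from the question (the variable names are just for illustration):

```python
# Sample data from the question: id -> value (ids assumed distinct).
values = {1: 34, 2: 45, 3: 23, 4: 5, 5: 54, 6: 78}

# SUM(value) OVER() is the grand total, repeated on every row.
total = sum(values.values())

# SUM(value) OVER() - value, keyed by the removed id.
sum_wo_id = {removed_id: total - v for removed_id, v in values.items()}
```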

How to use SUM DISTINCT when the order has the same qty of items

I'm working on a query to show me total amount of orders sent and qty of items sent in a day. Due to the lots of joins I have duplicate rows. It looks like this:
DispatchDate Order Qty
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 3 5
2019-07-02 3 5
2019-07-02 3 5
I'm using this query:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate
Obviously on this date there are 3 orders with a total of 9 items.
However, the query is returning:
3 orders and 7 items
I don't have a clue how to resolve this issue. How can I sum the quantities for each order instead of simply removing duplicates from only one column like SUM DISTINCT does?
Could do a CTE
with cte1 as (
SELECT Order AS Order
, DispatchDate
, MAX(QTY) as QTY
FROM TABLE1
GROUP BY Order
, DispatchDate
)
SELECT DispatchDate
, COUNT(Order)
, SUM(Qty)
FROM cte1
GROUP BY DispatchDate
You have major problems with your data model, if the data is stored this way. If this is the case, you need a table with one row per order.
If this is the result of a query, you can probably fix the underlying query so you are not getting duplicates.
If you need to work with the data in this format, then extract a single row for each group. I think that row_number() is quite appropriate for this purpose:
select count(*), sum(qty)
from (select t.*, row_number() over (partition by dispatchdate, corder order by corder) as seqnum
from t
) t
where seqnum = 1
Here is a db<>fiddle.
First of all, you should avoid multiplying the rows while joining, for example by using LEFT JOIN instead of JOIN. But, given where we are:
SELECT DispatchDate, sum( Qty)
FROM (
SELECT distinct DispatchDate, Order, Qty
FROM TABLE1 )T
GROUP BY DispatchDate
You have typed SUM(DISTINCT Qty), which sums the distinct values of Qty, that is 2 and 5. This is 7, isn't it?
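To make the 7 versus 9 difference concrete, here is a small Python sketch on the sample rows from the question. Sets stand in for DISTINCT, and the variable names are made up for illustration.

```python
# Sample rows from the question: (dispatch_date, order, qty), each repeated 3 times.
rows = (
    [("2019-07-02", 1, 2)] * 3
    + [("2019-07-02", 2, 2)] * 3
    + [("2019-07-02", 3, 5)] * 3
)

# SUM(DISTINCT Qty): deduplicates the qty values themselves, so {2, 5}.
sum_distinct_qty = sum({qty for _, _, qty in rows})

# SELECT DISTINCT DispatchDate, Order, Qty first, then aggregate.
unique_orders = set(rows)
order_count = len({order for _, order, _ in unique_orders})
total_qty = sum(qty for _, _, qty in unique_orders)
```

Orders 1 and 2 both have qty 2, so deduplicating on qty alone collapses them into one value. Deduplicating on the whole (date, order, qty) row keeps them apart.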
Due to the lots of joins I have duplicate rows.
IMHO, you should fix your primary data first. Probably the Qty column is a function of the unique (DispatchDate, Order) combination. Remove the duplicates in the primary data source and ensure there cannot be different Qty values for two rows with the same DispatchDate and Order. Then go back to your task and you'll find your SQL much simpler. No offense to the other answers, but they just mask the mess in the primary data source and are unclear about which Qty they choose for a duplicated (DispatchDate, Order) pair (some take the max, some the sum).
Try this:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate, Order
I think you need dispatch date and order wise sum of distinct quantity.
How about this? Check comments within the code.
(I renamed the order column to corder; order can't be used as an identifier).
SQL> WITH test (dispatchdate, corder, qty)
-- your sample data
AS (SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
--
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
--
SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 3, 5 FROM DUAL),
-- compute sum of distinct qty per BOTH dispatchdate AND corder
temp
AS ( SELECT t1.dispatchdate,
t1.corder,
SUM (DISTINCT t1.qty) qty
FROM test t1
GROUP BY t1.dispatchdate,
t1.corder
)
-- the final result is then simple
SELECT t.dispatchdate,
COUNT (*) cnt,
SUM (qty) qty
FROM temp t
GROUP BY t.dispatchdate;
DISPATCHDA CNT QTY
---------- ---------- ----------
02.07.2019 3 9
SQL>

How to Parse simple data in BigQuery

I need to do a simple parse of some data coming from a field. Example:
1/2
1/3
10/20
12/31
I simply need to Split or Parse this on "/". Is there a simple function that will allow me to do this?
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1/2' list UNION ALL
SELECT 2, '1/3' UNION ALL
SELECT 3, '10/20' UNION ALL
SELECT 4, '15/' UNION ALL
SELECT 5, '12/31'
)
SELECT id,
SPLIT(list, '/')[SAFE_OFFSET(0)] AS first_element,
SPLIT(list, '/')[SAFE_OFFSET(1)] AS second_element
FROM `project.dataset.table`
-- ORDER BY id
with result as below
Row id first_element second_element
1 1 1 2
2 2 1 3
3 3 10 20
4 4 15
5 5 12 31
Check the following SQL functions:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#split
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#regexp_extract_all
For example with SPLIT you can do
SELECT parts[SAFE_OFFSET(0)], parts[SAFE_OFFSET(1)]
FROM (SELECT SPLIT(field, "/") parts FROM UNNEST(["1/2", "10/20"]) field)
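If it helps to see the SPLIT plus SAFE_OFFSET behaviour in isolation, here is a rough Python equivalent. The helpers safe_offset and split_pair are hypothetical names; None plays the role of NULL.

```python
def safe_offset(parts, index):
    """Like array[SAFE_OFFSET(index)]: None instead of an error when out of range."""
    return parts[index] if 0 <= index < len(parts) else None

def split_pair(text, sep="/"):
    """Split on sep and return the first two elements (missing ones are None)."""
    parts = text.split(sep)
    return safe_offset(parts, 0), safe_offset(parts, 1)

pairs = [split_pair(s) for s in ["1/2", "10/20", "15/", "7"]]
```

A trailing separator ("15/") yields an empty string as the second element, while a value with no separator at all ("7") yields None for it.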

How to use distinct and sum both together in oracle?

For example my table contains the following data:
ID price
-------------
1 10
1 10
1 20
2 20
2 20
3 30
3 30
4 5
4 5
4 15
So given the example above, I want this result:
ID price
-------------
1 30
2 20
3 30
4 20
-----------
ID 100
How do I write this query in Oracle? First sum(distinct price) grouped by id, then the sum of all those prices.
I would be very careful with a data structure like this. First, check that all ids have exactly one price:
select id
from table t
group by id
having count(distinct price) > 1;
I think the safest method is to extract a particular price for each id (say the maximum) and then do the aggregation:
select sum(price)
from (select id, max(price) as price
from table t
group by id
) t;
Then, go fix your data so you don't have a repeated additive dimension. There should be a table with one row per id and price (or perhaps with duplicates but controlled by effective and end dates).
The data is messed up; you should not assume that the price is the same on all rows for a given id. You need to check that every time you use the fields, until you fix the data.
first sum(distinct price) group by id then sum(all price)
Looking at your desired output, it seems you also need the final sum(similar to ROLLUP), however, ROLLUP won't directly work in your case.
If you want to format your output in exactly the way you have posted your desired output, i.e. with a header for the last row of total sum, then you could set the PAGESIZE in SQL*Plus.
Using UNION ALL
For example,
SQL> set pagesize 7
SQL> WITH DATA AS(
SELECT ID, SUM(DISTINCT price) AS price
FROM t
GROUP BY id
)
SELECT to_char(ID) id, price FROM DATA
UNION ALL
SELECT 'ID' id, sum(price) FROM DATA
ORDER BY ID
/
ID PRICE
--- ----------
1 30
2 20
3 30
4 20
ID PRICE
--- ----------
ID 100
SQL>
So, you have an additional row in the end with the total SUM of price.
Using ROLLUP
Alternatively, you could use ROLLUP to get the total sum as follows:
SQL> set pagesize 7
SQL> WITH DATA AS
( SELECT ID, SUM(DISTINCT price) AS price FROM t GROUP BY id
)
SELECT ID, SUM(price) price
FROM DATA
GROUP BY ROLLUP(id);
ID PRICE
---------- ----------
1 30
2 20
3 30
4 20
ID PRICE
---------- ----------
100
SQL>
First do the DISTINCT and then a ROLLUP
SELECT ID, SUM(price) -- sum of the distinct prices
FROM
(
SELECT DISTINCT ID, price -- distinct prices per ID
FROM tab
) dt
GROUP BY ROLLUP(ID) -- two levels of aggregation, per ID and total sum
SELECT ID,SUM(price) as price
FROM
(SELECT ID,price
FROM TableName
GROUP BY ID,price) T
GROUP BY ID
Explanation:
The inner query will select the distinct prices for each id.
i.e.,
ID price
-------------
1 10
1 20
2 20
3 30
4 5
4 15
Then the outer query will select SUM of those prices for each id.
Final Result :
ID price
----------
1 30
2 20
3 30
4 20
Result in SQL Fiddle.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE MYTABLE ( ID, price ) AS
SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 15 FROM DUAL;
Query 1:
SELECT COALESCE( TO_CHAR(ID), 'ID' ) AS ID,
SUM( PRICE ) AS PRICE
FROM ( SELECT DISTINCT ID, PRICE FROM MYTABLE )
GROUP BY ROLLUP ( ID )
ORDER BY ID
Results:
| ID | PRICE |
|----|-------|
| 1 | 30 |
| 2 | 20 |
| 3 | 30 |
| 4 | 20 |
| ID | 100 |
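The two levels of aggregation that ROLLUP produces here (one row per ID, plus a grand total) can be sketched in plain Python as follows. This is only an illustration of the logic on the sample data; the names per_id and report are made up.

```python
# Sample rows from the question: (id, price), duplicates included.
rows = [
    (1, 10), (1, 10), (1, 20), (2, 20), (2, 20),
    (3, 30), (3, 30), (4, 5), (4, 5), (4, 15),
]

# SELECT DISTINCT ID, PRICE
distinct_rows = set(rows)

# GROUP BY ID: sum of the distinct prices per id.
per_id = {}
for id_, price in distinct_rows:
    per_id[id_] = per_id.get(id_, 0) + price

# ROLLUP adds one extra row for the grand total; label it 'ID' as in
# COALESCE(TO_CHAR(ID), 'ID').
report = [(str(i), per_id[i]) for i in sorted(per_id)]
report.append(("ID", sum(per_id.values())))
```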