How to get the sums of all values without each element using BigQuery? - google-bigquery

I have a table in BigQuery. I want to compute the sum of the values in a column while leaving out each element in turn, by id. As output, I want to see the removed id and the sum of the other values.
WITH t as (SELECT 1 AS id, "LY" as code, 34 AS value
UNION ALL
SELECT 2, "LY", 45
UNION ALL
SELECT 3, "LY", 23
UNION ALL
SELECT 4, "LY", 5
UNION ALL
SELECT 5, "LY", 54
UNION ALL
SELECT 6, "LY", 78)
SELECT lv id, SUM(lag) sum_wo_id
FROM
(SELECT *, FIRST_VALUE(id) OVER (ORDER BY id DESC) lv, LAG(value) OVER (Order by id) lag from t)
GROUP BY lv
In the example above I can see the sum of values without id = 6. How can I modify this query to get the sums without each of the other ids (that is, for the id sets 12346, 12356, 12456, 13456, 23456) and see which one was removed?

Below is for BigQuery Standard SQL.
Assuming the ids are distinct, you can simply use the query below:
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM t
Applied to the sample data from your question, the output is:
Row removed_id sum_wo_id
1 1 205
2 2 194
3 3 216
4 4 234
5 5 185
6 6 161
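The arithmetic behind `SUM(value) OVER() - value` can be checked outside BigQuery. The following is only a minimal Python sketch of the same idea, using the sample rows from the question; the variable names are illustrative and not part of the original answer.

```python
# Sample rows from the question: (id, value)
rows = [(1, 34), (2, 45), (3, 23), (4, 5), (5, 54), (6, 78)]

# Counterpart of SUM(value) OVER (): one grand total shared by every row
total = sum(value for _, value in rows)

# Counterpart of SUM(value) OVER () - value, keyed by the removed id
sum_wo_id = {row_id: total - value for row_id, value in rows}

for removed_id, remaining in sum_wo_id.items():
    print(removed_id, remaining)
```

Computing the total once and subtracting each value avoids re-summing the other rows for every id.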
If id is not unique, you can first group by id, as in the example below:
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM (
SELECT id, SUM(value) AS value
FROM t
GROUP BY id
)
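To see why grouping first matters when an id repeats, here is a small Python sketch of the same two steps. The rows are hypothetical (the question's sample data has no repeated ids), and the names are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows with a repeated id (not from the question): (id, value)
rows = [(1, 34), (1, 6), (2, 45), (3, 23)]

# Counterpart of the inner GROUP BY id: one summed value per id
per_id = defaultdict(int)
for row_id, value in rows:
    per_id[row_id] += value

# Counterpart of SUM(value) OVER () - value on the grouped rows
total = sum(per_id.values())
sum_wo_id = {row_id: total - value for row_id, value in per_id.items()}

print(sum_wo_id)
```

Without the grouping step, removing id 1 would subtract only one of its two rows from the total.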

Related

Return distinct rows based on only one column in oracle sql

I want to return n distinct rows. The distinct rows should be based on one column (SN) only.
I have the query below, which is expected to return 4 rows where the serial number is greater than 2 and no two rows share the same SN value.
Table
SN letter value
1 test 25
1 bread 26
3 alpha 43
4 beta 23
4 gamma 5
5 omega 60
6 omega 60
Expected Result
SN letter value
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
This is the query I have. It does not work correctly: it returns the duplicates because it filters distinct values by all the columns combined instead of just the single column, SN.
SELECT * FROM (SELECT a.*, row_number() over(order by SN) rowRank
FROM (SELECT distinct SN, letter, value from table where SN > 2 order by SN) a)
WHERE rowRank BETWEEN 1 AND 4
You do not need to use DISTINCT before trying to filter out your results. You can modify the ORDER BY clause of the ROW_NUMBER analytic function (aliased row_rank below) if you need to change which duplicate of an SN is returned. Right now it returns the first LETTER value alphabetically, since that matches your example result.
Query
WITH
some_table (sn, letter, VALUE)
AS
(SELECT 1, 'test', 25 FROM DUAL
UNION ALL
SELECT 1, 'bread', 26 FROM DUAL
UNION ALL
SELECT 3, 'alpha', 43 FROM DUAL
UNION ALL
SELECT 4, 'beta', 23 FROM DUAL
UNION ALL
SELECT 4, 'gamma', 5 FROM DUAL
UNION ALL
SELECT 5, 'omega', 60 FROM DUAL
UNION ALL
SELECT 6, 'omega', 60 FROM DUAL)
--Above is to set up the sample data. Use the query below with your real table
SELECT sn, letter, VALUE
FROM (SELECT sn,
letter,
VALUE,
ROW_NUMBER () OVER (PARTITION BY sn ORDER BY letter) AS row_rank
FROM some_table
WHERE sn > 2)
WHERE row_rank = 1
ORDER BY sn
FETCH FIRST 4 ROWS ONLY;
Result
SN LETTER VALUE
_____ _________ ________
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
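The "first row per SN" rule that ROW_NUMBER with PARTITION BY implements can be sketched in Python as follows. This is only an illustration of the logic on the question's sample data; the function name and its parameters are assumptions made for the example.

```python
# Sample rows from the question: (sn, letter, value)
rows = [
    (1, "test", 25), (1, "bread", 26), (3, "alpha", 43), (4, "beta", 23),
    (4, "gamma", 5), (5, "omega", 60), (6, "omega", 60),
]

def first_per_sn(rows, min_sn=2, limit=4):
    """Keep one row per sn (the alphabetically first letter), for sn > min_sn."""
    best = {}
    for sn, letter, value in rows:
        if sn <= min_sn:
            continue  # counterpart of WHERE sn > 2
        # Counterpart of row_rank = 1 with ORDER BY letter inside each sn
        if sn not in best or letter < best[sn][1]:
            best[sn] = (sn, letter, value)
    # Counterpart of ORDER BY sn FETCH FIRST 4 ROWS ONLY
    return [best[sn] for sn in sorted(best)][:limit]

result = first_per_sn(rows)
for row in result:
    print(row)
```

Changing the comparison on `letter` to a comparison on `value` corresponds to changing the ORDER BY inside the window, which changes which duplicate survives.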
SELECT * FROM
(
SELECT
t.*
,ROW_NUMBER() OVER (PARTITION BY sn ORDER BY value ) rn
FROM
t
WHERE sn > 2
) t1
WHERE t1.rn = 1
ORDER BY sn;

Assigning values as new column based on percentages from another table

I have two tables: one is a row-level table and the other is a corresponding table of variables.
Table 1
Table 2
Output Required -
The variable should be populated in a new column based on the % of the variable.
First, don't use symbols as field names; use the word percentage rather than the symbol %.
Second, your mapping table (table2) should probably have the lower and upper bounds to make things simpler later on... (You can accomplish that using window functions if you can't correct the mapping table.)
Then you can use window functions on your data to number each row within its own group.
Once done, it becomes a relatively simple join...
WITH
map AS
(
SELECT
*,
SUM(percentage) OVER (PARTITION BY State, Region ORDER BY Variable) AS upper_bound
FROM
Table2 # lower_bound is just upper_bound - percentage
),
data AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY State, Region) - 1 AS group_row_number, # -1 to make the row number start from zero
COUNT(*) OVER (PARTITION BY State, Region) AS group_size
FROM
Table1
)
SELECT
*
FROM
data
INNER JOIN
map
ON data.Region = map.Region
AND data.State = map.State
AND data.group_row_number >= data.group_size * (map.upper_bound - map.percentage)
AND data.group_row_number < data.group_size * map.upper_bound
Below is for BigQuery Standard SQL
Version 1
Non-orthodox version with use of RANGE_BUCKET function
#standardSQL
WITH buckets AS (
SELECT state, region,
ARRAY_AGG(variable ORDER BY variable) variables,
ARRAY_AGG(percentage ORDER BY variable) bins
FROM (
SELECT state, region, variable, SUM(1. * percentage) OVER(win) percentage
FROM table2
WINDOW win AS (PARTITION BY state, region ORDER BY variable)
)
GROUP BY state, region
)
SELECT user, state, region,
variables[OFFSET(
RANGE_BUCKET((ROW_NUMBER() OVER(win) - 1) / (COUNT(1) OVER(win)) * 100, bins)
)] AS variable
FROM table1
JOIN buckets USING (state, region)
WINDOW win AS (PARTITION BY state, region)
-- ORDER BY user
Applied to the sample data from your question, the output is:
Row user state region variable
1 1 ORD 1 ABC
2 2 ORD 1 ABC
3 3 ORD 1 ABC
4 4 ORD 1 XYZ
5 5 ORD 1 XYZ
6 6 ORD 1 XYZ
7 7 IAD 2 ABC
8 8 IAD 2 ABC
9 9 IAD 2 ABC
10 10 IAD 2 ABC
11 11 IAD 2 AED
12 12 IAD 2 AED
13 13 IAD 2 XYZ
14 14 IAD 2 XYZ
Version 2
Below is a more traditional version (with the same output as the first version above)
#standardSQL
WITH buckets AS (
SELECT *, SUM(percentage) OVER(PARTITION BY state, region ORDER BY variable) AS bin
FROM table2
), table1_with_stats AS (
SELECT *,
ROW_NUMBER() OVER(win) - 1 AS position,
COUNT(*) OVER(win) AS size
FROM table1
WINDOW win AS (PARTITION BY state, region)
)
SELECT user, state, region, variable
FROM table1_with_stats
INNER JOIN buckets
USING (state, region)
WHERE position BETWEEN size * (bin - percentage) / 100
AND size * bin /100 - 1
-- ORDER BY user
Test Data
You can test and play with the queries above using the CTE below:
WITH table1 AS (
SELECT 1 user, 'ORD' state, 1 region UNION ALL
SELECT 2, 'ORD', 1 UNION ALL
SELECT 3, 'ORD', 1 UNION ALL
SELECT 4, 'ORD', 1 UNION ALL
SELECT 5, 'ORD', 1 UNION ALL
SELECT 6, 'ORD', 1 UNION ALL
SELECT 7, 'IAD', 2 UNION ALL
SELECT 8, 'IAD', 2 UNION ALL
SELECT 9, 'IAD', 2 UNION ALL
SELECT 10, 'IAD', 2 UNION ALL
SELECT 11, 'IAD', 2 UNION ALL
SELECT 12, 'IAD', 2 UNION ALL
SELECT 13, 'IAD', 2 UNION ALL
SELECT 14, 'IAD', 2
), table2 AS (
SELECT 'ORD' state, 1 region, 'ABC' variable, 50 percentage UNION ALL
SELECT 'ORD', 1, 'XYZ', 50 UNION ALL
SELECT 'IAD', 2, 'ABC', 50 UNION ALL
SELECT 'IAD', 2, 'XYZ', 25 UNION ALL
SELECT 'IAD', 2, 'AED', 25
)
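The bucket assignment used in both versions (running totals of the percentages, then a lookup of each row's relative position) can be sketched in Python. This is a hedged illustration, not the answer's implementation: `assign_variables` is a hypothetical helper, and `bisect_right` is used here as a stand-in for RANGE_BUCKET on the assumption that RANGE_BUCKET returns the number of array elements less than or equal to the point. It also assumes the percentages of each group sum to 100.

```python
from bisect import bisect_right
from itertools import accumulate

def assign_variables(n_rows, mapping):
    """mapping: list of (variable, percentage) for one (state, region) group."""
    mapping = sorted(mapping)                        # ORDER BY variable
    variables = [v for v, _ in mapping]
    bins = list(accumulate(p for _, p in mapping))   # running SUM(percentage)
    assigned = []
    for position in range(n_rows):                   # ROW_NUMBER() - 1
        point = position / n_rows * 100              # relative position in percent
        # bisect_right counts the bins that are <= point
        assigned.append(variables[bisect_right(bins, point)])
    return assigned

# Group sizes and percentages taken from the test data above
ord_result = assign_variables(6, [("ABC", 50), ("XYZ", 50)])
iad_result = assign_variables(8, [("ABC", 50), ("XYZ", 25), ("AED", 25)])
print(ord_result)
print(iad_result)
```

Because the position starts at zero, the point never reaches 100, so the lookup index always stays inside the list of variables.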

How to use SUM DISTINCT when the order has the same qty of items

I'm working on a query to show me total amount of orders sent and qty of items sent in a day. Due to the lots of joins I have duplicate rows. It looks like this:
DispatchDate Order Qty
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 3 5
2019-07-02 3 5
2019-07-02 3 5
I'm using this query:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate
Obviously, on this date there are 3 orders with a total of 9 items.
However, the query is returning:
3 orders and 7 items
I don't have a clue how to resolve this issue. How can I sum the quantities for each order instead of simply removing duplicates from only one column, as SUM DISTINCT does?
You could use a CTE:
with cte1 as (
SELECT Order AS Order
, DispatchDate
, MAX(QTY) as QTY
FROM TABLE1
GROUP BY Order
, DispatchDate
)
SELECT DispatchDate
, COUNT(Order)
, SUM(Qty)
FROM cte1
GROUP BY DispatchDate
You have major problems with your data model, if the data is stored this way. If this is the case, you need a table with one row per order.
If this is the result of a query, you can probably fix the underlying query so you are not getting duplicates.
If you need to work with the data in this format, then extract a single row for each group. I think that row_number() is quite appropriate for this purpose:
select count(*), sum(qty)
from (select t.*, row_number() over (partition by dispatchdate, corder order by corder) as seqnum
from t
) t
where seqnum = 1
First, you should avoid multiplying the rows when joining, for example by using LEFT JOIN instead of JOIN. But given where we are:
SELECT DispatchDate, sum( Qty)
FROM (
SELECT distinct DispatchDate, Order, Qty
FROM TABLE1 )T
GROUP BY DispatchDate
You typed SUM(DISTINCT Qty), which sums the distinct values of Qty, that is, 2 and 5. That gives 7, doesn't it?
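The difference between the two approaches can be reproduced on the question's sample rows with a short Python sketch. It only illustrates the arithmetic; the variable names are made up for the example.

```python
# Sample rows from the question: (DispatchDate, Order, Qty), each order repeated 3 times
rows = (
    [("2019-07-02", 1, 2)] * 3
    + [("2019-07-02", 2, 2)] * 3
    + [("2019-07-02", 3, 5)] * 3
)

# SUM(DISTINCT Qty): only the Qty values are de-duplicated, leaving {2, 5}
sum_distinct_qty = sum({qty for _, _, qty in rows})

# SELECT DISTINCT DispatchDate, Order, Qty first, then aggregate
unique_rows = set(rows)
order_count = len({order for _, order, _ in unique_rows})
total_qty = sum(qty for _, _, qty in unique_rows)

print(sum_distinct_qty, order_count, total_qty)
```

Orders 1 and 2 both have a quantity of 2, so de-duplicating the quantity column alone collapses them into a single value.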
"Due to the lots of joins I have duplicate rows."
IMHO, you should fix your primary data first. The Qty column is probably a function of the unique (DispatchDate, Order) tuple. Delete the duplicates in the primary data source and ensure there cannot be different Qty values for two rows with the same DispatchDate and Order. Then go back to your task and you'll find your SQL much simpler. No offense to the other answers, but they just mask the mess in the primary data source and are unclear about which Qty to choose for a duplicate (DispatchDate, Order) pair (some take the max, some the sum).
Try this:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate, Order
I think you need the sum of distinct quantities per dispatch date and order.
How about this? Check comments within the code.
(I renamed the order column to corder; order can't be used as an identifier).
WITH test (dispatchdate, corder, qty)
-- your sample data
AS (SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
--
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
--
SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
SELECT DATE '2019-07-02', 3, 5 FROM DUAL),
-- compute sum of distinct qty per BOTH dispatchdate AND corder
temp
AS ( SELECT t1.dispatchdate,
t1.corder,
SUM (DISTINCT t1.qty) qty
FROM test t1
GROUP BY t1.dispatchdate,
t1.corder
)
-- the final result is then simple
SELECT t.dispatchdate,
COUNT (*) cnt,
SUM (qty) qty
FROM temp t
GROUP BY t.dispatchdate;
Result
DISPATCHDA CNT QTY
---------- ---------- ----------
02.07.2019 3 9

Split a column into multiple columns

select distinct account_num from account order by account_num;
The above query gave the below result
account_num
1
2
4
7
12
18
24
37
45
59
I want to split the account_num column into tuples of three account_num values, like (1,2,4); (7,12,18); (24,37,45); (59). The last tuple has only one entry because there are no more account_num values left. I then want a query to output the min and max of each tuple. (Note that the max of one tuple is less than the min of the next tuple.) The desired output is shown below:
1 4
7 18
24 45
59 59
Edit: I have explained my requirement in the best way I could
You can use the example below as a starting point; it is based only on the information you have provided so far. For further details, you can consult Oracle's analytic functions documentation:
with src as( --create a source data
select 1 col from dual union
select 2 from dual union
select 4 from dual union
select 7 from dual union
select 12 from dual union
select 18 from dual union
select 24 from dual union
select 37 from dual union
select 45 from dual union
select 59 from dual
)
select
col,
decode(col_2, 0, max_col, col_2) col_2 -- when no row exists two ahead (default 0), fall back to the overall maximum
from (
select
col,
lead(col, 2, 0) over (order by col) col_2, -- value two rows ahead, or 0 if there is none
max(col) over () max_col, -- overall maximum, used for the last group in the result
rownum rn from src -- row number, used to pick every third row
) where mod(rn - 1, 3) = 0 -- keep only the first row of each group of three
This is another solution.
SELECT *
FROM (SELECT DISTINCT MIN(val) over(PARTITION BY gr) min_,
MAX(val) over(PARTITION BY gr) max_
FROM (SELECT val,
decode(trunc(rn / 3), rn / 3, rn / 3, ceil(rn / 3)) gr
FROM (SELECT val,
row_number() over(ORDER BY val) rn
FROM (select distinct account_num val from account order by account_num)))) ORDER BY min_
UPDATED
Solution without analytic function.
SELECT MIN(val) min_,
MAX(val) max_
FROM (SELECT val,
ceil(rn / 3) gr
FROM (SELECT val,
rownum rn
FROM A_DEL_ME)) GROUP BY gr
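The grouping rule shared by these solutions (number the sorted values, put every three consecutive values into one group, then take the min and max per group) can be sketched in Python. This is only an illustration on the question's data; `min_max_per_chunk` is a hypothetical helper name.

```python
# Distinct account numbers from the question
account_nums = [1, 2, 4, 7, 12, 18, 24, 37, 45, 59]

def min_max_per_chunk(values, size=3):
    """Split the sorted distinct values into chunks of `size`; return (min, max) per chunk."""
    ordered = sorted(set(values))                    # SELECT DISTINCT ... ORDER BY
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Each chunk is sorted, so its first and last items are its min and max
    return [(chunk[0], chunk[-1]) for chunk in chunks]

ranges = min_max_per_chunk(account_nums)
for low, high in ranges:
    print(low, high)
```

Slicing by index plays the role of `ceil(rn / 3)` in the SQL: positions 1 to 3 fall in the first group, 4 to 6 in the second, and so on.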
Please add more information on what you want to do. What is the connection between account_number 1 and number 4, 7 and 18? Is there any? If not, why would you want to split this into two columns and what is the rule for splitting it?
With what you have posted, you could do something like this:
select 1 as account_num, 4 as account_num1 from dual
union all select 7 as account_num, 18 as account_num1 from dual
...
and so on, but I don't see the use for this.

In Oracle, how do I get a page of distinct values from sorted results?

I have 2 columns in a one-to-many relationship. I want to sort on the "many" and return the first occurrence of the "one". I need to page through the data so, for example, I need to be able to get the 3rd group of 10 unique "one" values.
I have a query like this:
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id;
There can be multiple rows in table2 for each row in table1.
The results of my query look like this:
id | name
----------------
2 | apple
23 | banana
77 | cranberry
23 | dark chocolate
8 | egg
2 | yak
19 | zebra
I need to page through the result set with each page containing n unique ids. For example, if start=1 and n=4 I want to get back
2
23
77
8
in the order they were sorted on (i.e., name), where id is returned in the position of its first occurrence. Likewise if start=3 and n=4 and order = desc I want
8
23
77
2
I tried this:
SELECT * FROM (
SELECT id, ROWNUM rnum FROM (
SELECT DISTINCT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
which gave me the ids in numerical order, instead of being ordered as the names would be.
I also tried:
SELECT * FROM (
SELECT DISTINCT id, ROWNUM rnum FROM (
SELECT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
but that gave me duplicate values.
How can I page through the results of this data? I just need the ids, nothing from the "many" table.
update
I suppose I'm getting closer with changing my inner query to
SELECT id, name, rank() over (order by name, id)
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
...but I'm still getting duplicate ids.
You may need to debug it a little, but it will be something like this. Note that the ROWNUM filter as written only works for the first page, because a condition such as rownum >= 2 never matches any row:
SELECT id FROM (
SELECT * FROM (
SELECT id, name, rn FROM (
SELECT id, name, row_number() over (partition by id order by name) rn
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
)
) WHERE rn=1 ORDER BY name, id
) WHERE rownum>=1 and rownum<=4;
It's a bit convoluted (and I suspect it could be simplified), but it should work. You can put whatever start and end positions you want in the WHERE clause. Here start=2 and n=4 are pulled from a separate subquery (x), but you could simplify things by using a couple of parameters instead.
with t as (
select 2 id, 'apple' name from dual union all
select 23, 'banana' from dual union all
select 77, 'cranberry' from dual union all
select 23, 'dark chocolate' from dual union all
select 8, 'egg' from dual union all
select 2, 'yak' from dual union all
select 19, 'zebra' from dual
),
x as (
select 2 start_pos, 4 n from dual
)
select *
from (
select distinct
id,
dense_rank() over (order by min_id_rnk) outer_rnk
from (
select id,
min(rnk) over (partition by id) min_id_rnk
from (
select id,
name,
rank() over (order by name) rnk
from t
)
)
)
where outer_rnk between (select start_pos from x) and (select start_pos+n-1 from x)
order by outer_rnk;
Result
ID OUTER_RNK
---------- ----------
23 2
77 3
8 4
19 5
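The paging rule itself (order the ids by the first name at which each one appears, then take a window of n ids) can be sketched in Python on the question's sample rows. This is only an illustration of the logic, not a replacement for the query; `page_of_ids` and its parameters are names made up for the example.

```python
# Sample join result from the question: (id, name)
rows = [
    (2, "apple"), (23, "banana"), (77, "cranberry"), (23, "dark chocolate"),
    (8, "egg"), (2, "yak"), (19, "zebra"),
]

def page_of_ids(rows, start_pos, n):
    """Return n distinct ids starting at start_pos (1-based), ordered by first occurrence by name."""
    seen = set()
    ordered_ids = []
    for row_id, _name in sorted(rows, key=lambda r: r[1]):
        if row_id not in seen:           # keep only the first occurrence of each id
            seen.add(row_id)
            ordered_ids.append(row_id)
    return ordered_ids[start_pos - 1:start_pos - 1 + n]

page = page_of_ids(rows, 2, 4)
print(page)
```

With start_pos=1 and n=4 the same function returns the ids 2, 23, 77, 8 that the question asks for on the first page.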