I am trying to get the percentage of rows that a set of particular values accounts for. It's best explained by example. I can do this for a single column very simply using the RATIO_TO_REPORT function with OVER(), but I am having issues with multiple groupings.
Assume the table has 2 columns:
columna    columnb
1000       some data
1100       some data
2000       some data
1400       some data
1500       some data
With the following query, I can see that, for this domain, each value is 20% of the total rows:
select columna, count(*), trunc(ratio_to_report(count(columna)) over() * 100, 2) as perc
from table
group by columna
order by perc desc;
However, what I need is, for example, to determine the percentage and count of the rows that contain 1000, 1400 or 2000. From looking at it, you can tell it's 60%, but I need a query to return that. This needs to be efficient, as the query will run against millions of rows. Like I said before, I have this working for a single value and its percentage; it's the multiple values that are throwing me.
Seems like I need to put an IN clause somewhere, but the values will not be these specific values each time. I will need to get the values for the "IN" part from another table, if that makes sense. I guess I need some kind of multiple grouping.
Potentially, you're looking for something like
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select 1000 a from dual
3 union all
4 select 1100 from dual
5 union all
6 select 1400 from dual
7 union all
8 select 1500 from dual
9 union all
10 select 2000 from dual
11 )
12 select (case when a in (1000,1400,2000)
13 then 1
14 else 0
15 end) bucket,
16 count(*),
17 ratio_to_report(count(*)) over ()
18 from x
19 group by (case when a in (1000,1400,2000)
20 then 1
21 else 0
22* end)
SQL> /
BUCKET COUNT(*) RATIO_TO_REPORT(COUNT(*))OVER()
---------- ---------- -------------------------------
1 3 .6
0 2 .4
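Since the values for the IN list will come from another table, the hard-coded list can be replaced with a subquery. Here is a minimal sketch, assuming the lookup table is called lookup_vals with a single column a (both names hypothetical); the bucket is computed in an inline view because Oracle does not allow subqueries in the GROUP BY clause:
select bucket,
       count(*) as cnt,
       trunc(ratio_to_report(count(*)) over () * 100, 2) as perc
from (select case
               when exists (select 1
                              from lookup_vals l
                             where l.a = t.columna)
                 then 1
                 else 0
             end as bucket
        from your_table t)
group by bucket
order by perc desc;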
I'm not sure I entirely understand the requirement, but do you need ratio_to_report at all? Have a look at the following, and let me know how close this is to what you want, and we can work from there!
T1 is the table containing your sample data
create table t1(a primary key) as
select 1000 as a from dual union all
select 1100 as a from dual union all
select 1400 as a from dual union all
select 1500 as a from dual union all
select 2000 as a from dual;
T2 is the lookup table you mentioned (where you get the list of IDs)
create table t2(a primary key) as
select 1000 as a from dual union all
select 1400 as a from dual union all
select 2000 as a from dual;
A left join from T1 to T2 will return all rows in T1, paired with all matching rows in T2. For each A in T1 that does not exist in your set (T2), the T2 columns will be padded with NULL. We can exploit the fact that COUNT() doesn't count (hehe) NULLs.
select count(t1.a) as num_rows
,count(t2.a) as in_set
,count(t2.a) / count(t1.a) as shr_in_set
from t1
left
join t2 on(t1.a = t2.a);
The result of running the query is:
NUM_ROWS IN_SET SHR_IN_SET
---------- ---------- ----------
5 3 .6
I used the DISTINCT keyword on one column and it worked very well, but when I add the second column to the SELECT query it doesn't work for me, as both columns have duplicate values. I want it to not show me the duplicate values in either column. Is there a proper SELECT query for that?
The sample data is:
For Col001:
555
555
7878
7878
89
Col002:
43
43
56
56
56
67
67
67
79
79
79
I want these data in this format:
Col001:
555
7878
89
Col002:
43
56
67
79
I tried the following query:
Select distinct col001, col002 from tbl1
Use a set operator. UNION will give you the set of unique values from two subqueries.
select col001 as unq_col_val
from your_table
union
select col002
from your_table;
This presumes you're not fussed whether the value comes from COL001 or COL002. If you are fussed, this variant preserves that information:
select 'COL001' as source_col
,col001 as unq_col_val
from your_table
union
select 'COL002' as source_col
,col002
from your_table;
Note that this result set will contain more rows than the first if the same value exists in both columns: a value of 43 present in both would appear once as ('COL001', 43) and once as ('COL002', 43).
DISTINCT works across the entire row, considering all values in the row, and will only remove rows where the entire row is duplicated.
For example, given the sample data:
CREATE TABLE table_name (col001, col002) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 1, 2 FROM DUAL UNION ALL
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 1 FROM DUAL UNION ALL
SELECT 2, 2 FROM DUAL UNION ALL
--
SELECT 1, 2 FROM DUAL UNION ALL -- These are duplicates
SELECT 2, 2 FROM DUAL;
Then:
SELECT DISTINCT
col001,
col002
FROM table_name
Outputs:
COL001     COL002
     1          1
     1          2
     1          3
     2          1
     2          2
And the duplicates have been removed.
If you want to only display distinct values for each column then you need to consider each column separately and can use something like:
SELECT c1.col001,
c2.col002
FROM ( SELECT DISTINCT
col001,
DENSE_RANK() OVER (ORDER BY col001) AS rnk
FROM table_name
) c1
FULL OUTER JOIN
( SELECT DISTINCT
col002,
DENSE_RANK() OVER (ORDER BY col002) AS rnk
FROM table_name
) c2
ON (c1.rnk = c2.rnk)
Which outputs:
COL001     COL002
     1          1
     2          2
  null          3
db<>fiddle here
How can I group by and count the values separated by double quotes between the brackets? I have 400K rows, so I'm also concerned about performance.
["853","1800"]
["852","1500"]
["833","1800"]
["857","1820"]
["23468","3184"]
.....
Desired output:
Value Count
23468 1212
09692 987
... ...
Do you mean something like this? (The with clause is only for testing - remove it, and use your actual table and column names in the main query.)
with
sample_data (j_arr) as (
select '["853","1800"]' from dual union all
select '["852","1500"]' from dual union all
select '["833","1800"]' from dual union all
select '["857","1820"]' from dual union all
select '["23468","3184"]' from dual union all
select '["013", "013", "013"]' from dual
)
select str, count(*) as ct
from sample_data cross apply json_table(j_arr, '$[*]' columns (str varchar2(20) path '$'))
group by str
order by ct desc, str -- or whatever you need
;
STR CT
-------- ---
013 3
1800 2
1500 1
1820 1
23468 1
3184 1
833 1
852 1
853 1
857 1
I am sorry, but I have no clue what "register" means in this context. If you mean that you have 400K rows, I can't see how performance would be an issue. A quick test on my system (with 402K rows) took about 0.33 seconds.
I'm working on a query to show the total number of orders sent and the quantity of items sent in a day. Due to the many joins, I have duplicate rows. It looks like this:
DispatchDate Order Qty
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 3 5
2019-07-02 3 5
2019-07-02 3 5
I'm using this query:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate
Obviously, on this date there are 3 orders with a total item count of 9.
However, the query is returning:
3 orders and 7 items
I don't have a clue how to resolve this issue. How can I sum the quantities for each order, instead of simply removing duplicates from only one column like SUM(DISTINCT) does?
You could do it with a CTE:
with cte1 as (
    SELECT Order
         , DispatchDate
         , MAX(QTY) as QTY
    FROM TABLE1
    GROUP BY Order
           , DispatchDate
)
SELECT DispatchDate
     , COUNT(Order)
     , SUM(QTY)
FROM cte1
GROUP BY DispatchDate
You have major problems with your data model, if the data is stored this way. If this is the case, you need a table with one row per order.
If this is the result of a query, you can probably fix the underlying query so you are not getting duplicates.
If you need to work with the data in this format, then extract a single row for each group. I think that row_number() is quite appropriate for this purpose:
select count(*), sum(qty)
from (select t.*, row_number() over (partition by dispatchdate, corder order by corder) as seqnum
from t
) t
where seqnum = 1
Here is a db<>fiddle.
First, you should avoid multiplying the rows while joining, for example by using LEFT JOIN instead of JOIN where that fits. But, since we are where we are:
SELECT DispatchDate, sum( Qty)
FROM (
SELECT distinct DispatchDate, Order, Qty
FROM TABLE1 )T
GROUP BY DispatchDate
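If you also need the order count per day (the question asks for both), the same deduplicated inline view supports both aggregates. A sketch, using corder for the order column since ORDER itself is a reserved word (as noted in the last answer below):
SELECT DispatchDate
     , COUNT(*) AS orders  -- one row per (date, order, qty) after DISTINCT
     , SUM(Qty) AS items
FROM (
    SELECT DISTINCT DispatchDate, corder, Qty
    FROM TABLE1
) t
GROUP BY DispatchDate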
You typed SUM(DISTINCT Qty), which sums up the distinct values of Qty, that is 2 and 5. This is 7, isn't it?
Due to the many joins, I have duplicate rows.
IMHO, you should fix your primary data first. Probably the Qty column is a function of the unique (DispatchDate, Order) combination. Delete the duplicates in the primary data source (see the sketch below) and ensure there cannot be different Qty values for two rows with the same (DispatchDate, Order). Then go back to your task and you'll find your SQL much simpler. No offense to the other answers, but they just mask the mess in the primary data source and are unclear about choosing Qty for duplicate (DispatchDate, Order) pairs (some take the max, some sum).
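A minimal sketch of that cleanup, assuming the duplicated data lives in a table called orders_raw with the order column named corder (all names hypothetical): keep one row per (DispatchDate, corder) pair, then make the duplicates impossible going forward.
-- Keep the first ROWID per (dispatchdate, corder); delete the rest.
delete from orders_raw o
 where o.rowid not in (select min(r.rowid)
                         from orders_raw r
                        group by r.dispatchdate, r.corder);

-- Ensure two rows with the same pair can never coexist again.
alter table orders_raw
  add constraint orders_raw_uq unique (dispatchdate, corder);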
Try this:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate, Order
I think you need the sum of distinct quantities per dispatch date and order.
How about this? Check comments within the code.
(I renamed the order column to corder; order can't be used as an identifier).
SQL> WITH test (dispatchdate, corder, qty)
2 -- your sample data
3 AS (SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
4 SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
5 SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
6 --
7 SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
8 SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
9 SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
10 --
11 SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
12 SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
13 SELECT DATE '2019-07-02', 3, 5 FROM DUAL),
14 -- compute sum of distinct qty per BOTH dispatchdate AND corder
15 temp
16 AS ( SELECT t1.dispatchdate,
17 t1.corder,
18 SUM (DISTINCT t1.qty) qty
19 FROM test t1
20 GROUP BY t1.dispatchdate,
21 t1.corder
22 )
23 -- the final result is then simple
24 SELECT t.dispatchdate,
25 COUNT (*) cnt,
26 SUM (qty) qty
27 FROM temp t
28 GROUP BY t.dispatchdate;
DISPATCHDA CNT QTY
---------- ---------- ----------
02.07.2019 3 9
SQL>
I have the below table:
MOBILE AMOUNT
-----------------
M1 10
M1 20
M1 30
M2 40
M2 10
M3 30
I want to find the count of distinct mobiles having a total amount greater than 40.
So I have written a query with an inner query:
select count(mobile)
from
(
select mobile,sum(amount)
from TAB
group by mobile
having sum(amount) >40
)
Is there a way to write this as a plain query, i.e. without the inner query?
Output needed (as only M1 and M2 have sum(amount) > 40):
CNT
---
2
Maybe something like this?
SQL> with test (mobile, amount) as
2 (select 'm1', 10 from dual union
3 select 'm1', 20 from dual union
4 select 'm1', 30 from dual union
5 select 'm2', 40 from dual union
6 select 'm2', 10 from dual union
7 select 'm3', 30 from dual
8 )
9 select sum(count(distinct mobile)) cnt
10 from test
11 group by mobile
12 having sum(amount) > 40;
CNT
----------
2
SQL>
The nested query you provided in your example is the correct one. You are asking for an aggregation at a higher level than the SUM(amount): you are asking for the number of resulting groups.
In your comment, you mentioned that the main concern is that the structure of the SQL statement changes when you include aggregation. That's how the SQL language handles different query semantics.
Just changing WHERE clauses will only allow for swapping filter conditions. If you want to filter on aggregated characteristics, you have to take the multi-level approach, as annotated below.
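To make the two levels explicit, here is the query from the question again, with the levels annotated (same TAB table as above):
select count(*) cnt       -- level 2: count the surviving groups
from
(
select mobile             -- level 1: one group per mobile,
from TAB                  --          kept only when its total
group by mobile           --          amount exceeds 40
having sum(amount) > 40
)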
For example, my table contains the following data:
ID price
-------------
1 10
1 10
1 20
2 20
2 20
3 30
3 30
4 5
4 5
4 15
So given the example above, this is the desired output:
ID price
-------------
1 30
2 20
3 30
4 20
-----------
ID 100
How do I write this query in Oracle? First sum(distinct price) grouped by id, then sum all the prices.
I would be very careful with a data structure like this. First, check that all ids have exactly one price:
select id
from your_table t
group by id
having count(distinct price) > 1;
I think the safest method is to extract a particular price for each id (say the maximum) and then do the aggregation:
select sum(price)
from (select id, max(price) as price
from your_table t
group by id
) t;
Then, go fix your data so you don't have a repeated additive dimension. There should be a table with one row per id and price (or perhaps with duplicates, but controlled by effective and end dates). See the sketch below.
The data is messed up; you should not assume that the price is the same on all rows for a given id. You need to check that every time you use these fields, until you fix the data.
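A minimal sketch of that fix, keeping max(price) as the tie-breaker discussed above (the names id_price and your_table are hypothetical):
-- One row per id, enforced by the primary key from now on.
create table id_price (
  id    number primary key,
  price number not null
);

insert into id_price (id, price)
select id, max(price)   -- one price per id; max() as the tie-breaker
from your_table
group by id;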
First sum(distinct price) grouped by id, then sum all the prices.
Looking at your desired output, it seems you also need the final sum (similar to ROLLUP); however, ROLLUP won't directly work in your case.
If you want to format your output exactly the way you posted your desired output, i.e. with a header before the last row for the total sum, then you could set the PAGESIZE in SQL*Plus.
Using UNION ALL
For example,
SQL> set pagesize 7
SQL> WITH DATA AS(
2 SELECT ID, SUM(DISTINCT price) AS price
3 FROM t
4 GROUP BY id
5 )
6 SELECT to_char(ID) id, price FROM DATA
7 UNION ALL
8 SELECT 'ID' id, sum(price) FROM DATA
9 ORDER BY ID
10 /
ID PRICE
--- ----------
1 30
2 20
3 30
4 20
ID PRICE
--- ----------
ID 100
SQL>
So, you have an additional row at the end with the total SUM of price.
Using ROLLUP
Alternatively, you could use ROLLUP to get the total sum as follows:
SQL> set pagesize 7
SQL> WITH DATA AS
2 ( SELECT ID, SUM(DISTINCT price) AS price FROM t GROUP BY id
3 )
4 SELECT ID, SUM(price) price
5 FROM DATA
6 GROUP BY ROLLUP(id);
ID PRICE
---------- ----------
1 30
2 20
3 30
4 20
ID PRICE
---------- ----------
100
SQL>
First do the DISTINCT and then a ROLLUP
SELECT ID, SUM(price) -- sum of the distinct prices
FROM
(
SELECT DISTINCT ID, price -- distinct prices per ID
FROM tab
) dt
GROUP BY ROLLUP(ID) -- two levels of aggregation, per ID and total sum
SELECT ID,SUM(price) as price
FROM
(SELECT ID,price
FROM TableName
GROUP BY ID, price) T
GROUP BY ID
Explanation:
The inner query selects the distinct prices for each id, i.e.:
ID price
-------------
1 10
1 20
2 20
3 30
4 5
4 15
Then the outer query selects the SUM of those prices for each id.
Final Result :
ID price
----------
1 30
2 20
3 30
4 20
Result in SQL Fiddle.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE MYTABLE ( ID, price ) AS
SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 10 FROM DUAL
UNION ALL SELECT 1, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 2, 20 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 3, 30 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 5 FROM DUAL
UNION ALL SELECT 4, 15 FROM DUAL;
Query 1:
SELECT COALESCE( TO_CHAR(ID), 'ID' ) AS ID,
SUM( PRICE ) AS PRICE
FROM ( SELECT DISTINCT ID, PRICE FROM MYTABLE )
GROUP BY ROLLUP ( ID )
ORDER BY ID
Results:
| ID | PRICE |
|----|-------|
| 1 | 30 |
| 2 | 20 |
| 3 | 30 |
| 4 | 20 |
| ID | 100 |