Using two aggregate functions with HAVING clause - sql

I have the table below:
MOBILE AMOUNT
-----------------
M1 10
M1 20
M1 30
M2 40
M2 10
M3 30
I want to find the count of distinct mobiles whose total amount is greater than 40.
So I have written a query with an inner query:
select count(mobile)
from
(
select mobile,sum(amount)
from TAB
group by mobile
having sum(amount) >40
)
Is there a way to write this as a plain query, i.e. without the inner query?
Output needed (as only M1 and M2 have sum(amount) > 40):
CNT
---
2

Maybe something like this?
SQL> with test (mobile, amount) as
       (select 'm1', 10 from dual union
        select 'm1', 20 from dual union
        select 'm1', 30 from dual union
        select 'm2', 40 from dual union
        select 'm2', 10 from dual union
        select 'm3', 30 from dual
       )
     select sum(count(distinct mobile)) cnt
     from test
     group by mobile
     having sum(amount) > 40;
CNT
----------
2
SQL>

The nested query you provided in your example is the correct one. You are asking for an aggregation on a higher level than the SUM(amount); you are asking for the number of resulting groups.
In your comment, you mentioned that your main concern is that the structure of the SQL statement changes when you include aggregation. That is simply how the SQL language expresses different query semantics.
Changing only the WHERE clause lets you swap filter conditions, nothing more.
If you want to filter on aggregated characteristics, then you need the multi-level approach in SQL.
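As a runnable sketch of that two-level structure, the snippet below uses Python's built-in sqlite3 module as a stand-in for Oracle. That substitution is an assumption made only so the example is self-contained; the table name and data come from the question. The inner query produces one row per qualifying mobile, and the outer query counts those rows.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table TAB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (mobile TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("M1", 10), ("M1", 20), ("M1", 30), ("M2", 40), ("M2", 10), ("M3", 30)],
)

# Level 1: one row per mobile whose total exceeds 40.
inner_sql = """
    SELECT mobile, SUM(amount) AS total
    FROM tab
    GROUP BY mobile
    HAVING SUM(amount) > 40
"""
qualifying = sorted(conn.execute(inner_sql).fetchall())

# Level 2: count the groups that survived the HAVING filter.
cnt = conn.execute(f"SELECT COUNT(*) FROM ({inner_sql}) AS g").fetchone()[0]

print(qualifying)  # [('M1', 60), ('M2', 50)]
print(cnt)         # 2
```

The second aggregation works on the result of the first, which is why the statement needs two levels rather than an extra WHERE condition.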

Related

Select greatest n per group using EXISTS

I have a RoadInsp table in Oracle 18c. I've put the data in a CTE for the purpose of this question:
with roadinsp (objectid, asset_id, date_) as (
select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
)
select * from roadinsp
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
1 1 2016-04-01
2 1 2019-03-01
3 1 2022-01-01 --select this row
4 2 2016-04-01
5 2 2021-01-01 --select this row
6 3 2022-03-01 --select this row
7 3 2016-04-01
8 3 2018-03-01
9 3 2013-03-01
10 3 2010-06-01
I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
I want to select the greatest n per group using the WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.
As an experiment, I want to make the selection specifically using the EXISTS operator.
The reason being: While this post technically pertains to Oracle (since that's what S.O. community members would have access to), in practice, I want to use the logic in a proprietary database called a file geodatabase. The file geodatabase has very limited SQL support; a small subset of SQL-92 syntax. But it does seem to support EXISTS and subqueries, although not correlated subqueries, joins, or any modern SQL syntax. Very frustrating.
SQL reference for query expressions used in ArcGIS
Subquery support in file geodatabases is limited to the following:
Scalar subqueries with comparison operators. A scalar subquery returns a single value, for example:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
For file geodatabases, the set functions AVG, COUNT, MIN, MAX, and
SUM can only be used in scalar subqueries.
EXISTS predicate, for example:
EXISTS (SELECT * FROM indep_countries WHERE COUNTRY_NAME = 'Mexico')
Question:
Using the EXISTS operator, is there a way to select the greatest n per group? (keeping in mind the limitations mentioned above)
Edit:
If an asset has multiple rows with the same top date, then only one of those rows should be selected.
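The RANK analytic function does the job, if it is available to you (Oracle 18c certainly supports it). Note that RANK returns every row tied for the latest date; ROW_NUMBER would return exactly one row per asset.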
Sample data:
SQL> with roadinsp (objectid, asset_id, date_) as (
       select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
       select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
       select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
       select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
       select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
       select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
       select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
       select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
       select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
       select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
     ),
Query begins here: first rank rows per asset_id by date in descending order:
     temp as
       (select r.*,
               rank() over (partition by asset_id order by date_ desc) rnk
        from roadinsp r
       )
Finally, fetch rows that rank as the highest:
     select *
     from temp
     where rnk = 1;
OBJECTID ASSET_ID DATE_ RNK
---------- ---------- ---------- ----------
3 1 2022-01-01 1
5 2 2021-01-01 1
6 3 2022-03-01 1
SQL>
If you can't use that, how about a subquery?
<snip>
select r.objectid, r.asset_id, r.date_
from roadinsp r
where (r.asset_id, r.date_) in (select t.asset_id, t.max_date
                                from (select a.asset_id, max(a.date_) max_date
                                      from roadinsp a
                                      group by a.asset_id
                                     ) t
                               );
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
6 3 2022-03-01
5 2 2021-01-01
3 1 2022-01-01
SQL>
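Since the question asks specifically about EXISTS and about keeping a single row when dates tie, here is a hedged sketch of the NOT EXISTS form of greatest-n-per-group. It relies on a correlated subquery, which the question says the file geodatabase does not support, so it only illustrates the logic. Python's sqlite3 stands in for Oracle (an assumption), and row 11 is hypothetical data added to create a tie with row 6; the tie-break on the lowest OBJECTID is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roadinsp (objectid INTEGER, asset_id INTEGER, date_ TEXT)"
)
conn.executemany(
    "INSERT INTO roadinsp VALUES (?, ?, ?)",
    [
        (1, 1, "2016-04-01"), (2, 1, "2019-03-01"), (3, 1, "2022-01-01"),
        (4, 2, "2016-04-01"), (5, 2, "2021-01-01"),
        (6, 3, "2022-03-01"), (7, 3, "2016-04-01"), (8, 3, "2018-03-01"),
        (9, 3, "2013-03-01"), (10, 3, "2010-06-01"),
        (11, 3, "2022-03-01"),  # hypothetical row: ties with objectid 6
    ],
)

# Keep a row only if no other row of the same asset "beats" it:
# a later date wins, and on equal dates the lower objectid wins.
latest = conn.execute(
    """
    SELECT a.objectid, a.asset_id, a.date_
    FROM roadinsp a
    WHERE NOT EXISTS (
        SELECT 1
        FROM roadinsp b
        WHERE b.asset_id = a.asset_id
          AND (b.date_ > a.date_
               OR (b.date_ = a.date_ AND b.objectid < a.objectid))
    )
    ORDER BY a.asset_id
    """
).fetchall()

print(latest)  # [(3, 1, '2022-01-01'), (5, 2, '2021-01-01'), (6, 3, '2022-03-01')]
```

The dates are stored as ISO-8601 strings here, so plain string comparison orders them correctly; with a real DATE column the comparison is the same.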

How to use SUM DISTINCT when the order has the same qty of items

I'm working on a query to show the total number of orders sent and the quantity of items sent in a day. Because of the many joins, I have duplicate rows. It looks like this:
DispatchDate Order Qty
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 1 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 2 2
2019-07-02 3 5
2019-07-02 3 5
2019-07-02 3 5
I'm using this query:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate
Obviously, on this date there are 3 orders with a total of 9 items.
However, the query is returning:
3 orders and 7 items
I don't have a clue how to resolve this issue. How can I sum the quantities for each order, instead of simply removing duplicates from only one column like SUM DISTINCT does?
You could use a CTE:
with cte1 as (
SELECT Order AS Order
, DispatchDate
, MAX(QTY) as QTY
FROM TABLE1
GROUP BY Order
, DispatchDate
)
SELECT DispatchDate
, COUNT(Order)
, SUM(Qty)
FROM cte1
GROUP BY DispatchDate
You have major problems with your data model, if the data is stored this way. If this is the case, you need a table with one row per order.
If this is the result of a query, you can probably fix the underlying query so you are not getting duplicates.
If you need to work with the data in this format, then extract a single row for each group. I think that row_number() is quite appropriate for this purpose:
select count(*), sum(qty)
from (select t.*, row_number() over (partition by dispatchdate, corder order by corder) as seqnum
from t
) t
where seqnum = 1
Here is a db<>fiddle.
First, you should avoid multiplying the rows when joining, for example by using LEFT JOIN instead of JOIN. But given where we are:
SELECT DispatchDate, sum( Qty)
FROM (
SELECT distinct DispatchDate, Order, Qty
FROM TABLE1 )T
GROUP BY DispatchDate
You have typed SUM(DISTINCT Qty), which sums the distinct values of Qty, that is 2 and 5. This is 7, isn't it?
Due to the lots of joins I have duplicate rows.
IMHO, you should fix your primary data first. The Qty column is probably a function of the unique (DispatchDate, Order) combination. Remove the duplicates in the primary data source and ensure there cannot be different Qty values for two rows with the same DispatchDate and Order. Then go back to your task and you'll find your SQL much simpler. No offense to the other answers, but they just mask the mess in the primary data source and are unclear about which Qty to choose for a duplicated (DispatchDate, Order) pair (some take the max, some the sum).
Try this:
SELECT DispatchDate, COUNT(DISTINCT Order), SUM(DISTINCT Qty)
FROM TABLE1
GROUP BY DispatchDate, Order
I think you need the sum of distinct quantities per dispatch date and order.
How about this? Check comments within the code.
(I renamed the order column to corder; order can't be used as an identifier).
SQL> WITH test (dispatchdate, corder, qty)
     -- your sample data
     AS (SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 1, 2 FROM DUAL UNION ALL
         --
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 2, 2 FROM DUAL UNION ALL
         --
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL UNION ALL
         SELECT DATE '2019-07-02', 3, 5 FROM DUAL),
     -- compute sum of distinct qty per BOTH dispatchdate AND corder
     temp
     AS (  SELECT t1.dispatchdate,
                  t1.corder,
                  SUM (DISTINCT t1.qty) qty
             FROM test t1
         GROUP BY t1.dispatchdate,
                  t1.corder
        )
     -- the final result is then simple
     SELECT t.dispatchdate,
            COUNT (*) cnt,
            SUM (qty) qty
       FROM temp t
     GROUP BY t.dispatchdate;
DISPATCHDA CNT QTY
---------- ---------- ----------
02.07.2019 3 9
SQL>
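To see side by side why SUM(DISTINCT Qty) yields 7 while deduplicating whole rows first yields 9, here is a small sketch. It uses Python's built-in sqlite3 as a stand-in database (an assumption, chosen so the example runs anywhere), and the order column is named corder as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dispatchdate TEXT, corder INTEGER, qty INTEGER)")
# Each order appears three times, as in the question's sample data.
rows = [("2019-07-02", order, qty)
        for order, qty in [(1, 2), (2, 2), (3, 5)]
        for _ in range(3)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# SUM(DISTINCT qty) collapses equal quantities across orders: 2 + 5 = 7.
naive = conn.execute(
    """
    SELECT dispatchdate, COUNT(DISTINCT corder), SUM(DISTINCT qty)
    FROM table1
    GROUP BY dispatchdate
    """
).fetchall()

# Deduplicate whole (date, order, qty) rows first, then aggregate: 2 + 2 + 5 = 9.
fixed = conn.execute(
    """
    SELECT dispatchdate, COUNT(*), SUM(qty)
    FROM (SELECT DISTINCT dispatchdate, corder, qty FROM table1) AS d
    GROUP BY dispatchdate
    """
).fetchall()

print(naive)  # [('2019-07-02', 3, 7)]
print(fixed)  # [('2019-07-02', 3, 9)]
```

The deduplicating form assumes every duplicate of an order carries the same qty; if that does not hold, the order would be counted more than once.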

Seat arrangement using Oracle SQL for examination

I am trying to arrange seats for an examination from the dataset below.
The output dataset should look like the one below (students from alternating departments, one after another).
I am unable to get the desired output. Please help me with that. I am using Oracle 11g Express Edition.
http://sqlfiddle.com/#!4/510071/1
Using the ROW_NUMBER analytic function, create a sort order within each department; then select the values sorted by that number.
For example:
SQL> with test (roll_no, name, department) as
       (select 1, 'anik', 'cse' from dual union all
        select 2, 'sudipto', 'cse' from dual union all
        select 3, 'injamam', 'cse' from dual union all
        select 8, 'sajukta', 'ece' from dual union all
        select 9, 'gourab', 'ece' from dual union all
        select 10, 'soumenn', 'ece' from dual),
     inter as
       (select roll_no, name, department,
               row_number() over (partition by department order by roll_no) rn
        from test
       )
     select roll_no, name, department
     from inter
     order by rn, department;
ROLL_NO NAME DEP
---------- ------- ---
1 anik cse
8 sajukta ece
2 sudipto cse
9 gourab ece
3 injamam cse
10 soumenn ece
6 rows selected.
SQL>
You seem to want them interleaved. If so, use row_number() in the order by:
select s.*
from student s
order by row_number() over (partition by "department" order by "roll_no"),
"department";
Here is the SQL Fiddle.
Note: Don't wrap column names in double quotes. That means that the case of the identifier matters -- and just makes queries harder to write.
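The interleaving idea can be tried out with the sketch below. It runs the same ROW_NUMBER logic through Python's built-in sqlite3 module, which is an assumption made for portability (window functions need SQLite 3.25 or later); the table name student and the rows are taken from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "anik", "cse"), (2, "sudipto", "cse"), (3, "injamam", "cse"),
        (8, "sajukta", "ece"), (9, "gourab", "ece"), (10, "soumenn", "ece"),
    ],
)

# Number the students within each department, then sort by that number first,
# so seat 1 of every department comes before seat 2 of any department.
seating = conn.execute(
    """
    SELECT roll_no, name, department
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY department ORDER BY roll_no) AS rn
        FROM student s
    ) AS ranked
    ORDER BY rn, department
    """
).fetchall()

for row in seating:
    print(row)
```

With these six rows the roll numbers come out in the order 1, 8, 2, 9, 3, 10, matching the SQL*Plus output shown earlier.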

How do I Display Aggregated Values Alongside Disaggregated Values in an Oracle Query?

I have a simple table in Oracle that I want to group a certain way: I want to display disaggregated results alongside aggregated results in the same row. Here is the input table:
with abc as
(
select 'aaa' nnn, 100 amt from dual union
select 'aaa', 20 from dual union
select 'aaa', 3 from dual union
select 'bbb', 44 from dual
)
select * from abc
I want to display each individual row joined with a sum of the AMT column grouped by the NNN column. I don't know how to explain this, so here's what it'd look like:
The Sum column in a given row will equal the sum of the values of the AMT column for all of the rows with an NNN value equal to the NNN value in the same given row.
I can do this by joining the input table with a grouped version of itself using the query below, but I think this is messy. My question is: Is there a builtin function in Oracle that accomplishes this? (My Oracle experience is a little weak, although I have lots of experience with SQL Server.)
with abc as
(
select 'aaa' nnn, 100 amt from dual union
select 'aaa', 20 from dual union
select 'aaa', 3 from dual union
select 'bbb', 44 from dual
)
select tblLeft.nnn, tblLeft.amt, tblRight.amtSum
from
(
select nnn, amt from abc
) tblLeft
inner join
(
select nnn, sum(amt) amtSum
from abc
group by nnn
) tblRight on tblLeft.nnn = tblRight.nnn
You could use an analytic function to achieve your goal:
with abc as
(
select 'aaa' nnn, 100 amt from dual union
select 'aaa', 20 from dual union
select 'aaa', 3 from dual union
select 'bbb', 44 from dual
)
select nnn,
amt,
sum(amt) over (partition by nnn)
from abc;
Output:
NNN AMT SUM(AMT)OVER(PARTITIONBYNNN)
aaa 3 123
aaa 20 123
aaa 100 123
bbb 44 44
What analytic functions do: they allow you to use functions like SUM, but they calculate the value for each row instead of aggregating the result. They have some other interesting options, if you would like to learn more:
https://oracle-base.com/articles/misc/analytic-functions
SQLFiddle example
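As a quick check that the analytic form and the self-join form are equivalent, the sketch below runs both against the sample data and compares the results. Python's built-in sqlite3 is used in place of Oracle (an assumption; the window function needs SQLite 3.25 or later), and the column alias amt_sum is introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (nnn TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO abc VALUES (?, ?)",
    [("aaa", 100), ("aaa", 20), ("aaa", 3), ("bbb", 44)],
)

# Analytic version: the per-group sum is attached to every row, no GROUP BY.
analytic = sorted(conn.execute(
    "SELECT nnn, amt, SUM(amt) OVER (PARTITION BY nnn) AS amt_sum FROM abc"
).fetchall())

# Join version from the question: detail rows joined to a grouped copy.
joined = sorted(conn.execute(
    """
    SELECT l.nnn, l.amt, r.amt_sum
    FROM abc l
    JOIN (SELECT nnn, SUM(amt) AS amt_sum FROM abc GROUP BY nnn) r
      ON l.nnn = r.nnn
    """
).fetchall())

print(analytic)  # [('aaa', 3, 123), ('aaa', 20, 123), ('aaa', 100, 123), ('bbb', 44, 44)]
print(analytic == joined)  # True
```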

Using ratio_to_report analytic

I am trying to get the percentage of rows that a particular set of values accounts for. It is best explained by example. I can do this for a single column very simply using the ratio_to_report function with over(), but I am having issues with multiple groupings.
Assume table has 2 columns:
column a column b
1000 some data
1100 some data
2000 some data
1400 some data
1500 some data
With the following query, I can see that each value in this set accounts for 20% of the total rows:
select columna, count(*), trunc(ratio_to_report(count(columna)) over() * 100, 2) as perc
from table
group by columna
order by perc desc;
However, what I need is, for example, to determine the percentage and count of the rows that contain 1000, 1400 or 2000. From looking at it, you can tell it's 60%, but I need a query to return that. This needs to be efficient, as the query will be running against millions of rows. As I said before, I have this working for a single value and its percentage, but handling multiple values is what is throwing me.
It seems like I need to be able to put an IN clause somewhere, but the values will not be these specific values each time. I will need to get the values for the "IN" part from another table, if that makes sense. I guess I need some kind of multiple grouping.
Potentially, you're looking for something like
with x as (
  select 1000 a from dual
  union all
  select 1100 from dual
  union all
  select 1400 from dual
  union all
  select 1500 from dual
  union all
  select 2000 from dual
)
select (case when a in (1000,1400,1500)
             then 1
             else 0
         end) bucket,
       count(*),
       ratio_to_report(count(*)) over ()
from x
group by (case when a in (1000,1400,1500)
               then 1
               else 0
           end);
BUCKET COUNT(*) RATIO_TO_REPORT(COUNT(*))OVER()
---------- ---------- -------------------------------
1 3 .6
0 2 .4
I'm not sure I entirely understand the requirement, but do you need ratio_to_report at all? Have a look at the following, and let me know how close this is to what you want, and we can work from there!
T1 is the table containing your sample data
create table t1(a primary key) as
select 1000 as a from dual union all
select 1100 as a from dual union all
select 1400 as a from dual union all
select 1500 as a from dual union all
select 2000 as a from dual;
T2 is the lookup table you mentioned (where you get the list of IDs)
create table t2(a primary key) as
select 1000 as a from dual union all
select 1400 as a from dual union all
select 2000 as a from dual;
A left join from T1->T2 will return all rows in T1 paired with all matching rows in T2. For each A in T1 that does not exist in your set (T2), the result will be padded with NULL. We can exploit the fact that COUNT() doesn't count (hehe) nulls.
select count(t1.a) as num_rows
,count(t2.a) as in_set
,count(t2.a) / count(t1.a) as shr_in_set
from t1
left
join t2 on(t1.a = t2.a);
The result of running the query is:
NUM_ROWS IN_SET SHR_IN_SET
---------- ---------- ----------
5 3 ,6
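The left-join counting trick can be reproduced with the sketch below, which uses Python's built-in sqlite3 as a stand-in for Oracle (an assumption). One difference worth flagging: SQLite divides two integers as integers, so the share is multiplied by 1.0 here to force a fractional result; Oracle's NUMBER division does not need that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER PRIMARY KEY)")  # the data table
conn.execute("CREATE TABLE t2 (a INTEGER PRIMARY KEY)")  # the lookup set
conn.executemany("INSERT INTO t1 VALUES (?)", [(1000,), (1100,), (1400,), (1500,), (2000,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(1000,), (1400,), (2000,)])

# Rows of t1 without a match get NULL for t2.a, and COUNT(column) skips NULLs,
# so COUNT(t2.a) is the number of t1 rows whose value is in the lookup set.
num_rows, in_set, shr_in_set = conn.execute(
    """
    SELECT COUNT(t1.a),
           COUNT(t2.a),
           COUNT(t2.a) * 1.0 / COUNT(t1.a)
    FROM t1
    LEFT JOIN t2 ON t1.a = t2.a
    """
).fetchone()

print(num_rows, in_set, shr_in_set)  # 5 3 0.6
```

Because the lookup values live in a table rather than in a literal IN list, changing the set only means changing the rows of t2.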