multiple top n aggregates query defined as a view (or function)?

multiple top n aggregates query defined as a view (or function)? - sql

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numberics, it looks like this:
customer_id date a b c d
0001 07/01/22 0 3 3 5
0001 07/12/22 12 0 50 0
0002 06/30/22 5 65 0 30
0002 07/20/22 1 0 19 2
0003 08/01/22 0 0 99 0
I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:
column_ranked customer_id sum rank
'a' 001 12 1
'a' 002 6 2
'b' 002 65 1
'b 001 3 2
'c' 003 99 1
'c' 001 53 2
'd' 002 30 1
'd' 001 5 2
(this assumes no date range filter)
My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:
SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )
How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)

Since the date ranges clearly produce a massive number of combinations you cannot generate a view with them. You can write a query, however, as shown below:
with
p as (select cast ('2022-01-01' as date) as ds, cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc
The date range is in the second line.
See db<>fiddle.
Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.

Related

How do i select all columns, plus the result of the sum

I have this select:
"Select * from table" that return:
Id
Value
1
1
1
1
2
10
2
10
My goal is create a sum from each Value group by id like this:
Id
Value
Sum
1
1
2
1
1
2
2
10
20
2
10
20
I Have tried ways like:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
But the is not grouping by id.
Id
Value
Sum
1
1
1
1
1
1
2
10
10
2
10
10

Aggregation aggregates rows, reducing the number of records in the output. In this case you want to apply the result of a computation to each of your records, task carried out by the corresponding window function.
SELECT table.*, SUM(Value) OVER(PARTITION BY Id) AS sum_
FROM table
Check the demo here.

Your attempt looks correct.
Can you try the below query :
It works for me :
SELECT Id, Value,
(SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY ID) as sum
FROM Table v;

You can do it using inner join to join with selection grouped by id :
select t.*, sum
from _table t
inner join (
select id, sum(Value) as sum
from _table
group by id
) as s on s.id = t.id
You can check it here

Your select is ok if you adjust it just a little:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
GROUP BY IDRNC is a mistake and should be GROUP BY ID
you should give an alias to a sum column ...
subquery selecting the sum does not have to have self table alias to be compared with outer query that has one (this is not a mistake - works either way)
Test:
WITH
a_table (ID, VALUE) AS
(
Select 1, 1 From Dual Union All
Select 1, 1 From Dual Union All
Select 2, 10 From Dual Union All
Select 2, 10 From Dual
)
SELECT ID, VALUE, (SELECT SUM(VALUE) FROM a_table WHERE ID = v.ID GROUP BY ID) "ID_SUM" FROM a_table v;
ID VALUE ID_SUM
---------- ---------- ----------
1 1 2
1 1 2
2 10 20
2 10 20

numbers of users buying the exact same product from the same shop for > 2 times in 1 years

I have data like this:
date user prod shop cat1 cat2
2022-02-01 1 a a ah g
2022-02-02 1 a1 b ah g
2022-04-03 1 a a ah g
2022-04-19 1 a a ah g
2022-05-01 2 b c bg g
I want to know how many user buy the same product in the same shop for >2 times in period 1 year. The result i want like:
table 1
cat1 number_of_user
ah 1
table 2
cat2 number_of_user
g 1
For total user, my query like:
WITH data_product AS(
SELECT DATE(payment_time) date,
user,
CONCAT(prod, "_", shop) product_shop,
cat1,
cat2
FROM
a
WHERE
DATE(payment_time) BETWEEN "2022-01-01" AND DATE_SUB(current_date, INTERVAL 1 day)
ORDER BY 1,2,3),
purchased AS (
SELECT user, product_shop, count(product_shop) tot_purchased
FROM data_product
GROUP BY 1,2
HAVING COUNT(product_shop) > 2
)
SELECT COUNT(user) number_of_user FROM purchased
Please help to get number of user buy the same product in the same shop more than 2 times in period based on cat1 and cat2.

Try this:
create temporary table table1 as(
select *,extract(YEAR from date) as year from `projectid.dataset.table`
);
create temporary table table2 as(
select * except(date,cat2) ,count(user) over(partition by cat1,year,user,prod,shop) tcount from table1
);
create temporary table table4 as(
select * except(date,cat1) ,count(user) over(partition by cat2,year,user,prod,shop) tcount from table1
);
select distinct year,cat1 ,count(distinct user) number_of_user from table2 where tcount>2 group by YEAR,cat1;
select distinct year,cat2 ,count(distinct user) number_of_user from table4 where tcount>2 group by YEAR,cat2;
If you want a single result set you can union both the select statements.

I think this query might work. The first part shows count of customers who purchased same product in category1 from same shop during one year. Second part shows that for category2, then we concatenate the two set by union operation :
with cte as
(select distinct
PDate,userID as userID,prod as prod,shop,cat1 as cat1,cat2,
count(userID) over (partition by UserID,prod,shop,year(Pdate),cat1) as cat1_count,
count(PDate) over (partition by UserID,prod,shop,year(Pdate),cat2) as cat2_count
from tbl1)
select
cte.cat1 as c1,'0' as c2,count(distinct cte.cat1) as Num
from cte
where cte.cat1_count>1
group by cte.prod,cte.userID,cte.cat1
union
select
'0',cte.cat2,count(distinct cte.cat2)
from cte
where cte.cat2_count>1
group by cte.prod,cte.userID,cte.cat2

Return last amount for each element with same ref_id

I have 2 tables, one is credit and other one is creditdetails.
Creditdetails creates new row every day for each of credit.
ID Amount ref_id date
1 2 1 16.03
2 3 1 17.03
3 4 1 18.03
4 1 2 16.03
5 2 2 17.03
6 0 2 18.03
I want to sum up amount of every row with the unique id and last date. So the output should be 4 + 0.

You can use ROW_NUMBER to filter on the latest amount per ref_id.
Then SUM it.
SELECT SUM(q.Amount) AS TotalLatestAmount
FROM
(
SELECT
cd.ref_id,
cd.Amount,
ROW_NUMBER() OVER (PARTITION BY cd.ref_id ORDER BY cd.date DESC) AS rn
FROM Creditdetails cd
) q
WHERE q.rn = 1;
A test on db<>fiddle here

With this query:
select ref_id, max(date) maxdate
from creditdetails
group by ref_id
you get all the last dates for each ref_id, so you can join it to the table creditdetails and sum over amount:
select sum(amount) total
from creditdetails c inner join (
select ref_id, max(date) maxdate
from creditdetails
group by ref_id
) g
on g.ref_id = c.ref_id and g.maxdate = c.date

I think you want something like this,
select sum(amount)
from table
where date = ( select max(date) from table);
with the understanding that your date column doesn't appear to be in a standard format so I can't tell if it needs to be formatted in the query to work properly.

Selecting nth top row based on number of occurrences of value in 3 tables

I have three tables let's say A, B and C. Each of them has column that's named differently, let's say D1, D2 and D3. In those columns I have values between 1 and 26. How do I count occurrences of those values and sort them by that count?
Example:
TableA.D1
1
2
1
1
3
TableB.D2
2
1
1
1
2
3
TableC.D3
2
1
3
So the output for 3rd most common value would look like this:
3 -- number 3 appeared only 3 times
Likewise, output for 2nd most common value would be:
2 -- number 2 appeared 4 times
And output for 1st most common value:
1 -- number 1 appeared 7 times

You probably want :
select top (3) d1
from ((select d1 from tablea ta) union all
(select d2 from tableb tb) union all
(select d3 from tablec tc)
) t
group by d1
order by count(*) desc;

SELECT DQ3.X, DQ3.CNT
(
SELECT DQ2.*, dense_rank() OVER (ORDER BY DQ2.CNT DESC) AS RN
(SELECT DS.X,COUNT(DS.X) CNT FROM
(select D1 as X FROM TableA UNION ALL SELECT D2 AS X FROM TABLE2 UNION ALL SELECT D3 AS X FROM TABLE3) AS DS
GROUP BY DS.X
) DQ2
) DQ3 WHERE DQ3.RN = 3 --the third in the order of commonness - note that 'ties' can be handled differently

One of the things about SQL scripts: they get difficult to read very easily. I'm a big fan of making things as readable as absolute possible. So I'd recommend something like:
declare #topThree TABLE(entry int, cnt int)
select TOP 3 entry,count(*) as cnt
from
(
select d1 as entry from tablea UNION ALL
select d2 as entry from tableb UNION ALL
select d3 as entry from tablec UNION ALL
) as allTablesCombinedSubquery
order by count(*)
select TOP 1 entry
from #topThree
order by cnt desc
... it's extremely readable, and doesn't use any concepts that are tough to grok.

Select except where different in SQL

I need a bit of help with a SQL query.
Imagine I've got the following table
id | date | price
1 | 1999-01-01 | 10
2 | 1999-01-01 | 10
3 | 2000-02-02 | 15
4 | 2011-03-03 | 15
5 | 2011-04-04 | 16
6 | 2011-04-04 | 20
7 | 2017-08-15 | 20
What I need is all dates where only one price is present.
In this example I need to get rid of row 5 and 6 (because there is two difference prices for the same date) and either 1 or 2(because they're duplicate).
How do I do that?

select date,
count(distinct price) as prices -- included to test
from MyTable
group by date
having count(distinct price) = 1 -- distinct for the duplicate pricing

The following should work with any DBMS
SELECT id, date, price
FROM TheTable o
WHERE NOT EXISTS (
SELECT *
FROM TheTable i
WHERE i.date = o.date
AND (
i.price <> o.price
OR (i.price = o.price AND i.id < o.id)
)
)
;
JohnHC answer is more readable and delivers the information the OP asked for ("[...] I need all the dates [...]").
My answer, though less readable at first, is more general (allows for more complexes tie-breaking criteria) and also is capable of returning the full row (with id and price, not just date).

;WITH CTE_1(ID ,DATE,PRICE)
AS
(
SELECT 1 , '1999-01-01',10 UNION ALL
SELECT 2 , '1999-01-01',10 UNION ALL
SELECT 3 , '2000-02-02',15 UNION ALL
SELECT 4 , '2011-03-03',15 UNION ALL
SELECT 5 , '2011-04-04',16 UNION ALL
SELECT 6 , '2011-04-04',20 UNION ALL
SELECT 7 , '2017-08-15',20
)
,CTE2
AS
(
SELECT A.*
FROM CTE_1 A
INNER JOIN
CTE_1 B
ON A.DATE=B.DATE AND A.PRICE!=B.PRICE
)
SELECT * FROM CTE_1 WHERE ID NOT IN (SELECT ID FROM CTE2)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

multiple top n aggregates query defined as a view (or function)? - sql

Related

How do i select all columns, plus the result of the sum

numbers of users buying the exact same product from the same shop for > 2 times in 1 years

Return last amount for each element with same ref_id

Selecting nth top row based on number of occurrences of value in 3 tables

Select except where different in SQL

Categories

Resources