Agg Functions while Partitioning Data in SQL

I have a table that looks like this:
store_id industry_id cust_id amount gender
1 100 1000 1.00 M
2 100 1000 2.05 M
3 100 1000 3.15 M
4 100 1000 4.00 M
5 100 2000 5.00 F
6 200 2000 5.20 F
7 200 5000 6.05 F
8 200 6000 7.10 F
Here's the code to create this table:
CREATE TABLE t1(
store_id int,
industry_id int,
cust_id int,
amount float,
gender char
);
INSERT INTO t1 VALUES(1,100,1000,1.00, 'M');
INSERT INTO t1 VALUES(2,100,1000,2.05, 'M');
INSERT INTO t1 VALUES(3,100,1000,3.15, 'M');
INSERT INTO t1 VALUES(4,100,1000,4.00, 'M');
INSERT INTO t1 VALUES(5,100,2000,5.00, 'F');
INSERT INTO t1 VALUES(6,200,2000,5.20, 'F');
INSERT INTO t1 VALUES(7,200,5000,6.05, 'F');
INSERT INTO t1 VALUES(8,200,6000,7.10, 'F');
The question I'm trying to answer is: What is the avg. transaction amount for the top 20% of customers by industry?
This should yield these results:
store_id industry_id avg_amt_top_20
1 100 4.80
2 100 4.80
3 100 4.80
4 100 4.80
5 100 4.80
6 200 7.10
7 200 7.10
8 200 7.10
Here's what I have so far:
SELECT
store_id, industry_id,
avg(CASE WHEN percentile>=0.80 THEN amount ELSE NULL END) OVER(PARTITION BY industry_id) as cust_avg
FROM(
SELECT store_id, industry_id, amount, cume_dist() OVER(
PARTITION BY industry_id
ORDER BY amount desc) AS percentile
FROM t1
) tmp
GROUP BY store_id, industry_id;
This fails on the GROUP BY (contains nonaggregated column 'amount'). What's the best way to do this?

What is the avg. transaction amount for the top 20% of customers by industry?
Based on this question, I don't see why store_id is in the results.
If I understand correctly, you need to aggregate to get the total by customer. Then you can use NTILE() to determine the top 20%. The final step is aggregating by industry:
SELECT industry_id, AVG(total)
FROM (SELECT cust_id, industry_id, SUM(amount) as total,
NTILE(5) OVER (PARTITION BY industry_id ORDER BY SUM(amount) DESC) as tile
FROM t1
GROUP BY cust_id, industry_id
) t
WHERE tile = 1
GROUP BY industry_id
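As a rough way to try the approach above on the question's sample rows, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module (an assumption; the question does not name a database). The aggregation is split into two CTEs, and the names per_customer, tiled and top_tile_average are made up for this example. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Sample rows from the question: store_id, industry_id, cust_id, amount, gender
ROWS = [
    (1, 100, 1000, 1.00, "M"), (2, 100, 1000, 2.05, "M"),
    (3, 100, 1000, 3.15, "M"), (4, 100, 1000, 4.00, "M"),
    (5, 100, 2000, 5.00, "F"), (6, 200, 2000, 5.20, "F"),
    (7, 200, 5000, 6.05, "F"), (8, 200, 6000, 7.10, "F"),
]

QUERY = """
WITH per_customer AS (            -- step 1: total per customer and industry
    SELECT cust_id, industry_id, SUM(amount) AS total
    FROM t1
    GROUP BY cust_id, industry_id
),
tiled AS (                        -- step 2: split each industry into 5 tiles
    SELECT industry_id, total,
           NTILE(5) OVER (PARTITION BY industry_id ORDER BY total DESC) AS tile
    FROM per_customer
)
SELECT industry_id, ROUND(AVG(total), 2) AS avg_top_tile   -- step 3: top tile only
FROM tiled
WHERE tile = 1
GROUP BY industry_id
ORDER BY industry_id
"""

def top_tile_average():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (store_id INT, industry_id INT, cust_id INT, "
        "amount REAL, gender TEXT)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for industry_id, avg_total in top_tile_average():
        print(industry_id, avg_total)
```

With only two or three customers per industry, the first of five tiles holds just the single highest-spending customer, so the averages here are that customer's total rather than the per-store figures the question expected.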

Related

Query to calculate the sum from 3 different tables

I have 3 tables
Table A:
Cid acc_id acc balance
1 345 100
1 456 300
2 347 500
Table B:
Cid acc_id acc balance
1 348 100
1 457 300
2 349 500
Table C:
Cid acc_id acc balance
1 340 100
1 457 300
2 344 500
I need to create a single table which gives the sum of balances for each customer across all 3 tables.
Cid Balance
1 1200
2 1500
I need SQL for this. Since the customer ID repeats within each table, I'm confused.

Use union all and aggregation
select cid, sum(balance)
from ((select Cid, acc_id, balance
from a
) union all
(select Cid, acc_id, balance
from b
) union all
(select Cid, acc_id, balance
from c
)
) abc
group by cid;

You can use UNION ALL, as in:
select
cid,
sum(balance) as balance
from (
select * from table_a
union all
select * from table_b
union all
select * from table_c
) x
group by cid
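To check the UNION ALL idea against the sample balances, here is a small sketch using Python's sqlite3 module with an in-memory database. The table names table_a, table_b and table_c follow the second answer, the column is called balance as in both answers, and the helper name total_balances is invented for this example.

```python
import sqlite3

# (cid, acc_id, balance) rows from the question, one list per table
DATA = {
    "table_a": [(1, 345, 100), (1, 456, 300), (2, 347, 500)],
    "table_b": [(1, 348, 100), (1, 457, 300), (2, 349, 500)],
    "table_c": [(1, 340, 100), (1, 457, 300), (2, 344, 500)],
}

QUERY = """
SELECT cid, SUM(balance) AS balance
FROM (
    SELECT cid, balance FROM table_a
    UNION ALL                     -- UNION ALL keeps duplicate rows; plain UNION would drop them
    SELECT cid, balance FROM table_b
    UNION ALL
    SELECT cid, balance FROM table_c
) x
GROUP BY cid
ORDER BY cid
"""

def total_balances():
    conn = sqlite3.connect(":memory:")
    for name, rows in DATA.items():
        conn.execute(f"CREATE TABLE {name} (cid INT, acc_id INT, balance INT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(total_balances())  # [(1, 1200), (2, 1500)]
```

Using UNION ALL rather than UNION matters here: identical (cid, balance) pairs appear in more than one table, and UNION would collapse them before summing.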

SQL query aggregate function with two tables

I'm trying to query some data from SQL such that it sums some columns, gets the max of other columns and the corresponding value from another table. For example,
|table1|
|order id| |id| |shares| |date| other stuff
12345 1 100 05/13/16 XXX
12345 2 200 05/15/16 XXX
12345 3 300 06/12/16 XXX
12345 4 400 02/22/16 XXX
56789 5 1000 03/30/16 XXX
56789 6 200 02/25/16 XXX
22222 7 5000 01/10/16 XXX
|table2|
|id| |price|
1 21.2
2 20.2
3 19.1
4 21.3
5 100.0
6 110.0
7 5.0
I want my output to be:
|shares| |date| |price| other stuff
1000 06/12/16 19.1 max(other stuff)
1200 03/30/16 100.0 max(other stuff)
5000 01/10/16 5.0 max(other stuff)
The shares have been summed up, the date is max(date), and the price is the price at the corresponding max(date).
So far, I have:
select
orderid, stock, side, exchange,
max(startdate), max(enddate),
sum(shares), sum(execution_price * shares) / sum(shares),
max(limitprice), max(price)
from
table1 t1
inner join
table2 t2 on t2.id = t1.id
where
location = 'CHICAGO'
and startdate > '1/1/2016'
and order_type = 'limit'
group by
orderid, stock, side, exchange
However, this returns:
|shares| |date| |price| |other stuff|
1000 06/12/16 21.3 max(other stuff)
1200 03/30/16 110.0 max(other stuff)
5000 01/10/16 5.0 max(other stuff)
which isn't the corresponding price for the max(date).
The only link between the two datasets are their id numbers, which is why
inner join
table2 t2 on t2.id = t1.id
is done. No dates in the second table at all. Any help?
Thanks.

You can resolve this with a sub-query. You don't need any aggregate function on the price column: just find the max date per order and then get the price for that particular date. Try something like this:
select t5.*, t4.price
from
(select t1.order_id, sum(t1.shares) as shares, max(t1.date) as maxdate, max(other_stuff) as other_stuff
from Table1 t1
inner join
Table2 t2 on t2.id = t1.id
group by t1.order_id) t5
inner join Table1 t3
on t5.maxdate = t3.date and t5.order_id = t3.order_id
inner join Table2 t4
on t3.id = t4.id;

Try this (don't forget to replace #table1 and #table2 with your own table names):
SELECT Aggregated.shares
, Aggregated.date
, Aggregated.other_stuff
, T2.price
FROM (
SELECT order_id
, SUM(shares) as shares
, MAX(date) as date
, MAX(other_stuff) as other_stuff
FROM #table1 AS T1
GROUP BY order_id
) AS Aggregated
INNER JOIN #table1 AS T1 ON Aggregated.order_id = T1.order_id AND Aggregated.date = T1.date
INNER JOIN #table2 AS T2 ON T2.id = T1.id
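The aggregate-then-join-back pattern from the two answers above can be tried on the question's sample rows with the sketch below. It runs in SQLite via Python's sqlite3 module, which is an assumption, as are the column name trade_date (used instead of date), the ISO date strings, and the helper name price_at_latest_date.

```python
import sqlite3

# (order_id, id, shares, trade_date) from the question, dates rewritten as ISO strings
TABLE1 = [
    (12345, 1, 100, "2016-05-13"), (12345, 2, 200, "2016-05-15"),
    (12345, 3, 300, "2016-06-12"), (12345, 4, 400, "2016-02-22"),
    (56789, 5, 1000, "2016-03-30"), (56789, 6, 200, "2016-02-25"),
    (22222, 7, 5000, "2016-01-10"),
]
# (id, price) from the question
TABLE2 = [(1, 21.2), (2, 20.2), (3, 19.1), (4, 21.3), (5, 100.0), (6, 110.0), (7, 5.0)]

QUERY = """
SELECT agg.order_id, agg.shares, agg.max_date, t2.price
FROM (
    -- one row per order: summed shares and the latest date
    SELECT order_id, SUM(shares) AS shares, MAX(trade_date) AS max_date
    FROM table1
    GROUP BY order_id
) agg
-- join back to find the row that carries the latest date, then its price
JOIN table1 t1 ON t1.order_id = agg.order_id AND t1.trade_date = agg.max_date
JOIN table2 t2 ON t2.id = t1.id
ORDER BY agg.order_id
"""

def price_at_latest_date():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (order_id INT, id INT, shares INT, trade_date TEXT)")
    conn.execute("CREATE TABLE table2 (id INT, price REAL)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in price_at_latest_date():
        print(row)
```

One caveat with this pattern: if two rows of the same order share the latest date, the join returns one output row for each of them.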

So, before I write you a query, you basically want the price during that max date, correct? You have MAX on price, sum of shares, max on limit price, sum on shares again and so on.
My guess is that you want the latest price based on the latest (max) date, and then the calculations for that latest date, such as the number of shares, summed together? You're also grouping on ID, Shares, and other things that don't really make sense; it would seem you would want to group on Shares, Side and Exchange but not ID. It looks like you put a MAX on other columns just so they show up without having to group on them. That is not going to work for what you want, assuming I understand what you're looking for =) Let me know what your end result "specs" are and I can definitely help.

I would use a sub-query with a window function, partitioned by pk and ordered by date, to pick out the price on the latest date, and then do the aggregations at the outer level. Here is an example of how it would work.
Sample data
pk id shares date id price
------- --- -------- -------------------------- --- -------
100 1 100 2016-07-08 10:40:34.707 1 50
100 2 200 2016-07-06 10:40:34.707 2 20
101 3 500 2016-07-09 10:40:34.707 3 70
101 4 150 2016-07-07 10:40:34.707 4 80
102 5 300 2016-07-10 10:40:34.707 5 40
Query
with t1 as (
select 100 pk,1 id, 100 shares, getdate()-3 date union all
select 100 pk,2 id, 200 shares, getdate()-5 date union all
select 101 pk,3 id, 500 shares, getdate()-2 date union all
select 101 pk,4 id, 150 shares, getdate()-4 date union all
select 102 pk,5 id, 300 shares, getdate()-1 date ),
t2 as (
select 1 id, 50 price union all
select 2 id, 20 price union all
select 3 id, 70 price union all
select 4 id, 80 price union all
select 5 id, 40 price
)
SELECT pk, sum(shares) shares, max(date) date, max(price) price from(
SELECT pk,
shares,
date,
-- FIRST_VALUE returns the price on the latest date (SQL Server 2012+);
-- MAX(price) OVER (...) would return the highest price in the group instead
FIRST_VALUE(price) over(partition by pk order by date desc) price
FROM t1
JOIN t2 ON t1.id = t2.id) a
group by pk
Result
pk shares date Price
--- ------- ------------------------ -----
100 300 2016-07-08 10:51:16.023 50
101 650 2016-07-09 10:51:16.023 70
102 300 2016-07-10 10:51:16.023 40

SQL Counting Duplicates in a Column

I have been stuck on this problem for a while and have searched the net for an answer.
My problem is:
I have duplicates in one column. I want to count how many duplicates there are in that column and then divide a field by that count. I want to be able to do this for each record in the column as well.
Basically I want the script to behave like this
Count number of duplicates -> divide field A by count of duplicates.
Sample data:
t1.Invoiceno | t2.Amount | t2.orderno
-------------------------------------
201412 200 P202
201412 200 P205
302142 500 P232
201412 300 P211
450402 250 P102
450402 250 P142
450402 250 P512
Desired Result:
Invoiceno | Amount | orderno| duplicates|amount_new
-------------------------------------------------
201412 200 P202 2 100
201412 200 P205 2 100
302142 500 P232 1 500
201552 300 P211 1 300
450402 1200 P102 3 400
450402 1200 P142 3 400
450402 1200 P512 3 400
I do not want to insert new columns into the table, I just want the results to show the two new columns.

Here is one way:
select A / dups.dups
from t cross join
(select count(*) as dups
from (select onecol
from t
group by onecol
having count(*) > 1
) o
) dups
EDIT:
Well, now that the problem is clarified to something more reasonable: you can use a similar approach to the above, but the dups subquery needs to be aggregated by invoice and amount:
select amount / dups.dups as new_amount
from t join
(select invoice, amount, count(*) as dups
from t
group by invoice, amount
) dups
on t.invoice = dups.invoice and t.amount = dups.amount;

Here is another way:
Declare @tempTable Table ( ID int , A int)
INSERT INTO @tempTable VALUES (1, 12)
INSERT INTO @tempTable VALUES (1, 12)
INSERT INTO @tempTable VALUES (2, 20)
INSERT INTO @tempTable VALUES (2, 24)
INSERT INTO @tempTable VALUES (2, 15)
INSERT INTO @tempTable VALUES (3, 10)
INSERT INTO @tempTable VALUES (5, 12)
-------------------------------------------
;WITH DupsCTE (ID, DuplicateCount) AS
(
SELECT ID, COUNT(*) AS DuplicateCount FROM @tempTable GROUP BY ID
)
SELECT t.ID, t.A,
c.DuplicateCount, t.A / c.DuplicateCount AS ModifiedA
FROM
@tempTable t
INNER JOIN DupsCTE c ON c.ID = t.ID
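A window function is a third option that avoids the join: COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row. This is not what either answer above uses, only a sketch of an alternative, run here in SQLite through Python's sqlite3 module on the question's sample rows. The table name invoices and the helper name amounts_per_duplicate are invented for the example, and the rows are grouped by invoice number and amount together.

```python
import sqlite3

# (invoiceno, amount, orderno) sample rows from the question
ROWS = [
    (201412, 200, "P202"), (201412, 200, "P205"), (302142, 500, "P232"),
    (201412, 300, "P211"), (450402, 250, "P102"), (450402, 250, "P142"),
    (450402, 250, "P512"),
]

QUERY = """
SELECT invoiceno, amount, orderno,
       COUNT(*) OVER (PARTITION BY invoiceno, amount) AS duplicates,
       -- multiply by 1.0 so the division is not truncated to an integer
       amount * 1.0 / COUNT(*) OVER (PARTITION BY invoiceno, amount) AS amount_new
FROM invoices
ORDER BY orderno
"""

def amounts_per_duplicate():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoiceno INT, amount INT, orderno TEXT)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in amounts_per_duplicate():
        print(row)
```

No columns are added to the table itself; duplicates and amount_new exist only in the result set, which is what the question asks for.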

How to select Remain values after subtract with one Fixed value

Need To select Data From One Table After Minus With One Value
This is the question I asked previously; its solution handles a single input value for the whole table. Now I need the same thing with a separate input value for each category, and a separate result for each category.
For example (based on the previous question):
Table 1
SNo Amount categories
1 100 type1
2 500 type1
3 400 type1
4 100 type1
5 100 type2
6 200 type2
7 300 type2
8 500 type3
9 100 type3
and
values for type1 - 800
values for type2 - 200
values for type3 - 100
and the output need is
for type-1
800 - 100 (Record1) = 700
700 - 500 (record2) = 200
200 - 400 (record3) = -200
For type1 the output therefore starts at record 3, with a balance of 200:
Table-Output
SNo Amount
1 200
2 100
That is, subtracting 800 from the type1 rows removes the first 2 records and leaves a balance of 200 on the third record.
The same operation should apply to the remaining types. How can I do this?

with T1 as
(
select t.*,
SUM(Amount) OVER (PARTITION BY [Type] ORDER BY [SNo])
-
CASE WHEN Type='Type1' then 800
WHEN Type='Type2' then 200
WHEN Type='Type3' then 100
END as Total
from t
)select Sno,Type,
CASE WHEN Amount>Total then Total
Else Amount
end as Amount
from T1 where Total>0
order by Sno
UPD: If types are not fixed then you should create a table for them, for example:
CREATE TABLE types
([Type] varchar(5), [Value] int);
insert into types
values
('type1',800),
('type2',200),
('type3',100);
and use the following query:
with T1 as
(
select t.*,
SUM(Amount) OVER (PARTITION BY t.[Type] ORDER BY [SNo])
-
ISNULL(types.Value,0) as Total
from t
left join types on (t.type=types.type)
)select Sno,Type,
CASE WHEN Amount>Total then Total
Else Amount
end as Amount
from T1 where Total>0
order by Sno
UPDATE: For MSSQL 2005 just replace SUM(Amount) OVER (PARTITION BY t.[Type] ORDER BY [SNo]) with (select SUM(Amount) from t as t1
where t1.Type=t.Type
and t1.SNo<=t.SNo)
with T1 as
(
select t.*,
(select SUM(Amount) from t as t1
where t1.Type=t.Type
and t1.SNo<=t.SNo)
-
ISNULL(types.Value,0) as Total
from t
left join types on (t.type=types.type)
)select Sno,Type,
CASE WHEN Amount>Total then Total
Else Amount
end as Amount
from T1 where Total>0
order by Sno
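For readers without the original demo, the lookup-table version of the query can be reproduced roughly as follows. This sketch runs in SQLite through Python's sqlite3 module, so ISNULL is swapped for COALESCE and the square brackets are dropped; the CTE name running and the helper name remaining_amounts are invented for the example.

```python
import sqlite3

# (sno, amount, type) rows from the question
ROWS = [
    (1, 100, "type1"), (2, 500, "type1"), (3, 400, "type1"), (4, 100, "type1"),
    (5, 100, "type2"), (6, 200, "type2"), (7, 300, "type2"),
    (8, 500, "type3"), (9, 100, "type3"),
]
# value to subtract for each type
TYPES = [("type1", 800), ("type2", 200), ("type3", 100)]

QUERY = """
WITH running AS (
    SELECT t.sno, t.type, t.amount,
           -- running total within the type, minus the value to subtract
           SUM(t.amount) OVER (PARTITION BY t.type ORDER BY t.sno)
             - COALESCE(ty.value, 0) AS total
    FROM t
    LEFT JOIN types ty ON ty.type = t.type
)
SELECT sno, type,
       -- the first surviving row keeps only what is left after the subtraction
       CASE WHEN amount > total THEN total ELSE amount END AS amount
FROM running
WHERE total > 0
ORDER BY sno
"""

def remaining_amounts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (sno INT, amount INT, type TEXT)")
    conn.execute("CREATE TABLE types (type TEXT, value INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    conn.executemany("INSERT INTO types VALUES (?, ?)", TYPES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in remaining_amounts():
        print(row)
```

For type1 the running totals are 100, 600, 1000 and 1100, so after subtracting 800 only rows 3 and 4 have a positive total, and row 3 keeps the remaining 200.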

Generating order statistics grouped by order total

Hopefully I can explain this correctly. I have a table of line orders (each line order consists of quantity of item and the price, there are other fields but I left those out.)
table 'orderitems':
orderid | quantity | price
1 | 1 | 1.5000
1 | 2 | 3.22
2 | 1 | 9.99
3 | 4 | 0.44
3 | 2 | 15.99
So to get order total I would run
SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID
However, I would like to get a count of all total orders under $1 (just provide a count).
My end result I would like would be able to define ranges:
under $1, $1 - $3, 3-5, 5-10, 10-15, 15.. etc;
and my data to look like so (hopefully):
tunder1 | t1to3 | t3to5 | t5to10 | etc
10 | 500 | 123 | 5633 |
So that I can present a piechart breakdown of customer orders on our eCommerce site.
Now I can run individual SQL queries to get this, but I would like to know what the most efficient 'single sql query' would be. I am using MS SQL Server.
Currently I can run a single query like so to get under $1 total:
SELECT COUNT(total) AS tunder1
FROM (SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID) AS a
WHERE (total < 1)
How can I optimize this? Thanks in advance!

select
count(case when total < 1 then 1 end) tunder1,
count(case when total >= 1 and total < 3 then 1 end) t1to3,
count(case when total >= 3 and total < 5 then 1 end) t3to5,
...
from
(
select sum(quantity * price) as total
from orderitems group by orderid
) t;

You need to use HAVING to filter grouped values.

try this:
-- decimal(10,4) keeps the cents; a bare decimal would round prices to whole numbers
DECLARE @YourTable table (OrderID int, Quantity int, Price decimal(10,4))
INSERT INTO @YourTable VALUES (1,1,1.5000)
INSERT INTO @YourTable VALUES (1,2,3.22)
INSERT INTO @YourTable VALUES (2,1,9.99)
INSERT INTO @YourTable VALUES (3,4,0.44)
INSERT INTO @YourTable VALUES (3,2,15.99)
SELECT
SUM(CASE WHEN TotalCost<1 THEN 1 ELSE 0 END) AS tunder1
,SUM(CASE WHEN TotalCost>=1 AND TotalCost<3 THEN 1 ELSE 0 END) AS t1to3
,SUM(CASE WHEN TotalCost>=3 AND TotalCost<5 THEN 1 ELSE 0 END) AS t3to5
,SUM(CASE WHEN TotalCost>=5 THEN 1 ELSE 0 END) AS t5andup
FROM (SELECT
SUM(quantity * price) AS TotalCost
FROM @YourTable
GROUP BY OrderID
GROUP BY OrderID
) dt
OUTPUT:
tunder1 t1to3 t3to5 t5andup
----------- ----------- ----------- -----------
0 0 0 3
(1 row(s) affected)
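The same conditional-aggregation idea, extended to all the ranges listed in the question, can be sketched as below. It runs in SQLite through Python's sqlite3 module rather than SQL Server (an assumption made so the example is self-contained); the column names t10to15 and t15up and the helper name order_buckets are invented for the example.

```python
import sqlite3

# (orderid, quantity, price) rows from the question's orderitems table
ROWS = [(1, 1, 1.50), (1, 2, 3.22), (2, 1, 9.99), (3, 4, 0.44), (3, 2, 15.99)]

QUERY = """
SELECT
    SUM(CASE WHEN total < 1 THEN 1 ELSE 0 END)                 AS tunder1,
    SUM(CASE WHEN total >= 1  AND total < 3  THEN 1 ELSE 0 END) AS t1to3,
    SUM(CASE WHEN total >= 3  AND total < 5  THEN 1 ELSE 0 END) AS t3to5,
    SUM(CASE WHEN total >= 5  AND total < 10 THEN 1 ELSE 0 END) AS t5to10,
    SUM(CASE WHEN total >= 10 AND total < 15 THEN 1 ELSE 0 END) AS t10to15,
    SUM(CASE WHEN total >= 15 THEN 1 ELSE 0 END)                AS t15up
FROM (
    -- one row per order with its total value
    SELECT SUM(quantity * price) AS total
    FROM orderitems
    GROUP BY orderid
) totals
"""

def order_buckets():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orderitems (orderid INT, quantity INT, price REAL)")
    conn.executemany("INSERT INTO orderitems VALUES (?, ?, ?)", ROWS)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row

if __name__ == "__main__":
    print(order_buckets())  # (0, 0, 0, 2, 0, 1)
```

The order totals here are 7.94, 9.99 and 33.74, so two orders land in the 5 to 10 range and one in the 15 and up range. All buckets are computed in a single pass over the per-order totals, which is the point of the single-query approach.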

WITH orders (orderid, quantity, price) AS
(
SELECT 1, 1, 1.5
UNION ALL
SELECT 1, 2, 3.22
UNION ALL
SELECT 2, 1, 9.99
UNION ALL
SELECT 3, 4, 0.44
UNION ALL
SELECT 4, 2, 15.99
),
ranges (bound) AS
(
SELECT 1
UNION ALL
SELECT 3
UNION ALL
SELECT 5
UNION ALL
SELECT 10
UNION ALL
SELECT 15
),
rr AS
(
SELECT bound, ROW_NUMBER() OVER (ORDER BY bound) AS rn
FROM ranges
),
r AS
(
SELECT COALESCE(rf.rn, 0) AS rn, COALESCE(rf.bound, 0) AS f,
rt.bound AS t
FROM rr rf
FULL JOIN
rr rt
ON rt.rn = rf.rn + 1
)
SELECT rn, f, t, COUNT(*) AS cnt
FROM r
JOIN (
SELECT SUM(quantity * price) AS total
FROM orders
GROUP BY
orderid
) o
ON total >= f
AND total < COALESCE(t, 10000000)
GROUP BY
rn, t, f
Output:
rn f t cnt
1 1 3 1
3 5 10 2
5 15 NULL 1
That is, 1 order from $1 to $3, 2 orders from $5 to $10, and 1 order of more than $15.
