SQL Sum over partition "NOT" by column - sql
I need to build analytical SQL queries in which the client can specify any metrics (sums of the values in specific columns) and dimensions (columns to group by).
Assume that I have a table with columns hour, dim_a, dim_b, metric_a, metric_b, metric_c, with the values shown in the CSV below:
hour,dim_a,dim_b,metric_a,metric_b
0,A,X,4,4
0,A,Y,4,24
0,B,Y,20,24
1,B,Y,21,35
1,A,Y,4,35
1,C,Y,10,35
2,B,Y,21,30
2,C,Y,3,30
2,A,Y,6,30
Take a look at metric_b. This metric is always the same when the values of hour and dim_b are the same, regardless of the value of dim_a. For example:
1,B,Y,21,35
1,A,Y,4,35
1,C,Y,10,35
If we select the columns hour, dim_b, metric_b and take the distinct values, the table will look like this:
hour,dim_b,metric_b
0,X,4
0,Y,24
1,Y,35
2,Y,30
All aggregations of metric_b should be computed from these values.
I would like to run analytical queries over this data, grouping by specific dimensions and aggregating the metrics, with a special aggregation for metric_b.
When I want to group by hour, dim_a, dim_b and see the metrics metric_a and metric_b, the expected result is:
hour,dim_a,dim_b,metric_a,metric_b
0,A,X,4,4
0,A,Y,4,24
0,B,Y,20,24
1,B,Y,21,35
1,A,Y,4,35
1,C,Y,10,35
2,B,Y,21,30
2,C,Y,3,30
2,A,Y,6,30
When I want to group by dim_a, dim_b and see the metrics metric_a and metric_b, the expected result is:
dim_a,dim_b,metric_a,metric_b
A,X,4,4
A,Y,14,89
B,Y,62,89
C,Y,13,89
Value of metric_b is calculated from 89 = 24 + 35 + 30; 4 = 4
When I want to group by dim_b and see the metrics metric_a and metric_b, the expected result is:
dim_b,metric_a,metric_b
X,4,4
Y,89,89
Value of metric_b is calculated from 89 = 24 + 35 + 30; 4 = 4
And finally, when I want to group by dim_a and see the metrics metric_a and metric_b, the expected result is:
dim_a,metric_a,metric_b
A,18,93
B,62,93
C,13,93
Value of metric_b is calculated from 93 = 24 + 35 + 30 + 4
So the aggregation of metric_b should be a sum of metric_b that does not take dim_a into account as a grouping column, but does take every other grouping column into account. Is there SQL syntax that could help me do this?
In addition, these queries are going to be run on AWS Redshift. There are 20 metrics and 16 dimensions, so 36 columns, and there will be up to 100 billion rows.
For number 2:
    SELECT *
    FROM (
        SELECT dim_a
            ,dim_b
            ,sum(metric_a) a
        FROM dbo.Table_2 t
        GROUP BY dim_a
            ,dim_b
        ) a
    CROSS APPLY (
        SELECT sum(metric_b) b
        FROM (
            SELECT DISTINCT metric_b
                ,hour
                ,dim_b
            FROM dbo.Table_2
            ) t2
        WHERE t2.dim_b = a.dim_b
        ) c
For number 3:
    SELECT *
    FROM (
        SELECT dim_b
            ,sum(metric_a) a
        FROM dbo.Table_2 t
        GROUP BY dim_b
        ) a
    CROSS APPLY (
        SELECT sum(metric_b) b
        FROM (
            SELECT DISTINCT metric_b
                ,hour
                ,dim_b
            FROM dbo.Table_2
            ) t2
        WHERE t2.dim_b = a.dim_b
        ) c
For number 4:
    SELECT *
    FROM (
        SELECT dim_a
            ,sum(metric_a) a
        FROM dbo.Table_2 t
        GROUP BY dim_a
        ) a
    CROSS APPLY (
        SELECT sum(metric_b) b
        FROM (
            SELECT DISTINCT metric_b
                ,hour
                ,dim_b
            FROM dbo.Table_2
            ) t2
        ) c
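The queries above rely on CROSS APPLY, which is SQL Server syntax; as far as I know Redshift does not accept it. The same idea can be sketched with ordinary joins: deduplicate metric_b down to (hour, dim_b), sum it over the requested dimensions minus dim_a, and join that back to the metric_a aggregate. The sketch below runs the generated SQL against an in-memory SQLite database from Python so that it can be checked against the expected results in the question. The table name facts and the helper query_by_dims are hypothetical, and this has not been tested on Redshift or at the stated data volume.

```python
import sqlite3

# Sample rows from the question: hour, dim_a, dim_b, metric_a, metric_b
ROWS = [
    (0, "A", "X", 4, 4), (0, "A", "Y", 4, 24), (0, "B", "Y", 20, 24),
    (1, "B", "Y", 21, 35), (1, "A", "Y", 4, 35), (1, "C", "Y", 10, 35),
    (2, "B", "Y", 21, 30), (2, "C", "Y", 3, 30), (2, "A", "Y", 6, 30),
]
ALLOWED_DIMS = ("hour", "dim_a", "dim_b")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (hour INT, dim_a TEXT, dim_b TEXT, metric_a INT, metric_b INT)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", ROWS)

# metric_b is defined per (hour, dim_b), so deduplicate to that grain first.
DEDUP = "(SELECT DISTINCT hour, dim_b, metric_b FROM facts) d"


def query_by_dims(dims):
    """Group by dims; metric_b is summed over distinct (hour, dim_b), ignoring dim_a."""
    if not dims or any(d not in ALLOWED_DIMS for d in dims):
        raise ValueError("unknown dimension")  # names are spliced into SQL, so whitelist them
    b_dims = [d for d in dims if d != "dim_a"]  # dimensions metric_b depends on
    sel = ", ".join(f"a.{d}" for d in dims)
    grp = ", ".join(dims)
    if b_dims:
        b_grp = ", ".join(b_dims)
        on = " AND ".join(f"a.{d} = b.{d}" for d in b_dims)
        join = (f"JOIN (SELECT {b_grp}, SUM(metric_b) AS metric_b "
                f"FROM {DEDUP} GROUP BY {b_grp}) b ON {on}")
    else:
        # Only dim_a was requested: metric_b collapses to one grand total.
        join = f"CROSS JOIN (SELECT SUM(metric_b) AS metric_b FROM {DEDUP}) b"
    sql = (f"SELECT {sel}, a.metric_a, b.metric_b "
           f"FROM (SELECT {grp}, SUM(metric_a) AS metric_a FROM facts GROUP BY {grp}) a "
           f"{join} ORDER BY {sel}")
    return conn.execute(sql).fetchall()


print(query_by_dims(["dim_a", "dim_b"]))
print(query_by_dims(["dim_a"]))
```

Because the metric_b side is aggregated once per distinct combination of the remaining dimensions rather than once per outer row, the join form should also be easier for a planner to handle than a correlated subquery, though that is an expectation rather than a measured result.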
Related
Oracle SQL Group by and sum with multiple conditions
I attached a capture of two tables: the left table is the result of another "Select" query, and the right table is the result I want to get from the left table. The right table can be created by following these conditions: When the same Unit has all positive or all negative energy values, the result remains the same. When the same Unit has both positive and negative energy values, then: make a sum of all Energy for that Unit (-50+15+20 = -15), then take the maximum of the absolute value of the Energy, e.g. max(abs(energy))=50, and take the price for that value. I use Oracle SQL. I really appreciate the help in this matter! http://sqlfiddle.com/#!4/eb85a/12
This returns the desired result: the signs CTE finds out whether there are positive/negative values, as well as the maximum ABS energy value. Then there's a union of two selects: one that returns the "original" rows (if the count of distinct signs is 1), and one that returns the "calculated" values, as you described.

    with
    signs as
      (select unit,
              count(distinct sign(energy)) cnt,
              max(abs(energy)) max_abs_ene
       from tab
       group by unit
      )
    select t.unit, t.price, t.energy
    from tab t join signs s on t.unit = s.unit
    where s.cnt = 1
    union all
    select t.unit, t2.price, sum(t.energy)
    from tab t join signs s on t.unit = s.unit
    join tab t2 on t2.unit = s.unit and abs(t2.energy) = s.max_abs_ene
    where s.cnt = 2
    group by t.unit, t2.price
    order by unit;

    UNIT                      PRICE     ENERGY
    -------------------- ---------- ----------
    A                            20        -50
    A                            50        -80
    B                            13        -15

Though, what do you expect if there was yet another "B" unit row with energy = +50? Then two rows would have the same MAX(ABS(ENERGY)) value.
A union all might be the simplest solution:

    with t as (
          select t.*,
                 max(energy) over (partition by unit) as max_energy,
                 min(energy) over (partition by unit) as min_energy
          from t
         )
    select unit, price, energy
    from t
    where (max_energy > 0 and min_energy > 0) or
          (max_energy < 0 and min_energy < 0)
    union all
    select unit,
           max(price) keep (dense_rank first order by abs(energy) desc),
           sum(energy)
    from t
    where max_energy > 0 and min_energy < 0
    group by unit;
SQL Get closest value to a number
I need to find the closest value to each number in the Divide column from the Quantity column, and put the value found in the Value column for both Quantities. Example: In the Divide column, the value 5166 would be closest to the Quantity column value 5000. To keep from using those two values more than once, I need to place the value 5000 in the Value column for both numbers, as in the example below. Also, is it possible to do this without a loop?

    Quantity  Divide  Rank  Value
    15500     5166    5     5000
    1250      416     5     0
    5000      1666    5     5000
    12500     4166    4     0
    164250    54750   3     0
    5250      1750    3     0
    6250      2083    3     0
    12250     4083    3     0
    1750      583     2     0
    17000     5666    2     0
    2500      833     2     0
    11500     3833    2     0
    1250      416     1     0
There are a couple of answers here, but they both use CTEs/complex subqueries. There is a much simpler/faster way: just do a couple of self joins and a group by. https://www.db-fiddle.com/f/rM268EYMWuK7yQT3gwSbGE/0

    select min(min.quantity) as minQuantityOverDivide
         , t1.divide
         , max(max.quantity) as maxQuantityUnderDivide
         , case when (abs(t1.divide - coalesce(min(min.quantity),0))
                      < abs(t1.divide - coalesce(max(max.quantity),0)))
                then max(max.quantity)
                else min(min.quantity)
           end as cloestQuantity
    from t1
    left join (select quantity from t1) min on min.quantity >= t1.divide
    left join (select quantity from t1) max on max.quantity < t1.divide
    group by t1.divide
If I understood the requirements, 5166 is not closest to 5000 - it's closest to 5250 (delta of 166 vs 84). The corresponding query, without loops, would be the following (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=be434e67ba73addba119894a98657f17). I added a Value_Rank, as it's not clear whether you want Rank to be kept or recomputed.

    select
      Quantity, Divide, Rank, Value,
      dense_rank() over(order by Value) as Value_Rank
    from (
      select
        Quantity, Divide, Rank,
        --
        case
          when abs(Quantity_let_delta) < abs(Quantity_get_delta) then Divide + Quantity_let_delta
          else Divide + Quantity_get_delta
        end as Value
      from (
        select
          so.Quantity, so.Divide, so.Rank,
          -- There is no LessEqualThan, assume GreaterEqualThan
          max(isnull(so_let.Quantity, so_get.Quantity)) - so.Divide as Quantity_let_delta,
          -- There is no GreaterEqualThan, assume LessEqualThan
          min(isnull(so_get.Quantity, so_let.Quantity)) - so.Divide as Quantity_get_delta
        from SO so
        left outer join SO so_let
          on so_let.Quantity <= so.Divide
        --
        left outer join SO so_get
          on so_get.Quantity >= so.Divide
        group by so.Quantity, so.Divide, so.Rank
      ) so
    ) result

Or, if by closest you mean the previous closest (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b41fb1a3fc11039c7f82926f8816e270):

    select
      Quantity, Divide, Rank, Value,
      dense_rank() over(order by Value) as Value_Rank
    from (
      select
        so.Quantity, so.Divide, so.Rank,
        -- There is no LessEqualThan, assume 0
        max(isnull(so_let.Quantity, 0)) as Value
      from SO so
      left outer join SO so_let
        on so_let.Quantity <= so.Divide
      group by so.Quantity, so.Divide, so.Rank
    ) result
You don't need a loop. Basically, you need to find the lowest difference between the divide and all the quantities (first CTE), then use this distance to find the corresponding record (second CTE), and then join with your initial table to get the converted values (final select).

    ;with cte as (
        select t.Divide, min(abs(t2.Quantity-t.Divide)) as ClosestQuantity
        from #t1 as t
        cross apply #t1 as t2
        group by t.Divide
    )
    ,cte2 as (
        select distinct t.Divide, t2.Quantity
        from #t1 as t
        cross apply #t1 as t2
        where abs(t2.Quantity-t.Divide) =
              (select ClosestQuantity from cte as c where c.Divide = t.Divide)
    )
    select t.Quantity, cte2.Quantity as Divide, t.Rank, t.Value
    from #t1 as t
    left outer join cte2 on t.Divide = cte2.Divide
sql query - difference between the row values of same column
Can anybody tell me how to calculate the difference between the rows of the same column?

    ID  DeviceID  Reading  Date        Flag
    1   2         10       12/02/2015  1
    2   3         08       12/02/2015  1
    3   2         12       12/02/2015  1
    4   2         20       12/02/2015  0
    5   4         10       12/02/2015  0
    6   2         19       12/02/2015  0

In the table above, I want to calculate the differences between consecutive Readings for DeviceID 2 for some date, say 12/02/2015, for example (12-10 = 2), (20-12 = 8), (19-20 = -1), and then sum up these differences, i.e. 2+8+(-1) = 9.
If you use MS Access, you can try the following. I made 4 queries in MS Access.

Query1 gets the data for deviceId=2 and date=12/2/2015:

    select id, reading from table1 where deviceid=2 and date=#12/2/2015#;

Query2 gets the row number from Query1:

    select (select count(*) from query1 where a.id>=id) as rowno, a.reading from query1 a;

Query3 gets the difference between consecutive values of the reading field from Query2:

    select (tbl2.reading-tbl1.reading) as diff
    from query2 tbl1
    left join query2 tbl2 on tbl1.rowno=tbl2.rowno-1

And the final query gets the sum of the differences from Query3:

    SELECT sum(diff) as Total_Diff FROM Query3;

But if you use SQL Server, you can use this query (see the sqlfiddle example):

    ;with tbl as(
        select row_number()over(order by id) as rowno, reading
        from table1
        where deviceid=2 and date='20150212'
    )
    select sum(diff) as sum_diff
    from (
        select (b.reading-a.reading) as diff
        from tbl a
        left join tbl b on a.rowno=b.rowno-1
    ) tbl_diff
You can try this (replace Table1 with your table name):

    SELECT Sum([Diffs].[Difference]) AS FinalReading
    FROM (
        SELECT IDs.DeviceID,
               [Table1].Reading AS NextReading,
               Table1_1.Reading AS PrevReading,
               [Table1].Reading-Table1_1.Reading AS Difference
        FROM (
            (
                SELECT [Table1].DeviceID,
                       [Table1].ID,
                       CLng(Nz(DMax("ID","Table1","[DeviceID] = " & [DeviceID] & " And [ID] < " & [ID]),0)) AS PrevID
                FROM Table1
                WHERE DeviceID = 2
            ) AS IDs
            INNER JOIN Table1 ON IDs.ID=[Table1].ID)
        INNER JOIN Table1 AS Table1_1 ON IDs.PrevID=Table1_1.ID
    ) AS Diffs;

The IDs table expression calculates the previous ID for the DeviceID in question. (I put the WHERE clause in this table expression, but you can move it to the outer one if you want to calculate the FinalReadings for ALL devices at once, then filter at the end. Less efficient but more flexible.) We join back to the original table on the ID and PrevID from the inner table expression, get their Reading values, and perform the difference operation in the Diffs table expression. The final outer query just sums the Difference values from each row.
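The "find the previous ID, then subtract" approach described above can be sketched in portable SQL with a correlated MAX(id) subquery in place of the Access-specific DMax call. The example below loads the six rows from the question into an in-memory SQLite database; the lower-case column names and the helper sum_of_differences are assumptions, and the date is stored as the literal text from the question because its day/month order is not stated.

```python
import sqlite3

# id, deviceid, reading, date, flag (rows copied from the question)
READINGS = [
    (1, 2, 10, "12/02/2015", 1),
    (2, 3, 8, "12/02/2015", 1),
    (3, 2, 12, "12/02/2015", 1),
    (4, 2, 20, "12/02/2015", 0),
    (5, 4, 10, "12/02/2015", 0),
    (6, 2, 19, "12/02/2015", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INT, deviceid INT, reading INT, date TEXT, flag INT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", READINGS)

# Pair every row with the previous row of the same device and date
# (the row with the largest smaller id), then sum the differences.
SUM_DIFF_SQL = """
SELECT SUM(cur.reading - prev.reading)
  FROM table1 AS cur
  JOIN table1 AS prev
    ON prev.id = (SELECT MAX(p.id)
                    FROM table1 AS p
                   WHERE p.deviceid = cur.deviceid
                     AND p.date = cur.date
                     AND p.id < cur.id)
 WHERE cur.deviceid = ? AND cur.date = ?
"""


def sum_of_differences(device_id, date_text):
    """Sum of consecutive reading differences for one device on one date."""
    return conn.execute(SUM_DIFF_SQL, (device_id, date_text)).fetchone()[0]


print(sum_of_differences(2, "12/02/2015"))
```

For DeviceID 2 the pairs are (10, 12), (12, 20) and (20, 19), giving 2 + 8 - 1 = 9. Since the sum telescopes, it also equals the last reading minus the first (19 - 10).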
Joining next Sequential Row
I am planning an SQL statement right now and would like someone to look over my thoughts. This is my table:

    id  stat     period
    --- -------  --------
    1   10       1/1/2008
    2   25       2/1/2008
    3   5        3/1/2008
    4   15       4/1/2008
    5   30       5/1/2008
    6   9        6/1/2008
    7   22       7/1/2008
    8   29       8/1/2008

Create table:

    CREATE TABLE tbstats
    (
        id INT IDENTITY(1, 1) PRIMARY KEY,
        stat INT NOT NULL,
        period DATETIME NOT NULL
    )
    go

    INSERT INTO tbstats (stat, period)
    SELECT 10, CONVERT(DATETIME, '20080101') UNION ALL
    SELECT 25, CONVERT(DATETIME, '20080102') UNION ALL
    SELECT 5, CONVERT(DATETIME, '20080103') UNION ALL
    SELECT 15, CONVERT(DATETIME, '20080104') UNION ALL
    SELECT 30, CONVERT(DATETIME, '20080105') UNION ALL
    SELECT 9, CONVERT(DATETIME, '20080106') UNION ALL
    SELECT 22, CONVERT(DATETIME, '20080107') UNION ALL
    SELECT 29, CONVERT(DATETIME, '20080108')
    go

I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps'.

Thoughts: I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps. By aliasing the table, I could incorporate it into the SQL query twice, then join the two aliases together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2, so it should join on the row with an id of 2 in the second aliased table. And so on. Now I would simply subtract one from the other. Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction, regardless of which side of the expression is the higher figure. Is there an easier way to achieve what I want?
The lead analytic function should do the trick:

    SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
    FROM tbstats
The average value of the gaps can be calculated by taking the difference between the last value and the first value and dividing by one less than the number of elements:

    select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1)
    from (select period, stat,
                 row_number() over (order by period) as seqnum,
                 count(*) over () as num
          from tbstats
         ) t
    where seqnum = num or seqnum = 1;

Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
You can also achieve this with a join:

    SELECT t1.period, t1.stat, t1.stat - t2.stat gap
    FROM #tbstats t1
    LEFT JOIN #tbstats t2 ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.

    select
        x.id thisStatId,
        LAG(x.id) OVER (ORDER BY x.id) lastStatId,
        x.stat thisStatValue,
        LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
        x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
    from tbStats x
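As a rough check of the LEAD() approach, the sketch below loads the eight stat values from the question into an in-memory SQLite database, computes each gap, and then averages the absolute gaps as the question describes. It assumes a SQLite build with window function support (3.25 or later) and orders by id as a stand-in for period, which sorts the same way in the sample data.

```python
import sqlite3

# stat values from the question, in period order
STATS = [10, 25, 5, 15, 30, 9, 22, 29]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbstats (id INTEGER PRIMARY KEY, stat INT NOT NULL)")
conn.executemany("INSERT INTO tbstats (stat) VALUES (?)", [(s,) for s in STATS])

# Gap between each row and the next one; the last row has no successor,
# so LEAD() returns NULL there.
GAP_SQL = """
SELECT stat - LEAD(stat) OVER (ORDER BY id) AS gap
  FROM tbstats
 ORDER BY id
"""
gaps = [row[0] for row in conn.execute(GAP_SQL)]

# AVG() skips the NULL gap of the last row, so this divides by 7, not 8.
MEAN_SQL = """
SELECT AVG(ABS(gap))
  FROM (SELECT stat - LEAD(stat) OVER (ORDER BY id) AS gap FROM tbstats) AS g
"""
mean_abs_gap = conn.execute(MEAN_SQL).fetchone()[0]

print(gaps)
print(mean_abs_gap)
```

Note that the mean of the absolute gaps (101 / 7 here) differs from the mean of the signed gaps, which telescopes to (first - last) / (n - 1); which one is wanted depends on how the 'gaps' are meant to be interpreted.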
How to SELECT top N rows that sum to a certain amount?
Suppose:

    MyTable
    --
    Amount
    1
    2
    3
    4
    5

MyTable has only one column, Amount, with 5 rows. They are not necessarily in increasing order. How can I create a function which takes a @SUM INT and returns the TOP N rows that sum to this amount? So for input 6, I want

    Amount
    1
    2
    3

since 1 + 2 + 3 = 6. 2 + 4 or 1 + 5 won't work, since I want the TOP N ROWS. For 7/8/9/10, I want

    Amount
    1
    2
    3
    4

I'm using MS SQL Server 2008 R2, if this matters.
Saying "top N rows" is indeed ambiguous when it comes to relational databases. I assume that you want to order by "amount" ascending. I would add a second column (to a table or view) like "sum_up_to_here", and create something like this:

    create view mytable_view as
    select mt1.amount, coalesce(sum(mt2.amount), 0) as sum_up_to_here
    from mytable mt1
    left join mytable mt2 on (mt2.amount < mt1.amount)
    group by mt1.amount

or:

    create view mytable_view as
    select mt1.amount,
           coalesce((select sum(amount) from mytable where amount < mt1.amount), 0) as sum_up_to_here
    from mytable mt1

The coalesce is needed because the smallest amount has no smaller rows to sum, so its sum would otherwise be NULL and the row would be filtered out. Then I would select the final rows:

    select amount from mytable_view where sum_up_to_here < (some value)

If you are not concerned about performance, you may of course run it in one query:

    select amount
    from (
        select mt1.amount, coalesce(sum(mt2.amount), 0) as sum_up_to_here
        from mytable mt1
        left join mytable mt2 on (mt2.amount < mt1.amount)
        group by mt1.amount
    ) t
    where sum_up_to_here < 20
One approach:

    select t1.amount
    from MyTable t1
    left join MyTable t2 on t1.amount > t2.amount
    group by t1.amount
    having coalesce(sum(t2.amount),0) < 7

SQLFiddle here.
In SQL Server you can use CTEs to make it pretty simple to read. Here is a CTE I wrote to sum up totals used in sequence. The CTE is similar to the joins above, and holds the total up to any given index. Outside of the CTE I join it back to the original table so I can select it along with other fields.

    ;with summrp as (
        select m1.idx, sum(m2.QtyReq) as sumUsed
        from #mrpe m1
        join #mrpe m2 on m2.idx <= m1.idx
        group by m1.idx
    )
    select RefNum, RefLineSuf, QtyReq, ProjectedDate, sumUsed
    from #mrpe m
    join summrp on summrp.idx=m.idx
In SQL Server 2012 you can use this shortcut to get a result like Grzegorz's.

    SELECT amount
    FROM (
        SELECT *
             , SUM(amount) OVER (ORDER BY amount ASC) AS total
        from demo
    ) T
    WHERE total <= 6

A fiddle in the hand... http://sqlfiddle.com/#!6/b8506/6
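The running-total idea can be checked against the examples in the question with a small sketch. It keeps a row while the sum of the rows before it is still below the target, which is a slightly different filter from total <= 6 above and is chosen here because it reproduces the asker's expected output for inputs 7 to 10 as well. It runs on an in-memory SQLite database and assumes window function support (SQLite 3.25 or later); the helper top_rows_summing_to is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (amount INT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(a,) for a in (3, 1, 5, 2, 4)])

# sum_before = running total of all smaller amounts (running total minus
# the current row). A row is kept while that prior total is below the target.
TOP_N_SQL = """
SELECT amount
  FROM (SELECT amount,
               SUM(amount) OVER (ORDER BY amount
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                 - amount AS sum_before
          FROM mytable) AS t
 WHERE sum_before < ?
 ORDER BY amount
"""


def top_rows_summing_to(target):
    """Smallest amounts, in ascending order, whose total first reaches target."""
    return [row[0] for row in conn.execute(TOP_N_SQL, (target,))]


print(top_rows_summing_to(6))
print(top_rows_summing_to(7))
```

The explicit ROWS frame is used so that duplicate amounts would each get their own running total; with the default frame, tied rows share one total.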