SQL Server - Sum entire column AND Group By - sql

Suppose I had the following table in SQL Server:
grp: val: criteria:
a 1 1
a 1 1
b 1 1
b 1 1
b 1 1
c 1 1
c 1 1
c 1 1
d 1 1
Now what I want is to get an output which would basically be:
Select grp, val / [sum(val) for all records] grouped by grp where criteria = 1
So, given the following is true:
Sum of all values = 9
Sum of values in grp(a) = 2
Sum of values in grp(b) = 3
Sum of values in grp(c) = 3
Sum of values in grp(d) = 1
The output would be as follows:
grp: calc:
a 2/9
b 3/9
c 3/9
d 1/9
What would my SQL have to look like??
Thanks!!

You should be able to use something like this which uses sum() over():
select distinct grp,
sum(val) over(partition by grp)
/ (sum(val) over(partition by criteria)*1.0) Total
from yourtable
where criteria = 1
The result is:
| GRP | TOTAL |
------------------------
| a | 0.222222222222 |
| b | 0.333333333333 |
| c | 0.333333333333 |
| d | 0.111111111111 |
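The windowed-sum idea above can be tried without a SQL Server instance. The following is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python; it assumes SQLite 3.25 or later (for window functions), and the alias share is made up for illustration. An empty OVER () is used for the grand total, which is an alternative to partitioning by criteria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (grp TEXT, val INTEGER, criteria INTEGER)")
# Same data as the question: a x2, b x3, c x3, d x1, all with val = 1, criteria = 1
rows = [(g, 1, 1) for g in "aabbbcccd"]
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)

query = """
SELECT DISTINCT grp,
       SUM(val) OVER (PARTITION BY grp) * 1.0
         / SUM(val) OVER () AS share   -- empty OVER () = sum over all filtered rows
FROM yourtable
WHERE criteria = 1
ORDER BY grp
"""
shares = dict(conn.execute(query).fetchall())
for grp, share in shares.items():
    print(grp, round(share, 4))
```

Because the WHERE clause is applied before the window functions, both the per-group sum and the grand total only see rows with criteria = 1.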

I completely agree with @bluefeet's response. This is just a slightly more database-independent approach (it should work with most RDBMS):
select
grp,
sum(val)/cast(total as decimal)
from yourtable
cross join
(
select SUM(val) as total
from yourtable
where criteria = 1
) sumtable
where criteria = 1
GROUP BY grp, total

Related

Product of distinct values before a certain date

I have a table with schema:
date | item_id | factor
----------------------
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 2
20180713 | 1 | 2
20180714 | 1 | 2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 5
20180713 | 2 | 5
20180714 | 2 | 10
The factor for each item_id can change on any date. On each date, I need to calculate the product of all the distinct factors for each item_id up to that date (inclusive), so the final output for the above table should be:
date | id | cumulative_factor
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 0.2
20180713 | 1 | 0.2
20180714 | 1 | 0.2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 0.5
20180713 | 2 | 0.5
20180714 | 2 | 5
Logic:
On 20180711, for id=1, the only distinct factor is 0.1, so the cumulative factor is 0.1.
On 20180714, for id=1, the distinct factors are 0.1 and 2, so the cumulative factor is 0.1*2 = 0.2.
On 20180714, for id=2, the distinct factors are 0.1, 5 and 10, so the cumulative factor is 0.1*5*10 = 5.
I've tried
select a.id, a.date, b.cum_factor
from factor_table a
left join (
select id, date, ISNULL(EXP(SUM(distinct log_factor)),1) as cum_factor
from factor_table
where date < a.date
) b
on a.id=b.id and a.date = b.date
but I get the error
a.date not found
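There isn't a product aggregate function in SQL Server.
However, you can emulate it with EXP ( SUM ( LOG ( value ) ) ), since the sum of the logarithms is the logarithm of the product (this requires positive values).
Please refer to the inline comments in the query.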
; with
cte as
(
-- this cte sets the factor to 1 if it is the same as in the previous row
-- as you wanted `product of distinct`
select *,
factor2 = CASE WHEN LAG(factor) OVER (PARTITION BY id
ORDER BY [date]) IS NULL
OR LAG(factor) OVER (PARTITION BY id
ORDER BY [date]) <> factor
THEN factor
ELSE 1
END
from factor_table
),
cte2 as
(
-- this cte performs SUM( LOG ( factor ) ) only, without the EXP()
select *, factor3 = SUM(LOG(factor2)) OVER (PARTITION BY id
ORDER BY [date])
from cte
)
-- EXP() is not a window function, so it has to be applied separately at another level
select *, EXP(factor3) as cumulative_factor
from cte2
Note: LAG() requires SQL Server 2012 or later
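To make the EXP(SUM(LOG(...))) identity concrete, here is a small Python sketch that reproduces the expected output from the question. It is not the query above: it tracks truly distinct factors per id with a set (the LAG-based query only compares each row with the previous one), and the function name cumulative_distinct_product is hypothetical. Like the SQL version, it only works for positive factors.

```python
import math

# (date, id, factor) rows copied from the question
rows = [
    (20180710, 1, 0.1), (20180711, 1, 0.1), (20180712, 1, 2),
    (20180713, 1, 2), (20180714, 1, 2),
    (20180710, 2, 0.1), (20180711, 2, 0.1), (20180712, 2, 5),
    (20180713, 2, 5), (20180714, 2, 10),
]

def cumulative_distinct_product(rows):
    seen = {}      # id -> set of factors already multiplied in
    log_sum = {}   # id -> running sum of log(factor)
    out = []
    # Process each id in date order
    for date, item_id, factor in sorted(rows, key=lambda r: (r[1], r[0])):
        factors = seen.setdefault(item_id, set())
        if factor not in factors:
            factors.add(factor)
            log_sum[item_id] = log_sum.get(item_id, 0.0) + math.log(factor)
        # exp of the summed logs equals the product of the distinct factors
        out.append((date, item_id, round(math.exp(log_sum[item_id]), 6)))
    return out

result = cumulative_distinct_product(rows)
for row in result:
    print(row)
```

The rounding to six decimals only hides floating-point noise introduced by the log/exp round trip.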
Something seems wrong with multiplying distinct factors. Conceptually, you would express this using window functions:
select f.id, f.date,
exp(sum(distinct log(f.factor)) over (partition by f.id order by f.date)) as cum_factor
from factor_table f;
However, SQL Server does not allow distinct in a windowed aggregate. To get around that limitation, count each factor only the first time it appears:
select f.id, f.date,
exp(sum(log(case when seqnum = 1 then f.factor end)) over (partition by f.id order by f.date)) as cum_factor
from (select t.*,
row_number() over (partition by id, factor order by date) as seqnum
from factor_table t
) f;

Unable to use lag function correctly in sql

I have created a table from multiple tables like this:
Week | Cid | CustId | L1
10 | 1 | 1 | 2
10 | 2 | 1 | 2
10 | 5 | 1 | 2
10 | 4 | 1 | 1
10 | 3 | 2 | 1
4 | 6 | 1 | 2
4 | 7 | 1 | 2
I want the output as:
Repeat
0
1
1
0
0
0
1
So, basically what I want is for each week, if a person (custid) comes in again with the same L1, then the value in the column Repeat should become 1, otherwise 0 (so here, in rows 2 and 3, custid 1 came with L1=2 again, so those rows get 1 in column "Repeat"; however, in row 4, custid 1 came with L1=1, so it gets the value 0).
By the way, the table isn't ordered (as I've shown).
I'm trying to do it as follows:
select t.*,
lag(0, 1, 0) over (partition by week, custid, L1 order by cid) as repeat
from
table;
But this is not giving the output and is giving empty result.
I think you need a case, but I would use row_number() for this:
select t.*,
(case when row_number() over (partition by week, custid, l1 order by cid) = 1
then 0 else 1
end) as repeat
from table t;
This can also be computed without Window functions but by a self-join in the following way:
SELECT a.week, a.cid, a.custid, a.l1,
CASE WHEN b IS NULL THEN 1 ELSE 0 END AS repeat
FROM mytable a NATURAL LEFT JOIN
(SELECT week, min(cid) AS cid, custid, l1 FROM mytable
GROUP BY week,custid,l1) b
ORDER BY week DESC, custid, l1 DESC, cid;
It can be done simply by using count(*) as an analytic function. No case expression or self join is needed. The query is even portable across databases that support analytic functions:
SELECT cust.*, least(count(*)
OVER (PARTITION BY Week, CustId, L1 ORDER BY Cid
ROWS UNBOUNDED PRECEDING) - 1, 1) repeat
FROM cust ORDER BY Week DESC, custId, L1 DESC;
Executing the query on your data results in the following output (the last column is the repeat column):
Week | Cid | CustId | L1 | repeat
10 | 1 | 1 | 2 | 0
10 | 2 | 1 | 2 | 1
10 | 5 | 1 | 2 | 1
10 | 4 | 1 | 1 | 0
10 | 3 | 2 | 1 | 0
4 | 6 | 1 | 2 | 0
4 | 7 | 1 | 2 | 1
Tested on Oracle 11g and PostgreSQL 9.4. Note that the second ORDER BY is optional. See Oracle Language Reference, Analytic Functions for more details.
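For a quick check of the row_number() approach against the sample data, the sketch below runs it in an in-memory SQLite database from Python. It assumes SQLite 3.25 or later; the table name visits and the alias is_repeat are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (week INTEGER, cid INTEGER, custid INTEGER, l1 INTEGER)"
)
# Sample rows from the question: (Week, Cid, CustId, L1)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [(10, 1, 1, 2), (10, 2, 1, 2), (10, 5, 1, 2), (10, 4, 1, 1),
     (10, 3, 2, 1), (4, 6, 1, 2), (4, 7, 1, 2)],
)

query = """
SELECT cid,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY week, custid, l1
                                    ORDER BY cid) = 1
            THEN 0 ELSE 1 END AS is_repeat  -- first row of each group is not a repeat
FROM visits
"""
repeat_by_cid = dict(conn.execute(query).fetchall())
for cid in (1, 2, 5, 4, 3, 6, 7):
    print(cid, repeat_by_cid[cid])
```

The first visit within each (week, custid, l1) group gets 0 and every later visit in that group gets 1, which matches the output requested in the question.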

SQL - Calculate next row based on previous in the same column

I have spent hours trying to solve this with loops and the lag function, but neither solves my problem. I have a table where only the first row of a particular field is populated. Each following row is calculated by subtracting one column from the other in the previous row, and the row after that is based on that result in turn. The original table and the desired result set are shown below:
Original table:
a      b
502.5  33.85
       25.46
       20.83
       133.07
       144.65
       144.65
Result set:
a       b
502.5   33.85
468.65  25.46
443.19  20.83
422.36  133.07
289.29  144.65
144.64  144.65
I have tried several different methods with stored procedures and can get the 2nd row result set but I can't get it to continue and calculate the rest of the fields, it's easy in excel but not so in SQL. Any suggestions?
If your RDBMS supports windowed aggregate functions:
Assuming you have an id or some such thing that is determining the order of your rows (as you indicated there is a first).
You can use the max() over() (in this case min() works instead of max() as well) and sum() over() windowed aggregate functions
select
id
, max(a) over (order by id) - (sum(b) over (order by id) - b) as a
, b
from t
rextester demo: http://rextester.com/MGKM17497
returns:
+----+--------+--------+
| id | a | b |
+----+--------+--------+
| 1 | 502,50 | 33,85 |
| 2 | 468,65 | 25,46 |
| 3 | 443,19 | 20,83 |
| 4 | 422,36 | 133,07 |
| 5 | 289,29 | 144,65 |
| 6 | 144,64 | 144,65 |
+----+--------+--------+
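The same max() over() / sum() over() query can be exercised in SQLite from Python. This is a sketch under the assumptions already stated in the answer (an id column defines the order) plus SQLite 3.25 or later; the rounding to two decimals is only there to hide floating-point noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a REAL, b REAL)")
# Only the first row has a value in column a, as in the question
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 502.5, 33.85), (2, None, 25.46), (3, None, 20.83),
     (4, None, 133.07), (5, None, 144.65), (6, None, 144.65)],
)

query = """
SELECT id,
       -- starting value minus the sum of b over all *previous* rows
       MAX(a) OVER (ORDER BY id) - (SUM(b) OVER (ORDER BY id) - b) AS a,
       b
FROM t
ORDER BY id
"""
balances = [round(a, 2) for _id, a, _b in conn.execute(query)]
print(balances)
```

MAX(a) ignores the NULLs and so carries the starting value forward, while the running SUM(b) minus the current b is the total subtracted so far.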
In case the data looks the way it did before the question was edited:
This solution also assumes that you have id column and order depends on this column
with t(id, a, b) as(
select 1, 502.5, 33.85 union all
select 2, 25.46, null union all
select 3, 20.83, null union all
select 4, 133.07, null union all
select 5, 144.65, null union all
select 6, 144.65, null
)
select case when id = 1 then a else b end as a, case when id = 1 then (select b from t order by id offset 0 rows fetch next 1 rows only) else a end as b from (
select id, a, lag((select a from t order by id offset 0 rows fetch next 1 rows only)-s) over(order by id) as b from (
select id, a, sum(case when b is null then a else b end ) over(order by id) s
from t
) tt
) ttt

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
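As a sanity check of the LAG plus running-sum technique with the irregular row numbers mentioned in the comments, here is a sketch that runs it in SQLite from Python. It assumes SQLite 3.25 or later and uses CASE instead of IIF, which older SQLite versions lack; the sample sequence is invented to match the "1,1,2,2,3,4" and "1,2,4,6" patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER)")
# Three groups: 1,1,2,2,3,4 | 1,2,4,6 | 1
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6, 1]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    list(enumerate(row_numbers, start=1)),
)

query = """
WITH T1 AS (
    SELECT ID, RowNumber,
           LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS (
    -- a higher number followed by a lower one starts a new group
    SELECT ID, RowNumber,
           CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                THEN 1 ELSE 0 END AS NewGroup
    FROM T1
)
SELECT ID,
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
ORDER BY ID
"""
groups = [grp for _id, grp in conn.execute(query)]
print(groups)
```

Repeated row numbers (1,1 or 2,2) do not start a new group because the flag only fires when the previous value is strictly greater.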
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
You can also use a recursive CTE:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

Sequence grouping in TSQL

I'm trying to group data in sequence order. Say I have the following table:
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 1 | C |
| 1 | B |
I need the SQL query to output the following:
| 1 | A | 1 |
| 1 | A | 1 |
| 1 | B | 2 |
| 1 | B | 2 |
| 1 | C | 3 |
| 1 | B | 4 |
The last column is a group number that is incremented in each group. The important thing to note is that rows 3, 4 and 5 contain the same data which should be grouped into 2 groups not 1.
For MSSQL2008:
Suppose you have a SampleStatuses table:
Status Date
A 2014-06-11
A 2014-06-14
B 2014-06-25
B 2014-07-01
A 2014-07-06
A 2014-07-19
B 2014-07-21
B 2014-08-13
C 2014-08-19
you write the following:
;with
cte as (
select top 1 RowNumber, 1 as GroupNumber, [Status], [Date] from SampleStatuses order by RowNumber
union all
select c1.RowNumber,
case when c2.Status <> c1.Status then c2.GroupNumber + 1 else c2.GroupNumber end as GroupNumber, c1.[Status], c1.[Date]
from cte c2 join SampleStatuses c1 on c1.RowNumber = c2.RowNumber + 1
)
select * from cte;
you get this result:
RowNumber GroupNumber Status Date
1 1 A 2014-06-11
2 1 A 2014-06-14
3 2 B 2014-06-25
4 2 B 2014-07-01
5 3 A 2014-07-06
6 3 A 2014-07-19
7 4 B 2014-07-21
8 4 B 2014-08-13
9 5 C 2014-08-19
The normal way you would do what you want is the dense_rank function:
select key, val,
dense_rank() over (order by key, val)
from t
However, this does not address the problem of separating the last groups.
To handle this, I have to assume there is an "id" column. Tables, in SQL, do not have an ordering, so I need the ordering. If you are using SQL Server 2012, then you can use the lag() function to get what you need. Use lag() to flag each row whose key, val pair differs from the previous row, then take a cumulative sum of those flags:
with t1 as (
select id, key, val,
(case when key = lag(key, 1) over (order by id) and
val = lag(val, 1) over (order by id)
then 0
else 1
end) as StartsNewGroup
from t
)
select id, key, val,
sum(StartsNewGroup) over (order by id) as GroupNum
from t1
Without SQL Server 2012 (which has cumulative sums), you have to do a self-join to identify the beginning of each group:
select t.*
from t left outer join
t tprev
on t.id = tprev.id + 1 and t.key = tprev.key and t.val = tprev.val
where tprev.id is null
With this, assign each row the id of the most recent group start (the largest starting id that does not exceed the row's id) using a join:
select t.id, t.key, t.val,
max(tgrp.id) as GroupId
from t left outer join
(select t.*
from t left outer join
t tprev
on t.id = tprev.id + 1 and t.key = tprev.key and t.val = tprev.val
where tprev.id is null
) tgrp
on t.id >= tgrp.id
group by t.id, t.key, t.val
If you want these to be consecutive numbers, then put them in a subquery and use dense_rank().
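The self-join idea (no window functions) can be sketched in SQLite from Python as follows. The columns are named k and v here instead of key and val to avoid keyword clashes, and the id column is the assumed ordering column; each row is given the largest group-start id at or before it, and the final renumbering is done in Python rather than with dense_rank().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, k INTEGER, v TEXT)")
# Sample data from the question, with an assumed id giving the row order
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "A"), (3, 1, "B"),
     (4, 1, "B"), (5, 1, "C"), (6, 1, "B")],
)

query = """
SELECT cur.id, MAX(starts.id) AS group_start
FROM t AS cur
JOIN (
    -- rows with no identical immediate predecessor start a new group
    SELECT s.id
    FROM t AS s
    LEFT JOIN t AS prev
           ON s.id = prev.id + 1 AND s.k = prev.k AND s.v = prev.v
    WHERE prev.id IS NULL
) AS starts
  ON starts.id <= cur.id
GROUP BY cur.id
ORDER BY cur.id
"""
group_start = [start for _id, start in conn.execute(query)]

# Turn the start ids (1, 3, 5, 6) into consecutive group numbers
numbering = {s: n for n, s in enumerate(sorted(set(group_start)), start=1)}
group_num = [numbering[s] for s in group_start]
print(group_num)
```

The join on starts.id <= cur.id is quadratic in the worst case, so on large tables the window-function version is likely the better choice where it is available.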
This will give you rankings on your columns.
It will not give you 1,2,3 however.
It will give you 1,3,6 etc based on how many in each grouping
select
a,
b,
rank() over (order by a,b)
from
table1
See this SQLFiddle for a clearer idea of what I mean: http://sqlfiddle.com/#!3/0f201/2/0