How to compute the diff between records?

My table of records looks like this:
ym cnt
200901 57
200902 62
200903 67
...
201001 84
201002 75
201003 75
...
201101 79
201102 77
201103 80
...
I want to compute the difference between each month and the previous month.
The result should look like this:
ym cnt diff
200901 57 57
200902 62 5 (62 - 57)
200903 67 5 (67 - 62)
...
201001 84 ...
201002 75
201003 75
...
201101 79
201102 77
201103 80
...
Can anyone tell me how to write SQL that gets this result with good performance?
UPDATE:
Sorry for the terse wording.
My solution is:
Step 1: load the current month's data into temp table 1
Step 2: load the previous month's data into temp table 2
Step 3: left join the two tables to compute the result
Temp_Table1
SELECT (ym - 1) as ym , COUNT( item_cnt ) as cnt
FROM _table
GROUP BY (ym - 1 )
order by ym
Temp_Table2
SELECT ym , COUNT( item_cnt ) as cnt
FROM _table
GROUP BY ym
order by ym
select ym , (b.cnt - a.cnt) as diff from Temp_Table2 a
left join Temp_Table1 b
on a.ym = b.ym
If I want to compare a month in this year with the same month last year, I only need to change ym - 1 to ym - 100.
But the GROUP BY key is actually not only ym: there are up to 15 keys and up to 100 million records.
So I am looking for a solution whose source is easy to maintain and that performs well.

For MSSQL, this query references the table only once, so it may (or may not) be faster than a left join, which references the table twice:
-- ================
-- sample data
-- ================
create table #t
(
ym varchar(6),
cnt int
)
insert into #t values ('200901', 57)
insert into #t values ('200902', 62)
insert into #t values ('200903', 67)
insert into #t values ('201001', 84)
insert into #t values ('201002', 75)
insert into #t values ('201003', 75)
-- ===========================
-- solution
-- ===========================
select
ym2,
diff = case when cnt1 is null then cnt2
when cnt2 is null then cnt1
else cnt2 - cnt1
end
from
(
select
ym1 = max(case when k = 2 then ym end),
cnt1 = max(case when k = 2 then cnt end),
ym2 = max(case when k = 1 then ym end),
cnt2 = max(case when k = 1 then cnt end)
from
(
select
*,
rn = row_number() over(order by ym)
from #t
) t1
cross join
(
select k = 1 union all select k = 2
) t2
group by rn + k
) t
where ym2 is not null

"Can anyone tell me how to write SQL to get the result"
Absolutely. Simply get the row with the next highest date, and subtract.
"and with good performance?"
No. Relational databases are not really meant to be traversed linearly, and even using indexes appropriately would require a virtual linear traversal.
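
On engines that support window functions (SQL Server 2012 and later, for example), the previous row's value can be read directly with LAG, which avoids both the self-join and the temp tables. This is a minimal sketch, not a tuned solution, and it assumes the two-column table from the question is named records (a hypothetical name):

```sql
-- Assumes a table records(ym, cnt) as in the question; the table name is hypothetical.
SELECT ym,
       cnt,
       -- LAG reads cnt from the previous row in ym order. The third argument (0)
       -- is the default used for the first row, so its diff equals its own cnt.
       cnt - LAG(cnt, 1, 0) OVER (ORDER BY ym) AS diff
FROM records
ORDER BY ym;
```

For the year-over-year comparison, LAG(cnt, 12, 0) would read the row twelve positions back, which is only correct if no month is missing from the table. With additional grouping keys, a PARTITION BY on those keys inside the OVER clause keeps each group's months separate.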

Related

SQL Server : group by sum of column

I need to aggregate data by one column which contains numeric data.
I have data like:
ID | Amount
---+-------
1 | 44
2 | 15
3 | 16
4 | 8
5 | 16
The result I expect is:
ID | Amount
---+-------
1 | 44
2 | 31
4 | 24
The query should walk the rows in ID order and group consecutive rows so that each group's total Amount is at most 32. A single row whose Amount is greater than 32 forms a group on its own. The result should contain MIN(ID) and SUM(Amount) for each group; the sum can exceed 32 only when the group consists of a single row.

The only way that I know how to accomplish this is using iteration (although in your case if you have enough single values over 32, then you might be able to use a more efficient approach).
Iteration in SQL Server queries is handled by recursive CTEs (once you forswear cursors):
with v as (
select *
from (values (1, 44), (2, 15), (3, 16), (4, 8), (5, 16) ) v(id, amount)
),
t as (
select v.*, row_number() over (order by id) as seqnum
from v
),
cte as (
select seqnum, id, amount, id as grp
from t
where seqnum = 1
union all
-- each step picks up the row that follows the one already in cte
select t.seqnum, t.id,
(case when t.amount + cte.amount > 32 then t.amount else t.amount + cte.amount end) as amount,
(case when t.amount + cte.amount > 32 then t.id else cte.grp end) as grp
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select grp, max(amount)
from cte
group by grp;
I should note that the use of max(amount) in the outer query assumes that the values are never negative. A slight modification can handle that situation.
Also, the intermediate result using t is not strictly necessary for the data you have provided. It ensures that the columns used in the join actually have no gaps.

You can try this version, which assigns row numbers first and then joins each row to the previous one in a recursive CTE. Whenever the running sum would exceed 32, a new group starts.
with rownums as (select t.*,row_number() over(order by id) as rnum from t)
,cte(rnum,id,amount,runningsum,grp) as (select rnum,id,amount,amount,id from rownums where rnum=1
union all
select t.rnum,t.id,t.amount
,case when c.runningsum+t.amount > 32 then t.amount else c.runningsum+t.amount end
,case when c.runningsum+t.amount > 32 then t.id else c.grp end
from cte c
join rownums t on t.rnum=c.rnum+1
)
select grp as id,max(runningsum) as amount
from cte
group by grp

Back fill timeseries data in SQL

I have data in a SQL (Vertica) database table that looks like this...
ts src val
---------------------------------
10:25:10 C 72
10:25:09 A 13
10:25:08 A 99
10:25:05 B 22
10:25:02 C 71
I need to "rotate" it into columns and backfill the last known value by the src column like so.
ts a_val b_val c_val
----------------------------
10:25:10 13 22 72
10:25:09 13 22 71
10:25:08 99 22 71
10:25:05 null 22 71
10:25:02 null null 71
I know all the possible values of the src ahead of time.

Probably the easiest way is with correlated subqueries. This won't necessarily have the best performance:
select t.ts,
(select t2.val from table t2 where t2.ts <= t.ts and t2.src = 'A' order by t2.ts desc limit 1) as val_a,
(select t2.val from table t2 where t2.ts <= t.ts and t2.src = 'B' order by t2.ts desc limit 1) as val_b,
(select t2.val from table t2 where t2.ts <= t.ts and t2.src = 'C' order by t2.ts desc limit 1) as val_c
from table t;
An index on table(ts, src, val) might help the subqueries in a database other than Vertica.

Use analytic functions. Something like:
SELECT ts
, src
, MIN(val) val
FROM (
SELECT ts
, src
, first_value(val) OVER (
PARTITION BY src
ORDER BY ts
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) val
FROM table
) tab
GROUP BY 1, 2
ORDER BY 1, 2
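
To get the pivoted, back-filled layout shown in the question, one option is to pivot with conditional aggregation first and then carry the last non-null value forward with LAST_VALUE ... IGNORE NULLS. This is a sketch only: the table name readings is hypothetical, and it assumes that your Vertica version accepts IGNORE NULLS inside LAST_VALUE (other engines place the keyword differently or lack it).

```sql
-- Assumes a table readings(ts, src, val); the table name is hypothetical.
SELECT ts,
       -- carry the most recent non-null value forward in ts order
       LAST_VALUE(a_raw IGNORE NULLS) OVER (ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS a_val,
       LAST_VALUE(b_raw IGNORE NULLS) OVER (ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS b_val,
       LAST_VALUE(c_raw IGNORE NULLS) OVER (ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS c_val
FROM (
    -- one row per timestamp, one column per known src value
    SELECT ts,
           MAX(CASE WHEN src = 'A' THEN val END) AS a_raw,
           MAX(CASE WHEN src = 'B' THEN val END) AS b_raw,
           MAX(CASE WHEN src = 'C' THEN val END) AS c_raw
    FROM readings
    GROUP BY ts
) pivoted
ORDER BY ts DESC;
```

Because the src values are known ahead of time, the CASE expressions can be written out by hand; a new source would need one more pair of columns.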

Counting number of positive value in a query

I'm working on the following query and table
SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf, date_dim dd
WHERE dd.date_id = sf.date_id
AND dd.week_number_overall BETWEEN 88-2 AND 88
AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
ORDER BY dd.actual_date ASC;
ACTUAL_DATE WEEK_NUMBER_OVERALL BRANCH_ID TARGETREACH
----------- ------------------- ---------- -----------
13/08/14 86 1 -11
14/08/14 86 1 12
15/08/14 86 1 11.8
16/08/14 86 1 1.4
17/08/14 86 1 -0.2
19/08/14 86 1 7.2
20/08/14 87 1 16.6
21/08/14 87 1 -1.4
22/08/14 87 1 14.4
23/08/14 87 1 2.8
24/08/14 87 1 18
26/08/14 87 1 13.4
27/08/14 88 1 -1.8
28/08/14 88 1 10.6
29/08/14 88 1 7.2
30/08/14 88 1 14
31/08/14 88 1 9.6
02/09/14 88 1 -3.2
The "TargetReach" column shows whether the target has been reached or not.
A negative value means the target wasn't reached on that day.
How can I calculate the number of rows with a positive value for this query?
The result should look something like this:
TOTAL_POSITIVE_TARGET_REACH WEEK_NUMBER_OVERALL
--------------------------- ------------------
13 88
I have tried to use CASE but still not working right.
Thanks a lot.

You want to use conditional aggregation:
with t as (
<your query here>
)
select week_number_overall, sum(case when targetreach > 0 then 1 else 0 end)
from t
group by week_number_overall;
However, I would rewrite your original query to use proper join syntax. Then the query would look like:
SELECT week_number_overall,
SUM(CASE WHEN targetreach > 0 THEN 1 ELSE 0 END)
FROM (SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf JOIN
date_dim dd
ON dd.date_id = sf.date_id
WHERE dd.week_number_overall BETWEEN 88-2 AND 88 AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
) t
GROUP BY week_number_overall
ORDER BY week_number_overall;
The difference between a CTE (the first solution) and a subquery is, in this case, just a matter of preference.

SELECT WEEK_NUMBER_OVERALL, COUNT(*) TOTAL_POSITIVE_TARGET_REACH
FROM (your original query)
WHERE TARGETREACH > 0
GROUP BY WEEK_NUMBER_OVERALL

select sum( decode( sign( TARGETREACH ) , -1 , 0 , 0 , 0 , 1 , 1 ) )
from ( "your query here" );

Use a HAVING clause to keep only the days with a positive average:
SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf, date_dim dd
WHERE dd.date_id = sf.date_id
AND dd.week_number_overall BETWEEN 88-2 AND 88
AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
HAVING AVG(sf.overtarget_qnt)>0
ORDER BY dd.actual_date ASC;

Using decode() and sign(), you can get both the positive count and the negative count.
drop table test;
create table test (
key number(5),
value number(5));
insert into test values ( 1, -9 );
insert into test values ( 2, -8 );
insert into test values ( 3, 10 );
insert into test values ( 4, 12 );
insert into test values ( 5, -9 );
insert into test values ( 6, 8 );
insert into test values ( 7, 51 );
commit;
select sig , count ( sig ) from
(
select key, ( (decode( sign( value ) , -1 , '-ve' , 0 , 'zero' , 1 , '+ve' ) ) ) sig
from test
)
group by sig
SIG COUNT(SIG)
---- ----------------------
+ve 4
-ve 3
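
The expected output in the question is a single row (13 for week 88), which is the total over all three weeks rather than a per-week count. A sketch of that variant, reusing the question's own tables and filter and assuming Oracle, since the answers above use decode():

```sql
SELECT COUNT(CASE WHEN targetreach > 0 THEN 1 END) AS total_positive_target_reach,
       MAX(week_number_overall)                    AS week_number_overall
FROM (
    -- the original query, rewritten with explicit JOIN syntax
    SELECT dd.actual_date, dd.week_number_overall, sf.branch_id,
           AVG(sf.overtarget_qnt) AS targetreach
    FROM sales_fact sf
    JOIN date_dim dd ON dd.date_id = sf.date_id
    WHERE dd.week_number_overall BETWEEN 88 - 2 AND 88
      AND sf.branch_id = 1
    GROUP BY dd.actual_date, sf.branch_id, dd.week_number_overall
) t;
```

COUNT ignores NULLs, so the CASE without an ELSE branch counts only the rows where targetreach is strictly positive.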

How can I get the first result for each account in this SQL query?

I'm trying to write a query that follows this logic:
Find the first following status code of an account that had a previous status code of X.
So if I have a table of:
id account_num status_code
64 1 X
82 1 Y
72 2 Y
87 1 Z
91 2 X
103 2 Z
The results would be:
id account_num status_code
82 1 Y
103 2 Z
I've come up with a couple of solutions, but I'm not all that great with SQL, so they've been pretty inelegant thus far. I was hoping that someone here might be able to point me in the right direction.
View:
SELECT account_number, id
FROM table
WHERE status_code = 'X'
Query:
SELECT account_number, min(id)
FROM table
INNER JOIN view
ON table.account_number = view.account_number
WHERE table.id > view.id
At this point I have the id that I need, but I'd have to write ANOTHER query that uses the id to look up the status_code.
Edit: To add some context, I'm trying to find calls that have a status_code of X. If a call has a status_code of X, we want to dial it a different way the next time we make an attempt. The aim of this query is to provide a report that shows the results of the second dial if the first dial resulted in an X status code.

Here's a SQL Server solution.
UPDATE
The idea is to avoid the NESTED LOOP joins proposed by Olaf, because they have roughly O(N * M) complexity and are therefore very bad for performance. MERGE JOIN complexity is O(N log N + M log M), which is much better in real-world scenarios.
The query below works as follows:
RankedCTE is a subquery that assigns a row number to each id, partitioned by account and sorted by id (which stands in for time). So for the data below, the output of this
SELECT
id,
account_num,
status_code,
ROW_NUMBER() OVER (PARTITION BY account_num ORDER BY id DESC) AS item_rank
FROM dbo.Test
would be:
id account_num status_code item_rank
----------- ----------- ----------- ----------
87 1 Z 1
82 1 Y 2
64 1 X 3
103 2 Z 1
91 2 X 2
72 2 Y 3
Once we have them numbered we join the result on itself like this:
WITH RankedCTE AS
(
SELECT
id,
account_num,
status_code,
ROW_NUMBER() OVER (PARTITION BY account_num ORDER BY id DESC) AS item_rank
FROM dbo.Test
)
SELECT
*
FROM
RankedCTE A
INNER JOIN RankedCTE B ON
A.account_num = B.account_num
AND A.item_rank = B.item_rank - 1
which gives us each event and its preceding event in the same row:
id account_num status_code item_rank id account_num status_code item_rank
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
87 1 Z 1 82 1 Y 2
82 1 Y 2 64 1 X 3
103 2 Z 1 91 2 X 2
91 2 X 2 72 2 Y 3
Finally, we just have to take the preceding event with code "X" and the event with code not "X":
WITH RankedCTE AS
(
SELECT
id,
account_num,
status_code,
ROW_NUMBER() OVER (PARTITION BY account_num ORDER BY id DESC) AS item_rank
FROM dbo.Test
)
SELECT
A.id,
A.account_num,
A.status_code
FROM
RankedCTE A
INNER JOIN RankedCTE B ON
A.account_num = B.account_num
AND A.item_rank = B.item_rank - 1
AND A.status_code <> 'X'
AND B.status_code = 'X'
Query plans for this query and @Olaf Dietsche's solution (one of the versions) are below.
Data setup script
CREATE TABLE dbo.Test
(
id int not null PRIMARY KEY,
account_num int not null,
status_code nchar(1)
)
GO
INSERT dbo.Test (id, account_num, status_code)
SELECT 64 , 1, 'X' UNION ALL
SELECT 82 , 1, 'Y' UNION ALL
SELECT 72 , 2, 'Y' UNION ALL
SELECT 87 , 1, 'Z' UNION ALL
SELECT 91 , 2, 'X' UNION ALL
SELECT 103, 2, 'Z'

With a subselect:
select id, account_num, status_code
from mytable
where id in (select min(t1.id)
from mytable t1
join mytable t2 on t1.account_num = t2.account_num
and t1.id > t2.id
and t2.status_code = 'X'
group by t1.account_num)
and with a join; both are for MS SQL Server 2012 and both return the same result:
select id, account_num, status_code
from mytable
join (select min(t1.id) as min_id
from mytable t1
join mytable t2 on t1.account_num = t2.account_num
and t1.id > t2.id
and t2.status_code = 'X'
group by t1.account_num) t on id = min_id

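-- STATUS_CODE cannot be selected next to MIN(ID) when grouping only by ACCOUNT_NUM,
-- so look up the row whose ID is the smallest one that follows an 'X' row instead.
SELECT A1.ID, A1.ACCOUNT_NUM, A1.STATUS_CODE
FROM ACCOUNT A1
WHERE A1.ID =
(SELECT MIN(A3.ID)
FROM ACCOUNT A3
WHERE A3.ACCOUNT_NUM = A1.ACCOUNT_NUM
AND EXISTS
(SELECT 1
FROM ACCOUNT A2
WHERE A2.ACCOUNT_NUM = A3.ACCOUNT_NUM
AND A2.STATUS_CODE = 'X'
AND A2.ID < A3.ID))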

Here's a query, with your data, checked under PostgreSQL:
SELECT t0.*
FROM so13594339 t0 JOIN
(SELECT min(t1.id), t1.account_num
FROM so13594339 t1, so13594339 t2
WHERE t1.account_num = t2.account_num AND t1.id > t2.id AND t2.status_code = 'X'
GROUP BY t1.account_num
) z
ON t0.id = z.min AND t0.account_num = z.account_num;
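
Where window functions are available (SQL Server 2012 and later, PostgreSQL), the same lookup can be sketched without a self-join by reading each row's predecessor with LAG. The example assumes the dbo.Test table from the setup script above and treats id order as time order, as the other answers do:

```sql
SELECT id, account_num, status_code
FROM (
    SELECT id, account_num, status_code,
           -- status of the previous event for the same account, in id order
           LAG(status_code) OVER (PARTITION BY account_num ORDER BY id) AS prev_status
    FROM dbo.Test
) s
WHERE prev_status = 'X'
ORDER BY account_num, id;
```

This returns one row per X event that has a follow-up, so an account with several X codes yields several rows. If only the first is wanted, a further MIN(id) per account or a ROW_NUMBER filter would be needed.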

SQL Server query for finding the sum of 4 consecutive values

Can somebody help me find the sum of 4 consecutive values, i.e. a rolling sum of the last 4 values?
Like:
VALUE SUM
1 NULL
2 NULL
3 NULL
4 10
5 14
6 18
7 22
8 26
9 30
10 34
11 38
12 42
13 46
14 50
15 54
16 58
17 62
18 66
19 70
20 74
21 78
22 82
23 86
24 90
25 94
26 98
27 102
28 106
29 110
30 114
31 118
32 122
33 126
34 130
35 134
36 138
37 142
38 146
Thanks,

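-- the derived table must have an alias
select sum(Value)
from (select top 4 Value from [table] order by Value desc) t
or, perhaps
-- only equivalent when Value holds consecutive integers
select sum(Value)
from [Table]
where Value > (select max(Value) - 4 from [Table])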
I haven't actually tried either of those- and can't at the moment, but they should get you pretty close.

Quick attempt, which gets the results you've posted in your question (except the 1st 3 rows are not NULL). Assumes that VALUE field is unique and in ascending order:
-- Create test TABLE with 38 values in
DECLARE @T TABLE (Value INTEGER)
DECLARE @Counter INTEGER
SET @Counter = 1
WHILE (@Counter <= 38)
BEGIN
INSERT @T VALUES(@Counter)
SET @Counter = @Counter + 1
END
-- This gives the results
SELECT t1.VALUE, x.Val
FROM @T t1
OUTER APPLY(SELECT SUM(VALUE) FROM (SELECT TOP 4 VALUE FROM @T t2 WHERE t2.VALUE <= t1.VALUE ORDER BY t2.VALUE DESC) x) AS x(Val)
ORDER BY VALUE
At the very least, you should see the kind of direction I was heading in.

Assuming ID can give you the last 4 rows.
SELECT SUM([SUM])
FROM
(
SELECT TOP 4 [SUM] FROM myTable ORDER BY ID DESC
) foo
Each time you query it, it will read the last 4 rows.
If this is wrong (e.g. you want the sum of each consecutive 4 rows), then please give sample output

Following would work if your Value column is sequential
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
SELECT q.Value
, CASE WHEN q.Value >= 4 THEN q.Value * 4 - 6 ELSE NULL END
FROM q
otherwise you might use something like this
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
, Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM q
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1
Note that the table q in the examples is a stub for your actual table. The actual statement then becomes
;WITH Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM YourTable
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1
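
On SQL Server 2012 and later, the three self-joins can be replaced by a window frame. The following is a sketch under that version assumption, using the same placeholder table name YourTable as above; the COUNT check reproduces the NULLs for the first three rows:

```sql
SELECT Value,
       -- only emit a sum once the frame actually holds 4 rows
       CASE WHEN COUNT(*) OVER (ORDER BY Value
                                ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) = 4
            THEN SUM(Value) OVER (ORDER BY Value
                                  ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
       END AS [SUM]
FROM YourTable
ORDER BY Value;
```

The frame is defined by row position in Value order, so it does not require the values to be sequential or gap-free.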
