Difference between rows in SQL based on row number (SQL Server)

I have a problem. I have a table with the following columns and sample data:
RN  Date        Time
---------------------
1   2015-02-02  12
2   2015-02-02  25
3   2015-02-02  27
1   2015-02-08  42
2   2015-02-08  45
1   2015-03-01  60
2   2015-03-01  62
3   2015-03-01  63
4   2015-03-01  63
I need to get the difference between the start time and the end time of every day.
For example:
27-12
45-42
63-60
Any suggestions? :)

select Date,
       max(Time) as mx,
       min(Time) as mn,
       max(Time) - min(Time) as diff
from table_name
group by Date
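Against the sample data above, this should return one row per day, with diff matching the requested differences (15, 3 and 3):

Date        mx  mn  diff
------------------------
2015-02-02  27  12  15
2015-02-08  45  42  3
2015-03-01  63  60  3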

Related

Why does my cumulative column not work as expected?

Here is my query. I have a column called cum_balance which is supposed to calculate the cumulative balance, but after row number 10 there is an anomaly and it doesn't work as expected; all I notice is that from row number 10 onwards the hour column has the same value. What's the right syntax for this?
select
    hour,
    symbol,
    amount_usd,
    category,
    sum(amount_usd) over (
        order by
            hour asc RANGE BETWEEN UNBOUNDED PRECEDING
            AND CURRENT ROW
    ) as cum_balance
from
    combined_transfers_usd_netflow
order by
    hour
I have tried removing RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, adding a partition by hour, and grouping by hour. None of them gave the expected result or an error.
Row Number  Hour                 SYMBOL  AMOUNT_USD    CATEGORY  CUM_BALANCE
----------------------------------------------------------------------------
1           2021-12-02 23:00:00  WETH    227.2795      in        227.2795
2           2021-12-03 00:00:00  WETH    -226.4801153  out       0.7993847087
3           2022-01-05 21:00:00  WETH    5123.716203   in        5124.515587
4           2022-01-18 14:00:00  WETH    -4466.2366    out       658.2789873
5           2022-01-19 00:00:00  WETH    2442.618599   in        3100.897586
6           2022-01-21 14:00:00  USDC    99928.68644   in        103029.584
7           2022-03-01 16:00:00  UNI     8545.36098    in        111574.945
8           2022-03-04 22:00:00  USDC    -2999.343     out       108575.602
9           2022-03-09 22:00:00  USDC    -5042.947675  out       103532.6543
10          2022-03-16 21:00:00  USDC    -4110.6579    out       98594.35101
11          2022-03-16 21:00:00  UNI     -3.209306045  out       98594.35101
12          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
13          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
14          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
15          2022-03-16 21:00:00  UNI     -6.418612089  out       98594.35101
The "problem" with your data in all the ORDER BY values after row 10 are the same.
So if we shrink the data down a little, and use for groups to repeat the experiment:
with data(grp, date, val) as (
    select * from values
        (1, '2021-01-01'::date, 10),
        (1, '2021-01-02'::date, 11),
        (1, '2021-01-03'::date, 12),
        (2, '2021-01-01'::date, 20),
        (2, '2021-01-02'::date, 21),
        (2, '2021-01-02'::date, 22),
        (2, '2021-01-04'::date, 23)
)
select d.*
     , sum(val) over (partition by grp order by date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_val_1
     , sum(val) over (partition by grp order by date) as cum_val_2
from data as d
order by 1, 2;
we get:
GRP  DATE        VAL  CUM_VAL_1  CUM_VAL_2
1    2021-01-01  10   10         10
1    2021-01-02  11   21         21
1    2021-01-03  12   33         33
2    2021-01-01  20   20         20
2    2021-01-02  21   63         63
2    2021-01-02  22   63         63
2    2021-01-04  23   86         86
With group 1 we see that values accumulate as we expect. For group 2 we put in duplicate ORDER BY values, and we see those rows get the same cumulative value, while the rows after them "work as expected" again.
This tells us how the function works across unstable data (values that sort the same): all the tied rows are stepped in one leap.
Thus if you want each row to be different, the ORDER BY needs more distinctness. You could force this by adding random values, but random values can themselves collide, so you really should use ROW_NUMBER or SEQx to get unique tie-breaking values.
The second formula also shows the two are equal: the problem is the ORDER BY, not the framing of "which rows" are used.
with data(grp, date, val) as (
    select * from values
        (1, '2021-01-01'::date, 10),
        (1, '2021-01-02'::date, 11),
        (1, '2021-01-03'::date, 12),
        (2, '2021-01-01'::date, 20),
        (2, '2021-01-02'::date, 21),
        (2, '2021-01-02'::date, 22),
        (2, '2021-01-04'::date, 23)
)
select d.*
     , seq8() as s
     , sum(val) over (partition by grp order by date) as cum_val_1
     , sum(val) over (partition by grp order by date, s) as cum_val_2
     , sum(val) over (partition by grp order by date, seq8()) as cum_val_3
from data as d
order by 1, 2;
gives:
GRP  DATE        VAL  S  CUM_VAL_1  CUM_VAL_2  CUM_VAL_3
1    2021-01-01  10   0  10         10         10
1    2021-01-02  11   1  21         21         21
1    2021-01-03  12   2  33         33         33
2    2021-01-01  20   3  20         20         20
2    2021-01-02  21   4  63         41         41
2    2021-01-02  22   5  63         63         63
2    2021-01-04  23   6  86         86         86
(CUM_VAL_3, with seq8() used directly in the ORDER BY, matches CUM_VAL_2, since both tie-breakers enumerate the rows in the same order.)
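Applied back to the original query, the fix is to add a unique tie-breaker to the ORDER BY. A sketch, assuming the same table and column names as in the question, with ROW_NUMBER as the explicit unique sequence:
with ordered as (
    select hour,
           symbol,
           amount_usd,
           category,
           row_number() over (order by hour) as rn  -- unique tie-breaker for equal hours
    from combined_transfers_usd_netflow
)
select hour,
       symbol,
       amount_usd,
       category,
       sum(amount_usd) over (order by hour, rn) as cum_balance
from ordered
order by hour, rn;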

Rolling min/max in SQL Query?

The database engine is SQLite3. It's a simple table:
CREATE TABLE T (ID INTEGER, DATE STRING, VALUE NUMERIC);
-- rows of T:
id  date        value
1   2020-01-01  11
2   2020-01-01  23
3   2020-01-01  32
4   2020-01-01  41
5   2020-01-01  57
6   2020-01-01  62
How can I create a rolling min/max? Say of period 3:
id  date        val  min3  max3
1   2020-01-01  11   11    11
2   2020-01-01  23   11    23
3   2020-01-01  32   11    32
4   2020-01-01  41   23    41
5   2020-01-01  57   32    57
6   2020-01-01  62   41    62
I keep getting min 11 and max 62 for everything because I don't know how to do the rolling min/max.
You can use window functions:
select t.*,
       min(value) over (order by date rows between 2 preceding and current row) as min3,
       max(value) over (order by date rows between 2 preceding and current row) as max3
from t;
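Note that window functions require SQLite 3.25 or later. Also, in the sample data every row has the same date, so order by date leaves ties and the frame contents are ambiguous, much like the cumulative-balance question above. If id defines the intended sequence (an assumption), ordering by it makes the result deterministic:
select t.*,
       min(value) over (order by id rows between 2 preceding and current row) as min3,
       max(value) over (order by id rows between 2 preceding and current row) as max3
from t;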

SQL how to count but only count one instance if two columns match?

Wondering how to select from a table:
FIELDID personID purchaseID dateofPurchase
--------------------------------------------------
2 13 147 2014-03-21 00:00:00
3 15 165 2015-03-23 00:00:00
4 13 456 2018-03-24 00:00:00
5 1 133 2018-03-21 00:00:00
6 23 123 2013-03-22 00:00:00
7 25 456 2013-03-21 00:00:00
8 25 456 2013-03-23 00:00:00
9 22 456 2013-03-28 00:00:00
10 25 589 2013-03-21 00:00:00
11 82 147 1991-10-22 00:00:00
12 82 453 2003-03-22 00:00:00
I'd like to get a result table of two columns: weekday and the number of purchases on each weekday, but only counting distinct days of purchase per person. For example, since personID 25 purchased two things on 2013-03-21, that should count as one 'Thursday' instead of 2.
Basically, if the personID and the dateofPurchase are the same for more than one row, I only want to count it once.
Here is what I have currently. It does everything correctly, except that it counts the above scenario under Thursday twice when I would only want to add one:
SELECT v.wkday as day, COUNT(*) as 'absences'
FROM dbo.AttendanceRecord pr CROSS APPLY
     (VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
                   THEN 'Weekend'
                   ELSE DATENAME(WEEKDAY, date)
              END)
     ) v(wkday)
GROUP BY v.wkday;
To clarify: if an item is purchased under at least one purchaseID on a specific day, it counts as purchased for that day and does not need to be counted again for each new purchaseID on that day.
I think you want to count distinct persons, so that would be:
COUNT(DISTINCT personid) as absences
Note that single quotes are not appropriate around column aliases. If you need to escape an alias, use square brackets.
EDIT:
If you want to count distinct person-days, then you can use:
COUNT(DISTINCT CONCAT(personid, ':', dateofpurchase)) as absences
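Put together with the query from the question (a sketch; table and column names exactly as they appear above):
SELECT v.wkday AS day,
       COUNT(DISTINCT CONCAT(pr.personID, ':', pr.dateofPurchase)) AS absences
FROM dbo.AttendanceRecord pr CROSS APPLY
     (VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
                   THEN 'Weekend'
                   ELSE DATENAME(WEEKDAY, date)
              END)
     ) v(wkday)
GROUP BY v.wkday;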

Creating a new calculated column in SQL

Is there a way to find a solution where for 2 days there are 2 UDs (because June 24 appears 2 times) and for the rest there are single days?
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that there are 4 users who used the application on 1 day and 2 users who used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
      from t
      group by date
     ) d
group by cnt
order by cnt desc;
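Tracing the sample rows through this: the inner query counts rows per date, giving 2 for 2015-06-24 and 1 for each of the other four dates; the outer query then counts how many dates share each row count, so it should give:

cnt  count(*)
2    1
1    4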

How to calculate a running total that is a distinct sum of values

Consider this dataset:
id site_id type_id value date
------- ------- ------- ------- -------------------
1 1 1 50 2017-08-09 06:49:47
2 1 2 48 2017-08-10 08:19:49
3 1 1 52 2017-08-11 06:15:00
4 1 1 45 2017-08-12 10:39:47
5 1 2 40 2017-08-14 10:33:00
6 2 1 30 2017-08-09 07:25:32
7 2 2 32 2017-08-12 04:11:05
8 3 1 80 2017-08-09 19:55:12
9 3 2 75 2017-08-13 02:54:47
10 2 1 25 2017-08-15 10:00:05
I would like to construct a query that returns a running total for each date by type. I can get close with a window function, but I only want the latest value for each site to be summed for the running total (a simple window function will not work because it sums all values up to a date, not just the last values for each site). So I guess it could be better described as a running distinct total?
The result I'm looking for would be like this:
type_id date sum
------- ------------------- -------
1 2017-08-09 06:49:47 50
1 2017-08-09 07:25:32 80
1 2017-08-09 19:55:12 160
1 2017-08-11 06:15:00 162
1 2017-08-12 10:39:47 155
1 2017-08-15 10:00:05 150
2 2017-08-10 08:19:49 48
2 2017-08-12 04:11:05 80
2 2017-08-13 02:54:47 155
2 2017-08-14 10:33:00 147
The key here is that the sum is not a running sum. It should only be the sum of the most recent values for each site, by type, at each date. I think I can help explain it by walking through the result set I've provided above. For my explanation, I'll walk through the original data chronologically and try to explain the expected result.
The first row of the result starts us off, at 2017-08-09 06:49:47, where chronologically, there is only one record of type 1 and it is 50, so that is our sum for 2017-08-09 06:49:47.
The second row of the result is at 2017-08-09 07:25:32, at this point in time we have 2 unique sites with values for type_id = 1. They have values of 50 and 30, so the sum is 80.
The third row of the result occurs at 2017-08-09 19:55:12, where now we have 3 sites with values for type_id = 1. 50 + 30 + 80 = 160.
The fourth row is where it gets interesting. At 2017-08-11 06:15:00 there are 4 records with type_id = 1, but 2 of them are for the same site. I'm only interested in the most recent value for each site, so the values I'd like to sum are 30 + 80 + 52, resulting in 162.
The 5th row is similar to the 4th, since the value for site_id:1, type_id:1 has changed again and is now 45. The latest values for type_id:1 at 2017-08-12 10:39:47 are therefore 30 + 80 + 45 = 155.
The 6th row is also interesting: at 2017-08-15 10:00:05, site 2 has a new value for type_id 1, which gives us 80 + 45 + 25 = 150.
You can get a cumulative (running) total by including an ORDER BY in the window's OVER clause:
select type_id,
       date,
       sum(value) over (partition by type_id order by date) as sum
from your_table;
The ORDER BY works because
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT type_id,
       date,
       SUM(value) OVER (PARTITION BY type_id ORDER BY type_id, date)
           - (SUM(value) OVER (PARTITION BY type_id, site_id ORDER BY type_id, date) - value) AS sum
FROM your_table
ORDER BY type_id,
         date
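The expression above takes the per-type running total and subtracts the current site's earlier contributions, so the current row's site only counts its latest value. Another way to sketch the "sum of each site's latest value" logic (an alternative, assuming the column names above; verify against your data) is to turn each reading into a delta from the same site's previous reading with LAG, then take an ordinary running sum of the deltas:
with deltas as (
    select type_id,
           date,
           value - coalesce(
               lag(value) over (partition by type_id, site_id order by date),
               0
           ) as delta  -- change versus this site's previous reading
    from your_table
)
select type_id,
       date,
       sum(delta) over (partition by type_id order by date) as sum
from deltas
order by type_id, date;
For the sample data this yields 50, 80, 160, 162, 155, 150 for type_id 1, matching the expected result.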