How to get data with a must different conditions?

I have tables with the following structure :
USERS :
USERID,NAME
TIMETB :
CHECKTIME,Sysdate,modified,USERID
and sample data like this :
USERS :
USERID NAME
434 moh
77 john
66 yara
TIMETB :
CHECKTIME USERID modified
2015-12-21 07:20:00.000 434 0
2015-12-21 08:39:00.000 434 2
2015-12-22 07:31:00.000 434 0
2015-12-21 06:55:00.000 77 0
2015-12-21 07:39:00.000 77 0
2015-12-25 07:11:00.000 66 0
2015-12-25 07:22:00.000 66 0
2015-12-25 07:50:00.000 66 2
2015-12-26 07:40:00.000 66 2
2015-12-26 07:21:00.000 66 2
Now I want to get the users who have two or more different transaction types (modified) on the same date.
The result I expect is :
CHECKTIME USERID modified NAME
2015-12-21 07:20:00.000 434 0 moh
2015-12-21 08:39:00.000 434 2 moh
2015-12-25 07:11:00.000 66 0 yara
2015-12-25 07:22:00.000 66 0 yara
2015-12-25 07:50:00.000 66 2 yara
I wrote the following query, but it returns more than I expect: it also returns users whose transactions on a date all have the same modified value.
SELECT a.CHECKTIME,
       a.Sysdate,
       (CASE WHEN a.modified = 0 THEN 'ADD' ELSE 'DELETE' END) AS modified,
       b.BADGENUMBER,
       b.name,
       a.Emp_num AS Creator
FROM TIMETB a
INNER JOIN Users b ON a.USERID = b.USERID
WHERE YEAR(checktime) = 2015
  AND MONTH(checktime) = 12
  AND (
        SELECT COUNT(*)
        FROM TIMETB cc
        WHERE cc.USERID = a.USERID
          AND CONVERT(DATE, cc.CHECKTIME) = CONVERT(DATE, a.CHECKTIME)
          AND cc.modified IN (0, 2)
      ) >= 2
  AND a.modified IS NOT NULL
  AND a.Emp_num IS NOT NULL

You can use window functions for this. The following counts the rows per user and date:
select t.*
from (select t.*,
             count(*) over (partition by userid, cast(checktime as date)) as cnt
      from timetb t
     ) t
where cnt >= 2;
If you want the name, just join in the appropriate table.
EDIT:
If you want different values of a column, a simple way is to compare the min and max values:
select t.*
from (select t.*,
             min(modified) over (partition by userid, cast(checktime as date)) as minm,
             max(modified) over (partition by userid, cast(checktime as date)) as maxm
      from timetb t
     ) t
where minm <> maxm;
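To try the min/max comparison without a SQL Server instance, here is a minimal sketch that runs the same idea against the question's sample data in SQLite through Python's sqlite3 module. SQLite is only a stand-in here, so date(checktime) replaces cast(checktime as date), and the function name users_with_mixed_modified is made up for this illustration.

```python
import sqlite3


def users_with_mixed_modified(rows):
    """Return rows whose (userid, date) group holds more than one modified value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE timetb (checktime TEXT, userid INTEGER, modified INTEGER)")
    con.executemany("INSERT INTO timetb VALUES (?, ?, ?)", rows)
    query = """
        SELECT checktime, userid, modified
        FROM (SELECT t.*,
                     MIN(modified) OVER (PARTITION BY userid, date(checktime)) AS minm,
                     MAX(modified) OVER (PARTITION BY userid, date(checktime)) AS maxm
              FROM timetb t) AS x
        WHERE minm <> maxm          -- a group with differing values has min <> max
        ORDER BY userid DESC, checktime
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Sample data copied from the question (TIMETB).
sample = [
    ("2015-12-21 07:20:00.000", 434, 0),
    ("2015-12-21 08:39:00.000", 434, 2),
    ("2015-12-22 07:31:00.000", 434, 0),
    ("2015-12-21 06:55:00.000", 77, 0),
    ("2015-12-21 07:39:00.000", 77, 0),
    ("2015-12-25 07:11:00.000", 66, 0),
    ("2015-12-25 07:22:00.000", 66, 0),
    ("2015-12-25 07:50:00.000", 66, 2),
    ("2015-12-26 07:40:00.000", 66, 2),
    ("2015-12-26 07:21:00.000", 66, 2),
]

for row in users_with_mixed_modified(sample):
    print(row)
```

User 77 and the 2015-12-26 rows of user 66 drop out because every row in those groups has the same modified value.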

Related

Snowflake SQL - Count Distinct Users within descending time interval

I want to count the distinct users over the last 60 days, then the distinct users over the last 59 days, and so on.
Ideally, the output would look like this (TARGET OUTPUT):
Day Distinct Users
60 200
59 200
58 188
57 185
56 180
[...] [...]
where 60 days has the maximum possible number of distinct users, 59 days has slightly fewer, and so on.
My query looks like this:
select
count(distinct (case when datediff(day,DATE,current_date) <= 60 then USER_ID end)) as day_60,
count(distinct (case when datediff(day,DATE,current_date) <= 59 then USER_ID end)) as day_59,
count(distinct (case when datediff(day,DATE,current_date) <= 58 then USER_ID end)) as day_58
FROM Table
The issue with my query is that it outputs the data by column instead of by row (as shown below) and, most importantly, I have to write out this logic 60 times, once for each of the 60 days.
Current Output:
Day_60 Day_59 Day_58
209 207 207
Is it possible to write the SQL in a way that creates the target as shown initially above?

Using the data below in CTE format:
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID1'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID1'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID1'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID1')
)
Query to get all dates with their total and distinct counts:
select dates,count(userid) cnt, count(distinct userid) cnt_d
from data_cte
group by dates;
DATES CNT CNT_D
2022-05-01 2 2
2022-05-02 2 2
2022-05-03 3 3
2022-05-04 5 4
2022-05-05 1 1
2022-05-06 1 1
2022-05-08 1 1
2022-05-07 2 2
Query to get the difference between each date and the current date:
select dates,datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates;
DATES DDIFF CNT CNT_D
2022-05-01 45 2 2
2022-05-02 44 2 2
2022-05-03 43 3 3
2022-05-04 42 5 4
2022-05-05 41 1 1
2022-05-06 40 1 1
2022-05-08 38 1 1
2022-05-07 39 2 2
To keep only the records whose date difference is within a certain range, add a HAVING clause:
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43;
DDIFF CNT CNT_D
43 3 3
42 5 4
41 1 1
39 2 2
38 1 1
40 1 1
If you need to prefix 'day_' to each date difference, you can add an outer query over the previously fetched data set and add the prefix to the date diff column, as follows.
I am using CTE syntax, but you may use a sub-query instead, given that you will be selecting from a table:
,cte_1 as (
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43)
select 'day_'||to_char(ddiff) days,
cnt,
cnt_d
from cte_1;
DAYS CNT CNT_D
day_43 3 3
day_42 5 4
day_41 1 1
day_39 2 2
day_38 1 1
day_40 1 1
Updated the answer to get the distinct user count over each range of days.
A clause can be added to the final query to limit it to the number of days needed.
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID5'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID6'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID7'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID8'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID9')
),cte_1 as
(select datediff(day,dates,current_date()) ddiff,userid
from data_cte), cte_2 as
(select distinct ddiff from cte_1 )
select cte_2.ddiff,
(select count(distinct userid)
from cte_1 where cte_1.ddiff <= cte_2.ddiff) cnt
from cte_2
order by cte_2.ddiff desc
DDIFF CNT
47 9
46 9
45 9
44 8
43 5
42 4
41 3
40 1
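The cumulative count above can be reproduced locally with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Snowflake, so datediff is replaced by a julianday subtraction, and the "current date" is passed in as a fixed parameter to keep the output repeatable. The data set and the function name cumulative_distinct_users are invented for this illustration.

```python
import sqlite3


def cumulative_distinct_users(rows, today):
    """For each day offset, count distinct users seen within that many days of `today`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (dates TEXT, userid TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?)", rows)
    query = """
        WITH cte_1 AS (
            -- days between each row's date and the reference date
            SELECT CAST(julianday(:today) - julianday(dates) AS INTEGER) AS ddiff, userid
            FROM data
        ),
        cte_2 AS (SELECT DISTINCT ddiff FROM cte_1)
        SELECT cte_2.ddiff,
               (SELECT COUNT(DISTINCT userid)
                FROM cte_1
                WHERE cte_1.ddiff <= cte_2.ddiff) AS cnt
        FROM cte_2
        ORDER BY cte_2.ddiff DESC
    """
    result = con.execute(query, {"today": today}).fetchall()
    con.close()
    return result


# Small hypothetical data set: (date, user id).
rows = [
    ("2022-05-01", "UID1"),
    ("2022-05-01", "UID2"),
    ("2022-05-02", "UID1"),
    ("2022-05-03", "UID3"),
    ("2022-05-04", "UID3"),
]

for ddiff, cnt in cumulative_distinct_users(rows, "2022-05-05"):
    print(ddiff, cnt)
```

Each row counts every user seen at that offset or more recently, so the count can only stay flat or shrink as the window gets shorter.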

You can unpivot your current output to turn the columns into rows.
A sample:
select
*
from (
select
209 Day_60,
207 Day_59,
207 Day_58
)unpivot ( cnt for days in (Day_60,Day_59,Day_58));

Postgresql Using Limit with Order by without select and where case

I want to get the latest 2 rows for each person_id and check whether the deposit field has decreased.
Here is what I have done so far.
The customer table is:
person_id employee_id deposit ts
101 201 44 2021-09-30 10:12:19+00
101 201 47 2021-09-30 09:12:19+00
101 201 65 2021-09-29 09:12:19+00
100 200 21 2021-09-29 10:12:19+00
104 203 54 2021-09-27 10:12:19+00
and the result I want is:
person_id employee_id deposit ts pre_deposit pre_ts
101 201 44 2021-09-30 10:12:19+00 47 2021-09-30 09:12:19+00
I don't want the row with deposit 65, because I only want to check the last 2 rows: 47 > 44, so person 101 qualifies. If the deposit decreased between any earlier rows, I simply don't care.
SELECT person_id,
employee_id,
deposit,
ts,
lag(deposit) over client_window as pre_deposit,
lag(ts) over client_window as pre_ts
FROM customer
WINDOW client_window as (partition by person_id order by ts)
ORDER BY person_id , ts
This returns a table with the following results:
person_id employee_id deposit ts pre_deposit pre_ts
101 201 44 2021-09-30 10:12:19+00 47 2021-09-30 09:12:19+00
101 201 47 2021-09-30 09:12:19+00 65 null
100 200 21 2021-09-29 10:12:19+00 null 2021-09-29 09:12:19+00
104 203 54 2021-09-27 10:12:19+00 null null
But if I do the following:
SELECT person_id,
employee_id,
deposit,
ts,
lag(deposit) over client_window as pre_deposit,
lag(ts) over client_window as pre_ts
FROM customer
WINDOW client_window as (partition by person_id order by ts limit 2)
this query doesn't work; it throws the error:
ERROR: syntax error at or near "limit"
LINE 11: limit 2
So how can I limit the comparison to the last 2 rows, keeping only those where pre_deposit > deposit?

You can do:
with
data as (
select person_id, employee_id, deposit, ts,
row_number() over(partition by person_id order by ts desc) as rn
from customer
)
select a.*,
b.deposit as pre_deposit,
b.ts as pre_ts
from data a
left join data b on a.person_id = b.person_id and b.rn = 2
where a.rn = 1
Result:
person_id employee_id deposit ts rn pre_deposit pre_ts
---------- ------------ -------- ------------------------- --- ------------ ------------------------
100 200 21 2021-09-29T10:12:19.000Z 1 null null
101 201 44 2021-09-30T10:12:19.000Z 1 47 2021-09-30T09:12:19.000Z
104 203 54 2021-09-27T10:12:19.000Z 1 null null
See running example at DB Fiddle.
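The result above still lists every person. A minimal sketch of the same row_number() self-join with the "deposit decreased" filter added is shown below, run on the question's sample rows. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the function name latest_decreases is invented for this illustration.

```python
import sqlite3


def latest_decreases(rows):
    """Return the latest row per person when its deposit is lower than the one before it."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer (person_id INTEGER, employee_id INTEGER, deposit INTEGER, ts TEXT)"
    )
    con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH data AS (
            SELECT person_id, employee_id, deposit, ts,
                   ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY ts DESC) AS rn
            FROM customer
        )
        SELECT a.person_id, a.employee_id, a.deposit, a.ts,
               b.deposit AS pre_deposit, b.ts AS pre_ts
        FROM data a
        JOIN data b ON a.person_id = b.person_id AND b.rn = 2  -- second-latest row
        WHERE a.rn = 1                                          -- latest row
          AND b.deposit > a.deposit                             -- deposit went down
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Sample rows copied from the question.
customers = [
    (101, 201, 44, "2021-09-30 10:12:19+00"),
    (101, 201, 47, "2021-09-30 09:12:19+00"),
    (101, 201, 65, "2021-09-29 09:12:19+00"),
    (100, 200, 21, "2021-09-29 10:12:19+00"),
    (104, 203, 54, "2021-09-27 10:12:19+00"),
]

print(latest_decreases(customers))
```

Using an inner join instead of the left join also drops people with a single row, since they have no earlier deposit to compare against.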

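You are pretty close. Take your first query, slightly modified, and wrap it in distinct on so that only the latest row per person is kept, then filter that row on the decrease. Filtering before distinct on would instead return the most recent decrease anywhere in the history, not just between the last 2 rows.
select *
from
(
  select distinct on (person_id) *
  from
  (
    select person_id, employee_id, deposit, ts,
           lead(deposit) over w as pre_deposit,
           lead(ts) over w as pre_ts
    from customer
    window w as (partition by person_id order by ts desc)
  ) t
  order by person_id, ts desc
) latest
where pre_deposit > deposit;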
SQL Fiddle here.

SQL Server - SUM and comma-separated values using GROUP BY clause

I have 2 tables:
NDEvent:
EventId EndTime
33 2020-10-23 15:00:00.000
33 2020-10-23 15:00:00.000
35 2020-10-21 03:30:00.000
35 2020-10-24 15:00:00.000
35 2020-10-25 15:00:00.000
34 2020-10-23 15:00:00.000
EventAppointment:
Id DocId EventId Amount
1 7647 34 10.00
2 7647 34 10.00
3 28531 33 20.00
4 7647 35 20.00
5 7647 35 100.00
6 7647 35 200.00
And I want result to be like this:
DocId EventId Amount Id
7647 34 20.00 1,2
28531 33 20.00 3
7647 35 320.00 4,5,6
What I have tried is:
select e.Amount,e.DoctorId,e.EventId,
Id= STUFF(
(SELECT DISTINCT ',' + CAST(e.Id as nvarchar(max))
from NDEvent nd
inner join EventAppointment e on nd.Id = e.EventId
where
GETDATE() > nd.EndTime
GROUP BY
e.Amount,e.DoctorId,e.EventId,e.Id
FOR XML PATH(''))
, 1, 1, ''
)
from NDEvent nd
inner join EventAppointment e on nd.Id = e.EventId
where
GETDATE() > nd.EndTime
GROUP BY
e.Amount,e.DoctorId,e.EventId
But it is not giving the expected result.
Could anyone help with this query, or point me in the right direction? Thank you.

It doesn't look like you need the NDEvent table here at all (though I include it in the sample data). Just use SUM and STRING_AGG against EventAppointment:
USE Sandbox
GO
WITH NDEvent AS(
SELECT *
FROM (VALUES(33,CONVERT(datetime,'2020-10-23T15:00:00.000')),
(33,CONVERT(datetime,'2020-10-23T15:00:00.000')),
(35,CONVERT(datetime,'2020-10-21T03:30:00.000')),
(35,CONVERT(datetime,'2020-10-24T15:00:00.000')),
(35,CONVERT(datetime,'2020-10-25T15:00:00.000')),
(34,CONVERT(datetime,'2020-10-23T15:00:00.000')))V(EventID,EndTime)),
EventAppointment AS(
SELECT *
FROM (VALUES(1,7647 ,34,10.00),
(2,7647 ,34,10.00),
(3,28531,33,20.00),
(4,7647 ,35,20.00),
(5,7647 ,35,100.00),
(6,7647 ,35,200.00))V(Id,DocId, EventID, Amount))
SELECT DocID,
EventID,
SUM(Amount) AS Amount,
STRING_AGG(Id,',') WITHIN GROUP (ORDER BY Id) AS IDs
FROM EventAppointment EA
GROUP BY DocId,
EventID;
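For readers without a SQL Server instance, the same grouping can be sketched in SQLite through Python's sqlite3 module. This is only an approximation: GROUP_CONCAT stands in for STRING_AGG, and because this sketch does not rely on the order GROUP_CONCAT emits, the ids are sorted in Python afterwards. The function name sum_and_ids is invented for this illustration.

```python
import sqlite3


def sum_and_ids(appointments):
    """Sum Amount and collect Ids per (DocId, EventId)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE EventAppointment (Id INTEGER, DocId INTEGER, EventId INTEGER, Amount REAL)"
    )
    con.executemany("INSERT INTO EventAppointment VALUES (?, ?, ?, ?)", appointments)
    query = """
        SELECT DocId, EventId, SUM(Amount) AS Amount, GROUP_CONCAT(Id) AS Ids
        FROM EventAppointment
        GROUP BY DocId, EventId
        ORDER BY EventId
    """
    rows = con.execute(query).fetchall()
    con.close()
    # Normalise the id list so the output does not depend on aggregation order.
    return [
        (doc, event, amount, ",".join(sorted(ids.split(","), key=int)))
        for doc, event, amount, ids in rows
    ]


# Sample rows copied from the question (EventAppointment).
appointments = [
    (1, 7647, 34, 10.00),
    (2, 7647, 34, 10.00),
    (3, 28531, 33, 20.00),
    (4, 7647, 35, 20.00),
    (5, 7647, 35, 100.00),
    (6, 7647, 35, 200.00),
]

for row in sum_and_ids(appointments):
    print(row)
```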

An alternative that can be reused with other data:
WITH Table1 AS(
SELECT EventId FROM NDEvent
GROUP BY EventId
),
Table2 AS(
SELECT e.DocId,e.EventId,e.Amount,
STUFF((
SELECT ',' + CAST(ee.Id as nvarchar)
FROM EventAppointment ee
where ee.EventId = e.EventId
GROUP BY ee.EventId,ee.Id
FOR XML PATH('')), 1, 1, '') AS Id
FROM Table1 t
LEFT OUTER JOIN EventAppointment e ON t.EventId = e.EventId
)
SELECT DocId,EventId,SUM(Amount) AS Amount,Id FROM Table2
GROUP BY DocId,EventId,Id

How do I get the time elapsed between flags for every user?

The table below represents user logins (i.e. LogAction_INT = 1 is login, LogAction_INT = 0 is logout). What is the best approach to sum the time elapsed between a user's login and logout (a session)? Ideally I need the total time spent per user. Everything I can think of involves while loops and is too complex.
ID User_ID LogDate_DT LogAction_INT
1940 18 2019-04-01 13:15:06.027 1
1941 18 2019-04-01 13:47:39.010 0
1942 18 2019-04-01 15:48:46.453 1
1943 18 2019-04-01 15:54:47.520 0
1944 68 2019-04-02 15:09:20.460 1
1945 68 2019-04-02 15:53:11.223 0
1946 86 2019-04-03 12:48:14.340 1
1947 86 2019-04-03 14:49:55.400 0
1948 80 2019-04-04 08:54:48.157 1
1949 86 2019-04-04 15:26:51.917 1
1950 86 2019-04-04 15:27:53.030 0
1951 86 2019-04-04 15:28:00.920 1
1952 86 2019-04-04 15:28:30.243 0
1953 86 2019-04-04 15:28:35.490 1
1954 86 2019-04-04 15:53:41.700 0
1955 68 2019-04-04 15:54:07.720 1
1956 80 2019-04-04 16:15:55.200 0
I expect to have something like:
User TotalSessionTime
---- -----------------
18 04:45
68 10:02
80 06:12

You can enumerate each of the types and then use conditional aggregation or a join:
select user_id, seqnum,
datediff(second, min(LogDate_DT), max(LogDate_DT)) as diff_seconds
from (select t.*,
row_number() over (partition by user_id, LogAction_INT order by id) as seqnum
from t
) t
group by user_id, seqnum;
You can then sum these by user:
select user_id, sum(diff_seconds)
from (select user_id, seqnum,
             datediff(second, min(LogDate_DT), max(LogDate_DT)) as diff_seconds
      from (select t.*,
                   row_number() over (partition by user_id, LogAction_INT order by id) as seqnum
            from t
           ) t
      group by user_id, seqnum
     ) t
group by user_id;
The issue with this type of problem is that the ins and outs don't usually match up quite so cleanly. That makes this a much harder problem.
In supported versions of SQL Server, I would do this using lag().
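One way the lag() idea could look is sketched below: each logout row is paired with the row just before it for the same user, and only login-then-logout pairs are summed, so a trailing login without a logout is ignored. The sketch runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server, uses a subset of the question's rows with the milliseconds dropped, and the table name logins and function name total_session_seconds are invented for this illustration.

```python
import sqlite3


def total_session_seconds(rows):
    """Sum login-to-logout durations per user, skipping unmatched logins."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logins (id INTEGER, user_id INTEGER, log_date TEXT, log_action INTEGER)"
    )
    con.executemany("INSERT INTO logins VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT user_id, SUM(secs) AS total_seconds
        FROM (SELECT user_id,
                     log_action,
                     LAG(log_action) OVER (PARTITION BY user_id ORDER BY id) AS prev_action,
                     CAST(strftime('%s', log_date) AS INTEGER)
                       - CAST(strftime('%s',
                              LAG(log_date) OVER (PARTITION BY user_id ORDER BY id)) AS INTEGER)
                       AS secs
              FROM logins) AS x
        WHERE log_action = 0      -- this row is a logout
          AND prev_action = 1     -- and the previous row was a login
        GROUP BY user_id
        ORDER BY user_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Subset of the question's rows, milliseconds removed for readability.
log_rows = [
    (1940, 18, "2019-04-01 13:15:06", 1),
    (1941, 18, "2019-04-01 13:47:39", 0),
    (1942, 18, "2019-04-01 15:48:46", 1),
    (1943, 18, "2019-04-01 15:54:47", 0),
    (1944, 68, "2019-04-02 15:09:20", 1),
    (1945, 68, "2019-04-02 15:53:11", 0),
    (1955, 68, "2019-04-04 15:54:07", 1),  # login with no logout yet
]

for user_id, seconds in total_session_seconds(log_rows):
    print(user_id, seconds)
```

User 68's last login has no matching logout, so it contributes nothing to the total instead of distorting it.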

If the rows always come in pairs, you can use row_number() to generate a running number and then group every 2 rows as 1:
; with cte as
(
    select *, grp = (row_number() over (partition by User_ID order by ID) - 1) / 2
    from your_table
),
cte2 as
(
    select User_ID, elapsed = datediff(second, min(LogDate_DT), max(LogDate_DT))
    from cte
    group by User_ID, grp
)
select User_ID, sum(elapsed)
from cte2
group by User_ID

TSQL OVER (PARTITION BY ... )

For my next trick, I would like to select only the most recent event for each client. Instead of four events for 000017 I want one.
OK c_id e_date e_ser e_att e_recip Age c_cm e_staff rn
--> 000017 2013-04-02 00:00:00.000 122 1 1 36 90510 90510 15
--> 000017 2013-02-26 00:00:00.000 122 1 1 36 90510 90510 20
--> 000017 2013-02-12 00:00:00.000 122 1 1 36 90510 90510 24
--> 000017 2013-01-29 00:00:00.000 122 1 1 36 90510 90510 27
--> 000188 2012-11-02 00:00:00.000 160 1 1 31 1289 1289 44
--> 001713 2013-10-01 00:00:00.000 142 1 1 26 2539 2539 1
--> 002531 2013-07-12 00:00:00.000 190 1 1 61 1689 1689 21
--> 002531 2013-06-14 00:00:00.000 190 1 1 61 1689 1689 30
--> 002531 2013-06-07 00:00:00.000 190 1 1 61 1689 1689 31
--> 002531 2013-05-28 00:00:00.000 122 1 1 61 1689 1689 33
Here is the query that got me to this stage (perhaps you have some suggestions to improve this as well, the extra nested query creating t2 table is probably excessive.) Thank you all!!!
SELECT TOP(10)*
FROM (
SELECT *
FROM (
SELECT (SELECT CASE WHEN
(e_att IN (1,2)
AND e_date > DATEADD(month, -12, getdate())
AND e_ser NOT IN (100,115)
AND e_recip NOT IN ('2','7')
AND (( (e_recip = '3') AND (DATEDIFF(Year, c_bd, GetDate())>10) ) OR (e_recip <> '3') )
AND c_cm = e_staff)
THEN '-->'
WHEN 1=1 THEN ''
END
) AS 'OK'
,c_id, e_date, e_ser, e_att, e_recip, DATEDIFF(Year, c_bd, GetDate()) AS 'Age', c_cm, e_staff
,row_number() OVER (PARTITION BY c_id ORDER BY e_date DESC) rn
FROM events INNER JOIN client ON e_case_no = c_id
LEFT OUTER JOIN doc ON doc.doc_dbid = client.c_id
WHERE client.c_id IN ( /* confidential query */ )
AND e_date > DATEADD(month, -12, getdate())
AND e_ser BETWEEN 11 AND 1000
GROUP BY c_id, e_date, e_ser, e_att, e_recip, c_bd, c_cm, e_staff
) t1
) t2
WHERE OK = '-->'
ORDER BY c_id, e_date DESC

It looks like the following produces the row number, sorted by date, per client:
,row_number() OVER (PARTITION BY c_id ORDER BY e_date DESC) rn
So adding where rn=1 should yield the most recent event per client:
) t1
WHERE rn = 1
) t2

Here are some improvements to your original query:
SELECT TOP(10) *
FROM (
    SELECT '-->' AS 'OK' -- always this value; see the WHERE clause
          ,c_id, e_date, e_ser, e_att, e_recip, DATEDIFF(Year, c_bd, GetDate()) AS 'Age', c_cm, e_staff
          ,row_number() OVER (PARTITION BY c_id ORDER BY e_date DESC) rn
    FROM events INNER JOIN client ON e_case_no = c_id
    LEFT OUTER JOIN doc ON doc.doc_dbid = client.c_id
    WHERE client.c_id IN ( /* confidential query */ )
      -- these conditions were in the CASE and filtered on later; putting them in the WHERE is more efficient
      AND (e_att IN (1,2) AND e_date > DATEADD(month, -12, getdate())
           AND e_ser NOT IN (100,115)
           AND (( (e_recip = '3') AND DATEDIFF(Year, c_bd, GetDate()) > 10 ) OR e_recip NOT IN ('2', '3', '7') )
           AND c_cm = e_staff)
      AND e_date > DATEADD(month, -12, getdate())
      AND e_ser BETWEEN 11 AND 1000
    GROUP BY c_id, e_date, e_ser, e_att, e_recip, c_bd, c_cm, e_staff
) t2
ORDER BY c_id, e_date DESC
Besides removing some unneeded parentheses, moving the conditions from the CASE expression into the WHERE clause means you no longer need to filter on them in the outer query, which makes it simpler.
Add in the rn = 1 filter on row_number from McGarnagle's answer and you should get the results you want.
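To illustrate why the filter placement matters, here is a minimal sketch with a made-up, much smaller events table: because the conditions sit in the inner WHERE, row_number() only numbers the qualifying events, so rn = 1 is the most recent qualifying event per client. It runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the data and the function name latest_event_per_client are invented for this illustration.

```python
import sqlite3


def latest_event_per_client(rows):
    """Return the most recent qualifying event per client."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (c_id TEXT, e_date TEXT, e_att INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    query = """
        SELECT c_id, e_date
        FROM (SELECT c_id, e_date,
                     ROW_NUMBER() OVER (PARTITION BY c_id ORDER BY e_date DESC) AS rn
              FROM events
              WHERE e_att IN (1, 2)   -- filter first, so rn counts qualifying rows only
             ) AS t
        WHERE rn = 1
        ORDER BY c_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical rows: (client id, event date, attendance code).
events = [
    ("000017", "2013-04-02", 1),
    ("000017", "2013-02-26", 1),
    ("000188", "2012-11-02", 1),
    ("002531", "2013-07-12", 3),  # most recent, but does not qualify
    ("002531", "2013-06-14", 1),
]

for row in latest_event_per_client(events):
    print(row)
```

If the e_att condition were applied outside the numbered subquery instead, client 002531 would disappear, because its rn = 1 row is the non-qualifying one.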
