How to apply randomly selected values to distinct dates in SQL Server - sql

I have a table showing available dates for some staff, with two fields, staffid and date, containing data that looks like this:
staffid date
1 2016-01-01
1 2016-01-02
1 2016-01-03
2 2016-01-03
3 2016-01-01
3 2016-01-03
I need to generate a list of DISTINCT available dates from this table, where the staff member assigned to each date is selected randomly. I know how to select rows based on one distinct field (see, for example, the answer here), but this will always select the rows based on a given order in the table (so, for example, staff 1 for January 1), while I need the selection to be random, so that sometimes staff 1 will be selected as the distinct row and sometimes staff 3.
The result needs to be ordered by date.

Try this:
-- test data
create table your_table (staffid int, [date] date);
insert into your_table values
(1, '2016-01-01'),
(1, '2016-01-02'),
(1, '2016-01-03'),
(2, '2016-01-03'),
(3, '2016-01-01'),
(3, '2016-01-03');
-- query
select *
from (
select distinct [date] [distinct_date] from your_table
) as d
outer apply (
select top 1 staffid
from your_table
where d.[distinct_date] = [date]
order by newid()
) as x
order by d.[distinct_date]; -- the question asks for the result ordered by date
-- result 1
distinct_date staffid
-----------------------
2016-01-01 3
2016-01-02 1
2016-01-03 1
-- result 2
distinct_date staffid
-----------------------
2016-01-01 1
2016-01-02 1
2016-01-03 2
hope it helps :)
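As an alternative sketch of the same idea, the random pick per date can also be written with ROW_NUMBER() instead of OUTER APPLY. This assumes the your_table test data created above; the CTE name ranked and the column alias rn are made up for illustration.

```sql
-- Number the staff rows within each date in random order, then keep row 1.
with ranked as (
    select staffid,
           [date],
           row_number() over (partition by [date] order by newid()) as rn
    from your_table
)
select [date] as distinct_date, staffid
from ranked
where rn = 1
order by [date];
```

Because NEWID() is evaluated per row, each execution can pick a different staff member for dates that have more than one candidate.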

Related

How to select rows where logged in last month and logged min 1 time in one of month preceding August in Oracle SQL?

I have a table in Oracle SQL that holds client IDs and the date and time of each login to the application:
ID | LOGGED
----------------
11 | 2021-07-10 12:55:13.278
11 | 2021-08-10 13:58:13.211
11 | 2021-02-11 12:22:13.364
22 | 2021-01-10 08:34:13.211
33 | 2021-04-02 14:21:13.272
I need to select only those clients (ID) who have logged in at least once in the last month (August) and at least once in one of the months preceding August (June or July).
It is currently September, so...
I need clients who have logged in at least once in August
and at least once in July or June:
if logged in June -> not logged in July
if logged in July -> not logged in June
I need a result like the one below:
ID
----
11
How can I do that in Oracle SQL? Be aware that the column "LOGGED" is a timestamp like: 2021-01-10 08:34:13.211
Maybe consider this:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
and count
(case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then 1 end
) = 1
I don't understand one thing.
You wrote:
minimum 1 time in one month preceding August (June or July)
and then:
if logged in June -> not logged in July
if logged in July -> not logged in June
If you need logins in EXACTLY one of the two months (June or July), consider my query above. Note that its second condition, = 1, counts login rows, so it matches clients with exactly one login across June and July.
If you need at least one login in June or July (or both), then:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
and count
(case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then 1 end
) >= 1
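If "exactly one month" is meant literally (any number of logins, but all of them in June or all of them in July), one possible variation is to count distinct months rather than login rows. This is only a sketch under that reading of the question; it reuses the yourtable and logged names from the queries above and, like them, assumes it is run in September.

```sql
select id
from yourtable
group by id
having count(case
               -- at least one login in the previous month (August)
               when months_between(trunc(sysdate, 'MM'), trunc(logged, 'MM')) = 1
               then 1
             end) >= 1
   and count(distinct case
               -- number of distinct months (June, July) that contain a login
               when months_between(trunc(sysdate, 'MM'), trunc(logged, 'MM')) in (2, 3)
               then trunc(logged, 'MM')
             end) = 1;
```

A client with several logins in July and none in June passes this filter, whereas the row-counting version with = 1 would reject that client.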
Your question needs some clarification, but based on what you were describing I am seeing a couple of options.
The simplest one is probably a combination of data densification (generating a row for every month for each id) plus an analytic function (enabling inter-row calculations). Here's a simple example of this:
rem create a dummy table with some more data (you do not seem to worry about the exact timestamp)
drop table logs purge;
create table logs (ID number, LOGGED timestamp);
insert into logs values (11, to_timestamp('2021-07-10 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-07-11 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-08-10 13:58:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-02-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-04-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (22, to_timestamp('2021-01-10 08:34:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (33, to_timestamp('2021-04-02 14:21:13.272','yyyy-mm-dd HH24:MI:SS.FF'));
commit;
The following SQL gets your data densified and lists the total count of logins for a month and for the previous month on the same row, so that you can do a comparative calculation. I have not done that, but I am hoping you get the idea.
with t as
(-- dummy artificial table just to create a time dimension for densification
select distinct to_char(sysdate - rownum,'yyyy-mm') mon
from dual connect by level < 300),
l_sparse as
(-- aggregating your login info per month
select id, to_char(logged,'yyyy-mm') mon, count(*) cnt
from logs group by id, to_char(logged,'yyyy-mm') ),
l_dense as
(-- densification with partition outer join
select t.mon, l.id, cnt from l_sparse l partition by (id)
right outer join t on (l.mon = t.mon)
)
-- final analytical function to list current and previous row info in same record
select mon, id
, cnt
, lag(cnt) over (partition by id order by mon asc) prev_cnt
from l_dense
order by id, mon;
Parts of the result:
MON             ID        CNT   PREV_CNT
------- ---------- ---------- ----------
2020-12         11
2021-01         11
2021-02         11          2
2021-03         11                     2
2021-04         11          1
2021-05         11                     1
2021-06         11
2021-07         11          3
2021-08         11          2          3
2021-09         11                     2
2020-12         22
2021-01         22          2
2021-02         22                     2
2021-03         22
2021-04         22
...
You can see for ID 11 that for 2021-08 you have logins for the current and the previous month, so you can do the math on it. (That would require another subselect/with branch.)
Alternatives to this would be:
interrow calculation plus time math between two logged timestamps
pattern matching
I did not drill into those; there is not enough info about your real requirement.

How to sum the hours using two Date Fields and group them by the user id in SQL

I feel like the task is straightforward, but I am having a hard time getting it to do what I want.
Here is a table in my database:
ID |Empl_Acc_ID |CheckIn |CheckOut |WeekDay
----------------------------------------------------------------------------
1 | 1 | 2017-09-24 08:03:02.143 | 2017-09-24 12:00:00.180 | Sun
2 | 1 | 2017-09-24 13:02:23.457 | 2017-09-24 17:01:02.640 | Sun
3 | 2 | 2017-09-24 08:05:23.457 | 2017-09-24 13:01:02.640 | Mon
4 | 2 | 2017-09-24 14:05:23.457 | 2017-09-24 17:00:02.640 | Mon
5 | 3 | 2017-09-24 07:05:23.457 | 2017-09-24 11:30:02.640 | Tue
6 | 3 | 2017-09-24 12:31:23.457 | 2017-09-24 16:01:02.640 | Tue
and so on....
I want to group Empl_Acc_ID by the same date and sum up the total hours each employee worked that day. Each employee could have either one or more records per day depending on how many breaks he/she took that day.
For example if Empl_Acc_ID (2) worked 3 different days with one break, the table will contain 6 records for that person but in my query I want to see 3 records with the total hours they worked each day.
Here is how I constructed the query:
select distinct w.Empl_Acc_ID, ws.fullWorkDayHours
from Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
This query does not quite get me what I need. It only returns the sum of hours per employee for all the days they worked. Also, this query only has 2 columns, but I want to see more columns. When I tried adding more columns, the records were no longer distinct by Empl_Acc_ID.
What is wrong with the query?
Thank you
You do not need to self-join the table in this case; just group by the datetime field cast to a date.
create table Work_Schedule (
ID TINYINT,
Empl_Acc_ID TINYINT,
CheckIn DATETIME,
CheckOut DATETIME,
WeekDay CHAR(3)
);
INSERT INTO Work_Schedule VALUES (1, 1,'2017-09-24 08:03:02.143','2017-09-24 12:00:00.180','Sun');
INSERT INTO Work_Schedule VALUES (2, 1,'2017-09-24 13:02:23.457','2017-09-24 17:01:02.640','Sun');
INSERT INTO Work_Schedule VALUES (3, 2,'2017-09-24 08:05:23.457','2017-09-24 13:01:02.640','Mon');
INSERT INTO Work_Schedule VALUES (4, 2,'2017-09-24 14:05:23.457','2017-09-24 17:00:02.640','Mon');
INSERT INTO Work_Schedule VALUES (5, 3,'2017-09-24 07:05:23.457','2017-09-24 11:30:02.640','Tue');
INSERT INTO Work_Schedule VALUES (6, 3,'2017-09-24 12:31:23.457','2017-09-24 16:01:02.640','Tue');
SELECT w.Empl_Acc_ID,
CAST(CheckIn AS DATE) [date],
SUM(DATEDIFF(hour, w.CheckIn, w.CheckOut)) fullWorkDayHours
FROM Work_Schedule w
GROUP BY w.Empl_Acc_ID, CAST(CheckIn AS DATE)
DROP TABLE Work_Schedule;
Empl_Acc_ID date fullWorkDayHours
1 2017-09-24 8
2 2017-09-24 8
3 2017-09-24 8
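One caveat worth illustrating: DATEDIFF(hour, ...) counts the hour boundaries crossed, not elapsed time, so 08:03 to 12:00 is reported as 4 hours. A possible refinement, sketched here on the assumption that the Work_Schedule table from the script above still exists (the script drops it at the end), is to sum minutes and convert to fractional hours.

```sql
-- Sum minutes per employee per day, then convert to hours with two decimals.
SELECT w.Empl_Acc_ID,
       CAST(w.CheckIn AS DATE) AS [date],
       CAST(SUM(DATEDIFF(minute, w.CheckIn, w.CheckOut)) / 60.0 AS DECIMAL(5, 2)) AS fullWorkDayHours
FROM Work_Schedule w
GROUP BY w.Empl_Acc_ID, CAST(w.CheckIn AS DATE);
```

Dividing by 60.0 rather than 60 forces decimal division; with an integer divisor the fractional part would be discarded.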
Try this. You just have to group by date and employee account.
select Employee.Empl_Acc_ID, FirstName, LastName, Username,
convert(varchar(10), checkin, 101) as checkin, convert(varchar(10),
checkout, 101) as checkout, sum(datediff(hour, checkin, checkout)) as hours
from Employee
inner join Employee_Account on Employee.Empl_Acc_ID =
Employee_Account.Empl_Acc_ID
inner join Work_Schedule on Employee_Account.Empl_Acc_ID =
Work_Schedule.Empl_Acc_ID
group by convert(varchar(10), checkin, 101), convert(varchar(10), checkout,
101), Employee.Empl_Acc_ID, FirstName, LastName, Username
order by Employee.Empl_Acc_ID
You do not group by date, that's the issue:
SELECT DISTINCT w.Empl_Acc_ID, ws.fullWorkDayHours, ws.CheckInDate
FROM Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, CAST(w.CheckIn AS DATE) AS [CheckInDate], fullWorkDayHours = Sum(DATEDIFF(hour,
w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID, CAST(w.CheckIn AS DATE)
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
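There is no need for a self join; it works fine without it, as long as you also group by the date:
Select Empl_Acc_ID, CAST(CheckIn AS DATE) As WorkDate,
Sum(DATEDIFF(hour,CheckIN,CheckOut)) As FullDayWorkHours
from EMP2
-- compare full dates; DATEPART(day, ...) alone would also match the same day number in different months
where CAST(CheckIn AS DATE)=CAST(CheckOut AS DATE)
Group By Empl_Acc_ID, CAST(CheckIn AS DATE)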

Sql Server group by sets of columns

I have a data set where I need to count patient visits with such rules:
Two or more visits to the same doctor in the same day count as 1 visit, regardless of the reason
Two or more visits to different doctors for the same reason count as 1 visit
Two or more visits to different doctors on the same day for different reasons count as two or more visits.
Example data:
DoctorId PatientId VisitDate ReasonCode RowId
-------- --------- --------- ---------- -----
1 100 2014-01-01 200 1
1 100 2014-01-01 210 2
2 100 2014-01-01 200 3
2 100 2014-01-11 300 4
1 100 2014-01-15 200 5
2 400 2014-01-15 200 6
In this example, my final count would be based on grouping rowId 1, 2, 3 for 1 visit; grouping row 4 as 1 visit, grouping row 5 as 1 visit for a total of 3 visits for patient 100. Patient 400 has 1 visit as well.
patientid visitdate numberofvisits
--------- --------- --------------
100 2014-01-01 3
100 2014-01-11 1
100 2014-01-15 1
400 2014-01-15 1
Where I'm stuck is how to handle the GROUP BY so that all of the scenarios are covered. If the grouping were doctor, date, I'd be fine. If it were doctor, date, ReasonCode, I'd be fine. The difficulty is combining the two: DoctorId and ReasonCode apply when two doctors are involved, while DoctorId and date apply when it's the same doctor. I haven't worked deeply with SQL Server in a long time, so it's possible that a common table expression is the solution and I'm not seeing it. I'm using SQL Server 2014 and there's decent latitude in performance. I'm looking for a SQL Server query that produces the results above. As best I can tell, there's no way to group this the way I need it counted.
The answer was an except clause and grouping each of the sets before a final count. Sometimes, we over-complicate things.
DECLARE @tblAllData TABLE
(
DoctorId INT NOT NULL
, PatientId INT NOT NULL
, VisitDate DATE NOT NULL
, ReasonCode INT NOT NULL
, RowId INT NOT NULL
)
INSERT @tblAllData
SELECT
1,100,'2014-01-01',200,1
UNION ALL
SELECT
1,100,'2014-01-01',210,2
UNION ALL
SELECT
2,100,'2014-01-01',200,3
UNION ALL
SELECT
2,100,'2014-01-11',300,4
UNION ALL
SELECT
1,100,'2014-01-15',200,5
UNION ALL
SELECT
2,400,'2014-01-15',200,6
DECLARE @tblTempCountedRows AS TABLE
(
PatientId INT NOT NULL
, VisitDate DATE
, ReasonCode INT
)
INSERT @tblTempCountedRows
SELECT PatientId, VisitDate,0
FROM @tblAllData
GROUP BY PatientId, DoctorId, VisitDate
EXCEPT
SELECT PatientId, VisitDate, ReasonCode
FROM @tblAllData
GROUP BY PatientId, VisitDate, ReasonCode
select * from @tblTempCountedRows
DECLARE @tblFinalCountedRows AS TABLE
(
PatientId INT NOT NULL
, VisitCount INT
)
INSERT @tblFinalCountedRows
SELECT
PatientId
, count(1) as Member_visit_Count
FROM
@tblTempCountedRows
GROUP BY PatientId
SELECT * from @tblFinalCountedRows
Here's a Sql Fiddle with the results:
Sql Fiddle
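For comparison, here is a shorter sketch that gives the same per-patient totals on the sample data. It rests on an observation rather than on anything stated in the answer: because the first SELECT hard-codes ReasonCode as 0, the EXCEPT only removes duplicates (assuming no real row has ReasonCode 0), so the script effectively counts distinct visit dates per patient. The table variable name @visits is made up for this example, and this is not a general implementation of the three rules in the question.

```sql
DECLARE @visits TABLE
(
    DoctorId INT NOT NULL,
    PatientId INT NOT NULL,
    VisitDate DATE NOT NULL,
    ReasonCode INT NOT NULL,
    RowId INT NOT NULL
);

INSERT @visits VALUES
    (1, 100, '2014-01-01', 200, 1),
    (1, 100, '2014-01-01', 210, 2),
    (2, 100, '2014-01-01', 200, 3),
    (2, 100, '2014-01-11', 300, 4),
    (1, 100, '2014-01-15', 200, 5),
    (2, 400, '2014-01-15', 200, 6);

-- One visit per patient per distinct date: 3 for patient 100, 1 for patient 400.
SELECT PatientId, COUNT(DISTINCT VisitDate) AS VisitCount
FROM @visits
GROUP BY PatientId;
```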

Renumbering rows in SQL Server

I'm fairly new to SQL Server and I have the following question: is there any way to renumber the rows in a column?
For example:
id date name
1 2016-01-02 John
2 2016-01-02 Jack
3 2016-01-02 John
4 2016-01-02 John
5 2016-01-03 Jack
6 2016-01-03 Jack
7 2016-01-04 John
8 2016-01-03 Jack
9 2016-01-02 John
10 2016-01-04 Jack
I would like all the "Johns" to start with id 1 and go on (2, 3, 4 etc.), and all the "Jacks" to continue with the following numbers once the "Johns" are done (5, 6, 7 etc.). Thanks!
I hope this helps..
declare @t table (id int ,[date] date,name varchar(20))
insert into @t
( id, date, name )
values (1,'2016-01-02','John')
,(2,'2016-01-02','Jack')
,(3,'2016-01-02','John')
,(4,'2016-01-02','John')
,(5,'2016-01-03','Jack')
,(6,'2016-01-03','Jack')
,(7,'2016-01-04','John')
,(8,'2016-01-03','Jack')
,(9,'2016-01-02','John')
,(10,'2016-01-04','Jack')
select
-- name desc puts 'John' before 'Jack' for this sample data, as requested
row_number() over(order by name desc,[date]) as ID,
date ,
name
from
@t
order by ID
The id should just be an internal identifier you use for joins etc - I wouldn't change it. But you could query such a numbering using a window function:
SELECT ROW_NUMBER() OVER (ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END) AS rn,
date,
name
FROM mytable
Instead of renumbering the id column, you can use the ROW_NUMBER window function to number the rows as required. For example (note that PARTITION BY restarts the numbering at 1 for each name):
SELECT ROW_NUMBER() OVER(PARTITION BY name ORDER BY date) as rowid,date,name
FROM tablename
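If the new numbers really do have to be stored in the id column, one possible approach in SQL Server is to update through a CTE that computes the row number. This is a sketch only: it assumes a table named mytable with the columns from the question, and that id is a plain INT column rather than an IDENTITY column, which cannot be updated this way. The names numbered and rn are made up for illustration.

```sql
-- Johns first, then everyone else; ties are broken by date and the old id.
WITH numbered AS (
    SELECT id,
           ROW_NUMBER() OVER (
               ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END, [date], id
           ) AS rn
    FROM mytable
)
UPDATE numbered
SET id = rn;
```

As one of the answers notes, changing an id that other tables reference is usually a bad idea, so this is best kept for standalone tables.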

Total number of days for a task before going on to the next one, grouped by person

I am trying to figure out how to show how many days have been worked on a certain task by using the dates in between each “task login” for each person. I think this can be done with one query? I'm open to suggestions and/or ideas.
The Table:
--------+-----------+----------
Person | TaskLogin | Date
--------+-----------+----------
Jane | A | 2013-01-01
Jane | B | 2013-01-03
Jane | A | 2013-01-06
Jane | B | 2013-01-10
Bob | A | 2013-01-01
Bob | A | 2013-01-06
---------------------------------------------------------------------
Row 1: Jane starts task A starting 2013-01-01 and works on it until starting Task B on 2013-01-03 = 2 days worked on Task A
Row 2: Jane starts on task B starting 2013-01-03 and works on it until starting task A on 2013-01-06 = 3 days worked on Task B
Row 3: Jane starts on task A starting 2013-01-06 and works on it until starting task B on 2013-01-10 = 4 days worked on Task A
Row 4: Skip because that is the highest date for Jane (Jane may or may not finish task B 2013-01-10 but we will not count it)
Row 5: Bob starts task A starting on 2013-01-01 and works on it until continuing to work on task A by logging it again on 2013-01-06 = 5 days worked on task A
Row 6: Skip because that is the highest date for Bob
A = 11 days because 2 + 4 + 5
B = 3 days because of Row 2
The output:
------+---------------------
Tasks | Time between Tasks
------+---------------------
A | 11 days
B | 3 days
EDIT:
The solutions of Nicarus and Gordon Linoff (the first, pre-2012 solution specifically, with my edits in the comments) work. Note that (select distinct * from table t) t can be substituted for table in Gordon Linoff's solution to accommodate the case of someone logging in twice on the same day.
What you are looking for is the lead() function. This is only available in SQL Server 2012 and later. Before that, the easiest way is a correlated subquery that picks the next later date for the same person:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*,
(select top 1 t2.date
from table t2
where t2.person = t.person and t2.date > t.date
order by t2.date
) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
In SQL Server 2012 and later, it would be:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*, lead(date) over (partition by person order by date) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
Maybe not the most elegant way, but it certainly works:
-- Setup table/insert values --
IF OBJECT_ID('TempDB.dbo.#TaskAccounting') IS NOT NULL BEGIN
DROP TABLE #TaskAccounting
END
CREATE TABLE #TaskAccounting
(
Person VARCHAR(4) NOT NULL,
TaskLogin CHAR(1) NOT NULL,
TaskDate DATETIME NOT NULL
)
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-03')
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-06')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-10')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-06');
-- Use a CTE to add sequence and join on it --
WITH Tasks AS (
SELECT
Person,
TaskLogin,
TaskDate,
ROW_NUMBER() OVER(PARTITION BY Person ORDER BY TaskDate) AS Sequence
FROM
#TaskAccounting
)
SELECT
a.TaskLogin AS Tasks,
CAST(SUM(DATEDIFF(DD,a.TaskDate,b.TaskDate)) AS VARCHAR) + ' days' AS TimeBetweenTasks
FROM
Tasks a
JOIN
Tasks b
ON (a.Person = b.Person)
AND (a.Sequence = b.Sequence - 1)
GROUP BY
a.TaskLogin