SQL Pivot Half of table - sql

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock out multiple times a day, so I'm trying to get all of the clock outs in a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.

The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
declare @test table (
    pk int,
    workdate date,
    seq int,
    tIN time,
    tOUT time
)
insert into @test values
    (1, '2020-11-25', 1, '08:00', null),
    (1, '2020-11-25', 2, null, '11:00'),
    (1, '2020-11-25', 3, '11:32', null),
    (1, '2020-11-25', 4, null, '17:00'),
    (2, '2020-11-25', 5, '08:00', null),
    (2, '2020-11-25', 6, null, '09:00'),
    (2, '2020-11-25', 7, '09:15', null),
    -- new date
    (1, '2020-11-27', 8, '08:00', null),
    (1, '2020-11-27', 9, null, '08:22'),
    (1, '2020-11-27', 10, '09:14', null),
    (1, '2020-11-27', 11, null, '12:08'),
    (1, '2020-11-27', 12, '01:08', null),
    (1, '2020-11-27', 13, null, '14:40'),
    (1, '2020-11-27', 14, '14:55', null),
    (1, '2020-11-27', 15, null, '17:00')
select *
from (
    /* this just sets the column header names and condenses their values */
    select
        pk,
        workdate,
        colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
        colValue = coalesce(tin, tout)
    from (
        /* main query */
        select
            pk,
            workdate,
            /* grab what pair # this clock in or out is; reset by employee & date */
            empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
            tin,
            tout
        from @test
    ) i
) a
PIVOT (
    max(colValue)
    for colName
    IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
        [TimeIn1],
        [TimeOut1],
        [TimeIn2],
        [TimeOut2],
        [TimeIn3],
        [TimeOut3],
        [TimeIn4],
        [TimeOut4]
    )
) mypivotTable
This generates one row per employee (pk) and work date, with each clock-in and clock-out in its own TimeInN/TimeOutN column.
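To see why the empDaySEQ expression pairs rows up, here is a small Python sketch (not part of the original answer). With integer division, rn / 2 + rn % 2 is the same as rounding rn / 2 up, so row numbers 1, 2, 3, 4 map to pairs 1, 1, 2, 2. The helper names pair_number and pivot_day are made up for illustration, and the sketch assumes, like the query, that each row carries either a time in or a time out.

```python
def pair_number(rn: int) -> int:
    """Mirror of the T-SQL expression rn / 2 + rn % 2 (integer division)."""
    return rn // 2 + rn % 2


def pivot_day(events):
    """events: list of (seq, time_in, time_out) for ONE employee and date.

    Exactly one of time_in / time_out is expected to be set on each row.
    """
    row = {}
    ordered = sorted(events, key=lambda e: e[0])  # order by seq
    # row_number() starts at 1, so enumerate from 1 as well
    for rn, (_seq, t_in, t_out) in enumerate(ordered, start=1):
        n = pair_number(rn)
        if t_in is not None:
            row[f"TimeIn{n}"] = t_in
        else:
            row[f"TimeOut{n}"] = t_out
    return row


# Employee 1 on 2020-11-25, taken from the sample data
day = [(1, "08:00", None), (2, None, "11:00"), (3, "11:32", None), (4, None, "17:00")]
result = pivot_day(day)
print(result)
```

The pairing only lines up if clock-ins and clock-outs strictly alternate within a day; a missed punch would shift every later pair.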

Related

Averaging values and getting standard devs from database to build graph

This is tricky to explain, so I'll shrink the info down to a minimum.
But first, my ultimate goal: I want to take users who trialed a product and determine how that product affected a value, as a percentage of each user's average baseline, and then average all these percentages and get their standard deviations.
I have a database with a table that has a user_id, a value, and a date:
user_id | value | date
int     | int   | int in epoch milliseconds
I then have a second table which indicates when a trial began and ends for a user and the product they are using for said trial.
user_id | start_date                | end_date                  | product id
int     | int in epoch milliseconds | int in epoch milliseconds | int
What I want to do is gather all the user's trials for one product type, and for each user that participated get a baseline value and their percent change each day. Then take all these percentages and average them and get a standard deviation for each day.
One problem is that date needs to be converted to days since start_date, so anything between the start date and the first 24 hours is lumped together as day 0, the next 24 hours as day 1, and so forth. I'll then be averaging the percentages for each day.
Not every day was recorded for each user, so some users will have multiple missing days; I need to mark each reading with its number of days from the start.
The start_dates differ between users.
So the graph will look like this: [picture not available]
I would prefer to do as much of it in sql as possible, but the rest will be in Golang.
I was thinking about grabbing each trial, so that each trial has an array of results. Then I iterate over each trial and over the results for each trial, picking out day 0, day 1, day 2 and saving these in their own arrays, which I then average. Everything starts getting super messy though,
such as in this semi-pseudocode:
db.Query("select user_id, start_date from trials where product_id = $1", productId).Scan(&trial.UserId, &trial.StartDate)
// extract trials from rows
for _, trial := range trials {
    // extract leadingAvgStart from StartDate
    db.QueryRow("select AVG(value) from results where user_id = $1 and date between $2 and $3", trial.UserId, leadingAvgStart, trial.StartDate)
    // Now we have the baseline for the user
    rows := db.Query("select value, date from results where product_id = $1", start)
    // Now we extract the results and have an array
    // Convert dates to days from the start date
    // ...? It just starts getting ugly and I believe there has to be a better way
}
How can I do most of the heavy lifting with sql?
create table users (id int PRIMARY KEY, name text);
create table products (id int PRIMARY KEY, name text);
create table values (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, value int
, date numeric
);
create table trials (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, start_date numeric
, end_date numeric
, product_id int REFERENCES products(id)
);
INSERT INTO users (id, name ) VALUES
(1,'John'),
(2,'Jane'),
(3,'Billy'),
(4,'Miranda');
INSERT INTO products (id, name ) VALUES
(1, 'pill A'),
(2, 'pill B'),
(3, 'pill C'),
(4, 'exercise bal'),
(5, 'diet plan');
INSERT INTO trials (id,user_id,start_date,end_date,product_id) VALUES
(1, 1, 1667896408000, 1668099442000, 1),
(2, 1, 1667896408000, 1668099442000, 2),
(3, 2, 1667576960000, 1668074401000, 3),
(4, 3, 1667896408000, 1668099442000, 1);
INSERT INTO values (id, user_id, value, date) VALUES
(38, 1, 7, 1668182428000),
(1, 1, 7, 1668099442000),
(2, 1, 8, 1668074401000),
(3, 1, 8, 1668012300000),
(4, 1, 6, 1668011197000),
(5, 1, 6, 1667978268000),
(6, 1, 9, 1667925002000),
(7, 1, 9, 1667896408000),
(8, 1, 4, 1667838601000),
(9, 1, 6, 1667803049000),
(10, 1, 7, 1667576960000),
(12, 1, 5, 1667546428000),
(13, 1, 8, 1667490149000),
(14, 2, 8, 1668182428000),
(15, 2, 7, 1668099442000),
(16, 2, 8, 1668074401000),
(17, 2, 9, 1668012300000),
(18, 2, 6, 1668011197000),
(19, 2, 6, 1667978268000),
(20, 2, 5, 1667925002000),
(21, 2, 9, 1667896408000),
(22, 2, 4, 1667803049000),
(23, 2, 4, 1667576960000),
(24, 2, 5, 1667546428000),
(25, 2, 9, 1667490149000),
(26, 3, 6, 1668182428000),
(27, 3, 7, 1668099442000),
(28, 3, 8, 1668074401000),
(29, 3, 9, 1668011197000),
(30, 3, 6, 1667978268000),
(31, 3, 9, 1667925002000),
(32, 3, 9, 1667896408000),
(33, 3, 8, 1667838601000),
(34, 3, 6, 1667803049000),
(35, 3, 4, 1667576960000),
(36, 3, 5, 1667546428000),
(37, 3, 6, 1667490149000);
OK, I figured it out. Basically I write two inner-join queries, treat those as tables, inner join them to each other, and use a GROUP BY to average:
select
    query1.product_uuid,
    query1.days,
    AVG(query1.value / query2.avg) as avg_percent
from
    (
        select
            DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) as days,
            trials.uuid as trial_uuid,
            trials.product_uuid,
            values.value as value
        from
            values as values
            inner join product_trials as trials ON values.user_id = trials.user_id
        where
            values.source = 'Trued'
            and values.use = 'true'
            AND trials.start_date IS NOT NULL
            AND trials.end_date IS NOT NULL
            AND to_timestamp(values.date / 1000)::date > trials.start_date
            AND to_timestamp(values.date / 1000)::date < trials.end_date
    ) as query1
    inner join (
        select
            values.user_id,
            trials.uuid as trial_uuid,
            AVG(value)
        from
            values
            inner join product_trials as trials ON values.user_id = trials.user_id
        where
            source = 'Trued'
            and use = true
            AND trials.start_date IS NOT NULL
            AND trials.end_date IS NOT NULL
            AND to_timestamp(values.date / 1000)::date < trials.start_date
            AND DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) > -20
        GROUP BY
            values.user_id,
            trials.uuid
    ) as query2 ON query1.trial_uuid = query2.trial_uuid
where
    query2.avg > 0
GROUP BY
    query1.days,
    query1.product_uuid
ORDER BY
    query1.product_uuid,
    query1.days
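The query above produces the per-day average but not the standard deviation the question asked for. PostgreSQL has a stddev_samp aggregate that could sit next to AVG in the outer select. As a rough sketch of the same bucketing and statistics outside the database, here is a Python version; the function names, the input shape, and the numbers in the example are made up for illustration and are not taken from the question's data.

```python
from collections import defaultdict
from statistics import mean, stdev

MS_PER_DAY = 24 * 60 * 60 * 1000


def day_index(ts_ms: int, start_ms: int) -> int:
    """Whole days since the trial start: first 24 h -> 0, next 24 h -> 1, ..."""
    return (ts_ms - start_ms) // MS_PER_DAY


def daily_stats(trials):
    """trials: list of (start_ms, baseline, [(ts_ms, value), ...]).

    Returns {day: (mean_ratio, stdev_ratio)}; stdev is None when a day has
    only one reading, because a sample deviation needs at least two.
    """
    by_day = defaultdict(list)
    for start_ms, baseline, readings in trials:
        if baseline <= 0:
            continue  # same guard as "query2.avg > 0"
        for ts_ms, value in readings:
            if ts_ms >= start_ms:
                by_day[day_index(ts_ms, start_ms)].append(value / baseline)
    return {
        day: (mean(r), stdev(r) if len(r) > 1 else None)
        for day, r in sorted(by_day.items())
    }


# Two hypothetical trials with different start times and baselines
stats = daily_stats([
    (0, 10.0, [(0, 11), (MS_PER_DAY, 12)]),
    (1000, 5.0, [(2000, 6), (1000 + MS_PER_DAY + 5, 5)]),
])
print(stats)
```

Days on which nobody recorded a value simply do not appear in the result, so missing days need no special handling.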

Group By get the currently matching Effective data out of past and future date in SQL Server [duplicate]

This question already has answers here:
Get top 1 row of each group
I have a list of rows with an EffectiveOn column in a SQL Server database table. I want to fetch the currently applicable EffectiveOn for each AccountId as of the current date. Consider the table in the sample data below.
From that table I need to fetch the rows with Id 11 to 15, because the current date (i.e., today) is 2020-11-19.
I tried the solution from "How do I get the current records based on its Effective Date?" but couldn't get it to work.
Kindly help me get the expected result set.
Sample Data:
CREATE TABLE [dbo].[DataInfo]
(
[Id] INT NOT NULL,
[AccountId] INT NOT NULL,
[EffectiveOn] DATE NOT NULL
)
GO
INSERT INTO [dbo].[DataInfo](Id, AccountId, EffectiveOn)
VALUES (1, 1, '2020-01-01'), (2, 2, '2020-01-02'), (3, 3, '2020-01-03'), (4, 4, '2020-01-04'), (5, 5, '2020-01-05'),
(6, 1, '2020-05-01'), (7, 2, '2020-05-02'), (8, 3, '2020-05-03'), (9, 4, '2020-05-04'), (10, 5, '2020-05-05'),
(11, 1, '2020-10-01'), (12, 2, '2020-10-02'), (13, 3, '2020-10-03'), (14, 4, '2020-10-04'), (15, 5, '2020-10-05'),
(16, 1, '2021-02-01'), (17, 2, '2021-02-02'), (18, 3, '2021-02-03'), (19, 4, '2021-02-04'), (20, 5, '2021-02-05')
You can use a correlated subquery to get the most recent date as of a particular date:
select di.*
from datainfo di
where di.effectiveon = (select max(di2.effectiveon)
                        from datainfo di2
                        where di2.accountid = di.accountid and
                              di2.effectiveon < getdate()
                       );
You can also do this with window functions:
select di.*
from (select di.*,
             row_number() over (partition by accountid order by effectiveon desc) as seqnum
      from datainfo di
      where di.effectiveon < getdate()
     ) di
where seqnum = 1;
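Both queries implement the same rule: per account, ignore rows that are not yet in effect and keep the latest of the rest. A minimal Python sketch of that rule, run against the question's sample data, looks like this. The function name current_rows is made up, and the sketch treats a row dated today as already in effect, which matches the SQL for a DATE column except at exactly midnight.

```python
from datetime import date


def current_rows(rows, today):
    """rows: iterable of (id, account_id, effective_on).

    Keep, per account, the row with the latest effective_on that is not
    after `today`.
    """
    best = {}
    for row in rows:
        _id, account_id, effective_on = row
        if effective_on > today:
            continue  # not yet in effect
        if account_id not in best or effective_on > best[account_id][2]:
            best[account_id] = row
    return sorted(best.values())


# Sample data from the question: 4 effective dates for each of 5 accounts
rows = [
    (batch * 5 + acct, acct, date(year, month, acct))
    for batch, (year, month) in enumerate([(2020, 1), (2020, 5), (2020, 10), (2021, 2)])
    for acct in range(1, 6)
]
current = current_rows(rows, date(2020, 11, 19))
print([r[0] for r in current])
```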

COUNT() with GROUP BY - Query to get number of times a set of values appear in a column SQL

Just struggling with this question.
Write a query that selects the following details about the sides that have been ordered in orders:
• The side ID number and side name.
• How many orders the side has been ordered in (regardless of quantity).
I have created a view already and this is the table for it (ordered_sides_details is the name of the view)
View Table
I've written this query, but I believe it just counts the number of rows instead of how many orders each side appears in.
SELECT ordered_sides_details.side_name, COUNT(*)
FROM ordered_sides_details
GROUP BY ordered_sides_details.side_name;
This is the resulting table
Obviously it's incorrect, as 1.25L Coke has only been in 1 order.
Any help with solving this would be awesome. Thanks.
Solution
There must be something wrong with the view you've created.
This should be enough to yield proper results:
SELECT
side_id
,side_name
,COUNT(*) AS total_count
FROM dbo.orders
GROUP BY side_id, side_name
Bootstrapping (SQL Server)
Scripts for bootstrapping your example:
IF NOT EXISTS (SELECT 1 FROM sys.tables t WHERE t.object_id = OBJECT_ID('dbo.orders'))
BEGIN
CREATE TABLE orders
(
order_id INT,
side_id INT NOT NULL,
side_name NVARCHAR(100) NOT NULL,
ordered_quantity INT NOT NULL,
total_cost MONEY NOT NULL
);
END;
INSERT INTO orders (order_id, side_id, side_name, ordered_quantity, total_cost)
VALUES
(10, 1, '390ml Coke', 1, 3.00),
(5, 2, '1.25l Coke', 2, 10.00),
(8, 3, 'Lava Cake', 3, 8.85),
(7, 4, 'Chicken Wings', 4, 14.00),
(6, 5, 'Garlic Bread', 4, 7.80),
(5, 6, 'Healthy Kale Chips', 3, 16.50),
(5, 6, 'Healthy Kale Chips', 2, 11.00),
(4, 5, 'Garlic Bread', 1, 1.95),
(3, 4, 'Chicken Wings', 1, 3.50),
(2, 3, 'Lava Cake', 2, 5.90);
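One caveat, offered as a hedged aside rather than a correction: COUNT(*) counts rows, so if a side appears on two rows of the same order (as Healthy Kale Chips does for order 5 in the bootstrap data), it is counted twice. COUNT(DISTINCT order_id) counts orders instead. The sketch below compares the two using Python's sqlite3 module as a stand-in for SQL Server, so the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INT, side_id INT, side_name TEXT,"
    " ordered_quantity INT, total_cost REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, "390ml Coke", 1, 3.00),
        (5, 2, "1.25l Coke", 2, 10.00),
        (8, 3, "Lava Cake", 3, 8.85),
        (7, 4, "Chicken Wings", 4, 14.00),
        (6, 5, "Garlic Bread", 4, 7.80),
        (5, 6, "Healthy Kale Chips", 3, 16.50),
        (5, 6, "Healthy Kale Chips", 2, 11.00),
        (4, 5, "Garlic Bread", 1, 1.95),
        (3, 4, "Chicken Wings", 1, 3.50),
        (2, 3, "Lava Cake", 2, 5.90),
    ],
)

# row_count is what COUNT(*) reports; order_count ignores repeat rows
# for the same order
counts = conn.execute(
    """
    SELECT side_id, side_name,
           COUNT(*) AS row_count,
           COUNT(DISTINCT order_id) AS order_count
    FROM orders
    GROUP BY side_id, side_name
    ORDER BY side_id
    """
).fetchall()

for row in counts:
    print(row)
```

The two counts differ only for Healthy Kale Chips, which has two rows but one order.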

How do you join tables sharing the same column?

I made an SQL Fiddle and what I would like to do is join these two queries by using the departmentid.
What I would like to show is the departmentname and not_approved_manager.
Would it be best to use a union or join in this case?
Tables
create table cserepux
(
status int,
comment varchar(25),
departmentid int,
approveddate datetime
);
insert into cserepux (status, comment, departmentid, approveddate)
values (1, 'testing1', 1, NULL), (1, 'testing2', 1, NULL),
(1, 'testing2', 2, NULL), (0, 'testing2', 1, NULL),
(0, 'tesitng2', 1, NULL), (0, 'testing2', 1, NULL),
(0, 'tesitng2', 1, NULL), (0, 'testing3', 2, NULL),
(0, 'testing3', 3, NULL);
create table cseDept
(
departmentid int,
department_name varchar(25)
);
insert into cseDept (departmentid,department_name)
values (1, 'department one'), (2, 'department two'),
(3, 'department three'), (4, 'department four');
Query
select
departmentid,
COUNT(*) AS 'not_approved_manager'
from
cserepux
where
approveddate is null
group by
departmentid
SELECT * FROM cseDept
You need to do a join. A union will not get you what you want.
select d.department_name, COUNT(*) AS 'not_approved_manager'
from cserepux c
inner join cseDept d on c.departmentid = d.departmentid
where approveddate is null
group by d.department_name
You just need a join and a correct group by:
select dep.department_name, COUNT(*) AS 'not_approved_manager'
from cseDept dep
join cserepux cs on cs.departmentid = dep.departmentid
where approveddate is null
group by dep.department_name
Fiddle: http://sqlfiddle.com/#!3/5cf4e/30
Since joins and group by are really basic things in SQL, I suggest you take a look at some tutorials to get a bit more proficient with them. You can try the SQL Server Central Stairway article series.
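Note that an inner join leaves out departments that have no matching rows, such as department four in the sample data. If those should appear with a count of 0, a LEFT JOIN variant is one option. The sketch below is not either answerer's query; it runs that variant on the question's data using Python's sqlite3 module as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cserepux (status INT, comment TEXT, departmentid INT, approveddate TEXT);
    CREATE TABLE cseDept (departmentid INT, department_name TEXT);
    INSERT INTO cserepux (status, comment, departmentid, approveddate) VALUES
        (1, 'testing1', 1, NULL), (1, 'testing2', 1, NULL),
        (1, 'testing2', 2, NULL), (0, 'testing2', 1, NULL),
        (0, 'tesitng2', 1, NULL), (0, 'testing2', 1, NULL),
        (0, 'tesitng2', 1, NULL), (0, 'testing3', 2, NULL),
        (0, 'testing3', 3, NULL);
    INSERT INTO cseDept (departmentid, department_name) VALUES
        (1, 'department one'), (2, 'department two'),
        (3, 'department three'), (4, 'department four');
    """
)

# The NULL filter lives in the ON clause so unmatched departments survive;
# COUNT(c.departmentid) skips the NULLs produced by the outer join.
rows = conn.execute(
    """
    SELECT d.department_name, COUNT(c.departmentid) AS not_approved_manager
    FROM cseDept d
    LEFT JOIN cserepux c
           ON c.departmentid = d.departmentid AND c.approveddate IS NULL
    GROUP BY d.departmentid, d.department_name
    ORDER BY d.departmentid
    """
).fetchall()

for row in rows:
    print(row)
```

Putting the approveddate filter in a WHERE clause instead would turn the LEFT JOIN back into an inner join in effect, because the unmatched rows would be filtered out.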

How to group rows by their DATEDIFF?

I hope you can help me.
I need to display the records in the HH_Solution_Audit table where 2 or more staff members enter the room within 10 minutes of each other. Here are the requirements:
1. Display only the events that have a timestamp (LAST_UPDATED) interval of less than or equal to 10 minutes. Therefore, I must compare the current row to the next row and the previous row to check whether their DATEDIFF is less than or equal to 10 minutes. I'm done with this part.
2. Show only the records where the number of distinct STAFF_GUID values entering the room within that 10-minute interval is at least 2.
HH_Solution_Audit Table Details:
ID - PK
STAFF_GUID - staff id
LAST_UPDATED - datetime when a staff enters a room
Here's what I got so far. This satisfies requirement # 1 only.
CREATE TABLE HH_Solution_Audit (
ID INT PRIMARY KEY,
STAFF_GUID NVARCHAR(1),
LAST_UPDATED DATETIME
)
GO
INSERT INTO HH_Solution_Audit VALUES (1, 'b', '2013-04-25 9:01')
INSERT INTO HH_Solution_Audit VALUES (2, 'b', '2013-04-25 9:04')
INSERT INTO HH_Solution_Audit VALUES (3, 'b', '2013-04-25 9:13')
INSERT INTO HH_Solution_Audit VALUES (4, 'a', '2013-04-25 10:15')
INSERT INTO HH_Solution_Audit VALUES (5, 'a', '2013-04-25 10:30')
INSERT INTO HH_Solution_Audit VALUES (6, 'a', '2013-04-25 10:33')
INSERT INTO HH_Solution_Audit VALUES (7, 'a', '2013-04-25 10:41')
INSERT INTO HH_Solution_Audit VALUES (8, 'a', '2013-04-25 11:02')
INSERT INTO HH_Solution_Audit VALUES (9, 'a', '2013-04-25 11:30')
INSERT INTO HH_Solution_Audit VALUES (10, 'a', '2013-04-25 11:45')
INSERT INTO HH_Solution_Audit VALUES (11, 'a', '2013-04-25 11:46')
INSERT INTO HH_Solution_Audit VALUES (12, 'a', '2013-04-25 11:51')
INSERT INTO HH_Solution_Audit VALUES (13, 'a', '2013-04-25 12:24')
INSERT INTO HH_Solution_Audit VALUES (14, 'b', '2013-04-25 12:27')
INSERT INTO HH_Solution_Audit VALUES (15, 'b', '2013-04-25 13:35')
DECLARE @numOfPeople INT = 2,
        --minimum number of people that must be inside
        --the room for @lengthOfStay minutes
        @lengthOfStay INT = 10,
        --number of minutes of stay
        @dateFrom DATETIME = '04/25/2013 00:00',
        @dateTo DATETIME = '04/25/2013 23:59';
WITH cteSource AS
(
    SELECT ID, STAFF_GUID, LAST_UPDATED,
           ROW_NUMBER() OVER (ORDER BY LAST_UPDATED) AS row_num
    FROM HH_SOLUTION_AUDIT
    WHERE LAST_UPDATED >= @dateFrom AND LAST_UPDATED <= @dateTo
)
SELECT [current].ID, [current].STAFF_GUID, [current].LAST_UPDATED
FROM
    cteSource AS [current]
    LEFT OUTER JOIN
    cteSource AS [previous] ON [current].row_num = [previous].row_num + 1
    LEFT OUTER JOIN
    cteSource AS [next] ON [current].row_num = [next].row_num - 1
WHERE
    DATEDIFF(MINUTE, [previous].LAST_UPDATED, [current].LAST_UPDATED)
        <= @lengthOfStay
    OR
    DATEDIFF(MINUTE, [current].LAST_UPDATED, [next].LAST_UPDATED)
        <= @lengthOfStay
ORDER BY [current].ID, [current].LAST_UPDATED
Running the query returns IDs:
1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14
That satisfies requirement # 1 of having less than or equal to 10 minutes interval between the previous row, current row and next row.
Can you help me with the 2nd requirement? If it's applied, the returned IDs should only be:
13, 14
Here's an idea. You don't need ROW_NUMBER or the previous and next records. You just need two queries unioned: one looking for everyone who has someone checked in up to X minutes before them, and another looking up to X minutes after. Each uses a correlated subquery and a COUNT to find the number of matching people. If that number is high enough to reach your @numOfPeople, that's it.
EDIT: new version: Instead of doing two queries that look 10 minutes ahead and behind, we only check 10 minutes behind, selecting the rows that match in cteLastOnes. After that, another part of the query searches for the rows that actually fall within those 10 minutes. Finally, we again union them with the 'last ones'.
WITH cteSource AS
(
    SELECT ID, STAFF_GUID, LAST_UPDATED
    FROM HH_SOLUTION_AUDIT
    WHERE LAST_UPDATED >= @dateFrom AND LAST_UPDATED <= @dateTo
)
,cteLastOnes AS
(
    SELECT * FROM cteSource c1
    WHERE @numOfPeople - 1 <= (SELECT COUNT(DISTINCT STAFF_GUID)
                               FROM cteSource c2
                               WHERE DATEADD(MI, @lengthOfStay, c2.LAST_UPDATED) > c1.LAST_UPDATED
                                 AND c2.LAST_UPDATED <= c1.LAST_UPDATED
                                 AND c1.STAFF_GUID <> c2.STAFF_GUID)
)
SELECT * FROM cteLastOnes
UNION
SELECT * FROM cteSource s
WHERE EXISTS (SELECT * FROM cteLastOnes l
              WHERE DATEADD(MI, @lengthOfStay, s.LAST_UPDATED) > l.LAST_UPDATED
                AND s.LAST_UPDATED <= l.LAST_UPDATED
                AND s.STAFF_GUID <> l.STAFF_GUID)
SQLFiddle DEMO - new version
SQLFiddle DEMO - old version
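For readers who want to trace the two-step logic of the new version without a SQL Server instance, here is a rough Python sketch that mirrors it on the question's sample data. The function name shared_entries and the tuple layout are made up for illustration; the conditions follow the cteLastOnes filter and the EXISTS clause.

```python
from datetime import datetime, timedelta


def shared_entries(events, num_people=2, length_of_stay=10):
    """events: list of (id, staff, entered_at). Returns the sorted ids of
    entries that share a `length_of_stay`-minute window with enough other
    staff, following the two steps of the query."""
    window = timedelta(minutes=length_of_stay)

    # Step 1 (cteLastOnes): entries preceded, within the window, by at least
    # num_people - 1 distinct OTHER staff members.
    last_ones = [
        (i, staff, t)
        for i, staff, t in events
        if len({s2 for _, s2, t2 in events
                if s2 != staff and t - window < t2 <= t}) >= num_people - 1
    ]

    # Step 2: add entries by other staff that fall inside a last one's window.
    ids = {i for i, _, _ in last_ones}
    for i, staff, t in events:
        if any(staff != ls and t + window > lt and t <= lt
               for _, ls, lt in last_ones):
            ids.add(i)
    return sorted(ids)


raw = [
    (1, "b", "09:01"), (2, "b", "09:04"), (3, "b", "09:13"),
    (4, "a", "10:15"), (5, "a", "10:30"), (6, "a", "10:33"),
    (7, "a", "10:41"), (8, "a", "11:02"), (9, "a", "11:30"),
    (10, "a", "11:45"), (11, "a", "11:46"), (12, "a", "11:51"),
    (13, "a", "12:24"), (14, "b", "12:27"), (15, "b", "13:35"),
]
events = [(i, s, datetime.fromisoformat(f"2013-04-25 {hm}")) for i, s, hm in raw]
matches = shared_entries(events)
print(matches)
```

On this data only ID 14 qualifies in step 1 (staff a entered three minutes earlier), and step 2 then pulls in ID 13.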