SQL Server Group By UserID and Pivot By DateTime - sql-server-2012

I've looked through many similar questions but haven't yet come across one that matches my requirement.
I need to report when users are logging on & off. This is the data in the table. UserID is unique.
Logged        UserID  Status
202103010657  Peter   On
202103010710  Peter   Off
202103011856  Corey   On
202103011904  Corey   Off
202103011206  Peter   On
202103011211  Peter   Off
I need to "pivot" it to be:
User   Logged On     Logged Off
Peter  202103010657  202103010710
Corey  202103011856  202103011904
Peter  202103011206  202103011211

I think you need LEAD/LAG here:
SELECT
    UserID AS [User],
    Logged AS [Logged On],
    NextLogged AS [Logged Off]
FROM (
    SELECT *,
        LEAD(CASE WHEN Status = 'Off' THEN Logged END) OVER
            (PARTITION BY UserID ORDER BY Logged) AS NextLogged
    FROM yourTable
) t
WHERE t.Status = 'On'
Admittedly, this will ignore cases where there are two Off statuses consecutively.
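To try the LEAD idea without a SQL Server instance, here is a minimal sketch that runs the same query shape against the question's sample rows in an in-memory SQLite database via Python. SQLite is only a stand-in (it needs version 3.25 or later for window functions), the table name LogEvents is made up for the demo, and the column aliases are simplified because the bracket quoting is SQL Server specific.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LogEvents (Logged TEXT, UserID TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO LogEvents VALUES (?, ?, ?)",
    [
        ("202103010657", "Peter", "On"),
        ("202103010710", "Peter", "Off"),
        ("202103011856", "Corey", "On"),
        ("202103011904", "Corey", "Off"),
        ("202103011206", "Peter", "On"),
        ("202103011211", "Peter", "Off"),
    ],
)

# For each row, look at the next row of the same user (by timestamp) and keep
# its timestamp only if that next row is an 'Off' event. Then keep 'On' rows.
pairs = conn.execute(
    """
    SELECT UserID, Logged AS LoggedOn, NextLogged AS LoggedOff
    FROM (
        SELECT *,
               LEAD(CASE WHEN Status = 'Off' THEN Logged END)
                   OVER (PARTITION BY UserID ORDER BY Logged) AS NextLogged
        FROM LogEvents
    ) AS t
    WHERE t.Status = 'On'
    ORDER BY Logged
    """
).fetchall()

for row in pairs:
    print(row)
```

The ORDER BY is added only to make the output order deterministic. An On row whose next row for that user is not an Off row gets NULL as its logged-off time.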

Related

How do you merge duplicate rows in a table in BigQuery - replacing missing values with most recent records

For example, I have a table of leads from a marketing database. There are multiple records with duplicate email values. I'd like to merge all of the duplicate records to roll up into the latest updated record and if the latest updated record is missing values for certain fields then update those fields from other records most recently updated.
Table:
First  last  Email                 Phone       Job Title  State  Last Updated
John   Doe   john.doe@example.com                         MD     1/1/2019
John   low   john.doe@example.com  1234567891  Coach      VA     1/1/2018
John   Doe   john.doe@example.com  3214569875  Teacher    CA     1/1/2017
Andy   Yes   john.doe@example.com                         DC     1/1/2021
Roby   Doe   john.doe@example.com  8628423578  Scientist  VA     1/1/2025
Output - One record:
First  last  Email                 Phone       Job Title  State  Last Updated
Andy   Yes   john.doe@example.com  1234567891  Coach      DC     1/1/2021
In this example, since the 2021 record is missing a phone number and job title, those values are pulled from the most recent updated records (2018).
I've thought about using Distinct or Unique functions but not sure how to execute on the merge using the last updated record and then filling in the blank values with the other most recent records. Any help would be greatly appreciated!!
Thank you in advance.
Best,
Dawit
Consider the approach below. I think it is the most generic: you just need to make sure you have the correct list of fields in the unpivot and pivot lines. It does assume that the fields First, Last, Phone, Job_Title, and State are all of string data type.
select First, Last, Email, Phone, Job_Title, State, max_Last_Updated as Last_Updated
from (
    select * except(Last_Updated),
        max(Last_Updated) over(partition by Email) as max_Last_Updated
    from data
    unpivot (value for col in (First, Last, Phone, Job_Title, State))
    where true
    qualify row_number() over(partition by Email, col order by Last_Updated desc) = 1
)
pivot (max(value) for col in ('First', 'Last', 'Phone', 'Job_Title', 'State', 'Last_Updated'))
If applied to the sample data in your question (excluding the 2025 row), the output should be the single merged record you listed as the expected result.
You need a method to know that these are all the same record. You can use last_value(ignore nulls) for this purpose:
select t.*,
    last_value(first ignore nulls) over (partition by email order by last_updated) as imputed_first,
    last_value(last ignore nulls) over (partition by email order by last_updated) as imputed_last,
    . . . -- and so on for the other columns
from t;
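The rule both answers aim for (per email and per column, take the value from the most recently updated row that actually has one) can be sketched in plain Python to check the expected result. This is only an illustration of the logic, not BigQuery code: the dictionary keys mirror the column names used above, None stands for a missing value, and the 2025 row is left out as in the first answer.

```python
from datetime import date

EMAIL = "john.doe@example.com"

# Sample rows from the question, minus the 2025 row. None marks a blank cell.
records = [
    {"First": "John", "Last": "Doe", "Email": EMAIL, "Phone": None,
     "Job_Title": None, "State": "MD", "Last_Updated": date(2019, 1, 1)},
    {"First": "John", "Last": "low", "Email": EMAIL, "Phone": "1234567891",
     "Job_Title": "Coach", "State": "VA", "Last_Updated": date(2018, 1, 1)},
    {"First": "John", "Last": "Doe", "Email": EMAIL, "Phone": "3214569875",
     "Job_Title": "Teacher", "State": "CA", "Last_Updated": date(2017, 1, 1)},
    {"First": "Andy", "Last": "Yes", "Email": EMAIL, "Phone": None,
     "Job_Title": None, "State": "DC", "Last_Updated": date(2021, 1, 1)},
]


def merge_by_email(rows):
    """Return one merged record per email, filling each field from the most
    recently updated row that has a non-missing value for it."""
    merged = {}
    # Walk rows newest first so the first value seen for a field wins.
    for row in sorted(rows, key=lambda r: r["Last_Updated"], reverse=True):
        target = merged.setdefault(row["Email"], {})
        for field, value in row.items():
            if value is not None and field not in target:
                target[field] = value
    return list(merged.values())


result = merge_by_email(records)
print(result)
```

On this data the merged record takes First, Last, State, and Last_Updated from the 2021 row and Phone and Job_Title from the 2018 row, since the 2019 row has neither.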

How to select a foreign key after narrowing down via Group By and Having in a subquery

I've got a unique problem. I'm querying a replicated database table cost_plan_breakdown, and the replication is known to have some duplicates due to issues with deleting records. I'm not the Admin so I'm trying to sidestep these duplicates as efficiently as possible. The table looks like this:
sys_id  sys_created_on       cost_plan                       breakdown_start_date
axr123  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       10-01-2020
pqo100  2020-12-23 05:50:20  Outlook KTLO - Lisa Lymon       10-01-2020
cji985  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       11-01-2020
twg795  2020-10-05 13:23:08  DataPyramid CTB - Dave Dods     10-01-2020
jqr820  2020-09-28 16:11:54  Revoluccion CTB - Marcus Vance  11-01-2020
vjo150  2021-01-13 11:10:09  Server KTLO - Tom Smith         10-01-2020
Cost Plans typically have between 1 and 12 breakdowns during their lifespan, but there should only be one breakdown per cost plan per month. Notice that the Outlook Cost Plan has two breakdowns within the same month (October) with differing sys_id and sys_created_on.
So by using a smaller subquery in the where clause, I'm trying to determine the following:
"Group the rows with identical month and year of breakdown_start_date, and identical cost_plan. Of the remaining rows, select the one with the MAX sys_created_on. Take the sys_id of that row and feed it to the parent query to only include these rows."
...rest of query above
WHERE cpb.breakdown_type = 'requirement'
    AND cpb.sys_id IN
        (SELECT cpb2.sys_id
         FROM cost_plan_breakdown cpb2
         GROUP BY cpb2.name,
                  YEAR(cpb2.start_date_time),
                  MONTH(cpb2.start_date_time)
         HAVING MAX(cpb2.sys_created_on))
At this point, I'm running into the error
cpb2.sys_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I've previously semi-solved this by putting the MAX sys_created_on in the SELECT statement, and matching off that, but I realized that could pull in unwanted dupe records just because they match the sys_created_on of another.
I feel like the solution may be staring me in the face, but I'm stuck. Appreciate your help!
Use row_number to number the duplicate rows and then exclude them. Ordering the row number by sys_created_on desc ensures you get the latest of each per month.
declare @Test table (sys_id varchar(6), sys_created_on datetime2(0), cost_plan varchar(32), breakdown_start_date date);
insert into @Test (sys_id, sys_created_on, cost_plan, breakdown_start_date)
values
    ('axr123', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
    ('pqo100', '2020-12-23 05:50:20', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
    ('cji985', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '11-01-2020'),
    ('twg795', '2020-10-05 13:23:08', 'DataPyramid CTB - Dave Dods', '10-01-2020'),
    ('jqr820', '2020-09-28 16:11:54', 'Revoluccion CTB - Marcus Vance', '11-01-2020'),
    ('vjo150', '2021-01-13 11:10:09', 'Server KTLO - Tom Smith', '10-01-2020');
with cte as (
    select *
        , row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
    from @Test
)
select *
from cte
where rn = 1;
As per your comments, this (the CTE) is just a neat way to write a sub-query/derived table and can still be written as follows:
select *
from (
    select *
        , row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
    from @Test
) cte
where rn = 1;
Note: If you provide DDL+DML as shown above you make it much easier for people to assist.
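For anyone who wants to check which rows survive without SQL Server at hand, here is a rough sketch that runs the same row_number pattern in an in-memory SQLite database from Python. It rests on a few assumptions: SQLite 3.25 or later, dates rewritten as ISO strings, and strftime('%Y-%m', ...) standing in for the two datepart calls.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cost_plan_breakdown ("
    "sys_id TEXT, sys_created_on TEXT, cost_plan TEXT, breakdown_start_date TEXT)"
)
conn.executemany(
    "INSERT INTO cost_plan_breakdown VALUES (?, ?, ?, ?)",
    [
        ("axr123", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
        ("pqo100", "2020-12-23 05:50:20", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
        ("cji985", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-11-01"),
        ("twg795", "2020-10-05 13:23:08", "DataPyramid CTB - Dave Dods", "2020-10-01"),
        ("jqr820", "2020-09-28 16:11:54", "Revoluccion CTB - Marcus Vance", "2020-11-01"),
        ("vjo150", "2021-01-13 11:10:09", "Server KTLO - Tom Smith", "2020-10-01"),
    ],
)

# Number the rows within each (cost plan, year-month) group, newest first,
# and keep only the first row of every group.
kept_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT sys_id
        FROM (
            SELECT sys_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY cost_plan,
                                    strftime('%Y-%m', breakdown_start_date)
                       ORDER BY sys_created_on DESC
                   ) AS rn
            FROM cost_plan_breakdown
        ) AS numbered
        WHERE rn = 1
        ORDER BY sys_id
        """
    )
]
print(kept_ids)
```

Of the two October breakdowns for the Outlook cost plan, only pqo100 (the later sys_created_on) is kept, and axr123 is dropped.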

Query only latest entry for each user

The goal of the following query is to check whether a user has been terminated in the system in time. There is one table that contains information about the system termination (how and when the user was terminated) and one with the user's termination date. Since there are three ways to terminate a user, some users have several termination entries. In the end, I only want to see the latest entry before their termination date, if they have been terminated at all. All date fields are INT fields.
Current Query:
Select
    B.TerminationApproach,
    B.SystemTerminationDate,
    A.UserName,
    A.LastName,
    A.FirstName,
    A.TerminationDate,
    Case
        When B.SystemTerminationDate <= A.TerminationDate Then 0
        Else 1
    End As EvalCheck
From A
Left Join B On B.User = A.UserName
Current Result:
TerminationApproach  SystemTerminationDate  TerminationDate  UserName  LastName  FirstName  EvalCheck
No profiles          20180301               20180226         AWALL     Wall      Aaron      1
Locally locked       20181027               20180226         AWALL     Wall      Aaron      1
Deleted              20180301               20180226         AWALL     Wall      Aaron      1
No profiles          20180301               20180301         CBLAIR    Blair     Carlos     0
Locally locked       20181027               20180301         CBLAIR    Blair     Carlos     1
No profiles          20180301               20180301         CBLAIR    Blair     Carlos     0
Then there is a third table which contains user activity. I need to map the results of my first query to the user activity to see whether the user has made changes in the system after his termination date. The third table looks like this:
UserID  Date      Activity
AWALL   20180227  Table Change
So with my example, the end result of my query should look like this:
TerminationApproach  SystemTerminationDate  TerminationDate  UserName  LastName  FirstName  EvalCheck  ActivityAfterTermination
No profiles          20180301               20180226         AWALL     Wall      Aaron      1          Yes
No profiles          20180301               20180301         CBLAIR    Blair     Carlos     0          No
Your question and your query are rather disconnected. It is not clear what columns represent a user or the date you care about.
But the basic idea is:
with t as (
    <your query here>
)
select t.*
from (select t.*,
          row_number() over (partition by UserId order by date desc) as seqnum
      from t
     ) t
where seqnum = 1;

SQL Server 2000: need to return record ID from a previous record in current query

I work on a help-desk and am doing some analysis of PC repair tickets.
I need to dump data from our call log system that returns the history of tickets for issues on computers that were recently repaired by another team. We are simply trying to improve QA on deployed machines, and this data will help.
I have the query for the analysis of tickets, but I also want to return the ticket number of the last PC repair case.
My current query is as follows:
SELECT
    CallLog.CallID,
    CallLog.CustID,
    Subset.Rep_num,
    Subset.FirstName,
    Subset.LastName,
    CallLog.OpndetailCat,
    CallLog.Tracker_Full,
    CallLog.RecvdDate
FROM
    heatPrd.dbo.CallLog CallLog,
    heatPrd.dbo.Subset Subset
WHERE
    CallLog.CallID = Subset.CallID AND
    CallLog.RecvdDate >= '2015-10-01' AND
    CallLog.OpnAreaCat = 'back from repair'
ORDER BY
    CallLog.CallID DESC
This returns
CallID   CustID  Rep_num  FirstName  LastName  OpndetailCat         Tracker_Full
2182375  1234             Sarah      Doe       Missing Email Folde
2181831  1235             JENNIFER   Doe       ZOTHER
2180815  1236    123      Jason      Smith     ZOTHER
2180790  1237    124      DARCY      Doe       Wrong Proxy Config
2180787  1239    125      Jason      Smith     ZOTHER
I want to add a column to the query that would return something to the effect of
select max(callid)
from calllog
where calltype = 'in_for_service_pc' and custid = '1234'
where calltype = 'in_for_service_pc' resides on the CallLog table and custID would pull from the query result.
This is a lot of info, so I hope my request is clear.
Disclaimer: Data resides in SQL Server 2000 so some of the newer commands may not work.
Something like this should be pretty close.
SELECT
    cl.CallID,
    cl.CustID,
    s.Rep_num,
    s.FirstName,
    s.LastName,
    cl.OpndetailCat,
    cl.Tracker_Full,
    cl.RecvdDate,
    x.MaxCallID
FROM heatPrd.dbo.CallLog cl
JOIN heatPrd.dbo.Subset s ON cl.CallID = s.CallID
LEFT JOIN
(
    SELECT MAX(cl2.callid) AS MaxCallID
        , cl2.custid
    FROM calllog cl2
    WHERE cl2.calltype = 'in_for_service_pc'
    GROUP BY cl2.custid
) x ON x.custid = cl.custid
WHERE cl.RecvdDate >= '2015-10-01' AND
    cl.OpnAreaCat = 'back from repair'
ORDER BY cl.CallID DESC
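To see what the derived-table join returns, here is a small sketch that runs a trimmed version of the query in an in-memory SQLite database from Python. Everything about the data is made up for illustration (the call IDs, dates, and the reduced column list), and SQLite is only a stand-in for SQL Server 2000. No window functions are involved, which matters on a server that old.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server 2000 (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CallLog (CallID INTEGER, CustID TEXT, CallType TEXT, "
    "OpnAreaCat TEXT, RecvdDate TEXT)"
)
conn.execute("CREATE TABLE Subset (CallID INTEGER, FirstName TEXT, LastName TEXT)")

# Hypothetical rows: customer 1234 has two earlier repair tickets, 1235 has none.
conn.executemany(
    "INSERT INTO CallLog VALUES (?, ?, ?, ?, ?)",
    [
        (100, "1234", "in_for_service_pc", "hardware", "2015-09-01"),
        (105, "1234", "in_for_service_pc", "hardware", "2015-09-20"),
        (110, "1234", "incident", "back from repair", "2015-10-05"),
        (120, "1235", "incident", "back from repair", "2015-10-07"),
    ],
)
conn.executemany(
    "INSERT INTO Subset VALUES (?, ?, ?)",
    [(110, "Sarah", "Doe"), (120, "JENNIFER", "Doe")],
)

# The derived table x holds one row per customer with that customer's highest
# repair ticket number. The LEFT JOIN keeps customers with no repair ticket.
rows = conn.execute(
    """
    SELECT cl.CallID, cl.CustID, s.FirstName, x.MaxCallID
    FROM CallLog cl
    JOIN Subset s ON cl.CallID = s.CallID
    LEFT JOIN (
        SELECT MAX(cl2.CallID) AS MaxCallID, cl2.CustID
        FROM CallLog cl2
        WHERE cl2.CallType = 'in_for_service_pc'
        GROUP BY cl2.CustID
    ) x ON x.CustID = cl.CustID
    WHERE cl.RecvdDate >= '2015-10-01'
      AND cl.OpnAreaCat = 'back from repair'
    ORDER BY cl.CallID DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Note that MaxCallID is the customer's highest repair ticket overall, not necessarily the last one before the current call. If that distinction matters, the derived table would need an extra condition tying it to the outer CallID.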

Logically merging 4 columns of the same information

I'm querying 3 different databases (4 total fields) for their "username" field given a particular machine name in our environment: SCCM, McAfee EPO, and ActiveDirectory.
The four columns are SCCM_TOP, SCCM_LAST, EPO, AD
Some of the tuples I get look like:
JOE, JOE, ADMINISTRATOR, JOE
or
JOE, SARAH, JOE, JOE
or
NULL, NULL, JOE, JOE
or
NULL, NULL, JOE, SARAH
The last example of which is the most difficult to code against.
I'm writing a CASE statement to help merge the information in an additive way to give one final column of the "best guess". At the moment, I'm weighing the most valid username based on another column, which is the "age of the record" from each database.
CASE
    WHEN ePO_Age <= CT_AGE AND NOT ePO_UN IS NULL THEN ePO_UN
    WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_LAST_UN) IS NULL THEN SCCM_LAST_UN
    WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_TOP_UN) IS NULL THEN SCCM_TOP_UN
    WHEN NOT (AD_UN) IS NULL THEN AD_UN
    ELSE NULL
END AS BestName,
But there has to be a better way to combine these records into one. My next step is to weigh the "average age" and then pick the username from there, discarding "Administrator".
Any thoughts or tricks?
You could benefit a little from the COALESCE function to get the first NON-NULL value and do something like:
COALESCE(CASE WHEN ePO_Age<=CT_AGE THEN ePO_UN END,
CASE WHEN SCCM_AGE IS NOT NULL THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END,
AD_UN) AS BestName
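As a quick sanity check, the COALESCE expression can be run against a few made-up rows in an in-memory SQLite database from Python. The table name MachineUsers, the Machine column, and all age values are hypothetical, and CT_AGE is simply treated as another numeric column because the question does not say what it holds.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MachineUsers (Machine TEXT, ePO_UN TEXT, ePO_Age INTEGER, "
    "CT_AGE INTEGER, SCCM_AGE INTEGER, SCCM_LAST_UN TEXT, SCCM_TOP_UN TEXT, "
    "AD_UN TEXT)"
)
conn.executemany(
    "INSERT INTO MachineUsers VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # ePO record too old, so the SCCM name is used.
        ("PC1", "ADMINISTRATOR", 50, 30, 5, "JOE", "JOE", "JOE"),
        # ePO record fresh enough, so it wins over AD.
        ("PC2", "JOE", 10, 30, None, None, None, "SARAH"),
        # No ePO or SCCM data at all, so AD is the fallback.
        ("PC3", None, None, 30, None, None, None, "SARAH"),
        # ePO name missing and SCCM_LAST_UN missing, so SCCM_TOP_UN is used.
        ("PC4", None, 5, 30, 7, None, "JOE", "SARAH"),
    ],
)

# Each CASE without an ELSE yields NULL when its condition fails, so COALESCE
# falls through to the next candidate.
best = dict(
    conn.execute(
        """
        SELECT Machine,
               COALESCE(CASE WHEN ePO_Age <= CT_AGE THEN ePO_UN END,
                        CASE WHEN SCCM_AGE IS NOT NULL
                             THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END,
                        AD_UN) AS BestName
        FROM MachineUsers
        """
    )
)
print(best)
```

The expression does not discard "Administrator" by itself. In the PC1 row it is skipped only because the ePO record is older than CT_AGE.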
If you just want to get the most recent UserName that isn't null, try using UNION to combine the results from each table.
SELECT TOP 1 qry.UserName
FROM (
    SELECT UserName, CreateDate
    FROM UserNames_1
    UNION ALL
    SELECT UserName, CreateDate
    FROM UserNames_2
    UNION ALL
    SELECT UserName, CreateDate
    FROM UserNames_3
) AS qry
WHERE qry.UserName IS NOT NULL
ORDER BY qry.CreateDate DESC
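Here is a rough, runnable version of the UNION ALL idea using an in-memory SQLite database from Python. The three tables follow the names in the query above, the rows are invented, and LIMIT 1 replaces TOP 1 because SQLite does not support TOP.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
for name in ("UserNames_1", "UserNames_2", "UserNames_3"):
    conn.execute(f"CREATE TABLE {name} (UserName TEXT, CreateDate TEXT)")

# Hypothetical rows: the newest record has no username and must be skipped.
conn.execute("INSERT INTO UserNames_1 VALUES ('SARAH', '2021-01-10')")
conn.execute("INSERT INTO UserNames_2 VALUES (NULL, '2021-03-01')")
conn.execute("INSERT INTO UserNames_3 VALUES ('JOE', '2021-02-15')")

# Stack the three sources, drop rows without a name, keep the newest one.
best = conn.execute(
    """
    SELECT qry.UserName
    FROM (
        SELECT UserName, CreateDate FROM UserNames_1
        UNION ALL
        SELECT UserName, CreateDate FROM UserNames_2
        UNION ALL
        SELECT UserName, CreateDate FROM UserNames_3
    ) AS qry
    WHERE qry.UserName IS NOT NULL
    ORDER BY qry.CreateDate DESC
    LIMIT 1
    """
).fetchone()[0]
print(best)
```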