Updating rows in a view from a complex select statement - sql

I have been working on a query that identifies an issue with the data in my database:
SELECT t1.*
FROM [DailyTaskHours] t1
INNER JOIN (
    SELECT ActivityDate
        ,taskId
        ,EnteredBy
    FROM [DailyTaskHours]
    WHERE hours != 0
    GROUP BY EnteredBy
        ,taskId
        ,ActivityDate
    HAVING COUNT(*) > 1
) t2 ON (
    t1.ActivityDate = t2.ActivityDate
    AND t1.taskId = t2.taskId
    AND t1.EnteredBy = t2.EnteredBy
    AND t1.Hours != 0
)
ORDER BY ActivityDate
What this does is find duplicate hours booked for the same person on the same task on the same day:
Now that I have found the issues, I want to correct them with an UPDATE. For the duplicate activity that was created earlier than the other, I want to move the value from Hours to doubleBookedHours and zero out Hours. Secondly, I want the more recent row's DoubleBookedFlag column to be updated to 1.
How can I achieve this?

You can write a SQL Server Agent job that calls T-SQL or an SSIS package to perform your logic.
I always like using pseudo code when designing an algorithm. For instance:
1. Find duplicate entries and save them to a temporary table, either in a staging area or tempdb - some location that is accessible by multiple processes (spids).
2. Find the least recent records (1+). Move hours to the double booked column?
3. Find the least recent records (1+). Zero out the hours column.
4. Update the most recent record to have the double booked flag column set to 1.
You were not specific about moving the value from hours to double booked hours. Are these both columns?
In short, a SQL Server Agent job and several correct T-SQL steps should solve your problem.
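A rough T-SQL sketch of those steps could look like the following. The key column Id, the creation timestamp CreatedDate, and the DoubleBookedHours / DoubleBookedFlag column names are assumptions based on the question, so adjust them to the real DailyTaskHours schema.
-- Sketch only: Id (primary key) and CreatedDate (row creation time) are assumed columns.

-- 1. Save the duplicate entries to a temp table, ranked oldest-to-newest per group.
SELECT Id, Hours,
       ROW_NUMBER() OVER (PARTITION BY EnteredBy, taskId, ActivityDate
                          ORDER BY CreatedDate) AS rn,
       COUNT(*)     OVER (PARTITION BY EnteredBy, taskId, ActivityDate) AS cnt
INTO #Dupes
FROM DailyTaskHours
WHERE Hours != 0;

DELETE FROM #Dupes WHERE cnt < 2;   -- keep only genuine duplicates

-- 2. and 3. Least recent record(s): move Hours into DoubleBookedHours and zero them out.
UPDATE d
SET d.DoubleBookedHours = d.Hours,
    d.Hours = 0
FROM DailyTaskHours d
JOIN #Dupes x ON x.Id = d.Id
WHERE x.rn < x.cnt;

-- 4. Most recent record: set the double booked flag.
UPDATE d
SET d.DoubleBookedFlag = 1
FROM DailyTaskHours d
JOIN #Dupes x ON x.Id = d.Id
WHERE x.rn = x.cnt;

DROP TABLE #Dupes;
Each statement can then become a step in the Agent job, or the whole script can be wrapped in a stored procedure that the job calls.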

Related

BigQuery Join Using Most Recent Row

I have seen variations of this question, but I have been searching Stack Overflow for almost a week now trying various solutions and am still struggling with this. Really appreciate you taking the time to consider my question.
I am working on a research project in GCP using BigQuery. I have a table, result, of ~100 million rows of events, where a session_id column identifies the session that the event originated from. I would like to join this with another table, status, of about 40 million rows that has that same session_id and tracks the status of those sessions. Both tables have a time column. In the result table, this is the time of the event. In the status table, this is the time of any status change. I want to join each row in the result table with the row in the status table that holds the most recent state of the session up to or before the time of the event, matching on the session ID. The result would be that each row in the result table would have the corresponding information about the state of the session when the event occurred.
How can I achieve this? Any way to do it that won't be really inefficient? Thank you so much for your help!
You may be able to use a left join:
select r.*, s.status  -- choose whatever columns you want
from result r left join
     (select s.*,
             lead(time) over (partition by session_id order by time) as next_time
      from status s
     ) s
     on r.session_id = s.session_id and
        r.time >= s.time and
        (r.time < s.next_time or s.next_time is null)
The lead() gives each status row a validity window from its time up to next_time, so the join picks the status change that was in effect at the moment of the event.

Query build to find records where all of a series of records have a value

Let me explain a little bit about what I am trying to do, because I don't even know the vocabulary to use to ask. I have an Access 2016 database that records staff QA data. When a staff member misses a QA, we assign a job aid that explains the process, and they can optionally send back a worksheet showing they learned about what was missed. If they do all of these in a 3 month period they get a credit on their QA score. So I have a series of records, all of which have a date we assigned the work (RA1) and MAY have a work returned date (RC1).
In the below image "lavalleer" has earned the credit because both of her sheets got returned. "maduncn" did not earn the credit because he didn't do one.
I want to create a query that returns to me only the people that are like "lavalleer". I tried hitting Google and searched here and access.programmers.co.uk, but I'm only coming up with instructions to use NOT NULL statements. That wouldn't work for me because if I did an IS NOT NULL on "maduncn" I would get the 4 records, but it would just exclude the null one.
What I need to do is build a query where I can see staff that have dates in ALL of their RC1 fields. If any of their RC1 fields are blank, I don't want them returned.
Consider:
SELECT * FROM tablename WHERE NOT UserLogin IN (SELECT UserLogin FROM tablename WHERE RC1 IS NULL);
You could use a not exists clause with a correlated subquery, e.g.
select t.* from YourTable t where not exists
(select 1 from YourTable u where t.userlogin = u.userlogin and u.rc1 is null)
Here, select 1 is used purely for optimisation - we don't care what the query returns, just that it has records (or doesn't have records).
Or, you could use a left join to exclude those users for which there is a null rc1 record, e.g.:
select t.* from YourTable t left join
(select u.userlogin from YourTable u where u.rc1 is null) v on t.userlogin = v.userlogin
where v.userlogin is null
In all of the above, change all occurrences of YourTable to the name of your table.

Select latest and earliest times within a time group and a pivot statement

I have attendance data that contains a username, time, and status (IN or OUT). I want to show attendance data that contains a name and the check in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple data entries, in different seconds, for the same login attempt. This is because I get the data from a fingerprint attendance scanner, and the machine in some cases makes multiple entries, sometimes just within 5-10 seconds. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?
First, you need to normalize your data by removing the duplicate entries. In your situation, that's a challenge because the duplicated data isn't easily identified as a duplicate. You can make some assumptions though. Below, I assume that no one will make multiple login attempts in a two minute window. You can do this by first using a Common Table Expression (CTE, using the WITH clause).
Within the CTE, you can use the LAG function. Essentially what this code is saying is "for each partition of user and entry type, if the previous value was within 2 minutes of this value, then put a number, otherwise put null." I chose null as the flag that will keep the value because LAG of the first entry is going to be null. So, your CTE will just return a table of entry events (ID) that were distinct attempts.
Now, you prepare another CTE for the PIVOT to pull from; it has everything from your table, but only for the entry IDs you care about. The PIVOT is going to look over the MIN/MAX of your IN/OUT times.
WITH UNIQUE_LOGINS AS (
    SELECT ID FROM (
        SELECT ID,
               CASE WHEN LAG(TIME) OVER (PARTITION BY USERNAME, STATUS ORDER BY TIME)
                         >= TIME - (2/60/24)
                    THEN 1 END AS DUP_FLAG        -- 1 = within 2 minutes of the previous entry
        FROM LOGIN_TABLE)
    WHERE DUP_FLAG IS NULL                        -- NULL = first entry or a distinct attempt
),
TEMP_FOR_PIVOT AS (
    SELECT USERNAME, TIME, STATUS FROM LOGIN_TABLE
    WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT * FROM TEMP_FOR_PIVOT
PIVOT (
    MIN(TIME) AS FIRST, MAX(TIME) AS LAST
    FOR STATUS IN ('IN' AS TIME_IN, 'OUT' AS TIME_OUT)
)
From there, if you need to rearrange or rename your columns, then you can just put that last SELECT into yet another CTE and then select your values from it. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle
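For instance, a hypothetical continuation of the query above: this fragment replaces its final SELECT, and the TIME_IN_FIRST / TIME_OUT_LAST column names depend on the aliases chosen in the PIVOT (Oracle builds them from the IN-list alias plus the aggregate alias).
, PIVOTED AS (
    SELECT * FROM TEMP_FOR_PIVOT
    PIVOT (
        MIN(TIME) AS FIRST, MAX(TIME) AS LAST
        FOR STATUS IN ('IN' AS TIME_IN, 'OUT' AS TIME_OUT)
    )
)
SELECT USERNAME,
       TIME_IN_FIRST AS CHECK_IN,
       TIME_OUT_LAST AS CHECK_OUT
FROM PIVOTED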

Compare 2 tables and add missing records to the first, taking into account year/months

I have 2 tables: one with codes and budgets, called FACT_QUANTITY_TMP, and the other a tree with all possible codes, called C_DS_BD_AP_A.
All codes that exist are in the C_DS_BD_AP_A table, yet not all of them are in FACT_QUANTITY_TMP; only those with a budget get added by the ERP.
We need all codes to be in the FACT_QUANTITY_TMP table, just with the budget set to 0 in that case.
I first tried to get the missing codes with the following query:
SELECT T2.D_ACTIECODE
FROM
    (SELECT DISTINCT A.FULL_DATE AS FULL_DATE, A.DIM03 AS DIM03
     FROM FACT_QUANTITY_TMP A) T1
RIGHT JOIN
    (SELECT DISTINCT B.D_ACTIECODE AS D_ACTIECODE
     FROM C_DS_BD_AP_A B) T2
ON T1.DIM03 = T2.D_ACTIECODE
WHERE T1.DIM03 IS NULL
ORDER BY T1.FULL_DATE
I get a list of my missing records, yet it doesn't take into account the FULL_DATE (year and month) of the destination table.
In short, FACT_QUANTITY_TMP needs to have all the records it's missing added, grouped by month and year.
I'm kind of looking for the best approach here; this query would be used in an automatically run stored procedure every month when the ERP data gets pulled.
You can generate the missing records by doing a cross join to generate all combinations and then removing those that are already there. For example:
select fd.fulldate, c.D_ACTIECODE
from (select distinct fulldate from fact_quantity_tmp) fd cross join
(select D_ACTIECODE from C_DS_BD_AP_A) c left join
fact_quantity_tmp fqt
on fqt.fulldate = fd.fulldate and fqt.dim03 = c.D_ACTIECODE
where fqt.fulldate is null;
You can put an insert before this to insert these rows into the fact table.
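For example, a hypothetical INSERT around that query could look like the one below; the budget column name (here BUDGET) is an assumption, since the question does not show the full column list of FACT_QUANTITY_TMP.
insert into fact_quantity_tmp (fulldate, dim03, budget)   -- budget column name is assumed
select fd.fulldate, c.D_ACTIECODE, 0
from (select distinct fulldate from fact_quantity_tmp) fd cross join
     (select D_ACTIECODE from C_DS_BD_AP_A) c left join
     fact_quantity_tmp fqt
     on fqt.fulldate = fd.fulldate and fqt.dim03 = c.D_ACTIECODE
where fqt.fulldate is null;
Run from the monthly stored procedure, this fills in a zero-budget row for every month/code combination that the ERP load skipped.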

Select Last Updated Row with condition

I'm working on building a workload tracking system. I have a table that currently lists all the tasks to be completed (each with a unique ID), but it also has all the updates with a datestamp so that I can track how long it took for the status to be updated.
My dilemma is that for a form I want to query only the latest update; currently the select query shows both the original task and the updated task separately.
In words, I guess what I need to do is select a task only when its ID is the last one with that same task number (which is different from the ID; there will be duplicates when a task is updated).
So if I have:
ID  Task  Date
1   A     4/30/13
2   B     5/2/13
3   A     5/3/13
That the table only shows:
ID  Task  Date
3   A     5/3/13
2   B     5/2/13
How can I do this? I think I'm missing something simple...
There are multiple ways to approach this query, even in Access. Here is a way using IN with a subquery:
select t.*
from t
where t.id in (select MAX(id) as maxid
               from t
               group by task
              )
order by task
The subquery finds the maximum ids for all the tasks. It then returns the rows from the original table that match those ids.