Select latest and earliest times within a time group and a pivot statement - sql

I have attendance data that contains a username, a time, and a status (IN or OUT). I want to show attendance data that contains a name and the check-in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple data entries, seconds apart, for the same login attempt. This is because I get the data from a fingerprint attendance scanner, and in some cases the machine records multiple entries, sometimes just 5-10 seconds apart. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?

First, you need to normalize your data by removing the duplicate entries. In your situation, that's a challenge because the duplicated data isn't easily identified as a duplicate. You can make some assumptions, though. Below, I assume that no one will make multiple login attempts within a two-minute window. You can handle this by first using a Common Table Expression (CTE, using the WITH clause).
Within the CTE, you can use the LAG function. Essentially what this code is saying is "for each partition of user and entry type, if the previous value was within 2 minutes of this value, then put a number, otherwise put null." I chose null as the flag that keeps the value because LAG of the first entry is going to be null. Note that a window function like LAG can't appear directly in a WHERE clause, so the flag is computed in one CTE and then filtered in the next. The result is a table of entry events (ID) that were distinct attempts.
Now, you prepare another CTE that a PIVOT will pull from, containing everything from your table but only for the entry IDs you care about. The PIVOT then looks at the MIN/MAX of your IN/OUT times.
WITH FLAGGED_LOGINS AS (
    SELECT ID,
           -- 1 = previous entry for this user/status is within 2 minutes, NULL = distinct attempt
           CASE WHEN LAG(TIME) OVER (PARTITION BY USERNAME, STATUS ORDER BY TIME)
                     >= TIME - (2/60/24)
                THEN 1 END AS DUP_FLAG
    FROM LOGIN_TABLE
),
UNIQUE_LOGINS AS (
    SELECT ID FROM FLAGGED_LOGINS WHERE DUP_FLAG IS NULL
),
TEMP_FOR_PIVOT AS (
    SELECT USERNAME, TIME, STATUS
    FROM LOGIN_TABLE
    WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT *
FROM TEMP_FOR_PIVOT
PIVOT (
    MIN(TIME) AS FIRST_TIME, MAX(TIME) AS LAST_TIME
    FOR STATUS IN ('IN' AS CHECK_IN, 'OUT' AS CHECK_OUT)
)
From there, if you need to rearrange or rename your columns, you can put that last SELECT into yet another CTE and then select your values from it, as in the sketch below. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle
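A minimal, self-contained sketch of that renaming step. The column names IN_FIRST_TIME and OUT_LAST_TIME are assumptions based on the aliases chosen in the PIVOT clause above, and the DUAL row is only a stand-in so the sketch runs on its own:
WITH PIVOTED AS (
    -- stand-in for the pivoted result; in practice this would be the PIVOT query above
    SELECT 'jsmith' AS USERNAME,
           TO_DATE('2017-01-02 08:59', 'YYYY-MM-DD HH24:MI') AS IN_FIRST_TIME,
           TO_DATE('2017-01-02 17:02', 'YYYY-MM-DD HH24:MI') AS OUT_LAST_TIME
    FROM DUAL
)
SELECT USERNAME      AS EMPLOYEE,
       IN_FIRST_TIME AS CHECK_IN,
       OUT_LAST_TIME AS CHECK_OUT
FROM PIVOTED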

Related

INSERT INTO two columns from a SELECT query

I have a table called VIEWS with Id, Day, Month, name of video, name of browser... but I'm interested only in Id, Day and Month.
The ID can appear multiple times because the user (ID) can watch a video on multiple days in multiple months.
This is the query for the minimum date and the maximum date.
SELECT ID,
       CONCAT(MIN(DAY), '/', MIN(MONTH)) AS MIN_DATE,
       CONCAT(MAX(DAY), '/', MAX(MONTH)) AS MAX_DATE
FROM Views
GROUP BY ID
I want to insert the result of this SELECT, with its two columns (MIN_DATE and MAX_DATE), into two new columns using INSERT INTO.
What would the INSERT INTO query look like?
To do what you are trying to do (there are some issues with your solution, please read my comments below), first you need to add the new columns to the table.
ALTER TABLE Views ADD MIN_DATE VARCHAR(10)
ALTER TABLE Views ADD MAX_DATE VARCHAR(10)
Then you need to UPDATE your new columns (not INSERT, because you don't want new rows). Determine the min/max for each ID, then join that result back to the table to be able to update each row. You can't update directly from a GROUP BY because the grouped rows lose their link to the original rows.
;WITH MinMax AS
(
    SELECT
        ID,
        CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE,
        CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE
    FROM
        Views AS V
    GROUP BY
        ID
)
UPDATE V SET
    MIN_DATE = M.MIN_DATE,
    MAX_DATE = M.MAX_DATE
FROM
    MinMax AS M
    INNER JOIN Views AS V ON M.ID = V.ID
The problems that I see with this design are:
Storing aggregated columns: you usually want to do this only for performance reasons (which I believe is not the case here), since querying pre-aggregated (grouped) rows is faster because there are fewer rows to read. The problem is that you will have to update the grouped values each time one of the original rows changes, which adds extra processing time. Another option would be to update the aggregated values periodically, but then you have to accept that for a period of time the grouped values do not really represent the underlying rows.
Keeping aggregated columns on the same table as the data they aggregate: this is a normalization problem. Updating or inserting a row will trigger an update of all rows with the same ID, since the min/max values might have changed. Also, the min/max values will be repeated on every row that belongs to the same ID, which wastes space. If you had to store aggregated data, you would save it in a different table, which in turn causes the problems listed in the previous point.
Using a text data type to store dates: you always want to work with dates using a proper DATETIME data type. This not only lets you use date functions like DATEADD or DATEDIFF, but also saves space (varchars that store dates need more bytes than DATETIME). I also don't see a year part in your query; it should be considered when computing a min/max (this depends on what you are storing in this table).
Computing the min/max incorrectly: If you have the following rows:
ID  DAY  MONTH
1   5    1
1   3    2
The current result of your query would be 3/1 as MIN_DATE and 5/2 as MAX_DATE, which I believe is not what you are trying to find. The lowest here should be the 5th of January and the highest the 3rd of February. This is a consequence of storing date parts as independent values instead of the whole date as a DATETIME; see the sketch right after this list for how a real date column avoids it.
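As a rough sketch of what the min/max would look like with a proper date column (VIEW_DATE here is a hypothetical DATE column, not one that exists in the original table):
SELECT
    ID,
    MIN(VIEW_DATE) AS MIN_DATE,  -- earliest full date: 5 January sorts before 3 February
    MAX(VIEW_DATE) AS MAX_DATE
FROM
    Views
GROUP BY
    ID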
What you usually want to do in this scenario is group directly in the query that needs the data grouped, so the GROUP BY goes on the SELECT that needs the min/max. Having an index on ID would make the grouping very fast. This way you save the storage space you would otherwise use to keep the aggregated values, and the result always reflects the real grouped data at the time you query it.
It would be something like the following:
;WITH MinMax AS
(
    SELECT
        ID,
        CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE, -- date problem (varchar + min/max computed separately)
        CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE  -- date problem (varchar + min/max computed separately)
    FROM
        Views AS V
    GROUP BY
        ID
)
SELECT
    V.*,
    M.MIN_DATE,
    M.MAX_DATE
FROM
    MinMax AS M
    INNER JOIN Views AS V ON M.ID = V.ID

Return All Historical Account Records for Accounts with Change in Corresponding Value

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
I've tried using the row_number function, as well as a self-join, but for some reason I'm not getting the expected results. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt. Also, I'm using PostgreSQL in a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
If you want only the accounts, use aggregation:
select acct_id
from t
group by acct_id
having min(value) <> max(value);
Based on your description, you could also use having count(*) > 1.
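For instance, a sketch of that count-based variant, under the same assumed table and column names as the query above:
select acct_id
from t
group by acct_id
having count(*) > 1;   -- more than one historical record implies at least one change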
If you want the original records, you can use window functions:
select t.*
from (select t.*,
             count(*) over (partition by acct_id) as cnt
      from t
     ) t
where cnt > 1;

Sum column total based on criteria sql

I am not even sure if I am asking this question correctly. Algorithmically I know what I want to do, but don't know the appropriate syntax in SQL.
I have created a table that contains total online session times by customer number, IP, session start time, and total session length. Here is an example of what this table looks like (IP and CustNo are masked; also, I'm not sure how to format tables, so excuse the weirdness):
CustNo   minDate                  maxDate                  ClientIp   timeDiff
123456   2017-11-14-02:39:27.093  2017-11-14-02:39:59.213  1.1.1.1    0.000372
I then create another table for a specific type of activity, and I want to know how long this user has used that IP before the activity occurred. The second table contains each activity as a separate row, with customer ID, IP, and a timestamp.
Up to here no issue and the tables look fine.
I now need to write the part that looks into the first table by customer ID and IP, then sums all usage of that IP for that customer, as long as the session's min start time is earlier than the activity time, but I have no idea how to do this. Here is the current query (not working, obviously). I am doing a left join because this might be a new IP that does not yet appear in the first table.
SELECT
*,
SUM(##finalSessionSums.timeDiff)
FROM
##allTransfersToDiffReceip
LEFT JOIN
##finalSessionSums ON ##allTransfersToDiffReceip.CustNo = ##finalSessionSums.CustNo
AND ##allTransfersToDiffReceip.ClientIp = ##finalSessionSums.ClientIp
AND ##allTransfersToDiffReceip.[DateTime] < ##finalSessionSums.minDate
I get an aggregate function error here but I don't know how to approach this at all.
You have a SELECT * (return all columns) and an aggregate function (in this case SUM). Whenever you return specific columns alongside aggregated, summarised values, each non-aggregated column in the SELECT clause must also appear in the GROUP BY clause. For example:
SELECT
    A, B, SUM(C) AS CSum
FROM
    [Table]
GROUP BY
    A, B
Given the limited information, I can't provide a perfect solution, but I'll give it a try.
First, as Alan mentioned, you have to select only the columns you need alongside your aggregate function, which here are CustNo and ClientIp. To get the sums, you have to group the query like this:
SELECT SUM(s.timeDiff) AS [Sum], s.CustNo, s.ClientIp
FROM ##finalSessionSums s
INNER JOIN ##allTransfersToDiffReceip a ON a.CustNo = s.CustNo
    AND a.ClientIp = s.ClientIp
    AND a.[DateTime] < s.minDate
GROUP BY s.CustNo, s.ClientIp;

SQL Server: I have multiple records per day and I want to return only the first of the day

I have some records that track inquiries by DATETIME. There is a glitch in the system, and sometimes a record will be entered multiple times on the same day. I have a query with a bunch of correlated subqueries attached to these, but the numbers are off because, when those glitches happen, these leads show up multiple times. I need the first entry of the day. I tried fooling around with MIN but couldn't quite get it to work.
I currently have this, I am not sure if I am on the right track though.
SELECT SL.UserID, MIN(SL.Added) OVER (PARTITION BY SL.UserID)
FROM SourceLog AS SL
Here's one approach using row_number():
select *
from (
    select *,
           row_number() over (partition by userid, cast(added as date) order by added) rn
    from sourcelog
) t
where rn = 1
You could use GROUP BY along with MIN to accomplish this.
Depending on how your data is structured: if you assign a unique sequential number to each record created, you could just return the lowest number created per day; otherwise you would need to return the ID of the record with the earliest DATETIME value per day (a sketch of that variant follows the code below).
--Assumes sequential IDs
select
    min(Id)
from
    [YourTable]
group by
    --the conversion is used to strip the time value out of the date/time
    convert(date, [YourDateTime])
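And a rough sketch of the non-sequential case, using the same placeholder table and column names as above, returning the ID of the earliest record per day:
select Id
from (
    select
        Id,
        row_number() over (
            partition by convert(date, [YourDateTime]) -- one group per calendar day
            order by [YourDateTime]                    -- earliest record first
        ) as rn
    from [YourTable]
) t
where rn = 1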

Finding first pending events from SQL table

I have a table which contains TV Guide data.
In a simplified form, the columns look like the following...
_id, title, start_time, end_time, channel_id
What I'm trying to do is create a list of TV shows in a NOW/NEXT format. Generating the 'NOW' list (what's currently being broadcast) is easy but trying to get a list of what is showing 'NEXT' is causing me problems.
I tried this...
SELECT * from TV_GUIDE where start_time >= datetime('now') GROUP BY channel_id
Sure enough this gives me one TV show for each TV channel_id but it gives me the very last shows (by date/time) in the TV_GUIDE table.
SQL isn't my strong point and I'm struggling to work out why only the last TV shows are returned. It seems I need to do a sub-query of a query (or a query of a sub-query). I've tried combinations of ORDER BY and LIMIT but they don't help.
I believe that you first must select the pairs (channel_id, start_time) from TV_GUIDE, based on your criteria.
Then, you will display those records of TV_GUIDE which match those criteria:
SELECT source.*
FROM TV_GUIDE AS source
JOIN (SELECT channel_id, MIN(start_time) AS start_time
      FROM TV_GUIDE
      WHERE start_time >= now()
      GROUP BY channel_id) AS start_times
  ON (source.channel_id = start_times.channel_id
      AND source.start_time = start_times.start_time)
ORDER BY channel_id;
This first selects the shows with the minimum upcoming start time, one for each channel, giving you the channel id and the start time. It then fills in the other columns from a JOIN with the same table (MySQL sometimes lets you retrieve those extra columns from the grouped query alone, but I feel that's a bad habit to acquire: maybe you add a field and it no longer works).
You might want to add an index on the combined fields (start_time, channel_id). Just to be on the safe side, make it a UNIQUE index, as sketched below.
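A minimal sketch of that index (the index name is made up; UNIQUE assumes no two shows on the same channel ever share a start_time):
CREATE UNIQUE INDEX idx_tv_guide_start_channel
    ON TV_GUIDE (start_time, channel_id);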