Querying data for current day and only the most recent entry - sql

I'm new to Informix and am trying to figure out some of the syntax. I have a table that stores agent status information. I only want to pull two things from the table;
1 - The current day's rows
2 - Only the most recent entry for each agent
So, a query of
select limit 5 agentid, eventdatetime from agentstatedetail order by eventdatetime desc
will yield;
+------------+------------------------+
| agentid | eventdatetime |
+------------+------------------------+
| 1552 | 2013-12-04 16:48:20.122|
| 1482 | 2013-12-04 16:48:18.897|
| 1439 | 2013-12-04 16:48:17.754|
| 1188 | 2013-12-04 16:48:15.972|
| 788 | 2013-12-04 16:48:15.190|
+------------+------------------------+
The Informix syntax seems a bit different from mysql. How can I pull this kind of information? I tried using the "today" modifier, but it doesn't work the way I thought it would.

You could get the max eventdatetime for each agentid that has a row for today with:
SELECT agentid,max(eventdatetime)
FROM agentstatedetail
WHERE date(eventdatetime) = TODAY
GROUP BY agentid;
p.s. To get the same effect of LIMIT 5 in informix you do SELECT FIRST 5 ...

Hope this work:
select A.agentid, A.eventdatetime
from agentstatedetail A
where
date(A.eventdatetime) = TODAY
and A.eventdatetime = ( select MAX(B.eventdatetime) from agentstatedetail B where A.agentid = B.agentid)

Related

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention date of my users. So for example, for all users with one or more check ins, I want the percentage of users who did a check in on their 2nd day, on their 3rd day and so on.
My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date |
=====================================
| 1 | 2020-09-02 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 12:00:00 |
-------------------------------------
| 1 | 2020-09-04 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 11:00:00 |
-------------------------------------
| ... |
-------------------------------------
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70% | 67 % | 44% | 32% |
---------------------------------
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days between checkins for users -- and users might have none -- then just use aggregation and window functions:
select sum( (ci.date = ci.min_date)::numeric ) / u.num_users as day_0,
sum( (ci.date = ci.min_date + interval '1 day')::numeric ) / u.num_users as day_1,
sum( (ci.date = ci.min_date + interval '2 day')::numeric ) / u.num_users as day_2
from (select u.*, count(*) over () as num_users
from users u
) u left join
(select ci.user_id, ci.date::date as date,
min(min(date::date)) over (partition by user_id order by date) as min_date
from checkins ci
group by user_id, ci.date::date
) ci;
Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per date.

Creating user time report that includes zero hour weeks

I'm having a heck of a time putting together a query that I thought would be quite simple. I have a table that records total hours spent on a task and the user that reported those hours. I need to put together a query that returns how many hours a given user charged to each week of the year (including weeks where no hours were charged).
Expected Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
|'JIM' | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I have a function that returns the start and end date of the week for each day, so I used that and joined it to the task table by date and summed up the hours. This gets me very close, but since I'm joining on date I obviously end up with NULL for the USER_ID on all zero hour rows.
Current Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
| NULL | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I've tried a few other approaches, but each time I end up hitting the same problem. Any ideas?
Schema:
---------------------------------
| TASK_LOG |
---------------------------------
|USER_ID | DATE_ENTERED | HOURS |
-------------------------------
|'JIM' | 4/28/2019 | 6 |
|'JIM' | 5/12/2019 | 6 |
|'JIM' | 5/13/2019 | 10 |
------------------------------------
| DATE_HELPER_TABLE |
|(This is actually a function, but I|
| put it in a table to simplify) |
-------------------------------------
|DATE | START_OF_WEEK | END_OF_WEEK |
-------------------------------------
|5/3/2019 | 4/28/2019 | 5/4/2019 |
|5/4/2019 | 4/28/2019 | 5/4/2019 |
|5/5/2019 | 5/5/2019 | 5/11/2019 |
| ETC ... |
Query:
SELECT HRS.USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(HOURS)
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
http://sqlfiddle.com/#!18/02d43/3 (note: for this sql fiddle, I converted my date helper function into a table to simplify)
Cross join the users (in question) and include them in the join condition. Use coalesce() to get 0 instead of NULL for the hours of weeks where no work was done.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(hrs.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN (SELECT tl.user_id,
dht2.start_of_week,
tl.hours
FROM task_log tl
INNER JOIN date_helper_table dht2
ON dht2.date_value = tl.date_entered) hrs
ON hrs.user_id = u.user_id
AND hrs.start_of_week = dht.start_of_week
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
I used a VALUES clause here to list the users. If you only want to get the times for particular users you can do so too (or use any other subquery, or ...). Otherwise you can use your user table (which you didn't post, so I had to use that substitute).
However the figures that are produced by this (and your original query) look strange to me. In the fiddle your user has worked for a total of 23 hours in the task_log table. Yet your sums in the result are 24 and 80, that is way to much on its own and even worse taking into account, that 1 hour in task_log isn't even on a date listed in date_helper_table.
I suspect you get more accurate figures if you just join task_log, not that weird derived table.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(tl.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN task_log tl
ON tl.user_id = u.user_id
AND tl.date_entered = dht.date_value
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
But maybe that's just me.
SQL Fiddle
http://sqlfiddle.com/#!18/02d43/65
Using your SQL fiddle, I simply updated the select statement to account for and convert null values. As far as I can tell, there is nothing in your post that makes this option not viable. Please let me know if this is not the case and I will update. (This is not intended to detract from sticky bit's answer, but to offer an alternative)
SELECT ISNULL(HRS.USER_ID, '') as [USER_ID]
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(ISNULL(HOURS,0)) as [SUM]
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
Create a dates table that includes all dates for the next 100 years in the first column, the week of the year, day of the month etc in the next.
Then select from that dates table and left join everything else. Do isnull function to replace nulls with zeros.

SQL Where Query to Return Distinct Values

I have an app that has the built in initial Select option and only allows me to enter from the Where section. I have rows with duplicate values. I'm trying to get the list of just one record for each distinct value but am unsure how to get the statement to work. I've found one that almost does the trick but it doesn't give me any rows that had a dup. I assume due to the = so just need a way to get one for each that matches my where criteria. Examples below.
Initial Data Set
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
2/2/2015 | John | | 00000.0001 | 70
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | | 00000.0022 | 90
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Jane | 5555051005 | 00000.0050 | 60
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
What I Want the Query to Return
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
Here is what I was able to get but when i run it I don't get the rows that did have dups callindex values. duration doesn't mattern and they never match up so if it helps to query using that as a filter that would be fine. I've added mock data to assist.
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and callindex in
(SELECT callindex
FROM table
GROUP BY callinex
HAVING COUNT(callindex) = 1)
Any help would be greatly appreciated.
Ok with the assistance of everyone here i was able to get the query to work perfectly within SQL. That said apparently the app I'm trying this on has a built in character limit and the below query is too long. This is the query i have to use as far as the restrictions and i have to be able to search both ID's at the same time because some get stamped with one or the other rarely both. I'm hoping someone might be able to help me shorten it?
use Database
select * from tblCall
WHERE
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and fldAgentLoginID='1234'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
or
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and '4/15/15 23:59'
and fldPhoneLoginID='56789'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
If the constraint is that we can only add to the WHERE clause, I don't think it's possible, due to there being 2 absolutely identical rows:
4/15/2015 | Jane | 5555051005 | 00000.0050
4/15/2015 | Jane | 5555051005 | 00000.0050
Is it possible that you can add HAVING or GROUP BY to the WHERE? or possibly UNION the SELECT to another SELECT statement? That may open up some additional possibilities.
Maybe with an union:
SELECT *
FROM table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 )
UNION
SELECT *
FROM table
WHERE Name not in (SELECT name from table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 ))
From your sample, it seems like you could just exclude rows in which there was no value in the ANI column. If that is the case you could simply do:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and ANI is not null
If this doesn't work for you, let me know and I can see what else I can do.
Edit:
You've made it sound like the CallIndex combined with the Duration is a unique value. That seems somewhat doubtful to me, but if that is the case you could do something like this:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and cast(callindex as varchar(80))+'-'+cast(min(duration) as varchar(80)) in
(SELECT cast(callindex as varchar(80))+'-'+cast(min(duration) as varchar(80))
FROM table
GROUP BY callindex)
There are two keywords you can use to get non-duplicated data, either DISTINCT or GROUP BY. In this case, I would use a GROUP BY, but you should read up on both.
This query groups all of the records by CallIndex and takes the MAX value for each of the other columns and should give you the results you want:
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
EDIT
Since you can't use GROUP BY directly but you can have any SQL in the WHERE clause you can do:
SELECT *
FROM table
WHERE EXISTS
(
SELECT *
FROM
(
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
) derv
WHERE table.Date = derv.Date
AND table.Name = derv.Name
AND table.ANI = derv.ANI
AND table.CallIndex = derv.CallIndex
)
This selects all rows from the table where there exists a matching row from the GROUP BY.
It won't be perfect, if any two rows match exactly, you'll still have duplicates, but that's the best you'll get with your restriction.
In your data, why not just do this?
SELECT *
FROM table
WHERE Date >= '2015-04-15' and Date < '2015-04-16'
ani is not null;
If the blank values are only a coincidence, then you have a problem just using a where clause. If the results are full duplicates (no column has a different value), then you probably cannot do what you want with just a where clause -- unless you are using SQLite, Oracle, or Postgres.

DENSE RANK in both ways

Thanks to answers to my previous question, I could establish this kind of query
http://sqlfiddle.com/#!4/3e6f9/7/0
The result table shows work time right before each pause data.
From this, I would like to add another column which shows "work time right after each pause data", resulting as below.
STAFF | MAX(...) | START_TIME | END_TIME | MIN(...)
------+----------+------------+----------+---------
GC01 | 12:00 | 12:03 | 12:07 | 12:10
GC01 | 12:20 | 12:25 | 12:35 | 12:40
GC02 | 12:33 | 12:35 | 12:45 | (null)
I think this is what you're looking for. You need to rejoin to the work table:
select p.staff,
max(w.work_time) keep (dense_rank first order by w.work_time desc),
p.start_time, p.end_time,
min(w2.work_time) keep (dense_rank first order by w2.work_time)
from pause p
join work w
on p.staff = w.staff
and p.start_time >= w.work_time
left join work w2
on p.staff = w2.staff
and p.end_time <= w2.work_time
group by p.staff, p.start_time, p.end_time
Updated SQL Fiddle

Select Earliest Date and Time from List of Distinct User Sessions

I have a table of user access sessions which records website visitor activity:
accessid, userid, date, time, url
I'm trying to retrieve all distinct sessions for userid 1234, as well as the earliest date and time for each of those distinct sessions.
SELECT
DISTINCT accessid,
date,
time
FROM
accesslog
WHERE userid = '1234'
GROUP BY accessid
This gives me the date and time of a random row within each distinct accessid. I've read a number of posts recommending the use of min() and max(), so I tried:
SELECT DISTINCT accessid, MIN(DATE) AS date, MIN(TIME) AS time FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC
... and even...
SELECT DISTINCT accessid, MIN(CONCAT(DATE, ' ', TIME)) AS datetime FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC
... but I never get the correct result of the earliest date and time.
What is the trick to ordering this kind of query?
EDIT -
Something weird is happening....
The code posted below by Bill Karwin correctly retrieves the earliest date and time for sessions that started in 2009-09. But, for sessions that began on some day in 2009-08, the time and date for the first hit occurring in the current month is what is returned. In other words, the query does not appear to be spanning months!
Example data set:
accessid | userid | date | time
1 | 1234 | 2009-08-15 | 01:01:01
1 | 1234 | 2009-09-01 | 12:01:01
1 | 1234 | 2009-09-15 | 13:01:01
2 | 1234 | 2009-09-01 | 14:01:01
2 | 1234 | 2009-09-15 | 15:01:01
At least on my actual data table, the query posted below finds the follow earliest date and time for each of the two accessid's:
accessid | userid | date | time
1 | 1234 | 2009-09-01 | 12:01:01
2 | 1234 | 2009-09-01 | 14:01:01
... and I would guess that the only reason the result for accessid 2 appears correct is because it has no hits in a previous month.
Am I going crazy?
EDIT 2 -
The answer is yes, I am going crazy. The query works on the above sample data when placed in a table of duplicate structure.
Here is the (truncated) original data. I included the very first hit, another hit in the same month, the first hit of the next month, and then the last hit of the month. The original data set has many more hits in between these points, for a total of 462 rows.
accessid | date | time
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-18 | 04:01:42
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-23 | 23:18:52
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-17 | 05:12:16
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-18 | 06:29:59
... the query returns the 2009-09-17 value as the earliest value when the original table is queried. But, when I copy the ........ oh, balls.
It's because the hits from 2009-08% have an empty userid field.
This is a variation of the "greatest-n-per-group" problem that comes up on StackOverflow several times per week.
SELECT
a1.accessid,
a1.date,
a1.time
FROM
accesslog a1
LEFT OUTER JOIN
accesslog a2
ON (a1.accessid = a2.accessid AND a1.userid = a2.userid
AND (a1.date > a2.date OR a1.date = a2.date AND a1.time > a2.time))
WHERE a1.userid = '1234'
AND a2.accessid IS NULL;
The way this works is that we try to find a row (a2) that has the same accessid and userid, and an earlier date or time than the row a1. When we can't find an earlier row, then a1 must be the earliest row.
Re your comment, I just tried it with the sample data you provided. Here's what I get:
+----------+------------+----------+
| accessid | date | time |
+----------+------------+----------+
| 1 | 2009-08-15 | 01:01:01 |
| 2 | 2009-09-01 | 14:01:01 |
+----------+------------+----------+
I'm using MySQL 5.0.75 on Mac OS X.
Try this
SELECT
accessid,
date,
time
FROM
accesslog
WHERE userid = '1234'
GROUP BY accessid
HAVING MIN(date)
It will return all unique accesses with minimum time for each for userid = '1234'.