DENSE RANK in both ways - sql

Thanks to answers to my previous question, I could establish this kind of query
http://sqlfiddle.com/#!4/3e6f9/7/0
The result table shows the work time right before each pause.
From this, I would like to add another column that shows the work time right after each pause, giving the result below.
STAFF | MAX(...) | START_TIME | END_TIME | MIN(...)
------+----------+------------+----------+---------
GC01 | 12:00 | 12:03 | 12:07 | 12:10
GC01 | 12:20 | 12:25 | 12:35 | 12:40
GC02 | 12:33 | 12:35 | 12:45 | (null)

I think this is what you're looking for. You need to rejoin to the work table:
select p.staff,
max(w.work_time) keep (dense_rank first order by w.work_time desc),
p.start_time, p.end_time,
min(w2.work_time) keep (dense_rank first order by w2.work_time)
from pause p
join work w
on p.staff = w.staff
and p.start_time >= w.work_time
left join work w2
on p.staff = w2.staff
and p.end_time <= w2.work_time
group by p.staff, p.start_time, p.end_time
Updated SQL Fiddle
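As a rough illustration of the rejoin above, here is a minimal sketch that runs the same join shape in SQLite through Python's sqlite3 module. SQLite has no KEEP (DENSE_RANK ...) clause, so plain MAX and MIN are swapped in; that gives the same value here because the aggregated column is also the ordering column. The work rows are invented to be consistent with the desired output in the question and are not taken from the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work  (staff TEXT, work_time TEXT);
CREATE TABLE pause (staff TEXT, start_time TEXT, end_time TEXT);
-- Hypothetical sample rows, chosen to match the desired output above.
INSERT INTO work VALUES
  ('GC01', '12:00'), ('GC01', '12:10'), ('GC01', '12:20'), ('GC01', '12:40'),
  ('GC02', '12:33');
INSERT INTO pause VALUES
  ('GC01', '12:03', '12:07'), ('GC01', '12:25', '12:35'),
  ('GC02', '12:35', '12:45');
""")

rows = conn.execute("""
SELECT p.staff,
       MAX(w.work_time)  AS last_work_before,   -- latest work at or before the pause
       p.start_time,
       p.end_time,
       MIN(w2.work_time) AS first_work_after    -- earliest work at or after the pause
FROM pause p
JOIN work w
  ON w.staff = p.staff AND w.work_time <= p.start_time
LEFT JOIN work w2                               -- LEFT JOIN keeps pauses with no later work
  ON w2.staff = p.staff AND w2.work_time >= p.end_time
GROUP BY p.staff, p.start_time, p.end_time
ORDER BY p.staff, p.start_time
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN on the second copy of work is what lets the GC02 pause survive with a NULL in the last column; an inner join there would drop it.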

How to create a table that loops over data in Postgres

I want to create a table that returns the top 10 aggregate cons_name over a given week, that repeats every day.
So for 5/29/2019 it will pull the top 10 cons_name by their sum dating back to 5/22/2019.
Then, for 5/28/2019, the top 10 cons_name by their sum back to 5/21/2019.
A table of top 10 dating back 7 days all the way to 2018-12-01.
I can write the simple query that looks back 7 days, but I have tried window functions to no avail.
SELECT cons_name,
pricedate,
sum(shadow)
FROM spp.rtbinds
WHERE pricedate >= current_date - 7
GROUP BY cons_name, shadow, pricedate
ORDER BY shadow asc
LIMIT 10
This query generates the output below
cons_name pricedate sum
"TEMP17_24078" "2019-05-28 00:00:00" "-1473.29723333333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1383.56638333333"
"TMP175_24736" "2019-05-23 00:00:00" "-1378.40504166667"
"TMP159_24149" "2019-05-23 00:00:00" "-1328.847675"
"TMP397_24836" "2019-05-23 00:00:00" "-1221.19560833333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1214.9914"
"TMP175_24736" "2019-05-23 00:00:00" "-1123.83254166667"
"TEMP72_22893" "2019-05-29 00:00:00" "-1105.93840833333"
"TMP164_23704" "2019-05-24 00:00:00" "-1053.051375"
"TMP175_24736" "2019-05-27 00:00:00" "-1043.52104166667"
I would like a table and function that returns a table of each day's top 10 dating back a week.
Window functions get you on the right track, but it is worth reading further in the documentation about what they can do.
There are multiple issues to solve here:
gaps in the data (missing pricedate values) would leave us without the correct number of rows (7) to calculate the overall sum
the calculation itself needs all data rows, so the WHERE clause cannot be used to limit the input to the visible days only
in order to select the top 10 for each day, we have to generate a row number per partition, because the LIMIT clause cannot be applied per group
This is why I came up with the following CTEs:
CTE days: generate the gap-less date series and mark visible days
CTE daily: LEFT JOIN the data to the generated days and produce daily sums (and handle NULL entries)
CTE calc: produce the cumulative sums
CTE numbered: produce row numbers reset each day
select the actual visible rows and limit them to max. 10 per day
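To make the gap-filling and cumulative-sum steps concrete, here is a small sketch in SQLite (via Python's sqlite3), not the Postgres query itself. It assumes SQLite 3.25 or later for window functions, uses a recursive CTE as a stand-in for generate_series, shrinks the window to 3 days, and cross joins the names so every name gets a row for every day. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rtbinds (cons_name TEXT, pricedate TEXT, shadow REAL);
-- Invented rows: name 'A' has no data on 2019-05-21 and 2019-05-23.
INSERT INTO rtbinds VALUES
  ('A', '2019-05-20', 1.0),
  ('A', '2019-05-22', 2.0),
  ('A', '2019-05-24', 4.0);
""")

rolling = conn.execute("""
WITH RECURSIVE days (c_day) AS (
    -- gap-less day series (SQLite stand-in for generate_series)
    SELECT '2019-05-20'
    UNION ALL
    SELECT date(c_day, '+1 day') FROM days WHERE c_day < '2019-05-24'
),
names AS (
    SELECT DISTINCT cons_name FROM rtbinds
),
daily AS (
    -- one row per name and day; missing days become 0
    SELECT n.cons_name, d.c_day, COALESCE(SUM(r.shadow), 0) AS shadow_sum
    FROM days AS d
    CROSS JOIN names AS n
    LEFT JOIN rtbinds AS r
           ON r.pricedate = d.c_day AND r.cons_name = n.cons_name
    GROUP BY n.cons_name, d.c_day
)
SELECT cons_name, c_day,
       SUM(shadow_sum) OVER (
           PARTITION BY cons_name ORDER BY c_day
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_sum                  -- current day plus the two days before it
FROM daily
ORDER BY cons_name, c_day
""").fetchall()

for row in rolling:
    print(row)
```

Because the frame is counted in rows, not in days, the sum only covers the intended date range when every day is present, which is why the gaps are filled first.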
So for a specific week (2019-05-26 - 2019-06-01), the query will look like the following:
WITH
days (c_day, c_visible, c_lookback) as (
SELECT gen::date, (CASE WHEN gen::date < '2019-05-26' THEN false ELSE true END), gen::date - 6
FROM generate_series('2019-05-26'::date - 6, '2019-06-01'::date, '1 day'::interval) AS gen
),
daily (cons_name, pricedate, shadow_sum) AS (
SELECT
r.cons_name,
r.pricedate::date,
coalesce(sum(r.shadow), 0)
FROM days
LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
GROUP BY 1, 2
),
calc (cons_name, pricedate, shadow_sum) AS (
SELECT
cons_name,
pricedate,
sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM daily
),
numbered (cons_name, pricedate, shadow_sum, position) AS (
SELECT
calc.cons_name,
calc.pricedate,
calc.shadow_sum,
ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
FROM calc
)
SELECT
days.c_lookback,
numbered.cons_name,
numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;
Online example with generated test data: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=a83a52e33ffea3783207e6b403bc226a
Example output:
c_lookback | cons_name | shadow_sum
------------+--------------+------------------
2019-05-26 | TMP400_27000 | 4578.04474575352
2019-05-26 | TMP700_25000 | 4366.56857151864
2019-05-26 | TMP200_24000 | 3901.50325547671
2019-05-26 | TMP400_24000 | 3849.39595793188
2019-05-26 | TMP700_28000 | 3763.51693260809
2019-05-26 | TMP600_26000 | 3751.72016620729
2019-05-26 | TMP500_28000 | 3610.75970225036
2019-05-26 | TMP300_26000 | 3598.36888491176
2019-05-26 | TMP600_27000 | 3583.89777677553
2019-05-26 | TMP300_21000 | 3556.60386707587
2019-05-25 | TMP400_27000 | 4687.20302128047
2019-05-25 | TMP200_24000 | 4453.61603102228
2019-05-25 | TMP700_25000 | 4319.10566615313
2019-05-25 | TMP400_24000 | 4039.01832416654
2019-05-25 | TMP600_27000 | 3986.68667223025
2019-05-25 | TMP600_26000 | 3879.92447655788
2019-05-25 | TMP700_28000 | 3632.56970774056
2019-05-25 | TMP800_25000 | 3604.1630071504
2019-05-25 | TMP600_28000 | 3572.50801157858
2019-05-25 | TMP500_27000 | 3536.57885829499
2019-05-24 | TMP400_27000 | 5034.53660146287
2019-05-24 | TMP200_24000 | 4646.08844632655
2019-05-24 | TMP600_26000 | 4377.5741555281
2019-05-24 | TMP700_25000 | 4321.11906399066
2019-05-24 | TMP400_24000 | 4071.37184911687
2019-05-24 | TMP600_25000 | 3795.00857752701
2019-05-24 | TMP700_26000 | 3518.6449117614
2019-05-24 | TMP600_24000 | 3368.15348120732
2019-05-24 | TMP200_25000 | 3305.84444172308
2019-05-24 | TMP500_28000 | 3162.57388606668
2019-05-23 | TMP400_27000 | 4057.08620966971
2019-05-23 | TMP700_26000 | 4024.11812392669
...
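The top-N-per-day step can also be looked at in isolation. The following is a minimal sketch in SQLite (via Python's sqlite3, assuming SQLite 3.25 or later for ROW_NUMBER), with an invented daily table and a top 2 instead of a top 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (cons_name TEXT, pricedate TEXT, shadow_sum REAL);
-- Invented sums for three names on two days.
INSERT INTO daily VALUES
  ('A', '2019-05-28', 10.0), ('B', '2019-05-28', 30.0), ('C', '2019-05-28', 20.0),
  ('A', '2019-05-29',  5.0), ('B', '2019-05-29',  1.0), ('C', '2019-05-29',  7.0);
""")

top_per_day = conn.execute("""
WITH numbered AS (
    SELECT cons_name, pricedate, shadow_sum,
           -- numbering restarts at 1 for every pricedate
           ROW_NUMBER() OVER (PARTITION BY pricedate
                              ORDER BY shadow_sum DESC) AS position
    FROM daily
)
SELECT pricedate, cons_name, shadow_sum
FROM numbered
WHERE position <= 2          -- per-day limit, which LIMIT alone cannot express
ORDER BY pricedate DESC, shadow_sum DESC
""").fetchall()

for row in top_per_day:
    print(row)
```

ROW_NUMBER breaks ties arbitrarily; if tied rows should all be kept, RANK or DENSE_RANK could be used instead, at the cost of possibly returning more than N rows per day.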

Creating user time report that includes zero hour weeks

I'm having a heck of a time putting together a query that I thought would be quite simple. I have a table that records total hours spent on a task and the user that reported those hours. I need to put together a query that returns how many hours a given user charged to each week of the year (including weeks where no hours were charged).
Expected Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
|'JIM' | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I have a function that returns the start and end date of the week for each day, so I used that and joined it to the task table by date and summed up the hours. This gets me very close, but since I'm joining on date I obviously end up with NULL for the USER_ID on all zero hour rows.
Current Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
| NULL | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I've tried a few other approaches, but each time I end up hitting the same problem. Any ideas?
Schema:
---------------------------------
| TASK_LOG |
---------------------------------
|USER_ID | DATE_ENTERED | HOURS |
-------------------------------
|'JIM' | 4/28/2019 | 6 |
|'JIM' | 5/12/2019 | 6 |
|'JIM' | 5/13/2019 | 10 |
------------------------------------
| DATE_HELPER_TABLE |
|(This is actually a function, but I|
| put it in a table to simplify) |
-------------------------------------
|DATE | START_OF_WEEK | END_OF_WEEK |
-------------------------------------
|5/3/2019 | 4/28/2019 | 5/4/2019 |
|5/4/2019 | 4/28/2019 | 5/4/2019 |
|5/5/2019 | 5/5/2019 | 5/11/2019 |
| ETC ... |
Query:
SELECT HRS.USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(HOURS)
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
http://sqlfiddle.com/#!18/02d43/3 (note: for this sql fiddle, I converted my date helper function into a table to simplify)
Cross join the users (in question) and include them in the join condition. Use coalesce() to get 0 instead of NULL for the hours of weeks where no work was done.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(hrs.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN (SELECT tl.user_id,
dht2.start_of_week,
tl.hours
FROM task_log tl
INNER JOIN date_helper_table dht2
ON dht2.date_value = tl.date_entered) hrs
ON hrs.user_id = u.user_id
AND hrs.start_of_week = dht.start_of_week
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
I used a VALUES clause here to list the users. If you only want the times for particular users, you can list just those (or use any other subquery). Otherwise you can use your user table (which you didn't post, so I had to use that substitute).
However, the figures produced by this (and by your original query) look strange to me. In the fiddle your user has worked a total of 23 hours in the task_log table. Yet the sums in the result are 24 and 80, which is way too much on its own, and even worse considering that 1 hour in task_log isn't even on a date listed in date_helper_table.
I suspect you get more accurate figures if you just join task_log, not that weird derived table.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(tl.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN task_log tl
ON tl.user_id = u.user_id
AND tl.date_entered = dht.date_value
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
But maybe that's just me.
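For reference, here is a small sketch of the cross-join approach run in SQLite through Python's sqlite3, using the three TASK_LOG rows from the question's schema (with user 'JIM' and ISO dates). The date helper rows are generated in Python and are an assumption about what the helper function returns; the inline SELECT replaces the VALUES clause because SQLite does not accept a column list after a derived-table alias.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_log (user_id TEXT, date_entered TEXT, hours INTEGER)")
conn.execute(
    "CREATE TABLE date_helper_table "
    "(date_value TEXT, start_of_week TEXT, end_of_week TEXT)"
)
conn.executemany(
    "INSERT INTO task_log VALUES (?, ?, ?)",
    [("JIM", "2019-04-28", 6), ("JIM", "2019-05-12", 6), ("JIM", "2019-05-13", 10)],
)

# Assumed helper content: one row per day for three Sunday-to-Saturday weeks.
first_sunday = date(2019, 4, 28)
helper_rows = []
for offset in range(21):
    day = first_sunday + timedelta(days=offset)
    week_start = first_sunday + timedelta(days=7 * (offset // 7))
    week_end = week_start + timedelta(days=6)
    helper_rows.append((day.isoformat(), week_start.isoformat(), week_end.isoformat()))
conn.executemany("INSERT INTO date_helper_table VALUES (?, ?, ?)", helper_rows)

weekly = conn.execute("""
SELECT u.user_id, dht.start_of_week, dht.end_of_week,
       COALESCE(SUM(tl.hours), 0) AS hours      -- 0 instead of NULL for empty weeks
FROM date_helper_table dht
CROSS JOIN (SELECT 'JIM' AS user_id) u          -- every week is paired with the user
LEFT JOIN task_log tl
       ON tl.user_id = u.user_id
      AND tl.date_entered = dht.date_value
GROUP BY u.user_id, dht.start_of_week, dht.end_of_week
ORDER BY dht.start_of_week
""").fetchall()

for row in weekly:
    print(row)
```

Since user_id comes from the cross-joined side and not from task_log, the zero-hour week still carries the user's name.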
SQL Fiddle
http://sqlfiddle.com/#!18/02d43/65
Using your SQL fiddle, I simply updated the select statement to account for and convert null values. As far as I can tell, there is nothing in your post that makes this option not viable. Please let me know if this is not the case and I will update. (This is not intended to detract from sticky bit's answer, but to offer an alternative)
SELECT ISNULL(HRS.USER_ID, '') as [USER_ID]
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(ISNULL(HOURS,0)) as [SUM]
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
Create a dates table that includes all dates for the next 100 years in the first column, and the week of the year, day of the month, etc. in the following columns.
Then select from that dates table and left join everything else. Use the ISNULL function to replace nulls with zeros.

How to return average of each condition in SQL

I have been struggling with this SQL question for several days. I am quite new to SQL and would really appreciate your time and effort.
Q: Return the average arrival delay time for each day of the week.
Expected results:
+--------------+---------------+
| weekday_name | avg_delay |
+--------------+---------------+
| Friday | 14.4520127056 |
| Monday | 10.5375015249 |
| Thursday | 8.47985564693 |
| Wednesday | 8.4561902339 |
| Saturday | 7.54455459234 |
| Tuesday | 4.63152453983 |
| Sunday | 4.21165978081 |
+--------------+---------------+
I already have two tables ready: flight_delays and weekdayName
My SQL code in an IPython notebook:
SELECT distinct w.weekday_name, AVG(f.arr_delay) as average_delay
FROM flight_delays as f, weekdayName as w
WHERE f.day_of_week = w.dayofweek and w.dayofweek <= 7
ORDER BY AVG(arr_delay)
It only returns:
weekday_name average_delay
Sunday 8.295147670495197
So it actually averages the results of all seven days together. But I want the average for each day. Could you please explain where my mistake is? Thanks a lot.
First, learn to use proper JOIN and GROUP BY syntax. Also, I don't think the condition "and w.dayofweek <= 7" is needed.
Does this do what you want?
SELECT w.weekday_name, AVG(f.arr_delay) as average_delay
FROM flight_delays f join
weekdayName w
on f.day_of_week = w.dayofweek
GROUP BY w.weekday_name
ORDER BY AVG(arr_delay) DESC
Made some adjustments:
SELECT w.weekday_name, AVG(f.arr_delay) as average_delay
FROM flight_delays f INNER JOIN weekdayName w
ON f.day_of_week = w.dayofweek
GROUP BY w.weekday_name
ORDER BY AVG(arr_delay) DESC;
If you are doing aggregation (here AVG) in your SELECT, you need to list the non-aggregated fields in the GROUP BY.
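The effect of the missing GROUP BY can be reproduced with a small sketch in SQLite through Python's sqlite3. The table and column names follow the question; the delay values are invented. SQLite happens to accept the ungrouped form and collapses everything into a single row, which matches the symptom described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_delays (day_of_week INTEGER, arr_delay REAL);
CREATE TABLE weekdayName   (dayofweek INTEGER, weekday_name TEXT);
-- Invented sample data for two weekdays.
INSERT INTO flight_delays VALUES (1, 10.0), (1, 20.0), (2, 4.0), (2, 6.0);
INSERT INTO weekdayName   VALUES (1, 'Monday'), (2, 'Tuesday');
""")

# Without GROUP BY the aggregate runs over all joined rows at once.
ungrouped = conn.execute("""
SELECT w.weekday_name, AVG(f.arr_delay) AS average_delay
FROM flight_delays f
JOIN weekdayName w ON f.day_of_week = w.dayofweek
""").fetchall()

# With GROUP BY there is one average per weekday.
grouped = conn.execute("""
SELECT w.weekday_name, AVG(f.arr_delay) AS average_delay
FROM flight_delays f
JOIN weekdayName w ON f.day_of_week = w.dayofweek
GROUP BY w.weekday_name
ORDER BY average_delay DESC
""").fetchall()

print(ungrouped)
print(grouped)
```

In the ungrouped result the weekday name is just whichever row the engine picked, so only the average in that row is meaningful.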

SQL Where Query to Return Distinct Values

I have an app that has a built-in initial SELECT and only lets me enter the WHERE section. I have rows with duplicate values. I'm trying to get just one record for each distinct value, but am unsure how to get the statement to work. I've found one that almost does the trick, but it doesn't return any of the rows that had a duplicate. I assume that is due to the = in the HAVING clause, so I just need a way to get one row for each value that matches my WHERE criteria. Examples below.
Initial Data Set
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
2/2/2015 | John | | 00000.0001 | 70
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | | 00000.0022 | 90
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Jane | 5555051005 | 00000.0050 | 60
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
What I Want the Query to Return
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
Here is what I was able to get, but when I run it I don't get the rows that had duplicate callindex values. Duration doesn't matter, and the durations never match up, so if it helps to use that as a filter, that would be fine. I've added mock data to assist.
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and callindex in
(SELECT callindex
FROM table
GROUP BY callindex
HAVING COUNT(callindex) = 1)
Any help would be greatly appreciated.
OK, with the assistance of everyone here I was able to get the query to work perfectly within SQL. That said, the app I'm trying this on apparently has a built-in character limit, and the query below is too long. This is the query I have to use given the restrictions, and I have to be able to search both IDs at the same time, because some calls get stamped with one or the other, rarely both. I'm hoping someone might be able to help me shorten it?
use Database
select * from tblCall
WHERE
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and fldAgentLoginID='1234'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
or
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and fldPhoneLoginID='56789'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
If the constraint is that we can only add to the WHERE clause, I don't think it's possible, due to there being 2 absolutely identical rows:
4/15/2015 | Jane | 5555051005 | 00000.0050
4/15/2015 | Jane | 5555051005 | 00000.0050
Is it possible that you can add HAVING or GROUP BY to the WHERE? or possibly UNION the SELECT to another SELECT statement? That may open up some additional possibilities.
Maybe with a UNION:
SELECT Date, Name, ANI, CallIndex
FROM table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 )
UNION
SELECT Date, Name, ANI, CallIndex
FROM table
WHERE Name not in (SELECT name from table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 ))
From your sample, it seems like you could just exclude rows in which there was no value in the ANI column. If that is the case you could simply do:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and ANI is not null
If this doesn't work for you, let me know and I can see what else I can do.
Edit:
You've made it sound like the CallIndex combined with the Duration is a unique value. That seems somewhat doubtful to me, but if that is the case you could do something like this:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and cast(callindex as varchar(80))+'-'+cast(duration as varchar(80)) in
(SELECT cast(callindex as varchar(80))+'-'+cast(min(duration) as varchar(80))
FROM table
GROUP BY callindex)
There are two keywords you can use to get non-duplicated data, either DISTINCT or GROUP BY. In this case, I would use a GROUP BY, but you should read up on both.
This query groups all of the records by CallIndex and takes the MAX value for each of the other columns and should give you the results you want:
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
EDIT
Since you can't use GROUP BY directly but you can have any SQL in the WHERE clause you can do:
SELECT *
FROM table
WHERE EXISTS
(
SELECT *
FROM
(
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
) derv
WHERE table.Date = derv.Date
AND table.Name = derv.Name
AND table.ANI = derv.ANI
AND table.CallIndex = derv.CallIndex
)
This selects all rows from the table where there exists a matching row from the GROUP BY.
It won't be perfect: if any two rows match exactly, you'll still have duplicates, but that's the best you'll get with your restriction.
In your data, why not just do this?
SELECT *
FROM table
WHERE Date >= '2015-04-15' and Date < '2015-04-16'
and ani is not null;
If the blank values are only a coincidence, then you have a problem just using a where clause. If the results are full duplicates (no column has a different value), then you probably cannot do what you want with just a where clause -- unless you are using SQLite, Oracle, or Postgres.
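The answer does not say why those three databases are the exception. One plausible reading, which is an assumption here, is that they expose a per-row identifier (rowid in SQLite and Oracle, ctid in Postgres) that can tell otherwise identical rows apart. A minimal SQLite sketch of that idea, run through Python's sqlite3 with a hypothetical calls table and two fully identical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (call_date TEXT, name TEXT, ani TEXT, callindex TEXT);
-- Hypothetical rows: the two 'Jane' rows are identical in every column.
INSERT INTO calls VALUES
  ('2015-04-15', 'Jane',  '5555051005', '00000.0050'),
  ('2015-04-15', 'Jane',  '5555051005', '00000.0050'),
  ('2015-04-15', 'Kevin', '5555051006', '00000.0061');
""")

# Only the WHERE clause does the de-duplication: keep the row with the
# smallest rowid out of each group of identical rows.
deduped = conn.execute("""
SELECT call_date, name, ani, callindex
FROM calls
WHERE rowid IN (SELECT MIN(rowid)
                FROM calls
                GROUP BY call_date, name, ani, callindex)
ORDER BY name
""").fetchall()

for row in deduped:
    print(row)
```

This relies on SQLite's implicit rowid, which exists for ordinary tables but not for tables declared WITHOUT ROWID.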

Querying data for current day and only the most recent entry

I'm new to Informix and am trying to figure out some of the syntax. I have a table that stores agent status information. I only want to pull two things from the table:
1 - The current day's rows
2 - Only the most recent entry for each agent
So, a query of
select limit 5 agentid, eventdatetime from agentstatedetail order by eventdatetime desc
will yield:
+------------+------------------------+
| agentid | eventdatetime |
+------------+------------------------+
| 1552 | 2013-12-04 16:48:20.122|
| 1482 | 2013-12-04 16:48:18.897|
| 1439 | 2013-12-04 16:48:17.754|
| 1188 | 2013-12-04 16:48:15.972|
| 788 | 2013-12-04 16:48:15.190|
+------------+------------------------+
The Informix syntax seems a bit different from mysql. How can I pull this kind of information? I tried using the "today" modifier, but it doesn't work the way I thought it would.
You could get the max eventdatetime for each agentid that has a row for today with:
SELECT agentid,max(eventdatetime)
FROM agentstatedetail
WHERE date(eventdatetime) = TODAY
GROUP BY agentid;
P.S. To get the same effect as LIMIT 5 in Informix, use SELECT FIRST 5 ...
Hope this works:
select A.agentid, A.eventdatetime
from agentstatedetail A
where
date(A.eventdatetime) = TODAY
and A.eventdatetime = ( select MAX(B.eventdatetime) from agentstatedetail B where A.agentid = B.agentid)
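The "latest row per agent for one day" pattern can be tried out with a small sketch in SQLite through Python's sqlite3. This is not Informix syntax: a fixed date parameter stands in for TODAY so the result is reproducible, and the rows are invented (the agent IDs are borrowed from the sample output in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agentstatedetail (agentid INTEGER, eventdatetime TEXT)")
conn.executemany(
    "INSERT INTO agentstatedetail VALUES (?, ?)",
    [
        (1552, "2013-12-04 16:48:20.122"),
        (1552, "2013-12-04 09:00:00.000"),   # earlier event, same day
        (1482, "2013-12-04 16:48:18.897"),
        (1482, "2013-12-03 23:59:59.000"),   # previous day
        (788,  "2013-12-03 10:00:00.000"),   # no event on the chosen day
    ],
)

today = "2013-12-04"  # stand-in for Informix TODAY

latest_today = conn.execute(
    """
    SELECT agentid, MAX(eventdatetime) AS last_event
    FROM agentstatedetail
    WHERE date(eventdatetime) = ?        -- keep only the chosen day's rows
    GROUP BY agentid                     -- one row per agent
    ORDER BY agentid
    """,
    (today,),
).fetchall()

for row in latest_today:
    print(row)
```

Agents without any event on the chosen day (788 in this data) do not appear at all, which is the same behaviour as the GROUP BY answer above.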