how to exclude the most recent null field from query result? - sql

I want to design a query to find out whether there is at least one cat (select count(*) where rownum = 1) that hasn't been checked out.
One odd condition is that one of the cats that hasn't checked out should be excluded from the result, as marked below:
TABLE schedule
-------------------------------------
| type | checkin | checkout
-------------------------------------
| cat | 20:10 | (null)
| dog | 19:35 | (null)
| dog | 19:35 | (null)
| cat | 15:31 | (null) ----> exclude this cat in this scenario
| dog | 12:47 | 13:17
| dog | 10:12 | 12:45
| cat | 08:27 | 11:36
The query should then return 1, based on this remaining record:
| cat | 20:10 | (null)
I have sort of created the query, like:
select * from schedule where type = 'cat' and checkout is null order by checkin desc
however, this query does not handle the exclusion. I could certainly handle it in the service layer (e.g. in Java), but I'm wondering whether any solution can be designed in the query itself, with good performance when there is a large amount of data in the table (checkin and checkout are indexed, but type is not).

How about this?
Select *
From schedule
Where type='cat' and checkin=(select max(checkin) from schedule where type='cat' and checkout is null);

Assuming the checkin and checkout data type is string (which it shouldn't be; it should be DATE), to_date(checkin, 'hh24:mi') will create a value of the proper data type, DATE, assuming the first day of the current month as the "date" portion. That shouldn't matter to you, since presumably all the times are from the same date. If in fact checkin/checkout are already of the proper DATE data type, you don't need the to_date() calls (in two places, including the order by).
I left out the checkout column from the output, since you are only looking for the rows with null in that column, so including it would provide no information. I would have left out type as well, but perhaps you'll want to have this for cats AND dogs at some later time...
with
schedule( type, checkin, checkout ) as (
select 'cat', '20:10', null from dual union all
select 'dog', '19:35', null from dual union all
select 'dog', '19:35', null from dual union all
select 'cat', '15:31', null from dual union all
select 'dog', '12:47', '13:17' from dual union all
select 'dog', '10:12', '12:45' from dual union all
select 'cat', '08:27', '11:36' from dual
)
-- end of test data; actual solution (SQL query) begins below this line
select type, checkin
from ( select type, checkin,
row_number() over (order by to_date(checkin, 'hh24:mi')) as rn
from schedule
where type = 'cat' and checkout is null
)
where rn > 1
order by to_date(checkin, 'hh24:mi') -- ORDER BY is optional
;
TYPE CHECKIN
---- -------
cat 20:10
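As a quick, runnable check of the row_number() exclusion, here is the same idea in SQLite (which also supports window functions as of 3.25), driven from Python's sqlite3. No to_date() conversion is needed here because 'HH24:MI' strings sort correctly as plain text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (type TEXT, checkin TEXT, checkout TEXT);
INSERT INTO schedule VALUES
  ('cat', '20:10', NULL),
  ('dog', '19:35', NULL),
  ('dog', '19:35', NULL),
  ('cat', '15:31', NULL),
  ('dog', '12:47', '13:17'),
  ('dog', '10:12', '12:45'),
  ('cat', '08:27', '11:36');
""")

# Number the unchecked cats by checkin time (ascending) and keep rn > 1,
# i.e. drop the earliest cat that has not checked out.
rows = conn.execute("""
SELECT type, checkin
FROM (
  SELECT type, checkin,
         ROW_NUMBER() OVER (ORDER BY checkin) AS rn
  FROM schedule
  WHERE type = 'cat' AND checkout IS NULL
)
WHERE rn > 1
ORDER BY checkin
""").fetchall()
print(rows)  # [('cat', '20:10')]
```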

Related

How to associate date events together by date

I'm working on writing a query to organize install and removal dates for car part numbers. I want to find a record of all car part installs, and removals of the same part if they have been removed from a vehicle, identified by its VIN. I'm having trouble associating these events together because the only thing tying them together is the dates. Removals must occur after installs, and another install cannot occur for the same part unless it has been removed first.
I have been able to summarize the data into separate rows by event type (e.g. each install has its own row and each removal has its own row).
What I've tried is using DECODE() by event type, but it keeps the records in separate rows. Maybe there's something COALESCE() can do here, but I'm not sure.
Here's a summary of how the data looks:
part_no | serial_no | car_vin | event_type | event_date
12345 | a1b2c3 | 9876543 | INSTALL | 01-JAN-2019
12345 | a1b2c3 | 9876543 | REMOVE | 01-AUG-2019
54321 | t3c4a8 | 9876543 | INSTALL | 01-MAR-2019
12345 | a1b2c3 | 3456789 | INSTALL | 01-SEP-2019
And here's what the expected outcome is:
part_no | serial_no | car_vin | install_date | remove_date
12345 | a1b2c3 | 9876543 | 01-JAN-2019 | 01-AUG-2019
12345 | a1b2c3 | 3456789 | 01-SEP-2019 |
54321 | t3c4a8 | 9876543 | 01-MAR-2019 |
We can use pivoting logic here:
SELECT
part_no,
serial_no,
car_vin,
MAX(CASE WHEN event_type = 'INSTALL' THEN event_date END) AS install_date,
MAX(CASE WHEN event_type = 'REMOVE' THEN event_date END) AS remove_date
FROM yourTable
GROUP BY
part_no,
serial_no,
car_vin
ORDER BY
part_no;
Demo
This approach is a typical way to transform a key-value store table (which is basically what your table is) into the output you want to see.
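The conditional-aggregation pivot runs unchanged on most engines; here it is verified against the sample data with SQLite via Python's sqlite3 (the table name events stands in for yourTable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (part_no INTEGER, serial_no TEXT, car_vin INTEGER,
                     event_type TEXT, event_date TEXT);
INSERT INTO events VALUES
  (12345, 'a1b2c3', 9876543, 'INSTALL', '2019-01-01'),
  (12345, 'a1b2c3', 9876543, 'REMOVE',  '2019-08-01'),
  (54321, 't3c4a8', 9876543, 'INSTALL', '2019-03-01'),
  (12345, 'a1b2c3', 3456789, 'INSTALL', '2019-09-01');
""")

# MAX(CASE ...) folds each group's INSTALL/REMOVE rows into a single row;
# groups with no REMOVE event get NULL for remove_date.
rows = conn.execute("""
SELECT part_no, serial_no, car_vin,
       MAX(CASE WHEN event_type = 'INSTALL' THEN event_date END) AS install_date,
       MAX(CASE WHEN event_type = 'REMOVE'  THEN event_date END) AS remove_date
FROM events
GROUP BY part_no, serial_no, car_vin
ORDER BY part_no, install_date
""").fetchall()
for r in rows:
    print(r)
```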
You can use the SQL for Pattern Matching (MATCH_RECOGNIZE):
WITH t(part_no,serial_no,car_vin,event_type,event_date) AS
(SELECT 12345, 'a1b2c3', 9876543, 'INSTALL', DATE '2019-01-01' FROM dual
UNION ALL SELECT 12345, 'a1b2c3', 9876543, 'REMOVE', DATE '2019-08-01' FROM dual
UNION ALL SELECT 54321, 't3c4a8', 9876543, 'INSTALL', DATE '2019-03-01' FROM dual
UNION ALL SELECT 12345, 'a1b2c3', 3456789, 'INSTALL', DATE '2019-09-01' FROM dual)
SELECT part_no,serial_no,car_vin, INSTALL_DATE, REMOVE_DATE
FROM t
MATCH_RECOGNIZE (
PARTITION BY part_no,serial_no,car_vin
ORDER BY event_date
MEASURES
FINAL MAX(REMOVE.event_date) AS REMOVE_DATE,
FINAL MAX(INSTALL.event_date) AS INSTALL_DATE
PATTERN ( INSTALL REMOVE? )
DEFINE
REMOVE AS event_type = 'REMOVE',
INSTALL AS event_type = 'INSTALL'
)
ORDER BY part_no, INSTALL_DATE, REMOVE_DATE;
+--------------------------------------------------+
|PART_NO|SERIAL_NO|CAR_VIN|INSTALL_DATE|REMOVE_DATE|
+--------------------------------------------------+
|12345 |a1b2c3 |9876543|01.01.2019 |01.08.2019 |
|12345 |a1b2c3 |3456789|01.09.2019 | |
|54321 |t3c4a8 |9876543|01.03.2019 | |
+--------------------------------------------------+
The key clause here is PATTERN ( INSTALL REMOVE? ). It means, you have exactly one INSTALL event followed by zero or one REMOVE event.
If you can have more than just one INSTALL event then use PATTERN ( INSTALL+ REMOVE? )
If you can have more than just one INSTALL event and optionally more than one REMOVE event then use PATTERN ( INSTALL+ REMOVE* )
You can simply add more events, e.g. ORDER, DISPOSAL, etc.
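MATCH_RECOGNIZE is Oracle 12c+ only. For readers without access to it, here is a rough plain-Python emulation of what PATTERN ( INSTALL REMOVE? ) does per partition — a sketch of the matching logic, not a substitute for the real clause:

```python
from itertools import groupby

# Change-log rows: (part_no, serial_no, car_vin, event_type, event_date)
rows = [
    (12345, 'a1b2c3', 9876543, 'INSTALL', '2019-01-01'),
    (12345, 'a1b2c3', 9876543, 'REMOVE',  '2019-08-01'),
    (54321, 't3c4a8', 9876543, 'INSTALL', '2019-03-01'),
    (12345, 'a1b2c3', 3456789, 'INSTALL', '2019-09-01'),
]

def match_install_remove(rows):
    """Emulate PARTITION BY (part, serial, vin) ORDER BY event_date,
    PATTERN ( INSTALL REMOVE? )."""
    key = lambda r: r[:3]
    out = []
    for part_key, grp in groupby(sorted(rows, key=lambda r: (r[:3], r[4])), key=key):
        events = list(grp)
        i = 0
        while i < len(events):
            if events[i][3] != 'INSTALL':       # a match must start with INSTALL
                i += 1
                continue
            install_date, remove_date = events[i][4], None
            if i + 1 < len(events) and events[i + 1][3] == 'REMOVE':
                remove_date = events[i + 1][4]  # optional REMOVE consumes one row
                i += 1
            out.append(part_key + (install_date, remove_date))
            i += 1
    return out

for row in sorted(match_install_remove(rows)):
    print(row)
```

Extending the pattern to INSTALL+ or REMOVE* would mean looping over consecutive events of the same type instead of consuming exactly one.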

SQL query to turn change log into intervals

So let me describe the problem:
-I have a task table with an assignee column, a created column and a resolved column
(both created and resolved are timestamps)
+---------+----------+------------+------------+
| task_id | assignee | created | resolved |
+---------+----------+------------+------------+
| tsk1 | him | 2000-01-01 | 2018-01-03 |
+---------+----------+------------+------------+
-I have a change log table with a task_id, a from column, a to column and a date column that records each time the assignee is changed
+---------+----------+------------+------------+
| task_id | from | to | date |
+---------+----------+------------+------------+
| tsk1 | me | you | 2017-04-06 |
+---------+----------+------------+------------+
| tsk1 | you | him | 2017-04-08 |
+---------+----------+------------+------------+
I want to select a table that shows a list of all the assignees that worked on a task within an interval
+---------+----------+------------+------------+
| task_id | assignee | from | to |
+---------+----------+------------+------------+
| tsk1 | me | 2000-01-01 | 2017-04-06 |
+---------+----------+------------+------------+
| tsk1 | you | 2017-04-06 | 2017-04-08 |
+---------+----------+------------+------------+
| tsk1 | him | 2017-04-08 | 2018-01-03 |
+---------+----------+------------+------------+
I'm having trouble with the first(/last) row, where the from(/to) should be set as created(/resolved), I don't know how to make a column with data from two different tables...
I've tried making them in their own select and then merging all rows with union, but I don't think this is a very good solution...
Hmmm . . . This is trickier than it seems. The idea is to use lead() to get the next date, but you need to "augment" the data with information from the tasks table:
select task_id, to, date as fromdate,
coalesce(lead(date) over (partition by task_id order by date),
max(resolved) over (partition by task_id)
) as todate
from ((select task_id, to, date, null::timestamp as resolved
from log l
) union all
(select distinct on (t.task_id) t.task_id, l.from, t.created, t.resolved
from task t join
log l
on t.task_id = l.task_id
order by t.task_id, l.date
)
) t;
demo:db<>fiddle
SELECT
l.task_id,
assignee_from as assignee,
COALESCE(
lag(assign_date) OVER (ORDER BY assign_date),
created
) as date_from,
assign_date as date_to
FROM
log l
JOIN
task t
ON l.task_id = t.task_id
UNION ALL
SELECT * FROM (
SELECT DISTINCT ON (l.task_id)
l.task_id, assignee_to, assign_date, resolved
FROM
log l
JOIN
task t
ON l.task_id = t.task_id
ORDER BY l.task_id, assign_date DESC
) s
ORDER BY task_id, date_from
The UNION consists of two parts: the part from the log and, finally, the last row from the task table.
The first part uses the LAG() window function to get the date of the row previous to the current one. Because "me" has no previous row, that would result in a NULL value, which is caught by falling back to the created date from the task table.
The second part gets the last row: here I take the last row of the log using DISTINCT ON and ORDER BY assign_date DESC, which gives the last assignee_to. The rest is similar to the first part: the resolved value comes from the task table.
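For readers outside Postgres, the interval construction can be sketched in SQLite via Python's sqlite3. The columns are renamed here (from/to/date are reserved words), and DISTINCT ON, which SQLite lacks, is replaced by a ROW_NUMBER() filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (task_id TEXT, assignee TEXT, created TEXT, resolved TEXT);
CREATE TABLE changelog (task_id TEXT, from_assignee TEXT,
                        to_assignee TEXT, change_date TEXT);
INSERT INTO task VALUES ('tsk1', 'him', '2000-01-01', '2018-01-03');
INSERT INTO changelog VALUES
  ('tsk1', 'me',  'you', '2017-04-06'),
  ('tsk1', 'you', 'him', '2017-04-08');
""")

rows = conn.execute("""
-- one interval per change-log row: previous change (or created) -> this change
SELECT l.task_id,
       l.from_assignee AS assignee,
       COALESCE(LAG(l.change_date) OVER (PARTITION BY l.task_id
                                         ORDER BY l.change_date),
                t.created) AS date_from,
       l.change_date AS date_to
FROM changelog l JOIN task t ON l.task_id = t.task_id
UNION ALL
-- plus one closing interval: last change -> resolved
SELECT task_id, to_assignee, change_date, resolved
FROM (
  SELECT l.task_id, l.to_assignee, l.change_date, t.resolved,
         ROW_NUMBER() OVER (PARTITION BY l.task_id
                            ORDER BY l.change_date DESC) AS rn
  FROM changelog l JOIN task t ON l.task_id = t.task_id
)
WHERE rn = 1
ORDER BY task_id, date_from
""").fetchall()
for r in rows:
    print(r)
```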
Thanks to the answer from S-Man and Gordon Linoff, I was able to come up with this solution:
SELECT t.task_id,
t.item_from AS assignee,
COALESCE(lag(t.changelog_created) OVER (
PARTITION BY t.task_id ORDER BY t.changelog_created),
max(t.creationdate) OVER (PARTITION BY t.task_id)) AS fromdate,
t.changelog_created as todate
FROM ( SELECT ch.task_id,
ch.item_from,
ch.changelog_created,
NULL::timestamp without time zone AS creationdate
FROM changelog_generic_expanded_view ch
WHERE ch.field::text = 'assignee'::text
UNION ALL
( SELECT DISTINCT ON (t_1.id_task) t_1.id_task,
t_1.assigneekey,
t_1.resolutiondate,
t_1.creationdate
FROM task_jira t_1
ORDER BY t_1.id_task)) t;
Note: this is the final version so the names are a bit different, but the idea stays the same.
This is basically the same code as Gordon Linoff's, but I go through the changelog in the opposite direction.
I use the 2nd part of the UNION ALL to generate the last assignee instead of the first (this handles the case where there is no changelog at all: the last assignee is then generated without involving the changelog).

Joining two into one SSIS

Does anyone know how to work this out?
I'm a newbie with SSIS. I have a derived column with WomenID, MenID, Date, and Status.
The thing is that I need to "join" WomenID and MenID into one column (IDs), keeping the date and status. For example:
WomenID| MenID| Date | Status
123 | 345 | 20160819 | M
768 | 762 | 19870830 | S
and need to turn it into
ID |Date |Status
123 |20160819 | M
768 |19870830 | S
345 |20160819 | M
762 |19870830 | S
I know that this is a trivial question, but I can't see the light with this one.
One option uses a UNION:
SELECT WomenID AS ID, Date, Status FROM yourTable
UNION ALL
SELECT MenID, Date, Status FROM yourTable
If you want the exact ordering which you are showing us, we need to do more work. A computed column is one way to go:
WITH cte AS (
SELECT WomenID AS ID, Date, Status, 0 AS position FROM yourTable
UNION ALL
SELECT MenID, Date, Status, 1 FROM yourTable
)
SELECT ID, Date, Status
FROM cte
ORDER BY position, Status;
Demo
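A runnable sketch of the CTE-with-position approach, using SQLite via Python's sqlite3 (the table name couples is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE couples (WomenID INTEGER, MenID INTEGER, Date TEXT, Status TEXT);
INSERT INTO couples VALUES
  (123, 345, '20160819', 'M'),
  (768, 762, '19870830', 'S');
""")

# Unpivot the two ID columns into one; the position flag reproduces
# the "all women first, then all men" ordering from the question.
rows = conn.execute("""
WITH cte AS (
  SELECT WomenID AS ID, Date, Status, 0 AS position FROM couples
  UNION ALL
  SELECT MenID, Date, Status, 1 FROM couples
)
SELECT ID, Date, Status
FROM cte
ORDER BY position, Status
""").fetchall()
print(rows)
# [(123, '20160819', 'M'), (768, '19870830', 'S'),
#  (345, '20160819', 'M'), (762, '19870830', 'S')]
```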
This should help:
select womenid as Id, Date, status from yourTable where status = 'F'
union
select menid as Id, Date, status from yourTable where status = 'M'
Hope it helps.

How to handle multiple associated log messages?

I have a service that calculates a bunch of things for a project. A user can trigger this calculation multiple times a day. Every calculation generates a few interesting metrics (let's call them A, B, C).
I report these metrics to a log service with individual log messages. The log messages look like this:
date | calculationID1 | projectID1 | metricA | valueA
date | calculationID1 | projectID1 | metricB | valueB
date | calculationID1 | projectID1 | metricC | valueC
date | calculationID2 | projectID2 | metricA | valueA
date | calculationID2 | projectID2 | metricB | valueB
date | calculationID2 | projectID2 | metricC | valueC
date | calculationID3 | projectID1 | metricA | valueA
date | calculationID3 | projectID1 | metricB | valueB
date | calculationID3 | projectID1 | metricC | valueC
In this example the project with ID 1 was run two times on this particular day. In my analytics backend I have a Hive cluster to analyze this data, and I want to generate a table with the last reported metrics for every project for a given day:
date | calculationID3 | projectID1 | valueA | valueB | valueC
date | calculationID2 | projectID2 | valueA | valueB | valueC
Apparently this calculation is really costly as I do a lot of joins. My company has a strict logging format and that's why I created one value per log message. Should I create one log message containing all metrics instead to ease the reporting?
Can anyone point me to best practices for these kind of problems?
If we use a DB that supports the PIVOT clause in SQL, we can gather the data from the log report using the following query.
The same results can be fetched without PIVOT, but the alternative requires a lot of copy-paste and juggling, and since you are "pragmatic with implementation", I suppose we don't need to speak about those dirty things.
To see what's happening in the query, you can do 3 steps:
run the query without PIVOT (just remove the PIVOT keyword and everything that follows it)
then run it as is
compare the results of the first and second steps, recognizing how the rows are transposed into columns
WITH
data_table (stamp, calculation_ID, project_ID, metric_name, metric_value) as ( select
timestamp '2015-01-01 00:00:01', 'calc_ID_1', 'project_WHITE', 'metric_A', 11 from dual union all select
timestamp '2015-01-01 00:00:02', 'calc_ID_1', 'project_WHITE', 'metric_B', 21 from dual union all select
timestamp '2015-01-01 00:00:03', 'calc_ID_1', 'project_WHITE', 'metric_C', 31 from dual union all select
timestamp '2015-01-01 00:01:04', 'calc_ID_2', 'project_WHITE', 'metric_A', 12 from dual union all select
timestamp '2015-01-01 00:01:05', 'calc_ID_2', 'project_WHITE', 'metric_B', 22 from dual union all select
timestamp '2015-01-01 00:01:06', 'calc_ID_2', 'project_WHITE', 'metric_C', 32 from dual union all select
timestamp '2015-01-01 00:00:11', 'calc_ID_3', 'project_BLACK', 'metric_A', 41 from dual union all select
timestamp '2015-01-01 00:00:12', 'calc_ID_3', 'project_BLACK', 'metric_B', 51 from dual union all select
timestamp '2015-01-01 00:00:13', 'calc_ID_3', 'project_BLACK', 'metric_C', 61 from dual union all select
timestamp '2015-01-01 00:01:14', 'calc_ID_4', 'project_BLACK', 'metric_A', 42 from dual union all select
timestamp '2015-01-01 00:01:15', 'calc_ID_4', 'project_BLACK', 'metric_B', 52 from dual union all select
timestamp '2015-01-01 00:01:16', 'calc_ID_4', 'project_BLACK', 'metric_C', 62 from dual
)
SELECT *
FROM (
select trunc(stamp) AS day,
calculation_id,
project_id,
metric_name,
metric_value
from (
select t.*,
rank() OVER (PARTITION BY project_ID, metric_name, trunc(stamp) ORDER BY stamp DESC) calculation_rank
from data_table t
-- take only the last log row for (project_ID, metric_name) for every given day
) where calculation_rank = 1
)
PIVOT (
-- aggregate function is required here,
-- and SUM can be replaced with something more relevant to custom logic
SUM(metric_value)
FOR
metric_name IN ('metric_A' AS "Metric A",
'metric_B' AS "Metric B",
'metric_C' AS "Metric C")
);
Results:
DAY | CALCULATION_ID | PROJECT_ID | Metric A | Metric B | Metric C
------------------------------------------------------------------------------
2015-01-01 | calc_ID_4 | project_BLACK | 42 | 52 | 62
2015-01-01 | calc_ID_2 | project_WHITE | 12 | 22 | 32
In this query calculation_ID is redundant (I use it only to make the example clearer for the reader). But you can still use this information to check the integrity of the logging data format, by exploring whether equal calculation_IDs correspond to metrics reported in the same group/time period.
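SQLite has no PIVOT clause, but the same rank-then-pivot shape can be reproduced with conditional aggregation. A trimmed-down, runnable sketch via Python's sqlite3 (two metrics, two projects, with the column and table names from the test data above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics (stamp TEXT, calc_id TEXT, project_id TEXT,
                      metric_name TEXT, metric_value INTEGER);
INSERT INTO metrics VALUES
  ('2015-01-01 00:00:01', 'calc_ID_1', 'project_WHITE', 'metric_A', 11),
  ('2015-01-01 00:00:02', 'calc_ID_1', 'project_WHITE', 'metric_B', 21),
  ('2015-01-01 00:01:04', 'calc_ID_2', 'project_WHITE', 'metric_A', 12),
  ('2015-01-01 00:01:05', 'calc_ID_2', 'project_WHITE', 'metric_B', 22),
  ('2015-01-01 00:00:11', 'calc_ID_3', 'project_BLACK', 'metric_A', 41),
  ('2015-01-01 00:00:12', 'calc_ID_3', 'project_BLACK', 'metric_B', 51);
""")

# Step 1 (inner query): keep only the latest row per (project, metric, day).
# Step 2 (outer query): pivot the surviving metric rows into columns.
rows = conn.execute("""
SELECT day, MAX(calc_id) AS calc_id, project_id,
       MAX(CASE WHEN metric_name = 'metric_A' THEN metric_value END) AS metric_a,
       MAX(CASE WHEN metric_name = 'metric_B' THEN metric_value END) AS metric_b
FROM (
  SELECT date(stamp) AS day, calc_id, project_id, metric_name, metric_value,
         RANK() OVER (PARTITION BY project_id, metric_name, date(stamp)
                      ORDER BY stamp DESC) AS calculation_rank
  FROM metrics
)
WHERE calculation_rank = 1
GROUP BY day, project_id
ORDER BY project_id
""").fetchall()
for r in rows:
    print(r)
```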

SQL Where Query to Return Distinct Values

I have an app that has a built-in initial Select and only allows me to enter the Where section. I have rows with duplicate values, and I'm trying to get just one record for each distinct value, but I'm unsure how to get the statement to work. I've found one that almost does the trick, but it doesn't give me any rows that had a dup (I assume due to the =), so I just need a way to get one row for each match of my where criteria. Examples below.
Initial Data Set
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
2/2/2015 | John | | 00000.0001 | 70
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | | 00000.0022 | 90
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Jane | 5555051005 | 00000.0050 | 60
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
What I Want the Query to Return
Date | Name | ANI | CallIndex | Duration
---------------------------------------------------------
2/2/2015 | John | 5555051000 | 00000.0001 | 60
3/1/2015 | Jim | 5555051001 | 00000.0012 | 80
3/4/2015 | Susan | 5555051002 | 00000.0022 | 30
4/10/2015 | April | 5555051003 | 00000.0030 | 35
4/11/2015 | Leon | 5555051004 | 00000.0035 | 10
4/15/2015 | Jane | 5555051005 | 00000.0050 | 20
4/15/2015 | Kevin | 5555051006 | 00000.0061 | 35
Here is what I was able to get, but when I run it I don't get the rows that had duplicate callindex values. Duration doesn't matter and the values never match up, so if it helps to query using that as a filter, that's fine. I've added mock data to assist.
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and callindex in
(SELECT callindex
FROM table
GROUP BY callindex
HAVING COUNT(callindex) = 1)
Any help would be greatly appreciated.
OK, with the assistance of everyone here I was able to get the query to work perfectly within SQL. That said, the app I'm trying this on apparently has a built-in character limit, and the below query is too long. This is the query I have to use given the restrictions, and I have to be able to search both IDs at the same time because some records get stamped with one or the other, rarely both. I'm hoping someone might be able to help me shorten it?
use Database
select * from tblCall
WHERE
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and fldAgentLoginID='1234'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
or
flddate between '4/15/15 00:00' and '4/15/15 23:59'
and fldPhoneLoginID='56789'
and fldcalldir='incoming'
and fldcalltype='external'
and EXISTS (SELECT * FROM (SELECT MAX(fldCallName) AS fldCallName, fldCallID FROM tblCall GROUP BY fldCallID) derv WHERE tblCall.fldCallName = derv.fldCallName AND tblCall.fldCallID = derv.fldCallID)
If the constraint is that we can only add to the WHERE clause, I don't think it's possible, due to there being 2 absolutely identical rows:
4/15/2015 | Jane | 5555051005 | 00000.0050
4/15/2015 | Jane | 5555051005 | 00000.0050
Is it possible that you can add HAVING or GROUP BY to the WHERE? or possibly UNION the SELECT to another SELECT statement? That may open up some additional possibilities.
Maybe with a union:
SELECT *
FROM table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 )
UNION
SELECT *
FROM table
WHERE Name not in (SELECT name from table
GROUP BY Date, Name, ANI, CallIndex
HAVING ( COUNT(*) > 1 ))
From your sample, it seems like you could just exclude rows in which there was no value in the ANI column. If that is the case you could simply do:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and ANI is not null
If this doesn't work for you, let me know and I can see what else I can do.
Edit:
You've made it sound like the CallIndex combined with the Duration is a unique value. That seems somewhat doubtful to me, but if that is the case you could do something like this:
use Database
SELECT * FROM table
WHERE Date between '4/15/15 00:00' and '4/15/15 23:59'
and cast(callindex as varchar(80))+'-'+cast(duration as varchar(80)) in
(SELECT cast(callindex as varchar(80))+'-'+cast(min(duration) as varchar(80))
FROM table
GROUP BY callindex)
There are two keywords you can use to get non-duplicated data, either DISTINCT or GROUP BY. In this case, I would use a GROUP BY, but you should read up on both.
This query groups all of the records by CallIndex and takes the MAX value for each of the other columns and should give you the results you want:
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
EDIT
Since you can't use GROUP BY directly but you can have any SQL in the WHERE clause you can do:
SELECT *
FROM table
WHERE EXISTS
(
SELECT *
FROM
(
SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
FROM table
GROUP BY CallIndex
) derv
WHERE table.Date = derv.Date
AND table.Name = derv.Name
AND table.ANI = derv.ANI
AND table.CallIndex = derv.CallIndex
)
This selects all rows from the table where there exists a matching row from the GROUP BY.
It won't be perfect, if any two rows match exactly, you'll still have duplicates, but that's the best you'll get with your restriction.
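Here is the EXISTS trick run against a cut-down version of the sample data, via Python's sqlite3 (an ORDER BY is added just to make the output deterministic). Note how the NULL-ANI duplicates drop out, because NULL never compares equal to MAX(ANI), while Jane's two rows both survive since they match on every compared column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (Date TEXT, Name TEXT, ANI TEXT,
                    CallIndex TEXT, Duration INTEGER);
INSERT INTO calls VALUES
  ('2015-02-02', 'John',  '5555051000', '00000.0001', 60),
  ('2015-02-02', 'John',  NULL,         '00000.0001', 70),
  ('2015-03-04', 'Susan', NULL,         '00000.0022', 90),
  ('2015-03-04', 'Susan', '5555051002', '00000.0022', 30),
  ('2015-04-15', 'Jane',  '5555051005', '00000.0050', 20),
  ('2015-04-15', 'Jane',  '5555051005', '00000.0050', 60);
""")

# Everything lives inside the WHERE clause, matching the app's
# "can only edit the WHERE section" restriction.
rows = conn.execute("""
SELECT * FROM calls
WHERE EXISTS (
  SELECT * FROM (
    SELECT MAX(Date) AS Date, MAX(Name) AS Name, MAX(ANI) AS ANI, CallIndex
    FROM calls GROUP BY CallIndex
  ) derv
  WHERE calls.Date = derv.Date
    AND calls.Name = derv.Name
    AND calls.ANI  = derv.ANI
    AND calls.CallIndex = derv.CallIndex
)
ORDER BY CallIndex, Duration
""").fetchall()
for r in rows:
    print(r)
```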
In your data, why not just do this?
SELECT *
FROM table
WHERE Date >= '2015-04-15' and Date < '2015-04-16'
and ani is not null;
If the blank values are only a coincidence, then you have a problem just using a where clause. If the results are full duplicates (no column has a different value), then you probably cannot do what you want with just a where clause -- unless you are using SQLite, Oracle, or Postgres.