Select only the MIN values in Oracle SQL

I want to select the earliest date and time from my data set and only show the row(s) that match it, with 3 columns.
I got it to show the data in the right order by date and time. How can I get it to show only the rows that have the minimum values? I tried using first, limit, and top x, but they don't work, and they aren't exactly what I need since more than 1 row may have the earliest value.
Here is my example sql:
Select report, date, time
From events
order by date, time

Try this:
SELECT report, date, time
FROM (SELECT report, date, time,
ROW_NUMBER() OVER(PARTITION BY report ORDER BY date ASC, time ASC) AS RowNum
From events
) CTE
WHERE CTE.RowNum = 1

Something like this should work assuming that every row has a validly formatted day and time component.
SELECT report,
dt,
time
FROM (SELECT report,
dt,
time,
rank() over (partition by report
order by to_date( dt || ' ' || time, 'MM/DD/YYYY HH24MI' ) asc) rnk
FROM events)
WHERE rnk = 1
From a data model standpoint, however, you should always store dates in DATE columns rather than in VARCHAR2 columns. Since you want date comparison and sorting semantics, you'll have to convert the data to a DATE at runtime, which is costly. And there is a good chance that someone will eventually store data in a different format in the column, or store an invalid string (e.g. a day of '02/29/2011'), which will cause your query to start generating errors.
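A quick illustration of the sorting and validation problem, sketched in Python rather than Oracle with made-up sample strings: 'MM/DD/YYYY' text sorts by month before year, while parsing the strings first (the role TO_DATE plays in the query above) restores chronological order and rejects an impossible date.

```python
from datetime import datetime

# Hypothetical values stored as 'MM/DD/YYYY' text, as in a VARCHAR2 column.
raw = ["01/01/2012", "12/31/2011", "02/15/2011"]

# Plain string sorting compares the month first, so the year is effectively ignored.
as_text = sorted(raw)

# Parsing first (the job TO_DATE does in SQL) gives true chronological order.
as_dates = sorted(raw, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))

# A string that is not a real date (2011 was not a leap year) fails to parse,
# which is the kind of runtime error described above.
try:
    datetime.strptime("02/29/2011", "%m/%d/%Y")
    rejected = False
except ValueError:
    rejected = True

print(as_text)   # text order
print(as_dates)  # date order
```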

Guessing a bit as the data types aren't clear, but something like this might work (example using a CTE to generate dummy data):
with events as (
select 'report1' as report, '01/01/2012' as date_field, '0800' as time_field
from dual
union all select 'report1', '01/01/2012', '0900' from dual
union all select 'report1', '01/02/2012', '0930' from dual
union all select 'report2', '01/01/2012', '0900' from dual
union all select 'report2', '01/01/2012', '0900' from dual
union all select 'report2', '01/01/2012', '1000' from dual
)
select report, date_field, time_field
from (
select report, date_field, time_field,
row_number() over (partition by report
order by to_date(date_field, 'MM/DD/YYYY'), time_field) as rn
from events
)
where rn = 1
order by report;
REPORT DATE_FIELD TIME
------- ---------- ----
report1 01/01/2012 0800
report2 01/01/2012 0900
You may have a different date format mask; I've assumed US format as you referred to 'military time'.
Depending on how you want to treat ties, you'll want rank or dense_rank instead of row_number. See the documentation of analytic functions for more info. As Justin pointed out you probably want rank, which with the same data gives:
REPORT DATE_FIELD TIME
------- ---------- ----
report1 01/01/2012 0800
report2 01/01/2012 0900
report2 01/01/2012 0900
The inner select adds an extra rn column that assigns a ranking to each row. Each value of report will have at least one row ranked 1 (exactly one with row_number, possibly several with rank), and possibly rows ranked 2, 3, etc. The row(s) ranked 1 have the earliest date/time for that report. The outer query then keeps only those rows via the where rn = 1 clause, so you get only the data with the earliest date/time for each report; the rest is discarded.
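The difference between row_number and rank for ties can be checked locally. This is a sketch that uses an in-memory SQLite database as a stand-in for Oracle (window functions need SQLite 3.25 or later), with the same dummy rows as above; dates are stored as ISO 'YYYY-MM-DD' text so that ordering by the text matches date order, in place of to_date(...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (report TEXT, date_field TEXT, time_field TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("report1", "2012-01-01", "0800"),
        ("report1", "2012-01-01", "0900"),
        ("report1", "2012-01-02", "0930"),
        ("report2", "2012-01-01", "0900"),
        ("report2", "2012-01-01", "0900"),
        ("report2", "2012-01-01", "1000"),
    ],
)

def earliest(ranking_function):
    # Rank rows within each report by date then time and keep only rank 1.
    # ranking_function is one of our own fixed names, never user input.
    sql = f"""
        SELECT report, date_field, time_field
        FROM (SELECT report, date_field, time_field,
                     {ranking_function}() OVER (PARTITION BY report
                                                ORDER BY date_field, time_field) AS rn
              FROM events) AS ranked
        WHERE rn = 1
        ORDER BY report
    """
    return conn.execute(sql).fetchall()

by_row_number = earliest("ROW_NUMBER")  # exactly one row per report
by_rank = earliest("RANK")              # every row tied for first place is kept
print(by_row_number)
print(by_rank)
```

With this data report2 has two rows at 0900, so RANK returns both of them while ROW_NUMBER arbitrarily keeps one.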

Related

Finding IDs present in previous weeks but not in the current week

How can I find the IDs that were present in previous weeks but are not present in the current week, on a rolling basis? For example:
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found that IDs 1 and 2 are missing in week 2, and IDs 2, 4, 7 and 8 are missing in week 3 compared with the previous 2 weeks. But how can I do this on a rolling window for a large amount of data spread over 20+ years?
Please find the sample dataset and expected output below. I expect the output to be partitioned by the WEEK_END date.
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Expected Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition2: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-16
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use the query below:
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK_END:
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.id, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;
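The derived-table idea can be pushed to reproduce the cumulative expected output: for every distinct week end, take all rows up to and including that week (note `<=` rather than `<`) and keep each ID's latest appearing date. This is a sketch in in-memory SQLite (3.25+ not required here) standing in for the real database, using the question's sample rows as ISO text; the `missing_this_week` flag is an illustrative extra column, set when the ID has no row in that week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, week_start TEXT, week_end TEXT, appearing_date TEXT)")
rows = [
    ("7152", "2015-12-27", "2016-01-02", "2015-12-27"),
    ("8350", "2015-12-27", "2016-01-02", "2015-12-27"),
    ("7152", "2015-12-27", "2016-01-02", "2015-12-29"),
    ("4697", "2015-12-27", "2016-01-02", "2015-12-30"),
    ("7187", "2015-12-27", "2016-01-02", "2015-01-01"),
    ("8005", "2015-12-27", "2016-01-02", "2015-12-27"),
    ("8005", "2015-12-27", "2016-01-02", "2015-12-29"),
    ("6254", "2016-01-03", "2016-01-09", "2016-01-03"),
    ("7962", "2016-01-03", "2016-01-09", "2016-01-04"),
    ("3339", "2016-01-03", "2016-01-09", "2016-01-06"),
    ("7834", "2016-01-03", "2016-01-09", "2016-01-03"),
    ("7962", "2016-01-03", "2016-01-09", "2016-01-05"),
    ("7152", "2016-01-03", "2016-01-09", "2016-01-07"),
    ("8350", "2016-01-03", "2016-01-09", "2016-01-09"),
    ("2403", "2016-01-10", "2016-01-16", "2016-01-10"),
    ("0157", "2016-01-10", "2016-01-16", "2016-01-11"),
    ("2228", "2016-01-10", "2016-01-16", "2016-01-14"),
    ("4697", "2016-01-10", "2016-01-16", "2016-01-14"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# One output partition per distinct week end, built from every row up to that week.
sql = """
    SELECT we.week_end, t.id,
           MAX(t.appearing_date) AS last_appearing_date,
           MAX(CASE WHEN t.week_end = we.week_end THEN 1 ELSE 0 END) = 0 AS missing_this_week
    FROM (SELECT DISTINCT week_end FROM t) AS we
    JOIN t ON t.week_end <= we.week_end
    GROUP BY we.week_end, t.id
    ORDER BY we.week_end, t.id
"""
result = conn.execute(sql).fetchall()

# Regroup as {week_end: {id: last_appearing_date}} and collect (week_end, id) misses.
by_week = {}
for week_end, id_, last_date, _ in result:
    by_week.setdefault(week_end, {})[id_] = last_date
missing = {(week_end, id_) for week_end, id_, _, flag in result if flag}
print(by_week)
```

For 20+ years of data, a lower bound on `t.week_end` in the join would turn this cumulative view into a true rolling window.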

How to calculate the longest period in days that a company has gone without headcount change?

I have an employees table with the columns EmpID, FirstName, LastName, StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904: "STARTDATE": invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks
You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for, as it only returns the longest tenure among employees who no longer work at the company.
In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start on exactly the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be both starts and ends, and you need to exclude the dates with an equal number of starts and ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select startDate, 's' from employees union all
select endDate , 'e' from employees
),
prep ( int ) as (
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(int) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).
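The approach described above (merge start and end dates, drop dates where starts and ends cancel, take the largest LAG gap, and include today) can be tried in SQLite as a sketch. The employees and dates are invented, `TODAY` is a fixed stand-in for SYSDATE so the result is reproducible, julianday() replaces Oracle date subtraction, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empid INTEGER, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-03-01"),  # leaves the day employee 2 starts
        (2, "2020-03-01", None),          # still employed
        (3, "2020-02-01", "2020-06-01"),
    ],
)

TODAY = "2020-07-01"  # hypothetical fixed "now" in place of SYSDATE

sql = """
    WITH d (dt, flag) AS (
        SELECT startdate, 's' FROM employees
        UNION ALL
        SELECT enddate, 'e' FROM employees WHERE enddate IS NOT NULL
        UNION ALL
        SELECT ?, 's'                 -- today, so the current period is measured too
    ),
    changes (dt) AS (
        -- keep only dates where starts and ends do not cancel out
        SELECT dt FROM d
        GROUP BY dt
        HAVING COUNT(CASE flag WHEN 's' THEN 1 END) <> COUNT(CASE flag WHEN 'e' THEN 1 END)
    ),
    gaps (days) AS (
        SELECT julianday(dt) - julianday(LAG(dt) OVER (ORDER BY dt)) FROM changes
    )
    SELECT MAX(days) FROM gaps
"""
(max_interval,) = conn.execute(sql, (TODAY,)).fetchone()
print(max_interval)
```

Here 2020-03-01 has one start and one end, so the HAVING filter drops it and the longest stretch is 2020-02-01 to 2020-06-01 (121 days); without the filter that date would split the stretch and the maximum would fall to 92 days.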
I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.

PL/SQL Finding Difference Between Start and End Dates in Different Rows

I am trying to find the difference between start and end dates in different rows of a result set, using PL/SQL. Here is an example:
ID TERM START_DATE END_DATE
423 201420 26-AUG-13 13-DEC-13
423 201430 21-JAN-14 09-MAY-14
423 201440 16-JUN-14 07-AUG-14
For any specific ID, I need to get the difference between the end date in the first record and the start date of the second record. Similarly, I need to get the difference between the end date in the second record and the start date of the third record, and so forth.
Eventually I will need to perform the same operation on a variety of IDs. I am assuming I have to use a cursor and loop.
I would appreciate any help or suggestions on accomplishing this. Thanks in advance.
The "lead" analytic function in Oracle can grab a value from the succeeding row as a value in the current row.
Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position.
Here, this SQL grabs start_date from the next row and subtracts the current row's end_date from it.
select id, term, start_date, end_date,
lead(start_date) over (partition by id order by term) - end_date diff_in_days
from your_table;
Sample output:
ID TERM START_DATE END_DATE DIFF_IN_DAYS
---------- ---------- -------------------- -------------------- ------------
423 201420 26-AUG-2013 00:00:00 13-DEC-2013 00:00:00 39
423 201430 21-JAN-2014 00:00:00 09-MAY-2014 00:00:00 36
423 201440 14-JUN-2014 00:00:00 07-AUG-2014 00:00:00
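The same LEAD query can be tried outside Oracle as a sketch in in-memory SQLite (3.25+ for window functions). It uses the dates from the question rather than the sample output, so the second gap is 38 days because the question's third term starts on 16-JUN-14; SQLite has no DATE subtraction, so julianday() stands in for it, and the table name `terms` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (id INTEGER, term INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?, ?)",
    [
        (423, 201420, "2013-08-26", "2013-12-13"),
        (423, 201430, "2014-01-21", "2014-05-09"),
        (423, 201440, "2014-06-16", "2014-08-07"),
    ],
)

# LEAD fetches the next term's start date within the same ID (ordered by term);
# subtracting the current end date gives the break between the two terms.
sql = """
    SELECT id, term,
           julianday(LEAD(start_date) OVER (PARTITION BY id ORDER BY term))
             - julianday(end_date) AS diff_in_days
    FROM terms
    ORDER BY id, term
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The last term has no following row, so LEAD returns NULL and the difference is NULL, matching the blank in the sample output; no cursor or loop is needed.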
I would suggest looking at using the LEAD and LAG analytic functions from Oracle. By the sounds of it they should suit your needs.
See the docs here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions074.htm
Code:
SELECT id, term, start_date, end_date,
CASE WHEN MIN(end_date) OVER(PARTITION BY id ORDER BY term ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)=end_date THEN NULL ELSE
start_date - MIN(end_date) OVER(PARTITION BY id ORDER BY term ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) END AS days_between
FROM your_table
This seemed to work:
SELECT DISTINCT
ID,
TERM_CODE,
TERM_START_DATE,
TERM_END_DATE,
( LEAD ( TERM_START_DATE, 1 ) OVER ( PARTITION BY ID ORDER BY TERM_CODE ) ) - TERM_END_DATE AS DIFF_DAYS
FROM your_table

SQL to Find Number of Weeks an Employee Was Active Between Two Dates

I have a table with a list of dates where an employee became Active/Inactive and I want to count the weeks that an employee was Active within a certain date range.
So the table (ps_job) would have values like this:
EMPLID EFFDT HR_STATUS
------ ----- ------
1000 01-Jul-11 A
1000 01-Sep-11 I
1000 01-Jan-12 A
1000 01-Mar-12 I
1000 01-Sep-12 A
The query would need to show me the number of weeks that this emplid was active from 01-Jul-11 to 31-Dec-12.
The desired result set would be:
EMPLID WEEKS_ACTIVE
------ ------------
1000 35
I got the number 35 by adding the results from the SQLs below:
SELECT (NEXT_DAY('01-Sep-11','SUNDAY') - NEXT_DAY('01-Jul-11','SUNDAY'))/7 WEEKS_ACTIVE FROM DUAL;
SELECT (NEXT_DAY('01-Mar-12','SUNDAY') - NEXT_DAY('01-Jan-12','SUNDAY'))/7 WEEKS_ACTIVE FROM DUAL;
SELECT (NEXT_DAY('31-Dec-12','SUNDAY') - NEXT_DAY('01-Sep-12','SUNDAY'))/7 WEEKS_ACTIVE FROM DUAL;
The problem is I can't seem to figure out how to create a single query statement that will go through all the rows for every employee within a certain date range and just return each emplid and the number of weeks they were active. I would prefer to use basic SQL instead of PL/SQL so that I can transfer it to a PeopleSoft query that can be run by the user, but I am willing to run it for the user using Oracle SQL Developer if need be.
Database: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Here I'm using lead in a subquery to get the next date and then summing the intervals in the outer query:
with q as (
select EMPLID, EFFDT, HR_STATUS
, lead (EFFDT, 1) over (partition by EMPLID order by EFFDT) as NEXT_EFFDT
from ps_job
order by EMPLID, EFFDT
)
select EMPLID
, trunc(sum((trunc(coalesce(NEXT_EFFDT, current_timestamp)) - trunc(EFFDT)) / 7)) as WEEKS_ACTIVE
from q
where HR_STATUS = 'A'
group by EMPLID;
The coalesce function will grab the system date in the event it cannot find a matching I record (employee is current). You could substitute the end of the year if that's your spec.
Note that I'm not doing any rigorous testing to see that your entries are ordered A/I/A/I etc., so you might want to add checks of that nature if you know your data requires it.
Feel free to play with this at SQL Fiddle.
If the customer just wants a rough estimate, I'd start with the number of days for each stint, divided by 7 and rounded.
The trick is to line up each Active date with its corresponding Inactive date, and the best way I can think of to do this is to pick out the Active and Inactive dates separately, rank them by date, and join them back together by EmplID and rank. The ROW_NUMBER() analytic function is the best way to rank in this situation:
WITH
EmpActive AS (
SELECT
EmplID,
EffDt,
ROW_NUMBER() OVER (PARTITION BY EmplID ORDER BY EffDt NULLS LAST) DtRank
FROM ps_job
WHERE HR_Status = 'A'
),
EmpInactive AS (
SELECT
EmplID,
EffDt,
ROW_NUMBER() OVER (PARTITION BY EmplID ORDER BY EffDt NULLS LAST) DtRank
FROM ps_job
WHERE HR_Status = 'I'
)
SELECT
EmpActive.EmplID,
EmpActive.EffDt AS ActiveDate,
EmpInactive.EffDt AS InactiveDate,
ROUND((NVL(EmpInactive.EffDt, TRUNC(SYSDATE)) - EmpActive.EffDt) / 7) AS WeeksActive
FROM EmpActive
LEFT JOIN EmpInactive ON
EmpActive.EmplID = EmpInactive.EmplID AND
EmpActive.DtRank = EmpInactive.DtRank
The third gig for EmplID = 1000 has an active date but no inactive date, hence the NULLS LAST in the ROW_NUMBER ordering and the left join between the two subqueries.
I've used the "days / 7" math here; you can substitute what you need when you hear back from the customer. Note that if there isn't a corresponding inactive date the query uses the current date.
There's a SQLFiddle of this here.
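The pairing idea (number Active and Inactive dates separately, then join on the rank) can be sketched in in-memory SQLite (3.25+) with the question's rows as ISO dates. Two assumptions differ from the answer: julianday() replaces Oracle date subtraction, and the open-ended stint is closed with a hypothetical `range_end` parameter set to the end of the question's range instead of TRUNC(SYSDATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ps_job (emplid INTEGER, effdt TEXT, hr_status TEXT)")
conn.executemany(
    "INSERT INTO ps_job VALUES (?, ?, ?)",
    [
        (1000, "2011-07-01", "A"),
        (1000, "2011-09-01", "I"),
        (1000, "2012-01-01", "A"),
        (1000, "2012-03-01", "I"),
        (1000, "2012-09-01", "A"),
    ],
)

# Pair the n-th Active date with the n-th Inactive date for each employee.
# The last stint has no Inactive partner, so the LEFT JOIN leaves it NULL
# and COALESCE substitutes range_end.
sql = """
    WITH emp_active AS (
        SELECT emplid, effdt,
               ROW_NUMBER() OVER (PARTITION BY emplid ORDER BY effdt) AS dt_rank
        FROM ps_job WHERE hr_status = 'A'
    ),
    emp_inactive AS (
        SELECT emplid, effdt,
               ROW_NUMBER() OVER (PARTITION BY emplid ORDER BY effdt) AS dt_rank
        FROM ps_job WHERE hr_status = 'I'
    )
    SELECT a.emplid, a.effdt AS active_date, i.effdt AS inactive_date,
           ROUND((julianday(COALESCE(i.effdt, :range_end)) - julianday(a.effdt)) / 7)
               AS weeks_active
    FROM emp_active a
    LEFT JOIN emp_inactive i
      ON a.emplid = i.emplid AND a.dt_rank = i.dt_rank
    ORDER BY a.emplid, a.effdt
"""
stints = conn.execute(sql, {"range_end": "2012-12-31"}).fetchall()
total_weeks = sum(weeks for *_, weeks in stints)
print(stints, total_weeks)
```

Rounding each stint separately gives 9 + 9 + 17 weeks, which adds up to the 35 the question expects.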
The following should work for what you are trying to do. I did have to hard-code the end date in the NVL call.
SELECT emplid,
hr_status,
ROUND(SUM(end_date - start_date)/7) num_weeks
FROM (SELECT emplid,
hr_status,
effdt start_date,
NVL(LEAD(effdt) OVER (PARTITION BY emplid ORDER BY effdt),
TO_DATE('12312012','MMDDYYYY')) end_date
FROM ps_job
)
WHERE hr_status = 'A'
GROUP BY emplid,
hr_status
ORDER BY emplid
The inner query pulls the employee and HR status info from the table, uses the effdt column as the start date, and uses the LEAD analytic function to get the next effdt value. That value marks the start of the next status, so it is the end_date of the current row. If LEAD returns NULL, we assign the finish date you wanted (12/31/2012). The outer statement then just limits the result set to the records with the active HR status and calculates the weeks.
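The LEAD-plus-NVL query translates almost line for line to SQLite, which makes it easy to sanity-check. This is a sketch assuming in-memory SQLite 3.25+ as a stand-in for Oracle, ISO text dates, COALESCE in place of NVL, and julianday() in place of date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ps_job (emplid INTEGER, effdt TEXT, hr_status TEXT)")
conn.executemany(
    "INSERT INTO ps_job VALUES (?, ?, ?)",
    [
        (1000, "2011-07-01", "A"),
        (1000, "2011-09-01", "I"),
        (1000, "2012-01-01", "A"),
        (1000, "2012-03-01", "I"),
        (1000, "2012-09-01", "A"),
    ],
)

# LEAD gives the next effective date, i.e. the end of the current status;
# COALESCE supplies the hard-coded finish date when there is no next row.
sql = """
    SELECT emplid, hr_status,
           ROUND(SUM(julianday(end_date) - julianday(start_date)) / 7) AS num_weeks
    FROM (SELECT emplid, hr_status, effdt AS start_date,
                 COALESCE(LEAD(effdt) OVER (PARTITION BY emplid ORDER BY effdt),
                          '2012-12-31') AS end_date
          FROM ps_job) AS spans
    WHERE hr_status = 'A'
    GROUP BY emplid, hr_status
    ORDER BY emplid
"""
result = conn.execute(sql).fetchall()
print(result)
```

The three active stints last 62, 60 and 121 days; 243 / 7 rounds to 35 weeks, the figure the question arrived at with NEXT_DAY.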

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement, I am trying to get the last day with data per month for the last year.
For example, I can easily get the last day of each month and join that to my data table, but if the last day of the month has no data, nothing is returned. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as integer values. Clearly, this doesn't constrain the range of dates; you can do that too. If Oracle doesn't have these functions, you need some equivalent manipulation to get the same result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
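The NOT EXISTS version can be sketched in SQLite with hypothetical rows. Here strftime('%Y-%m', ...) plays the role of the month indicator and date(...) the role of casting to a day; both are SQLite functions chosen for this sketch, not the datediff/cast calls of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)")
conn.executemany(
    "INSERT INTO super_table VALUES (?, ?, ?)",
    [
        (1, "srv1", "2024-01-30 08:00:00"),
        (1, "srv2", "2024-01-30 17:30:00"),
        (1, "srv1", "2024-01-12 09:00:00"),
        (1, "srv1", "2024-02-27 10:00:00"),  # no data later in February
        (1, "srv1", "2024-02-03 11:00:00"),
    ],
)

# Keep a row when no other row falls on a later day of the same calendar month;
# every row on the last active day survives, whatever its time of day.
sql = """
    SELECT cust_id, server_name, coll_date
    FROM super_table AS st1
    WHERE NOT EXISTS (
        SELECT 1
        FROM super_table AS st2
        WHERE strftime('%Y-%m', st2.coll_date) = strftime('%Y-%m', st1.coll_date)
          AND date(st2.coll_date) > date(st1.coll_date)
    )
    ORDER BY coll_date
"""
last_day_rows = conn.execute(sql).fetchall()
print(last_day_rows)
```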
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
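The combined query can be checked on a small hypothetical data set in SQLite. In this sketch date() stands in for TRUNC, strftime('%Y-%m', ...) for TO_CHAR(coll_date,'YYYYMM'), a named `:cust_id` parameter for the `:CUST_ID` bind variable, and the SYSDATE - 400 range filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)")
conn.executemany(
    "INSERT INTO super_table VALUES (?, ?, ?)",
    [
        (1, "srv1", "2024-01-12 09:00:00"),
        (1, "srv1", "2024-01-30 08:00:00"),
        (1, "srv2", "2024-01-30 17:30:00"),
        (1, "srv1", "2024-02-27 10:00:00"),  # customer 1 has nothing later in February
        (2, "srv9", "2024-02-29 12:00:00"),  # another customer, filtered out
    ],
)

# Inner query: the last day with data in each calendar month for this customer.
# Outer query: that customer's rows on those days, one per server and day.
sql = """
    SELECT fd.cust_id, fd.server_name, date(fd.coll_date) AS coll_day
    FROM super_table fd
    WHERE fd.cust_id = :cust_id
      AND date(fd.coll_date) IN (
          SELECT MAX(date(coll_date))
          FROM super_table
          WHERE cust_id = :cust_id
          GROUP BY strftime('%Y-%m', coll_date)
      )
    GROUP BY fd.cust_id, fd.server_name, date(fd.coll_date)
    ORDER BY fd.server_name, coll_day
"""
rows = conn.execute(sql, {"cust_id": 1}).fetchall()
print(rows)
```

February's last day for customer 1 is the 27th even though the month has later calendar days, which is exactly the case the question describes.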