Oracle - select rows with minimal value in a subset - sql

I have a following table of dates:
dateID INT (PK),
personID INT (FK),
date DATE,
starttime VARCHAR, --Always in a format of 'HH:MM'
What I want to do is I want to pull rows (all columns, including PK) with lowest date (primary condition) and starttime (secondary condition) for every person. For example, if we have
row1(date = '2013-04-01' and starttime = '14:00')
and
row2(date = '2013-04-02' and starttime = '08:00')
row1 will be retrieved, along with all other columns.
So far I have come up with gradual filtering the table, but it`s quite a mess. Is there more efficient way of doing this?
Here is what I made so far:
SELECT
D.id
, D.personid
, D.date
, D.starttime
FROM table D
JOIN (
SELECT --Select lowest time from the subset of lowest dates
A.personid,
B.startdate,
MIN(A.starttime) AS starttime
FROM table A
JOIN (
SELECT --Select lowest date for every person to exclude them from outer table
personid
, MIN(date) AS startdate
FROM table
GROUP BY personid
) B
ON A.personid = B.peronid
AND A.date = B.startdate
GROUP BY
A.personid,
B.startdate
) C
ON C.personid = D.personid
AND C.startdate = D.date
AND C.starttime = D.starttime
It works, but I think there is a more clean/efficient way to do this. Any ideas?
EDIT: Let me expand a question - I also need to extract maximum date (only date, without time) for each person.
The result should look like this:
id
personid
max(date) for each person
min(date) for each person
min(starttime) for min(date) for each person
It is a part of a much larger query (the resulting table is joined with it), and the resulting table must be lightweight enough so that the query won`t execute for too long. With single join with this table (just using min, max for each field I wanted) the query took about 3 seconds, and I would like the resulting query not to take longer than 2-3 times that.

you should be able to do this like:
select a.dateID, a.personID, a.date, a.max_date, a.starttime
from (select t.*,
max(t.date) over (partition by t.personID) max_date,
row_number() over (partition by t.personID
order by t.date, t.starttime) rn
from table t) a
where a.rn = 1;
sample data added to fiddle: http://sqlfiddle.com/#!4/63c45/1

This is the query you can use and no need to incorporate in your query. You can also use #Dazzal's query as stand alone
SELECT ID, PERSONID, DATE, STARTTIME
(
SELECT ID, PERONID, DATE, STARTTIME, ROW_NUMBER() OVER(PARTITION BY personid ORDER BY STARTTIME, DATE) AS RN
FROM TABLE
) A
WHERE
RN = 1

select a.id,a.accomp, a.accomp_name, a.start_year,a.end_year, a.company
from (select t.*,
min(t.start_year) over (partition by t.company) min_date,
max(t.end_year) over (partition by t.company) max_date,
row_number() over (partition by t.company
order by t.end_year desc) rn
from temp_123 t) a
where a.rn = 1;

Related

Find the max date to last one year transaction for each group

I have to query in sql server where I have to find for each id it's volume such that we have last 1 year date for each id with it's volume.
for example below is my data ,
for each id I need to query the last 1 year transaction from when we have the entry for that id as you can see from the snippet for id 1 we have the latest date as 7/31/2020 so I need the last 1 year entry from that date for that id, The highlighted one is exclude because that date is more than 1 year from the latest date for that id
Similarly for Id 3 we have all the date range in one year from the latest date for that particular id
I tried using the below query and I can get the latest date for each id but I am not sure how to extract all the dates for each id from the latest date to one year, I would appreciate if some one could help me.
I am using Microsoft sql server would need the query which executes in sql server, Table name is emp and have millions of id
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate
To exclude transactions where the date difference between the tran_date and the maximum tran_date for each id is greater than 1 year, something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
Update: per comments, added volume. Thnx GMB :)
;with max_cte(id, date_tran, volume, max_date) as (
Select *, dateadd(year, -1, max(date_tran) over(partition by id)) max_date
From #emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>max_date
group by id;
You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
The second query would take advantage of an index on (id, date_tran).
this should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;
Your code is good. Just add the date difference function to get the particular time in between the transaction, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, e.date_tran, mc.maxdate)<=365;

Rank the closest date, but not if date has already been used in previous row

Complicated to summarize the issue in the title, and in text so check the IMG for visual explanation.
I've got an issue joining two tables. The date from the first table (startdatetable) should get the next/closest date in the other table (enddatetable). This is to 99% easy done with rank, because most rows have a startdate that can find the next enddate, and before another enddate is available there is a new startdate.
However, if there are two startdates and only one enddate available the startdates will join the same enddate.
What I'm trying to do is, that if a date has been used in the row before, it should not be used in the next row.
The rows I want are highlighted.
The SQL I started out with looked like
select *
from (
select rank() over (partition by tid order by enddate asc) as rnk
, id, startdate, enddate
from startdateTable
inner join enddateTable on startdateTable.ID = enddateTable.id
and enddateTable.enddate > startdateTable.startdate
) q
where q.rnk = 1
This gets me the following result. The last row should instead get the 2100 date, since the 2020-06 date has been used in the previous row.
If you have two tables and you want to align them, then you can use row_number():
select s.id, s.startdate, e.enddate
from (select s.*, row_number() over (partition by id order by startdate) as seqnum
from startdateTable
) s join
(select e.*, row_number() over (partition by id order by enddate) as seqnum
from enddateTable
) e
on e.id = s.id and e.seqnum = s.seqnum

SQL: transposing a time series table into a start-end time table if an event occur

I am trying to use a select statement to create a view, transposing a table with datetime into a table with records in each row, the start-end time when the consecutive values by time (partition by station) in 'record' field is not 0.
Here is a sample of the initial table.
And how it should look like after transposing.
Can anyone help?
You can use the conditional_change_event analytical function to create a special grouping identifier to split these out in a simple query:
select row_number() over () unique_id,
station,
min(datetime) startdate,
max(datetime) enddate
from (
select t.*, CONDITIONAL_CHANGE_EVENT(decode(recording,0,0,1))
over (partition by station order by datetime) chg
from mytable t
) x
where recording > 0
group by station, chg
order by 1, 2
The decode is just to set up your islands and gaps (where gaps are recording <= 0 and islands are recording > 0). Then the change event on that will generate a new identifier for grouping. Also note that I am grouping on the change event even though it isn't part of the output.
ROW_NUMBER() is the best for partitioning. Next, you can do a self join on the partitioned tables to see if the difference between times is greater than five minutes. I think the best solution is to partition on the rolling sum of the timestamp difference, offset by 5 minutes based on your pattern. If the five minutes is not a regular pattern then there is probably a generalized approach that can be used with the zeroes.
Solution written as a CTE below for easy view creation (it's a slow view though).
WITH partitioned as (
SELECT datetime, station, recording,
ROW_NUMBER() OVER(PARTITION BY station
ORDER BY datetime ASC) rn
FROM table --Not sure what the tablename is
WHERE recording != 0),
diffed as (
SELECT a.datetime, a.station,
DATEDIFF(mi,ISNULL(b.datetime,a.datetime),a.datetime)-5) Difference
--The ISNULL logic is for when a.datetime is the beginning of the block,
--we want a 0
FROM partitioned a
LEFT JOIN partitioned b on a.rn = b.rn + 1 and a.station=b.station
GROUP BY a.datetime,a.station),
cumulative as (
SELECT a.datetime, a.station, SUM(b.difference) offset_grouping
FROM diff a
LEFT JOIN diff b on a.datetime >= b.datetime and a.station = b.station ),
ordered as (SELECT datetime,station,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime asc) starter,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime desc) ender
FROM cumulative)
SELECT ROW_NUMBER() OVER(ORDER BY a.datetime) unique_id,a.station,a.datetime startdate, b.datetime enddate
FROM ordered a
JOIN ordered b on a.starter = b.ender and a.station=b.station and a.starter=1
This is the only solution I can think of but again, it's slow depending on the amount of data you have.

Efficiently group by column aggregate

SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
GROUP BY date, id
HAVING sum(revenue)>1000
Returns rows that have revenue>1000.
SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
AND id IN (SELECT id FROM table where date between '2013-01-01' and '2013-01-08' GROUP BY id HAVING sum(revenue)>1000)
GROUP BY date, id
Returns rows for id's whose total revenue over the date period is >1000 as desired. But this query is much slower. Any quicker way to do this?
Make sure you have indexes on the date and id columns, and try this variation:
select t.date, t.id, sum(t.revenue)
from table t
inner join (
select id
from table
where date between '2013-01-01' and '2013-01-08'
group by id
having sum(revenue) > 1000
) ts on t.id = ts.id
where t.date between '2013-01-01' and '2013-01-08'
group by t.date, t.id
it's not MySQL, it's Vertica ;)
Cris, what projection and order by you using in CREATE TABLE ???
Do you try using database designer
see http://my.vertica.com/docs/6.1.x/HTML/index.htm#14415.htm

SQL query for find out in/out time in a office

This may be silly question for the DATABASE guys but for me its very tough because it's the first time when I am seeing the DATABASE..I am not the DATABASE guy but I have to do this.
I have DATABASE in SQL SERVER 2005 and there is a VIEW that contain the data related to employee in/out in the office.
There may be more then 2 entry in a day for a particular date but we consider first entry in a day is in time in the office and last entry as the out time from the office,In between these these employee can go out side any no of time but we consider only first and last time entry in a day.
So I have to to write QUERY for this...
EDIT :
List of coloumns in a view some other coloumn are also there but I do'nt think it is necessary to describe here ..
CardSrNo
LastName
FirstName
MidleName
PersonalID
Date
Time
YMD
HMS
CardNumber
Department
Try a good ol' group by.
Select employeeID, Date,
MIN(Time) as InTime,
MAX(Time) as OutTime
FROM Transactions
GROUP BY employeeID, Date
You can use ROW_NUMBER() ranking function to determine first and last time entry
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY PersonalID ORDER BY [Date] ASC, [Time] ASC) AS rnASC,
ROW_NUMBER() OVER(PARTITION BY PersonalID ORDER BY [Date] DESC, [Time] DESC) AS rnDESC
FROM dbo.your_tableName
)
SELECT *
FROM cte
WHERE rnASC = 1 OR rnDESC = 1
Simple example on SQLFiddle
OR use option with EXISTS operator
SELECT *
FROM dbo.your_tableName t1
WHERE EXISTS (
SELECT 1
FROM dbo.your_tableName t2
WHERE t1.PersonalID = t2.PersonalID
HAVING (t1.[Date] = MAX(t2.[Date]) AND t1.[Time] = MAX(t2.[Time]))
OR (t1.[Date] = MIN(t2.[Date]) AND t1.[Time] = MIN(t2.[Time]))
)
Simple example on SQLFiddle
Now I am using that query :
SELECT FirstName,LastName,CardNumber,Department, Date , MIN(Time) AS intime , MAX(Time) AS outtime ,
convert(varchar(8),(convert(datetime,MAX(Time),110) - convert(datetime,MIN(Time),110)),108) AS Duration
FROM CARDENTRYEXITTRANSACTIONVIEW
GROUP BY FirstName, Date,LastName,CardNumber,Department
And its give the perfect result............