The Maximum value of two columns with group by - sql

I have a table that contains the following data:
TRIP TRIP_DATE TRIP_TIME
A 2018-08-08 11:00
A 2018-08-09 11:00
A 2018-08-08 23:00
A 2018-08-20 11:00
A 2018-08-20 14:00
I want a select statement that retrieves the number of trips (a count) together with the latest date and time.
Basically the output should be like this:
TRIPS MAX(TRIP_DATE) TRIP_TIME
5 2018-08-20 14:00

This is tricky. I think I would do:
select cnt, trip_date, trip_time
from (select t.*,
row_number() over (partition by trip order by trip_date desc, trip_time desc) as seqnum,
count(*) over (partition by trip) as cnt
from t
) t
where seqnum = 1;
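
To check the window-function approach against the sample rows, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. The table name trips and the use of SQLite are assumptions for illustration only; the question does not say which database is in use.

```python
import sqlite3

# Assumption: table name `trips`; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip TEXT, trip_date TEXT, trip_time TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A", "2018-08-08", "11:00"),
        ("A", "2018-08-09", "11:00"),
        ("A", "2018-08-08", "23:00"),
        ("A", "2018-08-20", "11:00"),
        ("A", "2018-08-20", "14:00"),
    ],
)

# Number the rows of each trip from latest to earliest and keep row 1,
# carrying the per-trip row count along as a window aggregate.
latest = conn.execute(
    """
    SELECT cnt, trip_date, trip_time
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY trip
                                    ORDER BY trip_date DESC, trip_time DESC) AS seqnum,
                 COUNT(*) OVER (PARTITION BY trip) AS cnt
          FROM trips t) t
    WHERE seqnum = 1
    """
).fetchall()
print(latest)  # [(5, '2018-08-20', '14:00')]
```

Because the date and time are ordered together, the time returned is the one that belongs to the latest date.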

You can use the following query with GROUP BY:
SELECT TRIP, COUNT(TRIP) AS cnt, MAX(CONCAT(TRIP_DATE, ' ', TRIP_TIME)) AS maxDateTime
FROM table_name
GROUP BY TRIP
To combine the DATE and TIME value you can use one of the following:
using CONCAT_WS: CONCAT_WS(' ', TRIP_DATE, TRIP_TIME)
using CONCAT: CONCAT(TRIP_DATE, ' ', TRIP_TIME)
You can use the above query as a sub-query to get the DATE and TIME as separate values:
SELECT TRIP, cnt, DATE(maxDateTime), TIME_FORMAT(TIME(maxDateTime), '%H:%i') FROM (
SELECT TRIP, COUNT(TRIP) AS cnt, MAX(CONCAT(TRIP_DATE, ' ', TRIP_TIME)) AS maxDateTime
FROM table_name
GROUP BY TRIP
)t;
Note: I recommend splitting the DATE and TIME values on the application side. I would also store the DATE and TIME values in one DATETIME column instead of separate columns.
demos: https://www.db-fiddle.com/f/xcMdmivjJa29rDhHxkUmuJ/2

You can use the row_number() function:
select t.*
from (select t.*, row_number() over (partition by trip order by trip_date desc, trip_time desc) seq
from table_name t
) t
where seq = 1;

I would go with this (assuming you wanted the MAX Trip_Time as well; it's a little difficult to tell from your example):
SELECT COUNT(TRIP) AS Trips,
MAX(TRIP_DATE) AS MAX_TRIP_DATE,
MAX(TRIP_TIME) AS TRIP_TIME
FROM myTable
GROUP BY TRIP
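
One thing worth checking with two separate MAX() calls is that each is evaluated independently, so the time returned need not come from the row with the latest date. A small sketch, assuming an in-memory SQLite database and a hypothetical table name trips, compares that with taking the maximum of the combined date and time:

```python
import sqlite3

# Assumption: table name `trips`; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip TEXT, trip_date TEXT, trip_time TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A", "2018-08-08", "11:00"),
        ("A", "2018-08-09", "11:00"),
        ("A", "2018-08-08", "23:00"),
        ("A", "2018-08-20", "11:00"),
        ("A", "2018-08-20", "14:00"),
    ],
)

# Two independent aggregates: latest date and, separately, latest time of day.
independent = conn.execute(
    "SELECT COUNT(trip), MAX(trip_date), MAX(trip_time) FROM trips GROUP BY trip"
).fetchall()

# One aggregate over the combined value: the time belongs to the latest date.
combined = conn.execute(
    "SELECT COUNT(trip), MAX(trip_date || ' ' || trip_time) FROM trips GROUP BY trip"
).fetchall()

print(independent)  # [(5, '2018-08-20', '23:00')]
print(combined)     # [(5, '2018-08-20 14:00')]
```

On the sample data the independent version returns 23:00, which comes from 2018-08-08, while the expected output in the question shows 14:00.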

You have the option of using an analytic function as well as a group function here.
Either will do the job. Looking at the final output, I believe the max function with group by is more suitable.
There is no hard and fast rule, but personally I prefer grouping when the final result needs to be collapsed to one row per group.

Related

Cumulative count on date data from datetime format

I have a table that looks like:
ID entry_time
abc123 2020-05-29 10:29:18.000
def456 2020-05-30 13:12:43.000
...
I want to do cumulative count by date, so I did:
select entry_time, count (*) OVER (ORDER BY entry_time) as TOTAL_NUM from my_table;
It is okay, but it will count based on datetime format. I would like to count only on date (i.e. by day, don't care about time).
How would I do that?
Many thanks,
You can try by converting entry time to date.
select
convert(date, entry_time) as entry_time,
count (*) OVER (ORDER BY convert(date, entry_time)) as total_num
from my_table;
If you want one record per day, you can use aggregation and window functions:
select
date(entry_time) entry_day,
sum(count(*)) over(order by date(entry_time)) total_num
from mytable
group by date(entry_time)
order by entry_day
This assumes that your database supports the date() function, which converts a datetime to a date (as in MySQL, for example). If it does not, it surely has an alternative way to do this.
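
As a rough illustration of the one-record-per-day idea, here is a sketch using Python's sqlite3 module, where date() also truncates a datetime to a date. The first two rows come from the question; the other two rows are made up for the example. The per-day counts are computed in a subquery and then summed with a window function, which is a slightly more verbose but equivalent form of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, entry_time TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("abc123", "2020-05-29 10:29:18.000"),
        ("def456", "2020-05-30 13:12:43.000"),
        ("ghi789", "2020-05-30 18:01:00.000"),  # hypothetical row
        ("jkl012", "2020-06-01 09:00:00.000"),  # hypothetical row
    ],
)

# Count rows per calendar day first, then take a running sum over the days.
cumulative = conn.execute(
    """
    SELECT entry_day,
           SUM(day_count) OVER (ORDER BY entry_day) AS total_num
    FROM (SELECT date(entry_time) AS entry_day, COUNT(*) AS day_count
          FROM my_table
          GROUP BY date(entry_time)) per_day
    ORDER BY entry_day
    """
).fetchall()
print(cumulative)  # [('2020-05-29', 1), ('2020-05-30', 3), ('2020-06-01', 4)]
```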
You can use convert or cast to turn entry_time into a date:
select entry_time, count (*) OVER (ORDER BY Convert(date,entry_time)) as TOTAL_NUM from my_table;
OR
select entry_time, count (*) OVER (ORDER BY Cast(entry_time as date)) as TOTAL_NUM from my_table;

DISTINCT ON to find min and max times

I have tried using DISTINCT ON with PostgreSQL to achieve the following:
Let's say I have a table that looks like this:
id time price
1 12:00 10
1 13:00 20
1 14:00 30
And my goal is to create a table with only 1 row per id, that shows a column of the minimum time price and the maximum time price. Something that looks like this:
id min_time_price max_time_price
1 10 30
I tried using DISTINCT ON (id) but can't really get it.
Would love some help, Thank you!
Here is one method:
select t.id, tmin.price, tmax.price
from (select t.id, min(time) as min_time, max(time) as max_time
from t
group by t.id
) t join
t tmin
on t.id = tmin.id and t.min_time = tmin.time join
t tmax
on t.id = tmax.id and t.max_time = tmax.time;
You can also use aggregation. Postgres doesn't have first()/last() aggregation functions, but arrays are handy:
select t.id,
(array_agg(price order by time asc))[1] as min_time_price,
(array_agg(price order by time desc))[1] as max_time_price
from t
group by id;
Or using first_value() with opposite sort orders:
select distinct t.id,
first_value(price) over (partition by id order by time) as min_time_price,
first_value(price) over (partition by id order by time desc) as max_time_price
from t
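
The first_value() idea can be tried out with a short sketch in an in-memory SQLite database, which is an assumption made only so the example is runnable; the question itself is about Postgres. The rows for id 1 are from the question and the rows for id 2 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "12:00", 10),
        (1, "13:00", 20),
        (1, "14:00", 30),
        (2, "09:00", 7),  # hypothetical row
        (2, "10:30", 5),  # hypothetical row
    ],
)

# Partition by id so every id gets its own earliest and latest price;
# DISTINCT collapses the identical per-row results to one row per id.
prices = conn.execute(
    """
    SELECT DISTINCT id,
           FIRST_VALUE(price) OVER (PARTITION BY id ORDER BY time) AS min_time_price,
           FIRST_VALUE(price) OVER (PARTITION BY id ORDER BY time DESC) AS max_time_price
    FROM t
    ORDER BY id
    """
).fetchall()
print(prices)  # [(1, 10, 30), (2, 7, 5)]
```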

To receive one record per id and the one with the latest date

I have the following table stamps with the columns:
Worker
Date
Transferred
Balance
I want one row per worker:
the record with the latest date that also has the value 1 in Transferred.
I have tried a lot of possibilities, but none works the way I want it to.
SELECT DISTINCT OUT.WORKER,OUT.DATE,OUT.TRANSFERRED,OUT.BALANCE
FROM (
SELECT WORKER,DATE,TRANSFERRED,BALANCE
FROM STAMPS
ORDER BY DATE DESC
) AS OUT
GROUP BY WORKER
You say you want the latest day (presumably the latest for a given worker), so you need the max function.
select s.Worker,
s.Date,
s.Transferred,
s.Balance
from
(select worker,
max(date) as date
from stamps
where transferred = 1
group by Worker) as max_dates
join stamps s
on s.worker = max_dates.worker
and s.date = max_dates.date
The typical way is to use window functions:
SELECT S.WORKER, S.DATE, S.TRANSFERRED, S.BALANCE
FROM (SELECT S.*,
ROW_NUMBER() OVER (PARTITION BY WORKER ORDER BY DATE DESC) AS SEQNUM
FROM STAMPS S
WHERE S.TRANSFERRED = 1
) S
WHERE SEQNUM = 1;
With the right indexes, a correlated subquery often has the best performance:
select s.*
from stamps s
where s.date = (select max(s2.date)
from stamps s2
where s2.worker = s.worker
and s2.transferred = 1
)
and s.transferred = 1;
The appropriate index is on stamps(worker, date).
SELECT * FROM dbo.STAMPS
SELECT subResult.WORKER,
subResult.Date,
subResult.Transferred,
subResult.Balance
FROM
(
SELECT WORKER,
DATE,
TRANSFERRED,
BALANCE,
ROW_NUMBER() OVER(PARTITION BY Worker ORDER BY date DESC) AS rowNum
FROM STAMPS
WHERE Transferred=1
) AS subResult
WHERE subResult.rowNum=1
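
For anyone who wants to try the ROW_NUMBER() approach quickly, here is a sketch against an in-memory SQLite database. All rows are invented for illustration, and SQLite stands in for whatever database the question actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stamps (worker TEXT, date TEXT, transferred INTEGER, balance REAL)"
)
conn.executemany(
    "INSERT INTO stamps VALUES (?, ?, ?, ?)",
    [
        # Hypothetical sample rows
        ("ann", "2021-01-01", 1, 10.0),
        ("ann", "2021-01-05", 1, 12.5),
        ("ann", "2021-01-09", 0, 15.0),  # latest for ann, but not transferred
        ("bob", "2021-01-02", 1, 6.0),
        ("bob", "2021-01-03", 1, 7.0),
    ],
)

# Filter to transferred rows first, then keep the newest row per worker.
latest_stamps = conn.execute(
    """
    SELECT worker, date, transferred, balance
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY worker ORDER BY date DESC) AS row_num
          FROM stamps s
          WHERE transferred = 1) sub
    WHERE row_num = 1
    ORDER BY worker
    """
).fetchall()
print(latest_stamps)
# [('ann', '2021-01-05', 1, 12.5), ('bob', '2021-01-03', 1, 7.0)]
```

Putting the Transferred filter inside the subquery matters: filtering after numbering would drop ann entirely, because her newest row has Transferred = 0.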

SQL: transposing a time series table into a start-end time table if an event occur

I am trying to use a select statement to create a view that transposes a table of datetime readings into one row per event: the start and end time of each run of consecutive rows (ordered by time, partitioned by station) in which the 'record' field is not 0.
Here is a sample of the initial table.
And how it should look after transposing.
Can anyone help?
You can use the conditional_change_event analytical function to create a special grouping identifier to split these out in a simple query:
select row_number() over () unique_id,
station,
min(datetime) startdate,
max(datetime) enddate
from (
select t.*, CONDITIONAL_CHANGE_EVENT(decode(recording,0,0,1))
over (partition by station order by datetime) chg
from mytable t
) x
where recording > 0
group by station, chg
order by 1, 2
The decode is just to set up your islands and gaps (where gaps are recording <= 0 and islands are recording > 0). Then the change event on that will generate a new identifier for grouping. Also note that I am grouping on the change event even though it isn't part of the output.
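
CONDITIONAL_CHANGE_EVENT is not available in most databases, so here is a sketch of the same islands-and-gaps grouping using a different but commonly available technique: LAG() to flag the first row of each island and a running SUM() of those flags as the group identifier. It runs in an in-memory SQLite database; the sample rows and the 5-minute spacing are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (station TEXT, datetime TEXT, recording INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows
        ("S1", "2020-01-01 10:00", 0),
        ("S1", "2020-01-01 10:05", 3),
        ("S1", "2020-01-01 10:10", 4),
        ("S1", "2020-01-01 10:15", 0),
        ("S1", "2020-01-01 10:20", 2),
        ("S1", "2020-01-01 10:25", 5),
        ("S1", "2020-01-01 10:30", 0),
    ],
)

islands = conn.execute(
    """
    WITH flagged AS (
        -- is_start = 1 when a non-zero row follows a zero row (or nothing)
        SELECT station, datetime, recording,
               CASE WHEN recording > 0
                     AND COALESCE(LAG(recording) OVER (PARTITION BY station
                                                       ORDER BY datetime), 0) <= 0
                    THEN 1 ELSE 0 END AS is_start
        FROM mytable
    ),
    grouped AS (
        -- the running total of start flags gives each island its own number
        SELECT station, datetime, recording,
               SUM(is_start) OVER (PARTITION BY station ORDER BY datetime) AS island
        FROM flagged
    )
    SELECT station, MIN(datetime) AS startdate, MAX(datetime) AS enddate
    FROM grouped
    WHERE recording > 0
    GROUP BY station, island
    ORDER BY station, startdate
    """
).fetchall()
print(islands)
# [('S1', '2020-01-01 10:05', '2020-01-01 10:10'),
#  ('S1', '2020-01-01 10:20', '2020-01-01 10:25')]
```

As in the query above, the grouping column is used in GROUP BY without being part of the output.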
ROW_NUMBER() is the best for partitioning. Next, you can do a self join on the partitioned tables to see if the difference between times is greater than five minutes. I think the best solution is to partition on the rolling sum of the timestamp difference, offset by 5 minutes based on your pattern. If the five minutes is not a regular pattern then there is probably a generalized approach that can be used with the zeroes.
Solution written as a CTE below for easy view creation (it's a slow view though).
WITH partitioned as (
SELECT datetime, station, recording,
ROW_NUMBER() OVER(PARTITION BY station
ORDER BY datetime ASC) rn
FROM mytable --Not sure what the tablename is
WHERE recording != 0),
diffed as (
SELECT a.datetime, a.station,
DATEDIFF(mi,ISNULL(b.datetime,a.datetime),a.datetime)-5 Difference
--The ISNULL logic is for when a.datetime is the beginning of the block,
--so the DATEDIFF is 0
FROM partitioned a
LEFT JOIN partitioned b on a.rn = b.rn + 1 and a.station=b.station),
cumulative as (
SELECT a.datetime, a.station, SUM(b.difference) offset_grouping
FROM diffed a
LEFT JOIN diffed b on a.datetime >= b.datetime and a.station = b.station
GROUP BY a.datetime,a.station),
ordered as (SELECT datetime,station,offset_grouping,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime asc) starter,
ROW_NUMBER() OVER(PARTITION BY station,offset_grouping ORDER BY datetime desc) ender
FROM cumulative)
SELECT ROW_NUMBER() OVER(ORDER BY a.datetime) unique_id,a.station,a.datetime startdate, b.datetime enddate
FROM ordered a
JOIN ordered b on a.starter = b.ender and a.station=b.station and a.offset_grouping=b.offset_grouping and a.starter=1
This is the only solution I can think of but again, it's slow depending on the amount of data you have.

SQL query to find out in/out time in an office

This may be a silly question for the database guys, but for me it's very tough because this is the first time I am seeing a database. I am not a database guy, but I have to do this.
I have a database in SQL Server 2005, and there is a view that contains the data related to employees going in and out of the office.
There may be more than 2 entries for a particular date, but we consider the first entry in a day as the in time and the last entry as the out time. In between these, an employee can go outside any number of times, but we consider only the first and last entry in a day.
So I have to write a query for this...
EDIT:
List of columns in the view (some other columns are also there, but I don't think it is necessary to describe them here):
CardSrNo
LastName
FirstName
MidleName
PersonalID
Date
Time
YMD
HMS
CardNumber
Department
Try a good ol' group by.
Select employeeID, Date,
MIN(Time) as InTime,
MAX(Time) as OutTime
FROM Transactions
GROUP BY employeeID, Date
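
Here is a small sketch of that GROUP BY idea running in an in-memory SQLite database from Python. The Transactions table, its columns, and every row are placeholders taken from or modeled on the query above, not the real view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (employeeID INTEGER, Date TEXT, Time TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows
        (1, "2013-01-07", "09:02:11"),
        (1, "2013-01-07", "12:30:00"),
        (1, "2013-01-07", "18:05:40"),
        (1, "2013-01-08", "08:55:00"),
        (1, "2013-01-08", "17:45:10"),
        (2, "2013-01-07", "10:00:00"),
    ],
)

# One row per employee per day: earliest swipe is the in time, latest is the out time.
in_out = conn.execute(
    """
    SELECT employeeID, Date, MIN(Time) AS InTime, MAX(Time) AS OutTime
    FROM Transactions
    GROUP BY employeeID, Date
    ORDER BY employeeID, Date
    """
).fetchall()
for row in in_out:
    print(row)
```

An employee with a single entry on a day (employee 2 here) gets the same value for InTime and OutTime, so that case may need separate handling.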
You can use ROW_NUMBER() ranking function to determine first and last time entry
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY PersonalID, [Date] ORDER BY [Time] ASC) AS rnASC,
ROW_NUMBER() OVER(PARTITION BY PersonalID, [Date] ORDER BY [Time] DESC) AS rnDESC
FROM dbo.your_tableName
)
SELECT *
FROM cte
WHERE rnASC = 1 OR rnDESC = 1
Simple example on SQLFiddle
OR use option with EXISTS operator
SELECT *
FROM dbo.your_tableName t1
WHERE EXISTS (
SELECT 1
FROM dbo.your_tableName t2
WHERE t1.PersonalID = t2.PersonalID AND t1.[Date] = t2.[Date]
HAVING t1.[Time] = MAX(t2.[Time])
OR t1.[Time] = MIN(t2.[Time])
)
Simple example on SQLFiddle
Now I am using this query:
SELECT FirstName,LastName,CardNumber,Department, Date , MIN(Time) AS intime , MAX(Time) AS outtime ,
convert(varchar(8),(convert(datetime,MAX(Time),110) - convert(datetime,MIN(Time),110)),108) AS Duration
FROM CARDENTRYEXITTRANSACTIONVIEW
GROUP BY FirstName, Date,LastName,CardNumber,Department
And it gives the perfect result.