SQL: Get latest record

This is my relational model:

Request
RequestId
---------
1
2

RequestState
RequestStateId | Name
---------------+-----------
10             | Received
20             | Processing
30             | Finished

Request_RequestState
Request_RequestStateId | RequestId | RequestStateId | CreatedOn
-----------------------+-----------+----------------+-----------
1                      | 1         | 10             | 2010-01-01
2                      | 1         | 20             | 2010-01-02
3                      | 2         | 10             | 2010-01-15
Each time a request's state changes, the change is stored.
Now I need to list requests by their current state,
like "Get all requests whose current state = Received".
So far I have only managed to create a query that returns requests in a given state, but it doesn't matter whether that is the current state or an older one. So I somehow need to use CreatedOn to get the latest (current) state.
Any help? Thanks in advance!

Change your model.
With the current schema, as more and more state changes accumulate, determining the current state with the queries suggested in the other answers will take longer and longer.
You need a "Current_Request_State" attribute on your Request table.
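One way to keep such an attribute in sync is a trigger on the history table. The sketch below uses an in-memory SQLite database as a stand-in for your actual DBMS; the `Current_Request_State` column and the `trg_current_state` trigger are hypothetical names, and trigger syntax differs between database systems. The trigger recomputes the newest state from the history instead of blindly copying `NEW.RequestStateId`, so a back-dated insert cannot overwrite a newer state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RequestState (RequestStateId INTEGER PRIMARY KEY, Name TEXT);
-- Hypothetical denormalized column holding the current state id.
CREATE TABLE Request (
    RequestId INTEGER PRIMARY KEY,
    Current_Request_State INTEGER REFERENCES RequestState(RequestStateId)
);
CREATE TABLE Request_RequestState (
    Request_RequestStateId INTEGER PRIMARY KEY,
    RequestId INTEGER REFERENCES Request(RequestId),
    RequestStateId INTEGER REFERENCES RequestState(RequestStateId),
    CreatedOn TEXT
);
-- After each logged state change, store the newest state on Request.
CREATE TRIGGER trg_current_state AFTER INSERT ON Request_RequestState
BEGIN
    UPDATE Request
    SET Current_Request_State = (
        SELECT RequestStateId FROM Request_RequestState
        WHERE RequestId = NEW.RequestId
        ORDER BY CreatedOn DESC, Request_RequestStateId DESC
        LIMIT 1)
    WHERE RequestId = NEW.RequestId;
END;
""")
conn.executemany("INSERT INTO RequestState VALUES (?, ?)",
                 [(10, "Received"), (20, "Processing"), (30, "Finished")])
conn.executemany("INSERT INTO Request (RequestId) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Request_RequestState VALUES (?, ?, ?, ?)",
                 [(1, 1, 10, "2010-01-01"), (2, 1, 20, "2010-01-02"),
                  (3, 2, 10, "2010-01-15")])

# The "current state" lookup is now a plain join; no scan of the history.
received = conn.execute("""
    SELECT r.RequestId FROM Request r
    JOIN RequestState s ON s.RequestStateId = r.Current_Request_State
    WHERE s.Name = 'Received'
""").fetchall()
print(received)  # [(2,)]
```

The trade-off is extra work on every insert in exchange for cheap reads. An index on `Request_RequestState(RequestId, CreatedOn)` would keep the trigger's lookup fast.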

This query should also give you what you want, but I also agree with Martin Milan that you should consider caching the most recent status value on the Request table.
SELECT r.RequestId, rrs.RequestStateId, rs.Name AS RequestStateName, rrs.CreatedOn
FROM Request r
INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY CreatedOn DESC) AS RowNum,
RequestId,
RequestStateId,
CreatedOn
FROM Request_RequestState
) rrs
ON r.RequestId = rrs.RequestId
AND rrs.RowNum = 1
INNER JOIN RequestState rs
ON rrs.RequestStateId = rs.RequestStateId
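To check the ROW_NUMBER approach end to end, here is a sketch that loads the question's sample rows into an in-memory SQLite database. It assumes SQLite 3.25 or newer, which is the first version with window functions; the SQL itself should carry over to SQL Server, PostgreSQL and Oracle largely unchanged.

```python
import sqlite3  # window functions require SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Request (RequestId INTEGER PRIMARY KEY);
CREATE TABLE RequestState (RequestStateId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Request_RequestState (
    Request_RequestStateId INTEGER PRIMARY KEY,
    RequestId INTEGER, RequestStateId INTEGER, CreatedOn TEXT);
INSERT INTO Request VALUES (1), (2);
INSERT INTO RequestState VALUES (10, 'Received'), (20, 'Processing'), (30, 'Finished');
INSERT INTO Request_RequestState VALUES
    (1, 1, 10, '2010-01-01'), (2, 1, 20, '2010-01-02'), (3, 2, 10, '2010-01-15');
""")

rows = conn.execute("""
SELECT r.RequestId, rrs.RequestStateId, rs.Name, rrs.CreatedOn
FROM Request r
JOIN (
    -- Number each request's history rows, newest first.
    SELECT ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY CreatedOn DESC) AS RowNum,
           RequestId, RequestStateId, CreatedOn
    FROM Request_RequestState
) rrs ON r.RequestId = rrs.RequestId AND rrs.RowNum = 1  -- keep only the newest row
JOIN RequestState rs ON rrs.RequestStateId = rs.RequestStateId
ORDER BY r.RequestId
""").fetchall()
print(rows)  # [(1, 20, 'Processing', '2010-01-02'), (2, 10, 'Received', '2010-01-15')]
```

Adding `WHERE rs.Name = 'Received'` to the outer query then returns only request 2, which is the question's "current state = Received" case.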

This should do it for you:
SELECT r.RequestId,
s.Name AS RequestStateName
FROM Request r
INNER JOIN Request_RequestState rs
ON rs.Request_RequestStateId = (
SELECT TOP 1 x.Request_RequestStateId
FROM Request_RequestState x
WHERE x.RequestId = r.RequestId
--// you could add a filter to get the "current" status as of some DATE
--//AND x.CreatedOn < '2010-01-15'
ORDER BY x.CreatedOn DESC
)
INNER JOIN RequestState s
ON s.RequestStateId = rs.RequestStateId
WHERE s.Name = 'Received'
You can also get the "current" state of all requests as of some other date if you use the filter commented out in the code.
I would probably just create a view from the SQL query above, and use it:
SELECT * FROM MyRequestStateView WHERE RequestStateName = 'Received'
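`TOP 1` is SQL Server syntax. As a sketch of the view idea on another engine, the same query is shown below with `LIMIT 1`, which SQLite, PostgreSQL and MySQL accept; SQLite in memory is only a stand-in here, and the data is the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Request (RequestId INTEGER PRIMARY KEY);
CREATE TABLE RequestState (RequestStateId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Request_RequestState (
    Request_RequestStateId INTEGER PRIMARY KEY,
    RequestId INTEGER, RequestStateId INTEGER, CreatedOn TEXT);
INSERT INTO Request VALUES (1), (2);
INSERT INTO RequestState VALUES (10, 'Received'), (20, 'Processing'), (30, 'Finished');
INSERT INTO Request_RequestState VALUES
    (1, 1, 10, '2010-01-01'), (2, 1, 20, '2010-01-02'), (3, 2, 10, '2010-01-15');

-- The correlated subquery picks the newest history row for each request.
CREATE VIEW MyRequestStateView AS
SELECT r.RequestId, s.Name AS RequestStateName
FROM Request r
JOIN Request_RequestState rs
  ON rs.Request_RequestStateId = (
      SELECT x.Request_RequestStateId
      FROM Request_RequestState x
      WHERE x.RequestId = r.RequestId
      ORDER BY x.CreatedOn DESC
      LIMIT 1)
JOIN RequestState s ON s.RequestStateId = rs.RequestStateId;
""")

current = conn.execute(
    "SELECT * FROM MyRequestStateView WHERE RequestStateName = 'Received'"
).fetchall()
print(current)  # [(2, 'Received')]
```

A view cannot take a date parameter, so the "as of some date" variant is better written as a standalone query with the extra `CreatedOn` condition in the subquery.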

Assuming that Request_RequestStateId increases over time (i.e. the record with the greatest ID has the latest CreatedOn date):
SELECT rrs.RequestId, rrs.RequestStateId, rs.Name
FROM Request_RequestState rrs
JOIN (
SELECT MAX(Request_RequestStateId) AS LatestRequestStateId
FROM Request_RequestState
GROUP BY RequestId
) rrs2 ON rrs.Request_RequestStateId = rrs2.LatestRequestStateId
JOIN RequestState rs ON rrs.RequestStateId = rs.RequestStateId
WHERE rs.Name = 'Received'

A possible query could look like this:
select r.requestId,
rs.RequestStateId,
rs.Name,
rrs.CreatedOn
from (select r2.* from request_requeststate r2 where r2.createdon = (select max(createdon) from request_requeststate r3 where r3.requestId = r2.requestId)) rrs
inner join requeststate rs on rs.requeststateId = rrs.requeststateid
inner join request r on r.requestid = rrs.requestid
You could use this query as a view, or add a where clause that filters for a specific request state.

Related

Group By Dynamic Ranges in SQL (cockroachdb/postgres)

I have a query that looks like this:
select s.session_id, array_agg(sp.value::int8 order by sp.value::int8) as timestamps
from sessions s join session_properties sp on sp.session_id = s.session_id
where s.user_id = '6f129b1c-43a6-4871-86f6-1749bfe1a5af' and sp.key in ('SleepTime', 'WakeupTime') and value != 'None' and value::int8 > 0
group by s.session_id
The result looks like this:
f321c813-7927-47aa-88c3-b3250af34afa | {1588499070,1588504354}
f38a8841-c402-433d-939d-194eca993bb6 | {1588187599,1588212803}
2befefaf-3b31-46c9-8416-263fa7b9309d | {1589912247,1589935771}
3da64787-65cd-4305-b1ac-1393e2fb11a9 | {1589741569,1589768453}
537e69aa-c39d-484d-9108-2f2cd956d4ee | {1588100398,1588129026}
5a9470ff-f930-491f-a57d-8c089e535d53 | {1589140368,1589165092}
The first column is a unique session id and the second column holds the from and to timestamps.
Now I have a third table with some time-series data:
records
------------------------
timestamp | name | value
Is it possible to compute avg(value) from records for each session_id, over that session's from and to timestamps?
I could run a for loop in the application and do a union to get the desired result, but I was wondering whether this is possible in Postgres or CockroachDB.
I wouldn't aggregate the two values but use two joins to find them. That way you can be sure which value belongs to which property.
Once you have that, you can join that result to your records table.
with ranges as (
select s.session_id, st.value::int8 as from_value, wt.value::int8 as to_value
from sessions s
join session_properties st on st.session_id = s.session_id and st.key = 'SleepTime'
join session_properties wt on wt.session_id = s.session_id and wt.key = 'WakeupTime'
where s.user_id = '6f129b1c-43a6-4871-86f6-1749bfe1a5af'
and st.value != 'None' and st.value::int8 > 0
and wt.value != 'None' and wt.value::int8 > 0
)
select ra.session_id, avg(rc.value)
from records rc
join ranges ra
on rc.timestamp >= ra.from_value
and rc.timestamp < ra.to_value
group by ra.session_id;
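A sketch of the same range join on an in-memory SQLite database, used here only as a stand-in for Postgres/CockroachDB: `CAST(... AS INTEGER)` replaces `::int8`, and the session ids, user id and `records` rows are made-up sample values (with `records.timestamp` assumed to be an epoch integer). Each record falls into the half-open interval [from_value, to_value) of at most one session, and records outside every session are simply not joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (session_id TEXT PRIMARY KEY, user_id TEXT);
CREATE TABLE session_properties (session_id TEXT, key TEXT, value TEXT);
CREATE TABLE records (timestamp INTEGER, name TEXT, value REAL);

INSERT INTO sessions VALUES ('s1', 'u1'), ('s2', 'u1');
INSERT INTO session_properties VALUES
    ('s1', 'SleepTime', '100'), ('s1', 'WakeupTime', '200'),
    ('s2', 'SleepTime', '300'), ('s2', 'WakeupTime', '400');
INSERT INTO records VALUES
    (150, 'hr', 60), (190, 'hr', 70),   -- inside s1
    (250, 'hr', 99),                    -- between sessions, not counted
    (300, 'hr', 50), (399, 'hr', 54);   -- inside s2
""")

result = conn.execute("""
WITH ranges AS (
    -- one row per session: SleepTime is the start, WakeupTime the end
    SELECT s.session_id,
           CAST(st.value AS INTEGER) AS from_value,
           CAST(wt.value AS INTEGER) AS to_value
    FROM sessions s
    JOIN session_properties st ON st.session_id = s.session_id AND st.key = 'SleepTime'
    JOIN session_properties wt ON wt.session_id = s.session_id AND wt.key = 'WakeupTime'
    WHERE s.user_id = ? AND st.value != 'None' AND wt.value != 'None'
)
SELECT ra.session_id, AVG(rc.value)
FROM records rc
JOIN ranges ra
  ON rc.timestamp >= ra.from_value   -- half-open interval [from, to)
 AND rc.timestamp <  ra.to_value
GROUP BY ra.session_id
ORDER BY ra.session_id
""", ("u1",)).fetchall()
print(result)  # [('s1', 65.0), ('s2', 52.0)]
```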

SQL COUNT with JOIN returns double results

I have two tables, "event" and "soundType". I am trying to count the number of events with a specific soundType.
This is my query:
SELECT Count(*) AS nb
FROM event
INNER JOIN soundtype
ON event.id = soundtype.eventid
WHERE ( soundtype.NAME = 'pop'
OR soundtype.NAME = 'rock' )
AND ( event.partytype = 'wedding'
OR event.partytype = 'Corporate evening'
OR event.partytype = 'birthday' )
Example of tables below:
event Table
id userId partyType
----------------------------
249 30 birthday
250 30 wedding
SoundType Table
id eventId name
-----------------------
1 249 pop
2 249 rock
3 250 pop
The result I get:
nb
---
3
The result I expect:
nb
---
2
Thank you for your help
You might find that exists is more efficient than count(distinct):
SELECT COUNT(*) AS nb
FROM event e
WHERE e.partytype IN ('wedding', 'Corporate evening' , 'birthday') AND
EXISTS (SELECT 1
FROM soundtype st
WHERE st.eventid = e.id AND
st.NAME IN ('pop', 'rock')
) ;
Your problem is (presumably) arising because some events have multiple sound types. You just need to match one of them. Multiplying out all the rows just to use COUNT(DISTINCT) is inefficient, when EXISTS (or IN) prevents the duplicates in the first place.
You are counting all the joined rows, but you need to count distinct events. So use DISTINCT:
SELECT COUNT(distinct event.id) AS nb
FROM event
INNER JOIN soundType ON event.id = soundType.eventId
WHERE soundType.name in('pop', 'rock')
AND event.partyType in('wedding', 'Corporate evening', 'birthday')
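To see why the original query returns 3, here is a sketch that runs all three variants (the plain join, `COUNT(DISTINCT ...)` and `EXISTS`) against the question's sample rows, using an in-memory SQLite database as a stand-in for the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY, userId INTEGER, partyType TEXT);
CREATE TABLE soundType (id INTEGER PRIMARY KEY, eventId INTEGER, name TEXT);
INSERT INTO event VALUES (249, 30, 'birthday'), (250, 30, 'wedding');
INSERT INTO soundType VALUES (1, 249, 'pop'), (2, 249, 'rock'), (3, 250, 'pop');
""")

# Original query: event 249 joins to two sound types, so it is counted twice.
naive = conn.execute("""
    SELECT COUNT(*) FROM event
    JOIN soundType ON event.id = soundType.eventId
    WHERE soundType.name IN ('pop', 'rock')
      AND event.partyType IN ('wedding', 'Corporate evening', 'birthday')
""").fetchone()[0]

# COUNT(DISTINCT ...) collapses the duplicated event ids after the join.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT event.id) FROM event
    JOIN soundType ON event.id = soundType.eventId
    WHERE soundType.name IN ('pop', 'rock')
      AND event.partyType IN ('wedding', 'Corporate evening', 'birthday')
""").fetchone()[0]

# EXISTS only tests for a match, so duplicates never appear.
exists = conn.execute("""
    SELECT COUNT(*) FROM event e
    WHERE e.partyType IN ('wedding', 'Corporate evening', 'birthday')
      AND EXISTS (SELECT 1 FROM soundType st
                  WHERE st.eventId = e.id AND st.name IN ('pop', 'rock'))
""").fetchone()[0]

print(naive, distinct, exists)  # 3 2 2
```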

Hive table with multiple partitions

I have a table (data_table) with multiple partition columns year/month/monthkey.
Directories look something like year=2017/month=08/monthkey=2017-08/files.parquet
Which of the below queries would be faster?
select count(*) from data_table where monthkey='2017-08'
or
select count(*) from data_table where monthkey='2017-08' and year = '2017' and month = '08'
I think the initial time Hadoop takes to find the required directories would be longer in the first case, but I want to confirm.
Finding the relevant partitions is a metastore operation, not a file system operation.
It is done by querying the metastore, not by scanning the directories.
The metastore query for the first use case will most likely be faster than for the second one, but in either case we are talking about fractions of a second.
Demo
create external table t100k(i int)
partitioned by (x int,y int,xy string)
;
explain dependency select count(*) from t100k where xy='100-1000';
The query that was issued against the metastore:
select "PARTITIONS"."PART_ID"
from "PARTITIONS"
inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = 't100k'
inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = 'local_db'
inner join "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2
where (("FILTER2"."PART_KEY_VAL" = '100-1000'))
explain dependency select count(*) from t100k where x=100 and y=1000 and xy='100-1000';
The query that was issued against the metastore:
select "PARTITIONS"."PART_ID"
from "PARTITIONS"
inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = 't100k'
inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = 'local_db'
inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0
inner join "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1
inner join "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2
where ( ( (((case when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__' then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = 100)
and ((case when "FILTER1"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__' then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) else null end) = 1000))
and ("FILTER2"."PART_KEY_VAL" = '100-1000')) )
Posting this as an answer because a comment would mangle the formatting.
Kindly accept @Dudu's answer. Run the following on the metastore DB (MySQL in my case):
mysql> select part_id, location, tbl_id, part_name from PARTITIONS as P inner join SDS as S on P.SD_ID = S.SD_ID where P.TBL_ID = 472;
+---------+-------------------------------------------------------------------------+--------+--------------------------------------+
| part_id | location | tbl_id | part_name |
+---------+-------------------------------------------------------------------------+--------+--------------------------------------+
| 7 | hdfs://hostname:8020/tmp/multi_part/2011/01/2011-01 | 472 | year=2011/month=1/year_month=2011-01 |
| 9 | hdfs://hostname:8020/tmp/multi_part/2012/01/2012-01 | 472 | year=2012/month=1/year_month=2012-01 |
+---------+-------------------------------------------------------------------------+--------+--------------------------------------+
2 rows in set (0.00 sec)
Both queries will read data from the same HDFS directory.
The only difference in speed will be from the metastore DB query that is already explained in Dudu's answer.

SELF-JOIN discarding true CROSS JOIN rows

I have the following query. It returns ticket information, and I use a self-join to get the requester and the assignee in the same row:
SELECT z.id AS TICKET, z.name AS Subject, reqs.name AS Requester, techs.name AS Assignee,
e.name AS Entity,DATE_FORMAT(tt.date,'%y%-%m%-%d') AS DATE,
DATE_FORMAT(tt.date,'%T') AS HOUR,
CASE WHEN z.priority = 6 THEN 'Mayor' WHEN z.priority = 5 THEN 'Muy urgente' WHEN z.priority = 4 THEN 'Urgente' WHEN z.priority = 3 THEN 'Mediana' WHEN z.priority = 2 THEN 'Baja' WHEN z.priority = 1 THEN 'Muy baja' END AS Priority,
c.name AS Category, i.name AS Department
FROM glpi_tickets_users tureq
JOIN glpi_tickets_users tutech ON tureq.tickets_id = tutech.tickets_id
JOIN glpi_users AS reqs ON tureq.users_id = reqs.id
JOIN glpi_users AS techs ON tutech.users_id = techs.id
JOIN glpi_tickets z ON z.id = tureq.tickets_id
LEFT OUTER JOIN glpi_tickettasks tt ON z.id = tt.tickets_id
LEFT JOIN glpi_itilcategories i ON z.itilcategories_id = i.id
LEFT JOIN glpi_usercategories c ON c.id = reqs.usercategories_id
INNER JOIN glpi_entities e ON z.entities_id = e.id
WHERE (tureq.id < tutech.id AND tureq.type < tutech.type) OR
(tureq.id < tutech.id AND tureq.users_id = tutech.users_id) OR
(tureq.id = tutech.id AND tureq.users_id = tutech.users_id)
The problem is that I get something like this:
1 Report jdoe jdoe Development 16-06-07 11:56:17 Mediana Software Mkt
1 Report jdoe fwilson Development 16-06-07 11:56:17 Mediana Software Mkt
1 Report fwilson fwilson Development 16-06-07 11:56:17 Mediana Software Mkt
2 Task11 gwilliams gwilliams Ops 16-06-08 12:00:00 ALTA Hardware Def
3 Task12 gwilliams gwilliams Ops 16-06-08 12:01:00 ALTA Hardware Def
I don't want the first and third rows because they are CROSS JOIN results. The second row is OK, because jdoe is the requester and fwilson the assignee.
The problem is that sometimes the requester and the assignee are the same person, e.g. someone creates a ticket for a task they will do themselves. For example, the 4th and 5th rows are OK.
So how can I distinguish between these cases? I need to include rows where:
tureq.id = tutech.id AND tureq.users_id = tutech.users_id
BUT NOT IF there is already a row where:
tureq.id = tutech.id AND tureq.users_id <> tutech.users_id
Update
The main problem is that a user can assign a ticket to himself:
SELECT * from glpi_tickets_users WHERE type = 2 GROUP BY tickets_id HAVING COUNT(users_id)<2 limit 3;
+----+------------+----------+------+------------------+-------------------+
| id | tickets_id | users_id | type | use_notification | alternative_email |
+----+------------+----------+------+------------------+-------------------+
| 1 | 2 | 12 | 2 | 1 | NULL |
| 3 | 6 | 13 | 2 | 1 | NULL |
| 7 | 8 | 14 | 2 | 1 | NULL |
+----+------------+----------+------+------------------+-------------------+
Update 2:
It was a human mistake. The problem was not really about self-assigned tickets. Rather, some tickets either had no requester, or had a requester but no resolver assigned yet.
I've found
As there are always the two types per ticket that you are interested in, you can simply select the corresponding records to get the requester and assignee per ticket.
select
t.id as ticket,
t.name as subject,
requester.name as requester,
assignee.name as assignee,
e.name as entity,
date_format(tt.date,'%y%-%m%-%d') as date,
date_format(tt.date,'%T') as hour,
case t.priority
when 6 then 'Mayor'
when 5 then 'Muy urgente'
when 4 then 'Urgente'
when 3 then 'Mediana'
when 2 then 'Baja'
when 1 then 'Muy baja'
end as priority,
uc.name as category,
ic.name as department
from glpi_tickets t
join glpi_entities e on e.id = t.entities_id
join
(
select tu.tickets_id, u.name, u.usercategories_id
from glpi_tickets_users tu
join glpi_users u on u.id = users_id
where tu.type = 1
) requester on requester.tickets_id = t.id
join
(
select tu.tickets_id, u.name
from glpi_tickets_users tu
join glpi_users u on u.id = users_id
where tu.type = 2
) assignee on assignee.tickets_id = t.id
left join glpi_itilcategories ic on ic.id = t.itilcategories_id
left join glpi_usercategories uc on uc.id = requester.usercategories_id
left outer join glpi_tickettasks tt on tt.tickets_id = t.id;
The only thing I wonder about: there can be several ticket tasks per ticket. What do you want to do then? Have one line per ticket task in your results? That is what the query does. But it looks odd that your result rows contain no information on the tasks except for the dates, so you may get many lines with the same data that differ only in the date. Maybe you'd rather want the first or last date per ticket. To get the last date per ticket, replace the last line of the query with:
left outer join
(
select tickets_id, max(date) as date
from glpi_tickettasks
group by tickets_id
) tt on tt.tickets_id = t.id
And you probably want to add an ORDER BY clause.
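A reduced sketch of this approach on an in-memory SQLite database (a stand-in for MySQL): the tables keep only the columns the joins need, and the ticket, user and task rows are made-up values modeled on the question's output. It shows that splitting `glpi_tickets_users` by `type` yields one row per ticket, including a self-assigned one, and that the `max(date)` subquery keeps multiple tasks from multiplying the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE glpi_users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE glpi_tickets (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE glpi_tickets_users (id INTEGER PRIMARY KEY, tickets_id INTEGER,
                                 users_id INTEGER, type INTEGER);
CREATE TABLE glpi_tickettasks (id INTEGER PRIMARY KEY, tickets_id INTEGER, date TEXT);
INSERT INTO glpi_users VALUES (1, 'jdoe'), (2, 'fwilson'), (3, 'gwilliams');
INSERT INTO glpi_tickets VALUES (1, 'Report'), (2, 'Task11');
-- type 1 = requester, type 2 = assignee
INSERT INTO glpi_tickets_users VALUES
    (1, 1, 1, 1), (2, 1, 2, 2),   -- ticket 1: jdoe requests, fwilson is assigned
    (3, 2, 3, 1), (4, 2, 3, 2);   -- ticket 2: gwilliams assigned to himself
INSERT INTO glpi_tickettasks VALUES
    (1, 1, '2016-06-07 11:56:17'), (2, 1, '2016-06-08 09:00:00'),
    (3, 2, '2016-06-08 12:00:00');
""")

rows = conn.execute("""
SELECT t.id, t.name, requester.name, assignee.name, tt.date
FROM glpi_tickets t
JOIN (SELECT tu.tickets_id, u.name FROM glpi_tickets_users tu
      JOIN glpi_users u ON u.id = tu.users_id WHERE tu.type = 1
     ) requester ON requester.tickets_id = t.id
JOIN (SELECT tu.tickets_id, u.name FROM glpi_tickets_users tu
      JOIN glpi_users u ON u.id = tu.users_id WHERE tu.type = 2
     ) assignee ON assignee.tickets_id = t.id
-- latest task date per ticket, so several tasks do not multiply the rows
LEFT JOIN (SELECT tickets_id, MAX(date) AS date FROM glpi_tickettasks
           GROUP BY tickets_id) tt ON tt.tickets_id = t.id
ORDER BY t.id
""").fetchall()
print(rows)
# [(1, 'Report', 'jdoe', 'fwilson', '2016-06-08 09:00:00'),
#  (2, 'Task11', 'gwilliams', 'gwilliams', '2016-06-08 12:00:00')]
```

Tickets without a requester or without an assignee drop out because of the inner joins; switching those two joins to `LEFT JOIN` would keep them, with NULL names.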
You need to add more qualifiers to your joins, for example:
JOIN glpi_tickets_users tutech ON tureq.tickets_id = tutech.tickets_id and tutech.type = 2

Constructing an SQL query for schema

I have the following database schema for an attendance system:
How would I write an SQL query to generate a good report of entries on day X? I need it to generate a report that has
Employee Name | TimeIn | TimeOut
Bob | 10:00 | 11:00
Sam | 10:30 | 18:00
Bob | 11:30 | 15:00
but whether a row is a time in or a time out is defined by entryType (1 = in, 0 = out), so I would need to alias them as TimeIn and TimeOut.
My attempt was:
`SELECT firstName, time from log INNER JOIN users on log.employeeID = users.employeeID WHERE date = GETDATE()`
but this doesn't handle the fact that some times are entries and some are exits.
Note that there can be multiple sign ins per date.
Update:
Another attempt, but the subqueries return multiple rows:
select firstName, (select time as timeIn from log where entryType = 1), (select time as timeOut from log where entryType = 0) inner join users on log.uID = users.uID from log group by uID
This works in Oracle (apologies for the non-ANSI join style, but you should get the drift).
SELECT FORENAME,SURNAME,L1.TIME IN_TIME,L2.TIME OUT_TIME
FROM EMPLOYEES EMP, LOG L1, LOG L2
WHERE EMP.EMPLOYEE_ID = L1.EMPLOYEE_ID
AND EMP.EMPLOYEE_ID = L2.EMPLOYEE_ID
AND L1.ENTRYTYPE = 1
AND L2.ENTRYTYPE = 0
AND L2.TIME = (SELECT MIN(TIME) FROM LOG WHERE EMPLOYEE_ID = L2.EMPLOYEE_ID AND ENTRYTYPE = 0 AND TIME > L1.TIME)
Update:
Ah yes, I hadn't considered a sign-in without a matching sign-out. In that case you need an outer join, something like this (untested):
SELECT FORENAME,SURNAME,L1.TIME IN_TIME,L2.TIME OUT_TIME
FROM EMPLOYEES EMP
INNER JOIN LOG L1 ON EMP.EMPLOYEE_ID = L1.EMPLOYEE_ID AND L1.ENTRYTYPE = 1
LEFT OUTER JOIN LOG L2 ON EMP.EMPLOYEE_ID = L2.EMPLOYEE_ID AND L2.ENTRYTYPE = 0
AND L2.TIME = (SELECT MIN(TIME) FROM LOG WHERE EMPLOYEE_ID = L2.EMPLOYEE_ID AND ENTRYTYPE = 0 AND TIME > L1.TIME)
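A sketch of the outer-join pairing on an in-memory SQLite database, used only as a stand-in for Oracle. Bob's and Sam's times come from the question's desired report; "Ann", who has signed in but not out, is a hypothetical extra row that exercises the outer join. The sketch assumes times stored as 'HH:MM' text (so string comparison orders them correctly) and omits the date filter. Each sign-in is paired with the earliest later sign-out, so two consecutive sign-ins without a sign-out between them would both pair with the same sign-out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEES (EMPLOYEE_ID INTEGER PRIMARY KEY, FORENAME TEXT);
CREATE TABLE LOG (EMPLOYEE_ID INTEGER, ENTRYTYPE INTEGER, TIME TEXT);
INSERT INTO EMPLOYEES VALUES (1, 'Bob'), (2, 'Sam'), (3, 'Ann');
INSERT INTO LOG VALUES
    (1, 1, '10:00'), (1, 0, '11:00'),
    (2, 1, '10:30'), (2, 0, '18:00'),
    (1, 1, '11:30'), (1, 0, '15:00'),
    (3, 1, '12:00');  -- hypothetical: Ann has not signed out yet
""")

rows = conn.execute("""
SELECT EMP.FORENAME, L1.TIME AS IN_TIME, L2.TIME AS OUT_TIME
FROM EMPLOYEES EMP
JOIN LOG L1 ON L1.EMPLOYEE_ID = EMP.EMPLOYEE_ID AND L1.ENTRYTYPE = 1
-- pair each sign-in with the same employee's first sign-out after it;
-- LEFT JOIN keeps sign-ins that have no sign-out yet (OUT_TIME is NULL)
LEFT JOIN LOG L2 ON L2.EMPLOYEE_ID = EMP.EMPLOYEE_ID AND L2.ENTRYTYPE = 0
 AND L2.TIME = (SELECT MIN(x.TIME) FROM LOG x
                WHERE x.EMPLOYEE_ID = L1.EMPLOYEE_ID
                  AND x.ENTRYTYPE = 0 AND x.TIME > L1.TIME)
ORDER BY L1.TIME
""").fetchall()
print(rows)
# [('Bob', '10:00', '11:00'), ('Sam', '10:30', '18:00'),
#  ('Bob', '11:30', '15:00'), ('Ann', '12:00', None)]
```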
This simpler query should work. Try this:
SELECT FORENAME,SURNAME,LG.IN_TIME,LG.OUT_TIME FROM EMPLOYEES EMP INNER JOIN
(SELECT MIN(TIME) IN_TIME,MAX(TIME) OUT_TIME,EMPLOYEE_ID FROM LOG
GROUP BY EMPLOYEE_ID) LG ON EMP.EMPLOYEE_ID=LG.EMPLOYEE_ID
Note: I didn't include the entry type because the minimum time will always be a swipe-in and the maximum time a swipe-out.
Updated
To show the number of sign-ins and sign-outs, try something like this:
SELECT FORENAME,SURNAME,LG.IN_TIME,LG.OUT_TIME,LG.no_of_ins,
LG.no_of_outs FROM EMPLOYEES EMP INNER JOIN
(SELECT MIN(TIME) IN_TIME,MAX(TIME) OUT_TIME,EMPLOYEE_ID,
SUM(CASE WHEN ENTRYTYPE = 1 THEN 1 ELSE 0 END) no_of_ins,
SUM(CASE WHEN ENTRYTYPE = 0 THEN 1 ELSE 0 END) no_of_outs
FROM LOG
GROUP BY EMPLOYEE_ID) LG ON EMP.EMPLOYEE_ID=LG.EMPLOYEE_ID
This query will give you the earliest time in and latest time out of an employee.
SELECT E.FORENAME,
(SELECT MIN(TIME) FROM LOG WHERE EMPLOYEEID = E.EMPLOYEEID AND ENTRYTYPE = 1 AND DATE = <YOUR DATE>) AS "TIME_IN",
(SELECT MAX(TIME) FROM LOG WHERE EMPLOYEEID = E.EMPLOYEEID AND ENTRYTYPE = 0 AND DATE = <YOUR DATE>) AS "TIME_OUT"
FROM EMPLOYEE E WHERE E.EMPLOYEEID = <EMPLOYEE ID>