First post! I'm a SQL newbie and am trying to query a huge data set into something manageable.
Data below is a for a Dr. office. I have the appointment ID (don't care about the patient name for this), which can have a few different associated events. I want to show all of those events as timestamps by column, with take the most-recent one if there are multiples (the patient rescheduled).
From there, I'll datediff to get the different breakdowns, but I'm not sure how to get there. I've been searching and must be using the wrong terms, so if this has been answered elsewhere, please link me and don't use your time to explain.
Thanks for your help!
Use something resembling the following pattern.
SELECT s1.id "ID",
s2.time "Scheduled",
...
s6.time "Depart Office",
datediff(minute, s3.time, s4.time) "Wait Time"
FROM (SELECT DISTINCT
id
FROM elbat) s1
LEFT JOIN (SELECT id,
time
FROM elbat s2
WHERE event = 'Scheduled') s2
ON s2.id = s1.id
...
LEFT JOIN (SELECT id,
time
FROM elbat
WHERE event = 'Depart Office') s6
ON s6.id = s1.id;
I chose the SQL Server datediff() syntax. You may need to rewrite it for your DBMS. If you have a patient table, you should also replace (SELECT DISTINCT id FROM elbat) with that patient table. You might also need to come up with a method for multiple appointments for a patient. Include e.g. the day in the joins (or any other attribute with which the different appointments can be distinguished).
Related
Using Oracle SQL Developer, I have three tables with some common data that I need to join.
Appreciate any help on this!
Please refer to https://i.stack.imgur.com/f37Jh.png for the input and desired output (table formatting doesn't work on all tables).
These tables are made up in order to anonymize them, and in reality contain other data with millions of entries, but you could think of them as representing:
Product = Main product categories in a grocery store.
Subproduct = Subcategory products to the above. Each time the table is updated, the main product category may loses or get some new suproducts assigned to it. E.g. you can see that from May to June the Pulled pork entered while the Fishsoup was thrown out.
Issues = Status of the products, for example an apple is bad if it has brown spots on it..
What I need to find is: for each P_NAME, find the latest updated set of subproducts (SP_ID and SP_NAME), and append that information with the latest updated issue status (STATUS_FLAG).
Please note that each main product category gets its set of subproducts updated at individual occasions i.e. 1234 and 5678 might be "latest updated" on different dates.
I have tried multiple queries but failed each time. I am using combos of SELECT, LEFT OUTER JOIN, JOIN, MAX and GROUP BY.
Latest attempt, which gives me the combo of the first two tables, but missing the third:
SELECT
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE
FROM SUBPRODUCT
LEFT OUTER JOIN PRODUCT ON PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
JOIN(SELECT SP_PRODUCT_ID, MAX(SP_VALUE_DATE) AS latestdate FROM SUBPRODUCT GROUP BY SP_PRODUCT_ID) sub ON
sub.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID AND sub.latestDate = SUBPRODUCT.SP_VALUE_DATE;
Trying to find a row with a max value is a common SQL pattern - you can do it with a join, like your example, but it's usually more clear to use a subquery or a window function.
Correlated subquery example
select
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE,
ISSUES.STATUS_FLAG, ISSUES.STATUS_LAST_UPDATED
from PRODUCT
join SUBPRODUCT
on PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
and SUBPRODUCT.SP_VALUE_DATE = (select max(S2.SP_VALUE_DATE) as latestDate
from SUBPRODUCT S2
where S2.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID)
join ISSUES
on ISSUES.ISSUE_ID = SUBPRODUCT.SP_ID
and ISSUES.STATUS_LAST_UPDATED = (select max(I2.STATUS_LAST_UPDATED) as latestDate
from ISSUES I2
where I2.ISSUE_ID = ISSUES.ISSUE_ID)
Window function / inline view example
select
PRODUCT.P_NAME,
S.SP_PRODUCT_ID, S.SP_NAME, S.SP_ID, S.SP_VALUE_DATE,
I.STATUS_FLAG, I.STATUS_LAST_UPDATED
from PRODUCT
join (select SUBPRODUCT.*,
max(SP_VALUE_DATE) over (partition by SP_PRODUCT_ID) as latestDate
from SUBPRODUCT) S
on PRODUCT.P_ID = S.SP_PRODUCT_ID
and S.SP_VALUE_DATE = S.latestDate
join (select ISSUES.*,
max(STATUS_LAST_UPDATED) over (partition by ISSUE_ID) as latestDate
from ISSUES) I
on I.ISSUE_ID = S.SP_ID
and I.STATUS_LAST_UPDATED = I.latestDate
This often performs a bit better, but window functions can be tricky to understand.
I'm facing a problem successfully completing (running) a query on a singular table in Access 2013 using SQL to complete a Datediff on consecutive/Sequential rows of timestamps, which track status changes in tickets going through our ticketing system.
The table titled: dbo_Master3_FieldHistory, has a field which tracks timestamps each time a ticket's status changes. Unfortunately, it only includes 1 timestamp per change, meaning it doesn't inherently have a secondary timestamp for when the status is changed again, which I need to run a DateDiff to calculate AGE for tickets, based on Status.
I found a plausible solution for this on StackOverflow, linked below. When i tried to implement this solution, as is with minor adjustments, and including adjustments for filtering out old data and particular fields, it just freezes my Access program and never times out (have to force close Access)
Date Difference between consecutive rows
'This is the basic code, traslated from the linked StackOverflow solution to fit this tables fields (I believed)
SELECT T.mrID, T.mrSEQUENCE, T.mrUSERID, T.mrFIELDNAME, T.mrNEWFIELDVALUE, T.mrOLDFIELDVALUE, T.mrTIMESTAMP, T.mrNextTIMESTAMP, DateDiff("s",T.mrTIMESTAMP, T.mrNextTIMESTAMP) AS STATUSTIME
FROM (
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP,
(SELECT MIN(mrTIMESTAMP)
FROM dbo_MASTER3_FIELDHISTORY AS T2
WHERE T2.mrID = T1.mrID
AND T2.mrTIMESTAMP > T1.mrTIMESTAMP
) As mrNextTIMESTAMP
FROM dbo_MASTER3_FIELDHISTORY AS T1
) AS T
'This is the code that I wanted to use to account for filtering out two particular fields, limiting the data to tickets (mrID) newer than 1/1/2018 and only those where the mrFIELDNAME is mrSTATUS
SELECT T.mrID, T.mrSEQUENCE, T.mrUSERID, T.mrFIELDNAME, T.mrNEWFIELDVALUE, T.mrOLDFIELDVALUE, T.mrTIMESTAMP, T.mrNextTIMESTAMP, DateDiff("s",T.mrTIMESTAMP, T.mrNextTIMESTAMP) AS STATUSTIME
FROM (
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP,
(SELECT MIN(mrTIMESTAMP)
FROM dbo_MASTER3_FIELDHISTORY AS T2
WHERE mrFIELDNAME = "mrSTATUS"
AND T2.mrID = T1.mrID
AND T2.mrTIMESTAMP > T1.mrTIMESTAMP
) As T1.mrNextTIMESTAMP
FROM dbo_MASTER3_FIELDHISTORY AS T1
WHERE mrFIELDNAME = "mrSTATUS"
AND mrTIMESTAMP >= #1/1/2018#
) AS T;
Access freezes when I try to run these queries. I've tried several ways but can't get it to work
I was able to figure it out, thank you to those who took your time to read through this interesting challenge. Instead of using the second code set in the link provided, I utilized the first and it worked beautifully. With some additions to the code to account for other filters/criteria, I have the results I need.
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP, MIN(T2.mrTIMESTAMP) AS mrNextTIMESTAMP, DATEDIFF("s", T1.mrTIMESTAMP, MIN(T2.mrTIMESTAMP)) AS TimeInStatus
FROM ((dbo_MASTER3_FIELDHISTORY AS T1 LEFT JOIN dbo_MASTER3_FIELDHISTORY AS T2 ON (T2.mrTIMESTAMP > T1.mrTIMESTAMP) AND (T1.mrID = T2.mrID)) INNER JOIN dbo_MASTER3 AS T4 ON (T4.mrID = T1.mrID))
WHERE T4.mrSUBMITDATE >= #1/1/2018#
AND t1.mrFIELDNAME = "mrSTATUS"
AND NOT T4.mrSTATUS="_Deleted_"
AND NOT T4.mrSTATUS="_SOLVED_"
AND NOT T4.mrSTATUS="_PENDING_SOLUTION_"
GROUP BY T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP
ORDER BY T1.mrID, T1.mrTIMESTAMP;
Sincerely,
Kristopher
i want to group this query by project only because there are two records of same project but i only want one.
But when i add group by clause it asks me to add other columns as well by which grouping does not work.
*
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from Filterednew_projektkondition ps
left join Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
--and new_projekt= Filterednew_projekt.new_
--group by new_projekt
*
look at the column new_projekt . row 2 and 3 has same project, but i want it to appear only once. Due to different other columns this is not possible.
if its of interested , there is another coluim projectcondition id which is unique for both
You can't ask a database to decide arbitrarily for you, which records should be thrown away when doing a group. You have to be precise and specific
Example, here is some data about a person:
Name, AddressZipCode
John Doe, 90210
John Doe, 12345
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id
There are two addresses stored for this one guy, the person data is repeated in the output!
"I don't want that. I only want to see one line for this guy, together with his address"
Which address?
That's what you have to tell the database
"Well, obviously his current address"
And how do you denote that an address is current?
"It's the one with the null enddate"
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id WHERE address.enddate = null
If you still get two addresses out, there are two address records that are null - you have data that is in violation of your business data modelling principles ("a person's address history shall have at most one adddress that is current, denoted by a null end date") - fix the data
"Why can't i just group by name?"
You can, but if you do, you still have to tell the database how to accumulate the non-name data that it shows you. You want an address data out of it, it has 2 it wants to show you, you have to tell it which to discard. You could do this:
SELECT name, MAX(addresszipcode) FROM person INNER JOIN address on address.personid = person.id GROUP BY name
"But I don't want the max zipcode? That doesn't make sense"
OK, use the MIN, the SUM, the AVG, anything that makes sense. If none of these make sense, then use something that does, like the address line that has the highest end date, or the lowest end date that is a future end date. If you only want one address on show you must decide how to boil that data down to just one record - you have to write the rule for the database to follow and no question about it you have to create a rule so make it a rule that describes what you actually want
Ok, so you created a rule - you want only the rows with the minimum new_stundenstatz
DECLARE #Year varchar(75) = '2018'
DECLARE #von DateTime = '1.09.2018'
DECLARE #bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from
(SELECT *, ROW_NUMBER() OVER(PARTITON BY new_projekt ORDER BY new_stundensatz) rown FROM Filterednew_projektkondition) ps
left join
Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= #von +'00:00:00'
and fp.new_enddatum <= #bis +'23:59:59'
and ps.rown = 1
Here I've used an analytic operation to number the rows in your PS table. They're numbered in order of ascending new_stundensatz, starting with 1. The numbering restarts when the new_projekt changes, so each new_projekt will have a number 1 row.. and then we make that a condition of the where
(Helpful side note for applying this technique in future.. Ff it were the FP table we were adding a row number to, we would need to put AND fp.rown= 1 in the ON clause, not the WHERE clause, because putting it in the where would make the LEFT join behave like an INNER, hiding rows that don't have any FP matching record)
I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as maxyear)
from SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
join ( -- your query above goes here! -- ) T
ON year(T.SO_SalesOrderPaymentHistoryLineT.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
WHERE
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of BaseCustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Intersting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
Thank you John for your help. I found a solution which works for me. It looks quiet different but I learned a lot out of it. If you are interested here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;
We have a database that stores vehicle's gps position, date, time, vehicle identification, lat, long, speed, etc., every minute.
The following select pulls each vehicle position and info, but the problem is that returns the first record, and I need the last record (current position), based on date (datagps.Fecha) and time (datagps.Hora). This is the select:
SELECT configgps.Fichagps,
datacar.Ficha,
groups.Nombre,
datagps.Hora,
datagps.Fecha,
datagps.Velocidad,
datagps.Status,
datagps.Calleune,
datagps.Calletowo,
datagps.Temp,
datagps.Longitud,
datagps.Latitud,
datagps.Evento,
datagps.Direccion,
datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
Group by Fichagps;
I need same result I'm getting, but instead of the older record I need the most recent
(LAST datagps.Fecha / datagps.Hora).
How can I accomplish this?
Add ORDER BY datagps.Fecha DESC, datagps.Hora DESC LIMIT 1 to your query.
I'm not sure why you are having any problems with this as Lex's answers seem good.
I would start putting ORDER BY's in your query so it puts them in an order, when it's showing the record you want as the first one in the list, then add the LIMIT.
If you want the most recent, then the following should be good enough:
ORDER BY datagps.Fecha DESC, datagps.Hora DESC
If you simply want the record that was added to the database most recently (irregardless of the date/time fields), then you could (assuming you have an auto-incremental primary key in the datagps table (I assume it's called dataID for this example)):
ORDER BY datagps.dataID DESC
If these aren't showing the data you want - then there is something missing from your example (maybe data-types aren't DATETIME fields? - if not - then maybe a CONVERT to change them from their current type before ORDERing BY would be a good idea)
EDIT:
I've seen the screenshot and I'm confused as to what the issue is still. That appears to be showing everything in order. Are you implying that there are many more than 5 records? How many are you expecting?
Do you mean: for each record returned, you want the one row from the table datagps with the latest date and time attached to the result? If so, how about this:
# To show how the query will be executed
# comment to return actual results
EXPLAIN
SELECT
configgps.Fichagps, datacar.Ficha, groups.Nombre, datagps.Hora, datagps.Fecha,
datagps.Velocidad, datagps.Status, datagps.Calleune, datagps.Calletowo,
datagps.Temp, datagps.Longitud, datagps.Latitud, datagps.Evento,
datagps.Direccion, datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
########### Add this section
LEFT JOIN datagps b ON (
configgps.Fichagps = b.Fichagps
# wrong condition
#AND datagps.Hora < b.Hora
#AND datagps.Fecha < b.Fecha)
# might prevent indexes to be used
AND (datagps.Fecha < b.Fecha OR (datagps.Fecha = b.Fecha AND datagps.Hora < b.Hora))
WHERE b.Fichagps IS NULL
###########
Group by configgps.Fichagps;
Similar question here only that that one uses outer joins.
Edit (again):
The conditions are wrong so corrected it. Can you show us the output of the above EXPLAIN query so we can pinpoint where the bottle neck is?
As hurikhan77 said, it will be better if you could convert both of the the columns into a single datetime field - though I'm guessing this would not be possible for your case (since your database is already being used?)
Though if you can convert it, the condition (on the join) would become:
AND datagps.FechaHora < b.FechaHora
After that, add an index for datagps.FechaHora and the query would be fast(er).
What you probably want is getting the maximum of (Fecha,Hora) per grouped dataset? This is a little complicated to accomplish with your column types. You should combine Fecha and Hora into one column of type DATETIME. Then it's easy to just SELECT MAX(FechaHora) ... GROUP BY Fichagps.
It could have helped if you posted your table structure to understand the problem.