Aggregate function in comparison of 2 rows in the same table (SQL) - sql

Given the table definition:
create table mytable (
id integer,
mydate datetime,
myvalue integer )
I want to get the following answer by a single SQL query:
id date_actual value_actual date_previous value_previous
where:
date_previous is the maximum of all the dates preceeding date_actual
for each id and values correspond with the two dates
{max(date_previous) < date_actual ?}
How can I achieve that?
Thanks for your hints

This is a variation of the common "greatest N per group" query which comes up every week on StackOverflow.
SELECT m1.id, m1.mydate AS date_actual, m1.myvalue AS value_actual,
m2.mydate AS date_previous, m2.myvalue AS value_previous
FROM mytable m1
LEFT OUTER JOIN mytable m2
ON (m1.id = m2.id AND m1.mydate > m2.mydate)
LEFT OUTER JOIN mytable m3
ON (m1.id = m3.id AND m1.mydate > m3.mydate AND m3.mydate > m2.mydate)
WHERE m3.id IS NULL;
In other words, m2 is all rows with the same id and a lesser mydate, but we want only the one such that there is no row m3 with a date between m1 and m2. Assuming the dates are unique, there will only be one row in m2 where this is true.

Assuming I understood your requirements correctly, here's something you can try.
select a.id,
a.mydate as date_actual,
a.value as value_actual,
b.date as date_previous,
b.value as value_previous
from mytable a, mytable b
where a.id = b.id and
a.mydate > b.mydate and
b.mydate = (select max(mydate) from mytable c where c.id = a.id and c.mydate < a.mydate)
Apologies for the ugly SQL. I am sure there are better ways of doing this.

Related

proc sql join in SAS that is closest to a date

How can I do a one-to-many join between two datasets with proc sql in SAS to obtain the record in Dataset B is closest to a value in Dataset A?
Dataset A
#Patient #Date of Dose
001 2020-02-01
Dataset B
# Patient # Lab Test #Date of Test # Value
001 Test 1 2020-01-17 6
001 Test 1 2020-01-29 10
I want to do the join to select the second record in Dataset B, the record with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.
I want to do the join to select the [..] record in Dataset B [...] with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.
You could use outer appy - if sas supports that:
select a.*, b.*
from a
outer apply(
select top 1 b.*
from b
where b.patient = a.patient and b.date_of_test <= a.date_of_dose
order by b.date_of_test desc
) b
Another solution is to join with a not exists condition:
select a.*, b.*
from a
left join b
on b.patient = a.patient
and b.date_of_test <= a.date_of_dose
and not exists (
select 1
from b b1
where
b1.patient = a.patient
and b1.date_of_test <= a.date_of_dose
and b1.date_of_test > b.date_of_test
)
Calculate the absolute difference between both dates and select the minimum date with a having clause. You'll need to do additional logic, such as distinct, to remove any duplicates.
proc sql noprint;
select t1.patient
, t1.date_of_dose
, abs(t1.date - t2.date) as date_dif
from dataset_A as t1
LEFT JOIN
dataset_B as t2
ON t1.patient = t2.patient
where t1.date <= t2.date
group by t1.patient
having calculated date_dif = min(calculated date_dif)
;
quit;
try this:
SELECT TOP 1 *
FROM (
SELECT DSA.Patient
,DSA.Date_Of_Dose
,DSB.Date_Of_Test
,DATEDIFF(Day, DSA.Date_Of_Dose, DSA.Date_Of_Dose) Diff
FROM DataSetA DSA
JOIN DataSetB DSB ON DSA.Patient = DSB.Patient
) Data
WHERE ABS(Diff) = MIN(ABS(Diff));
Sorry, I got no way to know if this is working 'cause I'm not at home. I hope it helps you.
I want to do the join to select the second record in Dataset B, the record with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.
I would recommend cross-joining Date of Test and Date of Dose, and calculate absolute difference between the dates using intck() function, and leave the minimal value.

How to do a join in Oracle tables?

I am trying to help a co-worker do an inner join on two oracle tables so he can build a particular graph on a report.
I have no Oracle experience, only SQL Server and have gotten to what seems like the appropriate statement, but does not work.
SELECT concat(concat(month("a.timestamp"),','),day("a.timestamp")) as monthDay
, min("a.data_value") as minTemp
, max("a.data_value") as maxTemp
, "b.forecast" as forecastTemp
, "a.timestamp" as date
FROM table1 a
WHERE "a.category" = 'temperature'
GROUP BY concat(concat(month("timestamp"),','),day("timestamp"))
INNER JOIN (SELECT "forecast"
, "timestamp"
FROM table2
WHERE "category" = 'temperature') b
ON "a.timestamp" = "b.timestamp"
It doesn't like my aliases for some reason. It doesn't like not having quotes for some reason.
Also when I use the fully scored names it still fails because:
ORA-00933 SQL command not properly ended
The order of the query should be
SELECT
FROM
INNER JOIN
WHERE
GROUP BY
as below
SELECT concat(concat(month("a.timestamp"),','),day("a.timestamp")) as monthDay
, min("a.data_value") as minTemp
, max("a.data_value") as maxTemp
, "b.forecast" as forecastTemp
, "a.timestamp" as date
FROM table1 a
INNER JOIN (SELECT "forecast"
, "timestamp"
FROM table2
WHERE "category" = 'temperature') b
ON "a.timestamp" = "b.timestamp"
WHERE "category" = 'temperature'
GROUP BY concat(concat(month("timestamp"),','),day("timestamp"))
In a flood of attempts, here's yet another one.
table2 can be moved out of subquery; join it with table1 on category as well
note that all non-aggregates columns (from the SELECT) have to be contained in the GROUP BY clause. It seems that a.timestamp contains more info than just month and day - if that's so, it'll probably ruin the whole result set as data won't be grouped by monthday, but by the whole date - consider removing it from SELECT, if necessary
SELECT TO_CHAR(a.timestamp,'mm.dd') monthday,
MIN(a.data_value) mintemp,
MAX(a.data_value) maxtemp,
b.forecast forecasttemp,
a.timestamp c_date
FROM table1 a
JOIN table2 b ON a.timestamp = b.timestamp
AND a.category = b.category
WHERE a.category = 'temperature'
GROUP BY TO_CHAR(a.timestamp,'mm.dd'),
b.forecast,
a.timestamp;
The correct (simplified) syntax of select is
SELECT <columns>
FROM table1 <alias>
JOIN table2 <alias> <join_condition>
WHERE <condition>
GROUP BY <group by columns>
You are doing it wrong. Use subquery:
SELECT c.*, b.`forecast` as forecastTemp
FROM
(SELECT concat(concat(month(a.`timestamp`),','),day(a.`timestamp`)) as monthDay
, min(a.`data_value`) as minTemp
, max(a.`data_value`) as maxTemp
, a.`timestamp` as date
FROM table1 a
WHERE `category`='temperature'
GROUP BY concat(concat(month(`timestamp`),','),day(`timestamp`))) c
INNER JOIN (SELECT `forecast`
, `timestamp`
FROM table2
WHERE `category` = 'temperature') b
ON c.`timestamp` = b.`timestamp`;
In addition to the order of the components other answers have mentioned (where goes after join etc), you also need to remove all of the double-quote characters. In Oracle, these override the standard naming rules, so "a.category" is only valid if your table actually has a column named, literally, "a.category", e.g.
create table demo ("a.category" varchar2(10));
insert into demo ("a.category") values ('Weird');
select d."a.category" from demo d;
It's quite rare to need to do this.
The query should look something like this:
SELECT to_char(a.timestamp, 'MM-DD') as monthDay,
min(a.data_value) as minTemp,
max(a.data_value) as maxTemp,
b.forecast as forecastTemp
FROM table1 a JOIN
table2 b
ON a.timestamp = b.timestamp and b.category = 'temperature'
WHERE a.category = 'temperature'
GROUP BY to_char(timestamp, 'MM-DD'), b.forecast;
I'm not 100% sure this is what you want. Your query has numerous issues and complexities:
You don't need a subquery in the FROM clause.
You can use to_char() instead of the more complex date string processing.
The group by did not contain all the relevant fields.
Don't use double quotes, unless really, really needed.

Insert into table with multiple joins under a unique condition based off time

I have an insert statment that incorporates multiple joins. However, the last join (table ItemMulitplers) doesnt really have anything "tied" to the other tables. They are just multipliers in this table with no unique identification or connection with others. the only thing is a timestamp from this table.
I have 5 rows in this table and my script is taking all five rows. I need it to select only one and to base it off of the closest time from the table called ItemsProduced. They get executed at the same time but not on the same millisecond level. any help is most appreciated thank you
insert into KLNUser.dbo.ItemLookup (ItemNumber, Cases, [Description], [Type], Wic, Elc, totalelc, Shift, [TimeStamp])
select a.ItemNumber, b.CaseCount,b.ItemDescription, b.DivisionCode, b.WorkCenter, b.LaborPerCase, a.CaseCount* b.LaborPerCase* c.IaCoPc, a.shift, a.TimeStamp from ItemsProduced a
inner join MasterItemList b on a.ItemNumber = b.itemnumber
inner join ItemMultipliers c on c.MultiplyTimeStamp <=a.Timestamp Interval 1 seconds
where not exists (select * from ItemLookup where ItemNumber = a.ItemNumber and Cases = b.CaseCount and [TimeStamp] = a.TimeStamp)
I think the easiest way is with cross apply:
select a.ItemNumber, b.CaseCount,b.ItemDescription, b.DivisionCode, b.WorkCenter, b.LaborPerCase, a.CaseCount* b.LaborPerCase* c.IaCoPc, a.shift, a.TimeStamp
from ItemsProduced a inner join
MasterItemList b
on a.ItemNumber = b.itemnumber cross apply
(select top 1 *
from ItemMultipliers c
where c.MultiplyTimeStamp < a.Timestamp
order by c.MultiplyTimeStamp desc
) c
where not exists (select * from ItemLookup where ItemNumber = a.ItemNumber and Cases = b.CaseCount and [TimeStamp] = a.TimeStamp)

SQL Query to get time periods contained in another one in the same table

I have the following table (person_program):
program_id person_id start_date end_date
1 15588499 01-01-2014 02-16-2014
2 15588499 02-17-2014 03-01-2014
3 15588499 02-15-2014 02-21-2014
I need to get the program_id that are contained by another time period in the same table (in this case, program_id = 3).
Any idea to solve this?
Thanks!!
Yes, you can reference the same table and get overlapping periods:
select t1.program_id ThisOne, t2.program_id OverlapsWith
from person_program t1
inner join person_program t2
on t1.program_id < t2.program_id
and t1.person_id = t2.person_id
and t2.start_date > t1.end_date
SQL Fiddle demo
Try:
select *
from person_program x
where exists (select 1 from person_program y where y.start_date <= x.start_date and y.person_id = x.person_id and y.program_id <> x.program_id)
or exists (select 1 from person_program y where y.end_date >= x.end_date and y.person_id = x.person_id and y.program_id <> x.program_id)
Note: This finds situations where, for the same person_id, there is another record on the table, with a different program_id, having a start and end date FULLY contained within the range of another.
EDIT: I just change the AND to OR, as it looks like, from your post, you are looking for partial, not necessarily full, overlaps. This should do that.
Assuming you are passing in the program_id you want to find overlaps for, and partial overlaps are OK:
DECLARE #program_id int = 3
SELECT PP.program_id,
PP.person_id
FROM person_program PP
INNER JOIN person_program Source
ON PP.start_date <= Source.end_date
AND PP.end_date >= Source.start_date
AND Source.program_id = #program_id

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...