PostgreSQL GROUP BY: SELECT column on MAX of another WHERE a third column = x - sql

Let's suppose we have two tables in PostgreSQL:
Table "citizens"
country_ref citizen_name entry_date
-----------------------------------------------------
0 peter 2013-01-14 21:00:00.000
1 fernando 2013-01-14 20:00:00.000
0 robert 2013-01-14 19:00:00.000
3 albert 2013-01-14 18:00:00.000
2 esther 2013-01-14 17:00:00.000
1 juan 2013-01-14 16:00:00.000
3 egbert 2013-01-14 15:00:00.000
1 francisco 2013-01-14 14:00:00.000
3 adolph 2013-01-14 13:00:00.000
2 emilie 2013-01-14 12:00:00.000
2 jacques 2013-01-14 11:00:00.000
0 david 2013-01-14 10:00:00.000
Table "countries"
country_id country_name country_group
-------------------------------------------
0 england 0
1 spain 0
2 france 1
3 germany 1
Now I want to obtain the last entered citizen on the "citizens" table for each country of a given country_group.
My best attempt so far is this query (let's call it Query_1):
SELECT country_ref, MAX(entry_date) FROM citizens
LEFT JOIN countries ON country_id = country_ref
WHERE country_group = 1 GROUP BY country_ref
Output:
country_ref max
---------------------------------
3 2013-01-14 18:00:00
2 2013-01-14 17:00:00
So then I could do:
SELECT citizen_name FROM citizens WHERE (country_ref, entry_date) IN (Query_1)
... which will give me the output I'm looking for: albert and esther.
But I'd prefer to achieve this in a single query. I wonder if it's possible?

This should be simplest and fastest:
SELECT DISTINCT ON (i.country_ref)
i.citizen_name
FROM citizens i
JOIN countries o ON o.country_id = i.country_ref
WHERE o.country_group = 1
ORDER BY i.country_ref, i.entry_date DESC
You can easily return more columns from both tables by simply adding them to the SELECT list.
Details, links and explanation in this related answer:
Select first row in each GROUP BY group?
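For readers who want to try the idea without a PostgreSQL server, here is a minimal sketch using Python's sqlite3 module and the sample data from the question. SQLite does not support DISTINCT ON, so this swaps in ROW_NUMBER() to pick the newest row per country; it is an equivalent formulation, not the query above. It assumes a SQLite build of 3.25 or later (for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER, country_name TEXT, country_group INTEGER);
CREATE TABLE citizens (country_ref INTEGER, citizen_name TEXT, entry_date TEXT);
INSERT INTO countries VALUES
  (0, 'england', 0), (1, 'spain', 0), (2, 'france', 1), (3, 'germany', 1);
INSERT INTO citizens VALUES
  (0, 'peter',     '2013-01-14 21:00:00'),
  (1, 'fernando',  '2013-01-14 20:00:00'),
  (0, 'robert',    '2013-01-14 19:00:00'),
  (3, 'albert',    '2013-01-14 18:00:00'),
  (2, 'esther',    '2013-01-14 17:00:00'),
  (1, 'juan',      '2013-01-14 16:00:00'),
  (3, 'egbert',    '2013-01-14 15:00:00'),
  (1, 'francisco', '2013-01-14 14:00:00'),
  (3, 'adolph',    '2013-01-14 13:00:00'),
  (2, 'emilie',    '2013-01-14 12:00:00'),
  (2, 'jacques',   '2013-01-14 11:00:00'),
  (0, 'david',     '2013-01-14 10:00:00');
""")

# Rank citizens within each country by entry_date, newest first,
# then keep only the top-ranked row per country.
rows = conn.execute("""
SELECT citizen_name
FROM (
    SELECT i.citizen_name, i.country_ref,
           ROW_NUMBER() OVER (PARTITION BY i.country_ref
                              ORDER BY i.entry_date DESC) AS rn
    FROM citizens i
    JOIN countries o ON o.country_id = i.country_ref
    WHERE o.country_group = 1
) AS ranked
WHERE rn = 1
ORDER BY country_ref
""").fetchall()

latest = [r[0] for r in rows]
print(latest)  # ['esther', 'albert']
```

In PostgreSQL itself the DISTINCT ON form is shorter; the window-function form is mainly useful on engines that lack it.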

SELECT citizen_name,
country_ref,
entry_date
from (
SELECT cit.citizen_name,
cit.country_ref,
MAX(cit.entry_date) over (partition by cit.country_ref) as max_date,
cit.entry_date
FROM citizens cit
LEFT JOIN countries cou ON cou.country_id = cit.country_ref
WHERE cou.country_group = 1
) t
where max_date = entry_date
SQLFiddle demo: http://www.sqlfiddle.com/#!12/50776/1

Why don't you simply:
SELECT citizen_name FROM citizens WHERE (country_ref, entry_date) IN (
SELECT country_ref, MAX(entry_date) FROM citizens
LEFT JOIN countries ON country_id = country_ref
WHERE country_group = 1 GROUP BY country_ref
)
It might not be the best plan, but it depends on many factors, and it is simple to write.

Related

SQL - How to find if the combination of column has occured before or not?

Following example demonstrates the question
id  location  dt
--------------------------
1   India     2020-01-01
2   Usa       2020-02-01
1   Usa       2020-03-01
3   China     2020-04-01
1   India     2020-05-01
2   France    2020-06-01
1   India     2020-07-01
2   Usa       2020-08-01
This table is sorted by date.
I want to create another column, which would tell if the id has been to the location before or not.
So, The output would be like
id  location  dt          travelled
-------------------------------------
1   India     2020-01-01  0
2   Usa       2020-02-01  0
1   Usa       2020-03-01  0
3   China     2020-04-01  0
1   India     2020-05-01  1
2   France    2020-06-01  0
1   India     2020-07-01  1
2   Usa       2020-08-01  1
The issue I am facing is, For every row, I need to consider only the rows above it.
Use EXISTS in a CASE expression:
SELECT t1.id, t1.location, t1.dt,
CASE
WHEN EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.id = t1.id AND t2.location = t1.location AND t2.dt < t1.dt
) THEN 1
ELSE 0
END travelled
FROM tablename t1
I would strongly recommend window functions for this:
select t.*,
(case when row_number() over (partition by id, location order by dt) > 1
then 1 else 0
end) as travelled
from tablename t;
Window functions are usually faster than alternative methods.
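As a quick check that the two answers agree, here is a small sketch that runs both formulations against the sample data with Python's sqlite3 module. The table name visits is an assumption made for this example, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (id INTEGER, location TEXT, dt TEXT);
INSERT INTO visits VALUES
  (1, 'India',  '2020-01-01'), (2, 'Usa',    '2020-02-01'),
  (1, 'Usa',    '2020-03-01'), (3, 'China',  '2020-04-01'),
  (1, 'India',  '2020-05-01'), (2, 'France', '2020-06-01'),
  (1, 'India',  '2020-07-01'), (2, 'Usa',    '2020-08-01');
""")

# Formulation 1: correlated EXISTS looking for an earlier row
# with the same id and location.
exists_sql = """
SELECT CASE WHEN EXISTS (
           SELECT 1 FROM visits t2
           WHERE t2.id = t1.id AND t2.location = t1.location AND t2.dt < t1.dt
       ) THEN 1 ELSE 0 END AS travelled
FROM visits t1
ORDER BY t1.dt
"""

# Formulation 2: any row after the first within (id, location) is a repeat.
window_sql = """
SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, location ORDER BY dt) > 1
            THEN 1 ELSE 0 END AS travelled
FROM visits
ORDER BY dt
"""

exists_flags = [r[0] for r in conn.execute(exists_sql)]
window_flags = [r[0] for r in conn.execute(window_sql)]
print(exists_flags)  # [0, 0, 0, 0, 1, 0, 1, 1]
print(window_flags)  # [0, 0, 0, 0, 1, 0, 1, 1]
```

The two can diverge if the same id and location share an identical dt: the window version flags one of the tied rows, while the EXISTS version flags neither.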

CREATE TEMP TABLE BASED ON SELECT DISTINCT ON 3 COLUMNS BUT WITH 1 EXTRA COLUMN

I need to make a temporary table containing:
Partcode, MutationDate, MovementType, Qty
Every partcode has multiple mutationdates per Movementtype (there are max 9 movementtypes possible)
I need to get the last mutationdate per movementtype per partcode and the quantity that goes with that.
An example with partcode 003307
003307 2018-05-31 1 -100
003307 2018-06-11 2 -33
003307 2018-04-25 3 +25
and so on for all 9 movementtypes.
What did I get so far:
create table #LMUT(
MutationDate T_Date
,PartCode T_Code_Part
,CumInvQty T_Quantum_Qty10_3
,MovementType T_Type_PMOverInvt
)
insert #LMUT(
MutationDate,
Partcode,
CumInvQty,
MovementType)
SELECT
cast (max(MOV.MutationDate) as date)
,MOV.PartCode
,INV.MutationQty
,INV.PMOverInvtType
FROM dbo.T_PartMovementMain as MOV
inner join dbo.T_PartMovementOverInvt as INV on
INV.PMMainCode=MOV.PMMainCode
WHERE
MOV.PartMovementType = 1
group by MOV.PartCode,INV.PMOverInvtType,INV.MutationQty,MOV.MutationDate
SELECT * FROM #LMUT where partcode='003007'
drop table #LMUT
results in:
2016-12-06 00:00:00.000 003007 -24.000 2
2016-09-29 00:00:00.000 003007 -24.000 2
2016-11-09 00:00:00.000 003007 -24.000 2
2016-11-22 00:00:00.000 003007 -24.000 2
2016-10-26 00:00:00.000 003007 -24.000 2
2016-09-12 00:00:00.000 003007 -42.000 2
2016-10-13 00:00:00.000 003007 -24.000 2
2016-12-03 00:00:00.000 003007 100.000 5
2017-01-12 00:00:00.000 003007 -48.000 2
2016-10-04 00:00:00.000 003007 306.000 7
Not what I need: type 2 still appears 8 times.
What else have I tried:
SELECT distinct MOV.Partcode,INV.PMOverInvtType,mov.MutationDate
FROM dbo.T_PartMovementMain as MOV
inner join dbo.T_PartMovementOverInvt as INV on
INV.PMMainCode=MOV.PMMainCode
WHERE
mov.MutationDate = (SELECT MAX (c.MutationDate) FROM
dbo.T_PartMovementMain as c
inner join dbo.T_PartMovementOverInvt as d on D.PMMainCode=c.PMMainCode
WHERE
C.PartMovementType = 1 AND
C.PartCode=mov.PartCode AND
D.PMMainCode = C.PMMainCode AND
D.PMOverInvtType=inv.PMOverInvtType
)
and MOV.PartMovementType = 1 and mov.partcode='003007'
order by MOV.Partcode,INV.PMOverInvtType
Results in:
3007 2 2017-01-12 00:00:00.000
3007 5 2016-12-03 00:00:00.000
3007 7 2016-10-04 00:00:00.000
That is what I want but I need to get the Qty too.
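Use the row_number() window function, partitioned by part code and movement type. List the columns explicitly, because selecting MOV.* and INV.* together puts PMMainCode into the CTE twice, which SQL Server rejects:
with cte as
( SELECT MOV.PartCode, INV.PMOverInvtType, MOV.MutationDate, INV.MutationQty,
row_number() over(partition by MOV.PartCode, INV.PMOverInvtType order by MOV.MutationDate desc) rn
FROM dbo.T_PartMovementMain as MOV
inner join dbo.T_PartMovementOverInvt as INV on
INV.PMMainCode=MOV.PMMainCode
WHERE MOV.PartMovementType = 1
) select cte.* from cte where rn=1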
Solved it like this:
create table #LMUT(
PartCode T_Code_Part
,MovementType T_Type_PMOverInvt
,MutationDate T_Date
,CumInvQty T_Quantum_Qty10_3
)
insert #LMUT(Partcode,MovementType,MutationDate,CumInvQty)
select Artikel,Type,Datum,Aant
from (
SELECT MOV.Partcode as Artikel,INV.PMOverInvtType as Type,mov.MutationDate as Datum,INV.MutationQty as Aant,
row_number() over(partition by MOV.Partcode,INV.PMOverInvtType order by MOV.Partcode,INV.PMOverInvtType,MOV.MutationDate desc) rn
FROM dbo.T_PartMovementMain as MOV
inner join dbo.T_PartMovementOverInvt as INV on INV.PMMainCode=MOV.PMMainCode) cse
where rn=1
select * from #LMUT order by Partcode
drop table #LMUT
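The same "number the rows per group, keep rn = 1" pattern can be tried on a tiny data set. The sketch below uses Python's sqlite3 module and a hypothetical single table movements that stands in for the joined T_PartMovementMain / T_PartMovementOverInvt rows; the table and column names are assumptions for this example, and the rows are a subset of the 003007 output shown in the question. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened stand-in for the two joined source tables.
CREATE TABLE movements (part_code TEXT, movement_type INTEGER,
                        mutation_date TEXT, qty REAL);
INSERT INTO movements VALUES
  ('003007', 2, '2016-12-06', -24.0),
  ('003007', 2, '2016-09-12', -42.0),
  ('003007', 2, '2017-01-12', -48.0),
  ('003007', 5, '2016-12-03', 100.0),
  ('003007', 7, '2016-10-04', 306.0);
""")

# Number the rows of each (part_code, movement_type) group from newest
# to oldest; the row with rn = 1 carries the quantity of the latest mutation.
latest = conn.execute("""
SELECT part_code, movement_type, mutation_date, qty
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY part_code, movement_type
                              ORDER BY mutation_date DESC) AS rn
    FROM movements m
) AS numbered
WHERE rn = 1
ORDER BY part_code, movement_type
""").fetchall()

for row in latest:
    print(row)
```

Because the quantity is read from the same row as the maximum date, there is no need to add it to a GROUP BY, which is what produced the extra rows in the first attempt.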

T-SQL max date and min date between two date

First, thanks for your time and your help!
I have two tables:
Table 1
PersId name lastName city
---------------------------------------
1 John Smith Tirana
2 Leri Nice Tirana
3 Adam fortsan Tirana
Table 2
Id PersId salesDate
--------------------------------------------
1 1 2017-01-22 08:00:40 000
2 2 2017-01-22 09:00:00 000
3 1 2017-01-22 10:00:00 000
4 1 2017-01-22 20:00:00 000
5 3 2017-01-15 09:00:00 000
6 1 2017-01-21 09:00:00 000
7 1 2017-01-21 10:00:00 000
8 1 2017-01-21 18:55:00 000
For each person and each day, I would like to see the first and the most recent sale between two dates, along with the city. If there is no sale, the dates should be empty (NULL). The date range is:
SalesDate > '2017-01-17 09:00:00 000'
and SalesDate < '2017-01-23 09:00:00 000'
The Table 2 row with Id = 5 is excluded because it is not in the specified date range.
I would like my results to look like this:
Id PersId MinSalesDate MaxSalesDate City
-----------------------------------------------------------------------------
1 1 2017-01-22 08:00:40 000 2017-01-22 20:00:00 000 Tirana
2 2 2017-01-22 09:00:00 000 null Tirana
3 3 null null Tirana
4 1 2017-01-21 09:00:00 000 2017-01-21 18:55:00 000 Tirana
You don't say how the Id in the result is derived; it looks like you just want Row_Number(). I will leave that out, but this should get you started. You may have to work out conversion issues in the date range check, and I haven't checked the query for syntax errors; I will leave that to you.
Select T1.PersId, City
, Min(T2.salesDate) MinSalesDate
, Max(T2.salesDate) MaxSalesDate
From Table1 T1
Left Join Table2 T2
On T1.PersId = T2.PersId
And T2.salesDate Between '2017-01-17 09:00:00 000' And '2017-01-23 09:00:00 000'
Group BY T1.PersId, T1.City
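The important detail in this query is that the date filter sits in the ON clause of the LEFT JOIN rather than in WHERE, so people without a sale in the range still appear with NULL dates. A minimal sketch with Python's sqlite3 module illustrates the difference; the table names persons and sales and the snake_case column names are assumptions for this example, and the timestamps are written in ISO format so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (pers_id INTEGER, name TEXT, city TEXT);
CREATE TABLE sales (id INTEGER, pers_id INTEGER, sales_date TEXT);
INSERT INTO persons VALUES (1, 'John', 'Tirana'), (2, 'Leri', 'Tirana'), (3, 'Adam', 'Tirana');
INSERT INTO sales VALUES
  (1, 1, '2017-01-22 08:00:40'), (2, 2, '2017-01-22 09:00:00'),
  (3, 1, '2017-01-22 10:00:00'), (4, 1, '2017-01-22 20:00:00'),
  (5, 3, '2017-01-15 09:00:00'), (6, 1, '2017-01-21 09:00:00'),
  (7, 1, '2017-01-21 10:00:00'), (8, 1, '2017-01-21 18:55:00');
""")

# Filter in ON: out-of-range sales simply fail to join,
# and the person is kept with NULL aggregates.
filter_in_on = conn.execute("""
SELECT p.pers_id, p.city, MIN(s.sales_date), MAX(s.sales_date)
FROM persons p
LEFT JOIN sales s
       ON s.pers_id = p.pers_id
      AND s.sales_date > '2017-01-17 09:00:00'
      AND s.sales_date < '2017-01-23 09:00:00'
GROUP BY p.pers_id, p.city
ORDER BY p.pers_id
""").fetchall()

# Filter in WHERE: rows are removed after the join, so person 3 disappears.
filter_in_where = conn.execute("""
SELECT p.pers_id
FROM persons p
LEFT JOIN sales s ON s.pers_id = p.pers_id
WHERE s.sales_date > '2017-01-17 09:00:00'
  AND s.sales_date < '2017-01-23 09:00:00'
GROUP BY p.pers_id
ORDER BY p.pers_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

This sketch groups per person only; producing one row per person per day, as in the desired output, would additionally need the calendar date of salesDate in the GROUP BY.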
Try the following using row_number to get min and max sale dates:
SELECT
T2.Id, T1.PersId, T2.MIN_salesDate, T2.MAX_salesDate, T1.City
FROM Table1 T1
LEFT JOIN
(
SELECT MIN(Id) as Id, PersId, MIN(salesDate) as MIN_salesDate, MAX(salesDate) as MAX_salesDate
FROM
(
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY PersId ORDER BY salesDate ASC) as RNKMIN
,ROW_NUMBER() OVER (PARTITION BY PersId ORDER BY salesDate DESC) as RNKMAX
FROM Table2 T2
WHERE salesDate Between '2017-01-17 09:00:00 000' And '2017-01-23 09:00:00 000'
) temp
WHERE RNKMIN = 1 or RNKMAX = 1
GROUP BY PersId
) T2
on T1.PersId = T2.PersId

Select first event after a timestamp per row in another table in PostgreSQL

I have a table with visits to some city by some person on some timestamp:
city_visits:
person_id city timestamp
-----------------------------------------------
1 Paris 2017-01-01 00:00:00
1 Amsterdam 2017-01-03 00:00:00
1 Brussels 2017-01-04 00:00:00
1 London 2017-01-06 00:00:00
2 Berlin 2017-01-01 00:00:00
2 Brussels 2017-01-02 00:00:00
2 Berlin 2017-01-06 00:00:00
2 Hamburg 2017-01-07 00:00:00
Another table lists when a person bought ice cream:
ice_cream_events:
person_id flavour timestamp
-----------------------------------------------
1 Vanilla 2017-01-02 00:12:00
1 Chocolate 2017-01-05 00:18:00
2 Strawberry 2017-01-03 00:09:00
2 Caramel 2017-01-05 00:15:00
For each line in city_visits table, I need to join the same person's next ice-cream event, along with its timestamp and flavour:
desired_output:
person_id city timestamp ic_flavour ic_timestamp
---------------------------------------------------------------------------
1 Paris 2017-01-01 00:00:00 Vanilla 2017-01-02 00:12:00
1 Amsterdam 2017-01-03 00:00:00 Chocolate 2017-01-05 00:18:00
1 Brussels 2017-01-04 00:00:00 Chocolate 2017-01-05 00:18:00
1 London 2017-01-06 00:00:00 null null
2 Berlin 2017-01-01 00:00:00 Strawberry 2017-01-03 00:09:00
2 Brussels 2017-01-02 00:00:00 Strawberry 2017-01-03 00:09:00
2 Berlin 2017-01-06 00:00:00 null null
2 Hamburg 2017-01-07 00:00:00 null null
I've tried the following:
SELECT DISTINCT ON (cv.person_id, cv.timestamp)
cv.person_id,
cv.city,
cv.timestamp,
ic.flavour as ic_flavour,
ic.timestamp as ic_timestamp
FROM city_visits cv
JOIN ice_cream_events ic
ON ic.person_id = cv.person_id
AND ic.timestamp > cv.timestamp
The DISTINCT ON clause ensures that only one future ice cream event is joined to each city visit. It works, but it does not automatically select the first one; rather, it seems to pick an arbitrary future ice cream event for the same person. No ORDER BY clause I have tried seems to change this.
Ideally, the DISTINCT ON clause would choose the minimal ic_timestamp each time it has to filter out duplicates.
Since there is no city in ice_cream_events, your query would join to lots of ice-cream events for every visit before picking the earliest one. I suggest LEFT JOIN LATERAL instead, which will be much faster for this case when backed by an appropriate index:
SELECT *
FROM city_visits cv
LEFT JOIN LATERAL (
SELECT flavour AS ic_flavour, timestamp AS ic_timestamp
FROM ice_cream_events
WHERE person_id = cv.person_id
AND timestamp > cv.timestamp
ORDER BY timestamp
LIMIT 1
) ice ON true
ORDER BY cv.person_id, cv.timestamp;
LEFT [OUTER] JOIN includes visits without any ice-cream. If you only want visits with ice-cream, switch to CROSS JOIN LATERAL and drop the ON true condition. See:
JOIN (select ...) ue ON 1=1?
The outer ORDER BY only sorts result rows in this case (unlike when combined with DISTINCT ON, where it also decides which row to pick from each set of peers).
Select first row in each GROUP BY group?
If tables are big, be sure to have appropriate indexes to make it fast. Ideally, a composite index on ice_cream_events (person_id, timestamp, flavour) - columns in this order. And on city_visits (person_id, timestamp) for the outer sort. Or maybe even on city_visits (person_id, timestamp, city) to allow another index-only scan. Depends on your actual situation. The example is obviously symbolic.
Optimize GROUP BY query to retrieve latest record per user
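To experiment with the expected result outside PostgreSQL, here is a sketch using Python's sqlite3 module. SQLite has no LATERAL, so this swaps in a correlated MIN() subquery inside the LEFT JOIN condition; it is a stand-in for the query above, not the same plan. The timestamp columns are renamed ts for this example, and the sketch assumes a person never has two ice-cream events at exactly the same timestamp (a tie would duplicate the visit row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_visits (person_id INTEGER, city TEXT, ts TEXT);
CREATE TABLE ice_cream_events (person_id INTEGER, flavour TEXT, ts TEXT);
INSERT INTO city_visits VALUES
  (1, 'Paris',     '2017-01-01 00:00:00'), (1, 'Amsterdam', '2017-01-03 00:00:00'),
  (1, 'Brussels',  '2017-01-04 00:00:00'), (1, 'London',    '2017-01-06 00:00:00'),
  (2, 'Berlin',    '2017-01-01 00:00:00'), (2, 'Brussels',  '2017-01-02 00:00:00'),
  (2, 'Berlin',    '2017-01-06 00:00:00'), (2, 'Hamburg',   '2017-01-07 00:00:00');
INSERT INTO ice_cream_events VALUES
  (1, 'Vanilla',    '2017-01-02 00:12:00'), (1, 'Chocolate', '2017-01-05 00:18:00'),
  (2, 'Strawberry', '2017-01-03 00:09:00'), (2, 'Caramel',   '2017-01-05 00:15:00');
""")

# For each visit, join only the ice-cream event whose timestamp equals the
# earliest later timestamp for that person. If there is none, MIN() yields
# NULL, the join condition fails, and LEFT JOIN fills in NULLs.
rows = conn.execute("""
SELECT cv.person_id, cv.city, cv.ts, ic.flavour AS ic_flavour, ic.ts AS ic_timestamp
FROM city_visits cv
LEFT JOIN ice_cream_events ic
       ON ic.person_id = cv.person_id
      AND ic.ts = (SELECT MIN(e.ts)
                   FROM ice_cream_events e
                   WHERE e.person_id = cv.person_id
                     AND e.ts > cv.ts)
ORDER BY cv.person_id, cv.ts
""").fetchall()

flavours = [r[3] for r in rows]
print(flavours)
# ['Vanilla', 'Chocolate', 'Chocolate', None, 'Strawberry', 'Strawberry', None, None]
```

The visits with no later ice-cream event (London, and the last two rows for person 2) are kept with NULLs, matching the desired output in the question.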
It seems that the DISTINCT ON clause actually follows the ORDER BY clause.
As a consequence, the problem was solved with adding the correct ordering:
SELECT DISTINCT ON (cv.person_id, cv.timestamp)
cv.person_id,
cv.city,
cv.timestamp,
ic.flavour as ic_flavour,
ic.timestamp as ic_timestamp
FROM city_visits cv
JOIN ice_cream_events ic
ON ic.person_id = cv.person_id
AND ic.timestamp > cv.timestamp
ORDER BY cv.person_id, cv.timestamp ASC, ic.timestamp ASC -- <- this line added

Select distinct records based on max(date) or NULL date

I am trying to get a list of employees based on their employee status or their most recent termination date. If the employee is active, the termination date will be NULL. There are also employees that have worked in multiple companies within our organization, I only want the record from the most recent company, whether active or terminated. An employee may also have different Employee numbers in the different companies, so the selection will have to be based on the SSN (Fica) number.
Here is an original data set:
company employee Fica First_name emp_status Term_date
5 7026 Jason T1 2013-09-16 00:00:00.000
500 7026 Jason T1 2010-11-30 00:00:00.000
7 7026 Jason T1 2009-07-31 00:00:00.000
2 90908 Jason A1 NULL
505 293866 William T1 2008-05-23 00:00:00.000
7 7243 Ashley T1 2010-07-11 00:00:00.000
2 90478 Michael T1 2013-01-11 00:00:00.000
500 90478 Michael T1 2011-09-26 00:00:00.000
500 311002 Andreas A1 NULL
3 365463 Matthew A1 NULL
500 248766 Chris T1 2007-04-23 00:00:00.000
500 90692 Kaitlyn T1 2012-03-13 00:00:00.000
2 90692 Kaitlyn A5 NULL
500 90236 Jeff T1 2011-09-26 00:00:00.000
2 90236 Jeff A1 NULL
2 90433 Nathan T1 2012-03-26 00:00:00.000
500 90433 Nathan T1 2011-09-26 00:00:00.000
Here are the results I am trying to get:
company employee Fica First_name emp_status Term_date
2 90908 Jason A1 NULL
505 293866 William T1 2008-05-23 00:00:00.000
7 7243 Ashley T1 2010-07-11 00:00:00.000
2 90478 Michael T1 2013-01-11 00:00:00.000
500 311002 Andreas A1 NULL
3 365463 Matthew A1 NULL
500 248766 Chris T1 2007-04-23 00:00:00.000
2 90692 Kaitlyn A5 NULL
2 90236 Jeff A1 NULL
2 90433 Nathan T1 2012-03-26 00:00:00.000
Thanks for any help you are able to give. I need to run this on a SQL2005 server which will be connecting to an Oracle server via ODBC.
If the dates were all populated, you could do this with a "standard" not exists query. The NULLs introduce a problem, but that problem can be solved using coalesce():
select t.*
from table t
where not exists (select 1
from table t2
where t2.fica = t.fica and
coalesce(t2.term_date, '9999-01-01') > coalesce(t.term_date, '9999-01-01')
);
NOTE: If you need for this to work on Oracle, then you need a different format for the date constant.
EDIT:
Another way to solve this uses row_number():
select t.*
from (select t.*,
row_number() over (partition by fica
order by (case when term_date is null then 0 else 1 end),
term_date desc
) as seqnum
from table t
) t
where seqnum = 1;
The rules for choosing the "last" row are embedded in the order by clause: put the NULL value first, followed by the term_date in descending order.
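The "NULL first, then newest date" ordering can be checked on a few rows from the question with Python's sqlite3 module. The sample data does not show the Fica values, so this sketch partitions by a hypothetical person_key column (filled with the first name purely for illustration) as a stand-in for the per-person identifier; the table name employees is also an assumption. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    company INTEGER, employee INTEGER, person_key TEXT,
    emp_status TEXT, term_date TEXT)""")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (5,   7026,  "Jason",   "T1", "2013-09-16"),
        (500, 7026,  "Jason",   "T1", "2010-11-30"),
        (7,   7026,  "Jason",   "T1", "2009-07-31"),
        (2,   90908, "Jason",   "A1", None),
        (500, 90692, "Kaitlyn", "T1", "2012-03-13"),
        (2,   90692, "Kaitlyn", "A5", None),
        (2,   90433, "Nathan",  "T1", "2012-03-26"),
        (500, 90433, "Nathan",  "T1", "2011-09-26"),
    ],
)

# Within each person: active rows (NULL term_date) sort first,
# then terminated rows from the most recent termination backwards.
picked = conn.execute("""
SELECT company, employee, person_key, emp_status, term_date
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (
               PARTITION BY person_key
               ORDER BY CASE WHEN term_date IS NULL THEN 0 ELSE 1 END,
                        term_date DESC
           ) AS seqnum
    FROM employees e
) AS ranked
WHERE seqnum = 1
ORDER BY person_key
""").fetchall()

for row in picked:
    print(row)
```

With a real Fica column, the only change would be the PARTITION BY key; the explicit CASE keeps the NULL handling independent of each engine's default NULL sort order.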