For each unique value of some groupid column, how do I get rows with last 3 dates? - sql

I have a table with columns: FILING_ID, DATE, and BLAH
I'm trying to write a query that, for each FILING_ID, returns the rows with the last three dates. If the table is:
FILING_ID DATE
aksjdfj 2/1/2006
b 2/1/2006
b 3/1/2006
b 4/1/2006
b 5/1/2006
I would like:
FILING_ID DATE
aksjdfj 2/1/2006
b 3/1/2006
b 4/1/2006
b 5/1/2006
I was thinking of running a query to find the 3rd-highest date for each FILING_ID, then doing a join and comparing each DATE with that cutoff date. Would that work?
I use PostgreSQL. Is there some way to use LIMIT?
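The cutoff-and-join idea from the question can work. Below is a minimal sketch of it, run through Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is runnable). The table name tbl, the function name, and the ISO-formatted date strings are also assumptions. If several rows share a filing's cutoff date, that filing can get more than three rows back.

```python
import sqlite3


def last_three_by_cutoff(rows):
    """Keep each filing's rows dated on or after its 3rd-highest date."""
    conn = sqlite3.connect(":memory:")
    # ISO 'YYYY-MM-DD' strings compare chronologically as text
    conn.execute("CREATE TABLE tbl (filing_id TEXT, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        WITH ids AS (
            SELECT DISTINCT filing_id FROM tbl
        ),
        cutoffs AS (
            SELECT i.filing_id,
                   COALESCE(
                       -- 3rd-highest date of this filing (NULL if it has fewer than 3 rows)
                       (SELECT date FROM tbl
                         WHERE filing_id = i.filing_id
                         ORDER BY date DESC
                         LIMIT 1 OFFSET 2),
                       -- fall back to the earliest date, which keeps every row
                       (SELECT MIN(date) FROM tbl WHERE filing_id = i.filing_id)
                   ) AS cutoff
            FROM ids i
        )
        SELECT t.filing_id, t.date
        FROM tbl t
        JOIN cutoffs c ON c.filing_id = t.filing_id
        WHERE t.date >= c.cutoff
        ORDER BY t.filing_id, t.date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("aksjdfj", "2006-02-01"),
    ("b", "2006-02-01"),
    ("b", "2006-03-01"),
    ("b", "2006-04-01"),
    ("b", "2006-05-01"),
]
print(last_three_by_cutoff(sample))
```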

SELECT filing_id, date -- more columns?
FROM (
SELECT *, row_number() OVER (PARTITION BY filing_id ORDER BY date DESC NULLS LAST) AS rn
FROM tbl
) sub
WHERE rn < 4
ORDER BY filing_id, date; -- optionally order rows
NULLS LAST is only relevant if date can actually be NULL.
If date is not unique, you may need to break ties to get stable results.
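To illustrate tie-breaking, here is a sketch of the same ROW_NUMBER query with the question's BLAH column used as a secondary sort key. It runs on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later), not PostgreSQL, and the sample rows are hypothetical. NULLS LAST is left out because the sample has no NULL dates. Two rows for b share the date 2006-03-01 right at the three-row boundary; blah decides which one is kept.

```python
import sqlite3


def last_three_with_tiebreak(rows):
    """Last three rows per filing_id; blah decides between equal dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (filing_id TEXT, date TEXT, blah TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    query = """
        SELECT filing_id, date, blah
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY filing_id
                       ORDER BY date DESC, blah  -- blah breaks ties on date
                   ) AS rn
            FROM tbl
        ) sub
        WHERE rn < 4
        ORDER BY filing_id, date, blah
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("aksjdfj", "2006-02-01", "q"),
    ("b", "2006-03-01", "z"),
    ("b", "2006-03-01", "w"),  # same date as the row above, at the cutoff
    ("b", "2006-04-01", "y"),
    ("b", "2006-05-01", "x"),
]
print(last_three_with_tiebreak(sample))
```

Without blah in the ORDER BY, which of the two 2006-03-01 rows survives would be unspecified and could change between runs.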
See also:
- PostgreSQL sort by datetime asc, null first?
- Select first row in each GROUP BY group?
Is there some way to use limit?
Maybe. If you have an additional table holding all distinct filing_id values (and possibly a few more, which the join removes), you can use CROSS JOIN LATERAL (a comma followed by LATERAL is short syntax for it):
SELECT f.filing_id, t.*
FROM filing f -- table with distinct filing_id
, LATERAL (
SELECT date -- more columns?
FROM tbl
WHERE filing_id = f.filing_id
ORDER BY date DESC NULLS LAST
LIMIT 3 -- now you can use LIMIT
) t
ORDER BY f.filing_id, t.date;
See also: What is the difference between LATERAL and a subquery in PostgreSQL?
If you don't have a filing table, you can create one, or derive it on the fly (see: Optimize GROUP BY query to retrieve latest record per user).
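SQLite has no LATERAL, so the PostgreSQL query above does not run there as-is. A different technique that still applies LIMIT per group is a correlated IN subquery over SQLite's rowid; because the outer query scans tbl itself, no separate filing table is needed. This is a minimal sketch run through Python's sqlite3, with a hypothetical table name, function name, and sample rows.

```python
import sqlite3


def last_three_via_limit(rows):
    """Per-group LIMIT 3 without LATERAL: a correlated IN subquery on rowid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (filing_id TEXT, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        SELECT t.filing_id, t.date
        FROM tbl t
        WHERE t.rowid IN (
            -- the three newest rows of the current row's filing
            SELECT x.rowid
            FROM tbl x
            WHERE x.filing_id = t.filing_id
            ORDER BY x.date DESC, x.rowid DESC  -- rowid makes ties deterministic
            LIMIT 3
        )
        ORDER BY t.filing_id, t.date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("aksjdfj", "2006-02-01"),
    ("b", "2006-02-01"),
    ("b", "2006-03-01"),
    ("b", "2006-04-01"),
    ("b", "2006-05-01"),
]
print(last_three_via_limit(sample))
```

The subquery is evaluated per outer row, so on large tables it may be slow without an index on (filing_id, date).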

Related

How to create a unique ID based on conditions in SQL?

I would like to get a new ID, in any format (in the example below: 11, 12, 13, ...), based on the following conditions:
Every time the days column value is greater than 1 and not null, the current row and all following ones get the same ID until a new value meets the condition.
This applies within the same email.
Below you can see the expected IDs (in the format XX).
I thought about using two conditions, applied in the following order:
1. Every time the days column value is greater than 1, all following rows get the same ID until a new value meets the condition.
2. AND when the lag (previous value) is 0, 1, or NULL.
Assuming you have an EmailDate column over which you're ordering (a DATETIME field, really), try something like this:
WITH
TableNameWithEmailDateIDs AS (
SELECT
*,
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
) AS EmailDateID
FROM
TableName
),
IDs AS (
SELECT
*,
LEAD(EmailDateID, 1) OVER (
-- next qualifying row of the same email
PARTITION BY
Email
ORDER BY
EmailDate
) AS LeadEmailDateID
FROM
(
SELECT
*,
-- REMOVE +10 if you don't want 11 to be starting ID
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
)+10 AS ID
FROM
TableNameWithEmailDateIDs
WHERE
Days > 1
OR Days IS NULL
) X
)
SELECT
COALESCE(TableName.EmailDate, IDs.EmailDate) AS EmailDate,
IDs.Email,
COALESCE(TableName.Days, IDs.Days) AS Days,
IDs.ID
FROM
IDs
LEFT JOIN TableNameWithEmailDateIDs TableName
ON IDs.Email = TableName.Email
AND TableName.EmailDateID >= IDs.EmailDateID
-- the last qualifying row of an email has no LEAD, so it runs to the end of that email
AND (
IDs.LeadEmailDateID IS NULL
OR TableName.EmailDateID < IDs.LeadEmailDateID
)
ORDER BY
ID DESC,
TableName.EmailDate DESC
;
First, create a CTE that numbers every Email/EmailDate row (helpful for the LEFT JOIN condition later). Then, create a CTE that assigns IDs to the rows that meet your condition (i.e. the important rows). Finally, LEFT JOIN your main table onto that CTE so that each ID also covers the following rows of the same email up to the next qualifying row, filling in the "gaps", so to speak.
I suggest running each of the components of this query independently to fully understand what's going on.
Hope it helps!
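A different technique for the same problem (not the one used in the answer above) is the usual gaps-and-islands trick: flag the rows that start a new ID, take a running SUM of that flag per email, and number the resulting (email, group) pairs. The sketch runs on SQLite through Python's sqlite3; the table name emails, the snake_case column names, and the sample rows are hypothetical. It mirrors the answer's WHERE clause (Days > 1 OR Days IS NULL); drop the IS NULL branch if a NULL should not start a new ID. Rows that come before the first qualifying row of an email get no ID.

```python
import sqlite3


def assign_ids(rows):
    """Give each run of rows that starts at a qualifying row one shared ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (email TEXT, email_date TEXT, days INTEGER)")
    conn.executemany("INSERT INTO emails VALUES (?, ?, ?)", rows)
    query = """
        WITH flagged AS (
            SELECT email, email_date, days,
                   -- running count of qualifying rows within each email
                   SUM(CASE WHEN days > 1 OR days IS NULL THEN 1 ELSE 0 END) OVER (
                       PARTITION BY email
                       ORDER BY email_date
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS grp
            FROM emails
        ),
        numbered AS (
            -- one ID per (email, grp) pair; + 10 makes the first ID 11
            SELECT email, grp,
                   ROW_NUMBER() OVER (ORDER BY email, grp) + 10 AS id
            FROM (SELECT DISTINCT email, grp FROM flagged WHERE grp > 0)
        )
        SELECT f.email, f.email_date, f.days, n.id
        FROM flagged f
        LEFT JOIN numbered n ON n.email = f.email AND n.grp = f.grp
        ORDER BY f.email, f.email_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("a@example.com", "2020-01-01", 0),     # before any qualifying row: no ID
    ("a@example.com", "2020-01-02", 5),     # starts ID 11
    ("a@example.com", "2020-01-03", 1),
    ("a@example.com", "2020-01-04", 3),     # starts ID 12
    ("b@example.com", "2020-01-01", None),  # NULL also starts a new ID (13)
    ("b@example.com", "2020-01-05", 0),
]
for row in assign_ids(sample):
    print(row)
```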

How to pick the first record from duplicates, with only duplicate column values

Here is the situation: I have a table in BigQuery like the following.
As the table shows, records 1 and 3 have the same id but a different first_name (say the person with id 1 changed his first_name); all other fields are the same in both records. Now I need to select one record out of those two. How can I do that? I tried a self join, but that discards both records. GROUP BY will not work because the records are not duplicates (only the id is duplicated), and the same goes for DISTINCT.
Thanks!!!!
The query I am using right now is
select * from table t group by 1,2,3,4,5;
You can use the ROW_NUMBER function to assign a row number to each record in the table, then keep one row per id:
select * except(rn) -- except(rn) drops the helper column (BigQuery syntax)
from (
select *, ROW_NUMBER() OVER(PARTITION BY t.id) rn
from t
)
where rn = 1
ROW_NUMBER does not require the ORDER BY clause. It returns the sequential row ordinal (1-based) of each row for each ordered partition; if the ORDER BY clause is unspecified, the result is non-deterministic.
If you have a record created date or modified date, you can use it in the ORDER BY clause to always pick the latest record.
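A sketch of that suggestion, using a hypothetical modified_at column to pick the newest row per id. It runs on SQLite through Python's sqlite3 rather than BigQuery, and the table name people and the sample rows are assumptions.

```python
import sqlite3


def latest_per_id(rows):
    """One row per id: the most recently modified one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, first_name TEXT, modified_at TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, first_name, modified_at
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY modified_at DESC  -- newest first, so rn = 1 is the latest
                   ) AS rn
            FROM people
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    (1, "Jon", "2020-01-01"),
    (2, "Ann", "2020-02-01"),
    (1, "John", "2020-03-01"),  # id 1 changed first_name
]
print(latest_per_id(sample))
```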
SQL tables represent unordered sets. There is no first row unless you have a column that specifies the ordering. Let me assume you have such a column.
If you want a particular row, you can use aggregation with an order by:
select array_agg(t order by ? asc limit 1)[ordinal(1)].*
from t
group by id;
? is the column that specifies the ordering.
You can also leave out the order by, in which case an arbitrary row per id is returned:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by id;

Find a unique row when two rows have the same id but different dates, and I want the newest one

I want to create a query of unique values, but some rows have the same id with different dates. I just want the newest date. I am joining several tables, but do not know how to handle this.
SELECT DISTINCT ap.id, MAX(ap.date)
FROM sometable;
I tried this code but got no result.
I get these results:
id date
------------
1 10/31/18
1 10/15/18
2 11/05/17
2 11/04/17
But I want these results:
1 10/31/18
2 11/05/17
If your query also has other columns that you want to show in the result, you will have to resort to an analytic function; in that case your query will look like the following:
select id, the_date /* ,other columns */ from (
select row_number() over (partition by id
order by some_date /* your date column */ desc ) ord,
id,
some_date the_date
/* ,other columns */
from <your_table>
) where ord = 1
;
You need GROUP BY:
SELECT ap.id, MAX(ap.date)
from sometable ap
group by ap.id;
Aggregate functions such as MIN(), MAX(), and COUNT() need a GROUP BY to return one aggregated result per group when non-aggregated columns such as ap.id are also selected.
And for the query in your comment, you should use:
SELECT ap.id, MAX(ap.date)
from sometable ap
where ap.id in ( id1, id2, id3.... idns)
group by ap.id
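A runnable sketch of the GROUP BY answer on SQLite via Python's sqlite3 (a stand-in, not the asker's database), using the dates from the question rewritten as ISO strings. That rewrite matters: as text, MM/DD/YY values do not sort chronologically, so MAX over them would be wrong unless the column has a real date type. The optional id filter is passed as bound parameters; the function name newest_per_id is hypothetical.

```python
import sqlite3


def newest_per_id(rows, only_ids=None):
    """MAX(date) per id, optionally restricted to the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)
    sql = "SELECT ap.id, MAX(ap.date) FROM sometable ap"
    params = []
    if only_ids:
        # one "?" per id, so the list is bound rather than pasted into the SQL
        placeholders = ", ".join("?" for _ in only_ids)
        sql += f" WHERE ap.id IN ({placeholders})"
        params = list(only_ids)
    sql += " GROUP BY ap.id ORDER BY ap.id"
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


sample = [
    (1, "2018-10-31"),
    (1, "2018-10-15"),
    (2, "2017-11-05"),
    (2, "2017-11-04"),
]
print(newest_per_id(sample))
print(newest_per_id(sample, [2]))
```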

How to get the records with the latest date when the result should be unique per Id in SQL?

I have a table as below:
I want to write a SQL query to get the output below:
The query should select all records from the table, but when multiple records have the same Id value, it should take only the one with the latest Date.
E.g., here Rudolf (Id 1211) is present three times in the input; in the output, only the Rudolf record with date 06-12-2010 is selected. The same goes for James.
I tried to write a query but it was not successful, so please help me form a query string in SQL.
Thanks in advance
You can partition your data by Id, order each partition by Date DESC, and take the first row of each partition:
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
You can use WITH TIES:
select top 1 WITH TIES * from t
order by (row_number() over(partition by id order by date desc)) -- every id's newest row has row number 1, so they all tie
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As I can see from your example, duplicate rows differ only in Date. If that's the case, then a simple GROUP BY with the MAX aggregate function will do the job for you:
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is a working example: http://sqlfiddle.com/#!18/7025e/2
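The "if that's the case" caveat matters. This sketch compares the ROW_NUMBER and GROUP BY answers on hypothetical rows in which one person's Place changed, run on SQLite through Python's sqlite3 rather than SQL Server. ROW_NUMBER still returns one row per Id, while GROUP BY Id, Name, Place returns one row per distinct combination.

```python
import sqlite3

ROWS = [  # hypothetical data: Ann's Place changed between her two rows
    (1, "Ann", "Oslo", "2010-01-05"),
    (1, "Ann", "Bergen", "2010-06-12"),
    (2, "Bob", "Rome", "2011-02-02"),
]

# One row per Id: the row with the latest Date
ROW_NUMBER_QUERY = """
    SELECT Id, Name, Place, Date FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
        FROM t
    ) WHERE rn = 1
    ORDER BY Id
"""

# One row per distinct (Id, Name, Place) combination
GROUP_BY_QUERY = """
    SELECT Id, Name, Place, MAX(Date)
    FROM t
    GROUP BY Id, Name, Place
    ORDER BY Id, Place
"""


def run(query):
    """Load ROWS into an in-memory table t and run one query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Id INTEGER, Name TEXT, Place TEXT, Date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(ROW_NUMBER_QUERY))
print(run(GROUP_BY_QUERY))  # Id 1 appears twice here
```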

SQL: select next available date for multiple records

I have an Oracle DB.
My table has ID and DATE columns (and more).
I would like to select for every ID the next available record after a certain date. For only one ID the query would be:
SELECT * FROM my_table
WHERE id = 1 AND date >= '01.01.2018'
(just ignoring the to_date() function)
What would that look like for multiple IDs? And I do want to SELECT *.
Thanks!
We can use ROW_NUMBER here:
SELECT ID, date -- and maybe other columns
FROM
(
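SELECT m.*, -- Oracle needs the table alias when * is combined with other columns
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM my_table m
WHERE date >= date '2018-01-01' -- DATE is reserved in Oracle: use your real column name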
) t
WHERE rn = 1
The idea here is to assign a row number within each ID partition, starting with the earliest date on or after the cutoff you specify. The first record of each partition is then the next available date, assuming one exists.
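A minimal sketch of the same ROW_NUMBER idea on SQLite through Python's sqlite3 (not Oracle; the table layout, the payload column, and the sample rows are hypothetical). The cutoff is passed as a bound parameter, and dates are ISO strings so that >= compares them chronologically. ID 3 has no row on or after the cutoff, so it drops out of the result.

```python
import sqlite3


def next_on_or_after(rows, cutoff):
    """For each id, the first row dated on or after cutoff (if any)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, payload TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, date, payload
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
            FROM my_table
            WHERE date >= ?  -- filter first, then number what is left
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query, (cutoff,)).fetchall()
    conn.close()
    return result


sample = [
    (1, "2017-12-30", "a"),
    (1, "2018-01-05", "b"),
    (1, "2018-02-01", "c"),
    (2, "2018-01-01", "d"),
    (2, "2018-03-01", "e"),
    (3, "2017-06-01", "f"),
]
print(next_on_or_after(sample, "2018-01-01"))
```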