Group by unique customers and then create column for min date - sql

I have a table with a customer ID column and a transaction_date column.
Example of table:
ID |transaction_date
PT2073|2015-02-28
PT2073|2019-02-28
PT2013|2015-04-28
PT2013|2017-02-11
PT2013|2017-07-11
GOAL: I want to create another column so that for each unique ID I get the first transaction date. It should look like:
Example of table:
ID |transaction_date|first_transaction_date
PT2073|2015-02-28 |2015-02-28
PT2073|2019-02-28 |2015-02-28
PT2013|2015-04-28 |2015-04-28
PT2013|2017-02-11 |2015-04-28
PT2013|2017-07-11 |2015-04-28
I created another table where I select the minimum date and group by id:
SELECT id, MIN(transaction_date) as first_transaction_date FROM customer_details GROUP BY id
When I checked the table, I was not getting the same value in the first_transaction_date column for a unique id.

Use min() as a window function:
select t.*, min(t.transaction_date) over (partition by t.id)
from t;
No group by is needed.
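The window-function approach can be checked end to end. This is a minimal sketch using Python's `sqlite3` as a stand-in engine (SQLite 3.25+ supports window functions); the table name and sample rows are taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_details (id TEXT, transaction_date TEXT);
INSERT INTO customer_details VALUES
  ('PT2073', '2015-02-28'),
  ('PT2073', '2019-02-28'),
  ('PT2013', '2015-04-28'),
  ('PT2013', '2017-02-11'),
  ('PT2013', '2017-07-11');
""")

# MIN() as a window function: every row keeps its own transaction_date
# and gains the per-id minimum, with no GROUP BY.
rows = conn.execute("""
SELECT t.*,
       MIN(t.transaction_date) OVER (PARTITION BY t.id) AS first_transaction_date
FROM customer_details t
ORDER BY t.id, t.transaction_date
""").fetchall()

for row in rows:
    print(row)
```

Note that string comparison is enough here because the dates are in ISO `YYYY-MM-DD` format.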

Use the first_value() window function: it returns the first value of each ordered partition (here the partitions are the ids, ordered by transaction_date):
SELECT
*,
first_value(transaction_date) OVER (PARTITION BY id ORDER BY transaction_date)
FROM
mytable
If your Postgres version does not support window functions (prior to 8.4), use a correlated subquery:
SELECT
t1.*,
(SELECT min(transaction_date) FROM mytable t2 WHERE t1.id = t2.id)
FROM
mytable t1
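The correlated-subquery fallback behaves the same way on any engine, with or without window-function support. A sketch run against SQLite via Python's `sqlite3`, with sample data assumed from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id TEXT, transaction_date TEXT);
INSERT INTO mytable VALUES
  ('PT2073', '2015-02-28'), ('PT2073', '2019-02-28'),
  ('PT2013', '2015-04-28'), ('PT2013', '2017-02-11');
""")

# For each row, a scalar subquery looks up the minimum date of that row's id.
rows = conn.execute("""
SELECT t1.*,
       (SELECT MIN(transaction_date)
          FROM mytable t2
         WHERE t1.id = t2.id) AS first_transaction_date
FROM mytable t1
""").fetchall()

print(rows)
```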

Related

BigQuery - Extract last entry of each group

I have a table where multiple records are inserted for each group of products. Now I want to extract (SELECT) only the last entries. For more, see the screenshot: the yellow-highlighted records should be returned by the select query.
The HAVING MAX and HAVING MIN clauses for the ANY_VALUE function are now in preview. They were introduced for some aggregate functions in the February 6, 2023 release - https://cloud.google.com/bigquery/docs/release-notes#February_06_2023
With them, the query can be very simple - consider the approach below:
select any_value(t having max datetime).*
from your_table t
group by t.id, t.product
If applied to the sample data in your question, it returns the highlighted rows.
You might consider below as well
SELECT *
FROM sample_table
QUALIFY DateTime = MAX(DateTime) OVER (PARTITION BY ID, Product);
If you're more familiar with aggregate functions than window functions, the below might be another option.
SELECT ARRAY_AGG(t ORDER BY DateTime DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM sample_table t
GROUP BY t.ID, t.Product
You can use a window function to partition the rows by the key columns and select the required row by defining an order-by field.
For example:
select * from (
select *,
rank() over (partition by product order by DateTime desc) as rank
from `project.dataset.table`)
where rank = 1
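The rank-in-a-derived-table pattern can be sketched in a runnable form. This uses Python's `sqlite3` (SQLite 3.25+) with illustrative table and column names, partitioning by both ID and Product as in the question's other answers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample_table (ID INTEGER, Product TEXT, DateTime TEXT);
INSERT INTO sample_table VALUES
  (1, 'A', '2023-01-01'), (1, 'A', '2023-02-01'),
  (2, 'B', '2023-01-15'), (2, 'B', '2023-03-01');
""")

# Rank rows newest-first within each (ID, Product) group,
# then keep only the top-ranked row of each group.
rows = conn.execute("""
SELECT ID, Product, DateTime
FROM (
  SELECT *,
         RANK() OVER (PARTITION BY ID, Product ORDER BY DateTime DESC) AS rnk
  FROM sample_table
)
WHERE rnk = 1
ORDER BY ID
""").fetchall()
```

One caveat of rank(): if two rows in a group share the same latest DateTime, both get rank 1 and both are returned.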
You can use a correlated subquery to select the last record of each group:
select * from Tablename t
where t.DateTime = (select max(DateTime) from Tablename where ID = t.ID)

Test whether MIN would work over ROW_NUMBER

Situation:
I have three columns:
id
date
tx_id
The primary key column is tx_id, which is unique in the table. Each tx_id is tied to an id and has a record date. I would like to test whether or not the tx_id is incremental.
Objective:
I need to extract the first tx_id per id, but I want to avoid using ROW_NUMBER,
i.e.
select id, date, tx_id, row_number() over(partition by id order by date asc) as First_transaction_id from table
and simply use
select id, date, MIN(tx_id) as First_transaction_id from table
So, since I have more than 50 million ids, how can I make sure that using MIN(tx_id) will yield the earliest transaction for each id?
How can I add a flag column to segment those that don't satisfy the condition?
How can I make sure, since I have more than 50 million ids, that using MIN(tx_id) will yield the earliest transaction for each id?
Simply do the comparison:
You can get the exceptions with logic like this:
select t.*
from (select t.*,
min(tx_id) over (partition by id) as min_tx_id,
rank() over (partition by id order by date) as seqnum
from t
) t
where tx_id = min_tx_id and seqnum > 1;
Note: this uses rank(). It seems possible that there could be two transactions for an id on the same date.
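The exception check above can be exercised on a small sample. A sketch using Python's `sqlite3` (SQLite 3.25+), with hypothetical ids where one id is incremental and one is not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (id INTEGER, date TEXT, tx_id INTEGER);
INSERT INTO tx VALUES
  (1, '2020-01-01', 100),  -- incremental: MIN(tx_id) falls on the earliest date
  (1, '2020-02-01', 101),
  (2, '2020-02-01', 205),  -- NOT incremental: MIN(tx_id) = 200 falls on the latest date
  (2, '2020-01-01', 210),
  (2, '2020-03-01', 200);
""")

# Flag rows that hold the per-id minimum tx_id but are NOT the
# earliest-dated row (seqnum > 1): those ids violate the assumption.
exceptions = conn.execute("""
SELECT id, date, tx_id FROM (
  SELECT t.*,
         MIN(tx_id) OVER (PARTITION BY id) AS min_tx_id,
         RANK() OVER (PARTITION BY id ORDER BY date) AS seqnum
  FROM tx t
)
WHERE tx_id = min_tx_id AND seqnum > 1
""").fetchall()
```

An empty result means MIN(tx_id) is a safe substitute for ROW_NUMBER on this data; any returned rows are the ids to flag.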
Use a correlated subquery:
select t.*
from table_name t
where t.date = (select min(date)
                from table_name t1
                where t1.id = t.id)

Max of a Date field into another field in Postgresql

I have a PostgreSQL table in which I have a few fields such as id and date. I need to find the max date for each id and show it in a new field for all the ids. The SQLFiddle site was not responding, so I have put an example in Excel. Here is a screenshot of the data and the desired output for the table.
You could use the windowing variant of max:
SELECT id, date, MAX(date) OVER (PARTITION BY id)
FROM mytable
Something like this might work:
WITH maxdts AS (
SELECT id, max(dt) maxdt FROM table GROUP BY id
)
SELECT t.id, t.date, m.maxdt FROM table t JOIN maxdts m ON t.id = m.id;
Keep in mind without more information that this could be a horribly inefficient query, but it will get you what you need.
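The CTE-plus-join version can be sketched with Python's `sqlite3`. The table is named `mytable` here because `table` is a reserved word, and the column names (`dt`, `maxdt`) follow the answer; the sample data is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, dt TEXT);
INSERT INTO mytable VALUES
  (1, '2021-01-01'), (1, '2021-06-01'),
  (2, '2021-03-01');
""")

# The CTE computes one MAX(dt) per id; joining it back attaches
# that maximum to every row of the id.
rows = conn.execute("""
WITH maxdts AS (
  SELECT id, MAX(dt) AS maxdt FROM mytable GROUP BY id
)
SELECT t.id, t.dt, m.maxdt
FROM mytable t JOIN maxdts m ON t.id = m.id
ORDER BY t.id, t.dt
""").fetchall()
```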

Query historized data

To describe my query problem, the following data is helpful:
A single table contains the columns ID (int), VAL (varchar) and ORD (int)
The values of VAL may change over time, whereby older items identified by ID are not updated in place; new rows are appended instead. The last valid item for an ID is the one with the highest ORD value (ORD increases over time).
T0, T1 and T2 are points in time where data got entered.
How do I get to the result set in an efficient manner?
A solution must not involve materialized views etc., and should be expressible in a single SQL query. Using PostgreSQL 9.3.
The correct way to select the groupwise maximum in Postgres is DISTINCT ON:
SELECT DISTINCT ON (id) sysid, id, val, ord
FROM my_table
ORDER BY id,ord DESC;
You want all records for which no newer record exists:
select *
from mytable
where not exists
(
select *
from mytable newer
where newer.id = mytable.id
and newer.ord > mytable.ord
)
order by id;
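The NOT EXISTS anti-join can be verified on a small sample. A sketch with Python's `sqlite3`, using the question's ID/VAL/ORD columns and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, val TEXT, ord INTEGER);
INSERT INTO mytable VALUES
  (1, 'a', 1), (1, 'b', 2),
  (2, 'x', 1), (2, 'y', 2), (2, 'z', 3);
""")

# Keep only rows for which no row with the same id has a higher ord,
# i.e. the latest version of each id.
rows = conn.execute("""
SELECT * FROM mytable
WHERE NOT EXISTS (
  SELECT 1 FROM mytable newer
  WHERE newer.id = mytable.id AND newer.ord > mytable.ord
)
ORDER BY id
""").fetchall()
```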
You can do the same with row numbers. Give the latest entry per ID the number 1 and keep these:
select sysid, id, val, ord
from
(
select
sysid, id, val, ord,
row_number() over (partition by id order by ord desc) as rn
from mytable
) numbered
where rn = 1
order by id;
Left join the table (A) against itself (B) on the condition that B is more recent than A. Pick only the rows where B does not exist (i.e. A is the most recent row).
SELECT last_value.*
FROM my_table AS last_value
LEFT JOIN my_table
ON my_table.id = last_value.id
AND my_table.ord > last_value.ord
WHERE my_table.id IS NULL;
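The LEFT JOIN anti-join can likewise be run against SQLite via Python's `sqlite3`. The aliases are renamed here (`latest` / `newer`) for clarity, and the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, val TEXT, ord INTEGER);
INSERT INTO my_table VALUES
  (1, 'a', 1), (1, 'b', 2),
  (2, 'x', 1), (2, 'y', 2), (2, 'z', 3);
""")

# Join each row to any newer row of the same id; rows with no match
# (newer.id IS NULL) are the most recent per id.
rows = conn.execute("""
SELECT latest.*
FROM my_table AS latest
LEFT JOIN my_table AS newer
  ON newer.id = latest.id AND newer.ord > latest.ord
WHERE newer.id IS NULL
ORDER BY latest.id
""").fetchall()
```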

SQL: Eliminate duplicate entries by selecting ID but showing Name instead

How can I select by ID but still show a distinct Name, keeping the newest Mod.Date, so as to eliminate the duplicates in the Name column? I'm assuming this is easy, but I've never done this. Thank you.
You need a subquery which selects the latest date for each ID:
SELECT t.ID, max(t.`mod.date`) last_date
FROM YourTable t
GROUP BY t.ID
This subquery has to be linked to the original table using the ID and the date.
SELECT t.ID,t.Name,t.`mod.date`
FROM YourTable t
JOIN (SELECT t1.ID, max(t1.`mod.date`) last_date
FROM YourTable t1
GROUP BY t1.ID) tmp ON tmp.ID=t.ID AND tmp.last_date=t.`mod.date`
This gives you ID and (latest) Name for all IDs.
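The join-back-to-the-aggregate pattern can be sketched with Python's `sqlite3`. Note that the derived table's max column must be referenced through its alias (`last_date`), not the original column name; `mod.date` is renamed `mod_date` here since a dot is not valid in an unquoted SQLite identifier, and the rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (ID INTEGER, Name TEXT, mod_date TEXT);
INSERT INTO YourTable VALUES
  (1, 'old name', '2020-01-01'),
  (1, 'new name', '2021-01-01'),
  (2, 'only name', '2020-06-01');
""")

# The subquery finds each ID's latest date; joining on (ID, date)
# keeps only the row carrying the newest Name per ID.
rows = conn.execute("""
SELECT t.ID, t.Name, t.mod_date
FROM YourTable t
JOIN (SELECT ID, MAX(mod_date) AS last_date
        FROM YourTable
       GROUP BY ID) tmp
  ON tmp.ID = t.ID AND tmp.last_date = t.mod_date
ORDER BY t.ID
""").fetchall()
```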
Update: another possibility, which should also work in Access, is to use the ALL comparison:
SELECT t.ID,t.Name,t.`mod.date`
FROM YourTable t
WHERE t.`mod.date` >= ALL (SELECT max(t1.`mod.date`)
FROM YourTable t1
WHERE t1.ID=t.ID GROUP BY t1.ID)