Hive joining columns with milliseconds - hive

I have a table having columns id,create_time,code.
create_time column is of type string having timestamp value in the format yyyy-MM-dd HH:mm:ss.SSSSSS
Now my requirement is to find the latest code(recent create_time) for each id. If the create_time column has no milliseconds part, I can do
select id,create_time,code from(
select id,max(unix_timestamp(create_time,"yyyy-MM-dd HH:mm:ss")) over (partition by id) as latest_time from table)a
join table b on a.latest_time=b.create_time
As unix time functions consider only seconds not milliseconds, am not able to proceed with them.
Please help

Why would you try to convert at all? Since you are only looking for the latest timestamp I would just do:
select id,create_time,code from(
select id,max(create_time) over (partition by id) as latest_time from table)a
join table b on a.latest_time=b.create_time
The ones without miliseconds will be treated, as they would have "000000" instead.

You do not need join for this.
If you need all records with max(create_time), use rank() or dense_rank(). Rank will assign 1 to all records with the latest create_time if there are many records with the same time.
If you need only one record per id even it there are many records with create_time=max(create_time), then use row_number() instead of rank():
select id,create_time,code
from
(
select id,create_time,code,
rank() over(partition by id order by create_time desc) rn
)s
where rn=1;

Related

BigQuery - Extract last entry of each group

I have one table where multiple records inserted for each group of product. Now, I want to extract (SELECT) only the last entries. For more, see the screenshot. The yellow highlighted records should be return with select query.
The HAVING MAX and HAVING MIN clause for the ANY_VALUE function is now in preview
HAVING MAX and HAVING MIN were just introduced for some aggregate functions - https://cloud.google.com/bigquery/docs/release-notes#February_06_2023
with them query can be very simple - consider below approach
select any_value(t having max datetime).*
from your_table t
group by t.id, t.product
if applied to sample data in your question - output is
You might consider below as well
SELECT *
FROM sample_table
QUALIFY DateTime = MAX(DateTime) OVER (PARTITION BY ID, Product);
If you're more familiar with an aggregate function than a window function, below might be an another option.
SELECT ARRAY_AGG(t ORDER BY DateTime DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM sample_table t
GROUP BY t.ID, t.Product
Query results
You can use window function to do partition based on key and selecting required based on defining order by field.
For Example:
select * from (
select *,
rank() over (partition by product, order by DateTime Desc) as rank
from `project.dataset.table`)
where rank = 1
You can use this query to select last record of each group:
Select Top(1) * from Tablename group by ID order by DateTime Desc

new column with row number sql

I have data like below with two columns, I need an output with new column shown below
Input -
Name,Date,Value
Test1,20200901,55
Test1,20200901,100
Test1,20200901,150
Test1,20200805,25
Test1,20200805,30
Row number is based on data from column - Name and Date
Output,
Name,Date,Value, row_number
Test1,20200901,55,1
Test1,20200901,100,1
Test1,20200901,150,1
Test1,20200805,25,2
Test1,20200805,30,2
The query using Partition didn't help
select *, row_number() over (partition by Date) as Rank from Table
Can someone please help here
Thank you very much
You want dense_rank():
select *,
dense_rank() over (order by Date) as Rank
from Table;
There is something suspicious when you are using partition by without order by (even if the underlying database supports that).
Use dense_rank() - and an order by clause:
select t.*, dense_rank() over (order by Date) as rn from mytable t
This gives you a sequential number that starts at 1 on the earliest date value increments without gaps everytime date changes.

Test whether MIN would work over ROW_NUMBER

Situation:
I have three columns:
id
date
tx_id
The primary id column is tx_id and is unique in the table. Each tx_id is tied to an id and it has a record date. I would like to test whether or not the tx_id is incremental.
Objective:
I need to extract the first tx_id by id but I want to prevent using ROW_NUMBER
i.e
select id, date, tx_id, row_number() over(partition by id order by date asc) as First_transaction_id from table
and simply use
select id, date, MIN(tx_id) as First_transaction_id from table
So how can i make sure since i have more than 50 millions of ids that by using MINtx_id will yield the earliest transaction for each id?
How can i add a flag column to segment those that don't satisfy the condition?
how can i make sure since i have more than 50 millions of ids that by using MINtx_id will yield the earliest transaction for each id?
Simply do the comparison:
You can get the exceptions with logic like this:
select t.*
from (select t.*,
min(tx_id) over (partition by id) as min_tx_id,
rank() over (partition by id order by date) as seqnum
from t
) t
where tx_id = min_tx_id and seqnum > 1;
Note: this uses rank(). It seems possible that there could be two transactions for an id on the same date.
use corelated sunquery
select t.* from table_name t
where t.date= ( select min(date) from table_name
t1 where t1.id=t.id)

how to get latest date column records when result should be filtered with unique column name in sql?

I have table as below:
I want write a sql query to get output as below:
the query should select all the records from the table but, when multiple records have same Id column value then it should take only one record having latest Date.
E.g., Here Rudolf id 1211 is present three times in input---in output only one Rudolf record having date 06-12-2010 is selected. same thing with James.
I tried to write a query but it was not succssful. So, please help me to form a query string in sql.
Thanks in advance
You can partition your data over Date Desc and get the first row of each partition
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
you can use WITH TIES
select top 1 PERCENT WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As i can see from your example, duplicate rows differ only in Date. If it's a case, then simple GROUP BY with MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2

Max of a Date field into another field in Postgresql

I have a postgresql table wherein I have few fields such as id and date. I need to find the max date for that id and show the same into a new field for all the ids. SQLFiddle site was not responding so I have an example in the excel. Here is the screenshot of the data and the output for the table.
You could use the windowing variant of max:
SELECT id, date, MAX(date) OVER (PARTITION BY id)
FROM mytable
Something like this might work:
WITH maxdts AS (
SELECT id, max(dt) maxdt FROM table GROUP BY id
)
SELECT id, date, maxdt FROM table t, maxdts m WHERE t.id = m.id;
Keep in mind without more information that this could be a horribly inefficient query, but it will get you what you need.