Select the max timestamp in hive - hive

I have a table-customer with two records in the different timestamp. I want to select the max timestamp records: 2014-08-15 15:54:07.379.
Select Customer_ID, Account_ID, max(ProcessTimeStamp)
from Customer
group by Customer_ID, Account_ID
I should get one record, but the actual result is two records.
How can I get the max ProcessTimeStamp records?

You can use windows function here.
Either you can use dense_rank() or row_num().
1.USING DENSE_RANK()
select customer_id,account_id,processTimeStamp
from (select *
,dense_rank() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
2.USING ROW NUMBER
select customer_id,account_id,processTimeStamp
from (select *
,row_number() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
BUT with row_number() each row will get a unique number and if there are duplicate records than row_number will give only the row where row number=1(in above mentioned case).

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

Let's say we have the database table below, called USER_JOBS.
I'd like to write an SQL query that reflects this algorithm:
Divide the whole table in groups of rows defined by a common USER_ID (in the example table, the 2 resulting groups are colored yellow & green)
From each group, select the oldest row (according to SCHEDULE_TIME)
From this example table, the desired SQL query would return these 2 rows:
You can use ranking function (supported in most RDBS):
SELECT *
FROM
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME DESC) AS RowID
FROM [table]
)
WHERE RowID = 1
WITH Ranked AS (
SELECT
RANK() OVER (PARTITION BY User_ID ORDER BY ScheduleTime DESC) as Ranking,
*
FROM [table_name]
)
SELECT Status, Sob_Type, User_ID, TimeStamp FROM ranking WHERE Ranks = 1;

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should only see three records and the last STATUS per that account_2.
Do you know a way to do this?
create table TBL 1 (
account_id int,
status string,
create_date date)
select account_id, max(create_date) from table group by account_id;
will give you the account_id and create_date at the closest past date to today (assuming create_date can never be in the future, which makes sense).
Now you can join with that data to get what you want, something along the lines for example:
select account_id, status, create_date from table where (account_id, create_date) in (<the select expression from above>);
If you use that frequently (account with the latest create date), then consider defining a view for that.
If you have many columns and want keep the row that is the last line, you can use QUALIFY to run the ranking logic, and keep the best, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern the Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution with give you both. you can also get via QUALIFY if you use RANK
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK for, which will show all equal rows is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;

Select only 3 rows per user - SQL Query

These are the columns in my table
id (autogenerated)
created_user
created_date
post_text
This table has lot of values. I wanted to take latest 3 posts of every created_user
I am new to SQL and need help. I ran the below query in my Postgres database and it is not helpful
SELECT * FROM posts WHERE created_date IN
(SELECT MAX(created_date) FROM posts GROUP BY created_date)
You could use the row_number() window function to create an ordered row count per user. After that you can easily filter by this value
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
row_number() OVER (PARTITION BY created_user ORDER BY created_date DESC)
FROM
posts
) s
WHERE row_number <= 3
PARTITION BY groups the users
ORDER BY date DESC orders the posts of each user into a descending order to get the most recent as row_count == 1, ...

Distinct rows in a table in sql

I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1

Insert every other row to a Column

I have this Table with one row Transaction Date the first row is the checkIn and the second one is the Checkout if we organized the result by date asc.
I need to pass the second row value to another column named Checkout. This table has at least 1000 records
Try using the query below
With CteCheckOut as(
Select
ROW_NUMBER() over (Partition by studentID Order by studentID,transactionDate) as Rownumber,
*
From student
)
Select studentID, transactionDate as CheckOutDate
From CteCheckOut
Where Rownumber%2 = 0
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by studentid order by transactiondate) as seqnum
from t
) t
where seqnum = 2;
This assumes that only studentid is used to identify the "first" and "second" rows. If more columns are needed, add them to the partition by clause.
If the checkout date you need is the second (max) date grouped by reason and you want to single out this date then try:
select
studentID,
reason,
max(transactionDate) as checkoutDate
from student
group by
reason