How to count rows in Hive? - sql

This is my table:
ID/Number/Date
1/111/2021-01-01
2/111/2021-01-02
6/333/2921-01-01
I need a table that numbers the rows within each Number, ordered by Date ascending.
This should be my final table:
ID/Number/Date/Row_No_Count
1/111/2021-01-01/1
2/111/2021-01-02/2
6/333/2921-01-01/1
How can I achieve this with Hive? Is there any function for it?

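ROW_NUMBER() is a window function for this type of work. It is available in SQL Server and in Hive.
You can solve your problem with the query below. Note that the ORDER BY inside OVER must use Date; ordering by the partition column itself would leave the numbering within each Number arbitrary.
Query : Select *, row_number() Over (partition by Number order by Date) as Row_Number_Count From t ;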
Output :
id Number Date Row_Number_Count
----------- ----------- ---------- --------------------
1 111 2021-01-01 1
2 111 2021-01-02 2
6 333 2921-01-01 1
(3 rows affected)

Try the row_number() window function, like below.
select t.*,
row_number() over(partition by Number order by Number,Date asc ) as Row_No_Count
from table t
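A minimal runnable sketch of the same idea, for anyone who wants to try it without a Hive cluster. It uses Python's built-in sqlite3 module as a stand-in (an assumption on my part: SQLite 3.25 or newer, which supports window functions), and the table name t is just a placeholder. The window clause itself is the same one you would write in Hive.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Number INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 111, "2021-01-01"), (2, 111, "2021-01-02"), (6, 333, "2921-01-01")],
)

# Number the rows inside each Number partition, earliest Date first.
rows = conn.execute(
    """
    SELECT t.*,
           row_number() OVER (PARTITION BY Number ORDER BY Date ASC) AS Row_No_Count
    FROM t
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
# (1, 111, '2021-01-01', 1)
# (2, 111, '2021-01-02', 2)
# (6, 333, '2921-01-01', 1)
```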

Related

Retrieve max date for distinct IDs in a table [duplicate]

I have the table ABC with the following data
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
1 AA 08SEP2019 22
1 AA 07SEP2019 33
2 BB 09SEP2019 44
2 BB 08SEP2019 55
2 BB 07SEP2019 66
And I want to get for every distinct ID in the table its max date. So the result set must be as the following
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
2 BB 09SEP2019 44
The query that returns the result I need
WITH MaxDate as (
SELECT Id,Name,Max(Date) as Date from ABC group by Id,Name
)
SELECT view1.*, view2.execution_id
from
MaxDate view1,
ABC view2
WHERE
view1.date=view2.date and
view1.name=view2.name;
I don't like getting the max date for each distinct ID this way. Is there another, simpler way?
One way is to use RANK:
WITH cte AS (
SELECT ABC.*, RANK() OVER(PARTITION BY Id,Name ORDER BY Date DESC) rnk
FROM ABC
)
SELECT *
FROM cte
WHERE rnk = 1
ORDER BY id;
You can use keep dense_rank last to do this in one level of query, as long as you only want one or a small number of columns retained:
select id,
name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id, name
order by id;
ID NAME DATE_ EXECUTION_ID
---------- ---- ---------- ------------
1 AA 2019-09-09 11
2 BB 2019-09-09 44
If ID and name are not always the same, and you want the name from the latest date too, then use the same pattern:
select id,
max(name) keep (dense_rank last order by date_) as name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id
order by id;
which gets the same result with your sample data.
With lots of columns it's probably simpler to use a subquery (CTE or inline view) with a ranking function and a filter (as @Lukasz shows).
With NOT EXISTS:
select t.* from ABC t
where not exists (
select 1 from ABC
where "Id" = t."Id" and "Name" = t."Name" and "Date" > t."Date"
)
I included the "Name" = t."Name" condition only because you have it in your code.
If it is not needed, you can remove it.
See the demo.
Results:
Id | Name | Date | Execution id
-: | :--- | :---------| -----------:
1 | AA | 09-SEP-19 | 11
2 | BB | 09-SEP-19 | 44
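To compare the approaches side by side, here is a small sketch that runs the RANK and NOT EXISTS variants against the sample data. It uses Python's sqlite3 module rather than Oracle (an assumption for portability; the keep dense_rank last form is Oracle-specific and is not shown). The column names date_ and execution_id and the ISO date strings are my own choices so that text comparison sorts chronologically.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption); dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (id INTEGER, name TEXT, date_ TEXT, execution_id INTEGER)"
)
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?, ?)",
    [
        (1, "AA", "2019-09-09", 11),
        (1, "AA", "2019-09-08", 22),
        (1, "AA", "2019-09-07", 33),
        (2, "BB", "2019-09-09", 44),
        (2, "BB", "2019-09-08", 55),
        (2, "BB", "2019-09-07", 66),
    ],
)

# Variant 1: rank rows per (id, name) by date descending and keep rank 1.
ranked = conn.execute(
    """
    WITH cte AS (
        SELECT abc.*,
               RANK() OVER (PARTITION BY id, name ORDER BY date_ DESC) AS rnk
        FROM abc
    )
    SELECT id, name, date_, execution_id
    FROM cte
    WHERE rnk = 1
    ORDER BY id
    """
).fetchall()

# Variant 2: keep a row only if no later row exists for the same (id, name).
not_exists = conn.execute(
    """
    SELECT t.id, t.name, t.date_, t.execution_id
    FROM abc t
    WHERE NOT EXISTS (
        SELECT 1 FROM abc x
        WHERE x.id = t.id AND x.name = t.name AND x.date_ > t.date_
    )
    ORDER BY t.id
    """
).fetchall()

print(ranked)      # [(1, 'AA', '2019-09-09', 11), (2, 'BB', '2019-09-09', 44)]
print(not_exists)  # same rows
```

Both variants return every tied row if two rows in a group share the latest date; switch RANK to ROW_NUMBER if exactly one row per group is required.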

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
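One way is to use a difference-of-row-numbers approach to classify consecutive equal values into one group. The difference between the overall row number and the row number within each value stays constant for as long as a run lasts, so (value, difference) identifies a run. A second row_number() over that group then gives the desired position.
Query to assign groups (running this will help you understand how the groups are assigned):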
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final query, using row_number to get the position within each group assigned by the query above:
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
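To check the result against the sample table, here is a small sketch that runs the final query with Python's sqlite3 module. SQLite is only a stand-in for Hive 1.2 here (an assumption; it needs SQLite 3.25+ for window functions), but the window expressions are the same.

```python
import sqlite3

# In-memory SQLite table with the sample data (SQLite is a stand-in for Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1)],
)

# Inner query: rnum_diff is constant within a run of equal values.
# Outer query: number the rows inside each (value, rnum_diff) group.
rows = conn.execute(
    """
    SELECT id, value,
           row_number() OVER (PARTITION BY value, rnum_diff ORDER BY id) AS pos_in_grp
    FROM (
        SELECT t.*,
               row_number() OVER (ORDER BY id)
             - row_number() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
        FROM tbl t
    ) s
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 1, 1)
# (2, 1, 2)
# (3, 2, 1)
# (4, 3, 1)
# (5, 1, 1)
# (6, 1, 2)
# (7, 1, 3)
```

For this data the inner rnum_diff values are 0, 0, 2, 3, 2, 2, 2 for ids 1 to 7, which is why ids 5 to 7 land in a different group from ids 1 and 2 even though they share value 1.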

Oracle SQL: retrieve sum and value from row in group based on value of another column

I am trying to create a summary query that returns the sum of the quantity for each group along with the description from the row with the largest quantity in that group.
For example, if the table looks like this:
GROUP QTY DESC
----- --- ----
1 23 CCC
1 42 AAA
1 61 BBB
2 11 ZZZ
2 53 XXX
2 32 YYY
The query would return:
1 126 BBB (desc from row with largest qty for group 1)
2 96 XXX (desc from row with largest qty for group 2)
Thanks!
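The window function row_number() is your friend for this type of query. It assigns a sequential number to the rows within each partition, in the order you specify, so the row with the largest qty in each group gets seqnum = 1. You can then use this information in an aggregation: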
select group, sum(qty), max(case when seqnum = 1 then desc end)
from (select t.*,
row_number() over (partition by group order by qty desc) as seqnum
from t
) t
group by group
By the way, group and desc are lousy names for columns because they conflict with reserved words. You should rename them or enclose them in double quotes in the query.
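A runnable sketch of the same pattern, using Python's sqlite3 module as a stand-in for Oracle (an assumption for convenience). Following the advice about reserved words, the columns are renamed to grp and descr here; those names and the output aliases are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); grp/descr replace the reserved names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INTEGER, qty INTEGER, descr TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 23, "CCC"), (1, 42, "AAA"), (1, 61, "BBB"),
        (2, 11, "ZZZ"), (2, 53, "XXX"), (2, 32, "YYY"),
    ],
)

# seqnum = 1 marks the row with the largest qty in each group; the CASE
# expression picks that row's description during aggregation.
rows = conn.execute(
    """
    SELECT grp,
           SUM(qty) AS total_qty,
           MAX(CASE WHEN seqnum = 1 THEN descr END) AS top_descr
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY grp ORDER BY qty DESC) AS seqnum
        FROM t
    ) s
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

print(rows)  # [(1, 126, 'BBB'), (2, 96, 'XXX')]
```

For the sample rows the sums come to 23 + 42 + 61 = 126 and 11 + 53 + 32 = 96.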

MAX function without group by

I have the following table:
ID | NUM
1 | 4
2 | 9
3 | 1
4 | 7
5 | 10
I want a result of:
ID | NUM
5 | 10
When I try to use MAX(NUM) I get an error saying that I have to use GROUP BY in order to use the MAX function.
Any idea?
As the error says, using an aggregate like MAX requires a GROUP BY clause if there are any non-aggregated columns in the select list (in your case, you are trying to find MAX(NUM) and then return the associated value(s) from the ID column). In MS SQL Server you can get what you want by ordering and limiting the returned rows:
SELECT TOP 1 ID, NUM
FROM [table]
ORDER BY NUM DESC;
In other RDBMSs, the LIMIT clause offers similar functionality.
Edit
If you need to return all rows which have the same maximum, then use the WITH TIES qualification:
SELECT TOP 1 WITH TIES ID, NUM
FROM [table]
ORDER BY NUM DESC;
This may return more than one row if several rows share the maximum:
SELECT id, num
FROM table
WHERE num = (SELECT MAX(num) FROM table)
Try this query.
WITH result AS
(
select DENSE_RANK() OVER( ORDER BY NUM desc) AS RowNo,ID,NUM from #emp
)
select ID,NUM from result where RowNo=1
It returns every row with the maximum value, even when several rows share it, for example:
ID | NUM
5 | 10
6 | 10
Refer to the link below to learn more about ranking functions:
http://msdn.microsoft.com/en-us/library/ms189798
How about:
SELECT TOP 1 ID,NUM FROM table ORDER BY NUM DESC;
Do this -
SELECT TOP 1 ID,
NUM
FROM <yourtable>
ORDER BY NUM DESC;
This gets all rows that have the max value, but it uses 3 SELECTs, which is not good for performance:
SELECT id, MAX(num) as num
FROM table
GROUP BY id
ORDER BY MAX(num) DESC
LIMIT (SELECT COUNT(*)
FROM table
WHERE num =(SELECT MAX(num) FROM table)
)
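To see how the answers differ when the maximum is tied, here is a small sketch using Python's sqlite3 module. SQLite is a stand-in (an assumption), so LIMIT 1 plays the role of SQL Server's TOP 1; the table name emp and the extra row (6, 10) are added for illustration.

```python
import sqlite3

# Sample data plus one extra row (6, 10) so the maximum is tied (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, 4), (2, 9), (3, 1), (4, 7), (5, 10), (6, 10)],
)

# Order and limit: returns exactly one row, even when the maximum is tied.
# The id tie-breaker makes the choice deterministic.
top_one = conn.execute(
    "SELECT id, num FROM emp ORDER BY num DESC, id LIMIT 1"
).fetchall()

# Scalar subquery: returns every row whose num equals the maximum.
by_subquery = conn.execute(
    "SELECT id, num FROM emp WHERE num = (SELECT MAX(num) FROM emp) ORDER BY id"
).fetchall()

# DENSE_RANK: also returns every tied row.
by_rank = conn.execute(
    """
    WITH result AS (
        SELECT DENSE_RANK() OVER (ORDER BY num DESC) AS row_no, id, num
        FROM emp
    )
    SELECT id, num FROM result WHERE row_no = 1 ORDER BY id
    """
).fetchall()

print(top_one)      # [(5, 10)]
print(by_subquery)  # [(5, 10), (6, 10)]
print(by_rank)      # [(5, 10), (6, 10)]
```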

How to distinguish between the first and rest for duplicate records using sql?

These are the input table and required output table.
Input table
ID Name
-------------
1 aaa
1 ababaa
2 bbbbbb
2 bcbcbccbc
2 bcdbcdbbbbb
3 ccccc
Output table
ID Name Ord
-----------------------------
1 aaa first
1 ababaa rest
2 bbbbbb first
2 bcbcbccbc rest
2 bcdbcdbbbbb rest
3 ccccc first
First and rest are based on the order of occurrence within each ID.
Is there a way to write a SQL query to achieve this?
P.S. - This question is somewhat similar to what I am looking for.
select id, name, case rnk when 1 then 'first' else 'rest' end ord
from(
select *, RANK() over(partition by id order by id,name) rnk
from input
) X
You can also try this
SELECT id, name,
Decode(ROW_NUMBER() OVER (partition by id order by id,name),1,'First','Rest') Ord
FROM Input_table;
You can use this query, as it is simpler and should perform well. Note that Decode is Oracle-specific; on other databases, use a CASE expression instead.
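For a portable version, here is a sketch with ROW_NUMBER and a CASE expression, run through Python's sqlite3 module (SQLite is a stand-in here, an assumption; the table name input_table follows the second answer). Ordering by name inside each id decides which row counts as first, which matches the sample output.

```python
import sqlite3

# In-memory SQLite table with the sample data (stand-in database, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO input_table VALUES (?, ?)",
    [
        (1, "aaa"), (1, "ababaa"),
        (2, "bbbbbb"), (2, "bcbcbccbc"), (2, "bcdbcdbbbbb"),
        (3, "ccccc"),
    ],
)

# Row 1 within each id is labelled 'first'; every later row is 'rest'.
rows = conn.execute(
    """
    SELECT id, name,
           CASE row_number() OVER (PARTITION BY id ORDER BY name)
                WHEN 1 THEN 'first' ELSE 'rest'
           END AS ord
    FROM input_table
    ORDER BY id, name
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'aaa', 'first')
# (1, 'ababaa', 'rest')
# (2, 'bbbbbb', 'first')
# (2, 'bcbcbccbc', 'rest')
# (2, 'bcdbcdbbbbb', 'rest')
# (3, 'ccccc', 'first')
```

Unlike RANK, ROW_NUMBER labels exactly one row per id as first even if two rows have the same name.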