Find the nth value in hive

I am trying to find the Nth score value within groups defined by another column.
For example, I want to see the nth transaction amount for each person. The issue is that my rank does not restart at each name; it just continues down the output like a row count:
Syntax example:
SELECT name, txn_amount, dense_rank() over (order by name,txn_amount desc ) as nth_value FROM payment_table
Any help is greatly appreciated.
P.S. I am using Hive to run this, if it helps.

You need to partition by one value and order by the other:
SELECT name, txn_amount
FROM (SELECT pt.*,
dense_rank() over (partition by name order by txn_amount desc ) as nth_value
FROM payment_table pt
) pt
WHERE nth_value = X;
The subquery is needed to get a particular value. If you want multiple values in the same row, you can use GROUP BY:
SELECT name,
MAX(CASE WHEN nth_value = 1 THEN txn_amount END) as value_1,
MAX(CASE WHEN nth_value = 2 THEN txn_amount END) as value_2
FROM (SELECT pt.*,
dense_rank() over (partition by name order by txn_amount desc ) as nth_value
FROM payment_table pt
) pt
WHERE nth_value IN (1, 2)
GROUP BY name;
Note: DENSE_RANK() gives tied amounts the same rank, so duplicates are counted only once. If you want duplicates counted separately (so the second value could be the same as the first), use ROW_NUMBER() instead.
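To see the difference on a small example, here is a minimal sketch. It assumes hypothetical sample rows and uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Hive; the window function syntax shown is the same in both.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment_table (name TEXT, txn_amount INTEGER)")
conn.executemany(
    "INSERT INTO payment_table VALUES (?, ?)",
    [("alice", 50), ("alice", 50), ("alice", 30), ("bob", 20), ("bob", 10)],
)

# PARTITION BY name restarts both counters for every person.
query = """
SELECT name, txn_amount,
       dense_rank() OVER (PARTITION BY name ORDER BY txn_amount DESC) AS dr,
       row_number() OVER (PARTITION BY name ORDER BY txn_amount DESC) AS rn
FROM payment_table
ORDER BY name, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For alice, the two 50s share dense_rank 1 and the 30 gets rank 2, while row_number numbers the same three rows 1, 2, 3.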

Related

How do I create a new SQL table with custom column names and populate these columns

I currently have an SQL statement that returns the most frequently occurring value and the least frequently occurring value in a table. However, the result has two rows, each with a name and its frequency. I need a result with two columns, min and max, and a single row holding one value for each. Both values need to appear in the same row.
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency DESC limit 1)
UNION
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency ASC limit 1);
So for the above query I would need the names of the min and max values in one row. I need to be able to define the name of new columns for the generated SQL query as well.
Min_Name | Max_Name
Certif_1 | Certif_2

I think this query should give you the results you want. It ranks each name according to the number of times it appears in the table, then uses conditional aggregation to select the min and max frequency names in one row:
with cte as (
select name,
row_number() over (order by count(*) desc) as maxr,
row_number() over (order by count(*)) as minr
from firefighter_certifications
group by name
)
select max(case when minr = 1 then name end) as Min_Name,
max(case when maxr = 1 then name end) as Max_Name
from cte
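A minimal sketch of the same idea, assuming hypothetical certification names and using Python's sqlite3 module (SQLite 3.25 or later) as a stand-in database. The counts are computed in their own CTE first, and name is added as a tie-breaker so the outcome is deterministic; both are choices made for this example only.

```python
import sqlite3

# Hypothetical sample data: EMT appears 3 times, Hazmat 2, Rescue 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firefighter_certifications (name TEXT)")
conn.executemany(
    "INSERT INTO firefighter_certifications VALUES (?)",
    [("EMT",), ("EMT",), ("EMT",), ("Hazmat",), ("Hazmat",), ("Rescue",)],
)

query = """
WITH counts AS (
    SELECT name, count(*) AS cnt
    FROM firefighter_certifications
    GROUP BY name
),
ranked AS (
    SELECT name,
           row_number() OVER (ORDER BY cnt DESC, name) AS maxr,
           row_number() OVER (ORDER BY cnt ASC, name)  AS minr
    FROM counts
)
-- Conditional aggregation collapses the ranked rows into one row.
SELECT max(CASE WHEN minr = 1 THEN name END) AS Min_Name,
       max(CASE WHEN maxr = 1 THEN name END) AS Max_Name
FROM ranked
"""
min_max = conn.execute(query).fetchone()
print(min_max)
```

With this data the single output row pairs the least frequent name (Rescue) with the most frequent one (EMT).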

Postgres doesn't offer "first" and "last" aggregation functions. But there are other, similar methods:
select distinct first_value(name) over (order by cnt desc, name) as name_at_max,
first_value(name) over (order by cnt asc, name) as name_at_min
from (select name, count(*) as cnt
from firefighter_certifications
group by name
) n;
Or without any subquery at all:
select first_value(name) over (order by count(*) desc, name) as name_at_max,
first_value(name) over (order by count(*) asc, name) as name_at_min
from firefighter_certifications
group by name
limit 1;

SQL - Window function to get values from previous row where value is not null

I am using Exasol. In other DBMSs it was possible to use analytic functions such as LAST_VALUE() and specify a condition on the ORDER BY clause within the OVER() clause, like:
select ...
LAST_VALUE(customer)
OVER (PARTITION BY ID ORDER BY date_x DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) as the_last
Unfortunately I get the following error:
ERROR: [0A000] Feature not supported: windowing clause (Session:
1606983630649130920)
The error does not occur if I use CURRENT ROW instead of AND 1 PRECEDING.
Basically, what I want is the last value according to the ORDER BY that is not the current row. In this example, it would be the customer of the previous row.
I know that I could use LAG(customer, 1) OVER (...), but the problem is that I want the previous customer that is not null, so the offset is not always 1.
How can I do that?
Many thanks!

Does this work?
select lag(customer) over (partition by id
order by (case when customer is not null then 1 else 0 end),
date
)
You can do this with two steps:
select t.*,
max(customer) over (partition by id, max_date) as max_customer
from (select t.*,
max(case when customer is not null then date end) over (partition by id order by date) as max_date
from t
) t;
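Here is a minimal sketch of the two-step version on hypothetical rows, run through Python's sqlite3 module (SQLite 3.25 or later) rather than Exasol. The date column is named date_x as in the question. Note that the running max includes the current row, so a row whose own customer is not null keeps its own value.

```python
import sqlite3

# Hypothetical sample data for a single id with gaps in customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_x TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "A"),
        (1, "2020-01-02", None),
        (1, "2020-01-03", None),
        (1, "2020-01-04", "B"),
        (1, "2020-01-05", None),
    ],
)

# Step 1: running max of the dates that have a customer (max_date).
# Step 2: within each (id, max_date) group, pick the one non-null customer.
query = """
SELECT s.date_x, s.customer,
       max(s.customer) OVER (PARTITION BY s.id, s.max_date) AS max_customer
FROM (SELECT t.*,
             max(CASE WHEN customer IS NOT NULL THEN date_x END)
                 OVER (PARTITION BY id ORDER BY date_x) AS max_date
      FROM t
     ) s
ORDER BY s.date_x
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

The null rows on 01-02 and 01-03 pick up A, and the null row on 01-05 picks up B. No windowing clause is needed, which is the point of the workaround.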

How to get single closest value for each column type in DB2

I have this query:
SELECT * FROM TABLE1 WHERE KEY_COLUMN='NJCRF' AND TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3') AND DATE_EFFECTIVE_COLUMN<='2016-09-17'
I get about 12 records (rows) as a result.
How do I get the result closest to DATE_EFFECTIVE_COLUMN for each TYPE_COLUMN? In this case, how do I get three records, one for each type, each closest to the effective date?
UPDATE: I could use TOP if I had to go over only single type, but I have three at this moment and for each of them I need to get closest time result.
Hope I made it clear, let me know if you need more info.

If I understand correctly, you can use ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY TYPE_COLUMN ORDER BY DATE_EFFECTIVE_COLUMN DESC) as seqnum
FROM TABLE1 t
WHERE KEY_COLUMN = 'NJCRF' AND
TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3') AND
DATE_EFFECTIVE_COLUMN <= '2016-09-17'
) t
WHERE seqnum = 1;
If you want three records per type, just use seqnum <= 3.

I like ROW_NUMBER() for this. You want to partition by TYPE, which will start the row count over for each type, then order by DATE_EFFECTIVE desc, and take only the highest date (the first row):
SELECT *
FROM (
SELECT T.*,
ROW_NUMBER() over (PARTITION BY TYPE_COLUMN ORDER BY DATE_EFFECTIVE_COLUMN desc) RN
FROM TABLE1 T
WHERE KEY_COLUMN = 'NJCRF'
AND TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3')
AND DATE_EFFECTIVE_COLUMN <= '2016-09-17'
) A
WHERE RN = 1
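A minimal sketch of the per-type pick, assuming hypothetical rows and ISO-formatted date strings, with Python's sqlite3 module (SQLite 3.25 or later) standing in for DB2:

```python
import sqlite3

# Hypothetical sample data; dates are ISO strings so they sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE1 (KEY_COLUMN TEXT, TYPE_COLUMN TEXT, DATE_EFFECTIVE_COLUMN TEXT)"
)
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?)",
    [
        ("NJCRF", "SCORE1", "2016-09-01"),
        ("NJCRF", "SCORE1", "2016-09-15"),
        ("NJCRF", "SCORE1", "2016-09-20"),  # after the cutoff, filtered out
        ("NJCRF", "SCORE2", "2016-08-30"),
        ("NJCRF", "SCORE2", "2016-09-17"),
        ("NJCRF", "SCORE3", "2016-07-04"),
        ("OTHER", "SCORE1", "2016-09-16"),  # different key, filtered out
    ],
)

query = """
SELECT TYPE_COLUMN, DATE_EFFECTIVE_COLUMN
FROM (SELECT T.*,
             ROW_NUMBER() OVER (PARTITION BY TYPE_COLUMN
                                ORDER BY DATE_EFFECTIVE_COLUMN DESC) AS RN
      FROM TABLE1 T
      WHERE KEY_COLUMN = 'NJCRF'
        AND TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3')
        AND DATE_EFFECTIVE_COLUMN <= '2016-09-17'
     ) A
WHERE RN = 1
ORDER BY TYPE_COLUMN
"""
closest = conn.execute(query).fetchall()
for row in closest:
    print(row)
```

Each type contributes exactly one row: the latest date on or before 2016-09-17.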

SQL Find the minimum date based on consecutive values

I'm having trouble constructing a query that can find consecutive values meeting a condition. Example data below, note that Date is sorted DESC and is grouped by ID.
To be selected, for each ID, the most recent RESULT must be 'Fail', and what I need back is the earliest date in that run of 'Fail' rows. For ID==1, only the first two values are of interest (the last doesn't count because of the prior 'Complete'). ID==2 doesn't count at all, since it fails the first condition, and for ID==3, only the first value matters.
A result table might be:
The trick seems to be doing some type of run-length encoding, but even with several attempts manipulating ROW_NUM and an attempt at the tabibitosan method for grouping consecutive values, I've been unable to gain traction.
Any help would be appreciated.

If your database supports window functions, you can do
select id, case when result='Fail' then earliest_fail_date end earliest_fail_date
from (
select t.*
,row_number() over(partition by id order by dt desc) rn
,min(case when result = 'Fail' then dt end) over(partition by id) earliest_fail_date
from tablename t
) x
where rn=1
Use row_number() to get the latest row for each id, and min() over() to get the earliest fail date for each id. If the latest row has status Fail, earliest_fail_date is selected; otherwise the result is null.
It should be noted that the expected result for id=1 is wrong. It should be 2016-09-20 as it is the earliest fail date.
Edit: Having re-read the question, I think this is what you might be looking for: the minimum Fail date from the latest consecutive group of Fail rows.
with grps as (
select t.*,row_number() over(partition by id order by dt desc) rn
,row_number() over(partition by id order by dt)-row_number() over(partition by id,result order by dt) grp
from tablename t
)
,maxfailgrp as (
select g.*,
max(case when result = 'Fail' then grp end) over(partition by id) maxgrp
from grps g
)
select id,
case when result = 'Fail' then (select min(dt) from maxfailgrp where id = m.id and result = 'Fail' and grp=m.maxgrp) end earliest_fail_date
from maxfailgrp m
where rn=1
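Since the sample data from the question is not shown here, the following sketch uses hypothetical rows that match its description, with Python's sqlite3 module (SQLite 3.25 or later) as a stand-in database. The grp number only identifies a run together with result, because a Fail run and a Complete run can end up with the same number, so the lookup below also filters on result = 'Fail'.

```python
import sqlite3

# Hypothetical rows following the question's description:
# id 1: two recent Fails, then Complete, then an older Fail
# id 2: most recent row is not Fail
# id 3: only the most recent row is Fail
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER, dt TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, "2016-09-25", "Fail"), (1, "2016-09-24", "Fail"),
        (1, "2016-09-23", "Complete"), (1, "2016-09-20", "Fail"),
        (2, "2016-09-25", "Complete"), (2, "2016-09-24", "Fail"),
        (3, "2016-09-25", "Fail"), (3, "2016-09-24", "Complete"),
    ],
)

query = """
WITH grps AS (
    SELECT t.*,
           row_number() OVER (PARTITION BY id ORDER BY dt DESC) AS rn,
           -- difference of row numbers is constant within a run of equal results
           row_number() OVER (PARTITION BY id ORDER BY dt)
             - row_number() OVER (PARTITION BY id, result ORDER BY dt) AS grp
    FROM tablename t
),
maxfailgrp AS (
    SELECT g.*,
           max(CASE WHEN result = 'Fail' THEN grp END)
               OVER (PARTITION BY id) AS maxgrp
    FROM grps g
)
SELECT m.id,
       CASE WHEN m.result = 'Fail'
            THEN (SELECT min(f.dt) FROM maxfailgrp f
                  WHERE f.id = m.id AND f.result = 'Fail' AND f.grp = m.maxgrp)
       END AS earliest_fail_date
FROM maxfailgrp m
WHERE m.rn = 1
ORDER BY m.id
"""
fail_starts = conn.execute(query).fetchall()
for row in fail_starts:
    print(row)
```

For id 1 the older Fail on 2016-09-20 is ignored because the Complete row breaks the run, so the run of recent Fails starts on 2016-09-24. Id 2 yields null, and id 3 yields its single most recent Fail date.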

Selecting type(s) of account with 2nd maximum number of accounts

Suppose we have an accounts table along with the already given values
I want to find the type of account with the second highest number of accounts. In this case, the result should be 'FD'. If there is a tie for the second highest count, I need all of those types in the result.
I have no idea how to do this. I've found numerous posts about finding the second highest value, say salary, in a table, but not the second highest COUNT.

This can be done using CTEs. Get the counts for each type as the first step. Then use dense_rank (so that types with the same count share a rank in case of ties) to rank the types by count. Finally, select the rows ranked second.
with counts as (
select type, count(*) cnt
from yourtable
group by type)
, ranks as (
select type, dense_rank() over(order by cnt desc) rnk
from counts)
select type
from ranks
where rnk = 2;
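A minimal sketch with a tie for second place, assuming hypothetical account types and using Python's sqlite3 module (SQLite 3.25 or later) as a stand-in database:

```python
import sqlite3

# Hypothetical sample data: Savings x3, FD x2, RD x2, Current x1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (type TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?)",
    [("Savings",)] * 3 + [("FD",)] * 2 + [("RD",)] * 2 + [("Current",)],
)

query = """
WITH counts AS (
    SELECT type, count(*) AS cnt
    FROM yourtable
    GROUP BY type
),
ranks AS (
    -- dense_rank gives tied counts the same rank and leaves no gaps
    SELECT type, dense_rank() OVER (ORDER BY cnt DESC) AS rnk
    FROM counts
)
SELECT type
FROM ranks
WHERE rnk = 2
ORDER BY type
"""
second = [r[0] for r in conn.execute(query)]
print(second)
```

Both FD and RD come back, because they tie on the second highest count. With row_number() in place of dense_rank(), only one of them would be returned.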

One option is to use row_number() (or dense_rank(), depending on what "second" means when there are ties):
select a.*
from (select a.type, count(*) as cnt,
row_number() over (order by count(*) desc) as seqnum
from accounts a
group by a.type
) a
where seqnum = 2;
In Oracle 12c+, you can use offset/fetch:
select a.type, count(*) as cnt
from accounts a
group by a.type
order by count(*) desc
offset 1 row
fetch next 1 row only
