I'm trying to implement a dynamic limit in Hive SQL using the rank function.
PROBLEM:
I want to apply the per-ID limit from table A to the rows of table B to create the output. Example below:
TABLE A:
ID | Limit
------------
123 | 1
456 | 3
789 | 2
TABLE B:
ID | User
-------
123 | ABC
123 | DEF
123 | GHI
456 | JKL
456 | MNO
789 | PQR
789 | RST
OUTPUT:
ID | User
----------
123 | ABC
456 | JKL
456 | MNO
789 | PQR
789 | RST
Unfortunately, as far as I know, you cannot use a dynamic LIMIT in Hive SQL, so I tried to use rank instead. My current query looks like this:
SELECT c.id, c.users, c.rnk
FROM (
SELECT b.id, b.user, a.limit, rank() over (ORDER BY b.id DESC) as rnk
FROM a JOIN b
ON a.id = b.id
) c
WHERE rnk < c.limit;
Currently I get the error:
ParseException line 3:9 cannot recognize input near 'rank' '(' ')' in from source 0
Any ideas why? Or maybe a better approach?
Thanks!
SELECT c.id, c.`user`, c.rn
FROM (
SELECT b.id, b.`user`, a.`limit`, row_number() over (PARTITION BY b.id ORDER BY b.`user`) as rn
FROM a JOIN b
ON a.id = b.id
) c
WHERE c.rn <= c.`limit`;
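In the query above, row_number() numbers the rows within each id after the join, and the filter in the WHERE clause works as the per-id limit. Unlike rank(), row_number() never produces ties, so at most that many rows are kept per id. ORDER BY is not necessary if you simply want to limit rows without any preference; otherwise, replace it with your own rule, for example ORDER BY user. Note that user and limit are reserved words in Hive, so quote them with backticks.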
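To check the row_number() approach end to end, here is a minimal sketch that runs the same query in SQLite through Python's built-in `sqlite3` module rather than Hive. It assumes SQLite 3.25 or later (for window functions), and the column names `lim` and `usr` are hypothetical stand-ins for the reserved words `limit` and `user`.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumes SQLite 3.25+ for window functions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, lim INTEGER);   -- lim: hypothetical name for the per-id limit
CREATE TABLE b (id INTEGER, usr TEXT);      -- usr: hypothetical name for the user column
INSERT INTO a VALUES (123, 1), (456, 3), (789, 2);
INSERT INTO b VALUES (123, 'ABC'), (123, 'DEF'), (123, 'GHI'),
                     (456, 'JKL'), (456, 'MNO'),
                     (789, 'PQR'), (789, 'RST');
""")

# Number the rows inside each id, then keep only as many as that id's limit allows.
rows = conn.execute("""
SELECT c.id, c.usr
FROM (
    SELECT b.id, b.usr, a.lim,
           ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.usr) AS rn
    FROM a JOIN b ON a.id = b.id
) c
WHERE c.rn <= c.lim
ORDER BY c.id, c.usr
""").fetchall()

for row in rows:
    print(row)
```

Because id 456 has a limit of 3 but only two rows in table B, both of its rows are returned, matching the expected output.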
We have a table like the one below, with multiple rows that have the same Name but different Ops:
Deal | Name | Ops
-----------------
ABC | A | NULL
ABC | A | NULL
ABC | B | NULL
ABC | B | NULL
ABC | B | Default
ABC | B | Default
ABC | C | NULL
ABC | C | NULL
ABC | C | Default
ABC | C | Default
ABC | C | Aggr
ABC | C | Aggr
For each Name, we need the Default rows when both NULL and Default are present, and the Aggr rows when NULL, Default and Aggr are all present. When only NULL is present, the NULL rows are kept.
Expected output:
Deal | Name | Ops
-----------------
ABC | A | NULL
ABC | A | NULL
ABC | B | Default
ABC | B | Default
ABC | C | Aggr
ABC | C | Aggr
We need to get this using a SQL query.
This is what I have tried:
WITH PriorityRanking AS
(
SELECT
DENSE_RANK() OVER (PARTITION BY Deal, Name
ORDER BY
CASE COALESCE(Ops,NULL)
WHEN 'Aggr' THEN 1
WHEN 'Default' THEN 2
WHEN NULL THEN 3
END) AS Rnk,
*
FROM
Table_name
)
SELECT Deal, Name, Ops
FROM
(SELECT Deal, Name, Ops,
FROM PriorityRanking
WHERE Rnk = 1)
But the NULL rows also get rank 1, so the final select does not return the right rows.
What is the best way to get the required data?
I think you just need to use row_number and coalesce your NULL values so they sort below the other values:
with r as (
select *, Row_Number() over(partition by deal, name order by IsNull(Ops,'x') ) rn
from t
)
select Deal, Name, Ops
from r
where rn = 1;
Depending on your real data, you could also simply aggregate:
select Deal, Name, min(Ops) Ops
from t
group by Deal, Name;
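The two queries above can be compared on the sample data with a small SQLite sketch run from Python. It assumes SQLite 3.25+ for ROW_NUMBER, and it swaps SQL Server's `IsNull` for the standard `COALESCE`, which behaves the same here. Both variants return one row per Name, which matches the original version of the question rather than the edited expected output that keeps duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Deal TEXT, Name TEXT, Ops TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("ABC", "A", None), ("ABC", "A", None),
     ("ABC", "B", None), ("ABC", "B", None),
     ("ABC", "B", "Default"), ("ABC", "B", "Default"),
     ("ABC", "C", None), ("ABC", "C", None),
     ("ABC", "C", "Default"), ("ABC", "C", "Default"),
     ("ABC", "C", "Aggr"), ("ABC", "C", "Aggr")],
)

# ROW_NUMBER variant: NULL becomes 'x', which sorts after 'Aggr' and 'Default'.
by_row_number = conn.execute("""
WITH r AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Deal, Name
                                 ORDER BY COALESCE(Ops, 'x')) AS rn
    FROM t
)
SELECT Deal, Name, Ops FROM r WHERE rn = 1 ORDER BY Name
""").fetchall()

# Aggregate variant: MIN ignores NULLs, and 'Aggr' < 'Default' alphabetically.
by_min = conn.execute("""
SELECT Deal, Name, MIN(Ops) AS Ops FROM t GROUP BY Deal, Name ORDER BY Name
""").fetchall()

print(by_row_number)
print(by_min)
```

The aggregate only works because the desired priority happens to match alphabetical order; with other labels, the explicit ordering in ROW_NUMBER is the safer choice.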
The logic below should work:
WITH cte1 as
(
Select deal,name,isnull(Ops,'Z') as Col from Table_name
),
cte2 as
(
Select *,row_number() over(Partition by Deal,name order by col) as rnbr from cte1
)
Select deal,name,case when col='Z' then NULL else Col end as Ops from cte2 where rnbr=1
You can replace NULL with a text value and sort on that.
After your change to the data, the basic concept still holds.
But now you need DENSE_RANK, as you had it in your original question. A simple CASE expression can never match NULL (WHEN NULL never evaluates to true), so replacing NULL with a text value first makes it work:
WITH PriorityRanking AS
(
SELECT DENSE_RANK() OVER (PARTITION BY Name ORDER BY CASE ISNULL(Ops,'NULL')
WHEN 'Aggr' THEN 1
WHEN 'Default' THEN 2
WHEN 'NULL' THEN 3
END) AS Rnk,*
FROM Table_name
)
SELECT [Deal], [Name], [Ops] FROM PriorityRanking WHERE Rnk = 1
Deal | Name | Ops
:--- | :--- | :------
ABC | A | null
ABC | A | null
ABC | B | Default
ABC | B | Default
ABC | C | Aggr
ABC | C | Aggr
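A sketch of the DENSE_RANK priority query above, run in SQLite from Python so the tie behaviour can be checked: rows that share the top priority all get rank 1, so the duplicate rows survive. It assumes SQLite 3.25+ for window functions, uses `COALESCE` in place of SQL Server's `ISNULL`, and partitions by `Deal, Name` rather than `Name` alone, which gives the same result on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_name (Deal TEXT, Name TEXT, Ops TEXT)")
data = (
    [("A", None)] * 2
    + [("B", None)] * 2 + [("B", "Default")] * 2
    + [("C", None)] * 2 + [("C", "Default")] * 2 + [("C", "Aggr")] * 2
)
conn.executemany("INSERT INTO Table_name VALUES ('ABC', ?, ?)", data)

# Map each Ops value to a priority. NULL is replaced by the text 'NULL' first,
# because a simple CASE never matches WHEN NULL.
rows = conn.execute("""
WITH PriorityRanking AS (
    SELECT DENSE_RANK() OVER (
               PARTITION BY Deal, Name
               ORDER BY CASE COALESCE(Ops, 'NULL')
                            WHEN 'Aggr' THEN 1
                            WHEN 'Default' THEN 2
                            WHEN 'NULL' THEN 3
                        END) AS Rnk,
           *
    FROM Table_name
)
SELECT Deal, Name, Ops FROM PriorityRanking WHERE Rnk = 1 ORDER BY Name
""").fetchall()

for row in rows:
    print(row)
```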
If you only need one row per Name, ROW_NUMBER works the same way:
WITH PriorityRanking AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY CASE ISNULL(Ops,'NULL')
WHEN 'Aggr' THEN 1
WHEN 'Default' THEN 2
WHEN 'NULL' THEN 3
END) AS Rnk,*
FROM Table_name
)
SELECT Deal, Name, Ops FROM PriorityRanking where Rnk = 1
Deal | Name | Ops
:--- | :--- | :------
ABC | A | null
ABC | B | Default
ABC | C | Aggr
How would I go about dropping duplicate rows while keeping one row per item, chosen by its status?
For example, consider this sample table:
Item | location | Status
------------------------
123 | A | done
123 | A | not_done
123 | B | Other
435 | D | Other
What I want to end up with is this table:
Item | location | Status
------------------------
123 | A | done
435 | D | Other
If an item has a row with status "done", I am not interested in its other statuses or locations. If it has no "done" row, I would show the next one.
Is it possible to do something like this in an SQL query?
Identify the rows with status done, if any, and prioritize them:
select * except (rn)
from (
select item, location, status
, row_number() over (
partition by item
order by case status when 'done' then 0 else 1 end
) as rn
from t
)
where rn = 1
(I haven't tested it, so please excuse any syntax errors.)
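A runnable sketch of the ROW_NUMBER prioritisation, using SQLite through Python's `sqlite3` module (SQLite 3.25+ assumed). SQLite has no `SELECT * EXCEPT`, so the outer query lists the columns explicitly; the table name `t` is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item INTEGER, location TEXT, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (123, "A", "done"), (123, "A", "not_done"),
    (123, "B", "Other"), (435, "D", "Other"),
])

# 'done' rows get priority 0 and everything else 1; keep the first row per item.
rows = conn.execute("""
SELECT item, location, status
FROM (
    SELECT item, location, status,
           ROW_NUMBER() OVER (
               PARTITION BY item
               ORDER BY CASE status WHEN 'done' THEN 0 ELSE 1 END
           ) AS rn
    FROM t
)
WHERE rn = 1
ORDER BY item
""").fetchall()
print(rows)
```

When an item has several non-done rows and no done row, which one is kept is arbitrary; add a second ORDER BY key if that matters.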
Yes, you can do it with a NOT EXISTS condition like the one below (after enabling standard SQL first). It keeps a row if its status is done, or if its item has no done row at all:
select * from
yourtable A
where A.status = 'done'
or not exists
(
select 1 from yourtable B
where B.item = A.item
and B.status = 'done'
)
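A quick check of the NOT EXISTS idea in SQLite via Python, in the form "keep a row if it is done, or if no row of the same item is done". The table name `t` is a placeholder; no window functions are needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item INTEGER, location TEXT, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (123, "A", "done"), (123, "A", "not_done"),
    (123, "B", "Other"), (435, "D", "Other"),
])

# Keep done rows, plus every row of items that have no done row at all.
rows = conn.execute("""
SELECT *
FROM t AS a
WHERE a.status = 'done'
   OR NOT EXISTS (
       SELECT 1 FROM t AS b
       WHERE b.item = a.item AND b.status = 'done'
   )
ORDER BY item
""").fetchall()
print(rows)
```

Unlike the ROW_NUMBER version, this keeps all rows of an item that has no done row, so it only matches the expected output when such items have a single row, as in the sample.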
I have a table with these 6 rows:
code | type_id | status
-----+--------+--------
123 | 123456 | DONE
123 | 456789 | DONE
321 | 654321 | DONE
321 | 897321 | DONE
456 | 999888 | DONE
456 | 777666 | FAIL
I want to transform it into the following, using only the DONE rows:
code | type_id1 | type_id2
-----+----------+---------
123 | 123456 | 456789
321 | 654321 | 897321
456 | 999888 | null
How can I join them to show the result?
You can use aggregation:
select
code,
min(type_id) type_id1,
case when count(*) > 1 then max(type_id) end type_id2
from mytable
where status = 'DONE'
group by code
Note that this only works as expected if a code has 1 or 2 types.
If I understand correctly that you want one row per code, you can use aggregation:
select code,
min(type_id) as type_id1,
(case when min(type_id) <> max(type_id) then max(type_id) end) as type_id2
from t
where status = 'DONE'
group by code;
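The MIN/MAX aggregation can be verified on the sample data with SQLite from Python; the table name `t` is a placeholder. Filtering on DONE before grouping is what keeps the FAIL row for code 456 out of both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, type_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (123, 123456, "DONE"), (123, 456789, "DONE"),
    (321, 654321, "DONE"), (321, 897321, "DONE"),
    (456, 999888, "DONE"), (456, 777666, "FAIL"),
])

# Filter to DONE first, then collapse each code's (at most two) type_ids.
rows = conn.execute("""
SELECT code,
       MIN(type_id) AS type_id1,
       CASE WHEN MIN(type_id) <> MAX(type_id) THEN MAX(type_id) END AS type_id2
FROM t
WHERE status = 'DONE'
GROUP BY code
ORDER BY code
""").fetchall()
print(rows)
```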
Note that SQL tables represent unordered sets. With your sample data, there is no way to preserve "the original" order of the values, because that order is undefined unless another column specifies it.
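You can use a LEFT JOIN, for example:
SELECT
A.code,
A.type_id AS type_id1,
B.type_id AS type_id2
FROM mytable A
LEFT JOIN
mytable B ON A.code = B.code AND A.type_id <> B.type_id AND A.status = B.status
WHERE A.status = 'DONE'
Note that this returns each pair twice, once in each order (for example 123456/456789 and 456789/123456).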
You can use a CTE with row_number():
with cte as (select code, type_id, status, row_number() over (partition by code order by type_id) as rn from mytable where status = 'DONE')
select * from cte where rn = 1
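The row_number() idea can be extended to produce both columns with conditional aggregation: number the DONE rows per code, then pick rn = 1 and rn = 2 into separate columns. This is a sketch in SQLite via Python (SQLite 3.25+ assumed for window functions); the table name `t` is a placeholder, and ordering by `type_id` is an assumption, since the question defines no other order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, type_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (123, 123456, "DONE"), (123, 456789, "DONE"),
    (321, 654321, "DONE"), (321, 897321, "DONE"),
    (456, 999888, "DONE"), (456, 777666, "FAIL"),
])

rows = conn.execute("""
WITH numbered AS (
    -- number only the DONE rows, so FAIL rows never take a slot
    SELECT code, type_id,
           ROW_NUMBER() OVER (PARTITION BY code ORDER BY type_id) AS rn
    FROM t
    WHERE status = 'DONE'
)
SELECT code,
       MAX(CASE WHEN rn = 1 THEN type_id END) AS type_id1,
       MAX(CASE WHEN rn = 2 THEN type_id END) AS type_id2
FROM numbered
GROUP BY code
ORDER BY code
""").fetchall()
print(rows)
```

Adding a `rn = 3` column (and so on) extends this to codes with more than two types, which the MIN/MAX version cannot do.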
How can we create all combinations of any length from the values in one column and return, for each combination, the distinct count of another column?
Table:
+------+--------+
| Type | Name |
+------+--------+
| A | Tom |
| A | Ben |
| B | Ben |
| B | Justin |
| C | Ben |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A | 2 |
| B | 2 |
| C | 1 |
| AB | 3 |
| BC | 2 |
| AC | 2 |
| ABC | 3 |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, there are 2 distinct names (Ben and Justin), so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code:
with recursive td as (
select distinct type
from t
),
cte as (
select td.type::text as type, td.type::text as lasttype, 1 as len
from td
union all
select cte.type || td.type, td.type::text as lasttype, cte.len + 1
from cte join
td
on td.type > cte.lasttype
)
You can then use this in a join:
with recursive td as (
select distinct type
from t
),
cte as (
select td.type::text as type, td.type::text as lasttype, 1 as len
from td
union all
select cte.type || td.type, td.type::text as lasttype, cte.len + 1
from cte join
td
on td.type > cte.lasttype
)
select type, count(*)
from (select t.name, cte.type
from cte join
t
on cte.type like '%' || t.type || '%'
group by t.name, cte.type
) x
group by type
order by type;
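The recursive-CTE approach can be tried outside Postgres with SQLite from Python, which also supports `WITH RECURSIVE`. This sketch assumes every type is a single character (so `instr` can test membership in a combination string) and uses `COUNT(DISTINCT ...)` directly instead of the grouped subquery; the CTE name `combos` is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("A", "Tom"), ("A", "Ben"), ("B", "Ben"), ("B", "Justin"), ("C", "Ben"),
])

rows = conn.execute("""
WITH RECURSIVE td AS (
    SELECT DISTINCT type FROM t
),
combos AS (
    -- every single type is a combination of length 1
    SELECT type AS combo, type AS lasttype FROM td
    UNION ALL
    -- extend only with larger types, so each set is built once, in sorted order
    SELECT combos.combo || td.type, td.type
    FROM combos JOIN td ON td.type > combos.lasttype
)
SELECT combos.combo, COUNT(DISTINCT t.name) AS cnt
FROM combos
JOIN t ON instr(combos.combo, t.type) > 0   -- t.type is one of the combo's types
GROUP BY combos.combo
ORDER BY length(combos.combo), combos.combo
""").fetchall()
print(rows)
```

Counting distinct names across the joined types gives the union of names per combination, which is what the expected output asks for (AB has Tom, Ben and Justin).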
There is no way to generate all possible combinations (A, B, C, AB, AC, BC, etc.) in Amazon Redshift.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations; you'd need to write a little program to do that (e.g. using itertools in Python). Then you could join the data against that table reasonably easily to get the desired result (e.g. with a condition such as 'ABC' LIKE '%A%').
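The "little program" can be sketched with `itertools.combinations`. This minimal version builds every combination of the distinct types and, so that the result can be checked against the expected output, also computes each combination's distinct-name count in Python. Loading the combinations into a Redshift table is not shown.

```python
from itertools import combinations

# Sample rows from the question: (type, name).
rows = [("A", "Tom"), ("A", "Ben"), ("B", "Ben"), ("B", "Justin"), ("C", "Ben")]

# Group the names by type.
names_by_type = {}
for type_, name in rows:
    names_by_type.setdefault(type_, set()).add(name)

types = sorted(names_by_type)
counts = {}
for k in range(1, len(types) + 1):
    for combo in combinations(types, k):
        # Union of the names across every type in the combination.
        names = set().union(*(names_by_type[t] for t in combo))
        counts["".join(combo)] = len(names)

for combo, count in counts.items():
    print(combo, count)
```

With n distinct types there are 2^n - 1 combinations, so this only stays practical for a small number of types.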
I have a table with descriptions of something. For example:
My_Table
id description
================
1 ABC
2 ABB
3 OPAC
4 APEЧ
I need to get all unique symbols from all values in the "description" column.
The result should look like this:
symbol
================
A
B
C
O
P
E
Ч
And it should work for all languages, so, as far as I can see, regular expressions can't help.
Please help me. Thanks.
with cte (c,description_suffix) as
(
select substr(description,1,1)
,substr(description,2)
from mytable
where description is not null
union all
select substr(description_suffix,1,1)
,substr(description_suffix,2)
from cte
where description_suffix is not null
)
select c
,count(*) as cnt
from cte
group by c
order by c
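The first recursive query (peel off one character per step) can be reproduced in SQLite from Python. One assumption differs from Oracle: in SQLite an exhausted string is `''`, not NULL, so the recursion stops on `suffix <> ''` instead of `IS NOT NULL`. SQLite's `substr` counts characters on text values, so the Cyrillic `Ч` comes out as a single symbol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, description TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "ABC"), (2, "ABB"), (3, "OPAC"), (4, "APEЧ")])

rows = conn.execute("""
WITH RECURSIVE cte(c, suffix) AS (
    -- first character of each description, plus the rest of the string
    SELECT substr(description, 1, 1), substr(description, 2)
    FROM mytable
    WHERE description IS NOT NULL AND description <> ''
    UNION ALL
    -- SQLite returns '' (not NULL) once the string is used up, so stop on ''
    SELECT substr(suffix, 1, 1), substr(suffix, 2)
    FROM cte
    WHERE suffix <> ''
)
SELECT c, COUNT(*) AS cnt FROM cte GROUP BY c ORDER BY c
""").fetchall()

for row in rows:
    print(row)
```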
or
with cte(n) as
(
select level
from dual
connect by level <= (select max(length(description)) from mytable)
)
select substr(t.description,c.n,1) as c
,count(*) as cnt
from mytable t
join cte c
on c.n <= length(description)
group by substr(t.description,c.n,1)
order by c
+---+-----+
| C | CNT |
+---+-----+
| A | 4 |
| B | 3 |
| C | 2 |
| E | 1 |
| O | 1 |
| P | 2 |
| Ч | 1 |
+---+-----+
Create a numbers table and populate it with all the ids you need (in this case, 1 up to the maximum string length):
SELECT DISTINCT
SUBSTRING(your_table.description, numbers.id, 1) AS symbol
FROM
your_table
INNER JOIN
numbers
ON numbers.id >= 1
AND numbers.id <= CHAR_LENGTH(your_table.description)
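A sketch of the numbers-table approach in SQLite from Python. SQLite's `substr` and `length` stand in for MySQL's `SUBSTRING` and `CHAR_LENGTH`; both count characters rather than bytes on text values. The table names `mytable` and `numbers` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, description TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "ABC"), (2, "ABB"), (3, "OPAC"), (4, "APEЧ")])

# Populate the numbers table with 1..(longest description length).
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY)")
max_len = conn.execute("SELECT MAX(length(description)) FROM mytable").fetchone()[0]
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, max_len + 1)])

# Each (row, position) pair yields one character; DISTINCT removes repeats.
symbols = [row[0] for row in conn.execute("""
SELECT DISTINCT substr(m.description, n.id, 1) AS symbol
FROM mytable AS m
JOIN numbers AS n
  ON n.id BETWEEN 1 AND length(m.description)
ORDER BY symbol
""")]
print(symbols)
```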
SELECT DISTINCT SUBSTR(ll, LEVEL, 1) OP -- DISTINCT SUBSTR(ll, LEVEL, 1) returns each distinct character as its own row
FROM
(
SELECT LISTAGG(description, '')
WITHIN GROUP (ORDER BY ID) ll
FROM My_Table -- LISTAGG concatenates all description values into a single string with no separator
)
CONNECT BY LEVEL <= LENGTH(ll);