select non-existing data as null using HIVE LATERAL VIEW

I am trying to return non-existent data as NULL using an outer explode in Hive, but my query is not returning anything.
Table: companyrank
year:string,topcompanies:array<struct<name:string,rank:string>>
Sample data:
2015,
"topcompanies":[
{"name":"apple","rank":"1"},
{"name":"samsung","rank":"2"},
{"name":"SONY","rank":"3"}
]
2016,
"topcompanies":[
{"name":"apple","rank":"1"},
{"name":"samsung","rank":"2"},
{"name":"SONY","rank":"3"},
{"name":"LG","rank":"4"}
]
query to get data
select year, rank1_v.name as rank1, rank2_v.name as rank2, rank3_v.name as rank3, rank4_v.name as rank4
FROM companyrank
LATERAL VIEW outer explode(topcompanies) rank1_t as rank1_v
LATERAL VIEW outer explode(topcompanies) rank2_t as rank2_v
LATERAL VIEW outer explode(topcompanies) rank3_t as rank3_v
LATERAL VIEW outer explode(topcompanies) rank4_t as rank4_v
WHERE
(rank1_v.rank = 1 or rank1_v.rank is null)
AND (rank2_v.rank = 2 or rank2_v.rank is null)
AND (rank3_v.rank = 3 or rank3_v.rank is null)
AND (rank4_v.rank = 4 or rank4_v.rank is null)
Expected output when rank 4 does not exist:
year rank1 rank2 rank3 rank4
2015 apple samsung SONY null
If rank 4 exists:
year rank1 rank2 rank3 rank4
2016 apple samsung SONY LG
I need to get all 4 ranks for each year. If any of the ranks does not exist, that rank should show NULL.

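The straightforward answer to your question is "use lateral view outer", but there is a much cleaner solution: flatten the array with inline and pivot it with conditional aggregation. A rank that is missing simply produces NULL, because min ignores the NULLs that the case expressions return for non-matching rows. (To get one row per year, add c.year to the select list and group by c.year.)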
select min (case when i.rank = 1 then i.name end) as rank1
,min (case when i.rank = 2 then i.name end) as rank2
,min (case when i.rank = 3 then i.name end) as rank3
,min (case when i.rank = 4 then i.name end) as rank4
from companyrank c
lateral view inline(topcompanies) i
;
+--------+----------+--------+--------+
| rank1 | rank2 | rank3 | rank4 |
+--------+----------+--------+--------+
| apple | samsung | SONY | NULL |
+--------+----------+--------+--------+
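
To see the per-year behaviour without a Hive cluster, here is a minimal sketch that runs the same conditional aggregation in SQLite through Python's sqlite3 module. It assumes the array has already been flattened to one row per (year, name, rank), which is what lateral view inline produces in Hive; the table name flat and the column name rnk are made up for this illustration.

```python
import sqlite3

# Assumption: one row per (year, name, rank), i.e. the shape that
# LATERAL VIEW inline(topcompanies) yields in Hive. Ranks stay strings,
# as in the original struct definition.
FLATTENED = [
    ("2015", "apple", "1"), ("2015", "samsung", "2"), ("2015", "SONY", "3"),
    ("2016", "apple", "1"), ("2016", "samsung", "2"), ("2016", "SONY", "3"),
    ("2016", "LG", "4"),
]


def ranks_per_year(rows):
    """Pivot ranks into columns, one output row per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flat (year TEXT, name TEXT, rnk TEXT)")
    conn.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT year,
               MIN(CASE WHEN rnk = '1' THEN name END) AS rank1,
               MIN(CASE WHEN rnk = '2' THEN name END) AS rank2,
               MIN(CASE WHEN rnk = '3' THEN name END) AS rank3,
               MIN(CASE WHEN rnk = '4' THEN name END) AS rank4
        FROM flat
        GROUP BY year
        ORDER BY year
    """).fetchall()
    conn.close()
    return result


for row in ranks_per_year(FLATTENED):
    print(row)
```

A year without a rank 4 entry comes back with None (NULL) in the last column, while every year still gets exactly one row.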

Related

Average and sort by this based on other conditional columns in a table

I have a table in SQL Server 2017 like below:
Name Rank1 Rank2 Rank3 Rank4
Jack null 1 1 3
Mark null 3 2 2
John null 2 3 1
What I need to do is to add an average rank column then rank those names based on those scores. We ignore null ranks. Expected output:
Name Rank1 Rank2 Rank3 Rank4 AvgRank FinalRank
Jack null 1 1 3 1.66 1
Mark null 3 2 2 2.33 3
John null 2 3 1 2 2
My query now looks like this:
;with cte as (
select *, AvgRank= (Rank1+Rank2+Rank3+Rank4)/#NumOfRankedBy
from mytable
)
select *, FinalRank= row_number() over (order by AvgRank)
from cte
I am stuck at finding the value of #NumOfRankedBy, which should be 3 in our case because Rank1 is null for all.
What is the best way to approach such an issue?
Thanks.

Your conundrum stems from the fact that your table is not normalised and you are treating data (Rank) as structure (columns).
You should have a table for ranks where each rank is a row; then your query is easy.
You can unpivot your columns into rows and then make use of avg, which ignores NULLs, so you never need to count the ranked columns yourself:
select *, FinalRank = row_number() over (order by AvgRank)
from mytable
cross apply (
select Avg(r * 1.0) AvgRank
from (values(rank1),(rank2),(rank3),(rank4))r(r)
)r;
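
The same idea can be tried outside SQL Server. The sketch below is an approximation in SQLite via Python's sqlite3 module, not the cross apply query itself: it unpivots with UNION ALL (SQLite has no cross apply) and relies on AVG skipping NULLs. It assumes names are unique and an SQLite build with window functions (3.25 or later).

```python
import sqlite3

RANKS = [
    ("Jack", None, 1, 1, 3),
    ("Mark", None, 3, 2, 2),
    ("John", None, 2, 3, 1),
]


def final_ranking(rows):
    """Return (name, avg_rank, final_rank) ordered by final_rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(name TEXT, rank1 INT, rank2 INT, rank3 INT, rank4 INT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute("""
        WITH unpivoted AS (      -- one row per (name, rank value)
            SELECT name, rank1 AS r FROM mytable
            UNION ALL SELECT name, rank2 FROM mytable
            UNION ALL SELECT name, rank3 FROM mytable
            UNION ALL SELECT name, rank4 FROM mytable
        ),
        averaged AS (            -- AVG skips NULL rows automatically
            SELECT name, AVG(r) AS avg_rank
            FROM unpivoted
            GROUP BY name
        )
        SELECT name, avg_rank,
               ROW_NUMBER() OVER (ORDER BY avg_rank) AS final_rank
        FROM averaged
        ORDER BY final_rank
    """).fetchall()
    conn.close()
    return result


for name, avg_rank, final_rank in final_ranking(RANKS):
    print(name, round(avg_rank, 2), final_rank)
```

Because Rank1 is NULL for every name, each average is taken over three values, which is the divisor the question was trying to compute by hand.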

query a table with multiple rows for same id, into single data row in results

I have a few tables like this where a person has multiple data rows. The IDs are sequential but do not always start at 1. Is there a way to have the results come out in a single data row for each person? I ultimately would like to join these tables via CLIENT_ID, but I'm a bit stumped. Is this possible?
I'm using Oracle SQL.
CLIENT_ID | NAME  | ID | ID_DESCRIPTION
---------------------------------------
5         | joe   | 1  | apple
5         | joe   | 5  | orange
68        | brian | 2  | orange
68        | brian | 6  | mango
68        | brian | 10 | lemon
12        | katie | 3  | watermelon
where the results look like this
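CLIENT_ID | NAME  | ID1 | ID1_DESCRIPTION | ID2 | ID2_DESCRIPTION | ID3 | ID3_DESCRIPTION
-----------------------------------------------------------------------------------------
5         | joe   | 1   | apple           | 5   | orange          |     |
68        | brian | 2   | orange          | 6   | mango           | 10  | lemon
12        | katie | 3   | watermelon      |     |                 |     |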

If PIVOT is not available, this should do it:
Select
Client_id,
sum(case when id_description='apple' then 1 else 0 end) as Apples,
sum(case when id_description='orange' then 1 else 0 end) as Oranges
-- ... one more column per description
from
t
group by Client_ID

This might need some minor tweaking as I wrote it off the top of my head, but something like it should work. Note that it doesn't account for more than 3 rows per CLIENT_ID. For that, you would need a dynamic pivot (there are plenty of online articles on this topic).
Pivoting Based on Order of Items
-- Number each client's rows by ID, then pivot the first three into columns
WITH cte_RowNum AS (
SELECT ROW_NUMBER() OVER (PARTITION BY CLIENT_ID ORDER BY ID) AS RowNum
,t.*
FROM YourTable t
)
SELECT CLIENT_ID
,NAME
,MAX(CASE WHEN RowNum = 1 THEN ID END) AS ID1
,MAX(CASE WHEN RowNum = 1 THEN ID_DESCRIPTION END) AS ID1_DESCRIPTION
,MAX(CASE WHEN RowNum = 2 THEN ID END) AS ID2
,MAX(CASE WHEN RowNum = 2 THEN ID_DESCRIPTION END) AS ID2_DESCRIPTION
,MAX(CASE WHEN RowNum = 3 THEN ID END) AS ID3
,MAX(CASE WHEN RowNum = 3 THEN ID_DESCRIPTION END) AS ID3_DESCRIPTION
FROM cte_RowNum
GROUP BY CLIENT_ID, NAME;
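
As a quick check of the row-number pivot against the sample data from the question, here is a minimal sketch in SQLite through Python's sqlite3 module rather than Oracle. The table name client_items is made up for the example, and window functions require SQLite 3.25 or later.

```python
import sqlite3

CLIENT_ROWS = [
    (5, "joe", 1, "apple"), (5, "joe", 5, "orange"),
    (68, "brian", 2, "orange"), (68, "brian", 6, "mango"),
    (68, "brian", 10, "lemon"), (12, "katie", 3, "watermelon"),
]


def pivot_by_position(rows):
    """One output row per client, with up to three (id, description) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client_items "
        "(client_id INT, name TEXT, id INT, id_description TEXT)"
    )
    conn.executemany("INSERT INTO client_items VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        WITH numbered AS (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY client_id
                                      ORDER BY id) AS rn
            FROM client_items t
        )
        SELECT client_id, name,
               MAX(CASE WHEN rn = 1 THEN id END)             AS id1,
               MAX(CASE WHEN rn = 1 THEN id_description END) AS id1_description,
               MAX(CASE WHEN rn = 2 THEN id END)             AS id2,
               MAX(CASE WHEN rn = 2 THEN id_description END) AS id2_description,
               MAX(CASE WHEN rn = 3 THEN id END)             AS id3,
               MAX(CASE WHEN rn = 3 THEN id_description END) AS id3_description
        FROM numbered
        GROUP BY client_id, name
        ORDER BY client_id
    """).fetchall()
    conn.close()
    return result


for row in pivot_by_position(CLIENT_ROWS):
    print(row)
```

Clients with fewer than three rows get None (NULL) in the unused slots; a fourth row for any client would be silently dropped, which is the limitation noted above.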

Pivot Different Prices for the Same Item

I have a need to show different prices for the same product where the client wants to understand discrepancies in same region. The source table looks like below
Item | Brand | Concept | Price
------------------------------
00A  | A     | Alpha   | 1
00B  | A     | Alpha   | 1
00B  | A     | Alpha   | 2
00B  | A     | Beta    | 3
00A  | B     | Alpha   | 1
00B  | B     | Alpha   | 1
00B  | B     | Beta    | 2
00B  | B     | Alpha   | 3
The output I am trying to achieve is a little complicated, but it can be simplified if we focus only on Brand A, so please assume I am pivoting for Brand A only. The result needed is:
Item | Alpha | Alpha | Beta
---------------------------
00A  | 1     | Null  | Null
00B  | 1     | 2     | 3

Are you looking for conditional aggregation?
select item, brand,
max(case when concept = 'Alpha' and seqnum = 1 then price end) as alpha_price_1,
max(case when concept = 'Alpha' and seqnum = 2 then price end) as alpha_price_2,
max(case when concept = 'Beta' and seqnum = 1 then price end) as beta_price_1
from (select t.*,
row_number() over (partition by item, brand, concept order by price) as seqnum
from t
) t
group by item, brand;
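
A minimal sketch of this query against the sample data, run in SQLite through Python's sqlite3 module. The brand filter is an addition for the example, since the question only asks about Brand A; the function name pivot_prices is hypothetical.

```python
import sqlite3

PRICES = [
    ("00A", "A", "Alpha", 1), ("00B", "A", "Alpha", 1),
    ("00B", "A", "Alpha", 2), ("00B", "A", "Beta", 3),
    ("00A", "B", "Alpha", 1), ("00B", "B", "Alpha", 1),
    ("00B", "B", "Beta", 2), ("00B", "B", "Alpha", 3),
]


def pivot_prices(rows, brand):
    """Return (item, alpha_price_1, alpha_price_2, beta_price_1) for a brand."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (item TEXT, brand TEXT, concept TEXT, price INT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT item,
               MAX(CASE WHEN concept = 'Alpha' AND seqnum = 1 THEN price END),
               MAX(CASE WHEN concept = 'Alpha' AND seqnum = 2 THEN price END),
               MAX(CASE WHEN concept = 'Beta'  AND seqnum = 1 THEN price END)
        FROM (SELECT t.*,
                     -- number the prices within each item/brand/concept
                     ROW_NUMBER() OVER (PARTITION BY item, brand, concept
                                        ORDER BY price) AS seqnum
              FROM t
              WHERE brand = ?) AS numbered
        GROUP BY item
        ORDER BY item
    """, (brand,)).fetchall()
    conn.close()
    return result


for row in pivot_prices(PRICES, "A"):
    print(row)
```

The number of output columns is fixed by the CASE expressions, so a concept with more distinct prices than columns would lose the extra values.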

Getting unique IDs without losing any data using SQL

Given a table sale where id is not unique:
id name item quantity
1 Darsh shoes 5
2 Liyah oil 1
2 Eiliyah watch 1
3 Zakaria notebook 2
3 Elliot shirt 3
4 Reese bag 1
I need to select one row per unique id without losing any data (for id in (2,3), the name, item and quantity of both rows should be displayed in the same row). There are at most 2 rows with the same id in the sale table.
I tried using row_number() to get some unique pattern(s).
From this query :
Select a.id,a.name,a.item,a.quantity,b.name as name2,b.item as item2,b.quantity as quantity2
,row_number() over(partition by a.id order by a.id) as f1
,row_number() over(partition by a.name order by a.id) as f2
from sale a inner join sale b on a.id = b.id
I got this
id name item quantity name2 item2 quantity2 f1 f2
1 Darsh shoes 5 Darsh shoes 5 1 1
2 Eiliyah watch 1 Liyah oil 1 2 1
2 Eiliyah watch 1 Eiliyah watch 1 4 2
3 Elliot shirt 3 Zakaria notebook 2 2 1
3 Elliot shirt 3 Elliot shirt 3 4 2
2 Liyah oil 1 Eiliyah watch 1 3 1
2 Liyah oil 1 Liyah oil 1 1 2
4 Reese bag 1 Reese bag 1 1 1
3 Zakaria notebook 2 Elliot shirt 3 3 1
3 Zakaria notebook 2 Zakaria notebook 2 1 2
Now here is the problem: if I filter on f1 and f2 and use IIF to remove the repeated data with this query:
Select id,name,item,quantity
,iif(name = name2,NULL,name2) as name2
,iif(item = item2,NULL,item2) as item2
,iif(quantity = quantity2,NULL,quantity2) as quantity2
from (
Select a.id,a.name,a.item,a.quantity,b.name as name2,b.item as item2,b.quantity as quantity2
,row_number() over(partition by a.id order by a.id) as f1
,row_number() over(partition by a.name order by a.id) as f2
from sale a inner join sale b on a.id = b.id
)t
where (f1=1 and f2=1) or(f1=3 and f2=1)
order by id
then quantity2 is NULL in the 2nd row, as shown below.
id name item quantity name2 item2 quantity2
1 Darsh shoes 5 NULL NULL NULL
2 Liyah oil 1 Eiliyah watch NULL
3 Zakaria notebook 2 Elliot shirt 3
4 Reese bag 1 NULL NULL NULL
The cause is that different names and items can have the same quantity, so the IIF comparison wrongly blanks quantity2.
Expected result:
id name item quantity name2 item2 quantity2
1 Darsh shoes 5 NULL NULL NULL
2 Liyah oil 1 Eiliyah watch 1
3 Zakaria notebook 2 Elliot shirt 3
4 Reese bag 1 NULL NULL NULL
Please help me.
Thanks!

One method is conditional aggregation, if you know that there are at most two duplicates per id:
select id,
max(case when seqnum = 1 then name end) as name_1,
max(case when seqnum = 1 then item end) as item_1,
max(case when seqnum = 1 then quantity end) as quantity_1,
max(case when seqnum = 2 then name end) as name_2,
max(case when seqnum = 2 then item end) as item_2,
max(case when seqnum = 2 then quantity end) as quantity_2
from (select s.*,
row_number() over (partition by id order by id) as seqnum
from sale s
) s
group by id;

As per your expected result, you can create a temp (or intermediate) table. Since there are at most two rows with the same id, this can be your answer:
select *,row_number() over (partition by id order by id) as u_id into #test from sale
select * from (select * from #test where u_id=1) a
left join (select * from #test where u_id=2)b
on a.id = b.id
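
The number-then-self-join idea can be sketched without a temp table by using a CTE. The example below runs in SQLite through Python's sqlite3 module, not SQL Server. It orders rows within an id by SQLite's implicit rowid (insertion order), which is an assumption made only to get a deterministic "first" and "second" row; in a real table you would order by a meaningful column.

```python
import sqlite3

SALES = [
    (1, "Darsh", "shoes", 5), (2, "Liyah", "oil", 1),
    (2, "Eiliyah", "watch", 1), (3, "Zakaria", "notebook", 2),
    (3, "Elliot", "shirt", 3), (4, "Reese", "bag", 1),
]


def one_row_per_id(rows):
    """Put the first and (optional) second row of each id side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sale (id INT, name TEXT, item TEXT, quantity INT)"
    )
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        WITH numbered AS (
            SELECT id, name, item, quantity,
                   -- assumption: rowid reflects insertion order
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY rowid) AS rn
            FROM sale
        )
        SELECT a.id, a.name, a.item, a.quantity,
               b.name, b.item, b.quantity
        FROM numbered a
        LEFT JOIN numbered b ON b.id = a.id AND b.rn = 2
        WHERE a.rn = 1
        ORDER BY a.id
    """).fetchall()
    conn.close()
    return result


for row in one_row_per_id(SALES):
    print(row)
```

Since the second row is matched by position rather than by comparing values, two rows with equal quantities no longer cancel each other out.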

logic in HAVING clause to get multiple values of a group by result

Imagine I have a table with data as below:
ROLE_ID | USER_ID | CODE
---------------------------------
14 | USER A | 001
15 | USER A | 002
11 | USER B | 004
13 | USER D | 005
13 | USER A | 001
15 | USER B | 009
15 | USER D | 005
12 | USER C | 004
15 | USER C | 008
13 | USER D | 007
15 | USER D | 007
I want to get the user ids and codes that have only role_ids 13 and 15. Based on the data above, I would like to get back the following:
USER D | 005
USER D | 007
I have the query below, however, it only brings back one, not both.
SELECT a.user_id, a.code
FROM my_table a
WHERE a.ROLE_ID in (13,15,11,14)
group by a.USER_ID, a.code
having sum( case when a.role_id in (13,15) then 1 else 0 end) = 2
and sum( case when a.role_id in (11,14) then 1 else 0 end) = 0
ORDER BY USER_ID
The above query only brings
USER D | 005
rather than
USER D | 005
USER D | 007

Sometimes just listening to your own words in English translates into the easiest to read SQL:
SELECT DISTINCT a.user_id, a.code
FROM my_table a
WHERE a.user_id in
(SELECT b.user_id
FROM my_table b
WHERE b.ROLE_ID = 13)
AND a.user_id in
(SELECT b.user_id
FROM my_table b
WHERE b.ROLE_ID = 15)
AND a.user_id NOT IN
(SELECT b.user_id
FROM my_table b
WHERE b.ROLE_ID NOT IN (13,15))

I would use:
SELECT a.user_id, a.code
FROM my_table a
GROUP BY a.user_id, a.code
HAVING sum(case when a.role_id in (13, 15) then 1 else 3 end) = 2
:)

As proven by EthanB, your query is working exactly as you desire. There must be something in your project data that is not represented in your question's sample data.
I do endorse a pivot as you have executed in your question, but I would write it as a single SUM expression to reduce the number of iterations over the aggregate data. I certainly do not endorse multiple subqueries on each row of the table, regardless of whether the optimizer converts the subqueries to multiple JOINs.
Your pivot conditions:
having sum( case when a.role_id in (13,15) then 1 else 0 end) = 2
and sum( case when a.role_id in (11,14) then 1 else 0 end) = 0
My recommendation:
As the aggregate data is being iterated, you can keep a tally (+1) of qualifying rows and jump to a disqualifying outcome (+3) after each evaluation. This way, there is only one pass over the aggregate instead of two.
SELECT USER_ID, CODE
FROM my_table
WHERE ROLE_ID IN (13,15,11,14)
GROUP BY USER_ID, CODE
HAVING SUM(CASE WHEN ROLE_ID IN (13,15) THEN 1
WHEN ROLE_ID IN (11,14) THEN 3 END) = 2
Another way of expressing what these HAVING clauses are doing is:
Require that the first CASE is satisfied twice and that the second CASE is never satisfied.
Alternatively, the above HAVING clause could be less elegantly written as:
HAVING SUM(CASE ROLE_ID
WHEN 13 THEN 1
WHEN 15 THEN 1
WHEN 11 THEN 3
WHEN 14 THEN 3
END) = 2
Disclaimer #1: I don't swim in the [oracle] tag pool, so I've not investigated how to execute this with PIVOT.
Disclaimer #2: My advice above assumes that ROLE_IDs are unique within each grouped USER_ID+CODE aggregate. Fringe cases:
If a given group contains ROLE_ID = 13, ROLE_ID = 13, and ROLE_ID = 15, the SUM will be at least 3 and the group will be disqualified.
If a given group contains only ROLE_ID = 15 and ROLE_ID = 15, the SUM will be 2 and the group will be unintentionally qualified.
To combat scenarios like these, make three separate MAX conditions.
HAVING MAX(CASE WHEN ROLE_ID = 13 THEN 1 END) = 1
AND MAX(CASE WHEN ROLE_ID = 15 THEN 1 END) = 1
AND MAX(CASE WHEN ROLE_ID IN (11,14) THEN 1 END) IS NULL
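
To compare the single-SUM tally with the three MAX conditions, here is a minimal sketch in SQLite through Python's sqlite3 module using the question's sample rows. The extra "USER E" rows are invented for the example to reproduce the duplicate-role fringe case.

```python
import sqlite3

ROWS = [
    (14, "USER A", "001"), (15, "USER A", "002"), (11, "USER B", "004"),
    (13, "USER D", "005"), (13, "USER A", "001"), (15, "USER B", "009"),
    (15, "USER D", "005"), (12, "USER C", "004"), (15, "USER C", "008"),
    (13, "USER D", "007"), (15, "USER D", "007"),
]
# Hypothetical fringe case: the same role listed twice in one group.
DUPLICATES = [(15, "USER E", "010"), (15, "USER E", "010")]

SUM_HAVING = """SUM(CASE WHEN role_id IN (13, 15) THEN 1
                         WHEN role_id IN (11, 14) THEN 3 END) = 2"""
MAX_HAVING = """MAX(CASE WHEN role_id = 13 THEN 1 END) = 1
            AND MAX(CASE WHEN role_id = 15 THEN 1 END) = 1
            AND MAX(CASE WHEN role_id IN (11, 14) THEN 1 END) IS NULL"""


def qualifying_groups(rows, having):
    """Return the (user_id, code) groups that pass the given HAVING clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (role_id INT, user_id TEXT, code TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # The HAVING text comes from the trusted constants above, not user input.
    result = conn.execute(f"""
        SELECT user_id, code
        FROM my_table
        WHERE role_id IN (13, 15, 11, 14)
        GROUP BY user_id, code
        HAVING {having}
        ORDER BY user_id, code
    """).fetchall()
    conn.close()
    return result


print(qualifying_groups(ROWS, SUM_HAVING))
print(qualifying_groups(ROWS + DUPLICATES, SUM_HAVING))
print(qualifying_groups(ROWS + DUPLICATES, MAX_HAVING))
```

On the original rows both clauses return the two USER D groups; once the duplicated role 15 rows are added, only the SUM version lets the extra group through.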

SELECT user_id, code FROM my_table
WHERE role_id = 13
INTERSECT
SELECT user_id, code FROM my_table
WHERE role_id = 15
