Hive/Impala query - SQL

Input:
Key  id  ind1  ind2
1    A   Y     N
1    B   N     N
1    C   Y     Y
2    A   N     N
2    B   Y     N

Output:
Key  ind1  ind2
1    Y     Y
2    Y     N
So basically, whenever any of the ind1..n columns is 'Y' for the same Key (across different ids), the output should be 'Y'; otherwise 'N'.
That is why for Key 1 both indicators are 'Y',
and for Key 2 the indicators are 'Y' and 'N'.

You can use max() for this; it works because the string 'Y' sorts after 'N'. Note that the grouping should be on Key, since the output has one row per Key:
select key, max(ind1), max(ind2)
from t
group by key;

(If key is a reserved word in your Hive version, quote it with backticks.)
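As a quick sanity check, the same aggregation can be reproduced in pandas (a sketch only, with column names taken from the sample table, not Hive/Impala code):

```python
import pandas as pd

# Sample data from the question
t = pd.DataFrame({
    "Key":  [1, 1, 1, 2, 2],
    "id":   ["A", "B", "C", "A", "B"],
    "ind1": ["Y", "N", "Y", "N", "Y"],
    "ind2": ["N", "N", "Y", "N", "N"],
})

# max() works on the strings because 'Y' sorts after 'N'
out = t.groupby("Key", as_index=False)[["ind1", "ind2"]].max()
print(out)
```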

Related

Pandas - find value in column based on values from another column and place it in a different column

I have a df that looks like below:
ID  Name  Supervisor  SupervisorID
1   X     Y
2   Y     C
3   Z     Y
4   C     Y
5   V     X
What I need is to fill in SupervisorID. I can find a supervisor's ID by looking up their name in the Name column: if the Supervisor is Y, the Name column shows Y at ID 2, so the SupervisorID is 2. The df should look like below:
ID  Name  Supervisor  SupervisorID
1   X     Y           2
2   Y     C           4
3   Z     Y           2
4   C     Y           2
5   V     X           1
Do you have any idea how to solve this?
Thanks for the help and best regards.
Use Series.map with DataFrame.drop_duplicates to get unique Names first, because the real data may contain duplicates:
df['SupervisorID'] = df['Supervisor'].map(df.drop_duplicates('Name').set_index('Name')['ID'])
print(df)
   ID Name Supervisor  SupervisorID
0   1    X          Y             2
1   2    Y          C             4
2   3    Z          Y             2
3   4    C          Y             2
4   5    V          X             1
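For a self-contained run, the DataFrame can be rebuilt from the question's sample (a sketch; column names exactly as in the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID":         [1, 2, 3, 4, 5],
    "Name":       ["X", "Y", "Z", "C", "V"],
    "Supervisor": ["Y", "C", "Y", "Y", "X"],
})

# Build a Name -> ID lookup; drop_duplicates keeps the first ID per Name
lookup = df.drop_duplicates("Name").set_index("Name")["ID"]
df["SupervisorID"] = df["Supervisor"].map(lookup)
print(df)
```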

How to update column values when they have no value

For example, table Test has the below schema with 4 records:
id  H1    H2    H3    H4    H5
1   X     Y     Z     M     N
2   NULL  K     L     N     O
3   G     NULL  M     NULL  P
4   J     NULL  K     NULL  N
Output I want is the non-NULL values shifted left:
id  H1  H2  H3  H4  H5
1   X   Y   Z   M   N
2   K   L   N   O
3   G   M   P
4   J   K   N
I am trying to implement this using CASE statements. Any help would be appreciated.
Concatenate all columns skipping empty elements, split again, and address the array elements:
with data as (
  select stack(4,
    1, 'X',  'Y',  'Z',  'M',  'N',
    2, null, 'K',  'L',  'N',  'O',
    3, 'G',  null, 'M',  null, 'P',
    4, 'J',  null, 'K',  null, 'N'
  ) as (id, ad1, ad2, ad3, ad4, ad5)
)
select id, a[0] as ad1, a[1] as ad2, a[2] as ad3, a[3] as ad4, a[4] as ad5
from
(
  select id,
         split(
           regexp_replace(
             regexp_replace(
               concat_ws(',', nvl(ad1,''), nvl(ad2,''), nvl(ad3,''), nvl(ad4,''), nvl(ad5,'')),
               '^,+|,+$', ''),
             ',{2,}', ','),
           ',') as a
  from data
) s
Result:
id ad1 ad2 ad3 ad4 ad5
1 X Y Z M N
2 K L N O NULL
3 G M P NULL NULL
4 J K N NULL NULL
Time taken: 0.394 seconds, Fetched: 4 row(s)
Explanation:
The first regexp_replace removes one or more leading and trailing commas ('^,+|,+$').
The second regexp_replace replaces runs of two or more commas (',{2,}') with a single one.
split then creates the array.
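The same left-shift can be sketched in pandas for comparison (a hypothetical equivalent of the Hive query above, not part of the original answer):

```python
import pandas as pd

# The sample data, with NULLs as None
df = pd.DataFrame({
    "id":  [1, 2, 3, 4],
    "ad1": ["X", None, "G", "J"],
    "ad2": ["Y", "K", None, None],
    "ad3": ["Z", "L", "M", "K"],
    "ad4": ["M", "N", None, None],
    "ad5": ["N", "O", "P", "N"],
})

cols = ["ad1", "ad2", "ad3", "ad4", "ad5"]

# For each row, keep the non-null values in order and left-align them,
# padding the tail with None
def shift_left(row):
    vals = [v for v in row if pd.notna(v)]
    return pd.Series(vals + [None] * (len(cols) - len(vals)), index=cols)

out = df[cols].apply(shift_left, axis=1)
out.insert(0, "id", df["id"])
print(out)
```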

Creating dummy columns from cells with multiple values

I have a DF as shown below:
DF =
id Result
1 Li_In-AR-B, Or_Ba-AR-B
1 Li_In-AR-L, Or_Ba-AR-B
3 N
4 Lo_In-AR-U
5 Li_In-AR-U
6 Or_Ba-AR-B
6 Or_Ba-AR-L
7 N
Now I want to create a new column for every unique value that appears in Result before the first "-". Every other value in those new columns should be set to N. The delimiter "," separates the entries when a cell contains multiple values (2 or more).
DF =
id Result Li_In Lo_In Or_Ba
1 Li_In-AR-B Li_In-AR-B N Or_Ba-AR-B
1 Li_In-AR-L Li_In-AR-L N Or_Ba-AR-B
3 N N N N
4 Lo_In-AR-U N Lo_In-AR-U N
5 Li_In-AR-U Li_In-AR-U N N
6 Or_Ba-AR-B N N Or_Ba-AR-B
6 Or_Ba-AR-L N N Or_Ba-AR-L
7 N N N N
I thought I could do this easily using .get_dummies but this only returns a binary value for each cell.
DF_dummy = DF.Result.str.get_dummies(sep='-')
DF = pd.concat([DF,DF_dummy ],axis=1)
Also this solution for an earlier post is not applicable for the new case.
m = DF['Result'].str.split('-', n=1).str[0].str.get_dummies().drop('N', axis=1) == 1
df1 = pd.concat([DF['Result']] * len(m.columns), axis=1, keys=m.columns)
Any ideas?
Use a dictionary comprehension with the DataFrame constructor, splitting on the regex ',\s+', i.e. a comma followed by one or more whitespace characters:
import re
f = lambda x: {y.split('-', 1)[0]: y for y in re.split(r',\s+', x) if y != 'N'}
df1 = pd.DataFrame(DF['Result'].apply(f).values.tolist(), index=DF.index).fillna('N')
print(df1)
Li_In Lo_In Or_Ba
0 Li_In-AR-B N Or_Ba-AR-B
1 Li_In-AR-L N Or_Ba-AR-B
2 N N N
3 N Lo_In-AR-U N
4 Li_In-AR-U N N
5 N N Or_Ba-AR-B
6 N N Or_Ba-AR-L
7 N N N
Last, join back to the original DataFrame:
df = DF.join(df1)
print(df)
id Result Li_In Lo_In Or_Ba
0 1 Li_In-AR-B, Or_Ba-AR-B Li_In-AR-B N Or_Ba-AR-B
1 1 Li_In-AR-L, Or_Ba-AR-B Li_In-AR-L N Or_Ba-AR-B
2 3 N N N N
3 4 Lo_In-AR-U N Lo_In-AR-U N
4 5 Li_In-AR-U Li_In-AR-U N N
5 6 Or_Ba-AR-B N N Or_Ba-AR-B
6 6 Or_Ba-AR-L N N Or_Ba-AR-L
7 7 N N N N
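For reference, the whole pipeline runs end to end once DF is rebuilt from the sample (a self-contained sketch of the answer above):

```python
import re
import pandas as pd

DF = pd.DataFrame({
    "id": [1, 1, 3, 4, 5, 6, 6, 7],
    "Result": ["Li_In-AR-B, Or_Ba-AR-B", "Li_In-AR-L, Or_Ba-AR-B", "N",
               "Lo_In-AR-U", "Li_In-AR-U", "Or_Ba-AR-B", "Or_Ba-AR-L", "N"],
})

# Map each cell to {prefix-before-first-dash: full value}, skipping bare 'N'
f = lambda x: {y.split("-", 1)[0]: y for y in re.split(r",\s+", x) if y != "N"}

# One dict per row -> one column per unique prefix; missing entries become 'N'
df1 = pd.DataFrame(DF["Result"].apply(f).tolist(), index=DF.index).fillna("N")
df = DF.join(df1)
print(df)
```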

Group records from a DataTable with LINQ and add them to a Dictionary

I have a DataTable which is filled from a SQL query. Sample data:
A  B  C  D  E
1  1  x  y  z
1  1  x  y  z
1  2  x  y  z
2  1  x  y  z
I want to group them like this (the combination of A and B will be unique):
A  B  C  D  E
1  1  x  y  z
1  1  x  y  z

A  B  C  D  E
1  2  x  y  z

A  B  C  D  E
2  1  x  y  z
I tried with LINQ but could not get it working properly. I checked similar questions, but they did not solve my problem. It does not have to be LINQ, but I thought that grouping with LINQ into a Dictionary would be a good solution for me.

Setting a min. range before fetching data

I have a table of localities, each numbered with a unique number. Each locality has buildings whose status is Activated = 'Y' or 'N'. I want to pick the localities that have a minimum count of 15 buildings with Activated = 'Y'.
Sample data:
Locality  ACTIVATED
1         Y
1         Y
1         N
1         N
1         N
1         N
2         Y
2         Y
2         Y
2         Y
2         Y
E.g. for the sample above, I need the localities that have a minimum of 5 'Y' values in the ACTIVATED column.
SELECT l.*
FROM Localities l
WHERE (SELECT COUNT(*)
       FROM Building b
       WHERE b.LocalityNumber = l.LocalityNumber
         AND b.Activated = 'Y') >= 15

Adjust the 15 to whatever minimum you need (5 for the sample data above).
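The same filter can be sketched in pandas as a cross-check (hypothetical column names matching the query; threshold set to 5 to match the sample data):

```python
import pandas as pd

# Building rows from the sample
b = pd.DataFrame({
    "LocalityNumber": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Activated":      ["Y", "Y", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"],
})

# Count 'Y' rows per locality, then keep localities meeting the minimum
counts = (b["Activated"] == "Y").groupby(b["LocalityNumber"]).sum()
qualifying = counts[counts >= 5].index.tolist()
print(qualifying)
```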