Hive/Impala query - SQL

Input:
Key  id  ind1  ind2
1    A   Y     N
1    B   N     N
1    C   Y     Y
2    A   N     N
2    B   Y     N

Output:
Key  ind1  ind2
1    Y     Y
2    Y     N
So basically, whenever any of the ind1..n columns is 'Y' for the same Key (across different ids), the output should be 'Y'; otherwise 'N'.
That is why for Key 1 both indicators are 'Y',
and for Key 2 the indicators are 'Y' and 'N'.

You can use max() for this; it works because the string 'Y' sorts after 'N'. Note that the grouping should be on Key, since the output has one row per Key:
select key, max(ind1), max(ind2)
from t
group by key;

(If key is a reserved word in your Hive version, quote it with backticks.)
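As a quick sanity check, the same aggregation can be reproduced in pandas (a sketch only, with column names taken from the sample table, not Hive/Impala code):

```python
import pandas as pd

# Sample data from the question
t = pd.DataFrame({
    "Key":  [1, 1, 1, 2, 2],
    "id":   ["A", "B", "C", "A", "B"],
    "ind1": ["Y", "N", "Y", "N", "Y"],
    "ind2": ["N", "N", "Y", "N", "N"],
})

# max() works on the strings because 'Y' sorts after 'N'
out = t.groupby("Key", as_index=False)[["ind1", "ind2"]].max()
print(out)
```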

Related

Pandas - find value in column based on values from another column and place it in a different column

I have a df that looks like below:
ID  Name  Supervisor  SupervisorID
1   X     Y
2   Y     C
3   Z     Y
4   C     Y
5   V     X
What I need is to fill in SupervisorID. I can find a supervisor's ID by looking up their name in the Name column: if the Supervisor is Y, the Name column shows Y at ID 2, so the SupervisorID is 2. The df should look like below:
ID  Name  Supervisor  SupervisorID
1   X     Y           2
2   Y     C           4
3   Z     Y           2
4   C     Y           2
5   V     X           1
Do you have any idea how to solve this?
Thanks for the help and best regards.
Use Series.map with DataFrame.drop_duplicates to get unique Names first, because the real data may contain duplicates:
df['SupervisorID'] = df['Supervisor'].map(df.drop_duplicates('Name').set_index('Name')['ID'])
print(df)
   ID Name Supervisor  SupervisorID
0   1    X          Y             2
1   2    Y          C             4
2   3    Z          Y             2
3   4    C          Y             2
4   5    V          X             1
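For a self-contained run, the DataFrame can be rebuilt from the question's sample (a sketch; column names exactly as in the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID":         [1, 2, 3, 4, 5],
    "Name":       ["X", "Y", "Z", "C", "V"],
    "Supervisor": ["Y", "C", "Y", "Y", "X"],
})

# Build a Name -> ID lookup; drop_duplicates keeps the first ID per Name
lookup = df.drop_duplicates("Name").set_index("Name")["ID"]
df["SupervisorID"] = df["Supervisor"].map(lookup)
print(df)
```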

How to update column values when they have no value

For example, table Test has the below schema with 4 records:
id  H1    H2    H3    H4    H5
1   X     Y     Z     M     N
2   NULL  K     L     N     O
3   G     NULL  M     NULL  P
4   J     NULL  K     NULL  N
Output I want is the non-NULL values shifted left:
id  H1  H2  H3  H4  H5
1   X   Y   Z   M   N
2   K   L   N   O
3   G   M   P
4   J   K   N
I am trying to implement this using CASE statements. Any help would be appreciated.
Concatenate all columns skipping empty elements, split again, and address the array elements:
with data as (
  select stack(4,
    1, 'X',  'Y',  'Z',  'M',  'N',
    2, null, 'K',  'L',  'N',  'O',
    3, 'G',  null, 'M',  null, 'P',
    4, 'J',  null, 'K',  null, 'N'
  ) as (id, ad1, ad2, ad3, ad4, ad5)
)
select id, a[0] as ad1, a[1] as ad2, a[2] as ad3, a[3] as ad4, a[4] as ad5
from
(
  select id,
         split(
           regexp_replace(
             regexp_replace(
               concat_ws(',', nvl(ad1,''), nvl(ad2,''), nvl(ad3,''), nvl(ad4,''), nvl(ad5,'')),
               '^,+|,+$', ''),
             ',{2,}', ','),
           ',') as a
  from data
) s
Result:
id ad1 ad2 ad3 ad4 ad5
1 X Y Z M N
2 K L N O NULL
3 G M P NULL NULL
4 J K N NULL NULL
Time taken: 0.394 seconds, Fetched: 4 row(s)
Explanation:
The first regexp_replace removes one or more leading and trailing commas ('^,+|,+$').
The second regexp_replace replaces runs of two or more commas (',{2,}') with a single one.
split then creates the array.
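The same left-shift can be sketched in pandas for comparison (a hypothetical equivalent of the Hive query above, not part of the original answer):

```python
import pandas as pd

# The sample data, with NULLs as None
df = pd.DataFrame({
    "id":  [1, 2, 3, 4],
    "ad1": ["X", None, "G", "J"],
    "ad2": ["Y", "K", None, None],
    "ad3": ["Z", "L", "M", "K"],
    "ad4": ["M", "N", None, None],
    "ad5": ["N", "O", "P", "N"],
})

cols = ["ad1", "ad2", "ad3", "ad4", "ad5"]

# For each row, keep the non-null values in order and left-align them,
# padding the tail with None
def shift_left(row):
    vals = [v for v in row if pd.notna(v)]
    return pd.Series(vals + [None] * (len(cols) - len(vals)), index=cols)

out = df[cols].apply(shift_left, axis=1)
out.insert(0, "id", df["id"])
print(out)
```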

Creating dummy columns from cells with multiple values

I have a DF as shown below:
DF =
id Result
1 Li_In-AR-B, Or_Ba-AR-B
1 Li_In-AR-L, Or_Ba-AR-B
3 N
4 Lo_In-AR-U
5 Li_In-AR-U
6 Or_Ba-AR-B
6 Or_Ba-AR-L
7 N
Now I want to create a new column for every unique value that appears in Result before the first "-". Every other value in those new columns should be set to N. The delimiter "," separates the entries when a cell contains multiple values (2 or more).
DF =
id Result Li_In Lo_In Or_Ba
1 Li_In-AR-B Li_In-AR-B N Or_Ba-AR-B
1 Li_In-AR-L Li_In-AR-L N Or_Ba-AR-B
3 N N N N
4 Lo_In-AR-U N Lo_In-AR-U N
5 Li_In-AR-U Li_In-AR-U N N
6 Or_Ba-AR-B N N Or_Ba-AR-B
6 Or_Ba-AR-L N N Or_Ba-AR-L
7 N N N N
I thought I could do this easily using .get_dummies but this only returns a binary value for each cell.
DF_dummy = DF.Result.str.get_dummies(sep='-')
DF = pd.concat([DF,DF_dummy ],axis=1)
Also this solution for an earlier post is not applicable for the new case.
m = DF['Result'].str.split('-', n=1).str[0].str.get_dummies().drop('N', axis=1) == 1
df1 = pd.concat([DF['Result']] * len(m.columns), axis=1, keys=m.columns)
Any ideas?
Use a dictionary comprehension with the DataFrame constructor, splitting on the regex ',\s+', i.e. a comma followed by one or more whitespace characters:
import re
f = lambda x: {y.split('-', 1)[0]: y for y in re.split(r',\s+', x) if y != 'N'}
df1 = pd.DataFrame(DF['Result'].apply(f).values.tolist(), index=DF.index).fillna('N')
print(df1)
Li_In Lo_In Or_Ba
0 Li_In-AR-B N Or_Ba-AR-B
1 Li_In-AR-L N Or_Ba-AR-B
2 N N N
3 N Lo_In-AR-U N
4 Li_In-AR-U N N
5 N N Or_Ba-AR-B
6 N N Or_Ba-AR-L
7 N N N
Last, join back to the original DataFrame:
df = DF.join(df1)
print(df)
id Result Li_In Lo_In Or_Ba
0 1 Li_In-AR-B, Or_Ba-AR-B Li_In-AR-B N Or_Ba-AR-B
1 1 Li_In-AR-L, Or_Ba-AR-B Li_In-AR-L N Or_Ba-AR-B
2 3 N N N N
3 4 Lo_In-AR-U N Lo_In-AR-U N
4 5 Li_In-AR-U Li_In-AR-U N N
5 6 Or_Ba-AR-B N N Or_Ba-AR-B
6 6 Or_Ba-AR-L N N Or_Ba-AR-L
7 7 N N N N
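For reference, the whole pipeline runs end to end once DF is rebuilt from the sample (a self-contained sketch of the answer above):

```python
import re
import pandas as pd

DF = pd.DataFrame({
    "id": [1, 1, 3, 4, 5, 6, 6, 7],
    "Result": ["Li_In-AR-B, Or_Ba-AR-B", "Li_In-AR-L, Or_Ba-AR-B", "N",
               "Lo_In-AR-U", "Li_In-AR-U", "Or_Ba-AR-B", "Or_Ba-AR-L", "N"],
})

# Map each cell to {prefix-before-first-dash: full value}, skipping bare 'N'
f = lambda x: {y.split("-", 1)[0]: y for y in re.split(r",\s+", x) if y != "N"}

# One dict per row -> one column per unique prefix; missing entries become 'N'
df1 = pd.DataFrame(DF["Result"].apply(f).tolist(), index=DF.index).fillna("N")
df = DF.join(df1)
print(df)
```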

Group records from a DataTable with LINQ and add them to a Dictionary

I have a DataTable which is filled from a SQL query. Sample data:
A  B  C  D  E
1  1  x  y  z
1  1  x  y  z
1  2  x  y  z
2  1  x  y  z
I want to group them like this (the combination of A and B will be unique):
A  B  C  D  E
1  1  x  y  z
1  1  x  y  z

A  B  C  D  E
1  2  x  y  z

A  B  C  D  E
2  1  x  y  z
I tried with LINQ but could not get it working properly. I checked similar questions, but they did not solve my problem. It does not have to be LINQ, but I thought that grouping with LINQ into a Dictionary would be a good solution for me.

Setting a min. range before fetching data

I have a table of localities, each numbered with a unique number. Each locality has buildings whose status is Activated = 'Y' or 'N'. I want to pick the localities that have a minimum count of 15 buildings with Activated = 'Y'.
Sample data:
Locality  ACTIVATED
1         Y
1         Y
1         N
1         N
1         N
1         N
2         Y
2         Y
2         Y
2         Y
2         Y
E.g. for the sample above, I need the localities that have a minimum of 5 'Y' values in the ACTIVATED column.
SELECT l.*
FROM Localities l
WHERE (SELECT COUNT(*)
       FROM Building b
       WHERE b.LocalityNumber = l.LocalityNumber
         AND b.Activated = 'Y') >= 15

Adjust the 15 to whatever minimum you need (5 for the sample data above).
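The same filter can be sketched in pandas as a cross-check (hypothetical column names matching the query; threshold set to 5 to match the sample data):

```python
import pandas as pd

# Building rows from the sample
b = pd.DataFrame({
    "LocalityNumber": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Activated":      ["Y", "Y", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"],
})

# Count 'Y' rows per locality, then keep localities meeting the minimum
counts = (b["Activated"] == "Y").groupby(b["LocalityNumber"]).sum()
qualifying = counts[counts >= 5].index.tolist()
print(qualifying)
```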