How to update column values when they have no value - SQL

For example, table Test has the schema below, with 4 records (blank cells are NULL):
id H1   H2   H3   H4   H5
1  X    Y    Z    M    N
2  NULL K    L    N    O
3  G    NULL M    NULL P
4  J    NULL K    NULL N
The output I want is (non-NULL values shifted left):
id H1 H2 H3 H4 H5
1  X  Y  Z  M  N
2  K  L  N  O
3  G  M  P
4  J  K  N
I am trying to implement this using CASE statements. Any help would be appreciated.

Concatenate all columns without empty elements, split again, address array elements:
with data as (
  select stack(4,
    1, 'X','Y','Z','M','N',
    2, null,'K','L','N','O',
    3, 'G',null,'M',null,'P',
    4, 'J',null,'K',null,'N'
  ) as (id, ad1, ad2, ad3, ad4, ad5)
)
select id, a[0] as ad1, a[1] as ad2, a[2] as ad3, a[3] as ad4, a[4] as ad5
from
(
  select id,
         split(
           regexp_replace(
             regexp_replace(
               concat_ws(',', nvl(ad1,''), nvl(ad2,''), nvl(ad3,''), nvl(ad4,''), nvl(ad5,'')),
               '^,+|,+$', ''),
             ',{2,}', ','),
           ',') a
  from data
) s
Result:
id ad1 ad2 ad3 ad4 ad5
1 X Y Z M N
2 K L N O NULL
3 G M P NULL NULL
4 J K N NULL NULL
Time taken: 0.394 seconds, Fetched: 4 row(s)
Explanation:
The first regexp_replace removes one or more leading and trailing commas ('^,+|,+$').
The second regexp_replace collapses runs of two or more commas (',{2,}') into a single comma.
split turns the resulting string into an array.
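The same shift-left idea can be sketched in plain Python for comparison (the rows below mirror the Hive example; the helper name shift_left is mine, not from the original): drop the empty cells in each row, then pad the row back to its original width.

```python
# Shift non-empty values left in each row, padding with None at the end,
# mirroring the concat/strip/split trick used in the Hive query above.
rows = [
    (1, "X", "Y", "Z", "M", "N"),
    (2, None, "K", "L", "N", "O"),
    (3, "G", None, "M", None, "P"),
    (4, "J", None, "K", None, "N"),
]

def shift_left(row):
    rid, *vals = row
    kept = [v for v in vals if v]               # drop NULL/empty cells
    kept += [None] * (len(vals) - len(kept))    # pad back to original width
    return (rid, *kept)

shifted = [shift_left(r) for r in rows]
for r in shifted:
    print(r)
```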

Related

How can I map a list of values to a dataframe

I'm trying to map some values to a dataframe. I used some looping methods, but it seems there must be a simpler way to achieve the result I'm looking for.
input_df :
A B C
x y z
i j t
f g h
list of values to map:
list = [1,2,3]
result_df :
A B C D
x y z 1
i j t 1
f g h 1
x y z 2
i j t 2
f g h 2
x y z 3
i j t 3
f g h 3
Try a cross join (i.e. Cartesian product):
tmp = pd.DataFrame(list, columns=["D"])
tmp.merge(input_df, how="cross")
Requires pandas >= 1.2.0; see pd.merge.
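A self-contained version of that answer (data taken from the question; I renamed the list to values to avoid shadowing the builtin list, and the final column selection just restores the A-D order):

```python
import pandas as pd

input_df = pd.DataFrame({"A": ["x", "i", "f"],
                         "B": ["y", "j", "g"],
                         "C": ["z", "t", "h"]})
values = [1, 2, 3]  # avoids shadowing the builtin `list`

# Cross join: every value in D is paired with every row of input_df.
tmp = pd.DataFrame(values, columns=["D"])
result = tmp.merge(input_df, how="cross")[["A", "B", "C", "D"]]
print(result)
```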

Pandas Merging Data Frames Repeated Values and Values Missing

So I've created three data frames from 3 separate files (csv and xls). I want to combine them into a single data frame that is 20 columns and 15 rows. I've managed to do this using the code at the bottom (the final part of the code, where I merge all of the existing data frames). However, an odd thing is happening: the highest-ranking country is duplicated 3 times, and two of the 15 rows that should be there are missing, and I'm not exactly sure why.
I've set the index to be the same in each data frame!
So essentially my issue is that there are duplicate values showing up and other values being eliminated after I merge the data frames.
If someone could explain the mechanics as to why this issue is occurring, I'd really appreciate it :)
merged = pd.merge(pd.merge(df_ScimEn, df_energy[ListEnergy], left_index=True, right_index=True),
                  df_GDP[ListOfGDP], left_index=True, right_index=True)
merged = merged[ListOfColumns]
merged = merged.sort_values('Rank')
merged = merged[merged['Rank'] < 16]
final = pd.DataFrame(merged)
Example: a shorter version of what is happening
expected:
A B C D J K L R
1 x y z j a e c d
2 b c d l a l c d
3 j k e k a m c d
4 d k c k a n h d
5 d k j l a h c d
generated after I run the code above (the 1 is repeated and the 3 is missing):
A B C D J K L R
1 x y z j a b c d
1 x y z j a b c d
1 x y z j a b c d
4 d k c k a b h d
5 d k j l a h c d
Example Input
df1 = {[1:A,B,C],[2:A,B,C],[3:A,B,C],[4:A,B,C],[5:A,B,C]}
df2 = {[1:J,K,L,M],[2:J,K,L,M],[3:J,K,L,M],[4:J,K,L,M],[5:J,K,L,M]}
df3 = {[1:R,E,T],[2:R,E,T],[3:R,E,T],[4:R,E,T],[5:R,E,T]}
So the indexes are all the same for each data frame, and some have a different number of rows and columns, but I've edited them to form the final data frame. Each capital letter stands for a column name, with different values in each column.
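The mechanics can be demonstrated with a toy merge (the data below is illustrative, not the asker's): when the join key is duplicated on one side, an inner merge produces the Cartesian product of the matching rows, so that key's row is repeated; and any key with no match on the other side is dropped entirely. Both symptoms in the question, repeats and missing rows, usually come from this.

```python
import pandas as pd

# "US" appears three times in the left index (a duplicated key),
# and "IN" has no match in the right frame.
left = pd.DataFrame({"rank": [1, 1, 1, 3, 5]},
                    index=["US", "US", "US", "IN", "CN"])
right = pd.DataFrame({"gdp": [20, 14]},
                     index=["US", "CN"])

# Inner merge on the index: "US" is tripled, "IN" disappears.
merged = pd.merge(left, right, left_index=True, right_index=True)
print(merged)
```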

Append two pandas dataframe with different shapes and in for loop using python or pandasql

I have two dataframe such as:
df1:
id A B C D
1 a b c d
1 e f g h
1 i j k l
df2:
id A C D
2 x y z
2 u v w
The final outcome should be:
id A B C D
1 a b c d
1 e f g h
1 i j k l
2 x y z
2 u v w
These tables are generated in a for loop from JSON files, so I have to keep appending them one below another.
Note: the two dataframes' 'id' columns are always different.
My approach:
data is a dataframe in which column 'X' holds JSON data and which also has an 'id' column.
df1 = pd.DataFrame()
for i, row1 in data.head(2).iterrows():
    df2 = pd.io.json.json_normalize(row1["X"])
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df2["id"] = [row1["id"] for i in range(df2.shape[0])]
    if len(df1) == 0:
        df1 = df2.copy()
    df1 = pd.concat((df1, df2), ignore_index=True)
Error: AssertionError: Number of manager items must equal union of block items # manager items: 46, # tot_items: 49
How can I solve this using Python or pandas SQL?
You can use pd.concat to concatenate the two dataframes:
>>> pd.concat((df1, df2), ignore_index=True)
id A B C D
0 1 a b c d
1 1 e f g h
2 1 i j k l
3 2 x NaN y z
4 2 u NaN v w
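A self-contained version of that answer, with df1 and df2 built to match the question; the column missing from df2 (B) is aligned by name and filled with NaN:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 1, 1],
                    "A": ["a", "e", "i"],
                    "B": ["b", "f", "j"],
                    "C": ["c", "g", "k"],
                    "D": ["d", "h", "l"]})
df2 = pd.DataFrame({"id": [2, 2],
                    "A": ["x", "u"],
                    "C": ["y", "v"],
                    "D": ["z", "w"]})

# Columns are aligned by name; df2 has no "B", so those cells become NaN.
out = pd.concat((df1, df2), ignore_index=True)
print(out)
```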

Hive impala query

Input:
Key id ind1 ind2
1 A Y N
1 B N N
1 C Y Y
2 A N N
2 B Y N
Output
Key ind1 ind2
1 Y Y
2 Y N
So basically, whenever any ind column is 'Y' for the same key (across different ids), the output for that key should be 'Y'; otherwise 'N'.
That is why for key 1 both indicators are 'Y',
while for key 2, ind1 is 'Y' and ind2 is 'N'.
You can use max() for this:
select key, max(ind1), max(ind2)
from t
group by key;
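For comparison, the same aggregation can be sketched in pandas (column names taken from the question): max('Y', 'N') is 'Y' because 'Y' sorts after 'N', which is exactly why max() works here.

```python
import pandas as pd

t = pd.DataFrame({"key":  [1, 1, 1, 2, 2],
                  "id":   ["A", "B", "C", "A", "B"],
                  "ind1": ["Y", "N", "Y", "N", "Y"],
                  "ind2": ["N", "N", "Y", "N", "N"]})

# 'Y' > 'N' lexicographically, so max() per key yields 'Y'
# whenever any row for that key has 'Y'.
out = t.groupby("key")[["ind1", "ind2"]].max().reset_index()
print(out)
```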

Setting a min. range before fetching data

I have a table of localities, each numbered uniquely. Each locality has buildings whose status is Activated = 'Y' or 'N'. I want to pick the localities that have at least 15 buildings with Activated = 'Y'.
Sample Data :
Locality ACTIVATED
1 Y
1 Y
1 N
1 N
1 N
1 N
2 Y
2 Y
2 Y
2 Y
2 Y
E.g., I need the count of localities with a minimum of 5 'Y' values in the ACTIVATED column.
SELECT l.*
FROM Localities l
WHERE (SELECT COUNT(*)
       FROM Building b
       WHERE b.LocalityNumber = l.LocalityNumber
         AND b.Activated = 'Y') >= 15
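The same filter can be expressed as a grouped count; here is a pandas sketch of the idea (table and column names assumed from the question, and the threshold lowered to 5 to match the sample data):

```python
import pandas as pd

# Sample data from the question: locality 1 has two 'Y' rows,
# locality 2 has five 'Y' rows.
building = pd.DataFrame({"Locality":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                         "ACTIVATED": ["Y", "Y", "N", "N", "N", "N",
                                       "Y", "Y", "Y", "Y", "Y"]})

# Count 'Y' rows per locality, then keep localities meeting the threshold.
y_counts = (building[building["ACTIVATED"] == "Y"]
            .groupby("Locality").size())
qualifying = y_counts[y_counts >= 5].index.tolist()
print(qualifying)
```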