Create multiple columns from existing Hive table columns

How can I create multiple columns from an existing Hive table? The example data would be like below.
My requirement is to create two new columns from the existing table, each populated only when its condition is met:
col1 when code=1, col2 when code=2.
expected output:
How can I achieve this with a Hive query?

If you aggregate the required values into arrays, you can then explode the arrays and keep only the rows whose positions match.
Demo:
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
)
select c1.val as col1, c2.val as col2 from
(
select collect_set(case when code=1 then col else null end) as col1,
collect_set(case when code=2 then col else null end) as col2
from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos, val
lateral view outer posexplode(col2) c2 as pos, val
where c1.pos=c2.pos
Result:
col1 col2
a b
a1 b1
This approach will not work if the arrays are of different sizes.
Another approach: calculate row_number for each code and full join on it. This works even when col1 and col2 have different numbers of values (the missing values will be null):
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),
ordered as
(
select code, col, row_number() over(partition by code order by col) rn
from my_table where code in (1,2)
)
select c1.col as col1, c2.col as col2
from (select * from ordered where code=1) c1
full join
(select * from ordered where code=2) c2 on c1.rn = c2.rn
Result:
col1 col2
a b
a1 b1
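The difference between the two approaches only shows up when the codes have different numbers of values. As a minimal sketch in plain Python (not Hive), the positional pairing that the row_number plus full join performs can be imitated with `zip_longest`. The extra value `'b2'` is a hypothetical addition to the demo data, included only to make the two sides unequal.

```python
from itertools import zip_longest

# Demo data from the answer, plus one hypothetical extra row ('b2', 2)
# so that code=2 has more values than code=1.
rows = [
    ("a", 1), ("b", 2), ("c", 3), ("b1", 2),
    ("d", 4), ("c1", 3), ("a1", 1), ("d1", 4),
    ("b2", 2),
]


def pair_by_rank(rows, left_code=1, right_code=2):
    """Pair values of two codes by their sorted position.

    Sorting plays the role of row_number() over (partition by code order by col);
    zip_longest plays the role of the full join on rn, padding with None.
    """
    left = sorted(col for col, code in rows if code == left_code)
    right = sorted(col for col, code in rows if code == right_code)
    return list(zip_longest(left, right))


paired = pair_by_rank(rows)
print(paired)
```

With unequal sides, the shorter column is padded with `None`, which corresponds to the NULLs a full join would produce.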

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally I would use a WHERE ... IN statement, but with the count column in the result I'm not certain what type of statement I should use.
Maybe you need:
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
There are lots of ways to solve this; here's another:
select m.*
from MYTABLE m
join
(
select col1, count(*) as cnt
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = m.col1;
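As a quick sanity check of the IN-subquery form, here is a small sketch that loads the mock table into an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a stand-in here, since the question does not name a DBMS, and the lowercase table and column names are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (pkey INTEGER, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", 55), (2, "b", 44), (3, "b", 33),
     (4, "c", 88), (5, "d", 22), (6, "d", 33)],
)

# The inner query finds the duplicated col1 values;
# the outer query returns the complete rows for them.
dup_rows = conn.execute(
    """
    SELECT pkey, col1, col2
    FROM mytable
    WHERE col1 IN (
        SELECT col1
        FROM mytable
        GROUP BY col1
        HAVING COUNT(*) > 1
    )
    ORDER BY pkey
    """
).fetchall()

print(dup_rows)
```

The count itself is not needed in the outer query, which is why the subquery selects only `col1`.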

Average of rows in SQL Server

I have the below table
Col1 Col2 Col3 Col4 Col5
TotalAvg 68.79 65.39 88.21 63.14
I am already saving the total of all columns in the TotalAvg row, but now I want to calculate the average of the TotalAvg row itself. Can someone please tell me how I can calculate a row average?
I am looking for
Select Avg(Col2,Col3,Col4,Col5)
where Col1 = 'TotalAvg'
Thanks
If some of them may have NULL values, you could still use AVG() inside an APPLY.
SELECT
yourTable.Col1,
RowStats.avg
FROM
yourTable
CROSS APPLY
(
SELECT
AVG(x) AS avg
FROM
(
SELECT yourTable.col2 AS x
UNION ALL
SELECT yourTable.col3 AS x
UNION ALL
SELECT yourTable.col4 AS x
UNION ALL
SELECT yourTable.col5 AS x
)
pivot
)
AS rowStats
If by chance you need a more dynamic approach (i.e. variable columns), and IF you're open to a TVF, consider the following:
EDIT
The 1st parameter is a delimited list of columns to exclude. For example: 'IDNr,Year,AnyOtherNumericCol'.
Example
Select A.*
,B.*
From YourTable A
Cross Apply [dbo].[tvf-Stat-Row-Agg]('',(Select A.* for XML Raw)) B
Returns
Col1 Col2 Col3 Col4 Col5 RetCnt RetSum RetMin RetMax RetAvg RetStd
TotalAvg 68.79 65.39 88.21 63.14 4 285.53 63.14 88.21 71.3825 11.4562162892757
The TVF if Interested
CREATE FUNCTION [dbo].[tvf-Stat-Row-Agg](@Exclude varchar(500),@XML xml)
Returns Table
As
Return (
Select RetCnt = Count(Value)
,RetSum = Sum(Value)
,RetMin = Min(Value)
,RetMax = Max(Value)
,RetAvg = Avg(Value)
,RetStd = Stdev(Value)
From (
Select Item = convert(varchar(100),xAttr.query('local-name(.)'))
,Value = try_convert(float,xAttr.value('.','varchar(max)'))
From @XML.nodes('//@*') x(xAttr)
) S
Where charindex(','+S.Item+',',','+@Exclude+',')=0
);
EDIT 2
If the columns are fixed, and performance is paramount, then...
Select A.*
,B.*
From YourTable A
Cross Apply (
Select AvgVal = avg(Value)
From (values (Col2)
,(Col3)
,(Col4)
,(Col5)
) B1(Value)
) B
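To cross-check the figures in the "Returns" row above, the same row-wise statistics can be recomputed outside SQL Server. This is a minimal Python sketch, not a port of the TVF: the helper name `row_stats` and its `exclude` parameter are hypothetical, and NULLs are modelled as `None` and skipped, as AVG() would skip them.

```python
from statistics import mean, stdev

# The single row from the question, keyed by column name.
row = {"Col1": "TotalAvg", "Col2": 68.79, "Col3": 65.39, "Col4": 88.21, "Col5": 63.14}


def row_stats(row, exclude=("Col1",)):
    """Aggregate the numeric columns of one row, ignoring excluded and None values."""
    values = [v for k, v in row.items() if k not in exclude and v is not None]
    return {
        "cnt": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "std": stdev(values),  # sample standard deviation, like T-SQL STDEV
    }


stats = row_stats(row)
print(stats)
```

The average and standard deviation agree with the RetAvg and RetStd values shown above, up to floating-point rounding.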

HIVE JOIN two tables with different number of rows giving wrong column values

I am relatively new to Hive. I am exploring ways to merge two tables that are not connected to each other by keys, so I have not used an 'ON' condition in the query.
The below is table_1 :
COL1
hello
The below is table_2 :
COL2
world
excellent
EXPECTED RESULT :
hello world
NULL excellent
ACTUAL RESULT :
hello world
hello excellent
My Query :
select col_one,
col_two
from (
select COL1 as col_one
from table_1
) as c1
join (
select COL2 as col_two
from table_2
) as c2;
I'm not sure where the second 'hello' in the result comes from when there is no second row in table_1.
In Hive, a join with no ON clause is treated as a cross join, so every row of table_1 is paired with every row of table_2; that is where the second 'hello' comes from. You can do what you want using row_number() and a full outer join, something like this:
select c1.col_one, c2.col_two
from (select COL1 as col_one, row_number() over (order by col1) as seqnum
from table_1
) c1 full outer join
(select COL2 as col_two, row_number() over (order by col2) as seqnum
from table_2
) c2
on c1.seqnum = c2.seqnum;
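The behaviour can be reproduced outside Hive. The sketch below uses Python's `sqlite3` module as a stand-in and assumes a SQLite build with window functions (3.25 or later). Two choices are specific to the sketch: it numbers rows by SQLite's `rowid` to keep insertion order, because ordering by the value itself would pair 'hello' with 'excellent', and it left joins from the larger table rather than relying on FULL JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (col1 TEXT);
    CREATE TABLE table_2 (col2 TEXT);
    INSERT INTO table_1 VALUES ('hello');
    INSERT INTO table_2 VALUES ('world'), ('excellent');
    """
)

# A join with no ON condition pairs every row with every row.
cross = conn.execute(
    "SELECT col1, col2 FROM table_1 JOIN table_2 ORDER BY col2 DESC"
).fetchall()

# Number the rows of each table, then match the rows positionally.
positional = conn.execute(
    """
    WITH c1 AS (SELECT col1, ROW_NUMBER() OVER (ORDER BY rowid) AS seqnum FROM table_1),
         c2 AS (SELECT col2, ROW_NUMBER() OVER (ORDER BY rowid) AS seqnum FROM table_2)
    SELECT c1.col1, c2.col2
    FROM c2
    LEFT JOIN c1 ON c1.seqnum = c2.seqnum
    ORDER BY c2.seqnum
    """
).fetchall()

print(cross)
print(positional)
```

Without a column that defines row order, the pairing depends entirely on the ORDER BY inside row_number(), so the choice of ordering column deserves some thought.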

Find duplicate symmetric rows in a table

I have a table which contains data as
col1 col2
a b
b a
c d
d c
a d
a c
For me row 1 and row 2 are duplicate because a, b & b, a are the same. The same stands for row 3 and row 4.
I need an SQL (not PL/SQL) query which gives output as
col1 col2
a b
c d
a d
a c
select distinct least(col1, col2), greatest(col1, col2)
from your_table
Edit: for those using a DBMS that does not support the standard SQL functions least and greatest, this can be simulated using a CASE expression:
select distinct
case
when col1 < col2 then col1
else col2
end as least_col,
case
when col1 > col2 then col1
else col2
end as greatest_col
from your_table
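To see the CASE version at work on the sample data, here is a small sketch using Python's `sqlite3` module. SQLite is only a convenient stand-in; the ORDER BY is added so the output is deterministic and is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "d"), ("a", "c")],
)

# Normalise each pair so the smaller value comes first,
# then let DISTINCT collapse the symmetric duplicates.
pairs = conn.execute(
    """
    SELECT DISTINCT
        CASE WHEN col1 < col2 THEN col1 ELSE col2 END AS least_col,
        CASE WHEN col1 > col2 THEN col1 ELSE col2 END AS greatest_col
    FROM your_table
    ORDER BY least_col, greatest_col
    """
).fetchall()

print(pairs)
```

The six input rows collapse to four pairs, because (a, b)/(b, a) and (c, d)/(d, c) normalise to the same values.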
Try this:
CREATE TABLE t_1(col1 varchar(10),col2 varchar(10))
INSERT INTO t_1
VALUES ('a','b'),
('b','a'),
('c','d'),
('d','c'),
('a','d'),
('a','c')
;with CTE as (select ROW_NUMBER() over (order by (select 0)) as id,col1,col2,col1+col2 as col3 from t_1)
,CTE1 as (
select id,col1,col2,col3 from CTE where id=1
union all
select c.id,c.col1,c.col2,CASE when c.col3=REVERSE(c1.col3) then null else c.col3 end from CTE c inner join CTE1 c1
on c.id-1=c1.id
)
select col1,col2 from CTE1 where col3 is not null

sort items based on their appears count

I have data like this
d b c
a d
c b
a b
c a
c a d
c
If you analyse it, you will find that each element appears the following number of times:
a: 4
b: 3
c: 5
d: 3
According to appearance my sorted elements would be
c,a,b,d
and final output should be
c b d
a d
c b
a b
c a
c a d
c
Any clue how we can achieve this using a SQL query?
Unless there is another column which dictates the order of the input rows, it will not be possible to guarantee that the output rows are returned in the same order. I've made an assumption here to order them by the three column values so that the result is deterministic.
It is probably possible to compact this code into fewer steps, but this version shows the steps reasonably clearly.
Note that for a large dataset, it may be more efficient to partition some of these steps into SELECT INTO operations creating temporary tables or work tables.
DECLARE @t TABLE
(col1 CHAR(1)
,col2 CHAR(1)
,col3 CHAR(1)
)
INSERT @t
SELECT 'd','b','c'
UNION SELECT 'a','d',NULL
UNION SELECT 'c','b',NULL
UNION SELECT 'a','b',NULL
UNION SELECT 'c','a',NULL
UNION SELECT 'c','a','d'
UNION SELECT 'c',NULL,NULL
;WITH freqCTE
AS
(
SELECT col1 FROM @t WHERE col1 IS NOT NULL
UNION ALL
SELECT col2 FROM @t WHERE col2 IS NOT NULL
UNION ALL
SELECT col3 FROM @t WHERE col3 IS NOT NULL
)
,grpCTE
AS
(
SELECT col1 AS val
,COUNT(1) AS cnt
FROM freqCTE
GROUP BY col1
)
,rowNCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (ORDER BY col1
,col2
,col3
) AS rowN
FROM @t
)
)
,buildCTE
AS
(
SELECT rowN
,val
,cnt
,ROW_NUMBER() OVER (PARTITION BY rowN
ORDER BY ISNULL(cnt,-1) DESC
,ISNULL(val,'z')
) AS colOrd
FROM (
SELECT *
FROM rowNCTE AS t
JOIN grpCTE AS g1
ON g1.val = t.col1
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g2
ON g2.val = t.col2
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g3
ON g3.val = t.col3
) AS x
)
SELECT b1.val AS col1
,b2.val AS col2
,b3.val AS col3
FROM buildCTE AS b1
JOIN buildCTE AS b2
ON b2.rowN = b1.rowN
AND b2.colOrd = 2
JOIN buildCTE AS b3
ON b3.rowN = b1.rowN
AND b3.colOrd = 3
WHERE b1.colOrd = 1
ORDER BY b1.rowN
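The logic of the query (count every value across all columns, then reorder each row by descending count) can be checked against the expected output with a few lines of Python. This is only a sketch of the idea, not a translation of the T-SQL. Counted over the sample rows, b and d both appear three times, so the sketch assumes ties are broken alphabetically, much as the query falls back to ordering by val.

```python
from collections import Counter

# Sample rows from the question; missing trailing values are simply absent.
rows = [
    ["d", "b", "c"],
    ["a", "d"],
    ["c", "b"],
    ["a", "b"],
    ["c", "a"],
    ["c", "a", "d"],
    ["c"],
]

# Step 1: how often does each value appear anywhere in the data?
counts = Counter(item for row in rows for item in row)


def sort_row(row):
    """Order one row by descending global count, then alphabetically on ties."""
    return sorted(row, key=lambda item: (-counts[item], item))


# Step 2: reorder the values inside each row, keeping the row order itself.
sorted_rows = [sort_row(row) for row in rows]

for row in sorted_rows:
    print(" ".join(row))
```

Unlike the SQL version, the list keeps the input row order for free, which is the guarantee the answer notes a table cannot give without an ordering column.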