How to find the mode of multiple columns in Snowflake SQL

Input column example:
ID  Column A  Column B  Column C
1   cat       cat       dog
2   dog       cat       dog
3   cat       cat       dog
4   bird      cat       dog
Output column example:
ID  Column A  Column B  Column C  Mode
1   cat       cat       dog       cat
2   dog       cat       dog       dog
3   cat       cat       dog       cat
4   bird      cat       dog       bird
So far I have only calculated the mode for a single column. I'm not sure how to do it horizontally, across the 3 columns.

We can unpivot the columns with a UNION ALL query, count how often each value occurs per ID, and then use ROW_NUMBER() to select the mode:
WITH cte AS (
    -- unpivot: one row per (ID, value)
    SELECT ID, ColumnA AS val FROM yourTable UNION ALL
    SELECT ID, ColumnB FROM yourTable UNION ALL
    SELECT ID, ColumnC FROM yourTable
),
cte2 AS (
    -- number of times each value appears for its ID
    SELECT *, COUNT(*) OVER (PARTITION BY ID, val) cnt
    FROM cte
),
cte3 AS (
    -- rank values per ID: most frequent first, ties alphabetically
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY cnt DESC, val) rn
    FROM cte2
)
SELECT t1.ID, t1.ColumnA, t1.ColumnB, t1.ColumnC, t2.val AS Mode
FROM yourTable t1
INNER JOIN cte3 t2
    ON t2.ID = t1.ID
WHERE t2.rn = 1
ORDER BY t1.ID;
If two or more values are tied for the mode, the query breaks the tie by returning the alphabetically lowest value (the val term in the ROW_NUMBER() ORDER BY).
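To check the unpivot/ROW_NUMBER() logic locally, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module against the question's sample data. It assumes SQLite 3.25 or later (the version that added window functions) and the hypothetical table name yourTable with columns ColumnA, ColumnB and ColumnC; SQLite is only a stand-in for Snowflake here.

```python
import sqlite3

# The question's sample data: (ID, ColumnA, ColumnB, ColumnC).
ROWS = [
    (1, "cat", "cat", "dog"),
    (2, "dog", "cat", "dog"),
    (3, "cat", "cat", "dog"),
    (4, "bird", "cat", "dog"),
]

QUERY = """
WITH cte AS (                      -- unpivot: one row per (ID, value)
    SELECT ID, ColumnA AS val FROM yourTable UNION ALL
    SELECT ID, ColumnB FROM yourTable UNION ALL
    SELECT ID, ColumnC FROM yourTable
),
cte2 AS (                          -- how often each value occurs per ID
    SELECT ID, val, COUNT(*) OVER (PARTITION BY ID, val) AS cnt FROM cte
),
cte3 AS (                          -- highest count first, ties alphabetical
    SELECT ID, val,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY cnt DESC, val) AS rn
    FROM cte2
)
SELECT t1.ID, t1.ColumnA, t1.ColumnB, t1.ColumnC, t2.val AS Mode
FROM yourTable t1
INNER JOIN cte3 t2 ON t2.ID = t1.ID
WHERE t2.rn = 1
ORDER BY t1.ID
"""


def modes_by_row(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT, ColumnC TEXT)"
    )
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in modes_by_row(ROWS):
    print(row)
```

Row 4 is a three-way tie, so it shows the alphabetical tie-break picking bird.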

I think you can use the built-in MODE function for this, together with Snowflake's semi-structured functionality to unpivot the columns and apply it. Check that the behaviour of MODE suits your needs with regard to tie-breaking, NULL handling, etc.
First, create test data like your example (please include code to reproduce it in future!):
create view YOURTABLE as
select
ID,
COLUMN_A,
COLUMN_B,
COLUMN_C
from (values
(1,'cat','cat','dog'),
(2,'dog','cat','dog'),
(3,'cat','cat','dog'),
(4,'bird','cat','dog')
) vw (ID, COLUMN_A, COLUMN_B, COLUMN_C);
Here's the query to get your output:
with o_r as (Select ID, COLUMN_A, COLUMN_B, COLUMN_C,
array_construct(COLUMN_A, COLUMN_B, COLUMN_C) arr_row from YOURTABLE )
select ID, COLUMN_A, COLUMN_B, COLUMN_C, mode(value::VARCHAR) MODE
from o_r, lateral flatten (input => o_r.arr_row) lf
group by 1,2,3,4
order by 1;
Gather the columns we want to calculate MODE over with array_construct(), lateral flatten the array, and then group by ID and the original columns, applying MODE to the flattened VALUE column.
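The same gather, flatten, aggregate idea can be sketched in plain Python to make each step concrete. This is only an illustration: the list literal stands in for array_construct(), iterating over it stands in for lateral flatten, and the tie-break (alphabetically lowest) and skipping of None values are assumptions of this sketch, not documented MODE behaviour.

```python
from collections import Counter

# Hard-coded copy of the YOURTABLE test view: (ID, COLUMN_A, COLUMN_B, COLUMN_C).
YOURTABLE = [
    (1, "cat", "cat", "dog"),
    (2, "dog", "cat", "dog"),
    (3, "cat", "cat", "dog"),
    (4, "bird", "cat", "dog"),
]


def row_mode(values):
    """Most frequent value in `values`.

    Assumptions of this sketch: None values are skipped, and ties go to the
    alphabetically lowest value (Snowflake's MODE may break ties differently).
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    # Highest count first; among equal counts, the smallest value.
    return min(counts, key=lambda v: (-counts[v], v))


def add_mode_column(table):
    result = []
    for row_id, col_a, col_b, col_c in table:
        arr_row = [col_a, col_b, col_c]  # ~ array_construct(COLUMN_A, COLUMN_B, COLUMN_C)
        # iterating arr_row ~ lateral flatten; one mode per source row ~ group by 1,2,3,4
        result.append((row_id, col_a, col_b, col_c, row_mode(arr_row)))
    return sorted(result, key=lambda r: r[0])  # ~ order by 1


for row in add_mode_column(YOURTABLE):
    print(row)
```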

Related

HIVE JOIN two tables with different number of rows giving wrong column values

I am relatively new to Hive and am exploring ways to merge two tables that are not connected to each other by keys, so I have not used an ON condition in the query.
Below is table_1:
COL1
hello
Below is table_2:
COL2
world
excellent
EXPECTED RESULT:
hello world
NULL excellent
ACTUAL RESULT:
hello world
hello excellent
My Query:
select col_one,
col_two
from (
select COL1 as col_one
from table_1
) as c1
join (
select COL2 as col_two
from table_2
) as c2;
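I'm not sure where the 'hello' in the second result row comes from when there is no second row in table_1.
Without an ON clause, Hive treats the JOIN as a cross join: every row of table_1 is paired with every row of table_2, which is why 'hello' appears twice. You can do what you want using row_number() instead, something like this: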
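-- row order in a table is not guaranteed: order each row_number() by a column that reflects the pairing you want
-- (ordering by the values themselves, as here, pairs them alphabetically)
select c1.col_one, c2.col_two
from (select COL1 as col_one, row_number() over (order by col1) as seqnum
      from table_1
     ) c1 full outer join   -- full outer join keeps rows that have no partner (the NULL row)
     (select COL2 as col_two, row_number() over (order by col2) as seqnum
      from table_2
     ) c2
     on c1.seqnum = c2.seqnum;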
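A plain Python sketch of the two behaviours involved: what a JOIN without an ON condition produces (a cross join), and the positional pairing the expected result asks for, where the unmatched row gets NULL (None). Python lists keep insertion order, whereas a table has no inherent order; in SQL it is the ORDER BY inside row_number() that decides which rows get paired.

```python
from itertools import product, zip_longest

table_1 = ["hello"]
table_2 = ["world", "excellent"]  # in the order the question lists them

# JOIN with no ON condition: every row paired with every row (cross join).
cross = list(product(table_1, table_2))

# Row-number pairing with an outer join: i-th row with i-th row,
# the missing side filled with None.
positional = list(zip_longest(table_1, table_2))

print(cross)       # [('hello', 'world'), ('hello', 'excellent')]
print(positional)  # [('hello', 'world'), (None, 'excellent')]
```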

Big Query view (table without duplicate rows)

I need to create a view that is pretty much just like some table with a few simple transformations, and I want to make sure the values in a particular column are not duplicated.
So let's say the table looks like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
1 fingers computers
And the resulting view would look like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
So, is there an equivalent to SELECT distinct(ID), ColumnA, ColumnB?
What's the most efficient way to do it?
If you just want an arbitrary row for each ID, use ANY_VALUE:
#standardSQL
WITH Input AS (
SELECT 1 AS ID, 'cars' AS ColumnA, 'shirts' AS ColumnB UNION ALL
SELECT 2 AS ID, 'tvs' AS ColumnA, 'dogs' AS ColumnB UNION ALL
SELECT 1 AS ID, 'fingers' AS ColumnA, 'computers' AS ColumnB
)
SELECT
ANY_VALUE(t).*
FROM Input AS t
GROUP BY t.ID;
Or you can use the ARRAY_AGG trick to pick one row per ID based on a condition (for example, the latest row).
Below is for BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'cars' AS columnA, 'shirts' AS columnB UNION ALL
SELECT 2, 'tvs', 'dogs' UNION ALL
SELECT 1, 'fingers', 'computers'
)
SELECT r.*
FROM (
SELECT ARRAY_AGG(t ORDER BY columnA LIMIT 1)[OFFSET (0)] AS r
FROM yourTable t
GROUP BY id
)
-- ORDER BY id
Note: you need some logic that decides why the cars row is selected over the fingers row!
The version above (as an example) is based on ascending order of columnA.
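Outside BigQuery, the same "keep one row per id, chosen by an ordering" idea can be sketched in Python. The question's three rows are hard-coded, the key function plays the role of the ORDER BY inside ARRAY_AGG, and the helper name first_row_per_id is hypothetical.

```python
# The question's rows, hard-coded: (id, columnA, columnB).
ROWS = [
    (1, "cars", "shirts"),
    (2, "tvs", "dogs"),
    (1, "fingers", "computers"),
]


def first_row_per_id(rows, key):
    """Keep, for each id, the row that sorts first by `key`.

    Mirrors ARRAY_AGG(t ORDER BY <key> LIMIT 1)[OFFSET(0)] ... GROUP BY id.
    """
    best = {}
    for row in rows:
        row_id = row[0]
        if row_id not in best or key(row) < key(best[row_id]):
            best[row_id] = row
    return sorted(best.values(), key=lambda r: r[0])


# Order by columnA ascending, as in the answer above.
print(first_row_per_id(ROWS, key=lambda r: r[1]))
# [(1, 'cars', 'shirts'), (2, 'tvs', 'dogs')]
```

Changing the key (for example to columnB) changes which duplicate survives, which is exactly the "some logic" the note asks for.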

select distinct values on 1 column only in oracle sql developer

I have 3 columns:
column_a, column_b, column_c
I am trying to get all the rows, but with only column_a distinct.
When I write
select distinct column_a,column_b,column_c
I think it gives me distinct combinations, so I get
value1 - a - b
value1 - a - c
I want to keep the distinct values of column_a followed by their column_b and column_c values, because I will create a table from that SQL query and I want to add a PK on column_a.
I think the safest way is to use row_number():
select a, b, c
from (select t.*,
row_number() over (partition by a order by a) as seqnum
from t
) t
where seqnum = 1;
This will return an arbitrary row for each a value, but a will not be repeated in the result set.
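A runnable sketch of the row_number() approach using Python's sqlite3 module (SQLite 3.25+ assumed for window functions), with the question's value1 rows plus a hypothetical value2 row. Unlike the answer, it orders each partition by b, c so the surviving row is deterministic rather than arbitrary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("value1", "a", "b"), ("value1", "a", "c"), ("value2", "x", "y")],
)

rows = con.execute("""
    SELECT a, b, c
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c) AS seqnum
          FROM t) t
    WHERE seqnum = 1      -- keep only the first row of each a
    ORDER BY a
""").fetchall()
con.close()
print(rows)  # [('value1', 'a', 'b'), ('value2', 'x', 'y')]
```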

How to use order by with union all in sql?

I tried the SQL query given below:
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
It results in the following error:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
I need to use order by in union all. How do I accomplish this?
SELECT *
FROM
(
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
) dum
-- ORDER BY .....
But if you want all records from TABLE_A at the top of the result list, you can add a user-defined value to order by:
SELECT *
FROM
(
SELECT *, 1 sortby FROM TABLE_A
UNION ALL
SELECT *, 2 sortby FROM TABLE_B
) dum
ORDER BY sortby
You don't really need the parentheses. You can sort directly:
SELECT *, 1 AS RN FROM TABLE_A
UNION ALL
SELECT *, 2 AS RN FROM TABLE_B
ORDER BY RN, COLUMN_1
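A quick way to see that the trailing ORDER BY sorts the whole UNION ALL result is to run the sort-key version against SQLite from Python. The tables and their contents are hypothetical, and SQLite stands in for SQL Server here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables with one shared column, COLUMN_1.
con.execute("CREATE TABLE TABLE_A (COLUMN_1 TEXT)")
con.execute("CREATE TABLE TABLE_B (COLUMN_1 TEXT)")
con.executemany("INSERT INTO TABLE_A VALUES (?)", [("b",), ("a",)])
con.executemany("INSERT INTO TABLE_B VALUES (?)", [("z",), ("c",)])

rows = con.execute("""
    SELECT *, 1 AS RN FROM TABLE_A
    UNION ALL
    SELECT *, 2 AS RN FROM TABLE_B
    ORDER BY RN, COLUMN_1      -- applies to the whole UNION ALL result
""").fetchall()
con.close()
print(rows)  # [('a', 1), ('b', 1), ('c', 2), ('z', 2)]
```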
Not a direct response to the OP, but I thought I would chime in here to respond to the OP's error message, which may point you in another direction entirely!
All these answers refer to an overall ORDER BY, applied once the record set has been retrieved to sort the lot.
What if you want to ORDER BY each portion of the UNION independently, and still have them "joined" in the same SELECT?
SELECT pass1.* FROM
(SELECT TOP 1000 tblA.ID, tblA.CustomerName
FROM TABLE_A AS tblA ORDER BY 2) AS pass1
UNION ALL
SELECT pass2.* FROM
(SELECT TOP 1000 tblB.ID, tblB.CustomerName
FROM TABLE_B AS tblB ORDER BY 2) AS pass2
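Note that TOP 1000 is an arbitrary number. Use a number big enough to capture all of the data you require. Also note that the inner ORDER BY only decides which rows TOP picks; without an ORDER BY on the outer query, SQL Server does not guarantee the order in which the combined rows are returned.
There will be times when you need to do something like this:
Pull the top 5 from table 1 based on one sort
and the bottom 5 from table 2 based on another sort,
and union these together.
Solution: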
select * from (
-- top 5 records
-- every non-aggregated column in the select list must appear in the GROUP BY
select top 5 col1, col2, col3
from table1
group by col1, col2, col3
order by col3 desc ) z
union all
select * from (
-- bottom 5 records
select top 5 col1, col2, col3
from table2
group by col1, col2, col3
order by col3 ) z
This was the only way I was able to get around the error, and it worked fine for me.
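The same "top N from one table, bottom N from another" pattern can be sketched in SQLite, which uses LIMIT instead of TOP. The tables, data and N = 3 are hypothetical; since the outer query has no ORDER BY, the result is compared without relying on row order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables: col1 is a label, col3 is the sort column.
con.execute("CREATE TABLE table1 (col1 TEXT, col3 INTEGER)")
con.execute("CREATE TABLE table2 (col1 TEXT, col3 INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?)", [(f"t1_{i}", i) for i in range(10)])
con.executemany("INSERT INTO table2 VALUES (?, ?)", [(f"t2_{i}", i) for i in range(10)])

N = 3  # the answer uses 5
rows = con.execute("""
    SELECT * FROM (SELECT col1, col3 FROM table1 ORDER BY col3 DESC LIMIT ?)  -- top N
    UNION ALL
    SELECT * FROM (SELECT col1, col3 FROM table2 ORDER BY col3 LIMIT ?)       -- bottom N
""", (N, N)).fetchall()
con.close()
print(sorted(rows))
```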
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
ORDER BY 2;
Here 2 is the column position. In Oracle SQL you can ORDER BY the position of the column you want to sort by.
This solved my SELECT statement:
SELECT * FROM
(SELECT id,name FROM TABLE_A
UNION ALL
SELECT id,name FROM TABLE_B ) dum
order by dum.id , dum.name
where id and name are columns available in both tables; substitute your own columns.
Simply use this; no parentheses or anything else are needed:
SELECT *, id AS sort_id FROM TABLE_A
UNION ALL
SELECT *, id AS sort_id FROM TABLE_B
ORDER BY sort_id
An ORDER BY after the last UNION applies to the combined result of both datasets. The combined result takes its column names from the first SELECT, so give the sort column the same alias in every branch.
The solution is shown below:
SELECT *, id AS sameColumn FROM Locations
UNION ALL
SELECT *, id AS sameColumn FROM Cities
ORDER BY sameColumn
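A UNION's result takes its column names from the first SELECT, so an alias defined only in a later branch is not a result column name. This sqlite3 sketch (hypothetical Locations/Cities tables) gives the second branch a different alias and shows that it does not appear in the result, while sorting by the first branch's alias orders rows from both tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables sharing the columns id and name.
con.execute("CREATE TABLE Locations (id INTEGER, name TEXT)")
con.execute("CREATE TABLE Cities (id INTEGER, name TEXT)")
con.executemany("INSERT INTO Locations VALUES (?, ?)", [(3, "loc3"), (1, "loc1")])
con.executemany("INSERT INTO Cities VALUES (?, ?)", [(2, "city2")])

cur = con.execute("""
    SELECT name, id AS sort_id FROM Locations
    UNION ALL
    SELECT name, id AS other_alias FROM Cities   -- not a result column name
    ORDER BY sort_id                             -- sorts rows from both tables
""")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
con.close()
print(column_names)  # ['name', 'sort_id']
print(rows)          # [('loc1', 1), ('city2', 2), ('loc3', 3)]
```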
select CONCAT(Name, '(', substr(occupation, 1, 1), ')') AS f1
from OCCUPATIONS
union
select temp.str AS f1
from (
    select count(occupation) AS counts,
           occupation,
           concat('There are a total of ', count(occupation), ' ', lower(occupation), 's.') AS str
    from OCCUPATIONS
    group by occupation
    order by counts ASC, occupation ASC
) AS temp
order by f1

Union select statements horizontally

Let's say the results of my select statements are as follows (I have 5 of those):
Id Animal AnimalId
1 Dog Dog1
1 Cat Cat57
Id Transport TransportId
2 Car Car100
2 Plane Plane500
I'd like to get a result as follows:
Id  Animal  AnimalId  Transport  TransportId
1   Dog     Dog1
1   Cat     Cat57
2                     Car        Car100
2                     Plane      Plane500
What I can do is create a table variable, specify all possible columns, and insert the records from each select statement into it. But is there a better solution, like PIVOT?
Edit
queries: 1st: Select CategoryId as Id, Animal, AnimalId from Animal
2nd: Select CategoryId as Id, Transport, TransportId from Transport
If you need them in the same rows, how about this? It gets the row_number() for each row and joins on those numbers:
select a.id,
a.aname,
a.aid,
t.tname,
t.tid
from
(
select id, aname, aid, row_number() over(order by aid) rn
from animal
) a
left join
(
select id, tname, tid, row_number() over(order by tid) rn
from transport
) t
on a.rn = t.rn
If you don't need them in the same row, then use UNION ALL:
select id, aname, aid, 'Animal' tbl
from animal
union all
select id, tname, tid, 'Transport'
from transport
Edit #1, here is a version with an UNPIVOT and PIVOT:
select an_id, [aname], [aid], [tname], [tid]
from
(
select *, row_number() over(partition by col order by col) rn
from animal
unpivot
(
value
for col in (aname, aid)
) u
union all
select *, row_number() over(partition by col order by col) rn
from transport
unpivot
(
value
for col in (tname, tid)
) u
) x1
pivot
(
min(value)
for col in([aname], [aid], [tname], [tid])
) p
order by an_id
This would do it for you:
SELECT
ID, field1, field2, '' as field3, '' as field4
FROM sometable
UNION ALL
SELECT
ID, '', '', field3, field4
FROM someothertable
create table Animal (
Animal varchar(50)
,AnimalID varchar(50)
)
create table Transport (
Transport varchar(50)
,TransportID varchar(50)
)
insert into Animal values ('Dog', 'Dog1')
insert into Animal values ('Cat', 'Cat57')
insert into Transport values ('Car', 'Car100')
insert into Transport values ('Plane', 'Plane500')
select ID = 1
,A.Animal
,A.AnimalID
,Transport = ''
,TransportID = ''
from Animal A
union
select ID = 2
,Animal = ''
,AnimalID = ''
,T.Transport
,T.TransportID
from Transport T
To get it in the format you want, select the values you want, and then null (or an empty string) for the other columns.
SELECT
CategoryId as Id,
Animal as 'Animal',
AnimalId as 'AnimalId',
null as 'Transport',
null as 'TransportId'
FROM Animal
UNION
SELECT
CategoryId as Id,
null as 'Animal',
null as 'AnimalId',
Transport as 'Transport',
TransportId as 'TransportId'
FROM Transport
I'm still not sure of the purpose of this, but this should give the output you want.
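Here is a minimal sqlite3 sketch of the padding approach, assuming each query returns CategoryId plus two value columns as in the question's edit (the schemas are hypothetical). It uses UNION ALL, which skips duplicate elimination, pads the missing columns with NULL, and adds an ORDER BY so the output order is predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schemas modelled on the two queries in the question's edit.
con.execute("CREATE TABLE Animal (CategoryId INTEGER, Animal TEXT, AnimalId TEXT)")
con.execute("CREATE TABLE Transport (CategoryId INTEGER, Transport TEXT, TransportId TEXT)")
con.executemany("INSERT INTO Animal VALUES (?, ?, ?)",
                [(1, "Dog", "Dog1"), (1, "Cat", "Cat57")])
con.executemany("INSERT INTO Transport VALUES (?, ?, ?)",
                [(2, "Car", "Car100"), (2, "Plane", "Plane500")])

cur = con.execute("""
    SELECT CategoryId AS Id, Animal, AnimalId,
           NULL AS Transport, NULL AS TransportId   -- pad the columns this query lacks
    FROM Animal
    UNION ALL
    SELECT CategoryId, NULL, NULL, Transport, TransportId
    FROM Transport
    ORDER BY Id, AnimalId, TransportId
""")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
con.close()
print(column_names)
for row in rows:
    print(row)
```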
You shouldn't need to pivot, your results are already fine.
If you want, you can just UNION all 5 statements together in the same format as the first select: ID/Category/CategoryID. Then you'll get one long result set with all 5 sets appended 3 columns wide.
Is that what you want? Or do you need to distinguish between 'categories'?
Given your example, try:
Select CategoryId as Id, Animal, AnimalId from Animal
union all
Select CategoryId as Id, Transport, TransportId from Transport
If you want, you can alias the columns like this:
Select CategoryId as Id, Animal as category, AnimalId as categoryID from Animal
union all
Select CategoryId as Id, Transport, TransportId from Transport
You really don't need to pivot; just space out your columns as you were thinking initially. You don't pivot to move columns, you pivot to perform an aggregate function over grouped data.