Sort items based on their appearance count - SQL

I have data like this
d b c
a d
c b
a b
c a
c a d
c
If you analyse it, the number of appearances of each element is as follows
a: 4
b: 3
c: 5
d: 3
Sorted by appearance count (ties broken alphabetically), the elements would be
c,a,b,d
and final output should be
c b d
a d
c b
a b
c a
c a d
c
Any clue how we can achieve this using a SQL query?

Unless there is another column which dictates the order of the input rows, it will not be possible to guarantee that the output rows are returned in the same order. I've made an assumption here to order them by the three column values so that the result is deterministic.
It is probably possible to compact this code into fewer steps, but this version shows the steps reasonably clearly.
Note that for a large dataset, it may be more efficient to partition some of these steps into SELECT INTO operations creating temporary tables or work tables.
DECLARE @t TABLE
(col1 CHAR(1)
,col2 CHAR(1)
,col3 CHAR(1)
)
INSERT @t
SELECT 'd','b','c'
UNION ALL SELECT 'a','d',NULL
UNION ALL SELECT 'c','b',NULL
UNION ALL SELECT 'a','b',NULL
UNION ALL SELECT 'c','a',NULL
UNION ALL SELECT 'c','a','d'
UNION ALL SELECT 'c',NULL,NULL
-- Unpivot the three columns into a single list of values
;WITH freqCTE
AS
(
SELECT col1 FROM @t WHERE col1 IS NOT NULL
UNION ALL
SELECT col2 FROM @t WHERE col2 IS NOT NULL
UNION ALL
SELECT col3 FROM @t WHERE col3 IS NOT NULL
)
-- Count how often each value appears
,grpCTE
AS
(
SELECT col1 AS val
,COUNT(1) AS cnt
FROM freqCTE
GROUP BY col1
)
-- Number the input rows so they can be reassembled later
,rowNCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (ORDER BY col1
,col2
,col3
) AS rowN
FROM @t
)
,buildCTE
AS
(
SELECT rowN
,val
,cnt
,ROW_NUMBER() OVER (PARTITION BY rowN
ORDER BY ISNULL(cnt,-1) DESC
,ISNULL(val,'z')
) AS colOrd
FROM (
SELECT *
FROM rowNCTE AS t
JOIN grpCTE AS g1
ON g1.val = t.col1
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g2
ON g2.val = t.col2
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g3
ON g3.val = t.col3
) AS x
)
SELECT b1.val AS col1
,b2.val AS col2
,b3.val AS col3
FROM buildCTE AS b1
JOIN buildCTE AS b2
ON b2.rowN = b1.rowN
AND b2.colOrd = 2
JOIN buildCTE AS b3
ON b3.rowN = b1.rowN
AND b3.colOrd = 3
WHERE b1.colOrd = 1
ORDER BY b1.rowN
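As a cross-check of the logic (not a replacement for the query), the same steps can be modelled in a few lines of Python: count every value across all cells, then reorder each row by count descending. This is only a sketch; the function name sort_rows_by_frequency is made up for illustration, and ties are broken by value, mirroring the ISNULL(val,'z') tiebreak in buildCTE.

```python
from collections import Counter

# Sample rows from the question (missing columns are simply omitted).
rows = [
    ["d", "b", "c"],
    ["a", "d"],
    ["c", "b"],
    ["a", "b"],
    ["c", "a"],
    ["c", "a", "d"],
    ["c"],
]


def sort_rows_by_frequency(rows):
    # Count every value across all rows and columns (freqCTE + grpCTE).
    counts = Counter(value for row in rows for value in row)
    # Within each row, order by count descending, then by value (buildCTE).
    return [sorted(row, key=lambda v: (-counts[v], v)) for row in rows]


result = sort_rows_by_frequency(rows)
for row in result:
    print(" ".join(row))
```

Unlike the SQL version, the Python list keeps the input order of the rows, which is exactly the guarantee the table lacks without an ordering column.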

Related

Create multiple columns from existing Hive table columns

How can I create multiple columns from an existing Hive table? The example data would be like below.
My requirement is to create 2 new columns from the existing table only when the condition is met.
col1 when code=1. col2 when code=2.
expected output:
How can I achieve this with a Hive query?
If you aggregate values required into arrays, then you can explode and filter only those with matching positions.
Demo:
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
)
select c1.val as col1, c2.val as col2 from
(
select collect_set(case when code=1 then col else null end) as col1,
collect_set(case when code=2 then col else null end) as col2
from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos, val
lateral view outer posexplode(col2) c2 as pos, val
where c1.pos=c2.pos
Result:
col1 col2
a b
a1 b1
This approach will not work if arrays are of different size.
Another approach: calculate row_number and full join on row_number. This works even if col1 and col2 have different numbers of values (some values will be null):
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),
ordered as
(
select code, col, row_number() over(partition by code order by col) rn
from my_table where code in (1,2)
)
select c1.col as col1, c2.col as col2
from (select * from ordered where code=1) c1
full join
(select * from ordered where code=2) c2 on c1.rn = c2.rn
Result:
col1 col2
a b
a1 b1
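To make the row_number + full join idea concrete, here is a small Python model of it. It is a sketch under the same sample data; pivot_by_code is a hypothetical helper name, and zip_longest plays the role of the full join on rn (the shorter side is padded with None, like NULL).

```python
from itertools import zip_longest

# Same sample data as the stack(...) CTE above: (col, code).
my_table = [
    ("a", 1), ("b", 2), ("c", 3), ("b1", 2),
    ("d", 4), ("c1", 3), ("a1", 1), ("d1", 4),
]


def pivot_by_code(rows, first_code, second_code):
    # Sorting corresponds to row_number() over (partition by code order by col).
    first = sorted(col for col, code in rows if code == first_code)
    second = sorted(col for col, code in rows if code == second_code)
    # Pairing by position corresponds to the full join on rn.
    return list(zip_longest(first, second))


pairs = pivot_by_code(my_table, 1, 2)
for col1, col2 in pairs:
    print(col1, col2)

# With one extra code=1 value the second column is padded with None.
uneven = pivot_by_code(my_table + [("a2", 1)], 1, 2)
```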

Merge three tables in Select query by rule 3, 2, 1 records from each table

Merge three tables in a Select query by rule 3, 2, 1 records from each table as follows:
TableA: ID, FieldA, FieldB, FieldC,....
TableB: ID, FieldA, FieldB, FieldC,....
TableC: ID, FieldA, FieldB, FieldC,....
ID : auto number in each table
FieldA will be unique in all three tables.
I am looking for a Select query to merge three tables as follows:
TOP three records from TableA sorted by ID
TOP two records from TableB sorted by ID
TOP 1 record from TableC sorted by ID
Repeat this until select all records from all three tables.
If some table has fewer records or does not meet the criteria, ignore that and continue with others.
My attempt:
I did it entirely procedurally, using cursors and IF conditions inside a SQL Server stored procedure.
It is slow.
This requires a formula that takes row numbers from each table and transforms it into a series of integers that skips the desired values.
In the query below, I am adding some CTE for the sake of shortening the formula. The real magic is in the UNION. Also, I am adding an additional field for your control. Feel free to get rid of it.
WITH A_Aux as (
SELECT 'A' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableA.*
FROM TableA
), B_Aux AS (
SELECT 'B' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableB.*
FROM TableB
), C_Aux AS (
SELECT 'C' As FromTable, ROW_NUMBER() OVER (Order BY ID) AS RowNum, TableC.*
FROM TableC
)
SELECT *
FROM (
SELECT RowNum+3*FLOOR((RowNum-1)/3) As ColumnForOrder, A_Aux.* FROM A_Aux
UNION ALL
SELECT 3+RowNum+4*FLOOR((RowNum-1)/2), B_Aux.* FROM B_Aux
UNION ALL
SELECT 6*RowNum, C_Aux.* FROM C_Aux
) T
ORDER BY ColumnForOrder
PS: note the pattern Offset + RowNum + (6-N) * Floor((RowNum-1)/N) to group N records together (it of course simplifies a lot for TableC).
PPS: I don't have a SQL server at hand to test it. Let me know if there is a syntax error.
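Since the answer above was posted untested, the ordering formulas can at least be checked outside the database. The following Python sketch applies the same three expressions to row numbers and sorts by the result; order_key and interleave are illustrative names, and the row counts (7, 5, 3) are arbitrary.

```python
def order_key(table, row_num):
    # Same expressions as ColumnForOrder in the query (// is integer division).
    if table == "A":
        return row_num + 3 * ((row_num - 1) // 3)
    if table == "B":
        return 3 + row_num + 4 * ((row_num - 1) // 2)
    if table == "C":
        return 6 * row_num
    raise ValueError(f"unknown table: {table}")


def interleave(count_a, count_b, count_c):
    # Build (key, label) pairs for every row of every table, then sort by key.
    keyed = []
    for table, count in (("A", count_a), ("B", count_b), ("C", count_c)):
        for n in range(1, count + 1):
            keyed.append((order_key(table, n), f"{table}{n}"))
    return [label for _, label in sorted(keyed)]


merged = interleave(7, 5, 3)
print(" ".join(merged))
```

The keys never collide: modulo 6, table A yields 1 to 3, table B yields 4 or 5, and table C yields 0, so the sort order is fully determined.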
You may try this..
GO
select * into #temp1 from (select * from table1) as t1
select * into #temp2 from (select * from table2) as t2
select * into #temp3 from (select * from table3) as t3
select * into #final from (select col1, col2, col3 from #temp1 where 1=0) as tb
-- Keep taking 3, 2 and 1 rows per round until all three temp tables are empty
while exists (select 1 from #temp1)
or exists (select 1 from #temp2)
or exists (select 1 from #temp3)
Begin
insert into #final ( col1 , col2, col3 )
select col1, col2, col3 from (select top 3 * from #temp1 order by id) as a
union all
select col1, col2, col3 from (select top 2 * from #temp2 order by id) as b
union all
select col1, col2, col3 from (select top 1 * from #temp3 order by id) as c
-- Remove the rows that were just copied
delete from #temp1 where id in (select top 3 id from #temp1 order by id)
delete from #temp2 where id in (select top 2 id from #temp2 order by id)
delete from #temp3 where id in (select top 1 id from #temp3 order by id)
End
End
Select * from #final
Drop table #temp1
Drop table #temp2
Drop table #temp3
GO
First create a temp table for each of the 3 tables; after each insert, delete the inserted records. This should give you the desired result, if I have not missed anything.
Please check whether this works.
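There is not a lot of information to go on here, but I assume you can use UNION ALL to combine multiple statements. Note that this returns only the first round (3, 2, 1); it does not repeat the pattern.
SELECT * FROM (SELECT * FROM TableA ORDER BY ID OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY) AS a
UNION ALL
SELECT * FROM (SELECT * FROM TableB ORDER BY ID OFFSET 0 ROWS FETCH NEXT 2 ROWS ONLY) AS b
UNION ALL
SELECT * FROM (SELECT * FROM TableC ORDER BY ID OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY) AS c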
Execute and see if this works.
/AF
Based on my understanding, I created three temp tables: #ta, #tb and #tc.
select * into #ta from (
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
) a
select * into #tb from (
select 'B' b
union all
select 'B'
union all
select 'B'
union all
select 'B'
union all
select 'B'
) b
select * into #tc from (
select 'C' c
union all
select 'C'
union all
select 'C'
union all
select 'C'
union all
select 'C'
) c
If these tables match your tables, the output looks like A,A,A,B,B,C,A,A,A,B,B,C,A,B,C,C,C
T-SQL
declare @TAC int = (select count (*) from #ta) -- Table A count = 7
declare @TBC int = (select count (*) from #tb) -- Table B count = 5
declare @TAR int = @TAC % 3 -- Table A remainder = 1
declare @TBR int = @TBC % 2 -- Table B remainder = 1
declare @TAQ int = (@TAC - @TAR) / 3 -- Table A quotient = (7 - 1) / 3 = 2; this is passed to NTILE,
-- so the rows are split into two groups (111), (222)
declare @TBQ int = (@TBC - @TBR) / 2 -- Table B quotient = (5 - 1) / 2 = 2; this is passed to NTILE,
-- so the rows are split into two groups (11), (22)
select * from (
select *, NTILE (@TAQ) over ( order by a) FirstOrder, 1 SecondOrder from (
select top (@TAC - @TAR) * from #ta order by a
) ta -- 6 rows are obtained out of 7.
union all
select *, @TAQ + 1, 1 from (
select top (@TAR) * from #ta order by a desc
) ta -- The remaining row is obtained. ORDER BY ... DESC is required.
-- Here FirstOrder is one more than the last NTILE group.
union all
select *, NTILE (@TBQ) over ( order by b), 2 from (
select top (@TBC - @TBR) * from #tb order by b
) tb
union all
select *, @TBQ + 1, 2 from (
select top (@TBR) * from #tb order by b desc
) tb
union all
select *, ROW_NUMBER () over (order by c), 3 from #tc
) abc order by FirstOrder, SecondOrder
Let me explain the T-SQL:
Before that, for reference: NTILE and ROW_NUMBER.
Get the counts.
Find the quotient, which is passed to the NTILE function.
Order by the NTILE value and then by the static SecondOrder value.
Note:
I am using SQL Server 2017.
If the T-SQL works for you, change the column in the ORDER BY to <yourcolumn>.

Combine row pairs with swapped columns

I have a table that contains an even number of rows.
The rows form pairs with the same information but the content
of the first two columns swapped. Here is an example with three columns:
1 2 3
=======
A B W
B A W
C D X
D C X
E F Y
H G Z
F E Y
G H Z
My actual table has many more columns, but the content is always the same
within a pair.
I'm looking for an SQL-Statement that gets rid of one row of each pair.
The result should look like this:
1 2 3
=======
A B W
C D X
E F Y
G H Z
My table is generated by a script (which I can't change), so I assume
my input is correct (every row has a partner, rows >=3 are the same for each pair).
A statement that could check these preconditions would be extra cool.
For me, the code below works:
DECLARE @TEST TABLE
(A CHAR(1),B CHAR(1),C CHAR(1))
INSERT INTO @TEST VALUES
('A','B','W'),
('B','A','W'),
('C','D','X'),
('D','C','X'),
('E','F','Y'),
('H','G','Z'),
('F','E','Y'),
('G','H','Z')
SELECT MIN(A) [1],
MAX(A) [2],
C [3]
FROM @TEST
GROUP BY C
Result:
Something similar using ROW_NUMBER and a CTE would help (ROWNUM is assigned before ORDER BY is applied, so ROW_NUMBER is used to number the rows in sorted order):
with test_Data as
(
SELECT COL1, COL2, COL3, ROW_NUMBER() OVER (ORDER BY COL3, COL1) ROWCOUNT FROM
(
SELECT 'A' COL1, 'B' COL2, 'W' COL3 FROM DUAL UNION
SELECT 'B' COL1, 'A' COL2, 'W' COL3 FROM DUAL UNION
SELECT 'C' COL1, 'D' COL2, 'X' COL3 FROM DUAL UNION
SELECT 'D' COL1, 'C' COL2, 'X' COL3 FROM DUAL
)
)
SELECT TAB1.COL1, TAB1.COL2, TAB1.COL3 FROM TEST_DATA TAB1, TEST_DATA TAB2
WHERE
TAB1.COL1 = TAB2.COL2
AND TAB1.COL2 = TAB2.COL1
AND TAB1.COL3 = TAB2.COL3
AND TAB1.ROWCOUNT = TAB2.ROWCOUNT+1;
Your query without test data would be:
with CTE as
(SELECT COL1, COL2, COL3, ROW_NUMBER() OVER (ORDER BY COL3, COL1) ROWCOUNT FROM MY_TABLE)
SELECT TAB1.COL1, TAB1.COL2, TAB1.COL3 FROM CTE TAB1, CTE TAB2
WHERE
TAB1.COL1 = TAB2.COL2
AND TAB1.COL2 = TAB2.COL1
AND TAB1.COL3 = TAB2.COL3
AND TAB1.ROWCOUNT = TAB2.ROWCOUNT+1;
You didn't state your DBMS so this is ANSI SQL:
select least(c1,c2),
greatest(c1,c2),
min(c3) -- choose min or max to your liking
from the_table
group by least(c1,c2), greatest(c1,c2)
If every row has a counterpart where c1 and c2 are swapped then just select rows where c1 and c2 are in a certain order (i.e. c1 < c2).
The EXISTS part makes sure that only rows that have a counterpart are shown. If you want to show all unique rows regardless of whether or not they have a counterpart, then change the last condition from AND EXISTS to OR NOT EXISTS.
SELECT * FROM myTable t1
WHERE c1 < c2
AND EXISTS (
SELECT * FROM myTable t2
WHERE t2.c1 = t1.c2
AND t2.c2 = t1.c1
AND t2.c3 = t1.c3
) ORDER BY c1
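The question also asks for a way to check the preconditions (every row has a swapped partner with identical remaining columns). The logic of such a check, together with the c1 < c2 filter, can be sketched in Python as follows. The names unpaired_rows and one_per_pair are hypothetical, and the sketch assumes that a row with c1 = c2 counts as a violation, since it cannot have a distinct partner.

```python
from collections import Counter

# Sample table from the question: (col1, col2, col3).
table = [
    ("A", "B", "W"), ("B", "A", "W"),
    ("C", "D", "X"), ("D", "C", "X"),
    ("E", "F", "Y"), ("H", "G", "Z"),
    ("F", "E", "Y"), ("G", "H", "Z"),
]


def unpaired_rows(rows):
    # A row passes if its swapped twin occurs exactly as often as the row itself.
    counts = Counter(rows)
    return [
        row for row in rows
        if row[0] == row[1]
        or counts[(row[1], row[0]) + row[2:]] != counts[row]
    ]


def one_per_pair(rows):
    # Keep only the member of each pair whose first column sorts lower.
    return sorted(row for row in set(rows) if row[0] < row[1])


violations = unpaired_rows(table)
deduplicated = one_per_pair(table)
print(violations)
print(deduplicated)
```

An empty violations list means the preconditions hold; in SQL the same check corresponds to the NOT EXISTS variant of the query above.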

row_number or rank similar data

I have a table like this:
col1 col2
A A
A A
A F
B B
B B
B H
C L
A A
A A
A A
A E
C C
C C
C C
C C
C C
C J
And I want result like this:
col1 count
A 3
B 3
C 1
A 4
C 6
If col1 <> col2, the count should reset. I only want SQL code, not PL/SQL etc.
Maybe something like row_number() over(RESET WHEN col1<>col2).
Please help me.
OK friends, thank you. Sorry for my bad English.
In fact my table like this :
id col1 col2
1000 A A
2000 A A
3000 A F
4000 B B
5000 B B
6000 B H
7000 C L
8000 A A
9000 A A
10000 A A
11000 A E
12000 C C
13000 C C
14000 C C
15000 C C
16000 C C
17000 C J
The id column is unique and its values are always in order. Maybe this will help to solve the problem. Sorry for leaving this information out. I want a solution like the one above.
I only want col1 and count. col1 should not be unique; the count must run 1, 2, 3, ... until col1 <> col2.
After that row, the count must reset.
First, I'd like to note that without an ORDER BY clause, you cannot guarantee the order of the results. To do this sort of calculation, it would be useful to have an identity (auto-incrementing) field to establish an order.
That said, you can attempt to use ROW_NUMBER() to create a field to order on.
with yourtablewithrn as (
select col1, col2, row_number() over (order by (select null)) rn
from yourtable
),
yourtablegrouped as (
select *,
rn - row_number() over (partition by col1 order by rn) as grp
from yourtablewithrn
)
select col1,
count(col2) AS cnt
from yourtablegrouped
group by col1, grp
order by min(rn)
SQL Fiddle Demo
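The grouping trick used here (overall row number minus per-value row number stays constant within a run of equal values) can be traced step by step in Python. This is only an illustrative sketch: count_runs is a made-up name, and it assumes the rows are ordered by the id column from the question's edit rather than by (select null).

```python
# (id, col1) pairs from the question's edited sample data.
rows = [
    (1000, "A"), (2000, "A"), (3000, "A"),
    (4000, "B"), (5000, "B"), (6000, "B"),
    (7000, "C"),
    (8000, "A"), (9000, "A"), (10000, "A"), (11000, "A"),
    (12000, "C"), (13000, "C"), (14000, "C"),
    (15000, "C"), (16000, "C"), (17000, "C"),
]


def count_runs(rows):
    seen = {}    # per-value running count: row_number() over (partition by col1)
    groups = {}  # (col1, grp) -> [first overall row number, row count]
    for rn, (_, col1) in enumerate(sorted(rows), start=1):
        seen[col1] = seen.get(col1, 0) + 1
        grp = rn - seen[col1]  # constant within one consecutive run
        entry = groups.setdefault((col1, grp), [rn, 0])
        entry[1] += 1
    # Equivalent of: group by col1, grp order by min(rn)
    ordered = sorted(groups.items(), key=lambda item: item[1][0])
    return [(col1, cnt) for (col1, _), (_, cnt) in ordered]


runs = count_runs(rows)
for col1, cnt in runs:
    print(col1, cnt)
```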
As mentioned above, I agree with sgeddes that we need some kind of order that we can rely on for this kind of problem. row_number() over () won't do, since it is more or less a random number:
create table yourtable
( n int
, col1 varchar(1)
, col2 varchar(1));
insert into yourtable values
(1,'A','A'),
(2,'A','A'),
(3,'A','F'),
(4,'B','B'),
(5,'B','B'),
(6,'B','H'),
(7,'C','L'),
(8,'A','A'),
(9,'A','A'),
(10,'A','A'),
(11,'A','E'),
(12,'C','C'),
(13,'C','C'),
(14,'C','C'),
(15,'C','C'),
(16,'C','C'),
(17,'C','J');
For this sample data col2 has no impact. We could do (a slight variation of sgeddes solution):
select col1, count(1)
from (
select n, col1
, col2
, row_number() over (order by n)
- row_number() over (partition by col1
order by n) as grp
from yourtable
) t
group by col1, grp
order by min(n)
But what should the result be for a sample like the one below?
delete from yourtable;
insert into yourtable values
(1,'A','A'),
(2,'A','A'),
(3,'A','F'),
(4,'A','A'),
(5,'A','G');

Find duplicate symmetric rows in a table

I have a table which contains data as
col1 col2
a b
b a
c d
d c
a d
a c
For me row 1 and row 2 are duplicate because a, b & b, a are the same. The same stands for row 3 and row 4.
I need an SQL (not PL/SQL) query which gives output as
col1 col2
a b
c d
a d
a c
select distinct least(col1, col2), greatest(col1, col2)
from your_table
Edit: for those using a DBMS that does not support the standard SQL functions least and greatest, this can be simulated using CASE expressions:
select distinct
case
when col1 < col2 then col1
else col2
end as least_col,
case
when col1 > col2 then col1
else col2
end as greatest_col
from your_table
Try this:
CREATE TABLE t_1(col1 varchar(10),col2 varchar(10))
INSERT INTO t_1
VALUES ('a','b'),
('b','a'),
('c','d'),
('d','c'),
('a','d'),
('a','c')
;with CTE as (select ROW_NUMBER() over (order by (select 0)) as id,col1,col2,col1+col2 as col3 from t_1)
,CTE1 as (
select id,col1,col2,col3 from CTE where id=1
union all
select c.id,c.col1,c.col2,CASE when c.col3=REVERSE(c1.col3) then null else c.col3 end from CTE c inner join CTE1 c1
on c.id-1=c1.id
)
select col1,col2 from CTE1 where col3 is not null