merging rows into 1 column - sql

I have a table with 2 columns
Input
Col 1 ---- Col 2
1 ---- aaaa
1 ---- bbbb
1 ---- cccc
2 ---- dddd
2 ---- eeee
2 ---- ffff
2 ---- gggg
Output
Col 1 ---- Col 2
1 ---- aaaabbbbcccc
2 ---- ddddeeeeffffgggg
I was thinking of doing several self joins, but that doesn't seem efficient. Any ideas on how the SQL should be written?

Ok, I'll bite. Instead of stragg, try listagg (in 11.2):
create table tst1
(
pid number,
val varchar2(10)
);
insert into tst1 values(1, 'Rec1');
insert into tst1 values(1, 'Rec2');
insert into tst1 values(1, 'Rec3');
insert into tst1 values(2, 'Rec1');
insert into tst1 values(2, 'Rec2');
commit;
select pid, listagg(val, ':') within group(order by val) as "The List"
from tst1
group by pid;
And you get:
pid The List
1 Rec1:Rec2:Rec3
2 Rec1:Rec2
If you change the order by to "order by val desc" you'd get
pid The List
1 Rec3:Rec2:Rec1
2 Rec2:Rec1
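For readers without Oracle 11.2 at hand, the same idea can be tried in SQLite through Python's standard sqlite3 module. This is only a sketch under that assumption: it swaps listagg for SQLite's group_concat, which does not take a WITHIN GROUP clause and documents the order of the joined values as arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tst1 (pid INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO tst1 VALUES (?, ?)",
    [(1, "Rec1"), (1, "Rec2"), (1, "Rec3"), (2, "Rec1"), (2, "Rec2")],
)

# group_concat(X, Y) joins the non-NULL values of X in each group,
# using Y as the separator. The order of the joined values is not
# guaranteed, unlike listagg ... within group (order by ...).
rows = conn.execute(
    "SELECT pid, group_concat(val, ':') FROM tst1 GROUP BY pid ORDER BY pid"
).fetchall()

for pid, the_list in rows:
    print(pid, the_list)
```

If a fixed order matters, sorting the pieces on the client side after splitting on the separator is one simple workaround.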

This is a version that will work in Oracle 9i and up.
create table foo (
key_column number,
val_column varchar2(4)
);
insert into foo values (1, 'aaaa');
insert into foo values (1, 'bbbb');
insert into foo values (1, 'cccc');
insert into foo values (2, 'dddd');
insert into foo values (2, 'eeee');
insert into foo values (2, 'ffff');
insert into foo values (2, 'gggg');
select key_column
, replace(max(sys_connect_by_path(val_column, ',')), ',') combined
from (select key_column
, val_column
, row_number() over (partition by key_column order by val_column) cur
, row_number() over (partition by key_column order by val_column) - 1 prev
from foo) foo
group by key_column
connect by prior cur = prev and prior key_column = key_column
start with cur = 1;
key_column | combined
--------------------------
1 | aaaabbbbcccc
2 | ddddeeeeffffgggg

Related

SQL Server 2008 Select all columns of a table + a column from the same table based on a column value

I have a table VERSION with composite keys docId, verId:
docId verId apprvDt old_verId
------ ------ ----------- ----------
A 3 03/20/2017 2
A 2 03/18/2017 1
A 1 03/16/2017 null
B 1 03/18/2017 null
C 2 03/20/2017 1
C 1 03/16/2017 null
Say I select docId=A, verId=3, query should return
docId verId apprvDt old_verId old_apprvDt
------ ------ ----------- ---------- ------------
A 3 03/20/2017 2 03/18/2017
that is to retrieve the apprvDt of the old_verId.
I tried like this
select a.docId, a.verId, a.apprvDt, a.old_verId, b.old_apprvDt
from VERSION as a left join
(select x.docId, x.verId, x.apprvDt as old_apprvDt from REVISN as x
where x.docId = 'A' and x.verId = a.old_verId) as b
on b.docId = a.docId and b.verId = a.old_verId
but I am getting a multi-part binding error.
I want to select a row from VERSION including the apprvDt (old_apprvDt) of old_verId
The solution to your problem is given below:
DECLARE @tbl TABLE(docId varchar(100), verId int, apprvDt datetime, old_verId int)
insert into @tbl values('A', 3 , '03/20/2017', 2)
insert into @tbl values('A', 2 , '03/18/2017', 1)
insert into @tbl values('A', 1 , '03/16/2017', NULL)
insert into @tbl values('B', 1 , '03/18/2017', NULL)
insert into @tbl values('C', 2 , '03/20/2017', 1)
insert into @tbl values('C', 1 , '03/16/2017', NULL)
select * from @tbl
;with data_table
as
(
select docId,verId,apprvDt,old_verId,
(select apprvDt from @tbl T2 where T2.docId=t1.docid and T2.verId=t1.old_verId)
old_apprvDt from @tbl t1
)
select * from data_table where docId='A' and verId=3
Result is as below :
-------------------------------------------------------------------------------------
docId verId apprvDt old_verId old_apprvDt
-------------------------------------------------------------------------------------
A 3 2017-03-20 00:00:00.000 2 2017-03-18 00:00:00.000
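Alternatively, with APPLY (table1 stands for the VERSION table):
select t1.*,
b.apprvDt as old_apprvDt
from table1 t1
cross apply
(
-- look up the row whose verId is this row's old_verId
select t2.apprvDt from table1 t2 where t2.docId = t1.docId
and t2.verId = t1.old_verId ) b
-- use outer apply instead to keep rows that have no old version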
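The lookup of the old version's approval date can also be written as a plain left self-join, which avoids referencing the outer alias inside a derived table. The sketch below runs it in SQLite through Python's sqlite3 module rather than SQL Server; the table name version_tbl is a stand-in for VERSION and is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "version_tbl" is a hypothetical stand-in for the VERSION table.
conn.execute(
    "CREATE TABLE version_tbl "
    "(docId TEXT, verId INTEGER, apprvDt TEXT, old_verId INTEGER)"
)
conn.executemany(
    "INSERT INTO version_tbl VALUES (?, ?, ?, ?)",
    [
        ("A", 3, "03/20/2017", 2),
        ("A", 2, "03/18/2017", 1),
        ("A", 1, "03/16/2017", None),
        ("B", 1, "03/18/2017", None),
        ("C", 2, "03/20/2017", 1),
        ("C", 1, "03/16/2017", None),
    ],
)

# Join each row to the row of the same document whose verId equals
# this row's old_verId. LEFT JOIN keeps rows with no old version.
query = """
SELECT cur.docId, cur.verId, cur.apprvDt, cur.old_verId,
       prev.apprvDt AS old_apprvDt
FROM version_tbl AS cur
LEFT JOIN version_tbl AS prev
       ON prev.docId = cur.docId
      AND prev.verId = cur.old_verId
WHERE cur.docId = ? AND cur.verId = ?
"""

row = conn.execute(query, ("A", 3)).fetchone()
first_version = conn.execute(query, ("A", 1)).fetchone()
print(row)
print(first_version)
```

For the first version of a document, old_verId is NULL, so the joined date comes back as NULL (None in Python) instead of the row disappearing.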

Don't select duplicates based on a column in a union query

I have a query which does a UNION:
select deal_id, codptf from acc_deals where run_id = 1
union
select deal_id, codptf from acc_deals where run_id = 2
However, I end up with this result:
AAAA;1234
AAAA;3456
BBBB;4569
There is a duplicate: rows 1 and 2 have the same deal_id.
How can I exclude one of the duplicate rows?
It looks like you need a full outer join here:
create table test(id varchar2(10), val int, run_id int);
insert into test values('AAAA', 1234, 1);
insert into test values('BBBB', 4569, 1);
insert into test values('AAAA', 3456, 2);
insert into test values('CCCC', 1111, 2);
select
nvl(t1.id, t2.id) as id, nvl(t1.val, t2.val) as val
from
(select * from test where run_id = 1) t1
full join
(select * from test where run_id = 2) t2
on t1.id = t2.id
ID VAL
1 AAAA 1234
2 CCCC 1111
3 BBBB 4569
As an alternative to @are's solution, you could do a group by instead:
create table test(id varchar2(10), val int, run_id int);
insert into test values('AAAA', 1234, 1);
insert into test values('BBBB', 4569, 1);
insert into test values('AAAA', 3456, 2);
insert into test values('CCCC', 1111, 2);
insert into test values('DDDD', 8958, 1);
insert into test values('DDDD', 2345, 2);
commit;
select id,
max(val) max_val,
min(val) min_val,
max(val) keep (dense_rank first order by run_id) val_from_first_run_id,
max(val) keep (dense_rank last order by run_id) val_from_last_run_id
from test
where run_id in (1, 2)
group by id;
ID MAX_VAL MIN_VAL VAL_FROM_FIRST_RUN_ID VAL_FROM_LAST_RUN_ID
---------- ---------- ---------- --------------------- --------------------
AAAA 3456 1234 1234 3456
BBBB 4569 4569 4569 4569
CCCC 1111 1111 1111 1111
DDDD 8958 2345 8958 2345
You'll note that there are several columns there - I've simply listed out a few of the options depending on which value you wanted to keep. It's up to you to decide which of the columns you're going to use, since you didn't specify which value you wanted displayed in the event of duplicates.
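If the rule is simply "keep the row from run 1 and fall back to run 2 only when run 1 has no row for that id", a UNION ALL with a NOT EXISTS filter is another way to express it. The sketch below is an assumption about the desired rule, not something the question states, and it runs in SQLite through Python's sqlite3 module rather than Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, val INTEGER, run_id INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("AAAA", 1234, 1), ("BBBB", 4569, 1), ("AAAA", 3456, 2), ("CCCC", 1111, 2)],
)

# Take every row of run 1, then add only those rows of run 2 whose id
# does not already appear in run 1.
rows = conn.execute(
    """
    SELECT id, val FROM test WHERE run_id = 1
    UNION ALL
    SELECT t2.id, t2.val
    FROM test AS t2
    WHERE t2.run_id = 2
      AND NOT EXISTS (
          SELECT 1 FROM test AS t1
          WHERE t1.run_id = 1 AND t1.id = t2.id
      )
    ORDER BY 1
    """
).fetchall()

print(rows)
```

This returns the same three ids as the full outer join above, with the value from run 1 winning for AAAA.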

How can I remove rows from several tables with SQL?

I have 3 tables:
mainTable:
id | uniqField
--------------
1 | 1111
2 | 1111
3 | 2222
4 | 2222
5 | 3333
6 | 4444
table1:
id|name|deleted
----------
1|Mike|0
2|Mike|0
5|John|0
table2:
id|name|deleted
----------
3|Peke|0
4|Peke|0
6|Vels|0
These tables are linked by the id field.
Now I want to remove duplicates from these tables. For mainTable I can use:
DELETE mainTable
FROM mainTable
LEFT OUTER JOIN (
SELECT MIN(id) as RowId, uniqField
FROM mainTable
GROUP BY uniqField
) as KeepRows ON
mainTable.id= KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
But in table1 and table2 I want to change the deleted field from 0 to 1 for the duplicates. I mean:
1 | 1111
2 | 1111 -> duplicate->remove
1|Mike|0
2|Mike|0->duplicate-> 0->1
Try this one -
SET NOCOUNT ON;
DECLARE @mainTable TABLE
(
id INT,
uniqField INT
)
INSERT INTO @mainTable (id, uniqField)
VALUES (1, 1111), (2, 1111), (3, 2222), (4, 2222), (5, 3333), (6, 4444)
DECLARE @deleted TABLE (id INT)
-- ORDER BY id keeps the lowest id of each uniqField group
;WITH cte AS
(
SELECT *, RowNum = ROW_NUMBER() OVER (PARTITION BY uniqField ORDER BY id)
FROM @mainTable
)
-- delete the duplicates and remember their ids
DELETE FROM cte
OUTPUT DELETED.id INTO @deleted
WHERE RowNum > 1
DECLARE @table1 TABLE
(
id INT,
uniqField VARCHAR(10),
deleted BIT
)
DECLARE @table2 TABLE
(
id INT,
uniqField VARCHAR(10),
deleted BIT
)
INSERT INTO @table1 (id, uniqField, deleted)
VALUES (1, 'Mike', 0), (2, 'Mike', 0), (5, 'John', 0)
INSERT INTO @table2 (id, uniqField, deleted)
VALUES (3, 'Peke', 0), (4, 'Peke', 0), (6, 'Vels', 0)
-- flag the rows whose ids were removed from @mainTable
UPDATE t
SET deleted = 1
FROM @table1 t
JOIN @deleted d ON d.id = t.id
UPDATE t
SET deleted = 1
FROM @table2 t
JOIN @deleted d ON d.id = t.id
SELECT * FROM @table1
SELECT * FROM @table2
Output -
id uniqField deleted
----------- ---------- -------
1 Mike 0
2 Mike 1
5 John 0
id uniqField deleted
----------- ---------- -------
3 Peke 0
4 Peke 1
6 Vels 0
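The same two-step idea (collect the ids of the duplicates, flag them in the child tables, then delete them from the main table) can be tried outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module, so it has no OUTPUT clause; the duplicate ids are collected with a separate SELECT instead, which is a substitution made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mainTable (id INTEGER PRIMARY KEY, uniqField INTEGER);
    CREATE TABLE table1 (id INTEGER, name TEXT, deleted INTEGER);
    CREATE TABLE table2 (id INTEGER, name TEXT, deleted INTEGER);
    INSERT INTO mainTable VALUES
        (1, 1111), (2, 1111), (3, 2222), (4, 2222), (5, 3333), (6, 4444);
    INSERT INTO table1 VALUES (1, 'Mike', 0), (2, 'Mike', 0), (5, 'John', 0);
    INSERT INTO table2 VALUES (3, 'Peke', 0), (4, 'Peke', 0), (6, 'Vels', 0);
    """
)

# Duplicates are all rows that are not the lowest id for their uniqField.
dup_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT id FROM mainTable
        WHERE id NOT IN (SELECT MIN(id) FROM mainTable GROUP BY uniqField)
        ORDER BY id
        """
    )
]

# One transaction: flag the child rows first, then delete the parents.
with conn:
    params = [(i,) for i in dup_ids]
    conn.executemany("UPDATE table1 SET deleted = 1 WHERE id = ?", params)
    conn.executemany("UPDATE table2 SET deleted = 1 WHERE id = ?", params)
    conn.executemany("DELETE FROM mainTable WHERE id = ?", params)

table1 = conn.execute("SELECT id, name, deleted FROM table1 ORDER BY id").fetchall()
table2 = conn.execute("SELECT id, name, deleted FROM table2 ORDER BY id").fetchall()
remaining = [r[0] for r in conn.execute("SELECT id FROM mainTable ORDER BY id")]
print(dup_ids, table1, table2, remaining)
```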

Select 30% of each column value

Let's assume we have a table with a column 'A' that has values from 0 to N. I want to select 30% of the rows for each distinct value of column 'A'.
So if I have this:
A| B
-------
0 hello
0 test
0 hi
1 blah1
1 blah2
1 blah3
1 blah4
1 blah5
1 blah6
Result:
A| B
-------
0 hello
1 blah1
1 blah4
It could be blah1 or any other blah that is not blah4, and blah4 could be any other blah that is not blah1; basically the selection can be random or done by skipping rows.
By the way, the actual table is huge (terabytes), so performance matters.
Try something like this:
DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
FROM @YourTable
)
, GroupCounts AS
( SELECT
A,MAX(RowNumber) AS MaxA
FROM NumberedRows
GROUP BY A
)
SELECT
n.a,n.b
FROM NumberedRows n
INNER JOIN GroupCounts c ON n.A=c.A
WHERE n.RowNumber<=(c.MaxA+1)*0.3
OUTPUT:
a b
----------- ----------
0 hello
1 blah1
1 blah2
(3 row(s) affected)
EDIT based on the great idea in the comment from Andriy M
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
,COUNT(*) OVER (PARTITION BY A) AS TotalOf
FROM @YourTable
)
SELECT
n.a,n.b
FROM NumberedRows n
WHERE n.RowNumber<=(n.TotalOf+1)*0.3
ORDER BY A
OUTPUT:
a b
----------- ----------
0 hello
1 blah1
1 blah2
(3 row(s) affected)
EDIT: here are "random" rows, using Andriy M's idea:
DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY newid()) AS RowNumber
FROM @YourTable
)
, GroupCounts AS (SELECT A,COUNT(A) AS MaxA FROM NumberedRows GROUP BY A)
SELECT
n.A,n.B
FROM NumberedRows n
INNER JOIN GroupCounts c ON n.A=c.A
WHERE n.RowNumber<=(c.MaxA+1)*0.3
ORDER BY n.A
OUTPUT:
a b
----------- ----------
0 hi
1 blah3
1 blah6
(3 row(s) affected)
This uses only one subquery, and thus a single pass through your set.
SELECT a
, b
FROM
(
SELECT A
, b
, ROW_NUMBER()
OVER( PARTITION BY A
ORDER BY b
) r
, COUNT(b)
OVER( PARTITION BY A
) ct
FROM @YourTable
) n
WHERE n.r <= n.ct * 0.3
So does this, although it always returns the top 3 rows when a group has fewer than 10, and any "extras" are assigned to the first bins:
SELECT A
, b
FROM
(
SELECT A
, b
, NTILE(10)
OVER( PARTITION BY a
ORDER BY b
) tens
FROM @YourTable
) n
WHERE tens <= 3;
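To experiment with the single-pass window-function approach without a SQL Server instance, here is a small sketch that runs the same pattern in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions), and the table name your_table is a stand-in for the table variable used above. The (ct + 1) * 0.3 cutoff is the one from the first answer, so small groups still return at least one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "your_table" is a hypothetical stand-in for the table in the question.
conn.execute("CREATE TABLE your_table (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        (0, "hello"), (0, "test"), (0, "hi"),
        (1, "blah1"), (1, "blah2"), (1, "blah3"),
        (1, "blah4"), (1, "blah5"), (1, "blah6"),
    ],
)

# Number the rows inside each group and attach the group size to every
# row, then keep the rows whose number falls within the 30% cutoff.
rows = conn.execute(
    """
    SELECT a, b
    FROM (
        SELECT a, b,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS r,
               COUNT(*)     OVER (PARTITION BY a)            AS ct
        FROM your_table
    ) AS n
    WHERE r <= (ct + 1) * 0.3
    ORDER BY a, b
    """
).fetchall()

print(rows)
```

Ordering by b makes the pick deterministic; ordering by a random expression instead would give a random sample per group, at the cost of a sort.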

Stick two tables together

Let's consider two tables:
First:
Id Data
1 asd
2 buu
And Second:
UPD:
Id Data
10 ffu
11 fffuuu
10001 asd
I want to get a 4-column table looking like this:
Id1 Data1 Id2 Data2
1 asd 10 ffu
2 buu 11 fffuuu
-1 [any text] 10001 asd
(If the numbers of rows are not equal, let's use "-1" for the id.)
How can I do this?
I'm using sqlite3-3.7.3.
UPD2:
There is no matching criterion between the tables; any random matching between them is sufficient for me.
Assuming that the id columns are unique and not null, you can "zip" your tables by:
1. Creating a row number for each row that corresponds to the position of the row when the table is ordered by the unique id (as polishchuk mentioned in his comment); and,
2. Simulating a FULL OUTER JOIN with 2 LEFT OUTER JOINs.
To demonstrate, I used two tables with differing row counts:
CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT);
INSERT INTO foo VALUES (NULL, 'a');
INSERT INTO foo VALUES (NULL, 'b');
INSERT INTO foo VALUES (NULL, 'c');
INSERT INTO foo VALUES (NULL, 'd');
INSERT INTO foo VALUES (NULL, 'e');
INSERT INTO foo VALUES (NULL, 'f');
INSERT INTO foo VALUES (NULL, 'g');
INSERT INTO foo VALUES (NULL, 'h');
INSERT INTO foo VALUES (NULL, 'i');
INSERT INTO foo VALUES (NULL, 'j');
DELETE FROM foo WHERE data IN ('b', 'd', 'f', 'i');
CREATE TABLE bar (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT);
INSERT INTO bar VALUES (NULL, 'a');
INSERT INTO bar VALUES (NULL, 'b');
INSERT INTO bar VALUES (NULL, 'c');
INSERT INTO bar VALUES (NULL, 'd');
INSERT INTO bar VALUES (NULL, 'e');
INSERT INTO bar VALUES (NULL, 'f');
INSERT INTO bar VALUES (NULL, 'g');
INSERT INTO bar VALUES (NULL, 'h');
INSERT INTO bar VALUES (NULL, 'i');
INSERT INTO bar VALUES (NULL, 'j');
DELETE FROM bar WHERE data IN ('a', 'b');
To obtain a more readable output, I then ran:
.headers on
.mode column
Then you can execute this SQL statement:
SELECT COALESCE(id1, -1) AS id1, data1,
COALESCE(id2, -1) AS id2, data2
FROM (
SELECT ltable.rnum AS rnum,
ltable.id AS id1, ltable.data AS data1,
rtable.id AS id2, rtable.data AS data2
FROM
(SELECT (SELECT COUNT(*) FROM foo
WHERE id <= T1.id) rnum, id, data FROM foo T1
) ltable
LEFT OUTER JOIN
(SELECT (SELECT COUNT(*) FROM bar
WHERE id <= T1.id) rnum, id, data FROM bar T1
) rtable
ON ltable.rnum=rtable.rnum
UNION
SELECT rtable.rnum AS rnum,
ltable.id AS id1, ltable.data AS data1,
rtable.id AS id2, rtable.data AS data2
FROM
(SELECT (SELECT COUNT(*) FROM bar
WHERE id <= T1.id) rnum, id, data FROM bar T1
) rtable
LEFT OUTER JOIN
(SELECT (SELECT COUNT(*) FROM foo
WHERE id <= T1.id) rnum, id, data FROM foo T1
) ltable
ON ltable.rnum=rtable.rnum)
ORDER BY rnum
Which gives you:
id1         data1       id2         data2
----------  ----------  ----------  ----------
1           a           3           c
3           c           4           d
5           e           5           e
7           g           6           f
8           h           7           g
10          j           8           h
-1                      9           i
-1                      10          j
This works "both ways", for example, if you invert the two tables (foo and bar), you get:
id1         data1       id2         data2
----------  ----------  ----------  ----------
3           c           1           a
4           d           3           c
5           e           5           e
6           f           7           g
7           g           8           h
8           h           10          j
9           i           -1
10          j           -1
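Since any pairing is acceptable, the zipping can also be done on the client after two ordinary ordered queries. The sketch below is one way to do that with Python's sqlite3 module and itertools.zip_longest; the table names first_tbl and second_tbl are assumptions standing in for the question's First and Second tables.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
# first_tbl / second_tbl are hypothetical names for the two tables.
conn.executescript(
    """
    CREATE TABLE first_tbl (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE second_tbl (id INTEGER PRIMARY KEY, data TEXT);
    INSERT INTO first_tbl VALUES (1, 'asd'), (2, 'buu');
    INSERT INTO second_tbl VALUES (10, 'ffu'), (11, 'fffuuu'), (10001, 'asd');
    """
)

left = conn.execute("SELECT id, data FROM first_tbl ORDER BY id").fetchall()
right = conn.execute("SELECT id, data FROM second_tbl ORDER BY id").fetchall()

# zip_longest pairs rows by position and pads the shorter side;
# (-1, None) marks a missing partner row, as requested in the question.
zipped = [
    lrow + rrow
    for lrow, rrow in zip_longest(left, right, fillvalue=(-1, None))
]

for row in zipped:
    print(row)
```

This reads each table once, whereas the correlated COUNT(*) subqueries in the pure SQL version rescan the table for every row, which may matter on larger tables.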