Let's consider two tables:
First:
Id Data
1 asd
2 buu
And Second:
UPD:
Id Data
10 ffu
11 fffuuu
10001 asd
I want to get a 4-column table looking like this:
Id1 Data1 Id2 Data2
1 asd 10 ffu
2 buu 11 fffuuu
-1 [any text] 10001 asd
(If the numbers of rows are not equal, let's use "-1" for the id.)
How to do this?
I'm using sqlite3-3.7.3.
UPD2:
There are no matching criteria between the tables; any random matching between them will be sufficient for me.
Assuming that the id columns are unique and not null, you can "zip" your tables by:
1. Creating a row number for each row that corresponds to the position of the row when the table is ordered by the unique id (as polishchuk mentioned in his comment); and,
2. Simulating a FULL OUTER JOIN with 2 LEFT OUTER JOINs.
To demonstrate, I used two tables with differing row counts:
CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT);
INSERT INTO foo VALUES (NULL, 'a');
INSERT INTO foo VALUES (NULL, 'b');
INSERT INTO foo VALUES (NULL, 'c');
INSERT INTO foo VALUES (NULL, 'd');
INSERT INTO foo VALUES (NULL, 'e');
INSERT INTO foo VALUES (NULL, 'f');
INSERT INTO foo VALUES (NULL, 'g');
INSERT INTO foo VALUES (NULL, 'h');
INSERT INTO foo VALUES (NULL, 'i');
INSERT INTO foo VALUES (NULL, 'j');
DELETE FROM foo WHERE data IN ('b', 'd', 'f', 'i');
CREATE TABLE bar (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT);
INSERT INTO bar VALUES (NULL, 'a');
INSERT INTO bar VALUES (NULL, 'b');
INSERT INTO bar VALUES (NULL, 'c');
INSERT INTO bar VALUES (NULL, 'd');
INSERT INTO bar VALUES (NULL, 'e');
INSERT INTO bar VALUES (NULL, 'f');
INSERT INTO bar VALUES (NULL, 'g');
INSERT INTO bar VALUES (NULL, 'h');
INSERT INTO bar VALUES (NULL, 'i');
INSERT INTO bar VALUES (NULL, 'j');
DELETE FROM bar WHERE data IN ('a', 'b');
To obtain a more readable output, I then ran:
.headers on
.mode column
Then you can execute this SQL statement:
SELECT COALESCE(id1, -1) AS id1, data1,
       COALESCE(id2, -1) AS id2, data2
FROM (
    SELECT ltable.rnum AS rnum,
           ltable.id AS id1, ltable.data AS data1,
           rtable.id AS id2, rtable.data AS data2
    FROM
        (SELECT (SELECT COUNT(*) FROM foo
                 WHERE id <= T1.id) rnum, id, data FROM foo T1
        ) ltable
    LEFT OUTER JOIN
        (SELECT (SELECT COUNT(*) FROM bar
                 WHERE id <= T1.id) rnum, id, data FROM bar T1
        ) rtable
    ON ltable.rnum = rtable.rnum
    UNION
    SELECT rtable.rnum AS rnum,
           ltable.id AS id1, ltable.data AS data1,
           rtable.id AS id2, rtable.data AS data2
    FROM
        (SELECT (SELECT COUNT(*) FROM bar
                 WHERE id <= T1.id) rnum, id, data FROM bar T1
        ) rtable
    LEFT OUTER JOIN
        (SELECT (SELECT COUNT(*) FROM foo
                 WHERE id <= T1.id) rnum, id, data FROM foo T1
        ) ltable
    ON ltable.rnum = rtable.rnum)
ORDER BY rnum;
Which gives you:
id1         data1       id2         data2
----------  ----------  ----------  ----------
1           a           3           c
3           c           4           d
5           e           5           e
7           g           6           f
8           h           7           g
10          j           8           h
-1                      9           i
-1                      10          j
This works "both ways", for example, if you invert the two tables (foo and bar), you get:
id1         data1       id2         data2
----------  ----------  ----------  ----------
3           c           1           a
4           d           3           c
5           e           5           e
6           f           7           g
7           g           8           h
8           h           10          j
9           i           -1
10          j           -1
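As a quick check against the tables from the question, here is a minimal sketch that runs the same query through Python's sqlite3 module. The table names tbl_first and tbl_second and the helper numbered() are my own placeholders, not anything from the question, and the ids are assumed to be unique and not null, as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names standing in for "First" and "Second".
conn.execute("CREATE TABLE tbl_first (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE tbl_second (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO tbl_first VALUES (?, ?)",
                 [(1, "asd"), (2, "buu")])
conn.executemany("INSERT INTO tbl_second VALUES (?, ?)",
                 [(10, "ffu"), (11, "fffuuu"), (10001, "asd")])


def numbered(table):
    """Subquery adding rnum: the row's position when ordered by id."""
    return (f"(SELECT (SELECT COUNT(*) FROM {table} WHERE id <= t.id) AS rnum, "
            f"id, data FROM {table} t)")


# Two LEFT OUTER JOINs glued together with UNION simulate a FULL OUTER JOIN.
query = f"""
SELECT COALESCE(id1, -1), data1, COALESCE(id2, -1), data2
FROM (
    SELECT l.rnum AS rnum, l.id AS id1, l.data AS data1,
           r.id AS id2, r.data AS data2
    FROM {numbered('tbl_first')} l
    LEFT OUTER JOIN {numbered('tbl_second')} r ON l.rnum = r.rnum
    UNION
    SELECT r.rnum, l.id, l.data, r.id, r.data
    FROM {numbered('tbl_second')} r
    LEFT OUTER JOIN {numbered('tbl_first')} l ON l.rnum = r.rnum
)
ORDER BY rnum
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The unmatched third row of the second table comes back with -1 as its id1 and NULL (None in Python) as data1. If you want placeholder text instead, you could wrap data1 in COALESCE as well.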
I have a table called MyTable which has columns A, B and then multiple other columns where the values don't matter.
For each value of A, I want to filter out all the rows belonging to the B value that occurs most often within that A. It is probably easier to explain with an example. If the data looked like this:
A B ...
a f ...
a f ...
a f ...
a g ...
a g ...
b h ...
b h ...
b i ...
b i ...
c j ...
c j ...
The output would be
A B ...
a g ...
a g ...
b i ...
b i ...
This filters out all the rows with (a, f), because there are 3 of them compared to only 2 with (a, g).
It filters out (b, h), because there are 2 of them compared to 2 with (b, i); in this case it makes no difference which one we filter out, as long as it's one of them.
It filters out (c, j), as it is the only grouping and therefore still the maximum.
In terms of how to implement this, I'm thinking we need to do something like this at some point to get the count for each grouping:
SELECT A, B, count(*)
FROM MyTable
GROUP BY A, B
This should initially give something of the form:
A B count
a f 3
a g 2
b h 2
b i 2
c j 2
I'm not sure at this point how to get the maximum for each A and then apply it when selecting from the original table.
If your RDBMS supports window functions, use row_number() in a CTE:
with cte as(
select
grouper,
value,
count(value) "number",
row_number() over (partition by grouper order by count(value) desc) rn
from t
group by
grouper,
value)
select * from cte where rn = 1;
grouper | value | number | rn
:------ | :---- | -----: | -:
a | f | 3 | 1
b | h | 2 | 1
c | j | 2 | 1
Here is one option:
with tabl1 as(
select col1,col2,count(*)over(partition by col1,col2) cnt
from test1 t1)
select col1,max(col2) from tabl1 t1
WHERE exists(
select *
from tabl1 t2
where t1.cnt
<=t2.cnt and t1.col1=t2.col1 and t1.col2!=t2.col2
)
group by col1
Sample:
create table test1 (col1 varchar(1),col2 varchar(1));
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'g');
insert into test1 values ('a', 'g');
insert into test1 values ('b', 'h');
insert into test1 values ('b', 'h');
insert into test1 values ('b', 'i');
insert into test1 values ('b', 'i');
insert into test1 values ('c', 'j');
insert into test1 values ('c', 'j');
Result:
COL1 | MAX(COL2)
b i
a g
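If window functions are not available, one possible alternative is a correlated subquery that picks the most frequent B for each A and excludes it. This is only a sketch, run here in SQLite through Python's sqlite3 module; breaking ties by taking the alphabetically first B is my own assumption (the question says either one may be dropped).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A TEXT, B TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("a", "f"), ("a", "f"), ("a", "f"), ("a", "g"), ("a", "g"),
     ("b", "h"), ("b", "h"), ("b", "i"), ("b", "i"),
     ("c", "j"), ("c", "j")],
)

# For every row, look up the most frequent B within the same A and
# keep the row only if its own B is a different value.
# Ties are broken by B (assumption), so (b, h) is dropped rather than (b, i).
remaining = conn.execute("""
    SELECT m.A, m.B
    FROM MyTable m
    WHERE m.B <> (
        SELECT x.B
        FROM MyTable x
        WHERE x.A = m.A
        GROUP BY x.B
        ORDER BY COUNT(*) DESC, x.B
        LIMIT 1
    )
    ORDER BY m.A, m.B
""").fetchall()

for row in remaining:
    print(row)
```

On the sample data this leaves the two (a, g) rows and the two (b, i) rows, which matches the output requested in the question. Note that rows with a NULL B would also be dropped by the <> comparison.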
I'm having different results between Oracle and Redshift when I do a count(distinct my_field).
Assuming my_field has the following values: "", a, b, c.
Oracle's count distinct will give me 3.
Redshift's count distinct will give me 4 (unless I specifically add a clause testing length > 0).
Has anyone seen this before?
Is there a way to set up the database so that it ignores empty values in a distinct count?
Thanks a lot!
## CREATE TABLE
dev=# create table my_table (my_key integer null, my_field varchar null) diststyle even;
CREATE TABLE
## LOAD DATA
dev=# insert into my_table values (0, 'A');
INSERT 0 1
dev=# insert into my_table values (1, 'B');
INSERT 0 1
dev=# insert into my_table values (2, 'C');
INSERT 0 1
dev=# insert into my_table values (3, NULL);
INSERT 0 1
dev=# insert into my_table values ( NULL, 'E');
INSERT 0 1
dev=# insert into my_table values (6, NULL);
INSERT 0 1
dev=# insert into my_table values (7, NULL);
INSERT 0 1
dev=# insert into my_table values (8, 'A');
INSERT 0 1
## CHECK CONTENTS OF TABLE
dev=# SELECT * FROM my_table;
my_key | my_field
--------+----------
0 | A
2 | C
3 |
| E
1 | B
6 |
7 |
8 | A
(8 rows)
## DISTINCT RESULTS
dev=# SELECT DISTINCT my_field FROM my_table;
my_field
----------
C
E
B
A
(5 rows)
dev=# SELECT COALESCE(my_field, 'NULL') FROM my_table GROUP BY 1 ;
coalesce
----------
A
NULL
C
E
B
(5 rows)
dev=# SELECT DISTINCT COALESCE(my_field, 'NULL') FROM my_table WHERE my_field ;
ERROR: argument of WHERE must be type boolean, not type character varying
dev=# SELECT DISTINCT my_field FROM my_table WHERE my_field IS NOT NULL;
my_field
----------
B
C
E
A
(4 rows)
dev=#
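The difference in the question comes from Oracle treating a zero-length VARCHAR2 as NULL, while COUNT(DISTINCT ...) ignores NULLs everywhere. One way to get the Oracle-style count without a length test is to turn empty strings into NULL with NULLIF. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, on the assumption that it behaves like Redshift here (it keeps '' and NULL distinct); I have not run it against Redshift itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_field TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)",
                 [("",), ("a",), ("b",), ("c",)])

# The empty string is an ordinary value, so it is counted.
plain = conn.execute(
    "SELECT COUNT(DISTINCT my_field) FROM my_table"
).fetchone()[0]

# NULLIF(x, '') yields NULL when x is the empty string; NULLs are not counted.
ignoring_empty = conn.execute(
    "SELECT COUNT(DISTINCT NULLIF(my_field, '')) FROM my_table"
).fetchone()[0]

print(plain, ignoring_empty)
```

Here the plain count is 4 and the NULLIF variant is 3, mirroring the Redshift and Oracle numbers from the question.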
I have the following tables
Table A              Table B              Table C
ColumnA  ColumnB     ColumnA  ColumnB     ColumnA  ColumnB
1        A           2        X           X        Value1
2        B           3        Y           Y        Value2
3        C           5        Z           Z        Value3
4        D
5        E
The result required is
Column1  Column2  Column3
1                 A
2        Value1   B
3        Value2   C
4                 D
5        Value3   E
I have been playing with left outer joins, but I'm still not getting close to the result I am looking for. Any help is appreciated.
You need to use the LEFT JOIN twice:
CREATE table tablea (
columna NUMBER,
columnb VARCHAR2(1)
);
CREATE table tableb (
columna NUMBER,
columnb VARCHAR2(1)
);
CREATE table tablec (
columna VARCHAR2(1),
columnb VARCHAR2(10)
);
INSERT INTO tablea VALUES (1, 'A');
INSERT INTO tablea VALUES (2, 'B');
INSERT INTO tablea VALUES (3, 'C');
INSERT INTO tablea VALUES (4, 'D');
INSERT INTO tablea VALUES (5, 'E');
INSERT INTO tableb VALUES (2, 'X');
INSERT INTO tableb VALUES (3, 'Y');
INSERT INTO tableb VALUES (5, 'Z');
INSERT INTO tablec VALUES ('X', 'Value1');
INSERT INTO tablec VALUES ('Y', 'Value2');
INSERT INTO tablec VALUES ('Z', 'Value3');
COMMIT;
SELECT ta.columna, tc.columnb, ta.columnb
FROM tablea ta
LEFT JOIN tableb tb ON (ta.columna = tb.columna)
LEFT JOIN tablec tc ON (tc.columna = tb.columnb)
ORDER BY 1
;
Output:
COLUMNA    COLUMNB    COLUMNB
---------- ---------- -------
1                     A
2          Value1     B
3          Value2     C
4                     D
5          Value3     E
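If you have no Oracle instance at hand, the same chain of two LEFT JOINs can be tried in SQLite through Python's sqlite3 module. This is just a sketch: the column types are changed to INTEGER and TEXT, which is my substitution for NUMBER and VARCHAR2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (columna INTEGER, columnb TEXT)")
conn.execute("CREATE TABLE tableb (columna INTEGER, columnb TEXT)")
conn.execute("CREATE TABLE tablec (columna TEXT, columnb TEXT)")
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")])
conn.executemany("INSERT INTO tableb VALUES (?, ?)",
                 [(2, "X"), (3, "Y"), (5, "Z")])
conn.executemany("INSERT INTO tablec VALUES (?, ?)",
                 [("X", "Value1"), ("Y", "Value2"), ("Z", "Value3")])

# The first LEFT JOIN keeps every row of tablea; the second one looks up
# the value in tablec and stays NULL when tableb had no match.
result = conn.execute("""
    SELECT ta.columna, tc.columnb, ta.columnb
    FROM tablea ta
    LEFT JOIN tableb tb ON ta.columna = tb.columna
    LEFT JOIN tablec tc ON tc.columna = tb.columnb
    ORDER BY 1
""").fetchall()

for row in result:
    print(row)
```

Rows 1 and 4 have no partner in tableb, so their middle column comes back as NULL (None in Python), which is the empty cell in the output above.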
I have a table that looks like
id cat data
--------------------
1 1 foo
2 1 bar
3 1 baz
4 2 some
5 2 random
6 3 Data 1
7 2 data
8 3 Data 2
9 3 Data 3
And I want the last 3 ids and data of each category in a single row like
cat  id1  data1   id2  data2   id3  data3
-----------------------------------------------------
1    1    foo     2    bar     3    baz
2    4    some    5    random  7    data
3    6    Data 1  8    Data 2  9    Data 3
I already tried the following:
Get the data with the highest id for each cat:
SELECT id, data FROM tbl t1 WHERE EXISTS (
SELECT 1 FROM tbl t2 WHERE t1.cat = t2.cat
GROUP BY t2.cat HAVING MAX(t2.id) = t1.id
)
Get the data with the 2nd highest ids for each cat:
SELECT id, data FROM tbl t1 WHERE EXISTS (
SELECT 1 FROM tbl t2 WHERE t1.cat = t2.cat AND NOT EXISTS (
-- Not the highest value
SELECT 1 FROM tbl t3 WHERE t1.cat = t3.cat GROUP BY t3.cat
HAVING MAX(t3.id) = t2.id
) GROUP BY t2.cat HAVING MAX(t2.id) = t1.id
)
Get the data with the 3rd highest id for each cat:
SELECT id, data FROM tbl t1 WHERE EXISTS (
SELECT 1 FROM tbl t2 WHERE t1.cat = t2.cat AND NOT EXISTS (
-- id is not 2nd highest
SELECT 1 FROM tbl t3 WHERE t1.cat = t3.cat AND NOT EXISTS (
-- id is not the highest
SELECT 1 FROM tbl t4 WHERE t1.cat = t4.cat GROUP BY t4.cat
HAVING MAX(t4.id) = t3.id
) GROUP BY t3.cat HAVING MAX(t3.id) = t2.id
) AND NOT EXISTS (
-- not the highest id
SELECT 1 FROM tbl t5 WHERE t1.cat = t5.cat GROUP BY t5.cat
HAVING MAX(t5.id) = t2.id
) GROUP BY t2.cat HAVING MAX(t2.id) = t1.id
)
And now, joining the entire thing. But I believe that there exists a better solution. What is it?
PS: I have to do it with Informix
Not my answer, a coworker of mine came up with this:
create temp table t(
id smallint,
cat smallint,
data char(10)
) with no log;
insert into t values (1, 1, "foo");
insert into t values (2, 1, "bar");
insert into t values (3, 1, "baz");
insert into t values (4, 2, "some");
insert into t values (5, 2, "random");
insert into t values (6, 3, "Data 1");
insert into t values (7, 2, "data");
insert into t values (8, 3, "Data 2");
insert into t values (9, 3, "Data 3");
insert into t values (10, 4, "some");
insert into t values (11, 4, "more");
insert into t values (12, 4, "random");
insert into t values (13, 4, "data");
insert into t values (14, 4, "for");
insert into t values (15, 4, "testing");
insert into t values (16, 5, "one");
select
cat,
max(case when cnt = 3 then id end) as id1,
max(case when cnt = 2 then id end) as id2,
max(case when cnt = 1 then id end) as id3,
max(case when cnt = 3 then data end) as data1,
max(case when cnt = 2 then data end) as data2,
max(case when cnt = 1 then data end) as data3
from
(
select
a.cat,
a.id,
a.data,
count(*) as cnt
from
t a,
t b
where
a.cat = b.cat and
a.id <= b.id
group by
a.id,
a.cat,
a.data
having
count(*) <= 3
)
group by
1
order by
1;
cat  id1  id2  id3  data1   data2   data3
1    1    2    3    foo     bar     baz
2    4    5    7    some    random  data
3    6    8    9    Data 1  Data 2  Data 3
4    13   14   15   data    for     testing
5              16                   one
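The idea behind the query above is that the self-join counts, for each row, how many rows in the same category have an id at least as high, which gives the row its rank from the top; the CASE expressions then spread ranks 3, 2 and 1 across the columns. Here is a minimal sketch of the same technique in SQLite through Python's sqlite3 module, using only the nine rows from the question. The alias ranked and the explicit JOIN syntax are my own choices, and I have not verified this variant on Informix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, cat INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "foo"), (2, 1, "bar"), (3, 1, "baz"),
     (4, 2, "some"), (5, 2, "random"), (6, 3, "Data 1"),
     (7, 2, "data"), (8, 3, "Data 2"), (9, 3, "Data 3")],
)

pivoted = conn.execute("""
    SELECT cat,
           MAX(CASE WHEN cnt = 3 THEN id   END) AS id1,
           MAX(CASE WHEN cnt = 3 THEN data END) AS data1,
           MAX(CASE WHEN cnt = 2 THEN id   END) AS id2,
           MAX(CASE WHEN cnt = 2 THEN data END) AS data2,
           MAX(CASE WHEN cnt = 1 THEN id   END) AS id3,
           MAX(CASE WHEN cnt = 1 THEN data END) AS data3
    FROM (
        -- cnt = number of rows in the same cat with id >= this row's id,
        -- so the highest id gets 1, the next one 2, and so on.
        SELECT a.cat AS cat, a.id AS id, a.data AS data, COUNT(*) AS cnt
        FROM t a
        JOIN t b ON a.cat = b.cat AND a.id <= b.id
        GROUP BY a.cat, a.id, a.data
        HAVING COUNT(*) <= 3
    ) ranked
    GROUP BY cat
    ORDER BY cat
""").fetchall()

for row in pivoted:
    print(row)
```

The columns are ordered here as in the question (id1, data1, id2, data2, id3, data3) rather than as in the answer's query. Keep in mind that the self-join compares every row with every other row of its category, so it may get slow for large categories.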
If you are using Informix 11.50 or above, there is an option that isn't perfect but may help; see the final SELECT below.
It returns a MULTISET of char() values, which may be difficult to read depending on the programming language you are using.
Thanks to Fernando Nunes, who suggested this SQL on the IIUG forum.
At the moment I don't see any alternative other than complex SQL.
drop table teste;
create temp table teste ( id smallint, cat smallint, data char(10));
insert into teste values ( 1, 1, 'foo ' );
insert into teste values ( 2, 1, 'bar ' );
insert into teste values ( 3, 1, 'baz ' );
insert into teste values ( 4, 2, 'some ' );
insert into teste values ( 5, 2, 'random ' );
insert into teste values ( 6, 3, 'Data 1 ' );
insert into teste values ( 7, 2, 'data ' );
insert into teste values ( 8, 3, 'Data 2 ' );
insert into teste values ( 9, 3, 'Data 3 ' );
insert into teste values ( 10, 3, 'Data 4 ' );
select * from teste;
select ms.*
from
(
SELECT MULTISET( SELECT ITEM t.id || ',' || t.cat || ',' || t.data m1 FROM
teste t WHERE t.cat = tout.cat) FROM (SELECT unique cat from teste) tout
) ms;
drop table teste;
This returns:
expression MULTISET{'1,1,foo ','2,1,bar ','3,1,baz '}
expression MULTISET{'4,2,some ','5,2,random ','7,2,data '}
expression MULTISET{'6,3,Data 1 ','8,3,Data 2 ','9,3,Data 3 '}
I have a table with 2 columns
Input
Col 1 ---- Col 2
1 ---- aaaa
1 ---- bbbb
1 ---- cccc
2 ---- dddd
2 ---- eeee
2 ---- ffff
2 ---- gggg
Output
Col 1 ---- Col 2
1 ---- aaaabbbbcccc
2 ---- ddddeeeeffffgggg
I was thinking of doing several self joins, but that doesn't seem efficient. Any ideas on how the SQL should be written?
Ok, I'll bite. Instead of stragg, try listagg (in 11.2):
create table tst1
(
pid number,
val varchar2(10)
);
insert into tst1 values(1, 'Rec1');
insert into tst1 values(1, 'Rec2');
insert into tst1 values(1, 'Rec3');
insert into tst1 values(2, 'Rec1');
insert into tst1 values(2, 'Rec2');
commit;
select pid, listagg(val, ':') within group(order by val) as "The List"
from tst1
group by pid;
And you get:
pid The List
1 Rec1:Rec2:Rec3
2 Rec1:Rec2
If you change the order by to "order by val desc" you'd get
pid The List
1 Rec3:Rec2:Rec1
2 Rec2:Rec1
This is a version that will work in Oracle 9i and up.
create table foo (
key_column number,
val_column varchar2(4)
);
insert into foo values (1, 'aaaa');
insert into foo values (1, 'bbbb');
insert into foo values (1, 'cccc');
insert into foo values (2, 'dddd');
insert into foo values (2, 'eeee');
insert into foo values (2, 'ffff');
insert into foo values (2, 'gggg');
select key_column
, replace(max(sys_connect_by_path(val_column, ',')), ',') combined
from (select key_column
, val_column
, row_number() over (partition by key_column order by val_column) cur
, row_number() over (partition by key_column order by val_column) - 1 prev
from foo) foo
group by key_column
connect by prior cur = prev and prior key_column = key_column
start with cur = 1;
key_column | combined
--------------------------
1 | aaaabbbbcccc
2 | ddddeeeeffffgggg
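For comparison only, and not as an Oracle solution: if you want to experiment with the same grouping outside Oracle, SQLite has a group_concat aggregate that does a similar job. The sketch below runs it through Python's sqlite3 module on the data from the question. Feeding the aggregate from an ordered subquery is an assumption that tends to work in practice, but SQLite does not guarantee the concatenation order in this form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (key_column INTEGER, val_column TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "aaaa"), (1, "bbbb"), (1, "cccc"),
     (2, "dddd"), (2, "eeee"), (2, "ffff"), (2, "gggg")],
)

# group_concat(X, Y) joins the values of X within each group using Y as the
# separator; an empty separator gives plain concatenation.
rows = conn.execute("""
    SELECT key_column, group_concat(val_column, '')
    FROM (SELECT key_column, val_column FROM foo
          ORDER BY key_column, val_column)
    GROUP BY key_column
    ORDER BY key_column
""").fetchall()

combined = dict(rows)
for key, value in combined.items():
    print(key, value)
```

Each key ends up with one string containing all of its values, as in the output requested in the question.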