How to merge MySQL queries with different column counts? - sql

Definitions:
In the results, * denotes an empty column
The data in the tables is such that every field in the table has the value Fieldname + RowCount (so column 'a' in row 1 contains the value 'a1').
2 MySQL Tables
Table1
Fieldnames: a,b,c,d
Table2
Fieldnames: e,f,g,h,i,j
Task:
I want to get the first 4 rows from each of the tables.
Standalone Queries
SELECT Table1.* FROM Table1 WHERE 1 LIMIT 0,4 -- Colcount 4
SELECT Table2.* FROM Table2 WHERE 1 LIMIT 0,4 -- Colcount 6
A simple UNION of the queries fails because the two parts have different column counts.
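The failure is easy to reproduce. This is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL. The one-row sample data is an assumption. MySQL rejects the same statement with its own error message.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("CREATE TABLE Table2 (e TEXT, f TEXT, g TEXT, h TEXT, i TEXT, j TEXT)")
conn.execute("INSERT INTO Table1 VALUES ('a1', 'b1', 'c1', 'd1')")
conn.execute("INSERT INTO Table2 VALUES ('e1', 'f1', 'g1', 'h1', 'i1', 'j1')")


def try_plain_union(connection):
    """Return the error message raised by a 4-column UNION 6-column query, or None."""
    try:
        connection.execute("SELECT * FROM Table1 UNION SELECT * FROM Table2").fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


union_error = try_plain_union(conn)
print(union_error)
```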
Version1: add two empty fields to the first query
(SELECT Table1.*,'' AS i,'' AS j FROM Table1 WHERE 1 LIMIT 0,4)
UNION
(SELECT Table2.* FROM Table2 WHERE 1 LIMIT 0,4)
So I will get the following fields in the result set:
a,b,c,d,i,j
a1,b1,c1,d1,*,*
a2,b2,c2,d2,*,*
....
....
e1,f1,g1,h1,i1,j1
e2,f2,g2,h2,i2,j2
The problem is that the field names of Table2 are overridden by Table1.
Version2 - shift columns by using empty fields:
(SELECT Table1.*,'','','','','','' FROM Table1 WHERE 1 LIMIT 0,4)
UNION
(SELECT '','','','',Table2.* FROM Table2 WHERE 1 LIMIT 0,4)
So I will get the following fields in the result set:
a,b,c,d,*,*,*,*,*,*
a1,b1,c1,d1,*,*,*,*,*,*
a2,b2,c2,d2,*,*,*,*,*,*
....
....
*,*,*,*,e1,f1,g1,h1,i1,j1
*,*,*,*,e2,f2,g2,h2,i2,j2
....
....
The problem is solved, but I get many empty fields.
Is there a known performance issue?
How do you solve this task?
Is there a best practice to solve this issue?

The output from a query should be a table, which is a set of rows, each row with the same set of column names and types. (There are some DBMS that support ragged rows - with different sets of columns, but that is not a mainstream feature.)
You have to decide how to handle two sets of four rows with different sets of columns in the two sets.
The simplest option, usually, is to do the two standalone queries. The two result sets are not comparable, and should not be conflated.
If you choose your Version 1, then you should decide which set of column names is appropriate, or create a composite set of names using 'AS x' column aliases.
If you choose your Version 2, then you should probably name the trailing columns of the first clause of the UNION; at the moment, they all have no name:
(SELECT Table1.*, '' AS e, '' AS f, '' AS g, '' AS h, '' AS i, '' AS j
FROM Table1 WHERE 1 LIMIT 0,4)
UNION
(SELECT '' AS a, '' AS b, '' AS c, '' AS d, Table2.*
FROM Table2 WHERE 1 LIMIT 0,4)
(The AS clauses in the second SELECT are redundant, because a UNION takes its column names from the first SELECT. They do keep the query self-consistent, though: both halves of the UNION spell out the same column headings explicitly.)
Except that you have provided empty strings instead of NULL, the notation you have chosen corresponds to an 'OUTER UNION'. You can find occasional references to it in selected parts of the literature (E F Codd in the RM/V2 book; C J Date in critiques of all things OUTER). SQL 1999 provided it as a UNION JOIN; SQL 2003 removed UNION JOIN (that's pretty unusual - and damning of the feature).
I'd use two separate queries.

The thing that seems most sensible is your "version 2", except using NULLs instead of empty strings.
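To make "Version 2 with NULLs" concrete, here is a small sketch that uses Python's `sqlite3` module, with an in-memory SQLite database standing in for MySQL. The five-row sample data and the ORDER BY columns are assumptions. SQLite accepts LIMIT only at the end of a whole compound SELECT, so each half is limited inside a subquery. UNION ALL is used because rows from the two halves cannot be identical here, which spares the engine the de-duplication step that a plain UNION performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("CREATE TABLE Table2 (e TEXT, f TEXT, g TEXT, h TEXT, i TEXT, j TEXT)")
# Fill both tables with the question's 'fieldname + row number' pattern (5 rows each).
for n in range(1, 6):
    conn.execute("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [f"{col}{n}" for col in "abcd"])
    conn.execute("INSERT INTO Table2 VALUES (?, ?, ?, ?, ?, ?)", [f"{col}{n}" for col in "efghij"])

# 'Version 2' padded with NULLs and named columns. Each half is limited in a subquery,
# and the ORDER BY makes "the first 4 rows" deterministic.
OUTER_UNION = """
SELECT a, b, c, d, NULL AS e, NULL AS f, NULL AS g, NULL AS h, NULL AS i, NULL AS j
FROM (SELECT * FROM Table1 ORDER BY a LIMIT 4)
UNION ALL
SELECT NULL, NULL, NULL, NULL, e, f, g, h, i, j
FROM (SELECT * FROM Table2 ORDER BY e LIMIT 4)
"""

cur = conn.execute(OUTER_UNION)
column_names = [desc[0] for desc in cur.description]  # names come from the first SELECT
rows = cur.fetchall()
print(column_names)
for row in rows:
    print(row)
```

Because the aliases are written in the first SELECT, every column in the result has a real name, and the padding cells come back as NULL (None in Python) rather than as empty strings.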

This took some thinking, and then some MySQL-specific workarounds. The concept is this: a join will produce the table structure you want. What you really want is a full outer join in which no row 'matches'. To do this, we need a reliable way to ensure that rows never match, and then we have to UNION a LEFT JOIN and a RIGHT JOIN to work around MySQL's lack of FULL OUTER JOIN.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE A (a int, b int, c int, d int);
CREATE TABLE B (e int, f int, g int, h int, i int, j int);
INSERT INTO A VALUES (1,1,1,1),(2,2,2,2);
INSERT INTO B VALUES (8,8,8,8,8,8),(9,9,9,9,9,9);
Query 1:
SELECT * FROM
(SELECT * FROM (SELECT "TableA" as unique_field) as Ax CROSS JOIN A) as A
LEFT JOIN
(SELECT * FROM (SELECT "TableB" as unique_field) as Bx CROSS JOIN B) AS B
on A.unique_field=B.unique_field
UNION
SELECT * FROM
(SELECT * FROM (SELECT "TableA" as unique_field) as Ax CROSS JOIN A) as A
RIGHT JOIN
(SELECT * FROM (SELECT "TableB" as unique_field) as Bx CROSS JOIN B) AS B
on A.unique_field=B.unique_field
Results:
| unique_field | a | b | c | d | unique_field | e | f | g | h | i | j |
|--------------|--------|--------|--------|--------|--------------|--------|--------|--------|--------|--------|--------|
| TableA | 1 | 1 | 1 | 1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| TableA | 2 | 2 | 2 | 2 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| (null) | (null) | (null) | (null) | (null) | TableB | 8 | 8 | 8 | 8 | 8 | 8 |
| (null) | (null) | (null) | (null) | (null) | TableB | 9 | 9 | 9 | 9 | 9 | 9 |
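This syntax: (SELECT * FROM (SELECT "TableA" AS unique_field) AS Ax CROSS JOIN A) AS A would be more easily understood as (SELECT "TableA" AS unique_field, * FROM A) AS A. However, MySQL doesn't allow an unqualified * to follow another item in the select list. (A qualified A.*, as in SELECT "TableA" AS unique_field, A.* FROM A, is accepted.)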
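The "tags that never match" idea can be checked outside MySQL. This is a minimal sketch using Python's `sqlite3` with the answer's sample tables. The CTE names `ta`/`tb` and the column name `tag` are illustrative. The RIGHT JOIN half is written as `tb LEFT JOIN ta`, with A's columns listed first, so that it also runs on SQLite builds older than 3.39, which lack RIGHT JOIN. UNION ALL is safe here only because no row can match across the tags. A general FULL OUTER JOIN emulation would have to exclude matched rows from the second half.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a INT, b INT, c INT, d INT);
CREATE TABLE B (e INT, f INT, g INT, h INT, i INT, j INT);
INSERT INTO A VALUES (1,1,1,1),(2,2,2,2);
INSERT INTO B VALUES (8,8,8,8,8,8),(9,9,9,9,9,9);
""")

# Each side carries a constant tag that never equals the other side's tag, so the join
# condition never matches and every row survives with NULLs for the other table.
QUERY = """
WITH ta AS (SELECT 'TableA' AS tag, A.* FROM A),
     tb AS (SELECT 'TableB' AS tag, B.* FROM B)
SELECT ta.tag, ta.a, ta.b, ta.c, ta.d, tb.tag, tb.e, tb.f, tb.g, tb.h, tb.i, tb.j
FROM ta LEFT JOIN tb ON ta.tag = tb.tag
UNION ALL
SELECT ta.tag, ta.a, ta.b, ta.c, ta.d, tb.tag, tb.e, tb.f, tb.g, tb.h, tb.i, tb.j
FROM tb LEFT JOIN ta ON ta.tag = tb.tag
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```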

Related

Comparing different columns in SQL for each row

After some transformation, I have the result of a cross join (of tables a and b) that I want to do some analysis on. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
Each row always pairs two different IDs. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values (meaning both have "1" in 10_1, 10_2, etc.) two IDs have in common on average, but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two IDs have 10_1 in common. I could write something like this, for example, covering the other columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
This would count in general how often two IDs have at least one thing in common, but it would of course also count pairs that have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things, etc. in common.
One "problem" in my case is also that I have about 30 columns I want to look at, so I can hardly write down every possible combination by hand.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as (with table_a standing for your table a):
select id, '10_1' as code from table_a where "10_1" = 1
union
select id, '10_2' from table_a where "10_2" = 1
union
select id, '11_1' from table_a where "11_1" = 1
union
select id, '11_2' from table_a where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model:
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code as a_code,
b.id as b_id, b.code as b_code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
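Here is a runnable sketch of the same normalize-then-count idea. It uses Python's `sqlite3` and the question's sample rows. The table names `wide` and `pairs` are hypothetical: `wide` holds each ID's 0/1 flags and `pairs` holds the ID pairs produced by the cross join. Unlike an inner join on code, it starts from the pairs table and uses LEFT JOINs, so pairs with nothing in common are still counted in the 0 bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables built from the question's sample rows.
conn.executescript("""
CREATE TABLE wide (id INT PRIMARY KEY, "10_1" INT, "10_2" INT, "11_1" INT, "11_2" INT);
INSERT INTO wide VALUES
  (111, 1, 0, 1, 0), (112, 0, 1, 1, 0),
  (222, 1, 0, 1, 0), (333, 0, 0, 0, 0), (444, 1, 0, 1, 1);
CREATE TABLE pairs (id_a INT, id_b INT);
INSERT INTO pairs VALUES (111, 222), (111, 333), (111, 444), (112, 222);
""")

QUERY = """
WITH codes AS (             -- one row per (id, code) where the flag is 1
  SELECT id, '10_1' AS code FROM wide WHERE "10_1" = 1
  UNION ALL SELECT id, '10_2' FROM wide WHERE "10_2" = 1
  UNION ALL SELECT id, '11_1' FROM wide WHERE "11_1" = 1
  UNION ALL SELECT id, '11_2' FROM wide WHERE "11_2" = 1
),
shared AS (                 -- number of codes set to 1 for both IDs of each pair
  SELECT p.id_a, p.id_b, COUNT(cb.code) AS in_common
  FROM pairs p
  LEFT JOIN codes ca ON ca.id = p.id_a
  LEFT JOIN codes cb ON cb.id = p.id_b AND cb.code = ca.code
  GROUP BY p.id_a, p.id_b
)
SELECT in_common, COUNT(*) AS pair_count
FROM shared GROUP BY in_common ORDER BY in_common
"""
histogram = conn.execute(QUERY).fetchall()
print(histogram)  # [(0, 1), (1, 1), (2, 2)]
```

With about 30 flag columns, only the `codes` CTE grows (one line per column). The counting part stays the same.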
The proposed solution first takes one step back from the cross-join table, because the identical column names are very awkward to work with. Instead, we take the ids from the two tables and put them in a common table expression (idtable). The following PostgreSQL query produces a histogram in the shape requested in the question. It assumes that table_a and table_b from the question are the same table, called tbl. This assumption is not needed, though: tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses a JSON trick (row_to_json plus json_each) to flatten the columns into key/value rows. Note that the CASE expression counts columns whose values are equal in both rows, so shared 0s count as well as shared 1s; to count only shared 1s, also require the value to be 1:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)
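The output above appears to come from the question's four sample rows. A quick plain-Python check on those rows (a sketch, not part of the answer) reproduces it under the rule "values are equal". Counting only columns where both IDs have 1, which is what the question describes, gives a different histogram.

```python
from collections import Counter

# The four (left, right) flag vectors from the question's sample rows,
# in column order 10_1, 10_2, 11_1, 11_2.
SAMPLE_PAIRS = [
    ((1, 0, 1, 0), (1, 0, 1, 0)),  # 111 vs 222
    ((1, 0, 1, 0), (0, 0, 0, 0)),  # 111 vs 333
    ((1, 0, 1, 0), (1, 0, 1, 1)),  # 111 vs 444
    ((0, 1, 1, 0), (1, 0, 1, 0)),  # 112 vs 222
]


def match_histogram(pairs, match):
    """Map k -> number of pairs with exactly k columns satisfying match(x, y)."""
    return Counter(sum(match(x, y) for x, y in zip(left, right)) for left, right in pairs)


equal_values = match_histogram(SAMPLE_PAIRS, lambda x, y: x == y)          # rule in the JSON query
both_one = match_histogram(SAMPLE_PAIRS, lambda x, y: x == 1 and y == 1)   # "both have 1"
print(sorted(equal_values.items()))  # [(2, 2), (3, 1), (4, 1)]
print(sorted(both_one.items()))      # [(0, 1), (1, 1), (2, 2)]
```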

How to query 2 tables in sql server with many to many relationship to identify differences

I have two tables with a many-to-many relationship, and I am trying to merge the 2 tables in a select statement. I want to see all of the records from both tables, but only match 1 record from table A to 1 record in table B, so null values are OK.
For example table A has 20 records that match only 15 records from table B. I want to see all 20 records, the 5 that are unable to be matched can show null.
Table 1
Something | Code#
apple | 75
pizza | 75
orange | 6
Ball | 75
green | 4
red | 6
Table 2
date | id#
Feb-15 | 75
Feb-11 | 75
Jan-10 | 6
Apr-08 | 4
The result I need is
Something | Date | Code# | ID#
apple | Feb-15 | 75 | 75
pizza | Feb-11 | 75 | 75
orange | Jan-10 | 6 | 6
Ball | NULL | 75 | NULL
green | Apr-08 | 4 | 4
red | NULL | 6 | NULL
I'm imagining something like this. You want to pair off the rows side by side, but one side is going to have more rows than the other.
select * /* change to whatever you need */
from
(
select *, row_number() over (partition by "code#" order by "something") as rn
from tableA
) as a
full outer join /* sounds like maybe left outer join will work too */
(
select *, row_number() over (partition by "id#" order by "date" desc) as rn
from tableB
) as b
on b."id#" = a."code#" and b.rn = a.rn
Actually, I don't know how you're going to get "Ball" to come after "apple" and "pizza" without some other column to sort on. Rows in SQL tables don't have any ordering, and you can't rely on the default listing from select * or assume that the order of insertion is significant.
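Here is a runnable sketch of the ROW_NUMBER pairing. It uses Python's `sqlite3` (window functions need SQLite 3.25 or later) and a LEFT JOIN, since only table A's rows must all appear. Two assumptions, both hypothetical: table A gets a `seq` column recording the intended order, which is exactly the missing sort column mentioned above, and the "Feb-15" style values are stored as ISO year-month strings in a column named `dt` so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (seq INT, something TEXT, code INT);
INSERT INTO table_a VALUES
  (1, 'apple', 75), (2, 'pizza', 75), (3, 'orange', 6),
  (4, 'Ball', 75), (5, 'green', 4), (6, 'red', 6);
CREATE TABLE table_b (dt TEXT, id INT);
INSERT INTO table_b VALUES
  ('2015-02', 75), ('2011-02', 75), ('2010-01', 6), ('2008-04', 4);
""")

# Number the rows within each code on both sides, then match the n-th left row with
# the n-th right row; left rows without a partner get NULLs from the LEFT JOIN.
QUERY = """
SELECT a.something, b.dt, a.code, b.id
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY code ORDER BY seq) AS rn
      FROM table_a) AS a
LEFT JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
           FROM table_b) AS b
  ON b.id = a.code AND b.rn = a.rn
ORDER BY a.seq
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```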
A regular Left-join should do it for you.
select tableA.*
, tableB.*
from tableA
left join tableB
on tableB.PrimaryKey = tableA.PrimaryKey
We would need to see the table structure to tell you for sure, but essentially you join on the full key (if possible):
SELECT * FROM TABLEA A
JOIN TABLEB B ON
A.FULLKEY = B.FULLKEY
Left outer join.
Question changed: make that a full outer join:
select table1.*, table2.*
from table1
full outer join table2
on table1.Code# = table2.id#
This is probably not a true many to many but I think this is what you are asking for

Remove partial duplicates sql server

I am altering an existing view within SQL Server. My union statement creates something along the lines of:
Row | Col1 | C2 | C3   | C4
----|------|----|------|-----
1   | A    | B  | NULL | NULL
2   | A    | B  | C    | NULL
3   | A    | B  | C    | D
4   | E    | F  | NULL | NULL
5   | E    | F  | G    | NULL
However, in this scenario I only want rows 3 and 5. I need to omit rows 1 and 2 because they contain duplicate info: their filled-in columns hold the same values as row 3, and row 3 is the most 'complete'. Row 5 is kept over row 4 for the same reason.
Is this an outer join / intersect issue? How the heck do you create a view in this manner?
Assuming that Col1 is never NULL, we can use ROW_NUMBER, partitioned by Col1 and ordered (descending) by the concatenation of all four columns, and keep the first row of each partition:
; with cte
AS
(
select ROW_NUMBER() over ( partition by col1 order by (coalesce(Col1,'')+
coalesce([C2],'') +
coalesce([C3],'') +
coalesce([C4],'') ) desc) as seq,
*
FROM Table1
)
select * from cte
where seq =1
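The same idea can be tried in SQLite through Python's `sqlite3`, using `||` in place of T-SQL's `+` for string concatenation. This is a sketch with the question's five rows. It relies on the "more complete" rows extending the shorter ones, as in the example. Where that doesn't hold, ordering by the number of non-NULL columns (for example `(C3 IS NOT NULL) + (C4 IS NOT NULL)`) may be a safer sort key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Col1 TEXT, C2 TEXT, C3 TEXT, C4 TEXT);
INSERT INTO Table1 VALUES
  ('A', 'B', NULL, NULL), ('A', 'B', 'C', NULL), ('A', 'B', 'C', 'D'),
  ('E', 'F', NULL, NULL), ('E', 'F', 'G', NULL);
""")

# Within each Col1 group, the longest concatenation sorts first and gets seq = 1.
QUERY = """
WITH cte AS (
  SELECT ROW_NUMBER() OVER (
           PARTITION BY Col1
           ORDER BY COALESCE(Col1, '') || COALESCE(C2, '') ||
                    COALESCE(C3, '') || COALESCE(C4, '') DESC) AS seq,
         Col1, C2, C3, C4
  FROM Table1
)
SELECT Col1, C2, C3, C4 FROM cte WHERE seq = 1 ORDER BY Col1
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('A', 'B', 'C', 'D'), ('E', 'F', 'G', None)]
```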

Distinct Values Ignoring Column Order

I have a table similar to:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 2 | 2 | 1 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
| 5 | 5 | 0 |
+----+---+---+
I want to remove all duplicate pairs of values, regardless of which column contains which value, e.g. after whatever the query might be I want to see:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
I'd like to find a solution in Microsoft SQL Server (has to work in <= 2005, though I'd be interested in any solutions which rely upon >= 2008 features regardless).
In addition, note that A and B are going to be in the range 1-100 (but that's not guaranteed forever. They are surrogate seeded integer foreign keys, however the foreign table might grow to a couple hundred rows max).
I'm wondering whether I'm missing some obvious solution here. The ones which have occurred all seem rather overwrought, though I do think they'd probably work, e.g.:-
Have a subquery return a bitfield with each bit corresponding to one of the ids and use this value to remove duplicates.
Somehow, pivot, remove duplicates, then unpivot. Likely to be tricky.
Thanks in advance!
Test data and sample below.
Basically, we do a self-join with an OR condition, so a row matches another row if either a = a and b = b, or a = b and b = a.
The WHERE a.id > b.id in the subquery returns the higher id of each matching pair, which is the row to eliminate.
I think this should work for triplicates as well (note I added a 6th row).
DECLARE @t table(id int, a int, b int)
INSERT INTO @t
VALUES
(1,1,2),
(2,2,1),
(3,3,4),
(4,0,5),
(5,5,0),
(6,5,0)
SELECT *
FROM @t
WHERE id NOT IN (
SELECT a.id
FROM @t a
INNER JOIN @t b
ON (a.a=b.a
AND a.b=b.b)
OR
(a.b=b.a
AND a.a = b.b)
WHERE a.id > b.id)
Try:
select min(Id) Id, A, B
from (select Id, A, B from DuplicatesTable where A <= B
union all
select Id, B A, A B from DuplicatesTable where A > B) v
group by A, B
order by 1
Not 100% tested and I'm sure it can be tidied up but it produces your required result:
DECLARE @T TABLE (id INT IDENTITY(1,1), A INT, B INT)
INSERT INTO @T
VALUES (1,2), (2,1), (3,4), (0,5), (5,0);
SELECT *
FROM @T
WHERE id IN (SELECT DISTINCT MIN(id)
FROM (SELECT id, a, b
FROM @T
UNION ALL
SELECT id, b, a
FROM @T) z
GROUP BY a, b)
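The self-join answer and the MIN(id) answer can be checked against each other on the six-row test data, including the triplicate row 6. This sketch uses Python's `sqlite3` as a stand-in for SQL Server. The self-join aliases are renamed to `x`/`y` only to avoid confusion with the columns `a` and `b`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT, a INT, b INT);
INSERT INTO t VALUES (1,1,2), (2,2,1), (3,3,4), (4,0,5), (5,5,0), (6,5,0);
""")

# Self-join answer: drop any row that has a lower-id row with the same pair, in either order.
SELF_JOIN = """
SELECT * FROM t
WHERE id NOT IN (
  SELECT x.id FROM t x JOIN t y
    ON (x.a = y.a AND x.b = y.b) OR (x.b = y.a AND x.a = y.b)
  WHERE x.id > y.id)
ORDER BY id
"""

# MIN(id) answer: list every row in both column orders, then keep the lowest id per pair.
MIN_ID = """
SELECT * FROM t
WHERE id IN (
  SELECT MIN(id) FROM (SELECT id, a, b FROM t UNION ALL SELECT id, b, a FROM t)
  GROUP BY a, b)
ORDER BY id
"""

self_join_rows = conn.execute(SELF_JOIN).fetchall()
min_id_rows = conn.execute(MIN_ID).fetchall()
print(self_join_rows)  # [(1, 1, 2), (3, 3, 4), (4, 0, 5)]
print(min_id_rows)     # [(1, 1, 2), (3, 3, 4), (4, 0, 5)]
```

Both keep the original row (with its original column order) for the lowest id of each pair. The `min(Id)` + `GROUP BY A, B` answer, by contrast, returns the normalized pair with the smaller value in A.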

Postgresql: Insert the cartesian product of two or more sets

By definition, the cartesian product of two sets is the set of all possible pairs of elements from these sets, so {A,B} x {a,b} = {(A,a),(A,b),(B,a),(B,b)}.
Now I want to insert such a cartesian product into a database table (each pair as a row). The intent is to fill the table with default values for each pair, so the data, i.e. the two sets, are not present in the database at this point.
Any idea how to achieve this with postgresql?
EDIT :
With the help of Grzegorz Szpetkowski's answer I was able to produce a query that does what I want to achieve, but it really isn't the prettiest one. Suppose I want to insert the cartesian product of the sets {1,2,3} and {'A','B','C'}.
INSERT INTO "Test"
SELECT * FROM
(SELECT 1 UNION SELECT 2 UNION SELECT 3) P
CROSS JOIN
(SELECT 'A' UNION SELECT 'B' UNION SELECT 'C') Q
Is there any better way to do this?
EDIT2 :
The accepted answer is fine, but I found another version which might be appropriate if it gets more complex:
CREATE TEMP TABLE "Numbers" (ID integer) ON COMMIT DROP;
CREATE TEMP TABLE "Chars" (Char character varying) ON COMMIT DROP;
INSERT INTO "Numbers" (ID) VALUES (1),(2),(3);
INSERT INTO "Chars" (Char) VALUES ('A'),('B'),('C');
INSERT INTO "Test"
SELECT * FROM
"Numbers"
CROSS JOIN
"Chars";
I am not sure if this really answers your question, but in PostgreSQL there is CROSS JOIN defined as:
For every possible combination of rows from T1 and T2 (i.e., a
Cartesian product), the joined table will contain a row consisting of
all columns in T1 followed by all columns in T2. If the tables have N
and M rows respectively, the joined table will have N * M rows.
FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2. It is also
equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below).
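The three equivalent spellings quoted above can be compared directly. This is a small sketch using Python's `sqlite3`, where `ON 1 = 1` stands in for `ON TRUE` (older SQLite versions lack the TRUE keyword). With N = 3 and M = 2 rows, each form should return N * M = 6 rows. The tables `t1` and `t2` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (n INT);
CREATE TABLE t2 (l TEXT);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES ('A'), ('B');
""")

FORMS = {
    "cross join": "SELECT n, l FROM t1 CROSS JOIN t2",
    "comma": "SELECT n, l FROM t1, t2",
    "inner join on true": "SELECT n, l FROM t1 INNER JOIN t2 ON 1 = 1",
}
# Sort each result, since none of the queries specifies an order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in FORMS.items()}
for name, rows in results.items():
    print(name, len(rows), rows)
```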
EDIT:
One way is to use VALUES lists (note that the result has no inherent order; use an ORDER BY clause to get a specific ordering):
SELECT N AS number, L AS letter FROM
(VALUES (1), (2), (3)) a(N)
CROSS JOIN
(VALUES ('A'), ('B'), ('C')) b(L);
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
2 | A
2 | B
2 | C
3 | A
3 | B
3 | C
(9 rows)
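For the question's actual goal, inserting the pairs, the VALUES-list cross join can feed an INSERT directly. This is a sketch using Python's `sqlite3` rather than PostgreSQL. SQLite has no `a(N)` column-alias syntax for VALUES lists and names their columns `column1`, `column2`, and so on, hence `n.column1`. The `"Test"` column names `number` and `letter` are assumptions.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Test" (number INTEGER, letter TEXT)')

# The CROSS JOIN pairs every number with every letter before the INSERT.
conn.execute("""
INSERT INTO "Test" (number, letter)
SELECT n.column1, l.column1
FROM (VALUES (1), (2), (3)) AS n
CROSS JOIN (VALUES ('A'), ('B'), ('C')) AS l
""")

inserted = conn.execute('SELECT number, letter FROM "Test" ORDER BY number, letter').fetchall()
expected = list(itertools.product([1, 2, 3], ["A", "B", "C"]))  # same set, built in Python
print(inserted == expected)
```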
BTW:
For more numbers, I believe it's handy to use the generate_series function, e.g.:
SELECT n AS number, chr(ascii('A') + L - 1) AS letter
FROM
generate_series(1, 5) N
CROSS JOIN
generate_series(1, 5) L
ORDER BY N, L;
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
1 | D
1 | E
2 | A
2 | B
2 | C
2 | D
2 | E
3 | A
3 | B
3 | C
3 | D
3 | E
4 | A
4 | B
4 | C
4 | D
4 | E
5 | A
5 | B
5 | C
5 | D
5 | E
(25 rows)
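Where generate_series isn't available, a recursive CTE can play its role. This is a sketch in SQLite via Python's `sqlite3`. It swaps in `char(unicode('A') + l.x - 1)` for PostgreSQL's `chr(ascii('A') + L - 1)`, since SQLite's core functions are `char` and `unicode`. The CTE name `series` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# series(x) yields 1..5, like generate_series(1, 5); it is used twice in the CROSS JOIN.
QUERY = """
WITH RECURSIVE series(x) AS (
  SELECT 1
  UNION ALL
  SELECT x + 1 FROM series WHERE x < 5
)
SELECT n.x AS number, char(unicode('A') + l.x - 1) AS letter
FROM series AS n
CROSS JOIN series AS l
ORDER BY n.x, l.x
"""
rows = conn.execute(QUERY).fetchall()
print(len(rows), rows[:3])  # 25 [(1, 'A'), (1, 'B'), (1, 'C')]
```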