SELECT values to create a unique row with optional foreign key - sql

I have several tables in a SQL Server database, two of them (Table1 and Table2) I would like to select a specific subset from, to fill in a third table (Table3).
In Table1 there are 25 columns, from which I am only interested in three, let's call them Col1, Col2 and Col3. All three are not unique in this table, but I would like to extract the unique pairs, as follows:
Col1 + Col2 = Unique Key for Table3.
Col3 + Col2 = Optional, foreign key into Table2.
To extract the unique keys for Table3 from Table1 the following SQL works fine:
SELECT Col1, Col2
FROM Table1
GROUP BY Col1, Col2
However this is missing Col3. The first problem is that Col3 can't simply be added as part of the GROUP BY as there can be different values for it, which causes duplicate combinations of Col1 + Col2 to be returned.
This is where Table2 comes into play; Col3 + Col2 form a unique key into Table2, but not every combination is present (which is helpful), as a JOIN can be used to filter away the invalid combinations:
SELECT a.Col1, a.Col2, a.Col3
FROM Table1 a
JOIN Table2 b ON b.Col3 = a.Col3 AND b.Col2 = a.Col2
GROUP BY a.Col1, a.Col2, a.Col3
Now my final problem, unfortunately there are some (very few) combinations that do result in duplicate Col1 + Col2 keys for Table3.
If we assume it is okay to lose some Col3 values, how can I write a SELECT to extract the three columns, ensuring that the combination Col1 + Col2 is unique? And if possible keeping a Col3 value that provides a valid key combination in Table2.
I've messed about with adding TOP 1 but I've failed in getting anything to work to my liking...
EDIT: Example data as requested.
Table1
| Col1 | Col2 | Col3 |
| 100 | 00 | 010 |
| 100 | 10 | 020 |
| 200 | 00 | 030 |
| 300 | 00 | 040 |
| 300 | 00 | 040 |
| 400 | 10 | 050 |
| 400 | 10 | 060 |
| 400 | 10 | 070 |
Table2
| Colx | Col2 | Col3 |
| car | 00 | 010 |
| cat | 10 | 030 |
| dog | 00 | 040 |
| bee | 10 | 040 |
| eye | 10 | 060 |
| bit | 10 | 070 |
Table3
| Col1 | Col2 | Col3 |
| 100 | 00 | 010 |
| 100 | 10 | 020 |
| 200 | 00 | 030 |
| 300 | 00 | 040 |
| 400 | 10 | 060 |
The third table shows the result I am looking for: it contains only unique combinations of Col1 + Col2, each with a Col3 value, preferably one that forms a valid combination with Col2 in the second table (i.e. the last entry, 400, 10, 060).
I hope this provides a little more clarity.

Maybe this way?
SELECT a.Col1, a.Col2, Max(a.Col3)
FROM Table1 a
LEFT JOIN Table2 b ON b.Col3 = a.Col3 AND b.Col2 = a.Col2
GROUP BY a.Col1, a.Col2
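A possible variation on this idea, sketched under assumptions: take the aggregate over b.Col3 (the values that actually matched in Table2) and fall back to a.Col3 only when nothing matched. The sketch below runs the query against the question's sample data in an in-memory SQLite database through Python's sqlite3 module, purely so it can be executed; SQLite stands in for SQL Server here, the columns are stored as text to keep the leading zeros, and MIN is an arbitrary choice of which valid value to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
CREATE TABLE Table2 (Colx TEXT, Col2 TEXT, Col3 TEXT);
INSERT INTO Table1 VALUES
  ('100','00','010'), ('100','10','020'), ('200','00','030'),
  ('300','00','040'), ('300','00','040'),
  ('400','10','050'), ('400','10','060'), ('400','10','070');
INSERT INTO Table2 VALUES
  ('car','00','010'), ('cat','10','030'), ('dog','00','040'),
  ('bee','10','040'), ('eye','10','060'), ('bit','10','070');
""")

# b.Col3 is NULL for rows without a match in Table2, so MIN(b.Col3)
# only sees valid combinations; COALESCE falls back to any a.Col3.
query = """
SELECT a.Col1, a.Col2,
       COALESCE(MIN(b.Col3), MIN(a.Col3)) AS Col3
FROM Table1 a
LEFT JOIN Table2 b ON b.Col3 = a.Col3 AND b.Col2 = a.Col2
GROUP BY a.Col1, a.Col2
ORDER BY a.Col1, a.Col2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The SELECT itself uses only COALESCE, MIN, LEFT JOIN and GROUP BY, so it should carry over to SQL Server unchanged, but that has not been verified here.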

"The first problem is that Col3 can't simply be added as part of the GROUP BY as there can be different values for it, which causes duplicate combinations of Col1 + Col2 to be returned."
You can put the queries for different values into a select union subquery as a derived table, and group by on the derived table.

Insert into table, in 2 phases

I have two tables, table1 and table2. They are almost identical, except that table2 has one extra field: table1 has 30 columns and table2 has 31, the extra column being a key.
I also have a procedure that first receives a number. If the number is above 10, I want to insert the row with all 30 columns from table1 into table2. Otherwise, I want to insert columns 1 to 20 (from table1) as they are, and insert columns 20-30 multiplied by 30. I have written two separate "INSERT INTO TABLE" statements, one for each situation (above/under 10), but I believe there is a more efficient way, since the first 20 columns should be the same in every case. I thought of inserting the first 20 columns first, and then using an "IF" statement to insert the remaining columns according to the given parameter. But of course I end up with two rows instead of one.
What is the solution, so that I insert all the data into one row?
Here is an example with 10 columns (instead of 30). In this example, if the parameter is above 10, we insert the row as is into table2. Otherwise, we insert col1-col7 unchanged and multiply col8-col10 by 30.
parameter = 15
table1
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 5 | 5 | 5 |
table2 (Identical to table 1, because the parameter > 10 )
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 5 | 5 | 5 |
If the parameter were 3 (parameter = 3), table2 would be:
table2 (columns 8-10 multiplied)
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 150 | 150 | 150 |
A template to my code:
if @Parameter > 10
begin
INSERT INTO Table1
(Col1
,Col2
,Col3
...
,Col29
,Col30)
SELECT
Col1
,Col2
,Col3
...
,Col29
,Col30
FROM ...
wHERE ...
end
else
begin
INSERT INTO Table1
(Col1
,Col2
,Col3
...
,Col29
,Col30)
SELECT
Col1
,Col2
,Col3
...
,Col29
,Col30
FROM ...
wHERE ...
end
Right now I have more than 120 lines, of which 2/3 are duplicated.
How can I make it more efficient?
As far as I understand your problem, you could use the INSERT ALL command.
For example:
INSERT ALL
WHEN number>10 THEN
INTO table1 VALUES(col1,col2,col3)
INTO table1 VALUES(col1,col2,col3)
WHEN number<10 THEN
INTO table1 VALUES(col1,col2,col3)
INTO table1 VALUES(col1,col2,col3)
select * from dual
Adjust the conditions to your requirements.
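Another way to avoid the duplicated statement, sketched here as an assumption rather than something taken from the question: keep a single INSERT ... SELECT and let a CASE expression decide the multiplier. The example uses the 10-column sample from the question in an in-memory SQLite database via Python's sqlite3 module, only so that it can be run; the helper name copy_rows and the bound parameter :p are made up for illustration. In T-SQL the same expression would presumably read CASE WHEN @Parameter > 10 THEN 1 ELSE 30 END.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"Col{i}" for i in range(1, 11))
conn.execute(f"CREATE TABLE table1 ({cols})")
conn.execute(f"CREATE TABLE table2 ({cols})")
conn.execute("INSERT INTO table1 VALUES (1, 1, 1, 2, 2, 2, 2, 5, 5, 5)")


def copy_rows(parameter):
    """Hypothetical helper: copy table1 into table2 with one statement."""
    # The factor is 1 when the parameter is above 10, otherwise 30,
    # so Col1-Col7 are written only once for both cases.
    conn.execute(
        f"""
        INSERT INTO table2 ({cols})
        SELECT Col1, Col2, Col3, Col4, Col5, Col6, Col7,
               Col8 * f.factor, Col9 * f.factor, Col10 * f.factor
        FROM table1
        CROSS JOIN (SELECT CASE WHEN :p > 10 THEN 1 ELSE 30 END AS factor) AS f
        """,
        {"p": parameter},
    )


copy_rows(15)  # row copied as is
copy_rows(3)   # Col8-Col10 multiplied by 30
rows = conn.execute("SELECT * FROM table2 ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```

Computing the factor once in a small derived table keeps the column list free of repeated CASE expressions, which matters more with 30 columns than with 10.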

sql table comparisons - postgres

I have two tables with some columns being the same:
TABLE A
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | aa | ccc |
| 2 | null | ccc |
| null | bb | null |
TABLE B
|Col1 | Col2 | Col3| Col4 |
+------+-------+-----+------+
| 1 | aa | ccc | aaaa |
| 2 | null | ccc | cccc |
| null | bb | null | sss |
| 4 | bb | null | ddd |
I'd like to return the following:
|Col1 | Col2 | Col3| Col4 |
+------+-------+-----+------+
| 4 | bb | null | ddd |
How do I check what rows from table B are in table A and also return Col4 (from table B) where they match in the query.
I was using EXCEPT which worked great but now I need to have the outputs of Col4 in the returned query results.
Thanks.
Something like this?
SELECT Col1, Col2, Col3, Col4
FROM TableB
WHERE NOT EXISTS (
SELECT 1
FROM TableA
WHERE TableA.Col1 IS NOT DISTINCT FROM TableB.Col1
AND TableA.Col2 IS NOT DISTINCT FROM TableB.Col2
AND TableA.Col3 IS NOT DISTINCT FROM TableB.Col3
)
(Using IS NOT DISTINCT FROM to say that columns with null are equal to each other.)
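To check the NOT EXISTS idea against the sample data, here is a small runnable sketch. It uses an in-memory SQLite database through Python's sqlite3 module instead of Postgres, which is an assumption made only for convenience; SQLite's IS operator is used as the null-safe comparison in place of IS NOT DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Col1 INTEGER, Col2 TEXT, Col3 TEXT);
CREATE TABLE TableB (Col1 INTEGER, Col2 TEXT, Col3 TEXT, Col4 TEXT);
INSERT INTO TableA VALUES
  (1, 'aa', 'ccc'), (2, NULL, 'ccc'), (NULL, 'bb', NULL);
INSERT INTO TableB VALUES
  (1, 'aa', 'ccc', 'aaaa'), (2, NULL, 'ccc', 'cccc'),
  (NULL, 'bb', NULL, 'sss'), (4, 'bb', NULL, 'ddd');
""")

# In SQLite, "x IS y" is true when both sides are NULL, which mirrors
# what IS NOT DISTINCT FROM does in the Postgres query.
query = """
SELECT Col1, Col2, Col3, Col4
FROM TableB
WHERE NOT EXISTS (
    SELECT 1
    FROM TableA
    WHERE TableA.Col1 IS TableB.Col1
      AND TableA.Col2 IS TableB.Col2
      AND TableA.Col3 IS TableB.Col3
)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the row of TableB that has no null-safe match in TableA comes back, together with its Col4.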
The answer to your question is very easy: none of the rows from table 'A' are in table 'B'.
Those are separate tables and they have separate rows.
Now, if you want to find a row in table B that has similar values in its columns to a specific row in table A, you can do the following:
select a.*,
case when b.ctid is null then 'I AM A VERY SAD PENGUIN AND ROW IN B WAS NOT FOUND :('
else b.col4 end as col4
from table_a a
left join table_b b on
((a.col1 = b.col1 or (a.col1 is null and b.col1 is null)) and
(a.col2 = b.col2 or (a.col2 is null and b.col2 is null)) and
(a.col3 = b.col3 or (a.col3 is null and b.col3 is null))
)
I assume that by 'similar' you mean "if null appears in a column in both tables, the rows are still similar".
Notice the last insert into table_b that I added: you are not guaranteed to find unique values of col4!
dbfiddle, modified version of VBoka's answer
If you have no duplicates within the b table, then the following handles NULL values rather elegantly:
select col1, col2, col3, max(col4)
from ((select col1, col2, col3, null as col4, 'a' as which
from a
) union all
(select col1, col2, col3, col4, 'b' as which
from b
)
) b
group by col1, col2, col3
having min(which) = 'b';

SQL Server select column names from multiple tables

I have three tables in SQL Server with following structure:
col1 col2 a1 a2 ... an,
col1 col2 b1 b2 ... bn,
col1 col2 c1 c2 ... cn
The first two columns, col1 and col2, are the same in each table; however, the tables have different lengths.
I need to select the column names of the tables, and the result I'm trying to achieve is the following:
col1, col2, a1, b1, c1, a2, b2, c2 ...
Is there a way to do it?
It's possible, but the result combines the values of the three tables into a single column.
For example
SELECT A.col1 +'/' +B.col1 +'/' + C.col1 As Col1 ,
A.col2 +'/' +B.col2 +'/' + C.col2 As col2 ,a1, b1, c1, a2, b2, c2 ,
* FROM A
INNER JOIN B
ON A.ID =B.ID
INNER JOIN C
ON C.ID = B.ID
SQL-Server is not the right tool to create a generic resultset. The engine needs to know what's coming out in advance. Well, you might try to find a solution with dynamic SQL...
I want to suggest two different approaches.
Both would work with any number of tables, as long as all of them have the columns col1 and col2 with appropriate types.
Let's create a simple mockup scenario first:
DECLARE @mockup1 TABLE(col1 INT,col2 INT,SomeMore1 VARCHAR(100),SomeMore2 VARCHAR(100));
INSERT INTO @mockup1 VALUES(1,1,'blah 1.1','blub 1.1')
,(1,2,'blah 1.2','blub 1.2')
,(1,100,'not in t2','not in t2');
DECLARE @mockup2 TABLE(col1 INT,col2 INT,OtherType1 INT,OtherType2 DATETIME);
INSERT INTO @mockup2 VALUES(1,1,101,GETDATE())
,(1,2,102,GETDATE()+1)
,(1,200,200,GETDATE()+200);
--You can add as many tables as you need
A very pragmatic approach:
Try this simple FULL OUTER JOIN:
SELECT *
FROM @mockup1 m1
FULL OUTER JOIN @mockup2 m2 ON m1.col1=m2.col1 AND m1.col2=m2.col2
--add more tables here
The result
+------+------+-----------+-----------+------+------+------------+-------------------------+
| col1 | col2 | SomeMore1 | SomeMore2 | col1 | col2 | OtherType1 | OtherType2 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 1 | blah 1.1 | blub 1.1 | 1 | 1 | 101 | 2019-03-08 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 2 | blah 1.2 | blub 1.2 | 1 | 2 | 102 | 2019-03-09 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 100 | not in t2 | not in t2 | NULL | NULL | NULL | NULL |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| NULL | NULL | NULL | NULL | 1 | 200 | 200 | 2019-09-24 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
But you will have to deal with non-unique column names... (This is the moment, where a dynamically created statement can help).
A generic approach using container type XML
Whenever you do not know the result in advance, you can pack the result in a container. This allows a clear structure on the side of your RDBMS and shifts the troubles how to deal with this set to the consumer.
The cte will read all existing pairs of col1 and col2
Each table's row(s) for the pair of values is inserted as XML
Pairs not existing in any of the tables show up as NULL
Try this out
WITH AllDistinctCol1Col2Values AS
(
SELECT col1,col2 FROM @mockup1
UNION ALL
SELECT col1,col2 FROM @mockup2
--add all your tables here
)
SELECT col1,col2
,(SELECT * FROM @mockup1 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content1
,(SELECT * FROM @mockup2 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content2
FROM AllDistinctCol1Col2Values c1c2
GROUP BY col1,col2;
The result
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| col1 | col2 | Content1 | Content2 |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | <row><col1>1</col1><col2>1</col2><SomeMore1>blah 1.1</SomeMore1><SomeMore2>blub 1.1</SomeMore2></row> | <row><col1>1</col1><col2>1</col2><OtherType1>101</OtherType1><OtherType2>2019-03-08T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 2 | <row><col1>1</col1><col2>2</col2><SomeMore1>blah 1.2</SomeMore1><SomeMore2>blub 1.2</SomeMore2></row> | <row><col1>1</col1><col2>2</col2><OtherType1>102</OtherType1><OtherType2>2019-03-09T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 100 | <row><col1>1</col1><col2>100</col2><SomeMore1>not in t2</SomeMore1><SomeMore2>not in t2</SomeMore2></row> | NULL |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 200 | NULL | <row><col1>1</col1><col2>200</col2><OtherType1>200</OtherType1><OtherType2>2019-09-24T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+

Oracle group by only ONE column

I have a table in Oracle database, which have 40 columns.
I know that if I want to do a group by query, all the columns in select must be in group by.
I simply just want to do:
select col1, col2, col3, col4, col5 from table group by col3
If I try:
select col1, col2, col3, col4, col5 from table group by col1, col2, col3, col4, col5
It does not give the required output.
I have searched this, but did not find any solution. All the queries that I found using some kind of Add() or count(*) function.
In Oracle is it not possible to simply group by one column ?
UPDATE:
My apologies, for not being clear enough.
My Table:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 1 | 1 | some text 1 | 100 |
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 4 | 3 | some text 1 | 78 |
| 5 | 4 | some text 1 | 65 |
| 6 | 5 | some text 1 | 101 |
| 7 | 5 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 |
| 9 | 6 | some text 1 | 202 |
+--------+----------+-------------+-------+
and by running following query:
select col1, col2, col3 from table where col3='200' group by col1;
I will get the following desired Output:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
+--------+----------+-------------+-------+
Long comment here;
Yeah, you can't do that. Think about it... If you have a table like so:
Col1 Col2 Col3
A A 1
B A 2
C A 3
And you're grouping by only Col2, which will group down to a single row... what happens to Col1 and Col3? Both of those have 3 distinct row values.
How is your DBMS supposed to display those?
Col1 Col2 Col3
A? A 1?
B? 2?
C? 3?
This is why you have to group by all columns, or otherwise aggregate or concatenate them. (SUM(),MAX(), MIN(), etc..)
Show us how you want the results to look and I'm sure we can help you.
Edit - Answer:
First off, thanks for updating your question. Your query doesn't have id but your expected results do, so I will answer for each separately.
Without id
You will still need to group by all columns to achieve what you're going for. Let's walk through it.
If you run your query without any group by:
select col1, col2, col3 from table where col3='200'
You will get this back:
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 |
| 5 | some text 1 | 200 |
| 1 | some text 1 | 200 |
+----------+-------------+-------+
So now you want to only see the col1 = 1 row once. But to do so, you need to roll all of the columns up, so your DBMS knows what to do with each of them. If you try to group by only col1, your DBMS will throw an error because you didn't tell it what to do with the extra data in col2 and col3:
select col1, col2, col3 from table where col3='200' group by col1 --Errors
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 |
| 5 | some text 1 | 200 |
| ? | some text 1?| 200? |
+----------+-------------+-------+
If you group by all 3, your DBMS knows to group together the entire rows (which is what you want), and will only display duplicate rows once:
select col1, col2, col3 from table where col3='200' group by col1, col2, col3
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 | --Desired results
| 5 | some text 1 | 200 |
+----------+-------------+-------+
With id
If you want to see id, you will have to tell your DBMS which id to display. Even if we group by all columns, you won't get your desired results, because the id column will make each row distinct (They will no longer group together):
select id, col1, col2, col3 from table where col3='200' group by id, col1, col2, col3
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 | --id = 2
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 | --id = 8
+--------+----------+-------------+-------+
So in order to group these rows, we need to explicitly say what to do with the ids. Based on your desired results, you want to choose id = 2, which is the minimum id, so let's use MIN():
select MIN(id), col1, col2, col3 from table where col3='200' group by col1, col2, col3
--Note, MIN() is an aggregate function, so id need not be in the group by
Which returns your desired results (with id):
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
+--------+----------+-------------+-------+
Final thought
Here were your two trouble rows:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 |
+--------+----------+-------------+-------+
Any time you hit these, just think about what you want each column to do, one at a time. You will need to handle all columns any time you do grouping or aggregates.
id, you only want to see id = 2, which is the MIN()
col1, you only want to see distinct values, so GROUP BY
col2, you only want to see distinct values, so GROUP BY
col3, you only want to see distinct values, so GROUP BY
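The MIN(id) query from this answer can be tried against the question's sample table with the short sketch below. It runs in an in-memory SQLite database through Python's sqlite3 module rather than Oracle, which is an assumption made so the example is self-contained; the table is called t1 here because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER, col1 INTEGER, col2 TEXT, col3 INTEGER);
INSERT INTO t1 VALUES
  (1, 1, 'some text 1', 100), (2, 1, 'some text 1', 200),
  (3, 2, 'some text 1', 200), (4, 3, 'some text 1', 78),
  (5, 4, 'some text 1', 65),  (6, 5, 'some text 1', 101),
  (7, 5, 'some text 1', 200), (8, 1, 'some text 1', 200),
  (9, 6, 'some text 1', 202);
""")

# id is aggregated with MIN(), every other selected column is grouped,
# so the duplicate rows with id 2 and id 8 collapse into one row.
query = """
SELECT MIN(id) AS id, col1, col2, col3
FROM t1
WHERE col3 = 200
GROUP BY col1, col2, col3
ORDER BY col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```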
Maybe analytic functions are what you need.
Try something like this:
select col1, col2, col3, col4, col5
, sum(col4) over (partition by col1) as col1_summary -- sum() needs a column argument; col4 is only a placeholder
, count(*) over () as total_count
from t1
If you search for the topic, you will find thousands of examples,
for example this one:
Introduction to Analytic Functions (Part 1)
Why do you want to GROUP BY , wouldn't you want to ORDER BY instead?
If you state an English language version of the problem you are trying to solve (i.e. the requirements) it would be easier to be more specific.
I guess maybe you need the UNPIVOT function,
or post the specific final result you want.
select col3, col_group
from table
UNPIVOT ( col_group for value in ( col1,col2,col4,col5))
SELECT * FROM table
WHERE id IN (SELECT MIN(id) FROM table WHERE col3='200' GROUP BY col1)

how to find identical columns in different tables

I have two tables each with multiple columns.
+------+-------+--+
| Col1 | Col2 | |
+------+-------+--+
| 1 | 1231 | |
| 2 | 123 | |table 1
| 3 | 14124 | |
+------+-------+--+
+------+-------+--+
| Col3 | Col4 | |table 2
+------+-------+--+
| 1 | 1231 | |
| 2 | 323 | |
| 3 | 14324 | |
+------+-------+--+
I want to check whether Col1 and Col3 are identical, that is, whether all their values match. How can this be determined using SQL?
I don't want to use EXCEPT, and I also don't want to take the difference of the two columns and check whether it is zero.
Is there a more efficient way to do this?
Am I missing something? Can't you join the tables together and compare?
select
...
from table1
inner join table2
on table1.col1 = table2.col3
You can outer join them and filter on null results
select table_1.col1, table_2.col3
from table_1 full outer join table_2 on table_1.col1 = table_2.col3
where table_1.col1 is null
or table_2.col3 is null
This will give you all records where one of the values of either column does not exist in the other table.