Issue in joining 2 datasets

Issue in joining 2 datasets - sql

I have two datasets like below:
1:
+---------------------------+
| Id | Col1 | Col2 | Col3 |
+---------------------------+
| 1 | abc | 0 | 01/01/2010 |
| 2 | def | 10 | 10/10/2011 |
+---------------------------+
2:
+-------------------------------------------+
| Id | Col4 | Col5 | Col6 |
+-------------------------------------------+
| 1 | abc | 0 | 01/01/2010 |
| 5 | xyz | 12 | 5/6/2013 |
+-------------------------------------------+
Now I want to combine both these into a single dataset which shows something like this:
+----------------------------------------------------------------------+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+----------------------------------------------------------------------+
| 1 | abc | 0 | 01/01/2010 | abc | 0 | 01/01/2010 |
| 2 | def | 10 | 10/10/2011 | null | null | null |
| 5 | null | null | null | xyz | 12 | 5/6/2013 |
+----------------------------------------------------------------------+
The issue is not all ids in dataset 1 are in dataset 2 and vice versa. What i need as all data from datasets1 and 2 and only the common from 1 and 2 with 2 transposed with 1 as shown above. I have used pipe as a separator.
An inputs are highly appreciated. i tried everything like full outer join, inner join , CTE etc - nothing is working.
CREATE TABLE #TEMP1 (ID INT, Col1 VARCHAR(100), Col2 INT, Col3 DATETIME)
CREATE TABLE #TEMP2 (ID INT, Col4 VARCHAR(100), Col5 INT, Col6 DATETIME)
INSERT INTO #TEMP1 VALUES (1,'abc',0,'1/1/2010')
INSERT INTO #TEMP1 VALUES (1,'def',0,'1/1/2010')
INSERT INTO #TEMP2 VALUES (1,'abc',0,'1/1/2010')
INSERT INTO #TEMP2 VALUES (1,'def',0,'1/1/2010')
SELECT DISTINCT A.ID,A.Col1,A.Col2,A.Col3,B.Col4,B.Col5,B.Col6
FROM #TEMP1 A
FULL OUTER JOIN #TEMP2 B ON A.ID = B.ID
Thanks.

Try using below SQL :
select t1.Id , Col1 , Col2 , Col3 , Col4 , Col5 , Col6
from temp1 t1 left join temp2 t2
on t1.Id=t2.Id
union
select t2.Id , Col1 , Col2 , Col3 , Col4 , Col5 , Col6
from temp1 t1 right join temp2 t2
on t1.Id=t2.Id
Also, i tried on fiddle for you :
http://sqlfiddle.com/#!2/d60a1e/5

Related

Insert into table, in 2 phases

I have two tables: table1 and table2 (The tables are almost identical, table2 has an extra field. 30 columns in table1, and 31 columns in table2. the extra column is a key).
I also have a procedure, which gets a number at first. If the number is above 10, I want to isnert the row in all the 30 columns from table1 to table2. Otherwise, I want to insert columns 1 to 20 (from table1), and insert columns 20-30 multiplied by 30. I have created two different "INSERT INTO TABLE" for each situation (above/under 10), but I belive there is more efficient way, since the first 20 rows should be the same in every case. I thought to insert first the first 20 rows, and after enter "IF" statement and then 'insert' according to the given parameter the remain columns. But ofcourse I'm getting two rows instead of one.
what is the solution, so I will insert all the data into one row?
Here is an example with 10 columns (instead of 30). In this example, if the paramter is above 10, we'll insert the row as is into table2. otherwise, we will insert col1-col7, and multiply col8-col10.
parameter = 15
table1
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 5 | 5 | 5 |
table2 (Identical to table 1, because the parameter > 10 )
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 5 | 5 | 5 |
If the parameter was parameter = 3 , then table two was:
table2 (columns 8-10 multiplied)
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |Col10 |
======+======+======+======+======+======+======+======+======+======+
1 | 1 | 1 | 2 | 2 | 2 | 2 | 150 | 150 | 150 |
A template to my code:
if #Parameter >10
begin
INSERT INTO Table1
(Col1
,Col2
,Col3
...
,Col29
,Col30)
SELECT
Col1
,Col2
,Col3
...
,Col29
,Col30
FROM ...
wHERE ...
end
else
begin
INSERT INTO Table1
(Col1
,Col2
,Col3
...
,Col29
,Col30)
SELECT
Col1
,Col2
,Col3
...
,Col29
,Col30
FROM ...
wHERE ...
end
Right now I have more then 120 lines, when 2/3 from them are duplicated.
How can I make it more efficient?

As far as I could your problem you could use,INSERT ALL COMMAND
FOR eg:-
INSERT ALL
WHEN number>10 THEN
INTO table1 VALUES(col1,col2,col3)
INTO table1 VALUES(col1,col2,col3)
WHEN number<10
INTO table1 VALUES(col1,col2,col3)
INTO table1 VALUES(col1,col2,col3)
select * from dual
use condition as per your requirement

Translate table values to text following a fixed pattern

We use software to store combinations of financial elements. Those elements are allowed in certain combinations. Exceptions of these combinations are SQL-like statements in the front-end, and are saved as numerical values in a database table like the following example:
+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 |
+------+------+------+------+------+
| 1 | 2 | 4 | 5 | 1 |
+------+------+------+------+------+
| -1 | 2 | 6 | 4 | 5 |
+------+------+------+------+------+
| 1 | 2 | 5 | 7 | 1 |
+------+------+------+------+------+
I would like to translate those numerical values back to a SQL-statement like the following example:
+------+-----------+------+-----------+------+-----------+------+-----------+------+-----------+
| Col1 | Col1Trans | Col2 | Col2Trans | Col3 | Col3Trans | Col4 | Col4Trans | Col5 | Col5Trans |
+------+-----------+------+-----------+------+-----------+------+-----------+------+-----------+
| 1 | ( | 2 | SELECT | 4 | CODE | 5 | LIKE | 1 | * |
+------+-----------+------+-----------+------+-----------+------+-----------+------+-----------+
| -1 | | 2 | SELECT | 6 | NUMBER | 4 | = | 5 | AND |
+------+-----------+------+-----------+------+-----------+------+-----------+------+-----------+
| 1 | ( | 2 | SELECT | 5 | TOOL | 7 | <> | 1 | * |
+------+-----------+------+-----------+------+-----------+------+-----------+------+-----------+
The numerical values differ in each column so I can only imagine the use of a lot of case...when statements which I doubt will be efficiënt. I don't want to create tables to hold the translation values. Are there ways to do this with arrays?
Are there any code samples to easily loop through table/columns and translate the contents of it?

You can use below code and add more case statement as per the requirement.
SELECT Col1
,CASE
WHEN Col1 = 1 THEN '('
ELSE '' END AS Col1Trans
,Col2
,CASE
WHEN Col2 = 2 THEN 'SELECT'
END AS Col2Trans
,Col3
,CASE
WHEN Col3 = 4 THEN 'CODE'
WHEN Col3 = 6 THEN 'NUMBER'
WHEN Col3 = 5 THEN 'TOOL'
END AS Col3Trans
,Col4
,CASE
WHEN Col4 = 5 THEN 'LIKE'
WHEN Col4 = 4 THEN '='
WHEN Col4 = 7 THEN '<>'
END AS Col4Trans
,Col5
,CASE
WHEN Col5 = 1 THEN '*'
WHEN Col5 = 5 THEN 'AND'
END AS Col5Trans

The best way to avoid so many case when and decode and etc is to use with as clause as following:
With col1trans (value, translation) as
(Select 1, '(' from dual union all
Select -1, null from dual),
Col2trans (value, translation) as
(Select 2, 'SELECT' from dual)
..
... till col5trans
Select m.col1, t1.translation as col1trans,
.... till m.col5, t5.translation
From your_table m join col1trans t1 m.col1=t1.value
join col2trans t2 m.col2=t2.value
... till col5trans
Cheers!!

SQL Server select column names from multiple tables

I have three tables in SQL Server with following structure:
col1 col2 a1 a2 ... an,
col1 col2 b1 b2 ... bn,
col1 col2 c1 c2 ... cn
The two first records are the same, col1 and col2, however the tables have different lengths.
I need to select the column names of the tables and the result I'm trying to achieve is the followig:
col1, col2, a1, b1, c1, a2, b2, c2 ...
Is there a way to do it?

It's possible but result's is combined into single column of three table tables.
For example
SELECT A.col1 +'/' +B.col1 +'/' + C.col1 As Col1 ,
A.col2 +'/' +B.col2 +'/' + C.col2 As col2 ,a1, b1, c1, a2, b2, c2 ,
* FROM A
INNER JOIN B
ON A.ID =B.ID
INNER JOIN C
ON C.ID = B.ID

SQL-Server is not the right tool to create a generic resultset. The engine needs to know what's coming out in advance. Well, you might try to find a solution with dynamic SQL...
I want to suggest two different approaches.
Both would work with any number of tables, as long as all of them have the columns col1 and col2 with appropriate types.
Let's create a simple mokcup scenario before:
DECLARE #mockup1 TABLE(col1 INT,col2 INT,SomeMore1 VARCHAR(100),SomeMore2 VARCHAR(100));
INSERT INTO #mockup1 VALUES(1,1,'blah 1.1','blub 1.1')
,(1,2,'blah 1.2','blub 1.2')
,(1,100,'not in t2','not in t2');
DECLARE #mockup2 TABLE(col1 INT,col2 INT,OtherType1 INT,OtherType2 DATETIME);
INSERT INTO #mockup2 VALUES(1,1,101,GETDATE())
,(1,2,102,GETDATE()+1)
,(1,200,200,GETDATE()+200);
--You can add as many tables as you need
A very pragmatic approach:
Try this simple FULL OUTER JOIN:
SELECT *
FROM #mockup1 m1
FULL OUTER JOIN #mockup2 m2 ON m1.col1=m2.col1 AND m1.col2=m2.col2
--add more tables here
The result
+------+------+-----------+-----------+------+------+------------+-------------------------+
| col1 | col2 | SomeMore1 | SomeMore2 | col1 | col2 | OtherType1 | OtherType2 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 1 | blah 1.1 | blub 1.1 | 1 | 1 | 101 | 2019-03-08 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 2 | blah 1.2 | blub 1.2 | 1 | 2 | 102 | 2019-03-09 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 100 | not in t2 | not in t2 | NULL | NULL | NULL | NULL |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| NULL | NULL | NULL | NULL | 1 | 200 | 200 | 2019-09-24 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
But you will have to deal with non-unique column names... (This is the moment, where a dynamically created statement can help).
A generic approach using container type XML
Whenever you do not know the result in advance, you can pack the result in a container. This allows a clear structure on the side of your RDBMS and shifts the troubles how to deal with this set to the consumer.
The cte will read all existing pairs of col1 and col2
Each table's row(s) for the pair of values is inserted as XML
Pairs not existing in any of the tables show up as NULL
Try this out
WITH AllDistinctCol1Col2Values AS
(
SELECT col1,col2 FROM #mockup1
UNION ALL
SELECT col1,col2 FROM #mockup2
--add all your tables here
)
SELECT col1,col2
,(SELECT * FROM #mockup1 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content1
,(SELECT * FROM #mockup2 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content2
FROM AllDistinctCol1Col2Values c1c2
GROUP BY col1,col2;
The result
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| col1 | col2 | Content1 | Content2 |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | <row><col1>1</col1><col2>1</col2><SomeMore1>blah 1.1</SomeMore1><SomeMore2>blub 1.1</SomeMore2></row> | <row><col1>1</col1><col2>1</col2><OtherType1>101</OtherType1><OtherType2>2019-03-08T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 2 | <row><col1>1</col1><col2>2</col2><SomeMore1>blah 1.2</SomeMore1><SomeMore2>blub 1.2</SomeMore2></row> | <row><col1>1</col1><col2>2</col2><OtherType1>102</OtherType1><OtherType2>2019-03-09T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 100 | <row><col1>1</col1><col2>100</col2><SomeMore1>not in t2</SomeMore1><SomeMore2>not in t2</SomeMore2></row> | NULL |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 200 | NULL | <row><col1>1</col1><col2>200</col2><OtherType1>200</OtherType1><OtherType2>2019-09-24T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+

Exclude rows with the same values in some columns

I have the table like following:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
1 | abc | 23 | data1 | otherdata1
2 | def | 41 | data2 | otherdata2
3 | ghi | 41 | data3 | otherdata3
4 | jkl | 58 | data4 | otherdata4
5 | mno | 23 | data1 | otherdata5
6 | pqr | 41 | data3 | otherdata6
7 | stu | 76 | data2 | otherdata7
How can I fast select rows where col2+col3 doesn't have duplicates? There is over 15 millions of rows in the table, so join may be not suitable.
Final result should look like this:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
2 | def | 41 | data2 | otherdata2
4 | jkl | 58 | data4 | otherdata4
7 | stu | 76 | data2 | otherdata7

Not sure how fast this will be, but this should work:
select id, col1, col2, col3, col4
from (
select id, col1, col2, col3, col4,
count(*) over (partition by col2, col3) as cnt
from the_table
) t
where cnt = 1
order by id;

Window functions are definitely one possibility. But, if you care about performance, it is also worth trying another approach and comparing the speed.
NOT EXISTS comes to mind:
select t.*
from table t
where not exists (select 1
from table t2
where t2.col2 = t.col2 and t2.col3 = t.col3 and
t2.id <> t.id
);
This can take advantage of an index on table(col2, col3).

Try this as well..
select * from
(
select id,col1,col2,col3,col4
,row_number() over (partition by col2,col3 order by col2,col3 desc ) as rnm
from
table
) x where rnm =1;

TSQL select the from two rows that has higher priority and is not null

I try to consolidate two rows of the same table whereas each row has a priority.
The value of interest is the value having priority 1 if it is not NULL; otherwise the value with priority 0.
An example data source could be:
| Id | GroupId | Priority | Col1 | Col2 | Col3 | ... | Coln |
-----------------------------------------------------------------
| 1 | 1 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 2 | 1 | 1 | NULL | NULL | 2.83 | ... | bar |
| 3 | 2 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 4 | 2 | 1 | 23 | NULL | 2.83 | ... | NULL |
and I want to have:
| GroupId | Col1 | Col2 | Col3 | ... | Coln |
-------------------------------------------------
| 1 | NULL | 4711 | 2.83 | ... | bar |
| 2 | 23 | 4711 | 2.83 | ... | f00 |
Is there a generic way in TSQL without the need to check each column explicitly?
SELECT
t1.GroupId,
ISNULL(t2.Col1, t1.Col1) as Col1,
ISNULL(t2.Col2, t1.Col2) as Col2,
ISNULL(t2.Col3, t1.Col3) as Col3,
...
ISNULL(t2.Coln, t1.Coln) as Coln
FROM mytable t1
JOIN mytable t2 ON t1.GroupId = t2.GroupId
WHERE
t1.Priority = 0 AND
t2.Priority = 1
Regards

I'll elaborate the ROW_NUMBER() solution that #KM suggested since IMO it's the best solution for this. (In CTE form for easier readability)
WITH cte AS (
SELECT
t1.GroupId,
t1.Col1,
t1.Col2,
ROW_NUMBER() OVER(PARTITION BY t1.GroupId ORDER BY ISNULL(GroupId ,-1) ) AS [row_id]
FROM
mytable t1
)
SELECT
*
FROM
cte
WHERE
row_id = 1
That will give you the row with the highest priority (according to your rules) for each GroupId in mytable.
ROW_NUMBER and RANK are two of my favorite TSQL tricks. http://msdn.microsoft.com/en-us/library/ms186734.aspx
edit: Another favorite of mine is PIVOT/UNPIVOT which you can use to transpose rows/columns which is another way of going about this type of problem. http://msdn.microsoft.com/en-us/library/ms177410.aspx

I think this would do what you are asking for without using isnull for every column
select
*
from
mytable t1
where
priority=(select max(priority) from mytable where groupid=t1.groupid group by groupid)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Issue in joining 2 datasets - sql

Related

Insert into table, in 2 phases

Translate table values to text following a fixed pattern

SQL Server select column names from multiple tables

Exclude rows with the same values in some columns

TSQL select the from two rows that has higher priority and is not null

Categories

Resources