How to join two tables in SQL Server without duplication


Hi, I have two tables, A and B.
Table A:
Order  Pick up
100    Toronto
100    Mississauga
100    Scarborough
Table B:
Order  Drop off
100    Oakville
100    Hamilton
100    Milton
Please let me know how I can get this output (i.e. I just want to put the fields from B on the right-hand side of A):
Order  Pick up      Drop off
100    Toronto      Oakville
100    Mississauga  Hamilton
100    Scarborough  Milton
How can I write a query for this? I tried to join on a.rownum = b.rownum, but no luck.

The OP has not mentioned an RDBMS, so I am taking the liberty of assuming SQL Server 2008. If needed, the following query can easily be converted to another RDBMS.
select A.[Order],
ROW_NUMBER() OVER(ORDER BY A.[Pick up]) rn1,
A.[Pick up]
into A1
FROM A
;
select B.[Order],
ROW_NUMBER() OVER(ORDER BY B.[Drop off]) rn2,
B.[Drop off]
into B1
FROM B
;
Select A1.[Order],
A1.[Pick up],
B1.[Drop off]
FROM A1
INNER JOIN B1 on A1.rn1=B1.rn2

From the use of rownum, I'm presuming that you are using Oracle. You can attempt the following:
-- ORDER is a reserved word, so the column has to be quoted;
-- adjust the case inside the quotes to match your table definition.
select a."Order" as "order", a.Pickup, b.DropOff
from (select a.*, rownum as seqnum
from a
) a join
(select b.*, rownum as seqnum
from b
) b
on a."Order" = b."Order" and a.seqnum = b.seqnum;
(This assumes that all orders match up exactly.)
I must emphasize that although this might seem to work (and it should work on small examples), it will not work in general. And, it will not work on data that has deleted records. And, it probably won't work on parallel systems. If you have a small amount of data, I'd suggest dumping it in Excel and doing the work there -- that way, you can see if the pairs make sense.
Also, if you do have a column that specifies the ordering, then basically the same structure will work:
select coalesce(a."Order", b."Order") as "order", a.Pickup, b.DropOff
from (select a.*,
row_number() over (partition by "Order" order by <ordering field>) as seqnum
from a
) a join
(select b.*,
row_number() over (partition by "Order" order by <ordering field>) as seqnum
from b
) b
on a."Order" = b."Order" and a.seqnum = b.seqnum;
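A minimal sketch of the ordered variant, written in Python with the standard sqlite3 module so it runs without a database server. The `seq` column is a hypothetical ordering field (the original tables do not have one), and SQLite stands in for Oracle here, so treat it as an illustration of the structure rather than the exact query to deploy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- seq is a hypothetical ordering column added for this example.
    CREATE TABLE a ("Order" INTEGER, pickup TEXT, seq INTEGER);
    CREATE TABLE b ("Order" INTEGER, dropoff TEXT, seq INTEGER);
    INSERT INTO a VALUES (100, 'Toronto', 1), (100, 'Mississauga', 2), (100, 'Scarborough', 3);
    INSERT INTO b VALUES (100, 'Oakville', 1), (100, 'Hamilton', 2), (100, 'Milton', 3);
""")

# Number the rows within each order on both sides, then pair them by position.
rows = conn.execute("""
    SELECT x."Order", x.pickup, y.dropoff
    FROM (SELECT "Order", pickup,
                 ROW_NUMBER() OVER (PARTITION BY "Order" ORDER BY seq) AS seqnum
          FROM a) AS x
    JOIN (SELECT "Order", dropoff,
                 ROW_NUMBER() OVER (PARTITION BY "Order" ORDER BY seq) AS seqnum
          FROM b) AS y
      ON x."Order" = y."Order" AND x.seqnum = y.seqnum
    ORDER BY x."Order", x.seqnum
""").fetchall()

for row in rows:
    print(row)
```

With an explicit ordering column the pairing is deterministic, which is the point of the caveats above about relying on rownum alone.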

I'd use a CTE along with the ROW_NUMBER windowing function.
WITH keyed_A AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Pick Up]
FROM A
), keyed_B AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Drop Off]
FROM B
)
SELECT
a.[Pick Up]
,b.[Drop Off]
FROM keyed_A AS a
INNER JOIN keyed_B AS b
ON a.id = b.id
;
The CTE can be thought of as a virtual table with an id that spans the two tables. The OVER clause with the windowing function ROW_NUMBER creates that id in the CTE. Since we are relying on the physical storage order of the records (not a good idea; please add keys to the tables), we can ORDER BY (SELECT NULL), which means the rows are simply numbered in whatever order they are read.
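To see why the generated id matters, here is a small sketch in Python's sqlite3 module comparing a join on Order alone with the keyed join. It is an illustration under assumptions: SQLite is used instead of SQL Server, and its implicit `rowid` (insertion order) stands in for ORDER BY (SELECT NULL), which has no guaranteed order in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A ("Order" INTEGER, "Pick Up" TEXT);
    CREATE TABLE B ("Order" INTEGER, "Drop Off" TEXT);
    INSERT INTO A VALUES (100, 'Toronto'), (100, 'Mississauga'), (100, 'Scarborough');
    INSERT INTO B VALUES (100, 'Oakville'), (100, 'Hamilton'), (100, 'Milton');
""")

# Joining on Order alone pairs every pick-up with every drop-off.
naive = conn.execute(
    'SELECT A."Pick Up", B."Drop Off" FROM A JOIN B ON A."Order" = B."Order"'
).fetchall()

# Joining on a generated id pairs the rows one to one.
keyed = conn.execute("""
    WITH keyed_A AS (
        SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS id, "Order", "Pick Up" FROM A
    ), keyed_B AS (
        SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS id, "Order", "Drop Off" FROM B
    )
    SELECT ka."Pick Up", kb."Drop Off"
    FROM keyed_A AS ka
    JOIN keyed_B AS kb ON ka.id = kb.id
    ORDER BY ka.id
""").fetchall()

print(len(naive), "rows from the naive join")
print(keyed)
```

The naive join returns 3 x 3 = 9 rows, which is the duplication the question asks about; the keyed join returns 3.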

Related

Joining two tables at random

I have two tables, one with main data, and another shorter table with additional data.
I would like to join the rows from the shorter table to some of the rows of the main table, at random. For example:
main table:
id  data
1   apple
2   banana
3   cherry
4   date
5   elderberry
6   fig
secondary table:
id  data
1   accordion
2   banjo
Desired Result:
main  secondary
… ?   accordion
… ?   banjo
I can think of one way to do it, using a lot of pre-processing with CTEs:
WITH
cte1 AS (SELECT data FROM main ORDER BY random() LIMIT 2),
cte2 AS (SELECT row_number() OVER() AS row, data FROM cte1),
cte3 AS (SELECT row_number() OVER () AS row, data FROM secondary)
SELECT *
FROM cte2 JOIN cte3 ON cte2.row=cte3.row;
It works, but is there a more straightforward way of joining two tables at random?
I have attached a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=21af08976112c7ac7c18329fa3699b8c&hide=2

A CTE is basically just a reusable template for a subquery.
So this can be golfed down to two subqueries.
SELECT m.rn, m.data main_data, s.data secondary_data
FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM main) m
JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM secondary) s USING (rn)
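A runnable sketch of the two-subquery version, using Python's sqlite3 module instead of PostgreSQL (SQLite also provides random() and ROW_NUMBER()). The table names `main_tbl` and `secondary_tbl` are stand-ins chosen for this example, and the join is written with ON rather than USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_tbl (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE secondary_tbl (id INTEGER PRIMARY KEY, data TEXT);
    INSERT INTO main_tbl (data) VALUES
        ('apple'), ('banana'), ('cherry'), ('date'), ('elderberry'), ('fig');
    INSERT INTO secondary_tbl (data) VALUES ('accordion'), ('banjo');
""")

# Shuffle both tables independently, then pair rows by their shuffled position.
# The inner join drops the main rows that have no partner in the shorter table.
pairs = conn.execute("""
    SELECT m.data AS main_data, s.data AS secondary_data
    FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM main_tbl) AS m
    JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM secondary_tbl) AS s
      ON m.rn = s.rn
""").fetchall()

for main_data, secondary_data in pairs:
    print(main_data, secondary_data)
```

Each run pairs both secondary rows with two different, randomly chosen main rows, so the printed output changes from run to run.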

I could rewrite it to this:
SELECT *
FROM (SELECT row_number() OVER (ORDER BY random()) as id,
data
FROM main
ORDER BY RANDOM()) m1
JOIN secondary s on s.id = m1.id
Update: LIMIT is not needed, as I realized after looking at @LukStorm's version.
I assumed that you know which table is shorter, so only one column of generated ids is needed.

Why are these two SQL queries so different in efficiency?

I have to use SQL for my internship, and while I know the gist of it, I do not really have a background in programming, nor do I know what makes code efficient.
Query #1
SELECT DISTINCT
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
FROM
(SELECT *
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM
TABLE) AS b
) AS c
LEFT JOIN
(SELECT
*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS d
FROM
TABLE) AS e
) AS f ON c.[ID] = f.[ID] AND a = d - 1
ORDER BY
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
Query #2
SELECT DISTINCT
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM TABLE) AS b
LEFT JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS c
FROM TABLE) AS d ON b.[ID] = d.[ID] AND a = c - 1
ORDER BY
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
Queries #1 and #2 return the same result, which is expected, but query #1 has a runtime of roughly 5 seconds while query #2 has a runtime of roughly 1 minute and 35 seconds. In other words, the second query takes a good 1.5 minutes longer to run than the first and I am really curious to know why.

The correct way to write this query uses lead(). I'm pretty sure the select distinct is not needed, so this does what you want:
SELECT stat, event,
LEAD(stat) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) as next_stat,
LEAD(event) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) as next_event
FROM TABLE t
ORDER BY stat, event;
The two queries you have written should be the same in SQL Server. Apparently, the extra subqueries are confusing the optimizer. You would need to learn about execution plans to understand this better.
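As a sanity check that LEAD() returns the same pairs as the ROW_NUMBER() self-join, here is a small sketch in Python's sqlite3 module. The `events` table and its sample rows are made up for illustration, SQLite stands in for SQL Server, and DISTINCT is left out on both sides so the rows can be compared one to one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; the real table is not shown in the question.
    CREATE TABLE events (id INTEGER, procdt TEXT, proctime TEXT, stat TEXT, event TEXT);
    INSERT INTO events VALUES
        (1, '2020-01-01', '09:00', 'OPEN',   'create'),
        (1, '2020-01-01', '10:00', 'ACTIVE', 'update'),
        (1, '2020-01-02', '08:00', 'CLOSED', 'close'),
        (2, '2020-01-01', '11:00', 'OPEN',   'create');
""")

# One pass over the table: LEAD looks at the next row within the same id.
lead_rows = conn.execute("""
    SELECT stat, event,
           LEAD(stat)  OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_stat,
           LEAD(event) OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_event
    FROM events
    ORDER BY id, procdt, proctime
""").fetchall()

# The self-join formulation from the question: row n joined to row n + 1.
join_rows = conn.execute("""
    WITH numbered AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY procdt, proctime) AS rn
        FROM events
    )
    SELECT c.stat, c.event, n.stat, n.event
    FROM numbered AS c
    LEFT JOIN numbered AS n ON c.id = n.id AND c.rn = n.rn - 1
    ORDER BY c.id, c.rn
""").fetchall()

print(lead_rows == join_rows)
```

The LEAD() version reads the table once and needs no join, which is why it is usually the easier form for an optimizer to handle.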

Return two rows from SQL table with a difference in values [duplicate]

This question already has answers here:
How to request a random row in SQL?
(30 answers)
Closed 6 years ago.
I am trying to return 2 rows from a table that differ in their DATA values. Not being an SQL wise man, I am stuck; any help would be appreciated :-)
TABLE A:
NAME DATA
Oscar HOME1
Jens HOME2
Will HOME1
Jeremy HOME2
Al HOME1
Result, should be 2 random rows with a difference in DATA value
NAME DATA
Oscar HOME1
Jeremy HOME2
Anyone?

Easy way to have random data.
;with tblA as (
select name,data,
row_number() over(partition by data order by newid()) rn
from A
)
select name,data
from tblA
where rn = 1

Could be you need
select * from my_table a
inner join my_table b on a.data !=b.data
where a.data in ( SELECT data FROM my_table ORDER BY RAND() LIMIT 1);
For your code
SELECT *
FROM [dbo].[ComputerState] as a
INNER JOIN [dbo].[ComputerState] as b ON a.ServiceName != b.ServiceName
WHERE a.ServiceName IN (
SELECT top 1 [ServiceName] FROM [dbo].[ComputerState]
);

If the question is really this simple, you can use an aggregate such as MAX() or MIN() to grab one row for each different DATA:
SELECT MAX(NAME), DATA
FROM TABLE_A
GROUP BY DATA
Of course, if any other variables are introduced to the requirements, this may no longer work.
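For illustration, the aggregate approach run against the sample data with Python's sqlite3 module (the table name `table_a` follows the query above; SQLite is an assumption made so the example runs anywhere). Note that the pick is deterministic, not random: MAX(NAME) always returns the alphabetically last name in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT, data TEXT);
    INSERT INTO table_a VALUES
        ('Oscar', 'HOME1'), ('Jens', 'HOME2'), ('Will', 'HOME1'),
        ('Jeremy', 'HOME2'), ('Al', 'HOME1');
""")

# One row per distinct DATA value; MAX(name) decides which name survives.
picked = conn.execute(
    "SELECT MAX(name), data FROM table_a GROUP BY data ORDER BY data"
).fetchall()

print(picked)  # [('Will', 'HOME1'), ('Jeremy', 'HOME2')]
```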

;WITH cteA AS (
SELECT
name
,data
,ROW_NUMBER() OVER (PARTITION BY data ORDER BY NEWID()) as DataRowNumber
FROM
A
), ctePicked AS (
SELECT
name
,data
,ROW_NUMBER() OVER (ORDER BY NEWID()) as RandomRowNumber
FROM
cteA
WHERE
DataRowNumber = 1
)
SELECT name, data
FROM
ctePicked
WHERE
RandomRowNumber <= 2
This expands on @AlexKudryashev's answer a little.
;with tblA as (
select name,data,
row_number() over(partition by data order by newid()) rn
from A
)
select name,data
from tblA
where rn = 1
The only issue with that answer is that the number of rows where rn = 1 depends on COUNT(DISTINCT data), so it could return more than 2 results. To fix that, one could add a SELECT TOP 2 clause, but the result might not be fully random at that point, because it depends on the order in which SQL Server happens to return the rows, which is likely to be consistent. To get a truly random pick, add a second random row number over the rows where DataRowNumber = 1 and limit the results to the top 2 of those.
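A sketch of the two-stage idea (one random row per DATA value, then at most two of those chosen at random), written with Python's sqlite3 module. SQLite's random() replaces NEWID(), and the extra 'Kim' / 'HOME3' row is invented so that there are more than two groups to choose from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (name TEXT, data TEXT);
    INSERT INTO A VALUES
        ('Oscar', 'HOME1'), ('Jens', 'HOME2'), ('Will', 'HOME1'),
        ('Jeremy', 'HOME2'), ('Al', 'HOME1'),
        ('Kim', 'HOME3');  -- hypothetical third group
""")

picked = conn.execute("""
    WITH per_group AS (
        -- Stage 1: shuffle within each data value.
        SELECT name, data,
               ROW_NUMBER() OVER (PARTITION BY data ORDER BY random()) AS data_rn
        FROM A
    ), shuffled AS (
        -- Stage 2: shuffle the one winner per group.
        SELECT name, data,
               ROW_NUMBER() OVER (ORDER BY random()) AS random_rn
        FROM per_group
        WHERE data_rn = 1
    )
    SELECT name, data FROM shuffled WHERE random_rn <= 2
""").fetchall()

print(picked)
```

The second row number is computed only over the per-group winners, so the final filter always keeps exactly two rows with different DATA values whenever at least two groups exist.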

Amalgamating SQL queries stored as views together / Combining tables

I have several summary queries stored as views, and would like to join them together into one combined output so I can use it as a pivot table in Excel.
Date is the only common denominator in this case.
I can do this in Excel using SUMIFS but would prefer to manage it in the SQL before it arrives in Excel.
Can anyone help?

Without a matching ID, the best I can think of is to order by ROW_NUMBER(), which gives a slightly verbose query;
WITH cte1 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Dogs IS NULL THEN 1 END) r1
FROM View1
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Region IS NULL THEN 1 END) r2
FROM View2
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Bed IS NULL THEN 1 END) r3
FROM View3
)
SELECT COALESCE(cte1.Date, cte2.Date, cte3.Date) Date,
Dogs, D_Qty, Region, R_Qty, Bed, B_Qty
FROM cte1
FULL OUTER JOIN cte2
ON cte1.Date = cte2.Date AND r1=r2
FULL OUTER JOIN cte3
ON cte1.Date = cte3.Date AND r1=r3
OR cte2.Date = cte3.Date AND r2=r3
ORDER BY Date, COALESCE(r1,r2,r3)
You may consider adding an order column to your views, using ROW_NUMBER() OVER (PARTITION BY DATE ORDER BY <whatever order is in them>). That would eliminate all the CTEs and give you a stable ordering of things.

If you can add one more column to View1, View2 and View3, then you can solve your issue easily.

Invalid column name in SQL Server

I'm trying to add a WHERE condition to my SELECT statement, but I'm getting an invalid column name exception.
SELECT "Ugly OLAP name" as "Value"
FROM OpenQuery( OLAP, 'OLAP Query')
But if I try to add:
WHERE "Value" > 0
it fails; I was told to use the original name instead, and that works fine.
But what if I can't use the original column name, as in the following?
SELECT
ROW_NUMBER() OVER(PARTITION BY P.ProviderID
ORDER BY T.PostedUTC DESC, T.TransactionID DESC) as RN
FROM
Provider p
INNER JOIN
Transaction T
WHERE
RN = 1
How can I access RN in my WHERE clause?

You need to use a CTE or a derived table:
SELECT *
FROM ( SELECT ROW_NUMBER() OVER(PARTITION BY P.ProviderID
ORDER BY T.PostedUTC DESC, T.TransactionID DESC) as RN,
[More Columns]
FROM Provider p
INNER JOIN Transaction T
ON SomeCondition) DT
where DT.RN = 1
Or
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY P.ProviderID
ORDER BY T.PostedUTC DESC, T.TransactionID DESC) as RN,
[More Columns]
FROM Provider p
INNER JOIN Transaction T
ON SomeCondition
)
SELECT *
FROM CTE
where RN = 1
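A runnable sketch of the derived-table form, using Python's sqlite3 module. The schema is an assumption made for the example: `provider` and `txn` (TRANSACTION is a keyword, so the table is renamed) with the join on `provider_id`, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data for illustration.
    CREATE TABLE provider (provider_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txn (transaction_id INTEGER PRIMARY KEY, provider_id INTEGER, posted_utc TEXT);
    INSERT INTO provider VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO txn VALUES
        (10, 1, '2021-01-01'), (11, 1, '2021-03-01'),
        (12, 2, '2021-02-01'), (13, 2, '2021-02-01');
""")

# The window function is computed in the inner query; the outer query can then
# filter on its alias like any ordinary column.
latest = conn.execute("""
    SELECT dt.provider_id, dt.transaction_id
    FROM (
        SELECT p.provider_id, t.transaction_id,
               ROW_NUMBER() OVER (PARTITION BY p.provider_id
                                  ORDER BY t.posted_utc DESC, t.transaction_id DESC) AS rn
        FROM provider AS p
        JOIN txn AS t ON t.provider_id = p.provider_id
    ) AS dt
    WHERE dt.rn = 1
    ORDER BY dt.provider_id
""").fetchall()

print(latest)  # [(1, 11), (2, 13)]
```

Provider 2 has two transactions posted at the same time, so the TransactionID tiebreaker in the ORDER BY decides which one gets rn = 1.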

You could do it this way:
WITH T AS (
SELECT ROW_NUMBER() OVER(PARTITION BY P.ProviderID ORDER BY T.PostedUTC DESC, T.TransactionID DESC) as RN
FROM Provider p INNER JOIN Transaction T ON SomeCondition
)
SELECT RN
FROM T
WHERE RN > 0;
EDIT: Missed second query in the question...

You must repeat the original calculation.

Another way to make this easier to understand is to rewrite your query as a series of CTEs (Common Table Expressions). They look and act like 'mini local views' where you can rename columns, etc. It's tough to tell from the example you gave, but often you can rewrite complex queries this way and refer to columns by nicer names in subsequent queries.
