ORDER BY with a UNION of disparate datasets (T-SQL)

I have a query that UNIONs two somewhat similar datasets, but each has some columns that are not present in the other (i.e., those columns have NULL values in the resulting UNION).
The problem is that I need to ORDER the resulting data using those columns that exist in only one set or the other, to get the data into a friendly format for the software side.
For example: Table1 has fields ID, Cat, Price. Table2 has fields ID, Name, Abbrv. The ID field is common between the two tables.
My query looks something like this:
SELECT t1.ID, t1.Cat, t1.Price, NULL as Name, NULL as Abbrv FROM t1
UNION
SELECT t2.ID, NULL as Cat, NULL as Price, t2.Name, t2.Abbrv FROM t2
ORDER BY Price DESC, Abbrv ASC
The ORDER BY is where I'm stuck. The data looks like this:
100 Balls 1.53
200 Bubbles 1.24
100 RedBall 101RB
100 BlueBall 102BB
200 RedWand 201RW
200 BlueWand 202BW
...but I want it to look like this:
100 Balls 1.53
100 RedBall 101RB
100 BlueBall 102BB
200 Bubbles 1.24
200 RedWand 201RW
200 BlueWand 202BW
I'm hoping this can be done in T-SQL.

Select ID, Cat, Price, Name, Abbrv
From
(SELECT t1.ID, t1.Cat, t1.Price, t1.Price AS SortPrice, NULL as Name, NULL as Abbrv
FROM t1
UNION
SELECT t2.ID, NULL as Cat, NULL as Price, t1.Price as SortPrice, t2.Name, t2.Abbrv
FROM t2
inner join t1 on t2.id = t1.id
) t3
ORDER BY SortPrice DESC, Abbrv ASC
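This works only if you know how the rows in table 2 are linked to table 1, so that they can share its price; here the join on ID supplies that link. Because NULL sorts first in ascending order in SQL Server, the t1 row (with a NULL Abbrv) comes before its t2 rows, so there is no need to create a SortAbbrv column.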

You should use UNION ALL instead of UNION to save the cost of duplicate checking.
SELECT *
FROM
(
SELECT t1.ID, t1.Cat, t1.Price, NULL as Name, NULL as Abbrv FROM t1
UNION ALL
SELECT t2.ID, NULL as Cat, NULL as Price, t2.Name, t2.Abbrv FROM t2
) as sub
ORDER BY
ID,
CASE WHEN Price is not null THEN 1 ELSE 2 END,
Price DESC,
CASE WHEN Abbrv is not null THEN 1 ELSE 2 END,
Abbrv ASC
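As a quick check of the ordering above, here is a minimal sketch that runs the same query against the question's sample rows. It uses Python's built-in sqlite3 module instead of SQL Server, purely so the example is self-contained; the column types are assumptions guessed from the sample data.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, Cat TEXT, Price REAL);
CREATE TABLE t2 (ID INTEGER, Name TEXT, Abbrv TEXT);
INSERT INTO t1 VALUES (100, 'Balls', 1.53), (200, 'Bubbles', 1.24);
INSERT INTO t2 VALUES (100, 'RedBall', '101RB'), (100, 'BlueBall', '102BB'),
                      (200, 'RedWand', '201RW'), (200, 'BlueWand', '202BW');
""")

# Same shape as the answer: UNION ALL in a derived table, then order by ID,
# with the row that has a Price placed ahead of the rows that do not.
rows = conn.execute("""
SELECT ID, Cat, Price, Name, Abbrv
FROM (
    SELECT ID, Cat, Price, NULL AS Name, NULL AS Abbrv FROM t1
    UNION ALL
    SELECT ID, NULL AS Cat, NULL AS Price, Name, Abbrv FROM t2
) AS sub
ORDER BY ID,
         CASE WHEN Price IS NOT NULL THEN 1 ELSE 2 END,
         Price DESC,
         CASE WHEN Abbrv IS NOT NULL THEN 1 ELSE 2 END,
         Abbrv ASC
""").fetchall()

for row in rows:
    print(row)
```

With this data, each category row from t1 is followed by its t2 rows in Abbrv order, which is the layout the question asks for.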

A quick solution is to do two inserts into a temp table or a table variable. As part of each insert, set a flag column that records which source the row came from, then order by that flag column.
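A minimal sketch of the flag-column idea, again run through Python's sqlite3 module rather than SQL Server so it is self-contained. The table name combined and the column name SortFlag are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, Cat TEXT, Price REAL);
CREATE TABLE t2 (ID INTEGER, Name TEXT, Abbrv TEXT);
INSERT INTO t1 VALUES (100, 'Balls', 1.53), (200, 'Bubbles', 1.24);
INSERT INTO t2 VALUES (100, 'RedBall', '101RB'), (100, 'BlueBall', '102BB'),
                      (200, 'RedWand', '201RW'), (200, 'BlueWand', '202BW');

-- Hypothetical temp table; SortFlag is 1 for t1 rows and 2 for t2 rows.
CREATE TEMP TABLE combined (
    SortFlag INTEGER, ID INTEGER, Cat TEXT, Price REAL, Name TEXT, Abbrv TEXT
);
INSERT INTO combined SELECT 1, ID, Cat, Price, NULL, NULL FROM t1;
INSERT INTO combined SELECT 2, ID, NULL, NULL, Name, Abbrv FROM t2;
""")

# Group by ID, then put the t1 row first, then the t2 rows by Abbrv.
flagged = conn.execute("""
SELECT ID, Cat, Price, Name, Abbrv
FROM combined
ORDER BY ID, SortFlag, Abbrv
""").fetchall()

for row in flagged:
    print(row)
```

The flag makes the intent explicit and avoids relying on how NULLs happen to sort in a given database.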

Off the top of my head, I would say the worst-case scenario is that you create a temporary table with all the fields, INSERT INTO it from both T1 and T2, then SELECT from the temp table with an ORDER BY.
That is, create a temp table (e.g. #temp) with fields Id, Cat, Price, Name, Abbrv, and then:
INSERT INTO #temp (Id, Cat, Price, Name, Abbrv) SELECT Id, Cat, Price, NULL, NULL FROM T1
INSERT INTO #temp (Id, Cat, Price, Name, Abbrv) SELECT Id, NULL, NULL, Name, Abbrv FROM T2
SELECT * FROM #temp ORDER BY Id, Price DESC, Abbrv ASC
NB: I'm not 100% sure on the NULL syntax in the inserts, but I think it will work.
EDIT: Added ordering by Price and Abbrv after Id. If Id doesn't link T1 and T2, then what does?

Related

most efficient way to select duplicate rows with max timestamp

Suppose I have a table called t, which is like
id content time
1 'a' 100
1 'a' 101
1 'b' 102
2 'c' 200
2 'c' 201
id are duplicate, and for the same id, content could also be duplicate. Now I want to select for each id the rows with max timestamp, which would be
id content time
1 'b' 102
2 'c' 201
And this is my current solution:
select t1.id, t1.content, t1.time
from (
select id, content, time from t
) as t1
right join (
select id, max(time) as time from t group by id
) as t2
on t1.id = t2.id and t1.time = t2.time;
But this looks inefficient to me. Because theoretically when select id, max(time) as time from t group by id is executed, the rows I want have already been located. The right join brings extra O(n^2) time cost, which seems unnecessary.
So is there a more efficient way to do it, or is there anything that I misunderstand?

Use DISTINCT ON:
SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;
On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER and other approaches.
The following index might speed up this query:
CREATE INDEX idx ON yourTable (id, time DESC, content);
This index, if used, would let Postgres rapidly find, for each id, the record having the latest time. This index also covers the content column.
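For comparison, the ROW_NUMBER approach mentioned above can be sketched as follows. DISTINCT ON is specific to Postgres, so this example runs on SQLite through Python's sqlite3 module instead; that substitution is an assumption made to keep the example self-contained, and it needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
INSERT INTO t VALUES (1, 'a', 100), (1, 'a', 101), (1, 'b', 102),
                     (2, 'c', 200), (2, 'c', 201);
""")

# Number the rows within each id from newest to oldest, then keep the newest.
latest = conn.execute("""
SELECT id, content, time
FROM (
    SELECT id, content, time,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
    FROM t
) AS ranked
WHERE rn = 1
ORDER BY id
""").fetchall()

print(latest)
```

This returns one row per id, the one with the largest time, which matches the result the question expects.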

Try this:
SELECT a.id, a.content, a.time FROM t AS a
INNER JOIN (
SELECT id, MAX(time) AS time FROM t
GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time

Join two tables using join

I have two tables as below. I would like to keep everything from t1 and everything from t2 except Date and Id. The metric values (Salary, Bonus) for both tables should not be changed. As there are multiple dates and ids in t1, I am getting duplicates in the output. My code is below. Please assist.
select t1.*, t2.* except (Date, Id) from t1
left join t2
on t1.Date = t2.Date
and t1.Id = t2.Id

While the logic behind your output table is not fully explained, I can answer your main question. You can use a LEFT JOIN and manually SELECT the columns you want from each table, writing them in the desired order.
Below is the syntax for that with some sample data I created.
with t1 as (
SELECT DATE(2020,01,22) as Date, 1 as id, "abc" as Name, "NYC" as City, "USA" as Country, 5000 as Salary UNION ALL
SELECT DATE(2020,01,23) as Date, 2 as id, "abc" as Name, "SF" as City, "USA" as Country, 8000 as Salary UNION ALL
SELECT DATE(2020,01,22) as Date, 2 as id, "abc" as Name, "SF" as City, "USA" as Country, 8000 as Salary
),
t2 as (
SELECT DATE(2020,01,22) as Date, 1 as id, "Man" as Position, "1st" as Rank, 1000 as Bonus UNION ALL
SELECT DATE(2020,01,22) as Date, 2 as id, "Man" as Position, "1st" as Rank, 1000 as Bonus
)
SELECT t1.Date, t1.id, t1.Name, t1.City, t2.Position, t2.Rank, t1.Country, t1.Salary,
t2.Bonus
FROM t1 LEFT JOIN t2 on t1.Date=t2.Date and t1.id=t2.id
In the output, the columns appear in the order I listed them in the SELECT statement. Furthermore, Date and id come from table 1, as specified in the SELECT. Another important point is that for rows where the join condition t1.Date=t2.Date and t1.id=t2.id does not match, the columns from t2 are NULL.
I would like to point out that I set the value of the Rank column manually, just as a sample. Lastly, everything within the WITH clause is sample data.

How do i get the unique records in a UNION where one column is null for one part of the union?

I get a daily feed of products in a staging table. I want to update the actual tables with records from the staging table.
Here's my query.
SELECT NULL, ColumnA, ColumnB FROM stagingTable
UNION
SELECT ID, ColumnA, ColumnB From actualTable
This gives me
NULL 10 100
NULL 20 200
NULL 30 300
1 10 100
I want to remove the duplicate record as that record is already in the actual table.
NULL 10 100

I would simply use NOT EXISTS:
SELECT ID, ColumnA, ColumnB FROM actualTable
UNION ALL
SELECT NULL, s.ColumnA, s.ColumnB
FROM stagingTable s
WHERE NOT EXISTS (SELECT 1 FROM actualTable t WHERE t.ColumnA = s.ColumnA);
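A minimal sketch of the NOT EXISTS filter on the question's sample rows, run through Python's sqlite3 module so it is self-contained. Matching on both ColumnA and ColumnB is an assumption about what counts as a duplicate, and the column types are guessed from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stagingTable (ColumnA INTEGER, ColumnB INTEGER);
CREATE TABLE actualTable (ID INTEGER, ColumnA INTEGER, ColumnB INTEGER);
INSERT INTO stagingTable VALUES (10, 100), (20, 200), (30, 300);
INSERT INTO actualTable VALUES (1, 10, 100);
""")

# Keep every actual row, plus only those staging rows with no match
# in actualTable (assumed here to mean the same ColumnA and ColumnB).
merged = conn.execute("""
SELECT ID, ColumnA, ColumnB FROM actualTable
UNION ALL
SELECT NULL, s.ColumnA, s.ColumnB
FROM stagingTable s
WHERE NOT EXISTS (
    SELECT 1 FROM actualTable t
    WHERE t.ColumnA = s.ColumnA AND t.ColumnB = s.ColumnB
)
ORDER BY ColumnA
""").fetchall()

print(merged)
```

The staging row (10, 100) is dropped because it already exists in actualTable with ID 1; the other two staging rows come through with a NULL ID.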

I would do it slightly differently from the first answer and use a subquery for the staging table:
SELECT DISTINCT
COALESCE(T1.ID,T2.ID) AS ID
,T1.ColumnA
,T1.ColumnB
FROM
(
SELECT
NULL AS ID
,ColumnA
,ColumnB
FROM
#stagingtable
) AS T1
LEFT OUTER JOIN #actualtable T2
ON T1.ColumnA = T2.ColumnA
AND T1.ColumnB = T2.ColumnB

Sql Server Query design

I have two tables in SQL Server, Table1 and Table2.
The first table has PartID, Code, Brand.
The second table has ID, PartID, AddCode, AddBrand.
The idea is that the first table is the main table, where an article is entered with its original code and brand.
The second table stores additional codes and brands that the original article is related to.
Let's say the first table has the following data:
PartId Code Brand
100 15FY MCD
The second table has the following data:
ID PartID AddCode AddData
1 100 1888 AddBrand1
2 100 FF0-1 AddBrand2
I want to display data with select like this:
PartId Code Brand
100 15FY MCD
100 1888 AddBrand1
100 FF0-1 AddBrand2
I've tried to use:
Select a.PartID, a.Code, a.Brand,b.AddCode,b.AddData
from table1 a left outer join
table2 b on a.PartId=b.PartId
but I can't figure out how to do it...
Thank you in advance

This sounds more like a UNION ALL than a JOIN:
select PartId, Code, Brand
from ((select t1.PartId, t1.Code, t1.Brand, 1 as seq
from table1 t1
) union all
(select t2.PartId, t2.AddCode as Code, t2.AddBrand as Brand, 2 as seq
from table2 t2
)
) x
order by PartId, seq;
Note that this orders the results so all PartIds appear together in the result set, with the row from the first table appearing first.
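A minimal sketch of this UNION ALL with a sequence column, using the question's sample rows and Python's sqlite3 module in place of SQL Server. The question names the last column of the second table both AddBrand and AddData; AddBrand is assumed here, and the extra sort on Code is added only to make the order of the second table's rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (PartId INTEGER, Code TEXT, Brand TEXT);
CREATE TABLE table2 (ID INTEGER, PartId INTEGER, AddCode TEXT, AddBrand TEXT);
INSERT INTO table1 VALUES (100, '15FY', 'MCD');
INSERT INTO table2 VALUES (1, 100, '1888', 'AddBrand1'),
                          (2, 100, 'FF0-1', 'AddBrand2');
""")

# seq = 1 marks the main row, seq = 2 marks the additional codes.
parts = conn.execute("""
SELECT PartId, Code, Brand
FROM (
    SELECT PartId, Code, Brand, 1 AS seq FROM table1
    UNION ALL
    SELECT PartId, AddCode AS Code, AddBrand AS Brand, 2 AS seq FROM table2
) AS x
ORDER BY PartId, seq, Code
""").fetchall()

for row in parts:
    print(row)
```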

Use UNION ALL to combine the two SELECT statements:
SELECT PartId, Code, Brand
FROM Table1
UNION ALL
SELECT PartID ,AddCode Code,AddData Brand
FROM Table2

SELECT *
FROM (
SELECT A.PARTID
,A.CODE
,A.BRAND
FROM TABLE1 A
UNION ALL
SELECT B.PARTID
,B.ADDCODE
,B.ADDDATA
FROM TABLE2 B
) RESULT
ORDER BY RESULT.PARTID

Use Union of both tables like this
Select PartId, Code, Brand from table1
UNION ALL
Select PartID, AddCode, addData
from table2

Is there something equivalent to putting an order by clause in a derived table?

This is sybase 15.
Here's my problem.
I have 2 tables.
t1.jobid t1.date
------------------------------
1 1/1/2012
2 4/1/2012
3 2/1/2012
4 3/1/2012
t2.jobid t2.userid t2.status
-----------------------------------------------
1 100 1
1 110 1
1 120 2
1 130 1
2 100 1
2 130 2
3 100 1
3 110 1
3 120 1
3 130 1
4 110 2
4 120 2
I want to find all the people whose status for THEIR two most recent jobs is 2.
My plan was to take the top 2 of a derived table that joined t1 and t2 and was ordered by date descending for a given user. So the top two would be the most recent for a given user.
So that would give me that individual's most recent job numbers. Not everybody is in every job.
Then I was going to make an outer query that joined against the derived table, searching for status 2s with a HAVING sum(status) = 4 or something like that. That would find the people with two status 2s.
But Sybase won't let me use an ORDER BY clause in the derived table.
Any suggestions on how to go about this?
I can always write a little program to loop through all the users, but I was going to try to make one horrendous SQL statement out of it.
Juicy one, no?

You could rank the rows in the subquery by adding an extra column using a window function. Then select the rows that have the appropriate ranks within their groups.
I've never used Sybase, but the documentation seems to indicate that this is possible.

With Table1 As
(
Select 1 As jobid, '1/1/2012' As [date]
Union All Select 2, '4/1/2012'
Union All Select 3, '2/1/2012'
Union All Select 4, '3/1/2012'
)
, Table2 As
(
Select 1 jobid, 100 As userid, 1 as status
Union All Select 1,110,1
Union All Select 1,120,2
Union All Select 1,130,1
Union All Select 2,100,1
Union All Select 2,130,2
Union All Select 3,100,1
Union All Select 3,110,1
Union All Select 3,120,1
Union All Select 3,130,1
Union All Select 4,110,2
Union All Select 4,120,2
)
, MostRecentJobs As
(
Select T1.jobid, T1.date, T2.userid, T2.status
, Row_Number() Over ( Partition By T2.userid Order By T1.date Desc ) As JobCnt
From Table1 As T1
Join Table2 As T2
On T2.jobid = T1.jobid
)
Select *
From MostRecentJobs As M2
Where Not Exists (
Select 1
From MostRecentJobs As M1
Where M1.userid = M2.userid
And M1.JobCnt <= 2
And M1.status <> 2
)
And M2.JobCnt <= 2
I'm using a number of features here which do exist in Sybase 15. First, I'm using common table expressions both for my sample data and to group my queries together. Second, I'm using the ranking function Row_Number to order the jobs by date.
It should be noted that in the example data you gave, no user satisfies the requirement of having their two most recent jobs both be of status "2".
Edit
If you are using a version of Sybase that does not support ranking functions (e.g. Sybase 15 prior to 15.2), then you need to simulate the ranking function using counts.
Create Table #JobRnks
(
jobid int not null
, userid int not null
, status int not null
, [date] datetime not null
, JobCnt int not null
, Primary Key ( jobid, userid, [date] )
)
Insert #JobRnks( jobid, userid, status, [date], JobCnt )
Select T1.jobid, T1.userid, T1.status, T1.[date], Count(T2.jobid)+ 1 As JobCnt
From (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T1
Left Join (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T2
On T2.userid = T1.userid
And T2.[date] > T1.[date]
Group By T1.jobid, T1.userid, T1.status, T1.[date]
Select *
From #JobRnks As J1
Where Not Exists (
Select 1
From #JobRnks As J2
Where J2.userid = J1.userid
And J2.JobCnt <= 2
And J2.status <> 2
)
And J1.JobCnt <= 2
The reason for using the temp table here is for performance and ease of reading. Technically, you could plug in the query for the temp table into the two places used as a derived table and achieve the same result.
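A minimal sketch of the count-based ranking, run through Python's sqlite3 module instead of Sybase so it is self-contained. A job's rank is one plus the number of that user's jobs with a later date, so the most recent job gets rank 1. The dates are written in ISO form on the assumption that the question's dates are month/day/year, and user 140 is a hypothetical extra user added so that the query returns a row (no user in the original data qualifies).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (jobid INTEGER, date TEXT);
CREATE TABLE t2 (jobid INTEGER, userid INTEGER, status INTEGER);
INSERT INTO t1 VALUES (1, '2012-01-01'), (2, '2012-04-01'),
                      (3, '2012-02-01'), (4, '2012-03-01');
INSERT INTO t2 VALUES
    (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
    (2, 100, 1), (2, 130, 2),
    (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
    (4, 110, 2), (4, 120, 2),
    -- Hypothetical user 140: two most recent jobs (2 and 4) have status 2.
    (1, 140, 1), (2, 140, 2), (4, 140, 2);

CREATE TEMP TABLE joined AS
SELECT t2.userid, t2.jobid, t2.status, t1.date
FROM t2 JOIN t1 ON t1.jobid = t2.jobid;

-- Rank = 1 + number of the same user's jobs with a later date.
CREATE TEMP TABLE ranked AS
SELECT a.userid, a.jobid, a.status, a.date,
       COUNT(b.jobid) + 1 AS JobCnt
FROM joined a
LEFT JOIN joined b ON b.userid = a.userid AND b.date > a.date
GROUP BY a.userid, a.jobid, a.status, a.date;
""")

# Users whose two most recent jobs contain no status other than 2.
users = conn.execute("""
SELECT DISTINCT r1.userid
FROM ranked r1
WHERE r1.JobCnt <= 2
  AND NOT EXISTS (
      SELECT 1 FROM ranked r2
      WHERE r2.userid = r1.userid
        AND r2.JobCnt <= 2
        AND r2.status <> 2
  )
ORDER BY r1.userid
""").fetchall()

print(users)
```

Only the hypothetical user 140 is returned; users 100 through 130 each have a status 1 among their two most recent jobs.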
