SQL to resequence items by groups

SQL to resequence items by groups - sql

Lets say I have a database that looks like this:
tblA:
ID, Name, Sequence, tblBID
1 a 5 14
2 b 3 15
3 c 3 16
4 d 3 17
tblB:
ID, Group
14 1
15 1
16 2
17 3
I would like to sequence A so that the sequences go 1...n for each group of B.
So in this case, the sequences going down should be 1,2,1,1.
The ordering needs to be consistent with the current ordering, but there are no guarantees as to the current ordering.
I am not exactly a sql master and I am sure there is a fairly easy way to do this, but I really don't know the right route to take. Any hints?

If you are using SQL Server 2005+ or higher, you can use a ranking function:
Select tblA.Id, tblA.Name
, Row_Number() Over ( Partition By tblB.[Group] Order By tblA.Id ) As Sequence
, tblA.tblBID
From tblA
Join tblB
On tblB.tblBID = tblB.ID
Row_Number ranking function.
Here's another solution that would work in SQL Server 2000 and prior.
Select A.Id, A.Name
, (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1 As Sequence
, A.tblBID
From tblA As A
Join tblB As B
On B.Id = A.tblBID
EDIT
Also want to make it clear that I want to actually update tblA to reflect the proper sequences.
In SQL Server, you can use their proprietary From clause in an Update statement like so:
Update tblA
Set Sequence = (
Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID
) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
The Hoyle ANSI solution might be something like:
Update tblA
Set Sequence = (
Select (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
Where A.Id = tblA.Id
)
EDIT
Can we do that [the inner group] comparison based on A.Sequence instead of B.ID?
Select A1.*
, (Select Count(*)
From tblB As B2
Join tblA As A2
On A2.tblBID = B2.Id
Where B2.[Group] = B1.[Group]
And A2.Sequence < A1.Sequence) + 1
From tblA As A1
Join tblB As B1
On B1.Id = A1.tblBID

Because it's SQL 2000, we can't use a windowing function. That's okay.
Thomas's queries are good and will work. However, they will get worse and worse as the number of rows increases—with different characteristics depending on how wide (the number of groups) and how deep (the number of items per group). This is because those queries use a partial cross-join, perhaps we could call it a "pyramidal cross-join" where the crossing part is limited to right side values less than left side values rather than left crossing to all right values.
What to do?
I think you will be surprised to find that the following long and painful-looking script will outperform the pyramidal join at a certain size of data (which may not be all that big) and eventually, with really large data sets must be considered a screaming performer:
CREATE TABLE #tblA (
ID int identity(1,1) NOT NULL,
Name varchar(1) NOT NULL,
Sequence int NOT NULL,
tblBID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #tblA VALUES ('a', 5, 14)
INSERT #tblA VALUES ('b', 3, 15)
INSERT #tblA VALUES ('c', 3, 16)
INSERT #tblA VALUES ('d', 3, 17)
CREATE TABLE #tblB (
ID int NOT NULL PRIMARY KEY CLUSTERED,
GroupID int NOT NULL
)
INSERT #tblB VALUES (14, 1)
INSERT #tblB VALUES (15, 1)
INSERT #tblB VALUES (16, 2)
INSERT #tblB VALUES (17, 3)
CREATE TABLE #seq (
seq int identity(1,1) NOT NULL,
ID int NOT NULL,
GroupID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #seq
SELECT
A.ID,
B.GroupID
FROM
#tblA A
INNER JOIN #tblB B ON A.tblBID = b.ID
ORDER BY B.GroupID, A.Sequence
UPDATE A
SET A.Sequence = S.seq - X.MinSeq + 1
FROM
#tblA A
INNER JOIN #seq S ON A.ID = S.ID
INNER JOIN (
SELECT GroupID, MinSeq = Min(seq)
FROM #seq
GROUP BY GroupID
) X ON S.GroupID = X.GroupID
SELECT * FROM #tblA
DROP TABLE #seq
DROP TABLE #tblB
DROP TABLE #tblA
If I understood you correctly, then ORDER BY B.GroupID, A.Sequence is correct. If not, you can switch A.Sequence to B.ID.
Also, my index on the temp table should be experimented with. For a certain quantity of rows, and also the width and depth characteristics of those rows, clustering on one of the other two columns in the #seq table could be helpful.
Last, there is a possible different data organization possible: leaving GroupID out of the #seq table and joining again. I suspect it would be worse, but am not 100% sure.

Something like:
SELECT a.id, a.name, row_number() over (partition by b.group order by a.id)
FROM tblA a
JOIN tblB on a.tblBID = b.ID;

Related

LEFT JOIN with OR clause without UNION

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't exist based on the values in another table. This is easy enough (just do LEFT JOIN and check for NULL values in 1st table). But...the join isn't very straight forward and we need to search 1st table on 2 conditions with an OR and not AND. So basically if it finds a match on either of the 2 attributes, we consider that the corresponding row in 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, then we consider it as a new row. We can use OR condition in the LEFT JOIN statement but from what I understand, it does full table scan and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either because it will not give us what we're looking for.
Just for simplicity purpose consider the scenario below (we need to insert data into tableA).
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
create table #tableA ( email nvarchar(50), id int )
create table #tableB ( email nvarchar(50), id int )
insert into #tableA (email, id) values ('123#abc.com', 1), ('456#abc.com', 2), ('789#abc.com', 3), ('012#abc.com', 4)
insert into #tableB (email, id) values ('234#abc.com', 1), ('456#abc.com', 2), ('567#abc.com', 3), ('012#abc.com', 4), ('345#abc.com', 5)
--THIS QUERY IS CORRECTLY RETURNING 1 RECORD
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
--THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
union
select B.email, B.id
from #tableB B
left join #tableA A on B.id = A.id
where A.id is null
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
The 1st query works correctly and only returns 1 record, but the table size is just few records and it completes under 1 sec. When the 2 tables have thousands or records, the query may take 10 min to complete. The 2nd query of course returns the records we don't want to insert because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?

You are using an anti join, which is another way of writing the straight-forward NOT EXISTS:
where not exists
(
select null
from #tableA A
where A.email = B.email or B.id = A.id
)
I.e. where not exists a row in table A with the same email or the same id. In other words: where not exists a row with the same email and not exists a row with the same id.
where not exists (select null from #tableA A where A.email = B.email)
and not exists (select null from #tableA A where B.id = A.id)
With the appropriate indexes
on #tableA (id);
on #tableA (email);
this should be very fast.

It's hard to tune something you can't see. Another option to get the data is to:
SELECT B.email
, B.id
FROM #TableB B
EXCEPT
(
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON A.email = B.email
UNION ALL
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON B.id = A.id
)
This way you don't have to use OR, you can use INNER JOIN rather than LEFT JOIN and you can use UNION ALL instead of UNION (though this advantage may well be negated by the EXCEPT). All of which may help your performance. Perhaps the joins can be more efficient when replaced with EXISTS.
You didn't mention how this problem occurred (where the data from both tables is coming from, and why they are out of sync when they shouldn't be), but it would be preferable to fix it at the source.

No the query returns correctly 3 rows
because
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
Allone reurns the 3 rows.
For your "problemm"
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
will che3kc for every row, if it is true to be included
So for example
('123#abc.com', 1) ('234#abc.com', 1)
as the Ids are the same it will be joined
but when you join by the emails the condition is false and so is included in the result set
You can only use the UNION approach, when you are comparing only the emails or the ids, but with both the queries are not equivalent

How to use a bunch of clause in a having statement. Oracle

assume i have a query like this:
SELECT table1.id
FROM (
SELECT id, sum(column) as A
FROM table1
GROUP BY id
) a1
Left join (
SELECT id,
sum(column) as B
FROM table 2
GROUP BY Id
) a2
on table1.id=table2.id
.
.
.
.
Left join (
SELECT id, sum(column) as G
FROM table 7
GROUP BY id
) g1
on table1.id=table7.id
Having or where A+B - (C+D+E+F+G) >0
I tried both, none works.
Having return error on there's no group by in the first select and where doesn't return any rows.

First your question have some issues.
I'm going to guess you mean put alias a, b, c, d ....
instead of a1, a2, g1.
Also your left join should be something like a.id = b.id at the moment you create a subquery you have to use the alias instead of tablename.
If you fix that you should add a WHERE, I also guess you mean use the SUM() result
WHERE a.A + b.B - (c.C+ d.D+ e.E+ f.F+ g.G) > 0
.
SELECT a.id
FROM (
SELECT id, sum(column) as sumA
FROM table1
GROUP BY id
) a
Left join
(
SELECT id, sum(column) as sumB
FROM table 2
GROUP BY Id
) b
on a.id = b.id
.
.
.
.
Left join
(
SELECT id, sum(column) as sumG
FROM table 2
GROUP BY id
) g
on f.id = g.id
WHERE a.sumA + b.sumB - (c.sumC + d.sumD + e.sumE + f.sumF + g.sumG) >0

Juan has the right answer. I am just adding a SQLFiddle to help strengthen his answer. Please look at a smaller instance of the same solution here: http://sqlfiddle.com/#!4/81c275/1
Tables
create table table1(id int, col int);
insert into table1 values (1, 10);
insert into table1 values (2, 20);
insert into table1 values (2, 30);
create table table2(id int, col int);
insert into table2 values (1, 5);
insert into table2 values (2, 3);
insert into table2 values (2, 2);
create table table3(id int, col int);
insert into table3 values (1, 100);
insert into table3 values (2, 20);
insert into table3 values (2, 3);
SQL
select a1.id
from (select id, sum(col) as A from table1 group by id) a1
left join (select id, sum(col) as B from table2 group by id) a2
on a1.id = a2.id
left join (select id, sum(col) as C from table3 group by id) a3
on a1.id = a3.id
where A + B - (C) > 0
You can add more tables in the SQLFiddle with whatever values you please, and change the SQL accordingly by appending D, E, F, G etc after C in (C).
The above example will result in output of 2 since ID 2's A+B = 55 and C = 23. A+B-C > 0 for this record and therefore the output will be 2.

I believe that you need to take out the 'where' and move it up if you still need it.
So that it would look something like this,
select table1.id from(
...
...
...)
Having ((A+B)-(C+D+R+F+G)>0)
According to this site:
http://www.w3schools.com/sql/sql_having.asp

Order of Full Outer Joins Yields Different Number of Result Rows..Why?

I'm working with two tables A and B. Table A identifies equity securities and Table B has a number of details on the security.
For example, when B.Item = 5301, the row specifies price for a given security. When B.Item = 9999, the row specifies dividends for a given security. I am trying to get both price and dividends in the same row. In order to achieve this, I FULL JOINed table B twice to table A.
SELECT *
FROM a a
FULL JOIN (SELECT *
FROM b) b
ON b.code = a.code
AND b.item = 3501
FULL JOIN (SELECT *
FROM b) b2
ON b2.code = a.code
AND b.item = 9999
AND b2.year_ = b.year_
AND b.freq = b2.freq
AND b2.seq = b.seq
WHERE a.code IN ( 122514 )
The remaining fields in the join clause like Year_, Freq, and Seq just make sure the dates of the price and dividends match. A.Code simply identifies a single security.
My issue is that when I flip the order of the full joins I get a different number of results. So if b.Item = 9999 comes before b.Item 2501, I get one result. The other way around I get 2 results. I realized the table B has zero entries for security 122514 for dividend, but has two entries for price.
When price is specified first, I get both prices and dividend fields are null. However, when dividend is specified first, I get NULLs for the dividend fields and also nulls for the prices fields.
Why aren't the two price entries showing up? I would expect them to do so in a FULL JOIN

It's because your second FULL OUTER JOIN refers to your first FULL OUTER JOIN. This means changing the order of them is making a fundamental change to the query.
Here is some pseudo-SQL that demonstrates how this works:
DECLARE #a TABLE (Id INT, Name VARCHAR(50));
INSERT INTO #a VALUES (1, 'Dog Trades');
INSERT INTO #a VALUES (2, 'Cat Trades');
DECLARE #b TABLE (Id INT, ItemCode VARCHAR(1), PriceDate DATE, Price INT, DividendDate DATE, Dividend INT);
INSERT INTO #b VALUES (1, 'p', '20141001', 100, '20140101', 1000);
INSERT INTO #b VALUES (1, 'p', '20141002', 50, NULL, NULL);
INSERT INTO #b VALUES (2, 'c', '20141001', 10, '20141001', 500);
INSERT INTO #b VALUES (2, 'c', NULL, NULL, '20141002', 300);
--Same results
SELECT a.*, b1.*, b2.* FROM #a a FULL OUTER JOIN #b b1 ON b1.Id = a.Id AND b1.ItemCode = 'p' FULL OUTER JOIN #b b2 ON b2.Id = a.Id AND b2.ItemCode = 'c';
SELECT a.*, b2.*, b1.* FROM #a a FULL OUTER JOIN #b b1 ON b1.Id = a.Id AND b1.ItemCode = 'c' FULL OUTER JOIN #b b2 ON b2.Id = a.Id AND b2.ItemCode = 'p';
--Different results
SELECT a.*, b1.*, b2.* FROM #a a FULL OUTER JOIN #b b1 ON b1.Id = a.Id AND b1.ItemCode = 'p' FULL OUTER JOIN #b b2 ON b2.Id = a.Id AND b2.ItemCode = 'c' AND b2.DividendDate = b1.PriceDate;
SELECT a.*, b2.*, b1.* FROM #a a FULL OUTER JOIN #b b1 ON b1.Id = a.Id AND b1.ItemCode = 'c' FULL OUTER JOIN #b b2 ON b2.Id = a.Id AND b2.ItemCode = 'p' AND b2.DividendDate = b1.PriceDate;

Filtering out null values only if they exist in SQL select

Below I have sql select to retrieve values from a table. I want to retrieve the values from tableA regardless of whether or not there are matching rows in tableB. The below gives me both non-null values and null values. How do I filter out the null values if non-null rows exist, but otherwise keep the null values?
SELECT a.* FROM
(
SELECT
id,
col1,
coll2
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id= #id
AND a.col2= #arg
) AS a
ORDER BY col1 ASC

You can do this by counting the number of matches using a window function. Then, either return all rows in A if there are no matching B rows, or only return the rows that do match:
select id, col1, col2
from (SELECT a.id, a.col1, a.coll2,
count(b.id) over () as numbs
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id = #id AND a.col2= #arg
) ab
where numbs = 0 or b.id is not null;

Filter them out in WHERE clause
SELECT
id,
col1,
coll2
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id= #id
AND a.col2= #arg
AND A.Col1 IS NOT NULL -- HERE
) AS a
ORDER BY col1 ASC

For some reason, people write code like Marko that puts a filter (b.col2 = 'value') in the JOIN clause. While this works, it is not good practice.
Also, you should get in the habit of having the ON clause in the right sequence. We are joining table A to table B, why write it as B.col1 = A.col1 which is backwards.
While the above statement works, it could definitely be improved.
I created the following test tables.
-- Just playing
use tempdb;
go
-- Table A
if object_id('A') > 0 drop table A
go
create table A
(
id1 int,
col1 int,
col2 varchar(16)
);
go
-- Add data
insert into A
values
(1, 1, 'Good data'),
(2, 2, 'Good data'),
(3, 3, 'Good data');
-- Table B
if object_id('B') > 0 drop table B
go
create table B
(
id1 int,
col1 int,
col2 varchar(16)
);
-- Add data
insert into B
values
(1, 1, 'Good data'),
(2, 2, 'Good data'),
(3, NULL, 'Null data');
Here is the improved statement. I choose literals instead of variables. However, you can change for your example.
-- Filter non matching records
SELECT
A.*
FROM A LEFT OUTER JOIN B ON
A.col1 = B.col1
WHERE
B.col1 IS NOT NULL AND
A.id1 in (1, 2) AND
A.col2 = 'Good data'
ORDER BY
A.id1 DESC
Here is an image of the output.

Join to only the "latest" record with t-sql

I've got two tables. Table "B" has a one to many relationship with Table "A", which means that there will be many records in table "B" for one record in table "A".
The records in table "B" are mainly differentiated by a date, I need to produce a resultset that includes the record in table "A" joined with only the latest record in table "B". For illustration purpose, here's a sample schema:
Table A
-------
ID
Table B
-------
ID
TableAID
RowDate
I'm having trouble formulating the query to give me the resultset I'm looking for any help would be greatly appreciated.

SELECT *
FROM tableA A
OUTER APPLY (SELECT TOP 1 *
FROM tableB B
WHERE A.ID = B.TableAID
ORDER BY B.RowDate DESC) as B

select a.*, bm.MaxRowDate
from (
select TableAID, max(RowDate) as MaxRowDate
from TableB
group by TableAID
) bm
inner join TableA a on bm.TableAID = a.ID
If you need more columns from TableB, do this:
select a.*, b.* --use explicit columns rather than * here
from (
select TableAID, max(RowDate) as MaxRowDate
from TableB
group by TableAID
) bm
inner join TableB b on bm.TableAID = b.TableAID
and bm.MaxRowDate = b.RowDate
inner join TableA a on bm.TableAID = a.ID

table B join is optional: it depends if there are other columns you want
SELECT
*
FROM
tableA A
JOIN
tableB B ON A.ID = B.TableAID
JOIN
(
SELECT Max(RowDate) AS MaxRowDate, TableAID
FROM tableB
GROUP BY TableAID
) foo ON B.TableAID = foo.TableAID AND B.RowDate= foo.MaxRowDate

With ABDateMap AS (
SELECT Max(RowDate) AS LastDate, TableAID FROM TableB GROUP BY TableAID
),
LatestBRow As (
SELECT MAX(ID) AS ID, TableAID FROM ABDateMap INNER JOIN TableB ON b.TableAID=a.ID AND b.RowDate = LastDate GROUP BY TableAID
)
SELECT columns
FROM TableA a
INNER JOIN LatestBRow m ON m.TableAID=a.ID
INNER JOIN TableB b on b.ID = m.ID

Just for the clarity's sake and to benefit those who will stumble upon this ancient question. The accepted answer would return duplicate rows if there are duplicate RowDate in Table B. A safer and more efficient way would be to utilize ROW_NUMBER():
Select a.*, b.* -- Use explicit column list rather than * here
From [Table A] a
Inner Join ( -- Use Left Join if the records missing from Table B are still required
Select *,
ROW_NUMBER() OVER (PARTITION BY TableAID ORDER BY RowDate DESC) As _RowNum
From [Table B]
) b
On b.TableAID = a.ID
Where b._RowNum = 1

Try using this:
BEGIN
DECLARE #TB1 AS TABLE (ID INT, NAME VARCHAR(30) )
DECLARE #TB2 AS TABLE (ID INT, ID_TB1 INT, PRICE DECIMAL(18,2))
INSERT INTO #TB1 (ID, NAME) VALUES (1, 'PRODUCT X')
INSERT INTO #TB1 (ID, NAME) VALUES (2, 'PRODUCT Y')
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (1, 1, 3.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (2, 1, 4.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (3, 1, 5.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (1, 2, 0.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (2, 2, 1.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (3, 2, 2.99)
SELECT A.ID, A.NAME, B.PRICE
FROM #TB1 A
INNER JOIN #TB2 B ON A.ID = B.ID_TB1 AND B.ID = (SELECT MAX(ID) FROM #TB2 WHERE ID_TB1 = A.ID)
END

This will fetch the latest record with JOIN. I think this will help someone
SELECT cmp.*, lr_entry.lr_no FROM
(SELECT * FROM lr_entry ORDER BY id DESC LIMIT 1)
lr_entry JOIN companies as cmp ON cmp.id = lr_entry.company_id

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL to resequence items by groups - sql

Something like: SELECT a.id, a.name, row_number() over (partition by b.group order by a.id) FROM tblA a JOIN tblB on a.tblBID = b.ID;

Related

LEFT JOIN with OR clause without UNION

How to use a bunch of clause in a having statement. Oracle

Order of Full Outer Joins Yields Different Number of Result Rows..Why?

Filtering out null values only if they exist in SQL select

Join to only the "latest" record with t-sql

Categories

Resources