Distinct after join or sub-query with distinct and then join - sql

While writing a procedure, I came across a situation were I have to put a DISTINCT in a query. This is somewhat similar to my table schema
CREATE TABLE T1
(
ID INT,
TypeID INT,
SubTypeID INT,
Name VARCHAR(50)
);
GO
CREATE TABLE T2
(
TypeID INT,
SubTypeID INT,
TypeName VARCHAR(50)
);
GO
INSERT INTO T2 (TypeID, SubTypeID, TypeName)
VALUES (1, 1, 'AAA'), (1, 2, 'AAA'),
(2, 1, 'BBB'), (2, 2, 'BBB'),
(3, 1, 'CCC'), (3, 2, 'CCC');
INSERT INTO T1 (ID, TypeID, SubTypeID, Name)
VALUES (1, 1, 1, 'ABC'), (2, 2, 2, 'BCD'),
(3, 3, 2, 'CDE'), (4, 1, 1, 'DEF'),
(5, 2, 2, 'EFG'), (6, 3, 0, 'FGH'); -- Sub Type not detected yet.
GO
In this, either user can provide the SubType or let the system to detect.
Now I have 2 query options for this scenario.
Option 1
SELECT DISTINCT t1.ID, t1.Name, t2.TypeName
FROM T1
JOIN T2 ON T1.TypeID = T2.TypeID;
And Option 2
SELECT t1.ID, t1.Name, t2.TypeName
FROM T1
JOIN (SELECT DISTINCT TypeID, TypeName FROM T2) AS T2 ON T1.TypeID = T2.TypeID;
The result is same in both the cases but I want to know which should be preferred. There may be millions of rows in table T1 and thousands of rows in T2.
In my opinion, I should use the first option to avoid subquery.
But still want to confirm with the community as it may have some or major performance impact which is not known yet.

If you care about performance, avoid select distinct in the outer query. I would try this:
SELECT t1.ID, t1.Name, t2.TypeName
FROM T1 CROSS APPLY
(SELECT DISTINCT T2.TypeName
FROM T2
WHERE T1.TypeID = T2.TypeID
) T2;

Related

Most efficient way to update table column based on sum

I am looking for the most efficient / minimal code way to update a table column based on the sum of another value in the same table. A method which works and the temp table are shown below.
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) as sum_price_bystate
from #t1 t2_in
group by t2_in.id, t2_in.astate
) t2
on t1.id = t2.id
and t1.astate = t2.astate
update t1
set total_id_price = sum_price
from #t1 t1
inner join (
select t3_in.Id,
sum(t3_in.price) as sum_price
from #t1 t3_in
group by t3_in.id
) t3
on t1.id = t3.id
select * from #t1
The main thing I don't like about my method is that it requires an inner join with a subquery that requires the same table itself. So I am looking for a way that might be able to avoid this, although I don't think this method I have is overly complicated. Maybe there isn't any method too much more efficient.
To add, I am wondering what the best way would be to combine the two updates together, since they are very similar, but only differ by the group by clause.
As pointed out in the comments, this is not a good way to store data as it violates the basic principles of normalisation -
you are storing data that you can compute
you are storing the same data multiple times, ie, duplicates.
you need to re-calculate the totals whenever any individual values changes
it's possible to update a single row and create a data contradiction
it's also not a bad thing to pre-calculate aggregations, especially in a data warehouse scenario, but you would still only store the value once per unique key.
Normalisation prevents these issues.
Saying that, you can utilise analytic window functions to compute your values in a single pass over the table:
select *,
Sum(price) over(partition by id, astate) total_id_price_bystate,
Sum(price) over(partition by id) total_id_price
from #t1;
If you really want the data in this format you could create a view and query it:
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
select *
from Totals where id = 100;
And to answer your specific question, a view (or a CTE) that touches a single base table can be updated so you can accomplish what you are doing like so:
drop view Totals;
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
update totals set
total_id_price_bystate = total_bystate,
total_id_price = total;
You can use PARTITION BY to get the two different aggregated value,
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
sum(t2_in.price) over(partition by t2_in.id) as sum_price
from #t1 t2_in
) t2
on t1.id = t2.id
and t1.astate = t2.astate
select * from #t1

SQL statement to get all customers with no orders TODAY(current date)

The question is, how do I write a statement that would return all customers with NO Orders TODAY using sql join?
Tables : tbl_member ,tbl_order
tbl_member consist of id,name,
tbl_order consist of id, date, foodOrdered
If you left join, the select where the table on the right is nulkl, it limits to the rows that DO NOT meet the join condition:
select t1.*
from tbl_member t1
left join tbl_member t2
on t1.id = t2.id -- assuming that t2.id relates to t1.id
and t2.date = current_date() -- today's date in mysql
where t2.id is null
Assuming tbl_order date is a datetime (it probably should be) for sql server you could use something like:
declare #tbl_member table
(
id int,
fullname varchar(50)
)
declare #tbl_order table
(
id int,
orderdate datetime,
foodOrdered varchar(50)
)
INSERT INTO #tbl_member VALUES (1, 'George Washington')
INSERT INTO #tbl_member VALUES (2, 'Abraham Lincoln')
INSERT INTO #tbl_member VALUES (3, 'Mickey Mouse')
INSERT INTO #tbl_member VALUES (3, 'Donald Duck')
INSERT INTO #tbl_order VALUES (1, '2017-07-01 13:00:00', 'Fish and Chips')
INSERT INTO #tbl_order VALUES (2, '2017-07-03 08:00:00', 'Full English')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 08:00:00', 'Veggie Burger')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 12:00:00', 'Bangers and Mash')
SELECT id, fullname FROM #tbl_member WHERE id NOT IN
(SELECT id FROM #tbl_order
WHERE CAST(orderDate as date) = CAST(GETDATE() as Date))
It helps if you specify what flavour database you are using as the syntax is often subtly different.

SQL Server recursive self join

I have a simple categories table as with the following columns:
Id
Name
ParentId
So, an infinite amount of Categories can be the child of a category. Take for example the following hierarchy:
I want, in a simple query that returns the category "Business Laptops" to also return a column with all it's parents, comma separator or something:
Or take the following example:
Recursive cte to the rescue....
Create and populate sample table (Please save us this step in your future questions):
DECLARE #T as table
(
id int,
name varchar(100),
parent_id int
)
INSERT INTO #T VALUES
(1, 'A', NULL),
(2, 'A.1', 1),
(3, 'A.2', 1),
(4, 'A.1.1', 2),
(5, 'B', NULL),
(6, 'B.1', 5),
(7, 'B.1.1', 6),
(8, 'B.2', 5),
(9, 'A.1.1.1', 4),
(10, 'A.1.1.2', 4)
The cte:
;WITH CTE AS
(
SELECT id, name, name as path, parent_id
FROM #T
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.name, cast(cte.path +','+ t.name as varchar(100)), t.parent_id
FROM #T t
INNER JOIN CTE ON t.parent_id = CTE.id
)
The query:
SELECT id, name, path
FROM CTE
Results:
id name path
1 A A
5 B B
6 B.1 B,B.1
8 B.2 B,B.2
7 B.1.1 B,B.1,B.1.1
2 A.1 A,A.1
3 A.2 A,A.2
4 A.1.1 A,A.1,A.1.1
9 A.1.1.1 A,A.1,A.1.1,A.1.1.1
10 A.1.1.2 A,A.1,A.1.1,A.1.1.2
See online demo on rextester

Copying data from one table to another

I'm updating data by selecting data from table and inserting into another. However there are some constraints on the other table and I get this :
DETAIL: Key (entry_id)=(391) is duplicated.
I basically do this :
insert into table_tmp
select * from table_one
How can I skip insert when this key entry duplicate occurs?
Update I can't save this schema info on SQL fiddle but here it is :
CREATE TABLE table1
("entry_id" int, "text" varchar(255))
;
INSERT INTO table1
("entry_id", "text")
VALUES
(1, 'one'),
(2, 'two'),
(3, 'test'),
(3, 'test'),
(12, 'three'),
(13, 'four')
;
CREATE TABLE table2
("entry_id" int, "text" varchar(255))
;
Create unique index entry_id_idxs
on table2 (entry_id)
where text='test';
INSERT INTO table2
("entry_id", "text")
VALUES
(1, 'one'),
(2, 'two'),
(3, 'test'),
(3, 'test'),
(12, 'three'),
(13, 'four')
;
Error that I get if I try to build the schema
Insert using join that returns unmatched rows:
INSERT INTO table2
SELECT DISTINCT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t2.entry_id = t1.entry_id
WHERE t2.entry_id IS NULL
Use this query - SQLFiddle Demo:
INSERT INTO table2
SELECT t1.* FROM table1 t1
WHERE NOT EXISTS (
SELECT entry_id
FROM table2 t2
WHERE t2.entry_id = t1.entry_id
)

SQL recursive logic

I have a situation where I need to configure existing client data to address a problem where our application was not correctly updating IDs in a table when it should have been.
Here's the scenario. We have a parent table, where rows can be inserted that effectively replace existing rows; the replacement can be recursive. We also have a child table, which has a field that points to the parent table. In existing data, the child table could be pointing at rows that have been replaced, and I need to correct that. I can't simply update each row to the replacing row, however, because that row could have been replaced as well, and I need the latest row to be reflected.
I was trying to find a way to write a CTE that would accomplish this for me, but I'm struggling to find a query that finds what I'm actually looking for. Here's a sample of the tables that I'm working with; the 'ShouldBe' column is what I'd like my update query to end up with, taking into account the recursive replacement of some of the rows.
DECLARE #parent TABLE (SampleID int,
SampleIDReplace int,
GroupID char(1))
INSERT INTO #parent (SampleID, SampleIDReplace, GroupID)
VALUES (1, -1, 'A'), (2, 1, 'A'), (3, -1, 'A'),
(4, -1, 'A'), (5, 4, 'A'), (6, 5, 'A'),
(7, -1, 'B'), (8, 7, 'B'), (9, 8, 'B')
DECLARE #child TABLE (ChildID int, ParentID int)
INSERT INTO #child (ChildID, ParentID)
VALUES (1, 4), (2, 7), (3, 1), (4, 3)
Desired results in child table, after the update script has been applied:
ChildID ParentID ParentID_ShouldBe
1 4 6 (4 replaced by 5, 5 replaced by 6)
2 7 9 (7 replaced by 8, 8 replaced by 9)
3 1 2 (1 replaced by 2)
4 3 3 (unchanged, never replaced)
The following returns what you are looking for:
with cte as (
select sampleid, sampleidreplace, 1 as num
from #parent
where sampleidreplace <> -1
union all
select p.sampleid, cte.sampleidreplace, cte.num+1
from #parent p join
cte
on p.sampleidreplace = cte.sampleId
)
select c.*, coalesce(p.sampleid, c.parentid)
from #child c left outer join
(select ROW_NUMBER() over (partition by sampleidreplace order by num desc) as seqnum, *
from cte
) p
on c.ParentID = p.SampleIDReplace and p.seqnum = 1
The recursive part keeps track of every correspondence (4-->5, 4-->6). The addition number is a "generation" count. We actually want the last generation. This is identified by using the row_number() function, ordering by the num in decreasing order -- hence the p.seqnum = 1.
Ok, so it took me a while and there are probably better ways to do it, but here is one option.
DECLARE #parent TABLE (SampleID int,
SampleIDReplace int,
GroupID char(1))
INSERT INTO #parent (SampleID, SampleIDReplace, GroupID)
VALUES (1, -1, 'A'), (2, 1, 'A'), (3, -1, 'A'),
(4, -1, 'A'), (5, 4, 'A'), (6, 5, 'A'),
(7, -1, 'B'), (8, 7, 'B'), (9, 8, 'B')
DECLARE #child TABLE (ChildID int, ParentID int)
INSERT INTO #child (ChildID, ParentID)
VALUES (1, 4), (2, 7), (3, 1), (4, 3)
;WITH RecursiveParent1 AS
(
SELECT SampleIDReplace, SampleID, 1 RecursionLevel
FROM #parent
WHERE SampleIDReplace != -1
UNION ALL
SELECT A.SampleIDReplace, B.SampleID, RecursionLevel + 1
FROM RecursiveParent1 A
INNER JOIN #parent B
ON A.SampleId = B.SampleIDReplace
),RecursiveParent2 AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY SampleIdReplace ORDER BY RecursionLevel DESC) RN
FROM RecursiveParent1
)
SELECT A.ChildID, ISNULL(B.ParentID,A.ParentID) ParentID
FROM #child A
LEFT JOIN ( SELECT SampleIDReplace, SampleID ParentID
FROM RecursiveParent2
WHERE RN = 1) B
ON A.ParentID = B.SampleIDReplace
OPTION(MAXRECURSION 500)
I've got a iterative SQL loop that I think sorts this out as follows:
WHILE EXISTS (SELECT * FROM #child C INNER JOIN #parent P ON C.ParentID = P.SampleIDReplace WHERE P.SampleIDReplace > -1)
BEGIN
UPDATE #child
SET ParentID = SampleID
FROM #parent
WHERE #child.ParentID = SampleIDReplace
END
Basically, the while condition compares the contents of the parent ID column in the child table and sees if there is a matching value in the SampleIDReplace column of the parent table. If there is, it goes and gets the SampleID of that record. It only stops when the join results in every SampleIDReplace being -1, meaning we have nothing else to do.
On your sample data, the above results in the expected output.
Note that I had to use temp tables rather than table variables here in order for the table to be accessible within the loop. If you have to use table variables then there would need to be a bit more surgery done.
Clearly if you have deep replacement hierarchies then you'll do quite a few updates, which may be a consideration when looking to perform the query against a production database.