Most efficient way to update table column based on sum

Most efficient way to update table column based on sum - sql

I am looking for the most efficient / minimal code way to update a table column based on the sum of another value in the same table. A method which works and the temp table are shown below.
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) as sum_price_bystate
from #t1 t2_in
group by t2_in.id, t2_in.astate
) t2
on t1.id = t2.id
and t1.astate = t2.astate
update t1
set total_id_price = sum_price
from #t1 t1
inner join (
select t3_in.Id,
sum(t3_in.price) as sum_price
from #t1 t3_in
group by t3_in.id
) t3
on t1.id = t3.id
select * from #t1
The main thing I don't like about my method is that it requires an inner join with a subquery that requires the same table itself. So I am looking for a way that might be able to avoid this, although I don't think this method I have is overly complicated. Maybe there isn't any method too much more efficient.
To add, I am wondering what the best way would be to combine the two updates together, since they are very similar, but only differ by the group by clause.

As pointed out in the comments, this is not a good way to store data as it violates the basic principles of normalisation -
you are storing data that you can compute
you are storing the same data multiple times, ie, duplicates.
you need to re-calculate the totals whenever any individual values changes
it's possible to update a single row and create a data contradiction
it's also not a bad thing to pre-calculate aggregations, especially in a data warehouse scenario, but you would still only store the value once per unique key.
Normalisation prevents these issues.
Saying that, you can utilise analytic window functions to compute your values in a single pass over the table:
select *,
Sum(price) over(partition by id, astate) total_id_price_bystate,
Sum(price) over(partition by id) total_id_price
from #t1;
If you really want the data in this format you could create a view and query it:
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
select *
from Totals where id = 100;
And to answer your specific question, a view (or a CTE) that touches a single base table can be updated so you can accomplish what you are doing like so:
drop view Totals;
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
update totals set
total_id_price_bystate = total_bystate,
total_id_price = total;

You can use PARTITION BY to get the two different aggregated value,
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
sum(t2_in.price) over(partition by t2_in.id) as sum_price
from #t1 t2_in
) t2
on t1.id = t2.id
and t1.astate = t2.astate
select * from #t1

Related

Spark SQL Query to assign join date by next closest once

`CREATE TABLE TABLE_1(
CALL_ID INT,
CALL_DATE DATE);
INSERT INTO TABLE_1(CALL_ID, CALL_DATE)
VALUES (1, '2022-10-22'),
(2, '2022-10-31'),
(3, '2022-11-04');
CREATE TABLE TABLE_2(
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_2(PROD_ID, PROD_DATE)
VALUES (1, '2022-10-25'),
(2, '2022-11-17');
CREATE TABLE TABLE_RESULT(
CALL_ID INT,
CALL_DATE DATE,
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_RESULT(CALL_ID, CALL_DATE, PROD_ID, PROD_DATE)
VALUES (1, '2022-10-22', 1, '2022-10-25'),
(2, '2022-10-31', NULL, NULL),
(3, '2022-11-04', 2, '2022-11-17');`
Can you help me to create the TABLE_RESULT with a join in a elegant way? This is a very small example.
Thanks

I solved it. Thanks anyway.
SELECT * FROM (SELECT *, COALESCE(LEAD(CALL_DATE) OVER (PARTITION BY 1 ORDER BY CALL_DATE), CURRENT_DATE) AS CALL_DATE_NEXT FROM TABLE_1) AS A LEFT JOIN TABLE_2 AS B ON (A.CALL_DATE<=B.PROD_DATE AND A.CALL_DATE_NEXT>B.PROD_DATE)

Distinct after join or sub-query with distinct and then join

While writing a procedure, I came across a situation were I have to put a DISTINCT in a query. This is somewhat similar to my table schema
CREATE TABLE T1
(
ID INT,
TypeID INT,
SubTypeID INT,
Name VARCHAR(50)
);
GO
CREATE TABLE T2
(
TypeID INT,
SubTypeID INT,
TypeName VARCHAR(50)
);
GO
INSERT INTO T2 (TypeID, SubTypeID, TypeName)
VALUES (1, 1, 'AAA'), (1, 2, 'AAA'),
(2, 1, 'BBB'), (2, 2, 'BBB'),
(3, 1, 'CCC'), (3, 2, 'CCC');
INSERT INTO T1 (ID, TypeID, SubTypeID, Name)
VALUES (1, 1, 1, 'ABC'), (2, 2, 2, 'BCD'),
(3, 3, 2, 'CDE'), (4, 1, 1, 'DEF'),
(5, 2, 2, 'EFG'), (6, 3, 0, 'FGH'); -- Sub Type not detected yet.
GO
In this, either user can provide the SubType or let the system to detect.
Now I have 2 query options for this scenario.
Option 1
SELECT DISTINCT t1.ID, t1.Name, t2.TypeName
FROM T1
JOIN T2 ON T1.TypeID = T2.TypeID;
And Option 2
SELECT t1.ID, t1.Name, t2.TypeName
FROM T1
JOIN (SELECT DISTINCT TypeID, TypeName FROM T2) AS T2 ON T1.TypeID = T2.TypeID;
The result is same in both the cases but I want to know which should be preferred. There may be millions of rows in table T1 and thousands of rows in T2.
In my opinion, I should use the first option to avoid subquery.
But still want to confirm with the community as it may have some or major performance impact which is not known yet.

If you care about performance, avoid select distinct in the outer query. I would try this:
SELECT t1.ID, t1.Name, t2.TypeName
FROM T1 CROSS APPLY
(SELECT DISTINCT T2.TypeName
FROM T2
WHERE T1.TypeID = T2.TypeID
) T2;

SQL statement to get all customers with no orders TODAY(current date)

The question is, how do I write a statement that would return all customers with NO Orders TODAY using sql join?
Tables : tbl_member ,tbl_order
tbl_member consist of id,name,
tbl_order consist of id, date, foodOrdered

If you left join, the select where the table on the right is nulkl, it limits to the rows that DO NOT meet the join condition:
select t1.*
from tbl_member t1
left join tbl_member t2
on t1.id = t2.id -- assuming that t2.id relates to t1.id
and t2.date = current_date() -- today's date in mysql
where t2.id is null

Assuming tbl_order date is a datetime (it probably should be) for sql server you could use something like:
declare #tbl_member table
(
id int,
fullname varchar(50)
)
declare #tbl_order table
(
id int,
orderdate datetime,
foodOrdered varchar(50)
)
INSERT INTO #tbl_member VALUES (1, 'George Washington')
INSERT INTO #tbl_member VALUES (2, 'Abraham Lincoln')
INSERT INTO #tbl_member VALUES (3, 'Mickey Mouse')
INSERT INTO #tbl_member VALUES (3, 'Donald Duck')
INSERT INTO #tbl_order VALUES (1, '2017-07-01 13:00:00', 'Fish and Chips')
INSERT INTO #tbl_order VALUES (2, '2017-07-03 08:00:00', 'Full English')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 08:00:00', 'Veggie Burger')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 12:00:00', 'Bangers and Mash')
SELECT id, fullname FROM #tbl_member WHERE id NOT IN
(SELECT id FROM #tbl_order
WHERE CAST(orderDate as date) = CAST(GETDATE() as Date))
It helps if you specify what flavour database you are using as the syntax is often subtly different.

How do you join tables sharing the same column?

I made an SQL Fiddle and what I would like to do is join these two queries by using the departmentid.
What I would like to show is the departmentname and not_approved_manager.
Would it be best to use a union or join in this case?
Tables
create table cserepux
(
status int,
comment varchar(25),
departmentid int,
approveddate datetime
);
insert into cserepux (status, comment, departmentid, approveddate)
values (1, 'testing1', 1, NULL), (1, 'testing2', 1, NULL),
(1, 'testing2', 2, NULL), (0, 'testing2', 1, NULL),
(0, 'tesitng2', 1, NULL), (0, 'testing2', 1, NULL),
(0, 'tesitng2', 1, NULL), (0, 'testing3', 2, NULL),
(0, 'testing3', 3, NULL);
create table cseDept
(
departmentid int,
department_name varchar(25)
);
insert into cseDept (departmentid,department_name)
values (1, 'department one'), (2, 'department two'),
(3, 'department three'), (4, 'department four');
Query
select
departmentid,
COUNT(*) AS 'not_approved_manager'
from
cserepux
where
approveddate is null
group by
departmentid
SELECT * FROM cseDept

You need to do a join. A union will not get you what you want.
select d.department_name, COUNT(*) AS 'not_approved_manager'
from cserepux c
inner join cseDept d on c.departmentid = d.departmentid
where approveddate is null
group by d.department_name

Do you need just a join and a correct group by
select dep.department_name, COUNT(*) AS 'not_approved_manager'
from cseDept dep
join cserepux cs on cs.departmentid = dep.departmentid
where approveddate is null
group by dep.department_name
Fiddle: http://sqlfiddle.com/#!3/5cf4e/30
Since joins and group by are really basic things in SQL I can suggest you do take a look on some tutorials to get a bit more proficiency whit it. You can try SQL Server Central stairway articles series

Select rows with duplicate values in 2 columns

This is my table:
CREATE TABLE [Test].[dbo].[MyTest]
(
[Id] BIGINT NOT NULL,
[FId] BIGINT NOT NULL,
[SId] BIGINT NOT NULL
);
And some data:
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (1, 100, 11);
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (2, 200, 12);
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (3, 100, 21);
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (4, 200, 22);
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (5, 300, 13);
INSERT INTO [Test].[dbo].[MyTest] ([Id], [FId], [SId]) VALUES (6, 200, 12);
So I need 2 select query,
First Select FId, SId that like a distinct in both column so the result is:
100, 11
200, 12
100, 21
200, 22
300, 13
As you see the values of 200, 12 returned once.
Second query is the Id's of that columns whose duplicated in both FId, SId So the result is:
2
6
Does any one have any idea about it?

Standard SQL
SELECT
M.ID
FROM
( -- note all duplicate FID, SID pairs
SELECT FID, SID
FROM MyTable
GROUP BY FID, SID
HAVING COUNT(*) > 1
) T
JOIN -- back onto main table using these duplicate FID, SID pairs
MyTable M ON T.FID = M.FID AND T.SID = M.SID
Using windowing:
SELECT
T.ID
FROM
(
SELECT
ID,
COUNT(*) OVER (PARTITION BY FID, SID) AS CountPerPair
FROM
MyTable
) T
WHERE
T.CountPerPair > 1

First query:
SELECT DISTINCT Fid,SId
FROM MyTest
Second query:
SELECT DISTINCT a1.Id
FROM MyTest a1 INNER JOIN MyTest a2
ON a1.Fid = a2.Fid
AND a1.SId = a2.SId
AND a1.Id <> a2.Id
I cannot test them, but I think they should work...

first:
select distinct FId,SId from [Test].[dbo].[MyTest]
second query
select distinct t.Id
from [Test].[dbo].[MyTest] t
inner join [Test].[dbo].[MyTest] t2
on t.Id<>t2.Id and t.FId=t2.FId and t.SId=t2.SId

Part 1 is as mentioned above distinct.
This will resolve second part.
select id from [Test].[dbo].[MyTest] a
where exists(select 1 from [Test].[dbo].[MyTest] where a.[SId] = [SId] and a.[FId] = [FId] and a.id <> id)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Most efficient way to update table column based on sum - sql

Related

Spark SQL Query to assign join date by next closest once

Distinct after join or sub-query with distinct and then join

SQL statement to get all customers with no orders TODAY(current date)

How do you join tables sharing the same column?

Select rows with duplicate values in 2 columns

Categories

Resources