With PostgreSQL I would like to ...
Goal: I would like to do a subtraction operation on two tables that were joined (INNER JOIN) and grouped (GROUP BY) before.
Below is a minimal reproducible script that I hope will work for you.
In this script I create the tables, insert the data, show the result I expect using a workaround, and then show what I would like to do in a single SQL operation (unsuccessfully).
I hope you understand and thank you for your attention.
/* Minimal reproducible example script (Script_mre)
 * Date: 04/01/2023
 * Author: Carlos Antonio Zarzar */
-- Purpose: I would like to do a subtraction operation on two tables that
-- were joined (INNER JOIN) and grouped (GROUP BY) before.
-------------------#-------------------#-------------------#-------------------#
--## Creating database and tables ##--
-- Building Database
CREATE DATABASE db_racao;
-- TABLE racao
CREATE TABLE racao(
id_racao SERIAL PRIMARY KEY NOT NULL,
tamanho INT NOT NULL,
tipo VARCHAR(20) NOT NULL,
proteina INT NOT NULL
);
-- TABLE compra_racao
CREATE TABLE compra_racao(
id_comp_racao SERIAL PRIMARY KEY NOT NULL,
id_racao INT NOT NULL REFERENCES racao(id_racao),
valor_uni NUMERIC NOT NULL,
quantidade REAL NOT NULL,
valor_entrada NUMERIC NOT NULL,
validade DATE NOT NULL,
cod_lote INT NOT NULL
);
-- TABLE saida_racao
CREATE TABLE saida_racao(
id_saida_racao SERIAL PRIMARY KEY NOT NULL,
quantidade REAL NOT NULL,
valor_saida NUMERIC NOT NULL,
data_saida TIMESTAMP NOT NULL,
id_comp_racao INT NOT NULL REFERENCES compra_racao(id_comp_racao),
id_racao INT NOT NULL REFERENCES racao(id_racao)
);
-------------------#-------------------#-------------------#-------------------#
--## Inserting data into tables ##--
-- TABLE racao
INSERT INTO racao(tamanho,tipo,proteina)
VALUES
(5,'alevino',48),
(10,'engorda',38),
(25,'prime',42),
(5,'alevino',48);
-- TABLE compra_racao
INSERT INTO compra_racao(id_racao,valor_uni,quantidade,valor_entrada,validade,cod_lote)
VALUES
(1,2.5,2000,5000,'2025-01-01',123),
(2,3.4,1000,3400,'2025-01-01',321),
(3,4.0,1000,4000,'2025-01-01',654),
(1,2.5,4000,10000,'2025-01-01',456),
(2,3.4,2000,6800,'2025-01-01',987),
(3,4.0,1500,6000,'2025-01-01',789),
(4,2.5,2500,6250,'2025-01-01',789);
-- TABLE saida_racao
INSERT INTO saida_racao(quantidade,valor_saida,data_saida,id_comp_racao,id_racao)
VALUES
(2000,5000,'2022-03-05 00:00:00',1,1),
(1000,3400,'2022-05-08 00:00:00',2,2),
(500,1700,'2022-09-25 00:00:00',3,3),
(100,340,'2022-09-25 00:00:00',3,3),
(1000,2500,'2023-02-10 00:00:00',4,1),
(1000,2500,'2023-03-30 00:00:00',5,2),
(1000,2500,'2023-04-05 00:00:00',6,3),
(575,1437.5,'2023-11-10 00:00:00',4,1),
(1525,3812.5,'2023-12-15 00:00:00',4,1),
(1000,2500,'2023-12-20 00:00:00',7,4),
(1200,3000,'2023-12-20 00:00:00',7,4);
-------------------#-------------------#-------------------#-------------------#
--## Making the queries ##--
/* The problem:
 * I would like to subtract two resulting tables, the "Entrada" table and the "Saida" table.
 * Each table is grouped by id_racao. */
/* Not an "elegant" solution, i.e. a workaround:
 * I turn each grouped query result into another table (a MATERIALIZED VIEW).
 * Then I run a third query that produces the "Estoque" table. */
-- MATERIALIZED VIEW Entrada
CREATE MATERIALIZED VIEW view_entrada AS
SELECT r.id_racao, SUM(cr.quantidade) AS "entrada", SUM(cr.valor_entrada) AS "valor_entrada"
FROM racao AS r
INNER JOIN compra_racao AS cr
ON r.id_racao = cr.id_racao
GROUP BY r.id_racao
ORDER BY r.id_racao
WITH DATA;
-- MATERIALIZED VIEW Saida
CREATE MATERIALIZED VIEW view_saida AS
SELECT r.id_racao, SUM(sr.quantidade) AS "saida", SUM(sr.valor_saida) AS "valor_saida"
FROM racao AS r
INNER JOIN saida_racao AS sr
ON r.id_racao = sr.id_racao
GROUP BY r.id_racao
ORDER BY r.id_racao
WITH DATA;
-- And finally the query for the "Estoque" table (joining "Entrada" and "Saida" by group)
-- Estoque
SELECT id_racao, ve.entrada - vs.saida AS quant_total, ve.valor_entrada - vs.valor_saida AS valor_total
FROM view_entrada AS ve
INNER JOIN view_saida AS vs
USING (id_racao);
-- This is the result I expect.
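-- For reference (computed by hand from the sample data inserted above), that
-- expected "Estoque" result is:
--  id_racao | quant_total | valor_total
--         1 |         900 |        2250
--         2 |        1000 |        4300
--         3 |         900 |        5460
--         4 |         300 |         750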
-------------------#-------------------#-------------------#-------------------#
-- Now what I would like to do is do all the operations at once and then
-- make the resulting table a MATERIALIZED VIEW for queries.
-- An idea of what I'd like (it may help):
SELECT
(SELECT r.id_racao, SUM(cr.quantidade) FROM racao AS r
INNER JOIN compra_racao AS cr
ON r.id_racao = cr.id_racao
GROUP BY r.id_racao)
-
(SELECT r.id_racao, SUM(sr.quantidade)
FROM racao AS r
INNER JOIN saida_racao AS sr
ON r.id_racao = sr.id_racao
GROUP BY r.id_racao)
You can do it this way, using common table expressions (CTEs):
WITH entrada AS (
    SELECT r.id_racao, SUM(cr.quantidade) AS qty, SUM(cr.valor_entrada) AS valor
    FROM racao AS r
    INNER JOIN compra_racao AS cr
        ON r.id_racao = cr.id_racao
    GROUP BY r.id_racao
    ORDER BY r.id_racao),
salida AS (
    SELECT r.id_racao, SUM(sr.quantidade) AS qty, SUM(sr.valor_saida) AS valor
    FROM racao AS r
    INNER JOIN saida_racao AS sr
        ON r.id_racao = sr.id_racao
    GROUP BY r.id_racao
    ORDER BY r.id_racao)
SELECT
    COALESCE(e.id_racao, s.id_racao) AS id,
    COALESCE(e.qty, 0)   - COALESCE(s.qty, 0)   AS quant_total,
    COALESCE(e.valor, 0) - COALESCE(s.valor, 0) AS valor_total
FROM entrada e
LEFT JOIN salida s ON e.id_racao = s.id_racao;
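Since you also want the result available as a MATERIALIZED VIEW, the same statement can be wrapped directly; a minimal sketch, where the view name view_estoque is just an example:
CREATE MATERIALIZED VIEW view_estoque AS
WITH entrada AS (
    SELECT r.id_racao, SUM(cr.quantidade) AS qty, SUM(cr.valor_entrada) AS valor
    FROM racao AS r
    INNER JOIN compra_racao AS cr ON r.id_racao = cr.id_racao
    GROUP BY r.id_racao),
salida AS (
    SELECT r.id_racao, SUM(sr.quantidade) AS qty, SUM(sr.valor_saida) AS valor
    FROM racao AS r
    INNER JOIN saida_racao AS sr ON r.id_racao = sr.id_racao
    GROUP BY r.id_racao)
SELECT
    COALESCE(e.id_racao, s.id_racao) AS id_racao,
    COALESCE(e.qty, 0)   - COALESCE(s.qty, 0)   AS quant_total,
    COALESCE(e.valor, 0) - COALESCE(s.valor, 0) AS valor_total
FROM entrada e
LEFT JOIN salida s ON e.id_racao = s.id_racao
WITH DATA;
-- Re-run this whenever the underlying tables change:
-- REFRESH MATERIALIZED VIEW view_estoque;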
I'm trying to optimize the performance of the following update query:
UPDATE a
SET a.[qty] =
(
SELECT MAX(b.[qty])
FROM [TableA] AS b
WHERE b.[ID] = a.[ID]
AND b.[Date] = a.[Date]
AND b.[qty] <> 0
)
FROM [TableA] a
WHERE a.[qty] = 0
AND a.[status] = 'New'
It deals with a large table with over 200 million rows.
I've already tried to create an index on [qty, status], but it was not really helpful because of the index update at the end.
Generally it is not so easy to create indexes on this table, because there are a lot of other update/insert queries.
So I'm thinking of reorganizing this query somehow.
Any ideas?
TableA is a heap like this:
CREATE TABLE TableA (
ID INTEGER null,
qty INTEGER null,
date date null,
status VARCHAR(50) null
);
Execution plan: https://www.brentozar.com/pastetheplan/?id=S1KLUWO15
It's difficult to answer without seeing execution plans and table definitions, but you can avoid the self-join by using an updatable CTE/derived table with window functions:
UPDATE a
SET
qty = a.maxQty
FROM (
SELECT *,
-- highest non-zero qty per (ID, Date) group, computed without a self-join
MAX(CASE WHEN a.qty <> 0 THEN a.qty END) OVER (PARTITION BY a.ID, a.Date) AS maxQty
FROM [TableA] a
) a
WHERE a.qty = 0
AND a.status = 'New';
To support this query, you will need the following index:
TableA (ID, Date) INCLUDE (qty, status)
The two key columns can be in either order, and if you make it the clustered index, the INCLUDE columns are covered automatically.
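As a sketch of that suggestion (the index name is just illustrative):
CREATE NONCLUSTERED INDEX IX_TableA_ID_Date   -- hypothetical name
ON TableA (ID, Date)
INCLUDE (qty, status);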
I have two tables (Current and Prior) that have all the same columns and are combined through a full outer join in a query. I also have a derived column for each of the respective columns that compares the values of the corresponding Current and Prior fields and says whether they match or not. This creates a derived table that has all the Current and Prior fields as well as a derived comparison column. I need to create an actual table in a database that captures that data. How would I do that?
This should create a view for you to use, in case the tables are not very large.
CREATE VIEW [dbo].[vw_Compare]
AS
SELECT /* Column list*/
IIF(A.Col1 IS NULL, 1, 0) AS [CompareCol1],
IIF(A.Col2 IS NULL, 1, 0) AS [CompareCol2]
FROM A
FULL OUTER JOIN C
ON A.Col1 = C.Col1
If you wish to create a table:
CREATE TABLE [dbo].[Compare]
(
[CompareCol1] BIT,
[CompareCol2] BIT,
/* Insert Column 1 to N here */
)
INSERT INTO [dbo].[Compare]
(
[CompareCol1],
[CompareCol2],
/* Column list*/
)
SELECT IIF(A.Col1 IS NULL, 1, 0) AS [CompareCol1],
IIF(A.Col2 IS NULL, 1, 0) AS [CompareCol2],
/* Column list*/
FROM A
FULL OUTER JOIN C
ON A.Col1 = C.Col1
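Alternatively, instead of the CREATE TABLE / INSERT pair above, SELECT ... INTO lets SQL Server derive the column definitions from the query and create the table in one step (a sketch using the same hypothetical A/C tables; the target table must not already exist):
SELECT IIF(A.Col1 IS NULL, 1, 0) AS [CompareCol1],
IIF(A.Col2 IS NULL, 1, 0) AS [CompareCol2]
/* plus the Current and Prior column list */
INTO [dbo].[Compare]
FROM A
FULL OUTER JOIN C
ON A.Col1 = C.Col1;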
The query shown below takes almost 2 hours to run, and I want to reduce its execution time. Any help would be really appreciated.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the Not IN clause with Not Exists and with a Left Join, but that did not improve the runtime.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
The PRODUCTS table has 432,837 records and the STORE table has almost the same number of records. I create this table in the stored procedure itself and then drop it at the end of the stored procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use a nonclustered index on this table, or what else can I do to reduce the execution time? Thanks in advance.
First, you don't need top 1 or distinct in EXISTS and IN subqueries. But this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
     STORE srg
     On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
        prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y' and
      srg.Range_Event_Id = 45655 and
      Not Exists (Select 1
                  From Last_Authorised_Range lar
                  Where lar.Range_Event_Id = srg.Range_Event_Id)
Do note that I removed the quotes around 45655. I presume this column is actually a number. If so, don't confuse yourself and the optimizer by comparing it to a string.
Then, try indexes. I think the best indexes are:
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.
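As a sketch, those suggestions would look like the following (index names are just illustrative, and STORE is the table name used in your rewritten query):
Create Index IX_store_range_event
On STORE (Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id);
Create Index IX_products_store_grp_event
On PRODUCTS (Store_Range_Grp_Id, Range_Event_Id);
Create Index IX_lar_range_event
On Last_Authorised_Range (Range_Event_Id);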
We have a data processing application that has two separate paths that should eventually produce similar results. We also have a database-backed monitoring service that compares and utilizes the results of this processing. At any point in time, either of the two paths may or may not have produced results for the operation, but I want to be able to query a view that tells me about any results that have been produced.
Here's a simplified example of the schema I started with:
create table LeftResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null,
primary key ( DateId, EntityId ) )
go
create table RightResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null,
primary key ( DateId, EntityId ) )
go
create view CombinedResults
as
select
DateId = isnull( l.DateId, r.DateId ),
EntityId = isnull( l.EntityId, r.EntityId ),
LeftValue = l.ProcessingValue,
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from LeftResult l
full outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
go
The problem with this is that SQL Server always chooses to scan the PK on LeftResult and RightResult rather than seek, even when queries against the view include DateId and EntityId as predicates. This seems to be due to the isnull() wrappers on the output columns. (I've even tried using index hints and FORCESEEK, but to no avail; the query plan still shows a scan.)
However, I can't simply replace the isnull() results, since either the left or right side could be missing from the join (because the associated process hasn't populated the table yet).
I don't particularly want to duplicate the MaxValue logic across all of the consumers of the view (in reality the calculation is quite a bit more complex, but the same idea applies).
Is there a good strategy I can use to structure this view, or queries against it, so that the query plan will use a seek rather than a scan?
Try using a left outer join for one of the tables, then union those results with the excluded rows from the other table.
Like:
select (...)
from LeftResult l
left outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
(...)
UNION ALL
select (...)
from RightResult r
left outer join LeftResult l
on l.DateId = r.DateId
and l.EntityId = r.EntityId
WHERE
l.dateid is null
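A minimal sketch of the full view rewritten along those lines, reusing the schema and MaxValue logic from the question (you would need to drop and recreate CombinedResults, or use CREATE OR ALTER VIEW on SQL Server 2016 SP1+):
create view CombinedResults
as
select
DateId = l.DateId,
EntityId = l.EntityId,
LeftValue = l.ProcessingValue,
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from LeftResult l
left outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
union all
-- rows that exist only in RightResult; the anti-join keeps the two halves disjoint
select
DateId = r.DateId,
EntityId = r.EntityId,
LeftValue = l.ProcessingValue,   -- always null in this branch
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from RightResult r
left outer join LeftResult l
on l.DateId = r.DateId
and l.EntityId = r.EntityId
where l.DateId is null
go
Because DateId and EntityId are now plain base columns in each branch (no isnull() around them), predicates on the view can be pushed down and satisfied with seeks on each table's primary key.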