I use SQL Server as my DBMS and have the following scenario with two tables:
CREATE TABLE D_CLIENT
(
ID_CLIENT varchar(10) NOT NULL,
NOM_CLIENT varchar(10) NULL,
PRIMARY KEY (ID_CLIENT)
)
CREATE TABLE F_FACT
(
ANNEE varchar(10) NOT NULL,
DOCUMENT varchar(10) NOT NULL,
NUM_DOC varchar(10) NOT NULL,
NUM_LIGNE_DOC varchar(10) NOT NULL,
ID_CLIENT varchar(10) NOT NULL,
ID_REP varchar(10) NOT NULL,
CA decimal(10,2) NULL,
PRIMARY KEY (ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC),
CONSTRAINT FK_FactClient
FOREIGN KEY (ID_CLIENT) REFERENCES D_CLIENT(ID_CLIENT)
)
INSERT INTO D_CLIENT (ID_CLIENT, NOM_CLIENT)
VALUES ('1', 'A'), ('2', 'B'), ('3', 'C'), ('4', 'D')
INSERT INTO F_FACT (ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA)
VALUES ('2022', 'FAC', '1', '1', '1', '1', 100),
('2022', 'FAC', '1', '2', '1', '1', 100),
('2022', 'FAC', '2', '1', '5', '1', 100)
I have a foreign key on ID_CLIENT to protect data integrity, so inserting a row into F_FACT with an ID_CLIENT that doesn't exist in D_CLIENT fails, which is expected given the foreign key constraint.
When I execute the INSERT above, I get an error because the value '5' doesn't exist in D_CLIENT. However, the first two rows are not inserted either, even though their ID_CLIENT does exist in D_CLIENT.
My question: is it possible, with a single query, to insert only the valid rows (that is, the first two) and reject only the third row?
Thanks for your help
Join the source rows with the lookup table so that rows with a missing client are filtered out:
with src as (
select *
from (
VALUES
('2022','FAC','1','1','1','1',100),
('2022','FAC','1','2','1','1',100),
('2022','FAC','2','1','5','1',100)
) t(ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA)
)
insert into F_FACT(ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA)
select src.ANNEE, src.DOCUMENT, src.NUM_DOC, src.NUM_LIGNE_DOC, src.ID_CLIENT, src.ID_REP, src.CA
from src
join D_CLIENT c on c.ID_CLIENT = src.ID_CLIENT
This is something I would use an exists check for:
insert into F_FACT (ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA)
select ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA from (
values
('2022','FAC','1','1','1','1',100),
('2022','FAC','1','2','1','1',100),
('2022','FAC','2','1','5','1',100)
)v(ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA)
where exists (select * from D_CLIENT d where d.ID_CLIENT = v.ID_CLIENT)
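To check the behaviour end to end, here is a minimal sketch that runs the same EXISTS filter from Python against an in-memory SQLite database. SQLite is only a stand-in for SQL Server here, and the `staging` table and the `load_facts` helper are names made up for this illustration. It also shows one way to list the rejected rows with the inverse NOT EXISTS test.

```python
import sqlite3

ROWS = [
    ("2022", "FAC", "1", "1", "1", "1", 100),
    ("2022", "FAC", "1", "2", "1", "1", 100),
    ("2022", "FAC", "2", "1", "5", "1", 100),  # client '5' does not exist
]

COLS = "ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT, ID_REP, CA"


def load_facts(rows):
    """Insert only rows whose ID_CLIENT exists; return (inserted_count, rejected_rows)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    con.executescript(f"""
        CREATE TABLE D_CLIENT (ID_CLIENT TEXT PRIMARY KEY, NOM_CLIENT TEXT);
        CREATE TABLE F_FACT (
            ANNEE TEXT NOT NULL, DOCUMENT TEXT NOT NULL, NUM_DOC TEXT NOT NULL,
            NUM_LIGNE_DOC TEXT NOT NULL, ID_CLIENT TEXT NOT NULL,
            ID_REP TEXT NOT NULL, CA NUMERIC,
            PRIMARY KEY (ANNEE, DOCUMENT, NUM_DOC, NUM_LIGNE_DOC),
            FOREIGN KEY (ID_CLIENT) REFERENCES D_CLIENT (ID_CLIENT)
        );
        -- hypothetical staging table holding the candidate rows
        CREATE TEMP TABLE staging ({COLS});
        INSERT INTO D_CLIENT VALUES ('1','A'), ('2','B'), ('3','C'), ('4','D');
    """)
    con.executemany("INSERT INTO staging VALUES (?,?,?,?,?,?,?)", rows)

    # Same idea as the answer: keep only rows with a matching client.
    con.execute(f"""
        INSERT INTO F_FACT ({COLS})
        SELECT {COLS} FROM staging s
        WHERE EXISTS (SELECT 1 FROM D_CLIENT d WHERE d.ID_CLIENT = s.ID_CLIENT)
    """)
    inserted = con.execute("SELECT COUNT(*) FROM F_FACT").fetchone()[0]

    # The inverse test lists what was left out.
    rejected = con.execute("""
        SELECT NUM_DOC, NUM_LIGNE_DOC, ID_CLIENT FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM D_CLIENT d WHERE d.ID_CLIENT = s.ID_CLIENT)
    """).fetchall()
    con.close()
    return inserted, rejected
```

Calling `load_facts(ROWS)` inserts the two rows for client '1' and reports the row for client '5' as rejected, without the foreign key ever raising an error.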
I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much. Currently I have queries that require me to make subsets of tables, and then I want to join those subsets together. I can mimic the variable assignment I do in my native languages using a VIEW, but I want to know how to do this in a single query (otherwise the database will quickly get cluttered with views).
Below is an MWE that creates two initial tables, DeleteMe1 and DeleteMe2 (at the end). I then use two views to get the current snapshot of each table. Finally, I use a LEFT JOIN on the views to merge the two data sets.
Is there a way to see the SQL that is actually executed for the join query I supply below?
How could I eliminate the intermediate views and combine everything into a single SQL query?
Create views for current snapshot:
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe1]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[Stage]
FROM
[dbo].DeleteMe1 as t
INNER JOIN
(SELECT
[OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM
[dbo].DeleteMe1
WHERE
LastModifiedDate <= GETDATE()
GROUP BY
[OppId]) AS referenceGroup ON t.[OppId] = referenceGroup.[OppId]
AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]) as BigGroup
GO
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe2]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[State]
FROM
[dbo].DeleteMe2 AS t
INNER JOIN (
SELECT [OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM [dbo].DeleteMe2
WHERE LastModifiedDate <= GETDATE()
GROUP BY [OppId]
) as referenceGroup
ON t.[OppId] = referenceGroup.[OppId] AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]
) as BigGroup
GO
Join snapshotted views:
SELECT
dm1.[Id] as IdDM1
,dm1.[OppId]
,dm1.[LastModifiedDate] as LastModifiedDateDM1
,dm1.[Stage]
,dm2.[Id] as IdDM2
,dm2.[LastModifiedDate] as LastModifiedDateDM2
,dm2.[State]
FROM [dbo].[CurrentSnapshotDeleteMe1] as dm1
LEFT JOIN [dbo].[CurrentSnapshotDeleteMe2] as dm2 ON dm1.OppId = dm2.OppId
Create original tables:
CREATE TABLE DeleteMe1
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[Stage] VARCHAR(250),
)
INSERT INTO DeleteMe1
VALUES ('1', '1', '2019-04-01', 'A'),
('2', '1', '2019-05-01', 'E'),
('3', '1', '2019-06-01', 'B'),
('4', '2', '2019-07-01', 'A'),
('5', '2', '2019-08-01', 'B'),
('6', '3', '2019-09-01', 'C'),
('7', '4', '2019-10-01', 'B'),
('8', '4', '2019-11-01', 'C')
CREATE TABLE DeleteMe2
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[State] VARCHAR(250),
)
INSERT INTO DeleteMe2
VALUES (' 1', '1', '2018-07-01', 'California'),
(' 2', '1', '2017-11-01', 'Delaware'),
(' 3', '4', '2017-12-01', 'California'),
(' 4', '2', '2018-01-01', 'Alaska'),
(' 5', '4', '2018-02-01', 'Delaware'),
(' 6', '2', '2018-09-01', 'Delaware'),
(' 7', '3', '2018-04-01', 'Alaska'),
(' 8', '1', '2018-05-01', 'Hawaii'),
(' 9', '4', '2018-06-01', 'California'),
('10', '1', '2018-07-01', 'Connecticut'),
('11', '2', '2018-08-01', 'Delaware'),
('12', '2', '2018-09-01', 'California')
"I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much."
Well, that's not quite true: SQL does work this way, or at least SQL Server does. You have temp tables and table variables.
Although you named your tables DeleteMe, from your statements it seems that it's the views you wish to treat as variables, so I'll focus on those.
Here's how to do it for your first view. It puts the results into a temporary table called #snapshot1:
-- Optional: In case you re-run before you close your connection
if object_id('tempdb..#snapshot1') is not null
drop table #snapshot1;
select
distinct t.Id, t.OppId, t.LastModifiedDate, t.Stage
into #snapshot1
from dbo.DeleteMe1 as t
inner join (
select OppId, max(LastModifiedDate) AS MaxLastModifiedDate
from dbo.DeleteMe1
where LastModifiedDate <= getdate()
group by OppId
) referenceGroup
on t.OppId = referenceGroup.OppId
and t.LastModifiedDate = referenceGroup.MaxLastModifiedDate;
The hash sign (#) tells SQL Server that the table is temporary. #snapshot1 will not survive once your connection closes.
Alternatively, you can create a table variable.
declare @snapshot1 table (
id int,
oppId int,
lastModifiedDate date,
stage varchar(250)
);
insert @snapshot1 (id, oppId, lastModifiedDate, stage)
select distinct ...
This table is discarded as soon as the batch has finished executing.
From there, you can join on your temp tables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM #snapshot1 dm1
LEFT JOIN #snapshot2 dm2 ON dm1.OppId = dm2.OppId
Or your table variables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM @snapshot1 dm1
LEFT JOIN @snapshot2 dm2 ON dm1.OppId = dm2.OppId
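Since the question asks for a single query, another option worth mentioning is a common table expression (WITH), which names each snapshot inside the statement itself. This is a different technique from the temp tables shown in this answer. The sketch below runs it from Python on an in-memory SQLite database purely so it can be executed; the names `snapshot1`, `snapshot2`, `SNAPSHOT_JOIN` and `run_snapshot_join` are illustrative, and `date('now')` is SQLite's spelling of what would be `GETDATE()` in SQL Server.

```python
import sqlite3

SNAPSHOT_JOIN = """
WITH snapshot1 AS (          -- latest row(s) per OppId in DeleteMe1
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.Stage
    FROM DeleteMe1 AS t
    JOIN (SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
          FROM DeleteMe1
          WHERE LastModifiedDate <= date('now')
          GROUP BY OppId) AS g
      ON t.OppId = g.OppId AND t.LastModifiedDate = g.MaxLastModifiedDate
),
snapshot2 AS (               -- latest row(s) per OppId in DeleteMe2
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.State
    FROM DeleteMe2 AS t
    JOIN (SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
          FROM DeleteMe2
          WHERE LastModifiedDate <= date('now')
          GROUP BY OppId) AS g
      ON t.OppId = g.OppId AND t.LastModifiedDate = g.MaxLastModifiedDate
)
SELECT dm1.OppId, dm1.Id AS IdDM1, dm1.Stage, dm2.Id AS IdDM2, dm2.State
FROM snapshot1 AS dm1
LEFT JOIN snapshot2 AS dm2 ON dm1.OppId = dm2.OppId
ORDER BY dm1.OppId, dm2.Id
"""

DM1 = [(1, 1, "2019-04-01", "A"), (2, 1, "2019-05-01", "E"), (3, 1, "2019-06-01", "B"),
       (4, 2, "2019-07-01", "A"), (5, 2, "2019-08-01", "B"), (6, 3, "2019-09-01", "C"),
       (7, 4, "2019-10-01", "B"), (8, 4, "2019-11-01", "C")]

DM2 = [(1, 1, "2018-07-01", "California"), (2, 1, "2017-11-01", "Delaware"),
       (3, 4, "2017-12-01", "California"), (4, 2, "2018-01-01", "Alaska"),
       (5, 4, "2018-02-01", "Delaware"), (6, 2, "2018-09-01", "Delaware"),
       (7, 3, "2018-04-01", "Alaska"), (8, 1, "2018-05-01", "Hawaii"),
       (9, 4, "2018-06-01", "California"), (10, 1, "2018-07-01", "Connecticut"),
       (11, 2, "2018-08-01", "Delaware"), (12, 2, "2018-09-01", "California")]


def run_snapshot_join():
    """Load the question's sample rows and run the single-statement snapshot join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE DeleteMe1 (Id INTEGER, OppId INTEGER, LastModifiedDate TEXT, Stage TEXT)")
    con.execute("CREATE TABLE DeleteMe2 (Id INTEGER, OppId INTEGER, LastModifiedDate TEXT, State TEXT)")
    con.executemany("INSERT INTO DeleteMe1 VALUES (?,?,?,?)", DM1)
    con.executemany("INSERT INTO DeleteMe2 VALUES (?,?,?,?)", DM2)
    rows = con.execute(SNAPSHOT_JOIN).fetchall()
    con.close()
    return rows
```

With the sample data, OppId 1 and OppId 2 each have two DeleteMe2 rows sharing the latest date, so they appear twice in the result; that is the same behaviour the views produce.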
So I have the following table with the schema:
CREATE TABLE stages (
id serial PRIMARY KEY,
cid VARCHAR(6) NOT NULL,
stage varchar(30) NOT null,
status varchar(30) not null
);
with the following test data:
INSERT INTO stages (id, cid, stage, status) VALUES
('1', '1', 'first stage', 'accepted'),
('2', '1', 'second stage', 'current'),
('3', '2', 'first stage', 'accepted'),
('4', '3', 'first stage', 'accepted'),
('5', '3', 'second stage', 'accepted'),
('6', '3', 'third stage', 'current')
;
The use case: we query this table one stage at a time. For example, we query for the 'first stage' and fetch all cids that do not exist in the subsequent stage, here the 'second stage':
Result Set:
cid | status
2 | 'accepted'
While running the query for the 'second stage', we will try to fetch all those cids that do not exist in the 'third stage' and so on.
Result Set:
cid | status
1 | 'current'
Currently, we do this with an EXISTS subquery in the WHERE clause, which is not very performant.
Is there a better alternative to this approach, or should we focus on optimizing the current one? And what further optimizations would make the EXISTS subquery faster?
Thanks!
You can use lead():
select s.*
from (select s.*,
lead(stage) over (partition by cid order by id) as next_stage
from stages s
) s
where stage = 'first stage' and next_stage is null;
CREATE TABLE stages (
id serial PRIMARY KEY
, cid VARCHAR(6) NOT NULL
, stage varchar(30) NOT null
, status varchar(30) not null
, UNIQUE ( cid, stage)
);
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
ANALYZE stages;
-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
SELECT * FROM stages x
WHERE x.cid = q.cid AND x.stage > q.stage
);
-- Some people don't like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
SELECT id
, row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
FROM stages x
)x ON x.id = q.id AND x.rn = 1;
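The queries above lean on the stage names happening to sort alphabetically. A sketch that avoids that assumption is to pass the current stage and its successor explicitly and keep the NOT EXISTS probe, relying on the UNIQUE (cid, stage) constraint so that each probe can be an index lookup. The code below runs this from Python on an in-memory SQLite database only to make it executable; `make_stages_db` and `stuck_at` are hypothetical helper names, and whether the planner actually uses the index should be confirmed with EXPLAIN on the real PostgreSQL table.

```python
import sqlite3


def make_stages_db():
    """Create the stages table with the test data from the question."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE stages (
            id     INTEGER PRIMARY KEY,
            cid    TEXT NOT NULL,
            stage  TEXT NOT NULL,
            status TEXT NOT NULL,
            UNIQUE (cid, stage)      -- backs the NOT EXISTS probe below
        );
        INSERT INTO stages (id, cid, stage, status) VALUES
            (1, '1', 'first stage',  'accepted'),
            (2, '1', 'second stage', 'current'),
            (3, '2', 'first stage',  'accepted'),
            (4, '3', 'first stage',  'accepted'),
            (5, '3', 'second stage', 'accepted'),
            (6, '3', 'third stage',  'current');
    """)
    return con


def stuck_at(con, stage, next_stage):
    """Return (cid, status) for cids that have `stage` but no row for `next_stage`."""
    return con.execute("""
        SELECT s.cid, s.status
        FROM stages AS s
        WHERE s.stage = ?
          AND NOT EXISTS (SELECT 1 FROM stages AS n
                          WHERE n.cid = s.cid AND n.stage = ?)
        ORDER BY s.cid
    """, (stage, next_stage)).fetchall()
```

For the test data, `stuck_at(con, 'first stage', 'second stage')` returns cid 2 and `stuck_at(con, 'second stage', 'third stage')` returns cid 1, matching the result sets in the question.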
My question is: how do I order the subquery result by PositionAssetId, with each Position followed by its related PhysicalAssetIds according to table TrxAssetPool?
I need the LEFT JOIN because not every Position and Physical is linked. Some of them are standalone: a Physical might exist in PhysicalAssets and TrxPhysicalAssets but not in TrxAssetPool because it is not linked to any Position, and vice versa. These rows also need to be displayed.
CREATE TABLE `PositionAssets` (
`Id` int(5) unsigned NOT NULL,
`Code` varchar(50) NOT NULL,
`Desc` varchar(200) NOT NULL,
PRIMARY KEY (`Id`)
);
CREATE TABLE `PhysicalAssets` (
`Id` int(5) unsigned NOT NULL,
`Code` varchar(50) NOT NULL,
`Desc` varchar(200) NOT NULL,
PRIMARY KEY (`Id`)
);
CREATE TABLE `TrxPositionAssets` (
`Id` int(5) unsigned NOT NULL,
`MaintTrxId` int(5) unsigned NOT NULL,
`PositionAssetId` int(5) NOT NULL,
PRIMARY KEY (`Id`,`MaintTrxId`)
);
CREATE TABLE `TrxPhysicalAssets` (
`Id` int(5) unsigned NOT NULL,
`MaintTrxId` int(5) unsigned NOT NULL,
`PhysicalAssetId` int(5) NOT NULL,
PRIMARY KEY (`Id`,`MaintTrxId`)
);
CREATE TABLE `TrxAssetPool` (
`Id` int(5) unsigned NOT NULL,
`MaintTrxId` int(5) NOT NULL,
`PositionAssetId` int(5) NOT NULL,
`PhysicalAssetId` int(5) NOT NULL,
PRIMARY KEY (`Id`)
);
INSERT INTO `PositionAssets` (`Id`, `Code`, `Desc`) VALUES
('1', 'PositionC', 'Air conditioner'),
('2', 'PositionB', 'Laptop'),
('3', 'PositionA', 'Mobile Phone')
;
INSERT INTO `PhysicalAssets` (`Id`, `Code`, `Desc`) VALUES
('1', 'PhysicalD', 'Dunlop Car Tyre'),
('2', 'PhysicalA1', 'Samsung'),
('3', 'PhysicalB2', 'Acer'),
('4', 'PhysicalB1', 'Lenovo')
;
INSERT INTO `TrxPositionAssets` (`Id`, `MaintTrxId`, `PositionAssetId`) VALUES
('1', '1', '2'),
('2', '1', '3'),
('3', '1', '1')
;
INSERT INTO `TrxPhysicalAssets` (`Id`, `MaintTrxId`, `PhysicalAssetId`) VALUES
('1', '1', '2'),
('2', '1', '3'),
('3', '1', '1'),
('4', '1', '4')
;
INSERT INTO `TrxAssetPool` (`Id`,`MaintTrxId`,`PositionAssetId`,`PhysicalAssetId`) VALUES
('1', '1', '3', '2'),
('2', '1', '2', '4'),
('3', '1', '2', '3')
;
SELECT DataType, DataCode, DataDesc
FROM (
SELECT 'Position' AS DataType, pos.Code AS DataCode, pos.Desc AS DataDesc
FROM TrxPositionAssets trxpos
JOIN PositionAssets pos ON pos.Id = trxpos.PositionAssetId
LEFT JOIN TrxAssetPool trxpool ON (trxpool.PositionAssetId = trxpos.PositionAssetId and trxpool.MaintTrxId = trxpos.MaintTrxId)
WHERE trxpos.MaintTrxId = 1
UNION
SELECT 'Physical' AS DataType, phy.Code AS DataCode, phy.Desc AS DataDesc
FROM TrxPhysicalAssets trxphy
JOIN PhysicalAssets phy ON phy.Id = trxphy.PhysicalAssetId
LEFT JOIN TrxAssetPool trxpool ON (trxpool.PhysicalAssetId = trxphy.PhysicalAssetId and trxpool.MaintTrxId = trxphy.MaintTrxId)
WHERE trxphy.MaintTrxId = 1
) DataPool
Sample at sqlfiddle.com
Current result:
DataType DataCode DataDesc
Position PositionA Mobile Phone
Position PositionB Laptop
Position PositionC Air conditioner
Physical PhysicalA1 Samsung
Physical PhysicalB1 Lenovo
Physical PhysicalB2 Acer
Physical PhysicalD Dunlop Car Tyre
Expected Result:
DataType DataCode DataDesc
Position PositionA Mobile Phone
Physical PhysicalA1 Samsung
Position PositionB Laptop
Physical PhysicalB1 Lenovo
Physical PhysicalB2 Acer
Position PositionC Air conditioner
Physical PhysicalD Dunlop Car Tyre
Air conditioner is not related to any Physical. Dunlop Car Tyre is not related to any Position.
At the end of the query, add:
ORDER BY DATA.DataId ASC;
You need to select the information you want in the subquery. Also, the LEFT JOINs are not necessary, because they are undone by the WHERE and you probably want UNION ALL:
SELECT Data.[DataId], Data.[TrxnDataId], Data.[Type]
FROM ((SELECT pa.[Id] AS DataId, tpa.[Id] AS TrxnDataId, 'Position' AS Type,
tap.PositionAssetId, 1 as ord
FROM {TrxPositionAssets} tpa JOIN
{PositionAssets} pa
ON pa.[Id] = tpa.[PositionAssetId] JOIN
{TrxAssetPool} tap
ON tap.[PositionAssetId] = pa.[Id] AND tap.[TrxId] = tpa.[TrxId]
WHERE tpa.[TrxId] = @TrxId
) UNION ALL
(SELECT pa.[Id] AS DataId, tpa.[Id] AS TrxnDataId, 'Physical' AS Type,
tap.PositionAssetId, 2 as ord
FROM {TrxPhysicalAssets} tpa JOIN
{PhysicalAssets} pa
ON pa.[Id] = tpa.[PhysicalAssetId] JOIN
{TrxAssetPool} tap
ON tap.[PhysicalAssetId] = pa.[Id] AND tap.[TrxId] = tpa.[TrxId]
WHERE tpa.[TrxId] = @TrxId
)
) data
ORDER BY PositionAssetId, ord, dataId;
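As a runnable illustration of the same idea (carry a grouping key and an `ord` flag out of each branch, then sort on them), here is a sketch against the question's sample tables, executed from Python on an in-memory SQLite database as a stand-in for MySQL. It makes a few assumptions that are not in the original: groups are sorted by the Position code (which is what the expected result shows), the Physical branch keeps a LEFT JOIN so that standalone Physicals survive and sort last, and `GroupCode`, `ORDERED_QUERY` and `ordered_assets` are illustrative names.

```python
import sqlite3

SETUP = """
CREATE TABLE PositionAssets (Id INTEGER PRIMARY KEY, Code TEXT NOT NULL, "Desc" TEXT NOT NULL);
CREATE TABLE PhysicalAssets (Id INTEGER PRIMARY KEY, Code TEXT NOT NULL, "Desc" TEXT NOT NULL);
CREATE TABLE TrxPositionAssets (Id INTEGER, MaintTrxId INTEGER, PositionAssetId INTEGER,
                                PRIMARY KEY (Id, MaintTrxId));
CREATE TABLE TrxPhysicalAssets (Id INTEGER, MaintTrxId INTEGER, PhysicalAssetId INTEGER,
                                PRIMARY KEY (Id, MaintTrxId));
CREATE TABLE TrxAssetPool (Id INTEGER PRIMARY KEY, MaintTrxId INTEGER,
                           PositionAssetId INTEGER, PhysicalAssetId INTEGER);
INSERT INTO PositionAssets VALUES
    (1, 'PositionC', 'Air conditioner'), (2, 'PositionB', 'Laptop'), (3, 'PositionA', 'Mobile Phone');
INSERT INTO PhysicalAssets VALUES
    (1, 'PhysicalD', 'Dunlop Car Tyre'), (2, 'PhysicalA1', 'Samsung'),
    (3, 'PhysicalB2', 'Acer'), (4, 'PhysicalB1', 'Lenovo');
INSERT INTO TrxPositionAssets VALUES (1, 1, 2), (2, 1, 3), (3, 1, 1);
INSERT INTO TrxPhysicalAssets VALUES (1, 1, 2), (2, 1, 3), (3, 1, 1), (4, 1, 4);
INSERT INTO TrxAssetPool VALUES (1, 1, 3, 2), (2, 1, 2, 4), (3, 1, 2, 3);
"""

ORDERED_QUERY = """
SELECT DataType, DataCode, DataDesc
FROM (
    -- every Position groups under its own code
    SELECT 'Position' AS DataType, pos.Code AS DataCode, pos."Desc" AS DataDesc,
           pos.Code AS GroupCode, 1 AS ord
    FROM TrxPositionAssets trxpos
    JOIN PositionAssets pos ON pos.Id = trxpos.PositionAssetId
    WHERE trxpos.MaintTrxId = :trx
    UNION ALL
    -- every Physical groups under the code of its linked Position (NULL if standalone)
    SELECT 'Physical', phy.Code, phy."Desc", linked.Code, 2
    FROM TrxPhysicalAssets trxphy
    JOIN PhysicalAssets phy ON phy.Id = trxphy.PhysicalAssetId
    LEFT JOIN TrxAssetPool pool
           ON pool.PhysicalAssetId = trxphy.PhysicalAssetId
          AND pool.MaintTrxId = trxphy.MaintTrxId
    LEFT JOIN PositionAssets linked ON linked.Id = pool.PositionAssetId
    WHERE trxphy.MaintTrxId = :trx
) AS DataPool
ORDER BY GroupCode IS NULL, GroupCode, ord, DataCode
"""


def ordered_assets(trx_id):
    """Return (DataType, DataCode, DataDesc) rows, each Position followed by its Physicals."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(ORDERED_QUERY, {"trx": trx_id}).fetchall()
    con.close()
    return rows
```

Sorting on `GroupCode IS NULL` first pushes standalone Physicals (such as the Dunlop Car Tyre) to the end. A Physical linked to several Positions in TrxAssetPool would appear once per link with this sketch.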
To solve your problem, simplify it and work through it step by step; that makes it easier to find a solution.
For example, start by simply joining two tables:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
I am stuck on a strange scenario.
I have a table with many records, a few of which look like this:
Table 1:
----------------------------------------------
M_ID ROLE_ID
----------------------------------------------
idA adVA12^~^dsa25
idA adsf32^~^123^~^asdf32
idA hdghf45
idB fdngfhlo43^~^
idB pnsdfmg23
idC 123ghaskfdnk
idC hafg32^~^^~^gasdfg
----------------------------------------------
and Table 2:
-----------------------------------------------------------
ROLE_ID ADDR1 ADDR2 ADDR3
-----------------------------------------------------------
adVA12^~^dsa25 18 ben street
adsf32^~^123^~^asdf32 24 naruto park
hdghf45 18 ben street
fdngfhlo43^~^ 40 spartan ave
pnsdfmg23 40 spartan ave
123ghaskfdnk 14 southpark ave
hafg32^~^^~^gasdfg 88 brooks st
-----------------------------------------------------------
I have these tables linked by ROLE_ID.
My requirement: the address fields in Table 2 must be compared across all the ROLE_IDs of a single M_ID in Table 1. If those ROLE_IDs do not all share the same address in Table 2, the M_ID and its ROLE_IDs should be returned.
i.e., in this case, my result should be:
-----------------------------
M_ID ROLE_ID
-----------------------------
idA adVA12^~^dsa25
idA adsf32^~^123^~^asdf32
idA hdghf45
-----------------------------
the M_ID, and the corresponding ROLE_IDs.
I have no idea how to compare multiple records.
I'd join the tables and count the distinct number of addresses:
SELECT m_id, role_id
FROM (SELECT t1.m_id AS m_id,
t1.role_id AS role_id,
COUNT(DISTINCT t2.addr1 || '-' || t2.addr2 || '-' || t2.addr3)
OVER (PARTITION BY t1.m_id) AS cnt
FROM t1
JOIN t2 ON t1.role_id = t2.role_id) t
WHERE cnt > 1
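Not every engine accepts DISTINCT inside a window function, so a variant of the same counting idea is to find the qualifying m_id values with GROUP BY and HAVING, then select their rows. The sketch below runs that variant from Python on an in-memory SQLite database (which rejects the windowed form) just to make it executable. The sample rows are made up for illustration, and `find_mixed_address_roles` is a hypothetical helper name. Like the query above, it concatenates the address parts, so NULL address columns would need extra handling.

```python
import sqlite3

QUERY = """
SELECT t1.m_id, t1.role_id
FROM t1
WHERE t1.m_id IN (
    -- m_ids whose roles map to more than one distinct address
    SELECT a.m_id
    FROM t1 AS a
    JOIN t2 AS b ON a.role_id = b.role_id
    GROUP BY a.m_id
    HAVING COUNT(DISTINCT b.addr1 || '-' || b.addr2 || '-' || b.addr3) > 1
)
ORDER BY t1.m_id, t1.role_id
"""


def find_mixed_address_roles():
    """Return (m_id, role_id) for every m_id whose roles do not share one address."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (m_id TEXT NOT NULL, role_id TEXT NOT NULL);
        CREATE TABLE t2 (role_id TEXT NOT NULL, addr1 TEXT, addr2 TEXT, addr3 TEXT);
        -- illustrative data: A has one role with a different address, B does not
        INSERT INTO t1 VALUES ('A','A1'), ('A','A2'), ('A','A3'), ('B','B1'), ('B','B2');
        INSERT INTO t2 VALUES
            ('A1','1','2','3'), ('A2','1','2','3'), ('A3','4','2','3'),
            ('B1','7','8','9'), ('B2','7','8','9');
    """)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

With this data only m_id A is returned, with all three of its roles, because A3 points at a different address than A1 and A2.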
Slightly different approach:
SELECT t1.m_id, substr(sys.stragg(',' || t1.role_id),2) Roles,
t2.ADDR1, t2.ADDR2, t2.ADDR3
FROM t1
JOIN t2 ON t1.role_id = t2.role_id
GROUP BY t1.m_id, t2.ADDR1, t2.ADDR2, t2.ADDR3;
"M_ID" "ROLES" "ADDR1" "ADDR2" "ADDR3"
"A" "A1,A2,A3,A5,A6,A7" "1" "2" "3"
"A" "A4" "4" "2" "3"
You could add HAVING COUNT(0) > 1 to get all with matches, or use HAVING COUNT(0) = 1 to get all instances that are only used once, or use it without a HAVING to get a summary.
I used the following test data:
CREATE TABLE TEST_ROLE (
ID INTEGER PRIMARY KEY,
M_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(256) NOT NULL
);
CREATE TABLE TEST_ROLE_ADDRESS (
ROLE_ID VARCHAR2(256) NOT NULL,
ADDR1 VARCHAR2(1000),
ADDR2 VARCHAR2(1000),
ADDR3 VARCHAR2(1000)
);
INSERT INTO TEST_ROLE VALUES(1, 'A', 'A1');
INSERT INTO TEST_ROLE VALUES(2, 'A', 'A2');
INSERT INTO TEST_ROLE VALUES(3, 'A', 'A3');
INSERT INTO TEST_ROLE VALUES(4, 'A', 'A4');
INSERT INTO TEST_ROLE VALUES(5, 'A', 'A5');
INSERT INTO TEST_ROLE VALUES(6, 'A', 'A6');
INSERT INTO TEST_ROLE VALUES(7, 'A', 'A7');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A1', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A2', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A3', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A4', '4', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A5', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A6', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A7', '1', '2', '3');