Postgresql: An alternative to subqueries to make the query more efficient? - sql

So I have the following table with the schema:
CREATE TABLE stages (
id serial PRIMARY KEY,
cid VARCHAR(6) NOT NULL,
stage varchar(30) NOT null,
status varchar(30) not null
);
with the following test data:
INSERT INTO stages (id, cid, stage, status) VALUES
('1', '1', 'first stage', 'accepted'),
('2', '1', 'second stage', 'current'),
('3', '2', 'first stage', 'accepted'),
('4', '3', 'first stage', 'accepted'),
('5', '3', 'second stage', 'accepted'),
('6', '3', 'third stage', 'current')
;
The use case is that we query this table one stage at a time. For example, we query for the 'first stage' and fetch all cids that do not exist in the subsequent stage, here the 'second stage':
Result Set:
cid | status
2 | 'accepted'
While running the query for the 'second stage', we will try to fetch all those cids that do not exist in the 'third stage' and so on.
Result Set:
cid | status
1 | 'current'
Currently, we do this with an EXISTS subquery in the WHERE clause, which is not very performant.
Is there a better alternative to the approach we're currently using, or should we focus on optimizing the current approach? Also, what further optimizations could make the EXISTS subquery more performant?
Thanks!

You can use lead():
select s.*
from (select s.*,
lead(stage) over (partition by cid order by id) as next_stage
from stages s
) s
where stage = 'first stage' and next_stage is null;
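To check the lead() query against the question's test data, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for PostgreSQL; the helper name cids_stopped_at is hypothetical and only exists for this illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stages (
        id     INTEGER PRIMARY KEY,
        cid    VARCHAR(6)  NOT NULL,
        stage  VARCHAR(30) NOT NULL,
        status VARCHAR(30) NOT NULL
    );
    INSERT INTO stages (id, cid, stage, status) VALUES
        (1, '1', 'first stage',  'accepted'),
        (2, '1', 'second stage', 'current'),
        (3, '2', 'first stage',  'accepted'),
        (4, '3', 'first stage',  'accepted'),
        (5, '3', 'second stage', 'accepted'),
        (6, '3', 'third stage',  'current');
""")


def cids_stopped_at(stage):
    """Return (cid, status) rows whose last recorded stage is `stage`.

    Hypothetical helper: it wraps the lead() query from the answer and
    relies on id order reflecting stage order within each cid.
    """
    return conn.execute(
        """
        SELECT cid, status
        FROM (SELECT s.*,
                     lead(stage) OVER (PARTITION BY cid ORDER BY id) AS next_stage
              FROM stages s) AS s
        WHERE stage = ? AND next_stage IS NULL
        ORDER BY cid
        """,
        (stage,),
    ).fetchall()


first_stage_only = cids_stopped_at("first stage")
second_stage_only = cids_stopped_at("second stage")
print(first_stage_only)   # [('2', 'accepted')]
print(second_stage_only)  # [('1', 'current')]
```

Both results match the result sets given in the question. Whether this is faster than the EXISTS version on a real table depends on the data and indexes, so compare the plans with EXPLAIN ANALYZE before switching.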

CREATE TABLE stages (
id serial PRIMARY KEY
, cid VARCHAR(6) NOT NULL
, stage varchar(30) NOT null
, status varchar(30) not null
, UNIQUE ( cid, stage)
);
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
ANALYZE stages;
-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
SELECT * FROM stages x
WHERE x.cid = q.cid AND x.stage > q.stage
);
-- Some people don't like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
SELECT id
, row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
FROM stages x
)x ON x.id = q.id AND x.rn = 1;
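As a quick sanity check that the two queries above select the same rows, here is a small sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL, and it selects only the id column to keep the comparison short.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stages (
        id     INTEGER PRIMARY KEY,
        cid    VARCHAR(6)  NOT NULL,
        stage  VARCHAR(30) NOT NULL,
        status VARCHAR(30) NOT NULL,
        UNIQUE (cid, stage)
    );
    INSERT INTO stages (id, cid, stage, status) VALUES
        (1, '1', 'first stage',  'accepted'),
        (2, '1', 'second stage', 'current'),
        (3, '2', 'first stage',  'accepted'),
        (4, '3', 'first stage',  'accepted'),
        (5, '3', 'second stage', 'accepted'),
        (6, '3', 'third stage',  'current');
""")

# Variant 1: keep a row only if no later stage exists for the same cid.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT q.id
    FROM stages q
    WHERE NOT EXISTS (
        SELECT 1 FROM stages x
        WHERE x.cid = q.cid AND x.stage > q.stage
    )
    ORDER BY q.id
""")]

# Variant 2: number the stages per cid from last to first and keep number 1.
row_number_ids = [row[0] for row in conn.execute("""
    SELECT q.id
    FROM stages q
    JOIN (
        SELECT id,
               row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
        FROM stages
    ) x ON x.id = q.id AND x.rn = 1
    ORDER BY q.id
""")]

print(not_exists_ids)  # [2, 3, 6]
print(row_number_ids)  # [2, 3, 6]
```

Both variants depend on the stage names sorting alphabetically in stage order, which only holds by coincidence for 'first', 'second', and 'third'. With other stage names, an explicit ordering column would be needed.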

Related

Combine subqueries without views

I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much. Currently I have queries that require me to make subsets of tables and then I want to join those subsets together. I can mimic the variable assignment I do in my native languages using a VIEW, but I want to know how to do this using a single query (otherwise the database will quickly get messy with views).
Below is a MWE to make 2 initial tables DeleteMe1 and DeleteMe2 (at the end). Then I'd use these two views to get current snapshots of each table. Last I'd use LEFT JOIN with the views to merge the 2 data sets.
Is there a way to see the code SQL uses for the "Join snapshoted views" query I supply below?
How could I eliminate the views intermediate step and combine into a single SQL query?
Create views for current snapshot:
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe1]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[Stage]
FROM
[dbo].DeleteMe1 as t
INNER JOIN
(SELECT
[OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM
[dbo].DeleteMe1
WHERE
LastModifiedDate <= GETDATE()
GROUP BY
[OppId]) AS referenceGroup ON t.[OppId] = referenceGroup.[OppId]
AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]) as BigGroup
GO
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe2]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[State]
FROM
[dbo].DeleteMe2 AS t
INNER JOIN (
SELECT [OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM [dbo].DeleteMe2
WHERE LastModifiedDate <= GETDATE()
GROUP BY [OppId]
) as referenceGroup
ON t.[OppId] = referenceGroup.[OppId] AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]
) as BigGroup
GO
Join snapshoted views:
SELECT
dm1.[Id] as IdDM1
,dm1.[OppId]
,dm1.[LastModifiedDate] as LastModifiedDateDM1
,dm1.[Stage]
,dm2.[Id] as IdDM2
,dm2.[LastModifiedDate] as LastModifiedDateDM2
,dm2.[State]
FROM [dbo].[CurrentSnapshotDeleteMe1] as dm1
LEFT JOIN [dbo].[CurrentSnapshotDeleteMe2] as dm2 ON dm1.OppId = dm2.OppId
Create original tables:
CREATE TABLE DeleteMe1
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[Stage] VARCHAR(250),
)
INSERT INTO DeleteMe1
VALUES ('1', '1', '2019-04-01', 'A'),
('2', '1', '2019-05-01', 'E'),
('3', '1', '2019-06-01', 'B'),
('4', '2', '2019-07-01', 'A'),
('5', '2', '2019-08-01', 'B'),
('6', '3', '2019-09-01', 'C'),
('7', '4', '2019-10-01', 'B'),
('8', '4', '2019-11-01', 'C')
CREATE TABLE DeleteMe2
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[State] VARCHAR(250),
)
INSERT INTO DeleteMe2
VALUES (' 1', '1', '2018-07-01', 'California'),
(' 2', '1', '2017-11-01', 'Delaware'),
(' 3', '4', '2017-12-01', 'California'),
(' 4', '2', '2018-01-01', 'Alaska'),
(' 5', '4', '2018-02-01', 'Delaware'),
(' 6', '2', '2018-09-01', 'Delaware'),
(' 7', '3', '2018-04-01', 'Alaska'),
(' 8', '1', '2018-05-01', 'Hawaii'),
(' 9', '4', '2018-06-01', 'California'),
('10', '1', '2018-07-01', 'Connecticut'),
('11', '2', '2018-08-01', 'Delaware'),
('12', '2', '2018-09-01', 'California')
"I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much."
That's not quite true: SQL does work this way, or at least SQL Server does. You have temp tables and table variables.
Although you named your tables DeleteMe, from your statements it seems like it's the views you wish to treat as variables. So I'll focus on this.
Here's how to do it for your first view. It puts the results into a temporary table called #snapshot1:
-- Optional: In case you re-run before you close your connection
if object_id('tempdb..#snapshot1') is not null
drop table #snapshot1;
select
distinct t.Id, t.OppId, t.LastModifiedDate, t.Stage
into #snapshot1
from dbo.DeleteMe1 as t
inner join (
select OppId, max(LastModifiedDate) AS MaxLastModifiedDate
from dbo.DeleteMe1
where LastModifiedDate <= getdate()
group by OppId
) referenceGroup
on t.OppId = referenceGroup.OppId
and t.LastModifiedDate = referenceGroup.MaxLastModifiedDate;
The # prefix tells SQL Server that the table is to be stored temporarily. #snapshot1 will not survive when your connection closes.
Alternatively, you can create a table variable.
declare @snapshot1 table (
id int,
oppId int,
lastModifiedDate date,
stage varchar(50)
);
insert @snapshot1 (id, oppId, lastModifiedDate, stage)
select distinct ...
This table is discarded as soon as the batch that declares it has finished executing.
From there, you can join on your temp tables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM #snapshot1 dm1
LEFT JOIN #snapshot2 dm2 ON dm1.OppId = dm2.OppId
Or your table variables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM @snapshot1 dm1
LEFT JOIN @snapshot2 dm2 ON dm1.OppId = dm2.OppId
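If the goal is literally one statement with no views and no temp objects, common table expressions (WITH ... AS) are another option. This is not what the answer above uses; it is an alternative sketched here under some assumptions: Python's built-in sqlite3 module stands in for SQL Server, date('now') replaces GETDATE(), and dates are stored as ISO-8601 text. The CTE names snapshot1 and snapshot2 are borrowed from the temp table names above.

```python
import sqlite3

# Assumptions: SQLite stands in for SQL Server; dates are ISO-8601 text,
# so MAX() and <= compare them correctly as strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DeleteMe1 (Id INT, OppId INT, LastModifiedDate TEXT, Stage VARCHAR(250));
    CREATE TABLE DeleteMe2 (Id INT, OppId INT, LastModifiedDate TEXT, State VARCHAR(250));
    INSERT INTO DeleteMe1 VALUES
        (1, 1, '2019-04-01', 'A'), (2, 1, '2019-05-01', 'E'),
        (3, 1, '2019-06-01', 'B'), (4, 2, '2019-07-01', 'A'),
        (5, 2, '2019-08-01', 'B'), (6, 3, '2019-09-01', 'C'),
        (7, 4, '2019-10-01', 'B'), (8, 4, '2019-11-01', 'C');
    INSERT INTO DeleteMe2 VALUES
        (1, 1, '2018-07-01', 'California'),   (2, 1, '2017-11-01', 'Delaware'),
        (3, 4, '2017-12-01', 'California'),   (4, 2, '2018-01-01', 'Alaska'),
        (5, 4, '2018-02-01', 'Delaware'),     (6, 2, '2018-09-01', 'Delaware'),
        (7, 3, '2018-04-01', 'Alaska'),       (8, 1, '2018-05-01', 'Hawaii'),
        (9, 4, '2018-06-01', 'California'),   (10, 1, '2018-07-01', 'Connecticut'),
        (11, 2, '2018-08-01', 'Delaware'),    (12, 2, '2018-09-01', 'California');
""")

# Each CTE plays the role of one of the views; the final SELECT joins them.
rows = conn.execute("""
    WITH snapshot1 AS (
        SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.Stage
        FROM DeleteMe1 AS t
        JOIN (SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
              FROM DeleteMe1
              WHERE LastModifiedDate <= date('now')
              GROUP BY OppId) AS g
          ON t.OppId = g.OppId AND t.LastModifiedDate = g.MaxLastModifiedDate
    ),
    snapshot2 AS (
        SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.State
        FROM DeleteMe2 AS t
        JOIN (SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
              FROM DeleteMe2
              WHERE LastModifiedDate <= date('now')
              GROUP BY OppId) AS g
          ON t.OppId = g.OppId AND t.LastModifiedDate = g.MaxLastModifiedDate
    )
    SELECT dm1.Id, dm1.OppId, dm1.Stage, dm2.Id, dm2.State
    FROM snapshot1 AS dm1
    LEFT JOIN snapshot2 AS dm2 ON dm1.OppId = dm2.OppId
    ORDER BY dm1.OppId, dm2.Id
""").fetchall()

for row in rows:
    print(row)
```

With the sample data this prints six rows, because OppId 1 and OppId 2 each have two DeleteMe2 rows sharing the latest date. The WITH syntax itself is also accepted by SQL Server, where GETDATE() would take the place of date('now').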

Oracle - Comparing multiple records

I am stuck with a strange scenario now.
I have a table with many records. A few of them look like:
Table 1:
----------------------------------------------
M_ID ROLE_ID
----------------------------------------------
idA adVA12^~^dsa25
idA adsf32^~^123^~^asdf32
idA hdghf45
idB fdngfhlo43^~^
idB pnsdfmg23
idC 123ghaskfdnk
idC hafg32^~^^~^gasdfg
----------------------------------------------
and Table 2:
-----------------------------------------------------------
ROLE_ID ADDR1 ADDR2 ADDR3
-----------------------------------------------------------
adVA12^~^dsa25 18 ben street
adsf32^~^123^~^asdf32 24 naruto park
hdghf45 18 ben street
fdngfhlo43^~^ 40 spartan ave
pnsdfmg23 40 spartan ave
123ghaskfdnk 14 southpark ave
hafg32^~^^~^gasdfg 88 brooks st
-----------------------------------------------------------
I have these tables linked by ROLE_ID.
My requirement is that all the ROLE_IDs of a single M_ID in Table 1 must be compared on their address fields in Table 2. If the addresses of all the ROLE_IDs corresponding to that single M_ID are not the same in Table 2, that M_ID should be returned.
i.e., in this case, my result should be:
-----------------------------
M_ID ROLE_ID
-----------------------------
idA adVA12^~^dsa25
idA adsf32^~^123^~^asdf32
idA hdghf45
-----------------------------
the M_ID, and the corresponding ROLE_IDs.
I have no idea on how to compare multiple records.
I'd join the tables and count the distinct number of addresses:
SELECT m_id, role_id
FROM (SELECT t1.m_id AS m_id,
t1.role_id AS role_id,
COUNT(DISTINCT t2.addr1 || '-' || t2.addr2 || '-' || t2.addr3)
OVER (PARTITION BY t1.m_id) AS cnt
FROM t1
JOIN t2 ON t1.role_id = t2.role_id) t
WHERE cnt > 1
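The same "count the distinct addresses per m_id" idea can be tried out locally with a small sketch. It assumes Python's built-in sqlite3 module instead of Oracle. SQLite does not accept DISTINCT inside a window aggregate, so the sketch swaps the analytic COUNT for a grouped subquery with HAVING; the rows for m_id 'B' are hypothetical and were added only as a control group whose addresses all match.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table names t1/t2 follow the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (m_id TEXT NOT NULL, role_id TEXT NOT NULL);
    CREATE TABLE t2 (role_id TEXT NOT NULL, addr1 TEXT, addr2 TEXT, addr3 TEXT);
    -- 'A' has two different addresses; 'B' (hypothetical) has only one.
    INSERT INTO t1 VALUES
        ('A', 'A1'), ('A', 'A2'), ('A', 'A4'),
        ('B', 'B1'), ('B', 'B2');
    INSERT INTO t2 VALUES
        ('A1', '1', '2', '3'), ('A2', '1', '2', '3'), ('A4', '4', '2', '3'),
        ('B1', '9', '9', '9'), ('B2', '9', '9', '9');
""")

# Return every (m_id, role_id) whose m_id maps to more than one distinct address.
rows = conn.execute("""
    SELECT t1.m_id, t1.role_id
    FROM t1
    WHERE t1.m_id IN (
        SELECT a.m_id
        FROM t1 AS a
        JOIN t2 AS b ON a.role_id = b.role_id
        GROUP BY a.m_id
        HAVING COUNT(DISTINCT b.addr1 || '-' || b.addr2 || '-' || b.addr3) > 1
    )
    ORDER BY t1.m_id, t1.role_id
""").fetchall()

print(rows)  # [('A', 'A1'), ('A', 'A2'), ('A', 'A4')]
```

One difference to keep in mind: in SQLite, concatenating a NULL address part yields NULL, whereas Oracle treats it as an empty string, so rows with missing address fields would behave differently between the two.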
Slightly different approach:
SELECT t1.m_id, substr(sys.stragg(',' || t1.role_id),2) Roles,
t2.ADDR1, t2.ADDR2, t2.ADDR3
FROM t1
JOIN t2 ON t1.role_id = t2.role_id
GROUP BY t1.m_id, t2.ADDR1, t2.ADDR2, t2.ADDR3;
"M_ID" "ROLES" "ADDR1" "ADDR2" "ADDR3"
"A" "A1,A2,A3,A5,A6,A7" "1" "2" "3"
"A" "A4" "4" "2" "3"
You could add HAVING COUNT(0) > 1 to get all with matches, or use HAVING COUNT(0) = 1 to get all instances that are only used once, or use it without a HAVING to get a summary.
I used the following test data:
CREATE TABLE TEST_ROLE (
ID INTEGER PRIMARY KEY,
M_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(256) NOT NULL
);
CREATE TABLE TEST_ROLE_ADDRESS (
ROLE_ID VARCHAR2(256) NOT NULL,
ADDR1 VARCHAR2(1000),
ADDR2 VARCHAR2(1000),
ADDR3 VARCHAR2(1000)
);
INSERT INTO TEST_ROLE VALUES(1, 'A', 'A1');
INSERT INTO TEST_ROLE VALUES(2, 'A', 'A2');
INSERT INTO TEST_ROLE VALUES(3, 'A', 'A3');
INSERT INTO TEST_ROLE VALUES(4, 'A', 'A4');
INSERT INTO TEST_ROLE VALUES(5, 'A', 'A5');
INSERT INTO TEST_ROLE VALUES(6, 'A', 'A6');
INSERT INTO TEST_ROLE VALUES(7, 'A', 'A7');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A1', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A2', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A3', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A4', '4', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A5', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A6', '1', '2', '3');
INSERT INTO TEST_ROLE_ADDRESS VALUES ('A7', '1', '2', '3');

Need to create rows out of a column in a specific way

CREATE TABLE STORE
(
STORE_CODE INT,
STORE_NAME VARCHAR(20),
STORE_YTD_SALES NUMERIC,
REGION_CODE INT,
EMP_CODE INT
);
INSERT INTO STORE
VALUES ('1', 'Access Junction', '1003455.76', '2', '8'),
('2', 'Database Corner', '1421987.39', '2', '12'),
('3', 'Tuple Charge', '986783.22', '1', '7'),
('4', 'Attribute Alley', '944568.56', '2', '3'),
('5', 'Primary Key Point', '2930098.45', '1', '15');
CREATE TABLE REGION
(
REGION_CODE INT,
REGION_DESCRIPT VARCHAR(10)
);
INSERT INTO REGION
VALUES ('1', 'East'), ('2', 'West');
I am new to SQL. I need to make a list of all stores and regions, as in the following sample:
Code Description
----------- --------------------
1 Access Junction
1 East
2 Database Corner
but I am not sure how to make it work. Can someone help me, please?
Solution for your problem:
SELECT Code, Description
FROM (
SELECT STORE_CODE AS Code, Store_Name AS Description
FROM Store
UNION ALL
SELECT REGION_CODE AS Code, REGION_DESCRIPT AS Description
FROM REGION
) AS t
ORDER BY Code,Description
OUTPUT:
Code Description
---------------------------
1 Access Junction
1 East
2 Database Corner
2 West
3 Tuple Charge
4 Attribute Alley
5 Primary Key Point
Link To the Demo:
http://sqlfiddle.com/#!18/6cc1a/2

Comparing data in same table - SQL

I'm working with a course catalog table in which I have catalog codes and course codes for when the courses were offered. What I need to do is to determine when a course isn't being offered any longer and mark it as an archived course.
CREATE TABLE [dbo].[COURSECATALOG](
[catalog_code] [char](6) NOT NULL,
[course_code] [char](7) NOT NULL,
[title] [char](40) NOT NULL,
[credits] [decimal](7, 4) NULL,
)
insert into coursecatalog
values
('200810', 'BIOL101', 'Biology', '3'),
('200810', 'CHEM201', 'Advanced Chemistry', '3'),
('200810', 'ACCT101', 'Beginning Accounting', '3'),
('201012', 'ACCT101', 'Beginning Accounting', '3'),
('201214', 'ACCT101', 'Beginning Accounting', '3'),
('201214', 'ENGL101', 'English Composition', '3'),
('201416', 'PSYC101', 'Psychology', '3'),
('201618', 'PSYC101', 'Psychology', '3'),
('201618', 'BIOL101', 'Biology', '3'),
('201618', 'CHEM201', 'Advanced Chemistry', '3'),
('201618', 'ENGL101', 'English Composition', '3'),
('201618', 'PSYC101', 'Psychology', '3')
In this case, I need to return ACCT101 - Beginning Accounting since this isn't being offered anymore and should be considered an archived course.
My code so far:
SELECT
catalog_code, course_code
FROM COURSECATALOG t1
WHERE NOT EXISTS (SELECT 1
FROM COURSECATALOG t2
WHERE t2.catalog_code <> t1.catalog_code
AND t2.course_code = t1.course_code)
order by
course_code, catalog_code
But this only returns courses that were offered exactly once (in a single catalog). I need to figure out how to get courses that might have been offered in multiple catalogs but aren't offered any longer.
Any assistance that can be provided is appreciated!
Thank you!
I think the catalog_code is a date with YYYYMM format
SELECT course_code FROM (
SELECT CONVERT(char, catalog_code,112) AS catalog_code, course_code FROM COURSECATALOG
) AS Q
GROUP BY course_code
HAVING MAX(catalog_code) < '20160101'
Example:
http://sqlfiddle.com/#!6/32adfb/14/1
You want something like this:
SELECT course_code
FROM COURSECATALOG t1
GROUP BY course_code
HAVING MAX(catalog_code) <> '201618';
This assumes that "currently being offered" means that it is in the 201618 catalog.
You could calculate the most recent catalog:
SELECT course_code
FROM COURSECATALOG t1
GROUP BY course_code
HAVING MAX(catalog_code) <> (SELECT MAX(catalog_code) FROM COURSECATALOG);
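To confirm that comparing each course's latest catalog with the overall latest catalog returns only ACCT101 for the sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of SQL Server and leaves out the credits column, which the query does not use.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; credits column omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coursecatalog (
        catalog_code CHAR(6)  NOT NULL,
        course_code  CHAR(7)  NOT NULL,
        title        CHAR(40) NOT NULL
    );
    INSERT INTO coursecatalog VALUES
        ('200810', 'BIOL101', 'Biology'),
        ('200810', 'CHEM201', 'Advanced Chemistry'),
        ('200810', 'ACCT101', 'Beginning Accounting'),
        ('201012', 'ACCT101', 'Beginning Accounting'),
        ('201214', 'ACCT101', 'Beginning Accounting'),
        ('201214', 'ENGL101', 'English Composition'),
        ('201416', 'PSYC101', 'Psychology'),
        ('201618', 'PSYC101', 'Psychology'),
        ('201618', 'BIOL101', 'Biology'),
        ('201618', 'CHEM201', 'Advanced Chemistry'),
        ('201618', 'ENGL101', 'English Composition'),
        ('201618', 'PSYC101', 'Psychology');
""")

# A course is archived when its newest catalog is older than the newest catalog overall.
archived = [row[0] for row in conn.execute("""
    SELECT course_code
    FROM coursecatalog
    GROUP BY course_code
    HAVING MAX(catalog_code) <> (SELECT MAX(catalog_code) FROM coursecatalog)
""")]

print(archived)  # ['ACCT101']
```

This works because the catalog codes are fixed-width digit strings, so comparing them as text gives the same order as comparing them chronologically.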

SQL Update First record of Duplicate row in table

I am looking to update the first record when a duplicate is found in a table.
CREATE TABLE tblauthor
(
Col1 varchar(20),
Col2 varchar(30)
);
CREATE TABLE tblbook
(
Col1 varchar(20),
Col2 varchar(30),
Col3 varchar(30)
);
INSERT INTO tblAuthor
(Col1,Col2)
VALUES
('1', 'John'),
('2', 'Jane'),
('3', 'Jack'),
('4', 'Joe');
INSERT INTO tblbook
(Col1,Col2,Col3)
VALUES
('1', 'John','Book 1'),
('2', 'John','Book 2'),
('3', 'Jack','Book 1'),
('4', 'Joe','Book 1'),
('5', 'Joe','Book 2'),
('6', 'Jane','Book 1'),
('7', 'Jane','Book 2');
The update result I want to accomplish should update the records as follows. I would like tblbook.col3 = 1st.
select * from tblbook
('1', 'John','1st'),
('3', 'Jack','1st'),
('4', 'Joe','1st'),
('6', 'Jane','1st');
Can't seem to even get this done with distinct.
Use ROW_NUMBER to assign a number to each row, partitioned by the author's name (col2), and then update the rows that have a number of 1:
update tblbook set col3 = '1st'
where col1 in(
select
col1
from (
select
tblbook.col1,
tblbook.col2,
tblbook.col3,
ROW_NUMBER() OVER (PARTITION BY tblbook.Col2 order by tblbook.col1) as rownum
from tblbook
left outer join tblauthor on tblbook.col2 = tblauthor.col2
) [t1]
where [t1].rownum = 1
)
Fiddle: http://sqlfiddle.com/#!3/4b6c8/20/0
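For a quick local check of the ROW_NUMBER approach, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of SQL Server, and it leaves out the join to tblauthor, since the numbering only needs tblbook.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblbook (Col1 varchar(20), Col2 varchar(30), Col3 varchar(30));
    INSERT INTO tblbook (Col1, Col2, Col3) VALUES
        ('1', 'John', 'Book 1'),
        ('2', 'John', 'Book 2'),
        ('3', 'Jack', 'Book 1'),
        ('4', 'Joe',  'Book 1'),
        ('5', 'Joe',  'Book 2'),
        ('6', 'Jane', 'Book 1'),
        ('7', 'Jane', 'Book 2');
""")

# Number the rows per author (Col2) by Col1 and update only row number 1.
conn.execute("""
    UPDATE tblbook
    SET Col3 = '1st'
    WHERE Col1 IN (
        SELECT Col1
        FROM (SELECT Col1,
                     ROW_NUMBER() OVER (PARTITION BY Col2 ORDER BY Col1) AS rownum
              FROM tblbook) AS t1
        WHERE rownum = 1
    )
""")

updated = conn.execute(
    "SELECT Col1, Col2 FROM tblbook WHERE Col3 = '1st' ORDER BY Col1"
).fetchall()
print(updated)  # [('1', 'John'), ('3', 'Jack'), ('4', 'Joe'), ('6', 'Jane')]
```

Because Col1 is a varchar, ORDER BY Col1 sorts as text, so '10' would sort before '2'. That does not matter for this sample, but a numeric key would be safer on larger tables.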
If you want to update tblbook so the third column is '1st' on duplicates, then you can easily do so with an updatable CTE:
with toupdate as (
select tblbook.*, row_number() over (partition by col2 order by col1) as seqnum
from tblbook
)
update toupdate
set col3 = '1st'
where seqnum = 1;
This is the closest that I can come to understanding what you really want.