Comparing data in same table - SQL - sql

I'm working with a course catalog table in which I have catalog codes and course codes for when the courses were offered. What I need to do is to determine when a course isn't being offered any longer and mark it as an archived course.
CREATE TABLE [dbo].[COURSECATALOG](
[catalog_code] [char](6) NOT NULL,
[course_code] [char](7) NOT NULL,
[title] [char](40) NOT NULL,
[credits] [decimal](7, 4) NULL,
)
insert into coursecatalog
values
('200810', 'BIOL101', 'Biology', '3'),
('200810', 'CHEM201', 'Advanced Chemistry', '3'),
('200810', 'ACCT101', 'Beginning Accounting', '3'),
('201012', 'ACCT101', 'Beginning Accounting', '3'),
('201214', 'ACCT101', 'Beginning Accounting', '3'),
('201214', 'ENGL101', 'English Composition', '3'),
('201416', 'PSYC101', 'Psychology', '3'),
('201618', 'PSYC101', 'Psychology', '3'),
('201618', 'BIOL101', 'Biology', '3'),
('201618', 'CHEM201', 'Advanced Chemistry', '3'),
('201618', 'ENGL101', 'English Composition', '3'),
('201618', 'PSYC101', 'Psychology', '3')
In this case, I need to return ACCT101 - Beginning Accounting since this isn't being offered anymore and should be considered an archived course.
My code so far:
SELECT
catalog_code, course_code
FROM COURSECATALOG t1
WHERE NOT EXISTS (SELECT 1
FROM COURSECATALOG t2
WHERE t2.catalog_code <> t1.catalog_code
AND t2.course_code = t1.course_code)
order by
course_code, catalog_code
But this only returns courses that were offered exactly once (in one catalog). I need to figure out how to get courses that may have been offered in multiple catalogs but aren't offered any longer.
Any assistance that can be provided is appreciated!
Thank you!

I think the catalog_code is a date with YYYYMM format
SELECT course_code FROM (
SELECT CONVERT(char, catalog_code,112) AS catalog_code, course_code FROM COURSECATALOG
) AS Q
GROUP BY course_code
HAVING MAX(catalog_code) < '20160101'
Example:
http://sqlfiddle.com/#!6/32adfb/14/1

You want something like this:
SELECT course_code
FROM COURSECATALOG t1
GROUP BY course_code
HAVING MAX(catalog_code) <> '201618';
This assumes that "currently being offered" means that it is in the 201618 catalog.
You could calculate the most recent catalog:
SELECT course_code
FROM COURSECATALOG t1
GROUP BY course_code
HAVING MAX(catalog_code) <> (SELECT MAX(catalog_code) FROM COURSECATALOG);
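Since the question asks for the course title as well, here is a minimal sketch that extends the grouped query to return it. It assumes that "currently offered" means "appears in the most recent catalog" and that a course keeps a single title across catalogs (MAX(title) simply picks one if it does not). The last_offered alias is only for illustration.

```sql
-- Courses whose latest catalog is older than the newest catalog in the table.
-- Assumption: catalog_code sorts chronologically as a string (e.g. '201214' < '201618').
SELECT course_code,
       MAX(title)        AS title,         -- assumes one title per course
       MAX(catalog_code) AS last_offered   -- illustrative alias
FROM COURSECATALOG
GROUP BY course_code
HAVING MAX(catalog_code) < (SELECT MAX(catalog_code) FROM COURSECATALOG)
ORDER BY course_code;
```

Against the sample data above this should return a single row for ACCT101, whose latest catalog is 201214.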

Related

get only records which have similar values in column A and different values in column B

My dataset has 2 tables:
animals with animal_id and animal_type
owners with animal_id and owner_name
I want to get records only for those animals (and their owners' names) whose owners have a cat and another, different pet.
Here is my schema:
CREATE TABLE IF NOT EXISTS `animals` (
`animal_id` int(6) unsigned NOT NULL,
`animal_type` varchar(200) NOT NULL,
PRIMARY KEY (`animal_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `animals` (`animal_id`, `animal_type`) VALUES
('1', 'cat'),
('2', 'dog'),
('3', 'cat'),
('4', 'cat'),
('5', 'dog'),
('6', 'dog'),
('7', 'cat'),
('8', 'dog'),
('9', 'cat'),
('10', 'hamster');
CREATE TABLE IF NOT EXISTS `owners` (
`animal_id` int(6) unsigned NOT NULL,
`owner_name` varchar(200) NOT NULL,
PRIMARY KEY (`animal_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `owners` (`animal_id`, `owner_name`) VALUES
('1', 'CatOwner'),
('2', 'DogOwner'),
('3', 'CatsOwner'),
('4', 'CatsOwner'),
('5', 'DogsOwner'),
('6', 'DogsOwner'),
('7', 'CatDogOwner'),
('8', 'CatDogOwner'),
('9', 'CatHamsterOwner'),
('10', 'CatHamsterOwner');
I can filter and show only records for owners who have more than one pet:
SELECT *
FROM animals AS a
JOIN owners AS o
ON a.animal_id = o.animal_id
WHERE o.owner_name IN (SELECT o.owner_name
FROM animals AS a
JOIN owners AS o
ON a.animal_id = o.animal_id
GROUP BY o.owner_name HAVING COUNT(o.owner_name) > 1)
Please tell me how I can get this result.
I would suggest window functions:
SELECT ao.*
FROM (SELECT a.*, o.owner_name,
SUM(CASE WHEN a.animal_type = 'cat' THEN 1 ELSE 0 END) OVER (PARTITION BY o.owner_name) as num_cats,
COUNT(*) OVER (PARTITION BY o.owner_name) as num_animals
FROM owners o JOIN
animals a
ON a.animal_id = o.animal_id
) ao
WHERE num_cats > 0 AND num_animals >= 2;
Note: I'm not clear if the condition is for more than one animal or an animal that is not a cat. If the latter, then use:
SELECT ao.*
FROM (SELECT a.*, o.owner_name,
SUM(CASE WHEN a.animal_type = 'cat' THEN 1 ELSE 0 END) OVER (PARTITION BY o.owner_name) as num_cats,
COUNT(*) OVER (PARTITION BY o.owner_name) as num_animals
FROM owners o JOIN
animals a
ON a.animal_id = o.animal_id
) ao
WHERE num_cats > 0 AND num_animals <> num_cats;
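The schema in the question uses MySQL syntax, and window functions are only available there from MySQL 8.0 onward. As a hedged alternative for older versions, the same "has a cat and at least one non-cat" condition can be sketched with a grouped subquery. The aliases a2 and o2 are introduced here only for illustration.

```sql
-- Owners with at least one cat AND at least one animal that is not a cat,
-- written without window functions.
SELECT a.animal_id, a.animal_type, o.owner_name
FROM animals AS a
JOIN owners  AS o ON o.animal_id = a.animal_id
WHERE o.owner_name IN (
    SELECT o2.owner_name
    FROM owners  AS o2
    JOIN animals AS a2 ON a2.animal_id = o2.animal_id
    GROUP BY o2.owner_name
    HAVING SUM(CASE WHEN a2.animal_type =  'cat' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN a2.animal_type <> 'cat' THEN 1 ELSE 0 END) > 0
)
ORDER BY o.owner_name, a.animal_id;
```

With the sample rows this should keep only the animals of CatDogOwner (7, 8) and CatHamsterOwner (9, 10).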

Combine subqueries without views

I work with languages where I can assign intermediate outputs to a variable and then work with those variables to create a final output. I know SQL doesn't work this way as much. Currently I have queries that require me to make subsets of tables, and then I want to join those subsets together. I can mimic the variable assignment I do in my native languages using a VIEW, but I want to know how to do this in a single query (otherwise the database will quickly get messy with views).
Below is a MWE to make 2 initial tables DeleteMe1 and DeleteMe2 (at the end). Then I'd use these two views to get current snapshots of each table. Last I'd use LEFT JOIN with the views to merge the 2 data sets.
Is there a way to see the code SQL actually runs for the query under the "Join snapshoted views" header below?
How could I eliminate the views intermediate step and combine into a single SQL query?
Create views for current snapshot:
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe1]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[Stage]
FROM
[dbo].DeleteMe1 as t
INNER JOIN
(SELECT
[OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM
[dbo].DeleteMe1
WHERE
LastModifiedDate <= GETDATE()
GROUP BY
[OppId]) AS referenceGroup ON t.[OppId] = referenceGroup.[OppId]
AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]) as BigGroup
GO
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe2]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[State]
FROM
[dbo].DeleteMe2 AS t
INNER JOIN (
SELECT [OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM [dbo].DeleteMe2
WHERE LastModifiedDate <= GETDATE()
GROUP BY [OppId]
) as referenceGroup
ON t.[OppId] = referenceGroup.[OppId] AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]
) as BigGroup
GO
Join snapshoted views:
SELECT
dm1.[Id] as IdDM1
,dm1.[OppId]
,dm1.[LastModifiedDate] as LastModifiedDateDM1
,dm1.[Stage]
,dm2.[Id] as IdDM2
,dm2.[LastModifiedDate] as LastModifiedDateDM2
,dm2.[State]
FROM [dbo].[CurrentSnapshotDeleteMe1] as dm1
LEFT JOIN [dbo].[CurrentSnapshotDeleteMe2] as dm2 ON dm1.OppId = dm2.OppId
Create original tables:
CREATE TABLE DeleteMe1
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[Stage] VARCHAR(250),
)
INSERT INTO DeleteMe1
VALUES ('1', '1', '2019-04-01', 'A'),
('2', '1', '2019-05-01', 'E'),
('3', '1', '2019-06-01', 'B'),
('4', '2', '2019-07-01', 'A'),
('5', '2', '2019-08-01', 'B'),
('6', '3', '2019-09-01', 'C'),
('7', '4', '2019-10-01', 'B'),
('8', '4', '2019-11-01', 'C')
CREATE TABLE DeleteMe2
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[State] VARCHAR(250),
)
INSERT INTO DeleteMe2
VALUES (' 1', '1', '2018-07-01', 'California'),
(' 2', '1', '2017-11-01', 'Delaware'),
(' 3', '4', '2017-12-01', 'California'),
(' 4', '2', '2018-01-01', 'Alaska'),
(' 5', '4', '2018-02-01', 'Delaware'),
(' 6', '2', '2018-09-01', 'Delaware'),
(' 7', '3', '2018-04-01', 'Alaska'),
(' 8', '1', '2018-05-01', 'Hawaii'),
(' 9', '4', '2018-06-01', 'California'),
('10', '1', '2018-07-01', 'Connecticut'),
('11', '2', '2018-08-01', 'Delaware'),
('12', '2', '2018-09-01', 'California')
I work with languages where I can assign intermediate outputs to a variable and then work with those variables to create a final output. I know SQL doesn't work this way as much.
Well, that's not true: SQL does work this way, or at least SQL Server does. You have temp tables and table variables.
Although you named your tables DeleteMe, from your statements it seems like it's the views you wish to treat as variables. So I'll focus on this.
Here's how to do it for your first view. It puts the results into a temporary table called #snapshot1:
-- Optional: In case you re-run before you close your connection
if object_id('tempdb..#snapshot1') is not null
drop table #snapshot1;
select
distinct t.Id, t.OppId, t.LastModifiedDate, t.Stage
into #snapshot1
from dbo.DeleteMe1 as t
inner join (
select OppId, max(LastModifiedDate) AS MaxLastModifiedDate
from dbo.DeleteMe1
where LastModifiedDate <= getdate()
group by OppId
) referenceGroup
on t.OppId = referenceGroup.OppId
and t.LastModifiedDate = referenceGroup.MaxLastModifiedDate;
The # prefix tells SQL Server that the table is temporary. #snapshot1 will not survive when your connection closes.
Alternatively, you can create a table variable.
declare @snapshot1 table (
id int,
oppId int,
lastModifiedDate date,
stage varchar(250)
);
insert @snapshot1 (id, oppId, lastModifiedDate, stage)
select distinct ...
This table is discarded as soon as the batch has finished executing.
From there, you can join on your temp tables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM #snapshot1 dm1
LEFT JOIN #snapshot2 dm2 ON dm1.OppId = dm2.OppId
Or your table variables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM @snapshot1 dm1
LEFT JOIN @snapshot2 dm2 ON dm1.OppId = dm2.OppId
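If the goal is literally a single statement, as the question asks, common table expressions (CTEs) are another option. The following is a sketch rather than a tuned implementation: it reuses the logic of the two views unchanged, and the names snap1, snap2 and latest are made up for this example.

```sql
-- Each CTE plays the role of one of the "snapshot" views, but exists only
-- for the duration of this statement. In T-SQL the statement that precedes
-- WITH must be terminated by a semicolon.
WITH snap1 AS (
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.Stage
    FROM dbo.DeleteMe1 AS t
    INNER JOIN (
        SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
        FROM dbo.DeleteMe1
        WHERE LastModifiedDate <= GETDATE()
        GROUP BY OppId
    ) AS latest
        ON  t.OppId = latest.OppId
        AND t.LastModifiedDate = latest.MaxLastModifiedDate
),
snap2 AS (
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.State
    FROM dbo.DeleteMe2 AS t
    INNER JOIN (
        SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
        FROM dbo.DeleteMe2
        WHERE LastModifiedDate <= GETDATE()
        GROUP BY OppId
    ) AS latest
        ON  t.OppId = latest.OppId
        AND t.LastModifiedDate = latest.MaxLastModifiedDate
)
SELECT dm1.Id               AS IdDM1,
       dm1.OppId,
       dm1.LastModifiedDate AS LastModifiedDateDM1,
       dm1.Stage,
       dm2.Id               AS IdDM2,
       dm2.LastModifiedDate AS LastModifiedDateDM2,
       dm2.State
FROM snap1 AS dm1
LEFT JOIN snap2 AS dm2 ON dm1.OppId = dm2.OppId;
```

Unlike temp tables or table variables, the CTEs leave nothing behind in the database or the session, which is the trade-off to weigh if the intermediate results need to be reused by later statements.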

Postgresql: An alternative to subqueries to make the query more efficient?

So I have the following table with the schema:
CREATE TABLE stages (
id serial PRIMARY KEY,
cid VARCHAR(6) NOT NULL,
stage varchar(30) NOT null,
status varchar(30) not null
);
with the following test data:
INSERT INTO stages (id, cid, stage, status) VALUES
('1', '1', 'first stage', 'accepted'),
('2', '1', 'second stage', 'current'),
('3', '2', 'first stage', 'accepted'),
('4', '3', 'first stage', 'accepted'),
('5', '3', 'second stage', 'accepted'),
('6', '3', 'third stage', 'current')
;
Now the use case is that we want to query this table one stage at a time. For example, we query for the 'first stage' and fetch all the cids that do not exist in the subsequent stage, the 'second stage':
Result Set:
cid | status
2 | 'accepted'
While running the query for the 'second stage', we will try to fetch all those cids that do not exist in the 'third stage' and so on.
Result Set:
cid | status
1 | 'current'
Currently, we do this by making an exists subquery in the where clause which is not very performant.
Is there a better alternative to our current approach, or should we focus on optimizing it? Also, what further optimizations would make the EXISTS subquery more performant?
Thanks!
You can use lead():
select s.*
from (select s.*,
lead(stage) over (partition by cid order by id) as next_stage
from stages s
) s
where stage = 'first stage' and next_stage is null;
CREATE TABLE stages (
id serial PRIMARY KEY
, cid VARCHAR(6) NOT NULL
, stage varchar(30) NOT null
, status varchar(30) not null
, UNIQUE ( cid, stage)
);
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
ANALYZE stages;
-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
SELECT * FROM stages x
WHERE x.cid = q.cid AND x.stage > q.stage
);
-- Some people don't like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
SELECT id
, row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
FROM stages x
)x ON x.id = q.id AND x.rn = 1;
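For the stage-by-stage use case described in the question, a NOT EXISTS anti-join can be sketched as follows. This is an assumption-laden sketch, not a benchmark: it assumes the question's original schema (without the UNIQUE constraint shown above, which would already provide an equivalent index), and the index name stages_cid_stage_idx is made up for this example.

```sql
-- An index on (cid, stage) lets the subquery probe for the next stage
-- directly instead of scanning the table. IF NOT EXISTS needs PostgreSQL 9.5+.
CREATE INDEX IF NOT EXISTS stages_cid_stage_idx ON stages (cid, stage);

-- cids that reached 'first stage' but have no 'second stage' row.
SELECT s.cid, s.status
FROM stages AS s
WHERE s.stage = 'first stage'
  AND NOT EXISTS (
      SELECT 1
      FROM stages AS n
      WHERE n.cid = s.cid
        AND n.stage = 'second stage'
  );
```

On the test data this should return cid 2 with status 'accepted', matching the expected result set in the question. Running EXPLAIN ANALYZE on the real table is the way to confirm whether the index is actually used.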

SQL relationships for unique sets of rows

I am trying to set up a relationship between a couple of tables where a unique set of rows in one table relate to a row in another table.
I have come up with a scenario to reflect what I am trying to accomplish.
In this scenario, we are trying to determine the role(s) that a new hire should be given, based on the set of skills that they possess. An employee can be given multiple roles. For example, a software engineer with management experience is given both the Software Engineer and the Tech Lead roles. However, the roles given must line up exactly with a given skill set. If a new hire comes in with every skill we are looking for, we give them the CTO role. The CTO possesses all of the skills for both the Software Engineer and Tech Lead roles, but they are not given those roles.
I believe my issue boils down to the skill_set relationship, where I am trying to tie a unique set of rows from the skill table to a specific skill_set. Any given skill can be in many skill_sets, but when querying for a skill_set, I only want to return the skill_set that contains exactly the given skills. Currently I don't know of a good way to query for that specific skill_set.
We don't need to worry about trying to find roles for lists of skills that aren't valid skill_sets. Those can return no role.
Note: This schema is not set in stone. Changing it is definitely an option, so if I have modeled this incorrectly, we can fix that.
CREATE TABLE IF NOT EXISTS `skill` (
`id` int(6) unsigned NOT NULL,
`name` varchar(16) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `skill_set` (
`id` int(6) unsigned NOT NULL,
`skill_id` int(6) unsigned NOT NULL,
PRIMARY KEY (`id`, `skill_id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `default_role` (
`skill_set_id` int(6) unsigned NOT NULL,
`role_id` int(6) unsigned NOT NULL,
PRIMARY KEY (`skill_set_id`, `role_id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `role` (
`id` int(6) unsigned NOT NULL,
`name` varchar(32) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `skill` (`id`, `name`) VALUES
('1', 'python'),
('2', 'javascript'),
('3', 'ec2'),
('4', 'docker'),
('5', 'management');
INSERT INTO `skill_set` (`id`, `skill_id`) VALUES
('1', '1'),
('2', '2'),
('3', '1'),
('3', '2'),
('4', '3'),
('5', '4'),
('6', '1'),
('6', '2'),
('6', '5'),
('7', '3'),
('7', '4'),
('7', '5'),
('8', '1'),
('8', '2'),
('8', '3'),
('8', '4'),
('8', '5');
INSERT INTO `default_role` (`skill_set_id`, `role_id`) VALUES
('1', '1'),
('2', '1'),
('3', '2'),
('4', '3'),
('5', '3'),
('6', '2'),
('6', '4'),
('7', '3'),
('7', '4'),
('8', '5');
INSERT INTO `role` (`id`, `name`) VALUES
('1', 'Junior Software Engineer'),
('2', 'Software Engineer'),
('3', 'DevOps Engineer'),
('4', 'Tech Lead'),
('5', 'CTO');
A SQL fiddle is also available: http://sqlfiddle.com/#!9/86bcfe0
Some example outputs:
Given the skills: ['python']
Return the default role: Junior Software Engineer
Given the skills: ['python', 'javascript']
Return the default role: Software Engineer
Given the skills: ['ec2']
Return the default role: DevOps Engineer
Given the skills: ['python', 'javascript', 'management']
Return the default roles: Software Engineer, Tech Lead
Given the skills: ['python', 'javascript', 'ec2', 'docker', 'management']
Return the default role: CTO
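One common way to express "the skill_set that contains exactly these skills" is relational division with two counts. The sketch below is only an illustration against the schema above. It assumes skill names are unique and that the input list contains no duplicates; the literal 3 must equal the number of skills in the IN list, and the alias matched is made up for this example.

```sql
-- Find the skill_set whose members are exactly the listed skills,
-- then look up its default role(s).
SELECT r.name
FROM (
    SELECT ss.id
    FROM skill_set AS ss
    LEFT JOIN skill AS s
           ON s.id = ss.skill_id
          AND s.name IN ('python', 'javascript', 'management')
    GROUP BY ss.id
    HAVING COUNT(*)    = 3   -- the set has as many skills as the list
       AND COUNT(s.id) = 3   -- and every one of them is in the list
) AS matched
JOIN default_role AS dr ON dr.skill_set_id = matched.id
JOIN role         AS r  ON r.id = dr.role_id
ORDER BY r.id;
```

With the sample data, only skill_set 6 passes both counts, so this should return Software Engineer and Tech Lead. A list that matches no skill_set returns no rows, which fits the requirement that invalid skill lists yield no role.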

Display city with second highest number of stores in sql

Display city with second highest number of stores.
Here is the table toy_store, with its data shown in an image (TOY_STORE).
Remember, this is not a service for completing your assignments; you should try it yourself first and show some effort. For now, here is an answer to your question; there are a number of ways to do this.
-- Take the two cities with the most stores, then keep the lower of the two.
SELECT TOP(1) *
FROM
(
SELECT TOP(2) CITY, COUNT(CITY) AS StoreCount
FROM TableName
GROUP BY CITY
ORDER BY COUNT(CITY) DESC
) M
ORDER BY StoreCount ASC
The question, as @jaydip-jadhav and I have mentioned, lacks details. It's also very old at this point, but since it came back up via your comments I decided to take another look.
However, the above should work with minimal changes depending on what you're using (again, more details would help, such as which DBMS you're using and what version).
That said you could also google what you're after and find a variety of solutions.
An Example solution
But here's 2 working basic examples you can view, run and use with this SQL Fiddle.
If the fiddle stops working here's the details you could use to recreate on your own somewhere.
This is using MySQL 5.6 (old, but so's this question).
Schema based on your image (next time, type it out):
CREATE TABLE IF NOT EXISTS `toy_store` (
`toy_store_id` int(6) NOT NULL,
`toy_store_name` varchar(200) NOT NULL,
`city` varchar(200) NOT NULL,
`phone_number` varchar(200) NOT NULL,
`store_opening_time` varchar(200) NOT NULL,
`store_closing_time` varchar(200) NOT NULL,
PRIMARY KEY (`toy_store_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `toy_store` (`toy_store_id`, `toy_store_name`, `city`, `phone_number`, `store_opening_time`, `store_closing_time`) VALUES
('1', 'kids cave', 'Delhi', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('2', 'kids corner', 'Mumbai', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('3', 'play and grow', 'Mumbai', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('4', 'puzzles and more', 'Delhi', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('5', 'uncle same toys den', 'Delhi', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('6', 'mickey toys', 'Delhi', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05'),
('7', 'mickey toys', 'Somewhere Else', '9912312312', '2014-04-01 09:10:12', '2014-04-01 21:42:05');
2 basic queries that work against the above schema:
SELECT * FROM (
SELECT city, count(city) as count_city
FROM toy_store
GROUP BY city
ORDER BY COUNT(city) desc
LIMIT 2
) as top2
ORDER BY top2.count_city asc
LIMIT 1;
-- RESULT
-- city count_city
-- Mumbai 2
-- DESC so that offset 1 is the second-highest count
SELECT city, count(city) as count_city
FROM toy_store
GROUP BY city
ORDER BY COUNT(city) DESC
LIMIT 1,1;
-- RESULT
-- city count_city
-- Mumbai 2
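Both LIMIT-based queries pick one row by position, so they can give arbitrary results when cities tie on store count. If window functions are available (an assumption: that means MySQL 8.0+, not the 5.6 used above), a DENSE_RANK sketch handles ties explicitly. The aliases store_count, rnk and ranked are made up for this example.

```sql
-- Rank cities by number of stores; cities with equal counts share a rank,
-- so rnk = 2 returns every city tied for the second-highest count.
SELECT city, store_count
FROM (
    SELECT city,
           COUNT(*) AS store_count,
           DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM toy_store
    GROUP BY city
) AS ranked
WHERE rnk = 2;
```

On the sample rows this should again return Mumbai with a count of 2.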