SELECT values where their foreign keys are duplicate - sql

Let's say that we have clients and providers. A client can have multiple providers (internet, phone, TV, etc.) and I would like to find the names of clients who have multiple providers.
create table clients
(
client_id char(8) not null,
client_name varchar(80) not null,
contract char(1) not null,
primary key (client_id)
);
create table client_provider
(
provider_id char(11) not null,
client_id char(8) not null,
primary key (provider_id, client_id),
foreign key (provider_id) references providers ON DELETE CASCADE,
foreign key (client_id) references clients ON DELETE CASCADE
);
Therefore, even without knowing anything about providers, we can know clients with multiple providers by the following relational algebra (just started learning, please correct me if I am wrong):
π client_name (
[
σ client_provider2.provider_id ≠ client_provider.provider_id ∧ client_provider2.client_id = client_provider.client_id (ρ client_provider2 (client_provider) ⨯ client_provider)
]
⨝ clients)
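To check the algebra by hand, here is a minimal sketch that evaluates the same expression over plain Python sets. The sample rows (`c1`, `Alice`, and so on) are made up for illustration, and the tuples only carry the columns the expression needs.

```python
# Hypothetical sample data; IDs and names are assumptions for illustration.
clients = {("c1", "Alice"), ("c2", "Bob"), ("c3", "Carol")}   # (client_id, client_name)
client_provider = {("p1", "c1"), ("p2", "c1"), ("p1", "c2")}  # (provider_id, client_id)

# rho + cross product + sigma: pair every row with every row of the renamed
# copy, keeping pairs with the same client but a different provider.
selected = {
    (cp, cp2)
    for cp in client_provider
    for cp2 in client_provider
    if cp2[0] != cp[0] and cp2[1] == cp[1]
}

# Join with clients on client_id, then project client_name.
multi_provider_names = {
    name
    for (cp, _cp2) in selected
    for (client_id, name) in clients
    if client_id == cp[1]
}

print(multi_provider_names)
```

Because the projection yields a set, a client who appears in several matching pairs is still listed once.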
What I have tried so far (it returns "not a GROUP BY expression" at line 1):
SQL> select c.client_name
2 from clients c
3 inner join client_provider cp on c.client_id = cp.client_id
4 group by cp.client_id
5 having count(*) > 1;

When using GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. To resolve the issue, do the following:
Add cp.client_id to the SELECT clause
Add c.client_name to the GROUP BY clause
SELECT
cp.client_id,
c.client_name
FROM clients c
INNER JOIN client_provider cp
ON c.client_id = cp.client_id
GROUP BY
cp.client_id,
c.client_name
HAVING
COUNT(1) > 1
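
If you want to try the corrected query without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The schema is trimmed to the columns the query touches and the sample rows are made up. Note that SQLite is more lenient than Oracle about bare columns in grouped queries, so this only shows the result of the fixed query, not the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (
        client_id   CHAR(8) NOT NULL PRIMARY KEY,
        client_name VARCHAR(80) NOT NULL
    );
    CREATE TABLE client_provider (
        provider_id CHAR(11) NOT NULL,
        client_id   CHAR(8) NOT NULL REFERENCES clients,
        PRIMARY KEY (provider_id, client_id)
    );
    -- Hypothetical sample rows: c1 has two providers, c2 has one.
    INSERT INTO clients VALUES ('c1', 'Alice'), ('c2', 'Bob');
    INSERT INTO client_provider VALUES
        ('internet', 'c1'), ('phone', 'c1'), ('internet', 'c2');
""")

# Every selected column is also listed in GROUP BY.
rows = conn.execute("""
    SELECT cp.client_id, c.client_name
    FROM clients c
    INNER JOIN client_provider cp ON c.client_id = cp.client_id
    GROUP BY cp.client_id, c.client_name
    HAVING COUNT(*) > 1
""").fetchall()

print(rows)
```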

All non-aggregated columns in the SELECT list must also appear in the GROUP BY clause.
As you commented that you want to display only client_name but not client_id (while it has to be in the GROUP BY clause), use the current query as the source for the final result:
select client_name
from (-- current query begins here
select cp.client_id,
c.client_name
from clients c join client_provider cp on c.client_id = cp.client_id
group by cp.client_id,
c.client_name
having count(*) > 1
-- current query ends here
);
Alternatively, you could use a (slightly modified) version of the current query as a subquery:
select cl.client_name
from clients cl
where cl.client_id in (select cp.client_id
from client_provider cp
group by cp.client_id
having count(*) > 1
);
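
One reason to keep client_id in the grouping (or to use the IN subquery) is that names are not guaranteed to be unique. Here is a small sketch with sqlite3 and made-up rows where two different clients share a name: grouping by name alone merges them and reports a false positive, while the subquery on client_id does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id TEXT PRIMARY KEY, client_name TEXT NOT NULL);
    CREATE TABLE client_provider (
        provider_id TEXT NOT NULL,
        client_id   TEXT NOT NULL REFERENCES clients,
        PRIMARY KEY (provider_id, client_id)
    );
    -- Hypothetical rows: two different clients are both named 'Alice',
    -- and each of them has exactly one provider. Bob has two.
    INSERT INTO clients VALUES ('c1', 'Alice'), ('c2', 'Alice'), ('c3', 'Bob');
    INSERT INTO client_provider VALUES
        ('internet', 'c1'), ('phone', 'c2'), ('internet', 'c3'), ('tv', 'c3');
""")

# Grouping by name only counts both Alices together.
by_name_only = conn.execute("""
    SELECT c.client_name
    FROM clients c JOIN client_provider cp ON c.client_id = cp.client_id
    GROUP BY c.client_name
    HAVING COUNT(*) > 1
    ORDER BY c.client_name
""").fetchall()

# Counting per client_id in a subquery keeps the clients apart.
by_id_subquery = conn.execute("""
    SELECT cl.client_name
    FROM clients cl
    WHERE cl.client_id IN (SELECT cp.client_id
                           FROM client_provider cp
                           GROUP BY cp.client_id
                           HAVING COUNT(*) > 1)
    ORDER BY cl.client_name
""").fetchall()

print(by_name_only, by_id_subquery)
```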

Related

How to relate tables SQL

I have three tables and I want to relate them, but I don't know what I'm doing wrong. If my approach is bad, can you correct me as well?
I have a clients table with the ID_c column as its primary key,
create table clients
(
id_c INTEGER not null,
name VARCHAR2(20),
age INTEGER,
address VARCHAR2(20),
Primary key (id_c)
);
I also have a products table with the ID_p column as its primary key.
create table PRODUCTS
(
id_p NUMBER not null,
name_product VARCHAR2(30),
price NUMBER,
duration NUMBER,
primary key (id_p)
);
Now I create a third table:
create table TRANSACTIONS
(
id_t NUMBER not null,
id_c NUMBER not null,
id_p NUMBER not null,
primary key (ID_t),
foreign key (ID_c) references CLIENTS (ID_c),
foreign key (ID_p) references PRODUCTS (ID_p)
);
Now I want to see all the records that are connected, so I'm trying to use this:
select * from transactions join clients using (id_c) and join products using (id_p);
but the only thing that works is
select * from transactions join clients using (id_c);
Is this a relational database, or am I making something too simple and too primitive? How can I connect everything?
Try this:
select *
from transactions
inner join clients on transactions.id_c = clients.id_c
inner join products on transactions.id_p = products.id_p;
Are you just trying to join?
select * from transactions a
join clients b on a.id_c = b.id_c
join products c on a.id_p = c.id_p
If you want to join 3 tables, just write:
SELECT * FROM TRANSACTIONS t JOIN CLIENTS c on t.id_c = c.id_c JOIN PRODUCTS p on t.id_p = p.id_p
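The original attempt most likely failed because of the `and` between the two joins: joins are chained by simply writing one after the other. As a minimal sketch (run against SQLite through Python's sqlite3 module rather than Oracle, with made-up rows and simplified column types), the USING form from the question works once the `and` is dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients  (id_c INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id_p INTEGER PRIMARY KEY, name_product TEXT, price NUMERIC);
    CREATE TABLE transactions (
        id_t INTEGER PRIMARY KEY,
        id_c INTEGER NOT NULL REFERENCES clients (id_c),
        id_p INTEGER NOT NULL REFERENCES products (id_p)
    );
    -- Hypothetical sample rows.
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO products VALUES (10, 'Basic', 5), (20, 'Premium', 9);
    INSERT INTO transactions VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10);
""")

# Two joins chained back to back, with no AND between them.
rows = conn.execute("""
    SELECT id_t, name, name_product
    FROM transactions
    JOIN clients  USING (id_c)
    JOIN products USING (id_p)
    ORDER BY id_t
""").fetchall()

for row in rows:
    print(row)
```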

How to select from multiple tables in a group by query?

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id),
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
);
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id),
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
);
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
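Here is a small runnable sketch of the correlated-subquery approach, using Python's sqlite3 module instead of PostgreSQL. The sample rows are made up, dates are stored as ISO text so that MAX compares them correctly, and the `$1` placeholder becomes `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agreement (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, org_id INTEGER NOT NULL
    );
    CREATE TABLE version (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        publish_date TEXT NOT NULL,
        agreement_id INTEGER NOT NULL REFERENCES agreement (id)
    );
    -- Hypothetical sample rows: org 7 owns two agreements, org 8 owns one.
    INSERT INTO agreement VALUES (1, 'NDA', 7), (2, 'Terms', 7), (3, 'Other org', 8);
    INSERT INTO version VALUES
        (1, 'nda v1',   '2020-01-01', 1),
        (2, 'nda v2',   '2021-06-01', 1),
        (3, 'terms v1', '2020-03-15', 2),
        (4, 'other v1', '2022-01-01', 3);
""")

# For each joined row, keep it only if its publish_date is the newest
# one among the versions of the same agreement.
rows = conn.execute("""
    SELECT a.id, a.name, v.content, v.publish_date
    FROM agreement a
    JOIN version v ON a.id = v.agreement_id
    WHERE a.org_id = ?
      AND v.publish_date = (SELECT MAX(v2.publish_date)
                            FROM version v2
                            WHERE v2.agreement_id = v.agreement_id)
    ORDER BY a.id
""", (7,)).fetchall()

for row in rows:
    print(row)
```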
PostgreSQL has window functions.
Window functions let you compute a value for each row over a set of related rows (a partition), sorted by a column or set of columns. The rank function returns each row's position within its partition for that sort order. If you keep only the rows where the rank is 1, you get the highest-sorted row of each partition. Because the query below sorts by v.id, which is unique, that is exactly one row per agreement.
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
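One thing to watch if you sort by publish_date instead of id: rank() gives tied rows the same position, so more than one row per agreement can come back with pos = 1, whereas row_number() always numbers them 1, 2, 3. A minimal sketch with sqlite3 (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added; the rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version (
        id INTEGER PRIMARY KEY, agreement_id INTEGER, publish_date TEXT
    );
    -- Hypothetical rows: versions 2 and 3 share the same publish_date (a tie).
    INSERT INTO version VALUES
        (1, 1, '2020-01-01'), (2, 1, '2021-06-01'), (3, 1, '2021-06-01');
""")

def latest(func):
    # func is only ever one of the two fixed names used below, never user input.
    return conn.execute(f"""
        SELECT id FROM (
            SELECT id, {func}() OVER (PARTITION BY agreement_id
                                      ORDER BY publish_date DESC) AS pos
            FROM version
        ) AS u
        WHERE pos = 1
        ORDER BY id
    """).fetchall()

rank_rows = latest("rank")              # both tied rows get position 1
row_number_rows = latest("row_number")  # exactly one row, picked arbitrarily among the ties

print(rank_rows, row_number_rows)
```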
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

Writing a query to combine results from multiple tables with all possible combinations

I have this database schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL UNIQUE
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
uid INTEGER REFERENCES users (id) NOT NULL,
pid INTEGER REFERENCES products (id) NOT NULL,
quantity INTEGER NOT NULL,
price FLOAT NOT NULL CHECK (price >= 0)
);
I am trying to write a query that will give me all combinations of users and products, as well as the total amount spent by the user on that product. Specifically, if I have 5 products and 5 users, there should be 25 rows in the table. Right now I have a query that almost gets the job done, however, if the user has never purchased that product then there is no row printed at all.
Here's what I've written so far:
SELECT u.name as username, p.name as productname, SUM(o.quantity * o.price) as totalPrice
FROM users u, orders o, products p
WHERE u.id = o.uid
AND p.id = o.pid
GROUP BY u.name, p.name
ORDER BY u.name, p.name
I figure that this requires some sort of join, but my SQL knowledge is limited and I am not sure what would be the best way to go about doing this. I think if somebody can help me figure this out then I will have a much better understanding.
You can do this using cross join and left join:
select u.name as username, p.name as productname,
sum(o.quantity * o.price) as totalPrice
from users u cross join
products p left join
orders o
on o.uid = u.id and o.pid = p.id
group by u.name, p.name;
The cross join generates all the rows. The left join brings in the matching rows. A simple rule when using SQL is: Never use commas in the FROM clause. Always use explicit JOIN syntax.
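Note that for a user/product pair with no orders, SUM returns NULL rather than 0. If you would rather see 0, wrapping the sum in COALESCE does it. A minimal sketch with sqlite3 and made-up rows (SERIAL and char(50) are replaced by SQLite equivalents):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        uid INTEGER NOT NULL REFERENCES users (id),
        pid INTEGER NOT NULL REFERENCES products (id),
        quantity INTEGER NOT NULL,
        price REAL NOT NULL CHECK (price >= 0)
    );
    -- Hypothetical sample rows: 2 users x 2 products, 3 orders.
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO products VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO orders (uid, pid, quantity, price) VALUES
        (1, 1, 2, 1.5), (1, 1, 1, 1.5), (2, 2, 1, 4.0);
""")

# CROSS JOIN builds every user/product pair; LEFT JOIN attaches orders
# where they exist; COALESCE turns the NULL sum of empty pairs into 0.
rows = conn.execute("""
    SELECT u.name AS username, p.name AS productname,
           COALESCE(SUM(o.quantity * o.price), 0) AS totalPrice
    FROM users u
    CROSS JOIN products p
    LEFT JOIN orders o ON o.uid = u.id AND o.pid = p.id
    GROUP BY u.name, p.name
    ORDER BY u.name, p.name
""").fetchall()

for row in rows:
    print(row)
```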

SQL COUNT EXCLUDE

Hello, here's my question:
Retrieve the total number of bookings for each type of the services that
have at least three bookings (excluding those cancelled).
i.e. where status is 'open' or 'done'
I'm not too sure how to exclude values, or how to count values in a column.
SELECT Service.type, Service.description,
COUNT (DISTINCT status)
FROM Booking
LEFT JOIN Service
ON Booking.service = Service.type
WHERE status >= 3
EXCLUDE 'cancelled'
GROUP BY status DESC;
CREATE TABLE Booking(
car CHAR(8) ,
on_date DATE NOT NULL,
at_time TIME NOT NULL,
technician CHAR(6) NOT NULL,
service VARCHAR(15) NOT NULL,
status VARCHAR(9)CHECK(status IN ('open','done', 'cancelled')) DEFAULT 'open' NOT NULL,
note VARCHAR(200) ,
rating INTEGER CHECK(rating IN('0','1','2','3','4','5')) DEFAULT '0' NOT NULL,
feedback VARCHAR(2048) ,
PRIMARY KEY (car, on_date, at_time),
FOREIGN KEY (car) REFERENCES Car (cid)
ON DELETE CASCADE
ON UPDATE CASCADE,
FOREIGN KEY (technician) REFERENCES Technician (tech_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
FOREIGN KEY (service) REFERENCES Service (type)
ON DELETE CASCADE
ON UPDATE CASCADE
);
CREATE TABLE Service(
type VARCHAR(15) PRIMARY KEY,
description VARCHAR(2048)
);
It will be faster to aggregate first and join later. Fewer join operations. Hence the subquery:
SELECT s.type, s.description, b.ct
FROM (
SELECT service, count(*) AS ct
FROM booking
WHERE status <> 'cancelled'
GROUP BY 1
HAVING count(*) > 2
) b
JOIN service s ON s.type = b.service;
Since you enforce referential integrity with a foreign key constraint and service is defined NOT NULL, you can as well use [INNER] JOIN instead of a LEFT [OUTER] JOIN in this query.
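To see the aggregate-first query in action, here is a minimal sketch using sqlite3 with made-up rows. The booking table is cut down to the two columns the query needs, and the surrogate `id` key is a simplification of the original composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service (type TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY,  -- simplification of (car, on_date, at_time)
        service TEXT NOT NULL REFERENCES service (type),
        status TEXT NOT NULL DEFAULT 'open'
               CHECK (status IN ('open', 'done', 'cancelled'))
    );
    -- Hypothetical rows: 'oil' has 3 live bookings, 'tyres' has 2.
    INSERT INTO service VALUES ('oil', 'Oil change'), ('tyres', 'Tyre swap');
    INSERT INTO booking (service, status) VALUES
        ('oil', 'open'), ('oil', 'done'), ('oil', 'done'), ('oil', 'cancelled'),
        ('tyres', 'open'), ('tyres', 'done'), ('tyres', 'cancelled');
""")

# WHERE drops cancelled rows before grouping; HAVING then drops
# groups with fewer than three remaining bookings; the join runs last.
rows = conn.execute("""
    SELECT s.type, s.description, b.ct
    FROM (
        SELECT service, COUNT(*) AS ct
        FROM booking
        WHERE status <> 'cancelled'
        GROUP BY service
        HAVING COUNT(*) > 2
    ) b
    JOIN service s ON s.type = b.service
""").fetchall()

print(rows)
```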
It would be cleaner and more efficient to use an enum data type instead of VARCHAR(9) for the status column. Then you wouldn't need the CHECK constraint either.
For best performance of this particular query, you could have a partial covering index (which would also profit from the enum data type):
CREATE INDEX foo ON booking (service)
WHERE status <> 'cancelled';
Every index carries a maintenance cost, so only keep this tailored index if it actually makes your query faster (test with EXPLAIN ANALYZE) and the query is run often or is important.
select s.type, s.description, count(*)
from
booking b
inner join
service s on b.service = s.type
where status != 'cancelled'
group by 1, 2
having count(*) >= 3
order by 3 desc;
How about:
SELECT Service.type, Service.description, COUNT (status)
FROM Booking
LEFT JOIN Service ON Booking.service = Service.type
WHERE status != 'cancelled'
GROUP BY Service.type, Service.description
HAVING COUNT(status) >= 3;
The service attributes have to be grouped as well.
Filtering by aggregate, here COUNT(status), is what the HAVING clause does.

SQL Anomaly Using 'USING' Clause with Nested Queries?

I have a normalized database containing 3 tables whose DDL is this:
CREATE CACHED TABLE Clients (
cli_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
defmrn_id BIGINT,
lastName VARCHAR(48) DEFAULT '' NOT NULL,
midName VARCHAR(24) DEFAULT '' NOT NULL,
firstName VARCHAR(24) DEFAULT '' NOT NULL,
doB INTEGER DEFAULT 0 NOT NULL,
gender VARCHAR(1) NOT NULL);
CREATE TABLE Client_MRNs (
mrn_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
cli_id INTEGER REFERENCES Clients ( cli_id ),
inst_id INTEGER REFERENCES Institutions ( inst_id ),
mrn VARCHAR(32) DEFAULT '' NOT NULL,
CONSTRAINT climrn01 UNIQUE (mrn, inst_id));
CREATE TABLE Institutions (
inst_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
loc_id INTEGER REFERENCES Locales (loc_id ),
itag VARCHAR(6) UNIQUE NOT NULL,
iname VARCHAR(80) DEFAULT '' NOT NULL);
The first table contains a foreign key column, defmrn_id, that is a reference to a "default identifier code" that is stored in the second table (which is a list of all identifier codes). A record in the first table may have many identifiers, but only one default identifier. So yeah, I have created a circular reference.
The third table is just normalized data from the second table.
I wanted a query that would find a CLIENT record based on matching a supplied identifier code to any of the identifier codes in CLIENT_MRNs that may belong to that CLIENT record.
My strategy was to first identify those records that matched in the second table (CLIENT_MRN) and then use that intermediate result to join to records in the CLIENT table that matched other user-supplied searching criteria. I also need to denormalize the identifier reference defmrn_id in the 1st table. Here is what I came up with...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 ON m2.cli_id = c.cli_id
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
The above works, but I spent some time trying to understand why the following was NOT working...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
It seems to me that the second SQL should work, but it fails on the USING clause every time. I am executing these queries against a database managed by HSQLDB 2.2.9 as the RDBMS. Is this a parsing issue in HSQLDB or is this a known limitation of the USING clause with nested queries?
You can always try with HSQLDB 2.3.0 (a release candidate).
The way you report the incomplete SQL does not allow proper checking. But there is an obvious mistake in the query. If you have:
SELECT INST_ID FROM CLIENT_MRNS AS R INNER JOIN INSTITUTIONS AS I USING (INST_ID)
INST_ID can be used in the SELECT column list only without a table qualifier. The reason is it is no longer considered a column of either table. The same is true with common columns if you use NATURAL JOIN.
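The merging of the common column can be seen in a quick sketch with Python's sqlite3 module. SQLite is a different engine from HSQLDB and may be more permissive about qualified references, so this only illustrates that USING produces a single shared column while ON keeps both. The tables are cut down and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE institutions (inst_id INTEGER PRIMARY KEY, itag TEXT);
    CREATE TABLE client_mrns (mrn_id INTEGER PRIMARY KEY, inst_id INTEGER, mrn TEXT);
    -- Hypothetical sample rows.
    INSERT INTO institutions VALUES (100, 'GEN');
    INSERT INTO client_mrns VALUES (1, 100, 'A-1');
""")

def column_names(sql):
    # cursor.description holds one tuple per result column; item 0 is its name.
    return [col[0] for col in conn.execute(sql).description]

using_cols = column_names(
    "SELECT * FROM client_mrns AS r JOIN institutions AS i USING (inst_id)"
)
on_cols = column_names(
    "SELECT * FROM client_mrns AS r JOIN institutions AS i ON r.inst_id = i.inst_id"
)

# With USING, the unqualified name refers to the single merged column.
unqualified = conn.execute(
    "SELECT inst_id FROM client_mrns AS r JOIN institutions AS i USING (inst_id)"
).fetchall()

print(using_cols, on_cols, unqualified)
```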
This query is accepted by version 2.3.0:
SELECT c.*, r.mrn, inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = 2
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )