How to select from multiple tables in a group by query? - sql

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id)
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id)
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id

You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.

Postgresql has Window Functions.
Window functions allow you to operate a sort over a specific column or set of columns. the rank function returns the row's place in the results for the sort. If you filter to just where the rank is 1 then you will always get just one row and it will be the highest sorted for the partition.
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1

SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

Related

Finding max count of product

I am trying to find max count of product. The result must only display the brands which have max number of products in it. Can anyone suggest a better way to display result.
Here are the table details:
create table Brand (
Id integer PRIMARY KEY,
Name VARCHAR(40) NOT NULL
)
create table Product (
ID integer PRIMARY KEY,
Name VARCHAR(40) NOT NULL,
Price integer NOT NULL,
BrandId integer NOT NULL,
CONSTRAINT BrandId FOREIGN KEY (BrandId)
REFERENCES Brand(Id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I have used this SQL query below but I am seeing an error below. The result must display the list of the brands that have max number of products as compared to other brands.
select count(p.id) as count, b.name AS brand_name from product p join brand b on b.Id = p.BrandId
where count(p.id) = (select max(count) from product)
group by b.name
ERROR: aggregate functions are not allowed in WHERE
LINE 2: where count(p.id) = (select max(count) from product)
^
SQL state: 42803
Character: 105
In Postgres 13+, you can use FETCH WITH TIES:
select count(*) as count, b.name AS brand_name
from product p join
brand b
on b.Id = p.BrandId
group by b.name
order by count(*) desc
fetch first 1 row with ties;
In older versions you can use window functions.

SELECT values where their foreign keys are duplicate

Let's say that we have clients and providers. A client can have multiple providers (like the internet, phone, TV etc) and I would like to find clients' names who have multiple providers.
create table clients
(
client_id char(8) not null,
client_name varchar(80) not null,
contract char(1) not null,
primary key (client_id)
)
create table client_provider
(
provider_id char(11) not null,
client_id char(8) not null,
primary key (provider_id, client_id),
foreign key (provder_id) references providers ON DELETE CASCADE,
foreign key (client_id) references clients ON DELETE CASCADE
);
Therefore, even without knowing anything about providers, we can know clients with multiple providers by the following relational algebra (just started learning, please correct me if I am wrong):
π client_name (
[
σ client_provider2.provider_id ≠ client_provider.provider_id ∧ client_provider2.client_id = client_provider.client_id (ρ client_provider2 (client_provider) ⨯ client_provider))
⨝ clients]
what I have tried so far (returning "not a GROUP BY expression" in line 1):
SQL> select c.client_name
2 from clients c
3 inner join client_provider cp on c.client_id = cp.client_id
4 group by cp.client_id
5 having count(*) > 1;
When using GROUP BY all columns used should either be in GROUP BY or in an aggregate function. To resolve the issue do the following:
Add cp.client_id in SELECT clause
Add c.client_name in GROUP BY clause
SELECT
cp.client_id,
c.client_name
FROM clients c
INNER JOIN client_provider cp
ON c.client_id = cp.client_id
GROUP BY
cp.client_id,
c.client_name
HAVING
COUNT(1) > 1
All non-aggregated columns must be in group by clause, now you know that.
As you commented that you want to display only client_name but not client_id (while it has to be in the group by clause), use current query as source for the final result:
select client_name
from (-- current query begins here
select cp.client_id,
c.client_name
from clients c join client_provider cp on c.client_id = cp.client_id
group by cp.client_id,
c.client_name
having count(*) > 1
-- current query ends here
);
Alternatively, you could do it by using (slightly modified) current query as a subquery:
select cl.client_name
from client cl
where cl.client_id in (select cp.client_id
from client_provider cp
group by cp.client_id
having count(*) > 1
);

Writing a query to combine results from multiple tables with all possible combinations

I have this database schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL UNIQUE
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL,
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
uid INTEGER REFERENCES users (id) NOT NULL,
pid INTEGER REFERENCES products (id) NOT NULL,
quantity INTEGER NOT NULL,
price FLOAT NOT NULL CHECK (price >= 0)
);
I am trying to write a query that will give me all combinations of users and products, as well as the total amount spent by the user on that product. Specifically, if I have 5 products and 5 users, there should be 25 rows in the table. Right now I have a query that almost gets the job done, however, if the user has never purchased that product then there is no row printed at all.
Here's what I've written so far:
SELECT u.name as username, p.name as productname, SUM(o.quantity * o.price) as totalPrice
FROM users u, orders o, products p
WHERE u.id = o.uid
AND p.id = o.pid
GROUP BY u.name, p.name
ORDER BY u.name, p.name
I figure that this requires some sort of join, but my SQL knowledge is limited and I am not sure what would be the best way to go about doing this. I think if somebody can help me figure this out then I will have a much better understanding.
You can do this using cross join and left join:
select u.name as username, p.name as productname,
sum(o.quantity * o.price) as totalPrice
from users u cross join
products p left join
orders o
on o.uid = u.id and o.pid = p.id
group by u.name, p.name;
The cross join generates all the rows. The left join brings in the matching rows. A simple rule when using SQL is: Never use commas in the FROM clause. Always use explicit JOIN syntax.

How to massive update?

I have three tables:
group:
id - primary key
name - varchar
profile:
id - primary key
name - varchar
surname - varchar
[...etc...]
profile_group:
profile_id - integer, foreign key to table profile
group_id - integer, foreign key to table group
Profiles may be in many groups. I have group named "Users" with id=1 and I want to assign all users to this group but only if there was no such entry for the table profiles.
How to do it?
If I understood you correctly, you want to add entries like (profile_id, 1) into profile_group table for all profiles, that were not in this table before. If so, try this:
INSERT INTO profile_group(profile_id, group_id)
SELECT id, 1 FROM profile p
LEFT JOIN profile_group pg on (p.id=pg.profile_id)
WHERE pg.group_id IS NULL;
What you want to do is use a left join to the profile group table and then exclude any matching records (this is done in the where clause of the below SQL statement).
This is faster than using not in (select xxx) since the query profiler seems to handle it better (in my experience)
insert into profile_group (profile_id, group_id)
select p.id, 1
from profiles p
left join profile_group pg on p.id = pg.profile_id
and pg.group_id = 1
where pg.profile_id is null

SQL Anomaly Using 'USING' Clause with Nested Queries?

I have a normalized database containing 3 tables whose DDL is this:
CREATE CACHED TABLE Clients (
cli_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
defmrn_id BIGINT,
lastName VARCHAR(48) DEFAULT '' NOT NULL,
midName VARCHAR(24) DEFAULT '' NOT NULL,
firstName VARCHAR(24) DEFAULT '' NOT NULL,
doB INTEGER DEFAULT 0 NOT NULL,
gender VARCHAR(1) NOT NULL);
CREATE TABLE Client_MRNs (
mrn_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
cli_id INTEGER REFERENCES Clients ( cli_id ),
inst_id INTEGER REFERENCES Institutions ( inst_id ),
mrn VARCHAR(32) DEFAULT '' NOT NULL,
CONSTRAINT climrn01 UNIQUE (mrn, inst_id));
CREATE TABLE Institutions (
inst_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
loc_id INTEGER REFERENCES Locales (loc_id ),
itag VARCHAR(6) UNIQUE NOT NULL,
iname VARCHAR(80) DEFAULT '' NOT NULL);
The first table contains a foreign key column, defmrn_id, that is a reference to a "default identifier code" that is stored in the second table (which is a list of all identifier codes). A record in the first table may have many identifiers, but only one default identifier. So yeah, I have created a circular reference.
The third table is just normalized data from the second table.
I wanted a query that would find a CLIENT record based on matching a supplied identifier code to any of the identifier codes in CLIENT_MRNs that may belong to that CLIENT record.
My strategy was to first identify those records that matched in the second table (CLIENT_MRN) and then use that intermediate result to join to records in the CLIENT table that matched other user-supplied searching criteria. I also need to denormalize the identifier reference defmrn_id in the 1st table. Here is what I came up with...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 ON m2.cli_id = c.cli_id
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
The above works, but I spent some time trying to understand why the following was NOT working...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
It seems to me that the second SQL should work, but it fails on the USING clause every time. I am executing these queries against a database managed by HSQLDB 2.2.9 as the RDBMS. Is this a parsing issue in HSQLDB or is this a known limitation of the USING clause with nested queries?
You can always try with HSQLDB 2.3.0 (a release candidate).
The way you report the incomplet SQL does not allow proper checking. But there is an ovbious mistake in the query. If you have:
SELECT INST_ID FROM CLIENTS_MRS AS R INNER JOIN INSTITUTIONS AS I USING (INST_ID)
INST_ID can be used in the SELECT column list only without a table qualifier. The reason is it is no longer considered a column of either table. The same is true with common columns if you use NATURAL JOIN.
This query is accepted by version 2.3.0
SELECT c.*, r.mrn, inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = 2
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )