refer to array from SELECT clause in WHERE clause - sql

I have the following two tables:
create table person
(
identifier integer not null,
name text not null,
age integer not null,
primary key(identifier)
);
create table agenda
(
identifier integer not null,
name text not null,
primary key(identifier)
);
They are joined with the following table:
create table person_agenda
(
person_identifier integer not null,
agenda_identifier integer not null,
primary key(person_identifier, agenda_identifier),
foreign key(person_identifier) references person(identifier),
foreign key(agenda_identifier) references agenda(identifier)
);
I am trying to refer to an array, as defined in the SELECT clause, in the WHERE clause.
The following works:
select identifier, name, array(select identifier from agenda a, person_agenda pa where person_identifier = p.identifier and identifier = agenda_identifier and name = '...') as r
from person p;
This does not:
select identifier, name, array(select identifier from agenda a, person_agenda pa where person_identifier = p.identifier and identifier = agenda_identifier and name = '...') as r
from person p
where array_length(r, 1) >= 1;
It says that r is not a known column. How can I refer to this array in the WHERE clause?
The purpose of my second query is to:
omit persons without agendas (by filtering on array_length() >= 1)
get all agenda identifiers, so I can fetch their information in a subsequent query without having to filter again (on field agenda.name in my example above) (by projecting the array in the SELECT clause)
The first bullet can be done with a simple join. But, for the first bullet in combination with the second bullet, I need some kind of aggregation on the agenda identifiers. I thought arrays would be useful for this.
edit
According to user Saba, this is not possible. Thanks for your feedback.
Is the following query a good alternative?
select person.identifier, person.name, array_agg(agenda.identifier)
from person, person_agenda, agenda
where person.identifier = person_identifier and
agenda.identifier = agenda_identifier and
agenda.name = '...'
group by person.identifier;

An alias defined in the SELECT list can't be used in the WHERE clause. If you want to filter on the alias, you need to wrap the query in a subquery or CTE, because the clauses of a query are logically evaluated in the following order:
FROM
ON
JOIN
**WHERE**
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
**SELECT**
DISTINCT
ORDER BY
TOP
Try something like this:
SELECT *
FROM (
select identifier, name, array(select identifier from agenda a, person_agenda pa where person_identifier = p.identifier and identifier = agenda_identifier and name = '...') as r
from person p
) AS per
where array_length(r, 1) >= 1;
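The same wrapping can also be written as a CTE. This is only a sketch, assuming PostgreSQL and the person, agenda and person_agenda tables from the question; the explicit JOIN and the qualified column names are a rewrite for readability, not something taken from the original query:

```sql
-- Sketch, assuming PostgreSQL and the tables defined in the question.
WITH per AS (
    SELECT p.identifier,
           p.name,
           ARRAY(
               SELECT a.identifier
               FROM agenda a
               JOIN person_agenda pa ON pa.agenda_identifier = a.identifier
               WHERE pa.person_identifier = p.identifier
                 AND a.name = '...'   -- placeholder kept from the question
           ) AS r
    FROM person p
)
SELECT identifier, name, r
FROM per
-- array_length() returns NULL for an empty array, so persons
-- without a matching agenda are filtered out here.
WHERE array_length(r, 1) >= 1;
```

By the time the outer WHERE is evaluated, r is an ordinary column of per, which is why the reference is accepted.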
Your case seems straightforward, so the query below should work for you:
select person.identifier, person.name, array_agg(agenda.identifier)
from person, person_agenda, agenda
where person.identifier = person_identifier and
agenda.identifier = agenda_identifier and
agenda.name = '...'
group by person.identifier, person.name;

Related

Group by on non id field

I have the following setup of tables:
CREATE TABLE public.tags (
tag_id int4 NOT NULL,
creation_timestamp timestamp NULL,
"name" varchar(255) NULL,
CONSTRAINT tags_pkey PRIMARY KEY (tag_id)
);
-- public.tag_targets definition
-- Drop table
-- DROP TABLE public.tag_targets;
CREATE TABLE public.tag_targets (
id int4 NOT NULL,
creation_timestamp timestamp NULL,
target_id int8 NULL,
target_name varchar(255) NULL,
last_update_timestamp timestamp NULL,
tag_id int4 NULL,
CONSTRAINT tag_targets_pkey PRIMARY KEY (id),
CONSTRAINT fkcesi55mqvysjv63c1xf2j15oh FOREIGN KEY (tag_id) REFERENCES tags(tag_id)
);
I am trying to run the following query:
SELECT *
FROM tag_targets tt, tags t
WHERE tt.tag_id = t.tag_id
AND (t."name" IN ('Keeper', 'Pk'))
GROUP by tt.target_id
However it wants the PK of both Tags and Tagtarget in the group by:
ERROR: column "tt.id" must appear in the GROUP BY clause or be used in an aggregate function
Is there any way to group on the target_id column? Also, feel free to give any feedback on the table design, as I went for a generic mapping table and an independent tags table.
The problem is that you are requesting SELECT * but in GROUP BY you specified only tt.target_id. Generally speaking, every column in the SELECT list must either appear in GROUP BY or be wrapped in an aggregate function. Oversimplifying: your database doesn't know what to do with the values you requested in the select that weren't used in GROUP BY or in any aggregate.
Try running the following query to see if you get something back:
SELECT tt.target_id, count(*)
FROM tag_targets tt, tags t
WHERE tt.tag_id = t.tag_id
AND (t."name" IN ('Keeper', 'Pk'))
GROUP by tt.target_id
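If the other columns are still needed in the result, each of them has to go through an aggregate. A sketch, assuming PostgreSQL; which aggregates make sense (here a count, an array of tag names, and the latest update timestamp) is a guess at what might be wanted, not something the question states:

```sql
-- Sketch, assuming PostgreSQL and the tags / tag_targets tables above.
SELECT tt.target_id,
       count(*)                               AS tag_count,
       array_agg(t."name" ORDER BY t."name")  AS tag_names,
       max(tt.last_update_timestamp)          AS last_update
FROM tag_targets tt
JOIN tags t ON t.tag_id = tt.tag_id
WHERE t."name" IN ('Keeper', 'Pk')
GROUP BY tt.target_id;
```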
Unrelated, but your syntax of table1, table2 with the join condition in the "where" clause is the older comma-join style (ANSI-89). It's not wrong or anything, but the explicit join syntax (ANSI-92) is preferred for a litany of reasons I won't go into:
SELECT *
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
On the surface, when you say group I am wondering if you mean "sort..." I am assuming you are new to SQL, so if that's an oversimplification, forgive me, but this would be perhaps what you wanted -- an "order by" instead of a group by.
SELECT *
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
order by
tt.target_id
If, on the other hand, you only wanted a single record for each target_id (which is truly a "group by target_id"), then perhaps this is what you wanted: one record per target_id. But then you have to decide which row is selected for each target_id. In this example, I pick the one with the most recent update timestamp:
SELECT distinct on (tt.target_id)
*
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
order by
tt.target_id, tt.last_update_timestamp desc
Not confident on either of these suggestions, so if they miss the mark, post some sample data and expected results.

How to write query/create view to limit multiple records to show only max value

Consider the following three tables: a list of contacts, a list of statuses with a defined "rank", and a join table that links a contact to multiple statuses.
CREATE TABLE public."Contacts"
(
name character varying COLLATE pg_catalog."default",
email character varying COLLATE pg_catalog."default",
contactid integer NOT NULL DEFAULT nextval('"Contacts_contactid_seq"'::regclass),
CONSTRAINT "Contacts_pkey" PRIMARY KEY (contactid)
)
CREATE TABLE public.statusoptions
(
option character varying COLLATE pg_catalog."default" NOT NULL,
"Rank" integer,
CONSTRAINT "ListOptions_pkey" PRIMARY KEY (option)
)
CREATE TABLE public."ContactStatus"
(
contactid integer NOT NULL,
option character varying COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT "Options_pkey" PRIMARY KEY (contactid, option),
CONSTRAINT fk_1 FOREIGN KEY (contactid)
REFERENCES public."Contacts" (contactid) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT fk_2 FOREIGN KEY (option)
REFERENCES public.statusoptions (option) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
The following query returns all rows.
select "Contacts".contactid, "Contacts".name, "ContactStatus".option, statusoptions."Rank" as
currentRank
from "Contacts","ContactStatus", statusoptions
where "Contacts".contactid = "ContactStatus".contactid
and statusoptions.option="ContactStatus".option
This returns a record set that looks like this:
Contactid  name    Status            CurrentRank
1          "john"  "apply"           1
1          "john"  "Manager Review"  4
2          "bill"  "apply"           1
2          "bill"  "1st interview"   2
1          "john"  "1st interview"   2
What I need is to create a query/view that would always return JUST the row with the MAX current rank for each contact. So the expected result I want from this view is:
Contactid  name    Status            CurrentRank
1          "john"  "Manager Review"  4
2          "bill"  "1st interview"   2
At any time, I could change the "Rank" value in the statusoptions table, which would change the view accordingly.
Is this possible?
You can use distinct on:
select distinct on(c.contactid)
c.contactid,
c.name,
cs.option,
s."Rank" as currentRank
from
"Contacts" c
inner join "ContactStatus" cs on c.contactid = cs.contactid
inner join statusoptions s on s.option = cs.option
order by c.contactid, s."Rank" desc
Note:
always use explicit, standard joins (with the on clause) instead of old-school, implicit joins (with a comma in the from clause and the join condition in the where clause)
(short) table aliases make the query shorter and easier to read
consider avoiding quoting table and column names unless absolutely necessary; quoting makes the identifiers case-sensitive, while by default they are not
In Postgres, you can use distinct on
I think you want:
select distinct on (c.contactid) c.contactid, c.name, cs.option, so."Rank" as currentRank
from "Contacts" c join
"ContactStatus" cs
on c.contactid = cs.contactid join
statusoptions so
on so.option = cs.option
order by c.contactid, so."Rank" desc;
Notes:
Use proper, explicit, standard JOIN syntax.
Never use commas in the FROM clause.
Table aliases make a query easier to write and to read.
You should avoid quoting table names and column names. That just clutters up queries unnecessarily.
distinct on usually has better performance than alternatives such as row_number().
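For comparison, the row_number() alternative mentioned in the last note could be wrapped in a view roughly like this. It is a sketch assuming PostgreSQL; contact_current_status is a made-up view name, and NULLS LAST is added only because "Rank" is nullable in the posted DDL:

```sql
-- Sketch: contact_current_status is a hypothetical view name.
CREATE VIEW contact_current_status AS
SELECT contactid, name, option, currentrank
FROM (
    SELECT c.contactid,
           c.name,
           cs.option,
           so."Rank" AS currentrank,
           -- number each contact's statuses from highest rank to lowest
           row_number() OVER (PARTITION BY c.contactid
                              ORDER BY so."Rank" DESC NULLS LAST) AS rn
    FROM "Contacts" c
    JOIN "ContactStatus" cs ON cs.contactid = c.contactid
    JOIN statusoptions so   ON so.option = cs.option
) ranked
WHERE rn = 1;   -- keep only the highest-ranked status per contact
```

Because the view reads statusoptions every time it is queried, changing a "Rank" value there changes the view's output without redefining it.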
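You can take max("Rank") and group by the contact columns. Note that cs.option has to stay out of the GROUP BY, otherwise you get one row per status again, so this version returns the highest rank but not the status name:
select c.contactid, c.name, max(so."Rank") as currentRank
from "Contacts" c
join "ContactStatus" cs on c.contactid = cs.contactid
join statusoptions so on so.option = cs.option
group by c.contactid, c.name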

Attempting to get a table from SQL Request

I'm working on a project right now and I need to run some queries against my DB via SQL*Plus.
Here is what I'm trying to do.
I want to get a table with the first and last names of professors who meet these conditions (I have to check the first condition, and then the other):
(First) In one session (let's say 12004), the professor taught these two courses: INF3180 and INF2110
(Second) In another session, 32003, the professor taught these two courses: INF1130 and INF1110
Here is the code that created the DB:
CREATE TABLE Professor
(professorCode CHAR(5) NOT NULL,
lastName VARCHAR(10) NOT NULL,
firstName VARCHAR(10) NOT NULL,
CONSTRAINT PrimaryKeyProfessor PRIMARY KEY (professorCode)
)
;
CREATE TABLE Group
(sigle CHAR(7) NOT NULL,
noGroup INTEGER NOT NULL,
sessionCode INTEGER NOT NULL,
maxInscriptions INTEGER NOT NULL,
professorCode CHAR(5) NOT NULL,
CONSTRAINT PrimaryKeyGroup PRIMARY KEY
(sigle,noGroup,sessionCode),
CONSTRAINT CESigleGroupeRefCours FOREIGN KEY (sigle) REFERENCES Cours,
CONSTRAINT CECodeSessionRefSession FOREIGN KEY (sessionCode) REFERENCES
Session,
CONSTRAINT CEcodeProfRefProfessor FOREIGN KEY(professorCode) REFERENCES
Professor
)
;
And here is my current query, which does not work:
SELECT DISTINCT Professor.firstName, Professor.lastName
FROM Professor, Group
WHERE Group.professorCode = Professor.professorCode
AND Group.sessionCode = 32003
AND (Group.sigle = 'INF1130' AND
Group.sigle = 'INF1110')
OR Group.sessionCode = 12004
AND (Group.sigle = 'INF3180' AND
Group.sigle = 'INF2110')
I know there is a way to combine both results, but I can't seem to find it.
There is only one match possible in that case :
Only one match with 32003 : INF1130, INF1110
None match with 12004 : INF3180, INF2110
The resulting table is supposed to look like this :
--------------------------
First Name Last Name
--------------------------
Denis Tremblay
The solution proposed by Gordon Linoff looks very good, except that it returns no rows, because as written it requires all 4 courses and both session codes to match. What I need is for each condition to be checked separately and the results combined. Say the condition for session 12004 matches nothing; I can treat that as an empty result. The second condition, for session 32003, gives me one match. Combining both results should give the table shown above.
I want to do this in a single query.
Thanks A LOT!
EDIT : Reformulated
EDIT2 : Gave an example of a known match
EDIT3 : Further explanation why the proposed solution isn't working
Think: group by and having. More importantly, think JOIN, JOIN, JOIN. Never use commas in the from clause.
SELECT p.firstName, p.lastName
FROM Professor p JOIN
Group g
ON g.professorCode = p.professorCode
WHERE (g.sessionCode, g.sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY p.firstName, p.lastName
HAVING COUNT(DISTINCT g.sigle) = 4; -- has all four
It seems like you want to list any professor who either taught INF1130 and INF1110 in 32003; or taught INF3180 and INF2110 in 12004. Unfortunately you've presented that as AND (i.e. they have to have taught all four courses - one pair of courses AND the other), not OR (one set of courses OR the other).
As a long-winded way of expanding what I think you want:
SELECT p.firstName, p.lastName
FROM Professor p
WHERE (
EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 32003
AND sigle = 'INF1130'
)
AND EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 32003
AND sigle = 'INF1110'
)
)
OR (
EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 12004
AND sigle = 'INF3180'
)
AND EXISTS (
SELECT *
FROM GroupX g
WHERE professorCode = p.professorCode
AND sessionCode = 12004
AND sigle = 'INF2110'
)
);
Four subqueries aren't going to be terribly efficient. You could do multiple joins instead.
If you will always be looking for two sigle values per sessionCode, then you could modify Gordon's answer to count the matches per session, by adding sessionCode to the group-by clause:
SELECT p.firstName, p.lastName
FROM GroupX g
JOIN Professor p
ON p.professorCode = g.professorCode
WHERE (g.sessionCode, g.sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY p.firstName, p.lastName, g.sessionCode
HAVING COUNT(*) = 2;
If you did have a professor who taught all four then you would get them listed twice; if that can happen you could add your DISTINCT back in, though that feels a bit wrong. You could also use a subquery and IN to avoid that:
SELECT p.firstName, p.lastName
FROM Professor p
WHERE ProfessorCode IN (
SELECT professorCode
FROM GroupX
WHERE (sessionCode, sigle) IN ( (32003, 'INF1130'), (32003, 'INF1110'),
(12004, 'INF3180'), (12004, 'INF2110')
)
GROUP BY professorCode, sessionCode
HAVING COUNT(*) = 2
)
(I've changed Group to GroupX because Group isn't a valid identifier: it's a keyword. I assume you've changed your real names, maybe from another language?)
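Another way to express "one pair of courses OR the other" in a single pass is conditional aggregation. This is a sketch only, reusing the GroupX stand-in name from above; it was not part of the original answers:

```sql
-- Sketch: one row per professor who taught both courses of either pair.
SELECT p.firstName, p.lastName
FROM Professor p
JOIN GroupX g
  ON g.professorCode = p.professorCode
GROUP BY p.professorCode, p.firstName, p.lastName
HAVING COUNT(DISTINCT CASE
                 WHEN g.sessionCode = 32003
                  AND g.sigle IN ('INF1130', 'INF1110')
                 THEN g.sigle
             END) = 2      -- both courses of the 32003 pair
    OR COUNT(DISTINCT CASE
                 WHEN g.sessionCode = 12004
                  AND g.sigle IN ('INF3180', 'INF2110')
                 THEN g.sigle
             END) = 2;     -- both courses of the 12004 pair
```

Grouping by professorCode means a professor who satisfies both pairs is listed once, and counting distinct sigle values avoids a false match when someone teaches two groups of the same course in one session.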
Use modern join syntax:
SELECT Professor.firstName, Professor.lastName
FROM Professor join "Group" g on
g.professorCode = Professor.professorCode
where g.sessionCode in( 32003,12004 )
AND g.sigle in( 'INF1130', 'INF1110','INF3180','INF2110')
group by Professor.firstName, Professor.lastName
having count( distinct sigle )=4

How to select from multiple tables in a group by query?

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id),
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id),
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
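Another method, if this is PostgreSQL, is DISTINCT ON. The sketch below keeps the join to org and filters on o.name = $1, as in the question's edit; the org table itself is not shown in the question, so its id and name columns are assumed from the query there:

```sql
-- Sketch, assuming PostgreSQL; org(id, name) is assumed from the question's query.
SELECT DISTINCT ON (a.id)
       a.id,
       a.name,
       v.content,
       v.publish_date
FROM agreement a
JOIN version v ON v.agreement_id = a.id
JOIN org o     ON o.id = a.org_id
WHERE o.name = $1
-- for each agreement, the first row in this order is the latest version
ORDER BY a.id, v.publish_date DESC;
```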
PostgreSQL has window functions.
Window functions let you order rows within a partition defined by a specific column or set of columns. The rank function returns the row's position within its partition for that ordering. If you filter on rank = 1, you get the highest-sorted row of each partition (and exactly one row per partition as long as the ordering column, here v.id, is unique).
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

SQL Anomaly Using 'USING' Clause with Nested Queries?

I have a normalized database containing 3 tables whose DDL is this:
CREATE CACHED TABLE Clients (
cli_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
defmrn_id BIGINT,
lastName VARCHAR(48) DEFAULT '' NOT NULL,
midName VARCHAR(24) DEFAULT '' NOT NULL,
firstName VARCHAR(24) DEFAULT '' NOT NULL,
doB INTEGER DEFAULT 0 NOT NULL,
gender VARCHAR(1) NOT NULL);
CREATE TABLE Client_MRNs (
mrn_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
cli_id INTEGER REFERENCES Clients ( cli_id ),
inst_id INTEGER REFERENCES Institutions ( inst_id ),
mrn VARCHAR(32) DEFAULT '' NOT NULL,
CONSTRAINT climrn01 UNIQUE (mrn, inst_id));
CREATE TABLE Institutions (
inst_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
loc_id INTEGER REFERENCES Locales (loc_id ),
itag VARCHAR(6) UNIQUE NOT NULL,
iname VARCHAR(80) DEFAULT '' NOT NULL);
The first table contains a foreign key column, defmrn_id, that is a reference to a "default identifier code" that is stored in the second table (which is a list of all identifier codes). A record in the first table may have many identifiers, but only one default identifier. So yeah, I have created a circular reference.
The third table is just normalized data from the second table.
I wanted a query that would find a CLIENT record based on matching a supplied identifier code to any of the identifier codes in CLIENT_MRNs that may belong to that CLIENT record.
My strategy was to first identify those records that matched in the second table (CLIENT_MRN) and then use that intermediate result to join to records in the CLIENT table that matched other user-supplied searching criteria. I also need to denormalize the identifier reference defmrn_id in the 1st table. Here is what I came up with...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 ON m2.cli_id = c.cli_id
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
The above works, but I spent some time trying to understand why the following was NOT working...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
It seems to me that the second SQL should work, but it fails on the USING clause every time. I am executing these queries against a database managed by HSQLDB 2.2.9 as the RDBMS. Is this a parsing issue in HSQLDB or is this a known limitation of the USING clause with nested queries?
You can always try with HSQLDB 2.3.0 (a release candidate).
The incomplete SQL you posted does not allow proper checking. But there is an obvious mistake in the query. If you have:
SELECT INST_ID FROM CLIENT_MRNS AS R INNER JOIN INSTITUTIONS AS I USING (INST_ID)
INST_ID can be used in the SELECT column list only without a table qualifier. The reason is that it is no longer considered a column of either table. The same is true of the common columns if you use NATURAL JOIN.
This query is accepted by version 2.3.0:
SELECT c.*, r.mrn, inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = 2
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )