Group by on non id field - sql

I have the following setup of tables:
CREATE TABLE public.tags (
tag_id int4 NOT NULL,
creation_timestamp timestamp NULL,
"name" varchar(255) NULL,
CONSTRAINT tags_pkey PRIMARY KEY (tag_id)
);
-- public.tag_targets definition
-- Drop table
-- DROP TABLE public.tag_targets;
CREATE TABLE public.tag_targets (
id int4 NOT NULL,
creation_timestamp timestamp NULL,
target_id int8 NULL,
target_name varchar(255) NULL,
last_update_timestamp timestamp NULL,
tag_id int4 NULL,
CONSTRAINT tag_targets_pkey PRIMARY KEY (id),
CONSTRAINT fkcesi55mqvysjv63c1xf2j15oh FOREIGN KEY (tag_id) REFERENCES tags(tag_id)
);
I am trying to run the following query:
SELECT *
FROM tag_targets tt, tags t
WHERE tt.tag_id = t.tag_id
AND (t."name" IN ('Keeper', 'Pk'))
GROUP by tt.target_id
However, it wants the primary keys of both tags and tag_targets in the GROUP BY:
ERROR: column "tt.id" must appear in the GROUP BY clause or be used in an aggregate function
Is there any way to group on the target_id column? Also, feel free to give feedback on the table design; I went for a generic mapping table and an independent tags table.

The problem is that you are requesting SELECT * while GROUP BY lists only tt.target_id. Generally speaking, every column in the SELECT list must either appear in GROUP BY or be wrapped in an aggregate function. Put simply: the database doesn't know which value to return for a selected column that is neither grouped nor aggregated.
Try running the following query to see whether you get results:
SELECT tt.target_id, count(*)
FROM tag_targets tt, tags t
WHERE tt.tag_id = t.tag_id
AND (t."name" IN ('Keeper', 'Pk'))
GROUP by tt.target_id
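As a rough, self-contained way to try the aggregate version, the sketch below loads a few invented rows into an in-memory SQLite database through Python's sqlite3 module and counts the matching tag rows per target_id. The sample data is made up, the schema is trimmed to the columns the query touches, and SQLite stands in for PostgreSQL only because it needs no server. SQLite is more lenient than PostgreSQL about ungrouped columns, so it will not reproduce the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag_targets (
        id INTEGER PRIMARY KEY,
        target_id INTEGER,
        tag_id INTEGER REFERENCES tags(tag_id)
    );
    -- Invented sample data, not from the question.
    INSERT INTO tags VALUES (1, 'Keeper'), (2, 'Pk'), (3, 'Other');
    INSERT INTO tag_targets VALUES
        (1, 100, 1), (2, 100, 2), (3, 200, 1), (4, 300, 3);
""")

# Only the grouped column and an aggregate appear in the SELECT list.
rows = conn.execute("""
    SELECT tt.target_id, COUNT(*) AS tag_count
    FROM tag_targets tt
    JOIN tags t ON tt.tag_id = t.tag_id
    WHERE t.name IN ('Keeper', 'Pk')
    GROUP BY tt.target_id
    ORDER BY tt.target_id
""").fetchall()

print(rows)
```

Target 300 carries only the 'Other' tag, so the WHERE clause filters it out before grouping.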

Unrelated but your syntax of table1, table2 with the join in the "where" clause is the non-ANSI syntax. It's not wrong or anything, but the ANSI syntax of explicit joins is preferred for a litany of reasons I won't go into:
SELECT *
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
On the surface, when you say group I am wondering if you mean "sort..." I am assuming you are new to SQL, so if that's an oversimplification, forgive me, but this would be perhaps what you wanted -- an "order by" instead of a group by.
SELECT *
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
order by
tt.target_id
If, on the other hand, you only wanted a single record for each target_id (which is truly a "group by target_id"), then perhaps this is what you wanted: one record per target_id. In that case you have to decide which row is selected for each target. In this example, I pick the one with the most recent update timestamp:
SELECT distinct on (tt.target_id)
*
FROM
tag_targets tt
join tags t on
tt.tag_id = t.tag_id
where
t."name" IN ('Keeper', 'Pk')
order by
tt.target_id, tt.last_update_timestamp desc
Not confident on either of these suggestions, so if they miss the mark, post some sample data and expected results.
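DISTINCT ON is PostgreSQL-specific. As a portable illustration of the same "one row per target_id, newest first" idea, the sketch below joins each row to the latest last_update_timestamp of its target. This is a swapped-in technique rather than the query above; the sample rows are invented, SQLite (via Python's sqlite3) is used only so the example runs without a server, and ties on the timestamp would return more than one row per target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag_targets (
        id INTEGER PRIMARY KEY,
        target_id INTEGER,
        last_update_timestamp TEXT,
        tag_id INTEGER REFERENCES tags(tag_id)
    );
    -- Invented sample data, not from the question.
    INSERT INTO tags VALUES (1, 'Keeper'), (2, 'Pk');
    INSERT INTO tag_targets VALUES
        (1, 100, '2021-01-01 10:00:00', 1),
        (2, 100, '2021-03-01 10:00:00', 2),
        (3, 200, '2021-02-01 10:00:00', 1);
""")

# The subquery finds the newest timestamp per target; the outer join
# keeps only the rows that carry that timestamp.
latest_per_target = conn.execute("""
    SELECT tt.target_id, tt.id, t.name, tt.last_update_timestamp
    FROM tag_targets tt
    JOIN tags t ON tt.tag_id = t.tag_id
    JOIN (
        SELECT tt2.target_id, MAX(tt2.last_update_timestamp) AS newest
        FROM tag_targets tt2
        JOIN tags t2 ON tt2.tag_id = t2.tag_id
        WHERE t2.name IN ('Keeper', 'Pk')
        GROUP BY tt2.target_id
    ) m ON m.target_id = tt.target_id
       AND m.newest = tt.last_update_timestamp
    WHERE t.name IN ('Keeper', 'Pk')
    ORDER BY tt.target_id
""").fetchall()

print(latest_per_target)
```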

Related

How to write query/create view to limit multiple records to show only max value

Consider the following three tables. A list of contacts, a list of status with a defined "rank" and a join table that links a contact to multiple status's.
CREATE TABLE public."Contacts"
(
name character varying COLLATE pg_catalog."default",
email character varying COLLATE pg_catalog."default",
contactid integer NOT NULL DEFAULT nextval('"Contacts_contactid_seq"'::regclass),
CONSTRAINT "Contacts_pkey" PRIMARY KEY (contactid)
)
CREATE TABLE public.statusoptions
(
option character varying COLLATE pg_catalog."default" NOT NULL,
"Rank" integer,
CONSTRAINT "ListOptions_pkey" PRIMARY KEY (option)
)
CREATE TABLE public."ContactStatus"
(
contactid integer NOT NULL,
option character varying COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT "Options_pkey" PRIMARY KEY (contactid, option),
CONSTRAINT fk_1 FOREIGN KEY (contactid)
REFERENCES public."Contacts" (contactid) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT fk_2 FOREIGN KEY (option)
REFERENCES public.statusoptions (option) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
The following query returns all rows.
select "Contacts".contactid, "Contacts".name, "ContactStatus".option, statusoptions."Rank" as
currentRank
from "Contacts","ContactStatus", statusoptions
where "Contacts".contactid = "ContactStatus".contactid
and statusoptions.option="ContactStatus".option
This returns a record set that looks like this:
Contactid name Status CurrentRank
1 "john" "apply" 1
1 "john" "Manager Review" 4
2 "bill" "apply" 1
2 "bill" "1st interview" 2
1 "john" "1st interview" 2
What I need is to create a query/view that would always JUST return the rows of the MAX current RANK. So the expected result I want from this view is:
Contactid name Status CurrentRank
1 "john" "Manager Review" 4
2 "bill" "1st interview" 2
At any time, I could change the "Rank" value in the statusoptions table, which would change the view accordingly.
Is this possible?
You can use distinct on:
select distinct on(c.contactid)
c.contactid,
c.name,
cs.option,
s."Rank" as currentRank
from
"Contacts" c
inner join "ContactStatus" cs on c.contactid = cs.contactid
inner join statusoptions s on s.option = cs.option
order by c.contactid, s."Rank" desc
Note:
always use explicit, standard joins (with the on clause) instead of old-school, implicit joins (with commas in the from clause and the join conditions in the where clause)
(short) table aliases make the query shorter and easier to read
consider avoiding quoted table and column names unless absolutely necessary; quoting makes the identifiers case-sensitive, while by default they are not
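For readers without a PostgreSQL instance at hand, here is a minimal sketch of the same "highest rank per contact" view using a portable NOT EXISTS filter instead of distinct on. It runs on SQLite through Python's sqlite3 module, uses the sample rows from the question, and simplifies the identifiers to unquoted lowercase names (an assumption made for brevity). The view name current_status is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contactid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE statusoptions (option TEXT PRIMARY KEY, rank INTEGER);
    CREATE TABLE contactstatus (
        contactid INTEGER REFERENCES contacts(contactid),
        option TEXT REFERENCES statusoptions(option),
        PRIMARY KEY (contactid, option)
    );
    INSERT INTO contacts VALUES (1, 'john'), (2, 'bill');
    INSERT INTO statusoptions VALUES
        ('apply', 1), ('1st interview', 2), ('Manager Review', 4);
    INSERT INTO contactstatus VALUES
        (1, 'apply'), (1, 'Manager Review'), (1, '1st interview'),
        (2, 'apply'), (2, '1st interview');

    -- Keep a status row only if the same contact has no higher-ranked status.
    CREATE VIEW current_status AS
    SELECT c.contactid, c.name, cs.option, s.rank AS currentrank
    FROM contacts c
    JOIN contactstatus cs ON cs.contactid = c.contactid
    JOIN statusoptions s ON s.option = cs.option
    WHERE NOT EXISTS (
        SELECT 1
        FROM contactstatus cs2
        JOIN statusoptions s2 ON s2.option = cs2.option
        WHERE cs2.contactid = cs.contactid
          AND s2.rank > s.rank
    );
""")

query = "SELECT * FROM current_status ORDER BY contactid"
before = conn.execute(query).fetchall()

# Changing a rank changes what the view returns on the next read.
conn.execute("UPDATE statusoptions SET rank = 5 WHERE option = 'apply'")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Because a view is evaluated when it is read, updating statusoptions is enough; nothing needs to be refreshed. If two statuses of one contact share the top rank, this filter returns both rows.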
In Postgres, you can use distinct on. I think you want:
select distinct on (c.contactid) c.contactid, c.name, cs.option, so."Rank" as currentRank
from "Contacts" c join
"ContactStatus" cs
on c.contactid = cs.contactid join
statusoptions so
on so.option = cs.option
order by c.contactid, so."Rank" desc;
Notes:
Use proper, explicit, standard JOIN syntax.
Never use commas in the FROM clause.
Table aliases make a query easier to write and to read.
You should avoid quoting table names and column names. That just clutters up queries unnecessarily.
distinct on usually has better performance than alternatives such as row_number().
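You can also take max("Rank") and group by the remaining fields. Note that cs.option has to be left out of the select list and the GROUP BY; grouping by it yields one group per status row, so every row would come back. This returns the highest rank per contact, but not the matching status:
select c.contactid, c.name, max(so."Rank") as currentRank
from "Contacts" c
join "ContactStatus" cs on c.contactid = cs.contactid
join statusoptions so on so.option = cs.option
group by c.contactid, c.name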

How to select from multiple tables in a group by query?

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id),
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id),
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
PostgreSQL has window functions.
Window functions let you sort rows within a partition defined by a column or set of columns. The rank function returns each row's position within its partition for that sort order. If you keep only the rows where the rank is 1, you get the top-sorted row of each partition (a single row as long as the sort key has no ties, which holds here because v.id is the primary key).
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
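To experiment with the window-function approach without a PostgreSQL server, the sketch below runs a close variant on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The sample agreements and versions are invented, the org table is replaced by a direct filter on org_id, and the partition is ordered by publish_date rather than v.id; all three are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agreement (id INTEGER PRIMARY KEY, name TEXT, org_id INTEGER);
    CREATE TABLE version (
        id INTEGER PRIMARY KEY,
        content TEXT,
        publish_date TEXT,
        agreement_id INTEGER REFERENCES agreement(id)
    );
    -- Invented sample data, not from the question.
    INSERT INTO agreement VALUES (1, 'NDA', 10), (2, 'Terms', 10), (3, 'Other', 20);
    INSERT INTO version VALUES
        (1, 'NDA v1',   '2020-01-01', 1),
        (2, 'NDA v2',   '2021-01-01', 1),
        (3, 'Terms v1', '2020-06-01', 2),
        (4, 'Other v1', '2020-07-01', 3);
""")

# Rank the versions of each agreement newest first, then keep rank 1.
latest = conn.execute("""
    SELECT u.id, u.name, u.content, u.publish_date
    FROM (
        SELECT a.id, a.name, v.content, v.publish_date,
               RANK() OVER (
                   PARTITION BY a.id
                   ORDER BY v.publish_date DESC
               ) AS pos
        FROM agreement a
        JOIN version v ON a.id = v.agreement_id
        WHERE a.org_id = ?
    ) AS u
    WHERE u.pos = 1
    ORDER BY u.id
""", (10,)).fetchall()

print(latest)
```

If two versions of one agreement could share a publish_date, RANK() would give both position 1; ROW_NUMBER() with a tie-breaking column is the usual way to force a single row.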
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

How to massive update?

I have three tables:
group:
id - primary key
name - varchar
profile:
id - primary key
name - varchar
surname - varchar
[...etc...]
profile_group:
profile_id - integer, foreign key to table profile
group_id - integer, foreign key to table group
Profiles may be in many groups. I have a group named "Users" with id=1, and I want to assign all profiles to this group, but only where no such entry exists yet for the profile.
How to do it?
If I understood you correctly, you want to add entries like (profile_id, 1) to the profile_group table for all profiles that were not in this table before. If so, try this:
INSERT INTO profile_group(profile_id, group_id)
SELECT id, 1 FROM profile p
LEFT JOIN profile_group pg on (p.id=pg.profile_id)
WHERE pg.group_id IS NULL;
What you want to do is use a left join to the profile_group table and then exclude any matching records (this is done in the where clause of the SQL statement below).
In my experience this is faster than using not in (select xxx), since the query planner seems to handle it better.
insert into profile_group (profile_id, group_id)
select p.id, 1
from profile p
left join profile_group pg on p.id = pg.profile_id
and pg.group_id = 1
where pg.profile_id is null
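The following sketch runs this anti-join insert on an in-memory SQLite database through Python's sqlite3 module, with three invented profiles. It is meant to show why the group_id = 1 condition sits in the ON clause: a profile that is only in some other group (bob here) still has no matching row for group 1, so it gets inserted, while a profile already in group 1 is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profile_group (profile_id INTEGER, group_id INTEGER);
    INSERT INTO profile VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    -- ann is already in group 1; bob is only in group 2; cid is in no group.
    INSERT INTO profile_group VALUES (1, 1), (2, 2);
""")

# Left join restricted to group 1, then keep the profiles with no match.
conn.execute("""
    INSERT INTO profile_group (profile_id, group_id)
    SELECT p.id, 1
    FROM profile p
    LEFT JOIN profile_group pg
           ON p.id = pg.profile_id
          AND pg.group_id = 1
    WHERE pg.profile_id IS NULL
""")

members = conn.execute("""
    SELECT profile_id FROM profile_group
    WHERE group_id = 1 ORDER BY profile_id
""").fetchall()
total = conn.execute("SELECT COUNT(*) FROM profile_group").fetchone()[0]

print(members, total)
```

Running the insert a second time adds nothing, because every profile then has a group 1 row.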

SQL Anomaly Using 'USING' Clause with Nested Queries?

I have a normalized database containing 3 tables whose DDL is this:
CREATE CACHED TABLE Clients (
cli_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
defmrn_id BIGINT,
lastName VARCHAR(48) DEFAULT '' NOT NULL,
midName VARCHAR(24) DEFAULT '' NOT NULL,
firstName VARCHAR(24) DEFAULT '' NOT NULL,
doB INTEGER DEFAULT 0 NOT NULL,
gender VARCHAR(1) NOT NULL);
CREATE TABLE Client_MRNs (
mrn_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
cli_id INTEGER REFERENCES Clients ( cli_id ),
inst_id INTEGER REFERENCES Institutions ( inst_id ),
mrn VARCHAR(32) DEFAULT '' NOT NULL,
CONSTRAINT climrn01 UNIQUE (mrn, inst_id));
CREATE TABLE Institutions (
inst_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
loc_id INTEGER REFERENCES Locales (loc_id ),
itag VARCHAR(6) UNIQUE NOT NULL,
iname VARCHAR(80) DEFAULT '' NOT NULL);
The first table contains a foreign key column, defmrn_id, that is a reference to a "default identifier code" that is stored in the second table (which is a list of all identifier codes). A record in the first table may have many identifiers, but only one default identifier. So yeah, I have created a circular reference.
The third table is just normalized data from the second table.
I wanted a query that would find a CLIENT record based on matching a supplied identifier code to any of the identifier codes in CLIENT_MRNs that may belong to that CLIENT record.
My strategy was to first identify those records that matched in the second table (CLIENT_MRN) and then use that intermediate result to join to records in the CLIENT table that matched other user-supplied searching criteria. I also need to denormalize the identifier reference defmrn_id in the 1st table. Here is what I came up with...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 ON m2.cli_id = c.cli_id
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
The above works, but I spent some time trying to understand why the following was NOT working...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
It seems to me that the second SQL should work, but it fails on the USING clause every time. I am executing these queries against a database managed by HSQLDB 2.2.9 as the RDBMS. Is this a parsing issue in HSQLDB or is this a known limitation of the USING clause with nested queries?
You can always try with HSQLDB 2.3.0 (a release candidate).
The SQL as posted is incomplete, which makes it hard to check properly. But there is an obvious mistake in the query. If you have:
SELECT INST_ID FROM CLIENT_MRNS AS R INNER JOIN INSTITUTIONS AS I USING (INST_ID)
INST_ID can be used in the SELECT column list only without a table qualifier. The reason is that it is no longer considered a column of either table. The same is true of common columns if you use NATURAL JOIN.
This query is accepted by version 2.3.0:
SELECT c.*, r.mrn, inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = 2
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )

SQL function to sort by most popular content

I don't know if this is possible with SQL:
I have two tables, one of content, each with an integer ID, and a table of comments each with an "On" field denoting the content it is on. I'd like to receive the content in order of how many comments have it in their "On" field, and was hoping SQL could do it.
SELECT comments.on AS content_id, COUNT(comment_id) AS num_comments
FROM comments
GROUP BY content_id
ORDER BY num_comments DESC
If you need all the fields of the content, you can do a join:
SELECT contents.*, COUNT(comment_id) AS num_comments
FROM contents
LEFT JOIN comments on contents.content_id = comments.on
GROUP BY content_id
ORDER BY num_comments DESC
select c.id, count(*) as cnt
from Content c, Comment cmt
where c.id = cmt.on
group by c.id
order by cnt desc
Let's assume your tables look like this (I wrote this in pseudo-SQL - syntax may differ depending on database you are using). From the description you provided, it is not clear how you are joining the tables. Nevertheless, I think it looks something like this (with the caveat that all primary keys, indexes, and so forth are missing):
CREATE TABLE [dbo].[Content] (
[ContentID] [int] NOT NULL,
[ContentText] [varchar](50) NOT NULL
)
CREATE TABLE [dbo].[ContentComments] (
[ContentCommentID] [int] NOT NULL,
[ContentCommentText] [varchar](50) NOT NULL,
[ContentID] [int] NOT NULL
)
ALTER TABLE [dbo].[ContentComments] WITH CHECK ADD CONSTRAINT
[FK_ContentComments_Content] FOREIGN KEY([ContentID])
REFERENCES [dbo].[Content] ([ContentID])
Here is how you would write your query to get the content sorted by the number of comments each piece of content has. The DESC sorts the content items from those with the most comments to those with the least comments.
SELECT Content.ContentID, COUNT(ContentComments.ContentCommentID) AS CommentCount
FROM Content
INNER JOIN ContentComments
ON Content.ContentID = ContentComments.ContentID
GROUP BY Content.ContentID
ORDER BY COUNT(ContentComments.ContentCommentID) DESC
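One thing to watch with the INNER JOIN above is that content with no comments drops out of the result entirely. The sketch below, run on SQLite through Python's sqlite3 module, shows a LEFT JOIN variant that keeps such content with a count of 0. The table and column names are simplified stand-ins for the ones above and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_text TEXT);
    CREATE TABLE content_comments (
        comment_id INTEGER PRIMARY KEY,
        comment_text TEXT,
        content_id INTEGER REFERENCES content(content_id)
    );
    -- Invented sample data: content 3 has no comments at all.
    INSERT INTO content VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO content_comments VALUES
        (1, 'a', 2), (2, 'b', 2), (3, 'c', 2), (4, 'd', 1);
""")

# COUNT(column) ignores the NULLs produced by the left join, so content
# without comments is counted as 0 rather than 1.
ranked = conn.execute("""
    SELECT c.content_id, COUNT(cc.comment_id) AS comment_count
    FROM content c
    LEFT JOIN content_comments cc ON cc.content_id = c.content_id
    GROUP BY c.content_id
    ORDER BY comment_count DESC, c.content_id
""").fetchall()

print(ranked)
```

The second ORDER BY key only makes the order deterministic when two pieces of content have the same number of comments.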