SQL Anomaly Using 'USING' Clause with Nested Queries? - sql

I have a normalized database containing 3 tables whose DDL is this:
CREATE CACHED TABLE Clients (
cli_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
defmrn_id BIGINT,
lastName VARCHAR(48) DEFAULT '' NOT NULL,
midName VARCHAR(24) DEFAULT '' NOT NULL,
firstName VARCHAR(24) DEFAULT '' NOT NULL,
doB INTEGER DEFAULT 0 NOT NULL,
gender VARCHAR(1) NOT NULL);
CREATE TABLE Client_MRNs (
mrn_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
cli_id INTEGER REFERENCES Clients ( cli_id ),
inst_id INTEGER REFERENCES Institutions ( inst_id ),
mrn VARCHAR(32) DEFAULT '' NOT NULL,
CONSTRAINT climrn01 UNIQUE (mrn, inst_id));
CREATE TABLE Institutions (
inst_id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100) PRIMARY KEY,
loc_id INTEGER REFERENCES Locales (loc_id ),
itag VARCHAR(6) UNIQUE NOT NULL,
iname VARCHAR(80) DEFAULT '' NOT NULL);
The first table contains a foreign key column, defmrn_id, that is a reference to a "default identifier code" that is stored in the second table (which is a list of all identifier codes). A record in the first table may have many identifiers, but only one default identifier. So yeah, I have created a circular reference.
The third table is just normalized data from the second table.
I wanted a query that would find a CLIENT record based on matching a supplied identifier code to any of the identifier codes in CLIENT_MRNs that may belong to that CLIENT record.
My strategy was to first identify the matching records in the second table (Client_MRNs) and then join that intermediate result to the records in the Clients table that match the other user-supplied search criteria. I also need to denormalize the identifier reference defmrn_id in the first table. Here is what I came up with:
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 ON m2.cli_id = c.cli_id
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
The above works, but I spent some time trying to understand why the following was NOT working...
SQL = SELECT c.*, r.mrn, i.inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
WHERE (<other user supplied search criteria...>);
It seems to me that the second SQL should work, but it fails on the USING clause every time. I am executing these queries against a database managed by HSQLDB 2.2.9 as the RDBMS. Is this a parsing issue in HSQLDB or is this a known limitation of the USING clause with nested queries?

You can always try with HSQLDB 2.3.0 (a release candidate).
The way you report the incomplete SQL does not allow proper checking, but there is an obvious mistake in the query. If you have:
SELECT INST_ID FROM CLIENT_MRNS AS R INNER JOIN INSTITUTIONS AS I USING (INST_ID)
then INST_ID can be used in the SELECT column list only without a table qualifier, because it is no longer considered a column of either table. The same is true of the common columns when you use NATURAL JOIN.
This query is accepted by version 2.3.0:
SELECT c.*, r.mrn, inst_id, i.itag, i.iname
FROM Clients AS c
INNER JOIN
(
SELECT m.cli_id
FROM Client_MRNs AS m
WHERE m.mrn = 2
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
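As a rough, self-contained check of the shape of that query, the sketch below runs it from Python against an in-memory SQLite database rather than HSQLDB. The trimmed-down tables and the sample rows are invented for illustration. SQLite is more permissive than HSQLDB about qualifying USING columns, so this only shows the unqualified inst_id form working together with a derived table joined by USING; it does not reproduce the HSQLDB error.

```python
import sqlite3

# In-memory SQLite stand-in for the HSQLDB schema (trimmed, made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Institutions (inst_id INTEGER PRIMARY KEY, itag TEXT, iname TEXT);
CREATE TABLE Clients (cli_id INTEGER PRIMARY KEY, defmrn_id INTEGER, lastName TEXT);
CREATE TABLE Client_MRNs (mrn_id INTEGER PRIMARY KEY, cli_id INTEGER,
                          inst_id INTEGER, mrn TEXT);
INSERT INTO Institutions VALUES (100, 'GEN', 'General Hospital');
INSERT INTO Clients VALUES (100, 100, 'Smith'), (101, 102, 'Jones');
INSERT INTO Client_MRNs VALUES (100, 100, 100, 'A1'),
                               (101, 100, 100, 'A2'),
                               (102, 101, 100, 'B1');
""")

# inst_id is the USING column, so it is selected without a table qualifier.
query = """
SELECT c.lastName, r.mrn, inst_id, i.itag
FROM Clients AS c
INNER JOIN (
    SELECT m.cli_id
    FROM Client_MRNs AS m
    WHERE m.mrn = ?
) AS m2 USING ( cli_id )
INNER JOIN Client_MRNs AS r ON c.defmrn_id = r.mrn_id
INNER JOIN Institutions AS i USING ( inst_id )
"""

# Searching by the non-default MRN 'A2' still returns the client's default MRN.
rows = conn.execute(query, ("A2",)).fetchall()
print(rows)
```

The search matches any of the client's MRNs, while the join on defmrn_id reports only the default one.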

Related

Sum two SQLite columns, when they're subqueries

I have a table of receipts. Each one is associated with a service, and each person is obligated to pay an equal share of it, except when they are assigned an extra fee that can be activated/deactivated (0/1). So I used a subquery to get the extra amount they have to pay only if that fee is active; the table 'fees' contains the user_id, the service_id, the extra amount and the active flag. Then I need the total per person: the extra fee (if any) added to the subtotal (the receipt's total amount minus any active extra fees, divided by the number of persons who are obligated to contribute).
SELECT
P.nombre AS person,
S.nombre AS service,
(
SELECT TOTAL(C.value)
FROM fees C
WHERE C.user_id = P.id AND C.service_id = O.service_id AND C.active = 0
) AS fee,
IFNULL(NULL, 23333) AS subtotal,
(fee + subtotal) as total
FROM receipts R
LEFT JOIN obligations O ON O.service_id = R.service_id
LEFT JOIN persons P ON O.user_id = P.id
LEFT JOIN services S ON O.service_id = S.id
WHERE R.id = 3 AND O.active = 0;
Note: 23333 (the subtotal) will be replaced with a '?'. I already get that value from another function and will pass it as an argument when executing the query from Golang.
The problem occurs at this line:
(fee + subtotal) as total
Output: no such column: fee
If I run the query without that line, it will actually return a table with the active extra fees and subtotal, but I'm stuck when trying to create a final column to add those two values.
Thanks!
Edit
Following Stefan's advice, here are the statements I used to create the tables:
CREATE TABLE IF NOT EXISTS persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL, active INTEGER DEFAULT 0); CREATE UNIQUE INDEX per_nom_uindex on persons (name)
CREATE TABLE IF NOT EXISTS services (id INTEGER PRIMARY KEY, name TEXT NOT NULL, active INTEGER DEFAULT 0); CREATE UNIQUE INDEX ser_nom_uindex on services (name)
CREATE TABLE IF NOT EXISTS receipts (id INTEGER PRIMARY KEY, y INTEGER NOT NULL, m INTEGER NOT NULL, service_id INTEGER NOT NULL, amount INTEGER NOT NULL, FOREIGN KEY (service_id) REFERENCES services (id))
CREATE TABLE IF NOT EXISTS fees (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL, service_id INTEGER NOT NULL, amount INTEGER NOT NULL, active INTEGER DEFAULT 0, FOREIGN KEY(person_id) REFERENCES persons(id), FOREIGN KEY(service_id) REFERENCES services(id))
CREATE TABLE IF NOT EXISTS obligations (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL, service_id INTEGER NOT NULL, active INTEGER DEFAULT 0, FOREIGN KEY(person_id) REFERENCES persons(id), FOREIGN KEY(service_id) REFERENCES services(id))
Consider moving the subquery from the SELECT list to the JOIN clause (often called a derived table) and aggregating it with GROUP BY on user_id and service_id. This lets you reference the column as needed and even avoids row-wise aggregation (unless the SQLite engine already runs it as a single aggregation under the hood).
SELECT
P.nombre AS person,
S.nombre AS service,
C.fee, -- REFERENCE SUBQUERY COLUMN
IFNULL(?, 23333) AS subtotal,
C.fee + IFNULL(?, 23333) as total -- REPEAT NEEDED EXPRESSION
FROM receipts R
LEFT JOIN obligations O
ON O.service_id = R.service_id
LEFT JOIN persons P
ON O.user_id = P.id
AND O.active = 0 -- MOVED FROM WHERE CLAUSE
LEFT JOIN services S
ON O.service_id = S.id
LEFT JOIN (
SELECT user_id,
service_id,
TOTAL(value) AS fee
FROM fees
WHERE active = 0
GROUP BY user_id,
service_id
) C ON C.user_id = P.id
AND C.service_id = O.service_id
WHERE R.id = 3
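To see both halves of this in isolation, here is a minimal sketch using Python's sqlite3 module. It uses only a cut-down persons and fees table with the column names from the CREATE TABLE statements (person_id, amount), and the sample rows are made up. The COALESCE around the fee is an addition of mine, not part of the answer above: with a LEFT JOIN, a person without any fee row gets NULL, and NULL plus the subtotal would be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fees (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL,
                   service_id INTEGER NOT NULL, amount INTEGER NOT NULL,
                   active INTEGER DEFAULT 0);
INSERT INTO persons (id, name) VALUES (1, 'Ana'), (2, 'Luis');
INSERT INTO fees (person_id, service_id, amount, active)
VALUES (1, 7, 500, 0), (1, 7, 250, 0);
""")

# 1) An alias defined in the SELECT list cannot be used in the same list.
try:
    conn.execute("SELECT 1 AS fee, 2 AS subtotal, (fee + subtotal) AS total")
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

# 2) With the aggregate in a derived table, c.fee is an ordinary column.
query = """
SELECT p.name,
       COALESCE(c.fee, 0.0)     AS fee,
       ?                        AS subtotal,
       COALESCE(c.fee, 0.0) + ? AS total
FROM persons AS p
LEFT JOIN (
    SELECT person_id, service_id, TOTAL(amount) AS fee
    FROM fees
    WHERE active = 0
    GROUP BY person_id, service_id
) AS c ON c.person_id = p.id AND c.service_id = ?
ORDER BY p.id
"""
rows = conn.execute(query, (23333, 23333, 7)).fetchall()
print(alias_error)
print(rows)
```

The subtotal is bound twice because each '?' placeholder is a separate parameter.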

how Inner join work on two foreign key from single table

I am working on a bus route management system. I made two tables, Cities and route, with the following statements:
CREATE TABLE Cities
(
ID NUMBER GENERATED ALWAYS AS IDENTITY(START with 1 INCREMENT by 1) PRIMARY KEY,
Name Varchar(30) not null
)
CREATE TABLE route
(
ID NUMBER GENERATED ALWAYS AS IDENTITY(START with 1 INCREMENT by 1) PRIMARY KEY,
Name Varchar(30) not null,
from NUMBER not null,
to NUMBER NOT NULL,
CONSTRAINT FROM_id_FK FOREIGN KEY(from) REFERENCES Cities(ID),
CONSTRAINT TO_id_FK FOREIGN KEY(to) REFERENCES Cities(ID)
)
I am joining the tables with an inner join:
select CITIES.Name
from CITIES
inner join ROUTES on CITIES.ID=ROUTES.ID
but it shows a single column:
Name
-----------
whereas I want the result as:
from | to
------------------------
Is there a way to do this using an inner join?
I suspect you need something like the following:
select r.Name, cs.Name SourceCity, cd.Name DestinationCity
from routes r
join cities cs on cs.id = r.from
join cities cd on cd.id = r.to
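The idea of joining the same table twice under two aliases can be tried out with the small sketch below, which uses Python's sqlite3 module and an in-memory database instead of Oracle. The city and route rows are invented, and the from/to columns are double-quoted because both are reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE routes (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    "from" INTEGER NOT NULL REFERENCES cities (id),
    "to"   INTEGER NOT NULL REFERENCES cities (id)
);
INSERT INTO cities VALUES (1, 'Lahore'), (2, 'Karachi'), (3, 'Multan');
INSERT INTO routes VALUES (1, 'R1', 1, 2), (2, 'R2', 3, 1);
""")

# cities appears twice: cs resolves the source, cd resolves the destination.
rows = conn.execute("""
SELECT r.name, cs.name AS source_city, cd.name AS destination_city
FROM routes AS r
JOIN cities AS cs ON cs.id = r."from"
JOIN cities AS cd ON cd.id = r."to"
ORDER BY r.id
""").fetchall()
print(rows)
```

Each alias behaves like an independent copy of cities, so one route row picks up two different city names.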
I hope this works for you:
select CITIES.Name,ROUTES.from,ROUTES.to
from CITIES inner join ROUTES on CITIES.ID=ROUTES.ID

H2 seems to misinterpret a valid join clause

Here is a simple test database schema. There is really nothing special about it. I am using H2 version 1.4.200 in Oracle compatibility mode.
create table STUFF (
ID number(19) generated by default as identity (start with 1 increment by 1),
NAME varchar2(128) not null,
constraint PK_STUFF primary key (ID),
constraint BK_STUFF unique (NAME)
);
create table STUFF_DETAILS (
ID number(19) generated by default as identity (start with 1 increment by 1),
BLAH varchar2(128) not null,
constraint PK_STUFF_DETAILS primary key (ID)
);
create table STUFF_MORE_DETAILS (
ID number(19) generated by default as identity (start with 1 increment by 1),
BLAH_BLAH varchar2(128) not null,
constraint PK_STUFF_MORE_DETAILS primary key (ID)
);
Here's a view definition that works fine. No objection from H2.
create or replace view V_STUFF1
(
ID,
NAME,
BLAH,
BLAH_BLAH
)
as select
S.ID,
S.NAME,
SD.BLAH,
SMD.BLAH_BLAH
from
STUFF S
inner join STUFF_DETAILS SD
inner join STUFF_MORE_DETAILS SMD
on SD.ID = SMD.ID
on S.ID = SD.ID
;
Here's a view definition that H2 chokes on with the following error message:
Caused by: org.h2.jdbc.JdbcSQLSyntaxErrorException: Column "SD.ID" not found
create or replace view V_STUFF2
(
ID,
NAME,
BLAH,
BLAH_BLAH
)
as select
S.ID,
S.NAME,
SD.BLAH,
SMD.BLAH_BLAH
from
STUFF S
inner join STUFF_DETAILS SD
left outer join STUFF_MORE_DETAILS SMD
on SD.ID = SMD.ID
on S.ID = SD.ID
;
The only difference is the type of the join (left outer vs inner) but I fail to see a reason why this should make a difference with regards to SD.ID column visibility.
To me this looks like a defect in H2 but before I raise an issue with H2 project I want to make sure I am not missing something obvious or doing something stupid.
PS: I am aware I can rewrite the view definition and make H2 accept it but ideally I would like to keep SQL code as close to the original as possible. It is a migration project.
PPS: Oracle (and DB2) have no trouble with both view definitions, so the issue appears H2 specific
The more conventional form puts each join predicate directly after the name/alias of the table that is being joined.
By reordering the ON clauses that way, the query could take the form:
create or replace view V_STUFF2
(
ID,
NAME,
BLAH,
BLAH_BLAH
)
as select
S.ID,
S.NAME,
SD.BLAH,
SMD.BLAH_BLAH
from STUFF S
inner join STUFF_DETAILS SD on S.ID = SD.ID
left outer join STUFF_MORE_DETAILS SMD on SD.ID = SMD.ID
The issue has been acknowledged as a defect [1] by H2 developers and resolved with this PR [2]
[1] https://github.com/h2database/h2database/issues/3311
[2] https://github.com/h2database/h2database/pull/3312
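For anyone who wants to convince themselves that the nested form and the reordered form return the same rows for this schema, here is a small sketch using Python's sqlite3 module. It is not H2 and not Oracle compatibility mode; the column types are simplified, the sample rows are made up, and parentheses are added around the inner join to make the nesting explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUFF (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL UNIQUE);
CREATE TABLE STUFF_DETAILS (ID INTEGER PRIMARY KEY, BLAH TEXT NOT NULL);
CREATE TABLE STUFF_MORE_DETAILS (ID INTEGER PRIMARY KEY, BLAH_BLAH TEXT NOT NULL);
INSERT INTO STUFF VALUES (1, 'a'), (2, 'b');
INSERT INTO STUFF_DETAILS VALUES (1, 'd1'), (2, 'd2');
INSERT INTO STUFF_MORE_DETAILS VALUES (1, 'm1');  -- no row for ID 2
""")

# Nested form: the left outer join is evaluated first, then joined to STUFF.
nested = """
SELECT S.ID, S.NAME, SD.BLAH, SMD.BLAH_BLAH
FROM STUFF S
INNER JOIN (STUFF_DETAILS SD
            LEFT OUTER JOIN STUFF_MORE_DETAILS SMD ON SD.ID = SMD.ID)
        ON S.ID = SD.ID
ORDER BY S.ID
"""

# Reordered form: each ON clause directly follows the table it joins.
flat = """
SELECT S.ID, S.NAME, SD.BLAH, SMD.BLAH_BLAH
FROM STUFF S
INNER JOIN STUFF_DETAILS SD ON S.ID = SD.ID
LEFT OUTER JOIN STUFF_MORE_DETAILS SMD ON SD.ID = SMD.ID
ORDER BY S.ID
"""

nested_rows = conn.execute(nested).fetchall()
flat_rows = conn.execute(flat).fetchall()
print(nested_rows)
print(flat_rows)
```

In both forms the row for ID 2 is kept, with NULL in BLAH_BLAH, because only the join to STUFF_MORE_DETAILS is an outer join.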

How to select from multiple tables in a group by query?

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id),
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id),
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
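A minimal sketch of the correlated subquery, run through Python's sqlite3 module instead of PostgreSQL, with made-up rows and the timestamps stored as ISO strings. It also attempts the second part of the question with a NOT EXISTS filter; the signatures table layout (user_id, version_id) is purely my assumption, since the question does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agreement (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        org_id INTEGER NOT NULL);
CREATE TABLE version (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                      publish_date TEXT NOT NULL,
                      agreement_id INTEGER NOT NULL REFERENCES agreement (id));
-- Hypothetical layout for the signatures table mentioned in the question.
CREATE TABLE signatures (user_id INTEGER NOT NULL,
                         version_id INTEGER NOT NULL REFERENCES version (id));
INSERT INTO agreement VALUES (1, 'NDA', 10), (2, 'Terms', 10), (3, 'Other org', 20);
INSERT INTO version VALUES
    (1, 'nda v1',   '2020-01-01', 1),
    (2, 'nda v2',   '2021-01-01', 1),
    (3, 'terms v1', '2020-06-01', 2),
    (4, 'other v1', '2020-02-01', 3);
INSERT INTO signatures VALUES (5, 1), (5, 3);
""")

# Latest version per agreement of one org, minus those the user already signed.
query = """
SELECT a.id, a.name, v.content, v.publish_date
FROM agreement AS a
JOIN version AS v ON a.id = v.agreement_id
WHERE a.org_id = ?
  AND v.publish_date = (SELECT MAX(v2.publish_date)
                        FROM version AS v2
                        WHERE v2.agreement_id = v.agreement_id)
  AND NOT EXISTS (SELECT 1
                  FROM signatures AS s
                  WHERE s.version_id = v.id AND s.user_id = ?)
ORDER BY a.id
"""
latest_unsigned = conn.execute(query, (10, 5)).fetchall()
print(latest_unsigned)
```

User 5 signed the old NDA version and the current Terms version, so only the NDA, whose latest version is still unsigned, is returned.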
PostgreSQL has window functions.
Window functions let you rank rows within a partition, ordered by a specific column or set of columns. The rank function returns each row's position in that ordering. If you keep only the rows where the rank is 1, you get the highest-sorted row of each partition (exactly one row per partition here, because v.id is unique).
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
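The window-function approach can be tried with the sketch below, again using Python's sqlite3 module as a stand-in for PostgreSQL (it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added). The rows are invented. As a variation on the query above, it orders by publish_date rather than v.id, which matches the question's wording but can return more than one row per agreement if two versions share a publish date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agreement (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        org_id INTEGER NOT NULL);
CREATE TABLE version (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                      publish_date TEXT NOT NULL,
                      agreement_id INTEGER NOT NULL REFERENCES agreement (id));
INSERT INTO agreement VALUES (1, 'NDA', 10), (2, 'Terms', 10), (3, 'Other org', 20);
INSERT INTO version VALUES
    (1, 'nda v1',   '2020-01-01', 1),
    (2, 'nda v2',   '2021-01-01', 1),
    (3, 'terms v1', '2020-06-01', 2),
    (4, 'other v1', '2020-02-01', 3);
""")

# Rank the versions of each agreement, newest first, and keep rank 1.
query = """
SELECT u.id, u.name, u.content, u.publish_date
FROM (
    SELECT a.id, a.name, v.content, v.publish_date,
           rank() OVER (PARTITION BY a.id
                        ORDER BY v.publish_date DESC) AS pos
    FROM agreement AS a
    JOIN version AS v ON a.id = v.agreement_id
    WHERE a.org_id = ?
) AS u
WHERE u.pos = 1
ORDER BY u.id
"""
latest = conn.execute(query, (10,)).fetchall()
print(latest)
```

Swapping rank() for row_number() would force a single row per agreement even when publish dates tie, at the cost of picking one of the tied rows arbitrarily.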
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

Multi-table, multi-row SQL select

How would I list all of the info about a freelancer given the schema below? Including niche, language, market, etc. The issue I am having is that every freelancer can have multiple entries for each table. So, how would I do this? Is it even possible using SQL or would I need to use my primary language (golang) for this?
CREATE TABLE freelancer (
freelancer_id SERIAL PRIMARY KEY,
ip inet NOT NULL,
username VARCHAR(20) NOT NULL,
password VARCHAR(100) NOT NULL,
email citext NOT NULL UNIQUE,
email_verified int NOT NULL,
fname VARCHAR(20) NOT NULL,
lname VARCHAR(20) NOT NULL,
phone_number VARCHAR(30) NOT NULL,
address VARCHAR(50) NOT NULL,
city VARCHAR(30) NOT NULL,
state VARCHAR(30) NOT NULL,
zip int NOT NULL,
country VARCHAR(30) NOT NULL
);
CREATE TABLE market (
market_id SERIAL PRIMARY KEY,
market_name VARCHAR(30) NOT NULL
);
CREATE TABLE niche (
niche_id SERIAL PRIMARY KEY,
niche_name VARCHAR(30) NOT NULL
);
CREATE TABLE medium (
medium_id SERIAL PRIMARY KEY,
medium_name VARCHAR(30) NOT NULL
);
CREATE TABLE format (
format_id SERIAL PRIMARY KEY,
format_name VARCHAR(30) NOT NULL
);
CREATE TABLE lang (
lang_id SERIAL PRIMARY KEY,
lang_name VARCHAR(30) NOT NULL
);
CREATE TABLE freelancer_by_niche (
id SERIAL PRIMARY KEY,
niche_id int NOT NULL REFERENCES niche (niche_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id)
);
CREATE TABLE freelancer_by_medium (
id SERIAL PRIMARY KEY,
medium_id int NOT NULL REFERENCES medium (medium_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id)
);
CREATE TABLE freelancer_by_market (
id SERIAL PRIMARY KEY,
market_id int NOT NULL REFERENCES market (market_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id)
);
CREATE TABLE freelancer_by_format (
id SERIAL PRIMARY KEY,
format_id int NOT NULL REFERENCES format (format_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id)
);
CREATE TABLE freelancer_by_lang (
id SERIAL PRIMARY KEY,
lang_id int NOT NULL REFERENCES lang (lang_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id)
);
SELECT *
FROM freelancer
INNER JOIN freelancer_by_niche USING (freelancer_id)
INNER JOIN niche USING (niche_id)
INNER JOIN freelancer_by_medium USING (freelancer_id)
INNER JOIN medium USING (medium_id)
INNER JOIN freelancer_by_market USING (freelancer_id)
INNER JOIN market USING (market_id)
INNER JOIN freelancer_by_format USING (freelancer_id)
INNER JOIN format USING (format_id)
INNER JOIN freelancer_by_lang USING (freelancer_id)
INNER JOIN lang USING (lang_id);
And if you want to lose the unnecessary attributes from join tables like freelancer_by_format, then you can do this
SELECT a.ip, a.username, a.password, a.email, a.email_verified,
a.fname, a.lname, a.phone_number, a.address, a.city,
a.state, a.zip, a.country,
b.niche_name, c.medium_name, d.market_name, e.format_name, f.lang_name
FROM freelancer a
INNER JOIN freelancer_by_niche USING (freelancer_id)
INNER JOIN niche b USING (niche_id)
INNER JOIN freelancer_by_medium USING (freelancer_id)
INNER JOIN medium c USING (medium_id)
INNER JOIN freelancer_by_market USING (freelancer_id)
INNER JOIN market d USING (market_id)
INNER JOIN freelancer_by_format USING (freelancer_id)
INNER JOIN format e USING (format_id)
INNER JOIN freelancer_by_lang USING (freelancer_id)
INNER JOIN lang f USING (lang_id);
And if you want to change the column names, for example change "market_name" to just "market", then you go with
SELECT a.ip, ... ,
d.market_name "market", e.format_name AS "format", ...
FROM ...
Remarks
In your join tables (for example freelancer_by_niche) there is no UNIQUE constraint on freelancer_id, which means that you could have the same freelancer in multiple niches (that's OK and probably intended).
But then you also don't have a UNIQUE constraint on both attributes (freelancer_id, niche_id), which means that every freelancer could be in the SAME niche multiple times. ("Joe is in electronics. Three times").
You could prevent that by making (freelancer_id, niche_id) UNIQUE in freelancer_by_niche.
This way you would also not need the surrogate (artificial) PRIMARY KEY freelancer_by_niche (id).
So what could go wrong then?
For example imagine the same information about a freelancer in the same niche three times (the same data parts of the row three times):
freelancer_by_niche
id | freelancer_id | niche_id
1 | 1 | 1 -- <-- same data (1, 1), different serial id
2 | 1 | 1 -- <-- same data (1, 1), different serial id
3 | 1 | 1 -- <-- same data (1, 1), different serial id
Then the result of the above query would return each possible row three (!) times with the same (!) content, because freelancer_by_niche can be combined three times with all the other JOINs.
You can eliminate the duplicates by adding DISTINCT to the query above: SELECT DISTINCT a.ip, ... FROM ...
What if you get many duplicate rows, for example 10 data duplicates in each of the 5 JOIN tables (freelancer_by_niche, freelancer_by_medium etc)? You would get 10 * 10 * 10 * 10 * 10 = 10 ^ 5 = 100,000 duplicates, which all have the exact same information.
If you then ask your DBMS to eliminate duplicates with SELECT DISTINCT ..., it has to sort 100,000 duplicate rows for each distinct row, because duplicates can only be detected by sorting (or hashing, but never mind). If you have 1,000 distinct rows for freelancers on markets, niches, languages etc., then you are asking your DBMS to sort 1,000 * 100,000 = 100,000,000 rows to reduce them to the 1,000 unique rows.
That is 100 million unnecessary rows.
Please make UNIQUE (freelancer_id, niche_id) for freelancer_by_niche and the other JOIN tables.
(By data duplicates I mean that the data (niche_id, freelancer_id) is the same, and only the id is an auto-incremented serial.)
You can easily reproduce the problem by doing the following:
-- This duplicates all data of your JOIN tables once. Run it several times.
-- The column list is required so that the serial id is generated, not copied.
INSERT INTO freelancer_by_niche (niche_id, freelancer_id)
SELECT niche_id, freelancer_id FROM freelancer_by_niche;
INSERT INTO freelancer_by_medium (medium_id, freelancer_id)
SELECT medium_id, freelancer_id FROM freelancer_by_medium;
INSERT INTO freelancer_by_market (market_id, freelancer_id)
SELECT market_id, freelancer_id FROM freelancer_by_market;
INSERT INTO freelancer_by_format (format_id, freelancer_id)
SELECT format_id, freelancer_id FROM freelancer_by_format;
INSERT INTO freelancer_by_lang (lang_id, freelancer_id)
SELECT lang_id, freelancer_id FROM freelancer_by_lang;
Display the duplicates using
SELECT * FROM freelancer_by_lang;
Now try the SELECT * FROM freelancer INNER JOIN ... thing.
If it still runs fast, then do all the INSERT INTO freelancer_by_niche ... again and again, until it takes forever to calculate the results.
(or you get duplicates, which you can remove with DISTINCT).
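The multiplication effect can be reproduced on a toy scale with the sketch below. It uses Python's sqlite3 module rather than PostgreSQL, only two of the five join tables, one made-up freelancer, and explicit ON clauses, so it is an illustration of the arithmetic rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE freelancer (freelancer_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE niche (niche_id INTEGER PRIMARY KEY, niche_name TEXT NOT NULL);
CREATE TABLE lang (lang_id INTEGER PRIMARY KEY, lang_name TEXT NOT NULL);
CREATE TABLE freelancer_by_niche (id INTEGER PRIMARY KEY,
    niche_id INTEGER NOT NULL, freelancer_id INTEGER NOT NULL);
CREATE TABLE freelancer_by_lang (id INTEGER PRIMARY KEY,
    lang_id INTEGER NOT NULL, freelancer_id INTEGER NOT NULL);
INSERT INTO freelancer VALUES (1, 'joe');
INSERT INTO niche VALUES (1, 'electronics');
INSERT INTO lang VALUES (1, 'English');
INSERT INTO freelancer_by_niche (niche_id, freelancer_id) VALUES (1, 1);
INSERT INTO freelancer_by_lang (lang_id, freelancer_id) VALUES (1, 1);
""")

# Double each join table twice: 1 -> 2 -> 4 identical (data) rows per table.
for _ in range(2):
    conn.execute("""INSERT INTO freelancer_by_niche (niche_id, freelancer_id)
                    SELECT niche_id, freelancer_id FROM freelancer_by_niche""")
    conn.execute("""INSERT INTO freelancer_by_lang (lang_id, freelancer_id)
                    SELECT lang_id, freelancer_id FROM freelancer_by_lang""")

join_sql = """
FROM freelancer AS f
JOIN freelancer_by_niche AS fn ON fn.freelancer_id = f.freelancer_id
JOIN niche AS n ON n.niche_id = fn.niche_id
JOIN freelancer_by_lang AS fl ON fl.freelancer_id = f.freelancer_id
JOIN lang AS l ON l.lang_id = fl.lang_id
"""
all_rows = conn.execute(
    "SELECT f.username, n.niche_name, l.lang_name " + join_sql).fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT f.username, n.niche_name, l.lang_name " + join_sql).fetchall()
print(len(all_rows), len(distinct_rows))
```

Four duplicates in each of two join tables give 4 * 4 = 16 identical result rows, which DISTINCT then has to collapse back into one.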
Create UNIQUE data JOIN tables
You can prevent duplicates in your join tables.
Remove the id SERIAL PRIMARY KEY and replace it with a multi-attribute PRIMARY KEY (a, b):
CREATE TABLE freelancer_by_niche (
niche_id int NOT NULL REFERENCES niche (niche_id),
freelancer_id int NOT NULL REFERENCES freelancer (freelancer_id),
PRIMARY KEY (freelancer_id, niche_id)
);
(Apply this for all your join tables).
The PRIMARY KEY (freelancer_id, niche_id) will create a UNIQUE index.
This way you cannot insert duplicate data (try the INSERTs above: they will be rejected, because the information is already there. Adding it again would not add any information and would make your queries much slower).
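A quick way to watch the composite key reject a duplicate is the sketch below, using Python's sqlite3 module as a stand-in for PostgreSQL. The REFERENCES clauses are left out so that the snippet is self-contained; the error class shown is sqlite3's, whereas PostgreSQL would report a unique violation through its own driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE freelancer_by_niche (
    niche_id      INTEGER NOT NULL,
    freelancer_id INTEGER NOT NULL,
    PRIMARY KEY (freelancer_id, niche_id)
)
""")

insert = "INSERT INTO freelancer_by_niche (niche_id, freelancer_id) VALUES (?, ?)"
conn.execute(insert, (1, 1))       # first (niche 1, freelancer 1): accepted

try:
    conn.execute(insert, (1, 1))   # same pair again: violates the primary key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute(insert, (2, 1))       # same freelancer, different niche: accepted

row_count = conn.execute("SELECT COUNT(*) FROM freelancer_by_niche").fetchone()[0]
print(duplicate_rejected, row_count)
```

The same freelancer can still appear in several niches; only an exact repeat of the pair is refused.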
NON-unique index on the other part of the JOIN tables
With PRIMARY KEY (freelancer_id, niche_id), Postgres creates a unique index on these two attributes (columns).
Accessing or JOINing by freelancer_id is fast, because it's first in the index. Accessing or JOINing into freelancer_by_niche.niche_id will be slow (Full Table Scan on freelancer_by_niche).
Therefore you should create an INDEX on the second part niche_id in this table freelancer_by_niche, too.
CREATE INDEX ON freelancer_by_niche (niche_id) ;
Then joins into this table on niche_id will also be faster, because they are accelerated by an index. The index makes queries faster (usually).
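To get a feel for how the extra index changes a lookup by niche_id, the sketch below asks SQLite (through Python's sqlite3 module) for its query plan before and after creating the index. This is only an analogy: PostgreSQL has its own planner and its own EXPLAIN output, the index name fbn_niche_idx is made up, and the exact wording of SQLite's plan text varies between versions, so the sketch prints the plans without relying on their exact form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE freelancer_by_niche (
    niche_id      INTEGER NOT NULL,
    freelancer_id INTEGER NOT NULL,
    PRIMARY KEY (freelancer_id, niche_id)
)
""")

def plan(sql):
    """Return SQLite's query plan for sql as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text

lookup = "SELECT freelancer_id FROM freelancer_by_niche WHERE niche_id = ?"

plan_before = plan(lookup)  # niche_id is only the second key column here
conn.execute("CREATE INDEX fbn_niche_idx ON freelancer_by_niche (niche_id)")
plan_after = plan(lookup)   # the planner can now search by niche_id directly

print("before:", plan_before)
print("after: ", plan_after)
```

On PostgreSQL the equivalent check would be EXPLAIN on the real join query, ideally with realistic data volumes, since a planner may ignore an index on a tiny table.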
Summary
You have a very well normalized database schema. Only small improvements can be made (see above).