For my indexing mechanism, I'm trying to write a WHERE clause based on a subselect, using PostgreSQL.
I have a ManyToOne relation (one Entity A is referenced by many Entity B rows). When selecting Entity A, I want to add a WHERE clause on the related Entity B rows, through the foreign key.
I tried a LEFT JOIN, but I don't want to get the same row multiple times. Adding DISTINCT works, but it performs poorly.
I tried something like this, but it is not valid syntax:
select ...
from entity_a entA
where (select array_agg(entB.id) from entity_b entB where entB.fkEntA= entA.id) = any(?)
(The ? is replaced in my JDBC query by one or more Entity B ids.)
What I need help with:
What changes should I make to the following WHERE clause to get it working?
where (select array_agg(entB.id) from entity_b entB where entB.fkEntA= entA.id) = any(?)
EDIT:
The expected result doesn't really matter: my SQL query contains a lot of complex select statements, and I push its result to Elasticsearch. I just want to know how to write the WHERE clause without using LEFT JOIN and DISTINCT.
For the statement:
select ...
from entity_a entA
where (select array_agg(entB.id) from entity_b entB where entB.fkEntA= entA.id) = any(?)
I get:
sql error 42809 : Op ANY/ALL (array) requires array
but I can't find how to turn the any(?) parameter into an array.
I found some working solutions, but they are not efficient:
Solution 1: DISTINCT + LEFT JOIN
select distinct ...
from entity_a entA
left join entity_b entB on entB.fkEntA = entA.id
where entB.id = any(?)
Solution 2: multiple WHERE clauses
select ...
from entity_a entA
where (? in (select array_agg(entB.id) from entity_b entB where entB.fkEntA= entA.id)
or ? in (select array_agg(entB.id) from entity_b entB where entB.fkEntA= entA.id))
I finally found an "ugly" way to achieve this:
(select array_agg(entB.id) from entity_b entB
where entB.fkEntA = entA.id) &&
(array_remove(string_to_array(translate((?)::text, '() ', ''), ','), '')::int[])
In the end, I won't use it; I'll review the database structure instead of writing this kind of query.
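For comparison, here is a minimal sketch of the usual way to filter parents by child ids without LEFT JOIN + DISTINCT: an EXISTS semi-join, which returns each entity_a row at most once no matter how many children match. It uses Python's built-in sqlite3 module only so that it runs anywhere; the tables and rows are made up to mirror the question. SQLite has no array parameters, so the ids are bound as an IN list of placeholders; in PostgreSQL the inner predicate would typically be `entB.id = ANY(?)`, with the ids bound as one array parameter (for example through JDBC's `Connection.createArrayOf`).

```python
import sqlite3

def find_a_by_b_ids(conn, b_ids):
    """Return ids of entity_a rows that have at least one entity_b among b_ids.

    EXISTS only asks whether a matching child exists, so a parent with
    several matching children is still returned once (no DISTINCT needed).
    """
    placeholders = ", ".join("?" for _ in b_ids)  # one ? per id; values are bound, not interpolated
    sql = f"""
        SELECT entA.id
        FROM entity_a entA
        WHERE EXISTS (SELECT 1
                      FROM entity_b entB
                      WHERE entB.fkEntA = entA.id
                        AND entB.id IN ({placeholders}))
        ORDER BY entA.id
    """
    return [row[0] for row in conn.execute(sql, list(b_ids))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_a (id INTEGER PRIMARY KEY);
    CREATE TABLE entity_b (id INTEGER PRIMARY KEY,
                           fkEntA INTEGER REFERENCES entity_a(id));
    INSERT INTO entity_a (id) VALUES (1), (2), (3);
    -- entity_a 1 has children 10, 11, 12; entity_a 2 has child 20; entity_a 3 has none
    INSERT INTO entity_b (id, fkEntA) VALUES (10, 1), (11, 1), (12, 1), (20, 2);
""")

print(find_a_by_b_ids(conn, [10, 11, 12]))  # [1]  (once, although three children match)
print(find_a_by_b_ids(conn, [11, 20]))      # [1, 2]
```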
Trying to understand NOT EXISTS better: can we always replace NOT IN with NOT EXISTS, even in a nested situation?
I found a similar question, but it has only one NOT IN, while I'm trying to handle the nested case.
We have two tables, registered and preActivity.
Registered has mId (string), aId (string), quarter (string), year (integer), and preActivity has aId (string), preAId (string), where
> mId is member id,
> aId is the activity Id,
> preAId is the prerequisite activity Id.
Suppose we have this query with a nested NOT IN, to find all the members who have registered for all the required (prerequisite) activities (classes) of the activity swimming at the YMCA.
Can we convert it to two nested NOT EXISTS?
SELECT DISTINCT r.mid
FROM registered r
WHERE r.mid NOT IN (SELECT r.mid
FROM preActivity p
WHERE p.aid = "swimming" AND
p.preAId NOT IN (SELECT r2.mid
FROM registered r2
WHERE r2.mid = r.mid));
Using the hint from that post, I can convert one of the NOT INs, but the second one has been taking me hours. Can someone please help with an explanation?
Here is what I have so far:
SELECT DISTINCT r.mid
FROM registered r
WHERE NOT EXISTS (SELECT r.mid
FROM preActivity p
WHERE p.aid = "swimming" AND
p.preAId NOT IN (SELECT r2.mid # how can we compare p.preAId with rows selected from r2, following the idea from the post? Note that the registered table has no preAId field
FROM registered r2
WHERE r2.mid = r.mid));
Or can't we apply the same idea here, since it is a doubly nested case?
First thing to remember: the SELECT list in an [NOT] EXISTS subquery doesn't matter, as we're only checking for the existence of rows. You could even write SELECT 1/0 and not get an error. So most people write [NOT] EXISTS (SELECT 1 ... (I like to put that part on one line and the rest of the subquery on new lines.)
Secondly, NOT IN has a problem with NULLs: if the subquery returns even one NULL, x NOT IN (...) is never true, so the query returns no rows. It's best to always write NOT EXISTS instead.
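A minimal sketch of the NULL problem, using Python's sqlite3 module with two made-up tables: once the subquery returns a NULL, `2 NOT IN (1, NULL)` evaluates to unknown rather than true, so the NOT IN version returns nothing, while NOT EXISTS still returns the unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (colA INTEGER);
    CREATE TABLE y (colA INTEGER);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1), (NULL);   -- this NULL is what breaks NOT IN
""")

not_in = conn.execute("""
    SELECT colA FROM x
    WHERE colA NOT IN (SELECT colA FROM y)
    ORDER BY colA
""").fetchall()

not_exists = conn.execute("""
    SELECT x.colA FROM x
    WHERE NOT EXISTS (SELECT 1 FROM y WHERE y.colA = x.colA)
    ORDER BY x.colA
""").fetchall()

print(not_in)      # []  : the NULL makes every NOT IN test unknown
print(not_exists)  # [(2,), (3,)]
```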
Now, if you analyze an [NOT] IN query, you will see that the semi-join compares the column just before IN with the column in the subquery's SELECT. So a query like:
X.colA [NOT] IN
(SELECT Y.colA FROM Y)
can always be converted to
[NOT] EXISTS (SELECT 1
FROM Y
WHERE Y.colA = X.colA)
Another interesting syntax, most useful with multi-column joins or nullable columns, is:
[NOT] EXISTS (
SELECT X.colA
INTERSECT
SELECT Y.colA
FROM Y)
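The INTERSECT form is useful when NULLs should match each other: INTERSECT compares rows as "not distinct", so NULL matches NULL, whereas `=` never does. A small sketch with Python's sqlite3 and made-up data (SQLite, like the SQL standard, treats NULLs as equal for compound operators such as INTERSECT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (colA INTEGER);
    CREATE TABLE y (colA INTEGER);
    INSERT INTO x VALUES (1), (2), (NULL);
    INSERT INTO y VALUES (1), (NULL);
""")

# Equality-based EXISTS: the NULL row of x never matches anything.
with_equals = conn.execute("""
    SELECT x.colA FROM x
    WHERE EXISTS (SELECT 1 FROM y WHERE y.colA = x.colA)
""").fetchall()

# INTERSECT-based EXISTS: the NULL row of x matches the NULL row of y.
with_intersect = conn.execute("""
    SELECT x.colA FROM x
    WHERE EXISTS (SELECT x.colA
                  INTERSECT
                  SELECT y.colA FROM y)
""").fetchall()

print(set(with_equals))     # {(1,)}
print(set(with_intersect))  # contains (1,) and (None,)
```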
Don't forget to always use the correct table alias on the subquery columns. If you get this wrong, your query can return incorrect results without you noticing.
For example, what happens here?
[NOT] EXISTS (SELECT 1
FROM Y
WHERE X.colA = colA)
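One possible answer, sketched with Python's sqlite3 and made-up tables: if Y turns out not to have a colA column (here it is called colB, an assumption for the demo), the unqualified colA silently resolves to the outer X.colA, the predicate becomes X.colA = X.colA, and EXISTS is true for every non-NULL X row as long as Y has any rows. Qualifying the column as Y.colA turns the mistake into an error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (colA INTEGER);
    CREATE TABLE y (colB INTEGER);      -- note: y has no colA
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1);
""")

# Unqualified colA resolves to x.colA, so x.colA is compared with itself.
unqualified = conn.execute("""
    SELECT x.colA FROM x
    WHERE EXISTS (SELECT 1 FROM y WHERE x.colA = colA)
    ORDER BY x.colA
""").fetchall()
print(unqualified)  # [(1,), (2,), (3,)]  every row, and no error

# Qualified reference: the missing column is reported instead of silently matching.
qualified_error = None
try:
    conn.execute("""
        SELECT x.colA FROM x
        WHERE EXISTS (SELECT 1 FROM y WHERE x.colA = y.colA)
    """).fetchall()
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)
print(qualified_error)
```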
In your case, your first NOT IN query is slightly weird.
You are putting r.mid on both sides of the comparison, so it effectively already behaves like a NOT EXISTS: it is true exactly when the subquery returns no rows.
So your query can be rewritten as:
select distinct r.mid
from registered r
where not exists (select 1
                  from preActivity p
                  where p.aid = 'swimming' and
                        not exists (select 1
                                    from registered r2
                                    -- match the prerequisite against the activity the member registered for
                                    where r2.mid = r.mid and r2.aid = p.preAId
                                   )
                 );
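To check the double NOT EXISTS (a relational division) on concrete data, here is a small sketch with Python's sqlite3. The members, activities and prerequisites are made up, and it assumes that "registered for a prerequisite" means a registered row whose aId equals that preAId:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registered (mId TEXT, aId TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE preActivity (aId TEXT, preAId TEXT);
    -- swimming requires two prerequisite classes
    INSERT INTO preActivity VALUES ('swimming', 'water_safety'),
                                   ('swimming', 'floating');
    INSERT INTO registered VALUES
        ('m1', 'water_safety', 'Q1', 2023),
        ('m1', 'floating',     'Q2', 2023),   -- m1 has both prerequisites
        ('m2', 'water_safety', 'Q1', 2023),   -- m2 is missing 'floating'
        ('m3', 'yoga',         'Q3', 2023);   -- m3 has none
""")

rows = conn.execute("""
    SELECT DISTINCT r.mId
    FROM registered r
    WHERE NOT EXISTS (SELECT 1                    -- there is no prerequisite of swimming ...
                      FROM preActivity p
                      WHERE p.aId = 'swimming'
                        AND NOT EXISTS (SELECT 1  -- ... that this member has not registered for
                                        FROM registered r2
                                        WHERE r2.mId = r.mId
                                          AND r2.aId = p.preAId))
    ORDER BY r.mId
""").fetchall()
print(rows)  # [('m1',)]
```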
Query:
SELECT *
FROM dbo.employer_job
LEFT JOIN dbo.employer_user
ON dbo.employer_job.employer_id = dbo.employer_user.employer_user_id
LEFT JOIN dbo.company_profile
ON dbo.company_profile.company_id = dbo.employer_user.company_id
Duplicate column results:
dbo.employer_job schema:
dbo.employer_user schema:
dbo.company_profile schema:
How do I remove the duplicate company_id column? My Python app won't accept duplicated columns from the database. Most answers suggest using a LEFT JOIN, but that doesn't solve the issue.
Don't use *; list the columns you want from each table instead, preferably qualified with a table alias.
SELECT EJ.job_id, EJ.employer_id, ....
FROM dbo.employer_job EJ
...
It's verbose, but how else would the database engine know what you'd like to see?
You need to list the columns explicitly, and qualify them:
SELECT ej.*, eu.employee_user_email,
cp.company_name, . . .
FROM dbo.employer_job ej LEFT JOIN
dbo.employer_user eu
ON ej.employer_id = eu.employer_user_id LEFT JOIN
dbo.company_profile cp
ON cp.company_id = eu.company_id;
I'm not sure if the LEFT JOIN is really needed. Note that this introduces table aliases, so the query is easier to write and to read.
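Since the app is in Python, one way to see both the problem and the fix is to inspect `cursor.description`, whose entries start with the result column names. A sketch with sqlite3 and simplified, made-up versions of the three tables (the real schemas were only shown as screenshots, and SQLite has no dbo schema, so the prefix is dropped): `SELECT *` yields company_id twice, while an explicit, aliased column list yields unique names.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employer_job    (job_id INTEGER, employer_id INTEGER, title TEXT);
    CREATE TABLE employer_user   (employer_user_id INTEGER, company_id INTEGER, email TEXT);
    CREATE TABLE company_profile (company_id INTEGER, company_name TEXT);
    INSERT INTO employer_job    VALUES (1, 10, 'Developer');
    INSERT INTO employer_user   VALUES (10, 100, 'boss@example.com');
    INSERT INTO company_profile VALUES (100, 'Acme');
""")

def column_names(sql):
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]  # the first item of each entry is the column name

star = column_names("""
    SELECT *
    FROM employer_job ej
    LEFT JOIN employer_user eu ON ej.employer_id = eu.employer_user_id
    LEFT JOIN company_profile cp ON cp.company_id = eu.company_id
""")
print([name for name, n in Counter(star).items() if n > 1])  # ['company_id']

explicit = column_names("""
    SELECT ej.job_id, ej.title, eu.email AS employer_email,
           cp.company_id, cp.company_name
    FROM employer_job ej
    LEFT JOIN employer_user eu ON ej.employer_id = eu.employer_user_id
    LEFT JOIN company_profile cp ON cp.company_id = eu.company_id
""")
print(explicit)  # ['job_id', 'title', 'employer_email', 'company_id', 'company_name']
```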
I have a join query that works:
The goal is to retrieve the candidates that are not associated, through campaign_addings, with any campaign whose id is in id (an Array).
Candidate.joins("LEFT OUTER JOIN campaign_addings ON campaign_addings.candidate_id = candidates.id AND campaign_addings.campaign_id IN (#{id.join(',')})").where('campaign_addings.candidate_id IS NULL')
But this query is really ugly. I would like to recreate this behavior with the pure ActiveRecord DSL and no SQL. I have this for now:
Candidate.left_outer_joins(:campaign_addings).where(campaign_addings: { campaign_id: id, candidate_id: nil })
The good query generates:
SELECT COUNT(*) FROM "candidates" LEFT OUTER JOIN campaign_addings ON campaign_addings.candidate_id = candidates.id AND campaign_addings.campaign_id IN (4) WHERE (campaign_addings.candidate_id IS NULL)
And the bad one:
SELECT COUNT(DISTINCT "candidates"."id") FROM "candidates" LEFT OUTER JOIN "campaign_addings" ON "campaign_addings"."candidate_id" = "candidates"."id" WHERE "campaign_addings"."campaign_id" = 4 AND "campaign_addings"."candidate_id" IS NULL
You can see this is really close, but I think I am missing something: I can't find a way to put the condition into the join, as the first query does.
Edit:
Also, the first query has an SQL injection vulnerability, since the ids are interpolated directly into the SQL string.
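The difference between the two generated queries is where the campaign_id filter sits. In the ON clause it only restricts which campaign_addings rows can match, so candidates without a matching row survive with NULLs; in WHERE it runs after the join and rejects exactly those NULL rows, so combined with `candidate_id IS NULL` nothing is left. A sketch with Python's sqlite3 and made-up rows, binding the ids as parameters instead of interpolating them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE campaign_addings (candidate_id INTEGER, campaign_id INTEGER);
    INSERT INTO candidates VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann is in campaign 4, Bob only in campaign 5, Cid in none
    INSERT INTO campaign_addings VALUES (1, 4), (2, 5);
""")

campaign_ids = [4]
marks = ", ".join("?" for _ in campaign_ids)  # placeholders instead of id.join(',')

# Filter inside ON: an anti-join against the listed campaigns only.
in_on = conn.execute(f"""
    SELECT candidates.id FROM candidates
    LEFT OUTER JOIN campaign_addings
      ON campaign_addings.candidate_id = candidates.id
     AND campaign_addings.campaign_id IN ({marks})
    WHERE campaign_addings.candidate_id IS NULL
    ORDER BY candidates.id
""", campaign_ids).fetchall()

# Filter in WHERE: NULL-extended rows fail "campaign_id IN (...)", so nothing survives.
in_where = conn.execute(f"""
    SELECT candidates.id FROM candidates
    LEFT OUTER JOIN campaign_addings
      ON campaign_addings.candidate_id = candidates.id
    WHERE campaign_addings.campaign_id IN ({marks})
      AND campaign_addings.candidate_id IS NULL
""", campaign_ids).fetchall()

print(in_on)     # [(2,), (3,)]
print(in_where)  # []
```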
I'm trying to iterate over an array field in order to use every element as a parameter to a query, and finally combine all the results, but I need help to get there.
I have a table with an array field called fleets, which can hold one or more values, e.g. {1,2,3}. I need to iterate over every value to get all vehicles belonging to these fleets.
With a subquery I'm getting 3 rows, with the values 1, 2 and 3:
SELECT * FROM vehicles WHERE fleet_fk=(
SELECT unnest(fleets) FROM auth_user WHERE id=4)
I'm using PostgreSQL 9.4.
If your query raises the ERROR: more than one row returned by a subquery used as an expression that means that you should use ANY:
SELECT * FROM vehicles
WHERE fleet_fk = ANY(
SELECT unnest(fleets) FROM auth_user WHERE id=4)
Since fleets is an array column you have a couple of options.
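Either use the ANY construct directly (no need to unnest()). The extra parentheses and the cast turn the subquery into a single array value, so the array form of ANY applies:
SELECT * FROM vehicles
WHERE fleet_fk = ANY((SELECT fleets FROM auth_user WHERE id = 4)::int[]);  -- assumes fleets is int[]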
Or rewrite as join:
SELECT v.*
FROM auth_user a
JOIN vehicles v ON v.fleet_fk = ANY(a.fleets)
WHERE a.id = 4;
Or you can unnest(), then you don't need ANY any more:
SELECT v.*
FROM auth_user a
, unnest(a.fleets) fleet_fk -- implicit LATERAL join
JOIN vehicles v USING (fleet_fk)
WHERE a.id = 4;
This assumes you don't have another column named fleet_fk in auth_user. If you do, use an explicit ON clause for the join to avoid the ambiguity.
Be aware that there are two implementations of ANY:
ANY for sets
ANY for arrays
Their behavior is basically the same; you just feed them differently (a set of rows from a subquery vs. an array value).
DB design
Consider normalizing the hidden many-to-many (or one-to-many?) relationship in your DB schema:
How to implement a many-to-many relationship in PostgreSQL?
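As a sketch of what the normalized design could look like: a hypothetical junction table user_fleet (one row per user/fleet pair) replaces the fleets array, and the vehicles of user 4 come from a plain join. It uses Python's sqlite3 so it runs anywhere; the fleet and user_fleet tables and their columns are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE fleet     (id INTEGER PRIMARY KEY);
    -- hypothetical junction table replacing the fleets array column
    CREATE TABLE user_fleet (
        user_id  INTEGER REFERENCES auth_user(id),
        fleet_id INTEGER REFERENCES fleet(id),
        PRIMARY KEY (user_id, fleet_id)
    );
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, fleet_fk INTEGER REFERENCES fleet(id));

    INSERT INTO auth_user VALUES (4), (5);
    INSERT INTO fleet VALUES (1), (2), (3), (9);
    INSERT INTO user_fleet VALUES (4, 1), (4, 2), (4, 3), (5, 9);  -- user 4 had {1,2,3}
    INSERT INTO vehicles VALUES (100, 1), (101, 2), (102, 3), (103, 9);
""")

rows = conn.execute("""
    SELECT v.*
    FROM user_fleet uf
    JOIN vehicles v ON v.fleet_fk = uf.fleet_id
    WHERE uf.user_id = ?
    ORDER BY v.id
""", (4,)).fetchall()
print(rows)  # [(100, 1), (101, 2), (102, 3)]
```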
Is it possible to accomplish the equivalent of a LEFT JOIN with a subselect when multiple columns are required?
Here's what I mean.
SELECT m.*, (SELECT * FROM model WHERE id = m.id LIMIT 1) AS models FROM make m
As it stands, this gives me an 'Operand should contain 1 column(s)' error.
Yes, I know this is possible with a LEFT JOIN, but I was told it is possible with a subselect too, so I'm curious how it's done.
There are many practical uses for what you suggest.
This hypothetical query would return the most recent release_date (contrived example) for any make with at least one release_date, and null for any make with no release_date:
SELECT m.make_name,
sub.max_release_date
FROM make m
LEFT JOIN
(SELECT id,
max(release_date) as max_release_date
FROM make
GROUP BY 1) sub
ON sub.id = m.id
A subselect used as a column expression (a scalar subquery) can return only one column, so you would need one subselect for each column you want returned from the model table.
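A sketch of both points with Python's sqlite3 and a made-up make/model schema (model.make_id is an assumption): a scalar subselect returning two columns is rejected, one scalar subselect per column works, and a LEFT JOIN to a grouped derived table returns the same values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE make  (id INTEGER PRIMARY KEY, make_name TEXT);
    CREATE TABLE model (id INTEGER PRIMARY KEY, make_id INTEGER, release_date TEXT);
    INSERT INTO make  VALUES (1, 'Ford'), (2, 'Saab');
    INSERT INTO model VALUES (10, 1, '2019-05-01'), (11, 1, '2021-03-15');  -- Saab has no models
""")

# A scalar subselect must return exactly one column.
multi_column_error = None
try:
    conn.execute("""
        SELECT m.make_name,
               (SELECT mo.id, mo.release_date FROM model mo WHERE mo.make_id = m.id)
        FROM make m
    """).fetchall()
except sqlite3.Error as exc:
    multi_column_error = str(exc)
print(multi_column_error)

# One scalar subselect per wanted column.
per_column = conn.execute("""
    SELECT m.make_name,
           (SELECT max(mo.release_date) FROM model mo WHERE mo.make_id = m.id) AS max_release_date,
           (SELECT count(*)             FROM model mo WHERE mo.make_id = m.id) AS model_count
    FROM make m
    ORDER BY m.id
""").fetchall()

# The same result via a LEFT JOIN to a grouped derived table.
joined = conn.execute("""
    SELECT m.make_name, sub.max_release_date, coalesce(sub.model_count, 0)
    FROM make m
    LEFT JOIN (SELECT make_id,
                      max(release_date) AS max_release_date,
                      count(*)          AS model_count
               FROM model
               GROUP BY make_id) sub ON sub.make_id = m.id
    ORDER BY m.id
""").fetchall()

print(per_column)            # [('Ford', '2021-03-15', 2), ('Saab', None, 0)]
print(joined == per_column)  # True
```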