Left join returns duplicate rows


I am just learning SQL and I'm really struggling to understand why my left join is returning duplicate rows. This is the query I'm using:
SELECT "id", "title"
FROM "posts"
LEFT JOIN "comments" "comment"
ON "comment"."post_id"="id" AND ("comment"."status" = 'hidden')
It returns 4 rows, but should only return 3. Two of the returned rows are duplicates (same values). I can fix this by adding DISTINCT to the SELECT.
SELECT DISTINCT "id", "title"
FROM "posts"
LEFT JOIN "comments" "comment"
ON "comment"."post_id"="id" AND ("comment"."status" = 'hidden')
The query returns 3 rows and I get the desired result. But I'm still wondering why in the world I would get a duplicate row from the first query in the first place. I'm trying to write an aggregation query and this seems to be the issue I'm having.
I'm using PostgreSQL.
More specific: (as created by my ORM)
Shift DDL
CREATE TABLE shift (
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
"gigId" uuid REFERENCES gig(id) ON DELETE CASCADE,
"categoryId" uuid REFERENCES category(id),
notes text,
"createdAt" timestamp without time zone NOT NULL DEFAULT now(),
"updatedAt" timestamp without time zone NOT NULL DEFAULT now(),
"salaryFixed" numeric,
"salaryHourly" numeric,
"salaryCurrency" character varying(3) DEFAULT 'SEK'::character varying,
"staffingMethod" character varying(255) NOT NULL DEFAULT 'auto'::character varying,
"staffingIspublished" boolean NOT NULL DEFAULT false,
"staffingActivateon" timestamp with time zone,
"staffingTarget" integer NOT NULL DEFAULT 0
);
ShiftEmployee DDL
CREATE TABLE "shiftEmployee" (
"employeeId" uuid REFERENCES employee(id) ON DELETE CASCADE,
"shiftId" uuid REFERENCES shift(id) ON DELETE CASCADE,
status character varying(255) NOT NULL,
"updatedAt" timestamp without time zone NOT NULL DEFAULT now(),
"salaryFixed" numeric,
"salaryHourly" numeric,
"salaryCurrency" character varying(3) DEFAULT 'SEK'::character varying,
CONSTRAINT "PK_6acfd2e8f947cee5a62ebff08a5" PRIMARY KEY ("employeeId", "shiftId")
);
Query
SELECT "id", "staffingTarget" FROM "shift" LEFT JOIN "shiftEmployee" "se" ON "se"."shiftId"="id" AND ("se"."status" = 'confirmed');
Result
id staffingTarget
68bb0892-9bce-4d08-b40e-757cb0889e87 3
12d88ff7-9144-469f-8de5-3e316c4b3bbd 6
73c65656-e028-4f97-b855-43b00f953c7b 5
68bb0892-9bce-4d08-b40e-757cb0889e88 3
e3279b37-2ba5-4f1d-b896-70085f2ba345 4
e3279b37-2ba5-4f1d-b896-70085f2ba346 5
e3279b37-2ba5-4f1d-b896-70085f2ba346 5
789bd2fb-3915-4cda-a3d7-2186cf5bb01a 3

If a post has more than one hidden comment, you will see that post multiple times because a join returns one row for each match - that's the nature of a join. And an outer join doesn't behave differently.
If your intention is to list only posts with hidden comments, it's better to use an EXISTS query instead:
SELECT p.id, p.title
FROM posts p
where exists (select *
from comments c
where c.post_id = p.id
and c.status = 'hidden');
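A minimal sketch of the effect, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (the join behaviour shown is the same in both; the sample rows are made up for illustration). Post 2 has two hidden comments, so the LEFT JOIN lists it twice, while EXISTS lists each qualifying post once. The last query assumes the aggregation you are after is a per-post count of hidden comments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, status TEXT);
    -- Hypothetical sample data: post 2 has two hidden comments.
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comments VALUES
        (10, 1, 'visible'),
        (11, 2, 'hidden'),
        (12, 2, 'hidden'),
        (13, 3, 'hidden');
""")

# One output row per (post, matching comment) pair, plus one row for each
# post that has no matching comment at all.
left_join_rows = conn.execute("""
    SELECT p.id, p.title
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id AND c.status = 'hidden'
    ORDER BY p.id
""").fetchall()

# EXISTS only tests whether a match is present, so it cannot multiply rows.
exists_rows = conn.execute("""
    SELECT p.id, p.title
    FROM posts p
    WHERE EXISTS (SELECT 1 FROM comments c
                  WHERE c.post_id = p.id AND c.status = 'hidden')
    ORDER BY p.id
""").fetchall()

# For an aggregate, group by the post and count a column of the joined table:
# COUNT(c.id) skips the NULLs produced for posts without a match.
counts = conn.execute("""
    SELECT p.id, COUNT(c.id) AS hidden_comments
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id AND c.status = 'hidden'
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(left_join_rows)
print(exists_rows)
print(counts)
```

Note that the two approaches answer different questions: the LEFT JOIN keeps posts without hidden comments (post 1 here), whereas EXISTS drops them.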

Related

How to update a column in a table A using the value from another table B wherein the relationship between tables A & B is 1:N by using max() function

I have two tables namely loan_details and loan_his_mapping with 1:N relationship. I need to set the hhf_request_id of loan_details table by the value which is present in the loan_his_mapping table for each loan.
Since the relationship is 1:N , I want to consider the record for each loan from loan_his_mapping table with two conditions mentioned below. The table definitions are as follows:
CREATE TABLE public.loan_details
(
loan_number bigint NOT NULL,
hhf_lob integer,
hhf_request_id integer,
status character varying(100),
CONSTRAINT loan_details_pkey PRIMARY KEY (loan_number)
);
CREATE TABLE public.loan_his_mapping
(
loan_number bigint NOT NULL,
spoc_id integer NOT NULL,
assigned_datetime timestamp without time zone,
loan_spoc_map_id bigint NOT NULL,
line_of_business_id integer,
request_id bigint,
CONSTRAINT loan_spoc_his_map_id PRIMARY KEY (loan_spoc_map_id),
CONSTRAINT fk_loan_spoc_loan_number_his FOREIGN KEY (loan_number)
REFERENCES public.loan_details (loan_number) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION );
The joining conditions while updating are:
The Records of loan_details with hhf_lob = 4 and status='Release'
I should consider that record for updating value among 'N' number of records from loan_his_mapping table with value max(loan_spoc_map_id) for each loan.
The query I have right now
update lsa_loan_details ldet
set hhf_request_id = history.request_id
from loan_his_mapping history
where ldet.loan_number = history.loan_number and ldet.status='Release' and ldet.hhf_lob=4 and
history.line_of_business_id=4 ;
I want to know how to use that record for each loan from loan_his_mapping with max(loan_spoc_map_id) to update column of loan_details table. Please Assist!

You need a subquery that fetches, for each loan, the row with the highest loan_spoc_map_id.
Something along these lines:
update loan_details ldet
set hhf_request_id = history.request_id
from (
-- DISTINCT ON keeps the first row per loan_number; the ORDER BY makes that the row with the highest loan_spoc_map_id
select distinct on (loan_number) loan_number, request_id
from loan_his_mapping lhm
where lhm.line_of_business_id = 4
order by loan_number, loan_spoc_map_id desc
) as history
where ldet.loan_number = history.loan_number
and ldet.status = 'Release'
and ldet.hhf_lob = 4;
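DISTINCT ON is PostgreSQL-specific. As a hedged, more portable sketch of the same "latest row per loan" idea, the example below uses a correlated subquery with ORDER BY ... DESC LIMIT 1 and runs it on Python's built-in sqlite3 module. The tables are trimmed to the columns that matter and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan_details (
        loan_number INTEGER PRIMARY KEY,
        hhf_lob INTEGER,
        hhf_request_id INTEGER,
        status TEXT
    );
    CREATE TABLE loan_his_mapping (
        loan_spoc_map_id INTEGER PRIMARY KEY,
        loan_number INTEGER NOT NULL REFERENCES loan_details (loan_number),
        line_of_business_id INTEGER,
        request_id INTEGER
    );
    -- Hypothetical sample data.
    INSERT INTO loan_details VALUES
        (100, 4, NULL, 'Release'),
        (200, 4, NULL, 'Release'),
        (300, 4, NULL, 'Pending');
    INSERT INTO loan_his_mapping VALUES
        (1, 100, 4, 11),
        (2, 100, 4, 12),
        (3, 200, 4, 21),
        (4, 100, 3, 13),
        (5, 300, 4, 31);
""")

# For every qualifying loan, take request_id from the history row with the
# highest loan_spoc_map_id among that loan's line_of_business_id = 4 rows.
# The EXISTS guard leaves loans without any such history row untouched.
conn.execute("""
    UPDATE loan_details
    SET hhf_request_id = (
        SELECT h.request_id
        FROM loan_his_mapping h
        WHERE h.loan_number = loan_details.loan_number
          AND h.line_of_business_id = 4
        ORDER BY h.loan_spoc_map_id DESC
        LIMIT 1
    )
    WHERE status = 'Release'
      AND hhf_lob = 4
      AND EXISTS (
          SELECT 1
          FROM loan_his_mapping h
          WHERE h.loan_number = loan_details.loan_number
            AND h.line_of_business_id = 4
      )
""")

result = conn.execute(
    "SELECT loan_number, hhf_request_id FROM loan_details ORDER BY loan_number"
).fetchall()
print(result)
```

Loan 100 ends up with request 12 (map id 2 beats map id 1; map id 4 belongs to another line of business), and loan 300 stays NULL because its status is not 'Release'.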

How can I fix "operator does not exist: text = uuid" when using Haskell's postgresql-simple library to perform a multi-row insert?

I am using the postgresql-simple library to insert into the eligible_class_passes table, which is essentially a join table representing a many-to-many relationship.
I am using the executeMany function from postgresql-simple to do a multi-row insert.
updateEligibleClassPasses :: Connection -> Text -> Text -> [Text] -> IO Int64
updateEligibleClassPasses conn tenantId classTypeId classPassTypeIds =
withTransaction conn $ do
executeMany
simpleConn
[sql|
INSERT INTO eligible_class_passes (class_type_id, class_pass_type_id)
SELECT upd.class_type_id::uuid, upd.class_pass_type_id::uuid
FROM (VALUES (?, ?, ?)) as upd(class_type_id, class_pass_type_id, tenant_id)
INNER JOIN class_types AS ct
ON upd.class_type_id::uuid = ct.id
INNER JOIN subscription_types AS st
ON class_pass_type_id::uuid = st.id
WHERE ct.tenant_id = upd.tenant_id::uuid AND st.tenant_id = upd.tenant_id::uuid
|]
params
where
addParams classPassTypeId = (classTypeId, classPassTypeId, tenantId)
params = addParams <$> classPassTypeIds
When this function is executed with the correct parameters applied I get the following runtime error
SqlError {sqlState = "42883", sqlExecStatus = FatalError, sqlErrorMsg = "operator does not exist: text = uuid", sqlErrorDetail = "", sqlErrorHint = "No operator matches the given name and argument type(s). You might need to add explicit type casts."}
However, when translated to SQL without the parameter substitutions (?) the query works correctly when executed in psql.
INSERT INTO eligible_class_passes (class_type_id, class_pass_type_id)
SELECT upd.class_type_id::uuid, upd.class_pass_type_id::uuid
FROM (VALUES ('863cb5ea-7a68-41d5-ab9f-5344605de500', 'e9195660-fd48-4fa2-9847-65a0ad323bd5', '597e6d7a-092a-49be-a2ea-11e8d85d8f82')) as upd(class_type_id, class_pass_type_id, tenant_id)
INNER JOIN class_types AS ct
ON upd.class_type_id::uuid = ct.id
INNER JOIN subscription_types AS st
ON class_pass_type_id::uuid = st.id
WHERE ct.tenant_id = upd.tenant_id::uuid AND st.tenant_id = upd.tenant_id::uuid;
My schema is as follows
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE tenants (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
name text NOT NULL UNIQUE,
email text NOT NULL UNIQUE,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now()
);
CREATE TABLE class_types (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
FOREIGN KEY (tenant_id) REFERENCES tenants (id),
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now()
);
CREATE TABLE class_pass_types (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
name TEXT NOT NULL,
tenant_id UUID NOT NULL,
price Int NOT NULL,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now(),
FOREIGN KEY (tenant_id) REFERENCES tenants (id)
);
-- Many to many join through table.
-- Expresses class pass type redeemability against class types.
CREATE TABLE eligible_class_passes (
class_type_id UUID,
class_pass_type_id UUID,
created_at timestamp with time zone NOT NULL default now(),
updated_at timestamp with time zone NOT NULL default now(),
FOREIGN KEY (class_type_id) REFERENCES class_types (id) ON DELETE CASCADE,
FOREIGN KEY (class_pass_type_id) REFERENCES class_pass_types (id) ON DELETE CASCADE,
PRIMARY KEY (
class_type_id, class_pass_type_id
)
);

To help debug your issue, use the formatQuery function (formatMany is the counterpart for executeMany); it shows the final query that postgresql-simple sends to the server.
Also, I'd recommend using the UUID type from the uuid-types package instead of Text for the UUIDs. Using Text most likely hides some issues from you (which you'll hopefully see by using formatQuery).

PostgreSQL: Select dynamic column in correlated subquery

I'm using the Entity-Attribute-Value (EAV) pattern to store 'overrides' for target objects. That is, there are three tables:
Entity, contains the target records
Attribute, contains the column names of 'overridable' columns in the Entity table
Override, contains the EAV records
What I'd like to do is select Overrides along with the value of the 'overridden' column from the Entity table, which requires dynamic use of the Attribute name in the SQL.
My naive attempt in (PostgreSQL) SQL:
SELECT
OV.entity_id as entity,
AT.name as attribute,
OV.value as value,
ENT.base_value as base_value
FROM "override" AS OV
LEFT JOIN "attribute" as AT
ON (OV.attribute_id = AT.id)
LEFT JOIN LATERAL (
SELECT
id,
AT.name as base_value -- AT.name doesn't resolve to a SQL identifier
FROM "entity"
) AS ENT
ON ENT.id = OV.entity_id;
This doesn't work as AT.name doesn't resolve to a SQL identifier and simply returns column names such as 'col1', 'col2', etc. rather than querying Entity with the column name.
I'm aware this is dynamic SQL, but I'm pretty new to PL/pgSQL and couldn't figure it out, as it is a correlated/lateral join. Plus, is this even possible, since the columns are not homogeneously typed? Note that all the 'values' in the Override table are stored as strings to get around this problem.
Any help would be most appreciated!

You can use PL/pgSQL to dynamically request the columns. I'm assuming the following simplified database structure (all original and override values are "character varying" in this example, as I didn't find any further type information):
CREATE TABLE public.entity (
id integer NOT NULL DEFAULT nextval('entity_id_seq'::regclass),
attr1 character varying,
attr2 character varying,
<...>
CONSTRAINT entity_pkey PRIMARY KEY (id)
);
CREATE TABLE public.attribute (
id integer NOT NULL DEFAULT nextval('attribute_id_seq'::regclass),
name character varying,
CONSTRAINT attribute_pkey PRIMARY KEY (id)
);
CREATE TABLE public.override (
entity_id integer NOT NULL,
attribute_id integer NOT NULL,
value character varying,
CONSTRAINT override_pkey PRIMARY KEY (entity_id, attribute_id),
CONSTRAINT override_attribute_id_fkey FOREIGN KEY (attribute_id)
REFERENCES public.attribute (id),
CONSTRAINT override_entity_id_fkey FOREIGN KEY (entity_id)
REFERENCES public.entity (id));
With the PL/pgSQL function
create or replace function get_base_value(
entity_id integer,
column_identifier character varying
)
returns setof character varying
language plpgsql as $$
begin
-- format() with %I quotes the column name as an identifier; the id is passed as a bound parameter via USING
return query execute format('SELECT %I::character varying FROM entity WHERE id = $1', column_identifier)
using entity_id;
end $$;
you can use almost exactly your query:
SELECT
OV.entity_id as entity,
AT.name as attribute,
OV.value as value,
ENT.base_value as base_value
FROM "override" AS OV
LEFT JOIN "attribute" as AT
ON (OV.attribute_id = AT.id)
-- The function returns a single column and is already restricted to OV.entity_id,
-- so it is joined directly; no id column is needed in the join condition.
LEFT JOIN LATERAL get_base_value(OV.entity_id, AT.name) AS ENT(base_value)
ON true;

postgres fast check if attribute combination also exists in another table

I want to check whether the same two attribute values exist in two different tables. If a combination from table_a does not exist in table_b, it should be included in the result of the SELECT. Right now I have the following query, which is working:
CREATE TABLE table_a (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_a_pkey PRIMARY KEY (uuid)
);
CREATE TABLE table_b (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_b_pkey PRIMARY KEY (uuid)
);
SELECT * FROM table_a
WHERE (table_a.attr_a::text || table_a.attr_b::text) != ALL(SELECT (table_b.attr_a::text || table_b.attr_a::text) FROM table_b)
However, the execution time is pretty long. So I would like to ask if there is a faster solution to check for that.

Your WHERE clause manipulates attr_a (casting it to text and concatenating it with attr_b), so an index on those columns can't be used. Instead of this concatenation, why not try a straightforward NOT EXISTS?
SELECT *
FROM table_a a
WHERE NOT EXISTS (SELECT *
FROM table_b b
WHERE a.attr_a = b.attr_a AND
a.attr_b = b.attr_b)
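Besides speed, comparing concatenated strings can also give wrong answers, because different pairs can produce the same string. A small sketch on Python's built-in sqlite3 module, with made-up rows: SQLite has no != ALL, so NOT IN stands in for it, and the composite index on table_b is my own assumption rather than part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (attr_a INTEGER, attr_b TEXT, uuid TEXT PRIMARY KEY);
    CREATE TABLE table_b (attr_a INTEGER, attr_b TEXT, uuid TEXT PRIMARY KEY);
    -- Hypothetical rows: (1, '23') and (12, '3') both concatenate to '123'.
    INSERT INTO table_a VALUES (1, '23', 'a1'), (7, 'x', 'a2');
    INSERT INTO table_b VALUES (12, '3', 'b1'), (7, 'x', 'b2');
    -- Assumed index that the column-wise comparison below is able to use.
    CREATE INDEX table_b_attrs ON table_b (attr_a, attr_b);
""")

# Concatenation: row a1 is wrongly treated as present in table_b.
concat_rows = conn.execute("""
    SELECT a.uuid
    FROM table_a a
    WHERE (CAST(a.attr_a AS TEXT) || a.attr_b) NOT IN (
        SELECT CAST(b.attr_a AS TEXT) || b.attr_b FROM table_b b)
""").fetchall()

# Column-wise comparison: row a1 is correctly reported as missing.
not_exists_rows = conn.execute("""
    SELECT a.uuid
    FROM table_a a
    WHERE NOT EXISTS (SELECT 1
                      FROM table_b b
                      WHERE a.attr_a = b.attr_a AND a.attr_b = b.attr_b)
""").fetchall()

print(concat_rows)
print(not_exists_rows)
```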

SQL JOIN To Find Records That Don't Have a Matching Record With a Specific Value

I'm trying to speed up some code that I wrote years ago for my employer's purchase authorization app. Basically I have a SLOW subquery that I'd like to replace with a JOIN (if it's faster).
When the director logs into the application he sees a list of purchase requests he has yet to authorize or deny. That list is generated with the following query:
SELECT * FROM SA_ORDER WHERE ORDER_ID NOT IN
(SELECT ORDER_ID FROM SA_SIGNATURES WHERE TYPE = 'administrative director');
There are only about 900 records in sa_order and 1800 records in sa_signature and this query still takes about 5 seconds to execute. I've tried using a LEFT JOIN to retrieve records I need, but I've only been able to get sa_order records with NO matching records in sa_signature, and I need sa_order records with "no matching records with a type of 'administrative director'". Your help is greatly appreciated!
The schema for the two tables is as follows:
CREATE TABLE sa_order
(
`order_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_number` BIGINT NOT NULL,
`submit_date` DATE NOT NULL,
`vendor_id` BIGINT NOT NULL,
`DENIED` BOOLEAN NOT NULL DEFAULT FALSE,
`MEMO` MEDIUMTEXT,
`year_id` BIGINT NOT NULL,
`advisor` VARCHAR(255) NOT NULL,
`deleted` BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE TABLE sa_signature
(
`signature_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_id` BIGINT NOT NULL,
`signature` VARCHAR(255) NOT NULL,
`proxy` BOOLEAN NOT NULL DEFAULT FALSE,
`timestamp` TIMESTAMP NOT NULL DEFAULT NOW(),
`username` VARCHAR(255) NOT NULL,
`type` VARCHAR(255) NOT NULL
);

Create an index on sa_signatures (type, order_id).
Converting the query into a LEFT JOIN is not necessary unless sa_signatures allows NULLs in order_id. With the index, the NOT IN will perform just as well. However, just in case you're curious:
SELECT o.*
FROM sa_order o
LEFT JOIN
sa_signatures s
ON s.order_id = o.order_id
AND s.type = 'administrative director'
WHERE s.type IS NULL
You should pick a NOT NULL column from sa_signatures for the WHERE clause to perform well.
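To illustrate why nullability matters, here is a small sketch on Python's built-in sqlite3 module (not MySQL, but the NULL semantics of NOT IN are the same). The tables are cut down to the relevant columns, order_id is deliberately made nullable (unlike in your schema), and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sa_order (order_id INTEGER PRIMARY KEY);
    -- order_id is nullable here on purpose, to show the NULL pitfall.
    CREATE TABLE sa_signatures (
        signature_id INTEGER PRIMARY KEY,
        order_id INTEGER,
        type TEXT NOT NULL
    );
    CREATE INDEX sa_signatures_type_order ON sa_signatures (type, order_id);
    INSERT INTO sa_order VALUES (1), (2), (3);
    INSERT INTO sa_signatures VALUES
        (1, 1, 'administrative director'),
        (2, 2, 'advisor'),
        (3, NULL, 'administrative director');
""")

# The subquery yields {1, NULL}. "2 NOT IN (1, NULL)" evaluates to NULL,
# not TRUE, so no order passes the filter.
not_in_rows = conn.execute("""
    SELECT order_id
    FROM sa_order
    WHERE order_id NOT IN (SELECT order_id
                           FROM sa_signatures
                           WHERE type = 'administrative director')
    ORDER BY order_id
""").fetchall()

# The LEFT JOIN anti-join is unaffected by the NULL order_id row.
# s.type is NOT NULL, so it is NULL only when no signature row matched.
anti_join_rows = conn.execute("""
    SELECT o.order_id
    FROM sa_order o
    LEFT JOIN sa_signatures s
           ON s.order_id = o.order_id
          AND s.type = 'administrative director'
    WHERE s.type IS NULL
    ORDER BY o.order_id
""").fetchall()

print(not_in_rows)
print(anti_join_rows)
```

With order_id declared NOT NULL, as in the schema you posted, both queries return the same orders.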

You could replace the [NOT] IN operator with EXISTS for faster performance.
So you'll have:
SELECT * FROM SA_ORDER WHERE NOT EXISTS
(SELECT ORDER_ID FROM SA_SIGNATURES
WHERE TYPE = 'administrative director'
AND ORDER_ID = SA_ORDER.ORDER_ID);
Reason : "When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, query can use an index within the sub-query."
Source : http://decipherinfosys.wordpress.com/2007/01/21/32/

The following query should work; however, I suspect your real issue is that you don't have the proper indexes in place. You should have an index on the ORDER_ID column of the SA_SIGNATURES table.
SELECT *
FROM
SA_ORDER
LEFT JOIN
SA_SIGNATURES
ON
SA_ORDER.ORDER_ID = SA_SIGNATURES.ORDER_ID AND
TYPE = 'administrative director'
WHERE
SA_SIGNATURES.ORDER_ID IS NULL;

select o.* from sa_order as o left join sa_signature as s on o.order_id = s.order_id and s.type = 'administrative director' where s.order_id is null
Also, you can create a non-clustered index on type in the sa_signature table.
Even better: have a master table for types with typeid and typename, and then, instead of saving type as text in your sa_signature table, save it as an integer. That's because comparing integers is generally faster than comparing text.
