How to find all the products with specific multi attribute values - sql

I am using postgresql.
I have a table called custom_field_answers. The data looks like this
Id | product_id | value | number_value |
4 | 2 | | 117 |
3 | 1 | | 107 |
2 | 1 | bangle | |
1 | 2 | necklace | |
I want to find all the products which have value 'bangle' and number_value less than 50.
Here was my first attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
Here is my second attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
where ("custom_field_answers"."number_value" < 50)
Here is my final attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
AND ("custom_field_answers"."number_value" < 50)
but this does not select any product record.

A WHERE clause can only look at columns from one row at a time.
So if you need a condition that applies to two different rows from a table, you need to join to that table twice, so you can get columns from both rows.
SELECT p.*
FROM "products" AS p
INNER JOIN "custom_field_answers" AS a1 ON p."id" = a1."product_id"
INNER JOIN "custom_field_answers" AS a2 ON p."id" = a2."product_id"
WHERE a1."value" = 'bangle' AND a2."number_value" < 50

It produces no records because there is no custom_field_answers record that meets both criteria. What you want is a list of product_ids that have the necessary records in the table. Just in case no one gets to writing the SQL for you, and until I have a chance to work it out myself, I thought I would at least explain to you why your query is not working.
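One way to get that list of product_ids is conditional aggregation with GROUP BY and HAVING. The sketch below is not from the answer above; it runs against an in-memory SQLite database as a stand-in for PostgreSQL, and the sample rows are hypothetical (row 3 uses 40 instead of 107 so that one product qualifies).

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE custom_field_answers (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        value TEXT,
        number_value INTEGER
    );
    INSERT INTO custom_field_answers (id, product_id, value, number_value) VALUES
        (1, 2, 'necklace', NULL),
        (2, 1, 'bangle',   NULL),
        (3, 1, NULL,       40),
        (4, 2, NULL,       117);
""")

# Both conditions on a single row: no row satisfies them together.
single_row_matches = conn.execute("""
    SELECT product_id
    FROM custom_field_answers
    WHERE value = 'bangle' AND number_value < 50
""").fetchall()

# Conditions checked per product: each one may be satisfied by a different row.
matching_ids = [row[0] for row in conn.execute("""
    SELECT product_id
    FROM custom_field_answers
    GROUP BY product_id
    HAVING SUM(CASE WHEN value = 'bangle' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN number_value < 50 THEN 1 ELSE 0 END) > 0
    ORDER BY product_id
""")]

print(single_row_matches, matching_ids)
```

The resulting ids can then be joined back to products, or used in an IN (...) subquery.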

This should work:
SELECT p.* FROM products p LEFT JOIN custom_field_answers c
ON (c.product_id = p.id AND c.value LIKE '%bangle%' AND c.number_value
Hope it helps

Your bangle-related number_value fields are null, so you won't be able to do a straight comparison in those cases. Instead, convert your nulls to 0s first.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" LIKE '%bangle%')
AND (coalesce("custom_field_answers"."number_value", 0) < 50)

Didn't actually test it, but this general idea should work:
SELECT *
FROM products
WHERE
EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND value = 'bangle'
)
AND EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND number_value < 50
)
In plain English: Get all products such that...
there is a related row in custom_field_answers where value = 'bangle'
and there is a (possibly different) related row in custom_field_answers where number_value < 50.
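A quick way to compare the EXISTS form with the double self-join is to run both against the same rows. This is a sketch under assumptions: in-memory SQLite stands in for PostgreSQL, the products.name column and all rows are hypothetical, and the threshold is the question's 50. Product 1 is given two small number_value rows on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE custom_field_answers (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        value TEXT,
        number_value INTEGER
    );
    INSERT INTO products VALUES (1, 'product one'), (2, 'product two');
    INSERT INTO custom_field_answers VALUES
        (1, 2, 'necklace', NULL),
        (2, 1, 'bangle',   NULL),
        (3, 1, NULL,       40),
        (4, 2, NULL,       117),
        (5, 1, NULL,       10);
""")

# EXISTS only asks whether a matching row is present, so each product appears once.
exists_ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM products AS p
    WHERE EXISTS (SELECT 1 FROM custom_field_answers AS a
                  WHERE a.product_id = p.id AND a.value = 'bangle')
      AND EXISTS (SELECT 1 FROM custom_field_answers AS a
                  WHERE a.product_id = p.id AND a.number_value < 50)
    ORDER BY p.id
""")]

# The double self-join returns one row per matching (a1, a2) pair.
join_ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM products AS p
    JOIN custom_field_answers AS a1 ON a1.product_id = p.id
    JOIN custom_field_answers AS a2 ON a2.product_id = p.id
    WHERE a1.value = 'bangle' AND a2.number_value < 50
    ORDER BY p.id
""")]

print(exists_ids, join_ids)
```

With these rows the join lists product 1 twice (once per small number_value row), so the join form may need DISTINCT where the EXISTS form does not.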

Related

Join table using column value as table name

Is it possible to join a table whereby the table name is a value in a column?
Here is a TABLE called food:
id food_name price_table pricing_reference_id
1 | 'apple' | 'daily_price' | 13
2 | 'banana' | 'monthly_price' | 13
3 | 'hotdog' | 'weekly_price' | 17
4 | 'sandwich' | 'monthly_price' | 9
There are three other tables (pricing tables): daily_price, weekly_price, and monthly_price tables.
Side note: Despite their names, the three pricing tables display vastly different kinds of information, which is why the three tables were not merged into one table
Each row in the food table can only be joined with one of the three pricing tables at most.
The following does not work -- it is just to illustrate what I am trying to get at:
SELECT *
FROM food
LEFT JOIN food.price_table ON food.pricing_reference_id = daily_price.id
WHERE id = 1;
Obviously the query does not work. Is there any way that the name of the table in the price_table column could be used as the table name in a join?
I would suggest left joins:
select f.*,
coalesce(dp.price, wp.price, mp.price) as price
from food f left join
daily_price dp
on f.pricing_reference_id = dp.id and
f.price_table = 'daily_price' left join
weekly_price wp
on f.pricing_reference_id = wp.id and
f.price_table = 'weekly_price' left join
monthly_price mp
on f.pricing_reference_id = mp.id and
f.price_table = 'monthly_price' ;
For the columns you reference, you need to use coalesce() to combine the results from the three tables. You say that the tables have different data, so you would need to list the columns separately.
The main reason I recommend this approach is performance. I think the left joins should be faster than any solution that uses union all.
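The conditional left joins can be tried on a small data set. This is a sketch under assumptions: in-memory SQLite stands in for the real database, each pricing table is assumed to have just (id, price), and all prices are made up. The extra weekly_price row with id 13 is a decoy that shows why the price_table condition belongs in each ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (
        id INTEGER PRIMARY KEY,
        food_name TEXT,
        price_table TEXT,
        pricing_reference_id INTEGER
    );
    -- Hypothetical schemas: the real pricing tables hold different columns.
    CREATE TABLE daily_price   (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE weekly_price  (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE monthly_price (id INTEGER PRIMARY KEY, price REAL);

    INSERT INTO food VALUES
        (1, 'apple',    'daily_price',   13),
        (2, 'banana',   'monthly_price', 13),
        (3, 'hotdog',   'weekly_price',  17),
        (4, 'sandwich', 'monthly_price', 9);
    INSERT INTO daily_price   VALUES (13, 1.5);
    INSERT INTO weekly_price  VALUES (17, 4.0), (13, 99.0);
    INSERT INTO monthly_price VALUES (9, 20.0), (13, 30.0);
""")

# Each join only matches when price_table names that table, so at most
# one of dp/wp/mp is non-NULL per food row and COALESCE picks it.
food_prices = conn.execute("""
    SELECT f.id, f.food_name,
           COALESCE(dp.price, wp.price, mp.price) AS price
    FROM food AS f
    LEFT JOIN daily_price AS dp
           ON f.pricing_reference_id = dp.id AND f.price_table = 'daily_price'
    LEFT JOIN weekly_price AS wp
           ON f.pricing_reference_id = wp.id AND f.price_table = 'weekly_price'
    LEFT JOIN monthly_price AS mp
           ON f.pricing_reference_id = mp.id AND f.price_table = 'monthly_price'
    ORDER BY f.id
""").fetchall()

for row in food_prices:
    print(row)
```

Ids 13 in daily_price, weekly_price and monthly_price are unrelated rows; only the table named in price_table contributes to each result.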
You could get the expected result by using a derived table built with UNION ALL, where each branch adds its own table name as a column:
SELECT *
FROM food
LEFT JOIN
(
SELECT 'daily_price' AS price_table, * FROM daily_price
UNION ALL SELECT 'monthly_price', * FROM monthly_price
UNION ALL SELECT 'weekly_price', * FROM weekly_price
) t
ON food.price_table = t.price_table AND
food.pricing_reference_id = t.id
ORDER BY food.id;

LEFT JOIN but take only one row from right side

Context:
I have two tables: ks__dokument and ks_pz. It's a one-to-many relation: each record in ks__dokument may have multiple records from ks_pz assigned to it.
Goal:
I want to show every row from ks__dokument, and each row from ks__dokument must be shown only once.
What I tried:
Here is the query I tried:
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
But it still shows duplicates.
EDITS
The ORDER BY and WHERE were unnecessary.
I don't need DISTINCT; it's just what I tried.
STRUCTURE OF TABLES
ks__dokument structure:
| ks_id | X | X | X | X | X | X |
ks_pz:
| kp_id | kp_ksid | X | X | X |
'X' are unimportant columns. kp_ksid is foreign key for ks__dokument.
Use OUTER APPLY:
SELECT dok1.*, k2.*
FROM ks__dokument dok1 OUTER APPLY
(SELECT TOP (1) *
FROM ks_pz
WHERE ks_id = kp_ksid
) k2
WHERE ks_usuniety = 0 AND
ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC;
Normally, there would be an ORDER BY in the subquery to specify which row to return.
The structure of your question makes it impossible to know if the ORDER BY should be in the subquery or in the outer query -- and the same for the WHERE conditions.
You really need to specify the tables where columns are coming from.
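Where OUTER APPLY is not available, a similar "one row from the right side" result can be sketched with a correlated subquery in the join condition. This swaps in a different technique from the answer above: it assumes the row to keep is the one with the lowest kp_id, runs on in-memory SQLite rather than SQL Server, and uses made-up rows with the unimportant columns left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ks__dokument (ks_id INTEGER PRIMARY KEY);
    CREATE TABLE ks_pz (kp_id INTEGER PRIMARY KEY, kp_ksid INTEGER);
    INSERT INTO ks__dokument VALUES (1), (2), (3);
    -- Document 1 has two child rows, document 2 has one, document 3 has none.
    INSERT INTO ks_pz VALUES (10, 1), (11, 1), (12, 2);
""")

# Join each document only to its child row with the smallest kp_id.
# Documents without children still appear, with NULL on the right side.
one_row_per_document = conn.execute("""
    SELECT dok1.ks_id, pz.kp_id
    FROM ks__dokument AS dok1
    LEFT JOIN ks_pz AS pz
           ON pz.kp_id = (SELECT MIN(p2.kp_id)
                          FROM ks_pz AS p2
                          WHERE p2.kp_ksid = dok1.ks_id)
    ORDER BY dok1.ks_id
""").fetchall()

print(one_row_per_document)
```

Choosing MIN(kp_id) plays the role that ORDER BY inside the TOP (1) subquery would play: it makes explicit which child row is returned.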
You can try the query below, which moves your WHERE conditions into the ON clause:
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
and ks_usuniety = 0 AND ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC

postgres STRING_AGG() returns duplicates?

I have seen some similar posts requesting advice on getting distinct results from a query. This can be solved with a subquery, but the column I am aggregating, image_name, is unique (image_name VARCHAR(40) NOT NULL UNIQUE), so I don't believe that should be necessary.
This is the data in the spot_images table
spotdk=# select * from spot_images;
id | user_id | spot_id | image_name
----+---------+---------+--------------------------------------
1 | 1 | 1 | 81198013-e8f8-4baa-aece-6fbda15a0498
2 | 1 | 1 | 21b78e4e-f2e4-4d66-961f-83e5c28d69c5
3 | 1 | 1 | 59834585-8c49-4cdf-95e4-38c437acb3c1
4 | 1 | 1 | 0a42c962-2445-4b3b-97a6-325d344fda4a
(4 rows)
SELECT Round(Avg(ratings.rating), 2) AS rating,
spots.*,
String_agg(spot_images.image_name, ',') AS imageNames
FROM spots
FULL OUTER JOIN ratings
ON ratings.spot_id = spots.id
INNER JOIN spot_images
ON spot_images.spot_id = spots.id
WHERE spots.id = 1
GROUP BY spots.id;
This is the result of the images row:
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a
Not with linebreaks, I added them for visibility.
What should I do to retrieve the image_name's one time each?
If you don't want duplicates, use DISTINCT:
String_agg(distinct spot_images.image_name, ',') AS imageNames
Likely, there are several rows in ratings that match the given spot, and several rows in spot_images that match the given spot as well. As a result, rows are getting duplicated.
One option to avoid that is to aggregate in subqueries:
SELECT r.avg_rating,
s.*,
si.image_names
FROM spots s
FULL OUTER JOIN (
SELECT spot_id, Round(Avg(ratings.rating), 2) avg_rating
FROM ratings
GROUP BY spot_id
) r ON r.spot_id = s.id
INNER JOIN (
SELECT spot_id, string_agg(spot_images.image_name, ',') image_names
FROM spot_images
GROUP BY spot_id
) si ON si.spot_id = s.id
WHERE s.id = 1
This could actually be more efficient than outer aggregation.
Note: it is hard to tell without seeing your data, but I am unsure that you really need a FULL JOIN here. A LEFT JOIN might actually be what you want.
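The duplication and the pre-aggregation fix can be reproduced on a small data set. This is a sketch under assumptions: in-memory SQLite stands in for PostgreSQL, so GROUP_CONCAT replaces STRING_AGG and a LEFT JOIN replaces the FULL OUTER JOIN; the spots.name column, the three ratings and the short image names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spots (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ratings (id INTEGER PRIMARY KEY, spot_id INTEGER, rating INTEGER);
    CREATE TABLE spot_images (id INTEGER PRIMARY KEY, spot_id INTEGER,
                              image_name TEXT NOT NULL UNIQUE);
    INSERT INTO spots VALUES (1, 'demo spot');
    INSERT INTO ratings (spot_id, rating) VALUES (1, 4), (1, 5), (1, 3);
    INSERT INTO spot_images (spot_id, image_name) VALUES
        (1, 'img-a'), (1, 'img-b'), (1, 'img-c'), (1, 'img-d');
""")

# Joining both child tables first yields 3 x 4 = 12 rows before aggregation.
joined_avg, joined_names = conn.execute("""
    SELECT ROUND(AVG(r.rating), 2), GROUP_CONCAT(si.image_name, ',')
    FROM spots AS s
    LEFT JOIN ratings AS r ON r.spot_id = s.id
    JOIN spot_images AS si ON si.spot_id = s.id
    WHERE s.id = 1
    GROUP BY s.id
""").fetchone()

# Aggregating each child table in its own subquery avoids the fan-out.
pre_avg, pre_names = conn.execute("""
    SELECT r.avg_rating, si.image_names
    FROM spots AS s
    LEFT JOIN (SELECT spot_id, ROUND(AVG(rating), 2) AS avg_rating
               FROM ratings GROUP BY spot_id) AS r ON r.spot_id = s.id
    JOIN (SELECT spot_id, GROUP_CONCAT(image_name, ',') AS image_names
          FROM spot_images GROUP BY spot_id) AS si ON si.spot_id = s.id
    WHERE s.id = 1
""").fetchone()

print(len(joined_names.split(',')), len(pre_names.split(',')))
```

Each image name appears once per rating row in the first query, which matches the threefold repetition shown in the question.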

How does left outer join work in SQL Server?

First: I know how to use all types of join, but I don't know why it works like this for this query.
I have a scenario for building a SQL query that uses 3 tables and a left outer join between selling and order items.
My Tables:
--------------------
Item
--------------------
ID | Code
--------------------
1 | 7502
SQL > select * from Item where id = 1
---------------------
Item_Order
---------------------------
Item | Box | Quantity
---------------------------
1 | 30 | 15000
1 | 12 | 6000
SQL > select * from Item_Order where Item = 1
--------------------------
Invoice_Item
-------------------
Item | Num | Quantity
-------------------------
1 | 1.64 | 10
1 | 2.4 | 8
SQL > select * from Invoice_Item where Item = 1
I want this output:
Item | OrderQ | OrderB | SellN | SellQ
-----------------------------------------
1 | 15000 | 30 | 1.64 | 10
1 | 6000 | 12 | 2.4 | 8
My SQL code:
SELECT Item.ID, Item_Order.Box As OrderB, Item_Order.Quantity As OrderQ, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item LEFT OUTER JOIN
Item_Order ON Item_Order.Item = Item.ID
where Item.ID = 1
Why is my output 2x? or why does my output return 4 records?
Your result can be achieved with row_number:
select a.ID
, a.OrderB
, a.OrderQ
, b.Quantity SellQ
, b.Num SellN
from
(SELECT Item.ID
, Item_Order.Box As OrderB
, Item_Order.Quantity As OrderQ
, row_number () over (order by Item.ID) rn
FROM Item
left outer JOIN Item_Order ON Item.ID = Item_Order.Item) a
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b
on a.ID = b.Item
and a.rn = b.rn
You can add more tables like this:
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b
Because when you first join Item with Invoice_Item, it outputs two records, since there are two records in Invoice_Item. That result is then left joined with Item_Order, and each of those two records is joined with every matching record of Item_Order.
You can better understand it like this:
SELECT Item.ID, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item
where Item.ID = 1 into table4 -- pseudocode, only to explain
Now the result of the first query, table4, will be joined with Item_Order.
You are joining on one key -- two rows with the same key in one table times two rows in the second table = 4 rows.
You need a separate key. You can generate one using row_number():
SELECT i.ID, io.Box As OrderB, io.Quantity As OrderQ,
ii.Num As SellN, ii.Quantity As SellQ
FROM Item i LEFT OUTER JOIN
((SELECT ii.*,
ROW_NUMBER() OVER (PARTITION BY ii.item ORDER BY ii.item) as seqnum
FROM Invoice_Item ii
) ii FULL JOIN
(SELECT io.*,
ROW_NUMBER() OVER (PARTITION BY io.item ORDER BY io.item) as seqnum
FROM Item_Order io
) io
ON io.Item = ii.Item AND io.seqnum = ii.seqnum
)
ON i.ID = COALESCE(ii.Item, io.Item)
where i.ID = 1;
Note that this is one of the few cases where I use parentheses in the FROM clause. This code can handle additional rows in either of the tables -- if one table is longer than the other, the columns from the other will be NULL.
If you know the two tables have the same number of rows (for a given item) you can just use inner joins and no parentheses.
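The inner-join variant for equally sized tables can be sketched as follows. Assumptions: in-memory SQLite (3.25 or later, for window functions) stands in for SQL Server, and the ORDER BY columns inside ROW_NUMBER() are a hypothetical choice (Box descending, Num ascending) picked only so that the pairing matches the output the question asks for. The sample tables have no column that defines a real pairing order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Code INTEGER);
    CREATE TABLE Item_Order (Item INTEGER, Box INTEGER, Quantity INTEGER);
    CREATE TABLE Invoice_Item (Item INTEGER, Num REAL, Quantity INTEGER);
    INSERT INTO Item VALUES (1, 7502);
    INSERT INTO Item_Order VALUES (1, 30, 15000), (1, 12, 6000);
""")
conn.executemany("INSERT INTO Invoice_Item VALUES (?, ?, ?)",
                 [(1, 1.64, 10), (1, 2.4, 8)])

# Number the rows of each child table per item, then join on item AND
# sequence number so every order row pairs with exactly one invoice row.
paired_rows = conn.execute("""
    SELECT i.ID, io.Quantity AS OrderQ, io.Box AS OrderB,
           ii.Num AS SellN, ii.Quantity AS SellQ
    FROM Item AS i
    JOIN (SELECT Item, Box, Quantity,
                 ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Box DESC) AS seqnum
          FROM Item_Order) AS io
      ON io.Item = i.ID
    JOIN (SELECT Item, Num, Quantity,
                 ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Num) AS seqnum
          FROM Invoice_Item) AS ii
      ON ii.Item = i.ID AND ii.seqnum = io.seqnum
    WHERE i.ID = 1
    ORDER BY io.seqnum
""").fetchall()

for row in paired_rows:
    print(row)
```

The extra seqnum key is what reduces the result from four rows to two.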
It is duplicating because you have no secondary association between Invoice_Item and Item_Order. Each record in Invoice_Item is matched to Item_Order (known as a Cartesian result) based ONLY on the Item ID. Your order quantities APPEAR to be a 1:1 reference, such that the first Invoice_Item Quantity of 10 is MEANT to be associated with Item_Order Box = 30, and Quantity 8 is MEANT to be associated with Item_Order Box = 12.
Item_Order
Item Box Quantity
1 30 15000
1 12 6000
Invoice_Item
Item Num Quantity
1 1.64 10
1 2.4 8
You probably need to tack on the "Box" reference so Item_Order and Invoice_Item are a 1:1 match.
What is happening is that each row in Invoice_Item is joined to every Item_Order row with the same Item ID, so you get two rows for each. If you had 3 Invoice_Item rows for item 1 and 6 Item_Order rows, you would get 18 rows.
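The row multiplication is easy to check by counting. The helper below, joined_row_count, is a hypothetical name, and the generated child rows are placeholders; it runs the question's join on in-memory SQLite and returns how many rows come back when item 1 has a given number of rows in each child table.

```python
import sqlite3

def joined_row_count(n_invoice_rows, n_order_rows):
    """Count the rows the question's join returns for item 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Item (ID INTEGER PRIMARY KEY, Code INTEGER);
        CREATE TABLE Item_Order (Item INTEGER, Box INTEGER, Quantity INTEGER);
        CREATE TABLE Invoice_Item (Item INTEGER, Num REAL, Quantity INTEGER);
        INSERT INTO Item VALUES (1, 7502);
    """)
    # Placeholder child rows: only the shared Item value matters for the join.
    conn.executemany("INSERT INTO Invoice_Item VALUES (1, ?, ?)",
                     [(float(n), n) for n in range(n_invoice_rows)])
    conn.executemany("INSERT INTO Item_Order VALUES (1, ?, ?)",
                     [(n, n * 100) for n in range(n_order_rows)])
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM Item
        LEFT OUTER JOIN Invoice_Item ON Item.ID = Invoice_Item.Item
        LEFT OUTER JOIN Item_Order ON Item_Order.Item = Item.ID
        WHERE Item.ID = 1
    """).fetchone()
    conn.close()
    return count

print(joined_row_count(2, 2), joined_row_count(3, 6))
```

The count is the product of the two child row counts, because Item ID is the only join key.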
FEEDBACK
Even though you have an accepted answer based on OVER/PARTITION/ROW_NUMBER, that process forces a surrogate secondary ID onto each row. Relying on this approach is not the best basis for an overall data structure association. What happens if you delete the second item on an order? Are you positive you are also deleting the second item in Invoice_Item?
As for returning 2 records in the original scenario, you can do it via the surrogate process, but I think it would be better for you long term to understand what is happening in the join. Go back to your sample data of Item_Order and Invoice_Item, and start with the Item_Order table. The SQL engine is going to process each row individually.
First row: SQL grabs Item = 1, Box = 30, Qty = 15000.
Now it joins to the Invoice_Item table, and your criteria join based on Item only. So it sees the first row and says... yup, this is item 1, so include that with the Item_Order record (first row returned). Now it goes to the second line in the Invoice_Item table... yup, it too is the same item 1, so it returns it again (second row returned).
Now, SQL grabs the second row: Item = 1, Box = 12, Qty = 6000.
It goes back to the Invoice_Item table and does the exact same test for each row that has Item = 1, which returns the 3rd and 4th rows; hence your doubling. If either table had more records with the same Item ID, it would return that many more records: 3 and 3 records would return 9 rows, 4 and 4 records would return 16 rows, etc. Doing the surrogate will work, but I don't think it is as safe as a better, updated design structure.

SQL Server - only join if condition is met

I have three tables (at least, something similar) with the following relationships:
Item table:
ID | Val
---------+---------
1 | 12
2 | 5
3 | 22
Group table:
ID | Parent | Range
---------+---------+---------
1 | NULL | [10-30]
2 | 1 | [20-25]
3 | NULL | [0-15]
GroupToItem table:
GroupID | ItemID
---------+---------
1 | 1
1 | 3
And now I want to add rows to the GroupToItem table for Groups 2 and 3, using the same query (since some other conditions not shown here are more complicated). I want to restrict the items through which I search if the new group has a parent, but to look through all items if there is not.
At the moment I am using an IF/ELSE on two statements that are almost exactly the same, but for the addition of another JOIN row when a parent exists. Is it possible to do a join to reduce the number of items to look at, only if a restriction is possible?
My two queries as they stand are given below:
DECLARE @GroupID INT = 2;...
INSERT INTO GroupToItem(GroupID, ItemID)
SELECT g.ID,
i.ID
FROM Group g
JOIN Item i ON i.Val IN g.Range
JOIN GroupToItem gti ON g.Parent = gti.GroupID AND i.ID = gti.ItemID
WHERE g.ID = @GroupID
-
DECLARE @GroupID INT = 3;...
INSERT INTO GroupToItem(GroupID, ItemID)
SELECT g.ID,
i.ID
FROM Group g
JOIN Item i ON i.Val IN g.Range
WHERE g.ID = @GroupID
So essentially I only want to do the second JOIN if the given group has a parent. Is this possible in a single query? It is important that the number of items that are compared against the range is as small as possible, since for me this is an intensive operation.
EDIT: This seems to have solved it in this test setup, similar to what was suggested by Denis Valeev. I'll accept if I can get it to work with my live data. I've been having some weird issues - potentially more questions coming up.
SELECT g.Id,
i.Id
FROM Group g
JOIN Item i ON (i.Val > g.Start AND i.Val < g.End)
WHERE g.Id = 2
AND (
(g.ParentId IS NULL)
OR
(EXISTS(SELECT 1 FROM GroupToItem gti WHERE g.ParentId = gti.GroupId AND i.Id = gti.ItemId))
)
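The "parent IS NULL OR EXISTS" pattern can be checked against both cases with one parameterized query. This is a sketch under assumptions: in-memory SQLite stands in for SQL Server, the group table is renamed ItemGroup because GROUP is a reserved word, RangeStart and RangeEnd are hypothetical column names for the range bounds, and item 4 is an extra made-up row that falls in group 2's range but is not in its parent group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Val INTEGER);
    CREATE TABLE ItemGroup (ID INTEGER PRIMARY KEY, ParentId INTEGER,
                            RangeStart INTEGER, RangeEnd INTEGER);
    CREATE TABLE GroupToItem (GroupID INTEGER, ItemID INTEGER);
    INSERT INTO Item VALUES (1, 12), (2, 5), (3, 22), (4, 23);
    INSERT INTO ItemGroup VALUES (1, NULL, 10, 30), (2, 1, 20, 25), (3, NULL, 0, 15);
    INSERT INTO GroupToItem VALUES (1, 1), (1, 3);
""")

CANDIDATES_SQL = """
    SELECT g.ID, i.ID
    FROM ItemGroup AS g
    JOIN Item AS i ON i.Val > g.RangeStart AND i.Val < g.RangeEnd
    WHERE g.ID = ?
      AND (g.ParentId IS NULL
           OR EXISTS (SELECT 1 FROM GroupToItem AS gti
                      WHERE gti.GroupID = g.ParentId AND gti.ItemID = i.ID))
    ORDER BY i.ID
"""

def candidate_items(group_id):
    """Return (group id, item id) pairs that would be inserted for the group."""
    return conn.execute(CANDIDATES_SQL, (group_id,)).fetchall()

print(candidate_items(2))  # group with a parent: limited to the parent's items
print(candidate_items(3))  # group without a parent: every item in range
```

This only demonstrates that one statement covers both cases; whether the optimizer evaluates the range test on fewer items is a separate question that would need checking against the real execution plan.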
Try this:
INSERT INTO GroupToItem(GroupID, ItemID)
SELECT g.ID,
i.ID
FROM Group g
JOIN Item i ON i.Val IN g.Range
WHERE g.ID = @GroupID
and (g.ID in (3) or exists (select top 1 1 from GroupToItem gti where g.Parent = gti.GroupID AND i.ID = gti.ItemID))
If a Range column is a varchar datatype, you can try something like this:
INSERT INTO GROUPTOITEM (GROUPID, ITEMID)
SELECT A.ID, B.ID
FROM GROUP AS A
LEFT JOIN ITEM AS B
ON B.VAL BETWEEN CAST(SUBSTRING(SUBSTRING(A.RANGE,1,CHARINDEX('-',A.RANGE,1)-1),2,10) AS INT)
AND CAST(REPLACE(SUBSTRING(A.RANGE,CHARINDEX('-',A.RANGE,1)+1,10),']','') AS INT)