Optimizing tricky SQL search query - sql

I am trying to come up with a simple, performant query for the following problem:
Let's say there are several entities (items) which all have a unique ID. The entities have a variable set of attributes (properties), which therefore have been moved to a separate table:
T_Items_Props
=======================
Item_ID Prop_ID Value
-----------------------
101 1 'abc'
101 2 '123'
102 1 'xyz'
102 2 '123'
102 3 '102'
... ... ...
Now I want to search for an item, that matches some specified search-criteria, like this:
<<Pseudo-SQL>>
SELECT Item_Id(s)
FROM T_Items_Props
WHERE Prop 1 = 'abc'
AND Prop 2 = '123'
...
AND Prop n = ...
This would be fairly easy if I had a table like Items(Id, Prop_1, Prop_2, ..., Prop_n). Then I could do a simple SELECT where the search criteria could simply (even programmatically) be inserted in the WHERE-clause, but in this case I would have to do something like:
SELECT t1.Item_ID
FROM T_Items_Props t1
, T_Items_Props t2
, ...
, T_Items_Props tn -- (depending on how many properties to compare)
AND t1.Item_ID = t2.Item_ID
AND t1.Prop_ID = 1 AND t1.Value = 'abc'
AND t2.Prop_ID = 2 AND t2.Value = '123'
...
AND tn.Prop_ID = n AND tn.Value = ...
Is there a better/simpler/faster way to do this?

To make the query more readable, you could do something like:
SELECT
t1.Item_ID
FROM
T_Items_Props t1
where convert(varchar(10), t1.Item_ID) + ';' + t1.Value in (
'1;abc',
'2;123',
...
)
NOTE: This assumes, that your IDs will not have more than 10 digets. It might also slow your query down, due to the extra type conversion and string concatanation.

You could count the number of correct Props. This isn't very good in case there could be duplicates. E.g.:
Prop_ID = 1 AND Value = 'abc'
Prop_ID = 2 AND Value = '123'
and the table would look like:
T_Items_Props
=======================
Item_ID Prop_ID Value
-----------------------
101 1 'abc'
101 1 'abc'
this would then be true, although it shouldn't.
But if you wanna give it a try, here's how:
SELECT nested.* FROM (
SELECT item_id, count(*) AS c FROM t_items_props
WHERE ((prop = 1 AND value = 'abc')
OR (prop = 2 AND value = '123')
... more rules here ...)
GROUP BY item_id) nested
WHERE nested.c > 2 ... number of rules ...

I've offered this in a previous post of similar querying intentions. The user could have 2 criteria one time, and five criteria another and wanted an easy way to build the SQL command. To simplify the need of having to add FROM tables and update the WHERE clause, you can simplify by doing joins and put that criteria right at the join level... So, each criteria is it's own set added to the mix.
SELECT
t1.Item_ID
FROM
T_Items_Props t1
JOIN T_Items_Props t2
on t1.Item_ID = t2.Item_ID
AND t2.Prop_ID = 2
AND t2.Value = '123'
JOIN T_Items_Props t3
on t1.Item_ID = t3.Item_ID
AND t3.Prop_ID = 6
AND t3.Value = 'anything'
JOIN T_Items_Props t4
on t1.Item_ID = t4.Item_ID
AND t4.Prop_ID = 15
AND t4.Value = 'another value'
WHERE
t1.Prop_ID = 1
AND t1.Value = 'abc'
Notice the primary query will always start with a minimum of the "T1" property/value criteria, but then, notice the JOIN clauses... they are virtually the same so it is very easy to implement via a loop... Just keep aliasing the T2, T3, T4... as needed. This will start with any items that meet the T1 criteria, but then also require all the rest to be found too.

You can use a join statement together with filtering or faceted search. It gives better performance because you can limit the search space. Here is a good example: Faceted Search (solr) vs Good old filtering via PHP?.

Related

Scalar subquery produced more than one element, using UNNEST

I have the following sentence, as I have read, the UNNEST should be used, but I don't know how
select
(
select
case
when lin = 4 then 1
when lin != 4 then null
end as FLAG,
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
) as FLAG
FROM
table_2
I have a lot of subqueries and that's why I can't use LEFT JOIN.
Currently my table_2 has 13 million records, and table_1 has 400 million records, what I want is to be able to show a FLAG for each account, knowing that the data universes are different.
I can't use LEFT JOIN ...
Scalar subquery produced more than one element ...
simply use below version of your query - which logically is equivalent to your original query but eliminating the issue you have
select *,
(
select 1 as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
AND lin = 4
LIMIT 1
) as FLAG
FROM
table_2
This indicates that you expect to see one and only one value from the subquery but when you join to table_1 you are getting duplicates on the date and acc fields. If you aggregate the value or remove the duplicates from table_1 that should solve your issue, although at that point why not just use a more efficient JOIN?
-- This will solve your immediate problem, use any aggregation
-- technique, I picked MAX because why not. I do not know your use case.
select
(
select
MAX(case
when lin = 4 then 1
when lin != 4 then null
end) as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
) as FLAG
FROM
table_2
A better way to do this would be
select
case
when t1.lin = 4 then 1
when t1.lin != 4 then null
end as FLAG
FROM
table_2 t2
LEFT JOIN
table_1 t1 ON (
t1.date = t2.date
AND t1.acc = t2.acc
)
As you said, left joins do not work for you. If you want multiple results to be nested in the same subquery then just wrap your subquery in an ARRAY() function to allow for repeated values.
select
ARRAY(
select
case
when lin = 4 then 1
when lin != 4 then null
end as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
-- Now you must filter out results that will return
-- NULL because NULL is not allowed in an ARRAY
AND
table_1.lin = 4
) as FLAG
FROM
table_2

SQL Select Query - "Pairing" Records Together

Here's my data:
Table1...
id
field1
field2
...
Table2...
id
table1_original_id
table1_new_id
Table1 holds records that cannot themselves be updated though I built a mechanism for my users to be able to "update" them... they select a record, make changes, and those changes are actually submitted as a new record. For conversation's sake let's assume there are currently 10 records in Table1 with IDs 1 through 10. User submits an update to ID 3. ID 3 remains as it was and a new record, ID 11, is added to Table1. Along with this new record, Table2 gets a new record, with ID = 1, t1_original_id = 3 and t1_new_id = 11.
What would be by SQL select to retrieve the "pairs" of records from Table1... in this case the query would provide me with... IDs 3 and 11.
scratching head
I don't think it matters much by DB platform is Postgres 8.4 and I'm retrieving the data via PHP to be processed in jqGrid. Bonus points if you can point me in the direction of displaying each pair of records in a separate jqGrid!
Thanks for your time and effort.
=== EDIT ===
The following is a sample of what I need returned from the query based on the scenario above:
id field1 field2
3 blah stuff
11 more things
Once I have these pairs back I can process them further, as necessary, on the PHP side.
Standard SQL solution
To get two separate rows (as later specified in the Q update) use a UNION query:
SELECT 'old' AS version, t1.id, t1.field1, t1.field2
FROM table1 t1
JOIN table2 t2 ON t2.table1_original_id = t1.id
WHERE t2.id = $some_t2_id -- your t2.id here!
UNION ALL
SELECT 'new' as version, t1.id, t1.field1, t1.field2
FROM table1 t1
JOIN table2 t2 ON t2.table1_new_id = t1.id
WHERE t2.id = $some_t2_id; -- your t2.id here!
Note: this is one query.
Returns as requested, plus a column version to distinguish the rows reliably:
version | id | field1 | field2
--------+----+--------+--------
old | 3 | blah | stuff
new | 11 | more | things
Faster Postgres-specific solution
A more sophisticated, but shorter and faster solution:
SELECT version, id, field1, field2
FROM (
SELECT unnest(ARRAY['old', 'new']) AS version
,unnest(ARRAY[table1_original_id, table1_new_id]) AS id
FROM table2 t2 WHERE t2.id = $some_t2_id -- your t2.id here!
) t2
JOIN table1 USING (id);
Same result.
-> SQLfiddle demo for both.
You'll need to JOIN to Table1 twice, something like:
SELECT orig.id AS Orig_ID
, orig.value AS Orig_Value
, n.id AS New_ID
, n.value AS New_Value
FROM Table2 a
JOIN Table1 AS orig
ON a.table1_original_id = orig.id
JOIN Table1 AS n
ON a.table1_new_id = n.id
Demo: SQL Fiddle
Update:
To get them paired as you want without manually choosing a set you'll need something like this:
SELECT sub.id,sub.value
FROM (SELECT a.id as Update_ID,o.id,o.value, '1' AS sort
FROM Table2 a
JOIN Table1 AS o
ON a.table1_original_id = o.id
UNION
SELECT a.id as Update_ID, n.id,n.value, '2' AS sort
FROM Table2 a
JOIN Table1 AS n
ON a.table1_new_id = n.id
) AS sub
ORDER BY Update_ID,sort
Demo: Sql Fiddle
Notes: Change UNION to UNION + ALL, can't post those words next to each other due to firewall limitation.
The hard-coded '1' and '2' are so that original always appear before new.

How should this SQL query be constructed? (Postgres)

I've never had to check for something like this before, so I'm hoping y'all can help. I'm not sure if it's best to create a temporary table, then check if an item is in it or if there's a better way.
Each row in the table represent an object that may have a parent. The parent is stored in the same table. In this example, the parent_id field holds the primary key of a row's parent. I'm trying to select all rows in the table which have the type column set to a particular value and whose parent rows have a 'z' in the field_b column. The part in parenthesis obviously needs work...
SELECT s_id, s_text, s_parent_id
FROM sections
WHERE s_derivedtype >= 10000 AND
if this returns anything
SELECT s_id
FROM sections
WHERE s_id = {the s_parent_id from the first query) AND s_flags LIKE '%z%'
I've updated this to hopefully be easier to read...
What's the most efficient way to do this? I'm expecting something like 100k rows returned out of 18m in the table, so decent performance is non-trivial.
SELECT key_field, field_a
FROM
t s
inner join
t p on p.key_field = s.key_field
WHERE
s.type = 1
AND p.field_b LIKE '%z%'
try
SELECT t1.key_field, t1.field_a
FROM tbl AS t1
WHERE t1.type = 1 AND parent_id = (SELECT t2.id FROM tbl AS t2
WHERE t2.id = t1.parent_id
AND t2.field_b LIKE '%z%')
or
SELECT t1.key_field, t1.field_a
FROM tbl AS t1
INNER JOIN tbl AS t2 ON t2.id = t1.parent_id
WHERE t1.type = 1
AND t2.field_b LIKE '%z%'

MYSQL join, return first matching row only from where join condition using OR

I'm having a problem with a particular MySQL query.
I have table1, and table2, table2 is joined onto table1.
Now the problem is that I am joining table2 to table1 with a condition that looks like:
SELECT
table1.*, table2.*
JOIN table2 ON ( table2.table1_id = table1.id
AND ( table2.lang = 'fr'
OR table2.lang = 'eu'
OR table2.lang = 'default') )
I need it to return only 1 row from table2, even though there might exists many table2 rows for the correct table1.id, with many different locales.
I am looking for a way to join only ONE row with a priority of the locales, first check for one where lang = something, then if that doesn't manage to join/return anything, then where lang = somethingelse, and lastly lang = default.
FR, EU can be different for many users, and rows in the database might exist for many different locales.. I need to select the most suitable ones with the correct fallback priority.
I tried doing the query above with a GROUP BY table2.table1_id, and it seemed to work, but I realised that if the best matching (first OR) was entered later in the table (higher primary ID) it would return 2nd or default priority as the grouped by row..
Any tips?
Thank you!
it still doesn't seem to "know" what t1.id is :(
Here follows my entire query, it goes to show table1 = _product_sku, table2 = _product_sku_data, t1 = ps, t2 = psd
SELECT ps.id, psd.description, psd.lang
FROM _product_sku ps
CROSS JOIN
( SELECT lang, title, description
FROM _product_sku_data
WHERE product_sku_id = ps.id
ORDER BY CASE WHEN lang='$this->profile_language_preference' THEN 0
WHEN lang='$this->browser_language' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
LIMIT 1
) AS psd
Edit:
This version uses variables to provide some sort of ranking within the available languages. Tables and test-data are not from the original question, but from the query OP provided as an answer.
It produced the expected results when I tried it:
SELECT id, description, lang
FROM
(
SELECT ps.id, psd.description, psd.lang,
CASE
WHEN #id != ps.id THEN #rownum := 1
ELSE #rownum := #rownum + 1
END AS rank,
#id := ps.id
FROM _product_sku ps
JOIN _product_sku_data psd ON ( psd.product_sku_id = ps.id )
JOIN ( SELECT #id:=NULL, #rownum:=0 ) x
ORDER BY id,
CASE WHEN lang='$this->profile_language_preference' THEN 0
WHEN lang='$this->browser_language' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
) x
WHERE rank = 1;
Old version which did not work, since ps.id is not known in the WHERE clause:
This one should return you the rows of table1 with the "best matching" row of table2 by using LIMIT 1 and ordering languages as defined:
SELECT t1.id, t2.lang, t2.some_column
FROM table1 t1
CROSS JOIN
( SELECT lang, some_column
FROM table2
WHERE table1_id = t1.id
ORDER BY CASE WHEN lang='fr' THEN 0
WHEN lang='eu' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
LIMIT 1
) t2

grouping records in one temp table

I have a table where one column has duplicate records but other columns are distinct. so something like this
Code SubCode version status
1234 D1 1 A
1234 D1 0 P
1234 DA 1 A
1234 DB 1 P
5678 BB 1 A
5678 BB 0 P
5678 BP 1 A
5678 BJ 1 A
0987 HH 1 A
So in the above table. subcode and Version are unique values whereas Code is repeated. I want to transfer records from the above table into a temporary table. Only records I would like to transfer are where ALL the subcodes for a code have status of 'A' and I want them in the temp table only once.
So from example above. the temporary table should only have
5678 and 0987 since all the subcodes relative to 5678 have status of 'A' and all subcodes for 0987 (it only has one) have status of A. 1234 is ommited because its subcode 'DB' has status of 'P'
I'd appreciate any help!
Here's my solution
SELECT Code
FROM
(
SELECT
Code,
COUNT(SubCode) as SubCodeCount
SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
as SubCodeCountWithA
FROM
(
SELECT
Code,
SubCode,
SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
as ACount
FROM CodeTable
GROUP BY Code, SubCode
) sub
GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount
Let's break it down from the inside out.
SELECT
Code,
SubCode,
SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
as ACount
FROM CodeTable
GROUP BY Code, SubCode
Group up the codes and subcodes (Each row is a distinct pairing of Code and Subcode). See how many A's occured in each pairing.
SELECT
Code,
COUNT(SubCode) as SubCodeCount
SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
as SubCodeCountWithA
FROM
--previous
GROUP BY Code
Regroup those pairings by Code (now each row is a Code) and count how many subcodes there are, and how many subcodes had an A.
SELECT Code
FROM
--previous
WHERE SubCodeCountWithA = SubCodeCount
Emit those codes with have the same number of subcodes as subcodes with A's.
It's a little unclear as to whether or not the version column comes into play. For example, do you only want to consider rows with the largest version or if ANY subcde has an "A" should it count. Take 5678, BB for example, where version 1 has an "A" and version 0 has a "B". Is 5678 included because at least one of subcode BB has an "A" or is it because version 1 has an "A".
The following code assumes that you want all codes where every subcode has at least one "A" regardless of the version.
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
(
SELECT COUNT(DISTINCT subcode)
FROM MyTable T2
WHERE T2.code = T1.code
) =
(
SELECT COUNT(DISTINCT subcode)
FROM MyTable T3
WHERE T3.code = T1.code AND T3.status = 'A'
)
Performance may be abysmal if your table is large. I'll try to come up with a query that is likely to have better performance since this was off the top of my head.
Also, if you explain the full extent of your problem maybe we can find a way to get rid of that temp table... ;)
Here are two more possible methods. Still a lot of subqueries, but they look like they will perform better than the method above. They are both very similar, although the second one here had a better query plan in my DB. Of course, with limited data and no indexing that's not a great test. You should try all of the methods out and see which is best for your database.
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
EXISTS
(
SELECT *
FROM MyTable T2
WHERE T2.code = T1.code
AND T2.status = 'A'
) AND
NOT EXISTS
(
SELECT *
FROM MyTable T3
LEFT OUTER JOIN MyTable T4 ON
T4.code = T3.code AND
T4.subcode = T3.subcode AND
T4.status = 'A'
WHERE T3.code = T1.code
AND T3.status <> 'A'
AND T4.code IS NULL
)
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
EXISTS
(
SELECT *
FROM MyTable T2
WHERE T2.code = T1.code
AND T2.status = 'A'
) AND
NOT EXISTS
(
SELECT *
FROM MyTable T3
WHERE T3.code = T1.code
AND T3.status <> 'A'
AND NOT EXISTS
(
SELECT *
FROM MyTable T4
WHERE T4.code = T3.code
AND T4.subcode = T3.subcode
AND T4.status = 'A'
)
)
In your select, add a where clause that reads:
Select [stuff]
From Table T
Where Exists
(Select * From Table
Where Code = T.Code
And Status = 'A')
And Not Exists
(Select * From Table I
Where Code = T.Code
And Not Exists
(Select * From Table
Where Code = I.Code
And SubCode = I.SubCode
And Status = 'A'))
In English,
Show me the rows,
where there is at least one row with status 'A',
and there are NO rows with any specific subcode,
that do not have at least one row with that code/subcode, with status 'A'
INSERT theTempTable (Code)
SELECT t.Code
FROM theTable t
LEFT OUTER JOIN theTable subT ON (t.Code = subT.Code AND subT.status <> 'A')
WHERE subT.Code IS NULL
GROUP BY t.Code
This should do the trick. The logic is a little tricky, but I'll do my best to explain how it is derived.
The outer join combined with the IS NULL check allows you to search for the absence of a criteria. Combine that with the inverse of what you're normally looking for (in this case status = 'A') and the query succeeds when there are no rows that do not match. This is the same as ((there are no rows) OR (all rows match)). Since we know that there are rows due to the other query on the table, all rows must match.