Ensuring two columns only contain valid results from same subquery - sql

I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words I would needs something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very innefficient to run it twice like in the example.
Is there a way to do this without duplicating two identical sub queries?

Another approach:
select *
from mytable t1
where 2 = (select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The query would also benefit from an index on valid_symbols.symbol

You could try use a CTE like;
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol

From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.

Related

Select rows according to another table with a comma-separated list of items

Have a table test.
select b from test
b is a text column and contains Apartment,Residential
The other table is a parcel table with a classification column. I'd like to use test.b to select the right classifications in the parcels table.
select * from classi where classification in(select b from test)
this returns no rows
select * from classi where classification =any(select '{'||b||'}' from test)
same story with this one
I may make a function to loop through the b column but I'm trying to find an easier solution
Test case:
create table classi as
select 'Residential'::text as classification
union
select 'Apartment'::text as classification
union
select 'Commercial'::text as classification;
create table test as
select 'Apartment,Residential'::text as b;
You don't actually need to unnest the array:
SELECT c.*
FROM classi c
JOIN test t ON c.classification = ANY (string_to_array(t.b, ','));
db<>fiddle here
The problem is that = ANY takes a set or an array, and IN takes a set or a list, and your ambiguous attempts resulted in Postgres picking the wrong variant. My formulation makes Postgres expect an array as it should.
For a detailed explanation see:
How to match elements in an array of composite type?
IN vs ANY operator in PostgreSQL
Note that my query also works for multiple rows in table test. Your demo only shows a single row, which is a corner case for a table ...
But also note that multiple rows in test may produce (additional) duplicates. You'd have to fold duplicates or switch to a different query style to get de-duplicate. Like:
SELECT c.*
FROM classi c
WHERE EXISTS (
SELECT FROM test t
WHERE c.classification = ANY (string_to_array(t.b, ','))
);
This prevents duplication from elements within a single test.b, as well as from across multiple test.b. EXISTS returns a single row from classi per definition.
The most efficient query style depends on the complete picture.
You need to first split b into an array and then get the rows. A couple of alternatives:
select * from nj.parcels p where classification = any(select unnest(string_to_array(b, ',')) from test)
select p.* from nj.parcels p
INNER JOIN (select unnest(string_to_array(b, ',')) from test) t(classification) ON t.classification = p.classification;
Essential to both is the unnest surrounding string_to_array.

CROSS APPLY query very slow when additional column added

I have a CROSS APPLY query which executes very quickly (1 second). However, if I add certain additional columns to the top SELECT, the query will run very slow (many minutes). I'm not seeing what is causing this.
SELECT
cs.show_title, im.primaryTitle
FROM
captive_state cs
CROSS APPLY
(SELECT TOP 1
imdb.tconst, imdb.titleType, imdb.primaryTitle,
imdb.genres, imdb.genre1, imdb.genre2, imdb.genre3
FROM
imdb_data imdb
WHERE
(imdb.primaryTitle LIKE cs.show_title+'%')
AND (imdb.titleType like 'tv%' OR imdb.titleType = 'movie')
ORDER BY
imdb.titleType, imdb.tconst DESC) AS im
WHERE
cs.genre1 IS NULL
I've tried adding/removing various columns and only when adding the 'genre' fields - e.g. genre2 (varchar(50)) - does the slowness occur. For example,
SELECT cs.show_title, im.primaryTitle, im.genre2
I would expect the query to basically have the same performance whether adding one additional column or not.
Here are the query plans without the extra column, and with.
The first table (cs) has a primary key index and an index on genre1. The second table (imdb) has a primary key index and an index on primaryTitle.
I'm not sure if those would cause any problems though.
Thanks for any suggestions.
In your second screenshot, you're performing an Index Scan on the primary key for imdb_data. This is essentially scanning the table as if there is no index.
You have two options. Either change your query to use the indexed columns of imdb_data or create a new index to cover this query.
Maybe switch to an alternative for the topped CROSS APPLY
SELECT TOP 1 WITH TIES
cs.show_title,
imdb.tconst, imdb.titleType, imdb.primaryTitle,
imdb.genres, imdb.genre1, imdb.genre2, imdb.genre3
FROM captive_state cs
JOIN imdb_data imdb
ON imdb.primaryTitle LIKE cs.show_title+'%'
AND (imdb.titleType = 'movie' OR imdb.titleType LIKE 'tv%')
WHERE cs.genre1 IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY cs.show_title ORDER BY imdb.titleType, imdb.tconst DESC)
You could include additional columns to index [imdb_data].[idx_primary_table]. (name is not readable from screenshot):
CREATE INDEX idx_name ON [imdb_data].[idx_primary_table](same cols as in original)
INCLUDE (genre1, genre2, genre3) WITH (DROP_EXISTING=ON)
Try to use "join" with "row_number()" instead of "apply"
select
dat.primaryTitle
,dat.show_title
from (
select
imdb.primaryTitle
,cs.show_title
,row_number() over (partition by cs.show_title order by imdb.titleType, imdb.tconst DESC) as rn
from imdb_data imdb
inner join captive_state cs on imdb.primaryTitle LIKE cs.show_title+'%'
where (imdb.titleType like 'tv%' OR imdb.titleType = 'movie')
and cs.genre1 IS NULL
) dat
where dat.rn = 1

Is there something I can change to make my view run faster?

I just created a view but it is really slow, since my actual table has something around 800k rows.
Is there something I can change in the actual sql code to make it run faster?
Here is how it looks now:
Select B.*
FROM
(Select A.*, (select count(B.KEY_ID)/77
FROM book_new B
where B.KEY_ID = A.KEY_ID) as COUNT_KEY
FROM
(select *
from book_new
where region = 'US'
and (actual_release_date is null or
actual_release_date >= To_Date( '01/07/16','dd/mm/yy'))
) A
) B
WHERE B.COUNT_KEY = 1
OR (B.COUNT_KEY > 1 AND B.NEW_OLD <> 'Old')
The most obvious things to do are add indexes:
Add an index on book_new(key_id)
Add an index on book_new(region, actual_release_date)
These are probably sufficient. It is possible that rewriting the query would help, but this is a good beginning. If you want to rewrite the query, it would help if you described the logic you are trying to implement.
There are many ways to solve this issue based on your needs
You can create an indexed view
You can create an index in the base tables which are used in this view.
You can use the required columns in the SELECT statement instead of using SELECT * FROM,
If the table contains many columns but you require only few columns, you can create a NON CLUSTERED INDEX with INCLUDE COLUMNS option which will reduce the LOGICAL READS.
For starters, replace the scalar subquery for COUNT_KEY with a windowed COUNT(*).
SELECT * FROM
(
select book_new.*, COUNT(*) OVER ( PARTITION BY book_new.key_id)/77 COUNT_KEY
from book_new
where region = 'US'
and (actual_release_date is null or
actual_release_date >= To_Date( '01/07/16','dd/mm/yy'))
)
WHERE count_key = 1 OR ( count_key > 1 AND new_old <> 'Old' )
This way, you only go through the BOOK_NEW table one time.
BTW, I agree with other comments that this query makes little sense.

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However I want my java application to get all the data for all keys in a single database hit to fasten up things.
I can use 'in' clause when I have only one key.
While working on more than one key I can use below query in oracle
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'CORE'
together
However, this concept of using 'in' clause as above is giving below error in 'SQL server'
ERROR:An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let know if their is any way to achieve the same in SQL server.
This syntax doesn't exist in SQL Server. Use a combination of And and Or.
SELECT *
FROM <table_name>
WHERE
(value_type = 1 and CODE1 = 'COMM')
OR (value_type = 1 and CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in oracle with multiple fields.)
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.type_code = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
If you have under 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table values constructor, and perform a join against it. You can only specify up to 1000 rows in a table values constructor, hence the 1000 tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique, otherwise you'll get duplicate values. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXIST instead of JOIN. That may have different performance characteristics and it will avoid the problem of producing duplicate results if your values aren't unique. It may be worth trying both EXIST and JOIN on your use case to see what's a better fit. Here's how EXIST would look,
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. your user lacks the permission to create temporary tables, and then using a table values constructor may be your best choice. Use EXIST or JOIN, depending on which gives you better performance on your database.
Normally you can not do it, but can use the following technique.
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put then in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT. *
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.ValueType AND T.Code = DT.Code
Thus, you avoid storing data in your code (persistent table version) and allow to easily modify the filters (without changing the code).
I think you can try this, combine and and or at the same time.
SELECT
*
FROM
<table_name>
WHERE
value_type = 1
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in futere, in my case i found this solution (not full code, just example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of cource on Coupon and Campaign in table i have index for fast search.
Compute it in MS Sql
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');

SQL equivalent of IN operator that acts as AND instead of OR?

Rather than describe, I'll simply show what I'm trying to do. 3 tables in 3NF. product_badges is the join table. (The exists sub-query is necessary.)
SELECT * FROM shop_products WHERE EXISTS ( // this line cannot change
SELECT * FROM product_badges as pb
WHERE pb.product_id=shop_products.id
AND pb.badge_id IN (1,2,3,4)
);
Now this will return all the products that have a badge_id of 1 OR 2 OR 3 OR 4. What I want is to get only the products that meet ALL those values. I tried to do pb.badge_id=1 AND pb.badge_id=2 etc but this returns nothing -- which makes sense to me now. I also tried doing multiple queries with an INTERSECT but that resulted in an error. I'm guessing multiple queries is the key but UNION is basically the same as IN and I'm not certain how to use a JOIN in this case.
Please try the following (assuming exactly 4 different required bagde_id's):
SELECT * FROM shop_products WHERE EXISTS ( // this line cannot change
SELECT pb.product_id, count(distinct pb.bagde_id) FROM product_badges as pb
WHERE pb.product_id=shop_products.id
AND pb.badge_id IN (1,2,3,4)
GROUP BY pb.product_id
HAVING count(distinct pb.bagde_id) = 4
);