Query to retrieve data based on many composite keys - sql

Unfortunately the db I am stuck supporting contains little to no surrogate/primary keys, and there's no chance to add one. So I'm left with using composite keys.
I'd like to query a single table with a list of composite keys. I can have as many as 5k composite keys. So how can I do this? The query below works, but I would have to build it dynamically, and it is not something I've ever seen done or had to do. It seems there should be a better way to do this...
select * from dog_manners
where
col1 NOT IN ('', 'abcd')
and
(
-- here are the composite keys (each pair must be unique)
(col2 = 'Scottish Terrier' and col3 = 'black') or
(col2 = 'Golden Retriever' and col3 = 'brown') or
(col2 = 'Golden Retriever' and col3 = 'wheaten') or
(col2 = 'Yorkshire Terrier' and col3 = 'brown') or
etc...
)
If this is the best way, is there a limit to the number of OR conditions I can have? If so, I'll have to break it up into smaller chunks.
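As a rough sketch of the "smaller chunks" idea, the OR list can be generated with placeholders and executed one chunk at a time. This uses Python's built-in sqlite3 module purely as a runnable stand-in (the question is about Informix, whose limits and driver API are not shown here); `fetch_by_pairs`, `chunk_size` and the sample rows are hypothetical names and data.

```python
import sqlite3

def fetch_by_pairs(conn, pairs, chunk_size=200):
    """Fetch dog_manners rows matching any (col2, col3) pair, one chunk at a time."""
    rows = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        # One "(col2 = ? AND col3 = ?)" group per pair, joined with OR.
        predicate = " OR ".join(["(col2 = ? AND col3 = ?)"] * len(chunk))
        sql = (
            "SELECT col1, col2, col3 FROM dog_manners "
            "WHERE col1 NOT IN ('', 'abcd') AND (" + predicate + ")"
        )
        # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
        params = [value for pair in chunk for value in pair]
        rows.extend(conn.execute(sql, params).fetchall())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dog_manners (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO dog_manners VALUES (?, ?, ?)",
    [
        ("sit", "Scottish Terrier", "black"),
        ("stay", "Golden Retriever", "brown"),
        ("abcd", "Golden Retriever", "wheaten"),  # excluded by the col1 filter
        ("heel", "Poodle", "white"),              # not in the key list
    ],
)
keys = [
    ("Scottish Terrier", "black"),
    ("Golden Retriever", "brown"),
    ("Golden Retriever", "wheaten"),
]
matches = fetch_by_pairs(conn, keys, chunk_size=2)
```

Because each pair appears in exactly one chunk, a row is returned at most once as long as the pairs themselves are unique.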

I don't know Informix, so I googled, and it appears that temporary tables exist. One idea is to create one and use it in a join:
CREATE TEMP TABLE tmp1 ( col2 varchar(20), col3 varchar(20) );
INSERT INTO tmp1 (col2, col3) VALUES ('Scottish Terrier', 'black')
, ('Golden Retriever', 'brown');
SELECT *
FROM dog_manners x
JOIN tmp1
ON tmp1.col2 = x.col2
AND tmp1.col3 = x.col3;
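For illustration, here is a minimal sketch of the same temp-table join driven from application code. It runs against SQLite through Python's sqlite3 module, not Informix, so the DDL and driver calls are assumptions that would need adapting; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dog_manners (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO dog_manners VALUES (?, ?, ?)",
    [
        ("sit", "Scottish Terrier", "black"),
        ("stay", "Golden Retriever", "brown"),
        ("heel", "Poodle", "white"),  # no matching key, so it is filtered out
    ],
)

# Load the composite keys into a temporary table, one row per key.
keys = [("Scottish Terrier", "black"), ("Golden Retriever", "brown")]
conn.execute("CREATE TEMP TABLE tmp1 (col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO tmp1 (col2, col3) VALUES (?, ?)", keys)

# The join keeps only rows whose (col2, col3) pair is present in tmp1.
joined = conn.execute(
    "SELECT x.col1, x.col2, x.col3 "
    "FROM dog_manners x "
    "JOIN tmp1 ON tmp1.col2 = x.col2 AND tmp1.col3 = x.col3 "
    "ORDER BY x.col1"
).fetchall()
```

Inserting the keys with `executemany` avoids building one long SQL string, which is the main attraction of this approach when there are thousands of keys.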

I'm not sure I'd recommend this, but without more detail to understand your requirement ("but needs to be built dynamically"), the following alternative would work, although it may not be terribly efficient:
select * from dog_manners
where
col1 NOT IN ('', 'abcd')
and TRIM(col2)||':'||TRIM(col3) IN
(
'Scottish Terrier:black', 'Golden Retriever:brown',
'Golden Retriever:wheaten', 'Yorkshire Terrier:brown',
...
)
Can you give some more detail on how these combinations are generated?

Related

SQL query to find columns having at least one non null value

I am developing a data validation framework where I have this requirement of checking that the table fields should have at least one non-null value i.e they shouldn't be completely empty having all values as null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0 I can tell that the column is not empty. I already have a list of columns. So, I can execute this query in the loop for every column but this would mean a lot of requests and it is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count(column_name), which counts only the non-null values.
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
-- @list is assumed to hold a comma-separated list of column names, e.g. 'col1,col2'
declare @sql nvarchar(max);
select @sql = concat('select ',
                     string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
                     ' from t'
                    )
from string_split(@list, ',') s;
exec sp_executesql @sql;
This might not quite work if your columns have special characters in them, but it illustrates the idea.
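The same idea, building one query that counts every column in a single pass, can also be sketched from client code. The example below uses Python's sqlite3 module as a stand-in for SQL Server; `quote_identifier` and `non_null_counts` are hypothetical helpers, and the quoting rule shown is SQLite's.

```python
import sqlite3

def quote_identifier(name):
    # SQLite quotes identifiers with double quotes; embedded quotes are doubled.
    return '"' + name.replace('"', '""') + '"'

def non_null_counts(conn, table, columns):
    """Return {column: number of non-null values} using a single query."""
    select_list = ", ".join("count(" + quote_identifier(c) + ")" for c in columns)
    sql = "SELECT " + select_list + " FROM " + quote_identifier(table)
    row = conn.execute(sql).fetchone()
    return dict(zip(columns, row))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(None, 2, None), (None, 4, None), (None, 6, 7)],
)

counts = non_null_counts(conn, "t", ["col1", "col2", "col3"])
# Columns whose count is zero contain nothing but NULLs.
empty_columns = [column for column, n in counts.items() if n == 0]
```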
You should probably use exists, since you don't actually need a count of anything.
You don't indicate how you want to consume the results of multiple counts, however one thing you could do is use concat to return a list of the columns meeting your criteria:
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can have each case expression return the column name and concatenate the results. Only the columns that have a non-null value are included, because concat_ws ignores the nulls returned by the other case expressions.
select Concat_ws(', ',
case when exists (select * from t where col1 is not null) then 'col1' end,
case when exists (select * from t where col2 is not null) then 'col2' end,
case when exists (select * from t where col3 is not null) then 'col3' end,
case when exists (select * from t where col4 is not null) then 'col4' end,
case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
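A comparable check can be sketched with one EXISTS probe per column, leaving the concatenation to the caller. This runs on SQLite via Python's sqlite3 module rather than SQL Server (so it does not rely on concat_ws), and it reuses the sample rows from above; the column names are hard-coded and trusted, not user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (None, None, None, None, None),
        (None, 2, None, None, None),
        (None, None, None, None, 5),
        (None, None, None, None, 6),
        (None, 4, None, None, None),
        (None, 6, 7, None, None),
    ],
)

columns = ["col1", "col2", "col3", "col4", "col5"]
# One EXISTS subquery per column; each evaluates to 1 or 0 in SQLite.
select_list = ", ".join(
    "EXISTS (SELECT 1 FROM t WHERE " + c + " IS NOT NULL)" for c in columns
)
flags = conn.execute("SELECT " + select_list).fetchone()
non_empty = [c for c, flag in zip(columns, flags) if flag]
```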
I asked a similar question about a decade ago. The best way of doing this in my opinion would meet the following criteria.
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a not null value in every column under consideration allow it to exit early without reading the rest of the table/index as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES(CASE WHEN b IS NOT NULL THEN 'b' END),
(CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION ( HASH GROUP, MAXDOP 1, FAST 1)
Whether it does depends on the plan you get. A hash match usually reads all of its build input first, which means the scan is not short-circuited. If the optimiser gives you an operator in "flow distinct" mode, however, it won't do this, and execution can stop as soon as TOP receives its first two rows, which signals that a NOT NULL value has been found in both columns.
But there is no hint to request the mode for hash aggregate so you are dependent on the whims of the optimiser as to whether you will get this in practice. The various hints I have added to the query above are an attempt to point it in that direction however.
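The two criteria above (a single pass over the data, with an early exit once every column has shown a non-null value) can be illustrated outside the database. The sketch below is plain Python over an iterable of rows and says nothing about how SQL Server executes the query; `columns_with_values` and the sample rows are hypothetical.

```python
def columns_with_values(rows, columns):
    """Return (columns that have a non-null value, number of rows read).

    Stops reading as soon as every column has been seen with a value.
    """
    pending = set(columns)
    rows_read = 0
    for row in rows:
        rows_read += 1
        # Drop every column that is non-null in this row.
        pending -= {c for c in pending if row[c] is not None}
        if not pending:
            break  # further rows cannot change the answer
    found = [c for c in columns if c not in pending]
    return found, rows_read

sample = [
    {"b": None, "c": 1},
    {"b": 2, "c": None},
    {"b": 3, "c": 4},  # never read: both columns are already settled
]
found, rows_read = columns_with_values(iter(sample), ["b", "c"])
```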

TSQL: Creating a duplicate table and modifying values in the column properly?

I have a master table that I wish to keep as it is. I want to duplicate this table then find a specific record with a where clause and set a column value to null. I then want to reinsert it into the duplicated table without anything getting modified in the master table.
Right now these are the steps that I have taken, however, for some reason the changes propagate all the way to the master table:
-- 1)
select * into Table_Duplicate from Table_Master
-- 2)
create view vw_Filtered as
select Col1, null as Col2 from Table_Master where Col1 = 'Condition'
-- 3)
update Table_Duplicate
set Table_Duplicate.Col2 = vw_Filtered.Col2
from Table_Duplicate
inner join vw_Filtered
on Table_Duplicate.Col1 = vw_Filtered.Col1
Once Statement 3) has been executed, when I do:
select * from Table_Master where Col1 = 'Condition'
I get the modified value in Col2 but I want to get the value before updating it.
Please do let me know if there are any other way to achieve this.
I think you're over-complicating this.
I won't pretend I know why you need to do this in the first place, but if I had to do such a thing I would probably do it like this:
SELECT Col1,
IIF(Col1 = 'Condition', null, Col2) As Col2
[,Coln]
INTO Table_Duplicate
FROM Table_Master
That way the whole process happens in a single SELECT ... INTO statement.
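A small runnable sketch of the same approach, assuming SQLite through Python's sqlite3 module instead of SQL Server: `CREATE TABLE ... AS SELECT` stands in for `SELECT ... INTO`, and a CASE expression stands in for IIF. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_Master (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO Table_Master VALUES (?, ?)",
    [("Condition", "keep me"), ("Other", "value")],
)

# Copy the master table, nulling Col2 only for the matching rows of the copy.
conn.execute(
    "CREATE TABLE Table_Duplicate AS "
    "SELECT Col1, "
    "       CASE WHEN Col1 = 'Condition' THEN NULL ELSE Col2 END AS Col2 "
    "FROM Table_Master"
)

master = conn.execute(
    "SELECT Col1, Col2 FROM Table_Master ORDER BY Col1"
).fetchall()
duplicate = conn.execute(
    "SELECT Col1, Col2 FROM Table_Duplicate ORDER BY Col1"
).fetchall()
```

The master table is only ever read, so its rows cannot change as a side effect of creating the copy.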

Insert Statement for List

I'm not too sure how to describe my SQL Insert statement so I will describe the expected result.
I'm building a data extract list and have a table that I've put all my data into. It's called _MATTER_LIST
What I am trying to achieve is to have the Client_Number + Col1 combination repeat after every unique COL1+COL2+COL3 combination, but not duplicate when there is already a CLIENT_NUMBER+COL1. So the end result would be:
thanks in advance for any tips.
A simple ORDER BY should work for you, if I understand correctly. Try this:
select Client_Number, Col1, Col2, Col3 from _MATTER_LIST
order by Client_Number, Col1
I've managed to fix my own issue. I added a unique key on col1 + col2 + col3, then made col2 repeat over each combination.
The result is: select * from _MATTER_LIST order by COL4, COL5

SQL parametrised query containing multiple options

I would like to write a query
Select col1, col2
from table
where col1 = 'blah' or 'blah2' or 'blah3'
and col2 = 'blah' or 'blah2' or 'blah3'
I am used to writing them like this for a SINGLE option
select
col1, col2
from
table
where
col1 = :col1 and col2 = :col2
Parameters.AddWithValue(":col1", 'blah')
Parameters.AddWithValue(":col2", 'blah')
Now I want to add several options with OR between them, and obviously the above code won't work. The SQL is for SQLite. Can anyone suggest how I could do this? I may potentially have more than 3 different values for each parameter. I have tried searching but the answer is elusive.
You still have to use complete expressions, i.e., you need to write col1 = or col2 = in front of every value.
Alternatively, use IN:
SELECT ... WHERE col1 IN (:c11, :c12, :c13) AND col2 IN (:c21, :c22, :c23);
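When the number of values is not known in advance, the placeholder list can be generated to match. The sketch below uses Python's sqlite3 module with positional `?` placeholders rather than the named parameters and `Parameters.AddWithValue` calls from the question; `find_rows` and the `items` table are hypothetical.

```python
import sqlite3

def find_rows(conn, col1_values, col2_values):
    """Select rows whose col1 and col2 each match one of the given values."""
    # ", ".join("?" * 3) yields "?, ?, ?": one placeholder per value.
    col1_marks = ", ".join("?" * len(col1_values))
    col2_marks = ", ".join("?" * len(col2_values))
    sql = (
        "SELECT col1, col2 FROM items "
        "WHERE col1 IN (" + col1_marks + ") AND col2 IN (" + col2_marks + ") "
        "ORDER BY col1, col2"
    )
    # Values are bound in the same order as the placeholders appear.
    return conn.execute(sql, [*col1_values, *col2_values]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("blah", "blah2"),
        ("blah2", "other"),   # col2 does not match
        ("blah3", "blah"),
        ("nope", "blah"),     # col1 does not match
    ],
)
options = ["blah", "blah2", "blah3"]
result = find_rows(conn, options, options)
```

Only the placeholders are spliced into the SQL text; the values themselves are still passed as parameters, so the query stays parameterised.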

Query optimization; Table.Column = @Param OR @Param IS NULL

In a WHERE clause, a condition like Table.Column = @Param OR @Param IS NULL does not use the index on Column.
Is this true, and if so, how can I write this kind of query so that it does use the index?
Query Example
SELECT Col1, Col2 ...
FROM Table
WHERE (Col1 = @col OR @col IS NULL)
AND (Col2 = @col2 OR @col2 IS NULL)
AND (Col3 = @col3 OR @col3 IS NULL)
Any help.
Unfortunately, the generation of execution plans does not behave as you expect.
For that single query, a single plan is created. In creating that plan the indexes to use are selected, and fixed. It doesn't matter what parameters you provide, the same plan, same indexes, etc, are always used.
The optimiser has tried to find the best plan that can fit all eventualities, but by the nature of this type of query, there isn't one. This is borne out by the fact that the plan you have does not use an index at all.
The solution is to use dynamic SQL. This feels untidy, but if you use parameterised queries with sp_executesql, it can actually be quite structured, and very performant.
Here is a link to a very useful article on the subject: dynamic search
It's very in depth, but it is a very robust approach to this problem.
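The general shape of that approach can be sketched as follows: only the filters that were actually supplied go into the WHERE clause, and the values are still passed as parameters. This is a Python/sqlite3 illustration rather than T-SQL with sp_executesql; `search` and the `search_demo` table are hypothetical.

```python
import sqlite3

def search(conn, col1=None, col2=None, col3=None):
    """Build a WHERE clause containing only the filters that were supplied."""
    conditions, params = [], []
    for column, value in (("Col1", col1), ("Col2", col2), ("Col3", col3)):
        if value is not None:
            # Column names come from the fixed list above, never from user input.
            conditions.append(column + " = ?")
            params.append(value)
    sql = "SELECT Col1, Col2, Col3 FROM search_demo"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY Col1, Col2, Col3"
    return sql, conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_demo (Col1 INTEGER, Col2 TEXT, Col3 TEXT)")
conn.executemany(
    "INSERT INTO search_demo VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "b", "y"), (2, "a", "x")],
)

sql_one, rows_one = search(conn, col1=1)
sql_none, rows_none = search(conn)
```

Each distinct combination of supplied filters produces its own statement text, so the database can plan (and pick indexes for) each combination separately.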
SELECT Col1, Col2 ...
FROM Table
WHERE EXISTS(
SELECT Col1, Col2, Col3
INTERSECT
SELECT @col, @col2, @col3)
Intuitively, this seems like it should perform very badly, but SQL Server's query optimiser knows how to give INTERSECT special treatment, and internally translates it to (pseudo-SQL)
SELECT Col1, Col2 ...
FROM Table
WHERE (Col1, Col2, Col3) IS (@col, @col2, @col3)
as you can see in the query plan. If you have indices on these columns, they can and do get used.
I originally picked this up from Paul White's Undocumented Query Plans: Equality Comparisons blog post, which may be an interesting further read.
Why not try this:
SELECT Col1, Col2 ...
FROM Table
WHERE Col1 = IsNull(@col,Col1)
AND Col2 = IsNull(@col2,Col2)
AND Col3 = IsNull(@col3,Col3)
About your question:
Does your query analyzer say it doesn't use the index on columns 1, 2 and 3? Did you create an index covering all three columns? If so, it should be used regardless of the OR ... IS NULL conditions.
Try to have an index on all the columns in the WHERE clause, and try the more structured query below:
SELECT Col1, Col2 ...
FROM Table
WHERE Col1 = COALESCE(@col,Col1)
AND Col2 = COALESCE(@col2,Col2)
AND Col3 = COALESCE(@col3,Col3)
The COALESCE() function returns the first non-null argument, so if @col is NULL it returns Col1 and the condition becomes Col1 = Col1, which is true for every row where Col1 is not NULL.
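One behavioural difference is worth checking before swapping this in for the original OR ... IS NULL form: a row whose column is itself NULL never satisfies Col1 = Col1. The sketch below demonstrates this on SQLite via Python's sqlite3 module, assuming the same NULL comparison semantics as SQL Server's defaults; the `demo` table and `count_rows` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Col1 TEXT)")
conn.executemany("INSERT INTO demo VALUES (?)", [("a",), ("b",), (None,)])

def count_rows(where_clause, col):
    """Count rows matching the predicate, binding the named parameter :col."""
    sql = "SELECT count(*) FROM demo WHERE " + where_clause
    return conn.execute(sql, {"col": col}).fetchone()[0]

# No filter value supplied: the parameter is NULL in both cases.
with_or_null = count_rows("(Col1 = :col OR :col IS NULL)", None)
with_coalesce = count_rows("Col1 = COALESCE(:col, Col1)", None)

# With a real filter value the two forms agree.
filtered_or = count_rows("(Col1 = :col OR :col IS NULL)", "a")
filtered_coalesce = count_rows("Col1 = COALESCE(:col, Col1)", "a")
```

Here the OR form returns all three rows when no filter is supplied, while the COALESCE form skips the row whose Col1 is NULL, so the two are interchangeable only for columns that cannot contain NULL.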