Using IN with multiple columns

Just a quick question. I'm taking single values entered manually by the user and running an SQL query that compares each value against two columns, like:
SELECT col3,col1,col4 FROM table WHERE
col1='SomeReallyLongText' OR col2='SomeReallyLongText'
The repetition of SomeReallyLongText is tolerable, but my program also supports looping through an Excel document with several hundred rows - which means I'll be doing:
SELECT col3,col1,col4 FROM table WHERE
col1 IN('item1','item2',...'itemN') OR col2 IN('item1','item2',...'itemN')
And the query would be excessively long, which I can't imagine is efficient.
Is there a way to shorten this so two columns can be compared to the same IN(xxx) set?
If not, are there other (more efficient) ways of giving the set of values in the query?
(I'm using C# with .NET 4.0 Client Profile, using Excel Interop to access the file)

I'm not too sure about the performance you'd get with this:
SELECT col3,col1,col4 FROM table
WHERE EXISTS (
SELECT 1
FROM (VALUES
('item1')
, ('item2')
, ...
, ('itemN')
) AS It(m)
WHERE It.m IN (col1, col2, ...)
)
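
To try the idea out, here is a minimal sketch in Python using the standard sqlite3 module. The table name demo, the CTE name wanted, and the sample rows are made up for illustration. SQLite does not accept the "AS It(m)" column-alias form, so the value list is given as a CTE instead. Each item is written once and bound as a parameter.

```python
import sqlite3

def find_matches(conn, items):
    """Return rows where col1 or col2 equals any of the given items."""
    # One "(?)" per item builds the VALUES list; the items themselves are
    # bound as parameters and never spliced into the SQL text.
    values = ", ".join("(?)" for _ in items)
    sql = f"""
        WITH wanted(m) AS (VALUES {values})
        SELECT col3, col1, col4
        FROM demo AS t
        WHERE EXISTS (SELECT 1 FROM wanted WHERE wanted.m IN (t.col1, t.col2))
        ORDER BY col3
    """
    return conn.execute(sql, list(items)).fetchall()

# Hypothetical sample data, not from the original question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 TEXT, col2 TEXT, col3 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [("item1", "x", 1, "a"), ("y", "item2", 2, "b"), ("y", "z", 3, "c")],
)
matches = find_matches(conn, ["item1", "item2"])
print(matches)
```

The function assumes a non-empty list. With several hundred items you may also run into the driver's limit on bound parameters, which depends on the SQLite build.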

You can create a temp table to store all the values used inside the IN clause
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL DROP TABLE #Sample
Create table #Sample
(name varchar(20))
Insert into #Sample
values
('item1'),('Item2'),....
SELECT col3,col1,col4 FROM table WHERE
col1 IN ( Select name from #Sample) OR col2 IN(Select name from #Sample)
Or, if you are using LINQ to SQL, you can store the Excel data in a collection and use the Contains method to query the DB:
var excelVal = new string[] { "item1","item2"... };
var result = from x in Table
where excelVal.Contains(x.Col1) || excelVal.Contains(x.Col2)
select x;
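
The temp table variant can be sketched the same way from client code. This is a rough illustration in Python with sqlite3, not the C#/SQL Server setup from the question. The table names demo and wanted and the sample rows are assumptions.

```python
import sqlite3

def find_matches_via_temp_table(conn, items):
    """Load the items into a temp table once, then use it in both predicates."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (name TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")  # start clean on repeated calls
    # executemany sends all values with one prepared statement;
    # OR IGNORE skips duplicate spreadsheet entries.
    conn.executemany(
        "INSERT OR IGNORE INTO wanted (name) VALUES (?)", [(i,) for i in items]
    )
    return conn.execute(
        """
        SELECT col3, col1, col4
        FROM demo
        WHERE col1 IN (SELECT name FROM wanted)
           OR col2 IN (SELECT name FROM wanted)
        ORDER BY col3
        """
    ).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 TEXT, col2 TEXT, col3 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [("item1", "x", 1, "a"), ("y", "item2", 2, "b"), ("y", "z", 3, "c")],
)
temp_matches = find_matches_via_temp_table(conn, ["item1", "item2", "item2"])
print(temp_matches)
```

The query text stays the same length no matter how many rows the spreadsheet has, which is the main attraction of this approach.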

Related

Use inserted value as a parameter for other inserts

There is a db2 database with two tables. The first one, table1, has autoincrement column ID. It is the foreign key for the table2.
I am writing an HTML generator for SQL queries. Given some input parameters, it generates one or more queries. It is not connected to the database.
What I need is to get that autoincrement field and use it in next queries.
So basically, the scenario is:
insert into table1;
select autogenerated field ID;
insert into table2 using that ID;
insert into table2 using that ID;
...some more similar inserts...
insert into table2 using that ID;
And all that SQL query should be generated and then used as a single SQL script.
I was thinking about something like this:
SELECT ID FROM FINAL TABLE (INSERT INTO Table1 (t1column1, t1column2, etc.)
VALUES (t1value1, t1value2, etc.))
But I don't know how to write the result into a variable so that I can use it in the following queries, like this:
INSERT INTO Table2 (foreignKeyCol, t2column1, t2column2, etc.)
VALUES ($ID, t2value1, t2value2, etc.)
I could just paste that select instead of $ID, but the second query can be used several times with the same $ID and different values.
EDIT: DB2 10.5 on Linux.

You can chain several inserts together using CTEs, like so:
WITH idcte (id) as (
SELECT ID FROM FINAL TABLE (
INSERT INTO Table1 (t1column1, t1column2, etc.)
VALUES (t1value1, t1value2, etc.)
)
),
ins1 (id) as (
SELECT foreignKeyCol FROM FINAL TABLE (
INSERT INTO Table2 (foreignKeyCol, t2column1, t2column2, etc.)
SELECT id, t2value1, t2value2, etc.
FROM idcte
)
),
-- more CTEs
SELECT foreignKeyCol FROM FINAL TABLE (
-- your last INSERT ... SELECT FROM
)
Essentially you will have to wrap each INSERT into a SELECT FROM FINAL TABLE for this to work.
Alternatively, you can use a global variable to keep the ID value:
CREATE VARIABLE myNewId INT;
SET myNewId = (SELECT ID FROM FINAL TABLE (
INSERT INTO Table1 (t1column1, t1column2, etc.)
VALUES (t1value1, t1value2, etc.)
));
INSERT INTO Table2 (foreignKeyCol, t2column1, t2column2, etc.)
VALUES (myNewId, t2value1, t2value2, etc.);
DROP VARIABLE myNewId;
This assumes a recent version of Db2 for LUW.
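
The "capture the key once, reuse it in every later insert" pattern can be tried without a Db2 instance. The sketch below uses Python's sqlite3, so the syntax differs from Db2: SQLite has no FINAL TABLE or CREATE VARIABLE, and a one-row temp table (hypothetically named new_id) stands in for the variable. Column names and values are invented.

```python
import sqlite3

# A generated script: the key is read exactly once, right after the parent
# insert, because last_insert_rowid() changes with every later insert.
script = """
INSERT INTO table1 (name) VALUES ('parent');
CREATE TEMP TABLE new_id AS SELECT last_insert_rowid() AS id;
INSERT INTO table2 (parent_id, val) SELECT id, 'first' FROM new_id;
INSERT INTO table2 (parent_id, val) SELECT id, 'second' FROM new_id;
DROP TABLE new_id;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("CREATE TABLE table2 (parent_id INTEGER REFERENCES table1(id), val TEXT)")
# An earlier row, so the key generated by the script is 2 rather than 1.
conn.execute("INSERT INTO table1 (name) VALUES ('earlier row')")

conn.executescript(script)
children = conn.execute(
    "SELECT parent_id, val FROM table2 ORDER BY val"
).fetchall()
print(children)
```

As with the global variable above, the whole thing is plain script text, so a generator that is not connected to the database can emit it as is.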

Hive - getting the column names count of a table

How can I get the number of columns in a Hive table using HQL? I know we can use describe tablename to get the column names. How do we get the count?

create table mytable(i int,str string,dt date, ai array<int>,strct struct<k:int,j:int>);
select count(*)
from (select transform ('')
using 'hive -e "desc mytable"'
as col_name,data_type,comment
) t
;
5
Some additional playing around:
create table mytable (id int,first_name string,last_name string);
insert into mytable values (1,'Dudu',null);
select size(array(*)) from mytable limit 1;
This is not bulletproof, since not every combination of column types can be combined into an array.
It also requires that the table contains at least 1 row.
Here is a more complex but also more robust solution (type-wise), which also requires that the table contains at least 1 row:
select size(str_to_map(val)) from (select transform (struct(*)) using 'sed -r "s/.(.*)./\1/"' as val from mytable) t;
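
If the count is needed in client code rather than inside HQL, another option is to read the column metadata that a Python DB-API cursor reports. The sketch below runs against sqlite3 only. Whether a particular Hive driver fills cursor.description for an empty result is an assumption you would need to verify. The helper name column_count is made up.

```python
import sqlite3

def column_count(conn, table):
    """Count the columns of a table from the cursor's result metadata."""
    # LIMIT 0 fetches no rows, so the table may be empty.
    # The table name is interpolated into the SQL, so it must come from
    # a trusted source, never from user input.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    return len(cur.description)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, first_name TEXT, last_name TEXT)")
n_columns = column_count(conn, "mytable")
print(n_columns)
```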

SQL Query to return rows even if it is not present in the table

This is a specific problem.
I have an excel sheet containing data. Similar data is present in a relational database table. Some rows may be absent or some additional rows may be present. The goal is to verify the data in the excel sheet with the data in the table.
I have the following query
Select e_no, start_dt,end_dt
From MY_TABLE
Where e_no In
(20231, 457)
In this case, e_no 457 is not present in the database (and hence is not returned). But I want my query to return a row even if it is not present: (457, null, null). How do I do that?

For SQL Server: use a temporary table or a table variable and left join MY_TABLE with it.
Declare @Temp Table (e_no int)
Insert into @Temp
Values (20231), (457)
Select t.e_no, m.start_dt, m.end_dt
From @Temp t left join MY_TABLE m on t.e_no = m.e_no
If the values you are passing arrive as a CSV list, use a split function to insert them into @Temp.
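
A small runnable sketch of the same left join, using Python's sqlite3 in place of SQL Server. The temp table name wanted and the dates are invented sample data.

```python
import sqlite3

def verify(conn, wanted_ids):
    """Return one row per requested e_no, with NULLs where MY_TABLE has no match."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (e_no INTEGER)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(i,) for i in wanted_ids])
    # The list of requested ids is the left side, so every id survives the join.
    return conn.execute(
        """
        SELECT w.e_no, m.start_dt, m.end_dt
        FROM wanted AS w
        LEFT JOIN MY_TABLE AS m ON m.e_no = w.e_no
        ORDER BY w.e_no
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (e_no INTEGER, start_dt TEXT, end_dt TEXT)")
conn.execute("INSERT INTO MY_TABLE VALUES (20231, '2020-01-01', '2020-12-31')")
report = verify(conn, [20231, 457])
print(report)
```

Rows whose start_dt and end_dt come back as None are the ones missing from the database.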

Why not simply populate a temporary table in the database from your spreadsheet and join against that? Any other solution is probably going to be both more work and more difficult to maintain.

You can also do it this way with a UNION
Select
e_no, start_dt ,end_dt
From MY_TABLE
Where e_no In (20231, 457)
UNION
Select 457, null, null

How to combine IN operator with LIKE condition (or best way to get comparable results)

I need to select rows where a field begins with one of several different prefixes:
select * from table
where field like 'ab%'
or field like 'cd%'
or field like 'ef%'
or...
What is the best way to do this using SQL in Oracle or SQL Server? I'm looking for something like the following statements (which are incorrect):
select * from table where field like in ('ab%', 'cd%', 'ef%', ...)
or
select * from table where field like in (select foo from bar)
EDIT:
I would like to see how this is done either with all the prefixes given in one SELECT statement, or with all the prefixes stored in a helper table.
Length of the prefixes is not fixed.

Joining your prefix table with your actual table would work in both SQL Server & Oracle.
DECLARE @Table TABLE (field VARCHAR(32))
DECLARE @Prefixes TABLE (prefix VARCHAR(32))
INSERT INTO @Table VALUES ('ABC')
INSERT INTO @Table VALUES ('DEF')
INSERT INTO @Table VALUES ('ABDEF')
INSERT INTO @Table VALUES ('DEFAB')
INSERT INTO @Table VALUES ('EFABD')
INSERT INTO @Prefixes VALUES ('AB%')
INSERT INTO @Prefixes VALUES ('DE%')
SELECT t.*
FROM @Table t
INNER JOIN @Prefixes pf ON t.field LIKE pf.prefix
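
The same join can be run as a quick sketch with Python's sqlite3, using ordinary tables (hypothetically named data and prefixes) instead of T-SQL table variables and the sample values from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (field TEXT)")
conn.execute("CREATE TABLE prefixes (prefix TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("ABC",), ("DEF",), ("ABDEF",), ("DEFAB",), ("EFABD",)],
)
# The wildcard is stored with the prefix, as in the answer above.
conn.executemany("INSERT INTO prefixes VALUES (?)", [("AB%",), ("DE%",)])

# DISTINCT guards against a row appearing twice when it matches two prefixes.
prefixed = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT d.field
        FROM data AS d
        JOIN prefixes AS p ON d.field LIKE p.prefix
        ORDER BY d.field
        """
    )
]
print(prefixed)
```

Note that LIKE in SQLite is case-insensitive for ASCII by default, which may differ from your server's collation.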

You can try a regular expression (REGEXP_LIKE is an Oracle function):
SELECT * from table where REGEXP_LIKE ( field, '^(ab|cd|ef)' );
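
To experiment with the pattern outside Oracle, here is a sketch using Python's sqlite3. SQLite has no built-in regex function, so the code registers one backed by Python's re module. The table name data and the sample values are assumptions.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE data (field TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)", [("abc",), ("cdx",), ("xab",), ("efg",)]
)

# One anchored alternation covers all prefixes, whatever their lengths.
regex_hits = [
    row[0]
    for row in conn.execute(
        "SELECT field FROM data WHERE field REGEXP ? ORDER BY field",
        ("^(ab|cd|ef)",),
    )
]
print(regex_hits)
```

If the prefixes come from user input, pass each one through re.escape before joining them with "|".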

If your prefix is always two characters, could you not just use the SUBSTRING() function to get the first two characters of "field", and then see if it's in the list of prefixes?
select * from table
where SUBSTRING(field, 1, 2) IN (prefix1, prefix2, prefix3...)
That would be "best" in terms of simplicity, if not performance. Performance-wise, you could create an indexed virtual column that generates your prefix from "field", and then use the virtual column in your predicate.

Depending on the size of the dataset, the REGEXP solution may or may not be the right answer. If you're trying to get a small slice of a big dataset,
select * from table
where field like 'ab%'
or field like 'cd%'
or field like 'ef%'
or...
may be rewritten behind the scenes as
select * from table
where field like 'ab%'
union all
select * from table
where field like 'cd%'
union all
select * from table
where field like 'ef%'
That means three index scans instead of one full scan.
If you know you're only going after the first two characters, creating a function-based index could be a good solution as well. If you really really need to optimize this, use a global temporary table to store the values of interest, and perform a semi-join between them:
select * from data_table
where transform(field) in (select pre_transformed_field
from my_where_clause_table);

You can also try it like this. Here tmp is a derived table populated with the required prefixes. It's a simple way, and it does the job.
select * from emp join
(select 'ab%' as Prefix
union
select 'cd%' as Prefix
union
select 'ef%' as Prefix) tmp
on emp.Name like tmp.Prefix

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?

SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
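
When the IDs come from client code, the IN list can be generated with one placeholder per ID, so the values are bound and never concatenated into the SQL. A minimal sketch in Python with sqlite3, where the function name fetch_by_ids, the name column, and the sample rows are made up:

```python
import sqlite3

def fetch_by_ids(conn, ids):
    """Fetch all rows whose id is in the given list."""
    if not ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    placeholders = ", ".join("?" * len(ids))
    sql = (
        f"SELECT id, name FROM some_table "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(ids)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("1001", "a"), ("2002", "b"), ("4004", "d")],
)
found = fetch_by_ids(conn, ["1001", "2002", "3003"])
print(found)
```

Most drivers cap the number of bound parameters per statement, so for very long lists a temp table or a join becomes the safer choice.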

The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id

See Arrays and Lists in SQL Server 2005

ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)

For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';

In Oracle, I always put the id's into a TEMPORARY TABLE to perform massive SELECT's and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT);
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.

We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(@IDs) AS [IDs]
ON [IDs].id = myTable.id
END
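
The body of fn_split_list is not shown here, so the following is only one possible way to do the split, sketched in Python with sqlite3 rather than T-SQL. A recursive CTE (hypothetically named split) peels one ID at a time off the comma-delimited string, and the result is joined just like the function's output. Table and column names are assumptions.

```python
import sqlite3

SPLIT_JOIN = """
WITH RECURSIVE split(id, rest) AS (
    -- Seed row: nothing extracted yet; a trailing comma terminates the last id.
    SELECT NULL, ? || ','
    UNION ALL
    -- Each step takes the text before the first comma and keeps the remainder.
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT t.id, t.name
FROM my_table AS t
JOIN split AS s ON s.id <> '' AND t.id = CAST(s.id AS INTEGER)
ORDER BY t.id
"""

def fetch_by_csv(conn, csv_ids):
    """Join my_table against the ids in a comma-delimited string."""
    return conn.execute(SPLIT_JOIN, (csv_ids,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 6)]
)
csv_rows = fetch_by_csv(conn, "2,4,9")
print(csv_rows)
```

The whole list travels as a single bound parameter, so the statement text never changes, but the server still pays the parsing cost described above.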

The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (id), and all of its values would be unique.
