Easy way to transfer/update a list of numbers to be used in the SQL 'in' command? - sql

I'm always being given a large list of say id's which I need to search in our database have manually put them into a sql statement like the follow which can take a while putting single quotes around each number followed by a comma, I was hoping someone has a easy way of doing this for me? Or am I just being a bit lazy...
select * from blah where idblah in ('1234-A', '1235-A', '1236-A' ................)

You can use the worlds' simplest code generator.
Just paste in the list of values, setup the pattern and voila... you have a set of quoted values.
I have also used Excel in the past, using the CONCAT function with smart paste.

I would set aside a table to hold the values and have my queries JOIN against that table. Set up a simple import script (don't forget to clear out the table at the start) and something like this is a breeze. Run the import, run the query. You never have to touch the query again or regenerate any code.
As an example:
CREATE TABLE Search_ID_List (
id VARCHAR(20) NOT NULL,
CONSTRAINT PK_Search_ID_List PRIMARY KEY CLUSTERED (id)
)
and:
SELECT
<column list>
FROM
Search_ID_List SIL
INNER JOIN Blah B ON
B.id = SIL.id
If you want to be able to save past search criteria or have multiple searches available to you at the same time then you can just add an identifying column which gets filled in by your import. It can be the file from where the ids came, some descriptive code/name, or whatever. Then just add that to the WHERE clause of your query and you're all set.

You could do something like this.
select * from blah where ',' + '1234-A,1235-A,1236-A' + ',' LIKE ',%' + idblah + '%,'
This pattern is super useful when you're being passed a comma delimited list of values to filter by, but I think would be applicable here as well.

Related

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
On obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure the exact syntax/functions in Snowflake but generally speaking there's a few different ways of achieving this.
I guess the general approach that would work universally is using the SUBSTRING function that's available in any database.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
Code | Desc
+-------+-----------------------------------------+
0001 | 1First Character Will be Removed
0002 | xCharacter to be Removed
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the "SUBSTRING" function may vary according to different databases. In Oracle for example the function is "SUBSTR". You just need to find the Snowflake correspondent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;

SQL Checking Substring in a String

I have a table with column mapping which store record: "IV=>0J,IV=>0Q,IV=>2,V=>0H,V=>0K,VI=>0R,VI=>1,"
What is the sql to check whether or not a substring is in column mapping.
so, I would like this:
if I have "IV=>0J" would return true, because IV=>0J is exact in string "mapping"
if I have "IV=>01" would return false. And so on...
I try this:
SELECT * FROM table WHERE charindex('IV=>0J',mapping)
But when I have "IV=>0", it returns TRUE. But, it should return FALSE.
Thank You..
You can search with commas included. Just also add one at beginning and end of mapping:
SELECT * FROM table WHERE charindex(',IV=>0J,',',' + mapping + ',') <> 0
or
SELECT * FROM table WHERE ',' + mapping + ',' LIKE '%,IV=>OJ,%'
This should do the trick:
SELECT * FROM table
WHERE
mapping LIKE '%,IV=>0J,%'
OR mapping LIKE '%,IV=>0J'
OR mapping LIKE 'IV=>0J,%'
OR mapping = 'IV=>0J'
But you should really normalize the database - you are currently violating the principle of atomicity, and therefore the 1NF. Your current difficulties in querying and the future difficulties with performance that you are about to encounter all stem from this root problem...
While you can search by including a comma in the string, this is a bad design for several reasons.
You are unable to take advantage of indexing
You force a full scan of the table, which will lead to bad performance AND excessive blocking.
You have to make sure that there is always a leading or a trailing comma (depends on what you expect in your LIKE expression).
You are no longer able to edit a single entry, you'll have to replace the entire string each time you want to change even a single mapping.
You open yourself to a concurrency nightmare if more that one users try to update different mappings that just happen to be stored in the same column.
Your table isn't even in 1st normal form any more, which is why you have such difficulties
You should normalize your mapping column, by extracting the data to a different mapping table, with at least the From and To columns you require. You can then add these columns to an index an convert your query using only a single index seek.
You can also add the ID values of your source table to the Mappings table and the index. This will allow you to convert the lookup for a source row to a join between the two tables that takes advantage of indexing
charindex returns the position of the text, not Boolean.
to check if the text exists, compare to 0:
SELECT * FROM table WHERE charindex('IV=>0J',mapping) <> 0
I think you're missing something here, the Charindex function does not return TRUE or FALSE.
It returns the starting point of the substring inside master string, or if the substring is not present, then -1.
So you query should read,
SELECT * FROM table WHERE charindex('IV=>0J',mapping) > 0

Matching sub string in a column

First I apologize for the poor formatting here.
Second I should say up front that changing the table schema is not an option.
So I have a table defined as follows:
Pin varchar
OfferCode varchar
Pin will contain data such as:
abc,
abc123
OfferCode will contain data such as:
123
123~124~125
I need a query to check for a count of a Pin/OfferCode combination and when I say OfferCode, I mean an individual item delimited by the tilde.
For example if there is one row that looks like abc, 123 and another that looks like abc,123~124, and I search for a count of Pin=abc,OfferCode=123 I wand to get a count = 2.
Obviously I can do a similar query to this:
SELECT count(1) from MyTable (nolock) where OfferCode like '%' + #OfferCode + '%' and Pin = #Pin
using like here is very expensive and I'm hoping there may be a more efficient way.
I'm also looking into using a split string solution. I have a Table-valued function SplitString(string,delim) that will return table OutParam, but I'm not quite sure how to apply this to a table column vs a string. Would this even be worth wile pursuing? It seems like it would be much more expensive, but I'm unable to get a working solution to compare to the like solution.
Your like/% solution is open to a bug if you had offer codes other than 3 digits (if there was offer code 123 and 1234, searching for like '%123%' would return both, which is wrong). You can use your string function this way:
SELECT Pin, count(1)
FROM MyTable (nolock)
CROSS APPLY SplitString(OfferCode,'~') OutParam
WHERE OutParam.Value = #OfferCode and Pin = #Pin
GROUP BY Pin
If you have a relatively small table you can probably get away with this. If you are working with a large number of rows or encountering performance problems, it would be more effective to normalize it as RedFilter suggested.
using like here is very expensive and I'm hoping there may be a more efficient way
The efficient way is to normalize the schema and put each OfferCode in its own row.
Then your query is more like (although you may need to use an intersection table depending on your schema):
select count(*)
from MyTable
where OfferCode = #OfferCode
and Pin = #Pin
Here is one way to use like for this problem, which is standard for getting exact matches when searching delimited strings while avoiding the '%123%' matches '123' and '1234' problem:
-- Create some test data
declare #table table (
Pin varchar(10) not null
, OfferCode varchar(100) not null
)
insert into #table select 'abc', '123'
insert into #table select 'abc', '123~124'
-- Mock some proc params
declare #Pin varchar(10) = 'abc'
declare #OfferCode varchar(10) = '123'
-- Run the actual query
select count(*) as Matches
from #table
where Pin = #Pin
-- Append delimiters to find exact matches
and '~' + OfferCode + '~' like '%~' + #OfferCode + '~%'
As you can see, we're adding the delimiters to the searched string, and also the search string in order to find matches, thus avoiding the bugs mentioned by other answers.
I highly doubt that a string splitting function will yield better performance over like, but it may be worth a test or two using some of the more recently suggested methods. If you still have unacceptable performance, you have a few options:
Updated:
Try an index on OfferCode (or on a computed persisted column of '~' + OfferCode + '~'). Contrary to the myth that SQL Server won't use an index with like and wildcards, this might actually help.
Check out full text search.
Create a normalized version of this table using a string splitter. Use this table to run your counts. Update this table according to some schedule or event (trigger, etc.).
If you have some standard search terms, pre-calculate the counts for these and store them on some regular basis.
Actually, the LIKE condition is going to have much less cost than doing any sort of string manipulation and comparison.
http://www.simple-talk.com/sql/performance/the-seven-sins-against-tsql-performance/

Sql Server Contains

I need to match on a partial string but can't turn full-text indexing on so can't use contains. I've looked at Levenstein's function for determining the distance between two strings but I'm not looking for fuzzy matching but that every character in the column exists in the string.
I.e. If the string being passed is something like AB_SYS_20120430.TXT I want to match on any columns containing AB_SYS. The like predicate isn't getting me there. I really need the equivalent of the .NET contains feature but as mentioned turning on full text indexing isn't an option to be turned on. Thought I would see if there were any other possible work arounds.
Are you looking for the LIKE function?
http://www.w3schools.com/sql/sql_like.asp
... WHERE MyColumn LIKE '%AB_SYS%'
That may not be optimal, but it seems like it answers your question... If you can search from only the left or right side that could further optimize.
That is functionally similar to String.Contains
http://msdn.microsoft.com/en-us/library/dy85x1sa.aspx
EDIT: How will you parse the input text into the "relevant" substring?
EDIT: To search the same LIKE condition but reverse, from your partial column to the complete literal, simply append the wildcard characters:
... WHERE 'AB_SYS_20120430.TXT' LIKE '%' + MyColumn + '%'
EDIT: You have suggested that you can't get it to work. If you add the schema do your question then I can help you further but consider this:
You have a table called MyTable
In that table there is a column called MyColumn
Some rows in that table have the data 'AB_SYS' in MyColumn
Given the parameter 'AB_SYS_20120430.TXT' you want to return all matching rows
CREATE PROCEDURE MyTestProcedure
#pFullNameString nvarchar(4000) = '' -- parameter passed in, like AB_SYS_20120430.TXT
AS
BEGIN
SELECT
*
FROM
MyTable
WHERE
#pFullNameString LIKE '%' + MyTable.[MyColumn] + '%'
END
GO
You could use CHARINDEX
WHERE CHARINDEX(StringToCheckFor, StringToCheckIn) > 0

How can I run a query on IDs in a string?

I have a table A with this column:
IDS(VARCHAR)
1|56|23
I need to run this query:
select TEST from TEXTS where ID in ( select IDS from A where A.ID = xxx )
TEXTS.ID is an INTEGER. How can I split the string A.IDS into several ints for the join?
Must work on MySQL and Oracle. SQL99 preferred.
First of all, you should not store data like this in a column. You should split that out into a separate table, then you would have a normal join, and not this problem.
Having said that, what you have to do is the following:
Convert the number to a string
Pad it with the | (your separator) character, before it, and after it (I'll tell you why below)
Pad the text you're looking in with the same separator, before and after
Do a LIKE on it
This will run slow!
Here's the SQL that does what you want (assuming all the operators and functions work in your SQL dialect, you don't say what kind of database engine this is):
SELECT
TEXT -- assuming this was misspelt?
FROM
TEXTS -- and this as well?
JOIN A ON
'|' + A.IDS + '|' LIKE '%|' + CONVERT(TEXTS.ID) + '|%'
The reason why you need to pad the two with the separator before and after is this: what if you're looking for the number 5? You need to ensure it wouldn't accidentally fit the 56 number, just because it contained the digit.
Basically, we will do this:
... '|1|56|23|' LIKE '%|56|%'
If there is ever only going to be 1 row in A, it might run faster if you do this (but I am not sure, you would need to measure it):
SELECT
TEXT -- assuming this was misspelt?
FROM
TEXTS -- and this as well?
WHERE
(SELECT '|' + IDS + '|' FROM A) LIKE '%|' + CONVERT(TEXTS.ID) + '|%'
If there are many rows in your TEXTS table, it will be worth the effort to add code to generate the appropriate SQL by first retrieving the values from the A table, construct an appropriate SQL with IN and use that instead:
SELECT
TEXT -- assuming this was misspelt?
FROM
TEXTS -- and this as well?
WHERE
ID IN (1, 56, 23)
This will run much faster since now it can use an index on this query.
If you had A.ID as a column, and the values as separate rows, here's how you would do the query:
SELECT
TEXT -- assuming this was misspelt?
FROM
TEXTS -- and this as well?
INNER JOIN A ON TEXTS.ID = A.ID
This will run slightly slower than the previous one, but in the previous one you have overhead in having to first retrieve A.IDS, build the query, and risk producing a new execution plan that has to be compiled.