SQL to filter by multiple criteria including containment in string list

SQL to filter by multiple criteria including containment in string list - sql

so i have a table lets say call it "tbl.items" and there is a column "title" in "tbl.items" i want to loop through each row and for each "title" in "tbl.items" i want to do following:
the column has the datatype nvarchar(max) and contains a string...
filter the string to remove words like in,out, where etc (stopwords)
compare the rest of the string to a predefined list and if there is a match perform some action which involves inserting data in other tables as well..
the problem is im ignotent when it comes to writing T-sql scripts, plz help and guide me how can i achieve this?
whether it can be achieved by writing a sql script??
or i have to develope a console application in c# or anyother language??
im using mssql server 2008
thanks in advance

You want a few things. First, look up SQL Server's syntax for functions, and write something like this:
-- Warning! Code written off the top of my head,
-- don't expect this to work w/copy-n-paste
create function removeStrings(#input nvarchar(4000))
as begin
-- We're being kind of simple-minded and using strings
-- instead of regular expressions, so we are assuming a
-- a space before and after each word. This makes this work better:
#input = ' ' + #input
-- Big list of replaces
#input = replace(' in ','',#input)
#input = replace(' out ','',#input)
--- more replaces...
end
Then you need your list of matches in a table, call this "predefined" with a column "matchString".
Then you can retrieve the matching rows with:
select p.matchString
from items i
join predefined p
on removeStrings(i.title) = p.matchString
Once you have those individual pieces working, I suggest a new question on what particular process you may be doing with them.
Warning: Not knowing how many rows you have or how often you have to do this (every time a user saves something? Once/day?), this will not exactly be zippy, if you know what I mean. So once you have these building blocks in hand, there may also be a follow-up question for how and when to do it.

Related

SQL DB2 - How to SELECT or compare columns based on their name?

Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have (for example), 5 different surveys. They have many questions each, but SOME of the questions are part of the same metric, and we need to create a generic field that keeps it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to kind of solve this with either COALESCE() or CASE statements but the challenge is that, as more surveys/survey versions continue to grow, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC" for example, and we'll need them BOTH on this same statement. I want to create something where having to come into the SQL and make changes isn't needed. Given that the columns will ALWAYS be named "something#_REC" for this specific metric, is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall

Table and column information in Db2 are managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname from syscat.tables
where colname like some_expression
and syscat.tabname='MYTABLE
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function. So you could match it against some dynamic input.
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.

Hide Empty columns

I got a table with 75 columns,. what is the sql statement to display only the columns with values in in ?
thanks

It's true that a similar statement doesn't exist (in a SELECT you can use condition filters only for the rows, not for the columns). But you could try to write a (bit tricky) procedure. It must check which are the columns that contains at least one not NULL/empty value, using queries. When you get this list of columns just join them in a string with a comma between each one and compose a query that you can run, returning what you wanted.
EDIT: I thought about it and I think you can do it with a procedure but under one of these conditions:
find a way to retrieve column names dynamically in the procedure, that is the metadata (I never heard about it, but I'm new with procedures)
or hardcode all column names (loosing generality)
You could collect column names inside an array, if stored procedures of your DBMS support arrays (or write the procedure in a programming language like C), and loop on them, making a SELECT each time, checking if it's an empty* column or not. If it contains at least one value concatenate it in a string where column names are comma-separated. Finally you can make your query with only not-empty columns!
Alternatively to stored procedure you could write a short program (eg in Java) where you can deal with a better flexibility.
*if you check for NULL values it will be simple, but if you check for empty values you will need to manage with each column data type... another array with data types?

I would suggest that you write a SELECT statement and define which COLUMNS you wish to display and then save that QUERY as a VIEW.
This will save you the trouble of typing in the column names every time you wish to run that query.
As marc_s pointed out in the comments, there is no select statement to hide columns of data.

You could do a pre-parse and dynamically create a statement to do this, but this would be a very inefficient thing to do from a SQL performance perspective. Would strongly advice against what you are trying to do.

A simplified version of this is to just select the relevant columns, which was what I needed personally. A quick search of what we're dealing with in a table
SELECT * FROM table1 LIMIT 10;
-> shows 20 columns where im interested in 3 of them. Limit is just to not overflow the console.
SELECT column1,column3,colum19 FROM table1 WHERE column3='valueX';
It is a bit of a manual filter but it works for what I need.

SQL Server stored procedures - update column based on variable name..?

I have a data driven site with many stored procedures. What I want to eventually be able to do is to say something like:
For Each #variable in sproc inputs
UPDATE #TableName SET #variable.toString = #variable
Next
I would like it to be able to accept any number of arguments.
It will basically loop through all of the inputs and update the column with the name of the variable with the value of the variable - for example column "Name" would be updated with the value of #Name. I would like to basically have one stored procedure for updating and one for creating. However to do this I will need to be able to convert the actual name of a variable, not the value, to a string.
Question 1: Is it possible to do this in T-SQL, and if so how?
Question 2: Are there any major drawbacks to using something like this (like performance or CPU usage)?
I know if a value is not valid then it will only prevent the update involving that variable and any subsequent ones, but all the data is validated in the vb.net code anyway so will always be valid on submitting to the database, and I will ensure that only variables where the column exists are able to be submitted.
Many thanks in advance,
Regards,
Richard Clarke
Edit:
I know about using SQL strings and the risk of SQL injection attacks - I studied this a bit in my dissertation a few weeks ago.
Basically the website uses an object oriented architecture. There are many classes - for example Product - which have many "Attributes" (I created my own class called Attribute, which has properties such as DataField, Name and Value where DataField is used to get or update data, Name is displayed on the administration frontend when creating or updating a Product and the Value, which may be displayed on the customer frontend, is set by the administrator. DataField is the field I will be using in the "UPDATE Blah SET #Field = #Value".
I know this is probably confusing but its really complicated to explain - I have a really good understanding of the entire system in my head but I cant put it into words easily.
Basically the structure is set up such that no user will be able to change the value of DataField or Name, but they can change Value. I think if I were to use dynamic parameterised SQL strings there will therefore be no risk of SQL injection attacks.
I mean basically loop through all the attributes so that it ends up like:
UPDATE Products SET [Name] = '#Name', Description = '#Description', Display = #Display
Then loop through all the attributes again and add the parameter values - this will have the same effect as using stored procedures, right??
I dont mind adding to the page load time since this is mainly going to affect the administration frontend, and will marginly affect the customer frontend.

Question 1: you must use dynamic SQL - construct your update statement as a string, and run it with the EXEC command.
Question 2: yes there are - SQL injection attacks, risk of mal-formed queries, added overhead of having to compile a separate SQL statement.

Your example is very inefficient, so if I pass in 10 columns you will update the same table 10 times?
The better way is to do one update by using sp_executesql and build this dynamically, take a look at The Curse and Blessings of Dynamic SQL to see how you have to do it

Is this a new system where you have the freedom to design as necessary, or are you stuck with an existing DB design?
You might consider representing the attributes not as columns, but as rows in a child table.
In the parent MyObject you'd just have header-level data, things that are common to all objects in the system (maybe just an identifier). In the child table MyObjectAttribute you'd have a primary key of with another column attrValue. This way you can do an UPDATE like so:
UPDATE MyObjectAttribute
SET attrValue = #myValue
WHERE objectID = #myID
AND attrName = #myAttrName

Sql Optimization: Xml or Delimited String

This is hopefully just a simple question involving performance optimizations when it comes to queries in Sql 2008.
I've worked for companies that use Stored Procs a lot for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic Sql that concatinates a string and executes it.
EXEC('SELECT * FROM TableX WHERE xId IN (' + #Parameter + ')'
Using a user defined function to split a delimited string into a table
SELECT * FROM TableY INNER JOIN SPLIT(#Parameter) ON yID = splitId
USING XML as the Parameter instead of a delimited varchar value
SELECT * FROM TableZ JOIN #Parameter.Nodes(xpath) AS x (y) ON ...
While I know creating the dynamic sql in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last 2 examples. Is it more proficient to do the due diligence in my code to pass such lists via XML as in snippet 3 or is it better to just delimit the values and use an udf to take care of it?

There is now a 4th option - table valued parameters, whereby you can actually pass a table of values in to a sproc as a parameter and then use that as you would normally a table variable. I'd be preferring this approach over the XML (or CSV parsing approach)
I can't quote performance figures between all the different approaches, but that's one I'd be trying - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values in to your sproc, you just define a SqlParameter (SqlDbType.Structured) - the value of this can be set to any IEnumerable, DataTable or DbDataReader source. So presumably, you already have the list of values in a list/array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL performs best at set based/not looping/non string manipulation activities.
That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.

UDF invocation is a little bit more costly than splitting the XML using the built-in function.
However, this only needs to be done once per query, so the performance difference will be negligible.

SQL to search and replace in mySQL

In the process of fixing a poorly imported database with issues caused by using the wrong database encoding, or something like that.
Anyways, coming back to my question, in order to fix this issues I'm using a query of this form:
UPDATE table_name SET field_name =
replace(field_name,’search_text’,'replace_text’);
And thus, if the table I'm working on has multiple columns I have to call this query for each of the columns. And also, as there is not only one pair of things to run the find and replace on I have to call the query for each of this pairs as well.
So as you can imagine, I end up running tens of queries just to fix one table.
What I was wondering is if there is a way of either combine multiple find and replaces in one query, like, lets say, look for this set of things, and if found, replace with the corresponding pair from this other set of things.
Or if there would be a way to make a query of the form I've shown above, to run somehow recursively, for each column of a table, regardless of their name or number.
Thank you in advance for your support,
titel

Let's try and tackle each of these separately:
If the set of replacements is the same for every column in every table that you need to do this on (or there are only a couple patterns), consider creating a user-defined function that takes a varchar and returns a varchar that just calls replace(replace(#input,'search1','replace1'),'search2','replace2') nested as appropriate.
To update multiple columns at the same time you should be able to do UPDATE table_name SET field_name1 = replace(field_name1,...), field_name2 = replace(field_name2,...) or something similar.
As for running something like that for every column in every table, I'd think it would be easiest to write some code which fetches a list of columns and generates the queries to execute from that.

I don't know of a way to automatically run a search-and-replace on each column, however the problem of multiple pairs of search and replace terms in a single UPDATE query is easily solved by nesting calls to replace():
UPDATE table_name SET field_name =
replace(
replace(
replace(
field_name,
'foo',
'bar'
),
'see',
'what',
),
'I',
'mean?'
)

If you have multiple replaces of different text in the same field, I recommend that you create a table with the current values and what you want them replaced with. (Could be a temp table of some kind if this is a one-time deal; if not, make it a permanent table.) Then join to that table and do the update.
Something like:
update t1
set field1 = t2.newvalue
from table1 t1
join mycrossreferncetable t2 on t1.field1 = t2.oldvalue
Sorry didn't notice this is MySQL, the code is what I would use in SQL Server, my SQL syntax may be different but the technique would be similar.

I wrote a stored procedure that does this. I use this on a per database level, although it would be easy to abstract it to operate globally across a server.
I would just paste this inline, but it would seem that I'm too dense to figure out how to use the markdown deal, so the code is here:
http://www.anovasolutions.com/content/mysql-search-and-replace-stored-procedure

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas