Searching a column containing CSV data in a MySQL table for existence of input values - sql

I have a table say, ITEM, in MySQL that stores data as follows:
ID FEATURES
--------------------
1 AB,CD,EF,XY
2 PQ,AC,A3,B3
3 AB,CDE
4 AB1,BC3
--------------------
As an input, I will get a CSV string, something like "AB,PQ". I want to get the records that contain AB or PQ. I realized that we have to write a MySQL function to achieve this. So, if we had this magical function MATCH_ANY defined in MySQL, I would simply execute SQL as follows:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0
The above query would return the records 1, 2 and 3.
But I'm running into all sorts of problems while implementing this function as I realized that MySQL doesn't support arrays and there's no simple way to split strings based on a delimiter.
Remodeling the table is the last option for me as it involves a lot of issues.
I might also want to execute queries containing multiple MATCH_ANY functions such as:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0 and MATCH_ANY(FEATURES, "CDE") = 0
In the above case, we would get an intersection of records (1, 2, 3) and (3) which would be just 3.
Any help is deeply appreciated.
Thanks

First of all, the database should of course not contain comma-separated values, but hopefully you are already aware of this. If the table were normalised, you could easily get the items using a query like:
select distinct i.Itemid
from Item i
inner join ItemFeature f on f.ItemId = i.ItemId
where f.Feature in ('AB', 'PQ')
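For reference, a sketch of the normalised layout that query assumes (table and column names are illustrative, not taken from your schema):
CREATE TABLE ItemFeature (
    ItemId  INT         NOT NULL,   -- references Item.ItemId
    Feature VARCHAR(10) NOT NULL,
    PRIMARY KEY (ItemId, Feature)
);
-- row 1 of the original table becomes (1,'AB'), (1,'CD'), (1,'EF'), (1,'XY')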
You can match the strings in the comma separated values, but it's not very efficient:
select Id
from Item
where
instr(concat(',', Features, ','), ',AB,') <> 0 or
instr(concat(',', Features, ','), ',PQ,') <> 0
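If you keep the comma-separated column, MySQL's built-in FIND_IN_SET gives a slightly tidier per-value test than the concat/instr trick (same efficiency caveat, and it assumes no spaces after the commas):
select Id
from Item
where FIND_IN_SET('AB', Features) > 0
   or FIND_IN_SET('PQ', Features) > 0;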

For all you REGEXP lovers out there, I thought I would add this as a solution:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]';
and for a case-sensitive match:
SELECT * FROM ITEM WHERE FEATURES REGEXP BINARY '[[:<:]](AB|PQ)[[:>:]]';
For the second query:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]' AND FEATURES REGEXP '[[:<:]]CDE[[:>:]]';
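Note that MySQL 8.0 switched its regular expression library to ICU, which does not support the [[:<:]] / [[:>:]] markers; there you would write word boundaries as \b instead (doubled backslash inside a string literal):
SELECT * FROM ITEM WHERE FEATURES REGEXP '\\b(AB|PQ)\\b';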
Cheers!

select *
from ITEM
where CONCAT(',', FEATURES, ',') LIKE '%,AB,%'
or CONCAT(',', FEATURES, ',') LIKE '%,PQ,%'
or create a custom function to do your MATCH_ANY
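A minimal sketch of such a function, assuming the individual values never contain commas; note it returns 1 when any value matches, so the call convention differs slightly from the "= 0" in the question (you would write WHERE MATCH_ANY(FEATURES, 'AB,PQ') = 1):
DELIMITER //
CREATE FUNCTION MATCH_ANY(haystack TEXT, needles TEXT)
RETURNS INT DETERMINISTIC
BEGIN
    DECLARE needle TEXT;
    WHILE LENGTH(needles) > 0 DO
        -- take the first value from the needle list
        SET needle = SUBSTRING_INDEX(needles, ',', 1);
        IF FIND_IN_SET(needle, haystack) > 0 THEN
            RETURN 1;
        END IF;
        -- drop the first value and continue with the rest
        IF LOCATE(',', needles) = 0 THEN
            SET needles = '';
        ELSE
            SET needles = SUBSTRING(needles, LOCATE(',', needles) + 1);
        END IF;
    END WHILE;
    RETURN 0;
END//
DELIMITER ;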

Alternatively, consider using RLIKE:
select *
from ITEM
where CONCAT(',', FEATURES, ',') RLIKE ',AB,|,PQ,';

Just a thought:
Does it have to be done in SQL? This is the kind of thing you might normally expect to write in PHP or Python or whatever language you're using to interface with the database.
This approach means you can build your query string using whatever complex logic you need and then just submit a vanilla SQL query, rather than trying to build a procedure in SQL.
Ben

Related

How to get tally of unique words using only SQL?

How do you get a list of all unique words and their frequencies ("tally") in one text column of one table of an SQL database? An answer for any SQL dialect in which this is possible would be appreciated.
In case it helps, here's a one-liner that does it in Ruby using Sequel:
Hash[DB[:table].all.map{|r| r[:text]}.join("\n").gsub(/[\(\),]+/,'').downcase.strip.split(/\s+/).tally.sort_by{|_k,v| v}.reverse]
To give an example, for a table named table with 4 rows, each holding one line of the refrain of Dr. Dre's Still D.R.E. in a text field, the output would be:
{"the"=>4,
"still"=>3,
"them"=>3,
"for"=>2,
"d-r-e"=>1,
"it's"=>1,
"streets"=>1,
"love"=>1,
"got"=>1,
"i"=>1,
"and"=>1,
"beat"=>1,
"perfect"=>1,
"to"=>1,
"time"=>1,
"my"=>1,
"taking"=>1,
"girl"=>1,
"low-lows"=>1,
"in"=>1,
"corners"=>1,
"hitting"=>1,
"world"=>1,
"across"=>1,
"all"=>1,
"gangstas"=>1,
"representing"=>1,
"i'm"=>1}
This works, of course, but is naturally not as fast or elegant as doing it in pure SQL, though I have no clue whether that is even in the realm of possibilities...
I guess it would depend on what the SQL database looks like. You would first have to turn your 4-row "database" into a single-column table, with each row representing one word. To do that you could use something like STRING_SPLIT, with a space as the delimiter.
STRING_SPLIT('I''m representing for them gangstas all across the world', ' ')
https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/
This would turn it into a table where every word is a row.
Once you've set up your data table, then it's easy.
Your_table:
[Word]
I'm
representing
for
them
...
world
Then you can just write:
SELECT Word, count(*)
FROM your_table
GROUP BY Word;
Your output would be:
Word         | Count
I'm          | 1
representing | 1
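On SQL Server 2016 or later, the split and the tally can be combined in one statement; the table and column names here (lyrics, songline) are only assumptions for illustration:
SELECT s.value AS Word, COUNT(*) AS Tally
FROM lyrics
CROSS APPLY STRING_SPLIT(songline, ' ') AS s
GROUP BY s.value
ORDER BY Tally DESC;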
I had a play using XML in SQL Server. Just an idea :)
with cteXML as
(
select *
,cast('<wd>' + replace(songline,' ' ,'</wd><wd>') + '</wd>' as xml) as XMLsongline
from #tSongs
),cteBase as
(
select p.value('.','nvarchar(max)') as singleword
from cteXML as x
cross apply x.XMLsongline.nodes('/wd') t(p)
)
select b.singleword,count(b.singleword)
from cteBase as b
group by b.singleword

Convert strings into table columns in BigQuery

I would like to convert this table
to something like this
the long string can be dynamic, so it's important to me that it's not a fixed solution for these values specifically.
Please help, I'm using BigQuery.
You could start by using SPLIT(value[, delimiter]) to convert your long string into separate key-value pairs in an array.
Note that this will break if commas appear inside your values.
SPLIT(session_experiments, ',')
Then you could either FLATTEN that array or access each element, and then use some REGEXs to separate the key and the value.
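For instance, assuming the long string lives in a column called session_experiments (as in the snippet above), the source table is called your_table, and each pair is a key and a value joined by '-' (all assumptions), a sketch could look like:
SELECT
  id,
  REGEXP_EXTRACT(pair, r'^([^-]+)') AS experiment,  -- everything before the first '-'
  REGEXP_EXTRACT(pair, r'-(.+)$') AS value          -- everything after the first '-'
FROM your_table,
UNNEST(SPLIT(session_experiments, ',')) AS pair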
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
What you want isn't possible as such; however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like that
You can use that sample query to understand how to use it.
with rawdata AS
(
SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
id,
(select array_agg(struct(split(param, '-')[offset(0)] as experiment, split(param, '-')[offset(1)] as value)) from unnest(split(experiments)) as param ) as experiments
from rawdata
The output will look like this:
After having that output, it's more convenient to manipulate the data.

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only few are not numeric.
The problem is that it's not working well. For example, if a row contains the value '123456789' and I look for '567', it does not return this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
SQL Server 2012.
Thanks.
Full-text search doesn't support leading wildcards.
In my setup, these two return the same rows:
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach.
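For comparison, the plain LIKE fallback on the same (assumed) somelogtable does find the substring anywhere, but it cannot use the full-text index and will scan the table:
SELECT *
FROM [dbo].[somelogtable]
WHERE logmessage LIKE '%567%'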
The problem is probably that you are using the wrong tool, since full-text queries perform linguistic searches and it seems like you want a simple LIKE condition.
If you post the DDL + DML + desired result, you can get a solution tailored to your needs.
You can do this:
....your_query.... LIKE '%567%';
This will return all the rows that have the number 567 at the beginning, at the end, or somewhere in between.
99% likely you're missing a % before and after the string you search for in the LIKE clause,
e.g.:
SELECT * FROM t WHERE att LIKE '66'
is the same as using WHERE att = '66'
If you write:
SELECT * FROM t WHERE att LIKE '%66%'
it will return all the rows containing two sixes one after the other, anywhere in the value.

How to Filter WHERE Field Value LIKE any of the values stored in a Multi Value Parameter in SQL

I have a report (built using SSRS) that uses a multi-value parameter.
I want to add a Filter onto my SQL Query WHERE FieldA is LIKE any of the values stored in the parameter.
So FieldA might have the following values:
BOBJAMESLOUISE
MARYBOB
JENNY
JOHNLOUISEJAMES
BOB
JENNYJAMESMIKE
And @ParamA might have the following values:
Bob, Louise
Therefore in this example only records 1, 3, 4 and 5 should be returned
Thanks for any help in advance :)
P.S. I'm using SQL Server 2008
You will want to implement a function like the split function. This can take a comma separated value list and separate it into rows like you want.
Below is a link for a couple of different versions, any of them will work for you. It also tells you how to use it.
Split Function
I am guessing it's not the string-splitting part that is the issue, since just googling for SQL split string turns up a lot of examples. What you want after the split is something like the following. Assuming the split-string function you end up using returns a table of values, here is what the comparison query against FieldA would look like:
SELECT *
FROM YourTableWithFieldA
WHERE (@ParamA IS NULL
    OR EXISTS (SELECT *
               FROM YourSplitFunctionThatReturnsATableOfValues(@ParamA) SplitTable
               WHERE FieldA LIKE '%' + SplitTable.Value + '%'))
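If you do not already have a split function, here is a minimal inline table-valued function sketch that works on SQL Server 2008 (the name dbo.SplitString is made up, and it assumes the values contain no XML special characters such as & or <); the SplitTable in the query above would then come from dbo.SplitString(@ParamA, ','):
CREATE FUNCTION dbo.SplitString (@List nvarchar(max), @Delim nchar(1))
RETURNS TABLE
AS
RETURN
(
    -- turn 'a,b,c' into '<i>a</i><i>b</i><i>c</i>' and shred it back into rows
    SELECT LTRIM(RTRIM(x.i.value('.', 'nvarchar(4000)'))) AS Value
    FROM (SELECT CAST('<i>' + REPLACE(@List, @Delim, '</i><i>') + '</i>' AS xml) AS doc) AS src
    CROSS APPLY src.doc.nodes('/i') AS x(i)
);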

Implement an IN Query using XQuery in MSSQLServer 2005

I'm trying to query an xml column using an IN expression. I have not found a native XQuery way of doing such a query so I have tried two work-arounds:
Implement the IN query as a concatenation of ORs like this:
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:variable("@Param1368320145") or
text() = sql:variable("@Param2043685301") or ...
Implement the IN query with the String fn:contains(...) method like this:
WHERE Data.exist('/Document/Field2[fn:contains(sql:variable("@Param1412022317"), .)]') = 1
Where the given parameter is a (long) string with the values separated by "|"
The problem is that Version 1. doesn't work for more than about 50 arguments. The server throws an out of memory exception. Version 2. works, but is very, very slow.
Does anyone have a third idea? To phrase the problem more completely: given a list of values of any native SQL type, select all rows whose XML column has one of the given values in a specific field of the XML.
Try inserting all your parameters into a table and querying with the sql:column clause:
SELECT MyTable.Column FROM MyTable
CROSS JOIN (SELECT @Param1 AS T UNION ALL SELECT @Param2) B
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:column("T")]') = 1
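A sketch of the same idea with the parameters in a temporary table, which scales past two values and avoids duplicate rows via EXISTS; the table, column, and element names are taken from the question or invented for illustration, and the values are placeholders:
CREATE TABLE #Params (Val nvarchar(100));
INSERT INTO #Params (Val) VALUES ('value1');
INSERT INTO #Params (Val) VALUES ('value2');

SELECT t.*
FROM MyTable AS t
WHERE EXISTS (
    SELECT 1
    FROM #Params AS p
    -- sql:column() feeds each parameter value into the XQuery predicate
    WHERE t.Data.exist('/Document/ParentKTMNode[text() = sql:column("p.Val")]') = 1
);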