Find rows that contain all words in any order - sql

My application is built in vb.net with SQL Server Compact as the database so I'm unable to use a full-text index.
Here's my data...
MainTable field1
A B C
B G C
X Y Z
C P B
Search term = B C
Expected Results = any combination of the search term = Rows 1, 2, 4
Here's what I'm currently doing...
I'm permuting the search term B C into an array containing %B%C% and %C%B% and inserting those values into field1 of tempTable.
So my SQL looks like this:
SELECT * FROM MainTable INNER JOIN tempTable ON MainTable.field1 LIKE tempTable.field1
In this simple example, it does return the expected results correctly. However, my search term can contain more values. For example 6 search terms B C D E F G when permuted has 720 different values and as more search terms are used, the permutations grow exponentially...which is not good.
Is there a better way to do this?

The following will work for your example above:
Select * from table where field1 like '%[BC]%'
But it will also return strings that contain ONLY "B" or "C". Do you need both characters in any order or one or more?
EDIT: Then the following would work:
Select * from test_data where col1 LIKE '%Apple%' and col1 like '%Dog%'
See the demo here: http://rextester.com/edit/LNDQ49764

Related

Pattern Matching or Fuzzy Matching of two tables based on one column

Assuming I have the right naming, what O am trying to write is a function or stored procedure to compare names and find out if they are the same value.
I think its called fuzzy matching
For example, a table has 2 columns and table b has 3 columns:
Name
Number
Hello
24
Evening
56
Name
Num
F
Heello
23
some value
GoodEvening
15
some value
I want table like
A
D
Hello
Heello
Morning
GoodMorning
Currently, I'm using
Select A.Name, B.Name
from table A
left table B
on A.Name like B.Name
or (LTRIM(RTRIM(REPLACE(REPLACE(REPLACE( A.Name,' ',''),'-',''),'''',''))) = LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(B.Name,' ',''),'-',''),'''',''))))
OR (A.Name LIKE '%'+B.Name+'%')
OR (B.Name LIKE '%'+A.Name+'%')
It is giving me a result, but not too accurate and is very slow, any other way I could try to compare these values?

SQL Server 'AS' alias unexpected syntax

I've come across following T-SQL today:
select c from (select 1 union all select 1) as d(c)
that yields following result:
c
-----------
1
1
The part that got me confused was d(c)
While trying to understand what's going on I've modified T-SQL into:
select c, b from (select 1, 2 union all select 3, 4) m(c, b)
which yields following result:
c b
----------- -----------
1 2
3 4
It was clear that d & m are table reference while letters in brackets c & b are reference to columns.
I wasn't able to find relevant documentation on msdn, but curious if
You're aware of such syntax?
What would be useful use case scenario?
select c from (select 1 union all select 1) as d(c)
is the same as
select c from (select 1 as c union all select 1) as d
In the first query you did not name the column(s) in your subquery, but named them outside the subquery,
In the second query you name the column(s) inside the subquery
If you try it like this (without naming the column(s) in the subquery)
select c from (select 1 union all select 1) as d
You will get following error
No column name was specified for column 1 of 'd'
This is also in the Documentation
As for the usage, some like to write it the first method, some in the second, whatever you prefer. It's all the same
An observation: Using the table constructor values gives you no way of naming the columns, which makes it neccessary to use column naming after the table alias:
select * from
(values
(1,2) -- can't give a column name here
,(3,4)
) as tableName(column1,column2) -- gotta do it here
You've already had comments that point you to the documentation of how derived tables work, but not to answer you question regarding useful use cases for this functionality.
Personally I find this functionality to be useful whenever I want to create a set of addressable values that will be used extensively in your statement, or when I want to duplicate rows for whatever reason.
An example of addressable values would be a much more compelx version of the following, in which the calculated values in the v derived table can be used many times over via more sensible names, rather than repeated calculations that will be hard to follow:
select p.ProductName
,p.PackPricePlusVAT - v.PackCost as GrossRevenue
,etc
from dbo.Products as p
cross apply(values(p.UnitsPerPack * p.UnitCost
,p.UnitPrice * p.UnitsPerPack * 1.2
,etc
)
) as v(PackCost
,PackPricePlusVAT
,etc
)
and an example of being able to duplicate rows could be in creating an exception report for use in validating data, which will output one row for every DataError condition that the dbo.Product row satisfies:
select p.ProductName
,e.DataError
from dbo.Products as p
cross apply(values('Missing Units Per Pack'
,case when p.SoldInPacks = 1 and isnull(p.UnitsPerPack,0) < 1 then 1 end
)
,('Unusual Price'
,case when p.Price > (p.UnitsPerPack * p.UnitCost) * 2 then 1 end
)
,(etc)
) as e(DataError
,ErrorFlag
)
where e.ErrorFlag = 1
If you can understand what these two scripts are doing, you should find numerous examples of where being able to generate additional values or additional rows of data would be very helpful.

Compare two unrelated tables sql

We're dealing with geographic data with our Oracle database.
There's a function called ST_Insertects(x,y) which returns true if record x intersects y.
What we're trying to do is, compare each record of table A with all records of table B, and check two conditions
condition 1 : A.TIMEZONE = 1 (Timezone field is not unique)
condition 2 : B.TIMEZONE = 1
condition 3 : ST_Intersects(A.SHAPE, B.SHAPE) (Shape field is where the geographical information is stored)
The result we're looking for is records ONLY from the table A that satisfy all 3 conditions above
We tried this in a single select statement but it doesn't seem to make much sense logically
pseudo-code that demonstrates a cross-join:
select A.*
from
tbl1 A, tbl2 B
where
A.TIMEZONE = 1 and
B.TIMEZONE = 1 and
ST_Intersects(A.SHAPE, B.SHAPE)
if you get multiples, you can put a distinct and only select A.XXX columns
With a cross-join rows are matched like this
a.row1 - b.row1
a.row1 - b.row2
a.row1 - b.row3
a.row2 - b.row1
a.row2 - b.row2
a.row2 - b.row3
So if row 1 evaluates to true on multiple rows, then just add a distinct on a.Column1, etc.
If you want to use the return value from your function in an Oracle SQL statement, you will need to change the function to return 0 or 1 (or 'T'/'F' - some data type supported by Oracle Database, which does NOT support the Boolean data type).
Then you probably want something like
select <columns from A>
from A
where A.timezone = 1
and exists ( select *
from B
where B.timezone = 1
and ST_intersects(A.shape, B.shape) = 1
)

Difference in NA/NULL treatment using dplyr::left_join (R lang) vs. SQL LEFT JOIN

I want to left join two dataframes, where there might be NAs in the join column on both side (i.e. both code columns)
a <- data.frame(code=c(1,2,NA))
b <- data.frame(code=c(1,2,NA, NA), name=LETTERS[1:4])
Using dplyr, we get:
left_join(a, b, by="code")
code name
1 1 A
2 2 B
3 NA C
4 NA D
Using SQL, we get:
CREATE TABLE a (code INT);
INSERT INTO a VALUES (1),(2),(NULL);
CREATE TABLE b (code INT, name VARCHAR);
INSERT INTO b VALUES (1, 'A'),(2, 'B'),(NULL, 'C'), (NULL, 'D');
SELECT * FROM a LEFT JOIN b USING (code);
It seems that dplyr joins do not treat NAs like SQL NULL values.
Is there a way to get dplyr to behave in the same way as SQL?
What is rationale behind this type of NA treatment?
PS. Of course, I could remove NAs first to get there left_join(a, na.omit(b), by="code"), but that is not my question.
In SQL, "null" matches nothing, because SQL has no information on what it should join to -- hence the resulting "null"s in your joined data set, just as it would appear if performing left outer joins without a match in the right data set.
In R however, the default behaviour for "NA" when it comes to joins is almost to treat it like a data point (e.g. a null operator), so "NA" would match "NA". For example,
> match(NA, NA)
[1] 1
One way you can circumvent this would be to use the base merge method,
> merge(a, b, by="code", all.x=TRUE, incomparables=NA)
code name
1 1 A
2 2 B
3 NA <NA>
The "incomparables" parameter here allows you to define values that cannot be matched, and essentially forces R to treat "NA" the way SQL treats "null". It doesn't look like the incomparables feature is implemented in left_join, but it may simply be named differently.
By default column code have primary key,therefore not accept NULL value

Select rows based on hierarchical permissions

I have a tree/hierarchy of groups and a SQL table of items,each associated with a group (ie. each item belongs to a group). I need to select only the rows associated with a given group, or with the groups below.
eg. say this is the group tree:
A
=> B
=> D
=> C
=> E
=> F
Selecting items for group A will return all rows, while selecting for group C will select items belonging in C,E and F (descendants of C).
So far, I am thinking I can implement this in one of two ways:
1. IN list
SELECT * FROM table WERE Group in ('C','E','F')
programatically determining the list of descendants before querying
2. BITWISE operator
SELECT * FROM table WHERE GroupBitMask & 52!=0
(ie. bitwise 'C' + 'E' + 'F' ==bit 3 + bit 5 + bit 6 == 110100 ==52 )
again, this 52 will need to be computed before the query by parsing the group tree.
I guess I can probably enforce a limit of 64 groups max. and use a 64-bit mask for this.
I'm not sure if the database will use an index for this or simply scan all rows to determine the bitwise result?
Are there any other (better?) methods of selecting the rows I need ?
A simple solution is to store the ancestry as part of the row:
Group Path Other columns
A A ...
B AB ...
C AC ...
D ABD ...
E ACE ...
F ACF ...
You can retrieve the base path with single query:
select Path from YourTable where Group = 'C'
Then you can query all descendants like:
select * from YourTable where path like 'AC%'
This performs very well with a primary key on (Group) and an index on (Path).