LINQ to SQL query fails using two Contains statements - sql

I have two tables, DH_MASTER and DH_ALIAS. DH_MASTER contains information about a person, including their name. DH_ALIAS contains AKA records about the person. The tables are linked by the Operator field which is a primary key in DH_MASTER.
The users want to search by the name stored in DH_MASTER as well as search through all of their known aliases. If any matches are found in either DH_MASTER or DH_ALIAS then the DH_MASTER entity should be returned.
I created the query below which should give the results I described (return any DH_MASTER rows where the DH_MASTER.Name == name or DH_MASTER.DH_ALIAs(n).Name == name).
It works fine if I use only ONE of the .Contains lines. It doesn't matter which one I use. But the execution fails when I try to use BOTH at the same time.
qry = From m In Context.DH_MASTERs _
Where (m.Name.Contains(name)) _
OrElse ((From a In m.DH_ALIAs _
Where a.Name.Contains(name)).Count() > 0) _
Select m
The LinqToSQL Query evaluates to the following SQL code (as displayed in the SQL Server Query Visualizer)
SELECT [t0].[Operator], [t0].[Name], [t0].[Version]
FROM [DHOWNER].[DH_MASTER] AS [t0]
WHERE ([t0].[Name] LIKE %smith%) OR (((
SELECT COUNT(*)
FROM [DHOWNER].[DH_ALIAS] AS [t1]
WHERE ([t1].[Name] LIKE %smith%) AND ([t1].[Operator] = [t0].[Operator])
)) > 0)
EDIT: Checking the "Show Original" box in the Query Visualizer reveals the parameterized query as expected so this block of text below should be ignored.
I don't know if this is a problem or not but the `.Contains` evaluates to a `LIKE` expression (which is what I expect to happen) but the parameter is not encapsulated in apostrophes.
The interesting thing is that if I copy/paste the SQL Query into SQL 2005 Query Analyzer and add the apostrophes around the LIKE parameters, it runs just fine. In fact, it's lightning quick (blink of an eye) even with more than 2 million rows.
But when the LINQ query runs, the web app locks up for about 31 seconds before it finally fails with this error on gv.DataBind: Exception has been thrown by the target of an invocation.
With this innerException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Does anyone know why this happens and how the behavior can be worked around? It's driving me nuts because the LinqToSql-generated SQL runs fine in query analyzer!
Update:
I have refactored my code based on the techniques in the answer. This works!
qry = From m In qry _
Where m.Name.Contains(name) OrElse _
m.DH_ALIAs.Any(Function(aliasRec) aliasRec.Name.Contains(name)) _
Select m

This might not be appropriate, since your problem is different, but I remember a problem in one of my programs: Contains() would not work (in my case it would throw an Exception when evaluating), so maybe the Contains-method is a bit broken.
I replaced
result.Contains( x )
with
result.Any( p => p == x )
which did work.
Can you try if that works? At least it might be a step in the right direction.

Linq to sql doesn't specify values directly into the query, it uses parameters. Are you sure it had the contains parameter value directly in the sql?
Anyway, the timeout is likely caused by a deadlock: the query wants to read from a row in a table which is locked by another (insert/update) query /transaction and that query apparently takes longer to complete.

Related

R function that takes in CL argument and queries SQL database

Brand new to SQL and SQLite here. I'm trying to create a function in R studio that takes in an argument from the command line and queries an SQL database to see if the record exists in one specific column, displaying a message to the user whether the record was found or not (I have a table within this database, lets call it my_table, that has 2 columns, we'll name them column_1 and column_2. column_1 has ID numbers and column_2 has names that are associated with those ID numbers, and there are a total of about 700 rows).
So far I have something that looks like this:
my_function() <- function(CL_record) { db <- dbConnect(SQLite(), dbname = "mysql.db") dbGetQuery(db, sql("SELECT * FROM my_table WHERE column_2 == #CL_record")) }
But this is obviously not the right way to go about it and I keep getting errors thrown regarding invalid (NULL) left side of assignment.
Any help here would be much appreciated.
I recommend using parameterized queries, something like:
my_function <- function(CL_record) {
db <- dbConnect(SQLite(), dbname = "mysql.db")
on.exit(dbDisconnect(db), add = TRUE)
dbGetQuery(db, "SELECT * FROM my_table WHERE column_2 = ?",
params = list(CL_record))
}
The params= argument does not need to be a list(.), it works just fine here as params=CL_record, but if you have two or more, and especially if they are of different classes (strings vs numbers), then you really should list them.
You'll see many suggestions, or even howtos/tutorials that suggest using paste or similar for adding parameters to the query. There are at least two reasons for not using paste or sprintf:
Whether malicious or inadvertent (e.g., typos), sql injection is something that should be actively avoided. Even if your code is not public-facing, accidents happen.
Most (all?) DBMSes have query optimization/compiling. When it receives a query it has not seen before, it parses and attempts to optimize the query; when it sees a repeat query, it can re-use the previous optimization, saving time. When you paste or sprintf an argument into a query, each query is different from the previous, so it is overall less efficient. When using bound parameters, the query itself does not change (just its parameters), so we can take advantage of the compiling/optimization.

LookupRecord in SQL statement with WHERE clause does not work

Background
I am creating a database to tracks lab samples. I wish to put a restriction in place that prevents a technician from reporting multiple results for the same sample.
Current strategy
My query called qselReport lists all samples being reported.
SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE (((tblResult.ysnReport)=True));
When a technician wishes to report a result for a given sample, I use a Before Change Event to check for that sample in qselReport (the code block below is my event macro N.B. it is not VBA).
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In qselReport
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
That all works fine and dandy. The error message pops up if a second result is selected to report for a sample.
The problem
I like to keep things as sleek as possible, so I don't want qselReport unless it's absolutely necessary. So I made a slight adjustment to the LookupRecord block so that it operates on a SQL statement rather than on the query. Here's what that looks like (again N.B. not VBA, just a macro):
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE [tblResult].[ysnReport]=True;
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
Now I get the error message every time that a result is reported, even if it's the first instance for that sample. From what I can tell, the issue is that the SQL statement WHERE clause does not filter the records to only those where ysnReport=True.
Can anyone explain why I can do LookupRecord in a query but not LookupRecord in an identical SQL statement? Thanks for the input.
If you want things as sleek as possible, at least performance-wise, a stored query should, in principle, outperform dynamic SQL.
Syntax-wise, I'm not familiar with the macro constructs, but I'd consider enclosing the select statement in parentheses if it accepts them and also adding an explicit alias. I suspect that alias would in turn need to be referenced in your WHERE condition:
Where Condition = MySelect.[strSampleName]=[tblResult].[strSampleName]
Alias MySelect
I found the solution to my problem. The SQL statement where clause needed to be moved to the LookupRecord data block Where condition. This works:
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult;
Where Condition = [ysnReport]=True And [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If

why would you use WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection

LINQ Count .. best method

My company has just started using LINQ and I still am having a little trouble with the abstractness (if thats a word) of the LINQ command and the SQL, my question is
Dim query = (From o In data.Addresses _
Select o.Name).Count
In the above in my mind, the SQL is returning all rows and the does a count on the number rows in the IQueryable result, so I would be better with
Dim lstring = Aggregate o In data.Addresses _
Into Count()
Or am I over thinking the way LINQ works ? Using VB Express at home so I can't see the actual SQL that is being sent to the database (I think) as I don't have access to the SQL profiler
As mentioned, these are functionally equivalent, one just uses query syntax.
As mentioned in my comment, if you evaluate the following as a VB Statement(s) in LINQPad:
Dim lstring = Aggregate o In Test _
Into Count()
You get this in the generated SQL output window:
SELECT COUNT(*) AS [value]
FROM [Test] AS [t0]
Which is the same as the following VB LINQ expression as evaluated:
(From o In Test_
Select o.Symbol).Count
You get the exact same result.
I'm not familiar with Visual Basic, but based on
http://msdn.microsoft.com/en-us/library/bb546138.aspx
Those two approaches are the same. One uses method syntax and the other uses query syntax.
You can find out for sure by using SQL Profiler as the queries run.
PS - The "point" of LINQ is you can easily do query operations without leaving code/VB-land.
An important thing here, is that the code you give will work with a wide variety of data sources. It will hopefully do so in a very efficient way, though that can't be fully guaranteed. It certainly will be done in an efficient way with a SQL source (being converted into a SELECT COUNT(*) SQL query. It will be done efficiently if the source was an in-memory collection (it gets converted to calling the Count property). It isn't done very efficiently if the source is an enumerable that is not a collection (in this case it does read everything and count as it goes), but in that case there really isn't a more efficient way of doing this.
In each case it has done the same conceptual operation, in the most efficient manner possible, without you having to worry about the details. No big deal with counting, but a bigger deal in more complex cases.
To a certain extent, you are right when you say "in my mind, the SQL is returning all rows and the does a count on the number rows". Conceptually that is what is happening in that query, but the implementation may differ. Compare with how the real query in SQL may not match the literal interpretation of the SQL command, to allow the most efficient approach to be picked.
I think you are missing the point as Linq with SQL has late binding the search is done when you need it so when you say I need the count number then a Query is created.
Before that Linq for SQL creates Expression trees that will be "translated" in to SQL when you need it....
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
How to debug see Scott
http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx
(source: scottgu.com)

Slow SQL Code on local server in MS Access

I've got a particular SQL statement which takes about 30 seconds to perform, and I'm wondering if anyone can see a problem with it, or where I need additional indexing.
The code is on a subform in Access, which shows results dependent on the content of five fields in the master form. There are nearly 5000 records in the table that's being queried. The Access project is stored and run from a terminal server session on the actual SQL server, so I don't think it's a network issue, and there's another form which is very similar that uses the same type of querying...
Thanks
PG
SELECT TabDrawer.DrawerName, TabDrawer.DrawerSortCode, TabDrawer.DrawerAccountNo, TabDrawer.DrawerPostCode, QryAllTransactons.TPCChequeNumber, tabdrawer.drawerref
FROM TabDrawer LEFT JOIN QryAllTransactons ON TabDrawer.DrawerRef=QryAllTransactons.tpcdrawer
WHERE (Forms!FrmSearchCompany!SearchName Is Null
Or [drawername] Like Forms!FrmSearchCompany!SearchName & "*")
And (Forms!FrmSearchCompany.SearchPostcode Is Null
Or [Drawerpostcode] Like Forms!FrmSearchCompany!Searchpostcode & "*")
And (Forms!FrmSearchCompany!SearchSortCode Is Null
Or [drawersortcode] Like Forms!FrmSearchCompany!Searchsortcode & "*")
And (Forms!FrmSearchCompany!Searchaccount Is Null
Or [draweraccountno] Like Forms!FrmSearchCompany!Searchaccount & "*")
And (Forms!FrmSearchCompany!Searchcheque Is Null
Or [tpcchequenumber] Like Forms!FrmSearchCompany!Searchcheque & "*");
");
EDIT
The Hold up seems to be in the union query that forms the QryAllTransactons query.
SELECT
"TPC" AS Type,
TabTPC.TPCRef,
TabTPC.TPCBranch,
TabTPC.TPCDate,
TabTPC.TPCChequeNumber,
TabTPC.TPCChequeValue,
TabTPC.TPCFee,
TabTPC.TPCAction,
TabTPC.TPCMember,
tabtpc.tpcdrawer,
TabTPC.TPCUser,
TabTPC.TPCDiscount,
tabcustomers.*
FROM
TabTPC
INNER JOIN TabCustomers ON TabTPC.TPCMember = TabCustomers.CustomerID
UNION ALL
SELECT
"CTP" AS Type,
TabCTP.CTPRef,
TabCTP.CTPBranch,
TabCTP.CTPDate,
TabCTP.CTPChequeNumb,
TabCTP.CTPAmount,
TabCTP.CTPFee,
TabCTP.CTPAction,
TabCTP.CTPMember,
0 as CTPXXX,
TabCTP.CTPUser,
TabCTP.CTPDiscount,
TABCUSTOMERS.*
FROM
TabCTP
INNER JOIN TabCustomers ON Tabctp.ctpMember = TabCustomers.CustomerID;
I've done a fair bit of work with simple union queries, but never had this before...
Two things. Since this is an Access database with a SQL Server backend, you may find a considerable speed improvement by converting this to a stored proc.
Second, do you really need to return all those fields, especially in the tabCustomers table? Never return more fields than you actually intend to use and you will improve performance.
At first, try compacting and repairing the .mdb file.
Then, simplify your WHERE clause:
WHERE
[drawername] Like Nz(Forms!FrmSearchCompany!SearchName, "") & "*"
And
[Drawerpostcode] Like Nz(Forms!FrmSearchCompany!Searchpostcode, "") & "*"
And
[drawersortcode] Like Nz(Forms!FrmSearchCompany!Searchsortcode, "") & "*"
And
[draweraccountno] Like Nz(Forms!FrmSearchCompany!Searchaccount, "") & "*"
And
[tpcchequenumber] Like Nz(Forms!FrmSearchCompany!Searchcheque, "") & "*"
Does it still run slowly?
EDIT
As it turned out, the question was not clear in that it is an up-sized Access Database with an SQL Server back end-and an Access Project front-end.
This sheds a different light on the whole problem.
Can you explain in more detail how this whole query is intended to be used?
If you use it to populate the RecordSource of some Form or Report, I think you will be able to refactor the whole thing like this:
make a view on the SQL server that returns the right data
query that view with a SQL server syntax, not with Access syntax
let the server sort it out
How many rows are in QryAllTransactons?
If your result returns 0 rows then Access may be able to see that immediately and stop, but if it returns even a single row then it needs to pull in the entire resultset of QryAllTransactons so that it can do the join internally. That would be my first guess as to what is happening.
Your best bet it usually to do joins on SQL Server. Try creating a view that does the LEFT OUTER JOIN and query against that.
Your goal, even when Access is running on the SQL Server itself and minimizes network traffic, is to only send to Access what it absolutely needs. Otherwise a large table will still take up memory, etc.
Have you tried running each of the subqueries in the union? Usually optimizers don't spend much time trying to inspect efficiencies between union elements - each one runs on its own merits.
Given that fact, you could also put the "IF" logic into the procedural code and run each of the tests in some likely order of discovery, without significant additional overhead from more calls.
Get rid of those like operators.
In your case you don't need them. Just check if the field starts with a given value which you can achive whith something like this:
Left([field], Len(value)) = value
This method applied to your query would look like this (did some reformatting for better readability):
SELECT
TabDrawer.DrawerName,
TabDrawer.DrawerSortCode,
TabDrawer.DrawerAccountNo,
TabDrawer.DrawerPostCode,
QryAllTransactons.TPCChequeNumber,
TabDrawer.DrawerRef
FROM
TabDrawer
LEFT JOIN QryAllTransactons
ON TabDrawer.DrawerRef = QryAllTransactons.TpcDrawer
WHERE
(Forms!FrmSearchCompany!SearchName Is Null
Or Left([drawername], Len(Forms!FrmSearchCompany!SearchName)) = Forms!FrmSearchCompany!SearchName)
And
(Forms!FrmSearchCompany.SearchPostcode Is Null
Or Left([Drawerpostcode], Len(Forms!FrmSearchCompany!Searchpostcode)) = Forms!FrmSearchCompany!Searchpostcode)
And
(Forms!FrmSearchCompany!SearchSortCode Is Null
Or Left([drawersortcode], Len(Forms!FrmSearchCompany!Searchsortcode)) = Forms!FrmSearchCompany!Searchsortcode)
And
(Forms!FrmSearchCompany!Searchaccount Is Null
Or Left([draweraccountno], Len(Forms!FrmSearchCompany!Searchaccount)) = Forms!FrmSearchCompany!Searchaccount)
And
(Forms!FrmSearchCompany!Searchcheque Is Null
Or Left([tpcchequenumber], Len(Forms!FrmSearchCompany!Searchcheque)) = Forms!FrmSearchCompany!Searchcheque)
Note that you're comparing case sensitive. I'm not totally sure if the like operator in MS-Access is case insensitive. Convert both strings to upper- or lowercase, if needed.
When you upsized did you make sure the tables were properly indexed? Indexes will speed queries tremendously if used properly (note they may also slow down inserts/updates/deletes, so choose carefully what to index)