I am looking for help on how to speed up the code bit below because as it stands, it is taking too long to perform the task. Any suggestions would be much appreciated. Thanks in advance!
The code bit below is a stripped down version of the actual version but all the important guts should be there. The code works; however, the code is really slow on even a modest size dataset. Needless to say, the primary culprit is the second, nested recordset/SQL call. The LIKE operator is part of the slowdown but I'm more concerned about the nesting and I think the LIKE operator will be required in what we're trying to accomplish. I tried nesting the second SQL call into the first but I didn't see a clean way of doing so.
Platform: Classic ASP, VBScript, MS Access DB
' Go through all people in the table.
sql1 = "SELECT ID, FN, LN, Email FROM Table1"
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
    id = rst1.Fields("ID").Value
    fn = rst1.Fields("FN").Value
    ln = rst1.Fields("LN").Value
    email = rst1.Fields("Email").Value
    If IsNull(email) Or IsEmpty(email) Then
        email = ""
    End If

    ' ----- Figure out if any other people in the table have a similar name or are using the same e-mail address.
    ' Capture both the ID of those other people as well as figure out the total number of possible duplicates.
    sql2 = "SELECT ID FROM Table1"
    sql2 = sql2 & " WHERE"
    sql2 = sql2 & " ID <> " & id
    sql2 = sql2 & " AND"
    sql2 = sql2 & " ("
    sql2 = sql2 & " FN & ' ' & LN LIKE '%" & Replace(fn & " " & ln, "'", "''") & "%'"
    If email <> "" Then
        sql2 = sql2 & " OR"
        sql2 = sql2 & " Email LIKE '%" & Replace(email, "'", "''") & "%'"
    End If
    sql2 = sql2 & " )"
    Call rst2.Open(sql2, cnx, 0, 1)
    numDups = 0
    possibleDups = ""
    While Not rst2.EOF
        numDups = numDups + 1
        If possibleDups <> "" Then
            possibleDups = possibleDups & ", "
        End If
        possibleDups = possibleDups & rst2.Fields("ID").Value
        Call rst2.MoveNext()
    Wend
    Call rst2.Close()
    ' ----- End nest query.

    Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
    Call rst1.MoveNext()
Wend
Call rst1.Close()
Update 1:
Per request, here is a bit more info on the sample data and the expected output. Table1 is basically a table with the fields: id, fn, ln, email. id is an autogenerated ID representing the entry and fn/ln represent the first/last name, respectively, of the person's entry. Expected output is as coded, e.g.,...
John Doe has 3 possible duplicates (1342, 3652, 98325)
John Doe has 3 possible duplicates (986, 3652, 98325)
John Doe has 3 possible duplicates (986, 1342, 98325)
John Doe has 3 possible duplicates (986, 1342, 3652)
Sam Jones has 0 possible duplicates ()
Jane Smith has 2 possible duplicates (234, 10562)
Jane Smith has 2 possible duplicates (155, 10562)
Jane Smith has 2 possible duplicates (155, 234)
The numbers in parentheses correspond to the id's that appear to be duplicates to each person. A possible duplicate is a scenario in which another entry in the same table appears to share the same name or e-mail. For example, there could be 4 John Doe's and 3 Jane Smith's in the table based on name alone.
Ideally, only one SQL query is required to reduce the roundtrip induced by the recordset call but Access is limited compared to regular SQL Server as far as features and I'm not sure what I'm missing that might help speed this up.
Update 2:
Using the SQL Fiddle by @Abecee, I was able to get a faster query. However, I am now encountering two problems as a result.
The big picture view is still the same. We are looking for possible duplicates based on first name, last name, and e-mail address. However, we also added a search criterion, which is the set of lines wrapped inside If searchstring <> "" Then ... End If. Also, note that the e-mail info is now being pulled from a separate table called EmailTable with the fields id, IndividualID (representing Table1.id), and email.
Mods: The updated query is similar but slightly different from the original query above. I'm not sure if it's better to create a whole new question or not, so I'll just leave this here for now. Let me know if I should move this to its own question.
If the code associated with comment A below is uncommented sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) ", I get an error message: Microsoft JET Database Engine (0x80040E14) Join expression not supported. The query seems to be coded correctly so what is missing or incorrect?
If the code associated with comment B below is uncommented sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0", the query runs but it hangs. I tried dropping the query directly into Access to see if it'll work (e.g., New Query > SQL View) but it also hangs from within Access. I think the syntax and logic are correct but obviously something is askew. Do you see what or why it would hang with this line of code?
Here is the updated query:
sql1 = sql1 & "SELECT "
sql1 = sql1 & " T1.ID, T1.FN, T1.LN, E1.Email, "
sql1 = sql1 & " T2.ID, T2.FN, T2.LN "
sql1 = sql1 & "FROM "
sql1 = sql1 & " ((Table1 T1 LEFT JOIN [SELECT E1.* FROM EmailTable E1 WHERE E1.Primary = True]. AS E1 ON T1.ID = E1.IndividualID)"
sql1 = sql1 & " LEFT JOIN (Table1 T2 LEFT JOIN EmailTable E2 ON T2.ID = E2.IndividualID) "
sql1 = sql1 & " ON "
sql1 = sql1 & " ("
sql1 = sql1 & " T1.ID <> T2.ID "
sql1 = sql1 & " AND "
sql1 = sql1 & " ("
sql1 = sql1 & " ((INSTR(T1.FN, T2.FN) > 0) AND (INSTR(T1.LN, T2.LN) > 0)) "
' A. When the following line is uncommented, error is "Join expression not supported."
' sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
If searchstring <> "" Then
sql1 = sql1 & " WHERE "
sql1 = sql1 & " INSTR(T1.FN & ' ' & T1.LN, '" & Replace(searchstring, "'", "''") & "') > 0"
' B. When the following line is uncommented, code hangs on the rst1.open() call."
' sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0"
End If
sql1 = sql1 & " ORDER BY T1.LN, T1.FN, T1.ID"
prevID = 0
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
id = rst1.Fields("ID").Value
' Get initial values if we've come across a new ID.
If (id <> prevID) Then
fn = rst1.Fields("T1.FN").Value
ln = rst1.Fields("T1.LN").Value
email = rst1.Fields("Email").Value
If IsNull(email) Or IsEmpty(email) Then
email = ""
End If
' Reset the counter for how many possible duplicates there are.
numDups = 0
' If there is an ID from the second table, then keep track of this possible duplicate.
tmp = rst1.Fields("T2.ID").Value
If IsNumeric(tmp) Then
tmp = CLng(tmp)
Else
tmp = 0
End If
If tmp > 0 Then
numDups = numDups + 1
possibleDups = possibleDups & tmp
End If
End If
' Figure out if we should show this row. Within this logic, we'll also see if there is another possible duplicate.
showrow = False
Call rst1.MoveNext()
If rst1.EOF Then
' Already at the end of the recordset so show this row.
showrow = True
Call rst1.MovePrevious()
Else
If rst1.Fields("T1.ID") <> id Then
' Next record is different T1, so show this row.
showrow = True
Call rst1.MovePrevious()
Else
' Next record is the same T1, so don't show this row but note the duplicate.
Call rst1.MovePrevious()
' Also, add the new T2 as a possible duplicate.
tmp = rst1.Fields("T2.ID").Value
If IsNumeric(tmp) Then
tmp = CLng(tmp)
Else
tmp = 0
End If
If tmp > 0 Then
numDups = numDups + 1
If possibleDups <> "" Then
possibleDups = possibleDups & ", "
End If
possibleDups = possibleDups & tmp
End If
End If
End If
If showrow Then
Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
End If
Call rst1.MoveNext()
prevID = id
Wend
Call rst1.Close()
Yes, that's going to be slow because LIKE '%whatever%' is not sargable: the engine cannot use an index for it and has to scan the table. So, if [Table1] has 1,000 rows, then at best each of the 1,000 inner queries has to examine the other 999 rows, which means roughly 999,000 row comparisons in total.
A few observations:
You are performing the comparisons for every row in the table against every other row. That would be something that you might want to do one time only to find possible dups in legacy data, but as part of the normal operation of an application we would expect to compare one record against all of the others (i.e. the one record that you are inserting or updating).
You are looking for rows WHERE 'fn1 ln1' LIKE('%fn2 ln2%'). How is that significantly different from WHERE fn1=fn2 AND ln1=ln2? That would be sargable, so if you had indexes on [FN] and [LN] then that could speed things up a great deal.
You really should NOT be using an Access database as the back-end for a web application (ref: here).
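The second observation can be sketched as a single self-join plus one pass over the sorted result. This is only a sketch under assumptions: the function name SummarizeDups and the DupID alias are made up for illustration, the e-mail comparison is left out, matching is by exact name equality rather than LIKE, and the join has not been run against Jet here. rst1 and cnx are the objects from the question.

```vbscript
' rows: the 2-D array from Recordset.GetRows, indexed rows(field, record).
' Assumed field order: 0 = ID, 1 = FN, 2 = LN, 3 = DupID (Null when nobody matches).
' Records for the same ID must be adjacent, which the ORDER BY below provides.
Function SummarizeDups(rows)
    Dim i, lastRec, isGroupEnd, numDups, possibleDups, result
    result = ""
    numDups = 0
    possibleDups = ""
    lastRec = UBound(rows, 2)
    For i = 0 To lastRec
        ' Collect the matching ID carried by this record, if any.
        If Not IsNull(rows(3, i)) Then
            numDups = numDups + 1
            If possibleDups <> "" Then possibleDups = possibleDups & ", "
            possibleDups = possibleDups & rows(3, i)
        End If
        ' VBScript's Or does not short-circuit, so check the bound first.
        isGroupEnd = (i = lastRec)
        If Not isGroupEnd Then isGroupEnd = (rows(0, i + 1) <> rows(0, i))
        If isGroupEnd Then
            result = result & rows(1, i) & " " & rows(2, i) & " has " & numDups & _
                     " possible duplicates (" & possibleDups & ")" & vbCrLf
            numDups = 0
            possibleDups = ""
        End If
    Next
    SummarizeDups = result
End Function

' One round trip instead of one query per person.
Dim sql
sql = "SELECT T1.ID, T1.FN, T1.LN, T2.ID AS DupID " & _
      "FROM Table1 AS T1 LEFT JOIN Table1 AS T2 " & _
      "ON (T1.FN = T2.FN AND T1.LN = T2.LN AND T1.ID <> T2.ID) " & _
      "ORDER BY T1.LN, T1.FN, T1.ID, T2.ID"
Call rst1.Open(sql, cnx, 0, 1)
If Not rst1.EOF Then
    Call Response.Write(SummarizeDups(rst1.GetRows()))
End If
Call rst1.Close()
```

Because the loop only looks one record ahead in an in-memory array, it needs no MovePrevious and works with a forward-only cursor. With indexes on [FN] and [LN], the equality join is the part that would be expected to benefit most.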
I am trying to compile my code but I get the same error every time:
Syntax error (missing operator ) in query expression
' True Status.Subsystem Not LIKE '''
This is my code :
Sub Import_Loop_Check_list()
Dim strSQL As String
Dim SS_sel As String
Dim rcrd As DAO.Recordset
If IsNull(Cobsubsystem) Then
SS_sel = "True"
Else
If IsNull(Logic1) Then
SS_sel = "Status.Subsystem LIKE '" & Cobsubsystem & "' "
Else
SS_sel = "Status.Subsystem NOT LIKE '" & Cobsubsystem & "' "
End If
End If
strSQL = " SELECT DISTINCT LOOP_JB.Loop_name, [Easyplant Dump query].Subsystem, LOOP_JB.PANEL_FROM, LOOP_JB.ITR_PANEL_FROM, LOOP_JB.ITR_PANEL_FROM_state, LOOP_JB.CABLE_NUM, LOOP_JB.ITR_cable, LOOP_JB.ITR_STATE_Cable, LOOP_JB.Cabinet_JB, LOOP_JB.ITR_Cabinet_JB, LOOP_JB.ITR_STATE_CABINET, Multicors.CABLE_NUM AS Multicore, Multicors.ITR_PANEL_FROM, Multicors.ITR_PANEL_FROM_state, [Cabinet query].PANEL_TO, [Cabinet query].ITR_PANEL_TO, [Cabinet query].ITR_PANEL_TO_state INTO [LOOP_Check] " & _
" FROM (LOOP_JB INNER JOIN ([Cabinet query] RIGHT JOIN Multicors ON [Cabinet query].CABLE_NUM = Multicors.CABLE_NUM) ON LOOP_JB.Loop_name = Multicors.Loop_name) INNER JOIN [Easyplant Dump query] ON LOOP_JB.Loop_name = [Easyplant Dump query].Clean_Tag_Number" & _
" WHERE True " & SS_sel & strSQL
DoCmd.SetWarnings False
DoCmd.RunSQL strSQL
DoCmd.SetWarnings True
DoCmd.OpenTable "LOOP_Check"
End Sub
I looked at this again -- something else does not make sense.
You reference Status.Subsystem in the WHERE but there is no table named Status -- Did you not include the full query?
original answer
I think the error message is clear -- you have a strange where statement
WHERE True Status.Subsystem Not LIKE
you probably mean
WHERE Status.Subsystem Not LIKE
so change this line
" WHERE True " & SS_sel & strSQL
to this
" WHERE " & SS_sel & strSQL
Also, the join does not look right to me. Are you sure you want a RIGHT join to Multicors and not a LEFT join? Do you want a row in your result for every row in the Multicors table?
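One way to avoid gluing "True" onto a condition is to build the whole filter in one place and put it directly after WHERE. The helper below is a hypothetical sketch (BuildLikeFilter and its parameters are not from the original code); it is written without type declarations so it should paste into either VBScript or VBA. The field name in the usage line is an assumption, since the query shown has no Status table.

```vbscript
' fieldName, pattern and negate are hypothetical parameters for illustration.
' Returns a complete condition that can follow "WHERE" directly.
Function BuildLikeFilter(fieldName, pattern, negate)
    If IsNull(pattern) Then
        ' Nothing selected: a condition that is always true.
        BuildLikeFilter = "True"
    ElseIf negate Then
        BuildLikeFilter = fieldName & " NOT LIKE '" & Replace(pattern, "'", "''") & "'"
    Else
        BuildLikeFilter = fieldName & " LIKE '" & Replace(pattern, "'", "''") & "'"
    End If
End Function

' Usage, mirroring the original If/Else (NOT LIKE when Logic1 is not Null):
' strWhere = " WHERE " & BuildLikeFilter("[Easyplant Dump query].Subsystem", Cobsubsystem, Not IsNull(Logic1))
```

Doubling the single quotes also keeps a subsystem name containing an apostrophe from breaking the statement.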
EDIT: I need to add some context. Below is a small sample table. I am using 200 columns across 30,000 rows.
I am in Excel and Access. I am fairly good at Excel but not so good with Access.
I have two very large, Excel crashing, tables with matching criteria across multiple rows. I would like to return the change in amounts between the two table for each column by matching three criteria. See picture below.
Between the two tables the rows do not flow in the same direction and one table has a lot of extra rows with no corresponding values in the other.
What is my best option?
You simply need to join the tables together and handle the "missing" values. Assuming - really means NULL, you can do:
select t1.[group], t1.pu, t1.currency,
nz(t1.[8345], 0) - nz(t2.[8345], 0) as diff_8345,
nz(t1.[6789], 0) - nz(t2.[6789], 0) as diff_6789,
nz(t1.[4589], 0) - nz(t2.[4589], 0) as diff_4589
from table1 as t1 inner join
table2 as t2
on t1.[group] = t2.[group] and
t1.pu = t2.pu and
t1.currency = t2.currency;
Hopefully I didn't make any typos in here, but the idea is that you want to do a FULL OUTER JOIN so you don't lose any data from either table. Then you just need to subtract one value from the other, after specifying in the ON clause how to join the two tables. Since some rows may exist in Table2 and not in Table1, I check whether the Table1 key was found and, if not, pull it from the second table. Be aware that Access (Jet/ACE) SQL has no FULL OUTER JOIN, so in Access it has to be emulated, for example with a LEFT JOIN unioned with the unmatched rows of the other table.
Side note. Be careful with columns that use reserved words (GROUP). I put brackets around them, so it should recognize it as a column name
SELECT
IIF(ISNULL(T1.[group]), T2.[group], T1.[group]) AS [group],
IIF(ISNULL(T1.[PU]), T2.[PU], T1.[PU]) AS [PU],
IIF(ISNULL(T1.[currency]), T2.[currency], T1.[currency]) AS [currency],
IIF(ISNULL(T2.[8345]), 0, T2.[8345]) - IIF(ISNULL(T1.[8345]), 0, T1.[8345]) AS [8345],
IIF(ISNULL(T2.[6789]), 0, T2.[6789]) - IIF(ISNULL(T1.[6789]), 0, T1.[6789]) AS [6789],
IIF(ISNULL(T2.[4589]), 0, T2.[4589]) - IIF(ISNULL(T1.[4589]), 0, T1.[4589]) AS [4589]
FROM Table1 T1
FULL OUTER JOIN TABLE2 T2
ON T1.[group] = T2.[group]
AND T1.[PU] = T2.[PU]
AND T1.[currency] = T2.[currency]
In sql, "group" is a reserved word, so change the field name "group" to "igroup".
Try,
Dim Ws As Worksheet
Dim strSQL As String
Dim Rs As Object
Sub test()
Dim vR As Variant
Dim str As String
Dim i As Long
str = "Select [igroup], iif(isnull([8345]),0,[8345] ), " & _
"iif(isnull([6789]), 0,[6789] ), " & _
"iif(isnull([4589]), 0,[4589] ) " & _
"from [Table2$] "
getRs str
vR = Rs.getrows
Rs.Close
Set Rs = Nothing
For i = LBound(vR, 2) To UBound(vR, 2)
str = "Update [Table1$] "
str = str & "set [8345] = iif(isnull([8345]),0,[8345] ) - " & vR(1, i) & ", "
str = str & "[6789] = iif(isnull([6789]), 0,[6789] ) - " & vR(2, i) & ", "
str = str & "[4589] = iif(isnull([4589]), 0,[4589] ) - " & vR(3, i) & " "
str = str & " Where [igroup] ='" & vR(0, i) & "' "
getRs str
Next i
End Sub
Sub getRs(strQuery As String)
Dim strConn As String
strConn = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & ThisWorkbook.FullName & ";" & _
"Extended Properties=Excel 12.0;"
Set Rs = CreateObject("ADODB.Recordset")
Rs.Open strQuery, strConn
End Sub
I have a system that places an order in ASP, and when an order is submitted, a notification email goes out.
When I submit the order, I get error '80020009' and the line it points to is this:
email_text = email_text & "<html>" & _
That's a simple line that just starts building the html string that fills the email! Isn't error '80020009' supposed to be for SQL statements? Or am I missing something?
The page still continues on and completes the order process, it just shows that error message first.
I realize this question doesn't really provide much detail, but I don't know what else to specify, I'm just at a loss.
EDIT: Here's some SQL that is a few lines up from that line. Not sure how this would cause the issue, since it's telling me it's on that email line, but it can't hurt to throw it in:
str = "SELECT creation_date, supplier FROM argus.PURCHASE_ORDER WHERE purchase_order = '" & Trim(Request.Form("order_id")) & "' AND customer = '" & Session("customer_id") & "' AND store = '" & Session("store_id") & "' AND userid = '" & Session("user_id") & "'"
Set rst = ConnFW.Execute(str)
str2 = "SELECT a.store_name, a.address1, a.address2, a.city, a.state_cd, a.zipcode, b.customer_name, c.supplier_account FROM argus.STORE a, argus.CUSTOMER b, argus.ELIGIBLE_SUPPLIER c WHERE a.customer = b.customer AND a.customer = c.customer AND a.store = " & Session("store_id") & " AND b.customer = " & Session("customer_id") & " AND c.supplier = " & Trim(rst("supplier")) & ""
Set rst2 = ConnFW.Execute(str2)
Can you provide additional code? There isn't anything wrong with your code except you need to make sure you have something on the line following the underscore:
email_text = email_text + "<html>" & _
""
-- EDIT
Thanks for posting your edits. I see a couple of potential issues. In your second SQL statement, you're trying to access rst("supplier"), but that field hasn't been read yet. Instead of using connfw.execute(str), which is typically used to execute SQL statements like INSERT, UPDATE and DELETE, use rst.open, which is used with SELECT.
Try something like:
supplier = ""
rst.open str, connfw
if not rst.eof then
    supplier = rst("supplier")
end if
rst.close
Then use supplier in your 2nd sql statement. Also, if the supplier field from the eligible_supplier table is a string, you need to wrap single quotes around that field in your where clause.
I have a listbox, and when the user selects the blank entry (which produces an empty string), I want the query to pull the rows where the field is NULL in the SQL table.
Here's what I have now. Blank strings return nothing because there are no empty fields in the table.
SELECT * FROM dbo.Table WHERE ID = " & TextBox2.Text & " and And Field1 IN (" & Msg1 & ")
How do I code that?
Use an If statement. When your textbox is empty, have the SQL string contain "ID is null" instead of appending the textbox's value.
If (TextBox1.Text = "") Then
' use Is Null in your sql statement
Else
' use the textbox text value in your sql statement
End If
(Assuming you're talking about the textbox and not whatever Msg1 is.)
Dim sql
If (TextBox2.Text = "") Then
    sql = "SELECT * FROM dbo.Table WHERE ID IS NULL AND Field1 IN (" & Msg1 & ")"
Else
    sql = "SELECT * FROM dbo.Table WHERE ID = " & TextBox2.Text & " AND Field1 IN (" & Msg1 & ")"
End If
See @John Saunders' comment: you are risking SQL injection. When passing a value to an SQL query, be sure to use parameters rather than concatenating strings.
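As a rough sketch of what "use parameters" can look like with ADO from script code: the function name OpenById, the conn argument (an already-open ADODB.Connection) and idText (standing in for TextBox2.Text) are assumptions, as is the ID column being an integer. dbo.Table is the placeholder name from the question. The Field1 IN (...) part is left out because a variable-length list cannot be bound to a single parameter.

```vbscript
' ADO constants, declared here so the sketch does not depend on adovbs.inc.
Const adInteger = 3
Const adParamInput = 1
Const adCmdText = 1

' conn: an open ADODB.Connection (assumed). idText: the raw text entered by the user.
' Returns a recordset; the user's text never becomes part of the SQL string.
Function OpenById(conn, idText)
    Dim cmd
    Set cmd = CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandType = adCmdText
    If Trim(idText) = "" Then
        ' Empty input means "rows where ID is NULL", as in the answers above.
        cmd.CommandText = "SELECT * FROM dbo.Table WHERE ID IS NULL"
    Else
        cmd.CommandText = "SELECT * FROM dbo.Table WHERE ID = ?"
        ' CLng raises a type mismatch error if idText is not numeric.
        cmd.Parameters.Append cmd.CreateParameter("id", adInteger, adParamInput, , CLng(idText))
    End If
    Set OpenById = cmd.Execute
End Function
```

In a .NET page the same idea would be expressed with the provider's command and parameter classes instead of ADODB.Command.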
"SELECT * FROM dbo.Table WHERE ID = " & TextBox2.Text & _
" And Field1 " & IIF(Trim(Msg1) = "", "IS NULL", "IN (" & Msg1 & ")")
Yes, it is crude.
One should not write query in this style.
EDIT: Corrected. Please check.
I use an SQL statement to remove records that exist on another database but this takes a very long time.
Is there any other alternative to the code below that can be faster? Database is Access.
email_DB.mdb is from where I want to remove the email addresses that exist on the other database (table Newsletter_Subscribers)
customers.mdb is the other database (table Customers)
SQLRemoveDupes = "DELETE FROM Newsletter_Subscribers WHERE EXISTS (select * from [" & strDBPath & "Customers].Customers " _
& "where Subscriber_Email = Email or Subscriber_Email = EmailO)"
NewsletterConn = "Driver={Microsoft Access Driver (*.mdb)};DBQ=" & strDBPath & "email_DB.mdb"
Set MM_editCmd = Server.CreateObject("ADODB.Command")
MM_editCmd.ActiveConnection = NewsletterConn
MM_editCmd.CommandText = SQLRemoveDupes
MM_editCmd.Execute
MM_editCmd.ActiveConnection.Close
Set MM_editCmd = Nothing
EDIT: Tried the SQL below from one of the answers but I keep getting an error when running it:
SQL: DELETE FROM Newsletter_Subscribers WHERE CustID IN (select CustID from [" & strDBPath & "Customers].Customers where Subscriber_Email = Email or Subscriber_Email = EmailO)
I get a "Too few parameters. Expected 1." error message on the Execute line.
I would use WHERE Subscriber_Email IN (Email, EmailO) as the WHERE clause
SQLRemoveDupes = "DELETE FROM Newsletter_Subscribers WHERE EXISTS " & _
"(select * from [" & strDBPath & "Customers].Customers where Subscriber_Email IN (Email, EmailO))"
I have found from experience that using an OR predicate in a WHERE clause can be detrimental to performance, because the engine has to evaluate each clause separately and it might decide to ignore indexes and use a table scan. Sometimes it can be better to split it into two separate statements. (I have to admit I am thinking in terms of SQL Server here, but the same may apply to Access.)
"DELETE FROM Newsletter_Subscribers WHERE EXISTS " & _
"(select * from [" & strDBPath & "Customers].Customers where Subscriber_Email = Email)"
"DELETE FROM Newsletter_Subscribers WHERE EXISTS " & _
"(select * from [" & strDBPath & "Customers].Customers where Subscriber_Email = EmailO)"
Assuming there's an ID-column present in the Customers table, the following change in SQL should give better performance:
"DELETE FROM Newsletter_Subscribers WHERE ID IN (select ID from [" & strDBPath & "Customers].Customers where Subscriber_Email = Email or Subscriber_Email = EmailO)"
PS. The ideal solution (judging from the column names) would be to redesign the tables and code logic of inserting emails in the first place. DS
Try adding an Access Querydef and calling that.
It sounds like you do not have an index on the Subscriber_Email field. This forces a table scan (or several). Add an index on this field and you should see significant improvement.
I would have coded the query
DELETE FROM Newsletter_Subscribers where (Subscriber_Email = Email or Subscriber_Email = EmailO)
I would try splitting this into two separate statements with separate database connections.
First, fetch the list of email addresses or IDs in the first database (as a string).
Second, construct a WHERE NOT IN statement and run it on the second database.
I would imagine this would be much faster as it does not have to interoperate between the two databases. The only possible issue would be if there are thousands of records in the first database and you hit the maximum length of a sql query string (whatever that is).
Here are some useful functions for this:
function GetDelimitedRecordString(sql, recordDelimiter)
    dim rs, str
    set rs = db.execute(sql)
    if rs.eof then
        str = ""
    else
        str = rs.GetString(,,,recordDelimiter)
        str = mid(str, 1, len(str)-len(recordDelimiter))
    end if
    rs.close
    set rs = nothing
    GetDelimitedRecordString = str
end function

function FmtSqlList(commaDelimitedStringOrArray)
    ' converts a string of the format "red, yellow, blue" to "'red', 'yellow', 'blue'"
    ' useful for taking input from an html form post (eg a multi-select box or checkbox group) and using it in a SQL WHERE IN clause
    ' prevents sql injection
    dim result:result = ""
    dim arr, str
    if isArray(commaDelimitedStringOrArray) then
        arr = commaDelimitedStringOrArray
    else
        arr = split(commaDelimitedStringOrArray, ",")
    end if
    for each str in arr
        if result<>"" then result = result & ", "
        result = result & "'" & trim(replace(str&"","'","''")) & "'"
    next
    FmtSqlList = result
end function
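A possible way to wire the two functions together for the question's case is sketched below. It is untested and makes several assumptions: the file names customers.mdb and email_DB.mdb and the variable strDBPath come from the question, db is the global connection that GetDelimitedRecordString expects, and no address contains a comma (FmtSqlList splits on commas). It uses IN rather than NOT IN, because the stated goal is to delete the addresses that do exist in Customers.

```vbscript
Dim db, newsletterDb, sqlEmails, emails, sqlDelete

' Step 1: read every customer address as one comma-delimited string.
Set db = Server.CreateObject("ADODB.Connection")
db.Open "Driver={Microsoft Access Driver (*.mdb)};DBQ=" & strDBPath & "customers.mdb"
sqlEmails = "SELECT Email FROM Customers WHERE Email Is Not Null " & _
            "UNION SELECT EmailO FROM Customers WHERE EmailO Is Not Null"
emails = GetDelimitedRecordString(sqlEmails, ",")
db.Close

' Step 2: delete the matching subscribers in the other database.
' Skip the statement entirely when there is nothing to match, since IN () is invalid SQL.
If emails <> "" Then
    sqlDelete = "DELETE FROM Newsletter_Subscribers " & _
                "WHERE Subscriber_Email IN (" & FmtSqlList(emails) & ")"
    Set newsletterDb = Server.CreateObject("ADODB.Connection")
    newsletterDb.Open "Driver={Microsoft Access Driver (*.mdb)};DBQ=" & strDBPath & "email_DB.mdb"
    newsletterDb.Execute sqlDelete
    newsletterDb.Close
End If
```

If the list gets long enough to hit the query length limit mentioned above, the same loop can be run over batches of addresses instead of one statement.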