I'm writing a piece of VB.NET code to cleanse a fairly large table of data.
I connect to my SQL database, loop through the table, cleanse the data, and write the cleansed data to a different column.
At the moment I run an UPDATE against the database for each record, inside the same loop that cleanses the data. Is there a more efficient way, where I cleanse the data first and then send all the updated records to the database in one go?
Simplified code:
'Connect
SQLConn.ConnectionString = strConnection
SQLConn.Open()
SQLCmd.Connection = SQLConn
SQLConn2.ConnectionString = strConnection
SQLConn2.Open()
SQLCmd2.Connection = SQLConn2
'Set query
strSQL = "SELECT Column1 FROM Table1"
SQLCmd.CommandText = strSQL
'Load Query
SQLdr = SQLCmd.ExecuteReader
'Start Cleansing
While SQLdr.Read
Cleansing()
'Add to database
strSQL2 = "UPDATE Table1 SET Clean_data = '" & strClean & "' WHERE Dirty_Data = '" & SQLdr(0).ToString & "'"
SQLCmd2.CommandText = strSQL2
SQLCmd2.ExecuteNonQuery()
End While
'Close Connections
SQLdr.Close()
SQLConn.Close()
SQLConn2.Close()
I'm guessing (from searching for a solution) that it is possible to do the update outside of my loop, but I can't find out how to do it specifically.
Many thanks!
Your code is taking a long time because the UPDATE does a full table scan for every record. You can speed it up by adding an index on the column Dirty_Data.
Essentially, you are reading the data with the SELECT statement, cleaning one row, and then updating it. The preferred "set-based" approach does the work in a single statement. Ideally, you would like to do:
update table1
set column1 = <fix the dirty data>
where column1 <is dirty>
SQL offers some tools that can help with this process, such as replace(), case, and like.
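As a rough illustration of the set-based idea, here is a minimal sketch. It uses Python's built-in sqlite3 only so that it runs on its own; the in-memory database, the sample rows, and the cleaning rules (trim, replace hyphens, treat 'n/a' as missing) are all assumptions, and function names may differ slightly in SQL Server.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Dirty_Data TEXT, Clean_data TEXT)")
conn.executemany(
    "INSERT INTO Table1 (Dirty_Data) VALUES (?)",
    [("  New-York ",), ("n/a",), ("Boston",)],
)

# One statement cleans every row. The rules are hypothetical examples:
# map the placeholder 'n/a' to NULL, otherwise trim and swap '-' for ' '.
conn.execute(
    """
    UPDATE Table1
    SET Clean_data = CASE
        WHEN Dirty_Data LIKE 'n/a' THEN NULL
        ELSE REPLACE(TRIM(Dirty_Data), '-', ' ')
    END
    """
)

cleaned = [row[0] for row in conn.execute("SELECT Clean_data FROM Table1 ORDER BY rowid")]
print(cleaned)
```

The point is that the database visits each row once inside a single UPDATE, instead of receiving one UPDATE per row from the client.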
But you already have the cleaning code external to the database. For this, you want to create and open a cursor, process the record, and then write back. Cursors are relatively slow, compared to in-database operations. But, this is exactly the situation they were designed for -- external code to be applied to individual records.
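If the cleaning has to stay in application code, one way to cut down on round trips is to collect the cleaned values and send them as a single parameterized batch. This is a sketch of that alternative, not the database cursor described above. It uses Python's sqlite3 so it is runnable as-is; `cleanse` is a hypothetical stand-in for the question's Cleansing() routine, and rows are matched by SQLite's rowid where a real table would use its primary key.

```python
import sqlite3

def cleanse(value):
    # Hypothetical stand-in for the question's Cleansing() routine:
    # collapse repeated whitespace and apply title case.
    return " ".join(value.split()).title()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Dirty_Data TEXT, Clean_data TEXT)")
conn.executemany(
    "INSERT INTO Table1 (Dirty_Data) VALUES (?)",
    [("john   smith",), (" ANNA lee ",)],
)

# Read everything first, clean in application code, and collect the updates.
rows = conn.execute("SELECT rowid, Dirty_Data FROM Table1").fetchall()
updates = [(cleanse(dirty), rowid) for rowid, dirty in rows]

# Send the updates as one parameterized batch inside one transaction.
with conn:
    conn.executemany("UPDATE Table1 SET Clean_data = ? WHERE rowid = ?", updates)

result = conn.execute("SELECT Clean_data FROM Table1 ORDER BY rowid").fetchall()
print(result)
```

Using parameters instead of concatenating values into the SQL string also avoids breakage when the data contains apostrophes.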
Related
I have an order creation form in an Access database where the user selects a product, which triggers VBA code that runs a SQL SELECT statement to retrieve the current availability of that product. This is how it's set up:
I have a Packages table where product batches are added to inventory.
I have an OrderDetail table where items from product batches are allocated to orders.
I have an InventoryPrep query with the total packaged per batch and a field that sums the number of allocated products per batch from the OrderDetail table.
Then I have an Inventory query with a calculated field that takes the TotalPackaged field from the InventoryPrep query and subtracts the TotalAllocated field from the same query.
Here is the VBA code in my form, triggered by an update to the [Batch] combo box:
Dim VBatch As String
VBatch = Me.Batch.Value
Dim VAvail As Double
Dim mySQL As String
Dim conn1 As ADODB.Connection
Set conn1 = CurrentProject.Connection
Dim rs1 As New ADODB.Recordset
rs1.ActiveConnection = conn1
mySQL = "SELECT Available FROM Inventory WHERE BatchID = " & "'" & VBatch & "'"
rs1.Open mySQL
rs1.MoveFirst
VAvail = rs1.Fields("Available").Value
Forms!ChangeOrders.ChangeOrderSubform.Form.Availability.Value = VAvail
rs1.Close
conn1.Close
Set rs1 = Nothing
Set conn1 = Nothing
This has been working just fine for weeks, retrieving the correct available amount as packaged items are added to the Packages table and orders are added to the OrderDetail table. Yesterday it started returning the Packaged field from the InventoryPrep query instead.
I tried a bunch of things and then created a table from the query and used the SELECT statement to look it up in the table. That worked. Something about my query setup has caused it to stop recognizing my calculated field. I need help!
This is my first time posting and I hope this is enough information. I'm pretty new to Access and VBA but I've learned a lot from reading in this forum. I hope someone can help or let me know what other information could shed light on the problem.
To read a single value from a table or query, your code is a bit over the top.
For this scenario, Access has the DLookup function.
VAvail = DLookup("Available", "Inventory", "BatchID = '" & VBatch & "'")
Forms!ChangeOrders.ChangeOrderSubform.Form.Availability.Value = VAvail
That's all that is needed.
I have a form in which one of the ComboBoxes lists all the documents of a given project. The user should select one and press a button. If the document is present in table Dessins, a second form opens showing that record. If it is not present in that table, I want to add it.
One of my colleagues told me all I had to do was execute an SQL query with VBA. What I have so far is this:
Dim rsDessin As DAO.Recordset
Dim strContrat As String
Dim strProjet As String
Dim strDessin As String
Dim sqlquery As String
'I think these next 3 lines are unimportant. I set a first query to get information I need from another table
strDessin = Me.Combo_Dessin
strProjet = Me.Combo_Projet
sqlquery = "SELECT [Projet HNA] FROM [Projets] WHERE [Projet AHNS] = '" & strProjet & "'"
Set rsDessin = CurrentDb.OpenRecordset(sqlquery)
If Not rsDessin.RecordCount > 0 Then 'If not present I want to add it
strContrat = rsDessin![Projet HNA]
sqlquery = "INSERT INTO Feuilles ([AHNS], [Contrat], [No Projet]) VALUES (strDessin, strContrat, strDessin)"
'Not sure what to do with this query or how to make sure it worked.
End If
'Checking my variables
Debug.Print strProjet
Debug.Print strContrat
Debug.Print strDessin
'By here I'd like to have inserted my new record.
rsDessin.Close
Set rsDessin = Nothing
I also read online that I could achieve a similar result with something like this:
Set R = CurrentDb.OpenRecordset("SELECT * FROM [Dessins]")
R.AddNew
R![Contrat] = strContrat
R![Projet] = strProjet
R![AHNS] = strDessin
R.Update
R.Close
Set R = Nothing
DoCmd.Close
Is one way better than the other? In the case where my INSERT INTO query is better, what should I do to execute it?
You're asking which is preferable when inserting a record: to use an SQL statement issued to the Database object, or to use the methods of the Recordset object.
For a single record, it doesn't matter. However, you could issue the INSERT statement like this:
'Assumes all three fields are Text; omit the single quotes for numeric fields
CurrentDb.Execute "INSERT INTO Feuilles ([AHNS], [Contrat], [No Projet]) VALUES ('" & strDessin & "', '" & strContrat & "', '" & strDessin & "')", dbFailOnError
(You should use the dbFailOnError option to catch certain errors, as HansUp points out in this answer.)
For inserting multiple records from another table or query, it is generally faster and more efficient to issue an SQL statement like this:
Dim sql As String
sql = _
"INSERT INTO DestinationTable (Field1, Field2, Field3) " & _
"SELECT Field1, Field2, Field3 " & _
"FROM SourceTable"
CurrentDb.Execute sql, dbFailOnError
than the equivalent using the Recordset object:
Dim rsSource As DAO.Recordset, rsDestination As DAO.Recordset
Set rsSource = CurrentDb.OpenRecordset("SourceTable")
Set rsDestination = CurrentDb.OpenRecordset("DestinationTable")
'Copy the source rows one at a time
Do Until rsSource.EOF
rsDestination.AddNew
rsDestination!Field1 = rsSource!Field1
rsDestination!Field2 = rsSource!Field2
rsDestination!Field3 = rsSource!Field3
rsDestination.Update
rsSource.MoveNext
Loop
rsSource.Close
rsDestination.Close
That said, using an SQL statement has its limitations:
You are limited to SQL syntax and functions.
This is partially mitigated in Access, because SQL statements can use many VBA built-in functions or functions that you define.
SQL statements are designed to work on blocks of rows. Per-row logic is harder to express using only the Iif, Choose, or Switch functions; and logic that depends on the current state (e.g. insert every other record) is harder or impossible using pure SQL. This can be easily done using the Recordset methods approach.
This too can be enabled using a combination of VBA and SQL, if you have functions that persist state in module-level variables. One caveat: you'll need to reset the state each time before issuing the SQL statement. See here for an example.
One part* of your question asked about INSERT vs. Recordset.AddNew to add one row. I suggest this recordset approach:
Dim db As DAO.Database
Dim R As DAO.Recordset
Set db = CurrentDb
Set R = db.OpenRecordset("Dessins", dbOpenTable, dbAppendOnly)
With R
.AddNew
!Contrat = rsDessin![Projet HNA].Value
!Projet = Me.Combo_Projet.Value
!AHNS = Me.Combo_Dessin.Value
.Update
.Close
End With
* You also asked how to execute an INSERT. Use the DAO.Database.Execute method which Zev recommended and include the dbFailOnError option. That will add clarity about certain insert failures. For example, a key violation error could otherwise make your INSERT fail silently. But including dbFailOnError ensures you get notified about the problem immediately. So always include that option ... except in cases where you actually want to allow an INSERT to fail silently. (For me, that's never.)
I recently came across VBA UPDATE statements. Until now I have been using Recordset.Edit and Recordset.Update to modify my existing data.
I want to know the difference between the two: Recordset.Update and an SQL UPDATE statement executed from VBA. I think they do the same thing, but I can't figure out which one is more efficient and why.
Example code below:
'this is with sql update statement
dim someVar as string, anotherVar as String, cn As New ADODB.Connection
someVar = "someVar"
anotherVar = "anotherVar"
sqlS = "Update tableOfRec set columna = " &_
someVar & ", colunmb = " & anotherVar &_
" where columnc = 20";
cn.Execute stSQL
This is for recordset (update and Edit):
dim thisVar as String, someOthVar as String, rs as recordset
thisVar = "thisVar"
someOthVar = "someOtherVar"
set rs = currentDb.openRecordset("select columna, columnb where columnc = 20")
do While not rs.EOF
rs.Edit
rs!columna = thisVar
rs!columnb = someOthvar
rs.update
rs.MoveNext
loop
Assuming WHERE columnc = 20 selects 1000+ rows, as you mentioned in a comment, executing that UPDATE statement should be noticeably faster than looping through a recordset and updating its rows one at a time.
The latter strategy is a RBAR (Row By Agonizing Row) approach. The first strategy, executing a single (valid) UPDATE, is a "set-based" approach. In general, set-based trumps RBAR with respect to performance.
However, your two examples raise other issues. My first suggestion would be to use DAO instead of ADO to execute your UPDATE:
CurrentDb.Execute stSQL, dbFailonError
Whichever of those strategies you choose, make sure columnc is indexed.
The SQL method is usually the fastest for bulk updates, but syntax is often clumsy.
The VBA method, however, has distinct advantages: the code is cleaner, and the recordset can be used before or after the update/edit without requerying the data. This can make a huge difference if you have to do long-winded calculations between updates. Also, the recordset can be passed ByRef to supporting functions for further processing.
I have found that the DAO method is much faster when I need to update every record in a table in order, for example adding a sequential ID when using Autonumber is not feasible, adding a running total, or doing any calculation that is incremental based on some value in the recordset.
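To make the "incremental, in order" case concrete, here is a minimal sketch of a running total computed row by row. It uses Python's sqlite3 rather than DAO only so it can run standalone; the Invoices table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices (id INTEGER PRIMARY KEY, amount REAL, running_total REAL)"
)
conn.executemany(
    "INSERT INTO Invoices (id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (3, 2.5)],
)

# Walk the rows in a defined order and carry state from one row to the next,
# which is the kind of logic that is awkward to express in a plain UPDATE.
total = 0.0
updates = []
for row_id, amount in conn.execute("SELECT id, amount FROM Invoices ORDER BY id"):
    total += amount
    updates.append((total, row_id))

# Write the computed values back in one transaction.
with conn:
    conn.executemany("UPDATE Invoices SET running_total = ? WHERE id = ?", updates)

totals = [r[0] for r in conn.execute("SELECT running_total FROM Invoices ORDER BY id")]
print(totals)
```

Each value depends on the rows before it, so the processing order matters; that is what distinguishes this case from a bulk update where every row can be handled independently.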
If your data is not in the order you need it processed in, and you instead need to rely on matching values to the data source, then SQL is much more efficient.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("select invoice_num from dbo_doc_flow_data where barcode = '" & Me.barcode_f & "'")
Do While Not rs.EOF
rs.Edit
rs!invoice_num = Me!invoice_num_f
rs.Update
rs.MoveNext
Loop
rs.Close
I've been fumbling my way through writing my first application with SQL database access, and I've been getting along OK with single commands using System.Data.SqlClient.SqlCommand and doing something like:
SQLCmd.CommandText = ("DELETE from ContactRelationships WHERE ID = 'someid'")
SQLCmd.ExecuteNonQuery()
Or to store changes on a form and then save or cancel them:
Dim x As New SqlCommand("DELETE from SharedDataLocations where Location = @Loc and UserID= @UID ;--", cnn)
x.Parameters.Add("@Loc", SqlDbType.NVarChar, 300).Value = strRemDLoc
x.Parameters.Add("@UID", SqlDbType.Int).Value = UserID
PendingSQLChanges.Add(x)
'followed later by
For x = 0 To PendingSQLChanges.Count - 1
PendingSQLChanges(x).ExecuteNonQuery()
Next
PendingSQLChanges.Clear()
I haven't tackled anything more complex than that yet but I'm willing to learn. What I need to do now is take the id for the current STAFF record and see if it is already set in the STAFFMANAGERS relationship table and either update it with the just selected id from the MANAGERS table or create the record if a manager wasn't previously set. Both staff and manager ids are stored in form variables at this point so can just be inserted into any SQL commands.
I've seen various ways I could do this such as adding multiple lines to a SQLCmd.commandtext (although I'm not sure on format for this) but I have no idea if this is a foolproof solution or prone to problems or if it will even work.
Without stretching too far out of my current experience (or giving me an in depth explanation of something more complex) how can I best accomplish this?
I am not sure if this will help, but I have something similar in my VB.NET SQL code. I use an If...Then statement to check whether the record exists. If there is no record, the code INSERTs it with a new UID; if the UID already exists, it UPDATEs the record.
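Public Class YourForm
Private UniqueID As Integer
Private AddRecord As Boolean 'True when no record exists yet (set elsewhere)
'SaveRecord is a placeholder name; SQLCmd is an open SqlCommand as in the question
Private Sub SaveRecord()
SQLCmd.Parameters.Clear()
If AddRecord Then
SQLCmd.CommandText = "INSERT INTO YourTableName (YourColumnName) VALUES (@Value)"
Else
SQLCmd.CommandText = "UPDATE YourTableName SET YourColumnName = @Value WHERE YourUID = @UID"
SQLCmd.Parameters.AddWithValue("@UID", UniqueID)
End If
SQLCmd.Parameters.AddWithValue("@Value", YourTextBox.Text)
SQLCmd.ExecuteNonQuery()
End Sub
End Class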
Maybe you need the MERGE statement.
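MERGE is a SQL Server statement that combines the insert and update cases in one command. As a runnable stand-in, the sketch below gets the same outcome for a single row with an update-then-insert pattern in Python's sqlite3, since SQLite has no MERGE. The StaffManagers column names and the `set_manager` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StaffManagers (StaffID INTEGER PRIMARY KEY, ManagerID INTEGER)"
)

def set_manager(conn, staff_id, manager_id):
    """Update the staff member's manager, or insert the row if none exists."""
    with conn:  # both statements run in one transaction
        cur = conn.execute(
            "UPDATE StaffManagers SET ManagerID = ? WHERE StaffID = ?",
            (manager_id, staff_id),
        )
        if cur.rowcount == 0:  # nothing was updated, so the row is missing
            conn.execute(
                "INSERT INTO StaffManagers (StaffID, ManagerID) VALUES (?, ?)",
                (staff_id, manager_id),
            )

set_manager(conn, 7, 100)  # no row yet: inserts
set_manager(conn, 7, 200)  # row exists: updates
rows = conn.execute("SELECT StaffID, ManagerID FROM StaffManagers").fetchall()
print(rows)
```

Checking the affected-row count avoids a separate SELECT just to find out whether the record exists.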
I am a beginner at this, but let me explain what I need to do and show you my code.
I have a CSV file.
Inside the CSV I have projectnumber, city, state, country.
I have a SQL table with the same columns.
I want to use VB.NET to check whether the projectnumber exists in the SQL table.
If it exists, I want to run an UPDATE statement.
If it does not exist, I want to run an INSERT statement.
I have the program working, but I am wondering whether this is the correct way or whether my code is a hack.
LEGEND:
DTTable is data table with CSV inside
DT is data table with SQL result data
First I insert all lines of the CSV into a data table:
Dim parser As New FileIO.TextFieldParser(sRemoteAccessFolder & "text.csv")
parser.Delimiters = New String() {","}
parser.ReadLine()
Do Until parser.EndOfData = True
DTTable.Rows.Add(parser.ReadFields())
Loop
parser.Close()
Then I use an OleDbDataAdapter to run the SELECT query and fill another data table with the result:
SQLString = "select * from tblProjects where ProjectID='" & DTTable.Rows.Item(i).Item("ProjectNumber") & "'"
da = New OleDb.OleDbDataAdapter(SQLString, Conn)
da.Fill(dt)
Then I run an If statement:
If dt.Rows.Count = 0 then
SQLString = "INSERT STATEMENT HERE"
oCmd = New OleDb.OleDbCommand(SQLString, Conn)
oCmd.ExecuteNonQuery()
Else
SQLString = "UPDATE STATEMENT HERE"
oCmd = New OleDb.OleDbCommand(SQLString, Conn)
oCmd.ExecuteNonQuery()
End if
All of the above code runs inside a For loop that goes through all the lines in the CSV:
For i = 0 To DTTable.Rows.Count - 1
What do you think?
Please advise.
Thank you.
Personally, I wouldn't use .NET. I would import the table into a temp SQL Server table and then write my queries to insert/update data from the temp table to the regular table. This is certainly the way you want to go if the dataset is large.
If this is a process you need to repeat frequently, you could make an SSIS package.
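The staging-table approach can be sketched roughly as follows. Python's sqlite3 is used only so the example runs standalone; the inline CSV text, the sample rows, and the Staging table name are assumptions, and on SQL Server the load step would normally use a bulk-import tool rather than row inserts.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblProjects (ProjectID TEXT PRIMARY KEY, City TEXT, State TEXT, Country TEXT)"
)
conn.execute("INSERT INTO tblProjects VALUES ('P1', 'Old City', 'NY', 'USA')")

# Inline text stands in for the CSV file; the first line is the header.
csv_text = "ProjectNumber,City,State,Country\nP1,Albany,NY,USA\nP2,Toronto,ON,Canada\n"
rows = list(csv.reader(io.StringIO(csv_text)))[1:]

with conn:
    # 1. Load the CSV rows into a temporary staging table.
    conn.execute(
        "CREATE TEMP TABLE Staging (ProjectID TEXT PRIMARY KEY, City TEXT, State TEXT, Country TEXT)"
    )
    conn.executemany("INSERT INTO Staging VALUES (?, ?, ?, ?)", rows)
    # 2. Update the projects that already exist.
    conn.execute(
        """
        UPDATE tblProjects
        SET City = (SELECT s.City FROM Staging s WHERE s.ProjectID = tblProjects.ProjectID),
            State = (SELECT s.State FROM Staging s WHERE s.ProjectID = tblProjects.ProjectID),
            Country = (SELECT s.Country FROM Staging s WHERE s.ProjectID = tblProjects.ProjectID)
        WHERE ProjectID IN (SELECT ProjectID FROM Staging)
        """
    )
    # 3. Insert the projects that are new.
    conn.execute(
        """
        INSERT INTO tblProjects (ProjectID, City, State, Country)
        SELECT s.ProjectID, s.City, s.State, s.Country
        FROM Staging s
        WHERE NOT EXISTS (SELECT 1 FROM tblProjects p WHERE p.ProjectID = s.ProjectID)
        """
    )

projects = conn.execute("SELECT * FROM tblProjects ORDER BY ProjectID").fetchall()
print(projects)
```

However many lines the CSV has, the database runs two set-based statements instead of one SELECT plus one INSERT or UPDATE per line.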
I'd run the select query using datareader = command.ExecuteReader(). Then:
If datareader.Read() Then
'Update query using datareader(0) as a where predicate goes here
Else
'No matching row was returned, so the insert query goes here
End If
I should say, I'm a relative novice too though, so maybe others can suggest a more elegant way of doing it.