How to delete large amounts of data from Foxpro - sql

The problem
We have a legacy Visual FoxPro reservation system with many tables. I have been asked to do some housekeeping on the tables to reduce their size.
The tables are badly designed with no auto incrementing primary key.
The largest table is 3 million rows.
I am attempting to delete 380,000 rows.
Due to the volume of data in the tables, I am trying to develop a solution which batch deletes.
What I've got so far
I have created a C# application which accesses the database files via the vfpoledb.1 driver. This application uses recno() to batch the deletion. This is an example of the query I'm using:
delete from TableA
where TableA.Key in (
select Key from TableB
where Departure < date(2010,01,01) and Key <> ""
) and recno() between 1 and 10000
Executing this via vfpoledb.1 does not delete anything. Executing a select statement with the same where clause does not return anything.
It seems that the combination of the recno() function and an in() function is causing the issue. Testing the query with each clause in turn returns results.
Questions
Is there another way of batch deleting data from Visual FoxPro?
Why are recno() and in() not compatible?
Is there anything else I'm missing?
Additional information
ANSI is set to TRUE
DELETED is set to TRUE
EXCLUSIVE is set to TRUE

Instead of doing in batch of so many record numbers, why not a simpler approach. You are looking to kill off everything prior to some date (2010-01-01).
Why not try based on starting with 2009-12-31 and keep working backwards to the earliest date on file you are trying to purge off. Also note, I don't know if Departure is a date vs datetime, so I changed it to
TTOD( Departure ) (meaning convert time to just the date component)
DateTime purgeDate = new DateTime(2009, 12, 31);
// the "?" is a parameter place-holder in the query
string SQLtxt = "delete from TableA "
+ " where TableA.Key in ( "
+ " select Key from TableB "
+ " where TTOD( Departure ) < ? and Key <> \"\" )";
OleDbCommand oSQL = new OleDbCommand( SQLtxt, YourOleDbConnectionHandle );
// default the "?" parameter place-holder
oSQL.Parameters.AddWithValue( "parmDate", purgeDate );
int RecordsDeleted = 0;
while( purgeDate > new DateTime(2000,1,1) )
{
// always re-apply the updated purge date for deletion
oSQL.Parameters[0].Value = purgeDate;
RecordsDeleted += oSQL.ExecuteNonQuery();
// keep going back one day at a time...
purgeDate = purgeDate.AddDays(-1);
}
This way, it does not matter what RECNO() you are dealing with, it will only do whatever keys are for that particular day. If you have more than 10,000 entries for a single day, then I might approach differently, but since this is more of a one-time cleanup, I would not be too concerned with doing 1000+ iterations ( 365 days per year for however many years) through the data... Or, you could do it with a date range and do maybe weekly, just change the WHERE clause and adjust the parameters... something like... (The date of 1/1/2000 is just a guess for how far back the data goes). Also, since this is doing entire date range, no need to convert possible TTOD() of the departure field.
DateTime purgeDate = new DateTime(2009, 12, 31);
DateTime lessThanDate = new DateTime( 2010, 1, 1 );
// the "?" is a parameter place-holder in the query
string SQLtxt = "delete from TableA "
+ " where TableA.Key in ( "
+ " select Key from TableB "
+ " where Departure >= ? "
+ " and Departure < ? "
+ " and Key <> \"\" )";
OleDbCommand oSQL = new OleDbCommand( SQLtxt, YourOleDbConnectionHandle );
// default the "?" parameter place-holder
oSQL.Parameters.AddWithValue( "parmDate", purgeDate );
oSQL.Parameters.AddWithValue( "parmLessThanDate", LessThanDate );
int RecordsDeleted = 0;
while( purgeDate > new DateTime(2000,1,1) )
{
// always re-apply the updated purge date for deletion
oSQL.Parameters[0].Value = purgeDate;
oSQL.Parameters[1].Value = lessThanDate;
RecordsDeleted += oSQL.ExecuteNonQuery();
// keep going back one WEEK at a time for both the starting and less than end date of each pass
purgeDate = purgeDate.AddDays(-7);
lessThanDate = lessThanDate.AddDays( -7);
}

I'm interested in the best way to accomplish this too. We use a lot of poorly designed dBaseIII files that are sometimes quite large.
We do this a lot but it's a nasty, manual process:
Import dbf files into a temp database using DTS (management studio import/export wizard for version 2005 + )
Run the cleanup scripts using SSMS
Export the dbf files and replace the original ones (backing them up) with the newly modified files.
It works for us.

It looks like your condition with date isn't working. Try to execute SELECT statement with using of CTOD() function instead of DATE() you've used.
When your condition will work, then you'll be able to run DELETE statement. But remember that as a result of the DELETE execution the rows will only be marked as deleted. To remove them completely you should run PACK statement after DELETE.
As an another way, you can also try our DBF editor - DBF Commander Professional. It allows to execute SQL queries, including command-line (batch) mode. E.g.:
dbfcommander. exe -q "DELETE FROM 'D:\table_name.dbf' WHERE RECNO()<10000"
dbfcommander. exe -q "PACK 'D:\table_name.dbf'"
You can use it for free within 20 days full-featured trial period.

Related

Record has been changed error if data is changed by VBA

I have a MS Access (2016) database using linked tables to a MySQL database. In the access database I have a form I use for data entry. I needed certain fields to be recalculated (manually) when I click a Recalc button.
The problem I am having is that when I run the VBA code to update fields on the form, if I then try to navigate to another record I get the error "This record has been changed by another user since you started editing it...."
I am the only user accessing this database. Everything works fine if I DON'T update a bound field on the form. Once I do, then I get that error when navigating to the next record.
Here is my vba code for the Recalc button:
Private Sub Recalculate()
vendorID = Me.product_supplier_id
supplierID = "supplier_id=" & vendorID
supplierHandling = Me.product_handling
vendorFee = Me.product_vendor_fee
supplierMarkupPercent = DLookup("supplier_markup_percent", "suppliers", supplierID)
supplierMarkupFixed = DLookup("supplier_markup_fixed", "suppliers", supplierID)
productCost = Me.product_cost
productShipping = Me.product_shipping
totalCost = productCost + productShipping + supplierHandling
totalCost = totalCost + vendorFee
markup = supplierMarkupFixed + (totalCost * supplierMarkupPercent)
productPrice = (totalCost + markup) / 0.85
amzFee = productPrice * 0.15
totalCost = totalCost + amzFee
profit = productPrice - totalCost
Me.product_total_cost = totalCost
Me.product_price = productPrice
Me.product_profit = profit
SetPriceColor
End Sub
The 3 statements near the end (before the SetPriceColor) are the culprits.
I am not sure how to resolve this issue. I have combed through many google searches, but nothing jumps out at me a solution for this specific case.
Yes, the issue is due to linked ODBC tables. Plus floating point number columns, which can cause problems when Access checks whether your changes in the bound form (be it by VBA or manually) conflict with the previous version of the saved record.
The solution should be to add a TIMESTAMP column with DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP to your table.
From here:
ALTER TABLE myTable
ADD COLUMN updated_at
TIMESTAMP DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP;
See these questions:
Write Conflict messages suddenly start happening in ODBC linked tables
Does MySQL have an equivalent of SQL Server rowversion?
For tables linked from SQL Server, adding a ROWVERSION column definitely fixes the issue. For MySql (and its ODBC driver) it should work, and it did work here.
When you have a linked SQL database to an Access Database there are a few things that you need to make sure are in place.
On the SQL side of things, the SQL table must have a primary key and a Timestamp field where the data type is timestamp.
On the Access side of things, when referencing the tables and using recordsets include dbOpenDynaset and dbSeeChanges. Here is an example:
Dim qry As String
Dim rs As Recordset
qry = "SELECT * FROM yourtable"
Set rs = CurrentDB.OpenRecordset(qry, dbOpenDynaset, dbSeeChanges)
This should stop your error from popping up. Also if you make changes to the SQL table, the Access database will not always catch these changes so you will want to refresh your connections using the Linked Table Manager.

SQL: Compare Date Range in a String text field

I have an MS Access db. I am writing an application in C# to access it. I have a field "FileName" of type "Short Text in MS Access. Data in FileName field is like "Test 11-12-2004 15.11.15".
Using a Date Range, I got to search records based on FileName field. I am not able to get - How do I compare the date of this format and retrieve the records ? FileName is a Text type and date is a substring of it. Retrieving only the date part and comparing with >= beginDate && <= endDate seems like a puzzle to me.
Can anyone suggest how do I write SQL query to perform this date range comparision and retrieve those records - "Select * from TestHead where FileName......" ????
Any help is appreciated.
Thanks a lot,
In your C# code, as you are going through the records, I'd split the string like this:
char[] delimiters = {' '};
string[] FileNameParts = FileName.Split(delimiters);
This will result in an array FileNameParts, the second element of which will contain the date, which you can convert to an actual date for use in the query:
DateTime FileNameDate = Convert.ToDateTime(FileNameParts(1))
Something along the lines of:
sSQL = "SELECT * FROM Table WHERE " & beginDate & " <= " & FileNameDate
I see this as preferable to adding a column to your table that contains the date substring of the FileName field, because then you constantly need to be updating that column whenever existing records are modified or new records are added. That means more clutter on the C# side, or an UPDATE query on the Access side which at least needs to get called periodically. Either way it would be more communication with the database.

Generate sql insert script from excel worksheet

I have a large excel worksheet that I want to add to my database.
Can I generate an SQL insert script from this excel worksheet?
I think importing using one of the methods mentioned is ideal if it truly is a large file, but you can use Excel to create insert statements:
="INSERT INTO table_name VALUES('"&A1&"','"&B1&"','"&C1&"')"
In MS SQL you can use:
SET NOCOUNT ON
To forego showing all the '1 row affected' comments. And if you are doing a lot of rows and it errors out, put a GO between statements every once in a while
You can create an appropriate table through management studio interface and insert data into the table like it's shown below. It may take some time depending on the amount of data, but it is very handy.
There is a handy tool which saves a lot of time at
http://tools.perceptus.ca/text-wiz.php?ops=7
You just have to feed in the table name, field names and the data - tab separated and hit Go!
You can use the following excel statement:
="INSERT INTO table_name(`"&$A$1&"`,`"&$B$1&"`,`"&$C$1&"`, `"&$D$1&"`) VALUES('"&SUBSTITUTE(A2, "'", "\'")&"','"&SUBSTITUTE(B2, "'", "\'")&"','"&SUBSTITUTE(C2, "'", "\'")&"', "&D2&");"
This improves upon Hart CO's answer as it takes into account column names and gets rid of compile errors due to quotes in the column. The final column is an example of a numeric value column, without quotes.
Depending on the database, you can export to CSV and then use an import method.
MySQL - http://dev.mysql.com/doc/refman/5.1/en/load-data.html
PostgreSQL - http://www.postgresql.org/docs/8.2/static/sql-copy.html
Use the ConvertFrom-ExcelToSQLInsert from the ImportExcel in the PowerShell Gallery
NAME
ConvertFrom-ExcelToSQLInsert
SYNTAX
ConvertFrom-ExcelToSQLInsert [-TableName] <Object> [-Path] <Object>
[[-WorkSheetname] <Object>] [[-HeaderRow] <int>]
[[-Header] <string[]>] [-NoHeader] [-DataOnly] [<CommonParameters>]
PARAMETERS
-DataOnly
-Header <string[]>
-HeaderRow <int>
-NoHeader
-Path <Object>
-TableName <Object>
-WorkSheetname <Object>
<CommonParameters>
This cmdlet supports the common parameters: Verbose, Debug,
ErrorAction, ErrorVariable, WarningAction, WarningVariable,
OutBuffer, PipelineVariable, and OutVariable. For more information, see
about_CommonParameters (http://go.microsoft.com/fwlink/?LinkID=113216).
ALIASES
None
REMARKS
None
EXAMPLE
ConvertFrom-ExcelToSQLInsert MyTable .\testSQLGen.xlsx
You could use VB to write something that will output to a file row by row adding in the appropriate sql statements around your data. I have done this before.
Here is another tool that works very well...
http://www.convertcsv.com/csv-to-sql.htm
It can take tab separated values and generate an INSERT script. Just copy and paste and in the options under step 2 check the box "First row is column names"
Then scroll down and under step 3, enter your table name in the box "Schema.Table or View Name:"
Pay attention to the delete and create table check boxes as well, and make sure you examine the generated script before running it.
This is the quickest and most reliable way I've found.
You can use the below C# Method to generate the insert scripts using Excel sheet just you need import OfficeOpenXml Package from NuGet Package Manager before executing the method.
public string GenerateSQLInsertScripts() {
var outputQuery = new StringBuilder();
var tableName = "Your Table Name";
if (file != null)
{
var filePath = #"D:\FileName.xsls";
using (OfficeOpenXml.ExcelPackage xlPackage = new OfficeOpenXml.ExcelPackage(new FileInfo(filePath)))
{
var myWorksheet = xlPackage.Workbook.Worksheets.First(); //select the first sheet here
var totalRows = myWorksheet.Dimension.End.Row;
var totalColumns = myWorksheet.Dimension.End.Column;
var columns = new StringBuilder(); //this is your columns
var columnRows = myWorksheet.Cells[1, 1, 1, totalColumns].Select(c => c.Value == null ? string.Empty : c.Value.ToString());
columns.Append("INSERT INTO["+ tableName +"] (");
foreach (var colrow in columnRows)
{
columns.Append("[");
columns.Append(colrow);
columns.Append("]");
columns.Append(",");
}
columns.Length--;
columns.Append(") VALUES (");
for (int rowNum = 2; rowNum <= totalRows; rowNum++) //selet starting row here
{
var dataRows = myWorksheet.Cells[rowNum, 1, rowNum, totalColumns].Select(c => c.Value == null ? string.Empty : c.Value.ToString());
var finalQuery = new StringBuilder();
finalQuery.Append(columns);
foreach (var dataRow in dataRows)
{
finalQuery.Append("'");
finalQuery.Append(dataRow);
finalQuery.Append("'");
finalQuery.Append(",");
}
finalQuery.Length--;
finalQuery.Append(");");
outputQuery.Append(finalQuery);
}
}
}
return outputQuery.ToString();}
Here is a link to an Online automator to convert CSV files to SQL Insert Into statements:
CSV-to-SQL
This query i have generated for inserting the Excel file data into database
In this id and price are numeric values and date field as well. This query summarized all the type which I require It may useful to you as well
="insert into product (product_id,name,date,price) values("&A1&",'" &B1& "','" &C1& "'," &D1& ");"
Id Name Date price
7 Product 7 2017-01-05 15:28:37 200
8 Product 8 2017-01-05 15:28:37 40
9 Product 9 2017-01-05 15:32:31 500
10 Product 10 2017-01-05 15:32:31 30
11 Product 11 2017-01-05 15:32:31 99
12 Product 12 2017-01-05 15:32:31 25
I had to make SQL scripts often and add them to source control and send them to DBA.
I used this ExcelIntoSQL App from windows store https://www.microsoft.com/store/apps/9NH0W51XXQRM
It creates complete script with "CREATE TABLE" and INSERTS.
I have a reliable way to generate SQL inserts batly,and you can modify partial parameters in processing.It helps me a lot in my work, for example, copy one hundreds data to database with incompatible structure and fields count.
IntellIJ DataGrip , the powerful tool i use.
DG can batly receive data from WPS office or MS Excel by column or line.
after copying, DG can export data as SQL inserts.

SQL String in VBA in Excel 2010 with Dates

I've had a look around but cannot find the issue with this SQL Statement:
strSQL = "SELECT Directory.DisplayName, Department.DisplayName, Call.CallDate, Call.Extension, Call.Duration, Call.CallType, Call.SubType FROM (((Department INNER JOIN Directory ON Department.DepartmentID = Directory.DepartmentID) INNER JOIN Extension ON (Department.DepartmentID = Extension.DepartmentID) AND (Directory.ExtensionID = Extension.ExtensionID)) INNER JOIN Site ON Extension.SiteCode = Site.SiteCode) INNER JOIN Call ON Directory.DirectoryID = Call.DirectoryID WHERE (Call.CallDate)>=27/11/2012"
Regardless of what I change the WHERE it always returns every single value in the database (atleast I assume it does since excel completely hangs when I attempt this) this SQL statement works perfectly fine in Access (if dates have # # around them). Any idea how to fix this, currently trying to create a SQL statement that allows user input on different dates, but have to get over the this random hurdle first.
EDIT: The date field in the SQL Database is a DD/MM/YY HH:MM:SS format, and this query is done in VBA - EXCEL 2010.
Also to avoid confusion have removed TOP 10 from the statement, that was to stop excel from retrieving every single row in the database.
Current Reference I have activated is: MicrosoftX Data Objects 2.8 Library
Database is a MSSQL, using the connection string:
Provider=SQLOLEDB;Server=#######;Database=#######;User ID=########;Password=########;
WHERE (Call.CallDate) >= #27/11/2012#
Surround the date variable with #.
EDIT: Please make date string unambiguous, such as 27-Nov-2012
strSQL = "SELECT ........ WHERE myDate >= #" & Format(dateVar, "dd-mmm-yyyy") & "# "
If you are using ado, you should look at Paramaters instead of using dynamic query.
EDIT2: Thanks to #ElectricLlama for pointing out that it is SQL Server, not MS-Access
strSQL = "SELECT ........ WHERE myDate >= '" & Format(dateVar, "mm/dd/yyyy") & "' "
Please verify that the field Call.CallDate is of datatype DATETIME or DATE
If you are indeed running this against SQL Server, try this syntax for starters:
SELECT Directory.DisplayName, Department.DisplayName, Call.CallDate,
Call.Extension, Call.Duration, Call.CallType, Call.SubType
FROM (((Department INNER JOIN Directory
ON Department.DepartmentID = Directory.DepartmentID)
INNER JOIN Extension ON (Department.DepartmentID = Extension.DepartmentID)
AND (Directory.ExtensionID = Extension.ExtensionID))
INNER JOIN Site ON Extension.SiteCode = Site.SiteCode)
INNER JOIN Call ON Directory.DirectoryID = Call.DirectoryID
WHERE (Call.CallDate)>= '2012-11-27'
The date format you see is simply whatever format your client tool decides to show it in. Dates are not stored in any format, they are effectively stored as a duration since x.
By default SQL Uses the format YYYY-MM-DD if you want to use a date literal.
But you are much better off defining a parameter of type date in your code and keeping your date a data type 'date' for as long as possible. This may include only allowing them to enter the date using a calendar control to stop ambiguities.

SQLDataAdapter filling datatable with primary key produces error and exits sub

Ok so this is going to take some explaining.
The process I am trying to do is grab data from a table function in SQL and then fill a dataset with the returned values.
I then have to run this query twice more to query an alternative number table. Then add to the same table as the previous queries.
This needs to be as fast as possible, so I am currently using an adapter.fill to populate the datasets and then a dataset.merge to put them all into one table.
The problem is the query can return duplicates which waste time and space, because of this I made column 3(part_ID) the primary key to stop duplicates.
When this is run with the .merge it quits at the first instance of a duplication and doesn't continue with the population.
The code below is what I used to fix this, I was just wondering if there is a better more elegant solution.
com = New SqlCommand(sqlPN, myConnect)
adapter.SelectCommand = com
adapter.Fill(temp, "Table(0)")
Dim data As New DataSet
data = temp
temp.Tables(0).Columns(3).Unique = True
firstSet = temp.Tables(0).Rows.Count
temp.AcceptChanges()
If temp.Tables(0).Rows.Count < maxRecords Then
Dim sqlAlt As String = "select Top " & (maxRecords + 10 - temp.Tables(0).Rows.Count) & " * from getAltEnquiry('" & tbSearchFor.Text & "') ORDER BY spn_partnumber"
adapter.SelectCommand.CommandText = sqlAlt
adapter.FillLoadOption = LoadOption.OverwriteChanges
adapter.Fill(temp, "Table(1)")
For i = 0 To temp.Tables(1).Rows.Count - 1
Try
temp.Tables(0).ImportRow(temp.Tables(1).Rows(i))
Catch e As Exception
End Try
Next
End If
If temp.Tables(0).Rows.Count < maxRecords Then
Dim sqlSuPN As String = "select Top " & (maxRecords + 5 - temp.Tables(0).Rows.Count) & " * from getSuPNEnquiry('" & tbSearchFor.Text & "') ORDER BY spn_partnumber"
adapter.SelectCommand.CommandText = sqlSuPN
adapter.Fill(temp, "Table(2)")
For i = 0 To temp.Tables(2).Rows.Count - 1
Try
temp.Tables(0).ImportRow(temp.Tables(2).Rows(i))
Catch e As Exception
End Try
Next
End If</code>
Thanks for any help or advice ^__^
Since you are looping through the records from the additional queries and using the ImportRow, your code will throw an exception if more than one record with the same value in the primary key field is attempted to be inserted. That is the purpose of a primary key when using in this way. If you want to ensure that your table only has unique records, you will need to ensure that the records are distinct before inserting them by checking the new row's part_id value against those already in the table. However, your design isn't necessarily the ideal approach.
Since you mentioned that this needs to be fast, it will probably be best if you could write a stored procedure to return just the rows you need from all tables and do the Fill into the table once.
If that's not possible, you can call adapter.Fill on the same DataTable for each of your data sources. Use the Fill overload that takes just the DataTable to fill and as per the docs, it will merge the data together if more than one record with the same primary key exists. The way you have the Fill method called, it is creating a new DataTable with the name you provide for each time you call Fill. Instead, you want to Fill just one DataTable.
"You can use the Fill method multiple times on the same DataTable. If a primary key exists, incoming rows are merged with matching rows that already exist. If no primary key exists, incoming rows are appended to the DataTable."