Validate data before uploading through SSIS - sql-server-2005

I have an SSIS package that uploads data from an Excel file into a SQL Server 2005 table.
The Excel file will contain a varying number of rows, from about 20k to 30k.
The upload works fine when all the data is correct, but it fails when there is even a small problem in a single row, such as a mandatory value that is null or a value that cannot be converted (data type mismatch).
I want to validate the Excel file before the upload and tell the user which row and column contain the error.
Any idea how to accomplish this without consuming much time and resources?
Thanks

It might be easiest to load into a temporary table that has no mandatory fields, constraints, etc., and check the data there before appending it to the main table.
EDIT re comment
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
Dim strFile As String, strCon As String, strSQL As String
''This is not necessarily the best way to get the workbook name
''that you need
strFile = Workbooks(1).FullName
''Note that if HDR=No, F1,F2 etc are used for column names,
''if HDR=Yes, the names in the first row of the range
''can be used.
''This is the Jet 4 connection string, you can get more
''here : http://www.connectionstrings.com/excel
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
cn.Open strCon
''Note that HDR=Yes
''Pick one (each assignment replaces the previous one):
strSQL = "SELECT Frst, Secnd FROM TheRange WHERE SomeField Is Null" ''Named range
strSQL = "SELECT Frst, Secnd FROM [Sheet1$C3:D67] WHERE Val(Secnd)=0" ''Range (two columns for two fields)
strSQL = "SELECT Frst, Secnd FROM [Sheet1$] WHERE Frst<Date()" ''Sheet
rs.Open strSQL, cn
''Paste the problem rows into Sheet2, starting at A2
Sheets("Sheet2").Cells(2, 1).CopyFromRecordset rs
''Tidy up
rs.Close
cn.Close
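The staging-table idea from this answer can be sketched end to end. This is a minimal sketch in Python with the built-in sqlite3 module standing in for SQL Server; the table names (`staging`, `target`), the columns (`code`, `qty`) and the validation rules are all hypothetical. Every staging column is nullable text, so loading never fails; one query lists the bad rows and a second appends only the clean ones.

```python
import sqlite3

# Hypothetical rules: `code` is mandatory, `qty` must be a whole number (digits only).
BAD_ROW = ("code IS NULL OR code = '' OR qty IS NULL OR qty = '' "
           "OR qty GLOB '*[^0-9]*'")


def stage_and_append(rows):
    """Load raw rows into a permissive staging table, then append only valid ones.

    Returns (ids of rejected staging rows, number of rows now in the target table).
    """
    con = sqlite3.connect(":memory:")
    # Staging table: every column nullable TEXT, so the load itself cannot fail.
    con.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, code TEXT, qty TEXT)")
    # Target ("live") table with the real constraints.
    con.execute("CREATE TABLE target (code TEXT NOT NULL, qty INTEGER NOT NULL)")
    con.executemany("INSERT INTO staging (code, qty) VALUES (?, ?)", rows)

    # Report the bad rows; the ids follow load order, so they map back to file rows.
    rejected = [r[0] for r in con.execute(
        f"SELECT id FROM staging WHERE {BAD_ROW} ORDER BY id")]

    # Append everything that passed the checks.
    con.execute(
        "INSERT INTO target (code, qty) "
        f"SELECT code, CAST(qty AS INTEGER) FROM staging WHERE NOT ({BAD_ROW})")
    appended = con.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    con.close()
    return rejected, appended
```

For example, `stage_and_append([("A1", "5"), (None, "3"), ("B2", "x7"), ("C3", "12")])` rejects staging rows 2 and 3 and appends the other two.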

I have recently been working on a number of similar packages in SSIS, and the only way I have been able to get around this is to use a holding table, similar to Remou's suggestion.
This table is extremely generic: all fields are NULLable and VARCHAR(255). I then have a validation stored procedure that checks things such as data types and the existence of data before I move the data into a "live" table. Although it may not be the most elegant solution, it gives you a lot of control over how you check the data, and it also means you shouldn't have to convert the file(s) to .CSV first.
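A validation step like the one described can report the row and column of each problem, which is what the question asks for. This is an illustrative Python sketch rather than the stored procedure from the answer; the column names and rules in `RULES` are hypothetical, and the rows are assumed to arrive as raw strings, as they would from an all-VARCHAR(255) holding table.

```python
from datetime import datetime

# Hypothetical column rules: name -> (mandatory?, converter that raises ValueError)
RULES = {
    "OrderID": (True, int),
    "Amount": (True, float),
    "ShipDate": (False, lambda s: datetime.strptime(s, "%Y-%m-%d")),
}


def validate(rows, rules=RULES, first_data_row=2):
    """Return (row_number, column, message) for every failed check.

    `rows` are dicts of raw strings. first_data_row=2 assumes a header in
    row 1, so the reported numbers match the row numbers the user sees in Excel.
    """
    errors = []
    for offset, row in enumerate(rows):
        excel_row = first_data_row + offset
        for column, (mandatory, convert) in rules.items():
            raw = (row.get(column) or "").strip()
            if not raw:
                if mandatory:
                    errors.append((excel_row, column, "missing mandatory value"))
                continue
            try:
                convert(raw)
            except ValueError:
                errors.append((excel_row, column, f"cannot convert {raw!r}"))
    return errors
```

Because every failure is collected instead of stopping at the first one, the user gets the complete list in a single pass.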

Related

SQL query with same name columns

I am using an Excel macro to paste information retrieved with a SQL query, but the source table has several columns with the same name, and those names cannot be changed.
The query in the macro is:
"Select [code], [name], [PCR] from [Book$B2:H]"
The table I want to get the information from is the following:
I need the query to copy the information that is in bold, but I have PCR in 3 columns, so it only gets the first one.
If you're using ADO in your VBA code, then you can change the connection string to say that your data doesn't have headers. This then allows you to refer to fields by their position rather than their name. To do this, add HDR=No into the Extended Properties of the connection string.
Your SQL query could then be something like this:
SELECT F1, F2, F4, F6, F8 FROM [Book$C2:H]
Setting up the connection string would be something like this:
' Set up connection
Dim cn As Object
Set cn = CreateObject("ADODB.Connection")
' Connection string for Excel 2007 onwards .xlsm files
With cn
.Provider = "Microsoft.ACE.OLEDB.12.0"
.ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & _
"Extended Properties=""Excel 12.0 Macro;HDR=No"";"
.Open
End With
This assumes that your VBA code is in the same workbook as the data; if that's not the case, just change the value for the Data Source. See connectionstrings.com for other variations you might need for different types of Excel file.
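With HDR=No, the F numbers should count from the first column of the queried range, not from column A, so [Book$C2:H] would yield F1 (column C) through F6 (column H). A small Python sketch for working out which F name a sheet column gets; the helper names are hypothetical.

```python
from string import ascii_uppercase


def col_index(letters):
    """Convert an Excel column label such as 'C' or 'AB' to a 1-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + ascii_uppercase.index(ch) + 1
    return n


def positional_name(column, range_start):
    """Return the HDR=No field name (F1, F2, ...) of `column` in a range whose
    first column is `range_start`, e.g. [Book$C2:H] starts at 'C'."""
    offset = col_index(column) - col_index(range_start)
    if offset < 0:
        raise ValueError(f"column {column} is left of the range start {range_start}")
    return f"F{offset + 1}"
```

For example, `positional_name("H", "C")` gives `"F6"`, which is the last field available in [Book$C2:H].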
The easiest solution is likely to involve:
Insert a new row 3.
In C3, enter =if(C1="",B1&"."&C2,C1&"."&C2) and drag it across.
But to fit this into the bigger picture, we would need to know more about that bigger picture.
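The formula builds a unique name from the group header in row 1 and the sub-header in row 2, falling back to the cell on the left when row 1 is blank. This Python sketch shows the same idea with one deliberate variation: it carries the last non-blank group header forward across any number of blank cells, whereas the formula only looks one column back. The sample headers are made up.

```python
def combine_headers(top, sub):
    """Join a group-header row and a sub-header row into unique names.

    Blank cells in `top` (e.g. left by merged cells) inherit the last
    non-blank group header to their left; columns before any group keep
    their plain sub-header.
    """
    names = []
    current = ""
    for group, name in zip(top, sub):
        if group:
            current = group
        names.append(f"{current}.{name}" if current else name)
    return names
```

For example, `combine_headers(["", "Jan", "", "Feb", ""], ["code", "PCR", "Qty", "PCR", "Qty"])` returns `["code", "Jan.PCR", "Jan.Qty", "Feb.PCR", "Feb.Qty"]`, so each PCR column gets its own name.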

Excel VBA SQL ADODB Not Inserting Row Into Worksheet (NO ERRORS)

My insert statement has been working, but randomly stops inserting, until I go into the workbook I'm inserting into and manually change something and save.
The db workbook currently has around 15,000 rows that have been previously inserted, but now when I run the macro, it doesn't insert anything and there are no errors. I've stepped through the code, and every line executes normally.
I need help figuring out why nothing is being inserted, when it worked before.
Sub Insert()
Dim con As ADODB.Connection
Dim InsertSQL As String
InsertSQL = "INSERT INTO [Sheet1$] ([FIELD1],[FIELD2],[FIELD3],[FIELD4],[FIELD5]) VALUES( 1234567895,9350.00,#9/12/2019#,'username',#9/12/2019 10:05 AM#)"
Set con = New ADODB.Connection
con.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & dbpath & ";Extended Properties=""Excel 12.0;HDR=Yes;IMEX=0""; Mode=ReadWrite;"
con.Execute InsertSQL
con.Close
End Sub
Edit:
I've set default values in the first 8 rows of my db workbook so as not to cause any limitations in data types as specified in:
This is an issue with the Jet OLEDB provider. It looks at the first 8
rows of the spreadsheet to determine the data type in each column. If
the column does not contain a field value over 256 characters in the
first 8 rows, then it assumes the data type is text, which has a
character limit of 256. The following KB article has more information
on this issue: http://support.microsoft.com/kb/281517
It's possible this issue is due to some blank rows in the worksheet, but I can't confirm.
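The rule in the quoted KB text can be modelled in a few lines: only the first rows are scanned, so a long value further down does not change the guess and gets cut off. This Python sketch is a simplification that assumes a 255-character Text limit and ignores numeric and date typing; the function names are hypothetical.

```python
def guess_text_or_memo(values, rows_to_scan=8, text_limit=255):
    """Guess 'Memo' or 'Text' for a column from its first `rows_to_scan` values."""
    scanned = values[:rows_to_scan]
    if any(v is not None and len(v) > text_limit for v in scanned):
        return "Memo"
    return "Text"


def read_as_guessed(values, rows_to_scan=8, text_limit=255):
    """Return the column as it would be read: Text columns are cut at text_limit."""
    if guess_text_or_memo(values, rows_to_scan, text_limit) == "Memo":
        return list(values)
    return [None if v is None else v[:text_limit] for v in values]
```

With eight short values followed by a 300-character one, the column is guessed as Text and the long value comes back truncated to 255 characters; put the long value in the first eight rows and it is read in full.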

How do I override the Header for field naming?

Using SQL to query an Excel worksheet without a header row echoes the information I have found in all my research. But that isn't what I am getting. Has there been a change in Excel 2016, or is my implementation wrong?
Sub vndrrst()
Dim cn As Object
Set cn = CreateObject("adodb.connection")
strfilename = ThisWorkbook.Path
cn.ConnectionString = _
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strfilename & _
";Extended Properties=""text;HDR=NO;imex=1"";"
cn.Open
Set rs = cn.Execute("select * from My_Text_File.txt where F2='04-62425'")
Set rs = Nothing
Set cn = Nothing
End Sub
The error message reads "No value given for one or more required parameters." at the Set rs statement. When I change F2 to MFGID (the actual field name given in the header of this text file), it both runs and gives accurate output in the debug window... even though HDR=No.
So, how can I use the F field names if the file has a header this time? The reason this becomes interesting is because I have many text/csv/delimited files sent to my FTP, some have headers, some don't, and some have headers sometimes.
Edit: the schema.ini file reads
[My_Text_File.txt]
Format=TabDelimited
in case it matters.
A different, related question is "ADO Recordset to Excel spreadsheet opens properly in Excel 2007, has a missing parameter in Excel 2013", but the motivation/project of the OP is different, so the answer is irrelevant here.
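This doesn't answer the driver question itself, but for files that sometimes have a header and sometimes don't, one workaround is to read every file with positional names and drop the first row only when it matches a known header. The sketch below uses Python's standard csv module, not the Jet text driver, and the expected header (`VENDOR`, `MFGID`) is hypothetical.

```python
import csv
import io


def read_positional(text, expected_header, delimiter="\t"):
    """Read delimited text into dicts keyed F1..Fn, whatever the header situation.

    If the first row equals `expected_header` (case-insensitive), it is treated
    as a header and skipped; otherwise it is kept as data.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    wanted = [h.lower() for h in expected_header]
    if rows and [c.strip().lower() for c in rows[0]] == wanted:
        rows = rows[1:]
    return [{f"F{i + 1}": value for i, value in enumerate(row)} for row in rows]
```

The same filter, e.g. `r["F2"] == "04-62425"`, then works for files with and without a header row.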

EXCEL ADODB Query on local worksheet not Including newly inserted records

I am using ADODB to query data from a worksheet in the active workbook. The data resides on its own sheet and has column headers. I've defined the table as an Excel ListObject (Excel's automatic table formatting construct).
I open the connection like this:
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & ThisWorkbook.Path & "\" & _
ThisWorkbook.Name & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open strCon
Then I can fetch a recordset using a simple SQL statement:
strSQL = "SELECT * from [sheet1$]"
rs.Open strSQL, cn, 0, 1 'CursorType 0 = adOpenForwardOnly, LockType 1 = adLockReadOnly
This all works fine... until I insert a new row in the table on sheet1. The new row is not included in subsequent queries, even if I close, set to nothing, and re-open both the connection and recordset variables in my code.
If I save and close the workbook, and then re-open it, the new records ARE included in the query, which leads me to believe this might be a caching issue. I've searched for ADODB Cache Flush etc, but most results appear to be related to PHP or Access. I've also tried a variety of other options for Cursor Type and Lock Type, with no difference.
Can anyone suggest how I can ensure that each time I run my query I get all the rows, even after I insert new rows in the table?
Figured out a solution:
Since I'm using Excel 2010, I discovered that I can use a newer OLE DB provider (ACE) instead of Jet.
So, instead of defining my connection string like this:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="...
I changed it to this:
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="...
and the problem is solved. New inserts and edits now show up immediately after I make them. This also avoids the known memory leak in the Jet OLEDB 4.0 provider, so that's a bonus.

Using Dates from Cell or named Range in Sql Query

I have created a sheet to extract data from a Microsoft SQL Server database to produce a customer report between two dates, StartDate and EndDate.
I have been playing with a few things but have not been successful in any way. I have searched but have not found anything that matched what I was after or that I was able to understand.
I believe the problem is the data type of the date I am using in Excel and trying to pass to the SQL query. I understand I need to convert it in some way to make this work correctly.
If I manually enter dates into the query, it works fine, but that is not practical for customer use.
I am not experienced with this and am just stumbling my way through it. If someone would be so kind as to help me with this, it would be much appreciated.
Below is the code I am trying to use
Sub DataExtract()
'
' DataExtract Macro
'
' Create a connection object.
Dim cni96X As ADODB.Connection
Set cni96X = New ADODB.Connection
' Set Database Range
' Provide the connection string.
Dim strConn As String
Dim Lan As Integer
Dim OS As Integer
Dim PointID As String
' Set Variables
Lan = Range("Lan").Value
OS = Range("OS").Value
PointID = Range("PointID").Value
StartDate = Range("StartDate").Value
EndDate = Range("EndDate").Value
'Use the SQL Server OLE DB Provider.
strConn = "PROVIDER=SQLOLEDB;"
'Connect to the i96X database on the local server.
strConn = strConn & "DATA SOURCE=(local);INITIAL CATALOG=i96X;"
'Use an integrated login.
strConn = strConn & " INTEGRATED SECURITY=sspi;"
'Now open the connection.
cni96X.Open strConn
' Create a recordset object.
Dim rsi96X As ADODB.Recordset
Dim rsi96X1 As ADODB.Recordset
Set rsi96X = New ADODB.Recordset
Set rsi96X1 = New ADODB.Recordset
With rsi96X
' Assign the Connection object.
.ActiveConnection = cni96X
' Extract the required records1.
.Open "SELECT ModuleLabel, originalAlarmTime FROM LastAlarmDetailsByTime WHERE (os = " & OS & " And theModule = N'" & PointID & "'AND AlarmCode = N'DI=1' And lan = " & Lan & " And originalAlarmTime BETWEEN N'" & StartDate & "' AND N'" & EndDate & "') ORDER BY originalAlarmTime DESC"
' Copy the records into sheet.
Range("PointLabel, TimeCallInitiated").CopyFromRecordset rsi96X
With rsi96X1
.ActiveConnection = cni96X
' Assign the Connection object.
.Open "SELECT originalAlarmTime FROM LastAlarmDetailsByTime WHERE (os = " & OS & " And theModule = N'" & PointID & "'AND AlarmCode = N'CDI1' And lan = " & Lan & " And originalAlarmTime BETWEEN N'" & StartDate & "' AND N'" & EndDate & "')ORDER BY originalAlarmTime DESC"
' Copy the records into sheet.
Sheet1.Range("TimeCallEnded").CopyFromRecordset rsi96X1
' Tidy up
.Close ' closes rsi96X1
End With
.Close ' closes rsi96X
End With
cni96X.Close
End Sub
I hope this makes sense.
You cannot specify the data types; the Access database engine (formerly Jet) must guess. You can influence its guesswork by changing certain registry settings (e.g. TypeGuessRows) and by including IMEX=1 in the connection string. For more details, see this knowledge base article.
Here's something I wrote on the subject many years ago (if you google for "ONEDAYWHEN=0" you can see it has been widely read though perhaps not carefully enough!):
The relevant registry keys (for Jet 4.0) are in:
HKEY_LOCAL_MACHINE\Software\Microsoft\Jet\4.0\Engines\Excel\
The ImportMixedTypes registry key is always read (whether it is
honored is discussed later). You can test this by changing the key to
ImportMixedTypes=OneDayWhen and trying to use the ISAM: you get the
error, "Invalid setting in Excel key of the Engines section of the
Windows Registry." The only valid values are:
ImportMixedTypes=Text
ImportMixedTypes=Majority Type
Data type is determined column by column. 'Majority Type' means a
certain number of rows (more on this later) in each column are scanned
and the data types are counted. Both a cell's value and format are
used to determine data type. The majority data type (i.e. the one with
the most rows) decides the overall data type for the entire column.
There's a bias in favor of numeric in the event of a tie. Rows from
any minority data types found that can't be cast as the majority data
type will be returned with a null value.
For ImportMixedTypes=Text, the data type for the whole column will be:
Jet (MS Access UI): 'Text' data type
DDL: VARCHAR(255)
ADO: adWChar ('a null-terminated Unicode character string')
Note that this is distinct from:
Jet (MS Access UI): 'Memo' data type
DDL: MEMO
ADO: adLongVarWChar ('a long null-terminated Unicode string value')
ImportMixedTypes=Text will curtail text at 255 characters as Memo is
cast as Text. For a column to be recognized as Memo, majority type
must be detected, meaning the majority of rows detected must contain
256 or more characters.
But how many rows are scanned for each column before it is decided that
the column is mixed and/or what the majority type is? There is a second registry
key, TypeGuessRows. This can be a value from 0-16 (decimal). A value
from 1 to 16 inclusive is the number of rows to scan. A value of zero
means all rows will be scanned.
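The majority-type rule and the TypeGuessRows scan described above can be modelled as follows. This Python sketch is a simplification under stated assumptions: a cell counts as numeric only if its value parses as a number (real Jet also looks at cell formats and dates), ties go to numeric, and minority values are cast to the majority type where possible and returned as None otherwise, following the description literally.

```python
def classify(value):
    """Classify a cell value as 'numeric' or 'text' (simplified: ignores formats)."""
    if isinstance(value, (int, float)):
        return "numeric"
    try:
        float(value)
        return "numeric"
    except (TypeError, ValueError):
        return "text"


def read_column(values, type_guess_rows=8):
    """Model ImportMixedTypes=Majority Type.

    type_guess_rows: 1-16 scans that many rows; 0 scans every row.
    Returns (chosen type, values as handed back to the caller).
    """
    scanned = values if type_guess_rows == 0 else values[:type_guess_rows]
    kinds = [classify(v) for v in scanned if v is not None]
    majority = "numeric" if kinds.count("numeric") >= kinds.count("text") else "text"
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif majority == "numeric":
            # Minority text that can't be cast to a number comes back as null.
            out.append(float(v) if classify(v) == "numeric" else None)
        else:
            out.append(str(v))
    return majority, out
```

With eight numbers followed by ten words, TypeGuessRows=8 picks numeric and nulls the words, while TypeGuessRows=0 sees the whole column and picks text.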
There is one final twist. A setting of IMEX=1 in the connection
string's extended property determines whether the ImportMixedTypes
value is honored. IMEX refers to 'IMport EXport mode'. There are three
possible values. IMEX=0 and IMEX=2 result in ImportMixedTypes being
ignored and the default value of 'Majority Type' is used. IMEX=1 is
the only way to ensure ImportMixedTypes=Text is honored. The resulting
connection string might look like this:
Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=C:\db.xls;
Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'
Finally, although it is mentioned in MSDN articles that MAXSCANROWS
can be used in the extended properties of the connection string to
override the TypeGuessRows registry keys, this seems to be a fallacy.
Using MAXSCANROWS=0 in this way never does anything under any
circumstances. Put another way, it has just the same effect as putting
ONEDAYWHEN=0 in the extended properties, which is none (not even an
error!). The same applies to ImportMixedTypes, i.e. it can't be used in
the connection string to override the registry setting.
In summary, use TypeGuessRows to get Jet to detect whether a 'mixed
types' situation exists or use it to 'trick' Jet into detecting a
certain data type as being the majority type. In the event of a
'mixed types' situation being detected, use ImportMixedTypes to tell
Jet to either use the majority type or coerce all values as Text
(max 255 characters).
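The summary comes down to two decisions: which mode is in effect once a mixed-types column is detected, and what Text mode does to the values. This Python sketch encodes the rules exactly as stated in the quoted text (IMEX=1 is the only setting that honors ImportMixedTypes=Text; Text mode caps values at 255 characters); it models the description, not the driver, and the function names are hypothetical.

```python
VALID_SETTINGS = ("Text", "Majority Type")


def effective_mode(import_mixed_types, imex):
    """Mode applied to a mixed-types column, per the rules quoted above.

    import_mixed_types: the registry value; imex: 0, 1 or 2 from the
    connection string's Extended Properties.
    """
    if import_mixed_types not in VALID_SETTINGS:
        # Mirrors the "Invalid setting in Excel key ..." error described above.
        raise ValueError(f"invalid ImportMixedTypes value: {import_mixed_types!r}")
    if import_mixed_types == "Text" and imex == 1:
        return "Text"
    # Anything else: Majority Type (IMEX=0 and IMEX=2 ignore the registry value).
    return "Majority Type"


def coerce_text(values, limit=255):
    """Text mode: every non-null value becomes a string of at most `limit` chars."""
    return [None if v is None else str(v)[:limit] for v in values]
```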
Try changing the date part of your SQL statement to:
"[...] originalAlarmTime BETWEEN '" & Format$(StartDate, "yyyy-mm-dd") & "' AND '" & Format$(EndDate, "yyyy-mm-dd") & "' [...]"
You might also try using a parameterized query.
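Both suggestions can be sketched in Python for illustration (the question's own code is VBA, where the parameterized version would go through an ADODB.Command with parameters). The first helper only formats the literals; its default `yyyymmdd` format is a change from the answer's `yyyy-mm-dd`, because for SQL Server's `datetime` type the dashed form can be misread under some language or `SET DATEFORMAT` settings, while `yyyymmdd` cannot. The second helper binds the values as parameters against a made-up in-memory SQLite copy of the question's table (SQLite stands in for SQL Server here and stores the times as ISO text).

```python
import sqlite3
from datetime import date, datetime


def between_literals(column, start, end, fmt="%Y%m%d"):
    """Option 1: concatenate explicitly formatted date literals.

    Only reasonable because start/end are real date objects, never user text.
    """
    return f"{column} BETWEEN '{start.strftime(fmt)}' AND '{end.strftime(fmt)}'"


def demo_connection():
    """Made-up in-memory stand-in for the question's LastAlarmDetailsByTime."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE LastAlarmDetailsByTime "
                "(os INTEGER, theModule TEXT, originalAlarmTime TEXT)")
    con.executemany("INSERT INTO LastAlarmDetailsByTime VALUES (?, ?, ?)", [
        (1, "P1", "2019-09-01 08:00:00"),
        (1, "P1", "2019-09-05 09:30:00"),
        (1, "P2", "2019-09-05 10:00:00"),
        (1, "P1", "2019-10-01 00:00:00"),
    ])
    return con


def alarms_between(con, os_no, point_id, start, end):
    """Option 2: bind the values as parameters instead of building SQL text."""
    sql = ("SELECT originalAlarmTime FROM LastAlarmDetailsByTime "
           "WHERE os = ? AND theModule = ? "
           "AND originalAlarmTime BETWEEN ? AND ? "
           "ORDER BY originalAlarmTime DESC")
    # SQLite has no date type; ISO 8601 text compares in date order.
    params = (os_no, point_id, start.isoformat(sep=" "), end.isoformat(sep=" "))
    return [row[0] for row in con.execute(sql, params)]
```

For example, `alarms_between(demo_connection(), 1, "P1", datetime(2019, 9, 1), datetime(2019, 9, 30, 23, 59, 59))` returns the two September alarms for P1, newest first, with no quoting or date formatting in the SQL text.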