SSIS importing percentage column from Excel displaying as NULL in database

I have an ETL process set up to take data from an Excel spreadsheet and store it in a database using SSIS. However, one of the columns in the Excel file is formatted as a percent, and it will sometimes erroneously be stored as a NULL value in the database, as if there were some sort of translation error.
(A screenshot in the original question showed the exact Excel format used for the column: Percentage with two decimal places.)
Interestingly, these percent values do load properly on some days, but for some reason one particular Excel sheet I was given as an example of this issue will not load any of them at all when put through the SSIS processor.
In Excel, these values will show up like "50.00%", and when the SSIS processor is able to translate them properly it will display as the decimal equivalent in the database, "0.5", which is what I want instead of the NULL values. The data type I am using in SSIS for this is Unicode string [DT_WSTR], and it is saved as an NVARCHAR in the database.
Any insight as to why these values will sometimes not display/translate as intended? I have tried messing around with the data types in SSIS/SQL Server, but that has either resulted in no change or in an error. When I put test values in the Excel sheet, such as "test", to see if it is importing anything at all from this column, it does seem to work (just not for the percent numbers that I need).

The issue was caused by the "mixed data types" that were present in the first few rows of my data (the "mixed" part being blank fields), which would explain why some sheets would work and others wouldn't.
https://stackoverflow.com/a/542573/11815822
Setting the connection string to account for this fixed the issue.
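For reference, the fix in that answer amounts to adding IMEX=1 to the Extended Properties of the connection string, which tells the driver to read mixed-type columns as text. A typical ACE OLE DB connection string (the file path and provider version here are illustrative):
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Files\Report.xlsx;Extended Properties="Excel 12.0;HDR=YES;IMEX=1";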

Related

Importing data into SQL Server 2012 using the wizard loses data (date and empty cells in the same column)

I know this is not a new issue and I have browsed a lot but could not figure it out.
The table contains a column with a few dates (like 25-Apr-16), while the majority of the rows are empty. After importing to SQL Server, only NULLs are left; all the dates are gone.
I have tried three methods:
save the file as .xls and change the column to Date format;
save the file as .xls, fill the empty cells with NULL, and change the column to Date format;
save as .csv.
So far, none of them has worked.
At last, I tried the following and it worked, but I am wondering if there are better ways:
save the file as .xls, fill the empty cells with 12/31/9999, and after importing, update 12/31/9999 to NULL.
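For that last step, a single T-SQL statement does the cleanup (the table and column names here are placeholders):
UPDATE dbo.ImportedTable SET DateColumn = NULL WHERE DateColumn = '9999-12-31';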
The issue when you import data to SQL Server is that SQL Server tries to guess what kind of data is in each column of the sheet, and it does this research on the first 100 records. So if there is nothing in the top 100 records of a column, SQL Server considers that the column is empty. To avoid that, you can fill the first record of your Excel sheet with dummy values and then import to SQL Server.

Excel to SQL table field value prefixed with 0

I loaded an Excel file into a SQL table. In the Excel file, one field contains VARCHAR data (the Excel column format is General). When loaded into the SQL table, some of these values are prefixed with a zero.
Example: in the Excel file the value is 1081999, but the same value becomes 01081999 in the SQL table.
What might be the reason for this?
Excel will hide leading 0's, as it identifies the field's content as a number and displays it as such. I would assume that the Excel worksheet does indeed contain these leading 0's and they are simply not shown by Excel. If you change the format of the column from General to Text, do they show up?
As a side note, if these are indeed numbers, you should be storing them in a numeric data type in the database...
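If the zero-prefixed values have already landed in the table, a T-SQL sketch along these lines (table and column names are hypothetical) would find them and show the normalized number:
-- CAST drops the leading zero when the value is purely numeric
SELECT MyField, CAST(CAST(MyField AS BIGINT) AS VARCHAR(20)) AS Normalized
FROM dbo.ImportedData
WHERE MyField LIKE '0%';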

Access randomly drops decimals when inserting from txt file

I have a number of .csv files in the following format with example data listed:
ID,Lat,Long
1,-43.76120167,158.0299917
2,-43.76119833,158.03
3,-43.7612,158.0299983
4,-43.76120167,158.0299967
The values change from file to file, but they're always in the same format and of similar magnitude. What you see above is exactly how it shows up in the .csv file (when opened with TextPad/Notepad, not Excel, so we can eliminate any Excel problems right away).
However, when I run the following INSERT statement as an Access SQL query:
INSERT INTO Table1 SELECT * FROM [Text;Database=C:\;Hdr=Yes].[ImportFile.csv];
This is what shows up in my Access database table:
ID,Lat,Long
1,-43,158
2,-43,158
3,-43,158
4,-43,158
Now, I know what you're thinking. Let me just say that my table design in Access is set up such that ID is a Number/Long Integer, and both of my Lat and Long fields are set up as Number/Double with 4 Decimal Places. I've double checked this a million times and it can be confirmed because not all input files share this problem.
This is what is troubling me... where are all the digits after my decimal point going? I need to have them.
What confuses me even more is that some files read just fine and the decimal points stay in there just fine... same table, same insert query. Every file is generated from the same source and formatted the same for what it's worth.
However, if I fire up Access itself and run the import from text file wizard, the values end up just fine. Access automatically makes the field a double with auto decimals (I have also tried using auto decimals in my desired table, to no avail).
Anyone have any idea what is going on?
Thanks!
This is a well-known problem.
The Jet database engine determines the data types from the data source. One solution is to create a Schema.ini file.
Also keep in mind that, in order to determine the data type for a column, only the first few rows are scanned.
For more info, please see Microsoft's Schema.ini documentation.
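A minimal Schema.ini for the sample file above might look like this (a sketch; it must be saved in the same folder as ImportFile.csv, and the column names are taken from the sample data):
[ImportFile.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=ID Long
Col2=Lat Double
Col3="Long" Double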

Importing data from excel and inserting into sql tables

I am importing data from Excel and inserting it into a SQL table. One of the fields in my Excel file is populated with a time value. I want to import that time value as a string into my SQL table. When I do that, I get a weird value in the SQL table. The Excel value is 07:00, and after inserting it as a string into the SQL table the time value looks like this: 0.29166666667.
The reason for importing it as a string value is that you have to be able to define days in the same field, like this: D2 10:30. When I import this kind of value, it is inserted correctly.
Can anyone help?
Excel stores dates and times as numeric values: the integer part counts days and the fractional part (0 to 0.99999999) represents the time of day.
0.29166666667 corresponds to 00.01.1900 07:00:00 (7/24 of a day), which seems to match your case.
So you would have to reformat or convert this value before using it as a direct string input.
In VBA you could use Format(myValue,"DD-MM-YYYY hh:mm:ss").
The equivalent worksheet function would be TEXT(A1,"DD-MM-YYYY hh:mm:ss").
The format code depends on your regional settings. You might want to try something like Format(myTime, "Long Time") if you prefer to use Excel-defined time formats.
Because you did not post any code, I am not sure how you import your Excel data. But I would say the fastest way to get better results would be to set up a new column that uses the TEXT formula with a reference to the original time column, and to use this new formatted column as the input for your SQL database.
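If the fractional values have already been inserted, you could also recover the time string on the SQL Server side; a T-SQL sketch (86400 is the number of seconds in a day, and style 108 formats a datetime as hh:mi:ss):
SELECT CONVERT(VARCHAR(5), DATEADD(SECOND, CAST(ROUND(0.29166666667 * 86400, 0) AS INT), 0), 108); -- returns 07:00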

how can you parse an excel (.xls) file stored in a varbinary in MS SQL 2005?

problem
how to best parse/access/extract "excel file" data stored as binary data in an SQL 2005 field?
(so all the data can ultimately be stored in other fields of other tables.)
background
basically, our customer is requiring a large volume of verbose data from their users. unfortunately, our customer cannot require any kind of db export from their users. so our customer must supply some sort of UI for their users to enter the data. the UI our customer decided would be acceptable to all of their users was excel, as it has a reasonably robust UI. so, given all that, our customer needs this data parsed and stored in their db automatically.
we've tried to convince our customer that the users will do this exactly once and then insist on db export! but the customer can not require db export of their users.
our customer is requiring us to parse an excel file
the customer's users are using excel as the "best" user interface to enter all the required data
the users are given blank excel templates that they must fill out
these templates have a fixed number of uniquely named tabs
these templates have a number of fixed areas (cells) that must be completed
these templates also have areas where the user will insert up to thousands of identically formatted rows
when complete, the excel file is submitted from the user by standard html file upload
our customer stores this file raw into their SQL database
given
a standard excel (".xls") file (native format, not comma or tab separated)
file is stored raw in a varbinary(max) SQL 2005 field
excel file data may not necessarily be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, different "formats", ...)
requirements
code completely within SQL 2005 (stored procedures, SSIS?)
be able to access values on any worksheet (tab)
be able to access values in any cell (no formula data or dereferencing needed)
cell values must not be assumed to be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, formulas, different "formats", ...)
preferences
no filesystem access (no writing temporary .xls files)
retrieve values in defined format (e.g., actual date value instead of a raw number like 39876)
My thought is that anything can be done, but there is a price to pay. In this particular case, the price seems to be too high.
I don't have a tested solution for you, but I can share how I would give my first try on a problem like that.
My first approach would be to install Excel on the SQL Server machine and code some assemblies that consume the file in your rows using the Excel API, then load them into SQL Server as assembly (CLR) procedures.
As I said, this is just an idea; I don't have the details, but I'm sure others here can complement or criticize it.
But my real advice is to rethink the whole project. It makes no sense to read tabular data from binary files stored in a cell of a row of a table in a database.
This looks like an "I wouldn't start from here" kind of a question.
The "install Excel on the server and start coding" answer looks like the only route, but it simply has to be worth exploring alternatives first: it's going to be painful, expensive and time-consuming.
I strongly feel that we're looking at a "requirement" that is the answer to the wrong problem.
What business problem is creating this need? What's driving that? Try the Five Whys as a possible way to explore the history.
It sounds like you're trying to store an entire database table inside a spreadsheet and then inside a single table's field. Wouldn't it be simpler to store the data in a database table to begin with and then export it as an XLS when required?
Without opening up an instance of Excel and having Excel resolve worksheet references, I'm not sure it's doable at all.
Could you write the varbinary to a Raw File Destination? And then use an Excel Source as your input to whatever step is next in your precedence constraints.
I haven't tried it, but that's what I would try.
Well, the whole setup seems a bit twisted :-) as others have already pointed out.
If you really cannot change the requirements and the whole setup: why don't you explore components such as Aspose.Cells or Syncfusion XlsIO, native .NET components that allow you to read and interpret native Excel (XLS) files? I'm pretty sure that with either of the two you should be able to read your binary Excel into a MemoryStream, feed that into one of those Excel-reading components, and off you go.
So with a bit of .NET development and SQL CLR, I guess this should be doable - not sure if it's the best way to do it, but it should work.