How can you parse an Excel (.xls) file stored in a varbinary in MS SQL 2005?

problem
how to best parse/access/extract "excel file" data stored as binary data in an SQL 2005 field?
(so all the data can ultimately be stored in other fields of other tables.)
background
basically, our customer requires a large volume of verbose data from their users. unfortunately, the customer cannot require any kind of database export from those users, so they must supply some sort of UI for the users to enter the data. the UI our customer decided would be acceptable to all of their users was Excel, as it has a reasonably robust interface. so, given all that, our customer needs this data parsed and stored in their database automatically.
we've tried to convince our customer that the users will do this exactly once and then insist on a database export! but the customer cannot require a database export of their users.
our customer is requiring us to parse an excel file
the customer's users are using excel as the "best" user interface to enter all the required data
the users are given blank excel templates that they must fill out
these templates have a fixed number of uniquely named tabs
these templates have a number of fixed areas (cells) that must be completed
these templates also have areas where the user will insert up to thousands of identically formatted rows
when complete, the excel file is submitted by the user via a standard HTML file upload
our customer stores this file raw into their SQL database
given
a standard excel (".xls") file (native format, not comma or tab separated)
file is stored raw in a varbinary(max) SQL 2005 field
excel file data may not necessarily be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, different "formats", ...)
requirements
code completely within SQL 2005 (stored procedures, SSIS?)
be able to access values on any worksheet (tab)
be able to access values in any cell (no formula data or dereferencing needed)
cell values must not be assumed to be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, formulas, different "formats", ...)
preferences
no filesystem access (no writing temporary .xls files)
retrieve values in a defined format (e.g., an actual date value instead of a raw serial number like 39876; see the conversion sketch below)
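
For reference on that last preference: Excel stores dates as serial day counts, so whatever component ends up reading the cells has to do the conversion itself. A minimal sketch in VB.NET (the epoch offset is a known Excel quirk):

' Excel date serials count days from an epoch of 1899-12-30 (the offset absorbs
' Excel's fictitious 1900-02-29). Serial 39876 converts to 2009-03-04.
Dim excelEpoch As New DateTime(1899, 12, 30)
Dim actualDate As DateTime = excelEpoch.AddDays(39876)   ' -> #3/4/2009#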

My thought is that anything can be done, but there is a price to pay, and in this particular case the price seems to be too high.
I don't have a tested solution for you, but I can share how I would give my first try on a problem like that.
My first approach would be to install Excel on the SQL Server machine, write some .NET assemblies that use the Excel API to consume the files stored in your rows, and then load them into SQL Server as CLR (assembly) procedures.
As I said, this is just an idea; I don't have the details, but I'm sure others here can complement or criticize it.
But my real advice is to rethink the whole project. It makes no sense to read tabular data out of binary files stored in a single column of a database table.

This looks like an "I wouldn't start from here" kind of a question.
The "install Excel on the server and start coding" answer looks like the only route, but it simply has to be worth exploring alternatives first: it's going to be painful, expensive and time-consuming.
I strongly feel that we're looking at a "requirement" that is the answer to the wrong problem.
What business problem is creating this need? What's driving that? Try the Five Whys as a possible way to explore the history.

It sounds like you're trying to store an entire database table inside a spreadsheet and then inside a single table's field. Wouldn't it be simpler to store the data in a database table to begin with and then export it as an XLS when required?
Without opening up an instance of Excel and having Excel resolve worksheet references, I'm not sure it's doable at all.

Could you write the varbinary out to a file with a Raw File Destination, and then use an Excel Source as the input to whatever step is next in your precedence constraints?
I haven't tried it, but that's what I would try.

Well, the whole setup seems a bit twisted :-) as others have already pointed out.
If you really cannot change the requirements and the whole setup: why don't you explore components such as Aspose.Cells or Syncfusion XlsIO, native .NET components that can read and interpret native Excel (.xls) files? I'm pretty sure that with either of the two, you should be able to read your binary Excel data into a MemoryStream, feed that into one of those Excel-reading components, and off you go.
So with a bit of .NET development and SQL CLR, I guess this should be doable - not sure if it's the best way to do it, but it should work.
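To make that concrete, here is a minimal, untested sketch of what such a SQL CLR function might look like, assuming the Aspose.Cells API (the function, sheet, and cell names are placeholders; the assembly would need to be registered with an elevated permission set, with Aspose.Cells deployed alongside it):

Imports System.Data.SqlTypes
Imports System.IO
Imports Aspose.Cells
Imports Microsoft.SqlServer.Server

Public Class XlsReader
    ' Hedged sketch: open a varbinary(max) Excel file in memory, return one cell as text.
    <SqlFunction()> _
    Public Shared Function GetCellValue(ByVal fileData As SqlBytes, ByVal sheetName As SqlString, ByVal cellAddress As SqlString) As SqlString
        ' Wrap the binary column in a MemoryStream; no temporary .xls file is written,
        ' which satisfies the "no filesystem access" preference.
        Using ms As New MemoryStream(fileData.Value)
            Dim wb As New Workbook(ms)
            Dim cell As Cell = wb.Worksheets(sheetName.Value).Cells(cellAddress.Value)
            ' StringValue applies the cell's number format, so a date cell comes back
            ' formatted instead of as a raw serial number like 39876.
            Return New SqlString(cell.StringValue)
        End Using
    End Function
End Class

You would then call it as something like SELECT dbo.GetCellValue(FileData, 'Sheet1', 'B2') FROM UploadedFiles.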

Related

Importing data from Excel to SQL through webpage - Searching for phrases

Very low-level programmer tasked with handling something I don't really understand, here.
My company has a webpage that takes a customer's Excel document, reads the data, and moves it to a SQL database. It isn't too sophisticated: it apparently looks for data in a particular cell (e.g., the cell below the column named "OrderNumber"), using Excel's Name Manager as a guide.
If IsDBNull(xlRS.Fields("OrderNumber").Value) OrElse IsNothing(xlRS.Fields("OrderNumber").Value) Then
    strPartNumber = ""
Else
    strPartNumber = Trim(xlRS.Fields("OrderNumber").Value)
End If
However, each of the customers that will be using this page uses a slightly different Excel form. Although every one will have an "Order Number" column, its location on the form will vary from customer to customer. Most of them can't be persuaded to use our standardized template, so I need to find out if there's a better way to do this.
I'm not sure whether I'm putting this correctly, but using VB.net, is it possible to locate an Excel form cell by searching for a phrase (e.g., "Order"), instead of providing an exact location? If not, what could be used to get around that limitation?
You could use ADO.NET with the appropriate data provider if the data is in a tabular format. I do this sort of thing all the time and it works really well, but only if the data is a simple table.
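Building on that: if you read the sheet through the Jet OLE DB provider with HDR=No, every cell lands in a DataTable, and you can then scan for a phrase such as "Order" instead of hard-coding a location. A rough, untested sketch (the path, sheet name, and phrase are hypothetical):

Imports System.Data
Imports System.Data.OleDb

Module FindCell
    Sub Main()
        ' HDR=No reads the whole sheet with no header row, so DataTable cell (r, c)
        ' maps directly to worksheet row r / column c; IMEX=1 treats everything as text.
        Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\uploads\order.xls;Extended Properties=""Excel 8.0;HDR=No;IMEX=1"""
        Dim dt As New DataTable()
        Using cn As New OleDbConnection(connStr)
            Using da As New OleDbDataAdapter("SELECT * FROM [Sheet1$]", cn)
                da.Fill(dt)
            End Using
        End Using
        ' Scan every cell for the search phrase; here we assume the wanted value
        ' sits in the cell directly below the matching label.
        For r As Integer = 0 To dt.Rows.Count - 1
            For c As Integer = 0 To dt.Columns.Count - 1
                Dim text As String = Convert.ToString(dt.Rows(r)(c))
                If text.IndexOf("Order", StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Dim value As String = If(r + 1 < dt.Rows.Count, Convert.ToString(dt.Rows(r + 1)(c)), "")
                    Console.WriteLine("Found '{0}' at row {1}, col {2}; value below: {3}", text, r, c, value)
                End If
            Next
        Next
    End Sub
End Module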

SQL Server 2008 - TSQL Read CSV file

I am working on a project that basically entails importing a CSV file into a SQL Server 2008 R2 database. The CSV file is generated from an Excel file that is populated by a "manager" with PR hours for his employees. This also includes some additional information, such as which job and phase the employees were working on, and the number of hours for equipment (if used).
Once you generate a CSV file from that, it's not exactly the usual straightforward column-based CSV file. It's more like a row-based CSV file, with each row being somewhat unique. Due to this caveat, I cannot do a straight dump (using BULK INSERT or OPENROWSET) into SQL, which would essentially create a (temp) table with the appropriate column-filled data.
I am looking to use the fields within the CSV file based on the "location" of that field in the row.
So, basically, the positions of the data will remain the same, since every CSV is based on a TEMPLATE file; all I have to do is navigate through the CSV file using SQL code to find the right field based on its position in the row. I hope that gives you guys a better understanding of what I am trying to achieve here. Sorry for the long wall of text.
I researched a bit and here's what I have come up with so far:
Reads CSV files into a temp table through a custom SQL function (Reading lines from a file)
https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
This one is interesting. Dumps the whole file as a BLOB and then you can sift through the data.
http://www.mssqltips.com/sqlservertip/1643/using-openrowset-to-read-large-files-into-sql-server/
Finally, this one essentially splits out the rows and creates separate records per row. Interesting..
http://ask.sqlservercentral.com/questions/17408/how-to-read-a-text-file.html
If anyone has any suggestions or steps that I could follow to get through this, I would greatly appreciate it.
To the Mods: If I have posted something (especially the links) that shouldn't be here, please feel free to remove it. I apologize if I did.
Thanks much.. Hope to hear some positive responses! :)
Warm Regards,
Pranav
If the file is not too large, another option is to post-process the file in Excel using a VBA macro. Of course, you'd need to come up to speed using the Excel object model and VBA, but the recording function makes it fairly simple. One advantage of the VBA approach is that it seems you really do want to do row by row processing, and VBA is better for that, whereas SQL is better for set-based operations.
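As a rough illustration of that row-by-row idea (shown here as VB.NET automating Excel; the same object-model calls work almost verbatim in a VBA macro, and the path and row markers are hypothetical):

Imports Microsoft.Office.Interop.Excel

Module RowProcessor
    Sub Main()
        Dim app As New Application()
        Dim wb As Workbook = app.Workbooks.Open("C:\data\pr_hours.xls") ' hypothetical path
        Try
            Dim ws As Worksheet = CType(wb.Worksheets(1), Worksheet)
            ' Walk every used row; since each row is "kind of unique", branch on a
            ' marker value in column A to decide how to interpret that row.
            For r As Integer = 1 To ws.UsedRange.Rows.Count
                Dim marker As String = Convert.ToString(CType(ws.Cells(r, 1), Range).Value2)
                Select Case marker
                    Case "EMPLOYEE"
                        ' read the employee-hours fields from fixed offsets in this row
                    Case "EQUIPMENT"
                        ' read the equipment-hours fields
                End Select
            Next
        Finally
            wb.Close(SaveChanges:=False)
            app.Quit()
        End Try
    End Sub
End Module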

Get list of columns of source flat file in SSIS

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we currently have is an SSIS package that imports the columns that have been defined. Since we've assigned the mapping, SSIS throws an error only when a column is absent. However, when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not mapped. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach this by writing a script task that reads the flat file using the file system object and a StreamReader object, and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accommodate new columns; it will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it and save yourself the trouble of pre-reading the column list.
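For what it's worth, a hedged sketch of that pre-reading idea as the body of an SSIS script task (VB.NET; the variable names are placeholders and would need to be listed in the task's ReadOnlyVariables/ReadWriteVariables):

Imports System.IO

Public Sub Main()
    ' Placeholders: User::FlatFilePath and User::ExpectedHeader are package variables.
    Dim path As String = CStr(Dts.Variables("User::FlatFilePath").Value)
    Dim expected As String = CStr(Dts.Variables("User::ExpectedHeader").Value)
    Dim actual As String
    Using rdr As New StreamReader(path)
        actual = rdr.ReadLine()   ' the first line should name the columns
    End Using
    ' Flag a layout change so a downstream Send Mail task can raise the alert.
    Dts.Variables("User::HeaderChanged").Value = _
        Not String.Equals(actual, expected, StringComparison.OrdinalIgnoreCase)
    Dts.TaskResult = ScriptResults.Success
End Sub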
I agree with the answer provided by @TabAlleman. SSIS can't natively handle dynamic columns (and neither can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a script task. One way to do this would be to create a flat-file connection that reads the entire row as a single column, use a Conditional Split to discard anything other than the header row, and save that row to a Recordset object. Any change? Send email.
The "Get Header Row" DataFlow would look like this. Row Number if needed.
The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..
From there, the two precedence constraints (the fx icons) use expressions along the lines of
@[User::ExpectedHeader] == @[User::CurrentHeader]
@[User::ExpectedHeader] != @[User::CurrentHeader]
to determine whether you load the data or send the email.
Hope this helps!
I have worked for banking clients, and for a bank, randomly adding columns to a database is simply not possible due to federal requirements and rules. That said, I get that yours may not be a federally regulated business. So here are some steps.
This is not a code issue but more one of soft skills and of working with other teams (yours and your vendor's).
Steps you can take are:
(1) Agree on a solid column structure that you always require, because for newer columns, older data rows will carry NULL.
(2) If a new column is going to be sent by the vendor, you or your team needs to make the DDL/DML changes to the table where the data will be inserted, of course with the correct data type.
(3) Document the change in the data dictionary, since over time you or another team member will analyze this data and will want to know what each attribute or column is used for.
(4) Long-term, you do not want to keep changing the table structure monthly because one of your many vendors decided to change the style in which they send you data. Some clients push back very aggressively, others not so much.
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic.
One thing I always do is use a script task to read the first and last lines of a file; if the first line is not the expected list of CSV columns, I mark the file as errored and continue or fail as required.
Headers are obviously important, but so are footers: a file can, through any unknown issue, end up only partially built. Asking the vendor to place the header at the rear of the file as well gives you a double check. A tiny sketch of the first/last-line read is below.
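Something along these lines (the path is hypothetical, and it assumes the vendor repeats the header at the rear of the file):

Imports System.IO
Imports System.Linq

Module HeaderFooterCheck
    Sub Main()
        ' File.ReadLines streams the file, so even large feeds are cheap to scan.
        Dim lines = File.ReadLines("C:\feeds\weekly.csv")   ' hypothetical path
        Dim header As String = lines.First()
        Dim footer As String = lines.Last()   ' second pass over the file
        ' If the trailing copy of the header is missing or different, the file is
        ' either partially built or the layout changed; mark it errored.
        If Not String.Equals(header, footer) Then
            Console.WriteLine("File looks partially built or the layout changed.")
        End If
    End Sub
End Module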
I also do not know whether SSIS can do this dynamically, but it never ceases to amaze me how people add or change the order of columns and assume things will still work.
1- SSIS does not provide dynamic source and destination mapping, but some third-party components, such as Data Flow Task Plus, support this feature.
2- We can achieve the header check using an SSIS script task.
3- If the header is correct, continue with the migration; otherwise fail the package before the DFT executes.
4- Read the header line using the script task and store it in an array or list object.
5- Then compare those array values to user-defined variables, declared earlier, whose default values hold the expected column names.
6- If the values match exactly, progress further; otherwise fail.

Get column name & table name from value

Actually, I have a new client whose database has no standard naming conventions, and the application is in classic ASP. There is a form with many values in different textboxes, and it is very difficult to trace which table each value comes from. There is also no ERD.
I need a query that, given a value, returns the table name and column name it comes from.
Let's suppose I have a value with the label abc# = '6599912268'. Since the project has no ERD and no standard naming conventions, I need a fast way to find out which table and which column abc# = '6599912268' is taken from. The UI has many values like this, and tracing them manually takes a long time.
Is there any way to trace it?
The simple answer is no. There is no way to trace which table/column a value comes from by mere inspection of the value.
I suggest the following.
Find out what type of database your product is using, where it is situated, and whether you have access to it.
If you have access to the database, get to know its structure: what each table is meant to store, the relationships, etc. Speak to the database administrator or the business analyst to increase your knowledge of the product domain.
Once you have the database structure, try to relate the tables to the page. E.g., the user details will most likely be stored in a table named 'Users' or 'Membership'. Catch my drift?
Then have a look at the web site's source code, specifically the page you are on. Is the SQL embedded in the source code (the ASP page), or does it call a COM server or something similar? If you are "lucky" (and I say lucky only for the purposes of the problem you are having), you will find the SQL in the ASP page.
If it calls a COM object or something similar, then you will have to dig up the source code for that, and that is most likely where your SQL will reside.
There is no easy way to do this. You have to use a stored procedure that loops over all the tables in the database and searches for the value, and it will probably take a while.
There's a stored procedure and examples here: Search all columns in all the tables in a database for a specific value. You'll see there are stored procedures for finding dates, strings, numbers.
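If you would rather drive the same brute-force search from client code instead of the linked stored procedure, a rough sketch of the idea (the connection string and search value are placeholders):

Imports System.Collections.Generic
Imports System.Data.SqlClient

Module ValueHunter
    Sub Main()
        Dim target As String = "6599912268"   ' the value to hunt for
        Using cn As New SqlConnection("Server=.;Database=MyDb;Integrated Security=true")
            cn.Open()
            ' Enumerate every character-typed column in the database.
            Dim colCmd As New SqlCommand("SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('char','nchar','varchar','nvarchar')", cn)
            Dim columns As New List(Of String())
            Using rdr As SqlDataReader = colCmd.ExecuteReader()
                While rdr.Read()
                    columns.Add(New String() {rdr.GetString(0), rdr.GetString(1), rdr.GetString(2)})
                End While
            End Using
            ' Probe each column and report every table/column that contains the value.
            For Each col As String() In columns
                Dim sql As String = String.Format("SELECT COUNT(*) FROM [{0}].[{1}] WHERE [{2}] = @v", col(0), col(1), col(2))
                Dim probe As New SqlCommand(sql, cn)
                probe.Parameters.AddWithValue("@v", target)
                If CInt(probe.ExecuteScalar()) > 0 Then
                    Console.WriteLine("{0}.{1}.{2}", col(0), col(1), col(2))
                End If
            Next
        End Using
    End Sub
End Module

As the other answers say, expect this to be slow, and expect multiple hits that you will still have to disambiguate by hand.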
Not possible. And if you search the columns for the value, there is a good chance that you will get multiple columns containing the same value, so how would you differentiate them? The same goes for the tables.

Efficient way to produce many small recordings and connect it to database

For a language course I need to record many sentences and words separately.
Ideally I would click on a button next to the written sentence in my database (e.g. MS Access or MySQL) and record the sentence. Then go to the next sentence.
Is there a way to do it that simple?
Until now, I have recorded the sentences one by one in audio software like Soundbooth and typed in a name each time. Then I have to type these names (and paths) into the database.
I would create sound files for each recording and then store the reference to each sound file in the database instead of storing the actual file as a blob.
Is that what you are doing at the moment, and are you asking how to switch to storing blobs instead?
The job will be arduous either way, but you'll have a more maintainable structure if you keep data records in the DB only and a reference to the file.
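If it helps, a tiny hedged sketch of the reference-only approach, assuming an Access back end (the table, column, and path names are all hypothetical): after each take is saved to its own file, only the path goes into the database.

Imports System.Data.OleDb

Module SaveRecordingRef
    ' Store the path of a finished recording against its sentence row.
    Sub SaveReference(ByVal sentenceId As Integer, ByVal wavPath As String)
        Dim connStr As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\course\lessons.accdb"
        Using cn As New OleDbConnection(connStr)
            cn.Open()
            ' OleDb uses positional ? parameters.
            Dim cmd As New OleDbCommand("UPDATE Sentences SET AudioPath = ? WHERE SentenceId = ?", cn)
            cmd.Parameters.AddWithValue("?", wavPath)
            cmd.Parameters.AddWithValue("?", sentenceId)
            cmd.ExecuteNonQuery()
        End Using
    End Sub
End Module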