We're facing a massive data migration that consists of about 1,500 Excel spreadsheets that were used to print out data. Because they were designed to be visual, the data is scattered across cells throughout each spreadsheet. I'm looking for a way to map out those fields, and then do a bulk import to bring all of it into a single large table (or series of tables, if needed).
If it were all in a table format, this wouldn't be a problem - but I'm not sure of any way to import the data and somehow map the fields as they're imported.
Any thoughts or ideas?
If we're speaking in generalities, I'd look at performing this task with a .NET program, using the ACE OLEDB provider to query the Excel data. Then use the language of your choice to parse the data into something more manageable and write it to the database. This approach worked well to import some very non-tabular data into SQL Server. A rough sketch follows the links below.
Quick list of SO questions with C# examples of code to query Excel.
How to query excel file in C# using a detailed query
OleDB Read Excel decimal value
Importing excel files in c#
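As a sketch of what that .NET approach can look like (not the exact code from those questions - the workbook path, sheet name, cell range, destination table, and connection strings are all hypothetical), the ACE provider pulls the mapped block of cells into a DataTable, you reshape it as needed, and SqlBulkCopy pushes the rows into SQL Server:

using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

class ExcelCellImport
{
    static void Main()
    {
        // Hypothetical workbook path, sheet name, and cell range -- replace with your own mapping.
        var excelConnStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\migration\report0001.xlsx;" +
                           "Extended Properties='Excel 12.0 Xml;HDR=NO;IMEX=1'";
        var cells = new DataTable();
        using (var conn = new OleDbConnection(excelConnStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$B4:F20]", conn))
        {
            adapter.Fill(cells);   // pulls just the mapped block of cells into a DataTable
        }

        // ... reshape 'cells' into the layout of your destination table here ...

        // Hypothetical destination table; SqlBulkCopy writes whatever ends up in the DataTable.
        using (var bulk = new SqlBulkCopy("Server=.;Database=Migration;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.ImportedFields";
            bulk.WriteToServer(cells);
        }
    }
}

Run that in a loop over the 1,500 files, with a mapping per spreadsheet layout, and all of the data lands in one place.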
Problem:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible. The data sets update daily into the same CSV files on my local hard drive. Currently using MS Access 2010 (v14.0) as a middleman to aggregate the CSV files into linked tables.
Using the solutions below, the data transfers perfectly into SQL Server and does exactly what I want. But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
Solutions:
Upsizing Wizard in MS Access - This works best in transferring all the tables perfectly to SQL Server databases. I cannot figure out how to update the tables though without deleting and repeating the same steps each day. None of the solutions or links that I have tried have panned out.
SQL Server Import/Export Wizard - This works fine also in getting the data over to SSMS one time. But I also cannot figure out how to update/sync this data with the new tables. Another issue is that choosing Microsoft Access as the data source through this method requires a .mdb file. The latest MS Access file formats are .accdb files so I have to save the database in an older .mdb version in order to export it to SQL Server.
Constraints:
I have no loyalty towards MS Access. I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them. From all I have read, MS Access seems like the best way to do that.
I also have limited coding knowledge so more advanced VBA/C++ solutions will probably go over my head.
TLDR:
Trying to get several different daily-updating local CSV files into a program where I can run SQL queries on them, without having to do a full delete and re-import each day. Currently using MS Access 2010 to SQL Server Express (SSMS v17.6), which fulfills my needs but does not pick up the new daily data without re-importing everything.
Thank you!
You can use a staging table strategy to solve this problem.
When it's time to perform the daily update, import all of the data into one or more staging tables. Then execute SQL statements to insert rows that exist in the imported data but not in the base data; delete rows from the base data that don't exist in the imported data; and update base-data rows whose values have changed in the imported data.
Use your data dependencies to determine in which order tables should be modified.
I would run all deletes first, then inserts, and finally all updates.
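A minimal sketch of that delete/insert/update pass, assuming a single-table case with hypothetical names (dbo.Staging freshly loaded from the CSV, dbo.Base as the permanent table, Id as the key, Amount as the only other column):

using System.Data.SqlClient;

class DailyRefresh
{
    // Hypothetical connection string and object names -- adjust to your instance and schema.
    const string ConnStr = "Server=.\\SQLEXPRESS;Database=MyData;Integrated Security=true";

    static void Main()
    {
        using (var conn = new SqlConnection(ConnStr))
        {
            conn.Open();
            // Delete base rows that no longer exist in the freshly loaded staging data.
            Run(conn, @"DELETE b FROM dbo.Base AS b
                        WHERE NOT EXISTS (SELECT 1 FROM dbo.Staging AS s WHERE s.Id = b.Id);");
            // Insert staging rows that are not yet in the base table.
            Run(conn, @"INSERT INTO dbo.Base (Id, Amount)
                        SELECT s.Id, s.Amount FROM dbo.Staging AS s
                        WHERE NOT EXISTS (SELECT 1 FROM dbo.Base AS b WHERE b.Id = s.Id);");
            // Update base rows whose values changed (simple compare; NULLs would need extra handling).
            Run(conn, @"UPDATE b SET b.Amount = s.Amount
                        FROM dbo.Base AS b JOIN dbo.Staging AS s ON s.Id = b.Id
                        WHERE b.Amount <> s.Amount;");
            // Empty the staging table so tomorrow's load starts clean.
            Run(conn, "TRUNCATE TABLE dbo.Staging;");
        }
    }

    static void Run(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn)) cmd.ExecuteNonQuery();
    }
}

Wrapping the three statements in a transaction would keep the base table consistent if one of them fails part-way through.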
This should be a fun challenge!
EDIT
You said:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible.
The most efficient way to put data into SQL Server tables is using SQL Bulk Copy. This can be implemented from the command line, an SSIS job, or through ADO.Net via any .Net language.
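For example, a bare-bones ADO.Net version might look like this - the file, column, and table names are hypothetical, and it assumes a simple comma-separated file with a header row and no embedded commas or quotes:

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

class CsvToStaging
{
    static void Main()
    {
        // Hypothetical file layout: two columns, Id and Amount, with a header row to skip.
        var table = new DataTable();
        table.Columns.Add("Id");
        table.Columns.Add("Amount");
        foreach (var line in File.ReadLines(@"C:\data\daily.csv").Skip(1))
            table.Rows.Add(line.Split(','));

        // Bulk-load everything into the staging table in one shot.
        using (var bulk = new SqlBulkCopy("Server=.\\SQLEXPRESS;Database=MyData;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.Staging";
            bulk.WriteToServer(table);
        }
    }
}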
You state:
But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
It seems you have two choices:
Toss the old data and replace it with the new data
Modify the old data so that it comes into alignment with the new data
To do number 1 above, you'd simply replace all the existing data with the new data, which you've already said you don't want to do, or at least don't think you can do efficiently. To do number 2, you have to compare the old data with the new data, and to compare two sets of data, both sets have to be accessible wherever the comparison takes place. So you could perform the comparison in SQL Server, but the new data will need to be loaded into the database for comparison purposes; you can then purge the staging table after the process completes.
In thinking further about your issue, it seems the underlying issue is:
I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them.
There exist applications built specifically to allow you to query this type of data.
You may want to have a look at Log Parser Lizard or Splunk. These are great tools for querying and digging into data hidden inside flat data files.
An Append Query can incrementally add new records to an existing table. The question, however, is whether your starting data set (the CSV) contains only new records or also includes records that are already in the table.
This is a classic dilemma that needs to be managed in the Append Query set up.
If the CSV includes prior records, then you have to establish the 'new records' subset inside the CSV and append just those. For instance, if you have a sequencing field, you can append only the rows whose sequence value is greater than the existing table's maximum (see the sketch below). If no such field exists, you would need to do a NOT/unmatched comparison of the table data with the CSV data to identify which CSV records are not already in the table.
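To make that concrete, the SQL behind such an Append Query could look like the statement below. The database path, table, and field names are hypothetical; I'm showing it run from a small C# helper via OleDb, but the same SQL can be pasted straight into an Access query, and it assumes the base table already has at least one row (MAX returns Null on an empty table):

using System.Data.OleDb;

class AppendNewRows
{
    static void Main()
    {
        // Hypothetical .accdb path; CsvData is a linked table over the daily CSV file.
        var connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\Aggregator.accdb";
        var sql = @"INSERT INTO BaseTable
                    SELECT c.*
                    FROM CsvData AS c
                    WHERE c.SeqId > (SELECT MAX(b.SeqId) FROM BaseTable AS b)";
        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();   // appends only rows beyond the current maximum sequence value
        }
    }
}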
You state you seek something 'more efficient' - but in truth there is nothing more efficient than a wholesale delete of all records followed by a full rewrite. Most of the time you can't do that - but if you can, I would just stick with it.
We want some of our customers to be able to export some data into a file and then we have a job that imports that into a blank copy of a database at our location. Note: a DBA would not be involved. This would be a function within our application.
We can ignore table schema differences - they will match. We have different tables to deal with.
So on the customer side the function would run something like:
insert into myspecialstoragetable select * from source_table
insert into myspecialstoragetable select * from source_table_2
insert into myspecialstoragetable select * from source_table_3
I would then run a select * from myspecialstoragetable and get a .sql file they can ship to me, which we could then import into our copy of the db with some job/SQL script.
I'm thinking we can use XML somehow, but I'm a little lost.
Thanks
Have you looked at the bulk copy utility bcp? You can wrap it with your own program to make it easier for less sophisticated users.
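A thin wrapper along these lines could sit behind a button in your application - the server, database, table, and output path are hypothetical, and bcp itself does the heavy lifting:

using System.Diagnostics;

class BcpExportWrapper
{
    static void Main()
    {
        // Hypothetical names; -T = Windows authentication, -c = character mode, -t, = comma delimiter.
        var args = @"CustomerDb.dbo.source_table out C:\export\source_table.csv -S .\SQLEXPRESS -T -c -t,";
        using (var bcp = Process.Start(new ProcessStartInfo("bcp", args) { UseShellExecute = false }))
        {
            bcp.WaitForExit();   // a real wrapper should check bcp.ExitCode and surface errors to the user
        }
    }
}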
Since it is a function within your application, in what language is the application front-end written? If it is .NET, you can use Data Transformation Services (DTS) in SQL Server to do a sample export. In the last step, you can save the steps into a VB/.NET module. If necessary, modify this file to change table names, etc. Integrate this DTS module into your application. While doing the sample export, export to a suitable format such as CSV or Excel - whichever format you will be able to import into a blank database.
Every time the user wants to do an export, he will have to click a button that invokes the DTS module integrated into your application, which will dump the data to the desired format. He can then mail such a file to you.
If your application is not written in .NET, whichever language it is written in will have options to read data from SQL Server and dump it to a CSV or delimited text file. If it is a primitive language, you may have to loop through the records, concatenate the fields of each one, and write them out to a file.
XML would be too far-fetched for this, though it's not impossible. At your end, you would have to be able to parse the XML file and import it into your database. Also, XML is not really suited to a very large number of records.
You are probably thinking of a .sql file as in MySQL. In SQL Server, the .sql files generated by the 'Generate Scripts' function of SQL Server's interface are used for table structures/DDL rather than for generating insert statements containing each record's values.
I am trying to find a way to search a sas7bdat file from Excel for a specific value, and then copy and paste the matching data into an Excel spreadsheet. Currently I can add the whole data set and then search it in Excel, but it would be much better if I could search the data set before adding it to Excel, as the data sets are sometimes too large for Excel to handle.
Is there a way of doing this? Or alternatively is there a way of running some SAS code from excel that would perform the search for me?
Many thanks,
Alastair
SAS has a Microsoft Office add-in that you can use to connect to SAS datasets from Excel. See their product page for more information.
There are a few other options for connecting to SAS as a data source; you can use ODBC for example. You also can do what you describe using DDE. Finally, you can produce excel spreadsheets (but not paste into particular spots, easily anyway) using PROC EXPORT or ODS EXCEL (the latter in 9.4 TS1M2+). You can filter the dataset to the appropriate size prior to exporting.
I am moving Excel data into SQL Server 2012 using the Import/Export wizard.
The Excel sheet has 377 columns, but when I am importing the file into SQL Server, only 255 columns are appearing in the table. Where are the rest of the columns?
Unfortunately this is a limitation of the ACE driver, so it is not easy to overcome.
An easy solution that I see is to open up the Excel sheet in Excel, then save as CSV. Then use the Import wizard to import the CSV.
I'm not sure if you're aware of this, but the Import/Export wizard is actually using SSIS. In one of the last screens, you have the option to save the SSIS package (.dtsx).
To get the workaround with the named ranges to work, you'll first need to import the two ranges into two separate tables and then join them together to fill the final table.
Maybe this helps as well: http://blogs.msdn.com/b/dataaccesstechnologies/archive/2011/01/22/importing-excel-2010-data-into-sql-server.aspx
I was wondering if it is possible to import Excel documents using SSIS by referencing a column by its position? For example, import columns A, D, M, AA, etc. I ask because I need to load several Excel documents from a third party. Each document contains the same data type in the corresponding columns, but the column names are different for each document.
Thanks!
Yes, but you won't be using the Excel driver and connection manager. Instead, you will use the OLE DB driver and write a SQL query against the file. For anything but the most basic Excel files, this is my go-to approach for importing data out of Excel.
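To illustrate the kind of query I mean - shown here from a small C# harness so you can test it outside SSIS, with a hypothetical file path - set HDR=NO and the ACE provider names the columns F1, F2, F3, ... by position, so F1 is column A, F4 is column D, F13 is column M, and F27 is column AA:

using System.Data;
using System.Data.OleDb;

class PositionalExcelQuery
{
    static void Main()
    {
        // Hypothetical workbook path; HDR=NO exposes positional column names F1..Fn regardless of header text.
        var connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\imports\vendor_file.xlsx;" +
                      "Extended Properties='Excel 12.0 Xml;HDR=NO;IMEX=1'";
        var rows = new DataTable();
        using (var conn = new OleDbConnection(connStr))
        using (var adapter = new OleDbDataAdapter(
            "SELECT F1, F4, F13, F27 FROM [Sheet1$]", conn))   // columns A, D, M, AA by position
        {
            adapter.Fill(rows);
        }
        // In SSIS, the same SELECT goes in an OLE DB Source set to 'SQL command' over an ACE OLE DB connection.
    }
}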
Various incarnations of my approach
Excel Source as Lookup Transformation Connection
script task in SSIS to import excel spreadsheet
Import a single Excel cell into SSIS