How many excel files can an SSIS package able to open and insert if the data in my excel files are less than 500KB?
Here it was tested for XML connection test - there is no known limitation. In other word, limitation depends from computer resources.
As far as I know, 500 KB files can be easily handled by SSIS packages if the machine running the packages meets the minimum requirements listed by Microsoft.
Please take a look at my answer in this link. In the link, tab delimited files are being loaded into SQL using SSIS. The .txt files used in the example were of size 41 MB containing a million rows. Configuration of the machine used for testing is provided in the answer. That should hopefully give an idea about SSIS capabilities of handling large files.
Hope that helps.
Related
Currently, my company uses SSIS and BCP to export data from SQL Server to CSV files. However, we are only able to create a single file per SQL table (due to the limitations of BCP). Most of these files are quite large; if I am correct, they are too large to get the best performance when loading them into Snowflake. On their website, they state that we should be working with multiple gzip files to offer the best performance.
I am wondering how other people made this work? Splitting up the CSV to multiple files and zipping them? Any good tools that can do this during export from SSIS?
I'd keep the current process that exports the large .csv files using SSIS, then run 7zip via command line to create a split gzip set for each text file, either within the SSIS package or via Powershell.
The -v switch is used to specify the volume size.
https://sevenzip.osdn.jp/chm/cmdline/switches/volume.htm
You may be able to start importing/uploading the completed chunks before the later ones are finished to pick up some additional time savings, but I've not tested that.
I'm trying to connect to and run queries on two large, locally-stored SQL databases with file extensions like so:
filename.sql.zstd.part
filename2.sql.zstd
My preference is to use the RMySQL package- however i am finding it hard to find documentation of a) how to access locally stored SQL files, and b) how to deal with the zstd extension.
This may be very basic but help is appreciated!
Seems like you have problems understanding the file extensions.
filename.sql.zstd.part
.part usually means you are downloading a file from the internet, but the download isn't complete yet (so downloads that are in progress or have been stopped)
So to get from filename.sql.zstd.part to filename.sql.zstd you need to complete your download
.zstd means it is a compressed file (to save disk space). You need a decompression program to get from filename.sql.zstd to filename.sql
The compression algorithm used is called Zstandard so you need a decompressor specifically for this program. Look here https://facebook.github.io/zstd/ for such a program.
There was also once an R package for this - but it has been archived. But you could also download an older version
(https://cran.r-project.org/web/packages/zstdr/index.html)
In filename.sql is actually not a database. In an .sql file are usually SQL statements for creating / modifying database structures. You'd have to install a database e.g. MariaDB and then import this .sql file to actually really have the files in a database on your computer. And then you would access this database via R.
Firstly apologies if this question seems like a wall of text, I can't think of a way to format it.
I have a machine with valuable data on(circa 1995), the machine is running unix (SCO OpenServer 6) with some sort of database stored on it.
The data is normally accessed via a software package of which the license has expired and the developers are no longer trading.
The software package connects to the machine via telnet to retrieve data and modify data (the telnet connection no longer functions due to the license being changed).
I can access the machine via an ODBC driver (SeaODBC.dll) over a network, this was how I was planning to extract the data but so far I have retrieved 300,000 rows in just over 24 hours, in total I estimate there will be around 50,000,000 rows total so at current speed it will take 6 months!
I need either a quicker way to extract the data from the machine via ODBC or a way to extract the entire DB locally on the machine to an external drive/network drive or other external source.
I've played around with the unix interface and the only large files I can find are in a massive matrix of single character folder (eg A\G\data.dat, A\H\Data.dat ect).
Does anyone know how to find out the installed DB systems on the machine? Hopefully it is a standard and I'll be able to find a way to export everything into a nicely formatted file.
Edit
Digging around the file system I have found a folder under root > L which contains lots of single lettered folders, each single lettered folder contains more single letter folders.
There are also files which are named after the table I need (eg "ooi.r") which have the following format:
<Id>
[]
l for ooi_lno, lc for ooi_lcno, s for ooi_invno, id for ooi_indate
require l="AB"
require ls="SO"
require id=25/04/1998
{<id>} is s
sort increasing Id
I do not recognize those kinds of filenames A\G\data.dat and so on (filenames with backslashes in them???) and it's likely to be a proprietary format so I wouldn't expect much from that avenue. You can try running file on these to see if they are in any recognized format just to see...
I would suggest improving the speed of data extraction over ODBC by virtualizing the system. A modern computer will have faster memory, faster disks, and a faster CPU and may be able to extract the data a lot more quickly. You will have to extract a disk image from the old system in order to virtualize it, but hopefully a single sequential pass at reading everything off its disk won't be too slow.
I don't know what the architecture of this system is, but I guess it is x86, which means it might be not too hard to virtualize (depending on how well the SCO OpenServer 6 OS agrees with the virtualization). You will have to use a hypervisor that supports full virtualization (not paravirtualization).
I finally solved the problem, running a query using another tool (not through MS Access or MS Excel) worked massively faster, ended up using DaFT (Database Fishing Tool) to SELECT INTO a text file. Processed all 50 million rows in a few hours.
It seems the dll driver I was using doesn't work well with any MS products.
Everytime I want to change some properties in some class I get the following error messages:
:Microsoft Cursor Engine [-2147217864]
Row cannot be located for updating. Some values may have been changed since it was last read.
ADODB.Recordset[-2146825069]
Operation is not allowed in this context.
How can I solve them??
Even if this question was posted a long time ago:
Now and then this error occurs in my projects, too.
Every time I try to edit specific elements in Enterprise Architect projects i get exactly the same error messages. The only solution to this is to delete the element completely and create it again.
#TomO:
When you are importing a package, is this from XMI or are you import a source code directory?
I import only via XMI file.
What are you using as a repository?
I'm using a PostgreSQL-Server based repository, which I access via ODBC Driver.
In your ODBC Data Source Configuration do you have "Return matched rows instead of affected rows" and "Allow big result sets".
Could specify where I can find these options? Perhaps this is outdated, becaus I can't find any of these options under the Options/Datasource Menu in my ODBC driver.
If you are importing form XMI are you stripping the GUIDs on import, this is always a good idea if you are making a copy of an existing folder in your model as having two elements with the same GUID is not ideal ;-)
I strip GUIDs when I'm exporting and again when I'm importing XMI files.
I would really apprechiate any help concerning this topic.
If possible i might need a little more information. When you are importing a package, is this from XMI or are you import a source code directory? What are you using as a repository? Given the error I am assuming it is not the local EAP file.
In your ODBC Data Source Configuration do you have "Return matched rows instead of affected rows" and "Allow big result sets"
If you are importing form XMI are you stripping the GUIDs on import, this is always a good idea if you are making a copy of an existing folder in your model as having two elements with the same GUID is not ideal ;-)
I have also noticed that you asked this on Apr 14th - sorry it has taken me so long to find your request. I hope this helps!
Are you accessing your ea repository as a cloud repository please? If so, you could try to switch to access the repository as an odbc datasource, and this problem might be solved. I think it is a bug of the Sparx enterprise architect cloud service.
I am running into an issue, while I import an excel file into sql server 2005 using OpenRowSet, it works fine when excel file is closed, If excel file is open it gives an error message.
I have an excel file which is being updated 8 to 10 times in a minute by a third party software, I have to import this excel file into sql server 2005 very 10 seconds.
Any help would be higly appriciated.........
Thanks, Yogi
How do you know which rows you want to import if the file is constantly being updated? Do you have some kind of sequence number so that you can detect gaps?
If you really want to run with something like you have suggested then why not set up a process to copy the Excel spreadsheet file periodically and then get your OPENROWSET code to read from this copy.
I think it's a general principle that a file cannot, or should not, be accessed while it's open and being written to by another program.
How is the conflict resolved if a part of the file is simultaneously written and read?
That being said, I've come across a situation on one of the computers I use (not Excel, not Windows, not even a PC, so it's hardly relevant) where I could download open files by FTP.
But I use another practically identical computer where this is not possible. I don't know why, but it seems to me to be the normal situation.
This idea of copying a file, and operating on the copy fits what pjp has said. If you can get away with it ... The frequency of access you require, Yogi, seems to invite conflict between reading and writing.
Is it bad if you just stop the import when you get the error and wait 10 seconds until the next import?