I have a set of XML files totaling about 700 GB.
I'm going to load the data in those files into a SQL Server 2008 database table (as tabular data).
In addition to the columns that hold the data in tabular form, the table will contain a column of the SQL Server xml type that holds the complete XML document.
I want to use the FILESTREAM feature of SQL Server 2008 instead of loading the whole XML into that column.
What performance benefits will queries against such a very large table gain, and what are the pros and cons of this feature?
Thank you in advance.
I do not expect this will ever be marked as the answer, because the true answer will only be discovered after a thorough study of the available solutions.
BUT
The answer I have is really a question for you: how are you going to use this data? If you are going to shred the XML to retrieve the reporting values and keep the complete XML for reference, then I would go with FILESTREAM. If you are going to run reports directly from the XML, then you will have to load the data into the database and create the needed indexes.
Loading all data into SQL Server as a combination of shredded XML and an xml data type
PRO
- All data is available all the time from one source
- A single backup contains all data
- Additional data from the XML can be shredded to enhance reports on the server side
CON
- Backup size
- Backup time
- Slow if the data is queried in its native XML form
Loading values from XML into SQL Server and using FILESTREAM
PRO
- The data source (FILESTREAM) is tied to the data values
- The source data can be presented to the client
CON
- FILESTREAM content is stored as varbinary(max), not as the xml type, so it is not directly available to XML queries without converting it first
- FILESTREAM and SQL backups have to be kept synchronized for disaster recovery
Be aware of your storage needs for backups and the maintenance window they require.
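To make "shredding" concrete, here is a minimal Python sketch, not tied to any SQL Server API. It assumes a hypothetical document layout (an `<orders>` root with `<order>` children holding `id`, `customer` and `total`); the `shred` function and the use of the source file name as the reference to the full XML are illustrative assumptions. Each returned tuple is a row for the tabular columns, and its last field points back to the complete document kept in FILESTREAM or elsewhere.

```python
import xml.etree.ElementTree as ET


def shred(xml_text, source_ref):
    """Extract reporting values from one XML document.

    Assumes a hypothetical layout: <orders><order><id/><customer/><total/></order>...
    Every row carries source_ref so the complete XML can be looked up later.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.iter("order"):
        rows.append((
            int(order.findtext("id")),
            order.findtext("customer"),
            float(order.findtext("total")),
            source_ref,  # reference to the full document, e.g. a file name
        ))
    return rows


SAMPLE = """<orders>
  <order><id>1</id><customer>Acme</customer><total>19.50</total></order>
  <order><id>2</id><customer>Globex</customer><total>5.00</total></order>
</orders>"""

if __name__ == "__main__":
    for row in shred(SAMPLE, "batch_001.xml"):
        print(row)
```

With 700 GB of input you would not want to hold whole files in memory; `xml.etree.ElementTree.iterparse` can stream elements instead, calling `clear()` on each element once it has been processed.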
I need to convert all varchar columns in about 40 tables (already populated with data) to nvarchar columns. The conversion is planned on a dedicated MS SQL Server used only for this purpose. The result should be moved to Azure SQL.
Where should the conversion be done: on the old SQL Server, or after moving to Azure SQL?
According to Remus Rusanu's answer https://stackoverflow.com/a/8157951/1346705, new nvarchar columns are created in the process, and the old varchar columns are dropped. The space can be reclaimed with DBCC CLEANTABLE or with ALTER TABLE ... REBUILD. Are the dropped varchar columns packed into the backup, or does backup/restore also remove the dropped columns?
Can the process be automated with a universal SQL script, or is it necessary to write a script for each individual table?
Context: We are a third party with respect to the enterprise information system (IS). Our product reads from the IS SQL database and presents the data in a way that would otherwise be expensive to implement in the IS. The IS has now been migrated to a new version and is to be run on Azure SQL. The IS database has changed heavily, and one of the changes was to abandon the old 8-bit text encoding (varchar) in favor of Unicode (nvarchar). Our system was also used to collect manually typed data, using the same encoding as the old IS.
The migration will be done by taking a backup with the old tooling (SqlCmd, which produces xxx.bak files) and restoring it on another plain old SQL Server. Then we run a script that removes all the tables, views, and stored procedures that can be reconstructed from the IS. One of the main reasons is that the SQL code uses features that are not accepted by the new tool, SqlPackage.exe, which produces the xxx.bacpac file. Then the bacpac file is imported into Azure SQL.
Where should the conversion be done: on the old SQL Server, or after moving to Azure SQL?
I would do it on the local SQL Server first. Running this on the Azure database might cause you to run into issues such as hitting your DTU limits or disk I/O throttling.
Are the dropped varchar columns packed into the backup, or does backup/restore also remove the dropped columns?
The space won't be released back to the file system, and a backup doesn't include free space, so you will not see much change there. You might want to read more about DBCC CLEANTABLE before proceeding, though.
Can the process be automated with a universal SQL script, or is it necessary to write a script for each individual table?
It can be automated. You could use dynamic SQL to check each column's type and process it accordingly. You will also have to check whether any of those columns are part of indexes; if so, you have to drop those indexes first.
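As a sketch of the generation step, the Python below turns column metadata into one ALTER TABLE ... ALTER COLUMN statement per varchar column; the same string-building could be done with dynamic SQL inside SQL Server. The metadata query and the function names are hypothetical. The sketch keeps each column's nullability, maps lengths over 4000 (and varchar(max), reported as -1) to nvarchar(max) because nvarchar(n) allows at most 4000, and deliberately does not handle the indexes and constraints mentioned above.

```python
# Hypothetical metadata query; its result rows are the input to alter_statements().
METADATA_QUERY = """
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME,
       c.CHARACTER_MAXIMUM_LENGTH, c.IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME
WHERE c.DATA_TYPE = 'varchar' AND t.TABLE_TYPE = 'BASE TABLE'
"""


def quote_name(name):
    """Bracket-quote an identifier, doubling any ']' (like T-SQL QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"


def nvarchar_type(max_length):
    """Map CHARACTER_MAXIMUM_LENGTH of a varchar column to an nvarchar type.

    -1 means varchar(max); nvarchar(n) allows n up to 4000, so longer
    varchar columns have to become nvarchar(max).
    """
    if max_length == -1 or max_length > 4000:
        return "nvarchar(max)"
    return f"nvarchar({max_length})"


def alter_statements(rows):
    """Build one ALTER TABLE ... ALTER COLUMN statement per metadata row.

    Each row is (schema, table, column, max_length, is_nullable), with
    is_nullable as 'YES'/'NO' as INFORMATION_SCHEMA reports it.
    Indexes and constraints that reference a column are NOT handled here.
    """
    statements = []
    for schema, table, column, max_length, is_nullable in rows:
        nullability = "NULL" if is_nullable == "YES" else "NOT NULL"
        statements.append(
            f"ALTER TABLE {quote_name(schema)}.{quote_name(table)} "
            f"ALTER COLUMN {quote_name(column)} {nvarchar_type(max_length)} {nullability};"
        )
    return statements


if __name__ == "__main__":
    sample = [
        ("dbo", "Customer", "Name", 50, "YES"),
        ("dbo", "Customer", "Notes", -1, "NO"),
    ]
    for stmt in alter_statements(sample):
        print(stmt)
```

One more thing worth checking before running the output: according to the ALTER TABLE documentation, changing a column's data type without a COLLATE clause switches the column to the database default collation, so columns with a non-default collation may need an explicit COLLATE appended.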
I suggest making the schema changes beforehand on the old instance. Even if you don't bother reclaiming space with DBCC CLEANTABLE or ALTER TABLE ... REBUILD, the resulting bacpac size will be the same because, unlike a physical backup/restore, a bacpac file is just a compressed package of schema and data.
Consider using SQL Server Data Tools (SSDT) to facilitate the schema changes. It will take into account all the dependencies (constraints, indexes, etc.), which are a challenge for a "universal" T-SQL solution. SSDT will generally generate a migration script that uses temp tables for such schema changes, so the end result won't have wasted space in your old database. However, you will need enough unused space in the database to hold the old and new objects side by side.
Problem:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible. The data sets update daily into the same CSV files on my local hard drive. Currently using MS Access 2010 (v14.0) as a middleman to aggregate the CSV files into linked tables.
Using the solutions below, the data transfers perfectly into SQL Server and does exactly what I want. But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
Solutions:
Upsizing Wizard in MS Access - This works best in transferring all the tables perfectly to SQL Server databases. I cannot figure out how to update the tables though without deleting and repeating the same steps each day. None of the solutions or links that I have tried have panned out.
SQL Server Import/Export Wizard - This also works fine for getting the data over to SQL Server one time. But I also cannot figure out how to update/sync this data with the new tables. Another issue is that choosing Microsoft Access as the data source through this method requires a .mdb file. The latest MS Access file format is .accdb, so I have to save the database in the older .mdb format in order to export it to SQL Server.
Constraints:
I have no loyalty towards MS Access. I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them. From all I have read, MS Access seems like the best way to do that.
I also have limited coding knowledge so more advanced VBA/C++ solutions will probably go over my head.
TLDR:
Trying to get several different daily updating local CSV files into a program where I can run SQL queries on them without having to do a full delete and re-import each day. Currently using MS Access 2010 to SQL Server Express (SSMS v17.6) which fulfills my needs, but does not update daily with the new data without re-importing everything.
Thank you!
You can use a staging table strategy to solve this problem.
When it's time to perform the daily update, import all of the data into one or more staging tables. Execute SQL statements to insert rows that exist in the imported data but not in the base data into the base data; similarly, delete rows from the base data that don't exist in the imported data, and update base data rows whose values have changed in the imported data.
Use your data dependencies to determine in which order tables should be modified.
I would run all deletes first, then inserts, and finally all updates.
This should be a fun challenge!
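A minimal, runnable sketch of the staging-table approach, using Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere. The table names `base` and `staging`, the `id` key and the columns are hypothetical. It follows the order suggested above (deletes, then inserts, then updates) and finally empties the staging table. On SQL Server the same steps could be written with joins (UPDATE ... FROM) or combined into a single MERGE statement.

```python
import sqlite3


def sync_from_staging(conn):
    """Make table `base` match table `staging`; both are keyed on id."""
    with conn:  # commit on success, roll back on error
        # 1. Delete base rows that are missing from the new import.
        conn.execute("DELETE FROM base WHERE id NOT IN (SELECT id FROM staging)")
        # 2. Insert imported rows that base does not have yet.
        conn.execute("""
            INSERT INTO base (id, name, amount)
            SELECT s.id, s.name, s.amount FROM staging AS s
            WHERE NOT EXISTS (SELECT 1 FROM base AS b WHERE b.id = s.id)""")
        # 3. Update base rows whose values changed (IS NOT is NULL-safe in SQLite).
        conn.execute("""
            UPDATE base
            SET name   = (SELECT s.name   FROM staging AS s WHERE s.id = base.id),
                amount = (SELECT s.amount FROM staging AS s WHERE s.id = base.id)
            WHERE EXISTS (SELECT 1 FROM staging AS s
                          WHERE s.id = base.id
                            AND (s.name IS NOT base.name OR s.amount IS NOT base.amount))""")
        # 4. Purge the staging table for the next day's import.
        conn.execute("DELETE FROM staging")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base    (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
        CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
        INSERT INTO base    VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
        INSERT INTO staging VALUES (2, 'b', 25), (3, 'c', 30), (4, 'd', 40);
    """)
    sync_from_staging(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT id, name, amount FROM base ORDER BY id").fetchall())
```

Running it prints `[(2, 'b', 25.0), (3, 'c', 30.0), (4, 'd', 40.0)]`: row 1 was deleted, row 4 inserted, row 2 updated and row 3 left alone.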
EDIT
You said:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible.
The most efficient way to put data into SQL Server tables is bulk copy. This can be done from the command line (the bcp utility), from an SSIS job, or through ADO.NET (the SqlBulkCopy class) from any .NET language.
You state:
But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
It seems you have two choices:
1. Toss the old data and replace it with the new data
2. Modify the old data so that it comes into alignment with the new data
In order to do number 1 above, you'd simply replace all the existing data with the new data, which you've already said you don't want to do, or at least you don't think you can do this efficiently. In order to do number 2 above, you have to compare the old data with the new data. In order to compare two sets of data, both sets of data have to be accessible wherever the comparison is to take place. So, you could perform the comparison in SQL Server, but the new data will need to be loaded into the database for comparison purposes. You can then purge the staging table after the process completes.
In thinking further about your issue, it seems the underlying issue is:
I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them.
There exist applications built specifically to allow you to query this type of data.
You may want to have a look at Log Parser Lizard or Splunk. These are great tools for querying and digging into data hidden inside flat data files.
An Append Query is able to incrementally add additional new records to an existing table. However the question is whether your starting point data set (CSV) is just new records or whether that data set includes records already in the table.
This is a classic dilemma that needs to be managed in the Append Query set up.
If the CSV includes prior records, then you have to identify the subset of new records inside the CSV and append just those. For instance, if you have a sequencing field, you can append only the rows whose value is greater than the existing table's maximum. If there is no such field, you need a NOT comparison of the table data with the CSV data to identify which CSV records are not already in the table.
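The two ways of isolating the new records can be sketched in Python as filters over the CSV. The column names `seq` and `item`, the sample data and both function names are hypothetical; in the Append Query itself the same logic would live in the WHERE clause, and `existing_max` / `existing_keys` would be read from the existing table.

```python
import csv
import io


def new_by_sequence(csv_text, seq_field, existing_max):
    """Keep CSV rows whose sequence value is greater than the table's current maximum."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if int(row[seq_field]) > existing_max]


def new_by_key(csv_text, key_field, existing_keys):
    """Keep CSV rows whose key is not already in the table (the NOT comparison)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if row[key_field] not in existing_keys]


SAMPLE = "seq,item\n1,apple\n2,pear\n3,plum\n"

if __name__ == "__main__":
    print([r["item"] for r in new_by_sequence(SAMPLE, "seq", existing_max=2)])
    print([r["item"] for r in new_by_key(SAMPLE, "item", existing_keys={"apple", "pear"})])
```

The sequence filter is cheaper because it needs only one value from the table, while the key comparison needs the full set of existing keys.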
You state you seek something 'more efficient' - but in truth there is nothing more efficient than a wholesale delete of all records and write of all records. Most of the time one can't do that - but if you can I would just stick with it.
I want to store large files in SQL Server 2012, and it has been suggested that I use BLOBs. All I want to do is create a table that maps each employee ID to the path of their image in the database. Whenever a user wants to access the image, the application will first get the path from the database and then get the image from the referenced database as a BLOB.
Can you help me with how to access a different database from one database?
Generally speaking, for large files (over 1 MB, though this is not a hard rule) you should use FILESTREAM, which stores the files on the file system rather than in the database itself.
See this article for a guide to setting up FILESTREAM in your database.
As for your question about accessing a different database from within one database: objects in SQL are referenced with dot notation, like this:
databasename.schemaname.tablename
So you can use it to reference objects (tables) in different databases. For more information, see "Using Identifiers As Object Names" rather than me repeating what's already there.
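As a small illustration of the dot notation, the Python helper below builds a three-part name and bracket-quotes each part the way T-SQL's QUOTENAME does by default (wrapping it in [ ] and doubling any ]). The database and table names are hypothetical, and QUOTENAME's 128-character input limit is not enforced here.

```python
def quote_name(identifier):
    """Wrap an identifier in brackets, doubling ']' as QUOTENAME does."""
    return "[" + identifier.replace("]", "]]") + "]"


def three_part_name(database, schema, table):
    """Build databasename.schemaname.tablename with each part quoted."""
    return ".".join(quote_name(part) for part in (database, schema, table))


if __name__ == "__main__":
    # Hypothetical names: an image database separate from the main one.
    name = three_part_name("EmployeeImages", "dbo", "Photos")
    print(f"SELECT EmployeeId, Photo FROM {name};")
```

This prints `SELECT EmployeeId, Photo FROM [EmployeeImages].[dbo].[Photos];`, a query that can be run from another database on the same instance, given the right permissions. A database on a different server would additionally need a linked server and a four-part name.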
I am using SQL Server 2008 R2 on my local PC. My database has a table with approximately 98,000 rows, and I want to transfer that data directly to a database on an online server. I tried generating a script of that table, but when I run the script it fails with an insufficient memory error. Please help me: how can I do this? Thanks
There are a variety of strategies you can employ in this instance. Here are a few off the top of my head:
Got some .NET programming up your sleeve? Try the SqlBulkCopy class.
Export the data to a transferable format, e.g. a CSV file, and then use BULK INSERT to insert the data.
Try using OPENROWSET to copy from the local server to the remote one. Stack Overflow example
If you have SSIS at your disposal, there is an SSIS example just here.
A bit Heath Robinson, but why not export the data to CSV and, using some Excel skills, build the individual statements yourself? Example here using INSERT INTO and UNION.
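A sketch of the do-it-yourself option using Python instead of Excel: it turns a CSV export into INSERT statements split into batches, each ending with GO so that SSMS or sqlcmd sends them as separate batches. Instead of the INSERT INTO ... UNION approach linked above, it uses multi-row INSERT ... VALUES, which SQL Server 2008 supports for up to 1000 rows per statement. The table name, writing every non-empty field as an N'...' literal (leaving type conversion to SQL Server) and turning empty fields into NULL are assumptions for illustration.

```python
import csv
import io


def sql_literal(value):
    """Render one CSV field as a T-SQL literal; empty fields become NULL."""
    if value == "":
        return "NULL"
    return "N'" + value.replace("'", "''") + "'"  # double embedded quotes


def insert_batches(csv_text, table, batch_size=1000):
    """Yield INSERT ... VALUES statements of at most batch_size rows.

    SQL Server 2008 allows at most 1000 rows in one VALUES list; each
    statement ends with GO so it is sent as its own batch.
    """
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join("[" + c.replace("]", "]]") + "]" for c in header)
    prefix = f"INSERT INTO {table} ({columns}) VALUES\n"
    batch = []
    for row in reader:
        batch.append("(" + ", ".join(sql_literal(v) for v in row) + ")")
        if len(batch) == batch_size:
            yield prefix + ",\n".join(batch) + ";\nGO"
            batch = []
    if batch:  # remaining rows
        yield prefix + ",\n".join(batch) + ";\nGO"


if __name__ == "__main__":
    data = "Id,Name\n1,O'Brien\n2,\n3,Smith\n"
    for stmt in insert_batches(data, "dbo.People", batch_size=2):
        print(stmt)
```

The generated script can be saved to a file and run with sqlcmd rather than opened in the SSMS editor.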
HTH
I need to determine the file type (i.e., MIME type) of data stored in SQL Server 2008.
Is there any way, preferably with a SQL query, to identify the content type or MIME type of the binary data stored in an image column?
I think that, if you need that information, it would be better to store it in a separate column. Once the data is in the DB, your only real options are guessing the type from the file name (if you happen to store it) or detecting the signature from the first few bytes of data.
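A minimal sketch of signature detection in Python, assuming the application reads just the first few bytes of each value (for example with SUBSTRING(column, 1, 8), which also accepts image data). The signature table covers only a handful of common formats and is illustrative, not exhaustive; Office files such as .docx are ZIP containers and will be reported as application/zip.

```python
# Leading "magic number" bytes of a few common formats and their MIME types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also .docx/.xlsx containers
]


def sniff_mime(data, default="application/octet-stream"):
    """Guess a MIME type from the first bytes of a blob; fall back to default."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return default


if __name__ == "__main__":
    print(sniff_mime(b"%PDF-1.4 ..."))
    print(sniff_mime(b"plain text"))
```

Anything that matches no signature falls back to the generic application/octet-stream, which is why storing the type in its own column at upload time remains the more reliable option.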
There is no direct way in SQL Server to do that - there's no metadata on binary columns stored inside SQL Server, unless you've done it yourself.
For SQL Server, a blob is a blob is a blob - it's just a bunch of bytes, and SQL Server knows nothing about it, really. You need to have that information available from other sources, e.g. by storing a file name, file extension, mime type or something else in a separate column.
Marc