Hi, I have looked for ways to do this but can't seem to get a clear answer. We have a requirement to dump any changes made to a SQL table to a CSV file, i.e. whenever anyone changes a customer record on the frontend, those changes must be written to a CSV file that we use to update another system. Can anyone help me with an example using either SSIS or SQL triggers?
Related
I was asked to copy a CSV file (which resides on a server and is updated every day) into the database, but a group of 5-6 fields determines whether a row can be entered or not.
The conditions for insertion are: if the row is a completely new entry, insert it; if it is an exact copy of an existing entry, skip it; and if it matches an existing entry but has different values, update the entry in the database.
Can someone help me with how to do this? I could use an IF EXISTS query, but that would be costly since it has to match every record. Or would an SSIS activity help with this?
You can rewrite this query as a MERGE and, when a row is matched as an exact copy, do nothing.
https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/
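As a rough illustration of the insert / skip / update logic, here is a minimal Python sketch that uses SQLite as a stand-in for SQL Server. SQLite has no MERGE statement, so the sketch swaps in `INSERT ... ON CONFLICT DO UPDATE` to get the same three outcomes; it is not T-SQL MERGE syntax. The `customers` table, its columns, and the `revision` counter are hypothetical, and a single `code` column stands in for the 5-6 identifying fields.

```python
import sqlite3

# Hypothetical table; `code` stands in for the 5-6 fields that identify a row.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           code TEXT PRIMARY KEY,
           name TEXT,
           city TEXT,
           revision INTEGER NOT NULL DEFAULT 1  -- bumped only on a real update
       )"""
)
conn.execute(
    "INSERT INTO customers (code, name, city) "
    "VALUES ('C1', 'Ann', 'Oslo'), ('C2', 'Bob', 'Rome')"
)


def load_rows(conn, rows):
    """Insert new rows, skip exact copies, update rows whose values differ."""
    conn.executemany(
        """
        INSERT INTO customers (code, name, city)
        VALUES (?, ?, ?)
        ON CONFLICT(code) DO UPDATE
            SET name = excluded.name,
                city = excluded.city,
                revision = customers.revision + 1
            WHERE customers.name IS NOT excluded.name
               OR customers.city IS NOT excluded.city
        """,
        rows,
    )
    conn.commit()


# C1 is an exact copy (skipped), C2 has a new city (updated), C3 is new (inserted).
load_rows(conn, [("C1", "Ann", "Oslo"), ("C2", "Bob", "Paris"), ("C3", "Cy", "Lima")])
result = conn.execute(
    "SELECT code, name, city, revision FROM customers ORDER BY code"
).fetchall()
```

The `WHERE` clause on the update is what turns an exact copy into a no-op, so only `C2` ends up with a bumped `revision`.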
Problem:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible. The data sets update daily into the same CSV files on my local hard drive. Currently using MS Access 2010 (v14.0) as a middleman to aggregate the CSV files into linked tables.
Using the solutions below, the data transfers perfectly into SQL Server and does exactly what I want. But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
Solutions:
Upsizing Wizard in MS Access - This works best in transferring all the tables perfectly to SQL Server databases. I cannot figure out how to update the tables though without deleting and repeating the same steps each day. None of the solutions or links that I have tried have panned out.
SQL Server Import/Export Wizard - This works fine also in getting the data over to SSMS one time. But I also cannot figure out how to update/sync this data with the new tables. Another issue is that choosing Microsoft Access as the data source through this method requires a .mdb file. The latest MS Access file formats are .accdb files so I have to save the database in an older .mdb version in order to export it to SQL Server.
Constraints:
I have no loyalty towards MS Access. I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them. From all I have read, MS Access seems like the best way to do that.
I also have limited coding knowledge so more advanced VBA/C++ solutions will probably go over my head.
TLDR:
Trying to get several different daily updating local CSV files into a program where I can run SQL queries on them without having to do a full delete and re-import each day. Currently using MS Access 2010 to SQL Server Express (SSMS v17.6) which fulfills my needs, but does not update daily with the new data without re-importing everything.
Thank you!
You can use a staging table strategy to solve this problem.
When it's time to perform the daily update, import all of the data into one or more staging tables. Then execute SQL statements to insert rows that exist in the imported data but not in the base data, delete rows from the base data that don't exist in the imported data, and update base data rows whose values have changed in the imported data.
Use your data dependencies to determine in which order tables should be modified.
I would run all deletes first, then inserts, and finally all updates.
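The staging-table steps above can be sketched roughly as follows. This is a minimal Python example against SQLite rather than SQL Server, and the `base` and `staging` tables with their `id` and `value` columns are hypothetical; the same three statements would need adapting to your real keys and columns.

```python
import sqlite3


def sync_from_staging(conn):
    """Bring `base` into line with `staging`: deletes, then inserts, then updates."""
    cur = conn.cursor()
    # 1. Delete base rows that no longer exist in the imported data.
    cur.execute(
        "DELETE FROM base WHERE NOT EXISTS "
        "(SELECT 1 FROM staging s WHERE s.id = base.id)"
    )
    deleted = cur.rowcount
    # 2. Insert imported rows that are not yet in the base data.
    cur.execute(
        "INSERT INTO base (id, value) "
        "SELECT s.id, s.value FROM staging s "
        "WHERE NOT EXISTS (SELECT 1 FROM base b WHERE b.id = s.id)"
    )
    inserted = cur.rowcount
    # 3. Update base rows whose values differ from the imported data.
    cur.execute(
        "UPDATE base "
        "SET value = (SELECT s.value FROM staging s WHERE s.id = base.id) "
        "WHERE EXISTS (SELECT 1 FROM staging s "
        "              WHERE s.id = base.id AND s.value IS NOT base.value)"
    )
    updated = cur.rowcount
    # Purge the staging table once the comparison is done.
    cur.execute("DELETE FROM staging")
    conn.commit()
    return deleted, inserted, updated


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO base VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO staging VALUES (2, 'b'), (3, 'C'), (4, 'd');
    """
)
counts = sync_from_staging(conn)
base_rows = conn.execute("SELECT id, value FROM base ORDER BY id").fetchall()
```

In this toy data, row 1 is deleted, row 4 is inserted, row 3 is updated, and row 2 is left alone.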
This should be a fun challenge!
EDIT
You said:
I need to get data sets from CSV files into SQL Server Express (SSMS
v17.6) as efficiently as possible.
The most efficient way to put data into SQL Server tables is using SQL Bulk Copy. This can be implemented from the command line, an SSIS job, or through ADO.Net via any .Net language.
You state:
But I cannot figure out how to refresh/update/sync the data at the end
of each day with the newly added CSV data without having to re-import
the entire data set each time.
It seems you have two choices:
1. Toss the old data and replace it with the new data
2. Modify the old data so that it comes into alignment with the new data
In order to do number 1 above, you'd simply replace all the existing data with the new data, which you've already said you don't want to do, or at least you don't think you can do this efficiently. In order to do number 2 above, you have to compare the old data with the new data. In order to compare two sets of data, both sets of data have to be accessible wherever the comparison is to take place. So, you could perform the comparison in SQL Server, but the new data will need to be loaded into the database for comparison purposes. You can then purge the staging table after the process completes.
In thinking further about your issue, it seems the underlying issue is:
I really am just looking for the most efficient way to get these CSV
files consistently into a format where I can perform SQL queries on
them.
There exist applications built specifically to allow you to query this type of data.
You may want to have a look at Log Parser Lizard or Splunk. These are great tools for querying and digging into data hidden inside flat data files.
An Append Query is able to incrementally add additional new records to an existing table. However the question is whether your starting point data set (CSV) is just new records or whether that data set includes records already in the table.
This is a classic dilemma that needs to be managed in the Append Query set up.
If the CSV includes prior records, then you have to establish the 'new records' subset inside the CSV and append just those. For instance, if you have a sequencing field, you can select only the rows greater than the existing table's maximum. If there is no such field, you would need to do a NOT comparison of the table data against the CSV data to identify which CSV records are not already in the table.
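Both ways of finding the 'new records' subset can be sketched in a few lines of Python. This is only an illustration of the logic, not an Access Append Query; the function names, the `seq` field, and the sample data are all hypothetical.

```python
import csv
import io


def rows_after_max(csv_text, table_rows, seq_field="seq"):
    """Keep CSV rows whose sequence value exceeds the table's current maximum."""
    max_seq = max((int(r[seq_field]) for r in table_rows), default=0)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row[seq_field]) > max_seq]


def rows_not_in_table(csv_text, table_rows, key_fields):
    """Keep CSV rows whose key fields do not match any existing table row."""
    existing = {tuple(str(r[k]) for k in key_fields) for r in table_rows}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if tuple(row[k] for k in key_fields) not in existing]


# Hypothetical existing table contents and a CSV that repeats prior records.
table_rows = [{"seq": 1, "name": "a"}, {"seq": 2, "name": "b"}]
csv_text = "seq,name\n1,a\n2,b\n3,c\n"

by_sequence = rows_after_max(csv_text, table_rows)
by_comparison = rows_not_in_table(csv_text, table_rows, key_fields=("name",))
```

The sequence approach only needs the table's maximum, while the comparison approach needs the existing keys available for lookup, which is why it costs more on large tables.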
You state you seek something 'more efficient' - but in truth there is nothing more efficient than a wholesale delete of all records and write of all records. Most of the time one can't do that - but if you can I would just stick with it.
I have a very large data set in GPDB from which I need to extract close to 3.5 million records. I use this for a flat file which is then used to load different tables. I use Talend, and do a select * from table using the tGreenplumInput component and feed that to a tFileOutputDelimited. However, due to the very large volume of the file, I run out of memory while executing it on the Talend server.
I lack superuser permissions and am unable to do a \copy to output it to a CSV file. I think something like a do-while or a tLoop with a more limited number of rows might work for me. But my table doesn't have any row_id or uid to distinguish the rows.
Please help me with suggestions on how to solve this. I appreciate any ideas. Thanks!
If your requirement is to load data into different tables from one table, then you do not need to load it into a file and then from the file into the tables.
There is a component named tGreenplumRow which allows you to write direct SQL queries (DDL and DML) in it.
Below is a sample job,
Notice that there are three insert statements inside this component. They are executed one by one, separated by semicolons.
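The idea of loading table-to-table with several semicolon-separated statements can be sketched like this. It is a Python example against SQLite, not the Talend tGreenplumRow component or Greenplum itself, and the `source` and `target_*` tables are hypothetical; the point is only that `INSERT ... SELECT` keeps the rows inside the database, so nothing has to fit in the client's memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER, region TEXT, amount REAL);
    CREATE TABLE target_east (id INTEGER, amount REAL);
    CREATE TABLE target_west (id INTEGER, amount REAL);
    CREATE TABLE target_totals (region TEXT, total REAL);
    INSERT INTO source VALUES (1, 'east', 10.0), (2, 'west', 20.0), (3, 'east', 30.0);
    """
)

# Three insert statements, separated by semicolons and run one after another.
LOAD_SQL = """
INSERT INTO target_east (id, amount)
    SELECT id, amount FROM source WHERE region = 'east';
INSERT INTO target_west (id, amount)
    SELECT id, amount FROM source WHERE region = 'west';
INSERT INTO target_totals (region, total)
    SELECT region, SUM(amount) FROM source GROUP BY region;
"""
conn.executescript(LOAD_SQL)

east = conn.execute("SELECT id, amount FROM target_east ORDER BY id").fetchall()
west = conn.execute("SELECT id, amount FROM target_west ORDER BY id").fetchall()
totals = conn.execute("SELECT region, total FROM target_totals ORDER BY region").fetchall()
```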
I have a report which uses some tables with large amounts of data. We wrote a stored procedure to get the required data from the tables and populate the report output table. Since the data volume is large, we don't want the procedure to process the entire source tables' data whenever any updates happen in them. We need to apply only the updated/changed rows in the source table to the destination table.
What is the best way to do this in SQL Server?
Thanks for the help.
You can definitely use table triggers.
Please check SQL Server Trigger Example to Log Changes
There is also a solution called CDC (Change Data Capture) in SQL Server, which you can investigate as well.
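To make the trigger idea concrete, here is a minimal Python sketch using SQLite. SQLite trigger syntax differs from T-SQL (SQL Server triggers read the `inserted` and `deleted` pseudo-tables instead of `OLD` and `NEW`), so treat this as an illustration of the pattern only. The `customers` and `customer_changes` tables and the `export_changes` helper are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER,
        old_name TEXT,
        new_name TEXT
    );
    -- Log a row only when the name actually changes.
    CREATE TRIGGER customers_log_update
    AFTER UPDATE OF name ON customers
    WHEN OLD.name IS NOT NEW.name
    BEGIN
        INSERT INTO customer_changes (customer_id, old_name, new_name)
        VALUES (OLD.id, OLD.name, NEW.name);
    END;
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    """
)


def export_changes(conn, out):
    """Write logged changes to `out` as CSV, then clear the log table."""
    rows = conn.execute(
        "SELECT customer_id, old_name, new_name "
        "FROM customer_changes ORDER BY change_id"
    ).fetchall()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer_id", "old_name", "new_name"])
    writer.writerows(rows)
    conn.execute("DELETE FROM customer_changes")
    conn.commit()
    return len(rows)


conn.execute("UPDATE customers SET name = 'Anna' WHERE id = 1")
conn.execute("UPDATE customers SET name = 'Bob' WHERE id = 2")  # same value, not logged
buffer = io.StringIO()
exported = export_changes(conn, buffer)
```

Only the changed rows reach the log table, so the downstream step reads that small table instead of rescanning the full source tables.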
I am not sure how to ask this question so please direct me in the right direction if I am not using the appropriate terminology, etc. but I can explain what I am currently doing. I would like to know if there is an easier way to update content in the database than the method I'm currently using.
(I'm using SQL Server 2008 BTW.)
I have a bunch of CSV files that I use to give to my client as a means to update content which gets imported into the DB (because the content is LARGE). The import works by running a python script that I wrote that makes use of a Jinja2 template that generates the SQL file needed to insert the CSV content into the database (if it is a from-scratch scenario). This is working fine.
Now when it comes to data migration (I need to migrate the data that exists in the DB to a new version thereof), I have a lot of manual work to do: I hand-code it in the template, as there is no SQL command or auto-generated code that I can run to do this for me.
So let's say I have a list of Hospitals in a CSV file and I already have a set of hospitals in the database (which is imported from the previous version of the CSV file). I create a copy of the Hospitals table (without the data) and call it HospitalsTemp. The new CSV hospitals are inserted into the HospitalsTemp table (at least that part is generated via the template).
The Hospitals table now gets detached from all its foreign-keys and constraints. Now I go through all the tables surrounding the Hospitals (again manually!) and replace the hospitalId which pointed to the old hospitalId with the new hospitalId (as I can do a lookup from the Hospitals to the HospitalsTemp based on the hospital code to ensure that referential integrity is retained).
Then I delete the Hospitals table and rename the HospitalsTemp to Hospitals and put back the foreign-keys and constraints on the new Hospitals table.
I hope I explained it well enough for everyone to understand. I'm really hoping for a simpler way to do this.
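The remapping step described above (looking up the new hospitalId via the hospital code) can be sketched as a single UPDATE per child table. This is a Python example against SQLite rather than SQL Server 2008; the `Wards` child table, the `code` column name, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Hospitals (hospitalId INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE HospitalsTemp (hospitalId INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE Wards (wardId INTEGER PRIMARY KEY, hospitalId INTEGER);
    INSERT INTO Hospitals VALUES (1, 'H-A', 'Alpha'), (2, 'H-B', 'Beta');
    INSERT INTO HospitalsTemp VALUES (10, 'H-B', 'Beta General'), (11, 'H-A', 'Alpha');
    INSERT INTO Wards VALUES (100, 1), (101, 2), (102, 2);
    """
)


def remap_hospital_ids(conn, child_table):
    """Point child rows at the new hospitalId, matched by hospital code.

    `child_table` is interpolated into the SQL, so pass trusted names only.
    """
    conn.execute(
        f"""
        UPDATE {child_table}
        SET hospitalId = (
            SELECT t.hospitalId
            FROM Hospitals h
            JOIN HospitalsTemp t ON t.code = h.code
            WHERE h.hospitalId = {child_table}.hospitalId
        )
        WHERE hospitalId IN (
            SELECT h.hospitalId
            FROM Hospitals h
            JOIN HospitalsTemp t ON t.code = h.code
        )
        """
    )
    conn.commit()


remap_hospital_ids(conn, "Wards")
wards = conn.execute("SELECT wardId, hospitalId FROM Wards ORDER BY wardId").fetchall()
```

Looping this function over a list of child table names would replace the manual pass through each surrounding table, assuming every one of them uses a `hospitalId` column.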
How do you know which hospital becomes which? Do the names stay the same? Is there an Id that stays the same?
Have you looked at SSIS, and the Slowly Changing Dimension component? You can use it to update existing rows and add new rows: http://blogs.msdn.com/b/karang/archive/2010/09/29/slowly-changing-dimension-using-ssis.aspx
Also, SSIS would be a good tool for the import, as it handles reading CSV files well.
By the sounds of it, you could replace the current logic with a simple SSIS package that is just a flat-file data source feeding the output of the SCD wizard.