Checking if a group of data exists in the SQL Server database

I was asked to copy a CSV file (which resides on a server and is updated every day) into the database, but a group of 5-6 fields decides whether a row may be entered.
The condition for insertion is: if it is a completely new entry, insert it; if it is an exact copy of an existing entry, skip the row; and if it is a different version of an existing entry, update the entry in the database.
Can someone help me with how to do this? I could use an IF EXISTS query, but wouldn't that be a costly operation, since it has to match every record? Or would some SSIS activity help with this?

You can turn this query into a MERGE statement and, when a row is matched, do nothing (or update it only if its values differ):
https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/
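
A minimal sketch of the insert/skip/update logic that a MERGE expresses, written in Python against an in-memory SQLite database so that it can be run anywhere. The `customers` and `staging` tables, their columns, and the helper names are hypothetical; the CSV is assumed to have been bulk-loaded into `staging` first, and SQL Server's own MERGE syntax differs from the two statements used here.

```python
import sqlite3


def sync_from_staging(conn):
    """Update changed rows, insert new rows, and leave identical rows alone.

    Returns (inserted, updated). Table and column names are hypothetical.
    """
    cur = conn.cursor()
    # Rows whose key already exists but whose payload differs are updated.
    # Note: "<>" never reports NULLs as different; nullable columns need extra checks.
    cur.execute(
        """
        UPDATE customers
        SET name = (SELECT s.name FROM staging s WHERE s.id = customers.id)
        WHERE EXISTS (
            SELECT 1 FROM staging s
            WHERE s.id = customers.id AND s.name <> customers.name
        )
        """
    )
    updated = cur.rowcount
    # Rows whose key is not in the target yet are inserted.
    cur.execute(
        """
        INSERT INTO customers (id, name)
        SELECT s.id, s.name FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = s.id)
        """
    )
    inserted = cur.rowcount
    conn.commit()
    return inserted, updated


def demo():
    """Build a tiny example: one identical row, one changed row, one new row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE staging   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO staging   VALUES (1, 'Ann'), (2, 'Robert'), (3, 'Cy');
        """
    )
    counts = sync_from_staging(conn)
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return counts, rows
```

Comparing against a staging table in two set-based statements is one way to avoid a per-row IF EXISTS check, which is the cost the question worries about.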

Related

Find last updated date from a table in SQL database

Is there a way to find the date a table was last updated without using sys.dm_db_index_usage_stats? I have been searching for an hour now, but all the answers I found use this view, which seems to be reset when SQL Server restarts.
Thanks.

You can use this view (which is strongly advised).
Or you can write your own ON UPDATE trigger that records the update time in this table (or in a separate, homemade table).
Also, if you just wish to collect some data about current usage, you can set up a SQL Server Profiler trace that will do the job (then parse the results somehow, in Excel or whatever).
As a last option, successively restore the backups you have taken (onto a copy), hoping you have enough backup retention to find the data you are searching for.
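
The trigger option can be sketched as follows. This is only an illustration in Python with SQLite as a runnable stand-in: the `table_audit` table, the trigger names, and both helper functions are hypothetical, and T-SQL trigger syntax on SQL Server is different. It also covers INSERT and DELETE, on the assumption that those count as "updates" to the table too.

```python
import sqlite3


def install_last_modified_triggers(conn, table):
    """Create triggers that stamp a hypothetical table_audit row on every change.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_audit ("
        "table_name TEXT PRIMARY KEY, last_modified TEXT NOT NULL)"
    )
    for event in ("INSERT", "UPDATE", "DELETE"):
        conn.execute(
            f"""
            CREATE TRIGGER IF NOT EXISTS trg_{table}_{event.lower()}
            AFTER {event} ON {table}
            BEGIN
                INSERT OR REPLACE INTO table_audit (table_name, last_modified)
                VALUES ('{table}', strftime('%Y-%m-%d %H:%M:%f', 'now'));
            END
            """
        )


def last_modified(conn, table):
    """Return the recorded UTC timestamp for `table`, or None if never modified."""
    row = conn.execute(
        "SELECT last_modified FROM table_audit WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0] if row else None
```

Unlike the usage-stats view, a value stored in an ordinary table like this survives a server restart, at the price of a small write on every change.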

How to identify deleted records in sql server while importing to hadoop using Sqoop

When importing data from SQL Server or any other RDBMS into Hadoop using Sqoop, we can get newly appended or modified records using incremental append, last-modified, or free-form queries.
Is there any way to identify deleted records, considering that once a record is deleted it no longer exists in the SQL table?
One workaround is to load the full table using Sqoop and compare it with the previous table in Hive.
Is there a better way to do this?

No, you cannot get deleted records using Sqoop.
A better workaround could be:
Create a boolean field status (default true) in your SQL Server table.
Whenever you need to delete a record, don't delete it; just update it, setting status to false.
If you are using a last-modified incremental import, you will get this changed data in HDFS.
Later (after the Sqoop import) you can delete all the records whose status is false.

If you are syncing the entire partition or table, you can identify deleted records after the Sqoop import and before merging, by doing a full join against the existing target partition or table. Records that exist in the target table/partition but not in the imported data are those deleted from the source database since the last sync.
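
The comparison described above amounts to an anti-join on the key. A rough sketch in Python, assuming each row is a dict and that a single hypothetical `id` column identifies a record (in Hive the same idea would be written as a join in HiveQL):

```python
def find_deleted_keys(target_rows, imported_rows, key="id"):
    """Return keys present in the target but missing from the fresh full import.

    Those are the records deleted on the source since the last sync.
    The row shape and the `key` column name are assumptions for illustration.
    """
    imported_keys = {row[key] for row in imported_rows}
    return sorted(row[key] for row in target_rows if row[key] not in imported_keys)


# Example: record 2 was deleted on the source, record 4 is new.
existing_target = [{"id": 1}, {"id": 2}, {"id": 3}]
fresh_import = [{"id": 1}, {"id": 3}, {"id": 4}]
deleted = find_deleted_keys(existing_target, fresh_import)
```

This only works when the import is a full copy of the table or partition; with an incremental import, a missing key may simply be an unchanged row.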

Incremental Sqoop imports do not handle deleted records out of the box. There are two approaches you may want to consider.
Please look at this post.

Newly inserted or updated row count in pentaho data integration

I am new to Pentaho Data Integration. I need to integrate one database into another location as an ETL job. I want to count the number of inserts/updates during the ETL job and insert that count into another table. Can anyone help me with this?

I don't think that there's a built-in functionality for returning the number of affected rows of an Insert/Update step in PDI to date.
Nevertheless, most database vendors are able to provide you with the ability to get the number of affected rows from a given operation.
In PostgreSQL, for instance, it would look like this:
/* Count affected rows from INSERT */
WITH inserted_rows AS (
    INSERT INTO ...
    VALUES
    ...
    RETURNING 1
)
SELECT count(*) FROM inserted_rows;

/* Count affected rows from UPDATE */
WITH updated_rows AS (
    UPDATE ...
    SET ...
    WHERE ...
    RETURNING 1
)
SELECT count(*) FROM updated_rows;
However, you're aiming to do that from within a PDI job, so I suggest that you try to get to a point where you control the SQL script.
Suggestion: Save the source data in a file on the target DB server, then use it, perhaps with a bulk loading functionality, to insert/update, then save the number of affected rows into a PDI variable. Note that you may need to use the SQL script step in the Job's scope.
EDIT: the implementation is a matter of chosen design, so the suggested solution is one of many. On a very high level, you could do something like the following.
Transformation I - extract data from source
    Get the data from the source, be it a database or anything else
    Prepare it for output in a way that it fits the target DB's structure
    Save a CSV file using the text file output step on the file system
Parent Job
    If the PDI server is the same as the target DB server:
        Use the Execute SQL Script step to:
            Read data from the file and perform the INSERT/UPDATE
            Write the number of affected rows into a table (ideally, this table could also contain the time-stamp of the operation so you could keep track of things)
    If the PDI server is NOT the same as the target DB server:
        Upload the source data file to the server, e.g. with the FTP/SFTP file upload steps
        Use the Execute SQL Script step to:
            Read data from the file and perform the INSERT/UPDATE
            Write the number of affected rows into a table
EDIT 2: another suggested solution
As suggested by @user3123116, you can use the Compare Fields step (if it is not part of your environment, check the marketplace for it).
The only shortcoming I see is that you have to query the target database before inserting/updating, which is, of course, less performant.
Eventually it could look like so (note that this is just the comparison and counting part):
Also note that you can split the input of the source data stream (COPY, not DISTRIBUTE) and do your insert/update on the copy. That stream must, however, wait until the field-comparison stream has finished its query on the target database; otherwise you might end up with the wrong statistics.

The "Compare Fields" step takes 2 streams as input for comparison, and its output is 4 distinct streams for "Identical", "Changed", "Added", and "Removed" records. You can count those 4, and then process the "Changed", "Added", and "Removed" records with an Insert/Update.
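
To make the four-way split concrete, here is a small Python sketch that classifies two keyed row sets the same way and counts each bucket. It only mimics the behaviour described above and is not PDI's implementation; the function name, the dict row shape, and the `id` key column are assumptions.

```python
def compare_streams(reference, compare, key="id"):
    """Count identical, changed, added, and removed rows between two row sets.

    `reference` is the current target data, `compare` is the incoming data.
    """
    ref = {row[key]: row for row in reference}
    new = {row[key]: row for row in compare}
    counts = {"identical": 0, "changed": 0, "added": 0, "removed": 0}
    for k, row in new.items():
        if k not in ref:
            counts["added"] += 1       # key only in the incoming data
        elif row == ref[k]:
            counts["identical"] += 1   # same key, same values
        else:
            counts["changed"] += 1     # same key, different values
    # Keys that exist in the target but are absent from the incoming data.
    counts["removed"] = sum(1 for k in ref if k not in new)
    return counts


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
stats = compare_streams(target, source)
```

The "added" and "changed" counts correspond to the insert and update counts the question asks for.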

You can do it from the Logging option inside the Transformation settings. Please follow the steps below:
Click on the Edit menu --> Settings
Switch to the Logging tab
Select Step from the left menu
Provide the Log Connection and the Log table name (say, StepLog)
Select the required fields for logging (LINES_OUTPUT for the inserted count and LINES_UPDATED for the updated count)
Click on the SQL button and create the table by clicking on the Execute button
Now all the steps will be logged into the log table (StepLog), and you can use it for further actions.
Enjoy

SQL Trigger to dump Changes to Customer Table

Hi, I have looked for ways to do this but can't seem to get a clear answer. We have a requirement to dump any changes to a SQL table to a CSV file, i.e. whenever anyone changes a customer record on the front end, those changes must be dumped to a CSV file so that we can update another system. Can anyone help me with an example using either SSIS or SQL triggers?

SQL: Tracking changes to a table that gets truncated every day (and re-pulled from a different server)

I have a table that is a replica of a table from a different server.
Unfortunately I don't have access to the transaction information; all I have is the table that shows "as is" information, and an SSIS package that replicates the table on my server every day (the table gets truncated, and new information is pulled every night).
Everything has been fine and good, but I want to start tracking what has changed, i.e. I want to know if a new row has been inserted or the value of a column has changed.
Is this something that could be done easily?
I would appreciate any help.
The SQL version is SQL Server 2012 SP1 | Enterprise

If you want to do this for a particular table, you can use an SCD (Slowly Changing Dimension) transformation in an SSIS data flow, which will keep the history records in a different table,
or
you can enable CDC (Change Data Capture) on that table. CDC will help you monitor every DML operation on that table; it records the modified rows in system change tables.
