Validate Data in SQL Server Table - sql

I am trying to validate the data present in SQL Server table using a stored procedure.
In one of the validation rules, i have to check whether the value of a particular column is present in another table.
Suppose i have a staging table with following columns Cat_ID, Amount, SRC_CDE
I have a 'maintable' with following columns CatID , Cat_Name
I have to validate whether the Cat_ID present in staging table exists in the 'maintable' for each row
I am using the following statement to validate
if((Select count(*) from maintable where CatID= #Cat_id) >0 )
-- Do something if data present
I want to know if there is any better way of doing the above thing other than using a select query for every row.
Can i use some sort of an array where i can fetch all the CatID from maintable and the check instead of using a select query.
Thanks

Using a left join to list all the invalid rows.
select
staging.*
from
staging
left join maintable
on staging.catid=maintable.catid
where maintable.catid is null

Related

JOIN of DB table and internal table produces "is not defined in the ABAP Dictionary"

First: I'm working on a existing code and want to add some new stuff to it and I'm really new to ABAP
My goal: I want to duplicate an existing table and remove all values that occur multiple times. That - at least I think - works. Afterwards I want to INNER JOIN this new created table with another already existing table, but unfortunately I always get the following error:
Method MethodName "newCreatedTable" is not defined in the ABAP Dictionary as a table, projection view, or database view
Additional Info: As you can see I'm working inside of a method!
Here is my code what I've done so far:
creating new table and delete all duplicates
DATA newCreatedTable TYPE STANDARD TABLE OF existingTable.
SELECT columnName
FROM existingTable INTO TABLE newCreatedTable.
DELETE ADJACENT DUPLICATES
FROM newCreatedTable COMPARING columnName.
here is where the error happens
SELECT *
FROM anotherExistingTable as p
INNER JOIN newCreatedTable as t on t~columnName = p~columnName
...
I hope someone can help me out in this case! Thank You in advance!
If I'm not wrong your code looks like this:
DATA newCreatedTable TYPE STANDARD TABLE OF existingTable.
SELECT columnName FROM existingTable INTO TABLE newCreatedTable.
DELETE ADJACENT DUPLICATES
FROM newCreatedTable COMPARING columnName.
SELECT *
FROM anotherExistingTable
INNER JOIN newCreatedTable as t on t~columnName = p~columnName
If this is the case then you cannot make a SELECT on an internal table. You can only make this operation on a table that exists in the ABAP dictionary.
Your code should look like this:
DATA newCreatedTable TYPE STANDARD TABLE OF existingTable.
SELECT columnName
FROM existingTable INTO TABLE #newCreatedTable.
DELETE ADJACENT DUPLICATES FROM newCreatedTable COMPARING columnName.
SELECT *
FROM anotherExistingTable
FOR ALL ENTRIES IN #newCreatedTable " <-------------- Use FOR ALL ENTRIES
WHERE columnName = #newCreatedTable-columnName. " <-- in your code

Merge update records in a final table

I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns which are not updated are returned as null.
Now simple insertion is easy in hive using load in path in a staging table and then ignoring the first two fields from the stage table.
However, how would I go about the update statements? So that my final row in hive looks like below:
1,Bob,updatedstuff,123457
I was thinking to insert all rows in a staging table and then perform some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
After your insert statements, your Bob record would say "stuff" in col1:
As far as the updates - you could tackle these with an update or merge statement. I think the key here is the null values. It's important to keep the original name, or col1, or whatever, if the staging table from the file has a null value. Here's a merge example which coalesces the staging tables fields. Basically, if there is a value in the staging table, take that, or else fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name,t.name), col1 = coalesce(s.col1, t.col1)
Now Bob will show "updatedstuff"
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need to have a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge. Hive isn't really a complete transactional DB - it would be preferred for the source to send full user records any time there's an update, instead of just the changed fields only.
You can reconstruct each record in the table using you can use last_value() with the null option:
select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp) as col1,
update_timestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.

How to insert generated id into a results table

I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns. id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alteratively, you can do it more manually by using the the ROW_NUMBER() OLAP function, which I often found convenient for creating IDs. For this it is convenient to use a stored procedure which does the following:
get the maximum old id from Table1 and write it into a variable old_max_id.
after generating the result set, write the row-numbers into the table1, maybe by something like
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (PARTITION BY <primary-key> ORDER BY <whatever-you-want>)
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER statement as above or directly use the ID from Table1.

How to get list of deleted records in SQL

My goal is to separate two types of data in table that is being sent to a stored procedure. In this table, I will have two kinds of records of type 1 and type 2, let's say.
I want to delete all data of type 2 from the inputted table but still have it stored in a separate temp table.
I know how to delete data with the following statement:
DELETE t
FROM #tags t
WHERE t.Type = 2
Is there a way to retrieve the deleted records so I can insert them into a separate temp table?
Otherwise I will have to have a separate code block before that looks like the following:
INSERT #dynamicTags(String)
SELECT String
FROM #tags t
WHERE t.Type = 2
Any ideas to combine the two above statements into one?
If using SQL Server you can do this with the OUTPUT clause:
DELETE t
FROM #tags t
OUTPUT DELETED.* INTO #MyTableVar
WHERE t.Type = 2
If you are using postgres you can use the returning clause:
http://www.postgresql.org/docs/9.3/static/sql-delete.html

SQL: I need to take two fields I get as a result of a SELECT COUNT statement and populate a temp table with them

So I have a table which has a bunch of information and a bunch of records. But there will be one field in particular I care about, in this case #BegAttField# where only a subset of records have it populated. Many of them have the same value as one another as well.
What I need to do is get a count (minus 1) of all duplicates, then populate the first record in the bunch with that count value in a new field. I have another field I call BegProd that will match #BegAttField# for each "first" record.
I'm just stuck as to how to make this happen. I may have been on the right path, but who knows. The SELECT statement gets me two fields and as many records as their are unique #BegAttField#'s. But once I have them, I haven't been able to work with them.
Here's my whole set of code, trying to use a temporary table and SELECT INTO to try and populate it. (Note: the fields with # around the names are variables for this 3rd party app)
CREATE TABLE #temp (AttCount int, BegProd varchar(255))
SELECT COUNT(d.[#BegAttField#])-1 AS AttCount, d.[#BegAttField#] AS BegProd
INTO [#temp] FROM [Document] d
WHERE d.[#BegAttField#] IS NOT NULL GROUP BY [#BegAttField#]
UPDATE [Document] d SET d.[#NumAttach#] =
SELECT t.[AttCount] FROM [#temp] t INNER JOIN [Document] d1
WHERE t.[BegProd] = d1.[#BegAttField#]
DROP TABLE #temp
Unfortunately I'm running this script through a 3rd party database application that uses SQL as its back-end. So the errors I get are simply: "There is already an object named '#temp' in the database. Incorrect syntax near the keyword 'WHERE'. "
Comment out the CREATE TABLE statement. The SELECT INTO creates that #temp table.