Alternative to relational DB for transactional duplicate validation processing? - sql

I'm currently using SSMS, working with two tables, both of which have millions of records; let's call them the latest_status table and the dupe_hash table.
latest_status:
latest_status_PK - bigint - identity(1,1)
status - bigint
50 => stage-zero
100 => stage-one
200 => stage-two
999 => duplicate-failed
dupe_hash:
dupe_hash_PK - bigint - identity(1,1)
latest_status_FK (relationship with latest_status) - bigint
md5 - varchar(256)
batch_id - uniqueidentifier
My objective is to look through dupe_hash records (where latest_status.status = 100) and see whether each record's md5 is the same as another record's md5; I then update the latest_status.status of these records depending on their statuses. Something along the lines of:
*record_1's status = 100 & record_2's status = 200 => update record_1's status to 999
*record_1's status = 100 & record_2's status = 50 => update record_1's status to 200 and record_2's status to 999
I'm currently using set-based operations, saving this query into a table variable and updating the latest_status table with a case statement. In my partition by md5, I am ordering the subsets based on their status because in the end, I only want to keep one record (rowid = 1) and the others would be set to 999:
SELECT ROW_NUMBER() OVER(PARTITION BY DUPHASH.MD5 ORDER BY CASE WHEN LATEST_STAT.STATUS = 200 THEN 0 WHEN LATEST_STAT.STATUS = 50 THEN 1 ELSE 1 END, LATEST_STAT.LATEST_STATUS_PK DESC) ROWID,
    LATEST_STAT.LATEST_STATUS_PK,
    LATEST_STAT.STATUS,
    DUPHASH.DUPE_HASH_PK,
    DUPHASH.BATCH_ID,
    DUPHASH.MD5
FROM DBO.DUPE_HASH DUPHASH
INNER JOIN DBO.LATEST_STATUS LATEST_STAT
    ON DUPHASH.LATEST_STATUS_FK = LATEST_STAT.LATEST_STATUS_PK
WHERE DUPHASH.MD5 IN (SELECT DISTINCT DH.MD5
    FROM DBO.DUPE_HASH DH
    INNER JOIN DBO.LATEST_STATUS LS
        ON DH.LATEST_STATUS_FK = LS.LATEST_STATUS_PK
    WHERE LS.STATUS = 100)
This brings me to my actual question: Are there any alternatives/improvements to this approach that could support realtime/transactional processing?
table partitioning?
non-relational DBs?
I'm really all ears. Thank you!
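For reference, the ranking rule described in the question can be sketched end to end on toy data. This is only an illustration of the rule, not a fix for the realtime concern: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, the sample rows are invented, and the ordering (200 first, then 100, then 50) is an assumption read off the two example rules. A stage-one record with no duplicate is left untouched, because the question does not say what should happen to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE latest_status (latest_status_PK INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE dupe_hash (
    dupe_hash_PK INTEGER PRIMARY KEY,
    latest_status_FK INTEGER REFERENCES latest_status(latest_status_PK),
    md5 TEXT
);
-- Invented sample data: 'aaa' pairs 100 with 200, 'bbb' pairs 100 with 50,
-- 'ccc' is a stage-one record without any duplicate.
INSERT INTO latest_status VALUES (1, 100), (2, 200), (3, 100), (4, 50), (5, 100);
INSERT INTO dupe_hash VALUES
    (1, 1, 'aaa'), (2, 2, 'aaa'), (3, 3, 'bbb'), (4, 4, 'bbb'), (5, 5, 'ccc');
""")

# Rank every md5 group; row 1 survives as 200, the rest become 999.
# Only groups with more than one row and at least one status-100 row are touched.
RANK_SQL = """
WITH ranked AS (
    SELECT s.latest_status_PK AS pk,
           ROW_NUMBER() OVER (
               PARTITION BY d.md5
               ORDER BY CASE s.status WHEN 200 THEN 0 WHEN 100 THEN 1 ELSE 2 END,
                        s.latest_status_PK DESC
           ) AS rn,
           COUNT(*) OVER (PARTITION BY d.md5) AS group_size,
           SUM(s.status = 100) OVER (PARTITION BY d.md5) AS stage_one_rows
    FROM dupe_hash AS d
    JOIN latest_status AS s ON s.latest_status_PK = d.latest_status_FK
)
SELECT pk, CASE WHEN rn = 1 THEN 200 ELSE 999 END AS new_status
FROM ranked
WHERE group_size > 1 AND stage_one_rows > 0
"""

changes = conn.execute(RANK_SQL).fetchall()

# Apply all status changes in one transaction.
with conn:
    conn.executemany(
        "UPDATE latest_status SET status = ? WHERE latest_status_PK = ?",
        [(new_status, pk) for pk, new_status in changes],
    )

final = dict(conn.execute("SELECT latest_status_PK, status FROM latest_status"))
print(final)
```

Window functions need SQLite 3.25 or newer. In SQL Server the same CTE could feed an UPDATE directly instead of a round trip through the client.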

Related

How to retrieve data if not exist with certain condition in SQL Server?

I have 3 values
SecurityLevel - ex:1
ReportName - ex:'TotalSales'
UserID - ex:'faisal.3012'
I have 2 tables:
SecurityLevelDetails
SecurityUserDetails
I want to check whether the data already exists in SecurityUserDetails. If it exists, I want to retrieve that record; if not, I want to retrieve the record from SecurityLevelDetails.
I'm trying to do this in a single query. I can do it with an IF condition, but I don't want to.
I tried this, and I know it's wrong:
Select
ReportHide, RColumnName, RFilterName
From
HQWebMatajer.dbo.SecurityLevelDetails sld
Where
SecurityLevel = 1
and not exists(select top 1
UserID, ReportHide, RColumnName, RFilterName
from
[HQWebMatajer].[dbo].[SecurityUserDetails]
where
[UserID] = 'faisal.3012'
and [ReportName] = 'TotalSales')
It's retrieving a record if it does not exist in SecurityUserDetails. But I want to retrieve the record from SecurityUserDetails if it exists
UPDATED
I got the result with the code below, but I am trying to do it in a single query:
declare @flags int = 0;
select top 1 @flags=count(*)
from [HQWebMatajer].[dbo].[SecurityUserDetails]
where [UserID]='faisal.3012' and [ReportName]='TotalSales';
if(@flags>0)
BEGIN
select top 1 UserID,ReportHide,RColumnName,RFilterName
from [HQWebMatajer].[dbo].[SecurityUserDetails]
where [UserID]='faisal.3012' and [ReportName]='TotalSales'
END
ELSE
BEGIN
select SecurityLevel,ReportHide,RColumnName,RFilterName
From HQWebMatajer.dbo.SecurityLevelDetails sld
where SecurityLevel=1 and ReportName='TotalSales'
END
One way to approach this is with UNION ALL.
You already have both sides defined, and a UNION ALL joins them up.
I'll stick together all of the code you have posted so far:
select top 1 UserID,ReportHide,RColumnName,RFilterName
from [HQWebMatajer].[dbo].[SecurityUserDetails]
where [UserID]='faisal.3012' and [ReportName]='TotalSales'
UNION ALL
select SecurityLevel,ReportHide,RColumnName,RFilterName
From HQWebMatajer.dbo.SecurityLevelDetails sld
where SecurityLevel=1
and ReportName='TotalSales'
and not exists(select *
from
[HQWebMatajer].[dbo].[SecurityUserDetails]
where
[UserID] = 'faisal.3012'
and [ReportName] = 'TotalSales')
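The fallback pattern above can be tried out on toy data. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for SQL Server, so LIMIT 1 inside a subquery replaces TOP 1. The sample values, the extra user 'someone.else' and the helper effective_security are made up for illustration. The first column is replaced by a 'user'/'level' label here, because putting a string UserID and a numeric SecurityLevel into the same UNION column may fail with a conversion error in SQL Server unless one side is cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SecurityLevelDetails (
    SecurityLevel INTEGER, ReportName TEXT,
    ReportHide INTEGER, RColumnName TEXT, RFilterName TEXT
);
CREATE TABLE SecurityUserDetails (
    UserID TEXT, ReportName TEXT,
    ReportHide INTEGER, RColumnName TEXT, RFilterName TEXT
);
-- Invented sample rows.
INSERT INTO SecurityLevelDetails VALUES (1, 'TotalSales', 0, 'level_cols', 'level_filter');
INSERT INTO SecurityUserDetails VALUES ('faisal.3012', 'TotalSales', 1, 'user_cols', 'user_filter');
""")

# First branch: the user-specific row, if any.
# Second branch: the level default, only when no user-specific row exists.
QUERY = """
SELECT Source, ReportHide, RColumnName, RFilterName
FROM (
    SELECT 'user' AS Source, ReportHide, RColumnName, RFilterName
    FROM SecurityUserDetails
    WHERE UserID = :user AND ReportName = :report
    LIMIT 1
)
UNION ALL
SELECT 'level', ReportHide, RColumnName, RFilterName
FROM SecurityLevelDetails
WHERE SecurityLevel = :level
  AND ReportName = :report
  AND NOT EXISTS (
      SELECT 1 FROM SecurityUserDetails
      WHERE UserID = :user AND ReportName = :report
  )
"""


def effective_security(conn, user, report, level):
    """Return the user override if present, otherwise the level default."""
    params = {"user": user, "report": report, "level": level}
    return conn.execute(QUERY, params).fetchall()


with_override = effective_security(conn, "faisal.3012", "TotalSales", 1)
without_override = effective_security(conn, "someone.else", "TotalSales", 1)
print(with_override)
print(without_override)
```

Because the two branches are mutually exclusive, the query returns rows from exactly one of the tables for a given user and report.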

SQL Combining results from multiple tables, and rows, in to one row in one table

so here's my situation.
I have two tables: keysetdata115, containing vendor information, and keysetdata117, which contains either a Remit or a Payment address.
Here are the structures with one sample entry:
keysetdata115:
keysetnum  ks183  ks178  ks184                    ks185       ks187  usagecount
2160826    1      6934   AUDIO DIGEST FOUNDATION  26-1180877  A      0
keysetdata117 (I truncated values for ks192 and ks191 to fit formatting):
keysetnum  ks183  ks178  ks188  ks189  ks190     ks192      ks191  usagecount
2160827    1      6934   P001   P      EBSCO...  TOP OF...  A      0
2160828    1      6934   R002   R      EBSCO...  123 SE...  A      0
There is no 1:1 relationship, and the only thing that makes a record unique is the combination of Remit Code, Payment Code, vendor number and vendor group. The codes can only be obtained by referencing the address and/or name.
Ideally what I'd like to do is set this up so that I can pass in the addresses and return all the related values.
I'm dumping this into a table called 'dbo.test' right now (for testing, obviously), which has the following columns, shown with what they correspond to in the above tables: vengroup (ks183), vendnum (ks178), remit (ks188), payment (ks188). ks188 will be a remit or a payment code depending on the value in ks189.
This is what I'm doing so far, using 3 select queries and it works, but there's a lot of redundancy and it's very inefficient.
Any suggestions on how I can streamline it would be MUCH appreciated.
insert into dbo.test (vengroup,vendnum)
select ks183, ks178
from hsi.keysetdata115
where ks184 like 'AUDIO DIGEST%'
update dbo.test
set dbo.test.remit = y.remit
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as remit, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'R') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
update dbo.test
set dbo.test.payment = y.payment
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as payment, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'P') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
Thanks so much for any suggestions!
You can do what you want in one statement. You just have to do the selection on the fly. The way the statement below is written, if Remit gets the value, Payment gets a null, and vice versa. If you want the other value to be non-null, just add an ELSE clause to each CASE, like THEN b.ks188 ELSE 0 END.
INSERT INTO dbo.TEST( vengroup, vendnum, remit, payment )
SELECT a.ks183, a.ks178,
CASE b.ks189 WHEN 'R' THEN b.ks188 END,
CASE b.ks189 WHEN 'P' THEN b.ks188 END
FROM keysetdata115 a
JOIN keysetdata117 b
ON b.ks183 = a.ks183
AND b.ks178 = a.ks178
AND b.ks190 LIKE 'EBSCO%'
WHERE a.ks184 LIKE 'AUDIO DIGEST%';
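Note that the join above produces one row per matching address, so a vendor with both a Remit and a Payment address ends up as two rows in dbo.test. If a single row per vendor is wanted, one option is conditional aggregation: wrap each CASE in MAX and group by the vendor key. The sketch below is a variation on the statement above, not the original answer's method. It runs on SQLite through Python's sqlite3 as a stand-in, only the columns used by the query are created, and the truncated 'EBSCO...' value is kept exactly as it appears in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keysetdata115 (ks183 INTEGER, ks178 INTEGER, ks184 TEXT);
CREATE TABLE keysetdata117 (ks183 INTEGER, ks178 INTEGER, ks188 TEXT, ks189 TEXT, ks190 TEXT);
CREATE TABLE test (vengroup INTEGER, vendnum INTEGER, remit TEXT, payment TEXT);
INSERT INTO keysetdata115 VALUES (1, 6934, 'AUDIO DIGEST FOUNDATION');
INSERT INTO keysetdata117 VALUES
    (1, 6934, 'P001', 'P', 'EBSCO...'),
    (1, 6934, 'R002', 'R', 'EBSCO...');
""")

JOIN_SQL = """
FROM keysetdata115 a
JOIN keysetdata117 b
  ON b.ks183 = a.ks183
 AND b.ks178 = a.ks178
 AND b.ks190 LIKE 'EBSCO%'
WHERE a.ks184 LIKE 'AUDIO DIGEST%'
"""

# Without grouping: one result row per address row.
per_address_rows = conn.execute(
    "SELECT a.ks183, a.ks178, b.ks189, b.ks188 " + JOIN_SQL
).fetchall()

# With MAX(CASE ...) and GROUP BY: remit and payment folded into one row.
# MAX ignores NULLs, so each column picks up the code from its own address row.
with conn:
    conn.execute(
        """
        INSERT INTO test (vengroup, vendnum, remit, payment)
        SELECT a.ks183, a.ks178,
               MAX(CASE b.ks189 WHEN 'R' THEN b.ks188 END),
               MAX(CASE b.ks189 WHEN 'P' THEN b.ks188 END)
        """
        + JOIN_SQL
        + " GROUP BY a.ks183, a.ks178"
    )

rows = conn.execute("SELECT vengroup, vendnum, remit, payment FROM test").fetchall()
print(per_address_rows)
print(rows)
```

If a vendor can have several matching remit or payment addresses, MAX silently keeps only one code per column, so this only fits when there is at most one of each.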

Update 1 field in a table from another field in a different table (OS400, not a 1 to 1 relationship)

I'm trying to update a field in a table from another field in a different table.
The table being updated will have multiple records that need updating from one match in the other table.
For example, I have a 1 million row sales history file. Those million records have approximately 40,000 different SKU codes, and each row has a date and time stamp. Each SKU will have multiple records in there.
I added a new field called MATCOST (material cost).
I have a second table containing SKU and the MATCOST.
So I want to stamp every line in table 1 with the corresponding SKU's MATCOST from table 2. I cannot seem to achieve this when it's not a 1 to 1 relationship.
This is what i have tried:
update
aulsprx3/cogtest2
set
matcost = (select Matcost from queryfiles/coskitscog where
aulsprx3/cogtest2.item99 = queryfiles/coskitscog.ITEM )
where
aulsprx3/cogtest2.item99=queryfiles/coskitscog.ITEM
But that results in the SQL error "Column qualifier or table COSKITSCOG undefined", highlighting the q in the last reference to queryfiles/coskitscog.ITEM.
Any ideas?
Kindest Regards
Adam
Update: This is what my tables look like in principle. One table contains the sales data, the other contains the MATCOSTs for the items that were sold. I need to update the sales data table (COGTEST2) with the data from the COSKITSCOG table. I cannot use a COALESCE statement because it's not a 1 to 1 relationship, and most SELECT variants I try result in an error about multiple selects. The only matching field is Item = Item99.
I can't find a way of matching multiples. In the example we would have to use 3 SQL statements and just specify the item code. But in live I have about 40,000 item codes and over a million sales data records to update. If SQL won't do it, I suppose I'd have to try to write it in an RPG program, but that's way beyond me for the moment.
Thanks for any help you can provide.
Ok this is the final SQL statement that worked. (there were actually 3 values to update)
UPDATE atst2f2/SAP20 ct
SET VAL520 = (SELECT cs.MATCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20),
VAL620 = (SELECT cs.LABCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20),
VAL720 = (SELECT cs.OVRCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20)
WHERE ct.pnum20 IN (SELECT cs.ITEM
FROM queryfiles/coskitscog cs)
This more compact way to do the same thing should be more efficient, eh?
UPDATE atst2f2/SAP20 ct
SET (VAL520, VAL620, VAL720) =
(SELECT cs.MATCOST, cs.LABCOST, cs.OVRCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20)
WHERE ct.pnum20 IN (SELECT cs.ITEM
FROM queryfiles/coskitscog cs)
Qualify the columns with correlation names.
UPDATE AULSPRX3/COGTEST2 A
SET A.matcost = (SELECT matcost
FROM QUERYFILES/COSKITSCOG B
WHERE A.item99 = B.item)
WHERE EXISTS(SELECT *
FROM QUERYFILES/COSKITSCOG C
WHERE A.item99 = C.item)
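To see how the correlated UPDATE behaves when many sales rows share one item, here is a small sketch on invented data. It assumes SQLite through Python's sqlite3 rather than DB2 for i, so the library/file qualifiers are dropped and the target table is referenced by name inside the subqueries instead of through a correlation name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cogtest2 (item99 TEXT, matcost REAL);
CREATE TABLE coskitscog (item TEXT PRIMARY KEY, matcost REAL);
-- Invented data: item A appears twice, item Z has no cost record.
INSERT INTO cogtest2 (item99, matcost) VALUES
    ('A', NULL), ('A', NULL), ('B', NULL), ('Z', 9.99);
INSERT INTO coskitscog VALUES ('A', 1.5), ('B', 2.25);
""")

with conn:
    conn.execute("""
        UPDATE cogtest2
        SET matcost = (SELECT b.matcost
                       FROM coskitscog b
                       WHERE b.item = cogtest2.item99)
        -- Without this EXISTS filter, rows with no match (item Z)
        -- would have matcost overwritten with NULL.
        WHERE EXISTS (SELECT 1
                      FROM coskitscog c
                      WHERE c.item = cogtest2.item99)
    """)

rows = conn.execute(
    "SELECT item99, matcost FROM cogtest2 ORDER BY rowid"
).fetchall()
print(rows)
```

The scalar subquery is evaluated once per sales row, so the many-to-one relationship is not a problem as long as each item appears only once in the cost table; the primary key on item in this sketch stands in for that assumption.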
Going by the UPDATE documentation, I'd suggest:
update
aulsprx3/cogtest2
set
(matcost) = (select Matcost from queryfiles/coskitscog where
aulsprx3/cogtest2.item99 = queryfiles/coskitscog.ITEM)
where
aulsprx3/cogtest2.item99=queryfiles/coskitscog.ITEM
Note the parentheses around matcost.

SQL, Search by Date and not exists

I have two tables and need to search for all entries in one table that exist in another table by idProduct, only if the date (dateStamp) is older than 7 days.
Because the API I'm using can only process 3000 results at a time, the application will close, and the next time I run it I only want the idProducts from, say, 3000 onwards. This will be run numerous times, so products for the suppliercode will most likely already exist in the table.
So I've been looking at the NOT EXISTS and GETDATE functions in SQL but have not been able to get the desired results.
SELECT
*
FROM
products
WHERE
(active = - 1)
AND suppliercode = 'TIT'
and (NOT EXISTS
(SELECT
idProduct
FROM compare
WHERE
(products.idProduct = idProduct)
OR (compare.dateStamp < DATEADD(DAY,-7,GETDATE()))))
Any pointers would be great, I've changed the OR to AND but it doesn't seem to bring back the correct results.
I am guessing you want to match the rows in the two tables by idProduct. Right now your inner query (NOT EXISTS (SELECT idProduct FROM compare WHERE (products.idProduct = idProduct) OR (compare.dateStamp < DATEADD(DAY,-7,GETDATE())))) matches every compare row that either has the same idProduct or is older than 7 days, and then requires that no such row exists. Because the date condition is not tied to the product, a single compare row older than 7 days is enough to make the NOT EXISTS false for every product.
Is this what you want?
SELECT *
FROM products as p
LEFT JOIN compare as c
ON p.idProduct = c.idProduct
WHERE p.active = -1 and p.suppliercode = 'TIT' and c.dateStamp < DATEADD(DAY,-7,GETDATE())
Have you tried this one yet?
SELECT * FROM products
WHERE (active = - 1) AND
suppliercode = 'TIT'
and idProduct NOT IN
(
SELECT idProduct FROM compare
WHERE
(products.idProduct = idProduct) OR
(compare.dateStamp < DATEADD(DAY,-7,GETDATE()))
)
Try NOT IN instead:
...
and idProduct NOT IN
(SELECT
idProduct
FROM compare
WHERE
(products.idProduct = idProduct)
OR (compare.dateStamp < DATEADD(DAY,-7,GETDATE())))
....
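One possible reading of the requirement is "return the active products for the supplier that have no compare entry from the last 7 days", which covers both products never processed and products whose entry has gone stale. That reading is an assumption, not something the question states outright. Under it, the date test belongs inside a correlated NOT EXISTS, combined with AND and with the comparison flipped to "recent". The sketch uses SQLite through Python's sqlite3 as a stand-in for SQL Server, with a fixed "now" passed in as a parameter so the result is repeatable; date(:now, '-7 days') plays the role of DATEADD(DAY,-7,GETDATE()), and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (idProduct INTEGER PRIMARY KEY, active INTEGER, suppliercode TEXT);
CREATE TABLE compare (idProduct INTEGER, dateStamp TEXT);
-- Invented data, with "now" taken to be 2024-01-15:
--   product 1 was compared yesterday      -> skip
--   product 2 was compared two weeks ago  -> due
--   product 3 was never compared          -> due
--   product 4 is inactive                 -> skip
INSERT INTO products VALUES (1, -1, 'TIT'), (2, -1, 'TIT'), (3, -1, 'TIT'), (4, 0, 'TIT');
INSERT INTO compare VALUES (1, '2024-01-14'), (2, '2024-01-01');
""")

QUERY = """
SELECT p.idProduct
FROM products p
WHERE p.active = -1
  AND p.suppliercode = 'TIT'
  AND NOT EXISTS (
      SELECT 1
      FROM compare c
      WHERE c.idProduct = p.idProduct               -- tie the row to this product
        AND c.dateStamp >= date(:now, '-7 days')    -- and it must be recent
  )
ORDER BY p.idProduct
"""

due = [row[0] for row in conn.execute(QUERY, {"now": "2024-01-15"})]
print(due)
```

Both conditions inside the subquery are joined with AND, so a compare row only blocks the product it belongs to.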

Mysql many to many query

Having a mental block with going around this query.
I have the following tables:
review_list: has most of the data, but in this case the only important thing is review_id, the id of the record that I am currently interested in (int)
variant_list: model (varchar), enabled (bool)
variant_review: model (varchar), id (int)
variant_review is a many-to-many table linking the review_id in review_list to the model(s) in variant_list, and contains (e.g.):
..
test1,22
test2,22
test4,22
test1,23
test2,23... etc
variant_list is a list of all possible models and whether they are enabled and contains (eg):
test1,TRUE
test2,TRUE
test3,TRUE
test4,TRUE
What I am after in MySQL is a query that, when given a review_id (e.g. 22), returns a result set listing each value in variant_list.model and whether it is present for the given review_id, such as:
test1,1
test2,1
test3,0
test4,1
or similar, which I can farm off to some webpage with a list of checkboxes for the types. This would show all the models available and whether each one was present in the table
Given a bit more information about the column names:
Select variant_list.model
, Case When variant_review.model Is Not Null Then 1 Else 0 End As HasReview
From variant_list
Left join variant_review
On variant_review.model = variant_list.model
And variant_review.review_id = 22
Just for completeness, if it is the case that you can have multiple rows in the variant_review table with the same model and review_id, then you need to do it differently:
Select variant_list.model
, Case
When Exists (
Select 1
From variant_review As VR
Where VR.model = variant_list.model
And VR.review_id = 22
) Then 1
Else 0
End
From variant_list
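The LEFT JOIN version above can be checked against the sample data from the question. This is a minimal sketch that runs on SQLite through Python's sqlite3 instead of MySQL; the linking column is called review_id as in the answer, and model_flags is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variant_list (model TEXT PRIMARY KEY, enabled INTEGER);
CREATE TABLE variant_review (model TEXT, review_id INTEGER);
INSERT INTO variant_list VALUES ('test1', 1), ('test2', 1), ('test3', 1), ('test4', 1);
INSERT INTO variant_review VALUES
    ('test1', 22), ('test2', 22), ('test4', 22), ('test1', 23), ('test2', 23);
""")

# The review_id filter sits in the ON clause, not in WHERE, so models
# without a matching link row are kept and get a 0 flag.
QUERY = """
SELECT vl.model,
       CASE WHEN vr.model IS NOT NULL THEN 1 ELSE 0 END AS HasReview
FROM variant_list vl
LEFT JOIN variant_review vr
       ON vr.model = vl.model
      AND vr.review_id = ?
ORDER BY vl.model
"""


def model_flags(conn, review_id):
    """Return (model, 0/1) for every model, flagged for the given review."""
    return conn.execute(QUERY, (review_id,)).fetchall()


flags_22 = model_flags(conn, 22)
flags_23 = model_flags(conn, 23)
print(flags_22)
print(flags_23)
```

Moving the review_id condition into a WHERE clause would turn the LEFT JOIN into an inner join in effect and drop test3 from the result for review 22.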