I have two tables that I merge. Something is wrong with my query and I cannot find any information as to why. Here is what happens: I run the query and it works fine, as my target table populates with the information I want. I then run the query again (immediately after) and it changes (38 rows affected). I run it again and it adds the rows back in, and again: the rows are deleted. There are over 100 rows, but only the same 38 seem to be affected. Nothing changes in the source table.
I suspect that I am performing a DELETE on WHEN MATCHED. I can only find information for DELETE on WHEN NOT MATCHED, but I don't know, as this only seems to affect the same 38 rows every time.
The tables and (hopefully) information you need are here in my code:
MERGE QA.dbo.RMA AS target
USING Touchstn02.dbo.RMA_Detail AS source
    ON (target.RMANUM_52 = source.RMANUM_52)
WHEN MATCHED AND (source.STATUS_52 > 3) THEN
    DELETE
WHEN NOT MATCHED AND (source.STATUS_52 < 4) AND (source.RECQTY_52 > 0) THEN
    INSERT (RMANUM_52, RMADATE_52, CUSTID_52, RETNUM_52, RETQTY_52,
            SERIAL_52, REPAIR_52, RPLACE_52, CREDIT_52, WRKNUM_52,
            KEYWRD_52, RECQTY_52, RECDTE_52, STATUS_52, REM1_52,
            REM2_52, REM3_52, Comment, CMPDTE)
    VALUES (source.RMANUM_52, source.RMADTE_52, source.CUSTID_52,
            source.RETNUM_52, source.RETQTY_52, source.SERIAL_52,
            source.REPAIR_52, source.RPLACE_52, source.CREDIT_52,
            source.WRKNUM_52, source.KEYWRD_52, source.RECQTY_52,
            source.RECDTE_52, source.STATUS_52, source.REM1_52,
            source.REM2_52, source.REM3_52, source.REM4_52,
            source.CMPDTE_52);
As always, I appreciate any help/input
I think I have got it!
The 38 rows being deleted have at least two STATUS_52 values in the source table for these RMANUM_52 key values.
For a given RMANUM_52 key value:
One of the STATUS_52 values will be < 4 and have a Qty > 0.
The other row's STATUS_52 value will be > 3.
So, on the first run through, a row is inserted with a STATUS_52 value < 4.
Then on the next run the DELETE is triggered, because matching is by RMANUM_52 and the STATUS_52 comes from the SOURCE table. The tricky thing here is that the match looks back at all rows in the source table, including rows that were not part of the first insert. There is a different row in the source table that matches on RMANUM_52 with a STATUS_52 > 3, so the DELETE logic matches and the delete happens.
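Here is a minimal repro of that flip-flop, using throwaway temp tables and a hypothetical key value (1001); run the MERGE twice to see it:
CREATE TABLE #target (RMANUM_52 int, STATUS_52 int);
CREATE TABLE #source (RMANUM_52 int, STATUS_52 int, RECQTY_52 int);

INSERT INTO #source VALUES (1001, 2, 5);  -- satisfies the INSERT branch
INSERT INTO #source VALUES (1001, 4, 0);  -- satisfies the DELETE branch

-- First run: no target row exists, so WHEN NOT MATCHED fires and inserts 1001.
-- Second run: the target row now matches the STATUS_52 = 4 source row, so DELETE fires.
MERGE #target AS t
USING #source AS s
    ON t.RMANUM_52 = s.RMANUM_52
WHEN MATCHED AND s.STATUS_52 > 3 THEN
    DELETE
WHEN NOT MATCHED AND s.STATUS_52 < 4 AND s.RECQTY_52 > 0 THEN
    INSERT (RMANUM_52, STATUS_52) VALUES (s.RMANUM_52, s.STATUS_52);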
I can't tell from your example exactly what your requirements are, so I'm not going to hazard a guess at a fix.
I am using BigQuery and I don't know how to loop over a table that is in a database here. For example, let's suppose we have schema_A.tableA with the following information:
Table A
Originally TableA.columnA holds the information for the first row. columnE is calculated from the other three columns. What I am looking for is to feed the result of columnE back into the next row (LAG(columnE)) and generate the calculation for the second row; the third row would then take the result of columnE from row 2, and so on.
The desired output is like this :
For example, in row 2 columnA uses 500 because the result of the previous row is 500. In the third row it is 300, because that was the result of columnE in row 2, and so on. I don't know how looping works in BigQuery; I would really appreciate your knowledge.
Please help!!!
So far I have read some threads, but none of them shows how to set a variable from a query; they all run loops from 0. https://towardsdatascience.com/loops-in-bigquery-db137e128d2d
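For what it's worth, a BigQuery scripting variable can be set from a query with SET var = (SELECT ...). A minimal sketch of the loop, assuming schema_A.tableA from above plus a hypothetical row_num column that defines row order:
DECLARE total_rows INT64;
DECLARE prev_e FLOAT64 DEFAULT 0;
DECLARE i INT64 DEFAULT 1;

-- Set a variable from a query (the part the linked threads don't show):
SET total_rows = (SELECT COUNT(*) FROM schema_A.tableA);

WHILE i <= total_rows DO
  -- Read the previous iteration's columnE; in the real calculation this
  -- would feed the next row's columnA.
  SET prev_e = (SELECT columnE FROM schema_A.tableA WHERE row_num = i);
  SET i = i + 1;
END WHILE;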
My data is Microsoft Office 365 Mailbox audit logs.
I am working with 14 columns, incorporating names, timestamps, IP addresses, etc.
I have two tables, let's call them Existing and New. The column definition, order and count are identical in the two tables.
The data in Existing is (very close to!) Distinct.
The data in New is drawn from multiple overlapping searches and is not Distinct.
There are millions of rows in Existing and hundreds of thousands in New.
Data is being written to New all the time, 24x7, with about 1 million rows a day being added.
~95% of the rows in New are already present in Existing and are therefore unwanted duplicates. However, the data in New has many gaps; there are many recent rows in Existing that are NOT present in New.
I want to select all rows from New that are not present in Existing, using Invoke-SqlCmd in PowerShell.
Then I want to delete all the processed rows from New so it doesn't grow uncontrollably.
My approach so far has been:
Add a [Processed] column to New.
Set [Processed] to 0 for all existing data for selection purposes. New rows that continue to be added will have [Processed] = NULL, and will be left alone.
SELECT DISTINCT all data with [Processed] = 0 from New and copy it to a temporary table called Staging. Find the oldest timestamp ([LastAccessed]) in this data. Then delete all rows from New with [Processed] = 0 (the copy and delete could also be combined; see the sketch after these steps).
Copy all data from Existing with [LastAccessed] equal to or later than the above timestamp across to Staging, setting the column [Processed] = 1.
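A sketch of the copy-then-delete against New as one atomic statement, using DELETE ... OUTPUT (column list shortened for brevity; note OUTPUT can't deduplicate, so the DISTINCT would still have to be applied in Staging):
DELETE FROM dbo.New
OUTPUT deleted.MailboxOwnerUPN, deleted.LastAccessed, deleted.ClientIPAddress,
       deleted.ClientInfoString, deleted.MailboxGuid
    INTO dbo.Office365Staging (MailboxOwnerUPN, LastAccessed, ClientIPAddress,
                               ClientInfoString, MailboxGuid)
WHERE [Processed] = 0;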
Now I want all data in Staging where [Processed] = 0 and there is no duplicate.
The nearest concept I can come up with is:
SELECT MailboxOwnerUPN
,MailboxResolvedOwnerName
,LastAccessed
,ClientIPAddress
,ClientInfoString
,MailboxGuid
,Operation
,OperationResult
,LogonType
,ExternalAccess
,InternalLogonType
,LogonUserDisplayName
,OriginatingServer
FROM dbo.Office365Staging
GROUP BY MailboxOwnerUPN
,MailboxResolvedOwnerName
,LastAccessed
,ClientIPAddress
,ClientInfoString
,MailboxGuid
,Operation
,OperationResult
,LogonType
,ExternalAccess
,InternalLogonType
,LogonUserDisplayName
,OriginatingServer
HAVING Count(1) = 1 and Processed = 0;
Which of course I can't do, because [Processed] isn't part of the SELECT or GROUP BY. If I add the [Processed] column, then all lines are unique and there are no duplicates. I have tried a variety of joins and other techniques, without success thus far.
Initially, without [Processed] = 0, the query worked but returned unwanted unique lines from Existing. I only want unique lines from New.
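One way around the GROUP BY restriction is a window count, which lets [Processed] stay in the WHERE clause without being grouped. A sketch, partitioning by four of the columns for brevity (the full column list from the GROUP BY above would go in the PARTITION BY):
WITH Counted AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY LastAccessed, ClientIPAddress,
                                       ClientInfoString, MailboxGuid) AS DupeCount
    FROM dbo.Office365Staging
)
SELECT *
FROM Counted
WHERE Processed = 0   -- only candidate rows that came from New
  AND DupeCount = 1;  -- rows with no duplicate anywhere in Staging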
Clearly, given the size of these structures, efficiency is a consideration. This process will run regularly, ideally every 15 minutes.
Identifying these new lines then starts another process of Geo-IP, reputation, alerting, etc. in PowerShell....
I thought the performance of the following would be horrid, but it is OK at ~27 seconds....
SELECT [MailboxOwnerUPN]
,[MailboxResolvedOwnerName]
,[LastAccessed]
,[ClientIPAddress]
,[ClientInfoString]
,[MailboxGuid]
,[Operation]
,[OperationResult]
,[LogonType]
,[ExternalAccess]
,[InternalLogonType]
,[LogonUserDisplayName]
,[OriginatingServer]
FROM dbo.New
WHERE [Processed] = 1 and
NOT EXISTS (Select * From dbo.Existing
Where New.LastAccessed = Existing.LastAccessed and
New.ClientIPAddress = Existing.ClientIPAddress and
New.ClientInfoString = Existing.ClientInfoString and
New.MailboxGuid = Existing.MailboxGuid)
GO
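If it needs to get faster, the NOT EXISTS probe is the obvious target; a covering index on the four compared columns in Existing should help (index name is illustrative):
CREATE NONCLUSTERED INDEX IX_Existing_DedupProbe
    ON dbo.Existing (LastAccessed, ClientIPAddress, ClientInfoString, MailboxGuid);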
Basically I am using MS Access 2013 to import all active work items that are assigned to a specific group from an API and select the data into 2 new tables (Requests & Request_Tasks).
I then have a form sourced from a query to select specific fields from the 2 tables.
Until yesterday it was working with no problems and nothing has changed.
All of the data appears in the 2 tables so the import from the API works fine.
When it comes to the query selecting the data from the 2 tables (which are already populated with the correct data), the query returns only data from the Requests table, with blank fields instead of data from Request_Tasks.
The strange part is that out of 28 active work items it returns 24 correctly, and the last 4 have the problem.
Every new task added to the group has the problem as well.
Query is below.
SELECT
Request_Tasks.RQTASK_Number,
Request_Tasks.Request_Number,
Requests.Task, Requests.Entity,
Request_Tasks.Description,
Request_Tasks.Request_Status,
Requests.Requested_for_date,
Request_Tasks.Work_On_Date,
Request_Tasks.Estimated_Time,
Request_Tasks.Actual_Time_Analysis,
Request_Tasks.Offers_Built,
Request_Tasks.Number_of_links_Opened,
Request_Tasks.Number_of_Links_Extended,
Request_Tasks.Number_Of_links_closed,
Request_Tasks.Build_Allocated_to,
Request_Tasks.Buld_Review_Allocated_to,
Request_Tasks.Keying_Allocated_to,
Request_Tasks.Keying_Approval_allocated_to,
Request_Tasks.Actual_Build_Time,
Request_Tasks.Actual_Stakeholder_Support,
Request_Tasks.Task_Completed_Date
FROM Request_Tasks
RIGHT JOIN Requests
ON Request_Tasks.Request_Number = Requests.Request_Number
WHERE (((Request_Tasks.Task_Completed_Date)>=Date()
Or (Request_Tasks.Task_Completed_Date) Is Null)
AND ((Requests.Task)<>"7"
And (Requests.Task)<>"8"
And (Requests.Task)<>"9"))
ORDER BY Request_Tasks.Work_On_Date Is Null DESC , Request_Tasks.Work_On_Date, Requests.Entity Is Null DESC , Requests.Task;
Any help would be great.
Thanks.
The query is using a RIGHT JOIN, which means rows from the Requests table are always returned even if there is no corresponding entry in the Request_Tasks table.
A full example is here http://www.w3schools.com/Sql/sql_join_right.asp
In your case, most likely some change happened during the data load/API call and the Request_Tasks table is not being populated for those items. That is the reason you see blank data for fields from that table.
Solution
Manually check the data for the 4 faulty records in the Request_Tasks table.
Ensure the Request_Number keys in both tables match for the faulty records, including data type and any leading spaces/non-printable characters (if they are string data).
The query seems fine; it's more an issue with the data, based on the problem statement.
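For the manual check, a find-unmatched query will surface the Requests rows that have no Request_Tasks counterpart (Access SQL, table and column names from the question):
SELECT Requests.Request_Number
FROM Requests LEFT JOIN Request_Tasks
    ON Requests.Request_Number = Request_Tasks.Request_Number
WHERE Request_Tasks.Request_Number Is Null;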
I have 2 tables in SQL: Event and Swimstyle.
The Event table has a value SwimstyleId, which refers to Swimstyle.id.
The Swimstyle table has 3 values: distance, relaycount and strokeid.
Normally there would be somewhere between 30 and 50 rows in the table Swimstyle, which would hold all possible values (these are swimming distances like 50 (distance), 1 (relaycount), FREE (strokeid)).
However, due to a programming mistake the lookup for existing values didn't work and the importer of new results created a new swimstyle entry for each event added...
My Swimstyle table now consists of almost 200k rows, which of course is not the best idea performance-wise ;)
To fix this I want to go through all Events, get the attached swimstyle values, look up the first existing row in Swimstyle that has the same distance, relaycount and strokeid values, and update Event.SwimstyleId with that value.
When this is all done I can delete all orphaned Swimstyle rows, leaving a table with only 30-50 rows.
I have been trying to make a query that does this, but I'm not getting anywhere. Can anyone point me in the right direction?
These 2 statements should fix the problem, if I've read it right. N.B. I haven't been able to try this out anywhere, and I've made a few assumptions about the table structure.
-- Point each event at the lowest-id swimstyle row that has the same
-- distance/relaycount/strokeid as the row it currently references.
UPDATE event e
SET swimstyle_id = (SELECT MIN(s_min.id)
                    FROM swimstyle s_min, swimstyle s_cur
                    WHERE s_min.distance = s_cur.distance
                      AND s_min.relaycount = s_cur.relaycount
                      AND s_min.strokeid = s_cur.strokeid
                      AND s_cur.id = e.swimstyle_id);
-- Remove the swimstyle rows that no event references any more.
DELETE FROM swimstyle s
WHERE NOT EXISTS (SELECT 1
                  FROM event e
                  WHERE e.swimstyle_id = s.id);
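Note that the order matters: run the UPDATE first. The DELETE keeps only swimstyle rows that some event still references, so if you deleted first, every row would still be referenced and nothing would be removed.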
I'd like to ask about one thing. I have a table in my DB. It has 2 columns and looks like this:
Name   bilance
Jane    +3
Jane    -5
Jane     0
Jane    -8
Jane    -2
Paul    -1
Paul     2
Paul     9
Paul     1
...
I have to walk through this table, and when I find a record with a different "name" than the previous row, I process all rows with the previous "name". (If I step onto the first Paul row, I process all the Jane rows.)
The processing goes like this:
Now I work only with the Jane records and walk through them one by one. On each record I stop and compare it with all previous Jane rows, one by one.
The task is to summarize the "bilance" column (within the scope of the current person) where the values have different signs.
Summary:
I loop through this table on 3 levels in parallel (nested loops):
1st level = search for changes of the "name" column
2nd level = if a change was found, get all rows with the previous "name" and walk through them
3rd level = on each row, stop and walk through all previous rows with the current "name"
Can this be solved using only a CURSOR and FETCH, or is there some smoother solution?
My real table has 30,000 rows and 1,500 people, and if I do the logic in PHP it takes many minutes and then times out. So I would like to rewrite it in MS SQL 2000 (no other DB is allowed). Are cursors a fast solution, or is it better to use something else?
Thank you for your opinions.
UPDATE:
There are lots of questions about my "summarization". The problem is a little more difficult than I explained; I simplified it just to describe my algorithm.
Each row of my table contains many more columns. The most important is the month; that's why there are multiple rows per person, one for each month.
The "bilances" are workers' overtime and arrear hours. I need to summarize the + and - bilances to neutralize them against values from previous months; I want to end up with as many zeroes as possible. The whole table must stay as it is, only the bilances change.
Example:
The row (Jane -5) will be netted against the row (Jane +3). Instead of +3 I will get 0, and instead of -5 I will get -2, because the -5 was used to reduce the +3.
The next row (Jane 0) won't be affected.
The next row (Jane -8) cannot be used, because all the previous bilances are negative.
etc.
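For reference, the change-of-name detection from the question can be done as a single forward pass with one cursor instead of nested loops. A minimal skeleton (SQL Server 2000 syntax; my_table and the [month] ordering column are assumptions from this thread, and the netting logic would replace the PRINT):
DECLARE @name varchar(50), @bilance int, @prev_name varchar(50)

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT name, bilance
    FROM my_table
    ORDER BY name, [month]  -- each person's rows arrive together, in month order

OPEN c
FETCH NEXT FROM c INTO @name, @bilance
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @prev_name IS NOT NULL AND @name <> @prev_name
        PRINT 'process rows for ' + @prev_name  -- name changed: net the previous person's bilances here

    SET @prev_name = @name
    FETCH NEXT FROM c INTO @name, @bilance
END
-- The last person still needs processing after the loop.
CLOSE c
DEALLOCATE c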
You can sum all the values per name using a single SQL statement:
select name,
       sum(bilance) as bilance_sum
from my_table
group by name
order by name
On the face of it, it sounds like this should do what you want:
select Name, sum(bilance)
from my_table
group by Name
order by Name
If not, you might need to elaborate on how the Names are sorted and what you mean by "summarize".
I'm not sure what you mean by this line... "The task is to sumarize "bilance" column (in the scope of actual person) if they have different signs".
But, it may be possible to use a group by query to get a lot of what you need.
select name,
       case when bilance < 0 then 'negative' else 'positive' end as sign,
       count(*)
from my_table
group by name,
         case when bilance < 0 then 'negative' else 'positive' end
That might not be perfect syntax for the case statement, but it should get you really close.