Although it's much more complex than I'm about to explain, I'll try to only stick to the relevant bits of what I want to accomplish. Our data model is quite complex, and the terms are also a bit confusing. We basically have a Request, and this request can have an active Request_Status (which has an Enum_Value to indicate it's current status), as well as previous Request_Statuses that aren't relevant anymore (to preserve history). A Person is linked to this Request, but the Values that are entered are linked to the current Request_Status.
So here are those tables, and the relevant columns:
Persons:
person_id
unique_code
Some other values
Requests:
request_id
fk_person_id
year
Some other values
Enum_Values:
enum_value_id
value
Some other values
Request_Statuses:
request_status_id
fk_request_id
fk_enum_value_id
created_date
Some other values
Values:
value_id
fk_request_status_id
Some other values
I have: A list of Person.unique_codes.
I want to achieve two things:
For each Person.unique_code I want to get the Request of the year 2017, and then create a new Request_Status with fk_enum_value_id set to 4, linked to this existing Request.
Create copies of the Values that were linked to the previously active Request_Status, and set their fk_request_status_id to the currently active Request_Status (the records I've created in step 1).
I've been able to do step 1 myself with a monstrous query (but it works..)
Here is the monstrous query for step 1:
Some things to note:
- There will only be a single Request of a given year.
- There can be more than one Request_Statuses for a given Request, so finding the active is the one with the highest created_date.
- p.unique_code IN ('12345','67890') is privatized and reduced code. In reality I have about 500 person.unique_codes.
- SELECT rs1.fk_request_id, 4 /*, some other irrelevant values */ FROM Request_Statuses rs1 LEFT JOIN Request_Statuses rs2 ON (rs1.fk_request_id = rs2.fk_request_id AND rs1.created_date < rs2.created_date) WHERE rs2.created_date IS NULL is copied from this SO answer for the question "Retrieving the last record in each group". I've used the windowing function at the top before, but it wasn't really suitable for sub-queries in combination with Oracle SQL, so I've used the (probably slightly slower) original method that was posted in 2009, which does work as intended.
INSERT_INTO Request_Statuses (fk_request_id, fk_enum_value_id /*, some other irrelevant values */)
(SELECT rs1.fk_request_id, 4 /*, some other irrelevant values */ FROM Request_Statuses rs1
LEFT JOIN Request_Statuses rs2 ON (rs1.fk_request_id = rs2.fk_request_id AND rs1.created_date < rs2.created_date)
WHERE rs2.created_date IS NULL AND rs1.fk_request_id IN (SELECT r.request_id FROM Requests r
WHERE r.fk_person_id IN (SELECT p.person_id FROM Persons p
WHERE p.unique_code IN ('12345','67890')) AND r.year = 2017));
And I'm currently working on step 2.
I currently have this:
INSERT INTO Values (fk_request_status_id /* some other irrelevant values */)
(SELECT /*TODO: Get request_status_id created in step 1*/, /* some other irrelevant values */
FROM Values v1 WHERE v1.fk_request_status_id IN (SELECT rs.status_id FROM Request_Statuses rs
WHERE rs.fk_request_id IN (SELECT r.request_id FROM Requests r
WHERE r.fk_person_id IN (SELECT p.person_id FROM Persons p
WHERE p.bsn IN ('12345','67890')) AND r.year = 2017) AND (SELECT COUNT(*) FROM Values v2
WHERE v2.fk_request_status_id = rs.status_id) > 0));
All I need is to get the request_status_id of the Request_Statuses I've created in step 1, based on the same person.unique_code, and insert it at the TODO..
I've also been thinking about using a default value for now, and then update just the fk_request_status_id with a third (monstrous) query. Unfortunately, the fk_request_status_id in combination with a second column in the Values table form an unique constraint, and fk_request_status_id cannot be empty, so I can't just insert any value here to update later.. Maybe I should remove the constraints temporarily, and add them later again after the query..
PS: Performance isn't that important. I've only got around 500-750 person.unique_codes for which I have to create one new Request_Status each (and zero to about 50 Values that are potentially linked to the previous active Request_Status). It should work in under 4 hours, though. ;)
Related
I have one table named: ORDERS
this table contains OrderNumber's which belong to the same person and same address lines for that person.
However sometimes the data is inconsistent;
as example looking at the table screenshot: Orders table with bad data to fix -
you all can noticed that orderNumber 1 has a name associated to and addresses line1-2-3-4. sometimes those are all different by some character or even null.
my goal is to update all those 3 lines with one set of data that is already there and set equally all the 3 rows.
to make more clear the result expected should be like this:
enter image description here
i am currently using a MERGE statement to avoid a CURSOR (for loop )
but i am having problems to make it work
here the SQL
MERGE INTO ORDERS O USING
(SELECT
INNER.ORDERNUMBER,
INNER.NAME,
INNER.LINE1,
INNER.LINE2,
INNER.LINE3,
INNER.LINE4
FROM ORDERS INNER
) TEMP
ON( O.ORDERNUMBER = TEMP.ORDERNUMBER )
WHEN MATCHED THEN
UPDATE
SET
O.NAME = TEMP.NAME,
O.LINE1 = TEMP.LINE1,
O.LINE2 = TEMP.LINE2,
O.LINE3 = TEMP.LINE3,
O.LINE4 = TEMP.LINE4;
the biggest issues i am facing is to pick a single row out of the 3 randomly - it does not matter whihc of the data - row i pick to update the line/s
as long i make the records exaclty the same for an order number.
i also used ROWNUM =1 but it in multip[le updates will only output one row and update maybe thousand of lines with the same address and name whihch belong to an order number.
order number is the join column to use ...
kind regards
A simple correlated subquery in an update statement should work:
update orders t1
set (t1.name, t1.line1, t1.line2, t1.line3, t1.line4) =
(select t2.name, t2.line1, t2.line2, t2.line3, t2.line4
from orders t2
where t2.OrderNumber = t1.OrderNumber
and rownum < 2)
i have enabled CDC feature on one of my database. now i have below table data in cdc tables
MemberID LastName __$operation
1 David 4
1 Dave 4
2 Jimmy 4
2 Test 4
Now my problem is that i have to query the cdc table and get all the rows that are the latest one for all the members (most recent updated value). for example the query would return
MemberID LastName __$operation
1 Dave 4
2 Test 4
In addition to the _$operation column, there are also the _$start_lsn and __$seq_val columns. Ordering by those two should get you there.
You can not only determine by _$operations for CDC. If you want to do it correct use other column fields that are:
__$start_lsn
__$end_lsn
__$seqval
__$update_mask
So I'm not 100% sure I understand what you are asking for, but if you need the latest values for all the members in the table then ignore the CDC table and just query the table itself as this is where all the latest values are afterall.
If you need to see the latest values for all the members that have been changed within a certain time period, then you should use the cdc.fn_cdc_get_net_changes_(capture_instance) function, detailed here:
cdc.fn_cdc_get_net_changes
This allows you to specify a start and end date for the capture period (via the sys.fn_cdc_map_time_to_lsn function which allows you to map the LSNs to actual times) and it will then output the net changes to the table within this period.
The cdc.fn_cdc_get_net_changes_(capture_instance) changes is generated depending on your table name, so as you have not specified what this is, I have called it dbo_members, please change as required, here is an example of how you can get a list of the latest values for all changed members within the last day using the functions detailed above:
DECLARE #begin_time DATETIME ,
#end_time DATETIME ,
#begin_lsn BINARY(10) ,
#end_lsn BINARY(10);
SELECT #begin_time = GETDATE() - 1 ,
#end_time = GETDATE();
SELECT #begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than',
#begin_time);
SELECT #end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal',
#end_time);
SELECT [MemberID] ,
[LastName]
FROM cdc.fn_cdc_get_net_changes_dbo_members(#begin_lsn, #end_lsn, 'all')
GO
As per steoleary you can simply check the data table for the latest values and ignore CDC altogether, but if you are looking to what changed with values from and to, then you will need to refer to the _$operation values 3 (deleted) and 4 (inserted) values in conjunction with the __$start_lsn. The inserted and deleted values correspond to those tables you would use when writing triggers btw.
To just see what column values changes as a precursor to actually evaluating those values, then you can use the __$update_mask column, tied into the cdc.captured_columns table which will provide you the actual column names, by implementing the sys.fn_cdc_is_bit_set(captured_columns.column_ordinal, __$update_mask) function where the result = 1.
Welcome to the wacky world of CDC and the copious amounts of late nights and caffeine hits required to even attempt to master it!
If your cdc system table name is cdc.dbo_demo_ct then with following query you will get desired result:
SELECT *
FROM (SELECT Row_number() OVER (partition BY a.MemberID ORDER BY b.tran_end_time DESC) t,
*
FROM cdc.dbo_demo_ct a
INNER JOIN cdc.lsn_time_mapping b
ON a.__$start_lsn = b.start_lsn) T
WHERE T.t = 1
I have 2 tables that I need to join to get the last/latest update in the 2nd table based on valid rows in the 1st table.
Code below is en example.
Table 1: Registered users
This table contains a list of users registered in the system.
When a user gets registered it gets added into this table. A user is registered with a name, and a registration time.
A user can get de-registered from the system. When this is done, the de-registration column gets updated to the time that the user was removed. If this value is NULL, it means that the user is still registered.
CREATE TABLE users (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
reg_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
dereg_time TIMESTAMP WITH TIME ZONE DEFAULT NULL
);
Table 2: User updates
This table contains updates on the users. Each time a user changes a property (example position) the change gets stored in this table. No updates must be removed since there is a requirement to keep history in the table.
CREATE TABLE user_updates (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
position INTEGER NOT NULL,
time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Required output
So given the above information, I need to get a new table that contains only the last update for the current registered users.
Test Data
The following data can be used as test data for the above tables:
-- Register 3 users
INSERT INTO users(name) VALUES ('Person1');
INSERT INTO users(name) VALUES ('Person2');
INSERT INTO users(name) VALUES ('Person3');
-- Add some updates for all users
INSERT INTO user_updates(name, position) VALUES ('Person1', 0);
INSERT INTO user_updates(name, position) VALUES ('Person1', 1);
INSERT INTO user_updates(name, position) VALUES ('Person1', 2);
INSERT INTO user_updates(name, position) VALUES ('Person2', 1);
INSERT INTO user_updates(name, position) VALUES ('Person3', 1);
-- Unregister the 2nd user
UPDATE users SET dereg_time = NOW() WHERE name = 'Person2';
From the above, I want the last updates for Person 1 and Person 3.
Failed attempt
I have tried using joins and other methods but the results are not what I am looking for. The question is almost the same as one asked here. I have used the solution in answer 1 and it does give the correct answer, but it takes too long to get too the answer in my system.
Based on the above link I have created the following query that 'works':
SELECT
t1.*
, t2.*
FROM
users t1
JOIN (
SELECT
t.*,
row_number()
OVER (
PARTITION BY
t.name
ORDER BY t.entry_idx DESC
) rn
FROM user_updates t
) t2
ON
t1.name = t2.name
AND
t2.rn = 1
WHERE
t1.dereg_time IS NULL;
Problem
The problem with the above query is that it takes very long to complete. Table 1 contains a small list of users, while table 2 contains a huge amount of updates. I think that the query might be inefficient in the way that it handles the 2 tables (based on my limited understanding of the query). From pgAdmin's explain it does a lot of sorting and aggregation on the updates 1st before joining with the registered table.
Question
How can I formulate a query to efficiently and fast get the latest updates for registered users?
PostgreSQL have a special distinct on syntax for such type of queries:
select distinct on(t1.name)
--it's better to specify columns explicitly, * just for example
t1.*, t2.*
from users as t1
left outer join user_updates as t2 on t2.name = t1.name
where t1.dereg_time is null
order by t1.name, t2.entry_idx desc
sql fiddle demo
you can try it, but for me your query should work fine too.
I am using q1 to get the last update of each user. Then joining with users to remove entries that have been deregistered. Then joining with q2 to get rest of user_update fields.
select users.*,q2.* from users
join
(select name,max(time) t from user_updates group by name) q1
on users.name=q1.name
join user_updates q2 on q1.t=q2.time and q1.name=q2.name
where
users.dereg_time is null
(I haven't tested it. have edited some things)
I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.
The data looks something like this:
id user_id effective_date important_value useless_value
1 1 1/3/2007 3 0
2 1 1/4/2007 3 1
3 1 1/6/2007 NULL 1
4 1 2/1/2007 3 0
5 2 1/5/2007 12 1
6 3 1/1/1899 7 0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete row with id=2, preserving the information from 1-3-2007, showing that the important_value changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the user_id values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or if they are both NULL.
WHILE (##fetch_status <> -1)
BEGIN
SELECT * FROM History INTO #history WHERE user_id = #UserId
--return entries to delete
SELECT h2.id
INTO #delete_history_ids
FROM #history h1
JOIN #history h2 ON
h1.effective_date < h2.effective_date
AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1
WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date)
DELETE h1
FROM History h1
JOIN #delete_history_ids dh ON
h1.id = dh.id
FETCH NEXT FROM UserCursor INTO #UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process basically takes the table of values and self-joins on equal user_id and important_value and prior effective_date. Then, we do 1 more self-join on user_id that effectively checks to see if the 2 joined records above are sequential by verifying that there is no effective_date record that occurs between those 2 records.
It's just a select statement for now - it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
*
from
History tcheck
inner join History tprev
on tprev.[user_id] = tcheck.[user_id]
and tprev.important_value = tcheck.important_value
and tprev.effective_date < tcheck.effective_date
left join History checkbtwn
on tcheck.[user_id] = checkbtwn.[user_id]
and checkbtwn.effective_date < tcheck.effective_date
and checkbtwn.effective_date > tprev.effective_date
where
checkbtwn.[user_id] is null
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
id int,
user_id int,
effective_date datetime,
important_value int,
useless_value int,
idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date and now indexed in order
INSERT INTO #history
SELECT * FROM History
WHERE user_id = #user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
h1.idx+1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.
I am currently using a SQL Server Agent job to create a master user table for my in-house web applications, pulling data from 3 other databases; Sharepoint, Practice Management System and Our HR Database.
Currently it goes...
truncate table my_tools.dbo.tb_staff
go
insert into my_tools.dbo.tb_staff
(username
,firstname
,surname
,chargeoutrate)
select right(wss.nt_user_name,
,hr.firstname
,hr.surname
,pms.chargeoutrate
from sqlserver.pms.dbo.staff as pms
inner join sqlserver.wss_content.dbo.vw_staffwss as wss
on pms.nt_user_name = wss.nt_user_name
inner join sqlserver.hrdb.dbo.vw_staffdetails as hr
on wss.fullname = hr.knownas
go
The problem is that the entire table is cleared as the first step so my auto increment primary key/identified on tb_staff is certain to change. Also if someone is removed from sharepoint or the PMS they will not be recreated on this table and this will cause inconsistencies throughout the database.
I want to preserve entries in this table, even after they are removed from one of the other systems.
I suppose what I want to do is:
1) Mark all exiting entries in tb_staff as inactive (using a column called active and set it to false)
2) Run the query on the three joined tables and update every found record, also marking them as active.
I can't see how I can nest a select statement within an Update statement like I have here with the Insert statement.
How can I achieve this please?
*please note I have edited my SQL down to 4 columns and simplified it so small errors are probably due to rushed editing. The real query is far bigger.
WITH source AS(
SELECT RIGHT(wss.nt_user_name, 10) nt_user_name, /*Or whatever - this is invalid in the original SQL*/
hr.firstname,
hr.surname,
pms.chargeoutrate
FROM staff AS pms
INNER JOIN vw_staffwss AS wss
ON pms.nt_user_name = wss.nt_user_name
INNER JOIN vw_staffdetails AS hr
ON wss.fullname = hr.knownas
)
MERGE
INTO tb_staff
USING source
ON source.nt_user_name= tb_staff.username /*Or whatever you are using as the key */
WHEN MATCHED
THEN UPDATE SET active=1 /*Can synchronise other columns here if needed*/
WHEN NOT MATCHED BY TARGET
THEN INSERT (username, firstname, surname, chargeoutrate, active) VALUES (nt_user_name,firstname, surname, chargeoutrate, 1)
WHEN NOT MATCHED BY source
THEN UPDATE SET active=0;