Selecting Sorted Records Prior to Target Record - sql

The background to this question is that we have had to hand-roll replication between a 3rd party Oracle database and our SQL Server database since there are no primary keys defined in the Oracle tables but there are unique indexes.
In most cases the following method works fine: we load the values of the columns in the unique index along with an MD5 hash of all column values from each corresponding table in the Oracle and SQL Server databases and are able to then calculate what records need to be inserted/deleted/updated.
However, in one table the sheer number of rows precludes us from loading all records into memory from the Oracle and SQL Server databases. So we need to do the comparison in blocks.
The method I am considering is to query the first n records from the Oracle table and then, using the same sort order, query the SQL Server table for all records up to and including the last record returned from Oracle. I would then compare the two data sets to work out what needs to be inserted, deleted, or updated.
Once that is done, I would load the next n records from the Oracle database and query the SQL Server table for the records that, when sorted in the same way, fall between (and include) the first and last records in that data set.
My question is: how to achieve this in SQL Server? If I have the values of the nth record (having queried the table in Oracle with a certain sort order) how can I return the range of records up to and including the record with those values from SQL Server?
Example
I have the following table:
| Id | SOU_ORDREF | SOU_LINESEQ | SOU_DATOVER | SOU_TIMEOVER | SOU_SEQ | SOU_DESC |
|-----------------------------------------------------|------------|-------------|-------------------------|------------------|---------|------------------------|
| AQ000001_10_25/07/2004 00:00:00_14_1 | AQ000001 | 10 | 2004-07-25 00:00:00.000 | 14 | 1 | Black 2.5mm Cable |
| AQ000004_91_26/07/2004 00:00:00_15.4833333333333_64 | AQ000004 | 91 | 2004-07-26 00:00:00.000 | 15.4833333333333 | 64 | 2.5mm Yellow Cable |
| AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18 | AQ000005 | 31 | 2004-07-26 00:00:00.000 | 10.8333333333333 | 18 | Rotary Cam Switch |
| AQ000012_50_26/07/2004 00:00:00_11.3_17 | AQ000012 | 50 | 2004-07-26 00:00:00.000 | 11.3 | 17 | 3Mtr Heavy Gauge Cable |
The Id field is basically a concatenation of the five fields which make up the unique index on the table i.e. SOU_ORDREF, SOU_LINESEQ, SOU_DATOVER, SOU_TIMEOVER, and SOU_SEQ.
What I would like to do is to be able to query, for example, all the records (when sorted by those columns) up to the record with the Id 'AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18' which would give us the following result (I'll just show the ids):
| Id |
|-----------------------------------------------------|
| AQ000001_10_25/07/2004 00:00:00_14_1 |
| AQ000004_91_26/07/2004 00:00:00_15.4833333333333_64 |
| AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18 |
So, the query has not included the record with Id 'AQ000012_50_26/07/2004 00:00:00_11.3_17' since it comes after 'AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18' when we order by SOU_ORDREF, SOU_LINESEQ, SOU_DATOVER, SOU_TIMEOVER, and SOU_SEQ.
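The "up to and including this record" filter described above can be written as a lexicographic comparison over the five index columns. Here is a minimal sketch, not a tested SQL Server solution: it runs against an in-memory SQLite table so it is self-contained, and it assumes row-value comparisons such as `(a, b) <= (x, y)` are not available, so the test is expanded into plain `<`, `=` and `<=` terms. The table name `sou` and the helper `up_to_predicate` are made up for illustration, the key values of the second row are taken from its Id, and in SQL Server the named parameters would be spelled `@name` rather than `:name`.

```python
import sqlite3

KEY_COLS = ["SOU_ORDREF", "SOU_LINESEQ", "SOU_DATOVER", "SOU_TIMEOVER", "SOU_SEQ"]

def up_to_predicate(cols):
    """Build a lexicographic 'key <= boundary' test from plain comparisons."""
    # Innermost term: the last column may be equal to the boundary value.
    pred = f"{cols[-1]} <= :{cols[-1]}"
    # Wrap outwards: col < boundary OR (col = boundary AND <rest of the key>)
    for col in reversed(cols[:-1]):
        pred = f"({col} < :{col} OR ({col} = :{col} AND {pred}))"
    return pred

# Sample rows from the question (key columns only).
ROWS = [
    ("AQ000001", 10, "2004-07-25 00:00:00.000", 14.0, 1),
    ("AQ000004", 91, "2004-07-26 00:00:00.000", 15.4833333333333, 64),
    ("AQ000005", 31, "2004-07-26 00:00:00.000", 10.8333333333333, 18),
    ("AQ000012", 50, "2004-07-26 00:00:00.000", 11.3, 17),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sou (SOU_ORDREF TEXT, SOU_LINESEQ INTEGER, "
    "SOU_DATOVER TEXT, SOU_TIMEOVER REAL, SOU_SEQ INTEGER)"
)
conn.executemany("INSERT INTO sou VALUES (?, ?, ?, ?, ?)", ROWS)

# Boundary = key values of the last record returned by the other database.
boundary = dict(zip(KEY_COLS, ROWS[2]))
sql = (
    f"SELECT SOU_ORDREF FROM sou WHERE {up_to_predicate(KEY_COLS)} "
    f"ORDER BY {', '.join(KEY_COLS)}"
)
block = [row[0] for row in conn.execute(sql, boundary)]
print(block)  # ['AQ000001', 'AQ000004', 'AQ000005']
```

For the second and later blocks, the same idea mirrored with `>` against the previous boundary gives the lower bound. Both databases have to sort the key columns identically (collation of the text column in particular) for the blocks to line up.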

Related

Distinct performance in Redshift

I am trying to populate multiple dimension tables from a single base table.
Sample Base Table:
| id | empl_name | emp_surname | country | dept | university |
|----|-----------|-------------|---------|------|------------|
| 1 | AAA | ZZZ | USA | CE | U_01 |
| 2 | BBB | XXX | IND | CE | U_01 |
| 3 | CCC | XXX | CAN | IT | U_02 |
| 4 | CCC | ZZZ | USA | MECH | U_01 |
Required Dimension tables :
emp_name_dim with values - AAA,BBB,CCC
emp_surname_dim with values - ZZZ,XXX
country_dim with values - USA,IND,CAN
dept_dim with values - CE,IT,MECH
university_dim with values - U_01,U_02
Now, to populate the above dimension tables from the base table, I am considering two approaches:
1. Get the distinct values from the base table for the combination of all the above columns, create a single temp table from that, and use that temp table to create each individual dimension table. Here, I will read data from the base table only once, but with more column combinations.
2. Create separate temp tables for the distinct values specific to each dimension. This way we need to read the base table multiple times, but each temp table will be smaller (i.e. fewer rows and only a single column's distinct values).
Which approach is better in terms of performance?
Note:
The base table is huge, containing millions of rows.
The columns above are just a sample. The actual table has around 50 columns for which I need to consider the distinct combinations.
Scanning the large table only once is the way to go.
There is also another way to get the distinct values, which in some cases will be faster than distinct. As an alternative approach, perform a "group by" on all the columns and run the two as a bake-off to see which is faster. In general, if distinct will produce a small number of resulting rows (one that fits in memory), then distinct will be faster. However, if the result will be large, then group by will be faster. There are a lot of corner cases and factors (distribution style, for example) that can affect this rule of thumb, so testing both for speed will tell you which is faster in your case.
Given that you have 50 columns and you want all the unique combinations, I'd guess that the output set will be large and that group by will win, but this is just a guess.
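To make the two candidates concrete, here is a small sketch using the sample base table from the question. It runs on an in-memory SQLite database purely so it is self-contained; it shows that `DISTINCT` and `GROUP BY` over all columns return the same set and how a dimension can be read from the single-scan temp table, but it says nothing about which is faster on Redshift. The table names `base` and `combos` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base (id INTEGER, empl_name TEXT, emp_surname TEXT, "
    "country TEXT, dept TEXT, university TEXT)"
)
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "AAA", "ZZZ", "USA", "CE", "U_01"),
        (2, "BBB", "XXX", "IND", "CE", "U_01"),
        (3, "CCC", "XXX", "CAN", "IT", "U_02"),
        (4, "CCC", "ZZZ", "USA", "MECH", "U_01"),
    ],
)
cols = "empl_name, emp_surname, country, dept, university"

# The two candidates for the bake-off: same result set, different plans.
via_distinct = set(conn.execute(f"SELECT DISTINCT {cols} FROM base"))
via_group_by = set(conn.execute(f"SELECT {cols} FROM base GROUP BY {cols}"))

# Single scan of the base table into a temp table of unique combinations;
# each dimension is then derived from the (smaller) temp table.
conn.execute(f"CREATE TEMP TABLE combos AS SELECT {cols} FROM base GROUP BY {cols}")
country_dim = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT country FROM combos")
)
print(via_distinct == via_group_by, country_dim)  # True ['CAN', 'IND', 'USA']
```

On the real cluster, the bake-off would be the two `SELECT` statements timed against the actual 50-column table.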

Get the row with latest start date from multiple tables using sub select

I have data from 3 tables, copied below. I am not using joins to get the data because I don't know how to use joins across multiple tables. My task is to update the rows with an old EFF_START_TS to sysdate in one of the tables when more than 2 rows are returned for a particular user.
subscription_id |Client_id
----------------------------
20685413 |37455837
reward_account_id|subscription_id |CURRENCY_BAL_AMT |CREATE_TS |
----------------------------------------------------------------------
439111697 | 20685413 | -40 |1-09-10 |
REWARD_ACCT_DETAIL_ID|REWARD_ACCOUNT_ID |EFF_START_TS |EFF_STOP_TS |
----------------------------------------------------------------------
230900968 | 439111697 | 14-06-11 | 15-01-19
47193932 | 439111697 | 19-02-14 | 19-12-21
243642632 | 439111697 | 18-03-23 | 99-12-31
247192972 | 439111697 | 17-11-01 | 17-11-01
The SQL should update the EFF_STOP_TS in the last table for every row except the second one (47193932), because that row has the latest EFF_START_TS.
Expected result is to update the EFF_STOP_TS column of 230900968, 243642632 and 247192972 to sysdate.
As I understand it, you need to update it per REWARD_ACCOUNT_ID, so you can try the code below:
UPDATE REWARD_ACCT_DETAIL RAD
SET EFF_STOP_TS = SYSDATE
WHERE EFF_START_TS NOT IN (SELECT MAX(EFF_START_TS)
                           FROM REWARD_ACCT_DETAIL RAD1
                           WHERE RAD.REWARD_ACCOUNT_ID = RAD1.REWARD_ACCOUNT_ID);
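The effect of this correlated update can be checked on the sample rows from the question. The sketch below is only an illustration under a few assumptions: it uses an in-memory SQLite table instead of Oracle, keeps the dates as the `YY-MM-DD` strings shown in the question (which happen to sort correctly as text here, whereas the real columns are presumably dates or timestamps), and passes today's date as a parameter in place of `SYSDATE`.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reward_acct_detail (reward_acct_detail_id INTEGER, "
    "reward_account_id INTEGER, eff_start_ts TEXT, eff_stop_ts TEXT)"
)
conn.executemany(
    "INSERT INTO reward_acct_detail VALUES (?, ?, ?, ?)",
    [
        (230900968, 439111697, "14-06-11", "15-01-19"),
        (47193932, 439111697, "19-02-14", "19-12-21"),
        (243642632, 439111697, "18-03-23", "99-12-31"),
        (247192972, 439111697, "17-11-01", "17-11-01"),
    ],
)

today = date.today().isoformat()  # stand-in for Oracle's SYSDATE

# Close every row except the one with the latest start date per account.
conn.execute(
    """
    UPDATE reward_acct_detail
    SET eff_stop_ts = :today
    WHERE eff_start_ts <> (
        SELECT MAX(d2.eff_start_ts)
        FROM reward_acct_detail AS d2
        WHERE d2.reward_account_id = reward_acct_detail.reward_account_id
    )
    """,
    {"today": today},
)

stops = dict(
    conn.execute("SELECT reward_acct_detail_id, eff_stop_ts FROM reward_acct_detail")
)
# Only 47193932 keeps its original stop date.
print(stops[47193932])
```

The update runs for every account in the table; to restrict it to accounts with more than 2 rows, or to a single subscription, an extra condition would be needed.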

Auto generate columns in Microsoft Access table

How can we auto-generate columns/fields in a Microsoft Access table?
Scenario:
I have a table with the personal details of my employees (EmployDetails).
I want to put their daily attendance in another table.
Rather than using a separate record for each day, I want to use a single record per employee.
E.g. I want to create a table with fields like below:
EmployID, 01Jan2020, 02Jan2020, 03Jan2020, ... 25May2020 and so on.
That means I have to generate a column automatically every day.
Can anybody help me?
Generally you would define columns manually (whether that is through a UI or SQL).
With the information given I think the proper solution is to have two tables.
You have your "EmployDetails" table, which holds their general info (name, contact information, etc.) and the key, which would be the employee ID (it can be autogenerated or manual, it just needs to be unique).
You would have a second table with a foreign key to the employee ID in "EmployDetails", with a column called Date and another called Details (or whatever you are trying to capture in your date column idea).
Then you simply add rows for each day. Then you do a join query between the tables to look up all the "days" for an employee. This is called normalisation and how relational databases (such as Access) are designed to be used.
Employee Table:
EmpID | NAME | CONTACT
----------------------
1 | Jim | 222-2222
2 | Jan | 555-5555
Details table:
DetailID | EmpID (foreign key) | Date | Hours_worked | Notes
-------------------------------------------------------------
10231 | 1 | 01Jan2020| 5 | Lazy Jim took off early
10233 | 2 | 02Jan2020| 8 | Jan is a hard worker
10240 | 1 | 02Jan2020| 7.5 | Finally he stays a full day
To find what Jim worked you do a join:
SELECT Employee.EmpID, Employee.Name, Details.[Date], Details.Hours_worked, Details.Notes
FROM Employee
INNER JOIN Details ON Employee.EmpID = Details.EmpID
WHERE Employee.EmpID = 1;
Of course this will give you a normalised result (which is generally what's wanted so you can iterate over it):
EmpID | NAME | Date | Hours_worked | Notes
-----------------------------------------------
1 | Jim | 01Jan2020 | 5 | ......
1 | Jim | 02Jan2020 | 7.5 | .......
If you want the results denormalised you'll have to look into pivot tables.
See more on creating foreign keys
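As a rough illustration of the last point (turning the normalised rows back into one row per employee with one column per day), here is a sketch that builds the pivot with conditional aggregation. It runs on an in-memory SQLite database so it is self-contained; Access has its own crosstab query syntax for this, which is not shown. The column name `WorkDate` is a made-up replacement for `Date`, and the Notes column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Contact TEXT);
    CREATE TABLE Details (
        DetailID INTEGER PRIMARY KEY,
        EmpID INTEGER REFERENCES Employee(EmpID),
        WorkDate TEXT,
        Hours_worked REAL
    );
    INSERT INTO Employee VALUES (1, 'Jim', '222-2222'), (2, 'Jan', '555-5555');
    INSERT INTO Details VALUES
        (10231, 1, '2020-01-01', 5),
        (10233, 2, '2020-01-02', 8),
        (10240, 1, '2020-01-02', 7.5);
    """
)

# The pivot columns are whatever dates exist in the data, so no column
# ever has to be added to a table when a new day arrives.
dates = [
    row[0]
    for row in conn.execute("SELECT DISTINCT WorkDate FROM Details ORDER BY WorkDate")
]
pivot_cols = ", ".join(
    "SUM(CASE WHEN d.WorkDate = ? THEN d.Hours_worked END)" for _ in dates
)
sql = (
    f"SELECT e.Name, {pivot_cols} "
    "FROM Employee AS e JOIN Details AS d ON d.EmpID = e.EmpID "
    "GROUP BY e.EmpID, e.Name ORDER BY e.EmpID"
)
pivot = conn.execute(sql, dates).fetchall()

print(["Name"] + dates)
for row in pivot:
    print(row)  # a day with no record for that employee shows up as None
```

The stored data stays one row per employee per day; only the query output is wide.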

How can I trigger an update to a value in a table when criteria is met on a different table?

Aware there is an almost identical question here, but that covers the SQL query required, rather than the mechanism of event triggering.
Let's say I have two tables. One table contains performance data for each staff member each week. The other table holds the staff members' information. What I want is to update a value in the first table to a Y or N based on whether that staff member had left by the week's start date.
staffTable
+----------+----------------+------------+
| staff_id | staff_name | leave_date |
+----------+----------------+------------+
| 1 | Joseph Blogges | 2020-01-24 |
| 2 | Joe Bloggs | 9999-12-31 |
| 3 | Joey Blogz | 9999-12-31 |
+----------+----------------+------------+
targetTable
+------------+----------+--------+-----------+
| week_start | staff_id | target | left_flag |
+------------+----------+--------+-----------+
| 2020-01-13 | 1 | 10 | N |
| 2020-01-20 | 1 | 10 | N |
| 2020-01-27 | 1 | 8 | Y |
+------------+----------+--------+-----------+
What I am trying to do is have the left_flag automatically change from 'N' to 'Y' when the week_start value is greater than leave_date of the staff member (in the other table).
I have tried successfully putting this into a view, which works, but the problem is that existing applications, views and queries will need to all reference a new view instead of a table and I want to be able to query the data table as my front-end has issues interacting in live with a view instead of a table.
I have also successfully used a UDF to return the leave_date and then created a computed column that checks whether the week_start column is greater than the value the UDF returns. This worked fine until I realised that the UDF is the most resource-consuming query on the entire server, which is completely disproportionate.
Is there a way that I can trigger an update to the targetTable when a criterion is met in another table, or is there a totally different and better way of doing this? If it can't be done easily, I'll try to switch to a view and work around it in the front-end.
I'm going to describe the process rather than writing the code.
What you are describing can be accomplished using triggers on staffTable. When a new row is inserted or updated the trigger would change any rows in targetTable. This would be an after insert/update trigger.
The heart of the trigger would be:
update tt
    set left_flag = 'Y'
    from targetTable tt join
         inserted i
         on tt.staff_id = i.staff_id
    where i.leave_date < tt.week_start and
          tt.left_flag <> 'Y';
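To see the mechanism end to end, here is a runnable sketch of the same idea. It uses SQLite's trigger syntax on an in-memory database, not SQL Server's: SQLite fires the trigger once per changed row and exposes it as `NEW`, whereas a SQL Server trigger sees the whole changed set through the `inserted` pseudo-table as in the snippet above. The trigger name `staff_left` is made up, and all three sample rows start as 'N' so the trigger's effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staffTable (
        staff_id INTEGER PRIMARY KEY, staff_name TEXT, leave_date TEXT
    );
    CREATE TABLE targetTable (
        week_start TEXT, staff_id INTEGER, target INTEGER, left_flag TEXT
    );

    -- After leave_date changes, flag that staff member's later weeks.
    CREATE TRIGGER staff_left AFTER UPDATE OF leave_date ON staffTable
    BEGIN
        UPDATE targetTable
        SET left_flag = 'Y'
        WHERE staff_id = NEW.staff_id
          AND week_start > NEW.leave_date
          AND left_flag <> 'Y';
    END;

    INSERT INTO staffTable VALUES (1, 'Joseph Blogges', '9999-12-31');
    INSERT INTO targetTable VALUES
        ('2020-01-13', 1, 10, 'N'),
        ('2020-01-20', 1, 10, 'N'),
        ('2020-01-27', 1, 8, 'N');
    """
)

# Recording the leave date fires the trigger.
conn.execute("UPDATE staffTable SET leave_date = '2020-01-24' WHERE staff_id = 1")

flags = [
    row[0]
    for row in conn.execute("SELECT left_flag FROM targetTable ORDER BY week_start")
]
print(flags)  # ['N', 'N', 'Y']
```

A trigger on staffTable only reacts to changes in staffTable. Rows added to targetTable after the staff member has already left would need a similar trigger on targetTable, or a default computed at insert time.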

Gather single rows from multiple tables in Microsoft Access

I have several tables in Microsoft Access 2013, all of which follow the same format of:
ID | Object | Person 1 | Person 2 | Person 3 |
ID | String | Yes/No | Yes/No | Yes/No |
What I would like to do is make a query where I put in a string value for each table and it prints out the entire row, with each string getting its own row, so it looks like:
ID Number | Object | Person 1...
Table 1 ID | Table 1 String | Table 1 Yes/No...
Table 2 ID | Table 2 String | Table 2 Yes/No...
Every time I try, though, it puts all the data into one extremely long row that's impossible to look at. All of my searching has only turned up people trying to do the exact opposite of what I'm doing, though, so I must be missing something obvious. Any tips?
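One wide row is typically what comes out when the tables are combined side by side in a single SELECT; stacking the results needs a union instead. Below is a minimal sketch of that idea using `UNION ALL`, run on an in-memory SQLite database so it is self-contained. The table names `Table1` and `Table2`, the sample rows, and the search strings are all made up for illustration, and Yes/No values are stored as 1/0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("Table1", "Table2"):
    conn.execute(
        f'CREATE TABLE {name} (ID INTEGER, Object TEXT, '
        '"Person 1" INTEGER, "Person 2" INTEGER, "Person 3" INTEGER)'
    )
conn.execute("INSERT INTO Table1 VALUES (1, 'hammer', 1, 0, 1)")
conn.execute("INSERT INTO Table2 VALUES (7, 'ladder', 0, 0, 1)")

# One SELECT per table, each filtered by its own string; UNION ALL stacks
# the matching rows vertically because every branch has the same columns.
sql = """
    SELECT 'Table1' AS Source, * FROM Table1 WHERE Object = :t1
    UNION ALL
    SELECT 'Table2' AS Source, * FROM Table2 WHERE Object = :t2
"""
rows = conn.execute(sql, {"t1": "hammer", "t2": "ladder"}).fetchall()
for row in rows:
    print(row)
```

The extra `Source` column is optional; it just records which table each row came from.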