Get the "most" optimal row in a JOIN - sql

Problem
I have a situation in which I have two tables, and I would like the entries from table 2 (let's call it table_2) to be matched up with the entries in table 1 (table_1) such that no duplicate rows of table_2 are used in the match-up.
Discussion
Specifically, in this case there are datetime stamps in each table (the field is utcdatetime). For each row in table_1, I want to find the row in table_2 which has the closest utcdatetime to the table_1 utcdatetime, such that the table_2 utcdatetime is older than the table_1 utcdatetime and within 30 minutes of it. Here is the catch: I do not want any repeats. If a row in table_2 gets gobbled up in a match on an earlier row in table_1, then I do not want it considered for a match later.
This is currently implemented in a Python routine, but iterating over all of the rows in table_1 is slow because the table is large. I thought I was there with a single SQL statement, but I found that my current SQL produces duplicate table_2 rows in the output data.

I would recommend using a nested select to get whatever results you're looking for.
For instance:
select *
from person p
where p.name_first = 'SCCJS'
and not exists (select 'x' from person p2 where p2.person_id != p.person_id
                and p2.name_first = 'SCCJS' and p2.name_last = 'SC')
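For the no-repeats requirement specifically, one way to sketch this in a single statement is with window functions (SQL Server syntax assumed; the id columns are hypothetical, since the question doesn't show the tables' keys). Rank every candidate pair by time difference and keep a pair only when it is simultaneously the best candidate for its table_1 row and the best use of its table_2 row. Ties can still leave a few rows unmatched, so this approximates rather than exactly reproduces the greedy Python pass:
WITH candidates AS (
    SELECT t1.id AS t1_id,
           t2.id AS t2_id,
           -- best (closest, older) table_2 row for each table_1 row
           ROW_NUMBER() OVER (PARTITION BY t1.id
                              ORDER BY DATEDIFF(second, t2.utcdatetime, t1.utcdatetime)) AS rn1,
           -- best table_1 row for each table_2 row, so a table_2 row is used at most once
           ROW_NUMBER() OVER (PARTITION BY t2.id
                              ORDER BY DATEDIFF(second, t2.utcdatetime, t1.utcdatetime)) AS rn2
    FROM table_1 t1
    JOIN table_2 t2
      ON t2.utcdatetime <= t1.utcdatetime                        -- table_2 row must be older
     AND t2.utcdatetime >= DATEADD(minute, -30, t1.utcdatetime)  -- and within 30 minutes
)
SELECT t1_id, t2_id
FROM candidates
WHERE rn1 = 1
  AND rn2 = 1;   -- mutual best match, so no table_2 row repeats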

Related

SQL Server Inner Join with Timestamps: is each record only assigned once?

I am working with timestamped records and need to do an inner join based on the timestamp difference. I have been using the DATEDIFF function and it seems to be working well. However, the amount of time between timestamps varies. To clarify, sometimes the record appears in table 2 within the same second as table 1, and sometimes the record in table 2 is up to 15 seconds behind the record in table 1. The records in table 1 are always timestamped before table 2. There is no other common field with which I can join, however there is a register number in each table that I am using to increase accuracy by ensuring that the registers are the same.
My question is: if I increase the timestamp difference to do the inner join (e.g. where the DATEDIFF = 1 or 2 or 3... or 15) will records only be joined once? Or would my table contain duplicate records from table 1 (e.g. where record 1 is joined to record 4 in table 2 where the diff is 4 seconds, and is also joined with record 7 from table 2 where the diff is 11 seconds)?
The reason my statement works now is that no registers have records with less than 6 seconds in between, so even if there are multiple timestamps that would match, the matching of registers eliminates this problem.
My statement currently works as:
SELECT *
INTO AtriumSequoiaJoin5
FROM Atrium INNER JOIN Sequoia ON Atrium.Reader = Sequoia.theader_pos_name
WHERE (
((DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=0
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=1
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=2
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=3
Or (DateDiff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=4
Or (Datediff(s,[Atrium].[Date2],[Sequoia].[theader_tdatetime]))=5)
)
ORDER BY Sequoia.theader_id;
You could CROSS APPLY to the closest record in proximity. That's by no means ideal, however: what if there are multiple records written at the same time? You could instead give the first table an identity field, then update the next table with SCOPE_IDENTITY().
SELECT *
INTO AtriumSequoiaJoin5
FROM Atrium CROSS APPLY
(SELECT TOP 1 * FROM Sequoia WHERE
Atrium.Reader = Sequoia.theader_pos_name
ORDER BY DateDiff(millisecond,[Atrium].[Date2],[Sequoia].[theader_tdatetime])) DQ
ORDER BY DQ.theader_id;
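A hedged refinement of the same idea: constrain the window the question describes (table 2 records 0 to 15 seconds behind) and add a deterministic tie-breaker so simultaneous records don't join arbitrarily. Table and column names are taken from the statement above:
SELECT *
INTO AtriumSequoiaJoin5
FROM Atrium CROSS APPLY
(SELECT TOP 1 * FROM Sequoia WHERE
Atrium.Reader = Sequoia.theader_pos_name
AND DateDiff(s, [Atrium].[Date2], [Sequoia].[theader_tdatetime]) BETWEEN 0 AND 15
ORDER BY DateDiff(millisecond, [Atrium].[Date2], [Sequoia].[theader_tdatetime]),
         Sequoia.theader_id) DQ   -- tie-breaker for records written at the same time
ORDER BY DQ.theader_id;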

Oracle 11 SQL lists and two table comparison

I have a list of 12 strings (strings of numbers) that I need to compare to an existing table in Oracle. However I don't want to create a table just to do the compare; that takes more time than it is worth.
select
column_value as account_number
from
table(sys.odcivarchar2list('27001', '900480', '589358', '130740',
'807958', '579813', '1000100462', '656025',
'11046', '945287', '18193', '897603'))
This provides the correct result.
Now I want to compare that list of 12 against an actual table with account numbers to find the missing values. Normally with two tables I would do a left join on table1.account_number = table2.account_number, and the table-two side of the result would have blanks for the missing rows. When I attempt that using the above, all I get are the rows where the two records are equal.
select column_value as account_number, k.acct_num
from table(sys.odcivarchar2list('27001',
'900480',
'589358',
'130740',
'807958',
'579813',
'1000100462',
'656025',
'11046',
'945287',
'18193',
'897603'
)) left join
isi_owner.t_known_acct k on column_value = k.acct_num
Nine accounts match, but the other three should still appear from table 1 with a blank for table 2.
Thoughts?
Sean
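For what it's worth, a sketch of the left join as it might be written, with an alias on the table() expression and a filter that keeps only the accounts missing from the real table (this assumes the goal is just the three unmatched values):
select t.column_value as account_number
from table(sys.odcivarchar2list('27001', '900480', '589358', '130740',
                                '807958', '579813', '1000100462', '656025',
                                '11046', '945287', '18193', '897603')) t
left join isi_owner.t_known_acct k
on t.column_value = k.acct_num
where k.acct_num is null   -- list values with no match in the table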

SQL INNER JOIN vs. WHERE ID IN(...) not the same results

I was surprised by the outcome of these two queries. I was expecting the same results from both. I have two tables that share a common field, but there is no relationship set up between them. Table A has a field EventID varchar(10) and table B has a field XXNumber varchar(15).
Values from table B column XXNumber are referenced in table A column EventID. Even though XXNumber can hold 15 chars, none of the 179K rows of data is longer than 10 chars.
So the requirement was:
"To avoid Duplicate table B and table A entries, if the XXNumber is contained in a table A >“Event ID” number, then it should not be counted."
To see how many common records I have, I ran this query first - call it query alpha:
SELECT dbo.TableB.XXNumber FROM dbo.TableB WHERE dbo.TableB.XXNumber in
( select distinct dbo.TableA.EventId FROM dbo.TableA )
The result was 5322 rows.
The following query - call it query delta - looks like this:
SELECT DISTINCT dbo.TableB.XXNumber, dbo.TableA.EventId
FROM dbo.TableB INNER JOIN dbo.TableA ON dbo.TableB.XXNumber = dbo.TableA.EventId
has returned 4308 rows.
Shouldn't the resulting number of rows be the same?
The WHERE ID IN (...) version will select all rows that match each distinct value in the list (regardless of whether you code DISTINCT inside the inner select or not - that's irrelevant). If a given value appears in the parent table more than once, you'll get multiple rows selected from the parent table for that single value found in the child table.
The INNER JOIN version will select each row from the parent table once for every successful join, so if there are 3 rows in the child table with the value, and 2 in the parent, then there will be 6 rows in the result for that value.
To make them "the same", add DISTINCT to your main select.
To explain what you're seeing, we'd need to know more about your actual data.
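A throwaway illustration of that multiplicity, using hypothetical temp tables (the names and values are made up, not from the question):
-- value 'X1' appears twice in A and three times in B
CREATE TABLE #A (EventID varchar(10));
CREATE TABLE #B (XXNumber varchar(15));
INSERT INTO #A VALUES ('X1'), ('X1');
INSERT INTO #B VALUES ('X1'), ('X1'), ('X1');

SELECT XXNumber FROM #B
WHERE XXNumber IN (SELECT EventID FROM #A);           -- 3 rows: each #B row at most once

SELECT b.XXNumber
FROM #B b INNER JOIN #A a ON b.XXNumber = a.EventID;  -- 6 rows: 3 x 2 join matches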

Update a table based on the results of a group by

I've got a tricky update problem I'm trying to solve. There are two tables that contain the same three columns plus additional varied columns, looking like this:
Table1 {pers_id, loc_id, pos, ... }
Table2 {pers_id, loc_id, pos, ... }
None of the fields are unique. The first two fields collectively identify the records in a table (or tables) as belonging to the same entity. Table1 could have 15 records belonging to an entity, and table2 could have 4 records belonging to the same entity. The third column 'pos' is an index from 0 to whatever, and this is the column that I'm trying to update.
In Table1 and in Table2, the pos column begins at 0, and increments based on user selection, so that in the example (15 records in table1 and 4 records in table2), table1 contains 'pos' values of 0 - 14, and Table2 contains 'pos' values of 0-3.
I want to increment the pos field in Table1 with the results of the count of similar entities in Table2. This is the sql statement that correctly gives me the results from table2:
select table2.pers_id, table2.loc_id, count(*) as pos_increment from table2 group by table2.pers_id, table2.loc_id;
The end result of the update, in the example (15 records in table1 and 4 in table2), would be that all records in table1 belonging to the same entity are incremented by 4 (the result of that entity's group by): 0 would become 4, 14 would become 18, and so on.
Is this achievable in a single statement?
Since you only need to increment the pos field, the solution is really simple:
update table1 t1
set t1.pos = t1.pos +
(select count(1)
from table2 t2
where t2.pers_id = t1.pers_id
and t2.loc_id = t1.loc_id)
Yes, this is possible. You can use MERGE for some of these updates, and there are ways to relate values between the update and the subselect. I have done this in the past, but it's tricky and I don't have an existing example.
You can find several examples on this site, some for Oracle and some for other databases that will work with slight modifications.
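For reference, a minimal sketch of the MERGE variant mentioned above (Oracle syntax assumed, since the question doesn't name the database; table and column names are from the question):
merge into table1 t1
using (select pers_id, loc_id, count(*) as pos_increment
       from table2
       group by pers_id, loc_id) t2
on (t1.pers_id = t2.pers_id and t1.loc_id = t2.loc_id)
when matched then
  update set t1.pos = t1.pos + t2.pos_increment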

SQL Query - Ensure a row exists for each value in ()

Currently struggling to find a way to validate two tables efficiently (Table A has lots of rows).
I have two tables
Table A
ID
A
B
C
Table matched
ID Number
A 1
A 2
A 9
B 1
B 9
C 2
I am trying to write a SQL Server query that checks that, for every value in Table A, there exists a row in Table matched for each of a variable set of values (1, 2, 9).
The example above is incorrect because, for every record in A, there should be a corresponding record in Table matched for each value (1, 2, 9). The end goal is:
Table matched
ID Number
A 1
A 2
A 9
B 1
B 2
B 9
C 1
C 2
C 9
I know it's confusing, but in general for every X in (some set) there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT prevents duplicates (i.e., A having two records in TABLE_B with the value "2") from being falsely counted toward a correct record. It can be omitted if the number column has a unique or primary key constraint on it.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
Create a temp table of values you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query from.
Then, SELECT * FROM tempTable WHERE Number NOT IN (SELECT Number FROM TableMatched).
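A sketch that skips the temp table entirely and lists the missing (ID, Number) pairs directly (SQL Server syntax; TableA and TableMatched are the names used elsewhere on this page, and the value set is hard-coded for illustration):
SELECT a.ID, v.Number
FROM TableA a
CROSS JOIN (VALUES (1), (2), (9)) v(Number)  -- every required (ID, Number) pair
EXCEPT
SELECT ID, Number
FROM TableMatched;                           -- pairs that already exist drop out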
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView as
select a.ID,
       d.Number,
       m.OtherValues
from TableA a
cross join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number
This way, the rows returned were always correct. If rows were missing from TableMatched, the Numbers were still returned, but with OtherValues as null. If there were extra values in TableMatched, they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily. If a value were removed from TableMatchedDomain, it would disappear from the view. If it were added back again in the future, the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invariant on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.
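To make the domain idea concrete, a hypothetical setup for the (1, 2, 9) set from this question; missing rows then surface through the view as nulls:
-- hypothetical domain table holding the required set of values
CREATE TABLE TableMatchedDomain (Number int PRIMARY KEY);
INSERT INTO TableMatchedDomain VALUES (1), (2), (9);

-- rows that ought to exist in TableMatched but don't show up with null OtherValues
SELECT ID, Number
FROM TableMatchedView
WHERE OtherValues IS NULL;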