Finding closest value in another table - sql

I'm quite new to SQL so bare with me. What I'm trying to do is return the value closest to another value in a different table for every record.
I'll show a simplified example of my two tables for clarification
First table is the one that I want the value ENTRY_YEAR matched to:
ID
ENTRY_VALUE
1001
1900
1002
2000
And the second table:
ID
ENTRY_VALUE
STATUS
1001
1880
SUCCES
1001
1930
FAIL
1001
1940
SUCCES
1002
1960
SUCCES
1002
1980
FAIL
So the end result I'm looking for is:
ID
ENTRY_VALUE
STATUS
1001
1880
SUCCES
1002
1980
FAIL
I have currently only managed to link the id's together but can't find a way to compare the ENTRY_VALUE in both tables and return the one closest to the Table1 entry.
So only this:
SELECT * from Table2
INNER JOIN Table1 ON (Table2.ID = Table1.ID)
Once again my bad for the basic question, I have googled right about everything but can't get it to work so any help is very welcome!

First attempt
This is a (slower performing) query. First attempt! This is an approach using a "correlated subquery" so it runs the inner query for each row of the outer query. The strategy is, for each row, to determine what the min value is we are looking for, and then select only the rows that fit that criteria. But such queries can be slow at runtime, although the logic is very clean.
select
a.id,
b.entry_value,
b.[status]
from
Foo a
inner join Bar b
on a.id = b.id
where
abs(a.entry_value - b.entry_value) =
(select min(abs(t1.entry_value-t2.entry_value))
from Foo t1
inner join Bar t2
on t1.id = t2.id
where
t1.id = a.id
group by t1.id)
Second attempt
If you have many rows (in the tens of thousands or in any case if the previous query is just too slow), then this next one should be better performing. Second Attempt! If you run the two inner queries by themselves, you will probably see the strategy here of how we are joining them to get the desired result.
select A.Id, A.entry_value, A.[status]
from
(
select t1.id, t2.entry_value, abs(t1.entry_value-t2.entry_value) as diff, t2.[status]
from Foo t1
inner join Bar t2
on t1.id = t2.id
) A
inner join
(
select t3.id, min(abs(t3.entry_value-t4.entry_value)) as diff
from Foo t3
inner join Bar t4
on t3.id = t4.id
group by t3.id
) B
on A.id = B.id
and A.diff = B.diff
Note
I would probably not try to write either of these queries in MSAccess "Design view" although if I had too I am sure I could. But generally, this is a case where I would write the query "by hand" and paste it into your query directly using MSAccess "SQL view".
Caution
Beware that ties will result in two rows! Example:
First table has (1003,2000)
Second table has (1003, 1990, 'success') and (1003, 2010, 'fail')
You will have a result with two rows, one with success and the other with fail (!)
So you really should test with your data and look for such cases that might produce such ties (and decide what to do, if necessary).
Btw...
just for fun, here's how you might go for it in SQL Server.
But I think this will NOT work in MSAccess, unfortunately.
select
T.id,
T.entry_value,
T.[status]
from
(
select
t1.id,
t2.entry_value,
abs(t1.entry_value-t2.entry_value) as diff,
t2.[status],
rank() over (partition by t1.id order by abs(t1.entry_value-t2.entry_value)) as seq
from #Foo t1
inner join #Bar t2
on t1.id = t2.id
) T
where T.seq = 1;

Use a simple subquery to find the minimum offset:
Select
tbl1.ID,
tbl2.ENTRY_VALUE,
tbl2.STATUS
From
tbl1
Inner Join
tbl2 On tbl1.ID = tbl2.ID
Where
Abs([tbl1].[ENTRY_VALUE] - [tbl2].[ENTRY_VALUE]) =
(Select Min(Abs([tbl1].[ENTRY_VALUE] - [T2].[ENTRY_VALUE])) As Offset
From tbl2 As T2
Where T2.ID = tbl1.ID);
Output:
ID
ENTRY_VALUE
STATUS
1001
1880
SUCCES
1002
1980
FAIL
Note, that if the minimum offset for an ID exists twice, both records having this offset will be returned. Thus, you may have to aggregate the output.

Related

SQL Server Return Rows Where Field Changed

I have a table with 3 values.
ID AuditDateTime UpdateType
12 12-15-2015 18:09 1
45 12-04-2015 17:41 0
75 12-21-2015 04:26 0
12 12-17-2015 07:43 0
35 12-01-2015 05:36 1
45 12-15-2015 04:35 0
I'm trying to return only records where the UpdateType has changed from AuditDateTime based on the IDs. So in this example, ID 12 changes from the 12-15 entry to the 12-17 entry. I would want that record returned. There will be multiple instances of ID 12, and I need all records returned where an ID's UpdateType has changed from its previous entry. I tried adding a row_number but it didn't insert sequentially because the records are not in the table in order. I've done a ton of searching with no luck. Any help would be greatly appreciated.
By using a CTE it is possible to find the previous record based upon the order of the AuditDateTime
WITH CTEData AS
(SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AuditDateTime) [ROWNUM], *
FROM #tmpTable)
SELECT A.ID, A.AuditDateTime, A.UpdateType
FROM CTEData A INNER JOIN CTEData B
ON (A.ROWNUM - 1) = B.ROWNUM AND
A.ID = B.ID
WHERE A.UpdateType <> B.UpdateType
The Inner Join back onto the CTE will give in one query both the current record (Table Alias A) and previous row (Table Alias B).
This should do what you're trying to do I believe
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
LEFT OUTER JOIN dbo.My_Table T3 ON
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
WHERE
T3.ID IS NULL
Alternatively:
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
WHERE
NOT EXISTS
(
SELECT *
FROM
dbo.My_Table T3
WHERE
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
)
The basic gist of both queries is that you're looking for rows where an earlier row had a different type and no other rows exist between the two rows (hence, they're sequential). Both queries are logically identical, but might have differing performance.
Also, these queries assume that no two rows will have identical audit times. If that's not the case then you'll need to define what you expect to get when that happens.
You can use the lag() window function to find the previous value for the same ID. Now you can pick only those rows that introduce a change:
select *
from (
select lag(UpdateType) over (
partition by ID
order by AuditDateTime) as prev_updatetype
, *
from YourTable
) sub
where prev_updatetype <> updatetype
Example at SQL Fiddle.

SQL - get max result

Assume there is a table name "test" below:
name value
n1 1
n2 2
n3 3
Now, I want to get the name which has the max value, I have some solution below:
Solution 1:
SELECT TOP 1 name
FROM test
ORDER BY value DESC
solution 2:
SELECT name
FROM test
WHERE value = (SELECT MAX(value) FROM test);
Now, I hope use join operation to find the result, like
SELECT name
FROM test
INNER JOIN test ON...
Could someone please help and explain how it works?
If you are looking for JOIN then
SELECT T.name, T.value
FROM test T
INNER JOIN
( SELECT T1.name, T1.value ,
RANK() OVER (PARTITION BY T1.name ORDER BY T1.value) N
FROM test T1
WHERE T1.value IN (SELECT MAX(t2.value) FROM test T2)
)T3 ON T3.N = 1 AND T.name = T3.name
FIDDLE DEMO
or
select name, value
from
(
select name, value,
row_number() over(order by value desc) rn
from test
) src
where rn = 1
FIDDLE DEMO
First, note that solutions 1 and 2 could give different results when value is not unique. If in your test data there would be an additional record ('n4', 3), then solution 1 would return either 'n3' or 'n4', but solution 2 would return both.
A solution with JOIN will need aliases for the table, because as you started of, the engine would say Ambiguous column name 'name'.: it would not know whether to take name from the first or second occurrence of the test table.
Here is a way to complete the JOIN version:
SELECT t1.name
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
WHERE t2.value IS NULL;
This query takes each of the records, and checks if any records exist that have a higher value. If not, the first record will be in the result. Note the use of LEFT: this denotes an outer join, so that records from t1 that have no match with t2 -- based on the ON condition -- are not immediately rejected (as would be the case with INNER): in fact, we want to reject all the other records, which is done with the WHERE clause.
A way to understand this mechanism, is to look at a variant of the query above, which lacks the WHERE clause and returns the values of both tables:
SELECT t1.value, t2.value
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
On your test data this will return:
t1.value t2.value
1 2
1 3
2 3
3 (null)
Note that the last entry would not be there if the join where an INNER JOIN. But with the outer join, one can now look for the NULL values and actually get those records in the result that would be excluded from an INNER JOIN.
Note that this query will give the same result as solution 2 when there are duplicate values. If you want to have also only one result like with solution 1, it suffices to add TOP 1 after SELECT.
Here is a fiddle.
Alternative with pure INNER JOIN
If you really want an INNER join, then this will do it. Again the TOP 1 is only needed if you have non-unique values:
SELECT TOP 1 t1.name
FROM test t1
INNER JOIN (SELECT Max(value) AS value FROM test) t2
ON t2.value = t1.value;
But this one really is very similar to what you did in solution 2. Here is fiddle for it.

SQL Select Query - "Pairing" Records Together

Here's my data:
Table1...
id
field1
field2
...
Table2...
id
table1_original_id
table1_new_id
Table1 holds records that cannot themselves be updated though I built a mechanism for my users to be able to "update" them... they select a record, make changes, and those changes are actually submitted as a new record. For conversation's sake let's assume there are currently 10 records in Table1 with IDs 1 through 10. User submits an update to ID 3. ID 3 remains as it was and a new record, ID 11, is added to Table1. Along with this new record, Table2 gets a new record, with ID = 1, t1_original_id = 3 and t1_new_id = 11.
What would be by SQL select to retrieve the "pairs" of records from Table1... in this case the query would provide me with... IDs 3 and 11.
scratching head
I don't think it matters much by DB platform is Postgres 8.4 and I'm retrieving the data via PHP to be processed in jqGrid. Bonus points if you can point me in the direction of displaying each pair of records in a separate jqGrid!
Thanks for your time and effort.
=== EDIT ===
The following is a sample of what I need returned from the query based on the scenario above:
id field1 field2
3 blah stuff
11 more things
Once I have these pairs back I can process them further, as necessary, on the PHP side.
Standard SQL solution
To get two separate rows (as later specified in the Q update) use a UNION query:
SELECT 'old' AS version, t1.id, t1.field1, t1.field2
FROM table1 t1
JOIN table2 t2 ON t2.table1_original_id = t1.id
WHERE t2.id = $some_t2_id -- your t2.id here!
UNION ALL
SELECT 'new' as version, t1.id, t1.field1, t1.field2
FROM table1 t1
JOIN table2 t2 ON t2.table1_new_id = t1.id
WHERE t2.id = $some_t2_id; -- your t2.id here!
Note: this is one query.
Returns as requested, plus a column version to distinguish the rows reliably:
version | id | field1 | field2
--------+----+--------+--------
old | 3 | blah | stuff
new | 11 | more | things
Faster Postgres-specific solution
A more sophisticated, but shorter and faster solution:
SELECT version, id, field1, field2
FROM (
SELECT unnest(ARRAY['old', 'new']) AS version
,unnest(ARRAY[table1_original_id, table1_new_id]) AS id
FROM table2 t2 WHERE t2.id = $some_t2_id -- your t2.id here!
) t2
JOIN table1 USING (id);
Same result.
-> SQLfiddle demo for both.
You'll need to JOIN to Table1 twice, something like:
SELECT orig.id AS Orig_ID
, orig.value AS Orig_Value
, n.id AS New_ID
, n.value AS New_Value
FROM Table2 a
JOIN Table1 AS orig
ON a.table1_original_id = orig.id
JOIN Table1 AS n
ON a.table1_new_id = n.id
Demo: SQL Fiddle
Update:
To get them paired as you want without manually choosing a set you'll need something like this:
SELECT sub.id,sub.value
FROM (SELECT a.id as Update_ID,o.id,o.value, '1' AS sort
FROM Table2 a
JOIN Table1 AS o
ON a.table1_original_id = o.id
UNION
SELECT a.id as Update_ID, n.id,n.value, '2' AS sort
FROM Table2 a
JOIN Table1 AS n
ON a.table1_new_id = n.id
) AS sub
ORDER BY Update_ID,sort
Demo: Sql Fiddle
Notes: Change UNION to UNION + ALL, can't post those words next to each other due to firewall limitation.
The hard-coded '1' and '2' are so that original always appear before new.

SQL - remove duplicates from left join

I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same; table2 may have multiple records with the same FKID but I only want to return one record (random chosen is ok with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~.5 seconds), but 'Distinct' only filters out records when all columns are duplicate (rather than just the FKID).
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
table examples
table1 table2
OID, PKID FKID, ProjectName, Asbuilt
1, id1 id1, P1, AB1
2, id2 id1, P5, AB5
3, id4 id2, P10, AB2
5, id5 id5, P4, AB4
In the above example returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5)
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
This will give the results you requested and should have the best performance.
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO,
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
Your original query is producing arbitrary values for the two columns (the use of top with no order by). You can get the same effect with this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct with a group by.
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
If you want described result, you need to use INNER JOIN and following query will satisfy your need:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from left table (table1) whether it has pair in right table or not, then use LEFT JOIN and same query will gave you desired result.
EDITED
This construction has good performance, and you dont need to use subqueries.

SQL nested query

I have a table like below
id name dependency
-----------------------
1 xxxx 0
2 yyyy 1
3 zzzz 2
4 aaaaaa 0
5 bbbbbb 4
6 cccccc 5
the list goes on. I want to select group of rows from this table , by giving the name of 0 dependency in where clause of SQL and till it reaches a condition where there is no more dependency. (For ex. rows 1,2, 3 forms a group, and rows 4,5,6 is another group) .please help
Since you did not specify a product, I'll go with features available in the SQL specification. In this case, I'm using a common-table expression which are supported by many database products including SQL Server 2005+ and Oracle (but not MySQL):
With MyDependents As
(
Select id, name, 0 As level
From MyTable
Where dependency = 0
And name = 'some value'
Union All
Select T.id, T.name, T.Level + 1
From MyDependents As D
Join MyTable As T
On T.id = D.dependency
)
Select id, name, level
From MyDependents
Another solution which does not rely on common-table expressions but does assume a maximum level of depth (in this case two levels below level 0) would something like
Select T1.id, T1.name, 0 As level
From MyTable As T1
Where T1.name = 'some value'
Union All
Select T2.id, T2.name, 1
From MyTable As T1
Join MyTable As T2
On T2.Id = T1.Dependency
Where T1.name = 'some value'
Union All
Select T3.id, T3.name, 2
From MyTable As T1
Join MyTable As T2
On T2.Id = T1.Dependency
Join MyTable As T3
On T3.Id = T2.Dependency
Where T1.name = 'some value'
Sounds like you want to recursively query your table, for which you will need a Common Table Expression (CTE)
This MSDN article explains CTEs very well. They are confusing at first but surprisingly easy to implement.
BTW this is obviously only for SQL Server, I'm not sure how you'd achieve that in MySQL.
This is the first thing that came to mind. It can be probably done more directly/succinctly, I'll try to dwell on it a little.
SELECT *
FROM table T1
WHERE T1.id >=
(SELECT T2.id FROM table T2 WHERE T2.name = '---NAME HERE---')
AND T1.id <
(SELECT MIN(id)
FROM table T3
WHERE T3.dependency = 0 AND T3.id > T2.id)
If you can estimate a max depth, this works out to something like:
SELECT
COALESCE(t4.field1, t3.field1, t2.field1, t1.field1, t.field1),
COALESCE(t4.field2, t3.field2, t2.field2, t1.field2, t.field2),
COALESCE(t4.field3, t3.field3, t2.field3, t1.field3, t.field3),
....
FROM table AS t
LEFT JOIN table AS t1 ON t.dependency = t1.id
LEFT JOIN table AS t2 ON t1.dependency = t2.id
LEFT JOIN table AS t3 ON t2.dependency = t3.id
LEFT JOIN table AS t4 ON t3.dependency = t4.id
....
This is a wild guess just to be different, but I think it's kind of pretty, anyway. And it's at least as portable as any of the others. But I don't want to look to closely; I'd want to use sensible data, start testing, and check for sensible results.
Hierarchical query will do:
SELECT *
FROM your_table
START WITH id = :id_of_group_header_row
CONNECT BY dependency = PRIOR id
Query works like this:
1. select all rows satisfying START WITH condition (this rows are roots now)
2. select all rows satisfying CONNECT BY condition,
keyword PRIOR means this column's value will be taken from the root row
3. consider rows selected on step 2 to be roots
4. go to step 2 until there are no more rows