MS-Access Select 1 row from GROUP BY query - sql

Always had a hard time wrapping my head around GROUP BY functionality, and this one is no exception.
I have a simple Join query as such
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
This returns my data as expected. Here are some sample rows that illustrate what I am trying to retrieve with Group By...
| t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name |
|---|---|---|---|---|
| 726 | 4506 | 32 | 9/12/2021 | nameA |
| 726 | 4506 | 33 | 9/12/2021 | nameB |
| 726 | 4506 | 30 | 9/13/2021 | nameC |
I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
So, I try the following:
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
For the example Group above, this returns the following record...
| t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name |
|---|---|---|---|---|
| 726 | 4506 | 30 | 9/13/2021 | nameC |
So, Order By operates after the Grouping is done, not before. Or so it seems. Perhaps that is the reason for the order of the SQL keywords (Select, From, Where, Group By, Order By). Ok, that makes sense if my assumption is correct. I think it finds t2.id=30 ahead of the other 726/4506 records because t2.id is the primary key on table2.
So, now I try a nested Query, wherein my first query above returns the data in the correct order and the outside query groups and grabs the first record.
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
FROM (
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
Same results! I am at a loss to understand how this is happening. So, if anyone can shed light on the order of functioning under-the-covers for Access SQL in this instance I would love to know. On my 2nd query (nested Select), it seems as though I am ordering the target data such that after Grouping the FIRST() aggregate function should select the first row found in the inner result set. But that is not happening.
And of course, if anyone can tell me how to return the row I am after ...
| t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name |
|---|---|---|---|---|
| 726 | 4506 | 32 | 9/12/2021 | nameA |
That is all I really need.

I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
You don't want aggregation. You want to filter the data. In this case, a correlated subquery does what you want:
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From (table1 t1 Inner Join
table2 t2
on t1.fld1 = t2.fld1
) Inner Join
table3 t3
on t1.fld2 = t3.fld2
where t2.id = (select top 1 tt2.id
from (table1 tt1 Inner Join
table2 tt2
on tt1.fld1 = tt2.fld1
) Inner Join
table3 tt3
on tt1.fld2 = tt3.fld2
where tt1.g1 = t1.g1 and tt1.g2 = t1.g2
order by tt2.datetime, tt2.id
);
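To see the filtering idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The table contents are made up to reproduce the sample rows from the question, the column is shortened to `dt`, and SQLite's `LIMIT 1` stands in for Access's `TOP 1`, so treat it as an illustration of the technique and not as tested Access SQL.

```python
import sqlite3

# In-memory SQLite database; the data below is invented so that the join
# reproduces the three sample rows from the question, plus a second group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fld1 INTEGER, fld2 INTEGER, g1 INTEGER, g2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, fld1 INTEGER, dt TEXT);
    CREATE TABLE table3 (fld2 INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 10, 726, 4506), (2, 20, 726, 4506),
                              (3, 30, 726, 4506), (4, 40, 800, 1);
    INSERT INTO table2 VALUES (32, 1, '2021-09-12'), (33, 2, '2021-09-12'),
                              (30, 3, '2021-09-13'), (40, 4, '2021-09-10');
    INSERT INTO table3 VALUES (10, 'nameA'), (20, 'nameB'),
                              (30, 'nameC'), (40, 'nameD');
""")

# For each outer row, the correlated subquery finds the id that sorts first
# within the same (g1, g2) group; the outer WHERE keeps only that row.
FIRST_ROW_PER_GROUP = """
    SELECT t1.g1, t1.g2, t2.id, t2.dt, t3.name
    FROM table1 t1
    JOIN table2 t2 ON t1.fld1 = t2.fld1
    JOIN table3 t3 ON t1.fld2 = t3.fld2
    WHERE t2.id = (SELECT tt2.id
                   FROM table1 tt1
                   JOIN table2 tt2 ON tt1.fld1 = tt2.fld1
                   JOIN table3 tt3 ON tt1.fld2 = tt3.fld2
                   WHERE tt1.g1 = t1.g1 AND tt1.g2 = t1.g2
                   ORDER BY tt2.dt, tt2.id
                   LIMIT 1)  -- SQLite spelling; Access would use TOP 1
    ORDER BY t1.g1, t1.g2
"""

rows = conn.execute(FIRST_ROW_PER_GROUP).fetchall()
for row in rows:
    print(row)
```

Because the inner ORDER BY ends in the unique `id`, the subquery can only ever pick one row per group, which is what makes the outer equality filter safe.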

Here is a solution that scales well (6s on 250k recs in t2) and does what I am asking for.
I could not get Gordon's answer to work in Access. Seems like it should have however. And I have my doubts about how well it would perform with 250k recs in t2. I would love to test a solution like Gordon's, if I could figure out how to get Access to take it.
See problem description for an example on exactly which record I am after. I only need t2.id from the result set. This was not stated originally, but I don't see how that changes the problem statement or solution. I could be wrong there. I still need t3.name, but it can be retrieved later using t2.id.
But I still need to pick the record GROUP'd BY t1.g1, t1.g2 that comes first when all records are sorted by t2.dateandtime, t2.id. Or stated another way, amongst all records with the same t1.g1+t1.g2, I need exactly the first record when the group is sorted by "t2.dateandtime, t2.id".
Perhaps I am thinking about this solution to my problem all wrong, and there are better ways to resolve this with SQL; if so, I would love to hear it.
What I seem to have learned is that GROUP BY does group records together, but the grouping loses any concept of individual records; you can only extract other fields through an aggregate function (MIN, MAX, SUM, etc.). Importantly, FIRST does not return the value from a record you can predict, because the ORDER BY clause has not been applied yet.
With all that said, here is my solution that works.
1. I removed reference to the Join on t3 as with t2.id I can retrieve all the other info I need from t3 after the fact, using t2.id.
2. Don't need to select 't1.g1, t1.g2', that is superfluous. I originally thought that any Group By fields had to be specified in the Select clause also.
3. I combine t2.dateandtime and t2.id into a Text field and use MIN to Select the data/record I am after once it is GROUP'd BY. No need to Sort my result set, as the record with the MIN value of t2.dateandtime, then t2.id has been chosen! Thus satisfying my condition and selection of the correct record.
4. Since all I need is t2.id returned for further processing, I extract t2.id from the String built in #3 and convert back to Long data type.
Here is the brief and simple query:
Select
MIN(Format(t2.dateandtime, "yyyymmddhhmmss") & '_' & Format(t2.id, '000000')) as dt_id,
CLNG(MID(dt_id, INSTR(dt_id, '_') + 1)) as id
From
(table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Group By
t1.g1, t1.g2
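The same "pack the sort key into a string and take MIN" trick can be sketched outside Access. The version below is an assumption-laden translation to SQLite via Python's sqlite3: `printf('%06d', ...)` stands in for `Format(t2.id, '000000')`, ISO date strings stand in for the formatted date, the column is shortened to `dt`, and the sample rows are invented to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fld1 INTEGER, g1 INTEGER, g2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, fld1 INTEGER, dt TEXT);
    INSERT INTO table1 VALUES (1, 726, 4506), (2, 726, 4506),
                              (3, 726, 4506), (4, 800, 1);
    INSERT INTO table2 VALUES (32, 1, '2021-09-12'), (33, 2, '2021-09-12'),
                              (30, 3, '2021-09-13'), (40, 4, '2021-09-10');
""")

# Inner query: build "date_zero-padded-id" per row and keep the smallest
# string per group. Fixed-width parts make string order equal to
# (date, id) order. Outer query: cut the id back out after the underscore.
MIN_KEY_PER_GROUP = """
    SELECT g1, g2,
           CAST(substr(dt_id, instr(dt_id, '_') + 1) AS INTEGER) AS id
    FROM (SELECT t1.g1, t1.g2,
                 MIN(t2.dt || '_' || printf('%06d', t2.id)) AS dt_id
          FROM table1 t1
          JOIN table2 t2 ON t1.fld1 = t2.fld1
          GROUP BY t1.g1, t1.g2)
    ORDER BY g1, g2
"""

picked = conn.execute(MIN_KEY_PER_GROUP).fetchall()
print(picked)
```

The approach only holds while every id fits in the padded width (six digits here); a wider id would break the string ordering.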

Related

Finding closest value in another table

I'm quite new to SQL so bear with me. What I'm trying to do is return the value closest to another value in a different table for every record.
I'll show a simplified example of my two tables for clarification
The first table is the one whose ENTRY_VALUE I want matched:
| ID | ENTRY_VALUE |
|---|---|
| 1001 | 1900 |
| 1002 | 2000 |
And the second table:
| ID | ENTRY_VALUE | STATUS |
|---|---|---|
| 1001 | 1880 | SUCCES |
| 1001 | 1930 | FAIL |
| 1001 | 1940 | SUCCES |
| 1002 | 1960 | SUCCES |
| 1002 | 1980 | FAIL |
So the end result I'm looking for is:
| ID | ENTRY_VALUE | STATUS |
|---|---|---|
| 1001 | 1880 | SUCCES |
| 1002 | 1980 | FAIL |
I have currently only managed to link the id's together but can't find a way to compare the ENTRY_VALUE in both tables and return the one closest to the Table1 entry.
So only this:
SELECT * from Table2
INNER JOIN Table1 ON (Table2.ID = Table1.ID)
Once again my bad for the basic question, I have googled right about everything but can't get it to work so any help is very welcome!
First attempt
This is the slower-performing query. It uses a "correlated subquery", so it runs the inner query for each row of the outer query. The strategy is, for each row, to determine the minimum difference we are looking for, and then select only the rows that match it. Such queries can be slow at runtime, although the logic is very clean.
select
a.id,
b.entry_value,
b.[status]
from
Foo a
inner join Bar b
on a.id = b.id
where
abs(a.entry_value - b.entry_value) =
(select min(abs(t1.entry_value-t2.entry_value))
from Foo t1
inner join Bar t2
on t1.id = t2.id
where
t1.id = a.id
group by t1.id)
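As a quick check of this first attempt, here is a small sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module and runs essentially the same query. SQLite is only a stand-in for Access here, and the table names Foo and Bar follow this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Foo (id INTEGER, entry_value INTEGER);
    CREATE TABLE Bar (id INTEGER, entry_value INTEGER, status TEXT);
    INSERT INTO Foo VALUES (1001, 1900), (1002, 2000);
    INSERT INTO Bar VALUES (1001, 1880, 'SUCCES'), (1001, 1930, 'FAIL'),
                           (1001, 1940, 'SUCCES'), (1002, 1960, 'SUCCES'),
                           (1002, 1980, 'FAIL');
""")

# The subquery computes the smallest absolute difference for the current id;
# the outer WHERE keeps the row(s) whose difference equals that minimum.
CLOSEST = """
    SELECT a.id, b.entry_value, b.status
    FROM Foo a
    JOIN Bar b ON a.id = b.id
    WHERE abs(a.entry_value - b.entry_value) =
          (SELECT min(abs(t1.entry_value - t2.entry_value))
           FROM Foo t1
           JOIN Bar t2 ON t1.id = t2.id
           WHERE t1.id = a.id)
    ORDER BY a.id
"""

closest = conn.execute(CLOSEST).fetchall()
print(closest)
```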
Second attempt
If you have many rows (in the tens of thousands, or in any case if the previous query is just too slow), then this next one should perform better. If you run the two inner queries by themselves, you will probably see the strategy of how we are joining them to get the desired result.
select A.Id, A.entry_value, A.[status]
from
(
select t1.id, t2.entry_value, abs(t1.entry_value-t2.entry_value) as diff, t2.[status]
from Foo t1
inner join Bar t2
on t1.id = t2.id
) A
inner join
(
select t3.id, min(abs(t3.entry_value-t4.entry_value)) as diff
from Foo t3
inner join Bar t4
on t3.id = t4.id
group by t3.id
) B
on A.id = B.id
and A.diff = B.diff
Note
I would probably not try to write either of these queries in MSAccess "Design view", although if I had to I am sure I could. Generally, this is a case where I would write the query "by hand" and paste it directly into MSAccess "SQL view".
Caution
Beware that ties will result in two rows! Example:
First table has (1003,2000)
Second table has (1003, 1990, 'success') and (1003, 2010, 'fail')
You will have a result with two rows, one with success and the other with fail (!)
So you really should test with your data and look for such cases that might produce such ties (and decide what to do, if necessary).
Btw...
just for fun, here's how you might go for it in SQL Server.
But I think this will NOT work in MSAccess, unfortunately.
select
T.id,
T.entry_value,
T.[status]
from
(
select
t1.id,
t2.entry_value,
abs(t1.entry_value-t2.entry_value) as diff,
t2.[status],
rank() over (partition by t1.id order by abs(t1.entry_value-t2.entry_value)) as seq
from #Foo t1
inner join #Bar t2
on t1.id = t2.id
) T
where T.seq = 1;
Use a simple subquery to find the minimum offset:
Select
tbl1.ID,
tbl2.ENTRY_VALUE,
tbl2.STATUS
From
tbl1
Inner Join
tbl2 On tbl1.ID = tbl2.ID
Where
Abs([tbl1].[ENTRY_VALUE] - [tbl2].[ENTRY_VALUE]) =
(Select Min(Abs([tbl1].[ENTRY_VALUE] - [T2].[ENTRY_VALUE])) As Offset
From tbl2 As T2
Where T2.ID = tbl1.ID);
Output:
| ID | ENTRY_VALUE | STATUS |
|---|---|---|
| 1001 | 1880 | SUCCES |
| 1002 | 1980 | FAIL |
Note that if the minimum offset for an ID occurs twice, both records having this offset will be returned. Thus, you may have to aggregate the output.

How to join two SQL tables by extracting maximum numbers from one then into another?

As others have commented, I'm now going to add some code:
Imported tables
table3
Case No. is the primary key. Each report date shows one patient. Depending on if the patient is import or local, the cumulative column increases. You can see some days there are no cases so the date like 25/01/2020 is skipped
table2
Report date has no duplicate.
Now, I want to join the tables. Example outcome here: [image: example outcome table]
The maximum cumulative of each date is joined into the new table. So although 26/01/2020 of table3 shows the increase from 6, 7, to 8, I only want the highest cumulative number there.
Thanks for letting me know how my previous query could be improved. Your opinion helps me a lot.
I have tried Gordon Linoff's by substituting the actual names (which I initially omitted because I thought they were ambiguous).
His code is as follows (I've upvoted):
SELECT t3.`Report date`,
max(max(t3.cumulative_local)) over (order by t3.`Report date`),
max(max(t3.cumulative_import)) over (order by t3.`Report date`)
from table3 t3 left join
table2 t2
using (`Report date`)
group by t2.`Report date`;
But I got an error
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'new.t3.Report date' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Anyways I am now experimenting. Both answers helped. If you know how to fix 1055, let me know, or if you could propose another solution. Thanks
I think you just want aggregation and window functions:
select t1.date,
max(max(cumulativea)) over (order by t1.date),
max(max(cumulativeb)) over (order by t1.date)
from table1 t1 left join
table2 t2
on t1.date = t2.date
group by t1.date;
This returns the maximum values of the two columns up to each date, which is, I think, what you are trying to describe.
I don't understand why you have cumulA and cumulB on table1. I suppose they are meant to store the max cumulA and cumulB for each day.
You must first self-join table2 to find the max for each date (with a GROUP BY date):
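SELECT t2.id, t2.date, td.cA
FROM t2
JOIN (
-- one row per date holding that date's maximum cumulA
SELECT date AS d2, MAX(cumulA) AS cA
FROM t2
GROUP BY date
) AS td
-- match on date and value, so no non-aggregated id is needed in the subquery
ON t2.date = td.d2
AND t2.cumulA = td.cA
ORDER BY t2.date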
Next, you LEFT JOIN table1 to the result of the self-join on table2, so that every day appears.
SELECT * FROM `t1` LEFT JOIN t2 ON t1.date = t2.date ORDER BY t1.date
Here are the two joins combined:
SELECT * FROM `t1` LEFT JOIN (
SELECT t2.id, t2.date, td.cA
FROM t2
JOIN (
-- one row per date holding that date's maximum cumulA
SELECT date AS d2, MAX(cumulA) AS cA
FROM t2
GROUP BY date
) AS td
ON t2.date = td.d2
AND t2.cumulA = td.cA
) AS tt
ON t1.date = tt.date ORDER BY t1.date
You do the same for cumulB.
After that (I suppose), you INSERT the result into table1.
I hope I answered your question.
Good luck.
_Teddy_

Is it possible to attach table alias to column names to figure out where columns are coming from?

I have a query that I'm trying to rework that has over 1,000 columns when I select * FROM several tables. I want to know if there is a way in SQL to tag the column alias with the table alias so I can tell which table each column comes from. It looks like the following:
SELECT *
FROM table1 t1
join table2 t2
join table3 t3
join table4 t4
Current column output:
id, id, id, id, name, name, name, name, order, order, order, order
Desired Column output:
t1.id, t1.name, t1.order, t2.id, t2.name, t2.order,t3.id, t3.name, t3.order, t4.id, t4.name, t4.order
This is a very simple example, but you can imagine trying to fish the column you need out of a sea of 1,000 columns while figuring out what table it came from! Any ideas??
I'm not aware of a way to prefix each column with the table alias. However, I do know how you could easily break the columns into groups that would allow you to figure out which table each column comes from.
SELECT 'T1' as [Table1]
, t1.*
, 'T2' as [Table2]
, t2.*
, 'T3' as [Table3]
, t3.*
, 'T4' as [Table4]
, t4.*
FROM table1 t1
join table2 t2
join table3 t3
join table4 t4
This would break out the columns into groups by table, and it would place a little bookmark column before each group to help you see where the columns are coming from.
I know this is not exactly what you asked for, but I believe it would help you a lot in figuring out what comes from which table.
Your other option, as others have said, is specifying the prefix on every column, which it sounds like you don't want to do. However, it can be a lot quicker if you drag the columns from the Object Explorer and use ALT+SHIFT to add the prefix to each column.
Here's an article about copying columns from object explorer - https://www.qumio.com/Blog/Lists/Posts/Post.aspx?ID=56
Here's an article about adjusting code using ALT+SHIFT - https://blogs.msdn.microsoft.com/sql_pfe_blog/2017/04/11/quick-tip-shiftalt-for-multiple-line-edits/
The first method would take less than a minute; the 2nd method I could see taking less than 10 minutes even for 1,000 columns.
You have to assign non-default column aliases manually:
select t1.id as t1_id, t1.name as t1_name, t1.order as t1_order,
t2.id as t2_id, t2.name as t2_name, t2.order as t2_order,
. . .
You might find that a spreadsheet or query can help, if you have a lot of columns.
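As one way of letting a query do the typing, the sketch below generates the aliased column list from the table metadata. It uses Python's sqlite3 and SQLite's `PRAGMA table_info` purely as a stand-in; on SQL Server the analogous source of column names would be `INFORMATION_SCHEMA.COLUMNS`. The helper name `aliased_select_list`, the sample tables, and the join condition are all hypothetical.

```python
import sqlite3

def aliased_select_list(conn, tables):
    """Build 'alias."col" AS "alias_col"' for every column of every table.

    `tables` maps a table alias to a table name, e.g. {"t1": "table1"}.
    """
    parts = []
    for alias, table in tables.items():
        # Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
        for row in conn.execute(f"PRAGMA table_info({table})"):
            column = row[1]
            parts.append(f'{alias}."{column}" AS "{alias}_{column}"')
    return ", ".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, "order" INTEGER);
    CREATE TABLE table2 (id INTEGER, name TEXT, "order" INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 7);
    INSERT INTO table2 VALUES (1, 'b', 9);
""")

select_list = aliased_select_list(conn, {"t1": "table1", "t2": "table2"})
# The join condition is an assumption; the question leaves it out.
cursor = conn.execute(
    f"SELECT {select_list} FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id"
)
column_names = [d[0] for d in cursor.description]
result_rows = cursor.fetchall()
print(column_names)
```

Quoting every generated name also takes care of columns such as `order` that collide with SQL keywords.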
Some products may have exceptions, but generally no, you can't do that. You either have to use wildcards (SELECT *) or specify the columns you wish returned by full and complete name.
If you specify columns, you can "alias" them, that is, set the column name to something other than the source name. For example (pseudo-code, leaving out the "ON" clause):
SELECT
T1.Id as T1_Id
,T2.Id as T2_Id
from table1 T1
join table2 T2
Note that you can combine table aliases with wildcards. For example:
SELECT
T2.*
from table1 T1
join table2 T2
join table3 T3
join table4 T4
will return all the columns from table2, and only from table2. This might help in revising your query by getting a list of the available columns in each table.

SQL aggregate function returning inflated values on joined table

I'm racking my brain over where I'm going wrong here.
The following query:
SELECT SUM(table1.col1) FROM table1
returns value x.
And the following query:
SELECT SUM(table1.col1) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
returns value y. (I need the Join for the other data of table2). Why is the 2nd example returning a different value than in the first?
Make life easier on yourself, your colleagues that will support your code, and your clients by temporarily ignoring the existence of RIGHT OUTER JOIN. Use Table1 as the "from table" instead of table2.
Then, If aggregating, you will often find it necessary to do this BEFORE joining, so that the numbers are accurate. e.g.
SELECT T1.SUMCOL1
FROM (
SELECT id, SUM(col1) as SUMCOL1 FROM Table1 GROUP BY id
) T1
LEFT OUTER JOIN table2 T2 on T1.id = T2.ID
The obvious answer is that table2 is many to table1's one. That is, there are multiple rows in table2 for one id in table1, so each matching table1 row (and its col1) is repeated in the joined result. A RIGHT OUTER JOIN onto table1 keeps every table1 row, so rows are not dropped here; an INNER JOIN would additionally drop table1 rows whose id isn't present in table2.
Compare:
SELECT COUNT(*) FROM table1
To:
SELECT COUNT(*) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
If the second count is higher, you're aggregating duplicates.
If you want to avoid this, you'll need to use a subquery.
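To make the inflation concrete, here is a small sketch with invented data, run through Python's sqlite3. It writes the join as a LEFT OUTER JOIN from table1, which is equivalent to the question's RIGHT OUTER JOIN onto table1, and then shows one subquery-based fix: collapsing table2 to one row per ID before joining. The `other` column and the `n` count are placeholders for whatever you actually need from table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, col1 INTEGER);
    CREATE TABLE table2 (ID INTEGER, other TEXT);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 5);
    -- ID 1 appears twice in table2, ID 3 not at all
    INSERT INTO table2 VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Value x: the sum over table1 alone.
plain = scalar("SELECT SUM(col1) FROM table1")

# Value y: the row for ID 1 is repeated once per matching table2 row.
inflated = scalar("""
    SELECT SUM(t1.col1)
    FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t2.ID = t1.ID
""")

# Fix: reduce table2 to one row per ID first, so the join cannot fan out.
fixed = scalar("""
    SELECT SUM(t1.col1)
    FROM table1 t1
    LEFT OUTER JOIN (SELECT ID, COUNT(*) AS n
                     FROM table2
                     GROUP BY ID) t2 ON t2.ID = t1.ID
""")

print(plain, inflated, fixed)
```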

Many SQL queries vs 1 complex query

I have a database with two tables that are similar to:
table1
------
Id : long, auto increment
Title : string(50)
ParentId : long
table2
------
Id : long, auto increment
FirstName : string(20)
LastName : string(30)
Zip : string(5)
table2 has a one-to-many relationship with table1 where many includes zero.
I also have the following query (that works correctly, so ignore typos and the like, it is an example):
SELECT t1.Id AS tid, t1.Title, t2.Id AS oid, t2.FirstName, t2.LastName
FROM table t1
INNER JOIN table2 t2 ON t1.ParentId = t2.Id
WHERE t2.Id IN
(SELECT Id FROM table2
WHERE Zip IN ('zip1', 'zip2', 'etc'))
ORDER BY t2.Id DESC
The query finds all items in table1 that belong to a person in table2, where the person is in one of the listed zip codes.
The problem I have now is: I want to show all the users (with their items if available) in the listed zip codes, not just the ones with items.
So, I am wondering, should I just do something simple with a lot more queries, like:
SELECT Id AS oid, FirstName, LastName FROM table2 WHERE Zip in ('zip1', 'zip2', 'etc')
foreach(result) {
SELECT Id AS tid, Title FROM table1 WHERE ParentId = oid
}
Or should I come up with a more elaborate single SQL statement? And if so, can I get a little help? Thanks!
If I understand correctly, changing your INNER JOIN to a RIGHT JOIN should return all users regardless of whether they have an item or not, the item columns will just be null for those that don't.
Look into Right Joins and Group By. That will most likely get you the query you are after.
I agree with (and have upvoted) @Lee D and @Bueller. However, I generally advocate LEFT OUTER JOINs, because I find it easier to conceptualize what's going on with them, particularly when you are joining three or more tables. Consider it like so:
Start with what you know you want in the final result set:
FROM table2 t2
and then add in the "optional" data.
FROM table2 t2
left outer join table1 t1
on t1.ParentId = t2.Id
Whether or not matches are found, whatever gets selected from table2 will always appear.
In general, you should prefer the "many queries" approach if (and only if)
- it gets you simpler code in total
- it is fast enough (which you should find out by testing)
In this case, I suspect, both conditions may not apply.
You should come up with a more elaborate single SQL statement and then process the results with your favorite programming language.
What you've described is called an N + 1 query. You have 1 initial query that returns N results, then 1 query for each of your N results. If N is small, the performance difference may not be noticeable - but there will be a larger and larger performance hit as N grows.
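The difference can be sketched with the two tables from the question. The example below uses Python's sqlite3 with invented rows and zip codes; the function names are hypothetical. Both functions return the same mapping of person Id to item titles, along with the number of queries they issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER PRIMARY KEY, Title TEXT, ParentId INTEGER);
    CREATE TABLE table2 (Id INTEGER PRIMARY KEY, FirstName TEXT,
                         LastName TEXT, Zip TEXT);
    INSERT INTO table2 VALUES (1, 'Ann', 'Lee', '11111'),
                              (2, 'Bob', 'Ray', '22222'),
                              (3, 'Cy',  'Poe', '33333');
    INSERT INTO table1 VALUES (1, 'Book', 1), (2, 'Pen', 1);
""")

zips = ("11111", "22222")
marks = ",".join("?" * len(zips))  # one placeholder per zip code

def items_n_plus_one():
    """1 query for the people, then 1 more query per person found."""
    people = conn.execute(
        f"SELECT Id FROM table2 WHERE Zip IN ({marks}) ORDER BY Id", zips
    ).fetchall()
    queries = 1
    result = {}
    for (oid,) in people:
        titles = conn.execute(
            "SELECT Title FROM table1 WHERE ParentId = ? ORDER BY Id", (oid,)
        ).fetchall()
        queries += 1
        result[oid] = [title for (title,) in titles]
    return result, queries

def items_single_query():
    """One LEFT OUTER JOIN; people without items come back with NULL titles."""
    rows = conn.execute(
        f"""SELECT t2.Id, t1.Title
            FROM table2 t2
            LEFT OUTER JOIN table1 t1 ON t1.ParentId = t2.Id
            WHERE t2.Zip IN ({marks})
            ORDER BY t2.Id, t1.Id""",
        zips,
    ).fetchall()
    result = {}
    for oid, title in rows:
        result.setdefault(oid, [])
        if title is not None:
            result[oid].append(title)
    return result, 1

print(items_n_plus_one())
print(items_single_query())
```

With two matching people the loop version already issues three queries, and that count grows with every additional person, while the joined version stays at one.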
If I understand correctly, I think you are looking for something like this
SELECT t1.Id AS tid, t1.Title, t2.Id AS oid, t2.FirstName, t2.LastName
FROM table1 t1
RIGHT OUTER JOIN table2 t2 ON t1.ParentId = t2.Id
WHERE t2.Zip IN ('zip1', 'zip2', 'etc')
ORDER BY t2.Id DESC
RIGHT OUTER will give you all the rows in table2 even if they don't match in table1. You can have multiple conditions on your JOIN, but a condition on table2 (the preserved side) placed in the ON clause does not filter table2's rows, so the zip filter belongs in WHERE.