SQL String: Sum of maximum of distinct values (group by)


Assume the following table:
ID COMPANY SUBSIDIARY NR_LIVES INSURANCE_LINE FACTOR_CALC
1 COMPANY_X SUB_1 860 LIFE YES
2 COMPANY_X SUB_1 860 DISABILITY YES
3 COMPANY_X SUB_1 860 MEDICAL YES
4 COMPANY_X SUB_2 46 LIFE YES
5 COMPANY_X SUB_2 689 MEDICAL YES
6 COMPANY_X SUB_3 852 LIFE YES
I need an SQL statement that returns the value 2401.
That value is the sum of the highest NR_LIVES per subsidiary where FACTOR_CALC = YES (860 + 689 + 852 = 2401).
I could probably do it by loading everything into a recordset and then processing it in VBA, but I would prefer a single SQL statement if possible.
UPDATE:
The current query:
sSQL_Select = "SELECT SUM(NR_LIVES) FROM (SELECT SUBSIDIARY, MAX(NR_LIVES) FROM T_WILMA WHERE PARENT=" & lParent & " AND ACC_YEAR=" & lAcc_Year & _
" AND FACTOR_CALCULATION=TRUE GROUP BY SUBSIDIARY);"
throws an error: Too few parameters, expected 1.
The subquery on its own works as expected.
Thanks for the replies so far, but I haven't managed to make it work yet.

You can determine the maximum per subsidiary in a subquery. The outer query can then sum the maximums.
select sum(MaxLives)
from (
select company
, subsidiary
, max(nr_lives) as MaxLives
from YourTable
where factor_calc = 'yes'
group by
company
, subsidiary
) as SubQueryAlias
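The sketch below checks the subquery approach against the sample table from the question. It uses Python's built-in sqlite3 module as a stand-in for Access. Some details are assumptions made for the example: the table name `t`, the lower-case column names, and FACTOR_CALC stored as the text 'YES'.

```python
import sqlite3

# In-memory copy of the sample table from the question (SQLite stands in for Access).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, company TEXT, subsidiary TEXT, "
    "nr_lives INTEGER, insurance_line TEXT, factor_calc TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "COMPANY_X", "SUB_1", 860, "LIFE", "YES"),
        (2, "COMPANY_X", "SUB_1", 860, "DISABILITY", "YES"),
        (3, "COMPANY_X", "SUB_1", 860, "MEDICAL", "YES"),
        (4, "COMPANY_X", "SUB_2", 46, "LIFE", "YES"),
        (5, "COMPANY_X", "SUB_2", 689, "MEDICAL", "YES"),
        (6, "COMPANY_X", "SUB_3", 852, "LIFE", "YES"),
    ],
)

# Inner query: one row per (company, subsidiary) holding its maximum.
# Outer query: add those maxima together.
SUM_OF_MAX = """
SELECT SUM(max_lives)
FROM (
    SELECT company, subsidiary, MAX(nr_lives) AS max_lives
    FROM t
    WHERE factor_calc = 'YES'
    GROUP BY company, subsidiary
) AS sub
"""
total = conn.execute(SUM_OF_MAX).fetchone()[0]
print(total)  # 860 + 689 + 852 = 2401
```

Grouping by company as well as subsidiary only matters if the same subsidiary name could appear under two companies.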

I suggest you add some aliasing to see whether that helps the database engine resolve the column names.
sSQL_Select = "SELECT SUM(sub.MaxOfNR_LIVES) AS NR_LIVES" & vbcrlf & _
"FROM (" & vbCrLf & _
"SELECT SUBSIDIARY, MAX(NR_LIVES) AS MaxOfNR_LIVES" & vbCrLf & _
"FROM T_WILMA WHERE PARENT=" & lParent & _
" AND ACC_YEAR=" & lAcc_Year & _
" AND FACTOR_CALCULATION=TRUE GROUP BY SUBSIDIARY) AS sub;"
Debug.Print sSQL_Select

SELECT SUM(NR_LIVES)
from(
SELECT SUBSIDIARY,MAX(NR_LIVES) as NR_LIVES
from <Table>
where FACTOR_CALC='YES'
group by SUBSIDIARY)a

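You need to tell the database engine which NR_LIVES it should add up. In your query, the MAX(NR_LIVES) column of the subquery has no alias, so the outer SUM(NR_LIVES) refers to a column the subquery does not return. On your table (after taking out the extra WHERE conditions that are not in your example), this returns 2401: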
SELECT Sum(MAXNR_LIVES) AS Expr1
FROM (SELECT SUBSIDIARY, MAX(NR_LIVES) AS MAXNR_LIVES FROM T_WILMA
WHERE FACTOR_CALC=TRUE GROUP BY SUBSIDIARY);
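To see why the alias matters, here is a small sketch that again uses Python's sqlite3 as a stand-in for Access. SQLite does not expose the unaliased `MAX(nr_lives)` column under the name `nr_lives`, so the outer query fails with an "unknown column" error. Access, by contrast, treats an unresolved name as a parameter, which is the likely source of "Too few parameters, expected 1". Storing FACTOR_CALC as 1 to mimic a Yes/No field is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_wilma (subsidiary TEXT, nr_lives INTEGER, factor_calc INTEGER)")
conn.executemany(
    "INSERT INTO t_wilma VALUES (?, ?, ?)",
    [("SUB_1", 860, 1), ("SUB_1", 860, 1), ("SUB_1", 860, 1),
     ("SUB_2", 46, 1), ("SUB_2", 689, 1), ("SUB_3", 852, 1)],
)

# Without an alias, the derived table has no column called nr_lives,
# so the outer SUM(nr_lives) cannot be resolved.
UNALIASED = """
SELECT SUM(nr_lives)
FROM (SELECT subsidiary, MAX(nr_lives) FROM t_wilma
      WHERE factor_calc = 1 GROUP BY subsidiary)
"""
try:
    conn.execute(UNALIASED)
    unaliased_failed = False
except sqlite3.OperationalError as exc:
    unaliased_failed = True
    print("unaliased:", exc)

# Naming the aggregate gives the outer query something to refer to.
ALIASED = """
SELECT SUM(max_nr_lives)
FROM (SELECT subsidiary, MAX(nr_lives) AS max_nr_lives FROM t_wilma
      WHERE factor_calc = 1 GROUP BY subsidiary)
"""
aliased_total = conn.execute(ALIASED).fetchone()[0]
print("aliased:", aliased_total)
```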

Related

Extract data from warehouse stock, group by article but, if the description changed, use the last one used

I have an .xls file whose first sheet holds the records of my goods stock. I extract this data each year and append the result after the last row of the same table.
I am trying to compare the last 3 years. The article codes are UNIQUE, but some descriptions have changed.
Is there a way to group by article code and, if the description has changed, use the last one found?
This is the file.
This is my query (I use ADODB.Connection):
SELECT tab1.[Codice Articolo], tab1.Descrizione,
Sum(IIf([anno]=2017,[qta],0)) AS Qta2017, Sum(IIf([anno]=2017,[Tot],0)) AS Val2017, Sum(IIf([anno]=2017,[€Pz],0)) AS Cad2017,
Sum(IIf([anno]=2018,[Qta],0)) AS Qta2018, Sum(IIf([anno]=2018,[Tot],0)) AS Val2018, Sum(IIf([anno]=2018,[€pz],0)) AS Cad2018,
Sum(IIf([anno]=2019,[Qta],0)) AS Qta2019, Sum(IIf([anno]=2019,[Tot],0)) AS Val2019, Sum(IIf([anno]=2019,[€pz],0)) AS Cad2019
FROM tab1
GROUP BY tab1.[Codice Articolo], tab1.Descrizione;
This is the query result,
and this is the result I hope to get:
I think I need to join the table to itself. I tried some variants of this code, without success:
SELECT t1.[Codice Articolo], t2.Descrizione, (IIf(t1.[anno]=2017,t1.[qta],0)) AS Qta2017, Sum(IIf(t1.[anno]=2017,t1.[Tot],0)) AS Val2017, Sum(IIf(t1.[anno]=2017,t1.[€Pz],0)) AS Cad2017,
Sum(IIf(t1.[anno]=2018,t1.[Qta],0)) AS Qta2018, Sum(IIf(t1.[anno]=2018,t1.[Tot],0)) AS Val2018, Sum(IIf(t1.[anno]=2018,t1.[€pz],0)) AS Cad2018,
Sum(IIf(t1.[anno]=2019,t1.[Qta],0)) AS Qta2019, Sum(IIf(t1.[anno]=2019,t1.[Tot],0)) AS Val2019, Sum(IIf(t1.[anno]=2019,t1.[€pz],0)) AS Cad2019
FROM tab1 t1
LEFT JOIN (Select t2.Descrizione from Tab1 t2 on t2.anno = max(t1.anno)
LEFT JOIN
GROUP BY tab1.[Codice Articolo], t2.Descrizione;
Is this the right approach with an error in the code, or is the approach itself wrong?
Thanks for any suggestions.

The problem is with the description, whose value is sometimes equal to the article code, and sometimes not. This creates new groups, which is not what you want.
Here is one way to avoid that: group by the article code only, and take the description from the rows where it differs from the code.
SELECT
tab1.[Codice Articolo],
Max(IIf(tab1.[Codice Articolo] <> tab1.Descrizione, tab1.Descrizione, Null)) AS DescrArticolo,
Sum(IIf([anno]=2017,[qta],0)) AS Qta2017,
Sum(IIf([anno]=2017,[Tot],0)) AS Val2017,
...,
Sum(IIf([anno]=2019,[€pz],0)) AS Cad2019
FROM tab1
GROUP BY tab1.[Codice Articolo];
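A minimal sketch of the answer's idea, run in SQLite through Python's sqlite3. CASE replaces Access's IIf, and the article data is invented for illustration. Grouping by code and description splits an article whose description sometimes equals its code. Grouping by code alone merges those rows, and MAX over the filtered description picks the real one because MAX ignores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (codice_articolo TEXT, descrizione TEXT, anno INTEGER, qta INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?, ?)",
    [
        ("A1", "A1", 2017, 5),        # description equal to the article code
        ("A1", "Bolt M6", 2018, 7),
        ("A1", "Bolt M6", 2019, 3),
        ("B2", "Nut M6", 2017, 10),
        ("B2", "Nut M6", 2019, 4),
    ],
)

# Grouping by code AND description splits A1 into two groups.
old_groups = conn.execute(
    "SELECT COUNT(*) FROM (SELECT codice_articolo, descrizione FROM tab1 "
    "GROUP BY codice_articolo, descrizione)"
).fetchone()[0]

# Grouping by code only; MAX skips the NULLs produced for code-like descriptions.
PIVOT = """
SELECT codice_articolo,
       MAX(CASE WHEN codice_articolo <> descrizione THEN descrizione END) AS descr_articolo,
       SUM(CASE WHEN anno = 2017 THEN qta ELSE 0 END) AS qta2017,
       SUM(CASE WHEN anno = 2018 THEN qta ELSE 0 END) AS qta2018,
       SUM(CASE WHEN anno = 2019 THEN qta ELSE 0 END) AS qta2019
FROM tab1
GROUP BY codice_articolo
ORDER BY codice_articolo
"""
rows = conn.execute(PIVOT).fetchall()
print(old_groups)
for row in rows:
    print(row)
```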

I was sure that the solution to my problem involved a JOIN query; I couldn't see it because I needed two JOINs.
This is the solution I found:
strsql = "SELECT t1.[Codice Articolo], t2.Descrizione," & _
"Sum(IIf(t1.[Esercizio]=2017,t1.[NrPz],0)) AS Qta2017," & _
"Sum(IIf(t1.[Esercizio]=2017,t1.[€Tot],0)) AS Val2017," & _
"Sum(IIf(t1.[Esercizio]=2017,t1.[€Pz],0)) AS Cad2017," & _
"Sum(IIf(t1.[Esercizio]=2018,t1.[NrPz],0)) AS Qta2018," & _
"Sum(IIf(t1.[Esercizio]=2018,t1.[€Tot],0)) AS Val2018," & _
"Sum(IIf(t1.[Esercizio]=2018,t1.[€pz],0)) AS Cad2018," & _
"Sum(IIf(t1.[Esercizio]=2019,t1.[NrPz],0)) AS Qta2019," & _
"Sum(IIf(t1.[Esercizio]=2019,t1.[€Tot],0)) AS Val2019," & _
"Sum(IIf(t1.[Esercizio] = 2019, t1.[€pz], 0)) As Cad2019 " & _
"FROM ([temp$] t1 INNER JOIN [temp$] AS t2 ON t1.[Codice Articolo] = t2.[Codice Articolo]) INNER JOIN (SELECT Max([Esercizio]) AS maxdiEsercizio, [Codice Articolo] FROM [temp$] GROUP BY [Codice Articolo]) t3 ON (t2.[Codice Articolo] = t3.[Codice Articolo]) AND (t2.Esercizio = t3.maxdiEsercizio) " & _
"GROUP BY t1.[Codice Articolo], t2.Descrizione;"
Thanks.
fabrizio
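The self-join solution can be sketched the same way. This is an SQLite translation run through Python's sqlite3, with made-up data. The column names codice_articolo, descrizione, anno and qta are assumptions standing in for the sheet's columns. In the query, t3 finds the latest year per article, t2 supplies that year's description, and t1 supplies the rows being summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (codice_articolo TEXT, descrizione TEXT, anno INTEGER, qta INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        ("A1", "Bolt M6 zinc", 2017, 5),
        ("A1", "Bolt M6", 2018, 7),
        ("A1", "Bolt M6 steel", 2019, 3),   # latest description for A1
        ("B2", "Nut M6", 2017, 10),
        ("B2", "Nut M6 DIN934", 2018, 4),   # latest description for B2
    ],
)

LATEST_DESCRIPTION = """
SELECT t1.codice_articolo, t2.descrizione,
       SUM(CASE WHEN t1.anno = 2017 THEN t1.qta ELSE 0 END) AS qta2017,
       SUM(CASE WHEN t1.anno = 2018 THEN t1.qta ELSE 0 END) AS qta2018,
       SUM(CASE WHEN t1.anno = 2019 THEN t1.qta ELSE 0 END) AS qta2019
FROM stock AS t1
JOIN stock AS t2 ON t1.codice_articolo = t2.codice_articolo
JOIN (SELECT codice_articolo, MAX(anno) AS max_anno
      FROM stock GROUP BY codice_articolo) AS t3
  ON t2.codice_articolo = t3.codice_articolo AND t2.anno = t3.max_anno
GROUP BY t1.codice_articolo, t2.descrizione
ORDER BY t1.codice_articolo
"""
rows = conn.execute(LATEST_DESCRIPTION).fetchall()
for row in rows:
    print(row)
```

This assumes one row per article per year. If an article could have several rows in its latest year, the join to t2 would multiply the sums.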

Count instances of consecutive dates for associated name (VBA, SQL)

Good morning all,
I am trying to determine instances of consecutive dates (excluding Sunday) from a data set. The data is stored in Access and I am pulling the required dates into Excel. I am then trying to determine how many instances each person has in the data provided. Example below.
Data example:
| Name | Date of absence|
| Bob | 02/01/17 |
| Jill | 02/01/17 |
| Bob | 03/01/17 |
| Jill | 04/01/17 |
Result example:
Bob - 1 instance, 2 days
Jill - 2 instances, 2 days
I started working through this with VBA in Excel, using loops to go through each absence until every person had been ticked off. However, the code was becoming really cumbersome and felt very inefficient, not to mention how slow it got on larger data sets! I wonder whether it is possible to query the database for this information, or to write something more efficient.
Any help or suggestions would be appreciated!
Update:
Testing Tom's suggestion:
Sql = "SELECT Absence.Racf,count(RecordDate) as dups"
Sql = Sql & " FROM Absence"
Sql = Sql & " left outer join"
Sql = Sql & " (select Racf, [RecordDate]+IIf(Weekday([RecordDate],7)=1,2,1) as date1 from Absence) t1"
Sql = Sql & " on Absence.RecordDate=t1.date1 and Absence.Racf=t1.Racf"
Sql = Sql & " where date1 Is Not Null"
Sql = Sql & " group by Absence.Racf"
But unfortunately, for the list of dates below, it returns 7 instead of 5.
Dates:
23-Feb-16,24-Feb-16,08-Aug-16,09-Aug-16,10-Aug-16,31-Aug-16,24-Oct-16,25-Oct-16,26-Oct-16,25-Jan-17,26-Jan-17,27-Jan-17

So this is how the SQL might look in an Access query:
SELECT table1.name,count(date) as dups
FROM Table1
left outer join
(select name, [date]+IIf(Weekday([Date],7)=1,2,1) as date1 from table1) t1
on table1.date=t1.date1 and table1.name=t1.name
where date1 is not null
group by table1.name
;
If you want to run this from Excel using a macro, here is a useful reference.
I lifted the code from there and changed the lines which set up the SQL query string to
SQL = "SELECT table1.name,count(date) as dups"
SQL = SQL & " FROM table1"
SQL = SQL & " left outer join"
SQL = SQL & " (select name, [date]+IIf(Weekday([Date],7)=1,2,1) as date1 from table1) t1"
SQL = SQL & " on table1.date=t1.date1 and table1.name=t1.name"
SQL = SQL & " where date1 Is Not Null"
SQL = SQL & " group by table1.name"
and it worked fine.
Try this if you want to count sequences of more than one consecutive date:
SELECT Absence.Racf, Count(Absence.RecordDate) AS CountOfRecordDate
FROM (Absence LEFT JOIN (select Racf, RecordDate+IIf(Weekday([RecordDate],7)=1,2,1) as RecordDate1 from Absence) AS t1 ON (Absence.RecordDate = t1.RecordDate1) AND (Absence.Racf = t1.Racf))
LEFT JOIN (select Racf, [RecordDate]-IIf(Weekday([RecordDate],2)=1,2,1) as RecordDate2 from Absence) AS t2 ON (Absence.RecordDate = t2.RecordDate2) AND (Absence.Racf = t2.Racf)
WHERE (((t1.RecordDate1) Is Not Null) AND ((t2.RecordDate2) Is Null))
GROUP BY Absence.Racf;
Or this if you want to count sequences of one or more consecutive dates:
SELECT Absence.Racf, Count(Absence.RecordDate) AS CountOfRecordDate
FROM Absence LEFT JOIN (select Racf, [RecordDate]+IIf(Weekday([RecordDate],7)=1,2,1) as RecordDate2 from Absence) AS t2 ON (Absence.RecordDate = t2.RecordDate2) AND (Absence.Racf = t2.Racf)
WHERE (((t2.RecordDate2) Is Null))
GROUP BY Absence.Racf;
Add these to the SQL string as before.
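The queries above rely on one idea. A date starts a new instance when the same person has no absence on the previous working day, with Sunday skipped so that Saturday is followed by Monday. Here is a sketch in SQLite through Python's sqlite3. It uses correlated EXISTS checks instead of the LEFT JOINs, an equivalent formulation rather than the answer's exact SQL. Some data is hypothetical: the person "X" holds the asker's date list, and "Ann" has a Saturday/Monday pair. Bob and Jill come from the question, reading 02/01/17 as 2 January 2017.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absence (racf TEXT, record_date TEXT)")  # ISO 'YYYY-MM-DD' text

# The asker's date list, filed under a hypothetical person "X".
asker_dates = [
    "2016-02-23", "2016-02-24", "2016-08-08", "2016-08-09", "2016-08-10",
    "2016-08-31", "2016-10-24", "2016-10-25", "2016-10-26",
    "2017-01-25", "2017-01-26", "2017-01-27",
]
conn.executemany("INSERT INTO absence VALUES ('X', ?)", [(d,) for d in asker_dates])
conn.executemany(
    "INSERT INTO absence VALUES (?, ?)",
    [
        ("Bob", "2017-01-02"), ("Jill", "2017-01-02"),  # example from the question
        ("Bob", "2017-01-03"), ("Jill", "2017-01-04"),
        ("Ann", "2017-01-07"), ("Ann", "2017-01-09"),   # hypothetical Saturday -> Monday
    ],
)

# Next working day: +2 days after a Saturday (strftime '%w' = '6'), else +1 day.
NEXT_DAY = "date({d}, CASE WHEN strftime('%w', {d}) = '6' THEN '+2 days' ELSE '+1 day' END)"
HAS_PRED = (
    "EXISTS (SELECT 1 FROM absence p WHERE p.racf = a.racf AND "
    + NEXT_DAY.format(d="p.record_date") + " = a.record_date)"
)
HAS_SUCC = (
    "EXISTS (SELECT 1 FROM absence s WHERE s.racf = a.racf AND s.record_date = "
    + NEXT_DAY.format(d="a.record_date") + ")"
)

def counts(condition):
    """Count absence rows per person that satisfy the given SQL condition."""
    sql = f"SELECT a.racf, COUNT(*) FROM absence a WHERE {condition} GROUP BY a.racf"
    return dict(conn.execute(sql).fetchall())

with_pred = counts(HAS_PRED)                          # days that continue a run
starts = counts("NOT " + HAS_PRED)                    # one row per instance
long_runs = counts(f"NOT {HAS_PRED} AND {HAS_SUCC}")  # instances longer than one day
print(with_pred["X"], starts["X"], long_runs["X"])    # 7 5 4
print(starts["Bob"], starts["Jill"], starts["Ann"])   # 1 2 1
```

Counting rows that have a predecessor reproduces the 7 from the question. Counting rows without one gives the 5 instances. Duplicate dates for the same person would inflate the counts, so deduplicate first if that can happen.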

This can be done using array formulas in Excel. In column D I have
=INDEX($A2:$A$15,MATCH(0,COUNTIF($D$1:$D1,$A2:$A$15),0))
to get the unique employees, then in column E I have the following to count the instances:
=SUM(--(($A$1:$A$15=D1)*(OFFSET($A$1:$A$15,1,0)=D1)*(OFFSET($B$1:$B$15,1,0)-$B$1:$B$15)=1))
which gives a result like this. You'll need to add another criterion based on the weekday (I will adjust this a little later, as I'm running low on time). This relies on the data being sorted by date.
EDIT: I understand this is not the full answer and will require modification; treat it as a starting point :o)
Covering the Sunday absence (this still needs a weekday check):
=D1 & " " & COUNTIF($A$1:$A$15,D1) &" instances " & SUM(--(--($A$1:$A$15=D1)*--(OFFSET($A$1:$A$15,1,0)=D1))*--(--(OFFSET($B$1:$B$15,1,0)-$B$1:$B$15=1)+--(OFFSET($B$1:$B$15,1,0)-$B$1:$B$15=2)))&" Consecutive"
Also checking the weekday:
=D2 & " " & COUNTIF($A$1:$A$15,D2) &" instances " & SUM(--(--($A$1:$A$15=D2)*--(OFFSET($A$1:$A$15,1,0)=D2))*--(--(OFFSET($B$1:$B$15,1,0)-$B$1:$B$15=1)+--(WEEKDAY(OFFSET($B$1:$B$15,1,0),2)=1)*((OFFSET($B$1:$B$15,1,0)-$B$1:$B$15=2)))) & " Consecutive"

A SQL approach would be something along these lines, based on a table 000Absence that holds the example data in the fields EEName and AbsDate:
SELECT abs1.EEName, abs1.AbsDate,
(select count(abs2.EEName) from [000Absence] as abs2 where abs2.[EEName]=abs1.[EEName]) AS INSTANCES,
(select count(abs3.EEName) from [000Absence] as abs3 where abs3.[EEName]=abs1.[EEName] and abs3.[AbsDate]=abs1.[AbsDate]+iif(weekday(abs1.[AbsDate],7)=1,2,1)) AS CONSECUTIVE
FROM [000Absence] AS abs1;
The final output can then be obtained from this query by grouping by employee, etc.

Select a maximum value within a date range

Task:
Modify the currently working code below so that it returns only one row per patient: the row with the maximum value of d1_10.xtransfer (datatype int), with the restriction that this row's d1_10.dstartdate <= glob_End_Date.
Caveats:
There are similar questions on Stack Overflow and its sister sites, but none of those I have found helped resolve this issue.
This is a medical EHR database, I can share code, but any discussion of results has to be general and exclude patient information.
I am replacing the SQL query within a pre-existing Excel spreadsheet so that it does something different. Excel pulls information from our database over an ODBC connection. Our database uses Ingres SQL, which accepts most, but not all, typical SQL syntax. A piece of code may work in other SQL dialects but not with the combination of Ingres and Excel. I've got the spreadsheet working and returning results; now it's about making some fixes by writing SQL code that works in this software.
Thus far:
With the currently working code below (no maximum d1_10.xtransfer restrictions) we return all rows with d1_10.dstartdate in the user selected date range and with the user selected d1_10.xinstitute. We want just the latest one. That is, the patient's row with either the maximum d1_10.dstartdate within the date range, or the maximum d1_10.xtransfer (index that counts up as they are added) within the date range.
Currently working code:
"SELECT " & _
"d1.xpid ""XPID"", " & _
"d0_v1.name_family ""NAME_FAMILY"", " & _
"d0_v1.name_given1 ""NAME_GIVEN1"", " & _
"d0_v1.name_given2 ""NAME_GIVEN2"", " & _
"d1.sex ""SEX"", " & _
"d1.birthdate ""DOB"", " & _
"d0_v1.hsp_pid, " & _
"c58.brief_name, " & _
"c73.cname, " & _
"date_trunc('day',d1_10.dstartdate) ""DSTARTDATE"", " & _
"date_trunc('day',d1_17.ddeath) ""DDEATH"" " & _
"FROM d1 " & _
"JOIN d0_v1 ON d1.xpid = d0_v1.xpid " & _
"JOIN d1_2 ON d1.xpid = d1_2.xpid " & _
"JOIN c58 ON d1_2.xmodality = c58.xcmodality " & _
"JOIN d1_10 ON d1.xpid = d1_10.xpid " & _
"JOIN c73 ON d1_10.xinstitute = c73.xcsite " & _
"JOIN d1_17 ON d1.xpid = d1_17.xpid " & _
"WHERE " & _
"d1_10.xinstitute = " & institute_index & " AND " & _
"d1_10.dstartdate >= '" & glob_Start_Date & " 00:00:00' and " & _
"d1_10.dstartdate <= '" & glob_End_Date & " 23:59:59' "
The closest I have gotten with code that runs from the excel spreadsheet is with this additional line in the WHERE clause:
d1_10.xtransfer = (SELECT MAX(d1_10.xtransfer) FROM d1_10 GROUP BY xpid)
With this additional line we now return only one row from each patient that has a d1_10.xtransfer within the date range. But if they have a row where d1_10.xtransfer is more recent than the date range, then they don't show up in the results at all.
With this line, the code takes MAX(d1_10.xtransfer) for each xpid before it applies the date restriction. By my logic it should do so afterwards instead, but I have not been able to write code that runs and gets any closer than this.
Thanks in advance. I'll keep this question updated with additional info below this page break.
Additional Info:
Per PaulM:
Yes, xpid is a patient ID index number, unique to each patient.
Added/edited line in WHERE clause to: "d1_10.xtransfer = (SELECT MAX(xtransfer) FROM d1_10 d1_10_b WHERE d1_10.xpid = d1_10_b.xpid AND d1_10_b.dstartdate <= '" & glob_End_Date & " 23:59:59') "
Patient Bob has transfers on both the 14th and 17th of June that fit the rest of the criteria.
When inputting a date range with an end date of Jun 17+, the spreadsheet correctly returns a row for Bob with his Jun 17 transfer.
When inputting a date range with an end date of Jun 14, 15 or 16, the spreadsheet incorrectly does not return a row for Bob.
It seems as though it still takes the maximum xtransfer before restricting by date.
Per PaulM's comment:
I ran the subselect for a specific patient as follows:
Input:
SELECT MAX(xtransfer) FROM d1_10 d1_10_b WHERE d1_10_b.xpid = '2258' AND d1_10_b.dstartdate <= '20-apr-2016 23:59:59'
It returned MAX(xtransfer) = '48233'. This is correct.
So, when run in Visual SQL as its own statement, setting d1_10_b.xpid equal to a specific patient, it correctly pulls the maximum xtransfer from the date range. (There was a more recent xtransfer outside of the date range, and it still correctly displayed the maximum xtransfer from within the date range.)
I then tried running this exact same subselect in the WHERE clause of the spreadsheet's query. That is, I manually selected the same date range (which is passed through as a variable correctly) but replaced d1_10.xpid = d1_10_b.xpid with d1_10_b.xpid = '2258'. This did not work. The spreadsheet did not show a row for this patient, seemingly because it still applies the MAX() function before it restricts by the date range in the subquery. And yet the subquery works when run by itself.
Much appreciation for any further suggestions.

You need to add the date restriction in the subselect as well as in the main query. I also suspect the GROUP BY is wrong. With the GROUP BY, the subselect returns a list containing the largest xtransfer value for each xpid (which I assume identifies a patient). That means that if the row you're interested in from the main query happens to have an xtransfer value matching the largest one belonging to a different xpid, you get a false match.
What you really need is to add a join on xpid from the subselect back up to the main query. To do that you'll need a different correlation name e.g.
d1_10.xtransfer = (SELECT MAX(xtransfer)
FROM d1_10 d1_10_b
WHERE d1_10.xpid = d1_10_b.xpid
AND d1_10_b.dstartdate >= ... {as above} )
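Here is a small sketch of the correlated subselect. It uses SQLite through Python's sqlite3 rather than Ingres, so syntax may differ, and the data is made up. Patient 1 plays Bob, with transfers on 14 and 17 June 2016; patient 2 is hypothetical. The uncorrelated `IN (... GROUP BY xpid)` version is included to reproduce the symptom from the question. It uses `IN` because SQLite silently takes the first row of a multi-row scalar subquery, which would hide the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates stored as ISO text, so string comparison orders them correctly.
conn.execute("CREATE TABLE d1_10 (xpid INTEGER, xtransfer INTEGER, dstartdate TEXT)")
conn.executemany(
    "INSERT INTO d1_10 VALUES (?, ?, ?)",
    [
        (1, 10, "2016-06-14"),  # "Bob", earlier transfer
        (1, 11, "2016-06-17"),  # "Bob", later transfer
        (2, 5, "2016-06-10"),   # hypothetical second patient
    ],
)

# Max per patient computed over ALL dates, then compared: Bob's in-range row is lost.
UNCORRELATED = """
SELECT t.xpid, t.xtransfer
FROM d1_10 t
WHERE t.dstartdate >= :start_date AND t.dstartdate <= :end_date
  AND t.xtransfer IN (SELECT MAX(xtransfer) FROM d1_10 GROUP BY xpid)
ORDER BY t.xpid
"""

# Max computed for the same patient (b.xpid = t.xpid) and only up to the end date.
CORRELATED = """
SELECT t.xpid, t.xtransfer
FROM d1_10 t
WHERE t.dstartdate >= :start_date AND t.dstartdate <= :end_date
  AND t.xtransfer = (SELECT MAX(b.xtransfer)
                     FROM d1_10 b
                     WHERE b.xpid = t.xpid
                       AND b.dstartdate <= :end_date)
ORDER BY t.xpid
"""

def run(sql, start_date, end_date):
    """Run one of the queries above for the given date range."""
    return conn.execute(sql, {"start_date": start_date, "end_date": end_date}).fetchall()

print(run(UNCORRELATED, "2016-06-01", "2016-06-15"))  # Bob missing
print(run(CORRELATED, "2016-06-01", "2016-06-15"))    # Bob's 14 June transfer
print(run(CORRELATED, "2016-06-01", "2016-06-17"))    # Bob's 17 June transfer
```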

Left Join not working with WHERE condition set to <>

Before I ask this question, I must admit that I am new to SQL, but here goes:
I have 3 tables: CROWN Facility, tblSubProjects, and lnkSubProjectFacility.
I use lnkSubProjectFacility to assign facilities to sub-projects and vice versa. When the user selects a sub-project from a combo box (cboAddFacSubProject) on a form, the combo box's AfterUpdate event populates the 2 listboxes with the associated records from the lnkSubProjectFacility table.
The LEFT listbox (lstAddSubProjectFacilities) displays all facilities associated with the sub-project, and the RIGHT listbox (lstAddSubProFac) displays all remaining available facilities. With a command button, the user can assign an available facility, removing it from the right listbox and moving it to the left one (thus creating a new record in the lnkSubProjectFacility table).
Everything works fine except I cannot seem to get the right listbox to populate with the correct available facilities in the combo box's AfterUpdate event with the VBA code below:
Private Sub cboAddFacSubProject_AfterUpdate()
Dim strSQL As String, strSQL2 As String
'String SQL statement variable for the LEFT listbox - to display all ASSIGNED facilities
strSQL = "SELECT [CROWN Facility].FACILITY_ID, " & _
"[CROWN Facility].FACILITY_NAME, " & _
"lnkSubProjectFacility.SUBPROJECT_ID " & _
"FROM [CROWN Facility] INNER JOIN lnkSubProjectFacility " & _
"ON [CROWN Facility].FACILITY_ID = lnkSubProjectFacility.FACILITY_ID " & _
"WHERE lnkSubProjectFacility.SUBPROJECT_ID =" & Me.cboAddFacSubProject & " " & _
"ORDER BY [CROWN Facility].FACILITY_NAME"
'String SQL statement variable for the RIGHT listbox - to display all AVAILABLE facilities
strSQL2 = "SELECT [CROWN Facility].FACILITY_ID, " & _
"[CROWN Facility].FACILITY_NAME, " & _
"lnkSubProjectFacility.SUBPROJECT_ID " & _
"FROM [CROWN Facility] LEFT JOIN lnkSubProjectFacility " & _
"ON [CROWN Facility].FACILITY_ID = lnkSubProjectFacility.FACILITY_ID " & _
"WHERE lnkSubProjectFacility.SUBPROJECT_ID <>" & Me.cboAddFacSubProject & " " & _
"ORDER BY [CROWN Facility].FACILITY_NAME"
'RowSource for the LEFT listbox - to display all assigned facilities
lstAddSubProjectFacilities.RowSource = strSQL
'RowSource for the RIGHT listbox - to display all available facilities
lstAddSubProFac.RowSource = strSQL2
'This is just updating a label showing the count of items in the left listbox
lblListCt.Caption = lstAddSubProjectFacilities.ListCount & " Facilities Selected"
End Sub
After executing this code, everything is fine in the left listbox, with all assigned records for that sub-project showing correctly. However, despite the right listbox excluding all facilities shown in the left, it also excludes ANY facilities that were assigned to any other sub-projects in that table.
In addition to searching for hours for an answer, I have experimented with the strSQL2 variable. I tried adding IS NULL at the end (which of course does not work) and many other things, such as changing the JOIN types. Most recently, I changed WHERE to AND, which returns nothing.
I am sure there is a fairly simple solution to this but I would be very appreciative of any assistance to get me there!
Note: I am using Access 2010 but I do not think that makes any difference.
EDIT: Here is the structure for the lnkSubProjectFacility table:
SUBPROJECT_ID FACILITY_ID
7 20000003
7 20000025
7 20000027
8 20010302
8 20021781
9 20040035
9 20044392
10 20045465
17 10000282
17 10000452
17 10000844
21 20000005
21 20000019
21 20000026
CROWN Facility table structure:
FACILITY_ID FACILITY_NAME
20000003 Barnes
20000025 Bio-Medical Applications
20000027 Barnes Center
20010302 Atlantic
20021781 Anthonys Hospital
20040035 Black Hawk
20044392 Ames
20045465 Arnold
10000282 BETHANY
10000452 ANDOVER
10000844 Ankeny
20000005 Columbia
20000019 Baptist
20000026 Childrens Hospital
tblSubProjects table structure:
SUBPROJECT_ID SUBPROJECT
7 Service Project1
7 Service Project1
7 Service Project1
8 Service Project2
8 Service Project2
9 Service Project3
9 Service Project3
10 Service Project4
17 CatheterReduction1
17 CatheterReduction1
17 CatheterReduction1
21 Patient Access3
21 Patient Access3
21 Patient Access3

I am going to assume that lnkSubProjectFacility is meant to relate Facility to tblSubProjects. lnkSubProjectFacility seems to be the only table I would change as far as the schema goes:
lnkSubProjectFacility
SPF_Id, SubProject_Id, Facility_Id
1 7 20000003
2 7 20000025
3 7 20000027
4 8 20010302
. . .
. . .
. . .
Try to normalize the data as much as possible; that way it is easier to write queries.
If you are trying to write a query that gets all facilities that have not been assigned to any sub-project:
select CF.FACILITY_ID, CF.FACILITY_NAME
from [CROWN FACILITY] as CF
where not exists (select *
from lnkSubProjectFacility as lSPF
where CF.FACILITY_ID = lSPF.FACILITY_ID)
I believe that query should work, but it's hard to tell when I cannot test it.
Sometimes a join is not the answer. A join can look a lot nicer, but sometimes a subquery is better.
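The question's symptom can be reproduced in a few lines. This sketch uses SQLite through Python's sqlite3 with a cut-down copy of the tables; the facility "Hypothetical Clinic", which has no assignments, is invented. It also adapts the answer's NOT EXISTS query by adding `AND l.subproject_id = ?`, so that it returns the facilities not yet assigned to the selected sub-project, which is what the right listbox needs. With `WHERE l.subproject_id <> ?` after a LEFT JOIN, a facility with no link rows has a NULL sub-project ID. Because `NULL <> 7` is not true, that facility is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crown_facility (facility_id INTEGER, facility_name TEXT);
CREATE TABLE lnk_subproject_facility (subproject_id INTEGER, facility_id INTEGER);
INSERT INTO crown_facility VALUES
    (20000003, 'Barnes'), (20010302, 'Atlantic'), (99999999, 'Hypothetical Clinic');
INSERT INTO lnk_subproject_facility VALUES (7, 20000003), (8, 20010302);
""")

# Original approach: unassigned facilities get NULL and fail the <> test.
LEFT_JOIN_NE = """
SELECT f.facility_name
FROM crown_facility f
LEFT JOIN lnk_subproject_facility l ON f.facility_id = l.facility_id
WHERE l.subproject_id <> ?
ORDER BY f.facility_name
"""

# Anti-join: keep facilities with no link row for THIS sub-project.
NOT_EXISTS = """
SELECT f.facility_name
FROM crown_facility f
WHERE NOT EXISTS (SELECT 1 FROM lnk_subproject_facility l
                  WHERE l.facility_id = f.facility_id
                    AND l.subproject_id = ?)
ORDER BY f.facility_name
"""

def names(sql, subproject_id):
    """Return the facility names produced by a query for one sub-project."""
    return [row[0] for row in conn.execute(sql, (subproject_id,))]

print(names(LEFT_JOIN_NE, 7))  # unassigned facility missing
print(names(NOT_EXISTS, 7))    # every facility not yet linked to sub-project 7
```

In Access, the value of Me.cboAddFacSubProject would take the place of the `?` parameter, concatenated into the string as in the original VBA.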

MS Access SQL find the year a value first appears

Hi, I'm trying to run a query on a species database. I want to list the unique species and also find the year each species was first found.
So far I have this:
SELECT DISTINCT [Genus_HeTR] & " " & [Species_HeTR] AS Species
FROM HeTR_Rec
WHERE [Species_HeTR] <> "sp."
UNION SELECT DISTINCT [Genus_HeOP] & " " & [Species_HeOP] AS Species
FROM HeOP_Rec
WHERE [Species_HeOP] <> ""
AND [Species_HeOP] <> "sp.";
I'm concatenating the genus and species names and combining data from two different tables (hence the UNION). This provides a species list, but I would also like to know the year each species was first seen at this site.

I will hazard a guess that both your source tables include a Date/Time field which stores the date of each observation. If that is so, you can UNION data from the 2 tables and use that as a subquery source in a GROUP BY query where you derive the minimum observation year for each species.
SELECT
sub.Species,
Min(sub.observation_year) AS first_sighting_year
FROM
(
SELECT
[Genus_HeTR] & " " & [Species_HeTR] AS Species,
Year(observation_date) AS observation_year
FROM HeTR_Rec
WHERE [Species_HeTR] <> "sp."
UNION ALL
SELECT
[Genus_HeOP] & " " & [Species_HeOP] AS Species,
Year(observation_date) AS observation_year
FROM HeOP_Rec
WHERE [Species_HeOP] <> ""
AND [Species_HeOP] <> "sp."
) AS sub
GROUP BY sub.Species;
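Here is a runnable sketch of the same query in SQLite through Python's sqlite3, with a few invented records. As in the answer, the observation_date column is an assumption. SQLite uses || instead of & for concatenation and strftime('%Y', ...) instead of Year(). UNION ALL is enough here because duplicate rows cannot change a MIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HeTR_Rec (Genus_HeTR TEXT, Species_HeTR TEXT, observation_date TEXT);
CREATE TABLE HeOP_Rec (Genus_HeOP TEXT, Species_HeOP TEXT, observation_date TEXT);
INSERT INTO HeTR_Rec VALUES
    ('Bufo', 'bufo', '2012-05-01'),
    ('Bufo', 'bufo', '2010-06-15'),
    ('Rana', 'sp.', '2009-04-20');
INSERT INTO HeOP_Rec VALUES
    ('Bufo', 'bufo', '2008-07-30'),
    ('Natrix', 'natrix', '2015-08-02'),
    ('Natrix', '', '2011-03-03');
""")

# Stack both sources, then take the earliest year per concatenated species name.
FIRST_SIGHTING = """
SELECT sub.Species, MIN(sub.observation_year) AS first_sighting_year
FROM (
    SELECT Genus_HeTR || ' ' || Species_HeTR AS Species,
           CAST(strftime('%Y', observation_date) AS INTEGER) AS observation_year
    FROM HeTR_Rec
    WHERE Species_HeTR <> 'sp.'
    UNION ALL
    SELECT Genus_HeOP || ' ' || Species_HeOP AS Species,
           CAST(strftime('%Y', observation_date) AS INTEGER) AS observation_year
    FROM HeOP_Rec
    WHERE Species_HeOP <> '' AND Species_HeOP <> 'sp.'
) AS sub
GROUP BY sub.Species
ORDER BY sub.Species
"""
rows = conn.execute(FIRST_SIGHTING).fetchall()
for species, year in rows:
    print(species, year)
```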

We Keep Coding
