SQL Server query. JOIN by latest date - sql

I have 3 tables:
UnitInfo(UnitID, ...),
UnitList(UnitID, ...)
UnitMonitoring(RecordID,UnitID, EventDate, ...)
UnitList is a subset of UnitInfo (in terms of data and in terms of columns). UnitMonitoring receives records from time to time pertaining to UnitList (for every UnitID we will have many records in UnitMonitoring), filling EventDate. (UnitInfo has extended info.)
I can't figure out how to build a query so that for every UnitID in UnitList I get the UnitMonitoring record with the latest EventDate.
So far I have
SELECT a.UnitID, a.Name, b.EventDate
FROM UnitInfo a INNER JOIN UnitMonitoring b
WHERE a.UnitID IN (SELECT UnitID FROM UnitList)
which yields all records from UnitMonitoring

SELECT ul.unitId, um.*
FROM UnitList ul
OUTER APPLY
(
SELECT TOP 1 *
FROM UnitMonitoring umi
WHERE umi.UnitID = ul.unitID
ORDER BY
EventDate DESC
) um
This will handle the duplicates correctly and will return all units (those with no records in UnitMonitoring will have NULL values in corresponding fields)

I chose to go with a Common Table Expression (CTE) to apply a ranking function (ROW_NUMBER):
;WITH NumberedMonitoring as (
SELECT RecordID, UnitID, EventDate, ...,
ROW_NUMBER() over (PARTITION BY UnitID ORDER BY EventDate desc) rn
FROM UnitMonitoring
)
SELECT * FROM
UnitList ul
inner join
NumberedMonitoring nm
on
ul.UnitID = nm.UnitID and nm.rn = 1
But there are many different solutions (the above could also be done using a subselect).
Common Table Expressions (quoting from above link):
A common table expression (CTE) can be thought of as a temporary result set
That is, it lets you write a bit of the query first. In this case, I'm using it because I want to number the rows (using the ROW_NUMBER function). I'm telling it to restart the numbering for each UnitID (PARTITION BY UnitID), and within each unit ID, I want the rows numbered based on the EventDate descending (ORDER BY EventDate desc). This means that the row that receives row number 1 (within each UnitID partition) is the most recent row.
In the following select, I'm able to treat my CTE (NumberedMonitoring) as if it were any other table. So I'm just joining it to the UnitList table, and ensuring as part of the join conditions that I'm only selecting row number 1 (rn = 1).
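A minimal sketch of the numbering idea described above, run against SQLite through Python's sqlite3 module instead of SQL Server (an assumption made only so the example is self-contained; SQLite needs version 3.25 or later for window functions). The sample rows are made up, and only the three columns named in the question are used.

```python
import sqlite3


def latest_per_unit(conn):
    """Return (UnitID, EventDate) of the newest monitoring row per listed unit."""
    sql = """
    WITH NumberedMonitoring AS (
        SELECT RecordID, UnitID, EventDate,
               ROW_NUMBER() OVER (PARTITION BY UnitID
                                  ORDER BY EventDate DESC) AS rn
        FROM UnitMonitoring
    )
    SELECT ul.UnitID, nm.EventDate
    FROM UnitList ul
    JOIN NumberedMonitoring nm
      ON ul.UnitID = nm.UnitID AND nm.rn = 1
    ORDER BY ul.UnitID
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UnitList (UnitID INTEGER PRIMARY KEY);
CREATE TABLE UnitMonitoring (
    RecordID INTEGER PRIMARY KEY, UnitID INTEGER, EventDate TEXT);
-- hypothetical sample data: unit 3 has no monitoring rows at all
INSERT INTO UnitList VALUES (1), (2), (3);
INSERT INTO UnitMonitoring VALUES
    (10, 1, '2024-01-01'), (11, 1, '2024-03-01'), (12, 2, '2024-02-15');
""")
rows = latest_per_unit(conn)
```

Because this is an inner join, unit 3 (no monitoring rows) drops out of the result; a LEFT JOIN would keep it with a NULL EventDate.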

Try:
Select M.*
From UnitList L
Join UnitMonitoring M
On M.UnitId = L.UnitId
Where M.EventDate =
(Select Max(EventDate) From UnitMonitoring
Where UnitId = M.UnitId)
If there are multiple records with the same UnitId and EventDate, you can still use this technique, but you need to filter on a unique field. Say the PK field in UnitMonitoring is named PkId:
Select M.*
From UnitList L
Join UnitMonitoring M
On M.UnitId = L.UnitId
Where M.PkId =
(Select Max(PkId) From UnitMonitoring iM
Where UnitId = M.UnitId
And EventDate =
(Select Max(EventDate) From UnitMonitoring
Where UnitId = M.UnitId))
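To see what the extra PkId filter buys you, here is a small sketch using SQLite via Python's sqlite3 (not SQL Server; that substitution and the sample rows are assumptions for illustration). Two rows tie on the latest EventDate, so the date-only filter returns both and the PkId filter returns one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UnitList (UnitId INTEGER PRIMARY KEY);
CREATE TABLE UnitMonitoring (
    PkId INTEGER PRIMARY KEY, UnitId INTEGER, EventDate TEXT);
INSERT INTO UnitList VALUES (1);
-- hypothetical data: rows 2 and 3 tie on the latest EventDate
INSERT INTO UnitMonitoring VALUES
    (1, 1, '2024-01-01'), (2, 1, '2024-05-01'), (3, 1, '2024-05-01');
""")

# Filter on the latest date only: both tied rows come back.
by_date = conn.execute("""
SELECT M.PkId
FROM UnitList L
JOIN UnitMonitoring M ON M.UnitId = L.UnitId
WHERE M.EventDate = (SELECT MAX(EventDate) FROM UnitMonitoring
                     WHERE UnitId = M.UnitId)
ORDER BY M.PkId
""").fetchall()

# Filter on the highest PkId among the latest-date rows: exactly one row.
by_pk = conn.execute("""
SELECT M.PkId
FROM UnitList L
JOIN UnitMonitoring M ON M.UnitId = L.UnitId
WHERE M.PkId = (SELECT MAX(PkId) FROM UnitMonitoring
                WHERE UnitId = M.UnitId
                  AND EventDate = (SELECT MAX(EventDate) FROM UnitMonitoring
                                   WHERE UnitId = M.UnitId))
""").fetchall()
```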

SELECT a.UnitID, a.Name, MAX(b.EventDate)
FROM UnitInfo a
INNER JOIN UnitMonitoring b ON b.UnitID = a.UnitID
WHERE a.UnitID IN (SELECT UnitID FROM UnitList)
GROUP BY a.UnitID, a.Name

Related

Get n grouped categories and sum others into one

I have a table with the following structure:
Contents (
id
name
desc
tdate
categoryid
...
)
I need to do some statistics with the data in this table. For example I want to get number of rows with the same category by grouping and id of that category. Also I want to limit them for n rows in descending order and if there are more categories available I want to mark them as "Others". So far I have come out with 2 queries to database:
Select n rows in descending order:
SELECT COALESCE(ca.NAME, 'Unknown') AS label
,ca.id AS catid
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
,catid
ORDER BY data DESC LIMIT 7
Select other rows as one:
SELECT 'Others' AS label
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)
But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?
The specific difficulty here: Queries with one or more aggregate functions in the SELECT list and no GROUP BY clause produce exactly one row, even if no row is found in the underlying table.
There is nothing you can do in the WHERE clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING clause, or in an outer query.
Per documentation:
If a query contains aggregate function calls, but no GROUP BY clause,
grouping still occurs: the result is a single group row (or perhaps no
rows at all, if the single row is then eliminated by HAVING). The same
is true if it contains a HAVING clause, even without any aggregate
function calls or GROUP BY clause.
It should be noted that adding a GROUP BY clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.
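A quick way to convince yourself of the "exactly one row" behaviour, sketched with SQLite through Python's sqlite3 rather than PostgreSQL (an assumption for the sake of a self-contained example). It uses the outer-query variant of the fix, since that form is portable; the empty contents table is deliberate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Deliberately empty table, so no row can match.
conn.execute(
    "CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")

# Aggregate without GROUP BY: one row comes back even though nothing matched.
plain = conn.execute(
    "SELECT 'Others' AS label, COUNT(id) AS data FROM contents").fetchall()

# Excluding the row after the fact, here in an outer query.
filtered = conn.execute("""
SELECT label, data
FROM (SELECT 'Others' AS label, COUNT(id) AS data FROM contents) AS t
WHERE data > 0
""").fetchall()
```

Here plain holds a single ('Others', 0) row, while filtered is empty.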
The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
WITH cte AS (
SELECT categoryid, count(*) AS data
, row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
FROM contents
GROUP BY 1
)
( -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM cte
LEFT JOIN category ca ON ca.id = cte.categoryid
WHERE rn <= 7
ORDER BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM cte
WHERE rn > 7 -- only take the rest
HAVING count(*) > 0; -- only if there actually is a rest
-- or: HAVING sum(data) > 0
You need to break ties if multiple categories can have the same count across the 7th / 8th rank. In my example, categories with the smaller categoryid win such a race.
Parentheses are required to include a LIMIT or ORDER BY clause to an individual leg of a UNION query.
You only need to join to table category for the top 7 categories, and it's generally cheaper to aggregate first and join later in this scenario. So don't join in the base query in the CTE (common table expression) named cte; only join in the first SELECT of the UNION query.
Not sure why you need the COALESCE. If you have a foreign key in place from contents.categoryid to category.id and both contents.categoryid and category.name are defined NOT NULL (like they probably should be), then you don't need it.
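The same shape can be tried out in SQLite via Python's sqlite3. This is an adaptation under stated assumptions, not the PostgreSQL query above: SQLite does not accept parenthesized legs in a UNION, so the ordering moves to an outer query, the "only if there is a rest" check is done with an outer WHERE instead of HAVING, and the counting and numbering are split into two CTEs. The category names and the cut-off of 2 (instead of 7) are made up to keep the data small.

```python
import sqlite3


def top_categories(conn, n):
    """Top n categories by row count, plus an 'Others' row only if needed."""
    sql = """
    WITH counts AS (
        SELECT categoryid, COUNT(*) AS data
        FROM contents
        GROUP BY categoryid
    ),
    cte AS (
        SELECT categoryid, data,
               ROW_NUMBER() OVER (ORDER BY data DESC, categoryid) AS rn
        FROM counts
    )
    SELECT label, data
    FROM (
        SELECT COALESCE(ca.name, 'Unknown') AS label,
               cte.data AS data, cte.rn AS rn
        FROM cte
        LEFT JOIN category ca ON ca.id = cte.categoryid
        WHERE cte.rn <= :n
        UNION ALL
        SELECT 'Others', rest.others_total, :n + 1
        FROM (SELECT SUM(data) AS others_total, COUNT(*) AS n_groups
              FROM cte WHERE rn > :n) AS rest
        WHERE rest.n_groups > 0      -- only if there actually is a rest
    ) AS u
    ORDER BY rn
    """
    return conn.execute(sql, {"n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER);
-- hypothetical sample data
INSERT INTO category VALUES (1, 'news'), (2, 'sport'), (3, 'tech'), (4, 'misc');
INSERT INTO contents (categoryid) VALUES (1), (1), (1), (2), (2), (3), (4);
""")
with_others = top_categories(conn, 2)
without_others = top_categories(conn, 10)
```

With a cut-off of 2 the two smallest categories are folded into 'Others'; with a cut-off of 10 there is no rest, so no 'Others' row appears.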
The odd GROUP BY true
This would work, too:
...
UNION ALL
SELECT NULL , 'Others', sum(data)
FROM cte
WHERE rn > 7
GROUP BY true;
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle demonstrating all.
Related answer with more explanation for the UNION ALL / LIMIT technique:
Sum results of a few queries and then find top 5 in SQL
The quick fix, to make the 'Others' row conditional would be to add a simple HAVING clause to that query.
HAVING COUNT(c.id) > 0
(If there are no other rows in the contents table, then COUNT(c.id) is going to be zero.)
That only answers half the question, how to make the return of that row conditional.
The second half of the question is a little more involved.
To get the whole resultset in one query, you could do something like this
(this is not tested yet, only desk checked. I'm not sure whether PostgreSQL accepts a LIMIT clause in an inline view; if it doesn't, we'd need a different mechanism to limit the number of rows returned.)
SELECT COALESCE(t.name,'Others') AS name
, t.catid AS catid
, COUNT(o.id) AS data
FROM contents o
LEFT
JOIN category oa
ON oa.id = o.categoryid
LEFT
JOIN ( SELECT COALESCE(ca.name,'Unknown') AS name
, ca.id AS catid
, COUNT(c.id) AS data
FROM contents c
LEFT
JOIN category ca
ON ca.id = c.categoryid
GROUP
BY COALESCE(ca.name,'Unknown')
, ca.id
ORDER
BY COUNT(c.id) DESC
, ca.id DESC
LIMIT 7
) t
ON ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
GROUP
BY ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
, t.catid
ORDER
BY COUNT(o.id) DESC
, ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL)) DESC
, t.catid DESC
LIMIT 7
The inline view t basically gets the same result as the first query, a list of (up to) 7 id values from category table, or 6 id values from category table and a NULL.
The outer query basically does the same thing, joining contents with category, but also checks whether there's a matching row from t. Because t might return a NULL, we need a slightly more complicated comparison, where we want a NULL value to match a NULL value. (MySQL conveniently gives us a shorthand operator for this, the null-safe comparison operator <=>, but I don't think that's available in PostgreSQL, so we have to express it differently.)
a = b OR (a IS NULL AND b IS NULL)
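A small sketch of how that expression evaluates, using SQLite through Python's sqlite3 (an assumption for a self-contained example; the helper name null_safe_equal is hypothetical). PostgreSQL also has a built-in spelling for the same test, a IS NOT DISTINCT FROM b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def null_safe_equal(a, b):
    """Evaluate a = b OR (a IS NULL AND b IS NULL); returns 1, 0 or None.

    None (SQL NULL) counts as "no match" in a JOIN or WHERE condition.
    """
    sql = "SELECT (:a = :b) OR (:a IS NULL AND :b IS NULL)"
    return conn.execute(sql, {"a": a, "b": b}).fetchone()[0]


# For comparison: plain equality between two NULLs is NULL, not true.
plain_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

both_null = null_safe_equal(None, None)   # NULLs now match each other
same_value = null_safe_equal(7, 7)
different = null_safe_equal(7, 8)
one_null = null_safe_equal(7, None)       # still not a match
```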
The next bit is getting a GROUP BY to work, we want to group by the 7 values returned by the inline view t, or, if there's not matching value from t, group the "other" rows together. We can get that to happen by using a boolean expression in the GROUP BY clause.
We're basically saying "group by 'if there was a matching row from t'" (true or false) and then group by the row from 't'. Get a count, and then order by the count descending.
This isn't tested, only desk checked.
You can approach this with nested aggregation. The inner aggregation calculates the counts along with a sequential number. You want to take everything whose number is 7 or less and then combine everything else into the others category:
SELECT (case when seqnum <= 7 then label else 'others' end) as label,
(case when seqnum <= 7 then catid end) as catid, sum(cnt)
FROM (SELECT ca.name AS label, ca.id AS catid, COUNT(c.id) AS cnt,
row_number() over (order by count(c.id) desc) as seqnum
FROM contents c LEFT OUTER JOIN
category ca
ON ca.id = c.categoryid
GROUP BY label, catid
) t
GROUP BY (case when seqnum <= 7 then label else 'others' end),
(case when seqnum <= 7 then catid end)
ORDER BY sum(cnt) DESC ;

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
2    1409346767  23    13      Albacete
3    1409345729  23    7       Balears (Illes)
4    1409345729  23    3       Balears (Illes)
5    1409345729  22    56      Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
3    1409345729  23    7       Balears (Illes)
What's the easier way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
SELECT T.Row,
T.Id,
T.Hour,
T.Minute,
T.City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT tdata.id, tdata.City_Search, tdata.time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
two options to do it without join...
use Row_Number function to find the last one
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc, Minute Desc) as RNB
from table) t
Where RNB=1
Manipulate the string and using simple Max function
Select ID,Right(MAX(Concat(Hour,Minute,RPAD(City_Search,20,''))),20)
From Table
Group by ID
avoiding Joins is usually much faster...
Hope this helps

SQL Max returns duplicates if values are equal

I've got a view that contains a document ID column and a date column as well as a dozen other columns that aren't relevant to this problem. There can be multiple rows with the same document ID, but the dates are usually different. This signifies that it's the same document, just a revision of it. The problem is if I have two rows where the document ID and the date are the same, I get both. I just want to get one. It doesn't matter which one, as long as I only get one.
The following has duplicates where the document ID and date are the same.
SELECT FSD.*
FROM vFSD FSD
INNER JOIN
(
SELECT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
FROM vFSD
GROUP BY InternalID
) gFSD ON FSD.InternalID = gFSD.InternalID AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate
I've also tried it with DISTINCT, but it didn't fix the problem.
SELECT DISTINCT FSD.*
FROM vFSD FSD
INNER JOIN
(
SELECT DISTINCT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
FROM vFSD
GROUP BY InternalID
) gFSD ON FSD.InternalID = gFSD.InternalID AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate
You can use ROW_NUMBER to bring back only one arbitrary row in the event that two are tied with the same greatest FileLastUploadedDate for an InternalID.
WITH CTE
AS (SELECT *,
ROW_NUMBER() OVER (PARTITION BY InternalID
ORDER BY FileLastUploadedDate DESC) AS RN
FROM vFSD)
SELECT InternalID,
FileLastUploadedDate
/*Other desired columns*/
FROM CTE
WHERE RN = 1
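To compare the two approaches side by side, here is a sketch in SQLite via Python's sqlite3 (not SQL Server). Assumptions: vFSD is created as a plain table rather than a view, Title is a hypothetical extra column, and the rows are invented so that document 1 has two revisions with the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vFSD (InternalID INTEGER, FileLastUploadedDate TEXT, Title TEXT);
INSERT INTO vFSD VALUES
    (1, '2024-01-01', 'old'),
    (1, '2024-06-01', 'rev A'),
    (1, '2024-06-01', 'rev B'),
    (2, '2024-02-01', 'only');
""")

# Join on MAX(date): both tied rows for document 1 survive.
joined = conn.execute("""
SELECT FSD.InternalID, FSD.Title
FROM vFSD FSD
JOIN (SELECT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
      FROM vFSD
      GROUP BY InternalID) gFSD
  ON FSD.InternalID = gFSD.InternalID
 AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate
""").fetchall()

# ROW_NUMBER: exactly one row per InternalID, ties broken arbitrarily.
numbered = conn.execute("""
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY InternalID
                              ORDER BY FileLastUploadedDate DESC) AS RN
    FROM vFSD)
SELECT InternalID, FileLastUploadedDate
FROM CTE
WHERE RN = 1
ORDER BY InternalID
""").fetchall()
```

If it matters which of the tied rows wins, add a second column to the ORDER BY inside OVER as a tie-breaker.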

Over clause in SQL Server

I have the following query
select * from
(
SELECT distinct
rx.patid
,rx.fillDate
,rx.scriptEndDate
,MAX(datediff(day, rx.filldate, rx.scriptenddate)) AS longestScript
,rx.drugClass
,COUNT(rx.drugName) over(partition by rx.patid,rx.fillDate,rx.drugclass) as distinctFamilies
FROM [I 3 SCI control].dbo.rx
where rx.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
GROUP BY rx.patid, rx.fillDate, rx.scriptEndDate,rx.drugName,rx.drugClass
) r
order by distinctFamilies desc
which produces results that look like
This should mean that, between the two dates in the table, the patID has 5 unique drug names. However, when I run the following query:
select distinct *
from rx
where patid = 1358801781 and fillDate between '2008-10-17' and '2008-11-16' and drugClass='H4B'
I have a result set returned that looks like
You can see that while there are in fact five rows returned for the second query between the dates of 2008-10-17 and 2009-01-15, there are only three unique names. I've tried various ways of modifying the over clause, all with different levels of non-success. How can I alter my query so that I only find unique drugNames within the timeframe specified for each row?
Taking a shot at it:
SELECT DISTINCT
patid,
fillDate,
scriptEndDate,
MAX(DATEDIFF(day, fillDate, scriptEndDate)) AS longestScript,
drugClass,
MAX(rn) OVER(PARTITION BY patid, fillDate, drugClass) as distinctFamilies
FROM (
SELECT patid, fillDate, scriptEndDate, drugClass,rx.drugName,
DENSE_RANK() OVER(PARTITION BY patid, fillDate, drugClass ORDER BY drugName) as rn
FROM [I 3 SCI control].dbo.rx
WHERE drugClass IN ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
)x
GROUP BY x.patid, x.fillDate, x.scriptEndDate,x.drugName,x.drugClass,x.rn
ORDER BY distinctFamilies DESC
Not sure if DISTINCT is really necessary - left it in since you've used it.
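The core of the trick is that the highest DENSE_RANK within a partition equals the number of distinct values in it. A stripped-down sketch in SQLite via Python's sqlite3 (an assumption; the real table is in SQL Server), with invented rows where one patient has five fills but only three distinct drug names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rx (patid INTEGER, fillDate TEXT, drugClass TEXT, drugName TEXT);
-- hypothetical data: 5 rows, 3 distinct drug names
INSERT INTO rx VALUES
    (1, '2008-10-17', 'H4B', 'A'),
    (1, '2008-10-17', 'H4B', 'A'),
    (1, '2008-10-17', 'H4B', 'B'),
    (1, '2008-10-17', 'H4B', 'B'),
    (1, '2008-10-17', 'H4B', 'C');
""")

counts = conn.execute("""
SELECT DISTINCT
       patid,
       -- plain windowed COUNT counts rows, duplicates included
       COUNT(drugName) OVER (PARTITION BY patid, fillDate, drugClass) AS row_count,
       -- the highest dense rank is the number of distinct names
       MAX(rn) OVER (PARTITION BY patid, fillDate, drugClass) AS distinctFamilies
FROM (SELECT patid, fillDate, drugClass, drugName,
             DENSE_RANK() OVER (PARTITION BY patid, fillDate, drugClass
                                ORDER BY drugName) AS rn
      FROM rx) x
""").fetchall()
```

The single result row shows a row_count of 5 next to a distinctFamilies of 3.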

Select a Column in SQL not in Group By

I have been trying to find some info on how to select a non-aggregate column that is not contained in the Group By statement in SQL, but nothing I've found so far seems to answer my question. I have a table with three columns that I want from it. One is a create date, one is a ID that groups the records by a particular Claim ID, and the final is the PK. I want to find the record that has the max creation date in each group of claim IDs. I am selecting the MAX(creation date), and Claim ID (cpe.fmgcms_cpeclaimid), and grouping by the Claim ID. But I need the PK from these records (cpe.fmgcms_claimid), and if I try to add it to my select clause, I get an error. And I can't add it to my group by clause because then it will throw off my intended grouping. Does anyone know any workarounds for this? Here is a sample of my code:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
This is the result I'd like to get:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, cpe.fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
The columns in the result set of a select query with group by clause must be:
an expression used as one of the group by criteria , or ...
an aggregate function , or ...
a literal value
So, you can't do what you want to do in a single, simple query. The first thing to do is state your problem statement in a clear way, something like:
I want to find the individual claim row bearing the most recent
creation date within each group in my claims table
Given
create table dbo.claims_table
(
claim_id int not null ,
group_id int not null ,
date_created datetime not null ,
constraint some_table_PK primary key ( claim_id ) ,
constraint some_table_AK01 unique ( group_id , claim_id ) ,
constraint some_Table_AK02 unique ( group_id , date_created ) ,
)
The first thing to do is identify the most recent creation date for each group:
select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
That gives you the selection criteria you need (1 row per group, with 2 columns: group_id and the high-water created date) to fulfill the first part of the requirement (selecting the individual row from each group). That needs to be a virtual table in your final select query:
select *
from dbo.claims_table t
join ( select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
) x on x.group_id = t.group_id
and x.date_created = t.date_created
If the table is not unique by date_created within group_id (AK02), you can get duplicate rows for a given group.
You can do this with PARTITION and RANK:
select * from
(
select MyPK, fmgcms_cpeclaimid, createdon,
Rank() over (Partition BY fmgcms_cpeclaimid order by createdon DESC) as Rank
from Filteredfmgcms_claimpaymentestimate
where createdon < 'reportstartdate'
) tmp
where Rank = 1
The direct answer is that you can't. You must select either an aggregate or something that you are grouping by.
So, you need an alternative approach.
1). Take your current query and join the base data back on it
SELECT
cpe.*
FROM
Filteredfmgcms_claimpaymentestimate cpe
INNER JOIN
(yourQuery) AS lookup
ON lookup.MaxDate = cpe.createdOn
AND lookup.fmgcms_cpeclaimid = cpe.fmgcms_cpeclaimid
2). Use a CTE to do it all in one go...
WITH
sequenced_data AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY fmgcms_cpeclaimid ORDER BY CreatedOn DESC) AS sequence_id
FROM
Filteredfmgcms_claimpaymentestimate
WHERE
createdon < 'reportstartdate'
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
NOTE: Using ROW_NUMBER() will ensure just one record per fmgcms_cpeclaimid. Even if multiple records are tied with the exact same createdon value. If you can have ties, and want all records with the same createdon value, use RANK() instead.
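The difference between the two functions on ties can be checked with a small sketch in SQLite via Python's sqlite3 (an assumption; the original runs on SQL Server). The table name claims, its shortened column names and the helper newest are hypothetical stand-ins for the real view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
    claimid INTEGER PRIMARY KEY, cpeclaimid INTEGER, createdon TEXT);
-- hypothetical data: claims 2 and 3 tie on the latest createdon
INSERT INTO claims VALUES
    (1, 100, '2024-01-01'), (2, 100, '2024-03-01'), (3, 100, '2024-03-01');
""")


def newest(func):
    """Return claim ids ranked first per cpeclaimid using the given function."""
    # func is interpolated into the SQL text, so accept only the two known names
    if func not in ("ROW_NUMBER", "RANK"):
        raise ValueError(func)
    sql = f"""
    SELECT claimid
    FROM (SELECT claimid,
                 {func}() OVER (PARTITION BY cpeclaimid
                                ORDER BY createdon DESC) AS seq
          FROM claims) AS s
    WHERE seq = 1
    ORDER BY claimid
    """
    return [row[0] for row in conn.execute(sql)]


one_row = newest("ROW_NUMBER")   # one of the tied claims, chosen arbitrarily
all_tied = newest("RANK")        # every claim sharing the latest createdon
```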
You can join the table on itself to get the PK:
Select cpe1.PK, cpe2.MaxDate, cpe1.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe1
INNER JOIN
(
select MAX(createdon) As MaxDate, fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate
group by fmgcms_cpeclaimid
) cpe2
on cpe1.fmgcms_cpeclaimid = cpe2.fmgcms_cpeclaimid
and cpe1.createdon = cpe2.MaxDate
where cpe1.createdon < 'reportstartdate'
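One thing I like to do is wrap the additional columns in an aggregate function, like MAX().
It works well when you don't expect duplicate values, but note that MAX(cpe.fmgcms_claimid) returns the largest PK in the group, which is not necessarily the PK of the row with the latest createdon.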
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, MAX(cpe.fmgcms_claimid) As fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
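A sketch of when wrapping the PK in MAX() gives a different row than the one holding the latest date, using SQLite via Python's sqlite3 (an assumption; the table estimates and its shortened column names are hypothetical). The data is arranged so that the highest PK is not the most recently created row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE estimates (
    claimid INTEGER PRIMARY KEY, cpeclaimid INTEGER, createdon TEXT);
-- hypothetical data: the highest PK (3) is NOT the newest row (2)
INSERT INTO estimates VALUES
    (1, 100, '2024-01-01'), (2, 100, '2024-09-01'), (3, 100, '2024-02-01');
""")

# Each MAX() is computed independently over the group.
wrapped = conn.execute("""
SELECT MAX(createdon), cpeclaimid, MAX(claimid)
FROM estimates
GROUP BY cpeclaimid
""").fetchone()

# The row that really carries the latest createdon.
actual = conn.execute("""
SELECT createdon, cpeclaimid, claimid
FROM estimates e
WHERE createdon = (SELECT MAX(createdon) FROM estimates
                   WHERE cpeclaimid = e.cpeclaimid)
""").fetchone()
```

Both queries agree on the date, but wrapped reports claim 3 while actual reports claim 2, so the MAX() shortcut is only safe when PK order and creation order always agree.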
What you are asking for, Sir, is covered by RedFilter's answer.
This answer also helps in understanding why GROUP BY is, in a way, a simpler version of PARTITION BY with OVER:
SQL Server: Difference between PARTITION BY and GROUP BY
since PARTITION BY changes the way the returned value is calculated, and therefore lets you (in a way) return columns that GROUP BY cannot return.
You can use as below,
Select X.a, X.sum_b, Y.c from (
Select a, sum(b) as sum_b from name_table
group by a) X
left join name_table Y on Y.a = X.a
Example;
CREATE TABLE #products (
product_name VARCHAR(MAX),
code varchar(3),
list_price [numeric](8, 2) NOT NULL
);
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('Dinding', 'ADE', 2000)
INSERT INTO #products VALUES ('Kaca', 'AKB', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
--SELECT * FROM #products
SELECT distinct x.code, x.SUM_PRICE, product_name FROM (SELECT code, SUM(list_price) as SUM_PRICE From #products
group by code)x
left join #products y on y.code=x.code
DROP TABLE #products