Variant use of the GROUP BY clause in TSQL - sql

Imagine the following schema and sample data (SQL Server 2008):
OriginatingObject
----------------------------------------------
ID
1
2
3
ValueSet
----------------------------------------------
ID OriginatingObjectID DateStamp
1 1 2009-05-21 10:41:43
2 1 2009-05-22 12:11:51
3 1 2009-05-22 12:13:25
4 2 2009-05-21 10:42:40
5 2 2009-05-20 02:21:34
6 1 2009-05-21 23:41:43
7 3 2009-05-26 14:56:01
Value
----------------------------------------------
ID ValueSetID Value
1 1 28
etc (a set of rows for each related ValueSet)
I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.
I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?

You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.
Please note, simple GROUP-BY can only bring you to the list of OriginatingIDs and Timestamps. In order to pull the relevant ValueSet IDs, the cleanest solution is use a subquery.
Multiple-column IN with GROUP-BY (probably faster):
SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
(V.OriginatingID, V.DateStamp) IN
(
SELECT OriginatingID, Max(DateStamp)
FROM ValueSet
GROUP BY OriginatingID
)
Correlated Subquery:
SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
V.DateStamp =
(
SELECT Max(DateStamp)
FROM ValueSet V2
WHERE V2.OriginatingID = O.ID
)

SELECT OriginatingObjectID, id
FROM (
SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID
ORDER BY DateStamp DESC) as ranking
FROM ValueSet)
WHERE ranking = 1;

This can be done with a correlated sub-query. No GROUP-BY necessary.
SELECT
vs.ID,
vs.OriginatingObjectID,
vs.DateStamp,
v.Value
FROM
ValueSet vs
INNER JOIN Value v ON v.ValueSetID = vs.ID
WHERE
NOT EXISTS (
SELECT 1
FROM ValueSet
WHERE OriginatingObjectID = vs.OriginatingObjectID
AND DateStamp > vs.DateStamp
)
This works only if there can not be two equal DateStamps for a OriginatingObjectID in the ValueSet table.

Related

SQL query to return duplicate rows for certain column, but with unique values for another column

I have written the query shown here that combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_sys_ref value
select top 100
t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from
tickets t, appeal_tickets at, appeals_2 a
where
t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
and t.t_number = at.at_ticket_num
and at.at_system_ref = a.a_system_ref
and at.at_ticket_num IN (select at_ticket_num
from appeal_tickets
group by at_ticket_num
having count(distinct at_system_ref) > 1)
order by
t.t_reference desc
This is the output:
t_reference at_system_ref at_ticket_num a_case_ref
-------------------------------------------------------
AB123 30838974 23641583 1111979010
AB123 30838976 23641583 1111979010
AB234 30839149 23641520 1111977352
AB234 30839209 23641520 1111988003
I want to modify this so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in above case only records for AB234 would be returned.
Any help would be much appreciated.
You want all ticket appeals that have more than one system reference and more than one case reference it seems. You can join the tables, count the occurrences per ticket and then only keep the tickets that match these criteria.
select *
from
(
select
t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number
join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref;
Correction
It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a subquery though. Replace
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
by
(
select count(distinct a2.a_system_ref)
from appeal_tickets at2
join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
where at2.at_ticket_num = t.t_number
) as sysrefs,
An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) -
1 as sysrefs,
with data as (
<your query plus one column>,
case when
min() over (partition by t.t_reference)
<>
max() over (partition by t.t_reference)
then 1 end as dup
)
select * from data where dup = 1

Calculate MAX for every row in SQL

I have this tables:
Docenza(id, id_facolta, ..., orelez)
Facolta(id, ...)
and I want to obtain, for every facolta, only the id of Docenza who has done the maximum number of orelez and the number of orelez:
id_docenzaP facolta1 max(orelez)
id_docenzaQ facolta2 max(orelez)
...
id_docenzaZ facoltaN max(orelez)
how can I do this? This is what i do:
SELECT DISTINCT ... F.nome, SUM(orelez) AS oreTotali
FROM Docenza D
JOIN Facolta F ON F.id = D.id_facolta
GROUP BY F.nome
I obtain somethings like:
docenzaP facolta1 maxValueForidP
docenzaQ facolta1 maxValueForidQ
...
docenzaR facolta2 maxValueForidR
docenzaS facolta2 maxValueForidS
...
docenzaZ facoltaN maxValueForFacoltaN
How can I take only the max value for every facolta?
Presumably, you just want:
SELECT F.nome, sum(orelez) AS oreTotali
FROM Docenza D JOIN
Facolta F
ON F.id = D.id_facolta
GROUP BY F.nome;
I'm not sure what the SELECT DISTINCT is supposed to be doing. It is almost never used with GROUP BY. The . . . suggests that you are selecting additional columns, which are not needed for the results you want.
This is untested, and since you didn't provide sample data with expected results I can't be sure it's really what you need.
It's a bit ugly and I'm sure there is some clever correlated sub query approach, but I've never been good with those.
SELECT st.focolta,
s_orelez,
TMP3.id_docenza
FROM some_table AS st
INNER
JOIN (SELECT *
FROM (SELECT focolta,
s_orelez,
id_docenza,
ROW_NUMBER() OVER -- Get the ranking of the orelez sum by focolta.
( PARTITION BY focolta
ORDER BY s_orelez DESC
) rn_orelez
FROM (SELECT focolta,
id_docenza,
SUM(orelez) OVER -- Sum the orelez by focolta
( PARTITION BY focolta
) AS s_orelez
FROM some_table
) TMP
) TMP2
WHERE = TMP2.rn_orelez = 1 -- Limit to the highest rank value
) TMP3
ON some_table.focolta = TMP3.focolta; -- Join to focolta to the id associated with the hightest value.

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve such customer who has placed orders on 2 consecutive days.
In above case Xyz and Abc customers should be returned by query as result.
There are many ways to do this. Use an EXISTS semi-join followed by DISTINCT or GROUP BY, should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELEST 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). Would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record actually, since no row type is given. Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would think of it doing it would be to join the table the date part with itselft on the next date and joining it with the Customer_name too.
This way you can ensure that the same customer_name done an order on 2 consecutive days.
For MySQL:
SELECT distinct *
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
The result you should also select it with the Distinct keyword to ensure the same customer is not displayed more than 1 time.
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

Find records in SQL Server 2008 R2

Please find below image for make understanding my issues. I have a table as shown below picture. I need to get only highlighted (yellow) records. What is the best method to find these records?
In SQL Server 2012+, you can use the lead() and lag() functions. However, this is not available in SQL Server 2008. Here is a method using outer apply:
select t.*
from t outer apply
(select top 1 tprev.*
from t tprev
on tprev.time < t.time
order by tprev.time desc
) tprev outer apply
(select top 1 tnext.*
from t tnext
on tnext.time > t.time
order by tnext.time asc
)
where (t.cardtype = 1 and tnext.cardtype = 2) or
(t.cardtype = 2 and tprev.cardtype = 1);
With your sample data, it would also be possible to use self joins on the id column. This seems unsafe, though, because there could be gaps in that columns values.
Havent tried this, but I think it should work. First, make a view of the table in your question, with the rownumber included as one column:
CREATE VIEW v AS
SELECT
ROW_NUMBER() OVER(ORDER BY id) AS rownum,
id,
time,
card,
card_type
FROM table
Then, you can get all the rows of type 1 followed by a row of type 2 like this:
SELECT
a.id,
-- And so on...
FROM v AS a
JOIN v AS b ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
And all the rows of type 2 preceded by a row of type 1 like this:
SELECT
b.id,
-- And so on...
FROM v AS b
JOIN v AS a ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
To get them both in the same set of results, you can just use UNION ALL. Technically, you don't need the view. You could use nested selects instead, but since you will need to query the table four times it might be nice to have it as a view.
Also, if the ID is continous (it goes 1, 2, 3 without any gaps), you don't need the rownum and can just use the ID instead.
here is a code you can run in sql server
select * from Table_name where id in (1,2,6,7,195,160,164,165)

Select from derived table sql exception

I'm trying to execute a select statement from derived table as follows in MSSQL SERVER 2005:
The problem I try to solve is that there are duplicate rows but they differ in DATE field by seconds but i take minutes into account for example
ID DATE
1 08:20:00
1 08:20:01
2 09:21:00
5 10:00:00
5 10:00:01
I want to take DISTINCT values of ID's, and order by DATE but as i order by date I need to include DATE field. So i cant select distinctly on one column.
Derived table query (works by itself perfectly retrieving duplicates)
SELECT p.[SICIL] AS ID, h.[ZAMAN_TRH] AS ZAMAN_TRH
FROM [RF_BIO].[dbo].[PERSONEL] p, [RF_BIO].[dbo].[HAREKETLER] h
WHERE h.[ZAMAN_TRH] > '2013-05-27T00:00:00.000' AND h.[YON]= 2 AND
(p.[KARTNO] = h.[KARTNO] OR p.[SICIL]= h.[SICIL])
ORDER BY h.[ZAMAN_TRH] DESC
The query that uses the derived table:
SELECT DISTINCT [SICIL]
FROM ( SELECT p.[SICIL] AS SICIL, h.[ZAMAN_TRH] AS ZAMAN_TRH
FROM [RF_BIO].[dbo]. [PERSONEL] p, [RF_BIO].[dbo].[HAREKETLER] h
WHERE h.[ZAMAN_TRH] > '2013-05-27T00:00:00.000' AND h.[YON]= 2 AND
(p.[KARTNO] = h.[KARTNO] OR p.[SICIL]= h.[SICIL]) ORDER BY h.[ZAMAN_TRH] DESC ) AS LAST
This gets me sql exception in Java
java.sql.SQLException:
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2893)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2335)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:638)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1427)
Thank you for your help.
Use GROUP BY clause with aggregate function in the ORDER BY clause
SELECT p.[ID] AS ID
FROM [RF_BIO].[dbo].[PERSONEL] p, [RF_BIO].[dbo].[HAREKETLER] h
WHERE h.[DATE] > '2013-05-27T00:00:00.000' AND h.[YON]= 2
AND (p.[KART] = h.[KART] OR p.[ID]= h.[ID])
GROUP BY p.[ID]
ORDER BY MAX(h.[DATE]) DESC
Simple demo on SQLFiddle
SELECT p.[SICIL] AS SICIL
FROM [RF_BIO].[dbo].[PERSONEL] p, [RF_BIO].[dbo].[HAREKETLER] h
WHERE h.[ZAMAN_TRH] > '2013-05-27T00:00:00.000' AND h.[YON]= 2
AND (p.[KARTNO] = h.[KARTNO] OR p.[SICIL]= h.[SICIL])
GROUP BY p.[SICIL]
ORDER BY MAX(h.[ZAMAN_TRH]) DESC
Plan Diagram