Calculating running sum starting x years before - sql

I have a table with entity name, year and activity number as
bellow. During some years there is not any activity.
name | year | act_num
-----+------+---------
aa | 2000 | 2
aa | 2001 | 6
aa | 2002 | 9
aa | 2003 | 15
aa | 2005 | 17
b | 2000 | 3
b | 2002 | 4
b | 2003 | 9
b | 2005 | 12
b | 2006 | 2
To create it on postgresql;
CREATE TABLE entity_year_activity (
name character varying(10),
year integer,
act_num integer
);
INSERT INTO entity_year_activity
VALUES
('aa', 2000, 2),
('aa', 2001, 6),
('aa', 2002, 9),
('aa', 2003, 15),
('aa', 2005, 17),
('b', 2000, 3),
('b', 2002, 4),
('b', 2003, 9),
('b', 2005, 12),
('b', 2006, 2);
I would like to have the total number of the past x years with the
number of this year activities for each entity for every year as bellow.
As an example for x = three years.
name | year | act_num | total_3_years
-----+------+---------+---------------
aa | 2000 | 2 | 2
aa | 2001 | 6 | 8
aa | 2002 | 9 | 17
aa | 2003 | 15 | 30
aa | 2004 | 0 | 24
aa | 2005 | 17 | 32
b | 2000 | 3 | 3
b | 2001 | 0 | 3
b | 2002 | 4 | 7
b | 2003 | 9 | 13
b | 2005 | 12 | 21
b | 2006 | 2 | 14

Here's an approach that uses the ability to use the sum aggregate as a window function with a range-based window frame - see SUM(...) OVER (PARTITION BY name ORDER BY year ROWS 2 PRECEDING) and window framing.
WITH name_years(gen_name, gen_year) AS (
SELECT gen_name, s
FROM generate_series(
(SELECT min(year) FROM entity_year_activity),
(SELECT max(year) FROM entity_year_activity)
) s CROSS JOIN (SELECT DISTINCT name FROM entity_year_activity) n(gen_name)
),
windowed_history(name, year,act_num,last3_actnum) AS (
SELECT
gen_name, gen_year, coalesce( act_num, 0),
SUM(coalesce(act_num,0)) OVER (PARTITION BY gen_name ORDER BY gen_year ROWS 2 PRECEDING)
FROM name_years
LEFT OUTER JOIN entity_year_activity ON (gen_name = name AND gen_year = year)
)
SELECT name, year, act_num, sum(last3_actnum) as total_3_years
FROM windowed_history
GROUP BY name, year, act_num
HAVING sum(last3_actnum) <> 0
ORDER BY name, year;
See SQLFiddle.
The need to generate entries for years that have no entry themselves complicates this query. I generate a table of all (name, year) pairs, then left outer join entity_year_activity on it before doing the window sum, so all years for all name sets are represented. That's why this is so complicated. Then I filter the aggregated result to exclude entries with zero in the sum.

SQL Fiddle
select
s.name,
d "year",
coalesce(act_num, 0) act_num,
coalesce(act_num, 0)
+ lag(coalesce(act_num, 0), 1, 0) over(partition by s.name order by d)
+ lag(coalesce(act_num, 0), 2, 0) over(partition by s.name order by d)
total_3_years
from
entity_year_activity eya
right join (
generate_series(
(select min("year") from entity_year_activity),
(select max("year") from entity_year_activity)
) d cross join (
select distinct name
from entity_year_activity
) f
) s on s.name = eya.name and s.d = eya."year"
order by s.name, d

SELECT en_key.name, en_key.year, en_key.act_num, SUM(en_sum.act_num) as total_3_years
FROM entity_year_activity en_key
INNER JOIN entity_year_activity en_sum
ON en_key.name = en_sum.name
WHERE en_sum.year BETWEEN en_key.year - 2 AND en_key.year
GROUP BY en_key.name, en_key.year

Another try. This one lacks the 0 row years, though:
select t1.name, t1.year, t1.act_num,
(select sum(t2.act_num) from entity_year_activity t2
where t2.year between t1.year - 2 and t1.year
and t2.name = t1.name) total
from entity_year_activity t1;

Related

Sql query to partition and sum the records grouping by their bill number and Product code

Below are two tables where there are parent bill number like 1, 4 and 8. These parents bill references to nothing/NULL values. They are referenced by one or more child bill number. For eg parent bill 1 is referenced by child bill 2, 3 and 6.
Table B also has the bill no column with prod code with actual service (ST values) and associated service values (SV). SV are the additional cost to ST.
Same ST may occur in multiple bill numbers. Here Bill number is only unique.
For eg, ST1 are in bill number 1 and 8. Also same SV may reference same or different ST.
SV1, SV2 and SV3 are referencing to ST1 corresponding to bill no. 1 and SV2 and SV4 are referencing to ST2 corresponding to bill no.2.
How can we get below expected output?
Table A:
| bill no | ref |
+----------------------------------------+
| 1 | |
| 2 | 1 |
| 3 | 1 |
| 4 | |
| 5 | 4 |
| 6 | 1 |
| 7 | 4 |
| 8 | |
| 9 | 8 |
Table B:
| bill no | Prod code | cost |
+-----------------------------------------------------+
| 1 | ST1 | 10
| 2 | SV1 | 20
| 3 | SV2 | 30
| 4 | ST2 | 10
| 5 | SV2 | 20
| 6 | SV3 | 30
| 7 | SV4 | 40
| 8 | ST1 | 50
| 9 | SV1 | 10
Expected output:
| bill no | Prod code | ST_cost | SV1 | SV2 | SV3 |
+---------------------------------------------------------------------------------------------+
| 1 | ST1 | 10 | 20 | 30 | 30 |
| 4 | ST2 | 10 | 20 | 40 | |
| 8 | ST1 | 50 | 10 | | |
Here's a script that should get you there:
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.TableA;
CREATE TABLE dbo.TableA
(
BillNumber int NOT NULL PRIMARY KEY,
Reference int NULL
);
GO
INSERT dbo.TableA (BillNumber, Reference)
SELECT *
FROM (VALUES (1,NULL),
(2,1),
(3,1),
(4,NULL),
(5,4),
(6,1),
(7,4),
(8,NULL),
(9,8)) AS a(BillNumber, Reference);
GO
DROP TABLE IF EXISTS dbo.TableB;
CREATE TABLE dbo.TableB
(
BillNumber int NOT NULL PRIMARY KEY,
ProductCode varchar(10) NOT NULL,
Cost int NOT NULL
);
GO
INSERT dbo.TableB (BillNumber, ProductCode, Cost)
SELECT BillNumber, ProductCode, Cost
FROM (VALUES (1, 'ST1', 10),
(2, 'SV1', 20),
(3, 'SV2', 30),
(4, 'ST2', 10),
(5, 'SV2', 20),
(6, 'SV3', 30),
(7, 'SV4', 40),
(8, 'ST1', 50),
(9, 'SV1', 10)) AS b(BillNumber, ProductCode, Cost);
GO
WITH ParentBills
AS
(
SELECT b.BillNumber, b.ProductCode, b.Cost AS STCost
FROM dbo.TableB AS b
INNER JOIN dbo.TableA AS a
ON b.BillNumber = a.BillNumber
WHERE a.Reference IS NULL
),
SubBills
AS
(
SELECT pb.BillNumber, pb.ProductCode, pb.STCost,
b.ProductCode AS ChildProduct, b.Cost AS ChildCost
FROM ParentBills AS pb
INNER JOIN dbo.TableA AS a
ON a.Reference = pb.BillNumber
INNER JOIN dbo.TableB AS b
ON b.BillNumber = a.BillNumber
)
SELECT sb.BillNumber, sb.ProductCode, sb.STCost,
MAX(CASE WHEN sb.ChildProduct = 'SV1' THEN sb.ChildCost END) AS [SV1],
MAX(CASE WHEN sb.ChildProduct = 'SV2' THEN sb.ChildCost END) AS [SV2],
MAX(CASE WHEN sb.ChildProduct = 'SV3' THEN sb.ChildCost END) AS [SV3]
FROM SubBills AS sb
GROUP BY sb.BillNumber, sb.ProductCode, sb.STCost
ORDER BY sb.BillNumber;
You could write a function that creates you query based on your SV number.
And use "Execute Immediate" to execute the Query String and then "PIPE ROW" to generate the result.
Check This PIPE ROW EXAMPLE
I don't understand where the "SV1" value comes from on the second row.
But your problem is basically conditional aggregation:
with ab as (
select a.*, b.productcode, b.cost,
coalesce(a.reference, a.billnumber) as parent_billnumber
from a join
b
on b.billnumber = a.billnumber
)
select parent_billnumber,
max(case when reference is null then productcode end) as st,
sum(case when reference is null then cost end) as st_cost,
sum(case when productcode = 'SV1' then cost end) as sv1,
sum(case when productcode = 'SV2' then cost end) as sv2,
sum(case when productcode = 'SV3' then cost end) as sv3
from ab
group by parent_billnumber
order by parent_billnumber;
Here is a db<>fiddle.
Note this works because you have only one level of child relationships. If there are more, then recursive CTEs are needed. I would recommend that you ask a new question if this is possible.
The CTE doesn't actually add much to the query, so you can also write:
select coalesce(a.reference, a.billnumber) as parent_billnumber ,
max(case when a.reference is null then productcode end) as st,
sum(case when a.reference is null then b.cost end) as st_cost,
sum(case when b.productcode = 'SV1' then b.cost end) as sv1,
sum(case when b.productcode = 'SV2' then b.cost end) as sv2,
sum(case when b.productcode = 'SV3' then b.cost end) as sv3
from a join
b
on b.billnumber = a.billnumber
group by coalesce(a.reference, a.billnumber)
order by parent_billnumber;

List value by group by sql server

I have a table as below
Id | PriceCardId | Days
1 | 1 | 2
2 | 1 | 4
3 | 1 | 5
4 | 1 | 6
5 | 1 | 3
6 | 2 | 5
7 | 2 | 3
8 | 3 | 6
How to write SQL query to get all PriceCardId has Day contain
List<int> days
Example days = [2,4,5,6], with data as above result is 1
Thanks!
I think you want:
select pricecardid
from t
where day in (2, 4, 5, 6)
group by pricecardid
having count(*) = 4; -- the number of days you are looking for
This assumes no duplicates in your table. If there are duplicates, use having count(distinct day) instead of count(*).
Note: You can phrase this as:
with d as (
select v.dy
from values ( (2), (4), (5), (6) ) v(dy)
)
select pricecardid
from t
where day in (select d.dy from d)
group by pricecardid
having count(*) = (select count(*) from d);
Try this:
SELECT PriceCardId
FROM [My_Table]
WHERE [Day] IN(2,4,5,6)
GROUP BY PriceCardId
HAVING COUNT(DISTINCT [Day])=4

A very basic SQL issue I'm stuck with [duplicate]

I have a table of player performance:
CREATE TABLE TopTen (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
home INT UNSIGNED NOT NULL,
`datetime`DATETIME NOT NULL,
player VARCHAR(6) NOT NULL,
resource INT NOT NULL
);
What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?
For this sample data:
INSERT INTO TopTen
(id, home, `datetime`, player, resource)
VALUES
(1, 10, '04/03/2009', 'john', 399),
(2, 11, '04/03/2009', 'juliet', 244),
(5, 12, '04/03/2009', 'borat', 555),
(3, 10, '03/03/2009', 'john', 300),
(4, 11, '03/03/2009', 'juliet', 200),
(6, 12, '03/03/2009', 'borat', 500),
(7, 13, '24/12/2008', 'borat', 600),
(8, 13, '01/01/2009', 'borat', 700)
;
the result should be:
id
home
datetime
player
resource
1
10
04/03/2009
john
399
2
11
04/03/2009
juliet
244
5
12
04/03/2009
borat
555
8
13
01/01/2009
borat
700
I tried a subquery getting the maximum datetime for each home:
-- 1 ..by the MySQL manual:
SELECT DISTINCT
home,
id,
datetime AS dt,
player,
resource
FROM TopTen t1
WHERE `datetime` = (SELECT
MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC
The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.
Then I tried joining to a subquery that gets the maximum datetime for each row id:
-- 2 ..join
SELECT
s1.id,
s1.home,
s1.datetime,
s1.player,
s1.resource
FROM TopTen s1
JOIN (SELECT
id,
MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
ON s1.id = s2.id
ORDER BY `datetime`
Nope. Gives all the records.
I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.
You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:
SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
The fastest MySQL solution, without inner queries and without GROUP BY:
SELECT m.* -- get the row that contains the max value
FROM topten m -- "m" from "max"
LEFT JOIN topten b -- "b" from "bigger"
ON m.home = b.home -- match "max" row with "bigger" row by `home`
AND m.datetime < b.datetime -- want "bigger" than "max"
WHERE b.datetime IS NULL -- keep only if there is no bigger than max
Explanation:
Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.
The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.
Using the data posted in the question, the LEFT JOIN will produce this pairs:
+------------------------------------------+--------------------------------+
| the row from `m` | the matching row from `b` |
|------------------------------------------|--------------------------------|
| id home datetime player resource | id home datetime ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1 | 10 | 04/03/2009 | john | 399 | NULL | NULL | NULL | ... | *
| 2 | 11 | 04/03/2009 | juliet | 244 | NULL | NULL | NULL | ... | *
| 5 | 12 | 04/03/2009 | borat | 555 | NULL | NULL | NULL | ... | *
| 3 | 10 | 03/03/2009 | john | 300 | 1 | 10 | 04/03/2009 | ... |
| 4 | 11 | 03/03/2009 | juliet | 200 | 2 | 11 | 04/03/2009 | ... |
| 6 | 12 | 03/03/2009 | borat | 500 | 5 | 12 | 04/03/2009 | ... |
| 7 | 13 | 24/12/2008 | borat | 600 | 8 | 13 | 01/01/2009 | ... |
| 8 | 13 | 01/01/2009 | borat | 700 | NULL | NULL | NULL | ... | *
+------------------------------------------+--------------------------------+
Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.
Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.
Here goes T-SQL version:
-- Test data
DECLARE #TestTable TABLE (id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT)
INSERT INTO #TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
-- Answer
SELECT id, home, date, player, resource
FROM (SELECT id, home, date, player, resource,
RANK() OVER (PARTITION BY home ORDER BY date DESC) N
FROM #TestTable
)M WHERE N = 1
-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource
FROM #TestTable T
INNER JOIN
( SELECT TI.id, TI.home, TI.date,
RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
FROM #TestTable TI
WHERE TI.date IN (SELECT MAX(TM.date) FROM #TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id
EDIT
Unfortunately, there are no RANK() OVER function in MySQL.
But it can be emulated, see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is MySQL version:
SELECT id, home, date, player, resource
FROM TestTable AS t1
WHERE
(SELECT COUNT(*)
FROM TestTable AS t2
WHERE t2.home = t1.home AND t2.date > t1.date
) = 0
This will work even if you have two or more rows for each home with equal DATETIME's:
SELECT id, home, datetime, player, resource
FROM (
SELECT (
SELECT id
FROM topten ti
WHERE ti.home = t1.home
ORDER BY
ti.datetime DESC
LIMIT 1
) lid
FROM (
SELECT DISTINCT home
FROM topten
) t1
) ro, topten t2
WHERE t2.id = ro.lid
I think this will give you the desired result:
SELECT home, MAX(datetime)
FROM my_table
GROUP BY home
BUT if you need other columns as well, just make a join with the original table (check Michael La Voie answer)
Best regards.
Since people seem to keep running into this thread (comment date ranges from 1.5 year) isn't this much simpler:
SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home
No aggregation functions needed...
Cheers.
You can also try this one and for large tables query performance will be better. It works when there no more than two records for each home and their dates are different. Better general MySQL query is one from Michael La Voie above.
SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM t_scores_1 t1
INNER JOIN t_scores_1 t2
ON t1.home = t2.home
WHERE t1.date > t2.date
Or in case of Postgres or those dbs that provide analytic functions try
SELECT t.* FROM
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
, row_number() over (partition by t1.home order by t1.date desc) rw
FROM topten t1
INNER JOIN topten t2
ON t1.home = t2.home
WHERE t1.date > t2.date
) t
WHERE t.rw = 1
SELECT tt.*
FROM TestTable tt
INNER JOIN
(
SELECT coord, MAX(datetime) AS MaxDateTime
FROM rapsa
GROUP BY
krd
) groupedtt
ON tt.coord = groupedtt.coord
AND tt.datetime = groupedtt.MaxDateTime
This works on Oracle:
with table_max as(
select id
, home
, datetime
, player
, resource
, max(home) over (partition by home) maxhome
from table
)
select id
, home
, datetime
, player
, resource
from table_max
where home = maxhome
Try this for SQL Server:
WITH cte AS (
SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year
Here is MySQL version which prints only one entry where there are duplicates MAX(datetime) in a group.
You could test here http://www.sqlfiddle.com/#!2/0a4ae/1
Sample Data
mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 3 | 10 | 2009-03-03 00:00:00 | john | 300 |
| 4 | 11 | 2009-03-03 00:00:00 | juliet | 200 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 6 | 12 | 2009-03-03 00:00:00 | borat | 500 |
| 7 | 13 | 2008-12-24 00:00:00 | borat | 600 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
MySQL Version with User variable
SELECT *
FROM (
SELECT ord.*,
IF (#prev_home = ord.home, 0, 1) AS is_first_appear,
#prev_home := ord.home
FROM (
SELECT t1.id, t1.home, t1.player, t1.resource
FROM topten t1
INNER JOIN (
SELECT home, MAX(datetime) AS mx_dt
FROM topten
GROUP BY home
) x ON t1.home = x.home AND t1.datetime = x.mx_dt
ORDER BY home
) ord, (SELECT #prev_home := 0, #seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id | home | player | resource | is_first_appear | #prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
| 9 | 10 | borat | 700 | 1 | 10 |
| 10 | 11 | borat | 700 | 1 | 11 |
| 12 | 12 | borat | 700 | 1 | 12 |
| 8 | 13 | borat | 700 | 1 | 13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)
Accepted Answers' outout
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)
SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table)
SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)
Another way to gt the most recent row per group using a sub query which basically calculates a rank for each row per group and then filter out your most recent rows as with rank = 1
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and a.`datetime` < b.`datetime`
) +1 = 1
DEMO
Here is the visual demo for rank no for each row for better understanding
By reading some comments what about if there are two rows which have same 'home' and 'datetime' field values?
Above query will fail and will return more than 1 rows for above situation. To cover up this situation there will be a need of another criteria/parameter/column to decide which row should be taken which falls in above situation. By viewing sample data set i assume there is a primary key column id which should be set to auto increment. So we can use this column to pick the most recent row by tweaking same query with the help of CASE statement like
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and case
when a.`datetime` = b.`datetime`
then a.id < b.id
else a.`datetime` < b.`datetime`
end
) + 1 = 1
DEMO
Above query will pick the row with highest id among the same datetime values
visual demo for rank no for each row
Why not using:
SELECT home, MAX(datetime) AS MaxDateTime,player,resource FROM topten GROUP BY home
Did I miss something?
In MySQL 8.0 this can be achieved efficiently by using row_number() window function with common table expression.
(Here row_number() basically generating unique sequence for each row for every player starting with 1 in descending order of resource. So, for every player row with sequence number 1 will be with highest resource value. Now all we need to do is selecting row with sequence number 1 for each player. It can be done by writing an outer query around this query. But we used common table expression instead since it's more readable.)
Schema:
create TABLE TestTable(id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT);
INSERT INTO TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
Query:
with cte as
(
select id, home, date , player, resource,
Row_Number()Over(Partition by home order by date desc) rownumber from TestTable
)
select id, home, date , player, resource from cte where rownumber=1
Output:
id
home
date
player
resource
1
10
2009-03-04 00:00:00
john
399
2
11
2009-03-04 00:00:00
juliet
244
5
12
2009-03-04 00:00:00
borat
555
8
13
2009-01-01 00:00:00
borat
700
db<>fiddle here
This works in SQLServer, and is the only solution I've seen that doesn't require subqueries or CTEs - I think this is the most elegant way to solve this kind of problem.
SELECT TOP 1 WITH TIES *
FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
ORDER BY [datetime] DESC)
In the ORDER BY clause, it uses a window function to generate & sort by a ROW_NUMBER - assigning a 1 value to the highest [datetime] for each [home].
SELECT TOP 1 WITH TIES will then select one record with the lowest ROW_NUMBER (which will be 1), as well as all records with a tying ROW_NUMBER (also 1)
As a consequence, you retrieve all data for each of the 1st ranked records - that is, all data for records with the highest [datetime] value with their given [home] value.
Try this
select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
on a.home = b.home and a.datetime = b.datetime
Regards
K
#Michae The accepted answer will working fine in most of the cases but it fail for one for as below.
In case if there were 2 rows having HomeID and Datetime same the query will return both rows, not distinct HomeID as required, for that add Distinct in query as below.
SELECT DISTINCT tt.home , tt.MaxDateTime
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
this is the query you need:
SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
(SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a
LEFT JOIN
(SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
ON a.resource = b.resource WHERE a.home =b.home;
Hope below query will give the desired output:
Select id, home,datetime,player,resource, row_number() over (Partition by home ORDER by datetime desc) as rownum from tablename where rownum=1
(NOTE: The answer of Michael is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)
If your table has duplicate rows for homexdatetime and you need to only select one row for each distinct home column, here is my solution to it:
Your table needs one unique column (like id). If it doesn't, create a view and add a random column to it.
Use this query to select a single row for each unique home value. Selects the lowest id in case of duplicate datetime.
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT min(id) as min_id, home from topten tt2
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt2
ON tt2.home = groupedtt2.home
) as groupedtt
ON tt.id = groupedtt.id
Accepted answer doesn't work for me if there are 2 records with same date and home. It will return 2 records after join. While I need to select any (random) of them. This query is used as joined subquery so just limit 1 is not possible there.
Here is how I reached desired result. Don't know about performance however.
select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
from topten
group by (home)

SQL Server 2005: How to join table rows only once

I think I've seen answers for similar questions for MySQL, but I'm struggling to find an answer applicable to SQL Server 2005.
So I have a table like this:
| ID | RelationalID | Year
----------------------------
| 1 | A | 2014
| 2 | A | 2014
| 3 | B | 2014
| 4 | A | 2015
| 5 | B | 2015
And I'd like a result like this when I join the same table where RelationID matches but the year is different:
| 2014_ID | 2015_ID | RelationalID |
------------------------------------
| 1 | 4 | A |
| 2 | NULL | A |
| 3 | 5 | B |
But a standard JOIN ends up getting duplicate matches:
| 2014_ID | 2015_ID | RelationalID |
------------------------------------
| 1 | 4 | A |
| 2 | 4 | A |
| 3 | 5 | B |
Is there a way to join two tables where the matches from the right table are joined only once in SQL Server 2005?
I tried this query with no success:
SELECT * FROM myTable
LEFT JOIN (SELECT * FROM myTable) AS t ON t.RelationalID = myTable.RelationalID
WHERE myTable.Year = 2014 and t.Year = 2015
You can get the result based on ROW_NUMBERs, but you need a rule how to assign them, I assumed it's based on the Id.
;WITH cte AS
(SELECT Id,
RelationalId,
year,
row_number()
over (partition by RelationalId, year
order by Id) as rn
FROM [YourTable]
)
select t1.id as Id_2014,t2.id as Id_2015, t1.RelationalId
from cte as t1 left join cte as t2
on t1.RelationalId = t2.RelationalId
and t1.rn = t2.rn
and t2.year = 2015
where t1.Year = 2014
This is based on TMNT2014's fiddle
Below Sql would give you the result you are looking for but as I said before complexity would depend on the original set of data you have in your table. Here is the SQL Fiddle - http://sqlfiddle.com/#!3/d6300/24 - Good Luck!
;WITH CTE_Union AS
(SELECT
a.Id AS Id2014,
NULL AS Id2015,
a.RelationalId
FROM [YourTable] a
WHERE a.Year = 2014
UNION
SELECT
NULL AS Id2014,
b.Id AS Id2015,
b.RelationalId
FROM [YourTable] b
WHERE b.Year = 2015)
SELECT Distinct CASE WHEN Id2014 IS NULL THEN (SELECT MIN(Id2014) FROM CTE_Union C WHERE C.RelationalId =M.RelationalId) ELSE Id2014 END AS ID2014 ,
CASE WHEN Id2015 IS NULL AND Id2014 = (SELECT MIN(Id2014) FROM CTE_Union C2 WHERE C2.RelationalId =M.RelationalId) THEN (SELECT MIN(Id2015) FROM CTE_Union C WHERE C.RelationalId =M.RelationalId) ELSE Id2015 END
,RelationalID
FROM CTE_Union M
DECLARE #MyTable TABLE
(
ID INT,
RelationalID VARCHAR(10),
[Year] INT
)
INSERT INTO #MyTable
VALUES
( 1 ,'A', 2014),
( 2 ,'A', 2014),
( 3 ,'B', 2014),
( 4 ,'A', 2015),
( 5 ,'B', 2015)
;WITH TEST AS
(
SELECT
a.Id AS Id2014,
NULL AS Id2015,
a.RelationalId,
RANK() OVER (PARTITION BY RelationalId ORDER BY ID) Ranked
FROM #MyTable a
WHERE a.Year = 2014
UNION
SELECT
NULL AS Id2014,
b.Id AS Id2015,
b.RelationalId,
RANK() OVER (PARTITION BY RelationalId ORDER BY ID) Ranked
FROM #MyTable b
WHERE b.Year = 2015
)
SELECT
t1.Id2014,
t2.Id2015,
t1.RelationalID
FROM TEST t1
LEFT JOIN TEST t2
ON t1.Ranked = t2.Ranked
AND t1.RelationalID = t2.RelationalID
AND t2.Id2015 IS NOT NULL
WHERE t1.Id2014 IS NOT NULL
ORDER BY t1.Id2014
I used a union and then ranked each side by relational id and left joined them.
Here is the output:
Id2014 Id2015 RelationalID
1 4 A
2 NULL A
3 5 B
There are probably a few ways to solve this but below shows an example of utilizing "Derived Tables" in a query.
SELECT
q1.Id AS [2014_Id],
q2.Id AS [2015_Id],
q1.RelationalId
FROM (SELECT
MAX(a.Id) AS Id,
a.RelationalId
FROM [table] a
WHERE a.Year = 2014
GROUP BY
a.RelationalId) q1
INNER JOIN (SELECT
MAX(b.Id) AS Id,
b.RelationalId
FROM [table] b
WHERE b.Year = 2015
GROUP BY
b.RelationalId) q2
ON q2.RelationalId = q1.RelationalId

Join table on itself for unique row combinations for calculations

I have a table that I need to use to build a result set from where certain rows from the table are columns in the result set. I started to chain LEFT JOINs together on the table multiple times but I need to eliminate results that are a different combination of another result already in the set:
For example, if I get 1, 21, 25 as result columns, I can't have ANY other combination of those numbers in the results.
My table definition is:
Table tblKPIDetails
Column Month int
Column Year int
Column Division varchar(3)
Column KPI int
Column Value decimal(18,4)
My current query is:
SELECT *
FROM tblKPIDetails J1
LEFT JOIN tblKPIDetails J2 ON J2.Month = J1.Month AND J2.Year = J1.Year AND J2.Division = J1.Division AND NOT(J2.KPI = J1.KPI ) AND (J2.KPI = 1 OR J2.KPI = 21 OR J2.KPI = 25)
LEFT JOIN tblKPIDetails J3 ON J3.Month = J1.Month AND J3.Year = J1.Year AND J3.Division = J1.Division AND NOT(J3.KPI = J1.KPI ) AND (J3.KPI = 1 OR J3.KPI = 21 OR J3.KPI = 25)
WHERE J1.KPI = 1 OR J1.KPI = 21 OR J1.KPI = 25
I know this is wrong, but it's a super-set of what I need. In the results from the query above, I can get J1.KPI, J2.KPI, J3.KPI or J1.KPI, J3.KPI, J2.KPI, or any other combination.
My expected result would be:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
for each division, month, and year
where KPIA, KPIB, or KPIC = 1, 21, or 25 but only 1 combination of 1,21,25 exists per division|month|year
EDIT
To clarify the expected results a little more, using the above query, I'm getting the following results:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
--------------------------------------------------------------------------------
000 1 2012 1 1000 21 2000 25 3000
000 1 2012 21 2000 1 1000 25 3000
000 1 2012 25 3000 21 2000 1 1000
111 1 2012 1 555 21 10000 25 5000
I need to make it so my results would only be ANY 1 of the first 3 results and then the last one...for example:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
--------------------------------------------------------------------------------
000 1 2012 25 3000 21 2000 1 1000
111 1 2012 1 555 21 10000 25 5000
I think you are looking for the PIVOT table operator like so:
SELECT
Devision,
Month,
Year,
[1] AS KPIAValue,
[21] AS KPIBValue,
[25] AS KPICValue
FROM
(
SELECT t1.*
FROM tblKPIDetails t1
INNER JOIN
(
SELECT Month, Year, Devision
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Devision
HAVING COUNT(DISTINCT KPI) = 3
) t2 ON t1.Month = t2.Month AND t1.Year = t2.Year
AND t1.Devision = t2.Devision
) t
PIVOT
(
MAX(Value)
FOR KPI IN([1], [21], [25])) p;
SQL Fiddle Demo
This will give you the data in the form:
| DEVISION | MONTH | YEAR | KPIAVALUE | KPIBVALUE | KPICVALUE |
---------------------------------------------------------------
| A | 2 | 2012 | 16 | 16 | 16 |
| B | 10 | 2012 | 16 | 18 | 20 |
Note that: This will give you the only combination of the Year, Month, DEVISION that have all the values 1, 21 and 25, and that what this query do:
SELECT Month, Year, Devision
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Devision
HAVING COUNT(DISTINCT KPI) = 3
Update: If you are looking for those that had at least one of 1, 21 or 25, just remove the HAVING COUNT(DISTINCT KPI) = 3, but this will make you expect more values than these three, in this case it will ignore other values and return only those three. Also it will return NULL for any of the missing values of them like so:
SELECT
Devision,
Month,
Year,
[1] AS KPIAValue,
[21] AS KPIBValue,
[25] AS KPICValue
FROM
(
SELECT t1.*
FROM tblKPIDetails t1
INNER JOIN
(
SELECT Month, Year, Devision
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Devision
) t2 ON t1.Month = t2.Month AND t1.Year = t2.Year
AND t1.Devision = t2.Devision
) t
PIVOT
(
MAX(Value)
FOR KPI IN([1], [21], [25])) p;
Updated SQL Fiddle Demo
| DIVISION | MONTH | YEAR | KPIAVALUE | KPIBVALUE | KPICVALUE |
---------------------------------------------------------------
| A | 2 | 2012 | 15.5 | 15.5 | 15.5 |
| B | 10 | 2012 | 15.5 | 17.5 | 20.24 |
| C | 12 | 2012 | 15.5 | (null) | 20.24 |
If you don't have a large number of "IDs", you could just transpose the values like this:
select
[Month],
[Year],
Division,
sum(case when KPI = 1 then Value else null end) as KPI1,
sum(case when KPI = 21 then Value else null end) as KPI21,
sum(case when KPI = 25 then Value else null end) as KPI25
from tblKPIDetails
group by
[Month],
[Year],
Division
order by
[Month],
[Year],
Division
Or same thing by using the "OVER" clause.
I think you want a conditional aggregation. But it is still not clear to me how the results are being defined. This might help you on your way:
SELECT Division, Month, Year,
1, max(case when kpi = 1 then value end) as kpi1value,
21, max(case when kpi = 21 then value end) as kpi21value,
25, max(case when kpi = 25 then value end) as kpi25value,
FROM tblKPIDetails J1
maybe you can try the following:
SELECT DISTINCT
t.Division,
t.Month,
t.Year,
KA.Value AS KPIAValue,
KB.Value AS KPIBValue,
KC.Value AS KPICValue
FROM
tblKPIDetails t
LEFT JOIN tblKPIDetails KA ON t.Division = KA.Division and t.Month = KA.month and .year = KA.year and KA.KPI = 1
LEFT JOIN tblKPIDetails KB ON t.Division = KB.Division and t.Month = KB.month and t.year = KB.year and KB.KPI = 21
LEFT JOIN tblKPIDetails KC ON t.Division = KC.Division and t.Month = KC.month and t.year = KC.year and KC.KPI = 25
Then is one LEFT JOIN for each possible KPI value you want.