I am not very familiar with SQL, and I hope an expert here can show me a suitable and efficient query for what I want to achieve. I am using DB2, by the way.
Below is a screenshot of some sample data. What I need is for a given year, select the record with distinct ID1+ID2+Name columns and maximum (most recent) effective date (in YYYYMMDD format, stored as integer), with the above year being in between YearFrom and YearTo range.
For anyone who can't see the screenshot:
NAME YearFrom YearTo ID1 ID2 EffDate
item1 2002 2005 AB 10 20091201
item1 2009 2013 AB 10 20100301
item2 2001 2004 XX 20 20050103
item2 2002 2009 XX 20 20060710
item2 2007 2013 XX 20 20090912
item3 2005 2010 YY 30 20110304
I hope I explained it well. For example if user is looking for available items in year 2011, item1 (with eff. date 20100301) and item 2 (with eff. date 20090912) will be returned.
If someone is looking for items available in year 2008: item2 (with eff. date 20090912) and item 3 will be returned. Item 1 will not be returned in this case because the most recent record for item 1 has range of 2009-2013.
I think I have the first part of the query right, but I don't know how to select the valid records from those results based on the year in one query.
select name,id1,id2,max(effdate)
from [table]
group by name,id1,id2
Any help would be much appreciated.
You can use the query below for this type of output.
-- You want the row where the effective date is the maximum for each item (name, id1, id2), so take only those rows first and then apply the year condition to them.
SELECT NAME, Id1, Id2, Effdate
FROM Table_Name t_1
WHERE Effdate =
(SELECT MAX(t_2.Effdate)
FROM Table_Name t_2
WHERE t_2.NAME = t_1.NAME
and t_2.id1 = t_1.id1
and t_2.id2 = t_1.id2
GROUP BY t_2.name,t_2.id1,t_2.id2)
AND Your_Year_Variable_Value BETWEEN t_1.Yearfrom AND t_1.Yearto
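To check this approach against the sample data from the question, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database from Python. SQLite stands in for DB2 here, and the table name `items` and the helper `latest_valid_in` are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: name, yearfrom, yearto, id1, id2, effdate
ROWS = [
    ("item1", 2002, 2005, "AB", 10, 20091201),
    ("item1", 2009, 2013, "AB", 10, 20100301),
    ("item2", 2001, 2004, "XX", 20, 20050103),
    ("item2", 2002, 2009, "XX", 20, 20060710),
    ("item2", 2007, 2013, "XX", 20, 20090912),
    ("item3", 2005, 2010, "YY", 30, 20110304),
]

# Keep only the most recent row per (name, id1, id2), then apply the year filter
QUERY = """
SELECT t1.name, t1.id1, t1.id2, t1.effdate
FROM items t1
WHERE t1.effdate = (SELECT MAX(t2.effdate)
                    FROM items t2
                    WHERE t2.name = t1.name
                      AND t2.id1 = t1.id1
                      AND t2.id2 = t1.id2)
  AND ? BETWEEN t1.yearfrom AND t1.yearto
ORDER BY t1.name
"""


def latest_valid_in(year):
    """Return the most recent record per item, if it covers the given year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (name TEXT, yearfrom INT, yearto INT, "
        "id1 TEXT, id2 INT, effdate INT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (year,)).fetchall()
    conn.close()
    return rows


result_2011 = latest_valid_in(2011)
result_2008 = latest_valid_in(2008)
```

For 2011 this returns item1 and item2, and for 2008 it returns item2 and item3, which matches the expected results described in the question.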
It's not clear whether these two statements are in conflict. I think they are in conflict, and I'm going with statement 1 in the code below.
[1.] What I need is for a given year, select the record with distinct ID1+ID2+Name columns and maximum (most recent) effective date (in YYYYMMDD format, stored as integer), with the above year being in between YearFrom and YearTo range.
[2.] Item 1 will not be returned in this case because the most recent record for item 1 has range of 2009-2013.
I would say that item 1 would not be returned, because it has no information for year 2008. If it did have information for 2008, it should be returned per statement 1 above, regardless of whether there happened to be more recent data.
If you expand your table so each year appears in a row by itself, rather than being implied by a range like 2002-2005, it's pretty simple. The query below is in PostgreSQL; you should only have to replace the first common table expression with a DB2 equivalent to generate a table of numbers (or use an actual table of numbers), and fix up the CTE syntax. (DB2's CTE syntax is unique.)
with years as (
select generate_series(2000, 2020) as year
),
expanded_table1 as (
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
)
select id1, id2, name, year, max(effdate)
from expanded_table1
where year = 2008
group by id1, id2, name, year
Explanation
This query, the first CTE, generates a series of integers that represent all the years we might be interested in. A more robust solution might select the minimum and maximum years for the number generator from your table instead of using integer literals.
select generate_series(2000, 2020) as year;
YEAR
--
2000
2001
2002
...
2020
By joining that table with your table, we can expand the ranges into rows.
with years as (
select generate_series(2000, 2020) as year
)
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
order by id1, id2, name, year;
ID1 ID2 NAME YEAR YEARFROM YEARTO EFFDATE
--
AB 10 item1 2002 2002 2005 20091201
AB 10 item1 2003 2002 2005 20091201
AB 10 item1 2004 2002 2005 20091201
AB 10 item1 2005 2002 2005 20091201
...
Having prepared the foundation this way, the query to find the maximum effective date for each distinct combination of id1, id2, name, for a given year is just a simple GROUP BY with a WHERE clause.
with years as (
select generate_series(2000, 2020) as year
),
expanded_table1 as (
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
)
select id1, id2, name, year, max(effdate)
from expanded_table1
where year = 2011
group by id1, id2, name, year
ID1 ID2 NAME YEAR MAX
--
AB 10 item1 2011 20100301
XX 20 item2 2011 20090912
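If generate_series is not available, a recursive CTE is one possible way to build the table of years. The sketch below is only an illustration under assumptions: it uses SQLite through Python rather than DB2, and the table name `items` and the helper `max_effdate_for` are invented for the example.

```python
import sqlite3

# Sample rows from the question: name, yearfrom, yearto, id1, id2, effdate
ROWS = [
    ("item1", 2002, 2005, "AB", 10, 20091201),
    ("item1", 2009, 2013, "AB", 10, 20100301),
    ("item2", 2001, 2004, "XX", 20, 20050103),
    ("item2", 2002, 2009, "XX", 20, 20060710),
    ("item2", 2007, 2013, "XX", 20, 20090912),
    ("item3", 2005, 2010, "YY", 30, 20110304),
]

# The recursive CTE replaces generate_series(2000, 2020)
QUERY = """
WITH RECURSIVE years(year) AS (
    SELECT 2000
    UNION ALL
    SELECT year + 1 FROM years WHERE year < 2020
),
expanded AS (
    SELECT i.id1, i.id2, i.name, y.year, i.effdate
    FROM items i
    JOIN years y ON y.year BETWEEN i.yearfrom AND i.yearto
)
SELECT id1, id2, name, year, MAX(effdate)
FROM expanded
WHERE year = ?
GROUP BY id1, id2, name, year
ORDER BY id1
"""


def max_effdate_for(year):
    """Return the latest effective date per item among rows covering `year`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (name TEXT, yearfrom INT, yearto INT, "
        "id1 TEXT, id2 INT, effdate INT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (year,)).fetchall()
    conn.close()
    return rows


rows_2011 = max_effdate_for(2011)
rows_2003 = max_effdate_for(2003)
```

The year 2003 shows where the two readings of the question differ: under statement 1 this query returns item1 (20091201) and item2 (20060710), while a query that first picks the single most recent record per item would return nothing for 2003.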
Related
I have a table that looks like the below.
ParentID | PersonID | Year
----------------------------
1 1 2019
1 2 2020
3 3 2019
3 4 2020
5 5 2019
I'm trying to figure out how to select the current PersonID when a ParentID has more than one record so my results would look like the below.
ParentID | PersonID | Year
----------------------------
1 2 2020
3 4 2020
5 5 2019
I can't just select the max PersonID, because we sometimes create Person records for the previous year, in which case the PersonID is greater, and we still want to return this year's record. I also can't select based on year, because if they don't have a record for this year, we still need their most recent record, whichever year that is.
I've tried selecting this subset in half a dozen ways at this point and have only managed to make my brain hurt. Any assistance would be appreciated!!
This is a typical greatest-n-per-group problem. To solve it, you need to think filtering rather than aggregation.
A portable solution is to filter with a correlated subquery that returns the latest year per parent_id:
select t.*
from mytable t
where t.Year = (
select max(t1.Year) from mytable t1 where t1.ParentID = t.ParentID
)
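As a quick check, the correlated-subquery filter can be run against the sample rows from the question. This is a minimal sketch using SQLite from Python; the table name `people` and the function name are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question: ParentID, PersonID, Year
ROWS = [(1, 1, 2019), (1, 2, 2020), (3, 3, 2019), (3, 4, 2020), (5, 5, 2019)]

# Keep a row only if its year is the latest year for that parent
QUERY = """
SELECT t.ParentID, t.PersonID, t.Year
FROM people t
WHERE t.Year = (SELECT MAX(t1.Year)
                FROM people t1
                WHERE t1.ParentID = t.ParentID)
ORDER BY t.ParentID
"""


def latest_per_parent():
    """Return the row with the most recent Year for each ParentID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (ParentID INT, PersonID INT, Year INT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


latest = latest_per_parent()
```

Note that if a parent has two rows for its latest year, this filter returns both of them.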
Assuming you are using MSSQL, this can be achieved with ROW_NUMBER. PARTITION BY divides the result set into partitions, and ROW_NUMBER numbers the rows within each partition. Partitioning by ParentId and ordering by Year descending gives the most recent row in each partition the number 1, so the condition RowNo = 1 removes the older rows.
Create Table Test(ParentId int, PersonId int, Year int);
INSERT INTO Test values
(1, 1, 2019),
(1, 2, 2020),
(3, 3, 2019),
(3, 4, 2020),
(5, 5, 2019);
SELECT ParentId, PersonId, Year FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY ParentId
ORDER BY Year /* Use PersonId if it fits correctly */ DESC) AS RowNo,
ParentId, PersonId, Year from Test -- Table Name
) E WHERE ROWNo = 1
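The same ROW_NUMBER pattern also works outside SQL Server. Here is a small sketch that runs it in SQLite from Python, assuming a SQLite build of 3.25 or later (the first version with window functions). The extra `PersonId DESC` tie-breaker is my own assumption, added so that the result is deterministic when a parent has two rows for the same year.

```python
import sqlite3

# Sample rows from the question: ParentId, PersonId, Year
ROWS = [(1, 1, 2019), (1, 2, 2020), (3, 3, 2019), (3, 4, 2020), (5, 5, 2019)]

# Number the rows per ParentId, newest year first, and keep row number 1
QUERY = """
SELECT ParentId, PersonId, Year
FROM (
    SELECT ParentId, PersonId, Year,
           ROW_NUMBER() OVER (PARTITION BY ParentId
                              ORDER BY Year DESC, PersonId DESC) AS RowNo
    FROM Test
) E
WHERE RowNo = 1
ORDER BY ParentId
"""


def newest_row_per_parent():
    """Return exactly one row per ParentId: the one with the latest Year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Test (ParentId INT, PersonId INT, Year INT)")
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


newest = newest_row_per_parent()
```

Unlike the correlated-subquery filter, this always returns exactly one row per ParentId, even when several rows share the latest year.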
In a SQL Server 2008 database, I'm trying to identify each ID that has corresponding dates that meet the following criteria:
Are any 2 dates within each ID >= 3 months apart?
Are those same 2 dates <= 24 months apart?
I can do a comparison on the next row, but that doesn't tell me if rows 1 and 3 meet the criteria, or rows 5 and 7, etc.
Here's the table structure (there are around 100,000 rows in the actual table):
select ID, Date from #tmp;
ID Date
ID1 7/2/2016
ID1 10/19/2016
ID1 1/21/2017
ID1 7/19/2017
ID2 11/26/2015
ID2 2/10/2016
ID2 5/23/2016
ID3 6/15/2017
ID3 6/30/2017
So here ID1 and ID2 both have dates meeting the criteria, but the dates for ID3 don't meet the first criterion (being at least 3 months apart).
Here's the self-join I've tried so far:
with NextDateTable as
(
select
ID
,Date
,rn=rank() over (partition by ID order by Date asc)
from #tmp
)
select
a.ID
,a.Date
,NextDate=b.Date
into #tmp2
from NextDateTable a
left join NextDateTable b on a.ID=b.ID and b.rn=a.rn+1
order by ID,Date
;
This gives me a table with the next date in a new column, so I can do the following datediff:
select
ID
,Date
,NextDate
,case
when ((Date is not null) and (NextDate is not null))
and
datediff(mm,Date,NextDate)>=3
and
datediff(mm,Date,NextDate)<=24
then 1
else 0
end as [Check]
into #tmp3
from #tmp2
;
The problem with this is that it only checks consecutive rows, and it doesn't check every row against each other row within the same ID.
Any suggestions would be greatly appreciated!
Your question simplifies to asking if the total span of the dates is between 3 and 24 months. You can simply do:
select id
from #tmp
group by id
having max(date) >= dateadd(month, 3, min(date)) and
max(date) <= dateadd(month, 24, min(date));
Note that if you are asking about adjacent dates, then that is another question, not this one. Ask a new question if that is what you really intend.
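One caveat with the span check: an ID whose first and last dates are more than 24 months apart can still contain a pair of dates that is 3 to 24 months apart. If such IDs should count, a pairwise self-join is one option. The sketch below compares both approaches in SQLite from Python; it assumes ISO-formatted date strings, uses SQLite's `date()` modifiers in place of `dateadd`, and adds a hypothetical ID4 that is not in the question's data. The table name `visits` is also made up.

```python
import sqlite3

# Dates from the question in ISO format; ID4 is a hypothetical extra case
DATES = [
    ("ID1", "2016-07-02"), ("ID1", "2016-10-19"),
    ("ID1", "2017-01-21"), ("ID1", "2017-07-19"),
    ("ID2", "2015-11-26"), ("ID2", "2016-02-10"), ("ID2", "2016-05-23"),
    ("ID3", "2017-06-15"), ("ID3", "2017-06-30"),
    ("ID4", "2015-01-01"), ("ID4", "2016-01-01"), ("ID4", "2018-06-01"),
]

# Compare every earlier date with every later date of the same ID
PAIRWISE = """
SELECT DISTINCT a.id
FROM visits a
JOIN visits b ON b.id = a.id AND b.d > a.d
WHERE b.d >= date(a.d, '+3 months')
  AND b.d <= date(a.d, '+24 months')
ORDER BY a.id
"""

# Compare only the earliest and latest date of each ID
SPAN = """
SELECT id
FROM visits
GROUP BY id
HAVING MAX(d) >= date(MIN(d), '+3 months')
   AND MAX(d) <= date(MIN(d), '+24 months')
ORDER BY id
"""


def matching_ids(query):
    """Run one of the two checks and return the qualifying IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (id TEXT, d TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", DATES)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


pairwise_ids = matching_ids(PAIRWISE)
span_ids = matching_ids(SPAN)
```

Both checks agree on the question's data (ID1 and ID2 qualify, ID3 does not). Only the pairwise check picks up the hypothetical ID4, whose overall span is about 41 months but whose first two dates are 12 months apart.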
Say my start year is 2000 and I would like to have a one column select return every year from 2000 to the current year, example:
2000
2001
...
2012
2013
This is to populate a parameter in Reporting Services.
The easiest thing for you to do would be to create a numbers table that you would use for these types of queries.
You could also use a recursive Common Table Expression to generate the list of years:
;with cte (yr) as
(
select 2000
union all
select yr + 1
from cte
where yr+1 <=2013
)
select yr
from cte;
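Since the question asks for every year up to the current year, the upper bound can be passed in as a parameter rather than hard-coded. A minimal sketch of the same recursive CTE, run in SQLite from Python; the function name `years_through` is invented for the example, and in Reporting Services you would compute the current year in the query itself instead.

```python
import sqlite3
from datetime import date


def years_through(start_year, end_year):
    """Return every year from start_year to end_year inclusive, one per row."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        WITH RECURSIVE cte(yr) AS (
            SELECT ?                 -- anchor row: the start year
            UNION ALL
            SELECT yr + 1 FROM cte   -- recursive step: add one year
            WHERE yr + 1 <= ?
        )
        SELECT yr FROM cte
        """,
        (start_year, end_year),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Upper bound taken from the system clock instead of a literal
years = years_through(2000, date.today().year)
```

The anchor row is always emitted, so the start year appears in the result even if the end year is smaller.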
Note: I tried a couple of the answers below (it's in Teradata, so some of the answers are giving me syntax errors everywhere).
I hit a brick wall here.
I want to compare year by year in different columns
ID, Year, Revenue
1, 2009, 10
1, 2009, 20
1, 2010, 20
2, 2009, 5
2, 2010, 50
2, 2010, 1
How do I separate it by both ID and Year?
At the end I would like it to look like this
ID, Year, Sum
1, 2009, 30
1, 2010, 20
...
2, 2010, 51
(heavily edited for comprehension)
The best I can give you with the amount of detail you have provided is to break your table into subqueries:
select t1.yr - t2.yr from
(select yr
from the_table where yr = 2010) t1,
(select yr
from the_table where yr = 2009) t2
More detail could be given if we knew which type of database you are using, what the real structure of your table is, etc. but perhaps this will get you started.
something like this:
select t2009.id, t2009.year, t2010.year, t2010.year - t2009.year diff
from
( select id, year
from mytable
where year = 2009
) t2009
,
( select id, year
from mytable
where year = 2010
) t2010
where t2009.id = t2010.id
You will most likely have to do a self-join
SELECT [what you are comparing] FROM [table] t1
[INNER/LEFT] JOIN [table] t2 ON t1.[someID] = t2.[someID]
WHERE t1.year = 2009 AND t2.year = 2010
The someID column would not necessarily have to be an ID, or even an indexed column, but it should be the column you are looking to compare across the years.
E.g. a table called 'Products' with columns/fields
ID
ProductName
Price
Year
You could do:
SELECT t1.ProductName, (t2.Price - t1.Price) As Price_change FROM Products t1
INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
WHERE t1.year = 2009 AND t2.year = 2010
This would be faster if ProductName were a primary key or an indexed column. Depending on the database and its optimizer, it may also be faster than nested selects, particularly when joining on an index.
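To make the self-join concrete, here is a small sketch that runs it in SQLite from Python. The product names and prices are made-up sample data, not taken from the question.

```python
import sqlite3

# Hypothetical sample data: ID, ProductName, Price, Year
PRODUCTS = [
    (1, "Widget", 10, 2009),
    (2, "Widget", 12, 2010),
    (3, "Gadget", 20, 2009),
    (4, "Gadget", 18, 2010),
    (5, "Gizmo", 7, 2010),   # no 2009 row, so the inner join drops it
]

# Join each product's 2009 row (t1) to its 2010 row (t2)
QUERY = """
SELECT t1.ProductName, (t2.Price - t1.Price) AS Price_change
FROM Products t1
INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
WHERE t1.Year = 2009 AND t2.Year = 2010
ORDER BY t1.ProductName
"""


def price_changes():
    """Return the 2009-to-2010 price change for products present in both years."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Products (ID INT, ProductName TEXT, Price INT, Year INT)"
    )
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)", PRODUCTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


changes = price_changes()
```

With an INNER JOIN, a product that exists in only one of the two years is left out. A LEFT JOIN would need the `t2.Year = 2010` condition moved into the ON clause to keep those rows.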
By your data and your desired output, I think you simply want this:
select ID, Year, SUM(Revenue)
from YourTable
GROUP BY ID, Year
Update
Now, if your first data sample is already a SELECT query, you need to:
select ID, Year, SUM(Revenue)
from (SELECT...) YourSelect
GROUP BY ID, Year
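For reference, this is what the GROUP BY produces on the sample rows from the question. The sketch uses SQLite from Python rather than Teradata, and the table name `revenue` is an assumption for the example.

```python
import sqlite3

# Sample rows from the question: ID, Year, Revenue
ROWS = [
    (1, 2009, 10),
    (1, 2009, 20),
    (1, 2010, 20),
    (2, 2009, 5),
    (2, 2010, 50),
    (2, 2010, 1),
]

# One output row per (ID, Year) combination, with the revenue summed
QUERY = """
SELECT ID, Year, SUM(Revenue)
FROM revenue
GROUP BY ID, Year
ORDER BY ID, Year
"""


def revenue_by_id_and_year():
    """Return total revenue for each ID and year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (ID INT, Year INT, Revenue INT)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


totals = revenue_by_id_and_year()
```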
This looks like a good candidate for the ROLLUP command. It will give you automatic sums for the grouped-by columns:
GROUP BY ROLLUP (ID,Year)
I have table named usertable and structure is
id | Name | Year
----------------
1 a 2010
2 b 2008
3 c 2010
4 d 2007
5 e 2008
Now I want the Output result like this
Year
----
2010
2008
2007
I don't know the SQL query for this, so please help me.
All ideas and suggestions are welcome.
Not exactly sure what you're looking for, but if you're looking for the years that are in the table in descending order, then you could use this:
SELECT DISTINCT year FROM usertable ORDER BY year DESC;
SELECT DISTINCT [Year]
FROM myTable
ORDER BY [Year] DESC
SELECT DISTINCT Year
FROM MyTable
ORDER BY Year DESC
SELECT DISTINCT Year
FROM TABLE
ORDER BY Year DESC
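All of the answers above are the same query with different table names. A minimal sketch that runs it on the sample data, using SQLite from Python:

```python
import sqlite3

# Sample rows from the question: id, Name, Year
ROWS = [(1, "a", 2010), (2, "b", 2008), (3, "c", 2010), (4, "d", 2007), (5, "e", 2008)]


def distinct_years_desc():
    """Return each year in usertable once, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usertable (id INT, Name TEXT, Year INT)")
    conn.executemany("INSERT INTO usertable VALUES (?, ?, ?)", ROWS)
    # DISTINCT removes the repeated 2010 and 2008; ORDER BY ... DESC sorts newest first
    rows = conn.execute(
        "SELECT DISTINCT Year FROM usertable ORDER BY Year DESC"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


distinct_years = distinct_years_desc()
```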
If you want to be more than a tape monkey, here's a brilliant hands-on source: http://sqlzoo.net/0.htm