Split column in 3 - sql

Note: Tried a couple of the answers below (it's in Teradata, so some of the answers are giving me syntax errors everywhere).
I hit a brick wall here.
I want to compare year by year in different columns
ID, Year, Revenue
1, 2009, 10
1, 2009, 20
1, 2010, 20
2, 2009, 5
2, 2010, 50
2, 2010, 1
How do I separate it by both ID and Year?
At the end I would like it to look like this
ID, Year, Sum
1, 2009, 30
1, 2010, 20
...
2, 2010, 51
(heavily edited for comprehension)

The best I can give you with the amount of detail you have provided is to break your table into subqueries:
select t1.yr - t2.yr from
(select yr
from the_table where yr = 2010) t1,
(select yr
from the_table where yr = 2009) t2
More detail could be given if we knew which type of database you are using, what the real structure of your table is, etc. but perhaps this will get you started.

something like this:
select t2009.id, t2009.year, t2010.year, t2010.year - t2009.year diff
from
( select id, year
from mytable
where year = 2009
) t2009
,
( select id, year
from mytable
where year = 2010
) t2010
where t2009.id = t2010.id

You will most likely have to do a self-join
SELECT [what you are comparing] FROM [table] t1
[INNER/LEFT] JOIN [table] t2 ON t1.[someID] = t2.[someID]
WHERE t1.year = 2009 AND t2.year = 2010
The someID column does not have to be an ID, or even an indexed column, but it should be the column you want to compare across the years.
E.g. a table called 'Products' with columns/fields
ID
ProductName
Price
Year
You could do:
SELECT t1.ProductName, (t2.Price - t1.Price) As Price_change FROM Products t1
INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
WHERE t1.year = 2009 AND t2.year = 2010
This would be faster if ProductName were a primary key or an indexed column. It would also usually be faster than nested selects, which tend to be much slower than joins (when joining on an index).
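To see the self-join above in action, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you actually use, and the sample rows (widget, gadget, gizmo) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ID INTEGER, ProductName TEXT, Price REAL, Year INTEGER);
    INSERT INTO Products VALUES
        (1, 'widget', 10.0, 2009),
        (2, 'widget', 12.5, 2010),
        (3, 'gadget', 20.0, 2009),
        (4, 'gadget', 18.0, 2010),
        (5, 'gizmo',   7.0, 2009);  -- no 2010 row, so the INNER JOIN drops it
""")

# Same self-join as in the answer: t1 holds the 2009 rows, t2 the 2010 rows.
price_changes = conn.execute("""
    SELECT t1.ProductName, t2.Price - t1.Price AS Price_change
    FROM Products t1
    INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
    WHERE t1.Year = 2009 AND t2.Year = 2010
    ORDER BY t1.ProductName
""").fetchall()

for name, change in price_changes:
    print(name, change)
```

Note that a product with a row in only one of the two years disappears from an INNER JOIN; a LEFT JOIN would keep it, but the t2 year filter would then have to move into the ON clause.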

By your data and your desired output, I think you simply want this:
select ID, Year, SUM(Revenue)
from YourTable
GROUP BY ID, Year
Update
Now, if your first data sample is already a SELECT query, you need to:
select ID, Year, SUM(Revenue)
from (SELECT...) YourSelect
GROUP BY ID, Year
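As a quick check, here is the GROUP BY query run against the sample data from the question. This is a sketch using Python's sqlite3 module as a stand-in for Teradata; the table name YourTable is the placeholder from the answer, not a real table.

```python
import sqlite3

# In-memory SQLite database loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (ID INTEGER, Year INTEGER, Revenue INTEGER);
    INSERT INTO YourTable VALUES
        (1, 2009, 10), (1, 2009, 20), (1, 2010, 20),
        (2, 2009, 5),  (2, 2010, 50), (2, 2010, 1);
""")

# One output row per (ID, Year) pair, with the revenue summed within each pair.
totals = conn.execute("""
    SELECT ID, Year, SUM(Revenue)
    FROM YourTable
    GROUP BY ID, Year
    ORDER BY ID, Year
""").fetchall()

for row in totals:
    print(row)
```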

This looks like a good candidate for the ROLLUP command. It will give you automatic sums for the grouped-by columns:
GROUP BY ROLLUP (ID,Year)
More info here.
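To show what rows GROUP BY ROLLUP (ID, Year) adds, here is a sketch that emulates it with UNION ALL. The emulation is an assumption made for the demo, because SQLite (used here through Python's sqlite3 module) has no ROLLUP; on a database that supports ROLLUP you would write the single GROUP BY ROLLUP clause instead.

```python
import sqlite3

# In-memory SQLite database loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (ID INTEGER, Year INTEGER, Revenue INTEGER);
    INSERT INTO YourTable VALUES
        (1, 2009, 10), (1, 2009, 20), (1, 2010, 20),
        (2, 2009, 5),  (2, 2010, 50), (2, 2010, 1);
""")

# ROLLUP (ID, Year) yields three grouping levels; each is one branch here:
#   1. detail rows per (ID, Year)
#   2. a subtotal per ID (Year is NULL)
#   3. a grand total (ID and Year are both NULL)
rollup_rows = conn.execute("""
    SELECT ID, Year, SUM(Revenue) FROM YourTable GROUP BY ID, Year
    UNION ALL
    SELECT ID, NULL, SUM(Revenue) FROM YourTable GROUP BY ID
    UNION ALL
    SELECT NULL, NULL, SUM(Revenue) FROM YourTable
""").fetchall()

for row in rollup_rows:
    print(row)
```

If you only need the per-(ID, Year) sums, the subtotal and grand-total rows are extra and a plain GROUP BY is enough.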

Related

Find duplicates on the basis of a condition in sql

I want to find duplicate IDs in a table based on a condition.
I have multiple IDs for files from the years 2018, 2019, 2020, and 2021. IDs may overlap between files across years.
I want to find all the IDs present in 2019 that are also present in at least one of the other years.
So if:
id  year
1   2019
1   2020
1   2021
2   2019
3   2019
4   2018
4   2019
I want:
id
1
4
Note: I only want IDs that appear in 2019. If an ID is present in 2018 and 2020 but not in 2019, it should go unmatched.
This is what I tried:
select id from table
intersect
select id from table where year='2019'
Thanks in advance!
Assuming you want to grab all of the rows where the year is 2019, you can use the very simple query:
SELECT * FROM TABLE WHERE year = '2019'
If you exclusively want to return IDs 1 and 4 for this year, you can use:
SELECT * FROM TABLE WHERE year = '2019' AND id IN ('1', '4')
If instead you want to return the years where there are duplicates, you can use GROUP BY and HAVING:
SELECT year, COUNT(*)
FROM TABLE
WHERE year = '2019'
GROUP BY year
HAVING COUNT(*) > 1
Note that you'll need to replace TABLE with your table name.
Try this: you can achieve exactly what you want by using EXISTS and HAVING as below:
CREATE TABLE #test(id INT, year INT)
INSERT INTO #test(id, year) VALUES
(1, 2019),
(1, 2020),
(1, 2021),
(2, 2019),
(3, 2019),
(4, 2018),
(4, 2019)
SELECT t.id
FROM #test t
WHERE EXISTS(SELECT 1 FROM #test t1 WHERE t.id = t1.id AND t1.year = 2019)
GROUP BY id HAVING COUNT(t.id) > 1
So there are two conditions. The first is a simple WHERE: year = 2019. The second is a condition on the group: if you group by id, it can be written as HAVING COUNT(*) > 1.
However, you cannot put both at the same query level, because WHERE year = 2019 is applied before the grouping: only the 2019 rows take part in the grouping, so every count is 1.
This can be written with a subquery to avoid the problem.
select id
from table
where id in (select id from table where year = 2019)
group by id
having count(*) > 1
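Here is a small sketch of that difference, run through Python's sqlite3 module on the sample rows from the question. The table name files is hypothetical (the answers just say "table"), and SQLite is only a stand-in for your actual database.

```python
import sqlite3

# In-memory SQLite database with the (id, year) rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER, year INTEGER);
    INSERT INTO files VALUES
        (1, 2019), (1, 2020), (1, 2021),
        (2, 2019), (3, 2019),
        (4, 2018), (4, 2019);
""")

# Naive attempt: WHERE removes every non-2019 row before grouping,
# so each group has exactly one row and HAVING never matches.
naive = conn.execute("""
    SELECT id FROM files
    WHERE year = 2019
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

# Subquery version: the 2019 condition only selects which ids qualify;
# the grouping still sees all rows of those ids.
with_subquery = conn.execute("""
    SELECT id FROM files
    WHERE id IN (SELECT id FROM files WHERE year = 2019)
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(naive)
print(with_subquery)
```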
If you formulated the requirements correctly, the query would be:
select id, year, count(*)
from table_name tn
where tn.year = '2019'
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2018')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2020')
and exists (select year
from table_name tn2
where tn2.id = tn.id and tn2.year = '2021')
group by tn.year
having count(*) > 1

Find max over multiple columns

I am trying to query a list of meetings from the most recent semester, where semester is determined by two fields (year, semester). Here's a basic outline of the schema:
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
meeting3 2013 2
... etc ...
As the max should be considered for the Year first, and then the Semester, my results should look like this:
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
Unfortunately simply using the MAX() function on each column separately will try to find Year=2014, Semester=2, which is incorrect. I tried a couple approaches using nested subqueries and inner joins but couldn't quite get something to work. What is the most straightforward approach to solving this?
Using a window function:
SELECT Year, Semester, RANK() OVER(ORDER BY Year DESC, Semester DESC) R
FROM your_table;
R will be a column containing the rank of the pair (Year, Semester). You can then use this column as a filter, for instance:
WITH TT AS (
SELECT Year, Semester, RANK() OVER(ORDER BY Year DESC, Semester DESC) R
FROM your_table
)
SELECT ...
FROM TT
WHERE R = 1;
If you don't want gaps between ranks, you can use dense_rank instead of rank.
This answer assumes you use an RDBMS that supports window functions (MySQL, for example, only added them in version 8.0).
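A runnable sketch of the RANK() filter, using Python's sqlite3 module. This assumes the bundled SQLite is version 3.25 or later (the first version with window functions); the table name Meeting and the sample rows are taken from the question's outline.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Meeting (Otherfields TEXT, Year INTEGER, Semester INTEGER);
    INSERT INTO Meeting VALUES
        ('meeting1', 2014, 1),
        ('meeting2', 2014, 1),
        ('meeting3', 2013, 2);
""")

# Rank every row by (Year, Semester) descending; all rows that share the
# latest pair tie at rank 1, so filtering on R = 1 keeps the whole semester.
latest = conn.execute("""
    WITH TT AS (
        SELECT Otherfields, Year, Semester,
               RANK() OVER (ORDER BY Year DESC, Semester DESC) AS R
        FROM Meeting
    )
    SELECT Otherfields, Year, Semester
    FROM TT
    WHERE R = 1
    ORDER BY Otherfields
""").fetchall()

for row in latest:
    print(row)
```

RANK() rather than ROW_NUMBER() matters here: ROW_NUMBER() would number the two 2014 rows 1 and 2, and the filter would keep only one of them.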
I wouldn't be surprised if there's a more efficient way to do this (and avoid the duplicate subquery), but this will get you the answer you want:
SELECT * FROM table WHERE Year =
(SELECT MAX(Year) FROM table)
AND Semester =
(SELECT MAX(Semester) FROM table WHERE Year =
(SELECT MAX(Year) FROM table))
Here's Postgres:
with table2 as /*virtual temporary table*/
(
select *, year::text || semester as yearsemester
from table
)
select Otherfields, year, semester
from table2
where (Otherfields, yearsemester) in
(
select Otherfields, max(yearsemester)
from table2
group by Otherfields
)
I've been overthinking this, there's a much simpler way to get this:
SELECT Meeting.year, Meeting.semester, Meeting.otherFields
FROM Meeting
JOIN (SELECT year, semester
FROM (SELECT year, semester
FROM Meeting
ORDER BY year DESC, semester DESC)
WHERE ROWNUM = 1) MostRecent
ON MostRecent.year = Meeting.year
AND MostRecent.semester = Meeting.semester
(and working Fiddle)
Note that variations of this should work for pretty much all dbs (anything that supports a limiting clause in a subquery); here's the MySQL version, for example:
SELECT Meeting.year, Meeting.semester, Meeting.otherFields
FROM Meeting
JOIN (SELECT year, semester
FROM Meeting
ORDER BY year DESC, semester DESC
LIMIT 1) MostRecent
ON MostRecent.year = Meeting.year
AND MostRecent.semester = Meeting.semester
(...and working fiddle)
Given some of the data in this answer this should be performant for Oracle, and I suspect other dbs as well (given the shortcuts the optimizer is allowed to take). This should be able to replace the use of things like ROW_NUMBER() in most instances where no partitioning clause is provided (no window).
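For reference, here is a sketch of the LIMIT-based variant running on SQLite through Python's sqlite3 module, with the rows from the question. The key point is that the ordering has to be applied before the row limit, so the one row kept is the latest (year, semester) pair.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Meeting (otherFields TEXT, year INTEGER, semester INTEGER);
    INSERT INTO Meeting VALUES
        ('meeting1', 2014, 1),
        ('meeting2', 2014, 1),
        ('meeting3', 2013, 2);
""")

# The derived table sorts first and then keeps one row: the latest pair.
# Joining back on both columns returns every meeting in that semester.
most_recent = conn.execute("""
    SELECT Meeting.otherFields, Meeting.year, Meeting.semester
    FROM Meeting
    JOIN (SELECT year, semester
          FROM Meeting
          ORDER BY year DESC, semester DESC
          LIMIT 1) MostRecent
      ON MostRecent.year = Meeting.year
     AND MostRecent.semester = Meeting.semester
    ORDER BY Meeting.otherFields
""").fetchall()

for row in most_recent:
    print(row)
```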
Why don't you simply use ORDER BY?
That way it would be easier to handle and less messy! :)
SELECT * FROM table
Where Year = (Select Max(Year) from table) /* optional clause to select only 2014*/
Order by Semester ASC, Year DESC, Otherfields; /* numerically lowest semester first; on a semester tie, sort by descending year */
EDIT
In case you need limited results from 2014, use the LIMIT clause (for MySQL):
SELECT * FROM table
Where Year = (Select Max(Year) from table)
Order by Semester ASC, Year DESC, Otherfields
LIMIT 10;
It will order first and then apply the limit of 10, so you get your limited result set!
This will fetch output like :
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
meeting1 2013 1
meeting2 2013 2
Answering my own question here:
This query was run in a stored procedure, so I went ahead and found the maximum year/semester in separate queries before the rest of the query. This is most likely inefficient and inelegant, but it is also the most understandable method; I don't need to worry about other members of my team getting confused by it. I'll leave this question here since it's generally applicable to many other situations, and there appear to be some good answers providing alternative approaches.
-- Find the most recent year.
SELECT MAX(year) INTO max_year FROM meeting;
-- Find the most recent semester in the year.
SELECT MAX(semester) INTO max_semester FROM meeting WHERE year = max_year;
-- Open a ref cursor for meetings in most recent year/semester.
OPEN meeting_list FOR
SELECT otherfields, year, semester
FROM meeting
WHERE year = max_year
AND semester = max_semester;

2.5 percent increase of previous field?

I have a table with a series of IDs. Each ID has dates ranging up to year 2025 from current year. Each year for each ID has a specific price.
http://i.imgur.com/srplSDo.jpg
Once I get to a certain point with each ID, it no longer has a specific price. So what I want to do is take the previous year's price and increase it by 2.5 percent. I have figured out a way to grab the previous year's price with this:
SELECT a.*,
(CASE
WHEN a.YEARLY_PRICING is not null
THEN a.YEARLY_PRICING
ELSE (SELECT b.YEARLY_PRICING
FROM #STEP3 b
WHERE (a.id = b.id) AND (b.YEAR = a.YEAR-1))*1.025
END) AS TEST
FROM #STEP3 a
which would provide these results:
http://imgur.com/MJutM99
but the problem I am having is after the first null year, it is still recognizing the previous yearly_pricing as null, which gives me the null results, so obviously this method won't work for me. Any other suggestions for improvement?
Thanks
WITH CTE AS
(
SELECT ID, Year, Price, Price AS Prev
FROM T A
WHERE Year = (SELECT min(year) FROM T WHERE T.ID = A.ID GROUP BY T.ID)
UNION ALL
SELECT T.ID, T.Year, T.Price, ISNULL(T.Price, 1.025*Prev)
FROM T JOIN CTE ON T.ID = CTE.ID
AND T.Year - 1 = CTE.YEAR
)
SELECT * FROM CTE
ORDER BY ID, Year
SQL Fiddle Demo
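Here is a sketch of the same recursive CTE adapted to SQLite and run through Python's sqlite3 module. The adaptations are assumptions made for the demo: SQLite needs WITH RECURSIVE and uses COALESCE where SQL Server has ISNULL. The sample prices are made up.

```python
import sqlite3

# In-memory SQLite database; the prices are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (ID INTEGER, Year INTEGER, Price REAL);
    INSERT INTO T VALUES
        (1, 2021, 100.0), (1, 2022, 110.0), (1, 2023, NULL), (1, 2024, NULL),
        (2, 2021, 200.0), (2, 2022, NULL);
""")

# Anchor: the first year of each ID. Recursive step: move to the next year
# and carry the previous value forward, raised by 2.5%, when Price is NULL.
rows = conn.execute("""
    WITH RECURSIVE CTE(ID, Year, Price, Prev) AS (
        SELECT ID, Year, Price, Price
        FROM T A
        WHERE Year = (SELECT MIN(Year) FROM T WHERE T.ID = A.ID)
        UNION ALL
        SELECT T.ID, T.Year, T.Price, COALESCE(T.Price, 1.025 * CTE.Prev)
        FROM T JOIN CTE ON T.ID = CTE.ID AND T.Year - 1 = CTE.Year
    )
    SELECT ID, Year, Prev FROM CTE ORDER BY ID, Year
""").fetchall()

# Map (ID, Year) to the filled-in price for easy lookup.
carried = {(id_, year): prev for id_, year, prev in rows}
for key, value in carried.items():
    print(key, value)
```

Because each step reads the already filled-in Prev of the year before, consecutive NULL years compound: the second missing year is 1.025 times the first missing year's computed value.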
What you want is a way to find not just the previous year (year - 1), but instead the year that is previous and also has a not-null price. To query for such a year (without solving your problem), you would do something like this:
select a.*
, (select max(year)
from step3 b
where a.id=b.id and a.year>b.year and b.yearly_pricing is not null
) PRIOR_YEAR
from step3 a
Since SQL Server allows common table expressions, you can call the above query "TMP", and then approach it this way. The CALC_PRICE in any year will be the price from the "PRIOR_YEAR" found as per the above query, multiplied by a factor. That factor will be 1.025 to the POWER of the number of years from "PRIOR_YEAR" to the current year.
You would end up with SQL like this:
with TMP AS (
select a.*
, (select max(year)
from step3 b
where a.id=b.id and a.year>b.year and b.yearly_pricing is not null
) PRIOR_YEAR
from step3 a
)
select t.*,
c.yearly_pricing As prior_price,
c.yearly_pricing * POWER(1.025 , (t.year-t.prior_year)) calc_price
from tmp t
left join step3 c
on t.id=c.id and t.prior_year = c.year
It still has nulls, etc. but those are easily handled with COALESCE() or CASE expressions like you had in your question.
Here's an SQL Fiddle which shows how it works: http://sqlfiddle.com/#!3/296a4/21
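A sketch of the POWER-based approach with the NULL handling folded in via COALESCE, run on SQLite through Python's sqlite3 module. Two things here are assumptions for the demo: the sample prices are made up, and POWER is registered as a Python function because not every SQLite build ships math functions.

```python
import sqlite3

# In-memory SQLite database; the prices are hypothetical sample data.
conn = sqlite3.connect(":memory:")
# Register POWER explicitly so the demo does not depend on the SQLite build.
conn.create_function(
    "POWER", 2,
    lambda base, exp: None if base is None or exp is None else base ** exp,
)
conn.executescript("""
    CREATE TABLE step3 (id INTEGER, year INTEGER, yearly_pricing REAL);
    INSERT INTO step3 VALUES
        (1, 2021, 100.0), (1, 2022, 110.0), (1, 2023, NULL), (1, 2024, NULL);
""")

# TMP finds, for each row, the latest earlier year that has a price.
# The outer query keeps a known price as is, and otherwise compounds the
# prior price by 2.5% per year of distance.
rows = conn.execute("""
    WITH TMP AS (
        SELECT a.*,
               (SELECT MAX(b.year)
                FROM step3 b
                WHERE a.id = b.id AND a.year > b.year
                  AND b.yearly_pricing IS NOT NULL) AS prior_year
        FROM step3 a
    )
    SELECT t.id, t.year,
           COALESCE(t.yearly_pricing,
                    c.yearly_pricing * POWER(1.025, t.year - t.prior_year)) AS calc_price
    FROM TMP t
    LEFT JOIN step3 c ON t.id = c.id AND t.prior_year = c.year
    ORDER BY t.id, t.year
""").fetchall()

final_price = {(id_, year): price for id_, year, price in rows}
for key, value in final_price.items():
    print(key, value)
```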

Looking for DB2 SQL query

I am not very familiar with SQL and I hope some expert here can show me suitable and efficient query for what I want to achieve. I am using DB2 by the way.
Below is a screenshot of sample data. What I need is, for a given year, to select the record with distinct ID1+ID2+Name columns and the maximum (most recent) effective date (in YYYYMMDD format, stored as an integer), with the given year falling within the YearFrom to YearTo range.
For anyone who can't see the screenshot:
NAME YearFrom YearTo ID1 ID2 EffDate
item1 2002 2005 AB 10 20091201
item1 2009 2013 AB 10 20100301
item2 2001 2004 XX 20 20050103
item2 2002 2009 XX 20 20060710
item2 2007 2013 XX 20 20090912
item3 2005 2010 YY 30 20110304
I hope I explained it well. For example if user is looking for available items in year 2011, item1 (with eff. date 20100301) and item 2 (with eff. date 20090912) will be returned.
If someone is looking for items available in year 2008: item2 (with eff. date 20090912) and item 3 will be returned. Item 1 will not be returned in this case because the most recent record for item 1 has range of 2009-2013.
I think I have the first part of the query right, but I don't know how to select the valid records from those results based on the year in one query.
select name,id1,id2,max(effdate)
from [table]
group by name,id1,id2
Any help would be much appreciated.
You can use the query below for this type of output.
The idea is to keep only the row with the maximum effective date for each item, and then apply the year condition to those rows.
SELECT NAME, Id1, Id2, Effdate
FROM Table_Name t_1
WHERE Effdate =
(SELECT MAX(t_2.Effdate)
FROM Table_Name t_2
WHERE t_2.NAME = t_1.NAME
and t_2.id1 = t_1.id1
and t_2.id2 = t_1.id2
GROUP BY t_2.name,t_2.id1,t_2.id2)
AND Your_Year_Variable_Value BETWEEN t_1.Yearfrom AND t_1.Yearto
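Here is a sketch of this correlated-subquery approach, with MAX() in the subquery, run against the sample data from the question. It uses Python's sqlite3 module as a stand-in for DB2; the helper name items_for_year is hypothetical, and the year is passed as a bound parameter in place of Your_Year_Variable_Value.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_Name (NAME TEXT, YearFrom INTEGER, YearTo INTEGER,
                             Id1 TEXT, Id2 INTEGER, Effdate INTEGER);
    INSERT INTO Table_Name VALUES
        ('item1', 2002, 2005, 'AB', 10, 20091201),
        ('item1', 2009, 2013, 'AB', 10, 20100301),
        ('item2', 2001, 2004, 'XX', 20, 20050103),
        ('item2', 2002, 2009, 'XX', 20, 20060710),
        ('item2', 2007, 2013, 'XX', 20, 20090912),
        ('item3', 2005, 2010, 'YY', 30, 20110304);
""")

def items_for_year(year):
    """Return each item's most recent record, if its range covers `year`."""
    return conn.execute("""
        SELECT NAME, Id1, Id2, Effdate
        FROM Table_Name t_1
        WHERE Effdate = (SELECT MAX(t_2.Effdate)
                         FROM Table_Name t_2
                         WHERE t_2.NAME = t_1.NAME
                           AND t_2.Id1 = t_1.Id1
                           AND t_2.Id2 = t_1.Id2)
          AND ? BETWEEN t_1.YearFrom AND t_1.YearTo
        ORDER BY NAME
    """, (year,)).fetchall()

print(items_for_year(2011))
print(items_for_year(2008))
```

This picks the most recent record first and only then checks the year, so item1 drops out for 2008 as described in the question.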
It's not clear whether these two statements are in conflict. I think they are in conflict, and I'm going with statement 1 in the code below.
[1.] What I need is for a given year, select the record with distinct ID1+ID2+Name columns and maximum (most recent) effective date (in YYYYMMDD format, stored as integer), with the above year being in between YearFrom and YearTo range.
[2.] Item 1 will not be returned in this case because the most recent record for item 1 has range of 2009-2013.
I would say that item 1 would not be returned, because it has no information for year 2008. If it did have information for 2008, it should be returned per statement 1 above, regardless of whether there happened to be more recent data.
If you expand your table so each year appears in a row by itself, rather than being implied by a range like 2002-2005, it's pretty simple. The query below is in PostgreSQL; you should only have to replace the first common table expression with a DB2 equivalent to generate a table of numbers (or use an actual table of numbers), and fix up the CTE syntax. (DB2's CTE syntax is unique.)
with years as (
select generate_series(2000, 2020) as year
),
expanded_table1 as (
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
)
select id1, id2, name, year, max(effdate)
from expanded_table1
where year = 2008
group by id1, id2, name, year
Explanation
This query, the first CTE, generates a series of integers that represent all the years we might be interested in. A more robust solution might select the minimum and maximum years for the number generator from your table instead of using integer literals.
select generate_series(2000, 2020) as year;
YEAR
--
2000
2001
2002
...
2020
By joining that table with your table, we can expand the ranges into rows.
with years as (
select generate_series(2000, 2020) as year
)
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
order by id1, id2, name, year;
ID1 ID2 NAME YEAR YEARFROM YEARTO EFFDATE
--
AB 10 item1 2002 2002 2005 20091201
AB 10 item1 2003 2002 2005 20091201
AB 10 item1 2004 2002 2005 20091201
AB 10 item1 2005 2002 2005 20091201
...
Having prepared the foundation this way, the query to find the maximum effective date for each distinct combination of id1, id2, name, for a given year is just a simple GROUP BY with a WHERE clause.
with years as (
select generate_series(2000, 2020) as year
),
expanded_table1 as (
select id1, id2, name, year, yearfrom, yearto, effdate
from Table1
inner join years on years.year between YearFrom and YearTo
)
select id1, id2, name, year, max(effdate)
from expanded_table1
where year = 2011
group by id1, id2, name, year
ID1 ID2 NAME YEAR MAX
--
AB 10 item1 2011 20100301
XX 20 item2 2011 20090912
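If generate_series is not available, a recursive CTE can produce the same list of years. The sketch below does that on SQLite through Python's sqlite3 module; the recursive years CTE and the helper name max_effdate_for_year are assumptions for the demo, and the data is the sample from the question.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (name TEXT, YearFrom INTEGER, YearTo INTEGER,
                         id1 TEXT, id2 INTEGER, effdate INTEGER);
    INSERT INTO Table1 VALUES
        ('item1', 2002, 2005, 'AB', 10, 20091201),
        ('item1', 2009, 2013, 'AB', 10, 20100301),
        ('item2', 2001, 2004, 'XX', 20, 20050103),
        ('item2', 2002, 2009, 'XX', 20, 20060710),
        ('item2', 2007, 2013, 'XX', 20, 20090912),
        ('item3', 2005, 2010, 'YY', 30, 20110304);
""")

def max_effdate_for_year(year):
    """Expand each range into one row per year, then take MAX(effdate)."""
    return conn.execute("""
        WITH RECURSIVE years(year) AS (
            SELECT 2000
            UNION ALL
            SELECT year + 1 FROM years WHERE year < 2020
        ),
        expanded_table1 AS (
            SELECT id1, id2, name, year, effdate
            FROM Table1
            INNER JOIN years ON years.year BETWEEN YearFrom AND YearTo
        )
        SELECT id1, id2, name, year, MAX(effdate)
        FROM expanded_table1
        WHERE year = ?
        GROUP BY id1, id2, name, year
        ORDER BY name
    """, (year,)).fetchall()

print(max_effdate_for_year(2011))
print(max_effdate_for_year(2004))
```

The 2004 call shows the "statement 1" reading: item1 is returned with its 2002-2005 record even though a more recent record exists for a later range.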

SQL selecting where all distinct values exist in another column

I have a table in which the first two columns are Company, Year. Each company will have some years, but not necessarily all of them:
ABC | 2010
ABC | 2011
ABC | 2012
BBC | 2011 //does not have all the years, don't want to select it
I'd like to select a list of companies which have ALL the years (not just some of them), but I'm having trouble writing a select query to do that. I imagine this is really easy but I can't figure it out for some reason.
try
select company
from your_table
group by company
having count(distinct year) = (select count(distinct year) from your_table)
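A quick sketch of this query on the sample rows from the question, run through Python's sqlite3 module as a stand-in for your database. The table name your_table is the placeholder from the answer.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (company TEXT, year INTEGER);
    INSERT INTO your_table VALUES
        ('ABC', 2010), ('ABC', 2011), ('ABC', 2012),
        ('BBC', 2011);
""")

# A company has all the years when its number of distinct years equals
# the number of distinct years in the whole table.
complete = conn.execute("""
    SELECT company
    FROM your_table
    GROUP BY company
    HAVING COUNT(DISTINCT year) = (SELECT COUNT(DISTINCT year) FROM your_table)
    ORDER BY company
""").fetchall()

print(complete)
```

COUNT(DISTINCT year) on both sides keeps the comparison correct even if a company has the same year listed twice.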
Select * FROM Company Where CompanyId In(
select CompanyId From Company
group by CompanyId
having count(*) = (select count(distinct Year) from Company)
)
http://www.sqlfiddle.com/#!3/c2f81/11
Note that if you already know how many years there should be, then obviously you would just use that number instead of selecting the count of distinct years.
SELECT Company
FROM Table
GROUP BY Company
HAVING COUNT(Distinct YEAR) = 3