I'm having some trouble building a query that will group my items into monthly ranges according to whether they exist in a month or not. I'm using PostgreSQL.
For example I have a table with data as this:
Name Period(text)
Ana 2010/09
Ana 2010/10
Ana 2010/11
Ana 2010/12
Ana 2011/01
Ana 2011/02
Peter 2009/05
Peter 2009/06
Peter 2009/07
Peter 2009/08
Peter 2009/12
Peter 2010/01
Peter 2010/02
Peter 2010/03
John 2009/05
John 2009/06
John 2009/09
John 2009/11
John 2009/12
and I want the result query to be this:
Name Start End
Ana 2010/09 2011/02
Peter 2009/05 2009/08
Peter 2009/12 2010/03
John 2009/05 2009/06
John 2009/09 2009/09
John 2009/11 2009/12
Is there any way to achieve this?
This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.
Assuming that a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential number. The difference is constant for months that are in a row, so it can serve as the group key.
select name, min(period), max(period)
from (select t.*,
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;
Here is a SQL Fiddle illustrating this.
Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number().
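To see why the subtraction trick works, here is a minimal sketch of the same arithmetic in plain Python rather than SQL. The function names (month_index, month_ranges) are made up for illustration, and it assumes, like the query above, that a period never repeats for a name.

```python
from itertools import groupby

def month_index(period):
    """Convert 'YYYY/MM' text to a running month count (year * 12 + month)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def month_ranges(rows):
    """Return (name, start, end) for each run of consecutive months.

    rows: iterable of (name, period) pairs, period as 'YYYY/MM' text,
    with no duplicate period per name (the row_number() case).
    """
    result = []
    # Sorting the tuples orders by name, then by period (zero-padded text
    # sorts chronologically), like "partition by name order by period".
    for name, items in groupby(sorted(rows), key=lambda r: r[0]):
        periods = [p for _, p in items]
        # month_index minus the row number stays constant within a run
        # of adjacent months and jumps whenever a month is skipped.
        keyed = [(month_index(p) - i, p) for i, p in enumerate(periods, start=1)]
        for _, run in groupby(keyed, key=lambda kp: kp[0]):
            run = [p for _, p in run]
            result.append((name, run[0], run[-1]))
    return result

sample = [
    ("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
    ("John", "2009/11"), ("John", "2009/12"),
]
for row in month_ranges(sample):
    print(row)
```

For John, the month numbers minus the row numbers come out as 24112, 24112, 24114, 24115, 24115, which is what splits his rows into three ranges.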
I don't know if there is an easier way (there probably is) but I can't think of one right now:
with parts as (
  select name,
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  select name,
         period,
         case
           when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
           else 1
         end as group_flag
  from parts
), grouped as (
  select flagged.*,
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);
The first common table expression (parts) simply changes the period into a date so that it can be used in an arithmetic expression.
The second CTE (flagged) assigns a flag each time the gap (in months) between the current row and the previous is not one.
The third CTE (grouped) then accumulates those flags into a running sum, which gives a unique group number to each run of consecutive rows.
The final select then simply gets the start and end period for each group. I didn't bother to convert the period back to the original format though.
SQLFiddle example that also shows the intermediate result of the flagged CTE: http://sqlfiddle.com/#!15/8c0aa/2
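The flag-then-accumulate idea can be traced by hand with a small Python sketch. This is only an illustration for a single name; the function name flag_and_number and the (year, month) tuple input are assumptions, not part of the query.

```python
def flag_and_number(periods):
    """Mimic the flagged and grouped CTEs for one name.

    periods: list of (year, month) tuples, already sorted.
    Returns a list of ((year, month), group_flag, group_nr).
    """
    out = []
    group_nr = 0   # running sum of the flags, like sum(...) over (...)
    prev = None
    for year, month in periods:
        idx = year * 12 + month
        # The SQL defaults lag() to "one month earlier", so the first row
        # never gets a flag; later rows get 1 when the gap is not one month.
        flag = None if prev is None or idx - prev == 1 else 1
        group_nr += flag or 0   # null flags add nothing, like coalesce(..., 0)
        out.append(((year, month), flag, group_nr))
        prev = idx
    return out

john = [(2009, 5), (2009, 6), (2009, 9), (2009, 11), (2009, 12)]
for row in flag_and_number(john):
    print(row)
```

Grouping by the last value in each tuple and taking the minimum and maximum period is then the same step as the final select.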
Well one of the common ways to do this could be recursive SQL:
with recursive cte1 as (
select
"Name" as name,
("Period"||'/01')::date as period
from Table1
), cte2 as (
select
c.name, c.period as s, c.period as e
from cte1 as c
where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')
union all
select
c.name, c.s as s, t.period
from cte2 as c
inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'
)
select
c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2
I'm not sure about performance of this one, you have to test it.
sql fiddle demo
Related
I have a table that stores salary information (SALARY with fields such as NATIONAL_ID, SALYEAR, SALMONTH, SALAMOUNT, DATE_PAID, etc) for employees.
I need to extract data from that table including the last month an employee was paid a salary.
Unfortunately, DATE_PAID column is null for many cases in that table which forces me to think of using a combination of SALYEAR, SALMONTH to determine the highest value.
SALMONTH stores numbers from 1-12 and SALYEAR stores the year, e.g. 2010, 2015, etc.
Using ROW_NUMBER, we can try:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY NATIONAL_ID
ORDER BY SALYEAR DESC, SALMONTH DESC) rn
FROM yourTable t
)
SELECT *
FROM cte
WHERE rn = 1;
The above approach will target the latest record for each NATIONAL_ID, with "latest" being defined as having the most recent month in the most recent year.
select max(to_date(salyear||to_char(salmonth,'FM00'),'yyyymm'))
should do it. The TO_DATE is really just for completeness if you want a date datatype.
You would need to use COALESCE like this:
SELECT MAX(COALESCE(YEAR(DATE_PAID) * 100 + MONTH(DATE_PAID), SALYEAR * 100 + SALMONTH))
FROM tablename
Compute the latest year-month per employee in a CTE and join it back to the table:
with temp as (
select national_id,
max(to_number(to_char(salyear)||lpad(to_char(salmonth),2,'0'))) as max_salmonthyear
from salary
group by national_id)
select *
from salary s, temp t
where s.national_id = t.national_id
and to_number(to_char(salyear)||lpad(to_char(salmonth),2,'0')) = t.max_salmonthyear
order by 1;
Why not just use aggregation?
select national_id, max(salyear * 100 + salmonth) as salyearmonth
from t
group by national_id;
If you want to convert the yearmonth to a date:
select national_id,
       to_date(max(salyear * 100 + salmonth), 'YYYYMM') as salyearmonth
from t
group by national_id;
This returns the first day of the salary month.
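The year * 100 + month encoding works because the month never exceeds 12, so a later year always produces a larger number regardless of the month. A small Python sketch of the same aggregation (the names year_month_key and latest_paid are hypothetical, and the sample ids are invented):

```python
def year_month_key(year, month):
    """Pack year and month into one sortable number, e.g. (2015, 1) -> 201501."""
    return year * 100 + month

def latest_paid(rows):
    """rows: iterable of (national_id, salyear, salmonth).

    Returns {national_id: (year, month)} for the latest salary month,
    which is what max(salyear * 100 + salmonth) ... group by computes.
    """
    latest = {}
    for national_id, year, month in rows:
        key = year_month_key(year, month)
        if key > latest.get(national_id, 0):
            latest[national_id] = key
    # divmod by 100 splits the packed number back into (year, month)
    return {nid: divmod(key, 100) for nid, key in latest.items()}

print(latest_paid([("A", 2014, 12), ("A", 2015, 1), ("B", 2010, 7)]))
```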
I have a table with a column for customer names, a column for purchase amount, and a column for the date of the purchase. Is there an easy way I can find how much first time customers spent on each day?
So I have
Name | Purchase Amount | Date
Joe 10 9/1/2014
Tom 27 9/1/2014
Dave 36 9/1/2014
Tom 7 9/2/2014
Diane 10 9/3/2014
Larry 12 9/3/2014
Dave 14 9/5/2014
Jerry 16 9/6/2014
And I would like something like
Date | Total first Time Purchase
9/1/2014 73
9/3/2014 22
9/6/2014 16
Can anyone help me out with this?
The following is standard SQL and works on nearly all DBMS
select date,
sum(purchaseamount) as total_first_time_purchase
from (
select date,
purchaseamount,
row_number() over (partition by name order by date) as rn
from the_table
) t
where rn = 1
group by date;
The derived table (the inner select) selects all "first time" purchases and the outer query then aggregates them by date.
The two key concepts here are aggregates and sub-queries, and the details of which dbms you're using may change the exact implementation, but the basic concept is the same.
1. For each name, determine their first date
2. Using the results of 1, find each person's first day purchase amount
3. Using the results of 2, sum the amounts for each date
In SQL Server, it could look like this:
select Date, [totalFirstTimePurchases] = sum(PurchaseAmount)
from (
select t.Date, t.PurchaseAmount, t.Name
from table1 t
join (
select Name, [firstDate] = min(Date)
from table1
group by Name
) f on t.Name=f.Name and t.Date=f.firstDate
) ftp
group by Date
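The three steps listed in this answer can also be followed outside the database. Below is a rough Python sketch using the sample data from the question; the function name first_time_totals is made up. Like the join on min(Date), it counts every purchase a customer makes on their first day, which differs from the rn = 1 approach only when someone buys more than once on that day.

```python
from collections import defaultdict
from datetime import date

def first_time_totals(purchases):
    """purchases: iterable of (name, amount, purchase_date).

    Returns {purchase_date: total spent by customers on their first day}.
    """
    purchases = list(purchases)
    # Step 1: earliest purchase date per name
    first_date = {}
    for name, _, day in purchases:
        if name not in first_date or day < first_date[name]:
            first_date[name] = day
    # Steps 2 and 3: keep only first-day purchases and sum them per date
    totals = defaultdict(int)
    for name, amount, day in purchases:
        if day == first_date[name]:
            totals[day] += amount
    return dict(totals)

sample = [
    ("Joe", 10, date(2014, 9, 1)), ("Tom", 27, date(2014, 9, 1)),
    ("Dave", 36, date(2014, 9, 1)), ("Tom", 7, date(2014, 9, 2)),
    ("Diane", 10, date(2014, 9, 3)), ("Larry", 12, date(2014, 9, 3)),
    ("Dave", 14, date(2014, 9, 5)), ("Jerry", 16, date(2014, 9, 6)),
]
for day, total in sorted(first_time_totals(sample).items()):
    print(day, total)
```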
If you are using SQL Server you can accomplish this with either sub-queries or CTEs (Common Table Expressions). Since there is already an answer with sub-queries, here is the CTE version.
First the following will identify each row where there is a first time purchase and then get the sum of those values grouped by date:
;WITH cte
AS (
SELECT [Name]
,PurchaseAmount
,[date]
,ROW_NUMBER() OVER (
PARTITION BY [Name] ORDER BY [date] --start at 1 for each name at the earliest date and count up, reset every time the name changes
) AS rn
FROM yourTableName
)
SELECT [date]
,sum(PurchaseAmount) AS TotalFirstTimePurchases
FROM cte
WHERE rn = 1
GROUP BY [date]
I have a list of dates and I want to find out which one occurs the earliest in the year. I used a dense rank function on just the month and day, but I can't get it to return all the rows with rank 1 (there may be multiple earliest dates, not just one).
SELECT
S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER (ORDER BY to_char(S.SG_START, 'MMDD')) AS RN
FROM
SUMMERGAMES S,
COUNTRY C
WHERE
S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE
RN = 1
ORDER BY RN;
Just spits out 00933. 00000 - "SQL command not properly ended"
Can anyone help? I don't know what I'm doing wrong.
Put it into an inline view:
select SG_HOSTCITY, COUNTRY_OLYMPIC_CODE
from (SELECT S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER(ORDER BY to_char(S.SG_START, 'MMDD')) AS RN
FROM SUMMERGAMES S
join COUNTRY C
on S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE)
WHERE RN = 1
You can't use the WHERE clause to filter on the output of an analytic function within the same query; you have to put it into a subquery. The above is the same as your current query but is free of syntax errors.
However I don't know if it will actually give you the output you're expecting. I might also try:
select *
from (SELECT S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER( partition by TRUNC(S.SG_START, 'YYYY')
order BY TRUNC(S.SG_START) ) AS RN
FROM SUMMERGAMES S
join COUNTRY C
on S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE)
WHERE RN = 1
This will give you combinations of SG_HOSTCITY and COUNTRY_OLYMPIC_CODE falling on the first SG_START date associated with each year. If the first of the year 2002 is 1/5, for instance, and there are 5 such SG_HOSTCITY and COUNTRY_OLYMPIC_CODE values falling on that date for year 2002, this will show all 5 for that year, because it will bring back ties.
The difference is that the rank ascends and then restarts at the change in each year, not throughout all years (notice the partition).
I'm thinking the second query above is what you really want.
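For what it's worth, "DENSE_RANK() = 1 within each year" just means "every row whose start date equals the earliest start date of its year". A quick Python sketch of that rule, with an invented function name (earliest_per_year) and invented sample rows:

```python
from datetime import date

def earliest_per_year(games):
    """games: iterable of (host_city, start_date).

    Returns the rows whose start date is the earliest one in its
    calendar year, keeping ties, like rank 1 per yearly partition.
    """
    games = list(games)
    earliest = {}
    for _, start in games:
        if start.year not in earliest or start < earliest[start.year]:
            earliest[start.year] = start
    return [(city, start) for city, start in games if start == earliest[start.year]]

# Hypothetical rows, only to show that ties are kept
rows = [
    ("City A", date(2002, 1, 5)),
    ("City B", date(2002, 1, 5)),
    ("City C", date(2002, 3, 1)),
    ("City D", date(2003, 2, 1)),
]
for row in earliest_per_year(rows):
    print(row)
```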
The topic might be a little bit unclear but I couldn't describe in a single sentence what I want to achieve.
Say I have a table that is (columns)
id INT PK
name VARCHAR
date DATE
I have a grouping select
select
name,
max(date)
from table
group by name
that gives me a name and the latest date.
What is the easiest way to join the id column to the current aggregated result set with the id value where the date was the maximum?
Let me explain what my goal is with an example:
The table is filled with the data as follows
id name date
1 david 2012-12-12
2 david 2013-12-02
3 patrick 2014-01-02
4 patrick 2012-11-11
and by my query I'd like to get the following result
id name date
2 david 2013-12-02
3 patrick 2014-01-02
Notice that all the records for name = 'david' are aggregated and the maximum date is selected. How to get the row id for this maximum date?
One option is to use ROW_NUMBER():
SELECT id, name, date
FROM (
SELECT id, name, date,
row_number() over (partition by name order by date desc) rn
FROM yourtable
) t
WHERE rn = 1
SQL Fiddle Demo
Another option is to join the table back to itself using the MAX() aggregate. This option can return more than one row per name if several rows for that name share the same maximum date:
SELECT t.id, t.name, t.date
FROM yourtable t
JOIN (SELECT name, max(date) maxdate
FROM yourtable
GROUP BY name) t2 on t.name = t2.name AND t.date = t2.maxdate
More Fiddle
I have a table with a series of IDs. Each ID has dates ranging up to year 2025 from current year. Each year for each ID has a specific price.
http://i.imgur.com/srplSDo.jpg
Once I get to a certain point with each ID, it no longer has a specific price. So what I want to do is take the previous year's price and increase it by 2.5 percent. I have figured out a way to grab the previous year's price with this
SELECT a.*,
(CASE
WHEN a.YEARLY_PRICING is not null
THEN a.YEARLY_PRICING
ELSE (SELECT b.YEARLY_PRICING
FROM #STEP3 b
WHERE (a.id = b.id) AND (b.YEAR = a.YEAR-1))*1.025
END) AS TEST
FROM #STEP3 a
which would provide these results:
http://imgur.com/MJutM99
but the problem I am having is after the first null year, it is still recognizing the previous yearly_pricing as null, which gives me the null results, so obviously this method won't work for me. Any other suggestions for improvement?
Thanks
WITH CTE AS
(
SELECT ID, Year, Price, Price AS Prev
FROM T A
WHERE Year = (SELECT min(year) FROM T WHERE T.ID = A.ID GROUP BY T.ID)
UNION ALL
SELECT T.ID, T.Year, T.Price, ISNULL(T.Price, 1.025*Prev)
FROM T JOIN CTE ON T.ID = CTE.ID
AND T.Year - 1 = CTE.YEAR
)
SELECT * FROM CTE
ORDER BY ID, Year
SQL Fiddle Demo
What you want is a way to find not just the previous year (year - 1), but instead the year that is previous and also has a not-null price. To query for such a year (without solving your problem), you would do something like this:
select a.*
, (select max(year)
from step3 b
where a.id=b.id and a.year>b.year and b.yearly_pricing is not null
) PRIOR_YEAR
from step3 a
Since SQL Server allows common table expressions, you can call the above query "TMP", and then approach it this way. The CALC_PRICE in any year will be the price from the "PRIOR_YEAR" found as per the above query, multiplied by a factor. That factor will be 1.025 to the POWER of the number of years from "PRIOR_YEAR" to the current year.
You would end up with SQL like this:
with TMP AS (
select a.*
, (select max(year)
from step3 b
where a.id=b.id and a.year>b.year and b.yearly_pricing is not null
) PRIOR_YEAR
from step3 a
)
select t.*,
c.yearly_pricing As prior_price,
c.yearly_pricing * POWER(1.025 , (t.year-t.prior_year)) calc_price
from tmp t
left join step3 c
on t.id=c.id and t.prior_year = c.year
It still has nulls, etc. but those are easily handled with COALESCE() or CASE expressions like you had in your question.
Here's an SQL Fiddle which shows how it works: http://sqlfiddle.com/#!3/296a4/21
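The compounding rule is easy to check by hand. Here is a small Python sketch of it for a single id; the function name fill_prices and the sample prices are made up, and unlike the raw query it already keeps the real price when one exists (the COALESCE step mentioned above).

```python
def fill_prices(prices, rate=0.025):
    """prices: {year: price or None} for one id.

    Missing years are filled from the most recent earlier year that has
    a price, multiplied by (1 + rate) ** (number of years in between).
    """
    filled = {}
    prior_year = None
    for year in sorted(prices):
        price = prices[year]
        if price is not None:
            prior_year = year          # this year can seed later gaps
            filled[year] = price
        elif prior_year is not None:
            gap = year - prior_year    # 1 for the first missing year, 2 for the next, ...
            filled[year] = prices[prior_year] * (1 + rate) ** gap
        else:
            filled[year] = None        # nothing earlier to extrapolate from
    return filled

print(fill_prices({2015: 100.0, 2016: None, 2017: None}))
```

With a starting price of 100, the first missing year comes out at about 102.5 and the second at about 105.06, i.e. 100 * 1.025 squared rather than 2.5 percent of a null.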