Determine monthly values of timestamped records - sql

I have a SQL table with the following schema:
fruit_id INT
price FLOAT
date DATETIME
This table contains many records where the price of a given fruit is recorded at a given time. There may be multiple records in a single day, there may be
I would like to be able to fetch a list of prices for a single fruit over the last 12 months inclusive of the current month. So given a fruit_id of 2 and datetime of now(), what would the price values be for December, January, February, ... October, November?
Given the above requirements, what strategy would you use to get this data? Pure sql, fetch all prices and process in code?
Thanks for your time.

Are you talking about min price, max price, average price, or something else?
Here's a quick query to get you started, which includes min, max, and average price for each month for fruit_id 2:
select left(date,7) as the_month, min(price),max(price),avg(price)
from fruit_price
where fruit_id = 2
and date >= concat(left(date_sub(curdate(), interval 11 month),7),'-01')
group by the_month;
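For anyone who wants to try the same idea outside MySQL, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the table is called fruit_price (as in the query above) and that dates are stored as ISO-formatted text; the helper name monthly_price_stats and the sample rows are made up for illustration. The reference date is passed in rather than read from the clock so the result is repeatable.

```python
import sqlite3

def monthly_price_stats(conn, fruit_id, today):
    """Return (month, min, max, avg) rows for the 12 months ending in today's month."""
    query = """
        SELECT strftime('%Y-%m', date) AS the_month,
               MIN(price), MAX(price), AVG(price)
        FROM fruit_price
        WHERE fruit_id = ?
          AND date >= date(?, 'start of month', '-11 months')
        GROUP BY the_month
        ORDER BY the_month
    """
    return conn.execute(query, (fruit_id, today)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_price (fruit_id INT, price FLOAT, date DATETIME)")
conn.executemany(
    "INSERT INTO fruit_price VALUES (?, ?, ?)",
    [
        (2, 1.0, "2012-12-01 09:00:00"),
        (2, 3.0, "2012-12-20 17:30:00"),
        (2, 2.0, "2013-11-05 12:00:00"),
        (2, 9.0, "2012-11-30 23:59:59"),  # one month too old for the window
        (1, 5.0, "2013-11-05 12:00:00"),  # different fruit
    ],
)
rows = monthly_price_stats(conn, 2, "2013-11-15")
```

Like the original query, this only returns months that actually have records; months without any price are simply missing from the result.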

If I understand it correctly from -
I would like to be able to fetch a list of prices for a single fruit over the last 12 months inclusive of the current month. So given a fruit_id of 2 and datetime of now(), what would the price values be for December, January, February, ... October, November?
You want the total price for every month of a single year, based on the date and fruit_id you pass in.
Note that this won't return every month of the year, only the months that have a price recorded in that year. If you want all months, you would need to create a dimdate table containing all the dates and then join against it.
declare @passeddate datetime = getdate() -- date to be calculated
declare @fruit_id int = 2 -- fruit id to be calculated
select
fruit_id as FruitId,
sum(price) as MonthPrice,
month(date) as FruitMonth
from SQL_Table
-- WHERE must come before GROUP BY
where fruit_id = @fruit_id and
year(date) = year(@passeddate)
group by fruit_id, month(date)

select distinct month(date) as "Month", price as "Unique Price" from fruit_price where fruit_id = 2;
I'd try to state as much as possible in SQL that does not require unindexed access to data because it's usually fast(er) than processing it with the application.

Related

Calculate the monthly average including the date where data is missing

I want to calculate the monthly average of some data using a SQL query; the data resides in a Redshift database.
The data is present in the following format in the table.
s_date | sales
------------+-------
2020-08-04 | 10
2020-08-05 | 20
---- | --
---- | --
The data may not be present for all the dates in a month. If the data is not present for a day, it should be considered as 0.
The following query uses the AVG() function with "group by" month, which gives the average based only on the dates where data is available.
select trunc(date_trunc('MONTH', s_date)::timestamp) as month, avg(sales) from sales group by month;
However it does not consider the data for missing dates as 0. What should be the right query to calculate the monthly average as expected?
One more expectation is that, for the current month, the average should be calculated based on the data till today. So it should not consider entire month (like 30 or 31 days).
Regards,
Paul
Using a calendar table might be the easiest way to go here:
WITH dates AS (
SELECT date_trunc('day', t)::date AS dt
FROM generate_series('2020-01-01'::timestamp, '2020-12-31'::timestamp, '1 day'::interval) t
),
cte AS (
SELECT t.dt, COALESCE(SUM(s.sales), 0) AS sales
FROM dates t
LEFT JOIN sales s ON t.dt = s.s_date
GROUP BY t.dt
)
SELECT
LEFT(dt::text, 7) AS ym,
AVG(sales) AS avg_sales
FROM cte
GROUP BY
LEFT(dt::text, 7);
The logic here is to first generate an intermediate table in the second CTE which has one record for each date in the calendar range, along with the total sales for that date (0 when there are none). Then, we aggregate by year/month, and report the average sales.
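The same calendar approach can be tried locally with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: SQLite has no generate_series by default, so a recursive CTE builds the list of dates; s_date is stored as ISO text; and the helper name monthly_avg_with_gaps is invented here. Ending the calendar at "today" is one way to make the current month average over the elapsed days only.

```python
import sqlite3

def monthly_avg_with_gaps(conn, first_day, today):
    """Monthly average of sales, counting days without a row as 0, up to `today`."""
    query = """
        WITH RECURSIVE dates(dt) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(dt, '+1 day') FROM dates WHERE dt < date(?)
        ),
        daily AS (
            SELECT d.dt, COALESCE(SUM(s.sales), 0) AS sales
            FROM dates d
            LEFT JOIN sales s ON s.s_date = d.dt
            GROUP BY d.dt
        )
        SELECT substr(dt, 1, 7) AS ym, AVG(sales) AS avg_sales
        FROM daily
        GROUP BY ym
        ORDER BY ym
    """
    return conn.execute(query, (first_day, today)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (s_date TEXT, sales INT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-08-04", 10), ("2020-08-05", 20)],
)
# July has 31 days with no sales; August is averaged over its first 10 days only.
result = monthly_avg_with_gaps(conn, "2020-07-01", "2020-08-10")
```

Here August comes out as 30 / 10 days, not 30 / 2 rows and not 30 / 31 days.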

SQL Query to recursively track month of purchase

I have a table with customer id and month of purchase. For each customer, I first need to segment them on their first month of purchase, i.e., if a customer did their first purchase on 10 June 2017, then they belong to bucket June 2017. See below sample data table.
Then for each subsequent purchase of that customer (say from June 2017 segment), we need to track the month. For instance, if the June 2017 customer did their second purchase on 25 June 2017 and 3rd purchase on 11 Aug 2017. Then second purchase will be counted in 1st Month (within 30 days of 1st transaction) and 3rd purchase will be counted in 3rd month, as difference between 11 Aug 2017 and 10 June 2017 is 62 days, which lies between 61 and 90 days, hence in the 3rd month.
See below sample output table, although I need it in percentage form (% of customer who did in first month, second month, etc.). In the table, we are showing all the customers who did their first transaction say in Jan 2017 and then how many of them did transactions in subsequent months.
This tracking needs to be done for each customer. While I believe I am comfortable with the first part, wherein I need to segment each customer, I can do that based on first or partition.
I am not sure about how to do this recursively for subsequent transactions.
Thanks in advance for help!
You simply use window functions to define the original month and then conditional aggregation.
You don't mention the database, but this is the idea:
select to_char(first_purchase_date, 'YYYY-MM') as yyyymm,
sum(case when months_between(first_purchase_date, purchase_date) = 1 then 1 else 0 end) as purchases_1,
sum(case when months_between(first_purchase_date, purchase_date) = 2 then 1 else 0 end) as purchases_2,
. . .
from (select t.*,
min(purchase_date) over (partition by customer_id) as first_purchase_date
from t
) t
group by to_char(first_purchase_date, 'YYYY-MM');
I invented the months_between() and to_char() functions, but you should get the idea.
The above tracks purchases. To get customers, you can use:
(count(distinct case when months_between(first_purchase_date, purchase_date) = 1 then customer_id end) /
count(distinct customer_id)
) as month_1_ratio
You can use the lag function to create a "previous purchase" column.
Lag(purchasemonth,1) over(partition by customerid order by purchasemonth) as [PreviousPurchaseDate]
Then simply do a datediff and bucket as you wish.
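To make the bucketing concrete, here is a rough sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table name purchases, its columns, and the helper cohort_counts are assumptions, since the question does not show the schema. Following the 30-day rule described in the question, a repeat purchase 1 to 30 days after the first one lands in month 1, 31 to 60 days in month 2, 61 to 90 days in month 3, and so on; the first purchase itself gets bucket 0 and is left out.

```python
import sqlite3

def cohort_counts(conn):
    """Distinct repeat customers per (first-purchase month, 30-day bucket)."""
    query = """
        SELECT strftime('%Y-%m', first_purchase_date) AS cohort,
               month_no,
               COUNT(DISTINCT customer_id) AS customers
        FROM (
            SELECT customer_id,
                   first_purchase_date,
                   -- integer division: 1..30 days -> 1, 31..60 -> 2, 61..90 -> 3
                   (CAST(julianday(purchase_date) - julianday(first_purchase_date)
                         AS INTEGER) + 29) / 30 AS month_no
            FROM (
                SELECT customer_id, purchase_date,
                       MIN(purchase_date) OVER (PARTITION BY customer_id)
                           AS first_purchase_date
                FROM purchases
            )
        )
        WHERE month_no > 0
        GROUP BY cohort, month_no
        ORDER BY cohort, month_no
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INT, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "2017-06-10"), (1, "2017-06-25"), (1, "2017-08-11"),
        (2, "2017-06-03"), (2, "2017-06-20"),
        (3, "2017-07-01"),  # never came back
    ],
)
counts = cohort_counts(conn)
```

Dividing each count by the number of distinct customers in the cohort would give the percentage form the question asks for.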

Ms ACCESS: calculating past annual averages over varying date ranges

In a form on Ms ACCESS, a user can select a commodity (such as copper, nickel, etc.) from a list and a commodity price date from a list. A trailing 12 month average commodity price should then be calculated.
For example: the user selects Copper as commodity and February 1st 2010, 02/01/2010. I then want the average price to be calculated over the time period: [02/01/2009 - 02/01/2010].
I'm not sure how to write this in query form. This is the current incomplete code;
SELECT Avg(CommPrices.Price) AS Expr1,
FROM CommPrices
WHERE (((CommPrices.Commodity)=[Forms]![Tool Should Cost]![List243]))
AND CommPrices.DateComm = [Forms]![Tool Should Cost]![List55];
List243 is the list of commodities the user can select from, list55 is the list of dates the user can select. All data is obtained from the table CommPrices.
Note: the earliest date in the column DateComm is 01/01/2008. So if the user selects a date such as 02/01/2008, calculating the average over the 12 months before 02/01/2008 won't be possible. I do want the code to still calculate the average using the dates available. (In the example it would just be the average over the past month.)
Second Note: the column DateComm only has monthly dates for the first day of every month (e.g 01/01/2008, 02/01/2008, 03/01/2008). The dates listed in list55 can refer to different days in the month (e.g 03/16/2009), in that case I want the code to still calculate the past 12 month average using the closest commodity dates possible. So if the user selects date 03/16/2009, I want the code to calculate the 12 month average for 03/01/2008 - 03/01/2009.
For "integer" months it would be:
SELECT
Avg(CommPrices.Price) AS AveragePrice
FROM
CommPrices
WHERE
CommPrices.Commodity=[Forms]![Tool Should Cost]![List243]
AND
CommPrices.DateComm BETWEEN
DateSerial(Year([Forms]![Tool Should Cost]![List55]) - 1, Month([Forms]![Tool Should Cost]![List55]), 1)
AND
DateSerial(Year([Forms]![Tool Should Cost]![List55]), Month([Forms]![Tool Should Cost]![List55]), 1)
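The date arithmetic that the two DateSerial calls perform can be checked outside Access. Below is a small Python sketch of it; the function names trailing_window and trailing_average and the sample prices are invented for illustration. It snaps the selected date to the first of its month, goes back one year, and averages whatever rows exist in that range, so early dates with less history still produce a value. Like BETWEEN, both ends are included, which means a full window covers 13 monthly rows.

```python
from datetime import date

def trailing_window(selected):
    """First-of-month window reaching one year back from the selected date's month."""
    end = selected.replace(day=1)
    start = end.replace(year=end.year - 1)
    return start, end

def trailing_average(prices, selected):
    """Average the prices whose date falls inside the window (both ends included)."""
    start, end = trailing_window(selected)
    in_window = [price for day, price in prices if start <= day <= end]
    return sum(in_window) / len(in_window) if in_window else None

# Illustrative monthly data starting 01/01/2008, as described in the question.
prices = [(date(2008, m, 1), float(m)) for m in range(1, 13)]
prices.append((date(2009, 1, 1), 13.0))

window = trailing_window(date(2009, 3, 16))       # 03/01/2008 to 03/01/2009
short_history = trailing_average(prices, date(2008, 2, 1))
```

If exactly 12 rows are wanted, one end of the range would have to be made exclusive.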

Earliest and Lastdate for each year in sql

I have a table with 3 columns. I have multiple records for a year. Some of my records are as follows:
ID stardate enddate
1 1/1/2010 5/3/2010
2 2/4/2010 NULL -**EDIT**
3 1/2/2011 5/6/2011
4 3/4/2011 NULL -**EDIT**
I want to get a result for the earliest date in that year and the last date in that year. So output could be like
**EDITED:** 1/1/2010 12/31/2010 - For Year 2010
**EDITED:** 1/2/2011 12/31/2011 - For Year 2011
How can I get that in a query? If you need more info, please ask. Thanks
EDIT: If for the year if one of the columns read NULL then I have to consider the last day of the year as the enddate. i.e.12/31/YYYY. And I need to do that for each year again.
Assuming you use DATE (or related) columns in a MySQL table, something like this should serve your request:
SELECT MIN(startdate),
MAX(enddate),
YEAR(startdate)
FROM my_table
GROUP BY YEAR(startdate);
This groups all entries by year (of the startdate) and shows you the minimum and maximum entries for each year as you want. See also the documentation for the DATE function in MySQL.
There are similar date functions and possibilities if you are using another database system. Usually you can easily find them by googling the database system and something like "date functions".
select MIN(stardate),max(enddate)
from [Tablename]
where YEAR(enddate)=2013
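Neither query above covers the NULL rule from the question's edit. One possible way to handle it is to substitute 12/31 of the start year for a missing end date before taking the maximum. The sketch below uses Python's sqlite3 module; it assumes dates are stored as ISO text, that the sample dates in the question are month/day/year, and that the table is called my_table as in the first answer. The helper name year_ranges is made up.

```python
import sqlite3

def year_ranges(conn):
    """Earliest start and latest end per year; a NULL end counts as Dec 31 of that year."""
    query = """
        SELECT strftime('%Y', startdate) AS yr,
               MIN(startdate) AS earliest,
               MAX(COALESCE(enddate, strftime('%Y', startdate) || '-12-31')) AS latest
        FROM my_table
        GROUP BY yr
        ORDER BY yr
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", "2010-05-03"),
        (2, "2010-02-04", None),
        (3, "2011-01-02", "2011-05-06"),
        (4, "2011-03-04", None),
    ],
)
ranges = year_ranges(conn)
```

In MySQL the same idea would use COALESCE with a date built from YEAR(startdate); the exact functions differ per database.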

Returning all rows falling within a time period

I have a doubt in writing sql.
I had a farmerfields table with
YEAR,SEASON,Number of Fields.
and season look like this
Kharif---- 15june-15Oct
Rabi---15 oct to 15 Feb
Summer----15Feb to 15 June
now i want to write sql which returns all the rows excluding the current season in the current year. ie we should get the current season based on system date.
I am cracking my brain to get this, but could not.
Please help me.
I would suggest that you define a seasons table with three rows as above, e.g.
create table season (
season_id int,
description varchar(32),
start_day_of_month int,
start_month int,
end_day_of_month int,
end_month int
)
The year is not included here, just the day of month and month indices.
Your farmerfields table should then have a season_id column referring to this and most likely have a year column too.
Depending on your SQL vendor different date functions will be available, but you should be able to compose a start and end date using the year from farmerfields and the month and day-of-month from season. Given this you can then determine if the current date falls within a given farmerfield entry's start and end dates.
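The "does today fall inside this season" check can be sketched in a few lines of Python before translating it into vendor-specific SQL. Everything here is illustrative: the names SEASONS, current_season and exclude_current_season are made up, rows are assumed to be (year, season, number_of_fields) tuples, and since 15 Oct, 15 Feb and 15 June each appear in two seasons, the start day is treated as inclusive and the end day as exclusive. Rabi wraps around the new year, so it needs the "or" branch.

```python
from datetime import date

# (name, (start_month, start_day), (end_month, end_day))
# Assumption: start is inclusive, end is exclusive.
SEASONS = [
    ("Kharif", (6, 15), (10, 15)),
    ("Rabi", (10, 15), (2, 15)),
    ("Summer", (2, 15), (6, 15)),
]

def current_season(today):
    """Name of the season that contains the given date."""
    md = (today.month, today.day)
    for name, start, end in SEASONS:
        if start < end:
            inside = start <= md < end
        else:  # season wraps around the new year (Rabi)
            inside = md >= start or md < end
        if inside:
            return name

def exclude_current_season(rows, today):
    """Drop rows for the current season of the current year; keep everything else."""
    season = current_season(today)
    return [r for r in rows if not (r[0] == today.year and r[1] == season)]

rows = [(2012, "Kharif", 40), (2013, "Kharif", 55), (2013, "Rabi", 30), (2013, "Summer", 25)]
remaining = exclude_current_season(rows, date(2013, 7, 1))
```

Which calendar year a Rabi row belongs to (it spans two) is not stated in the question, so the year comparison above is only a placeholder for whatever convention the table uses.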
Your table structure is wrong and not fit for what you need.
Instead of single field called SEASON, have two fields: SEASON_START and SEASON_END both of type Date then the query is as simple as:
Select * From [farmerfields] Where GetDate() Between SEASON_START And SEASON_END
If the names are part of your current SEASON field, add third field SEASON_NAME as well and the new structure will be:
SEASON_NAME | SEASON_START | SEASON_END
---------------------------------------
Kharif | 15june | 15Oct
Rabi | 15 oct | 15 Feb
...
Edit: in my above sample code I assumed you have SQL Server database - in case of different database you'll have different function to get current system date.
For difference of dates you can use the sql function DATEDIFF
SELECT * FROM table WHERE `date` BETWEEN '2011-10-20' AND '2011-1-1' AND DATEDIFF(`date`, '2011-10-20') % 10 = 0