MySQL AVG function for recent 15 records by date (order date desc) in every symbol - sql

I am trying to write a SQL statement, for a table that holds stock symbols and their prices on a given date, that returns the average price over the last 5 days and over the last 15 days for each symbol.
Table columns:
symbol
open
high
close
date
The averages should be calculated over the last 5 days and the last 15 days. I tried this for a single symbol:
SELECT avg(close),
avg(`trd_qty`)
FROM (SELECT *
FROM cashmarket
WHERE symbol = 'hdil'
ORDER BY `M_day` desc
LIMIT 0,15 ) s
but I couldn't get the desired list for showing avg values for all symbols.

You can either do it with row numbers as suggested by astander, or you can do it with dates.
This solution will also take the last 15 days if you don't have rows for every day while the row number solution takes the last 15 rows. You have to decide which one works better for you.
EDIT: Replaced AVG, use CASE to avoid division by 0 in case no records are found within the period.
SELECT
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.close * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS close_5,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.trd_qty * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS trd_qty_5,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.close * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS close_15,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.trd_qty * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS trd_qty_15
FROM
(
SELECT
cashmarket.*,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 15, 1, 0) AS is_15,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 5, 1, 0) AS is_5
FROM cashmarket
) c
The query returns the averages of close and trd_qty for the last 5 and the last 15 days. Current date is included, so it's actually today plus the last 4 days (replace < by <= to get current day plus 5 days).
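A minimal sketch of the flag-based averaging above, run against SQLite from Python rather than MySQL. The sample rows, the fixed reference date, and the added GROUP BY symbol (to get one row per symbol, as the question asks) are assumptions for illustration; julianday() stands in for TO_DAYS().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cashmarket (symbol TEXT, close REAL, trd_qty INTEGER, m_day TEXT)"
)
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "INSERT INTO cashmarket VALUES (?, ?, ?, ?)",
    [
        ("hdil", 10.0, 100, "2024-01-10"),
        ("hdil", 20.0, 200, "2024-01-08"),
        ("hdil", 60.0, 300, "2024-01-01"),
        ("abc", 5.0, 50, "2023-12-20"),
    ],
)

query = """
SELECT symbol,
       CASE WHEN SUM(is_5) > 0 THEN SUM(close * is_5) / SUM(is_5) ELSE 0 END AS close_5,
       CASE WHEN SUM(is_15) > 0 THEN SUM(close * is_15) / SUM(is_15) ELSE 0 END AS close_15
FROM (
    SELECT symbol, close,
           julianday(:today) - julianday(m_day) < 5 AS is_5,   -- 1 or 0
           julianday(:today) - julianday(m_day) < 15 AS is_15  -- 1 or 0
    FROM cashmarket
) c
GROUP BY symbol
ORDER BY symbol
"""
# A fixed date replaces NOW() so the result is reproducible.
result = conn.execute(query, {"today": "2024-01-10"}).fetchall()
print(result)
```

The symbol with no rows inside either period falls through to the ELSE branch and gets 0 instead of a division by zero.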

Use:
SELECT DISTINCT
t.symbol,
x.avg_5_close,
y.avg_15_close
FROM CASHMARKET t
LEFT JOIN (SELECT cm_5.symbol,
AVG(cm_5.close) 'avg_5_close',
AVG(cm_5.trd_qty) 'avg_5_qty'
FROM CASHMARKET cm_5
WHERE cm_5.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
GROUP BY cm_5.symbol) x ON x.symbol = t.symbol
LEFT JOIN (SELECT cm_15.symbol,
AVG(cm_15.close) 'avg_15_close',
AVG(cm_15.trd_qty) 'avg_15_qty'
FROM CASHMARKET cm_15
WHERE cm_15.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 15 DAY) AND NOW()
GROUP BY cm_15.symbol) y ON y.symbol = t.symbol
I'm unclear on what trd_qty is, or how it factors into your equation considering it isn't in your list of columns.
If you want to be able to specify a date rather than the current time, replace NOW() with @your_date, an applicable variable. You can also change the interval values to suit, in case they should really be 7 and 21.

Have a look at How to number rows in MySQL
You can create the row number per item for the date desc.
What you can do is to retrieve the Rows where the rownumber is between 1 and 15 and then apply the group by avg for the selected data you wish.
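The row-number idea could look roughly like the sketch below. It uses ROW_NUMBER(), which is available in MySQL 8.0 and later; the demo runs on SQLite (3.25 or later) from Python only so that it is self-contained. The sample rows and the small N are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cashmarket (symbol TEXT, close REAL, m_day TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO cashmarket VALUES (?, ?, ?)",
    [
        ("hdil", 10.0, "2024-01-01"),
        ("hdil", 20.0, "2024-01-02"),
        ("hdil", 30.0, "2024-01-03"),
        ("abc", 1.0, "2024-01-02"),
        ("abc", 3.0, "2024-01-05"),
    ],
)

N = 2  # the question would use 5 and 15 here

query = """
SELECT symbol, AVG(close) AS avg_close
FROM (
    SELECT symbol, close,
           ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY m_day DESC) AS rn
    FROM cashmarket
) ranked
WHERE rn <= ?          -- keep only the N most recent rows per symbol
GROUP BY symbol
ORDER BY symbol
"""
latest_avg = conn.execute(query, (N,)).fetchall()
print(latest_avg)
```

Because the filter counts rows rather than calendar days, gaps for weekends and holidays do not shrink the sample.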

trd_qty is the quantity traded on a particular day.
The days are not consecutive because the market operates only on weekdays and there are holidays too, so the dates may not be continuous.

Related

SQL Report - query multiple tables

I have been working on a Stats page in APEX and currently have the following report query:
select to_char(DATELOGGED,'Month - YYYY') as Month,
COUNT(*) as "Total Calls",
SUM(case when CLOSED is null then 1 else null end) as "Open",
COUNT(case CLOSED when 'Y' then 1 else null end) as "Closed",
SUM(case when EXTREF is null then 0 else 1 end) as "Referred",
round((COUNT(case SLA_MET when 'Y' then 1 else null end)/COUNT(case CLOSED when 'Y' then 1 else null end)*100),2) as "SLA Met %"
from IT_SUPPORT_CALLS
GROUP BY to_char(DATELOGGED,'Month - YYYY')
order by MIN (DATELOGGED) desc
I wish to add the sum of DURATION from a different table:
select
"START_TIME",
DECODE(DURATION,null,'Open',((select extract( minute from DURATION )
+ extract( hour from DURATION ) * 60
+ extract( day from DURATION ) * 60 * 24
from dual)||' minutes')) DURATION
from "IT_DOWNTIME"
The IT_DOWNTIME table uses START_TIME (varchar2) as the date identifier, the IT_SUPPORT_CALLS uses DATELOGGED (DATE) as date identifier.
The current output for IT_DOWNTIME is for example:
08-FEB-2019 - 30 Minutes
20-FEB-2019 - 15 Minutes
I would like the report SUM and group IT_DOWNTIME and add this into the existing report.
Hope this makes sense.
Please let me know if I missed any information that would help to resolve this.
Many thanks
Thanks for that, much appreciated. Unfortunately it doesn't return any data from IT_DOWNTIME.
I'm guessing the different date formats don't help; hope this clears things up a bit:
These are the columns in IT_DOWNTIME that are of interest:
START_TIME ( VARCHAR2(30) )
DURATION ( INTERVAL DAY(2) TO SECOND(6) )
Example of current IT_DOWNTIME output without formatting:
START_TIME
06-JUL-2016 11:05
DURATION
+00 00:35:00.000000
Example of current IT_SUPPORT_CALLS output without formatting:
DATELOGGED
06/07/2016
Something like this will probably do it, but there has been some guesswork as to your column names etc:
SELECT *
FROM
(
SELECT
to_char(DATELOGGED,'MON-YYYY') as Month,
COUNT(*) as Total_Calls,
SUM(case when CLOSED is null then 1 else null end) as case_Open,
COUNT(case CLOSED when 'Y' then 1 else null end) as case_Closed,
SUM(case when EXTREF is null then 0 else 1 end) as case_Referred,
round((COUNT(case SLA_MET when 'Y' then 1 else null end)/COUNT(case CLOSED when 'Y' then 1 else null end)*100),2) as percent_SLA_met
FROM IT_SUPPORT_CALLS
GROUP BY to_char(DATELOGGED,'MON-YYYY')
) calls
LEFT JOIN
(
SELECT
SUBSTR(START_TIME, 4) as down_month,
SUM(extract(minute from DURATION) +
extract(hour from DURATION) * 60 +
extract(day from DURATION) * 60 * 24
) || ' minutes' as total_down_mins
FROM IT_DOWNTIME
WHERE duration is not null
GROUP BY SUBSTR(START_TIME, 4)
) downs
ON calls.month = downs.down_month
Changed your date formatting of the first query to be MON-YYYY to make it align with what you claim is the formatting of the varchar2 date of the second query (dd-mon-yyyy), and substringed the date to remove the day, leaving just the month and year.
Edit:
Ok, so since you've posted some different example data from IT_DOWNTIME I see the problem: there's a time on the date also. Your first sample data didn't contain this time, it was just a date (as a string) so I was doing...
SUBSTR('01-JAN-1970', 4)
...to reduce the day date to a month date ('JAN-1970') and this was intended to align with the stuff going on in the other table ( to_char() with a format of 'MON-YYYY' )
Now we know that there is a time in there too, of course it won't align because...
SUBSTR('01-JAN-1970 12:34', 4)
...produces 'JAN-1970 12:34' and this will then not match to anything from the other table (which will be just 'JAN-1970' without the time), so the left join means that nulls will be output
The solution is to change the SUBSTR call so it cuts 8 characters, starting at position 4:
SUBSTR(start_time, 4, 8)
This will remove the day and the time, leaving just the month-year that we need. You'll need to make the change in two places in the query above.
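The effect of the two SUBSTR calls can be checked with a small Python sketch. The helper oracle_substr is hypothetical and only mimics SUBSTR for positive, 1-based start positions.

```python
def oracle_substr(text, start, length=None):
    """Mimic SUBSTR(text, start[, length]) for positive 1-based starts only."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    end = None if length is None else begin + length
    return text[begin:end]


date_only = "01-JAN-1970"
with_time = "01-JAN-1970 12:34"

print(oracle_substr(date_only, 4))     # month and year only
print(oracle_substr(with_time, 4))     # the time is still attached
print(oracle_substr(with_time, 4, 8))  # eight characters: month and year only
```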
Apologies for the delay on replying to this. However, that is working perfectly Caius, thanks very much! So to be complete, had to change your above code to:
SUBSTR(START_TIME, 4, 8) as down_month,
and
GROUP BY SUBSTR(START_TIME, 4, 8)

sql repeat rows for weekend and holidays

I have a table A that we import based on the day a file lands in a location. We don't receive files on weekends and public holidays, and the table holds data for multiple countries, so the public holidays vary. In essence, we are looking to duplicate a row until the next record for that ID is encountered (unless it's the max date for that ID). A typical record looks like this:
Account Datekey Balance
1 20181012 100
1 20181112 100
1 20181212 100
1 20181512 100
1 20181712 100
And needs to look like this (sat, sun & PH added to indicate the day of week):
Account Datekey Balance
1 20181012 100
1 20181112 100
1 20181212 100
1 20181312 100 Sat
1 20181412 100 Sun
1 20181512 100
1 20181612 100 PH
1 20181712 100
Also, Datekey is numeric and not a date. I tried a couple of the suggested solutions but found that they simply duplicate the previous row multiple times without stopping when the next date's record is found. I need to run it as an update query that executes daily on table A and adds the missing records when it runs (sometimes 2 or 3 days later).
Hope you can assist.
Thanks
This question has multiple parts:
Converting an obscene date format to a date
Generating "in-between" rows
Filling in the new rows with the previous value
Determining the day of the week
The following does most of this. I refuse to regenerate the datekey format. You really need to fix that.
This also assumes that your settings are for English weekday names.
with t as (
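select Account, Datekey, Balance,
convert(date, left(v.dkey, 4) + right(v.dkey, 2) + substring(v.dkey, 5, 2)) as proper_date
from yourtable
cross apply (select cast(Datekey as varchar(8)) as dkey) v -- Datekey is numeric, so cast it once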
),
dates as (
select account, min(proper_date) as dte, max(proper_date) as max_dte
from t
group by account
union all
select account, dateadd(day, 1, dte), max_dte
from dates
where dte < max_dte
)
select d.account, d.dte, t.balance,
(case when t.proper_date = d.dte then '' -- a real row exists for this day
when datename(weekday, d.dte) in ('Saturday', 'Sunday')
then left(datename(weekday, d.dte), 3)
else 'PH'
end) as indicator
from dates d cross apply
(select top (1) t.*
from t
where t.account = d.account and
t.proper_date <= d.dte
order by t.proper_date desc
) t
option (maxrecursion 0);
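The same gap-filling logic can be sketched outside the database. This is an illustration only: it assumes the YYYYDDMM key layout that the query above decodes, uses made-up balances, and labels a missing weekday as 'PH' simply because no row exists for it. Labels come from the real calendar, so they need not match the illustrative ones in the question.

```python
from datetime import date, timedelta


def parse_datekey(key):
    """Parse a numeric YYYYDDMM key (assumed layout) into a date."""
    text = str(key)
    return date(int(text[:4]), int(text[6:8]), int(text[4:6]))


def fill_gaps(rows):
    """rows: (datekey, balance) pairs for one account.

    Returns (date, balance, indicator) for every calendar day between the
    first and last key, carrying the previous balance into missing days.
    """
    known = {parse_datekey(key): balance for key, balance in rows}
    day, last = min(known), max(known)
    filled, balance = [], None
    while day <= last:
        if day in known:
            balance, indicator = known[day], ""
        elif day.weekday() >= 5:  # Monday is 0, so 5 and 6 are the weekend
            indicator = ("Sat", "Sun")[day.weekday() - 5]
        else:
            indicator = "PH"
        filled.append((day, balance, indicator))
        day += timedelta(days=1)
    return filled


sample = [(20181012, 100), (20181112, 110), (20181212, 120),
          (20181512, 130), (20181712, 140)]
filled = fill_gaps(sample)
for row in filled:
    print(row)
```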

Sort Numbers in varchar value in SQL Server

My goal is to load a monthly, day-by-day tabular presentation of sales data with a sum total and other average computations at the bottom.
I have one result set with a column named 'Day', which corresponds to the day of the month and has the int data type by default.
select datepart(day, a.date ) as 'Day'
The second result set loads the sum at the bottom. The word 'Sum' lands in the Day column, and I used UNION ALL to combine the result sets. The expected result is something like this:
day sales
1 10
2 20
3 30
4 10
5 20
6 30
.
.
.
31 10
Sum 130
What I did was convert the day value from int to varchar so that the columns could be combined, and that worked. The new problem is the sort order of the numbers:
select * from #SalesDetailed
UNION ALL
select * from #SalesSum
order by location, day
Assuming your union query returns the correct results, just messes up the order, you can use case with isnumeric in the order by clause to manipulate your sort:
SELECT *
FROM
(
SELECT *
FROM #SalesDetailed
UNION ALL
SELECT *
FROM #SalesSum
) u
ORDER BY location,
ISNUMERIC(day) DESC,
CASE WHEN ISNUMERIC(day) = 1 THEN cast(day as int) end
The isnumeric will return 1 when day is a number and 0 when it's not.
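The same two-level ordering, numeric rows first and then by numeric value, can be sketched in Python. The sample rows are made up, and str.isdigit() stands in for ISNUMERIC here (it is stricter, since it accepts digits only).

```python
rows = [("Sum", 130), ("10", 30), ("2", 20), ("1", 10)]


def day_sort_key(row):
    day = row[0]
    is_number = day.isdigit()
    # False sorts before True, so numeric days come first;
    # within them, compare by integer value rather than by text.
    return (not is_number, int(day) if is_number else 0)


ordered = sorted(rows, key=day_sort_key)
print(ordered)
```

Sorting the text values directly would put "10" before "2", which is the problem described in the question.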
Try this
select Day, Sum(Col) as Sales
from #SalesDetailed
Group by Day With Rollup
Edit (Working Sample) :
select
CASE WHEN Day IS NULL THEN 'SUM' ELSE STR(Day) END as Days,
Sum(Sales) from
(
Select 1 as Day , 10 as Sales UNION ALL
Select 2 as Day , 20 as Sales
) A
Group by Day With Rollup
EDIT 2:
select CASE WHEN Day IS NULL THEN 'SUM' ELSE STR(Day) END as Days,
Sum(Sales) as Sales
from #SalesDetailed
Group by Day With Rollup

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said, I am working with business days. I don't care about holidays; I have data from Monday to Friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/subtract business days from a given date and works like this:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second and third last line to
t1.day BETWEEN addbusinessdays(w.day, -2) AND addbusinessdays(w.day, 1)
and can deal with the business days.
Function
While using the function addbusinessdays(), consider this instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5)*2) is way too much padding. Replace with ((abs($2) * 7) / 5) + 3).
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
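For comparison, the logic of the simplified function can be mirrored in Python. This is a sketch that follows the SQL step by step (generate candidates, drop weekends, skip abs(n) rows), not a replacement for it; the name add_business_days is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start, n):
    """Return the date n business days from start (Mon-Fri, holidays ignored).

    Returns None where the SQL version would return no row, e.g. n = 0 on a
    weekend.
    """
    step = (n > 0) - (n < 0)             # sign of n: -1, 0 or 1
    padding = (abs(n) * 7) // 5 + 3      # same padding as the SQL version
    candidates = [start + timedelta(days=i * step) for i in range(padding + 1)]
    business = [d for d in candidates if d.isoweekday() < 6]  # drop weekends
    return business[abs(n)] if len(business) > abs(n) else None


print(add_business_days(date(2013, 1, 14), -2))
print(add_business_days(date(2013, 1, 11), 1))
```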
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for ensuing window functions work with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilà.
SQL Fiddle.
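To see what the numbering is meant to produce, here is a small Python sketch that assigns a per-company window number to rows around each flagged day. It is a simplification, not the query above: it counts rows rather than calendar days (like the ROWS frame in the question), it assumes windows of one company never overlap, and the sample rows are loosely based on the question with a few filler rows added.

```python
def number_windows(rows, before=2, after=1):
    """rows: (day, company, flag) tuples with ISO date strings.

    Returns (day, company, flag, occur) for every row that lies within
    `before` rows ahead of or `after` rows behind a flagged row.
    """
    by_company = {}
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        by_company.setdefault(row[1], []).append(row)

    result = []
    for company_rows in by_company.values():
        occur = 0
        for idx, row in enumerate(company_rows):
            if row[2] != 1:
                continue
            occur += 1  # one window per flagged row
            lo = max(0, idx - before)
            hi = min(len(company_rows), idx + after + 1)
            for day, company, flag in company_rows[lo:hi]:
                result.append((day, company, flag, occur))
    return sorted(result)


rows = [
    ("2012-01-20", "A", 0), ("2012-01-23", "A", 0), ("2012-01-24", "A", 0),
    ("2012-01-25", "A", 1), ("2012-01-26", "A", 0), ("2012-01-27", "A", 0),
    ("2013-01-10", "A", 0), ("2013-01-11", "A", 0), ("2013-01-14", "A", 1),
    ("2012-01-25", "B", 0), ("2012-01-26", "B", 0), ("2012-01-27", "B", 1),
    ("2012-01-30", "B", 0), ("2012-01-31", "B", 0),
]
numbered = number_windows(rows)
for row in numbered:
    print(row)
```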
Basically the strategy is to first enumerate the flag days and then join the other rows to them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
t1.day BETWEEN
w.day - interval '2 day' AND w.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However, there is a problem with working days, as the term is ambiguous (do holidays count?).

Last three months average for each month in PostgreSQL query

I'm trying to build a query in Postgresql that will be used for a budget.
I currently have a list of data that is grouped by month.
For each month of the year I need to retrieve the average monthly sales from the previous three months. For example, in January I would need the average monthly sales from October through December of the previous year. So the result will be something like:
1 12345.67
2 54321.56
3 242412.45
This is grouped by month number.
Here is a snippet of code from my query that will get me the current month's sales:
LEFT JOIN (SELECT SUM((sti.cost + sti.freight) * sti.case_qty * sti.release_qty)
AS trsf_cost,
DATE_PART('month', st.invoice_dt) as month
FROM stransitem sti,
stocktrans st
WHERE sti.invoice_no = st.invoice_no
AND st.invoice_dt >= date_trunc('year', current_date)
AND st.location_cd = 'SLC'
AND st.order_st != 'DEL'
GROUP BY month) as trsf_cogs ON trsf_cogs.month = totals.month
I need another join that will get me the same thing, only averaged from the previous 3 months, but I'm not sure how.
This will ALWAYS be a January-December (1-12) list, starting with January and ending with December.
This is a classic problem for a window function. Here is how to solve this:
SELECT month_nr
,(COALESCE(m1, 0)
+ COALESCE(m2, 0)
+ COALESCE(m3, 0))
/
NULLIF ( CASE WHEN m1 IS NULL THEN 0 ELSE 1 END
+ CASE WHEN m2 IS NULL THEN 0 ELSE 1 END
+ CASE WHEN m3 IS NULL THEN 0 ELSE 1 END, 0) AS avg_prev_3_months
-- or divide by 3 if 3 previous months are guaranteed or you don't care
FROM (
SELECT date_part('month', month) as month_nr
,lag(trsf_cost, 1) OVER w AS m1
,lag(trsf_cost, 2) OVER w AS m2
,lag(trsf_cost, 3) OVER w AS m3
FROM (
SELECT date_part( 'month', month) as trsf_cost -- some dummy nr. for demo
,month
FROM generate_series('2010-01-01 0:0'::timestamp
,'2012-01-01 0:0'::timestamp, '1 month') month
) x
WINDOW w AS (ORDER BY month)
) y;
This requires that no month is ever missing! Otherwise, have a look at this related answer:
How to compare the current row with next and previous row in PostgreSQL?
Calculates the correct average for every month. If there are only two previous months, it divides by 2, etc. If there are no previous months, the result is NULL.
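The averaging rule (use up to three preceding months, divide by however many exist, no value when none exist) can be sketched in a few lines of Python. The sales figures are made up, and the list is assumed to hold one entry per consecutive month with none missing.

```python
def avg_prev_months(monthly_sales, lookback=3):
    """For each month, average the up-to-`lookback` months before it.

    Returns None for a month with no preceding data, like NULL in the query.
    """
    averages = []
    for i in range(len(monthly_sales)):
        previous = monthly_sales[max(0, i - lookback):i]
        averages.append(sum(previous) / len(previous) if previous else None)
    return averages


sales = [10, 20, 30, 40, 50]  # hypothetical totals for five consecutive months
averages = avg_prev_months(sales)
print(averages)
```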
In your subquery, use
date_trunc('month', st.invoice_dt)::date AS month
instead of
DATE_PART('month', st.invoice_dt) as month
so you can sort months over the years easily!
More info
Window function lag()
date_trunc()