My source data includes Transaction ID, Date, Amount. I need a one-week trailing average that moves forward one day at a time and averages the amount per transaction. The problem is that some dates have no transactions at all, and I need the average per transaction, not per day, while the trailing average moves by day, not by week. In this particular case I can't simply use OVER with ROWS PRECEDING. I'm stuck with it :(
Data looks like this:
https://gist.github.com/avitominoz/a252e9f1ab3b1d02aa700252839428dd
There are two methods to do this. One uses generate_series() to get all the dates. The second uses a lateral join.
with minmax as (
      select min(trade_date) as mintd, max(trade_date) as maxtd
      from sales
     )
select days.dte, s.values,
       avg(s.values) over (order by days.dte
                           rows between 6 preceding and current row
                          ) as avg_7day
from minmax cross join
     generate_series(minmax.mintd, minmax.maxtd, interval '1 day') days(dte) left join
     sales s
     on s.trade_date = days.dte;
Note: this ignores the values on missing days rather than treating them as 0. If you want 0, then use avg(coalesce(s.values, 0)).
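For the lateral-join method, here is a minimal sketch, assuming the same sales(trade_date, values) layout as above. For each generated day, the lateral subquery averages every transaction in the trailing seven days directly, which keeps the average per transaction even when a day has several (or zero) rows:
with minmax as (
      select min(trade_date) as mintd, max(trade_date) as maxtd
      from sales
     )
select days.dte, w.avg_7day
from minmax cross join
     generate_series(minmax.mintd, minmax.maxtd, interval '1 day') days(dte) cross join
     lateral (
        -- average per transaction over the 7-day window ending at days.dte;
        -- returns NULL for days with no transactions in the window
        select avg(s.values) as avg_7day
        from sales s
        where s.trade_date > days.dte - interval '7 day'
          and s.trade_date <= days.dte
     ) w;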
I need to calculate the Moving Range for a set of data without using a GROUP BY clause. Since I am calculating the average value and the previous day's average, I need to take into account only the days that actually have values; I can't use DATEDIFF(start, end).
Another constraint is that I need to do it at row level, because I need it as a pre-calculated value (the denominator) to compute the average Moving Range.
At the moment I am using window functions to calculate the average and previous averages.
ROUND(AVG(SUMCOUNTSFT3) OVER (
      partition by to_date(to_char(DATETIMEOFREADING, 'DD/MM/RR'))
     ), 2) as AVG_SUMCOUNTSFT3,
ROUND(AVG(SUMCOUNTSFT3) OVER (
      order by to_date(to_char(DATETIMEOFREADING, 'DD/MM/RR'))
      RANGE between interval '1' day preceding and interval '1' day preceding
     ), 2) as LAG_VAL
Here is some sample data; as you can see, I have multiple readings from a sensor. I have calculated the average for that day and for the previous day. Then I take the difference between data points as |Xi - Xi-1|; the denominator is the column I am trying to calculate. In some cases we will not have readings for a day if the sensor is failing, and I need to discard those days if there is no data.
I believe a ROW_NUMBER() or DENSE_RANK() will do the job with a partition clause.
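A minimal sketch of that idea, assuming a readings table with the DATETIMEOFREADING and SUMCOUNTSFT3 columns from the query above. DENSE_RANK() numbers only the days that actually have readings, so "the previous day with data" is simply day_num - 1, regardless of calendar gaps, and no GROUP BY is needed:
with ranked as (
  select trunc(DATETIMEOFREADING) as reading_day,
         SUMCOUNTSFT3,
         round(avg(SUMCOUNTSFT3)
               over (partition by trunc(DATETIMEOFREADING)), 2) as avg_day,
         dense_rank()
               over (order by trunc(DATETIMEOFREADING)) as day_num
  from readings
)
select r.reading_day, r.SUMCOUNTSFT3, r.avg_day,
       -- every row of a day shares the same avg_day, so max() just picks it
       (select max(p.avg_day)
          from ranked p
         where p.day_num = r.day_num - 1) as lag_val
from ranked r;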
To put a long story short, I am working on a PostgreSQL database that manages Yelp checkins. The checkin table has the attributes business_id (string), date (string in the form yyyy-mm-dd), and time (string in the form 00:00:00).
What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value.
So for instance, I need to retrieve the total checkins that were in Jan, Feb, March, April, etc., regardless of the year.
Any help is greatly appreciated. I've already considered GROUP BY clauses, but I didn't know how to factor in the month part ('%mm%').
Reiterating Gordon: class assignment or not, storing dates and times as strings makes things harder, slower, and more likely to break. It's harder to take advantage of Postgres's powerful date-math functions. Storing dates and times separately makes things even harder; you have to concatenate them together to get the full timestamp, which means it will not be indexed. Determining the time between two events becomes unnecessarily difficult.
It should be a single timestamp column. Hopefully your class will introduce that shortly.
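If you can change the schema, here is a sketch of the migration (the checkin_date and checkin_time column names are assumptions based on the question):
-- one real, indexable timestamp column instead of two string columns
alter table yelp_checkins add column checkin_at timestamp;
-- the strings are already in ISO-ish form, so they cast directly
update yelp_checkins
   set checkin_at = (checkin_date || ' ' || checkin_time)::timestamp;
create index on yelp_checkins (checkin_at);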
What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value.
This is deceptively straightforward. Cast your strings to dates; fortunately they're in ISO 8601 format, so no reformatting is required. Then use extract to pull out just the month part.
select
extract('month' from checkin_date::date) as month,
count(*)
from yelp_checkins
where business_id = ?
group by month
order by month
But there's a catch. What if there are no checkins for a business on a given month? We'll get no entry for that month. This is a pretty common problem.
If we want a row for every month, we need to generate a table with our desired months with generate_series, then left join with our checkin table. A left join ensures all the months (the "left" table) will be there even if there is no corresponding month in the join table (the "right" table).
select
months.month,
count(business_id)
from generate_series(1,12) as months(month)
left join yelp_checkins
on months.month = extract('month' from checkin_date::date)
and business_id = ?
group by months.month
order by months.month
Now that we have a table of months, we can group by that. We can't use a where business_id = ? clause or that will filter out empty months after the left join has happened. Instead we must put that as part of the left join.
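For contrast, a sketch of the version to avoid: moving the filter into a WHERE clause turns the outer join back into an inner one, because the NULL-extended rows for empty months are removed after the join:
select
    months.month,
    count(business_id)
from generate_series(1,12) as months(month)
left join yelp_checkins
    on months.month = extract('month' from checkin_date::date)
where business_id = ?    -- filters out the NULL rows, dropping empty months
group by months.month
order by months.month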
Try it.
Why would you store the date as a string? That is a broken data model. You should fix the data.
That said, I recommend converting to a date and truncating to the first day of the month:
select date_trunc('month', datestr::date) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
If you don't want these based on the year, then use extract():
select extract(month from datestr::date) as mm, count(*)
from t
group by mm
order by mm;
I am trying to calculate the average since the last timestamp and pull all records where the average is greater than 3. My current query is:
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
where Multiple > 3
and SqlUnixTime > 1492225582
group by ID_BB_RT;
I have a table cdsData, and the unix time is April 15th converted. Finally, I want the GROUP BY calculated within the ID, as I show. I'm not sure why it's failing, but it says that the field Multiple is unknown in the where clause.
I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3.
I think your intention is correctly stated as follows: "I am trying to calculate the average since the last time stamp and select all rows where the average is greater than 3 times the individual bid".
In fact, a still better restatement of your objective would be: "I want to select all rows since the last time stamp, where the bid is less than 1/3rd of the average bid".
For this, the steps are as follows:
1) A sub-query finds the average bid, divided by 3, over the rows since the last time stamp.
2) The outer query selects rows since the last time stamp, where the individual bid is < the value returned by the sub-query.
The following SQL statement does that:
SELECT BID
FROM cdsData
WHERE SqlUnixTime > 1492225582
AND BID <
(
SELECT AVG(BID) / 3
FROM cdsData
WHERE SqlUnixTime > 1492225582
)
ORDER BY BID;
1)
A SQL query is not evaluated in the order it is written; the WHERE clause is logically evaluated before the SELECT clause. Because of this, the aliasing of AVG(BID)/BID to Multiple has not yet occurred when the WHERE clause runs.
You can try this.
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
WHERE SqlUnixTime > 1492225582
GROUP BY ID_BB_RT
HAVING AVG(BID)/BID > 3;
Or
SELECT Multiple
FROM (SELECT AVG(BID)/BID AS Multiple
      FROM cdsData
      WHERE SqlUnixTime > 1492225582
      GROUP BY ID_BB_RT) X
WHERE Multiple > 3;
2)
Once you have corrected the above error, you will run into one more error:
Column 'BID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
To correct this, you have to add the BID column to the GROUP BY clause.
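Note that grouping by BID makes AVG(BID) equal BID within each group, so if the goal is really a per-row comparison against the group average, a window-function sketch like the following may be closer to the intent (requires a database with window-function support, e.g. SQL Server, PostgreSQL, or MySQL 8+):
SELECT BID, Multiple
FROM (SELECT BID,
             -- each row keeps its own BID; the average is per ID_BB_RT group
             AVG(BID) OVER (PARTITION BY ID_BB_RT) / BID AS Multiple
      FROM cdsData
      WHERE SqlUnixTime > 1492225582) X
WHERE Multiple > 3;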
I'm just trying to use a window function to compute a cumulative sum per month, as follows.
sum(MeterReading) over (partition by Serial, code order by month(MeterReadingDate)) as cumulative
This seems way too slow to run and doesn't return any results even after a long wait; is there something I am doing wrong?
Basically I want to see the sum against each month for each serial/code.
Select
serial,
code,
DATEPART(YEAR,MeterReadingDate) as Year,
DATEPART(MONTH,MeterReadingDate) as Month,
sum(MeterReading) over (
partition by
Serial,
code,
Datepart(YEAR,MeterReadingDate),
Datepart(MONTH,MeterReadingDate)
) as cumulative
from [table]
First, making a sum with an ORDER BY clause makes no sense here, since you want to add all results for one month together.
Second, adding the two DATEPARTs for year and month will partition your data so that the sums only add meter readings from one month.
If you are interested in seeing yearly variations per month, you can remove the DATEPART(YEAR, ...) and perhaps add an average.
sum(MeterReading) over (
partition by Serial,
code,
DATEADD(MONTH, DATEDIFF(MONTH, 0, MeterReadingDate), 0)
order by MeterReadingDate
) as cumulative
The expression with DATEADD and DATEDIFF converts a date to the first of its month.
Then I add this expression to the PARTITION BY to group by Serial, code, and month.
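For instance, a quick illustration of the first-of-month trick (the date literal is arbitrary):
-- DATEDIFF counts whole months from day 0 (1900-01-01) to the date;
-- DATEADD adds that many months back onto day 0, landing on the 1st.
SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, '2017-04-15'), 0);
-- returns 2017-04-01 00:00:00.000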
I have a table of inventory transactions. I need to select the dates of the last few transactions, up until the adjusted quantity is greater than the current amount on hand in inventory.
I am dealing with three columns: item, transaction_date, adj_qty. Each item will have multiple transaction dates and adjustment quantities.
How do I return all the transaction_dates for each item until the item reaches a certain threshold (i.e., accumulates 100)? Say the first item has 2000 transactions and the last five transactions each have a qty of 21. I would like the query to return the last 5 transactions, because that is when the item reached 100.
If possible I'd like to do this without a loop or cursor.
Can anybody help?
What you need is a cumulative sum. This is built into SQL Server 2012.
Alas, without that, you need to do it with a self join:
select t.item, t.transaction_date, t.adj_qty,
       sum(tprev.adj_qty) as CumSum
from t join
     t tprev
     on t.item = tprev.item and
        t.transaction_date >= tprev.transaction_date
group by t.item, t.transaction_date, t.adj_qty
having 100 between sum(tprev.adj_qty) - t.adj_qty + 1 and sum(tprev.adj_qty)
Notice the use of the self join and GROUP BY to do the cumulative sum. Not pleasant, but necessary without ORDER BY support in the SUM() OVER (PARTITION BY ...) function. This cumulative sum adds everything up from the first record (by transaction date) for an item up to any other.
The HAVING clause then selects the row you are looking for, where the cumulative sum has passed the threshold.
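For completeness, a sketch of the SQL Server 2012+ equivalent using the built-in cumulative SUM() OVER (ORDER BY ...) mentioned above:
select item, transaction_date, adj_qty
from (select t.*,
             -- running total per item, ordered by transaction date
             sum(adj_qty) over (partition by item
                                order by transaction_date
                                rows unbounded preceding) as cum_sum
      from t) x
-- same crossing condition as the HAVING clause above
where 100 between cum_sum - adj_qty + 1 and cum_sum;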