Access Query - Sum of Field based on newest for each day

I have the following data:
Site # | Site Name | Product | Reading Date   | Volume
1      | Cambridge | Regular | 02/21/17 08:00 | 40000
2      | Cambridge | Regular | 02/22/17 07:00 | 35000
3      | Cambridge | Regular | 02/22/17 10:00 | 30000
What I want to achieve is to get the SUM of [Volume] over the last 30 days, taking the newest reading from EACH day where possible, since it's pretty inconsistent whether a day has 1, 2, or 3 readings. I have tried a couple of things but can't get it to work.
This is what I've tried:
SELECT [Site #], Product, Sum(Volume) AS SumOfVolume, DatePart("d", [Inventory Date]) AS Day
FROM [Circle K New]
GROUP BY [Site #], Product, Day
HAVING (([Site #] = 852446) AND (Product = "Diesel Lows"))
ORDER BY DatePart("d", [Inventory Date]) DESC;
Result:
It adds together the two readings from the same day. I was/am thinking about just getting a daily average and then deriving a monthly average from that, but I'm unsure how the value changes would affect the averages.

Based on your description:
select sum(d.volume)
from data as d
where d.readingdate in (select max(d2.readingdate)
                        from data as d2
                        group by int(d2.readingdate)
                       )
  and d.readingdate >= dateadd("d", -30, date());
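Applied to the names from the question, the same pattern would look roughly like this (a sketch only -- it assumes the [Circle K New] table and [Inventory Date] column mentioned above, and repeats the site/product filter inside the subquery so the per-day maximum is taken within that one series rather than across all sites):
SELECT Sum(t.Volume) AS SumOfVolume
FROM [Circle K New] AS t
WHERE t.[Site #] = 852446
  AND t.Product = "Diesel Lows"
  AND t.[Inventory Date] IN (SELECT Max(t2.[Inventory Date])
                             FROM [Circle K New] AS t2
                             WHERE t2.[Site #] = 852446
                               AND t2.Product = "Diesel Lows"
                             GROUP BY Int(t2.[Inventory Date]))
  AND t.[Inventory Date] >= DateAdd("d", -30, Date());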

Related

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention rate of my users. So, for example, for all users with one or more check-ins, I want the percentage of users who did a check-in on their 2nd day, on their 3rd day, and so on.
My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date                |
=================================
| 1       | 2020-09-02 13:00:00 |
| 4       | 2020-09-04 12:00:00 |
| 1       | 2020-09-04 13:00:00 |
| 4       | 2020-09-04 11:00:00 |
| ...     | ...                 |
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70%   | 67%   | 44%   | 32%   |
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days since each user's first check-in -- and users might have none -- then just use aggregation and window functions:
select sum((ci.date = ci.min_date)::int) / u.num_users::numeric as day_0,
       sum((ci.date = ci.min_date + interval '1 day')::int) / u.num_users::numeric as day_1,
       sum((ci.date = ci.min_date + interval '2 day')::int) / u.num_users::numeric as day_2
from (select u.*, count(*) over () as num_users
      from users u
     ) u left join
     (select ci.user_id, ci.date::date as date,
             min(min(ci.date::date)) over (partition by ci.user_id) as min_date
      from check_in ci
      group by ci.user_id, ci.date::date
     ) ci
     on ci.user_id = u.id
group by u.num_users;
Note that this aggregates the check_in table by user id and date, which ensures that there is only one row per user per date.
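If you would rather get one row per day offset than hard-code a column for each of day_0, day_1, and so on, a sketch of the same idea could look like this (it assumes the users(id) and check_in(user_id, date) tables from the question):
select ci.date::date - m.min_date as day_n,
       count(distinct ci.user_id)::numeric
         / (select count(*) from users) as retention
from check_in ci
join (select user_id, min(date::date) as min_date
      from check_in
      group by user_id
     ) m on m.user_id = ci.user_id
group by day_n
order by day_n;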

BigQuery: Repeat the same calculated value in multiple rows

I'm trying to get several simple queries into one new table using Google BigQuery. The final table holds existing revenue data per day (which I can simply draw from another table). I then want to calculate the average revenue per day of the current month and repeat this value until the end of the month, so the final table is updated every day and includes both actual and forecasted data.
So far, I came up with the following, which produces the error: Scalar subquery produced more than one element
#This gives me the date, the revenue per day and the info that it's actual data
SELECT date, sum(revenue), 'ACTUAL' as type
FROM `project.dataset.table`
WHERE date > "2020-01-01" AND date < current_date()
GROUP BY date
UNION DISTINCT
# This shall provide the remaining dates of the current month
SELECT
  (SELECT calendar_date
   FROM `project.dataset.calendar_table`
   WHERE calendar_date >= current_date()
     AND calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)),
  #This shall provide the average revenue per day so far and write this value for each day of the remaining month
  (SELECT avg(revenue_daily)
   FROM (SELECT sum(revenue) as revenue_daily
         FROM `project.dataset.table`
         WHERE date > "2020-01-01"
           AND extract(month from date) = extract(month from current_date())
         GROUP BY date) as average_daily_revenue
   WHERE calendar >= current_date()),
  'FORECAST'
This is what I want the final data to look like:
+------------+------------+----------+
| date | revenue | type |
+------------+------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 230,43 € | FORECAST |
| 25.04.2020 | 230,43 € | FORECAST |
| 26.04.2020 | 230,43 € | FORECAST |
| 27.04.2020 | 230,43 € | FORECAST |
| 28.04.2020 | 230,43 € | FORECAST |
| 29.04.2020 | 230,43 € | FORECAST |
| 30.04.2020 | 230,43 € | FORECAST |
+------------+------------+----------+
The forecast value is simply the sum of the actual revenue of the month divided by the number of days the month had so far.
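For example, if the first 21 days of April had brought in 4,830 € of actual revenue in total, each remaining day of the month would be forecast at 4,830 € / 21 = 230 € per day.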
Thanks for any hint on how to approach this.
I just figured something out, which creates the data I need. I'll still work on updating this every day automatically. But this is what I got so far:
select date, 'actual' as type, sum(revenue) as revenue
from `project.dataset.revenue`
where date >= "2020-01-01" and date < current_date()
group by date
union distinct
select calendar_date, 'forecast',
  (select avg(revenue_daily)
   from (select sum(revenue) as revenue_daily
         from `project.dataset.revenue`
         where extract(year from date) = extract(year from current_date())
           and extract(month from date) = extract(month from current_date())
         group by date) as average_daily_revenue)
from `project.dataset.calendar`
where calendar_date >= current_date()
  and calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
order by date
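A possible cleanup of the same logic (a sketch in BigQuery Standard SQL, keeping the table names from the question): the month-to-date average moves into a WITH clause so the forecast is stated once, and the end-of-month bound is written as "before the first day of the next month", which is equivalent to the DATE_SUB expression above.
WITH daily_avg AS (
  SELECT avg(revenue_daily) AS forecast
  FROM (SELECT sum(revenue) AS revenue_daily
        FROM `project.dataset.revenue`
        WHERE extract(year from date) = extract(year from current_date())
          AND extract(month from date) = extract(month from current_date())
        GROUP BY date) AS daily
)
SELECT date, 'actual' AS type, sum(revenue) AS revenue
FROM `project.dataset.revenue`
WHERE date >= "2020-01-01" AND date < current_date()
GROUP BY date
UNION ALL
SELECT calendar_date, 'forecast', (SELECT forecast FROM daily_avg)
FROM `project.dataset.calendar`
WHERE calendar_date >= current_date()
  AND calendar_date < DATE_TRUNC(DATE_ADD(current_date(), INTERVAL 1 MONTH), MONTH)
ORDER BY date;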

Compare data from different timestamps

What I have is a table with restaurants, a restaurant is either active (1) or inactive (0). This can change weekly. I want to determine how many restaurants have been deactivated since last week. E.g. if a restaurant was active = 1 in Week 50, but active = 0 in Week 51 then it should be counted. Hence, I want to compare active on a weekly basis.
My table looks like this:
restaurant | week_nm | active | date
-----------------------------------------
rest1 | 50 | 1 | xxx-xx-xx
rest1 | 51 | 0 | xxx-xx-xx
rest2 | 50 | 1 | xxx-xx-xx
rest2 | 51 | 1 | xxx-xx-xx
rest3 | 50 | 1 | xxx-xx-xx
rest3 | 51 | 0 | xxx-xx-xx
What I want to have is this:
week_nm | restaurants_deactivated
---------------------------------
51 | 2
A count of restaurants that went from active = 1 to active = 0.
As jarlh correctly commented, taking the year change into account will be a bit complicated. So if the date column (btw: a horrible name for a column) correctly reflects the week in which the restaurant was active or not, then you can use the ISO year/week combination (derived from that date) to properly deal with the year change:
select to_char(r1.date, 'iyyy-iw'), count(*)
from rest r1
where to_char(r1.date, 'iyyy-iw') = '2015-51'
  and not active
  and exists (select 1
              from rest r2
              where to_char(to_date(to_char(r1.date, 'iyyy-iw'), 'iyyy-iw') - 7, 'iyyy-iw') = to_char(r2.date, 'iyyy-iw')
                and r2.restaurant = r1.restaurant
                and r2.active)
group by to_char(r1.date, 'iyyy-iw')
SQLFiddle: http://sqlfiddle.com/#!15/8da24/2
to_char(r1.date, 'iyyy-iw') calculates the year and week number based on the ISO definition of the week in the year. This returns e.g. 2015-51 for 2015-12-21.
The part:
where to_char(r1.date, 'iyyy-iw') = '2015-51'
and not active
retrieves all rows from week 51 where the restaurant was not active (this assumes that active is a boolean column).
The tricky part is to calculate the "previous" week. This is done using the expression:
to_char(to_date(to_char(r1.date, 'iyyy-iw'), 'iyyy-iw') - 7, 'iyyy-iw')
The "date" 2015-51 is converted back to a date, which result in the first day of that week. Then 7 days are subtracted and the result of that date, is then converted back into the year/week display. This is then used in the co-related subquery. The effect of that is, that it returns all rows that have been active in the "previous" week (where exists (...))
This should work from December to January as well (just keep in mind that e.g. the ISO week #1 in 2016 starts on January 4th).
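To see the previous-week expression in action on a concrete value (assuming PostgreSQL):
select to_char(to_date('2016-01', 'iyyy-iw') - 7, 'iyyy-iw');
-- to_date('2016-01', 'iyyy-iw') yields 2016-01-04, the Monday of ISO week 1 of 2016;
-- subtracting 7 days gives 2015-12-28, which formats back to '2015-53',
-- the last ISO week of 2015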
try this (note: it counts, for the most recent week, only the restaurants that are inactive now but were active in the preceding week; as jarlh pointed out, the week_nm - 1 arithmetic does not handle the year change):
select top 1 a.[week_nm] as [week_nm],
       (select count(*)
        from [as] b
        where b.[week_nm] = a.[week_nm]
          and b.active = 0
          and exists (select 1
                      from [as] c
                      where c.restaurant = b.restaurant
                        and c.[week_nm] = b.[week_nm] - 1
                        and c.active = 1)
       ) as restaurants_deactivated
from [as] a
order by a.[week_nm] desc

Counting number of non-trading days/days without price changes

I have a table of closing prices for bonds over time, with the essential structure:
bond_id | tdate | price
---------+------------+-------
EIX1923 | 2014-01-01 | 100.12
EIX1923 | 2014-01-02 | 100.10
EIX1923 | 2014-01-05 | 100.10
EIX1923 | 2014-01-10 | 100.15
As you can see, I don't have prices for every day -- because the bond does not trade every day. I would like to count how often this occurs in a given year; a day on which the price is unchanged from the previous day should count the same as a day with no recorded price.
That is, for a year with N trading days (excluding weekends, ignoring holidays), I would essentially want to generate a series of dates, count the days on which the price is (1) unchanged from the previous day or (2) not recorded at all, and divide that count by N.
I'm using PostgreSQL, so I started out with generate_series('2014-01-01'::timestamp, '2015-01-01'::timestamp, '1 day'::interval); I can SELECT from this series and do a WHERE to exclude weekends:
SELECT dd
FROM generate_series(
       '2014-01-01'::timestamp,
       '2015-01-01'::timestamp,
       '1 day'::interval
     ) dd
WHERE EXTRACT(dow FROM dd) NOT IN (0, 6);
Now, I figure I would like to generate a "column" of bond_id to JOIN against the trade table with, but I'm not sure how. Essentially, I figured the simplest structure would be a LEFT JOIN so that I get something like:
EIX1923 | 2014-01-01 | 100.12
EIX1923 | 2014-01-02 | 100.10
EIX1923 | 2014-01-03 |
EIX1923 | 2014-01-04 |
EIX1923 | 2014-01-05 | 100.10
EIX1923 | 2014-01-06 |
EIX1923 | 2014-01-07 |
EIX1923 | 2014-01-08 |
EIX1923 | 2014-01-09 |
EIX1923 | 2014-01-10 | 100.15
Then I could just fill in the gaps with the most recently available price and count the number of ABS(∆P) == 0 in application code. But if there are solutions to do this entirely in SQL that would be nice too! I have no idea if the approach above is the right one to go with.
(I didn't bother to check if the first days of January 2014 are weekends or not, since it's just for illustration here; but they would be excluded from the results, obviously).
EDIT: Seems there might be a number of similar questions already. Hope it's not too much of a duplicate!
EDIT: So, I played around with this a bit more, and the following solution "works" in the above sense (I feel silly for not realizing it sooner):
SELECT 'EI653670', dd, t.price
FROM generate_series('2014-01-01'::timestamp,
                     '2015-01-01'::timestamp,
                     '1 day'::interval) dd
LEFT JOIN trade t
       ON dd = t.tdate AND t.bond_id = 'EI653670'
WHERE EXTRACT(dow FROM dd) NOT IN (0, 6)
ORDER BY dd;
Is there a better way?
I think you can do this logic with lag(). The following shows the general idea -- get the previous date and price and do some logic:
select bond_id,
       sum(case when prev_price = price
                then tdate - prev_date + 1
                when tdate = prev_date + interval '1 day'
                then 0
                else tdate - prev_date
           end)
from (select t.*,
             lag(t.tdate) over (partition by t.bond_id order by t.tdate) as prev_date,
             lag(t.price) over (partition by t.bond_id order by t.tdate) as prev_price
      from trade t
     ) t
group by bond_id;
One caveat is that this probably won't handle boundary conditions the way that you want.
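For the "entirely in SQL" route the question asks about, here is one possible sketch (PostgreSQL, assuming the trade(bond_id, tdate, price) columns and the 'EIX1923' bond from the question). It generates the weekday calendar, carries the last known price forward with a running-count trick, and counts the days whose carried-forward price matches the previous day's:
with days as (
  select dd::date as d
  from generate_series('2014-01-01'::timestamp,
                       '2014-12-31'::timestamp,
                       '1 day'::interval) dd
  where extract(dow from dd) not in (0, 6)
),
joined as (
  select d.d, t.price,
         -- running count of non-null prices: all gap days after a trade
         -- share that trade's group number
         count(t.price) over (order by d.d) as grp
  from days d
  left join trade t
         on t.tdate = d.d and t.bond_id = 'EIX1923'
),
filled as (
  -- within each group the first row is the trade itself, so first_value
  -- carries its price forward across the gap days
  select d, first_value(price) over (partition by grp order by d) as price
  from joined
),
diffed as (
  select d, price, lag(price) over (order by d) as prev_price
  from filled
)
select count(*) filter (where price = prev_price) as unchanged_or_missing_days,
       count(*) as total_weekdays
from diffed;
Dividing unchanged_or_missing_days by total_weekdays then gives the fraction over N that the question asks for.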

Summing one tables value and grouping it with a single value of another table

I have two tables.
Table 1: Actuals
A table with all the employees and how many hours they have worked in a given month:
| ID | empID | hours | Month |
--------------------------------------
Table 2:
A target table with an hours target per month; the target is the amount that the combined hours worked by all employees should meet.
| ID | hours target | Month |
-----------------------------------
Is it possible to return the sum of table 1's hours together with table 2's target, grouped by month, in a single data set?
Example
| Month | Actual Hours | hours target |
-----------------------------------------
| 1 | 320 | 350 |
etc.
Hope that is clear enough and many thanks for considering the question.
This should work:
SELECT t.[month], sum(a.[hours]) as ActualHours, t.[hourstarget]
FROM [TargetTable] t
JOIN [ActualsTable] a on t.[month] = a.[month]
GROUP BY t.[month], t.[hourstarget]
Written in plain English, you're saying "give me the sum of all hours accrued, grouped by the month (and also include the target hours for that month)".
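One caveat: if a month exists in the target table but has no rows in the actuals table yet, the inner join drops it from the output. A LEFT JOIN (same assumed table names as above) keeps such months with a zero total:
SELECT t.[month], COALESCE(SUM(a.[hours]), 0) AS ActualHours, t.[hourstarget]
FROM [TargetTable] t
LEFT JOIN [ActualsTable] a ON t.[month] = a.[month]
GROUP BY t.[month], t.[hourstarget]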
WITH
t1 AS (SELECT mnth, targetHours FROM tblTargetHours),
t2 AS (SELECT mnth, sum(hours) AS totalhours FROM tblEmployeeHours GROUP BY mnth)
SELECT t1.mnth, t2.totalhours, t1.targethours
FROM t1, t2
WHERE t1.mnth = t2.mnth
results:
mnth | totalhours | targethours
-------------------------------
1    | 135        | 350
2    | 154        | 350
3    | 128        | 350