Calculating quantity of available resource from reservations database table - sql

I am working on a project where I need to store reservations of a certain resource. Before inserting a new reservation, I have to check whether at least the requested quantity of the resource is available throughout the specified time interval. Reservations are stored in a relational database with the following schema:
id | owner | timestamp_start | timestamp_end | resource_quantity
1 | Matt | 2023/01/15-01:00 | 2023/01/15-02:00 | 40
2 | Andrew | 2023/01/15-03:00 | 2023/01/15-10:00 | 30
3 | Mary | 2023/01/15-04:00 | 2023/01/15-07:00 | 10
4 | Ann | 2023/01/15-05:00 | 2023/01/15-06:00 | 8
5 | Mia | 2023/01/15-09:00 | 2023/01/15-13:00 | 8
6 | Rick | 2023/01/15-11:00 | 2023/01/15-12:00 | 4
Here is a visual representation of the aforementioned data set.
I have two questions:
How could I calculate the maximum quantity of available resource between 2023/01/15-00:00 and 2023/01/15-23:59? It should be an extension of finding all the reservations that overlap with each other and progressively summing their associated resource quantities.
Do you think this is the proper way to handle this sort of data? Perhaps I should consider using another temporal model?

Given an interval #interval_start and #interval_end (both are timestamps) the overlapping rows will be found with:
SELECT *
FROM <your table>
WHERE
timestamp_start <= #interval_end AND timestamp_end >= #interval_start
Every time there's a "start" you must add resources, and every time there's an "end" you must subtract resources. For this reason you must have a sequence of events like this:
SELECT
timestamp_start as event_timestamp,
resource_quantity
FROM <your table>
WHERE
timestamp_start <= #interval_end AND timestamp_end >= #interval_start
UNION ALL
SELECT
timestamp_end as event_timestamp,
(-1) * resource_quantity as resource_quantity
FROM <your table>
WHERE
timestamp_start <= #interval_end AND timestamp_end >= #interval_start
ORDER BY event_timestamp
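At this point you write a procedural SQL routine that walks through these rows in order and, starting from 0, adds each resource_quantity to a running total, keeping the maximum value found while the event_timestamp is inside the given interval. That maximum is the peak reserved quantity; if you know the total capacity of the resource, the quantity still available over the whole interval is the capacity minus that peak.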
If you need I can help you write the procedure
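To make the scan concrete, here is a minimal sketch of the same idea in Python rather than procedural SQL. The function name `peak_usage` and the tie-breaking rule (an end at a given timestamp is processed before a start at the same timestamp, i.e. intervals are treated as half-open) are my assumptions, not something stated in the question.

```python
from datetime import datetime

def peak_usage(reservations, interval_start, interval_end):
    """Return the highest total quantity reserved at any moment of the interval.

    reservations: iterable of (timestamp_start, timestamp_end, resource_quantity).
    Assumption: a reservation ending at T does not overlap one starting at T.
    """
    events = []
    for start, end, qty in reservations:
        if start < interval_end and end > interval_start:
            # Clamp to the interval so earlier reservations count from its start.
            events.append((max(start, interval_start), 1, qty))
            events.append((min(end, interval_end), 0, -qty))
    # Sorting by (timestamp, kind) puts ends (0) before starts (1) on ties.
    events.sort()
    running = peak = 0
    for _, _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak

def at(hour):
    return datetime(2023, 1, 15, hour)

# Rows from the question's table.
reservations = [
    (at(1), at(2), 40), (at(3), at(10), 30), (at(4), at(7), 10),
    (at(5), at(6), 8), (at(9), at(13), 8), (at(11), at(12), 4),
]
peak = peak_usage(reservations, at(0), datetime(2023, 1, 15, 23, 59))
print(peak)  # 48, reached between 05:00 and 06:00
```

The total capacity is not given in the question, so the sketch stops at the peak reserved quantity; a new reservation would fit if its quantity plus the peak over its own interval does not exceed the capacity.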

Related

Avoiding roundtrips in the database caused by looping

I am using Postgres, and I recently noticed that the code I am using makes too many round trips to the database.
What I am doing is basically getting data from a table on a daily basis because I have to look for changes on a daily basis, but the whole function that does this job is called once a month.
An example of my table
Amount
Id | Itemid | Amount | Date
1 | 2 | 50 | 20-5-20
Now this table can be updated to add items at any point in time and I have to see the total amount that is SUM(Amount) every day.
But here's the catch, I have to add interest to the amount of each day at the rate of 5%.
So I can't just once call the function, I have to look at its value every day.
For example if I add an item of 50$ on the 1st of may then the interest on that day is 5/100*50
I add another item on the 5th of may worth 50$ and now the interest on the 5th day is 5/100*50.
But prior to the 5th, the interest was on only 50$, so simply using SUM(Amount)*5/100 is wrong.
Also, another issue is the fact that dates are stored as timestamps and I need to group it by date of the timestamp because if I group it on the basis of timestamp then it will create multiple rows for the same date which I want to avoid while taking the sum.
So if there are two entries on the same date but different hours ideally the query should sum it up as one single date.
Example
Amount Table
Date | Amount
2020-5-5 20:8:8 | 100
2020-5-5 7:8:8 | 100
Result should be
Amount Table
Date | Amount
2020-5-5 | 200
My current code.
for i in numberofdaysinthemonth:
amount = amount + session.query(func.sum(Amount.Amount)).filter(Amount.date<current_date).scalar() * 5/100
I want a query that gets all these values according to dates, for example
date | Sum of amount till that date
20-5-20 | 50
20-6-20 | 100
Any ideas about what I should do to avoid a loop that runs 30 times, given that the function is called only once a month?
I am supposed to get all this data in a table daywise and aggregated as the sum of amount for each day
That is a simple "running total"
select "date",
sum(amount) over (order by "date") as amount_til_date
from the_table
order by "date";
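For readers who want to see what the running total computes, here is a small Python sketch of the same logic, including the "collapse timestamps to dates" step the question asks about. The function name `running_totals_by_day` and the third sample row are hypothetical; only the first two rows come from the question.

```python
from collections import defaultdict
from datetime import datetime
from itertools import accumulate

def running_totals_by_day(rows):
    """rows: iterable of (timestamp, amount). Returns [(date, total up to that date)]."""
    daily = defaultdict(int)
    for ts, amount in rows:
        daily[ts.date()] += amount  # several rows on one date become one sum
    days = sorted(daily)
    totals = accumulate(daily[d] for d in days)  # cumulative sum in date order
    return list(zip(days, totals))

rows = [
    (datetime(2020, 5, 5, 20, 8, 8), 100),
    (datetime(2020, 5, 5, 7, 8, 8), 100),
    (datetime(2020, 5, 6, 9, 0, 0), 50),  # hypothetical extra row
]
result = running_totals_by_day(rows)
for day, total in result:
    print(day, total)
```

This is one pass over the data instead of one query per day, which is the same saving the window function gives you inside the database.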
If you need the amount per itemid
select "date",
sum(amount) over (partition by itemid order by "date") as amount_til_date
from the_table
order by "date";
If you also need to calculate the "compound interest rate" up to that day, you can do that as well:
select itemid,
"date",
sum(amount) over (partition by itemid order by "date") as amount_til_date,
sum(amount) over (partition by itemid order by "date") * power(1.05, count(*) over (partition by itemid order by "date")) as compound_interest
from the_table
order by "date";
To get that for a specific month, add a WHERE clause:
where "date" >= date '2020-06-01'
and "date" < date '2020-07-01'
In general, to avoid round trips between the application and the database, application logic must be moved from the application into the database as stored code (stored procedures and stored functions) written in a procedural language. This approach is sometimes called "thick database" in commercial databases like Oracle Database.
PostgreSQL's default procedural language is PL/pgSQL, but you can also use Java, Perl, Python, or JavaScript through extensions that you would need to install in PostgreSQL.

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns dates, account, amount, etc. eg.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be a record for any account on a single day, and I have a separate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days of that month. Notice that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want to have a record for each of the dates available in the original table. eg.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses subquery factoring, which other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
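As a cross-check on the business-day count, here is a minimal Python sketch of the same filter (weekdays that are not holidays). The function name `business_days` is mine, and the holiday date used below is only an illustrative assumption, since the question does not list its holidays. Python's `weekday()` is always Monday=0, so it does not have the territory-dependent numbering issue mentioned above.

```python
from datetime import date, timedelta

def business_days(year, month, holidays=frozenset()):
    """Count Monday-to-Friday days in the month that are not in holidays."""
    day = date(year, month, 1)
    count = 0
    while day.month == month:
        if day.weekday() < 5 and day not in holidays:  # 0..4 = Mon..Fri
            count += 1
        day += timedelta(days=1)
    return count

april = business_days(2014, 4)
print(april)  # 22, the divisor used in the question's example

# Hypothetical holiday on Friday 2014-04-18, for illustration only.
april_with_holiday = business_days(2014, 4, {date(2014, 4, 18)})
print(april_with_holiday)  # 21
```

With 22 business days, the first account's monthly total of 80+20+840+120 = 1060 gives roughly 48 per day, matching the expected output in the question.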
You have 40 months of data; this data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small, active part).
Next, I would like to define a minimal period: the smallest interval that is interesting to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 1900 and 12am yesterday?".
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day.
I will save day, account, total amount, and count in a TABLE.
If the cold body is modified later, delete and reload the affected day in that table.
For the hot tail there are multiple possible strategies:
1. Do the same as above (same process, clear to support).
2. Always calculate on the fly.
3. Use a materialized view as a middle ground between 1 and 2.
The cold body table totalc could also be implemented as a materialized view, but if the data never changes there is no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all following calculations.
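A rough sketch of that pre-aggregation step, in Python for illustration only (in practice this would be an INSERT ... SELECT ... GROUP BY into the summary table). The function name `summarize_daily` is hypothetical; the sample rows are the ones from the question.

```python
from collections import defaultdict
from datetime import date

def summarize_daily(transactions):
    """Collapse (day, account, amount) rows into {(day, account): (total, count)}.

    No entry is created for a day on which an account had no activity.
    """
    summary = defaultdict(lambda: [0, 0])
    for day, account, amount in transactions:
        entry = summary[(day, account)]
        entry[0] += amount  # running sum for that account and day
        entry[1] += 1       # number of transactions folded in
    return {key: tuple(value) for key, value in summary.items()}

transactions = [
    (date(2014, 4, 1), "XXXXX1", 80),
    (date(2014, 4, 1), "XXXXX1", 20),
    (date(2014, 4, 2), "XXXXX1", 840),
    (date(2014, 4, 3), "XXXXX1", 120),
    (date(2014, 4, 1), "XXXXX2", 130),
    (date(2014, 4, 3), "XXXXX2", 300),
]
daily = summarize_daily(transactions)
print(len(transactions), "rows reduced to", len(daily))
```

Monthly sums and averages can then be computed from the summary rows alone, since sums of daily sums and sums of daily counts are all that is needed.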

Enhancing Performance

I'm not as clued up on shortcuts in SQL so I was hoping to utilize the brainpower on here to help speed up a query I'm using. I'm currently using Oracle 8i.
I have a query:
SELECT
NAME_CODE, ACTIVITY_CODE, GPS_CODE
FROM
(SELECT
a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE,
ROW_NUMBER() OVER (PARTITION BY a.GPS_DATE ORDER BY b.ACTIVITY_DATE DESC) AS RN
FROM GPS_TABLE a, ACTIVITY_TABLE b
WHERE a.NAME_CODE = b.NAME_CODE
AND a.GPS_DATE >= b.ACTIVITY_DATE
AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2)
WHERE
RN = 1
and this takes about 7 minutes give or take 10 seconds to run.
Now the GPS_TABLE is currently 6.586.429 rows and continues to grow as new GPS coordinates are put into the system, each day it grows by about 8.000 rows in 6 columns.
The ACTIVITY_TABLE is currently 1.989.093 rows and continues to grow as new activities are put into the system, each day it grows by about 2.000 rows in 31 columns.
So all in all these are not small tables and I understand that there will always be a time hit running this or similar queries. As you can see I'm already limiting it to only the last 2 days worth of data, but anything to speed it up would be appreciated.
Your strongest filter seems to be the filter on the last 2 days of GPS_TABLE. It should filter the GPS_TABLE to about 15k rows. Therefore one of the best candidates for improvement is an index on the column GPS_DATE.
You will find that your filter TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2 is equivalent to a.GPS_DATE >= TRUNC(SYSDATE) - 1 (TRUNC(a.GPS_DATE) is always a midnight value, so being later than midnight two days ago means falling on yesterday or later), therefore a simple index on your column will work if you change the query. If you can't change it, you could add a function-based index on TRUNC(GPS_DATE).
Once you have this index in place, we need to access the rows in ACTIVITY_TABLE. The problem with your join is that we will get all the old activities and therefore a good portion of the table. This means that the join as it is will not be efficient with index scans.
I suggest you define an index on ACTIVITY_TABLE(name_code, activity_date DESC) and a PL/SQL function that will retrieve the last activity in the least amount of work using this index specifically:
CREATE OR REPLACE FUNCTION get_last_activity (p_name_code VARCHAR2,
p_gps_date DATE)
RETURN ACTIVITY_TABLE.activity_code%type IS
l_result ACTIVITY_TABLE.activity_code%type;
BEGIN
SELECT activity_code
INTO l_result
FROM (SELECT activity_code
FROM activity_table
WHERE name_code = p_name_code
AND activity_date <= p_gps_date
ORDER BY activity_date DESC)
WHERE ROWNUM = 1;
RETURN l_result;
END;
Modify your query to use this function:
SELECT a.NAME_CODE,
a.GPS_CODE,
get_last_activity(a.name_code, a.gps_date)
FROM GPS_TABLE a
WHERE a.GPS_DATE >= TRUNC(SYSDATE) - 1
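To illustrate why this lookup is cheap with an index ordered by (name_code, activity_date), here is a small Python sketch that does the equivalent "latest activity at or before the GPS date" search with a binary search over a sorted list. The function name `last_activity` and the sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

def last_activity(activities, gps_date):
    """activities: list of (activity_date, activity_code) for ONE name_code,
    sorted by activity_date ascending. Returns the code of the latest activity
    with activity_date <= gps_date, or None if there is none."""
    dates = [activity_date for activity_date, _ in activities]
    pos = bisect_right(dates, gps_date)  # index just past the last date <= gps_date
    return activities[pos - 1][1] if pos else None

# Hypothetical sample data for a single name_code.
activities = [
    (datetime(2014, 1, 1, 8, 0), "LOAD"),
    (datetime(2014, 1, 1, 12, 0), "DRIVE"),
    (datetime(2014, 1, 2, 9, 0), "UNLOAD"),
]
print(last_activity(activities, datetime(2014, 1, 1, 13, 30)))  # DRIVE
```

The point is that each GPS row needs only one probe into the ordered structure instead of joining to every older activity and discarding all but the newest.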
Optimising an SQL query is generally done by:
Adding some indexes
Trying a different way to get the same information
So, start by adding an index on ACTIVITY_DATE, and perhaps on some other columns that are used in the conditions.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping through the results and pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that adding a computed index on the expression in the where clause (where the value of the expression makes up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based and you just want every 5th row, or if the times are regular and you always have one record per minute, the query below gives you one record out of every 5:
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select tbl.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table
group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min, even better would be to use one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by date_trunc('hour', your_timestamp), (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
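For comparison, here is a minimal Python sketch of the bucketing idea behind the "safer" version: floor each timestamp to its 5-minute boundary and keep the earliest row in each bucket. The function name `first_per_bucket` is hypothetical, and the sketch assumes the bucket size divides 60 evenly. Because the bucket key keeps the date and hour, readings from different hours never share a bucket.

```python
from datetime import datetime

def first_per_bucket(timestamps, minutes=5):
    """Return the earliest timestamp in each `minutes`-wide bucket, in time order."""
    first = {}
    for ts in sorted(timestamps):
        # Floor to the bucket boundary, e.g. 07:07:30 -> 07:05:00.
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes,
                            second=0, microsecond=0)
        first.setdefault(bucket, ts)  # keeps only the first (earliest) hit
    return [first[bucket] for bucket in sorted(first)]

readings = [
    datetime(2011, 1, 1, 7, 0, 2),
    datetime(2011, 1, 1, 7, 0, 59),
    datetime(2011, 1, 1, 7, 5, 10),
    datetime(2011, 1, 1, 7, 11, 0),   # the 07:10 reading is missing
    datetime(2011, 1, 1, 8, 0, 30),
]
picked = first_per_bucket(readings)
for ts in picked:
    print(ts)
```

Duplicate readings in the same minute collapse to one row, and a skipped minute still yields the next available reading from that bucket.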

Query to obtain average each 30 seconds

How can I calculate the average for each 30-second period? The following is the table structure:
Price TTime
Every minute, 5-60 records are inserted. The time is inserted by getDate(). I have to calculate the average for every 30 seconds.
You need to do 2 things:
Create a column (in your SELECT result, not in the table) that contains the time in half-minutes;
calculate the average of Price using AVG(Price) and GROUP BY:
SELECT <function returning half minutes from TTime> AS HalfMinute, AVG(Price) FROM <Table> GROUP BY HalfMinute
I don't know SQL Server's time functions. If you can get the time returned in seconds, you could go with SECONDS/30. Maybe someone else can step in with details here.
Something like:
SELECT
AVG(Price) AS AvgPrice,
COUNT(Price) AS CountPrice,
MIN(TTime) AS PeriodBegin,
(DATEPART(second, TTime) / 30) * 30 AS PeriodType /* either 0 or 30 */
FROM
PriceTable
GROUP BY
YEAR(TTime), MONTH(TTime), DAY(TTime), DATEPART(hour, TTime), DATEPART(minute, TTime),
(DATEPART(second, TTime) / 30) * 30 /* either 0 or 30 */
ORDER BY
MIN(TTime)
In place of:
GROUP BY
YEAR(TTime), MONTH(TTime), DAY(TTime), DATEPART(hour, TTime), DATEPART(minute, TTime)
you could also use, for example:
GROUP BY
LEFT(CONVERT(varchar, TTime, 120), 16)
In any case these are operations that invoke a table scan, since they are not indexable. A WHERE clause to restrict the valid TTime range is advisable.
You could also make a column that contains the calculated date ('…:00.000' or '…:30.000') and fill that on INSERT with help of a trigger. Place an index on it, GROUP BY it, done.
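To show what the half-minute grouping key looks like, here is a small Python sketch that floors each TTime to '…:00' or '…:30' and averages Price per bucket. The function name `half_minute_averages` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def half_minute_averages(rows):
    """rows: iterable of (ttime, price). Returns {bucket_start: average price}."""
    buckets = defaultdict(list)
    for ttime, price in rows:
        # Integer-divide the seconds by 30 to get 0 or 1, then scale back to 0 or 30.
        start = ttime.replace(second=ttime.second // 30 * 30, microsecond=0)
        buckets[start].append(price)
    return {start: sum(prices) / len(prices)
            for start, prices in sorted(buckets.items())}

rows = [
    (datetime(2012, 1, 1, 12, 0, 5), 10.0),
    (datetime(2012, 1, 1, 12, 0, 20), 20.0),
    (datetime(2012, 1, 1, 12, 0, 45), 30.0),
]
averages = half_minute_averages(rows)
for start, avg in averages.items():
    print(start, avg)
```

The floored value is the same thing the trigger-maintained column suggested above would store, which is why indexing that column makes the GROUP BY cheap.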