PostgreSQL query and data caching

I have this SQL query:
SELECT p.timestamp,
       COUNT(*) AS total,
       date_part('hour', p.timestamp) AS hour
FROM parties AS p
WHERE p.timestamp >= TIMESTAMP 'today'
  AND p.timestamp < TIMESTAMP 'tomorrow'
  AND p.member_id = 1
GROUP BY p.timestamp, hour;
which returns the number of people grouped by hour:
+-------------------------+-------+------+
| Timestamp               | Total | Hour |
+-------------------------+-------+------+
| 2018-11-21 12:00:00+07  | 10    | 12   |
| 2018-11-21 13:00:00+07  | 2     | 13   |
| 2018-11-21 14:00:00+07  | 2     | 14   |
| 2018-11-21 16:00:00+07  | 1     | 16   |
| 2018-11-21 17:00:00+07  | 21    | 17   |
| 2018-11-21 19:00:00+07  | 18    | 19   |
| 2018-11-21 20:00:00+07  | 8     | 20   |
| 2018-11-21 21:00:00+07  | 1     | 21   |
+-------------------------+-------+------+
My question is: if I re-fetch an API endpoint that runs the query above, will the data for past hours be cached automatically? In my case, when new data arrives, only the last hour's row is updated.
If not, how do I cache it? Thanks in advance.

PostgreSQL cannot cache query results by itself. The solution is to cache the result at the API application layer.
I prefer using Redis for this: a hash whose fields are year+month+day+hour and whose values are the total online users for each hour. Example:
hash: useronline
field: 2018112112 - value: 10
field: 2018112113 - value: 2
You can also set a timeout on the key; after the timeout has expired, the key is deleted automatically. I will set it to 1 hour here.
EXPIRE useronline 3600
When an API request comes in, look for the result in the Redis cache first. If it does not exist or has expired, run the query against the database layer, save the result back to the Redis cache, and respond to the client with the result.
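A minimal cache-aside sketch in Python with the redis-py client (the connection details, the hourly_counts name, and the reuse of the question's query are assumptions for illustration):

from datetime import date
import psycopg2  # assumed driver for the question's PostgreSQL database
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def hourly_counts(member_id):
    key = 'useronline'
    cached = r.hgetall(key)  # check the Redis cache first
    if cached:
        return {field.decode(): int(total) for field, total in cached.items()}
    # Cache miss or expired key: fall back to the database layer
    conn = psycopg2.connect('dbname=app')  # hypothetical connection string
    with conn.cursor() as cur:
        cur.execute(
            "SELECT date_part('hour', p.timestamp) AS hour, COUNT(*) AS total"
            " FROM parties AS p"
            " WHERE p.timestamp >= TIMESTAMP 'today'"
            "   AND p.timestamp < TIMESTAMP 'tomorrow'"
            "   AND p.member_id = %s"
            " GROUP BY hour", (member_id,))
        day = date.today().strftime('%Y%m%d')
        result = {'%s%02d' % (day, int(hour)): int(total)
                  for hour, total in cur.fetchall()}
    if result:
        r.hset(key, mapping=result)  # repopulate the hash
        r.expire(key, 3600)          # and let it expire after an hour
    return result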
Here is a list of Redis clients for each programming language.

Related

Finding how many days left per user per year

I have a table that tracks leave days for each user:
ID | Start | End | IDUser
1 | 02-02-2020 | 03-02-2020 | 2
2 | 01-02-2020 | 21-02-2020 | 2
IDUser connects to the Users Table, that has IDUser and Username columns
I have a view/query that shows the previously mentioned columns PLUS a column named UsedDays that counts how many leave days were used:
DATEDIFF(DAY, dbo.leavedays.start, dbo.leavedays.[end]) + 1
This is what I have now:
Start | End | IDUser | UsedDays
02-02-2020 | 03-02-2020 | 2 | 1
01-02-2020 | 21-02-2020 | 1 | 20
Each user has a total number of available days per year, so I would like a column that subtracts the used days from each user's total and shows how many are left.
Example:
John (IDUser = 2) has 30 days available this year and he already used 1, so there are 29 left
Start | End | IDUser | TotalDaysYear | UsedDays | LeftDays
02-02-2020 | 03-02-2020 | 2 | 30 | 1 | 29
01-02-2020 | 21-02-2020 | 1 | 20 | 20 | 0
I believe I have to create a table for TotalDaysYear, probably with:
ID | Year | TotalDaysYear | IDUser
1 | 2020 | 30 | 2
2 | 2020 | 20 | 1
But I'm having trouble finding the logic for the relationship and how to get the result that I want, since it also depends on the year (available days may change per year, per user).
Assuming you are using SQL Server, this should work:
SELECT
    ld.start,
    ld.[end],
    ld.IDUser,
    ldy.TotalDaysYear,
    SUM(DATEDIFF(DAY, ld.start, ld.[end]) + 1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) AS UsedDays,
    ldy.TotalDaysYear - SUM(DATEDIFF(DAY, ld.start, ld.[end]) + 1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) AS LeftDays
FROM leavedays ld
LEFT JOIN leavedaysperyear ldy
    ON YEAR(ld.start) = ldy.Year
   AND ld.IDUser = ldy.IDUser
The basic idea is to keep a running total of used days per user, per year, and then subtract it from the total available days for that user in that same year.
Here's a SQLFiddle
NB. The example provided doesn't handle leave periods across years
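For reference, a sketch of the leavedaysperyear table the query joins against (column names are taken from the query; the key and constraints are assumptions):

CREATE TABLE leavedaysperyear (
    ID int IDENTITY(1,1) PRIMARY KEY,
    [Year] int NOT NULL,
    TotalDaysYear int NOT NULL,
    IDUser int NOT NULL REFERENCES Users (IDUser),
    -- one allowance per user per year
    CONSTRAINT UQ_leavedaysperyear_user_year UNIQUE (IDUser, [Year])
);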

Best way to pre-aggregate time-series data in postgres

I have a table of sent alerts as below:
id | user_id | sent_at
1 | 123 | 01/01/2020 12:09:39
2 | 452 | 04/01/2020 02:39:50
3 | 264 | 11/01/2020 05:09:39
4 | 123 | 16/01/2020 11:09:39
5 | 452 | 22/01/2020 16:09:39
Alerts are sparse and I have around 100 million user_ids. This table has ~500 million entries in total (the last 2 months).
I want to query alerts per user over the last X hours/days/weeks/months for 10 million user_ids (saved in another table). I cannot use an external time-series database; it has to be done in Postgres only.
I tried keeping hourly buckets for each user, but the data is so sparse that I have too many rows (user_ids × hours). For example, getting alert counts for 10 million users over the last 10 hours takes a long time with this bucket table:
user_id | hour | count
123 | 01/01/2020 12:00:00 | 2
123 | 01/01/2020 10:00:00 | 1
234 | 11/01/2020 12:00:00 | 1
There are not many alerts per user, so an index on (user_id) should be sufficient.
However, you might as well include the time, so I would recommend (user_id, sent_at). This covers the WHERE clause of your query. Postgres will still need to look up the original data pages to check row visibility.
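A sketch of the suggested index and the kind of query it covers (the alerts and target_users table names are stand-ins for the question's tables):

CREATE INDEX idx_alerts_user_sent ON alerts (user_id, sent_at);

-- Alert counts over the last 10 hours for the saved list of users
SELECT t.user_id, COUNT(a.user_id) AS alert_count
FROM target_users t
LEFT JOIN alerts a
       ON a.user_id = t.user_id
      AND a.sent_at >= now() - interval '10 hours'
GROUP BY t.user_id;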

SQL table join based on aggregated/equal values

I have a very horrible two-table dataset at hand that I need to create a join query for. It's best to show an example:
+------+------+----------+
| Time | Sent | Received |
+------+------+----------+
| 1    | 100  | NULL     |
| 2    | NULL | 100      |
| 3    | 50   | NULL     |
| 4    | NULL | 40       |
| 5    | NULL | 10       |
| 6    | 400  | 200      |
| 7    | 100  | 200      |
| 8    | NULL | 100      |
| 9    | 500  | 500      |
+------+------+----------+
Assuming 'time' above is in hours - 'Sent' shows the number of items sent in that hour, and 'Received' shows the number received. The problem being that they likely will not arrive in the same hour they were sent (though they can).
I need to match the received against the appropriate sent to find the time the received item was Sent.
Using the above:
Received 100 at time 2 is obviously the items sent from hour 1, so that would be assigned to hour 1.
50 Sent in time 3 arrived in two batches (40 and 10 in time 4/5 respectively). So received 40/10 should be lumped into the time 3 category
Received in 6/7 (each for 200) correspond to the 400 order in time 6 (note that half the order was received in the same hour, this can happen)
Also in time 7 a new order was sent which corresponds to received for time 8
Also in time 9 an order of 500 was sent and received in the same hour.
Below is an example of what the output would look like (Note that there are other values associated with each 'Received' row but they are orthogonal to the task and will just be summed to provide meaning)
+------+----------+
| Time | Received |
+------+----------+
| 1    | 100      |
| 3    | 50       |
| 6    | 400      |
| 7    | 100      |
| 8    | 100      |
| 9    | 500      |
+------+----------+
I have been trying to rack my brain around this for a while. If I could do this outside of SQL, I would have a function that loops through each 'Sent' value incrementally through time, loops through 'Received' until the values match, assigns those Received values to that Time index, and then deletes both the sent and received values from the array (or notes where the loop got to and continues from there).
Unfortunately the project doesn't allow that scope; this must be done as much in SQL as possible. I am really at a loss and hoping there is some SQL functionality I have overlooked. Any help is much appreciated.
If this is in SQL Server, you can use a WHILE loop. Look at the documentation. So, your project might look something like this:
CREATE TABLE #temp ([Time] int, [Received] int)
DECLARE @i int = 1
DECLARE @value int = 0
WHILE @i <= 9
BEGIN
    SELECT @value = [Received] FROM [table] WHERE [Time] = @i
    --Your logic here
    INSERT INTO #temp ...
    SET @i = @i + 1  -- advance the loop counter
END
SELECT * FROM #temp
DROP TABLE #temp
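If a set-based alternative is acceptable, the matching can also be done by comparing running totals: each received batch belongs to whichever sent row's cumulative range it overlaps. A sketch, assuming SQL Server 2012+ (for SUM() OVER (ORDER BY ...)) and a hypothetical table named shipments with the Time/Sent/Received columns above:

WITH s AS (
    -- Cumulative interval (lo, hi] covered by each Sent row
    SELECT [Time],
           SUM(Sent) OVER (ORDER BY [Time]) - Sent AS lo,
           SUM(Sent) OVER (ORDER BY [Time]) AS hi
    FROM shipments
    WHERE Sent IS NOT NULL
), r AS (
    -- Same for each Received row
    SELECT [Time],
           SUM(Received) OVER (ORDER BY [Time]) - Received AS lo,
           SUM(Received) OVER (ORDER BY [Time]) AS hi
    FROM shipments
    WHERE Received IS NOT NULL
)
SELECT s.[Time],
       SUM(CASE WHEN r.hi < s.hi THEN r.hi ELSE s.hi END
         - CASE WHEN r.lo > s.lo THEN r.lo ELSE s.lo END) AS Received
FROM s
JOIN r ON r.hi > s.lo AND r.lo < s.hi  -- overlapping cumulative intervals
GROUP BY s.[Time]
ORDER BY s.[Time];

On the sample data this reproduces the desired output exactly; it assumes every sent item is eventually received, so any unmatched remainder would need extra handling.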

Apply Limit for a Condition

I have a query that returns the credit notes (CN) and debit notes (DN) of an operation; each CN is accompanied by two or more DNs (referenced by the payment_plan_id field). When paging, I must return, for example, 10 operations, that is, 10 CNs and their DNs. But if I leave the limit at 10, it also counts the debit notes of the transactions the query must return, so it only brings back 2, 3, or 4 operations, depending on the number of DNs accompanying each credit note.
SELECT
    value, installment, payment_plan_id, model,
    creation_date, operation
FROM payment_plant
WHERE model != 'IMMEDIATE'
  AND operation IN ('CN', 'DN')
  AND creation_date BETWEEN '2017-06-12' AND '2017-07-12 23:59:59'
ORDER BY
    model,
    creation_date,
    operation
LIMIT 10
OFFSET 1
Example of the table, omitting some fields:
+----+-----------------+-------+-------------+-----------+
| id | payment_plan_id | value | installment | operation |
+----+-----------------+-------+-------------+-----------+
| 1  | b3cdaede        | 12    | 1           | CN        |
| 2  | b3cdaede        | 3.5   | 1           | DN        |
| 3  | b3cdaede        | 1.2   | 1           | DN        |
| 4  | e1d7f051        | 36    | 1           | CN        |
| 5  | e1d7f051        | 5.9   | 1           | DN        |
| 6  | 00e6a0b4        | 15    | 1           | CN        |
| 7  | 00e6a0b4        | 1     | 1           | DN        |
| 8  | 00e6a0b4        | 3.6   | 1           | DN        |
+----+-----------------+-------+-------------+-----------+
How can I constrain the LIMIT so that it only counts the CNs?
Well, the query you give above doesn't do remotely what you describe, so I'm assuming you actually want "the last 10 CNs and their DNs". You also don't explain what fields a CN and its DNs have in common, so I'm going to assume they are payment_plan_id and installment. Given that, here's how you would get it:
WITH last_10_cn AS (
    SELECT
        value, installment, payment_plan_id, model,
        creation_date, operation
    FROM payment_plant
    WHERE model != 'IMMEDIATE'
      AND operation = 'CN'
      AND creation_date BETWEEN '2017-06-12' AND '2017-07-12 23:59:59'
    ORDER BY
        model,
        creation_date
    LIMIT 10
    OFFSET 1
)
SELECT last_10_cn.*,
       dn.value AS dn_value, dn.model AS dn_model,
       dn.creation_date AS dn_creation_date
FROM last_10_cn
JOIN payment_plant AS dn
    ON last_10_cn.payment_plan_id = dn.payment_plan_id
   AND last_10_cn.installment = dn.installment
   AND dn.operation = 'DN'  -- keep the CN row from joining to itself
ORDER BY
    last_10_cn.model,
    last_10_cn.creation_date,
    dn.creation_date;
Adjust the above according to the actual join conditions and how you really want things to be sorted.
BTW, your table structure is what's giving you trouble here. DNs should really be a separate table with a foreign key to CNs. I realize that's not how most GLs do it, but the GL model predates relational databases.

Group based on time difference between two date values

I've searched around, but haven't been able to find anyone else with this same question.
I'm working with SQL Server (2008 R2).
Let's say I have the following three rows of data coming back from my query. What I need to do is group the first two rows into one (in either SQL Server or SSRS) based on the difference in minutes between the Start Time and the End Time (the Duration). How much time elapses between one row's End Time and the next row's Start Time is of no concern; I'm only looking at Duration.
Current result set:
+---------+------------+------------+----------+
| Vehicle | Start Time | End Time   | Duration |
+---------+------------+------------+----------+
| 12      | 1:56:30 AM | 2:07:47 AM | 11       |
| 12      | 2:07:57 AM | 6:46:08 AM | 279      |
| 19      | 2:55:02 PM | 3:45:59 PM | 53       |
+---------+------------+------------+----------+
Desired result set:
+---------+------------+------------+----------+
| Vehicle | Start Time | End Time   | Duration |
+---------+------------+------------+----------+
| 12      | 1:56:30 AM | 6:46:08 AM | 290      |
| 19      | 2:55:02 PM | 3:45:59 PM | 53       |
+---------+------------+------------+----------+
I feel like it has to be a matter of grouping, but I'm not sure how to group based on whether or not the start and end times are less than 15 minutes apart.
How can this be accomplished?
Unless I misunderstood your question, try this:
Select Vehicle
      ,StartTime = min(StartTime)
      ,EndTime = max(EndTime)
      ,Duration = sum(Duration)
From YourTable
Group By Vehicle
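Note that this collapses every row per vehicle regardless of the gap. If the 15-minute condition needs to be honored, a gaps-and-islands approach works even on SQL Server 2008 R2, which lacks LAG. A sketch, assuming the source rows live in a table named YourTable with the columns shown above:

WITH prev AS (
    -- Attach each row's previous EndTime for the same vehicle
    SELECT t.Vehicle, t.StartTime, t.EndTime, t.Duration, pe.PrevEnd
    FROM YourTable t
    OUTER APPLY (
        SELECT TOP 1 p.EndTime AS PrevEnd
        FROM YourTable p
        WHERE p.Vehicle = t.Vehicle
          AND p.StartTime < t.StartTime
        ORDER BY p.StartTime DESC
    ) pe
), islands AS (
    -- A row starts a new group when there is no previous row or the
    -- gap is 15+ minutes; count group starts to number the islands
    SELECT prev.*,
           (SELECT COUNT(*)
            FROM prev s
            WHERE s.Vehicle = prev.Vehicle
              AND s.StartTime <= prev.StartTime
              AND (s.PrevEnd IS NULL
                   OR DATEDIFF(MINUTE, s.PrevEnd, s.StartTime) >= 15)) AS grp
    FROM prev
)
SELECT Vehicle,
       MIN(StartTime) AS StartTime,
       MAX(EndTime)   AS EndTime,
       SUM(Duration)  AS Duration
FROM islands
GROUP BY Vehicle, grp;

One caveat: DATEDIFF counts minute-boundary crossings rather than exact elapsed time, which is usually close enough for a threshold like this.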