Postgres Calculate Difference Using Window Functions

I apologize in advance if the question is too basic. Window functions are fun and challenging at the same time!
I have two Postgres tables such as below called client and order.
id | name
------------
41 | james
29 | melinda
36 | henry
...
id | date | volume | client_id
------------------------------
328 | 2018-01-03 | 16 | 41
411 | 2018-01-29 | 39 | 29
129 | 2018-01-13 | 73 | 29
542 | 2018-01-22 | 62 | 36
301 | 2018-01-17 | 38 | 41
784 | 2018-01-08 | 84 | 29
299 | 2018-01-10 | 54 | 36
300 | 2018-01-10 | 18 | 36
178 | 2018-01-30 | 37 | 36
...
a) How can I write a query to find the largest difference in order volume for each client? For example, client_id = 36 should show 37 - (54 + 18) = -35, because orders placed on the same day by the same client should count as one order.
b) How can I find the difference in volume between the two most recent orders for each client? For example, client_id = 29 should show 39 - 73 = -34

Here is a T-SQL version that may help.
It follows the formula you described: max(total volume per day) - min(total volume per day), computed per client.
SELECT X.Client_Id, MAX(X.SumV) - MIN(X.SumV) AS VolumeDiff
FROM (
    SELECT Client_Id, Date, SUM(Volume) AS SumV
    FROM Orders
    GROUP BY Client_Id, Date
) X
GROUP BY X.Client_Id
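Part b) is not covered by the query above. A minimal Postgres sketch, assuming the order table is reachable as orders (a stand-in name, since order is a reserved word and would need quoting) with the columns shown in the question, could combine lag() with DISTINCT ON:

```sql
-- Assumption: table orders(id, date, volume, client_id) as in the question.
WITH daily AS (
    -- Collapse same-day orders of one client into a single order.
    SELECT client_id, date, sum(volume) AS volume
    FROM orders
    GROUP BY client_id, date
), diffs AS (
    -- Difference between each daily total and the previous one for that client.
    SELECT client_id,
           date,
           volume - lag(volume) OVER (PARTITION BY client_id ORDER BY date) AS diff
    FROM daily
)
-- Keep only the most recent row per client.
SELECT DISTINCT ON (client_id) client_id, diff
FROM diffs
ORDER BY client_id, date DESC;
```

For the sample rows shown, client_id = 29 has daily totals 84, 73 and 39, so the latest difference is 39 - 73 = -34. A client with orders on only one day gets NULL.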

Related

How to count how many times a specific value appeared on each columns and group by range

I'm new to Postgres and I have a question:
I have a table with 100 columns. I need to count how many times the values in each row's columns fall into given ranges, so I can group them by the range they fit.
I have a table like this (100 columns):
+------+------+------+------+------+---------+--------+
| Name | PRB0 | PRB1 | PRB2 | PRB3 | ....... | PRB100 |
+------+------+------+------+------+---------+--------+
| A | 15 | 6 | 47 | 54 | ..... | 8 |
| B | 25 | 22 | 84 | 86 | ..... | 76 |
| C | 57 | 57 | 96 | 38 | ..... | 28 |
+------+------+------+------+------+---------+--------+
And need the output to be something like this
+------+---------------+----------------+----------------+----------------+-----+-----------------+--+
| Name | Count 0 to 20 | Count 21 to 40 | Count 41 to 60 | Count 61 to 70 | ... | Count 81 to 100 | |
+------+---------------+----------------+----------------+----------------+-----+-----------------+--+
| A | 5 | 46 | 87 | 34 | ... | 98 | |
| B | 5 | 2 | 34 | 56 | ... | 36 | |
| C | 7 | 17 | 56 | 78 | ... | 88 | |
+------+---------------+----------------+----------------+----------------+-----+-----------------+--+
For Name A we have:
5 times a number between 0 and 20 appeared
46 times a number between 21 and 40 appeared
87 times a number between 41 and 60 appeared
Basically I need something like the COUNTIFS function in Excel. In Excel we just need to specify the range of columns and the condition.

You could unpivot with a lateral join, then aggregate:
select
    name,
    count(*) filter(where prb between 0 and 20) cnt_00_20,
    count(*) filter(where prb between 21 and 40) cnt_21_40,
    ...,
    count(*) filter(where prb between 81 and 100) cnt_81_100
from mytable t
cross join lateral (values(t.prb0), (t.prb1), ..., (t.prb100)) p(prb)
group by name
Note, however, that this still requires you to enumerate all the columns in the values() table constructor. If you want something fully dynamic, you can use json instead. The idea is to turn each record to a json object using to_jsonb(), then to rows with jsonb_each(); you can then do conditional aggregation.
select
    name,
    count(*) filter(where prb::int between 0 and 20) cnt_00_20,
    count(*) filter(where prb::int between 21 and 40) cnt_21_40,
    ...,
    count(*) filter(where prb::int between 81 and 100) cnt_81_100
from mytable t
cross join lateral to_jsonb(t) j(js)
cross join lateral jsonb_each(j.js - 'name') r(col, prb)
group by name

Select well spread points from a big table

I'm trying to write a stored procedure for selecting X amount of well spread points in time from a big table.
I have a table points:
"Userid" integer
, "Time" timestamp with time zone
, "Value" integer
It contains hundreds of millions of records, about a million per user.
I want to select X points (let's say 50) that are well spread from time A to time B. The problem is that the points are not spread equally (if one point is at 6:00:00, the next point may come 15 seconds, 20 seconds, or 4 minutes later, for example).
Selecting all the points for one id can take up to 60 seconds (because there are about a million points).
Is there any way to select exactly the number of points I want, spread as evenly as possible, in a fast way?
Sample data:
+--------+---------------------+-------+
| UserId | Time | Value |
+--------+---------------------+-------+
1 | 1 | 2017-04-10 14:00:00 | 1 |
2 | 1 | 2017-04-10 14:00:10 | 10 |
3 | 1 | 2017-04-10 14:00:20 | 32 |
4 | 1 | 2017-04-10 14:00:35 | 80 |
5 | 1 | 2017-04-10 14:00:58 | 101 |
6 | 1 | 2017-04-10 14:01:00 | 203 |
7 | 1 | 2017-04-10 14:01:30 | 204 |
8 | 1 | 2017-04-10 14:01:40 | 205 |
9 | 1 | 2017-04-10 14:02:02 | 32 |
10 | 1 | 2017-04-10 14:02:15 | 7 |
11 | 1 | 2017-04-10 14:02:30 | 900 |
12 | 1 | 2017-04-10 14:02:45 | 22 |
13 | 1 | 2017-04-10 14:03:00 | 34 |
14 | 1 | 2017-04-10 14:03:30 | 54 |
15 | 1 | 2017-04-10 14:04:00 | 54 |
16 | 1 | 2017-04-10 14:06:00 | 60 |
17 | 1 | 2017-04-10 14:07:20 | 654 |
18 | 1 | 2017-04-10 14:07:40 | 32 |
19 | 1 | 2017-04-10 14:08:00 | 33 |
20 | 1 | 2017-04-10 14:08:12 | 32 |
21 | 1 | 2017-04-10 14:10:00 | 8 |
+--------+---------------------+-------+
I want to select 11 "best" points from the list above, for the user with Id 1,
from time 2017-04-10 14:00:00 to 2017-04-10 14:10:00.
Currently it's done on the server, after selecting all the points for the user.
I calculate the "best times" by dividing the time range evenly, which gives a list such as 14:00:00, 14:01:00, ..., 14:10:00 (11 "best times", one per point). Then, for each "best time", I look for the closest point that has not been selected yet.
The result will be points: 1, 6, 9, 13, 15, 16, 17, 18, 19, 20, 21
Edit:
I'm trying something like this:
SELECT * FROM "points"
WHERE "Userid" = 1 AND
(("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:00:00' - "Time"))
LIMIT 1)) OR
("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:01:00' - "Time"))
LIMIT 1)) OR
("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:02:00' - "Time"))
LIMIT 1)))
The problems here are that:
A) It doesn't take into account points that have already been selected.
B) Because of the ORDER BY, each additional time increases the running time of the query by ~ 1 second, and for 50 points I get back to the 1 minute mark.

There is an optimization problem behind your question that's hard to solve with just SQL.
That said, your attempt at an approximation can be implemented to use an index and perform well regardless of table size. You need this index if you don't have it already:
CREATE INDEX ON points ("Userid", "Time");
Query:
SELECT *
FROM generate_series(timestamptz '2017-04-10 14:00:00+0'
, timestamptz '2017-04-10 14:09:00+0' -- 1 min *before* end!
, interval '1 minute') grid(t)
LEFT JOIN LATERAL (
SELECT *
FROM points
WHERE "Userid" = 1
AND "Time" >= grid.t
AND "Time" < grid.t + interval '1 minute' -- same interval
ORDER BY "Time"
LIMIT 1
) t ON true;
Most importantly, the rewritten query can use the index above and will be very fast, solving problem B).
It also addresses problem A) to some extent as no point is returned more than once. If there is no row between two adjacent points in the grid, you get no row in the result. Using LEFT JOIN .. ON true keeps all grid rows and appends NULL in this case. Eliminate those NULL rows by switching to CROSS JOIN. You may get fewer result rows this way.
I only search ahead of each grid point. You might append a second LATERAL join to also search behind each grid point (just another index scan), and take the closer one of the two results (ignoring NULL). But that introduces a few problems:
If one match is behind and the next is ahead, the gap widens.
You need special treatment for the lower and / or upper bound of the outer interval.
And you need two LATERAL joins with two index scans.
You could use a recursive CTE to search 1 minute ahead of the last time actually found, but then the total number of rows returned varies even more.
It all comes down to an exact definition of what you need, and where compromises are allowed.
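As a rough illustration of the "search ahead and behind" idea mentioned above, here is a sketch under assumptions, not the query from this answer. It puts both index scans into one LATERAL subquery with UNION ALL instead of two separate LATERAL joins, and it shares the caveats listed: the same point can be returned for two grid times, and the candidates are not clipped to the outer interval.

```sql
SELECT g.t AS grid_time, p.*
FROM   generate_series(timestamptz '2017-04-10 14:00:00+0'
                     , timestamptz '2017-04-10 14:10:00+0'
                     , interval '1 minute') g(t)
CROSS  JOIN LATERAL (
   SELECT *
   FROM  (
      (  -- nearest point at or after the grid time
      SELECT * FROM points
      WHERE  "Userid" = 1 AND "Time" >= g.t
      ORDER  BY "Time"
      LIMIT  1
      )
      UNION ALL
      (  -- nearest point before the grid time
      SELECT * FROM points
      WHERE  "Userid" = 1 AND "Time" < g.t
      ORDER  BY "Time" DESC
      LIMIT  1
      )
   ) c
   -- keep whichever of the two candidates is closer to the grid time
   ORDER  BY abs(extract(epoch FROM c."Time" - g.t))
   LIMIT  1
) p;
```

Each branch is a single-row lookup that can use the ("Userid", "Time") index, so the cost should still grow with the number of grid points rather than with the table size.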
Related:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Aggregating the most recent joined records per week
MySQL/Postgres query 5 minutes interval data
Optimize GROUP BY query to retrieve latest row per user

The answer is to use generate_series('2017-04-10 14:00:00','2017-04-10 14:10:00','1 minute'::interval) and join against it for comparison.
To save others time setting up the data set:
t=# create table points(i int,"UserId" int,"Time" timestamp(0), "Value" int,b text);
CREATE TABLE
Time: 13.728 ms
t=# copy points from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
1 | 1 | 2017-04-10 14:00:00 | 1 |
2 | 1 | 2017-04-10 14:00:10 | 10 |
3 | 1 | 2017-04-10 14:00:20 | 32 |
4 | 1 | 2017-04-10 14:00:35 | 80 |
5 | 1 | 2017-04-10 14:00:58 | 101 |
6 | 1 | 2017-04-10 14:01:00 | 203 |
7 | 1 | 2017-04-10 14:01:30 | 204 |
8 | 1 | 2017-04-10 14:01:40 | 205 |
9 | 1 | 2017-04-10 14:02:02 | 32 |
10 | 1 | 2017-04-10 14:02:15 | 7 |
11 | 1 | 2017-04-10 14:02:30 | 900 |
12 | 1 | 2017-04-10 14:02:45 | 22 |
13 | 1 | 2017-04-10 14:03:00 | 34 |
14 | 1 | 2017-04-10 14:03:30 | 54 |
15 | 1 | 2017-04-10 14:04:00 | 54 |
16 | 1 | 2017-04-10 14:06:00 | 60 |
17 | 1 | 2017-04-10 14:07:20 | 654 |
18 | 1 | 2017-04-10 14:07:40 | 32 |
19 | 1 | 2017-04-10 14:08:00 | 33 |
20 | 1 | 2017-04-10 14:08:12 | 32 |
21 | 1 | 2017-04-10 14:10:00 | 8 |
\.
COPY 21
Time: 7684.259 ms
t=# alter table points rename column "UserId" to "Userid";
ALTER TABLE
Time: 1.013 ms
Frankly, I don't fully understand the request. This is how I read the description; the results differ from what the OP expects:
t=# with r as (
with g as (
select generate_series('2017-04-10 14:00:00','2017-04-10 14:10:00','1 minute'::interval) s
)
select *,abs(extract(epoch from '2017-04-10 14:02:00' - "Time"))
from g
join points on g.s = date_trunc('minute',"Time")
order by abs
limit 11
)
select i, "Time","Value",abs
from r
order by i;
i | Time | Value | abs
----+---------------------+-------+-----
4 | 2017-04-10 14:00:35 | 80 | 85
5 | 2017-04-10 14:00:58 | 101 | 62
6 | 2017-04-10 14:01:00 | 203 | 60
7 | 2017-04-10 14:01:30 | 204 | 30
8 | 2017-04-10 14:01:40 | 205 | 20
9 | 2017-04-10 14:02:02 | 32 | 2
10 | 2017-04-10 14:02:15 | 7 | 15
11 | 2017-04-10 14:02:30 | 900 | 30
12 | 2017-04-10 14:02:45 | 22 | 45
13 | 2017-04-10 14:03:00 | 34 | 60
14 | 2017-04-10 14:03:30 | 54 | 90
(11 rows)
I added the abs column to show why I thought those rows fit the request better.

query to get quantity from last recorded period in month postgresql

I've been trying to write this query for so long that I gave up and came here to ask for help. Really, I have spent many mornings and afternoons trying to formulate this single query, and I'm almost losing my mind. Anyway, I will be direct, here we go:
I have a table:
date | quantity | id
------------+----------+----
2016-06-10 | 438 | 27
2016-06-17 | 449 | 28
2016-06-24 | 458 | 29
2016-07-01 | 466 | 30
2016-07-08 | 468 | 31
2016-07-15 | 468 | 32
2016-07-22 | 473 | 33
2016-07-29 | 473 | 34
2016-08-05 | 475 | 35
2016-08-12 | 479 | 36
2016-08-19 | 488 | 37
2016-08-26 | 498 | 38
2016-09-02 | 519 | 39
I need to get the quantity from the last recorded day inside each month. That is, from the table above, I need the following rows:
date | quantity | id
------------+----------+----
2016-06-24 | 458 | 29
2016-07-29 | 473 | 34
2016-08-26 | 498 | 38
and the final table that I really need is this:
month | quantity
------------+----------
06 | 458
07 | 473
08 | 498
I have tried GROUP BY, HAVING, MAX, JOINs, UNIONs, anything you can imagine, but I just can't get through it. Any ideas to make this happen?
Thanks

Postgres has a great feature called distinct on:
select distinct on (date_trunc('month', date)) t.*
from t
order by date_trunc('month', date), date desc;
distinct on returns exactly one row for each value of the keys in parentheses -- in this case, one per month. Which row? The first one according to the order by clause. Because the rows within each month are sorted by date descending, this returns the row with the latest date in each month.
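To get from those rows to the month | quantity table requested in the question, one possible sketch wraps the distinct on query and formats the month with to_char(). The table name t is the placeholder used above, and last_per_month is just an illustrative alias.

```sql
SELECT to_char(date, 'MM') AS month, quantity
FROM (
    -- one row per calendar month: the one with the latest date
    SELECT DISTINCT ON (date_trunc('month', date)) date, quantity
    FROM t
    ORDER BY date_trunc('month', date), date DESC
) last_per_month
ORDER BY date;
```

With the sample data this also yields a row for September (09 | 519), since 2016-09-02 is the last recorded day of that month so far; add a filter on date if the current, incomplete month should be left out. Note that 'MM' alone does not distinguish years, so data spanning several years would need something like 'YYYY-MM'.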

Try it with ROW_NUMBER():
SELECT date_trunc('month', date),
quantity
FROM (
SELECT date, quantity, id,
ROW_NUMBER() OVER (PARTITION BY date_trunc('month', date)
ORDER BY date DESC) as rn
FROM Table1
) T
WHERE T.rn = 1;
OUTPUT
| date_trunc | quantity |
|-----------------------------|----------|
| June, 01 2016 00:00:00 | 458 |
| July, 01 2016 00:00:00 | 473 |
| August, 01 2016 00:00:00 | 498 |
| September, 01 2016 00:00:00 | 519 |

Need SQL select query to find duplicates and return min and max rows

I have the sql query as below.
(SELECT
height
,width
,ROUND(height / 0.0254, 0) AS "H1"
,FLOOR((width * 2) / 0.0254) AS "W1"
FROM iclr_max_dim_results mdim
,iclr_request req
WHERE mdim.request_oid = req.oid
AND req.request_number = 102017
AND req.version_number = 52731
GROUP BY height
ORDER BY height DESC
) A
)
Below is the result of the query.
height | width | H1 | W1
-----------------------------------------
6.0223 | 0.1003 | 237 | 7
6.0198 | 0.2435 | 237 | 19
6.0185 | 0.3151 | 237 | 24
5.9944 | 1.6759 | 236 | 131
5.9931 | 1.6779 | 236 | 132
5.9576 | 1.7016 | 235 | 133
5.9563 | 1.7024 | 235 | 134
If we look at the last two columns, H1 and W1, in the first three rows, the value 237 repeats with 7, 19 and 24 respectively. For each H1 value, I need to return only the rows with the min and max W1.
In this case the result should be as below. We eliminated 237 | 19 since 7 and 24 are the min and max for 237.
6.0223 | 0.1003 | 237 | 7
6.0185 | 0.3151 | 237 | 24
5.9944 | 1.6759 | 236 | 131
5.9931 | 1.6779 | 236 | 132
5.9576 | 1.7016 | 235 | 133
5.9563 | 1.7024 | 235 | 134
How should I edit the SQL query to achieve this?
Thank you very much.

The query can look like this:
SELECT a.*
FROM (...) a
JOIN (
SELECT H1, MIN(W1) as w1_min, MAX(W1) as w1_max
FROM (...) c
GROUP BY H1
) b ON b.H1 = a.H1 AND (b.w1_min = a.W1 OR b.w1_max = a.W1)
Replace ... with your original query, or create a VIEW from your original query and replace (...) with the view's name.

Order By column 2, group by column 1

Maybe the title is not clear enough, but I didn't know a better way to say it.
The thing is that I've got a table called Partij which has an idPartij and a Moederpartij. The column Moederpartij points back to idPartij (so we can create a Mother -> Child relation).
This is the query I have so far:
SELECT
P.idPartij,
P.Partijnaam,
P.Gewicht,
PER.Perceel,
P.Moederpartij
FROM Partij AS P
LEFT OUTER JOIN Perceel AS PER ON P.idPerceel = PER.idPerceel
WHERE P.Actief = 1
ORDER BY
P.Moederpartij ASC,
P.Partijnaam ASC
Which results in the following output:
360 | 34 Avarna 13-1V | | 0
280 | 36 Agata 13-1V | | 0
160 | 37 Excellency 13-1V | | 0
140 | 38 Erika 13-1V | | 0
300 | 39 Rosagold 13-1V | | 0
240 | 40 Fontane 13-2V | | 0
200 | 41 Fontane 13-1V | | 0
220 | 42 Fontane 13-3V | | 0
180 | 45 Spunta 13-3V | | 0
260 | 46 Arnova 13-1V | | 0
400 | 43 Spunta 13-2V | | 180
380 | 44 Spunta 13-1V | | 180
320 | 35 Altus 13-1V | | 260
340 | 47 Arizona 13-1V | | 260
But I'm trying to get the following output:
360 | 34 Avarna 13-1V | | 0
280 | 36 Agata 13-1V | | 0
160 | 37 Excellency 13-1V | | 0
140 | 38 Erika 13-1V | | 0
300 | 39 Rosagold 13-1V | | 0
240 | 40 Fontane 13-2V | | 0
200 | 41 Fontane 13-1V | | 0
220 | 42 Fontane 13-3V | | 0
180 | 45 Spunta 13-3V | | 0
400 | 43 Spunta 13-2V | | 180
380 | 44 Spunta 13-1V | | 180
260 | 46 Arnova 13-1V | | 0
320 | 35 Altus 13-1V | | 260
340 | 47 Arizona 13-1V | | 260
So that you first get the mother (Moederpartij) and after that all her children, and so on...
Is this even possible in a single query, or should I loop through the records in PHP and get all the children for each record?
EDIT 1
The DB this is running on is MariaDB.

Try this one. I added a column to the select list and sorted on it. I don't have MariaDB at hand, so I have not tested the query, but I believe it should solve your problem.
SELECT
P.idPartij,
P.Partijnaam,
P.Gewicht,
PER.Perceel,
P.Moederpartij,
case when P.Moederpartij =0 then
Concat(P.idPartij ,"-", "0","-", P.idPartij )
else
Concat(P.Moederpartij ,"-", "9","-", P.idPartij )
end sorder
FROM Partij AS P
LEFT OUTER JOIN Perceel AS PER ON P.idPerceel = PER.idPerceel
WHERE P.Actief = 1
ORDER BY
sorder Asc
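The concatenated sort key above is compared as a string, so ids with a different number of digits would not sort numerically, and the mothers come out in idPartij order rather than Partijnaam order. An untested alternative sketch, assuming top-level rows have Moederpartij = 0, no row has idPartij = 0, the hierarchy is only one level deep, and mother names are unique, joins each row to its mother and sorts on the mother's name (the alias M is introduced here for illustration):

```sql
SELECT
    P.idPartij,
    P.Partijnaam,
    P.Gewicht,
    PER.Perceel,
    P.Moederpartij
FROM Partij AS P
LEFT OUTER JOIN Perceel AS PER ON P.idPerceel = PER.idPerceel
-- M is the mother row; it is NULL for top-level rows (Moederpartij = 0)
LEFT OUTER JOIN Partij AS M ON M.idPartij = P.Moederpartij
WHERE P.Actief = 1
ORDER BY
    COALESCE(M.Partijnaam, P.Partijnaam) ASC,  -- keep each family together under the mother's name
    (P.Moederpartij <> 0) ASC,                 -- mother (0) before her children (1)
    P.Partijnaam ASC;                          -- children by their own name
```

For the rows shown in the question this puts 45 Spunta 13-3V first, followed by 43 Spunta 13-2V and 44 Spunta 13-1V, and then 46 Arnova 13-1V with its two children, which matches the desired output.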

I am not sure if I understand your question, but is "Avarna 13-1V" for example the full party name, and the mother as you call it is "Avarna"?
I.e. you want all Spuntas, Fontanes etc. together but they also contain the 13-1V, 13-2V etc. and that's what throws it off?
You could order by just the first word first (locate the space separator & take the left of that), then by the mother party field. E.g. something like this:
ORDER BY LEFT(Partijnaam, LOCATE(' ',Partijnaam) - 1), Moederpartij
But if that is the case, why not just separate the text name and the 13-1V etc. into different columns?
(MySQL)
