How to get columns when using buckets (width_bucket) - sql

I would like to know which rows were placed into each bucket.
SELECT
    width_bucket(s.score, sl.mins, sl.maxs, 9) as buckets,
    COUNT(*)
FROM scores s
CROSS JOIN scores_limits sl
GROUP BY 1
ORDER BY 1;
What I actually get:
 buckets | count
---------+-------
       1 |   182
       2 |    37
       3 |    46
       4 |    15
       5 |    29
       7 |    18
       8 |    22
      10 |    11
         |    20
What I would like to be able to query:
SELECT buckets FROM buckets_table [...] WHERE scores.id = 1;
How can I get, for example, the id column of the scores table?

I believe you can collect the ids into an array with array_agg. If I recreate your case with:
create table test (id serial, score int);
insert into test(score) values (10),(9),(5),(4),(10),(2),(5),(7),(8),(10);
The data is
 id | score
----+-------
  1 |    10
  2 |     9
  3 |     5
  4 |     4
  5 |    10
  6 |     2
  7 |     5
  8 |     7
  9 |     8
 10 |    10
(10 rows)
Using the following query, aggregating the ids with array_agg:
SELECT
    width_bucket(score, 0, 10, 11) as buckets,
    COUNT(*) nr_ids,
    array_agg(id) agg_ids
FROM test s
GROUP BY 1
ORDER BY 1;
You get
 buckets | nr_ids | agg_ids
---------+--------+----------
       3 |      1 | {6}
       5 |      1 | {4}
       6 |      2 | {3,7}
       8 |      1 | {8}
       9 |      1 | {9}
      10 |      1 | {2}
      12 |      3 | {1,5,10}
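If you only need the bucket for particular rows rather than per-bucket arrays, you can also compute width_bucket() per row and filter on any column. A minimal sketch, reusing the test table above:

-- Per-row bucket assignment; WHERE can then filter on id directly
SELECT id, score, width_bucket(score, 0, 10, 11) AS bucket
FROM test
WHERE id = 1;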

Related

Select all the records in the first table that match each of the records in the second

I'm working with an Access database and have two tables:
Table1:
 ID_1 | Number | Some other data
------+--------+-----------------
    1 |      1 | Data
    2 |      2 | Data
    3 |      3 | Data
    4 |      4 | Data
    5 |      3 | Data
    6 |      1 | Data
    7 |      2 | Data
    8 |      3 | Data
    9 |      1 | Data
   10 |      1 | Data
   11 |      2 | Data
   12 |      3 | Data
   13 |      4 | Data
   14 |      1 | Data
   15 |      2 | Data
   16 |      3 | Data
   17 |      4 | Data
   18 |      3 | Data
   19 |      3 | Data

Table2:
 ID_2 | Number | Some other data
------+--------+-----------------
    1 |      3 | Data
    2 |      1 | Data
    3 |      2 | Data
    4 |      3 | Data
    5 |      2 | Data
As you can see, both tables contain duplicate Number values. I need a query that selects all the records in the first table that match each of the records in the second; they are related by the Number field. It's also necessary that these records aren't repeated (that is, the query shouldn't return duplicate rows). For the given example I should get this result:
 ID | ID_1 | Number | Some other data
----+------+--------+-----------------
  1 |    3 |      3 | Data
  2 |    5 |      3 | Data
  3 |    8 |      3 | Data
  4 |   12 |      3 | Data
  5 |   16 |      3 | Data
  6 |   18 |      3 | Data
  7 |   19 |      3 | Data
  8 |    1 |      1 | Data
  9 |    6 |      1 | Data
 10 |    9 |      1 | Data
 11 |   10 |      1 | Data
 12 |   14 |      1 | Data
 13 |    2 |      2 | Data
 14 |    7 |      2 | Data
 15 |   11 |      2 | Data
 16 |   15 |      2 | Data
I was thinking that maybe I could use a JOIN, but I don't know how; I also tried WHERE, but didn't find a way to make it work. Could you please help me with that?
I don't see where you're generating your output ID field from, or where you're picking your Data field from, so here's my best guess:
SELECT Table1.ID_1, Table1.Number, Table1.[Some other data]
FROM Table1
WHERE (Table1.Number In (SELECT Number From Table2))
ORDER BY Table1.Number, Table1.ID_1;
A MySQL-based approach looks like this.
DB data structure:
create table tbl1(ID_1 serial, Number int);
create table tbl2(ID_2 serial, Number int);
insert into tbl1(Number) values (1),(2),(3),(4),(3),(1),(2),(3),(1),(1),(2),(3),(4),(1),(2),(3),(4),(3),(3);
insert into tbl2(Number) values (3),(1),(2),(3),(2);
The query below uses the CTE s to remove duplicates from tbl2. The window function count(tbl1.Number) OVER (PARTITION BY tbl1.Number) orders the result by the count of matched numbers, and the @rownum user variable numbers the output rows (CTEs and window functions require MySQL 8.0+):
with s as (select distinct Number from tbl2),
     f as (select ID_1, tbl1.Number from tbl1 left join s on
           (tbl1.Number = s.Number) where s.Number is not null order by
           count(tbl1.Number) OVER (PARTITION BY tbl1.Number) desc)
select @rownum := @rownum + 1 AS ID, ID_1, Number
from f, (SELECT @rownum := 0) r;
Results:
+------+------+--------+
| ID | ID_1 | Number |
+------+------+--------+
| 1 | 3 | 3 |
| 2 | 5 | 3 |
| 3 | 8 | 3 |
| 4 | 12 | 3 |
| 5 | 16 | 3 |
| 6 | 18 | 3 |
| 7 | 19 | 3 |
| 8 | 1 | 1 |
| 9 | 6 | 1 |
| 10 | 9 | 1 |
| 11 | 10 | 1 |
| 12 | 14 | 1 |
| 13 | 2 | 2 |
| 14 | 7 | 2 |
| 15 | 11 | 2 |
| 16 | 15 | 2 |
+------+------+--------+
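On MySQL 8.0+ the numbering can also be done with ROW_NUMBER() instead of a user variable, which avoids assigning @variables inside a SELECT (a deprecated practice in 8.0). A sketch under the same tbl1/tbl2 schema as above:

-- cnt reproduces the "most matches first" ordering; ROW_NUMBER() builds the ID
WITH s AS (SELECT DISTINCT Number FROM tbl2),
     f AS (SELECT tbl1.ID_1, tbl1.Number,
                  COUNT(*) OVER (PARTITION BY tbl1.Number) AS cnt
             FROM tbl1
             JOIN s ON tbl1.Number = s.Number)
SELECT ROW_NUMBER() OVER (ORDER BY cnt DESC, ID_1) AS ID,
       ID_1, Number
FROM f;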

RANK data by value in the column

I'd like to divide the data into separate groups (chunks) based on the value in a column. If the value increases above a certain threshold, the value in the "group" column should increase by 1.
This would be easy to achieve in MySQL with CASE WHEN @val > 30 THEN @row_no + 1 ELSE @row_no END, however I am using Amazon Redshift, where this is not allowed.
Sample fiddle: http://sqlfiddle.com/#!15/00b3aa/6
Suggested output:
 ID | Value | Group
----+-------+-------
  1 |    11 |     1
  2 |    11 |     1
  3 |    22 |     1
  4 |    11 |     1
  5 |    35 |     2
  6 |    11 |     2
  7 |    11 |     2
  8 |    11 |     2
  9 |    66 |     3
 10 |    11 |     3
A cumulative sum should do what you want:
SELECT *,
       sum((val >= 30)::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM mydata
ORDER BY id;
 id | val | sum
----+-----+-----
  1 |  11 |   0
  2 |  11 |   0
  3 |  22 |   0
  4 |  11 |   0
  5 |  35 |   1
  6 |  11 |   1
  7 |  11 |   1
  8 |  11 |   1
  9 |  66 |   2
 10 |  11 |   2
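If the group numbers should start at 1, as in the suggested output, just add 1 to the running sum. A minimal sketch, assuming the fiddle's mydata(id, val) columns:

-- The +1 shifts the 0-based running sum to the 1-based "Group" numbering
SELECT id, val,
       1 + sum((val >= 30)::integer) OVER (ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM mydata
ORDER BY id;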

Postgres Query Based on Previous and Next Rows

I'm trying to solve a bus routing problem in PostgreSQL, which requires visibility of previous and next rows. Here is my solution.
Step 1) Have one edges table which represents all the edges (the source and target columns represent vertices, i.e. bus stops):
postgres=# select id, source, target, cost from busedges;
 id | source | target | cost
----+--------+--------+------
  1 |      1 |      2 |    1
  2 |      2 |      3 |    1
  3 |      3 |      4 |    1
  4 |      4 |      5 |    1
  5 |      1 |      7 |    1
  6 |      7 |      8 |    1
  7 |      1 |      6 |    1
  8 |      6 |      8 |    1
  9 |      9 |     10 |    1
 10 |     10 |     11 |    1
 11 |     11 |     12 |    1
 12 |     12 |     13 |    1
 13 |      9 |     15 |    1
 14 |     15 |     16 |    1
 15 |      9 |     14 |    1
 16 |     14 |     16 |    1
Step 2) Have a table which represents bus details like from time, to time, edge, etc.
NOTE: I have used an integer format for the "from" and "to" columns for faster queries, but I can replace it with a better format if one is available.
postgres=# select id, "busedgeId", "busId", "from", "to" from busedgetimes;
 id | busedgeId | busId | from  | to
----+-----------+-------+-------+-------
 18 |         1 |     1 | 33000 | 33300
 19 |         2 |     1 | 33300 | 33600
 20 |         3 |     2 | 33900 | 34200
 21 |         4 |     2 | 34200 | 34800
 22 |         1 |     3 | 36000 | 36300
 23 |         2 |     3 | 36600 | 37200
 24 |         3 |     4 | 38400 | 38700
 25 |         4 |     4 | 38700 | 39540
Step 3) Use Dijkstra's algorithm to find the shortest path.
Step 4) Get the upcoming buses from the busedgetimes table, earliest first, for the path found by Dijkstra's algorithm.
Problem: I am finding it difficult to write the query for step 4.
For example: if I get the path as edges 2, 3, 4 for travelling from source vertex 2 to target vertex 5 in the above records, getting the first bus for the first edge is not hard, as I can simply query with from < 'expected departure' order by from desc. But for the second edge, the from condition depends on the to time of the first result row. The query also needs a filter on the edge ids.
How can I achieve this in a single query?
I am not sure if I understood your problem correctly, but getting values from other rows can be done with window functions (https://www.postgresql.org/docs/current/static/tutorial-window.html):
demo: db<>fiddle
SELECT
    id,
    lag("to") OVER (ORDER BY id) as prev_to,
    "from",
    "to",
    lead("from") OVER (ORDER BY id) as next_from
FROM bustimes;
The lag function moves the value of the previous row into the current one. The lead function does the same with the next row. So you are able to calculate a difference between last arrival and current departure or something like that.
Result:
 id | prev_to | from  | to    | next_from
----+---------+-------+-------+-----------
 18 |         | 33000 | 33300 | 33300
 19 |   33300 | 33300 | 33600 | 33900
 20 |   33600 | 33900 | 34200 | 34200
 21 |   34200 | 34200 | 34800 | 36000
 22 |   34800 | 36000 | 36300 |
Please notice that "from" and "to" are reserved words in PostgreSQL; it would be better to choose other names.
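For instance, the waiting time between arriving on one edge and departing on the next can be computed directly from those window values. A minimal sketch against the same bustimes table:

-- Gap between the previous arrival ("to") and the current departure ("from")
SELECT id,
       "from" - lag("to") OVER (ORDER BY id) AS wait_time
FROM bustimes;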

select only tuples where second column always has same value

I have a table similar to this one:
 ID | CountryID
----+-----------
  1 |        22
  1 |        22
  2 |        19
  3 |         0
  3 |        14
  3 |        18
  3 |        21
  3 |        22
  3 |        23
  4 |        19
  5 |         9
  5 |         9
  6 |        14
and I want to group by the first ID column but select only the rows where the CountryID has the same value throughout an ID. The resulting table should look like:
 ID | CountryID
----+-----------
  1 |        22
  2 |        19
  4 |        19
  5 |         9
  6 |        14
Any ideas?
I think the following query should work:
SELECT ID, MAX(CountryID)
FROM Table1
GROUP BY ID
HAVING MIN(CountryID) = MAX(CountryID)
Alternatively, count the distinct values per ID:
SELECT ID, count(distinct CountryID)
FROM Table1
GROUP BY ID
HAVING count(distinct CountryID) = 1
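A quick way to sanity-check both variants, assuming the table is named Table1 and filled with the sample rows from the question:

-- Hypothetical setup reproducing the question's data
CREATE TABLE Table1 (ID int, CountryID int);
INSERT INTO Table1 (ID, CountryID) VALUES
  (1,22),(1,22),(2,19),(3,0),(3,14),(3,18),(3,21),(3,22),(3,23),
  (4,19),(5,9),(5,9),(6,14);

-- Keeps only IDs whose CountryID never varies: 1, 2, 4, 5, 6
SELECT ID, MAX(CountryID) AS CountryID
FROM Table1
GROUP BY ID
HAVING MIN(CountryID) = MAX(CountryID);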

Postgres width_bucket() not assigning values to buckets correctly

In PostgreSQL 9.5.3 I can't get width_bucket() to work as expected; it appears to be assigning values to the wrong buckets.
Dataset:
1
2
4
32
43
82
104
143
232
295
422
477
Expected output (bucket ranges and zero-count rows added to help analysis):
 bucket | bucketmin | bucketmax | Expect | Actual
--------+-----------+-----------+--------+--------
      1 |         1 |      48.6 |      5 |      5
      2 |      48.6 |      96.2 |      1 |      2
      3 |      96.2 |     143.8 |      2 |      1
      4 |     143.8 |     191.4 |      0 |      0
      5 |     191.4 |       239 |      1 |      1
      6 |       239 |     286.6 |      0 |      1
      7 |     286.6 |     334.2 |      1 |      0
      8 |     334.2 |     381.8 |      0 |      1
      9 |     381.8 |     429.4 |      1 |      0
     10 |     429.4 |       477 |      1 |      1
Actual output:
 wb | count
----+-------
  1 |     5
  2 |     2
  3 |     1
  5 |     1
  6 |     1
  8 |     1
 10 |     1
Code to generate actual output:
create temp table metrics (val int);
insert into metrics (val) values(1),(2),(4),(32),(43),(82),(104),(143),(232),(295),(422),(477);
with metric_stats as (
    select
        cast(min(val) as float) as minV,
        cast(max(val) as float) as maxV
    from metrics m
),
hist as (
    select
        width_bucket(val, s.minV, s.maxV, 9) wb,
        count(*)
    from metrics m, metric_stats s
    group by 1 order by 1
)
select * from hist;
Your calculations appear to be off. The following query:
with metric_stats as (
    select cast(min(val) as float) as minV,
           cast(max(val) as float) as maxV
    from metrics m
)
select g.n,
       s.minV + ((s.maxV - s.minV) / 9) * (g.n - 1) as bucket_start,
       s.minV + ((s.maxV - s.minV) / 9) * g.n as bucket_end
from generate_series(1, 9) g(n) cross join metric_stats s
order by g.n;
Yields the following bins:
 n |   bucket_start   |    bucket_end
---+------------------+------------------
 1 |                1 | 53.8888888888889
 2 | 53.8888888888889 | 106.777777777778
 3 | 106.777777777778 | 159.666666666667
 4 | 159.666666666667 | 212.555555555556
 5 | 212.555555555556 | 265.444444444444
 6 | 265.444444444444 | 318.333333333333
 7 | 318.333333333333 | 371.222222222222
 8 | 371.222222222222 | 424.111111111111
 9 | 424.111111111111 | 477
I think you intend for the "9" to be a "10", if you want 10 buckets.
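A sketch of the corrected histogram query follows. One caveat: width_bucket() places values equal to the upper bound into the overflow bucket (count + 1), so the maximum value 477 would land in bucket 11; least() clamps it back into bucket 10:

with metric_stats as (
    select cast(min(val) as float) as minV,
           cast(max(val) as float) as maxV
    from metrics m
)
select least(width_bucket(val, s.minV, s.maxV, 10), 10) wb,  -- clamp the max value into the last bucket
       count(*)
from metrics m, metric_stats s
group by 1 order by 1;

With the sample data this should match the Expect column above: counts 5, 1, 2, 1, 1, 1, 1 in buckets 1, 2, 3, 5, 7, 9, 10.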