Finding a sequence starting from a particular number in BigQuery - sql

How can we achieve the same functionality as the SEQUENCE object provided in Netezza?
Below is a link demonstrating the functionality I would like to achieve in BigQuery:
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_create_sequence.html
I have reviewed RANK(), but it does not fully solve my problem. Any leads would be appreciated.

In BigQuery Standard SQL you can find two functions that can help you here:
GENERATE_ARRAY(start_expression, end_expression[, step_expression])
and
GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])
For example, the code below
#standardSQL
SELECT sequence
FROM UNNEST(GENERATE_ARRAY(1, 10, 1)) AS sequence
produces this result:
sequence
1
2
3
4
5
6
7
8
9
10
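GENERATE_ARRAY is BigQuery-specific. If you need the same kind of sequence, starting from an arbitrary number with an arbitrary step, in an engine that lacks it, a recursive CTE is the portable equivalent. A minimal sketch, run against SQLite from Python purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Portable stand-in for GENERATE_ARRAY(100, 140, 10):
# start at 100 and keep adding the step while we stay within the bound.
rows = conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 100
        UNION ALL
        SELECT n + 10 FROM seq WHERE n + 10 <= 140
    )
    SELECT n FROM seq
""").fetchall()

print([r[0] for r in rows])  # [100, 110, 120, 130, 140]
```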

Select latest values in timeseries table [duplicate]

I have a timeseries table in Postgres that collects events of various types, with their values and a timestamp. In stylized form it looks like this:
| evt_type | evt_val_01 | evt_val_02 | evt_time |
|----------|------------|------------|----------|
| 1        | 0.5        | 10         | 1        |
| 2        | 0.7        | 12         | 1        |
| 3        | 0.8        | 13         | 1        |
| 2        | 0.1        | 21         | 2        |
| 2        | 0.3        | 98         | 3        |
| 3        | 0.4        | 76         | 3        |
| 2        | 0.2        | 3          | 4        |
I'd now like to create a SELECT query that returns the latest (by timestamp in evt_time) values per event type (evt_type), i.e., the query should return:
| evt_type | evt_val_01 | evt_val_02 | evt_time |
|----------|------------|------------|----------|
| 1        | 0.5        | 10         | 1        |
| 2        | 0.2        | 3          | 4        |
| 3        | 0.4        | 76         | 3        |
In QuestDB, for instance, this is easy to achieve with the LATEST ON clause. However, I am struggling to find an efficient Postgres approach that makes full use of an index on evt_time and evt_type. The trivial approach using
select evt_type, max(evt_time) from events group by evt_type
in a sub-query is way too slow, as the real-life table has tens of millions of rows and about a thousand event types, plus about a dozen value columns. I have tried various forms of last_value() and LATERAL joins, but did not manage to get it right so far.
Any suggestions on how to structure the SELECT query would be greatly appreciated. PS: I'm on Postgres 15.1.
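For reference, the standard window-function pattern for latest-row-per-group (one of the approaches hinted at above; whether Postgres can satisfy it from an index on (evt_type, evt_time) depends on the planner) can be sketched as follows, using SQLite from Python only so the example is self-contained and checkable against the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (evt_type INT, evt_val_01 REAL,
                         evt_val_02 INT, evt_time INT);
    INSERT INTO events VALUES
        (1, 0.5, 10, 1), (2, 0.7, 12, 1), (3, 0.8, 13, 1),
        (2, 0.1, 21, 2), (2, 0.3, 98, 3), (3, 0.4, 76, 3),
        (2, 0.2, 3, 4);
""")

# Rank rows within each evt_type by evt_time descending; keep rank 1,
# i.e. the latest row per event type.
rows = conn.execute("""
    SELECT evt_type, evt_val_01, evt_val_02, evt_time
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY evt_type
                      ORDER BY evt_time DESC) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY evt_type
""").fetchall()

print(rows)  # [(1, 0.5, 10, 1), (2, 0.2, 3, 4), (3, 0.4, 76, 3)]
```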

InfluxDB v1.8: subquery using MAX selector

I'm using InfluxDB 1.8 and trying to make a little more complex query than Influx was made for.
I want to retrieve all data that refers to the last month stored, based on tag and field values that my script stores (not the default "time" field that Influx creates). Say we have this infos measurement:
| time                | field_month | field_week | tag_month | tag_week | some_data |
|---------------------|-------------|------------|-----------|----------|-----------|
| 1631668119209113500 | 8           | 1          | 8         | 1        | random    |
| 1631668119209113500 | 8           | 2          | 8         | 2        | random    |
| 1631668119209113500 | 8           | 3          | 8         | 3        | random    |
| 1631668119209113500 | 9           | 1          | 9         | 1        | random    |
| 1631668119209113500 | 9           | 1          | 9         | 1        | random    |
| 1631668119209113500 | 9           | 2          | 9         | 2        | random    |
Here 8 refers to August, 9 to September, and some_data is stored in a given week of that month.
I can use the MAX selector on field_month to get the last stored month of the year (I can't use the Flux date package because I'm using v1.8). Further, I want the data grouped by tag_month and tag_week so I can COUNT how many times some_data was stored in each week of the month; that's why the same data is repeated in the field and tag keys. Something like this:
SELECT COUNT(field_month) FROM infos WHERE field_month = 9 GROUP BY tag_month, tag_week
Replacing 9 with the MAX selector:
SELECT COUNT(field_month) FROM infos WHERE field_month = (SELECT MAX(field_month) FROM infos) GROUP BY tag_month, tag_week
The first query works, but the second does not.
Am I doing something wrong? Is there any other possibility to make this work in v1.8?
NOTE: I know Influx wasn't supposed to be used like that. I've tried and managed this easily with PostgreSQL, using an adapted form of the second query above. But while we straighten things up to use Postgres, we have to use InfluxDB v1.8.
In PostgreSQL you can try:
SELECT COUNT(field_month) FROM infos
WHERE field_month = (SELECT field_month FROM infos ORDER BY field_month DESC LIMIT 1)
GROUP BY tag_month, tag_week;
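The same pattern, reproduced on the sample data from the question (here in SQLite via Python, with the measurement simplified to a plain table, so the counts can be checked):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE infos
    (time INTEGER, field_month INTEGER, field_week INTEGER,
     tag_month INTEGER, tag_week INTEGER, some_data TEXT)""")
t = 1631668119209113500
conn.executemany("INSERT INTO infos VALUES (?, ?, ?, ?, ?, ?)", [
    (t, 8, 1, 8, 1, "random"),
    (t, 8, 2, 8, 2, "random"),
    (t, 8, 3, 8, 3, "random"),
    (t, 9, 1, 9, 1, "random"),
    (t, 9, 1, 9, 1, "random"),
    (t, 9, 2, 9, 2, "random"),
])

# Keep only rows from the latest stored month, then count per week.
rows = conn.execute("""
    SELECT tag_month, tag_week, COUNT(field_month)
    FROM infos
    WHERE field_month = (SELECT field_month FROM infos
                         ORDER BY field_month DESC LIMIT 1)
    GROUP BY tag_month, tag_week
    ORDER BY tag_month, tag_week
""").fetchall()

print(rows)  # [(9, 1, 2), (9, 2, 1)]
```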

CONNECT BY without PRIOR

I came across a query to create a dummy table like this
CREATE TABLE destination AS
SELECT level AS id,
CASE
WHEN MOD(level, 2) = 0 THEN 10
ELSE 20
END AS status,
'Description of level ' || level AS description
FROM dual
CONNECT BY level <= 10;
SELECT * FROM destination;
1 20 Description of level 1
2 10 Description of level 2
3 20 Description of level 3
4 10 Description of level 4
5 20 Description of level 5
6 10 Description of level 6
7 20 Description of level 7
8 10 Description of level 8
9 20 Description of level 9
10 10 Description of level 10
10 rows selected.
Could you share some insight into how this works? First, the absence of PRIOR puzzles me. Second, I don't get how the tree is constructed. From the level values it looks like all rows branch out from the same root.
This gimmick was noticed by a DB professional, Mikito Harakiri, and shared on AskTom. It has been adopted in the Oracle community, although it is undocumented (it actually goes against the documentation), and its use is somewhat dangerous in that Oracle may at some point make it no longer work. (Although with its already massive use, it would be insane for Oracle to take it back.)
The rows are indeed branching from the same root, the single row of dual. You can use any other table that has EXACTLY ONE row for the same trick. If you start with two rows (or you use the trick on your own table, with many rows), you will quickly run into trouble. There are ways around that, which you will pick up over time. You may be interested in following the Oracle forum at OTN, where people use this trick all the time.
Here is an article that discusses this trick: http://www.sqlsnippets.com/en/topic-11821.html
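For completeness, the documented and portable way to generate rows like this is a recursive CTE (ANSI SQL; Oracle supports it from 11gR2 onwards). A sketch of the equivalent of the CONNECT BY LEVEL query, run in SQLite from Python so it is easy to try:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE equivalent of:
#   SELECT ... FROM dual CONNECT BY level <= 10
rows = conn.execute("""
    WITH RECURSIVE levels(level) AS (
        SELECT 1
        UNION ALL
        SELECT level + 1 FROM levels WHERE level < 10
    )
    SELECT level,
           CASE WHEN level % 2 = 0 THEN 10 ELSE 20 END AS status,
           'Description of level ' || level AS description
    FROM levels
""").fetchall()

print(rows[0])    # (1, 20, 'Description of level 1')
print(len(rows))  # 10
```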

SQL group by steps

I'm using SQL in SAS.
I'm doing a SQL query with a GROUP BY clause on a continuous variable (made discrete), and I'd like it to aggregate more. I'm not sure this is clear, so here is an example.
Here is my query :
SELECT CEIL(travel_time) AS time_in_mn, MEAN(foo) AS mean_foo
FROM my_table
GROUP BY CEIL(travel_time)
This will give me the mean value of foo for each distinct value of travel_time. Thanks to the CEIL() function, it groups by whole minutes rather than fractional minutes (travel_time can take values such as 14.7 minutes). But I'd like to be able to group by intervals of 5 minutes, for instance, so that I get something like this:
time_in_mn mean_foo
5 4.5
10 3.1
15 17.6
20 12
(Of course, mean_foo should be computed over the whole interval, so for time_in_mn = 5, mean_foo should be the mean of foo where travel_time is in the interval (0, 5].)
How can I achieve that ?
(Sorry if the answer can be found easily, the only search term I could think of is group by step, which gives me a lot of "step by step tutorials" about SQL...)
A common idiom of "ceiling to steps" (or rounding, or flooring, for that matter) is to divide by the step, ceil (or round, or floor, of course) and then multiply by it again. This way, if we take, for example, 12.4:
Divide: 12.4 / 5 = 2.48
Ceil: 2.48 becomes 3
Multiply: 3 * 5 = 15
And in SQL form:
SELECT 5 * CEIL(travel_time / 5.0) AS time_in_mn,
MEAN(foo) AS mean_foo
FROM my_table
GROUP BY 5 * CEIL(travel_time / 5.0)
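The divide-ceil-multiply step is easy to check outside SQL; a small Python sketch of the same arithmetic:

```python
import math

def ceil_to_step(x, step=5):
    """Round x up to the next multiple of step: divide, ceil, multiply."""
    return step * math.ceil(x / step)

print(ceil_to_step(12.4))  # 15
print(ceil_to_step(14.7))  # 15
print(ceil_to_step(5.0))   # 5  (exact multiples stay put)
```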

PostgreSQL - divide query result into quintiles

I have a PostgreSQL SELECT result set. I need to divide the rows into quintiles and attach the quintile value to each row.
Is it possible to do this in the SELECT itself, without doing it in the application? I would like to avoid selecting the data into the application and doing the ranking outside of the PostgreSQL server.
Data example - the first column is the value, the second column is the quintile:
4859 - 5
4569 - 5
4125 - 4
3986 - 4
3852 - 3
3562 - 3
3452 - 2
3269 - 2
3168 - 1
3058 - 1
Thank you.
There is a window function called ntile that produces exactly this: you give it a parameter specifying how many "tiles" the output should be divided into (5 in this case).
For example:
select t.id, ntile(5) over (order by t.id)
from t
See the PostgreSQL window function tutorial for an introduction to window functions, and the window functions reference for a list of the standard ones supplied.
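To make the mapping concrete, here is ntile(5) applied to the ten sample values from the question, driven from Python against SQLite (which since 3.25 supports the same window function, so the SQL carries over to PostgreSQL unchanged). Note that the ORDER BY inside OVER is ascending, so the smallest values land in tile 1 and the largest in tile 5, matching the desired output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [(v,) for v in (4859, 4569, 4125, 3986, 3852,
                                 3562, 3452, 3269, 3168, 3058)])

# ntile(5) splits the ordered rows into five equal buckets of two rows each.
rows = conn.execute("""
    SELECT value, NTILE(5) OVER (ORDER BY value) AS quintile
    FROM t
    ORDER BY value DESC
""").fetchall()

for value, quintile in rows:
    print(value, "-", quintile)
```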