BigQuery - how many entries per partition? - google-bigquery

I have big partitioned tables and am trying to figure out how many entries are in each day partition.
So far I have used a for loop in a script, but there must be a simpler way of doing it.
Googling did not help. Does anyone know the right query?
Thanks

You can run the following query to count how many entries you have in each partition:
#standardSQL
SELECT
_PARTITIONTIME AS pt,
COUNT(1)
FROM
`dataset.table`
GROUP BY
1
ORDER BY
1 DESC
or, in legacy SQL:
#legacySQL
SELECT
_PARTITIONTIME AS pt,
COUNT(1)
FROM
[dataset:table]
GROUP BY
1
ORDER BY
1 DESC
It returns a table like the one below. Note that the rows with a NULL partition time are still in the streaming buffer. Hint: to fetch only the records that are in the streaming buffer, filter with WHERE _PARTITIONTIME IS NULL.
+-------------------------+-----+
| 2017-02-14 00:00:00 UTC | 252 |
+-------------------------+-----+
| 2017-02-13 00:00:00 UTC | 257 |
+-------------------------+-----+
| 2017-02-12 00:00:00 UTC | 188 |
+-------------------------+-----+
| 2017-02-11 00:00:00 UTC | 234 |
+-------------------------+-----+
| 2017-02-10 00:00:00 UTC | 107 |
+-------------------------+-----+
| null                    | 13  |
+-------------------------+-----+

Related

How to concat two fields and use the result in WHERE clause?

I have to get all oldest records based on the date-time information.
Data
Id | External Id | Date | Time
1 | 1000 | 2020-08-18 00:00:00 | 02:30:22
2 | 1000 | 2020-08-12 00:00:00 | 12:45:51
3 | 1556 | 2020-08-17 00:00:00 | 10:09:01
4 | 1919 | 2020-08-14 00:00:00 | 18:19:18
5 | 1919 | 2020-08-14 00:00:00 | 11:45:21
6 | 1919 | 2020-08-14 00:00:00 | 15:54:15
Expected result
Id | External Id | Date | Time
2 | 1000 | 2020-08-12 00:00:00 | 12:45:51
3 | 1556 | 2020-08-17 00:00:00 | 10:09:01
5 | 1919 | 2020-08-14 00:00:00 | 11:45:21
I'm currently doing this
SELECT *
FROM RUN AS T1
WHERE CONCAT(T1.DATE, T1.TIME) = (
SELECT MIN(CONCAT(T2.DATE, T2.TIME))
FROM RUN AS T2
WHERE T2.EXTERNAL_ID = T1.EXTERNAL_ID
)
Is this a correct way to do it?
Thank you, regards
Update 1 : Data type
DATE column is datetime
TIME column is varchar
You can use a window function such as DENSE_RANK()
SELECT ID, External_ID, Date, Time
FROM
(
SELECT DENSE_RANK() OVER (PARTITION BY External_ID ORDER BY Date, Time) AS dr,
r.*
FROM run r
) AS q
WHERE dr = 1
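The window-function answer can be checked against the sample rows from the question. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the asker's database (an assumption, since the question does not name the engine); the dates and times are stored as text, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run (id INTEGER, external_id INTEGER, date TEXT, time TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO run VALUES (?, ?, ?, ?)",
    [
        (1, 1000, "2020-08-18 00:00:00", "02:30:22"),
        (2, 1000, "2020-08-12 00:00:00", "12:45:51"),
        (3, 1556, "2020-08-17 00:00:00", "10:09:01"),
        (4, 1919, "2020-08-14 00:00:00", "18:19:18"),
        (5, 1919, "2020-08-14 00:00:00", "11:45:21"),
        (6, 1919, "2020-08-14 00:00:00", "15:54:15"),
    ],
)

# Rank the rows of each external_id by date, then time, and keep rank 1.
oldest = conn.execute(
    """
    SELECT id, external_id, date, time
    FROM (
        SELECT r.*,
               DENSE_RANK() OVER (
                   PARTITION BY external_id ORDER BY date, time
               ) AS dr
        FROM run AS r
    ) AS q
    WHERE dr = 1
    ORDER BY id
    """
).fetchall()

for row in oldest:
    print(row)
```

Because DENSE_RANK() assigns the same rank to ties, two rows with an identical date and time for the same external id would both be returned; ROW_NUMBER() would keep exactly one of them.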

Optimizing results for query with WHERE EXISTS clause

I have this table in postgres:
id | id_datetime | longitude | latitude
--------+---------------------+---------------------+--------------------
639438 | 2018-02-20 18:00:00 | -122.3880011217841 | 37.75538988423265
639439 | 2018-02-20 20:30:00 | -122.38756878451498 | 37.760550220844614
639440 | 2018-02-20 20:05:00 | -122.39640513677658 | 37.76130039041195
639441 | 2018-02-24 10:00:00 | -122.45819139221014 | 37.724317534370066
639442 | 2018-02-10 09:00:00 | -122.44693382058489 | 37.77000760474354
I want an output with all the distinct IDs that have at least one other ID within the preceding 15 minutes and within 1000 meters (geographic distance).
My table has more than 100K rows. So, I'm currently trying with the following query which works but takes too long. Are there any steps I can take to optimize this?
SELECT DISTINCT
x.id
FROM table x
WHERE EXISTS(
SELECT
1
FROM table t
WHERE t.id <> x.id
AND (t.id_datetime between x.id_datetime - interval '15 minutes' AND x.id_datetime)
AND (ST_Distance((geography(ST_MakePoint(x.longitude, x.latitude))),
geography(ST_MakePoint(t.longitude, t.latitude)) ) <= 1000)
)

SQL - Average monthly amount

I have a table with "amount" and "date" columns, and I want to display the average by month.
The table looks like this:
amount | date |
100 | 2017-04-22 20:39:24 |
300 | 2017-04-25 16:14:08 |
200 | 2017-04-28 17:51:16 |
100 | 2017-05-29 05:46:42 |
100 | 2017-05-08 16:15:13 |
100 | 2017-05-09 22:06:45 |
400 | 2017-06-10 10:57:34 |
500 | 2017-06-11 15:57:14 |
900 | 2017-06-14 16:02:36 |
This is what I have:
SELECT AVG(amount) AS avg_amount, date
FROM table
GROUP BY date
It displays the average by day so it ends up looking exactly the same as the first table but without the hour/minute/second portion, while I want it to look like this:
avg_amount | date |
200 | April |
100 | May |
600 | June |
GROUP BY MONTH(date)
Check out the date and time functions in MySQL, or the EXTRACT function in PostgreSQL.
I like TOvidiu's answer, but that will only work if you have a single year's worth of data. I would suggest
SELECT AVG(amount) AS avg_amount, YEAR(date) AS year, MONTH(date) AS month
FROM table
GROUP BY YEAR(date), MONTH(date)
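To see the year-and-month grouping on the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no YEAR() or MONTH() functions, so strftime('%Y-%m', date) is used as the equivalent grouping key; the table name payments is hypothetical, since the question only calls it "table".

```python
import sqlite3

# In-memory SQLite database; "payments" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount INTEGER, date TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        (100, "2017-04-22 20:39:24"),
        (300, "2017-04-25 16:14:08"),
        (200, "2017-04-28 17:51:16"),
        (100, "2017-05-29 05:46:42"),
        (100, "2017-05-08 16:15:13"),
        (100, "2017-05-09 22:06:45"),
        (400, "2017-06-10 10:57:34"),
        (500, "2017-06-11 15:57:14"),
        (900, "2017-06-14 16:02:36"),
    ],
)

# Group on the year-month prefix so that the same month in different
# years never lands in the same bucket.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', date) AS month, AVG(amount) AS avg_amount
    FROM payments
    GROUP BY strftime('%Y-%m', date)
    ORDER BY month
    """
).fetchall()

for month, avg_amount in monthly:
    print(month, avg_amount)
```

In MySQL the same idea is GROUP BY YEAR(date), MONTH(date), and MONTHNAME(date) can be selected if the month should be shown by name.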

how to get second max date in postgres sql

I have the following situation, where I need to get several values between two invoice dates.
The query returns data based on invoices. What I need now is, for some values, to fetch the data between this invoice's date and the previous invoice's date.
Approaches I have already tried:
1) A subquery would easily solve this, but I have to do it for 4-5 columns and it's a 15 GB database, so that's not feasible.
2) If I do this:
left join (select inv.date, inv.actno from invoice inv) as invo on invo.actno=act.id and invo.date < inv.date
then it returns all the rows earlier than that date, but I need only the one row that immediately precedes the main invoice date.
3) We cannot get the second-highest value in a subquery in the FROM clause, because the outer invoice is not grouped, so it might be the max, the middle, or the least.
4) We cannot pass values from another table into the subquery of a joined table.
Example:
create table inv (id serial ,date timestamp without time zone);
insert into inv (date) values('2017-01-31 00:00:00'),('2017-01-30 00:00:00'),('2017-01-29 00:00:00'),('2017-01-28 00:00:00'),('2017-01-27 00:00:00');
select id, date from inv;
id | date
----+---------------------
1 | 2017-01-31 00:00:00
2 | 2017-01-30 00:00:00
3 | 2017-01-29 00:00:00
4 | 2017-01-28 00:00:00
5 | 2017-01-27 00:00:00
(5 rows)
I need this
id |date |date | id
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00 | 2
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00 | 3
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00 | 4
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00 | 5
5 | 2017-01-27 00:00:00 |
I can't use a subquery in the SELECT list because the database is big and I need to do this for 4-5 columns.
UPDATE 1
I need this from the same table, referenced twice in the FROM clause. My requirement is to join several pieces of data from the invoice table, and for 4-5 columns I need things like the sum of the amount paid between the previous invoice and this one.
So I can take both invoice dates in a subquery and get the data between them.
UPDATE 2
lag will not solve this
select i.id,i.date, lag(date) over (order by date) from inv i order by id ;
id | date | lag
----+---------------------+---------------------
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00
5 | 2017-01-27 00:00:00 |
(5 rows)
Time: 0.480 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id=2 order by id ;
id | date | lag
----+---------------------+-----
2 | 2017-01-30 00:00:00 |
(1 row)
Time: 0.525 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id in (2,3) order by id ;
id | date | lag
----+---------------------+---------------------
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 |
The window function only sees the rows that the query itself returns, so it is bounded by the WHERE clause. Here, id 3 has a previous date in the table but cannot get it, because the filter excludes that row. Something needs to be done in a left join so that the lag date can be taken from the same table, referenced again in the FROM clause. Thanks again, buddy.
Like this?
t=# select date as d1,
lag(date) over (order by date)
from inv
order by 1 desc;
d1 | lag
---------------------+---------------------
2017-01-31 00:00:00 | 2017-01-30 00:00:00
2017-01-30 00:00:00 | 2017-01-29 00:00:00
2017-01-29 00:00:00 | 2017-01-28 00:00:00
2017-01-28 00:00:00 | 2017-01-27 00:00:00
2017-01-27 00:00:00 |
(5 rows)
Time: 1.416 ms
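The objection in UPDATE 2 (a WHERE clause removes rows before lag() can see them) can be sidestepped by computing the lag over the whole table in a derived table and filtering in the outer query. The following is a minimal sketch of that idea using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or newer). The names prev_date and with_prev are illustrative only, and how this performs on a 15 GB table is not something the sketch can show.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table "inv".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inv (id INTEGER PRIMARY KEY, date TEXT)")
# Same five dates as in the question; ids are assigned 1..5 in this order.
conn.executemany(
    "INSERT INTO inv (date) VALUES (?)",
    [
        ("2017-01-31 00:00:00",),
        ("2017-01-30 00:00:00",),
        ("2017-01-29 00:00:00",),
        ("2017-01-28 00:00:00",),
        ("2017-01-27 00:00:00",),
    ],
)

# LAG runs over every row in the inner query; the id filter is applied
# afterwards, so the previous date survives even when that row is filtered out.
rows = conn.execute(
    """
    SELECT id, date, prev_date
    FROM (
        SELECT id, date, LAG(date) OVER (ORDER BY date) AS prev_date
        FROM inv
    ) AS with_prev
    WHERE id IN (2, 3)
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

The derived table can then be joined to other tables on the pair (prev_date, date) to aggregate whatever falls between two consecutive invoices.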

Aggregate function (SUM) on 5 newest rows in table

This is a perplexing SQL problem (at least to me) involving GROUP BY and aggregates... would love any help.
I'm working on a site that logs information about bike rides and riders. We have a table which contains a rider id, ride date, and ride distance. I want to display a table with the latest rides and distances, as well as a total distance for each of those riders. Here is my SQL and output (where id is the rider id):
+--------+---------------------+----------+
| id | dated | distance |
+--------+---------------------+----------+
| 101240 | 2012-11-30 00:00:00 | 250 |
| 101332 | 2012-11-22 00:00:00 | 31 |
| 101313 | 2012-11-21 00:00:00 | 15 |
| 101319 | 2012-11-21 00:00:00 | 25 |
| 101320 | 2012-11-21 00:00:00 | 56 |
+--------+---------------------+----------+
This is easy to get with:
SELECT id, dated, distance FROM rides ORDER BY dated DESC LIMIT 5
What I can't seem to figure out is how to get each rider's cumulative total for these most recent rides... Basically:
SELECT sum(distance) FROM rides GROUP BY id
Is it possible to handle all this in SQL without having to do something programmatic? I've tried doing some subqueries and JOINS but to no avail yet!
Thanks in advance SO community.
Duh, I should have known my data schema a little better. I had been trying to work with the wrong id column, which was actually a serialized row id and not a rider's id. A working version of the SQL (on MySQL) is:
SELECT r.rider, rr.dated, rr.distance, i.firstname, i.lastname, sum(r.distance)
FROM rides r
INNER JOIN (SELECT rider, distance, dated FROM rides ORDER BY dated DESC LIMIT 5) rr ON r.rider = rr.rider
INNER JOIN riders i ON r.rider = i.id
GROUP BY r.rider ORDER BY rr.dated DESC;
This returns:
+-------+---------------------+----------+-----------+----------+-----------------+
| rider | dated | distance | firstname | lastname | sum(r.distance) |
+-------+---------------------+----------+-----------+----------+-----------------+
| 3304 | 2012-11-30 00:00:00 | 250 | venkatesh | ss | 250 |
| 647 | 2012-11-22 00:00:00 | 31 | ralph | suelzle | 22726 |
| 2822 | 2012-11-21 00:00:00 | 15 | humberto | calderon | 10421 |
| 2339 | 2012-11-21 00:00:00 | 25 | Judy | Rutter | 8545 |
| 1452 | 2012-11-21 00:00:00 | 56 | Fred | Stearley | 64366 |
+-------+---------------------+----------+-----------+----------+-----------------+
Thanks for your answers!
Would something like this work? BTW, what SQL server are you using?
SELECT sum(distance)
FROM (SELECT distance FROM rides ORDER BY dated DESC LIMIT 5) AS latest
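One way to get the five latest rides together with each rider's overall total is to join the "latest five" derived table to a separately aggregated totals table. The following is a minimal sketch using Python's built-in sqlite3 module; the rows are made-up sample data (not the asker's), and the aliases rr, totals and total_distance are illustrative. It is an alternative to the self-answer's query, not a reproduction of it: pre-aggregating the totals avoids double counting if one rider appears twice among the latest five rides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (rider INTEGER, dated TEXT, distance INTEGER)")
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        (1, "2012-11-01", 10),
        (1, "2012-11-30", 250),
        (2, "2012-11-22", 31),
        (2, "2012-10-05", 100),
        (3, "2012-11-21", 15),
        (4, "2012-11-20", 25),
        (5, "2012-11-19", 56),
        (5, "2012-09-01", 44),
    ],
)

# rr: the five most recent rides; totals: one summed row per rider.
latest = conn.execute(
    """
    SELECT rr.rider, rr.dated, rr.distance, totals.total_distance
    FROM (
        SELECT rider, dated, distance
        FROM rides
        ORDER BY dated DESC
        LIMIT 5
    ) AS rr
    JOIN (
        SELECT rider, SUM(distance) AS total_distance
        FROM rides
        GROUP BY rider
    ) AS totals ON totals.rider = rr.rider
    ORDER BY rr.dated DESC
    """
).fetchall()

for row in latest:
    print(row)
```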