Query across partitions in a partitioned table in BigQuery SQL

I have a BigQuery table partitioned by day, and I need to return the average of a column over all the partitions that belong to a month. In other words, my desired output would have a column for the year-month and a column with the average of that column's values over the month: |year-month|avg_value|. A GROUP BY on the partition, if you like.
I managed to access this information for a single month, e.g. for February 2023:
SELECT *
FROM `name_of_the_table`
WHERE EXTRACT(YEAR FROM DATE(_PARTITIONTIME)) = 2023
AND EXTRACT(MONTH FROM DATE(_PARTITIONTIME)) = 2
By looping over this query I can reach my desired output, but I would like to learn a smarter solution.
I have also managed to access the partition metadata:
SELECT table_name, partition_id, total_rows
FROM `dataset_name.INFORMATION_SCHEMA.PARTITIONS`
WHERE partition_id IS NOT NULL
AND table_name = 'table_name'
ORDER BY partition_id DESC
but I have no clue how to combine the two. Thank you in advance.

Sounds like you're looking for a group by.
SELECT
FORMAT_TIMESTAMP("%Y%m", _PARTITIONTIME) year_month,
COUNT(*)
FROM
`table`
GROUP BY
1
ORDER BY
1 DESC
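Since the question asks for an average rather than a row count, the same GROUP BY should work with COUNT(*) swapped for AVG(...) on the column of interest. Below is a minimal sketch of that idea, run against an in-memory SQLite table from Python so it can be tried without a BigQuery project. The table name readings, the columns partition_day and value, and the sample rows are all made up for illustration; partition_day stands in for DATE(_PARTITIONTIME), and SQLite's strftime plays the role of FORMAT_TIMESTAMP.

```python
import sqlite3


def monthly_average(rows):
    """Return (year_month, avg_value) pairs, newest month first.

    rows: iterable of (partition_day, value); partition_day is an ISO date
    string standing in for DATE(_PARTITIONTIME) (an assumption of this sketch).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (partition_day TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%Y-%m', partition_day) AS year_month,
               AVG(value) AS avg_value
        FROM readings
        GROUP BY year_month
        ORDER BY year_month DESC
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: two daily partitions in January, three in February.
sample = [
    ("2023-01-30", 10.0),
    ("2023-01-31", 20.0),
    ("2023-02-01", 1.0),
    ("2023-02-02", 2.0),
    ("2023-02-03", 6.0),
]

for year_month, avg_value in monthly_average(sample):
    print(year_month, avg_value)
```

Every row of a month lands in one group regardless of which daily partition it came from, so no looping over months is needed.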

Related

How to apply avg function on top of select query in Oracle?

Select (SYSDATE - CREATED_DATE_emplogin) as newinfo, USER_ID from emp;
Based on the result of the above query, I want to take the average of the newinfo column.
I am very new to SQL and I don't understand what to do next.
We are using an Oracle database.
If you want the average per user_id, apply avg to the date difference and include a group by clause (which must contain all non-aggregated columns):
Select avg(SYSDATE - CREATED_DATE_emplogin) as newinfo,
USER_ID
from emp
group by user_id
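To see the effect of the GROUP BY on a small example, here is a rough sketch using Python's built-in sqlite3 module instead of Oracle. The sample rows are invented, and a fixed reference date replaces SYSDATE so the output is reproducible. SQLite has no date subtraction operator, so the difference in days is computed with julianday(); in Oracle the plain SYSDATE - CREATED_DATE_emplogin from the query above already yields days.

```python
import sqlite3


def avg_age_per_user(rows, reference_date):
    """Average (reference_date - created date) in days for each user_id.

    rows: iterable of (user_id, created_date) with ISO date strings.
    reference_date: fixed stand-in for Oracle's SYSDATE (an assumption here).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (user_id INTEGER, created_date_emplogin TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT user_id,
               AVG(julianday(?) - julianday(created_date_emplogin)) AS newinfo
        FROM emp
        GROUP BY user_id
        ORDER BY user_id
        """,
        (reference_date,),
    ).fetchall()
    conn.close()
    return result


# Hypothetical logins: user 1 has two rows, user 2 has one.
emp_rows = [
    (1, "2024-01-01"),
    (1, "2024-01-11"),
    (2, "2024-01-16"),
]

print(avg_age_per_user(emp_rows, "2024-01-21"))
```

User 1's rows are 20 and 10 days old, so they collapse into a single row with the mean of the two differences, while user 2 keeps its single value.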

Postgresql select distinct for every unique quarter

I have a query
select * from table_name where sales=0
from which I got the following data
Now, for every quarter, I want to select distinct rows so that the final data will look like this
I am able to get the data for an individual quarter with
select distinct quarter, id from some_view where quarter='2020-Q2'
but I have not been able to write a single query that covers all quarters.
Please suggest how to proceed.
I might be oversimplifying this... But you do seem to want:
select distinct quarter, id from some_view where sales = 0
This gives you all distinct quarter/id tuples among the rows that satisfy the where clause.

How to select the first observation in a category in PostgreSQL

My table contains house IDs (dataid), the time of observation (readtime), and a meter reading.
The query is as follows:
select *
from university.gas_ert
where readtime between '01/01/2014' and '01/02/2014'
I am trying to get only the first observation of each day for all the dataids within the time span. I have tried GROUP BY, but it doesn't seem to work.
DISTINCT ON could make your query much simpler. Read more in the documentation.
Definition :
Keeps only the first row of each set of rows where the given
expressions evaluate to equal. Note that the “first row” of each set
is unpredictable unless ORDER BY is used to ensure that the desired
row appears first.
SELECT
DISTINCT ON (meter_value) meter_value,
dataid,
readtime
FROM
university.gas_ert
WHERE
readtime between '2014-01-01' and '2014-01-02'
ORDER BY
meter_value,
readtime ASC;
If you want one row for each unique dataid within the time range, you should use the DISTINCT ON construction. The following query gives you one row per dataid per day in the range described in the WHERE clause, and you can widen that range to cover more days. The trailing readtime in the ORDER BY is what makes the kept row the first observation of the day.
select distinct on(dataid, date_trunc('day', readtime)) *
from university.gas_ert
where readtime between '2014-01-01' and '2014-01-02'
order by dataid, date_trunc('day', readtime), readtime asc
You can also use a window function for this: ROW_NUMBER.
Partition the records by dataid and by day using date_trunc (i.e. without the time component), then number them within each partition by readtime ascending and keep the first one:
select *
from (
select *
,row_number() over(partition by a.dataid, date_trunc('day',a.readtime) order by a.readtime asc ) as rnk
from university.gas_ert a
)x
where x.rnk=1
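As a quick way to check the ROW_NUMBER approach on toy data, here is a sketch that runs the same pattern in SQLite through Python's sqlite3 module. It assumes a SQLite build with window functions (3.25 or later). SQLite has no date_trunc, so date(readtime) is used to drop the time component instead. The meter_value column name and the sample readings are invented for the example.

```python
import sqlite3


def first_reading_per_day(rows):
    """Return the earliest row of each (dataid, day) pair.

    rows: iterable of (dataid, readtime, meter_value) with readtime as
    'YYYY-MM-DD HH:MM:SS' text (column names are assumptions of this sketch).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gas_ert (dataid INTEGER, readtime TEXT, meter_value REAL)"
    )
    conn.executemany("INSERT INTO gas_ert VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT dataid, readtime, meter_value
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY dataid, date(readtime)
                       ORDER BY readtime
                   ) AS rnk
            FROM gas_ert
        ) AS x
        WHERE rnk = 1
        ORDER BY dataid, readtime
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical readings, deliberately inserted out of time order.
readings = [
    (1, "2014-01-01 08:00:00", 100.0),
    (1, "2014-01-01 00:15:00", 98.0),
    (1, "2014-01-02 00:05:00", 101.0),
    (2, "2014-01-01 03:00:00", 50.0),
    (2, "2014-01-01 09:00:00", 52.0),
]

for row in first_reading_per_day(readings):
    print(row)
```

Partitioning by dataid as well as by day matters: partitioning by day alone would keep a single row per day across all houses.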

Efficient query for the first result in groups (postgresql 9)

I have a table with 200,000 rows and two columns: name and date. Both dates and names may contain repeated values. I would like to get the first 300 unique names, with the dates sorted in ascending order, and have this run fast, as my table may grow to a million rows.
I am using PostgreSQL 9.
SELECT name, date
FROM
(
SELECT DISTINCT ON (name) name, date
FROM table
ORDER BY name, date
) AS id_date
ORDER BY date
LIMIT 300;
The last query by @jachguate will miss names that have two entries on the same date; this one does not.
The query takes about 100 ms on a non-optimized PostgreSQL 9.1 with about 100,000 entries, so it may not scale to millions of entries.
An upgrade to PostgreSQL 9.2 may help, since according to the release notes it brings many performance improvements.
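For experimenting with the logic outside PostgreSQL, here is a sketch in Python's sqlite3 module. SQLite has no DISTINCT ON, so this swaps in a different technique: GROUP BY name with MIN(date), which picks the earliest date for each name and then sorts and limits. The table name events, the column event_date, and the sample rows are hypothetical, and the sketch says nothing about performance on a million rows.

```python
import sqlite3


def earliest_names(rows, limit=300):
    """Return up to `limit` (name, first_date) pairs, earliest dates first.

    rows: iterable of (name, event_date) with ISO date strings; each name
    appears once in the output, paired with its earliest date.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (name TEXT, event_date TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, MIN(event_date) AS first_date
        FROM events
        GROUP BY name
        ORDER BY first_date, name
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


# Hypothetical rows with repeated names and a repeated (name, date) pair.
events = [
    ("bob", "2020-01-03"),
    ("alice", "2020-01-02"),
    ("bob", "2020-01-01"),
    ("alice", "2020-01-02"),
    ("carol", "2020-01-05"),
]

print(earliest_names(events, limit=2))
```

Like the DISTINCT ON query above, this keeps a name even when it has two rows on the same date.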
Use a CTE:
with unique_date_name as (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
)
select name, date
from unique_date_name
order by date limit 300;
Edit
According to the comments, this results in poor performance, so try this one instead:
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
order by date limit 300;
or, transforming the original query into a nested subquery in FROM instead of a CTE:
select name, date
from (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
) unique_date_name
order by date limit 300;
Unfortunately I don't have a PostgreSQL instance at hand to check whether it works, but the optimizer should do a better job with it.
An index on (date, name) is a must for optimal performance.

Fastest way to identify differences between two tables?

I have a need to check a live table against a transactional archive table and I'm unsure of the fastest way to do this...
For instance, let's say my live table is made up of these columns:
Term
CRN
Fee
Level Code
My archive table would have the same columns, but also have an archive date so I can see what values the live table had at a given date.
Now... How would I write a query to ensure that the values for the live table are the same as the most recent entries in the archive table?
PS I'd prefer to handle this in SQL, but PL/SQL is also an option if it's faster.
SELECT term, crn, fee, level_code
FROM live_data
MINUS
SELECT term, crn, fee, level_code
FROM historical_data
This returns what is in live but not in historical. You can then union it with the reverse query to get what is in historical but not in live.
Simply (the parentheses matter, because Oracle evaluates set operators from left to right):
(SELECT collist
FROM table_a
MINUS
SELECT collist
FROM table_b)
UNION ALL
(SELECT collist
FROM table_b
MINUS
SELECT collist
FROM table_a);
You didn't mention how rows are uniquely identified, so I've assumed you also have an "id" column:
SELECT *
FROM livetable
WHERE (term, crn, fee, levelcode) NOT IN (
SELECT FIRST_VALUE(term) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(crn) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(fee) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(levelcode) OVER (ORDER BY archivedate DESC)
FROM archivetable
WHERE livetable.id = archivetable.id
);
Note: This query doesn't take NULLS into account - if any of the columns are nullable you can add suitable logic (e.g. NVL each column to some "impossible" value).
unload to table1.unl
select * from table1
order by 1,2,3,4
unload to table2.unl
select * from table2
order by 1,2,3,4
diff table1.unl table2.unl > diff.unl
Could you use a query of the form:
SELECT your columns FROM your live table
EXCEPT
SELECT your columns FROM your archive table WHERE archive date is most recent;
Any results will be rows in your live table that are not in your most recent archive.
If you also need the rows in your most recent archive that are not in your live table, simply reverse the order of the selects and repeat, or get them all in the same query by performing a (live UNION archive) EXCEPT (live INTERSECT archive).
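Here is a small sketch of the two-direction EXCEPT comparison, run in SQLite from Python so it can be tried on toy data. Several things in it are assumptions rather than facts from the question: the table names live and archive, the idea that (term, crn) identifies a row, and the reading of "most recent" as the latest archive_date per (term, crn). Each EXCEPT is wrapped in its own subquery so the two directions are evaluated separately before the UNION ALL.

```python
import sqlite3

DIFF_QUERY = """
WITH latest AS (
    -- most recent archived version of each (term, crn); the key is an assumption
    SELECT a.term, a.crn, a.fee, a.level_code
    FROM archive AS a
    WHERE a.archive_date = (
        SELECT MAX(b.archive_date)
        FROM archive AS b
        WHERE b.term = a.term AND b.crn = a.crn
    )
)
SELECT 'live_only' AS side, * FROM (
    SELECT term, crn, fee, level_code FROM live
    EXCEPT
    SELECT term, crn, fee, level_code FROM latest
)
UNION ALL
SELECT 'archive_only' AS side, * FROM (
    SELECT term, crn, fee, level_code FROM latest
    EXCEPT
    SELECT term, crn, fee, level_code FROM live
)
"""


def diff_live_vs_archive(live_rows, archive_rows):
    """Return sorted rows that differ between live and the latest archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live (term TEXT, crn TEXT, fee REAL, level_code TEXT)")
    conn.execute(
        "CREATE TABLE archive (term TEXT, crn TEXT, fee REAL, "
        "level_code TEXT, archive_date TEXT)"
    )
    conn.executemany("INSERT INTO live VALUES (?, ?, ?, ?)", live_rows)
    conn.executemany("INSERT INTO archive VALUES (?, ?, ?, ?, ?)", archive_rows)
    result = sorted(conn.execute(DIFF_QUERY).fetchall())
    conn.close()
    return result


# Hypothetical data: 1002 changed its fee, 1003 is new, 1004 was removed.
live_rows = [
    ("202301", "1001", 50.0, "UG"),
    ("202301", "1002", 75.0, "GR"),
    ("202301", "1003", 20.0, "UG"),
]
archive_rows = [
    ("202301", "1001", 50.0, "UG", "2023-01-01"),
    ("202301", "1002", 60.0, "GR", "2023-01-01"),
    ("202301", "1002", 70.0, "GR", "2023-02-01"),
    ("202301", "1004", 30.0, "UG", "2023-01-15"),
]

for row in diff_live_vs_archive(live_rows, archive_rows):
    print(row)
```

An empty result means the live table matches the latest archive entries. As with the NOT IN answer above, columns containing NULL need separate thought, so treat this only as a starting point.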