Adding summary statistics to an existing table in SQL - sql

I am trying to add summary statistics (just total and average) to a table with 21 columns and 7 rows of data. I would like the two rows of summary statistics to start at row 8. I've been trying a query along these lines without any luck:
SELECT *
FROM
( SELECT 1,
weekday, summer_member_total, summer_member_avg_duration, summer_casual_total, summer_casual_avg_duration,
fall_member_total, fall_member_avg_duration, fall_casual_total, fall_casual_avg_duration,
winter_member_total, winter_member_avg_duration, winter_casual_total, winter_casual_avg_duration,
spring_member_total, spring_member_avg_duration, spring_casual_total, spring_casual_avg_duration,
member_total, member_avg_duration, casual_total, casual_avg_duration,
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
UNION ALL
SELECT 8,
'TOTAL',
SUM(summer_member_total),
SUM(summer_member_avg_duration),
SUM(summer_casual_total),
SUM(summer_casual_avg_duration),
SUM(fall_member_total),
SUM(fall_member_avg_duration),
SUM(fall_casual_total),
SUM(fall_casual_avg_duration),
SUM(winter_member_total),
SUM(winter_member_avg_duration),
SUM(winter_casual_total),
SUM(winter_casual_avg_duration),
SUM(spring_member_total),
SUM(spring_member_avg_duration),
SUM(spring_casual_total),
SUM(spring_casual_avg_duration),
SUM(member_total),
SUM(member_avg_duration),
SUM(casual_total),
SUM(casual_avg_duration),
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
UNION ALL
SELECT 9,
'AVG',
AVG(summer_member_total),
AVG(summer_member_avg_duration),
AVG(summer_casual_total),
AVG(summer_casual_avg_duration),
AVG(fall_member_total),
AVG(fall_member_avg_duration),
AVG(fall_casual_total),
AVG(fall_casual_avg_duration),
AVG(winter_member_total),
AVG(winter_member_avg_duration),
AVG(winter_casual_total),
AVG(winter_casual_avg_duration),
AVG(spring_member_total),
AVG(spring_member_avg_duration),
AVG(spring_casual_total),
AVG(spring_casual_avg_duration),
AVG(member_total),
AVG(member_avg_duration),
AVG(casual_total),
AVG(casual_avg_duration),
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats` )
ORDER BY 1
Any ideas on how to approach this?

As an option to fix your issue, replace
SELECT 1,
weekday, summer_
with
SELECT 1,
CAST(weekday AS STRING) weekday , summer_
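The cast is presumably needed because every branch of a UNION ALL has to return compatible column types, and 'TOTAL' and 'AVG' are strings while weekday apparently is not. Here is a minimal sketch of the same pattern in Python, assuming an in-memory SQLite database in place of BigQuery and a made-up stats table with a single numeric column. SQLite does not enforce column types, so the CAST only mirrors the fix above; sort_key and day_order are illustrative names.

```python
import sqlite3

# Hypothetical stand-in for the summary table: SQLite in memory instead of
# BigQuery, and one numeric column instead of twenty.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (weekday INTEGER, member_total INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

query = """
SELECT label, member_total
FROM (
    -- sort_key 1 keeps the detail rows first; weekday is cast to text so it
    -- has the same type as the 'TOTAL' and 'AVG' labels below
    SELECT 1 AS sort_key, weekday AS day_order,
           CAST(weekday AS TEXT) AS label, member_total
    FROM stats
    UNION ALL
    SELECT 8, 0, 'TOTAL', SUM(member_total) FROM stats
    UNION ALL
    SELECT 9, 0, 'AVG', AVG(member_total) FROM stats
)
ORDER BY sort_key, day_order
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two summary rows sort last because their sort_key values (8 and 9) are larger than the 1 used for the detail rows.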

Related

Bigquery: get tables' sizes from all datasets

I have a simple query that returns the size of each table in the dataset orders:
SELECT
table_id,
TRUNC(size_bytes/1024/1024/1024/1024,2) size_tb,
FROM orders.__TABLES__
If I wish to run this query once for the whole project and all its tables, how can I do it?
I tried changing the last line to FROM __TABLES__, but that raises an error.
I use this Python script for something similar (it probably originated on Stack Overflow), with my own adjustments:
from google.cloud import bigquery
client = bigquery.Client()
datasets = list(client.list_datasets())
project = client.project
sizes = []
if datasets:
    print('Datasets in project {}:'.format(project))
    for dataset in datasets:  # API request(s)
        print('Dataset: {}'.format(dataset.dataset_id))
        query_job = client.query("select table_id, sum(size_bytes)/pow(10,9) as size from `"+dataset.dataset_id+"`.__TABLES__ group by 1")
        results = query_job.result()
        for row in results:
            print("\tTable: {} : {}".format(row.table_id, row.size))
            item = {
                'project': project,
                'dataset': dataset.dataset_id,
                'table': row.table_id,
                'size': row.size
            }
            sizes.append(item)
else:
    print('{} project does not contain any datasets.'.format(project))
You could query the INFORMATION_SCHEMA data instead:
select
project_id,
TABLE_SCHEMA,
TABLE_NAME,
sum(TOTAL_PHYSICAL_BYTES) / pow(10,9) as size
from
project.region.INFORMATION_SCHEMA.TABLE_STORAGE
group by 1,2, 3
order by size DESC
Here project is your project name and region is the region where the data is located (e.g. region-us). Refer to https://cloud.google.com/bigquery/docs/information-schema-table-storage for more info.
OK. Let's do it in steps:
Step 1 - List the datasets of a single project:
SELECT
string_agg(concat("SELECT * FROM `$_PROJECT_ID.", schema_name, ".__TABLES__` ")," UNION ALL \n")
FROM
`$_PROJECT_ID`.INFORMATION_SCHEMA.SCHEMATA;
Or, if it isn't for a single project:
Step 1.1 - List all projects, taking those referenced by queries in the job history of the last 6 months (180 days):
WITH LISTA_PROJETOS AS (
SELECT DISTINCT R.PROJECT_ID
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION J, UNNEST(REFERENCED_TABLES) R
ORDER BY 1 ASC
), RESULTADOS AS (
SELECT 'SELECT \n\t' ||AGG_RESULTADOS FROM (
SELECT STRING_AGG('(SELECT STRING_AGG(CONCAT("SELECT * FROM `'||PROJECT_ID||'.", SCHEMA_NAME, ".__TABLES__` UNION ALL "), "\\n") FROM `'||PROJECT_ID||'`.INFORMATION_SCHEMA.SCHEMATA)', ' ||"\\n"||\n\t') AS AGG_RESULTADOS
FROM LISTA_PROJETOS
)
)
SELECT * FROM RESULTADOS;
If you choose step 1.1, copy its one-line output to the clipboard and execute it.
You will then have something like this:
SELECT * FROM `teste.raw.__TABLES__` UNION ALL
SELECT * FROM `teste.stage.__TABLES__` UNION ALL
Take care: the maximum number of unions for this query is 100.
You must remove the trailing UNION ALL after the last query for it to work.
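If you would rather assemble that list in a script than edit it by hand, joining the pieces avoids the trailing UNION ALL altogether. This is only a sketch: build_tables_union is a hypothetical helper, and the project and dataset names are the ones from the example output above.

```python
def build_tables_union(project_id, dataset_ids):
    """Return one query that unions the __TABLES__ view of every dataset."""
    selects = [
        "SELECT * FROM `{}.{}.__TABLES__`".format(project_id, dataset_id)
        for dataset_id in dataset_ids
    ]
    # join() only puts the separator between items, so nothing trails
    return " UNION ALL\n".join(selects)


union_sql = build_tables_union("teste", ["raw", "stage"])
print(union_sql)
```

The resulting string can be pasted into the FROM ( ... ) block of step 2.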
Then you should do the next step:
Step 2:
/***** Query where the lookup is performed... *****/
SELECT
project_id,
dataset_id,
table_id,
concat(project_id,':',dataset_id,'.',table_id) objeto,
case type
when 1 then 'TABLE'
when 2 then 'VIEW'
else 'OTHER'
end as tipo,
row_count as qtd_linhas,
round(size_bytes/power(1024, 3), 2) as tamanho_gb,
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(creation_time), 'America/Sao_Paulo') as data_criacao,
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(last_modified_time), 'America/Sao_Paulo') as ultima_modificacao, /* Data for the last 6 months only (GCP) */
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', MAX(last_query_in), 'America/Sao_Paulo') as ultima_consulta_em,
MAX(user_email) as consultado_por
FROM (
/***** HERE YOU SHOULD PASTE THE CODE OUTPUT FROM STEP 1 OR 1.1 *****/
SELECT * FROM `teste.raw.__TABLES__` UNION ALL
SELECT * FROM `teste.stage.__TABLES__`
/***** HERE YOU SHOULD PASTE THE CODE OUTPUT FROM STEP 1 OR 1.1 *****/
) AS tables
LEFT JOIN (
SELECT
creation_time AS last_query_in, user_email,
x
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
UNNEST(referenced_tables) AS x)
ON
project_id=x.project_id
AND x.dataset_id=dataset_id
AND x.table_id=table_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9 ORDER BY 2, 7
Finally, you have the data you wanted.
Let me know if this helps you, ok?

SQL - query timing out when pulling records for most recent date with a subquery

I'm trying to pull values for the most recent date (COMPUTE_DAY) in a very large dataset. This seems to be a frequently asked question, and the most common solution is a subquery on the same table. Unfortunately, my query times out every time I try that. The table is partitioned on two columns, REGION and COMPUTE_DAY, with primary keys REGION, COMPUTE_DAY and PLAN_UUID. Are there any ways I can make this query more efficient?
SELECT /*+ use_hash(ipp,ipp2) */
ipp.COMPUTE_DAY,
ipp.ITEM,
ipp.MANUFACTURER,
ipp.ORDER_DATE,
ipp.CARTON,
sum(ipp.TARGET_INVENTORY) as 1,
sum(ipp.CURRENT_INVENTORY) as 2,
sum(ipp.DEMAND) as 3,
sum(ipp.ORDERS) as 4,
sum(ipp.SHIPMENTS) as 5,
sum(ipp.QUANTITY) as 6,
FROM
table ipp
WHERE
ipp.REGION = 1
AND ipp.COMPUTE_DAY = (select max(ipp2.COMPUTE_DAY) from O_IP_PLANS ipp2 where ipp2.REGION_ID = 1 AND ipp2.COMPUTE_DAY BETWEEN TO_DATE('{RUN_DATE_YYYY/MM/DD}','YYYY/MM/DD')-7 AND TO_DATE('{RUN_DATE_YYYY/MM/DD}','YYYY/MM/DD') AND ipp2.PLAN_UUID = ipp.PLAN_UUID)
AND ipp.GROUP_ID = 121
AND ipp.IOG = 1
AND ipp.INTENT = 'YES'
GROUP BY ipp.COMPUTE_DAY,
ipp.ITEM,
ipp.MANUFACTURER,
ipp.ORDER_DATE,
ipp.CARTON;

Oracle SQL aggregate rows into column listagg with condition

I have the following (simplified) table layout:
TABLE blocks (id)
TABLE content (id, blockId, order, data, type)
content.blockId is a foreign key to blocks.id. The idea is that in the content table you have many content entries with different types for one block.
I am now looking for a query that can provide me with an aggregation based on a blockId where all the content entries of the 3 different types are concatenated and put into respective columns.
I have already found the listagg function, which works well. The following statement lists all the content entries in one column:
SELECT listagg(c.data, ',') WITHIN GROUP (ORDER BY c.order) FROM content c WHERE c.blockId = 330;
However, the concatenated string now contains all the data elements of the block in one column. What I would like is to have them put into separate columns based on the type. For example, suppose the content table holds the following rows:
1, 1, 0, "content1", "FRAGMENT"
2, 1, 1, "content2", "BULK"
3, 1, 3, "content4", "FRAGMENT"
4, 1, 2, "content3", "FRAGMENT"
I would like the output to have 2 columns, FRAGMENT and BULK, where FRAGMENT contains "content1;content3;content4" and BULK contains "content2".
Is there an efficient way of achieving this?
You can use case:
SELECT listagg(CASE WHEN c.type = 'FRAGMENT' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as fragments,
listagg(CASE WHEN c.type = 'BULK' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as bulks
FROM content c
WHERE c.blockId = 330;
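To see what the CASE expressions are doing, here is the same conditional aggregation written out in plain Python. It is only a sketch over the four sample rows from the question (which use blockId 1 rather than 330), and aggregate_by_type is a hypothetical helper name.

```python
from collections import defaultdict

# Sample rows from the question: (id, blockId, order, data, type)
content = [
    (1, 1, 0, "content1", "FRAGMENT"),
    (2, 1, 1, "content2", "BULK"),
    (3, 1, 3, "content4", "FRAGMENT"),
    (4, 1, 2, "content3", "FRAGMENT"),
]


def aggregate_by_type(rows, block_id, separator=","):
    """Concatenate data per type for one block, ordered by the order column."""
    buckets = defaultdict(list)
    # Sorting by the order column plays the role of WITHIN GROUP (ORDER BY ...)
    for _id, blk, _order, data, kind in sorted(rows, key=lambda r: r[2]):
        if blk == block_id:
            buckets[kind].append(data)
    return {kind: separator.join(values) for kind, values in buckets.items()}


result = aggregate_by_type(content, block_id=1)
print(result)
```

Each type ends up in its own bucket, which corresponds to one listagg column per CASE condition in the query.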
As an alternative, if you want it more dynamic, you could pivot the outcome.
Note that this only works on Oracle 11g Release 2 or later. Here's an example of how it could look:
select * from
(with dataSet as (select 1 idV, 1 bulkid, 0 orderV, 'content1' dataV, 'FRAGMENT' typeV from dual union
select 2, 1, 1, 'content2', 'BULK' from dual union
select 3, 1, 3, 'content4', 'FRAGMENT' from dual union
select 4, 1, 2, 'content3', 'FRAGMENT' from dual)
select typeV, listagg(dataSet.dataV ,',') WITHIN GROUP (ORDER BY orderV) OVER (PARTITION BY typeV) dataV from dataSet)
pivot
(
max(dataV)
for typeV in ('BULK', 'FRAGMENT')
)
Output:
BULK     | FRAGMENT
-------------------------------------
content2 | content1,content3,content4
The important things here:
OVER (PARTITION BY typeV): this acts like a GROUP BY for the listagg, concatenating everything that has the same typeV.
for typeV in ('BULK', 'FRAGMENT'): this gathers the data for BULK and FRAGMENT and produces a separate column for each.
max(dataV): this simply provides an aggregate function, otherwise pivot won't work.

Grouping in SQL Hierarchy

I'm still new to SQL and my question here is kinda long. OK, here it is: my task is to calculate the total downtime of machines situated in a particular location. Each machine has a parent, children and grandchildren. For example:
Location:A1
Machine no:A1-100, A1-100-01, A1-100-01-001, A1-200, A1-200-01
(A1-100-01, A1-100-01-001 belongs to A1-100) and (A1-200-01 belongs to A1-200)
This is my SQL query:
select machine_no, downtime from table_name where location='A1'
The output is:
machine_no downtime
A1-100-01 2
A1-100 1.5
A1-200 3
A1-100-01-001 0.5
A1-200-01 1.5
My question is how do I group the children and grandchildren to their parent and display the total downtime of that group? I'm sorry if the question is confusing but basically I want the output to be like this:
machine_no total_downtime
A1-100 4 (total of A1-100,A1-100-01,A1-100-01-001)
A1-200 4.5 (total of A1-200,A1-200-01)
Thank you.
Try the following query:
SELECT machine_no, SUM(downtime) as total_downtime
FROM (
SELECT
SUBSTR(machine_no, 1,
CASE WHEN INSTR(machine_no, '-', 1, 2) = 0 THEN LENGTH(machine_no) ELSE INSTR(machine_no, '-', 1, 2)-1 END
) as machine_no, -- this will get the part of machine_no before the second '-' char
downtime
FROM MyTable
WHERE location='A1'
) InnerQuery
GROUP BY machine_no
output:
machine_no total_downtime
A1-100 4
A1-200 4.5
You don't actually need the inner query, but it's more readable than grouping by the SUBSTR(....) expression.
Play with it yourself on sql fiddle
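If it helps to see the grouping logic outside SQL, here is a small Python sketch of the same idea: take everything before the second '-' as the parent and sum downtime per parent. It uses the sample rows from the question, with the last machine taken to be A1-200-01 so that it matches the expected output; parent_of is a hypothetical helper name.

```python
from collections import defaultdict


def parent_of(machine_no):
    """Return the part of machine_no before the second '-' (the parent)."""
    return "-".join(machine_no.split("-")[:2])


# Sample rows from the question, assuming the last machine is A1-200-01
rows = [
    ("A1-100-01", 2.0),
    ("A1-100", 1.5),
    ("A1-200", 3.0),
    ("A1-100-01-001", 0.5),
    ("A1-200-01", 1.5),
]

totals = defaultdict(float)
for machine_no, downtime in rows:
    totals[parent_of(machine_no)] += downtime

for machine_no in sorted(totals):
    print(machine_no, totals[machine_no])
```

A machine number with only one '-' is already a parent, so it is returned unchanged, just like the CASE branch in the query.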
You can use GROUP BY along with SUM, like this:
select machine_no, sum(downtime) from table_name where location like 'A1-100%' group by machine_no;

Trouble with SQL UNION operation

I have the following table:
I am trying to create an SQL query that returns three fields:
Year (ActionDate), Count of Built (actiontype = 12), Count of Lost (actiontype = a few different ones)
Basically, ActionType is a lookup code. So, I'd get back something like:
YEAR CountofBuilt CountofLost
1905 30 18
1929 12 99
1940 60 1
etc....
I figured this would take two SELECT statements put together with a UNION.
I tried the following, but it only returns two columns (year and countBuilt). My countLost field doesn't appear.
My sql currently (MS Access):
SELECT tblHist.ActionDate, Count(tblHist.ActionDate) as countBuilt
FROM ...
WHERE ((tblHist.ActionType)=12)
GROUP BY tblHist.ActionDate
UNION
SELECT tblHist.ActionDate, Count(tblHist.ActionDate) as countLost
FROM ...
WHERE (((tblHist.ActionType)<>2) AND
((tblHist.ActionType)<>3))
GROUP BY tblHist.ActionDate;
Use:
SELECT h.actiondate,
SUM(IIF(h.actiontype = 12, 1, 0)) AS numBuilt,
SUM(IIF(h.actiontype NOT IN (2,3), 1, 0)) AS numLost
FROM tblHist h
GROUP BY h.actiondate
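The UNION version only shows two columns because UNION stacks the rows of both SELECTs on top of each other and takes the column names from the first one; it never adds columns side by side. Conditional aggregation puts both counts in one row per year. Below is a minimal sketch of that in Python, assuming an in-memory SQLite database in place of MS Access, made-up sample rows, and CASE WHEN instead of IIF.

```python
import sqlite3

# Hypothetical sample data; SQLite in memory stands in for MS Access
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblHist (ActionDate INTEGER, ActionType INTEGER)")
conn.executemany(
    "INSERT INTO tblHist VALUES (?, ?)",
    [(1905, 12), (1905, 12), (1905, 5), (1929, 12), (1929, 2), (1929, 7)],
)

# Same conditions as the query above, so type 12 rows also count as "lost"
query = """
SELECT ActionDate,
       SUM(CASE WHEN ActionType = 12 THEN 1 ELSE 0 END) AS numBuilt,
       SUM(CASE WHEN ActionType NOT IN (2, 3) THEN 1 ELSE 0 END) AS numLost
FROM tblHist
GROUP BY ActionDate
ORDER BY ActionDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each row comes back as (ActionDate, numBuilt, numLost), which is the three-column shape the question asks for.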
You should not use UNION for such queries. There are many ways to do what you want, for example (updated to fit Access syntax):
SELECT tblHist.ActionDate,
COUNT(SWITCH(tblHist.ActionType = 12,1)) as countBuilt,
COUNT(SWITCH(tblHist.ActionType <>1 AND tblHist.ActionType <>2 AND ...,1)) as countLost
FROM ..
WHERE ....
GROUP BY tblHist.ActionDate