Apply partitioning over a list in BigQuery - google-bigquery

I partitioned the fact table on the country column using the FARM_FINGERPRINT function: ABS(MOD(FARM_FINGERPRINT(COUNTRY), 4000)). Now, based on a different table (table1) that contains the list of countries for each zone (for example, 'Europe' contains 'France', 'Germany', 'Espagne'), I want to run a query that finds the list of countries inside a given zone and filters with a WHERE clause on that list, so that partition pruning applies (to avoid a full scan). But when I run this query, partition pruning is not applied:
WITH
step1 AS (
SELECT
ARRAY_AGG(ABS(MOD(FARM_FINGERPRINT((MULTIDIVISION_CLUSTER_CODE)),4000))) AS list
FROM (
SELECT
DISTINCT(MULTIDIVISION_CLUSTER_CODE) AS MULTIDIVISION_CLUSTER_CODE
FROM
`project.dataset.table1` table1
WHERE
table1.MULTIDIVISION_ZONE = "Europe" )),
step2 AS(
SELECT
*
FROM
`project.dataset.table2`
WHERE
_hash_partition IN UNNEST((select list from step1))
)
SELECT
*
FROM
step2
For information, if I replace "_hash_partition IN UNNEST((select list from step1))" with "_hash_partition IN (2591,287,3623,1537)" or "_hash_partition IN UNNEST(([2591,287,3623,1537]))", it works (the query does not do a full scan).
Table1:
(zone , country)
Table2:
(date, zone, country, _hash_partition, mesure)

You can try the dynamic SQL below. The FORMAT function generates the same query text as the literal version you said works.
-- simplified query for *step1* CTE.
CREATE TEMP TABLE step1 AS SELECT [2591,287,3623,1537] AS list;
EXECUTE IMMEDIATE FORMAT("""
WITH step2 AS (
SELECT
*
FROM
`project.dataset.table2`
WHERE
_hash_partition IN UNNEST(%s)
)
SELECT * FROM step2;
""", (SELECT FORMAT('%t', list) FROM step1));
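The dynamic SQL above works by inlining the hash values as constants in the query text before it runs. As a rough sketch of the same idea from a client script, the string can be built in Python and then submitted with whatever client you use. The function name `build_partition_query` is hypothetical, and the table and column names are simply copied from the question.

```python
def build_partition_query(hashes):
    """Return query text with the hash list inlined as an array literal."""
    # Only plain integers are inlined, so nothing unexpected ends up in the SQL.
    if not all(isinstance(h, int) and not isinstance(h, bool) for h in hashes):
        raise TypeError("hashes must be integers")
    literal = "[" + ", ".join(str(h) for h in hashes) + "]"
    return (
        "SELECT * FROM `project.dataset.table2` "
        f"WHERE _hash_partition IN UNNEST({literal})"
    )


# Values taken from the question; in practice they would come from table1.
query = build_partition_query([2591, 287, 3623, 1537])
print(query)
```

This means two round trips (one to fetch the list, one to run the generated query), which is the same trade-off that EXECUTE IMMEDIATE makes inside a single script.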

Related

Split single row value to multiple rows in Snowflake

I have a table where the column data has a combination of values separated by ';'. I would like to split them into rows for each column value.
Table data
Now I would like to split them into multiple rows for each value like
I have tried using the below SQL statement.
SELECT DISTINCT COL_NAME FROM "DB"."SCHEMA"."TABLE",
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';'))
But the output is not as expected. Attaching the query output below.
Basically the query does nothing to my data.
It could be achieved using SPLIT_TO_TABLE table function:
This table function splits a string (based on a specified delimiter) and flattens the results into rows.
SELECT *
FROM tab, LATERAL SPLIT_TO_TABLE(column_name, ';')
I was able to resolve this by using the LATERAL table function like a joined table and selecting the value from it.
SELECT DISTINCT A.VALUE AS COL_NAME
FROM "DB"."SCHEMA"."TABLE",
LATERAL SPLIT_TO_TABLE(COL_NAME,';')A
It looks like your data has multiple delimiters. We can use the STRTOK_SPLIT_TO_TABLE function, which accepts multiple delimiters.
STRTOK_SPLIT_TO_TABLE
WITH data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News Washington, D.C. Roanoke-Lynchburg Richmond-Petersburg')
v( cities))
select *
from data, lateral strtok_split_to_table(cities, ';-')
order by seq, index;
Result:
Your first attempt was very close; you just need to access the output of the FLATTEN instead of its input.
So, using this CTE for data:
WITH fake_data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg'),
('Knoxville'),
('Knoxville;Memphis;Nashville')
v( COL_NAME)
)
If you had aliased your tables and accessed the parts, it would have worked:
SELECT DISTINCT f.value::text as col_name
FROM fake_data d,
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';')) f
;
which is what you did in your provided answer, but via SPLIT_TO_TABLE
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(SPLIT_TO_TABLE(COL_NAME,';')) f
;
STRTOK_SPLIT_TO_TABLE also is the same thing:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(strtok_split_to_table(COL_NAME,';')) f
;
This can also be done by applying FLATTEN to the result of STRTOK_TO_ARRAY:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(FLATTEN(input=>STRTOK_TO_ARRAY(COL_NAME,';'))) f
;
COL_NAME
Greensboro-High Point-Winston-Salem
Norfolk-Portsmouth-Newport News
Washington, D.C.
Roanoke-Lynchburg
Richmond-Petersburg
Knoxville
Memphis
Nashville
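To make the semantics of the queries above concrete, here is a small Python sketch of what "split each row on ';', flatten, and keep distinct values" computes. It only mirrors the logic and is not Snowflake code; the helper name `split_distinct` is made up for illustration, and the input rows are the ones from the fake_data CTE.

```python
rows = [
    "Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;"
    "Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg",
    "Knoxville",
    "Knoxville;Memphis;Nashville",
]


def split_distinct(values, delimiter=";"):
    """Split every value on the delimiter and keep the first occurrence of each part."""
    seen = {}  # a dict preserves insertion order, so it doubles as an ordered set
    for value in values:
        for part in value.split(delimiter):
            seen.setdefault(part, None)
    return list(seen)


result = split_distinct(rows)
for name in result:
    print(name)
```

Note that SQL DISTINCT does not guarantee any row order, whereas this sketch keeps first-seen order purely for readability.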

How to modify data of select query before inserting from one table to another table

I have 2 tables Temp and Final. I want to insert data from temp table to final table. There is already a complex query written for that.
INSERT INTO public.Final
(
select id, class, type, meta_id, time, zone, geom
from public.Temp where ....
)
Now I want to add further criteria on the geometry: I want to merge polygons and then remove overlapping geometries. I have 2 separate queries written for those tasks. I can't combine them into one select, as it is already complex.
These queries I want to apply before inserting data into final table. Is it possible that output of one select query goes to input of another select query?
INSERT INTO public.Final
(
/*step 3 final output */
select non overlapping geometries where ...
(/*step 2*/
select merged geometries
where ...
(/*step 1*/select valid geometries where ...)
)
)
If you could give me any example on how to do it, that would be great! thanks.
Yes, there are multiple ways, but more detail is needed to tell which one suits your case:
Using JOIN:
SELECT *
FROM (SELECT * FROM COMPLEXQUERY1) AS Q1
JOIN (SELECT * FROM COMPLEXQUERY2) AS Q2
ON Q1.ID = Q2.ID
JOIN (SELECT * FROM COMPLEXQUERY3) AS Q3
ON Q2.ID = Q3.ID
Using IN and/or EXISTS:
SELECT *
FROM (SELECT * FROM COMPLEXQUERY1) AS Q1
WHERE EXISTS
(
SELECT * FROM COMPLEXQUERY2 Q2
WHERE Q1.ID = Q2.ID
AND Q2.ID2 IN
(
SELECT id from COMPLEXQUERY3
)
)
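Another way to feed the output of one select into the next is to chain common table expressions (CTEs), one per step, and insert from the last one. The sketch below runs against an in-memory SQLite database from Python so that it is self-contained. The tables `temp_t` and `final_t`, the `area` column, and the filters are all stand-ins invented for illustration (there is no geometry here). PostgreSQL also accepts a WITH clause in front of INSERT, but treat the exact syntax for your setup as something to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_t (id INTEGER, class TEXT, area REAL);
    CREATE TABLE final_t (class TEXT, total_area REAL);
    INSERT INTO temp_t VALUES
        (1, 'a', 10.0), (2, 'a', 5.0), (3, 'b', -1.0), (4, 'b', 7.0);
""")

# Each CTE reads from the previous one, mirroring step 1 -> step 2 -> step 3.
conn.execute("""
    WITH step1 AS (   -- step 1: keep only valid rows
        SELECT id, class, area FROM temp_t WHERE area > 0
    ),
    step2 AS (        -- step 2: merge rows per class
        SELECT class, SUM(area) AS total_area FROM step1 GROUP BY class
    ),
    step3 AS (        -- step 3: final filter on the merged rows
        SELECT class, total_area FROM step2 WHERE total_area >= 10
    )
    INSERT INTO final_t (class, total_area)
    SELECT class, total_area FROM step3
""")

rows = conn.execute(
    "SELECT class, total_area FROM final_t ORDER BY class"
).fetchall()
print(rows)
```

Compared with nesting subqueries three levels deep, the CTE form keeps each existing query as a separately named block, which may make it easier to reuse the queries you have already written.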

SQL returns SQL as result. How do i run the returned SQL?

I am using the crosstab method suggested here to dynamically pivot an hstore. Is there any way I can immediately run the SQL it returns, or is there a way to have a function that calls the original crosstab query and then runs the generated SQL, returning the pivoted table as the only result instead of running two queries?
So when I run this query:
SELECT format(
$s$SELECT * FROM crosstab(
$$SELECT h.id, kv.*
FROM hstore_test h, each(hstore_col) kv
ORDER BY 1, 2$$
, $$SELECT unnest(%L::text[])$$
) AS t(id int, %s text);
$s$
, array_agg(key) -- escapes strings automatically
, string_agg(quote_ident(key), ' text, ') -- needs escaping!
) AS sql
FROM (
SELECT DISTINCT key
FROM hstore_test, skeys(hstore_col) key
ORDER BY 1
) sub;
it returns a result that looks like this:
SELECT * FROM crosstab(
$$SELECT h.id, kv.*
FROM hstore_test h
LEFT JOIN LATERAL each(hstore_col) kv ON TRUE
ORDER BY 1, 2$$
, $$SELECT unnest('{key1,key2,key3}'::text[])$$
) AS t(id int, key1 text, key2 text, key3 text);
What I want to do, either with a function or with another query wrapped around the first one, is return the results of the second query and use that returned data to build a materialized view.

BigQuery: How to create integer partitioned table via DML?

I am trying to understand how integer-partitioned tables work. So far, however, I have not been able to create one.
What is wrong with this query:
#standardSQL
CREATE or Replace TABLE temp.test_int_partition
PARTITION BY RANGE_BUCKET(id, GENERATE_ARRAY(0,100))
OPTIONS(
description="test int partition"
)
as
WITH data as (
SELECT 12 as id, 'Alex' as name
UNION ALL
SELECT 23 as id, 'Chimp' as name
)
SELECT *
from data
I'm getting this error:
Error: PARTITION BY expression must be DATE(<timestamp_column>), a DATE column, or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>, <int64_value>))
The issue is that although GENERATE_ARRAY is documented as GENERATE_ARRAY(start_expression, end_expression [, step_expression]), meaning step_expression is optional, the step is mandatory when it is used with RANGE_BUCKET here.
So the following will work:
#standardSQL
CREATE or Replace TABLE temp.test_int_partition
PARTITION BY RANGE_BUCKET(id, GENERATE_ARRAY(0,100,1))
OPTIONS(
description="test int partition"
)
as
WITH data as (
SELECT 12 as id, 'Alex' as name
UNION ALL
SELECT 23 as id, 'Chimp' as name
)
SELECT *
from data
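For intuition about which partition each row lands in, the following Python sketch mimics RANGE_BUCKET as I understand its documented behavior: it returns the number of boundaries that are less than or equal to the point. This is an illustration only; the function `range_bucket` is a stand-in and ignores NULL handling.

```python
from bisect import bisect_right


def range_bucket(point, boundaries):
    """Return how many boundaries are <= point (boundaries must be sorted)."""
    return bisect_right(boundaries, point)


# GENERATE_ARRAY(0, 100, 1) includes both endpoints: 0, 1, ..., 100.
boundaries = list(range(0, 101, 1))

# The two ids from the example table.
buckets = {row_id: range_bucket(row_id, boundaries) for row_id in (12, 23)}
print(buckets)
```

With a step of 1, every distinct id between 0 and 99 gets its own bucket; a larger step would group several ids into one partition.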

Converting a pivot table to a flat table in SQL

I would like to transform a pivot table into a flat table, but in the following fashion: consider the simple example of this table:
As you can see, for each item (Address or Income), we have a column for old values and a column for new (updated) values. I would like to convert the table to a "flat" table, looking like:
Is there an easy way of doing that?
Thank you for your help!
To get this result, you will need to UNPIVOT the data. Unpivoting converts multiple columns into multiple rows, so the datatypes of the unpivoted values must be the same.
I would use CROSS APPLY to unpivot the columns in pairs:
select t.employee_id,
t.employee_name,
c.data,
c.old,
c.new
from yourtable t
cross apply
(
values
('Address', Address_Old, Address_new),
('Income', cast(income_old as varchar(15)), cast(income_new as varchar(15)))
) c (data, old, new);
See SQL Fiddle with demo. As you can see this uses a cast on the income columns because I am guessing it is a different datatype from the address. Since the final result will have these values in the same column the data must be of the same type.
This can also be written using CROSS APPLY with UNION ALL:
select t.employee_id,
t.employee_name,
c.data,
c.old,
c.new
from yourtable t
cross apply
(
select 'Address', Address_Old, Address_new union all
select 'Income', cast(income_old as varchar(15)), cast(income_new as varchar(15))
) c (data, old, new)
See Demo
select employee_id,employee_name,data,old,new
from (
select employee_id,employee_name,adress_old as old,adress_new as new,'ADRESS' as data
from employe
union
select employee_id,employee_name,income_old,income_new,'INCOME'
from employe
) data
order by employee_id,data
See this fiddle demo: http://sqlfiddle.com/#!2/64344/7/0
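The union-based unpivot above can be tried locally with a small Python script that uses an in-memory SQLite database. The sample rows are made up for illustration, the column aliases `item`, `old_value`, and `new_value` are my own choice, and the income columns are cast to text so both branches of the union have the same type, as the first answer recommends. UNION ALL is used here instead of UNION because the two branches cannot produce duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employe (
        employee_id INTEGER, employee_name TEXT,
        adress_old TEXT, adress_new TEXT,
        income_old INTEGER, income_new INTEGER
    );
    -- made-up sample rows
    INSERT INTO employe VALUES
        (1, 'Ann', 'Old St 1', 'New St 2', 1000, 1200),
        (2, 'Bob', 'Elm Rd 5', 'Oak Rd 9', 2000, 2100);
""")

# One SELECT per column pair; each contributes one output row per employee.
rows = conn.execute("""
    SELECT employee_id, employee_name, item, old_value, new_value
    FROM (
        SELECT employee_id, employee_name, 'ADRESS' AS item,
               adress_old AS old_value, adress_new AS new_value
        FROM employe
        UNION ALL
        SELECT employee_id, employee_name, 'INCOME',
               CAST(income_old AS TEXT), CAST(income_new AS TEXT)
        FROM employe
    ) AS unpivoted
    ORDER BY employee_id, item
""").fetchall()

for row in rows:
    print(row)
```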