BigQuery: How to create integer partitioned table via DML? - google-bigquery

I'm trying to understand how integer-partitioned tables work. So far, however, I haven't been able to create one.
What is wrong with this query?
#standardSQL
CREATE OR REPLACE TABLE temp.test_int_partition
PARTITION BY RANGE_BUCKET(id, GENERATE_ARRAY(0,100))
OPTIONS(
description="test int partition"
)
as
WITH data as (
SELECT 12 as id, 'Alex' as name
UNION ALL
SELECT 23 as id, 'Chimp' as name
)
SELECT *
from data
I'm getting this error:
Error: PARTITION BY expression must be DATE(<timestamp_column>), a DATE column, or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>, <int64_value>))

The issue is that although GENERATE_ARRAY is documented as GENERATE_ARRAY(start_expression, end_expression [, step_expression]), meaning step_expression is optional, the step is mandatory when the array is used inside RANGE_BUCKET for partitioning.
So the following will work:
#standardSQL
CREATE OR REPLACE TABLE temp.test_int_partition
PARTITION BY RANGE_BUCKET(id, GENERATE_ARRAY(0,100,1))
OPTIONS(
description="test int partition"
)
as
WITH data as (
SELECT 12 as id, 'Alex' as name
UNION ALL
SELECT 23 as id, 'Chimp' as name
)
SELECT *
from data
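For intuition, RANGE_BUCKET(id, GENERATE_ARRAY(0, 100, 1)) assigns each row the bucket equal to the number of boundaries less than or equal to id. A rough Python sketch of that bucketing rule (my own illustration of the documented semantics, not BigQuery code):

```python
import bisect

def range_bucket(value, boundaries):
    """Mimic BigQuery's RANGE_BUCKET: return the count of boundaries
    <= value, so values below the first boundary fall in bucket 0 and
    values at or past the last boundary fall in the overflow bucket."""
    return bisect.bisect_right(boundaries, value)

boundaries = list(range(0, 101))      # GENERATE_ARRAY(0, 100, 1)
print(range_bucket(12, boundaries))   # 13  (id = 12 from the example)
print(range_bucket(23, boundaries))   # 24  (id = 23 from the example)
print(range_bucket(-5, boundaries))   # 0   (below the declared range)
```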

Related

Apply partitioning over a list in bigquery

I partitioned the fact table on the country column using the FARM_FINGERPRINT function: ABS(MOD(FARM_FINGERPRINT(COUNTRY), 4000)). Now, based on a different table (table1) that contains the list of countries for each zone (for example, the zone 'Europe' contains 'France', 'Germany', 'Espagne'), I want to run a query that finds the list of countries inside a given zone and then uses that list in a WHERE clause so that partition pruning kicks in (to avoid a full scan). But when I run this query, pruning is not applied:
WITH
step1 AS (
SELECT
ARRAY_AGG(ABS(MOD(FARM_FINGERPRINT((MULTIDIVISION_CLUSTER_CODE)),4000))) AS list
FROM (
SELECT
DISTINCT(MULTIDIVISION_CLUSTER_CODE) AS MULTIDIVISION_CLUSTER_CODE
FROM
`project.dataset.table1` table1
WHERE
table1.MULTIDIVISION_ZONE = "Europe" )),
step2 AS(
SELECT
*
FROM
`project.dataset.table2`
WHERE
_hash_partition IN UNNEST((select list from step1))
)
SELECT
*
FROM
step2
For reference: if I replace _hash_partition IN UNNEST((select list from step1)) with _hash_partition IN (2591,287,3623,1537) or _hash_partition IN UNNEST([2591,287,3623,1537]), it works (the query does not do a full scan).
Table1:
(zone , country)
Table2:
(date, zone, country, _hash_partition, mesure)
You may try the dynamic SQL below. The FORMAT function will generate the same query as the one you said works.
-- simplified query for *step1* CTE.
CREATE TEMP TABLE step1 AS SELECT [2591,287,3623,1537] AS list;
EXECUTE IMMEDIATE FORMAT("""
WITH step2 AS (
SELECT
*
FROM
`project.dataset.table2`
WHERE
_hash_partition IN UNNEST(%s)
)
SELECT * FROM step2;
""", (SELECT FORMAT('%t', list) FROM step1));
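If dynamic SQL feels heavy, another option is to compute the bucket list in the client and inline the values as literals, which is exactly the form you confirmed gets pruned. A rough Python sketch (zlib.crc32 is only a stand-in for FARM_FINGERPRINT, not the same hash; the table name is the one from the question):

```python
import zlib

NUM_BUCKETS = 4000

def bucket(country: str) -> int:
    # Stand-in for ABS(MOD(FARM_FINGERPRINT(country), 4000));
    # crc32 is used here only so the sketch is runnable.
    return zlib.crc32(country.encode("utf-8")) % NUM_BUCKETS

countries = ["France", "Germany", "Espagne"]
buckets = sorted({bucket(c) for c in countries})

# Inline the literal list so the engine can prune partitions.
sql = (
    "SELECT * FROM `project.dataset.table2` "
    f"WHERE _hash_partition IN ({', '.join(map(str, buckets))})"
)
print(sql)
```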

SQL returns SQL as a result. How do I run the returned SQL?

I'm using the crosstab method suggested here to dynamically pivot an hstore. Is there any way I can immediately run the SQL that is returned, or is there a way to have a function call the original crosstab query and then run the generated SQL, so the pivoted table comes back as the only result instead of my running two queries?
So when I run this query:
SELECT format(
$s$SELECT * FROM crosstab(
$$SELECT h.id, kv.*
FROM hstore_test h, each(hstore_col) kv
ORDER BY 1, 2$$
, $$SELECT unnest(%L::text[])$$
) AS t(id int, %s text);
$s$
, array_agg(key) -- escapes strings automatically
, string_agg(quote_ident(key), ' text, ') -- needs escaping!
) AS sql
FROM (
SELECT DISTINCT key
FROM hstore_test, skeys(hstore_col) key
ORDER BY 1
) sub;
it returns a result that looks like this:
SELECT * FROM crosstab(
$$SELECT h.id, kv.*
FROM hstore_test h
LEFT JOIN LATERAL each(hstore_col) kv ON TRUE
ORDER BY 1, 2$$
, $$SELECT unnest('{key1,key2,key3}'::text[])$$
) AS t(id int, key1 text, key2 text, key3 text);
What I want, either with a function or with another query wrapped around the first one, is to return the results of the second query and use that returned data to build a materialized view.
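Plain SQL cannot execute a string as a query, so in PostgreSQL the second step normally happens in a PL/pgSQL function or DO block via EXECUTE, or in the client. The two-step generate-then-execute pattern looks like this when driven from a client; a minimal sketch using Python and sqlite3 (table and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kv (id INTEGER, key TEXT, value TEXT);
    INSERT INTO kv VALUES
        (1, 'key1', 'a'), (1, 'key2', 'b'),
        (2, 'key1', 'c'), (2, 'key3', 'd');
""")

# Step 1: run the query that *builds* the pivot SQL from the
# distinct keys (the role crosstab's column list plays).
keys = [r[0] for r in conn.execute("SELECT DISTINCT key FROM kv ORDER BY 1")]
cols = ", ".join(
    f"MAX(CASE WHEN key = '{k}' THEN value END) AS {k}" for k in keys
)
pivot_sql = f"SELECT id, {cols} FROM kv GROUP BY id ORDER BY id"

# Step 2: execute the generated SQL.
rows = conn.execute(pivot_sql).fetchall()
print(rows)  # [(1, 'a', 'b', None), (2, 'c', None, 'd')]
```

In PostgreSQL itself, the same shape works with EXECUTE inside a PL/pgSQL function, which could then refresh or create your materialized view.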

How to query hugeblob data

I want to query a huge BLOB attribute in a table. I have tried the statement below, but it doesn't return any data:
select DBMS_LOB.substr(mydata, 1000,1) from mytable;
Is there any other way to do this?
DBMS_LOB.substr() is the right function to use. Ensure that there is data in the column.
Example usage:
-- create table
CREATE TABLE myTable (
id INTEGER PRIMARY KEY,
blob_column BLOB
);
-- insert a couple of rows
insert into myTable values(1,utl_raw.cast_to_raw('a long data item here'));
insert into myTable values(2,null);
-- select rows
select id, blob_column from myTable;
ID BLOB_COLUMN
1 (BLOB)
2 null
-- select rows
select id, DBMS_LOB.substr(blob_column, 1000,1) from myTable;
ID DBMS_LOB.SUBSTR(BLOB_COLUMN,1000,1)
1 61206C6F6E672064617461206974656D2068657265
2 null
-- select rows
select id, UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.substr(blob_column,1000,1)) from myTable;
ID UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(BLOB_COLUMN,1000,1))
1 a long data item here
2 null
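The hex output above is simply the bytes of the stored text; the final UTL_RAW.CAST_TO_VARCHAR2 step reinterprets them as characters. The same decoding can be done client-side; a small Python sketch, assuming the BLOB holds ASCII/UTF-8 text:

```python
# Hex string as returned by DBMS_LOB.substr on the BLOB column.
raw_hex = "61206C6F6E672064617461206974656D2068657265"

# Equivalent of UTL_RAW.CAST_TO_VARCHAR2: bytes -> text.
decoded = bytes.fromhex(raw_hex).decode("ascii")
print(decoded)  # a long data item here
```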

Is it possible to group all the related elements using a SQL statement

I want a SQL statement that can run on all platforms and return the result in a tree-structure-like format (not a real tree structure), i.e. all related rows appear together. Is it possible to achieve the following format using SQL? I have a simple table with three columns (GROUP_STEP, PREDECESSOR, COLUMNNUM).
Platforms supported: Oracle, SQL Server, DB2, and Sybase. I am looking for a SELECT statement that returns the following data in a different format.
Following @diaho's suggestion, here is the output:
select group_step, predecessor, count(group_step) as columnnum
from my_table  -- substitute your table name
group by group_step, predecessor
If you give more details on the table schema, the answer may change!
Based on your table, I'm assuming that you only have the Group_Step and Predecessor columns in your raw table and that the ColumnNum column represents the total number of levels for a leaf in your tree. For example, since PC_Wrap1 has the Predecessor BENEFITS which has the Predecessor COPY_BUDG it has a total of 3 'levels'. If that is the case, you need a recursive query to calculate the ColumnNum value. I can't speak to all platforms, but for SQL Server, you can use a CTE:
EDIT: Removed 'dreaded non-standard square brackets' per a_horse_with_no_name's suggestion :)
-- Setup table
CREATE TABLE #Temp
(
Group_Step VARCHAR(100),
Predecessor VARCHAR(100)
)
-- Setup dummy data
INSERT INTO #Temp
(
Group_Step,
Predecessor
)
SELECT 'ACT_BD_ACT', '' UNION
SELECT 'COPY_BUDG', '' UNION
SELECT 'COPY_BUDG2', '' UNION
SELECT 'BENEFITS', 'COPY_BUDG' UNION
SELECT 'BENEFITS', 'COPY_BUDG2' UNION
SELECT 'PC_WRAP1', 'BENEFITS' UNION
SELECT 'PC_WRAP2', 'BENEFITS' UNION
SELECT 'ALLC1', '' UNION
SELECT 'ALLC2', '' UNION
SELECT 'ALLC3', 'ALLC2' UNION
SELECT 'TCP1', 'ALLC3' UNION
SELECT 'TCP1', 'ALLC4' UNION
SELECT 'COPY_BUDG3', '' UNION
SELECT 'COPY_BUDG4', '';
-- Actual solution starts here:
WITH Result
(
Group_Step,
Predecessor,
ColumnNum
)
AS
(
-- Anchor member definition
SELECT
Group_Step,
Predecessor AS Predecessor,
1 AS ColumnNum
FROM
#Temp
WHERE
Predecessor = ''
UNION ALL
-- Recursive member definition
SELECT
t.Group_Step,
t.Predecessor,
ColumnNum + 1 AS ColumnNum
FROM
#Temp AS t
JOIN
Result AS r
ON
t.Predecessor = r.Group_Step
)
-- Statement that executes the CTE
SELECT DISTINCT
Group_Step,
Predecessor,
ColumnNum
FROM
Result
-- EDIT #2: Adding ORDER BY per Op's comment
ORDER BY
ColumnNum
DROP TABLE #Temp
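Recursive CTEs of this anchor/recursive shape are not SQL Server-only (Oracle 11gR2+ and DB2 support them too, though availability varies across the listed platforms). To experiment with the level computation without a server, the same pattern runs under SQLite's WITH RECURSIVE; a small Python sketch with a reduced data set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE steps (group_step TEXT, predecessor TEXT);
    INSERT INTO steps VALUES
        ('COPY_BUDG', ''), ('BENEFITS', 'COPY_BUDG'),
        ('PC_WRAP1', 'BENEFITS');
""")

rows = conn.execute("""
    WITH RECURSIVE result(group_step, predecessor, columnnum) AS (
        -- anchor: roots have no predecessor
        SELECT group_step, predecessor, 1 FROM steps WHERE predecessor = ''
        UNION ALL
        -- recursive: each child sits one level below its predecessor
        SELECT s.group_step, s.predecessor, r.columnnum + 1
        FROM steps s JOIN result r ON s.predecessor = r.group_step
    )
    SELECT group_step, columnnum FROM result ORDER BY columnnum
""").fetchall()
print(rows)  # [('COPY_BUDG', 1), ('BENEFITS', 2), ('PC_WRAP1', 3)]
```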

Conversion failed when converting the varchar value to int

On Microsoft SQL Server 2008 (SP1), I'm getting an unexpected 'Conversion failed' error.
Not quite sure how to describe this problem, so below is a simple example. The CTE extracts the numeric portion of certain IDs using a search condition to ensure a numeric portion actually exists. The CTE is then used to find the lowest unused sequence number (kind of):
CREATE TABLE IDs (ID CHAR(3) NOT NULL UNIQUE);
INSERT INTO IDs (ID) VALUES ('A01'), ('A02'), ('A04'), ('ERR');
WITH ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
The error is, 'Conversion failed when converting the varchar value 'RR' to data type int.'
I can't understand why the value ID = 'ERR' should be considered for conversion at all, because the predicate ID LIKE 'A[0-9][0-9]' should have removed the invalid row from the result set.
When the base table is substituted with an equivalent CTE, the problem goes away, i.e.
WITH IDs (ID)
AS
(
SELECT 'A01'
UNION ALL
SELECT 'A02'
UNION ALL
SELECT 'A04'
UNION ALL
SELECT 'ERR'
),
ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
Why would a base table cause this error? Is this a known issue?
UPDATE @sgmoore: no, doing the filtering in one CTE and the casting in another CTE still results in the same error, e.g.
WITH FilteredIDs (ID)
AS
(
SELECT ID
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
),
ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM FilteredIDs
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
It's a bug, and it has already been reported by Erland Sommarskog as "SQL Server should not raise illogical errors" (as I said, it's hard to describe this one!).
The response from the SQL Server Programmability Team is: "the issue is that SQL Server raises errors [too] eagerly due to pushing of predicates/expressions during query execution without considering the logical result of the query."
I've now voted for a fix; everyone please do the same :)
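Until a fix ships, one workaround that does not depend on where the optimizer pushes predicates is to guard the cast with CASE, which is generally guaranteed to attempt the conversion only on matching rows. A sketch of the pattern (run through Python's sqlite3 only so it is executable here; GLOB stands in for T-SQL's LIKE with character classes, and note SQLite would not raise the original error anyway):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ids (id TEXT NOT NULL UNIQUE);
    INSERT INTO ids VALUES ('A01'), ('A02'), ('A04'), ('ERR');
""")

# The CASE guard makes the cast conditional row by row, so the
# engine never evaluates CAST on a non-matching row regardless of
# predicate pushdown.
rows = conn.execute("""
    WITH valid_ids(id, seq) AS (
        SELECT id,
               CASE WHEN id GLOB 'A[0-9][0-9]'
                    THEN CAST(substr(id, 2) AS INTEGER)
               END
        FROM ids
        WHERE id GLOB 'A[0-9][0-9]'
    )
    SELECT MIN(v1.seq) + 1 AS next_seq
    FROM valid_ids v1
    WHERE NOT EXISTS (
        SELECT 1 FROM valid_ids v2 WHERE v2.seq = v1.seq + 1
    )
""").fetchone()
print(rows)  # (3,)
```

In T-SQL the same guard would be CAST(CASE WHEN ID LIKE 'A[0-9][0-9]' THEN RIGHT(ID, 2) END AS INTEGER).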
What if you replace the section
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
With
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM
(
SELECT ID FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
) AS Filtered
This happened to me because I did a UNION and was not careful to make sure both queries had their fields in the same order. Once I fixed that, it was fine.