Partition key failing error for Postgres table - SQL

ERROR: no partition of relation "test_table" found for row
DETAIL: Partition key of the failing row contains (start_time) = (2021-04-25 00:00:00).
SQL state: 23514
I am inserting data where I have a column start_time with the value 2021-04-25 00:00:00.
This is my schema:
CREATE TABLE test_table (
start_time timestamp NULL
)
PARTITION BY RANGE (start_time);

This sounds as if you have no partitions defined for this table.
You might need something like this:
CREATE TABLE test_table_2021 PARTITION OF test_table
FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
After you have defined this partition for your partitioned table, you should be able to insert the data (as long as start_time falls anywhere in 2021).
See the docs: https://www.postgresql.org/docs/current/ddl-partitioning.html
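If you cannot predict the value ranges in advance, one alternative sketch (available since PostgreSQL 11) is a DEFAULT partition that catches any row not matching an existing partition:

```sql
-- Catch-all partition for rows outside every defined range
-- (the partition name is an assumption)
CREATE TABLE test_table_default PARTITION OF test_table DEFAULT;
```

With this in place, inserts for dates outside 2021 land in the default partition instead of failing, though you lose the pruning benefit of a dedicated range partition for those rows.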

Related

Recreate table from a select and add an extra datetime default column (Snowflake)

I'm having problems creating a table that should be pretty straightforward. The SQL code (Snowflake) is:
create or replace table bank_raw as
select
*,
created_at datetime default current_timestamp()
from bank_raw;
My error is: Syntax error: unexpected 'DEFAULT'. (line 12).
I don't know how I can recreate this table and add this default timestamp column. By the way, I have already created multiple tables from scratch with created_at DateTime default current_timestamp().
Any ideas?
It is possible to define a column list when using CTAS:
Sample data:
CREATE TABLE bank_raw(id INT, col TEXT);
INSERT INTO bank_raw(id, col) VALUES (1, 'a'), (2,'b');
Query:
CREATE OR REPLACE TABLE bank_raw(id INT,
col TEXT,
created_at datetime default current_timestamp())
AS
SELECT
id, col, CURRENT_TIMESTAMP()
FROM bank_raw;
Verify the output:
SELECT * FROM bank_raw;
DESCRIBE TABLE bank_raw;
Since this is a DML operation, not a DDL operation, the DEFAULT keyword does not apply. You can simply remove it and instead project the column and name it:
create or replace table bank_raw as
select
*,
current_timestamp() as created_at
from bank_raw;
Edit: To enforce a default, you cannot alter a table to add a column with a default value, except for sequences. So you'd need to do something like this:
select get_ddl('table','BANK_RAW');
-- Copy and paste the DDL. Rename the new table,
-- and add the default timestamp:
create or replace table A
(
-- Existing columns here then:
created_at timestamp default current_timestamp
);
You can then do an INSERT from a SELECT on the table BANK_RAW. You'll need to specify a column list and omit the CREATED_AT column.
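As a sketch of that final step (assuming the original data was kept in a table named BANK_RAW_OLD, a hypothetical name, and that A has the columns id and col from the earlier sample):

```sql
-- CREATED_AT is omitted from the column list,
-- so its DEFAULT current_timestamp is applied per the table definition
INSERT INTO A (id, col)
SELECT id, col
FROM BANK_RAW_OLD;
```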

How do I create a table partitioned by date (yearly) in Google BigQuery

This is my data sample
{"userName":"sampleUserName","DateCreated":"1519302159.248"}
and this is how i attempted to create the table
CREATE TABLE dataSet.myTableName (userName string, DateCreated DATE, email string)
PARTITION BY DateCreated
OPTIONS(
description="a table partitioned by DateCreated"
)
but when I try to load the data from the command line from newline-delimited JSON, I get this error:
Invalid schema update. Field DateCreated has changed type from DATE to TIMESTAMP
The problem, I think, is that the DateCreated field is of type DATE, and I do not know how to make it a TIMESTAMP. The documentation says to use a partition_expression; how do I do that? The aim is to create a table partitioned by date (in my case by DateCreated), for example partitioned by year. How do I improve my query to achieve that? Any suggestions, or a pointer to an example or documentation, would be great.
You can use a CREATE TABLE statement with partitioning by the timestamp instead:
CREATE TABLE dataSet.myTableName
(
userName STRING,
DateCreated TIMESTAMP,
email STRING
)
PARTITION BY DATE(DateCreated)
OPTIONS(
description="a table partitioned by DateCreated"
)
The documentation says:
PARTITION BY DATE(<timestamp_column>) — partitions the table using the date of the TIMESTAMP column
If the intention is to partition by year, you have a couple of options:
Whenever you insert into the table, truncate the timestamp to the beginning of the year.
Just insert the timestamps without truncation, but when you query the table, filter by the start of the year, e.g. WHERE _PARTITIONTIME >= '2018-01-01' or WHERE _PARTITIONTIME >= '2016-01-01' AND _PARTITIONTIME < '2018-01-01'.
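A sketch of the first option, truncating the timestamp to the start of its year on insert (assuming the JSON is first loaded into a staging table named dataSet.staging_table, a hypothetical name):

```sql
-- TIMESTAMP_TRUNC(ts, YEAR) collapses every timestamp in a year
-- to YYYY-01-01 00:00:00 UTC, so each year maps to one partition
INSERT INTO dataSet.myTableName (userName, DateCreated, email)
SELECT userName, TIMESTAMP_TRUNC(DateCreated, YEAR), email
FROM dataSet.staging_table;
```

The trade-off is that the original timestamps are lost in the partitioned table, so keep them in a separate column if you still need them.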

Bulk Inserting data to table which have default current timestamp column

I have a table on redshift with following structure
CREATE TABLE schemaName.tableName (
some_id INTEGER,
current_time TIMESTAMP DEFAULT GETDATE()
);
If I bulk insert data from other table for example
INSERT INTO schemaName.tableName (some_id) SELECT id FROM otherSchema.otherTable;
Will the value of the current_time column be the same for all bulk-inserted rows, or will it depend on the insertion time of each record, given that the column data type is TIMESTAMP?
I am considering this for Amazon Redshift only.
So far I have tested by changing the default value of the current_time column to SYSDATE and bulk inserting 10 rows into the target table. The current_time values per row look like 2016-11-16 06:38:52.339208 and are the same for each row, whereas GETDATE() yields results like 2016-11-16 06:43:56. I haven't found any documentation regarding this and need confirmation.
To be precise, all rows get the same timestamp value after executing the following statement:
INSERT INTO schemaName.tableName (some_id) SELECT id FROM otherSchema.otherTable;
But if I change the table structure to following
CREATE TABLE schemaName.tableName (
some_id INTEGER,
current_time DOUBLE PRECISION DEFAULT RANDOM()
);
rows get different random values for current_time.
Yes, Redshift will use the same default value in the case of a bulk insert. The Redshift documentation says:
the evaluated DEFAULT expression for a given column is the same for
all loaded rows, a DEFAULT expression that uses a RANDOM() function
will assign the same value to all the rows.

Table partitioning with procedure input parameter

I'm trying to partition my table on an ID which I get from a procedure parameter.
For example my table ddl:
CREATE TABLE bigtable
( ID number )
As an input procedure parameter I get e.g. the number 130, so I'm trying to create a partition:
ALTER TABLE bigtable
ADD PARTITION part_random_number VALUES (random number);
Of course by "random number" I mean e.g. 120, 56, etc. : )
But I got an error that the object is not partitioned. So I tried to first define a partition clause in the CREATE TABLE statement:
CREATE TABLE bigtable
( ID number )
PARTITION BY list (ID)
But it doesn't work. It works when I define some partition, e.g.
CREATE TABLE bigtable
( ID number )
PARTITION BY list (ID)
( partition part_130 values (130)
);
But I would like to avoid that... Is there any other solution?
As a result, I would like to have the table partitioned by the procedure input parameters.
A partitioned table has to have at least one partition. Just create it with a dummy partition and add the ones you actually need using your procedure.
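A minimal sketch of such a procedure, using dynamic SQL to add one list partition per input ID (the table, partition, and procedure names are assumptions; DDL cannot be parameterized with bind variables, hence the string concatenation):

```sql
CREATE OR REPLACE PROCEDURE add_id_partition(p_id IN NUMBER) AS
BEGIN
  -- ALTER TABLE is DDL, so it must be issued via EXECUTE IMMEDIATE
  EXECUTE IMMEDIATE
    'ALTER TABLE bigtable ADD PARTITION part_' || p_id ||
    ' VALUES (' || p_id || ')';
END;
/

-- Usage: creates partition PART_130 for ID 130
-- EXEC add_id_partition(130);
```

The dummy partition created with the table can later be dropped or simply left unused.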

Hive - Partition Column Equal to Current Date

I am trying to insert into a Hive table from another table that does not have a column for todays date. The partition I am trying to create is at the date level. What I am trying to do is something like this:
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date = from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd'))
SELECT * FROM table_1;
But when I run this I get the following error:
"cannot recognize input near 'from_unixtime' '(' 'unix_timestamp' in constant"
If I query a table and make one of the columns that expression, it works just fine. Any idea how to set the partition date to the current system date in HiveQL?
Thanks in advance,
Craig
What you want here is Hive dynamic partitioning. This allows the decision about which partition each record is inserted into to be determined dynamically as the record is selected. In your case, that decision is based on the date when you run the query.
To use dynamic partitioning your partition clause has the partition field(s) but not the value. The value(s) that maps to the partition field(s) is the value(s) at the end of the SELECT, and in the same order.
When you use dynamic partitions for all partition fields you need to ensure that you are using nonstrict for your dynamic partition mode (hive.exec.dynamic.partition.mode).
In your case, your query would look something like:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd')
FROM table_1;
Instead of using the unix_timestamp() and from_unixtime() functions, current_date() can be used to get the current date in 'yyyy-MM-dd' format.
current_date() was added in Hive 1.2.0 (see the official documentation).
The revised query will be:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, current_date()
FROM table_1;
If you are running this from a shell script, you can store the current date in a variable, create the empty partitioned table in Hive using beeline, and then pass that variable as the partition value when inserting the data into the partitioned table.