Change range value in table partition - sql

I have a big table with partitions.
ex:
CREATE TABLE cust_order (
cust_id NUMBER(10),
start_date VARCHAR2(25),
amount_sold NUMBER(10,2))
PARTITION BY RANGE (START_DATE)
(
PARTITION MAY VALUES LESS THAN ('20120601'),
PARTITION DEF VALUES LESS THAN(MAXVALUE));
I want to alter the table so that the MAY partition contains values less than '20120501' and that the data from '20120501' to '20120601' are stored in the DEF partition.

First, you should always store dates in DATE columns. Storing dates in a VARCHAR2(25) column is a recipe for problems down the line: someone will inevitably insert a string that has an unexpected format.
Second, assuming that there are no partitions between the MAY and DEF partitions, you could split the MAY partition
ALTER TABLE cust_order
SPLIT PARTITION may AT ('20120501')
INTO( PARTITION may,
PARTITION june_temp )
UPDATE GLOBAL INDEXES;
and then merge the JUNE_TEMP and DEF partitions
ALTER TABLE cust_order
MERGE PARTITIONS june_temp, def
INTO PARTITION def
UPDATE GLOBAL INDEXES;
I'm not sure that I see the wisdom, however, in doing this. A default partition should not generally store any data; it's generally only there so that inserts don't error out if they have an unexpectedly large partition key. So taking the data that is already in one partition and moving it into a default partition seems rather odd. If you're just going to create a JUNE partition in the future, I'm not sure why you wouldn't just split the partitions into a MAY and JUNE partition.
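If that alternative fits your needs, the split can be done in one step; this is a sketch (the JUNE partition name is my own, and it assumes no partitions sit between MAY and DEF):

```sql
ALTER TABLE cust_order
SPLIT PARTITION may AT ('20120501')
INTO ( PARTITION may,     -- now holds rows with START_DATE < '20120501'
       PARTITION june )   -- holds rows from '20120501' up to '20120601'
UPDATE GLOBAL INDEXES;
```

This keeps DEF empty, as a default partition usually should be.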

Related

ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

I ran into ORA-14097 while exchanging a partition. Can anyone shed some light on this?
I have the following source_tbl (non-partitioned) table and intend to partition it using the column "VALID_PERIOD_END"
CREATE TABLE source_tbl
( INVOICE_ID NUMBER(15,0) NOT NULL ENABLE,
LATEST_FLAG_NAME VARCHAR2(3000),
STD_HASH RAW(1000),
VALID_PERIOD_START TIMESTAMP (6),
VALID_PERIOD_END TIMESTAMP (6),
OVERSEAS NUMBER,
.. <another 20 NUMBER columns>
VIP_NO NUMBER
) nologging;
This table now has 5M rows, and I want to partition it by VALID_PERIOD_END so that rows where it is '9999-12-31 23:59:59' (i.e., current) will be in one partition while the rest will be in another partition.
I have created a second table called TEMP_tbl
CREATE TABLE TEMP_tbl
( INVOICE_ID NUMBER(15,0) NOT NULL ENABLE,
LATEST_FLAG_NAME VARCHAR2(3000),
STD_HASH RAW(1000),
VALID_PERIOD_START TIMESTAMP (6),
VALID_PERIOD_END TIMESTAMP (6),
OVERSEAS NUMBER,
.. <another 20 NUMBER columns>
VIP_NO NUMBER
)partition by range(VALID_PERIOD_END)
(partition p1 values less than(maxvalue)) nologging;
TEMP_tbl has exactly the same data structure as source_tbl, since its DDL was generated using
dbms_metadata.get_ddl
I have executed gather table stats without any error returned
EXEC DBMS_STATS.gather_table_stats(USER, upper('source_tbl'), cascade => TRUE);
However, when I try to execute the following exchange partition statement, I have the error above
alter table TEMP_tbl
exchange partition p1
with table source_tbl
without validation
update global indexes
;
I have checked "user_tab_cols" and I confirm there is no hidden column from the source_tbl. Would it be because of the raw column from my table?
Thanks in advance!
Oracle 12.2 introduced two new partitioning features that will help you a great deal here.
A new ALTER TABLE MODIFY PARTITION BY DDL was introduced that allows conversion of a non-partitioned table to a partitioned table. This operation will copy the data from the existing non-partitioned table into the new table partitions, so it can be long-running. You may specify the ONLINE keyword to do the operation in online mode, which means that DML operations on the table will be allowed while the ALTER TABLE is running. For example:
ALTER TABLE source_tbl
MODIFY PARTITION BY RANGE(VALID_PERIOD_END)
(partition p1 values less than (timestamp '9999-12-31 23:59:59'),
partition p2 values less than (maxvalue))
ONLINE;
To help generally with the sort of EXCHANGE PARTITION problem you are facing, the FOR EXCHANGE WITH TABLE clause was introduced in CREATE TABLE. This is specifically intended for matching physical columns exactly when creating a new table that will be exchanged with an existing table. Optionally FOR EXCHANGE WITH TABLE can be used along with PARTITION BY to create a partitioned table that can be exchanged with the source table. For example:
CREATE TABLE TEMP_tbl
PARTITION BY RANGE(VALID_PERIOD_END)
(partition p1 values less than(maxvalue))
FOR EXCHANGE WITH TABLE source_tbl;
There are blog articles that describe both of these partitioning enhancements, and others that specifically discuss using CREATE TABLE ... FOR EXCHANGE WITH TABLE to solve EXCHANGE PARTITION errors.
You didn't mention which version of Oracle you are using, so perhaps you are still running 11g. In this case you probably need to drill down into USER_TAB_COLS to see what the difference is between the two tables. You mentioned that you already checked for hidden columns (which is good) but other mis-matches could occur.
One thing to keep in mind is that NULLABLE column attributes must match between the two tables. If one table has primary key constraint and the other one doesn't, the column may be non-nullable in the table with primary key and nullable in the other table, which will cause ORA-14097.
If that doesn't explain the problem you can also check SEGMENT_COLUMN_ID ordering, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, and DATA_SCALE. Since you used dbms_metadata.get_ddl to create your table these things should match, but there must be some difference, otherwise you would not get the error.
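For example, one quick way to spot the mismatch (a sketch; trim or extend the column list as needed) is to diff the two tables' entries in USER_TAB_COLS:

```sql
-- Attributes present in SOURCE_TBL but not matched in TEMP_TBL
SELECT column_name, data_type, data_length, data_precision, data_scale, nullable
  FROM user_tab_cols
 WHERE table_name = 'SOURCE_TBL'
MINUS
SELECT column_name, data_type, data_length, data_precision, data_scale, nullable
  FROM user_tab_cols
 WHERE table_name = 'TEMP_TBL';
```

Run it in both directions (swap the table names); any row returned points at the attribute that differs.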
The RAW(1000) column should not be a problem for EXCHANGE PARTITION.

Understanding Hive table creation notation

I have come across Hive tables which I need to convert to Redshift/MySql equivalent.
I am having trouble understanding Hive query structure and would appreciate some help:
CREATE TABLE IF NOT EXISTS table_1 (
id BIGINT,
price DOUBLE,
asset string
)
PARTITIONED BY (
pt STRING
);
ALTER TABLE table_1 DROP IF EXISTS PARTITION (pt = '${yyyymmdd}');
INSERT OVERWRITE TABLE table_1 PARTITION (pt= '${yyyymmdd}')
select aa.id,aa.price,aa.symbol from
...
...
from
table_2 table
I am having trouble understanding the PARTITIONED BY clause. This, if I am understanding correctly, is different from MySQL table partitions, and is Hive-specific dynamic partitioning.
The partition does not define a column or a key, and partitions by the current date.
Does this mean that table_1 is partitioned by the date? Each day has a separate partition?
Then later on in the code there are notations similar to
inner join table_new table on table.pt = '${yyyymmdd}' and ...
In this context, does it mean only rows inserted on yyyymmdd are selected for the join?
Thank you.
A partition in Hive is, by default, a folder in HDFS with the name key=value, plus metadata in the Hive metastore. You can alter a partition's location and create a partition on top of any folder.
PARTITIONED BY (pt STRING) defines a partition column pt of type string, not date. The pt column is not present in the table data files; it is defined only in PARTITIONED BY, and all partition values are stored in the metadata. If you load a partition dynamically, a partition folder is created with the name pt='value'.
This statement creates partitions dynamically:
INSERT OVERWRITE TABLE table_1 PARTITION (pt)
select id, price, symbol,
coln as pt --partition column should be the last one
from ...
And this statement loads a single STATIC partition:
INSERT OVERWRITE TABLE table_1 PARTITION (pt= '${yyyymmdd}')
select aa.id,aa.price,aa.symbol
from
No partition column is selected; the partition value is specified in the
PARTITION (pt= '${yyyymmdd}')
clause.
'${yyyymmdd}' here is a parameter named yyyymmdd which is passed to the script using --hivevar like this:
hive --hivevar yyyymmdd=20200604 -f myscript.sql
You can pass ANY string as the partition value in this case, though the parameter name yyyymmdd suggests its format.
BTW, the date format in Hive is 'yyyy-MM-dd'. Strings in 'yyyy-MM-dd' format can be implicitly converted to DATE.
I will try to explain in one shot what partitioning in Hive is. First of all:
WHEN TO USE TABLE PARTITIONING
Table partitioning is good when:
Reading the entire dataset takes too long
Queries almost always filter on the partition columns
There are a reasonable number of different values for partition columns
Data generation of ETL process splits data by file or directory names
Partition column values are not in the data itself
Don't partition on columns with many unique values
Example: Partitioning customers by first name
CREATING PARTITIONED TABLES
To create a partitioned table, use the PARTITIONED BY clause in the CREATE TABLE statement.
The names and types of the partition columns must be specified
in the PARTITIONED BY clause, and only in the PARTITIONED BY clause.
They must not also appear in the list of all the other columns.
CREATE TABLE customers_by_country
(cust_id STRING, name STRING)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
The example CREATE TABLE statement shown above creates the table customers_by_country,
which is partitioned by the STRING column named country.
Notice that the country column appears only in the PARTITIONED BY clause,
and not in the column list above it.
This example specifies only one partition column, but you can specify more than one by using
a comma-separated column list in the PARTITIONED BY clause.
Aside from these specific differences, this CREATE TABLE statement is the same
as the statement used to create an equivalent non-partitioned table.
Table partitioning is implemented in a way that is mostly transparent
to a user issuing queries with Hive.
A partition column is what’s known as a virtual column, because its values are not stored within the data files.
Following is the result of the DESCRIBE command on customers_by_country;
it displays the partition column country just as if it were a normal column within the table.
You can refer to partition columns in any of the usual clauses of a SELECT statement.
name      type     comment
cust_id   string
name      string
country   string
You can load data in partitioned tables dynamically or statically
LOADING DATA WITH DYNAMIC PARTITION
One way to load data into a partitioned table is to use dynamic partitioning,
which automatically defines partitions when you load the data, using the values in the partition column.
(The other way is to manually define the partitions with Static Partitioning)
To use dynamic partitioning, you must load data using an INSERT statement.
In the INSERT statement, you must use the PARTITION clause to list the partition columns.
The data you are inserting must include values for the partition columns.
The partition columns must be the rightmost columns in the data you are inserting,
and they must be in the same order as they appear in the PARTITION clause.
INSERT OVERWRITE TABLE customers_by_country
PARTITION(country)
SELECT cust_id, name, country FROM customers;
The example shown above uses an INSERT … SELECT statement
to load data into the customers_by_country table with dynamic partitioning.
Notice that the partition column, country, is included
in the PARTITION clause and is specified last in the SELECT list.
When Hive executes this statement, it automatically creates partitions
for the country column and loads the data into these partitions based on the values in the country column.
The resulting data files in the partition subdirectories do not include values for the country column.
Since the country is known based on which subdirectory a data file is in,
it would be redundant to include country values in the data files as well.
Look at the contents of the customers_by_country directory.
It should now have one subdirectory for each value in the country column.
Look at the file in one of those directories.
Notice that the file contains the row for the customer from that country,
and no others; notice also that the country value is not included.
Note: Hive includes a safety feature that prevents users
from accidentally creating or overwriting a large number of partitions.
(See “Risks of Using Partitioning” for more about this.)
By default, Hive sets the property hive.exec.dynamic.partition.mode to strict.
In strict mode at least one partition key must be static, which prevents fully dynamic partitioning, though you can still use static partitions.
You can disable this safety feature in Hive by setting
the property hive.exec.dynamic.partition.mode to nonstrict:
SET hive.exec.dynamic.partition.mode=nonstrict;
Then you can use the INSERT statement to load the data dynamically.
Hive properties set in Beeline are for the current session only,
so the next time you start a Hive session this property will be set back to strict.
But you or your system administrator can configure properties permanently, if necessary.
When you run SELECT queries on the partitioned table, if the table is big enough you will notice a significant difference in the time they take to run.
Notice that you do not query the partitioned table any differently than you would query the customers table.
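For instance, a query like this (against the customers_by_country table above) looks exactly like a query against a non-partitioned table, yet Hive can skip every partition directory except country=pk:

```sql
SELECT cust_id, name
FROM customers_by_country
WHERE country = 'pk';
```

This partition pruning is where the speedup comes from.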
LOADING DATA WITH STATIC PARTITIONING
One way to load data into a partitioned table is to use static partitioning,
in which you manually define the different partitions.
With static partitioning, you create a partition manually, using an ALTER TABLE … ADD PARTITION statement,
and then load the data into the partition.
For example, this ALTER TABLE statement creates the partition for Pakistan (pk):
ALTER TABLE customers_by_country
ADD PARTITION (country='pk');
Notice how the partition column name, which is country, and the specific value that defines this partition,
which is pk, are both specified in the ADD PARTITION clause.
This creates a partition directory named country=pk inside the customers_by_country table directory.
After the partition for Pakistan is created, you can add data into the partition using an INSERT … SELECT statement:
INSERT OVERWRITE TABLE customers_by_country
PARTITION(country='pk')
SELECT cust_id, name FROM customers WHERE country='pk'
Notice how in the PARTITION clause, the partition column name, which is country,
and the specific value, which is pk, are both specified, just like in the ADD PARTITION command used to create the partition.
Also notice that in the SELECT statement, the partition column is not included in the SELECT list.
Finally, notice that the WHERE clause in the SELECT statement selects only customers from Pakistan.
With static partitioning, you need to repeat these two steps for each partition:
first create the partition, then add data.
You can actually use any method to load the data; you need not use an INSERT statement.
You could instead use hdfs dfs commands or a LOAD DATA INPATH command.
But however you load the data, it’s your responsibility to ensure that data is stored in the correct partition subdirectories.
For example, data for customers in Pakistan must be stored in the Pakistan partition subdirectory,
and data for customers in other countries must be stored in those countries’ partition subdirectories.
Static partitioning is most useful when the data being loaded
into the table is already divided into files based on the partition column,
or when the data grows in a manner that coincides with the partition column:
For example, suppose your company opens a new store in a different country,
like New Zealand ('nz'), and you're given a file of data for new customers, all from that country.
You could easily add a new partition and load that file into it.
RISKS OF USING PARTITIONING
A major risk when using partitioning is creating partitions that lead you into the small files problem.
When this happens, partitioning a table will actually worsen query performance
(the opposite of the goal when using partitioning) because it causes too many small files to be created.
This is more likely when using dynamic partitioning, but it could still
happen with static partitioning—for example if you added a new partition to a sales table
on a daily basis containing the sales from the previous day,
and each day’s data is not particularly big.
When choosing your partitions, you want to strike a happy balance between too many partitions
(causing the small files problem) and too few partitions (providing little performance benefit).
The partition column or columns should have a reasonable number of values
for the partitions—but what you should consider reasonable is difficult to quantify.
Using dynamic partitioning is particularly dangerous because if you're not careful,
it's easy to partition on a column with too many distinct values.
Imagine a use case where you are often looking for data that falls within
a time frame that you would specify in your query.
You might think that it's a good idea to partition on a column that pertains to time.
But a TIMESTAMP column could have the time to the nanosecond, so every row could have a unique value;
that would be a terrible choice for a partition column! Even truncating to the minute or hour could create
far too many partitions, depending on the nature of your data;
partitioning by larger time units like day, month, or even year might be a better choice.
As another example, consider an employees table.
This has five columns: empl_id, first_name, last_name, salary, and office_id.
Before reading on, think for a moment, which of these might be reasonable for partitioning
The column empl_id is a unique identifier.
If that were your partition column, you would have a separate partition for each employee,
and each would have exactly one row.
In addition, it's not likely you'll be doing a lot of queries looking for a particular value,
or even a particular range of values. This is a poor choice.
The column first_name will not have one value per employee, but there will likely be many partitions that contain only one row.
This is also true for last_name.
Also, like empl_id, it's not likely you'll need filter queries based on these columns. These are also poor choices.
The column salary also will have many divisions
(and even more so if your salaries go to the cent rather than to the dollar as our sample table does).
While it may be that you'll sometimes want to query on salary ranges,
it's not likely you'll want to use individual salaries.
So salary is a poor choice.
A more limited salary_grades specification, like the ones in the salary_grades table,
might be reasonable if your use case involves looking at the data by salary grade frequently.
The office_id column identifies the office where an employee works.
This will have a much smaller number of unique values, even if you have a large company with offices in many cities.
It's imaginable that your use case might be to frequently filter
your employee data based on office location, too. So this would be a good choice.
You also can use multiple columns and create nested partitions.
For example, a dataset of customers might include country and state_or_province columns.
You can partition by country and then partition those further by state_or_province, so customers from Ontario,
Canada would be in the country=ca/state_or_province=on/ partition directory.
This can be extremely helpful for large amounts of data that you want to access either by country or by state or province.
However, using multiple columns increases the danger of creating too many partitions, so you must take extra care when doing so.
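A sketch of such a nested layout (the table name customers_by_region is my own; it assumes customer data with country and state_or_province columns):

```sql
CREATE TABLE customers_by_region
  (cust_id STRING, name STRING)
PARTITIONED BY (country STRING, state_or_province STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Rows for Ontario, Canada land in country=ca/state_or_province=on/
ALTER TABLE customers_by_region
  ADD PARTITION (country='ca', state_or_province='on');
```

Queries filtering only on country still prune effectively, since country is the outer partition level.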
The risk of creating too many partitions is why Hive includes the property
hive.exec.dynamic.partition.mode, set to strict by default, which must be reset to nonstrict before you can create a partition.
Rather than automatically and mechanically resetting that property when you're about to load data dynamically,
take it as an opportunity to think about the partitioning columns
and maybe check the number of unique values you would get when you load the data.
And that's all.

How insert data from a temporary table into partitioned table in oracle/sql using merge statement

I have to write a merge statement to insert data from a temporary table into a partitioned table, and I'm getting the below error:
Error report -
SQL Error: ORA-14400: inserted partition key does not map to any partition
I have to do it session-wise and, as a result, have to use a temporary table, which cannot be partitioned.
When your statement inserts rows into the partitioned table, Oracle has to place each row into the correct partition. ORA-14400 means a row's partition key does not map to any existing partition, so you must create partitions covering the complete period of time, as in this example of quarterly partitions:
ALTER TABLE sales ADD
PARTITION sales_q1_2007 VALUES LESS THAN (TO_DATE('01-APR-2007','dd-MON-yyyy')),
PARTITION sales_q2_2007 VALUES LESS THAN (TO_DATE('01-JUL-2007','dd-MON-yyyy')),
PARTITION sales_q3_2007 VALUES LESS THAN (TO_DATE('01-OCT-2007','dd-MON-yyyy')),
PARTITION sales_q4_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','dd-MON-yyyy'))
;
Once you have done this, you can insert the data as needed.
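As for the MERGE itself, nothing special is needed once the target partitions exist; a sketch (table and column names here are assumptions, reusing the sales table from the ALTER TABLE example):

```sql
MERGE INTO sales tgt
USING temp_sales src
   ON (tgt.order_id = src.order_id)
WHEN MATCHED THEN
  UPDATE SET tgt.amount = src.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, sale_date, amount)
  VALUES (src.order_id, src.sale_date, src.amount);
```

Oracle routes each inserted row to the partition matching src.sale_date; ORA-14400 is raised only if some sale_date falls outside every defined partition.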
Good luck,

How does Impala support partitioning?

How does Impala support the concept of partitioning and, if it supports it, what are the differences between Hive Partitioning and Impala Partitioning?
By default, all the data files for a table are located in a single directory.
Partitioning is a technique for physically dividing the data during loading, based on values from one or more columns, to speed up queries that test those columns.
For example, with a school_records table partitioned on a year column, there is a separate data directory for each different year value, and all the data for that year is stored in a data file in that directory. A query that includes a WHERE condition such as YEAR=1966, YEAR IN (1989,1999), or YEAR BETWEEN 1984 AND 1989 can examine only the data files from the appropriate directory or directories, greatly reducing the amount of data to read and test.
Static and Dynamic Partitioning
Specifying all the partition columns in a SQL statement is called "static partitioning", because the statement affects a single predictable partition. For example, you use static partitioning with an ALTER TABLE statement that affects only one partition, or with an INSERT statement that inserts all values into the same partition:
insert into t1 partition(x=10, y='a') select c1 from some_other_table;
When you specify some partition key columns in an INSERT statement but leave out the values, Impala determines which partition to insert into. This technique is called "dynamic partitioning":
insert into t1 partition(x, y='b') select c1, c2 from some_other_table;
Create new partition if necessary based on variable year, month, and day; insert a single value.
insert into weather partition (year, month, day) select 'cloudy',2014,4,21;
Create new partition if necessary for specified year and month but variable day; insert a single value.
insert into weather partition (year=2014, month=04, day) select 'sunny',22;
The more key columns you specify in the PARTITION clause, the fewer columns you need in the SELECT list. The trailing columns in the SELECT list are substituted in order for the partition key columns with no specified value.
Hope that helps!

Creating a table with date as range partition in SQL Server 2012

I am new to SQL Server coding. Please let me know how to create a table with range partition on date in SQL Server
A similar syntax in Teradata would be the following (a table is created with order date as a range partition over the year 2012, with each day as a single partition):
CREATE TABLE ORDER_DATA (
ORDER_NUM INTEGER NOT NULL
,CUST_NUM INTEGER
,ORDER_DATE DATE
,ORDER_TOT DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NUM)
PARTITION BY (RANGE_N ( ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' DAY));
Thanks in advance
The process of creating partitioned table is described on MSDN as follows:
Creating a partitioned table or index typically happens in four parts:
1. Create a filegroup or filegroups and corresponding files that will hold the partitions specified by the partition scheme.
2. Create a partition function that maps the rows of a table or index into partitions based on the values of a specified column.
3. Create a partition scheme that maps the partitions of a partitioned table or index to the new filegroups.
4. Create or modify a table or index and specify the partition scheme as the storage location.
You can find code samples on MSDN.
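The four steps above can be sketched in T-SQL like this (object names are my own; for brevity everything stays on the PRIMARY filegroup, which folds step 1 into step 3, and only a few monthly boundaries are shown rather than 365 daily ones):

```sql
-- Step 2: partition function mapping ORDER_DATE values to partitions
CREATE PARTITION FUNCTION pf_order_date (DATE)
AS RANGE RIGHT FOR VALUES ('2012-02-01', '2012-03-01', '2012-04-01');

-- Step 3: partition scheme mapping all partitions to the PRIMARY filegroup
CREATE PARTITION SCHEME ps_order_date
AS PARTITION pf_order_date ALL TO ([PRIMARY]);

-- Step 4: table stored on the partition scheme, keyed by ORDER_DATE
CREATE TABLE ORDER_DATA (
    ORDER_NUM  INT NOT NULL,
    CUST_NUM   INT,
    ORDER_DATE DATE,
    ORDER_TOT  DECIMAL(10,2)
) ON ps_order_date (ORDER_DATE);
```

To mirror the Teradata example exactly you would list one boundary value per day of 2012 in the partition function (these can be generated with a script); SQL Server 2012 supports up to 15,000 partitions per table.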