Check if table exists in created Database in U-SQL - azure-data-lake

I am trying to check whether a table exists in the database in U-SQL. Currently my syntax is:
DROP TABLE IF EXISTS Logs;
CREATE TABLE Logs (
    date DateTime,
    eventType int,
    eventTime DateTime,
    INDEX Index_EventType CLUSTERED (eventType ASC)
        DISTRIBUTED BY HASH(eventType) INTO 3
);
In this example, I just want to check whether the table exists in the current database; I don't want to drop the table if it exists.
Basically, I want to add if..else logic for the table in the U-SQL script, such as below:
IF NOT EXISTS Logs
{
    //Create table here
}
else
{
    //Update table scripts
}
How can I express this condition in a U-SQL script?

Can you please file a feature request at http://aka.ms/adlfeedback for TABLE.EXISTS?
As a workaround, you could create a fake partitioned table that only contains a single partition, or use any of the workarounds Maya specified in the comments to her answers.

You can specify IF NOT EXISTS within the CREATE statement to achieve such behavior:
CREATE TABLE IF NOT EXISTS Logs (
    date DateTime,
    eventType int,
    eventTime DateTime,
    INDEX Index_EventType CLUSTERED (eventType ASC)
        DISTRIBUTED BY HASH(eventType) INTO 3
);

Related

ClickHouse: Create table from an existing MySQL DB

I need to create a table based on an existing MySQL database, but only with those records that pass a WHERE check. I can simply create a table with the records, but I cannot work out where to put the WHERE clause. The official wiki simply says AS SELECT 1. Sample code is provided below. Please help.
"CREATE TABLE test_table (id UInt32, payment_date DateTime('Europe/Moscow'))
ENGINE = MySQL('{host}:{port}', '{database}', 'test_table', '{username}', '{password}')
AS SELECT create_date FROM test_table
WHERE create_date > '2020-01-01'";
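One way to apply the filter is sketched below. This is only a sketch, assuming the ClickHouse mysql() table function and a MergeTree target table (the target table name and the ORDER BY key are placeholders), and it copies the filtered rows once at creation time rather than proxying them live through the MySQL engine:
-- copy only the rows that pass the WHERE check into a native ClickHouse table
CREATE TABLE test_table_filtered
ENGINE = MergeTree
ORDER BY id
AS SELECT *
FROM mysql('{host}:{port}', '{database}', 'test_table', '{username}', '{password}')
WHERE create_date > '2020-01-01';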

Hive table creation with a default value

I have a table in RDBMS like so:
create table test (sno number, entry_date date default sysdate).
Now I want to create a table in Hive with the same structure, adding a default value to a column.
Hive currently doesn't support adding a default value to a column while creating a table.
As a workaround, load the data into a temporary table and use an insert overwrite table statement to add the current date and time into the main table.
Create a temporary table:
create table test (sno int);
Load data into the table:
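For example (the source path here is only a hypothetical placeholder):
load data local inpath '/tmp/sno_data.txt' into table test;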
Create final table:
create table final_table (sno int, createDate string);
Finally, load the data from the temp test table into the final table:
insert overwrite table final_table select sno, FROM_UNIXTIME(UNIX_TIMESTAMP(), 'dd/MM/yyyy') from test;
Hive doesn't support DEFAULT fields.
That doesn't mean you can't do it, though. It's just a two-step process: create one "staging" table, then insert into a second table, selecting that "default" value.
Adding a default value to a column while creating table in hive
Since you mention,
I have a table in RDBMS
You could also use your existing table, and use Sqoop to import the data into Hive.

Overwrite hive schema metadata without dropping and creating

Say I have a predefined Hive table with partitions loaded to it.
CREATE EXTERNAL TABLE t1
(
c1 STRING
)
PARTITIONED BY ( dt STRING )
LOCATION...
ALTER TABLE t1 ADD PARTITION ( dt = '2017-01-01' )
Now I have new DDL text representing the schema:
CREATE EXTERNAL TABLE t1
(
user_id STRING
)
PARTITIONED BY ( dt STRING )
LOCATION...
If I drop and then recreate the table, I'll lose the partition info.
I am looking for a way to redefine the column part of the schema without manually adding/removing/renaming columns (this is not a one-time thing; I'm trying to automate a schema update process).
I found a way to do 'almost' what I needed: Hive supports REPLACE COLUMNS, which means I can replace all the old columns with new ones.
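For the schema above, that could look like the statement below (a sketch; REPLACE COLUMNS only changes the table metadata, requires a native SerDe, and leaves the partition column dt untouched):
ALTER TABLE t1 REPLACE COLUMNS (user_id STRING);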

Table partitioning with procedure input parameter

I'm trying to partition my table on an ID which I get from a procedure parameter.
For example, my table DDL:
CREATE TABLE bigtable
( ID number )
As an input procedure parameter I get e.g. the number 130, so I'm trying to create a partition:
Alter table bigtable
add partition part_random_number values(random number);
Of course, by random number I mean e.g. 120, 56, etc. :)
But I got an error that the object is not partitioned. So I tried to first define a partition clause in the create table statement:
CREATE TABLE bigtable
( ID number )
PARTITION BY list (ID)
But it doesn't work. It works when I define some partition, e.g.:
CREATE TABLE bigtable
( ID number )
PARTITION BY list (ID)
( partition part_130 values (130)
);
But I would like to avoid that... Is there any other solution?
In the end, I would like to have the table partitioned by the procedure's input parameters.
A partitioned table has to have at least one partition. Just create it with a dummy partition and add the ones you actually need using your procedure.
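A sketch of that approach (the dummy partition value, the partition names, and the procedure name are made up):
CREATE TABLE bigtable
( ID number )
PARTITION BY list (ID)
( partition part_dummy values (0)
);

CREATE OR REPLACE PROCEDURE add_id_partition (p_id IN NUMBER) AS
BEGIN
  -- DDL has to go through dynamic SQL inside PL/SQL
  EXECUTE IMMEDIATE 'ALTER TABLE bigtable ADD PARTITION part_' || p_id ||
                    ' VALUES (' || p_id || ')';
END;
/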

Alter partition function and partition scheme automatically

The structure of my tables are below :
SalesCompanyFinancialPeriod (ID int, ...)
Document (ID int, SalesCompanyFinancialPeriodID Int, ...)
DocumentDetail (ID Int, DocumentID Int, ...)
I want to create a partition function and partition scheme for partitioning the Document and DocumentDetail tables, using the SalesCompanyFinancialPeriodID column value.
I also want to automatically alter this partition scheme and partition function using an AFTER trigger on the SalesCompanyFinancialPeriod table.
In other words, I want a filegroup to be created automatically in my database when a new SalesCompanyFinancialPeriod record is created, and the Document and DocumentDetail records with the new SalesCompanyFinancialPeriodID to be partitioned into this newly created filegroup.
How can I do this?
See http://sqlfascination.com/2010/09/12/interval-partitioning-in-sql-server-2008/, which does almost exactly this (based on one table, but it is the same idea).
He notes that according to MS, the DML trigger cannot do this directly; quoting Books OnLine: "...the following Transact-SQL statements are not allowed inside the body of a DML trigger when it is used against the table or view that is the target of the triggering action ..., ALTER PARTITION FUNCTION, ..."
He says it is untrue, but I would be careful. You could, instead, create a stored procedure that alters the partitions and is run from a trigger. This is somewhat safer, as the statement needs to run as the database owner and have dataspace permissions, which might be scary to have in a trigger directly.
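A rough sketch of what such a procedure could look like (the partition function, scheme, and filegroup names are assumptions, and adding a data file to the new filegroup is left out):
CREATE PROCEDURE dbo.AddPeriodPartition
    @PeriodId INT
AS
BEGIN
    DECLARE @fg SYSNAME = N'FG_Period_' + CAST(@PeriodId AS NVARCHAR(10));
    DECLARE @sql NVARCHAR(MAX);

    -- create a filegroup for the new period (a data file still has to be added to it)
    SET @sql = N'ALTER DATABASE [' + DB_NAME() + N'] ADD FILEGROUP [' + @fg + N'];';
    EXEC sys.sp_executesql @sql;

    -- point the partition scheme at the new filegroup, then split off a new partition
    SET @sql = N'ALTER PARTITION SCHEME ps_SalesPeriod NEXT USED [' + @fg + N']; '
             + N'ALTER PARTITION FUNCTION pf_SalesPeriod() SPLIT RANGE ('
             + CAST(@PeriodId AS NVARCHAR(10)) + N');';
    EXEC sys.sp_executesql @sql;
END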
Side note: in SQL 2008 there are no list partitions, only range partitions, so this would be annoying even to do manually. You can trick it, per the following:
http://www.sqlservercentral.com/articles/partition/64740/
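The trick is to put a range boundary at every discrete SalesCompanyFinancialPeriodID, so that with consecutive integer IDs each value effectively lands in its own partition. A sketch, with made-up names and boundary values:
-- consecutive boundary values make each period ID fall into its own range partition
CREATE PARTITION FUNCTION pf_SalesPeriod (INT)
AS RANGE LEFT FOR VALUES (130, 131, 132);

CREATE PARTITION SCHEME ps_SalesPeriod
AS PARTITION pf_SalesPeriod ALL TO ([PRIMARY]);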