Is there a way to move a table to a different database in Hive or HCatalog?
For instance, I have the table foo in the database default and I want to move it to the database bar. I tried this, but it doesn't work:
ALTER TABLE foo RENAME TO bar.foo
Thanks in advance
AFAIK there is no way in HiveQL to do this. A ticket was raised for it long back, but the issue is still open.
An alternative would be to use the EXPORT/IMPORT feature provided by Hive. With this feature you can export the data of a table to an HDFS location along with its metadata using the EXPORT command. The metadata is stored in JSON format. Data exported this way can be imported back into another database (even another Hive instance) using the IMPORT command.
More on this can be found in the IMPORT/EXPORT manual.
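For example, a minimal sketch of that approach, using the table and database names from the question (the HDFS path '/tmp/foo_export' is illustrative):
-- export data + metadata of foo from the source database
USE default;
EXPORT TABLE foo TO '/tmp/foo_export';
-- import it into the target database
USE bar;
IMPORT TABLE foo FROM '/tmp/foo_export';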
HTH
Thanks for your response. I found another way to move the table to a different database:
USE db1; CREATE TABLE db2.foo LIKE foo;
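Note that CREATE TABLE ... LIKE copies only the table definition, not the data. A hedged sketch for also copying the rows (the INSERT is an addition for illustration, not part of the original comment):
USE db1;
-- copy the table definition into the other database
CREATE TABLE db2.foo LIKE foo;
-- copy the rows as well
INSERT INTO TABLE db2.foo SELECT * FROM foo;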
Related
I have a process that uses the PutBigQueryBatch processor, and I would like it to truncate the table before inserting the data. I defined an Avro schema and previously created the table in BigQuery, specifying the fields the way I wanted them.
I am aware that if I set the "Write Disposition" property to "WRITE_TRUNCATE", it will truncate the table. However, when I use this option, the schema of the table in BigQuery ends up being deleted, which I would like to avoid, and a new schema is created to record the data. I understand that the "Create Disposition" property exists, and that if "CREATE_NEVER" is selected, the existing schema should be respected and not deleted.
When I run this processor with "Write Disposition" set to "WRITE_APPEND", the schema I created in BigQuery is respected, but with "WRITE_TRUNCATE" it is not.
Is there any way to use the "WRITE_TRUNCATE" option without the table schema being deleted?
Am I doing something wrong?
Below is the configuration I am using in the PutBigQueryBatch processor:
PutBigQueryBatch processor configuration
It sounds like what you want is to run a TRUNCATE TABLE query before starting your process: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#truncate_table_statement
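For instance, a statement along these lines, run before the NiFi flow starts, should empty the table while leaving its schema in place (project, dataset and table names are placeholders):
-- BigQuery standard SQL: remove all rows but keep the table and its schema
TRUNCATE TABLE `my-project.my_dataset.my_table`;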
I want to load a mysqldump file into a server.
While loading the dump, I want to change a few column values and update the schema.
For example, for the guid column we used varchar(100); now I want to change it to binary(16), which means I need to change both the table schema and the table values.
Can I make these changes while loading the dump file into the new server?
Thanks
No, basically you can't do anything while loading the dump. As mentioned in the comments, you have two options:
Edit the SQL in the dump
Load the dump and afterwards execute a script with the needed fixes (a sketch follows below)
If you have access to the initial database, you can produce another dump with the needed changes.
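As a rough sketch of the second option, assuming the table and column are called my_table and guid (names are illustrative), a post-load script could convert the textual UUIDs to binary(16) like this:
-- add a binary column, fill it from the text UUID, then swap it in
ALTER TABLE my_table ADD COLUMN guid_bin BINARY(16);
UPDATE my_table SET guid_bin = UNHEX(REPLACE(guid, '-', ''));
ALTER TABLE my_table DROP COLUMN guid;
ALTER TABLE my_table CHANGE COLUMN guid_bin guid BINARY(16);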
I have a database with data, but I would like to export the schema of the database so I can create an empty database.
I create the script and select only tables and views, no users, because the idea is to install the database on many computers with different users. I will manage the permissions individually.
In the next step, in the advanced options, I select that I want triggers, foreign key checks and all the other options, and I create the script.
However I have some problems:
When I delete my database from the server and run the script, I get an error saying that the database does not exist. Is it possible to add an option to the script so that it creates the database (see the sketch below)?
If I create the database manually and then run the script, I get an error saying that a column name is not valid.
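For illustration, the kind of "create the database first" header the first problem is asking about might look like this in T-SQL (the database name is hypothetical):
-- create the database if it is missing, then switch to it before creating the schema objects
IF DB_ID(N'MyDatabase') IS NULL
    CREATE DATABASE MyDatabase;
GO
USE MyDatabase;
GO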
At this point I am wondering what the correct way is to create a script of the schema so that I can export it to other servers.
Thanks so much.
I've been trying to store CSV data into a table in a database using a Pig script.
But instead of the data being inserted into a table in the database, a new file was created in the metastore.
Can someone please let me know if it is possible to insert data into a database table with a Pig script, and if so, what that script might look like?
You can take a look at DBStorage, but be sure to include the JDBC jar in your Pig script and to declare the UDF.
The documentation for the storage UDF is here:
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/DBStorage.html
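A rough Pig sketch of that setup (the jar paths, MySQL driver, connection details, table and column names are all assumptions for illustration):
-- register piggybank and the JDBC driver jar
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/mysql-connector-java.jar;

records = LOAD 'input.csv' USING PigStorage(',') AS (id:int, name:chararray);

-- DBStorage ignores the store location; rows are written via the INSERT statement
STORE records INTO 'ignored' USING org.apache.pig.piggybank.storage.DBStorage(
    'com.mysql.jdbc.Driver',
    'jdbc:mysql://dbhost/mydb',
    'dbuser', 'dbpassword',
    'INSERT INTO mytable (id, name) VALUES (?, ?)');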
You can use HCatStorer:
STORE records INTO 'dbname.tablename' USING org.apache.hcatalog.pig.HCatStorer();
When I load a (csv) file into a Hive table, I can load it without overwriting, thus adding the new file to the table.
Internally the file is just copied to the correct folder in HDFS (e.g. user/warehouse/dbname/tablName/datafile1.csv), and probably some metadata is updated.
After a few loads I want to remove the contents of a specific file from the table.
I am sure I cannot simply delete the file, because the metadata would need to be adjusted as well. There must be some kind of built-in function for this.
How do I do that?
Why do you need that? I mean, Hive was developed to serve as a warehouse where you put lots and lots of data, not to delete data every now and then. Such a need seems like a poorly thought out schema or a poor use of Hive, at least to me.
And if you really have this kind of need, why don't you create partitioned tables? If you need to delete some specific data, just delete that particular partition using either TRUNCATE or ALTER:
TRUNCATE TABLE table_name [PARTITION partition_spec];
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...
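As a hedged sketch of that approach (the table, columns and the load_date partition key are illustrative), it could look like this:
-- partition the table by load batch so a single load can be dropped later
CREATE TABLE foo_partitioned (col1 STRING, col2 INT)
PARTITIONED BY (load_date STRING);

LOAD DATA INPATH '/tmp/datafile1.csv'
INTO TABLE foo_partitioned PARTITION (load_date = '2014-01-15');

-- remove just that one load, data and metadata together
ALTER TABLE foo_partitioned DROP IF EXISTS PARTITION (load_date = '2014-01-15');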
If this feature is needed more than just once in a while, you can use MapR's distribution, which allows this kind of operation with no problem (even via NFS). Otherwise, if you don't have partitions, I think you'll have to create a new table using CTAS, filtering out the data from the bad file, or just copy the good files back to the OS with "hadoop fs -copyToLocal" and move them back into HDFS as a new table.
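For the non-partitioned case, a rough CTAS sketch (it assumes your Hive version exposes the INPUT__FILE__NAME virtual column; table and file names are illustrative):
-- rebuild the table without the rows that came from the unwanted file
CREATE TABLE foo_clean AS
SELECT * FROM foo
WHERE INPUT__FILE__NAME NOT LIKE '%datafile1.csv';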