Description
I have a managed, partitioned Hive table table_a with data stored in Amazon S3 in Parquet format. I renamed column col_old to col_new, and I lost all the data of col_old because Parquet resolves columns by name, so the renamed column no longer matches anything in the data files.
Question
Is there any way to recover the values of col_old? (I still have the old Parquet data files.)
Here are a few things I tried (a rough sketch of the first attempt follows this list):
Created a new table over the old files and renamed col_new back to col_old.
Created a new table over the old files and added a col_old column.
Created a table with the new column and ran
INSERT INTO new_table SELECT * FROM old_table;
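For concreteness, here is a rough sketch of the first attempt; the column types, partition key, and S3 path below are hypothetical, since the real DDL of table_a is not shown in the question:

-- Hypothetical recreation of the table over the old parquet files,
-- keeping the original column name col_old so Parquet can match it by name.
CREATE EXTERNAL TABLE table_a_recovered (
  id BIGINT,
  col_old STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION 's3://my-bucket/warehouse/table_a/';

-- Make the existing partition directories visible to the new table.
MSCK REPAIR TABLE table_a_recovered;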
Related
I have data loaded in my S3 bucket folder as multiple parquet files.
After loading them into Athena I can query the data successfully.
What are the ways to rename the Athena table columns for a Parquet file source and still be able to see the data under the renamed column after querying?
Note: I checked the edit-schema option; the column gets renamed, but after querying you will not see any data under that column.
As far as I know, there is no way to create a table whose columns are named differently from the columns in the files. The table can have fewer or extra columns, but only the names that match the files will be queryable.
You can, however, create a view with other names, for example:
CREATE OR REPLACE VIEW a_view AS
SELECT
  a AS b,
  b AS c
FROM the_table;
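Queries then use the view's column names, for example (column names as in the sketch above):

SELECT b, c FROM a_view LIMIT 10;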
I have a table in Athena created from a CSV file stored in S3, and I am using Lambda to query it. Now I have incoming data being processed by the Lambda function and want to append a new row to the existing table in Athena. How can I do this? I saw in the documentation that Athena prohibits some SQL statements, such as INSERT INTO and CREATE TABLE AS SELECT.
If you are adding new data, you can save the new data file into the same folder (prefix/key) that the table is reading from. Athena reads all files in this folder; the new file just needs to have the same format as the existing ones.
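As a minimal sketch, assuming a hypothetical CSV-backed table at s3://my-bucket/reports/ (the table name, schema, and bucket are made up):

-- The table's LOCATION is a folder, not a single object. Any new CSV file
-- written under s3://my-bucket/reports/ with the same layout is picked up
-- by later queries without any additional DDL.
CREATE EXTERNAL TABLE reports (
  id BIGINT,
  created_at STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/reports/';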
I want to do the following in Hive: I create an external table stored as a TextFile, and I convert this table into an ORC table (in the usual way: first create an empty ORC table, then load the data from the original one).
For my TextFile table, my data is located in HDFS in a directory, say /user/MY_DATA/.
So when I add/drop files from MY_DATA, my TextFile table is automatically updated. Now I would like the ORC table to be automatically updated too. Do you know if this is possible?
Thank you!
No, there is no straightforward way to do this. You need to add the new data to the ORC table the same way you did for the first load, or you can create a new ORC table and drop the old one:
CREATE TABLE orc_emp STORED AS ORC AS SELECT * FROM employees.emp;
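For the incremental route, the load looks like the initial one; a hedged sketch, where the table names come from the statement above and the date filter is purely hypothetical (Hive cannot tell which rows are new, so you need your own way to restrict the selection):

-- Append only the newly arrived rows from the source table into the ORC table.
INSERT INTO TABLE orc_emp
SELECT * FROM employees.emp
WHERE load_date = '2024-01-01';   -- hypothetical column marking the new batch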
Facing an issue creating a Hive table on top of a Parquet file. Can someone help me with this? I have read many articles and followed the guidelines, but I am not able to load a Parquet file into a Hive table.
According "Using Parquet Tables in Hive" it is often useful to create the table as an external table pointing to the location where the files will be created, if a table will be populated with data files generated outside of Hive.
hive> CREATE EXTERNAL TABLE parquet_table_name (<yourParquetDataStructure>)
STORED AS PARQUET
LOCATION '/<yourPath>/';
Note that LOCATION must point to the directory that contains the Parquet file(s), not to a single file.
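Once the table exists, a quick sanity check that Hive can actually read the Parquet files:

hive> SELECT * FROM parquet_table_name LIMIT 5;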
I am now preparing to store data from .csv files in Hive. Because of the good performance of the Parquet file format, the Hive table should be in Parquet format. So the normal way is to create a temp table in TextFile format, load the local CSV data into this temp table, and finally create a Parquet table with the same structure and run INSERT INTO parquet_table SELECT * FROM textfile_table;.
But I don't think this temp TextFile table is necessary. So my question is: is there a way to load these local .csv files into a Hive Parquet-format table directly, that is, without resorting to a temp table? Or an easier way to accomplish this task?
As stated in the Hive documentation:
NO verification of data against the schema is performed by the load command.
If the file is in hdfs, it is moved into the Hive-controlled file system namespace.
You could skip a step by using CREATE TABLE AS SELECT for the parquet table.
So you'll have 3 steps:
Create text table defining the schema
Load data into the text table (the load moves the file into the table's directory)
CREATE TABLE parquet_table STORED AS PARQUET AS SELECT * FROM textfile_table; (supported from Hive 0.13)
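Putting the three steps together, a minimal sketch assuming a hypothetical two-column CSV at /tmp/data.csv (column names, types, and path are made up):

-- 1. Text table that matches the CSV layout.
CREATE TABLE textfile_table (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 2. Load the local CSV file; no schema verification is performed.
LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE textfile_table;

-- 3. Create the parquet table directly from the text table (Hive 0.13+).
CREATE TABLE parquet_table STORED AS PARQUET AS SELECT * FROM textfile_table;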