I'd like to append 2 datasets into one (both datasets have the same columns). To do this, I created a new dataset and set the destination table to an existing table that I want to append the new table to. However, when I do this, the dataset only contains data from the new table.
How can I make sure that the new dataset appends to the existing table?
Thanks
Let's say you have a table dataset1.tableA and a table in a different dataset, dataset2.tableB, and both tables have the same schema. You want to append the table dataset2.tableB to dataset1.tableA. Here is how you do it in Standard SQL via the BigQuery UI:
Set Destination Table: Dataset dataset1 & table ID tableA
Choose Write Preference: Append to table
Run query: SELECT * FROM dataset2.tableB
Now in your table dataset1.tableA you should have data from dataset2.tableB appended.
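The UI steps above amount to an "INSERT ... SELECT" append. Here is a minimal sketch of that idea, run against an in-memory SQLite database as a stand-in (it is not BigQuery, and the column names id and name are made up for illustration):

```python
import sqlite3


def append_table(conn, source, destination):
    """Append every row of `source` to `destination` (same column layout assumed)."""
    # Table names cannot be bound as parameters, so they are interpolated;
    # only pass trusted identifiers here.
    conn.execute(f"INSERT INTO {destination} SELECT * FROM {source}")
    conn.commit()


# Hypothetical sample data: tableA is the existing table, tableB the new one.
append_conn = sqlite3.connect(":memory:")
append_conn.execute("CREATE TABLE tableA (id INTEGER, name TEXT)")
append_conn.execute("CREATE TABLE tableB (id INTEGER, name TEXT)")
append_conn.executemany("INSERT INTO tableA VALUES (?, ?)", [(1, "a"), (2, "b")])
append_conn.executemany("INSERT INTO tableB VALUES (?, ?)", [(3, "c")])

append_table(append_conn, "tableB", "tableA")
appended_rows = append_conn.execute(
    "SELECT id, name FROM tableA ORDER BY id"
).fetchall()
```

The existing rows of tableA are kept and the rows of tableB are added after them, which is the behavior that "Append to table" selects, as opposed to overwriting the destination.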
I have a table in AWS Athena which contains 2 records. Is there a SQL query that can add a new column to the table?
You can find more information about adding columns to a table in the Athena documentation.
Or you can use CTAS (CREATE TABLE AS SELECT).
For example, you have a table with
CREATE EXTERNAL TABLE sample_test(
id string)
LOCATION
's3://bucket/path'
and you can create another table from sample_test with the query
CREATE TABLE new_test
AS
SELECT *, 'new' AS new_col FROM sample_test
You can use any available query after AS
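To see the CTAS pattern end to end, here is a small sketch that runs the same statement on an in-memory SQLite database. SQLite is only a stand-in for Athena (no EXTERNAL or LOCATION clause), and the two sample rows are invented:

```python
import sqlite3

ctas_conn = sqlite3.connect(":memory:")

# Stand-in for the Athena table sample_test, with two made-up records.
ctas_conn.execute("CREATE TABLE sample_test (id TEXT)")
ctas_conn.executemany("INSERT INTO sample_test VALUES (?)", [("1",), ("2",)])

# CTAS: the new table gets every original column plus the constant column new_col.
ctas_conn.execute(
    "CREATE TABLE new_test AS SELECT *, 'new' AS new_col FROM sample_test"
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
new_columns = [row[1] for row in ctas_conn.execute("PRAGMA table_info(new_test)")]
new_rows = ctas_conn.execute(
    "SELECT id, new_col FROM new_test ORDER BY id"
).fetchall()
```

Note that this creates a second table; the original sample_test is left unchanged.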
This is mainly for future readers like me who struggled to get this working for a Hive table with AVRO data and who want to update the schema of the existing table rather than create a new one. 'add columns' works for CSV, but not for Hive + AVRO. For Hive + AVRO, the solution for appending columns at the end, before the partition columns, is available at this link. There are a couple of things to note: we need to pass the full schema to the literal attribute, not just the changes; and (not sure why) we had to alter the Hive table in this exact order: 1. add the columns using add columns, 2. set tblproperties, and 3. set serdeproperties. Hopefully it helps someone.
Actually, I want to move one table to another database.
But Spark doesn't permit this.
So how can I copy a table using spark-sql?
I already tried this.
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it was not working.
Maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
To preserve partitioning and storage format, do the following.
Get the complete schema of the existing table by running:
show create table db.old_table
The above query outputs the table's CREATE statement, which you can execute after changing the path name and the table name.
Then insert all the rows into the new, empty table using:
insert into db.new_table select * from db.old_table
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the CREATE TABLE documentation.
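The "reuse the original definition, then insert the rows" approach can be sketched as follows. This uses SQLite with an attached second database as a stand-in for two Spark/Hive databases; reading sqlite_master plays the role of show create table here, and the table and column names are hypothetical:

```python
import sqlite3


def copy_table(conn, old_name, new_qualified_name):
    """Recreate `old_name` under a new name with the same definition, then copy its rows."""
    # Fetch the original CREATE TABLE statement
    # (SQLite's rough analogue of SHOW CREATE TABLE).
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (old_name,),
    ).fetchone()
    ddl = row[0]
    # Re-run the DDL under the new name; only the first occurrence
    # (the table name itself) is replaced.
    conn.execute(ddl.replace(old_name, new_qualified_name, 1))
    # Copy all rows into the new, empty table.
    conn.execute(f"INSERT INTO {new_qualified_name} SELECT * FROM {old_name}")
    conn.commit()


copy_conn = sqlite3.connect(":memory:")
# A second in-memory database, standing in for the target database.
copy_conn.execute("ATTACH DATABASE ':memory:' AS new_db")
copy_conn.execute("CREATE TABLE old_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
copy_conn.executemany("INSERT INTO old_table VALUES (?, ?)", [(1, "x"), (2, "y")])
copy_conn.commit()

copy_table(copy_conn, "old_table", "new_db.new_table")
copied_rows = copy_conn.execute(
    "SELECT id, name FROM new_db.new_table ORDER BY id"
).fetchall()
copied_ddl = copy_conn.execute(
    "SELECT sql FROM new_db.sqlite_master WHERE name = 'new_table'"
).fetchone()[0]
```

Unlike a plain CREATE TABLE ... AS SELECT, this keeps the column constraints of the original definition (here, NOT NULL and the primary key).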
I'm processing a big Hive table (more than 500 billion records).
The processing is too slow and I would like to make it faster.
I think that by adding partitions, the process could be more efficient.
Can anybody tell me how I can do that?
Note that my table already exists.
My table :
create table T(
nom string,
prenom string,
...
date string)
Partitioning on date field.
Thx
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
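-- Assumes partitioned_table (a hypothetical name) is a copy of T declared with PARTITIONED BY (date string)
INSERT OVERWRITE TABLE partitioned_table PARTITION (date) SELECT nom, prenom, ..., date FROM T;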
Note:
In the insert statement for a partitioned table, make sure that you specify the partition columns last in the select clause.
You have to restructure the table. Here are the steps:
1. Make sure no other process is writing to the table.
2. Create a new external table that uses partitioning.
3. Insert into the new table by selecting from the old table.
4. Drop the new (external) table; only the table definition is dropped, the data stays in place.
5. Drop the old table.
6. Create the table with the original name, pointing to the location from step 2.
7. Run the repair command (MSCK REPAIR TABLE) to fix the partition metadata.
Alternative to steps 4, 5, 6 and 7:
a. Create the table with the original name by running show create table on the new table and replacing its name with the original table name.
b. Run the LOAD DATA INPATH command to move the files under the partitions into the new partitions of the new table.
c. Drop the external table created earlier.
Both approaches achieve the restructuring with a single insert/MapReduce job.
Can you help me with this one? I'm trying to pull data from the database of a CAD software package, and I want to make a temporary table from the table below (the desired output temptable is also shown below) so that I can join it to my already created table1. I'm new to SQL. It seems that a temporary table could work, but I don't know how to append the data from the other rows into the first row, so that the behavior is similar to a sum() function but working on text. Since I cannot post pictures yet, please bear with the formatting of the original table and of the temptable I wish to make. Thanks in advance.
Original table:
Oid      Cable Tray
0010f    mv001
0010f    mv002
0010f    mv003
020ab    lv001
020ab    lv002
Output temptable:
Oid      Cable Tray Route
0010f    mv001, mv002, mv003
020ab    lv001, lv002
This is my sample code:
select *
from table1
join temptable on temptable.oid=table1.oid
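A "sum() for text" is usually spelled as a string aggregate combined with GROUP BY. The sketch below uses SQLite's group_concat on an in-memory database, since the CAD software's database engine is not stated; other engines have a similar function under another name (for example STRING_AGG in SQL Server 2017+ and PostgreSQL, GROUP_CONCAT in MySQL). The table name original and the column names are assumptions made for illustration:

```python
import sqlite3

tray_conn = sqlite3.connect(":memory:")

# Stand-in for the original table from the question.
tray_conn.execute("CREATE TABLE original (oid TEXT, cable_tray TEXT)")
tray_conn.executemany(
    "INSERT INTO original VALUES (?, ?)",
    [
        ("0010f", "mv001"),
        ("0010f", "mv002"),
        ("0010f", "mv003"),
        ("020ab", "lv001"),
        ("020ab", "lv002"),
    ],
)

# One row per oid; all cable trays of that oid are joined into one string.
tray_conn.execute(
    """
    CREATE TEMP TABLE temptable AS
    SELECT oid, group_concat(cable_tray, ', ') AS cable_tray_route
    FROM original
    GROUP BY oid
    """
)

# Map each oid to its concatenated route string.
routes = dict(tray_conn.execute("SELECT oid, cable_tray_route FROM temptable"))
```

SQLite does not guarantee the order of the values inside each concatenated string, so sort explicitly if the order matters. The resulting temptable can then be joined to table1 on oid as in the sample query above.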
When extracting data from a table (schema and data) I can do this by right clicking on the database and by going to tasks->Generate Scripts and it gives me all the data from the table including the create script, which is good.
This though gives me all the data from the table - can this be changed to give me only some of the data from the table? e.g only data on the table after a certain dtmTimeStamp?
Thanks,
I would recommend extracting your data into a separate table using a query and then using Generate Scripts on this table. Alternatively, you can extract the data separately into a flat file using the export data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1 ,Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = @Condition
(Thanks to @MarkD for adding that)
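Applied to the timestamp case from the question, the filtered copy could look like the sketch below. It runs on an in-memory SQLite database as a stand-in for SQL Server, so SELECT ... INTO is replaced by CREATE TABLE plus INSERT ... SELECT, and the column types and sample rows are assumptions:

```python
import sqlite3


def clone_rows_after(conn, cutoff):
    """Copy only the rows of MyTable newer than `cutoff` into CloneTable."""
    conn.execute("CREATE TABLE CloneTable (Col1 INTEGER, Col2 TEXT, dtmTimeStamp TEXT)")
    # The cutoff is bound as a parameter, like @Condition in the T-SQL version.
    conn.execute(
        "INSERT INTO CloneTable "
        "SELECT Col1, Col2, dtmTimeStamp FROM MyTable WHERE dtmTimeStamp > ?",
        (cutoff,),
    )
    conn.commit()


clone_conn = sqlite3.connect(":memory:")
clone_conn.execute("CREATE TABLE MyTable (Col1 INTEGER, Col2 TEXT, dtmTimeStamp TEXT)")
clone_conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "old", "2019-12-31 23:59:59"),
        (2, "new", "2020-01-02 08:00:00"),
        (3, "newer", "2020-03-15 12:30:00"),
    ],
)

# ISO-8601 timestamps stored as text compare correctly as plain strings.
clone_rows_after(clone_conn, "2020-01-01 00:00:00")
cloned_rows = clone_conn.execute(
    "SELECT Col1, Col2 FROM CloneTable ORDER BY Col1"
).fetchall()
```

Generate Scripts can then be pointed at CloneTable, which contains only the rows after the chosen timestamp.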