Join two temp tables using Spark SQL

I am creating a temp table, as shown below, for one of my folders, month=02. I also want to query another folder, month=03, and I would like to know how to do it. Should I create a temp table for each folder and then join them? If yes, can anyone help me with the syntax? Or is there a way to create one temp table in Spark for both folders?
This creates one table successfully for month=02, but I would like to query both month=02 and month=03:
df = spark.read.json("wasbs://container@storage.blob.core.windows.net/topics/details/year=2022/month=02/")
df.createOrReplaceTempView("tableForFeb2022")

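You can adapt the following queries to your requirements:
df1 = spark.read.json("wasbs://container@storage.blob.core.windows.net/topics/details/year=2022/month=02/")
df2 = spark.read.json("wasbs://container@storage.blob.core.windows.net/topics/details/year=2022/month=03/")
df1.createOrReplaceTempView("tableForFeb2022")
df2.createOrReplaceTempView("tableForMarch2022")
spark.sql("select * from tableForFeb2022 tbl1, tableForMarch2022 tbl2 "
          "where tbl1.id = tbl2.id").show(truncate=False)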
Refer to this article by NKK for more information.
Alternatively, you can create two different DataFrames, combine them as given here, and finally create a Spark SQL table from the result.
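A minimal sketch of the combine-then-register approach in PySpark, assuming every month folder sits under the same `year=2022/month=NN/` layout used in the question. The helper names `month_paths` and `register_months` and the view name are hypothetical; `spark.read.json`, `unionByName` and `createOrReplaceTempView` are standard PySpark calls.

```python
from functools import reduce
from typing import List


def month_paths(base: str, year: int, months: List[int]) -> List[str]:
    """Build one partition-folder path per month, e.g. .../year=2022/month=02/."""
    return [f"{base}/year={year}/month={m:02d}/" for m in months]


def register_months(spark, base: str, year: int, months: List[int], view: str):
    """Read each month folder, stack the rows, and expose them as one temp view."""
    frames = [spark.read.json(path) for path in month_paths(base, year, months)]
    # unionByName lines columns up by name instead of by position
    combined = reduce(lambda left, right: left.unionByName(right), frames)
    combined.createOrReplaceTempView(view)
    return combined


# Usage (requires a live SparkSession named spark):
# register_months(spark, "wasbs://container@storage.blob.core.windows.net/topics/details",
#                 2022, [2, 3], "tableForFebMar2022")
# spark.sql("select * from tableForFebMar2022").show()
```

Since `DataFrameReader.json` also accepts a list of paths, `spark.read.json(month_paths(...))` would read both folders in a single call instead. If the two months infer different fields, `unionByName(other, allowMissingColumns=True)` (Spark 3.1 and later) fills the missing columns with nulls.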

Related

Can we add a column to an existing table in AWS Athena using an SQL query?

I have a table in AWS Athena which contains 2 records. Is there an SQL query with which a new column can be inserted into the table?
You can find more information about adding columns to a table in the Athena documentation.
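Athena's DDL form for this is `ALTER TABLE table_name ADD COLUMNS (col_name data_type)`. The sketch below shows the same idea with Python's built-in `sqlite3` module as a stand-in that runs without an AWS account; SQLite spells it `ADD COLUMN` and takes one column per statement. The table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Athena table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_test (id TEXT)")
conn.executemany("INSERT INTO sample_test VALUES (?)", [("a",), ("b",)])

# SQLite: ADD COLUMN, one column per statement
# (Athena: ADD COLUMNS (col_name data_type, ...))
conn.execute("ALTER TABLE sample_test ADD COLUMN new_col TEXT")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column
columns = [row[1] for row in conn.execute("PRAGMA table_info(sample_test)")]
rows = conn.execute("SELECT id, new_col FROM sample_test ORDER BY id").fetchall()
print(columns)
print(rows)
```

In SQLite the existing rows simply read NULL for the new column.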
Or you can use CTAS (CREATE TABLE AS SELECT).
For example, if you have a table created with:
CREATE EXTERNAL TABLE sample_test(
id string)
LOCATION
's3://bucket/path'
you can create another table from sample_test with the query:
CREATE TABLE new_test
AS
SELECT *, 'new' AS new_col FROM sample_test
You can use any valid SELECT query after AS.
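To see what that CTAS produces, here is a small sketch using Python's built-in `sqlite3` as a stand-in for Athena (the rows are hypothetical): every row of `sample_test` is copied and the literal `'new'` becomes a real column of `new_test`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_test (id TEXT)")
conn.executemany("INSERT INTO sample_test VALUES (?)", [("1",), ("2",)])

# CTAS: the new table's columns come from the SELECT list, including the literal column
conn.execute("CREATE TABLE new_test AS SELECT *, 'new' AS new_col FROM sample_test")

rows = conn.execute("SELECT id, new_col FROM new_test ORDER BY id").fetchall()
print(rows)  # [('1', 'new'), ('2', 'new')]
```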
This is mainly for future readers like me who were struggling to get this working for a Hive table with AVRO data without creating a new table, i.e. by updating the schema of the existing table. ADD COLUMNS works for CSV, but not for Hive + AVRO. For Hive + AVRO, the solution for appending columns at the end, before the partition columns, is available at this link. However, there are a couple of things to note: we need to pass the full schema to the literal attribute, not just the changes; and (not sure why) we had to alter the Hive table for all three things in the same order: 1. add the columns using ADD COLUMNS, 2. set TBLPROPERTIES, and 3. set SERDEPROPERTIES. Hopefully it helps someone.
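A sketch that only generates the three HiveQL statements in the order that answer describes; it does not run them. It assumes the "literal attribute" is the Hive AvroSerDe property `avro.schema.literal`, and the helper name `avro_add_columns_ddl`, the table `db.events`, and the example schema are all hypothetical. Whether your table needs the property in TBLPROPERTIES, SERDEPROPERTIES or both depends on how it was created.

```python
import json
from typing import Dict, List, Tuple


def avro_add_columns_ddl(table: str, full_schema: Dict,
                         new_cols: List[Tuple[str, str]]) -> List[str]:
    """Build ALTER TABLE statements in the order: columns, TBLPROPERTIES, SERDEPROPERTIES."""
    # The literal holds the FULL Avro schema (old + new fields), not just the additions.
    # Single quotes are backslash-escaped so the JSON fits inside a Hive string literal.
    literal = json.dumps(full_schema).replace("'", "\\'")
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in new_cols)
    return [
        f"ALTER TABLE {table} ADD COLUMNS ({col_defs})",
        f"ALTER TABLE {table} SET TBLPROPERTIES ('avro.schema.literal'='{literal}')",
        f"ALTER TABLE {table} SET SERDEPROPERTIES ('avro.schema.literal'='{literal}')",
    ]


# Hypothetical example: db.events gains a nullable string column new_col
schema = {
    "type": "record",
    "name": "events",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "new_col", "type": ["null", "string"], "default": None},
    ],
}
for stmt in avro_add_columns_ddl("db.events", schema, [("new_col", "string")]):
    print(stmt + ";")
```

Giving the new field a `null` default keeps rows written with the old schema readable.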

How to copy a table with Spark SQL

Actually, I want to move one table to another database.
But Spark doesn't permit this.
So how can I copy a table with Spark SQL?
I already tried this:
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it did not work.
Maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
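The cross-database CTAS can be tried out locally with Python's built-in `sqlite3`, where `ATTACH DATABASE` plays the role of a second database; this is a stand-in, not Spark itself, and the table contents are hypothetical. A plain CTAS like this copies columns and rows, but not the old table's partitioning or storage format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                    # "main" plays old_db
conn.execute("ATTACH DATABASE ':memory:' AS new_db")  # a second, separate database

conn.execute("CREATE TABLE main.old_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.old_table VALUES (?, ?)", [(1, "a"), (2, "b")])

# Copy columns and rows into the other database in one statement
conn.execute("CREATE TABLE new_db.new_table AS SELECT * FROM main.old_table")

copied = conn.execute("SELECT * FROM new_db.new_table ORDER BY id").fetchall()
print(copied)  # [(1, 'a'), (2, 'b')]
```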
To preserve the partitioning and storage format, do the following.
First, get the complete schema of the existing table by running:
show create table db.old_table
The above query outputs the CREATE TABLE statement, which you can execute after changing the path and the table name.
Then insert all the rows into the new, empty table using:
insert into db.new_table select * from db.old_table
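The same three steps, sketched with Python's built-in `sqlite3`, where reading `sqlite_master.sql` is SQLite's counterpart of `show create table`. The table and its rows are hypothetical; the point is that the copied DDL keeps details such as `NOT NULL` that a plain CTAS would drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO old_table (name) VALUES (?)", [("a",), ("b",)])

# Step 1: fetch the original DDL (SQLite's counterpart of SHOW CREATE TABLE)
(ddl,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'old_table'"
).fetchone()

# Step 2: change the table name and run the edited DDL to create an empty copy
conn.execute(ddl.replace("old_table", "new_table", 1))

# Step 3: copy all the rows across
conn.execute("INSERT INTO new_table SELECT * FROM old_table")

rows = conn.execute("SELECT * FROM new_table ORDER BY id").fetchall()
info = conn.execute("PRAGMA table_info(new_table)").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```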
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the CREATE TABLE documentation.

How can I create a partitioned table 'like' an unpartitioned table with Hive HQL?

I've got a table with two weeks' worth of entries, and I would like to copy those entries into a table partitioned by date (creating it if it does not exist).
I'm writing a luigi task to do this, and I would love for it to be independent of the table schema--i.e. I wouldn't have to specify column names and types, and it would CREATE TABLE IF NOT EXISTS when necessary.
I was hoping I could use:
CREATE TABLE IF NOT EXISTS test_part
COMMENT 'This is a test table to see if partitioning works in this case'
PARTITIONED BY (event_date string)
AS select *, '2014-12-15' from source_db.source_table
where event_at <'2014-12-16' and event_at >='2014-12-15';
But this of course fails with: FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
I tried again with "like" with basically the same results. Is there a way to do this that I am missing? It doesn't have to be atomic. Multiple sequential commands are fine.
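You can't use CREATE TABLE AS SELECT here, because Hive's CTAS does not support a partitioned target table.
Instead, first create the partitioned table using the columns reported by describe source_table, then fill it with insert into table test_part partition (event_date='2014-12-15') select * from source_db.source_table where ...
Doing it in 2 steps works better.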
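For the schema-independent luigi task in the question, the two statements can be generated from the `(col_name, data_type)` pairs that `describe source_table` returns. This is a sketch that only builds HiveQL strings; how you run DESCRIBE and the statements (Hive client, luigi's Hive helpers) is left open, and the helper name `partitioned_copy_hql` is hypothetical. With a static partition value, the SELECT lists only the non-partition columns, so `select *` from the unpartitioned source lines up.

```python
from typing import List, Tuple


def partitioned_copy_hql(source: str, target: str, columns: List[Tuple[str, str]],
                         part_col: str, part_value: str, where: str) -> List[str]:
    """Build the two HiveQL statements: create the partitioned table, then fill one partition."""
    # columns: (name, type) pairs as reported by DESCRIBE on the source table
    col_defs = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    create = (
        f"CREATE TABLE IF NOT EXISTS {target} (\n  {col_defs}\n)\n"
        f"PARTITIONED BY ({part_col} string)"
    )
    # Static partition: the partition value goes in PARTITION (...), not in the SELECT list
    insert = (
        f"INSERT INTO TABLE {target} PARTITION ({part_col}='{part_value}')\n"
        f"SELECT * FROM {source}\nWHERE {where}"
    )
    return [create, insert]


# Hypothetical DESCRIBE output for source_db.source_table
stmts = partitioned_copy_hql(
    "source_db.source_table", "test_part",
    [("id", "bigint"), ("event_at", "string")],
    "event_date", "2014-12-15",
    "event_at >= '2014-12-15' AND event_at < '2014-12-16'",
)
for stmt in stmts:
    print(stmt + ";\n")
```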

Copy tables from one server to another in DB2

SELECT *
FROM table1 X, table2 C, table3 M, table4 XSDT
WHERE X.CATID= C.CATID
AND M.MEMID= X.MEMID
AND XSDT.SHIPDISC= X.SHIPDISC;
Say I want to run this query on the HOST db (external) and copy the results to a local DB2 database.
Is there a way to do so in DB2?
I know Teradata has FastLoad, but I'm not sure about DB2 or how I would go about doing this.
Please keep in mind I do not have dba-level privileges.
Solution to this: http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=%2Fcom.ibm.db2.udb.admin.doc%2Fdoc%2Fr0002079.htm
If you want to do this with SQL, you would use something like the following:
create table schema2.table1 like schema1.table1;
insert into schema2.table1
select * from schema1.table1;
Since you're joining tables, you would have to define the local table's columns explicitly in your CREATE TABLE statement and list the columns in both your INSERT and your SELECT.
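A sketch of that explicit-columns pattern using Python's built-in `sqlite3`, where an attached in-memory database named `host` stands in for the remote DB2 database; it assumes both are reachable from one connection, which across real DB2 servers generally calls for one of the other options listed below. The table shapes are hypothetical, trimmed to two of the four tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # local database
conn.execute("ATTACH DATABASE ':memory:' AS host")  # stands in for the remote HOST db

# Hypothetical source tables on the "host" side
conn.execute("CREATE TABLE host.table1 (CATID INTEGER, MEMID INTEGER)")
conn.execute("CREATE TABLE host.table2 (CATID INTEGER, CATNAME TEXT)")
conn.executemany("INSERT INTO host.table1 VALUES (?, ?)", [(10, 1), (20, 2)])
conn.executemany("INSERT INTO host.table2 VALUES (?, ?)", [(10, "books"), (20, "music")])

# The local table is defined explicitly because the result spans several tables
conn.execute("CREATE TABLE main.joined_copy (MEMID INTEGER, CATID INTEGER, CATNAME TEXT)")

# Column lists in both the INSERT and the SELECT keep the mapping explicit
conn.execute(
    """
    INSERT INTO main.joined_copy (MEMID, CATID, CATNAME)
    SELECT X.MEMID, X.CATID, C.CATNAME
    FROM host.table1 X
    JOIN host.table2 C ON X.CATID = C.CATID
    """
)
rows = conn.execute("SELECT * FROM main.joined_copy ORDER BY MEMID").fetchall()
print(rows)  # [(1, 10, 'books'), (2, 20, 'music')]
```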
You can do a DB2 backup of the tables, and restore them to your local schema.
You can do a DB2 export of the tables, and use DB2 import to create them on your local schema.
You can use the DB2 db2move utility.

Dynamically creating tables using information held in another table

I want to create a table in SQL using the column details (name, data type, etc.) stored in another table in the database.
Depending on the database you can use the information schema tables. They hold the information you are looking for. Look for the table that describes the columns.
Postgres: http://www.postgresql.org/docs/8.4/interactive/information-schema.html
MySQL: http://dev.mysql.com/doc/refman/5.0/en/information-schema.html
You can query these tables and use 'select into' to insert the results into your other table.
One option is to build a CREATE TABLE statement and execute it with ADO.NET, as shown here.
Try this code, which copies only the structure of old_table, since the condition WHERE 1 = 2 matches no rows:
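Building the CREATE TABLE statement from stored column details can be sketched with Python's built-in `sqlite3`. The metadata table `column_defs` and the helper `create_table_sql` are hypothetical; in PostgreSQL or MySQL the pairs would come from `information_schema.columns` instead. Column names are quoted, but data types are inserted as-is, so the metadata must be trusted.

```python
import sqlite3
from typing import Iterable, Tuple


def create_table_sql(table: str, columns: Iterable[Tuple[str, str]]) -> str:
    """Turn (column_name, data_type) pairs into a CREATE TABLE statement."""
    col_defs = ", ".join(f'"{name}" {data_type}' for name, data_type in columns)
    return f'CREATE TABLE "{table}" ({col_defs})'


conn = sqlite3.connect(":memory:")
# Hypothetical metadata table holding one row per column to create
conn.execute(
    "CREATE TABLE column_defs (table_name TEXT, position INTEGER, column_name TEXT, data_type TEXT)"
)
conn.executemany(
    "INSERT INTO column_defs VALUES (?, ?, ?, ?)",
    [("customer", 1, "id", "INTEGER"), ("customer", 2, "name", "TEXT")],
)

cols = conn.execute(
    "SELECT column_name, data_type FROM column_defs WHERE table_name = ? ORDER BY position",
    ("customer",),
).fetchall()
ddl = create_table_sql("customer", cols)
conn.execute(ddl)

created = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(customer)")]
print(created)  # [('id', 'INTEGER'), ('name', 'TEXT')]
```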
CREATE TABLE new_table
AS
SELECT *
FROM old_table
WHERE 1 = 2;
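The effect of WHERE 1 = 2 can be checked with Python's built-in `sqlite3` (hypothetical table and rows): the new table gets the same column names but no data. In SQLite, constraints such as PRIMARY KEY are not carried over by this form of CTAS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO old_table VALUES (?, ?)", [(1, "a"), (2, "b")])

# WHERE 1 = 2 is never true, so CTAS copies the column list but no rows
conn.execute("CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1 = 2")

columns = [r[1] for r in conn.execute("PRAGMA table_info(new_table)")]
count = conn.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
print(columns, count)  # ['id', 'name'] 0
```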