I am using AWS EMR with open-source Delta Lake.
In Python, dataframe.write.format('delta').save() works fine.
But I want to use it in SQL, so I tried to create a Delta table in SQL as below:
spark.sql('''
CREATE OR REPLACE TABLE test.foo
(name string)
USING delta
LOCATION 's3://<bucket_name>/test/foo'
''');
But when I try to INSERT, an error is raised.
spark.sql('INSERT INTO test.foo (name) VALUES ("bar")');
ERROR: org.apache.spark.sql.AnalysisException: Path does not exist
The table was created in the Glue metastore, but nothing was created at s3://<bucket_name>/test/foo in S3.
Is there any way to create a table in SQL? :)
To me it looks like you are using the wrong name: test.sql_delta. If you create a table through SQL, it is referenced via the metastore by the table name you just created.
The code should be:
spark.sql('INSERT INTO test.foo (name) VALUES ("bar")')
SQL version:
%sql
INSERT INTO test.foo (name) VALUES ("bar")
I have a table in SQL Server with some columns, and a text file. I need to import the data from two columns of the text file into the SQL table (two matching columns exist in the SQL table; the remaining columns don't need to be inserted into). How can I do it?
Use the SQL Server Import Wizard and just ignore the columns in the mapping that are not required.
See the link below.
SQL Server Management Studio (SSMS) provides the Import Wizard task which you can use to copy data from one data source to another. You can choose from a variety of source and destination data source types, select tables to copy or specify your own query to extract data, and save your work as an SSIS package. In this section we will go through the Import Wizard and import data from an Excel spreadsheet into a table in a SQL Server database.
https://www.mssqltips.com/sqlservertutorial/203/simple-way-to-import-data-into-sql-server/
For CSV:
The data in the CSV file (testcsv.txt):
Name,Class
Prabhat,4
Prabhat1,5
Prabhat2,6
The query:
CREATE TABLE CSVTest (Name varchar(100), Class varchar(10))
GO

BULK INSERT CSVTest
FROM 'C:\New folder (2)\testcsv.txt'
WITH
(
    FIRSTROW = 2,          -- skip the Name,Class header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM CSVTest
GO
--Drop the table to clean up database.
DROP TABLE CSVTest
GO
SELECT *
FROM table1 X, table2 C, table3 M, table4 XSDT
WHERE X.CATID = C.CATID
  AND M.MEMID = X.MEMID
  AND XSDT.SHIPDISC = X.SHIPDISC;
Say I want to run this query against the HOST db (external), get its data, and copy it into a local DB2 database.
Is there a way to do so in DB2?
I know teradata has fastload... but I'm not sure about db2 or how I would go about doing so.
Please keep in mind I do not have dba-level privileges.
Solution to this: http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=%2Fcom.ibm.db2.udb.admin.doc%2Fdoc%2Fr0002079.htm
If you want to do this with SQL, then you would use something like the following SQL:
create table schema2.table1 like schema1.table1;
insert into schema2.table1
select * from schema1.table1;
Since you're joining tables, you would have to define the local table in your CREATE TABLE SQL and list the columns in your INSERT as well as your SELECT.
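For example, a minimal sketch with hypothetical column names and types pulled from the joined query above (adjust them to your actual schema):

create table schema2.joined_result (catid integer, memid integer, shipdisc decimal(5,2));

insert into schema2.joined_result (catid, memid, shipdisc)
select X.CATID, X.MEMID, X.SHIPDISC
from table1 X, table2 C, table3 M, table4 XSDT
where X.CATID = C.CATID
  and M.MEMID = X.MEMID
  and XSDT.SHIPDISC = X.SHIPDISC;

This assumes the HOST tables are reachable from your local connection (for example through federation nicknames); otherwise use one of the export-based options below.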
You can do a DB2 backup of the tables, and restore them to your local schema.
You can do a DB2 export of the tables, and use DB2 import to create them on your local schema.
You can use the DB2 db2move utility.
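As an illustration, here is a sketch of the export/import route using the SYSPROC.ADMIN_CMD stored procedure (the IXF file path is hypothetical and must be writable by the database server; the db2 export / db2 import command-line utilities accept the same clauses):

-- dump the source table to a self-describing IXF file
CALL SYSPROC.ADMIN_CMD('EXPORT TO /tmp/table1.ixf OF IXF SELECT * FROM schema1.table1');
-- load it into the (pre-created) local table
CALL SYSPROC.ADMIN_CMD('IMPORT FROM /tmp/table1.ixf OF IXF INSERT INTO schema2.table1');

Export and import generally need only SELECT and INSERT privileges on the tables involved, not DBA rights, which fits your constraint.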
I am new to hive. I have successfully setup a single node hadoop cluster for development purpose and on top of it, I have installed hive and pig.
I created a dummy table in hive:
create table foo (id int, name string);
Now, I want to insert data into this table. Can I add data one record at a time, just like in SQL? Kindly help me with an analogous command to:
insert into foo (id, name) VALUES (12, "xyz");
Also, I have a csv file which contains data in the format:
1,name1
2,name2
..
..
..
1000,name1000
How can I load this data into the dummy table?
I think the best way is:
a) Copy data into HDFS (if it is not already there)
b) Create an external table over your CSV, like this:
CREATE EXTERNAL TABLE TableName (id int, name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'place in HDFS';
c) You can start using TableName right away by querying it.
d) If you want to insert the data into another Hive table:
insert overwrite table finalTable select * from TableName;
There's no direct way to insert one record at a time from the terminal; however, here's an easy, straightforward workaround which I usually use when I want to test something. It assumes t is a table with at least one record; the type and number of its columns don't matter.
INSERT INTO TABLE foo
SELECT '12', 'xyz'
FROM t
LIMIT 1;
Hive apparently supports INSERT...VALUES starting in Hive 0.14.
Please see the section 'Inserting into tables from SQL' at: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
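With that version, a minimal example against the foo table from the question would be:

INSERT INTO TABLE foo VALUES (12, 'xyz');

Note that Hive executes each such statement as a full job behind the scenes, so this is only practical for small test inserts.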
Whatever data you have in a text file or log file can be put at a path in HDFS, and then you can write a query in Hive as follows:
hive> load data inpath '<input path>' into table <table name>;
EXAMPLE:
hive> create table foo (id int, name string)
    > row format delimited
    > fields terminated by ','
    > stored as textfile;
table created..
(Pick a single field delimiter, such as '\t', '|', or ','; it must match your file.)
DATA INSERTION:
hive> load data inpath '/home/hive/foodata.log' into table foo;
To insert an ad-hoc value like (12, "xyz"), do this:
insert into table foo select * from (select 12, "xyz") a;
This is supported from Hive version 0.14.
INSERT INTO TABLE pd_temp (dept, make, cost, id, asmb_city, asmb_ct, retail)
VALUES ('production', 'thailand', 10, 99202, 'northcarolina', 'usa', 20);
It's a limitation of Hive:
1. You cannot update data after it is inserted.
2. There is no "insert into table values ..." statement.
3. You can only load data using bulk load.
4. There is no "delete from" command.
5. You can only do bulk delete.
But if you still want to insert a record from the Hive console, you can do a SELECT from an existing table, as in the workaround shown above.
You may try this: I have developed a tool to generate Hive scripts from a CSV file. Following are a few examples of how the files are generated.
Tool -- https://sourceforge.net/projects/csvtohive/?source=directory
Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/.
The tool generates a Hadoop script for all CSV files; the following is a sample of the generated Hadoop script to load a CSV into HDFS:
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
Sample of the generated Hive scripts:
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string, yearID string, gameNum string, gameID string, teamID string, lgID string, GP string, startingPos string)
row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
You can use the following lines of code to insert values into an already existing table. Here the table is db_name.table_name, which has two columns, and I am inserting 'ALL', 'Done' as a row in the table.
insert into table db_name.table_name
select 'ALL','Done';
Hope this was helpful.
The Hadoop file system does not support appending data to existing files. However, you can load your CSV file into HDFS and tell Hive to treat it as an external table.
Use this:
create table dummy_table_name as select * from source_table_name;
This will create the new table with the existing data from source_table_name.
LOAD DATA [LOCAL] INPATH '<file path>' [OVERWRITE] INTO TABLE <table_name>;
Use this command and it will load the data all at once; just specify the file path. If the file is on the local filesystem, use the LOCAL keyword; if the file is already in HDFS, LOCAL is not needed.
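For example, with hypothetical file paths (the LOCAL form copies a file from the local filesystem, while the HDFS form moves the file into the table's directory):

hive> LOAD DATA LOCAL INPATH '/home/user/foodata.csv' INTO TABLE foo;
hive> LOAD DATA INPATH '/user/hive/foodata.csv' INTO TABLE foo;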
I have two databases with the same schema. My DB is Postgres. I want to copy the data of a table (e.g. product) from my 1st DB into the same table of the 2nd DB.
Is it possible to do so with a query?
You can't do it as a single SQL command (at least not without dblink), but the easiest way is probably to just use a pipe between two psql processes: use COPY on both ends, one sending the data out in CSV format, the other receiving it.
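A sketch of that pipe, assuming database names sourcedb and targetdb and a product table that already exists in both:

psql -d sourcedb -c "COPY product TO STDOUT WITH CSV" \
  | psql -d targetdb -c "COPY product FROM STDIN WITH CSV"

The first COPY streams the rows out as CSV and the second reads them straight back in, so nothing is written to disk in between.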
Try:
insert into db1.table1 select * from db2.table2
It's not possible in a vanilla PostgreSQL installation.
If you are able to install contrib modules, use dblink:
INSERT INTO product
SELECT *
FROM dblink
(
    'dbname=sourcedb',
    'SELECT * FROM product'
) AS p (id INT, column1 INT, column2 TEXT, …);
This should be run in the target database.
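One detail worth noting: on PostgreSQL 9.1 and later, a contrib module is enabled per database with an extension, so you would first run this in the target database:

CREATE EXTENSION dblink;

Also, the column list after AS p (...) has to match the actual definition of product in the source database, since dblink cannot infer it.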