So, I have a table nba_schedule, created below. When I try to COPY data from a CSV file in S3 into the table, I receive this error: InternalError_: Cannot COPY into nonexistent table newsletter_schedule.
I'm thinking it's because this all takes place in the same transaction, which is what I'm expected to do here. Also, the Redshift connection variables live in an env file; I'm not sharing the code that loads them.
import logging
import psycopg2

redshift_table = 'nba_schedule'

# Connect to Redshift
conn_string = "dbname={} port={} user={} password={} host={}".format(
    redshift_dbname, redshift_port, redshift_user, redshift_password, redshift_host)
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
logging.info("Creating newsletter_schedule table in Redshift")
sql = f"""DROP TABLE IF EXISTS {schema + "." + redshift_table}"""
cursor.execute(sql)
sql = f"""CREATE TABLE IF NOT EXISTS {schema + "." + redshift_table} (
Date DATE,
Player_Name VARCHAR(255),
Player_Nickname VARCHAR(13),
Player_No VARCHAR(13),
Points VARCHAR(255),
Rebounds VARCHAR(255),
Assists VARCHAR(255),
Blocks VARCHAR(1),
3PM VARCHAR(1),
3PA VARCHAR(1),
FGM VARCHAR(50),
FGA VARCHAR(255),
three_percent VARCHAR(50),
fg_percent VARCHAR(50)
)
"""
cursor.execute(sql)
sql = f"""
COPY newsletter_schedule
FROM 's3://random_sample_data/nba_redshift/{s3_file}'
CREDENTIALS 'aws_iam_role=arn:aws:iam::4254514352:role/SampleRole'
DELIMITER ','
IGNOREHEADER 1
EMPTYASNULL
QUOTE '"'
CSV
REGION 'us-east-1';
"""
cursor.execute(sql)
conn.commit()
Any thoughts?
My first thought is that the CREATE TABLE has the schema explicitly defined, while the COPY command has no schema and, in fact, a different table name (newsletter_schedule vs. the nba_schedule you created). I don't know what schema you are using or what the search path is for this user on Redshift, but it seems like you should check that this isn't just a table-name / schema search path issue.
What happens if you use schema.table (with the table you actually created) in the COPY command? That debug path is easier than working out your user's search path.
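For example, a minimal sketch with the target fully qualified (your_schema and your_file.csv are placeholders for whatever your schema variable and s3_file hold at runtime):

COPY your_schema.nba_schedule
FROM 's3://random_sample_data/nba_redshift/your_file.csv'
CREDENTIALS 'aws_iam_role=arn:aws:iam::4254514352:role/SampleRole'
DELIMITER ','
IGNOREHEADER 1
EMPTYASNULL
QUOTE '"'
CSV
REGION 'us-east-1';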
There are other more subtle ways this could be happening but I've learned to look at the simple causes first - they are easier to rule out and more often than not the root cause.
Related
How do we create an external table using Snowflake SQL that points to a directory in S3? Below is the code I tried so far, but it didn't work. Any help is highly appreciated.
create external table my_table
(
column1 varchar(4000),
column2 varchar(4000)
)
LOCATION 's3a://<externalbucket>'
Note: the file that I have in the S3 bucket is a CSV file (comma separated, double-quote enclosed, and with a header).
You will need to update your location to be an external stage, include the file_format parameter, and include the proper expression for the columns.
The location Parameter:
Specifies the external stage where the files containing data to be read are staged.
Additionally you'll need to define the file_format
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#required-parameters
So your statement should look more like this:
create external table my_table
(
column1 varchar as (value:c1::varchar),
column2 varchar as (value:c2::varchar)
)
location = @[namespace.]ext_stage_name[/path]
file_format = (type = CSV)
You may need to define additional parameters in the file format to handle your file appropriately.
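Since your file is comma-separated, double-quote enclosed, and has a header, the file_format portion might look roughly like this (a sketch, not a tested statement):

file_format = (
    type = CSV
    field_delimiter = ','
    skip_header = 1
    field_optionally_enclosed_by = '"'
)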
Finally I sorted this out. Posting this answer to keep it simple to understand, especially for beginners.
Say I have a file in S3 that is pipe-delimited, has values enclosed in double quotes, and includes a header row (matching the file format below).
Step 1:
Create a file format in which you define what type of file it is, the field delimiter, whether data is enclosed in double quotes, whether to skip the header of the file, etc.
create or replace file format schema_name.pipeformat
type = 'CSV'
field_delimiter = '|'
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
skip_header = 1
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
Step 2:
Create a Stage to specify the S3 details and file format.
create or replace stage schema_name.stage_name
url='s3://<path where file is kept>'
credentials=(aws_key_id='****' aws_secret_key='****')
file_format = pipeformat
https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#required-parameters
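As a quick check that the stage resolves to the right place, you can list the files Snowflake sees there (using the stage name created above):

list @schema_name.stage_name;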
Step 3:
Create the external table based on the Stage name and file format.
create or replace external table schema_name.table_name
(
RollNumber INT as (value:c1::int),
Name varchar(20) as ( value:c2::varchar),
Marks int as (value:c3::int)
)
with location = @stage_name
file_format = pipeformat
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html
Step 4:
Now you should be able to query from the external table.
select *
from schema_name.table_name
I have a very basic question: how can I add a very simple table to Hive? My table is saved in a text file (.txt) stored in HDFS. I have tried to create an external table in Hive that points to this file, but when I run a SQL query (select * from table_name) I don't get any output.
Here is an example code:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
LOCATION 'hdfs:///KibTEst/Data.txt';
KibTEst/Data.txt is the path of the text file in HDFS.
The rows in the table are separated by carriage returns, and the columns are separated by commas.
Thanks for your help!
You just need to create an external table with the delimiter properties set, and point its location at the HDFS directory that contains your file (LOCATION should be a directory, not the file itself), as below:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs:///KibTEst/';
You just need to run a SELECT query (the file is already in HDFS, and the external table fetches data directly from the location specified in the CREATE statement). So test it with the statement below:
SELECT * FROM Data;
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION 'Your hdfs location for external table';
If the data is already in HDFS, then use:
LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE tablename
Then use select * from table_name.
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION '/Data';
Then load the file into the table:
LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
Then
select * from Data;
I hope the inputs below help answer the question asked by @mshabeen.
There are different ways to load data into a Hive table that is created as an external table.
While creating the Hive external table you can use the LOCATION option and specify the HDFS, S3 (in the case of AWS), or file location from which you want to load data, OR you can use the LOAD DATA INPATH option to load data from HDFS, S3, or a file after creating the Hive table.
Alternatively, you can use the ALTER TABLE command to load data into Hive partitions.
Below are some details
Using LOCATION - used while creating the Hive table. In this case the data is already in place and immediately available through the Hive table.
Using the LOAD DATA INPATH option - this Hive command can be used to load data from a specified location. The point to remember here is that the data gets MOVED from the input path to the Hive warehouse path.
Example -
LOAD DATA INPATH 'hdfs://cluster-ip/path/to/data/location/' INTO TABLE table_name;
Using the ALTER TABLE command - mostly this is used to add data from other locations into Hive partitions. In this case all partitions must already be defined and the partition values must already be known. For dynamic partitions this command is not required (see the sketch after the example below).
Example -
ALTER TABLE table_name ADD PARTITION (date_col='2018-02-21') LOCATION 'hdfs://path/to/location/'
The above statement maps the partition to the specified data location (in this case, HDFS). However, the data is NOT MOVED to the Hive internal warehouse location.
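For the dynamic-partition case mentioned above, a rough sketch (table and column names are placeholders) looks like this:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE target_table PARTITION (date_col)
SELECT col_a, col_b, date_col
FROM source_table;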
I have a package that contains two execute sql tasks followed by numerous DFTs. The first has the statement:
select CAST(floor(rand()*1000)+1 AS INT) AS SeqVar
And has the ResultSet set to Single row - this works perfectly. It gives me a random number between 1 and 1000 and passes that value on to a variable I have called SeqVar. (I have also verified that this works.)
The problem I am having is in my second Execute SQL Task, where I try to use the SeqVar variable output by the first Execute SQL Task as a parameter in the following statement:
IF OBJECT_ID('tempdb.dbo.[##temp1]') IS NOT NULL
DROP TABLE [##temp1]
CREATE TABLE [##temp1] (
[RecID] int IDENTITY(?,1),
[Name] VARCHAR(30),
[ABA] VARCHAR(10),
[Account] VARCHAR(20),
[Type] VARCHAR(1),
[Date] date,
[Amount] money
);
Under parameter mapping I have the SeqVar variable name, Direction is Input, Data Type is numeric, Parameter Name is 0, and Parameter Size is 1000.
The value I get has to go where I have the "?" in the CREATE TABLE statement. I am trying to have my IDENTITY column start at a random number and increment by 1.
I know this would probably be easier with a Script Task, but that tool is broken on my machine (weird DTS pipeline errors). Thanks in advance; this is all in SSIS 2008.
Using IDENTITY this way seems like a strange solution just to start a sequence at some random value, and ? parameters definitely don't work in that context: the seed of IDENTITY has to be a literal in the statement, not a parameter.
However, another way to manage this, if the method is a given, is to set the entire SQL of the task with an expression (on its SqlStatementSource property), concatenating the variable's value into the string, and then let the task run the resulting statement.
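For example, if SeqVar happened to resolve to 437 (just an illustrative value), the expression would produce a statement like this for the task to run:

IF OBJECT_ID('tempdb.dbo.[##temp1]') IS NOT NULL
    DROP TABLE [##temp1];

-- the seed is baked into the statement text by the expression, so no ? parameter is needed
CREATE TABLE [##temp1] (
    [RecID] int IDENTITY(437, 1),
    [Name] VARCHAR(30),
    [ABA] VARCHAR(10),
    [Account] VARCHAR(20),
    [Type] VARCHAR(1),
    [Date] date,
    [Amount] money
);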
Better to create the table as below, then use the RESEED option to set the identity to your random variable value:
IF OBJECT_ID('tempdb.dbo.[##temp1]') IS NOT NULL
DROP TABLE [##temp1]
CREATE TABLE [##temp1] (
[RecID] int IDENTITY(1,1),
[Name] VARCHAR(30),
[ABA] VARCHAR(10),
[Account] VARCHAR(20),
[Type] VARCHAR(1),
[Date] date,
[Amount] money
);
DECLARE @RANDOM_VALUE INT = 50
DBCC CHECKIDENT('##TEMP1', RESEED, @RANDOM_VALUE)
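As a rough sanity check (assuming ##temp1 was just created and has never had a row inserted, so the first insert picks up the reseed value itself):

-- with the reseed above, the first inserted row should get RecID = 50
INSERT INTO [##temp1] ([Name]) VALUES ('test');
SELECT [RecID] FROM [##temp1];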
I am creating a table in Hive like this:
CREATE TABLE SEQUENCE_TABLE(
SEQUENCE_NAME VARCHAR2(225) NOT NULL,
NEXT_VAL NUMBER NOT NULL
);
But the result is a parse exception; it is unable to read VARCHAR2(225) NOT NULL.
Can anyone guide me on how to create a table like the one above, and on any other process to provide a path for it?
There's no such thing as VARCHAR2, NUMBER, or a NOT NULL clause in Hive (newer Hive versions do support VARCHAR(n), but string and bigint are the usual choices here).
CREATE TABLE SEQUENCE_TABLE( SEQUENCE_NAME string, NEXT_VAL bigint);
Please read this for CREATE TABLE syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
Anyway, Hive is "SQL-like" but it's not SQL. I wouldn't use it for things such as a sequence table, since you don't have support for transactions, locking, keys, and everything you are familiar with from Oracle (though I think newer versions have some simple support for transactions, updates, deletes, etc.).
I would consider using a normal OLTP database for whatever you are trying to achieve.
The only option you have here is something like:
CREATE TABLE SEQUENCE_TABLE(SEQUENCE_NAME String,NEXT_VAL bigint) row format delimited fields terminated by ',' stored as textfile;
PS: Again, it depends on the type of data you are going to load into Hive.
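To put data into it you could then load a comma-separated file (the path below is a placeholder):

LOAD DATA INPATH '/path/to/sequence_data.csv' INTO TABLE SEQUENCE_TABLE;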
Use the following syntax...
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
And an example of a Hive CREATE TABLE:
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
I have a Hive table created from PHP in this format:
CREATE EXTERNAL TABLE IF NOT EXISTS {$tableName} (fileContent VARCHAR(250), description VARCHAR(250), dimension DOUBLE, fileName VARCHAR(250)) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/var/www/ASOIS_Proiect/metadata/'
For one situation I want to update only the description field if fileName='a' and 'size'='12' already exist in the database.
Any idea please? I tried to update by re-loading the file with the LOAD command and the OVERWRITE flag, but it is not working.