I am trying to write a table into my data warehouse using the RPostgreSQL package:
library(DBI)
library(RPostgreSQL)
pano = dbConnect(dbDriver("PostgreSQL"),
                 host = 'db.panoply.io',
                 port = '5439',
                 user = panoply_user,
                 password = panoply_pw,
                 dbname = mydb)
RPostgreSQL::dbWriteTable(pano, "mtcars", mtcars[1:5, ])
I am getting this error:
Error in postgresqlpqExec(new.con, sql4) :
RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN"
LINE 1: ..."hp","drat","wt","qsec","vs","am","gear","carb" ) FROM STDIN
^
)
The above code creates the table in Panoply with 0 rows and 0 bytes: the columns appear correctly, but no data is loaded.
First and most important: Redshift <> PostgreSQL.
Redshift does not use the Postgres bulk loader, so COPY ... FROM STDIN is not allowed (and that is what dbWriteTable generates).
There are many options available; which one to choose depends on your needs, and especially on the volume of data.
For high volumes of data you should write to S3 first and then use the Redshift COPY command; take a look at
https://github.com/sicarul/redshiftTools
For low volumes, see
inserting multiple records at once into Redshift with R
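A rough sketch of both routes, reusing the pano connection and the mtcars sample from the question (the redshiftTools call is commented out because the S3 bucket and AWS credentials are placeholders):
library(DBI)

# Low-volume route: Redshift rejects COPY ... FROM STDIN (which is what
# dbWriteTable() generates), so create the target table and then send the
# rows as an ordinary multi-row INSERT instead.
df <- mtcars[1:5, ]
dbCreateTable(pano, "mtcars", df)
dbExecute(pano, sqlAppendTable(pano, "mtcars", df, row.names = FALSE))

# High-volume route: stage the data in S3 and let Redshift COPY it,
# e.g. with redshiftTools (bucket name and AWS credentials are placeholders):
# library(redshiftTools)
# rs_replace_table(df, dbcon = pano, table_name = "mtcars",
#                  bucket = "my-staging-bucket")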
I am trying to do a POC in PySpark with a very simple requirement. As a first step, I am just trying to copy the records from one table to another table. There are more than 20 tables, but at first I am trying to do it for only one table and later extend it to multiple tables.
The code below works fine when I copy only 10 records. But when I try to copy all records from the main table, the code gets stuck and eventually I have to terminate it manually. Since the main table has 1 million records, I expected it to finish in a few seconds, but it just never completes.
Spark UI: (screenshot omitted)
Could you please suggest how I should handle this?
Host: local machine
Spark version: 3.0.0
Database: Oracle
Code:
from pyspark.sql import SparkSession
from configparser import ConfigParser
#read configuration file
config = ConfigParser()
config.read('config.ini')
#setting up db credentials
url = config['credentials']['dbUrl']
dbUsr = config['credentials']['dbUsr']
dbPwd = config['credentials']['dbPwd']
dbDrvr = config['credentials']['dbDrvr']
dbtable = config['tables']['dbtable']
#print(dbtable)
# database connection
def dbConnection(spark):
    pushdown_query = "(SELECT * FROM main_table) main_tbl"
    prprDF = spark.read.format("jdbc")\
        .option("url", url)\
        .option("user", dbUsr)\
        .option("dbtable", pushdown_query)\
        .option("password", dbPwd)\
        .option("driver", dbDrvr)\
        .option("numPartitions", 2)\
        .load()
    prprDF.write.format("jdbc")\
        .option("url", url)\
        .option("user", dbUsr)\
        .option("dbtable", "backup_tbl")\
        .option("password", dbPwd)\
        .option("driver", dbDrvr)\
        .mode("overwrite").save()

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("DB refresh")\
        .getOrCreate()
    dbConnection(spark)
    spark.stop()
It looks like you are using only one thread (executor) to process the data over the JDBC connection. Check the executor and driver details in the Spark UI and try increasing the resources. Also share the error it is failing with; you can get this from the same UI, or from the CLI with yarn logs -applicationId <application id>.
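If the goal is to read with more than one task, the JDBC source also needs to be told how to partition the query. A sketch of what that can look like, reusing the url/dbUsr/dbPwd/dbDrvr values from the question's config.ini; the column name "ID_COLUMN" and the bounds are placeholders that must match a numeric column in main_table:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DB refresh").getOrCreate()

# partitionColumn/lowerBound/upperBound tell Spark how to split the query into
# numPartitions concurrent JDBC reads; without them the read runs on a single task.
prprDF = (spark.read.format("jdbc")
          .option("url", url)
          .option("user", dbUsr)
          .option("password", dbPwd)
          .option("driver", dbDrvr)
          .option("dbtable", "(SELECT * FROM main_table) main_tbl")
          .option("partitionColumn", "ID_COLUMN")
          .option("lowerBound", 1)
          .option("upperBound", 1000000)
          .option("numPartitions", 4)
          .option("fetchsize", 10000)     # rows per JDBC round trip; Oracle's default is tiny
          .load())

(prprDF.write.format("jdbc")
 .option("url", url)
 .option("user", dbUsr)
 .option("password", dbPwd)
 .option("driver", dbDrvr)
 .option("dbtable", "backup_tbl")
 .option("batchsize", 10000)              # batch the INSERTs on the write side
 .mode("overwrite")
 .save())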
I am trying to load a JSON file from the staging area (S3) into a stage table using the COPY INTO command.
Table:
create or replace TABLE stage_tableA (
RAW_JSON VARIANT NOT NULL
);
Copy Command:
copy into stage_tableA from @stgS3/filename_45.gz file_format = (format_name = 'file_json')
Got the below error when executing the above (sample provided)
SQL Error [100069] [22P02]: Error parsing JSON: document is too large, max size 16777216 bytes If you would like to continue loading
when an error is encountered, use other values such as 'SKIP_FILE' or
'CONTINUE' for the ON_ERROR option. For more information on loading
options, please run 'info loading_data' in a SQL client.
When I set ON_ERROR=CONTINUE, records were partially loaded, i.e. up to the record that exceeds the max size, but no records after the error record were loaded.
Was ON_ERROR=CONTINUE supposed to skip only the oversized record and load the records before and after it?
Yes, the ON_ERROR=CONTINUE skips the offending line and continues to load the rest of the file.
To help us provide more insight, can you answer the following:
How many records are in your file?
How many got loaded?
At what line was the error first encountered?
You can find this information using the COPY_HISTORY() table function
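For example, something along these lines (a sketch; it assumes the stage_tableA example above and a load within the last 24 hours):
-- Per-file load results: rows parsed vs. rows loaded, plus the first error
-- and the line it occurred on.
select file_name,
       status,
       row_parsed,
       row_count,
       first_error_message,
       first_error_line_number
from table(information_schema.copy_history(
       table_name => 'STAGE_TABLEA',
       start_time => dateadd(hours, -24, current_timestamp())));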
Try setting the option strip_outer_array = true for the file format and attempt the loading again.
The considerations for loading large semi-structured data are documented in the article below:
https://docs.snowflake.com/en/user-guide/semistructured-considerations.html
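For example, assuming file_json is a named JSON file format you can modify, the option can be enabled and the COPY rerun along these lines (a sketch):
-- Split a top-level JSON array into one row per element, instead of loading
-- the whole document as a single (possibly >16 MB) VARIANT value.
alter file format file_json set strip_outer_array = true;

copy into stage_tableA from @stgS3/filename_45.gz file_format = (format_name = 'file_json')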
I partially agree with Chris. The ON_ERROR=CONTINUE option only helps if there is in fact more than one JSON object in the file. If it's one massive object, then with ON_ERROR=CONTINUE you simply get neither an error nor a loaded record.
If you know your JSON payload is smaller than 16 MB, then definitely try strip_outer_array = true. Also, if your JSON has a lot of nulls ("NULL") as values, use STRIP_NULL_VALUES = TRUE, as this will slim your payload as well. Hope that helps.
I keep getting a SQL syntax error when I run the script to import the data into SQL.
Here is the data in question:
filtered_interfaces = ['interface Gi1/0/7']
filtered_tech = ['description TECH_5750']
cab = u'10.210.44.5'
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '['description TECH_5750'],['interface Gi1/0/7'],10.20.94.5)' at line 1")
I've attempted using a dict and tried to match the values as they get placed into the database; I have also tried %s in place of {}.
I'm getting the same error as above with both of these methods.
sql = "INSERT INTO device (hostname, cab, localint) VALUES ({0},{1},{2})".format(filtered_tech,filtered_interface,cab)
mycursor.execute(sql)
Long story short, this application logs into switches at our DCs. It looks for CDP/LLDP devices, then spits out all that info into an array. I then filter the info looking for the hostname and local interface. The goal is to import that data into SQL (it's different on each switch), then retrieve the data on an HTML page and edit the hostname if applicable. I had the editable piece working, but it wasn't scalable, hence the SQL additions. Apologies if this is an easy fix; I'm not a programmer by trade, just trying to make life easier and learn Python.
If you are using MySQL behind your Python script, then you should be following the usage pattern in the documentation for prepared statements:
# filtered_tech and filtered_interfaces are single-element lists in the
# question's data, so pass the strings themselves rather than the lists
params = (filtered_tech[0], filtered_interfaces[0], cab)
sql = """INSERT INTO device (hostname, cab, localint)
         VALUES (%s, %s, %s)"""
mycursor.execute(sql, params)
# retrieve data here
I am writing an R Markdown file with a sequence of SQL chunks. Each chunk contains a CREATE TABLE statement that writes data into an Oracle database. For instance:
```{r setup}
library(knitr)
library(ROracle)
con <- dbConnect(dbDriver("Oracle"), ...)
```
```{sql, connection = "con"}
CREATE TABLE TEST AS SELECT * FROM TESTTABLE
```
Ideally, I would like to get a report that contains the SQLs followed by messages of success or failure. However, when I run the chunks, tables are successfully created, but the following error messages appear:
Error in seq_len(ncol(data)) : argument must be coercible to
non-negative integer In addition: Warning message: In
seq_len(ncol(data)) : first element used of 'length.out' argument
This suggests to me that the result of the SQL statement (which is a logical TRUE or FALSE) cannot be transformed and displayed correctly. A workaround is to assign the result of the SQL to R via the output.var option and then print it in a separate R chunk.
```{sql, connection = "con", output.var = "createTableResult"}
CREATE TABLE TEST AS
SELECT * FROM TESTTABLE
```
```{r}
createTableResult
```
[1] TRUE
Is there a way to avoid this workaround?
Can I somehow get the resulting TRUE or FALSE message directly from the SQL chunk?
The goal of my code is to return a count of the number of rows in a table based on one specific parameter.
Here is code that works:
######### SQL Commands
### Connect to the SQL Database
my $dbh = DBI->connect($data_source, $user, $pass)
    or die "Can't connect to $data_source: $DBI::errstr";
### Prepare the SQL statement and execute
my $sth1 = $dbh->selectrow_array("select count(*) from TableInfo where Type = '2'")
    or die "Can't connect to $data_source: $DBI::errstr";
### Disconnect from Database statements are completed.
$dbh->disconnect;
######### end SQL Commands
print $sth1;
This will successfully print a number which is 189 in this instance.
When I try to use the same code but change the Type value to one that should return a count of 2000, I get the following error:
DBD::ODBC::db selectrow_array failed: [Microsoft][ODBC Driver 11 for SQL Server]Numeric value out of range (SQL-22003) at ./getTotalExpSRP.cgi line 33, <DATA> line 225.
Can't connect to dbi:ODBC:BLTSS_SRP: [Microsoft][ODBC Driver 11 for SQL Server]Numeric value out of range (SQL-22003) at ./getTotalExpSRP.cgi line 33, <DATA> line 225.
I've searched everywhere but I cannot find out why this happens.
From the symptom of the issue I would guess that there is a limit on the size of the results returned, but I cannot find any supporting evidence of this.
I have run a trace on my Microsoft SQL 2005 server and can confirm that the sql statement is being run properly with no errors.
I've viewed my odbc trace log but unfortunately I cannot derive any useful information when comparing a working example to the failed one.
Any help would be appreciated!!
Thanks,
Now that we've seen the trace, I can explain this. DBD::ODBC calls SQLDescribeCol and is told:
DescribeCol column = 1, name = , namelen = 0, type = unknown(0), precision/column size = 10, scale = 0, nullable = 1
display size = 11
Then it calls SQLColAttribute and is told the column size is 4. Since the column type was unknown (why the driver reported that, I'm not sure), DBD::ODBC decides to bind the column as a char(4), and so as soon as the count exceeds 3 digits it overflows.
The versions of DBI and DBD::ODBC being used here are really old, and I suspect the latest versions will cope better with this.
Numeric value out of range is a type conversion error. Is TYPE supposed to be a number or a character/string? If it should be a number, use a number:
my $sth1 = $dbh->selectrow_array(
    "select count(*) from TableInfo where Type=2");   # Type=2, not Type='2'
Or use placeholders and let Perl and the database driver worry about type conversions:
my $sth = $dbh->prepare("select count(*) from TableInfo where Type=?");
$sth->execute(2);
$sth->execute('2'); # same thing
my $st1 = $sth->fetchall_arrayref;
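For completeness, the count can then be read back as a plain scalar (a minimal sketch, reusing $dbh from the question's connect call):
my $sth = $dbh->prepare("select count(*) from TableInfo where Type = ?");
$sth->execute(2);
my ($count) = $sth->fetchrow_array;   # count(*) returns exactly one row with one column
print "$count\n";
$sth->finish;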