PostgreSQL ON CONFLICT insert error "column EXCLUDED.column_name does not exist" with psycopg2.sql placeholders

I'm using psycopg2 with psycopg2.sql.
import psycopg2
from psycopg2 import sql
I rewrote some static SQL code to be more dynamic by using sql.Placeholder and sql.Identifier.
However, even when there is no conflict, I get an error:
Error inserting into table: column "EXCLUDED.domain_name" does not exist
My SQL query looks like this:
query = sql.SQL("insert into dns ({}) values ({}) "
                "ON CONFLICT ({}) "
                "DO UPDATE SET ({}) = ({}) "
                "WHERE {} >= ({});").format(
    sql.SQL(', ').join(map(sql.Identifier, dns_cols)),
    sql.SQL(', ').join(sql.Placeholder() * len(dns_vals)),
    sql.SQL(', ').join(map(sql.Identifier, conflict_cols)),
    sql.SQL(', ').join(map(sql.Identifier, dns_cols)),
    sql.SQL(', ').join(map(sql.Identifier, excluded_names)),
    sql.Identifier('EXCLUDED.updated_date_time'),
    sql.Identifier('dns.updated_date_time')
)
mogrify prints out the following:
b'insert into dns ("domain_name", "tld", "subdomain", "https", "cf_url", "updated_date_time") values (\'example\', \'org\', \'\', false, \'http://example.org\', \'2018-12-06 23:12:00\') ON CONFLICT ("domain_name", "tld", "subdomain") DO UPDATE SET ("domain_name", "tld", "subdomain", "https", "cf_url", "updated_date_time") = ("EXCLUDED.domain_name", "EXCLUDED.tld", "EXCLUDED.subdomain", "EXCLUDED.https", "EXCLUDED.cf_url", "EXCLUDED.updated_date_time") WHERE "EXCLUDED.updated_date_time" >= ("dns.updated_date_time");'
dns_cols, excluded_names, and dns_vals are all lists, and their values appear to show up just fine in the mogrify printout.
I have never needed to create EXCLUDED columns; they are always accessible when ON CONFLICT is triggered.
How do I reference EXCLUDED columns when using psycopg2.sql placeholders?

You are referencing an unqualified column whose name contains a literal period. If you want to quote this, you would have to quote the two parts separately, not together. Also, EXCLUDED has to be in lower case if you do quote it (I am kind of surprised that it works at all when quoted).
"excluded"."domain_name"
The best way to do this is probably to hard code the EXCLUDED into the query string and remove it from the sql.Identifier call:
"DO UPDATE SET ({}) = (EXCLUDED.{}) "
...
sql.Identifier('updated_date_time'),
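Purely for illustration, here is a sketch of how the corrected statement could be composed, using the column lists from the question (assumed values). EXCLUDED and the dns qualifier are written into the SQL text rather than wrapped in sql.Identifier, and because several columns are assigned, the EXCLUDED. prefix is attached to each column on the right-hand side:
import psycopg2
from psycopg2 import sql

# Column lists as they appear in the question (assumed values for illustration)
dns_cols = ["domain_name", "tld", "subdomain", "https", "cf_url", "updated_date_time"]
conflict_cols = ["domain_name", "tld", "subdomain"]
dns_vals = ["example", "org", "", False, "http://example.org", "2018-12-06 23:12:00"]

# EXCLUDED is emitted as plain SQL; only the column names become quoted identifiers
excluded_list = sql.SQL(', ').join(
    sql.SQL("EXCLUDED.") + sql.Identifier(c) for c in dns_cols
)

query = sql.SQL(
    "INSERT INTO dns ({cols}) VALUES ({vals}) "
    "ON CONFLICT ({conflict}) "
    "DO UPDATE SET ({cols}) = ({excluded}) "
    "WHERE EXCLUDED.{ts} >= dns.{ts};"
).format(
    cols=sql.SQL(', ').join(map(sql.Identifier, dns_cols)),
    vals=sql.SQL(', ').join(sql.Placeholder() * len(dns_vals)),
    conflict=sql.SQL(', ').join(map(sql.Identifier, conflict_cols)),
    excluded=excluded_list,
    ts=sql.Identifier("updated_date_time"),
)

# cur.execute(query, dns_vals)  # cur is an open cursor (assumed)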

Related

ProgrammingError: ('Expected 0 parameters, supplied 391', 'HY000') with 391 columns using dynamic approach

I have a dataframe that contains 391 columns and a number of rows. I am trying to push this to a database via pyodbc and using the following command:
cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany(
    f"INSERT INTO db.tble({', '.join(df.columns.tolist())}) VALUES ({('?,' * len(df.columns))[:-1]})",
    list(df.itertuples(index=False, name=None))
)
cursor.commit()
I would have thought this method would be dynamic for a dataframe of any size, yet I get the following error:
ProgrammingError: ('Expected 0 parameters, supplied 391', 'HY000')
I am struggling to understand this, as the syntax looks correct and ? has been used instead of %s as in other answers. Can someone please help.
Thanks
I once wrote a piece of code where I wanted to create the insert statement dynamically based on the number of columns in the data frame.
Here is how the insert query would be passed to the database:
INSERT INTO dbo.Table (column1, column2, column3) VALUES (?,?,?)
The column list and the '?' placeholders both need to be created dynamically at runtime, based on the number of columns the data frame has.
I wrote the piece below to build a string of ?,?,? and concatenate it with the insert query. Here:
df is the dataframe,
symbol_counter holds the number of columns in the dataframe,
sym_string is the final string, i.e. ?,?,?,...,? based on the number of columns.
symbol = ['?']
sym_string = ''
symbol_counter = int(df.shape[1]) - 1
for word in range(symbol_counter):
    symbol.insert(word, "?")      # add one more "?" for each remaining column
sym_string = ','.join(symbol)     # e.g. "?,?,?,?" for four columns
# then concatenate this variable with the rest of the query as shown below
query = Variable_holding_first_partofthequery + " VALUES (" + sym_string + ")"
I know it's the long way round, but that's how I got it to work. Good luck!
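For what it's worth, the same placeholder string can also be built in one line (a sketch, assuming df is the pandas DataFrame from the question and db.tble is the target table):
sym_string = ",".join(["?"] * df.shape[1])      # "?,?,...,?" with one ? per column
query = f"INSERT INTO db.tble ({', '.join(df.columns)}) VALUES ({sym_string})"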

how to filter double quotes when creating SQL file with read.csv.sql, R

After much searching and experimenting, I simply cannot get this to work. I have a csv whose fields are all within quotation marks and I cannot import them. My basic script is the following; it works fine if there are no quotes. I would expect it to print out just the first three lines. I am thinking it could be the filter argument that I need to add?
library(sqldf)
sqldf("attach testingdb as new")
read.csv.sql("df.csv", sql = "create table main as select * from file", dbname = "testingdb")
sqldf("select * from main limit 3", dbname = "testingdb")
This is what the first two lines of the csv look like:
"{6DA08449}","89000","2002-05-10 00:00","MK42Q","S","N","F","38",""
"{6DB08449}","100000","2002-05-10 00:00","M429HQ","S","N","F","38",""

PYTHON - Using double quotes in SQL constant

I have a SQL query entered into a constant. One of the fields that I need to put in my WHERE clause is USER, which is a keyword. To run the query I put the keyword in double quotes.
I have tried all of the suggestions from here yet none seem to be working.
Here is what I have for my constant:
SELECT_USER_SECURITY = "SELECT * FROM USER_SECURITY_TRANSLATED WHERE \"USER\" = '{user}' and COMPANY = " \
                       "'company_number' and TYPE NOT IN (1, 4)"
I am not sure how to get this query to work from my constant.
I also tried wrapping the whole query in triple quotes ("""). I am getting a KeyError on USER.
SELECT_USER_SECURITY = """SELECT * FROM USER_SECURITY_TRANSLATED WHERE "USER" = '{user}' and
COMPANY = 'company_number' and TYPE NOT IN (1, 4)"""
Below is the error I am getting:
nose.proxy.KeyError: 'user'
So the triple-quoted solution was the best one. The problem I was running into was that I had not included the "user" key in the dictionary of params used to format the query.
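For reference, a minimal sketch of what that fix looks like, with a hypothetical value for the missing key:
SELECT_USER_SECURITY = """SELECT * FROM USER_SECURITY_TRANSLATED WHERE "USER" = '{user}' and
COMPANY = 'company_number' and TYPE NOT IN (1, 4)"""

params = {"user": "jdoe"}                      # hypothetical value; omitting the "user" key is what raised the KeyError
query = SELECT_USER_SECURITY.format(**params)  # the double-quoted "USER" identifier passes through untouched
Note that str.format() does not escape the interpolated value, so a bound query parameter is generally safer if the user name comes from outside.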

update an SQL table via R sqlSave

I have a data frame in R with 3 columns; using sqlSave I can easily create a table in an SQL database:
channel <- odbcConnect("JWPMICOMP")
sqlSave(channel, dbdata, tablename = "ManagerNav", rownames = FALSE, append = TRUE, varTypes = c(DateNav = "datetime"))
odbcClose(channel)
This data frame contains information about Managers (Name, Nav and Date) which are updated every day with new values for the current date, and old values could be updated too in case of errors.
How can I accomplish this task in R?
I tried to use sqlUpdate but it returns the following error:
> sqlUpdate(channel, dbdata, tablename = "ManagerNav")
Error in sqlUpdate(channel, dbdata, tablename = "ManagerNav") :
cannot update ‘ManagerNav’ without unique column
When you create a table "the white shark-way" (see documentation), it does not get a primary index, but is just plain columns, and often of the wrong type. Usually, I use your approach to get the columns names right, but after that you should go into your database and assign a primary index, correct column widths and types.
After that, sqlUpdate() might work; I say might, because I have given up using sqlUpdate() (there are too many caveats) and use sqlQuery(..., paste("UPDATE ...")) for the real work.
What I would do for this is the following
Solution 1
sqlUpdate(channel, dbdata, tablename = "ManagerNav", index = c("ManagerNav"))
Solution 2
Lcolumns <- list(dbdata[0, ])
sqlUpdate(channel, dbdata, tablename = "ManagerNav", index = c(Lcolumns))
Index is used to specify what columns R is going to update.
Hope this helps!
If none of the other solutions work and your data is not that big, I'd suggest using sqlQuery() and looping through your dataframe.
one_row_of_your_df <- function(i) {
  sql_query <- paste0(
    "INSERT INTO your_table_name (column_name1, column_name2, column_name3) VALUES ",
    "(",
    "'", your_dataframe[i, 1], "', ",
    "'", your_dataframe[i, 2], "', ",
    "'", your_dataframe[i, 3], "'",
    ")"
  )
  return(sql_query)
}
This function is Exasol-specific; Exasol SQL is pretty similar to MySQL, but not identical, so small changes could be necessary.
Then use a simple for loop like this one:
for (i in 1:nrow(your_dataframe)) {
  sqlQuery(your_connection, one_row_of_your_df(i))
}

Rails3: SQL execution with hash substitution like .where()

With a simple model like that
class Model < ActiveRecord::Base
  # ...
end
we can do queries like that
Model.where(["name = :name and updated_at >= :D", \
{ :D => (Date.today - 1.day).to_datetime, :name => "O'Connor" }])
Where the values in the hash will be substituted into the final SQL statement with proper escaping depending on the underlying database engine.
I would like to know of a similar feature for raw SQL execution, something like:
ActiveRecord::Base.connection.execute(
  ["update models set name = :name, hired_at = :D where id = :id;",
   { :id => 73465, :D => DateTime.now, :name => "O'My God" }]
) # THIS CODE IS A FANTASY. NOT WORKING.
(Please do not solve the example with loading a Model object, modifying and then saving! The example is only an illustration for the feature I would like to have / know. Concentrate on the subject!)
The original problem is that I want to insert large amount (many thousand lines) of data into the database. I want to use some features of the SQL abstraction of the ActiveRecord framework but I don't want to use model objects based on ActiveRecord::Base because they are damn slow! (8 queries per second for my current problem.)
query = ActiveRecord::Base.connection.raw_connection.prepare("INSERT INTO users (name) VALUES(:name)")
query.execute(:name => 'test_name')
query.close
Extending @peufeu's solution with a concrete code example for bulk insert:
users_places = []
users_values = []
timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
params[:users].each do |user|
  users_places << "(?,?,?,?)"
  users_values << user[:name] << user[:punch_line] << timestamp << timestamp
end
bulk_insert_users_sql_arr = ["INSERT INTO users (name, punch_line, created_at, updated_at) VALUES #{users_places.join(", ")}"] + users_values
begin
  sql = ActiveRecord::Base.send(:sanitize_sql_array, bulk_insert_users_sql_arr)
  ActiveRecord::Base.connection.execute(sql)
rescue
  "something went wrong with the bulk insert sql query"
end
Here is the reference to the sanitize_sql_array method in ActiveRecord::Base; it generates the proper query string by escaping the single quotes in the strings. For example, the punch_line "Don't let them get you down" will become "Don\'t let them get you down".
Yes, you could do raw SQL, but check out the ar-extensions gem, which helps with batch inserts:
https://github.com/zdennis/ar-extensions
Here's a post on it, and various other techniques:
http://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
For INSERTs, batching them using a long VALUES clause (as shown by Simon's link) is the fastest way (unless you want to generate a text file and load it in your database with MySQL's LOAD DATA INFILE). But you have to be very careful about escaping your text values (which is not done in the example).
I was asking "what database are you using" because it does matter for mass UPDATEs.
For instance, you can do this on postgres (and I believe SQL Server changing "columnX" to "colX" ):
UPDATE foo
SET bar = v.column2
FROM (VALUES (1,2),(3,4),... long list) v
WHERE foo.id = v.column1
And you can update a load of rows using a single statement, very fast.
If you don't need Ruby to perform some Ruby-specific magic on your data, the fastest way to transfer data from one DB to a different one is to export as a text file (CSV or tab separated), load it on the other DB (LOAD DATA INFILE on MySQL), perhaps in a temporary table, and bulk process using SQL.
EDIT: Here's how I do this in Python:
sql = [ "INSERT INTO foo (column list) VALUES " ]
values = []
for tuple in tuple_list:
append "(?,?,?,?)" to sql
extend values list with tuple
Then join sql into a single string, putting commas between the placeholder groups, and you get "INSERT INTO foo (column list) VALUES (?,?,?,?),(?,?,?,?),(?,?,?,?)" with the "(?,?,?,?)" repeated as many times as you have lines to insert.
Then "values" contains a list of (a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3) with an,bn,cn,dn being the tuples you want to insert for line n. Each one corresponds to a placeholder in the sql string.
Then pass this to the usual "execute query with parameters" function which will handle quoting and escaping as usual.
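As a concrete (hypothetical) illustration of that last step with a DB-API driver that uses ? placeholders, assuming the joined statement is in sql_string and conn is an open connection:
cur = conn.cursor()
cur.execute(sql_string, values)   # the driver quotes and escapes each bound value
conn.commit()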
I encountered a similar issue recently when trying to insert 100K+ records into a MySQL database for a Rails 4 app using the mysql2 gem. The data included characters that had to be sanitized prior to insert.
The solution I ended going with was a slightly modified version of Option 3 described at https://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
Here's the relevant code block from the above link:
TIMES = 10000
inserts = []
TIMES.times do
  inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)"
end
sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"
The modification I made was using the public method ActiveRecord::Base.sanitize() on values that required it.
inserts = []
created = Time.now.strftime "%Y-%m-%d %H:%M:%S"
params[:audits].each do |audit|
  inserts.push "(#{audit.user_id}, '#{created}', " + ActiveRecord::Base.sanitize(audit.comment) + ", #{audit.status})"
end
sql = "INSERT INTO user_audits (`user_id`, `created_at`, `comment`, `status`) VALUES #{inserts.join(", ")}"