I am trying to write a pandas DataFrame to a Postgres database.
Code is as below:
dbConnection = psycopg2.connect(user = "user1", password = "user1", host = "localhost", port = "5432", database = "postgres")
dbConnection.set_isolation_level(0)
dbCursor = dbConnection.cursor()
dbCursor.execute("DROP DATABASE IF EXISTS FiguresUSA")
dbCursor.execute("CREATE DATABASE FiguresUSA")
dbCursor.execute("DROP TABLE IF EXISTS FiguresUSAByState")
dbCursor.execute("CREATE TABLE FiguresUSAByState(Index integer PRIMARY KEY, Province_State VARCHAR(50), NumberByState integer)");
for i in data_pandas.index:
query = """
INSERT into FiguresUSAByState(column1, column2, column3) values('%s',%s,%i);
""" % (data_pandas['Index'], data_pandas['Province_State'], data_pandas['NumberByState'])
dbCursor.execute(query)
When I run this, I get an error which just says : "Index". I know its somewhere in my for loop is the problem, is that % notation correct? I am new to Postgres and don't see how that could be correct syntax. I know I can use to_sql but I am trying to use different techniques.
Print out of data_pandas is as below:
One slight possible anomaly is that there an "index" in the IDE version. Could this be the problem?
If you use pd.DataFrame.to_sql, you can supply the index_label parameter to use that as a column.
data_pandas.to_sql('FiguresUSAByState', con=dbConnection, index_label='Index')
If you would prefer to stick with the custom SQL and for loop you have, you will need to reset_index first.
for row in data_pandas.reset_index().to_dict('rows'):
query = """
INSERT into FiguresUSAByState(index, Province_State, NumberByState) values(%i, '%s', %i);
""" % (row['index'], row['Province_State'], row['NumberByState'])
Note that the default name for the new column is index, uncapitalized, rather than Index.
In the insert statement:
query = """
INSERT into FiguresUSAByState (column1, column2, column3) values ('%s',%s,%i);
"""% (data_pandas ['Index'], data_pandas ['Province_State'], data_pandas ['NumberByState'])
You have a '%s', I think that is the problem. So remove the quotes
Related
I have a dataframe that contains 391 columns and a number of rows. I am trying to push this to a database via pyodbc and using the following command:
cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany(
f"INSERT INTO db.tble({', '.join(df.columns.tolist())}) VALUES ({('?,' * len(df.columns))[:-1]})",
list(df.itertuples(index=False, name=None))
)
cursor.commit()
I would have thought this method would be dynamic for a dataframe of any size yet I get the following error:
ProgrammingError: ('Expected 0 parameters, supplied 391', 'HY000')
I am struggling to understand this as the syntax looks correct, ? has been used instead of %s like other answers. Can someone please help.
Thanks
I once wrote a piece of code, where I wanted to create the insert statement dynamically based on number of columns in the data frame:
here is how the insert query would be passed to the database:
INSERT INTO dbo.Table (column1,columns2,column3) VALUES (?,?,?)
and again, the number of columns and values '?' would be required to be created dynamically at runtime based upon the number of columns the data frame had
I wrote the below piece to just write a string (of ?,?,?) and concatenate it with the insert query,
here
df is the dataframe,
symbol_counter would hold the number of columns in the dataframe,
sym_string would be the final string i.e. (?,?,?,?...n) based on the number of columns
symbol = ['?']
sym_string = ''
symbol_counter = int(df.shape[1])-1
word = 0
for word in range(symbol_counter):
# sym_string += str(symbol)
symbol.insert(word, "?")
word+=1
sym_string = (','.join(symbol))
#and then use this variable and concatenate it with the rest of the query as shown below
query = Variable_holding_first_partofthequery + " VALUES (" +sym_string+")"
I know, it's the big way, but that's how I got it to work. Good Luck!
Say I have a class variable restemail which stores the email id I need to use to sort out from the select statement in SQLite (Python). Whenever I refer to that variable after my WHERE clause, SQLite treats it as a column and returns an error saying that such a column doesn't exist. Something like this:
restemail=StringVar()
Password=StringVar()
def database(self):
conn = sqlite3.connect('data.db')
with conn:
cursor=conn.cursor()
strrest = self.restemail
cursor.execute('SELECT * FROM Restaurant3 WHERE restemail = strrest')
Can someone tell me how to use a variable inside my SQL queries without it being treated as a column name?
Any help will be appreciated.
Try the sqlite3 variable substitution syntax:
cursor.execute('SELECT * FROM Restaurant3 WHERE restemail = ?', (strrest,))
I'm trying to insert data into a pre-existing PostgreSQL table using RPostgreSQL and I can't figure out the syntax for SQL parameters (prepared statements).
E.g. suppose I want to do the following
insert into mytable (a,b,c) values ($1,$2,$3)
How do I specify the parameters? dbSendQuery doesn't seem to understand if you just put the parameters in the ....
I've found dbWriteTable can be used to dump an entire table, but won't let you specify the columns (so no good for defaults etc.). And anyway, I'll need to know this for other queries once I get the data in there (so I suppose this isn't really insert specific)!
Sure I'm just missing something obvious...
I was looking for the same thing, for the same reasons, which is security.
Apparently dplyr package has the capacity that you are interested in. It's barely documented, but it's there. Scroll down to "Postgresql" in this vignette: http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html
To summarize, dplyr offers functions sql() and escape(), which can be combined to produce a parametrized query. SQL() function from DBI package seems to work in exactly same way.
> sql(paste0('SELECT * FROM blaah WHERE id = ', escape('random "\'stuff')))
<SQL> SELECT * FROM blaah WHERE id = 'random "''stuff'
It returns an object of classes "sql" and "character", so you can either pass it on to tbl() or possibly dbSendQuery() as well.
The escape() function correctly handles vectors as well, which I find most useful:
> sql(paste0('SELECT * FROM blaah WHERE id in ', escape(1:5)))
<SQL> SELECT * FROM blaah WHERE id in (1, 2, 3, 4, 5)
Same naturally works with variables as well:
> tmp <- c("asd", 2, date())
> sql(paste0('SELECT * FROM blaah WHERE id in ', escape(tmp)))
<SQL> SELECT * FROM blaah WHERE id in ('asd', '2', 'Tue Nov 18 15:19:08 2014')
I feel much safer now putting together queries.
As of the latest RPostgreSQL it should work:
db_connection <- dbConnect(dbDriver("PostgreSQL"), dbname = database_name,
host = "localhost", port = database_port, password=database_user_password,
user = database_user)
qry = "insert into mytable (a,b,c) values ($1,$2,$3)"
dbSendQuery(db_connection, qry, c(1, "some string", "some string with | ' "))
Here's a version using the DBI and RPostgres packages, and inserting multiple rows at once, since all these years later it's still very difficult to figure out from the documentation.
x <- data.frame(
a = c(1:10),
b = letters[1:10],
c = letters[11:20]
)
# insert your own connection info
con <- DBI::dbConnect(
RPostgres::Postgres(),
dbname = '',
host = '',
port = 5432,
user = '',
password = ''
)
RPostgres::dbSendQuery(
con,
"INSERT INTO mytable (a,b,c) VALUES ($1,$2,$3);",
list(
x$a,
x$b,
x$c
)
)
The help for dbBind() in the DBI package is the only place that explains how to format parameters:
The placeholder format is currently not specified by DBI; in the
future, a uniform placeholder syntax may be supported. Consult the
backend documentation for the supported formats.... Known examples are:
? (positional matching in order of appearance) in RMySQL and RSQLite
$1 (positional matching by index) in RPostgres and RSQLite
:name and $name (named matching) in RSQLite
? is also the placeholder for R package RJDBC.
I have a data frame in R having 3 columns, using sqlSave I can easily create a table in an SQL database:
channel <- odbcConnect("JWPMICOMP")
sqlSave(channel, dbdata, tablename = "ManagerNav", rownames = FALSE, append = TRUE, varTypes = c(DateNav = "datetime"))
odbcClose(channel)
This data frame contains information about Managers (Name, Nav and Date) which are updatede every day with new values for the current date and maybe old values could be updated too in case of errors.
How can I accomplish this task in R?
I treid to use sqlUpdate but it returns me the following error:
> sqlUpdate(channel, dbdata, tablename = "ManagerNav")
Error in sqlUpdate(channel, dbdata, tablename = "ManagerNav") :
cannot update ‘ManagerNav’ without unique column
When you create a table "the white shark-way" (see documentation), it does not get a primary index, but is just plain columns, and often of the wrong type. Usually, I use your approach to get the columns names right, but after that you should go into your database and assign a primary index, correct column widths and types.
After that, sqlUpdate() might work; I say might, because I have given up using sqlUpdate(), there are too many caveats, and use sqlQuery(..., paste("Update....))) for the real work.
What I would do for this is the following
Solution 1
sqlUpdate(channel, dbdata,tablename="ManagerNav", index=c("ManagerNav"))
Solution 2
Lcolumns <- list(dbdata[0,])
sqlUpdate(channel, dbdata,tablename="ManagerNav", index=c(Lcolumns))
Index is used to specify what columns R is going to update.
Hope this helps!
If none of the other solutions work and your data is not that big, I'd suggest using sqlQuery() and loop through your dataframe.
one_row_of_your_df <- function(i) {
sql_query <-
paste0("INSERT INTO your_table_name (column_name1, column_name2, column_name3) VALUES",
"(",
"'",your_dataframe[i,1],",",
"'",your_dataframe[i,2],"'",",",
"'",your_dataframe[i,3],"'",",",
")"
)
return(sql_query)
}
This function is Exasol specific, it is pretty similar to MySQL, but not identical, so small changes could be necessary.
Then use a simple for loop like this one:
for(i in 1:nrow(your_dataframe))
{
sqlQuery(your_connection, one_row_of_your_df(i))
}
I want to encrypt particular data I wish in the tables..
For example,
I need something like this:
update mytable set column1 = encrypt(column1, "key") where condition;
and this:
select decrypt(column1, "key") from mytable where condition;
Is threre any simple in-built SQL function in SQLite to accomplish
this?
I have a Java function for encrypt() and decrypt(), I need to bulk encrypt the table column and it will be too slow if I read the column, apply the function and then write back. Please advise.
Thanks in advance.
SQLite has possibility to supply your own functions - create_function sqlite function - that you can than use in the SQL statement.
So I would look for create_function in java, e.g:
http://ppewww.physics.gla.ac.uk/~tdoherty/sqlite/javasqlite-20050608/doc/SQLite/Database.html
or here is even an example
http://www.daniweb.com/software-development/java/threads/221260
For Example, in python it looks like this (example taken from http://docs.python.org/library/sqlite3.html Connection.create_function)
import sqlite3
import md5
def md5sum(t):
return md5.md5(t).hexdigest()
con = sqlite3.connect(":memory:")
con.create_function("md5", 1, md5sum)
cur = con.cursor()
cur.execute("select md5(?)", ("foo",))
print cur.fetchone()[0]
Or: http://www.sqlite.org/c3ref/create_function.html