I am trying to write a pandas DataFrame to a Postgres database.
Code is as below:
import psycopg2

dbConnection = psycopg2.connect(user="user1", password="user1", host="localhost", port="5432", database="postgres")
dbConnection.set_isolation_level(0)
dbCursor = dbConnection.cursor()
dbCursor.execute("DROP DATABASE IF EXISTS FiguresUSA")
dbCursor.execute("CREATE DATABASE FiguresUSA")
dbCursor.execute("DROP TABLE IF EXISTS FiguresUSAByState")
dbCursor.execute("CREATE TABLE FiguresUSAByState(Index integer PRIMARY KEY, Province_State VARCHAR(50), NumberByState integer)")
for i in data_pandas.index:
    query = """
        INSERT into FiguresUSAByState(column1, column2, column3) values('%s',%s,%i);
    """ % (data_pandas['Index'], data_pandas['Province_State'], data_pandas['NumberByState'])
    dbCursor.execute(query)
When I run this, I get an error which just says: "Index". I know the problem is somewhere in my for loop; is that % notation correct? I am new to Postgres and don't see how that could be correct syntax. I know I can use to_sql, but I am trying out different techniques.
Print out of data_pandas is as below:
One slight possible anomaly is that there is an "index" column in the IDE version. Could this be the problem?
If you use pd.DataFrame.to_sql, you can supply the index_label parameter to use that as a column.
data_pandas.to_sql('FiguresUSAByState', con=dbConnection, index_label='Index')
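Note that, depending on your pandas version, to_sql may expect a SQLAlchemy engine rather than a raw psycopg2 connection. A minimal sketch of that variant (the engine URL below is an assumption built from the credentials in the question):

from sqlalchemy import create_engine

# Assumed connection URL, built from the user/password/host/port/database in the question
engine = create_engine('postgresql+psycopg2://user1:user1@localhost:5432/postgres')
# Let pandas create (or replace) the table and write the index as a column named "Index"
data_pandas.to_sql('FiguresUSAByState', con=engine, index_label='Index', if_exists='replace')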
If you would prefer to stick with the custom SQL and for loop you have, you will need to reset_index first.
for row in data_pandas.reset_index().to_dict('records'):
    query = """
        INSERT into FiguresUSAByState(index, Province_State, NumberByState) values(%i, '%s', %i);
    """ % (row['index'], row['Province_State'], row['NumberByState'])
    dbCursor.execute(query)
Note that the default name for the new column is index, uncapitalized, rather than Index.
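As a side note, you can also let psycopg2 substitute the parameters itself, which avoids the quoting issues with % string formatting entirely; a minimal sketch of that variant (reusing dbCursor and dbConnection from the question) might look like:

insert_sql = "INSERT INTO FiguresUSAByState (index, Province_State, NumberByState) VALUES (%s, %s, %s)"
for row in data_pandas.reset_index().to_dict('records'):
    # psycopg2 quotes and escapes each value itself, so no manual '%s' quoting is needed
    dbCursor.execute(insert_sql, (row['index'], row['Province_State'], row['NumberByState']))
# set_isolation_level(0) put the connection into autocommit mode, so no explicit commit is required here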
In the insert statement:
query = """
INSERT into FiguresUSAByState (column1, column2, column3) values ('%s',%s,%i);
"""% (data_pandas ['Index'], data_pandas ['Province_State'], data_pandas ['NumberByState'])
You have '%s' wrapped in single quotes; I think that is the problem, so remove the quotes.
I've created an 'artist' table in my database with the columns 'artistid' and 'artisttitle'. I also uploaded a CSV that has the same names as headers. I'm using the code below to load the CSV data into the SQL table, but I receive the following error:
---------------------------------------------------------------------------
UndefinedColumn Traceback (most recent call last)
<ipython-input-97-80bd8826bb17> in <module>
10 with connection, connection.cursor() as cursor:
11 for row in album.itertuples(index=False, name=None):
---> 12 cursor.execute(INSERT_SQL,row)
13
14 mediatype = mediatype.where(pd.notnull(mediatype), None)
UndefinedColumn: column "albumid" of relation "album" does not exist
LINE 1: INSERT INTO zp2gz.album (albumid, albumtitle) VALUES (1,'Fo...
^
EDIT---------------------------------
I meant to say albumid and albumtitle! My apologies
Seems like a typo -- you need to use albmid instead of albumid -- maybe fix your models.py and re-migrate.
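If it is not obvious where the mismatch is, you can ask Postgres which columns the table actually has and compare them with your CSV headers; a rough sketch with psycopg2 (the connection details are placeholders) could be:

import psycopg2

# Placeholder connection details -- adjust to your environment
conn = psycopg2.connect(dbname="mydb", user="myuser", password="mypassword", host="localhost")
cur = conn.cursor()
cur.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s",
    ("zp2gz", "album"),
)
print([r[0] for r in cur.fetchall()])  # compare these names against the CSV header names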
I am attempting to export data and write it to a formatted file in Groovy 2.1.6. The query returns null for an entire column included in the query.
null, 0000001,1434368,ACTIVE
null, 0000002,1354447,ACTIVE
null, 0000004,1358538,ACTIVE
Here is the code that I am using in Groovy to query and write the data to a file.
private void profilerSql() {
    def today = new Date()
    def formattedDate = today.format('yyyyMMdd')

    String reportSql
    reportSql = """
        SELECT
            col_1,
            col_2,
            col_3,
            col_4
        from my_table
    """
    sql.execute(reportSql)

    def filename = "My_Table_export_" + formattedDate + ".csv"
    // Create the file object
    File outputFile = new File(filename);
    // Write a blank line to it to create a new "empty" file
    outputFile.write("");

    // Iterate through the SQL recordset. Output settings are defined within the function.
    sql.eachRow(reportSql) {
        // Create each line, joining the columns with a comma.
        def reportLine = [it.col_1, it.col_2, it.col_3, it.col_4].join(',')
        // Write the line to the file. End with a new line char.
        outputFile.append(reportLine + System.getProperty("line.separator"))
    }
}
Possibly relevant information: the column that returns null values was created from a sequence in Oracle 11g. If anyone can provide some insight, even into how Groovy interacts with different data types in Oracle databases, I would be grateful.
I see a couple of questionable things about the code, but none of them relate to getting a sequence column out of Oracle -- I wouldn't expect that to be much of a problem, since JDBC has been around for years and years.
I don't think you need the initial call to sql.execute(reportSql) -- execute returns a boolean rather than a result set.
Shouldn't the first parameter to outputFile.append be reportLine and not lineFormat?
Hope this helps!
I have a list and want to pass it through a Django raw SQL query.
Here is my list
region = ['US','CA','UK']
I am pasting a part of the raw SQL here.
results = MMCode.objects.raw('select assigner, assignee from mm_code where date between %s and %s and country_code in %s',[fromdate,todate,region])
Now it gives the error below when I execute it in the Django Python shell:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 1412, in __iter__
query = iter(self.query)
File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 73, in __iter__
self._execute_query()
File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 87, in _execute_query
self.cursor.execute(self.sql, self.params)
File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 15, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.6/dist-packages/django/db/backends/mysql/base.py", line 86, in execute
return self.cursor.execute(query, args)
File "/usr/lib/pymodules/python2.6/MySQLdb/cursors.py", line 166, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
DatabaseError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1")
I have also tried passing a tuple, but that did not help. Can someone help me?
Thanks
Vikram
For PostgreSQL at least, a list/tuple parameter is converted into an array in SQL, e.g.
ARRAY['US', 'CA', 'UK']
When this is inserted into the given query, it results in invalid SQL -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN '2014-02-01' AND '2014-02-05'
AND country_code IN ARRAY['US', 'CA', 'UK']
However, the 'in' clause in SQL is logically equivalent to -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN %s AND %s
AND country_code = ANY(%s)
... and when this query is filled with the parameters, the resulting SQL is valid and works -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN '2014-02-01' AND '2014-02-05'
AND country_code = ANY(ARRAY['US', 'CA', 'UK'])
I'm not sure if this works in the other databases though, and whether or not this changes how the query is planned.
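In Django terms, the ANY(%s) form applied to the query from the question would look roughly like this (assuming a PostgreSQL backend; note that raw() also expects the primary key column in the select list):

results = MMCode.objects.raw(
    'select id, assigner, assignee from mm_code '
    'where date between %s and %s and country_code = ANY(%s)',
    [fromdate, todate, region],
)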
Casting the list to a tuple does work in Postgres, although the same code fails under sqlite3 with DatabaseError: near "?": syntax error, so it seems this is backend-specific. Your line of code would become:
results = MMCode.objects.raw('select assigner, assignee from mm_code where date between %s and %s and country_code in %s',[fromdate,todate,tuple(region)])
I tested this on a clean Django 1.5.1 project with the following in bar/models.py:
from django.db import models

class MMCode(models.Model):
    assigner = models.CharField(max_length=100)
    assignee = models.CharField(max_length=100)
    date = models.DateField()
    country_code = models.CharField(max_length=2)
then at the shell:
>>> from datetime import date
>>> from bar.models import MMCode
>>>
>>> regions = ['US', 'CA', 'UK']
>>> fromdate = date.today()
>>> todate = date.today()
>>>
>>> results = MMCode.objects.raw('select id, assigner, assignee from bar_mmcode where date between %s and %s and country_code in %s',[fromdate,todate,tuple(regions)])
>>> list(results)
[]
(note that the query line is changed slightly here, to use the default table name created by Django, and to include the id column in the output so that the ORM doesn't complain)
This is not a great solution, because you must make sure your "region" values are correctly escaped for SQL. However, this is the only thing I could get to work with Sqlite:
sql = ('select assigner, assignee from mm_code '
       'where date between %%s and %%s and country_code in %s' % (tuple(region),))
results = MMCode.objects.raw(sql, [fromdate, todate])
I ran into exactly this problem today. Django has changed (we now have RawSQL() and friends!), but the general solution is still the same.
According to https://stackoverflow.com/a/283801/532513 the general idea is to explicitly add the same number of placeholders to your SQL string as there are elements in your region array.
Your code would then look like this:
sql = 'select assigner, assignee from mm_code where date between %s and %s and country_code in ({0})'\
    .format(','.join(['%s'] * len(region)))
results = MMCode.objects.raw(sql, [fromdate, todate] + region)
Your sql string would then first become ... between %s and %s and country_code in (%s, %s, %s) ... and your params would be effectively [fromdate, todate, 'US', 'CA', 'UK']. This way, you allow the database backend to correctly escape and potentially encode each of the country codes.
Well, I'm not against raw SQL, but you can use:
MMCode.objects.filter(country_code__in=region, date__range=[fromdate, todate])
Hope this helps.
I use Jython 2.5.0 and mysql-connector-java-5.0.8-bin.jar, because my server is MySQL 5.0.38.
There is a problem in my Jython application.
Two functions:
def act_query(query):
    connection = conn  # conn is global and has been assigned when called
    cursor = connection.cursor()
    num_affected_rows = cursor.execute(query)
    cursor.close()
    connection.commit()
    return num_affected_rows

def get_row(query):
    connection = conn
    cursor = connection.cursor()
    cursor.execute(query)
    row = cursor.fetchone()
    cursor.close()
    return row
Then I do something like this:
query = """SELECT count(*) from test_db.test_table1 into #max"""
act_query(query)
get_query= """ select #max """
row = get_row(get_query)
print row
The output is something like: array('b', [49, 52, 54, 50, 56, 48, 52])
I tried to find the reason and made a test like this:
test_query = """select count(*) from test_db.test_table1"""
row = get_row(test_query)
print row
The output is the right answer
In fact, this was originally a CPython script using MySQLdb, and I converted it to Jython with zxJDBC.
It worked well in CPython, but it doesn't work now.
In CPython the cursor can be defined as
cursor = connection.cursor(MySQLdb.cursors.DictCursor)
but in zxJDBC there seems to be no such option when creating a cursor.
Is this the reason?
Please show me the way. Thanks!
I got the answer from the Jython mailing list.
A session variable fetched through JDBC is of VARBINARY type, and zxJDBC exposes it as a byte array. The right output can be obtained by converting the array back to a string (and then to an int), e.g. with str(byte[]).
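For completeness, a rough sketch of doing that conversion by hand in the script above (assuming the array('b', ...) value from the question, whose elements are the ASCII codes of the count's digits):

row = get_row(""" select @max """)
raw_value = row[0]                                      # e.g. array('b', [49, 52, 54, 50, 56, 48, 52])
as_string = "".join(chr(b & 0xff) for b in raw_value)   # -> "1462804"
count = int(as_string)
print count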