CSV to SQL Upload - sql

I've created an 'artist' table in my database with the columns 'artistid' and 'artisttitle'. I also uploaded a CSV that has the same names as headers. I'm using the code below to upload the CSV data into the SQL table, but I receive the following error:
---------------------------------------------------------------------------
UndefinedColumn                           Traceback (most recent call last)
<ipython-input-97-80bd8826bb17> in <module>
     10 with connection, connection.cursor() as cursor:
     11     for row in album.itertuples(index=False, name=None):
---> 12         cursor.execute(INSERT_SQL, row)
     13
     14 mediatype = mediatype.where(pd.notnull(mediatype), None)

UndefinedColumn: column "albumid" of relation "album" does not exist
LINE 1: INSERT INTO zp2gz.album (albumid, albumtitle) VALUES (1,'Fo...
                                 ^
EDIT ---------------------------------
I meant to say 'albumid' and 'albumtitle'! My apologies.

Seems like a typo -- the column in your table is presumably named albmid rather than albumid, so you need to use albmid in your INSERT, or better, fix the column name in your models.py and re-migrate so the table and the INSERT match.
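To confirm which column names the table really has before fixing anything, you can ask Postgres directly. A minimal sketch, assuming psycopg2 (as the traceback suggests) and placeholder connection details:

import psycopg2

# Placeholder DSN; substitute real credentials.
conn = psycopg2.connect("host=localhost dbname=test user=postgres password=secret")
with conn, conn.cursor() as cur:
    # List the actual columns of zp2gz.album so the INSERT can match them.
    cur.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s",
        ("zp2gz", "album"),
    )
    print([row[0] for row in cur.fetchall()])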

Related

trying to import csv file to table in sql

I have 4 CSV files, each with 500,000 rows. I am trying to import the CSV data into my Exasol database, but there is an error with the date column, and I have a problem with the unwanted first column in the files.
Here is an example CSV file:
unnamed:0 , time, lat, lon, nobs_cloud_day
0, 2006-03-30, 24.125, -119.375, 22.0
1, 2006-03-30, 24.125, -119.125, 25.0
The table I created to import csv to is
CREATE TABLE cloud_coverage_CONUS (
index_cloud DECIMAL(10,0)
,"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
The command to import is
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv';
But I get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3050: [Column=0 Row=0] [Transformation of value='Unnamed: 0' failed - invalid character value for cast; Value: 'Unnamed: 0'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:59205' FILE 'e12a96a6-a98f-4c0a-963a-e5dad7319fd5' ;'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
Alternatively I use this table (without the first column):
CREATE TABLE cloud_coverage_CONUS (
"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
And use this import code:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'(2 FORMAT='YYYY-MM-DD', 3 .. 5);
But I still get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3052: [Column=0 Row=0] [Transformation of value='time' failed - invalid value for YYYY format token; Value: 'time' Format: 'YYYY-MM-DD'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:60350' FILE '22c64219-cd10-4c35-9e81-018d20146222' (2 FORMAT='YYYY-MM-DD', 3 .. 5);'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
(I actually do want to ignore the first column in the files.)
How can I solve this issue?
Solution:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv' (2 .. 5) ROW SEPARATOR = 'CRLF' COLUMN SEPARATOR = ',' SKIP = 1;
Here (2 .. 5) maps only CSV columns 2 through 5 (ignoring the unwanted first column) and SKIP = 1 skips the header row. I did not realise that MySQL is different from Exasol.
Looking at the first error message, a few things stand out. First we see this:
[Column=0 Row=0]
This tells us the problem is with the very first value in the file. This brings us to the next thing, where the message even tells us what value was read:
Transformation of value='Unnamed: 0' failed
So it's failing to convert Unnamed: 0. You also provided the table definition, where we see the first column in the table is a decimal type.
This makes sense. Unnamed: 0 is not a decimal. For this to work, the CSV data MUST align with the data types for the columns in the table.
But we also see that this looks like a header row. Assuming everything else matches, we can fix it by telling the database to skip this first row. I'm not familiar with Exasol, but according to the documentation I believe the correct code will look like this:
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1;
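If adjusting the IMPORT options ever falls short, another option is to pre-clean the files before loading. A hedged sketch using pandas (the paths are the ones from the question; the _clean suffix is an assumption):

import pandas as pd

# Drop the unnamed index column and the header row before importing;
# paths follow the question and may need adjusting.
for i in range(4):
    path = rf'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part{i}.csv'
    df = pd.read_csv(path, index_col=0)  # first CSV column becomes the index
    df.to_csv(path.replace('.csv', '_clean.csv'), index=False, header=False)

The cleaned files could then be imported without the SKIP and column-list options.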

Passing table name and list of values as argument to psycopg2 query

Context
I would like to pass a table name along with query parameters in a psycopg2 query in a python3 function.
If I understand correctly, I should not format the query string using Python's .format() method prior to the execution of the query, but let psycopg2 do that.
Issue
I can't succeed in passing both the table name and the parameters as arguments to my query.
Code sample
Here is a code sample:
import psycopg2
from psycopg2 import sql
connection_string = "host={} port={} dbname={} user={} password={}".format(*PARAMS.values())
conn = psycopg2.connect(connection_string)
curs = conn.cursor()
table = 'my_customers'
cities = ["Paris", "London", "Madrid"]
data = (table, tuple(cities))
query = sql.SQL("SELECT * FROM {} WHERE city = ANY (%s);")
curs.execute(query, data)
rows = curs.fetchall()
Error(s)
But I get the following error message:
TypeError: not all arguments converted during string formatting
I also tried replacing the data definition with:
data = (sql.Identifier(table), tuple(cities))
But then this error pops:
ProgrammingError: can't adapt type 'Identifier'
If I put ANY {} instead of ANY (%s) in the query string, in both previous cases this error shows:
SyntaxError: syntax error at or near "{"
LINE 1: ...* FROM {} WHERE c...
^
Initially, I didn't use the sql module and was trying to pass the data as the second argument to the curs.execute() method, but then the table name was single-quoted in the command, which caused trouble. So I gave the sql module a try, hoping it's not a deprecated habit.
If possible, I would like to keep the curly braces {} for parameter substitution instead of %s, unless it's a bad idea.
Environment
Ubuntu 18.04 64 bit 5.0.0-37-generic x86_64 GNU/Linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
psycopg2.__version__
'2.8.4 (dt dec pq3 ext lo64)'
You want something like:
table = 'my_customers'
cities = ["Paris", "London", "Madrid"]
query = sql.SQL("SELECT * FROM {} WHERE city = ANY (%s)").format(sql.Identifier(table))
curs.execute(query, (cities,))
rows = curs.fetchall()
The table name is composed into the query as an identifier via sql.SQL(...).format(sql.Identifier(table)), while the list of cities is still passed as an ordinary bound parameter; note that it is wrapped in a one-element tuple.
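For completeness, here is a self-contained version of the same approach; a minimal sketch, assuming placeholder connection details:

import psycopg2
from psycopg2 import sql

# Placeholder DSN; substitute real credentials.
conn = psycopg2.connect("host=localhost dbname=test user=postgres password=secret")

table = 'my_customers'
cities = ["Paris", "London", "Madrid"]

# Identifiers (table/column names) are composed with sql.Identifier so they
# are quoted as identifiers; values stay as bound parameters for the driver.
query = sql.SQL("SELECT * FROM {} WHERE city = ANY (%s)").format(sql.Identifier(table))

with conn, conn.cursor() as curs:
    curs.execute(query, (cities,))
    rows = curs.fetchall()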

Python & SnakeSQL - raise lock.LockError('Lock no longer valid.') ERROR

I am trying to run a Python script (createdb.py) with DB operations from my main Python script (app.py), but I get the error below.
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\web\application.py", line 236, in process
    return self.handle()
  File "C:\Python27\lib\site-packages\web\application.py", line 227, in handle
    return self._delegate(fn, self.fvars, args)
  File "C:\Python27\lib\site-packages\web\application.py", line 409, in _delegate
    return handle_class(cls)
  File "C:\Python27\lib\site-packages\web\application.py", line 384, in handle_class
    return tocall(*args)
  File "D:\Python\virtualenvs\new4\textweb\bin\app.py", line 16, in GET
    createdb.createTables()
  File "D:\Python\virtualenvs\new4\textweb\bin\createdb.py", line 9, in createTables
    cursor.execute("CREATE TABLE table (dateColumn Date, numberColumn Integer)")
  File "D:\Python\virtualenvs\new4\textweb\bin\SnakeSQL\driver\base.py", line 1548, in execute
    self.info = self.connection._create(parsedSQL['table'], parsedSQL['columns'], parameters)
  File "D:\Python\virtualenvs\new4\textweb\bin\SnakeSQL\driver\base.py", line 993, in _create
    self._insertRowInColTypes(table)
  File "D:\Python\virtualenvs\new4\textweb\bin\SnakeSQL\driver\base.py", line 632, in _insertRowInColTypes
    ], types= ['String','String','String','Bool','Bool','Bool','Text','Text','Integer']
  File "D:\Python\virtualenvs\new4\textweb\bin\SnakeSQL\driver\dbm.py", line 61, in _insertRow
    self.tables[table].file[str(primaryKey)] = str(values)
  File "D:\Python\virtualenvs\new4\textweb\bin\SnakeSQL\external\lockdbm.py", line 50, in __setitem__
    raise lock.LockError('Lock no longer valid.')
LockError: Lock no longer valid.
Here is my createdb.py code:
import SnakeSQL

connection = SnakeSQL.connect(database='test', autoCreate=True)
connection = SnakeSQL.connect(database='test')
cursor = connection.cursor()

def createTables():
    cursor.execute("CREATE TABLE table (dateColumn Date, numberColumn Integer)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2003-11-8', 3)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2004-11-8', 4)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2005-11-8', 5)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2006-11-8', 6)")

def select():
    selectResult = cursor.execute("SELECT dateColumn FROM table WHERE numberColumn = 3")
    return selectResult

if __name__ == "__main__":
    createTables()
and here is my app.py code:
import web
import SnakeSQL
import createdb

render = web.template.render('templates/')
connection = SnakeSQL.connect(database='test')
cursor = connection.cursor()

urls = (
    '/', 'index'
)

class index:
    def GET(self):
        createdb.createTables()
        result = createdb.select()
        return render.index(result)

if __name__ == "__main__":
    app = web.application(urls, globals())
    app.run()
I couldn't figure out why I am getting this error. Can you please share your knowledge to help me solve this problem?
First off, the SnakeSQL docs appear to be from 2004, the actual code was last updated in 2009, and the author states that the project is no longer maintained. You may want to consider using something still actively maintained instead.
The docs also mention:
In theory, one of the processes accessing the database could get stuck in an infinite loop and not release the lock on the database to allow other users to access it. After a period of 2 seconds, if the process with the current lock on the database doesn't access it, the lock will be released and another process can obtain a lock. The first process will itself have to wait to obtain a lock.
Looking at your traceback, I'll make an educated guess: since you created the cursor at module level (which, again, you probably don't want to do), it was created when the module was first imported; by the time your program actually ran the createTables function, more than 2 seconds had elapsed, and the lock had been given up.
Try moving the line that creates your cursor inside your functions:
def createTables():
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE table (dateColumn Date, numberColumn Integer)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2003-11-8', 3)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2004-11-8', 4)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2005-11-8', 5)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2006-11-8', 6)")

def select():
    cursor = connection.cursor()
    selectResult = cursor.execute("SELECT dateColumn FROM table WHERE numberColumn = 3")
    return selectResult
(and do the same in your app.py code).
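Going one step further, the connection itself could be created at call time, so the whole lock lifecycle starts only when the work happens. A hedged sketch of createTables along those lines, reusing the question's own connect call:

import SnakeSQL

def createTables():
    # Connection and cursor are created just before use, so SnakeSQL's
    # 2-second lock window opens when the statements actually run.
    connection = SnakeSQL.connect(database='test')
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE table (dateColumn Date, numberColumn Integer)")
    cursor.execute("INSERT INTO table (dateColumn, numberColumn) VALUES ('2003-11-8', 3)")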

Unable to query a view created using an sdo_nn statement

CREATE VIEW bd_nearest_hy AS
SELECT b1.b_name, h1.h_id
FROM building b1, hydrant h1
WHERE sdo_nn(h1.shape, b1.shape, 'sdo_num_res = 1') = 'TRUE';
SELECT bd_nearest_hy.b_name
FROM bd_nearest_hy
WHERE bd_nearest_hy.h_id = 'p30';
I created a view that stores each building name and its corresponding nearest hydrant. The sdo_nn statement works fine and the view is correct.
However, when I select the row containing h_id = 'p30' from the view, the database says:
Error report:
SQL Error: ORA-13249: SDO_NN cannot be evaluated without using index
ORA-06512: at "MDSYS.MD", line 1723
ORA-06512: at "MDSYS.MDERR", line 17
ORA-06512: at "MDSYS.PRVT_IDX", line 9
What's wrong with it?

Passing lists or tuples as arguments in django raw sql

I have a list and want to pass it through Django raw SQL.
Here is my list
region = ['US','CA','UK']
I am pasting part of the raw SQL here.
results = MMCode.objects.raw('select assigner, assignee from mm_code where date between %s and %s and country_code in %s',[fromdate,todate,region])
It gives the error below when I execute it in the Django Python shell:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 1412, in __iter__
    query = iter(self.query)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 73, in __iter__
    self._execute_query()
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 87, in _execute_query
    self.cursor.execute(self.sql, self.params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 15, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib/pymodules/python2.6/MySQLdb/cursors.py", line 166, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
DatabaseError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1")
I have also tried passing a tuple, but it made no difference. Can someone help me?
Thanks,
Vikram
For PostgreSQL at least, a list/tuple parameter is converted into an array in SQL, e.g.
ARRAY['US', 'CA', 'UK']
When this is inserted into the given query, it results in invalid SQL -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN '2014-02-01' AND '2014-02-05'
AND country_code IN ARRAY['US', 'CA', 'UK']
However, the 'in' clause in SQL is logically equivalent to -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN %s AND %s
AND country_code = ANY(%s)
... and when this query is filled with the parameters, the resulting SQL is valid and works -
SELECT assigner, assignee FROM mm_code
WHERE date BETWEEN '2014-02-01' AND '2014-02-05'
AND country_code = ANY(ARRAY['US', 'CA', 'UK'])
I'm not sure if this works in the other databases though, and whether or not this changes how the query is planned.
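Under that assumption (a PostgreSQL backend), the raw query from the question could be rewritten like the sketch below; note that raw() also needs the primary key column in the select list, so id is assumed to be the table's primary key:

results = MMCode.objects.raw(
    'select id, assigner, assignee from mm_code '
    'where date between %s and %s and country_code = ANY(%s)',
    [fromdate, todate, region],
)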
Casting the list to a tuple does work in Postgres, although the same code fails under sqlite3 with DatabaseError: near "?": syntax error, so it seems this is backend-specific. Your line of code would become:
results = MMCode.objects.raw('select assigner, assignee from mm_code where date between %s and %s and country_code in %s',[fromdate,todate,tuple(region)])
I tested this on a clean Django 1.5.1 project with the following in bar/models.py:
from django.db import models

class MMCode(models.Model):
    assigner = models.CharField(max_length=100)
    assignee = models.CharField(max_length=100)
    date = models.DateField()
    country_code = models.CharField(max_length=2)
then at the shell:
>>> from datetime import date
>>> from bar.models import MMCode
>>>
>>> regions = ['US', 'CA', 'UK']
>>> fromdate = date.today()
>>> todate = date.today()
>>>
>>> results = MMCode.objects.raw('select id, assigner, assignee from bar_mmcode where date between %s and %s and country_code in %s',[fromdate,todate,tuple(regions)])
>>> list(results)
[]
(note that the query line is changed slightly here, to use the default table name created by Django, and to include the id column in the output so that the ORM doesn't complain)
This is not a great solution, because you must make sure your "region" values are correctly escaped for SQL. However, this is the only thing I could get to work with SQLite:
sql = ('select assigner, assignee from mm_code '
       'where date between %%s and %%s and country_code in %s' % (tuple(region),))
results = MMCode.objects.raw(sql, [fromdate, todate])
I ran into exactly this problem today. Django has changed (we now have RawSQL() and friends!), but the general solution is still the same.
According to https://stackoverflow.com/a/283801/532513, the general idea is to explicitly add the same number of placeholders to your SQL string as there are elements in your region list.
Your code would then look like this:
sql = ('select assigner, assignee from mm_code '
       'where date between %s and %s and country_code in ({0})'
       .format(','.join(['%s'] * len(region))))
results = MMCode.objects.raw(sql, [fromdate, todate] + region)
Your sql string would then first become ... between %s and %s and country_code in (%s, %s, %s) ... and your params would be effectively [fromdate, todate, 'US', 'CA', 'UK']. This way, you allow the database backend to correctly escape and potentially encode each of the country codes.
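The placeholder expansion itself is easy to verify in a shell, using the region list from the question:
>>> region = ['US', 'CA', 'UK']
>>> ','.join(['%s'] * len(region))
'%s,%s,%s'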
Well, I'm not against raw SQL, but you can use:
MMCode.objects.filter(country_code__in=region, date__range=[fromdate, todate])
Hope this helps.