I've looked through the documentation in various places to see how to do this, but I haven't had any success. I want to pass the name of a column into a SQL query. I'm using psycopg2, and my most recent attempt was based on this doc page: http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql
Here is my latest attempt, but I get the error IndexError: tuple index out of range, which points to the format() call where I'm passing in the parameter.
def parse_files(cursor):
    for name in column_names:
        cursor.execute(
            sql.SQL(
                "select planet_osm_point.{}, count(*) from planet_osm_point group by planet_osm_point.{}"
            ).format(sql.Identifier(name)))
        for row in cursor:
            print(str(row[0]) + str(row[1]))
The documentation doesn't make this clear, but each empty {} consumes its own positional argument, so two {} placeholders with a single argument raise the IndexError. To reuse one argument in several places, put its index inside the braces. In this case it's {0}:
column_names = ['col1', 'col2']
for column in column_names:
    # Identifier('pop.' + column) would be quoted as the single name "pop.col1",
    # so the column is passed unqualified here.
    query = sql.SQL('''
        select {0}, count(*)
        from planet_osm_point pop
        group by {0}
    ''').format(sql.Identifier(column))
    cursor.execute(query)
    for row in cursor.fetchall():
        print(str(row[0]) + str(row[1]))
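The placeholder behaviour can be reproduced without a database. This is a minimal sketch that uses plain str.format as a stand-in, on the assumption that sql.SQL.format follows the same field-numbering rules (the psycopg2 documentation describes it as similar to str.format). The quoted column name is written by hand for illustration only; real queries should keep using sql.Identifier.

```python
template = "select {}, count(*) from planet_osm_point group by {}"

# Two auto-numbered fields but only one argument: the second {} has nothing to consume.
try:
    template.format("col1")
    error = None
except IndexError as exc:
    error = exc
print(repr(error))

# An explicit index lets both fields reuse argument 0.
numbered = "select {0}, count(*) from planet_osm_point group by {0}"
query = numbered.format('"col1"')
print(query)
```

The same rule explains why the original attempt would also have worked with two arguments, one per {}.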
I want to run a select command against a PostgreSQL database.
I use the following command to select.
But each value in the output is wrapped in parentheses.
What should I use so that there are no parentheses in the output?
cursor = connection.cursor()
p = " select price from mobile "
cursor.execute(p)
result = cursor.fetchall()
print(result)
[(1300.0,), (1100.0,), (1200.0,), (1100.0,), (1200.0,), (1500.0,)]
Take the first item from each row, and store those items in a new list. (A list of values, rather than a list of rows.)
result_1d = [row[0] for row in result]
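A small self-contained sketch of the same idea, with the rows hard-coded in the shape that fetchall() printed above, so no database is needed. The second form, using tuple unpacking, is an optional variant and not part of the original answer.

```python
# Rows shaped like the output of cursor.fetchall() above (hard-coded here).
result = [(1300.0,), (1100.0,), (1200.0,), (1100.0,), (1200.0,), (1500.0,)]

# Each row is a tuple with one element per selected column.
result_1d = [row[0] for row in result]
print(result_1d)

# Tuple unpacking does the same and fails loudly if a row has more than one column.
prices = [price for (price,) in result]
print(prices)
```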
I have a query like this:
SELECT prodId, prod_name , prod_type FROM mytable WHERE prod_type in (:list_prod_names)
I want to get the information for a product depending on its type. The possible types are "day", "week", "weekend", and "month". Depending on the date, the list may contain just one of those options or any combination of them.
This list is returned by the function prod_names(date_search).
I am using cx_Oracle bindings with code like:
def get_prod_by_type(search_date: datetime):
    query_path = r'./queries/prod_by_name.sql'
    raw_query = open(query_path).read().strip().replace('\n', ' ').replace('\t', ' ').replace(' ', ' ')
    print(raw_query)
    # Depending on the date the product types may be different
    prod_names(search_date)  # This returns a list with possible names
    qry_params = {"list_prod_names": prod_names}  # See attempts below
    try:
        db = DB(username='username', password='pss', hostname="localhost")
        df = db.get(raw_query, qry_params)
    except Exception:
        exception_error = traceback.format_exc()
        exception_error = 'Exception on DB.get_short_cov_op2() : %s' % exception_error
        print(exception_error)
    return df
For this: qry_params = {"list_prod_names": prod_names} I have tried multiple different things such as:
prod_names = ''.join(prod_names)
prod_names = str(prod_names)
prod_names = " \'" + ''.join(prod_names) + "\'"
The only way I have managed to get it to work is:
new_query = raw_query.format(list_prod_names=prodnames_for_date(search_date)).replace('[', '').replace(']','')
df = db.query(new_query)
I am trying not to use .format() because formatting values into an SQL string is bad practice and leaves the query open to injection attacks.
db.py contains among other functions:
def get(self, sql, params={}):
    cur = self.con.cursor()
    cur.prepare(sql)
    try:
        cur.execute(sql, **params)
        df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])
    except Exception:
        exception_error = traceback.format_exc()
        exception_error = 'Exception on DB.get() : %s' % exception_error
        print(exception_error)
        self.con.rollback()
    cur.close()
    df.columns = df.columns.map(lambda x: x.upper())
    return df
I would like to be able to do a type binding.
I am using:
python = 3.6
cx_oracle = 6.3.1
I have read the following articles, but I am still unable to find a solution:
Python cx_Oracle bind variables
Python cx_Oracle SQL with bind string variable
Search for name in cx_Oracle
Unfortunately you cannot bind an array directly unless you convert it to a SQL type and use a subquery -- which is fairly complex. So instead you need to do something like this:
inClauseParts = []
for i, inValue in enumerate(ARRAY_VALUE):
    argName = "arg_" + str(i + 1)
    inClauseParts.append(":" + argName)
clause = "%s in (%s)" % (columnName, ",".join(inClauseParts))
This works fine, but be aware that if the number of elements in the array changes regularly, this technique creates a separate statement that must be parsed for each distinct number of elements. If you know that (in general) you won't have more than (for example) 10 elements in the array, it would be better to append None to the incoming array so that the number of elements is always 10.
Hopefully that is clear enough!
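To make the two points above concrete, here is a minimal sketch that builds both the clause and the matching dictionary of bind values, with optional padding. The helper name build_in_clause and its pad_to parameter are made up for this illustration; the column and table names are taken from the question. Nothing here touches a database, so the final execute call appears only as a comment.

```python
def build_in_clause(column_name, values, pad_to=None):
    """Return (clause, params) for an IN list with one named bind per value."""
    values = list(values)
    if pad_to is not None and len(values) < pad_to:
        # Pad with None so the statement text is the same for any list length
        # up to pad_to. A NULL entry in an IN list never matches a row.
        values += [None] * (pad_to - len(values))
    names = ["arg_%d" % (i + 1) for i in range(len(values))]
    clause = "%s in (%s)" % (column_name, ",".join(":" + n for n in names))
    params = dict(zip(names, values))
    return clause, params


clause, params = build_in_clause("prod_type", ["day", "week"], pad_to=4)
query = "SELECT prodId, prod_name, prod_type FROM mytable WHERE " + clause
print(query)
print(params)
# With a real cursor this would be run as: cur.execute(query, params)
```

Only the bind names are formatted into the SQL text; the values themselves still travel as bind variables. Padding with None is safe for IN but not for NOT IN, where a NULL in the list makes the condition match nothing.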
I have finally managed to do it. It might not be pretty, but it works.
I have modified my SQL query to include an extra subselect that splits my comma-separated list of items into one row per item:
inner join (
SELECT regexp_substr(:my_list_of_items, '[^,]+', 1, LEVEL) as mylist
FROM dual
CONNECT BY LEVEL <= length(:my_list_of_items) - length(REPLACE(:my_list_of_items, ',', '')) + 1
) d
on d.mylist= a.corresponding_columns
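On the Python side, this approach needs the list as a single comma-separated string bound to :my_list_of_items. A minimal sketch, assuming the items never contain a comma themselves; the helper name to_csv_bind is hypothetical. The last lines repeat the row-count arithmetic from the CONNECT BY condition in plain Python.

```python
def to_csv_bind(items):
    """Join items into one comma-separated string for a single bind variable."""
    items = [str(item) for item in items]
    for item in items:
        if "," in item:
            # The SQL splits on commas, so a comma inside a value would split it.
            raise ValueError("item contains a comma: %r" % item)
    return ",".join(items)


my_list_of_items = to_csv_bind(["day", "week", "weekend"])
params = {"my_list_of_items": my_list_of_items}

# Same arithmetic as the CONNECT BY bound: number of commas + 1 = number of items.
n_rows = len(my_list_of_items) - len(my_list_of_items.replace(",", "")) + 1
print(params, n_rows)
```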
I am wondering which approach to take for the action described in the title. I am using an ODBC connection, and the first SQL query returns about 40-50 rows in one column. I want to use this output as the values to search for in a second query.
How should I treat this? As an array or as separate variables? I still do not know R well, so I just need to know where to look.
Regards
------more explanation below----
I have list of 40-50 numbers of 10 digits each, organized in a column.
I am trying to do this:
list <- c(my_input)
sql_in <- paste0(list, collapse="")
and the characters are organized like this after these operations:
'c(1234567890, , 1234567890, 1234567890)'
Almost everything looks fine and fits into my query, apart from the extra c character at the beginning and the missing apostrophes. I tried to use the gsub function, but it did not work the way I wanted.
You can likely do this in one SQL call using a subquery. Notice in the call below that the result of
SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4)
is passed to the WHERE clause of the primary query. This is perfectly valid and will allow your query to execute entirely in SQL without having to do any intermediate steps in R.
(I use sqldf for simplicity of illustration, but this should work through just about any ODBC connection)
library(sqldf)
Gear <- data.frame(n_gear = 1:5)
sqldf(
"SELECT mpg, qsec, gear, wt
FROM mtcars
WHERE gear IN (SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4))"
)
Try something like this:
list<-c("try","this") #The output from your first query
sql_in<-paste0(list, collapse="','")
The Output
paste("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
[1] "select * from table where table.var in ('try','this')"
If you have a space as the first or last character of an element, you can use this code:
list<-c(" first element is a space","try","this","last element is a space ") #The output from your first query
Find space at first or last character
first_space<-substr(list, start = 1, stop = 1)==" "
last_space<-substr(list, start = nchar(list), stop = nchar(list))==" "
Remove spaces
list[first_space]<-substr(list[first_space], start = 2, stop = nchar(list[first_space]))
list[last_space]<-substr(list[last_space], start = 1, stop = nchar(list[last_space])-1)
sql_in<-paste0(list, collapse="','")
Your output
paste0("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
"select * from table where table.var in ('first element is a space','try','this','last element is a space')"
I think you are looking for something like the code below:
data <- dbGetQuery(con, "select column from yourfirsttable")
list <- paste(data$column, collapse="','")
result <- dbGetQuery(con, statement = sprintf("select * from yourresulttable where inv in ('%s')",list))
It's not entirely clear exactly what you're wanting to achieve here. For example, one use case just means you can do it all with a join. But I have cases where I don't know the values for the test without doing some computation. Then I do a separate query having created a query string thus:
> id <- 1:5
> paste0("SELECT * FROM table WHERE ID IN (", paste0(id, collapse = ","), ")")
[1] "SELECT * FROM table WHERE ID IN (1,2,3,4,5)"
I'm trying to retrieve some data from a PostgreSQL database using psycopg2, and either exclude a variable number of rows or exclude none.
The code I have so far is:
def db_query(variables):
    cursor.execute('SELECT * '
                   'FROM database.table '
                   'WHERE id NOT IN (%s)', (variables,))
This does partially work. E.g. If I call:
db_query('593')
It works. The same for any other single value. However, I cannot seem to get it to work when I enter more than one variable, eg:
db_query('593, 595')
I get the error:
psycopg2.DataError: invalid input syntax for integer: "593, 595"
I'm not sure how to enter the query correctly or amend the SQL query. Any help appreciated.
Thanks
Pass a tuple, since psycopg2 adapts it to a record (a parenthesized, comma-separated list), and drop the parentheses around %s:
query = """
select *
from database.table
where id not in %s
"""
var1 = 593
argument = (var1,)
print(cursor.mogrify(query, (argument,)).decode('utf8'))
#cursor.execute(query, (argument,))
Output:
select *
from database.table
where id not in (593)
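For the multi-value and "exclude none" cases from the question, here is a minimal sketch. It assumes the ids arrive as a string such as '593, 595' and that an empty list should return every row. The names parse_ids and RecordingCursor are hypothetical; RecordingCursor only records calls so the example runs without a database, and a real psycopg2 cursor would take its place.

```python
def parse_ids(text):
    """Turn a string such as '593, 595' into a tuple of ints: (593, 595)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def db_query(cursor, ids):
    """Select all rows, excluding the given ids (or none if ids is empty)."""
    if ids:
        # psycopg2 adapts a Python tuple to "(593, 595)", so %s has no parentheses.
        cursor.execute("SELECT * FROM database.table WHERE id NOT IN %s", (tuple(ids),))
    else:
        # "NOT IN ()" is a syntax error in PostgreSQL, so skip the filter entirely.
        cursor.execute("SELECT * FROM database.table")


class RecordingCursor:
    """Hypothetical stand-in for a psycopg2 cursor; it only records the calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


cursor = RecordingCursor()
db_query(cursor, parse_ids("593, 595"))
db_query(cursor, parse_ids(""))
print(cursor.calls)
```

Converting to int before binding also addresses the original DataError, which came from sending the whole string "593, 595" as a single value.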
I have a dataframe in pandas containing the following information
In a for loop over each entry in the TRANSACTION_ID column, I call the following function:
def checkForImages(TransNum):
    """pass function a transaction number and get the string with image found information then store that
    string into the same row in a new column"""
    try:
        cursor.execute('select CAMERA_TYPE from VEHICLE_IMAGE where TRANSACTION_ID=' + str(TransNum))
        result = ''
        for img_type in cursor:
            result = result + img_type[0]
        if result == '':
            result = 'No image available'
        print 'Images found: ' + str(TransNum) + " " + result
        resultSort = result.split()
        resultSort.sort()
        result = ''
        for i in range(len(resultSort)):
            result = result + " " + resultSort[i]
        cursor.close()
        return result
    except Exception as e:
        # print 'Error occured while getting image references: ', e
        pass
This function returns a string which is either 'No image available' or the image information, if found. I have to create a new column in the dataframe populated with this result, so my final dataframe should look like this
My question is: How can I speed up this process? Using a for loop over 100k+ rows is extremely slow and painful. I have looked into functions like dataframe.map and dataframe.apply but haven't been able to get them working. The other options I see are using Cython or multiple threads. In which option should I invest my time? Any help is appreciated.
You query Oracle for each transaction and then additionally aggregate the fetched data for each transaction in a loop, which is very inefficient.
First I would create a "mapping" DataFrame as follows:
transaction_id images
111 No image available
112 FRONT REAR
113 OVERVIEW
this can be done using Oracle's LISTAGG function:
qry = """
select
transaction_id,
NVL(listagg(camera_type, ' ') within group (order by camera_type), 'No image available') as images
from vehicle_image group by transaction_id
"""
# `engine` - is a SQLAlchemy engine connection ...
cam = pd.read_sql(qry, con=engine, index_col=['transaction_id'])
after that we can use Series.map() method:
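# transactions with no rows in vehicle_image are absent from `cam`, so map() yields NaN for them
df['Image_Found'] = df.transaction_id.map(cam.images).fillna('No image available')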
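The same "query once, then look up" idea can be tried without Oracle or pandas. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for Oracle and a plain dict in place of the mapping DataFrame. The sample rows are invented to match the mapping table shown above; the grouping is done in Python rather than with LISTAGG.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for Oracle here; table and column names follow the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vehicle_image (transaction_id INTEGER, camera_type TEXT)")
con.executemany(
    "INSERT INTO vehicle_image VALUES (?, ?)",
    [(112, "REAR"), (112, "FRONT"), (113, "OVERVIEW")],
)

# One query for all transactions instead of one query per row.
camera_types = defaultdict(list)
for transaction_id, camera_type in con.execute(
    "SELECT transaction_id, camera_type FROM vehicle_image"
):
    camera_types[transaction_id].append(camera_type)

# Sort and join once per transaction, mirroring LISTAGG ... ORDER BY camera_type.
images = {tid: " ".join(sorted(types)) for tid, types in camera_types.items()}

# Look up each transaction; ids with no rows get the default text.
transaction_ids = [111, 112, 113]
image_found = [images.get(tid, "No image available") for tid in transaction_ids]
print(image_found)
```

The speedup comes from replacing 100k+ round trips to the database with a single query followed by in-memory lookups.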