I am using boto3 library with executeStatement to get data from an RDS cluster using DATA API.
Query is working fine if i select 1 or 2 columns but as soon as I select another column to query, it returns an error with (BadRequestException) permission denied for relation table_name
I have checked using pgadmin the permissions are intact to query the whole db for the user I am using.
function included in call:
def execute_query(self, sql_query, sql_parameters=[]):
"""
Aurora DataAPI execute query. Generally used for select statements.
:param sql_query: Query
:param sql_parameters: parameters in sql query
:return: DataApi response
"""
client = self.api_access()
response = client.execute_statement(
resourceArn=RESOURCE_ARN,
secretArn=SECRET_ARN,
database='db_name',
sql=sql_query,
includeResultMetadata=True,
parameters=sql_parameters)
return response
function call: No errors
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
function call: fails with above error
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a horizontal limit on what we can get from DATA API using Boto3? I know there is a limit for 1MB, but it should return something as per the documentation if it exceeds the limit.
Backend is Postgres RDS
UPDATE:
I can select the same columns 10 times and its not a problem
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didnt know there are column level security in some tables. If there are column level securities in postgres for the user you are using that's obvious I cannot select those columns.
Related
I am using SQLAlchemy by create engine to connect python to snowflake for fetching data. Adding a snippet of code on how I am doing it. Before you suggest to use connector.snowflake, I have tried that and it has query tag but I need to pull queries by thread pool method and couldn't find a way to add query tag.
Have also tried ALTER SESSION SET QUERY_TAG but since the query runs parallelly it doesn't give the query tag.
Code:
vendor_class_query ='select * from table'
query_list1 = [vendor_class_query]
pool = ThreadPool(8)
def query(x):
engine = create_engine(
'snowflake://{user}:{password}#{account}/{database_name}/{schema_name}?\
warehouse={warehouse}&role={role}¶mstyle={paramstyle}'.format(
user=---------,
password=----------,
account=----------,
database_name=----------,
schema_name=----------,
warehouse=----------,
role=----------,
paramstyle='pyformat'
),
poolclass=NullPool
)
try:
connection = engine.connect()
for df in pd.read_sql_query(x, engine, chunksize=1000000000):
df.columns = map(str.upper, df.columns)
return df
finally:
connection.close()
engine.dispose()
return df
results1 = pool.map(query, query_list1)
vendor_class = results1[0]'''
I'm trying to retrieve some data from a postgresql database using psycogp2, and either exclude a variable number of rows or exclude none.
The code I have so far is:
def db_query(variables):
cursor.execute('SELECT * '
'FROM database.table '
'WHERE id NOT IN (%s)', (variables,))
This does partially work. E.g. If I call:
db_query('593')
It works. The same for any other single value. However, I cannot seem to get it to work when I enter more than one variable, eg:
db_query('593, 595')
I get the error:
psycopg2.DataError: invalid input syntax for integer: "593, 595"
I'm not sure how to enter the query correctly or amend the SQL query. Any help appreciated.
Thanks
Pass a tuple as it is adapted to a record:
query = """
select *
from database.table
where id not in %s
"""
var1 = 593
argument = (var1,)
print(cursor.mogrify(query, (argument,)).decode('utf8'))
#cursor.execute(query, (argument,))
Output:
select *
from database.table
where id not in (593)
I have a dataframe in pandas containing the following information
Using a for loop for each entry in the TRANSACTION_ID, I am calling the following function,
def checkForImages(TransNum):
"""pass function a transaction number and get the string with image found information then store that
string into the same row in a new column"""
try:
cursor.execute('select CAMERA_TYPE from VEHICLE_IMAGE where TRANSACTION_ID=' + str(TransNum))
result = ''
for img_type in cursor:
result = result + img_type[0]
if result == '':
result = 'No image available'
print 'Images found: ' + str(TransNum) + " "+ result
resultSort = result.split()
resultSort.sort()
result = ''
for i in range(len(resultSort)):
result = result + " " + resultSort[i]
cursor.close()
return result
except Exception as e:
# print 'Error occured while getting image references: ', e
pass
This function returns a string which is either 'No images available' or has the image information if found. I have to create a new column in the dataframe populated with this result so my final dataframe should look like this
My question is: How can I speed up this process? Using for loop on rows with 100k+ entries is extremely slow and painful. I have looked into functions like dataframe.map and dataframe.apply but haven't been able to get it working. Other options I see is using cython or multiple threads. In which option should I invest my time? Any help is appreciated
You query Oracle for each transaction and then additionally aggregate fetched data for each transaction in a loop - it's very inefficient.
First i would create a "mapping" DataFrame like as follows:
transaction_id images
111 No image available
112 FRONT REAR
113 OVERVIEW
this can be done using Oracle's LISTAGG function:
qry = """
select
transaction_id,
NVL(listagg(camera_type, ' ') within group (order by camera_type), 'No image available') as images
from vehicle_image group by transaction_id
"""
# `engine` - is a SQLAlchemy engine connection ...
cam = pd.read_sql(qry, con=engine, index_col=['transaction_id'])
after that we can use Series.map() method:
df['Image_Found'] = df.transaction_id.map(cam.images)
How do I bind a variable to a SQL set for an IN query in Perl DBI?
Example:
my #nature = ('TYPE1','TYPE2'); # This is normally populated from elsewhere
my $qh = $dbh->prepare(
"SELECT count(ref_no) FROM fm_fault WHERE nature IN ?"
) || die("Failed to prepare query: $DBI::errstr");
# Using the array here only takes the first entry in this example, using a array ref gives no result
# bind_param and named bind variables gives similar results
$qh->execute(#nature) || die("Failed to execute query: $DBI::errstr");
print $qh->fetchrow_array();
The result for the code as above results in only the count for TYPE1, while the required output is the sum of the count for TYPE1 and TYPE2. Replacing the bind entry with a reference to #nature (\#nature), results in 0 results.
The main use-case for this is to allow a user to check multiple options using something like a checkbox group and it is to return all the results. A work-around is to construct a string to insert into the query - it works, however it needs a whole lot of filtering to avoid SQL injection issues and it is ugly...
In my case, the database is Oracle, ideally I want a generic solution that isn't affected by the database.
There should be as many ? placeholders as there is elements in #nature, ie. in (?,?,..)
my #nature = ('TYPE1','TYPE2');
my $pholders = join ",", ("?") x #nature;
my $qh = $dbh->prepare(
"SELECT count(ref_no) FROM fm_fault WHERE nature IN ($pholders)"
) or die("Failed to prepare query: $DBI::errstr");
I'm trying to use .extra() where the query return more than 1 result, like :
'SELECT "books_books"."*" FROM "books_books" WHERE "books_books"."owner_id" = %s' % request.user.id
I got an error : only a single result allowed for a SELECT that is part of an expression
Try it on dev-server using sqlite3. Anybody knows how to fix this? Or my query is wrong?
EDIT:
I'm using django-simple-ratings, my model like this :
class Thread(models.Model):
#
#
ratings = Ratings()
I want to display each Thread's ratings and whether a user already rated it or not. For 2 items, it will hit 6 times, 1 for the actual Thread and 2 for accessing the ratings. The query:
threads = Thread.ratings.order_by_rating().filter(section = section)\
.select_related('creator')\
.prefetch_related('replies')
threads = threads.extra(select = dict(myratings = "SELECT SUM('section_threadrating'.'score') AS 'agg' FROM 'section_threadrating' WHERE 'section_threadrating'.'content_object_id' = 'section_thread'.'id' ",)
Then i can print each Thread's ratings without hitting the db more. For the 2nd query, i add :
#continue from extra
blahblah.extra(select = dict(myratings = '#####code above####',
voter_id = "SELECT 'section_threadrating'.'user_id' FROM 'section_threadrating' WHERE ('section_threadrating'.'content_object_id' = 'section_thread'.'id' AND 'section_threadrating'.'user_id' = '3') "))
Hard-coded the user_id. Then when i use it on template like this :
{% ifequal threads.voter_id user.id %}
#the rest of the code
I got an error : only a single result allowed for a SELECT that is part of an expression
Let me know if it's not clear enough.
The problem is in the query. Generally, when you are writing subqueries, they must return only 1 result. So a subquery like the one voter_id:
select ..., (select sectio_threadrating.user_id from ...) as voter_id from ....
is invalid, because it can return more than one result. If you are sure it will always return one result, you can use the max() or min() aggregation function:
blahblah.extra(select = dict(myratings = '#####code above####',
voter_id = "SELECT max('section_threadrating'.'user_id') FROM 'section_threadrating' WHERE ('section_threadrating'.'content_object_id' = 'section_thread'.'id' AND 'section_threadrating'.'user_id' = '3') "))
This will make the subquery always return 1 result.
Removing that hard-code, what user_id are you expecting to retrieve here? Maybe you just can't reduce to 1 user using only SQL.