"Required parameter key not set" error in AWS ATHENA - sql

I am using AWS Athena in my Lambda function to query and fetch results.
For simpler queries, the query execution sometimes fails with the error "Required parameter key not set", but on re-execution the query succeeds.
For complex queries, the query never executes and always fails with the same "Required parameter key not set" error. These queries take more than 5 seconds to execute when queried in the Athena console of the AWS account.
What changes do I need to make to successfully execute an Athena query from my Lambda function? Below is the code.
def execute_query(query, athena_database, service_query_result_folder):
    # Function to execute the SQL query and get the results.
    # boto3, logger, region and s3_bucket are defined elsewhere in the Lambda module.
    athena_client = boto3.client(service_name="athena", region_name=region)
    result_location = f"s3://{s3_bucket}/{service_query_result_folder}/"
    logger.info(f"Athena result location: {result_location}")
    execution_start_response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": athena_database},
        ResultConfiguration={"OutputLocation": result_location}
    )
    queryId = execution_start_response["QueryExecutionId"]
    logger.info(f"Executed query with execution id: {queryId}")
    result_file = queryId + ".csv"
    result_file_with_prefix = service_query_result_folder + '/' + result_file
    result_file_location = result_location + result_file
    execution_get_response = athena_client.get_query_execution(QueryExecutionId=queryId)
    execution_details = execution_get_response["QueryExecution"]
    execution_status = execution_details["Status"]["State"]
    if execution_status == "SUCCEEDED":
        logger.info(f"Athena query execution {execution_status}. Execution details: {execution_details}")
        logger.info(f"Result file location: {result_file_location}")
        logger.info(f"Result file: {result_file}")
        return result_file_with_prefix
    elif execution_status == "FAILED" or execution_status == "CANCELLED":
        logger.error("Athena query execution {}. Details: {}".format(execution_status, execution_get_response))
I tried updating the query to return only one row, but I'm still facing the issue.
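One thing worth noting (not from the original post): start_query_execution only submits the query, so the state read immediately afterwards is typically QUEUED or RUNNING rather than SUCCEEDED, which would match the symptoms above. Below is a minimal sketch of polling get_query_execution until a terminal state, assuming the same athena_client and queryId as in the function; wait_for_query, poll_seconds and max_attempts are illustrative names, not part of the original code.

import time

def wait_for_query(athena_client, queryId, poll_seconds=1, max_attempts=60):
    # Sketch only: poll until the query reaches a terminal state.
    for _ in range(max_attempts):
        response = athena_client.get_query_execution(QueryExecutionId=queryId)
        state = response["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state, response
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    raise TimeoutError(f"Athena query {queryId} did not finish in time")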

Related

RDSdataService execute_statement returns (BadRequestException)

I am using the boto3 library with executeStatement to get data from an RDS cluster using the Data API.
The query works fine if I select 1 or 2 columns, but as soon as I select another column, it returns an error with (BadRequestException) permission denied for relation table_name.
I have checked using pgAdmin that the permissions are intact to query the whole db for the user I am using.
Function included in the call:
def execute_query(self, sql_query, sql_parameters=[]):
    """
    Aurora Data API execute query. Generally used for select statements.
    :param sql_query: Query
    :param sql_parameters: parameters in sql query
    :return: Data API response
    """
    client = self.api_access()
    response = client.execute_statement(
        resourceArn=RESOURCE_ARN,
        secretArn=SECRET_ARN,
        database='db_name',
        sql=sql_query,
        includeResultMetadata=True,
        parameters=sql_parameters)
    return response
function call: No errors
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
function call: fails with above error
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a horizontal limit on what we can get from the Data API using boto3? I know there is a 1 MB limit, but per the documentation it should still return something if the limit is exceeded.
Backend is Postgres RDS
UPDATE:
I can select the same column 10 times and it's not a problem:
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didn't know there was column-level security on some tables. If there is column-level security in Postgres for the user you are using, then obviously you cannot select those columns.
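As a side note (not from the original post), one way to confirm which columns the Data API user can actually read is to query information_schema.column_privileges through the same helper; schema_name, table_name and db_user below are placeholder values:

# Sketch only: list column-level grants for a given user.
query = '''
SELECT column_name, privilege_type
FROM information_schema.column_privileges
WHERE table_schema = 'schema_name'
  AND table_name = 'table_name'
  AND grantee = 'db_user'
'''
result = conn.execute_query(query)
print(result)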

Django: How to know the raw sql and the time taken by a model queryset

I am trying to find, for any Django model queryset:
1) the time taken (duration), and
2) the raw SQL.
Eg:
users = User.objects.all()
print(time_take for users)
print(raw sql of users)
Answers attempted after going through some previous solutions to this question on Stack Overflow:
SQL:
I am thinking of using the below for the SQL query. It is given by @Flash in https://stackoverflow.com/a/47542953/2897115
from django.db import connections

def str_query(qs):
    """
    qs.query returns something that isn't valid SQL, this returns the actual
    valid SQL that's executed: https://code.djangoproject.com/ticket/17741
    """
    cursor = connections[qs.db].cursor()
    query, params = qs.query.sql_with_params()
    cursor.execute('EXPLAIN ' + query, params)
    res = str(cursor.db.ops.last_executed_query(cursor, query, params))
    assert res.startswith('EXPLAIN ')
    return res[len('EXPLAIN '):]
Time taken:
And for the time taken I use start = time() and stop = time().
The code becomes:
from time import time

def someview():
    start = time()
    qs = Somemodels.objects.all()
    stop = time()
    sql = str_query(qs)
    timetaken = "%.3f" % (stop - start)
    ...
Q: Will this show the correct values of the SQL and the time taken?
Q: Is there any way to know the time taken from the cursor.db module instead of using start = time() and stop = time()?
I also found somewhere that the SQL can be obtained using:
from django import db
db.connection.queries[-1]
Q: How is this different from the str_query(qs) method I am trying to use?
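Not from the original question, but for reference: when DEBUG=True, Django records every executed query together with its duration in connection.queries, which covers both parts without manual timing. A minimal sketch; note that the queryset has to be evaluated (e.g. with list()) before anything is recorded, since querysets are lazy:

# Sketch only: requires DEBUG = True in settings, otherwise connection.queries stays empty.
from django.db import connection, reset_queries

reset_queries()
users = list(User.objects.all())  # force evaluation; querysets are lazy
last = connection.queries[-1]
print(last['sql'])   # the raw SQL that was executed
print(last['time'])  # duration in seconds, as a string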

BigQuery updates failing, but only when batched using Python API

I am trying to update a table using batched update statements. DML queries successfully execute in the BigQuery Web UI, but when batched, the first one succeeds while others fail. Why is this?
A sample query:
query = '''
update `project.dataset.Table`
set my_fk = 1234
where other_fk = 222 and
received >= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-22 05:28:12") and
received <= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-26 02:31:51")
'''
Sample code:
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH

queries = []  # list of DML strings
jobs = []
for query in queries:
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
Job output:
for job in jobs[1:]:
    print(job.state)
    # Done
    print(job.error_result)
    # {'message': 'Cannot set destination table in jobs with DML statements',
    #  'reason': 'invalidQuery'}
    print(job.use_legacy_sql)
    # False
    print(job.job_type)
    # Query
I suspect that the problem is job_config getting some fields populated (destination in particular) by the BigQuery API after the first job is inserted. Then, the second job will fail as it will be a DML statement with a destination table in the job configuration. You can verify that with:
for query in queries:
    print(job_config.destination)
    job = client.query(query, location='US', job_config=job_config)
    print(job_config.destination)
    jobs.append(job)
To solve this you can avoid reusing the same job_config for all jobs:
for query in queries:
    job_config = bigquery.QueryJobConfig()
    job_config.priority = bigquery.QueryPriority.BATCH
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
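As an optional follow-up (not part of the original answer), each submitted job can also be waited on explicitly so that failures surface as exceptions instead of having to inspect error_result; a sketch assuming the jobs list from above:

# Sketch only: block on each batch job and report any failure.
for job in jobs:
    try:
        job.result()  # waits for the job to finish; raises on error
        print(job.job_id, job.state)
    except Exception as exc:
        print(job.job_id, "failed:", exc)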
Your code seems to be working fine on a single update. This is what I tried using python 3.6.5 and v1.9.0 of the client API
from google.cloud import bigquery
client = bigquery.Client()
query = '''
UPDATE `project.dataset.table` SET msg = null WHERE x is null
'''
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
job = client.query(query, location='US', job_config=job_config)
print(job.state)
# PENDING
print(job.error_result)
# None
print(job.use_legacy_sql)
# False
print(job.job_type)
# Query
Please check your configuration and provide the full code with an error log if this doesn't help you solve your problem.
BTW, I also verified this from the command line:
sh-3.2# ./bq query --nouse_legacy_sql --batch=true 'UPDATE `project.dataset.table` SET msg = null WHERE x is null'
Waiting on bqjob_r5ee4f5dd56dc212f_000001697d3f9a56_1 ... (133s) Current status: RUNNING
Waiting on bqjob_r5ee4f5dd56dc212f_000001697d3f9a56_1 ... (139s) Current status: DONE
sh-3.2#
sh-3.2# python --version

Multiple parameter values

I have a problem with BIRT when I try to pass multiple values from a report parameter.
I'm using BIRT 2.6.2 and Eclipse.
I'm trying to pass multiple values from "JDSuser", the last parameter of a cascading parameter group. The parameter is allowed to have multiple values and I'm using a list box.
To do that, I'm writing my SQL query with a WHERE ... IN clause whose contents I replace via JavaScript, since BIRT's SQL can't otherwise receive multiple values from a report parameter.
My SQL query is:
select jamacomment.createdDate, jamacomment.scopeId,
jamacomment.commentText, jamacomment.documentId,
jamacomment.highlightQuote, jamacomment.organizationId,
jamacomment.userId,
organization.id, organization.name,
userbase.id, userbase.firstName, userbase.lastName,
userbase.organization, userbase.userName,
document.id, document.name, document.description,
user_role.userId, user_role.roleId,
role.id, role.name
from jamacomment jamacomment left join
userbase on userbase.id=jamacomment.userId
left join organization on
organization.id=jamacomment.organizationId
left join document on
document.id=jamacomment.documentId
left join user_role on
user_role.userId=userbase.id
right join role on
role.id=user_role.roleId
where jamacomment.scopeId=11
and role.name in ( 'sample grupa' )
and userbase.userName in ( 'sample' )
And my JavaScript code for that dataset's beforeOpen event is:
if( params["JDSuser"].value[0] != "(All Users)" ){
this.queryText=this.queryText.replaceAll('sample grupa', params["JDSgroup"]);
var users = params["JDSuser"];
//var userquery = "'";
var userquery = userquery + users.join("', '");
//userquery = userquery + "'";
this.queryText=this.queryText.replaceAll('sample', userquery);
}
I tried many different quote variations. With this one I get no error messages, but if I choose one value I get no data from the database, and if I choose at least two values I get only the data for the last chosen value.
If I uncomment one of those additional quote script lines, I get a syntax error like this:
The following items have errors:
Table (id = 597):
+ An exception occurred during processing. Please see the following message for details: Failed to prepare the query execution for the
data set: Organization Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object. SQL error #1:You have an error in
your SQL syntax; check the manual that corresponds to your MySQL
server version for the right syntax to use near 'rudolfs.sviklis',
'sample' )' at line 25 ;
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to
your MySQL server version for the right syntax to use near
'rudolfs.sviklis', 'sample' )' at line 25
Also, I should mention that I'm doing this by following a working example. Everything is the same; the previous code resulted in the same syntax error, so I changed it to this script, which behaves the same.
The example is available here:
http://developer.actuate.com/community/forum/index.php?/files/file/593-default-value-all-with-multi-select-parsmeter/
If someone could give me at least a clue as to what I should do, that would be great.
You should always use the value property of a parameter, i.e.:
var users = params["JDSuser"].value;
It is not necessary to surround "userquery" with quotes because these quotes are already put in the SQL query around 'sample'. Furthermore, there is a mistake because userquery is not yet defined at this line:
var userquery = userquery + users.join("', '");
This might introduce a string such as "null" into your query. Therefore remove all references to the userquery variable and just use this expression at the end:
this.queryText=this.queryText.replaceAll('sample', users.join("','"));
Notice I removed the blank space in the join expression. Finally, once it works, you probably want to make your report input more robust by testing whether the value is null:
if( params["JDSuser"].value!=null && params["JDSuser"].value[0] != "(All Users)" ){
//Do stuff...
}

BigQuery Pagination

How do I do pagination with BigQuery when using JavaScript?
First I send the request:
var request = gapi.client.bigquery.jobs.query({
    'projectId': project_id,
    'timeoutMs': '30000',
    'query': query,
    'maxResults': 50,
    'pageToken': pageToken
});
This query returns the first 50 results; how can I then retrieve the next 50? I want to do pagination dynamically using JavaScript and BigQuery.
Query:
SELECT year, month,day,state,mother_age, AVG(weight_pounds) as AvgWeight FROM [publicdata:samples.natality] Group EACH By year, month,day,state, mother_age
This is the query that I am using.
TableData.list works, or alternatively you can use jobs.getQueryResults(), which is usually the preferred way to get query results (since it can also wait for the query to complete).
You should use the page token returned from the original query response or the previous jobs.getQueryResults() call to iterate through pages. This is generally more efficient and reliable than using index-based pagination.
I don't have a JavaScript example, but here is an example using Python that should be relatively easy to adapt:
from apiclient.discovery import build

def run_query(http, service, project_id, query, response_handler,
              timeout=30*1000, max_results=1024):
    query_request = {
        'query': query,
        'timeoutMs': timeout,
        'maxResults': max_results}
    print 'Running query "%s"' % (query,)
    response = service.jobs().query(projectId=project_id,
                                    body=query_request).execute(http)
    job_ref = response['jobReference']
    get_results_request = {
        'projectId': project_id,
        'jobId': job_ref['jobId'],
        'timeoutMs': timeout,
        'maxResults': max_results}
    while True:
        print 'Response %s' % (response,)
        page_token = response.get('pageToken', None)
        query_complete = response['jobComplete']
        if query_complete:
            response_handler(response)
            if page_token is None:
                # Our work is done, query is done and there are no more
                # results to read.
                break
        # Set the page token so that we know where to start reading from.
        get_results_request['pageToken'] = page_token
        # Apply a python trick here to turn the get_results_request dict
        # into method arguments.
        response = service.jobs().getQueryResults(
            **get_results_request).execute(http)

def print_results(results):
    fields = results['schema']['fields']
    rows = results['rows']
    for row in rows:
        for i in xrange(0, len(fields)):
            cell = row['f'][i]
            field = fields[i]
            print "%s: %s " % (field['name'], cell['v']),
        print ''

def run(http, query):
    service = build('bigquery', 'v2')
    project_id = '#Your Project Here#'
    run_query(http, service, project_id, query, print_results,
              timeout=1)
Once the query has run, all results will be saved to a temporary table (or permanent, if you have set the respective flag).
You can read these results with tabledata.list. Notice that it offers a startIndex argument, so you can jump to any arbitrary page, not only the next one.
https://developers.google.com/bigquery/docs/reference/v2/tabledata/list
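Not part of the original answers, but here is a rough sketch of that tabledata.list approach using the same discovery-based service as above; project_id, dataset_id and table_id are placeholders, and startIndex picks the row offset to start reading from:

# Sketch only: read one page of an existing table with tabledata.list.
def read_page(http, service, project_id, dataset_id, table_id,
              start_index=0, max_results=50):
    response = service.tabledata().list(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id,
        startIndex=start_index,   # jump straight to an arbitrary row offset
        maxResults=max_results).execute(http)
    return response.get('rows', [])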