How to parse slow query log from cloudwatch insights - amazon-cloudwatch

How can I parse MySQL slow query logs in CloudWatch Logs Insights?
The message looks like this:
# User@Host: abc_ro[abc_ro] @ [x.x.x.xxx]
# Thread_id: 4935648 Schema: mytable QC_hit: No
# Query_time: 1.282014 Lock_time: 0.000154 Rows_sent: 6 Rows_examined: 964382
# Rows_affected: 0
use mydummydb;
SET timestamp=1662984070;
SELECT * FROM mytable

The following query parses the query time, the lock time, and the query text:
parse @message /Query_time:\s*(?<Query_time>[0-9]+(?:\.[0-9]+)?)\s*Lock_time:\s*(?<Lock_time>[0-9]+(?:\.[0-9]+)?)[\s\S]*?;/
| parse @message /SET timestamp=\d+;\n(?<Query>[^;]*)/
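To check the two patterns offline before running them in Logs Insights, they can be tried against the sample message in Python. This is only a sketch: Python's `re` module needs `(?P<name>...)` instead of `(?<name>...)` for named groups, and the helper name `parse_slow_log` is made up for this example.

```python
import re

# Sample slow query log entry, copied from the question above.
SAMPLE = (
    "# User@Host: abc_ro[abc_ro] @ [x.x.x.xxx]\n"
    "# Thread_id: 4935648 Schema: mytable QC_hit: No\n"
    "# Query_time: 1.282014 Lock_time: 0.000154 Rows_sent: 6 Rows_examined: 964382\n"
    "# Rows_affected: 0\n"
    "use mydummydb;\n"
    "SET timestamp=1662984070;\n"
    "SELECT * FROM mytable\n"
)

# Same patterns as the Insights query, with Python-style named groups.
TIMES = re.compile(
    r"Query_time:\s*(?P<Query_time>[0-9]+(?:\.[0-9]+)?)\s*"
    r"Lock_time:\s*(?P<Lock_time>[0-9]+(?:\.[0-9]+)?)[\s\S]*?;"
)
QUERY = re.compile(r"SET timestamp=\d+;\n(?P<Query>[^;]*)")


def parse_slow_log(message):
    """Return the named groups matched by both patterns (hypothetical helper)."""
    fields = {}
    for pattern in (TIMES, QUERY):
        match = pattern.search(message)
        if match:
            fields.update(match.groupdict())
    if "Query" in fields:
        # Drop the trailing newline captured by [^;]*.
        fields["Query"] = fields["Query"].strip()
    return fields


parsed = parse_slow_log(SAMPLE)
print(parsed)
```

The values come back as strings, so convert `Query_time` with `float()` if you want to sort or filter on it.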

Related

CloudWatch Logs Insights: display a field from the JSON in the log message

This is my log entry from AWS API Gateway:
(8d036972-0445) Method request body before transformations: {"TransactionAmount":225.00,"OrderID":"1545623982","PayInfo":{"Method":"ec","TransactionAmount":225.00},"CFeeProcess":0}
I want to write a CloudWatch Logs Insights query which can display AWS request id, present in the first parenthesis and the order id present in the json.
I'm able to get the AWS request id by parsing the message. How can I get the OrderID json field?
Any help is greatly appreciated.
| parse @message "(*) Method request body before transformations: *" as awsReqId,JsonBody
#| filter OrderID = "1545623982" This did not work
| display awsReqId,OrderID
| limit 20
You can do it with two parse steps, like this:
fields @message
| parse @message "(*) Method request body before transformations: *" as awsReqId, JsonBody
| parse JsonBody "\"OrderID\":\"*\"" as OrderID
| filter OrderID = "1545623982"
| display awsReqId,OrderID
| limit 20
Edit:
Actually, the way you're doing it should also work. I think it doesn't work because you have 2 space characters between the closing bracket and the word Method in "(*)  Method". Try removing 1 space.
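The same two-step idea (split off the request id, then read the order id from the JSON body) can be sketched in Python for testing against a captured log line. The function name `extract_ids` is hypothetical, and the pattern uses `\s+` so that one or two spaces before "Method" both match.

```python
import json
import re

# Log line copied from the question above.
LINE = (
    '(8d036972-0445) Method request body before transformations: '
    '{"TransactionAmount":225.00,"OrderID":"1545623982",'
    '"PayInfo":{"Method":"ec","TransactionAmount":225.00},"CFeeProcess":0}'
)

# Step 1: request id in the leading parentheses, JSON body after the colon.
PATTERN = re.compile(
    r"^\((?P<awsReqId>[^)]*)\)\s+Method request body before transformations:\s*"
    r"(?P<JsonBody>\{.*\})\s*$"
)


def extract_ids(line):
    """Return (awsReqId, OrderID), or None if the line has a different shape."""
    match = PATTERN.match(line)
    if match is None:
        return None
    # Step 2: parse the body as JSON instead of pattern-matching on it.
    body = json.loads(match.group("JsonBody"))
    return match.group("awsReqId"), body.get("OrderID")


result = extract_ids(LINE)
print(result)
```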

Insert data into a Snowflake table from SQLAlchemy

So I am trying to insert data into a snowflake transient table (from a parquet file), but my syntax doesn't allow me to go past our SAST test in the pipeline.
Do you see anything wrong with the following code snippet (especially the insert into step since it is causing the error):
with snowflake_engine.begin() as tx:
    (SOME WORKING CODE)...
    if table == "lending_adjudications":
        tx.execute(
            f"put file://{pq_filepath_2} @{destination_schema}.{stage_guid_part_2}"
        ).fetchall()
        stmts = [
            f"create or replace transient table {destination_schema}.{table} as "  # nosec
            f"select $1 as fields from @{destination_schema}.{stage_guid}",  # nosec
            f"insert into {destination_schema}.{table} "
            f"select $1 as fields from @{destination_schema}.{stage_guid_part_2}",
        ]
        [tx.execute(stmt).fetchall() for stmt in stmts]
    else:
        tx.execute(
            f"create or replace transient table {destination_schema}.{table} as "  # nosec
            f"select $1 as fields from @{destination_schema}.{stage_guid}"  # nosec
        ).fetchall()
    ...
Thank you so much for your help, any insight is highly appreciated.

RDSdataService execute_statement returns (BadRequestException)

I am using the boto3 library with execute_statement to get data from an RDS cluster through the Data API.
The query works fine if I select 1 or 2 columns, but as soon as I add another column to the query, it fails with (BadRequestException) permission denied for relation table_name.
I have checked in pgAdmin that the user I am using has permission to query the whole database.
The function used for the call:
def execute_query(self, sql_query, sql_parameters=[]):
    """
    Aurora DataAPI execute query. Generally used for select statements.
    :param sql_query: Query
    :param sql_parameters: parameters in sql query
    :return: DataApi response
    """
    client = self.api_access()
    response = client.execute_statement(
        resourceArn=RESOURCE_ARN,
        secretArn=SECRET_ARN,
        database='db_name',
        sql=sql_query,
        includeResultMetadata=True,
        parameters=sql_parameters)
    return response
Function call (no errors):
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Function call (fails with the above error):
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a limit on the number of columns we can get from the Data API using boto3? I know there is a 1 MB limit, but according to the documentation it should still return something if that limit is exceeded.
The backend is Postgres RDS.
UPDATE:
I can select the same column 10 times and it's not a problem:
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didn't know that some tables have column-level security. If column-level privileges are restricted in Postgres for the user you are connecting as, then that user obviously cannot select those columns.
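One way to confirm which columns are selectable is to query `information_schema.column_privileges` through the same Data API call. The sketch below only builds the SQL and the named parameters in the shape `execute_statement` expects; the helper name `build_column_privilege_query` is hypothetical, and the view may also list grants the current role has made to others, so filtering on `grantee` may be needed.

```python
def build_column_privilege_query(schema, table):
    """Build SQL and Data API parameters listing columns with SELECT granted.

    Hypothetical helper: it does not call AWS, it only prepares the inputs.
    """
    sql = (
        "SELECT grantee, column_name, privilege_type "
        "FROM information_schema.column_privileges "
        "WHERE table_schema = :schema "
        "AND table_name = :table "
        "AND privilege_type = 'SELECT' "
        "ORDER BY column_name"
    )
    # The Data API takes named parameters as a list of name/value dicts.
    parameters = [
        {"name": "schema", "value": {"stringValue": schema}},
        {"name": "table", "value": {"stringValue": table}},
    ]
    return sql, parameters


sql, params = build_column_privilege_query("schema_name", "table_name")
print(sql)
print(params)
```

With the function from the question, this could be run as `conn.execute_query(sql, params)`; any column missing from the result for your user is a likely cause of the permission error.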

BigQuery updates failing, but only when batched using Python API

I am trying to update a table using batched update statements. DML queries successfully execute in the BigQuery Web UI, but when batched, the first one succeeds while others fail. Why is this?
A sample query:
query = '''
update `project.dataset.Table`
set my_fk = 1234
where other_fk = 222 and
received >= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-22 05:28:12") and
received <= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-26 02:31:51")
'''
Sample code:
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
queries = [] # list of DML Strings
jobs = []
for query in queries:
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
Job output:
for job in jobs[1:]:
    print(job.state)
    # Done
    print(job.error_result)
    # {'message': 'Cannot set destination table in jobs with DML statements',
    # 'reason': 'invalidQuery'}
    print(job.use_legacy_sql)
    # False
    print(job.job_type)
    # Query
I suspect that the problem is job_config getting some fields populated (destination in particular) by the BigQuery API after the first job is inserted. Then, the second job will fail as it will be a DML statement with a destination table in the job configuration. You can verify that with:
for query in queries:
    print(job_config.destination)
    job = client.query(query, location='US', job_config=job_config)
    print(job_config.destination)
    jobs.append(job)
To solve this you can avoid reusing the same job_config for all jobs:
for query in queries:
    job_config = bigquery.QueryJobConfig()
    job_config.priority = bigquery.QueryPriority.BATCH
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
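The suspected mechanism (a shared config object being mutated by the first submission) can be illustrated without BigQuery at all. Everything in this sketch is a stand-in: `FakeJobConfig` and `fake_submit` are hypothetical and only mimic the behavior described above, they are not part of the BigQuery client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeJobConfig:
    """Hypothetical stand-in for a job config; not the BigQuery class."""
    priority: str = "BATCH"
    destination: Optional[str] = None


def fake_submit(query, job_config):
    """Hypothetical stand-in for client.query.

    Rejects a DML query whose config already has a destination, otherwise
    accepts it and fills in a destination on the caller's config object.
    """
    if job_config.destination is not None:
        return "error: Cannot set destination table in jobs with DML statements"
    job_config.destination = "anonymous_table"  # side effect on the shared object
    return "ok"


queries = ["update t set a = 1", "update t set a = 2", "update t set a = 3"]

# Reusing one config: only the first submission sees a clean object.
shared = FakeJobConfig()
reused_results = [fake_submit(q, shared) for q in queries]

# Fresh config per query: every submission sees a clean object.
fresh_results = [fake_submit(q, FakeJobConfig()) for q in queries]

print(reused_results)
print(fresh_results)
```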
Your code seems to be working fine on a single update. This is what I tried using python 3.6.5 and v1.9.0 of the client API
from google.cloud import bigquery
client = bigquery.Client()
query = '''
UPDATE `project.dataset.table` SET msg = null WHERE x is null
'''
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
job = client.query(query, location='US', job_config=job_config)
print(job.state)
# PENDING
print(job.error_result)
# None
print(job.use_legacy_sql)
# False
print(job.job_type)
# Query
Please check your configuration and provide the full code with an error log if this doesn't help you solve your problem.
BTW, I also verified this from the command line:
sh-3.2# ./bq query --nouse_legacy_sql --batch=true 'UPDATE `project.dataset.table` SET msg = null WHERE x is null'
Waiting on bqjob_r5ee4f5dd56dc212f_000001697d3f9a56_1 ... (133s) Current status: RUNNING
Waiting on bqjob_r5ee4f5dd56dc212f_000001697d3f9a56_1 ... (139s) Current status: DONE
sh-3.2#
sh-3.2# python --version

Executing a Hive query in the background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to run it from a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I am getting an error like this:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where I am making a mistake.
Create a file named example:
vi example
Enter the query in the file and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file using the following command:
hive -f example 1>example.error 2>example.output &
The shell prints the job number, for example:
[1]
Now disown the process:
disown
The process will now keep running in the background. To follow its progress, you can use
tail -f example.output
True @Koushik! Glad that you found the issue.
The shell could not pass the Hive query through intact because of the nested single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not valid, because the shell strips the inner quotes before Hive sees them.
The best solution is to use double quotes around Value1:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
Alternatively, a quick and dirty solution is to wrap each literal single quote in double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
This makes sure that the Hive query is properly formed before it is executed. I wouldn't suggest this approach unless you desperately need single quotes ;)
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like this:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
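To see what the shell actually hands to Hive in each case, the command lines can be tokenized with Python's `shlex`, which approximates POSIX shell quoting rules (it is not ksh or bash itself, so treat this as a sketch). The variable names are made up for this example.

```python
import shlex

# Original command from the script: single quotes nested in single quotes.
broken = "./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'"

# Fixed command: double quotes inside the single-quoted query.
fixed = "./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = \"Value1\";'"

# The third token is the query string that Hive receives after -e.
broken_query = shlex.split(broken)[2]
fixed_query = shlex.split(fixed)[2]

print(broken_query)  # the quotes around value1 are gone
print(fixed_query)   # the double quotes survive
```

In the broken form the inner quotes only close and reopen the shell string, so Hive sees a bare `value1` and treats it as a column reference, which matches the SemanticException in the question.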