AQL cutting off data when querying through console - aerospike

I'm trying to query a bin of a record through AQL; however, the data is cut off.
Any ideas how to make AQL print the full data?
aql> select tokens from user.users where pk='some_pk'
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tokens |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MAP('{"93836":"external-jfbl8squ-zWLoOuBgXD-test::1527455663", "39720":"external-jfbl8squ-UZWvWVMjtc-test::1527455663", "40870":"external-jfbl8squ-kIFcZKdimg-test::1527455663", "70065":"external-jfbl8squ-PezniJBRgE-test::1527455663", "36903":"external-jf |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 secs)
As you can see, the data is truncated at the end.

Right after posting this question, I tried starting AQL with "-o json" and it works.
aql> select tokens from user.users where pk='cfef295b-dbd6-4f5f-8ad6-b0332c950772'
[
{
"tokens": {
"93836": "external-jfbl8squ-zWLoOuBgXD-test::1527455663",
"39720": "external-jfbl8squ-UZWvWVMjtc-test::1527455663",
"40870": "external-jfbl8squ-kIFcZKdimg-test::1527455663",
"70065": "external-jfbl8squ-PezniJBRgE-test::1527455663",
"36903": "external-jfbl8squ-yYSCVcZeuF-test::1527455663",
"78608": "external-jfbl8squ-vYukUUHCSa-test::1527455663",
"50785": "external-jfbl8squ-kOonwnEZiL-test::1527455663"
}
}
]
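For reference, the JSON output mode can be switched on either when launching aql or from inside the shell. The exact spellings below assume a recent aql client, so check aql --help or the HELP command if yours differs:
aql -o json
aql> SET OUTPUT JSON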

Related

CloudWatch Logs Insights: display a field from the JSON in the log message

This is my log entry from AWS API Gateway:
(8d036972-0445) Method request body before transformations: {"TransactionAmount":225.00,"OrderID":"1545623982","PayInfo":{"Method":"ec","TransactionAmount":225.00},"CFeeProcess":0}
I want to write a CloudWatch Logs Insights query which can display the AWS request id, present in the first parentheses, and the order id present in the JSON.
I'm able to get the AWS request id by parsing the message. How can I get the OrderID JSON field?
Any help is greatly appreciated.
| parse @message "(*) Method request body before transformations: *" as awsReqId,JsonBody
#| filter OrderID = "1545623982" This did not work
| display awsReqId,OrderID
| limit 20
You can do it with two parse steps, like this:
fields @message
| parse @message "(*) Method request body before transformations: *" as awsReqId, JsonBody
| parse JsonBody "\"OrderID\":\"*\"" as OrderID
| filter OrderID = "1545623982"
| display awsReqId, OrderID
| limit 20
Edit:
Actually, the way you're doing it should also work. I think it doesn't work because you have 2 space characters between the bracket and the word Method here: (*)  Method. Try removing 1 space.

Elasticsearch, Elasticsearch SQL, SHOW COLUMNS or DESCRIBE - is there a possibility to filter the output

I have a simple Elasticsearch SQL query like this:
GET /_sql?format=txt
{
"query" :"""
DESCRIBE "index_name"
"""
}
and it works, and the output is like this:
column | type | mapping
-----------------------------------------------------------
column_name1 | STRUCT | object
column_name1.Id | VARCHAR | text
column_name1.Id.keyword | VARCHAR | keyword
Is there a possibility to prepare the above query using a filter or WHERE, for example something like this:
GET /_sql?format=txt
{
"query":"""
DESCRIBE "index_name"
""",
"filter": {"terms": {"type.keyword": ["STRUCT"]}}
}
or
GET /_sql?format=txt
{
"query":"""
DESCRIBE "index_name"
WHERE "type" = 'STRUCT'
"""
}
That is not possible, no.
While the DESCRIBE SQL command seems to return tabular data, it is not a query: it does not support WHERE clauses, nor can it be used within a SELECT statement. That is actually not specific to Elasticsearch; it is the same in RDBMSs.
The same is apparently true for the Elasticsearch filter clause. It works with SELECT statements, but with DESCRIBE or SHOW COLUMNS it simply has no effect on the results, while not producing an error.
In "real" SQL, you could work around this by querying information_schema.COLUMNS, but that is not an option in Elasticsearch.

Accessing values in JSON array

I am following the instructions in the documentation for how to access JSON values in CloudWatch Logs Insights, where the recommendation is as follows:
JSON arrays are flattened into a list of field names and values. For example, to specify the value of instanceId for the first item in requestParameters.instancesSet, use requestParameters.instancesSet.items.0.instanceId.
ref
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
I am trying the following and getting nothing in return. The intellisense autofills up to processList.0 but no further
fields processList.0.vss
| sort @timestamp desc
| limit 1
The JSON I am working with is
"processList": [
{
"vss": xxxxx,
"name": "aurora",
"tgid": xxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.01,
"id": xxxxx,
"rss": xxxxx
},
{
"vss": xxxx,
"name": "aurora",
"tgid": xxxxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.06,
"id": xxxxx,
"rss": xxxxx
}]
Have you tried the following?
fields @timestamp, processList.0.vss
| sort @timestamp desc
| limit 5
It may be a syntax error. If not, please post a couple of records' worth of the overall structure, with @timestamp included.
The reference link that you have posted also states the following.
CloudWatch Logs Insights can extract a maximum of 100 log event fields
from a JSON log. For extra fields that are not extracted, you can use
the parse command to parse these fields from the raw unparsed log
event in the message field.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
For very large JSON messages, Insights intellisense may not be parsing all the fields into named fields. So, the solution is to use parse on the complete JSON string in the field where you expect your data field to be present. In your example and mine it is processList.
I was able to extract the value of specific cpuUsedPc under processList by using a query like the following.
fields @timestamp, cpuUtilization.total, processList
| parse processList /"name":"RDS processes","tgid":.*?,"parentID":.*?,"memoryUsedPc":.*?,"cpuUsedPc":(?<RDSProcessesCPUUsedPc>.*?),/
| sort @timestamp asc
| display @timestamp, cpuUtilization.total, RDSProcessesCPUUsedPc
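Applying the same idea to the vss value from the question might look like the sketch below. The capture name auroraVss is just illustrative, and the regex assumes the raw message has no spaces after the colons (like the answer's example), so adjust it to your actual message formatting:
fields @timestamp, processList
| parse processList /"vss":(?<auroraVss>.*?),"name":"aurora"/
| sort @timestamp desc
| display @timestamp, auroraVss
| limit 1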

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example:
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I simplified it here.
I want to process each line and run some sqlplus statements for it.
Let's say that I have one table, with two columns, like this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the content value of the two lines to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality I have a lot of data and complex requests to execute, so I have to use sqlplus, not tools like sqlloader.
So I process the input file with 5 threads in parallel, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test):
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
And as a result, I get:
01740: missing double quote in identifier
If I set the "verify" parameter to on in the SQL script, I can see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with a quote or double quote, either on the bash side or the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
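Since the question mentions that the generated file's format can be changed, one possible way around that limitation is to have the generator emit a separator that cannot appear in the content (a tab, for example) and pass that to --colsep. A sketch, assuming a tab-separated input_file.txt:
cat input_file.txt |
parallel -j5 -v --colsep '\t' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
The {=2 s/'/''/g=} part is a GNU Parallel replacement string: it takes column 2 and doubles every single quote, which is the standard SQL escaping.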

Hive JSON Serde MetaStore Issue

I have an external table with JSON data and I am using JsonSerde to populate data into the table. The data is populated properly, and when I query it I can see the results correctly.
But when I use the desc command on that table, I get "from deserializer" text for all the column comments.
Below is the table creation ddl.
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
field1 string COMMENT 'This is a field1',
field2 int COMMENT 'This is a field2',
field3 string COMMENT 'This is a field3',
field4 double COMMENT 'This is a field4'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
Location '/user/uszszb6/json_test/data';
Entries in the data file.
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
When I use the command desc my_table, I get the output below.
+-----------+------------+--------------------+--+
| col_name | data_type | comment |
+-----------+------------+--------------------+--+
| field1 | string | from deserializer |
| field2 | int | from deserializer |
| field3 | string | from deserializer |
| field4 | double | from deserializer |
+-----------+------------+--------------------+--+
JsonSerde is not able to capture the comments properly. I have also tried other JSON SerDes such as:
org.openx.data.jsonserde.JsonSerDe
org.apache.hive.hcatalog.data.JsonSerDe
com.amazon.elasticmapreduce.JsonSerde
But the desc command output is the same. There is a JIRA ticket for this bug: https://issues.apache.org/jira/browse/HIVE-6681
According to the ticket it was resolved in version 0.13; I am using Hive 1.2.1 but I am still facing this issue.
Could anyone share their thoughts on resolving this issue?
Yeah, it looks like it's a Hive bug that affects all the JSON SerDes, but have you tried using DESCRIBE EXTENDED?
DESCRIBE EXTENDED my_table;
hive> describe extended json_serde_test;
OK
browser string from deserializer
device_uuid string from deserializer
custom struct<customer_id:string> from deserializer
Detailed Table Information
Table(tableName:json_serde_test,dbName:default, owner:rcongiu,
createTime:1448477902, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string,
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my
name is elder price), FieldSchema(name:custom,
type:struct<customer_id:string>, comment:null)],
location:hdfs://localhost:9000/user/hive/warehouse/json_serde_test,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.openx.data.jsonserde.JsonSerDe, parameters:
{serialization.format=1, mapping.customer_id=Customer ID}),
bucketCols:[], sortCols:[], parameters:{},
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
skewedColValueLocationMaps:{}), storedAsSubDirectories:false),
partitionKeys:[], parameters:{numFiles=1,
transient_lastDdlTime=1448477903, COLUMN_STATS_ACCURATE=true,
totalSize=128, numRows=0, rawDataSize=0}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.073 seconds, Fetched: 5 row(s)
It will output a JSON-ish detailed description that includes the comments. It's kind of hard to read, but it does show the comments (in the FieldSchema entries above) and may be enough for your purposes... or not.