Output specific key value in object for each element in array with jq for JSON - iteration

I have an array:
[
{
"AssetId": 14462955,
"Name": "Cultural Item"
},
{
"AssetId": 114385498,
"Name": "Redspybot"
},
{
"AssetId": 29715011,
"Name": "American Cowboy"
},
{
"AssetId": 98253651,
"Name": "Mahem"
}
]
I would like to loop through each object in this array, and pick out the value of each key called AssetId and output it.
How would I do this using jq for the command line?

The command-line tool jq writes to STDOUT and/or STDERR. If you want to write the .AssetId information to STDOUT, then one possibility would be as follows:
jq -r ".[] | .AssetId" input.json
Output:
14462955
114385498
29715011
98253651
A more robust incantation would be: .[] | .AssetId? but your choice will depend on what you want if there is no key named "AssetId".
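To see the difference concretely, here is a small sketch with a made-up file mixed.json in which one element is not an object at all; .AssetId aborts with an error on that element, while .AssetId? skips it (a missing key yields null in both cases):
$ cat mixed.json
[{"AssetId": 1}, "not an object", {"Name": "no AssetId"}]
$ jq ".[] | .AssetId" mixed.json     # prints 1, then fails with an error like: Cannot index string with "AssetId"
$ jq ".[] | .AssetId?" mixed.json
1
null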

You can also do it via this command.
jq ".[].AssetId" input.json
If your array is nested inside an object, as in my case:
{
"resultCode":0,
"resultMsg":"SUCCESS",
"uniqueRefNo":"111222333",
"list":[
{
"cardType":"CREDIT CARD",
"isBusinessCard":"N",
"memberName":"Bank A",
"memberNo":10,
"prefixNo":404591
},
{
"cardType":"DEBIT CARD",
"isBusinessCard":"N",
"memberName":"Bank A",
"memberNo":10,
"prefixNo":407814
},
{
"cardType":"CREDIT CARD",
"isBusinessCard":"N",
"memberName":"Bank A",
"memberNo":10,
"prefixNo":413226
}
]
}
you can get the prefixNo values with the jq command below:
jq ".list[].prefixNo" input.json
For more specific cases of iterating over arrays with jq, you can check this blog post.

You have a couple of choices for the loop itself. You can take peak's awesome answer and wrap a shell loop around it; replace echo with the script you want to run.
via xargs
$ jq -r ".[] | .AssetId" input.json | xargs -n1 echo # this would print
14462955
114385498
29715011
98253651
via raw loop
$ for i in $(jq -r ".[] | .AssetId" input.json)
do
echo "$i"
done
14462955
114385498
29715011
98253651

An alternative using map:
jq "map ( .AssetId ) | .[]"

For your case, jq -r '.[].AssetId' should work.
You can also use the online jq playground: https://jqplay.org/
If you want to loop through each value, you can use the following:
for i in $(echo "$api_response" | jq -r ".[].AssetId")
do
echo "$i"
done
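If the values you extract could ever contain whitespace (not an issue for numeric AssetId values, but common for fields like Name), a while read loop is a safer sketch than relying on word splitting:
jq -r ".[] | .AssetId" input.json | while read -r id
do
echo "$id"
done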

Related

Fish Shell | Command Substitution using curl and JSON for variable assignment

I cannot find any documentation for fish shell on using command substitution more than once.
I'm trying to assign the state and city from the JSON result (parsed with jq) piped from a curl query of the LocationIQ API, i.e. two command substitutions: 1) curl and 2) jq. I don't need the location variable assignment if I can get the address variable assignment.
Purpose of Function:
#Take 2 arguments (Latitude, Longitude) and return 2 variables $State, $City
The JSON:
{
"address": {
"city": "Aurora",
"country": "United States of America",
"country_code": "us",
"county": "Kane County",
"postcode": "60504",
"road": "Ridge Road",
"state": "Illinois"
},
"boundingbox": [
"41.729347",
"41.730247",
"-88.264466",
"-88.261979"
],
"display_name": "Ridge Road, Aurora, Kane County, Illinois, 60504, USA",
"importance": 0.2,
"lat": "41.729476",
"licence": "https://locationiq.com/attribution",
"lon": "-88.263423",
"place_id": "333878957973"
}
My Function:
function getLocation
set key 'hidden'
set exifLat $argv[1]
set exifLon $argv[2]
set location (curl -s "https://us1.locationiq.com/v1/reverse.php?key=$key&lat=$exifLat&lon=$exifLon&format=json" | set address (jq --raw-output '.address.state,.address.city') )
echo "Location: $location"
echo "state: $address[1]"
echo "city: $address[2]"
end
Error: fish Command substitution not allowed
It works fine when using only the curl command substitution, i.e. removing the set address and the parentheses around jq:
set location (curl -s "https://us1.locationiq.com/v1/reverse.php?key=$key&lat=$exifLat&lon=$exifLon&format=json" | jq --raw-output '.address.state,.address.city')
I'm still pretty novice - maybe there is a better way to achieve my desired result: Assign the JSON State to a variable and City to a variable?
I originally tried slicing ($location[17] for city, $location[19] for state) but got inconsistent results, since the returned fields seem to be dynamic, which changes how many values come back and therefore the ordering.
Any help appreciated!
I find the nested set confusing. Did you intend to use $location to hold the downloaded JSON data, and $address to hold the results of jq? If yes, split them out into separate statements:
set url "https://us1.locationiq.com/v1/reverse.php?key=$key&lat=$exifLat&lon=$exifLon&format=json"
set location (curl -s $url)
set address (echo $location | jq --raw-output '.address.state,.address.city')
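Putting the pieces together, a minimal sketch of the whole function (it assumes the same LocationIQ reverse endpoint, with the ? restored before key, and closes the quoting from the original echo):
function getLocation
set key 'hidden'
set exifLat $argv[1]
set exifLon $argv[2]
set url "https://us1.locationiq.com/v1/reverse.php?key=$key&lat=$exifLat&lon=$exifLon&format=json"
# first substitution: the raw JSON; second substitution: the two jq values
set location (curl -s $url)
set address (echo $location | jq --raw-output '.address.state,.address.city')
echo "state: $address[1]"
echo "city: $address[2]"
end
Since jq prints the state first and the city second, $address[1] holds the state and $address[2] the city.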

pass array as argument to script module in ansible

I am using the script module to run a script on some hosts. I have an Ansible list variable that I want to pass to the script as space-separated arguments. Passing the list variable directly as an argument does not work. Suggestions?
You can join your list with the Jinja join filter and pass it as a variable, like this:
ansible -m script -a "myscript.sh {{ test_list|join(' ') }}" localhost -e '{"test_list": [1,2,3]}'
If myscript.sh is:
#!/bin/bash
echo Args are: ${@}, 1st: $1 2nd: $2, 3d: $3
The output will be:
localhost | SUCCESS => {
"changed": true,
"failed": false,
"rc": 0,
"stderr": "",
"stdout": "Args are: 1 2 3, 1st: 1 2nd: 2, 3d: 3\n",
"stdout_lines": [
"Args are: 1 2 3, 1st: 1 2nd: 2, 3d: 3"
]
}

What is the best way to create a subset of my data in Elasticsearch?

I have an index in elasticsearch containing apache log data. Here is what I want to do:
Identify all visitors (by ip number) that accessed a certain file (e.g. /signup.php).
Do a search/query/aggregation on my data, but limit the documents that are examined to those containing an ip number found in step 1.
In the sql world, I would just create a temporary table and insert all the matching IP numbers from step one. Next I would query my main table and limit the result set by joining in my temporary table on IP number.
I understand joins are not possible in elasticsearch. The elasticsearch documentation suggests a few ways to handle situations like this:
Application side joins
This does not seem practical, because the list of IP numbers may be very large and it seems inefficient to send the results to the client and then pass it back to elasticsearch in one huge terms filter.
Denormalizing the data
This would involve iterating over the matching IP numbers and updating every document in the index for any given IP number with something like "in_group": true, so I can use that in my query later on. This also seems very impractical and inefficient, especially since the source query (step 1) is dynamic.
Nested Object and/or parent-Child relationship
I'm not sure if dynamically creating new documents with nested objects is practical in this case. It seems to me that I would end up copying huge parts of my data.
I'm new to elasticsearch and noSQL in general, so perhaps I'm just looking at the problem the wrong way and I shouldn't be trying to emulate a JOIN in the first place.
But this seems like such a common case for segmenting a dataset, it makes me wonder if I am overlooking some other obvious way of doing this?
Any help would be appreciated!
If I understood your question correctly, you are trying to get a subset of your documents based on a certain condition and then query/search/aggregate that subset further.
If so, why would you want to store it in another view (as you would in SQL)? One of the main strengths of Elasticsearch is its filter caching, which greatly reduces query time. Using this feature, every query/search/aggregation you need to perform would include a term filter specifying the condition from your step 1, and whatever other operations you want can be done in the same query on the already reduced dataset.
If you have other, different use cases, you might consider changing the document mapping for easier and faster retrieval.
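As a sketch of that idea (the host and index name are placeholders, and the field names iis.access.url and iis.access.remote_ip are borrowed from the workaround script below, not from the question), the step 1 condition simply becomes a filter in the same request that runs the aggregation:
curl -X POST "http://localhost:9200/apache-logs/_search" -H 'Content-Type: application/json' -d '
{
"size": 0,
"query": {
"bool": {
"filter": { "term": { "iis.access.url": "/signup.php" } }
}
},
"aggs": {
"visitors": {
"terms": { "field": "iis.access.remote_ip", "size": 1000 }
}
}
}'
The terms aggregation lists the visitor IPs matching the step 1 condition; any further aggregations added to the same request operate on the same filtered set.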
This is a current workaround that I use:
Run this bash script to save the ip list from the first query to a temp index, then use a terms-lookup filter (in Kibana) to query with the ip list from step 1.
#!/usr/bin/env bash
es_host='https://************'
elk_user='************'
cred=($(pass ELK/************ | tr "\n" " ")) ##password
index_name='iis-************'
index_hostname='"************"'
temp_index_path='temp1/_doc/1'
results_limit=1000
timestamp_gte='"2018-03-20T13:00:00"' #UTC
timestamp_lte='"now"' #UTC
resp_data="$(curl -X POST $es_host/$index_name/_search -u $elk_user:${cred[0]} -H 'Content-Type: application/json; charset=utf-8' -d @- << EOF
{
"query": {
"bool": {
"must": [{
"match": {
"index_hostname": {
"query": $index_hostname
}
}
},
{
"regexp": {
"iis.access.url":{
"value": ".*((jpg)|(jpeg)|(png))"
}
}
}],
"must_not": {
"match": {
"iis.access.agent": {
"query": "Amazon+CloudFront"
}
}
},
"filter": {
"range": {
"#timestamp": {
"gte": $timestamp_gte,
"lte": $timestamp_lte
}
}
}
}
},
"aggs" : {
"whatever" : {
"terms" : { "field" : "iis.access.remote_ip", "size":$results_limit }
}
},
"size" : 0
}
EOF
)"
ip_list="$(echo "$resp_data" | jq '.aggregations.whatever.buckets[].key' | tr "\n" ",\ " | head -c -1)"
resp_data2="$(curl -X PUT $es_host/$temp_index_path -u $elk_user:${cred[0]} -H 'Content-Type: application/json; charset=utf-8' -d @- << EOF
{
"ips" : [$ip_list]
}
EOF
)"
echo "$resp_data2"
Query DSL - "terms-query" filter:
{
"query": {
"terms": {
"iis.access.remote_ip": {
"id": "1",
"index": "temp1",
"path": "ips",
"type": "_doc"
}
}
}
}
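To run the same terms lookup outside Kibana, a curl sketch reusing the variables defined at the top of the script could look like this:
curl -X POST $es_host/$index_name/_search -u $elk_user:${cred[0]} -H 'Content-Type: application/json' -d '
{
"query": {
"terms": {
"iis.access.remote_ip": {
"index": "temp1",
"type": "_doc",
"id": "1",
"path": "ips"
}
}
}
}'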

Using the Instagram API to get ALL followers

I'm using the Instagram API to get the number of people who follow a given account as follows.
$follow_info = file_get_contents('https://api.instagram.com/v1/users/477644454/followed-by?access_token=ACCESS_TOKEN&count=-1');
$follow_info = @json_decode($follow_info, true);
This returns a set of 50 results. They do have a next_url key in the array, but it becomes time consuming to keep on going to the next page of followers when dealing with tens of thousands.
I read on StackOverflow that setting the count parameter to -1 would return the entire set. But, it doesn't seem to...
Instagram limits the number of results returned in their API for all sorts of endpoints, and they change these limits arbitrarily, without warning, presumably to handle server load.
Several similar threads exist:
Instagram API not fufilling count parameter
Displaying more than 20 photos in instagram API
Instagram API: How to get all user media? (see comments on answer too, -1 returns 1 less result).
350 Request Limit for Instagram API
Instagram API: How to get all user media?
In short, you won't be able to increase the maximum returned rows, and you'll be stuck paginating.
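Since the legacy API pages its results, a pagination loop is the practical option. A minimal shell sketch (the endpoint and the pagination.next_url field follow the question's legacy v1 API, which is now deprecated; ACCESS_TOKEN and the followers.txt output file are placeholders):
url="https://api.instagram.com/v1/users/477644454/followed-by?access_token=ACCESS_TOKEN"
while [ -n "$url" ] && [ "$url" != "null" ]
do
page="$(curl -s "$url")"
# collect the follower usernames from this page
echo "$page" | jq -r '.data[].username' >> followers.txt
# follow next_url until the API stops returning one
url="$(echo "$page" | jq -r '.pagination.next_url')"
done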
$follow_info = file_get_contents('https://api.instagram.com/v1/users/USER_ID?access_token=ACCESS_TOKEN');
$follow_info = json_decode($follow_info);
print_r($follow_info->data);
And the returned response:
{
"meta": {
"code": 200
},
"data": {
"username": "i_errorw",
"bio": "A Casa do Júlio é um espaço para quem gosta da ideia de cuidar da saúde com uma alimentação saudável e saborosa.",
"website": "",
"profile_picture": "",
"full_name": "",
"counts": {
"media": 5,
"followed_by": 10,
"follows": 120000
},
"id": "1066376857"
}
}
If using the API is optional:
Using the mobile version of Twitter, you can extract a full list of followers for a designated target with a very simple bash script.
The sleep time must be chosen carefully to avoid a temporary IP block.
The script can be executed with:
./scriptname.sh targetusername
Content:
#!/bin/bash
counter=1
wget --load-cookies ./twitter.cookies -O - "https://mobile.twitter.com/$1/followers?" > page
until [ $counter = 0 ]; do
cat page | grep -i "@" | grep -vi "fullname" | grep -vi "$1" | awk -F">" '{print $5}' | awk -F"<" '{print $1}' >> userlist
nextpage=$(cat page | grep -i "cursor" | awk -F'"' '{print $4}')
wget --load-cookies twitter.cookies -O - "https://mobile.twitter.com/$nextpage" > page
if [ -z "$nextpage" ]; then
exit 0
fi
sleep 5
done
It creates a file "userlist" containing all usernames that follow the designated target, one per line.
PS: a cookies file containing your credentials is needed so that wget can authenticate the requests.
I personally suggest to use Wizboost for instagram automation. And the reason is that I have used this tool and my experience is amazing. It gave me a lot of followers. Now you don’t need to invest time in competing with other Instagram accounts as Wizboost has got your back for this, in fact for everything. You don’t need to do anything you can just relax and Wizboost will get you followers, likes and comments. And you can also schedule your posts too. So easy to use and still got lots of potential. I just love Wizboost for all the services it has.
$follow_info = file_get_contents('https://api.instagram.com/v1/users/USER_ID?access_token=ACCESS_TOKEN');
$follow_info = json_decode($follow_info);
print_r($follow_info->data);
The returned response:
{
"meta": {
"code": 200
},
"data": {
"username": "casadojulio",
"bio": "A Casa do Júlio é um espaço para quem gosta da ideia de cuidar da saúde com uma alimentação saudável e saborosa.",
"website": "",
"profile_picture": "",
"full_name": "",
"counts": {
"media": 5,
"followed_by": 25,
"follows": 12
},
"id": "1066376857"
}
}

In Elasticsearch, Why do I lose the whole word token when I run a word through an ngram filter?

It seems that if I am running a word or phrase through an ngram filter, the original word does not get indexed. Instead, I only get chunks of the word up to my max_gram value. I would expect the original word to get indexed as well. I'm using Elasticsearch 0.20.5. If I set up an index using a filter with ngrams like so:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"analysis": {
"filter": {
"my_ngram": {
"max_gram": 10,
"min_gram": 1,
"type": "nGram"
},
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
},
"analyzer": {
"default_index": {
"filter": [
"standard",
"lowercase",
"asciifolding",
"my_ngram",
"my_stemmer"
],
"type": "custom",
"tokenizer": "standard"
},
"default_search": {
"filter": [
"standard",
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
}'
Then I put a long word into a document:
curl -XPUT 'http://localhost:9200/test/item/1' -d '{
"foo" : "REALLY_REALLY_LONG_WORD"
}'
And I query for that long word:
curl -XGET 'http://localhost:9200/test/item/_search' -d '{
"query":
{
"match" : {
"foo" : "REALLY_REALLY_LONG_WORD"
}
}
}'
I get 0 results. I do get a result if I query for a 10 character chunk of that word. When I run this:
curl -XGET 'localhost:9200/test/_analyze?text=REALLY_REALLY_LONG_WORD'
I get tons of grams back, but not the original word. Am I missing a configuration to make this work the way I want?
If you would like to keep the complete word or phrase, use a multi-field mapping for the value, where you keep one version "not analyzed" or analyzed with the keyword tokenizer instead.
Also, when searching a field with nGram-tokenized values, you should probably use the nGram tokenizer for the search as well; the max-gram character limit will then also apply to the search phrase, and you will get the expected results.
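A minimal sketch of such a mapping for the foo field from the question (multi_field is the syntax of the 0.20.x era used above; the raw sub-field name is an illustrative assumption):
curl -XPUT 'http://localhost:9200/test/item/_mapping' -d '{
"item": {
"properties": {
"foo": {
"type": "multi_field",
"fields": {
"foo": { "type": "string", "analyzer": "default_index" },
"raw": { "type": "string", "index": "not_analyzed" }
}
}
}
}
}'
Exact-match queries can then target foo.raw, while the nGram-analyzed foo keeps serving partial matches.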