Elasticsearch NEST Buckets Inside Buckets Aggregation - nest

How can I retrieve the values from a bucket inside another bucket using NEST?
Link to example
This is how I would normally get bucket values:
var colors = response.Aggregations.Terms("colors");
but how can I get the value of make?
var makes = response.Aggregations.Terms("colors.make");

So this seems to work for me, though I am not 100% sure it's the correct way of retrieving it:
var nestedBucket = response.Aggregations.Terms("colors").Buckets
.Select(_ => _.Terms("make").Buckets);
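For context, a minimal sketch of walking both bucket levels (this assumes the "colors" terms aggregation carries a "make" terms sub-aggregation, as in the linked example):
var colors = response.Aggregations.Terms("colors");
foreach (var colorBucket in colors.Buckets)
{
    // each outer bucket exposes its own sub-aggregations
    var makes = colorBucket.Terms("make");
    foreach (var makeBucket in makes.Buckets)
    {
        Console.WriteLine($"{colorBucket.Key} / {makeBucket.Key}: {makeBucket.DocCount}");
    }
}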

Related

Terraform: How Do I Set Up a Resource Based on Configuration

So here is what I want as a module in Pseudo Code:
IF UseCustom, Create AWS Launch Config With One Custom EBS Device and One Generic EBS Device
ELSE Create AWS Launch Config With One Generic EBS Device
I am aware that I can use the 'count' function within a resource to decide whether it is created or not... So I currently have:
resource "aws_launch_configuration" "basic_launch_config" {
  count = var.boolean ? 0 : 1
  # blah
}
resource "aws_launch_configuration" "custom_launch_config" {
  count = var.boolean ? 1 : 0
  # blah
  # blah
}
Which is great: now it creates the right Launch Configuration based on my 'boolean' variable... But in order to then create the Auto Scaling Group using that Launch Configuration, I need the Launch Configuration name. I know what you're thinking: just output it and grab it, you moron! Well of course I'm outputting it:
output "name" {
description = "The Name of the Default Launch Configuration"
value = aws_launch_configuration.basic_launch_config.*.name
}
output "name" {
description = "The Name of the Custom Launch Configuration"
value = aws_launch_configuration.custom_launch_config.*.name
}
But from the higher level where I'm calling the module that creates the Launch Configuration and then the Auto Scaling Group, how the heck do I know which output to use for passing into the ASG?
Is there a different way to grab the value I want that I'm overlooking? I'm new to Terraform, and the lack of real conditionals is really throwing me for a loop.
This seemed to be the cleanest way I could find, using a ternary operator:
output "name {
description = "The Name of the Launch Configuration"
value = "${(var.booleanVar) == 0 ? aws_launch_configuration.default_launch_config.*.name : aws_launch_configuration.custom_launch_config.*.name}
}
Let me know if there is a better way!
You can use the same variable you used to decide which resource to enable to select the appropriate result:
output "name" {
value = var.boolean ? aws_launch_configuration.custom_launch_config[0].name : aws_launch_configuration.basic_launch_config[0].name
}
Another option, which is a little more terse but arguably also a little less clear to a future reader, is to exploit the fact that you will always have one list of zero elements and one list with one element, like this:
output "name" {
value = concat(
aws_launch_configuration.basic_launch_config[*].name,
aws_launch_configuration.custom_launch_config[*].name,
)[0]
}
Concatenating these two lists will always produce a single-item list due to how the count expressions are written, and so we can use [0] to take that single item and return it.
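For completeness, here is a rough sketch of the calling side (the module name "launch_config" and the other arguments are assumptions, not from the original question). Because the module now exposes a single "name" output, the parent configuration no longer needs to care which launch configuration was actually created:
resource "aws_autoscaling_group" "example" {
  # works the same whichever launch configuration the module ended up creating
  launch_configuration = module.launch_config.name
  availability_zones   = ["us-east-1a"] # hypothetical
  min_size             = 1
  max_size             = 2
}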

How to evenly distribute data in apache pig output files?

I've got a pig-latin script that takes in some xml, uses the XPath UDF to pull out some fields and then stores the resulting fields:
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
Note that we're using pig-0.12.0 on our cluster, so I ripped the XPath/XMLLoader classes out of pig-0.14.0 and put them in my own jar so that I could use them in 0.12.
The above script works fine and produces the data that I'm looking for. However, it generates over 1,900 part-files with only a few MB in each file. I learned about the default_parallel option, so I set that to 128 to try and get 128 part-files. I ended up having to add a piece to force a reduce phase to achieve this. My script now looks like:
set default_parallel 128;
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
forced_reduce = FOREACH (GROUP results BY RANDOM()) GENERATE FLATTEN(results);
store forced_reduce into '$output';
Again, this produces the expected data. Also, I now get 128 part-files. My problem now is that the data is not evenly distributed among the part-files. Some have 8 GB, others have 100 MB. I should have expected this when grouping them by RANDOM() :).
My question is what would be the preferred way to limit the number of part-files yet still have them evenly-sized? I'm new to pig/pig latin and assume I'm going about this in the completely wrong way.
p.s. the reason I care about the number of part-files is because I'd like to process the output with spark and our spark cluster seems to do a lot better with a smaller number of files.
I'm still looking for a way to do this directly from the pig script, but for now my "solution" is to repartition the data within the Spark process that works on the output of the pig script. I use the RDD.coalesce function to rebalance the data.
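For reference, a minimal sketch of that rebalancing step on the Spark side (Scala; the paths and the target of 128 partitions here are assumptions, not taken from the original job):
// read the many small part-files produced by the Pig job
val rdd = sc.textFile(pigOutputPath)
// coalesce merges partitions without a full shuffle, collapsing the
// ~1,900 small files into roughly 128 more evenly sized partitions
rdd.coalesce(128).saveAsTextFile(rebalancedPath)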
From the first code snippet, I am assuming it is a map-only job since you are not using any aggregates.
Instead of using reducers, set the property pig.maxCombinedSplitSize:
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
exec;
set pig.maxCombinedSplitSize 1000000000; -- 1 GB (size is given in bytes)
x = load '$output' using PigStorage();
store x into '$output2' using PigStorage();
pig.maxCombinedSplitSize - setting this property makes sure each mapper reads around 1 GB of data, and the code above runs as an identity-mapper job, which lets you write the data out in roughly 1 GB part-file chunks.

D3 Graph Example Using In Memory Object

This seems like it should be simple, but I have spent literally hours without any success.
Take the D3 graph example at http://bl.ocks.org/mbostock/950642. The example uses a local file called graph.json. I have set up a Rails app to serve a similar graph; however, I don't want to write the JSON out to a file. Rather, I generate the nodes and links into an object such as:
{"nodes":[{"node_type":"Person","name":"Damien","id":"damien_person"}, {"node_type":"Person","name":"Grant","id":"grant_person"}}],
"links":[{"source":"damien_person","target":"grant_person","label":"Friends"}}
Now when I render the D3, I need to update the call d3.json("graph.json", function(json) {...}); to reference my in-memory object rather than the local file (or URL). However, everything I've tried breaks my HTML/JavaScript. For example, I tried setting var dataset = <%= raw(@myInMemoryObject) %>;, and that works for assignment (I did an alert on the dataset), however I can't get the D3 code to use it.
How can I replace the d3.json call in order to use my in-memory object?
Thank you,
Damien
Your idea of using, for example, var dataset = <%= raw(@myInMemoryObject) %>; is the right way to go, but you need to prep your object to be in the right format.
The nodes specified in the links need to be either numeric references to nodes in the nodes array (e.g. 0 for the first node, 1 for the second):
var json = {
  "nodes": [{"name":"Damien","id":"a"}, {"name":"Bob","id":"b"}],
  "links": [{"source":0, "target":1, "value":1}]
};
or references to the actual objects that make up the nodes themselves:
var a = {"name":"Damien","id":"a"};
var b = {"name":"Bob","id":"b"}
var json ={
"nodes":[a,b],
"links":[{"source":a,"target":b,"value":1}]
};
Relevant discussion is here: https://groups.google.com/forum/?fromgroups=#!topic/d3-js/LWuhBeEipz4
Example here: http://jsfiddle.net/5A9eV/1/
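Once the object is in one of those shapes, the d3.json call can simply be dropped. A rough sketch against the force-layout example linked in the question (the instance variable name is taken from the question; the rest is an assumption about that example's structure):
var dataset = <%= raw(@myInMemoryObject) %>;
// everything that used to live inside the d3.json callback now runs directly
force
    .nodes(dataset.nodes)
    .links(dataset.links)
    .start();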

JMeter regex extractor forEach controller

I'm creating some tests with JMeter, the situation is very simple, I have a search page with a list of results, and I have to retrieve from this results some values to use in the next request.
There are around 350 results; I don't need them all.
I used the Regex Extractor to retrieve those results and it works (I set it to retrieve just 10 results), but now I don't know how to access the results inside a LoopCounter.
The extractor put the results into a variable named Result.
The problem is that I don't know how to build the name of a variable dynamically.
Do I have to use some function like _p()?
I can access the variable just by putting the static name Result_0_g1.
Inside the LoopCounter I also put a Counter to store the loop count into the variable index.
Thank you
EDIT:
SOLVED. I have to write:
${__V(Result_${index}_g1)}
You have to reference the variable with the function:
${__V(Result_${index}_g1)}
(Adding this just for the record.)
See also this post for another implementation (case without using ForEach Controller):
ThreadGroup
HttpSampler
Regex Extractor (variableName = links)
WhileController(${__javaScript(${C} < ${links_matchNr})})
HTTPSampler use ${__V(links_${C})} to access the current result
Counter (start=1, increment=1, maximum=${links_matchNr}, referenceName=C)
Use the ForEach Controller - it's specifically designed for this purpose and does exactly what you want.
You may use ForEach Controller:
ThreadGroup
YourSampler
Regular Expression Extractor (Match No. -1, any template)
ForEach Controller
Counter (Maximum -> ${Result_matchNr}, Ref Name -> index)
LinkSamplerUsingParsedData (use -> ${__V(Result_${index}_g1)})
Now, if you want to iterate over all groups, you need another ForEach Controller to do that. Since you know which group represents which value, you can use it this way.

dojo query for multiple values with setQuery

I'm having trouble getting the syntax right for a setQuery method call for multiple values, i.e.
setQuery({x : 1}) or setQuery({x : 2})
combined. Or do I need to use filter?
If you are using the Dojo Store API, I think one way to query using a function is described here.
You can modify it like this:
store.query(function(item){
  return item.x == 1 || item.x == 2;
});
That will depend on the store you are using.
To do that more easily, you should use dojox.data.AndOrReadStore (see the Dojo Toolkit reference for AndOrReadStore).
Using that store, you can call setQuery like this:
yourgrid.setQuery({complexQuery:"x:1 OR x:2"});