I have a PCollection and I want to transform it to get the values of a specific column from a BigQuery table, so I used BigQueryIO.readTableRows to read the values from BigQuery.
Here is my code:
PCollection<TableRow> getConfigTable = pipeline.apply("read from Table",
BigQueryIO.readTableRows().from("TableName"));
RetrieveDestTableName retrieveDestTableName = new RetrieveDestTableName();
PCollection<String> getDestTableName = getConfigTable.apply(ParDo.of(new DoFn<TableRow, String>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element().get("ColumnName").toString());
    }
}));
As per the above code, I will get an output from getDestTableName of type PCollection<String>, but I want this output in a String variable.
Is there any way to convert a PCollection<String> to a String variable so that I can use it in my code?
Converting a PCollection<String> to a String is not possible in the Apache Beam programming model. A PCollection simply describes the state of the pipeline at any given point. During development, you do not have literal access to the strings in the PCollection.
You can process the strings in a PCollection through transforms. However, it seems like you need the table configuration to construct the rest of the pipeline. You'll need to know the destination ahead of time or you can use DynamicDestinations to determine which table to write to during pipeline execution. You cannot get the table configuration value from the PCollection and use it to further construct the pipeline.
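For illustration, here is a minimal sketch of the DynamicDestinations approach, assuming rows is a PCollection<TableRow> in which each element carries a hypothetical "dest_table" field naming its target table; the project, dataset, and schema below are placeholders:
rows.apply(BigQueryIO.<TableRow>write()
    .to(new DynamicDestinations<TableRow, String>() {
        @Override
        public String getDestination(ValueInSingleWindow<TableRow> element) {
            // Hypothetical field carrying the destination table name.
            return (String) element.getValue().get("dest_table");
        }
        @Override
        public TableDestination getTable(String destination) {
            return new TableDestination("my-project:my_dataset." + destination, null);
        }
        @Override
        public TableSchema getSchema(String destination) {
            // Placeholder schema; return the real schema for each destination.
            return new TableSchema().setFields(Collections.singletonList(
                new TableFieldSchema().setName("value").setType("STRING")));
        }
    })
    .withFormatFunction(row -> row));
The destination is computed per element at execution time, which is how Beam sidesteps the restriction that the value is not known while the pipeline is being constructed.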
It seems that you want something like JdbcIO.readAll() but for BigQuery, allowing the read configuration(s) to be dynamically computed by the pipeline. This is currently not implemented for BigQuery, but it'd be a reasonable request.
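For reference, the JdbcIO.readAll() pattern looks roughly like the following hypothetical sketch, reusing getDestTableName from the question as the parameter collection; the driver, URL, query, and column names are made up:
PCollection<KV<String, String>> configRows = getDestTableName.apply(
    JdbcIO.<String, KV<String, String>>readAll()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://host/db"))
        .withQuery("SELECT key, value FROM config WHERE table_name = ?")
        // Each pipeline element parameterises one query execution.
        .withParameterSetter((element, statement) -> statement.setString(1, element))
        .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
        .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));
An equivalent for BigQuery would let the read configuration flow through the pipeline in the same way.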
Meanwhile your options are:
Express what you're doing as a more complex BigQuery SQL query, and use a single BigQueryIO.read().fromQuery()
Express the part of your pipeline where you extract the table of interest without the Beam API, instead using the BigQuery API directly, so you are operating on regular Java variables instead of PCollections.
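As a rough sketch of the second option, assuming the google-cloud-bigquery client library and hypothetical table/column names (error handling omitted; query() can throw InterruptedException), you could fetch the configuration into a plain String before constructing the pipeline:
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(
    "SELECT ColumnName FROM `my-project.my_dataset.ConfigTable` LIMIT 1").build();
String destTableName = null;
for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
    destTableName = row.get("ColumnName").getStringValue();
}
// destTableName is now an ordinary String, usable while building the pipeline.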
I am now rewriting my Java application in Kotlin, and I've come across log statements like:
log.info("do the print thing for {}", arg);
So there are two ways to do logging in Kotlin: log.info("do the print thing for {}", arg) and log.info("do the print thing for $arg"). The first delegates formatting to the framework (SLF4J or Log4j); the second uses a Kotlin string template.
So what's the difference, and which one performs better?
In general, these two ways produce the same log, unless the logging library is also configured to localise the message and parameters when formatting the message, which Kotlin's string interpolation does not do at all.
The crucial difference lies in performance when logging is turned off (at that particular level). As SLF4J's FAQ says:
There exists a very convenient alternative based on message formats.
Assuming entry is an object, you can write:
Object entry = new SomeObject();
logger.debug("The entry is {}.", entry);
After evaluating whether to log or not, and only if the decision is affirmative, will the logger implementation format the message and replace the '{}' pair with the string value of entry. In other words, this form does not incur the cost of parameter construction in case the log statement is disabled.
The following two lines will yield the exact same output. However, the second form will outperform the first form by a factor of at least 30, in case of a disabled logging statement.
logger.debug("The new entry is "+entry+".");
logger.debug("The new entry is {}.", entry);
Basically, if the logging is disabled, the message won't be constructed if you use parameterised logging. If you use string interpolation however, the message will always be constructed.
Note that Kotlin's string interpolation compiles to something similar to what a series of string concatenations (+) in Java compiles to (though this might change in the future).
"foo $bar baz"
is translated into:
StringBuilder().append("foo ").append(bar).append(" baz").toString()
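To see the difference concretely, here is a minimal self-contained sketch, assuming SLF4J is on the classpath and the DEBUG level is disabled; the Expensive class is made up for the demonstration:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingCostDemo {
    private static final Logger logger = LoggerFactory.getLogger(LoggingCostDemo.class);

    static class Expensive {
        @Override
        public String toString() {
            return "expensive"; // pretend this is costly to compute
        }
    }

    public static void main(String[] args) {
        Expensive entry = new Expensive();
        // Parameterised: toString() is never called while DEBUG is disabled.
        logger.debug("The entry is {}.", entry);
        // Concatenation (what interpolation compiles to): toString() always runs.
        logger.debug("The entry is " + entry + ".");
    }
}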
See also: Unable to understand why to use parameterized logging
I'm currently running some performance tests and am having issues converting a string I have extracted from JSON into an int.
The problem I'm having is that I need this number as both an int and a string; it's currently only a string, and I don't see how I can create another variable where the number is an int.
I'm using a JSON Extractor to pull the value out.
How can I have another variable which is an int?
By default JMeter stores values into JMeter Variables in String form; if you need to save the value in Integer form as well, you can do it using, for example, the __groovy() function:
${__groovy(vars.putObject('Fixture_ID_INT'\, vars.get('Fixture_ID') as int),)}
and access it where required like:
${__groovy(vars.getObject('Fixture_ID_INT'),)}
More information: Apache Groovy - Why and How You Should Use It
You can use the code below:
int urs = Integer.parseInt(vars.get("urs"));
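Putting both answers together, a JSR223 element (Groovy engine, which accepts Java syntax) could parse the variable once and store it back; the Fixture_ID names follow the example above:
// vars is the JMeterVariables object exposed to JSR223 elements.
int fixtureId = Integer.parseInt(vars.get("Fixture_ID")); // parse the String value
vars.putObject("Fixture_ID_INT", fixtureId);              // store it back as an Integer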
Let's say in my Pig script I just want to generate a summary by calling a UDF just once.
The UDF will take a map, properly format it internally, and return a String.
Is there any way of calling this UDF just once, instead of calling it by
report = FOREACH dummyTuple GENERATE myUDF(myMap);
One way of doing this is to generate a dummy tuple, limit it to 1, and then make the call above.
Is there a way to dynamically compute the input value to a LOAD statement in pig? Conceptually, I want to do something like this:
%declare MYINPUT com.foo.myMethod('2013-04-15');
raw = LOAD '$MYINPUT' ...
myMethod() is a UDF that accepts a date as input and returns a (comma-separated) list of directories as a string. That string is then given as the input to the LOAD statement.
Thanks.
It doesn't sound to me like myMethod() needs to be a UDF. Presuming this list of directories doesn't need to be computed in MapReduce, you could run the function to get the string first, then make it a parameter you pass to Pig. A sample, assuming your driver is in Java, is provided below:
String myInput = myMethod("2013-04-15");
PigServer pig = new PigServer(ExecType.MAPREDUCE);
Map<String, String> myProperties = new HashMap<String, String>();
myProperties.put("myInput", myInput);
// Pass the parameter map so that $myInput is substituted in the script.
pig.registerScript("myScriptLocation.pig", myProperties);
and then your script would start with
raw = LOAD '$myInput' USING...
This assumes your myInput string is in a glob format PigStorage can read, or that you have in mind a different LoadFunc that can handle your comma-separated string.
I had a similar issue and opted for a Java LoadFunc implementation instead of a pre-processor. Using a custom LoadFunc means the script can still be run by analysts using the stock pig executable, and doesn't require another dependency.
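As a rough sketch of that LoadFunc route, you could wrap PigStorage and override relativeToAbsolutePath, the hook Pig calls to rewrite the location before resolving input paths; DateRangeLoader and the com.foo.MyPaths helper are hypothetical stand-ins for the asker's own lookup logic:
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.pig.builtin.PigStorage;

// Accepts a date as the LOAD location and expands it into the
// comma-separated list of directories before Pig resolves paths.
public class DateRangeLoader extends PigStorage {
    @Override
    public String relativeToAbsolutePath(String location, Path curDir) throws IOException {
        return com.foo.MyPaths.myMethod(location); // e.g. '2013-04-15' -> 'dir1,dir2'
    }
}
The script then stays self-contained: raw = LOAD '2013-04-15' USING DateRangeLoader();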
Consider the following code:
private void DoThis() {
int i = 5;
var repo = new ReportsRepository<RptCriteriaHint>();
// This does NOT work
var query1 = repo.Find(x => x.CriteriaTypeID == i).ToList<RptCriteriaHint>();
// This DOES work
var query2 = repo.Find(x => x.CriteriaTypeID == 5).ToList<RptCriteriaHint>();
}
So when I hardwire an actual number into the lambda function, it works fine. When I use a captured variable in the expression, it comes back with the following error:
No mapping exists from object type ReportBuilder.Reporter+<>c__DisplayClass0 to a known managed provider native type.
Why? How can I fix it?
Technically, the correct way to fix this is for the framework that is accepting the expression tree from your lambda to evaluate the i reference; in other words, it's a limitation of the specific LINQ provider. What it is currently trying to do is interpret i as a member access on some type known to it (the provider) from the database. Because of the way lambda variable capture works, the i local variable is actually a field on a hidden compiler-generated class, the one with the funny name, which the provider doesn't recognize.
So, it's a framework problem.
If you really must get by, you could construct the expression manually, like this:
ParameterExpression x = Expression.Parameter(typeof(RptCriteriaHint), "x");
var query = repo.Find(
Expression.Lambda<Func<RptCriteriaHint,bool>>(
Expression.Equal(
Expression.MakeMemberAccess(
x,
typeof(RptCriteriaHint).GetProperty("CriteriaTypeID")),
Expression.Constant(i)),
x)).ToList();
... but that's just masochism.
Your comment on this entry prompts me to explain further.
Lambdas are convertible into one of two types: a delegate with the correct signature, or an Expression<TDelegate> of the correct signature. LINQ to external databases (as opposed to any kind of in-memory query) works using the second kind of conversion.
The compiler converts lambda expressions into expression trees roughly as follows:
1. The syntax tree is parsed by the compiler - this happens for all code.
2. The syntax tree is rewritten after taking into account variable capture. Capturing variables is just like in a normal delegate or lambda - so display classes get created, and captured locals get moved into them (this is the same behaviour as variable capture in C# 2.0 anonymous delegates).
3. The new syntax tree is converted into a series of calls to the Expression class so that, at runtime, an object tree is created that faithfully represents the parsed text.
LINQ to external data sources is supposed to take this expression tree and interpret it for its semantic content, and interpret symbolic expressions inside the tree as either referring to things specific to its context (e.g. columns in the DB), or immediate values to convert. Usually, System.Reflection is used to look for framework-specific attributes to guide this conversion.
However, it looks like SubSonic is not properly treating symbolic references that it cannot find domain-specific correspondences for; rather than evaluating the symbolic references, it's just punting. Thus, it's a SubSonic problem.