Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy/paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is declared as a final class? Is there a smarter way?
Also, if I would like to ignore numbers, how can I achieve that?
Thanks
If you want to use this combination for English text analysis, you should use Lucene's EnglishAnalyzer. Otherwise, you can create a new Analyzer that extends AnalyzerWrapper, as shown below.
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.TypeTokenFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
public class PorterAnalyzer extends AnalyzerWrapper {
private Analyzer baseAnalyzer;
public PorterAnalyzer(Analyzer baseAnalyzer) {
this.baseAnalyzer = baseAnalyzer;
}
@Override
public void close() {
baseAnalyzer.close();
super.close();
}
@Override
protected Analyzer getWrappedAnalyzer(String fieldName)
{
return baseAnalyzer;
}
@Override
protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components)
{
TokenStream ts = components.getTokenStream();
Set<String> filteredTypes = new HashSet<>();
filteredTypes.add("<NUM>");
TypeTokenFilter numberFilter = new TypeTokenFilter(Version.LUCENE_46,ts, filteredTypes);
PorterStemFilter porterStem = new PorterStemFilter(numberFilter);
return new TokenStreamComponents(components.getTokenizer(), porterStem);
}
public static void main(String[] args) throws IOException
{
//Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
PorterAnalyzer analyzer = new PorterAnalyzer(new StandardAnalyzer(Version.LUCENE_46));
String text = "This is a testing example. It should tests the Porter stemmer version 111";
TokenStream ts = analyzer.tokenStream("fieldName", new StringReader(text));
ts.reset();
while (ts.incrementToken()){
CharTermAttribute ca = ts.getAttribute(CharTermAttribute.class);
System.out.println(ca.toString());
}
analyzer.close();
}
}
The code above is based on this Lucene forum thread. The main work is done by the wrapComponents method: you first get the TokenStream object from the wrapped analyzer, then apply a type filter to drop numeric tokens (the <NUM> type), and lastly apply the Porter stemmer filter. I hope it is clear.
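If English analysis is all you need, here is a minimal sketch of the EnglishAnalyzer route (it already chains stop-word filtering and Porter-style stemming behind the standard tokenizer); the field name and sample text are just placeholders:
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
public class EnglishAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // EnglishAnalyzer = StandardTokenizer + stop words + Porter-style stemming out of the box.
        EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_46);
        TokenStream ts = analyzer.tokenStream("fieldName", new StringReader("This is a testing example"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());
        }
        ts.end();
        ts.close();
        analyzer.close();
    }
}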
Related
I'm new to Dataflow and I'm trying to get the schema of a BigQuery table dynamically.
I also need to get the name of the destination table dynamically, for which I'm using the DynamicDestinations class in BigQueryIO.write().to(). It works if the schema is provided for the destination table before executing the pipeline. But to get the schema dynamically I'm using BigQuery Snippets, which take a datasetId and tableId as input and return the schema for the given table. The pipeline gives the errors below when run with the Snippets.
Any help is appreciated.
Thanks in advance.
Exception in thread "main" java.lang.NoSuchMethodError: com.google.api.client.googleapis.services.json.AbstractGoogleJsonClient$Builder.setBatchPath(Ljava/lang/String;)Lcom/google/api/client/googleapis/services/AbstractGoogleClient$Builder;
at com.google.api.services.bigquery.Bigquery$Builder.setBatchPath(Bigquery.java:3519)
at com.google.api.services.bigquery.Bigquery$Builder.<init>(Bigquery.java:3498)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.newBigQueryClient(BigQueryServicesImpl.java:881)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.access$200(BigQueryServicesImpl.java:79)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init>(BigQueryServicesImpl.java:388)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init>(BigQueryServicesImpl.java:345)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.getDatasetService(BigQueryServicesImpl.java:105)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:676)
at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:640)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:656)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:575)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:310)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at project2.configTable.main(configTable.java:146)
Code:
package project2;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import com.google.api.services.bigquery.model.Table;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FieldList;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetInfo;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import java.util.HashMap;
import java.util.Map;
import avro.shaded.com.google.common.collect.ImmutableList;
public class configTable {
public static void main(String[] args) {
// TODO Auto-generated method stub
customInt op=PipelineOptionsFactory.as(customInt.class);
op.setProject("my-new-project");
op.setTempLocation("gs://train-10/projects");
op.setWorkerMachineType("n1-standard-1");
op.setTemplateLocation("gs://train-10/main-template-with-snippets");
op.setRunner(DataflowRunner.class);
org.apache.beam.sdk.Pipeline p=org.apache.beam.sdk.Pipeline.create(op);
PCollection<TableRow> indata=p.apply("Taking side input",BigQueryIO.readTableRows().from("my-new-project:training.config"));
PCollectionView<String> view=indata.apply("Convert to view",ParDo.of(new DoFn<TableRow, String>() {
@ProcessElement
public void processElement(ProcessContext c) {
TableRow row=c.element();
c.output(row.get("file").toString());
}
})).apply(View.asSingleton());
PCollection<TableRow> mainop = p.apply("Taking input",TextIO.read().from(NestedValueProvider.of(op.getInputFile(), new SerializableFunction<String, String>() {
public String apply(String input) {
// TODO Auto-generated method stub
return "gs://train-10/projects/"+input;
}
} ))).apply("Transform",ParDo.of(new DoFn<String, TableRow>() {
@ProcessElement
public void processElement(ProcessContext c ) {
c.output(new TableRow().set("data", c.element()));
}
}));
mainop.apply("Write data",BigQueryIO.writeTableRows().to(new DynamicDestinations<TableRow, String>() {
@Override
public String getDestination(ValueInSingleWindow<TableRow> element) {
// TODO Auto-generated method stub
String d=sideInput(view);
String tablespec="my-new-project:training."+d;
return tablespec;
}
@Override
public List<PCollectionView<?>> getSideInputs() {
return ImmutableList.of(view);
}
@Override
public TableDestination getTable(String destination) {
// TODO Auto-generated method stub
//String dest=String.format("%s:%s.%s","my-new-project","training", destination);
String dest=destination;
return new TableDestination(dest, dest);
}
@Override
public TableSchema getSchema(String destination) {
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
com.google.cloud.bigquery.Table table=bigquery.getTable("training", destination);
com.google.cloud.bigquery.Schema tbschema=table.getDefinition().getSchema();
FieldList tfld=tbschema.getFields();
List<TableFieldSchema> flds=new ArrayList<>();
for (Field each : tfld) {
flds.add(new TableFieldSchema().setName(each.getName()).setType(each.getType().toString()));
}
return new TableSchema().setFields(flds);
}
}).withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED).withWriteDisposition(WriteDisposition.WRITE_TRUNCATE));
p.run();
}
}
I don't think you can do both WRITE_TRUNCATE
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED).withWriteDisposition(WriteDisposition.WRITE_TRUNCATE))
and get the table's definition
com.google.cloud.bigquery.Table table=bigquery.getTable("training", destination);
com.google.cloud.bigquery.Schema tbschema=table.getDefinition().getSchema();
Because even if the table exists, it may be recreated when paired with BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE, and at that point the getTable call will fail. In other words, WRITE_TRUNCATE is not an atomic operation.
I suggest that you either have the table (with the right schema) created beforehand (CREATE_NEVER), append to the table if it exists (WRITE_EMPTY or WRITE_APPEND), or store the schema outside of the Dataflow pipeline and read it in.
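For the last option, here is a rough sketch of resolving the schema once at pipeline-construction time (outside any DoFn or DynamicDestinations callback), mirroring the getSchema() code above; the dataset and table names are placeholders:
// Look the schema up before the pipeline runs, not inside getSchema().
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
com.google.cloud.bigquery.Schema tbschema =
        bigquery.getTable("training", "my_table").getDefinition().getSchema(); // placeholder names
List<TableFieldSchema> flds = new ArrayList<>();
for (Field each : tbschema.getFields()) {
    flds.add(new TableFieldSchema().setName(each.getName()).setType(each.getType().toString()));
}
TableSchema precomputedSchema = new TableSchema().setFields(flds);
// Then pass the constant schema to the write instead of computing it per destination:
// BigQueryIO.writeTableRows().to("my-new-project:training.my_table").withSchema(precomputedSchema)...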
I am trying to
1. Read JSON events from Cloud Pub/Sub.
2. Load the events from Cloud Pub/Sub into BigQuery every 15 minutes using file loads, to save cost on streaming inserts.
3. The destination differs based on the "user_id" and "campaign_id" fields in the JSON event: "user_id" will be the dataset name and "campaign_id" will be the table name. The partition name comes from the event timestamp.
4. The schema for all tables stays the same.
I am new to Java and Beam. I think my code mostly does what I am trying to do, and I just need a little help here.
However, I am unable to access the "campaign_id" and "user_id" fields in the JSON message, so my events are not routed to the correct table.
package ...;
import com.google.api.services.bigquery.model.TableSchema;
import javafx.scene.control.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method.FILE_LOADS;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition.WRITE_APPEND;
public class ClickLogConsumer {
private static final int BATCH_INTERVAL_SECS = 15 * 60;
private static final String PROJECT = "pure-app";
public static PTransform<PCollection<String>, PCollection<com.google.api.services.bigquery.model.TableRow>> jsonToTableRow() {
return new JsonToTableRow();
}
private static class JsonToTableRow
extends PTransform<PCollection<String>, PCollection<com.google.api.services.bigquery.model.TableRow>> {
@Override
public PCollection<com.google.api.services.bigquery.model.TableRow> expand(PCollection<String> stringPCollection) {
return stringPCollection.apply("JsonToTableRow", MapElements.<String, com.google.api.services.bigquery.model.TableRow>via(
new SimpleFunction<String, com.google.api.services.bigquery.model.TableRow>() {
@Override
public com.google.api.services.bigquery.model.TableRow apply(String json) {
try {
InputStream inputStream = new ByteArrayInputStream(
json.getBytes(StandardCharsets.UTF_8.name()));
//OUTER is used here to prevent EOF exception
return TableRowJsonCoder.of().decode(inputStream, Coder.Context.OUTER);
} catch (IOException e) {
throw new RuntimeException("Unable to parse input", e);
}
}
}));
}
}
public static void main(String[] args) throws Exception {
Pipeline pipeline = Pipeline.create(options);
pipeline
.apply(PubsubIO.readStrings().withTimestampAttribute("timestamp").fromTopic("projects/pureapp-199410/topics/clicks"))
.apply(jsonToTableRow())
.apply("WriteToBQ",
BigQueryIO.writeTableRows()
.withMethod(FILE_LOADS)
.withWriteDisposition(WRITE_APPEND)
.withCreateDisposition(CREATE_IF_NEEDED)
.withTriggeringFrequency(Duration.standardSeconds(BATCH_INTERVAL_SECS))
.withoutValidation()
.to(new DynamicDestinations<TableRow, String>() {
@Override
public String getDestination(ValueInSingleWindow<TableRow> element) {
String tableName = "campaign_id"; // JSON message in Pub/Sub has "campaign_id" field, how do I access it here?
String datasetName = "user_id"; // JSON message in Pub/Sub has "user_id" field, how do I access it here?
Instant eventTimestamp = element.getTimestamp();
String partition = new SimpleDateFormat("yyyyMMdd").format(eventTimestamp);
return String.format("%s:%s.%s$%s", PROJECT, datasetName, tableName, partition);
}
@Override
public TableDestination getTable(String table) {
return new TableDestination(table, null);
}
@Override
public TableSchema getSchema(String destination) {
return getTableSchema();
}
}));
pipeline.run();
}
}
I arrived at the above code based on reading:
1. https://medium.com/myheritage-engineering/kafka-to-bigquery-load-a-guide-for-streaming-billions-of-daily-events-cbbf31f4b737
2. https://shinesolutions.com/2017/12/05/fun-with-serializable-functions-and-dynamic-destinations-in-cloud-dataflow/
3. https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html
4. BigQueryIO - Write performance with streaming and FILE_LOADS
5. Inserting into BigQuery via load jobs (not streaming)
Update
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TimePartitioning;
import com.google.common.collect.ImmutableList;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method.FILE_LOADS;
import static org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition.WRITE_APPEND;
public class ClickLogConsumer {
private static final int BATCH_INTERVAL_SECS = 15 * 60;
private static final String PROJECT = "pure-app";
public static PTransform<PCollection<String>, PCollection<TableRow>> jsonToTableRow() {
return new JsonToTableRow();
}
private static class JsonToTableRow
extends PTransform<PCollection<String>, PCollection<TableRow>> {
@Override
public PCollection<TableRow> expand(PCollection<String> stringPCollection) {
return stringPCollection.apply("JsonToTableRow", MapElements.<String, com.google.api.services.bigquery.model.TableRow>via(
new SimpleFunction<String, TableRow>() {
@Override
public TableRow apply(String json) {
try {
InputStream inputStream = new ByteArrayInputStream(
json.getBytes(StandardCharsets.UTF_8.name()));
//OUTER is used here to prevent EOF exception
return TableRowJsonCoder.of().decode(inputStream, Coder.Context.OUTER);
} catch (IOException e) {
throw new RuntimeException("Unable to parse input", e);
}
}
}));
}
}
public static void main(String[] args) throws Exception {
Pipeline pipeline = Pipeline.create(options);
pipeline
.apply(PubsubIO.readStrings().withTimestampAttribute("timestamp").fromTopic("projects/pureapp-199410/topics/clicks"))
.apply(jsonToTableRow())
.apply(BigQueryIO.write()
.withTriggeringFrequency(Duration.standardSeconds(BATCH_INTERVAL_SECS))
.withMethod(FILE_LOADS)
.withWriteDisposition(WRITE_APPEND)
.withCreateDisposition(CREATE_IF_NEEDED)
.withSchema(new TableSchema().setFields(
ImmutableList.of(
new TableFieldSchema().setName("timestamp").setType("TIMESTAMP"),
new TableFieldSchema().setName("exchange").setType("STRING"))))
.to((row) -> {
String datasetName = row.getValue().get("user_id").toString();
String tableName = row.getValue().get("campaign_id").toString();
return new TableDestination(String.format("%s:%s.%s", PROJECT, datasetName, tableName), "Some destination");
})
.withTimePartitioning(new TimePartitioning().setField("timestamp")));
pipeline.run();
}
}
How about: String tableName = element.getValue().get("campaign_id").toString() and likewise for the dataset.
Besides, for inserting into time-partitioned tables, I strongly recommend using BigQuery's Column-Based Partitioning, instead of using a partition decorator in the table name. Please see "Loading historical data into time-partitioned BigQuery tables" in the javadoc - you'll need a timestamp column. (note that the javadoc has a typo: "time" vs "timestamp")
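Concretely, getDestination in the first version could then look roughly like this (a sketch; TableRow here means com.google.api.services.bigquery.model.TableRow rather than the javafx one imported in the question, and SimpleDateFormat needs a java.util.Date, hence the toDate() call on the Joda Instant):
@Override
public String getDestination(ValueInSingleWindow<TableRow> element) {
    // Pull the routing fields out of the decoded TableRow instead of using literal strings.
    TableRow row = element.getValue();
    String datasetName = row.get("user_id").toString();
    String tableName = row.get("campaign_id").toString();
    Instant eventTimestamp = element.getTimestamp();
    String partition = new SimpleDateFormat("yyyyMMdd").format(eventTimestamp.toDate());
    return String.format("%s:%s.%s$%s", PROJECT, datasetName, tableName, partition);
}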
I'm learning Arquillian right now, and I wonder how to create a page that has a placeholder inside the path. For example:
#Location("/posts/{id}")
public class BlogPostPage {
public String getContent() {
// ...
}
}
or
#Location("/posts/{name}")
#Location("/specific-page?requiredParam={value}")
I have been looking for an answer in the Graphene and Arquillian reference guides without success. I have used a page-object library in another language, and it had built-in support for placeholders.
AFAIK there is nothing like this implemented in Graphene.
To be honest, I'm not sure how this should behave - how would you pass the values...?
Apart from that, I think it could also be limited by Java's annotation capabilities: https://stackoverflow.com/a/10636320/6835063
This is not possible currently in Graphene. I've created ARQGRA-500.
It's possible to extend Graphene to add dynamic parameters now. Here's how. (Arquillian 1.1.10.Final, Graphene 2.1.0.Final.)
Create an interface.
import java.util.Map;
public interface LocationParameterProvider {
Map<String, String> provideLocationParameters();
}
Create a custom LocationDecider to replace the corresponding Graphene one; I replace the HTTP one. This decider will add location parameters to the URI if it sees that the test object implements our interface.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.Map.Entry;
import org.jboss.arquillian.core.api.Instance;
import org.jboss.arquillian.core.api.annotation.Inject;
import org.jboss.arquillian.graphene.location.decider.HTTPLocationDecider;
import org.jboss.arquillian.graphene.spi.location.Scheme;
import org.jboss.arquillian.test.spi.context.TestContext;
public class HTTPParameterizedLocationDecider extends HTTPLocationDecider {
@Inject
private Instance<TestContext> testContext;
@Override
public Scheme canDecide() {
return new Scheme.HTTP();
}
@Override
public String decide(String location) {
String uri = super.decide(location);
// not sure, how reliable this method of getting the current test object is
// if it breaks, there is always a possibility of observing
// org.jboss.arquillian.test.spi.event.suite.TestLifecycleEvent's (or rather its
// descendants) and storing the test object in a ThreadLocal
Object testObject = testContext.get().getActiveId();
if (testObject instanceof LocationParameterProvider) {
Map<String, String> locationParameters =
((LocationParameterProvider) testObject).provideLocationParameters();
StringBuilder uriParams = new StringBuilder(64);
boolean first = true;
for (Entry<String, String> param : locationParameters.entrySet()) {
uriParams.append(first ? '?' : '&');
first = false;
try {
uriParams.append(URLEncoder.encode(param.getKey(), "UTF-8"));
uriParams.append('=');
uriParams.append(URLEncoder.encode(param.getValue(), "UTF-8"));
} catch (UnsupportedEncodingException e) {
throw new RuntimeException(e);
}
}
uri += uriParams.toString();
}
return uri;
}
}
Our LocationDecider must be registered so that it overrides Graphene's one.
import org.jboss.arquillian.core.spi.LoadableExtension;
import org.jboss.arquillian.graphene.location.decider.HTTPLocationDecider;
import org.jboss.arquillian.graphene.spi.location.LocationDecider;
public class MyArquillianExtension implements LoadableExtension {
@Override
public void register(ExtensionBuilder builder) {
builder.override(LocationDecider.class, HTTPLocationDecider.class,
HTTPParameterizedLocationDecider.class);
}
}
MyArquillianExtension should be registered via SPI, so create the necessary file in your test resources; for me the file path is src/test/resources/META-INF/services/org.jboss.arquillian.core.spi.LoadableExtension. The file must contain the fully qualified class name of MyArquillianExtension.
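For example, assuming MyArquillianExtension lives in a hypothetical package com.example.arquillian, that services file would contain the single line:
com.example.arquillian.MyArquillianExtension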
And that's it. Now you can provide location parameters in a test.
import java.util.HashMap;
import java.util.Map;
import org.jboss.arquillian.graphene.page.InitialPage;
import org.jboss.arquillian.graphene.page.Location;
import org.junit.Test;
public class TestyTest implements LocationParameterProvider {
@Override
public Map<String, String> provideLocationParameters() {
Map<String, String> params = new HashMap<>();
params.put("mykey", "myvalue");
return params;
}
@Test
public void test(@InitialPage TestPage page) {
}
#Location("MyTestView.xhtml")
public static class TestPage {
}
}
I've focused on parameters specifically, but hopefully this paves the way for other dynamic path manipulations.
Of course this doesn't fix the Graphene.goTo API, which means that before using goTo you still have to provide parameters via this roundabout provideLocationParameters mechanism, which is awkward. You could build your own alternative API, a goTo that accepts parameters, and modify your LocationDecider to support other parameter providers, as sketched below.
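A very rough sketch of that idea (ParameterizedGoTo and its ThreadLocal are made up here, the custom LocationDecider would have to be changed to read from it, and it assumes Graphene 2's static Graphene.goTo(Class) navigation method):
import java.util.Map;
import org.jboss.arquillian.graphene.Graphene;
public final class ParameterizedGoTo {
    // Hypothetical holder that the custom LocationDecider could consult in decide().
    public static final ThreadLocal<Map<String, String>> PARAMS = new ThreadLocal<>();
    private ParameterizedGoTo() {
    }
    public static <T> T goTo(Class<T> pageClass, Map<String, String> params) {
        PARAMS.set(params);
        try {
            return Graphene.goTo(pageClass);
        } finally {
            PARAMS.remove();
        }
    }
}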
I'm trying to set up JBehave for testing web services.
The template story runs well, but in the JUnit panel I can only see the execution result for the acceptance suite class. What I want is to see the execution result for each story in the suite and for each step in a story, like it is shown for plain JUnit tests or in the Thucydides framework.
Here is my acceptance suite class; maybe I haven't configured something, or I have to annotate my step methods in some other way, but I haven't found an answer yet.
package ***.qa_webservices_testing.jbehave;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.jbehave.core.Embeddable;
import org.jbehave.core.configuration.Configuration;
import org.jbehave.core.configuration.MostUsefulConfiguration;
import org.jbehave.core.io.CodeLocations;
import org.jbehave.core.io.LoadFromClasspath;
import org.jbehave.core.io.StoryFinder;
import org.jbehave.core.junit.JUnitStories;
import org.jbehave.core.parsers.RegexPrefixCapturingPatternParser;
import org.jbehave.core.reporters.CrossReference;
import org.jbehave.core.reporters.Format;
import org.jbehave.core.reporters.StoryReporterBuilder;
import org.jbehave.core.steps.InjectableStepsFactory;
import org.jbehave.core.steps.InstanceStepsFactory;
import org.jbehave.core.steps.ParameterConverters;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ***.qa_webservices_testing.jbehave.steps.actions.TestAction;
/**
* suite class.
*/
public class AcceptanceTestSuite extends JUnitStories {
private static final String CTC_STORIES_PATTERN = "ctc.stories";
private static final String STORY_BASE = "src/test/resources";
private static final String DEFAULT_STORY_NAME = "stories/**/*.story";
private static final Logger LOGGER = LoggerFactory.getLogger(AcceptanceTestSuite.class);
private final CrossReference xref = new CrossReference();
public AcceptanceTestSuite() {
configuredEmbedder()
.embedderControls()
.doGenerateViewAfterStories(true)
.doIgnoreFailureInStories(false)
.doIgnoreFailureInView(true)
.doVerboseFailures(true)
.useThreads(2)
.useStoryTimeoutInSecs(60);
}
@Override
public Configuration configuration() {
Class<? extends Embeddable> embeddableClass = this.getClass();
Properties viewResources = new Properties();
viewResources.put("decorateNonHtml", "true");
viewResources.put("reports", "ftl/jbehave-reports-with-totals.ftl");
// Start from default ParameterConverters instance
ParameterConverters parameterConverters = new ParameterConverters();
return new MostUsefulConfiguration()
.useStoryLoader(new LoadFromClasspath(embeddableClass))
.useStoryReporterBuilder(new StoryReporterBuilder()
.withCodeLocation(CodeLocations.codeLocationFromClass(embeddableClass))
.withDefaultFormats()
.withViewResources(viewResources)
.withFormats(Format.CONSOLE, Format.TXT, Format.HTML_TEMPLATE, Format.XML_TEMPLATE)
.withFailureTrace(true)
.withFailureTraceCompression(false)
.withMultiThreading(false)
.withCrossReference(xref))
.useParameterConverters(parameterConverters)
// use '%' instead of '$' to identify parameters
.useStepPatternParser(new RegexPrefixCapturingPatternParser(
"%"))
.useStepMonitor(xref.getStepMonitor());
}
@Override
protected List<String> storyPaths() {
String storiesPattern = System.getProperty(CTC_STORIES_PATTERN);
if (storiesPattern == null) {
storiesPattern = DEFAULT_STORY_NAME;
} else {
storiesPattern = "**/" + storiesPattern;
}
LOGGER.info("will search stories by pattern {}", storiesPattern);
List<String> result = new StoryFinder().findPaths(STORY_BASE, Arrays.asList(storiesPattern), Arrays.asList(""));
for (String item : result) {
LOGGER.info("story to be used: {}", item);
}
return result;
}
@Override
public InjectableStepsFactory stepsFactory() {
return new InstanceStepsFactory(configuration(), new TestAction());
}
}
My test methods look like:
Customer customer = new Customer();
@Given("I have Access to Server")
public void givenIHaveAccesToServer() {
customer.haveAccesToServer();
}
So they are annotated only with JBehave annotations.
The result returned in the JUnit panel is only a single suite-level entry (I don't yet have the rights to post images).
You should try this open source library:
https://github.com/codecentric/jbehave-junit-runner
It does exactly what you ask for :)
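A minimal sketch of wiring it in, assuming the runner class published by that project (JUnitReportingRunner):
import de.codecentric.jbehave.junit.monitoring.JUnitReportingRunner;
import org.jbehave.core.junit.JUnitStories;
import org.junit.runner.RunWith;
// The existing configuration(), storyPaths() and stepsFactory() methods stay as they are.
@RunWith(JUnitReportingRunner.class)
public class AcceptanceTestSuite extends JUnitStories {
    // ...
}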
Yes, the codecentric runner works very nicely.
https://github.com/codecentric/jbehave-junit-runner
I'm learning to work with Lucene. I wrote a simple program to test Lucene analyzers:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.util.Version;
import org.apache.lucene.wordnet.AnalyzerUtils;
import java.io.IOException;
public class AnalyzerDemo {
private static final String[] examples = {
"The quick brown fox jumped over the lazy dog",
"XY&Z Corporation - xyz#example.com"
};
private static final Analyzer[] analyzers = new Analyzer[] {
new WhitespaceAnalyzer(),
new SimpleAnalyzer(),
new StopAnalyzer(Version.LUCENE_30),
new StandardAnalyzer(Version.LUCENE_30)
};
public static void main(String[] args) throws IOException {
String[] strings = examples;
if (args.length > 0) {
strings = args;
}
for (String text : strings) {
analyze(text);
}
}
private static void analyze(String text) throws IOException {
System.out.println("Analyzing \"" + text + "\"");
for (Analyzer analyzer : analyzers) {
String name = analyzer.getClass().getSimpleName();
System.out.println(" " + name + ":");
System.out.print(" ");
AnalyzerUtils.displayTokens(analyzer, text);
System.out.println("\n");
}
}
}
but I got the following error:
AnalyzerDemo.java:7: package org.apache.lucene.wordnet does not exist
import org.apache.lucene.wordnet.AnalyzerUtils;
^
AnalyzerDemo.java:35: cannot find symbol
symbol : variable AnalyzerUtils
location: class AnalyzerDemo
AnalyzerUtils.displayTokens(analyzer, text);
^
Note: AnalyzerDemo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2 errors
I think the wordnet library or AnalyzerUtils is not available. How can I install this part of Lucene? Do you have any ideas why it is missing? I've installed Lucene 3.5.0.
The lucene-wordnet contrib module was removed in Lucene 3.4.0. AnalyzerUtils also doesn't exist, so you either have to get Lucene 3.3.0 or write your own version for 3.5.0 based on this one.
As for wordnet, the wordnet contrib was removed from Lucene 3.4.0 and its functionality was merged into the analyzers contrib; see point 4 at: http://apache.spinellicreations.com/lucene/java/3.4.0/changes-3.4.0/Contrib-Changes.html#3.4.0.new_features
The corresponding Javadoc can be found at: https://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/analysis/synonym/SynonymFilter.html
Instead of AnalyzerUtils.displayTokens(analyzer, text); use the following function:
private static void displayTokens(Analyzer analyzer, String text) throws IOException
{
    // Needs java.io.StringReader, org.apache.lucene.analysis.TokenStream and
    // org.apache.lucene.analysis.tokenattributes.CharTermAttribute in the imports.
    TokenStream stream = analyzer.tokenStream(null, new StringReader(text));
    CharTermAttribute cattr = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
        System.out.print(cattr.toString() + " ");
    }
    stream.end();
    stream.close();
}