I'm struggling with the new Elastic version 8.1 because I have no idea how to migrate a simple search to the new API.
The old approach:
...
final BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.matchQuery("name", "Paul"));
final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(boolQuery);
final SearchRequest searchRequest = new SearchRequest("/myIndex");
searchRequest.searchType(SearchType.DEFAULT).source(searchSourceBuilder);
final SearchResponse sr = client.search(searchRequest, RequestOptions.DEFAULT);
...
This was straightforward. I could get the hits etc. from the response and everything worked fine.
But with the new approach I cannot handle the search request as before:
final SearchResponse<TDocument> sr1 = client.search(searchRequest, Class<TDocument> tDocumentClass);
What is the TDocument type needed for, and how do I define my model so that it fits the API?
After digging into some tutorials I finally found a solution for a simple problem. See:
final Query query = new Query.Builder()
        .term(t -> t.field("field_looking_for").value(v -> v.stringValue("value_looking_for")))
        .build();
final Query bool_query = new Query.Builder().bool(t -> t.must(query)).build();
try {
    final SearchResponse<Dto> result = client.search(s -> s.query(bool_query).index(INDEX).size(100), Dto.class);
    if (!result.hits().hits().isEmpty()) {
        for (final Hit<Dto> data_result : result.hits().hits()) {
            Dto dto = data_result.source();
            return dto;
        }
    }
} catch (ElasticsearchException | IOException e) {
    throw new NoDataException("IO Exception", e);
}
I am trying to pass a Java String to the Apache StandardQueryParser to get the QueryNode.
Input - "fq=section:1"
All I need is section:1 as a FILTER clause in the QueryNode. This looks pretty straightforward, but it throws
INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse fq=section:1:
Use a ConstantScoreQuery. It won't affect the score, and it is the same way the fq parameter is implemented in Solr:
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
    // SolrRequestInfo reqInfo = SolrRequestInfo.getRequestInfo();
    if (!(searcher instanceof SolrIndexSearcher)) {
        // delete-by-query won't have SolrIndexSearcher
        return new BoostQuery(new ConstantScoreQuery(q), 0).createWeight(searcher, scoreMode, 1f);
    }
    SolrIndexSearcher solrSearcher = (SolrIndexSearcher) searcher;
    DocSet docs = solrSearcher.getDocSet(q);
    // reqInfo.addCloseHook(docs); // needed for off-heap refcounting
    return new BoostQuery(new SolrConstantScoreQuery(docs.getTopFilter()), 0).createWeight(searcher, scoreMode, 1f);
}
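The same idea in plain Lucene, bypassing the fq= syntax entirely: parse only the user's query text with the StandardQueryParser and attach section:1 as a non-scoring FILTER clause wrapped in a ConstantScoreQuery. This is only a sketch for a reasonably recent Lucene version; the analyzer, default field and user query string are placeholders, and it assumes section is indexed as a string field:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

Query buildQuery(String userInput) throws QueryNodeException {
    // Parse only the query text; "fq=" is Solr request syntax, not parser syntax.
    StandardQueryParser parser = new StandardQueryParser(new StandardAnalyzer());
    Query userQuery = parser.parse(userInput, "defaultField"); // placeholder default field

    // Non-scoring restriction, the same effect as Solr's fq parameter.
    Query filter = new ConstantScoreQuery(new TermQuery(new Term("section", "1")));

    return new BooleanQuery.Builder()
            .add(userQuery, BooleanClause.Occur.MUST)
            .add(filter, BooleanClause.Occur.FILTER)
            .build();
}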
I can create views using the Java API, but the query needs to be legacy SQL:
public void createView(String dataSet, String viewName, String query) throws Exception {
    Table content = new Table();
    TableReference tableReference = new TableReference();
    tableReference.setTableId(viewName);
    tableReference.setDatasetId(dataSet);
    tableReference.setProjectId(projectId);
    content.setTableReference(tableReference);
    ViewDefinition view = new ViewDefinition();
    view.setQuery(query);
    content.setView(view);
    LOG.debug("View to create: " + content);
    try {
        if (tableExists(dataSet, viewName)) {
            bigquery.tables().delete(projectId, dataSet, viewName).execute();
        }
    } catch (Exception e) {
        LOG.error("Could not delete table", e);
    }
    bigquery.tables().insert(projectId, dataSet, content).setProjectId(projectId).execute();
}
Is there a way to create a BQ view with standard sql using the API?
You need to set setUseLegacySQL(false) on the ViewDefinition object.
[..]
ViewDefinition view = new ViewDefinition();
view.setQuery(query);
view.setUseLegacySql(false); //<-- you're missing this
content.setView(view);
[..]
See the API docs here.
With the new API, you can use the "#standardSQL" notation to avoid the default LegacySQL setting (the setUseLegacySql() method no longer exists in the new API).
Here is an example:
ViewDefinition tableDefinition = ViewDefinition.newBuilder("#standardSQL\n WITH A AS (select 1 as foo) SELECT * from A").build();
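To actually create the view with the newer google-cloud-bigquery client, a minimal sketch (assuming a com.google.cloud.bigquery.BigQuery instance called bigquery, which is a different client from the generated API used in the question; dataSet and viewName reuse the names from the question):

// Sketch: create the view via the newer client; names reused from the question.
ViewDefinition tableDefinition =
        ViewDefinition.newBuilder("#standardSQL\n WITH A AS (select 1 as foo) SELECT * from A").build();
TableInfo viewInfo = TableInfo.of(TableId.of(dataSet, viewName), tableDefinition);
bigquery.create(viewInfo);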
I am trying to test the synonym graph, but it doesn't work as I expected and doesn't return the correct answer.
This is the createComponents method in my custom analyzer:
public SuggestAnalizer(SynonymMap synonymMap) {
    this.synonymMap = synonymMap;
    this.stopList = Collections.emptyList();
}

@Override
protected TokenStreamComponents createComponents(String s) {
    Tokenizer tokenizer = new StandardTokenizer();
    TokenStream tokenStream = new SynonymGraphFilter(tokenizer, synonymMap, true);
    tokenStream = new FlattenGraphFilter(tokenStream);
    return new TokenStreamComponents(tokenizer, tokenStream);
}
This is the test code:
String entrada = "ALCALDE KOOPER";
String salida = "FEDERICO COOPER";

SynonymMap.Builder builder = new SynonymMap.Builder(true);
CharsRef input = SynonymMap.Builder.join(entrada.split(" "), new CharsRefBuilder());
CharsRef output = SynonymMap.Builder.join(salida.split(" "), new CharsRefBuilder());
builder.add(output, input, true);

suggestAnalizer = new SuggestAnalizer(builder.build());

TokenStream tokenStream = suggestAnalizer.tokenStream("field", entrada2);
assertTokenStreamContents(tokenStream, new String[]{
        "FEDERICO"
});
assertAnalyzesTo(suggestAnalizer, entrada, new String[]{
        "FEDERICO"
});
I expected the assertion to work, replacing the "ALCALDE KOOPER" string with its synonym "FEDERICO COOPER", but this doesn't happen.
Does anyone know where my error is, or why my code doesn't work?
The reason for this behaviour is that you add a multiword synonym from
FEDERICO COOPER to ALCALDE KOOPER (in the code, I see a link being added from output, which is FEDERICO COOPER, to input, which is ALCALDE KOOPER).
Later you test synonyms for the token FEDERICO, but there is no mapping starting from it, and that's why you get an empty result and an assertion error. So you would need to add synonyms from FEDERICO to ALCALDE for that to match.
But even if you do that, there is a mistake in how the SynonymMap is built: you used the ignoreCase param with the value true, which means:
case-folds input for matching with Character#toLowerCase(int).
Note, if you set this to true, it's your responsibility to lowercase the input entries when you create the SynonymMap
So you either need to use a lowercased version of the entries and test input, or set ignoreCase to false.
You can check reference code here.
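Putting both points together, a minimal sketch of building the map so that it triggers for the text you analyze, here assuming the goal stated in the question (rewrite "ALCALDE KOOPER" into "FEDERICO COOPER"): the input goes first, the output second, and the entries are lowercased because the SynonymGraphFilter is created with ignoreCase=true:

import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.CharsRefBuilder;

// Sketch: input first, output second, entries lowercased because ignoreCase=true.
SynonymMap.Builder builder = new SynonymMap.Builder(true);
CharsRef input = SynonymMap.Builder.join("alcalde kooper".split(" "), new CharsRefBuilder());
CharsRef output = SynonymMap.Builder.join("federico cooper".split(" "), new CharsRefBuilder());
builder.add(input, output, true);
SynonymMap synonymMap = builder.build();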
I use Lucene 4.0 to build my search engine. I need to define a Filter when searching. Filter code like this works fine:
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
        throws IOException {
    String[] target_real_names = {"eMule"};
    OpenBitSet obs = new OpenBitSet(context.reader().maxDoc());
    for (String target_real_name : target_real_names) {
        TermQuery query = new TermQuery(new Term(Fields.PROJECT_REAL_NAME, target_real_name));
        IndexSearcher indexSearcher = new IndexSearcher(context.reader());
        TopDocs docs = indexSearcher.search(query, context.reader().maxDoc());
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        if (scoreDocs.length == 1) {
            obs.set(scoreDocs[0].doc);
        }
    }
    return obs;
}
but code like this fails to work:
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
        throws IOException {
    OpenBitSet obs = new OpenBitSet(context.reader().maxDoc());
    String[] target_real_names = {"eMule"};
    for (String target_real_name : target_real_names) {
        DocsEnum de = context.reader().termDocsEnum(new Term(Fields.PROJECT_REAL_NAME, target_real_name));
        if (de.nextDoc() != -1) {
            obs.set((long) de.docID());
        }
    }
    return obs;
}
In this piece of code, de will be null, and I don't know why. Can anyone help me?
Look at the javadoc of termDocsEnum(): it will return null if either the field or the term does not exist.
This means that it's totally normal for de to be null when your term target_real_name does not exist in the index.
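Guarding against that null return makes the second version behave like the first. A sketch of the loop, reusing the names from the question (context, obs, Fields.PROJECT_REAL_NAME):

// Sketch: same loop, but skip terms that don't exist and walk all matching docs.
for (String target_real_name : target_real_names) {
    DocsEnum de = context.reader().termDocsEnum(new Term(Fields.PROJECT_REAL_NAME, target_real_name));
    if (de == null) {
        continue; // field or term does not exist in this segment
    }
    int doc;
    while ((doc = de.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        obs.set(doc);
    }
}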
I can't get this to work with Lucene 4.0 and its new features... Could somebody please help me?
I have crawled a bunch of HTML documents from the web. Now I would like to count the number of distinct words in every document.
This is how I did it with Lucene 3.5 (for a single document; to get them all I loop over all documents, each time with a new RAMDirectory containing only one doc):
Analyzer analyzer = ...; // some Lucene Analyzer
RAMDirectory index;
index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
String _words = new String();

// get somehow the String containing a certain text:
_words = doc.getPageDescription();

try {
    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, _words);
    w.close();
} catch (IOException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}

try {
    // System.out.print(", count Terms... ");
    IndexReader reader = IndexReader.open(index);
    TermFreqVector[] freqVector = reader.getTermFreqVectors(0);
    if (freqVector == null) {
        System.out.println("Count words: 0");
    }
    for (TermFreqVector vector : freqVector) {
        String[] terms = vector.getTerms();
        int[] freq = vector.getTermFrequencies();
        int n = terms.length;
        System.out.println("Count words: " + n);
        ...
How can I do this with Lucene 4.0?
However, I'd prefer to do this using an FSDirectory instead of a RAMDirectory; I guess this is more performant if I have quite a high number of documents?
Thanks and regards
C.
Use the Fields/Terms APIs.
See especially the example 'access term vector fields for a specific document'.
Seeing as you are looping over all documents, if your end goal is really something like the average number of unique terms across all documents, keep reading to the 'index statistics' section. In that case, for example, you can compute it efficiently with #postings / #documents: getSumDocFreq()/maxDoc()
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/package-summary.html#package_description
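A minimal sketch of the 4.0 equivalent of the 3.5 loop above, using the term vector Fields/Terms APIs (this assumes the field was indexed with term vectors enabled, as in the 3.5 version, and that index is the Directory from the question):

// Sketch: count the distinct terms of document 0 via its term vectors (Lucene 4.0).
IndexReader reader = DirectoryReader.open(index);
Fields termVectors = reader.getTermVectors(0);
if (termVectors == null) {
    System.out.println("Count words: 0"); // document has no term vectors
} else {
    for (String fieldName : termVectors) {
        Terms terms = termVectors.terms(fieldName);
        if (terms != null) {
            System.out.println("Count words in " + fieldName + ": " + terms.size());
        }
    }
}
reader.close();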