Lucene - apply TokenFilter to a String and get result

I want to debug Lucene token filters and see their results. How can I apply a token filter to a token stream in order to see the result?
(using Lucene 4.10.3)
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenFilterExample {

    public static void main(String[] args) throws IOException {
        // 1) Create the token stream
        StringReader r = new StringReader("Hello World");
        StandardTokenizer s = new StandardTokenizer(r);

        // 2) Wrap it in a lower-case token filter
        LowerCaseFilter f = new LowerCaseFilter(s);

        // 3) Print the result
        System.out.println(??????);

        // 4) Close
        f.close();
        s.close();
    }
}

The solution is this:

// Print the result
f.reset();
while (f.incrementToken()) {
    System.out.println(f.getAttribute(CharTermAttribute.class));
}
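
A slightly more idiomatic variant grabs the attribute once before the loop, since the same CharTermAttribute instance is updated in place on each incrementToken(), and calls end() once the stream is exhausted:

CharTermAttribute term = f.addAttribute(CharTermAttribute.class);
f.reset();
while (f.incrementToken()) {
    System.out.println(term.toString()); // prints "hello", then "world"
}
f.end();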

Related

Exception occurs while performing an update operation in BigQuery using Java

I am using the jobs.query API to update BigQuery table records from Java.
While running this as part of longer-duration stability tests, we get an exception. The exception occurs after roughly 5 hours, sometimes 11 hours...
The problem is that the exception has a blank error message, a blank cause, and a blank stack trace, so I am unable to find the actual cause of the error.
Any guidance would be appreciated.
Thanks in advance!
package app;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.text.ParseException;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;

public class WhereClause {

    // Entry point: main function
    public static void main(String[] args) throws IOException, ParseException {
        String projectId = "projectId";
        // Create a new Bigquery client authorized via application credentials.
        Bigquery bigquery = createAuthorizedClient();
        String query = "UPDATE datasetId.tableName SET Name = 'RefUpdate' WHERE ID = 001";
        System.out.println(query);
        executeQuery(query, bigquery, projectId);
    }

    public static Bigquery createAuthorizedClient() throws IOException {
        String jsonFilePath = "file.json";
        // Create the credential
        HttpTransport transport = new NetHttpTransport();
        JsonFactory jsonFactory = new JacksonFactory();
        InputStream credentialStream = new FileInputStream(new File(jsonFilePath));
        GoogleCredential credential = GoogleCredential.fromStream(credentialStream, transport, jsonFactory);
        if (credential.createScopedRequired()) {
            credential = credential.createScoped(BigqueryScopes.all());
        }
        return new Bigquery.Builder(transport, jsonFactory, credential)
                .setApplicationName("BigqueryApplication").build();
    }

    private static void executeQuery(String querySql, Bigquery bigquery, String projectId) throws IOException {
        // Note: DML statements such as UPDATE require standard SQL, so legacy SQL must be disabled.
        QueryResponse query = bigquery.jobs()
                .query(projectId, new QueryRequest().setQuery(querySql).setUseLegacySql(false))
                .execute();
        System.out.println(query.getNumDmlAffectedRows());
        System.err.println(query.getErrors());
    }
}
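
One way to surface more diagnostic detail when a request fails is to catch the client library's HTTP exception type explicitly. A hedged sketch (GoogleJsonResponseException is the subclass the google-api-client throws for HTTP-level failures; it may or may not cover whatever fails after hours of runtime):

import com.google.api.client.googleapis.json.GoogleJsonResponseException;

private static void executeQuery(String querySql, Bigquery bigquery, String projectId) throws IOException {
    try {
        QueryResponse response = bigquery.jobs()
                .query(projectId, new QueryRequest().setQuery(querySql).setUseLegacySql(false))
                .execute();
        System.out.println(response.getNumDmlAffectedRows());
        System.err.println(response.getErrors());
    } catch (GoogleJsonResponseException e) {
        // HTTP failures carry a structured payload with status code and per-error messages
        System.err.println("Status: " + e.getStatusCode());
        System.err.println("Details: " + e.getDetails());
        throw e;
    }
}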

Paginating OpenLDAP with Apache Directory LDAP API over multiple connections

I've used the Apache Directory API to load user data from an Active Directory. It's similar to the example code by 'oers' found in this Stack Overflow question: Apache Directory LDAP - Paged searches
Some differences in my case:
- I'm coding it in Nashorn (JavaScript running in Java)
- I'm using the PagedResultsImpl class instead of PagedResultsDecorator
- I'm saving the cookie as a string between calls by Base64-encoding the byte[] cookie
- I'm using the Apache Directory API with the following Maven dependency:
<dependency>
    <groupId>org.apache.directory.api</groupId>
    <artifactId>api-all</artifactId>
    <version>1.0.3</version>
</dependency>
Here are some of the important bits:

...
var pageCursor = ""; /* or a Base64-encoded byte[] cookie if I'm trying to resume where I left off last time */
...
pagedSearchControl = new PagedResultsImpl();
pagedSearchControl.setSize(pageSize);
pagedSearchControl.setCookie(Base64.getDecoder().decode(pageCursor)); // decode the saved string back into the byte[] cookie
...
var searchRequest = new SearchRequestImpl();
...
searchRequest.addControl(pagedSearchControl);
...
var cursor = new EntryCursorImpl(connection.search(searchRequest));
...
pagingResults = cursor.getSearchResultDone().getControl(PagedResults.OID);
if (pagingResults != null) {
    nextPageCursor = Base64.getEncoder().encodeToString(pagingResults.getCookie());
}
When run against Active Directory, I can page through all the users just fine. When I change my connection to point at an OpenLDAP directory, I am able to search for users just fine, EXCEPT when I set the page control cookie using a non-null value for pageCursor (one obtained by Base64-encoding a previous call's cookie). In short, I can't get paged results from OpenLDAP: I get no results and there are no exceptions.
What is needed in order to get OpenLDAP to page properly?
Am I missing something when setting up page control in Apache Directory?
Is paging a setting in OpenLDAP I need to turn on?
[UPDATE] I ported my code out of Nashorn to plain Java, and now I can see that it is because the paging cookie appears to be valid only within the same connection for OpenLDAP, whereas for Active Directory it can span different connections. You can see for yourself with the code below:
import java.util.Base64;

// imports for LDAP
import org.apache.directory.api.ldap.model.message.SearchRequestImpl;
import org.apache.directory.api.ldap.model.message.SearchScope;
import org.apache.directory.api.ldap.model.message.controls.PagedResults;
import org.apache.directory.api.ldap.model.message.controls.PagedResultsImpl;
import org.apache.directory.api.ldap.model.name.Dn;
import org.apache.directory.ldap.client.api.EntryCursorImpl;
import org.apache.directory.ldap.client.api.LdapNetworkConnection;
public class Executor {

    public static void main(String[] args) {
        pagingTest();
    }

    private static void pagingTest() {
        // ENTER YOUR CREDENTIALS
        String server = "";
        int port = 0;
        String loginId = "";
        String loginPassword = "";
        String usersBaseDN = "";
        String userObjectClass = "";
        String base64cookie = null;

        LdapNetworkConnection connection = new LdapNetworkConnection(server, port, true);
        try {
            connection.setTimeOut(300);
            connection.bind(loginId, loginPassword);

            /* First pass */
            PagedResultsImpl pagedSearchControl = new PagedResultsImpl();
            pagedSearchControl.setSize(5);
            pagedSearchControl.setCookie(Base64.getDecoder().decode(""));

            SearchRequestImpl searchRequest = new SearchRequestImpl();
            searchRequest.setBase(new Dn(usersBaseDN));
            searchRequest.setFilter("(objectClass=" + userObjectClass + ")");
            searchRequest.setScope(SearchScope.SUBTREE);
            searchRequest.addAttributes("*");
            searchRequest.addControl(pagedSearchControl);

            EntryCursorImpl cursor = new EntryCursorImpl(connection.search(searchRequest));
            while (cursor.next()) {
                System.out.println("First Pass User: " + cursor.get().getDn());
            }

            PagedResults pagingResults = (PagedResults) cursor.getSearchResultDone().getControl(PagedResults.OID);
            if (pagingResults != null) {
                byte[] cookie = pagingResults.getCookie();
                if (cookie != null && cookie.length > 0) {
                    base64cookie = Base64.getEncoder().encodeToString(cookie);
                    System.out.println("First Pass Cookie: " + base64cookie);
                }
            }
            cursor.close();
        } catch (Exception e) {
            System.out.println(e);
        } finally {
            // COMMENT OUT THIS BLOCK TO SEE IT WORKING (IT WILL THEN REUSE THE SAME CONNECTION)
            try {
                connection.unBind();
                connection.close();
            } catch (Exception e) {
                System.out.println(e);
            }
            connection = new LdapNetworkConnection(server, port, true);
        }

        try {
            connection.setTimeOut(300);
            connection.bind(loginId, loginPassword);

            /* Second pass */
            PagedResultsImpl secondPagedSearchControl = new PagedResultsImpl();
            secondPagedSearchControl.setSize(5);
            secondPagedSearchControl.setCookie(Base64.getDecoder().decode(base64cookie));

            SearchRequestImpl secondSearchRequest = new SearchRequestImpl();
            secondSearchRequest.setBase(new Dn(usersBaseDN));
            secondSearchRequest.setFilter("(objectClass=" + userObjectClass + ")");
            secondSearchRequest.setScope(SearchScope.SUBTREE);
            secondSearchRequest.addAttributes("*");
            secondSearchRequest.addControl(secondPagedSearchControl);

            EntryCursorImpl secondCursor = new EntryCursorImpl(connection.search(secondSearchRequest));
            while (secondCursor.next()) {
                System.out.println("Second Pass User: " + secondCursor.get().getDn());
            }
            secondCursor.close();
        } catch (Exception e) {
            System.out.println(e);
        } finally {
            try {
                connection.unBind();
                connection.close();
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }
}
So I guess my new question is:
Is anybody aware of a workaround to allow OpenLDAP to accept pagination cookies from previous connections?
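
For what it's worth, the pattern that does work against OpenLDAP, per the test above, is to finish the whole paging loop on a single connection and only persist the cookie within it. A minimal sketch reusing the same classes:

// Page through all entries on ONE connection; OpenLDAP ties the cookie to it
static void pageAll(LdapNetworkConnection connection, String usersBaseDN, String userObjectClass) throws Exception {
    byte[] cookie = new byte[0]; // empty on the first request
    do {
        PagedResultsImpl control = new PagedResultsImpl();
        control.setSize(5);
        control.setCookie(cookie);
        SearchRequestImpl request = new SearchRequestImpl();
        request.setBase(new Dn(usersBaseDN));
        request.setFilter("(objectClass=" + userObjectClass + ")");
        request.setScope(SearchScope.SUBTREE);
        request.addAttributes("*");
        request.addControl(control);
        EntryCursorImpl cursor = new EntryCursorImpl(connection.search(request));
        while (cursor.next()) {
            System.out.println("User: " + cursor.get().getDn());
        }
        PagedResults pagingResults = (PagedResults) cursor.getSearchResultDone().getControl(PagedResults.OID);
        cookie = (pagingResults == null) ? null : pagingResults.getCookie();
        cursor.close();
    } while (cookie != null && cookie.length > 0);
}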

Query expansion in Lucene

I am new to Lucene and I am trying to do query expansion.
I have referred to these two posts (first, second) and managed to adapt the code to version 6.0.0, since the code in them uses a deprecated API.
The issue is that either I'm not getting any results, or I'm not accessing the results (the expanded queries) correctly.
Here is my code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.text.ParseException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.synonym.WordnetSynonymParser;
import org.apache.lucene.analysis.util.CharArraySet;

public class Graph extends Analyzer {

    protected static TokenStreamComponents createComponents(String fieldName, Reader reader) throws ParseException {
        System.out.println("1");
        Tokenizer source = new ClassicTokenizer();
        source.setReader(reader);
        TokenStream filter = new StandardFilter(source);
        filter = new LowerCaseFilter(filter);
        SynonymMap mySynonymMap = null;
        try {
            mySynonymMap = buildSynonym();
        } catch (IOException e) {
            e.printStackTrace();
        }
        filter = new SynonymFilter(filter, mySynonymMap, false);
        return new TokenStreamComponents(source, filter);
    }

    private static SynonymMap buildSynonym() throws IOException, ParseException {
        System.out.print("build");
        File file = new File("wn\\wn_s.pl");
        InputStream stream = new FileInputStream(file);
        Reader rulesReader = new InputStreamReader(stream);
        SynonymMap.Builder parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
        System.out.print(parser.toString());
        ((WordnetSynonymParser) parser).parse(rulesReader);
        SynonymMap synonymMap = parser.build();
        return synonymMap;
    }

    public static void main(String[] args) throws UnsupportedEncodingException, IOException, ParseException {
        Reader reader = new FileReader("C:\\input.txt"); // here I have the queries that I want to expand
        TokenStreamComponents TSC = createComponents("", new StringReader("some text goes here"));
        System.out.print(TSC); // How do I get the result from TSC????
    }

    @Override
    protected TokenStreamComponents createComponents(String string) {
        throw new UnsupportedOperationException("Not supported yet.");
    }
}
Please suggest ways to help me access the expanded queries!
So, are you just trying to figure out how to iterate through the terms from the TokenStreamComponents in your main method?
Something like this:
TokenStreamComponents TSC = createComponents("", new StringReader("some text goes here"));
TokenStream stream = TSC.getTokenStream();
CharTermAttribute termattr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
while (stream.incrementToken()) {
    System.out.println(termattr.toString());
}
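
If you also need to tell injected synonyms apart from the original tokens, the type attribute can be read in the same loop; SynonymFilter marks the tokens it injects with the type "SYNONYM" (this needs an extra import of org.apache.lucene.analysis.tokenattributes.TypeAttribute):

TypeAttribute typeattr = stream.addAttribute(TypeAttribute.class);
stream.reset();
while (stream.incrementToken()) {
    // injected synonyms report type "SYNONYM"; ordinary tokens keep the tokenizer's type
    System.out.println(termattr.toString() + " (" + typeattr.type() + ")");
}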

Apache FOP: the method newInstance(FopFactoryConfig) in the type FopFactory is not applicable for the arguments ()

I am getting the following error:

Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    The method newInstance(FopFactoryConfig) in the type FopFactory is not applicable for the arguments ()
        at fopdemo.fopvass.PDFHandler.createPDFFile(PDFHandler.java:42)
        at fopdemo.fopvass.TestPDF.main(TestPDF.java:40)
This is my code:
package fopdemo.fopvass;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.FOPException;
import org.apache.fop.apps.FOUserAgent;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class PDFHandler {

    public static final String EXTENSION = ".pdf";
    public String PRESCRIPTION_URL = "template.xsl";

    public String createPDFFile(ByteArrayOutputStream xmlSource, String templateFilePath) throws IOException {
        File file = File.createTempFile("" + System.currentTimeMillis(), EXTENSION);
        URL url = new File(templateFilePath + PRESCRIPTION_URL).toURI().toURL();
        // Creation of the transform source
        StreamSource transformSource = new StreamSource(url.openStream());
        // Create an instance of the FOP factory
        FopFactory fopFactory = FopFactory.newInstance();
        // A user agent is needed for the transformation
        FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
        // To store the output
        ByteArrayOutputStream pdfoutStream = new ByteArrayOutputStream();
        StreamSource source = new StreamSource(new ByteArrayInputStream(xmlSource.toByteArray()));
        Transformer xslfoTransformer;
        try {
            TransformerFactory transfact = TransformerFactory.newInstance();
            xslfoTransformer = transfact.newTransformer(transformSource);
            // Construct FOP with the desired output format
            Fop fop;
            try {
                fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, pdfoutStream);
                // Resulting SAX events (the generated FO) must be piped through to FOP
                Result res = new SAXResult(fop.getDefaultHandler());
                // Start XSLT transformation and FOP processing
                try {
                    // Everything happens here
                    xslfoTransformer.transform(source, res);
                    // Save the generated PDF to a file
                    OutputStream out = new java.io.BufferedOutputStream(new FileOutputStream(file));
                    out.write(pdfoutStream.toByteArray());
                    out.close();
                } catch (TransformerException e) {
                    e.printStackTrace();
                }
            } catch (FOPException e) {
                e.printStackTrace();
            }
        } catch (TransformerConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerFactoryConfigurationError e) {
            e.printStackTrace();
        }
        return file.getPath();
    }

    public ByteArrayOutputStream getXMLSource(EmployeeData data) throws Exception {
        JAXBContext context;
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        try {
            context = JAXBContext.newInstance(EmployeeData.class);
            Marshaller m = context.createMarshaller();
            m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            m.marshal(data, System.out);
            m.marshal(data, outStream);
        } catch (JAXBException e) {
            e.printStackTrace();
        }
        return outStream;
    }
}
This is where it is called:
package fopdemo.fopvass;

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

/**
 * @author Debasmita.Sahoo
 */
public class TestPDF {

    public static void main(String args[]) {
        System.out.println("Hi Testing");
        ArrayList<Employee> employeeList = new ArrayList<Employee>();
        String templateFilePath = "C:/Paula/Proyectos/fop/fopvass/resources/";

        Employee e1 = new Employee();
        e1.setName("Debasmita1 Sahoo");
        e1.setEmployeeId("10001");
        e1.setAddress("Pune");
        employeeList.add(e1);

        Employee e2 = new Employee();
        e2.setName("Debasmita2 Sahoo");
        e2.setEmployeeId("10002");
        e2.setAddress("Test");
        employeeList.add(e2);

        Employee e3 = new Employee();
        e3.setName("Debasmita3 Sahoo");
        e3.setEmployeeId("10003");
        e3.setAddress("Mumbai");
        employeeList.add(e3);

        EmployeeData data = new EmployeeData();
        data.setEemployeeList(employeeList);

        PDFHandler handler = new PDFHandler();
        try {
            ByteArrayOutputStream streamSource = handler.getXMLSource(data);
            handler.createPDFFile(streamSource, templateFilePath);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The following line won't compile:
FopFactory fopFactory = FopFactory.newInstance();
The method newInstance() requires a parameter. From the documentation (provided by Mathias Muller), under 'Basic Usage', that parameter refers to a configuration file:
FopFactory fopFactory = FopFactory.newInstance(new File("C:/Temp/fop.xconf"));
You need to provide it with a File object. Alternatively, you can create a FopFactory instance by providing a URI that is used to resolve relative URIs in the input file (for example, when SVG files reference other SVG files). As the documentation suggests:
FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
The user agent is the entity that allows you to interact with a single rendering run, i.e. the processing of a single document. If you wish to customize the user agent's behaviour, the first step is to create your own instance of FOUserAgent using the appropriate factory method on FopFactory and pass that to the factory method that will create a new Fop instance.
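
For example, a minimal sketch (the producer string and output stream are placeholders):

FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
foUserAgent.setProducer("MyApp 1.0"); // customize the run before creating the Fop instance
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, pdfoutStream);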
FopFactory.newInstance supports the following overloads:

FopFactory.newInstance(File)
FopFactory.newInstance(URI)
FopFactory.newInstance(URI, InputStream)
FopFactory.newInstance(FopFactoryConfig)
I successfully ran the following code on both Windows and Mac OS (i.e. a Unix/Linux-like NFS):

URI uri = new File(config.getRoot()).toPath().toUri();
FopFactory fopFactory = FopFactory.newInstance(uri);

On the other hand, this does not work (and I don't understand why):

URI uri = new File(config.getRoot()).toURI();
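
One observable difference that may explain it: File.toURI() yields a URI of the form file:/some/path, while File.toPath().toUri() yields file:///some/path with an (empty) authority component, and FOP's URI resolution may only cope with the latter. A quick way to compare (the path is hypothetical):

File root = new File("/data/fop");
System.out.println(root.toURI());          // file:/data/fop
System.out.println(root.toPath().toUri()); // file:///data/fop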

Parsing content from Word document using docx4j

Thanks to a previous answer, I'm now able to read my password-protected Word 2010 documents. (I have to translate them one by one from .doc to .docx. They go back to 1994, but that's okay.)
I wrote a simple Java class to get started:
package model.docx4j;

import model.JournalEntry;
import model.JournalEntryFactory;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.OpcPackage;
import org.docx4j.openpackaging.parts.Parts;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.LinkedList;
import java.util.List;

/**
 * JournalEntryFactoryImpl using docx4j
 * @author Michael
 * @since 9/8/12 12:44 PM
 */
public class JournalEntryFactoryImpl implements JournalEntryFactory {

    // Assumes slf4j; the original snippet referenced LOGGER without declaring it
    private static final Logger LOGGER = LoggerFactory.getLogger(JournalEntryFactoryImpl.class);

    @Override
    public List<JournalEntry> getEntries(InputStream inputStream, String password) throws IOException, GeneralSecurityException {
        List<JournalEntry> journalEntries = new LinkedList<JournalEntry>();
        if (inputStream != null) {
            try {
                OpcPackage opcPackage = OpcPackage.load(inputStream, password);
                Parts parts = opcPackage.getParts();
            } catch (Docx4JException e) {
                LOGGER.error("Could not load document into docx4j", e);
                throw new IOException(e);
            }
        }
        return journalEntries;
    }
}
And a JUnit test to drive it; a minimal sketch (assuming JUnit 4, with a placeholder path and password):
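package model.docx4j;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import model.JournalEntry;
import org.junit.Test;
import static org.junit.Assert.assertNotNull;

public class JournalEntryFactoryImplTest {

    @Test
    public void loadsPasswordProtectedDocument() throws Exception {
        // Placeholder path and password; point these at a real protected .docx
        InputStream inputStream = new FileInputStream("journal.docx");
        List<JournalEntry> entries = new JournalEntryFactoryImpl().getEntries(inputStream, "password");
        assertNotNull(entries);
    }
}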
I put a breakpoint into the test to see what docx4j was doing once it read my document. I see a list of 8 parts, but I walked through the tree without finding the content.
Each document consists of a page with a date and content, but I can't find the pages. Where do they live?
The main document content lives in the "main document part", which is often named "/word/document.xml".
The usual way to get it with docx4j is:
WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage)opcPackage;
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
but you'd expect your approach to work as well.
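
From there, if the goal is just to see the text, a minimal sketch (TextUtils.extractText walks the content tree and writes out the run text; add imports for org.docx4j.TextUtils and java.io.StringWriter):

WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage) opcPackage;
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

// Flatten all run text in the main document part into one string
StringWriter writer = new StringWriter();
TextUtils.extractText(documentPart, writer);
System.out.println(writer.toString());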