Migrate lucene FST files from 5.1.0 to 8.9.0 - lucene

I have files containing FSTs created with Lucene 5.1.0.
After upgrading to Lucene 8.9.0, I get an exception when I try to read an FST from a file:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource org.apache.lucene.store.InputStreamDataInput#34ce8af7): 4 (needs to be between 6 and 7). This version of Lucene only supports indexes created with release 6.0 and later.
Is there any way to upgrade old FST files to the new format?

I solved it this way.
While still on the old Lucene version, write all content from the FST to a text file:
public static <T> void writeToTextFile(FST<T> fst, Path filePath) throws IOException {
try (BufferedWriter writer = Files.newBufferedWriter(filePath)) {
BytesRefFSTEnum<T> fstEnum = new BytesRefFSTEnum<>(fst);
while (fstEnum.next() != null) {
BytesRefFSTEnum.InputOutput<T> inputOutput = fstEnum.current();
writer.write(inputOutput.input.utf8ToString() + "\t" + inputOutput.output.toString() + "\n");
}
}
}
Then switch to the new Lucene version and rebuild the FST from the text file:
public static <T> FST<T> readFromTextFile(Path filePath, Outputs<T> outputs, Function<String, T> fromString) throws IOException {
Builder<T> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
IntsRefBuilder scratchInts = new IntsRefBuilder();
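try (BufferedReader reader = Files.newBufferedReader(filePath)) {
String line;
// Read every line, not just the first; entries were dumped in sorted order, which the builder requires
while ((line = reader.readLine()) != null) {
String[] split = line.split("\t");
BytesRef scratchBytes = new BytesRef(split[0]);
builder.add(Util.toIntsRef(scratchBytes, scratchInts), fromString.apply(split[1]));
}
}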
return builder.finish();
}
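Since the FST builder expects inputs to be added in sorted order, it can help to sanity-check the text dump before rebuilding, for example if the file was edited or merged by hand. The following is a minimal sketch using only the JDK (Java 9+ for Arrays.compareUnsigned). The class name FstDumpCheck and the method firstOutOfOrder are hypothetical, and the check assumes the tab-separated format written above, with keys compared as unsigned UTF-8 bytes and duplicates treated as errors.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class FstDumpCheck {

    /**
     * Returns the index of the first line whose key is not strictly greater
     * than the previous key (unsigned UTF-8 byte order), or -1 if all lines
     * are in order. Hypothetical helper, not part of Lucene.
     */
    public static int firstOutOfOrder(List<String> lines) {
        byte[] previous = null;
        for (int i = 0; i < lines.size(); i++) {
            // The key is everything before the first tab.
            String key = lines.get(i).split("\t", 2)[0];
            byte[] current = key.getBytes(StandardCharsets.UTF_8);
            if (previous != null && Arrays.compareUnsigned(previous, current) >= 0) {
                return i;
            }
            previous = current;
        }
        return -1;
    }
}
```

Note that keys or outputs containing a tab or a newline would not survive this text format; in that case a different delimiter or an escaped encoding would be needed.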


Support to convert the HTML to PDF in Xamarin Forms

Following this Stack Overflow suggestion,
Convert HTML to PDF in .NET
I tried to convert the HTML file to PDF using HtmlRenderer.PdfSharp, but it fails with compatibility errors like the ones below:
HtmlRendererCore.PdfSharpCore 1.0.1 is not compatible with netstandard2.0 (.NETStandard,Version=v2.0). Package HtmlRendererCore.PdfSharpCore 1.0.1 supports: netcoreapp2.0 (.NETCoreApp,Version=v2.0)
HtmlRenderer.Core 1.5.0.5 is not compatible with monoandroid90 (MonoAndroid,Version=v9.0). Package HtmlRenderer.Core 1.5.0.5 supports:
- net20 (.NETFramework,Version=v2.0)
- net30 (.NETFramework,Version=v3.0)
- net35-client (.NETFramework,Version=v3.5,Profile=Client)
- net40-client (.NETFramework,Version=v4.0,Profile=Client)
- net45 (.NETFramework,Version=v4.5)
HtmlRendererCore.PdfSharpCore 1.0.1 is not compatible with monoandroid90 (MonoAndroid,Version=v9.0). Package HtmlRendererCore.PdfSharpCore 1.0.1 supports: netcoreapp2.0 (.NETCoreApp,Version=v2.0)
I also tried wkhtmltopdf, but it throws a similar error in the Android and other platform projects.
My requirement is only to convert the HTML file to a PDF file (no need to view the PDF, just save it to a local path).
Can anyone please provide suggestions?
Note: I need an open source suggestion :)
Awaiting your suggestions!
You can read the HTML as a stream and store it locally like below:
public static class FileManager
{
public static async Task<MemoryStream> DownloadFileAsStreamAsync(string url)
{
try
{
var stream = new MemoryStream();
using (var httpClient = new HttpClient())
{
var downloadStream = await httpClient.GetStreamAsync(new Uri(url));
if (downloadStream != null)
{
await downloadStream.CopyToAsync(stream);
}
}
return stream;
}
catch (Exception exception)
{
return null;
}
}
public static async Task<bool> DownloadAndWriteIntoNewFile(string url, string fileName)
{
var stream = await DownloadFileAsStreamAsync(url);
if (stream == null || stream.Length == 0)
return false;
var filePath = GetFilePath(fileName);
// Remove any previous copy so that a fresh file is written.
if (File.Exists(filePath))
File.Delete(filePath);
// Create file.
using (var createdFile = File.Create(filePath))
{
}
// Open and write into file.
using (var openFile = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite))
{
stream.WriteTo(openFile);
}
return true;
}
public static string GetFilePath(string fileName)
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), fileName);
return filePath;
}
public static void WriteAsText(string filePath, string contents)
{
File.WriteAllText(filePath, contents);
}
public static string ReadAsText(string filePath)
{
return File.ReadAllText(filePath);
}
}
You can then read the stored file and display it in a WebView like below:
private async void HtmlToPDF()
{
await FileManager.DownloadAndWriteIntoNewFile("https://www.google.co.in/?gws_rd=ssl", "SavePDF.pdf");
var filePath = FileManager.GetFilePath("SavePDF.pdf");
var pdfString = FileManager.ReadAsText(filePath);
var webView = new WebView
{
Source = new HtmlWebViewSource
{
Html = pdfString
}
};
this.Content = webView;
}
Likewise, you can save the HTML as a PDF and do what you want with it.
You can use the HtmlToPdfConverter:
private void ConvertUrlToPdf()
{
try {
String serverIPAddress = serverIP.Text;
uint serverPortNumber = uint.Parse (serverPort.Text);
// create the HTML to PDF converter object
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter (serverIPAddress, serverPortNumber);
// set service password if necessary
if (serverPassword.Text.Length > 0)
htmlToPdfConverter.ServicePassword = serverPassword.Text;
// set PDF page size
htmlToPdfConverter.PdfDocumentOptions.PdfPageSize = PdfPageSize.A4;
// set PDF page orientation
htmlToPdfConverter.PdfDocumentOptions.PdfPageOrientation = PdfPageOrientation.Portrait;
// convert the HTML page from given URL to PDF in a buffer
byte[] pdfBytes = htmlToPdfConverter.ConvertUrl (urlToConvert.Text);
string documentsFolder = Environment.GetFolderPath (Environment.SpecialFolder.MyDocuments);
string outPdfFile = System.IO.Path.Combine (documentsFolder, "EvoHtmlToPdf.pdf");
// write the PDF buffer in output file
System.IO.File.WriteAllBytes (outPdfFile, pdfBytes);
// open the PDF document in the default PDF viewer
UIDocumentInteractionController pdfViewer = UIDocumentInteractionController.FromUrl (Foundation.NSUrl.FromFilename (outPdfFile));
pdfViewer.PresentOpenInMenu (this.View.Frame, this.View, true);
} catch (Exception ex) {
UIAlertView alert = new UIAlertView ();
alert.Title = "Error";
alert.AddButton ("OK");
alert.Message = ex.Message;
alert.Show ();
}
}
Alternatively, you can see this URL.

Read resource file from inside SonarQube Plugin

I am developing a plugin using org.sonarsource.sonarqube:sonar-plugin-api:6.3. I am trying to access a file in my resource folder. Reading works fine in unit tests, but when the plugin is deployed as a jar into SonarQube, it cannot locate the file.
For example, I have the file Something.txt in src/main/resources. Then, I have the following code
private static final String FILENAME = "Something.txt";
String template = FileUtils.readFile(FILENAME);
where FileUtils.readFile would look like
public String readFile(String filePath) {
try {
return readAsStream(filePath);
} catch (IOException ioException) {
LOGGER.error("Error reading file {}, {}", filePath, ioException.getMessage());
return null;
}
}
private String readAsStream(String filePath) throws IOException {
try (InputStream inputStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(filePath)) {
if (inputStream == null) {
throw new IOException(filePath + " is not found");
} else {
return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
}
}
This question is similar to reading a resource file from within a jar. I have also tried both /Something.txt and Something.txt; neither works. If I put the file Something.txt in the classes folder of the SonarQube installation folder, the code works.
Try this:
File file = new File(getClass().getResource("/Something.txt").toURI());
BufferedReader reader = new BufferedReader(new FileReader(file));
String something = IOUtils.toString(reader);
You should not use getContextClassLoader(). See "Short answer: never use the context class loader!"
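One caveat worth keeping in mind: a resource packed inside a jar is generally not a plain file on disk, so reading it as a stream through the class that ships with it tends to be more robust than going through java.io.File. Below is a minimal sketch of that idea using only the JDK (Java 9+ for readAllBytes). The class name ResourceReader and its methods are hypothetical, and the anchor class is assumed to live in the same jar as the resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceReader {

    /** Returns true if the resource is visible to the anchor class's own class loader. */
    public static boolean exists(Class<?> anchor, String absolutePath) {
        return anchor.getResource(absolutePath) != null;
    }

    /**
     * Reads a classpath resource as UTF-8 text. A leading "/" makes the path
     * absolute, i.e. relative to the root of the jar or classes folder.
     */
    public static String read(Class<?> anchor, String absolutePath) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(absolutePath)) {
            if (in == null) {
                throw new IOException(absolutePath + " is not found");
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In the plugin this might be called as ResourceReader.read(FileUtils.class, "/Something.txt"), assuming FileUtils is packaged in the plugin jar next to the resource.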

JAX-RS 2.0 MULTIPART_FORM_DATA file upload not library specific

I need to create a JAX-RS 2.0 client that posts a file and a couple of parameters using the MULTIPART_FORM_DATA content type. (I don't need the service, just the client.) I've seen some examples that depend on a specific implementation, like Jersey or RESTEasy, but I'd like not to bind my code to any, in particular to Apache CXF (I am using WAS Liberty Profile). Any ideas on how to do it? Do I have to stick to some specific classes? If so, how can I do it using Apache CXF 3.0 (Liberty uses CXF for JAX-RS 2.0)?
Thanks
[I currently cannot comment under the already written answer]
If someone is searching for the maven dependency of IMultipartBody from the answer of Anatoly:
<dependency>
<groupId>com.ibm.websphere.appserver.api</groupId>
<artifactId>com.ibm.websphere.appserver.api.jaxrs20</artifactId>
<version>1.0.39</version>
<scope>provided</scope>
</dependency>
Thanks to andymc12 from https://github.com/OpenLiberty/open-liberty/issues/11942#issuecomment-619996093
You can use this example of how to implement it with the JAX-RS 2.0 feature: https://www.ibm.com/support/knowledgecenter/SSD28V_8.5.5/com.ibm.websphere.wlp.nd.doc/ae/twlp_jaxrs_multipart_formdata_from_html.html. This is an almost-working example (some statements should be wrapped in a try-catch block, but you'll see that when you paste it into an IDE).
package com.example.jaxrs;
@POST
@Consumes("multipart/form-data")
@Produces("multipart/form-data")
public Response postFormData(IMultipartBody multipartBody) {
List <IAttachment> attachments = multipartBody.getAllAttachments();
String formElementValue = null;
InputStream stream = null;
for (Iterator<IAttachment> it = attachments.iterator(); it.hasNext();) {
IAttachment attachment = it.next();
if (attachment == null) {
continue;
}
DataHandler dataHandler = attachment.getDataHandler();
stream = dataHandler.getInputStream();
MultivaluedMap<String, String> map = attachment.getHeaders();
String fileName = null;
String formElementName = null;
String[] contentDisposition = map.getFirst("Content-Disposition").split(";");
for (String tempName : contentDisposition) {
String[] names = tempName.split("=");
// Skip tokens without a value, such as the leading "form-data"
if (names.length < 2) {
continue;
}
formElementName = names[1].trim().replaceAll("\"", "");
if ((tempName.trim().startsWith("filename"))) {
fileName = formElementName;
}
}
if (fileName == null) {
StringBuffer sb = new StringBuffer();
BufferedReader br = new BufferedReader(new InputStreamReader(stream));
String line = null;
try {
while ((line = br.readLine()) != null) {
sb.append(line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
formElementValue = sb.toString();
System.out.println(formElementName + ":" + formElementValue);
} else {
//handle the file as you want
File tempFile = new File(fileName);
...
}
}
if (stream != null) {
stream.close();
}
return Response.ok("test").build();
}
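The Content-Disposition handling in the loop above can also be isolated into a small helper, which makes it easier to test on its own. This is only a sketch using the JDK; the class name ContentDispositionParser and the method parameters are hypothetical, and it assumes simple header values where no quoted value contains a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContentDispositionParser {

    /**
     * Splits a header such as: form-data; name="file"; filename="a.txt"
     * into its name/value parameters, with surrounding quotes removed.
     */
    public static Map<String, String> parameters(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "form-data" token has no value
            }
            String name = part.substring(0, eq).trim();
            String value = part.substring(eq + 1).trim().replace("\"", "");
            params.put(name, value);
        }
        return params;
    }
}
```

With this, a part is a file upload when the returned map contains a "filename" key, and a plain form field otherwise.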

How to extract text from PDFs using a PIG UDF and Apache Tika?

I'm attempting to write a PIG eval function (UDF) to extract text from pdf files using Apache Tika. However, my function only writes 0 or 1 bytes to output whenever I try to run the function. How could I fix my code?
public class ExtractTextFromPDFs extends EvalFunc<String> {
@Override
public String exec(Tuple input) throws IOException {
String pdfText;
if (input == null || input.size() == 0 || input.get(0) == null) {
return "N/A";
}
DataByteArray dba = (DataByteArray)input.get(0);
InputStream is = new ByteArrayInputStream(dba.get());
ContentHandler contenthandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser pdfparser = new AutoDetectParser();
try {
pdfparser.parse(is, contenthandler, metadata, new ParseContext());
} catch (SAXException | TikaException e) {
e.printStackTrace();
}
pdfText = contenthandler.toString();
//close the input stream
if(is != null){
is.close();
}
return pdfText;
}
}
I run the code using 'c = foreach b generate ExtractTextFromPDFs(content);' where b is a pdf and content is a bytearray.

Reading content of a JAR file (at runtime)? [duplicate]

I have read the posts:
Viewing contents of a .jar file
and
How do I list the files inside a JAR file?
But I, sadly, couldn't find a good solution to actually read a JAR's content (file by file).
Furthermore, could someone give me a hint, or point to a resource, where my problem is discussed?
I just could think of a not-so-straight-forward-way to do this:
I could somehow convert the list of a JAR's resources to a list of
inner-JAR URLs, which I then could open using openConnection().
You use JarFile to open a JAR file. With it you can get a ZipEntry or JarEntry (they can be seen as the same thing) by using 'getEntry(String name)' or 'entries()'. Once you have an entry, you can use it to get an InputStream by calling 'JarFile.getInputStream(ZipEntry ze)'. Then you can read data from the stream.
Here is the complete code which reads all the file contents inside the jar file.
public class ListJar {
private static void process(InputStream input) throws IOException {
InputStreamReader isr = new InputStreamReader(input);
BufferedReader reader = new BufferedReader(isr);
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
reader.close();
}
public static void main(String[] args) throws IOException {
// try-with-resources closes the JarFile when done
try (JarFile jarFile = new JarFile("/home/bathakarai/gold/click-0.15.jar")) {
final Enumeration<JarEntry> entries = jarFile.entries();
while (entries.hasMoreElements()) {
final JarEntry entry = entries.nextElement();
// Skip directory entries; only files have content to print
if (!entry.isDirectory()) {
System.out.println("File : " + entry.getName());
InputStream input = jarFile.getInputStream(entry);
process(input);
}
}
}
}
}
Here is how I read it as a ZIP file,
try {
ZipInputStream is = new ZipInputStream(new FileInputStream("file.jar"));
ZipEntry ze;
byte[] buf = new byte[4096];
int len;
while ((ze = is.getNextEntry()) != null) {
System.out.println("----------- " + ze);
// getSize() returns a long and may be -1 when the size is unknown
long size = ze.getSize();
// Dump the entry's bytes to the file
...
}
is.close();
} catch (Exception e) {
e.printStackTrace();
}
This is more efficient than the JarFile approach if you want to decompress the whole file.
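To make the elided "dump" step concrete, here is a minimal, self-contained sketch of the streaming approach that reads each entry fully with a buffer loop instead of relying on getSize(). The class name ZipEntryReader and both methods are hypothetical; singleEntryZip exists only so the example can be tried without a real JAR on disk, and collecting everything in memory is an assumption that only suits small archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryReader {

    /** Reads every non-directory entry into memory, keyed by entry name. */
    public static Map<String, byte[]> readAll(InputStream source) {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int len;
                // read() returns -1 at the end of the current entry
                while ((len = zip.read(buf)) != -1) {
                    out.write(buf, 0, len);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    /** Hypothetical demo helper: builds a one-entry archive in memory. */
    public static byte[] singleEntryZip(String name, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a real JAR, passing new FileInputStream("file.jar") to readAll would follow the same path as the snippet above.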