lucene .net fuzzy multiple words

lucene .net fuzzy multiple words - lucene

I want combined fuzzy search on two terms , eg:- 'magikal~0.8' and 'mistery'~0.8, should return the results where words 'magical' and 'mystery' both are there.
My current code is as follows
Directory createIndex(DataTable table)
{
var directory = new RAMDirectory();
using (Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000)))
{
foreach (DataRow row in table.Rows)
{
var document = new Document();
document.Add(new Field("DishName", row["DishName"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
document.Add(new Field("CustomisationID", row["CustomisationID"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.AddDocument(document);
}
writer.Optimize();
writer.Flush(true, true, true);
}
return directory;
}
private DataTable SearchDishName(string textSearch)
{
string MatchingCutomisationIDs = "0"; //There is no Dish with ID zero, this is just to easen the coding..
var ds = new DataSet();
ds.ReadXml(System.Web.HttpContext.Current.Server.MapPath("~/App_data/MyDataset.xml"));
DataTable Sample = new DataTable();
Sample = ds.Tables[0];
var table = Sample.Clone();
var Index = createIndex(Sample);
using (var reader = IndexReader.Open(Index, true))
using (var searcher = new IndexSearcher(reader))
{
using (Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
{
var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "DishName", analyzer);
var collector = TopScoreDocCollector.Create(1000, true);
try
{
var query = queryParser.Parse(textSearch);
searcher.Search(query, collector);
}
catch
{ }
var matches = collector.TopDocs().ScoreDocs;
foreach (var item in matches)
{
var id = item.Doc;
var doc = searcher.Doc(id);
var row = table.NewRow();
row["CustomisationID"] = doc.GetField("CustomisationID").StringValue;
table.Rows.Add(row);
}
}
}
return table;
}
Also another issue with this code is that. If I run a normal query with an && operator it still don't get the accurate results
eg :- results for the query "magical"&&"mystery", includes cases where only "magical" or only "mystery" is there.

Related

Fill XFA without breaking usage rights

I have an XFA form that I can successfully fill in by extracting the XML modifying and writing back. Works great if you have the full Adobe Acrobat, but fails with Adobe Reader. I have seen various questions on the same thing with answers but they were some time ago so updating an XFA that is readable by Adobe Reader may no longer be doable?
I use this code below and I've utilised the StampingProperties of append as in the iText example but still failing. I'm using iText 7.1.15.
//open file and write to temp one
PdfDocument pdf = new(new PdfReader(FileToProcess), new PdfWriter(NewPDF), new StampingProperties().UseAppendMode());
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
XfaForm xfa = form.GetXfaForm();
XElement node = xfa.GetDatasetsNode();
IEnumerable<XNode> list = node.Nodes();
foreach (XNode item in list)
{
if (item is XElement element && "data".Equals(element.Name.LocalName))
{
node = element;
break;
}
}
XmlWriterSettings settings = new() { Indent = true };
using XmlWriter writer = XmlWriter.Create(XMLOutput, settings);
{
node.WriteTo(writer);
writer.Flush();
writer.Close();
}
//We now how to strip an extra xfa line if updating
if(update)
{
string TempXML= CSTrackerHelper.MakePath($"{AppContext.BaseDirectory}Temp", $"{Guid.NewGuid()}.XML");
StreamReader fsin = new(XMLOutput);
StreamWriter fsout = new(TempXML);
string linedata = string.Empty;
int cnt = 0;
while (!fsin.EndOfStream)
{
if (cnt != 3 && linedata != string.Empty)
{
fsout.WriteLine(linedata);
}
linedata = fsin.ReadLine();
cnt++;
}
fsout.Close();
fsin.Close();
XMLOutput = TempXML;
}
xlogger.Info("Populating pdf fields");
//Now loop through our field data and update the XML
XmlDocument xmldoc = new();
xmldoc.Load(XMLOutput);
XmlNamespaceManager xmlnsManager = new(xmldoc.NameTable);
xmlnsManager.AddNamespace("xfa", #"http://www.xfa.org/schema/xfa-data/1.0/");
string[] FieldValues;
string[] MultiNodes;
foreach (KeyValuePair<string, DocumentFieldData> v in DocumentData.FieldData)
{
if (!string.IsNullOrEmpty(v.Value.Field))
{
FieldValues = v.Value.Field.Contains(";") ? v.Value.Field.Split(';') : (new string[] { v.Value.Field });
foreach (string FValue in FieldValues)
{
XmlNodeList aNodes;
if (FValue.Contains("{"))
{
aNodes = xmldoc.SelectNodes(FValue.Substring(0, FValue.LastIndexOf("{")), xmlnsManager);
if (aNodes.Count > 1)
{
//We have a multinode
MultiNodes = FValue.Split('{');
int NodeIndex = int.Parse(MultiNodes[1].Replace("}", ""));
aNodes[NodeIndex].InnerText = v.Value.Data;
}
}
else
{
aNodes = xmldoc.SelectNodes(FValue, xmlnsManager);
if (aNodes.Count >= 1)
{
aNodes[0].InnerText = v.Value.Data;
}
}
}
}
}
xmldoc.Save(XMLOutput);
//Now we've updated the XML apply it to the pdf
xfa.FillXfaForm(new FileStream(XMLOutput, FileMode.Open, FileAccess.Read));
xfa.Write(pdf);
pdf.Close();
FYI I've also tried to set a field directly also with the same results.
PdfReader preader = new PdfReader(source);
PdfDocument pdfDoc=new PdfDocument(preader, new PdfWriter(dest), new StampingProperties().UseAppendMode());
PdfAcroForm pdfForm = PdfAcroForm.GetAcroForm(pdfDoc, true);
XfaForm xform = pdfForm.GetXfaForm();
xform.SetXfaFieldValue("VRM[0].CoverPage[0].Wrap2[0].Table[0].CSID[0]", "Test");
xform.Write(pdfForm);
pdfDoc.Close();
If anyone has any ideas it would be appreciated.
Cheers

I ran into a very similar issue. I was attempting to auto fill an XFA that was password protected while not breaking the certificate or usage rights (it allowed filling). iText7 seems to have made this not possible for legal/practical reasons, however it is still very much possible with iText5. I wrote the following working codeusing iTextSharp (C# version if iText5):
using iTextSharp.text;
using iTextSharp.text.pdf;
string pathToRead = "/Users/home/Desktop/c#pdfParser/encrypted_empty.pdf";
string pathToSave = "/Users/home/Desktop/c#pdfParser/xfa_encrypted_filled.pdf";
string data = "/Users/home/Desktop/c#pdfParser/sample_data.xml";
FillByItextSharp5(pathToRead, pathToSave, data);
static void FillByItextSharp5(string pathToRead, string pathToSave, string data)
{
using (FileStream pdf = new FileStream(pathToRead, FileMode.Open))
using (FileStream xml = new FileStream(data, FileMode.Open))
using (FileStream filledPdf = new FileStream(pathToSave, FileMode.Create))
{
PdfReader.unethicalreading = true;
PdfReader pdfReader = new PdfReader(pdf);
PdfStamper stamper = new PdfStamper(pdfReader, filledPdf, '\0', true);
stamper.AcroFields.Xfa.FillXfaForm(xml, true);
stamper.Close();
pdfReader.Close();
}
}

PdfStamper stamper = new PdfStamper(pdfReader, filledPdf, '\0', true)
you have to use this line.

itextsharp setting background opacity not working

I have the following code to set background color to one of my fields but for some reason I can not control the transparency of the background. Can someone please take a look at it and let me know what I am doing wrong. Thanks.
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
// Open existing PDF
var pdfReader = new PdfReader(existingFileStream);
// PdfStamper, which will create
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
if (fieldKey.Equals("Title"))
{
form.SetFieldProperty(fieldKey, "bgcolor", new BaseColor(System.Drawing.Color.FromArgb(20,225,160,0)),null);
form.SetField(fieldKey, "Test");
}
else
{
form.SetField(fieldKey, "REPLACED!");
}
}
stamper.FormFlattening = true;
stamper.Close();
pdfReader.Close();
}

Just in case anyone else faces the same problem
var pdfReader = new PdfReader(existingFileStream);
// PdfStamper, which will create
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
PdfContentByte background;
foreach (string fieldKey in fieldKeys)
{
if (fieldKey.Equals("Title"))
{
//form.SetFieldProperty(fieldKey, "bgColor", new BaseColor(System.Drawing.Color.FromArgb(125,225,160,0)),null);
form.SetField(fieldKey, "Test");
IList<AcroFields.FieldPosition> pos = form.GetFieldPositions(fieldKey);
PdfContentByte contentBtye = stamper.GetOverContent(pos[0].page);
contentBtye.SetColorFill(new BaseColor(200, 50, 50));
contentBtye.Rectangle(pos.FirstOrDefault().position.Left, pos.FirstOrDefault().position.Bottom, pos.FirstOrDefault().position.Width, pos.FirstOrDefault().position.Height);
PdfGState state = new PdfGState();
state.FillOpacity = 0.5f;
contentBtye.SetGState(state);
contentBtye.Fill();
}
else
{
form.SetField(fieldKey, "REPLACED!");
}
}

SQL (lite) replace string value in all columns with formatting

I'm trying to update a SQLite database with a column "URL" with a specific value. The old url format is "http://www.blz.nl/voster/boek/9789056628512" and the new one is "http://www.blz.nl/voster/boekwinkel/zoeken/?q=9780789327505&oa=true&searchin=taal&taal=dut".
What I am trying to do is replace the url to the new format but keep the value of the 'Q' param from the old url. What is the fastest / best way to achieve this for all columns? I have no idea how to approach this using an SQL query.

You want the SQL?
Basically, you read in the table, search for your keyword, replace it, and write that data back to the database.
Assuming your database name is "mydatabase.db", your code would look something like this (untested):
using Devart.Data.SQLite;
public void Convert()
{
using (var conn = new SQLiteConnection("DataSource=mydatabase.db"))
{
var boek = "//boek//";
var boekwinkel = "//boekwinkel//";
var boek_len = boek.Length;
var table = new DataTable();
var sqlCmd = "SELECT * FROM TableName1;";
conn.Open();
// Read the data into the DataTable:
using (var cmd = new SQLiteCommand(sqlCmd, conn))
{
// The DataTable is ReadOnly!
table.Load(cmd.ExecuteReader());
}
// Use a new SQLiteCommand to write the data.
using (var cmd = new SQLiteCommand("UPDATE TableName1 SET URL=#NEW_URL WHERE URL=#URL;", conn))
{
cmd.Parameters.Add("#URL", SQLiteType.Text, 20);
cmd.Parameters.Add("#NEW_URL", SQLiteType.Text, 20);
foreach (DataRow row in table.Rows)
{
var url = String.Format("{0}", row["URL"]).Trim();
cmd.Parameters["#URL"].SQLiteValue = row["URL"];
if (!String.IsNullOrEmpty(url))
{
var index = url.IndexOf(boek);
if (-1 < index)
{
var first = url.Substring(0, index);
var middle = boekwinkel;
var last = url.Substring(index + boek_len);
var new_url = String.Format("{0}{1}{2}&oa=true&searchin=taal&taal=dut", first, middle, last);
// Place a break point after String.Format.
// Make sure the information being written is correct.
cmd.Parameters["#NEW_URL"].SQLiteValue = new_url;
cmd.ExecuteNonQuery();
}
}
}
}
conn.Close();
}
}

iTextSharp XmlWorker: right-to-left

After a long time of struggling with this not-so-friendly API, I am finally making progress, but now I've come to a really nasty issue.. I have placed "dir" attributes in various places in my html with the value being "rtl".. but the XMLWorker doesn't seem to respect that at all. Does anyone know of a workaround? Here's my method:
public static void Generate<TModel>(string templateFile, TModel model, string outputFile, IEnumerable<string> fonts)
{
string template = System.IO.File.ReadAllText(templateFile);
string result = Razor.Parse(template, model);
using (var fsOut = new FileStream(outputFile, FileMode.Create, FileAccess.Write))
using (var stringReader = new StringReader(result))
{
var document = new Document();
var pdfWriter = PdfWriter.GetInstance(document, fsOut);
pdfWriter.InitialLeading = 12.5f;
document.Open();
var xmlWorkerHelper = XMLWorkerHelper.GetInstance();
var cssResolver = new StyleAttrCSSResolver();
//cssResolver.AddCss(cssFile);
var xmlWorkerFontProvider = new XMLWorkerFontProvider();
foreach (string font in fonts)
{
xmlWorkerFontProvider.Register(font);
}
var cssAppliers = new CssAppliersImpl(xmlWorkerFontProvider);
var htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
PdfWriterPipeline pdfWriterPipeline = new PdfWriterPipeline(document, pdfWriter);
HtmlPipeline htmlPipeline = new HtmlPipeline(htmlContext, pdfWriterPipeline);
CssResolverPipeline cssResolverPipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
XMLWorker xmlWorker = new XMLWorker(cssResolverPipeline, true);
XMLParser xmlParser = new XMLParser(xmlWorker);
xmlParser.Parse(stringReader);
document.Close();
}
}

I've created a sample to show how to parse and display RTL data using XMLWorker. Download it from here.

storing the RDBMS table data through lucene in text file on hard disk

I want to store the RDBMS sql query result of 3.2 million records in text file using lucene and then search that.
[I saw the example here how to integrate RAMDirectory into FSDirectory in lucene
[1]: how to integrate RAMDirectory into FSDirectory in lucene .I have this piece of code that is working for me
public class lucetest {
public static void main(String args[]) {
lucetest lucetestObj = new lucetest();
lucetestObj.main1(lucetestObj);
}
public void main1(lucetest lucetestObj) {
final File INDEX_DIR = new File(
"C:\\Documents and Settings\\44444\\workspace\\lucenbase\\bin\\org\\lucenesample\\index");
try {
Connection conn;
Class.forName("com.teradata.jdbc.TeraDriver").newInstance();
conn = DriverManager.getConnection(
"jdbc:teradata://x.x.x.x/CHARSET=UTF16", "aaa", "bbb");
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
// Directory index = new RAMDirectory(); //To use RAM space
Directory index = FSDirectory.open(INDEX_DIR); //To use Hard disk,This will not consume RAM
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
analyzer);
IndexWriter writer = new IndexWriter(index, config);
// IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
lucetestObj.indexDocs(writer, conn);
writer.optimize();
writer.close();
System.out.println("pepsi");
lucetestObj.searchDocs(index, analyzer, "india");
try {
conn.close();
} catch (SQLException e2) {
// TODO Auto-generated catch block
e2.printStackTrace();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
}
}
void indexDocs(IndexWriter writer, Connection conn) throws Exception {
String sql = "select id, name, color from pet";
String queryy = " SELECT CFMASTERNAME, " + " ULTIMATEPARENTID,"
+ "ULTIMATEPARENT, LONG_NAMEE FROM XCUST_SRCH_SRCH"
+ "sample 100000;";
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(queryy);
int kk = 0;
while (rs.next()) {
Document d = new Document();
d.add(new Field("id", rs.getString("CFMASTERID"), Field.Store.YES,
Field.Index.NO));
d.add(new Field("name", rs.getString("CFMASTERNAME"),
Field.Store.YES, Field.Index.ANALYZED));
d.add(new Field("color", rs.getString("LONG_NAMEE"),
Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(d);
}
if (rs != null) {
rs.close();
}
}
void searchDocs(Directory index, StandardAnalyzer analyzer,
String searchstring) throws Exception {
String querystr = searchstring.length() > 0 ? searchstring : "lucene";
Query q = new QueryParser(Version.LUCENE_35, "name", analyzer)
.parse(querystr);
int hitsPerPage = 10;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(
hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println((i + 1) + ".CFMASTERNAME " + d.get("name")
+ " ****LONG_NAMEE**" + d.get("color") + "****ID******"
+ d.get("id"));
}
searcher.close();
}
}
How to format this code so that instead of RAM directory the sql result table is saved on the hard disk at the path specified.I am not able to work out a solution.My requirement is that this table data stored on disk through lucene returns result very fast.Hence i am saving data on disk through lucene which is indexed.

Directory index = FSDirectory.open(INDEX_DIR);
You mention saving the sql result to a text file, but that is unnecessary overhead. As you iterate through a ResultSet, save the rows directly to the Lucene index.
As an aside, not that it matters much, but naming your local var (final or otherwise) in all caps is against the convention. Use camelCase. All caps is only for class-level constants (static final members of a class).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

lucene .net fuzzy multiple words - lucene

Related

Fill XFA without breaking usage rights

itextsharp setting background opacity not working

SQL (lite) replace string value in all columns with formatting

iTextSharp XmlWorker: right-to-left

storing the RDBMS table data through lucene in text file on hard disk

Categories

Resources