I'd like to know whether the following is possible using fuzzy matching in Excel:
I have a database in Excel that I am building a search engine for. The database is in a table format. It holds 200 hyperlinks to Excel files, so there are 200 rows of data. Each row has specific data about one of these files, such as the topic the file covers. I want to build a search engine so someone can search for a specific topic.
I want the search engine to use fuzzy matching so that a mistyped query can still return a result from the dynamic table/database. It's dynamic because more hyperlinks might be added to the database over time. I just want to know whether this kind of search engine is possible, because I have not been able to find an answer.
Excel supports fuzzy lookup through an add-in:
https://www.microsoft.com/en-us/download/details.aspx?id=15011
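To show what fuzzy matching over a topic column amounts to, here is a minimal sketch in Python using the standard library's difflib. It is not how the add-in works internally; the function name `fuzzy_search`, the sample topics, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

def fuzzy_search(query, topics, limit=3, cutoff=0.6):
    """Return up to `limit` topics that resemble `query`, best match first."""
    # Compare in lower case, but hand back the original spelling.
    lowered = {topic.lower(): topic for topic in topics}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]

# Hypothetical topic column; in practice this would be read from the table.
topics = ["Quarterly budget", "Inventory report", "Payroll summary"]
matches = fuzzy_search("inventry reprot", topics)
```

Because the topic list is passed in on every call, rows added to the table later are picked up without any other change.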
Related
I need to somehow detect duplicate documents (.doc, .pdf, etc) which are stored in BLOB field of my table.
I've been looking into Oracle Text functionality, but have failed to find anything that helps me achieve my goal.
In fact, I need something like the UTL_MATCH functionality, but able to compare the entire text of the documents.
Can someone provide me any tips on how to do that?
EDIT:
I'm not searching for exact duplicates, which could be found through file comparison. I need to analyse the text in the documents, which is why I'm trying to use Oracle Text.
So I have a VBA macro which I put together quite recently, and which does an adequate job, if painfully slowly. However, I have been told to port it to VB.net (for various reasons, a main one being that the Software team don't want to be stuck supporting VBA macros if I move on).
A key part of the process in VBA was running a couple of lookups on another sheet in the same workbook.
The table in question is ~10,000 values, and looks something like:
Location | Ref-Code | Type
Aberdare | ABDARE | ST
I can put the columns in any order, but what I need to do is check that a value is found in Ref-Code, and if it is return Location and Type.
So, first sub-question: is SQLite the right tool for doing this? Would something else be more sensible for looking up values in a persistent, rarely-altered 10,000ish row table from VB.net?
If SQLite is the right tool, are there any good tutorials to take me through how to connect to and query an SQLite database in VB.Net? I haven't been able to find one yet.
Thanks in advance.
Why not just use an XML file to store the lookups? There are loads of easy ways to parse XML files in VB, and this way you don't have to learn how to connect to SQLite at all.
The xml file can also be maintained by someone who doesn't know anything about databases.
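As a rough sketch of the idea, the lookup table can be loaded once into a dictionary keyed on Ref-Code. This is written in Python rather than VB.NET purely for illustration; the element name `entry` and the attribute names `refCode`, `location` and `type` are assumptions, and the single row is the one from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: one <entry> element per row of the lookup table.
SAMPLE_XML = """<lookups>
  <entry refCode="ABDARE" location="Aberdare" type="ST"/>
</lookups>"""

def load_lookups(xml_text):
    """Parse the XML once and build a dict: Ref-Code -> (Location, Type)."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("refCode"): (entry.get("location"), entry.get("type"))
        for entry in root.iter("entry")
    }

def find(table, ref_code):
    """Return (Location, Type) for a Ref-Code, or None if it is not present."""
    return table.get(ref_code)

table = load_lookups(SAMPLE_XML)
result = find(table, "ABDARE")
```

With roughly 10,000 rows, a one-off load into a dictionary keeps each later lookup cheap, whichever file format ends up holding the data.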
I would like to populate a Word document with data from our MS SQL database.
Is this possible, and if yes how?
I have done it various ways in the past. It depends whether the user initiates the action from OUTSIDE of Microsoft Word or from INSIDE Microsoft Word.
From INSIDE Microsoft Word, you can use one of the following techniques:
1. Open a template with placeholders and use VBA or VSTO to iterate over them, filling each one from a data source using copy & paste. Please note that tables are painful here as well. This approach resembles approach 1 under "OUTSIDE" below. The disadvantages are that it is relatively slow (copy & paste) and that Microsoft Word autocorrect likes to kick in when least needed.
2. Open a template with placeholders and use VBA or VSTO to parse the XML representation and then replace the placeholders. It is faster, but harder to write, especially since the XML representation can contain XML fragments within the placeholders (such as '<<PUT_<xxx/>IT_HERE>>' and more complex cases). You also need to make sure the result remains a valid, well-balanced XML document.
From OUTSIDE Microsoft Word (such as from a web interface), you can use one of the following techniques:
1. Store a template somewhere as RTF (which is far easier to process than Word's own structure). Put '<<PLACEHOLDER-FOR-NAME>>' or similar easily recognized text wherever you want data inserted. When the user requests the Word document, fetch the RTF, fetch the data, replace the placeholders, and serve the RTF to the user. RTF has some restrictions, but also some advantages. Advantages: new templates are easy to create, and the result also works with Microsoft WordPad and other office packages. Disadvantages: tables are a real mess to process, and not all Microsoft Word constructs are possible. Repeating rows in a table are even less recommended. High volume can be an issue.
2. Use a reporting package that also happens to output docx, doc or RTF documents, and write a report. In general this is perfectly suited to high volume. It is less suited if you want the end user to type a major amount of additional text, since reporting packages typically work with pages rather than flowing text, so an explicit or implicit page break is sometimes inserted. But if the end user only needs to add one or two sentences, it is sufficient.
3. Fat client. Put the SQL data somewhere, open Word, read the data, and continue with the techniques listed for INSIDE Microsoft Word.
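The RTF placeholder approach can be sketched roughly as follows, in Python for brevity. The helper names `rtf_escape` and `fill_template` are made up for this example, the values dictionary stands in for a row fetched from SQL Server, and it assumes each '<<NAME>>' placeholder appears unbroken in the RTF source (an editor may split it with control words, which this does not handle).

```python
import re

def rtf_escape(text):
    """Escape a plain string so it can be placed inside RTF body text."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)            # RTF special characters
        elif ch == "\n":
            out.append("\\line ")            # line break control word
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? per UTF-16 code unit (N is signed 16-bit).
            raw = ch.encode("utf-16-le")
            for i in range(0, len(raw), 2):
                unit = int.from_bytes(raw[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)

def fill_template(template, values):
    """Replace every <<KEY>> placeholder in one pass; unknown keys stay as-is."""
    def substitute(match):
        key = match.group(1)
        if key in values:
            return rtf_escape(str(values[key]))
        return match.group(0)
    return re.sub(r"<<([A-Za-z0-9_-]+)>>", substitute, template)

# Hypothetical template and data row.
template = r"{\rtf1\ansi Dear <<NAME>>, total: <<TOTAL>>}"
filled = fill_template(template, {"NAME": "Zoë {A}", "TOTAL": 42})
```

Doing the substitution in a single pass means a value that happens to contain '<<...>>' text is not expanded a second time.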
If you need to fill a Word document from SQL Server (or any other database or data platform), I can suggest the free edition of Invantive Composition for filling Word documents from the database (please note that I've been involved with that product). It opens templates and merges them from within Word, but is targeted more at non-developers: just specify the template and datablocks (possibly nested) and publish. Developers can only add some C# in plugins. I think it is a good product when you have MANY templates (over 50) because it scales more easily.
You can use Microsoft Query to fetch data from a SQL database into your document. This video may be useful:
https://vimeo.com/83983247
You could also try using MS Excel, as it binds to XML better than Word does.
It's also easy to make Excel produce 'Word'-styled output.
I am using Lucene.NET 2.9 in one of my projects. I use Lucene to index documents and to search them. One field in my documents is text heavy, and I have stored it in my MS SQL database. So basically I search via Lucene on its indexes and then fetch the complete documents from the MS SQL database.
The problem I am facing is that I want to highlight my search query terms in the results. For that I am using FastVectorHighlighter. This particular highlighter requires the Lucene DocId and field in order to highlight. Since the text-heavy field is not stored in the Lucene index, it is not highlighted in my search results.
Any suggestions on how to accomplish this? One option is to add the same field to my Lucene index. That would resolve the problem but would make the index very heavy. Alternatively, if there is another way to highlight the text, it would give me much more flexibility.
Thank you for reading my question,
Naveen
If you don't want to store the text in the Lucene index, you should use the Highlighter contrib.
The latest sources for it can be grabbed at https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Highlighter/
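The general idea of highlighting text that lives outside the index is to fetch it from the database and mark the query terms in it afterwards. The sketch below is plain Python and is not the Lucene.NET Highlighter API; the function name `highlight`, the `<b>` tags and the sample text are assumptions for illustration only.

```python
import html
import re

def highlight(text, query_terms, pre="<b>", post="</b>"):
    """Wrap whole-word, case-insensitive matches of the query terms in tags."""
    terms = [re.escape(term) for term in query_terms if term]
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so only our own tags are markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(pre + html.escape(match.group(0)) + post)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

# Hypothetical text as it might come back from the SQL database.
marked = highlight("Lucene makes search fast & simple", ["search", "lucene"])
```

A real highlighter would also run the text through the same analyzer as the index, so that stemmed or otherwise normalised terms still match; this sketch only handles literal terms.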
I have three databases that all have the contents of several web pages in them. What would be the best way to go about searching all three and having the most relevant web page at the top of the search results?
The only way I can think of is to break down the content by word count and/or create a complex set of search rules to give one piece of content priority over another. This might be more trouble than it's worth, but I was wondering if anybody knows of a way or a product that would be able to help me.
To further support Ivan's answer above, Lucene is the way to go. You haven't mentioned what platform you're on, so I'll point out that you can use a .NET port of this too.
If you do use Lucene there is a very good book from Manning on the subject which I recommend you look at.
When it comes to populating your index, you have a couple of choices. For starters you can just dump all of your text into the index and allow the engine to just search on it. However, I'd recommend adding fixed fields to your index which will allow you to support things such as partitioned searches or searches against those fields only.
To explain, let's say you have a field for the website. Then you can partition your index by restricting the search to those documents that have that website in that field.
The other approach is to extract points of interest from your documents and allow searches on those without searching the entire index entry. Your mileage may vary with this, as the Lucene engine is very well written, so it may simply let you group your searches into more logical units, which helps with your solution.
I've done this myself and it helps when answering management questions about what exactly is searched and indexed.
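To make the fixed-field idea concrete, here is a toy in-memory index in Python. It is not Lucene and its scoring is just a term count; the class name `TinyIndex`, the `site` field and the sample pages are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(Counter)  # term -> {doc_id: occurrences}
        self.sites = {}                       # doc_id -> fixed "website" field

    def add(self, doc_id, site, body):
        self.sites[doc_id] = site
        for term in tokenize(body):
            self.postings[term][doc_id] += 1

    def search(self, query, site=None):
        """Rank documents by query-term count, optionally within one site."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                if site is None or self.sites[doc_id] == site:
                    scores[doc_id] += count
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked]

# Hypothetical pages drawn from three separate sources.
index = TinyIndex()
index.add("a1", "a.example", "lucene search search")
index.add("b1", "b.example", "search engine")
index.add("c1", "c.example", "cooking recipes")
everything = index.search("search")
only_b = index.search("search", site="b.example")
```

Feeding all three databases into one index gives a single ranked list, while the fixed field still lets a search be restricted to one source.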
HTH!
If you're using MS SQL Server, its full-text search can return a ranking for you. I haven't used it, so you'll need to check the documentation or look online for specifics.