Untokenized field in Lucene search

Untokenized field in Lucene search - lucene

I have stored a field in index file which is untokenized. When I try to get that field value from the index file I'm not able to do get it.
Note: I have another one untokenized field, there I'm able to get that value, the data stored in this field are not having any white spaces among this.
Example: (smith,david,walter,john)... But the one I'm asking is having white spaces among it. Example: (david smith,mark john,bill man)...
I don't think this might be the reason.
Your help is appreciated.

Remember that tokenization or lack of it has to be done both while indexing and searching.
Did you try using a keywordTokenizer in the search side?

Related

Search in all columns in crate

As elastic search has _all field I am not able to find anything regarding that in cratedb. SO do we need to maintain our own analyzed field for that purpose or does crate provide something in built?

The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.
The _all field allows you to search for values in documents without knowing which field contains the value. This makes it a useful option when getting started with a new dataset
refer : https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html

We don't have something similar to that, so you'd need to add it to the query or maintain a dedicated column.

SQL Parse NVARCHAR Field

I am loading data from Excels into database on SQL Server 2008. There is one column which is in nvarchar data type. This field contains the data as
Text text text text text text text text text text.
(ABC-2010-4091, ABC-2011-0586, ABC-2011-0587, ABC-2011-0604)
Text text text text text text text text text text.
(ABC-2011-0562, ABC-2011-0570, ABC-2011-0575, ABC-2011-0588)
so its text with many sentences of this kind.
For each row I need to get the data ABC-####-####, respectivelly I only need the last part. So e.g. for ABC-2010-4091 I need to obtain 4091. This number I will need to join to other table. I guess it would be enough to get the last parts of the format ABC-####-####, then I should be able to handle the request.
So the example of given above, the result should be 4091, 0586, 0587, 0604, 0562, 0570, 0575, 0588 in the row instead of the whole nvarchar value field.
Is this possible somehow? The text in the nvarchar field differ, but the text format (ABC-####-####) I want to work with is still the same. Only the count of characters for the last part may vary so its not only 4 numbers, but could be 5 or more.
What is the best approach to get these data? Should I parse it in SSIS or on the SQL server side with SQL Query? And how?
I am aware this is though task. I appreciate every help or advice how to deal with this. I have not tried anything yet as I do not know where to start. I read articles about SQL parsing, but I want to ask for best approach to deal with this task.

Stackoverflow is about programming.
Sit down and start programming.
Ok, seriously. That is string parsing and the last part in brackets with multiple fields means no bulk import, it is not a standard CSV file.
Either you use SSIS in SQL Server and program the parsing there or.... you write a program for that.
String maniupation in SQL is the worst part of the language and I would avoid it.
So, yes, sit down and program a routine. Probable the fastest way.

If I understand correctly, "ABS-####-####" will be the value coming through in the column and the numeric part is variable in length.
If that is the case, maybe this will work.
Use a "Derived Column" transformation.
Lets say we call "ABC-####-####" = Column1
SUBSTRING("Column1",(FINDSTRING("Column1","-",2)+1),LEN(Column1)-(FINDSTRING("Column1","-",2)))
If I am not mistaken, that should give you the last # values in a new column no matter how long that value is.
HTH

I have worked this problem out with the following guides:
Split Multi Value Column into Multiple Records &
Remove Multiple Spaces with Only One Space

Luke Where are my field values?

I've used Luke like four times per year for the past three years. I only break it out when I need it. One concept I've never understood is why only certain fields' values are displayed. I can query these "empty" fields for expected values and get the expected results, but Luke never displays these. I assume I'm missing something fundamental and obvious, but it's not so obvious to me.
Example Search tab:
Example Documents tab:

When a program creates a Lucene Document, it might tell Lucene whether to store the value of the field or not. See, for example, the stored argument to the StringField constructor. If the value is not stored then it can be searched on, but the original bytes of the value are not saved in the index, since they are not required nor used by the search.
A typical pattern with, say, http://www.elasticsearch.org/ is to store the original JSON in a single field and not to store the actually indexed fields. That way the application working with the retrieved data might use it's native data format and does not have to be aware of the Lucene and it's flat key-value Document.

adding several lines of text to an Access database

Would like to know what the best option is for adding several lines of text to a field of an access database(more than 255 characters of text). The text would be a recipe for example or the method of making the dish. Is it possible to add a textfile to a field of an access database for example?
Kind regards

You can use the Memo field type. It stores up to 63,999 characters.
Here are the details:
http://office.microsoft.com/en-au/access-help/field-data-types-available-in-access-mdb-HP005238518.aspx
OLE Object also let you store Word Documents and so on. So you can try using it if you are still short with 63,999 characters.
Hope this helps you

Full Text Searching for single characters

I have a table with a TEXT column where the contents is just strings of CSV numbers. Example ",1,76,77,115," Each string can have an arbitrary number of numbers.
I am trying to set up Full Text Indexing so that I can search this column rapidly. This works great. Instead of running queries with
where MY_COL LIKE '%,77,%' and MY_COL LIKE '%,115,%'
I can do
where CONTAINS(MY_COL,'77 and 115')
However, when I try to search for a single character it doesn't work.
where CONTAINS(MY_COL,'1')
But I know that there should be records returned! I quickly found that I need to edit the Noise file and rebuild the index. But even after doing that it still doesn't work.

Working with relational databases that way is going to hurt.
Use a proper schema. Either store the values in different rows or use an array datatype for the column.
That will make solving the problem trivial.

I fixed my own problem, although I'm not exactly sure what fixed it.
I dropped my table and populated a new one (my program does batch processing) and created a new Full Text Index. Maybe I wasn't being patient enough to allow the indexing to fully rebuild.

Agreed. How does 12,15,33 not return that record for a search for 1 with fulltext? Use an actual table schema to accomplish this.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Untokenized field in Lucene search - lucene

Remember that tokenization or lack of it has to be done both while indexing and searching. Did you try using a keywordTokenizer in the search side?

Related

Search in all columns in crate

SQL Parse NVARCHAR Field

Luke Where are my field values?

adding several lines of text to an Access database

Full Text Searching for single characters

Categories

Resources