Is it Possible to Update a Lucene Document using its Document ID? - lucene

A ScoreDoc[] array contains all the document ids from a search. I would like to use these document ids to update a single document. In this particular instance I cannot uniquely identify the row I wish to update, as the given terms will result in matching multiple documents.
Imagine a query where 1:a, 2:b and the following documents are returned
       1 2 3 4 5 6
doc 1: a b c d e f
doc 2: a b g h i j
doc 3: a b k l m n
I am basically making an update to fields 3 and 4, but want to leave 5 and 6 intact.
Currently I can grab these rows, make the updates I want, but I can't figure out a way to update them in the index.
Calling indexWriter.updateDocuments(...) or indexwriter.DeleteDocuments(...) with those terms will result in documents 1, 2 and 3 all being deleted.
Since I have the documentId, I assume there is a way for me to update the index with it.

Lucene doesn't allow the updating of fields in a document. It is strictly a delete/add mechanism.
A document's docId can change during optimization, merging, etc., so relying on it to stay the same isn't something you want to do. You should put your own field into the document that won't change over time and use that instead.

There is a method to delete by docid: IndexWriter.tryDeleteDocument. Having deleted the document, you can then add the new one, which, as stated by others, is how Lucene executes an update.
The documentation linked above provides some interesting information on why it's called tryDeleteDocument
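To make the two points above concrete, here is a toy in-memory model written in plain Python. It is not the Lucene API: the ToyIndex class, the uid field and the f1..f6 field names are all hypothetical. It only illustrates that an update is a delete followed by an add, and that a position-based id can shift after a merge while a stable key field keeps working.

```python
class ToyIndex:
    """Toy stand-in for an index; a list position plays the role of a docId."""

    def __init__(self):
        self._slots = []  # each slot is a dict, or None once deleted

    def add(self, doc):
        self._slots.append(dict(doc))

    def delete_by_term(self, field, value):
        # Mark every matching document as deleted.
        for pos, doc in enumerate(self._slots):
            if doc is not None and doc.get(field) == value:
                self._slots[pos] = None

    def update(self, field, value, new_doc):
        # An "update" is just delete + add.
        self.delete_by_term(field, value)
        self.add(new_doc)

    def merge(self):
        # Compaction drops deleted slots, so positions are renumbered.
        self._slots = [doc for doc in self._slots if doc is not None]

    def search(self, field, value):
        return [(pos, doc) for pos, doc in enumerate(self._slots)
                if doc is not None and doc.get(field) == value]


index = ToyIndex()
index.add({"uid": "1", "f1": "a", "f2": "b", "f3": "c", "f4": "d", "f5": "e", "f6": "f"})
index.add({"uid": "2", "f1": "a", "f2": "b", "f3": "g", "f4": "h", "f5": "i", "f6": "j"})
index.add({"uid": "3", "f1": "a", "f2": "b", "f3": "k", "f4": "l", "f5": "m", "f6": "n"})

# The ambiguous query matches all three documents...
hits = index.search("f1", "a")
_, old = hits[1]

# ...so copy the stored fields, change fields 3 and 4, and update by the stable key.
changed = dict(old, f3="X", f4="Y")
index.update("uid", old["uid"], changed)

pos_of_3_before = index.search("uid", "3")[0][0]
index.merge()
pos_of_3_after = index.search("uid", "3")[0][0]
```

In this model the document with uid 3 changes position once the merge runs, which is why the position is not a safe handle, while a lookup on uid still finds exactly one document.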

<SQL>Extract data from table which is txt format (contain 1 fields only)

In our existing system we have a table that stores the reported user info (let's say UserInfoRaw). This table contains only one field (detail); the sample data looks something like this:
^StartNewUser
^UserName
Simon
^EnableFacebook
Y
^EnableTwitter
N
^EndNewUser
^StartNewUser
^UserName
Vicky
^EnableFacebook
N
^EndNewUser
We now need to convert this format into a queryable table, let's say "User-info", which contains the 3 fields below. The output should be:
UserName facebook twitter
==================================================
Simon Y N
Vicky N N
The constraints are:
1. I know the tag fields I need to extract (for example, ^EnableFacebook is a tag name that I know and can use for selection).
2. We extract at user level. Every user MUST have ^StartNewUser/^EndNewUser in the text; this is a precondition.
3. An attribute tag may be missing in some cases (e.g. Vicky's ^EnableTwitter tag); the extraction should treat that field as N.
4. We can only use pure SQL here, as this is an interim solution for our MI team. They can only run SQL, and we can't do any program release to automate this process at the moment.
Currently we have come up with a solution using RRN, but it needs many interim tables:
1: Produce OUT1, which includes the row numbers of the start/end markers for each user
SELECT RRN(A) ,detail from UserInfoRaw A where A.detail in ('^StartNewUser' , '^EndNewUser')
OUT1 :
1 ^StartNewUser
8 ^EndNewUser
9 ^StartNewUser
14 ^EndNewUser
2: Produce OUT2, which includes the row number of each user name
Select RRN(A) ,detail from UserInfoRaw A where RRN(A) IN
(select RRN(B)+1 from UserInfoRaw B where B.detail = '^UserName')
OUT2 :
3 Simon
11 Vicky
3: Produce JOIN12, which maps each ^StartNewUser row to its ^UserName value row
Select MAX(A.row) as startRow , B.row as nameRow from OUT1 A,OUT2 B
where A.detail = '^StartNewUser' AND A.row <B.row
GROUP BY B.row
order by A.row
JOIN12 :
1 3
9 11
4: Join the 3 tables on the ^StartNewUser row to get the mapping for 1 field
Select C.startRow ,A.detail , C.nameRow,B.detail from OUT1 A, OUT2 B, JOIN12 C where A.row=C.startRow and B.row=C.nameRow
Result :
1 ^startNewUser 3 Simon
9 ^startNewUser 11 Vicky
With this approach we can produce the mapping for 1 field, and by repeating similar steps we can build the full result table we want.
But we have 10+ fields to extract (maybe more if the business requests it), so we're asking here to see whether there is a better idea for this case. Thanks!
(PS: if you're an AS400 person and you know how to produce the result with wrkqry, that would be best :) you know which MI team I'm referring to... really messy.)

By your own admission, you have a text file, not a table.
SQL wasn't designed to deal with files (text or otherwise), it was designed to deal with databases, containing tables and relations.
Therefore, don't use SQL statements for this, process it as a text file. It's going to be faster and simpler. My default assumption is that a traditional RPGLE program doing READ is going to beat any attempt to do this sort of processing in SQL, simply because this is the type of workload it was designed for. Or use any other language that can process files.
(It would be easier to do this in SQL with a stored procedure and a cursor, but that would be unwieldy at best, because SQL lacks many of the features that make this so much easier in a 'normal' programming language, such as local, private functions.)
tl;dr
I have a hammer, what's the best way to use it to cut a board in half?
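As a rough illustration of the "process it as a text file" suggestion, here is a minimal Python sketch (Python standing in for "any other language that can process files"). The parse_users function and the TAGS mapping are hypothetical names; the sketch assumes each known tag is followed by exactly one value line and that missing flags default to N, as the question states.

```python
# Hypothetical mapping: tag line -> (output column, default when the tag is absent).
TAGS = {
    "^UserName": ("UserName", ""),
    "^EnableFacebook": ("facebook", "N"),
    "^EnableTwitter": ("twitter", "N"),
}


def parse_users(lines):
    """Turn the tag/value lines into one dict per ^StartNewUser..^EndNewUser block."""
    users = []
    current = None   # record being built, or None outside a user block
    pending = None   # column waiting for its value line
    for raw in lines:
        line = raw.strip()
        if line == "^StartNewUser":
            current = {col: default for col, default in TAGS.values()}
            pending = None
        elif line == "^EndNewUser":
            if current is not None:
                users.append(current)
            current = None
            pending = None
        elif line.startswith("^"):
            # Known tags set the pending column; unknown tags are ignored.
            pending = TAGS.get(line, (None, None))[0]
        elif current is not None and pending is not None:
            current[pending] = line
            pending = None
    return users


sample = """^StartNewUser
^UserName
Simon
^EnableFacebook
Y
^EnableTwitter
N
^EndNewUser
^StartNewUser
^UserName
Vicky
^EnableFacebook
N
^EndNewUser""".splitlines()

users = parse_users(sample)
```

Adding another field to extract is then one more entry in TAGS rather than another set of interim tables.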

Storing data about objects with a variable number of ordered subparts in Access Database

The situation: I have a database storing biological specimen data. One table contains data about each specimen. Each specimen has between 1 and 8 parts, which are ordered.
I would like to enumerate each subpart in a query, using the specimen id and the number of parts. So if I have 2 specimens, A and B, and A has 2 parts and B has 3 parts, I want the result:
Parts:
A - 1
A - 2
B - 1
B - 2
B - 3
I realize that this is probably a trivial task, but I don't know the correct terminology to talk about it in a way that help pages and Google will understand. Thank you.
Edit to add thoughts: If I were dealing with something like this in a non-SQL context, I'd use a for loop to iterate the enumeration process over each specimen, but I don't understand how to implement anything remotely similar in SQL.

You mentioned "main table" which implies there's some other table for the sub parts. What you're after is likely a simple JOIN:
SELECT
*
FROM
maintable
INNER JOIN
subtable
ON
subtable.mainid = maintable.id
If you want an exact query, post a screenshot of your database tables with their column names and any relationships.
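If there is no separate parts table and only a part count is stored per specimen, a different technique from the JOIN above is to join against a small helper table of integers. The sketch below runs through Python's sqlite3 module purely so that it is self-contained; it is not Access SQL, and the specimen, numbers and part_count names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specimen (id TEXT PRIMARY KEY, part_count INTEGER);
    CREATE TABLE numbers (n INTEGER PRIMARY KEY);  -- helper table holding 1..8
    INSERT INTO specimen VALUES ('A', 2), ('B', 3);
    INSERT INTO numbers VALUES (1), (2), (3), (4), (5), (6), (7), (8);
""")

# Each specimen is paired with every number up to its part count,
# which plays the role of the for loop mentioned in the question.
rows = conn.execute("""
    SELECT s.id, n.n
    FROM specimen AS s
    JOIN numbers AS n ON n.n <= s.part_count
    ORDER BY s.id, n.n
""").fetchall()

parts = [f"{specimen_id} - {n}" for specimen_id, n in rows]
```

With the sample from the question, parts holds A - 1, A - 2, B - 1, B - 2, B - 3.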

SQL Server Multiple Likes

I have an unusual question that seems simple but has me stumped in a SQL Server stored procedure.
I have 2 tables as described below.
tblMaster
ID, CommitDate, SubUser, OrigFileName
Sample data
ID CommitDate SubUser OrigFileName
----------------------------------------
1 2014-10-07 Test1 Test1.pdf
2 2014-10-08 Test2 Test2.pdf
3 2014-10-09 Test3 Test3.pdf
The above table is basically the header table that tracks the committed files. In addition to this, we have a details table with the following structure.
tblIndex
ID, FileID (Linking column to the header row described above), Word
Sample data:
ID FileID Word
----------------------------------------
1 1 Oil
2 1 oil
3 2 oil
4 2 tank
5 3 tank
The rows above represent the words we want to search on; if certain criteria match, we return the corresponding filename/header row ID. What I would love to figure out how to do is this: if I search for
One word (e.g. "oil"), then the system should respond with all the files that meet the criteria (the easiest case, already figured out).
More than one word (e.g. "oil" and "tank"), then we should only see the second file, since it is the only one that has both oil and tank as its keywords.
I tried LIKE "%oil%" AND LIKE "%tank%", which returned no rows, since one value can't be both oil and tank.
I tried LIKE "%oil%" OR LIKE "%tank%", but I get files 1, 2, and 3, since the OR matches rows containing either word.
One last thing: I recognize I could just search for the first term, save the results into a temp table, and then search for the second term in that temp table to get what I am looking for. The problem is that I don't know in advance how many items will be searched for. I don't want to create a structure where I constantly have to store data into yet another temp table if someone searches for 6 "keywords".
Any help/ideas will be much appreciated.

Try this; it differs slightly from the previous answer:
SELECT distinct FileID,COUNT(distinct t.word) FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(distinct t.word) > 1

One simple option would be something like this:
SELECT FileID
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(*) > 1
This assumes you do not have duplicates in your tblIndex.
I'm also unsure whether you really need LIKE at all. According to your sample data you don't; a basic comparison would be far more efficient and would avoid possible collisions.
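The GROUP BY / HAVING idea from the answers above extends to any number of keywords by comparing the distinct match count with the number of search terms. The sketch below uses Python's sqlite3 module only to stay self-contained (it is not T-SQL), and files_matching_all is a hypothetical helper. It lowercases words explicitly, because SQLite compares text case-sensitively by default and would otherwise count "Oil" and "oil" in the sample data as two different words.

```python
import sqlite3


def files_matching_all(conn, words):
    """Return the FileIDs whose indexed words include every search term."""
    terms = sorted({w.lower() for w in words})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT FileID
        FROM tblIndex
        WHERE lower(Word) IN ({placeholders})
        GROUP BY FileID
        HAVING COUNT(DISTINCT lower(Word)) = ?
        ORDER BY FileID
    """
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblIndex (ID INTEGER PRIMARY KEY, FileID INTEGER, Word TEXT)")
conn.executemany(
    "INSERT INTO tblIndex VALUES (?, ?, ?)",
    [(1, 1, "Oil"), (2, 1, "oil"), (3, 2, "oil"), (4, 2, "tank"), (5, 3, "tank")],
)

both = files_matching_all(conn, ["oil", "tank"])
only_oil = files_matching_all(conn, ["oil"])
```

Because the comparison is against the number of terms rather than a fixed "> 1", the same query shape serves a search for 6 keywords without any temp tables.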

Postgres text search on multiple rows

I have a table called 'exclude' that contains hashtags:
-------------
id | tag
-------------
1  | #oxford
2  | #uk
3  | #england
-------------
I have another table called 'post':
-----------------------------------------------
id | tags              | text
-----------------------------------------------
1  | #oxford #funtimes | Sitting in the sun
2  | #oz               | Beach fun
3  | #england          | Milk was a bad choice
-----------------------------------------------
In order to do a text search on the posts' tags I've been running a query like the following:
SELECT * FROM post WHERE to_tsvector(tags) @@ plainto_tsquery('mysearchterm')
However, I now want to be able to exclude all posts where some or all of the tags are in my exclude table. Is there any easy way to do this in SQL/Postgres?
I tried converting the tag rows into one column and using that value within the plainto_tsquery function, but it doesn't work (I don't know how to do a text search for 'not equal' either, hence the logic is actually wrong, albeit along the right lines in my mind):
select * from post where to_tsvector(tags) @@ plainto_tsquery(
select array_to_string(array(select RTRIM(value) from exclude where key = 'tag'), ' ')
)

What version of PostgreSQL are you on? And how flexible is your schema design? In other words, can you change it at will? Or is this out of your control?
Two things immediately came to mind when I read your question. One is that you should be able to use arrays and the @> (contains) or <@ (is contained by) operators.
Here is documentation
Second, you might be able to use an hstore and do a similar operation:
hstore @> hstore
It's not a true hstore, because you are not using a real key=>value pair. But, I guess you could do {tagname}=True or {tagname}=NULL. Might be a bit hackish.
See the hstore documentation (for PostgreSQL 9.1) for how to use it.
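The exclusion rule itself (drop a post when any of its tags appears in the exclude table) can be sketched outside the database. The following plain-Python illustration is not PostgreSQL code; visible_posts is a hypothetical helper, and it assumes tags are whitespace-separated and compared case-insensitively.

```python
def visible_posts(posts, excluded_tags):
    """Keep only posts that share no tag with the exclude list."""
    excluded = {tag.lower() for tag in excluded_tags}
    return [
        post for post in posts
        if excluded.isdisjoint(tag.lower() for tag in post["tags"].split())
    ]


exclude = ["#oxford", "#uk", "#england"]
posts = [
    {"id": 1, "tags": "#oxford #funtimes", "text": "Sitting in the sun"},
    {"id": 2, "tags": "#oz", "text": "Beach fun"},
    {"id": 3, "tags": "#england", "text": "Milk was a bad choice"},
]

kept_ids = [post["id"] for post in visible_posts(posts, exclude)]
```

An in-database version would apply the same "no shared element" test to the tags, for instance after storing them as an array as suggested above.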

Aggregating MoreLikeThis Results in RavenDB

I have been trying out the MoreLikeThis Bundle to bring back a set of documents ordered by the number of matches in a field called 'backyardigans' compared to a key document. This all works as expected.
But what I would like to do is order by the number of matches of 3 separate fields added together.
An example record would be:
var data = new Data{
backyardigans = "Pablo Tasha Uniqua Tyrone Austin",
engines = "Thomas Percy Henry Toby",
pigs = "Daddy Peppa George Mummy Granny"
};
If another document matched 1 backyardigan 2 engines and 1 pig it would get a score of 4
If another document matched 2 backyardigans 4 engines and 0 pigs it would get a score of 6
These aggregated scores would be the field we order the results by, so they would come back 6, 4 and so on.
Is there a way to achieve this with the MoreLikeThis bundle please?

This isn't possible, we use only a single field frequency for this.
This is important because we need to compare the score on a field basis, and it isn't really possible to compare it on a global basis without taking into account the per fields values.
Note that this is also a limitation in the underlying Lucene implementation, so there isn't much we can do about it.