Lucene.net - query returning unwanted documents - lucene

All of my Lucene.net (2.9.2) documents have two fields:
categoryid
bodytext
bodytext is the default field and holds all of the document's text (indexed with Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS).
categoryid is just a numeric field stored as text: Field.Store.YES, Field.Index.NOT_ANALYZED
When I execute the query categoryid:1, it returns only documents with that category ID.
However, the query categoryid:1 foo bar also returns documents from categories other than 1.
Why is this? And how can I force it to respect the original categoryid:N query term?

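Do you want to require all of the entered words to be present in the matched documents? By default, QueryParser combines terms with OR, so categoryid:1 foo bar matches any document that contains at least one of the three terms; the category term is just one optional clause among them. Setting the default operator to AND makes every term required: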
var analyzer = new StandardAnalyzer(Version.LUCENE_30);
var queryParser = new QueryParser(Version.LUCENE_30, "bodytext", analyzer);
// This ensures that all terms are required.
queryParser.DefaultOperator = QueryParser.Operator.AND;
var query = queryParser.Parse("categoryid:1 foo bar");
// query = "+categoryid:1 +bodytext:foo +bodytext:bar"
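To see why documents from other categories come back, here is a toy model of the matching rule a Lucene BooleanQuery applies (plain C#, not Lucene.Net code; the Occur, Clause and Doc types are made up for illustration). When at least one clause is required (MUST, written + in query syntax), only the required clauses decide whether a document matches and the optional (SHOULD) clauses only affect scoring; when every clause is optional, any single one is enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of BooleanQuery matching rules; this is NOT Lucene.Net code.
public enum Occur { Must, Should }

public record Clause(string Field, string Term, Occur Kind);

public record Doc(int Id, string CategoryId, string BodyText);

public static class BooleanModel
{
    // Does the document contain this term in the clause's field?
    static bool HasTerm(Doc d, Clause c) =>
        c.Field == "categoryid"
            ? d.CategoryId == c.Term
            : d.BodyText.Split(' ').Contains(c.Term);

    // If any required clause exists, all required clauses must match and the
    // optional ones are ignored for matching; otherwise one optional clause suffices.
    public static bool Matches(Doc d, IList<Clause> clauses)
    {
        var required = clauses.Where(c => c.Kind == Occur.Must).ToList();
        if (required.Count > 0)
            return required.All(c => HasTerm(d, c));
        return clauses.Any(c => HasTerm(d, c));
    }

    // "categoryid:1 foo bar" under the default OR operator: every clause optional.
    public static List<Clause> DefaultOr() => new List<Clause> {
        new Clause("categoryid", "1", Occur.Should),
        new Clause("bodytext", "foo", Occur.Should),
        new Clause("bodytext", "bar", Occur.Should) };

    // "+categoryid:1 foo bar": only the category clause is required.
    public static List<Clause> RequiredCategory() => new List<Clause> {
        new Clause("categoryid", "1", Occur.Must),
        new Clause("bodytext", "foo", Occur.Should),
        new Clause("bodytext", "bar", Occur.Should) };

    // "+categoryid:1 +bodytext:foo +bodytext:bar": every clause required (AND).
    public static List<Clause> AllRequired() => new List<Clause> {
        new Clause("categoryid", "1", Occur.Must),
        new Clause("bodytext", "foo", Occur.Must),
        new Clause("bodytext", "bar", Occur.Must) };
}
```

In this model a category-2 document containing "foo" matches DefaultOr() but not RequiredCategory(). Note that +categoryid:1 foo bar still keeps category-1 documents that contain neither foo nor bar; if the words must be present too, use the AND operator shown above.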

Related

Dynamic fields for my index not coming up in the list of fields

I am trying to get the fields of my index using the code snippet below.
var fieldsList = DocumentStore.DatabaseCommands.GetIndex("IndexName").Fields.ToList();
This returns a string list with all fields defined in the index except the dynamic fields (the fields created through _).
Here is the Map command for my index.
Map = products =>
from product in products
select new
{
product.Title,
product.Subject,
product.From,
_ = product.Attributes.Select(attribute =>
CreateField(attribute.Name, attribute.Value, false, true))
};
That is by design. The list contains only the static fields defined in your index; we don't try to discover the dynamic ones.
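Because the Map names each dynamic field after attribute.Name, one possible workaround (my assumption, not a RavenDB feature) is to compute the dynamic field names on the client from the documents themselves. The Product and ProductAttribute classes below are hypothetical stand-ins for your document classes, and loading the documents from RavenDB is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical client-side classes shaped like the documents in the Map above.
public class ProductAttribute
{
    public string Name { get; set; } = "";
    public string Value { get; set; } = "";
}

public class Product
{
    public string Title { get; set; } = "";
    public List<ProductAttribute> Attributes { get; set; } = new List<ProductAttribute>();
}

public static class DynamicFieldNames
{
    // CreateField(attribute.Name, ...) emits one field per attribute name,
    // so the dynamic field names are the distinct attribute names.
    public static List<string> From(IEnumerable<Product> products) =>
        products.SelectMany(p => p.Attributes)
                .Select(a => a.Name)
                .Distinct()
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToList();
}
```

This scans every document, so for a large collection it may be cheaper to keep the list of attribute names somewhere else as they are written.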

Getting greater count result in raven index statistics

I have an index, with a transformation:
docs.FeedPosts
.SelectMany(doc => (doc.Labels).DefaultIfEmpty(), (doc, docLabelsItem1) => new {
AnnouncementGuid = doc.AnnouncementGuid,
CreationDateUtc = doc.CreationWhenAndWhere.Time,
FeedOwner = doc.FeedOwner,
Key = doc.Key,
Labels_Text = doc.Labels.Select(label => label.Text),
SequentialId = ((long)doc.SequentialId),
SubjectGuid = doc.SubjectGuid,
SubjectId = doc.SubjectId})
(transform)
results
.Select(doc => new {doc = doc, tags = Database.Load(doc.Key)})
.Select(__h__TransparentIdentifier1 => new {
AnnouncementGuid = __h__TransparentIdentifier1.tags.AnnouncementGuid,
AreCommentsLocked = __h__TransparentIdentifier1.tags.AreCommentsLocked,
Author = __h__TransparentIdentifier1.tags.Author,
Comments = __h__TransparentIdentifier1.tags.Comments,
CreationWhenAndWhere = __h__TransparentIdentifier1.tags.CreationWhenAndWhere,
FeedOwner = __h__TransparentIdentifier1.tags.FeedOwner,
Key = __h__TransparentIdentifier1.tags.Key,
Labels = __h__TransparentIdentifier1.tags.Labels,
MessageBody = __h__TransparentIdentifier1.tags.MessageBody,
SequentialId = __h__TransparentIdentifier1.tags.SequentialId,
SubjectGuid = __h__TransparentIdentifier1.tags.SubjectGuid,
SubjectId = __h__TransparentIdentifier1.tags.SubjectId})
This works for QUERYING the data, but when I then request statistics, I get back the wrong document count! (I limit the query to 0 results and request Raven statistics.)
It seems that, because my document looks like this:
{
Labels: [
{ Text: "label 1" },
{ Text: "label 2" }
]
}
Raven generates TWO index entries for this one document. If I look in the Raven query tool, the first entry contains the actual index data and the second is completely empty.
If I have 3 labels in a document, it generates 3 index entries, and my count is three times what it should be.
What's going on?
Thanks
I translated your index to the query syntax, because that is easier to look at:
from doc in docs.FeedPosts
from NOT_USING_THIS in doc.Labels
select new
{
AnnouncementGuid = doc.AnnouncementGuid,
CreationDateUtc = doc.CreationWhenAndWhere.Time,
FeedOwner = doc.FeedOwner,
Key = doc.Key,
Labels_Text = doc.Labels.Select(label => label.Text),
SequentialId = ((long)doc.SequentialId),
SubjectGuid = doc.SubjectGuid,
SubjectId = doc.SubjectId
}
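The second from clause is the SelectMany() in your index, whose value you are not using. Each element it iterates over produces its own index entry, which is why a post with three labels yields three entries.
If you remove it and work on top of the root object, you won't have this issue.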
The docs for this are:
http://ravendb.net/docs/faq/skipped-results
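The fan-out can be reproduced with plain LINQ to Objects. This is a sketch with hypothetical FeedPost, Label and IndexEntry types, not RavenDB's indexer: the SelectMany shape emits one entry per label (or one for an empty list, because of DefaultIfEmpty), while mapping the root object emits exactly one entry per document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document classes; plain LINQ to Objects, not RavenDB's indexer.
public class Label { public string Text { get; set; } = ""; }

public class FeedPost
{
    public string Key { get; set; } = "";
    public List<Label> Labels { get; set; } = new List<Label>();
}

public record IndexEntry(string Key, List<string> Labels_Text);

public static class MapShapes
{
    // Same shape as the original map: one entry per label, even though the
    // label variable is never used in the output.
    public static List<IndexEntry> WithSelectMany(IEnumerable<FeedPost> docs) =>
        docs.SelectMany(doc => doc.Labels.DefaultIfEmpty(),
                        (doc, notUsed) => new IndexEntry(doc.Key, doc.Labels.Select(l => l.Text).ToList()))
            .ToList();

    // Root-object map: exactly one entry per document, with every label's
    // text collected into a single multi-valued field.
    public static List<IndexEntry> PerDocument(IEnumerable<FeedPost> docs) =>
        docs.Select(doc => new IndexEntry(doc.Key, doc.Labels.Select(l => l.Text).ToList()))
            .ToList();
}
```

For a post with three labels, WithSelectMany returns three entries and PerDocument returns one, which matches the count inflation described in the question.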

Raven query returns 0 results for collection contains

I have a basic schema
Post {
Labels: [
{ Text: "Mine" },
{ Text: "Incomplete" }
]
}
I am querying Raven for all posts that have BOTH the "Mine" and "Incomplete" labels.
queryable.Where(candidate => candidate.Labels.Any(label => label.Text == "Mine"))
.Where(candidate => candidate.Labels.Any(label => label.Text == "Incomplete"));
This results in the following Raven query (from the Raven server console):
Query: (Labels,Text:Incomplete) AND (Labels,Text:Mine)
Time: 3 ms
Index: Temp/XWrlnFBeq8ENRd2SCCVqUQ==
Results: 0 returned out of 0 total.
Why is this? If I query only for "Incomplete", I get 1 result.
If I query only for "Mine", I get the same result. So WHY do I get 0 results when I query for both?
EDIT:
OK, I got a little further. The automatically generated index looks like this:
from doc in docs.FeedAnnouncements
from docLabelsItem in ((IEnumerable<dynamic>)doc.Labels).DefaultIfEmpty()
select new { CreationDate = doc.CreationDate, Labels_Text = docLabelsItem.Text }
So, I THINK the query was basically testing the SAME label for 2 different values. Bad.
I changed it to this:
from doc in docs.FeedAnnouncements
from docLabelsItem1 in ((IEnumerable<dynamic>)doc.Labels).DefaultIfEmpty()
from docLabelsItem2 in ((IEnumerable<dynamic>)doc.Labels).DefaultIfEmpty()
select new { CreationDate = doc.CreationDate, Labels1_Text = docLabelsItem1.Text, Labels2_Text = docLabelsItem2.Text }
Now my query (in Raven Studio) Labels1_Text:Mine AND Labels2_Text:Incomplete WORKS!
But, how do I address these phantom fields (Labels1_Text and Labels2_Text) when querying from Linq?
Adam,
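You got the reason right. The default index generates two index entries for that post, one per label, and all clauses of your query have to match within a single index entry. Since no single entry contains both labels, nothing matches.
What you want is either to use an intersection query or to create your own index like this: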
from doc in docs.FeedAnnouncements
select new { Labels_Text = doc.Labels.Select(x=>x.Text)}
That gives you all of the labels' text in a single index entry, which you can execute a query on.
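The difference between the two index shapes can be modeled with plain LINQ (a toy sketch with a hypothetical Post record, not RavenDB or Lucene code). In the auto index each entry carries one label, so "equals Mine AND equals Incomplete" can never hold for the same entry; in the custom index the single entry holds every label, so both values can match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of the two index shapes; plain LINQ, not RavenDB or Lucene code.
public record Post(string Id, string[] Labels);

public static class LabelIndexes
{
    // Auto index shape: one entry per (post, label), one Labels_Text value each.
    static IEnumerable<(string PostId, string LabelsText)> PerLabel(IEnumerable<Post> posts) =>
        posts.SelectMany(p => p.Labels, (p, text) => (PostId: p.Id, LabelsText: text));

    // Both conditions must hold for the SAME entry, so a single-valued field
    // can never satisfy two different label values at once.
    public static List<string> QueryPerLabel(IEnumerable<Post> posts, string a, string b) =>
        PerLabel(posts).Where(e => e.LabelsText == a && e.LabelsText == b)
                       .Select(e => e.PostId).Distinct().ToList();

    // Custom index shape: one entry per post whose Labels_Text holds every
    // label, so both values can match within that one entry.
    public static List<string> QuerySingleEntry(IEnumerable<Post> posts, string a, string b) =>
        posts.Where(p => p.Labels.Contains(a) && p.Labels.Contains(b))
             .Select(p => p.Id).ToList();
}
```

With a post labeled "Mine" and "Incomplete", QueryPerLabel returns nothing while QuerySingleEntry returns that post, mirroring the 0-result query and the fix.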

linq to sql/xml - generate xml for linked tables

I have a lot of tables with a lot of columns and want to generate XML using LINQ without having to specify the column names. Here's a quick example:
users
---------------
user_id
name
email
user_addresses
---------------
address_id
user_id
city
state
The XML I want to generate with LINQ would look like this:
<user>
<name>john</name>
<email>john@dlsjkf.com</email>
<address>
<city>charleston</city>
<state>sc</state>
</address>
<address>
<city>charlotte</city>
<state>nc</state>
</address>
</user>
So I'm guessing the code would look something like this:
var userxml = new XElement("user",
from row in dc.Users where row.user_id == 5
select (what do i put here??)
);
I can do this for one table but can't figure out how to generate the XML for a linked table (like user_addresses).
Any ideas?
OK, I found a way to get the XML I want, but I have to specify the related table names in the query, which is good enough for now, I guess. Here's the code:
XElement root = new XElement("root",
from row in dc.users
where row.user_id == 5
select new XElement("user",
row.AsXElements(),
new XElement("addresses",
from row2 in dc.user_addresses
where row2.user_id == row.user_id // correlate with the outer user instead of repeating the literal
select new XElement("address", row2.AsXElements())
)
)
);
using System;
using System.Collections.Generic;
using System.Xml.Linq;
// Extension methods must be declared in a non-nested static class.
public static class XElementExtensions
{
// Generates XML elements named after the table column names,
// skipping null values and properties without a [Column] attribute.
public static IEnumerable<XElement> AsXElements(this object source)
{
if (source == null) throw new ArgumentNullException("source");
foreach (System.Reflection.PropertyInfo prop in source.GetType().GetProperties())
{
object value = prop.GetValue(source, null);
if (value != null)
{
bool isColumn = false;
foreach (object obj in prop.GetCustomAttributes(true))
{
System.Data.Linq.Mapping.ColumnAttribute attribute = obj as System.Data.Linq.Mapping.ColumnAttribute;
if (attribute != null)
{
isColumn = true;
break;
}
}
if (isColumn)
{
yield return new XElement(prop.Name, value);
}
}
}
}
}
You need to use a join. Here's one way:
var query = from user in dc.Users
from addr in dc.UserAddress
where user.Id == addr.UserId
select new XElement("user",
new XElement("name", user.Name),
new XElement("email", user.Email),
new XElement("address",
new XElement("city", addr.City),
new XElement("state", addr.State)));
foreach (var item in query)
Console.WriteLine(item);
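The query above pairs each user with each matching address, so a user with two addresses comes out as two separate user elements. To get the nested shape from the question, a group join (join ... into) collects each user's addresses first. The sketch below runs against hypothetical in-memory User and UserAddress records standing in for the LINQ to SQL entities; I'd expect the same query shape to work against dc.Users and dc.UserAddress, but I haven't verified how LINQ to SQL translates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory stand-ins for the users and user_addresses tables.
public record User(int UserId, string Name, string Email);
public record UserAddress(int AddressId, int UserId, string City, string State);

public static class UserXml
{
    // The group join keeps each user's addresses together, so every user
    // becomes one <user> element with nested <address> children.
    public static IEnumerable<XElement> Build(IEnumerable<User> users, IEnumerable<UserAddress> addresses) =>
        from user in users
        join addr in addresses on user.UserId equals addr.UserId into userAddresses
        select new XElement("user",
            new XElement("name", user.Name),
            new XElement("email", user.Email),
            // An IEnumerable<XElement> passed as content is added element by element.
            from addr in userAddresses
            select new XElement("address",
                new XElement("city", addr.City),
                new XElement("state", addr.State)));
}
```

For user 5 with the two addresses from the question, Build returns a single user element containing both address elements.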
"I have a lot of tables with a lot of columns and want to generate XML using LINQ without having to specify the column names."
Not quite sure how you want to achieve that. You need to state the column names that go into the XML. Even if you were to reflect over the property names, how would you filter out the undesired ones and structure them properly without specifying the column names? For example, how would you set up the address part? You could get the properties by calling typeof(User).GetProperties() and typeof(UserAddress).GetProperties() and go through the Name of each one (LINQ to SQL entities expose their columns as public properties, so GetFields() would not find them), but then what?

SQL to Insert data into multiple tables from one POST in WebMatrix Razor Syntax

I've got two form fields from which the user submits a 'category' and an 'item'.
The following code inserts the category fine (I modified it from the WebMatrix intro PDF) but I've no idea how to then insert the 'item' into the Items table. I'll also need to add the Id of the new category to the new item row.
This is the code that's working so far
@{ var db = Database.OpenFile("StarterSite.sdf");
var Category = Request["Category"]; //was name
var Item = Request["Item"]; //was description
if (IsPost) {
// Read the category name.
Category = Request["Category"];
if (Category.IsEmpty()) {
Validation.AddFieldError("Category", "Category is required");
}
// Read the item name.
Item = Request["Item"];
if (Item.IsEmpty()) {
Validation.AddFieldError("Item",
"Item type is required.");
}
// Define the insert query. The value to assign to the
// CategoryName column in the Category table is passed as a
// parameter (@0) in the VALUES clause.
if(Validation.Success) {
var insertQuery = "INSERT INTO Category (CategoryName) " +
"VALUES (@0)";
db.Execute(insertQuery, Category);
// Redirect to the success page.
Response.Redirect(@Href("~/success"));
}
}
}
I'm guessing/hoping this is a very easy question to answer so hopefully there isn't much more detail required - but please let me know if there is. Thanks.
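There's a Database.GetLastInsertId method in WebMatrix that returns the ID of the last inserted record (assuming you are using an IDENTITY column). Call it right after the first insert, inside the if (Validation.Success) block and before the redirect:
db.Execute(insertQuery, Category);
var id = (int)db.GetLastInsertId(); // id is the new CategoryId
// The Items column names below are assumed; adjust them to your schema.
var secondInsertQuery = "INSERT INTO Items (ItemName, CategoryId) VALUES (@0, @1)";
db.Execute(secondInsertQuery, Item, id);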