Easiest way to reindex Lucene.Net indexes when NHibernate.Search is used? - nhibernate

Context =>
Calling WCF, some random stored procedures and SQL stuff theoretically imports some data.
Requirements =>
Reindex the Lucene indexes for some of the imported entities.
Question =>
What's the easiest way to do that?
Theoretically, if NHibernate is initialized, NHibernate.Search should know which entities are supposed to be indexed. So I was wondering: are there any ready-to-use tools or similar to fulfill my requirement?
Is this the only way?

My quick and dirty approach =>
public static class LuceneReindexer
{
    public static void Run()
    {
        var entityTypes = typeof(FooEntity).Assembly.GetTypes()
            .Where(x => x.BaseType == typeof(Entity)
                || x.BaseType == typeof(KeyedEntity));
        foreach (var t in entityTypes)
            if (TypeDescriptor
                .GetAttributes(t)[typeof(IndexedAttribute)] != null)
                ReindexEntity(t);
    }

    private static void ReindexEntity(Type t)
    {
        var stop = false;
        var index = 0;
        const int pageSize = 500;
        do
        {
            var list = NHibernateSession.Current.CreateCriteria(t)
                .SetFirstResult(index)
                .SetMaxResults(pageSize).List();
            NHibernateSession.Current.Transaction.Begin();
            foreach (var itm in list)
                NHibernateSession.Current.Index(itm);
            NHibernateSession.Current.Transaction.Commit();
            index += pageSize;
            if (list.Count < pageSize) stop = true;
        } while (!stop);
    }
}
I haven't thought through the transaction and paging parts (and don't care at the moment). It kind of does what I needed. :D
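For reference, here is a minimal sketch of the same loop written directly against NHibernate.Search's IFullTextSession, assuming an open ISession is available and that your NHibernate.Search version exposes PurgeAll; this is not from the original post, just the conventional shape of a rebuild.

// Sketch only: "session" is assumed to be an open NHibernate ISession.
public static void Reindex(ISession session, Type entityType)
{
    const int pageSize = 500;
    var fullTextSession = Search.CreateFullTextSession(session);

    // Drop the existing index entries for this type before rebuilding.
    fullTextSession.PurgeAll(entityType);

    var index = 0;
    while (true)
    {
        var page = session.CreateCriteria(entityType)
            .SetFirstResult(index)
            .SetMaxResults(pageSize)
            .List();

        using (var tx = fullTextSession.BeginTransaction())
        {
            foreach (var entity in page)
                fullTextSession.Index(entity);
            tx.Commit();
        }

        session.Clear(); // keep the first-level cache small between batches
        if (page.Count < pageSize)
            break;
        index += pageSize;
    }
}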

Related

My Akka.Net Demo is incredibly slow

I am trying to get a proof of concept running with akka.net. I am sure that I am doing something terribly wrong, but I can't figure out what it is.
I want my actors to form a graph of nodes. Later, this will be a complex graph of business objects, but for now I want to try a simple linear chain of nodes (node #0 through node #9).
I want to ask a node for a neighbour that is 9 steps away. I am trying to implement this in a recursive manner. I ask node #9 for a neighbour that is 9 steps away, then I ask node #8 for a neighbour that is 8 steps away and so on. Finally, this should return node #0 as an answer.
Well, my code works, but it takes more than 4 seconds to execute. Why is that?
This is my full code listing:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Akka;
using Akka.Actor;

namespace AkkaTest
{
    class Program
    {
        public static Stopwatch stopwatch = new Stopwatch();

        static void Main(string[] args)
        {
            var system = ActorSystem.Create("MySystem");
            IActorRef[] current = new IActorRef[0];
            Console.WriteLine("Initializing actors...");
            for (int i = 0; i < 10; i++)
            {
                var current1 = current;
                var props = Props.Create<Obj>(() => new Obj(current1, Guid.NewGuid()));
                var actorRef = system.ActorOf(props, i.ToString());
                current = new[] { actorRef };
            }
            Console.WriteLine("actors initialized.");
            FindNeighboursRequest r = new FindNeighboursRequest(9);
            stopwatch.Start();
            var response = current[0].Ask(r);
            FindNeighboursResponse result = (FindNeighboursResponse)response.Result;
            stopwatch.Stop();
            foreach (var d in result.FoundNeighbours)
            {
                Console.WriteLine(d);
            }
            Console.WriteLine("Search took " + stopwatch.ElapsedMilliseconds + "ms.");
            Console.ReadLine();
        }
    }

    public class FindNeighboursRequest
    {
        public FindNeighboursRequest(int distance)
        {
            this.Distance = distance;
        }
        public int Distance { get; private set; }
    }

    public class FindNeighboursResponse
    {
        private IActorRef[] foundNeighbours;

        public FindNeighboursResponse(IEnumerable<IActorRef> descendants)
        {
            this.foundNeighbours = descendants.ToArray();
        }

        public IActorRef[] FoundNeighbours
        {
            get { return this.foundNeighbours; }
        }
    }

    public class Obj : ReceiveActor
    {
        private Guid objGuid;
        readonly List<IActorRef> neighbours = new List<IActorRef>();

        public Obj(IEnumerable<IActorRef> otherObjs, Guid objGuid)
        {
            this.neighbours.AddRange(otherObjs);
            this.objGuid = objGuid;
            Receive<FindNeighboursRequest>(r => handleFindNeighbourRequest(r));
        }

        public Obj()
        {
        }

        private async void handleFindNeighbourRequest(FindNeighboursRequest r)
        {
            if (r.Distance == 0)
            {
                FindNeighboursResponse response = new FindNeighboursResponse(new IActorRef[] { Self });
                Sender.Tell(response, Self);
                return;
            }
            List<FindNeighboursResponse> responses = new List<FindNeighboursResponse>();
            foreach (var actorRef in neighbours)
            {
                FindNeighboursRequest req = new FindNeighboursRequest(r.Distance - 1);
                var response2 = actorRef.Ask(req);
                responses.Add((FindNeighboursResponse)response2.Result);
            }
            FindNeighboursResponse response3 = new FindNeighboursResponse(responses.SelectMany(rx => rx.FoundNeighbours));
            Sender.Tell(response3, Self);
        }
    }
}
The reason for such slow behavior is the way you use Ask (and the fact that you use it at all, but I'll cover that later). In your example, you're asking each neighbour in a loop and then immediately reading response2.Result, which actively blocks the current actor (and the thread it resides on). So you're essentially making the flow synchronous and blocking.
The easiest way to fix that is to collect all the tasks returned from Ask and await them together with Task.WhenAll, instead of waiting for each one inside the loop. Take this example:
public class Obj : ReceiveActor
{
    private readonly IActorRef[] _neighbours;
    private readonly Guid _id;

    public Obj(IActorRef[] neighbours, Guid id)
    {
        _neighbours = neighbours;
        _id = id;
        Receive<FindNeighboursRequest>(async r =>
        {
            if (r.Distance == 0) Sender.Tell(new FindNeighboursResponse(new[] { Self }));
            else
            {
                var request = new FindNeighboursRequest(r.Distance - 1);
                var replies = _neighbours.Select(neighbour => neighbour.Ask<FindNeighboursResponse>(request));
                var ready = await Task.WhenAll(replies);
                var responses = ready.SelectMany(x => x.FoundNeighbours);
                Sender.Tell(new FindNeighboursResponse(responses.ToArray()));
            }
        });
    }
}
This one is much faster.
NOTE: In general you shouldn't use Ask inside of an actor:
Each Ask allocates a listener for the reply, so in general using Ask is A LOT heavier than passing messages with Tell.
When sending messages through a chain of actors, Ask additionally pays for transporting each message twice through every actor (once for the request and once for the reply). A popular pattern is: when you send a request A⇒B⇒C⇒D and need to respond from D back to A, you can reply directly D⇒A without passing the message back through the whole chain. Usually a combination of Forward/Tell works better (a rough sketch follows below).
In general, don't use the async version of Receive unless it's necessary; at the moment it's slower for an actor than the sync version.
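To make the Forward/Tell point concrete, here is a rough sketch that is not part of the original answer; ChainNode is illustrative, and for simplicity it only returns the final node rather than aggregating the whole path. Each intermediate actor forwards the request, so the last node's Tell goes straight back to the original requester.

public class ChainNode : ReceiveActor
{
    private readonly IActorRef _next; // null for the last node in the chain

    public ChainNode(IActorRef next)
    {
        _next = next;
        Receive<FindNeighboursRequest>(r =>
        {
            if (r.Distance == 0 || _next == null)
            {
                // Reply goes directly to the original requester,
                // because Forward preserved the original Sender.
                Sender.Tell(new FindNeighboursResponse(new[] { Self }));
            }
            else
            {
                // Forward keeps the original Sender intact, unlike a plain Tell from this actor.
                _next.Forward(new FindNeighboursRequest(r.Distance - 1));
            }
        });
    }
}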

Fluent nHibernate, Hi-Lo table with entity-per-row using a convention

Is there a way to specify a table to use for Hi-Lo values, with a row per entity, via a convention (while still having NHibernate create the table structure for you)? I would like to replicate what Phil Haydon blogged about here, but without having to manually manage the table. As it stands, migrating his row-per-table code to its own convention will only work if you've already created the appropriate 'TableKey' entries in the table.
Alternatively, is this possible via the XML mappings?
And if all else fails, is the only other appropriate option to use a custom generator, a la this post?
Fabio Maulo talked about this in one of his mapping-by-code posts.
Mapping by code example:
mapper.BeforeMapClass += (mi, type, map) =>
    map.Id(idmap => idmap.Generator(Generators.HighLow,
        gmap => gmap.Params(new
        {
            table = "NextHighValues",
            column = "NextHigh",
            max_lo = 100,
            where = string.Format(
                "EntityName = '{0}'", type.Name.ToLowerInvariant())
        })));
For FluentNHibernate, you could do something like:
public class PrimaryKeyConvention : IIdConvention
{
    public void Apply(IIdentityInstance instance)
    {
        var type = instance.EntityType.Name;
        instance.Column(type + "Id");
        instance.GeneratedBy.HiLo(type, "NextHigh", "100",
            x => x.AddParam("where", String.Format("EntityName = '{0}'", type)));
    }
}
Also, Fabio explained how you could use IAuxiliaryDatabaseObject to create the Hi-Lo setup script.
private static IAuxiliaryDatabaseObject CreateHighLowScript(
    IModelInspector inspector, IEnumerable<Type> entities)
{
    var script = new StringBuilder(3072);
    script.AppendLine("DELETE FROM NextHighValues;");
    script.AppendLine(
        "ALTER TABLE NextHighValues ADD EntityName VARCHAR(128) NOT NULL;");
    script.AppendLine(
        "CREATE NONCLUSTERED INDEX IdxNextHighValuesEntity ON NextHighValues "
        + "(EntityName ASC);");
    script.AppendLine("GO");
    foreach (var entity in entities.Where(x => inspector.IsRootEntity(x)))
    {
        script.AppendLine(string.Format(
            "INSERT INTO [NextHighValues] (EntityName, NextHigh) VALUES ('{0}',1);",
            entity.Name.ToLowerInvariant()));
    }
    return new SimpleAuxiliaryDatabaseObject(
        script.ToString(), null, new HashedSet<string> {
            typeof(MsSql2005Dialect).FullName, typeof(MsSql2008Dialect).FullName
        });
}
You would use it like this:
configuration.AddAuxiliaryDatabaseObject(CreateHighLowScript(
    modelInspector, Assembly.GetExecutingAssembly().GetExportedTypes()));
For users of Fluent NHibernate, Anthony Dewhirst has posted a nice solution over here: http://www.anthonydewhirst.blogspot.co.uk/2012/02/fluent-nhibernate-solution-to-enable.html
Building off of Anthony Dewhirst's already excellent solution, I ended up with the following, which adds a couple of improvements:
Adds Acceptance Criteria so that it doesn't try to handle non-integral Id types (e.g. Guid) and won't stomp on Id mappings which have a generator explicitly set
Script generation takes Dialect into consideration
public class HiLoIdGeneratorConvention : IIdConvention, IIdConventionAcceptance
{
    public const string EntityColumnName = "entity";
    public const string MaxLo = "500";

    public void Accept(IAcceptanceCriteria<IIdentityInspector> criteria)
    {
        criteria
            .Expect(x => x.Type == typeof(int) || x.Type == typeof(uint)
                || x.Type == typeof(long) || x.Type == typeof(ulong)) // HiLo only works with integral types
            .Expect(x => x.Generator.EntityType == null); // Specific generator has not been mapped
    }

    public void Apply(IIdentityInstance instance)
    {
        instance.GeneratedBy.HiLo(
            TableGenerator.DefaultTableName, TableGenerator.DefaultColumnName, MaxLo,
            builder => builder.AddParam(TableGenerator.Where,
                string.Format("{0} = '{1}'", EntityColumnName, instance.EntityType.FullName)));
    }

    public static void CreateHighLowScript(NHibernate.Cfg.Configuration config)
    {
        var dialect = Activator.CreateInstance(
            Type.GetType(config.GetProperty(NHibernate.Cfg.Environment.Dialect))) as Dialect;
        var script = new StringBuilder();
        script.AppendFormat("DELETE FROM {0};", TableGenerator.DefaultTableName);
        script.AppendLine();
        script.AppendFormat("ALTER TABLE {0} {1} {2} {3} NOT NULL;",
            TableGenerator.DefaultTableName, dialect.AddColumnString,
            EntityColumnName, dialect.GetTypeName(SqlTypeFactory.GetAnsiString(128)));
        script.AppendLine();
        script.AppendFormat("CREATE NONCLUSTERED INDEX IX_{0}_{1} ON {0} ({1} ASC);",
            TableGenerator.DefaultTableName, EntityColumnName);
        script.AppendLine();
        if (dialect.SupportsSqlBatches)
        {
            script.AppendLine("GO");
            script.AppendLine();
        }
        foreach (var entityName in config.ClassMappings.Select(m => m.EntityName).Distinct())
        {
            script.AppendFormat("INSERT INTO [{0}] ({1}, {2}) VALUES ('{3}',1);",
                TableGenerator.DefaultTableName, EntityColumnName,
                TableGenerator.DefaultColumnName, entityName);
            script.AppendLine();
        }
        if (dialect.SupportsSqlBatches)
        {
            script.AppendLine("GO");
            script.AppendLine();
        }
        config.AddAuxiliaryDatabaseObject(new SimpleAuxiliaryDatabaseObject(script.ToString(), null));
    }
}
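A hedged usage sketch, not part of the original answer: register the convention with Fluent NHibernate and call CreateHighLowScript from ExposeConfiguration once the NHibernate Configuration has been built. FooEntity and connectionString are placeholders.

var sessionFactory = Fluently.Configure()
    .Database(MsSqlConfiguration.MsSql2008.ConnectionString(connectionString))
    .Mappings(m => m.FluentMappings
        .AddFromAssemblyOf<FooEntity>()                 // assumed mapping assembly
        .Conventions.Add<HiLoIdGeneratorConvention>())  // picks up the convention above
    .ExposeConfiguration(HiLoIdGeneratorConvention.CreateHighLowScript)
    .BuildSessionFactory();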

Generate test data in Raven DB

I am looking for a preferred and maintainable way to generate test data in RavenDB. Currently, our team does have a way to do it through .NET code; an example is provided below.
However, I am looking for different options. Please share.
public void Execute()
{
    using (var documentStore = new DocumentStore { ConnectionStringName = "RavenDb" })
    {
        documentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.QueryYourWrites;
        // Override the default key prefix generation strategy of Pascal case to lower case.
        documentStore.Conventions.FindTypeTagName = type => DocumentConvention.DefaultTypeTagName(type).ToLower();
        documentStore.Initialize();
        InitializeData(documentStore);
    }
}
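The InitializeData helper is not shown in the question; as a rough, assumed sketch, it would typically open a session, store a handful of documents, and save. Customer is a placeholder document type, not something from the original post.

public class Customer
{
    public string Id { get; set; }   // RavenDB assigns ids like "customers/1"
    public string Name { get; set; }
}

private static void InitializeData(IDocumentStore documentStore)
{
    // Assumed shape of the seeding helper referenced above.
    using (var session = documentStore.OpenSession())
    {
        session.Store(new Customer { Name = "Alice" });
        session.Store(new Customer { Name = "Bob" });
        session.SaveChanges();
    }
}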
Edit: RavenOverflow is really helpful. Thanks for pointing me to the right place.
Try checking out RavenOverflow. In there, I've got a FakeData project that has fake data (both hardcoded AND randomly generated). This can then be used in either my Tests project or the Main Website :)
Here's some sample code...
if (isDataToBeSeeded)
{
    HelperUtilities.CreateSeedData(documentStore);
}
....
public static void CreateSeedData(IDocumentStore documentStore)
{
    Condition.Requires(documentStore).IsNotNull();
    using (IDocumentSession documentSession = documentStore.OpenSession())
    {
        // First, check to make sure we don't have any data.
        var user = documentSession.Load<User>(1);
        if (user != null)
        {
            // ooOooo! we have a user, so it's assumed we actually have some seeded data.
            return;
        }

        // We have no users, so it's assumed we therefore have no data at all.
        // So let's fake some up :)

        // Users.
        ICollection<User> users = FakeUsers.CreateFakeUsers(50);
        StoreFakeEntities(users, documentSession);

        // Questions.
        ICollection<Question> questions = FakeQuestions.CreateFakeQuestions(users.Select(x => x.Id).ToList());
        StoreFakeEntities(questions, documentSession);

        documentSession.SaveChanges();

        // Make sure all our indexes are not stale.
        documentStore.WaitForStaleIndexesToComplete();
    }
}
....
public static ICollection<Question> CreateFakeQuestions(IList<string> userIds, int numberOfFakeQuestions)
{
    .... you get the idea .....
}

NHibernate Search and Lucene initial indexing of IList

I have an annoying problem indexing an IList using Lucene, and I cannot fix it.
My entity contains an IList to which I apply the IndexedEmbedded attribute, like this:
[ScriptIgnore] //will not serialize
[IndexedEmbedded(Depth = 1, Prefix = "BookAdditionalInfos_")]
public virtual IList<BookAdditionalInfo> BookAdditionalInfos { get; set; }
Also, some other properties use the Field attribute for indexing:
[Field(Index.Tokenized, Store = Store.Yes)]
After marking the entity for indexing, I have to do the initial indexing of 12 million rows (using batch processing). Everything works perfectly until I start to index the IList called BookAdditionalInfos. Without the IndexedEmbedded attribute (i.e. without indexing this IList) everything is OK, and every property marked with the Field attribute gets indexed.
I am using Fluent NHibernate.
What can be a problem?
Thank you
EDIT: I also looked at http://ayende.com/blog/3992/nhibernate-search, but without any luck.
The problem is: when I try to index the IList, indexing takes forever and nothing gets indexed. Without indexing this IList (i.e. without specifying IndexedEmbedded on it), indexing is fine and I get indexed results.
EDIT (Initial Indexing function):
public void BuildInitialBookSearchIndex()
{
    FSDirectory directory = null;
    IndexWriter writer = null;
    var type = typeof(Book);
    var info = new DirectoryInfo(GetIndexDirectory());
    //if (info.Exists)
    //{
    //    info.Delete(true);
    //}
    try
    {
        directory = FSDirectory.GetDirectory(Path.Combine(info.FullName, type.Name), true);
        writer = new IndexWriter(directory, new StandardAnalyzer(), true);
    }
    finally
    {
        if (directory != null)
        {
            directory.Close();
        }
        if (writer != null)
        {
            writer.Close();
        }
    }

    var fullTextSession = Search.CreateFullTextSession(Session);
    var currentIndex = 0;
    const int batchSize = 5000;
    while (true)
    {
        var entities = Session
            .CreateCriteria<Book>()
            .SetFirstResult(currentIndex)
            .SetMaxResults(batchSize)
            .List();
        using (var tx = Session.BeginTransaction())
        {
            foreach (var entity in entities)
            {
                fullTextSession.Index(entity);
            }
            currentIndex += batchSize;
            Session.Flush();
            tx.Commit();
            Session.Clear();
        }
        if (entities.Count < batchSize)
            break;
    }
}
It looks to me like you've got a classic N+1 select problem there: when you're selecting your books, you're not also selecting the BookAdditionalInfos, so NHibernate has to issue a new select for each and every book to retrieve its BookAdditionalInfos while indexing.
A quick fix would be to change your select to:
var entities = Session
    .CreateCriteria<Book>()
    .SetFetchMode("BookAdditionalInfos", FetchMode.Eager)
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .SetFirstResult(currentIndex)
    .SetMaxResults(batchSize)
    .List();
You'll probably run into additional problems with your paging now, however, because the query will join onto the BookAdditionalInfo table, giving you multiple rows for the same entity in the result set. So you might want to look at doing something like:
var pagedEntities = DetachedCriteria.For<Book>()
    .SetFirstResult(currentIndex)
    .SetMaxResults(batchSize)
    .SetProjection(Projections.Id());

var entities = Session
    .CreateCriteria<Book>()
    .Add(Property.ForName("id").In(pagedEntities))
    .SetFetchMode("BookAdditionalInfos", FetchMode.Eager)
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List();

EntityFramework, Insert if not exist, otherwise update

I have an entity set Countries, reflecting a database table (char(2), char(3), nvarchar(50)) in my database.
I have a parser that returns a Country[] array of parsed countries, and I'm having issues getting it updated in the right way. What I want is: take the array of countries, insert those that are not already in the database, and update the existing ones if any field differs. How can this be done?
void Method(object sender, DocumentLoadedEvent e)
{
    var data = e.ParsedData as Country[];
    using (var db = new DataContractEntities())
    {
        //Code missing
    }
}
I was thinking of something like
foreach (var c in data.Except(db.Countries)) but it won't work, as it compares on the wrong fields.
I hope someone has had this issue before and has a solution for me. If I can't use the Country object to insert/update an array of them easily, I don't see much benefit in using the framework; performance-wise I think it would be faster to write a custom SQL script that inserts them instead of, for example, checking whether each country is already in the database before inserting.
Solution
See the answer below instead.
I added an Equals override to my Country class:
public partial class Country
{
    public override bool Equals(object obj)
    {
        if (obj is Country)
        {
            var country = obj as Country;
            return this.CountryTreeLetter.Equals(country.CountryTreeLetter);
        }
        return false;
    }

    public override int GetHashCode()
    {
        int hash = 13;
        hash = hash * 7 + (int)CountryTreeLetter[0];
        hash = hash * 7 + (int)CountryTreeLetter[1];
        hash = hash * 7 + (int)CountryTreeLetter[2];
        return hash;
    }
}
and then did:
var data = e.ParsedData as Country[];
using (var db = new entities())
{
    foreach (var item in data.Except(db.Countries))
    {
        db.AddToCountries(item);
    }
    db.SaveChanges();
}
I would do it the straightforward way:
void Method(object sender, DocumentLoadedEvent e)
{
    var data = e.ParsedData as Country[];
    using (var db = new DataContractEntities())
    {
        foreach (var country in data)
        {
            var countryInDb = db.Countries
                .Where(c => c.Name == country.Name) // or whatever your key is
                .SingleOrDefault();
            if (countryInDb != null)
                db.Countries.ApplyCurrentValues(country);
            else
                db.Countries.AddObject(country);
        }
        db.SaveChanges();
    }
}
I don't know how often your application must run this or how many countries your world has, but I have the feeling this is not a case where you need to think about sophisticated performance optimizations.
Edit
Alternative approach which would issue only one query:
void Method(object sender, DocumentLoadedEvent e)
{
    var data = e.ParsedData as Country[];
    using (var db = new DataContractEntities())
    {
        var names = data.Select(c => c.Name);
        var countriesInDb = db.Countries
            .Where(c => names.Contains(c.Name))
            .ToList(); // single DB query
        foreach (var country in data)
        {
            var countryInDb = countriesInDb
                .SingleOrDefault(c => c.Name == country.Name); // runs in memory
            if (countryInDb != null)
                db.Countries.ApplyCurrentValues(country);
            else
                db.Countries.AddObject(country);
        }
        db.SaveChanges();
    }
}
The modern form, using later EF versions, would be:
context.Entry(record).State = (AlreadyExists ? EntityState.Modified : EntityState.Added);
context.SaveChanges();
AlreadyExists can come from checking the key or by querying the database to see whether the item already exists there.
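For example, here is a rough sketch of computing AlreadyExists with a key lookup; the DbContext-style context and the CountryTreeLetter key are assumptions carried over from the question, not part of this answer.

// Assumes a DbContext-style "context" with a Countries set, and that
// CountryTreeLetter is the key used in the question's Equals override.
bool alreadyExists = context.Countries
    .Any(c => c.CountryTreeLetter == record.CountryTreeLetter);

context.Entry(record).State = alreadyExists ? EntityState.Modified : EntityState.Added;
context.SaveChanges();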
You can implement your own IEqualityComparer<Country> and pass that to the Except() method. Assuming your Country object has Id and Name properties, one example of that implementation could look like this:
public class CountryComparer : IEqualityComparer<Country>
{
    public bool Equals(Country x, Country y)
    {
        return x.Name.Equals(y.Name) && (x.Id == y.Id);
    }

    public int GetHashCode(Country obj)
    {
        return string.Format("{0}{1}", obj.Id, obj.Name).GetHashCode();
    }
}
and use it as
data.Except(db.Countries, new CountryComparer());
Although, in your case it looks like you just need to extract the new objects, so you could use var newCountries = data.Where(c => c.Id == Guid.Empty); if your Id is a Guid.
The best way is to inspect the Country.EntityState property and take action from there depending on its value (Detached, Modified, Added, etc.); a rough sketch follows below.
You need to provide more information on what your data collection contains, i.e. are the Country objects retrieved from the database through Entity Framework (in which case the context can track them), or are you creating them some other way?
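A rough sketch of what this answer describes, assuming the generated Country class derives from EntityObject and therefore exposes an EntityState property; this code is not from the answer.

using (var db = new DataContractEntities())
{
    foreach (var country in data)
    {
        switch (country.EntityState)
        {
            case EntityState.Detached:
                // Not tracked by the context yet, so treat it as new.
                db.Countries.AddObject(country);
                break;
            case EntityState.Modified:
            case EntityState.Added:
                // Already tracked; SaveChanges will pick the change up.
                break;
        }
    }
    db.SaveChanges();
}

Note that entities created by the parser (rather than loaded through the context) will always report Detached, which is why the second paragraph asks where the Country objects come from.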
I am not sure this is the best solution, but I think you have to get all the countries from the DB and then check them against your parsed data:
void Method(object sender, DocumentLoadedEvent e)
{
    var data = e.ParsedData as Country[];
    using (var db = new DataContractEntities())
    {
        List<Country> mycountries = db.Countries.ToList();
        foreach (var PC in data)
        {
            if (mycountries.Any(C => C.Name == PC.Name))
            {
                var country = mycountries.First(C => C.Name == PC.Name);
                //Update it here
            }
            else
            {
                var newcountry = Country.CreateCountry(PC.Name); //you must provide all required parameters
                newcountry.Name = PC.Name;
                db.AddToCountries(newcountry);
            }
        }
        db.SaveChanges();
    }
}