linq to sql insert locks - sql

Here is some insert code:
gkInfo.data.ToList()
    .ForEach(p => p.hour.ToList()
        .ForEach(r => r.block.ToList()
            .ForEach(q =>
            {
                var v = new VarValues();
                v.dt = DateTime.Parse(p.target_date + " " + (r.value - 1).ToString() + ":00:00");
                v.id_objecttype = config.stations.Where(i => i.text == q.station_name).Single().id_objecttype;
                v.id_object = q.bnum.ToString();
                v.id_param = config.stations.Where(i => i.text == q.station_name).Single().id_param;
                v.pl_lev = 3;
                v.source = 0;
                v.value = q.block_state;
                v.version = version;
                v.description = q.change_type;
                m53500context1.VarValues.InsertOnSubmit(v);
            }
        )));
m53500context1.SubmitChanges();
This code locks the table.
Can I avoid that, or is it impossible?

Although I don't know all the details of your issue, the pattern seems very familiar. Often you need to do a big update in the database while the database still has to stay available, so that, for example, a web site working off the data set does not time out while the update is in progress.
Sometimes the update operations are regular exports from another database; sometimes they calculate caches, not unlike the example you have provided.
If you need your update to be transactional (i.e. all or nothing), there is no real way around the lock: while the update is underway, the table is locked. If you don't need a transaction, you can try breaking your update up into smaller batches. SubmitChanges wraps all pending changes in a single transaction, so you will need to call SubmitChanges several times, so that each individual transaction is fast and does not lock the table for long.
If being transactional is a requirement, you can do your inserts in a staging area, i.e. not in the same table that other processes read from. When the insert is finished, you find a way to swap the areas. This could be complicated by the fact that there may be updates to this table that you haven't accounted for, but I do not know whether that is true in your case.
In the worst case you will need some application logic that knows an update is in progress and, while it is happening, reads the data from an alternate location. Of course you will have to provide this alternate location (a copy) to read from.
There is no hard and fast answer, but there are a few things (above) that you can try. Also feel free to tell us more about your specific task and requirements.

Please see: LINQ To SQL NO_LOCK. You need to set a different isolation level for your transaction.

Related

Ravendb memory leak on query

I'm having a hard time solving an issue with RavenDB.
At my work we have a process that tries to identify potential duplicates in our database within a specified collection (let's call it the users collection).
That means I'm iterating through the collection, and for each document there is a query that tries to find similar entities. As you can imagine, it's quite a long task to run.
My problem is that when the task starts running, RavenDB's memory consumption goes higher and higher. It just keeps growing, and it seems to continue until it reaches the maximum memory of the system.
But that doesn't really make sense, since I'm only running queries: I'm using one single index and the default page size when querying (128).
Has anybody met a similar problem? I really have no idea what is going on in RavenDB, but it looks like a memory leak.
RavenDB version: 3.0.179
When I need to do massive operations on large collections, I follow these steps to prevent memory problems:
I use query streaming to extract the IDs of all the documents I want to process (with a dedicated session).
I open a new session for each ID, load the document, and then do what I need.
First, a recommendation: if you don't want duplicates, store them with a well-known ID. For example, suppose you don't want duplicate User objects. You'd store them with an ID that makes them unique:
var user = new User() { Email = "foo@bar.com" };
var id = "Users/" + user.Email; // A well-known ID
dbSession.Store(user, id);
Then, when you want to check for duplicates, just check against the well known name:
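public string RegisterNewUser(string email)
{
    // Unlike .Query, the .Load call is ACID and never stale.
    var existingUser = dbSession.Load<User>("Users/" + email);
    if (existingUser != null)
    {
        return "Sorry, that email is already taken.";
    }

    // Assumed completion (not in the original snippet): store the new user
    // under the well-known ID and report success.
    dbSession.Store(new User { Email = email }, "Users/" + email);
    dbSession.SaveChanges();
    return "Registered.";
}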
If you follow this pattern, you won't have to worry about running complex queries nor worry about stale indexes.
If this scenario can't work for you for some reason, then we can help diagnose your memory issues. But to diagnose that, we'll need to see your code.

Memory leak in Rails 3.0.11 migration

A migration contains the following:
Service.find_by_sql("select
    service_id,
    registrations.regulator_given_id,
    registrations.regulator_id
  from
    registrations
  order by
    service_id, updated_at desc").each do |s|
  this_service_id = s["service_id"]
  if this_service_id != last_service_id
    Service.find(this_service_id).update_attributes!(:regulator_id => s["regulator_id"],
                                                     :regulator_given_id => s["regulator_given_id"])
    last_service_id = this_service_id
  end
end
and it is eating up memory, to the point where it will not run in the 512MB allowed in Heroku (the registrations table has 60,000 items). Is there a known problem? Workaround? Fix in a later version of Rails?
Thanks in advance
Edit following request to clarify:
That is all the relevant source - the rest of the migration creates the two new columns that are being populated. The situation is that I have data about services from multiple sources (regulators of the services) in the registrations table. I have decided to 'promote' some of the data ([prime]regulator_id and [prime]regulator_given_key) into the services table for the prime regulators to speed up certain queries.
This will load all 60000 items in one go and keep those 60000 AR objects around, which will consume a fair amount of memory. Rails does provide a find_each method for breaking down a query like that into chunks of 1000 objects at a time, but it doesn't allow you to specify an ordering as you do.
You're probably best off implementing your own paging scheme. Using limit/offset is a possibility; however, large OFFSET values are usually inefficient because the database server has to generate a bunch of results that it then discards.
An alternative is to add conditions to your query that ensure you don't return already processed items, for example by requiring that service_id be greater than the last value already returned (the query sorts by service_id in ascending order). This is more complicated if some items compare as equal on the paging column. With both of these paging schemes you probably need to think about what happens if a row gets inserted into your registrations table while you are processing it (probably not a problem with migrations, assuming you run them with access to the site disabled).
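The condition-based paging idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: Row, ROWS, fetch_page and latest_per_service are hypothetical names, and the in-memory array stands in for the registrations table. In a real migration, fetch_page would be a query along the lines of WHERE service_id > ? ORDER BY service_id, updated_at DESC LIMIT ?.

```ruby
# Hypothetical stand-in for rows of the registrations table.
Row = Struct.new(:service_id, :updated_at, :regulator_id)

ROWS = [
  Row.new(1, 20, "a-new"), Row.new(1, 10, "a-old"),
  Row.new(2, 5,  "b-only"),
  Row.new(3, 9,  "c-new"), Row.new(3, 8, "c-mid"), Row.new(3, 1, "c-old")
].freeze

# Simulates one page of the query: only rows whose service_id is greater
# than the last one already processed, newest first within each service.
def fetch_page(rows, after_id, limit)
  rows.select { |r| after_id.nil? || r.service_id > after_id }
      .sort_by { |r| [r.service_id, -r.updated_at] }
      .first(limit)
end

# Walks the table page by page, keeping only the newest row per service.
# At most page_size rows are held at any one time.
def latest_per_service(rows, page_size)
  latest = {}
  last_id = nil
  loop do
    page = fetch_page(rows, last_id, page_size)
    break if page.empty?
    page.each do |r|
      # The first row seen for a service is its newest one.
      latest[r.service_id] = r.regulator_id unless latest.key?(r.service_id)
    end
    last_id = page.last.service_id
  end
  latest
end
```

Because each page starts at a service_id strictly greater than the last one seen, a page that ends in the middle of a service's rows only skips older rows of that service, which this particular task does not need. A task that must visit every row would need a tie-breaking column in the paging condition.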
Try something like this (note: the OP reports that this did not work):
previous = nil
Registration.select('service_id, regulator_id, regulator_given_id')
  .order('service_id, updated_at DESC')
  .each do |r|
    if previous != r.service_id
      service = Service.find r.service_id
      service.update_attributes(:regulator_id => r.regulator_id, :regulator_given_id => r.regulator_given_id)
      previous = r.service_id
    end
  end
This is a kind of hacky way of getting the most recent record from regulators -- there's undoubtedly a better way to do it with DISTINCT or GROUP BY in SQL all in a single query, which would not only be a lot faster, but also more elegant. But this is just a migration, right? And I didn't promise elegant. I also am not sure it will work and resolve the problem, but I think so :-)
The key change is that instead of using SQL, this uses AREL, meaning (I think) the update operation is performed once on each associated record as AREL returns them. With SQL, you return them all and store in an array, then update them all. I also don't think it's necessary to use the .select(...) clause.
Very interested in the result, so let me know if it works!

mysterious oracle query

If a query in Oracle takes 11 minutes the first time it is executed and 25 seconds the next time, with the buffer being flushed, what is the possible cause? Could it be that the query is written in a bad way?
set timing on;
set echo on
set lines 999;
insert into elegrouptmp select idcll,idgrpl,0 from elegroup where idgrpl = 109999990;
insert into SLIMONTMP (idpartes, indi, grecptseqs, devs, idcll, idclrelpayl)
select rel.idpartes, rel.indi, rel.idgres,rel.iddevs,vpers.idcll,nvl(cdsptc.idcll,vpers.idcll)
from
relbqe rel,
elegrouptmp ele,
vrdlpers vpers
left join cdsptc cdsptc on
(cdsptc.idclptcl = vpers.idcll and
cdsptc.cdptcs = 'NOS')
where
rel.idtits = '10BCPGE ' and
vpers.idbqes = rel.idpartes and
vpers.cdqltptfc = 'N' and
vpers.idcll = ele.idelegrpl and
ele.idgrpl = 109999990;
alter system flush shared_pool;
alter system flush buffer_cache;
alter system flush global context;
select /* original */ mvtcta_part_SLIMONtmp.idpartes,mvtcta_part_SLIMONtmp.indi,mvtcta_part_SLIMONtmp.grecptseqs,mvtcta_part_SLIMONtmp.devs,
mvtcta_part_SLIMONtmp.idcll,mvtcta_part_SLIMONtmp.idclrelpayl,mvtcta_part_vrdlpers1.idcll,mvtcta_part_vrdlpers1.shnas,mvtcta_part_vrdlpers1.cdqltptfc,
mvtcta_part_vrdlpers1.idbqes,mvtcta_part_compte1.idcll,mvtcta_part_compte1.grecpts,mvtcta_part_compte1.seqc,mvtcta_part_compte1.devs,mvtcta_part_compte1.sldminud,
mvtcta.idcll,mvtcta.grecptseqs,mvtcta.devs,mvtcta.termel,mvtcta.dtcptl,mvtcta.nusesi,mvtcta.fiches,mvtcta.indl,mvtcta.nuecrs,mvtcta.dtexel,mvtcta.dtvall,
mvtcta.dtpayl,mvtcta.ioi,mvtcta.mtd,mvtcta.cdlibs,mvtcta.libcps,mvtcta.sldinitd,mvtcta.flagtypei,mvtcta.flagetati,mvtcta.flagwarnl,mvtcta.flagdonei,mvtcta.oriindl,
mvtcta.idportfl,mvtcta.extnuecrs
from SLIMONtmp mvtcta_part_SLIMONtmp
left join vrdlpers mvtcta_part_vrdlpers1 on
(
mvtcta_part_vrdlpers1.idbqes = mvtcta_part_SLIMONtmp.idpartes
and mvtcta_part_vrdlpers1.cdqltptfc = 'N'
and mvtcta_part_vrdlpers1.idcll = mvtcta_part_SLIMONtmp.idcll
)
left join compte mvtcta_part_compte1 on
(
mvtcta_part_compte1.idcll = mvtcta_part_vrdlpers1.idcll
and mvtcta_part_compte1.grecpts = substr (mvtcta_part_SLIMONtmp.grecptseqs, 1, 2 )
and mvtcta_part_compte1.seqc = substr (mvtcta_part_SLIMONtmp.grecptseqs, -1 )
and mvtcta_part_compte1.devs = mvtcta_part_SLIMONtmp.devs
and (mvtcta_part_compte1.devs = ' ' or ' ' = ' ')
and mvtcta_part_compte1.cdpartc not in ( 'L' , 'R' )
)
left join mvtcta mvtcta on
(
mvtcta.idcll = mvtcta_part_SLIMONtmp.idclrelpayl
and mvtcta.devs = mvtcta_part_SLIMONtmp.devs
and mvtcta.grecptseqs = mvtcta_part_SLIMONtmp.grecptseqs
and mvtcta.flagdonei <> 0
and mvtcta.devs = mvtcta_part_compte1.devs
and mvtcta.dtvall > 20101206
)
where 1=1
order by mvtcta_part_compte1.devs,
mvtcta_part_SLIMONtmp.idpartes,
mvtcta_part_SLIMONtmp.idclrelpayl,
mvtcta_part_SLIMONtmp.grecptseqs,
mvtcta.dtvall;
"if a query in oracle takes the first time it is executed 11 minutes, and the next time, the same query 25 seconds, with the buffer being flushed, what is the possible cause?"
The thing is, flushing the DB Buffers, like this ...
alter system flush shared_pool
/
... wipes the Oracle data store but there are other places where data gets cached. For instance the chances are your OS caches its file reads.
EXPLAIN PLAN is good as a general guide to how the database thinks it will execute a query, but it is only a prediction. It can be thrown out by poor statistics or ambient conditions. It is not good at explaining why a specific instance of a query took as much time as it did.
So, if you really want to understand what occurs when the database executes a specific query you need to get down and dirty, and learn how to use the Wait Interface. This is a very powerful tracing mechanism, which allows us to see the individual events that happen over the course of a single query execution. Each version of Oracle has extended the utility and richness of the Wait Interface, but it has been essential to proper tuning since Oracle 9i (if not earlier).
Find out more by reading Roger Schrag's very good overview.
In your case you'll want to run the trace multiple times. In order to make it easier to compare results you should use a separate session for each execution, setting the 10046 event each time.
What else was happening on the box when you ran these? You can get different timings depending on other processes chewing up resources. Also, with a lot of joins, performance will depend on memory usage (hash_area_size, sort_area_size, etc.) and availability, so perhaps you are paging (check temp space size and usage as well). In short, try sql_trace and tkprof to analyze more deeply.
Sometimes a block is written to the file system before it is committed (a dirty block). When that block is read later, Oracle sees that it was uncommitted. It checks the open transaction and, if the transaction isn't still there, it knows the change was committed. Therefore it writes the block back as a clean block. It is called delayed block cleanout.
That is one possible reason why reading blocks for the first time can be slower than a subsequent re-read.
It could be that the second time the execution plan is already known. Maybe the optimizer has a very hard time finding an execution plan for some reason.
Try setting
alter session set optimizer_max_permutations=100;
and rerun the query. See if that makes any difference.
"could it be that the query is written in a bad way?"
"Bad" is a rather emotional expression, but broadly speaking, yes: if a query performs significantly faster the second time it's run, it usually means there are ways to optimize the query.
Actually optimizing the query is, as APC says, rather a question of getting "down and dirty". An obvious candidate in your example might be the substring: if the table is huge and the substring misses the index, I'd imagine that would take a bit of time, and I'd imagine the results of all those substring operations are cached somewhere.
Here's Tom Kyte's take on flushing Oracle buffers as a testing practice. Suffice it to say he's not a fan. He favors the approach of attempting to emulate your production load with your test data ("real life"), and tossing out the first and last runs. @APC's point about OS caching is Tom's point: to get rid of that (non-trivial!) effect you'd need to bounce the server, not just the database.

Fastest way to query for object existence in NHibernate

I am looking for the fastest way to check for the existence of an object.
The scenario is pretty simple, assume a directory tool, which reads the current hard drive. When a directory is found, it should be either created, or, if already present, updated.
First, let's focus only on the creation part:
public static DatabaseDirectory Get(DirectoryInfo dI)
{
    var result = DatabaseController.Session
        .CreateCriteria(typeof (DatabaseDirectory))
        .Add(Restrictions.Eq("FullName", dI.FullName))
        .List<DatabaseDirectory>().FirstOrDefault();
    if (result == null)
    {
        result = new DatabaseDirectory
        {
            CreationTime = dI.CreationTime,
            Existing = dI.Exists,
            Extension = dI.Extension,
            FullName = dI.FullName,
            LastAccessTime = dI.LastAccessTime,
            LastWriteTime = dI.LastWriteTime,
            Name = dI.Name
        };
    }
    return result;
}
Is this the way to go regarding:
Speed
Separation of Concern
What comes to mind is the following: A scan will always be performed "as a whole". Meaning, during a scan of drive C, I know that nothing new gets added to the database (from some other process). So it MAY be a good idea to "cache" all existing directories prior to the scan, and look them up this way. On the other hand, this may be not suitable for large sets of data, like files (which will be 600.000 or more)...
Perhaps some performance gain can be achieved using "index columns" or something like this, but I am not so familiar with this topic. If anybody has some references, just point me in the right direction...
Thanks,
Chris
PS: I am using NHibernate, Fluent Interface, Automapping and SQL Express (could switch to full SQL)
Note:
In the given problem, the path is not the ID in the database. The ID is an auto-increment, and I can't change this requirement (for other reasons). So the real question is: what is the fastest way to check for the existence of an object when the ID is not known, only a property of that object?
Batching might be possible by selecting a big group with something like "starts with C:Testfiles\", but the problem remains: how do I know in advance how big this set will be? I can't select "max 1000" and check against this buffered dictionary, because I might "hit next to the searched dir"... I hope the problem is clear. The most important question is whether buffering really affects performance this much. If so, does it make sense to load the whole DB into a dictionary containing only PATH and ID (which should be OK even with 1.000.000 objects, I think)?
First off, I highly recommend that you (anyone using NH, really) read Ayende's article about the differences between Get, Load, and query.
In your case, since you need to check for existence, I would use .Get(id) instead of a query for selecting a single object.
However, I wonder if you might improve performance by utilizing some knowledge of your problem domain. If you're going to scan the whole drive and check each directory for existence in the database, you might get better performance by doing bulk operations. Perhaps create a DTO object that only contains the PK of your DatabaseDirectory object to further minimize data transfer/processing. Something like:
Dictionary<string, DirectoryInfo> directories;
session.CreateQuery("select new DatabaseDirectoryDTO(dd.FullName) from DatabaseDirectory dd where dd.FullName in (:ids)")
.SetParameterList("ids", directories.Keys)
.List();
Then just remove those elements that match the returned ID values to get the directories that don't exist. You might have to break the process into smaller batches depending on how large your input set is (for the files, almost certainly).
As far as separation of concerns, just keep the operation at a repository level. Have a method like SyncDirectories that takes a collection (maybe a Dictionary if you follow something like the above) that handles the process for updating the database. That way your higher application logic doesn't have to worry about how it all works and won't be affected should you find an even faster way to do it in the future.

Batch Update in NHibernate

Does a batch update command exist in NHibernate? As far as I am aware it doesn't. So what's the best way to handle this situation? I would like to do the following:
Fetch a list of objects (let's call them a list of users, List<User>) from the database.
Change the properties of those objects (Users.ForEach(User => User.Country = "Antarctica")).
Update each item back individually (Users.ForEach(User => NHibernate.Session.Update(User))).
Call Session.Flush to update the database.
Is this a good approach? Will it result in a lot of round trips between my code and the database?
What do you think? Or is there a more elegant solution?
I know I'm late to the party on this, but thought you may like to know this is now possible using HQL in NHibernate 2.1+
session.CreateQuery(@"update Users set Country = 'Antarctica'")
.ExecuteUpdate();
Starting with NHibernate 3.2, batch jobs have improvements that minimize database round trips. More information can be found on the HunabKu blog.
Here is an example from it; these batch updates take only 6 round trips:
using (ISession s = OpenSession())
using (s.BeginTransaction())
{
    for (int i = 0; i < 12; i++)
    {
        var user = new User {UserName = "user-" + i};
        var group = new Group {Name = "group-" + i};
        s.Save(user);
        s.Save(group);
        user.AddMembership(group);
    }
    s.Transaction.Commit();
}
You can set the batch size for updates in the nhibernate config file.
<property name="hibernate.adonet.batch_size">16</property>
And you don't need to call Session.Update(User) there - just flush or commit a transaction and NHibernate will handle things for you.
EDIT: I was going to post a link to the relevant section of the nhibernate docs but the site is down - here's an old post from Ayende on the subject:
As to whether the use of NHibernate (or any ORM) here is a good approach, it depends on the context. If you are doing a one-off update of every row in a large table with a single value (like setting all users to the country 'Antarctica', which is a continent, not a country, by the way!), then you should probably use a SQL UPDATE statement. If you are going to be updating several records at once with a country as part of your business logic in the general usage of your application, then using an ORM could be a more sensible method. This depends on the number of rows you are updating each time.
Perhaps the most sensible option here if you are not sure is to tweak the batch_size option in NHibernate and see how that works out. If the performance of the system is not acceptable then you might look at implementing a straight sql UPDATE statement in your code.
Starting with NHibernate 5.0 it is possible to make bulk operations using LINQ.
session.Query<Cat>()
.Where(c => c.BodyWeight > 20)
.Update(c => new { BodyWeight = c.BodyWeight / 2 });
NHibernate will generate a single "update" sql query.
See Updating entities
You don't need to update, nor flush:
IList<User> users = session.CreateQuery(...).List<User>();
foreach (var u in users) u.Country = "Antarctica";
session.Transaction.Commit();
I think NHibernate writes a batch for all the changes.
The problem is that your users need to be loaded into memory. If that becomes a problem, you can still use native SQL through NHibernate. But until you have proven that it is a performance problem, stick with the nice solution.
No it's not a good approach!
Native SQL is many times better for this sort of update.
UPDATE USERS SET COUNTRY = 'Antarctica';
It could not be simpler, and the database engine will process this far more efficiently than row-at-a-time application code.