I am trying to create a program that has two main parts: Projects and Audits
A Project will have a start date and a number that will tell it how many days until completion (they're all preset).
I need an Audit to be created say one day before the Project is complete.
No big deal so far.
The problem I forsee and am struggling with how to answer is:
How can I efficiently maintain this? I am using MVC4 and have tried Action Filters. I have the code running on every page - to cover any custom urls accessing the site. However, this seems to slow things down considerably. Here's what I have so far:
public class CreateAuditsController : ActionFilterAttribute
{
private QAAPPEntities db = new QAAPPEntities();
public override void OnActionExecuted(ActionExecutedContext filterContext)
{
CreateAuditsFromProjects();
base.OnActionExecuted(filterContext);
}
public void CreateAuditsFromProjects()
{
// Generate list of Research Projects
List<ResearchProject> allResearchProjects = db.ResearchProjects.Include(a => a.ResearchProjectStatus).ToList();
// Cycle through those to see which ones need audits created
for (int i = 0; i < allResearchProjects.Count; i++)
{
// Get current looped project for further use
ResearchProject currentLoopedProject = allResearchProjects[i];
if (currentLoopedProject.Description == "Some Description")
{
Audit auditToAdd = new Audit();
auditToAdd.ResearchProjectID = currentLoopedProject.ResearchProjectID;
auditToAdd.Name = "Some Name";
auditToAdd.AssignedQAID = 1;
auditToAdd.CreatedDate = DateTime.Now;
auditToAdd.AuditStatusID = 1;
auditToAdd.AuditTypeID = 1;
db.Audits.Add(auditToAdd);
db.SaveChanges();
}
}
db.SaveChanges();
// Create audits
}
}
Any help is appreciated.
Check out Revalee. It let's you schedule actions to be taken at a later time. So, when you create a project, you would then use the Revalee client to schedule an audit to be created one day before it ends.
Simple solution is that, create a database event with the logic of checking all projects with one day left and add a record in audit table for each of those projects.
Thank you.
Related
I have some problem with updating all documents in a collection. What I need to do: I need to iterate through ~2 million docs load each doc into memory, parse HTML from one of fields of a doc and save the doc back to DB.
I tried take/skip logic with/without indexes but Id etc. but some records still remain unchanged (even tested for 1000 records with 128 records in a page). In the process of updating documents no more records are inserted. Simple patching (patching API) does not work for this as the update I need to perform is quite complex
Please help with this. Thanks
Code:
public static int UpdateAll<T>(DocumentStore docDB, Action<T> updateAction)
{
return UpdateAll(0, docDB, updateAction);
}
public static int UpdateAll<T>(int startFrom, DocumentStore docDB, Action<T> updateAction)
{
using (var session = docDB.OpenSession())
{
int queryCount = 0;
int start = startFrom;
while (true)
{
var current = session.Query<T>().Take(128).Skip(start).ToList();
if (current.Count == 0)
break;
start += current.Count;
foreach (var doc in current)
{
updateAction(doc);
}
session.SaveChanges();
queryCount += 2;
if (queryCount >= 30)
{
return UpdateAll(start, docDB, updateAction);
}
}
}
return 1;
}
Move your session.SaveChanges(); to outside the while loop.
As per Raven's session design, you can only do 30 interactions with the database during any given instance of a session.
If you refactor your code to only SaveChanges() once (or very few times) per using block, it should work.
For more information, check out the Raven docs : Understanding The Session Object - RavenDB
Say I have something like a support ticket system (simplified as example). It has many users and organizations. Each user can be a member of several organizations, but the typical case would be one org => many users, and most of them belong only to this organization. Each organization has a "tag" which is to be used to construct "ticket numbers" for this organization. Lets say we have an org called StackExchange that wants the tag SES.
So if I open the first ticket of today, I want it to be SES140407-01. The next is SES140407-02 and so on. Doesn't have to be two digits after the dash.
How can I make sure this is generated in a way that makes sure it is 100% unique across the organization (no orgs will have the same tag)?
Note: This does not have to be the document ID in the database - that will probably just be a Guid or similar. This is just a ticket reference - kinda like a slug - that will appear in related emails etc. So it has to be unique, and I would prefer if we didn't "waste" the sequential case numbers hilo style.
Is there a practical way to ensure I get a unique ticket number even if two or more people report a new one at almost the same time?
EDIT: Each Organization is a document in RavenDB, and can easily hold a property like LastIssuedTicketId. My challenge is basically to find the best way to read this field, generate a new one, and store this back in a way that is "race condition safe".
Another edit: To be clear - I intend to generate the ticket ID in my own software. What I am looking for is a way to ask RavenDB "what was the last ticket number", and then when I generate the next one after that, "am I the only one using this?" - so that I give my ticket a unique case id, not necessarily related to what RavenDB considers the document id.
I use for that generic sequence generator written for RavenDB:
public class SequenceGenerator
{
private static readonly object Lock = new object();
private readonly IDocumentStore _docStore;
public SequenceGenerator(IDocumentStore docStore)
{
_docStore = docStore;
}
public int GetNextSequenceNumber(string sequenceKey)
{
lock (Lock)
{
using (new TransactionScope(TransactionScopeOption.Suppress))
{
while (true)
{
try
{
var document = GetDocument(sequenceKey);
if (document == null)
{
PutDocument(new JsonDocument
{
Etag = Etag.Empty,
// sending empty guid means - ensure the that the document does NOT exists
Metadata = new RavenJObject(),
DataAsJson = RavenJObject.FromObject(new { Current = 0 }),
Key = sequenceKey
});
return 0;
}
var current = document.DataAsJson.Value<int>("Current");
current++;
document.DataAsJson["Current"] = current;
PutDocument(document);
{
return current;
}
}
catch (ConcurrencyException)
{
// expected, we need to retry
}
}
}
}
}
private void PutDocument(JsonDocument document)
{
_docStore.DatabaseCommands.Put(
document.Key,
document.Etag,
document.DataAsJson,
document.Metadata);
}
private JsonDocument GetDocument(string key)
{
return _docStore.DatabaseCommands.Get(key);
}
}
It generates incremental unique sequence based on sequenceKey. Uniqueness is guaranteed by raven optimistic concurrency based on Etag. So each sequence has its own document which we update when generate new sequence number. Also, there is lock which reduced extra db calls if several threads are executing at the same moment at the same process (appdomain).
For your case you can use it this way:
var sequenceKey = string.Format("{0}{1:yyMMdd}", yourCompanyPrefix, DateTime.Now);
var nextSequenceNumber = new SequenceGenerator(yourDocStore).GetNextSequenceNumber(sequenceKey);
var nextSequenceKey = string.Format("{0}-{1:00}", sequenceKey, nextSequenceNumber);
I'm using MyBatis with Struts 2 and I'm curious about setting up custom sessions. I will explain more below.
Currently I have a bunch of DAOs which open a session, execute their query, commit if necessary, and close their connection:
public String selectUsername(Integer id){
session = getSession(); //opens the session
mapper = session.getMapper(UserMapper.class); //gets the mapper
String name = mapper.selectUsername(id); //executes query
session.close(); //closes session
return name;
}
Sometimes I have complex things I need to do such as inserting a large object which consists of smaller objects which each have their own DAO:
public boolean insertNewProfile(UserProfile profile)
{
session = getSession();
mapper = session.getMapper(UserMapper.class);
int result = mapper.insertNewProfile(profile);
session.commit();
int id = mapper.selectId(profile.getUserName());
for(UserSkill skill : profile.getSkills())
skill.setUserID(id);
// insert the user's skills in the database
boolean skillResult = skillDAO.insertSkills(profile.getSkills());
session.close();
return (result > 0) && skillResult;
}
In the above, you can see that we are not only inserting the UserProfile, but we are also inserting the associated UserSkills by using the UserSkillDAO. That DAO also opens the session, commits, and closes the session. I'm trying to make this compound query to the database in one session.
I have tried making functions to open, commit, and close the session at the action level (Struts 2) but when I tried that I get errors (### Error querying database. Cause: org.apache.ibatis.executor.ExecutorException: Executor was closed. ### The error may exist in . . .) about the session being closed prematurely to my closing it. I have looked into using MyBatis transactions but a lot of the examples online are for older versions of iBatis or are examples using Spring.
Example of opening/closing sessions at the action level:
public String execute(){
DAOFactory.openSession();
logger.info("Creating ProjectDAO");
projectDAO = DAOFactory.getDao(ProjectDAO.class);
needDAO = DAOFactory.getDao(ProjectNeedDAO.class);
majorDAO = DAOFactory.getDao(ProjectNeedMajorsDAO.class);
skillDAO = DAOFactory.getDao(ProjectNeedSkillDAO.class);
logger.info("Performing selectAllProjects query.");
projects = projectDAO.selectAllProjects();
// loop over the projects and add the project needs
for(int i = 0; i < projects.size(); i++){
// get all of the project needs for a project
projects.get(i).setProjectNeeds((ArrayList<ProjectNeed>)needDAO.selectProjectNeeds(projects.get(i).getProjectID()));
// loop over all of the project needs and add the majors and skills needed
for(int j = 0; j < projects.get(i).getProjectNeeds().size(); j++){
ArrayList<ProjectNeed> needs = projects.get(i).getProjectNeeds();
needs.get(j).setMajors((ArrayList<ProjectNeedMajor>)majorDAO.selectProjectNeedMajors(needs.get(j).getNeedID()));
needs.get(j).setSkills((ArrayList<ProjectNeedSkill>)skillDAO.selectProjectNeedSkills(needs.get(j).getNeedID()));
}
}
DAOFactory.closeSession();
logger.info("Returning success");
return "success";
}
Is there an easy way for me to rework my code so that I can choose when to open and close sessions at the action level instead of at the DAO level as I believe the opening and closing of sessions involves a lot of overhead and is slowing down the performance of my website?
I am working on an app, and need to keep track of how any views a page has. Almost like how SO does it. It is a value used to determine how popular a given page is.
I am concerned that writing to the DB every time a new view needs to be recorded will impact performance. I know this borderline pre-optimization, but I have experienced the problem before. Anyway, the value doesn't need to be real time; it is OK if it is delayed by 10 minutes or so. I was thinking that caching the data, and doing one large write every X minutes should help.
I am running on Windows Azure, so the Appfabric cache is available to me. My original plan was to create some sort of compound key (PostID:UserID), and tag the key with "pageview". Appfabric allows you to get all keys by tag. Thus I could let them build up, and do one bulk insert into my table instead of many small writes. The table looks like this, but is open to change.
int PageID | guid userID | DateTime ViewTimeStamp
The website would still get the value from the database, writes would just be delayed, make sense?
I just read that the Windows Azure Appfabric cache does not support tag based searches, so it pretty much negates my idea.
My question is, how would you accomplish this? I am new to Azure, so I am not sure what my options are. Is there a way to use the cache without tag based searches? I am just looking for advice on how to delay these writes to SQL.
You might want to take a look at http://www.apathybutton.com (and the Cloud Cover episode it links to), which talks about a highly scalable way to count things. (It might be overkill for your needs, but hopefully it gives you some options.)
You could keep a queue in memory and on a timer drain the queue, collapse the queued items by totaling the counts by page and write in one SQL batch/round trip. For example, using a TVP you could write the queued totals with one sproc call.
That of course doesn't guarantee the view counts get written since its in memory and latently written but page counts shouldn't be critical data and crashes should be rare.
You might want to have a look at how the "diagnostics" feature in Azure works. Not because you would use diagnostics for what you are doing at all, but because it is dealing with a similar problem and may provide some inspiration. I am just about to implement a data auditing feature and I want to log that to table storage so also want to delay and bunch the updates together and I have taken a lot of inspiration from diagnostics.
Now, the way Diagnostics in Azure works is that each role starts a little background "transfer" thread. So, whenever you write any traces then that gets stored in a list in local memory and the background thread will (by default) bunch all the requests up and transfer them to table storage every minute.
In your scenario, I would let each role instance keep track of a count of hits and then use a background thread to update the database every minute or so.
I would probably use something like a static ConcurrentDictionary (or one hanging off a singleton) on each webrole with each hit incrementing the counter for the page identifier. You'd need to have some thread handling code to allow multiple request to update the same counter in the list. Alternatively, just allow each "hit" to add a new record to a shared thread-safe list.
Then, have a background thread once per minute increment the database with the number of hits per page since last time and reset the local counter to 0 or empty the shared list if you are going with that approach (again, be careful about the multi threading and locking).
The important thing is to make sure your database update is atomic; If you do a read-current-count from the database, increment it and then write it back then you may have two different web role instances doing this at the same time and thus losing one update.
EDIT:
Here is a quick sample of how you could go about this.
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading;
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
// You would put this in your Application_start for the web role
Thread hitTransfer = new Thread(() => HitCounter.Run(new TimeSpan(0, 0, 1))); // You'd probably want the transfer to happen once a minute rather than once a second
hitTransfer.Start();
//Testing code - this just simulates various web threads being hit and adding hits to the counter
RunTestWorkerThreads(5);
Thread.Sleep(5000);
// You would put the following line in your Application shutdown
HitCounter.StopRunning(); // You could do some cleverer stuff with aborting threads, joining the thread etc but you probably won't need to
Console.WriteLine("Finished...");
Console.ReadKey();
}
private static void RunTestWorkerThreads(int workerCount)
{
Thread[] workerThreads = new Thread[workerCount];
for (int i = 0; i < workerCount; i++)
{
workerThreads[i] = new Thread(
(tagname) =>
{
Random rnd = new Random();
for (int j = 0; j < 300; j++)
{
HitCounter.LogHit(tagname.ToString());
Thread.Sleep(rnd.Next(0, 5));
}
});
workerThreads[i].Start("TAG" + i);
}
foreach (var t in workerThreads)
{
t.Join();
}
Console.WriteLine("All threads finished...");
}
}
public static class HitCounter
{
private static System.Collections.Concurrent.ConcurrentQueue<string> hits;
private static object transferlock = new object();
private static volatile bool stopRunning = false;
static HitCounter()
{
hits = new ConcurrentQueue<string>();
}
public static void LogHit(string tag)
{
hits.Enqueue(tag);
}
public static void Run(TimeSpan transferInterval)
{
while (!stopRunning)
{
Transfer();
Thread.Sleep(transferInterval);
}
}
public static void StopRunning()
{
stopRunning = true;
Transfer();
}
private static void Transfer()
{
lock(transferlock)
{
var tags = GetPendingTags();
var hitCounts = from tag in tags
group tag by tag
into g
select new KeyValuePair<string, int>(g.Key, g.Count());
WriteHits(hitCounts);
}
}
private static void WriteHits(IEnumerable<KeyValuePair<string, int>> hitCounts)
{
// NOTE: I don't usually use sql commands directly and have not tested the below
// The idea is that the update should be atomic so even though you have multiple
// web servers all issuing similar update commands, potentially at the same time,
// they should all commit. I do urge you to test this part as I cannot promise this code
// will work as-is
//using (SqlConnection con = new SqlConnection("xyz"))
//{
// foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
// {
// var cmd = con.CreateCommand();
// cmd.CommandText = "update hits set count = count + #count where tag = #tag";
// cmd.Parameters.AddWithValue("#count", hitCount.Value);
// cmd.Parameters.AddWithValue("#tag", hitCount.Key);
// cmd.ExecuteNonQuery();
// }
//}
Console.WriteLine("Writing....");
foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
{
Console.WriteLine(String.Format("{0}\t{1}", hitCount.Key, hitCount.Value));
}
}
private static IEnumerable<string> GetPendingTags()
{
List<string> hitlist = new List<string>();
var currentCount = hits.Count();
for (int i = 0; i < currentCount; i++)
{
string tag = null;
if (hits.TryDequeue(out tag))
{
hitlist.Add(tag);
}
}
return hitlist;
}
}
Given 1000 documents with a complex data structure. for e.g. a Car class that has three properties, Make and Model and one Id property.
What is the most efficient way in C# to push these documents to raven db (preferably in a batch) without having to query the raven collection individually to find which to update and which to insert. At the moment I have to going like so. Which is totally inefficient.
note : _session is a wrapper on the IDocumentSession where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var page = 0;
const int total = 30;
do
{
var paged = sales.Skip(page*total).Take(total);
if (!paged.Any()) return;
foreach (var sale in paged)
{
var current = sale;
var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
if (existing != null)
existing = current;
else
_session.Add(current);
}
_session.Commit();
page++;
} while (true);
}
Your session code doesn't seem to track with the RavenDB api (we don't have Add or Commit).
Here is how you do this in RavenDB
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
sales.ForEach(session.Store);
session.SaveChanges();
}
Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Model;
This is the way you can facilitate change-tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uinteresting piece of code I recommend to use AutoMapper:
existing = Mapper.Map<Sale>(current, existing);
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: If you query for a document by its id, you will always want to use Load!
The main difference is that one uses the local cache and the other not (the same applies to the equivalent NHibernate methods).
Ok, so now I can answer your question:
If I understand you correctly you want to save a bunch of Sale-instances to your database while they should either be added if they didn't exist or updated if they existed. Right?
One way is to correct your sample code with the hints above and let it work. However that will issue one unnecessary request (Session.Load(existingId)) for each iteration. You can easily avoid that if you setup an index that selects all the Ids of all documents inside your Sales-collection. Before you then loop through your items you can load all the existing Ids.
However, I would like to know what you actually want to do. What is your domain/use-case?
This is what works for me right now. Note: The InjectFrom method comes from Omu.ValueInjecter (nuget package)
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var ids = sales.Select(i => i.Id);
var existingSales = _ravenSession.Load<Sale>(ids);
existingSales.ForEach(s => s.InjectFrom(sales.Single(i => i.Id == s.Id)));
var existingIds = existingSales.Select(i => i.Id);
var nonExistingSales = sales.Where(i => !existingIds.Any(x => x == i.Id));
nonExistingSales.ForEach(i => _ravenSession.Store(i));
_ravenSession.SaveChanges();
}