Should I use unique tables for every user? - sql

I'm working on a web app that collects traffic information for websites that use my service. Think Google Analytics, but far more visual. I'm using SQL Server 2012 for the backbone of my app and am considering using MongoDB for the data-gathering/analytics side of the site.
If I have 100 users with an average of 20,000 hits a month on their site, that's 2,000,000 records in a single collection that will be getting queried.
Should I use MongoDB to store this information (I'm new to it and new things are intimidating)?
Should I dynamically create new collections/tables for every new user?
Thanks!

With MongoDB, a collection (the equivalent of a SQL table) can get quite big without much issue; that is largely what it is designed for. The "Mongo" is part of "huMONGOus" (pretty clever, eh?). This is a great use case for MongoDB, which is well suited to storing point-in-time information.
Options :
1. New Collection for each Client
Very easy to do; I use a GetCollectionSafe method for this:
public class MongoStuff
{
    private static MongoDatabase GetDatabase()
    {
        var databaseName = "dbName";
        var connectionString = "connStr";
        var client = new MongoClient(connectionString);
        var server = client.GetServer();
        return server.GetDatabase(databaseName);
    }

    public static MongoCollection<T> GetCollection<T>(string collectionName)
    {
        return GetDatabase().GetCollection<T>(collectionName);
    }

    // creates the collection if it does not exist yet, then returns it
    public static MongoCollection<T> GetCollectionSafe<T>(string collectionName)
    {
        var db = GetDatabase();
        if (!db.CollectionExists(collectionName))
        {
            db.CreateCollection(collectionName);
        }
        return db.GetCollection<T>(collectionName);
    }
}
Then you can call it with:
var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
Running this script
static void Main(string[] args)
{
    var times = new List<long>();
    for (int i = 0; i < 1000; i++)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        MongoStuff.GetCollectionSafe<Person>(String.Format("Mark{0:000}", i));
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);
        times.Add(watch.ElapsedMilliseconds);
    }
    Console.WriteLine(String.Format("Max : {0} \nMin : {1} \nAvg : {2}", times.Max(f => f), times.Min(f => f), times.Average(f => f)));
    Console.ReadKey();
}
Gave me (on my laptop)
Max : 180
Min : 1
Avg : 6.635
Benefits :
Ease of splitting data if one client needs to go on their own
Might match your brain map of the problem
Cons :
Almost impossible to aggregate data across all collections
Hard to browse collections in management tools (like Robomongo)
2. One Large Collection
Use one collection for everything and access it this way:
var coll = MongoStuff.GetCollection<Record>("Records");
Put an index on the collection (the index will make reads orders of magnitude quicker):
coll.EnsureIndex(new IndexKeysBuilder().Ascending("ClientId"));
This only needs to be run once (per collection, per index).
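With that index in place, a per-client read is just a filtered query on the single collection. A minimal sketch using the same legacy driver API as above (Query comes from MongoDB.Driver.Builders; clientId is whatever identifier you store on each record):
var coll = MongoStuff.GetCollection<Record>("Records");
// uses the ClientId index, so only the matching client's documents are scanned
var clientRecords = coll.Find(Query.EQ("ClientId", clientId)).ToList();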
Benefits :
One Simple place to find data
Aggregate over all clients possible
More traditional MongoDB setup
Cons :
All clients' data is intermingled
May not mentally map as well
Just as a reference, the MongoDB size limits are listed here:
http://docs.mongodb.org/manual/reference/limits/
3. Store only aggregated data
If you never intend to drill down to an individual record, just save the aggregates themselves.
Page Loads :
#     Page            Total Time    Average Time
15    Default.html    1545          103
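If you go this route, here is a rough sketch of keeping such an aggregate up to date with the same legacy driver (the collection and field names below are made up for illustration): an upsert with $inc creates the document the first time a page is seen and increments it afterwards; the average is then just TotalTimeMs / Hits at read time.
var aggregates = MongoStuff.GetCollection<BsonDocument>("PageLoadAggregates");
aggregates.Update(
    Query.And(
        Query.EQ("ClientId", clientId),
        Query.EQ("Page", "Default.html")),
    Update.Inc("Hits", 1).Inc("TotalTimeMs", elapsedMs),
    UpdateFlags.Upsert); // insert on first sight, increment after that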

I will let someone else tackle the MongoDB side of your question as I don't feel I'm the best person to comment on it, but I would point out that MongoDB is a very different animal and you'll lose a lot of the referential integrity you enjoy in SQL.
In terms of SQL design, I would not use a different schema for each customer. Your database schema and backups could grow uncontrollably, and maintaining a dynamically growing schema will be a nightmare.
I would suggest one of two approaches:
Either you can create a new database for each customer:
This is more secure as users cannot access each other's data (just use different credentials) and users are easier to manage/migrate and separate.
However, many hosting providers charge per database, so it will cost more to run and maintain, and should you wish to compare data across users it becomes much more challenging.
The second approach is to simply host all users in a single database; your tables will grow large (although 2 million rows is not over the top for a well-maintained SQL database). You would simply use a UserID column to discriminate.
The emphasis will be on you to get the performance you need through proper indexing.
Users' data will exist in the same system and there's no SQL defense against users accessing each other's data - your code will have to be good!
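As a sketch of what "your code will have to be good" means in practice (the table and column names below are made up for illustration): every query should be parameterised and scoped to the authenticated user's id, never to an id supplied by the client.
// requires System.Data and System.Data.SqlClient
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT HitId, Url, HitTime FROM dbo.Hits WHERE UserId = @userId", conn))
{
    cmd.Parameters.Add("@userId", SqlDbType.Int).Value = authenticatedUserId;
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // map the row ...
        }
    }
}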

Related

How to improve speed when using RallyAPIForJava

I use RallyAPIForJava to get stories from Rally with the getRequest method. It's very slow when getting about 500 stories from Rally. How can I improve the speed?
Limiting the scope helps performance. Here is an example of limiting the query by LastUpdateDate, scoping the request to a project and fetching only some fields:
int x = -30;
Calendar cal = GregorianCalendar.getInstance();
cal.add( Calendar.DAY_OF_YEAR, x);
Date nDaysAgoDate = cal.getTime();
SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mmZ");
QueryRequest defectRequest = new QueryRequest("Defect");
defectRequest.setProject(projectRef);
defectRequest.setFetch(new Fetch(new String[] {"Name", "FormattedID","State","Priority"}));
defectRequest.setQueryFilter(new QueryFilter("LastUpdateDate", ">", iso.format(nDaysAgoDate)));
Hydrating collections (e.g. Tasks on User Stories) requires a separate request, but if you only need a count of the items in the collection, you may save time by not hydrating it. The CRUD examples available in the User Guide illustrate this API extensively, and I don't think that on the user side, as far as custom code goes, there are ways to make it faster other than limiting the results to only what's necessary.

Existing saga instances after applying the [Unique] attribute to IContainSagaData property

I have a bunch of existing sagas in various states of a long running process.
Recently we decided to make one of the properties on our IContainSagaData implementation unique by using the Saga.UniqueAttribute (about which more here http://docs.particular.net/nservicebus/nservicebus-sagas-and-concurrency).
After deploying the change, we realized that all our old saga instances were not being found, and after further digging (thanks Charlie!) discovered that by adding the unique attribute, we were required to data fix all our existing sagas in Raven.
Now, this is pretty poor, kind of like adding an index to a database column and then finding that all the table data is no longer selectable, but be that as it may, we decided to create a tool to do this.
So after creating and running this tool we've now patched up the old sagas so that they now resemble the new sagas (sagas created since we went live with the change).
However, despite all the data now looking right we're still not able to find old instances of the saga!
The tool we wrote does two things. For each existing saga, the tool:
Adds a new RavenJToken called "NServiceBus-UniqueValue" to the saga metadata, setting the value to the same value as our unique property for that saga, and
Creates a new document of type NServiceBus.Persistence.Raven.SagaPersister.SagaUniqueIdentity, setting the SagaId, SagaDocId, and UniqueValue fields accordingly.
My questions are:
Is it sufficient to simply make the data look correct or is there something else we need to do?
Another option we have is to revert the change which added the unique attribute. However in this scenario, would those new sagas which have been created since the change went in be OK with this?
Code for adding metadata token:
var policyKey = RavenJToken.FromObject(saga.PolicyKey); // This is the unique field
sagaDataMetadata.Add("NServiceBus-UniqueValue", policyKey);
Code for adding new doc:
var policyKeySagaUniqueId = new SagaUniqueIdentity
{
    Id = "Matlock.Renewals.RenewalSaga.RenewalSagaData/PolicyKey/" + Guid.NewGuid().ToString(),
    SagaId = saga.Id,
    UniqueValue = saga.PolicyKey,
    SagaDocId = "RenewalSaga/" + saga.Id.ToString()
};
session.Store(policyKeySagaUniqueId);
Any help much appreciated.
EDIT
Thanks to David's help on this we have fixed our problem - the key difference was that we used SagaUniqueIdentity.FormatId() to generate our document IDs rather than a new Guid. This was trivial to do since we were already referencing the NServiceBus and NServiceBus.Core assemblies.
The short answer is that it is not enough to make the data resemble the new identity documents. Where you are using Guid.NewGuid().ToString(), that data is important! That's why your solution isn't working right now. I spoke about the concept of identity documents (specifically about the NServiceBus use case) during the last quarter of my talk at RavenConf 2014 - here are the slides and video.
So here is the long answer:
In RavenDB, the only ACID guarantees are on the Load/Store by Id operations. So if two threads are acting on the same Saga concurrently, and one stores the Saga data, the second thread can only expect to get back the correct saga data if it is also loading a document by its Id.
To guarantee this, the Raven saga persister uses an identity document like the one you showed. It contains the SagaId, the UniqueValue (mostly for human comprehension and debugging; the database doesn't technically need it), and the SagaDocId (which is a bit of duplication, as it's only {SagaTypeName}/{SagaId} and we already have the SagaId).
With the SagaDocId, we can use the Include feature of RavenDB to do a query like this (which is from memory, probably wrong, and should only serve to illustrate the concept as pseudocode)...
var identityDocId = // some value based on incoming message
var idDoc = RavenSession
    // Look at the identity doc's SagaDocId and pull back that document too!
    .Include<SagaIdentity>(identityDoc => identityDoc.SagaDocId)
    .Load(identityDocId);
var sagaData = RavenSession
    .Load(idDoc.SagaDocId); // Already in-memory, no 2nd round-trip to database!
So the identityDocId is very important because it describes the uniqueness of the value coming from the message; not just any old Guid will do. So what we really need to know is how to calculate that.
For that, the NServiceBus saga persister code is instructive:
void StoreUniqueProperty(IContainSagaData saga)
{
    var uniqueProperty = UniqueAttribute.GetUniqueProperty(saga);
    if (!uniqueProperty.HasValue) return;
    var id = SagaUniqueIdentity.FormatId(saga.GetType(), uniqueProperty.Value);
    var sagaDocId = sessionFactory.Store.Conventions.FindFullDocumentKeyFromNonStringIdentifier(saga.Id, saga.GetType(), false);
    Session.Store(new SagaUniqueIdentity
    {
        Id = id,
        SagaId = saga.Id,
        UniqueValue = uniqueProperty.Value.Value,
        SagaDocId = sagaDocId
    });
    SetUniqueValueMetadata(saga, uniqueProperty.Value);
}
The important part is the SagaUniqueIdentity.FormatId method from the same file.
public static string FormatId(Type sagaType, KeyValuePair<string, object> uniqueProperty)
{
    if (uniqueProperty.Value == null)
    {
        throw new ArgumentNullException("uniqueProperty", string.Format("Property {0} is marked with the [Unique] attribute on {1} but contains a null value. Please make sure that all unique properties are set on your SagaData and/or that you have marked the correct properties as unique.", uniqueProperty.Key, sagaType.Name));
    }
    var value = Utils.DeterministicGuid.Create(uniqueProperty.Value.ToString());
    var id = string.Format("{0}/{1}/{2}", sagaType.FullName.Replace('+', '-'), uniqueProperty.Key, value);
    // raven has a size limit of 255 bytes == 127 unicode chars
    if (id.Length > 127)
    {
        // generate a guid from the hash:
        var key = Utils.DeterministicGuid.Create(sagaType.FullName, uniqueProperty.Key);
        id = string.Format("MoreThan127/{0}/{1}", key, value);
    }
    return id;
}
This relies on Utils.DeterministicGuid.Create(params object[] data) which creates a Guid out of an MD5 hash. (MD5 sucks for actual security but we are only looking for likely uniqueness.)
static class DeterministicGuid
{
    public static Guid Create(params object[] data)
    {
        // use MD5 hash to get a 16-byte hash of the string
        using (var provider = new MD5CryptoServiceProvider())
        {
            var inputBytes = Encoding.Default.GetBytes(String.Concat(data));
            var hashBytes = provider.ComputeHash(inputBytes);
            // generate a guid from the hash:
            return new Guid(hashBytes);
        }
    }
}
That's what you need to replicate to get your utility to work properly.
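In other words, in your patch-up tool the identity document's Id should come from the same FormatId call the persister uses, not from Guid.NewGuid(). Roughly (a sketch only, assuming the RenewalSagaData type and PolicyKey property from your question):
var uniqueProperty = new KeyValuePair<string, object>("PolicyKey", saga.PolicyKey);
var identityDocId = SagaUniqueIdentity.FormatId(typeof(RenewalSagaData), uniqueProperty);
session.Store(new SagaUniqueIdentity
{
    Id = identityDocId, // deterministic, so the persister's lookup will find it
    SagaId = saga.Id,
    UniqueValue = saga.PolicyKey,
    SagaDocId = "RenewalSaga/" + saga.Id // same SagaDocId you were already storing
});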
What's really interesting is that this code made it all the way to production - I'm surprised you didn't run into trouble before this, with messages creating new saga instances when they really shouldn't because they couldn't find the existing Saga data.
I almost think it might be a good idea if NServiceBus would raise a warning any time you tried to find Saga Data by anything other than a [Unique] marked property, because it's an easy thing to forget to do. I filed this issue on GitHub and submitted this pull request to do just that.

"Convert" Entity Framework program to raw SQL

I used Entity Framework to create a prototype for a project and now that it's working I want to make the program ready for production.
I face many challenges with EF, the biggest one being concurrency management (it's financial software).
Given that there seems to be no way to handle pessimistic concurrency with EF, I have to switch to stored procedures in SQL.
To be honest, I'm a bit afraid of the workload that may represent.
I would like to know if anybody has been in the same situation before, and what the best strategy is to convert .NET code using EF to raw SQL.
Edit:
I'm investigating CLR, but it's not clear whether pessimistic concurrency can be managed with it. Is it a more interesting option than T-SQL in this case? It would allow me to reuse part of my C# code and the structure of functions calling other functions, if I understand correctly.
I was there, and the good news is you don't have to give up Entity Framework if you don't want to. The bad news is you have to update the database yourself, which isn't as hard as it seems. I'm currently using EF 5 but plan to move to EF 6, and I don't see why this wouldn't still work for EF 6.
The first thing is, in the constructor of the DbContext, cast it to IObjectContextAdapter to get access to the ObjectContext. I make a property for this:
public virtual ObjectContext ObjContext
{
    get
    {
        return ((IObjectContextAdapter)this).ObjectContext;
    }
}
Once you have that, subscribe to the SavingChanges event. This isn't our exact code - some things are copied out of other methods and reworked - but it gives you an idea of what you need to do.
ObjContext.SavingChanges += SaveData;
private void SaveData(object sender, EventArgs e)
{
    var context = sender as ObjectContext;
    if (context != null)
    {
        context.DetectChanges();
        var tsql = new StringBuilder();
        var dbParams = new List<KeyValuePair<string, object>>();
        var deletedEntites = context.ObjectStateManager.GetObjectStateEntries(EntityState.Deleted);
        foreach (var delete in deletedEntites)
        {
            // Set state to unchanged - so entity framework will ignore
            delete.ChangeState(EntityState.Unchanged);
            // Method to generate tsql for deleting entities
            DeleteData(delete, tsql, dbParams);
        }
        var addedEntites = context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
        foreach (var add in addedEntites)
        {
            // Set state to unchanged - so entity framework will ignore
            add.ChangeState(EntityState.Unchanged);
            // Method to generate tsql for added entities
            AddData(add, tsql, dbParams);
        }
        var editedEntites = context.ObjectStateManager.GetObjectStateEntries(EntityState.Modified);
        foreach (var edit in editedEntites)
        {
            // Method to generate tsql for updating entities
            UpdateEditData(edit, tsql, dbParams);
            // Set state to unchanged - so entity framework will ignore
            edit.ChangeState(EntityState.Unchanged);
        }
        if (!tsql.ToString().IsEmpty())
        {
            var dbcommand = Database.Connection.CreateCommand();
            dbcommand.CommandText = tsql.ToString();
            foreach (var dbParameter in dbParams)
            {
                var dbparam = dbcommand.CreateParameter();
                dbparam.ParameterName = dbParameter.Key;
                dbparam.Value = dbParameter.Value;
                dbcommand.Parameters.Add(dbparam);
            }
            var results = dbcommand.ExecuteNonQuery();
        }
    }
}
Why do we set the edited entity to Unchanged only after generating the update (rather than before)? Because you can do
var changedProperties = edit.GetModifiedProperties();
to get a list of all the changed properties. Since all the entities are now marked as Unchanged, EF will not send any updates to SQL Server.
You will also need to mess with the metadata to go from entity to table and from property to field. This isn't that hard to do, but messing with the metadata does take some time to learn - something I still struggle with sometimes. I refactored all of that out into an IMetaDataHelper interface, where I pass in the entity type and property name to get the table and field back, along with caching the result so I don't have to query the metadata all the time.
In the end, tsql is a batch that has all the T-SQL the way we want it, with the locking hints and the transaction level. We also change numeric fields so that instead of being set absolutely (nfield = 10) they are set relatively (nfield = nfield + 2) in the T-SQL when the user updated them by 2, to avoid concurrency issues there as well.
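For what it's worth, here is a rough sketch of that relative-update idea for a single numeric column (the method and names are invented for illustration; the real UpdateEditData has to weave this fragment into the full UPDATE statement):
private static void AppendNumericDelta(
    ObjectStateEntry edit, string propertyName, string columnName,
    StringBuilder tsql, List<KeyValuePair<string, object>> dbParams)
{
    var original = Convert.ToDecimal(edit.OriginalValues[propertyName]);
    var current = Convert.ToDecimal(edit.CurrentValues[propertyName]);
    var delta = current - original;
    if (delta == 0) return; // nothing changed, emit nothing

    var paramName = "@" + columnName + "_delta";
    // relative assignment instead of an absolute SET, to dodge lost updates
    tsql.AppendFormat("{0} = {0} + {1}", columnName, paramName);
    dbParams.Add(new KeyValuePair<string, object>(paramName, delta));
}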
What you won't get is SQL locking the row as soon as someone starts to edit your entity, but I don't see how you would get that with stored procedures either.
All in all it took me about 2 solid days to get this all up and running for us.

Symfony2, Doctrine, add/insert/update - best solution for a large number of queries

Let's imagine we have this code:
while (true)
{
    foreach ($array as $row)
    {
        $needPersist = false;
        $item = $em->getRepository('reponame')->findOneBy(array('filter'));
        if (!$item)
        {
            $needPersist = true;
            $item = new Item();
        }
        $item->setItemName();
        // and so on ...
        if ($needPersist)
        {
            $em->persist($item);
        }
    }
    $em->flush();
}
So, the point is that this code will be executed many times (as long as the server doesn't die :) ), and we want to optimize it. Every time we:
Select the existing entry from the repository.
If the entry doesn't exist, create it.
Set new (updated) values on it.
Apply the actions (flush).
So the question is: how do we avoid unnecessary queries and optimize the "check whether the entry exists" step? When there are 100-500 queries it's not so scary... but when it comes to 1,000-10,000 per while iteration, it's too much.
PS: each entry in the DB is unique by several columns (not only by ID).
Instead of fetching results one-by-one, load all results with one query.
E.g. let's say your filter wants to load ids 1, 2, 10. The query builder would be something like:
$allResults = ...
    ->where("o.id IN (:ids)")->setParameter("ids", $ids)
    ->getQuery()
    ->getResult();
"foreach" of these results, do your job of updating them and flushing
While doing that loop, save ids of those fetched objects in new array
Compare that array with original one using array_diff. Now you have ids that were not fetched the first time
Rinse and repeat :)
And don't forget $em->clear() to free memory
While this can still be slow when working with 10,000 records (I don't know, I've never tested it), it will be much faster to run 2 big queries than 10,000 small ones.
Regardless of whether you need them to persist after the update, retrieving 10k+ entries from the database and hydrating them into PHP objects is going to need too much memory. In such cases you are better off falling back to the Doctrine DBAL layer and firing pure SQL queries.

How can I speed my Entity Framework code?

My SQL and Entity Framework knowledge is somewhat limited. In one Entity Framework (4) application, I notice it takes forever (about 2 minutes) to complete one of my method calls. The first queries do not take much time, but when I loop through the Entity Framework objects returned by the queries, even though I am only reading (not modifying) the data I supposedly got, it takes forever to complete the nested loops, even though there are only dozens of entries in each list and a few levels of looping.
I expect the example below could be re-written with a fancier query that could probably include all of the filtering I am doing in my loops with some SQL words I don't really know how to use, so if someone could show me what the equivalent SQL expression would be, that would be extremely educational to me and probably solve my current performance problem.
Moreover, since other parts of this and other applications I develop often want to do more complex computations on SQL data, I would also like to know a good way to retrieve data from Entity Framework to local memory objects that do not have huge delays in reading them. In my LINQ-to-SQL project there was a similar performance problem, and I solved it by refactoring the whole application to load all SQL data into parallel objects in RAM, which I had to write myself, and I wonder if there isn't a better way to either tell Entity Framework to not keep doing whatever high-latency communication it is doing, or to load into local RAM objects.
In the example below, the code gets a list of food menu items for a member (i.e. a person) on a certain date via a SQL query, and then I use other queries and loops to filter out the menu items on two criteria: 1) If the member has a rating of zero for any group id which the recipe is a member of (a many-to-many relationship) and 2) If the member has a rating of zero for the recipe itself.
Example:
List<PFW_Member_MenuItem> MemberMenuForCookDate =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
     where item.CookDate == onCookDate
     select item).ToList();
// Now filter out recipes in recipe groups rated zero by the member:
List<PFW_Member_Rating_RecipeGroup> ExcludedGroups =
    (from grpRating in _myPfwEntities.PFW_Member_Rating_RecipeGroup
     where grpRating.MemberID == forMemberId
     where grpRating.Rating == 0
     select grpRating).ToList();
foreach (PFW_Member_Rating_RecipeGroup grpToExclude in ExcludedGroups)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        PFW_Recipe rcp = GetRecipeById(rcpOnMenu.RecipeID);
        foreach (PFW_RecipeGroup group in rcp.PFW_RecipeGroup)
        {
            if (group.RecipeGroupID == grpToExclude.RecipeGroupID)
            {
                rcpsToRemove.Add(rcpOnMenu);
                break;
            }
        }
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}
// Now filter out recipes rated zero by the member:
List<PFW_Member_Rating_Recipe> ExcludedRecipes =
    (from rcpRating in _myPfwEntities.PFW_Member_Rating_Recipe
     where rcpRating.MemberID == forMemberId
     where rcpRating.Rating == 0
     select rcpRating).ToList();
foreach (PFW_Member_Rating_Recipe rcpToExclude in ExcludedRecipes)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        if (rcpOnMenu.RecipeID == rcpToExclude.RecipeID)
            rcpsToRemove.Add(rcpOnMenu);
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}
You can use EFProf (http://www.hibernatingrhinos.com/products/EFProf) to see exactly what EF is sending to SQL. It can also show you how many queries you are sending and how many of them are unique, and it provides some analysis of each query (e.g. whether it is unbounded). With Entity Framework and its navigation properties, it is quite easy not to realize you are making a database request. When you are in a loop and touch a navigation property, you run into the N + 1 problem.
You could use the virtual keyword on the collection properties of your model (if you are using code first) to enable proxying; that way you will not have to get all the data back at once, only as you need it.
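For example (a sketch only, assuming a code-first model shaped like the entities in the question):
public class PFW_Recipe
{
    public int RecipeID { get; set; }
    // virtual lets EF generate a lazy-loading proxy, so the groups are only
    // fetched from the database when the property is actually touched
    public virtual ICollection<PFW_RecipeGroup> PFW_RecipeGroup { get; set; }
}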
Also consider NoTracking for read-only data:
context.bigTable.MergeOption = MergeOption.NoTracking;
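Finally, on the "what would the equivalent query look like" part of the question: you can push the filtering into a single round trip instead of nested loops. This is a sketch only, assuming the navigation properties and key names match the model in the question; EF should translate each Any into a NOT EXISTS subquery so the filtering happens in the database.
var menuForCookDate =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
        && item.CookDate == onCookDate
        // drop recipes the member has rated zero
        && !_myPfwEntities.PFW_Member_Rating_Recipe.Any(r =>
               r.MemberID == forMemberId
               && r.Rating == 0
               && r.RecipeID == item.RecipeID)
        // drop recipes that belong to any group the member has rated zero
        && !_myPfwEntities.PFW_Recipe.Any(rcp =>
               rcp.RecipeID == item.RecipeID
               && rcp.PFW_RecipeGroup.Any(g =>
                   _myPfwEntities.PFW_Member_Rating_RecipeGroup.Any(gr =>
                       gr.MemberID == forMemberId
                       && gr.Rating == 0
                       && gr.RecipeGroupID == g.RecipeGroupID)))
     select item).ToList();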