How to choose a best merging scale? - physics

I want to do merging, using CKKW-L method ( or other method...) , but I am not familiar with it. I get problem in how to choose a best merging scale, since in the Pythia8 example, the CKKW-L merging scale is already fixed ( they choose t_{ms}=30 ). I think maybe for different process, the merging scale is different. So is there any conditions to choose a best merging scale?
For example, should I pick some merging scales and output some jet information then choose which one is best?

Yes, I think its depend on your process. So, you need to see if you are not restricting your phase-space much. I don't have much idea but I found two interesting links about this may be it will help you. They are
http://home.thep.lu.se/~prestel/Talks/talk011112.pdf
http://home.thep.lu.se/Pythia/pythia82html/CKKWLMerging.html

Related

How to create random SQLqueries based on database?

For testing purposes I require a large amount of queries.
Creating this manually is not an option, so I am searching a tool which will do this automatically.
Sadly, the only solution I found (sqlsmith), is limited to postgres and SQLite.
Are there any similar tools for SQL-Server?
"I do not know from what random place people will want to travel to a random other place, so instead, let's create roads for every possible combination of origin and destination".
That sounds kind of insane, doesn't it? The same applies to what you seem to be wanting to achieve. You basically are hoping to find a tool that generates random queries against your database so you can feed them to the tuning advisor, which will then suggest query optimization indexes for hypothetical queries.
If you want to performance tune your database, you should have a pretty good idea of the type of questions your users will be throwing at it, as well as the structure of your data. Typical questions that will help you get started would be things like:
What is the most common search my users would do against this table?
What criteria are they most likely to use?
Which columns are guaranteed or likely to contain unique data in every row?
Which columns will most likely have a low selectivity of data? (I.e. Male/Female)
are you looking for generate random data for multiple tables ? we generally use redgate data genearator tool for the same.
for SQL tuning purpose I would suggest
https://www.brentozar.com/blitzindex/
http://www.nguyenlamminhdieu.com/zone/213/news/vi-VN/zone/213/news/351-database-engine-tuning-advisor-in-sql-server.aspx

Change detection in complex system

This might seem like a fairly specific question but I'm wondering if there is any technology/pattern out there that might help me in a current project. I have a hugely complex database which is updated by multiple systems. I now need to do change tracking on various bits of data that is spread across multiple tables so that I can send it to a third party system.
I've considered a number of options but unfortunately I can't seem to come to any other conclusion than using database triggers. I'm thinking of storing a flag in a table (or queue) to identify the rows that have changed and then building an xml diff containing the changed data to send to a web service. This feels a little dirty so I was wondering if anyone could think of a better alternative.
Depending on the database platform you're using, you might look into Change Data Capture. Since you mention .NET, here's some info about it: http://technet.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
Other database systems may offer something similar.
Another option would be insert/update/delete triggers on the tables, however triggers should be approached carefully as they can cause some significant performance problems if not done right.
And yet another option still would be what you describe - some sort of flag to monitor for changes. A simple CREATED and MODIFIED timestamp fields can go a long way here, as rather than just a bit indicator suggesting that the row may need attention, you'll know when the update happened, and your export process can be programmed accordingly (e.g., select * from table where modified > getdate()-1).

Should I be concerned that ORMs, by default, return all columns?

In my limited experience in working with ORMs (so far LLBL Gen Pro and Entity Framework 4), I've noticed that inherently, queries return data for all columns. I know NHibernate is another popular ORM, and I'm not sure that this applies with it or not, but I would assume it does.
Of course, I know there are workarounds:
Create a SQL view and create models and mappings on the view
Use a stored procedure and create models and mappings on the result set returned
I know that adhering to certain practices can help mitigate this:
Ensuring your row counts are reasonably limited when selecting data
Ensuring your tables aren't excessively wide (large number of columns and/or large data types)
So here are my questions:
Are the above practices sufficient, or should I still consider finding ways to limit the number of columns returned?
Are there other ways to limit returned columns other than the ones I listed above?
How do you typically approach this in your projects?
Thanks in advance.
UPDATE: This sort of stems from the notion that SELECT * is thought of as a bad practice. See this discussion.
One of the reasons to use an ORM of nearly any kind is to delay a lot of those lower-level concerns and focus on the business logic. As long as you keep your joins reasonable and your table widths sane, ORMs are designed to make it easy to get data in and out, and that requires having the entire row available.
Personally, I consider issues like this premature optimization until encountering a specific case that bogs down because of table width.
First of : great question, and about time someone asked this! :-)
Yes, the fact an ORM typically returns all columns for a database table is something you need to take into consideration when designing your systems. But as you've mentioned - there are ways around this.
The main fact for me is to be aware that this is what happens - either a SELECT * FROM dbo.YourTable, or (better) a SELECT (list of all columns) FROM dbo.YourTable.
This is not a problem when you really want the whole object and all its properties, and as long as you load a few rows, that's fine, too - the convenience beats the raw performance.
You might need to think about changing your database structures a little bit - things like:
maybe put large columns like BLOBs into separate tables with a 1:1 link to your base table - that way, a select on the parent tables doesn't grab all those large blobs of data
maybe put groups of columns that are optional, that might only show up in certain situations, into separate tables and link them - again, just to keep the base tables lean'n'mean
Also: avoid trying to "arm-wrestle" your ORM into doing bulk operations - that's just not their strong point.
And: keep an eye on performance, and try to pick an ORM that allows you to change certain operations into e.g. stored procedures - Entity Framework 4 allows this. So if the deletes are killing you - maybe you just write a Delete stored proc for that table and handle that operation differently.
The question here covers your options fairly well. Basically you're limited to hand-crafting the HQL/SQL. It's something you want to do if you run into scalability problems, but if you do in my experience it can have a very large positive impact. In particular, it saves a lot of disk and network IO, so your scalability can take a big jump. Not something to do right away though: analyse then optimise.
Are there other ways to limit returned columns other than the ones I listed above?
NHibernate lets you add projections to your queries so you wouldn't need to use views or procs just to limit your columns.
For me this has only been an issue if the tables has LOTS of columns > 30 or if the column had alot of data for example a over 5000 character in a field.
The approach I have used is to just map another object to the existing table but with only the fields I need. So for a search that populates a table with 100 rows I would have a
MyObjectLite, but when I click to view the Details of that Row I would call a GetById and return a MyObject that has all the columns.
Another approach is to use custom SQL, Stroed procs but I only think you should go down this path if you REALLY need the performance gain and have users complaining. SO unless there is a performance problem do not waste your time trying to fix a problem that does not exist.
You can limit number of returned columns by using Projection and Transformers.AliasToBean and DTO here how it looks in Criteria API:
.SetProjection(Projections.ProjectionList()
.Add(Projections.Property("Id"), "Id")
.Add(Projections.Property("PackageName"), "Caption"))
.SetResultTransformer(Transformers.AliasToBean(typeof(PackageNameDTO)));
In LLBLGen Pro, you can return Typed Lists which not only allow you to define which fields are returned but also allow you to join data so you can pull a custom list of fields from multiple tables.
Overall, I agree that for most situations, this is premature optimization.
One of the big advantages of using LLBLGen and other ORMs as well (I just feel confident speaking about LLBLGen because I have used it since its inception) is that the performance of the data access has been optimized by folks who understand the issues better than your average bear.
Whenever they figure out a way to further speed up their code, you get those changes "for free" just by re-generating your data layer or by installing a new dll.
Unless you consider yourself an expert at writing data access code, ORMs probably improve most developers efficacy and accuracy.

Is adding a bit mask to all tables in a database useful?

A colleague is adding a bit mask to all our database tables. In theory this is so we can track certain properties of each row across the entire system. For example...
Is the row shipped with the system or added by the client once they've started using the system
Has the row been deleted from the table (soft deletes)
Is the row a default value within a set of rows
Is this a good idea? Are there other uses where this approach would be beneficial?
My preference is these properties are obviously important, and having a dedicated column for each property is justified to make what is happening clearer to fellow developers.
Not really, no.
You can only store bits in it, and only so many. So, seems to me like it's asking for a lot of application-level headaches later on keeping track of what each one means and potential abuse later on because "hey they're everywhere". Is every bitmask on every table going to use the same definition for each bit? Will it be different on each table? What happens when you run out of bits? Add another?
There are lots of potential things you could do with it, but it begs the question "why do it that way instead of identifying what we will use those bits for right now and just make them proper columns?" You don't really circumvent the possibility of schema changes this way anyway, so it seems like it's trying to solve a problem that you can't really "solve" and especially not with bitmasks.
Each of the things you mentioned can be (and should be) solved with real columns on the database, and those are far more self-documenting than "bit 5 of the BitMaskOptions field".
A dedicated column is is better, because it's undoubtedly more obvious and less error-prone. SQL Server already stores BIT columns efficiently, so performance is not an issue.
The only argument I could see for a bitmask is not having to change the DB schema every time you add a new flag, but really, if you're adding new flags that often then something is not right.
No, it is not even remotely a good idea IMO. Each column should represent a single concept and value. Bit masks have all kinds of performance and maintenance problems. How do new developers understand what each of the bits mean? How do you prevent someone from accidentally mixing the meaning of the order of the bits?
It would be better to have a many-to-many relationship or separate columns rather than a bit mask. You will be able to index on it, enable referential integrity (depending on approach), easily add new items and change the order of the results to fit different reports and so on.

Storing multiple choice values in database

Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search.
Now, the obvious way of storing these values is to create a table like
UserLanguages
(
UserID nvarchar(50),
LookupLanguageID int
)
but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"
Again, I don't search for them so I don't worry about having to do fulltext index on that column.
I don't really see any problems with this solution, but am I overlooking anything?
Thanks,
Andrey
Don't.
You don't search for them now
Data is useless to anything but this one situation
No data integrity (eg no FK)
You still have to change to "English,German" etc for display
"Give me all users who speak x" = FAIL
The list is actually a presentation issue
It's your system, though, and I look forward to answering the inevitable "help" questions later...
You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.
What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.
If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.
More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.
This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "
I would suggest going with a normalized design. Put it in a separate table.
Problems:
You lose join capability (obviously).
You have to reparse the list on each page load / post back. Which results in more code client side.
You lose all pretenses of trying to keep database integrity. Just imagine if you decide to REMOVE a language later on... What's the sql going to be to fix all of your user profiles?
Assuming your various profile options are stored in a lookup table in the DB, you still have to run "30 queries" per profile page. If they aren't then you have to code deploy for each little change. bad, very bad.
Basing a design decision on something that "won't happen" is an absolute recipe for failure. Sure, the business people said they won't ever do that... Until they think of a reason they absolutely must do it. Today. Which will be promptly after you finish coding this.
As I stated in a comment, 30 queries for a low use page is nothing. Don't sweat it, and definitely don't optimize unless you know for darn sure it's necessary. Guess how many queries SO does for it's profile page?
I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.
As alternative solution:
You could store as one bitmasked integer, for example:
0 - No selection
1 - English
2 - Spanish
4 - German
8 - French
16 - Russian
--and so on powers of 2
So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.
Premature optimization is the root of all evil.
EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.
Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".
While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.
When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).
If you're not querying against them, you don't lose anything by storing them in a form like your initial plan.
If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.
You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.
The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).
Nooooooooooooooooo!!!!!!!!
As stated very well in the above few posts.
If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.