Purpose of GUIDs in COM

What is the purpose of GUIDs in COM?
Is it only to avoid name conflicts?

It serves the exact same purpose as a name. A COM client can ask the system to create a COM object using a simple identifier (CoCreateInstance). That identifier has the scope of the full machine. Different chunks of code written by programmers that don't know each other and work for different companies live at that scope.
The problem with names is that people suck at picking good names. The odds that one programmer picks the exact same name as another programmer, 3000 miles away and 5 years ago, are high. Names like "Record", "Database", etc. would be popular choices. It is evident on this website too: lots of users are named "Jason" or "Mike". They don't mind; they know their own name when they review their profile. Context. It is impossible for me to find them again, though, when they send me an email with just their user name, following up on a question with a generic subject string.
Getting a name collision and COM creating the wrong object is disastrous. The program stops working because it is getting a completely wrong object. Finding out why is difficult; the error message sucks. Actually fixing the problem is impossible. Calling programmer B and asking him in the friendliest possible way to "pick a different name, somebody already picked yours" doesn't work. The automatic response is "call programmer A instead".
This is not a problem when you use a GUID instead of a name. They are Globally Unique IDs. The odds of getting a collision are astronomically small.
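The difference can be sketched with a toy machine-wide registry. This is not how COM or the Windows registry is implemented; the Registry class, the vendor names and the CLSID_ constants below are all hypothetical and only illustrate why independently chosen names collide while independently generated GUIDs do not.

```python
import uuid


class Registry:
    """Toy stand-in for a machine-wide table of creatable classes (hypothetical)."""

    def __init__(self):
        self._factories = {}

    def register(self, key, factory):
        # A real system might silently overwrite instead; raising makes the clash visible.
        if key in self._factories:
            raise KeyError(f"identifier already registered: {key!r}")
        self._factories[key] = factory

    def create(self, key):
        return self._factories[key]()


# Two vendors who never talked to each other both pick the name "Database".
by_name = Registry()
by_name.register("Database", lambda: "vendor A's object")
try:
    by_name.register("Database", lambda: "vendor B's object")
    name_collision = False
except KeyError:
    name_collision = True

# With GUIDs, each vendor generates an identifier once and ships it with the component.
CLSID_VENDOR_A = uuid.uuid4()
CLSID_VENDOR_B = uuid.uuid4()
by_guid = Registry()
by_guid.register(CLSID_VENDOR_A, lambda: "vendor A's object")
by_guid.register(CLSID_VENDOR_B, lambda: "vendor B's object")

print(name_collision)
print(by_guid.create(CLSID_VENDOR_B))
```

Neither vendor has to coordinate with the other: the identifier is generated locally and is still safe to use at machine scope.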

Probably, as it would guarantee a globally unique identifier for each object.

As the name suggests, it's an identifier and serves the same purpose as any other identifier. As you mentioned, avoiding name conflicts is one use. Another advantage is that it is only 128 bits long (as compared to a name which could be arbitrarily long) so comparing two GUIDs is much faster than comparing their corresponding names.

As its name suggests, a GUID is a globally unique identifier. This means that if one developer generates two GUIDs, he gets two different GUIDs, and if two developers who don't even know about each other generate one GUID each - either at the same moment or at different moments - they again get two different GUIDs.
Now consider a problem: any developer needs to be able to introduce new interfaces and classes and make them uniquely identifiable. That's because if you get an interface pointer and your program treats it as some InterfaceA* but it really is not an InterfaceA*, you've got a major problem: your program runs into undefined behavior, so it crashes or produces unexpected results.
The above problem is easily solved with GUIDs. Every developer creates a new (and thus unique) GUID value for every new interface or class he introduces. This is reliable and convenient (except that GUIDs are not that human-readable).

Related

Simulation for large number of objects associated with other objects ("have a")

I am trying to really get a good idea how to think in OOP terms, so I have a semi-hypothetical scenario in my mind and I was looking for some thoughts.
If I wanted to design a simulation for different types of people interacting with each other, each of whom could acquire different proficiency levels in different "skills", what would be an optimal way to do this?
It's really the "skills" thing that I was a bit caught up on. My requirements are as follows:
-Each person either "has" a skill or does not
-If people have skills, they also have a "proficiency level" associated with the skill
-I need a way to find and pick out every person that has certain skills at all, or at a certain level
-The design needs to be extendible (ie, I need to be able to add more "skills" later)
I considered the following options:
1. Have a giant enum for every single skill I include, and have the person class contain an "int Skills[TOTAL_NUM_SKILLS]" member. The array would have zeros for "unacquired" skills, and 1 to (max) for proficiency levels of "acquired" skills.
2. Have the same giant enumeration, and have the person class contain a map of skills (from the enum) to numbers associated with the skills, so that you add only the acquired skills to the map and associate a number with each this way.
3. Create a concrete class for every single skill, have each inherit from an abstract base class (ISkill, say), and have the person class hold a map of ISkill's.
Really, option 1 seems like the straightforward no-nonsense way to do it. Please criticize; is there some reason this is not acceptable? Is there a more object oriented way to do this?
I know that option 3 doesn't make much sense right now, but if I decided to extend this later to have skills be more than just things with proficiency associated with them (ie, actually associate new actions with the skills (ISkill::DoAction, etc), does this make sense as an option?
Sorry for the broad question, I just want to see if this line of thought makes sense, or if I'm barking up the wrong tree altogether.
The problem with option 1 is future compatibility. Say you were shipping this framework to customers. Now, the customer has built this array of Skill values, which is length TOTAL_NUM_SKILLS, for each person. But this fails as soon as you try to add another skill, and especially as you try to reorder skills.
What if the customer is using an RPC framework in which a client and server pass Person objects over the wire? Now, unless the customer upgrades the client and server at the exact same time, the RPC calls break, since now the client and server expect arrays of different lengths. This can be particularly tricky because the customer may own only the client, or only the server, and be unable to upgrade both at once.
But it gets worse. Say the client has written out a Person object to disk in some file. If they decided to serialize a person as a simple list of numbers, then a new skill will cause the deserialization code to fail. Worse, if you reorder skills in your enum, the deserialization code may work just fine but give a wrong answer.
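A minimal sketch of that failure mode, assuming a hypothetical two-skill enum that later gains a third entry at the front. The enum names and the save/load helpers are invented for illustration; the point is only that positional data silently changes meaning, while data keyed by skill name does not.

```python
from enum import IntEnum


class SkillV1(IntEnum):
    COOKING = 0
    FENCING = 1


class SkillV2(IntEnum):
    # A later release inserts a skill at the front, shifting the others.
    ARCHERY = 0
    COOKING = 1
    FENCING = 2


def save_positional(levels, enum_cls):
    """Option 1: one slot per enum value, 0 meaning 'not acquired'."""
    return [levels.get(skill, 0) for skill in enum_cls]


def load_positional(data, enum_cls):
    return {skill.name: lvl for skill, lvl in zip(enum_cls, data) if lvl > 0}


def save_by_name(levels):
    """Option 2 style: store only acquired skills, keyed by a stable name."""
    return {skill.name: lvl for skill, lvl in levels.items()}


def load_by_name(data, enum_cls):
    return {name: lvl for name, lvl in data.items() if name in enum_cls.__members__}


person = {SkillV1.COOKING: 3}
old_array = save_positional(person, SkillV1)  # [3, 0]
old_map = save_by_name(person)                # {'COOKING': 3}

# Reading old data with the new enum: no error, but the wrong skill.
print(load_positional(old_array, SkillV2))    # {'ARCHERY': 3}
print(load_by_name(old_map, SkillV2))         # {'COOKING': 3}
```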
I like option 3 exactly for the reason you named: later you can add more functionality, and do so safely (well, except for the fact that every public change is a breaking change if your customers exercised certain edge cases in the language).
If you want to add skills often without changing the overall program structure, I'd consider some kind of external data file that you can change without recompiling your code. Think about how you'd want to do it in a really large project. The person who chooses the skills might be a designer with no programming ability. He could edit the skills in an XML file, but not in C++ code.
If you defined the skills in XML, it would naturally extend to store more data with each skill. Your players could be serialized as XML files too.
When you set up a player's skills at runtime, you could build a hash table keyed on the skill name from the XML file. If it's more common to enumerate a player's skills than to query whether a player has a certain skill, you could just use a vector of strings.
Of course, this solution will use more memory and run slower than your enum solution. But it will probably be good enough unless you're dealing with millions of players in your program.
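As a rough sketch of the data-file idea: the XML layout, the attribute names (name, max_level) and the Person/World classes below are assumptions made up for this example, not something from the question. Skills live in XML, each person keeps a hash table of acquired skills only, and a reverse index answers "who has skill X at level N or better".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical skill definitions a designer could edit without recompiling.
SKILLS_XML = """
<skills>
  <skill name="cooking" max_level="5"/>
  <skill name="fencing" max_level="10"/>
</skills>
"""


def load_skill_definitions(xml_text):
    """Return {skill name: maximum proficiency level}."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): int(el.get("max_level")) for el in root.iter("skill")}


class Person:
    def __init__(self, name):
        self.name = name
        self.skills = {}  # skill name -> proficiency level, acquired skills only


class World:
    def __init__(self, definitions):
        self.definitions = definitions
        self.people_by_skill = defaultdict(set)  # reverse index for lookups

    def learn(self, person, skill, level):
        if skill not in self.definitions:
            raise ValueError(f"unknown skill: {skill}")
        if not 1 <= level <= self.definitions[skill]:
            raise ValueError(f"level {level} out of range for {skill}")
        person.skills[skill] = level
        self.people_by_skill[skill].add(person)

    def find(self, skill, min_level=1):
        """Names of everyone holding `skill` at `min_level` or better."""
        holders = self.people_by_skill.get(skill, ())
        return sorted(p.name for p in holders if p.skills[skill] >= min_level)


world = World(load_skill_definitions(SKILLS_XML))
alice, bob = Person("alice"), Person("bob")
world.learn(alice, "cooking", 4)
world.learn(alice, "fencing", 2)
world.learn(bob, "cooking", 1)

print(world.find("cooking"))     # ['alice', 'bob']
print(world.find("cooking", 3))  # ['alice']
```

Adding a skill then means adding one line to the XML; nothing in the code or in previously saved people has to change.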

About GUID usage

The wiki says it is used to uniquely identify classes and interfaces; what about objects (actual instances)?
When working with SQL, I also see GUIDs used for ID fields (the user table, etc. in the aspnetdb database in the ASP.NET MVC template project).
So I want to clearly understand GUID usage: in which cases should I use it, and is it really unique?
Any explanation is appreciated.
Thanks.
For a good overview of what a GUID is, check out our good friend Wikipedia: GUID.
and is it really unique
GUIDs generated from the same machine are virtually guaranteed to be unique. You have an infinitesimally small chance of generating the same one twice on the same machine. Arguably you have a tiny chance of generating two GUIDs the same out in the wider world, but that chance is still small and the chances of those two GUIDs ever meeting are also pretty small. In fact you probably have a greater chance of the Large Hadron Collider generating a black hole that swallows the Earth than you would having two identical GUIDs meeting somewhere on a network.
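To put a rough number on "infinitesimally small", here is a back-of-the-envelope estimate using the birthday approximation. It assumes random (version 4) GUIDs, which carry 122 random bits; other GUID versions are built differently, so treat this as an illustration rather than a guarantee about any particular generator.

```python
import math

RANDOM_BITS = 122  # a version-4 GUID has 128 bits, 6 of which are fixed


def collision_probability(n):
    """Approximate chance that n random GUIDs contain at least one duplicate.

    Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * N)), N = 2**122.
    expm1 keeps precision when the result is extremely small.
    """
    space = 2 ** RANDOM_BITS
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} GUIDs -> p = {collision_probability(n):.3e}")
```

Even with a billion GUIDs in one place the estimate stays many orders of magnitude below anything worth planning for.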
Because of this, some people like to use it as the primary key for database tables. Personally I don't like to do this because:
an auto-incrementing integer gives me enough uniqueness to be able to use it as a primary key
GUIDs are a massive PITA to deal with when you are writing SQL queries.
The wiki says it is used to uniquely identify classes and interfaces
If you need an identifier that is unique across several disparate areas (like hives in a registry), then GUIDs are a good solution. In this particular case they are being used to identify a type. A concrete instance could also internally use a GUID identifier, but this is really only useful for data objects.

how to create a system-wide independent universal counter object primarily for Database keys?

I would like to create/use a system-wide independent universal 'counter object' that can be called via COM in a thread-safe manner.
The counter object will be passed an ID to identify which counter to return, handle the counting, 'persist' the count (occasionally), have reasonable performance (as fast as possible), perhaps capable of 1000 counts per second or better (1 ms per count), and be accessible cross-process/out-of-process. The current count status must be persisted between object restarts/shutdowns.
The counter object is likely to be a 'singleton' type object implemented in some form of free-threaded dictionary, containing maybe 10 counters (perhaps 50 max). The count needs to be monotonic and consistent (i.e. guaranteed unique sequential values).
Each counter should have a few methods, like reset, inc, dec, set, clear, remove. As a luxury, I would like to have a variable increment (i.e. a 'step by' value). To support thread safety, perhaps some form of critical-section or mutex call. It just needs to return a long/4-byte signed integer.
I really want something that can be called from anywhere, including VBScript, so I figure COM is my preferred solution.
The primary use of this is for database keys. I am unable to use autoinc or guid type keys and have ruled out database-generated counting systems at this point.
I've spent days researching this and I have really struggled to find a solution. The best I can find is a free-threaded dictionary object that can be instantiated using COM+ from Motobit - it seems to offer all the 'basics' and I guess I could create some form of wrapper for this.
So, here are my questions:
Does such a 'general purpose' counter object already exist? Can you direct me to it? (MS did do an IIS/ASP object called 'MSWC.Counter', but this isn't a 'cross-process'/out-of-process component and isn't thread-safe. If it was, it would do!)
What is the best way of creating such a component? (I'd prefer VB6 right now [don't ask!], but can do it in VB.NET 2005 if I had to.) I don't have the skills/knowledge/tools to use anything else.
I am desperate for a workable solution. I need specific guidance! If anybody can code something up for me I am prepared to pay for it.
Update:
What's wrong with GUIDs? a) 16 bytes if I'm lucky (binary storage), 32+ bytes if I'm not (ANSI without formatting), or even worse (64 bytes as Unicode); b) I have a high-volume replicated app where the GUID is just too big (compared to the actual row data); c) the overhead of indexing and inserts; d) I want a readable number! I only need a 4-byte integer, so why not try to get that? I know you will say that disk space is cheap, but for my application the cost is in slow inserts, and GUIDs don't help (I have tried and tested them), so I would prefer not to use them if I have a choice.
Autonumber/autoinc columns are evil: a) you don't get the value until after the insert, b) they are session-specific, c) they are easy to lose or screw up on a table alter, d) they are no good for multi-table inserts (it's not MS SQL Server). Plus, I have a need for counters outside my DB...
By the sound of it, what you're looking to create is an ActiveX EXE. It runs in its own process but can be accessed from any other process by instantiating an object from it as though it were just another COM object. It handles all the marshaling necessary to sync its internal thread with the threads of any process calling it. Since all you are planning on using is integers, there's no need to worry about the thread safety of objects passed between the threads.
More than likely you can use the MSWC.Counter object within that ActiveX EXE and let it do the counter work.
A database engine is already very good at generating unique primary key values for a database table, either by marking the column auto-increment or by using a GUID. Trying to create your own is a grave mistake. System-wide is just not wide enough; it fails miserably when your app grows and more than one machine starts using the database.
Nevertheless, you can get what you want in VB6 by creating a COM server. It's been too long and I forgot the exact names of the project options, something resembling "single use".
I have implemented a similar solution implemented as a REST web service - accessible from any technology that supports http.
It is a simple C# backend implementation using a singleton pattern, and it will scale nicely under IIS.
The whole thing sounds like a twisted idea, so why should I not add another twisted one. :P
Host an old-skool ASP page.
You can use Application.Lock with a counter then, just like in the sample.
Added benefit: use it from any platform/language. (e.g. other HTML pages with XMLHttpRequest. :)
If you save the value at say every 100th request to a file, you do not even have to worry about IIS resets.
Just set the starting value to last saved value + 100 in Application_OnStart. :P
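The save-every-100th-request trick is independent of ASP, so here it is sketched in Python. The class name, file name and block size are made up for the example. One small addition beyond the description above: the skipped-ahead starting value is written to disk immediately on startup, so two restarts in a row cannot hand out the same range twice. The lock only protects callers inside one process; cross-process callers would still go through whichever single server process hosts the counter.

```python
import os
import tempfile
import threading


class BlockPersistedCounter:
    """Counter that persists only every `block` increments (hypothetical helper)."""

    def __init__(self, path, block=100):
        self._path = path
        self._block = block
        self._lock = threading.Lock()
        if os.path.exists(path):
            with open(path) as f:
                # Up to block - 1 values may have been issued since the last
                # save, so skip a whole block to stay clear of them.
                self._value = int(f.read()) + block
        else:
            self._value = 0
        self._save()  # record the new starting point right away

    def _save(self):
        # Write to a temp file and swap it in, so a crash never leaves a half-written count.
        tmp = self._path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self._value))
        os.replace(tmp, self._path)

    def next(self):
        with self._lock:
            self._value += 1
            if self._value % self._block == 0:
                self._save()
            return self._value


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "orders.count")
    counter = BlockPersistedCounter(path)
    first_run = [counter.next() for _ in range(150)]  # 1..150, last save at 100

    counter = BlockPersistedCounter(path)             # simulated restart
    after_restart = counter.next()                    # 201: values 151..200 are skipped

print(first_run[-1], after_restart)
```

The price is gaps in the sequence after a restart, so the values are unique and increasing but not strictly sequential.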

Getting rid of hard coded values when dealing with lookup tables and related business logic

Example case:
We're building a renting service, using SQL Server. Information about items that can be rented is stored in a table. Each item has a state that can be either "Available", "Rented" or "Broken". The different states reside in a lookup table.
ItemState table:
id  name
1   'Available'
2   'Rented'
3   'Broken'
Adding to this, we have a business rule which states that whenever an item is returned, its state is changed from "Rented" to "Available".
This could be done with an update statement like "update Items set state=1 where id=#itemid". In application code we might have an enum that maps to the ItemState ids. However, these contain hard-coded values that could lead to maintenance issues later on. Say a developer were to change the set of states but forgot to fix the related business logic layer...
What good methods or alternate designs are there for dealing with this type of design issues?
Links to related articles are also appreciated in addition to direct answers.
In my experience this is a case where you actually have to hardcode, preferably by using an enum whose integer values match the ids of your lookup tables. I can't see anything wrong with saying that "1" is always "Available" and so forth.
Most systems that I've seen hard code the lookup table values and live with it. That's because, in practice, code tables rarely change as much as you think they might. And if they ever do change, you generally need to re-compile any programs that rely on that DDL anyway.
That said, if you want to make the code maintainable (a laudable goal), the best approach would be to externalize the values into a properties file. Then you can edit this file later without having to re-code your entire app.
The limiting factor here is that your app depends for its own internal state on the value you get from the lookup table, so that implies a certain amount of coupling.
For lookups where the app doesn't rely on that code, (for instance, if your code table stores a list of two-letter state codes for use in an address drop-down), then you can lazily load the codes into an object and access them only when needed. But that won't work for what you're doing.
When you have your lookup tables as well as enums defined in the code, then you always have an issue with keeping them in sync. There is not much that can be done here. Both live effectively in two different worlds and are generally unaware of each other.
You may wish to drop the lookup tables and let only your business logic operate on these values. In that case you lose the option of relying on referential integrity to back you up on data integrity.
The other option is to build your application in such a way that you never need these values in your code. That means moving part of your business logic to the database layer, that is, putting it in stored procedures and triggers. This also has the benefit of being agnostic to the client. Anyone can invoke the SPs and be assured that the data will be kept in a consistent state, consistent with your business logic rules as well.
You'll need to have some predefined value that never changes, be it an integer, a string or something else.
In your case, the numerical value of the state is the state's surrogate PRIMARY KEY which should never change in a well-designed database.
If you're concerned about the consistency, use a CHAR code: A, R or B.
However, you should stick to it as well as to a numerical code so that A always means Available etc.
Your database structure should be documented as well as the code is.
The answer depends entirely on the language you're using: solutions for this are not the same in Java, PHP, Smalltalk or even Assembler...
But let me tell you something: while it's true hard coded values are not a great thing, there are times in which you do need them. And this one is pretty much one of them: you need to declare in your code your current knowledge of the business logic, which includes these hard coded states.
So, in this particular case, I would hard code those values.
Don't overdesign it. Before trying to come up with a solution to this problem, you need to figure out if it's even a problem. Can you think of any legit hypothetical scenario where you would change the values in the itemState table? Not just "What if someone changes this table?" but "Someone wants to change this table in X way for Y reason, what effect would that have?". You need to stay realistic.
New state? you add a row, but it doesn't affect the existing ones.
Removing a state? You have to remove the references to it in code anyway.
Changing the id of a state? There is no legit reason to do that.
Changing the name of a state? There is no legit reason to do that.
So there really should be no reason to worry about this. But if you must have this cleanly maintainable in the case of irrational people who randomly decide to change Available to 2 because it just fits their Feng Shui better, make sure all tables are generated via a script which reads these values from a configuration file, and then make sure all code reads constants from that same configuration file. Then you have one definition location and any time you want to change the value you modify that configuration file instead of the DB/code.
I think this is a common problem and a valid concern, that's why I googled and found this article in the first place.
What about creating a public static class to hold all the lookup values, but instead of hard-coding them, we initialize these values when the application is loaded and use names to refer to them?
In my application, we tried this and it worked. You can also do some checking, e.g. the number of different possible values of a lookup in code should be the same as in the db; if it's not, log/email/etc. But I don't want to manually code this for the status of 40+ biz entities.
Moreover, this can be part of the bigger problem of OR mapping. We're exposed to too many details of the persistence layer, and thus we have to take care of it. With technologies like Entity Framework, we don't need to worry about the "sync" part because it's automated, am I right?
Thanks!
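The startup check mentioned above could look roughly like this. It uses Python's built-in sqlite3 as a stand-in for SQL Server, and the ItemState enum and check_lookup_in_sync helper are hypothetical names; the idea is just to compare the (id, name) pairs in the lookup table with the members of the enum and report anything that differs.

```python
import sqlite3
from enum import IntEnum


class ItemState(IntEnum):
    AVAILABLE = 1
    RENTED = 2
    BROKEN = 3


def check_lookup_in_sync(conn, enum_cls, table):
    """Return the (id, NAME) pairs that exist on only one side; empty means in sync.

    `table` must come from trusted code, never from user input,
    because it is interpolated into the SQL text.
    """
    rows = conn.execute(f"SELECT id, name FROM {table}").fetchall()
    in_db = {(row_id, name.upper()) for row_id, name in rows}
    in_code = {(member.value, member.name) for member in enum_cls}
    return sorted(in_db ^ in_code)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemState (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO ItemState VALUES (?, ?)",
    [(1, "Available"), (2, "Rented"), (3, "Broken")],
)
in_sync = check_lookup_in_sync(conn, ItemState, "ItemState")

# Someone adds a state in the database without touching the code.
conn.execute("INSERT INTO ItemState VALUES (4, 'Reserved')")
drifted = check_lookup_in_sync(conn, ItemState, "ItemState")

print(in_sync)   # []
print(drifted)   # [(4, 'RESERVED')]
```

Because the helper takes the enum and table name as parameters, the same function could be run over every lookup at startup instead of being hand-written per entity.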
I've used a similar method to what you're describing - a table in the database with values and descriptions (useful for reporting, etc.) and an enum in code. I've handled the synchronization with a comment in code saying something like "these values are taken from table X in database ABC" so that the programmer knows the database needs to be updated. To prevent changes from the database side without the corresponding changes in code I set permissions on the table so that only certain people (who hopefully remember they need to change the code as well) have access.
The values have to be hard-coded, which effectively means that they can't be changed in the database, which means that storing them in the database is redundant.
Therefore, hard-code them and don't have a lookup table in the database. Instead, store the item's state directly in the items table.
You can structure your database so that your application doesn't actually have to care about the codes themselves, but rather the business rules behind them.
I have done both of the following:
1. Do one or more of your codes have a certain characteristic, such as IsAvailable, that the application cares about? If so, add it as a flag column to the code table, where those that match are set to true (or your DB's equivalent), and those that don't are set to false.
2. Do you need to use a specific, single code under a certain condition? You can create a singleton table, named something like EnvironmentSettings, with a column such as ItemStateIdOnReturn that's a foreign key to the ItemState table.
If I wanted to avoid declaring an enum in the application, I would use #2 to address the example in the question.
Whether you take this approach depends on your application's priorities. This type of structure comes at the cost of additional development and lookup overhead. Plus, if every individual code comes with its own business rules, then it's not practical to create one new column per required code.
But, it may be worthwhile if you don't want to worry about synchronizing your application with the contents of a code table.
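Both techniques can be sketched against the example from the question. sqlite3 stands in for SQL Server here, and the return_item and is_available helpers are invented for the illustration; the table and column names follow the ones used above. Note that the application code never mentions the ids 1, 2 or 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ItemState (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    IsAvailable INTEGER NOT NULL DEFAULT 0      -- technique 1: flag column
);
INSERT INTO ItemState VALUES (1, 'Available', 1), (2, 'Rented', 0), (3, 'Broken', 0);

CREATE TABLE EnvironmentSettings (              -- technique 2: singleton table
    ItemStateIdOnReturn INTEGER NOT NULL REFERENCES ItemState(id)
);
INSERT INTO EnvironmentSettings VALUES (1);

CREATE TABLE Items (
    id    INTEGER PRIMARY KEY,
    state INTEGER NOT NULL REFERENCES ItemState(id)
);
INSERT INTO Items VALUES (10, 2);               -- item 10 starts out rented
""")


def return_item(conn, item_id):
    """Apply the 'item returned' rule without knowing which id means Available."""
    conn.execute(
        "UPDATE Items "
        "SET state = (SELECT ItemStateIdOnReturn FROM EnvironmentSettings) "
        "WHERE id = ?",
        (item_id,),
    )


def is_available(conn, item_id):
    """Ask the flag column rather than comparing against a hard-coded id."""
    row = conn.execute(
        "SELECT s.IsAvailable FROM Items i JOIN ItemState s ON s.id = i.state "
        "WHERE i.id = ?",
        (item_id,),
    ).fetchone()
    return bool(row[0])


before = is_available(conn, 10)
return_item(conn, 10)
after = is_available(conn, 10)
print(before, after)  # False True
```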

Should we use prefixes in our database table naming conventions?

We are deciding the naming convention for tables, columns, procedures, etc. at our development team at work. The singular-plural table naming has already been decided, we are using singular. We are discussing whether to use a prefix for each table name or not. I would like to read suggestions about using a prefix or not, and why.
Does it provide any security at all (at least one more obstacle for a possible intruder)? I think it's generally more comfortable to name them with a prefix, in case we are using a table's name in the code, so to not confuse them with variables, attributes, etc. But I would like to read opinions from more experienced developers.
I find Hungarian-style prefixes on DB objects to indicate their types rather annoying.
I've worked in places where every table name had to start with "tbl". In every case, the naming convention ended up eventually causing much pain when someone needed to make an otherwise minor change.
For example, if your convention is that tables start with "tbl" and views start with "v", then what's the right thing to do when you decide to replace a table with something else on the backend and provide a view for compatibility, or even as the preferred interface? We ended up having views that started with "tbl".
I prefer prefixing tables and other database objects with a short name of the application or solution.
This helps in two potential situations which spring to mind:
You are less likely to get naming conflicts if you opt to use any third-party framework components which require tables in your application database (e.g. asp net membership provider).
If you are developing solutions for customers, they may be limited to a single database (especially if they are paying for external hosting), requiring them to store the database objects for multiple applications in a single database.
I don't see how any naming convention can improve security...
If an intruder has access to the database (with harmful permissions), they will certainly have permission to list table names and run selects to see what the tables are used for.
But I think that truly confusing table names might indirectly worsen security.
It would make further development hard, thus reducing the chance security issues will be fixed, or it could even hide potential issues:
If a table named (for instance) 'sro235onsg43oij5' is full of randomly named columns with random strings and numbers, a new developer might just think it's random test data (unless he touches the code that interacts with it), but if it were named 'userpasswords' or similar, any developer who looks at the table would perhaps be shocked that the passwords are stored in plaintext.
Why not name the tables according to the guidelines you have in place for coding? Consider the table name a "class" and the columns a "property" or "field". This assists when using an ORM that can automatically infer table/column naming from class/member naming.
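For instance, Castle ActiveRecord, declared as below, assumes the table and column names are the same as the members they are on.
using System;
using Castle.ActiveRecord;

// Table and column names default to the class and property names.
[ActiveRecord]
public class Person
{
    [PrimaryKey]
    public Int32 Id { get; set; }

    [Property]
    public String Name { get; set; }
}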
If you use SQL Server, a good start would be to look at the sample databases provided for some guidance.
In the past, I've been opposed to using prefixes in table names and column names. However, when faced with the task of redesigning a system, having prefixes is invaluable for doing search and replace. For example, grepping for "tbl_product" will probably give you much more relevant results than grepping for "product".
If you're worried about mixing up your table names, employ a Hungarian-notation-style system in your code. Perhaps "s" for string + "tn" for table name:
stnUsers = 'users';
stnPosts = 'posts';
Of course, the prefix is up to you, depending on how verbose you like your code... strtblUsers, strtblnmeUsers, thisisthenameofatableyouguysUsers...
Adding a prefix to table names does have some benefits, especially if you don't hardcode that prefix into the system and allow it to change per installation. For one, you run less risk of conflicts with other components, as Ian said, and secondly, should you wish, you could have two or more instances of your program running off the same database.
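A per-installation prefix can be kept out of the rest of the code with a small helper along these lines. The TableNames class and the shopa_/shopb_ prefixes are hypothetical; the prefix would normally come from each installation's configuration.

```python
import re


class TableNames:
    """Builds table names from a per-installation prefix (hypothetical helper)."""

    def __init__(self, prefix=""):
        # The prefix ends up inside SQL text, so only allow identifier characters.
        if prefix and not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", prefix):
            raise ValueError(f"invalid table prefix: {prefix!r}")
        self._prefix = prefix

    def __call__(self, base_name):
        return f"{self._prefix}{base_name}"


# Two installations of the same program sharing one database.
shop_a = TableNames("shopa_")
shop_b = TableNames("shopb_")

query_a = f"SELECT id, name FROM {shop_a('product')}"
query_b = f"SELECT id, name FROM {shop_b('product')}"
print(query_a)  # SELECT id, name FROM shopa_product
print(query_b)  # SELECT id, name FROM shopb_product
```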