Is GUID unique across SQL servers? - sql

I have been wondering about the uniqueness of GUIDs across SQL servers.
I have one central database server and hundreds of client databases (all SQL Server). I have bi-directional merge replication set up to sync the data between the client and master servers. The sync process runs 2-3 times a day.
Each table to be synced uses a GUID as its primary key, and new records with new GUIDs are generated locally in each database.
When GUIDs are created at each client machine as well as at the master DB server, how is it ensured that the generated GUID is unique across all client and master DBs?
How does each database keep track of the GUIDs generated in the other client/server DBs, so that it does not repeat one of them?

GUIDs are unique (for your purposes)
There are endless debates on the internet - I like this one

I think GUIDs are not really necessarily unique. Their uniqueness comes from the fact that it's extremely unlikely to generate the same GUID randomly, but that's all.
But for your purpose, that should be ok - they should be unique on a distributed system with extremely high probability.
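To make that "extremely high probability" concrete, here is a small Python sketch using version-4 (random) GUIDs; SQL Server's NEWID() values are generated by a similar but not identical mechanism:

```python
import uuid

# Two machines generating GUIDs independently, with no coordination:
a = uuid.uuid4()  # a version-4 GUID carries 122 random bits
b = uuid.uuid4()
print(a != b)  # True, for all practical purposes always

# Rough birthday-bound estimate: with n GUIDs in play, the collision
# probability is about n^2 / 2^123.
n = 10**9  # a billion GUIDs across all your databases
p = n * n / 2**123
print(p)   # on the order of 1e-19
```

Even with a billion GUIDs spread over all clients and the master, the chance of a single collision is negligible compared to, say, hardware failure.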

You will have to do more research, but if I remember right, a GUID is based on the MAC address and a timestamp.
http://www.sqlteam.com/article/uniqueidentifier-vs-identity
I know some MCMs who have come across a unique key violation on a GUID.
How can this happen? Well, in the virtual world you have virtual network adapters. If you copy one virtual machine from one host to another, you can end up with the same adapter and hence the same MAC address.
Now if both images are running at the same time, it is possible to get non-unique GUIDs.
However, the condition is rare. You can always add another field to the key to make it unique.
There is a whole debate on whether or not to use a GUID as a clustered PK. Remember, every other index carries a copy of the PK in its leaf nodes. That is 16 bytes for every record, times the number of indexes.
I hope this helps.
John
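The 16-byte figure John mentions is easy to verify; a quick Python illustration:

```python
import uuid

guid = uuid.uuid4()
print(len(guid.bytes))  # 16 -- versus 4 bytes for int, 8 for bigint

# With a GUID clustered PK, every nonclustered index row carries
# these 16 bytes, so the overhead is 16 bytes x rows x indexes.
```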

You don't need to do anything special to ensure a GUID/Uniqueidentifier is globally unique. That basic guarantee is the motivating requirement for the GUID.

Related

Extending a set of existing tables into a dynamic client defined structure?

We have an old repair database that has a lot of relational tables, and it works as it should, but I need to update it to handle different clients (areas). Currently it handles a single client only.
So I need to extend the tables and the SQL statements so that, for example, user A can log in and see his own system only, and user B will have his own system too.
Is it correctly understood that you wouldn't create new tables for each client, but just add a ClientID to every record in every (base) table and then filter on that ClientID in all SQL statements to support multiple clients?
Is this also something that would work (and how is it done) on hosted solutions? I am worried about performance if that's an issue; let's say I had 500 clients (I won't, but from a theoretical viewpoint).
The normal situation is to add a client key to each table where appropriate. Many tables don't need them -- such as reference tables.
This is preferred for many reasons:
You have the data for all clients in one place, so you can readily answer questions such as "what is the average X for each client".
If you change the data structure, then it affects all clients at the same time.
Your backup and restore strategy is only implemented once.
Your optimization is only implemented once.
This is not always the best solution. You might have requirements that specify that data must be separated -- in which case, each client should be in a separate database. However, indexes on the additional keys are probably a minor consideration and you shouldn't worry about it.
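The client-key approach this answer describes can be sketched with SQLite standing in for SQL Server (the table and column names below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repairs (id INTEGER PRIMARY KEY, client_id INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO repairs (client_id, item) VALUES (?, ?)",
    [(1, "pump"), (1, "valve"), (2, "motor")],
)

# Every query filters on client_id, so user A (client 1) sees only his rows.
rows = conn.execute(
    "SELECT item FROM repairs WHERE client_id = ?", (1,)
).fetchall()
print(rows)  # [('pump',), ('valve',)]

# An index on the client key keeps these filters cheap, even with many clients.
conn.execute("CREATE INDEX ix_repairs_client ON repairs (client_id)")
```

With an index on the client key, 500 clients in one schema is well within what any SQL engine handles comfortably.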
This question has been asked before. The problem with adding the key to every table is that you say you have a working system, which means every query needs to be updated.
Probably the easiest is to create a new database for each client, so that the only thing you need to change is the connection string. This also means you can get automated query tools for example to work without worrying about cross-client data leakage.
And it also allows you to backup, transfer, delete a single client easily as well.
There are of course pros and cons to this approach, but it will simplify the development effort. Also remember that if you plan to spin it up in a cloud environment then spinning up databases like this is also very easy.

primary key in access not in order

I recently started working with Access, and there's something that so far has caused me no problems, but I'm concerned it could bring me some issues as the database continues expanding.
When I create tables, Microsoft Access recommends using its default primary key, which I usually do. The problem is that, for some reason, when the table gets populated the primary key "ID" keeps being inconsistent: it will jump from 4 to 2679 (just a random example), skipping lots of numbers. If I'm correct, this primary key gets set to auto-increment automatically, correct? So why is it skipping all the numbers in between?
The table gets populated with a simple SQL query using Visual Studio and C#.
SQL Server used to do that (in v6.0/6.5 and possibly later versions). It's quite conceivable that Access uses the same mechanism.
IDENTITY works by having the next number (or the last, who cares) stored on disk in the DB. To speed up access it is cached in memory, and only occasionally written back to disk (it is SQL Server, after all). Depending on how SQL Server was shut down, the disk update might be missed. When the server was restarted, it had some way of detecting that the disk version was stale and would bump it up by some number.
Oracle does the same with SEQUENCEs. This got complicated on multi-machine cluster installations, where there are multiple servers for the same database. To support this, the first time a server had to get a sequence number it got a batch of them (the CACHE part of a SEQUENCE's definition, default 20 IIRC) and updated the SEQUENCE as if it would use all of the numbers assigned. If it didn't use all of them, there would be gaps in the numbers used. (It also meant that with a SEQUENCE in a cluster, the numbers would not necessarily be used sequentially: machine A writes 21, B writes 41, A writes 22, etc.) I've never checked, but I assume a SQL Server in a fail-over cluster might have the same gaps.
Apply the same mechanism to Access, where there is no central server for the DB, just potentially lots of local ones on each client's machine, and you can see that there is the potential for gaps.
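The cached-allocation behaviour described above can be modelled in a few lines of Python (a simplified model, not the actual IDENTITY implementation):

```python
class CachedSequence:
    """Hands out IDs from an in-memory cache; only the high-water
    mark is persisted, the way cached IDENTITY/SEQUENCE values work."""
    CACHE = 20  # Oracle's default cache size, per the answer above

    def __init__(self, persisted_next=1):
        self.persisted_next = persisted_next  # what is "on disk"
        self.next_id = None
        self.ceiling = None

    def next(self):
        if self.next_id is None or self.next_id >= self.ceiling:
            # Grab a block and persist only the new high-water mark.
            self.next_id = self.persisted_next
            self.ceiling = self.next_id + self.CACHE
            self.persisted_next = self.ceiling
        value = self.next_id
        self.next_id += 1
        return value

seq = CachedSequence()
used = [seq.next() for _ in range(5)]  # IDs 1..5 handed out
# Crash/restart: the in-memory cache is lost; only the persisted
# high-water mark (21) survives, so IDs 6..20 become a permanent gap.
restarted = CachedSequence(persisted_next=seq.persisted_next)
restarted_first = restarted.next()
print(used, restarted_first)  # [1, 2, 3, 4, 5] 21
```

This is exactly the "4 then 2679" pattern: the numbers are still unique and increasing, just not contiguous.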

Using data from multiple redis databases in one command

At my current project I actively use redis for various purposes. There are 2 redis databases for current application:
The first one contains absolutely temporary data: how many users are online, who are online, various admin's counters. This db is cleared before the application starts by start-up script.
The second database is used for persistent data like user's ratings, user's friends, etc.
Everything seems to be correct and everybody is happy.
However, when I started implementing new functionality in my application, I discovered that I need to intersect a set of a user's friends with the set of online users. These sets are stored in different redis databases, and I haven't found any way to do this in redis, except by changing the application architecture and moving all keys into one namespace (database).
Is there actually any way to perform a command in redis using data from multiple databases? Or is my use of redis wrong, and do I have to fix the system architecture?
There is not. There is, however, a command that makes it easy to move keys to another DB:
http://redis.io/commands/move
If you move all keys to one DB, make sure you don't have any key clashes! You could suffix or prefix the keys from the temp DB to be absolutely sure. MOVE will do nothing if the key already exists in the target DB, so make sure you act on a '0' reply.
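A toy sketch of the prefix-then-move idea, with plain Python dicts and sets standing in for the two redis databases (key names are invented; against a real server you would use MOVE and then SINTER):

```python
# Two "databases" holding sets; the key spaces may clash after a merge.
db0 = {"online": {"alice", "bob", "carol"}}  # temporary data (DB 0)
db1 = {"friends:alice": {"bob", "dave"}}     # persistent data (DB 1)

# Prefix the temp keys before moving, so nothing in DB 1 is clobbered.
for key, value in list(db0.items()):
    target = "tmp:" + key
    if target not in db1:  # mirror MOVE's refuse-to-overwrite rule
        db1[target] = value
        del db0[key]

# With both sets in one keyspace, the intersection becomes possible --
# this is what SINTER would compute server-side in redis.
online_friends = db1["friends:alice"] & db1["tmp:online"]
print(sorted(online_friends))  # ['bob']
```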
Using multiple DBs is definitely not a good idea:
A Quote from Salvatore Sanfilippo (the creator of redis):
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not. With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
https://groups.google.com/forum/#!msg/redis-db/vS5wX8X4Cjg/8ounBXitG4sJ

Safe way to store primary key generator outside of database?

We have an application which stores its data in SQL Server. Each table has a bigint primary key. We used to generate these exclusively on demand, i.e. when you go to insert a new row, you first make a call to generate the next ID, and then you do the insert.
We added support to run in offline mode: if your connection is down (or SQL Server is down), it saves the data to a local file until you go back online, and then syncs everything you've done since then.
This required being able to generate IDs on the client side. Instead of asking SQL for the next 1 ID, it now asks for the next hundred or thousand or 10,000 IDs, and then stores the range locally, so it doesn't have to ask for more until those 10,000 run out. It would actually get them in smaller chunks, so when 5000 run out, it still has a buffer of 5000, and it can ask for 5000 more.
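The block-reservation scheme just described can be sketched like this (a simplified in-memory stand-in; the real version persists the reserved range, which is exactly the part that is failing):

```python
class IdRangeClient:
    """Client-side ID allocator: reserves blocks of keys from a
    central counter and tops up before the local pool runs dry."""

    def __init__(self, server_counter, block=5000):
        self.server = server_counter  # stand-in for the SQL Server call
        self.block = block
        self.pool = []

    def _reserve(self):
        # One round trip: ask the server for the next block of IDs.
        start = self.server["next"]
        self.server["next"] += self.block
        self.pool.extend(range(start, start + self.block))

    def next_id(self):
        if len(self.pool) <= self.block // 2:
            self._reserve()  # refill while half the buffer still remains
        return self.pool.pop(0)

counter = {"next": 1}  # the central, authoritative counter
client = IdRangeClient(counter, block=10)
ids = [client.next_id() for _ in range(12)]
print(ids, counter["next"])  # IDs 1..12 handed out; 21 reserved centrally
```

The central counter only ever moves forward, so the scheme is safe as long as the client never loses (or rolls back) its record of which block it was given.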
The problem is, as soon as this went live, we started getting reports of primary key violations. We stored the data in the Windows registry, in HKEY_CURRENT_USER (the only place in the registry a user is guaranteed to be able to write to). So after some research, we discovered that HKEY_CURRENT_USER is part of the roaming profile, so it's possible the IDs could get overwritten with an old version. Especially if the user logs into multiple computers on the network simultaneously.
So we re-wrote the part that generates IDs to read/write a file from the user's "Local Settings" directory. Surely that shouldn't get overwritten by an old version. But even now, I still see occasional primary key violations. The only thing we can do in that case is delete any keys in the file, kick the user out of the application, and don't let them back in until they get new ID ranges.
But if "Local Settings" isn't safe, what would be? Is there anywhere you can store a persistent value on a computer which is guaranteed not to be rolled back to an old version? Can anyone explain why "Local Settings" does not meet this criteria?
I've done some consideration of a GUID like solution, but that has problems on its own.
In a distributed environment such as yours, your best bet is using GUIDs.
Do you have to use the same key when you persist the data locally that you use when you sync with the database?
I would be sorely tempted to use a GUID when you persist the data locally and then generate the real key when you're actually writing the data to the database. Or persist the data locally starting with a value of 1 and then generate real keys when you actually write the data to the database.
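A sketch of that idea: persist locally under throwaway GUID keys, then swap in real server-assigned keys at sync time (all names here are invented for illustration):

```python
import itertools
import uuid

# Offline: rows are stored locally under temporary GUID keys.
local_rows = {uuid.uuid4(): {"name": "order A"},
              uuid.uuid4(): {"name": "order B"}}

# Online: a stand-in for SQL Server handing out real bigint keys.
next_real_id = itertools.count(1000).__next__

def sync(rows):
    # Swap each temporary GUID for a real key at write time.
    key_map = {}
    for temp_key, row in rows.items():
        real_key = next_real_id()
        key_map[temp_key] = real_key
        # ... INSERT row under real_key here ...
    return key_map

mapping = sync(local_rows)
print(sorted(mapping.values()))  # [1000, 1001]
```

Note that any local foreign keys referencing the temporary GUIDs must be rewritten through key_map during the same sync.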
Setup an IDENTITY (http://www.simple-talk.com/sql/t-sql-programming/identity-columns/) on the bigint primary key so that SQL Server generates the values automatically.
When your application is offline, you keep the pending changes local. When it comes back online, you send your updates (including new records) and SQL Server would INSERT them and automatically assign a primary key since you have the IDENTITY setup.
If you need to know what key value was generated after an insert, you can use the @@IDENTITY function (http://msdn.microsoft.com/en-us/library/aa933167%28v=sql.80%29.aspx).

SQL Server Database securing against clever admins?

I want to secure events stored in one table, which has relations to others.
Events are inserted through windows service, that is connecting to hardware and reading from the hardware.
In events table is PK, date and time, and 3 different values.
The problem is that any admin can log in and insert/update/delete data in this table, e.g. using SQL Server Management Studio. I created triggers to prevent updates and deletes, so an admin who doesn't know about the triggers will fail to change the data, but an admin who does can easily disable them and do whatever he wants.
So after long thinking I have one idea: add a new column (field) to the table and store something like a checksum in it, calculated from the other values. This checksum would be generated in the insert/update statements.
If someone inserts/updates something manually, I will know, because when I check the data against the checksum there will be mismatches.
My question is: if you have a similar problem, how do you solve it?
What algorithm should I use for the checksum? How do I secure against DELETE statements (I know about gaps in the PK numbers, but that is not enough)?
I'm using SQL Server 2005.
As admins have permissions to do everything on your SQL Server, I recommend a tamper-evident auditing solution. In this scenario, everything that happens on a database or SQL Server instance is captured and saved in a tamper-evident repository. If someone who has the privileges (like an admin) modifies or deletes audited data from the repository, it will be reported.
ApexSQL Comply is such a solution, and it has a built-in integrity check option.
There are several anti-tampering measures that provide different integrity checks and detect tampering even when it's done by a trusted party. To ensure data integrity, the solution uses hash values. A hash value is a numeric value created using a specific algorithm that uniquely identifies the row.
Every table in the central repository database has a RowVersion and a RowHash column. The RowVersion contains the row timestamp, i.e. the last time the row was modified. The RowHash column contains the unique row identifier for the row, calculated from the values of the other table columns.
When the original record in the auditing repository is modified, ApexSQL Comply automatically updates the RowVersion value to reflect the time of the last change. To verify data integrity, ApexSQL Comply calculates the RowHash value for the row based on the existing row values. Since the values used in data integrity verification have been updated, the newly calculated RowHash value will differ from the RowHash value stored in the central repository database. This will be reported as suspected tampering.
To hide the tampering, the admin would have to calculate a new value for RowHash and update it. This is not easy, as the formula used for the calculation is complex and not disclosed. But that's not all: the RowHash value is calculated using the RowHash value from the previous row. So, to cover up tampering, the admin would have to recalculate and modify the RowHash values in all following rows.
For some tables in the ApexSQL Comply central repository database, the RowHash values are calculated based on the rows in other tables, so to cover the tracks of tampering in one table, the admin would have to modify the records in several central repository database tables.
This solution is not tamper-proof, but it definitely makes covering one's tracks quite difficult.
Disclaimer: I work for ApexSQL as a Support Engineer
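The chained row-hash scheme described above can be illustrated with a generic hash chain (this is not ApexSQL's actual formula, which is not disclosed; it only shows why editing one row invalidates every row after it):

```python
import hashlib

def row_hash(prev_hash, row):
    # Each row's hash folds in the previous row's hash, so editing
    # any row invalidates every hash that comes after it.
    payload = prev_hash + "|" + repr(sorted(row.items()))
    return hashlib.sha256(payload.encode()).hexdigest()

rows = [{"event": "door open", "value": 1},
        {"event": "door close", "value": 2}]

# Build the chain as rows are inserted.
hashes = []
prev = "genesis"
for row in rows:
    prev = row_hash(prev, row)
    hashes.append(prev)

# An admin tampers with the first row; re-verifying the chain now
# fails at that row and at every row after it.
rows[0]["value"] = 99
check = "genesis"
ok = True
for row, stored in zip(rows, hashes):
    check = row_hash(check, row)
    ok = ok and (check == stored)
print(ok)  # False
```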
Security through obscurity is a bad idea. If there's a formula to calculate a checksum, someone can do it manually.
If you can't trust your DB admins, you have bigger problems.
Anything you do at the server level the admin can undo. That's the very definition of its role and there's nothing you can do to prevent it.
In SQL 2008 you can set up auditing of the said SQL Server with Extended Events; see http://msdn.microsoft.com/en-us/library/cc280386.aspx. This is a Common Criteria compliant solution that is tamper evident. That means the admin can stop the audit and do his mischievous actions, but the stopping of the audit is itself recorded.
In SQL 2005 the auditing solution recommended is using the profiler infrastructure. This can be made tamper evident when correctly deployed. You would prevent data changes with triggers and constraints and audit DDL changes. If the admin changes the triggers, this is visible in the audit. If the admin stops the audit, this is also visible in the audit.
Do you plan this as a one-time action against a rogue admin, or as a feature to be added to your product? Using digital signatures to sign all your application data can be very costly in app cycles. You also have to design a secure scheme to show that records were not deleted, including the last records (i.e. not just a gap in an identity column). E.g. you could compute CHECKSUM_AGG over BINARY_CHECKSUM(*), sign the result in the app, and store the signed value for each table after each update. Needless to say, this will slow down your application, as you basically serialize every operation. For individual row checksums/hashes you would have to compute the entire signature in your app, and that would possibly require values your app does not yet have (e.g. the identity column value to be assigned to your insert).
And how far do you want to go? A simple hash can be broken if the admin gets hold of your app and monitors what you hash, and in what order (this is trivial to achieve); he can then recompute the same hash. An HMAC requires you to store a secret in the application, which is basically impossible to protect against a determined hacker. These concerns may seem overkill, but if this is an application you sell, for instance, then all it takes is one hacker breaking your hash sequence or HMAC secret, and Google will make sure everyone else finds out about it, eventually.
My point is that you're fighting an uphill, losing battle if you're trying to deter the admin via technology. The admin is a person you trust, and if that trust is broken in your case, the problem is trust, not technology.
Ultimately, even if admins do not have delete rights, they can grant themselves access, remove whatever denies deletes, delete the row, restore the permission, and then revoke their own access to make permission changes.
If you are auditing that, then when they give themselves access, you fire them.
As far as an effective tamper-resistant checksum goes, it's possible to use public/private key signing. This means that if the signature matches the message, no one except whoever the record says created/modified it could have done so. Anyone can change and sign the record with their own key, but not as someone else.
I'll just point to Protect sensitive information from the DBA in SQL Server 2008
The idea of a checksum computed by the application is a good one. I would suggest that you research Message Authentication Codes, or MACs, for a more secure method.
Briefly, some MAC algorithms (HMAC) use a hash function, and include a secret key as part of the hash input. Thus, even if the admin knows the hash function that is used, he can't reproduce the hash, because he doesn't know all of the input.
Also, in your case, a sequential number should be part of the hash input, to prevent deletion of entire entries.
Ideally, you should use a strong cryptographic hash function from the SHA-2 family. MD5 has known vulnerabilities, and similar problems are suspected in SHA-1.
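A minimal HMAC sketch along those lines, with the secret kept in the application rather than the database (the column layout and key below are invented for illustration):

```python
import hashlib
import hmac

# Assumed: the key lives in the application, never in SQL Server.
SECRET_KEY = b"application-secret-never-stored-in-db"

def mac_for_row(seq_no, event_time, values):
    # The sequence number is part of the MAC, so a deleted row
    # leaves a detectable gap in the numbering.
    msg = f"{seq_no}|{event_time}|{values}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()

stored = mac_for_row(42, "2013-05-01T10:00:00", (1, 2, 3))

# An admin can read the row and its MAC, but without SECRET_KEY
# cannot produce a matching MAC for altered values.
forged = mac_for_row(42, "2013-05-01T10:00:00", (9, 2, 3))
print(hmac.compare_digest(stored, forged))  # False
```

As the other answers point out, this only raises the bar: anyone who extracts SECRET_KEY from the application can forge valid MACs.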
It might be more effective to try to lock down permissions on the table. With the checksum, it seems like a malicious user might be able to spoof it, or insert data that appears to be valid.
http://www.databasejournal.com/features/mssql/article.php/2246271/Managing-Users-Permissions-on-SQL-Server.htm
If you are concerned about people modifying the data, you should also be concerned about them modifying the checksum.
Can you not simply password protect certain permissions on that database?