I recently started working with Access, and there's something that so far has caused me no problems, but I'm concerned that it could bring me issues as the database continues expanding.
When I create tables, Microsoft Access recommends using its default primary key, which I usually do. The problem is that for some reason, as the table gets populated, the primary key "ID" keeps being inconsistent: it will jump from 4 to 2679 (just a random example), skipping lots of numbers. If I'm correct, this primary key gets set as auto-increment automatically, correct? So why is it skipping all the numbers in between?
The table gets populated with a simple SQL query from a C# application written in Visual Studio. See below a screenshot of my Access table.
SQL Server used to do that (in v6.0/6.5 and possibly later versions). It's quite conceivable that Access uses the same mechanism.
IDENTITY works by having the next number (or the last, who cares) stored on disk in the DB. To speed up access, it is cached in memory and only occasionally written back to disk (it is SQL Server, after all). Depending on how SQL Server was shut down, the disk update might be missed. When the server was restarted, it had some way of detecting that the on-disk version was stale and would bump it up by some number.
Oracle does the same with SEQUENCEs. This got complicated on multi-machine cluster installations where there are multiple servers for the same database. To support this, the first time a server had to get a sequence number it got a lot of them (the CACHE part of a SEQUENCE's definition, default 20 IIRC) and updated the SEQUENCE assuming that it would use all of the numbers assigned. If it didn't use all of the assigned numbers, there would be gaps in the numbers used. (It also meant that with a SEQUENCE in a cluster, the numbers would not necessarily be used sequentially: machine A writes 21, B writes 41, A writes 22, etc.) I've never checked, but I assume that a SQL Server in a fail-over cluster might have the same gaps.
Apply the same mechanism to Access, where there is no central server for the DB, just potentially lots of local engines on each client's machine, and you can see that there is the potential for gaps.
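As a rough illustration of that caching trade-off, SQL Server's own SEQUENCE objects expose the same cache-size knob. This is only a sketch (the sequence name and numbers are made up), not something Access exposes directly:

    -- Hypothetical sequence; CACHE 20 tells the engine to hand out numbers from
    -- an in-memory block of 20 and persist the high-water mark once per block.
    CREATE SEQUENCE dbo.OrderNumbers
        AS int
        START WITH 1
        INCREMENT BY 1
        CACHE 20;

    -- Each call takes the next value from the cached block.
    SELECT NEXT VALUE FOR dbo.OrderNumbers;

    -- If the instance shuts down uncleanly before the block is used up, the
    -- remainder of the cached block is skipped after restart, which is where
    -- the gaps come from.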
I need to take a snapshot of a table at a given time and determine the difference between the snapshot and the current data. What is the most effective way to do that? Can it be done in pure SQL (MS SQL), or should my app server do it in Delphi code?
I'm using an app server that keeps track of these changes and transmits them over a Telnet protocol to any number of clients, on the same machine or not.
Because of the text protocol, I have to send only the difference between the tables, because it is impractical to send all the data (~10k records) every time something changes.
The apps involved are: Swordfish (an Automated Trading System / ATS), not written by me; the app server (Chef); and the client (Diner), both written by me. The ATS uses MS SQL as a layer for its API, so Chef sends and receives data to and from the MS SQL server, essentially controlling the ATS. The client communicates what it wants done to Chef, and then Chef talks to Swordfish through the DBMS, and to the other Diners through Telnet.
Code is the most efficient way to do this, according to all the info that I could find on the web.
It may be possible to know with pure SQL what rows were added, but I could find nothing (in SQL) to detect changes to already existing rows or row deletions, both of which I need to know about to keep my app server (which is aware of and synced with the SQL table) and my app clients synchronized.
Keeping an in-memory table of 10-15k records isn't that serious. A different error in my code (to do with TFDQuery) made me think that my "offline" or "in-memory" snapshot of the tables needed A LOT of memory: every SQL add command created its own instance of TFDQuery, requiring 30 MB per record that leaked when destroying the TFDQuery. Now I create the TFDQuery instance once and reuse it for every record added, and my total memory usage stays around 50 MB, which I have no problem with.
So, every time Service Broker detects a change in the dataset of the SQL table, I save the old dataset to an in-memory table and do three compares between the saved (old) dataset and the current one (the newest version of the SQL table): 1. scan for additions; 2. scan for changes; 3. scan for deletions. Done :-)
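For what it's worth, the same three comparisons can also be expressed in pure SQL with a FULL OUTER JOIN between the snapshot and the live table. This is only a sketch with made-up table and column names (Snapshot, Live, ID, Price), not the actual schema:

    -- Classify each row as added, deleted, or changed by joining the saved
    -- snapshot against the current table on the primary key.
    SELECT
        COALESCE(s.ID, l.ID) AS ID,
        CASE
            WHEN s.ID IS NULL THEN 'added'    -- present in the live table only
            WHEN l.ID IS NULL THEN 'deleted'  -- present in the snapshot only
            ELSE 'changed'                    -- present in both, values differ
        END AS ChangeType
    FROM dbo.Snapshot AS s
    FULL OUTER JOIN dbo.Live AS l
        ON l.ID = s.ID
    WHERE s.ID IS NULL
       OR l.ID IS NULL
       OR s.Price <> l.Price;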
Then it's a simple task of encoding the text for the Telnet protocol, and all my clients, my SQL server, and my app server are happily synced!
I work as part of a two man DBA team running SQL Server 2008 R2 with me being somewhat of an accidental DBA. We recently had an issue where a small table we hardly ever use ended up getting truncated. Both of us swear we didn't do it, but it happened nonetheless.
To avoid the situation in the future, we're interested in implementing change tracking. It's not really necessary for us to preserve the data that was changed so we decided against using change data capture.
With that said, the things I'm reading about change tracking seem to be more about using it to synchronize data with an application rather than simply recording all the changes. Can I use change tracking to simply keep a list of all the changes made in the last 6 months or something? Once I enable it for each database in the SQL Server GUI, where is the info stored? Any other info you may have on implementing this correctly would be great.
Thanks!
From the documentation for change tracking:
The values of the primary key column is the only information from the tracked table that is recorded with the change information. These values identify the rows that have been changed. To obtain the latest data for those rows, an application can use the primary key column values to join the source table with the tracked table.
So, if you're going to try to use this as a mechanism to somehow recover accidental deletions, I think you'll find it lacking. But don't take my word for it. Set up the following test (a T-SQL sketch follows the list):
In a test environment, set up a dummy table and enable change tracking on it.
Insert some data into the dummy table.
Delete the data.
Recover the data using change tracking.
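A minimal sketch of that test (the database, table, and column names here are made up). Note that the change tracking side only hands back the primary key and the operation type, not the deleted values:

    -- Enable change tracking on a test database and a dummy table.
    ALTER DATABASE TestCT
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    CREATE TABLE dbo.Dummy (
        ID   int         NOT NULL PRIMARY KEY,
        Name varchar(50) NOT NULL
    );

    ALTER TABLE dbo.Dummy ENABLE CHANGE_TRACKING;

    -- Insert some data, then delete part of it.
    INSERT INTO dbo.Dummy (ID, Name) VALUES (1, 'alpha'), (2, 'beta');
    DELETE FROM dbo.Dummy WHERE ID = 2;

    -- Only the primary key and the operation come back; the deleted Name
    -- value is gone, so the row cannot be reconstructed from here.
    SELECT CT.ID, CT.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Dummy, 0) AS CT;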
Even though you've already dismissed CDC as an option, it sounds more in line with what you're after. CDC does keep track of non-primary-key columns, so if someone makes a data modification accidentally, CDC will keep track of all of the values in the affected row(s). It has the added benefit of not allowing table truncation because of how it's implemented (it uses the replication log reader).
Additionally, you can configure the CDC cleanup job to automatically purge data after any amount of time you want (it sounds like 6 months is your retention period, which is completely doable).
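For example, enabling CDC and setting roughly a six-month retention might look like the sketch below (the database and table names are made up; the retention value is in minutes):

    USE SalesDb;  -- hypothetical database
    GO

    -- Enable CDC for the database, then for the table you want to track.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'SmallTable',  -- hypothetical table
        @role_name     = NULL;           -- no gating role

    -- Keep captured changes for about 180 days (retention is in minutes).
    EXEC sys.sp_cdc_change_job
        @job_type  = N'cleanup',
        @retention = 259200;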
I am looking for a (SQL/RDB) database setup that works something like this:
I will have 3+ databases in an active/active/active configuration
prior to doing any insert, the database will communicate with at least a majority of the others, such that they all either insert at the same time or roll back (transaction)
this way I can write to and read from any of the databases, and always get the same results (as long as the field wasn't updated very recently)
note: this is for a use case that will be very read-heavy and have few writes (and delay on the writes is an OK situation)
Does anything like this exist? I see all sorts of solutions with database HA configurations, but most of them suggest writing to a primary node or having a passive backup.
Alternatively, I could set up a custom application and have each application talk to exactly one database to achieve a similar result, but I was hoping something like this would already exist.
So my question is: does something like this exist? If not, are there any technical/architectural reasons why not?
P.S. - I will NOT be using a SAN where all databases can store/access the same data
Edit: more clarification on what I am looking for:
1. I have no database picked out yet, but I am more familiar with MySQL / SQL Server / Oracle, so I would have a minor inclination towards one of those
2. If a majority of the nodes are down (or a single node can't communicate with the collective), then I expect all writes from that node to fail, and accept that it may provide old/outdated information
Failure/recovery scenario expectations:
1. A node goes down: it will query and get updates from the other nodes when it comes back up
2. A node loses connection with the collective: it will provide potentially old data to read requests, and refuse any writes
3. A node is in disagreement with the data stored in the others: majority rules
4. Majority rule does not work: go with whichever node has the latest data (although this really shouldn't happen)
5. The entries are insert/update/read only, i.e. there will be no deletes (except manual ones, of course), so I shouldn't need to worry about an update after a delete; however, in that case I would choose to delete the record and ignore the update
6. Any other scenarios I missed?
Update: the closest thing I can see to what I am looking for seems to be using a quorum + 2 DBs; however, I would prefer to have 3 DBs instead, so that I can query any of them at any time (to further distribute the reads, and also to keep another copy of the data)
You need to define "very recently". In most environments with replication for inserts, all the databases will have the same data within a few seconds of an insert (and a few seconds seems pessimistic).
An alternative is a "read-from-one/write-to-all" approach. In this case, reads are spread across the system, while writes are sent to all nodes by the application (or a common layer that the application uses).
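On SQL Server specifically, a crude database-layer version of write-to-all can be sketched with linked servers and a distributed transaction, although in practice this logic usually lives in the application or a middleware layer. The server, database, and table names below are made up, and MSDTC has to be configured for this to work:

    -- NodeB and NodeC are hypothetical linked servers pointing at the other nodes.
    BEGIN DISTRIBUTED TRANSACTION;

    INSERT INTO dbo.Readings (ID, Value)              -- local node
    VALUES (42, 'example');

    INSERT INTO NodeB.AppDb.dbo.Readings (ID, Value)  -- remote node B
    VALUES (42, 'example');

    INSERT INTO NodeC.AppDb.dbo.Readings (ID, Value)  -- remote node C
    VALUES (42, 'example');

    -- All three inserts commit together or not at all.
    COMMIT TRANSACTION;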
Remember, though, that the issue with database replication is not how it works when it works. The issue is how it recovers when it fails and even how failures are identified. You need to decide what happens when nodes go down, how they recover lost transactions, how you decide that nodes are really synchronized. I would suggest that you peruse the documentation of the database that you are actually using and understand the replication mechanisms provided by that platform.
I have been wondering about the uniqueness of GUIDs across SQL Servers.
I have one central database server and hundreds of client databases (all SQL Server). I have merge replication (bi-directional) set up to sync the data between the clients and the master server. The sync process happens 2-3 times a day.
For each of the tables to be synced, I am using a GUID as the primary key; new records are added to each table locally, and the new GUIDs are generated locally.
When GUIDs are being created on each client machine as well as on the master DB server, how will it make sure it generates a GUID that is unique across all client and master DBs?
How will it keep track of the GUIDs generated in the other client/server DBs, so that it does not repeat a GUID?
GUIDs are unique (for your purposes)
There are endless debates on the internet - I like this one
I think GUIDs are not really necessarily unique. Their uniqueness comes from the fact that it's extremely unlikely to generate the same GUID randomly, but that's all.
But for your purpose, that should be OK - they should be unique in a distributed system with extremely high probability.
You will have to do more research, but I think GUID generation is based on the MAC address and a timestamp, if I remember right.
http://www.sqlteam.com/article/uniqueidentifier-vs-identity
I know some MCMs who have come across a unique key violation on a GUID.
How can this happen? Well, in the Virtual World, you have virtual adapters.
If you copy a virtual machine from one host to another, you can end up with the same adapter and the same MAC address.
Now, if both images are running at the same time, it is possible to get non-unique GUIDs.
However, the condition is rare. You can always add another field to the key to make it unique.
There is a whole debate on whether or not to use a GUID as a clustered PK. Remember, every other index stores a copy of the clustered PK in its leaf nodes; that is 16 bytes for every record, times the number of indexes.
I hope this helps.
John
You don't need to do anything special to ensure a GUID/Uniqueidentifier is globally unique. That basic guarantee is the motivating requirement for the GUID.
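In practice, each node simply generates its keys locally with NEWID() or NEWSEQUENTIALID(), and merge replication reconciles the rows on sync. A minimal sketch with a made-up table:

    -- Each client generates its own keys; no coordination with the master is needed.
    CREATE TABLE dbo.Orders (
        OrderID uniqueidentifier ROWGUIDCOL NOT NULL
            DEFAULT NEWSEQUENTIALID()  -- or NEWID(); NEWSEQUENTIALID() fragments indexes less
            PRIMARY KEY,
        Amount  decimal(18, 2) NOT NULL
    );

    -- Omitting OrderID lets the local server generate the GUID.
    INSERT INTO dbo.Orders (Amount) VALUES (19.99);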
We have an application which stores its data in SQL Server. Each table has a bigint primary key. We used to generate these exclusively on demand, i.e. when you go to insert a new row, you first make a call to generate the next ID, and then you do the insert.
We added support to run in offline mode: if your connection is down (or SQL Server is down), it saves the data to a local file until you go back online, and then syncs everything you've done since then.
This required being able to generate IDs on the client side. Instead of asking SQL Server for the next single ID, it now asks for the next hundred, thousand, or 10,000 IDs and stores the range locally, so it doesn't have to ask for more until those 10,000 run out. It actually gets them in smaller chunks, so when 5,000 run out it still has a buffer of 5,000 and can ask for 5,000 more.
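On the server side, handing out a block like that is typically a single atomic UPDATE against a key-range table. The sketch below uses a hypothetical dbo.KeyRanges table and is not the code the application actually uses:

    -- One row per table whose IDs are allocated in blocks, e.g.:
    -- CREATE TABLE dbo.KeyRanges (TableName sysname PRIMARY KEY, NextValue bigint NOT NULL);

    DECLARE @BlockSize bigint = 10000;

    -- Atomically reserve a block; the OUTPUT clause returns the start and end
    -- of the range this client now owns.
    UPDATE dbo.KeyRanges
    SET NextValue = NextValue + @BlockSize
    OUTPUT deleted.NextValue      AS RangeStart,
           inserted.NextValue - 1 AS RangeEnd
    WHERE TableName = N'Orders';  -- hypothetical table name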
The problem is, as soon as this went live, we started getting reports of primary key violations. We stored the data in the Windows registry, in HKEY_CURRENT_USER (the only place in the registry a user is guaranteed to be able to write to). So after some research, we discovered that HKEY_CURRENT_USER is part of the roaming profile, so it's possible the IDs could get overwritten with an old version. Especially if the user logs into multiple computers on the network simultaneously.
So we rewrote the part that generates IDs to read/write a file in the user's "Local Settings" directory. Surely that shouldn't get overwritten by an old version. But even now, I still see occasional primary key violations. The only thing we can do in that case is delete any keys in the file, kick the user out of the application, and not let them back in until they get new ID ranges.
But if "Local Settings" isn't safe, what would be? Is there anywhere you can store a persistent value on a computer which is guaranteed not to be rolled back to an old version? Can anyone explain why "Local Settings" does not meet this criteria?
I've given some consideration to a GUID-like solution, but that has problems of its own.
In a distributed environment like yours, your best bet is using GUIDs.
Do you have to use the same key when you persist the data locally that you use when you sync with the database?
I would be sorely tempted to use a GUID when you persist the data locally and then generate the real key when you're actually writing the data to the database. Or persist the data locally starting with a value of 1 and then generate real keys when you actually write the data to the database.
Set up an IDENTITY column (http://www.simple-talk.com/sql/t-sql-programming/identity-columns/) on the bigint primary key so that SQL Server generates the values automatically.
When your application is offline, you keep the pending changes locally. When it comes back online, you send your updates (including new records), and SQL Server will INSERT them and automatically assign primary keys, since you have IDENTITY set up.
If you need to know what key value was generated/used after an insert, you can use the @@IDENTITY function (http://msdn.microsoft.com/en-us/library/aa933167%28v=sql.80%29.aspx).
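A minimal sketch of that approach (the table and column names are made up). SCOPE_IDENTITY() is usually the safer way to read the generated value, since @@IDENTITY can also pick up values generated by triggers:

    CREATE TABLE dbo.Widgets (
        WidgetID bigint IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        Name     varchar(50) NOT NULL
    );

    -- The client never supplies WidgetID; SQL Server assigns it on insert.
    INSERT INTO dbo.Widgets (Name) VALUES ('example');

    -- Key value generated by the insert in this scope.
    SELECT SCOPE_IDENTITY() AS NewWidgetID;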