In IndexedDB, suppose the keys are arrays of integers such as [n,0] through [n,m]. For operations that get all the records whose array key has n as its first element, or that open a cursor over that same set of records, is there any advantage to using an index on an additional property that stores n, rather than using a key range?
Reasons to think an index may not be better include: the browser has to maintain the index on every change to the object store; an additional property has to be added to each record just to duplicate already-stored data (n); and little may be gained, since the keys in the index will always point to consecutive records in the object store rather than records dispersed throughout it.
If the number of distinct values of n is likely to be no more than 1,000, and m no more than 50, is using an index superior to a key range?
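For context, the key-range version of the read I have in mind looks roughly like this (a minimal sketch; the store name "records" and the variables are hypothetical):

declare const db: IDBDatabase; // an already-open database
const n = 5;                   // example first key element

// Array keys compare element by element, so this range covers every
// key of the form [n, m] for integer m >= 0.
const range = IDBKeyRange.bound([n, 0], [n, Infinity]);
const tx = db.transaction('records', 'readonly');
const req = tx.objectStore('records').getAll(range);
req.onsuccess = () => console.log(req.result);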
Thank you.
I guess the purpose of IndexedDB is to have an object store locally.
It is not SQL, where you need to update columns in every object.
Since you would be changing the object structure (say, by adding a property), it is true that all the objects in the store must be rewritten, as you said...
Well, another option for you is to update the db with another store, which contains something similar to a foreign key or unique key in SQL and stores the other stored objects' extensions; in it, every object item is also supposed to have the same structure.
I think this is the point where you start to use onupgradeneeded intensively.
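For illustration, a minimal sketch of that kind of upgrade (the database name, store name, and the extra property "n" are all assumptions):

const openReq = indexedDB.open('mydb', 2); // bump the version to trigger the upgrade
openReq.onupgradeneeded = () => {
  // The upgrade handler runs inside a versionchange transaction,
  // through which existing stores can be altered.
  const store = openReq.transaction!.objectStore('records');
  store.createIndex('byN', 'n', { unique: false });
};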
I recently got to know Redis, integrated it into my project and now I am facing the following use case.
My question in short:
Which data type can I use to get all entries sorted AND to be able to overwrite single entries?
My question in long:
I have a huge number of point cloud models that I want to store and work with via Redis.
My point cloud model consists of three things:
Unique id (stays the same)
Point Cloud as a string (changes over time)
Priority as an integer (changes over time)
Basically I would like to be able to do only two things with Redis. However, if I understand the documentation correctly, these are seen as benefits of two different data types, so I can't find a data type that exactly fits my use case. I hope, however, that I am wrong about this and that someone here can help me.
Use case:
Quickly get all models, already sorted
Overwrite/update a specific model
Sorted Sets:
Advantage
Get all entries in sorted order
My model property Priority can be used as the score here, which determines the order.
Disadvantage
No way to access a specific value via a key and overwrite it.
Hashes:
Advantage
Overwrite specific entry via Key > Field
Get all entries via Key
Disadvantage
No sorting
I would suggest just using two distinct data types:
a hash with all the properties of your model, with the exception of the priority;
a sorted set, which allows you to easily sort your collection and deal with the scores / priorities.
You could then link the two by storing each hash key (or a distinctive value which allows you to reconstruct the final hash key) as the related sorted set member.
For example:
> HSET point-cloud:123 foo bar baz suppiej
> ZADD point-clouds-by-priority 42 point-cloud:123
You will keep all the advantages you mentioned, with no disadvantages at all.
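For completeness, a sketch of the read path with a Node client such as ioredis (key names as in the example above; the client choice is just an assumption):

import Redis from 'ioredis';

const redis = new Redis();

async function getModelsByPriority() {
  // ZRANGE returns members already ordered by score, i.e. by priority.
  const keys = await redis.zrange('point-clouds-by-priority', 0, -1);
  // Fetch every hash in a single round trip via a pipeline.
  const pipeline = redis.pipeline();
  keys.forEach((key) => pipeline.hgetall(key));
  const replies = await pipeline.exec();
  return (replies ?? []).map(([, hash]) => hash);
}

Updating a single model is then just an HSET on its hash, plus a ZADD to move its priority.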
I need to understand how one can search attributes of a DynamoDB table that are part of an array.
So, in denormalising a table, say a person that has many email addresses, I would create an array in the person table to store the email addresses.
Now, the email address is not part of the sort key, so if I need to search on an email address to find the person record, I need to index the email attribute.
Can I create an index on the email address, which has a 1-to-many relationship with the person record and, as I understand it, is stored as an array in DynamoDB?
Would this secondary index be global or local, assuming I have billions of person records?
If I could create it as either LSI or GSI, please explain the pros/cons of each.
thank you very much!
It's worth getting the terminology right to start with. DynamoDB's supported data types are:
Scalar - String, number, binary, boolean
Document - List, Map
Sets - String Set, Number Set, Binary Set
I think you are suggesting you have an attribute that contains a list of emails. The attribute might look like this:
Emails: ["one#email.com", "two#email.com", "three#email.com"]
There are a couple of relevant points about key attributes described here. Firstly, keys must be top-level attributes (they can't be nested in JSON documents). Secondly, they must be of scalar types (i.e. String, Number, or Binary).
As your list of emails is not a scalar type, you cannot use it in a key or index.
Given this schema, you would have to perform a scan, setting the FilterExpression on your Emails attribute using the CONTAINS operator.
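A sketch of that scan with the AWS SDK for JavaScript v3 (the table name "People" is an assumption, and the email value is a placeholder):

import { DynamoDBClient, ScanCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

// contains() is allowed in a FilterExpression, so this finds every
// person whose Emails list includes the given address. Note it is
// still a full table scan, filtered after the read.
const result = await client.send(new ScanCommand({
  TableName: 'People',
  FilterExpression: 'contains(Emails, :email)',
  ExpressionAttributeValues: { ':email': { S: 'one@email.com' } },
}));
console.log(result.Items);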
Stu's answer has some great information in it, and he is right: you can't use an array itself as a key.
What you CAN sometimes do is concatenate several variables (or an array) into a single string with a known separator (maybe '_', for example), and then use that string as a Sort Key.
I used this concept to create a composite Sort Key that consisted of multiple ISO 8601 date objects (DynamoDB stores dates as ISO 8601 strings in String type attributes). I also used several attributes that were not dates but were integers with a fixed character length.
By using the BETWEEN comparison I am able to individually query each of the variables that are concatenated into the Sort Key, or construct a complex query that matches against all of them as a group.
In other words a data object could use a Sort Key like this:
email#gmail.com_email#msn.com_email#someotherplace.com
Then you could query that (assuming you knew what the partition key is) with something like this:
SELECT * FROM Users
WHERE User='Bob' AND Emails LIKE '%email#msn.com%'
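The SQL above is only shorthand: DynamoDB key conditions have no LIKE operator, so a real Query can only apply begins_with or BETWEEN to the Sort Key, matching from the start of the string. A sketch with the AWS SDK for JavaScript v3, all names being placeholders:

import { DynamoDBClient, QueryCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

const result = await client.send(new QueryCommand({
  TableName: 'Users',
  // 'User' may collide with a DynamoDB reserved word, so alias it as #u.
  KeyConditionExpression: '#u = :user AND Emails BETWEEN :lo AND :hi',
  ExpressionAttributeNames: { '#u': 'User' },
  ExpressionAttributeValues: {
    ':user': { S: 'Bob' },
    // Matches only when this address is the leading component of the key;
    // '~' sorts after the '_' separator, closing the range.
    ':lo': { S: 'email#msn.com' },
    ':hi': { S: 'email#msn.com~' },
  },
}));
console.log(result.Items);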
YOU MUST know the partition key in order to perform a Query no matter what you choose as your Sort Key and no matter how that Sort Key is constructed.
I think the real question you are asking is what should my sort keys and partition keys be? That will depend on exactly which queries you want to make and how frequently each type of query is used.
I have found that I have way more success with DynamoDB if I think about the queries I want to make first, and then go from there.
A word on Secondary Indexes (GSI / LSI)
The issue here is that you still need to 'know' the Partition Key for your secondary data structure. GSI / LSI help you avoid needing to create additional DynamoDB tables for the sole purpose of improving data access.
From Amazon: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
To me it sounds more like the issue is selecting the Keys.
LSI (Local Secondary Index)
If (for your Query case) you don't know the Partition Key to begin with (as it seems you don't) then a Local Secondary Index won't help — since it has the SAME Partition Key as the base table.
GSI (Global Secondary Index)
A Global Secondary Index could help in that you can have a DIFFERENT Partition Key and Sort Key (presumably a partition key that you could 'know' for this query).
So you could use the Email attribute (perhaps composite) as the Sort Key on your GSI and then something like a service name, or sign-up stage, as your Partition Key. This would let you 'know' what partition that user would be in based on their progress or the service they signed up from (for example).
Note that, unlike the base table's primary key, GSI / LSI keys do not have to be unique, so keep that in mind!
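For illustration, a table definition with such a GSI might look like this sketch (every table, attribute, and index name here is a placeholder):

import { CreateTableCommand, DynamoDBClient } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

await client.send(new CreateTableCommand({
  TableName: 'Users',
  BillingMode: 'PAY_PER_REQUEST',
  AttributeDefinitions: [
    { AttributeName: 'UserId', AttributeType: 'S' },
    { AttributeName: 'SignUpService', AttributeType: 'S' },
    { AttributeName: 'Email', AttributeType: 'S' },
  ],
  KeySchema: [{ AttributeName: 'UserId', KeyType: 'HASH' }],
  GlobalSecondaryIndexes: [{
    IndexName: 'EmailsByService',
    // A different partition key than the base table -- the point of a GSI.
    KeySchema: [
      { AttributeName: 'SignUpService', KeyType: 'HASH' },
      { AttributeName: 'Email', KeyType: 'RANGE' },
    ],
    Projection: { ProjectionType: 'ALL' },
  }],
}));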
I am trying to read up on best practices on DynamoDB. I saw that DynamoDB has two PK types:
Hash Key
Hash and Range Key
From what I read, it appears the latter is like the former but supports sorting and indexing of a finite set of columns.
So my question is why ever use only a hash key without a range key? Is it a viable choice only when the table is not searched?
It'd also be great to have some general guidelines on when to use what key type. I've read several guides (including Amazon's own documentation on DynamoDB) but none of them appear to directly address this question.
Thanks
The choice of which key to use comes down to your use cases and data requirements for a particular scenario. For example, if you are storing user session data, it might not make much sense to use the Range Key, since each record can be referenced by a GUID and accessed directly, with no grouping requirements. In general terms, once you know the Session Id, you just get the specific item by querying on the key. Another example could be storing user account or profile data: each user has his own, and you will most likely access it directly (by User Id or something else).
However, if you are storing Order Items then the Range Key makes much more sense since you probably want to retrieve the items grouped by their Order.
In terms of the Data Model, the Hash Key allows you to uniquely identify a record from your table, and the Range Key can be optionally used to group and sort several records that are usually retrieved together. Example: If you are defining an Aggregate to store Order Items, the Order Id could be your Hash Key, and the OrderItemId the Range Key. Whenever you would like to search the Order Items from a particular Order, you just query by the Hash Key (Order Id), and you will get all your order items.
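As a sketch of that Order Items case with the AWS SDK for JavaScript v3 (table and attribute names are made up):

import { DynamoDBClient, QueryCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

// One Query on the Hash Key (OrderId) returns every Order Item in the
// order, sorted by the Range Key (OrderItemId).
const result = await client.send(new QueryCommand({
  TableName: 'OrderItems',
  KeyConditionExpression: 'OrderId = :orderId',
  ExpressionAttributeValues: { ':orderId': { S: '1542' } },
}));
console.log(result.Items);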
You can find below a formal definition for the use of these two keys:
"Composite Hash Key with Range Key allows the developer to create a
primary key that is the composite of two attributes, a 'hash
attribute' and a 'range attribute.' When querying against a composite
key, the hash attribute needs to be uniquely matched but a range
operation can be specified for the range attribute: e.g. all orders
from Werner in the past 24 hours, or all games played by an individual
player in the past 24 hours." [VOGELS]
So the Range Key adds a grouping capability to the Data Model; however, the use of these two keys also has an implication on the Storage Model:
"Dynamo uses consistent hashing to partition its key space across its
replicas and to ensure uniform load distribution. A uniform key
distribution can help us achieve uniform load distribution assuming
the access distribution of keys is not highly skewed."
[DDB-SOSP2007]
Not only does the Hash Key allow you to uniquely identify a record, it is also the mechanism that ensures load distribution. The Range Key (when used) helps indicate which records will mostly be retrieved together, so the storage can also be optimized for that need.
Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application will perform and scale, and how much it will cost.
Footnotes:
The Data Model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data model: the way you group your entities, the attributes that you choose as primary keys, etc.
The Storage Model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.
Is it possible to use NHibernate to get a unique identifier of type UInt64?
The initial value of this property must be unique (across more than one DB).
The usage: I need to get this value and increment it by 1.
This action should happen in a single, closed transaction.
Usually we want keys to be unique within one DB, not across multiple databases; GUIDs are used for that, as far as I know. Getting the max(int64) out of both DBs, picking the largest, and adding 1 would work, but that sounds rather clumsy, I think. What do you actually need this for?
Hope this helps,
What about using database identities (or similar), setting the starting value for database N to N, and setting the increment value (how much you add to the current latest key every time you get a new one) equal to the number of databases?
That way each database keeps a distinct set of ids: with three databases, for example, the first generates 1, 4, 7, ..., the second 2, 5, 8, ..., and the third 3, 6, 9, .... The only problem would be if you need to add a new database to the set.
First, I'm aware of this question, and the suggestion (using GUID) doesn't apply in my situation.
I want simple UIDs so that my users can easily communicate this information over the phone:
Hello, I've got a problem with order
1584
as opposed to
hello, I've got a problem with order
4daz33-d4gerz384867-8234878-14
I want those to be unique (database-wide) because I have a few different kinds of 'objects'... there are order IDs, and delivery IDs, and billing IDs, and since there's no one-to-one relationship between those, I have no way to guess what kind of object an ID is referring to.
With database-wide unique IDs, I can immediately tell what object my customer is referring to. My user can just input an ID in a search tool, and I save him the extra click to further refine what he is looking for.
My current idea is to use identity columns with different seeds (1, 2, 3, etc.) and an increment value of 100.
This raises a few questions, though:
What if I eventually get more than 100 object types? Granted, I could use 1000 or 10000, but something that doesn't scale well "smells".
Is there a possibility the seed is "lost" (during a replication, a database problem, etc.)?
More generally, are there other issues I should be aware of?
Is it possible to use a non-integer (I currently use bigints) as an identity column, so that I can prefix the ID with something representing the object type (for example, a varchar column)?
Would it be a good idea to use a "master table" containing only an identity column, and maybe the object type, so that I can just insert a row into it whenever I need a new id? I feel like it might be a bit overkill, and I'm afraid it would complicate all my insertion requests. Plus, I won't be able to determine an object type without looking at the database.
Are there other clever ways to address my problem?
Why not use identities on all the tables, but any time you present one to the user, simply tack on a single char for the type? E.g. O1234 is an order, D123213 is a delivery, etc. That way you don't have to engineer some crazy scheme...
Handle it in the user interface: add a prefix letter (or letters) onto the ID number when reporting it to the users. So o472 would be an order, b531 would be a bill, and so on. People are quite comfortable mixing letters and digits when giving "numbers" over the phone, and are more accurate than with straight digits.
You could use an autoincrement column to generate the unique id. Then have a computed column which takes the value of this column and prepends it with a fixed identifier that reflects the entity type; for example, OR1542 and DL1542 would represent order #1542 and delivery #1542, respectively. Your prefix could be extended as much as you want, and the format could be arranged to help distinguish between items with the same autoincrement value, say OR011542 and DL021542, with the prefixes being OR01 and DL02.
I would implement this by defining a generic root table. For lack of a better name, call it Entity. The Entity table should have, at a minimum, a single identity column on it. You could also include other fields that are common across all your objects, or even metadata that tells you, for example, that this row is an order.
Each of your actual Order, Delivery, ... tables will have an FK reference back to the Entity table. This will give you a single unique ID column.
Using the seeds is, in my opinion, a bad idea, and one that could lead to problems.
Edit
You already mentioned some of the problems. I also see this being a pain to track, and a pain to ensure you set up all new entities correctly. Imagine a developer updating the system two years from now.
After I wrote this answer, I thought a bit more about why you're doing this, and I came to the same conclusion that Matt did.
MS's intentional programming project had a GUID-to-word system that gave pronounceable names from random IDs.
Why not a simple Base36 representation of a bigint? http://en.wikipedia.org/wiki/Base_36
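For example, in a quick TypeScript sketch (the number is arbitrary):

const id = 77218;                  // the bigint surrogate key
const short = id.toString(36);     // "1nky" -- compact enough to read over the phone
const back = parseInt(short, 36);  // 77218 again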
We faced a similar problem on a project. We solved it by first creating a simple table that has only one column: a BIGINT set as an auto-increment identity.
And we created a sproc that inserts a new row into that table, using default values and inside a transaction. It then stores the SCOPE_IDENTITY in a variable, rolls back the transaction, and returns the stored SCOPE_IDENTITY.
This gives us a unique ID inside the database without filling up a table.
If you want to know what kind of object the ID is referring to, I'd lose the transaction rollback and also store the type of object alongside the ID. That way, finding out what kind of object the ID is referring to is only one select (or inner join) away.
I use a high/low algorithm for this. I can't find a description of it online, though. Must blog about it.
In my database, I have an ID table with a counter field. This is the high part. In my application, I have a counter that goes from 0 to 99. This is the low part. The generated key is 100 * high + low.
To get a key, I do the following:

declare function getNewHighFromDatabase(): number; // e.g. an auto-increment counter

let high = -1; // high part; -1 means "not fetched yet"
let low = 0;   // low part, cycles through 0..99

function getNewKey(): number {
  if (high === -1) {
    high = getNewHighFromDatabase();
  }
  const newKey = 100 * high + low;
  low++;
  if (low === 100) { // low range exhausted; force a new high next time
    low = 0;
    high = -1;
  }
  return newKey;
}
The real code is more complicated, with locks etc., but that is the general gist.
There are a number of ways of getting the high value from the database, including auto-increment keys, generators, etc. The best way depends on the db you are using.
This algorithm gives simple keys while avoiding most of the db hits of looking up a new key every time. In testing, I found it had similar performance to guids and vastly better performance than retrieving an auto-increment key every time.
You could create a master UniqueObject table with your identity and a subtype field. Subtables (Orders, Users, etc.) would have an FK to UniqueObject. INSTEAD OF INSERT triggers should keep the pain to a minimum.
Maybe an itemType-year-week-orderNumberThisWeek variant?
o2009-22-93402
Such an identifier can consist of several database column values, simply formatted into the form of an identifier by the software.
I had a similar situation with a project.
My solution: By default, users only see the first 7 characters of the GUID.
It's sufficiently random that collisions are extremely unlikely (1 in 268 million), and it's efficient for speaking and typing.
Internally, of course, I'm using the entire GUID.
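A sketch of the idea (crypto.randomUUID is available in modern browsers and recent Node; the slice length is the only tunable):

const fullId = crypto.randomUUID();  // e.g. "3b241101-e2bb-4255-8caf-4136c566a962"
const shortId = fullId.slice(0, 7);  // 7 hex chars => 16^7 ≈ 268 million combinations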