UPDATE: Redefined what I am trying to do.
I have a Contact model. The contact belongs to an account, as does every other model in my application. I need all searches, whether global or model-specific, to query only the containing account. I was told I could do this with custom index names, and I would like the index name to be 'index-#{account_id}'. How would I achieve this in my ActiveRecord models?
class Contact < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  belongs_to :account

  mapping do
    indexes :first_name
    indexes :last_name
  end
end

class Account < ActiveRecord::Base
  has_many :contacts
end
You may want to check this comment at Tire's issues, which basically walks through some possible scenarios of the “tenant-based” index naming with Tire. I believe it's what you're after.
In Elasticsearch itself you have the option of a separate index for every account, a filtered and routed index alias for every account, index templates, and so on; the toolkit in this area is vast.
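For illustration, here is a minimal sketch of the per-account index idea using Tire's low-level API rather than the model integration; the "index-#{account_id}" naming convention and the contact variable are assumptions taken from the question:

# A sketch only: write a contact into its account's own index and search it.
# The "index-#{account_id}" naming follows the question; error handling omitted.
account_index = "index-#{contact.account_id}"

Tire.index(account_index) do
  store type: 'contact',
        id:   contact.id,
        first_name: contact.first_name,
        last_name:  contact.last_name
end

# A search scoped to that single account's index:
results = Tire.search(account_index) do
  query { string 'first_name:John' }
end.results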
Do you mean having each account (user?) physically separated, each in its own index? This is generally referred to as 'multi-tenancy': http://en.wikipedia.org/wiki/Multitenancy
Assuming this is indeed what you set out to do:
Much has been said in the past about the 'need' for partitioning data per account/user (I assume you want this for security reasons; I'm not familiar with other reasons why you would want it, although I'm no expert in multi-tenant apps), as opposed to simply having, say, an account_id field on Contact and making sure all your queries filter on account_id. IMO, a carefully designed query component where every query used in the system inherits from a 'super-query' that is required to set account_id would suffice in a lot of cases.
Even if you don't know upfront which apps will want to query these indices in the future, you could still enforce the above by, say, putting a thin REST service around ES and requiring all programs to interact with ES through this service. You could then have this service handle this type of security by enforcing an account_id or, probably better, by inferring the account_id from the currently logged-in user making the request.
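As a rough sketch of that 'super-query' idea in ActiveRecord terms (the module and column names here are just assumptions):

# Every model that belongs to an account gets a scope that must be used as
# the entry point for all queries, so nothing is fetched without an account filter.
module AccountScoped
  def for_account(account)
    where(account_id: account.id)
  end
end

class Contact < ActiveRecord::Base
  extend AccountScoped
  belongs_to :account
end

# Application code always starts from the account scope:
Contact.for_account(current_account).where(last_name: 'Smith')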
If you still want to pursue multi-tenancy, have a look at http://elasticsearch-users.115913.n3.nabble.com/Multi-tenacy-td471400.html (I found this with a quick search; perhaps there's better material around). 'Kimchy' (the creator of ES) comments in that thread as well.
Regardless, the best way to do multi-tenancy in ES is probably to have one index per account/user. Within that index you could have multiple 'types' (an ES construct), where Contact could be such a type.
http://www.elasticsearch.org/guide/reference/mapping/
http://www.elasticsearch.org/guide/reference/api/search/indices-types.html
Enforcing this in your models, as you are suggesting, is probably not the correct way, IMO. Generally, you should keep your domain models free of any knowledge about the storage backend (including the index in which the data is stored).
To me, a better solution would be to have, as suggested earlier, a query component that contains the logic for choosing the correct index based on the account/user. Going with the REST-service approach above, the dynamic index name you suggested could then be derived from the logged-in user making the request.
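A small sketch of such a query component, where the index name is derived from the current user rather than being known to the models (class and attribute names are assumptions):

# The only place in the code base that knows about the per-account index naming.
class AccountSearch
  def initialize(user)
    @index = "index-#{user.account_id}"
  end

  def contacts(term)
    Tire.search(@index) do
      query { string term }
    end.results
  end
end

AccountSearch.new(current_user).contacts('last_name:Smith')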
I realize that this probably wasn't a straight answer to your question, but I hope it was useful nonetheless.
Related
I am building a RESTful API where users may create resources on my server using post requests, and later reference them via get requests, etc. One thing I've had trouble deciding on is what IDs the clients should have. I know that there are many ways to do what I'm trying to accomplish, but I'd like to go with a design which follows industry conventions and best design practices.
Should my API decide on the ID for each newly created resource (it would most likely be the primary key for the resource assigned by the database)? Or should I allow users to assign their own reference numbers to their resources?
If I do assign a reference number to each new resource, how should it be returned to the client? The API has some endpoints which allow bulk item creation, so would I need to list all of the newly created resources in every response?
I'm conflicted because allowing the user to specify their own IDs is obviously a can of worms: I'd need to verify each ID hasn't been taken, and it makes database queries a lot weirder, as I'd be joining on reference number and user ID rather than on a foreign key. On the other hand, if I assign IDs to each resource, it requires clients to build some kind of response parser and forces them to follow my imposed conventions.
Why not do both? Let the user create their own reference, and you create your own UID. If the users have to log in, then you can use their reference and user ID as a unique key. I would also return the created UID; if it's not needed, the client can ignore it.
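If it helps, a minimal Rails-style sketch of backing that up with a composite unique key (table and column names are assumptions):

# Migration sketch: the client-chosen reference only has to be unique per user.
class AddClientReferenceToResources < ActiveRecord::Migration
  def change
    add_column :resources, :reference, :string
    add_index  :resources, [:user_id, :reference], unique: true
  end
end

# Matching model-level check (the DB index is still the real guarantee):
class Resource < ActiveRecord::Base
  validates :reference, uniqueness: { scope: :user_id }, allow_nil: true
end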
It wasn't practical (for me) to develop both of the above methods into my application, so I took a leap of faith and allowed the user to choose their own IDs. I quickly found that this complicated development so much that it would have added weeks to my development time, and resulted in much more complex and slow DB queries. So, early on in the project I went back and made it so that I just assign IDs for all created resources.
Life is simple now.
Other popular APIs that I looked at, such as the Instagram API, also assign IDs to certain created resources, which is especially important if you have millions of users who can interact with each other's resources.
I'm playing around with a spare-time project, mainly to try out new stuff. :)
This involves designing a REST API for a system that is multi-tenant. Let's say you have an "organization" that is the "top" entity; it might have an API key assigned that is used to authenticate each request, so each request has an organization associated with it.
Now, when a user of the API would like to get a list of, let's say, projects, only those that belong to that organization should be returned. The actual implementation, the queries to the database, is pretty straightforward, but the approach is interesting, I think.
You could implement the filtering each time you query the database, but a better approach would be a general pre-query applied to all "organization"-related queries, i.e. all queries for entities that belong to an organization. It's all about preventing the wrong entities from being returned. You could isolate the database, but if that is not possible, how would you approach it?
Right now I use NancyFX and RavenDB, so input for that stack would be appreciated, but general ideas, best practices, and dos and don'ts are very welcome.
In this case you could isolate your collections by prefixing them with the organization_id. It may mean duplicating many collections, though.
A use case with MongoDB: http://support.mongohq.com/use-cases/multi-tenant.html
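For what it's worth, a rough sketch of that collection-prefix idea with the Ruby mongo driver (your stack is RavenDB, so this only shows the pattern; database and collection names are assumptions):

require 'mongo'

# One collection per organization, found by a naming convention.
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'app')

def projects_for(client, organization_id)
  client["org_#{organization_id}_projects"]
end

projects_for(client, 42).find(status: 'active').each { |doc| puts doc }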
So I need something more along the lines of advice than actual code here, but this is my situation:
I have a model that requires two associations: an author and a user. However, I want it to be possible for authors to create the record, with the possibility that it will later be "claimed" by a user created after the fact.
The best solution that I've come up with is to use some sort of "dummy user" association for those cases, but it feels hacky.
Any better suggestions?
How about making the association not required? Is it enforced with validations? If so, relax the validations instead, so that no user association is persisted until the record is claimed.
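A minimal sketch of that in a Rails model (the model names are assumptions; in Rails 5+ belongs_to is required by default, so you would add optional: true there):

class Article < ActiveRecord::Base
  belongs_to :author
  belongs_to :user                     # stays nil until claimed

  validates :author, presence: true    # author is still mandatory
  # intentionally no presence validation on :user
end

# Later, when the matching user signs up and claims the record:
article.update_attributes(user: new_user)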
Currently I am developing an API and within that API I want the signed in users to be able to like/unlike or favorite/unfavorite two resources.
My "Like" model (it's a Ruby on Rails 3 application) is polymorphic and belongs to two different resources:
/api/v1/resource-a/:id/likes
and
/api/v1/resource-a/:resource_a_id/resource-b/:id/likes
The thing is: I am in doubt about which way to go to make my resources as RESTful as possible. I have already tried the following two ways of implementing the like/unlike structure in my URLs:
Case A: (like/unlike being a member of the "resource")
PUT /api/v1/resource/:id/like maps to Api::V1::ResourceController#like
PUT /api/v1/resource/:id/unlike maps to Api::V1::ResourceController#unlike
and Case B: ("likes" is a resource of its own)
POST /api/v1/resource/:id/likes maps to Api::V1::LikesController#create
DELETE /api/v1/resource/:id/likes maps to Api::V1::LikesController#destroy
In both cases I already have a user session, so I don't have to mention the id of the corresponding "like"-record when deleting/"unliking".
I would like to know how you guys have implemented such cases!
Update April 15th, 2011: With "session" I mean HTTP Basic Authentication header being sent with each request and providing encrypted username:password combination.
I think the fact that you're maintaining application state on the server (a user session that contains the user ID) is one of the problems here. It's making this a lot more difficult than it needs to be, and it's breaking REST's statelessness constraint.
In Case A, you've given URIs to operations, which again is not RESTful. URIs identify resources and state transitions should be performed using a uniform interface that is common to all resources. I think Case B is a lot better in this respect.
So, with these two things in mind, I'd propose something like:
PUT /api/v1/resource/:id/likes/:userid
DELETE /api/v1/resource/:id/likes/:userid
We also have the added benefit that a user can only register one 'Like' (they can repeat that 'Like' as many times as they like, and since the PUT is idempotent it has the same result no matter how many times it's performed). DELETE is also idempotent, so if an 'Unlike' operation is repeated many times for some reason then the system remains in a consistent state. Of course you can implement POST in this way, but if we use PUT and DELETE we can see that the rules associated with these verbs seem to fit our use-case really well.
I can also imagine another useful request:
GET /api/v1/resource/:id/likes/:userid
That would return details of a 'Like', such as the date it was made or the ordinal (i.e. 'This was the 50th like!').
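In Rails routing terms, the proposal above might look roughly like this (controller and path names are assumptions):

# config/routes.rb (sketch)
namespace :api do
  namespace :v1 do
    scope 'resource/:id' do
      put    'likes/:user_id', to: 'likes#create'   # idempotent "like"
      delete 'likes/:user_id', to: 'likes#destroy'  # idempotent "unlike"
      get    'likes/:user_id', to: 'likes#show'     # details of a single like
    end
  end
end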
Case B is better; here is a good example from the GitHub API:
Star a repo
PUT /user/starred/:owner/:repo
Unstar a repo
DELETE /user/starred/:owner/:repo
You are in effect defining a "like" resource, a fact that a user resource likes some other resource in your system. So in REST, you'll need to pick a resource name scheme that uniquely identifies this fact. I'd suggest (using songs as the example):
/like/user/{user-id}/song/{song-id}
Then PUT establishes a liking, and DELETE removes it. GET of course finds out if someone likes a particular song. And you could define GET /like/user/{user-id} to see a list of the songs a particular user likes, and GET /like/song/{song-id} to see a list of the users who like a particular song.
If you assume the user name is established by the existing session, as @joelittlejohn points out, and is not part of the like resource's name, then you're violating REST's statelessness constraint and you lose some very important advantages. For instance, a user can only get their own likes, not their friends' likes. It also breaks HTTP caching, because one user's likes are indistinguishable from another's.
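A sketch of how the PUT handler for such a resource could stay idempotent (model and parameter names are assumptions):

# Creating the same like twice leaves exactly one row, so repeated PUTs are safe.
class LikesController < ApplicationController
  def create
    Like.where(user_id: params[:user_id], song_id: params[:song_id]).first_or_create!
    head :no_content
  end

  def destroy
    Like.where(user_id: params[:user_id], song_id: params[:song_id]).delete_all
    head :no_content
  end
end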
I am currently having a problem that I guess a lot of people have run into before, and I would like to know how you handled it.
So, imagine you have 10,000 users on your app (each one has their own user/password login to administer their stuff).
Imagine further that you have a growing, normalized SQL table structure in the backend, with tables like Users, Orders, OrderPositions, Invoices, etc.
So, to show/edit/delete rows of a table which isn't the user table itself, you'll probably have links like these to let your users interact with the application:
~/Orders/EditOrder?id=12
~/Orders/ShowOrderPosition?orderId=12&posId=443
Ok, and now the problem:
How do I prevent, in a non-complex way, user A from having access (show/edit/delete) to the data of user B?
Example:
User B calls:
~/Orders/ShowOrderPosition?orderId=12&posId=443
which is an order of user A, so user B should have no access to it.
So, in my code I would need a user-identity check before or within every single SQL statement, like:
select *
from OrderPosition op, Order o, User u
where op.Id = :orderPositionId
  and op.Fk_OrderId = o.Id
  and o.Id = :orderId
  and o.Fk_User = u.Id
  and u.Id = :userId
Only this way can I make sure that the data belongs to the requesting user.
Reaching the user table will of course get far more complex the deeper the user-table connection is "buried" in the normalization (imagine tables like Payments or Invoices, connected to the Orders table...).
Question:
What is your approach to dealing with this, considering low complexity, DRY, and performance?
(Hope you understand what I mean ;))
This is a bit like a multi-tenant application. I have gone down this route and denormalized an ID onto all those tables that require this kind of check (a tenant ID, which in your case sounds like the user ID).
I then created an interface that contains only this field and applied it to all those classes in my model layer that require this kind of access check.
In my base data access (repository) class, which all the select/update/delete calls go through, I then check whether the class implements that interface, and if so, that the current access matches that ID.
Of course, this depends on how your code is structured, and how simple/complex making this global kind of change will be...
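Translated into Rails-flavored terms (the original is presumably .NET, so this is only a sketch of the same idea; names and the tenant_id column are assumptions):

# Models that carry the denormalized tenant/user id include a marker module.
module TenantOwned
  def owned_by?(tenant_id)
    self.tenant_id == tenant_id
  end
end

class OrderPosition < ActiveRecord::Base
  include TenantOwned
end

# The single data-access choke point refuses rows belonging to another tenant.
class Repository
  def self.find_for(klass, id, current_tenant_id)
    record = klass.find(id)
    if record.is_a?(TenantOwned) && !record.owned_by?(current_tenant_id)
      raise ActiveRecord::RecordNotFound
    end
    record
  end
end

Repository.find_for(OrderPosition, 443, current_user.id)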
Never expose ids.
And if you have to: encrypt them.
Performance
For ultimate performance you will have to denormalize to the point that reading the row and comparing it with some application-level variable tells you what rights the user has. (This is fairly fast, and if your DAO/BO layer is well organized, plugging it in will keep things relatively DRY and at relatively low complexity.) NOTE: complexity is also a function of your security model; once you start implementing inheritable, positive and negative, role-based access rules, it cannot really be simple any more.
DRY
Another route to take (which is very seldom taken these days) is to use your database roles to manage security. This might get complicated, but it offers unparalleled security, since it is enforced at the DB level and not the application level. Complexity at the application-code level should go down if you manage to encapsulate all of your access paths into VIEWs, which might require quite a bit of re-tailoring at the database level. However(!), it might be possible to implement the security model with very little change to the application code, by renaming existing tables and replacing them with secured views.
Don't use your internal ID column, encrypted or not; it'll come back to bite you one day.
Create a random, unique string (a GUID, whatever) which encodes the link between the user and the data he's requesting. So, instead of showing, for user 34567, a link such as:
Edit order (~/Orders/EditOrder?id=12)
create a record {"5dsfwe8frf823jrf", 34567, 12} in a temporary table and show:
Edit order (~/Orders/EditOrder?id=5dsfwe8frf823jrf)
When the user clicks the link, fetch 34567, 12 from your temporary table.
The string 5dsfwe8frf823jrf is impossible to guess, so there is no security risk.
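A small sketch of that lookup with Ruby's SecureRandom (the LinkToken table and its column names are assumptions):

require 'securerandom'

# When rendering the page: store the real ids behind a random key.
key = SecureRandom.hex(8)
LinkToken.create!(key: key, user_id: 34567, order_id: 12)
# ...and render the link as ~/Orders/EditOrder?id=<key> instead of ?id=12.

# When the link is clicked: resolve the key server-side.
link  = LinkToken.find_by_key!(params[:id])
order = Order.where(id: link.order_id, user_id: link.user_id).first!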