How do I store 3rd-party API data after user interaction?

The project that I'm currently on is consuming a large volume of 3rd-party information exposed via APIs. These datasets are constantly changing and in the order of millions of entries for each.
Users are to denote their favorites and recall that data when they need it. An example may be that the user wants to "bookmark" an inventory level to their "analyze later" list.
My current thinking is that during actions like searching users are presented with "live" data from the 3rd parties. If they flag something they're interested in I copy that data to a database I control. Subsequent views of that info are served from my database, not the 3rd party, since the 3rd party entry may change (or cease to exist entirely).
Is this good API practice? What object keys are sent to the client-facing application on search? The 3rd-party keys? Or do I preprocess the search results to determine which items I already have locally, returning local keys in those cases? Or do I completely abstract the 3rd-party sources and generate a unique local key for every returned item, which is then used if someone saves it (that seems really heavy, though)? Or do I defer that processing and only check whether something already exists locally after someone bookmarks it?
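One way to picture the copy-on-bookmark approach: keep serving third-party keys during search, and only when the user flags an item do you snapshot the record into storage you control, keyed by (source, external_id); later views are served from that copy. A minimal sketch in Python, where fetch_from_provider and the SQLite schema are illustrative assumptions rather than anything from the question:

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE bookmarks (
        source TEXT, external_id TEXT, snapshot TEXT,
        PRIMARY KEY (source, external_id))""")

    def fetch_from_provider(source, external_id):
        # Placeholder for the live 3rd-party API call.
        return {"id": external_id, "inventory_level": 42}

    def bookmark(source, external_id):
        # Copy the live record into local storage the moment the user flags it.
        record = fetch_from_provider(source, external_id)
        conn.execute("INSERT OR REPLACE INTO bookmarks VALUES (?, ?, ?)",
                     (source, external_id, json.dumps(record)))
        conn.commit()

    def view(source, external_id):
        # Serve bookmarked items from the local copy; otherwise go live.
        row = conn.execute("SELECT snapshot FROM bookmarks WHERE source=? AND external_id=?",
                           (source, external_id)).fetchone()
        return json.loads(row[0]) if row else fetch_from_provider(source, external_id)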

Related

Storing online users in Elixir

I am working on a chatroom (all-to-all) application in Elixir using an OTP GenServer. In the first phase, users register with their names and I receive their messages from a JS client. Now I'm not sure of the best approach for storing these names on my Elixir server and sending the client regular updates with the list of online users: should this live in memory somehow, or in database storage? Please suggest the best approach.
I agree with bitwalker that ETS is a good fit.
Here's a short summary of what I did in production. It wasn't a chat server, but a server push with a couple of thousand users connecting via long polling. Pushed data was divided into some 50 categories, and users were able to choose which ones they wanted. At peak times the server pushed new messages every 2 seconds and processed > 2000 reqs/sec.
Essentially, I kept a gen_server for each user, where I held pending messages and the user's configuration (basically a list of selected channels). This was beneficial with long polling, since the user's data is decoupled from the user's requests, so the data remains while requests are transient. However, I think this approach is also good for permanent connections, such as websockets, since there might still be occasional disconnections, and keeping the user's data in a longer-lived process gives you a chance of resuming after a reconnect.
Obviously, when a request arrives, you need to find the user-specific process, and for this, ETS is a good fit, since you don't have a single-process bottleneck. Instead of manually working with ETS, I'd recommend using gproc in conjunction with via tuples. Basically, when starting a user's gen_server, you can provide name: {:via, :gproc, {:n, :l, key}} where key is some custom key (an arbitrary term) you build from your internal user id (:n and :l indicate a unique name on the local node). You can then use that same via tuple when issuing calls/casts, and gen_server will use gproc to find the corresponding process.
Finally, you need some timeout/disconnect logic to clean up user processes. In my case, I simply terminated a user's process if there was no activity from the web layer (no end user came for data in some time). Gproc will automatically remove entries for terminated processes from its internal ETS table. It's probably best to supervise user processes with a temporary restart strategy.
I realize all of this is still a bit vague, but I hope it makes some sense. Keep in mind that this is not the ultimate pattern (there's no such thing of course), but I think it's a reasonable first attempt.
You may also want to take a look at the Phoenix web framework, which has an interesting pub-sub facility in the form of Topics. I didn't try this out myself yet, but it seems interesting, and it may even simplify some of the stuff I discussed above, or at least help with pushing notifications from the chatroom to all users.
Sounds like a good use case for ETS.
A simpler approach might be to use an Agent to store the online users' information, but it depends quite a lot on what you need from the storage mechanism you choose.

Should an API assign and return a reference number for newly created resources?

I am building a RESTful API where users may create resources on my server using POST requests, and later reference them via GET requests, etc. One thing I've had trouble deciding on is what IDs the created resources should have and who assigns them. I know that there are many ways to do what I'm trying to accomplish, but I'd like to go with a design that follows industry conventions and best design practices.
Should my API decide on the ID for each newly created resource (it would most likely be the primary key for the resource assigned by the database)? Or should I allow users to assign their own reference numbers to their resources?
If I do assign a reference number to each new resource, how should it be returned to the client? The API has some endpoints that allow for bulk item creation, so would I need to list out all of the newly created resources in every response?
I'm conflicted because allowing the user to specify their own IDs is obviously a can of worms: I'd need to verify each ID hasn't already been taken, and it makes database queries a lot weirder since I'd be joining on reference number and user ID rather than on a foreign key. On the other hand, if I assign IDs to each resource, it requires clients to build some type of response parser and forces them to follow my imposed conventions.
Why not do both? Let the user create their own reference and also create your own UID. If the users have to log in, then you can use their reference plus their user ID as a unique key. I would also return the UID that was created; if it isn't needed, the client can simply ignore it.
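A bulk-create response along those lines might echo each client-supplied reference back next to the server-assigned UID, so the caller can correlate results without any special parsing. A minimal sketch in Python (the field names and structure are illustrative assumptions, not from the question):

    import uuid

    def create_resources(items):
        # items: list of dicts, each optionally carrying a client-chosen "reference"
        created = []
        for item in items:
            created.append({
                "reference": item.get("reference"),  # echoed back for correlation
                "id": str(uuid.uuid4()),             # server-assigned UID
            })
        return {"created": created}

    print(create_resources([{"reference": "order-17"}, {"reference": "order-18"}]))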
It wasn't practical (for me) to build both of the above methods into my application, so I took a leap of faith and allowed the user to choose their own IDs. I quickly found that this complicated development so much that it would have added weeks to my development time, and it resulted in much more complex and slower DB queries. So, early on in the project I went back and made it so that I just assign IDs for all created resources.
Life is simple now.
Other popular APIs that I looked at, such as the Instagram API, also assign IDs to certain created resources, which is especially important if you have millions of users who can interact with each other's resources.

Yodlee APIs: ContentServiceInfos versus SiteInfos

There appear to be two lines of APIs for adding, authenticating, and aggregating sites. Which version of the documentation/SDK set your rep started you off with, or where in the SDK Guide you started implementing from, determines where you start.
Path #1 starts at
ContentServiceTraversal, which allows for the retrieval of all ContentServiceInfo entries (by container type, such as BANK)
ItemManagementService is used to add these items
Refresh is done through RefreshService (most of these APIs do not contain the word Site)
Path #2 starts at
SiteTraversalService, which allows for the retrieval of all SiteInfo entries (no apparent support for a container type filter)
SiteAccountManagementService is used to add these items
Refresh is done through RefreshService (all of these APIs contain the word Site).
From the best that I can tell, the aforementioned APIs have a lot of duplicated functionality. I have noticed certain APIs that exist on one branch and not the other, but the differences are usually minor (e.g. the things you are able to filter by).
I started off with ContentServiceInfo because the documentation and samples that our rep initially gave us started there. Additionally, this API provided greater granularity from the start (e.g. simply being able to filter by container type, since we were pretty much only interested in Bank and Processor sites, which I do not believe you support).
My questions are:
Do the two branches of API do the exact same thing?
Do they mostly behave the same way?
Do they back-end to the exact same
System
Data store
Scraper?
Is one line of API supposed to be deprecated sooner in the future than another?
Does one line of API have more future in terms of actually adding new or augmenting existing functionality?
Site-level addition was introduced in the Yodlee APIs to overcome the fact that even though a user had bank, credit card, loan, and rewards accounts at the same end site, the user had to provide credentials for each of these containers. Site-level addition APIs try to add all these containers with only one set of credentials. That's the only difference between container-based addition and site-based addition.
To answer your questions:
Do the two branches of API do the exact same thing?
Do they mostly behave the same way?
If you mean the aggregation functionality, yes. Except for the fact that site-level addition adds/refreshes all the containers (bank, credit card, loan, rewards) while container-level addition can add/refresh only one container per API call, all the other behavior remains the same.
Do they back-end to the exact same
System
Data store
Scraper?
If you are referring to the Yodlee data-gathering components, yes.
Is one line of API supposed to be deprecated sooner in the future than another?
No. These two sets of APIs cater to different needs. If you are a company that relies solely on credit card data, using site-level addition would be overkill, as the aggregation would take longer; it makes more sense to use container-based addition. There is also the factor of backward compatibility, which rules out deprecating the APIs.

REST best practices: should a store also return metadata?

I'm building my first REST API (at least trying) for a personal project.
In this project there are resources called players, which can be in a team. According to the REST API Design Rulebook, a resource should be designed as either a document or a store, and one should keep these roles as segregated as possible.
Yet I would like to append some metadata to the team resource, e.g. the date the team was founded. Is it okay then for GET /teams/atlanta to return this metadata (making it a document) alongside the list of players in the team (making it a store)?
Is this a good idea? If so why? If not why not and how to solve this better?
I know there are no hard rules for developing a REST API, but there are good practices and I would like to adhere to those. Please also note that this is really my first REST API, so pardon my ignorance if there is any.
I would recommend having GET /teams/atlanta return just the information about the team, such as the founding date that you mention, and then having GET /teams/atlanta/players return the list of players for that team. These distinctions become more important when you are presenting an API that uses HTTP methods other than GET.
For example, if you wanted to add a player to a team, it would be a lot easier to just POST a player object to /teams/atlanta/players than to PUT the whole team object to /teams/atlanta every time you wanted to add one individual player.
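To make the split concrete, here is a minimal sketch of the two resources using Flask (the framework choice and the in-memory data are illustrative assumptions, not part of the original answer):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Illustrative in-memory data; a real API would back this with a database.
    teams = {"atlanta": {"name": "Atlanta", "founded": "2001-01-01"}}
    players = {"atlanta": [{"name": "Alice"}, {"name": "Bob"}]}

    @app.route("/teams/<team_id>", methods=["GET"])
    def get_team(team_id):
        # Team metadata only: the "document" view of the team.
        return jsonify(teams[team_id])

    @app.route("/teams/<team_id>/players", methods=["GET"])
    def list_players(team_id):
        # The "store" of players belonging to the team.
        return jsonify(players[team_id])

    @app.route("/teams/<team_id>/players", methods=["POST"])
    def add_player(team_id):
        # Adding one player only touches the nested collection,
        # not the whole team representation.
        players[team_id].append(request.get_json())
        return jsonify(players[team_id][-1]), 201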
If your API only allows retrieval of data, and if it is for a specific client application, there is an argument for combining all the team data into one object to save the client from having to make additional requests, but bear in mind that it is less flexible.
Your application may want to display a list of teams by calling GET /teams, but you probably wouldn't want all of the player information included in each object in that list, as it is quite a lot of data. Yet if GET /teams/atlanta returns player information, it would be inconsistent not to include it in the list version too.
I would personally favour splitting up the resources as I've suggested, and live with the fact that the client may need to make an extra request or two.

eCommerce Third Party API Data Best Practice

What would be best practice for the following situation? I have an e-commerce store that pulls down inventory levels from a distributor. Should the site call the third-party API every time a user loads a product detail page, so it always shows the most up-to-date data? Or should the site call the third-party APIs, store that data in its own system for a certain amount of time, and update it periodically?
To me it seems obvious that it should be updated every time the product detail page is loaded, but what about high-traffic e-commerce stores? Are completely different solutions used in that case?
In this case I would definitely cache the results from the distributor's site for some period of time, rather than hitting them every time you get a request. However, I would not simply use a blanket 5-minute or 30-minute timeout for all cache entries. Instead, I would use some heuristics. If possible, for instance if your application is written in a language like Python, you could attach a simple script to every product which implements the timeout.
This way, if it is an item that is requested infrequently, or one that has a large amount in stock, you could cache for a longer time.
    # Illustrative per-product rule: refresh popular or low-stock items immediately.
    if product.popularityrating > 8 or product.lastqtyinstock < 20:
        cache.expire(productnum)
        distributor.checkstock(productnum)
This gives you flexibility that you can call on if you need it. Initially, you can set all the rules to something like:
    # Illustrative default rule: expire cached stock data after three minutes.
    cache.expireover("3m", productnum)
    distributor.checkstock(productnum)
In actual fact, the script would probably not include the checkstock function call, because that would live in the main app, but it is included here for context. If Python seems too heavyweight to include just for this small amount of flexibility, then have a look at Tcl, which was specifically designed for this type of job. Both can be embedded easily in C, C++, C#, and Java applications.
Actually, there is another solution. Your distributor keeps the product catalog on their servers and gives you access to it via an Open Catalog Interface (OCI). When a user wants to place an order, they get redirected in place to the distributor's catalog, choose items, and then transfer the selection back to your shop.
This is widely used in the SRM (Supplier Relationship Management) space.
It depends on many factors: the traffic to your site, how often the inventory levels change, the business impact of displaying outdated data, how often the suppliers allow you to call their API, their API's SLA in terms of availability and performance, and so on.
Once you have these answers, there are of course many possibilities. For example, for a low-traffic site where getting the inventory right is important, you may want to call the 3rd-party API on every request, but revert to some alternative behavior (such as using cached data) if the API does not respond within a certain timeout.
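A sketch of that timeout-plus-fallback pattern, assuming the Python requests library and a placeholder inventory endpoint (the URL, cache, and timeout value are all illustrative):

    import requests

    _cache = {}  # product_id -> last known inventory payload

    def get_inventory(product_id, timeout_seconds=0.5):
        url = f"https://distributor.example.com/inventory/{product_id}"  # placeholder URL
        try:
            response = requests.get(url, timeout=timeout_seconds)
            response.raise_for_status()
            data = response.json()
            _cache[product_id] = data  # remember the last good answer
            return data
        except requests.RequestException:
            # API slow or unavailable: fall back to the most recent cached value, if any.
            return _cache.get(product_id)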
Sometimes, well-designed APIs will include hints as to the validity period of the data. For example, some REST-over-HTTP APIs support the various HTTP cache-control headers, which can be used to specify a validity period, or to only retrieve data if it has changed since the last request.
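One common form of this is a conditional GET: the server sends an ETag, and the client revalidates with If-None-Match, only transferring a new body when the data has changed. A minimal sketch with the Python requests library (the endpoint is a placeholder and this assumes the API actually sends ETag headers):

    import requests

    url = "https://distributor.example.com/inventory/12345"  # placeholder URL

    first = requests.get(url)
    etag = first.headers.get("ETag")

    # Revalidate later: the server replies 304 Not Modified if nothing has changed.
    headers = {"If-None-Match": etag} if etag else {}
    second = requests.get(url, headers=headers)

    if second.status_code == 304:
        data = first.json()   # the cached copy is still valid
    else:
        data = second.json()  # fresh copy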