Yodlee - Transaction Storage

I'm creating a Fin App which uses the Yodlee service to access users' transaction information. I want to display this information to the user each time they log in. So I'm wondering whether my app should store this transaction information in the database after the initial successful API query, or whether the app should query the API each time the user logs in. I can see that either way works, but I'm wondering which is the standard approach among Fin App developers, and what the advantages/disadvantages of each are.

As mentioned, there are two ways to display the transactions to the user.
1. Query the API each time and then display the transactions to the user.
Pros:
You don't need DB infrastructure to store the transactions.
Easy to implement.
Cons:
You depend on Yodlee every time you want to display transactions to the user.
Depending on how many days/transactions you display, the response can get very large for users with many transactions, which may cause issues.
If a network issue prevents your app from connecting to Yodlee, the user experience suffers.
2. Query and store the transactions and then display it from your local database.
Pros:
You can query and store the user's transactions and even run analytics on them.
Large transaction volumes shouldn't cause any issues, since you can write custom queries to display the transactions.
You could use Procedural Data Extracts to keep your data in sync with Yodlee, i.e., always have the latest data.
Cons:
You need to implement your own transaction reconciliation logic.
You have to set up DB infrastructure.
These are the high-level pros and cons of both approaches; which one fits depends on what solution you are building and what options you are going to give users for viewing transactions in the app. A minimal sketch of the query-and-store flow follows.
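For illustration, here is a small Python sketch of the second approach, assuming the requests library and a local SQLite table. The endpoint path, auth scheme, and response field names are illustrative guesses, not Yodlee's exact contract - check the Yodlee API docs before relying on any of them:

```python
# Pull transactions changed since the last sync and upsert them locally.
# Endpoint path, headers, and field names are assumptions, not Yodlee's
# exact API contract.
import sqlite3
import requests

YODLEE_BASE = "https://sandbox.api.yodlee.com/ysl"  # assumed base URL

def sync_transactions(user_token: str, db: sqlite3.Connection, from_date: str):
    resp = requests.get(
        f"{YODLEE_BASE}/transactions",
        headers={"Authorization": f"Bearer {user_token}"},
        params={"fromDate": from_date},  # fetch only what changed since last sync
    )
    resp.raise_for_status()
    for txn in resp.json().get("transaction", []):
        # Upsert keyed on the provider's transaction id -- the simplest
        # form of the reconciliation logic listed in the cons above.
        db.execute(
            """INSERT INTO transactions (id, account_id, amount, txn_date, description)
               VALUES (?, ?, ?, ?, ?)
               ON CONFLICT(id) DO UPDATE SET
                 amount = excluded.amount, description = excluded.description""",
            (txn["id"], txn["accountId"], txn["amount"]["amount"],
             txn["date"], txn.get("description", {}).get("original", "")),
        )
    db.commit()
```

You would run this on login (or on a schedule) with from_date set to the last successful sync, and serve all reads from the local table.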

Related

Service that does advanced queries on a data set, and automatically returns relevant updated results every time new data is added to the set?

I'm looking for a cloud service that can do advanced statistics calculations on a large number of votes submitted by users, in "real time".
In our app, users can submit different kinds of votes, like picking a favorite, rating 1-5, saying yes/no, etc. on various topics.
We also want to show "live" statistics to the user, showing the popularity of a person etc. These will be generated by a rather complex SQL query that calculates the average number of times a person was picked as favorite, divided by the total number of votes and the number of games the person has participated in, etc. The score for the latest X games should also count for more than the overall score across all games. This is just an example; there are several other SQL queries of similar complexity.
All our presentable data (including calculated statistics) is served from Firestore documents, and the votes will be saved as Firestore documents.
Ideally, the Firebase-backend (functions, firestore etc) should not need to know about the query logic.
What I wish for is a pay as you go cloud service that does the following:
I define some schemas and set up the queries we need for the statistics we have (15-20 different SQL queries), like setting up views in MySQL.
On every vote, we push the vote data to this service, which will store it in a row.
The service should then, based on its knowledge of the defined queries and the content of the pushed vote data, determine which statistics are affected by the newly added row and recalculate them. A specific vote type can affect one or more statistics.
Every time a statistic is recalculated, the result should be automatically pushed back to our Firebase backend (for instance by calling an HTTPS endpoint that hits a cloud function) - so we can update the relevant Firestore documents.
The service should be able to throttle the calculations, e.g., regenerating statistics at most once per minute despite receiving several votes per second on the same topic.
Is there any product like this in the market? Or can it be built by combining available cloud services? And what is the official term for such a product, if I should search for it myself?
I know that I can probably build a solution like this myself, and run it on a cloud hosted database server, which can scale as our need grows - but I believe that I'm not the first developer with a need of this, so I hope that someone has solved it before me :)
You can leverage the existing cloud services available on the Google Cloud Platform.
Google BigQuery, Google Cloud Firestore, Google App Engine (CRON Jobs), Google Cloud Tasks
The services can be used to solve the problems mentioned above:
1) Google BigQuery: here you can define the schema for the data on which you're going to run the SQL queries. BigQuery supports both standard and legacy SQL.
2) Every vote can be pushed to the defined BigQuery tables using its streaming-insert service.
3) Every vote pushed can trigger the recalculation service, which computes the statistics by executing the defined SQL queries; the query results can then be stored as documents in collections in Google Cloud Firestore.
4) Google Cloud Firestore: here you can store the live statistics for the user. This is a realtime database, so you can configure listeners on the statistics and surface changes as soon as the statistics are recalculated.
5) In the same service that inserts each vote, also create a record with a "syncId" in another table. The idea is to group the votes cast in a particular interval under a corresponding syncId (suffixed with a timestamp). You can then pick a time interval to suit your requirements and have the CRON jobs service trigger the recalculation service once per interval. Once the recalculation for a particular syncId completes, its record should be marked as completed.
We are using the above technologies to build a web application on Google Cloud Platform: inputs are recorded in Google Firestore and then stream-inserted into Google BigQuery. The data stored in BigQuery is queried 30 seconds after each update using SQL queries, and the results are stored in Google Cloud Firestore to serve dashboards, which are updated automatically via listeners configured on the collection holding the dashboard information.
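Assuming the google-cloud-bigquery and google-cloud-firestore Python client libraries, a rough sketch of steps 2)-4) could look like this (the table name, vote schema, and statistics query are made up for illustration):

```python
# Stream a vote into BigQuery, then (from a cron/scheduler job) rerun one
# of the defined statistics queries and push the result into Firestore.
from google.cloud import bigquery, firestore

bq = bigquery.Client()
fs = firestore.Client()

VOTES_TABLE = "myproject.votes_ds.votes"  # hypothetical table

def record_vote(vote: dict):
    # Streaming insert: the row becomes queryable within seconds.
    errors = bq.insert_rows_json(VOTES_TABLE, [vote])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

def recalculate_favorites():
    # One of the 15-20 "defined SQL queries"; invoked per interval by the
    # CRON jobs service described in step 5.
    query = f"""
        SELECT person_id, COUNTIF(kind = 'favorite') / COUNT(*) AS favorite_rate
        FROM `{VOTES_TABLE}`
        GROUP BY person_id
    """
    for row in bq.query(query).result():
        # Firestore listeners on this collection give the client its
        # "live" statistics without knowing any query logic.
        fs.collection("statistics").document(str(row.person_id)).set(
            {"favorite_rate": row.favorite_rate}
        )
```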

Storing user logs on remote server: Which data should be collected in a React Native app?

I've developed a React Native application where users can log in, choose different items from a list, see the details of an item (profile), and add/delete/edit posts (attached to an item).
Now my user base has grown, and therefore I have decided to introduce new database tables to log each action my users take, so I can analyze the accumulated data later and optimize, for example, the usability.
My first question is: is there any convention or standard that lists the data to be collected in such a case (like log time, action, ...)? I don't want to lose useful data because I noticed its value too late.
And: at which intervals should an app send the user's log data to my remote server (async requests after each action, daily, before logout, ...)? Is there any gold standard?
Actually, it's more about how much data you would like to collect and whether that matches your privacy terms and conditions. If you're going to store the data on a server other than your own to analyze it, it is highly recommended that you don't refer to user IDs there, clearly for privacy reasons.
As for when to log data, again it depends on what you want to track. If you're tracking how many minutes users spend on a screen or how they interact with posts, you may need to send those events regularly, depending on your needs: whether you want to analyze the data instantly to improve the user experience (e.g., show more relevant posts) or only use it later. If the data you need to analyze isn't that voluminous, you can send it after each action; if you're planning to track large amounts of data that you don't need right away, you could send it during time frames when your server load is low (to save bandwidth, you might choose night time, though it's a little more complicated than that).
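There is no official schema standard, but as a starting point, here is a sketch of an event shape plus a batched uploader (written in Python for brevity; the same structure carries over to a React Native client, and the endpoint and field names are hypothetical):

```python
# A log event plus client-side batching: flush many events in one request
# instead of one request per action, to save bandwidth and battery.
import time
import uuid
import requests

LOG_ENDPOINT = "https://example.com/api/logs"  # hypothetical endpoint

def make_event(session_id: str, action: str, target: str, extra=None):
    return {
        "event_id": str(uuid.uuid4()),  # dedupe key in case of retried uploads
        "session_id": session_id,       # pseudonymous id, not the raw user id
        "timestamp": time.time(),       # client-side time of the action
        "action": action,               # e.g. "view_item", "edit_post"
        "target": target,               # which item/post was acted on
        "app_version": "1.2.3",         # lets you correlate behavior with releases
        "extra": extra or {},
    }

buffer = []

def log(event: dict, flush_at: int = 50):
    buffer.append(event)
    if len(buffer) >= flush_at:
        requests.post(LOG_ENDPOINT, json={"events": buffer}, timeout=5)
        buffer.clear()
```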

Handling paging with changing sort orders

I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns it to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump around page numbers in between a client's request.
I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.
Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).
Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?
Edit: I should mention that the smartphone app is allowing users to view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.
This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:
Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.
Minimise the amount of per-client state that the server must keep track of, for scalability with large numbers of clients.
Maintain different state for each client.
This is a "pick any two" kind of situation. You have to compromise; accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.
There are variations within those that mix the various compromises, but that's what it all boils down to.
For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.
Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
Etc.
See also:
How to provide an API client with 1,000,000 database results?
Using "Cursors" for paging in PostgreSQL
Iterate over large external postgres db, manipulate rows, write output to rails postgres db
offset/limit performance optimization
If PostgreSQL count(*) is always slow how to paginate complex queries?
How to return sample row from database one by one
I'd probably implement a hybrid solution of some form, like:
Using a cursor, read and immediately send the first part of the data to the client.
Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it to a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.
If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading directly from the DB rather than the cache, or generate a new, bigger cached dataset.
That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this. Whether it's practical depends on whether your clients can cope with pagination breaking - if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at first request, etc. It also depends on the data set size and how much data each client usually requires.
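A rough Python sketch of that hybrid, assuming psycopg2 and redis-py (table, column, and key names plus the window sizes are illustrative):

```python
# Serve page 1 from a server-side cursor, prefetch a larger window into
# Redis with an expiry, and page later requests out of that snapshot.
import json
import psycopg2
import redis

PAGE = 20
PREFETCH = 1000  # enough rows to satisfy "99% of clients"

conn = psycopg2.connect("dbname=app")
cache = redis.Redis()

def first_page(client_id: str):
    # A named cursor is a server-side cursor: rows stream from the DB
    # instead of being fetched all at once.
    with conn.cursor(name=f"page_{client_id}") as cur:
        cur.execute("SELECT id, title, score FROM content ORDER BY score DESC")
        rows = cur.fetchmany(PREFETCH)
    conn.commit()  # close out the transaction, freeing DB resources
    # Snapshot the window under the client's key and expire it, so stale
    # snapshots don't pile up (the expiry compromise described above).
    cache.setex(f"snap:{client_id}", 300, json.dumps(rows))
    return rows[:PAGE]

def next_page(client_id: str, page: int):
    snap = cache.get(f"snap:{client_id}")
    if snap is None:
        # Cache expired: the client gets a fresh, possibly re-ordered
        # read -- i.e. pagination breaks here, by design.
        return first_page(client_id)
    rows = json.loads(snap)
    return rows[page * PAGE:(page + 1) * PAGE]
```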
I am not aware of a perfect solution for this problem. But if you want the user to have a stale view of the data, then a cursor is the way to go. The only tuning you can do is to store only the data for the first two pages in the cursor; beyond that, you fetch it again.

What database solution would you suggest for competitive online ticket sales?

Can you please give me a database design suggestion?
I want to sell tickets for events, but the problem is that the database can become a bottleneck when many users want to buy tickets for the same event simultaneously.
If I keep a counter of tickets left for each event, there will be many updates to this field (locking), but it will be easy to find out how many tickets are left.
If I generate tickets for each event in advance, it will be hard to know how many tickets are left.
Maybe it would be better for each event to use a separate database (if the requests for that event are expected to be high)?
Maybe reservations should also be asynchronous operations?
Do I have to use a relational database (MySQL, Postgres) or a non-relational database (MongoDB)?
I'm planning to use AWS EC2 servers, so I can run more servers if I need them.
I've heard that "relational databases don't scale", but I think I need one, because relational databases have the transactions and data consistency I'll need when working with a fixed number of tickets. Am I right or not?
Do you know of resources on the internet for this kind of topic?
If you sell 100,000 tickets in 5 minutes, you need a database that can handle at least ~333 transactions per second (100,000 tickets / 300 seconds). Almost any RDBMS on recent hardware can handle this amount of traffic.
Unless you have a suboptimal database schema and/or SQL, but that's another problem.
First things first: when it comes to selling stuff (ecommerce), you really do need transactional support. This basically excludes NoSQL solutions like MongoDB or Cassandra.
So you must use a database that supports transactions. MySQL does, but not in every storage engine: make sure to use InnoDB and not MyISAM.
Of course, many popular databases support transactions, so it's up to you which one to choose.
Why transactions? Because you need to complete a bunch of database updates and you must be sure that they all succeed as one atomic operation. For example:
1) Make sure a ticket is available.
2) Reduce the number of available tickets by one.
3) Process the credit card and get approval.
4) Record the purchase details in the database.
If any of the operations fails, you must roll back the previous updates. For example, if the credit card is declined, you should roll back the decrement of available tickets.
And the database will take locks for you, so there is no chance that between steps 1 and 2 someone else tries to purchase a ticket while the count of available tickets has not yet been decreased. Without the lock, a situation would be possible where only 1 ticket is left but it is sold to 2 people, because the second purchase started between step 1 and step 2 of the first transaction.
It's essential that you understand this before you start programming an ecommerce project. A sketch of the flow follows.
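Here is a minimal Python sketch of that purchase flow with psycopg2 against Postgres. It uses a row-level lock (SELECT ... FOR UPDATE) rather than a full table lock, which gives the same guarantee described above; charge_credit_card and the table names are hypothetical:

```python
import psycopg2

def buy_ticket(conn, event_id: int, buyer_id: int) -> bool:
    try:
        with conn:  # commits on success, rolls back on any exception
            with conn.cursor() as cur:
                # Step 1: check availability and lock the event row so a
                # concurrent buyer blocks until this transaction finishes.
                cur.execute(
                    "SELECT tickets_left FROM events WHERE id = %s FOR UPDATE",
                    (event_id,),
                )
                row = cur.fetchone()
                if row is None or row[0] < 1:
                    return False
                # Step 2: decrement the counter.
                cur.execute(
                    "UPDATE events SET tickets_left = tickets_left - 1 WHERE id = %s",
                    (event_id,),
                )
                # Step 3: charge the card. Raising here aborts the whole
                # transaction, undoing the decrement automatically.
                charge_credit_card(buyer_id)  # hypothetical payment call
                # Step 4: record the purchase details.
                cur.execute(
                    "INSERT INTO purchases (event_id, buyer_id) VALUES (%s, %s)",
                    (event_id, buyer_id),
                )
        return True
    except Exception:
        return False
```

In practice you'd usually reserve the ticket first and charge the card outside the DB transaction, to avoid holding locks during a slow network call - but the sketch mirrors the four steps above.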
Check out this question regarding releasing inventory.
I don't think you'll run into the limits of a relational database system. You need one that handles transactions, however. As I recommended to the poster in the referenced question, you should be able to handle reserved tickets that affect inventory vs tickets on orders where the purchaser bails before the transaction is completed.
Your question seems broader than database design.
First of all, a relational database will scale perfectly well for this. You may need to consider a web services layer that provides the actual ticket brokering to end users; there you will be able to manage things in a cached manner, independent of the actual database design. However, you need to think through the appropriate steps for data insertion, updates, and selects in order to optimize your performance.
The first step would be to go ahead and construct a well-normalized relational model to hold your information.
Second, build a web service interface to interact with the data model.
Then put that behind a user interface and stress-test it with many simultaneous transactions.
My bet is that you'll then need to rework your web services layer iteratively until you are happy - but your (well-normalized) database will not be causing you any bottleneck issues.

What is the best way to allow website users to edit already existing database records?

I am building a web application that will essentially allow authenticated users access to mass amounts of data, but I don't want users to only have read-only access. If there are records missing fields but a user has found information to fill these fields or correct already populated data, I would like the user to be able to do so.
However, I'm worried about mean-spirited folks coming in and simply clearing out records out of sheer boredom and am wondering what the best way to prevent this from happening would be.
My first thought is to have users submit edits, and have a page devoted to batch approvals of these edits after I or trusted individuals skim over the page. Of course, this would be time-consuming (especially as the database grows larger), and I'm curious to know of any better ways to give users editing privileges.
As you are in Rails, there are a number of plugins that provide auditing and versioning of records:
http://github.com/andersondias/acts_as_auditable
http://github.com/laserlemon/vestal_versions
These should let you build something that allows edits but still supports reversion in the worst-case scenario. A generic sketch of the idea follows.
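For illustration, here is a language-agnostic sketch (in Python, with SQLite) of the pattern these plugins implement: every edit inserts a new revision row instead of overwriting, so any record can be audited and reverted. The schema is illustrative, not what the plugins actually generate:

```python
import sqlite3

db = sqlite3.connect("app.db")
db.execute("""CREATE TABLE IF NOT EXISTS revisions (
    record_id INTEGER, version INTEGER, data TEXT,
    edited_by TEXT, edited_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (record_id, version))""")

def save_edit(record_id: int, new_data: str, user: str):
    # Append a new version; older versions stay untouched for auditing.
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM revisions WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    db.execute(
        "INSERT INTO revisions (record_id, version, data, edited_by)"
        " VALUES (?, ?, ?, ?)",
        (record_id, latest + 1, new_data, user),
    )
    db.commit()

def revert(record_id: int, to_version: int, user: str):
    # A revert is just another edit whose content is an old version's data.
    (old,) = db.execute(
        "SELECT data FROM revisions WHERE record_id = ? AND version = ?",
        (record_id, to_version),
    ).fetchone()
    save_edit(record_id, old, user)
```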
Support rollbacks, like wikis do, to undo malicious edits.