Is it a bad idea to have a web browser query another API instead of my site providing it?

Here's my issue. I have a site that provides some investing services. I pay for end-of-day data, which is all I really need for my service, but I feel it's a bit odd when people check in during the day and the site only displays yesterday's closing price. End-of-day data is fine for my analytics, but I want to display delayed quotes on my site.
According to Yahoo's YQL FAQ, if you use IP-based authentication you are limited to 1,000 calls/day/IP. If my site grows I may exceed that, but I was thinking of pushing this request to the people browsing my site, since it's extremely unlikely that the same IP will visit my site 1,000 times a day (my site itself has no use for this info). I would call a URL from their browser, then parse the results so I can show them in the format of the site's template.
I'm new to web development, so I'm wondering: is it common practice, or a bad idea, to have the user's browser make the API call itself?

It is not a bad idea at all:
You stretch the rate limits this way;
Your server will respond faster (since it does not have to contact the API);
Your page will load faster because the initial response is smaller;
You can load the remaining data from the API asynchronously while your UI is already responsive.
Generally speaking, it is a great idea to talk to APIs from the client: it's more dynamic, you spread the traffic, and the site feels more responsive.
The biggest downside I can think of is that you depend on the availability of other services. On the other hand, your server(s) will be stressed less because the traffic is spread out.
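For instance, a minimal sketch of the client-side call in TypeScript; the quote endpoint and the response shape are hypothetical placeholders, not any real provider's API:

```typescript
// Minimal sketch: fetch a delayed quote directly from the visitor's browser.
// The endpoint URL and response shape are hypothetical placeholders.
interface Quote {
  symbol: string;
  price: number;
  asOf: string;
}

async function loadQuote(symbol: string): Promise<Quote | null> {
  try {
    const res = await fetch(
      `https://quotes.example.com/v1/quote?symbol=${encodeURIComponent(symbol)}`
    );
    if (!res.ok) return null; // provider said no: fall back to end-of-day data
    return (await res.json()) as Quote;
  } catch {
    return null; // provider unreachable: degrade gracefully
  }
}

// The page renders immediately with yesterday's close; the delayed quote
// fills in asynchronously once (and if) the third-party call succeeds.
loadQuote("AAPL").then((q) => {
  const el = document.getElementById("quote");
  if (el && q) el.textContent = `${q.symbol}: $${q.price.toFixed(2)} (delayed)`;
});
```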
Hope this helped a bit! Cheers!

Related

Is there a way to protect data from being scraped in a PWA?

Let’s say I have a client who has spent a lot of time and money creating a custom database. So there is a need for extra data security. They have concerns that the information from the database could get scraped if they allow access to it from a normal web app. A secure login won’t be enough; someone could log in and then scrape the data. Just like any other web app, a PWA won't protect against this.
My overall opinion is that sensitive data would be better protected in a hybrid app that has to be installed. I am leaning toward React Native or Ionic for this project.
Am I wrong? Is there a way to protect the data from being scraped in a PWA?
There is no way to protect data that is visible to a browser client, regardless of the technology: plain HTML and PWA/hybrid apps are all in the same position.
You can, however, make scraping more difficult:
Enforce limits on how much information a client can fetch per minute/hour/day. Anyone who exceeds the limits can be blocked/sued/whatever.
You can return some data as images rather than text. That would make the extraction process a bit more difficult, but it would also complicate your app and use more bandwidth.
If we are talking about a native/hybrid app, a few more layers can be added to make it more secure:
Use an HTTPS connection and enforce a check for a valid certificate.
Even better, check for a specific certificate (certificate pinning) so it cannot be replaced by a man-in-the-middle.
I guess an iOS app would be more secure than an Android one, as Android apps are easier to decompile and run as modified versions with the restrictions removed.
Again, rate limiting seems to be the most cost effective solution.
On top of rate limiting, you can add some sort of pattern limiting. For example, if a client requests data at regular intervals close to the limits, it is logical to suspect that the requests come from a robot and the data is being scraped.
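For what it's worth, here is a minimal sketch of the rate-limiting idea in TypeScript (a fixed window per client; the client id could be an IP address or an account, and the window/quota numbers are made up):

```typescript
// Minimal sketch of per-client rate limiting (fixed window). The window
// and quota numbers here are purely illustrative.
const WINDOW_MS = 60_000; // one minute
const MAX_REQUESTS = 60;  // per client per window

const counters = new Map<string, { count: number; windowStart: number }>();

function allowRequest(clientId: string, now = Date.now()): boolean {
  const entry = counters.get(clientId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(clientId, { count: 1, windowStart: now });
    return true; // fresh window
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS; // false once the quota is spent
}

// A pattern check can sit on top: flag clients whose requests arrive at
// suspiciously regular intervals while staying just under the quota.
```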
HTTPS encrypts the data being retrieved from your API, so it could not be 'sniffed' by a man in the middle.
The data stored in the Cache and IndexedDB is somewhat encrypted, which makes it tough to access.
What you should do is protect access to the data behind authentication.
The only way someone could get to the stored data is by opening the developer tools and viewing the data in IndexedDB. Right now you can only see that a response has been cached in the Cache database.
Like Alexander says, a hybrid or native application will not protect the data any better than a web app.
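To make "protect access to the data behind authentication" concrete, here is a minimal sketch of gating a data endpoint behind a session check; the types and helper names are my own, not from any particular framework:

```typescript
// Minimal sketch: every data endpoint checks the session before answering.
// Session, lookupSession and handleDataRequest are hypothetical names.
interface Session {
  userId: string;
  expires: number; // epoch milliseconds
}

const sessions = new Map<string, Session>(); // in production: a server-side store

function lookupSession(token: string | undefined): Session | null {
  if (!token) return null;
  const s = sessions.get(token);
  return s !== undefined && s.expires > Date.now() ? s : null;
}

// Conceptual request handler: no valid session, no data.
function handleDataRequest(authToken: string | undefined): { status: number; body: unknown } {
  const session = lookupSession(authToken);
  if (!session) {
    return { status: 401, body: { error: "authentication required" } };
  }
  // Combine with the rate limiting above: even authenticated users get a quota.
  return { status: 200, body: { rows: [] /* the protected data */ } };
}
```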

API call request limit

I have been looking into various APIs which can provide me with the weather data I need in JSON format. A lot of these APIs have certain limits, such as: in order to get more requests per minute, you need to pay more money per month so that your app can make more API requests.
However, a lot of these APIs also have free accounts which give you limited access to them.
So what I was thinking is: wouldn't it be possible for a developer to make lots of different developer accounts with an API provider and generate lots of different API keys?
That way, they wouldn't have to pay anything, as they could stick with the free accounts. Whenever one of the API keys reached its maximum daily request count, the developer could just put a switch statement in their code that gets their software to use a different API key.
I see no reason why this wouldn't work from a technical point of view... but, is such a thing allowed?
Thanks, Dan.
This would technically be possible, and it happens.
It is also probably against the service's terms, a good reason for the service to ban all your sock puppet accounts, and perhaps even illegal.
If the service that offers the API has spent time and money implementing a per-developer limit for their API, they have almost certainly enforced that in their terms of service, and you would be wise to respect those.
(relevant xkcd)
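Mechanically, the failover described in the question is trivial; here is a sketch with a hypothetical endpoint and keys. To be clear, per the above, doing this with sock-puppet free accounts is almost certainly a terms-of-service violation:

```typescript
// Sketch of the key-failover mechanics from the question: try each key in
// turn and move on when the provider answers 429 (limit reached).
// The endpoint and keys are hypothetical placeholders.
const API_KEYS = ["key-one", "key-two", "key-three"];

async function fetchWeather(city: string): Promise<unknown> {
  for (const key of API_KEYS) {
    const res = await fetch(
      `https://weather.example.com/v1/current?city=${encodeURIComponent(city)}&apikey=${key}`
    );
    if (res.status === 429) continue; // this key is exhausted; try the next
    if (!res.ok) throw new Error(`request failed: ${res.status}`);
    return res.json();
  }
  throw new Error("all API keys exhausted for today");
}
```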

How many active, simultaneous connections can a web server accept?

I know this is a difficult question but here it is, in context:
Our company has a request to build a WordPress website for a certain client. The caveat is that, on one day per year, for a period of about 20 minutes, 5,000 - 10,000 people will attempt to access the home page of this website. Their purpose: Only to acquire an outbound link to another site.
My concern is that, no matter what kind of hosting we provide, the server may reject connections once a certain number of connections is reached.
Any ideas on this?
This does not depend on WordPress. WordPress is basically software to render web pages: it helps you quickly modify the content of a page. Other software, for instance Apache, accepts the connections and forwards the calls to, for instance, WordPress.
Apache can be configured to accept more connections; I think the default is about 200. Whether that is a problem really depends. If the purpose is only to hand out another URL, connections will be terminated fast, so that's not really an issue. If, on the other hand, you want to generate an entire page using PHP and MySQL, it can take some time before a client is satisfied; in that case 200 connections are perhaps not sufficient.
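For reference, raising that ceiling with Apache's prefork MPM looks roughly like this; the numbers are purely illustrative (not a recommendation), and each worker costs RAM, so they have to fit the machine:

```apache
# Illustrative only: raise the connection ceiling for the prefork MPM.
# ServerLimit must be raised together with MaxRequestWorkers.
<IfModule mpm_prefork_module>
    StartServers            10
    MinSpareServers         10
    MaxSpareServers         50
    ServerLimit             1000
    MaxRequestWorkers       1000
    MaxConnectionsPerChild  10000
</IfModule>
```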
As B-Lat points out, you can use cloud computing platforms like Google App Engine or Microsoft Azure that provide a lot of server power but only bill their clients for the resources actually consumed. In other words, you can accept thousands of connections at once on the one busy day, but you don't need to pay for the other days when clients visit your website less often.

How to properly benchmark / stress-test a single-page web application

I am somewhat familiar with benchmarking/stress testing traditional web applications, and I find it relatively easy to start estimating the maximum load for one. With the tools I am familiar with (Apache ab, Apache JMeter) I can get a rough estimate of the number of requests/second a server with a standard app can handle. I can come up with a user story, create a list of pages I would like to check, and benchmark them separately. A lot of information can be found on the internet about how to go from a novice like me to a master.
But in my opinion a lot of things are different when benchmarking a single-page application. The main entry point is the most expensive request, because the user loads the majority of things needed for the proper app experience (or at least that is how it is in my app). After that, navigation to other places is just AJAX requests, waiting for JSON, and templating, so time to window load is not important anymore.
To add to the problem, I was not able to find any resources on how people do this properly.
In my particular case I have a SPA written with Knockout, sitting on an Apache server (most probably this is irrelevant). I would like a rough estimate of how many users my app can handle on a particular server. I am not looking for a tool recommendation (though that would also be nice); I am looking for an experienced person to share their insight about the benchmarking process.
I suggest you test this application just like you would test any other web application; as you said, identify the common use cases, prepare scripts for them, run them in the appropriate mix, and analyze the results.
Web applications can break in many different ways for different reasons. You are speculating that the first page load is heavy and the rest is just small AJAX calls. From experience I can tell you that this is sometimes misleading: for example, you can find that the heavy page is served from a cache and the server is not working hard for it, while a small AJAX response requires a lot of computing power, or a long-running database query, or hits some locking in the code that causes it to break or be slow under load. That's why we do load testing.
You can do this with any load testing tool, ideally one that can handle these types of scripts with many dynamic values. My personal preference is WebLOAD by RadView.
I am dealing with a similar scenario: a SPA where the first page loads, and thereafter everything is done by requesting other HTML pages and/or making web service calls to get the data.
My goal is to stress test the web server and the DB server.
My solution is to just create requests for those HTML pages (a very minor performance concern, IMO, since they are static and can be cached in the browser for a very long time) and for the web service calls. The biggest load will come from the requests for data/processing via the web service calls.
Capture all the requests for HTML and web service calls using a tool like Fiddler, and use any load testing tool (like JMeter) to replay those requests with as many virtual users as you want to test your application with.
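As a toy illustration of the virtual-user idea in TypeScript (a real run belongs in JMeter or a similar tool, with proper ramp-up and reporting; the target URL and the numbers are placeholders):

```typescript
// Toy load test: N virtual users each fire a series of requests at one
// AJAX endpoint and we report latency percentiles. Placeholder values only.
const TARGET = "https://myapp.example.com/api/data";
const VIRTUAL_USERS = 50;
const REQUESTS_PER_USER = 20;

async function virtualUser(): Promise<number[]> {
  const latencies: number[] = [];
  for (let i = 0; i < REQUESTS_PER_USER; i++) {
    const start = Date.now();
    const res = await fetch(TARGET);
    await res.text(); // drain the body so the timing covers the full transfer
    latencies.push(Date.now() - start);
  }
  return latencies;
}

async function main(): Promise<void> {
  const perUser = await Promise.all(
    Array.from({ length: VIRTUAL_USERS }, () => virtualUser())
  );
  const all = perUser.flat().sort((a, b) => a - b);
  console.log(`requests: ${all.length}`);
  console.log(`median:   ${all[Math.floor(all.length / 2)]} ms`);
  console.log(`p95:      ${all[Math.floor(all.length * 0.95)]} ms`);
}

main().catch(console.error);
```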

"reasonable" use of web APIs to sync data

My goal is to synchronize a web application with an internal database. The web application has a public API, but in order to fully synchronize the two sources I would need to make around 2,000 separate API calls every time. My instinct tells me that this is excessive and possibly irresponsible, but I lack the experience to know for sure.
In this particular case the web application is Asana, but I've encountered similar situations before with other services. Is there any way to know if you're abusing a service through excessive API calls? I know I'm not going to DoS a company like Asana, but I can't shake the feeling that there must be a better way than making ~150k requests per day.
The only other option I can think of is to update the web-service only when I know there's been a change in the database, but I'll lose a lot of capability that way.
I apologize for the subjectivity of this question, but I'm really hoping that someone can explain if there's any kind of etiquette that's expected when using public APIs.
(I work at Asana)
This is an excellent question, or rather set of questions.
You are designing a system that will repeatedly make requests for every object. What will happen as the number of objects grows? Even if your initial request rate were reasonable, this would suffer problems with scalability. A more scalable solution is one that scales with the number of changes in the system. This will also grow over time, but much more slowly - the number of changes a single user can make per day is relatively constant, but the total number of objects they've created over time grows and grows. So my first piece of advice would be to avoid doing things this way, and instead find a way to detect changes and just act on those. It would be interesting to know why you feel you'll lose capability by taking this approach.
Now, I happen to know that the Asana API does not currently provide you with any friendly mechanism to just detect changes in the system. This is a commonly requested feature and we are looking into it, though I unfortunately cannot promise a delivery date. So you might be left with no choice but to poll our system for now.
As for being polite to the API, many service providers set limits on their API usage to prevent accidental or malicious use of the API from impacting the service to their other customers -- Asana is no exception. Sometimes these limits are published, other times not, and there is no standard limit: it all depends on the service. But it is very thoughtful of you to be curious about service limitations.
That said, 150k requests per day is, for the Asana API, kind of a lot. If all of our API users gave us that much traffic, we might be serving more requests per day than Google Web Search, and we're not quite that scalable yet. :) Technically, we might sometimes be able to handle requests at that volume from a single user.
If you must poll, try to poll at intervals like every 15 minutes, and please do not poll your entire workspace even at that frequency; it's likely to be too much traffic/data. We're working on trying to provide you with a better solution.
If you do happen to make too many requests of the Asana API, you will get back HTTP status code 429 instead of your desired response; you can read more about that here (https://asana.com/developers/documentation/getting-started/errors).
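Putting that advice together, a polite polling loop might look like the sketch below; the URL is a placeholder, and honoring Retry-After on a 429 is a general HTTP convention rather than something I'm claiming about Asana's responses specifically:

```typescript
// Sketch of a polite polling loop with 429 handling. The URL is a
// hypothetical placeholder, not a real Asana endpoint.
const POLL_URL = "https://app.example.com/api/changes";
const POLL_INTERVAL_MS = 15 * 60 * 1000; // the ~15-minute cadence suggested above

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function pollOnce(): Promise<unknown | null> {
  const res = await fetch(POLL_URL);
  if (res.status === 429) {
    // Rate limited: wait as instructed (or a default) instead of retrying hot.
    const retryAfter = Number(res.headers.get("Retry-After") ?? "60");
    await sleep(retryAfter * 1000);
    return null;
  }
  if (!res.ok) throw new Error(`unexpected status ${res.status}`);
  return res.json();
}

async function pollLoop(): Promise<never> {
  for (;;) {
    const data = await pollOnce();
    if (data !== null) {
      // Diff against the previous snapshot and act only on the changes.
    }
    await sleep(POLL_INTERVAL_MS);
  }
}
```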