Rails "sub-environment" -- still production (or test, etc.) but different - ruby-on-rails-3

How should we best handle code that is part of a single Rails app, but is used in several different "modes"?
We have several different cases of an app that is driven from the same data sources (MySQL, MongoDB, SOLR) and shares core logic, assets, etc. across multiple different uses.
Background/details:
HTML vs REST API
A common scenario is that we have HTML and REST interfaces. These differences are handled through routing (e.g. /api/v1/user/new vs /user/new); with minor differences, they provide the same functions. This seems reasonably clean to me.
Multi-tenant
Another common scenario is that the app is "multi-tenant", determined mainly by the subdomain of the URL, e.g. partner1.example.com and partner2.example.com (or a query-string parameter for API customers); each tenant has a number of features or properties that differ. This is handled by a filter in ApplicationController, using data largely stored in a set of tenant-specific database tables, with tenant-specific functionality encapsulated in methods. This also seems reasonably clean to me.
Offline Tasks
One scenario is that a great deal of the data is acquired through a very large number of tasks, running pretty much continuously: feed loaders, scrapers, crawlers, and other tasks of this sort ... the kinds of things you would find in a search engine, which is a large part of what we do. These tasks are launched on idle server instances and run periodically ... but are just rake tasks that are part of the app.
These tasks are characteristically different from our front-end code: they update data, run calculations, do maintenance tasks, and so on. Some run for days (e.g. updating 30M documents from an external web service). In the end, these tasks create and keep fresh the core data that our front-end app uses.
This one doesn't seem as clean to me. In particular, some of these tasks run data updates at the same time as our application is using the data, so they occasionally need to defer to the front-end app when we're under peak load.
Major Variants of the App
This last case is clearly wrong: we made major customizations of our app (15% or 20% different) by creating branches and then running them as entirely separate apps, sometimes sharing some of the core data and other times using their own. We have mostly fixed this now, as it was, of course, untenable.
OK, there's a question in here somewhere, right?
So, in particular for the offline tasks, I feel like the app really needs to be launched in a "mode" or perhaps "sub-environment". But we still have the normal development, test, qa, demo, pre_release, and production environments, each with its own isolated data and other configuration parameters. For each of these, we want to be able to run, develop, test, and deploy the various "modes" of the application.
Can anyone suggest an appropriate architecture that is similar to the declarative notions of standard Rails environments?

If the number of modes is ever-increasing:
Perhaps the offline tasks could be separated from the main app, into their own application (or a parent abstract task with actual tasks inheriting from it and deployed individually).
If the number of modes is relatively small and won't be changing often:
You could put the per-mode configuration into a config file, logically separate from the rest of the code. Then during deployment you can provide a combination of (environment, mode, set of hosts) and get a good level of control over your environments while using the same codebase.
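A minimal sketch of that idea, shown in Python for illustration (in a Rails app the equivalent would be a small initializer reading something like config/modes.yml). The APP_MODE variable and the settings themselves are hypothetical:

import os

# Hypothetical per-mode settings, kept separate from the per-environment
# config (database credentials etc.), which the framework already handles.
MODES = {
    "web":     {"run_offline_tasks": False},
    "offline": {"run_offline_tasks": True},
}

def load_mode():
    # The deploy provides (environment, mode): the environment keeps its
    # usual isolated data; the mode only toggles behavior on top of it.
    env = os.environ.get("APP_ENV", "development")
    mode = os.environ.get("APP_MODE", "web")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"environment": env, "mode": mode, **MODES[mode]}

if __name__ == "__main__":
    print(load_mode())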

Related

Can you make separate schedules for workflows on staging versus prod?

I would like certain workflows to run on a different schedule in staging and in production (e.g. a workflow runs multiple times a day in staging and only once a day in production). Doing so would help with getting quicker feedback on the runs and also save on compute costs. How can I do this with flytekit? Is it recommended?
There is no easy way to do this, as it goes against the main integration/deployment strategy championed by Flyte.
Flyte entities (tasks, workflows, and launch plans) are designed to be iterated on in a user's development domain. After iterating, users are expected to promote that version to the staging domain and then to production. The more differences there are between those domains, the more confusion we suspect there will be down the road.
That said, it is possible, because the registration step allows the user to specify a different configuration file. One of the entries in the configuration file is the workflow_packages construct, which essentially lets registration look at different folders when registering for staging vs. production.
To get a launch plan to exist in only one domain, put it in a new folder/module that is not reachable from any of the existing workflow packages, and put the other domain's launch plan in yet another.
In the staging file,
[sdk]
workflow_packages=app.workflows,staging_lps
In the production file,
[sdk]
workflow_packages=app.workflows,production_lps
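For completeness, a rough sketch of what a staging-only launch plan module might contain (the my_wf workflow and module paths are hypothetical, and the exact schedule argument name can vary between flytekit versions):

# staging_lps/schedules.py -- only reachable from the staging config above
from flytekit import CronSchedule, LaunchPlan

from app.workflows.example import my_wf  # hypothetical workflow

# Staging runs every 4 hours for quicker feedback.
staging_lp = LaunchPlan.get_or_create(
    workflow=my_wf,
    name="my_wf_frequent",
    schedule=CronSchedule(schedule="0 */4 * * *"),
)

The production_lps module would define the once-a-day variant of the same launch plan.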

Use IronWorker while reusing my existing code

My website is hosted on AWS Elastic Beanstalk (PHP). I use Yii Framework as an MVC.
A while ago I wanted to run a SQL query every day. I looked up how to run crons on Beanstalk, and it seemed complicated to merge the concepts of cloud and cron. I ran into IronWorker (http://www.iron.io/worker) and managed to create a worker that is currently doing its job fine.
Today I want to run a more complex cron (look for notifications in my database, decide whether to send an email, build an email template, and send the email via AWS SES).
From what I understand, worker files are supposed to be self-contained items, with everything they need to work.
However, I have invested a lot of time and effort in building my MVC. I have complex models, verifications, an email templating engine, etc...
It seems very difficult to reuse the work I've done to create an IronWorker. Even if I managed to port all of my code to a worker (which seems like a great deal of work), it would mean that any time I change my main code, I need to make sure the worker also gets those changes; I would effectively have a "branch" of my code. Even more so if I want to create more workers in the future.
What is the correct approach?
Short-term, you could likely just use the scheduling capabilities in IronWorker and have the worker hit an endpoint in your application. The endpoint will then trigger the operations to run within your app environment.
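As a hedged sketch of that short-term approach (the /cron/send-notifications endpoint and the shared-secret header are hypothetical; the worker can be written in any language IronWorker supports, Python is used here):

# notification_worker.py -- scheduled in IronWorker; it only pings the app,
# so all models, validation, and templating stay in the Yii codebase.
import os
import urllib.request

ENDPOINT = "https://www.example.com/cron/send-notifications"  # hypothetical

req = urllib.request.Request(
    ENDPOINT,
    method="POST",
    headers={"X-Cron-Token": os.environ["CRON_SECRET"]},  # shared secret
)
with urllib.request.urlopen(req, timeout=300) as resp:
    print("app responded:", resp.status)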
Longer-term, we do suggest you look at more of a service-oriented approach, whereby you break your application up to be more loosely coupled and distributed. Here's a post on the subject. The advantages are many, especially around scalability and development agility.
https://blog.heroku.com/archives/2013/12/3/end_monolithic_app
You can also take a look at this Yii extension.
http://www.yiiframework.com/extension/yiiron/
We certainly don't want you to rewrite your app unnecessarily, but there are likely areas where you can look to decouple. We suggest creating a worker directory and making an effort to write the workers to be self-contained; that way, you can run them in a different environment and just pass payloads to the workers. (Push queues can also be used to push to these workers.) Once you get used to distributed async processing, it's a pretty easy process to manage.
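For the self-contained style, IronWorker hands each run its payload as a JSON file whose path arrives via a -payload argument. A rough sketch of the receiving side (all payload fields are hypothetical):

# send_email_worker.py -- self-contained: everything it needs travels in
# the payload instead of reaching back into the monolith's models.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("-payload")  # file path supplied by IronWorker
args, _ = parser.parse_known_args()  # ignore other runtime arguments

with open(args.payload) as f:
    payload = json.load(f)  # e.g. {"to": ..., "template": ..., "vars": ...}

# Render and send using only what the payload provides.
print("would send", payload["template"], "to", payload["to"])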
(Note: I work at Iron.io)

Testing an n-tier web application - should my test project have its own database?

In an n-tier web-app, should I be running integration tests against a different database, one dedicated to testing the code? Is it standard practice to test against the production database as well?
You should never run untested code on production. After all, you don't want to discover that it has a bug that wipes out all data. That's what tests are supposed to find. And you should not have test/staging data in the production system. It is good practice to dump the data out of production and load it into another environment for periodic testing with real-world data.
You should have a test database (not shared with production). It's a good idea to wipe out the data before every test.
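A small sketch of the wipe-before-every-test idea, using pytest and SQLite as stand-ins (the table names are hypothetical; a real suite would point at its dedicated test database):

# conftest.py -- every test starts from an empty, known state.
import sqlite3
import pytest

TABLES = ["users", "orders"]  # hypothetical application tables

@pytest.fixture
def db():
    conn = sqlite3.connect("test_app.db")  # dedicated test DB, never production
    for table in TABLES:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY)")
        conn.execute(f"DELETE FROM {table}")  # wipe before the test runs
    conn.commit()
    yield conn
    conn.close()

def test_database_starts_empty(db):
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0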
You can have smoke tests that run in production. They will pretend to be a user (agent) and visit many pages, maybe even create things (with a special tag so you can find them again and delete them).
I'd rather think of a separate database user with its own data set; the database schema should be the same. I'd never run tests against the production database with the same database user. Test logic shouldn't even be delivered to the client, as it may lead to severe security issues.
In my opinion you need a full production-like data set for testing purposes, to be able to test every single feature of your application. You would also need an empty database (without any business data) to give application clients as the initial state on delivery. Such a dataset doesn't need testing, as it contains no data to exercise the business logic.

How to perform a stress test against a SharePoint site using threads

I want to analyze the performance (and hence the weak points) of a SharePoint site by running a stress test. What needs to be done is to call some methods, exposed via a web service, that do the following things inside the SharePoint site:
-create a new group
-add a content to the group
-add an attachment to the content
-delete the content
-delete the previously created group
What is required is a simulation of a situation where 4500 users try to do these operations concurrently (at the same time or, more realistically, within a short timespan, for example within 5 seconds).
We also want to record the execution time of each operation (each web method, for example "create new group"). I thought I could simulate these operations via a console application using threads and stopwatches. Has anyone encountered a similar problem who can share existing solutions or hints for doing it "the right way"? For example, how can I ensure that all threads start at the same instant? Thanks in advance.
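On the specific question of starting all threads at the same instant: most platforms provide a barrier primitive for exactly this. A minimal sketch in Python (the SharePoint web-service call is replaced by a hypothetical do_operation stub; per-operation timings are collected the way a stopwatch would):

import threading
import time

NUM_USERS = 50  # scale toward 4500 on a proper test rig, not one machine
barrier = threading.Barrier(NUM_USERS)
durations = []  # list.append is atomic in CPython, so this is safe here

def do_operation(user_id):
    time.sleep(0.01)  # stand-in for e.g. the "create new group" web method

def worker(user_id):
    barrier.wait()  # every thread blocks here, then all fire together
    start = time.perf_counter()
    do_operation(user_id)
    durations.append(time.perf_counter() - start)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(durations)} operations, avg {sum(durations) / len(durations):.3f}s")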
I have been using Visual Studio Load Testing for two years, and I find it very powerful and easy to use. You can run integration tests, navigate a web site, simulate database load... in fact, everything. Because it is an MS product, it is also fully compatible with all MS products like SharePoint: it's easier to call a WCF service from a unit test than from another technology (how else would you test a netTcpBinding?). You can also use the Visual Studio Profiler to instrument your code (and see which lines of code are expensive, or even ADO.NET interactions), and you can easily extend the load testing through its many extensibility points.
One important thing is that VS load testing is "intrusive": it will collect not only response times and request lengths, but also all performance counters, database queries, and so on. All these metrics are saved in a dedicated database (such as SQL Express) for reporting, and there is an add-in for Excel.
Just one important note (this applies to all load-testing solutions!):
You can run load tests from a developer machine or even a single dedicated machine, but you usually can't generate enough traffic to really see how the application responds (one machine cannot simulate 500 concurrent users because of limited CPU/memory/network). In order to simulate a lot of users, you'll set up what is known as a load test rig.
A test rig is made up of a test controller machine and one or more test agent machines, as shown in Figure 1. The controller manages and coordinates the agent machines, and the agents generate load against the application. The test controller is also responsible for collecting performance-monitor data from the servers under test and, optionally, from the test rig machines.
Here are some links :
MSDN
Dave's introduction
That's not to say Visual Studio Load Testing is not a great tool, but there are tools, like Tsung and Eventlet (and many others), that can support well over a thousand concurrent users.
Good luck.

Should test data be used in production?

We are deploying an update to our main application in production. The update has been tested in QA and it looks good to go. Our client wants to do a test in production. For that, we will run the application using "test data" in production, and once the test has finished, we will delete the "test data".
A couple of server admins are against this because "test data doesn't belong in production". I think it's OK, since the QA server and the production server have different hardware and the databases house different applications (QA has more databases, production is dedicated). Besides that, are there other facts I can use to back my opinion?
EDIT: adding context
The application is a tool that automates the reception and validation of data. We receive the files via email, and this tool automatically validates them and imports them into the database. We have a BI system that creates reports using this information (Excel files are received by email, then validated, then reports/views come out, all of it automated).
The "test data" would be old files (good and bad files from previous efforts) that represent real data (actually it is real data, just with problems or simply too old).
Yes! But manual usage of test data in production does not sound like a good idea to me, as it cannot be controlled or monitored. My answer below assumes the test data is used for automated testing.
Test data in production is today's need. It was not a requirement back when automated testing was not a requirement (or did not exist), so in general it will be frowned upon. Security is the main reason; its impact in messing up site analytics is another. These are genuine and good reasons.
One cannot decide one day to simply put test data in production, especially towards the end of a project. It needs to be made a requirement from the time development starts, so that the test data is in production from the very first deployment onwards, and its impact is studied and documented. The organization as a whole needs to understand its benefit and impact.
Test data needs to be divided based on its type, need, or context, e.g. retrievable test data and editable test data. The first step would be to have retrievable (read-only, never changes) test data available. Perhaps this is the farthest we can go in many cases, and it would still provide good results. The creation of this read-only test data needs to be automated and preferably documented.
The benefits of having test data in production are huge. An automated test of an application is more precious than the application itself; if management realizes that, then at least the initial "frown" changes. I feel test data in production should be considered a requirement/user story, and all problems raised against it should be mitigated. New patterns of development need to evolve in this area.
This discussion is also related to integration testing, and this article focuses on its benefits over unit testing.
Your admins are right. Having test data in production exposes you to risks (security holes):
Test data in production can be used to do damage to your company (intentional or unintentional).
For example, if you have non-existing identities in production, payments can be made to them. If they are linked to real bank accounts, you lose money without the ability to detect it.
Test data can change your management reports. Fake activity can influence reports and affect the decisions made from them. This will be very hard to track and even harder to correct.
Test data can interact with production data. If someone makes a mistake and creates a wrong relation, production data can be changed based on test data.
There is no good way of detecting test data, even if you were to mark it: any data can be marked as test data. And if you handle test data differently in your business layer, it would not be a real test of your production environment.
Nowadays it is good practice to have a staging environment with the same infrastructure configuration as production, so you can execute pen tests, load tests, and whatever else you need to ensure that production will behave as you expect.