How to integrate multiple components into an API

I am working on tying together multiple components that I have developed for my undergraduate thesis, and have hit a roadblock. There are five components to my project:
SQLite database (considering switching to Postgres to make hosting it on Amazon RDS easier)
Python data scraper
C++ data transformation/computation layer
C++ data output component
Angular front end (to be developed)
Currently all of my components are duct-taped together with a bash script, and the Angular component does not exist yet.
What I want to do is the following:
Make an AWS-hosted API which does the following:
Make the data scraper / data transformation layer into a cron job
Make the data output component callable by the Angular site for certain values (at the moment I have only one API call in mind)
Problems:
I do not know how to integrate components written in various languages into an API.
I do not know how (or whether it is even possible) to host a SQLite DB on a server.
Any suggestions to simplify this would be greatly appreciated. I have had these components completed for about a month but have not been able to figure out how to tie them together.
Thanks!

You make things very complicated :-)
You have several languages at play here, which makes things much trickier. I would consider rewriting your C++ code in Python, or making Python wrappers around it (https://docs.python.org/2/extending/extending.html) so that you can use a Python framework like one of these: http://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask or http://python-eve.org.
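As a rough illustration, here is a minimal sketch of the wrapper-plus-framework approach in Python. It assumes the C++ output component is compiled as a shared library exposing a C-linkage function; the names liboutput.so and get_output are hypothetical, and Flask is just one of the frameworks linked above.

# Minimal sketch: wrap the C++ output component with ctypes and expose it
# through a single Flask route. "liboutput.so" and "get_output" are
# hypothetical names for the compiled C++ component.
import ctypes

from flask import Flask, jsonify

lib = ctypes.CDLL("./liboutput.so")
lib.get_output.argtypes = [ctypes.c_int]
lib.get_output.restype = ctypes.c_double

app = Flask(__name__)

@app.route("/output/<int:value_id>")
def output(value_id):
    # Delegate the computation to the C++ library and return JSON.
    result = lib.get_output(value_id)
    return jsonify({"id": value_id, "result": result})

if __name__ == "__main__":
    app.run()

The same wrapping trick works for the scraper/transformation step: a thin Python entry point that calls into the wrapped C++ code can then be scheduled with cron, replacing the current bash script.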
If you want to use RDS, you will have to switch to one of their supported databases. If you are OK running your own database, you will already have a server to run your code on; just use an EBS volume there (don't use ephemeral disk unless you are going to run a very robust backup/replication process). But considering what you have built, I would really consider using something with less maintenance overhead. If you are willing to pay for it, Aurora was just released for public use and removes most of the database administration overhead.

Related

Keeping the API inside or outside a VB.NET application

I have inherited a VB.NET application at work which is written in procedural code. As I work on the app, I am planning to move it toward an OO architecture.
In addition, we have some needs within the company to use the data from this app in other places. One existing way we've done this is to write Perl scripts that access the underlying MySQL database directly. As a result, there has been a lot of logic duplication, because the scripts can't access the business rules that live inside the VB.NET app.
Most of my experience is in web development, so when I first came across this problem, the first solution that came to mind was to build a REST API that serves up the data over HTTP and contains the business logic, and then to build the VB.NET app as a client of this API. External scripts could then also consume the API without duplication of effort.
I don't have any experience with VB.NET, and because of the way the app is currently structured, this would require a lot of work. Another possibility I thought of was creating the API within the VB.NET app and somehow exposing it to the outside world (in addition to the GUI layer of the VB.NET app itself).
I'm also aware that my analogy to the web world may not hold up, just because I'm so unfamiliar with desktop development.
Can anybody share any wisdom or guidance here? The project is going to be a big one either way, and I'm just trying to figure out the best path forward.
Thanks in advance.

Magento Dropshipping - How to automate catalog updates?

I am new to Magento and impressed by the MVC framework that powers it, which makes module development a well-thought-out process. I am a strong CakePHP developer.
I am working on a project that uses a dropshipper for the physical products. As a result, every day at 4 am a feed needs to be parsed and the products, categories, and stock information updated. A cron job will be set up to do this.
Additional requirements are:
Upon a successful order, the system must upload a CSV feed to the dropshipper via FTP with the order details for distribution.
Real-time stock checks, either every hour via cron or a lookup on the product page
I can think of 2 approaches:
Write everything natively in Magento. As a newbie, this is going to be a big learning curve, but is it the right solution?
Write a simple CakePHP app that runs as a shell. This would use the Magento API to manage all dropshipper processes. This solution would be easier to roll out, but it introduces an additional system to support.
Does anyone have any advice relating to dropshipping in Magento?
First, with respect to the product import (product, stock data), make sure to do the real data saving inside of Magento. There have been changes to the catalog implementation in the past, and it's likely with a framework like Magento that there will be more. Keeping it inside the framework will reduce the likelihood of it simply no longer operating and you getting a very unpleasant phone call.
Another advantage to this approach is that, in contrast to the API approach, the native code will not try to spin up the entire framework for every request. This is expensive and to be avoided. Depending on how many products there are, you may need to break the script into multiple executions due to memory leaks when saving catalog products.
Don't tie the stock checks to a catalog page view. Some web crawler will come eat your lunch.
Finally, there's no easy FTP library built into Magento, but putting that on another cron job and using system calls to perform the actual (S)FTP transfer is possibly your easiest option.
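For illustration, here is a minimal sketch of that upload step as a standalone cron job, written in Python with the standard ftplib module; the host, credentials, and file paths are hypothetical placeholders.

# Minimal sketch of the order-CSV upload as its own cron job.
# Host, credentials, and file paths are hypothetical placeholders.
from ftplib import FTP

def upload_order_feed(csv_path):
    ftp = FTP("ftp.dropshipper.example.com")
    ftp.login(user="shopuser", passwd="secret")
    with open(csv_path, "rb") as feed:
        # STOR transfers the file to the server in binary mode.
        ftp.storbinary("STOR orders.csv", feed)
    ftp.quit()

if __name__ == "__main__":
    upload_order_feed("/var/exports/orders.csv")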
Hope that helps!
Thanks,
Joe
I think the answer to this question is simple: write it in what you know. The biggest reason is "UPGRADES": with Magento being as high-profile as it is, the possibility of being hacked on older versions increases every day. Therefore, when they release new versions, you are going to want to upgrade. With that in mind, are you going to want to merge all of your changes into each new version as it is released? Probably not. If there is a way to write this as a separate tool, that is what you should do.
PROS TO BUILDING OUTSIDE OF MAGENTO
No need to re-integrate your changes every time a new version of Magento is released.
Code is easier to maintain.
The tool is easier to write in something you are familiar with.
No learning curve.
Integration speed will be much quicker.
More flexibility, since you do not have to fit inside Magento's code limitations.

PyAMF backend choices!

I've been using PyAMF to write a backend for a Flex app that will request different groups of hundreds of different images depending on what the client needs. I have been using the "simple_server" WSGI server that PyAMF supplies while developing the Flex code. Now I'm ready to write a robust backend that will be able to pull images from a MySQL database and send them as quickly and as efficiently as possible to many concurrent clients.
The PyAMF documentation is great because they supply many examples to follow, however I am confused about what kind of backend I am trying to create.
Do I want a SocketServer or a WSGI server or something like Twisted or web2py or Tornado? Are these even all different? :) Should I be using Apache modules instead (mod_wsgi or modjy or mod_python)?
I realize that this probably touches on many open debates, so maybe you could just point me to any good summaries of these debates?
It's great to have so many options, but how do I choose?
The short answer is, of course, that it depends on the requirements of your project.
How many concurrent connections is "a lot"?
How much programmer time can you throw at the problem?
How much hardware can you throw at the problem?
...etc...
If you plan to have lots of concurrent clients, it's hard to beat Twisted in the Python world. However, you'll have to deal with your database asynchronously to avoid blocking, and depending on how complex your database interactions are, this can be a bit of a pain. You're basically limited to either using twisted.enterprise.adbapi or coming up with your own twisted-ORM integration.
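For a concrete sense of what the adbapi route looks like, here is a minimal sketch; the database name, credentials, and query are hypothetical placeholders.

# Minimal sketch of non-blocking MySQL access from Twisted via adbapi.
# Credentials and the query are hypothetical placeholders.
from twisted.enterprise import adbapi
from twisted.internet import reactor

# ConnectionPool runs the blocking DB-API calls in a thread pool.
dbpool = adbapi.ConnectionPool("MySQLdb", db="images", user="app", passwd="secret")

def got_rows(rows):
    for row in rows:
        print(row)
    reactor.stop()

# runQuery returns a Deferred that fires with the result rows.
d = dbpool.runQuery("SELECT id, path FROM images LIMIT %s", (10,))
d.addCallback(got_rows)

reactor.run()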
If you'd rather have "easy" database code (i.e. you want to use an ORM), you're better off going with a (TurboGears/Pylons/plain wsgi) project, probably hosted using Apache and mod_wsgi. This can be a pretty scalable solution, and you get a lot of stuff for free using these frameworks, but it may be more than you need.
I would avoid using one of the many plain Python WSGI servers out there (wsgiref, paster, etc.) in production if you really want high performance.
Good Luck!

Good database library/ORM for Cocoa development

I am developing a Cocoa application that will make heavy use of both web services and a standard DBMS (most likely MySQL), and I am wondering if anyone has a good option for a database library or ORM solution they have used. Core Data is not an option, due to the need to support a standard DBMS and to be able to modify the data outside of normal application operation.
I have found a number of possible options, ranging from new open-source libraries:
http://github.com/aptiva/activerecord/tree/master
to writing my own wrapper for the C MySQL API.
Any advice is welcome,
Thanks!
Paul
We faced a similar question when we first started work on Checkout; our solution was to code the entire app in Python, using PyObjC. Checkout 1 had an SQLite backend; Checkout 2 has a Postgres backend.
There are a couple of really mature and powerful ORMs on the Python side, such as SQLObject, which is pretty simple to work with (we used it for Checkout 1.0), and SQLAlchemy, which is more powerful but a bit harder to wrap your brain around (we used it for Checkout 2.0).
One approach you could evaluate is building the application in Objective-C, but writing the data model and database connectivity/administration code in Python. You can use PyObjC to create a plugin bundle from this code that you then load into your app. That's more or less the approach we took for Checkout Server, which uses a Foundation command-line tool to administer a Postgres server and the databases in it; this CLI tool in turn loads a Python plugin bundle that has all of the actual database code in it. End users mostly interact with the database through a System Preferences pane that has no clue what the database looks like, but instead uses the command-line tool to interact with it.
Loading a plugin is simple:
NSBundle *pluginBundle = [NSBundle bundleWithPath:pluginPath];
[pluginBundle load];
You will probably need to create .h files for the classes in your bundle that you want to have access to from your Obj-C code.
You might also want to check out the BaseTen framework. It is a Core Data-like framework (in fact, it can import Core Data models), but works with PostgreSQL (though not MySQL, as far as I know). It includes some very nice features, such as schema discovery at run time. It also includes an NSArrayController subclass that automatically handles locking and synchronizing across multiple users, so you can continue to make use of Apple's key-value bindings in your UI.
I have personal experience with this particular problem. I even started down the road of writing my own wrapper for the C MySQL API.
The eventual conclusion was: Don't!
The solution that worked in my case was to communicate with the MySQL server via PHP. If you are familiar with web services, chances are that you know about PHP, so I won't go into loads of detail about that.
To read from the database:
The Cocoa app sends a request for a URL on the server: http://theserver.com/app/get_values.php
The get_values.php script handles the database query and returns the data in XML format
The Cocoa app loads and parses the XML
To write to the database:
The Cocoa app sends a more complex request to the server: http://theserver.com/app/put_values.php?name="john doe"&age=21&address=...
The put_values.php script parses the input and writes to the database
The beauty of this solution is that PHP is great for working with MySQL, and Cocoa has some handy built-in classes for working with XML data.
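To make the round trip concrete, here is the same pattern sketched in Python rather than in PHP and Cocoa; the URLs come from the steps above, while the XML layout and field names are hypothetical.

# Sketch of the read/write round trip against the PHP scripts.
# The XML structure (<user><name/><age/></user>) is a hypothetical example.
import urllib
import urllib2
from xml.etree import ElementTree

def get_values():
    # Ask get_values.php for data; it replies with XML.
    response = urllib2.urlopen("http://theserver.com/app/get_values.php")
    tree = ElementTree.parse(response)
    return [(u.findtext("name"), u.findtext("age"))
            for u in tree.findall("user")]

def put_values(name, age):
    # Send new data as query parameters for put_values.php to store.
    query = urllib.urlencode({"name": name, "age": age})
    urllib2.urlopen("http://theserver.com/app/put_values.php?" + query)

On the Cocoa side, the equivalent pieces are NSURLConnection for the requests and NSXMLParser (or NSXMLDocument) for the parsing.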
edit: one more thing:
One of the key things you have to figure out with this approach is how much processing should be done on the server and how much in the app itself. Let Cocoa do the things that Cocoa is good at, and let PHP and MySQL do the things that they are good at.
You could write a generic PHP script to handle all queries: perform_query.php?querystring="SELECT * FROM .....", but that is hardly an optimal solution. Your best bet is several smaller PHP scripts that handle individual datasets for you. In my case, there was one to get the list of users, one to get the list of transactions, etc. Again, it all depends on what your app is going to do.
GDL2 is a nice example, based on EOF.
Instead of reinventing the wheel by writing your own communication wrapper to deal with MySQL from Cocoa, you could try the SMySQL framework (a.k.a. MCPKit); it was part of the CocoaMySQL application that evolved into the Sequel Pro project. It works with varying versions of MySQL and seems to be quite robust.
If you need to understand how to incorporate it into your application, there's not much documentation around, but it has an easy-to-understand interface, and you can see it working by looking at the source of Sequel Pro, which is downloadable from Google Code.
There is also the CocoaMySQL-SBG fork of the CocoaMySQL project, but that seems to be out of date and I couldn't get it to build properly.
I've also implemented a simple object persistence framework based on SQLite, but it certainly wasn't trivial to do. I agree with eJames' conclusion: don't implement one yourself if you don't have to.
If you aren't committed to programming in Objective-C, you might want to take a look at PyObjC, which would allow you to program the database portion in Python. You can use the MySQLdb module for DB access, and there are plenty of tutorials online for its use. It isn't hard to stuff the data back into Cocoa/CF classes and pass them back to your app.
The main caveat with PyObjC is that at the moment it doesn't work with Tiger.
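As a quick illustration of the MySQLdb side, here is a minimal sketch; the connection details, table, and query are hypothetical placeholders.

# Minimal MySQLdb sketch: fetch rows as plain Python tuples, which are
# easy to bridge into Cocoa collections via PyObjC.
# Credentials and the table are hypothetical placeholders.
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="app",
                       passwd="secret", db="store")
cursor = conn.cursor()
# Parameterized query; MySQLdb uses %s placeholders.
cursor.execute("SELECT name, price FROM products WHERE price < %s", (100,))
for name, price in cursor.fetchall():
    print(name, price)
cursor.close()
conn.close()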

Amazon SimpleDB

Has anyone considered using something along the lines of the Amazon SimpleDB data store as their backend database?
SQL Server hosting (at least in the UK) is expensive, so could something like this, along with cloud file storage (S3), be used for building apps that can grow with demand?
Great in theory, but would anyone consider using it? In fact, is anyone actually using it now for real production software? I would love to read your comments.
This is a good analysis of Amazon services from Dare.
S3 handles what I've typically heard described as "blob storage". A typical web application has media files and other resources (images, CSS stylesheets, scripts, video files, etc.) that are simply accessed by name/path. However, a lot of these resources also have metadata (e.g. a video file on YouTube has metadata about its rating, who uploaded it, number of views, etc.) which needs to be stored as well. This need for queryable, schematized storage is where SimpleDB comes in. EC2 provides a virtual server that can be used for computation, complete with a local file system instance which isn't persistent if the virtual server goes down for any reason. With SimpleDB and S3, you have the building blocks to build a large class of "Web 2.0"-style applications once you throw in the computational capabilities provided by EC2.
However neither S3 nor SimpleDB provides a solution for a developer who simply wants the typical LAMP or WISC developer experience of building a database driven Web application or for applications that may have custom storage needs that don't fit neatly into the buckets of blob storage or schematized storage. Without access to a persistent filesystem, developers on Amazon's cloud computing platform have had to come up with sophisticated solutions involving backing data up manually from EC2 to S3 to get the desired experience.
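As a sketch of how those building blocks combine, here is the blob-plus-metadata pattern using the boto library; the bucket, domain, and attribute names are hypothetical.

# Sketch: store the blob itself in S3 and its queryable metadata in
# SimpleDB, using boto. Bucket, domain, and names are hypothetical.
import boto

s3 = boto.connect_s3()
bucket = s3.create_bucket("myapp-videos")
key = bucket.new_key("videos/intro.mp4")
key.set_contents_from_filename("intro.mp4")  # the blob, fetched by name/path

sdb = boto.connect_sdb()
domain = sdb.create_domain("videos")
# The metadata lives in SimpleDB, where it can be queried.
domain.put_attributes("videos/intro.mp4",
                      {"uploader": "alice", "rating": "5", "views": "0"},
                      replace=True)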
I just finished writing a library to make porting an app to SimpleDB in Perl easy, Net::Amazon::SimpleDB::Simple, because I found the Amazon client libraries painful. The library isn't on CPAN yet, but it is at http://rjurneyopen.s3.amazonaws.com/SimpleDB/Simple.pm
The idea was to make it trivial to stuff hashes in and out of SimpleDB.
I just ported an app to use it. Overall I am impressed with SimpleDB... even inefficient queries take only 2-3 seconds to return. SimpleDB doesn't seem to care about the size of your table, owing to its Erlang/parallel nature. Table scans are easy for it.
The pain comes from the fact that you can't count, sum, or group by. If you plan on doing any of those things, then SimpleDB probably isn't for you. At the moment, in terms of functionality, it sits somewhere between memcached and MySQL. You can SELECT ... ORDER BY ... LIMIT, which is nice. It's also nice that you don't have to scale it yourself, and that it doesn't care how much you stuff into it. But more advanced operations like analytics are painful at best; you'll have to do your own calculations server-side. It's also a big plus that on any computer I can use the SimpleDB CLI (http://code.google.com/p/amazon-simpledb-cli/) to query my data.
There are some confusing gotchas. For instance, attributes can have more than one value, and you have to explicitly set 'replace' when storing items. Also, storing undef or a null string results in a library error, instead of deleting that attribute name/value pair or setting it to a null/empty string.
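To illustrate the multi-value/replace gotcha, here is a small sketch, again with boto; the domain, item, and attribute names are hypothetical.

# Without replace=True, SimpleDB appends a second value to the attribute
# instead of overwriting it. Domain, item, and names are hypothetical.
import boto

sdb = boto.connect_sdb()
domain = sdb.get_domain("users")

domain.put_attributes("user1", {"age": "21"})                # age = {21}
domain.put_attributes("user1", {"age": "22"})                # age = {21, 22}!
domain.put_attributes("user1", {"age": "22"}, replace=True)  # age = {22}

# SELECT ... ORDER BY ... LIMIT is supported; aggregation is not, so any
# counting or summing has to happen client-side.
for item in domain.select("select * from `users` where age > '20' "
                          "order by age limit 10"):
    print(item)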
Learning to think in a largely un-normalized way is a little strange too, which is why I would second the suggestion above that it is best for new applications. Porting a SQL app to SimpleDB would be painful, because your application logic would have to change; the way you do things is a bit different. The Amazon docs are pretty good at explaining this.
All of this can be abstracted away by a library that sits atop SimpleDB, so you will want to pick a good one... you probably don't want to deal with it directly. There is some work on the PHP side to make things easy, and there is my library. There is a Rails ActiveResource integration, but it doesn't seem to do much for you.
All in all, it's still early in the game, but compared to other APIs (Twitter comes to mind), I have to say that the SimpleDB REST API is pretty simple (especially considering that it is XML) and polite to work with. I would recommend it, depending on the requirements of your application and the economics of your use of it. If you're looking to rapidly scale a service that doesn't put a great load on the DB and don't want to bother with a scalable MySQL/memcached combo, then SimpleDB can offer a 'simple' solution for you.
I expect that its features will continue to grow and it will be a good choice for more and more applications that do more complex and interesting things. But right now it is targeted at and appropriate for your typical Web 2.0 service.
We are using SimpleDB almost exclusively for our new projects. The zero-maintenance, high-availability, no-install aspects are just too good. And for your Ruby developers, check out SimpleRecord, an ActiveRecord-like interface for SimpleDB which makes it super easy to use.
But do you really need SQL Server? Can't you live with PostgreSQL or MySQL? Both have proven to be ok for most tasks.
Now if you need SQL Server features then you're out of luck.
Another option is to rent a server. How expensive is expensive?
(I've used Amazon S3 to store images for an application, it's ok and works fine, at least for that)
I haven't used SimpleDB, but I have been using a combination of S3, EC2, and MySQL for our application.
If you are willing to consider SimpleDB, then you might as well consider MySQL (which is very scalable, and not that expensive).
On the S3 and EC2 side, they have been great in practice as well.
SimpleDB works great for many applications... if your project will require a lot of analytic reporting, joining, etc., you may want to consider MySQL or a hybrid model.
If you go with SimpleDB: we developed Radquery.com for our internal use and have opened it up to the public.