Is there any tool to monitor incoming requests to a Redshift database?

I've set up caching with the Predis client. I can't determine whether my application is hitting the database on every request or serving data from the Predis cache. Is there any way I can check incoming requests to the Redshift instance? Or test Predis?

Well, not sure if I understood the question, but if you are looking to see the most recent queries run on your cluster you can check the stv_recents view. Or, if you have access to the AWS console, you can open the "Queries" tab for your cluster for real-time query execution details.
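For example, here is a minimal sketch of polling stv_recents from Python with psycopg2 (the connection details are placeholders for your own cluster endpoint):

    import psycopg2  # Redshift speaks the PostgreSQL wire protocol

    # Placeholder connection details -- substitute your own cluster's values.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439,
        dbname="mydb",
        user="myuser",
        password="mypassword",
    )

    with conn.cursor() as cur:
        # stv_recents lists currently running and recently finished queries.
        cur.execute(
            "SELECT user_name, status, starttime, query "
            "FROM stv_recents ORDER BY starttime DESC LIMIT 20;"
        )
        for row in cur.fetchall():
            print(row)
    conn.close()

If the cache is working, repeated page loads should stop showing up as new queries here.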

Related

Database for live mobile tracking

I'm developing an app that allows tracking a mobile device instantly (live) ... I need some advice. The application must send the location to a web service, which in its turn records the received data in a database.
What would be, in your opinion, the best way to store the location values?
I'm new to big data and I'm afraid that simple SQL queries won't be able to handle the load properly ... I imagine that if there are a lot of users and each user sends a request every second, I'll have issues with the database ...
Any advice? Thank you very much.
I think you could have a look at the geospatial queries in Mongo, if you choose to go ahead with MongoDB.
The design of the database will depend on the nature of the queries (essentially the reads and writes), so that's worth having a look into.
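If you do go with MongoDB, here is a minimal sketch of a geospatial lookup with PyMongo (the database, collection, and field names are made up for illustration):

    from pymongo import MongoClient, GEOSPHERE

    client = MongoClient("mongodb://localhost:27017")  # hypothetical local instance
    locations = client.tracking.locations             # hypothetical db/collection names

    # A 2dsphere index lets Mongo answer "near this point" queries efficiently.
    locations.create_index([("loc", GEOSPHERE)])

    # Store each update as GeoJSON: coordinates are [longitude, latitude].
    locations.insert_one(
        {"user_id": 42, "loc": {"type": "Point", "coordinates": [2.3522, 48.8566]}}
    )

    # Find all updates within 1 km of a point.
    nearby = locations.find({
        "loc": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
                "$maxDistance": 1000,  # metres
            }
        }
    })
    for doc in nearby:
        print(doc)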
Working at Cintric, we landed on using Elasticsearch. We process billions of location points in real time and provide advanced analytics to our users.
We started with MongoDB and ran into a lot of trouble, eventually leading to a painful migration.
In our current stack, mobile devices dump location updates into AWS Kinesis, which are then processed by AWS Lambda handlers and dumped into Elasticsearch. We're able to serve, process and store 300 million requests/month for only a few hundred dollars/month. Analytics for our dashboard add additional cost, but for your needs I would highly recommend checking out your options on AWS.
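For reference, the device-to-Kinesis leg of a pipeline like that might look like the following with boto3 (the stream name and payload fields are assumptions for illustration, not Cintric's actual schema):

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    def send_location(user_id, lat, lon):
        """Push one location update into the stream for Lambda consumers to process."""
        kinesis.put_record(
            StreamName="location-updates",  # hypothetical stream name
            Data=json.dumps({"user_id": user_id, "lat": lat, "lon": lon}),
            PartitionKey=str(user_id),      # keeps one user's points on one shard, in order
        )

    send_location(42, 48.8566, 2.3522)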

Running multiple Kettle transformations on a single JVM

We want to use pan.sh to execute multiple Kettle transformations. After exploring the script I found that it internally calls the spoon.sh script, which runs in PDI. Now the problem is that every time a new transformation starts, it creates a separate JVM for its execution (invoked via a .bat file); however, I want to group them into a single JVM to overcome the memory constraints that multiple JVMs put on the batch server.
Could somebody guide me on how I can achieve this, or share documentation/resources with me?
Thanks for the good work.
Use Carte. This is exactly what it is for. You can start up a server (on the local box if you like) and then submit your jobs to it. One JVM, one heap, shared resources.
The benefit of that is scalability: when your box becomes too busy, just add another one, also running Carte, and start sending some of the jobs to that other server.
There's an old but still current blog post here:
http://diethardsteiner.blogspot.co.uk/2011/01/pentaho-data-integration-remote.html
as well as documentation on the Pentaho website.
Starting the server is as simple as:
carte.sh <hostname> <port>
There is also a status page, which you can use to query your Carte servers, so if you have a cluster of servers, you can pick a quiet one to send your job to.
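As a sketch, that status page can also be polled over HTTP; this assumes Carte's default credentials (cluster/cluster) and a server listening on port 8080:

    import requests  # third-party HTTP client

    # Carte's status servlet returns XML describing running and finished jobs.
    resp = requests.get(
        "http://localhost:8080/kettle/status/",  # assumed host and port
        params={"xml": "Y"},
        auth=("cluster", "cluster"),             # Carte's default credentials
    )
    resp.raise_for_status()
    print(resp.text)

Parsing that XML would let a scheduler pick the least busy server automatically.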

Auto-scaling with Amazon EC2 when SQL is involved

I'm building a whiteboard web app with self-contained "rooms" of clients that runs off Amazon EC2 instances (a single one for now). Commands are sent via websockets to a PHP server, which stores all commands in a SQL database.
Up until now I was using Google Cloud SQL. My plan was to learn how to scale with EC2 and have all instances use the same remote database. I've learned this won't work due to the 200 ms write latency of a remote SQL server vs. the 0.5 ms write latency of a local SQL server. The server makes a write every time a command arrives.
I'm new to scalability and distributed systems. My intuition tells me I either need to use Amazon RDS and hope for millisecond latencies if my EC2 and RDS instances are in the same region, or work with SQL locally on EC2 instances. I'm leaning toward the latter. Here's my issue: EC2 is elastic. What happens when I need to get rid of an instance?
All I can think of right now is somehow replicating the SQL data from each EC2 instance to a master instance (maybe even Google Cloud SQL!). In other words, all reads/writes for each "room" happen locally, and are eventually replicated to the master server for long-term storage. If a "room" is re-opened a week later, a different EC2 instance can grab data from the master server, work with it locally, and replicate changes back before being destroyed.
Does my approach sound correct? Is replication the right concept here? If so, how much support for what I'm trying to do already exists? That is, do I need to set up a master server that manages EC2 instances and distributes/collects the SQL data manually (a 100% custom implementation), or are there existing libraries/mechanisms for SQL replication and maybe even EC2 instance management? And if my approach is wrong, what are some better approaches? This is one of those times where I don't know what to research on my own. Thanks!
I'd agree with user02525: perhaps look at using ElastiCache Redis; it sounds more in line with what you're doing.
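As a rough sketch of that idea with redis-py (the key names are made up), every EC2 instance talks to the same low-latency store, so there is no per-instance data to replicate when an instance is destroyed:

    import json
    import redis

    # An ElastiCache endpoint looks like any other Redis host to the client.
    r = redis.Redis(host="my-cache.abc123.cache.amazonaws.com", port=6379)  # hypothetical endpoint

    def record_command(room_id, command):
        """Append a whiteboard command to the room's ordered log."""
        r.rpush(f"room:{room_id}:commands", json.dumps(command))

    def replay_room(room_id):
        """Fetch the full command history to redraw the board for a joining client."""
        return [json.loads(c) for c in r.lrange(f"room:{room_id}:commands", 0, -1)]

    record_command("abc", {"op": "draw", "x": 10, "y": 20})
    print(replay_room("abc"))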

Is Redis data volatile?

I am trying to figure out something and I've been searching for a while with no results.
What happens if a Redis server loses power or gets shut down or something that would wipe the RAM? Does it keep a backup somewhere?
I want to use Redis for a SaaS-style app, so if I go to app.com/usernamesapp it would use Redis to verify that usernamesapp exists and get its ID... at which point it would use MySQL for all the rest of the stuff. The reason is that I want to begin showing the page ASAP, and since most of the stuff is JavaScript, all the MySQL work can happen after the fact.
Thanks
Redis can be configured to write to disk at regular intervals, so if the server fails you won't lose your data.
http://redis.io/topics/persistence
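As a minimal sketch, you can inspect and adjust those persistence settings at runtime from a client, here with redis-py (the values shown are assumptions for illustration, not recommendations):

    import redis

    r = redis.Redis(host="localhost", port=6379)  # assumed local server

    # Inspect the current snapshot schedule (the "save" directive from redis.conf).
    print(r.config_get("save"))

    # Snapshot to disk if at least 1 key changed in the last 60 seconds...
    r.config_set("save", "60 1")

    # ...or enable the append-only file for much stronger durability.
    r.config_set("appendonly", "yes")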
From the Redis FAQ
Redis is an in-memory but persistent on disk database
So a critical failure should not result in data loss. Read more at http://redis.io/topics/faq

What technologies/tools do people use to implement live websites?

I have the following situation:
-I have a server A hooked up to a piece of hardware that sends out values and information every second. Programs on the server machine can read these values. Server A is in a very remote location, so the Internet connection is very slow and unreliable, but the connection does exist. Let's say it's a weather station in the Arctic.
-Users from the home location want to monitor the weather values somehow. The users could use a remote desktop connection to server A, but that would be far too slow.
My idea is to have a website on a web server (let's call the web server B; B is in the home location), have server A connect to server B and send it values, and have the web application read those values and display them... but how do I build such a system? I know I could use MySQL: have server A connect to a SQL server on server B and send INSERT queries, and have the web application running on server B constantly read from the SQL server. But I think that would be way too slow, and there has to be a better solution.
Any ideas?
BTW, the users should also be able to send information to the weather station from the website (so an ADMIN user should be allowed to shut down the weather station from the website, or whatever).
Best regards,
MadSeb
Ganglia (http://ganglia.sourceforge.net/) is a popular monitoring tool that supports collecting arbitrary statistics using the gmetric tool. You might be able to build something around that.
If you do need to roll your own solution then you could have a persistent message queue at A (I'm a fan of RabbitMQ) to which you could log your metrics. You could then have something at B which listens for messages on the queue and saves the state at B.
This approach means you don't lose data when the connection drops.
The messages might be simple compressed data values, say CSV or JSON, so they should be fine on low-bandwidth connections.
All the work (parsing the CSV or JSON and saving the data to a database, for example) is done at B, where you don't have those limitations.
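Here is a minimal sketch of that pattern with pika (the broker host, queue name, and payload fields are illustrative): A publishes durable JSON readings, and B consumes them and saves the state.

    import json
    import pika

    # --- At A: publish each reading to a durable queue on the local broker ---
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))  # assumed broker at A
    ch = conn.channel()
    ch.queue_declare(queue="weather", durable=True)  # queue survives broker restarts
    ch.basic_publish(
        exchange="",
        routing_key="weather",
        body=json.dumps({"temp_c": -21.5, "wind_ms": 7.2}),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message to disk
    )
    conn.close()

    # --- At B: consume messages from A's queue and save them ---
    def on_message(ch, method, properties, body):
        reading = json.loads(body)
        save_to_database(reading)  # hypothetical storage function at B
        ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after a successful save

    # B would run something like:
    #   ch.basic_consume(queue="weather", on_message_callback=on_message)
    #   ch.start_consuming()

Acknowledging only after a successful save means a dropped connection never loses a reading; it just gets redelivered.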