I am trying to update a legacy system's SQL solution to use the cloud.
The solution today involves a Windows SQL Server installed on-site at the customer; various machines are then configured to connect to that IP address / port / server name. When they connect, the machines set up any tables that are missing and regularly send their data. Data rates are low for an individual machine: roughly one write request every 10 seconds (it varies a lot), with no more than 2-3 KB of data per write request.
Moving this to the cloud is tricky, mostly because the machines do not have unique identifiers. The good news is that each legacy machine is connected to an IoT gateway (think RPi) that knows a unique machineId. The IoT gateway is a full-fledged computer, but not a very powerful one, and its disk is an SD card.
(Diagram: new and old network layout)
So far I have had a few things fall on their face.
1) Making the machines think the DB's IP/port is that of the IoT gateway, setting up an Express server on the IOTG to listen, then injecting the unique ID into the queries before proxying them up to the cloud. I may have had a bug, but for some reason I couldn't even see the requests coming in on the port (see the raw TCP sketch after this list for where I'm heading instead). Even if I could, I'd still have to figure out how to decode them. Shouldn't I at least be able to see these requests coming in?
2) Looking into SQLite. The idea was to have SQLite listen on the port as an actual DB, then have a process on the IOTG query data out of SQLite, append a unique ID, and send it to the cloud. Unfortunately, SQLite does not listen on a port.
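In case it helps with point 1: the direction I'm leaning now is to swap Express for a raw TCP listener, just to prove the machines' traffic actually reaches the gateway. A minimal sketch of what I mean (port 1433 is an assumption; it doesn't speak TDS, and the machine's connection will presumably fail since nothing answers the handshake, but at least the bytes would become visible):

```typescript
// Sketch only: a raw TCP listener on the IoT gateway, standing in for the
// database port, purely to confirm the machines' traffic is reaching the
// gateway. It does not speak TDS or forward anything; port 1433 is assumed.
import * as net from "net";

const LISTEN_PORT = 1433; // the port the machines are configured to use

const server = net.createServer((socket) => {
  console.log(`connection from ${socket.remoteAddress}:${socket.remotePort}`);

  socket.on("data", (chunk) => {
    // TDS is binary, so just log the size and the first few bytes for now.
    console.log(`received ${chunk.length} bytes:`, chunk.subarray(0, 8));
  });

  socket.on("error", (err) => console.error("socket error:", err.message));
  socket.on("close", () => console.log("connection closed"));
});

server.listen(LISTEN_PORT, () =>
  console.log(`listening on ${LISTEN_PORT} for incoming SQL traffic`)
);
```

With something like this I'd at least know whether the machines reach the gateway at all before worrying about how to inject the machineId into the TDS stream.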
I am starting to look at just installing a whole SQL Server instance on the device, but I'd really like to avoid that. I'm pretty sure it's fairly large, and heavy writing to disk is not advisable for a small embedded system like the one I'm running.
Generally my questions boil down to:
1) Should I be able to see SQL queries in an Express server?
2) Should I be using a different technology? I failed to find a more SQL-specific proxy.
3) Am I correct to think that the SQLite path is dead? Even if I could find a way to attach it to a port, there still wouldn't be any sort of response from SQLite when the clients try to make a connection.
4) Am I wrong to fear the local server? Diving into some documentation for making Express work with DBs gets me to https://www.microsoft.com/en-us/sql-server/developer-get-started/node/ubuntu/ which suggests 4 GB of memory; we're working with 0.5 GB.
Any other thoughts on how to approach this would be great.
I was assigned to update an existing system that gathers data coming from points of sale and inserts it into a central database. The current one is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low-quality 2G/3G modems), some of the files arrive broken. With just a few shops connected this way everything worked smoothly, but as the number of shops grew, errors became more frequent. What is worse, the time needed to insert the data into the central database is now up to 12-14 hours (including waiting for the data to be downloaded from all of the shops), and that cannot happen during the working day as it would block the creation of sales reports and other activities on the database - so we are really tight on processing time here.
The idea my manager suggested is to send the data continuously during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, the central server would contain up-to-date (almost real-time) data, and the night could be used for long-running database activities like creating backups, rebuilding indexes, etc.
After going through many websites, I found that:
using an ASMX web service is now obsolete and WCF should be used instead
WCF with MSMQ or System.Messaging could be used to transmit data safely, where I don't have to care as much about acknowledging delivery of data, consistency, nodes going offline, etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is the better option
there are also other technologies for implementing a message queue, like RabbitMQ, ZeroMQ, etc.
And that is where I got confused. With so many options, do you have any pros and cons for these technologies?
We have been using .NET with Windows Forms and SQL Server, but if necessary we could change to something more suitable. I am also a bit worried about server efficiency. After some calculations, the server would be receiving about 15 packages of data per second at peak. Is that a lot? I know there are many websites without serious server infrastructure that handle hundreds of visitors online and still run smoothly, but a website mainly uploads data to the client, whereas here we would be downloading it from the client.
I also found a somewhat similar SO question, Middleware to build data-gathering and monitoring for a distributed system, where DDS was mentioned. What do you think about introducing some middleware servers that would cope with the low-quality links to the points of sale, so the main server is not clogged with 1 KB/s transmissions?
I'd be grateful for any help. Thank you in advance!
RabbitMQ can easily cope with thousands of 1 KB messages per second.
Since your use case is not about processing real-time data, I'd say you should combine a few messages and send them as a batch. That would be enough to spread the load over the day.
Because the motivation here is not to process the data in real time, any transport layer would do the job - even FTP/SFTP. RabbitMQ will work fine here, but it's not the typical use case for it.
Since you mentioned that one of your concerns is the slow/unreliable network, I'd suggest compressing the files before sending them and, on the receiving end, verifying their integrity immediately. rsync or something similar will probably do a great job of that.
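To make the batching idea a bit more concrete, here is a rough sketch of a shop-side publisher (written with Node and amqplib only because it keeps the sketch short - the official RabbitMQ .NET client follows the same pattern; the queue name, AMQP URL and record shape below are invented):

```typescript
// Rough sketch only: buffer sale records at the shop and publish them to
// RabbitMQ in small batches. Queue name, URL and record shape are examples.
import * as amqp from "amqplib";

interface SaleRecord { shopId: string; ts: string; amount: number; }

const QUEUE = "pos-sales";      // illustrative name
const BATCH_SIZE = 50;          // tune to your traffic
let buffer: SaleRecord[] = [];

async function main() {
  const conn = await amqp.connect("amqp://central-server"); // placeholder
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });           // survive restarts

  // Send the current batch as one persistent message.
  function flush(channel: amqp.Channel) {
    if (buffer.length === 0) return;
    const body = Buffer.from(JSON.stringify(buffer));
    channel.sendToQueue(QUEUE, body, { persistent: true });
    buffer = [];
  }

  // Called by the POS code whenever a sale happens.
  function record(sale: SaleRecord) {
    buffer.push(sale);
    if (buffer.length >= BATCH_SIZE) flush(ch);
  }

  setInterval(() => flush(ch), 60_000);  // also flush at least once a minute

  // Example usage:
  record({ shopId: "shop-42", ts: new Date().toISOString(), amount: 19.99 });
}

main().catch(console.error);
```

Batching like this keeps the per-message overhead down on the slow modem links while still getting data to the central server within a minute or so.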
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
It's not clear what is causing the database contention/performance issues beyond a vague reference to high volumes, so this answer will be geared mostly towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via store-and-forward messaging semantics, comes out of the box, and requires very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on two things:
A reliable network protocol, and
A client service running on both the sending and receiving machines.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you might well struggle with 2 - MSMQ only ships with Windows Server or the business/enterprise versions of desktop Windows. (*see below...)
As a possible solution to the network reliability problem, you could use WCF or a RESTful endpoint (using Nancy or WebApi) to expose one or more service operations over HTTP, which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you make the right choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over HTTP; however, it's very config-heavy and generally not a nice framework to work with.
REST is much simpler than WCF in .NET, very lightweight and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to the POST that allows the client to send data) to be called within a reasonable time frame to verify the data was committed. The client would have to implement some kind of retry semantics if the result of the GET "acknowledgement" was negative.
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
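To sketch the shape of that POST + acknowledgement-GET pattern (shown with Node/Express purely for brevity, since the final stack is still open; the route names, batch id and in-memory store are all invented, and a real implementation would back this with the database or a queue):

```typescript
// Sketch of the POST + "acknowledgement GET" pattern. Express used only
// for brevity; batch ids, routes and the in-memory store are illustrative.
import express from "express";

const app = express();
app.use(express.json());

// In a real system this would be the database / queue, not a Map.
const committedBatches = new Map<string, boolean>();

// Client POSTs a batch of records along with an id it generated itself,
// so the same batch can be retried safely (idempotent on batchId).
app.post("/sales/:batchId", (req, res) => {
  const { batchId } = req.params;
  const records = req.body;          // validate in real code
  // ... write `records` to the database or a local queue here ...
  committedBatches.set(batchId, true);
  res.status(202).end();
});

// Client calls this (and retries the POST if the answer is "false")
// to verify the batch really was committed.
app.get("/sales/:batchId/status", (req, res) => {
  const committed = committedBatches.get(req.params.batchId) === true;
  res.json({ committed });
});

app.listen(8080);
```

The important part is that the client keeps the batch locally until the status call comes back positive, and only then deletes it.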
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it wouldn't be the thing that addresses the transmission reliability issue. However, it could still be used to address another of your problems: database write contention. If you queued incoming requests once they arrived at the server, they could be processed by an "offline" process, which could then perform the required database operations in a reliable manner. This could be done using MSMQ transactional queues.
In response to comments:
"99% of messages are passed from shop to main server, but if some change is needed (price correction, discounts, etc.), that data has to be sent to the shop."
This changes things somewhat. Had I understood from the beginning that you had a bidirectional requirement, and seeing as you have managed to establish MSMQ communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason is that you appear to have both a one-way requirement and a publish-subscribe requirement, which NServiceBus supports really nicely.
First, let me say I am only a novice programmer and by no means an SQL guru. We have an app at work that is and has been under heavy development by the vendor for some time (2+ years). It runs as an MSSQL instance on one of our servers, and there is a client install for the desktops. The client software makes direct SQL calls to the database (it also has a local MySQL instance to handle the client settings). There are 6-12 ports that had to be opened up for the communication. Looking at the SQL manager, I can see direct SQL calls coming in from various clients.
This seems to me to be entirely the wrong approach. The closest thing I have done to this was a web page + PHP + MySQL: the web page would make requests, all the processing would happen server-side, and it would simply display the results. I think the sluggishness my users feel comes from the client-side requests plus the processing of the SQL data.
PS: I realize that if they haven't done it by now, switching to another paradigm seems out of the question. I just want to know if I am way off base.
You are way off base.
The client side has much more processing power.
Consider the case of one server and 5 clients. Even if the server has 3 times the power of a client, the clients as a whole are still 5:3 more powerful.
If the application is sluggish, it was probably poorly written; you need to investigate the root cause. Client/server is a leading design practice, so I'm guessing it is not the root cause. It might be badly implemented, or there might be other reasons. Your comment about having a local MySQL sounds very fishy to me - there should be no need for that.
I need to access a Sybase database (12.5) from overseas. The high latency is definitely a problem.
I have already optimized the connection parameters to make better use of the network and achieved a 20x performance increase, but it's still not enough: 1 minute to get 3 MB of data.
We need another 10x or 20x increase for our application.
Technical data:
the data flows through a single TCP connection using the TDS protocol
the client app is an Excel sheet with macros, using the default Sybase driver
the corporate environment makes it difficult to push big changes into the 10+ year-old architecture, so solutions need to be as unintrusive as possible. But some changes may be negotiable given the importance of this project.
Can anyone give me pointers?
I already thought of:
splitting SQL requests over several concurrent connections to the database. The problem is data consistency: what if records are modified in the meantime, since the requests will not be executed at exactly the same time? Is there an existing mechanism to spread a request over several calls on different connections?
using some kind of database "cache" or "local replication" overseas, but I don't know what is possible.
Thanks.
Try installing a local database (ASE or ASA) and synchronizing it with the central Sybase database using Sybase MobiLink (or Sybase Replication Server if you need low replication latency and have a lot of money).
(I know I am answering my own question.)
Eventually, we settled on designing our own remote database access protocol. It's not complicated, since we are only using a basic subset of SQL (SELECT and UPDATE), and the protocol doesn't have to understand SQL anyway.
By using our own protocol, we'll be able to use compression, let the client use several TCP links at the same time, maximize network utilisation, and add some functional caching specific to our application.
The client will be our app and the server will be a "proxy" to the real database, sitting next to it (as @Tim suggested in the comments).
It's not the only solution, but we feel it's a good balance between the enormous cost of replication, development complexity, and the expected benefits.
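For anyone curious about the general shape of the proxy piece, here is a heavily simplified sketch of the idea (not our actual implementation; the ports, the naive length-prefixed framing and the gzip choice are illustrative only). It sits next to the database and compresses responses before they cross the high-latency link:

```typescript
// Highly simplified sketch of the "proxy next to the database" idea:
// accept requests from the overseas client, forward them to the local
// database service, and gzip the response before sending it back.
// Ports, hosts and the length-prefixed framing are illustrative only.
import * as net from "net";
import { gzipSync } from "zlib";

const LISTEN_PORT = 9000;        // overseas clients connect here
const DB_HOST = "127.0.0.1";     // the real database sits next door
const DB_PORT = 5000;

net.createServer((client) => {
  const db = net.connect(DB_PORT, DB_HOST);

  // Requests are small, so pass them through untouched.
  client.on("data", (chunk) => db.write(chunk));

  // Responses are big, so compress each chunk and length-prefix it so the
  // client knows where one compressed frame ends and the next begins.
  db.on("data", (chunk) => {
    const compressed = gzipSync(chunk);
    const header = Buffer.alloc(4);
    header.writeUInt32BE(compressed.length, 0);
    client.write(Buffer.concat([header, compressed]));
  });

  client.on("error", () => db.destroy());
  db.on("error", () => client.destroy());
  client.on("close", () => db.end());
  db.on("close", () => client.end());
}).listen(LISTEN_PORT);
```

The real win comes from the client side opening several such links in parallel and caching results, but even compression alone makes a noticeable difference over a slow WAN.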
I have the following situation:
- I have a server A hooked up to a piece of hardware that sends out values and information every second. Programs on the server machine can read these values. Server A is in a very remote location, so the Internet connection is very slow and unreliable, but the connection does exist. Let's say it's a weather station in the Arctic.
- Users at the home location want to monitor the weather values somehow. The users could use a remote desktop connection to server A, but that would be far too slow.
My idea is to have a website on a web server (let's call the web server B; B is at the home location), make server A connect to server B and somehow send the values, and have the web application read the values and display them... but how do I build such a system? I know I could use MySQL, have server A connect to a SQL server on server B and send INSERT queries, and have the web application running on server B constantly read from the SQL server, but I think that would be way too slow, and I think there has to be a better solution.
Any ideas?
BTW, the users should also be able to send information to the weather station from the website (so an admin user should be allowed to shut down the weather station from the website, or similar).
Best regards,
MadSeb
Ganglia (http://ganglia.sourceforge.net/) is a popular monitoring tool that supports collecting arbitrary statistics using the gmetric tool. You might be able to build something around that.
If you do need to roll your own solution, you could have a persistent message queue at A (I'm a fan of RabbitMQ) to which you log your metrics. You could then have something at B that listens for messages on the queue and saves the state at B.
This approach means you don't lose data when the connection drops.
The messages could be simple compressed data values, say CSV or JSON, so they should be fine on low-bandwidth connections.
All the work (parsing the CSV or JSON and saving the data to a database, for example) is done at B, where you don't have those limitations.
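As a rough sketch of how the two ends could look (Node and amqplib chosen purely for illustration; the queue name, broker addresses, payload shape and saveReading() are placeholders), A publishes small gzipped JSON readings to a durable queue and B acknowledges each one only after it has been stored:

```typescript
// Sketch only: one process at the station (A) publishing compressed JSON
// readings to a local RabbitMQ, and one process at home (B) consuming them.
// Queue name, broker URLs, the Reading shape and saveReading() are placeholders.
import * as amqp from "amqplib";
import { gzipSync, gunzipSync } from "zlib";

interface Reading { sensor: string; ts: string; value: number; }

const QUEUE = "weather-readings";

// --- At A: publish each reading as a small gzipped JSON message ----------
// (in practice you would keep the connection open between readings)
// e.g. publishAtA({ sensor: "temp", ts: new Date().toISOString(), value: -31.5 });
async function publishAtA(reading: Reading) {
  const conn = await amqp.connect("amqp://localhost");   // broker runs at A
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });        // survives restarts
  ch.sendToQueue(QUEUE, gzipSync(JSON.stringify(reading)), { persistent: true });
  await ch.close();
  await conn.close();
}

// --- At B: consume, decompress, parse and store --------------------------
async function saveReading(r: Reading): Promise<void> {
  console.log("saved", r);   // e.g. INSERT into your database here
}

async function consumeAtB() {
  const conn = await amqp.connect("amqp://station-a");   // placeholder address
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });

  await ch.consume(QUEUE, async (msg) => {
    if (!msg) return;
    try {
      const reading: Reading = JSON.parse(gunzipSync(msg.content).toString());
      await saveReading(reading);
      ch.ack(msg);                    // ack only once it is safely stored
    } catch (err) {
      console.error("unreadable message, dropping", err);
      ch.nack(msg, false, false);     // a broken payload will never parse
    }
  });
}

consumeAtB().catch(console.error);
```

Because the queue at A is durable and messages are persistent, readings simply accumulate while the link is down and drain when it comes back, which is exactly the behaviour you want on an unreliable connection.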