I am trying to create a dashboard app for my company that displays data from a few different sources that they use. I am starting with an in house system that stores data in MSSQL. I'm struggling to decide how I can display real time (or at least updated regularly) data based on this database.
I was thinking of writing a node server to poll the company database and check for updates and then store a copy of the relevant tables in my own database. Then creating another node server that computes metrics (average delivery time, Turnover, etc.) from my database and then a frontend (probably react) to display these metrics nicely and trigger the logic in the backend whenever the page is loaded by a user.
This is my first project so just need some guidance on whether this is the right way to go about this or if I'm over complicating it.
Thanks
One of solutions is to implement a CRON job in nodejs or in you frontEnd side, then you can retrieve new Data inserted to your Database.
You can reffer to this link for more information about the CROB job :
https://www.npmjs.com/package/cron
if you are using MySQL, you can use the mysql-events listner, it is a MySQL database and runs callbacks on matched events.
https://www.npmjs.com/package/mysql-events
I'm trying to get smarter about caching and have certain users whose entities are getting into 10s of megabytes. These entities are loaded every minute and most of the time they do not change (ie; they change a few times per day)
In order to avoid network roundtrips, I'd love to cache these entities on Azure worker role instances.
Is there a way to get at the timestamp of an entity without loading the whole entity and making it travel over the wire?
Alternatively, is there a way to /realiably/ and without additional license cost to synchronize RavenDB locally onto an Azure worker role instance and keep it updated with changes from the master?
You can use the Head function in the DatabaseCommands:
var metadata = _documentStore.DatabaseCommands.Head("customers/1");
var lastModified = metadata.LastModified;
More information: http://ravendb.net/docs/article-page/3.0/csharp/client-api/commands/documents/how-to/get-document-metadata-only
Hope this helps!
Is there any way to measure the time taken for data from Database using WCF Data service. So that we can log it for our analysis purpose?
Any pointers or suggestions on this will be helpful.
Thanks in advance!
We have data stored in a data warehouse as follows:
Price
Date
Product Name (varchar(25))
We currently only have four products. That changes very infrequently (on average once every 10 years). Once every business day, four new data points are added representing the day's price for each product.
On the website, a user can request this information by entering a date range and selecting one or more products names. Analytics shows that the feature is not heavily used (about 10 users requests per week).
It was suggested that the data warehouse should daily push (SFTP) a CSV file containing all data (currently 6718 rows of this data and growing by four each day) to the web server. Then, the web server would read data from the file and display that data whenever a user made a request.
Usually, the push would only be once a day, but more than one push could be possible to communicate (infrequent) price corrections. Even in the price correction scenario, all data would be delivered in the file. What are problems with this approach?
Would it be better to have the web server make a request to the data warehouse per user request? Or does this have issues such as a greater chance for network errors or performance issues?
Would it be better to have the web server make a request to the data warehouse per user request?
Yes it would. You have very little data, so there is no need to try and 'cache' this in some way. (Apart from the fact that CSV might not be the best way to do this).
There is nothing stopping you from doing these requests from the webserver to the database server. With as little information as this you will not find performance an issue, but even if it would be when everything grows, there is a lot to be gained on the database-side (indexes etc) that will help you survive the next 100 years in this fashion.
The amount of requests from your users (also extremely small) does not need any special treatment, so again, direct query would be the best.
Or does this have issues such as a greater chance for network errors or performance issues?
Well, it might, but that would not justify your CSV method. Examples and why you need not worry, could be
the connection with the databaseserver is down.
This is an issue for both methods, but with only one connection per day the change of a 1-in-10000 failures might seem to be better for once-a-day methods. But these issues should not come up very often, and if they do, you should be able to handle them. (retry request, give a message to user). This is what enourmous amounts of websites do, so trust me if I say that this will not be an issue. Also, think of what it would mean if your daily update failed? That would present a bigger problem!
Performance issues
as said, this is due to the amount of data and requests, not a problem. And even if it becomes one, this is a problem you should be able to catch at a different level. Use a caching system (non CSV) on the database server. Use a caching system on the webserver. Fix your indexes to stop performance from being a problem.
BUT:
It is far from strange to want your data-warehouse separated from your web system. If this is a requirement, and it surely could be, the best thing you can do is re-create your warehouse-database (the one I just defended as being good enough to query directly) on another machine. You might get good results by doing a master-slave system
your datawarehouse is a master-database: it sends all changes to the slave but is inexcessible otherwise
your 2nd database (on your webserver even) gets all updates from the master, and is read-only. you can only query it for data
your webserver cannot connect to the datawarehouse, but can connect to your slave to read information. Even if there was an injection hack, it doesn't matter, as it is read-only.
Now you don't have a single moment where you update the queried database (the master-slave replication will keep it updated always), but no chance that the queries from the webserver put your warehouse in danger. profit!
I don't really see how SQL injection could be a real concern. I assume you have some calendar type field that the user fills in to get data out. If this is the only form just ensure that the only field that is in it is a date then something like DROP TABLE isn't possible. As for getting access to the database, that is another issue. However, a separate file with just the connection function should do fine in most cases so that a user can't, say open your webpage in an HTML viewer and see your database connection string.
As for the CSV, I would have to say querying a database per user, especially if it's only used ~10 times weekly would be much more efficient than the CSV. I just equate the CSV as overkill because again you only have ~10 users attempting to get some information, to export an updated CSV every day would be too much for such little pay off.
EDIT:
Also if an attack is a big concern, which that really depends on the nature of the business, the data being stored, and the visitors you receive, you could always create a backup as another option. I don't really see a reason for this as your question is currently stated, but it is a possibility that even with the best security an attack could happen. That mainly just depends on if the attackers want the information you have.
I need to synchronize between two data sources:
I have a web service running on then net. It continuously gathers data from the net and stores it on the database. It also provides the data to the client based on the client's request. I want to keep a repository of data as object for faster service.
On the client side, there is a windows service that calls the web service mentioned previously and synchronize its local database to the server.
Few of my restrictions:
The web service has very small buffer limit and it can only transfer less then 200 records per call which is not enough for data collected in a day.
I also can't copy the database files since the database structure is very different (sql and other is access)
The data is being updated on a hourly basis and there will be large amount of data that will be needed to be transfer.
Sync by date or other group is not possible with the size limitation. Paging can be done but the remote repository keeps changing (and I don't know how to take chunk of data from the middle of table of SQL database)
How do I use the repository for recent data update/or full database in sync with this limitation?
A better approach for the problem or an improvement of the current approach will be taken as the right answer
You mentioned that syncing by date or by group wouldn't work because the number of records would be too big, but what about syncing by date (or group or whatever) and then paging by that? The benefit is that you will have a defined batch of records and you can now page over that because that group won't change.
For example, if you need to pull data off hourly, as each hour elapses (so, when it goes from 8:59am to 9:00 am), you begin pulling down the data that was added between 8am and 9am in chunks of 200 or whatever size the service can handle.