Storing vast amounts of "uptime" data for a website monitoring service - sql

this is more of a general discussion rather than a code question.
I have a website monitoring platform whereby users of the system can input their website URL and we'll check it every X minutes based on the customer's interval, at each interval, an entry is stored as a UptimeCheck model in the Laravel 8 project with the status being down or up.
If a customer has 20 monitors, and each checks every minute, then over a 30 day period for the one customer they'd accumulate over 1 million rows.
My query, is really do I need to keep this number of rows?
The reason this number of rows is kept is so that we can present a graph showing the average website uptime.
My thinking is that if I created some kind of SVG programatically for each day and store this in the table then I wouldn't need to store as many entries, but my concern here is how would I merge SVG models into one to present a daily graph?
What kind of libraries could I use and how else might I approach this?

Unlike performance, the trick for storing uptime data is simple. You don't store it. ;)
You need to store DOWNTIME data instead. Register only unavailability events and extrapolate uptime when displaying reports.

Related

Logging visits performance wise

So I'm developping a website in php, with the framework symfony (not like it matters for the question though).
My website has some kind of articles, pages that will be created.
So I'd like to have counts of visits by day, week, etc... not only for my personal stats but to display on the homepage any article of the day, or something like that.
The way I would do it is : each time someone visit an article, it insert a record in a visit_log table, with the date and the id of the table.
An ON DUPLICATE (or equivalent) would be interesting to perform an update of the count per day instead of an update.
That's pretty simple and working but I can't help but wondering : is it the right way to do ? I'm thinking mostly of performance but of the "well thought" as well. I know the table can get big over time but I guess a cron to clean it regularly would do the trick.
Any thought ? I guess it's a super common thing that people thought of
Are you facing performance issues with the current implementation?
If you're, then one approach would be: instead of inserting each view directly into the database as it takes place, use an in-memory store to keep a list of pages and their view count {'Id': 'count'} for some interval (e.g. 5-10 mins) or till the store reaches a certain size, and after the store reaches its time or size limits, insert the counts into the database.
Another approach is to periodically parse the web-server logs (most web-servers log this information by default), and again do bulk inserts to the database.

Changing Opening Hours without affecting historic data

I've been tasked to create a data visualisation dashboard that relies on me drilling into the existing database.
One report is 'revenue per available covers' - part of the calculation determining how many hours were booked against how many hours were available.
The problem is the 'hours available', currently this is stored in a schedule table that has a 1-1 link with the venue - and if admin want to update this there is a simple CRUD panel with the pre-linked field ready to complete.
I have realised that if I rely on this table at any point in the future when the schedule changes the calculations change for any historic data.
Any ideas on how to keep a 'historic' schedule with as minimum impact as possible to the database?
What you have is a (potentially) slowly-changing dimension. Basically, there are two approaches:
For each transactional record, include the hours that you are interested in.
Store the schedule with time frames, which capture the schedule at a particular point in time.
In SQL Server, I would normally go for the second option, using effDate and endDate columns to capture the period when the schedule is active.
I would suggest that you start with a deeper explanation of the concept, such as the Wikipedia page.

Google BigQuery for realtime call records data

I am thinking to use Google Big Query to store realtime call records involving around 3 million rows per day inserted and never updated.
I have signed up for a trial account and ran some tests
I have few concerns before i can go ahead with development
When streaming data via PHP it takes around 10-20 minutes sometime to get loaded on my tables and this is a show stopper for us because network support engineers need this data updated realtime to troubleshoot quality issues
Partitions, we can store data in partitions divided for each day but that also involves one partition being 2.5 GB on any given day and that shoots my costs to query data in range of thousands per month. Is there any other way to bring down cost here? We can store data partitioned per hour but there is no such support available.
If not BigQuery what other solutions are out there in market which can deliver similar performance and can solve these problems ?
You have the "Streaming insert" option which enables the records to be searchable in few seconds (it has its price).
See: streaming-data-into-bigquery
Check table-decorators for limiting query scan.

Bloomberg API request limit

Is there anyway to determine how many requests or how much data you have in your remaining request limit amount for Bloomberg API?
from Bloomberg HelpDesk on April 2014 (this is valid for a basic desktop client):
We have 3 kind of limits..
You can have no more that 3500 real time fields open at the same time.
If you exceed this limit you will see "NA Limit" as error message and
you just need to delete some securities/ fields in order for the error
message to disappeared and to see the values.
We have also a daily limit. The Daily API limit is 500,000 hits/per
day. A "hit" is defined as one request for a single security/field
pairing. Therefore, if you request static data for 5 fields and 10
securities, that will translate into a total of 50 hits. so try to
refresh just the portion of the spreadsheet that really needs to be
refreshed and avoid refreshing it all or reopen it many times a day.
The last limit is a monthly limit. Our monthly limits comes from a
proprietary model. Only about 0.4% of our user database ever goes over
this limit. This limit is based on unique securities and depends on
the type of data being downloaded. For example some of the data on the
system such as intra-day is valued a little bit higher than historical
end of day for any given list of securities. We do not recommend more
than 5000 to 7000 unique identifiers per month and the limit upgrade
will only allow you to get data to complete your project. Once a
security is used once in a month then if you use it again it will not
count again towards the monthly limit.
We normally grant 2 resets per month in case you exceed your daily
limit and if you exceed your monthly limit we grant 1 extension per
month (10% more), if you breach the limit again you will then need to
wait for the midnight for the daily limit to be reset automatically or
the end of the month for the reset of the monthly.
Bloomberg do not state what the explicit limits are, and there is no programmatic way of finding out what the limits are or what proportion of your limits you have used.
The best information from Bloomberg that I have found is on the WAPI page (in the terminal). On the menus on the LHS, go to WAPI Home > API Resources > API Data Limits. There are two pages, 'Extended Rules and Usage Limits' and 'Managing Your API Data Limits' that shed some further light on the matter.
Broadly speaking... there is a daily limit of individual data requests (i.e. security/field pairs - but duplicates are counted for each request). However, your limit for subscriptions is based on the number of securities you are subscribing to concurrently - i.e. if you expect to be requesting the price of a security every 5 mins, you are much better off subscribing to that security's price. Then there is a monthly limit that is based on the number of unique securities that you are making requests for.
there is an upper limit on Bloomberg API, 500,000 hits per day.
-- information from Bloomberg Help Help
The daily limit is clearly stated - it is the monthly limit that is not to my knowledge disclosed in writing. I have been told the following in the context of discussions about Data Licence, which is one Bloomberg product for bulk data subscription. The monthly limit is expressed as a budget in $, and it is the equivalent price for your requests, priced under the Data Licence schema, which clearly is not secret if you enquire about that product. So why the secrecy about the budget? The reason it is commercially sensitive is that this budget is many times the monthly cost of the Terminal Licence, so clearly if you (a) know what it is and (b) either have access via API to the budget spent (nope) OR write software to 'count the cost' (not hard), then you could pony up a couple of terminals and vastly reduce your Data Licence spend. Bloomberg naturally frown on this sort of activity because it represents an arbitrage opportunity in their pricing model and it is not really 'playing nice'. They likewise do not like if you hit 'the wrong kind of data' too often or the monthly limit at all often, and these activities may prompt them to investigate your business model to be sure you are in compliance with all the T&C of the Data Addendum. Out of courtesy to Bloomberg I am not posting that budget number here, but you should be able to get it from your salesperson and confirm the validity of what I have said, because it may change at any time as it is not part of any contract.
I don't believe this is possible programmatically, however if you speak to the Bloomberg helpdesk they will be able to tell you whether you are near the limit, and reset it for you if necessary. Obviously they will only do that a certain number of times. I have not managed to get a definitive answer as to what the limit is, but it's designed to be large enough that you would not hit it just running spreadsheets, which have a limit of 3500 Bloomberg real-time formulas.
If you feel the download limit is not breached but you still get the error message, you can run the following steps to solve the issue:
Close Excel completely.
From the Windows "Start" menu, select All Programs > Bloomberg > Stop API Process. A command prompt window appears.
Press <Enter> to close the window.
From the Windows Start menu, select All Programs > Bloomberg > API Environment Diagnostics.
Click the Start button.
When the test is complete, if there are any red errors, click the "Repair" button.
Re-open Excel and test a formula.
500'000 data points is the approximate daily limit, however remember different types of data use up varying amounts. It is not 1 for 1. Typically requests for esoteric securities and fields will use up more data per request, than PX_LAST for AAPL US for example. Also there are different types of request, such as reference or historic, which will also consume your limit differently.
If you are requesting intra-day realtime data, these fields are typically not charged to your usage limit. Rather you have limits on how many times the realtime 'pipe' can be opened.
Bloomberg are typically very helpful at resetting your monthly data usage limit should you exceed it on an adhoc basis. This is not written company policy, but seems to be part of their customer care. If you are persistently breaching limits each month, they are likely to stop resetting your limits and try to move you to B-PIPE. But otherwise for my experience they are flexible

Data synchronization between two databases

I need to synchronize between two data sources:
I have a web service running on then net. It continuously gathers data from the net and stores it on the database. It also provides the data to the client based on the client's request. I want to keep a repository of data as object for faster service.
On the client side, there is a windows service that calls the web service mentioned previously and synchronize its local database to the server.
Few of my restrictions:
The web service has very small buffer limit and it can only transfer less then 200 records per call which is not enough for data collected in a day.
I also can't copy the database files since the database structure is very different (sql and other is access)
The data is being updated on a hourly basis and there will be large amount of data that will be needed to be transfer.
Sync by date or other group is not possible with the size limitation. Paging can be done but the remote repository keeps changing (and I don't know how to take chunk of data from the middle of table of SQL database)
How do I use the repository for recent data update/or full database in sync with this limitation?
A better approach for the problem or an improvement of the current approach will be taken as the right answer
You mentioned that syncing by date or by group wouldn't work because the number of records would be too big, but what about syncing by date (or group or whatever) and then paging by that? The benefit is that you will have a defined batch of records and you can now page over that because that group won't change.
For example, if you need to pull data off hourly, as each hour elapses (so, when it goes from 8:59am to 9:00 am), you begin pulling down the data that was added between 8am and 9am in chunks of 200 or whatever size the service can handle.