BigQuery view appears to be generating a huge number of queries - google-bigquery

I have a View which is populated by a SQL request to a regular dataset. As far as I know, not a soul is looking at that view (at least not very often). But if I go to the View and click on Project History then there is an IAM service account which is running the query every 15 seconds. Each query is shuffling 2.55 MB and reading 70,000 records, so I'd really prefer that it didn't do this.
The dataset source used to create the view shows that its last modified date was 3 days ago, so the service account is not being triggered by a change in the source. I checked the job scheduler and there is nothing there. So what is triggering it, and how can I tell it to calm down?
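For reference, the jobs hitting the view can be listed directly from the INFORMATION_SCHEMA jobs view; a minimal sketch, assuming the data lives in the US region (substitute the dataset's actual region):

```sql
-- List recent query jobs with the account that ran them and the bytes processed.
-- `region-us` is an assumption; use the region your dataset actually lives in.
SELECT
  user_email,
  job_id,
  creation_time,
  total_bytes_processed,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY creation_time DESC;
```

The user_email column shows which service account is responsible, and the query text often reveals the tool (a BI connector, a scheduled query, and so on) that keeps re-running it.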

Related

Storing vast amounts of "uptime" data for a website monitoring service

This is more of a general discussion than a code question.
I have a website monitoring platform where users of the system can input their website URL and we'll check it every X minutes based on the customer's interval. At each interval, an entry is stored as an UptimeCheck model in the Laravel 8 project, with the status being down or up.
If a customer has 20 monitors, and each checks every minute, then over a 30-day period that one customer would accumulate roughly 864,000 rows (20 monitors × 1,440 checks per day × 30 days).
My question, really, is: do I need to keep this number of rows?
The reason this number of rows is kept is so that we can present a graph showing the average website uptime.
My thinking is that if I created some kind of SVG programmatically for each day and stored this in the table, then I wouldn't need to store as many entries, but my concern here is how I would merge SVG models into one to present a daily graph?
What kind of libraries could I use and how else might I approach this?
Unlike performance, the trick for storing uptime data is simple. You don't store it. ;)
You need to store DOWNTIME data instead. Register only unavailability events and extrapolate uptime when displaying reports.
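A minimal sketch of that approach, assuming a hypothetical downtime_events table (the table and column names are illustrative, not from the original project): record one row per outage and derive the uptime percentage at report time.

```sql
-- One row per outage instead of one row per check (MySQL syntax assumed).
CREATE TABLE downtime_events (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    monitor_id BIGINT   NOT NULL,
    started_at DATETIME NOT NULL,
    ended_at   DATETIME NULL              -- NULL while the outage is ongoing
);

-- Daily uptime % for one monitor over the last 30 days:
-- uptime = 100 - (downtime seconds / 86,400 seconds in a day) * 100.
SELECT
    DATE(started_at) AS outage_day,
    100 - (SUM(TIMESTAMPDIFF(SECOND, started_at, ended_at)) / 86400.0) * 100
        AS uptime_pct
FROM downtime_events
WHERE monitor_id = 42
  AND ended_at IS NOT NULL
  AND started_at >= NOW() - INTERVAL 30 DAY
GROUP BY DATE(started_at)
ORDER BY outage_day;
```

Days with no downtime rows simply don't appear (treat them as 100% in the application), and an outage that spans midnight would need to be split across days, which this sketch ignores.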

How to auto refresh Power BI dataset after failure

I have scheduled query for a dataset in Power BI.
In case of a refresh failure, I want Power BI to "retry" to refresh the data again, up to 5 times.
Is there a way to do it?
For the time being it doesn't seem possible, as confirmed by this post. You can play with the "Command time out in minutes (optional)" setting in your query when creating your data source, as noted in the comments.
Under Advanced options.
If the timeout is left blank, the default is 10 minutes, so if the issue is that your queries are timing out, this may be the solution for you.
Another workaround is to schedule your data source to update multiple times at half-hour increments. Note that depending on how big your dataset is, this may place a burden on the server you are pulling from. If that is the case, then looking into incremental refresh would be your next go-to.
Hope this helps.

Changing Opening Hours without affecting historic data

I've been tasked to create a data visualisation dashboard that relies on me drilling into the existing database.
One report is 'revenue per available covers' - part of the calculation is determining how many hours were booked against how many hours were available.
The problem is the 'hours available'. Currently this is stored in a schedule table that has a 1-1 link with the venue, and if admins want to update it there is a simple CRUD panel with the pre-linked field ready to complete.
I have realised that if I rely on this table, then at any point in the future when the schedule changes, the calculations will change for any historic data.
Any ideas on how to keep a 'historic' schedule with as little impact as possible on the database?
What you have is a (potentially) slowly-changing dimension. Basically, there are two approaches:
For each transactional record, include the hours that you are interested in.
Store the schedule with time frames, which capture the schedule at a particular point in time.
In SQL Server, I would normally go for the second option, using effDate and endDate columns to capture the period when the schedule is active.
I would suggest that you start with a deeper explanation of the concept, such as the Wikipedia page.
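A minimal sketch of the second option in SQL Server, using the effDate/endDate idea; the table and column names (venue_schedule_history, bookings, hours_available) are assumptions for illustration:

```sql
-- Each row captures the schedule that was in force for a period of time.
CREATE TABLE venue_schedule_history (
    venue_id        INT           NOT NULL,
    hours_available DECIMAL(5, 2) NOT NULL,
    effDate         DATE          NOT NULL,
    endDate         DATE          NULL,      -- NULL = currently active
    PRIMARY KEY (venue_id, effDate)
);

-- When admins change the schedule: close the current row, open a new one.
UPDATE venue_schedule_history
SET endDate = '2024-05-31'
WHERE venue_id = 1 AND endDate IS NULL;

INSERT INTO venue_schedule_history (venue_id, hours_available, effDate, endDate)
VALUES (1, 60.0, '2024-06-01', NULL);

-- Historic reports join bookings to the schedule in force on the booking date.
SELECT b.booking_date, b.hours_booked, s.hours_available
FROM bookings b
JOIN venue_schedule_history s
  ON s.venue_id = b.venue_id
 AND b.booking_date >= s.effDate
 AND (s.endDate IS NULL OR b.booking_date <= s.endDate);
```

Reports then always use the schedule that was in force on the date in question, so a change made today leaves historic calculations untouched; the existing CRUD panel only needs to close the current row and insert a new one instead of overwriting.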

SSRS Caching and/or Snapshot

I am fairly new to SSRS reports so I am looking for guidance. I have SSRS reports that have 3 visible parameters: Manager, Director, and VP. The report will display data based on the parameters selected. Initially, the report was taking a very long time to load and my research led me to creating a snapshot of the report.
The initial load of the report is really quick (~5 secs) but the parameters are set to "Select All" in all sections. When the report is later filtered to say, only 1 VP, the load time can vary anywhere between 20 to 90 seconds. Because this report will be used by all aspects of management within the organization, load time is critical.
Is it possible to load the filtered data quicker? Is there anything I can do?
Any help will be much appreciated.
Thank you!
This is a pretty broad efficiency issue. One of the big questions is whether the query takes a long time to run in the database or just in SSRS. Ideally you would start with optimizing the query and indexing, but that's not always enough. The work has to be done somewhere; all you can do is shift it so it happens before the report is run. Here are a couple of options:
Caching
Turn on caching for the report.
Schedule a subscription to run with each possible value for the parameter. This way the report still loads quickly once an individual value is specified.
Intermediate Table
Schedule a SQL stored procedure to aggregate and index the data in a new table in your database.
Point the report to run from this data for quick reads.
Each option has its pros and cons because you have to balance where the data preparation work is done. Sometimes you have to try a few options to see what works best for your situation.
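A minimal sketch of the intermediate-table option in T-SQL; the table, procedure, and column names (ReportSummary, FactSales, VP/Director/Manager/Revenue) are assumptions, not taken from your actual report:

```sql
-- Pre-aggregated table the SSRS report reads from.
CREATE TABLE dbo.ReportSummary (
    VP       NVARCHAR(100)  NOT NULL,
    Director NVARCHAR(100)  NOT NULL,
    Manager  NVARCHAR(100)  NOT NULL,
    Revenue  DECIMAL(18, 2) NOT NULL
);
GO

-- Stored procedure that rebuilds the aggregate; schedule it with SQL Server Agent.
CREATE PROCEDURE dbo.RefreshReportSummary
AS
BEGIN
    TRUNCATE TABLE dbo.ReportSummary;

    INSERT INTO dbo.ReportSummary (VP, Director, Manager, Revenue)
    SELECT VP, Director, Manager, SUM(Revenue)
    FROM dbo.FactSales        -- assumed source table
    GROUP BY VP, Director, Manager;
END;
GO

-- Index matching the report parameters so a single-VP filter stays fast.
CREATE NONCLUSTERED INDEX IX_ReportSummary_Params
    ON dbo.ReportSummary (VP, Director, Manager);
```

A SQL Server Agent job can run RefreshReportSummary on whatever schedule the data allows, and the SSRS dataset is simply pointed at dbo.ReportSummary for quick reads.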

SQL job loading data into portal

Can someone help me on this?
I am working on a portal (website) where all the data comes from another application. I load that data into my application's database tables using a SQL Job that runs hourly every day. The real problem is when the job runs and the data starts loading into the portal: the portal behaves badly, i.e. a user who opens the portal while the job is running sees data mismatches, slow performance, etc. Everything becomes normal once the job completes successfully.
The job takes 7 minutes to complete, and the real problem occurs during those 7 minutes.
Please help, Thank you in advance.