How to perform a rolling window in TensorFlow Extended (TFX)?

I am new to TFX. My questions may seem trivial, but the answers may help many people who want to implement TFX.
What are the various ways to load data for the rolling window needed for a time series? (My input source is a PostgreSQL database which already has labels.) I am querying the last week of data from the database, and from this data I want to implement a rolling window that feeds 10 hours of data to predict the next hour.
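For concreteness, here is a minimal sketch of the 10-hours-in, next-hour-out windowing in plain TensorFlow (not TFX-specific), assuming the week of hourly feed values has already been queried from PostgreSQL into an array; the names and sizes are illustrative, not from the question:

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-in: one week of hourly feed values queried from PostgreSQL.
feed = np.random.rand(7 * 24).astype("float32")

window = 10  # 10 hours of input per sample
# Each window covers feed[i : i + 10]; its target is the value at hour i + 10.
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data=feed,
    targets=feed[window:],
    sequence_length=window,
    batch_size=32,
)
```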
Thanks

Related

Storing vast amounts of "uptime" data for a website monitoring service

This is more of a general discussion than a code question.
I have a website monitoring platform where users enter their website URL and we check it every X minutes, based on the customer's interval. At each interval, an entry is stored as an UptimeCheck model in the Laravel 8 project, with the status being down or up.
If a customer has 20 monitors and each checks every minute, then over a 30-day period that one customer accumulates roughly 864,000 rows (20 × 1,440 checks/day × 30 days).
My question really is: do I need to keep this many rows?
The reason this number of rows is kept is so that we can present a graph showing the average website uptime.
My thinking is that if I created some kind of SVG programmatically for each day and stored that in the table, I wouldn't need to store as many entries, but my concern is how I would merge SVG models into one to present a daily graph.
What kind of libraries could I use and how else might I approach this?
Unlike performance data, the trick for storing uptime data is simple: you don't store it. ;)
You need to store DOWNTIME data instead. Register only unavailability events and extrapolate uptime when displaying reports.
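As an illustration of that extrapolation, here is a minimal sketch (the event data and reporting window are made up): store only (start, end) downtime intervals, and compute uptime for any reporting window by subtracting the overlap.

```python
from datetime import datetime, timedelta

# Made-up downtime events: (start, end) pairs recorded only when a check fails.
downtime = [
    (datetime(2021, 5, 3, 14, 0), datetime(2021, 5, 3, 14, 12)),
    (datetime(2021, 5, 9, 2, 30), datetime(2021, 5, 9, 2, 45)),
]

def uptime_percent(events, window_start, window_end):
    """Extrapolated uptime: the whole window minus recorded downtime overlap."""
    total = (window_end - window_start).total_seconds()
    down = sum(
        max(0.0, (min(end, window_end) - max(start, window_start)).total_seconds())
        for start, end in events
    )
    return 100.0 * (total - down) / total

month_start = datetime(2021, 5, 1)
print(uptime_percent(downtime, month_start, month_start + timedelta(days=30)))
```

Two short outages over a month collapse to two rows instead of tens of thousands of "up" checks.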

Is there a way to pause querying in Google Data Studio while editing a report

I'm building a bar chart in Google Data Studio, in a report connected to BigQuery, calculating min, max, and avg for a metric with one dimension. The problem is that every time I edit the chart to add the metric or change its calculation (for instance from sum to min), a BigQuery query is run, which is very wasteful. So I was wondering if there is a way to pause the querying until I finish constructing/editing the chart, and then unpause it to run the final query for the final chart.
Thanks in advance.
Turning off the pre-fetch cache may lower costs: https://support.google.com/datastudio/answer/7020039?hl=en. But I'm not sure if that will stop queries from being issued during edits.
You can try writing your own connector using Apps Script and fetching data using the BigQuery service. If you set up an intermediary Apps Script cache, you reduce the number of times you have to hit BigQuery.
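The answer above is about Apps Script, but the caching idea is language-agnostic. A minimal sketch of the same pattern in Python with the official BigQuery client (credentials and the SQL are assumptions, not from the question):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured
_cache = {}

def cached_query(sql):
    """Serve repeated identical queries from memory instead of re-hitting BigQuery."""
    if sql not in _cache:
        _cache[sql] = list(client.query(sql).result())
    return _cache[sql]

# While iterating on a chart, only the first call for a given SQL costs a query.
rows = cached_query("SELECT MIN(x), MAX(x), AVG(x) FROM `my-project.my_dataset.t`")
```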

Changing Opening Hours without affecting historic data

I've been tasked to create a data visualisation dashboard that relies on me drilling into the existing database.
One report is 'revenue per available covers' - part of the calculation is determining how many hours were booked against how many hours were available.
The problem is the 'hours available': currently this is stored in a schedule table that has a 1-1 link with the venue, and if admins want to update it there is a simple CRUD panel with the pre-linked field ready to complete.
I have realised that if I rely on this table, then at any point in the future when the schedule changes, the calculations change for all historic data.
Any ideas on how to keep a 'historic' schedule with as little impact as possible on the database?
What you have is a (potentially) slowly-changing dimension. Basically, there are two approaches:
For each transactional record, include the hours that you are interested in.
Store the schedule with time frames, which capture the schedule at a particular point in time.
In SQL Server, I would normally go for the second option, using effDate and endDate columns to capture the period when the schedule is active.
I would suggest that you start with a deeper explanation of the concept, such as the Wikipedia page.
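To make the second approach concrete, here is a minimal sketch of an effDate/endDate lookup (the table contents and column names are illustrative): each schedule version is valid for a date range, and historic calculations pick the version that was active on the day in question.

```python
from datetime import date

# Illustrative schedule history: each row captures the hours available while it was active.
# end_date=None marks the currently active version, a common convention for this pattern.
schedule_history = [
    {"eff_date": date(2019, 1, 1), "end_date": date(2020, 6, 30), "hours_available": 60},
    {"eff_date": date(2020, 7, 1), "end_date": None, "hours_available": 45},
]

def hours_available_on(day):
    """Return the hours available under the schedule version active on `day`."""
    for row in schedule_history:
        if row["eff_date"] <= day and (row["end_date"] is None or day <= row["end_date"]):
            return row["hours_available"]
    raise LookupError(f"no schedule version covers {day}")

print(hours_available_on(date(2020, 1, 15)))  # 60: historic data keeps its old schedule
print(hours_available_on(date(2021, 3, 1)))   # 45: the current schedule
```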

How to store weekly data from Google Analytics

I have some simple weekly aggregates from Google Analytics that I'd like to store somewhere. The reason for storing them is that if I run a query against too much data in Google Analytics, it becomes sampled, and I want it to be totally accurate.
What is the best way to solve this?
My thoughts are:
1) Write a process in BigQuery to append the data each week to a permanent dataset
2) Use an API that gets the data each week and stores it in a Google spreadsheet (appending a line each time)
What is the best recommendation for my problem - and how do I go about executing it?
Checking your previous questions, we see that you already use BigQuery.
When you run a query against the Google Analytics export tables in BigQuery, the result is not sampled, because those tables contain all the data. There is no need to store anything, as you can query them every time you need to.
If you do want to store the results (and pay for the additional table), you can write them to a destination table.
If you want quick access, try creating a view.
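A minimal sketch of the destination-table approach with the BigQuery Python client (the project, dataset, and table names are placeholders, and the SQL is only one example of a weekly roll-up):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured

# Placeholder names: adjust to your project, dataset, and GA export table.
job_config = bigquery.QueryJobConfig(
    destination="my-project.analytics_rollups.weekly_summary",
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append each week's rows
)

sql = """
    SELECT DATE_TRUNC(DATE(TIMESTAMP_SECONDS(visitStartTime)), WEEK) AS week,
           SUM(totals.visits) AS visits
    FROM `my-project.12345678.ga_sessions_*`
    GROUP BY week
"""
client.query(sql, job_config=job_config).result()  # blocks until the job completes
```

Scheduling that script weekly (cron, or a BigQuery scheduled query) gives you the append-each-week behaviour described in option 1.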
I suggest the following:
1) Make a roll-up table for your weekly data. You can do that either by writing a query for it and running it manually, or with a script in a Google Spreadsheet that uses the same query (via the API) and is scheduled to run every week. I tried a bunch of the tutorials out there and this one is the simplest to implement.
2) Depending on the data points you want, you can even use the Google Analytics API without going through BigQuery for this request; try pulling that report of yours from here (see the sketch below). There are a bunch of Google Sheets extensions that can make it a lot quicker to set up a weekly report. Or you can just code it yourself.
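A minimal sketch of option 2 using the Google Analytics Reporting API v4 from Python (the key file, view ID, and the metric/dimension choices are placeholders):

```python
from googleapiclient.discovery import build
from google.oauth2 import service_account

# Placeholder service-account key and view (profile) ID.
creds = service_account.Credentials.from_service_account_file(
    "ga-key.json", scopes=["https://www.googleapis.com/auth/analytics.readonly"]
)
analytics = build("analyticsreporting", "v4", credentials=creds)

# Sessions per ga:week for the last 7 days; append the parsed rows wherever you store them.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "12345678",
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}],
        "dimensions": [{"name": "ga:week"}],
    }]
}).execute()
```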
Would that work for you?
Thanks!

Time-based sliding window query in Splunk

Is there a way to do a time-based sliding window query in Splunk in real time? To give an insight into what I am looking for: let's say log statements are published to Splunk; can I get counts of errors which have occurred in the last 15 minutes? And this has to be sliding, continuously updating me on the state of the system.
As you said, you can use real-time queries:
1) create your query
2) make it real time (over the last 15 minutes)
3) save it as an alert
4) set the cron period for the query to run
Hope it helps.
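If you prefer to drive this programmatically rather than through the UI, here is a sketch with the Splunk Python SDK; the connection details and the SPL itself are assumptions, but the rt-15m/rt time bounds are what make the window slide:

```python
import splunklib.client as client  # pip install splunk-sdk
import splunklib.results as results

# Assumed connection details for a local Splunk instance.
service = client.connect(
    host="localhost", port=8089, username="admin", password="changeme"
)

# Real-time search over a sliding 15-minute window; the SPL is illustrative.
job = service.jobs.create(
    "search index=main log_level=ERROR | stats count",
    earliest_time="rt-15m",
    latest_time="rt",
    search_mode="realtime",
)

# Preview results refresh continuously as the window slides.
for event in results.ResultsReader(job.preview()):
    print(event)
```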