BigQuery: Why does Table Range Decorators return wrong result sometimes? - google-bigquery

I've been using the Table Range Decorators feature daily since May in order to only query the data from the last 7 days in some of my tables.
Since 2 weeks, I've noticed that sometimes some data is missing when I use that feature. For example, if do a query to get the results for the last 7 days (by adding "#-604800000--1" to table), some data will be missing as opposed to if I query on the whole table (without a table decorator).
I wonder what could explain this and if there is a fix coming soon to address this?
If this can help the BigQuery team, I've noticed that when using Table Decorators some data was missing for us for October 16th between around 16:00 and 20:00 UTC time.
For the BigQuery team here are 2 jobs ids where some data is missing: job_-xtL4PlIYhNjQ5weMnssvqDmd6U , job_9ASNxqq_swjCd1eMmiQ6SmPpxlQ
and 1 job id where data is correct(without decorators): job_QbcRwYGbQv0BZdHreQEvRlYh-mM

This is a known issue with table decorators containing a time range. Due to a bug in BigQuery, it is possible for certain time ranges to omit data that should be included within the time range.
We're working on a fix and plan to have it released next week. After this fix is deployed time range decorators should again work as expected.

Related

How do I add new rows to SQL automatically by time?

I'm a pretty new programmer and I'm working on a project that I'm not sure how to make work. I'm hoping for some advice please.
Part of the project I'm working on will be used by a company to allow employees to sign up for lunch from their computers. I'm doing the project in MVC ASP.NET
The interface will look something like this:
----------------------
|1200 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
|1230 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
and on and on and on.
With this company, everything has to be recorded and stored. So, I already have a table with employee information. That will populate the drop down areas. Lunch times need to be stored in the database so it can be searched years down the line. So it has to be in a table.
The table get more tricky because not every time of the day is available for lunch (i.e. - no lunches after 0430 and before 0800).
My question is about how to create the future time slots in the database.
I could obviously make the table with all of these rows already in places for several years down the line. That's time-consuming, though, and I'll have to go back in in several years and fix it. Horrible idea.
What I'd LOVE to do is make it so every 24 hours, the database just automatically adds new rows with the next days times available - so just increment (at midnight, the program will just add the next day's times associated with that date (so at midnight on February 6, 2020, it will create February 7, 2020 0000, February 7, 2020 0030, etc. I've studied a lot but I'm still beside myself on how to make this work.
Thanks in advance everyone!!!
As I understand, you want to drive your interface from the database table so that the user can select Name 1 and Name 2 and a time slot and submit.
It sounds like you also want the available timeslots to be driven by the database also (ie, timeslot in table without names with it is availlable). This is not a good idea. As you mentioned, you would be inserting data that is not actually a record but a placeholder. That will be very confusing down the track when you come to query the data.
My approach would be to do the following:
* add NOT NULL constraints to all columns in your database (if your database supports this feature) or have your app complain very much about NULLS in any of the columns. There is no need for NULLS in your use case by the look of it.
the database should have a CHECK constraint that the time is within the allowable time range, and (assuming employees can not double book time slots) a CHECK constraint that there is no overlapping time slots, and also a UNIQUE constraint that ensures no duplicate times.... adjust to suit your needs.
your app populates times between 0800 and 1630 (8AM and 4:30PM) and also query the database for all records matching the current day so those booked slots can be removed from the list of available time slots... adjust to suit.
your app sends the user request of name and time slot to the DB. All the critical requirements are accepted or rejected by the DB schema and if there is something wrong, display an appropriate error in the app.
This way, your database is literally storing records of booked lunches.
I would NOT go down the path of pre inserting as then it becomes more complex as some records are "real" and some are artificially generated records to drive a GUI...
If you can't do the time slot calculations in your app rather than in the DB, then at least use a separate table that is maintained by a worker thread in your app OR if your DB supports it, a Stored Procedure which returns a table of available time slots.
I would use the stored procedure if I was avoiding doing complex time calculations in my app (also avoids need to worry about time zones - if you make sure to only store and display UTC times in your DB).
Having in mind structure like this:
LunchTimeSlots (id, time_slot)
Employee (id, name, preferred_time_slot_id, etc)
Lunches(employee_id, time_slot_id, date)
You need a scheduled job to add records to the "Lunches" table every midnight. How to define the job depends on your database vendor. But most of the popular rdbms have this feature. (f.e. mssql)
Despite it's possible to do what you want with db schedulers or any other scheduler, i would recommend to avoid such db design. It's always better to write real facts to the database like a list of employees or fact that lunch was served
to employee at 1pm today.
Unlike real facts, virtual data can be always generated "on-the-fly" by sql queries. F.e. by joining employees to list of dates from today till year 2100, we can get planned lunches for all employees for next 80 years.

Bigquery - Table decorators changed weirdly

I used to have a number of queries running on the past 40 days of data using a decorator with [dataset.table#-4123456789-].
However, since September 15 all the decorators return maximum 10 days of data.
By the way [dataset.table#0] returns the whole table and not the past 7 days as told in the documentation.
Does anyone know what is going on. Do I have to move my table to partition in order to receive data for a limited period of time but more the a week?
Thanks

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans)| worksite_code_fk (can be null)
I'm really struggling how to wrap my head around recurring events. I could create say, a years worth, of availability days following a pattern in 'x' day cycle. But how far ahead of time do we store information? I can see running into problems when we reach the end of the data set.
I was thinking of storing say, 6 months of information, then adding a server side task that runs monthly to keep the tables updated with 6 months of data, but my intuition is telling me this is a bad fix.
For absolutely flexibility in the future and keeping data from bloating my first thought would be something like
Calendar Dimension Table - Make it for like 100 years or Whatever you Want make it include day of week information etc.
Time Dimension Table - Hour, Minutes, every 15 what ever but only for 24 hour period
Shifts Table - 1 record per shift e.g. Day, Evening, and Night
Specific Availability Table - Relationship to Calendar & Time with Start & Stops recommend 1 record per day so even if they choose a range of 7 days split that to 1 record perday and 1 record per shift.
Recurring Availability Table - for day of week (1-7),Month,WeekOfYear, whatever you can think of. But again I am thinking 1 record per value so if they are available Mondays and Tuesday's that would be 2 rows. and if multiple shifts then it would be multiple rows.
Now and here is the perhaps the weird part, I would put a Available Column on the Specific and Recurring Availability Tables, maybe make it a tiny int and store something like 0 not available, 1 available, 2 maybe available, 3 available with notice.
If you want to take into account Availability with Notice you could add columns for that too such as x # of days. If you want full flexibility maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.

Use domain of one table for criteria in another in ms Access query?

I am trying to create a report that displays 3 different numbers for each of my projects.
Contract Hours - Stored in projects table, 1 to 1 relationship
Worked Hours - Stored in linked table that will be updated using an external website reporting feature that will contain only data for the dates that are to be displayed in the report, one to many relationship needs to be a sum
Allocated Hours - Stored in a table in my database called allocations and contains data for all dates, one to many relationship needs to be summed.
Right now i have it set up in a way that the user has to type the data range for the report every time it is run, however the date range only actually applies to the Allocation data because the worked hours data comes filtered and the contract data is one to one.
What I would like to do is set up a query that can see the domain of the worked hours and apply it as a date criteria for the allocated hours.
I have attempted to use max and min values of the Worked hours and tried to get creative but I'm actually not even sure if this is possible because I cannot see any simple solution (although I know it should be possible and fairly simple)
Any help, suggestions, or recommendations are appreciated.

Big Query Table Last Modified Timestamp does not correspond to time of last table insertion

I have a table, rising-ocean-426:metrics_bucket.metrics_2015_05_09
According to the node js API, retrieving metadata for this table,
Table was created Sat, 09 May 2015 00:12:36 GMT-Epoch 1431130356251
Table was last modified Sun, 10 May 2015 02:09:43 GMT-Epoch 1431223783125
By my records, the last batch insertion to this table was actually on:
Sun, 10 May 2015 00:09:36 GMT - Epoch 1431216576000.
This is two hours earlier than the reported last modification time. Using table decorators, I can show that no records were inserted into the table after Epoch 1431216576000, proving that no records were inserted in the last two hours between the last batch insertion I made and the last modification time reported in the metadata:
The query: SELECT
count(1) as count
FROM [metrics_bucket.metrics_2015_05_09#1431216577000-1431223783125];
returns zero count. Whereas the query:
SELECT
count(1) as count
FROM [metrics_bucket.metrics_2015_05_09#1431216576000-1431216577000];
returns count: 222,891
Which shows that the correct last modification time was Sun, 10 May 2015 00:09:36 GMT, and not 02:09:43 GMT as the metadata asserts.
I am trying to programmatically generate a FROM clause that spans multiple tables with decorators, so I need accurate creation and last modification times for the tables in order to determine when decorators can be omitted because the time range spans the entire table. However, due to this time discrepancy, I cannot eliminate the table decorators.
The question is, am I looking at the right metadata to obtain correct creation and last modification information?
Short answer: You are indeed looking at the correct metadata.
Long answer:
The last modification time includes the time of some internal compaction of data, unrelated to data change. Executing a query against your table with a decorator ending at either 1431223783125 or 1431216576000 produces the same results, just like your experiments show, but executing at the later time includes our storage efficiency improvements that may slightly improve execution time and efficiency. We consider this a bug, and will soon update the API to return the last user-modification time instead.
In the meantime, there's no harm to including table decorators that aren't really needed, other than the added query text. Neither query cost or performance will change.