Changing Default Expiry on a Dataset - How to ensure data preservation - google-bigquery

I have noticed that my BigQuery dataset was configured to delete data more than 60 days old. The dataset setting has been changed to Default table expiry = Never, but it came with the warning "existing tables will not be affected".
What does this mean for future data preservation? It looks as though each day is its own table, so will the 60-day expiry only stop being a problem 60 days from now?
Any clarification on what the wording of this warning means and how it will affect our data preservation in BigQuery would be great.
thanks
Aaron

If you have already created tables in that dataset, then those tables still have a 60-day expiry. You need to clear this setting on each table individually. Otherwise, every existing table will still be deleted once it is more than 60 days old.
If your system creates a new table every day, then yes, the expiry will stop being a problem after 60 days, because new tables will have no expiry. All the old tables will be deleted, though.
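Clearing the expiry on each existing table can be scripted. The sketch below only builds the DDL strings. It assumes BigQuery standard SQL's `ALTER TABLE ... SET OPTIONS (expiration_timestamp = NULL)` statement, and the dataset and table names are made up for illustration. Actually running the statements (console, `bq`, or a client library) is left out.

```python
def clear_expiry_statements(dataset, table_names):
    """Build one ALTER TABLE statement per table that removes its expiry."""
    statements = []
    for name in table_names:
        # Backticks quote the dataset.table path in standard SQL.
        statements.append(
            f"ALTER TABLE `{dataset}.{name}` "
            "SET OPTIONS (expiration_timestamp = NULL);"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical daily tables; substitute the real table list.
    for stmt in clear_expiry_statements("analytics", ["events_20200101", "events_20200102"]):
        print(stmt)
```

Review the generated statements before running them, since they apply to every table in the list.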

Related

How to manually test a data retention requirement in a search functionality?

Say data needs to be kept for 2 years. Then all data created 2 years + 1 day ago should no longer be displayed and should be deleted from the server. How do you manually test that?
I’m new to testing and I can’t think of any other ways. Also, we cannot do automation due to time constraints.
You can insert backdated records, dated more than two years ago, into the database and check whether they are deleted automatically. Alternatively, you can change the current business date in the database and test against that.
For the data retention functionality, a manual tester needs to keep track of the searches made so that the test cases for the search retention feature can be verified.
Taking a social networking app as an example, a manual tester needs to remember all the users searched for recently.
To check the retention period, you can ask a backend developer to shorten it (for example, from one year to 10 minutes) for testing purposes.
Even if you delete the search history, when you start typing a previously entered search, the related result should appear at the top of the search results.
Data retention policies concern what data should be stored or archived, where that should happen, and for exactly how long. Once the retention period for a particular data set expires, the data can be deleted or moved as historical data to secondary or tertiary storage, depending on the requirement.
Let us work through an example. Suppose the database table holds the data below, based on past searches made by users. With the help of this table, you can perform this testing with minimal effort. The current date is ‘2022-03-10’, and the Status column states whether the data is still available in the database: Visible means available, while Expired means deleted from the table.

Search Keyword | Search On Date | Search Expiry Date | Status
sport          | 2022-03-05     | 2024-03-04         | Visible
cricket news   | 2020-03-10     | 2022-03-09         | Expired - Deleted
holy books     | 2020-03-11     | 2022-03-10         | Visible
dance          | 2020-03-12     | 2022-03-11         | Visible
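
The expected values in the table can be recomputed before checking them against the application. This is a minimal sketch that assumes the rule implied by the table (expiry date = search date + 2 years - 1 day, and a record stays visible through its expiry date). The function names are hypothetical, and the leap-day handling is an assumption.

```python
from datetime import date, timedelta


def expiry_date(searched_on, years=2):
    """Last day a record stays visible: searched_on + `years` years - 1 day."""
    try:
        anniversary = searched_on.replace(year=searched_on.year + years)
    except ValueError:
        # Feb 29 with a non-leap target year: fall back to Mar 1 (assumption).
        anniversary = date(searched_on.year + years, 3, 1)
    return anniversary - timedelta(days=1)


def status(searched_on, today, years=2):
    """Return 'Visible' while today is on or before the expiry date."""
    if today <= expiry_date(searched_on, years):
        return "Visible"
    return "Expired - Deleted"


if __name__ == "__main__":
    today = date(2022, 3, 10)
    for keyword, searched_on in [
        ("sport", date(2022, 3, 5)),
        ("cricket news", date(2020, 3, 10)),
        ("holy books", date(2020, 3, 11)),
        ("dance", date(2020, 3, 12)),
    ]:
        print(keyword, expiry_date(searched_on), status(searched_on, today))
```

The boundary rows (the day before, the day of, and the day after expiry) are the ones worth testing by hand.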

How do I add new rows to SQL automatically by time?

I'm a pretty new programmer and I'm working on a project that I'm not sure how to make work. I'm hoping for some advice please.
Part of the project I'm working on will be used by a company to allow employees to sign up for lunch from their computers. I'm doing the project in MVC ASP.NET
The interface will look something like this:
----------------------
|1200 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
|1230 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
and on and on and on.
With this company, everything has to be recorded and stored. So, I already have a table with employee information. That will populate the drop down areas. Lunch times need to be stored in the database so it can be searched years down the line. So it has to be in a table.
The table gets trickier because not every time of day is available for lunch (i.e., no lunches after 0430 and before 0800).
My question is about how to create the future time slots in the database.
I could obviously make the table with all of these rows already in place for several years down the line. That's time-consuming, though, and I'd have to go back in several years and fix it. Horrible idea.
What I'd LOVE to do is make it so that every 24 hours the database automatically adds new rows with the next day's available times. In other words, at midnight the program adds the next day's times for that date (so at midnight on February 6, 2020, it will create February 7, 2020 0000, February 7, 2020 0030, etc.). I've studied a lot, but I'm still at a loss as to how to make this work.
Thanks in advance everyone!!!
As I understand, you want to drive your interface from the database table so that the user can select Name 1 and Name 2 and a time slot and submit.
It sounds like you also want the available time slots to be driven by the database (i.e., a time slot in the table without names attached is available). This is not a good idea. As you mentioned, you would be inserting data that is not actually a record but a placeholder. That will be very confusing down the track when you come to query the data.
My approach would be to do the following:
* Add NOT NULL constraints to all columns in your database (if your database supports this feature), or have your app reject NULLs in any of the columns. There is no need for NULLs in your use case, by the look of it.
* The database should have a CHECK constraint that the time is within the allowable time range, a CHECK constraint that there are no overlapping time slots (assuming employees cannot double-book time slots), and a UNIQUE constraint that ensures no duplicate times. Adjust to suit your needs.
* Your app populates times between 0800 and 1630 (8 AM and 4:30 PM) and also queries the database for all records matching the current day, so that booked slots can be removed from the list of available time slots. Adjust to suit.
* Your app sends the user's request (name and time slot) to the DB. All the critical requirements are enforced by the DB schema, which accepts or rejects the request; if something is wrong, display an appropriate error in the app.
This way, your database is literally storing records of booked lunches.
I would NOT go down the path of pre inserting as then it becomes more complex as some records are "real" and some are artificially generated records to drive a GUI...
If you can't do the time slot calculations in your app rather than in the DB, then at least use a separate table that is maintained by a worker thread in your app OR if your DB supports it, a Stored Procedure which returns a table of available time slots.
I would use the stored procedure if I was avoiding doing complex time calculations in my app (also avoids need to worry about time zones - if you make sure to only store and display UTC times in your DB).
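The approach described in this answer can be sketched with Python's built-in `sqlite3` module standing in for the real database. The table name `lunches`, its columns, the 30-minute step, the two seats per slot, and the "one lunch per employee per day" rule are all assumptions for illustration; the real constraints depend on the requirements and on the database in use.

```python
import sqlite3
from datetime import date, datetime, timedelta

# Only real bookings are stored; constraints reject anything invalid.
SCHEMA = """
CREATE TABLE lunches (
    employee_id INTEGER NOT NULL,
    lunch_date  TEXT NOT NULL,
    time_slot   TEXT NOT NULL CHECK (time_slot >= '08:00' AND time_slot <= '16:30'),
    UNIQUE (employee_id, lunch_date)
);
"""


def all_slots(start="08:00", end="16:30", step_minutes=30):
    """Generate every candidate slot in the app instead of storing placeholders."""
    current = datetime.strptime(start, "%H:%M")
    last = datetime.strptime(end, "%H:%M")
    slots = []
    while current <= last:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_minutes)
    return slots


def available_slots(conn, day, seats_per_slot=2):
    """Slots on `day` that still have a free seat (seat count is an assumption)."""
    rows = conn.execute(
        "SELECT time_slot, COUNT(*) FROM lunches WHERE lunch_date = ? GROUP BY time_slot",
        (day.isoformat(),),
    ).fetchall()
    booked = dict(rows)
    return [slot for slot in all_slots() if booked.get(slot, 0) < seats_per_slot]


def book(conn, employee_id, day, slot):
    """Insert a booking; return False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO lunches (employee_id, lunch_date, time_slot) VALUES (?, ?, ?)",
                (employee_id, day.isoformat(), slot),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, a rejected insert is the signal for the app to show an error, and nothing has to run at midnight.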
With a structure like this in mind:
LunchTimeSlots (id, time_slot)
Employee (id, name, preferred_time_slot_id, etc)
Lunches(employee_id, time_slot_id, date)
You need a scheduled job to add records to the "Lunches" table every midnight. How to define the job depends on your database vendor, but most popular RDBMSs have this feature (e.g., MS SQL Server).
Although it is possible to do what you want with DB schedulers or any other scheduler, I would recommend avoiding such a DB design. It is always better to write real facts to the database, like a list of employees or the fact that lunch was served to an employee at 1pm today.
Unlike real facts, virtual data can always be generated on the fly by SQL queries. For example, by joining employees to a list of dates from today until the year 2100, we can get planned lunches for all employees for the next 80 years.
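One way to generate such virtual rows on the fly is a recursive common table expression. The sketch below uses Python's `sqlite3` module as a stand-in database. The `employee` table, the sample names, and the short date range are assumptions, and the recursive CTE and date function syntax differ between vendors.

```python
import sqlite3

# Build a list of dates in the query itself, then pair it with every employee.
PLANNED_LUNCHES = """
WITH RECURSIVE dates(d) AS (
    SELECT :start
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < :end
)
SELECT e.name, dates.d
FROM employee AS e CROSS JOIN dates
ORDER BY dates.d, e.name
"""


def planned_lunches(conn, start, end):
    """Return (employee name, ISO date) pairs for every day from start to end."""
    return conn.execute(PLANNED_LUNCHES, {"start": start, "end": end}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO employee (name) VALUES (?)", [("Ann",), ("Bob",)])
    for row in planned_lunches(conn, "2020-02-06", "2020-02-08"):
        print(row)
```

No placeholder rows are stored; the date list exists only for the duration of the query.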

Bigquery - Table decorators changed weirdly

I used to have a number of queries running on the past 40 days of data using a decorator such as [dataset.table@-4123456789-].
However, since September 15, all the decorators return a maximum of 10 days of data.
By the way, [dataset.table@0] returns the whole table and not the past 7 days, as stated in the documentation.
Does anyone know what is going on? Do I have to partition my table in order to query a limited period of time that is longer than a week?
Thanks

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans)| worksite_code_fk (can be null)
I'm really struggling to wrap my head around recurring events. I could create, say, a year's worth of availability days following a pattern in an 'x'-day cycle. But how far ahead of time do we store information? I can see us running into problems when we reach the end of the data set.
I was thinking of storing, say, 6 months of information, then adding a server-side task that runs monthly to keep the tables topped up with 6 months of data, but my intuition is telling me this is a bad fix.
For absolute flexibility in the future, and to keep the data from bloating, my first thought would be something like this:
* Calendar Dimension Table - make it for, say, 100 years or whatever you want, and include day-of-week information etc.
* Time Dimension Table - hours and minutes, every 15 minutes or whatever, but only for a 24-hour period.
* Shifts Table - 1 record per shift, e.g. Day, Evening, and Night.
* Specific Availability Table - related to Calendar and Time, with starts and stops. I recommend 1 record per day, so even if they choose a range of 7 days, split that into 1 record per day and 1 record per shift.
* Recurring Availability Table - for day of week (1-7), Month, WeekOfYear, whatever you can think of. Again, I am thinking 1 record per value, so if they are available Mondays and Tuesdays, that would be 2 rows, and if there are multiple shifts, it would be multiple rows.
Now, and here is perhaps the weird part, I would put an Available column on the Specific and Recurring Availability Tables, maybe make it a tinyint, and store something like 0 = not available, 1 = available, 2 = maybe available, 3 = available with notice.
If you want to take availability with notice into account, you could add columns for that too, such as the number of days. If you want full flexibility, maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.
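The lookup logic such a stored procedure would implement can be sketched in Python before committing to a schema. Everything here is an assumption layered on the answer: the in-memory shapes of the two availability tables, the `cycle_weeks` and `anchor` fields used to express biweekly or 9-week rotations, and the rule that a specific entry overrides a recurring one.

```python
from datetime import date

# Codes follow the answer's suggestion: 0 = not available, 1 = available.
NOT_AVAILABLE, AVAILABLE = 0, 1


def is_available(day, shift, specific, recurring):
    """Resolve availability for one employee on `day` for `shift`.

    specific:  dict mapping (date, shift) -> code, one entry per day and shift.
    recurring: list of dicts with keys weekday (1-7, Monday = 1), shift, code,
               cycle_weeks, and anchor (a date on which the rule applies).
    """
    # A specific record for that day and shift wins over any recurring rule.
    if (day, shift) in specific:
        return specific[(day, shift)] == AVAILABLE
    for rule in recurring:
        if rule["weekday"] != day.isoweekday() or rule["shift"] != shift:
            continue
        # The anchor falls on the same weekday, so the gap is whole weeks.
        weeks_apart = (day - rule["anchor"]).days // 7
        if weeks_apart % rule["cycle_weeks"] == 0:
            return rule["code"] == AVAILABLE
    return False  # no matching record: treated as not available (assumption)


if __name__ == "__main__":
    rules = [
        # Busy every second Monday day shift, starting 2024-01-01.
        {"weekday": 1, "shift": "day", "code": NOT_AVAILABLE,
         "cycle_weeks": 2, "anchor": date(2024, 1, 1)},
        # Otherwise available every Monday day shift.
        {"weekday": 1, "shift": "day", "code": AVAILABLE,
         "cycle_weeks": 1, "anchor": date(2024, 1, 1)},
    ]
    print(is_available(date(2024, 1, 8), "day", {}, rules))
```

Because rules are evaluated in order, the more specific rotation has to be listed before the general weekly rule.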

BigQuery: Why does Table Range Decorators return wrong result sometimes?

I've been using the Table Range Decorators feature daily since May in order to only query the data from the last 7 days in some of my tables.
For the past 2 weeks, I've noticed that some data is sometimes missing when I use that feature. For example, if I run a query to get the results for the last 7 days (by adding "@-604800000--1" to the table name), some data is missing compared with querying the whole table (without a table decorator).
I wonder what could explain this and whether a fix is coming soon.
If this can help the BigQuery team, I've noticed that when using table decorators some data was missing for us for October 16th between around 16:00 and 20:00 UTC.
For the BigQuery team here are 2 jobs ids where some data is missing: job_-xtL4PlIYhNjQ5weMnssvqDmd6U , job_9ASNxqq_swjCd1eMmiQ6SmPpxlQ
and 1 job id where data is correct(without decorators): job_QbcRwYGbQv0BZdHreQEvRlYh-mM
This is a known issue with table decorators containing a time range. Due to a bug in BigQuery, it is possible for certain time ranges to omit data that should be included within the time range.
We're working on a fix and plan to have it released next week. After this fix is deployed time range decorators should again work as expected.
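
For reference, the offsets in a relative range decorator are millisecond counts, so the 7-day value used in the question is 7 * 24 * 3600 * 1000 = 604800000. The helper below is a small sketch that builds such a table reference. It assumes BigQuery legacy SQL's `@` decorator syntax, and the function name is hypothetical.

```python
from datetime import timedelta

ONE_MS = timedelta(milliseconds=1)


def relative_range_decorator(table, start_ago, end_ago=None):
    """Build a legacy SQL table reference with a relative time range decorator.

    Offsets are written as negative millisecond counts relative to now.
    Leaving out end_ago keeps the range open-ended (up to the current time).
    """
    suffix = f"@-{start_ago // ONE_MS}-"
    if end_ago is not None:
        suffix += f"-{end_ago // ONE_MS}"
    return f"[{table}{suffix}]"


if __name__ == "__main__":
    print(relative_range_decorator("dataset.table", timedelta(days=7)))
    print(relative_range_decorator("dataset.table", timedelta(days=7), ONE_MS))
```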