How to manually test a data retention requirement in a search functionality? - testing

Say, data needs to be kept for 2years. Then all data that were created 2years + 1day ago should not be displayed and be deleted from the server. How do you manually test that?
I’m new to testing and I can’t think of any other ways. Also, we cannot do automation due to time constraints.

You can create the data with backdating of more than two years in the database and can test, if it is being deleted or not automatically, In other ways ,you can change the current business date from the database and can test it

For the data retention functionality a manual tester needs to remember the search data so that the tester can perform the test cases for the search retention feature.
By Taking an example of a social networking app , being a manual tester you need to remember all the users that you searched for recently.
To check the time period of retention you can take the help from the backend developer so that they can change the time period (from like one year to 10 min) for testing purpose.
Even if you delete the search history and then you start typing the already entered search result the related result should pop on the first location of the search result. Data retention policies concern what data should be stored or archived, where that should happen, and for exactly how long. Once the retention time period for a particular data set expires, it can be deleted or moved as historical data to secondary or tertiary storage, depending on the requirement

Let’s us understand with an example, that we have below data in our database table based on past search made by users. Now with the help of this table, you can perform this testing with minimum effort and optimum result. We have Current Date as - ‘2022-03-10’ and Status column states that data is available / not available in database, where Visible means available, while Expired means deleted from table.

Search Keyword
Search On Date
Search Expiry Date
Status
sport
2022-03-05
2024-03-04
Visible
cricket news
2020-03-10
2022-03-09
Expired - Deleted
holy books
2020-03-11
2022-03-10
Visible
dance
2020-03-12
2022-03-11
Visible

Related

report scheduler system design using database as master

Problem
we have ~50k scheduled financial reports that we periodically deliver to clients via email
reports have their own delivery frequency (date&time format - as configured by clients)
weekly
daily
hourly
weekdays only
etc.
Current architecture
we have a table called report_metadata that holds report information
report_id
report_name
report_type
report_details
next_run_time
last_run_time
etc...
every week, all 6 instances of our scheduler service poll the report_metadata database, extract metadata for all reports that are to be delivered in the following week, and puts them in a timed-queue in-memory.
Only in the master/leader instance (which is one of the 6 instances):
data in the timed-queue is popped at the appropriate time
processed
a few API calls are made to get a fully-complete and current/up-to-date report
and the report is emailed to clients
the other 5 instances do nothing - they simply exist for redundancy
Proposed architecture
Numbers:
db can handle up to 1000 concurrent connections - which is good enough
total existing report number (~50k) is unlikely to get much larger in the near/distant future
Solution:
instead of polling the report_metadata db every week and storing data in a timed-queue in-memory, all 6 instances will poll the report_metadata db every 60 seconds (with a 10 s offset for each instance)
on average the scheduler will attempt to pick up work every 10 seconds
data for any single report whose next_run_time is in the past is extracted, the table row is locked, and the report is processed/delivered to clients by that specific instance
after the report is successfully processed, table row is unlocked and the next_run_time, last_run_time, etc for the report is updated
In general, the database serves as the master, individual instances of the process can work independently and the database ensures they do not overlap.
It would help if you could let me know if the proposed architecture is:
a good/correct solution
which table columns can/should be indexed
any other considerations
I have worked on a differt kind of sceduler for a program that reported analyses on a specific moment of the month/week and what I did was combining the reports to so called business cycle based time moments. these moments are on the "start of a new week", "start of the month", "start/end of a D/W/M/Q/Y'. So I standardised the moments of sending the reports and added the id's to a table that would carry the details of the report. - now you add thinks to the cycle of you remove it when needed, you could do this by adding a tag like(EOD(end of day)/EOM (End of month) SOW (Start of week) ect, ect, ect,).
So you could index the moments of when the clients want to receive the reports and build on that track. Hope that this comment can help you with your challenge.
It seems good to simply query that metadata table by all 6 instances to check which is the next report to process as you are suggesting.
It seems odd though to have a staggered approach with a check once every 60 seconds offset by 10 seconds for your servers. You have 6 servers now but that may change. Also I don't understand the "locking" you are suggesting, why now simply set a flag on the row such as [State] = "processing", then the next scheduler knows to skip that row and move on to the next available one. Once a run is processed, you can simply update a [Date_last_processed] column, or maybe something like [last_cycle_complete] = 'YES'.
Alternatively you could have one server-process to go through the table, and for each available row, sends it off to one of the instances, in a round-robin fashion (or keep track of who is busy and who isn't).

How do I add new rows to SQL automatically by time?

I'm a pretty new programmer and I'm working on a project that I'm not sure how to make work. I'm hoping for some advice please.
Part of the project I'm working on will be used by a company to allow employees to sign up for lunch from their computers. I'm doing the project in MVC ASP.NET
The interface will look something like this:
----------------------
|1200 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
|1230 | Employee Dropdown Name 1
| Employee Dropdown Name 2
|---------------------
and on and on and on.
With this company, everything has to be recorded and stored. So, I already have a table with employee information. That will populate the drop down areas. Lunch times need to be stored in the database so it can be searched years down the line. So it has to be in a table.
The table get more tricky because not every time of the day is available for lunch (i.e. - no lunches after 0430 and before 0800).
My question is about how to create the future time slots in the database.
I could obviously make the table with all of these rows already in places for several years down the line. That's time-consuming, though, and I'll have to go back in in several years and fix it. Horrible idea.
What I'd LOVE to do is make it so every 24 hours, the database just automatically adds new rows with the next days times available - so just increment (at midnight, the program will just add the next day's times associated with that date (so at midnight on February 6, 2020, it will create February 7, 2020 0000, February 7, 2020 0030, etc. I've studied a lot but I'm still beside myself on how to make this work.
Thanks in advance everyone!!!
As I understand, you want to drive your interface from the database table so that the user can select Name 1 and Name 2 and a time slot and submit.
It sounds like you also want the available timeslots to be driven by the database also (ie, timeslot in table without names with it is availlable). This is not a good idea. As you mentioned, you would be inserting data that is not actually a record but a placeholder. That will be very confusing down the track when you come to query the data.
My approach would be to do the following:
* add NOT NULL constraints to all columns in your database (if your database supports this feature) or have your app complain very much about NULLS in any of the columns. There is no need for NULLS in your use case by the look of it.
the database should have a CHECK constraint that the time is within the allowable time range, and (assuming employees can not double book time slots) a CHECK constraint that there is no overlapping time slots, and also a UNIQUE constraint that ensures no duplicate times.... adjust to suit your needs.
your app populates times between 0800 and 1630 (8AM and 4:30PM) and also query the database for all records matching the current day so those booked slots can be removed from the list of available time slots... adjust to suit.
your app sends the user request of name and time slot to the DB. All the critical requirements are accepted or rejected by the DB schema and if there is something wrong, display an appropriate error in the app.
This way, your database is literally storing records of booked lunches.
I would NOT go down the path of pre inserting as then it becomes more complex as some records are "real" and some are artificially generated records to drive a GUI...
If you can't do the time slot calculations in your app rather than in the DB, then at least use a separate table that is maintained by a worker thread in your app OR if your DB supports it, a Stored Procedure which returns a table of available time slots.
I would use the stored procedure if I was avoiding doing complex time calculations in my app (also avoids need to worry about time zones - if you make sure to only store and display UTC times in your DB).
Having in mind structure like this:
LunchTimeSlots (id, time_slot)
Employee (id, name, preferred_time_slot_id, etc)
Lunches(employee_id, time_slot_id, date)
You need a scheduled job to add records to the "Lunches" table every midnight. How to define the job depends on your database vendor. But most of the popular rdbms have this feature. (f.e. mssql)
Despite it's possible to do what you want with db schedulers or any other scheduler, i would recommend to avoid such db design. It's always better to write real facts to the database like a list of employees or fact that lunch was served
to employee at 1pm today.
Unlike real facts, virtual data can be always generated "on-the-fly" by sql queries. F.e. by joining employees to list of dates from today till year 2100, we can get planned lunches for all employees for next 80 years.

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans)| worksite_code_fk (can be null)
I'm really struggling how to wrap my head around recurring events. I could create say, a years worth, of availability days following a pattern in 'x' day cycle. But how far ahead of time do we store information? I can see running into problems when we reach the end of the data set.
I was thinking of storing say, 6 months of information, then adding a server side task that runs monthly to keep the tables updated with 6 months of data, but my intuition is telling me this is a bad fix.
For absolutely flexibility in the future and keeping data from bloating my first thought would be something like
Calendar Dimension Table - Make it for like 100 years or Whatever you Want make it include day of week information etc.
Time Dimension Table - Hour, Minutes, every 15 what ever but only for 24 hour period
Shifts Table - 1 record per shift e.g. Day, Evening, and Night
Specific Availability Table - Relationship to Calendar & Time with Start & Stops recommend 1 record per day so even if they choose a range of 7 days split that to 1 record perday and 1 record per shift.
Recurring Availability Table - for day of week (1-7),Month,WeekOfYear, whatever you can think of. But again I am thinking 1 record per value so if they are available Mondays and Tuesday's that would be 2 rows. and if multiple shifts then it would be multiple rows.
Now and here is the perhaps the weird part, I would put a Available Column on the Specific and Recurring Availability Tables, maybe make it a tiny int and store something like 0 not available, 1 available, 2 maybe available, 3 available with notice.
If you want to take into account Availability with Notice you could add columns for that too such as x # of days. If you want full flexibility maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.

Monitoring Updates over Time Frames and/or SQL Query with Regex counting dates within string

This is probably a fork in the road question. I have a journal blog that date stamps a continuation of a single field within a record.
Example:
Proj #1 (ID): Notes (memo field:) 10/12/2012 - visited site. 10/11/2012 - updated information. 10/11/2012 - call client. 10/10/2012 - Input information.
Proj #2 (ID): Notes (memo field:) 10/10/12 - visited site. 10/10/2012 - call client. 10/9/2012 - Input information. 10/1/2012 - Started project. etc etc...
I need to count how many updates where made over a specific time frame. I know I can create a hidden field and add + 1 everytime there is an update which is useful for an OVERALL update count... but how can i keep track of number of updates over the last 5 days. Like the example above you may update it twice in one day and I may not care about updates made 2 weeks ago.
I think I need to create an SQL that counts the number of "dates" since 10/10/12 or since 10/2/12 etc.
I have done the SQL: SELECT memo FROM Projects WHERE memo IN ('%10/10/12%', '%10/9/2012%' etc)
and then the Len(memoStringCombined) - Len(Replace(searchword""etc)/Len(searchword) and it works fine for countings a single date... but if I have count multiple dates over 30 days it gets to be quite cumbersome to keep rewriting each search word. Is there a regex or obj that can loop through this for me?
Otherwise any other suggestions for counting updates between time frames would be greatly appreciated.
BTW - I can't really justify creating a new table dedicated to tracking updates because there will be 100's of updates for close to 10,000 records which means the update tracking table will be more monstrous than the data... or am I wrong with that idea too?

Design of Databases for storing the details of the recurrent occurrence of an event

I need to implement a feature similar to the one provided by Microsoft Outlook to make your meeting appointment recurrent. I am trying to figure out the optimized Database design that I will be requiring for implementing this feature.
The requirement is something like that each run or task entered by the user will also be applicable for scheduling like a recurrent event - weekly, monthly or yearly. Could you please suggest me the Database model - table structure (with constraints) for storing these details in the DB which can be afterwards accessed by the program to do the appropriate task. Screenshots for some of the possible scheduler details can be found at the following link.
We have a mysql DB running at the backend for storing these details. As soon as the user submits a request, a request id with the details of the request is stored in the table and then a action corresponding to it is taken by the program. More clarification would be like that the users intent is to run a sql script,getting the values and then performing statistical analysis to it. But as the oracle reference DB is dynamically updated by many users, he wants to run it in a recurrent manner and get the analysis done. Note that the mysql db and the ref DB are different.
Please let me know if you require any other details.!
I would suggest storing the details of the first occurence in one table (scheduled tasks) and then the recurance (recurring tasks) details in another.
I might also then be tempted to update the scheduled task table with the next occurance as each task is completed.
As for the Table layout, a rough sketch would be as follows:
[ScehduledTasks]
TaskId (Primary Key)
Description and Details etc...
Start Datetime
End Datetime
[RecurringTasks]
TaskId (Foreign Key)
Frequency : Daily, Weekly, Monthly or Yearly.
DayNo : What Day to run on (1-7 for weekly, 1-31 for monthly, 1-365 for yearly)
Interval : Every x weeks, months etc.
WeekOfMonth : first, second, third... etc If populated then DayNo specifies the day of the week.
MonthOfYear : 1-12.
EndDatetime : The last date to perform
Occurences : The number of times to perform. If this and the previous value are null then perform for ever.
Obvious certain fields would be blank depending on how the task was set up, but I think the above covers all you would need to emulate the tasks in Outlook.