Stored proc to update a table using data stored in a file on a network share - SQL

I have a table in a database that holds the number of each country's credit card transactions, broken out into 10-minute intervals. A separate process generates Excel files with updated transaction counts every 10 minutes. I want to create a stored procedure that reads the data in from these network folders every 10 minutes and compares a country's most recent 10-minute period to its prior 10-minute period. If the count increases by more than 10 percent, it should alert a handful of people by email. I know that I will need a stored procedure to append this data to the current transaction table in the database, but I was wondering whether SQL Server Agent should be used both to read in this file every 10 minutes and to send out the potential alert?
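SQL Server Agent fits both needs: one job on a 10-minute schedule can run an import step (for example, loading the Excel file via SSIS or OPENROWSET with the ACE OLE DB provider) followed by a compare-and-alert step. Below is a minimal sketch of that second step, assuming the file has already been landed in a table; every table, column, profile, and recipient name here is hypothetical.

-- Compare each country's latest 10-minute count to the prior one and
-- email an alert when it rises by more than 10 percent.
CREATE PROCEDURE dbo.usp_CheckTransactionSpike
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @body nvarchar(max);

    WITH Ranked AS (
        SELECT Country,
               PeriodStart,
               TxnCount,
               LAG(TxnCount) OVER (PARTITION BY Country
                                   ORDER BY PeriodStart) AS PrevCount
        FROM dbo.TransactionCounts          -- assumed staging/target table
    )
    SELECT @body = STRING_AGG(
               CONCAT(Country, ': ', TxnCount, ' vs prior ', PrevCount), '; ')
    FROM Ranked
    WHERE PeriodStart = (SELECT MAX(PeriodStart) FROM dbo.TransactionCounts)
      AND PrevCount > 0
      AND TxnCount > PrevCount * 1.10;      -- more than a 10 percent increase

    IF @body IS NOT NULL                    -- only mail when a spike exists
        EXEC msdb.dbo.sp_send_dbmail        -- requires Database Mail configured
             @profile_name = N'AlertProfile',      -- hypothetical mail profile
             @recipients   = N'alerts@example.com',
             @subject      = N'Credit card transaction spike',
             @body         = @body;
END;

The Agent job would then have two steps on its 10-minute schedule: load the newest file, then EXEC dbo.usp_CheckTransactionSpike. Note that STRING_AGG requires SQL Server 2017 or later; on older versions, FOR XML PATH can build the body instead.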

Related

report scheduler system design using database as master

Problem
we have ~50k scheduled financial reports that we periodically deliver to clients via email
each report has its own delivery frequency (date & time, as configured by clients):
weekly
daily
hourly
weekdays only
etc.
Current architecture
we have a table called report_metadata that holds report information
report_id
report_name
report_type
report_details
next_run_time
last_run_time
etc...
every week, all 6 instances of our scheduler service poll the report_metadata database, extract metadata for all reports that are to be delivered in the following week, and put them in an in-memory timed queue.
Only in the master/leader instance (which is one of the 6 instances):
data in the timed-queue is popped at the appropriate time
processed
a few API calls are made to get a fully-complete and current/up-to-date report
and the report is emailed to clients
the other 5 instances do nothing - they simply exist for redundancy
Proposed architecture
Numbers:
db can handle up to 1000 concurrent connections - which is good enough
the total number of existing reports (~50k) is unlikely to get much larger in the near or distant future
Solution:
instead of polling the report_metadata db every week and storing data in a timed-queue in-memory, all 6 instances will poll the report_metadata db every 60 seconds (with a 10 s offset for each instance)
on average the scheduler will attempt to pick up work every 10 seconds
data for any single report whose next_run_time is in the past is extracted, the table row is locked, and the report is processed/delivered to clients by that specific instance
after the report is successfully processed, the table row is unlocked and the next_run_time, last_run_time, etc. for the report are updated
In general, the database serves as the master; individual instances of the process can work independently, and the database ensures they do not overlap (see the claim-query sketch below).
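For the "row is locked" step, a single atomic claim statement is the usual pattern. A minimal sketch, assuming PostgreSQL and a hypothetical state column (SQL Server offers the equivalent via READPAST/UPDLOCK hints):

-- Each instance atomically claims one due report; SKIP LOCKED makes
-- concurrent instances pass over rows another instance already holds.
UPDATE report_metadata
SET    state = 'PROCESSING'
WHERE  report_id = (
           SELECT report_id
           FROM   report_metadata
           WHERE  next_run_time <= now()
             AND  state = 'IDLE'            -- 'state' is an assumed column
           ORDER  BY next_run_time
           LIMIT  1
           FOR UPDATE SKIP LOCKED
       )
RETURNING report_id;

-- After successful delivery, release the row and schedule the next run:
UPDATE report_metadata
SET    state         = 'IDLE',
       last_run_time = now(),
       next_run_time = now() + interval '1 day'  -- per-report frequency in practice
WHERE  report_id = $1;                           -- id returned by the claim above

On indexing, a composite index on (state, next_run_time) would let this claim query find due rows without scanning all ~50k entries.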
It would help if you could let me know:
whether the proposed architecture is a good/correct solution
which table columns can/should be indexed
any other considerations
I have worked on a different kind of scheduler for a program that reported analyses at specific moments of the month/week, and what I did was group the reports into so-called business-cycle-based time moments. These moments are the "start of a new week", the "start of the month", and the "start/end of a day/week/month/quarter/year". So I standardised the moments at which reports are sent and added the report IDs to a table that carries the details of each report. Now you add things to a cycle, or remove them, as needed; you can do this by adding a tag like EOD (end of day), EOM (end of month), SOW (start of week), etc.
So you could index the moments at which clients want to receive the reports and build on that track. Hope this comment helps you with your challenge.
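To illustrate the tagging idea with a hypothetical schema (the cycle_tag column and its values are assumptions, not from the original post):

-- Standardised business-cycle tags, indexed so each delivery moment
-- becomes one indexed lookup instead of per-report time arithmetic.
ALTER TABLE report_metadata ADD COLUMN cycle_tag varchar(8);  -- e.g. 'EOD', 'EOM', 'SOW'
CREATE INDEX ix_report_metadata_cycle_tag ON report_metadata (cycle_tag);

-- At the start of a week, fetch everything due at that moment in one pass:
SELECT report_id, report_name
FROM   report_metadata
WHERE  cycle_tag = 'SOW';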
It seems fine to have all 6 instances simply query that metadata table to check which report to process next, as you are suggesting.
It seems odd, though, to have a staggered approach where each server checks once every 60 seconds, offset by 10 seconds: you have 6 servers now, but that may change. Also, I don't understand the "locking" you are suggesting; why not simply set a flag on the row such as [State] = 'processing', so the next scheduler knows to skip that row and move on to the next available one? Once a run is processed, you can simply update a [Date_last_processed] column, or maybe something like [last_cycle_complete] = 'YES'.
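A sketch of that flag approach, with hypothetical column names mirroring the ones above; because a single UPDATE is atomic, the flag itself acts as the lock:

-- Claim: only one instance's UPDATE will match, so checking the affected
-- row count tells the scheduler whether it won the row.
UPDATE report_metadata
SET    state = 'processing'
WHERE  report_id = :candidate_id
  AND  state = 'available';

-- Release after a successful run:
UPDATE report_metadata
SET    state               = 'available',
       date_last_processed = CURRENT_TIMESTAMP
WHERE  report_id = :candidate_id;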
Alternatively, you could have one server process go through the table and, for each available row, send it off to one of the instances in a round-robin fashion (or keep track of who is busy and who isn't).

How long do a temporary table and a job last on BigQuery by default?

By default, how long does a temp table stay available to get data from? I know that we can set up an expiration, but what is the default?
And what about the job? What's the default expiration time, if there is one?
I tried to find this in the documentation but couldn't. We return the jobId to the client so they can get the data when the job is complete, but some clients store it and try to fetch data with a jobId from 2 weeks or a month ago.
What's the default retention time here, so I can explain it to them better?
Query results are stored for 24 hours:
All query results, including both interactive and batch queries, are cached in temporary tables for approximately 24 hours with some exceptions.
https://cloud.google.com/bigquery/docs/cached-results
As Alexey mentioned, query results are stored for 24 hours when using the cache.
Regarding the lifetime of BigQuery jobs, you can get the job history for the last six months.
On the other hand, based on your description, creating a new table from your query results with an expiration time seems to be the most appropriate strategy. Also, you could check whether materialized views could help you store the results of recurring queries.
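A minimal sketch of that strategy in BigQuery Standard SQL (dataset and table names are hypothetical):

-- Persist query results with an explicit expiration, so clients can fetch
-- them by table name instead of replaying an old jobId.
CREATE TABLE mydataset.report_results
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
) AS
SELECT *
FROM mydataset.source_table
WHERE event_date = CURRENT_DATE();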

Select Data between current time - 15 mins and current time in SQL

I am looking to pull data between two points in time only 15 to 30 minutes apart. I want to be able to rerun the code multiple times to keep updating the data I have already pulled. I know there is a function for the current system time, but I am unable to use it effectively in SQL Developer.
I have tried using the function CURRENT_TIMESTAMP but could not get it to work effectively.
Currently I am using the following code and just pulling over a broad time frame, but I would like to shrink that down to 15-to-30-minute intervals that could be used to keep pulling updated data.
I expect to be able to pull current data in 15-to-30-minute segments of time.
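Since SQL Developer implies Oracle, a sketch along these lines may work (the table and column names are hypothetical):

-- Rows from the last 15 minutes; widen the interval to 30 as needed.
SELECT *
FROM   my_table t
WHERE  t.event_time >= SYSTIMESTAMP - INTERVAL '15' MINUTE
  AND  t.event_time <  SYSTIMESTAMP;

If event_time is a DATE rather than a TIMESTAMP, SYSDATE - 15/1440 is the equivalent arithmetic.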

Analysis to be carried out in SQL

I have a SQL database that captures, every day, each column's data as binary 0 or 1. Now I want to calculate the average number of days a user takes to complete a column. How can I write the query for that?
Please note that the timeline for every user is different, and I want the data in 3-day slabs (for example 1-3, 4-6, and so on).
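The schema isn't shown, so here is a sketch under an assumed PostgreSQL table user_daily(user_id, snapshot_date, flag), where flag flips from 0 to 1 on the day the user completes the column:

-- Days from a user's first record to their first completed day,
-- then bucketed into 3-day slabs (1-3, 4-6, ...).
WITH per_user AS (
    SELECT user_id,
           MIN(snapshot_date) FILTER (WHERE flag = 1)
             - MIN(snapshot_date) + 1 AS days_to_complete
    FROM   user_daily
    GROUP  BY user_id
)
SELECT ((days_to_complete - 1) / 3) * 3 + 1 AS slab_start,  -- 1, 4, 7, ...
       ((days_to_complete - 1) / 3) * 3 + 3 AS slab_end,    -- 3, 6, 9, ...
       COUNT(*)                             AS users,
       ROUND(AVG(days_to_complete), 1)      AS avg_days
FROM   per_user
WHERE  days_to_complete IS NOT NULL   -- users who never completed are excluded
GROUP  BY (days_to_complete - 1) / 3
ORDER  BY slab_start;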

Movable window of fixed length of time - SQL

I have a database of about 100 million customer relation records and 3 million distinct clients
I need to write a SQL script to work out which clients have registered 5 or more complaints within 30 days of each other, over the entire history of the client
I thought that a window function would be the answer but I haven't had any luck
Any ideas would be useful, but efficient ones would be even better, as I have low system priority and my code takes hours to run.
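A sketch with an assumed complaints(client_id, complaint_date) table: a RANGE window counts, for each complaint, the same client's complaints in the preceding 30 days, and any group of 5 within 30 days of each other will trip the count on its latest member (this frame syntax works on PostgreSQL 11+ and Oracle):

-- Single pass over the data; an index on (client_id, complaint_date)
-- should help the sort behind the window function.
WITH windowed AS (
    SELECT client_id,
           COUNT(*) OVER (
               PARTITION BY client_id
               ORDER BY complaint_date
               RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW
           ) AS complaints_30d
    FROM   complaints
)
SELECT DISTINCT client_id
FROM   windowed
WHERE  complaints_30d >= 5;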