Best way to pull data - sql

We have a SQL table that stores the date an email was created and then another table that gives details about that email (how long they spent writing the email, how long it was in a Draft mode etc). The join between the two tables is through a key.
The problem is that the system only stores the date the email was created (entered into the system), and data is only written to the table when the email is completed (sent). So for yesterday, you may have 1000 emails completed that day, but their create dates and times vary. Also, the create date-time column is only in one table.
My method right now is to join the two tables, and in the where clause I calculate the completed date by adding the number of seconds of the email write time to the created date time.
WHERE DATEADD(s, ISNULL(a.emailwritetime,0), b.CreateDateTime) BETWEEN #start AND #end
(#start and #end are usually the previous day)
The tables have millions of rows, so as expected this takes a while to run, and it's hitting the production server to pull the data. Can anyone suggest a better/cleaner way of pulling the data when you don't know in advance which create date-times belong to emails that finished yesterday?

You should probably use a computed column for the value you're looking for.
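A minimal T-SQL sketch of that idea follows. It rests on assumptions that are not in the question: the table name dbo.EmailDetail, the new column and index names, an integer emailwritetime, and above all that CreateDateTime has been copied into the details table, since a computed column can only reference columns of its own table.

```sql
-- Hypothetical names: dbo.EmailDetail, CompletedDateTime, IX_EmailDetail_Completed.
-- Assumes CreateDateTime is available in the same table as emailwritetime
-- and that emailwritetime is an INT number of seconds.
ALTER TABLE dbo.EmailDetail
    ADD CompletedDateTime AS
        DATEADD(second, ISNULL(emailwritetime, 0), CreateDateTime) PERSISTED;

-- Index the persisted value so the date filter can seek instead of scan.
CREATE INDEX IX_EmailDetail_Completed
    ON dbo.EmailDetail (CompletedDateTime);

-- Previous day as a half-open range: [yesterday 00:00, today 00:00).
DECLARE @end   datetime = CAST(CAST(GETDATE() AS date) AS datetime);
DECLARE @start datetime = DATEADD(day, -1, @end);

SELECT *
FROM dbo.EmailDetail
WHERE CompletedDateTime >= @start
  AND CompletedDateTime <  @end;
```

The point of persisting and indexing the value is that the WHERE clause no longer wraps a column in DATEADD, which is what forces the per-row calculation in the original query.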

What is the best approach for bulk cleaning a database table that has a large amount of duplicated data loaded every day (snowflake db)

Thanks in advance for reading this, I hope I explain my problem.
In one of our domains, we have multiple pipelines where data flows from S3 into a Snowflake staging table using Airflow. The data itself originates from a number of different applications, but the process is always the same. The data is extracted from the application by the support teams (multiple support teams across multiple countries, using different technologies), then moved into AWS S3 and then bulk loaded into Snowflake. Due to limitations on the data from source, there often isn't any filter on the data itself, so effectively the staging table is loaded with the raw CSV every single day, with a file date column added to the data. The result is that we have tables that have been loaded with the same data every single day since 2009.
However, the data does change: from day to day a column value will change, so the file date is very useful for tracking changed attributes and is something that I want to exploit. Further, if the data was cleansed we would need approximately 1% of it.
These tables are huge; some contain around 16 trillion rows, but they can be quite narrow.
I would like to loop efficiently through each day's worth of data and then load only new data into the staging tables, as opposed to just loading everything each day.
I have tried the following
A query that windows over the entire set and compares the hashed value of each row (minus the file date), and then only returns the row if it did not appear in the previous date's data set. This works, but not for the larger tables, as the warehouse starts to write to disk and then it takes hours.
A day-by-day loop that looks at each file date's data set, compares it to the previous day and only loads the difference. This takes too long on the initial clean of the tables, but it is what I am doing once the data has been cleaned, and it will form the initial load procedure.
The current solution is one where I dynamically create multiple MINUS set statements, each looking at one day minus the day before, and then batch these into blocks of 10-20 based on the average daily row size. As an example:
INSERT INTO TEMP TABLE
(Select * FROM TABLE A WHERE FILE_DATE = 040123
MINUS
Select * FROM TABLE A WHERE FILE_DATE = 030123)
UNION ALL
(Select * FROM TABLE A WHERE FILE_DATE = 030123
MINUS
Select * FROM TABLE A WHERE FILE_DATE = 020123)
etc...
This is not pretty, though it does work; however, it's taking me around 12 hours to process 70-odd tables.
I would like advice on whether there is another approach.
Please bear in mind that I am limited to using snowflake due to resourcing issues and politics.
Any guidance and ideas would be much appreciated.
Regards
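For reference, the first approach described above (hash each row without the file date, keep a row only if the same hash was not present on the previous day) could be sketched in Snowflake SQL roughly as follows. The names staging_table, col_a and col_b are hypothetical, and the sketch assumes file_date is a DATE, that there is one load per day with no gaps, and that a day does not contain duplicate rows.

```sql
-- Hypothetical names: staging_table, col_a, col_b.
-- HASH() is a 64-bit, non-cryptographic hash, so collisions are possible;
-- treat the result as a candidate set rather than a guaranteed-exact diff.
WITH hashed AS (
    SELECT s.*,
           HASH(s.col_a, s.col_b) AS row_hash   -- every column except file_date
    FROM staging_table AS s
),
flagged AS (
    SELECT h.*,
           LAG(h.file_date) OVER (
               PARTITION BY h.row_hash
               ORDER BY h.file_date
           ) AS prev_seen                         -- last earlier date with the same content
    FROM hashed AS h
)
SELECT *
FROM flagged
WHERE prev_seen IS NULL                           -- content never seen before
   OR prev_seen <> DATEADD(day, -1, file_date);   -- content absent from the previous day
```

This is only an illustration of the technique being described, not a fix for the spilling problem: the window still has to partition the whole table, which is presumably why it struggles on the largest tables.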

SQL query for limiting records

I have the following SQL query in the Data Flow / Control Flow of an SSIS package, and I want to limit the records with a cutoff point, which is the current system date. It should only return past records, not including today's. I think I need to compare the date field (called 'FinalCloseDate' in the query) with the current system date and pull only the records that happened before today.
Add
AND dbo.Producthit.FinalCloseDate < CAST(GETDATE() AS DATE)
to your WHERE clause.

Use domain of one table for criteria in another in ms Access query?

I am trying to create a report that displays 3 different numbers for each of my projects.
Contract Hours - Stored in projects table, 1 to 1 relationship
Worked Hours - Stored in linked table that will be updated using an external website reporting feature that will contain only data for the dates that are to be displayed in the report, one to many relationship needs to be a sum
Allocated Hours - Stored in a table in my database called allocations and contains data for all dates, one to many relationship needs to be summed.
Right now I have it set up so that the user has to type the date range for the report every time it is run. However, the date range only actually applies to the Allocation data, because the worked hours data comes pre-filtered and the contract data is one to one.
What I would like to do is set up a query that can see the domain of the worked hours and apply it as a date criteria for the allocated hours.
I have attempted to use max and min values of the Worked hours and tried to get creative but I'm actually not even sure if this is possible because I cannot see any simple solution (although I know it should be possible and fairly simple)
Any help, suggestions, or recommendations are appreciated.
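One way this might look in Access SQL is to let subqueries supply the minimum and maximum dates of the worked hours table as the criteria for the allocations. This is a sketch only; every table and field name in it (Allocations, WorkedHours, ProjectID, AllocDate, WorkDate, AllocatedHours) is a hypothetical stand-in for the real ones.

```sql
SELECT a.ProjectID,
       Sum(a.AllocatedHours) AS TotalAllocatedHours
FROM Allocations AS a
WHERE a.AllocDate BETWEEN (SELECT Min(w.WorkDate) FROM WorkedHours AS w)
                      AND (SELECT Max(w.WorkDate) FROM WorkedHours AS w)
GROUP BY a.ProjectID;
```

With this in place the user would not need to type a date range, because the range follows whatever dates are present in the linked worked hours data.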

How to Query for Due Dates in Access 2007

I have two Access 2007 tables with the following fields:
Table 1: Loan Release Table
ReleaseDate as Date
Maturity as Date
MemberName as Text
MemberNo as Text
Term (in months) as Number
Mode (M/Q/Semi-Monthly) as Text
LoanType as Text
LoanAmount as Currency
LoanCode as Text
Table 2: Payments Table
ReceiptNo as Text
DatePaid as Date
MemberName as Text
MemberNo as Text
LoanCode as Text
LoanReceivable as Currency
InterestPaid as Currency
I would like to ask how to use a query to create a temporary table that displays the members who should pay on the current date or a specified date, based on their Term, Mode of Payment and Loan Type (Regular Loans pay every 30 days, Special Loans every 45 days), along with their remaining balance.
Here's my first attempted query: I tried to subtract 30 days from the current date, and it obviously gave me just last month's transactions. I would like it to list all due payments, including, for example, a member with a Regular Loan on a 12-month term who is at their 3rd monthly payment, or a member with a Special Loan that is due today.
I am thinking of creating another table that contains the schedule of payments of every Loan released and then go from there.
Is there another way than this? Like a Query that can be run everyday without the need for a bulky ScheduleOfPayments table?
I'm an office clerk who 'graduated' from Excel and a novice using Access at worst and I'm not afraid of VBA codes if that is necessary.
If you know of a better way of doing this, please do tell me or point me in the right direction. I'm all for learning new things and having read and learned a lot from stackoverflow before, I am sure that with your help, my question is as good as solved.
Thank you guys for reading my inquiry.
You have two options here:
You can write a procedure that will, when needed, calculate/generate a matrix containing the payment schedule for each loan and compare it to the payments made.
You can write a procedure that will, when a loan is created, generate the corresponding records in a payment schedule table. Further comparison will be done between the ScheduledPayment table and the Payment table.
So basically you have to manage a similar set of data, either as a calculated/on the fly matrix or as a permanent set of data kept in a table.
The second version is by far the more effective one. You think of it as bulky, but it is exactly the opposite, and it is indeed what is done every time you get a loan from a bank, where your banker has you sign the repayment schedule.
The table solution will allow you to make use of all querying facilities, while the calculated solution will force you to write specific procedures each time you want to do some data mining. Just think about a question like "What are the expected repayments for the month of April 2014?". Answering this question with the ScheduledPayment table will be as easy as getting a coffee out of your Nespresso machine. Getting the same answer without the ScheduledPayment table will be like having to do the whole coffee production process before getting your cup ready.
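To make the April 2014 example concrete, here is roughly what that query could look like in Access SQL. The ScheduledPayment table name comes from the answer above; the fields DueDate and AmountDue are assumptions about how such a table might be laid out, and LoanCode is borrowed from the question's tables.

```sql
SELECT s.LoanCode,
       s.DueDate,
       s.AmountDue
FROM ScheduledPayment AS s
WHERE s.DueDate >= #4/1/2014#
  AND s.DueDate <  #5/1/2014#
ORDER BY s.DueDate;
```

Replacing the two date literals with Date() and Date() + 1 would give the "who is due today" list the question asks for, under the same assumptions.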

I Need Help Fixing My Small Time Sheet Table - Relational DB - SQL Server

I have a TimeSheet table as:
CREATE TABLE TimeSheet
(
timeSheetID
employeeID
setDate
timeIn
outToLunch
returnFromLunch
timeOut
);
Employees will fill in their time sheet daily, and I want to ensure that they don't cheat. What should I do?
Should I create a column that gets the system date/time when an insert or update happens on the table, and then compare that date/time with the time the employee specified? If so, I will have to create a date/time column for each of timeIn, outToLunch, returnFromLunch and timeOut. I don't know; what do you suggest?
Note: I'm concerned about tracking these 4 columns: timeIn, outToLunch, returnFromLunch and timeOut.
The single table design only allows an employee one break (I'm guessing that lunch is not paid). And it would be difficult to detect fraud short of auditing every record change. I'm thinking something like a two table approach would be more flexible and more secure.
Start by creating a TimeSheetDetail record for every event, i.e. Shift Start, Break Start, Break Stop, Shift End. Allow the employee to record whatever date and time they like in the Entered column. There may be legitimate cases where an employee forgets to clock in or out.
It would be very easy to detect fraud by comparing the Entered value to the AddedOn value before payroll or any other time an audit is needed. You could even detect small fraud where an employee constantly rounds up or down in their favor every day. Ten minutes every day over the course of a year adds up to an extra week.
This design can be further secured by not allowing record updates or deletes.
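-- Column types, keys and defaults below are assumed; adjust to your schema.
CREATE TABLE TimeSheet
(
    TimeSheetId INT IDENTITY(1,1) PRIMARY KEY,
    EmployeeId  INT      NOT NULL,
    AddedOn     DATETIME NOT NULL DEFAULT GETDATE(),     -- populated using GETDATE()
    AddedBy     SYSNAME  NOT NULL DEFAULT SUSER_SNAME()  -- populated using SUSER_SNAME()
);
CREATE TABLE TimeSheetDetail
(
    TimeSheetDetailId INT IDENTITY(1,1) PRIMARY KEY,
    TimeSheetId       INT         NOT NULL REFERENCES TimeSheet (TimeSheetId),
    Type              VARCHAR(20) NOT NULL,  -- Shift Start, Shift End, Break Start, Break End
    Entered           DATETIME    NOT NULL,  -- date and time reported by the employee
    AddedOn           DATETIME    NOT NULL DEFAULT GETDATE(),     -- populated using GETDATE()
    AddedBy           SYSNAME     NOT NULL DEFAULT SUSER_SNAME()  -- populated using SUSER_SNAME()
);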
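The Entered versus AddedOn comparison described above could be run as an audit query along these lines. It is a sketch against the two tables just defined; the 15-minute tolerance is an arbitrary assumption, not something the design prescribes.

```sql
-- List events whose reported time differs from the server-recorded
-- insert time by more than an assumed tolerance of 15 minutes.
SELECT ts.EmployeeId,
       d.Type,
       d.Entered,
       d.AddedOn,
       DATEDIFF(minute, d.Entered, d.AddedOn) AS MinutesBetween
FROM TimeSheetDetail AS d
JOIN TimeSheet AS ts
    ON ts.TimeSheetId = d.TimeSheetId
WHERE ABS(DATEDIFF(minute, d.Entered, d.AddedOn)) > 15
ORDER BY ts.EmployeeId, d.Entered;
```

Grouping the same difference per employee, for example with AVG, would be one way to surface the habitual small rounding mentioned above.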
If you're that concerned about employee dishonesty about their working hours, then install a manual punch card clock in/clock out system and treat them like factory shop floor workers.
Failing that, a trigger that archives off the changed record with a date-time stamp against it will allow you to see at what time every change to a timesheet was made, and a case for fraud could be made. So you'd need something like a TimeSheetHistory table, with the additional columns for time of change and user making the change (populated using GETDATE() or similar, and SUSER_SNAME() or similar if you're using Windows authentication).
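A rough T-SQL sketch of such a history table and archiving trigger is shown below. The TimeSheetHistory name comes from the paragraph above and the seven data columns from the question's table; the column types, the ChangedOn and ChangedBy names, and the trigger name are assumptions.

```sql
-- Assumed types; ChangedOn / ChangedBy are hypothetical column names.
CREATE TABLE TimeSheetHistory
(
    timeSheetID     INT      NOT NULL,
    employeeID      INT      NOT NULL,
    setDate         DATE     NOT NULL,
    timeIn          TIME     NULL,
    outToLunch      TIME     NULL,
    returnFromLunch TIME     NULL,
    timeOut         TIME     NULL,
    ChangedOn       DATETIME NOT NULL DEFAULT GETDATE(),
    ChangedBy       SYSNAME  NOT NULL DEFAULT SUSER_SNAME()
);
GO  -- batch separator for SSMS/sqlcmd; CREATE TRIGGER must start its own batch

CREATE TRIGGER trg_TimeSheet_Archive
ON TimeSheet
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the row values as they were before the change.
    INSERT INTO TimeSheetHistory
        (timeSheetID, employeeID, setDate, timeIn, outToLunch, returnFromLunch, timeOut)
    SELECT timeSheetID, employeeID, setDate, timeIn, outToLunch, returnFromLunch, timeOut
    FROM deleted;
END;
GO
```

Each change then leaves behind the previous version of the row, stamped with when it was replaced and by which login.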
Of course you are concerned about this; it is one of the basic requirements for most timesheet applications! No one should be able to change their own timesheet once submitted without a supervisor override. This is to prevent time-card fraud and is thus a legal issue that should not be subverted. Otherwise, employees who get paid overtime could submit a correct timesheet for approval by the supervisor, change it to add hours just before payroll is run, and then change it back afterwards. This is a critical feature that any timesheet application must have.
First, you need to have a history table to store a record of all the changes and who made them.
Next, you need an update trigger that prevents updates unless a timesheet has been reopened.
Third, you need a field for timesheet status. An insert/update trigger will ensure that only people in the management group can change a submitted status to a returned status and that no one can return his own timesheet without a different person approving it. In the terms I learned when working for an audit agency, this is an internal control, because it is known that two people are far less likely to join together to commit fraud than one person is to commit it alone.
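The second and third points might be combined in a trigger roughly like the one below. It is a sketch under several assumptions: a NOT NULL Status column has been added to TimeSheet, the status values are 'Submitted' and 'Returned', and managers belong to a database role or Windows group called TimeSheetManagers. None of these names appear in the question. The rule that nobody may return their own timesheet is not covered here, since it needs a mapping from employeeID to login.

```sql
-- Hypothetical: Status column, its values, the TimeSheetManagers group
-- and the trigger name are all assumptions for illustration.
CREATE TRIGGER trg_TimeSheet_BlockSubmitted
ON TimeSheet
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- A submitted row may only change by being returned, and only
    -- when the current user is a member of the management group.
    IF EXISTS (
        SELECT 1
        FROM deleted AS d
        JOIN inserted AS i
            ON i.timeSheetID = d.timeSheetID
        WHERE d.Status = 'Submitted'
          AND NOT (i.Status = 'Returned'
                   AND ISNULL(IS_MEMBER('TimeSheetManagers'), 0) = 1)
    )
    BEGIN
        RAISERROR('Submitted timesheets must be returned by a manager before they can be changed.', 16, 1);
        ROLLBACK TRANSACTION;
    END;
END;
```

Once a manager has set the status to Returned, later updates pass the check, because the old row is no longer in the Submitted state.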