Use the domain of one table as criteria in another in an MS Access query?

I am trying to create a report that displays 3 different numbers for each of my projects.
Contract Hours - stored in the Projects table; one-to-one relationship.
Worked Hours - stored in a linked table that will be updated via an external website's reporting feature and will contain data only for the dates to be displayed in the report; one-to-many relationship, needs to be summed.
Allocated Hours - stored in a table in my database called Allocations, which contains data for all dates; one-to-many relationship, needs to be summed.
Right now I have it set up so that the user has to type the date range for the report every time it is run. However, the date range only actually applies to the allocation data, because the worked-hours data comes pre-filtered and the contract data is one-to-one.
What I would like to do is set up a query that can see the domain of the worked hours and apply it as date criteria for the allocated hours.
I have attempted to use the max and min values of the worked hours and tried to get creative, but I am not even sure this is possible, because I cannot see a simple solution (although I suspect it should be possible and fairly simple).
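To illustrate, here is a rough sketch of the shape I have in mind, using Access's DMin and DMax domain aggregate functions; Allocations, AllocDate, ProjectID, AllocatedHours, WorkedHours, and WorkDate are all placeholder names:
SELECT a.ProjectID, Sum(a.AllocatedHours) AS TotalAllocatedHours
FROM Allocations AS a
WHERE a.AllocDate BETWEEN DMin("WorkDate", "WorkedHours")
                      AND DMax("WorkDate", "WorkedHours")
GROUP BY a.ProjectID;
Here DMin and DMax read the date domain of the linked worked-hours table, so the allocation rows would be filtered to the same window without the user typing anything.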
Any help, suggestions, or recommendations are appreciated.

What is the best approach for bulk cleaning a database table that has a large amount of duplicated data loaded every day (snowflake db)

Thanks in advance for reading this; I hope I explain my problem clearly.
In one of our domains, we have multiple pipelines where data flows from S3 into a Snowflake staging table using Airflow. The data itself originates from a number of different applications, but the process is always the same: the data is extracted from the application by the support teams (multiple support teams across multiple countries, using different technologies), landed in AWS S3, and then bulk loaded into Snowflake. Due to limitations on the data from source, there often isn't any filter on the data itself, and effectively the staging table is loaded with the raw CSV every single day, with a file date column added to the data. The result is that we have tables that have been loaded with the same data every single day since 2009.
However, the data does change: from day to day a column value will change, so the file date is very useful for tracking changed attributes and is something I want to exploit. Furthermore, if the data were cleansed, we would need only approximately 1% of it.
These tables are huge; some contain around 16 trillion rows, but they can be quite narrow.
I would like to loop through each day's worth of data and load only new data into the staging tables, as opposed to just loading everything each day.
I have tried the following:
A query that windows over the entire set, compares the hashed value of each row (minus the file date), and returns a row only if it did not appear in the previous date's data set (roughly the shape sketched below). This works, but not for the larger tables, where the warehouse starts to spill to disk and it takes hours.
A day-by-day loop that looks at each file date's data set, compares it to the previous day's, and loads only the difference. This takes too long for the initial clean of the tables, but it is what I will do once the data has been cleaned, and it will form the initial load procedure.
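For illustration, the first attempt was roughly the following shape in Snowflake SQL (a sketch only; TABLE_A, TEMP_TABLE, and the payload columns COL1-COLN are placeholders):
-- Keep each distinct row payload only on the first file date it appears;
-- note this will not re-emit a row that reverts to an earlier value.
INSERT INTO temp_table
SELECT col1, col2, coln, file_date
FROM table_a
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY HASH(col1, col2, coln)  -- every column except file_date
    ORDER BY file_date
) = 1;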
My current solution is to dynamically create multiple MINUS set statements, taking each day minus the day before, and then batch these into blocks of 10-20 based on the average daily row size. For example:
INSERT INTO TEMP_TABLE
(SELECT * FROM TABLE_A WHERE FILE_DATE = 040123
 MINUS
 SELECT * FROM TABLE_A WHERE FILE_DATE = 030123)
UNION ALL
(SELECT * FROM TABLE_A WHERE FILE_DATE = 030123
 MINUS
 SELECT * FROM TABLE_A WHERE FILE_DATE = 020123)
etc...
This is not pretty, though it does work; however, it is taking around 12 hours to process 70-odd tables.
I would like advice on whether there is another approach.
Please bear in mind that I am limited to using snowflake due to resourcing issues and politics.
Any guidance and ideas would be much appreciated.
Regards

SQL - How do I expand a dataset to do a cohort analysis?

This is my first post so apologies if something is posted incorrectly - please let me know and I will fix it.
I am trying to build a SQL query in BigQuery that creates a cohort analysis, so I can see how many customers have been retained over time by the month they joined (their cohort).
I work in insurance, so we have data on customers, when they joined, and any time they changed the policy (e.g., when they added a car to their coverage), but I do not have it laid out as every month of premium. The data is as follows:
[Image: the data as stored vs. how I need the data]
Do you know how I could fill in the missing months?
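For concreteness, the kind of expansion needed might look like this in BigQuery (a sketch only; the customers table and the customer_id and join_date columns are placeholder names):
-- One row per customer per month, from their cohort month to the current month.
SELECT
  c.customer_id,
  DATE_TRUNC(c.join_date, MONTH) AS cohort_month,
  activity_month
FROM customers AS c,
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(c.join_date, MONTH),
    DATE_TRUNC(CURRENT_DATE(), MONTH),
    INTERVAL 1 MONTH)) AS activity_month;
The policy-change records could then be joined onto this expanded month grid to fill in the missing months.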

How do I add new rows to SQL automatically by time?

I'm a pretty new programmer and I'm working on a project that I'm not sure how to make work. I'm hoping for some advice please.
Part of the project I'm working on will be used by a company to allow employees to sign up for lunch from their computers. I'm doing the project in ASP.NET MVC.
The interface will look something like this:
|------|--------------------------|
| 1200 | Employee Dropdown Name 1 |
|      | Employee Dropdown Name 2 |
|------|--------------------------|
| 1230 | Employee Dropdown Name 1 |
|      | Employee Dropdown Name 2 |
|------|--------------------------|
and on and on and on.
With this company, everything has to be recorded and stored, so I already have a table with employee information; that will populate the dropdown areas. Lunch times need to be stored in the database so they can be searched years down the line, so they have to be in a table.
The table gets more tricky because not every time of the day is available for lunch (i.e., no lunches after 0430 and before 0800).
My question is about how to create the future time slots in the database.
I could obviously make the table with all of these rows already in place for several years down the line. That's time-consuming, though, and I'll have to go back in several years and fix it. Horrible idea.
What I'd LOVE to do is make it so that every 24 hours the database automatically adds new rows with the next day's available times. At midnight, the program would just add the next day's times associated with that date (so at midnight on February 6, 2020, it would create February 7, 2020 0000; February 7, 2020 0030; etc.). I've studied a lot, but I'm still beside myself on how to make this work.
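For concreteness, the nightly insert I'm picturing would look something like this in T-SQL (a sketch only; the TimeSlots table and SlotTime column are placeholder names):
-- At midnight, insert tomorrow's half-hour time slots.
DECLARE @slot DATETIME = CAST(DATEADD(DAY, 1, CAST(GETDATE() AS DATE)) AS DATETIME);
DECLARE @stop DATETIME = DATEADD(DAY, 1, @slot);
WHILE @slot < @stop
BEGIN
    INSERT INTO TimeSlots (SlotTime) VALUES (@slot);
    SET @slot = DATEADD(MINUTE, 30, @slot);
END;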
Thanks in advance everyone!!!
As I understand it, you want to drive your interface from the database table so that the user can select Name 1 and Name 2 and a time slot and submit.
It sounds like you also want the available time slots to be driven by the database (i.e., a time slot in the table with no name against it is available). This is not a good idea. As you mentioned, you would be inserting data that is not actually a record but a placeholder. That will be very confusing down the track when you come to query the data.
My approach would be to do the following:
* Add NOT NULL constraints to all columns in your database (if your database supports this feature), or have your app complain very loudly about NULLs in any of the columns. There is no need for NULLs in your use case, by the look of it.
* The database should have a CHECK constraint that the time is within the allowable time range, a constraint that there are no overlapping time slots (assuming employees cannot double-book time slots), and a UNIQUE constraint that ensures no duplicate times (see the sketch after this answer)... adjust to suit your needs.
* Your app populates times between 0800 and 1630 (8 AM and 4:30 PM) and also queries the database for all records matching the current day, so booked slots can be removed from the list of available time slots... adjust to suit.
* Your app sends the user's request of name and time slot to the DB. All the critical requirements are accepted or rejected by the DB schema, and if something is wrong, the app displays an appropriate error.
This way, your database is literally storing records of booked lunches.
I would NOT go down the path of pre-inserting, as it becomes more complex when some records are "real" and some are artificially generated records that exist only to drive a GUI...
If you can't do the time-slot calculations in your app rather than in the DB, then at least use a separate table that is maintained by a worker thread in your app, or, if your DB supports it, a stored procedure which returns a table of available time slots.
I would use the stored procedure if I were avoiding complex time calculations in my app (it also avoids the need to worry about time zones, if you make sure to store and display only UTC times in your DB).
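As a sketch of those constraints (hypothetical T-SQL; the Lunches and Employees names are placeholders, and the no-overlap rule would need a trigger or app logic rather than a plain CHECK):
CREATE TABLE Lunches (
    EmployeeId INT NOT NULL REFERENCES Employees (EmployeeId),
    SlotTime   DATETIME NOT NULL,
    -- keep bookings inside the allowed lunch window
    CONSTRAINT CK_Lunches_Window
        CHECK (CAST(SlotTime AS TIME) BETWEEN '08:00' AND '16:30'),
    -- no duplicate booking for the same employee and slot
    CONSTRAINT UQ_Lunches_EmployeeSlot UNIQUE (EmployeeId, SlotTime)
);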
Bearing in mind a structure like this:
LunchTimeSlots (id, time_slot)
Employee (id, name, preferred_time_slot_id, etc)
Lunches(employee_id, time_slot_id, date)
You need a scheduled job to add records to the Lunches table every midnight. How to define the job depends on your database vendor, but most of the popular RDBMSs have this feature (e.g., MSSQL).
Although it's possible to do what you want with DB schedulers or any other scheduler, I would recommend avoiding such a DB design. It's always better to write real facts to the database, like a list of employees or the fact that lunch was served to an employee at 1 PM today.
Unlike real facts, virtual data can always be generated on the fly by SQL queries. For example, by joining employees to a list of dates from today until the year 2100, we can get planned lunches for all employees for the next 80 years.
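To illustrate with a sketch (hypothetical T-SQL against the schema above; the 365-day horizon and the sys.all_objects row source are arbitrary choices):
-- Generate a year of planned lunch dates per employee on the fly,
-- instead of storing placeholder rows.
SELECT e.id AS employee_id,
       e.preferred_time_slot_id,
       DATEADD(DAY, n.n, CAST(GETDATE() AS DATE)) AS lunch_date
FROM Employee AS e
CROSS JOIN (
    SELECT TOP (365) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects
) AS n;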

SSRS - Report doesn't load via Report Builder, while it does via SQL

This is my first question here.
I struggled for days trying to find a solution everywhere, with no success.
Basically, I have a standard stored procedure pulling out a report dataset in a few seconds (5-6 seconds).
It aggregates (GROUP BY and SUM) 23,000 rows.
Indeed, my final dataset comes out with 4 rows and 33 columns, executing, as said, in 5-6 seconds.
Unfortunately, when trying to load it via Report Builder, it loads endlessly (querying SQL Server, the stored procedure remains stuck in a RUNNING status forever).
Everything in Report Builder (DB accesses, dataset, parameters, matrix...) is configured correctly: indeed, I was able to load the report until I added a few (4) additional fields.
The SQL dataset is basically something like:
PARAMETERS DECLARATION
SELECT
    FIELDS
FROM (
    SELECT
        FIELD_A,
        SUMS
    FROM TABLE
    JOIN TABLES
    WHERE PARAMETERS MATCHING
    GROUP BY A
) AS B
ORDER BY FIELD
An "external layer" SELECT was needed to make some calculations on some FIELDS, also in some cases using some PARAMETERS.
That's it.
I usually work with huge datasets, sometimes pulling out 30,000 rows with 110 fields, and if something loads via SQL it has always also loaded via Report Builder: this is the very first time it has behaved this way.
So I'm asking whether there are some strange SSRS/Report Builder limitations I have never come across in my experience.
Any help would be really really appreciated!
Thanks in advance to everyone who'll spend time :)

Use a Query from the Destination db to limit OLE DB Source task in SSIS 2008

All,
I have a package that I'm building as a data importer so I can copy sets of data from my production environment and develop on another instance.
I have two tables that contain header and detail rows for service tickets. Those service tickets are tied back to orders.
I am pulling the service tickets from a certain time window; however, the originating orders fall outside of the date range that I'm pulling for the tickets.
I want to be able to take the following steps in an SSIS package:
1. Import the header and detail rows within the given date range from prod to dev
2. Select the relevant order numbers from the dev tables
3. Use the list of order numbers to import only the relevant orders from prod
I poked through other answers and couldn't find answers that addressed this directly, so I apologize if there is an answer out there and I missed it. I may not have been asking the question correctly. I'm assuming that I would need to pull those order numbers into a temp table or variable in order to apply them as a filter.
As I write this, it just crossed my mind to use a join on the source system between the ticket and order tables and still use the date range to limit, but I'm still posting the question to see if anyone has dealt with this before.
Your steps are already fairly clear; are you asking how to actually implement them? It looks like you can do all three steps by using SELECT statements in your data sources:
1. Build a SELECT statement dynamically with the correct dates to use in your data source. The dates could be programmatically generated in a script task, or saved in a database table and populated into variables. Then you copy the data across to the dev system.
2. Run a SELECT statement in the dev system that returns the order numbers, and copy the results to a table in the prod database.
3. Run a SELECT statement in the prod database that joins on the table from step 2, and copy the results back to dev (see the sketch below).
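For example, the step-3 source query in prod might look like this (a sketch only; Orders, OrderNumber, and ImportOrderNumbers are placeholder names):
-- Pull only the orders whose numbers were staged from dev in step 2.
SELECT o.*
FROM Orders AS o
JOIN ImportOrderNumbers AS i
    ON i.OrderNumber = o.OrderNumber;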
An alternative to the table in steps 2 and 3 would be a lookup transformation, but if you have a large number of rows then using a table will probably be faster.