Store Report Data in a SQL Table

I have a report that is run every quarter. The report is based on current values and creates a score card. We do this for about 50 locations and then have to manually create a report comparing the previous run to the current run. I'd like to automate this by saving the report data to a table for each location and each quarter; then we can run reports that show how the data changes over time.
Data Sample:
Employees Active
Employees with ref checks
Clients Active
Clients with careplans
The reports are fairly complex and pull data from many different tables, so recreating them via a single query may not work or may be just as complex. Any ideas on how to get the report data into a table without having to export each report to a CSV or Excel file and import it manually?

If each score card has some dimensions (or metric names) and aggregate values (or metric values), then you can just add a time series table with columns for:
date
location or business unit
or instead of date and location, a scorecard ID (linking to another table with scorecard metadata)
dimension grouping
scores/values/metrics
Then, assuming you're creating the reports with a stored procedure, you can add a flag parameter to the stored procedure so it also updates this table while generating a specific report on a date. This might be less work and/or faster than importing from CSVs, especially if you store intermediate report data in a temporary table that you can select from when additionally storing the data into the time series table described above.
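A minimal sketch of both pieces, using SQL Server syntax for concreteness; ScorecardHistory, usp_GenerateScorecard, and #ScorecardResults are all hypothetical names, not your actual objects:

CREATE TABLE ScorecardHistory (
    ReportDate  date          NOT NULL,
    LocationId  int           NOT NULL,
    MetricName  varchar(100)  NOT NULL,  -- e.g. 'Employees Active'
    MetricValue decimal(18,2) NULL,
    CONSTRAINT PK_ScorecardHistory PRIMARY KEY (ReportDate, LocationId, MetricName)
);

ALTER PROCEDURE usp_GenerateScorecard
    @LocationId  int,
    @ReportDate  date,
    @SaveHistory bit = 0   -- new flag: also persist this run
AS
BEGIN
    -- ... existing report logic populates #ScorecardResults ...
    IF @SaveHistory = 1
        INSERT INTO ScorecardHistory (ReportDate, LocationId, MetricName, MetricValue)
        SELECT @ReportDate, @LocationId, MetricName, MetricValue
        FROM #ScorecardResults;

    -- return the report as usual
    SELECT MetricName, MetricValue FROM #ScorecardResults;
END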

Related

What is the best approach for bulk cleaning a database table that has a large amount of duplicated data loaded every day (snowflake db)

Thanks in advance for reading this; I hope I explain my problem clearly.
In one of our domains, we have multiple pipelines where data flows from S3 into a Snowflake staging table using Airflow. The data itself originates from a number of different applications, but the process is always the same: the data is extracted from the application by the support teams (multiple support teams across multiple countries, using different technologies), landed in AWS S3, and then bulk loaded into Snowflake. Due to limitations on the data from source, there often isn't any filter on the data itself, and effectively the staging table is loaded with the raw CSV every single day; a file date column is added to the data. The result is that we have tables that have been loaded with the same data every single day since 2009.
However, the data does change: from day to day a column value will change, so the file date is very useful for tracking changed attributes and something that I want to exploit. Furthermore, if the data were cleansed, we would need only approximately 1% of it.
These tables are huge; some contain around 16 trillion rows, but they can be quite narrow.
I would like to loop through each day's worth of data and load only new data into the staging tables, as opposed to just loading everything each day.
I have tried the following:
A query that windows over the entire set, compares the hashed value of each row (minus the file date), and only returns a row if it did not appear in the previous date's data set (see the sketch below). This works, but not for the larger tables, as the warehouse starts to spill to disk and it then takes hours.
A day-by-day loop that looks at each file date's data set, compares it to the previous day, and only loads the difference. This takes too long for the initial clean of the tables, but it is what I am doing once the data has been cleaned, and it will form the initial load procedure.
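For illustration, a minimal sketch of the windowed hash comparison from the first attempt; TABLE_A, FILE_DATE, and the column list are placeholders for the real staging table:

SELECT *
FROM (
    SELECT t.*,
           LAG(FILE_DATE) OVER (
               PARTITION BY HASH(COL1, COL2, COL3)  -- every column except FILE_DATE
               ORDER BY FILE_DATE
           ) AS PREV_FILE_DATE
    FROM TABLE_A t
) x
WHERE PREV_FILE_DATE IS NULL                        -- hash never seen before
   OR DATEDIFF(day, PREV_FILE_DATE, FILE_DATE) > 1; -- or absent on the prior day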
My current solution dynamically creates multiple MINUS set statements, each comparing one day against the day before, and then batches these into blocks of 10-20 based on the average daily row size. As an example:
INSERT INTO TEMP_TABLE
(SELECT * FROM TABLE_A WHERE FILE_DATE = 040123
 MINUS
 SELECT * FROM TABLE_A WHERE FILE_DATE = 030123)
UNION ALL
(SELECT * FROM TABLE_A WHERE FILE_DATE = 030123
 MINUS
 SELECT * FROM TABLE_A WHERE FILE_DATE = 020123)
etc...
This is not pretty, though it does work; however, it is taking around 12 hours to process 70-odd tables.
I would like advice on whether there is another approach.
Please bear in mind that I am limited to using Snowflake due to resourcing issues and politics.
Any guidance and ideas would be much appreciated.
Regards

Multi-user Saving Query in SQL

I am working with VS 2010 (C#) and SQL Server 2014.
My entry program generates a new transaction number on saving, and that number is stored in tables. I have a header and detail concept in my tables, meaning that in the header table this data is stored:
Party Name/Code
Transaction No
Date
and in the detail table, these pieces of information are stored:
Item Code/Name
Unit
Rate
Quantity
In the header table, I have an incremented column, i.e. SrNo, as an integer.
On saving, I first save to the header table, then get the max SrNo from the header table, and then store that in the detail table.
This works fine on a single user/machine, but when multiple users/machines save at the same time, the data is stored under a different SrNo in the detail table.
How can I store entries correctly when multiple users save from the same program at the same time?
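For illustration, the race-prone pattern described above looks roughly like this; all table, column, and variable names here are hypothetical. One common fix in SQL Server is to make SrNo an IDENTITY column and read back the value generated by your own insert via SCOPE_IDENTITY(), rather than querying the table-wide maximum:

-- Race-prone: another user can insert between these two statements,
-- so MAX(SrNo) may return the other user's row
INSERT INTO HeaderTable (PartyCode, TransactionNo, TransDate)
VALUES (@PartyCode, @TransactionNo, @TransDate);
SELECT @SrNo = MAX(SrNo) FROM HeaderTable;

-- Safer: SrNo is an IDENTITY column, so read this session's generated value
INSERT INTO HeaderTable (PartyCode, TransactionNo, TransDate)
VALUES (@PartyCode, @TransactionNo, @TransDate);
SELECT @SrNo = SCOPE_IDENTITY();

INSERT INTO DetailTable (SrNo, ItemCode, Unit, Rate, Quantity)
VALUES (@SrNo, @ItemCode, @Unit, @Rate, @Quantity);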

Use domain of one table for criteria in another in MS Access query?

I am trying to create a report that displays 3 different numbers for each of my projects.
Contract Hours - Stored in the projects table; one-to-one relationship.
Worked Hours - Stored in a linked table that will be updated using an external website's reporting feature and will contain only data for the dates to be displayed in the report; one-to-many relationship, needs to be summed.
Allocated Hours - Stored in a table in my database called Allocations, which contains data for all dates; one-to-many relationship, needs to be summed.
Right now I have it set up so that the user has to type the date range for the report every time it is run; however, the date range only actually applies to the allocation data, because the worked hours data comes pre-filtered and the contract data is one-to-one.
What I would like to do is set up a query that can see the domain (date range) of the worked hours and apply it as a date criterion for the allocated hours.
I have attempted to use the max and min values of the worked hours and tried to get creative, but I'm not even sure if this is possible, because I cannot see any simple solution (although I know it should be possible and fairly simple).
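A minimal sketch of one way to express this in Access SQL, with hypothetical table and field names (WorkedHours.WorkDate, Allocations.AllocDate, Allocations.ProjectID, Allocations.AllocatedHours): subqueries pull the min and max dates from the worked hours data and apply them as criteria to the allocations.

SELECT A.ProjectID, Sum(A.AllocatedHours) AS SumOfAllocatedHours
FROM Allocations AS A
WHERE A.AllocDate BETWEEN
      (SELECT Min(W.WorkDate) FROM WorkedHours AS W)
  AND (SELECT Max(W.WorkDate) FROM WorkedHours AS W)
GROUP BY A.ProjectID;

Access's domain functions would also work here, e.g. DMin("WorkDate", "WorkedHours") in place of the subqueries.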
Any help, suggestions, or recommendations are appreciated.

SQL Server Query: Daily Data Snapshot Comparison (Counting Delta Occurrences)

I am working towards counting customer subscription ("package") changes. To do this, I select all data from my package table once every day. I am calling the daily query results "snapshots" (approx 500k rows). I then load the snapshot data into a new table. After 10 days I have a total of 5 million rows in the snapshots table (500k rows * 10 days). The majority of customers (65%) do not change packages. I need to report which of the remaining 35% of customers are switching packages, when they are switching, what package changes they are making (from "package X" to "package Y"), and which customers are changing packages most frequently.
The query I have written uses a self-join. I am identifying the changes but my results contain duplicate rows.
This is my query:
SELECT *
FROM UserPackageDump UPD1, UserPackageDump UPD2
WHERE UPD1.user_id = UPD2.user_id
  AND UPD1.package_id <> UPD2.package_id
How can I change this query to yield only distinct results?
SELECT DISTINCT *
FROM UserPackageDump UPD1
JOIN UserPackageDump UPD2
    ON UPD1.user_id = UPD2.user_id
WHERE UPD1.package_id <> UPD2.package_id
You have many options for doing this, and I'm not sure your approach is the right one to take. Firstly, to answer your specific question, you could perform a DISTINCT as per @sqlab's answer. Or you could include the date in the join, ensuring that UPD1 only matches a record in UPD2 that is one day different.
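A sketch of that date-in-join variant; snapshot_date is a hypothetical name for whatever column records the day each snapshot was taken:

SELECT UPD1.user_id,
       UPD1.package_id AS old_package_id,
       UPD2.package_id AS new_package_id,
       UPD2.snapshot_date AS switch_date
FROM UserPackageDump UPD1
JOIN UserPackageDump UPD2
    ON UPD1.user_id = UPD2.user_id
   AND UPD2.snapshot_date = DATEADD(day, 1, UPD1.snapshot_date)
WHERE UPD1.package_id <> UPD2.package_id;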
However, to come back to the approach: there should be no need to take a full copy of all the data. You have lots of other options for more efficient data storage, some of which are:
Put a "LastUpdated" datetime2 field in the table, populated each time a row is changed. Copy only those rows that have a LastUpdated more recent than the last time the copy was made. Assuming the only possible change to the table is to the package_id, you will then only have rows in the copy for users that have changed.
Create a UserPackageHistory table into which rows are written each time a user subscribes to a package, at the same time that UserPackage is updated (see the sketch below). This leaves you with much the same result as the first option, but in advance of running the copy job.
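A minimal sketch of the history-table option, with hypothetical column names; here an AFTER UPDATE trigger on UserPackage writes a row whenever package_id changes:

CREATE TABLE UserPackageHistory (
    user_id        int       NOT NULL,
    old_package_id int       NOT NULL,
    new_package_id int       NOT NULL,
    change_date    datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

CREATE TRIGGER trg_UserPackage_History
ON UserPackage
AFTER UPDATE
AS
BEGIN
    -- deleted holds the pre-update rows, inserted the post-update rows
    INSERT INTO UserPackageHistory (user_id, old_package_id, new_package_id)
    SELECT d.user_id, d.package_id, i.package_id
    FROM deleted d
    JOIN inserted i ON i.user_id = d.user_id
    WHERE i.package_id <> d.package_id;
END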
Then, with any one of these sets of data, to satisfy the reporting requirements you could populate a cube. Your source would be a set of rows containing user_id, old_package_id, new_package_id and date. You would create a measure group containing these measures:
Distinct count of user_id
Count of switches (basically just the row count of the source data)
This measure group could then be related to the following dimensions:
Date, so you can see when the switches are taking place
User, so you can drill down on who is switching
Switch Type, which is a dimension built by selecting the old_package_id and new_package_id from your source data. This gives you the ability to see the popularity of particular shifts.

How to autofill table column in MS Access?

I am trying to develop a simple database that stores takings information for a taxi (daily figures etc.), and there are some calculations that I would like to have auto-filled from basic information supplied by the driver, such as:
gross takings, given a start and end value from a metering device
km's driven, given an odometer reading
driver/owner split, given the day's takings
The problem is that I would like to store all these values in a single attribute to make retrieval and entry into another third-party system easier. Currently this is paper based, and I am trying to digitize the process.
The operations are simple mathematical expressions such as addition, subtraction, and a percentage split (multiplication or division).
I've tried various SQL commands like:
INSERT INTO table (fieldname)
SELECT
    table.fieldname1, table.fieldname2, [fieldname2]-[fieldname1]
FROM
    table
I will be using an input form for data entry that will display the basic data input and a driver's share of takings/expenses based upon these calculations.
And I'm drawing a blank. I'm using MS Access 2007.
You can do:
INSERT INTO table (fieldname)
SELECT CStr(table.fieldname1) & CStr(table.fieldname2) & CStr([fieldname2]-[fieldname1])
FROM table;
But as @Tarik said, it is not recommended to store all fields in one column, unless it is some temp table or just for viewing.
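If the combined value is only needed for display or export, a sketch of computing it on the fly instead (same placeholder table and field names as above, derived column name hypothetical):

SELECT table.fieldname1,
       table.fieldname2,
       [fieldname2]-[fieldname1] AS DriverShare
FROM table;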