How to speed up an sql query - sql

I have a table which lists advertising actions by brands. Each action is recorded by one entry as shown in the excel file which you can see in the drive folder below. There are about 7mil records in the table.
I am running several queries which run fine but i am facing an issue with one specific report. This report lists and ranks the clients (Column L in Excel file) by total amounts spent on advertising. This report takes about a minute to generate if the date range set is anywhere from two months to two years.
Below is a copy of the query
select customer_id, sum(value) as value
from `data`
where ((`date` >= '2019-01-01' and `date` <= '2019-12-31'))
group by `customer_id`
order by `value` desc, `customer_id` asc;
This report takes about a minute to generate if the date range set is anywhere from 2 months to two years. If my date range selection is for one month or less, it takes less than 3 seconds.
I need to get this query to process in less than 10-15 seconds max. We tried to come up with ideas such as creating a new table in the DB especially for this query but it hit a wall when we saw that we still had to keep all date records, and we could therefore not group the results in the table.
We really are open to any kind of idea that would make this query faster, including DB changes.
Below is the link to the folder which contains a copy of the DB structure and a sample data set exported from the data table which contains all the data.
Drive Folder

You should make an index on fields that are essential to the criteria of your query. In this case something like:
CREATE INDEX `idx_data_date` ON `data`(`date`);
CREATE INDEX `idx_data_customer_id` ON `data`(`customer_id`);
You should also avoid storing dates as text. Use DATETIME if you can.

Related

Calculating the difference in days from two records in an Access Database

I am creating an Access Database from a very complex Excel Spreadsheet. The process has been going well until I got to this problem. The solution is easy in Excel, but I cannot figure out how to do it in Access.
Here is what I had before in Excel.
I had a list of Customers on one sheet with multiple fields. I then had another sheet act as a report that would run a VBA macro to search through the table of all customers and list out every customer by name that was an inbound call from our contact center (Que Call), when that call came and then would calculate a third column for the number of days between calls. This last column is where I am running into difficulties translating to Access. In Excel, I would just have it do something like in cell C3 =SUM(B3-B2). Given that the table looked like this:
Column A Column B Column C
Row 1 Name Date Time Lapse
Row 2 Customer 1 7/1/2019 ----------
Row 3 Customer 2 7/2/2019 =SUM(B3-B2) <-- 1 day
Row 4 Customer 3 7/4/2019 =SUM(B4-B3) <-- 2 days
In Access:
I have a report that goes through my table of customers and lists off only those from our contact center (Que Call), but I can't figure out how to put in the calculation of time between calls as the design only allows me to affect one row. How do I make this calculation? Is it a SQL query that I need to do? I would prefer to not have to have a separate table for call center calls or a separate column in my customers table to calculate this as some customers are not from the call center. Can I just run a report or a query. Any advise or help would be greatly appreciated.
Current SQL Code:
SELECT
[Customers].FullName,
[Customers].ID,
[Customers].QueCall,
[Customers].Status,
[Customers].InterestLevel,
[Customers].State,
[Customers].Product,
[Customers].Created,
[Customers].LastContact,
[Customers].PrimaryNote
FROM
Customers
WHERE
((([Customers].QueCall)=True));
ORDER BY
[Customers].Created;
Describe exactly how it isn't working (error message, unexpected results, etc...)
It just lists out the customers and does not allow me to calculate the difference between when the records were created (ie when they were first contacted). I have found many things online about how to calculate the difference between two columns of the same record, but not between two different records; nor two different records that may not be sequentially after each other as there may be other non Que Call customers between records in the customer table.
Describe the desired results
I would like to have a column in the end report that shows how many days lapsed between records that were que calls.
Thank you in advance for any input that you may have.
Consider a correlated aggregate subquery where an inner query from same source, Customer, is correlate with outer query by same ID (assumed to be unique identifier) with date comparison (assumed to be Created field). Notice the use of table alias, c and sub for the correlation.
Use DateDiff for difference between dates. To use this query, place below query into the SQL mode of Query Designer and save the object to be used as recordsources to forms, reports, opened on its own, or used in application code as recordsets.
SELECT
c.FullName,
c.ID,
c.QueCall,
c.Status,
c.InterestLevel,
c.State,
c.Product,
c.Created,
c.LastContact,
c.PrimaryNote,
(SELECT TOP 1 SUM(DateDiff("d", sub.Created, c.Created))
FROM Customer sub
WHERE sub.ID = c.ID
AND sub.Created < c.Created
GROUP BY sub.Created
ORDER BY sub.Created DESC) AS TimeElapsed
FROM
Customers c
WHERE
(((c.QueCall)=True));
ORDER BY
c.Created;
Do be aware for large tables this correlated subquery can be taxing in time and performance. Allow time to complete and look into storing output in a temp table with a Make-Table Query to avoid re-run.

MS Access - Log daily totals of query in new table

I have an ODBC database that I've linked to an Access table. I've been using Access to generate some custom queries/reports.
However, this ODBC database changes frequently and I'm trying to discover where the discrepancy is coming from. (hundreds of thousands of records to go through, but I can easily filter it down into what I'm concerned about)
Right now I've been manually pulling the data each day, exporting to Excel, counting the totals for each category I want to track, and logging in another Excel file.
I'd rather automate this in Access if possible, but haven't been able to get my heard around it yet.
I've already linked the ODBC databases I'm concerned with, and can generate the query I want to generate.
What I'm struggling with is how to capture this daily and then log that total so I can trend it over a given time period.
If it the data was constant, this would be easy for me to understand/do. However, the data can change daily.
EX: This is a database of work orders. Work orders(which are basically my primary key) are assigned to different departments. A single work order can belong to many different departments and have multiple tasks/holds/actions tied to it.
Work Order 0237153-03 could be assigned to Department A today, but then could be reassigned to Department B tomorrow.
These work orders also have "ranking codes" such as Priority A, B, C. These too can be changed at any given time. Today Work Order 0237153-03 could be priority A, but tomorrow someone may decide that it should actually be Priority B.
This is why I want to capture all available data each day (The new work orders that have come in overnight, and all the old work orders that may have had changes made to them), count the totals of the different fields I'm concerned about, then log this data.
Then repeat this everyday.
the question you ask is very vague so here is a general answer.
You are counting the items you get from a database table.
It may be that you don't need to actually count them every day, but if the table in the database stores all the data for every day, you simply need to create a query to count the items that are in the table for every day that is stored in the table.
You are right that this would be best done in access.
You might not have the "log the counts in another table" though.
It seems you are quite new to access so you might benefit form these links videos numbered 61, 70 here and also video 7 here
These will help or buy a book / use web resources.
PART2.
If you have to bodge it because you can't get the ODBC database to use triggers/data macros to log a history you could store a history yourself like this.... BUT you have to do it EVERY day.
0 On day 1 take a full copy of the ODBC data as YOURTABLE. Add a field "dump Number" and set it all to 1.
1. Link to the ODBC data every day.
join from YOURTABLE to the ODBC table and find any records that have changed (ie test just the fields you want to monitor and if any of them have changed...).
Append these changed records to YOURTABLE with a new value for "dump number of 2" This MUST always increment!
You can now write SQL to get the most recent record for each primary key.
SELECT *
FROM Mytable
WHERE
(
SELECT PrimaryKeyFields, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS T1
ON t1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND t1.MAXDumpNumber= Mytable.DumpNumber
You can compare the most recent records with any previous records.
ie to get the previous dump
Note that this will NOT work in the abvoe SQL (unless you always keep every record!)
AND t1.MAXDumpNumber-1 = Mytable.DumpNumber
Use something like this to get the previous row:
SELECT *
FROM Mytable
INNER JOIN
(
SELECT PrimaryKeyFields
, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
INNER JOIN
(
SELECT PrimaryKeyFields
, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS TabLatest
ON TabLatest.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND
TabLatest.MAXDumpNumber <> Mytable.DumpNumber
-- Note that the <> is VERY important
GROUP BY PrimaryKeyFields
) AS T1
ON t1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND t1.MAXDumpNumber= Mytable.DumpNumber
Create 4 and 5 and MS Access named queries (or SS views) and then treate them like tables to do comparison.
Make sure you have indexes created on the PK fields and the DumpNumber and they shoudl be unique - this will speed things up....
Finish it in time for christmas... and flag this as an answer!

Inefficient query or am I reaching the limits of Access?

I have been taking an online class on relational databases and created an Access database (for the first time) to practice my SQL queries and solve a couple of work-related problems along the way. The database consists of three tables, with the primary table being used to record company wide sales summary information at the branch/store/menu item level (e.g. lowest level of detail) and with three periods of data the database is presently 1.3GB with that one table containing 4,262,421 records.
Everything has gone well until I attempted to run the following query:
SELECT P1.*, P13.[Price?] AS P13Price
FROM (SELECT * FROM PBASE WHERE Period = 13) AS P13, (SELECT * FROM PBASE WHERE Period = 1) AS P1
WHERE P1.Key = P13.Key and P1.[Price?]<>P13.[Price?];
To explain, the big table is PriceAccData and so I first ran a query (PBASE) that added a field to the PriceAccData that I can use as a key to compare price changes from one period to the next (combination of branch, store, menu item). Then I used subqueries to create a data set from the last period of 2013 (Period 13) and the first period of 2014 (Period 1)....from there I attempted to identify items that had changed in price from one period to the next in the Where clause.
Is there a more efficient way to write the query or to accomplish the comparison....it will work for one branch at a time, but takes a long time and locks up Access if I run it for more than one branch.
Subqueries are always known to be inefficient and are used as last resort. There's usually a way to JOIN tables for better efficiency. I suggest something in the line of :
SELECT ...... FROM PBASE P13 INNER JOIN PBASE P1 ON P13.KEY=P1.KEY
this will give you the data for the 2 periods then you can check for your equality criteria. Let me know if you need further help for that

Sql Server 2008 partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a default column in the table "Insert_date" that gets a value of Getdate() and then partition on this somehow?
OR
I can create a column in the table "insert_date" and put a hard coded value of today's date.
What would the partition function look like?
Would seperate tables and a partitioned view be better suited?
I have tried both, and even though I think partition tables are cooler. But after trying to teach how to maintain the code afterwards it just wasten't justified. In that scenario we used a hard coded field date field that was in the insert statement.
Now I use different tables ( 31 days / 31 tables ) + aggrigation table and there is an ugly union all query that joins togeather the monthly data.
Advantage. Super timple sql, and simple c# code for bcp and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .net / sql gurus I would choose the partitioning strategy.

How do put the Time Dimension into order which I imported from an excel spreadsheet to SQL Server?

I imported a Time Dimension from an excel spreadsheet to SQL Server.
The time dimension start date is 2005-07-01 to 2025-12-31 (aussie format)
Tha attributes is composed of
TimeKey Date Date_Name Year Year_Name Half_Year Half_Year_Name Quarter Quarter_Name and all the way to fiscal attributes.
Anyways, when I created this TimeDim in the excel spreadsheet, it was in order, arranged properly from 2005-07-01 to 2025-12-31. I imported the spreadsheet in sql server then when I query using a select * from TimeDim.
The results are shuffled, dates are disarray.
Is there anyway to fix this? Im willing to truncate or drop the table then import the spreadsheet again so long it could fix the problem.
Many Thanks!!
Beau
The order in which a table stores data is dependent on the clustered index that you define.
However, even if you define a clustered index for your date column, simply selecting the entire table does not guarantee that your data will be returned in that order.
The only way to guarantee data is selected in your desired order is by specifying the ORDER BY clause in your select statement.