Access 2016 query runtime - sql

I am trying to find the number of IPs that correspond to each big category (big_cat). To do this I need to JOIN 3 tables.
I have the following 3 tables:
Large categories:
small_cat | big_cat
final_parsed_userlogs_access_longuri:
ip | uri
all_categories_from_all_unique_uri:
uri | small_cat
And the following SQL query in Access 2016:
SELECT COUNT(Final_parsed_userlogs_access_longuri.ip), Large_categories.big_cat
FROM (All_categories_from_all_unique_uri
INNER JOIN Final_parsed_userlogs_access_longuri
ON All_categories_from_all_unique_uri.uri = Final_parsed_userlogs_access_longuri.uri)
INNER JOIN Large_categories
ON All_categories_from_all_unique_uri.small_cat = Large_categories.small_cat
GROUP BY Large_categories.big_cat;
The tables have 2.2 million, 4.4 million and 1.1 million rows (in the order listed above). Obviously this will take some time to run, but the query has been running for 1.5 hours now and it still hasn't finished.
Is there a way to make this query run faster? I have already indexed all fields. If not, is there a way to estimate how long this query will take (through some equation or something)?

Related

MSAccess Slow Updates on Self-Joined table

I am trying to improve the performance of updating only about 60K rows with data coming from different rows in the same table. At about 2 minutes, it's not terrible, but it's not great either, and my application really doesn't work if you have to wait so long between recalculations.
The app generates a set of financial statements for a business. It calculates basic formulas on 1300 line items, such as Rent, Direct Labor, or Inventory costs, all of which roll up to totals that mimic the Balance Sheet, P&Ls, Cash Flow, etc. Many of the line items need to be calculated month by month; for instance, it has to figure out April's On Hand Inventory before it knows April's Inventory Value. So the whole program ends up looping through 48 months over 30 calculation passes, requiring about 8000 SQL statements (fortunately it figures that all out by itself!). Each SQL statement takes only a few milliseconds, but it adds up.
I'm pretty sure I can't reduce the number of loops, so I keep trying to figure out how to make each SQL quicker. The basic structure is as follows:
LI: Line item table that holds the basic info of each item, primary key LID
LID Name
123 Sales_1
124 Sales_2
200 Total Sales
Formula: Master/Detail tables that create any formula from the line items
Total sales=Sales_1 + Sales_2
or
{200}={123}+{124}
(I use curly braces to be able to find and replace the LIDs within the formula, as shown in the SQL below)
FC: Formula Calculation table: all line items by month, about 1300 items x 48 months=62K records. Primary key FID
FID SQL_ID LID LID_brace LIN OutputMonth Formula Amount
3232 25 123 {123} Sales_1 1 1200
3255 26 124 {124} Sales_2 1 1500
5454 177 200 {200} Total Sales 1 {123}+{124}
DMO: Operand join table, which links a formula to its detail lines within the same table. Once Sales_1 is calculated, it can find the Total Sales record and update it, which is then evaluated and sends its amount up the chain to the other LIDs that depend on it, such as Total Income. It locates the record to update based on the SQL_ID, which is set based on the calc pass and month. It's complex to set up, but pretty straightforward once you actually run things.
Master_FID Detail_FID
5454 3232 (links total sales to sales_1)
5454 3255 (links total sales to sales_2)
SQL1:
UPDATE (FC INNER JOIN DMO ON FC.FID = DMO.Master_FID) INNER JOIN FC AS FC2 ON DMO.Detail_FID = FC2.FID SET FC.Formula = Replace(FC.Formula, FC2.LID_brace, FC2.Amount) WHERE FC.SQL_ID = 177
The above will change {123}+{124} to 1200+1500, which will then evaluate to 2700 when I run the following
SQL2:
UPDATE FC SET FC.Amount = Eval(FC.Formula) WHERE FC.calc_sql_id = 177
So those two SQL statements are run over and over again, and the only thing that changes is the SQL_ID.
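The substitute-then-evaluate cycle described above can be sketched outside Access to make the two steps concrete. This is only an illustration in Python, not the poster's implementation: the `amounts` mapping stands in for the joined detail rows, and `substitute` and `evaluate` are hypothetical helper names playing the roles of SQL1 and of `Eval()` in SQL2.

```python
import ast
import operator

# Amounts already calculated for the detail line items (from the FC sample rows).
amounts = {"{123}": 1200, "{124}": 1500}

# Arithmetic operators the evaluator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def substitute(formula, values):
    """Replace each {LID} placeholder with its amount (the job of SQL1)."""
    for brace, amount in values.items():
        formula = formula.replace(brace, str(amount))
    return formula


def evaluate(expression):
    """Evaluate a plain arithmetic string (the job of Eval() in SQL2)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))


expanded = substitute("{123}+{124}", amounts)  # the Total Sales formula
total = evaluate(expanded)
print(expanded, "=", total)
```

The sketch walks the parsed expression instead of calling Python's built-in `eval`, so only numbers and basic arithmetic are accepted.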
There are indexes on the SQL_ID, LID, FID etc
When measuring, the milliseconds per record range from 0.04 ms if many records are included (~10K for some passes) up to 10 or 15 ms when just one record is updated. Perhaps the setup of the query causes a lot of overhead, because the time doesn't seem to be a function of the number of records updated? It's also not very consistent: some runs take 20+ ms, compared to less than 3 ms when the same query runs again.
I know this is a complex question I'm asking that probably doesn't have a simple answer, but I'm just looking for directions for what might help. For instance, would a parameter query help if there isn't much change between runs? Does Access have an easier time running a query if it knows about it in advance, i.e. a named query with parameters vs. dynamic SQL? Am I just doomed because it still needs to run those 8000 queries?
Also, is there inherently a problem with trying to update the same table through a secondary join table, and/or is there a better way to do it?
Is it also because string replacing isn't efficient this way? If I tried RegEx would that be quicker? I would have to make a function that could do that within a query, but it seems like that's going to be slower.
Thanks in advance, this has been a most vexing problem!!!

Access 2010 doubling the sum in query

I know this question has been asked and answered. I understand the problem and I understand the underlying cause and I understand the solution. What I DON'T understand is how to implement the solution.
I'll try to be detailed....
Background: Each material is being grouped on WellID (I work in oil and gas) and SandType, which is my primary key in each table; these come from 2 lookup tables, one for each.
I have 3 tables that store material (sand) weights at 3 different stages in the job process. Basically the weight from the engineer's DESIGN, what was DELIVERED and what is in INVENTORY.
I know that the join is messed up and is adding the total for each row in each table, sometimes doubling, tripling, etc.
I am grouping on WellID and SandID.
Now I don't want someone to do the work for me. I just don't know how or where in Access to restrict it to what I want, or, if modifying the SQL, the proper way to write the code. My current workaround is 3 separate sum queries, one for each table, but that is going to get inefficient and adds steps.
My whole database purpose and subsequent reports hinge off math on these 3 numbers so, my show stopper here is putting the fat lady on stage, and is about to become a deal breaker at the end of the line!
I need some advice, direction, criticism, wisdom, witty euphemisms or a new job!
The 3 tables look as follows
Design:
T_DESIGN
DesignID WellID Sand_ID Weight_DES Time_DES
89 201 1 100 4/21/2014 6:46:02 AM
98 201 2 100 4/21/2014 7:01:22 AM
86 201 4 100 4/21/2014 6:28:01 AM
93 228 5 100 4/21/2014 6:53:34 AM
91 228 1 100 4/21/2014 6:51:23 AM
92 228 1 100 4/21/2014 6:53:30 AM
Delivered:
T_BOL
BOLID WellID_BOL SandID_BOL Weight_BOL
279 201 1 100
280 201 1 100
281 228 2 5
282 228 1 10
283 228 9 100
Inventory:
T_BIN
StrapID WellID_BIN SandID_BIN Weight_BIN
11 201 1 100
13 228 1 10
14 228 1 0
17 228 1 103
19 201 1 50
The Query Results:
Test Query99
WellID SandID Sum Of Weight_DES Sum Of Weight_BOL Sum Of Weight_BIN
201 1 400 400 300
228 1 600 60 226
SQL:
SELECT DISTINCTROW L_WELL.WellID, L_SAND.SandID,
Sum(T_DESIGN.Weight_DES) AS [Sum Of Weight_DES],
Sum(T_BOL.Weight_BOL) AS [Sum Of Weight_BOL],
Sum(T_BIN.Weight_BIN) AS [Sum Of Weight_BIN]
FROM ((L_SAND INNER JOIN
(L_WELL INNER JOIN T_DESIGN ON L_WELL.[WellID] = T_DESIGN.[WellID_DES])
ON L_SAND.SandID = T_DESIGN.[SandID_DES])
INNER JOIN T_BIN
ON (L_WELL.WellID = T_BIN.WellID_BIN)
AND (L_SAND.SandID = T_BIN.SandID_BIN))
INNER JOIN T_BOL
ON (L_WELL.WellID = T_BOL.WellID_BOL) AND (L_SAND.SandID = T_BOL.SandID_BOL)
GROUP BY L_WELL.WellID, L_SAND.SandID;
Two lookup tables hold the Well Names and Sand Types. (The Well table has been abbreviated due to size.)
L_Well:
WellID WellName_WELL
3 AAGVIK 1-35H
4 AARON 1-22
5 ACHILLES 5301 41-12B
6 ACKLINS 6092 12-18H
7 ADDY 5992 43-21 #1H
8 AERABELLE 5502 43-7T
9 AGNES 1-13H
10 AL 5493 44-23B
11 ALDER 6092 43-8H
12 AMELIA FEDERAL 5201 41-11B
13 AMERADA STATE 1-16X
14 ANDERSMADSON 5201 41-13H
15 ANDERSON 1-13H
16 ANDERSON 7-18H
17 ANDRE 5501 13-4H
18 ANDRE 5501 14-5 3B
19 ANDRE SHEPHERD 5501 14-7 1T
Sand Lookup:
LSand
SandID SandType_Sand
1 100 Mesh
2 20/40 EP
3 20/40 RC
4 20/40 W
5 30/50 Ceramic
6 30/50 EP
7 30/50 RC
8 40/70 EP
9 40/70 W
10 NA See Notes
Querying and Joining Aggregation Data through an MS Access Database
I noticed your concern for pointers on how to implement some of the theory behind your aggregation queries. While SQL queries are good power-tools to get to the core of a difficult analysis problem, it might also be useful to show some of the steps on how to bring things together using the built-in design tools of MS Access.
This solution was developed on MS Access 2010.
Comments on Previous Solutions
@xQbert had a solid start with the following SQL statement. The sub query approach could be visualized as individual query objects created in Access:
FROM
((SELECT WellID, Sand_ID, Sum(Weight_DES) AS SumWeightDES
FROM T_DESIGN
GROUP BY WellID, Sand_ID) AS A
INNER JOIN
(SELECT WellID_BOL, Sum(Weight_BOL) AS SumWeightBOL
FROM T_BOL
GROUP BY WellID_BOL) AS B
ON A.WellID = B.WellID_BOL)
INNER JOIN
(SELECT WellID_BIN, Sum(Weight_BIN) AS SumWeightBIN
FROM T_BIN
GROUP BY WellID_BIN) AS C
ON C.WellID_BIN = B.WellID_BOL
Depending on the actual rules of the business data, the following assumptions made in this query may not necessarily be true:
Will the tables of T_DESIGN, T_BOL and T_BIN be populated at the same time? The sample data has mixed values, i.e., there are WellID and SandID combinations which do not have values for all three of these categories.
INNER type joins assume all three tables have records for each dimension value (Well-Sand combination)
@Frazz improved on the query design by suggesting that whatever is selected as the "base" joining table (T_DESIGN in this case), this table must be populated with all the relevant dimensional values (WellID and SandID combinations).
SELECT
WellID_DES AS WellID,
SandID_DES AS SandID,
SUM(Weight_DES) AS Weight_DES,
(SELECT SUM(Weight_BOL) FROM T_BOL WHERE T_BOL.WellID_BOL=d.WellID_DES
AND T_BOL.SandID_BOL=d.SandID_DES) AS Weight_BOL,
(SELECT SUM(Weight_BIN) FROM T_BIN WHERE T_BIN.WellID_BIN=d.WellID_DES
AND T_BIN.SandID_BIN=d.SandID_DES) AS Weight_BIN
FROM T_DESIGN AS d;
(... note: a group-by statement should be here...)
This was an improvement because now all joins originate from a single point. If a key-value does not exist in either T_BOL or T_BIN, results will still come back and the entire record of the query would not be lost.
Again, it may be possible that there are no T_DESIGN records matching to values stored in the other tables.
Building Aggregation Sub Query Objects
The presented data does not suggest that there is any direct interaction between the data in each of the three tables aside from lining up their results in the end for presentation based on a common key-value pair (WellID and SandID). Since we are using Access, there is a chance to do these calculations separately.
This query was designed using the "summarizing" feature of the Access query design tool. Its output, after pointing to the T_DESIGN table, looked like this:
Making Dimension Table Through a Cartesian Product
There are mixed opinions out there about cartesian products, but they do actually have a purpose.
Most of the concern is that a runaway cartesian product query will make millions and millions of nonsensical data values. In this query, it's specifically designed to simulate a real business condition.
The Case for a Cartesian Product
Picking from the sample data provided:
Some of the Sand Types: "20/40 EP", "30/50 Ceramic", "40/70 EP", and "30/50 RC" that are moved between their respective wells, are these sand types found at these wells consistently throughout the year?
Without an anchoring dimension for the key-values, Wells would not be found anywhere in the database via querying. It's not that they do not exist... it's just that there is no recorded data (i.e., Sand Type Weights delivered) for them.
A Reference Dimension Query Product
A dimension query is simple to produce. By referencing the two sources of keys: L_WELL and L_SAND (both look up tables or dimensional tables) without identifying a join condition, all the different combinations of the two key-values (WellID and SandID) are made:
The shortcut in SQL looks like this:
SELECT L_WELL.WellID, L_SAND.SandID, L_WELL.WellName, L_SAND.SandType
FROM L_SAND, L_WELL;
The resulting data looks like this:
Instead of using any of the operational data tables: T_DESIGN, T_BOL, or T_BIN as sources of data for a static dimension such as a list of Oil Wells, or a catalog of Sand Types, that data has been predetermined and can even be transferred to a real table since it probably will not change much once it is created.
Correlating Sub Query Results from Different Sources
After repeating the process and creating the summary tables for the other two sources (T_BOL and T_BIN), you can finally arrange the results through a simple query and join process.
The actual JOIN operations are between the dimension table/query: QSUB_WELL_SAND and all three of the summary queries: QSUB_DES, QSUB_BOL, and QSUB_BIN.
I have chosen to implement LEFT OUTER joins. If you are not sure of the difference between the different "outer" joins, this is the choice I made through the Access Query Design dialogue:
QSUB_WELL_SAND is defined as our anchor dimension. It will always have more records than any of the other tables. An OUTER JOIN should be defined to KEEP all reference dimension records... and all Summary Table query results, regardless if there is a match between the two Query results.
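The anchor-dimension idea can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for Access, with views playing the role of saved query objects. The query names follow the ones used in this answer; the view column names (SumDES, SumBOL, SumBIN) and the well names are assumptions for illustration, and the sample rows are trimmed to sand types 1 and 2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE L_WELL (WellID INTEGER, WellName TEXT);
CREATE TABLE L_SAND (SandID INTEGER, SandType TEXT);
CREATE TABLE T_DESIGN (WellID_DES INTEGER, SandID_DES INTEGER, Weight_DES INTEGER);
CREATE TABLE T_BOL (WellID_BOL INTEGER, SandID_BOL INTEGER, Weight_BOL INTEGER);
CREATE TABLE T_BIN (WellID_BIN INTEGER, SandID_BIN INTEGER, Weight_BIN INTEGER);

-- Well names are placeholders; the weights come from the question's sample data.
INSERT INTO L_WELL VALUES (201, 'Well 201'), (228, 'Well 228');
INSERT INTO L_SAND VALUES (1, '100 Mesh'), (2, '20/40 EP');
INSERT INTO T_DESIGN VALUES (201,1,100), (201,2,100), (228,1,100), (228,1,100);
INSERT INTO T_BOL VALUES (201,1,100), (201,1,100), (228,2,5), (228,1,10);
INSERT INTO T_BIN VALUES (201,1,100), (228,1,10), (228,1,0), (228,1,103), (201,1,50);

-- Anchor dimension: every Well/Sand combination (no join condition).
CREATE VIEW QSUB_WELL_SAND AS
    SELECT L_WELL.WellID, L_SAND.SandID FROM L_SAND, L_WELL;

-- One summary query per operational table.
CREATE VIEW QSUB_DES AS
    SELECT WellID_DES AS WellID, SandID_DES AS SandID, SUM(Weight_DES) AS SumDES
    FROM T_DESIGN GROUP BY WellID_DES, SandID_DES;
CREATE VIEW QSUB_BOL AS
    SELECT WellID_BOL AS WellID, SandID_BOL AS SandID, SUM(Weight_BOL) AS SumBOL
    FROM T_BOL GROUP BY WellID_BOL, SandID_BOL;
CREATE VIEW QSUB_BIN AS
    SELECT WellID_BIN AS WellID, SandID_BIN AS SandID, SUM(Weight_BIN) AS SumBIN
    FROM T_BIN GROUP BY WellID_BIN, SandID_BIN;
""")

# QSUB_WEIGHTS: keep every anchor row, attach a summary where one exists.
rows = con.execute("""
    SELECT k.WellID, k.SandID, d.SumDES, b.SumBOL, n.SumBIN
    FROM QSUB_WELL_SAND AS k
    LEFT JOIN QSUB_DES AS d ON d.WellID = k.WellID AND d.SandID = k.SandID
    LEFT JOIN QSUB_BOL AS b ON b.WellID = k.WellID AND b.SandID = k.SandID
    LEFT JOIN QSUB_BIN AS n ON n.WellID = k.WellID AND n.SandID = k.SandID
    ORDER BY k.WellID, k.SandID
""").fetchall()

for row in rows:
    print(row)
```

Combinations with no recorded weights come back with NULL (None in Python) in the missing columns instead of disappearing from the result, which is the effect the LEFT OUTER joins are chosen for.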
QSUB_WEIGHTS: The Query to Combine All Sub Query Results
This is what the design of the final output query looks like:
This is what the data output looks like when this query design is executed:
Conclusions and Clean Up: Some Closing Thoughts
With respect to the join to the dimension query, there is a lot of empty space where there are no records or data to report on. This is where a cleverly placed filter or query criteria can shrink the output to exactly what you care to look at the most. Here's how mine looked after I added additional ending query criteria:
My data was based on what was supplied by the OP, except where the ID's assigned to the Well Type attribute did not match the sample data. The values I assigned instead are posted below as well.
Access supports a different style of database operations. Step-wise queries can be developed to hold pre-processed, special sets of data that can be reintroduced to the other data tables and query results to develop complex query criteria.
All this being said, programming in SQL can also be just as rewarding. Be sure to explore some of the differences between the results and the capabilities you can tap into by using one approach (SQL coding), the other approach (Access design wizards), or both. There's definitely a lot of room to grow and discover new capabilities from just the example provided here.
Hopefully I haven't stolen all the fun from developing a solution for your situation. I read into your comment about "building more on top" as the harbinger of more fun to come, so I don't feel so bad...! Happy Developing!
Data Modifications from the Sample Set
Without understanding L_SAND and L_WELL this is the best I could come up with..
use sub selects to get the sums first so you don't compound the data issues on the joins.
SELECT A.WellID, A.Sand_ID, A.SumWeightDES, B.WellID_BOL, B.SumWeightBOL,
C.WellID_BIN, C.SumWeightBIN
FROM
((SELECT WellID, Sand_ID, Sum(Weight_DES) AS SumWeightDES
FROM T_DESIGN
GROUP BY WellID, Sand_ID) AS A
INNER JOIN
(SELECT WellID_BOL, Sum(Weight_BOL) AS SumWeightBOL
FROM T_BOL
GROUP BY WellID_BOL) AS B
ON A.WellID = B.WellID_BOL)
INNER JOIN
(SELECT WellID_BIN, Sum(Weight_BIN) AS SumWeightBIN
FROM T_BIN
GROUP BY WellID_BIN) AS C
ON C.WellID_BIN = B.WellID_BOL
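To see why summing first matters, the question's sample rows can be loaded into a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than Access, and it assumes the column names used in the question's SQL (WellID_DES, SandID_DES and so on). It groups the sub selects by both WellID and SandID, which goes a step further than the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T_DESIGN (DesignID INTEGER, WellID_DES INTEGER, SandID_DES INTEGER, Weight_DES INTEGER);
CREATE TABLE T_BOL (BOLID INTEGER, WellID_BOL INTEGER, SandID_BOL INTEGER, Weight_BOL INTEGER);
CREATE TABLE T_BIN (StrapID INTEGER, WellID_BIN INTEGER, SandID_BIN INTEGER, Weight_BIN INTEGER);
INSERT INTO T_DESIGN VALUES (89,201,1,100), (98,201,2,100), (86,201,4,100),
                            (93,228,5,100), (91,228,1,100), (92,228,1,100);
INSERT INTO T_BOL VALUES (279,201,1,100), (280,201,1,100), (281,228,2,5),
                         (282,228,1,10), (283,228,9,100);
INSERT INTO T_BIN VALUES (11,201,1,100), (13,228,1,10), (14,228,1,0),
                         (17,228,1,103), (19,201,1,50);
""")

# Joining the raw rows first: every DESIGN row pairs with every matching
# BOL and BIN row, so each weight is summed several times over.
naive = con.execute("""
    SELECT d.WellID_DES, d.SandID_DES,
           SUM(d.Weight_DES), SUM(b.Weight_BOL), SUM(n.Weight_BIN)
    FROM T_DESIGN AS d
    JOIN T_BOL AS b ON b.WellID_BOL = d.WellID_DES AND b.SandID_BOL = d.SandID_DES
    JOIN T_BIN AS n ON n.WellID_BIN = d.WellID_DES AND n.SandID_BIN = d.SandID_DES
    GROUP BY d.WellID_DES, d.SandID_DES
    ORDER BY d.WellID_DES, d.SandID_DES
""").fetchall()

# Summing inside each sub select first: one row per Well/Sand pair per table,
# so the joins can no longer multiply anything.
summed_first = con.execute("""
    SELECT d.WellID, d.SandID, d.SumDES, b.SumBOL, n.SumBIN
    FROM (SELECT WellID_DES AS WellID, SandID_DES AS SandID, SUM(Weight_DES) AS SumDES
          FROM T_DESIGN GROUP BY WellID_DES, SandID_DES) AS d
    JOIN (SELECT WellID_BOL, SandID_BOL, SUM(Weight_BOL) AS SumBOL
          FROM T_BOL GROUP BY WellID_BOL, SandID_BOL) AS b
      ON b.WellID_BOL = d.WellID AND b.SandID_BOL = d.SandID
    JOIN (SELECT WellID_BIN, SandID_BIN, SUM(Weight_BIN) AS SumBIN
          FROM T_BIN GROUP BY WellID_BIN, SandID_BIN) AS n
      ON n.WellID_BIN = d.WellID AND n.SandID_BIN = d.SandID
    ORDER BY d.WellID, d.SandID
""").fetchall()

print("joined first:", naive)
print("summed first:", summed_first)
```

The first result reproduces the inflated figures from the question (400 / 400 / 300 for well 201), while the second returns each table's own total for the same Well/Sand pair.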
I would simplify it by excluding L_WELL and L_SAND. If you are just interested in IDs, then they really shouldn't be necessary joins. If all the other 3 tables have the WellID and SandID columns, then pick the one that is sure to have all combos.
Supposing it's the Design table, then:
SELECT
WellID_DES AS WellID,
SandID_DES AS SandID,
SUM(Weight_DES) AS Weight_DES,
(SELECT SUM(Weight_BOL) FROM T_BOL WHERE T_BOL.WellID_BOL=d.WellID_DES AND T_BOL.SandID_BOL=d.SandID_DES) AS Weight_BOL,
(SELECT SUM(Weight_BIN) FROM T_BIN WHERE T_BIN.WellID_BIN=d.WellID_DES AND T_BIN.SandID_BIN=d.SandID_DES) AS Weight_BIN
FROM T_DESIGN AS d
GROUP BY WellID_DES, SandID_DES;
... and make sure all your tables have an index on WellID and SandID.
Just to be clear: I don't think it's a good idea to start the join from the lookup tables, or from their cartesian product. You can always left join them to fetch descriptions and other data. But the main query should be the one with all the combos of WellID and SandID... or if not all, at least the most. Things get difficult if none of the 3 tables (DESIGN, BOL and BIN) have all combos. In that case (and I'd say only in that case) you might as well start with the cartesian product of the two lookup tables. You could also do a UNION, but I doubt that would be more efficient.

Iterating through table to use as parameters in function to create large new table

I have been researching this and haven't found a usable answer yet. I'm a moderate hack with SQL Server and have some success with using parameters in functions and stored procedures but this is a combination that I can't seem to get my head around.
Here is my scenario summarized for clarity:
My company sells computers as laptops and desktops and accessories. I have a tbl_Computers where I maintain Computer_Type, Model_Num, and Mix_Percent like this:
Desktop ABC .75
Desktop XYZ .25
Laptop DEF .60
Laptop MNO .40
We also have a table for forecast by month where I maintain Computer_Type, Jul_Num, Aug_Num, and Sep_Num like this:
Desktop 100 200 150
Laptop 300 400 700
I have created a function for a planning bill of material that will find all components and accessories sold in the past twelve months for a given model. It works as follows:
P_BOM ("ABC") will result in a table with two columns: Component and Comp_Percent
CPU 1 (This means we sell 1 CPU with every desktop)
Hard Drive 2 (We sell 2 with every desktop)
Printer .8 (80% of the time we sell a printer)
What I'd like my Stored Procedure to do is to provide a single, combined table that would look like this with the following headers Component, Jul_Num, Aug_Num, and Sep_Num:
CPU 400 600 850
Hard Drive 500 800 1000
I get the CPU number by summing the following logic:
Desktop's Jul_Num x ABC's Mix_Percentage x ABC's CPU Comp_Percent
Desktop's Jul_Num x XYZ's Mix_Percentage x XYZ's CPU Comp_Percent
Laptop's Jul_Num x DEF's Mix_Percentage x DEF's CPU Comp_Percent
Laptop's Jul_Num x MNO's Mix_Percentage x MNO's CPU Comp_Percent
400 = (100 x .75 x 1) + (100 x .25 x 1) + (300 x .6 x 1) + (300 x .4 x 1)
Any ideas?
Thanks,
Rob
EDIT:
Thanks for the suggestion that this needn't be an iterative problem but rather a table-based one.
I created this first table (columns: Model_Num, Part_Num, Total, Target_Total, Comp_Percent) to give me:
ABC CPU 3 3 1
ABC Hard Drive 6 3 2
DEF CPU 2 2 1
DEF Hard Drive 2 2 1
MNO CPU 1 1 1
MNO Hard Drive 1 1 1
XYZ CPU 1 1 1
XYZ Hard Drive 2 1 2
Here was the SQL:
SELECT All_Components.Model_Num, All_Components.Part_Num, SUM(All_Components.Qty) AS Total, TTO.Target_Total, SUM(All_Components.Qty)
/ TTO.Target_Total AS Comp_Percent
FROM (SELECT Test_tbl_Computers.Model_Num, Test_tbl_Orders_2.Order_Num, Test_tbl_Orders_2.Part_Num, Test_tbl_Orders_2.Qty
FROM Test_tbl_Orders AS Test_tbl_Orders_2 CROSS JOIN
Test_tbl_Computers) AS All_Components INNER JOIN
(SELECT Test_tbl_Orders.Part_Num, SUM(Test_tbl_Orders.Qty) AS Target_Total
FROM Test_tbl_Orders INNER JOIN
Test_tbl_Computers AS Test_tbl_Computers_1 ON Test_tbl_Orders.Part_Num = Test_tbl_Computers_1.Model_Num
GROUP BY Test_tbl_Orders.Part_Num) AS TTO ON All_Components.Model_Num = TTO.Part_Num
WHERE (All_Components.Order_Num IN
(SELECT Order_Num
FROM Test_tbl_Orders AS Test_tbl_Orders_1
WHERE (Part_Num = All_Components.Model_Num))) AND (All_Components.Part_Num <> All_Components.Model_Num)
GROUP BY All_Components.Model_Num, All_Components.Part_Num, TTO.Target_Total
Then, to keep it from becoming a SQL-Monster I couldn't tame, I created another function to conduct an inner join to the forecast and mix percentages and then sum up all numbers grouping by Part_Num.
If nothing else, I appreciated having to write out my question to help focus my thoughts.
"Computer_Type, Jul_Num, Aug_Num, and Sep_Num"
One-column-per-month works for reporting or a data-entry interface, but you are going to drive yourself absolutely bonkers if you actually store the data that way. If you have the means to go back and change that table to "Computer_Type, Year, Month, Num" or "Computer_Type, Date, Num", then you should do that first.
I don't see why you need a function here. All the data you describe can be stored in tables and then picked up with simple joins.
Even if the BOM is recursive you can make use of CTEs.
If you insist on using the function, you can use CROSS APPLY to call it for each row.
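A minimal sketch of the "tables and simple joins" suggestion, using the numbers from the question. It runs on Python's built-in sqlite3 module rather than SQL Server, so no function or CROSS APPLY is involved. The table names tbl_Forecast and tbl_PBOM are hypothetical: tbl_PBOM holds the rows that P_BOM would return for each model, with the Comp_Percent values taken from the table in the question's edit.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_Computers (Computer_Type TEXT, Model_Num TEXT, Mix_Percent REAL);
CREATE TABLE tbl_Forecast (Computer_Type TEXT, Jul_Num INTEGER, Aug_Num INTEGER, Sep_Num INTEGER);
CREATE TABLE tbl_PBOM (Model_Num TEXT, Component TEXT, Comp_Percent REAL);

INSERT INTO tbl_Computers VALUES
    ('Desktop', 'ABC', 0.75), ('Desktop', 'XYZ', 0.25),
    ('Laptop', 'DEF', 0.60), ('Laptop', 'MNO', 0.40);
INSERT INTO tbl_Forecast VALUES ('Desktop', 100, 200, 150), ('Laptop', 300, 400, 700);
INSERT INTO tbl_PBOM VALUES
    ('ABC', 'CPU', 1), ('ABC', 'Hard Drive', 2),
    ('XYZ', 'CPU', 1), ('XYZ', 'Hard Drive', 2),
    ('DEF', 'CPU', 1), ('DEF', 'Hard Drive', 1),
    ('MNO', 'CPU', 1), ('MNO', 'Hard Drive', 1);
""")

# Step 1 (CTE): split each type's forecast across its models by Mix_Percent.
# Step 2: multiply by each component's Comp_Percent and sum per component.
rows = con.execute("""
    WITH model_demand AS (
        SELECT c.Model_Num,
               f.Jul_Num * c.Mix_Percent AS Jul_Units,
               f.Aug_Num * c.Mix_Percent AS Aug_Units,
               f.Sep_Num * c.Mix_Percent AS Sep_Units
        FROM tbl_Forecast AS f
        JOIN tbl_Computers AS c ON c.Computer_Type = f.Computer_Type
    )
    SELECT b.Component,
           ROUND(SUM(m.Jul_Units * b.Comp_Percent), 2) AS Jul_Num,
           ROUND(SUM(m.Aug_Units * b.Comp_Percent), 2) AS Aug_Num,
           ROUND(SUM(m.Sep_Units * b.Comp_Percent), 2) AS Sep_Num
    FROM model_demand AS m
    JOIN tbl_PBOM AS b ON b.Model_Num = m.Model_Num
    GROUP BY b.Component
    ORDER BY b.Component
""").fetchall()

for row in rows:
    print(row)
```

The CTE only names the intermediate step; the same query could be written with a derived table. If the planning BOM has to stay a table-valued function in SQL Server, the join to tbl_PBOM is where CROSS APPLY would take its place.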
The only function I've created thus far has been to create a
two-column table filled with all components sold on the same sales
order as a specific desktop model. Are you recommending that I post
the results of that table to another table and continue to insert new
rows for each instance of the function?
I see, you don't need to use a function.
There are many different possibilities.
One, which you are probably very close to, is converting the function to a stored procedure and using
INSERT INTO [TableName]
EXEC [SP_NAME]
Another is to convert the query into an accumulating subquery or CTE (see below)
and join it with the others.
In general, avoid creating functions whenever you can, because they can do a lot of damage to the database.
The BOM's are not recursive. I'm not familiar with CTE's.
Common Table Expressions (CTEs) are great tools when you have many chained subqueries. Once you get used to them, they bring a lot of clarity (IMO) to complex queries.
CTEs also provide native support for recursion.
There are tons of articles on the net that will guide you. Just google CTE examples or CTE tutorials.
I have no experience with CROSS APPLY.
CROSS APPLY: from page http://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.
Last but not least
Any examples would be appreciated
If you give me the partial scripts that you already have, I'll try to convert them to the query I had in mind.

Cumulative average number of records created for specific day of week or date range

Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.
The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.
Now, here's the issue: I don't have a whole lot of experience with MySQL when it comes to collating cumulative sums and running averages. I've thrown together the following query, which kind of makes sense to me, but it just keeps locking up the command console. The thing takes forever to execute and there are only 80k records within the test sample.
So, given the following basic table structure:
id | action | date_created
1 | 'merp' | 2007-06-20 17:17:00
2 | 'foo' | 2007-06-21 09:54:48
3 | 'bar' | 2007-06-21 12:47:30
... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36
How would I go about calculating the average number of records created for each given day of the week?
day_of_week | average_records_created
1 | 234
2 | 23
3 | 5
4 | 67
5 | 234
6 | 12
7 | 36
I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:
SELECT
DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;
Halps? Please, don't make me cut myself again. :'(
How far back do you need to go when sampling this information? This solution works as long as it's less than a year.
Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
Then, your report can be something along the lines of:
select
DayOfWeek
, count(*)/count(distinct(WeekNumber)) as Average
from
MyCompanionTable
group by
DayOfWeek
Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
I rewrote your query as:
SELECT x.day_of_week,
AVG(x.records_created) AS average_records_created
FROM (SELECT DAYOFWEEK(t.datetime_entry) AS day_of_week,
COUNT(*) AS records_created
FROM VMS_LOGS t
GROUP BY DATE(t.datetime_entry), DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
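The two-level grouping (count per calendar day first, then average those counts per day of week) can be checked on a handful of rows. The sketch below is an assumption-laden stand-in, not MySQL: it uses Python's built-in sqlite3 module, where `strftime('%w', ...) + 1` plays the role of `DAYOFWEEK` (1 = Sunday), and it uses the `date_created` column from the table layout in the question. The first three rows are from the question; the others are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE VMS_LOGS (id INTEGER PRIMARY KEY, action TEXT, date_created TEXT)")
con.executemany(
    "INSERT INTO VMS_LOGS (action, date_created) VALUES (?, ?)",
    [
        ("merp", "2007-06-20 17:17:00"),  # Wednesday: 1 record that day
        ("foo", "2007-06-21 09:54:48"),   # Thursday: 2 records that day
        ("bar", "2007-06-21 12:47:30"),
        ("a", "2007-06-27 08:00:00"),     # Wednesday: 3 records (invented)
        ("b", "2007-06-27 09:00:00"),
        ("c", "2007-06-27 10:00:00"),
        ("d", "2007-06-28 08:00:00"),     # Thursday: 4 records (invented)
        ("e", "2007-06-28 09:00:00"),
        ("f", "2007-06-28 10:00:00"),
        ("g", "2007-06-28 11:00:00"),
    ],
)

rows = con.execute("""
    SELECT x.day_of_week, AVG(x.records_created) AS average_records_created
    FROM (SELECT CAST(strftime('%w', date_created) AS INTEGER) + 1 AS day_of_week,
                 COUNT(*) AS records_created
          FROM VMS_LOGS
          GROUP BY DATE(date_created), strftime('%w', date_created)) AS x
    GROUP BY x.day_of_week
    ORDER BY x.day_of_week
""").fetchall()

for day_of_week, average in rows:
    print(day_of_week, average)
```

The table is scanned once, and the inner query returns one row per calendar day rather than one subquery per log row. Days with no log rows at all do not appear in the inner result, so they are not counted as zeros in the average.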
The reason your query takes so long is the inner select: you are essentially running 6,400,000,000 queries. With a query like this, your best solution may be to develop a timed reporting system, where the user receives an email when the query is done and the report is constructed, or logs in and checks the report afterwards.
Even with the optimization written by OMG Ponies (below) you are still looking at around the same number of queries.
SELECT x.day_of_week,
AVG(x.records_created) AS average_records_created
FROM (SELECT DAYOFWEEK(t.datetime_entry) AS day_of_week,
COUNT(*) AS records_created
FROM VMS_LOGS t
GROUP BY DATE(t.datetime_entry), DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week

ORM Select n + 1 performance; join or no join

There are similar questions to this, but I don't think anyone has asked this particular question.
Scenario:
Customer - Order (where Order has a CustomerID) - OrderPart - Part
I want a query that returns a customer with all its orders and each order with its parts.
Now I have two main choices:
Use a nested loop (which produces separate queries)
Use data loading options (which produces a single query join)
The question:
Most advice and examples on ORMs suggest using option 2 and I can see why. However, option 2 will potentially be sending back a huge amount of duplicated data, eg:
Option 1 results (3 queries):
ID Name Country
1 Customer1 UK
ID Name
1 Order1
2 Order2
ID Name
1 Part1
2 Part2
3 Part3
Option 2 results (1 query):
ID Name Country ID Name ID Name
1 Customer1 UK 1 Order1 1 Part1
1 Customer1 UK 1 Order1 2 Part2
1 Customer1 UK 1 Order1 3 Part3
1 Customer1 UK 2 Order2 1 Part1
1 Customer1 UK 2 Order2 2 Part2
Option 1 sends back 13 fields with 3 queries. Option 2 sends back 35 fields in 1 query (the 5 rows shown, with 7 fields each). Now imagine the Customer table has 30 fields and Orders have more complex sub joins: the data duplication can quickly become huge.
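The trade-off can be measured on the sample data by counting the fields each strategy returns. The sketch below uses Python's built-in sqlite3 module with no ORM involved. The table name Orders, the OrderPart link columns, and the exact shape of the three separate queries are assumptions made for the illustration; the rows match the result sets listed above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
CREATE TABLE Orders (ID INTEGER PRIMARY KEY, Name TEXT, CustomerID INTEGER);
CREATE TABLE Part (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE OrderPart (OrderID INTEGER, PartID INTEGER);
INSERT INTO Customer VALUES (1, 'Customer1', 'UK');
INSERT INTO Orders VALUES (1, 'Order1', 1), (2, 'Order2', 1);
INSERT INTO Part VALUES (1, 'Part1'), (2, 'Part2'), (3, 'Part3');
INSERT INTO OrderPart VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
""")


def field_count(rows):
    """Total number of values in a result set."""
    return sum(len(row) for row in rows)


# Option 1: separate queries, each entity sent once.
separate_results = [
    con.execute("SELECT ID, Name, Country FROM Customer WHERE ID = 1").fetchall(),
    con.execute("SELECT ID, Name FROM Orders WHERE CustomerID = 1").fetchall(),
    con.execute("""
        SELECT DISTINCT p.ID, p.Name
        FROM Part AS p
        JOIN OrderPart AS op ON op.PartID = p.ID
        JOIN Orders AS o ON o.ID = op.OrderID
        WHERE o.CustomerID = 1
    """).fetchall(),
]

# Option 2: one joined query, customer and order columns repeated per part.
joined_result = con.execute("""
    SELECT c.ID, c.Name, c.Country, o.ID, o.Name, p.ID, p.Name
    FROM Customer AS c
    JOIN Orders AS o ON o.CustomerID = c.ID
    JOIN OrderPart AS op ON op.OrderID = o.ID
    JOIN Part AS p ON p.ID = op.PartID
    WHERE c.ID = 1
""").fetchall()

separate_fields = sum(field_count(rows) for rows in separate_results)
joined_fields = field_count(joined_result)
print("separate queries:", len(separate_results), "queries,", separate_fields, "fields")
print("single join:      1 query,", joined_fields, "fields")
```

Counting fields only captures payload size. It says nothing about round trips or query execution time, which is what the remaining questions are about.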
What impact on overall performance do the following things have:
Overhead of making a database connection
Time taken to send data (potentially across network if on different server)
Bandwidth
Is option 2 always the best choice, option 1 the best choice or does it depend on the situation? If it depends, what criteria should you use to determine? Are any ORMs clever enough to work it out for themselves?
Overhead of making a database connection
Very little if they are on the same subnet, which they usually are. If they're not then this is still not a huge overhead and can be overcome with caching, which most ORMs have (NHibernate has 1st and 2nd level caching).
Time taken to send data (potentially across network if on different server)
For SELECT N+1 this will obviously be longer, as it will have to send the select statement each time, which might be up to 1k long. It will also have to grab a new connection from the pool. Chatty versus chunky used to be an argument around 2002-2003, but now it really doesn't make a huge difference unless this is a really big application, in which case you will probably want a more experienced (or better paid) pundit giving his views, i.e. a consultant.
I would favour joins however, as databases will be optimised for this usage over their 10 or more years of development. If performance is really slow a View can sort this out, or Stored Procedure.
By the way, SELECT N+1 is probably the commonest performance problem people experience with NHibernate when they first start using it (including me), and is something that actually takes tweaking to sort out. This is because NHibernate is to ORMs what C++ is to languages.
Bandwidth
An extra SELECT statement for every Customer will eventually build up to however many Customer objects * Orders. So for a large system this might be noticeable - but as I mentioned, ORMs usually have caching mechanisms in place to negate this problem. The amount of SELECT statements also isn't going to be that huge considering:
You're on the same network as the SQL server most of the time
The increased amount of bytes account for about an extra 0.5-50k of extra bandwidth? Think how fast that is on most servers.
A great deal of this is going to depend on the amount of data you are going through.
The join, while returning more fields, is going to run markedly faster (as a rule) than the Option 1 set of queries.
From my personal experience, slow-downs are almost always at that level, the actual running of the query, not the sheer amount of data being passed along whatever pipe you have.