Why does the count function in SQL seem to be doing more than counting the column I ask it to? - sql

I have an INSERT query that is pulling data from two tables and inserting that data into a third table. Everything appears to be working fine except that the COUNT part of the query isn't returning the results I would expect.
The first set of tables this query runs is MIUsInGrid1000 (number of rows = 1) and Results1000 (number of rows = 24). The number that is returned from the Count part of the query is 24 instead of being 1 like I would have expected.
The next set of tables is MIUsInGrid1000 (number of rows = 3) and Results1000 (number of rows = 30). The number that is returned from the Count part of the query is 90 instead of being 3 like I would have expected.
It appears that the product of the two counts is what is being returned to me and I can't figure out why that is. If I take out the references to the Results tables then the query works the way I would expect. I think that I am misunderstanding how at least some part of this works. Can someone explain why this is not working as I expected it to?
strQuery1 = "Insert Into MIUsInGridAvgs (NumberofMIUs, ProjRSSI, RealRSSI, CenterLat, CenterLong) " & _
"Select Count(MIUsInGrid" & i & ".MIUID), Avg(MIUsInGrid" & i & ".ProjRSSI), Avg(MIUsInGrid" & i & ".RealRSSI), Avg(Results" & i & ".Latitude), Avg(Results" & i & ".Longitude) " & _
"From MIUsInGrid" & i & ", Results" & i & " "

It seems logical to me that if you join two tables, one with 1 row and the other with 24 rows, there is the possibility of getting a result set of 24 rows.
I notice you have not included a WHERE clause in your SQL (perhaps for brevity), but if you don't have that you are doing a CROSS JOIN (or cartesian join) between the tables and this will provide unexpected results.
The COUNT function counts every row in the joined result set (where the column is not NULL), not the rows of a single table. To determine how many "distinct" IDs there are, you can use the answer provided by Tomalak.

This should solve your immediate problem
Count(DISTINCT MIUsInGrid" & i & ".MIUID)
The naked COUNT function counts the non-NULL values, not the distinct values, unless you tell it to switch behavior by using DISTINCT.
When two tables are joined the way you join them (you build a cartesian product), the number of resulting rows is the number of rows in one table times the number of rows in the other.
This leads me to the suspicion that you are missing a join condition.
Apart from that, I find it bewildering that you have a number of obviously identical tables that differ only by an index in their name. This is most certainly a substantial design flaw in the database.
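To see the multiplication for yourself, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for your database. The single-column tables are simplified placeholders, and SQLite accepts COUNT(DISTINCT ...); whether your own engine does is something to check. It reproduces the 1 x 24 and 3 x 30 cases from the question.

```python
import sqlite3

def count_demo(n_mius, n_results):
    """Return (plain count, distinct count) over a two-table join with no join condition."""
    con = sqlite3.connect(":memory:")
    # Simplified placeholder tables: only the columns needed for the demo.
    con.execute("CREATE TABLE MIUsInGrid (MIUID INTEGER)")
    con.execute("CREATE TABLE Results (Latitude REAL)")
    con.executemany("INSERT INTO MIUsInGrid VALUES (?)", [(i,) for i in range(n_mius)])
    con.executemany("INSERT INTO Results VALUES (?)", [(float(i),) for i in range(n_results)])
    # No WHERE or ON clause: every MIUsInGrid row is paired with every Results row.
    plain, distinct = con.execute(
        "SELECT COUNT(MIUsInGrid.MIUID), COUNT(DISTINCT MIUsInGrid.MIUID) "
        "FROM MIUsInGrid, Results"
    ).fetchone()
    con.close()
    return plain, distinct

print(count_demo(1, 24))  # (24, 1)
print(count_demo(3, 30))  # (90, 3)
```

The plain count follows the size of the cartesian product, while the distinct count stays at the number of MIU rows.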

The way I usually figure these things out is to not use any aggregates first to see what my result sets will be. Then, I start adding in the aggregate functions.

Related

Access VBA: filtering on multiple criteria in a table and marking an order complete when all lines are done

I have a table that has multiple Order Numbers, and each of those has multiple Line Numbers. For example, there is an Order Number 123456, and there could be 5 different Line Numbers with it. I want to be able to say: when all those line numbers are complete, mark the Order Number complete. I am thinking about using a recordset and a DCount. As you can see in the picture, there are multiple Line Numbers with the same Order Number.
Saving calculated data, especially aggregate calc, is often not necessary and can even put data integrity at risk - saved calculated aggregate can get 'out of sync' with data. "Done" status can be calculated when needed. Domain aggregate function is one way. In a textbox or query:
IIf(DCount("*", "OrderDetails", "ord_no='" & [ord_no] & "' AND doc_completed = False")=0, "Done", "Not Done")
or
Nz(DLookup("doc_completed", "OrderDetails", "ord_no='" & [ord_no] & "' AND doc_completed = False"), "True")
A correlated subquery should be able to return same value and possibly perform more efficiently.
SELECT *, IIf((SELECT Count(*) FROM OrderDetails WHERE ord_no=Orders.OrderNumber AND doc_completed = False)=0, True, False) AS IsDone FROM Orders;
Or build an aggregate query that counts records where doc_completed = False GROUP BY ord_no and join that query to Orders table.
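The aggregate-query variant can be sketched as follows, using Python's sqlite3 module only so the example is runnable. The line_no column and the sample rows are made up for illustration, and SQLite stores the Yes/No flag as 0/1. Note the LEFT JOIN: an order with no open lines does not appear in the grouped subquery at all, so an inner join would drop exactly the orders that are done.

```python
import sqlite3

def order_status(con):
    """Map each order number to 'Done' or 'Not Done' from a grouped count of open lines."""
    rows = con.execute(
        """
        SELECT o.OrderNumber,
               CASE WHEN COALESCE(p.OpenLines, 0) = 0 THEN 'Done' ELSE 'Not Done' END
        FROM Orders AS o
        LEFT JOIN (
            SELECT ord_no, COUNT(*) AS OpenLines
            FROM OrderDetails
            WHERE doc_completed = 0
            GROUP BY ord_no
        ) AS p ON p.ord_no = o.OrderNumber
        ORDER BY o.OrderNumber
        """
    ).fetchall()
    return dict(rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderNumber TEXT PRIMARY KEY)")
# line_no is a hypothetical column name; doc_completed is 1 (True) or 0 (False).
con.execute("CREATE TABLE OrderDetails (ord_no TEXT, line_no INTEGER, doc_completed INTEGER)")
con.executemany("INSERT INTO Orders VALUES (?)", [("123456",), ("123457",)])
con.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?)", [
    ("123456", 1, 1), ("123456", 2, 1), ("123456", 3, 1),
    ("123457", 1, 1), ("123457", 2, 0),
])

print(order_status(con))  # {'123456': 'Done', '123457': 'Not Done'}
```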

Microsoft Access Multiple Criteria based on another table

I am working with a database in MS Access.
I have a table(Table A) with different categories (criteria)
I have another table (Table B), where I have to pull values from Table A based on two categories, (year and amount).
For example,
From table B, the cost is $15,000, so we go to table A and find the contingency from year 2018 which falls between $0-$20,000 and report a contingency of 25%.
Is there a way to go about this? I've been racking my brain trying to use nested "IIF" and "AND" functions, but I can't figure it out.
Add both tables to a query.
Join on C_YEAR.
Use BETWEEN AND to grab the appropriate range hence the appropriate contingency.
Something like:
SELECT tableA.CONTINGENCY
FROM tableA INNER JOIN tableB ON tableA.C_YEAR = tableB.C_YEAR
WHERE tableB.COST BETWEEN tableA.MIN_VALUE AND tableA.MAX_VALUE;
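Here is a small runnable sketch of that range join, using Python's sqlite3 module in place of Access. The table and column names follow the query above; the two sample rate bands and the two cost rows are illustrative values based on the figures mentioned in the question, not the real data.

```python
import sqlite3

def lookup_contingency(con):
    """Return (cost, contingency) pairs by matching each cost to its year and value range."""
    return con.execute(
        "SELECT tableB.COST, tableA.CONTINGENCY "
        "FROM tableA INNER JOIN tableB ON tableA.C_YEAR = tableB.C_YEAR "
        "WHERE tableB.COST BETWEEN tableA.MIN_VALUE AND tableA.MAX_VALUE "
        "ORDER BY tableB.COST"
    ).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (C_YEAR INTEGER, MIN_VALUE INTEGER, MAX_VALUE INTEGER, CONTINGENCY REAL)")
con.execute("CREATE TABLE tableB (C_YEAR INTEGER, COST INTEGER)")
# Illustrative rate bands for 2018 only.
con.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?)", [
    (2018, 0, 20000, 0.25),
    (2018, 20001, 200000, 0.15),
])
con.executemany("INSERT INTO tableB VALUES (?, ?)", [(2018, 15000), (2018, 50000)])

print(lookup_contingency(con))  # [(15000, 0.25), (50000, 0.15)]
```

BETWEEN is inclusive at both ends, so the bands must not overlap, or a cost sitting on a shared boundary will match two rows.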
It looks like the contingency rates for all the years are the same:
25% for values between $0 and $20,000
15% for values between $20,001 and $200,000
10% for values between $200,000 and $100,000,000
Is this always the case, or just based on the sample data you're using?
Is it possible you'll have data where the cost is > $100,000,000? If so, how should the data be handled?
I'm just wondering if there's not a better way to represent your contingency rate rules. Otherwise, I'd agree with Rene about joining the tables and adding a WHERE condition to get the rates you need. I'd also add that, since you'll always be pulling the rate from the query, you don't need an actual field on Table B to store the contingency rate.
Consider a DLookUp in an UPDATE query. The domain aggregate is needed for query to be updateable.
UPDATE tableB b
SET b.Contingency = DLookUp("CONTINGENCY", "tableA", "[C_YEAR] = " & b.[C_YEAR] &
" AND [MIN_VALUE] <= " & b.[COST] & " AND [MAX_VALUE] >= " & b.[Cost])

MS Access: divide result set into multiple pieces by text value or number of rows

I have a query against an Item table. This is all the items our library has, e.g., books, DVDs,CDs, etc.
I have to send a tab-delimited file that contains data on all these items.
There are over 100,000 items.
I'm stuck using MS Access.
Access can pull all the data, but it cannot send the result set via email because it is too big (over 65,535 rows; I am aware that later versions of Excel past 2007 can hold more rows but that does not help me).
All the columns are text data. So normal relational operators won't work (I tried).
I need to split the result set into two or three result sets in order to get it from Access to Excel. The need for Excel is that this is how the vendor expects it, and it needs to be tweaked some before shipping.
How can I divide it?
I have thought of at least two ways:
1. If I can count rows, I can tell Access to use the first 60,000 rows it gets. How do I tell it to do that and then fetch only the second set, i.e. all the rows past 60,000? I have not figured out how to do this.
2. Divide based upon a field. The only field that is unique is the barcode, e.g., "30001001672906". Usually, the barcode is 14 digits long. I have experimented with using StrComp in a WHERE clause, but I have a problem: the barcodes are not in sorted order before they are fetched. "Order by" works on the result set, not on how the data is processed before it is selected.
I am at a loss as to how to accomplish my big goal. That's the one that matters, not the particular way to fix my SQL to get it. I've looked at some pages, such as those below but not found a solution.
https://support.office.com/en-us/article/Table-of-operators-e1bc04d5-8b76-429f-a252-e9223117d6bd#__toc272228349
MS ACCESS count/sum number of rows, with no duplicates
http://www.techonthenet.com/access/functions/string/strcomp.php
I don't understand the problem with 2.
SELECT barcode FROM items ORDER BY barcode
Open a Recordset on that, move to record 60000, get the barcode
rst.Move 60000
strBarcode = rst!barcode
See https://msdn.microsoft.com/en-us/library/bb243789%28v=office.12%29.aspx
Then build your queries dynamically.
myQuerydef.SQL = "SELECT * FROM items WHERE barcode <= '" & strBarcode & "'"
Export the query e.g. with DoCmd.TransferSpreadsheet
myQuerydef.SQL = "SELECT * FROM items WHERE barcode > '" & strBarcode & "'"
Export to second file.
If you need more than two files, use an array instead of strBarcode and do
myQuerydef.SQL = "SELECT * FROM items WHERE barcode > '" & Barcode(i) & _
"' AND barcode <= '" & Barcode(i+1) & "'"

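The boundary-barcode idea from the answer above can be sketched end to end like this. It uses Python's sqlite3 module instead of a DAO Recordset purely so it can be run as-is; the title column, the seven sample barcodes and the chunk size of 3 are made up for illustration, and parameters replace the string concatenation.

```python
import sqlite3

def chunk_boundaries(con, chunk_size):
    """Return the last barcode of every chunk except the final one, in sorted order."""
    barcodes = [row[0] for row in con.execute("SELECT barcode FROM items ORDER BY barcode")]
    return [barcodes[i] for i in range(chunk_size - 1, len(barcodes) - 1, chunk_size)]

def chunk_queries(boundaries):
    """Build one (sql, params) pair per chunk from the boundary barcodes."""
    if not boundaries:
        return [("SELECT * FROM items ORDER BY barcode", ())]
    queries = [("SELECT * FROM items WHERE barcode <= ? ORDER BY barcode", (boundaries[0],))]
    for low, high in zip(boundaries, boundaries[1:]):
        queries.append((
            "SELECT * FROM items WHERE barcode > ? AND barcode <= ? ORDER BY barcode",
            (low, high),
        ))
    queries.append(("SELECT * FROM items WHERE barcode > ? ORDER BY barcode", (boundaries[-1],)))
    return queries

def fetch_chunks(con, chunk_size):
    """Run every chunk query and return the barcodes found in each chunk."""
    return [[row[0] for row in con.execute(sql, params)]
            for sql, params in chunk_queries(chunk_boundaries(con, chunk_size))]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (barcode TEXT PRIMARY KEY, title TEXT)")
# Inserted in scrambled order on purpose: the split does not depend on storage order.
for n in (5, 2, 7, 1, 4, 6, 3):
    con.execute("INSERT INTO items VALUES (?, ?)", (f"3000100167290{n}", f"item {n}"))

print([len(chunk) for chunk in fetch_chunks(con, 3)])  # [3, 3, 1]
```

Because the barcodes are compared as text, this relies on them all having the same length, which the question says is usually but not always the case.
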
Calculation based on values in 2 different rows

I have a table in MS Access which has stock prices arranged like
Ticker1, 9:30:00, $49.01
Ticker1, 9:30:01, $49.08
Ticker2, 9:30:00, $102.02
Ticker2, 9:30:01, $102.15
and so on.
I need to do some calculation where I need to compare prices in 1 row, with the immediately previous price (and if the price movement is greater than X% in 1 second, I need to report the instance separately).
If I were doing this in Excel, it's a fairly simple formula. I have a few million rows of data, so that's not an option.
Any suggestions on how I could do it in MS Access?
I am open to any kind of solutions (with or without SQL or VBA).
Update:
I ended up trying to traverse my records by using ADODB.Recordset in nested loops. Code below. I thought it was a good idea, and the logic worked for a small table (20k rows). But when I ran it on a larger table (3m rows), Access ballooned to the 2GB limit without finishing the task (because of temporary tables; the original table was only about 300MB). Posting it here in case it helps someone with smaller data sets.
Do While Not rstTickers.EOF
    myTicker = rstTickers!ticker
    rstDates.MoveFirst
    Do While Not rstDates.EOF
        myDate = rstDates!Date_Only
        strSql = "select * from Prices where ticker = """ & myTicker & """ and Date_Only = #" & myDate & "#" 'get all prices for a given ticker for a given date
        rst.Open strSql, cn, adOpenKeyset, adLockOptimistic 'I needed to do this to open in editable mode
        rst.MoveFirst
        sPrice1 = rst!Open_Price
        rst!Row_Num = i
        rst.MoveNext
        Do While Not rst.EOF
            i = i + 1
            rst!Row_Num = i
            rst!Previous_Price = sPrice1
            sPrice2 = rst!Open_Price
            rst!Price_Move = Round(Abs((sPrice2 / sPrice1) - 1), 6)
            sPrice1 = sPrice2
            rst.MoveNext
        Loop
        i = i + 1
        rst.Close
        rstDates.MoveNext
    Loop
    rstTickers.MoveNext
Loop
If the data is always one second apart without any milliseconds, then you can join the table to itself on the Ticker ID and the time offsetting by one second.
Otherwise, if there is no sequence counter of some sort to join on, then you will need to create one. You can do this by doing a "ranking" query. There are multiple approaches to this. You can try each and see which one works the fastest in your situation.
One approach is to use a subquery that returns the number of rows that come before the current row. Another approach is to join the table to itself on all the rows before it, then group by and count. Both approaches produce the same results, but depending on the nature of your data, how it's structured, and what indexes you have, one approach will be faster than the other.
Once you have a "rank column", you do the procedure described in the first paragraph, but instead of joining on an offset of time, you join on an offset of rank.
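A minimal sketch of the subquery-ranking approach, run through Python's sqlite3 module so it can be tested outside Access. The table and column names (Prices, Ticker, Time_Date, Open_Price) are borrowed from the asker's code; the sample rows, the zero-padded time strings and the 1% threshold are assumptions for illustration. The WITH clause is SQLite syntax; in Access the ranked part would more likely be a saved query.

```python
import sqlite3

# rnk = number of earlier rows for the same ticker, so it runs 0, 1, 2, ... per ticker.
PREVIOUS_PRICE_SQL = """
WITH ranked AS (
    SELECT p.Ticker, p.Time_Date, p.Open_Price,
           (SELECT COUNT(*) FROM Prices AS q
             WHERE q.Ticker = p.Ticker AND q.Time_Date < p.Time_Date) AS rnk
    FROM Prices AS p
)
SELECT a.Ticker, a.Time_Date, b.Open_Price, a.Open_Price
FROM ranked AS a
JOIN ranked AS b ON a.Ticker = b.Ticker AND a.rnk = b.rnk + 1
ORDER BY a.Ticker, a.Time_Date
"""

def big_moves(con, threshold):
    """Return (ticker, time, previous price, price) where the relative move exceeds threshold."""
    rows = con.execute(PREVIOUS_PRICE_SQL).fetchall()
    return [(ticker, ts, prev, price) for ticker, ts, prev, price in rows
            if abs(price / prev - 1) > threshold]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Prices (Ticker TEXT, Time_Date TEXT, Open_Price REAL)")
# Zero-padded times so that text comparison matches chronological order.
con.executemany("INSERT INTO Prices VALUES (?, ?, ?)", [
    ("Ticker1", "09:30:00", 49.01),
    ("Ticker1", "09:30:01", 49.08),
    ("Ticker1", "09:30:03", 50.00),   # a gap in the seconds: rank still pairs it with 09:30:01
    ("Ticker2", "09:30:00", 102.02),
    ("Ticker2", "09:30:01", 102.15),
])

print(big_moves(con, 0.01))  # [('Ticker1', '09:30:03', 49.08, 50.0)]
```

The correlated COUNT(*) does work proportional to the number of earlier rows for each row, so on a few million rows an index on (Ticker, Time_Date) is probably the least you would need.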
I ended up moving my data to SQL Server (which had its own issues). I added a row number column (row_num) like this:
ALTER TABLE Prices ADD Row_Num INT NOT NULL IDENTITY (1,1)
It worked for me (I think) because my underlying data was already in the order I needed. I've read enough comments saying you shouldn't rely on this, because you don't know in what order the server stores the data.
Anyway, after that it was a join of the table to itself. It took me a while to figure out the syntax (I am new to SQL). Adding the SQL here for reference (works on SQL Server but not Access).
Update A Set Previous_Price = B.Open_Price
FROM Prices A INNER JOIN Prices B
ON A.Date_Only = B.Date_Only
WHERE ((A.Ticker=B.Ticker) AND (A.Row_Num=B.Row_Num+1));
BTW, I had to first add the column Date_Only like this (works on Access but not SQL server)
UPDATE Prices SET Prices.Date_Only = Format([Time_Date],"mm/dd/yyyy");
I think the solution for row numbers described by @Rabbit should work better (broadly speaking). I just haven't had the time to try it out. It took me a whole day to get this far.

MS Access SQL Query using Sum() and Count() gives incorrect results

I am having an issue with a query which returns results that are very far from reality (not only does it not make sense at all but I can also calculate the correct answer using filters).
I am building a KPI db for work and this query returns KPIs by employee by period. I have a very similar query from which this one is derived which returns KPIs by sector by period which gives the exact results I have calculated using a spreadsheet. I really have no idea what happens here. Basically, I want to sum a few measures that are in the maintenances table like temps_requete_min, temps_analyse_min, temps_maj_min and temps_rap_min and then create a subtotal AND present these measures as hours (measures are presented in minutes, thus the divide by 60).
SELECT
[anal].[prenom] & " " & [anal].[nom] AS Analyste,
maint.periode, maint.annee,
Round(Sum(maint.temps_requete_min)/60,2) AS REQ,
Round(Sum(maint.temps_analyse_min)/60,2) AS ANA,
Round(Sum(maint.temps_maj_min)/60,2) AS MAJ,
Round(Sum(maint.temps_rap_min)/60,2) AS RAP,
Round((Sum(maint.temps_requete_min)+Sum(maint.temps_analyse_min)+Sum(maint.temps_maj_min)+Sum(maint.temps_rap_min))/60,2) AS STOTAL,
Count(maint.periode) AS Nombre,
a.description
FROM
rapports AS rap,
analyste AS anal,
maintenances AS maint,
per_annuelle,
annees AS a
WHERE
(((rap.id_anal_maint)=anal.id_analyste) And
((maint.id_fichier)=rap.id_rapport) And
((maint.maint_effectuee)=True) And
((maint.annee)=per_annuelle.annee) And
((per_annuelle.annee)=a.annees))
GROUP BY
[anal].[prenom] & " " & [anal].[nom],
maint.periode,
maint.annee,
a.description,
anal.id_analyste
ORDER BY
maint.annee, maint.periode;
All measures are many orders of magnitude higher than what they should be. I suspect that my Count() is wrong, but I can't see what would be wrong with the sums :|
Edit: Finally I have come up with this query which shows the same measures I have calculated using Excel from the advice given in the comments and the answer provided. Many thanks to everyone. What I would like to know however, is why it makes a difference to use explicit joins rather than implicit joins (WHERE clause on PKs).
SELECT
maintenances.periode,
[analyste].[prenom] & " " & analyste.nom,
Round(Sum(maintenances.temps_requete_min)/60,2) AS REQ,
Round(Sum(maintenances.temps_analyse_min)/60,2) AS ANA,
Round(Sum(maintenances.temps_maj_min)/60,2) AS MAJ,
Round(Sum(maintenances.temps_rap_min)/60,2) AS RAP,
Round((Sum(maintenances.temps_requete_min)+Sum(maintenances.temps_analyse_min)+Sum(maintenances.temps_maj_min)+Sum(maintenances.temps_rap_min))/60,2) AS STOTAL,
Count(maintenances.periode) AS Nombre
FROM
(maintenances INNER JOIN rapports ON maintenances.id_fichier = rapports.id_rapport)
INNER JOIN analyste ON rapports.id_anal_maint = analyste.id_analyste
GROUP BY analyste.prenom, analyste.nom, maintenances.periode
In this case, the problem is typically that your joins are bringing together multiple dimensions. You end up doing a cross product across two or more categories.
The fix is to do the summaries independently along each dimension. That means that the "from" clause contains subqueries with group bys, and these are then joined together. The group by would disappear from the outer query.
This would suggest having a subquery such as:
from (select maint.periode, maint.annee,
Round(Sum(maint.temps_requete_min)/60,2) AS REQ,
Round(Sum(maint.temps_analyse_min)/60,2) AS ANA,
Round(Sum(maint.temps_maj_min)/60,2) AS MAJ,
Round(Sum(maint.temps_rap_min)/60,2) AS RAP,
Round((Sum(maint.temps_requete_min)+Sum(maint.temps_analyse_min) +Sum(maint.temps_maj_min)+Sum(maint.temps_rap_min))/60,2) AS STOTAL,
Count(maint.periode) AS Nombre
from maintenances maint
group by maint.periode, maint.annee
) m
I say "such as" because without a layout of the tables, it is difficult to see exactly where the problem is and what the exact solution is.
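To make the fan-out concrete, here is a small sketch using Python's sqlite3 module. It assumes, purely for illustration, that per_annuelle holds several rows per year (one per period); the real table layouts are not shown in the question, so treat the data and the reduced column set as hypothetical. Joining on annee alone then repeats every maintenance row once per period, and the sums inflate by that factor. Aggregating in a subquery first and joining to one row per year avoids it.

```python
import sqlite3

# Join on annee only: each maintenance row is repeated once per per_annuelle row of that year.
INFLATED_SQL = """
SELECT maint.periode, SUM(maint.temps_requete_min)
FROM maintenances AS maint, per_annuelle
WHERE maint.annee = per_annuelle.annee
GROUP BY maint.periode
ORDER BY maint.periode
"""

# Summarise maintenances on its own first, then join to a single row per year.
PREAGGREGATED_SQL = """
SELECT m.periode, m.total_min
FROM (SELECT periode, annee, SUM(temps_requete_min) AS total_min
      FROM maintenances
      GROUP BY periode, annee) AS m
INNER JOIN (SELECT DISTINCT annee FROM per_annuelle) AS y ON y.annee = m.annee
ORDER BY m.periode
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maintenances (annee INTEGER, periode INTEGER, temps_requete_min INTEGER)")
con.execute("CREATE TABLE per_annuelle (annee INTEGER, periode INTEGER)")
con.executemany("INSERT INTO maintenances VALUES (?, ?, ?)", [
    (2020, 1, 60), (2020, 1, 30), (2020, 2, 120),
])
# Hypothetical layout: three period rows for the same year.
con.executemany("INSERT INTO per_annuelle VALUES (?, ?)", [(2020, 1), (2020, 2), (2020, 3)])

inflated = con.execute(INFLATED_SQL).fetchall()
correct = con.execute(PREAGGREGATED_SQL).fetchall()
print(inflated)  # [(1, 270), (2, 360)]
print(correct)   # [(1, 90), (2, 120)]
```

This also bears on the follow-up question in the edit: explicit and implicit joins on the same conditions return the same rows, so the difference in the revised query most likely comes from the tables that were dropped from it rather than from the join syntax.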