Access SQL - Add Row Number to Query Result for a Multi-table Join

What I am trying to do is fairly simple: I just want to add a row number to a query. Since this is in Access, it is a bit more difficult than in other SQL dialects, but under normal circumstances it is still doable using solutions such as DCount or Select Count(*). Examples here: How to show row number in Access query like ROW_NUMBER in SQL or Access SQL how to make an increment in SELECT query
My Issue
My issue is I'm trying to add this counter to a multi-join query that orders by fields from numerous tables.
Troubleshooting
My code is a bit ridiculous (19 fields, seven of which are long expressions, from 9 different joined tables, and ordered by fields from 5 of those tables). To make things simple, I have a simplified example query below:
Example Query
SELECT DCount("*","Requests_T","[Requests_T].[RequestID]<=" & [Requests_T].[RequestID]) AS counter, Requests_T.RequestHardDeadline AS Deadline, Requests_T.RequestOverridePriority AS Priority, Requests_T.RequestUserGroup AS [User Group], Requests_T.RequestNbrUsers AS [Nbr of Users], Requests_T.RequestSubmissionDate AS [Submitted on], Requests_T.RequestID
FROM (((Requests_T
INNER JOIN ENUM_UserGroups_T ON ENUM_UserGroups_T.UserGroups = Requests_T.RequestUserGroup)
INNER JOIN ENUM_RequestNbrUsers_T ON ENUM_RequestNbrUsers_T.NbrUsers = Requests_T.RequestNbrUsers)
INNER JOIN ENUM_RequestPriority_T ON ENUM_RequestPriority_T.Priority = Requests_T.RequestOverridePriority)
ORDER BY Requests_T.RequestHardDeadline, ENUM_RequestPriority_T.DisplayOrder DESC , ENUM_UserGroups_T.DisplayOrder, ENUM_RequestNbrUsers_T.DisplayOrder DESC , Requests_T.RequestSubmissionDate;
If the code above is trying to select a field from a table not included, I apologize - just trust the field comes from somewhere (lol i.e. one of the other joins I excluded to simplify the query). A great example of this is the .DisplayOrder fields used in the ORDER BY expression. These are fields from a table that simply determines the "priority" of an enum. Example: Requests_T.RequestOverridePriority displays to the user as a combobox option of "Low", "Med", "High". So in a table, I assign a numerical priority of "1", "2", and "3" to these options, respectively. Thus when ENUM_RequestPriority_T.DisplayOrder DESC is called in ORDER BY, all "High" priority requests will display above "Med" and "Low". Same holds true for ENUM_UserGroups_T.DisplayOrder and ENUM_RequestNbrUsers_T.DisplayOrder.
I'd also prefer to NOT use DCOUNT due to efficiency, and rather do something like:
(select count(*) from Requests_T where Requests_T.RequestID>=RequestID) as counter
Due to the "Order By" expression however, my 'counter' doesn't actually count my resulting rows sequentially since both of my examples are tied to the RequestID.
Example Results
Based on my actual query results, I've made an example result of the query above.
Counter  Deadline    Priority  User_Group  Nbr_of_Users  Submitted_on  RequestID
5        12/01/2016  High      IT          2-4           01/01/2016    5
7        01/01/2017  Low       IT          2-4           05/06/2016    8
10                   Med       IT          2-4           07/13/2016    11
15                   Low       IT          10+           01/01/2016    16
8                    Low       IT          2-4           01/01/2016    9
2                    Low       IT          2-4           05/05/2016    2
The query is displaying my results in the proper order (those with the nearest deadline at the top, then those with the highest priority, then user group, then # of users, and finally, if all else is equal, it is sorted by submission date). However, my "Counter" values are completely wrong! The counter field should simply increment by 1 for each new row. Thus, if displaying a single request on a form for a user, I could say
"You are number: Counter [associated to RequestID] in the development queue."
Meanwhile my results:
Aren't sequential (notice the first four display sequentially, but then the final two rows don't)! Even though the final two rows are lower in priority than the records above them, they ended up with a lower Counter value simply because they had the lower RequestID.
They don't start at "1" and increment +1 for each new record.
Ideal Results
Thus my ideal result from above would be:
Counter  Deadline    Priority  User_Group  Nbr_of_Users  Submitted_on  RequestID
1        12/01/2016  High      IT          2-4           01/01/2016    5
2        01/01/2017  Low       IT          2-4           05/06/2016    8
3                    Med       IT          2-4           07/13/2016    11
4                    Low       IT          10+           01/01/2016    16
5                    Low       IT          2-4           01/01/2016    9
6                    Low       IT          2-4           05/05/2016    2
I'm spoiled by PLSQL and other software where this would be automatic lol. This is driving me crazy! Any help would be greatly appreciated.
FYI - I'd prefer an SQL option over VBA if possible. VBA is very much welcomed and will definitely get an up vote and my huge thanks if it works, but I'd like to mark an SQL option as the answer.

Unfortunately, MS Access doesn't have the very useful ROW_NUMBER() function that other database systems do. So we are left to improvise.
Because your query is so complicated and MS Access does not support common table expressions, I recommend you follow a two step process. First, name that query you already wrote IntermediateQuery. Then, write a second query called FinalQuery that does the following:
SELECT i1.field_primarykey, i1.field2, ... , i1.field_x,
(SELECT COUNT(*) FROM IntermediateQuery i2
WHERE i2.field_primarykey <= i1.field_primarykey) AS Counter
FROM IntermediateQuery i1
ORDER BY i1.field_primarykey
The unfortunate side effect of this is the more data your table returns, the longer it will take for the inline subquery to calculate. However, this is the only way you'll get your row numbers. It does depend on having a primary key in the table. In this particular case, it doesn't have to be an explicitly defined primary key, it just needs to be a field or combination of fields that is completely unique for each record.
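The correlated-count idea can be tried outside Access. Below is a minimal sketch in Python with SQLite standing in for Access; the table `requests`, the view `intermediate_query`, and their columns are hypothetical names chosen for illustration, and the view plays the role of the saved IntermediateQuery.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (request_id INTEGER PRIMARY KEY, priority TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [(5, "High"), (8, "Low"), (11, "Med"), (16, "Low"), (9, "Low"), (2, "Low")],
)

# The view plays the role of the saved IntermediateQuery.
conn.execute(
    "CREATE VIEW intermediate_query AS SELECT request_id, priority FROM requests"
)

# For each row, count how many rows have a key less than or equal to its own.
rows = conn.execute(
    """
    SELECT i1.request_id,
           i1.priority,
           (SELECT COUNT(*)
              FROM intermediate_query AS i2
             WHERE i2.request_id <= i1.request_id) AS counter
      FROM intermediate_query AS i1
     ORDER BY i1.request_id
    """
).fetchall()

for request_id, priority, counter in rows:
    print(counter, request_id, priority)
```

Note that the counter here follows the order of the unique key. For the counter to follow a different display order, the comparison inside the subquery would have to be made on the sort columns instead of the key.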

Related

A more efficient way to sum the difference between columns in postgres?

For my application I have a table with these three columns: user, item, value
Here's some sample data:
user item value
---------------------
1 1 50
1 2 45
1 23 35
2 1 88
2 23 44
3 2 12
3 1 27
3 5 76
3 23 44
What I need to do is, for a given user, perform simple arithmetic against everyone else's values.
Let's say I want to compare user 1 against everyone else. The calculation looks something like this:
first_user second_user result
1 2 SUM(ABS(50-88) + ABS(35-44))
1 3 SUM(ABS(50-27) + ABS(45-12) + ABS(35-44))
This is currently the bottleneck in my program. For example, many of my queries are starting to take 500+ milliseconds, with this algorithm taking around 95% of the time.
I have many rows in my database and it is O(n^2) (it has to compare all of user 1's values against everyone else's matching values)
I believe I have only two options for how to make this more efficient. First, I could cache the results. But the resulting table would be huge because of the NxN space required, and the values need to be relatively fresh.
The second way is to make the algorithm much quicker. I searched for "postgres SIMD" because I think SIMD sounds like the perfect solution to optimize this. I found a couple related links like this and this, but I'm not sure if they apply here. Also, they seem to both be around 5 years old and relatively unmaintained.
Does Postgres have support for this sort of feature? Where you can "vectorize" a column or possibly import or enable some extension or feature to allow you to quickly perform these sorts of basic arithmetic operations against many rows?
I'm not sure where you get O(n^2) for this. You need to look up the rows for user 1 and then read the data for everyone else. Assuming there are few items and many users, this would be essentially O(n), where "n" is the number of rows in the table.
The query could be phrased as:
select t1.user, t.user, sum(abs(t.value - t1.value))
from t left join
t t1
on t1.item = t.item and
t1.user <> t.user and
t1.user = 1
group by t1.user, t.user;
For this query, you want an index on t(item, user, value).
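As a quick check of the arithmetic, here is a small sketch in Python with SQLite rather than Postgres, loaded with the sample data from the question. It uses an inner self-join instead of the left join above, and the column is renamed `user_id` for the sketch; both are assumptions for illustration, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, item INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, 50), (1, 2, 45), (1, 23, 35),
        (2, 1, 88), (2, 23, 44),
        (3, 2, 12), (3, 1, 27), (3, 5, 76), (3, 23, 44),
    ],
)
# Index in the column order suggested above.
conn.execute("CREATE INDEX t_item_user_value ON t (item, user_id, value)")


def compare_user(conn, first_user):
    """Sum |value difference| over the items shared with every other user."""
    sql = """
        SELECT t1.user_id, t2.user_id, SUM(ABS(t1.value - t2.value))
          FROM t AS t1
          JOIN t AS t2
            ON t2.item = t1.item
           AND t2.user_id <> t1.user_id
         WHERE t1.user_id = ?
         GROUP BY t1.user_id, t2.user_id
         ORDER BY t2.user_id
    """
    return conn.execute(sql, (first_user,)).fetchall()


results = compare_user(conn, 1)
print(results)
```

For user 1 this gives 38 + 9 = 47 against user 2 and 23 + 33 + 9 = 65 against user 3, matching the hand calculation in the question.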

Should I combine the columns of a fact table to make it more narrow, or should I keep it more user friendly with a lot of columns?

I have a Fact table that shows the results of KPIs. There are several KPIs, and some of these have a similar output.
My current columns are something like this:
KPI_ID, DOCUMENT_ID, TRUE_FALSE_FLAG1, TRUE_FALSE_FLAG2, DURATION_3, DURATION_4
So, for KPI number 1 (true/false output), the last three columns will be NULL values. Should I combine TRUE_FALSE_FLAG1 and TRUE_FALSE_FLAG2? What is BEST PRACTICE?
In total, there are 18 columns, where 12 of them are either true/false flags or durations in the shape of "number of days" (integer).
picture of the two alternatives
EDIT:
KPI 3 could be "duration of problem", and you'd have a bunch of problems, each with a documentID, represented as a row. Dur_3 would be like 5 days, 3 days, 10 days, etc. KPI 4 would be "Delay of fix after repair was ordered", and the answer would still be an integer in days. But completely non- related to KPI 3.
Reporting could be "average delay of fix". So roughly a select AVG() from table where KPI_ID = 3 group by KPI_ID.
Based on your latest comment, you are best with Alternative 2. Specifically, as long as every KPI is only True/False, and has only one duration to store, you are better with Alternative 2.
EDIT: with Alternative 2, each KPI can store one True/False value AND one duration value

Multicriteria Insert/Update

I'm trying to create a query that will insert new records to a table or update already existing records, but I'm getting stuck on the filtering and grouping for the criteria I want.
I have two tables: tbl_PartInfo, and dbo_CUST_BOOK_LINE.
I want to select from dbo_CUST_BOOK_LINE based upon the combination of CUST_ORDER_ID, CUST_ORDER_LINE_NO, and REVISION_ID. Each customer order can have multiple lines, and each line can have multiple revisions. I'm trying to select the unique combinations of each order and its connected lines, but take the connected information from the row with the highest value in the revision column.
I want to insert/update from dbo_CUST_BOOK_LINE the following columns:
CUST_ORDER_ID
PART_ID
USER_ORDER_QTY
UNIT_PRICE
I want to insert/update them into tbl_PartInfo as the following columns respectively:
JobID
DrawingNumber
Quantity
UnitPrice
So if I have the following rows in dbo_CUST_BOOK_LINE (PART_ID omitted for example)
CUST_ORDER_ID CUST_ORDER_LINE_NO REVISION_ID USER_ORDER_QTY UNIT_PRICE
SCabc 1 1 0 100
SCabc 1 2 4 150
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 1 0 0
SCxyz 1 2 3 50
It would return
CUST_ORDER_ID CUST_ORDER_LINE_NO (REVISION_ID) USER_ORDER_QTY UNIT_PRICE
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 2 3 50
but with PART_ID included and without REVISION_ID
So far, my code is just for the insert portion as I was trying to get the correct records selected, but I keep getting duplicates of CUST_ORDER_ID and CUST_ORDER_LINE_NO.
INSERT INTO tbl_PartInfo ( JobID, DrawingNumber, Quantity, UnitPrice, ProductFamily, ProductCategory )
SELECT dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO, Max(dbo_CUST_BOOK_LINE.REVISION_ID) AS MaxOfREVISION_ID
FROM dbo_CUST_BOOK_LINE, tbl_PartInfo
GROUP BY dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO;
This has been far more complicated than anything I've done so far, so any help would be greatly appreciated. Sorry about the long column names, I didn't get to choose them.
I did some research and think I found a way to make it work, but I'm still testing it. Right now I'm using three queries, but it should be easily simplified into two when complete.
The first is an append query that takes the two columns I want distinct combinations from and selects them using "group by," while also selecting the max of the revision column. It appends them to another table that I'm using called tbl_TempDrop. This table is only being used right now to reduce the number of results before the next part.
The second is an update query that updates tbl_TempDrop to include all the other columns I wanted by setting the criteria equal to the three selected columns from the first query. This took an EXTREMELY long time to complete when I had 700,000 records to work with, hence the use of the tbl_TempDrop.
The third query is a basic append query that appends the rows of tbl_TempDrop to the end destination, tbl_PartInfo.
All that's left is to run all three in a row.
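For comparison, the "max revision per order line" step can also be written as one query that joins the table to its own grouped maximum. The following is a rough sketch in Python with SQLite rather than Access, using the sample rows from the question; the lowercase table and column names are stand-ins chosen for the sketch, and PART_ID is left out as in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cust_book_line (
        cust_order_id      TEXT,
        cust_order_line_no INTEGER,
        revision_id        INTEGER,
        user_order_qty     INTEGER,
        unit_price         INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO cust_book_line VALUES (?, ?, ?, ?, ?)",
    [
        ("SCabc", 1, 1, 0, 100),
        ("SCabc", 1, 2, 4, 150),
        ("SCabc", 1, 3, 4, 125),
        ("SCabc", 2, 3, 2, 200),
        ("SCxyz", 1, 1, 0, 0),
        ("SCxyz", 1, 2, 3, 50),
    ],
)

# Step 1 (subquery m): highest revision per order and line.
# Step 2 (join): pull the remaining columns from that exact revision.
latest = conn.execute(
    """
    SELECT b.cust_order_id, b.cust_order_line_no, b.user_order_qty, b.unit_price
      FROM cust_book_line AS b
      JOIN (SELECT cust_order_id, cust_order_line_no, MAX(revision_id) AS max_rev
              FROM cust_book_line
             GROUP BY cust_order_id, cust_order_line_no) AS m
        ON m.cust_order_id = b.cust_order_id
       AND m.cust_order_line_no = b.cust_order_line_no
       AND m.max_rev = b.revision_id
     ORDER BY b.cust_order_id, b.cust_order_line_no
    """
).fetchall()

for row in latest:
    print(row)
```

The result of such a SELECT could then feed the append into tbl_PartInfo; whether that is faster than the temporary table on 700,000 rows would need to be tested.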
I didn't want to include the full details of any tables or queries yet until I ensure that it works as desired, and because some of the names are vague since I will be using this method for multiple query searches.
This website helped me a little to make sure I had the basic idea down. http://www.techonthenet.com/access/queries/max_query2_2007.php
Let me know if you see any flaws with the ideology!

Find value -> sum for 90 cells -> drop -> repeat

I have been working on this spreadsheet for a while. It's basically an attendance tracker: the end user keeps track of employees, whether they showed up or not, etc.
I have tried looking up loops, but I couldn't figure out how to do what I am trying to do.
What i have in this excel.
A-D : Emp info
E-∞ : 1-3 Days/Dates; 4-∞ emp data (if they missed a day, values for that)
To get better understanding, see this
The data entered from E5 to xx thats where i am trying to get this vba working.
Whenever the script detects the first value of either '1' or '2', it should sum the next 90 days (cells) starting from there. After 90 cells, it should reset to 0, then, starting from cell 91, search for the next '1' or '2' and do the same again.
See the excel file for better understanding. If it doesn't make sense, ill be happy to simplify.
Thank You
The most efficient and clean way to handle this is to use a form of a relational data model because it can be done easily without using VBA code. You will have two simple tables in your spreadsheet, EmployeeInfo and AttendanceRecords. Your Employee info will look something like this
Emp#  Name   Craft            # In 90 Days  NumOf2s  NumOf1s
1     EMP 1  SM Site Manager  0             0        0
2     EMP 2  SM Site Manager  1             0        1
3     EMP 3  SM Site Manager  0             0        0
4     EMP 4  SM Site Manager  0             0        0
5     EMP 5  SM Site Manager  1             0        1
The last three columns are calculated from the AttendanceRecords table. This table is going to be variable size but this way you only need to store the important data (When employees actually got marks). It will look like this.
Emp# Date Days Count
1 12/1/2013 122 1
3 1/1/2014 91 2
2 2/1/2014 60 1
5 2/15/2014 46 1
You can have multiple entries for the same day and the same employee. The important thing is that we only need one entry per infraction (NOTE: In order to do this in a proper database type model, each attendance record should also have some kind of incrementing totally unique ID (like employees), but we can forgo that for this application).
You enter in the employee number, the date, and the count. The "Days" column then auto calculates the age of the record with the following formula:
=TODAY()-[@Date]
NOTE: If the [@Date] notation does not look familiar, this is because it is a structured reference used by Excel Tables; it refers to the Date column in the current row. I recommend you read up on those if not already familiar.
So now we have the age of each record. So back on the EmployeeInfo table, we use the following formula to get all AttendanceRecords that apply to Employee x for the last 90 days
=SUMIFS(AttendanceRecords[Count],AttendanceRecords[Emp'#],[@[Emp'#]],AttendanceRecords[Days],"<=90")
You can now also use some simple formulas to get the other columns I pointed out, including the number of 2-count infractions or the number of 1-count infractions:
=COUNTIFS(AttendanceRecords[Emp'#],[@[Emp'#]],AttendanceRecords[Days],"<=90",AttendanceRecords[Count],2)
=COUNTIFS(AttendanceRecords[Emp'#],[@[Emp'#]],AttendanceRecords[Days],"<=90",AttendanceRecords[Count],1)
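The logic of the three formulas above can be mirrored outside Excel to check the numbers. The sketch below is in Python and uses the four sample attendance records; the fixed "today" of 2014-04-02 is an assumption, picked because it reproduces the Days column shown (122, 91, 60, 46), and the function name is hypothetical.

```python
from datetime import date

# Assumed "today"; it reproduces the Days column of the sample table.
TODAY = date(2014, 4, 2)

# (employee number, date of the infraction, count) as in AttendanceRecords.
records = [
    (1, date(2013, 12, 1), 1),
    (3, date(2014, 1, 1), 2),
    (2, date(2014, 2, 1), 1),
    (5, date(2014, 2, 15), 1),
]


def summarize(emp, records, today, window=90):
    """Mirror the SUMIFS/COUNTIFS columns for one employee."""
    recent = [
        count
        for rec_emp, rec_date, count in records
        if rec_emp == emp and (today - rec_date).days <= window
    ]
    return {
        "in_90_days": sum(recent),      # SUMIFS over Count
        "num_of_2s": recent.count(2),   # COUNTIFS with Count = 2
        "num_of_1s": recent.count(1),   # COUNTIFS with Count = 1
    }


summary = {emp: summarize(emp, records, TODAY) for emp in range(1, 6)}
for emp, values in summary.items():
    print(emp, values)
```

With these inputs only employees 2 and 5 have a record inside the 90-day window, which agrees with the EmployeeInfo table shown earlier.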
There is a lot more data that could be gathered, including the date of the last infraction, total number of infractions for all time, etc. If any of the formulas or terms I used don't make sense or need more explaining, feel free to ask.
EDIT: If you want them automatically removed after 90 days, it would be relatively easy to write a VBA script to do this. It would also be easy to just sort the AttendanceRecords table on Days and delete all records that are older than a certain number of days. However, unless you see yourself adding hundreds of records a week, this really shouldn't be necessary. Also, If you want to write a Visual Basic form to enter in new infractions, that is definitely very possible, but another discussion.
EDIT: To respond to concerns about viewing when these issues happened, I will give you an example of a way to view the data in your tables. One of the advantages of excel tables is that the order of the records isn't as absolute as in a normal range, so we can sort, rearrange, and filter them to see what we need. So if you need to see all of the issues with employee 3, you just go to the Emp# column in the AttendanceRecords table, select the little arrow down button next to Emp#, uncheck 'Select All', and then check the '3', and then the only values I will see in the table are the ones from employee 3. I can then sort the 'Date' column by clicking its little arrow and selecting 'Sort Newest to Oldest'.
What it comes down to is that you can view ANY data you need to, and if you think through what you really need to see, you can set up your summary table (EmployeeInfo) to display enough data that you hardly ever have to look at the AttendanceRecords table. But if you need to, you can go into that table and do a manual sort (as I described above) very easily.
EDIT: To help add some of the functionality I've shown above to the asker's current spreadsheet, I will show the current formula.
=SUMIFS($E5:$BR5,$E$3:$BR$3,">"&(TODAY()-90))
For EMP 1, this formula uses the employees row as the sum range. It then looks at the field of dates in the corresponding columns in row 3. If the date in row 3 is > TODAY()-90, then we will add it to the count. This will at least just look at the infractions for the previous 90 days.

How to get Next 4 digit number from Table (Contains 3,4, 5 & 6, digit long numbers)

I found a good method of getting the next 4 digit number.
How to find next free unique 4-digit number
But in my case I want to get next available 4 or 5 digit number.
And this will change depending upon the user's request. These numbers are not key ID columns, but they are essential to the tagging structure for business purposes.
Currently I use a table adapter query. But how would I write a query for this?
I suppose I could do a long iterative loop through all values until I see a 4 digit.
But I'm trying to think of something more efficient.
Function GetNextAvailableNumber(NumofDigits) as Long
'SQL Code Here ----
'Query Number Table
Return Long
End Function
Here's my current SQL:
'This Queries my View
SELECT MIN([Number]) AS Expr1
FROM LineNumbersNotUsed
'This is my View SQL
SELECT Numbers.Number
FROM Numbers
WHERE (((Exists (Select * From LineList Where LineList.LineNum = Numbers.Number))=False))
ORDER BY Numbers.Number;
Numbers is the list of all available numbers from 0 to 99999, basically what's available to use.
LineList is my final master table where I keep the long and all the relevant other business information.
Hopefully this makes sense.
Gosh you guys are so tough on new guys.
I accidentally hit the enter key, and the question posted and I instantly get -3 votes.
Give a new guy a break will you! Please.
I apologize in advance in case I overlooked something in your question. Using your design, won't a query like this return the next unused 4 digit number?
SELECT MIN([Number]) AS next_number
FROM LineNumbersNotUsed
WHERE
[Number] > 999
AND [Number] < 10000;
This approach is not adequate with multiple concurrent users, but you didn't indicate that is an issue for you.
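To make the digit count a parameter, the bounds can be derived from the number of digits. Here is a minimal sketch in Python with SQLite standing in for Access; the lowercase table names `numbers` and `line_list`, the sample used numbers, and the function name `next_unused` are assumptions for illustration, and like the query above it does not guard against concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE line_list (line_num INTEGER)")

# All candidate numbers from 0 to 99999, as described in the question.
conn.executemany("INSERT INTO numbers VALUES (?)", ((n,) for n in range(100000)))
# A few numbers that are already taken (made-up sample data).
conn.executemany("INSERT INTO line_list VALUES (?)", [(1000,), (1001,), (10000,)])


def next_unused(conn, num_digits):
    """Return the lowest unused number that has exactly num_digits digits."""
    low = 10 ** (num_digits - 1)   # e.g. 1000 for 4 digits
    high = 10 ** num_digits - 1    # e.g. 9999 for 4 digits
    row = conn.execute(
        """
        SELECT MIN(n.number)
          FROM numbers AS n
         WHERE n.number BETWEEN ? AND ?
           AND NOT EXISTS (SELECT 1 FROM line_list AS l
                            WHERE l.line_num = n.number)
        """,
        (low, high),
    ).fetchone()
    return row[0]  # None when every number in the range is taken


four_digit = next_unused(conn, 4)
five_digit = next_unused(conn, 5)
print(four_digit, five_digit)
```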
The question you linked to explains that what you need is a table with 2 fields:
Number InUse
0000 No
0001 No
0002 Yes
0003 No
0005 Yes
Whenever a number is used/released, the table must be updated to set InUse to Yes/No.
Maybe I'm missing something, but from your explanation, and the SQL code you show us, it seems that you only have a table with a single field containing all numbers from 0 to 100000.
If that's the case, I don't see the usefulness of that table at all.
If I were you, and if I understand your need correctly, what you want is something like this:
First of all, create the table as above, with all running numbers from 0 to 100000, and a field for confirming if that number is used or not.
Initialise the InUse field with all the numbers already taken in your LineList table, something like:
UPDATE Numbers SET InUse = True
WHERE Numbers.Number IN (SELECT LineNum FROM LineList)
Write a function ReserveNumber(NumOfDigits as Integer) As Long to find and reserve a 4-digit or 5-digit free number following this logical sequence:
Depending on NumOfDigits (4 or 5) get the result of one of the queries as LowestNumber:
SELECT Min(Number) FROM Numbers WHERE Number < 10000 AND NOT InUse
SELECT Min(Number) FROM Numbers WHERE Number >= 10000 AND NOT InUse
Reserve that particular number to ensure it's not going to be used again:
UPDATE Numbers SET InUse = True WHERE Number = #LowestNumber
Return LowestNumber
Whenever
Notes: the logic above is a bit naive, as it supposes that no two users will attempt to get the lowest number at the same time. There is, however, a risk that this may happen one day.
To remove that risk, you can, for instance, add a TakenBy column to the Numbers table and set it to the current username. Then, after you have reserved the number, read it again to ensure that TakenBy was really updated by the current client. If not, just try again.
There are lots of ways to do this. You can try to fiddle around with table locks as well, but whatever your solution, make sure you test it.
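One variation on the reserve-and-verify idea is to make the UPDATE itself conditional on the number still being free, and to check how many rows it changed. The sketch below is in Python with SQLite rather than Access; the table layout, the `taken_by` column, and the function `reserve_number` are assumptions for illustration, and the conditional update is a swapped-in technique, not the exact read-back described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers (number INTEGER PRIMARY KEY, in_use INTEGER, taken_by TEXT)"
)
# A handful of rows; 2 and 5 are already in use, as in the example table above.
conn.executemany(
    "INSERT INTO numbers VALUES (?, ?, NULL)",
    [(0, 0), (1, 0), (2, 1), (3, 0), (5, 1), (10000, 0), (10001, 0)],
)
conn.commit()


def reserve_number(conn, num_digits, user):
    """Find and reserve the lowest free number; retry if someone else won."""
    low, high = (0, 9999) if num_digits == 4 else (10000, 99999)
    while True:
        row = conn.execute(
            "SELECT MIN(number) FROM numbers "
            "WHERE number BETWEEN ? AND ? AND in_use = 0",
            (low, high),
        ).fetchone()
        if row[0] is None:
            return None  # nothing free in this range
        # Only succeeds if the number is still free at update time.
        cur = conn.execute(
            "UPDATE numbers SET in_use = 1, taken_by = ? "
            "WHERE number = ? AND in_use = 0",
            (user, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Another client took it between the SELECT and the UPDATE: try again.


first = reserve_number(conn, 4, "alice")
second = reserve_number(conn, 4, "bob")
third = reserve_number(conn, 4, "alice")
five_digit = reserve_number(conn, 5, "bob")
print(first, second, third, five_digit)
```

Whether the same pattern behaves well with Access and several simultaneous users depends on the locking settings in use, so it would still need to be tested there.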