Creating a column by using co-occurrence of each instance - sql

TRANSACTION_NUMBER UPC_CODE PURCHASED_UNIT COPURCHASED_FREQUENCY
T123456 1040-1204-8612 2 2
T123456 4020-4104-2120 1 0
T123456 1040-1204-8612 3 2
T123456 2994-8182-9311 5 0
T191201 9879-8712-3456 2 2
T191201 2387-1928-1247 1 0
T191201 7417-2741-4245 4 0
T191201 9879-8712-3456 2 2
Please refer to the example table above (originally also posted as a screenshot):
Let's assume the data contains 'TRANSACTION_NUMBER' (identifying each unique transaction), 'UPC_CODE' (the unique product identifier), and 'PURCHASED_UNIT' (how many units of that product were purchased).
My goal is to generate a column called 'COPURCHASED_FREQUENCY'. This column should contain the number of times a UPC is co-purchased within the same transaction, as identified by the 'TRANSACTION_NUMBER' column.
The tricky part is that we cannot do this in either R or Python; the column has to be created with SQL.
I think what I am trying to create is something like a co-occurrence count. I cannot think of a way to do this right now, and I would appreciate your help!

That is achievable with window functions. If you want to show 0 for products that were not bought more than once in a transaction, you can wrap the count in a CASE expression:
select *
, count(*) over (partition by TRANSACTION_NUMBER, UPC_CODE) as COPURCHASED_FREQUENCY
from yourtable
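The CASE variant mentioned above is not spelled out in the answer, so here is a minimal sketch of it. It runs the SQL through Python's built-in sqlite3 module purely so the query can be executed; the table name `purchases`, the helper columns `cnt` and `rid`, and the use of SQLite (3.25 or later, for window functions) are assumptions, not part of the original post.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The table name "purchases" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "TRANSACTION_NUMBER TEXT, UPC_CODE TEXT, PURCHASED_UNIT INTEGER)"
)
rows = [
    ("T123456", "1040-1204-8612", 2),
    ("T123456", "4020-4104-2120", 1),
    ("T123456", "1040-1204-8612", 3),
    ("T123456", "2994-8182-9311", 5),
    ("T191201", "9879-8712-3456", 2),
    ("T191201", "2387-1928-1247", 1),
    ("T191201", "7417-2741-4245", 4),
    ("T191201", "9879-8712-3456", 2),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# Count rows per (transaction, UPC) with a window function, then
# report 0 when the UPC appears only once in that transaction.
# "rid" only preserves the insertion order for display.
query = """
SELECT TRANSACTION_NUMBER, UPC_CODE, PURCHASED_UNIT,
       CASE WHEN cnt > 1 THEN cnt ELSE 0 END AS COPURCHASED_FREQUENCY
FROM (
    SELECT rowid AS rid, p.*,
           COUNT(*) OVER (PARTITION BY TRANSACTION_NUMBER, UPC_CODE) AS cnt
    FROM purchases AS p
)
ORDER BY rid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this reproduces the COPURCHASED_FREQUENCY column shown in the question. The inner SELECT ... CASE should carry over to other databases with window-function support, though that has not been checked here.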

Related

How can I write a SQL query to list distinct values and their number of occurrences from a table?

I have a table that contains a history of alarm events.
I think the only pertinent column for this is Message, but there are others such as Time/Date and Source.
Here's a sample table:
Time/Date            Message  Source
2022-04-27/11:59:28  Code 1   VFD1
2022-04-27/11:59:37  Code 4   VFD1
2022-04-27/11:59:39  Code 1   VFD1
2022-04-27/11:59:42  Code 2   VFD1
2022-04-27/11:59:44  Code 1   VFD1
2022-04-27/11:59:46  Code 3   VFD1
2022-04-27/11:59:48  Code 1   VFD1
2022-04-27/11:59:50  Code 2   VFD1
From this, I'd like to create something like this:
Message  Occurrences
Code 1   4
Code 2   2
Code 3   1
Code 4   1
This is being done inside a SCADA software package (ICONICS/Genesis64), so I'm not sure of the exact flavor of SQL, but I think it should be Microsoft SQL Server or similar to it.
I can run this:
SELECT COUNT( DISTINCT Message) as Messages FROM dm_Alarms
to get how many unique values I have, but I'm stuck on how to count for each unique value, and then list them.
I do NOT know in advance all the values Message may take; there could be very many, and they may change over time.
Thank You
It appears you just need to aggregate?
select Message, count(*) Occurrences
from dm_Alarms
group by Message;
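To check the aggregate against the sample table, here is a small sketch that loads those rows into an in-memory SQLite database via Python's sqlite3 module. The column name `TimeDate` (standing in for "Time/Date") and the added ORDER BY are assumptions made for the example; the real system is reportedly SQL Server or similar.

```python
import sqlite3

# Sample alarm history from the question; "TimeDate" is a hypothetical
# column name standing in for "Time/Date".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dm_Alarms (TimeDate TEXT, Message TEXT, Source TEXT)")
alarms = [
    ("2022-04-27/11:59:28", "Code 1", "VFD1"),
    ("2022-04-27/11:59:37", "Code 4", "VFD1"),
    ("2022-04-27/11:59:39", "Code 1", "VFD1"),
    ("2022-04-27/11:59:42", "Code 2", "VFD1"),
    ("2022-04-27/11:59:44", "Code 1", "VFD1"),
    ("2022-04-27/11:59:46", "Code 3", "VFD1"),
    ("2022-04-27/11:59:48", "Code 1", "VFD1"),
    ("2022-04-27/11:59:50", "Code 2", "VFD1"),
]
conn.executemany("INSERT INTO dm_Alarms VALUES (?, ?, ?)", alarms)

# One output row per distinct Message, with its number of occurrences.
# No list of possible Message values is needed up front.
counts = conn.execute(
    "SELECT Message, COUNT(*) AS Occurrences "
    "FROM dm_Alarms GROUP BY Message ORDER BY Message"
).fetchall()
for message, occurrences in counts:
    print(message, occurrences)
```

Because GROUP BY discovers the distinct values itself, new Message values appearing later show up in the result without any change to the query.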

Updating columns based on a combine rows value on the same table

Please assist if possible. I have used STUFF to combine rows into a single row based on other columns. However, I want to turn each of the unique items into its own column, with a number (1 or 0) showing whether it exists, and then do the same for all subsequent rows.
I have been able to create the columns, but I can't get them to update according to what's in the combined column.
I also want it to be dynamic, so that no matter how many different names appear in the categories, it creates a new column for each and adds 1 or 0 depending on whether it appears.
How about something like this for SQL Server?
strSQL = "SELECT Category, CASE WHEN Category IS NOT NULL THEN 1 ELSE 0 END AS IsCategoryExist FROM MyTable"
Sample data (the 2nd column shows as 1 if the first column is non-blank):
Cars, 1
[Blank], 0
Airplanes, 1
Radios, 1

Change random assignment to cyclical assignment SQL

EXAMPLE:
The issue is that I have, for example, 5 people to solve 100 cases, and the assignment has to be fair. I think SQL, using loops, should be able to assign the first cases to the first 5 people, but then it has to go back, count, and reassign whenever a new case comes in.
I have two tables with the following fields
Technicians
ID_TEC-----NOM_TEC-----LINEA_TEC
and another with cases
ID_CASE----DESCRIPTION_CASE
The problem arises because I have to assign cases to each technician. The assignment must be cyclic, that is:
CASE1 TECH1
CASE2 TECH2
CASE3 TECH3
CASE4 TECH1 ...
and when new data is loaded into the table and the SP or the job that assigns cases is rerun, it should go back to the table, re-count the values and continue assigning from the last assigned TECn. I hope the description is clearer!
You can assign the numbers 1-5 randomly to the tasks by doing:
select t.*,
(1 + row_number() over (order by newid()) % 5) as user_assignment
from t
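The query above assigns technicians at random. For the cyclic (round-robin) order the question asks for, one possible sketch is to number both tables with ROW_NUMBER() and join on the case number modulo the technician count. It is run here through Python's sqlite3 module only so it can be executed; the sample rows, the table names `Technicians` and `Cases`, and the helper names `slot`, `seq`, and `n` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the question; the rows are made up for illustration.
conn.execute("CREATE TABLE Technicians (ID_TEC INTEGER, NOM_TEC TEXT, LINEA_TEC TEXT)")
conn.execute("CREATE TABLE Cases (ID_CASE INTEGER, DESCRIPTION_CASE TEXT)")
conn.executemany(
    "INSERT INTO Technicians VALUES (?, ?, ?)",
    [(1, "TECH1", "A"), (2, "TECH2", "A"), (3, "TECH3", "B")],
)
conn.executemany(
    "INSERT INTO Cases VALUES (?, ?)",
    [(i, f"case {i}") for i in range(1, 8)],
)

# slot: 0-based position of each technician; n: number of technicians.
# seq: 0-based position of each case. Case k goes to slot (k mod n).
query = """
WITH tech AS (
    SELECT ID_TEC,
           ROW_NUMBER() OVER (ORDER BY ID_TEC) - 1 AS slot,
           COUNT(*) OVER () AS n
    FROM Technicians
),
c AS (
    SELECT ID_CASE,
           ROW_NUMBER() OVER (ORDER BY ID_CASE) - 1 AS seq
    FROM Cases
)
SELECT c.ID_CASE, tech.ID_TEC
FROM c
JOIN tech ON tech.slot = c.seq % tech.n
ORDER BY c.ID_CASE
"""
assignment = conn.execute(query).fetchall()
for case_id, tech_id in assignment:
    print(case_id, tech_id)
```

Since the positions are recomputed from the full tables on every run, newly loaded cases continue the cycle where the previous ones left off, provided earlier cases stay in the table and the set of technicians does not change between runs.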

Multicriteria Insert/Update

I'm trying to create a query that will insert new records to a table or update already existing records, but I'm getting stuck on the filtering and grouping for the criteria I want.
I have two tables: tbl_PartInfo, and dbo_CUST_BOOK_LINE.
I want to select from dbo_CUST_BOOK_LINE based upon the combination of CUST_ORDER_ID, CUST_ORDER_LINE_NO, and REVISION_ID. Each customer order can have multiple lines, and each line can have multiple revisions. I'm trying to select the unique combinations of each order and its connected lines, but take the connected information from the row with the highest value in the revision column.
I want to insert/update from dbo_CUST_BOOK_LINE the following columns:
CUST_ORDER_ID
PART_ID
USER_ORDER_QTY
UNIT_PRICE
I want to insert/update them into tbl_PartInfo as the following columns respectively:
JobID
DrawingNumber
Quantity
UnitPrice
So if I have the following rows in dbo_CUST_BOOK_LINE (PART_ID omitted for example)
CUST_ORDER_ID CUST_ORDER_LINE_NO REVISION_ID USER_ORDER_QTY UNIT_PRICE
SCabc 1 1 0 100
SCabc 1 2 4 150
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 1 0 0
SCxyz 1 2 3 50
It would return
CUST_ORDER_ID CUST_ORDER_LINE_NO (REVISION_ID) USER_ORDER_QTY UNIT_PRICE
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 2 3 50
but with PART_ID included and without REVISION_ID
So far, my code covers only the insert portion, as I was trying to get the correct records selected, but I keep getting duplicates of CUST_ORDER_ID and CUST_ORDER_LINE_NO.
INSERT INTO tbl_PartInfo ( JobID, DrawingNumber, Quantity, UnitPrice, ProductFamily, ProductCategory )
SELECT dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO, Max(dbo_CUST_BOOK_LINE.REVISION_ID) AS MaxOfREVISION_ID
FROM dbo_CUST_BOOK_LINE, tbl_PartInfo
GROUP BY dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO;
This has been far more complicated than anything I've done so far, so any help would be greatly appreciated. Sorry about the long column names, I didn't get to choose them.
I did some research and think I found a way to make it work, but I'm still testing it. Right now I'm using three queries, but it should be easily simplified into two when complete.
The first is an append query that selects the two columns I want distinct combinations of, using "group by," while also selecting the max of the revision column. It appends them to another table that I'm using called tbl_TempDrop. This table is only being used right now to reduce the number of results before the next part.
The second is an update query that updates tbl_TempDrop to include all the other columns I wanted by setting the criteria equal to the three selected columns from the first query. This took an EXTREMELY long time to complete when I had 700,000 records to work with, hence the use of the tbl_TempDrop.
The third query is a basic append query that appends the rows of tbl_TempDrop to the end destination, tbl_PartInfo.
All that's left is to run all three in a row.
I didn't want to include the full details of any tables or queries yet until I ensure that it works as desired, and because some of the names are vague since I will be using this method for multiple query searches.
This website helped me a little to make sure I had the basic idea down. http://www.techonthenet.com/access/queries/max_query2_2007.php
Let me know if you see any flaws in this approach!
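For comparison, the "row with the highest revision per order line" selection can also be sketched as a single join against a grouped subquery. This is an alternative to the three-query approach described above, not the poster's method. It runs in SQLite through Python's sqlite3 module using the sample rows from the question; the PART_ID values are made up, and Access SQL may need syntax adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dbo_CUST_BOOK_LINE ("
    "CUST_ORDER_ID TEXT, CUST_ORDER_LINE_NO INTEGER, REVISION_ID INTEGER, "
    "PART_ID TEXT, USER_ORDER_QTY INTEGER, UNIT_PRICE INTEGER)"
)
# Rows from the question; the PART_ID values are hypothetical.
conn.executemany(
    "INSERT INTO dbo_CUST_BOOK_LINE VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SCabc", 1, 1, "P-1", 0, 100),
        ("SCabc", 1, 2, "P-1", 4, 150),
        ("SCabc", 1, 3, "P-1", 4, 125),
        ("SCabc", 2, 3, "P-2", 2, 200),
        ("SCxyz", 1, 1, "P-3", 0, 0),
        ("SCxyz", 1, 2, "P-3", 3, 50),
    ],
)

# The subquery finds the highest revision per (order, line); joining back
# on all three columns keeps only that revision's full row.
query = """
SELECT b.CUST_ORDER_ID, b.CUST_ORDER_LINE_NO, b.PART_ID,
       b.USER_ORDER_QTY, b.UNIT_PRICE
FROM dbo_CUST_BOOK_LINE AS b
INNER JOIN (
    SELECT CUST_ORDER_ID, CUST_ORDER_LINE_NO, MAX(REVISION_ID) AS MaxRev
    FROM dbo_CUST_BOOK_LINE
    GROUP BY CUST_ORDER_ID, CUST_ORDER_LINE_NO
) AS m
  ON  b.CUST_ORDER_ID = m.CUST_ORDER_ID
  AND b.CUST_ORDER_LINE_NO = m.CUST_ORDER_LINE_NO
  AND b.REVISION_ID = m.MaxRev
ORDER BY b.CUST_ORDER_ID, b.CUST_ORDER_LINE_NO
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Grouping only by the order and line columns, rather than by every selected column as in the original INSERT, is what removes the duplicate CUST_ORDER_ID / CUST_ORDER_LINE_NO combinations.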

Access SQL how to make an increment in SELECT query

I have an SQL query giving me X results. I want the query output to have a column called count, making the output something like this:
count id section
1 15 7
2 3 2
3 54 1
4 7 4
How can I make this happen?
So in your example, "count" is the derived sequence number? I don't see what pattern is used to determine the count must be 1 for id=15 and 2 for id=3.
count id section
1 15 7
2 3 2
3 54 1
4 7 4
If id contained unique values, and you order by id you could have this:
count id section
1 3 2
2 7 4
3 15 7
4 54 1
Looks to me like mikeY's DSum approach could work. Or you could use a different approach to a ranking query as Allen Browne described at this page
Edit: You could use DCount instead of DSum. I don't know how the speed would compare between the two, but DCount avoids creating a field in the table simply to store a 1 for each row.
DCount("*","YourTableName","id<=" & [id]) AS counter
Whether you go with DCount or DSum, the counter values can include duplicates if the id values are not unique. If id is a primary key, no worries.
I frankly don't understand what it is you want, but if all you want is a sequence number displayed on your form, you can use a control bound to the form's CurrentRecord property. A control with the ControlSource =CurrentRecord will have an always-accurate "record number" that is in sequence, and that will update when the form's Recordsource changes (which may or may not be desirable).
You can then use that number to navigate around the form, if you like.
But this may not be anything like what you're looking for -- I simply can't tell from the question you've posted and the "clarifications" in comments.
The only trick I have seen is if you have a sequential id field, you can create a new field in which the value for each record is 1. Then you do a running sum of that field.
Add to your query
DSum("[New field with 1 in it]","[Table Name]","[ID field]<=" & [ID Field])
as counterthing
That should produce a sequential count in Access which is what I think you want.
HTH.
(Stolen from Rob Mills here:
http://www.access-programmers.co.uk/forums/showthread.php?p=160386)
Alright, I guess this comes close enough to constitute an answer: the following link specifies two approaches: http://www.techrepublic.com/blog/microsoft-office/an-access-query-that-returns-every-nth-record/
The first approach assumes that you have an ID value and uses DCount (similar to @mikeY's solution).
The second approach assumes you're OK creating a VBA function that will run once for EACH record in the recordset, and will need to be manually reset (with some VBA) every time you want to run the count - because it uses a "static" value to run its counter.
As long as you have a reasonable number of records (hundreds, not thousands), the second approach looks like the easiest/most powerful to me.
This function can be called from each record if available from a module.
Example: incrementingCounterTimeFlaged(10,[anyField]) should provide your query rows an int incrementing from 0.
'provides incrementing int values 0 to n
'resets to 0 some seconds after first call
Function incrementingCounterTimeFlaged(resetAfterSeconds As Integer, anyfield As Variant) As Integer
    Static resetAt As Date
    Static i As Integer
    'if reset date < now() set the flag and return 0
    If DateDiff("s", resetAt, Now()) > 0 Then
        resetAt = DateAdd("s", resetAfterSeconds, Now())
        i = 0
        incrementingCounterTimeFlaged = i
    'if reset date > now increments and returns
    Else
        i = i + 1
        incrementingCounterTimeFlaged = i
    End If
End Function
autoincrement in SQL
SELECT (Select COUNT(*) FROM table A where A.id<=b.id),B.id,B.Section FROM table AS B ORDER BY B.ID Asc
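As a quick check of the correlated-subquery idea, here is a sketch on the question's four rows, run in SQLite through Python's sqlite3 module. The table name `items` and the alias `counter` are assumptions, since the query above uses the placeholder name `table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table name; the rows come from the question.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, section INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(15, 7), (3, 2), (54, 1), (7, 4)],
)

# For each row b, count how many rows have an id less than or equal to
# b.id. With unique ids this gives 1, 2, 3, ... in id order.
query = """
SELECT (SELECT COUNT(*) FROM items AS a WHERE a.id <= b.id) AS counter,
       b.id, b.section
FROM items AS b
ORDER BY b.id ASC
"""
numbered = conn.execute(query).fetchall()
for row in numbered:
    print(row)
```

As with the DCount and DSum variants, the counter is only gap-free and duplicate-free when id is unique.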
You can use ROW_NUMBER() which is in SQL Server 2008
SELECT ROW_NUMBER() OVER (ORDER By ID DESC) RowNum,
ID,
Section
FROM myTable
RowNum then contains the sequence of row numbers.