Create a new row when encountering a special character (e.g. "/") in a field - sql

I'm trying to replicate a record whenever I come across a slash in a specific field.
Background of the problem: I'm trying to compare two lists of data containing item number, item description, and item serial number. One list has item quantity and status information, the other has item location information, so I'm trying to match the location list onto the main list. The problem is that the two lists were created independently of each other, so both contain errors, and I've only been able to match about 20% of the records with an inner join in SQL. The rest don't match because the item number is wrong in one list or the other, or the serial number is missing a digit in one list, and I can't compare the nomenclatures very well either, because one might say "Hand Wrench" and the other "Wrench, 5mm, socket".
Additionally, one list stores multiple items related to a main item in a single record: the multiple serial numbers are saved in the serial number field, separated by slashes.
I tried using Levenshtein distance (fuzzy matching) in Alteryx for serial number / item number matching. This created far too many false positives, because serial numbers are sequential, item numbers are frequently incorrect, and item descriptions might look similar to a human while their character lengths differ wildly (e.g. "Truck" in one list might not match well if the other list has something like "Truck, 8 wheel, cargo, flatbed").
For now I'm just trying to match the lists on whether the serial number in one list is contained in the serial number field of the other list (which may hold multiple serial numbers).
Example SQLite code I'm currently using:
select * from [MISSING ITEMS LIST] as a
left join [RFID TAG SCAN] as b on
b.[SERIAL NUMBER] like '%' || (a.[SERIAL NUMBER] || '%')
where b.[SERIAL NUMBER] <> '' and b.[SERIAL NUMBER] is not null
What I'm trying to achieve:
Recopying this part from above:
So Table A might have this:
Record#, Item#, Description, SN
1, 156928, Truck, 1234
2, 209344, Truck Cover, 5588
And Table B might have this:
Record#, Item#, Description, SN
1, 156928, Truck, 5588/01234
To make the analysis a little easier, I'd like to convert Table B to this:
Record#, Item#, Description, SN
1, 156928, Truck, 5588
1, 156928, Truck, 01234

If I assume there is at most one slash, then in MariaDB you can do:
select Record, Item, Description,
substring_index(SN, '/', 1) as SN
from t
union all
select Record, Item, Description,
substring_index(SN, '/', -1) as SN
from t
where SN like '%/%';
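Since the question uses SQLite, which has no substring_index, here is a minimal sketch of the same idea that also copes with any number of slashes, using a recursive common table expression run through Python's built-in sqlite3 module. The table name b, its lowercase column names and the function name split_serials are assumptions standing in for Table B; SN is stored as TEXT so that leading zeros such as 01234 survive.

```python
import sqlite3

def split_serials(rows):
    """Return one (record, item, description, sn) row per slash-separated serial number.

    rows: iterable of (record, item, description, sn) tuples for a hypothetical table b.
    """
    con = sqlite3.connect(":memory:")
    # TEXT affinity keeps leading zeros such as '01234'
    con.execute("CREATE TABLE b (record INTEGER, item TEXT, description TEXT, sn TEXT)")
    con.executemany("INSERT INTO b VALUES (?, ?, ?, ?)", rows)
    query = """
    WITH RECURSIVE split(record, item, description, pos, part, rest) AS (
        -- seed row: nothing extracted yet, a trailing '/' appended as a sentinel
        SELECT record, item, description, 0, '', sn || '/' FROM b
        UNION ALL
        -- peel off the text before the next '/' and keep the remainder
        SELECT record, item, description, pos + 1,
               substr(rest, 1, instr(rest, '/') - 1),
               substr(rest, instr(rest, '/') + 1)
        FROM split
        WHERE rest <> ''
    )
    SELECT record, item, description, part AS sn
    FROM split
    WHERE part <> ''          -- drops the seed row and empty pieces such as 'a//b'
    ORDER BY record, pos
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(split_serials([(1, "156928", "Truck", "5588/01234")]))
# [(1, '156928', 'Truck', '5588'), (1, '156928', 'Truck', '01234')]
```

Rows without a slash come through unchanged as a single row, so no union of two cases is needed.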

To convert Table B to the format you want, use the Text To Columns tool in Alteryx.
In its configuration, set the Delimiter to "/" and select "Split to Rows".

Related

Create a "products you may be interested in" algorithm in SQL?

I have a problem that I'm not sure how to approach.
I have a simple database where I store products, users, and users' purchases of products.
Each product has a name, a category and a price.
My goal is the following:
I want to display to the user a list of 5 items suggested as "You might be interested in". The main problem is that I don't just want to search LIKE %..% on the name; I also want to take into account the types of products the user usually buys and the price range they usually buy in, and to give priority to products that are bought more often.
Is such an algorithm realistic? I can think of some metrics, like grouping all categories into semantically "similar" buckets and calculating the distance from that, but I'm not sure how I should rank products when there are multiple criteria.
Maybe I should give each criterion an importance factor and make each criterion's contribution its distance multiplied by that factor?
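The importance-factor idea can be sketched roughly as a weighted sum, shown here in Python for illustration. Everything in it is an assumption rather than a known-good recommender: the function name suggest, the weights 0.5/0.3/0.2, using "share of the user's purchases in this category" as the category signal, a price similarity of 1 / (1 + relative distance from the user's average price), and popularity as a product's purchase count divided by the best seller's count.

```python
from collections import Counter
from statistics import mean

def suggest(products, purchases, user_id, k=5, w_cat=0.5, w_price=0.3, w_pop=0.2):
    """Rank products the user has not bought by a weighted sum of three criteria.

    products:  dict product_id -> (name, category, price)
    purchases: list of (user_id, product_id) pairs
    The weights are illustrative guesses, not tuned values.
    """
    popularity = Counter(pid for _, pid in purchases)
    max_pop = max(popularity.values(), default=0) or 1
    bought = {pid for uid, pid in purchases if uid == user_id}
    if not bought:
        # No history: fall back to the most frequently bought products
        return sorted(products, key=lambda pid: (-popularity[pid], pid))[:k]

    cat_counts = Counter(products[pid][1] for pid in bought)
    avg_price = mean(products[pid][2] for pid in bought) or 1.0

    scored = []
    for pid, (_name, category, price) in products.items():
        if pid in bought:
            continue
        cat_score = cat_counts[category] / len(bought)               # share of purchases in this category
        price_score = 1 / (1 + abs(price - avg_price) / avg_price)   # 1.0 at the user's average price
        pop_score = popularity[pid] / max_pop                        # 1.0 for the best seller
        score = w_cat * cat_score + w_price * price_score + w_pop * pop_score
        scored.append((-score, pid))                                 # negate so sort() puts best first
    scored.sort()
    return [pid for _, pid in scored[:k]]

products = {1: ("RC car", "toys", 50), 2: ("RC plane", "toys", 60), 3: ("Laptop", "electronics", 900),
            4: ("Kite", "toys", 20), 5: ("Phone", "electronics", 700)}
purchases = [("u1", 1), ("u2", 2), ("u2", 3), ("u3", 2), ("u3", 5)]
print(suggest(products, purchases, "u1", k=2))  # [2, 4]
```

Summing the weighted scores keeps one weak criterion from zeroing out a product, which a pure product of factors would do; the weights are the knobs to tune.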
What you could do is create 2 additional fields for each product in your database. In the first field, called Type for example, you could store "RC", and in the second, called Similar, you could store "RC, Radio, Electronics, Remote, Model". Then in your SQL query you can select products whose Type matches the Similar list. This gives you a system that doesn't rely only on the product name, which can be deceiving. It would still use LIKE, but it would be far more accurate, because you define in advance which other products are similar to this one.
Depending on the size of your database already, I believe this to be the simplest option.
I used this in MySQL for a weighted search:
SELECT *,
IF(
`libelle` LIKE :startSearch, 30,
IF(`libelle` LIKE :fullSearch, 20, 0)
)
+ IF(
`description` LIKE :startSearch, 10,
IF(`description` LIKE :fullSearch, 5, 0)
)
+ IF(
`keyword` LIKE :fullSearch, 1, 0
)
AS `weight`
FROM `t`
WHERE (
-- at least 1 match
`libelle` LIKE :fullSearch
OR `description` LIKE :fullSearch
OR `keyword` LIKE :fullSearch
)
ORDER BY
`weight` DESC
/*
'fullSearch'=>'%'.str_replace(' ', '_', trim($search)).'%',
'startSearch'=>str_replace(' ', '_', trim($search)).'%',
*/
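MySQL's IF() is not portable, so as a sketch the same weighting can be written with standard CASE expressions. Below it is run against SQLite through Python's sqlite3 module, with the PHP parameter preparation translated to Python. The table t and its columns come from the answer above; the function name weighted_search and the sample rows are made up. Note that SQLite's LIKE is case-insensitive for ASCII by default, while MySQL's behaviour depends on the column collation.

```python
import sqlite3

def weighted_search(con, search):
    """Rank rows of table t(id, libelle, description, keyword) by a LIKE-based weight."""
    term = search.strip().replace(" ", "_")   # as in the PHP: spaces become '_' wildcards
    params = {"fullSearch": f"%{term}%", "startSearch": f"{term}%"}
    sql = """
    SELECT id, libelle,
           CASE WHEN libelle LIKE :startSearch THEN 30
                WHEN libelle LIKE :fullSearch THEN 20 ELSE 0 END
         + CASE WHEN description LIKE :startSearch THEN 10
                WHEN description LIKE :fullSearch THEN 5 ELSE 0 END
         + CASE WHEN keyword LIKE :fullSearch THEN 1 ELSE 0 END AS weight
    FROM t
    WHERE libelle LIKE :fullSearch          -- at least 1 match
       OR description LIKE :fullSearch
       OR keyword LIKE :fullSearch
    ORDER BY weight DESC, id
    """
    return con.execute(sql, params).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, libelle TEXT, description TEXT, keyword TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "Radio car", "Remote controlled car", "rc"),
    (2, "Car wax", "Polish for cars", "car"),
    (3, "Kite", "Flies high", "toy"),
])
print(weighted_search(con, "car"))  # [(2, 'Car wax', 36), (1, 'Radio car', 25)]
```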

Unmatched Access Query to check multiple text

So I have a column whose values list the multiple Vendor names, separated by semicolons, that sell a given product, and I would like to confirm that they match the Vendor names in a Vendor table. The following Access query does what I need if there is only a single Vendor in the Product table, but falls apart when there are multiple Vendors separated by semicolons. Is there a way I can modify the syntax here to check multiple Vendors when they are present? An example would be "Vendor A; Vendor B; Vendor C" all in the same record of the Vendor field in the Product table, while in the Vendor table Vendor A, B and C are individual records.
SELECT [Product_Table].[Product Name]
FROM [Product_Table] LEFT JOIN [Vendor_Table] ON [Product_Table].[Vendor] = [Vendor_Table].[Vendor]
WHERE ((([Vendor_Table].Vendor) Is Null));
I have a way to do this, but you need to create some helper objects.
Create this query as qryNumberList_:
SELECT DISTINCT MSysObjects.Id
FROM MSysObjects;
(This query pulls unique object Ids from the MSysObjects table. In a brand-new database in Access 2007, this returns 34 rows. This will be important later on.)
Create this query as qryNumberList:
SELECT DCount("*","qryNumberList_","[Id] < " & [Id]) AS [Number]
FROM qryNumberList_
WHERE DCount("*","qryNumberList_","[Id] < " & [Id]) <= DMax("Len([Vendor])-Len(Replace([Vendor],';',''))","Product_Table", '[Vendor] Is Not Null');
(This query creates a sequential list of numbers starting from 0 based on the Id from the previous query.)
Create this function in a public module:
Public Function Split_(ByVal v As Variant, ByVal d As String, ByVal p As Integer) As Variant
Split_ = Trim(Split(Nz(v, ""), d)(p))
End Function
(This is simply a wrapper function for Split, which cannot be called within queries.)
After you've set this up, copy this SQL into a new query and run it:
SELECT Product_Table.[Product Name], Split_([Vendor],";",[Number]) AS SplitVendor
FROM Product_Table, qryNumberList
WHERE (((qryNumberList.Number)<=Len([Vendor])-Len(Replace([Vendor],";",""))));
What this code does is create a sequential list of numbers and then CROSS JOIN it against Product_Table. The Split_ function pulls out the part of the string indicated by the number. So for data in Product_Table that looks like:
[Product Name] [Vendor]
My Product Vendor A;Vendor B;Vendor C
the results are:
[Product Name] [SplitVendor]
My Product Vendor A
My Product Vendor B
My Product Vendor C
The real gotcha here is that you can only split out as many parts as there are distinct rows in qryNumberList_. Luckily, this should be at least 34 (see the note about qryNumberList_). I've tried to do this without creating another persistent table, hence the limitation.
I've tried to explain it as best I can, but I'm tired at the moment. Just try it and see if it gets you what you want.
EDIT: Oops, the test data I presented had a comma delimiter for the second vendor. It should have been ;.
EDIT 2: Changed qryNumberList to limit number of rows to max delimiters in Vendor column in Product_Table.
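For comparison, the split-then-compare idea behind these Access queries can be sketched in plain Python: split each Vendor field on the delimiter, trim the pieces (as Split_ does with Trim), and report every piece that has no matching record in the vendor list. The function name unmatched_vendors and the data shapes are assumptions for illustration. Unlike Access's default text comparison, this comparison is case-sensitive.

```python
def unmatched_vendors(products, vendors, delimiter=";"):
    """Return (product_name, vendor) pairs whose vendor is missing from the vendor table.

    products: iterable of (product_name, vendor_field) pairs, e.g. ("My Product", "Vendor A;Vendor B")
    vendors:  iterable of vendor names from the vendor table
    """
    known = {v.strip() for v in vendors}
    missing = []
    for name, vendor_field in products:
        if not vendor_field:          # like Nz(): treat Null/empty as "no vendors"
            continue
        for part in vendor_field.split(delimiter):
            vendor = part.strip()     # like Trim() in Split_
            if vendor and vendor not in known:
                missing.append((name, vendor))
    return missing

print(unmatched_vendors(
    [("My Product", "Vendor A;Vendor B;Vendor C"), ("Other Product", "Vendor A; Vendor D")],
    ["Vendor A", "Vendor B", "Vendor C"],
))  # [('Other Product', 'Vendor D')]
```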
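Can you try using the InStr function instead of =?
I.e.,
SELECT [Product_Table].[Product Name]
FROM [Product_Table] LEFT JOIN [Vendor_Table]
ON InStr([Product_Table].[Vendor], [Vendor_Table].[Vendor]) > 0
WHERE ((([Vendor_Table].Vendor) Is Null));
Note that with the LEFT JOIN ... Is Null pattern, this lists only products for which none of the listed vendors is found in Vendor_Table. It might help you, or at least give you a starting point.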

How to create a derived column based on another column

I have two tables. One table has product information with a column ProdNum, and the second table has all the related product documents. The product document names are formatted like this: ProdNumxxxx.pd, so the first five digits of every document name are a ProdNum, followed by letters and other numbers. I need to join these two tables to find the documents related to each product number. I can't join on ProdNum and ProdFile directly because obviously they do not match. I was thinking of creating another column that picks up the first five characters of the ProdFile name, so that I can JOIN on it to match the ProdNum. I have absolutely no clue how to do this. Any ideas/opinions?
Here is a query you can use that also handles filenames shorter than 5 characters:
select
ProdFileInfo.*
FROM ProdFileInfo
INNER JOIN ProdInfo
ON ProdInfo.ProdNum = (CASE
WHEN LEN(ProdFileInfo.ProdFile) < 5 THEN CAST(ProdFileInfo.ProdFile AS decimal)
ELSE CAST(SUBSTRING(ProdFileInfo.ProdFile, 1, 5) AS decimal)
END);
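A minimal sketch of the derived-column idea, run in SQLite via Python's sqlite3: instead of storing a new column, the join condition compares ProdNum with substr(ProdFile, 1, 5). It assumes ProdNum is stored as five-character text; if it is numeric, you would need a CAST as in the query above. The function name and the file names in the example are made up.

```python
import sqlite3

def documents_for_products(prod_nums, prod_files):
    """Pair each ProdNum with the files whose first five characters equal it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ProdInfo (ProdNum TEXT)")        # assumed: 5-character text
    con.execute("CREATE TABLE ProdFileInfo (ProdFile TEXT)")
    con.executemany("INSERT INTO ProdInfo VALUES (?)", [(p,) for p in prod_nums])
    con.executemany("INSERT INTO ProdFileInfo VALUES (?)", [(f,) for f in prod_files])
    rows = con.execute("""
        SELECT p.ProdNum, f.ProdFile
        FROM ProdFileInfo AS f
        INNER JOIN ProdInfo AS p
            ON p.ProdNum = substr(f.ProdFile, 1, 5)   -- the 'derived column', computed in the join
        ORDER BY p.ProdNum, f.ProdFile
    """).fetchall()
    con.close()
    return rows

print(documents_for_products(["12345", "67890"],
                             ["12345AB1.pdf", "12345CD2.pdf", "67890X.pdf", "99999Z.pdf"]))
# [('12345', '12345AB1.pdf'), ('12345', '12345CD2.pdf'), ('67890', '67890X.pdf')]
```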

Search a column for values LIKE in another column

I searched but couldn't find what I was looking for, maybe I'm not looking for the right terms though.
I have a column for SKUs and a Keywords column. The SKUs are formatted AA 12345, and the Keywords are just long lists of words. What I need to do is find any records where the number in the SKU matches any part of the Keywords, and I'm just not sure how to do this. For example, I'd like to remove the AA so that I'm looking for %12345% anywhere inside the value of Keywords, but I need to do it for every record.
I've tried a few variations of:
SELECT *, Code AS C
FROM Prod
WHERE Keywords LIKE '%C%';
but I get errors on all of them. Can someone help?
Thank you.
EDIT: Okay, sorry about that, the question wasn't the clearest. I'll try to clarify:
The SKU column has values with a 2-letter prefix in front of a varying number of digits, such as AA 12345 or UN 98767865.
The Keywords columns are full of information, but also include the SKU values. The problem is that some of the Keywords columns contain the SKU values of products that have entirely different records.
I'm trying to find which records contain the SKU values of other records.
I hope that's more understandable.
EDIT EDIT: Here is some actual sample data
Code: AD 56409429
Keywords: 56409429, 409249, AD 56409429, AD-56409429, Advance 56409429, Nilfisk 56409429, Nilfisk Advance 56409429, spx56409429, 56409429M, 56409429G, 56409429H, ADV56409429, KNT56409429, Kent 56409429, AA 12345
Code: AA 12345
Keywords: AA 12345, 12345, Brush
I need to find all the records where an errant Code value has found its way into the Keywords, as in the first case above, so I need a query that would return only the first example.
I'm really sorry my explanation is confusing; it's perhaps an extension of how confused I am trying to figure out how to do it. Imagine me sitting there with the site owner who added thousands of these extra SKU numbers to their keywords and having them ask me to then remove them :/
Assuming all of your SKU values are in exactly the same format, you can remove the 'AA ' prefix (two letters and a space) using SUBSTRING and then use the result in the LIKE statement:
SELECT * FROM Prod WHERE Keywords LIKE '%' + SUBSTRING(Code, 4, 5) + '%'
Since your SKU codes can be of variable length, the SUBSTRING statement above will have to be changed to:
SELECT * FROM Prod WHERE Keywords LIKE '%' + SUBSTRING(Code, 4, LEN(Code)) + '%'
This will remove the first 3 characters (the two-letter prefix and the space) from your SKU code, regardless of the number of digits that follow.
It is not entirely clear from your question whether the Keywords are in the format AA 12345 or just 12345, but assuming they contain the number and are comma-separated, you can find all records where the code is in the Keywords but there are also OTHER keywords by using this statement:
SELECT *
FROM Prod
WHERE Keywords LIKE '%' + SUBSTRING(Code, 4, LEN(Code)) + '%'
AND Keywords <> SUBSTRING(Code, 4, LEN(Code))
This statement says: find all records where the SKU code is somewhere in the Keywords BUT does not exactly match the Keywords contents, i.e. there must be other keywords in the data.
OK, based on your latest revisions I think this will work, or at least get you along the road (I am assuming your Product table has a primary key of Id). This is most likely horribly inefficient, but since it sounds like a one-off tidy-up it may not matter too much as long as it works (at least that is what I am hoping).
SELECT DISTINCT P.Id
FROM PROD P
INNER JOIN
(
-- Get all unique SKU codes from Prod table
SELECT DISTINCT SUBSTRING(CODE, 4, LEN(CODE)) as Code FROM Prod
) C ON P.Keywords LIKE '%' + C.Code + '%'
AND SUBSTRING(P.Code, 4, LEN(P.Code)) <> C.Code
The above statement joins a unique list of SKU codes (with the letter prefix removed) to every matching record via the join on the Keywords column. Note: the join on its own returns duplicate product rows. Additionally, the result set is filtered to return only matches where the SKU code of the original Product record differs from the SKU code found in the Keywords column.
The DISTINCT then returns only a unique list of Product Ids that have an erroneous SKU code in the Keywords column (a product may have several).
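The same join can be tried out in SQLite from Python as a sketch; SQLite spells string concatenation || and uses substr(Code, 4) for "from character 4 to the end". The Prod table with an Id primary key is the answer's own assumption; the function name is hypothetical and the sample rows are shortened from the question's data. Like the original, it matches plain substrings, so a short number can also hit inside a longer one.

```python
import sqlite3

def ids_with_foreign_codes(rows):
    """rows: (Id, Code, Keywords) tuples for a Prod table.

    Returns the Ids whose Keywords contain the number of some OTHER product's Code.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Prod (Id INTEGER PRIMARY KEY, Code TEXT, Keywords TEXT)")
    con.executemany("INSERT INTO Prod VALUES (?, ?, ?)", rows)
    query = """
    SELECT DISTINCT P.Id
    FROM Prod P
    INNER JOIN (
        -- all unique SKU numbers with the 'AA ' prefix removed
        SELECT DISTINCT substr(Code, 4) AS Code FROM Prod
    ) C ON P.Keywords LIKE '%' || C.Code || '%'
       AND substr(P.Code, 4) <> C.Code      -- ignore the product's own number
    ORDER BY P.Id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids

sample = [
    (1, "AD 56409429", "56409429, 409249, AD 56409429, Kent 56409429, AA 12345"),
    (2, "AA 12345", "AA 12345, 12345, Brush"),
]
print(ids_with_foreign_codes(sample))  # [1]
```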
STUFF() seems better suited here. I would do this:
SELECT *
FROM Prod WHERE
Keywords LIKE '%' + STUFF(SKU,1,3,'') + '%'
This will work for both AA 12345 and UN 98767865: it replaces the first 3 characters with an empty string.

looping through a numeric range for secondary record ID

So, I figure I could probably come up with some wacky solution, but I might as well ask up front.
Each user can have many orders.
Each desk can have many orders.
Each order has a maximum of 3 items.
I'm trying to set things up so a user can create an order, the order auto-generates a reference number, and each item gets a reference letter. The reference number is 0-99 and loops back around to 0 once it hits 99, so orders throughout the day are easy to reference for the desks.
So, a user places an order of 3 items for desk #2:
78A: red stapler
78B: pencils
78C: a kangaroo foot
I'm not sure whether this would be done in the program logic or somehow at the SQL level.
I was thinking of something like neworder = order.last + 1 and somehow tying that into a range on order creation. I'm pretty fuzzy on the specifics.
Without knowing the answer to my comment above, I will assume you want to store the full audit history rather than wiping historic records; as such, the 78A 78B 78C style references are just a display format.
If you have a single Order table (containing your OrderId, UserId, DeskId, times and any other top-level data), an OrderItem table (containing your OrderItemId, OrderId, LineItemId, which is 1, 2 or 3 for the first and the optional second and third line items in the order, and ProductId) and a Product table (ProductId, Name, Description),
then this is quite simple (thankfully) using the modulo operator, which gives the remainder of a division and lets you count in cycles of 3 and 100 (or any other number you wish).
Just do something like the following:
(you will want to combine the items into a single column; I have kept them separate so that you can see how they work)
Obviously, join/query/filter on the user, desk and product tables as appropriate.
select
o.OrderId,
o.UserId,
o.DeskId,
o.OrderId%100 + 1 as OrderNumber,
case when oi.LineItemId%3 = 1 then 'A'
when oi.LineItemId%3 = 2 then 'B'
when oi.LineItemId%3 = 0 then 'C'
end as ItemLetter,
oi.ProductId
from tb_Order o inner join tb_OrderItem oi on o.OrderId=oi.OrderId
Alternatively, you can add the ItemLetter (A, B, C) and/or the OrderNumber (1-100) as computed (and persisted) columns on the tables themselves, so that they are calculated once on insert rather than recalculated/formatted every time they are selected.
This sort of breaks the best practice of storing raw data in the DB and formatting it on retrieval; but if you are not going to update the data and you will read it far more often than you write it, then I would break this rule and calculate the formatting at insert time.
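To see the modulo arithmetic produce the question's 78A/78B/78C labels, here is a small sketch that runs the answer's query (with the column names from the table description) in SQLite via Python's sqlite3. The function name order_labels and the sample ids are hypothetical. As written the order number cycles 1-100; dropping the + 1 would give the 0-99 range the question asks for.

```python
import sqlite3

QUERY = """
SELECT o.OrderId,
       o.OrderId % 100 + 1 AS OrderNumber,          -- cycles 1..100
       CASE oi.LineItemId % 3 WHEN 1 THEN 'A'
                              WHEN 2 THEN 'B'
                              WHEN 0 THEN 'C' END AS ItemLetter,
       oi.ProductId
FROM tb_Order o
INNER JOIN tb_OrderItem oi ON o.OrderId = oi.OrderId
ORDER BY o.OrderId, oi.LineItemId
"""

def order_labels(orders, items):
    """orders: (OrderId, UserId, DeskId); items: (OrderItemId, OrderId, LineItemId, ProductId)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tb_Order (OrderId INTEGER PRIMARY KEY, UserId INTEGER, DeskId INTEGER)")
    con.execute("CREATE TABLE tb_OrderItem (OrderItemId INTEGER PRIMARY KEY, OrderId INTEGER, "
                "LineItemId INTEGER, ProductId INTEGER)")
    con.executemany("INSERT INTO tb_Order VALUES (?, ?, ?)", orders)
    con.executemany("INSERT INTO tb_OrderItem VALUES (?, ?, ?, ?)", items)
    rows = con.execute(QUERY).fetchall()
    con.close()
    # Display format used in the question, e.g. '78A'
    return [f"{number}{letter}" for _order_id, number, letter, _product in rows]

print(order_labels([(177, 1, 2)], [(1, 177, 1, 10), (2, 177, 2, 11), (3, 177, 3, 12)]))
# ['78A', '78B', '78C']
```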