Create data model according to Select Case Instruction - sql

This question might appear a bit strange, but I'll try to explain it.
In our company's production department we track machine data. This data is also used to evaluate the quality of the production process.
In the following I will refer to these attributes:
productId
componentOfProduct -> the component which is affected by the error
routeStepOfError
causeOfError
The problem is that the data the machine produces is not in the form management wants for evaluation.
So we have to do data matching. Most of the time it is a simple relationship, e.g. matching several productId numbers to one product name / group.
But in the case of routeStepOfError it's different. In some cases the routeStep the production lines log can be matched to the routeStep for the management reports as described above for the productIds.
But for some routeSteps a far more complicated matching is done. So far it's implemented in a VBA app which matches the database output and writes data into a spreadsheet. The matching is done via Select Case statements like this:
Select Case routeStep
    Case EOL
        Select Case productId
            Case 1111, 1112, 1113
                Select Case causeOfError
                    Case A1:
                        Select Case componentOfProduct
                            Case "be1": routeStepReport = "final optical test"
                            Case Else: routeStepReport = "end of line"
                        End Select
                    Case Else: routeStepReport = "end of line"
                End Select
            Case Else: routeStepReport = "end of line"
        End Select
    Case...
End Select
...I know the syntax might not be correct, but I hope you get what I'm trying to say: sometimes the matching from routeStep to routeStepReport (i.e. the value we need for our management reports) depends on the routeStep, the productId, the componentOfProduct and the causeOfError.
...and these Select Case statements are really long, as there are many products and many routeSteps in our production process. So each time there is a change in the production program / process, this has to be maintained in the VBA code, which is far from perfect, as only one guy in our company really knows where in the code to look and how to maintain it.
So I proposed to implement the whole matching in an SQL database and just create the right relationships between the values from the machines and the values management wants to have. Together with an interface in PHP or whatever, people could do the matching quite easily.
Well, for the simple matchings like productIds to product groups this works fine, but for the routeSteps as described above this might be a problem.
I would have created one table with the following attributes:
|------------------|-----------|-----------------|--------------|-----------------|
| routeStepOfError | productId | componentOfProd | causeOfError | routeStepReport |
|------------------|-----------|-----------------|--------------|-----------------|
But let's say we have about 20 routeSteps, 50 productIds, each with about 4 components and 10 causes of error (up to 20 x 50 x 4 x 10 = 40,000 rows): this table might be endless as well and really hard to maintain.
Maybe I should have said earlier that for the majority of routeStepOfErrors there is a simple matching from routeStepOfError to routeStepReport regardless of productIds, components and causes... but if some matchings depend on all 4 criteria, I have to completely fill the table above, don't I?
Maybe there's an easier way to achieve this, but I cannot see it yet.
So I would be grateful for any hint you could give me for solving this problem (I cannot change the way of matching itself; they still want to have "their" well-known figures :-) ).
Thanks a lot in advance!
Regards

You might use two tables, tblRouteStepErrorMatch and tblRouteStepErrorException.
tblRouteStepErrorMatch
    routeStepOfError
    routeStepReport
tblRouteStepErrorException
    routeStepOfError
    productID
    componentOfProd
    causeOfError
    routeStepReport
Then in your code, check the Exception table first. If there's no match, go to the Match table.
ExcRecordset = SELECT * FROM tblRouteStepErrorException WHERE ...
If ExcRecordset.BOF And ExcRecordset.EOF Then 'No match in exception table
    MatchRecordset = SELECT * FROM tblRouteStepErrorMatch WHERE ... 'go get from match table
    get result from MatchRecordset
Else
    get result from ExcRecordset
End If
Now your exceptions are a lot easier to maintain because there are far fewer of them and the match table becomes the fallback for when a special case isn't found.
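As a rough illustration of the two-table idea, here is a minimal sketch in Python with an in-memory SQLite database. The table and column names follow the ones above and the sample rows come from the Select Case snippet in the question; the function name route_step_report and the single LEFT JOIN query (instead of two separate lookups) are my own assumptions, not something the original VBA app does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblRouteStepErrorMatch (
    routeStepOfError TEXT PRIMARY KEY,
    routeStepReport  TEXT NOT NULL
);
CREATE TABLE tblRouteStepErrorException (
    routeStepOfError TEXT NOT NULL,
    productId        INTEGER NOT NULL,
    componentOfProd  TEXT NOT NULL,
    causeOfError     TEXT NOT NULL,
    routeStepReport  TEXT NOT NULL,
    PRIMARY KEY (routeStepOfError, productId, componentOfProd, causeOfError)
);
-- Default mapping: one row per route step.
INSERT INTO tblRouteStepErrorMatch VALUES ('EOL', 'end of line');
-- Only the special cases are listed here.
INSERT INTO tblRouteStepErrorException VALUES
    ('EOL', 1111, 'be1', 'A1', 'final optical test'),
    ('EOL', 1112, 'be1', 'A1', 'final optical test'),
    ('EOL', 1113, 'be1', 'A1', 'final optical test');
""")


def route_step_report(route_step, product_id, component, cause):
    """Return the exception value if one exists, else the default mapping."""
    row = conn.execute(
        """
        SELECT COALESCE(e.routeStepReport, m.routeStepReport)
        FROM tblRouteStepErrorMatch AS m
        LEFT JOIN tblRouteStepErrorException AS e
               ON e.routeStepOfError = m.routeStepOfError
              AND e.productId = ?
              AND e.componentOfProd = ?
              AND e.causeOfError = ?
        WHERE m.routeStepOfError = ?
        """,
        (product_id, component, cause, route_step),
    ).fetchone()
    # None means the route step is not known at all.
    return row[0] if row else None


print(route_step_report("EOL", 1111, "be1", "A1"))
print(route_step_report("EOL", 1111, "other", "A1"))
```

With this layout the exception table only needs a row for each combination that deviates from the default, so it does not have to cover every product, component and cause.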

Related

Select Distinct for First Part of column values, but last part (ticket number) needs to be wildcard

So I have a database of emails sent and received by our ticket system, Cherwell, version 9.3.2. It uses Microsoft SQL Server as a backend; we're on version 2012. I'm interested in cleaning up old or irrelevant emails: for instance emails 3+ years old, notices sent to technicians saying they have a new task, or notices we send out that have no real value being retained as full emails in the database, as Cherwell also creates rows of plaintext for most of these emails. The table related to mail, TrebuchetMail, is this size: 193,883.156 MB.
I'm wondering if it would improve overall performance to reduce this table, as nearly every type of record in Cherwell would access this table. Granted it would only be those rows relevant to the specific record.
Okay so my question: Subject is a column that stores the subject of the email. I have a few types of Subjects identified for removal, one example is this:
--165765
select count(*)
FROM [cherwell].[dbo].[TrebuchetMail]
where subject like 'You have an unacknowledged Task%';
After the "You have an unacknowledged Task" part of the subject comes a number, the individual Task object's ID. So a select distinct treats all 165765 rows as distinct, because they are. Can you use a wildcard with select distinct to group together subjects that are similar but not exactly the same? Is there another function I could use instead of distinct? I realize the rows actually are distinct, but surely this problem has come up before. What I'm after is a "select distinct Subject" query that would group together the rows where Subject is like 'You have an unacknowledged Task%' and those where Subject is like 'Ticket #%Created'. Would I always need some criteria? If so, maybe this is pointless, because I'd have to look at the full results to come up with the criteria for the select distinct query anyway.
My goal is to identify different Subjects that could be targeted for archival/removal.
I found a 2013 thread that was a similar question, but it had to do with dates. The asker wanted to group together rows from a log that grouped together the days, disregarding the time aspect of the log. I didn't quite understand how I could translate that to work for my situation. I'd be very grateful for an explanation if that would work for me.
I know this might not be the answer you are looking for, since it is low-tech and not based on a formula. But since it is most likely a one-time action, why not export the database as a table and sort it by the subject field? All the irrelevant records would be grouped together and could easily be deleted.
After this action simply re-import the table to the database. Of course this only works nicely on a flat database, not on a highly linked up one.
At the same time you would have a backup in case something goes wrong.
What you may want to do (bearing in mind I'm not a SQL expert) is create a new expression that stands in as a new column in your query, as a truncated version of the subject.
Something like,
SELECT RecID, REPLACE(REPLACE(REPLACE(Subject, '1', ''), '2', ''), '3', '') AS CustomColumn
FROM TrebuchetMail
and so on, until you strip out the digits 0-9 anywhere they appear in the subject line.
You can then potentially go distinct based on this, I believe.
I'm sure there's a more elegant way of doing this with a regular expression as well; I'm just too much of a novice for it.
Not sure how it works out in practice.
Note: in VB/C# this would be chained as Subject.Replace(...), but in SQL it is a function, Replace(expression, 'text to be replaced', 'text to replace with'), so the calls are nested.
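To see how the digit-stripping idea behaves, here is a small sketch using Python and an in-memory SQLite table. The table and column names are from the question, but the sample subjects and ticket numbers are made up, and SQLite only stands in for SQL Server here. The nested REPLACE expression is generated in a loop so all ten digits are covered, and the result is grouped and counted rather than run through DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TrebuchetMail (RecID INTEGER PRIMARY KEY, Subject TEXT)")
conn.executemany(
    "INSERT INTO TrebuchetMail (Subject) VALUES (?)",
    [
        ("You have an unacknowledged Task 10412",),  # made-up sample rows
        ("You have an unacknowledged Task 10977",),
        ("Ticket #5531 Created",),
        ("Ticket #5532 Created",),
        ("Ticket #5533 Created",),
        ("Quarterly report",),
    ],
)

# Build REPLACE(REPLACE(... Subject ..., '0', ''), ..., '9', '') for digits 0-9.
expr = "Subject"
for digit in "0123456789":
    expr = f"REPLACE({expr}, '{digit}', '')"

# Group on the stripped subject and count how many rows share each pattern.
rows = conn.execute(
    f"SELECT {expr} AS Pattern, COUNT(*) AS Cnt "
    f"FROM TrebuchetMail GROUP BY {expr} "
    "ORDER BY Cnt DESC, Pattern"
).fetchall()

for pattern, count in rows:
    print(repr(pattern), count)
```

Sorting by the count puts the most common subject patterns first, which is probably what you want when hunting for candidates to archive. Subjects that legitimately differ only by a number (a year, say) would be merged too, so the list still needs a look by eye.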

Explanation of particular sql injection

Browsing through the more dubious parts of the web, I happened to come across this particular SQL injection:
http://server/path/page.php?id=1+union+select+0,1,concat_ws(user(),0x3a,database(),0x3a,version()),3,4,5,6--
My knowledge of SQL, which I thought was half decent, seems very limited as I read this.
Since I develop extensively for the web, I was curious to see what this code actually does and more importantly how it works.
It replaces an improperly written (non-parameterized) query like this:
$sql = '
SELECT *
FROM products
WHERE id = ' . $_GET['id'];
with this query:
SELECT *
FROM products
WHERE id = 1
UNION
select 0,1,concat_ws(user(),0x3A,database(),0x3A,version()),3,4,5,6
which gives you the database name, the version and the username of the connection.
The injection result relies on some assumptions about the underlying query syntax.
What is being assumed here is that there is a query somewhere in the code which will take the "id" parameter and substitute it directly into the query, without bothering to sanitize it.
It's assuming a naive query syntax of something like:
select * from records where id = {id param}
What this does is result in a substituted query (in your above example) of:
select * from records where id = 1 union select 0, 1 , concat_ws(user(),0x3a,database(),0x3a,version()), 3, 4, 5, 6 --
Now, what this does that is useful is that it manages to grab not only the record that the program was interested in, but also it UNIONs it with a bogus dataset that tells the attacker (these values appear separated by colons in the third column):
the username with which we are connected to the database
the name of the database
the version of the db software
You could get the same information by simply running:
select concat_ws(user(),0x3a,database(),0x3a,version())
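directly at a SQL prompt. Strictly speaking, CONCAT_WS takes the separator as its first argument, so the call as written uses user() as the separator; with 0x3a moved to the front, as in concat_ws(0x3a, user(), database(), version()), you'll get something like: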
joe:production_db:mysql v. whatever
Additionally, since UNION (without ALL) removes duplicates, which often sorts the rows as a side effect (the order is not guaranteed), and the first column in the bogus data set is 0, chances are pretty good that your bogus result will be at the top of the list. This is important because the program is probably only using the first result, or there is an additional little bit of SQL in the basic expression I gave you above that limits the result set to one record.
The reason that there is the above noise (e.g. the select 0,1,...etc) is that in order for this to work, the statement you are calling the UNION with must have the same number of columns as the first result set. As a consequence, the above injection attack only works if the corresponding record table has 7 columns. Otherwise you'll get an error about mismatched column counts and this attack won't really give you what you want. The double dashes (--) are just to make sure anything that might come afterwards in the substitution is ignored, and I get the results I want. The 0x3a garbage is just saying "separate my values by colons" (0x3a is the hex code for ':').
Now, what makes this query useful as an attack vector is that it is easily re-written by hand if the table has more or less than 7 columns.
For example if the above query didn't work, and the table in question has 5 columns, after some experimentation I would hit upon the following query url to use as an injection vector:
http://server/path/page.php?id=1+union+select+0,1,concat_ws(user(),0x3a,database(),0x3a,version()),3,4--
The number of columns the attacker is guessing is probably based on an educated look at the page. For example if you're looking at a page listing all the Doodads in a store, and it looks like:
Name | Type | Manufacturer
Doodad Foo Shiny Shiny Co.
Doodad Bar Flat Simple Doodads, Inc.
It's a pretty good guess that the table you're looking at has 4 columns (remember there's most likely a primary key hiding somewhere if we're searching by an 'id' parameter).
Sorry for the wall of text, but hopefully that answers your question.
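If you want to watch the mechanics without touching a real server, here is a minimal sketch in Python with an in-memory SQLite database. Everything in it is an assumption for illustration: the 7-column products table, the column names c1..c6, and sqlite_version() standing in for MySQL's user(), database() and version(). It shows the concatenated query returning the extra row, the error you get when the column count is wrong, and a bound parameter treating the whole payload as one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 7-column table, matching the 7 values in the injected SELECT.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, c1, c2, c3, c4, c5, c6)")
conn.execute("INSERT INTO products VALUES (1, 'a', 'b', 'c', 'd', 'e', 'f')")

# The decoded "id" parameter; sqlite_version() replaces the MySQL functions.
payload = "1 union select 0,1,sqlite_version(),3,4,5,6 --"

# Vulnerable: the parameter is pasted straight into the SQL text.
vulnerable_rows = conn.execute(
    "SELECT * FROM products WHERE id = " + payload
).fetchall()

# Wrong column guess (5 instead of 7): the database rejects the UNION.
try:
    conn.execute("SELECT * FROM products WHERE id = 1 union select 0,1,2,3,4 --")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# Safe: the payload is bound as a single value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT * FROM products WHERE id = ?", (payload,)
).fetchall()

print(vulnerable_rows)
print(mismatch_error)
print(safe_rows)
```

The last query is the actual fix: once the id is a bound parameter, the union text is just a string that matches no row.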
This code adds an additional union query to the select statement that is being executed on page.php. The injector has determined that the original query has 7 fields, hence the numeric filler values (column counts must match with a union). The concat_ws just makes one field with the values for the database user, the database, and the version, separated by colons.
It seems to retrieve the user used to connect to the database, the database name, and its version. The result then shows up in the page output or in an error message.

MySQL: select the closest match?

I want to show the closest related item for a product. So say I am showing a product and the style number is SG-sfs35s. Is there a way to select whatever product's style number is closest to that?
Thanks.
EDIT: To answer your questions: I definitely want to keep the first 2 letters, as that is the manufacturer code, but for the part after the first dash, just whatever matches closest. So for example SG-sfs35s would match SG-shs35s much more than SG-sht64s. I hope this makes sense. Whenever I do LIKE product_style_number it only pulls the exact match.
There normally isn't a simple way to match product codes that are roughly similar.
A more SQL friendly solution is to create a new table that maps each product to all the products it is similar to.
This table would either need to be maintained manually, or a more sophisticated script can be executed periodically to update it.
If your product codes follow a consistent pattern (all the letters are the same for similar products, with only the numbers changing), then you should be able to use a regular expression to match the similar items. There are docs on this here...
It sounds like what you want is Levenshtein distance.
Unfortunately, there isn't a built-in Levenshtein function in MySQL, but some folks have come up with a user-defined function that does it (the original link is dead).
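Since the linked function is gone, here is a minimal sketch of the idea in Python, computed in application code rather than inside MySQL. The function names and the candidate list are made up for illustration; the rule of only comparing codes with the same manufacturer prefix comes from the question's edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def closest_styles(style, candidates):
    """Rank other codes from the same manufacturer by edit distance."""
    prefix = style.split("-", 1)[0]
    same_maker = [
        c for c in candidates
        if c != style and c.split("-", 1)[0] == prefix
    ]
    return sorted(same_maker, key=lambda c: levenshtein(style, c))


codes = ["SG-sht64s", "SG-shs35s", "AB-sfs35s", "SG-sfs35s"]
print(closest_styles("SG-sfs35s", codes))
```

For the example in the question, SG-shs35s is one edit away from SG-sfs35s and SG-sht64s is four edits away, so the ranking matches what the asker expects. For a large catalog you would probably precompute the nearest items rather than scan every code per page view.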
You will probably want to do it as a stored procedure, as I expect that the algorithm may not be trivial.
For example, you may split the term at the -, so you have two parts. You do a LIKE query on each part and use that to make a decision.
You could just loop, though, replacing the last character with "%" until you get at least one result, in your stored procedure.
Sounds like you need something like Lucene, though I'm not sure if that would be overkill for your situation. But it certainly would be able to do text searches and return the most similar results first.
If you need something simpler, I would start by searching with the full product code, and if that doesn't work, use wildcards or remove some characters until a result comes back.
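The "shorten it until something matches" approach could be sketched like this, here in Python against an in-memory SQLite table instead of a MySQL stored procedure. The products table layout, the function name related_by_prefix and the min_prefix cutoff (3 characters, so the manufacturer code and dash are always kept) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_style_number TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("SG-sfs35s",), ("SG-shs35s",), ("SG-sht64s",), ("AB-sfs35s",)],
)


def related_by_prefix(style, min_prefix=3):
    """Drop trailing characters one at a time until LIKE finds other products."""
    for length in range(len(style) - 1, min_prefix - 1, -1):
        pattern = style[:length] + "%"
        rows = conn.execute(
            "SELECT product_style_number FROM products "
            "WHERE product_style_number LIKE ? "
            "AND product_style_number <> ? "
            "ORDER BY product_style_number",
            (pattern, style),
        ).fetchall()
        if rows:
            return [r[0] for r in rows]
    return []  # nothing else shares even the minimum prefix


print(related_by_prefix("SG-sfs35s"))
```

Note the limitation: this only compares prefixes, so it returns both SG-shs35s and SG-sht64s for SG-sfs35s without ranking the first one higher. If the ranking matters, an edit-distance measure is the better fit.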
JD Isaacks.
This situation of yours is very simple to solve.
It's not like you need to use artificial intelligence like Google.
http://www.w3schools.com/sql/sql_wildcards.asp
Take a look at this manual at w3schools about wildcards to use with your SELECT code.
But also you will need to create a new table with 3 columns: LeftCode, RightCode and WildCard.
Example:
Rows on Table:
LeftCode = SG | RightCode = 35s | WildCard = SG-s_s35s
LeftCode = SG | RightCode = 64s | WildCard = SG-s_t64s
SQL Code
If the user typed the code that matches the row1 of the table:
SELECT * FROM PRODUCTS WHERE CODE LIKE "$WildCard";
Where $WildCard is the PHP variable containing the column 3 of the new table.
I hope I helped, even 4 years late...

Select from different tables via SQL depending on flag

I have a script to extract certain data from a much bigger table, with one field in particular changing regularly, e.g.
SELECT CASE @Flag
           WHEN 1 THEN t.field1
           WHEN 2 THEN t.field2
           WHEN 3 THEN t.field3
       END AS field,
       ...[A bunch of other fields]
FROM table t
However, the issue is now I want to do other processing on the data. I'm trying to figure out the most effective method. I need to have some way of getting the flag through, so I know I'm talking about data sliced by the right field.
One possible solution I was playing around with a bit (mostly to see what would happen) is to dump the contents of the script into a table function which has the flag passed to it, and then use a SELECT query on the results of the function. I've managed to get it to work, but it's significantly slower than...
The obvious solution, and probably the most efficient use of processor cycles: to create a series of cache tables, one for each of the three flag values. However, the problem then is to find some way of extracting the data from the right cache table to perform the calculation. The obvious, though incorrect, response would be something like
SELECT CASE @Flag
           WHEN 1 THEN table1.field
           WHEN 2 THEN table2.field
           WHEN 3 THEN table3.field
       END AS field,
       ...[The various calculated fields]
FROM table1, table2, table3
Unfortunately, as is obvious, this creates a massive cross join - which is not my intended result at all.
Does anyone know how to turn that cross join into an "Only look at x table"? (Without use of Dynamic SQL, which makes things hard to deal with?) Or an alternative solution, that's still reasonably speedy?
EDIT: Whether it's a good reason or not, the idea I was trying to implement was to not have three largely identical queries, that differ only by table - which would then have to be edited identically whenever a change is made to the logic. Which is why I've avoided the "Have the flag entirely separate" thing thus far...
I think you need to pull @Flag out of the query altogether, and use it to decide which of three separate SELECT statements to run.
How about a UNION ALL, with one branch for each value of the flag?
In the WHERE clause of the first branch include:
AND @flag = 1
Although the comment about running different select statements for different flag values also makes sense to me.
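The UNION ALL suggestion can be tried out quickly; below is a minimal sketch in Python with in-memory SQLite. The tables table1..table3 with a single field column mirror the question's cache tables, the id column and sample rows are invented, and the named parameter :flag stands in for the flag variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, field TEXT);
CREATE TABLE table2 (id INTEGER, field TEXT);
CREATE TABLE table3 (id INTEGER, field TEXT);
INSERT INTO table1 VALUES (1, 'from table1');
INSERT INTO table2 VALUES (1, 'from table2');
INSERT INTO table3 VALUES (1, 'from table3');
""")

# Each branch reads one cache table; the flag test switches off the other two,
# so there is no cross join and the query text exists only once.
QUERY = """
SELECT id, field, :flag AS flag FROM table1 WHERE :flag = 1
UNION ALL
SELECT id, field, :flag AS flag FROM table2 WHERE :flag = 2
UNION ALL
SELECT id, field, :flag AS flag FROM table3 WHERE :flag = 3
"""


def rows_for_flag(flag):
    return conn.execute(QUERY, {"flag": flag}).fetchall()


print(rows_for_flag(2))
```

Wrapping the UNION ALL in a view or derived table lets the calculated fields be written once on top of it. Whether the optimizer actually skips the branches whose flag test is false depends on the database, so it is worth checking the execution plan.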
You seem to be focusing your attention on the technology rather than the problem to be solved. Think about one select from the main table for each case - which is how you describe it here, isn't it?
A simpler solution, and one suggested by a workmate:
SELECT CASE @Flag
           WHEN 1 THEN t.field1
           WHEN 2 THEN t.field2
           WHEN 3 THEN t.field3
       END AS field,
       [A bunch of other fields],
       @Flag AS flag
FROM table t
Then base the decision making on the last field. A lot simpler, and probably should have occurred to me in the first place.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. It makes testing convenient. Note: when you create your SQL queries in code, don't concatenate in user input. Use parameters (your @variables)!
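Generating the search in code does not have to be a wall of if statements. Here is a minimal sketch in Python with SQLite, assuming a hypothetical Users table with Name and Nickname columns; the SEARCHABLE whitelist and the build_search helper are my own names. Column names come only from the whitelist, and values only ever travel as bound parameters.

```python
import sqlite3

# Whitelist: search key -> real column name. Nothing else reaches the SQL text.
SEARCHABLE = {"name": "Name", "nickname": "Nickname"}


def build_search(filters):
    """Build (sql, params) from a dict of optional filters; None means 'ignore'."""
    clauses, params = [], []
    for key, value in filters.items():
        if value is None:
            continue
        column = SEARCHABLE[key]  # raises KeyError for unknown fields
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT Name, Nickname FROM Users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Nickname TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("brian", "bri"), ("alice", "al")],
)

sql, params = build_search({"name": "brian", "nickname": None})
found = conn.execute(sql, params).fetchall()
print(sql, params, found)
```

Adding a searchable field is then one new entry in the whitelist rather than another if block, and the builder can be unit tested without a database.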
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
    Name = COALESCE(@name, Name) AND
    Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
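COALESCE turns the null into an identity comparison, which is true for every row where the column is not NULL and so doesn't affect the WHERE clause for those rows. Rows where the column itself is NULL are filtered out, because NULL = NULL does not evaluate to true.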
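Here is a small sketch of how the COALESCE pattern behaves when the data contains NULLs, in Python with in-memory SQLite. The Users table is the one above; the sample rows and the second query, which uses "(parameter IS NULL OR column = parameter)" instead, are my additions for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Nickname TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("brian", "bri"), ("brian", None), ("alice", "al")],
)

COALESCE_QUERY = """
SELECT Name, Nickname FROM Users
WHERE Name = COALESCE(:name, Name)
  AND Nickname = COALESCE(:nick, Nickname)
"""

# Variant that keeps rows whose column is NULL when the filter is not used.
NULL_SAFE_QUERY = """
SELECT Name, Nickname FROM Users
WHERE (:name IS NULL OR Name = :name)
  AND (:nick IS NULL OR Nickname = :nick)
"""

args = {"name": "brian", "nick": None}
coalesce_rows = conn.execute(COALESCE_QUERY, args).fetchall()
null_safe_rows = conn.execute(NULL_SAFE_QUERY, args).fetchall()

print(coalesce_rows)   # the brian row with a NULL nickname is missing
print(null_safe_rows)  # both brian rows are returned
```

So with NULLs all over the place, as in the question, the COALESCE form silently drops rows, while the IS NULL OR form does not.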
Search and normalization can be at odds with each other. So probably the first thing would be to build some kind of "view" that shows all the searchable fields as a single row, with a single key that gets you the resume. Then you can put something like Lucene in front of that to give you a full-text index of those rows. The way that works is, you ask it for "x" in this view and it returns the key. It's a great solution, and it came recommended by Joel himself on the podcast within the first 2 months, IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume that is composed of several fields:
Name,
Address,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead you could change your frame of reference and think of the whole resume as what it is, a single document, and you just want to search said document.
This is what tools like Sphinx Search do. They create a full-text index of your 'document'; you then query Sphinx and it tells you which record in the database matched.
Really good search results.
Don't worry about these tools not being part of your RDBMS. It will save you a lot of headaches to use the appropriate model ("documents") rather than the incorrect one ("tables") for this application.
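To make the "document" idea concrete, here is a toy sketch in Python. It is not how Sphinx or Lucene work internally, just a tiny inverted index over made-up resumes to show the shape of the approach: flatten each resume into one text, index its words, and get keys back from a query.

```python
import re
from collections import defaultdict

# Made-up resumes keyed by their database id; the lists stand in for child tables.
resumes = {
    1: {
        "name": "Ann Lee",
        "address": "12 Oak Street",
        "education": ["BSc Physics"],
        "work": ["Database developer at Acme"],
    },
    2: {
        "name": "Bob Ray",
        "address": "9 Elm Road",
        "education": ["MSc Chemistry"],
        "work": ["Network engineer at Initech", "Support engineer at Acme"],
    },
}


def to_document(resume):
    """Flatten all searchable fields of one resume into a single text."""
    parts = [resume["name"], resume["address"], *resume["education"], *resume["work"]]
    return " ".join(parts)


def build_index(all_resumes):
    """Map each lowercased word to the set of resume keys containing it."""
    index = defaultdict(set)
    for key, resume in all_resumes.items():
        for word in re.findall(r"\w+", to_document(resume).lower()):
            index[word].add(key)
    return index


def search(index, query):
    """Return the keys of resumes that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(resumes)
print(search(index, "acme engineer"))
```

The search returns keys, which you then use to load the full resume from the legacy tables. A real engine adds ranking, stemming and incremental updates on top of this.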