I am writing a query that examines an ID field and derives an ID number from that column based on several criteria. Now that I have its logic written, I want to run the query against each criterion to see if the logic is working. So, the last part of my query for doing so is as follows:
FROM TABLE1
WHERE SOURCE_SYSTEM_NM = 'XYZ' AND ((STRLEFT(SOURCE_ARRANGEMENT_ID,4)) NOT IN ('23CC','21CC'))
LIMIT 10000
Essentially, what I am trying to do here is return only items with SOURCE_SYSTEM_NM equal to 'XYZ', while eliminating any whose SOURCE_ARRANGEMENT_ID has '21CC' or '23CC' as its first 4 characters. I have a third criterion I want to filter on as well, which looks at whether the first three characters are '0CC'.
My problem when I run this is I get back an "Invalid Position" error. I removed one of the strings from the criteria, and it works. So, I decided to add the second in its own 'NOT IN...' clause with an AND between them, but that resulted in the same error.
If I had to guess, the NOT IN ('21CC','23CC') puts an AND between them and I think that must be the root of my issue. The criteria in my CASE statement derives the ID number with the following:
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 4) IN ('23CC','21CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 3) IN ('0CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-3)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND (STRLEFT(SOURCE_ARRANGEMENT_ID, 4) NOT IN ('23CC','21CC') OR STRLEFT(SOURCE_ARRANGEMENT_ID, 3) NOT IN ('0CC')) THEN (SOURCE_ARRANGEMENT_ID)
So with that, I am just trying to check each criteria to make sure the ID derived/created is correct. I need to filter down to get results for that last WHEN statement above, but I keep getting that "Invalid Position" in my WHERE statement at the end. I am using Aginity to run this query and it's running against an IBM Netezza database. Thanks in advance!
I figured out what the issue was on this - when performing
STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
There are some of those Arrangement IDs that do not have 4 characters, thus I was getting an "Invalid Position". I fixed this by updating this query to use substring() instead:
SUBSTRING(SOURCE_ARRANGEMENT_ID,5,LENGTH(SOURCE_ARRANGEMENT_ID))
This fixed my issue. I just wanted to post an answer in case others run into it. It's not Netezza-specific; other SQL variants can react the same way when a string function is handed a negative length.
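For anyone who wants to test the derivation logic in isolation, here is a minimal sketch of the same CASE expression. It is not the original Netezza query: it uses Python's built-in sqlite3 module as a stand-in engine, SUBSTR in place of STRLEFT/STRRIGHT, and invented sample rows. SQLite does not raise "Invalid Position" for short strings, so this only shows the shape of the logic, including an ID ("AB") that is shorter than the prefix being tested.

```python
import sqlite3

# In-memory stand-in for TABLE1; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (SOURCE_SYSTEM_NM TEXT, SOURCE_ARRANGEMENT_ID TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [("XYZ", "23CC1234"), ("XYZ", "0CC987"), ("XYZ", "AB"), ("XYZ", "555000")],
)

# SUBSTR(col, 5) means "everything from the 5th character on", so no
# length arithmetic is needed and short IDs cannot produce a negative length.
query = """
SELECT SOURCE_ARRANGEMENT_ID,
       CASE
           WHEN SUBSTR(SOURCE_ARRANGEMENT_ID, 1, 4) IN ('23CC', '21CC')
               THEN SUBSTR(SOURCE_ARRANGEMENT_ID, 5)
           WHEN SUBSTR(SOURCE_ARRANGEMENT_ID, 1, 3) = '0CC'
               THEN SUBSTR(SOURCE_ARRANGEMENT_ID, 4)
           ELSE SOURCE_ARRANGEMENT_ID
       END AS DERIVED_ID
FROM TABLE1
WHERE SOURCE_SYSTEM_NM = 'XYZ'
"""
derived = dict(conn.execute(query).fetchall())
```

Each original ID maps to its derived ID in the derived dictionary, which makes it easy to spot-check one criterion at a time.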
I have two EXCEL datasources. 175,000 rows. I'm trying to set up a join (Add New Join Clause) using the INNER option between the two datasources. The left datasource includes certain member id #s. Unfortunately, the right datasource's member id #s are within a large field called member Desc. Something like below,
Datasource Left
Member ID #
ALL89098
Datasource Right
Member Desc
YTRNNN TO=ALL89098_KIA TO BE OR NOT OR
POALL89098 JOE
So, as you can see above, I need to deal with two scenarios. The member ID may appear in Member Desc right after a TO= (scenario 1), or it may appear anywhere in the field, as in scenario 2 (POALL89098).
If I can't get this done in Tableau to establish the Join between these two columns from different datasources, since I have both of these datasources loaded into SQL Server DB, I can run SQL statements in SQL since they are in two different Tables within SQL Server DB as well.
I'm trying to use the CONTAINS function in Tableau as shown below, but it runs very slowly. This is only Tableau Desktop with 16 GB of RAM.
IF CONTAINS([Member Desc], [Member id #]) THEN
[Member id #]
ELSE
"NOT FOUND"
END
Thanks so much for your time.
So, is there a way to use REGEXP within IF/ELSE or CASE statements?
You can create a join calculation. The highlighted dropdown shows where this can be found:
As long as the format of the Member ID in [Member Desc] has some pattern, it can be extracted with Regex. As you mention in your question, one way the ID may present itself is after a "TO=" and it looks like it ends before a "_". The following regex calculated field will pull the string between the two:
REGEXP_EXTRACT([Member Desc],"TO=([^_]*)_")
The result should properly join the two datasources:
The above is an outline which I hope sets you on the right path. I realize that there may be a few different ways in which the [Member ID] presents itself, so I won't be able to nail down the exact regex, but if there is any pattern at all then the format above should work (i.e., even if the only pattern is that [Member ID] is three letters followed by four numbers, or it always starts with an A and ends with something else, etc.).
Regex should also perform better than a contains() function, but do be aware that the function does need to search through every string in every row to make the join.
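One thing to watch when adapting a pattern like this: square brackets denote a character class, so [^TO=] means "any single character other than T, O, or =", not "anything that is not the text TO=". The sketch below uses Python's re module (not Tableau) to show the difference on the sample strings from the question. The "three capital letters followed by five digits" shape and the TOM12345 example are assumptions made for illustration only.

```python
import re

def extract_after_to(desc):
    """Return the text between the literal 'TO=' and the next '_', or None."""
    match = re.search(r"TO=([^_]*)_", desc)
    return match.group(1) if match else None

def extract_by_shape(desc):
    """Return the first run of 3 capital letters + 5 digits (assumed ID shape), or None."""
    match = re.search(r"[A-Z]{3}\d{5}", desc)
    return match.group(0) if match else None

# Character-class pitfall: an ID that itself contains T or O gets truncated,
# because [^TO=]* cannot match across those letters.
pitfall = re.search(r"([^TO=]*)(?=_)", "X TO=TOM12345_Y").group(1)
literal = extract_after_to("X TO=TOM12345_Y")

scenario_1 = extract_after_to("YTRNNN TO=ALL89098_KIA TO BE OR NOT OR")
scenario_2 = extract_by_shape("POALL89098 JOE")
```

The shape-based pattern handles the second scenario, where the ID is embedded without a TO= marker, but it only works if the IDs really do follow one fixed format.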
Edit in response to comment:
To add multiple conditions, try the following method:
IF LEN(REGEXP_EXTRACT([Member Desc],"FROM=([^,]*),")) > 0
THEN REGEXP_EXTRACT([Member Desc],"FROM=([^,]*),")
ELSEIF LEN(REGEXP_EXTRACT([Member Desc],"TO=([^,]*),")) > 0
THEN REGEXP_EXTRACT([Member Desc],"TO=([^,]*),")
ELSEIF [...Put as many of these as might match your pattern]
THEN [...Put as many of these as might match your pattern]
END
Essentially the calculation goes down the list and tries each possibility. I changed yours a little to look at the length (LEN()) of the returned value, which should compare fairly quickly, as it is an integer. As soon as the calculation finds a match in one of the ELSEIF branches, it stops iterating through the list, so it's important to put the most likely match at the top. The result of the calculated field should be a member ID. If there is no match, there really isn't a need for an ELSE statement because the Inner Join will exclude it automatically.
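The same "try each pattern in order and stop at the first hit" idea can be sketched outside Tableau. The following uses Python's re module; the two patterns, with literal FROM= and TO= prefixes and a trailing comma, are assumptions about the data rather than anything confirmed by the question.

```python
import re

# Patterns are tried top to bottom, so the most likely one goes first.
PATTERNS = [
    re.compile(r"FROM=([^,]*),"),
    re.compile(r"TO=([^,]*),"),
]

def first_match(desc):
    """Return the first non-empty capture from PATTERNS, or None if nothing matches."""
    for pattern in PATTERNS:
        match = pattern.search(desc)
        if match and match.group(1):
            return match.group(1)
    return None
```

For example, first_match("A FROM=ALL89098,B") returns "ALL89098", and a description with no marker returns None, which plays the role of the row that the inner join would drop.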
Edit in response to comment:
Thank you. I see your recommendations.
I think you are going to have to find a way to strip out the member ID from the member desc in SQL. There should be some pattern to Member ID. For instance, is it always 3 letters followed by 5 numbers, or something similar?
If you can come up with a pattern, then you can use SQL and some combination of Substring, Charindex, and/or Like %Text% or a regex pattern to strip out the actual member ID in the SQL Server table as its own field before bringing it into Tableau.
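As a rough sketch of that idea, the query below pulls the text between TO= and the next underscore into its own column. It runs on SQLite through Python's sqlite3 module purely so it can be tried anywhere; on SQL Server the corresponding functions would be CHARINDEX and SUBSTRING (note that CHARINDEX takes the search text first, while INSTR takes it second). The table name RightSource and column name MemberDesc are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the right-hand datasource.
conn.execute("CREATE TABLE RightSource (MemberDesc TEXT)")
conn.executemany(
    "INSERT INTO RightSource VALUES (?)",
    [("YTRNNN TO=ALL89098_KIA TO BE OR NOT OR",), ("NO MARKER HERE",)],
)

# Only attempt the extraction when both 'TO=' and a following '_' exist;
# otherwise the CASE falls through and yields NULL.
query = """
SELECT MemberDesc,
       CASE
           WHEN INSTR(MemberDesc, 'TO=') > 0
                AND INSTR(SUBSTR(MemberDesc, INSTR(MemberDesc, 'TO=') + 3), '_') > 0
           THEN SUBSTR(
                    MemberDesc,
                    INSTR(MemberDesc, 'TO=') + 3,
                    INSTR(SUBSTR(MemberDesc, INSTR(MemberDesc, 'TO=') + 3), '_') - 1
                )
       END AS MemberId
FROM RightSource
"""
extracted = dict(conn.execute(query).fetchall())
```

Once the ID lives in its own column, the join in Tableau becomes a plain equality join instead of a per-row string search.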
Alright so I understand the point of the HAVING clause. I am having an issue and I am wondering if I can solve this the way I want to.
I want to execute one query using ADODB.Recordset and then use the Filter function to sift through the data set.
The problem is the query at the moment which looks like this:
SELECT tblMT.Folder, tblMT.MTDATE, tblMT.Cust, Sum(tblMT.Hours)
FROM tblMT
GROUP BY tblMT.Folder, tblMT.MTDATE, tblMT.Cust
HAVING tblMT.Cust LIKE "TEST*" AND Min(tblMT.MTDATE)>=Date()-30 AND MAX(tblMT.MTDATE)<=Date()
ORDER BY tblMT.TheDATE DESC;
So the above works as expected. However, I want to be able to use tblMT.Cust as the filter without having to keep re-querying the database. If I remove it I get:
Data type mismatch in criteria expression.
Is what I am trying to do possible? If someone can point me in the right direction here would be great.
OK, the type mismatch occurs because either tblMT.MTDATE isn't a date field or tblMT.Hours isn't a number field, and you have data that isn't a valid date or number in rows where the customer isn't like 'TEST*'. Alternatively, for some customers you have a NULL in tblMT.MTDATE, and NULL can't be compared with >=. You would still get the error if you filtered on tblMT.Cust NOT LIKE "TEST*" instead.
The problem is likely with the data or with your expectations, and you need to handle it.
What data types are tblMT.hours and tblMt.MtDate?
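To illustrate the "query once, filter locally" goal under those caveats, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for Access and ADODB, with invented rows, and it deliberately excludes rows whose MTDATE is NULL before aggregating. The date-range conditions from the original HAVING clause are left out to keep the example short, and filter_by_customer is a hypothetical helper playing the role of Recordset.Filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMT (Folder TEXT, MTDATE TEXT, Cust TEXT, Hours REAL)")
conn.executemany(
    "INSERT INTO tblMT VALUES (?, ?, ?, ?)",
    [
        ("F1", "2024-01-10", "TEST A", 2.0),
        ("F1", "2024-01-10", "TEST A", 3.0),
        ("F2", "2024-01-11", "OTHER", 4.0),
        ("F3", None, "TEST B", 1.0),  # bad row: no date
    ],
)

# Run the aggregate once for all customers, skipping rows without a date.
rows = conn.execute(
    """
    SELECT Folder, MTDATE, Cust, SUM(Hours) AS TotalHours
    FROM tblMT
    WHERE MTDATE IS NOT NULL
    GROUP BY Folder, MTDATE, Cust
    ORDER BY MTDATE DESC
    """
).fetchall()

def filter_by_customer(all_rows, prefix):
    """Filter the already-fetched rows by customer prefix, without re-querying."""
    return [r for r in all_rows if r[2] is not None and r[2].startswith(prefix)]

test_rows = filter_by_customer(rows, "TEST")
```

The point of the sketch is only that the bad rows have to be dealt with in the query itself once the customer filter no longer hides them.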
Background
I am creating a database to track lab samples. I wish to put a restriction in place that prevents a technician from reporting multiple results for the same sample.
Current strategy
My query called qselReport lists all samples being reported.
SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE (((tblResult.ysnReport)=True));
When a technician wishes to report a result for a given sample, I use a Before Change Event to check for that sample in qselReport (the code block below is my event macro N.B. it is not VBA).
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In qselReport
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
That all works fine and dandy. The error message pops up if a second result is selected to report for a sample.
The problem
I like to keep things as sleek as possible, so I don't want qselReport unless it's absolutely necessary. So I made a slight adjustment to the LookupRecord block so that it operates on a SQL statement rather than on the query. Here's what that looks like (again N.B. not VBA, just a macro):
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE [tblResult].[ysnReport]=True;
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
Now I get the error message every time that a result is reported, even if it's the first instance for that sample. From what I can tell, the issue is that the SQL statement WHERE clause does not filter the records to only those where ysnReport=True.
Can anyone explain why I can do LookupRecord in a query but not LookupRecord in an identical SQL statement? Thanks for the input.
If you want things as sleek as possible, at least performance-wise, a stored query should, in principle, outperform dynamic SQL.
Syntax-wise, I'm not familiar with the macro constructs, but I'd consider enclosing the select statement in parentheses if it accepts them and also adding an explicit alias. I suspect that alias would in turn need to be referenced in your WHERE condition:
Where Condition = MySelect.[strSampleName]=[tblResult].[strSampleName]
Alias MySelect
I found the solution to my problem. The SQL statement where clause needed to be moved to the LookupRecord data block Where condition. This works:
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult;
Where Condition = [ysnReport]=True And [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
I am using Jet SQL from Excel over an ADODB connection to an IBM400 server to try to get some data. I have done this fine before, and it works with all other Jet SQL commands; however, I have run into a problem that I am unable to solve. It is quite simple, so I imagine that I am just not using the correct syntax. What I am trying to do is get some totals.
I have a table that contains part numbers and quantities within the locations of that part (more than one location per part). My goal is to have an SQL command grab the total quantity (summing all locations) per part. I am able to do this one part at a time successfully using the following (for simplicity I will use part numbers 12345678 and 01234567):
SELECT SUM(CPJDDTA81.F4101JD.LIPQOH) FROM CPJDDTA81.F4101JD WHERE CPJDDTA81.F4101JD.IMLITM = '12345678'
CPJDDTA81.F4101JD is my table, IMLITM is the column name of part numbers, LIPQOH is the quantity on hand per location.
The single search produces the sum I want however the problem comes when trying to run more than one sum within one sql command. I have tried using a select iif command like the following:
SELECT IIF(CPJDDTA81.F4101JD.IMLITM = '12345678',SUM(CPJDDTA81.F4101JD.LIPQOH),IIF(CPJDDTA81.F4101JD.IMLITM = '01234567',SUM(CPJDDTA81.F4101JD.LIPQOH),0) FROM CPJDDTA81.F4101JD
This command provides an error saying that "=" is not a valid token (the = sign within the IIF statement). I was hoping that someone out there can help me write a correct statement to accomplish this. My actual part list will be much larger so I will be using VBA to construct the SQL statement but I need to learn how to do two parts first. Thanks ahead of time.
SELECT CPJDDTA81.F4101JD.IMLITM, SUM(CPJDDTA81.F4101JD.LIPQOH) AS TotalQuantity
FROM CPJDDTA81.F4101JD
GROUP BY CPJDDTA81.F4101JD.IMLITM
Does the above help?
Additionally, the items can be limited by adding a WHERE clause.
SELECT CPJDDTA81.F4101JD.IMLITM, SUM(CPJDDTA81.F4101JD.LIPQOH) AS TotalQuantity
FROM CPJDDTA81.F4101JD
WHERE CPJDDTA81.F4101JD.IMLITM IN ('12345678', '01234567')
GROUP BY CPJDDTA81.F4101JD.IMLITM
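Since the real part list will be longer and built in code, here is a minimal sketch of generating the IN list with one placeholder per part. It is written in Python against an in-memory SQLite table rather than VBA and ADODB against the IBM server, so the connection, the unqualified table name, and the sample quantities are all assumptions. The idea of passing the part numbers as parameters instead of concatenating them into the SQL text should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for CPJDDTA81.F4101JD with invented quantities.
conn.execute("CREATE TABLE F4101JD (IMLITM TEXT, LIPQOH INTEGER)")
conn.executemany(
    "INSERT INTO F4101JD VALUES (?, ?)",
    [("12345678", 5), ("12345678", 7), ("01234567", 3), ("99999999", 100)],
)

def totals_for(parts):
    """Return {part number: total quantity} for the requested parts."""
    placeholders = ", ".join("?" for _ in parts)  # one ? per part number
    sql = (
        "SELECT IMLITM, SUM(LIPQOH) AS TotalQuantity "
        "FROM F4101JD "
        f"WHERE IMLITM IN ({placeholders}) "
        "GROUP BY IMLITM"
    )
    return dict(conn.execute(sql, parts).fetchall())

totals = totals_for(["12345678", "01234567"])
```

Keeping the part numbers as text preserves the leading zero in 01234567.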
I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. It appears that the substring call is being applied before the filtering happens, so the query fails.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this gives me an error, because when it reaches "X" it will try to use a negative number for the length in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
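For reference, the "case when" guard mentioned above would look roughly like the sketch below. It is run through Python's sqlite3 module so it is easy to try, which means LENGTH and SUBSTR stand in for SQL Server's LEN and SUBSTRING. SQLite does not raise an error for a negative length, so this only shows the shape of the guard, not the original failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NAMES (ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO NAMES VALUES (?, ?)", [(1, "Peter"), (2, "X")])

# The length check sits inside the same expression as the substring call,
# so the substring is only evaluated for names that are long enough.
query = """
SELECT ID,
       CASE WHEN LENGTH(NAME) > 3
            THEN SUBSTR(NAME, 1, LENGTH(NAME) - 3)
       END AS SHORT_NAME
FROM NAMES
ORDER BY ID
"""
guarded = conn.execute(query).fetchall()
```

Rows that fail the check come back with NULL in SHORT_NAME instead of aborting the query.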
Martin gave this link that pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
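The three logical steps can be sketched in plain Python using the NAMES and LONG_NAMES rows from the question. This models only the "as if" order described above; it says nothing about what a real optimizer does internally. The helper cut_last_three is a hypothetical stand-in for a strict substring function that rejects a negative length.

```python
NAMES = [(1, "Peter"), (2, "X")]   # (ID, NAME)
LONG_NAMES = [(1, 5)]              # (ID, NAME_LENGTH)

def cut_last_three(name):
    """Mimic a strict substring call that rejects a negative length."""
    length = len(name) - 3
    if length < 0:
        raise ValueError("invalid length passed to substring")
    return name[:length]

# Step 1: build the working table from the FROM clause (a cross product here).
working = [(n, ln) for n in NAMES for ln in LONG_NAMES]

# Step 2: keep only the rows that satisfy the join / WHERE condition.
working = [(n, ln) for (n, ln) in working if n[0] == ln[0]]

# Step 3: evaluate the SELECT expression against the surviving rows only.
result = [cut_last_three(n[1]) for (n, ln) in working]
```

In this order the "X" row never reaches cut_last_three. Moving step 3 ahead of step 2 would raise the error, which matches the behavior described in the question.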
You are thinking about something called the query execution plan. It is based on query optimization rules, indexes, temporary buffers, and execution-time statistics. If you are using SQL Server Management Studio, there is a toolbar above the query editor where you can display the estimated execution plan, which shows how your query will be rearranged to gain some speed. So if you have just used your NAMES table and it is still in the buffer, the engine might first run a subquery over that data and only then join it with the other table.