SQL Server substring - sql-server-2005

I need a good expression to correctly select part of a field.
For example, the field can be of the type: "google_organic" or "google_campaign_HereGoesMyCode" . The part I am interested in is "organic" or "campaign" without any other addition.
So far I select with this:
substring(Referer, charIndex('_',Referer)+1, len(Referer))
But in the case of "campaign" I select the whole thing... I don't know how to manage the existence or non-existence of the second underscore...
thank you

One way is to basically create a lastIndex type search using the below SQL and use the result as the length:
len(Referer) - (charindex('_', reverse(Referer))-1)
You can then rewrite your query as follows. The length is the position of the last underscore minus the position of the first one, minus 1, so the first charIndex appears twice and the expression gets fairly dense:
substring(Referer, charIndex('_',Referer)+1, (len(Referer) - (charindex('_', reverse(Referer))-1)) - charIndex('_',Referer) - 1)
I realize that this only works if the value has two underscores. But you can choose which expression to run with a CASE/WHEN statement.
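To check the index arithmetic outside the database, here is a minimal Python sketch of the same idea. The function name between_underscores is made up for illustration, and the choice to return everything after a single underscore is an assumption about what the CASE/WHEN branch would do.

```python
def between_underscores(referer):
    """Return the text between the first and the last underscore.

    Positions are 1-based to mirror CHARINDEX (0 means "not found").
    """
    first = referer.find("_") + 1
    if first == 0:
        return ""  # assumption: no underscore means there is nothing to extract
    # Same trick as len(Referer) - (charindex('_', reverse(Referer)) - 1)
    last = len(referer) - referer[::-1].find("_")
    if last == first:
        # Only one underscore: take everything after it (the CASE/WHEN branch)
        return referer[first:]
    # Length is last - first - 1, exactly as in the SUBSTRING call
    return referer[first:first + (last - first - 1)]


organic = between_underscores("google_organic")
campaign = between_underscores("google_campaign_HereGoesMyCode")
```

With the two sample values from the question this yields "organic" and "campaign".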

Related

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only few are not numeric.
The problem is that it's not working well. For example, if a row contains the value '123456789' and I look for '567', it's not returning this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
SQL Server 2012.
Thanks.
Full-text search doesn't support leading wildcards.
In my setup, these two queries return the same rows:
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach
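For readers unfamiliar with the trigram idea, here is a rough Python sketch of how such an index answers substring queries. It illustrates the technique only and is not a SQL Server feature; the names trigrams, build_index and search are hypothetical.

```python
from collections import defaultdict


def trigrams(value):
    """All 3-character substrings of value."""
    return {value[i:i + 3] for i in range(len(value) - 2)}


def build_index(rows):
    """Map each trigram to the ids of the rows whose value contains it."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for gram in trigrams(value):
            index[gram].add(row_id)
    return index


def search(index, rows, needle):
    """Ids of rows containing needle, using the index to narrow candidates."""
    grams = trigrams(needle)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        candidates = set(rows)  # needle shorter than 3 characters: full scan
    # Verify candidates, since sharing trigrams does not prove a match
    return sorted(r for r in candidates if needle in rows[r])


rows = {1: "123456789", 2: "567", 3: "998877"}
index = build_index(rows)
hits = search(index, rows, "567")
```

Here hits contains rows 1 and 2, which is the behaviour the question expected from the full-text index.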
The problem is probably that you are using the wrong tool: full-text queries perform linguistic searches, and it seems like you want a simple LIKE condition.
If you want to get a solution to your needs then you can post DDL+DML+'desired result'
You can do this:
....your_query.... LIKE '%567%' ;
This will return all the rows that contain 567 at the beginning, at the end, or somewhere in between.
99% of the time, you're missing the % before and after the string you search for in the LIKE clause.
E.g.:
SELECT * FROM t WHERE att LIKE '66'
is the same as using WHERE att = '66'.
If you write:
SELECT * FROM t WHERE att LIKE '%66%'
it will return all the rows containing two sixes in a row.
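The difference between the three pattern shapes can be tried out quickly. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server (an assumption made only so the example runs anywhere). The table t and column att follow the answer above, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (att TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("123456789",), ("567",), ("567890",), ("1234567",)],
)


def like(pattern):
    """Values of att matching the given LIKE pattern, in sorted order."""
    cursor = conn.execute(
        "SELECT att FROM t WHERE att LIKE ? ORDER BY att", (pattern,)
    )
    return [row[0] for row in cursor]


exact = like("567")        # no wildcard: behaves like att = '567'
prefix = like("567%")      # only values that start with 567
anywhere = like("%567%")   # 567 at the start, the end, or in between
```

Only the last pattern finds '123456789'. Note that a leading % generally prevents an ordinary index seek, which matters on a 200-million-row table.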

Natural or Human Sort order

I have been working on this one for months. I just cannot get the natural (true alphanumeric) results. I am shocked that I cannot get them, as I have been able to in RPG since 1992 with EBCDIC.
I am looking for any solution in SQL, VBS or simple excel or access. Here is the data I have:
299-8,
3410L-87,
3410L-88,
420-A20,
420-A21,
420A-40,
4357-3,
AN3H10A,
K117GM-8,
K129-1,
K129-15,
K271B-200L,
K271B-38L,
K271D-200EL,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
M8000-3,
MS24665-1,
SK271B-200L,
SAYA4008
The order I am looking for is the true alpha-numeric order as below:
AN3H10A,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
K117GM-8,
K129-1,
K129-15,
MS24665-1,
M8000-3,
SAYA4008,
SK271B-200L
The inventory is 7800 records so I have had some problems with processing power as well.
Any help would be appreciated.
Jeff
In native Excel, you can add multiple sorting columns to return the ASCII code for each character, but if the character is a number, then add a large number to the code (e.g 1000).
Then sort on each of the helper columns, including the first column in the table, but not in the sort.
The formula:
=IFERROR(CODE(MID($A1,COLUMNS($A:A),1))+AND(CODE(MID($A1,COLUMNS($A:A),1))>=48,CODE(MID($A1,COLUMNS($A:A),1))<=57)*1000,"")
The Sort dialog:
The results:
You can implement a similar algorithm using VBA, and probably SQL also. I dunno about VBS or Access.
You could try using Format to left-pad the string in the ORDER BY:
select column
from my_table
order by Format(column, "0000000000")
Add a sorting column:
, iif (left(fieldname, 1) between '0' and '9', 1, 0) sortField
etc
order by sortField, FieldName
Let's say you have your data in column "A". If you put this formula in column "B": =IFERROR(IF(LEFT(A1,1)+1>0,"ZZZZZZZ "&A1,A1),A1), it will automatically add "ZZZZZZZ " in front of all values that start with a digit, so that they naturally appear after all alphabetical values when you sort A-Z. Later you can find and replace that funny ZZZZZZZ string.
There are a number of approaches, but likely the least amount of work is to build two columns that split the value at the delimiter (- in this case).
You then “pad” the results (spaces, or 0) right justified, and then sort on the two columns.
So in the query builder we have this:
SELECT Field1,
Format(
Mid(field1,1,IIf(InStr(field1,"-")=0,50,InStr(field1,"-")-1)),
">##########") AS Expr1,
Format(
Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1)),
">##########") AS Expr2
FROM Data
When we run the above raw query we get this:
So now in the query builder, simply sort on the first derived column, and then sort on the 2nd derived column.
Eg this:
Run the query, and we get this result:
Edit:
Looking at your desired results, it looks like the above sort is wrong. We have to left-justify and pad on the right with 0s.
So this is the 2nd try:
SELECT Field1,
Left(Mid(field1,1,IIf(InStr(field1,"-")=0,30,InStr(field1,"-")-1))
& String(30,"0"),30) AS Expr1,
Left(Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1))
& String(30,"0"),30) AS Expr2
FROM Data
The results are thus this:
Given your small table size, the above query should perform quite well.
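The desired order in the question looks like EBCDIC collation, where letters sort before digits. The Excel helper-column answer reproduces that by adding 1000 to the character code of every digit. Below is a minimal Python sketch of that same per-character key. The name ebcdic_like_key is made up, and this is not a numeric-aware "natural sort": K129-15 still compares character by character.

```python
def ebcdic_like_key(part_number):
    """Per-character sort key in which digits rank after letters.

    Mirrors the Excel formula: CODE(char) + 1000 when the char is 0-9.
    """
    return [
        ord(ch) + 1000 if "0" <= ch <= "9" else ord(ch)
        for ch in part_number
    ]


parts = [
    "AN3H10A", "K117GM-8", "K129-1", "K129-15", "KD1051", "KD1062",
    "KD1092", "KD1108", "KD1108", "M8000-3", "MS24665-1",
    "SK271B-200L", "SAYA4008",
]
ordered = sorted(parts, key=ebcdic_like_key)
```

For these values the result matches the order listed in the question, with KD1051 ahead of K117GM-8 and MS24665-1 ahead of M8000-3. Values that start with a digit, such as 299-8, would sort after all of them.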

Pentaho Dynamic SQL queries

I have a Pentaho CDE project in development and I want to display a chart which depends on several parameters (like month, year, precise date, country, etc.). But when I "add" another parameter to my query, it doesn't work anymore... So I'm sure I'm doing something wrong, but what? Please take a look at the month parameter, for example:
Select_months_query : (this is for my checkbox)
SELECT
"All" AS MONTH(TransactionDate)
UNION
SELECT DISTINCT MONTH(TransactionDate) FROM order ORDER BY MONTH(TransactionDate);
Select_barchart_query : (this is for my chart, don't mind the other tables)
SELECT pginit.Family, SUM(order.AmountEUR) AS SALES
FROM pginit INNER JOIN statg ON pginit.PG = statg.PGInit INNER JOIN order ON statg.StatGroup = order.StatGroup
WHERE (MONTH(order.TransactionDate) IN (${month}) OR "All" IN (${month}) OR ${month} IS NULL) AND
/*/* Apply the same pattern for another parameter (like year for example) *\*\
GROUP BY pginit.Family
ORDER BY SALES;
(Here, ${month} is a parameter in CDE)
Any ideas on how to do it ?
I read something there that said to use CASE clauses... But how ?
http://forums.pentaho.com/showthread.php?136969-Parametrized-SQL-clause-in-CDE&highlight=dynamic
Thank you for your help !
Try simplifying that query until it runs and returns something and work from there.
Here are some things I would look into as possible causes:
I think you need single quotes around ${parameter} expressions if they're strings;
"All" should probably be 'All' (single quotes instead of double quotes);
Avoid multi-line comments. I don't think you can have multi-line comments in CDE SQL queries, although -- for single line comments usually works.
Be careful with multi-valued parameters; they are passed as arrays, which CDA will convert into comma separated lists. Try with a single valued parameter, using = instead of IN.
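As a way to try the "simplify until it runs" advice outside CDE, here is a small sketch of the 'All'-or-specific-value filter with a single-valued parameter. It uses Python's sqlite3 module and a named parameter :month as a stand-in for CDE's ${month}. The orders table, its columns, and the sample rows are invented for illustration, and nothing here is Pentaho API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month INTEGER, family TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "A", 10.0), (1, "B", 5.0), (2, "A", 7.0)],
)

# Single quotes around 'All', = instead of IN, and no multi-line comments
QUERY = """
SELECT family, SUM(amount) AS sales
FROM orders
WHERE (:month = 'All' OR month = :month)
GROUP BY family
ORDER BY sales
"""


def sales_by_family(month):
    """Run the chart query for one month number, or for 'All'."""
    return conn.execute(QUERY, {"month": month}).fetchall()


all_months = sales_by_family("All")
january = sales_by_family(1)
```

Once this shape works for one parameter, a second one (year, for example) can be added as another parenthesised condition joined with AND.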

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with C# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX('5', column) = 1
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5 --only matches when the first 7 in the value is at position 5
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2005+ supports regex through CLR integration, but the catch is you have to create the CLR user-defined function yourself before you can use it. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9]. I do this a lot for zip codes.
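Since the question asks for something generic, the LIKE pattern can be generated from the mask instead of being typed by hand. Here is a minimal sketch in Python (the asker's C# code would follow the same shape). It assumes masks contain only digits and the letter X, and the helper names mask_to_like and matches_mask are made up.

```python
import re


def mask_to_like(mask):
    """Turn a mask such as '5XXXXXX' into a T-SQL LIKE pattern.

    Assumption: the mask holds only digits and 'X' (any digit), so no
    LIKE metacharacters need escaping.
    """
    return "".join("[0-9]" if ch == "X" else ch for ch in mask)


def matches_mask(mask, value):
    """Client-side check of a value against the same mask."""
    regex = "".join("[0-9]" if ch == "X" else re.escape(ch) for ch in mask)
    return re.fullmatch(regex, value) is not None


first = mask_to_like("5XXXXXX")
second = mask_to_like("XXXX7XX")
```

The generated string would then be passed as a parameter to a query of the form WHERE SomeField LIKE @pattern.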
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'

SQL produced by Entity Framework for string matching

Given this linq query against an EF data context:
var customers = data.Customers.Where(c => c.EmailDomain.StartsWith(term))
You’d expect it to produce SQL like this, right?
SELECT {cols} FROM Customers WHERE EmailDomain LIKE @term+'%'
Well, actually, it does something like this:
SELECT {cols} FROM Customer WHERE ((CAST(CHARINDEX(@term, EmailDomain) AS int)) = 1)
Do you know why?
Also, replacing the Where selector to:
c => c.EmailDomain.Substring(0, term.Length) == term
it runs 10 times faster but still produces some pretty yucky SQL.
NOTE: LINQ to SQL correctly translates StartsWith into LIKE {term}%, and NHibernate has a dedicated LikeExpression.
I don't know about MS SQL Server, but on SQL Server Compact LIKE 'foo%' is thousands of times faster than CHARINDEX if you have an index on the search column. And now I'm sitting here pulling my hair out over how to force it to use LIKE.
http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/1b835b94-7259-4284-a2a6-3d5ebda76e4b
The reason is that CHARINDEX is a lot faster and cleaner for SQL to perform than LIKE, because LIKE clauses can get crazy. Example:
SELECT * FROM Customer WHERE EmailDomain LIKE 'abc%de%sss%'
But, the "CHARINDEX" function (which is basically "IndexOf") ONLY handles finding the first instance of a set of characters... no wildcards are allowed.
So, there's your answer :)
EDIT: I just wanted to add that I encourage people to use CHARINDEX in their SQL queries for things that they didn't need "LIKE" for. It is important to note though that in SQL Server 2000... a "Text" field can use the LIKE method, but not CHARINDEX.
Performance seems to be about equal between LIKE and CHARINDEX, so that should not be the reason. See here or here for some discussion. Also the CAST is very weird because CHARINDEX returns an int.
CHARINDEX returns the location of the first term within the second term.
SQL starts with 1 as the first location (0 = not found).
http://msdn.microsoft.com/en-us/library/ms186323.aspx
I don't know why it uses that syntax, but that's how it works.
I agree that it is no faster; I was retrieving tens of thousands of rows from our database with the letter i in the name. I did find, however, that you need to use > 0 rather than = 1 ... so use
{cols} FROM Customer WHERE ((CAST(CHARINDEX(@term, EmailDomain) AS int)) > 0)
rather than
{cols} FROM Customer WHERE ((CAST(CHARINDEX(@term, EmailDomain) AS int)) = 1)
Here are my two tests ....
select * from members where surname like '%i%' --12 seconds
select * from sc4_persons where ((CAST(CHARINDEX('i', surname) AS int)) > 0) --12 seconds
select * from sc4_persons where ((CAST(CHARINDEX('i', surname) AS int)) = 1) --too few results
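The "too few results" outcome follows directly from what the two comparisons mean: CHARINDEX(...) = 1 is a starts-with test, while CHARINDEX(...) > 0 is a contains test. Here is a small Python sketch of those semantics. The charindex helper is a stand-in written for this example, and unlike SQL Server under a case-insensitive collation it compares case-sensitively.

```python
def charindex(find, text):
    """1-based position of find within text, or 0 when absent (like CHARINDEX)."""
    return text.find(find) + 1


surnames = ["ivanov", "smith", "brown"]

# = 1 is what StartsWith translates to: the term sits at position 1
starts_with_i = [s for s in surnames if charindex("i", s) == 1]

# > 0 is equivalent to LIKE '%i%': the term occurs anywhere
contains_i = [s for s in surnames if charindex("i", s) > 0]
```

So = 1 is the right comparison for the StartsWith query in the question, and > 0 only matches the LIKE '%i%' test when "contains" is what you want.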