I have a simple problem but I started to use google bq and their help menu was so complex for me.
I have a column like that for some rows:
ANSWER(title of column)
9
10 - Certainly Satisfied.
7 -
My aim is to split the previous part of that column from "-" sign and convert it to integer. I found some formulas like split(), regexp_extract() but I couldn't be sure how can I imply them for my data.
Thanks for your help in advance :)
If the number is always first, you can use:
select sum(safe.cast((split(answer, '-'))[ordinal(1)] as int64)
from t;
Note: It looks like you have spaces, so you might really want to split on the space:
select sum(safe.cast((split(answer, ' '))[ordinal(1)] as int64)
from t;
Consider below option
select answer,
safe_cast(regexp_extract(trim(answer), r'^\d+') as int64) as score
from `project.dataset.table`
if to apply to sample data in your question - output is
I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only few are not numeric.
The problem is that it's not working well. For example if a row contains the value '123456789' and i look for '567', it's not returning this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
sql server 2012.
Thanks.
Full text search doesn't support leading wildcards
In my setup, these return the same
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach
The problem is probably that you are using a wrong tool since Full-text queries perform linguistic searches and it seems like you want to use simple "like" condition.
If you want to get a solution to your needs then you can post DDL+DML+'desired result'
You can do this:
....your_query.... LIKE '567%' ;
This will return all the rows that have a number 567 in the beginning, end or in between somewhere.
99% You're missing % after and before the string you search in the LIKE clause.
es:
SELECT * FROM t WHERE att LIKE '66'
is the same as as using WHERE att = '66'
if you write:
SELECT * FROM t WHERE att LIKE '%66%'
will return you all the lines containing 2 'sixes' one after other
I have been working on this on for months. I just cannot get the natural (True alpha-numeric) results. I am shocked that I cannot get them as I have been able to in RPG since 1992 with EBCDIC.
I am looking for any solution in SQL, VBS or simple excel or access. Here is the data I have:
299-8,
3410L-87,
3410L-88,
420-A20,
420-A21,
420A-40,
4357-3,
AN3H10A,
K117GM-8,
K129-1,
K129-15,
K271B-200L,
K271B-38L,
K271D-200EL,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
M8000-3,
MS24665-1,
SK271B-200L,
SAYA4008
The order I am looking for is the true alpha-numeric order as below:
AN3H10A,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
K117GM-8,
K129-1,
K129-15,
MS24665-1,
M8000-3,
SAYA4008,
SK271B-200L
The inventory is 7800 records so I have had some problems with processing power as well.
Any help would be appreciated.
Jeff
In native Excel, you can add multiple sorting columns to return the ASCII code for each character, but if the character is a number, then add a large number to the code (e.g 1000).
Then sort on each of the helper columns, including the first column in the table, but not in the sort.
The formula:
=IFERROR(CODE(MID($A1,COLUMNS($A:A),1))+AND(CODE(MID($A1,COLUMNS($A:A),1))>=48,CODE(MID($A1,COLUMNS($A:A),1))<=57)*1000,"")
The Sort dialog:
The results:
You can implement a similar algorithm using VBA, and probably SQL also. I dunno about VBS or Access.
You could try using format for left padding the string in order by
select column
from my_table
order by Format(column, "0000000000")
Add a sorting column:
, iif (left(fieldname, 1) between '0' and '9', 1, 0) sortField
etc
order by sortField, FieldName
Lets say you have your data in column "A". If you put this formula in column "B" =IFERROR(IF(LEFT(A1,1)+1>0,"ZZZZZZZ "&A1,A1),A1), it will automatically add Z in front of all numerical values, so that they will naturally appear after all alphabetical values when you sort A-Z. later you can find&replace that funny ZZZZZZ string...
There a number of approaches, but likely the least amount of work is to build two columns that split out the delimiter (-) in this case.
You then “pad” the results (spaces, or 0) right justified, and then sort on the two columns.
So in the query builder we have this:
SELECT Field1,
Format(
Mid(field1,1,IIf(InStr(field1,"-")=0,50,InStr(field1,"-")-1)),
">##########") AS Expr1,
Format(
Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1)),
">##########") AS Expr2
FROM Data
When we run the above raw query we get this:
So now in the query builder, simply sort on the first derived column, and then sort on the 2nd derived column.
Eg this:
Run the query, and we get this result:
Edit:
Looking at you desired results, it looks like above sort is wrong. We have to RIGHT just and pad with 0’s.
So this 2nd try:
SELECT Field1,
Left(Mid(field1,1,IIf(InStr(field1,"-")=0,30,InStr(field1,"-")-1))
& String(30,"0"),30) AS Expr1,
Left(Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1))
& String(30,"0"),30) AS Expr2
FROM Data
The results are thus this:
Given your small table size, then the above query should perform quite well.
I have a data set that looks something like this:
A6177PE
A85506
A51SAIO
A7918F
A810004
A11483ON
A5579B
A89903
A104F
A9982
A8574
A8700F
And I need to find all the ENDings where they are non-numeric. In this example, that means PE, AIO, F, ON, B and F.
In pseudocode, I'm imagining I need something like
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,[SOME_CLEVER_LOGIC]) AS X FROM TABLE);
Any ideas? Can I solve this without learning regexp?
EDIT: To clarify, my data set is a lot larger than this example. Also, I'm only interested in the part of the string AFTER the numeric part. If the string is "A6177PE" I want "PE".
Disclaimer: I don't know Oracle SQL. But, I think something like this should work:
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,REGEXP_INSTR(COL, "[[:ALPHA:]]+$")) AS X FROM TABLE);
REGEXP_INSTR(COL, "[[:ALPHA:]]+$") should return the position of the first of the characters at the end of the field.
For readability, I'd recommend using the REGEXP_SUBSTR function (If there are no performance issues of course, as this is definitely slower than the accepted solution).
...also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself
SELECT DISTINCT SUBSTR(MY_COLUMN,REGEXP_SUBSTR("[a-zA-Z]+$")) FROM MY_TABLE;
(:alpha: is supported also, as #Audun wrote )
Also useful: Oracle Regexp Support (beginning page)
For example
SELECT SUBSTR(col,INSTR(TRANSLATE(col,'A0123456789','A..........'),'.',-1)+1)
FROM table;
I'm trying to retrieve all columns that start with any non alpha characters in SQlite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in sqllite, I don't have a ready instance to test the syntax there on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally will work. Unsure what char commands there are to do a range of letters in SQLlite.
I used 'upper' to make it so you don't need to do lower case letters in the not in statement...kinda hope SQLlite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.