How to use substring and charindex in google query language - sql

I have a google spreadsheet sheet with several columns:
A: date
B: string
C: number
...
G: string (could be empty)
H: string (could be empty)
I would like to have a small table with the following:
Get the sum of C in table, where rows are the values of G (substring of it, as they are configured as this: CATEGORY:ITEM, I need grouping by CATEGORY only) and columns are monts
So far I've got only partial solutions with query (for example group by month(toDate(A)) etc) - I seem to be unable to use substring and charindex nor left to do string manipulation to remove the item part after the category nor to visualize those as the first clumns in the resulting rows...
Edit: just to clear up a bit: I need to alter the value in G for each row so that I can group by the altered value. I know it is possible to do with dates ( in my example -> group by month(toDate(A)) gives me access to each value in A column so the result is grouped correctly for each separate month). But it seems string manipulation is not allowed?
How do I do that for starters.
Thanks

Based on your desired output, can you try this on sheet 2:
=ArrayFormula(query({day(Sheet1!A2:A10)&text(month(Sheet1!A2:A10), " (mmm)"), Sheet1!B2:F10, regexextract(Sheet1!G2:G10, "(.+):")}, "select Col7, sum(Col3) group by Col7 pivot Col1"))
and see if this is getting somewhere ?
Or in case you prefer open ended ranges:
=ArrayFormula(query({day(Sheet1!A2:A)&text(month(Sheet1!A2:A), " (mmm)"), Sheet1!B2:F, iferror(regexextract(Sheet1!G2:G, "(.+):"))}, "select Col7, sum(Col3) where Col7 <>'' group by Col7 pivot Col1"))

been searching all over for a similar solution :-)
just in case you would like an alternative solution...
This works a treat based on my sheet H with data in column A to F
=transpose(split(textjoin("|",1,transpose({H!A1:F100})),"|"))
See... https://webapps.stackexchange.com/questions/90629/concatenate-several-columns-into-one-in-google-sheets

Related

How to do a regex compare of string values in 2 columns in big query

So let’s say I have 2 columns, both containing string values,
ColA , ResultCol
I want to check if In a row, the string in ColA is a substring of string in ResultCol
I know about ‘WHERE REGEXP_CONTAINS(col, regex)’ , but how do I do this comparison with string in another column?
Please let me know if I missed explaining any criteria of the question
Thanks!
For your requirement, you can use like operator in BigQuery which will compare the strings of two columns.I have created a sample table Product and ran the below code:
Code
SELECT * from `project.dataset.Product` where product like concat('%',country,'%')
Sample Table
Output

How to get tally of unique words using only SQL?

How do you get a list of all unique words and their frequencies ("tally") in one text column of one table of an SQL database? An answer for any SQL dialect in which this is possible would be appreciated.
In case it helps, here's a one-liner that does it in Ruby using Sequel:
Hash[DB[:table].all.map{|r| r[:text]}.join("\n").gsub(/[\(\),]+/,'').downcase.strip.split(/\s+/).tally.sort_by{|_k,v| v}.reverse]
To give an example, for table table with 4 rows with each holding one line of Dr. Dre's Still D.R.E.'s refrain in a text field, the output would be:
{"the"=>4,
"still"=>3,
"them"=>3,
"for"=>2,
"d-r-e"=>1,
"it's"=>1,
"streets"=>1,
"love"=>1,
"got"=>1,
"i"=>1,
"and"=>1,
"beat"=>1,
"perfect"=>1,
"to"=>1,
"time"=>1,
"my"=>1,
"taking"=>1,
"girl"=>1,
"low-lows"=>1,
"in"=>1,
"corners"=>1,
"hitting"=>1,
"world"=>1,
"across"=>1,
"all"=>1,
"gangstas"=>1,
"representing"=>1,
"i'm"=>1}
This works, of course, but is naturally not as fast or elegant as it would be to do it in pure SQL - which I have no clue if that's even in the realm of possibilites...
I guess it would depend on how the SQL database would look like. You would have to first turn your 4 row "database" into a data of single column, each row representing one word. To do that you could use something like String_split, where every space would be a delimiter.
STRING_SPLIT('I'm representing for them gangstas all across the world', ' ')
https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/
This would turn it into a table where every word is a row.
Once you've set up your data table, then it's easy.
Your_table:
[Word]
I'm
representing
for
them
...
world
Then you can just write:
SELECT Word, count(*)
FROM your_table
GROUP BY Word;
Your output would be:
Word | Count
I'm 1
representing 1
I had a play using XML in sql server. Just an idea :)
with cteXML as
(
select *
,cast('<wd>' + replace(songline,' ' ,'</wd><wd>') + '</wd>' as xml) as XMLsongline
from #tSongs
),cteBase as
(
select p.value('.','nvarchar(max)') as singleword
from cteXML as x
cross apply x.XMLsongline.nodes('/wd') t(p)
)
select b.singleword,count(b.singleword)
from cteBase as b
group by b.singleword

Need to find string using bigquery

We have below string column and having below data
and I want to find Null count present in string columns means how many times null value('') present in front of id column present in select statement
using big query.
Don't use string position.
Expected output:
count of null ('')id =3
1st row,2nd row and 5th row
Below is for BigQuery Standard SQL
#standardSQL
SELECT
FORMAT(
"count of null ('')id = %d. List of id is: %s",
COUNT(*),
STRING_AGG(CAST(ID AS STRING))
) AS output
FROM `project.dataset.table`
WHERE REGEXP_CONTAINS(String, r"(?i)''\s+(?:as|)\s+(?:id|\[id\])")
if to apply to sample data from your question - the output is
Row output
1 count of null ('')id = 3. List of id is: 1,2,5
The idea is to unify all strings to something you can query with like = "%''asid%" or regex
First replace all spaces with ''
replace "[", "]" with ''.
Make the use of " or ' consistent.
Then query with like.
For example:
select 1 from (select replace(replace(replace(replace('select "" as do, "" as [id] form table1',' ',''),'[',''),']',''),'"',"'") as tt)
where tt like ("%''asid%")
Its not a "smart" idea but its simple.
A better idea will be to save the query columns in a repeat column '"" as id' and the table in another column.
You don't need to save 'select' and 'from' this way you can query easily and also assemble a query from the data.
If I understand correctly, you want to count the number of appearances of '' in the string column.
If so, you can use regexp_extract_all():
select t.*,
(select count(*)
from unnest(regexp_extract_all(t.string, "''")) u
) as empty_string_count
from t;

Natural or Human Sort order

I have been working on this on for months. I just cannot get the natural (True alpha-numeric) results. I am shocked that I cannot get them as I have been able to in RPG since 1992 with EBCDIC.
I am looking for any solution in SQL, VBS or simple excel or access. Here is the data I have:
299-8,
3410L-87,
3410L-88,
420-A20,
420-A21,
420A-40,
4357-3,
AN3H10A,
K117GM-8,
K129-1,
K129-15,
K271B-200L,
K271B-38L,
K271D-200EL,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
M8000-3,
MS24665-1,
SK271B-200L,
SAYA4008
The order I am looking for is the true alpha-numeric order as below:
AN3H10A,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
K117GM-8,
K129-1,
K129-15,
MS24665-1,
M8000-3,
SAYA4008,
SK271B-200L
The inventory is 7800 records so I have had some problems with processing power as well.
Any help would be appreciated.
Jeff
In native Excel, you can add multiple sorting columns to return the ASCII code for each character, but if the character is a number, then add a large number to the code (e.g 1000).
Then sort on each of the helper columns, including the first column in the table, but not in the sort.
The formula:
=IFERROR(CODE(MID($A1,COLUMNS($A:A),1))+AND(CODE(MID($A1,COLUMNS($A:A),1))>=48,CODE(MID($A1,COLUMNS($A:A),1))<=57)*1000,"")
The Sort dialog:
The results:
You can implement a similar algorithm using VBA, and probably SQL also. I dunno about VBS or Access.
You could try using format for left padding the string in order by
select column
from my_table
order by Format(column, "0000000000")
Add a sorting column:
, iif (left(fieldname, 1) between '0' and '9', 1, 0) sortField
etc
order by sortField, FieldName
Lets say you have your data in column "A". If you put this formula in column "B" =IFERROR(IF(LEFT(A1,1)+1>0,"ZZZZZZZ "&A1,A1),A1), it will automatically add Z in front of all numerical values, so that they will naturally appear after all alphabetical values when you sort A-Z. later you can find&replace that funny ZZZZZZ string...
There a number of approaches, but likely the least amount of work is to build two columns that split out the delimiter (-) in this case.
You then “pad” the results (spaces, or 0) right justified, and then sort on the two columns.
So in the query builder we have this:
SELECT Field1,
Format(
Mid(field1,1,IIf(InStr(field1,"-")=0,50,InStr(field1,"-")-1)),
">##########") AS Expr1,
Format(
Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1)),
">##########") AS Expr2
FROM Data
When we run the above raw query we get this:
So now in the query builder, simply sort on the first derived column, and then sort on the 2nd derived column.
Eg this:
Run the query, and we get this result:
Edit:
Looking at you desired results, it looks like above sort is wrong. We have to RIGHT just and pad with 0’s.
So this 2nd try:
SELECT Field1,
Left(Mid(field1,1,IIf(InStr(field1,"-")=0,30,InStr(field1,"-")-1))
& String(30,"0"),30) AS Expr1,
Left(Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1))
& String(30,"0"),30) AS Expr2
FROM Data
The results are thus this:
Given your small table size, then the above query should perform quite well.

Sorting data in Access database where the column has numbers and letters

Please help me because I have been unable to get this right.
What is the access SQL to select this column(columnA) so that it returns a resultset with distinct values sorted first according to numbers and then to letters.
Here is the columns values: {10A,9C,12D,11G,9B,10C,9R,8T}
I have tried 'Select distinct ColumnA from tblClass order by 1'
but it returns {10A,10C,11G,12D,8T,9B,9C,9R} which is not what I want.
Thank you in advance.
You can use the Val() function for this. From the help topic: "The Val function stops reading the string at the first character it can't recognize as part of a number"
Val(10A) will give you 10, Val(9C) will give you 9, and so on. So in your query, order by Val(ColumnA) first, then ColumnA.
SELECT DISTINCT Val([ColumnA]) AS number_part, ColumnA
FROM tblClass
ORDER BY Val([ColumnA]), ColumnA;
SELECT DISTINCT ColumnA
FROM tblClass
ORDER BY CInt(LEFT(ColumnA,len(ColumnA)-1)), RIGHT(ColumnA,1);
If there last character is a letter and the others are a number.
Your data type is a string so it's sorting correctly, to get the result you want you're going to have to split your values into numeric and alphabetic parts and then sort first on the numeric then the alphabetic. Not being an Access programmer I can't help you with exactly how you're going to do that.
order by 1?
Don't you mean order by ColumnA?
SELECT DISTINCT ColumnA
FROM tblClass
ORDER BY ColumnA
I had a similar problem and used a dummie workaround:
changing a list of {10A,10C,11G,12D,8T,9B,9C,9R}
into {10A,10C,11G,12D,08T,09B,09C,09R} by adding the 0 before each <10 number.
now all items are the same length and access will sort correctly into {08T, 09B, 09C, 09R, 10A, 10C, 11G, 12D}
.
To achieve this, I copied this column into excel column A and used IF(LEN(A2)<3, concatenate("0", A2))