Group by first element of the array in BigQuery

I have a column "numbers" with array values. If I select the column in a query, the result looks like:
["40432","83248","1"]
["40432","8923","7723"]
["2340","837","20309"]
["290348","83248","20309","187"]
["98184897","98234","20309"]
["40432","83248"]
["2340"]
Now, I'd like to group the results on only the first value in the array and count them. The result should look like:
value amount
40432 3
2340 2
290348 1
98184897 1
How do I arrange this? What should the query look like?
I tried things like:
SELECT.... WHERE split(TO_JSON_STRING(numbers), ',')[ordinal(1)] as firstNumber ......
But this did not result in the desired data.
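One possible approach, offered as a minimal sketch rather than a confirmed answer: if "numbers" is a real ARRAY<STRING> column, the first element can be read directly by position and grouped on, with no need to go through TO_JSON_STRING. The table name your_table is a placeholder, not something from the question.

```sql
-- Sketch for BigQuery standard SQL.
-- Assumptions: `numbers` is ARRAY<STRING>; `your_table` is a placeholder name.
SELECT
  numbers[SAFE_OFFSET(0)] AS value,  -- first element; NULL when the array is empty
  COUNT(*) AS amount
FROM your_table
GROUP BY value
ORDER BY amount DESC;
```

SAFE_OFFSET is used instead of OFFSET so that an empty array yields NULL rather than an error. On the seven sample rows this should give the counts listed in the question (40432: 3, 2340: 2, 290348: 1, 98184897: 1); the order of rows with equal counts is not guaranteed.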

Related

BigQuery: UNNESTING string representation of list of JSONs

I have a STRING column with a LIST [,,] of JSONS that I would like to UNNEST into separate lines.
For example:
ROW TICKET_ID Subject UPDATES(STRING)
1 1 Need help... [{"Actor":"Tom","Type":"Request"}, {"Actor":"John","Type":"Update"}]
2 2 Something... [{"Actor":"Kate","Type":"Request"}, {"Actor":"Tim","Type":"Update"}]
I would like it to look like:
ROW TICKET_ID SUBJECT UPDATE
1 1 Need help... {"Actor":"Tom","Type":"Request"}
2 1 Need help... {"Actor":"John","Type":"Update"}
3 2 Something... {"Actor":"Kate","Type":"Request"}
4 2 Something... {"Actor":"Tim","Type":"Update"}
I have tried using JSON_EXTRACT_ARRAY() and CROSS JOIN UNNEST() so far, but I have been unable to split the updates into separate rows: they still appear as separate entries within the same row (array).

Use the query below:
select * except(updates)
from your_table,
unnest(json_extract_array(updates)) update
Applied to the sample data in the question, this returns one row per element of the updates array.

How to show 1 value per repeated value in BigQuery?

How can I write a BigQuery query whose result looks like this?
This is sample data that has repeated values in the "FeatureName" column.
https://i.stack.imgur.com/RJkQa.png
Expected Result
https://i.stack.imgur.com/lyZaU.png

SAP HANA SQL - Concatenate multiple result rows for a single column into a single row

I am pulling data and when I pull in the text field my results for the "distinct ID" are sometimes being duplicated when there are multiple results for that ID. Is there a way to concatenate the results into a single column/row rather than having them duplicated?
It looks like there are ways in other SQL platforms but I have not been able to find something that works in HANA.
Example
Select
Distinct ID
From Table1
If I pull only Distinct ID I get the following:
ID
1
2
3
4
However when I pull the following:
Example
Select
Distinct ID,Text
From Table1
I get something like
ID Text
1 Dog
2 Cat
2 Dog
3 Fish
4 Bird
4 Horse
I am trying to Concat the Text field when there is more than 1 row for each ID.
What I need the results to be (Having a "break" between results so that they are on separate lines would be even better but at least a "," would work):
ID Text
1 Dog
2 Cat,Dog
3 Fish
4 Bird,Horse

I see Kiran has just referred to another valid answer in the comment, but in your example this would work.
SELECT ID, STRING_AGG(Text, ',')
FROM TABLE1
GROUP BY ID;
You can replace the ',' with another delimiter, maybe a '\n' for a line break.
I would caution against the approach to concatenate rows in this way, unless you know your data well. There is no effective limit to the rows and length of the string that you will generate, but HANA will have a limit on string length, so consider that.
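As a small variation on the query above, an ORDER BY inside STRING_AGG makes the order of the concatenated values deterministic, which the plain form does not promise. This is a sketch only; the alias Texts is made up for illustration.

```sql
-- Sketch for SAP HANA, reusing Table1, ID and Text from the question.
-- The ORDER BY inside STRING_AGG fixes the order of the joined values,
-- so ID 2 comes out as 'Cat,Dog' rather than possibly 'Dog,Cat'.
SELECT ID,
       STRING_AGG(Text, ',' ORDER BY Text) AS Texts
FROM Table1
GROUP BY ID;
-- Untested assumption: for a real line break, a character function such as
-- CHAR(10) may be needed as the delimiter, since '\n' inside a string
-- literal may be kept as the two characters backslash and n.
```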

Comma delimited values sql

From my research online I have discovered two answers to this question which I am trying to stay away from.
I cannot modify the table or add a new table because the software is third party and needs the table to remain unmodified.
I am trying to stay away from using temporary tables or extra user defined functions.
Here is my issue.
There is a column in the database that is a list of comma-delimited numbers representing days of the week, i.e. (1,2,4,5,7).
I am trying to find a way to read that data and find out if there are any rows where that column represents days that are 3 consecutive days.
It should return anything with
1,2,3
2,3,4
3,4,5
5,6,7
1,,,,,6,7
1,2,,,,,7
But if the column has 1,2,3,4 it should not return twice. There are a lot of rows that have 2,3,4,5,6 and any solution I've come up with will return that 3 times.
Preferably, I would like to create a stored procedure to pass in a number and look for that number of consecutive days. So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
Is there another option other than using extra tables? If so, can you show me how to make this work? I am not new to SQL, but there are a lot of more advanced querying techniques I am not familiar with.

The following brute force method will work in all databases:
select (case when col like '%1%' and col like '%2%' and col like '%3%' then 1
when col like '%2%' and col like '%3%' and col like '%4%' then 1
when col like '%3%' and col like '%4%' and col like '%5%' then 1
when col like '%4%' and col like '%5%' and col like '%6%' then 1
when col like '%5%' and col like '%6%' and col like '%7%' then 1
when col like '%6%' and col like '%7%' and col like '%1%' then 1
when col like '%7%' and col like '%1%' and col like '%2%' then 1
else 0
end) as HasThreeConsecutiveDays
It returns a 0/1 flag if three days are consecutive.

So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
You won't be able to do that without dynamic sql, because you want to support wrapping from 7 back to 1. I could write a query that would do it for you in a single statement if you didn't care about wrapping from the end of the week back to the beginning, but with that requirement I don't see how to do it without building a dynamic sql string in the procedure, which I don't have time to play with right now (maybe someone else will take that idea and run with it).
With that option defeated for now, I can do this instead:
WHERE
( col like '1,2,3%'
OR col like '%2,3,4%'
OR col like '%3,4,5%'
OR col like '%4,5,6%'
OR col like '%5,6,7'
OR col like '1%6,7'
OR col like '1,2%7'
)
This should be better than checking individual numbers as shown in another answer, because there are fewer pattern matches to complete. However, it only works if we can guarantee the sort order. We also need to know in advance how the commas are spaced between numbers, but we can fix that issue if necessary by replacing all commas and/or spaces with an empty string (and adjusting the patterns accordingly).
One more thought here: I realized that I can support a day count argument, if you can manage sneaking an additional table into the db somewhere. The table would look something like this:
create Table DayPatterns (Days int, Pattern varchar(13) )
and the data in the table would look like this:
1 1%
1 %2%
1 %3%
...
2 1,2%
2 %2,3%
2 %3,4%
2 %4,5%
...
2 1%7
...
3 1,2,3%
3 %2,3,4%
...
3 1%6,7
3 1,2%7
...
7 1,2,3,4,5,6,7
Hopefully you get the idea on how to fill that out. With that table in hand, you can JOIN against the table with a query like this:
INNER JOIN DayPatterns p ON p.Days = @ConsecutiveDays AND col LIKE p.Pattern
The key to making that work (aside from needing to be able to create that table somewhere) is also doing a GROUP BY on the correct columns. Otherwise, you'll end up with the same problem you have right now, where matching multiple possible consecutive day patterns will duplicate your results.
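Put together, the join and the GROUP BY might look like the sketch below. It assumes SQL Server syntax, and the names Schedules, Id and DayList are hypothetical stand-ins for the third-party table and its comma-delimited column; only DayPatterns, Days and Pattern come from the description above.

```sql
-- Sketch (T-SQL). Hypothetical names: Schedules, Id, DayList.
DECLARE @ConsecutiveDays int = 3;

SELECT s.Id, s.DayList
FROM Schedules AS s
INNER JOIN DayPatterns AS p
    ON p.Days = @ConsecutiveDays     -- only patterns for the requested run length
   AND s.DayList LIKE p.Pattern      -- the row matches at least one such pattern
GROUP BY s.Id, s.DayList;            -- collapse rows that matched several patterns
```

The GROUP BY is what keeps a row such as 2,3,4,5,6 from being returned three times: it matches three of the length-3 patterns, but the grouped result holds it once.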
Finally, of course you know that most any schema that includes csv data is broken, but since you can't seem to fix this, hopefully one of these ideas will help.

Finding a Combination (Box) in SQL

I am trying to find whether a set of numbers are contained in another set of numbers.
ID NumberSet Result
-- --------- ------
1 1457 5741
2 4187 7148
3 6324 1345
So for this dataset I would return ID 1 & 2. All the numbers from the NumberSet must be contained within the Result.
Any suggestions?

This actually isn't that hard. Just look for the reverse . . . is there a case where a number from NumberSet is not in Result?
For the first row, you could manually create a like expression for finding a result that has a character other than "1457":
where Result like '%[^1457]%'
What you want is:
where Result not like '%[^1457]%'
Now, let's generalize:
where Result not like '%[^'+NumberSet+']%'
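As a complete statement, the generalized filter might look like the sketch below. It assumes SQL Server (the [^...] character class in LIKE is not standard SQL), character-typed columns so that + concatenates, and a hypothetical table name NumberSets.

```sql
-- Sketch (T-SQL). Hypothetical table name: NumberSets.
-- Assumes NumberSet and Result are varchar; if they are numeric,
-- cast them to varchar first.
SELECT ID, NumberSet, Result
FROM NumberSets
WHERE Result NOT LIKE '%[^' + NumberSet + ']%';  -- no character of Result lies outside NumberSet
```

On the three sample rows this keeps IDs 1 and 2 and drops ID 3, because the digit 1 in 1345 does not appear in 6324. Note that the filter tests each character of Result against NumberSet; if the two strings can differ in length or contain repeated digits, a second condition in the opposite direction may be needed.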

This doesn't look easy, but this blog post may help: http://wikiprogrammer.wordpress.com/2011/10/17/find-out-anagram-using-sql/
