GROUPING multiple LIKE string - sql

Data:
2015478 warning occurred at 20201403021545
2020179 error occurred at 20201303021545
2025480 timeout occurred at 20201203021545
2025481 timeout occurred at 20201103021545
2020482 error occurred at 20201473021545
2020157 timeout occurred at 20201403781545
2020154 warning occurred at 20201407851545
2027845 warning occurred at 20201403458745
In the data above, there are three kinds of strings I am interested in: warning, error, and timeout.
Can a single query group by these strings and return the number of occurrences of each, as below?
Output:
timeout 3
warning 3
error 2
I know I can write separate queries to find each count individually, but I am interested in a single query.
Thanks

You can use filtered aggregation for that:
select count(*) filter (where the_column like '%timeout%') as timeout_count,
count(*) filter (where the_column like '%error%') as error_count,
count(*) filter (where the_column like '%warning%') as warning_count
from the_table;
This returns the counts in three columns rather than three rows as you indicated.
If you do need them in separate rows, you can use regexp_replace() to clean up the string, then group by the result:
select regexp_replace(the_column, '(.*)(warning|error|timeout)(.*)', '\2') as what,
count(*)
from the_table
group by what;
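If your database has neither the FILTER clause nor regexp_replace(), the same two results can be obtained with plain CASE expressions. Below is a minimal sketch that runs the question's sample data through SQLite via Python's sqlite3 module; SQLite is only an assumption here to make the example runnable, and the_table / the_column are the placeholder names used above.

```python
import sqlite3

ROWS = [
    "2015478 warning occurred at 20201403021545",
    "2020179 error occurred at 20201303021545",
    "2025480 timeout occurred at 20201203021545",
    "2025481 timeout occurred at 20201103021545",
    "2020482 error occurred at 20201473021545",
    "2020157 timeout occurred at 20201403781545",
    "2020154 warning occurred at 20201407851545",
    "2027845 warning occurred at 20201403458745",
]

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (the_column TEXT)")
conn.executemany("INSERT INTO the_table VALUES (?)", [(r,) for r in ROWS])

# One row, three columns: SUM(CASE ...) is a portable spelling of
# COUNT(*) FILTER (WHERE ...).
as_columns = conn.execute("""
    SELECT SUM(CASE WHEN the_column LIKE '%timeout%' THEN 1 ELSE 0 END) AS timeout_count,
           SUM(CASE WHEN the_column LIKE '%error%'   THEN 1 ELSE 0 END) AS error_count,
           SUM(CASE WHEN the_column LIKE '%warning%' THEN 1 ELSE 0 END) AS warning_count
    FROM the_table
""").fetchone()

# One row per keyword: label each line with CASE, then group by the label.
as_rows = conn.execute("""
    SELECT CASE
             WHEN the_column LIKE '%timeout%' THEN 'timeout'
             WHEN the_column LIKE '%error%'   THEN 'error'
             WHEN the_column LIKE '%warning%' THEN 'warning'
           END AS what,
           COUNT(*) AS n
    FROM the_table
    GROUP BY what
    ORDER BY n DESC, what
""").fetchall()

print(as_columns)  # (3, 2, 3)
print(as_rows)     # [('timeout', 3), ('warning', 3), ('error', 2)]
```

Note that the CASE version assigns each line to the first keyword that matches, so a line containing two keywords is counted once, whereas the column version counts it under both.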

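You can use the query below, which avoids hard-coding the values: it uses position() (equivalent to STRPOS) to extract the text between the first and the last space, for example 'warning occurred at', and groups by that: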
select val, count(1) from
(select substring(column_name ,position(' ' in (column_name))+1,
length(column_name) - position(reverse(' ') in reverse(column_name)) -
position(' ' in (column_name))) as val from matching) qry
group by val; -- Provide the proper column name

If you want this on separate rows you can also use a lateral join:
select which, count(*)
from t cross join lateral
(values (case when col like '%error%' then 'error' end),
(case when col like '%warning%' then 'warning' end),
(case when col like '%timeout%' then 'timeout' end)
) v(which)
where which is not null
group by which;
On the other hand, if you simply want the second word -- but don't want to hardcode the values -- then you can use:
select split_part(col, ' ', 2) as which, count(*)
from t
group by which;
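split_part() is not available everywhere. As a rough sketch, the second word can also be cut out with substr() and instr(), shown here on SQLite through Python's sqlite3 module. SQLite is an assumption made only so the example runs, and the query assumes every line has at least three space-separated words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [
    ("2015478 warning occurred at 20201403021545",),
    ("2020179 error occurred at 20201303021545",),
    ("2025480 timeout occurred at 20201203021545",),
    ("2020482 error occurred at 20201473021545",),
])

# Inner query: drop everything up to and including the first space.
# Outer query: keep what precedes the next space, i.e. the second word.
second_word_counts = conn.execute("""
    SELECT substr(rest, 1, instr(rest, ' ') - 1) AS which, COUNT(*) AS n
    FROM (SELECT substr(col, instr(col, ' ') + 1) AS rest FROM t) AS s
    GROUP BY which
    ORDER BY which
""").fetchall()

print(second_word_counts)  # [('error', 2), ('timeout', 1), ('warning', 1)]
```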

Related

How do I select columns based on a string pattern in BigQuery

I have a table in BigQuery with hundreds of columns, and it just happens that I want to select all of them except for those that begin with an underscore. I know how to write a query that selects the columns beginning with an underscore using the INFORMATION_SCHEMA.COLUMNS table, but I can't figure out how to use that query to select the columns I want. I know BigQuery has EXCEPT, but I want to avoid writing out each column that begins with an underscore, and I can't seem to pass it a subquery or even something like a._*.
Consider the approach below:
execute immediate (select '''
select * except(''' || string_agg(col) || ''') from your_table
'''
from (
select col
from (select * from your_table limit 1) t,
unnest([struct(translate(to_json_string(t), '{}"', '') as kvs)]),
unnest(split(kvs)) kv,
unnest([struct(split(kv, ':')[offset(0)] as col)])
where starts_with(col, '_')
));
Applied to a table whose only underscore-prefixed columns are _c and _e, it generates and runs the statement below:
select * except(_c,_e) from your_table
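If you would rather build the statement on the client side, the string assembly itself is simple. Here is a small sketch in Python; build_select_except is a hypothetical helper, and the column list (a, b, _c, d, _e) is made up for illustration. In practice you would fetch the names from INFORMATION_SCHEMA.COLUMNS first. It assumes column names need no quoting.

```python
def build_select_except(table, columns, prefix="_"):
    """Return a SELECT that skips every column starting with `prefix`."""
    excluded = [c for c in columns if c.startswith(prefix)]
    if not excluded:
        # Nothing to exclude, so fall back to a plain SELECT *.
        return f"select * from {table}"
    return f"select * except({','.join(excluded)}) from {table}"


statement = build_select_except("your_table", ["a", "b", "_c", "d", "_e"])
print(statement)  # select * except(_c,_e) from your_table
```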

Avoiding aggregation when selecting values from tables

I have the following code which selects value from table2 when 'some string' occurs more than once in 1990
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine, but I am attempting to write a query that produces a similar result without using aggregation. I just need to select the values that have more than one matching c.string, rather than counting them and selecting the count as well. I thought about searching for pairs of 'some string' occurring in 1990 for a value but am unsure how to do this. Pointing me in the right direction would be appreciated! I am struggling to find any documentation on this. Thank you!
Use the window function ROW_NUMBER() to number the rows within each table2.value, and the window function FIRST_VALUE(), partitioned by value, to get the largest row number for each table2.value. DISTINCT then removes the duplicates:
select distinct value, first_value(rn) over (partition by value order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;
To check for duplicates without aggregation, WHERE EXISTS is a good starting point. You could start by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.
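To make the EXISTS idea concrete, here is one possible sketch, run on SQLite through Python's sqlite3 module. Several things are assumptions for illustration: the sample rows, the use of SQLite, and above all an id column on table1 (not in the original schema) that lets the query tell two matching rows apart. It also assumes each table1 row joins to at most one row in table2 and table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY,  -- hypothetical key column
                         value2 INTEGER, value3 INTEGER, string TEXT);
    CREATE TABLE table2 (value TEXT, value_2 INTEGER);
    CREATE TABLE table3 (value_3 INTEGER, value4 INTEGER);
    INSERT INTO table2 VALUES ('A', 1), ('B', 2), ('C', 3);
    INSERT INTO table3 VALUES (10, 1990), (20, 1991);
    INSERT INTO table1 (value2, value3, string) VALUES
        (1, 10, 'Some string'),   -- A, 1990
        (1, 10, 'Some string'),   -- A, 1990 (second occurrence)
        (2, 10, 'Some string'),   -- B, 1990
        (2, 20, 'Some string'),   -- B, 1991 (wrong year)
        (3, 10, 'Other string'),  -- C, 1990 (wrong string)
        (3, 10, 'Some string');   -- C, 1990
""")

# A value qualifies if some *other* table1 row matches the same conditions
# for the same value. No COUNT, GROUP BY or HAVING is involved.
values_with_pairs = conn.execute("""
    SELECT DISTINCT a.value
    FROM table1 c
    JOIN table2 a ON c.value2 = a.value_2
    JOIN table3 o ON c.value3 = o.value_3 AND o.value4 = 1990
    WHERE c.string = 'Some string'
      AND EXISTS (
          SELECT 1
          FROM table1 c2
          JOIN table2 a2 ON c2.value2 = a2.value_2
          JOIN table3 o2 ON c2.value3 = o2.value_3 AND o2.value4 = 1990
          WHERE c2.string = 'Some string'
            AND a2.value = a.value
            AND c2.id <> c.id
      )
""").fetchall()

print(values_with_pairs)  # [('A',)]
```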

Update count of union

I have this query which is throwing a compilation error at the last ')'. The intellisense says 'Expected AS, ID or QUOTED_ID'.
What I am trying to do is - find the distinct values from the union of a table select and a function select, then get the count and update the column of another table with that value.
UPDATE #referees
SET [TotalKeywordCount] = (select count(*)
from (select Keyword
from [dbo].[RefereeFinderPersonKeyWord] P
where P.p_id=#referees.p_id
union
SELECT ltrim(rtrim(replace(Data, '''', '')))
from [SplitOne] (#keywords, ',')))
Any idea what I am doing wrong?
You need to add a name to the nested query that you use in the FROM of the query that pulls out the value for [TotalKeywordCount]. Below you have the code that assigns to it the name subquery:
UPDATE #referees
SET [TotalKeywordCount] = (select count(*) from (
select Keyword from [dbo].[RefereeFinderPersonKeyWord] P where P.p_id=#referees.p_id
union
SELECT ltrim(rtrim(replace(Data, '''', ''))) from [SplitOne] (#keywords, ',')) subquery )
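For reference, here is the same shape of statement as a runnable sketch on SQLite via Python's sqlite3 module. The table names, columns, and data are simplified stand-ins, and split_keywords is a plain table assumed to hold what the [SplitOne] function would return. SQLite happens to accept a derived table without an alias, so this does not reproduce the SQL Server error; it only shows the aliased form and that UNION removes duplicates before the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_keyword (p_id INTEGER, keyword TEXT);
    CREATE TABLE split_keywords (data TEXT);  -- stand-in for the split function
    CREATE TABLE referees (p_id INTEGER, total_keyword_count INTEGER);
    INSERT INTO person_keyword VALUES (1, 'sql'), (1, 'joins'), (2, 'sql');
    INSERT INTO split_keywords VALUES (' sql '), ('indexes');
    INSERT INTO referees (p_id) VALUES (1), (2);
""")

# The derived table in FROM carries the alias "subquery".
conn.execute("""
    UPDATE referees
    SET total_keyword_count = (
        SELECT COUNT(*)
        FROM (
            SELECT keyword FROM person_keyword AS p WHERE p.p_id = referees.p_id
            UNION
            SELECT trim(data) FROM split_keywords
        ) AS subquery
    )
""")

totals = conn.execute(
    "SELECT p_id, total_keyword_count FROM referees ORDER BY p_id"
).fetchall()
print(totals)  # [(1, 3), (2, 2)]
```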

SUM and GROUP BY a substring in Splice (NoSql)

I am trying to run a query like the one below. The goal is to get the total activity count for every user_key, but because the user_key has a complex structure and I need only the part after the '|' symbol, I had to use a substring function. However, when I try to run the query, I get this error:
SQL Error [42Y36]: Column reference 'USER_KEY' is invalid, or is part of an invalid expression. For a SELECT list with a GROUP BY, the columns and expressions being selected may only contain valid grouping expressions and valid aggregate expressions.
The substring function works OK outside this query. Any workarounds for this problem? Using Splice Machine (NoSql)
SELECT
substr(user_key, instr(user_key,'|') + 1) AS new_user_key,
SUM(
CAST(
activity_count AS INTEGER
)
) AS Total
FROM
schema_name.table_name
GROUP BY
substr(user_key, instr(user_key,'|') + 1)
Your GROUP BY column needs to match the SELECT
SELECT
substr(user_key, instr(user_key,'|') + 1) AS new_user_key,
SUM(
CAST(
activity_count AS INTEGER
)
) AS Total
FROM
schema_name.table_name
GROUP BY
substr(user_key, instr(user_key,'|') + 1) AS new_user_key
I found the answer myself. I used a table subquery:
SELECT new_table.new_user_key, sum(new_table.total)
from
(
SELECT
substr(user_key, instr(user_key,'|') + 1) AS new_user_key,
CAST(activity_count AS INTEGER) AS Total
FROM schema_name.table_name
)
as new_table
GROUP BY
new_table.new_user_key
Hopefully someone finds this post useful and it saves them some time.
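For anyone who wants to try the derived-table pattern quickly, here is a small sketch on SQLite through Python's sqlite3 module. SQLite is used only because it ships with Python and also has substr() and instr(); the sample keys and counts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user_key TEXT, activity_count TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [
    ("web|alice", "3"),
    ("app|alice", "4"),
    ("web|bob", "5"),
])

# Compute the substring once in the inner query, then group by its alias
# in the outer query, so GROUP BY never has to repeat the expression.
totals = conn.execute("""
    SELECT new_table.new_user_key, SUM(new_table.total) AS grand_total
    FROM (
        SELECT substr(user_key, instr(user_key, '|') + 1) AS new_user_key,
               CAST(activity_count AS INTEGER) AS total
        FROM table_name
    ) AS new_table
    GROUP BY new_table.new_user_key
    ORDER BY new_table.new_user_key
""").fetchall()

print(totals)  # [('alice', 7), ('bob', 5)]
```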

subquery Count() - Column must appear in the GROUP BY clause

I just wanted to know why a subquery returned more than one value, so I wrote this query:
SELECT id,
(SELECT Count(tags[i])
FROM generate_subscripts(tags, 1) AS i
WHERE tags[i]='oneway') as oneway_string
FROM planet_osm_ways
WHERE 'oneway' = ANY(tags)
HAVING
(SELECT Count(tags[i])
FROM generate_subscripts(tags, 1) AS i
WHERE tags[i]='oneway') > 1
which should find all occurrences of 'oneway' in the tags array and count them. Instead, it fails with:
[42803] ERROR: column "planet_osm_ways.id" must appear in the GROUP BY clause
or be used in an aggregate function Position: 8
You should change HAVING to WHERE: there are no groups for a HAVING filter to apply to, whereas a WHERE filter is applied to each row. Since the query already has a WHERE clause, add the condition with AND:
SELECT id,
(SELECT Count(tags[i])
FROM generate_subscripts(tags, 1) AS i
WHERE tags[i]='oneway') as oneway_string
FROM planet_osm_ways
WHERE 'oneway' = ANY(tags)
AND
(SELECT Count(tags[i])
FROM generate_subscripts(tags, 1) AS i
WHERE tags[i]='oneway') > 1
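The same idea, a correlated count used as a per-row WHERE filter, can be tried without PostgreSQL arrays. The sketch below uses SQLite through Python's sqlite3 module and models the tags as a separate way_tags table; that schema and the sample rows are assumptions for illustration, not the planet_osm_ways layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ways (id INTEGER PRIMARY KEY);
    CREATE TABLE way_tags (way_id INTEGER, tag TEXT);  -- one row per array element
    INSERT INTO ways VALUES (1), (2), (3);
    INSERT INTO way_tags VALUES
        (1, 'oneway'), (1, 'yes'), (1, 'oneway'),
        (2, 'oneway'), (2, 'yes'),
        (3, 'highway');
""")

# The correlated subquery is evaluated for each row of ways, so it belongs
# in WHERE; no GROUP BY (and therefore no HAVING) is needed.
duplicates = conn.execute("""
    SELECT w.id,
           (SELECT COUNT(*) FROM way_tags t
            WHERE t.way_id = w.id AND t.tag = 'oneway') AS oneway_count
    FROM ways w
    WHERE (SELECT COUNT(*) FROM way_tags t
           WHERE t.way_id = w.id AND t.tag = 'oneway') > 1
""").fetchall()

print(duplicates)  # [(1, 2)]
```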