Equivalent of string contains in google bigquery - sql

I have a table like the one shown below.
I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in PostgreSQL and Google BigQuery.
I tried the queries below, but they don't work:
select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%')
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%')
then 1 else 0 end as aspirin,
from db.Team01.Table_1
SELECT
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to look like the one shown below.

Below is for BigQuery Standard SQL
#standardSQL
SELECT
subject_id,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
Applied to the sample data from your question, the result is:
Row  subject_id  steroids  aspirin
1    1           3         1
2    2           1         1
Note: instead of a long, repetitive chain of LIKE conditions, I am using "LIKE on steroids", which is REGEXP_CONTAINS.
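To make the alternation idea concrete, here is a minimal sketch in Python that runs the same SUM(CASE ...) shape against an in-memory SQLite table. The rows are made up for illustration (the question's sample table is not reproduced here), the table name table_1 is simplified, and since SQLite has no built-in regex engine, the REGEXP operator is backed by a small user-defined function. It only loosely stands in for REGEXP_CONTAINS: BigQuery uses RE2 syntax and Python's re module is a different engine, although a plain a|b|c alternation behaves the same in both.

```python
import re
import sqlite3

# Hypothetical sample rows (subject_id, drug); not the question's real data.
rows = [
    (1, "Cortisol 10mg"),
    (1, "cortisone cream"),
    (1, "Dexamethasone"),
    (1, "Paracetamol 500"),
    (2, "cortisol"),
    (2, "peptide X"),
]


def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" into regexp(pattern, value).
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)

query = """
SELECT subject_id,
       SUM(CASE WHEN LOWER(drug) REGEXP 'cortisol|cortisone|dexamethasone'
                THEN 1 ELSE 0 END) AS steroids,
       SUM(CASE WHEN LOWER(drug) REGEXP 'peptide|paracetamol'
                THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3, 1), (2, 1, 1)]
```

One pattern with | replaces the whole chain of OR'ed LIKE conditions, and no % wildcards are needed because a regex search already matches anywhere in the string.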

In Postgres, I would recommend using the filter clause:
select subject_id,
count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;
In BigQuery, I would recommend countif():
select subject_id,
countif(regexp_contains(lower(drug), r'cortisol|cortisone|dexamethasone')) as steroids,
countif(regexp_contains(lower(drug), r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;
You can use sum(case when . . . end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.

Use conditional aggregation. This is a solution that works across most (if not all) RDBMS:
SELECT
subject_id,
MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 END) steroids,
MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 END) aspirin
FROM db.Team01.Table_1
GROUP BY subject_id
NB: it is unclear why you are using LIKE, since you seem to be matching exact values; I turned the LIKE conditions into equality checks.
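Since the question asks for binary columns, it may help to see how MAX(CASE ...) differs from SUM(CASE ...). The sketch below is Python driving an in-memory SQLite table; the rows and the simplified table name table_1 are assumptions for illustration only. It shows that MAX yields a 0/1 flag, SUM yields a count, and that leaving out ELSE 0 gives NULL (None in Python) for a subject with no matching row.

```python
import sqlite3

# Hypothetical rows (subject_id, drug); subject 2 has no steroid at all.
rows = [(1, "Cortisol"), (1, "cortisone"), (1, "paracetamol"), (2, "ibuprofen")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)

query = """
SELECT subject_id,
       -- 0/1 flag: did the subject have any steroid row?
       MAX(CASE WHEN LOWER(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
                THEN 1 ELSE 0 END) AS steroids_flag,
       -- count: how many steroid rows did the subject have?
       SUM(CASE WHEN LOWER(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
                THEN 1 ELSE 0 END) AS steroids_count,
       -- no ELSE branch: non-matching rows become NULL
       MAX(CASE WHEN LOWER(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
                THEN 1 END) AS steroids_flag_no_else
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""
flags = conn.execute(query).fetchall()
print(flags)  # [(1, 1, 2, 1), (2, 0, 0, None)]
```

If a strict 0/1 column is wanted, keeping the ELSE 0 branch avoids the NULL for subjects without any match.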

You are missing the GROUP BY:
select subject_id,
sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
then 1 else 0 end) as steroids,
sum(case when lower(drug) in ('peptide','paracetamol')
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
Using the LIKE keyword:
select subject_id,
sum(case when lower(drug) like '%cortisol%'
or lower(drug) like '%cortisone%'
or lower(drug) like '%dexamethasone%'
then 1 else 0 end) as steroids,
sum(case when lower(drug) like '%peptide%'
or lower(drug) like '%paracetamol%'
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Another, potentially more intuitive, solution would be to use BigQuery's CONTAINS_SUBSTR function, which returns a boolean result.

I've not used BigQuery, but I have been reading the docs while researching it. I came across this while looking into the impact of choosing a collation at the design stage.
Either I'm wrong, or this is a new feature added since the answers above.
CONTAINS_SUBSTR
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr
Performs a normalized, case-insensitive search to see if a value
exists as a substring in an expression. Returns TRUE if the value
exists, otherwise returns FALSE.
Before values are compared, they are normalized and case folded with
NFKC normalization. Wildcard searches are not supported.
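To get a feel for what "normalized and case folded" means in practice, here is a rough Python approximation of the documented behaviour. This is not BigQuery's implementation, and the function name contains_substr_sketch is made up for this illustration; it simply applies NFKC normalization and case folding to both sides before a plain substring test.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFKC-normalize, then case-fold, so that compatibility forms
    # (e.g. full-width letters) and upper/lower case compare equal.
    return unicodedata.normalize("NFKC", text).casefold()


def contains_substr_sketch(expression: str, search_value: str) -> bool:
    # Plain substring test on the normalized strings; no wildcards.
    return normalize(search_value) in normalize(expression)


print(contains_substr_sketch("Hydrocortisone 1%", "CORTISONE"))  # True
print(contains_substr_sketch("Paracetamol", "cortisol"))         # False
# Full-width Latin letters normalize to their ASCII counterparts under NFKC.
print(contains_substr_sketch("ｃｏｒｔｉｓｏｌ", "Cortisol"))        # True
```

Edge cases may differ from the real function, so treat this only as a way to reason about why no LOWER() call is needed with CONTAINS_SUBSTR.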

Related

SQL count conditioning without the 0 values

I am trying to run a CASE count in SQL, but I want the results without the zeros. How do I do that?
Select ClubName,
ClubType,
Country,
Concat(Count(case when GameResult like 'w%' then 1 else NULL end), ' ','wins'),
Count(Case when GameResult like 'l%' then 1 end) AS Losses
from ClubDim,CountryDim,GamesFact
where ClubDim.CountryID = CountryDim.CountryID
And ClubDim.ClubID = GamesFact.ClubID
GROUP BY ClubName,ClubType,Country,GameResult
Having ClubType = 'Professional'
That's the code, and I am getting a lot of zeros; my goal is to count losses and wins in two separate columns.
It's obviously not possible to test without sample data; however, you should be using sum rather than count here. Your having criteria should be part of the where clause, as you are not filtering on an aggregate. I would also recommend using proper join syntax, which has been standard since 1992! However, I suspect this should give you the expected results.
Select ClubName,
ClubType,
Country,
Concat(sum(case when GameResult like 'w%' then 1 else 0 end), ' ','wins'),
Sum(Case when GameResult like 'l%' then 1 else 0 end) AS Losses
from ClubDim,CountryDim,GamesFact
where ClubDim.CountryID = CountryDim.CountryID and ClubDim.ClubID = GamesFact.ClubID
and ClubType = 'Professional'
GROUP BY ClubName,ClubType,Country,GameResult
The issue is that you have put gameResult in the group by.
More importantly, you need to learn proper, explicit, standard, readable JOIN syntax. Doing a Cartesian product and filtering in the WHERE clause is just awkward. Not using proper syntax is outdated:
select cl.ClubName, cl.ClubType, co.Country,
Concat(Count(case when g.GameResult like 'w%' then 1 else NULL end), ' ', 'wins'),
Count(Case when g.GameResult like 'l%' then 1 end) AS Losses
from ClubDim cl join
CountryDim co
on cl.CountryID = co.CountryID join
GamesFact g
on cl.ClubID = g.ClubID
where cl.ClubType = 'Professional'
group by cl.ClubName, cl.ClubType, co.Country;
In addition:
Note the use of table aliases. These should be abbreviations for the table names.
I did my best to qualify all columns, so it is clear what tables they come from. Of course, your question doesn't provide this information, so this is just guessing.
Filter before aggregating by using a WHERE clause rather than filtering after aggregating with a HAVING. Use HAVING when you want to filter on the results of aggregations (such as COUNT(*)).
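The effect of having GameResult in the GROUP BY can be reproduced with a small sketch. It is written in Python against an in-memory SQLite database; the table and column names come from the question, but the rows (clubs Alpha and Beta, country Spain) are invented for illustration, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CountryDim (CountryID INTEGER, Country TEXT);
CREATE TABLE ClubDim (ClubID INTEGER, ClubName TEXT, ClubType TEXT, CountryID INTEGER);
CREATE TABLE GamesFact (ClubID INTEGER, GameResult TEXT);
-- Hypothetical data: one professional club, one amateur club.
INSERT INTO CountryDim VALUES (1, 'Spain');
INSERT INTO ClubDim VALUES (1, 'Alpha', 'Professional', 1), (2, 'Beta', 'Amateur', 1);
INSERT INTO GamesFact VALUES (1, 'win'), (1, 'win'), (1, 'loss'), (2, 'win');
""")

base = """
SELECT cl.ClubName,
       COUNT(CASE WHEN g.GameResult LIKE 'w%' THEN 1 END) AS Wins,
       COUNT(CASE WHEN g.GameResult LIKE 'l%' THEN 1 END) AS Losses
FROM ClubDim cl
JOIN CountryDim co ON cl.CountryID = co.CountryID
JOIN GamesFact g ON cl.ClubID = g.ClubID
WHERE cl.ClubType = 'Professional'
GROUP BY cl.ClubName, cl.ClubType, co.Country
"""

# Grouping by GameResult as well splits each club into one row per result,
# so every row has a zero in one of the two columns.
split_rows = conn.execute(base + ", g.GameResult ORDER BY g.GameResult").fetchall()
print(split_rows)  # [('Alpha', 0, 1), ('Alpha', 2, 0)]

# Without GameResult in the GROUP BY there is one row per club.
one_row = conn.execute(base).fetchall()
print(one_row)  # [('Alpha', 2, 1)]
```

The amateur club never appears because the WHERE clause removes it before any grouping happens.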
You can try the query below:
SELECT ClubName,
ClubType,
Country,
CASE WHEN Wins > 0 THEN CONCAT(Wins , ' wins') ELSE NULL END WINS,
Losses
FROM (SELECT ClubName,
ClubType,
Country,
Count(case when GameResult like 'w%' then 1 else NULL end) AS Wins,
Count(Case when GameResult like 'l%' then 1 end) AS Losses
FROM ClubDim CLD
JOIN CountryDim COD ON CLD.CountryID = COD.CountryID
JOIN GamesFact GF ON CLD.ClubID = GF.ClubID
where ClubType = 'Professional'
GROUP BY ClubName,ClubType,Country) X;

Oracle SQL: Using COUNT() >1 When Combining two CASE WHEN Statements

I have a line of SQL which produces a count of purchases variable
count(distinct case when t.transaction_sub_type =1 then t.transaction_date end) as COUNTPUR,
I need to modify this so I can produce a 0/1 flag variable, which flags if a customer is a repeat purchaser. So, when a customer's purchases are greater than 1 then flag as 1 else flag as 0.
case when COUNTPUR>1 then 1 else 0 end as FLAG_REPEATPURCHASER
I need to combine these two case statements into one. I have been experimenting with different versions of the syntax, but I can't seem to nail it down. Below is one of the experiments that does not work.
max(case when (count(distinct case when t.transaction_sub_type =1 then t.transaction_date end))>1 then 1 else 0 end) as FLAG_REPEATPURCHASER,
Thanks in advance for your assistance.
You can use a case expression with conditional aggregation:
(case when count(distinct case when t.transaction_sub_type = 1 then t.transaction_date end) > 1
then 1 else 0
end) as FLAG_REPEATPURCHASER
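As a quick check of that expression, here is a minimal Python sketch using an in-memory SQLite table. The customer_id grouping column and the rows are assumptions (the question does not show the full query), and SQLite stands in for Oracle here; the outer CASE simply wraps the COUNT(DISTINCT ...) aggregate.

```python
import sqlite3

# Hypothetical rows: (customer_id, transaction_sub_type, transaction_date).
rows = [
    (1, 1, "2020-01-01"),
    (1, 1, "2020-01-01"),  # same date again, counted once by DISTINCT
    (1, 1, "2020-02-01"),
    (2, 1, "2020-01-05"),
    (2, 2, "2020-01-06"),  # not a purchase (sub type 2), ignored by the CASE
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (customer_id INTEGER, transaction_sub_type INTEGER, transaction_date TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT t.customer_id,
       COUNT(DISTINCT CASE WHEN t.transaction_sub_type = 1
                           THEN t.transaction_date END) AS COUNTPUR,
       CASE WHEN COUNT(DISTINCT CASE WHEN t.transaction_sub_type = 1
                                     THEN t.transaction_date END) > 1
            THEN 1 ELSE 0 END AS FLAG_REPEATPURCHASER
FROM t
GROUP BY t.customer_id
ORDER BY t.customer_id
"""
flags = conn.execute(query).fetchall()
print(flags)  # [(1, 2, 1), (2, 1, 0)]
```

No extra MAX() is needed around the CASE, because the COUNT inside it is already evaluated once per group.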

Counting records that contain letters given in (SQL)

I have to count records containing given letters, for example column A will contain count of records containing 'a' or 'A', and for E it will be count of records containing 'e' or 'E'. Is there any way to do this by only using grouping functions?
I can do this by using subqueries, but we had this task in class before learning subqueries and I have no idea how to do this by grouping.
The result of the code below that I want to achieve by using grouping:
select
(select count(*) from table where lower(name) like '%a%') as a,
(select count(*) from table where lower(name) like '%e%') as e
from dual;
You can use count + case to avoid repeating a full-table query for each column:
select count(case when lower(name) like '%a%' then 1 end) as a
,count(case when lower(name) like '%e%' then 1 end) as e
from Table
The proper expression uses sum():
select sum(case when lower(name) like '%a%' then 1 else 0 end) as num_a,
sum(case when lower(name) like '%e%' then 1 else 0 end) as num_e
from t;
You can also use regular expressions (although they are probably more expensive than like for this purpose):
select sum(case when regexp_like(name, '[aA]') then 1 else 0 end) as num_a,
sum(case when regexp_like(name, '[eE]') then 1 else 0 end) as num_e
from t;
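The COUNT(CASE ...) and SUM(CASE ...) forms above give the same numbers on non-empty input, which a small sketch can confirm. It is Python with an in-memory SQLite table standing in for Oracle, and the names in the table are made up. The one difference it shows is on an empty input: COUNT returns 0, while SUM returns NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
# Hypothetical names for illustration.
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("Anna",), ("Eve",), ("Bob",), ("Jane",), ("Adam",)],
)

counts = conn.execute("""
SELECT COUNT(CASE WHEN LOWER(name) LIKE '%a%' THEN 1 END) AS a,
       COUNT(CASE WHEN LOWER(name) LIKE '%e%' THEN 1 END) AS e,
       SUM(CASE WHEN LOWER(name) LIKE '%a%' THEN 1 ELSE 0 END) AS num_a,
       SUM(CASE WHEN LOWER(name) LIKE '%e%' THEN 1 ELSE 0 END) AS num_e
FROM t
""").fetchone()
print(counts)  # (3, 2, 3, 2)

# With no input rows, COUNT gives 0 but SUM gives NULL.
empty = conn.execute("""
SELECT COUNT(CASE WHEN LOWER(name) LIKE '%a%' THEN 1 END),
       SUM(CASE WHEN LOWER(name) LIKE '%a%' THEN 1 ELSE 0 END)
FROM t
WHERE 1 = 0
""").fetchone()
print(empty)  # (0, None)
```

Either way the table is scanned once, instead of once per scalar subquery.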

Multiple Aggregate functions with a group by clause

I have the following
WorkflowID  FK_UA  DateApprobation
----------  -----  -----------------------
1           3      NULL
2           1      NULL
3           1      NULL
4           2      2013-05-31 09:22:33.000
What I'm looking to do is get a set of aggregate fields.
I want to get Approbated workflows, Non-Approbated workflows, and All Workflows.
I can tell which is which by whether the "DateApprobation" field is null or has a value.
The thing is, I want to be able to group by "FK_UA", and I don't know how to use three aggregate functions (COUNT) with a GROUP BY clause.
I'm looking for a query that can achieve that. I've tried a couple of similar cases I found, and they returned some weird values.
I tried this :
SELECT
FK_UA
,COUNT(WorkflowID) AS TOTAL
,COUNT(CASE when DateApprobation is not null then 1 else 0 end) AS APPROVED
,COUNT(CASE when DateApprobation is null then 1 else 0 end) AS NOT_APPROVED
FROM Workflow
GROUP BY
FK_UA
but it always returns the same value for all three columns!
SELECT
FK_UA,
SUM(CASE WHEN [DateApprobation] IS NOT NULL THEN 1 ELSE 0 END) as [Approbated count],
SUM(CASE WHEN [DateApprobation] IS NULL THEN 1 ELSE 0 END) as [Non-Approbated count],
COUNT(*) as [Total]
FROM YourTable
GROUP BY FK_UA
If I got you right....
A standard SQL solution using COUNT()
You could also have used COUNT(), but make sure you turn the values you don't want to count into NULL, not 0, because COUNT(expression) skips NULL values but still counts a 0:
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 END) AS approved,
COUNT(CASE WHEN DateApprobation IS NULL THEN 1 END) AS not_approved
FROM Workflow
GROUP BY fk_ua
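To see why the attempt in the question returned the same number three times, the two variants can be run side by side. The sketch below is Python with an in-memory SQLite table loaded with the four rows from the question; SQLite is only a stand-in for the asker's database, and storing the date as text is a simplification.

```python
import sqlite3

# The four rows shown in the question: (WorkflowID, FK_UA, DateApprobation).
rows = [
    (1, 3, None),
    (2, 1, None),
    (3, 1, None),
    (4, 2, "2013-05-31 09:22:33.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Workflow (WorkflowID INTEGER, FK_UA INTEGER, DateApprobation TEXT)")
conn.executemany("INSERT INTO Workflow VALUES (?, ?, ?)", rows)

# ELSE 0 produces a non-NULL value for every row, so COUNT counts every row.
broken = conn.execute("""
SELECT FK_UA,
       COUNT(WorkflowID) AS total,
       COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 ELSE 0 END) AS approved,
       COUNT(CASE WHEN DateApprobation IS NULL THEN 1 ELSE 0 END) AS not_approved
FROM Workflow
GROUP BY FK_UA
ORDER BY FK_UA
""").fetchall()
print(broken)  # [(1, 2, 2, 2), (2, 1, 1, 1), (3, 1, 1, 1)]

# Without ELSE, non-matching rows become NULL and COUNT skips them.
fixed = conn.execute("""
SELECT FK_UA,
       COUNT(WorkflowID) AS total,
       COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 END) AS approved,
       COUNT(CASE WHEN DateApprobation IS NULL THEN 1 END) AS not_approved
FROM Workflow
GROUP BY FK_UA
ORDER BY FK_UA
""").fetchall()
print(fixed)  # [(1, 2, 0, 2), (2, 1, 1, 0), (3, 1, 0, 1)]
```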
In fact, you could take this one step further in your case, because you're already counting the NOT NULL values:
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(DateApprobation) AS approved,
COUNT(WorkflowID) - COUNT(DateApprobation) AS not_approved
FROM Workflow
GROUP BY fk_ua
Or alternatively:
SELECT fk_ua, total, approved, total - approved AS not_approved
FROM (
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(DateApprobation) AS approved
FROM Workflow
GROUP BY fk_ua
) t
For large data sets, this might be slightly faster as your database should be able to recognise that there are only 2 distinct COUNT(...) expressions. Most commercial databases do.
A standard SQL solution using FILTER
Some SQL dialects, including e.g. PostgreSQL, implement the standard FILTER clause, which you can use to make things a bit more readable. Your query would then read:
SELECT
fk_ua,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE DateApprobation IS NOT NULL) AS approved,
COUNT(*) FILTER (WHERE DateApprobation IS NULL) AS not_approved
FROM Workflow
GROUP BY fk_ua

SQL statement using case, like, and having

I am using an Oracle based system.
How do you use like, having, and a case statement together?
I am basically trying to list all of the unique individuals that are found in a transactional table that have more than 4 "Class A" transactions, or more than 1 "Class B" transactions. The reason I want to use like is that the only way to differentiate between transaction classes is by using a like statement in the transaction type column.
For example, there are many transaction types, but only "Class A" have '%ABC%' as part of their transaction type, and "Class B" are all the other types that do not have '%ABC%' in their transaction type column.
So again, I want my query to return only the indiv ids that have more than 4 "Class A" Transactions, or 1 "Class B" transaction.
here is what I have so far:
select tt.indiv_id, count(*) from transactiontable tt
group by tt.indiv_id
case when tt.tran_type like '%ABC'
having count(*) > 4
else
having count(*)>1.
I have searched a good bit on the site and I have not found an example using all of these functions together.
select tt.indiv_id,
count(case when tt.tran_type like '%ABC' then 1 end) as ClassACount,
count(case when tt.tran_type not like '%ABC' then 1 end) as ClassBCount
from transactiontable tt
group by tt.indiv_id
having count(case when tt.tran_type like '%ABC' then 1 end) > 4
or count(case when tt.tran_type not like '%ABC' then 1 end) > 1
Try this
select tt.indiv_id, count(*)
from transactiontable tt
group by tt.indiv_id, tt.tran_type
having count(*) > case when tt.tran_type like '%ABC' then 4 else 1 end
Your query is close. You want to keep track of each transaction type separately in the having clause:
select tt.indiv_id, count(*)
from transactiontable tt
group by tt.indiv_id
having sum(case when tt.tran_type like '%ABC%' then 1 else 0 end) > 4 or
sum(case when tt.tran_type not like '%ABC%' then 1 else 0 end) > 1
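A small sketch can confirm how the HAVING clause with two conditional sums behaves. It is Python with an in-memory SQLite table standing in for Oracle; the rows and the tran_type values (XABC1, DEF) are invented, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical transactions: (indiv_id, tran_type).
rows = (
    [(1, "XABC1")] * 5                    # 5 Class A: qualifies (> 4)
    + [(2, "DEF")] * 2                    # 2 Class B: qualifies (> 1)
    + [(3, "XABC1")] * 4 + [(3, "DEF")]   # 4 Class A and 1 Class B: neither threshold met
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactiontable (indiv_id INTEGER, tran_type TEXT)")
conn.executemany("INSERT INTO transactiontable VALUES (?, ?)", rows)

query = """
SELECT tt.indiv_id, COUNT(*) AS n
FROM transactiontable tt
GROUP BY tt.indiv_id
HAVING SUM(CASE WHEN tt.tran_type LIKE '%ABC%' THEN 1 ELSE 0 END) > 4
    OR SUM(CASE WHEN tt.tran_type NOT LIKE '%ABC%' THEN 1 ELSE 0 END) > 1
ORDER BY tt.indiv_id
"""
qualifying = conn.execute(query).fetchall()
print(qualifying)  # [(1, 5), (2, 2)]
```

Individual 3 has five transactions in total but is still excluded, which is the reason each class has to be counted separately instead of comparing a single COUNT(*) to a threshold.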