BigQuery: CASE WHEN expression to count from the same column under different conditions

I have a table with 2 columns as below:
Col 1 | col_stats
Field 1 | open
Field 2 | close
Field 1 | close
Field 1 | open
I want the output to be:
Col1 | cnt_open | Cnt_close
Field 1 | 2 | 1
Field 2 | 0 | 1
I wrote this query:
select col1, count(case when col_stats= 'open' then 1 else 0 END) cnt_open,
count (case when col_stats= 'close' then 1 else 0 END ) cnt_close
from `project.dataset.tablename`
group by col1
Resultant output from above query is incorrect:
Col1 | cnt_open | Cnt_close
Field 1 | 2 | 2
Field 2 | 1 | 1
Can somebody explain why count returns an incorrect result even though the case condition is applied?

Use countif():
select col1, countif(col_stats = 'open') as num_opens, countif(col_stats = 'close') as num_closes
from t
group by col1;
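In SQL, count() counts the number of non-NULL values. Both 1 and 0 are non-NULL, so your case expression counts every row in the group regardless of the condition. Your code would work with sum() in place of count(). But countif() is simpler and clearer.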

Use null instead of 0:
select col1, count(case when col_stats= 'open' then 1 else null END) cnt_open,
count (case when col_stats= 'close' then 1 else null END ) cnt_close
from `project.dataset.tablename`
group by col1
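The difference between the three variants can be checked with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made only so the example runs locally; countif() is BigQuery-specific and is not shown). The table name t and the column aliases are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col_stats TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Field 1", "open"), ("Field 2", "close"),
     ("Field 1", "close"), ("Field 1", "open")],
)

query = """
SELECT col1,
       -- ELSE 0 is non-NULL, so COUNT sees a value on every row
       COUNT(CASE WHEN col_stats = 'open' THEN 1 ELSE 0 END)    AS count_else_0,
       -- ELSE NULL is skipped by COUNT
       COUNT(CASE WHEN col_stats = 'open' THEN 1 ELSE NULL END) AS count_else_null,
       -- SUM adds the 1s and 0s, so ELSE 0 is harmless here
       SUM(CASE WHEN col_stats = 'open' THEN 1 ELSE 0 END)      AS sum_else_0
FROM t
GROUP BY col1
ORDER BY col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows from the question, count_else_0 equals the number of rows in each group (3 for Field 1, 1 for Field 2), while count_else_null and sum_else_0 both give the number of 'open' rows (2 and 0).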

Related

Oracle SQL: Dividing Counts into unique and non unique columns

I have a table that looks like this:
|FileID| File Info |
| ---- | ------------ |
| 1 | X |
| 1 | Y |
| 2 | Y |
| 2 | Z |
| 2 | A |
I want to aggregate by FileID and split the File Info column into 2 separate count columns. I want 1 column to have the count of the Unique File Info and the other to be a count of non-Unique file info.
The result would ideally look like this:
|FileID| Count(Unique)| Count(Non-unique) |
| ---- | ------------ | ----------------- |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
where the non-unique count is the 'Y' and the unique count is from the 'X' and 'Z, A' for FileID 1 and 2 respectively.
I'm looking for ways to gauge uniqueness between files rather than within.
Use COUNT() window function in every row to check if FileInfo is unique and then use conditional aggregation to get the results that you want:
SELECT FileID,
COUNT(CASE WHEN counter = 1 THEN 1 END) count_unique,
COUNT(CASE WHEN counter > 1 THEN 1 END) count_non_unique
FROM (
SELECT t.*, COUNT(*) OVER (PARTITION BY t.FileInfo) counter
FROM tablename t
) t
GROUP BY FileID;
First you select the "Non Unique" rows from the table
SELECT FileInfo
FROM sometableyoudidnotname
GROUP BY FileInfo
HAVING COUNT(*) > 1
Now that you know which ones are unique and non unique you can left join to that table to get the "status" and count it up.
SELECT base.FileID,
SUM(CASE WHEN u.FileInfo is NOT NULL THEN 1 ELSE 0 END) as count_non_unique,
SUM(CASE WHEN u.FileInfo is NULL THEN 1 ELSE 0 END) as count_unique
FROM sometableyoudidnotname base
LEFT JOIN (
SELECT FileInfo
FROM sometableyoudidnotname
GROUP BY FileInfo
HAVING COUNT(*) > 1
) u ON base.FileInfo = u.FileInfo
GROUP BY base.FileID
Have a derived table that counts occurrences of each fileinfo value. JOIN and GROUP BY:
select t1.FileID,
sum(case when t2.ficount = 1 then 1 else 0 end) count_unique,
sum(case when t2.ficount > 1 then 1 else 0 end) count_non_unique
from tablename t1
join
(
select fileinfo, count(*) ficount
from tablename
group by fileinfo
) t2
on t1.fileinfo = t2.fileinfo
group by t1.FileID
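As a quick check of the derived-table approach, here is a minimal sketch run against the sample data from the question. It uses Python's built-in sqlite3 module rather than Oracle (an assumption for local testing), and the table name files and the aliases count_unique and count_non_unique are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (FileID INTEGER, FileInfo TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(1, "X"), (1, "Y"), (2, "Y"), (2, "Z"), (2, "A")],
)

query = """
SELECT t1.FileID,
       SUM(CASE WHEN t2.ficount = 1 THEN 1 ELSE 0 END) AS count_unique,
       SUM(CASE WHEN t2.ficount > 1 THEN 1 ELSE 0 END) AS count_non_unique
FROM files t1
JOIN (
    -- how many times each FileInfo value occurs across all files
    SELECT FileInfo, COUNT(*) AS ficount
    FROM files
    GROUP BY FileInfo
) t2 ON t1.FileInfo = t2.FileInfo
GROUP BY t1.FileID
ORDER BY t1.FileID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only 'Y' appears in more than one file, so each FileID gets a non-unique count of 1, and the remaining values are counted as unique.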

How to do multiple actions in case when then in sql?

I want to do something like this:
select sum(case when ttt.ind = 1 then 1 else 0 end) from ttt
I want to add a column to this query, called myresult which indicates if the value of ttt.istry is equal to 1.
Maybe like:
select
sum(case ttt.ind = 1 then 1, ttt.istry as myresult else 0 end)
from ttt
of course I got an error...
How would I do that?
My data is:
ttt.ind | ttt.istry
--------+----------
1 | 0
0 | 1
1 | 1
and so on...
Expected result:
ttt.ind | ttt.istry | myresult | sum
--------+-----------+----------+------
1 | 0 | 0 | 2
0 | 1 | null | 2
1 | 1 | 1 | 2
You don't say which database so I'll assume it's a modern one. You can use a window function and a CASE clause to do this.
For example:
select
ind,
istry,
case when ind = 1 then istry end as myresult,
sum(ind) over() as sum
from ttt
Your logic is a bit hard to follow, but your result set suggests:
select ind, istry,
(case when istry = 1 then 1
when sum(istry) over (partition by ind) = 1 then 0
end),
sum(ttt.ind) over () as sum_ind
from ttt;
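The first answer's query (a CASE expression plus an unpartitioned window SUM) can be tried with a minimal sketch. It assumes Python's built-in sqlite3 module with a SQLite build of 3.25 or newer, since older builds lack window functions; the alias sum_ind is illustrative.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ttt (ind INTEGER, istry INTEGER)")
conn.executemany("INSERT INTO ttt VALUES (?, ?)", [(1, 0), (0, 1), (1, 1)])

query = """
SELECT ind,
       istry,
       -- no ELSE branch, so rows with ind <> 1 yield NULL
       CASE WHEN ind = 1 THEN istry END AS myresult,
       -- OVER () makes the total available on every row without GROUP BY
       SUM(ind) OVER () AS sum_ind
FROM ttt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each input row is kept, myresult is NULL (None in Python) where ind is 0, and sum_ind repeats the overall total on every row, which matches the expected result in the question.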

How can I separate one column into multiple columns depending on their value when selecting it?

I have a table called assignment_answers, which has the following attributes:
assignment_answers_id, question_id and order. The order is an attribute, which can take a value from 0 to 9.
I would like for every value that it can take to make it be displayed in a different column. For instance when the order has value 0, then I want it to be displayed in a column called number0. When it has value 1 I want it to be displayed in a column called number1.
Could someone help me with that? So far I have tried this but it does not work:
SELECT (CASE WHEN assessment_answers.order = 0
THEN(
select aq.order as number0
from assessment_answers)
END)
(CASE WHEN assessment_answers.order = 1
THEN(
select aq.order as number1
from assessment_answers)
END)
FROM assessment_answers
I get an error saying:
ERROR: syntax error at or near "("
LINE 6: (CASE WHEN assessment_questions."order" = 1
SAMPLE DATA
assignment_answers_id question_id order
1 1 0
2 1 0
3 2 1
desired output:
assignment_answers_id question_id order0 order1
1 1 0 null
2 1 0 null
3 2 null 1
You can try to use normal CASE WHEN
Query 1:
SELECT assignment_answers_id,
question_id,
(CASE WHEN "order" = 0 THEN "order" END) order0,
(CASE WHEN "order" = 1 THEN "order" END) order1
FROM assessment_answers
Results:
| assignment_answers_id | question_id | order0 | order1 |
|-----------------------|-------------|--------|--------|
| 1 | 1 | 0 | (null) |
| 2 | 1 | 0 | (null) |
| 3 | 2 | (null) | 1 |
Does this do what you want?
select (aa.order = 0)::int as order_0,
(aa.order = 1)::int as order_1,
(aa.order = 2)::int as order_2,
. . .
from assessment_answers aa;
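A minimal sketch of the CASE-per-column approach on the sample data follows. It uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption for local testing). Because order is a reserved word, the column is written as "order" in double quotes throughout.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE assessment_answers ('
    'assignment_answers_id INTEGER, question_id INTEGER, "order" INTEGER)'
)
conn.executemany(
    "INSERT INTO assessment_answers VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 2, 1)],
)

query = '''
SELECT assignment_answers_id,
       question_id,
       -- each CASE yields the value only when it matches, otherwise NULL
       CASE WHEN "order" = 0 THEN "order" END AS order0,
       CASE WHEN "order" = 1 THEN "order" END AS order1
FROM assessment_answers
ORDER BY assignment_answers_id
'''
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One CASE expression per target column is needed, so covering values 0 to 9 means ten such expressions.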

Oracle - Different count clause on the same line

I am looking for a query that returns, on a single result row, two values computed with different conditions.
For example, let's say that I have this table:
ID |VAL
----------
0 | 1
1 | 0
2 | 0
3 | 1
4 | 0
5 | 0
In a single query, I would like to select the number of rows having val = 1, the total number of rows, and if possible the ratio of one count to the other, which would give a result set like this:
nb_lines | nb_val_1 | ratio
---------------------------
6 | 2 | 0.5
I tried something like:
select count(t1.ID), (select count t2.ID
from table t2 where t2.val = 1
)
FROM table t1
But this is not valid syntax (and it wouldn't give me the ratio anyway). How can I write this query?
Try this query which uses CASE to count only those rows we need.
SELECT nb_lines, nb_val_1, nb_val_0, nb_val_1/nb_val_0 AS ratio FROM
(SELECT COUNT (t1.ID) nb_lines,
COUNT (CASE
WHEN t1.val = 1
THEN 1
ELSE NULL
END) nb_val_1,
COUNT (CASE
WHEN t1.val = 0
THEN 1
ELSE NULL
END) nb_val_0
FROM tabless t1);
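The conditional counts and the ratio can be checked with a minimal sketch on the sample data. It uses Python's built-in sqlite3 module rather than Oracle (an assumption for local testing). One difference to be aware of: SQLite divides two integers as integers, so the sketch multiplies by 1.0 to get a fractional ratio, whereas Oracle's NUMBER division already returns 0.5.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabless (ID INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO tabless VALUES (?, ?)",
    [(0, 1), (1, 0), (2, 0), (3, 1), (4, 0), (5, 0)],
)

query = """
SELECT nb_lines, nb_val_1, nb_val_0,
       -- 1.0 * forces floating-point division in SQLite
       1.0 * nb_val_1 / nb_val_0 AS ratio
FROM (
    SELECT COUNT(ID)                           AS nb_lines,
           COUNT(CASE WHEN val = 1 THEN 1 END) AS nb_val_1,
           COUNT(CASE WHEN val = 0 THEN 1 END) AS nb_val_0
    FROM tabless
)
"""
row = conn.execute(query).fetchone()
print(row)
```

The ratio here is nb_val_1 divided by nb_val_0, as in the answer; if no row has val = 0 the division needs guarding, for example with NULLIF(nb_val_0, 0).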

SELECT with calculated column that is dependent upon a correlation

I don't do a lot of SQL, and most of the time I'm doing CRUD operations. Occasionally I'll get something a bit more complicated. So this may be a newbie question, but I've been trying to figure it out for hours without success.
So, Imagine the following table structure:
> | ID | Col1 | Col2 | Col3 | .. | Col8 |
I want to select ID and a calculated column. The calculated column has a range of 0 - 8 and it contains the number of matches to the query. I also want to restrict the result set to only include rows that have a certain number of matches.
So, from this sample data:
> | 1 | 'a' | 'b' | 1 | 2 |
> | 2 | 'b' | 'c' | 1 | 2 |
> | 3 | 'b' | 'c' | 4 | 5 |
> | 4 | 'x' | 'x' | 9 | 9 |
I want to query on Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5 where the calculated result > 1 and have the result set look like:
> | ID | Cal |
> | 1 | 2 |
> | 2 | 2 |
> | 3 | 2 |
I'm using T-SQL and SQL Server 2005, if it matters, and I can't change the DB Schema.
I'd also prefer to keep it as one self-contained query and not have to create a stored procedure or temporary table.
This answer will work with SQL 2005, using a CTE to clean up the derived table a little.
WITH Matches AS
(
SELECT ID, CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END +
CASE WHEN Col2 = 'c' THEN 1 ELSE 0 END +
CASE WHEN Col3 = 1 THEN 1 ELSE 0 END +
CASE WHEN Col4 = 5 THEN 1 ELSE 0 END AS Result
FROM Table1
WHERE Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5
)
SELECT ID, Result
FROM Matches
WHERE Result > 1
Here's a solution that relies on boolean comparisons evaluating to the integers 1 or 0, which holds in MySQL but not in SQL Server:
SELECT * FROM (
SELECT ID, (Col1='a') + (Col2='c') + (Col3=1) + (Col4=5) AS calculated
FROM MyTable
) q
WHERE calculated > 1;
Note that you have to parenthesize the boolean comparisons because + has higher precedence than =. Also, you have to put it all in a subquery because you normally can't use a column alias in a WHERE clause of the same query.
It might seem like you should also use a WHERE clause in the subquery to restrict its rows, but in all likelihood you're going to end up with a full table scan anyway so it's probably not a big win. On the other hand, if you expect that such a restriction would greatly reduce the number of rows in the subquery result, then it'd be worthwhile.
Re Quassnoi's comment, if you can't treat boolean expressions as integer values, there should be a way to map boolean conditions to integers, even if it's a bit verbose. For example:
SELECT * FROM (
SELECT ID,
CASE WHEN Col1='a' THEN 1 ELSE 0 END
+ CASE WHEN Col2='c' THEN 1 ELSE 0 END
+ CASE WHEN Col3=1 THEN 1 ELSE 0 END
+ CASE WHEN Col4=5 THEN 1 ELSE 0 END AS calculated
FROM MyTable
) q
WHERE calculated > 1;
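The CASE-sum version can be tried on the sample data with a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server 2005 (an assumption for local testing); the table name MyTable follows the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, Col1 TEXT, Col2 TEXT, "
    "Col3 INTEGER, Col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [(1, "a", "b", 1, 2), (2, "b", "c", 1, 2),
     (3, "b", "c", 4, 5), (4, "x", "x", 9, 9)],
)

query = """
SELECT ID, calculated
FROM (
    -- each CASE contributes 1 when its condition matches, otherwise 0
    SELECT ID,
           CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END
         + CASE WHEN Col2 = 'c' THEN 1 ELSE 0 END
         + CASE WHEN Col3 = 1   THEN 1 ELSE 0 END
         + CASE WHEN Col4 = 5   THEN 1 ELSE 0 END AS calculated
    FROM MyTable
) q
WHERE calculated > 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows 1 to 3 each match two of the four conditions and are returned; row 4 matches none and is filtered out by the outer WHERE.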
This query is more index friendly:
SELECT id, SUM(match)
FROM (
SELECT id, 1 AS match
FROM mytable
WHERE col1 = 'a'
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col2 = 'c'
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col3 = 1
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col4 = 5
) q
GROUP BY
id
HAVING SUM(match) > 1
This will only be efficient if all the columns you are searching for are, first, indexed and, second, have high cardinality (many distinct values).
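A minimal sketch of the UNION ALL variant on the same sample data shows that it returns the same rows. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption for local testing), and it says nothing about index usage or performance. The alias hit replaces match here purely to avoid any keyword clash in the stand-in database.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, col1 TEXT, col2 TEXT, "
    "col3 INTEGER, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, "a", "b", 1, 2), (2, "b", "c", 1, 2),
     (3, "b", "c", 4, 5), (4, "x", "x", 9, 9)],
)

query = """
SELECT id, SUM(hit) AS matches
FROM (
    -- one branch per condition; a row appears once per condition it satisfies
    SELECT id, 1 AS hit FROM mytable WHERE col1 = 'a'
    UNION ALL
    SELECT id, 1 AS hit FROM mytable WHERE col2 = 'c'
    UNION ALL
    SELECT id, 1 AS hit FROM mytable WHERE col3 = 1
    UNION ALL
    SELECT id, 1 AS hit FROM mytable WHERE col4 = 5
) q
GROUP BY id
HAVING SUM(hit) > 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

UNION ALL (not UNION) matters: plain UNION would collapse the identical (id, 1) rows and every id would sum to 1.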
See this article in my blog for performance details:
Matching 3 of 4