SQLite Sum and CASE statement - sql

I have a table:
| Col1 | Col2  | Col3 |
| ---- | ----- | ---- |
| 1    | text1 | 1    |
| 98   | text2 | 2    |
| 2    | text3 | 1    |
| 98   | text4 | 3    |
I need to get the sum of Col3 where Col1 = 98 and the sum of Col3 where Col1 <> 98. The desired results are 5 and 2.
My SQL query looks like this:
'SELECT sum(case when Col1 = 98 then Col3 else 0 end) as aShort, ' +
'sum(case when Col1 <> 98 then Col3 else 0 end) as aLong '...
The result I get, however, is 0 and 7.
What am I doing wrong?

I suspect the data type of Col1 is not integer or numeric but a string type such as TEXT or VARCHAR, and the values seem to contain some whitespace. I strongly recommend storing numeric values in a numeric column, but with the current schema you need to cast to integer to get the desired result, using the query below:
SELECT SUM(CASE WHEN CAST(Col1 AS INT) = 98 THEN Col3 ELSE 0 END) AS aShort,
SUM(CASE WHEN CAST(Col1 AS INT) <> 98 THEN Col3 ELSE 0 END) AS aLong
FROM tab
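If that is indeed the problem, here is a minimal sketch that reproduces the symptom (the table definition and the stray whitespace are assumptions, since the original schema isn't shown):
-- Assumed reproduction: Col1 declared as TEXT and values containing stray whitespace
CREATE TABLE tab (Col1 TEXT, Col2 TEXT, Col3 INTEGER);
INSERT INTO tab VALUES ('1 ', 'text1', 1),
                       ('98 ', 'text2', 2),
                       ('2 ', 'text3', 1),
                       ('98 ', 'text4', 3);
-- Without the cast, the text '98 ' never equals 98, so every row falls into the <> branch (0 and 7)
SELECT SUM(CASE WHEN Col1 = 98 THEN Col3 ELSE 0 END) AS aShort,
       SUM(CASE WHEN Col1 <> 98 THEN Col3 ELSE 0 END) AS aLong
FROM tab;
-- With CAST(Col1 AS INT) the numeric prefix is extracted and the result is 5 and 2
SELECT SUM(CASE WHEN CAST(Col1 AS INT) = 98 THEN Col3 ELSE 0 END) AS aShort,
       SUM(CASE WHEN CAST(Col1 AS INT) <> 98 THEN Col3 ELSE 0 END) AS aLong
FROM tab;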

Related

Why and How do ORDER BY CASE Queries Work in SQL Server?

Let's look at the following table:
| col1 | col2 |
| -------- | -------------- |
| 1 | NULL |
| 23 | c |
| 73 | NULL |
| 43 | a |
| 3 | d |
Suppose you wanted to sort it like this:
| col1 | col2 |
| -------- | -------------- |
| 1 | NULL |
| 73 | NULL |
| 43 | a |
| 23 | c |
| 3 | d |
With the following code this would be almost trivial:
SELECT *
FROM dbo.table1
ORDER BY col2;
However, to sort it in the following, non-standard way isn't that easy:
| col1 | col2 |
| -------- | -------------- |
| 43 | a |
| 23 | c |
| 3 | d |
| 1 | NULL |
| 73 | NULL |
I made it work with the following code:
SELECT *
FROM dbo.table1
ORDER BY CASE WHEN col2 IS NULL THEN 1 ELSE 0 END, col2;
Can you explain to me 1) why and 2) how this query works? What bugs me is that the CASE-statement returns either 1 or 0 which means that either ORDER BY 1, col2 or ORDER BY 0, col2 will be executed. But the following code gives me an error:
SELECT *
FROM dbo.table1
ORDER BY 0, col2;
Yet, the overall statement works. Why?
How does this work?
ORDER BY (CASE WHEN col2 IS NULL THEN 1 ELSE 0 END),
col2;
Well, it works exactly as the code specifies. The first key for the ORDER BY takes on the values of 1 and 0 based on col2. The 1 is only when the value is NULL. Because 1 > 0, these are sorted after the non-NULL values. So, all non-NULL values are first and then all NULL values.
How are the non-NULL values sorted? That is where the second key comes in. They are ordered by col2.
Starting with this sample data:
--==== Sample Data
DECLARE @t TABLE (col1 INT, col2 VARCHAR(10));
INSERT @t (col1, col2) VALUES (1, NULL), (23, 'c'), (73, NULL), (43, 'a'), (3, 'd');
Now note these three queries that do the exact same thing.
--==== QUERY1: Note the derived query
SELECT t.col1, t.col2
FROM
(
SELECT t.col1, t.col2, SortBy = CASE WHEN col2 IS NULL THEN 1 ELSE 0 END
FROM @t AS t
) AS t
ORDER BY t.SortBy;
--==== QUERY2: This does the same thing but with less code
SELECT t.col1, t.col2, SortBy = CASE WHEN col2 IS NULL THEN 1 ELSE 0 END
FROM @t AS t
ORDER BY SortBy;
--==== QUERY3: This is QUERY2 simplified
SELECT t.col1, t.col2
FROM @t AS t
ORDER BY CASE WHEN col2 IS NULL THEN 1 ELSE 0 END;
Note that you can also shorten the sort key with IIF. (Beware that the shorthand form CASE col2 WHEN NULL THEN 1 ELSE 0 END does not work here, because NULL never compares equal to NULL, so that expression always returns 0.)
--==== Simplified sort-key example
SELECT t.col1, t.col2
FROM @t AS t
ORDER BY IIF(col2 IS NULL, 1, 0);
Try this:
DECLARE @Table TABLE (col1 int, col2 char(1))
INSERT INTO @Table
VALUES
( 1 , NULL)
, ( 23, 'c' )
, ( 73, NULL)
, ( 43, 'a' )
, ( 3 , 'd' )
;
SELECT *
FROM @Table
ORDER BY ISNULL(col2, CHAR(255))
Common table expressions can be a big help, both for clarifying an issue and for solving it. If you move the CASE expression up into the CTE and then use it to sort, this answers both why and how it works.
WITH Qry1 AS (
SELECT col1,
col2,
CASE WHEN col2 IS NULL THEN 1 ELSE 0 END As SortKey
FROM dbo.table1
)
SELECT *
FROM Qry1
ORDER BY SortKey, col2;
This is the description of the ORDER BY clause in Oracle Database SQL:
ORDER [ SIBLINGS ] BY
{ expr | position | c_alias }
[ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ]
[, { expr | position | c_alias }
[ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ]
]...
We can see that position and expr are depicted as separate paths in the diagram. From that, we can conclude that the 0 and 1 are not treated as a position, because the CASE expression is an expr rather than a position, even though it evaluates to a number that could be read as a position value.
I think this view can be applied to T-SQL too.
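For comparison, Oracle (and other databases that support the clause) lets you request this ordering directly with NULLS LAST, which is what the CASE trick emulates; a sketch against the same two-column table (schema prefix dropped for Oracle):
SELECT *
FROM table1
ORDER BY col2 NULLS LAST;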

How to convert same column different rows to different column same row in SQL?

This is what I want to do.
----input table----
SID | VALUE
1 | v1
1 | v2
1 | v3
1 | v4
1 | v5
2 | s1
2 | s2
2 | s3
---output table----
sid | col1 | col2 | col3 | col4 | col5
1 | v1 | v2 | v3 | v4 | v5
2 | s1 | s2 | s3 | '' | ''
The general pattern of a conditional aggregation:
SELECT
sid,
MAX(CASE WHEN value = 'v1' THEN value END) as col1,
MAX(CASE WHEN value = 'v2' THEN value END) as col2,
...
FROM t
GROUP BY sid
I'll leave it to you to fill in the other columns as practice. :)
I prefer to use the value as the column name, rather than col1, col2, etc.
Also, if you really want empty strings rather than NULLs for those last two columns, you can add ELSE '' to the CASE expression (note: this won't work if you use MIN instead of MAX) or wrap the MAX in COALESCE.
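If the output really has to be positional (col1 through col5 regardless of what the values are called), a sketch of the same pattern keyed on ROW_NUMBER() might look like this (ordering by value is an assumption about how the positions are assigned):
SELECT sid,
       MAX(CASE WHEN rn = 1 THEN value ELSE '' END) AS col1,
       MAX(CASE WHEN rn = 2 THEN value ELSE '' END) AS col2,
       MAX(CASE WHEN rn = 3 THEN value ELSE '' END) AS col3,
       MAX(CASE WHEN rn = 4 THEN value ELSE '' END) AS col4,
       MAX(CASE WHEN rn = 5 THEN value ELSE '' END) AS col5
FROM (
    -- Number the values within each sid, then spread them with conditional aggregation
    SELECT sid, value,
           ROW_NUMBER() OVER (PARTITION BY sid ORDER BY value) AS rn
    FROM t
) AS numbered
GROUP BY sid;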
Learn pivot
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15
Do something like this, pivoting each value onto a position within its sid (numbering the rows by value is an assumption about how the positions are assigned):
select sid, [1] as col1, [2] as col2, [3] as col3, [4] as col4, [5] as col5
from
(
    select sid, value,
           row_number() over (partition by sid order by value) as rn
    from table_name
) src
pivot
(
    max(value)
    for rn in ([1], [2], [3], [4], [5])
) piv;

BigQuery : case when expression to Count from Same column but different conditions

I have a table with 2 columns as below:
Col 1 | col_stats
Field 1 | open
Field 2 | close
Field 1 | close
Field 1 | open
I want the output to be:
Col1 | cnt_open | Cnt_close
Field 1 | 2 | 1
Field 2 | 0 | 1
I wrote a query:
select col1, count(case when col_stats = 'open' then 1 else 0 END) cnt_open,
count(case when col_stats = 'close' then 1 else 0 END) cnt_close
from `project.dataset.tablename`
group by col1
The resulting output from the above query is incorrect:
Col1 | cnt_open | Cnt_close
Field 1 | 2 | 2
Field 2 | 1 | 1
Can somebody let me know why the counts are incorrect even though the CASE condition is applied?
Use countif():
select col1, countif(col_stats = 'open') as num_opens, countif(col_stats = 'close') as num_closes
from t
group by col1;
In SQL count() counts the number of non-NULL values. Your code would work with sum(). But countif() is simpler and clearer.
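For completeness, this is roughly what the sum() version of the original query would look like (same table and column names as in the question):
select col1,
       sum(case when col_stats = 'open' then 1 else 0 end) as cnt_open,
       sum(case when col_stats = 'close' then 1 else 0 end) as cnt_close
from `project.dataset.tablename`
group by col1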
Use null instead of 0:
select col1, count(case when col_stats= 'open' then 1 else null END) cnt_open,
count (case when col_stats= 'close' then 1 else null END ) cnt_close
from `project.dataset.tablename`
group by col1

Is iterating through every row efficient?

Suppose I have the following table T1:
| col1 | col2 |
|------|------|
| 0 | 0 | // A++
| 3 | 123 | // C++
| 0 | 5 | // B++
| 8 | 432 | // C++
| 0 | 4 | // B++
I now need to create a trigger (on INSERT) that analyses every row, increments a counter (see below), and populates the table T2 with the values of the counters:
IF col1 = 0 AND col2 = 0
A++
ELSE IF col1 = 0 AND col2 > 0
B++
ELSE IF col1 > 0
C++
In this case, T2 would look like:
| id | A | B | C |
|----|---|---|---|
| 1 | 1 | 2 | 2 |
My question is more about the design: Should I really iterate through each row, as described HERE, or is there a more efficient way?
Try something like this in the trigger:
;with data as
(
SELECT Sum(CASE WHEN col1 = 0 AND col2 = 0 THEN 1 END) AS a,
Sum(CASE WHEN col1 = 0 AND col2 > 0 THEN 1 END) AS b,
Sum(CASE WHEN col1 > 0 THEN 1 END) AS c
FROM (VALUES (0, 0 ),
(3, 123 ),
(0, 5 ),
(8, 432 ),
(0, 4 ) ) tc ( col1, col2 )
)
UPDATE yt
SET a = dt.a,
b = dt.b,
c = dt.c
FROM yourtable yt
CROSS JOIN data dt
This does not require row-by-row iteration. In the trigger, replace the table-valued constructor with the inserted pseudo-table, as in the sketch below.
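A minimal sketch of what that trigger might look like (the names T1 and T2 and the single counter row are assumptions based on the question):
-- Assumed schema: T1(col1, col2) is the base table, T2(id, A, B, C) holds one counter row
CREATE TRIGGER trg_T1_Count
ON T1
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    WITH data AS
    (
        -- COUNT(CASE ...) counts only the matching rows and yields 0 when none match
        SELECT COUNT(CASE WHEN col1 = 0 AND col2 = 0 THEN 1 END) AS a,
               COUNT(CASE WHEN col1 = 0 AND col2 > 0 THEN 1 END) AS b,
               COUNT(CASE WHEN col1 > 0 THEN 1 END) AS c
        FROM inserted
    )
    UPDATE yt
    SET A = yt.A + dt.a,
        B = yt.B + dt.b,
        C = yt.C + dt.c
    FROM T2 AS yt
    CROSS JOIN data AS dt;
END;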
This is something you should not write into a table (unless there are millions of rows and you need the precomputed counts for performance). You should rather compute this information on the fly, like this:
DECLARE @T1 TABLE(col1 INT,col2 INT);
INSERT INTO @T1(col1,col2) VALUES
(0,0)
,(3,123)
,(0,5)
,(8,432)
,(0,4);
SELECT p.*
FROM
(
SELECT CASE WHEN col1=0 AND col2=0 THEN 'A'
WHEN col1=0 AND col2>0 THEN 'B'
WHEN col1>0 THEN 'C' END AS Category
FROM @T1 AS t
) AS tbl
PIVOT
(
COUNT(Category) FOR Category IN(A,B,C)
) AS p
The result
A B C
1 2 2
I would also suggest adding another branch (an ELSE) to catch invalid data (e.g. negative values).
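That extra branch might look like this (the 'Invalid' label and column are purely illustrative):
SELECT p.*
FROM
(
    SELECT CASE WHEN col1 = 0 AND col2 = 0 THEN 'A'
                WHEN col1 = 0 AND col2 > 0 THEN 'B'
                WHEN col1 > 0 THEN 'C'
                ELSE 'Invalid' END AS Category
    FROM @T1 AS t
) AS tbl
PIVOT
(
    COUNT(Category) FOR Category IN (A, B, C, Invalid)
) AS p;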

SELECT with calculated column that is dependent upon a correlation

I don't do a lot of SQL, and most of the time I'm doing CRUD operations. Occasionally I'll get something a bit more complicated. So this may be a newbie question, but I've been trying to figure it out for hours to no avail.
So, imagine the following table structure:
> | ID | Col1 | Col2 | Col3 | .. | Col8 |
I want to select ID and a calculated column. The calculated column has a range of 0 - 8 and it contains the number of matches to the query. I also want to restrict the result set to only include rows that have a certain number of matches.
So, from this sample data:
> | 1 | 'a' | 'b' | 1 | 2 |
> | 2 | 'b' | 'c' | 1 | 2 |
> | 3 | 'b' | 'c' | 4 | 5 |
> | 4 | 'x' | 'x' | 9 | 9 |
I want to query on Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5 where the calculated result > 1 and have the result set look like:
> | ID | Cal |
> | 1 | 2 |
> | 2 | 2 |
> | 3 | 2 |
I'm using T-SQL and SQL Server 2005, if it matters, and I can't change the DB Schema.
I'd also prefer to keep it as one self-contained query and not have to create a stored procedure or temporary table.
This answer will work with SQL 2005, using a CTE to clean up the derived table a little.
WITH Matches AS
(
SELECT ID, CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END +
CASE WHEN Col2 = 'c' THEN 1 ELSE 0 END +
CASE WHEN Col3 = 1 THEN 1 ELSE 0 END +
CASE WHEN Col4 = 5 THEN 1 ELSE 0 END AS Result
FROM Table1
WHERE Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5
)
SELECT ID, Result
FROM Matches
WHERE Result > 1
Here's a solution that leverages the fact that, in some databases (MySQL, for example), a boolean comparison returns the integer 1 or 0:
SELECT * FROM (
SELECT ID, (Col1='a') + (Col2='c') + (Col3=1) + (Col4=5) AS calculated
FROM MyTable
) q
WHERE calculated > 1;
Note that you have to parenthesize the boolean comparisons because + has higher precedence than =. Also, you have to put it all in a subquery because you normally can't use a column alias in a WHERE clause of the same query.
It might seem like you should also use a WHERE clause in the subquery to restrict its rows, but in all likelihood you're going to end up with a full table scan anyway so it's probably not a big win. On the other hand, if you expect that such a restriction would greatly reduce the number of rows in the subquery result, then it'd be worthwhile.
Re Quassnoi's comment, if you can't treat boolean expressions as integer values, there should be a way to map boolean conditions to integers, even if it's a bit verbose. For example:
SELECT * FROM (
SELECT ID,
CASE WHEN Col1='a' THEN 1 ELSE 0 END
+ CASE WHEN Col2='c' THEN 1 ELSE 0 END
+ CASE WHEN Col3=1 THEN 1 ELSE 0 END
+ CASE WHEN Col4=5 THEN 1 ELSE 0 END AS calculated
FROM MyTable
) q
WHERE calculated > 1;
This query is more index friendly:
SELECT id, SUM(match)
FROM (
SELECT id, 1 AS match
FROM mytable
WHERE col1 = 'a'
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col2 = 'c'
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col3 = 1
UNION ALL
SELECT id, 1 AS match
FROM mytable
WHERE col4 = 5
) q
GROUP BY
id
HAVING SUM(match) > 1
This will only be efficient if all the columns you are searching for are, first, indexed and, second, have high cardinality (many distinct values).
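As a rough sketch, the indexes this assumes would be one per searched column (index names are illustrative):
-- Illustrative single-column indexes that let each branch of the UNION ALL seek
CREATE INDEX ix_mytable_col1 ON mytable (col1);
CREATE INDEX ix_mytable_col2 ON mytable (col2);
CREATE INDEX ix_mytable_col3 ON mytable (col3);
CREATE INDEX ix_mytable_col4 ON mytable (col4);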
See this article in my blog for performance details:
Matching 3 of 4