SQL Count empty fields - sql

I'm not sure if this is possible or if it is, how to do it -
I have the following data in a database -
id | improve | timeframe | criteria | impact
---+---------+-----------+----------+--------
1  |         | Test      | Test     | Test
2  | Test    |           | Test     |
3  |         | Test      |          |
---+---------+-----------+----------+--------
Ignoring the id column, how can I determine the number of fields out of the remaining 12 that are not null using an SQL query?
I have started with -
SELECT improve, timeframe, impact, criteria
FROM data
WHERE improve IS NOT NULL
AND timeframe IS NOT NULL
AND impact IS NOT NULL
AND criteria IS NOT NULL;
This only returns the number of rows, ie. 3.
Any ideas?
Thanks.

SELECT count(improve) + count(timeframe) + count(impact) + count(criteria) FROM data
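The one-liner above works because COUNT(column) skips NULLs. Here is a minimal sketch that runs it with Python's built-in sqlite3 module, assuming the blank cells in the question are real NULLs (not empty strings) and using the table name data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, improve TEXT, timeframe TEXT, criteria TEXT, impact TEXT)"
)
# Assumption: the blank cells in the question are NULL, not empty strings.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Test", "Test", "Test"),
        (2, "Test", None, "Test", None),
        (3, None, "Test", None, None),
    ],
)

# COUNT(column) counts only the rows where that column is not NULL.
total = conn.execute(
    "SELECT COUNT(improve) + COUNT(timeframe) + COUNT(impact) + COUNT(criteria) FROM data"
).fetchone()[0]
print(total)  # 6
```

If the blanks are actually empty strings, COUNT will count them too; wrapping each column as COUNT(NULLIF(improve, '')) is one way to treat them as missing.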

Something like this may get you going in the right direction
SELECT
SUM(CASE WHEN improve IS NULL THEN 0 ELSE 1 END +
CASE WHEN timeframe IS NULL THEN 0 ELSE 1 END +
CASE WHEN criteria IS NULL THEN 0 ELSE 1 END +
CASE WHEN impact IS NULL THEN 0 ELSE 1 END)
from
data

SELECT id, COUNT(improve) + COUNT(timeframe) + COUNT(impact) + COUNT(criteria) FROM data GROUP BY id;

If you're using SQL Server, use DATALENGTH().
SELECT improve, timeframe, impact, criteria
FROM data
WHERE DATALENGTH(improve) > 0
AND DATALENGTH(timeframe) > 0
AND DATALENGTH(impact) > 0
AND DATALENGTH(criteria) > 0;
DATALENGTH returns the length of the string in bytes, including trailing spaces. It sounded as though you're OK with blank fields, so DATALENGTH does the job. Otherwise, you could also use LEN(), which ignores trailing spaces.
If you are using MySQL, you can use CHARACTER_LENGTH, which gives you a character count (rather than a byte count) of the field you want to check.

SELECT Sum(case when improve is null then 0 else 1 end +
case when timeframe is null then 0 else 1 end +
case when impact is null then 0 else 1 end +
case when criteria is null then 0 else 1 end)
FROM data
group by improve, timeframe, impact, criteria

Related

Any uses of allowing literal NULL with operators?

Some databases support using literal NULL as an operand while others do not. As an example:
SELECT 1 + NULL
Snowflake: null
BigQuery: error
MySQL: null
Postgres: null
SQLServer: null
I'm trying to determine how I should handle this in an application, and was wondering if there are ever any (valid) use cases for when it might be useful to have a literal null in an expression? This could also include testing.
Writing the expression 1 + NULL by itself is fairly meaningless, as we would expect it to always evaluate to NULL (except, apparently, on BigQuery, where it errors out). However, 1 + NULL could arise as the result of some other calculation. Consider the following data and query:
id | val
1 | NULL
2 | 5
2 | 10
3 | NULL
3 | 7
and the query:
SELECT id, 1 + SUM(val) AS total
FROM yourTable
GROUP BY id;
Here for id = 1 the aggregate total would evaluate to 1 + NULL, which would be NULL on most databases. One way around this would be to use COALESCE():
SELECT id, 1 + COALESCE(SUM(val), 0) AS total
FROM yourTable
GROUP BY id;
Now for id groups having only NULL values, we would replace that NULL sum by zero.
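To see the difference between the two queries, here is a small sketch using Python's built-in sqlite3 module with the sample data above. SQLite is only a stand-in here; it evaluates 1 + NULL as NULL, like most of the databases listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, None), (2, 5), (2, 10), (3, None), (3, 7)],
)

# Without COALESCE: SUM over only-NULL values is NULL, so 1 + NULL is NULL.
plain = conn.execute(
    "SELECT id, 1 + SUM(val) AS total FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

# With COALESCE: the NULL sum is replaced by 0 before adding 1.
safe = conn.execute(
    "SELECT id, 1 + COALESCE(SUM(val), 0) AS total FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

print(plain)  # [(1, None), (2, 16), (3, 8)]
print(safe)   # [(1, 1), (2, 16), (3, 8)]
```

Note that id = 3 is unaffected: SUM already skips the NULL row when at least one non-NULL value exists in the group.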

SQL to show 1 for unique and 0 for repeat

I'm looking for a quick solution in SQL.
I used to have a clunky formula in Excel: =IF(COUNTIF($C$2:C2,C2)>1,0,COUNTIF($C$2:C2,C2)) to print 1 for a unique item and 0 for a repeat.
I then moved to =1-(C1-C2), which kind of did the job, though not accurately. Now I'm looking for SQL that could do a similar job. An example of the result needed is below:
NUMBER UNIQUE
6573455300000 1
6573455300000 0
6573455300000 0
6573455300000 0
6573411981080 1
6573411981080 0
6573411981080 0
6573411981080 0
Does anyone know any kind of code to achieve this?
using row_number():
select
col
, [first] = case when row_number() over (partition by col order by (select 1)) > 1 then 0 else 1 end
from t
rextester demo: http://rextester.com/FWA89661
returns:
+---------------+-------+
| col | first |
+---------------+-------+
| 6573411981080 | 1 |
| 6573411981080 | 0 |
| 6573411981080 | 0 |
| 6573411981080 | 0 |
| 6573455300000 | 1 |
| 6573455300000 | 0 |
| 6573455300000 | 0 |
| 6573455300000 | 0 |
+---------------+-------+
Use window functions. In your case, you seem to want the first row and mark that, so row_number() looks like the solution:
select t.*,
(case when row_number() over (partition by number order by ?) = 1
then 1 else 0
end) as flag
from t;
The ? is for the column that specifies the ordering (which is first). If you want just one row but don't care which, then you can use order by number or order by (select null).
UNIQUE is a SQL keyword (think "unique index"), so it is a bad name for a column. That is why I changed to the generic flag, although you might prefer first_row_flag or something like that.
SELECT
[number],
case when rown = 1 then 1 else 0 end as [unique]
FROM
(
SELECT
[number], row_number() OVER(partition by [number] order by [number]) as rown
FROM
t
) a
This doesn't strictly have to be done using a subquery, but it's unlikely to make any difference to the overall performance, so it's arranged like this to help you see what is going on. If you run just the inner subquery in isolation you'll see that the most important work is done by row_number. Essentially, the data is partitioned into buckets based on the value of [number], something like a group by, but without suppressing repeated values. Within each partition, every occurrence of [number] is numbered with an incrementing counter. When a different value of [number] is encountered, the numbering restarts from 1.
The order by clause is just there because SQL Server demands you have one, and we don't know anything else about your table. If there's something else about your data that makes one of these occurrences a better choice to be labelled with [unique]=1, try to find a way to sort that row into position 1. A typical use of this pattern is "latest record", in which case the order by part would be [datecolumn] DESC.
Once you have a per-number counter that resets itself, all we need to do is use a standard case / else expression to turn it into 1 when it's 1 and 0 otherwise, matching your desired result.
select t.Number,case when t.num=1 then t.num else 0 end [Unique] from(
select Number,row_number() over (partition by number order by number) num from MyTbl)t
order by t.Number
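For anyone who wants to try the row_number() pattern without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (the first release with window functions), and the table t with columns id and num is made up for the illustration; id stands in for whatever column defines which occurrence comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id gives the original row order, num is the NUMBER column.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO t (num) VALUES (?)",
    [(6573455300000,)] * 4 + [(6573411981080,)] * 4,
)

# Number the rows inside each num partition; only the first one gets flag = 1.
rows = conn.execute(
    """
    SELECT num,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY num ORDER BY id) = 1
                THEN 1 ELSE 0 END AS flag
    FROM t
    ORDER BY id
    """
).fetchall()

flags = [flag for _, flag in rows]
print(flags)  # [1, 0, 0, 0, 1, 0, 0, 0]
```

Ordering the window by id makes the choice of "first" row deterministic, which the order by number variants above do not guarantee.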

SQL Count of Column value and its a subColumn

I have a table in a DB2 database such as the one below:
StatusCode | IsResolved | IsAssigned
ABC | Y |
ABC | N |
ABC | |
ADEF | Y |
ADEF | | Y
I want to get data in the way such as:
StatusCode |Count of Status Code| Count of Resolved with value Y| Count of Assigned With value Y
ABC | 3 | 1 | 0
ADEF | 2 | 1 | 1
I am able to get the count of Status Code by using GROUP BY, but I am not sure how to fetch the counts of resolved and assigned in the same query.
Query: select statusCode,count(statusCode) from table group by statusCode
Can anyone help me in how to fetch the resolved and Assigned count?
Issue Solution: Christian and JPW: Solution was to Use sum(case IsResolved when 'Y' then 1 else 0 end)
Try to use
select statusCode, count(statusCode),
sum(case IsResolved when 'Y' then 1 else 0 end),
sum(case IsAssigned when 'Y' then 1 else 0 end)
from table
group by statusCode
One way to get the result you want is to use conditional aggregation (where you use a predicate to determine how to aggregate data) like this:
select
StatusCode,
count(*) as "Count of Status Code",
sum(case when IsResolved = 'Y' then 1 else 0 end) as "Count of Resolved with value Y",
sum(case when IsAssigned = 'Y' then 1 else 0 end) as "Count of Assigned With value Y"
from your_table
group by StatusCode;
The case expression construct (case ... when ... then .. end) is part of the ANSI SQL standard, so this should work in any compliant database.
You can achieve this using SUM() and CASE
SELECT
statusCode,
COUNT(statusCode)
,SUM(CASE WHEN IsResolved='Y' THEN 1 ELSE 0 END) Resolved
,SUM(CASE WHEN IsAssigned='Y' THEN 1 ELSE 0 END) Assigned
FROM [Questions] GROUP BY statusCode
Here is a related question: Sql Server equivalent of a COUNTIF aggregate function
I suppose the prior answers used the SUM aggregate because it was unknown how the missing values are stored. If the missing values are NULL, then each count could have been coded as a COUNT with the same effect as the SUM.
If the missing values in the table given in the OP are NULL, and if the Is* columns only ever hold 'Y' or 'N' (either because the data happens to meet that rule or because a CHECK constraint of IN ('Y','N') actually exists), then the query can be written like the other answers, but performing a COUNT and using NULLIF as a simplified, special-case form of the CASE expression:
select
statuscode as "StatusCode"
, count(*) as "Count of Status Code"
, count(nullif(isResolved,'N')) as "Count of Resolved with value Y"
, count(nullif(isAssigned,'N')) as "Count of Assigned with value Y"
from so39705143
group by statuscode
order by statuscode
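As a quick check that the SUM(CASE ...) and COUNT(NULLIF(...)) forms agree on the sample data, here is a sketch using Python's built-in sqlite3 module rather than DB2. It assumes the blank cells are NULL, and the table name tickets is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; blank cells from the question are stored as NULL.
conn.execute("CREATE TABLE tickets (StatusCode TEXT, IsResolved TEXT, IsAssigned TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("ABC", "Y", None),
        ("ABC", "N", None),
        ("ABC", None, None),
        ("ADEF", "Y", None),
        ("ADEF", None, "Y"),
    ],
)

# Conditional aggregation with SUM(CASE ...).
sum_rows = conn.execute(
    """
    SELECT StatusCode,
           COUNT(*),
           SUM(CASE WHEN IsResolved = 'Y' THEN 1 ELSE 0 END),
           SUM(CASE WHEN IsAssigned = 'Y' THEN 1 ELSE 0 END)
    FROM tickets GROUP BY StatusCode ORDER BY StatusCode
    """
).fetchall()

# Same result with COUNT(NULLIF(...)): 'N' becomes NULL and COUNT skips NULLs.
count_rows = conn.execute(
    """
    SELECT StatusCode,
           COUNT(*),
           COUNT(NULLIF(IsResolved, 'N')),
           COUNT(NULLIF(IsAssigned, 'N'))
    FROM tickets GROUP BY StatusCode ORDER BY StatusCode
    """
).fetchall()

print(sum_rows)    # [('ABC', 3, 1, 0), ('ADEF', 2, 1, 1)]
print(count_rows)  # [('ABC', 3, 1, 0), ('ADEF', 2, 1, 1)]
```

The two forms only match while the columns hold nothing but 'Y', 'N' or NULL; any other value would be counted by the NULLIF version but not by the CASE version.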

select count by value

Given a table messages with the following fields:
id | Number
customer_id | Number
source | VARCHAR2
...
I want to know how many messages each customer has, but I want to differentiate between messages where source equals to 'xml' and all other sources.
My query so far
SELECT customer_id,
case when source = 'xml' then 'xml' else 'manual' end as xml,
count(*)
FROM MESSAGES
GROUP BY customer_id,
case when source = 'xml' then 'xml' else 'manual' end;
which gives me a result similar to this:
customer_id | xml | count
----------------------------
1 | xml | 12
1 | manual | 34
2 | xml | 54
3 | xml | 77
3 | manual | 1
...
This is rather ugly in two ways:
I have to repeat the case statement in both the field list and in the group list
I now have two rows per customer.
Q: Is it possible to formulate a query, such that the result looks like this instead?
customer_id | xml | manual
--------------------------
1 | 12 | 34
2 | 54 | 0
3 | 77 | 1
You are looking for conditional aggregation:
SELECT customer_id,
count(case when source = 'xml' then 1 end) as xml_count,
count(case when source <> 'xml' then 1 end) as manual_count
FROM MESSAGES
GROUP BY customer_id
This works because aggregates ignore NULL values and the result of the CASE will be NULL if source does not contain the value from the case condition.
Use conditional aggregation.
SELECT customer_id,
sum(case when source = 'xml' then 1 else 0 end) as xml,
sum(case when source <> 'xml' then 1 else 0 end) as manual
FROM MESSAGES
GROUP BY customer_id
This assumes the source column is non null. If it can be null use coalesce or nvl in the case expression so the comparison gives you expected results.
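The NULL caveat is easy to miss, so here is a small sketch of it using Python's built-in sqlite3 module in place of Oracle. The three sample rows are invented for the illustration, and COALESCE(source, '') is just one way to make NULL compare as "not xml".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (customer_id INTEGER, source TEXT)")
# Made-up rows: one xml message, one non-xml message, one with no source at all.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "xml"), (1, "email"), (1, None)],
)

# NULL <> 'xml' evaluates to NULL, so the NULL row falls into ELSE 0.
naive = conn.execute(
    """
    SELECT customer_id,
           SUM(CASE WHEN source = 'xml' THEN 1 ELSE 0 END),
           SUM(CASE WHEN source <> 'xml' THEN 1 ELSE 0 END)
    FROM messages GROUP BY customer_id
    """
).fetchone()

# COALESCE turns NULL into '' first, so the row is counted as manual.
fixed = conn.execute(
    """
    SELECT customer_id,
           SUM(CASE WHEN source = 'xml' THEN 1 ELSE 0 END),
           SUM(CASE WHEN COALESCE(source, '') <> 'xml' THEN 1 ELSE 0 END)
    FROM messages GROUP BY customer_id
    """
).fetchone()

print(naive)  # (1, 1, 1)
print(fixed)  # (1, 1, 2)
```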
This will work, it doesn't appear you have a source called 'manual'. COUNT or SUM will give you the same difference.
SELECT
customer_id
,ISNULL(COUNT(CASE WHEN source = 'xml' THEN 1 END),0) xml
,ISNULL(COUNT(CASE WHEN source <> 'xml' OR source IS NULL THEN 1 END),0) manual
FROM Messages
GROUP BY customer_id
This will allow a zero to appear where you would usually see a NULL value; your sample has a zero rather than a null.
Here is a fancy solution (it does almost exactly what vkp's solution does), using the PIVOT operation introduced in Oracle 11.1. Note how the distinction between 'xml' and all others (including NULL) is dealt with in the subquery.
select *
from (select customer_id, case when source = 'xml' then 'xml' else 'other' end as source
from messages)
pivot (count(*) for source in ('xml' as xml, 'other' as other))
;
There is another way, using the DECODE function instead of CASE:
SELECT customer_id,
COUNT(DECODE(source,'xml','xml')) "XML",
COUNT(DECODE(source,'manual','manual')) "manual"
FROM MESSAGES
GROUP BY customer_id;
However, rows where source is NULL are not counted in either column.

select max value in a group of consecutive values

How do I retrieve only the max value of each group of consecutive values?
I have a telephone database with only unique values, and I want to get only the highest number of each group of consecutive telephone numbers (TelNr), but I am struggling.
id | TeNr | Position
1 | 100 | SLMO2.1.3
2 | 101 | SLMO2.3.4
3 | 103 | SLMO2.4.1
4 | 104 | SLMO2.3.2
5 | 200 | SLMO2.5.1
6 | 201 | SLMO2.5.2
7 | 204 | SLMO2.5.5
8 | 300 | SLMO2.3.5
9 | 301 | SLMO2.6.2
10 | 401 | SLMO2.4.8
Result should be:
TelNr
101
104
201
204
301
401
I have tried almost every tip I could find so far, and I either get all TelNr or no number at all, which is useless in my case.
Any brilliant idea to run this with SQLite?
So you're searching for gaps and want to get the first value of those gaps.
This is probably the best way to get them: check for a row with the current TeNr plus 1, and if there's none, you've found it:
select t1.TeNr, t1.TeNr + 1 as unused_TeNr
from tab as t1
left join Tab as t2
on t2.TeNr = t1.TeNr + 1
where t2.TeNr is null
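Here is a sketch of that self-join run against the question's data with Python's built-in sqlite3 module. The table name tab comes from the query above; the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, TeNr INTEGER, Position TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1, 100, "SLMO2.1.3"), (2, 101, "SLMO2.3.4"), (3, 103, "SLMO2.4.1"),
        (4, 104, "SLMO2.3.2"), (5, 200, "SLMO2.5.1"), (6, 201, "SLMO2.5.2"),
        (7, 204, "SLMO2.5.5"), (8, 300, "SLMO2.3.5"), (9, 301, "SLMO2.6.2"),
        (10, 401, "SLMO2.4.8"),
    ],
)

# Keep a row only when no other row holds TeNr + 1, i.e. it ends a consecutive run.
rows = conn.execute(
    """
    SELECT t1.TeNr, t1.TeNr + 1 AS unused_TeNr
    FROM tab AS t1
    LEFT JOIN tab AS t2 ON t2.TeNr = t1.TeNr + 1
    WHERE t2.TeNr IS NULL
    ORDER BY t1.TeNr
    """
).fetchall()

last_of_run = [te_nr for te_nr, _ in rows]
print(last_of_run)  # [101, 104, 201, 204, 301, 401]
```

The first column matches the result list asked for in the question; the second column is the first free number after each run.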
Edit:
To get the range of missing values you need to use some old-style SQL as SQLite doesn't seem to support ROW_NUMBER, etc.
select
TeNr + 1 as RangeStart,
nextTeNr - 1 as RangeEnd,
nextTeNr - TeNr - 1 as cnt
from
(
select TeNr,
( select min(TeNr) from tab as t2
where t2.TeNr > t1.TeNr ) as nextTeNr
from tab as t1
) as dt
where nextTeNr > TeNr + 1
It's probably not very efficient, but might be ok if the number of rows is small and/or there's an index on TeNr.
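As a sanity check, here is a sketch of the range query run on the question's numbers with Python's built-in sqlite3 module. Only the TeNr column is loaded, and an ORDER BY is added so the rows come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (TeNr INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?)",
    [(n,) for n in (100, 101, 103, 104, 200, 201, 204, 300, 301, 401)],
)

# For each TeNr, look up the next existing TeNr with a correlated subquery,
# then keep only the pairs that have at least one missing number between them.
ranges = conn.execute(
    """
    SELECT TeNr + 1 AS RangeStart,
           nextTeNr - 1 AS RangeEnd,
           nextTeNr - TeNr - 1 AS cnt
    FROM (
        SELECT TeNr,
               (SELECT MIN(TeNr) FROM tab AS t2 WHERE t2.TeNr > t1.TeNr) AS nextTeNr
        FROM tab AS t1
    ) AS dt
    WHERE nextTeNr > TeNr + 1
    ORDER BY RangeStart
    """
).fetchall()

print(ranges)
# [(102, 102, 1), (105, 199, 95), (202, 203, 2), (205, 299, 95), (302, 400, 99)]
```

The highest number (401) has no successor, so its nextTeNr is NULL and the WHERE clause drops it.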
Getting each value in the gap as a row in your result set is much harder, but it can be done if your version of SQLite supports recursive queries:
with recursive cte (TeNr, missing, maxTeNr) as
(
select
min(TeNr) as TeNr, -- start of range of existing numbers
0 as missing, -- 0 = TeNr exists, 1 = TeNr is missing
max(TeNr) as maxTeNr -- end of range of existing numbers
from tab
union all
select
cte.TeNr + 1, -- next TeNr, if it doesn't exists tab.TeNr will be NULL
case when tab.TeNr is not null then 0 else 1 end,
maxTeNr
from cte left join tab
on tab.TeNr = cte.TeNr + 1
where cte.TeNr + 1 < maxTeNr
)
select TeNr
from cte
where missing = 1
Depending on your data this might return a huge amount of rows.
You might also use the result of the previous RangeStart/RangeEnd query as input to this recursion.