Count the number of not null columns using a case statement - sql

I need some help with my query...I am trying to get a count of names in each house, all the col#'s are names.
Query:
SELECT House#,
COUNT(CASE WHEN col#1 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#2 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#3 IS NOT NULL THEN 1 ELSE 0 END) as count
FROM myDB
WHERE House# in (house#1,house#2,house#3)
GROUP BY House#
Desired results:
house 1 - the count is 3 /
house 2 - the count is 2 /
house 3 - the count is 1
...with my current query the results for count would be just 3's
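A likely reason every house comes back as 3: COUNT counts any non-NULL value, and the ELSE 0 branch makes the CASE expression non-NULL on every row. The sketch below reproduces this with Python's built-in sqlite3 module. The table name, column names and sample rows are made up for illustration, and it assumes each name sits in its own column, as the query suggests.

```python
import sqlite3

# Hypothetical sample data: one row per house, one name per column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (house TEXT, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?, ?)",
    [
        ("house 1", "peter", "paul", "mary"),
        ("house 2", "sarah", "sally", None),
        ("house 3", "joe", None, None),
    ],
)


def counts(query):
    """Run a two-column query and return {house: count}."""
    return dict(conn.execute(query).fetchall())


# ELSE 0 yields a non-NULL value for every row, so COUNT counts every row.
with_else_zero = counts("""
    SELECT house,
           COUNT(CASE WHEN col1 IS NOT NULL THEN 1 ELSE 0 END) +
           COUNT(CASE WHEN col2 IS NOT NULL THEN 1 ELSE 0 END) +
           COUNT(CASE WHEN col3 IS NOT NULL THEN 1 ELSE 0 END)
    FROM houses
    GROUP BY house
""")

# COUNT(column) already skips NULLs, so no CASE is needed.
plain_count = counts("""
    SELECT house, COUNT(col1) + COUNT(col2) + COUNT(col3)
    FROM houses
    GROUP BY house
""")

print(with_else_zero)
print(plain_count)
```

Dropping the ELSE 0 (so the CASE returns NULL when there is no name) or switching COUNT to SUM should have the same effect as the plain COUNT(column) form.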

In this case, it seems that counting names is the same as counting the commas (,) plus one:
SELECT House_Name,
LEN(Names) - LEN(REPLACE(Names,',','')) + 1 as Names
FROM dbo.YourTable;
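The comma-counting idea can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch: SQLite spells the function LENGTH rather than LEN, and the table name and sample rows are assumptions based on the data used elsewhere in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (house_name TEXT, names TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("house 1", "peter, paul, mary"),
        ("house 2", "sarah, sally"),
        ("house 3", "joe"),
    ],
)

# Removing the commas shortens the string by one character per comma,
# so the length difference is the comma count; names = commas + 1.
rows = conn.execute("""
    SELECT house_name,
           LENGTH(names) - LENGTH(REPLACE(names, ',', '')) + 1 AS name_count
    FROM your_table
    ORDER BY house_name
""").fetchall()

print(rows)
```

One caveat of the arithmetic: an empty string still yields 1, so rows without any names would need separate handling.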

Another option since Lamak stole my thunder, would be to split it and normalize your data, and then aggregate. This uses a common split function but you could use anything, including STRING_SPLIT for SQL Server 2016+ or your own...
declare @table table (house varchar(16), names varchar(256))
insert into @table
values
('house 1','peter, paul, mary'),
('house 2','sarah, sally'),
('house 3','joe')
select
t.house
,NumberOfNames = count(s.Item)
from
@table t
cross apply dbo.DelimitedSplit8K(names,',') s
group by
t.house

Notice how the answers you are getting are quite complex for what they're doing? That's because relational databases are not designed to store data that way.
On the other hand, if you change your data structure to something like this:
house name
1 peter
1 paul
1 mary
2 sarah
2 sally
3 joe
The query now is:
select house, count(name)
from housenames
group by house
So my recommendation is to do that: use a design that's more suitable for SQL Server to work with, and your queries become simpler and more efficient.
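As a rough illustration of that recommendation, the sketch below splits the comma-separated lists into one row per name and then runs the simple GROUP BY query. It uses Python's built-in sqlite3 module; the sample rows and the one-off split in Python are assumptions for the demo, not part of the original answer.

```python
import sqlite3

# Denormalized input: one comma-separated list of names per house.
wide_rows = [
    (1, "peter, paul, mary"),
    (2, "sarah, sally"),
    (3, "joe"),
]

# One (house, name) row per name, which is the normalized shape.
normalized = [
    (house, name.strip())
    for house, names in wide_rows
    for name in names.split(",")
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housenames (house INTEGER, name TEXT)")
conn.executemany("INSERT INTO housenames VALUES (?, ?)", normalized)

# With the normalized table the count is a plain aggregate.
result = conn.execute("""
    SELECT house, COUNT(name)
    FROM housenames
    GROUP BY house
    ORDER BY house
""").fetchall()

print(result)
```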

One dirty trick is to replace commas with empty strings and compare the lengths:
SELECT house +
' has ' +
CAST((LEN(names) - LEN(REPLACE(names, ',', '')) + 1) AS VARCHAR) +
' names'
FROM mytable

You can parse using xml and find count as below:
Select *, a.xm.value('count(/x)','int') from (
Select *, xm = CAST('<x>' + REPLACE((SELECT REPLACE(names,', ','$$$SSText$$$') AS [*] FOR XML PATH('')),'$$$SSText$$$','</x><x>')+ '</x>' AS XML) from #housedata
) a

select House, 'has '+cast((LEN(Names)-LEN(REPLACE(Names, ',', ''))+1) as varchar)+' names'
from TempTable

Related

Using Stuff to roll up data from multiple rows AND concatenate columns

I've found similar questions on the site, but I'm still struggling with this. I have a table with information like the below:
AcctNo ChargeOrder ChargeCode
ABC 1 Charge1
ABC 2 Charge2
ABC 3 Charge3
I'm trying to use the XML Path/STUFF functions to return the data like so:
AcctNo Order/Code
ABC 1:Charge1 - 2:Charge2 - 3:Charge3
But I can't seem to figure out how to concatenate my chargeorder and chargecode AND STUFF them into a single field.
In SQL Server 2017 and later, you can use string_agg():
select acctno,
string_agg(concat(ChargeOrder, ':', ChargeCode), ' - ')
from t
group by acctno;
In older versions, this would be phrased as:
select a.acctno,
stuff( (select concat(' - ', ChargeOrder, ':', ChargeCode)
from t t2
where t2.acctno = a.acctno
for xml path ('')
), 1, 3, ''
)
from (select distinct acctno from t) a
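To make the shape of the result concrete, here is a small Python sketch of the same roll-up: build an "order:code" string per row, then join the strings per account with " - ". The function name roll_up and the in-memory rows are hypothetical; in the FOR XML PATH version, STUFF(..., 1, 3, '') plays the role of dropping the leading " - " that a plain join never produces.

```python
from collections import defaultdict

# Sample rows from the question: (AcctNo, ChargeOrder, ChargeCode).
rows = [
    ("ABC", 1, "Charge1"),
    ("ABC", 2, "Charge2"),
    ("ABC", 3, "Charge3"),
]


def roll_up(rows, separator=" - "):
    """Concatenate 'order:code' pairs per account, ordered by ChargeOrder."""
    grouped = defaultdict(list)
    for acct_no, charge_order, charge_code in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[acct_no].append(f"{charge_order}:{charge_code}")
    return {acct: separator.join(parts) for acct, parts in grouped.items()}


result = roll_up(rows)
print(result)
```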

Using Pivot with non Numerical Data

This is the first time I have ever tried to use PIVOT.
I am using Microsoft SQL Server.
So here is my issue, I have been reading up on Pivot and have decided that it would work great for a project that exports Patient data to a formatted file i.e. Report, that can be printed out etc.. etc..
VPatientPlusAllergyData is a VIEW, that displays this as a sample result with some of the data cut out for ease of reading
strPatientFullName strAllergy strAllergyMedication
------------------------------------------------------------
Smith, John Henry Dogs Pounces
Smith, John Henry Dogs Orange Juice
Smith, John Henry Mustard Ketchup
Smith, John Henry Mustard Sugar
This is the result I want
strPatientFullName strAllergy1 strAllergy1Medications strAllergy2 strAllergy2Medications
------------------------------------------------------------------------------------------------------
Smith, John Henry Dogs Pounces, OrangeJuice Mustard Ketchup, Sugar
After reading W3Schools, watching a YouTube video, and even reading some articles on this site, I'm wondering whether what I am trying to do is possible.
Below is a code snippet. I got stuck on what to put in the IN clause, and that is when I started to question whether PIVOT is the answer to my particular problem.
GO
SELECT
strPatientFullName
,strStreetAddress
,strCity
,strState
,strZipcode
,strPrimaryPhoneNumber
,strSecondaryPhoneNumber
,blnSmoker
,decPackYears
,blnHeadOfHousehold
,dtmDateOfBirth
,strSex
,strAllergy
,strAllergyMedication
,strEmailAddress
,strRecordCreator
FROM ( SELECT * FROM VPatientPlusAllergyData ) PatientAllergyData
PIVOT
(
MAX(strAllergyMedication)
FOR strAllergy
IN ()
)
GO
Hoping someone more familiar with Pivot will show me what I am missing or enlighten me to a much more efficient solution.
Thanks for the help
****** EDIT: I have decided that, while I would love to put this sort of operation on the server side, for my particular application it was simpler to create a ton of views, perform SELECT queries on the client side and concatenate the results there, and then implement an "EXPORT PROCESSING" screen.
I appreciate all the help. Maybe one day I will write a script and have it execute server side, but for the moment this works well enough ******
Here's an example of how you could do something like this with a STUFF statement, conditional aggregation and dynamic SQL.
DECLARE @SQL NVARCHAR(MAX) = '';
SELECT @SQL += '
, MAX(CASE WHEN RN = ' + RN + ' THEN strAllergy END) strAllergy' + RN + '
, MAX(CASE WHEN RN = ' + RN + ' THEN strAllergyMedications END) strAllergyMedications' + RN
FROM (
SELECT CAST(ROW_NUMBER() OVER (PARTITION BY strPatientFullName, strAllergy ORDER BY (SELECT NULL)) AS VARCHAR(5)) RN
FROM VPatientPlusAllergyData) T
GROUP BY RN;
SELECT @SQL = 'SELECT strPatientFullName' + @SQL + '
FROM (
SELECT strPatientFullname
, strAllergy
, STUFF((SELECT '', '' + strAllergyMedication FROM VPatientPlusAllergyData WHERE strPatientFullName = T.strPatientFullName AND strAllergy = T.strAllergy FOR XML PATH ('''')), 1, 2, '''') strAllergyMedications
, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM VPatientPlusAllergyData T
GROUP BY strPatientFullname, strAllergy) T
GROUP BY strPatientFullname;';
PRINT @SQL;
EXEC(@SQL);
As scsimon mentions in the comments, dynamic SQL may be necessary if there can be any number of allergies. A stuff statement is one way of getting the comma separated values into a single column. And the conditional aggregation works in the same way that a PIVOT would normally work, but is far easier (IMO) to write and understand than a normal PIVOT statement.
So to get to what you want you are actually looking at needing the following techniques:
For the case of strAllergyMedications you are needing to Concatenate Rows to a Delimited String
Then to make your rows into columns you need to PIVOT, but because you are pivoting 2 columns you would have to PIVOT twice or use Conditional Aggregation
The main trick to pulling it off is to prepare your table by doing the concatenation and coming up with a Row Number for the Allergy. Here is an example using a Common Table Expression [CTE] and STUFF() with a sub select XML to create the delimited string and create the Row Number.
DECLARE @VPatientPlusAllergyData AS TABLE (strPatientFullName VARCHAR(100), strAllergy VARCHAR(50), strAllergyMedication VARCHAR(100))
INSERT INTO @VPatientPlusAllergyData VALUES
('Smith, John Henry','Dogs','Pounces')
,('Smith, John Henry','Dogs','Orange Juice')
,('Smith, John Henry','Mustard','Ketchup')
,('Smith, John Henry','Mustard','Sugar')
;WITH cte AS (
SELECT DISTINCT
v1.strPatientFullName
,v1.strAllergy
,strAllergyMedications = STUFF(
(SELECT ', ' + v2.strAllergyMedication
FROM
@VPatientPlusAllergyData v2
WHERE
v1.strPatientFullName = v2.strPatientFullName
AND v1.strAllergy = v2.strAllergy
FOR XML PATH(''))
,1,2,'')
,AllergyRowNum = DENSE_RANK() OVER (PARTITION BY v1.strPatientFullName ORDER BY v1.strAllergy)
FROM
@VPatientPlusAllergyData v1
)
)
SELECT
strPatientFullName
,strAllergy1 = MAX(CASE WHEN AllergyRowNum = 1 THEN strAllergy END)
,strAllergy1Medications = MAX(CASE WHEN AllergyRowNum = 1 THEN strAllergyMedications END)
,strAllergy2 = MAX(CASE WHEN AllergyRowNum = 2 THEN strAllergy END)
,strAllergy2Medications = MAX(CASE WHEN AllergyRowNum = 2 THEN strAllergyMedications END)
FROM
cte
GROUP BY
strPatientFullName
And while I was preparing and posting this, @ZLK wrote a nice method to do it dynamically.
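The two steps described in this answer (concatenate the medications per allergy, then number the allergies and spread them into columns) can be sketched outside SQL as well. The Python below is an illustration only: the function name pivot_allergies is hypothetical, and it numbers allergies alphabetically to mirror the DENSE_RANK ... ORDER BY strAllergy in the CTE.

```python
# Sample rows from the question: (patient, allergy, medication).
rows = [
    ("Smith, John Henry", "Dogs", "Pounces"),
    ("Smith, John Henry", "Dogs", "Orange Juice"),
    ("Smith, John Henry", "Mustard", "Ketchup"),
    ("Smith, John Henry", "Mustard", "Sugar"),
]


def pivot_allergies(rows):
    """Return one wide dict per patient with numbered allergy columns."""
    # Step 1: collect medications per (patient, allergy), like STUFF + FOR XML PATH.
    meds = {}
    for patient, allergy, medication in rows:
        meds.setdefault(patient, {}).setdefault(allergy, []).append(medication)

    # Step 2: number each patient's allergies and emit one column pair per number,
    # which is what the MAX(CASE WHEN AllergyRowNum = n ...) expressions do.
    wide = []
    for patient, by_allergy in meds.items():
        record = {"strPatientFullName": patient}
        for n, allergy in enumerate(sorted(by_allergy), start=1):
            record[f"strAllergy{n}"] = allergy
            record[f"strAllergy{n}Medications"] = ", ".join(by_allergy[allergy])
        wide.append(record)
    return wide


result = pivot_allergies(rows)
print(result[0])
```

Because the number of column pairs depends on the data, a fixed SQL statement has to hard-code them, which is why the dynamic SQL variant exists.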

sort by second string in database field

I have the below SQL statement, which sorts an address field (address1) by the street name, not the number. This seems to work fine, but I want the street names to appear alphabetically. The ASC at the end of the ORDER BY doesn't help.
e.g Address1 field might contain
"5 Elm Close" - a normal sort and order will sort by the number the below will sort by looking at the 2nd string "Elm"
(Using SQL Server)
SELECT tblcontact.ContactID, tblcontact.Forename, tblcontact.Surname,
tbladdress.AddressLine1, tbladdress.AddressLine2
FROM tblcontact
INNER JOIN tbladdress
ON tblcontact.AddressID = tbladdress.AddressID
LEFT JOIN tblDonate
ON tblcontact.ContactID = tblDonate.ContactID
WHERE (tbladdress.CollectionArea = 'Queens Park')
GROUP BY tblcontact.ContactID, tblcontact.Forename, tblcontact.Surname,
tbladdress.AddressLine1, tbladdress.AddressLine2
ORDER BY REVERSE(LEFT(REVERSE(tbladdress.AddressLine1),
charindex(' ', REVERSE(tbladdress.AddressLine1)+' ')-1)) asc
Gordon's statement sorts as below
1 Kings Road
10 Olivier Way
11 Albert Street
11 Kings Road
11 Princes Road
120 High Street
Try this: I based it on Gordon's code, but altered it to remove the LEFT(AddressLine1, 1) portion, since a single-character string could never match the pattern "n + space + %".
This works on my SQL-Server 2012 environment:
WITH tbladdress AS
(
SELECT AddressLine1 FROM (VALUES ('1 Kings Road'),('10 Olivier Way'), ('11 Albert Street')) AS V(AddressLine1)
)
SELECT
AddressLine1
FROM tbladdress
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substrING(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1, len(tbladdress.AddressLine1))
else tbladdress.AddressLine1
end)
This is edited to be more similar to Gordon's code (position of closing parentheses, substr instead of substring):
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substr(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1), len(tbladdress.AddressLine1)
else tbladdress.AddressLine1
end)
If you assume that the street name is the first or second value in a space separated string, you could try:
order by (case when left(tbladdress.AddressLine1, 1) like '[0-9]% %'
then substr(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1), len(tbladdress.AddressLine1) )
else tbladdress.AddressLine1
end)
I don't think you need to use REVERSE() at all. That seems like a trap.
ORDER BY
CASE
WHEN ISNUMERIC(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1)) = 1
THEN RIGHT(tbladdress.AddressLine1,LEN(tbladdress.AddressLine1) - CHARINDEX(' ',tbladdress.AddressLine1))
ELSE tbladdress.AddressLine1
END,
CASE
WHEN ISNUMERIC(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1)) = 1
THEN CAST(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1) AS INT)
ELSE NULL
END
Also, you have a GROUP BY with no aggregate function. While that's not wrong, per se, it is weird. Just use DISTINCT if you're getting duplicate records.
This is the bit of code that works in sql server
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substrING(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1, len(tbladdress.AddressLine1))
else tbladdress.AddressLine1
end)
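The sort rule used in these answers (if the address starts with a number and a space, sort by what follows; otherwise sort by the whole line) can be sketched as a Python sort key. The function name street_sort_key is hypothetical, and using the house number as a tie-breaker is an assumption borrowed from the two-part ORDER BY above.

```python
import re


def street_sort_key(address):
    """Sort by street name first, then by house number within a street."""
    match = re.match(r"^(\d+)\s+(.*)$", address)
    if match:
        return (match.group(2).lower(), int(match.group(1)))
    # No leading house number: fall back to the whole address line.
    return (address.lower(), 0)


addresses = [
    "1 Kings Road",
    "10 Olivier Way",
    "11 Albert Street",
    "11 Kings Road",
    "11 Princes Road",
    "120 High Street",
]

sorted_addresses = sorted(addresses, key=street_sort_key)
for line in sorted_addresses:
    print(line)
```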

Could a sql database return a sql result set plus a score for the results?

Just curious, if I wanted to send strings to a database (perhaps for MS SQL Server) can anyone provide any insight on what the best way would be to return results from a database where the result set might be sorted and "scored" on its closeness to the string passed in?
So, if I sent a query for :
SELECT name FROM table where name LIKE 'Jon'
and then get a result of 1000 results that looks like:
100 Jon
98 John
80 Jonathan
32 Nathan
Views, indexes, stored procedures, coded solution? What is the recommendation?
You could, but you'd need to use another function to do it. Levenshtein ratio or Jaro distance would be the most common solutions. I'm not sure what, if anything, SQL Server includes builtin for this. If nothing else I think you can use the SimMetrics library as described here. Regardless, it would look something like this.
select top 1000
jaro('John', name) as score, name
from table
where name like '%John%'
order by 1 desc
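For reference, here is a minimal Python sketch of the classic dynamic-programming Levenshtein distance, turned into a 0 to 100 score. The normalization (100 * (1 - distance / longer length)) is an assumption chosen for illustration, so the numbers will not match the sample scores in the question.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def score(query, candidate):
    """Similarity in the range 0..100, case-insensitive (assumed normalization)."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 100.0
    distance = levenshtein(query.lower(), candidate.lower())
    return 100.0 * (1 - distance / longest)


names = ["Nathan", "Jonathan", "John", "Jon"]
ranked = sorted(names, key=lambda name: score("Jon", name), reverse=True)
for name in ranked:
    print(f"{score('Jon', name):6.2f}  {name}")
```

A function like this could be exposed to the database as a user-defined function, or applied on the client after a coarse LIKE filter has reduced the candidate set.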
EDIT
Due to some persistent prodding from the comments, I present here an implementation of the Levenshtein distance calculation in SQL. TSQL for SQL Server 2005+ is used here, but the technique can be converted to other DBMS as well. Maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all
select 'Bone' union all
select 'BJon' union all
select 'Nathan' union all
select 'Jonne')
SELECT *, SCORE_Levenshtein + SCORE_SOUNDEX TotalScore
FROM
(
SELECT name,
CAST(50 /
(
select 1.0 + MAX(LDist)
FROM
(
select startAt.number,
LEN(longer) -
sum(case when SUBSTRING(longer, startAt.number+offset.number, 1)
= SUBSTRING(shorter, 1+offset.number, 1) then 1 else 0 end ) LDist
FROM
(select case when LEN(Name) < LEN(LookFor) then Name else LookFor end shorter) shorter
cross join
(select case when LEN(Name) >= LEN(LookFor) then Name else LookFor end longer) longer
inner join master..spt_values startAt
on startAt.type='P' and startAt.number between 1 and len(longer) - LEN(shorter) + 1
inner join master..spt_values offset
on offset.type='P' and offset.number between 0 and LEN(shorter)-1
group by startAt.number, longer, shorter
) X
) AS NUMERIC(16,4)) SCORE_Levenshtein
,
CAST(50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX(LookFor) as B) X
)) AS NUMERIC(16,4)) AS SCORE_SOUNDEX
FROM tbl
CROSS JOIN (SELECT 'Jon' as LookFor) LookFor
) Scored
Order by SCORE_Levenshtein + SCORE_SOUNDEX DESC
Note - This line CROSS JOIN (SELECT 'Jon' as LookFor) LookFor is used so that the input 'Jon' does not need to be repeated many times in the query. One could also define a variable instead and use it where LookFor is used in the query.
Output
It is worth noting that together with SOUNDEX, Jonny gets to score higher than Bone which won't happen with Levenshtein alone.
name SCORE_Levenshtein SCORE_SOUNDEX TotalScore
Jon 50.0000 50.0000 100.0000
John 12.5000 50.0000 62.5000
Jonny 8.3333 50.0000 58.3333
Jonne 8.3333 50.0000 58.3333
Bone 10.0000 25.0000 35.0000
BJon 10.0000 12.5000 22.5000
Jonathan 5.5556 16.6667 22.2223
Nathan 7.1429 12.5000 19.6429
Original answer follows, based on pre-filtering the input based on LIKE '%x%' which collapses the Levenshtein to a simple Len(column) - Len(Like-expression) calculation
Have a look at this example - it tests the length and SOUNDEX differences, for lack of better measures.
The maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all -- doesn't match LIKE
select 'BJon' union all
select 'Jonne')
SELECT name,
50 / (Len(Name) - LEN('Jon') + 1.0) -- inversely proportional to length difference
+
50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX('Jon') as B) X
)) AS SCORE
FROM tbl
where name LIKE '%Jon%'
Order by SCORE DESC
Output
name SCORE
Jon 100.00000000000000000
Jonny 66.66666666666660000
Jonne 66.66666666666660000
BJon 37.50000000000000000
Jonathan 24.99999999999996666
Something like this might help:
http://www.mombu.com/microsoft/microsoft/t-equivalent-sql-server-functions-for-match-against-in-mysql-2292412.html

is there a PRODUCT function like there is a SUM function in Oracle SQL?

I have a coworker looking for this, and I don't recall ever running into anything like that.
Is there a reasonable technique that would let you simulate it?
SELECT PRODUCT(X)
FROM
(
SELECT 3 X FROM DUAL
UNION ALL
SELECT 5 X FROM DUAL
UNION ALL
SELECT 2 X FROM DUAL
)
would yield 30
select exp(sum(ln(col)))
from table;
edit:
if col always > 0
DECLARE @a int
SET @a = 1
-- re-assign @a for each row in the result
-- as what @a was before * the value in the row
SELECT @a = @a * amount
FROM theTable
There's a way to do string concat that is similar:
DECLARE @b varchar(max)
SET @b = ''
SELECT @b = @b + CustomerName
FROM Customers
Here's another way to do it. This is definitely the longer way to do it but it was part of a fun project.
You've got to reach back to school for this one, lol. The key to remember here is that LOG is the inverse of the exponential.
LOG10(X*Y) = LOG10(X) + LOG10(Y)
or
ln(X*Y) = ln(X) + ln(Y) (ln = natural log, i.e. log base e)
Example
If X=5 and Y=6
X * Y = 30
ln(5) + ln(6) ≈ 3.4
ln(30) ≈ 3.4
e^3.4 ≈ 30, the same as 5 x 6
EXP(3.4) ≈ 30
So above, if 5 and 6 each occupied a row in the table, we take the natural log of each value, sum up the rows, then take the exponent of the sum to get 30.
Below is the code in a SQL statement for SQL Server. Some editing is likely required to make it run on Oracle. Hopefully it's not a big difference but I suspect at least the CASE statement isn't the same on Oracle. You'll notice some extra stuff in there to test if the sign of the row is negative.
CREATE TABLE DUAL (VAL INT NOT NULL)
INSERT DUAL VALUES (3)
INSERT DUAL VALUES (5)
INSERT DUAL VALUES (2)
SELECT
CASE SUM(CASE WHEN SIGN(VAL) = -1 THEN 1 ELSE 0 END) % 2
WHEN 1 THEN -1
ELSE 1
END
* CASE
WHEN MIN(ABS(VAL)) = 0 THEN 0 -- any zero row makes the product 0 (testing SUM(VAL) = 0 would also catch e.g. 3 and -3)
WHEN SUM(VAL) IS NOT NULL THEN EXP(SUM(LOG(ABS(CASE WHEN SIGN(VAL) <> 0 THEN VAL END))))
ELSE NULL
END
* CASE MIN(ABS(VAL)) WHEN 0 THEN 0 ELSE 1 END
AS PRODUCT
FROM DUAL
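The sign and zero handling in the query above can be sketched in Python to make the logic easier to follow. The function name product_via_logs is hypothetical. It treats None like a SQL NULL that the aggregate skips, returns 0 if any value is 0, and otherwise applies exp(sum(ln(|x|))) with the sign taken from the parity of the negative values.

```python
import math


def product_via_logs(values):
    """Product of values using exp(sum(log(abs(x)))) with explicit sign handling."""
    values = [v for v in values if v is not None]  # aggregates skip NULLs
    if not values:
        return None  # no rows: the aggregate would be NULL
    if any(v == 0 for v in values):
        return 0.0  # log(0) is undefined, so handle zero separately
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 else 1.0  # odd number of negatives flips the sign
    return sign * math.exp(sum(math.log(abs(v)) for v in values))


print(product_via_logs([3, 5, 2]))
print(product_via_logs([3, -3]))
print(product_via_logs([4, 0, -2]))
```

Because the result goes through floating-point logarithms, it is an approximation and may need rounding when exact integers are expected.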
The accepted answer by tuinstoel is correct, of course:
select exp(sum(ln(col)))
from table;
But notice that if col is of type NUMBER, you will find tremendous performance improvement when using BINARY_DOUBLE instead. Ideally, you would have a BINARY_DOUBLE column in your table, but if that's not possible, you can still cast col to BINARY_DOUBLE. I got a 100x improvement in a simple test that I documented here, for this cast:
select exp(sum(ln(cast(col as binary_double))))
from table;
Is there a reasonable technique that would let you simulate it?
One technique could be using LISTAGG to generate product_expression string and XMLTABLE + GETXMLTYPE to evaluate it:
WITH cte AS (
SELECT grp, LISTAGG(l, '*') AS product_expression
FROM t
GROUP BY grp
)
SELECT c.*, s.val AS product_value
FROM cte c
CROSS APPLY(
SELECT *
FROM XMLTABLE('/ROWSET/ROW/*'
PASSING dbms_xmlgen.getXMLType('SELECT ' || c.product_expression || ' FROM dual')
COLUMNS val NUMBER PATH '.')
) s;
db<>fiddle demo
Output:
+------+---------------------+---------------+
| GRP | PRODUCT_EXPRESSION | PRODUCT_VALUE |
+------+---------------------+---------------+
| b | 2*6 | 12 |
| a | 3*5*7 | 105 |
+------+---------------------+---------------+
More robust version that handles a single NULL value in the group:
WITH cte AS (
SELECT grp, LISTAGG(l, '*') AS product_expression
FROM t
GROUP BY grp
)
SELECT c.*, s.val AS product_value
FROM cte c
OUTER APPLY(
SELECT *
FROM XMLTABLE('/ROWSET/ROW/*'
passing dbms_xmlgen.getXMLType('SELECT ' || c.product_expression || ' FROM dual')
COLUMNS val NUMBER PATH '.')
WHERE c.product_expression IS NOT NULL
) s;
db<>fiddle demo
*CROSS/OUTER APPLY(Oracle 12c) is used for convenience and could be replaced with nested subqueries.
This approach could be used for generating different aggregation functions.
There are many different implementations of "SQL". When you say "does sql have", are you referring to a specific ANSI version of SQL or a vendor-specific implementation? DavidB's answer is one that works in a few different environments I have tested, but depending on your environment you could write or find a function exactly like what you are asking for. Say you were using Microsoft SQL Server 2005: a possible solution would be to write a custom aggregate in .NET code named PRODUCT, which would allow your original query to work exactly as you have written it.
In SQL Server (T-SQL), where LOG is the natural logarithm, you might have to do:
SELECT EXP(SUM(LOG([col])))
FROM table;