Add character to selected duplicate value in SQL - sql

I need to append a character or number to duplicate values in my query results.
This is what I need:
SELECT Name -- Here I need to append, for example, 1 if the name has duplicates
-- If that is hard to code, how can I append 1 to all selected values?
FROM Example
WHERE Id BETWEEN 25 AND 285
If there are two rows with the same name Peter, the query should return Peter for the first and Peter1 for the second.
If there is no easy way to do that, how can I append 1 to all selected values, so that it returns Peter1 instead of Peter?
I've tried this:
SELECT Name + ' 1' AS Name -- in this case selecting wrong column
FROM Example
WHERE Id BETWEEN 25 AND 285
EDIT
SELECT @cols += ([Name]) + ','
FROM (SELECT Name --I need to integrate it here
FROM FormFields
WHERE ID BETWEEN 50 AND 82
) a
If I use this:
SELECT @cols += ([Name]) + ',' -- this throws the error
FROM (SELECT Name + CASE WHEN RowNum = 1 THEN '' ELSE CONVERT(NVARCHAR(100), RowNum-1) END AS [UpdatedName]
FROM (
SELECT Name AS Name,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS "RowNum"
FROM FormFields
WHERE Id Between 50 And 82) x
) a
It throws the error: Invalid column name 'Name'.
EDIT 2
These are different tests, but some of them share the same criteria. That's why I need to rename them.

You can do this via getting the Row_Number and using a Case. Here's an example for SQL Server:
;With Cte As
(
Select Name, Row_Number() Over (Partition By Name Order By Name) RN
From Example
Where Id Between 25 And 285
)
Select Case When RN = 1 Then Name Else Name + Cast((RN - 1) As Varchar (3)) End As Name
From Cte

You could use the ROW_NUMBER function built into SQL server.
select Name + case when RowNum = 1 then '' else CONVERT(varchar(100), RowNum-1) end as "UpdatedName"
from (
select name as "Name",
ROW_NUMBER() over (partition by name order by name) as "RowNum"
from Example
Where Id Between 25 And 285) x
Please note that this still doesn't guarantee unique names. After all, someone could already have the name "MyName1", so if you had 2 people named "MyName" you'd still get 2 "MyName1" rows from this select statement.
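To illustrate the uniqueness caveat above, here is a minimal sketch in Python rather than T-SQL (the helper name `suffix_duplicates` is hypothetical, not something from the question). It appends a counter to repeated names the same way the ROW_NUMBER queries do, but skips any suffix that would collide with a name that already exists in the input:

```python
def suffix_duplicates(names):
    """Append 1, 2, ... to repeated names while keeping every result unique."""
    reserved = set(names)   # original spellings that must not be reused as suffixed names
    seen = set()            # values already emitted
    next_suffix = {}        # per-name counter for the next suffix to try
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its original value.
            seen.add(name)
            result.append(name)
            continue
        k = next_suffix.get(name, 1)
        candidate = f"{name}{k}"
        # Skip suffixes that clash with an existing or already emitted name.
        while candidate in reserved or candidate in seen:
            k += 1
            candidate = f"{name}{k}"
        next_suffix[name] = k + 1
        seen.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(suffix_duplicates(["Peter", "Peter", "Anna"]))
    print(suffix_duplicates(["MyName", "MyName1", "MyName"]))
```

In the second call the repeated "MyName" becomes "MyName2", because "MyName1" is already taken by a real row.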

This is a very unusual request; it looks like you are trying to "make a car run with wheels on the roof" :)
The root problem is almost certainly a wrong database design... Pivot is usually used for data summaries. If the same column contains "Peter" and "Peter" with different meanings, it looks like something is wrong. Or do you need to differentiate the two Peters for some other reason?
I do not understand what you are trying to achieve. If Peter is always Peter and you just want to avoid duplicates, you can simply use "group by Name". But that is what pivot does automatically... If Peter and Peter have two different meanings (like Peter1 and Peter2), you should think about changing the database structure, if possible.
Or try to explain in more detail what you are trying to achieve.
EDIT:
OK, now I understand the desired output. But what is the structure of your source data table(s)? From your schema it is clear that you need to build the PIVOT columns from
Testname+groupId
or
Testname+convert(varchar(100),groupId)
if groupId is a number. That is your Peter1, Peter2 composition. It will create the columns that you need. But I don't know where testname and groupId are located in your data tables. Do test names correspond to column NAMES or to VALUES stored in the DB? Is groupId something like TestId? Again, is it a column or a value? Provide more info about the source data structure if you need more help; your problem is not that complicated.

Since the columns have group IDs, concatenate the Column name with an Underscore and GroupID as a key value and when you display it, strip the underscore and trailing characters.
Like This:
SELECT @cols += ([Name]) + ','
FROM (SELECT Name + '_' + CAST(GroupId AS varchar(10)) AS [Name] -- the alias is needed so the outer query can reference [Name]
FROM FormFields
WHERE ID BETWEEN 50 AND 82
) a
I assume you are using this to build a dynamic SQL statement. I'm not sure what the schema of your FormFields Table is, but if it includes something like the test name you could append an AS [Name] + ' - ' +[TestName] to have the column header be something more useful. I would say try a PIVOT, but that could get pretty ungainly if the tests don't have the majority of the fields in common...
I also assume you are storing responses to these prompts in a table that looks something like this:
CREATE TABLE [Responses]
(
RespID int IDENTITY NOT NULL,
UserID int NOT NULL,
FieldID int NOT NULL,
RespVal varchar(100) NOT NULL -- or int, or whatever type the responses need
)
Then perhaps you have a [Test] table with some test metadata that acts as the primary key for your GroupID Foreign key in your FormFields table.
In your example you show responses across all columns, but I'm not sure how that would work since (unless I'm missing something in your explanation and the inferences I've made to your design) one set of responses would only be populated for one of the groups per row, unless you are aggregating responses, but then by what criteria? Perhaps the rows correspond to respondents and all respondents are required to answer across all form types. In that case, your output would work as a PIVOT like this:
DECLARE @sql varchar(4000) = ''
DECLARE @colList varchar(1000)
DECLARE @selList varchar(1000)
;WITH NameBase
AS
(
SELECT t.Name [TestName], f.Name [FieldName], f.GroupId
FROM [FormFields] f
INNER JOIN [Tests] t ON f.GroupID = t.ID
)
-- GroupId is assumed to be an int, so it is cast before concatenating
SELECT @colList = COALESCE(@colList + ',','') + QUOTENAME([FieldName] + '_' + CAST([GroupId] AS varchar(10)))
, @selList = COALESCE(@selList + ',','') + QUOTENAME([FieldName] + '_' + CAST([GroupId] AS varchar(10))) + ' AS ' + QUOTENAME([FieldName] + ' - ' + [TestName])
FROM NameBase
-- Quotes inside the dynamic SQL string are doubled
SELECT @sql = 'SELECT [UserName],' + @selList + ' FROM (
SELECT u.Name [UserName], f.Name + ''_'' + CAST(f.GroupId AS varchar(10)) [FieldName], r.RespVal [Response]
FROM Responses r
INNER JOIN [TestUsers] u ON r.UserID = u.ID
INNER JOIN [FormFields] f ON r.FieldID = f.ID) t
PIVOT (MAX([Response]) FOR [FieldName] IN (' + @colList + ')) pvt'
EXECUTE(@sql);
I haven't tested that yet, but it should at least point you in the right direction. I'll try to build a SqlFiddle to test it in a little bit.

Related

SQL query to obtain counts from two tables based on values and field names

I want to count the alerts of the candidates based on district.
Below is the district-wise alert lookup table
Table_LKP_AlertMastInfo
DistrictID FieldName AlertOptionValue
71 AreYouMarried Yes
71 Gender Female
72 AreYouMarried Yes
The FieldName values in Table_LKP_AlertMastInfo above should be matched against the columns of Table_RegistrationInfo, comparing each column's value with AlertOptionValue to get the counts.
Below is the candidate details table:
Table_RegistrationInfo
CandidateId DistrictID AreYouMarried Gender
Can001 71 Yes Female
Can002 71 No Female
Can003 72 Yes Man
Can004 72 No Man
I want output like below:
Can001 2
Can002 1
Can003 1
Explanation of the above output counts:
Can001 has AreYouMarried:Yes and Gender:Female, so the count is 2
Can002 has Gender:Female, so the count is 1
Can003 has AreYouMarried:Yes, so the count is 1
Can004 has no alerts
This won't be possible without dynamic SQL if your data is modeled like it is, i.e. key-value pairs in Table_LKP_AlertMastInfo and columns in Table_RegistrationInfo. So with that out of our way, let's do it. Full code to the stored procedure providing the exact results you need is at the end, I'll follow with the explanation on what it does.
Because the alerts are specified as key-value pairs (field name - field value), we'll first need to get the candidate data into the same format. UNPIVOT can fix this right up, if we can give it the list of fields. If we only had the two fields you mention in the question, it would be rather easy, something like:
SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN (AreYouMarried, Gender)) upvt
Of course that's not the case, so we'll need to dynamically select the list of the fields we're interested in and provide that. Since you're on 2008 R2, STRING_AGG is not yet available, so we'll use the XML trick to aggregate all the fields into a single string and provide it to the query above.
DECLARE @sql NVARCHAR(MAX)
-- Plain + concatenation is used because CONCAT is only available from SQL Server 2012 onwards
SELECT @sql = N'SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN ('
+ STUFF((
SELECT DISTINCT ',' + ami.FieldName
FROM Table_LKP_AlertMastInfo ami
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
+ N')) upvt'
PRINT @sql
This prints almost exactly the same query as the one I wrote above. Next, we need to store this data somewhere. Temporary tables to the rescue. Let's create one and insert into it using this dynamic SQL.
CREATE TABLE #candidateFields
(
CandidateID VARCHAR(50),
DistrictID INT,
FieldName NVARCHAR(200),
FieldValue NVARCHAR(1000)
);
INSERT INTO #candidateFields
EXEC sp_executesql @sql
-- (8 rows affected)
-- We could index this for good measure
CREATE UNIQUE CLUSTERED INDEX uxc#candidateFields on #candidateFields
(
CandidateId, DistrictId, FieldName, FieldValue
);
Great, with that out of the way, we now have both data sets - alerts and candidate data - in the same format. It's a matter of joining to find matches between both:
SELECT cf.CandidateID, COUNT(*) AS matches
FROM #candidateFields cf
INNER
JOIN Table_LKP_AlertMastInfo alerts
ON alerts.DistrictID = cf.DistrictID
AND alerts.FieldName = cf.FieldName
AND alerts.AlertOptionValue = cf.FieldValue
GROUP BY cf.CandidateID
Provides the desired output for the sample data:
CandidateID matches
-------------------------------------------------- -----------
Can001 2
Can002 1
Can003 1
(3 rows affected)
So we can stitch all that together now to form a reusable stored procedure:
CREATE PROCEDURE dbo.findMatches
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql NVARCHAR(MAX)
-- Plain + concatenation is used because CONCAT is only available from SQL Server 2012 onwards
SELECT @sql = N'SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN ('
+ STUFF((
SELECT DISTINCT ',' + ami.FieldName
FROM Table_LKP_AlertMastInfo ami
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
+ N')) upvt'
CREATE TABLE #candidateFields
(
CandidateID VARCHAR(50),
DistrictID INT,
FieldName NVARCHAR(200),
FieldValue NVARCHAR(1000)
);
INSERT INTO #candidateFields
EXEC sp_executesql @sql
CREATE UNIQUE CLUSTERED INDEX uxc#candidateFields on #candidateFields
(
CandidateId, DistrictId, FieldName
);
SELECT cf.CandidateID, COUNT(*) AS matches
FROM #candidateFields cf
JOIN Table_LKP_AlertMastInfo alerts
ON alerts.DistrictID = cf.DistrictID
AND alerts.FieldName = cf.FieldName
AND alerts.AlertOptionValue = cf.FieldValue
GROUP BY cf.CandidateID
END;
Execute with
EXEC dbo.findMatches
You'd of course need to adjust types and probably add a bunch of other things here, like error handling, but this should get you started on the right path. You'll want a covering index on that alert table and it should be pretty fast even with a lot of records.
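For readers who want to try the unpivot-then-join idea without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. It is not the stored procedure above: SQLite has no UNPIVOT, so the sketch assumes a UNION ALL per field as a stand-in, and the helper names `build_sample_db` and `count_alerts` are made up for illustration. The table and column names are the ones from the question.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table_LKP_AlertMastInfo (DistrictID INT, FieldName TEXT, AlertOptionValue TEXT);
        INSERT INTO Table_LKP_AlertMastInfo VALUES
            (71, 'AreYouMarried', 'Yes'), (71, 'Gender', 'Female'), (72, 'AreYouMarried', 'Yes');
        CREATE TABLE Table_RegistrationInfo (CandidateId TEXT, DistrictID INT, AreYouMarried TEXT, Gender TEXT);
        INSERT INTO Table_RegistrationInfo VALUES
            ('Can001', 71, 'Yes', 'Female'), ('Can002', 71, 'No', 'Female'),
            ('Can003', 72, 'Yes', 'Man'), ('Can004', 72, 'No', 'Man');
    """)
    return conn


def count_alerts(conn):
    """Return (CandidateId, match count) for candidates with at least one alert."""
    # Only field names that are real columns are used to build the query text.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(Table_RegistrationInfo)")}
    fields = [row[0] for row in conn.execute(
        "SELECT DISTINCT FieldName FROM Table_LKP_AlertMastInfo ORDER BY FieldName")]
    fields = [f for f in fields if f in columns]
    if not fields:
        return []
    # One SELECT per field turns the columns into (FieldName, FieldValue) rows.
    # Assumes column names contain no quote characters.
    unpivot = " UNION ALL ".join(
        "SELECT CandidateId, DistrictID, '{0}' AS FieldName, \"{0}\" AS FieldValue "
        "FROM Table_RegistrationInfo".format(f) for f in fields)
    query = (
        "SELECT cf.CandidateId, COUNT(*) FROM (" + unpivot + ") cf "
        "JOIN Table_LKP_AlertMastInfo a "
        "ON a.DistrictID = cf.DistrictID AND a.FieldName = cf.FieldName "
        "AND a.AlertOptionValue = cf.FieldValue "
        "GROUP BY cf.CandidateId ORDER BY cf.CandidateId")
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for candidate, matches in count_alerts(build_sample_db()):
        print(candidate, matches)
```

The join condition is the same three-part match as in the T-SQL version: district, field name and field value.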
I managed to get the expected result without using dynamic queries.
Not sure if this is what you are looking for:
SELECT DISTINCT
c.CandidateId, SUM(a.AreYouMarriedAlert + a.GenderAlter) AS AlterCount
FROM
Table_RegistrationInfo c
OUTER APPLY
(
SELECT
CASE
WHEN a.FieldName = 'AreYouMarried' AND c.AreYouMarried = a.AlertOptionValue THEN 1
ELSE 0
END AS AreYouMarriedAlert,
CASE
WHEN a.FieldName = 'Gender' AND c.Gender = a.AlertOptionValue THEN 1
ELSE 0
END AS GenderAlter
FROM
Table_LKP_AlertMastInfo a
WHERE
a.DistrictID = c.DistrictID
) a
GROUP BY c.CandidateId
HAVING SUM(a.AreYouMarriedAlert + a.GenderAlter) > 0
Results:
I assume that with 100 fields you have a set of alerts which are a combination of values. Further, I assume that you can always produce the select list in a fixed order. So
select candidateid,
AreyouMarried || '|' || Gender all_responses_in_one_string
from ....
is possible. The above will return
candidateid all_responses_in_one_string
can001 Yes|Female
can002 No|Female
So now your alert can be a regular expression for the concatenated string. And your alert is based on how much you matched.
Here is one simple way of doing this:
SELECT subq.*
FROM
(SELECT CandidateId,
(SELECT COUNT(*)
FROM Table_LKP_AlertMastInfo ami
WHERE ami.DistrictID = ri.DistrictID
AND ami.FieldName ='AreYouMarried'
AND ami.AlertOptionValue = ri.AreYouMarried) +
(SELECT COUNT(*)
FROM Table_LKP_AlertMastInfo ami
WHERE ami.DistrictID = ri.DistrictID
AND ami.FieldName ='Gender'
AND ami.AlertOptionValue = ri.Gender) AS [count]
FROM Table_RegistrationInfo ri) subq
WHERE subq.[count] > 0;
See SQL Fiddle demo.
I am not sure if this can be completely done using SQL. If you are using some backend technology such as ADO.NET, then you can store the results in Datatables. Loop through the column names and do the comparison.
Dynamic SQL can be used to make Table_LKP_AlertMastInfo look like Table_RegistrationInfo.
This script can be used in a stored procedure and results can be retrieved in a Datatable.
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PivotFieldNameList nvarchar(MAX)
SET @SQL = ''
SET @PivotFieldNameList = ''
SELECT @PivotFieldNameList = @PivotFieldNameList + FieldName + ', '
FROM (SELECT DISTINCT FieldName FROM Table_LKP_AlertMastInfo) S
SET @PivotFieldNameList = SUBSTRING(@PivotFieldNameList, 1, LEN(@PivotFieldNameList) - 1)
--SELECT @PivotFieldNameList
SET @SQL = ' SELECT DistrictId, ' + @PivotFieldNameList + ' FROM
Table_LKP_AlertMastInfo
PIVOT
( MAX(AlertOptionValue)
FOR FieldName IN (' + @PivotFieldNameList + '
) ) AS p '
PRINT @SQL
EXEC(@SQL)
The above query returns:
DistrictId AreYouMarried Gender
71 Yes Female
72 Yes NULL
If you get results from Table_RegistrationInfo into another Datatable, then both can be used for comparison.
Not tested but this should do the trick:
SELECT CandidateId,
( CASE
WHEN AreYouMarried = 'Yes' AND Gender = 'Female' THEN 2
WHEN Gender = 'Female' THEN 1
WHEN AreYouMarried = 'Yes' THEN 1
ELSE 0 END
) as CandidateValue
FROM
(SELECT * FROM Table_LKP_AlertMastInfo) as Alert
LEFT JOIN
(SELECT * FROM Table_RegistrationInfo) as Registration
ON (Alert.DistrictID = Registration.DistrictID);
This should give you a list with candidateId matching the condition count

SQL Server : add 'Type' to every column in table

I have a table DemoTable in SQL Server. And it has these columns:
Column1, Column2, Column3
I want to query the table
select * from DemoTable
but in query results I want to concatenate Type_ to all the column names available in DemoTable.
So the result of this query should be showing columns
Type_Column1, Type_Column2, Type_Column3
Is there any function or any way to achieve this?
Note: the table has N columns, not just 3, so renaming them manually is not an option.
If the problem is as you say:
After joining all the tables , there are many duplicate column names
then the typical solution is to NOT use *. So instead of this:
SELECT *
FROM A
JOIN B ON ...
JOIN C ON ...
... you should consider using a custom column set, which is the normal and recommended way to do this, as in the following example:
SELECT A.Column1, A.Column2, B.Column3, C.Column4, C.Column5
FROM A
JOIN B ON ...
JOIN C ON ...
Here's one way to automate your task using dynamic SQL:
use MY_DATABASE;
go
--here you specify all your parameters, names should be self-explanatory
declare @sql varchar(1000) = 'select ',
@tableName varchar(100) = 'DemoTable',
@prefix varchar(10) = 'Type_';
select @sql = @sql + name + ' as ' + @prefix + name + ',' from sys.columns
where object_name(object_id) = @tableName;
set @sql = left(@sql, len(@sql) - 1) + ' from ' + @tableName;
exec(@sql);
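The same idea (read the column list from the catalog, then generate an aliased select list) can be tried out with Python's sqlite3 module. This is only a sketch: `PRAGMA table_info` stands in for `sys.columns`, and the function name `select_with_prefix` is hypothetical. It assumes the table name comes from a trusted source, since it is pasted into the query text.

```python
import sqlite3


def select_with_prefix(conn, table, prefix):
    """Run SELECT over all columns of `table`, aliasing each one as prefix + name."""
    # Column names come from the catalog, like sys.columns in the T-SQL version.
    cols = [row[1] for row in conn.execute("PRAGMA table_info({})".format(table))]
    select_list = ", ".join('"{0}" AS "{1}{0}"'.format(c, prefix) for c in cols)
    cur = conn.execute("SELECT {} FROM {}".format(select_list, table))
    headers = [d[0] for d in cur.description]  # the aliased column names
    return headers, cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DemoTable (Column1 INT, Column2 INT, Column3 INT)")
    conn.execute("INSERT INTO DemoTable VALUES (1, 2, 3)")
    print(select_with_prefix(conn, "DemoTable", "Type_"))
```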
Some general remarks:
Naming your result set's columns dynamically requires dynamic SQL in any case. There is no way around it...
Naming columns to carry extra information is - in most cases - a very bad idea.
The only way I know to deal with the asterisk in a SELECT * FROM ... and still get full control over the column names and types is XML.
Try this:
SELECT TOP 10 *
FROM sys.objects
FOR XML RAW, ROOT('TableDef'),ELEMENTS, XMLSCHEMA,TYPE
This will return the 10 first rows of sys.objects. The result is an XML, where the rows follow an XML schema definition.
It is possible (but sure not the best in performance) to create a fully inlined query dynamically. The result will be an EAV list carrying everything you need.
WITH PrepareForXml(QueryAsXml) AS
(
SELECT
(
SELECT TOP 10 *
FROM sys.objects
FOR XML RAW, ROOT('TableDef'),ELEMENTS, XMLSCHEMA,TYPE
)
)
,AllRows AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) RowIndex
,rw.query('.') theRowXml
FROM PrepareForXml
CROSS APPLY QueryAsXml.nodes('TableDef/*:row') A(rw)
)
SELECT RowIndex
,B.ColumnName
,B.ColumnValue
,COALESCE(
(SELECT QueryAsXml.value('declare namespace xsd="http://www.w3.org/2001/XMLSchema";
(TableDef
/xsd:schema
/xsd:element
/xsd:complexType
/xsd:sequence
/xsd:element[@name=sql:column("ColumnName")]
/@type )[1]','nvarchar(max)')
FROM PrepareForXml)
,(SELECT QueryAsXml.value('declare namespace xsd="http://www.w3.org/2001/XMLSchema";
(TableDef
/xsd:schema
/xsd:element
/xsd:complexType
/xsd:sequence
/xsd:element[@name=sql:column("ColumnName")]
/xsd:simpleType
/xsd:restriction
/@base)[1]','nvarchar(max)')
FROM PrepareForXml)
) AS ColumnType
FROM AllRows
CROSS APPLY theRowXml.nodes('*:row/*') A(col)
CROSS APPLY (SELECT col.value('local-name(.)','nvarchar(max)') ColumnName
,col.value('(./text())[1]','nvarchar(max)') ColumnValue ) B;
This is the beginning of the result-set:
RowIndex ColumnName ColumnValue ColumnType
1 name sysrscols sqltypes:nvarchar
1 object_id 3 sqltypes:int
1 schema_id 4 sqltypes:int
[...many more...]
I don't know what you actually need, but it might be enough to export the XML as is. Everything is in there...
UPDATE: I did not read carefully enough...
You want to work around the fact that a result set's column names must be unique in order to continue with this...
The approach above will not solve this issue. Sorry.
I won't delete this immediately... There might be some hints you can get out of it...
You can use the following query to add 'Type' to every column in table:
SELECT Column1 AS Type_Column1, Column2 AS Type_Column2, Column3 AS Type_Column3
FROM DemoTable

Dynamic SQL: Grouping by one variable, counting another for column names

I am trying to do a dynamic sql query, similar to some that have appeared on this forum, but for the life of me, I cannot get it to work.
I am using SQL Server 2008. I have a table with a series of order_ref numbers. Each of these numbers has a varying number of advice_refs associated with it. advice_ref numbers are unique (they are a key from another table). There is at least one advice_ref for each order_ref. There are a bunch of columns that describe information for each advice_ref.
What I want to do is create a table with a row for each unique order_ref, with columns for each advice_ref, in ascending order. The columns would be Advice01, Advice02, ....Advice10, Advice11, etc. Not all the Advice# columns would be filled in for every order_ref and the number of advice# columns would depend on the order_ref with the greatest number of advice_refs.
The table would look like:
Order Advice01 Advice02 Advice03 Advice04.....
1 1 2 3
2 5 8 9 20
3 25
The code I've tried to use is:
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PVT NVARCHAR(MAX)
SELECT @SQL = @SQL + ', COALESCE(' + QUOTENAME('Advice' + RowNum) + ', '''') AS ' + QUOTENAME('Advice' + RowNum),
@PVT = @PVT + ', ' + QUOTENAME('Advice' + RowNum)
FROM (SELECT case when RowNum2 < 10 then '0'+RowNum2 when RowNum2 >=10 then RowNum2 end [RowNum] From
( SELECT DISTINCT CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref)) [RowNum2]
FROM [ED_dups].[dbo].[NewEDDupsLongForm]
) rn2 ) rn
SET @SQL = 'SELECT order_ref' + @SQL + '
FROM ( SELECT order_ref,
advice_ref,
case when CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref)) < 10
then ''Advice0'' + CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref))
else ''Advice'' + CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref))
end [AdviceID]
FROM [ED_dups].[dbo].[NewEDDupsLongForm]
) data
PIVOT
( MAX(advice_ref)
FOR AdviceID IN (' + STUFF(@PVT, 1, 2, '') + ')
) pvt'
EXECUTE SP_EXECUTESQL @SQL
SQL server tells me that the query executed successfully, but there is no output. When I run snippets of the code, it seems that the problem either lies in the pivot statement, near
+ STUFF(@PVT, 1, 2, '') + ')
and/or in the select statement, near
''Advice0'' +
Thanks in advance for any help--I've been at this for days!
I think you have to initialize the variables, like
DECLARE @SQL NVARCHAR(MAX) = ''
DECLARE @PVT NVARCHAR(MAX) = ''
or
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PVT NVARCHAR(MAX)
SELECT @SQL = '', @PVT = ''
Otherwise @SQL stays NULL, because concatenating anything to NULL yields NULL, and executing a NULL statement produces no output.
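The NULL propagation is easy to see in isolation. The sketch below uses SQLite's `||` operator through Python's sqlite3 module as a stand-in for T-SQL's `+` (an assumption made for convenience; under SQL Server's default settings string concatenation with NULL behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An uninitialised variable is NULL, and NULL concatenated with text stays NULL.
uninitialised = conn.execute("SELECT NULL || ', Advice01'").fetchone()[0]

# Starting from an empty string keeps the concatenated text.
initialised = conn.execute("SELECT '' || ', Advice01'").fetchone()[0]

print(repr(uninitialised))
print(repr(initialised))
```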
The first thing that comes to mind is: do you really need SQL to fetch a dataset with a dynamic number of columns? If you are writing an application, then your user interface, be it a web page or a desktop form, would be a much nicer place to transform your data into the desired structure.
If you really need to do it in SQL, you will make your life much easier if you do not try to do everything in one big and rather complicated query, but split it into smaller tasks done step by step. What I would do is use temporary tables to store working results, then use cursors to process order by order and advice by advice while inserting the data into a temporary table or tables, and in the end return the contents of that table. Wrap everything in a stored procedure.
This method is also easier to debug: you can check at every step whether it did what it was expected to do.
And a final piece of advice: share the definition of your NewEDDupsLongForm table, so that someone can write some code to help you out.
cheers

Find All Rows With Null Value(s) in Any Column

I'm trying to create a query that will return all the rows that have a null value across all but 1 column. Some rows will have more than one null entry somewhere. There is one column I will want to exclude, because at this moment in time all of the entries are null and it is the only column that is allowed to have null values. I am stuck because I don't know how to include all of the columns in the WHERE.
SELECT *
FROM Analytics
WHERE * IS NULL
Alternatively, I can do a count for one column, but the table has about 67 columns.
SELECT COUNT(*)
FROM Analytics
WHERE P_Id IS NULL
In SQL Server you can borrow the idea from this answer
;WITH XMLNAMESPACES('http://www.w3.org/2001/XMLSchema-instance' as ns)
SELECT *
FROM Analytics
WHERE (SELECT Analytics.*
FOR xml path('row'), elements xsinil, type
).value('count(//*[local-name() != "colToIgnore"]/@ns:nil)', 'int') > 0
SQL Fiddle
Constructing a query that lists all 67 columns will likely be more efficient, but this approach saves some typing and avoids needing dynamic SQL to generate that query.
Depending on which RDBMS you're using, I think your only option (rather than explicitly saying WHERE col1 IS NULL and col2 IS NULL and col3 IS NULL ...) would be to use Dynamic SQL.
For example, if you want to get all the column names from a SQL Server database, you could use something like this to return those names:
SELECT
name
FROM
sys.columns
WHERE
object_id = OBJECT_ID('DB.Schema.Table')
You could use FOR XML to create your WHERE clause:
SELECT Name + ' IS NULL AND ' AS [text()]
FROM sys.columns c1
WHERE object_id = OBJECT_ID('DB.Schema.Table')
ORDER BY Name
FOR XML PATH('')
Hope this helps get you started.
Good luck.
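As a rough illustration of generating the WHERE clause from the column catalog, here is a sketch using Python's sqlite3 module, with `PRAGMA table_info` standing in for `sys.columns`. The function name `rows_with_nulls` and the sample columns are made up for the example. The conditions are joined with OR, on the assumption that the goal is rows with a NULL in any checked column, as the question title says.

```python
import sqlite3


def rows_with_nulls(conn, table, ignore=()):
    """Return rows of `table` that have a NULL in any column not listed in `ignore`."""
    cols = [row[1] for row in conn.execute("PRAGMA table_info({})".format(table))]
    checked = [c for c in cols if c not in ignore]
    if not checked:
        return []
    # One "col IS NULL" test per column, joined with OR.
    where = " OR ".join('"{}" IS NULL'.format(c) for c in checked)
    query = "SELECT * FROM {} WHERE {} ORDER BY rowid".format(table, where)
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Analytics (P_Id INT, A TEXT, B TEXT, Notes TEXT)")
    conn.executemany("INSERT INTO Analytics VALUES (?, ?, ?, ?)",
                     [(1, "x", "y", None), (2, None, "y", None), (None, "x", "y", None)])
    # Notes is the one column that is allowed to be NULL, so it is skipped.
    print(rows_with_nulls(conn, "Analytics", ignore=("Notes",)))
```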
I don't have such a table to test, assuming there is no 'x' as data in any field, I think this should work on Sql-Server; (DEMO)
NOTE: I have filtered keyColumn as c.name != 'keyColumn'
DECLARE @S NVARCHAR(max), @Columns VARCHAR(50), @Table VARCHAR(50)
SELECT @Columns = '66', --Number of cols without keyColumn
@Table = 'myTable'
SELECT @S = ISNULL(@S+'+ ','') + 'isnull(convert(nvarchar, ' + c.name + '),''x'')'
FROM sys.all_columns c
WHERE c.object_id = OBJECT_ID(@Table) AND c.name != 'keyColumn'
exec('select * from '+@Table+' where ' + @S + '= replicate(''x'',' + @Columns + ')')
For a SQL beginner like me, all the queries above seem hard to digest. I think the quickest way is just to write the query out for all 67 columns. It's basically a copy-and-paste process.
E.g:
select count(*) from user where id is null or
name is null or
review_count is null or
yelping_since is null or
useful is null or
funny is null;
This is for SQLite, so slightly different, but I had the same problem and I ended up writing this small Python script findnulls.py to find all null values in a database:
import sqlite3
import sys
def main():
    dbfile = sys.argv[1]
    conn = sqlite3.connect(dbfile)
    c = conn.cursor()
    tables = c.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
    for table in tables:
        tablename = table[0]
        cols = c.execute("PRAGMA table_info({0})".format(tablename)).fetchall()
        for col in cols:
            colname = col[1]
            nullvals = c.execute("SELECT COUNT(*) FROM {0} WHERE {1} IS NULL;".format(tablename, colname)).fetchall()
            if nullvals[0][0] != 0:
                print("found {0} nulls in table: {1} column {2}".format(nullvals[0][0], tablename, colname))
main()
You would run it like this:
findnulls.py somedatabase.db
And the output looks like this:
found 1 nulls in table: Channels column MessageId
found 1 nulls in table: Channels column MessageChannel
found 1 nulls in table: Channels column SignalType
found 7 nulls in table: Channels column MinVal
found 7 nulls in table: Channels column MaxVal
found 9 nulls in table: Channels column AvgVal
found 9 nulls in table: Channels column Median
found 9 nulls in table: Channels column StdDev

How can I pivot these key+values rows into a table of complete entries?

Maybe I demand too much from SQL but I feel like this should be possible. I start with a list of key-value pairs, like this:
'0:First, 1:Second, 2:Third, 3:Fourth'
etc. I can split this up pretty easily with a two-step parse that gets me a table like:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 1 1
3 1 Second
etc.
Now, in the simple case of splitting the pairs into a pair of columns, it's fairly easy. I'm interested in the more advanced case where I might have multiple values per entry, like:
'0:First:Fishing, 1:Second:Camping, 2:Third:Hiking'
and such.
In that generic case, I'd like to find a way to take my 3-column result table and somehow pivot it to have one row per entry and one column per value-part.
So I want to turn this:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 0 Fishing
3 1 1
4 1 Second
5 1 Camping
Into this:
Entry [1] [2] [3]
0 0 First Fishing
1 1 Second Camping
Is that just too much for SQL to handle, or is there a way? Pivots (even tricky dynamic pivots) seem like an answer, but I can't figure how to get that to work.
No, in SQL you can't infer columns dynamically based on the data found during the same query.
Even using the PIVOT feature in Microsoft SQL Server, you must know the columns when you write the query, and you have to hard-code them.
You have to do a lot of work to avoid storing the data in a relational normal form.
Alright, I found a way to accomplish what I was after. Strap in, this is going to get bumpy.
So the basic problem is to take a string with two kinds of delimiters: entries and values. Each entry represents a set of values, and I wanted to turn the string into a table with one column for each value per entry. I tried to make this a UDF, but the necessity for a temporary table and dynamic SQL meant it had to be a stored procedure.
CREATE PROCEDURE [dbo].[ParseValueList]
(
@parseString varchar(8000),
@itemDelimiter CHAR(1),
@valueDelimiter CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;
IF object_id('tempdb..#ParsedValues') IS NOT NULL
BEGIN
DROP TABLE #ParsedValues
END
CREATE TABLE #ParsedValues
(
EntryID int,
[Rank] int,
[Value] varchar(200) -- named [Value] so the PIVOT further down can reference it
)
So that's just basic set up, establishing the temp table to hold my intermediate results.
;WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),--Brute forces 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --Uses a cross join to generate 100 rows (10 * 10)
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --Uses a cross join to generate 10,000 rows (100 * 100)
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E4)
That beautiful piece of SQL comes from SQL Server Central's Forums and is credited to "a guru." It's a great little 10,000 line tally table perfect for string splitting.
INSERT INTO #ParsedValues
SELECT ItemNumber AS EntryID, ROW_NUMBER() OVER (PARTITION BY ItemNumber ORDER BY T1.N) AS [Rank], --Ordering by position keeps the values in their original order
SUBSTRING(Items.Item, T1.N, CHARINDEX(@valueDelimiter, Items.Item + @valueDelimiter, T1.N) - T1.N) AS [Value]
FROM(
SELECT ROW_NUMBER() OVER (ORDER BY T2.N) AS ItemNumber,
SUBSTRING(@parseString, T2.N, CHARINDEX(@itemDelimiter, @parseString + @itemDelimiter, T2.N) - T2.N) AS Item
FROM cteTally T2
WHERE T2.N < LEN(@parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(@itemDelimiter + @parseString, T2.N, 1) = @itemDelimiter
) AS Items, cteTally T1
WHERE T1.N < LEN(@parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(@valueDelimiter + Items.Item, T1.N, 1) = @valueDelimiter
OK, this is the first really dense, meaty part. The inner select breaks up my string along the item delimiter (the comma), using the guru's string splitting method. That table is then passed up to the outer select, which does the same thing to each row, this time using the value delimiter (the colon). The inner ROW_NUMBER (EntryID) and the outer ROW_NUMBER over the partition (Rank) are key to the pivot. EntryID shows which item the values belong to, and Rank shows the ordinal of the values.
DECLARE @columns varchar(200)
DECLARE @columnNames varchar(2000)
DECLARE @query varchar(8000)
SELECT @columns = COALESCE(@columns + ',[' + CAST([Rank] AS varchar) + ']', '[' + CAST([Rank] AS varchar)+ ']'),
@columnNames = COALESCE(@columnNames + ',[' + CAST([Rank] AS varchar) + '] AS Value' + CAST([Rank] AS varchar)
, '[' + CAST([Rank] AS varchar)+ '] AS Value' + CAST([Rank] AS varchar))
FROM (SELECT DISTINCT [Rank] FROM #ParsedValues) AS Ranks
SET @query = '
SELECT '+ @columnNames +'
FROM #ParsedValues
PIVOT
(
MAX([Value]) FOR [Rank]
IN (' + @columns + ')
) AS pvt'
EXECUTE(@query)
DROP TABLE #ParsedValues
END
And at last, the dynamic sql that makes it possible. By getting a list of Distinct Ranks, we set up our column list. This is then written into the dynamic pivot which tilts the values over and slots each value into the proper column, each with a generic "Value#" heading.
Thus by calling EXEC ParseValueList with a properly formatted string of values, we can break it up into a table to feed into our purposes! It works (but is probably overkill) for simple key:value pairs, and scales up to a fair number of columns (About 50 at most, I think, but that'd be really silly.)
Anyway, hope that helps anyone having a similar issue.
(Yeah, it probably could have been done in something like SQLCLR as well, but I find a great joy in solving problems with pure SQL.)
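For comparison, the same two-level split is short outside the database. The following is only a sketch in Python, not a translation of the stored procedure: the function name `parse_value_list` is made up, and it assumes that whitespace around items and values should be trimmed and that short entries are padded with None so every row has the same number of columns.

```python
def parse_value_list(text, item_delimiter=",", value_delimiter=":"):
    """Split 'a:b:c, d:e:f' into one tuple per entry, one element per value."""
    rows = []
    for entry in text.split(item_delimiter):
        entry = entry.strip()
        if entry:
            # The position in this list plays the role of the Rank column.
            rows.append([value.strip() for value in entry.split(value_delimiter)])
    # Pad short entries so all rows share the width of the widest one.
    width = max((len(row) for row in rows), default=0)
    return [tuple(row + [None] * (width - len(row))) for row in rows]


if __name__ == "__main__":
    for row in parse_value_list("0:First:Fishing, 1:Second:Camping, 2:Third:Hiking"):
        print(row)
```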
Though probably not optimal, here's a more condensed solution.
DECLARE @DATA varchar(max);
SET @DATA = '0:First:Fishing, 1:Second:Camping, 2:Third:Hiking';
SELECT
DENSE_RANK() OVER (ORDER BY [Data].[row]) AS [Entry]
, [Data].[row].value('(./B/text())[1]', 'int') as "[1]"
, [Data].[row].value('(./B/text())[2]', 'varchar(64)') as "[2]"
, [Data].[row].value('(./B/text())[3]', 'varchar(64)') as "[3]"
FROM
(
SELECT
CONVERT(XML, '<A><B>' + REPLACE(REPLACE(@DATA , ',', '</B></A><A><B>'), ':', '</B><B>') + '</B></A>').query('.')
) AS [T]([c])
CROSS APPLY [T].[c].nodes('/A') AS [Data]([row]);
Hope it's not too late.
You can use the RANK function to get the position of each Item within its PairNumber, and then use PIVOT:
SELECT PairNumber, [1] ,[2] ,[3]
FROM
(
SELECT PairNumber, Item, RANK() OVER (PARTITION BY PairNumber order by EntryNumber) as RANKing
from tabla) T
PIVOT
(MAX(Item)
FOR RANKing in ([1],[2],[3])
)as PVT