Displaying occurrences of NULL values and overall duplicates with SQL - sql

Given data such as the following, I need to generate a report showing the number of records with a NULL value and the number of duplicates, all in one SQL query if possible.
DES | VAL
--------------
Tango | 32
Zulu | [null]
Golf | 12
Golf | 12
Bravo | [null]
The report would look like:
NULLS | DUPLICATES
---------------------
2 | 1
I can get the nulls with something like SUM(CASE WHEN VAL IS NULL THEN 1 ELSE 0 END) AS NULLS, and the duplicates separately, but not in one query, so I don't even know if it's possible.
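A pitfall worth checking with this kind of expression: the simple form CASE VAL WHEN NULL compares VAL = NULL, which is never true, so it counts nothing, while the searched form CASE WHEN VAL IS NULL does count the NULL rows. The sketch below is only an illustration. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, and the table name report_data is made up.

```python
import sqlite3

# Hypothetical table name `report_data`; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_data (des TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO report_data VALUES (?, ?)",
    [("Tango", 32), ("Zulu", None), ("Golf", 12), ("Golf", 12), ("Bravo", None)],
)

# Simple CASE tests val = NULL (never true); searched CASE tests val IS NULL.
simple_case, searched_case = conn.execute(
    """
    SELECT SUM(CASE val WHEN NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN val IS NULL THEN 1 ELSE 0 END)
    FROM report_data
    """
).fetchone()

print(simple_case, searched_case)  # 0 2
```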

SELECT
(SELECT COUNT(*) FROM table_name WHERE val IS NULL)
AS NULLS,
(SELECT ( COUNT(val) - COUNT(DISTINCT(val)) ) FROM table_name)
AS DUPLICATES
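To check the query above against the sample data from the question, here is a minimal sketch that runs it on an in-memory SQLite database via Python's sqlite3 module (an assumption made for convenience; the thread itself targets SQL Server). Note that this version counts duplicates by VAL alone and ignores DES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (des TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("Tango", 32), ("Zulu", None), ("Golf", 12), ("Golf", 12), ("Bravo", None)],
)

# Two scalar subqueries combined into a single one-row report.
nulls, duplicates = conn.execute(
    """
    SELECT
        (SELECT COUNT(*) FROM table_name WHERE val IS NULL) AS nulls,
        (SELECT COUNT(val) - COUNT(DISTINCT val) FROM table_name) AS duplicates
    """
).fetchone()

print(nulls, duplicates)  # 2 1
```

COUNT(val) skips NULLs, so the NULL rows are never reported as duplicates of each other.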

Not sure how you want to count your duplicates so I included two versions.
declare @T table
(
DES varchar(10),
VAL int
)
insert into @T values
('Tango', 32),
('Zulu', null),
('Zulu', null),
('Zulu', null),
('Golf', 12),
('Golf', 12),
('Bravo', null)
select sum(case when T.VAL is null then C end) as NULLS,
sum(case when T.C > 1 then C-1 end) as DUPLICATES1,
sum(case when T.C > 1 then 1 end) as DUPLICATES2
from (
select VAL, count(*) as C
from @T
group by DES, VAL
) T
Result:
NULLS DUPLICATES1 DUPLICATES2
----------- ----------- -----------
4 3 2
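The two duplicate definitions differ in what they count: DUPLICATES1 is the number of surplus rows, DUPLICATES2 is the number of (DES, VAL) groups that occur more than once. As a cross-check of the result above, here is the same grouping logic sketched in plain Python with collections.Counter. It is not part of the original answer.

```python
from collections import Counter

rows = [
    ("Tango", 32), ("Zulu", None), ("Zulu", None), ("Zulu", None),
    ("Golf", 12), ("Golf", 12), ("Bravo", None),
]

# Equivalent of GROUP BY DES, VAL with COUNT(*) AS C.
groups = Counter(rows)

nulls = sum(c for (_des, val), c in groups.items() if val is None)
duplicates1 = sum(c - 1 for c in groups.values() if c > 1)  # surplus rows
duplicates2 = sum(1 for c in groups.values() if c > 1)      # repeated groups

print(nulls, duplicates1, duplicates2)  # 4 3 2
```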

Well if you have 2 selects returning scalar values that you want to combine into a simple report like that, you could do:
SELECT
2 AS NULLS,
DUPS
FROM (SELECT 1 AS DUPS) D
Results:
NULLS DUPS
----------- -----------
2 1
Replacing the two selects as needed.

Assuming (?!) that you want to count duplicate rows, this may come close to what you want:
declare @Foo as Table ( DES VarChar(10), VAL Int Null )
insert into @Foo ( DES, VAL ) values
( 'Tango', 32 ),
( 'Zulu', NULL ),
( 'Golf', 12 ), ( 'Golf', 12 ), ( 'Golf', 13 ),
( 'Bravo', NULL ),
( 'Whiskey', 8388 ), ( 'Whiskey', 8388 ), ( 'Whiskey', 8388 ), ( 'Whiskey', 8388 )
select * from @Foo
select distinct DES, VAL from @Foo
select ( select Count( 42 ) from @Foo where VAL is NULL ) as [NULLS],
( select Count( 42 ) from @Foo ) - Count( 42 ) as [DUPLICATES] from ( select distinct DES, VAL from @Foo ) as Elmer

Related

SQL count number of records where value remains constant

I need to find the count of tracker_id values whose position remains 1 throughout the table.
tracker_id | position
---------------------
5 | 1
11 | 1
4 | 1
4 | 2
5 | 2
4 | 1
4 | 1
11 | 1
14 | 1
9 | 2
Here, the output should be 2, since the position of tracker_id 11 and 14 remains 1 throughout the table.
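You can use NOT EXISTS. Count distinct tracker_ids rather than rows, because a tracker such as 11 appears more than once:
select count(distinct a.tracker_id) from tbl a
where not exists(select 1
from tbl b
where a.tracker_id = b.tracker_id
and a.position <> b.position )
and a.position = 1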
Output: 2
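With NOT EXISTS it matters whether rows or trackers are counted, because tracker 11 appears twice in the sample data. The following sketch shows both counts side by side. It runs on SQLite through Python's sqlite3 module, which is an assumption for illustration rather than the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (tracker_id INTEGER, position INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(5, 1), (11, 1), (4, 1), (4, 2), (5, 2),
     (4, 1), (4, 1), (11, 1), (14, 1), (9, 2)],
)

# COUNT(*) counts qualifying rows; COUNT(DISTINCT ...) counts qualifying trackers.
row_count, tracker_count = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT a.tracker_id)
    FROM tbl a
    WHERE a.position = 1
      AND NOT EXISTS (SELECT 1
                      FROM tbl b
                      WHERE b.tracker_id = a.tracker_id
                        AND b.position <> a.position)
    """
).fetchone()

print(row_count, tracker_count)  # 3 2
```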
declare @table1 as table (tracker_id int,position int)
insert into @table1 values (5,1)
insert into @table1 values (11,1)
insert into @table1 values (4,1)
insert into @table1 values (4,2)
insert into @table1 values (5,2)
insert into @table1 values (4,1)
insert into @table1 values (4,1)
insert into @table1 values (11,1)
insert into @table1 values (14,1)
insert into @table1 values (9,2)
select count(tracker_id),tracker_id,position from @table1 group by tracker_id,position
You can also do:
select ( count(distinct tracker_id) -
count(distinct tracker_id) filter (where position <> 1)
) as num_all_1s
from t;
Using uncorrelated subquery
select count(distinct tracker_id)
from t
where position=1
and tracker_id not in (select tracker_id from t where position<>1);
Using a window function (multiply by 1.0 so that integer averaging, as in SQL Server, does not truncate)
select count(distinct tracker_id)
from (select *, avg(position * 1.0) over (partition by tracker_id) as avg_pos from t) a
where avg_pos=1;
This one is just for giggles
select distinct count(*) over ()
from t
group by tracker_id
having count(*) = sum(position);
And if you really want to have fun
select count(distinct tracker_id)-count(distinct case when position<>1 then tracker_id end)
from t;
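Several of the variants above can be checked against each other on the sample data. The sketch below runs three of them on SQLite via Python's sqlite3 module. That choice is an assumption, made because these three need neither FILTER nor window functions. The HAVING variant is wrapped in a subquery instead of using count(*) over (). It relies on every position being at least 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tracker_id INTEGER, position INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(5, 1), (11, 1), (4, 1), (4, 2), (5, 2),
     (4, 1), (4, 1), (11, 1), (14, 1), (9, 2)],
)

variants = {
    "not_in": """
        SELECT COUNT(DISTINCT tracker_id) FROM t
        WHERE position = 1
          AND tracker_id NOT IN (SELECT tracker_id FROM t WHERE position <> 1)
    """,
    "conditional_distinct": """
        SELECT COUNT(DISTINCT tracker_id)
             - COUNT(DISTINCT CASE WHEN position <> 1 THEN tracker_id END)
        FROM t
    """,
    "having_sum": """
        SELECT COUNT(*) FROM (
            SELECT tracker_id FROM t
            GROUP BY tracker_id
            HAVING COUNT(*) = SUM(position)
        )
    """,
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in variants.items()}
print(results)  # every variant yields 2 for this data
```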
If you want the trackers whose position is only ever 1, you can use this: it finds all tracker_ids with a single distinct position value, then keeps only those whose position is 1:
WITH agg AS
(
SELECT
tracker_id
, p = MAX(position)
FROM table1
GROUP BY tracker_id
HAVING COUNT(DISTINCT position) = 1
)
SELECT COUNT(tracker_id)
FROM agg
WHERE p = 1

How can I select distinct by one column?

I have a table with the columns below. When a COD is duplicated, I need the row with the non-NULL VALUE; when it is not duplicated, the row may have a NULL VALUE, as in the example below.
I'm using SQL Server.
This is what I get:
COD ID VALUE
28 1 NULL
28 2 Supermarket
29 1 NULL
29 2 School
29 3 NULL
30 1 NULL
This is what I want:
COD ID VALUE
28 2 Supermarket
29 2 School
30 1 NULL
What I'm trying to do:
;with A as (
(select DISTINCT COD,ID,VALUE from CodId where ID = 2)
UNION
(select DISTINCT COD,ID,NULL from CodId where ID != 2)
)select * from A order by COD
You can try this.
DECLARE @T TABLE (COD INT, ID INT, VALUE VARCHAR(20))
INSERT INTO @T
VALUES(28, 1, NULL),
(28, 2 ,'Supermarket'),
(29, 1 ,NULL),
(29, 2 ,'School'),
(29, 3 ,NULL),
(30, 1 ,NULL)
;WITH CTE AS (
SELECT *, RN= ROW_NUMBER() OVER (PARTITION BY COD ORDER BY VALUE DESC) FROM @T
)
)
SELECT COD, ID ,VALUE FROM CTE
WHERE RN = 1
Result:
COD ID VALUE
----------- ----------- --------------------
28 2 Supermarket
29 2 School
30 1 NULL
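The ROW_NUMBER approach works because NULL sorts after every non-NULL value under ORDER BY VALUE DESC, so the non-NULL row of each COD gets RN = 1. A minimal sketch of the same query on SQLite through Python's sqlite3 module follows. It assumes SQLite 3.25 or later for window functions, and the table name cod_id is hypothetical. SQLite orders NULLs the same way as SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cod_id (cod INTEGER, id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO cod_id VALUES (?, ?, ?)",
    [(28, 1, None), (28, 2, "Supermarket"),
     (29, 1, None), (29, 2, "School"), (29, 3, None),
     (30, 1, None)],
)

# Rank rows per COD with non-NULL values first, then keep the top row.
picked = conn.execute(
    """
    WITH cte AS (
        SELECT cod, id, value,
               ROW_NUMBER() OVER (PARTITION BY cod ORDER BY value DESC) AS rn
        FROM cod_id
    )
    SELECT cod, id, value FROM cte WHERE rn = 1 ORDER BY cod
    """
).fetchall()

print(picked)  # [(28, 2, 'Supermarket'), (29, 2, 'School'), (30, 1, None)]
```

If a COD had several non-NULL values, this would keep only the alphabetically last one.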
Another option is to use the WITH TIES clause in concert with Row_Number()
Example
Select top 1 with ties *
from YourTable
Order By Row_Number() over (Partition By [COD] order by Value Desc)
Returns
COD ID VALUE
28 2 Supermarket
29 2 School
30 1 NULL
I would use GROUP BY and JOIN. If there is no non-NULL value for a COD, then it is resolved by the OR in the JOIN clause.
SELECT your_table.*
FROM your_table
JOIN (
SELECT COD, MAX(value) value
FROM your_table
GROUP BY COD
) gt ON your_table.COD = gt.COD and (your_table.value = gt.value OR gt.value IS NULL)
If a COD may have more than one non-NULL value, this will work:
drop table MyTable
CREATE TABLE MyTable
(
COD INT,
ID INT,
VALUE VARCHAR(20)
)
INSERT INTO MyTable
VALUES (28,1, NULL),
(28,2,'Supermarket'),
(28,3,'School'),
(29,1,NULL),
(29,2,'School'),
(29,3,NULL),
(30,1,NULL);
WITH Dups AS
(SELECT COD FROM MyTable GROUP BY COD HAVING count (*) > 1 )
SELECT MyTable.COD,MyTable.ID,MyTable.VALUE FROM MyTable
INNER JOIN dups ON MyTable.COD = Dups.COD
WHERE value IS NOT NULL
UNION
SELECT MyTable.COD,MyTable.ID,MyTable.VALUE FROM MyTable
LEFT JOIN dups ON MyTable.COD = Dups.COD
WHERE dups.cod IS NULL

Compare two number SQL

In SQL, I am trying to compare two numbers in the same field. Both numbers contain different information, but for some technical reason they are the same. The problem arises when one number has length 5 and another has length 4, and the last 4 digits of both are the same. I want to get the first one, with length 5.
Example:
--------------------------------
|ID | Number| Description |
---------------------------------
| 1 | 12345 | Project X,Ready |
---------------------------------
| 2 | 2345 | Project X,onDesign |
---------------------------------
I should always get 12345 (or the biggest one) if there are numbers with the same last 4 digits. Is there a CASE or CTE statement that gives an easy solution to this issue?
Try this:
SELECT Id
,Number
,Description
FROM (
SELECT Id
,Number
,Description
,rank() OVER (PARTITION BY right(cast([Number] AS VARCHAR(20)), 4) ORDER BY Number DESC) AS Ranking
FROM YourTable
) InnerTable
WHERE ranking = 1
Here is an example with not exists:
DECLARE @t TABLE
(
ID INT ,
Number INT ,
Description VARCHAR(100)
)
INSERT INTO @t
VALUES ( 1, 12345, 'Project 1' ),
( 2, 2345, 'Project 2' ),
( 3, 77777, 'Project 3' ),
( 4, 7777, 'Project 4' ),
( 5, 88888, 'Project 5' ),
( 6, 9999, 'Project 6' )
SELECT * FROM @t t1
WHERE NOT EXISTS(SELECT * FROM @t t2
WHERE t2.ID <> t1.ID AND
CAST(t2.Number AS VARCHAR(10)) LIKE '%' + CAST(t1.Number AS VARCHAR(10)))
Output:
ID Number Description
1 12345 Project 1
3 77777 Project 3
5 88888 Project 5
6 9999 Project 6
So you need to join on the last 4 digits. You can do this with the modulo operator, which is written as a percent sign (%) in SQL Server.
SELECT 12345 % 10000;
This outputs 2345. Exactly what we are looking for.
So we could build the following query to use that calculation:
DECLARE @Test TABLE
(
ID INT
, Number INT
, Description VARCHAR(500)
);
INSERT INTO @Test(ID, Number, Description)
VALUES (1, 12345, 'Project X,Ready')
, (2, 2345, 'Project X,onDesign');
SELECT T1.*
FROM @Test AS T1
INNER JOIN @Test AS T2
ON T2.Number = T1.Number % 10000
WHERE T2.Number <> T1.Number;
Output:
╔════╦════════╦═════════════════╗
║ ID ║ Number ║ Description ║
╠════╬════════╬═════════════════╣
║ 1 ║ 12345 ║ Project X,Ready ║
╚════╩════════╩═════════════════╝
Note that I've added WHERE T2.Number <> T1.Number. It eliminates equal numbers, because SELECT 2345 % 10000 is 2345 as well.
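One property of the self-join is that it returns only numbers that have a shorter partner, so a number with no counterpart is dropped. Whether that is acceptable depends on the requirement. The sketch below illustrates this with the six-row sample from the NOT EXISTS answer. It runs on SQLite through Python's sqlite3 module, and the table name projects is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, number INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, 12345, "Project 1"), (2, 2345, "Project 2"),
     (3, 77777, "Project 3"), (4, 7777, "Project 4"),
     (5, 88888, "Project 5"), (6, 9999, "Project 6")],
)

# Join each number to the row holding its last four digits, excluding itself.
joined = conn.execute(
    """
    SELECT t1.id, t1.number
    FROM projects AS t1
    INNER JOIN projects AS t2 ON t2.number = t1.number % 10000
    WHERE t2.number <> t1.number
    ORDER BY t1.id
    """
).fetchall()

print(joined)  # [(1, 12345), (3, 77777)]  (88888 and 9999 have no partner)
```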
Update
This could be done using ROW_NUMBER()
;WITH Data (ID, Number, Description, RN)
AS (
SELECT ID
, Number
, Description
, ROW_NUMBER() OVER (PARTITION BY Number % 10000 ORDER BY Number DESC)
FROM @Test
)
SELECT *
FROM Data
WHERE RN = 1;
This will do the classic row_number stuff. It will partition windows by Number % 10000, which means that 12345 and 2345 will fall under same window and the highest number will always come first.
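Unlike a self-join, the partitioned ROW_NUMBER keeps one row per group of last four digits, including numbers that have no shorter partner. Here is a minimal sketch on SQLite via Python's sqlite3 module, assuming version 3.25 or later for window functions. The table name projects and the six sample rows are borrowed from another answer in this thread for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [(1, 12345), (2, 2345), (3, 77777), (4, 7777), (5, 88888), (6, 9999)],
)

# One window per value of number % 10000; the largest number gets rn = 1.
largest = conn.execute(
    """
    SELECT id, number
    FROM (
        SELECT id, number,
               ROW_NUMBER() OVER (PARTITION BY number % 10000
                                  ORDER BY number DESC) AS rn
        FROM projects
    )
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(largest)  # [(1, 12345), (3, 77777), (5, 88888), (6, 9999)]
```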
Try this:
SELECT DISTINCT A.*
FROM [Tablename] AS A
INNER JOIN [Tablename] AS B
ON B.Number =RIGHT(A.Number,4)
WHERE B.Number <> A.Number;
RIGHT(A.Number,4) extracts the last 4 digits, which are then compared with B.Number.
The query might be RDBMS-specific. For example, with MSSQL you can do it like this:
SELECT *
FROM myTable AS d1
WHERE NOT EXISTS ( SELECT *
FROM myTable AS d2
WHERE SUBSTRING(d2.number, 2, 4) = d1.number );
EDIT: Ah, you edited and it is an INT! Then you can use the % operator instead of substring.
Sample with CTE:
DECLARE @dummy TABLE
(
id INT IDENTITY
PRIMARY KEY ,
number INT ,
[description] VARCHAR(20)
);
INSERT @dummy ( [number], [description] )
VALUES ( 12345, 'P' ),
( 22345, 'P' ),
( 2345, 'P' ),
( 3456, 'P' ),
( 13456, 'P' ),
( 4567, 'P' );
WITH d AS (
SELECT MAX(number) AS maxNum
FROM @dummy AS [d]
GROUP BY [d].[number] % 10000
)
SELECT d1.*
FROM @dummy AS [d1]
INNER JOIN d ON d.[maxNum] = d1.[number];

Count pair-wise occurrences in a T-SQL table

How can I count pair-wise occurrences in a SQL Server table? Please note that the order of the given sequence has to be accounted for and shouldn't be changed.
Original table:
1 2 3 4
--------
1 | A A A B
2 | A # don't count
3 | B A A
4 | B # don't count
Result:
1 | AA = 3
2 | AB = 1
3 | BB = 0
4 | BA = 1
In addition, the code has to work for large datasets.
Edit:
A pair in this context is a set of two values {x[ij], x[(i+1)j]}, where i=1,...,4 and j=1,...,4. Further, pairs that have the form A null or B null shouldn't be counted. Moreover, null A or null B can't happen, therefore they don't have to be accounted for.
I just want to point out a pretty easy way to express this logic:
with vals as (
select 'A' as val union all select 'B'
),
pairs as (
select t1.val as val1, t2.val as val2
from vals t1 cross join vals t2
)
select p.*,
(select count(*)
from original
where [1] = val1 and [2] = val2 or
[2] = val1 and [3] = val2 or
[3] = val1 and [4] = val2
) as cnt
from pairs p
order by cnt desc;
This doesn't have great performance characteristics, but that is easily fixed by using three subqueries and indexes on the data columns.
CREATE TABLE #tab([1] NVARCHAR(100), [2] NVARCHAR(100),
[3] NVARCHAR(100), [4] NVARCHAR(100));
INSERT INTO #tab
VALUES ('A', 'A', 'A', 'B') ,('A' , NULL ,NULL ,NULL )
,('B' ,'A' ,'A', NULL),('B', NULL, NULL, NULL);
WITH cte AS
(
SELECT pair = [1] + [2] FROM #tab
UNION ALL
SELECT pair = [2] + [3] FROM #tab
UNION ALL
SELECT pair = [3] + [4] FROM #tab
), cte2 AS
(
SELECT [1] AS val FROM #tab
UNION ALL SELECT [2] FROM #tab
UNION ALL SELECT [3] FROM #tab
UNION ALL SELECT [4] FROM #tab
), all_pairs AS
(
SELECT DISTINCT a.val + b.val AS pair
FROM cte2 a
CROSS JOIN cte2 b
WHERE a.val IS NOT NULL and b.val IS NOT NULL
)
SELECT a.pair, result = COUNT(c.pair)
FROM all_pairs a
LEFT JOIN cte c
ON a.pair = c.pair
GROUP BY a.pair;
How it works:
cte creates all adjacent pairs: (1,2), (2,3), (3,4)
cte2 gets all values from every column
all_pairs creates all possible pairs of values: AA, AB, BA, BB
The final query uses grouping and COUNT to get the number of occurrences.
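The steps just listed can be tried end to end with the following sketch, which runs on SQLite through Python's sqlite3 module (an assumption for illustration; the answer targets SQL Server). The columns are renamed c1 to c4, the table is called tab, and || replaces + for concatenation. A pair with a NULL member concatenates to NULL and is therefore never counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [("A", "A", "A", "B"), ("A", None, None, None),
     ("B", "A", "A", None), ("B", None, None, None)],
)

pair_counts = conn.execute(
    """
    WITH adjacent AS (            -- pairs from neighbouring columns
        SELECT c1 || c2 AS pair FROM tab
        UNION ALL SELECT c2 || c3 FROM tab
        UNION ALL SELECT c3 || c4 FROM tab
    ), vals AS (                  -- every value from every column
        SELECT c1 AS val FROM tab
        UNION ALL SELECT c2 FROM tab
        UNION ALL SELECT c3 FROM tab
        UNION ALL SELECT c4 FROM tab
    ), all_pairs AS (             -- every possible ordered pair
        SELECT DISTINCT a.val || b.val AS pair
        FROM vals a CROSS JOIN vals b
        WHERE a.val IS NOT NULL AND b.val IS NOT NULL
    )
    SELECT p.pair, COUNT(adj.pair)
    FROM all_pairs p
    LEFT JOIN adjacent adj ON adj.pair = p.pair
    GROUP BY p.pair
    ORDER BY p.pair
    """
).fetchall()

print(pair_counts)  # [('AA', 3), ('AB', 1), ('BA', 1), ('BB', 0)]
```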
EDIT:
You can concatenate result as below:
...
, final AS
(
SELECT a.pair, result = COUNT(c.pair), rn = ROW_NUMBER() OVER(ORDER BY a.pair)
FROM all_pairs a
LEFT JOIN cte c
ON a.pair = c.pair
GROUP BY a.pair
)
SELECT rn, [result] = pair + ' = ' + CAST(result AS NVARCHAR(100))
FROM final
with cte as (
select 1 as id, 'A' as [1], 'A' as [2], 'A' as [3], 'B' as [4]
union all select 2 , 'A', NULL,NULL,NULL
union all select 3 , 'B', 'A','A',NULL
union all select 4 , 'B',NULL,NULL,NULL
)
, Vals as (
select 'AA' as Val
union all select 'AB'
union all select 'BB'
union all select 'BA'
)
, UNPVT as (
/*UNPIVOT to convert the columns to be rows*/
SELECT id , VAL + LEAD(VAL) OVER (PARTITION BY ID ORDER BY SEQ) as Code
FROM (
select ID,[1],[2],[3],[4] from cte
) P
UNPIVOT (Val FOR Seq IN ([1],[2],[3],[4])
) AS UNPVT
)
select Vals.Val, count(UNPVT.Code) from UNPVT right join Vals on UNPVT.Code = Vals.Val
group by Vals.Val
CTE: contains your data.
Vals: contains the returned code.
UnPVT: to convert the columns to be rows.

Unpivot multiple columns not showing desire result

Original
RecordKey  Name  Section1_Product  Section1_Code  Section2_Product  Section2_Code  ......
1          a     ff                22
2          b     gg                22
3          c     hh                33
Desired result
RecordKey  Name  Section  Product  Code  ......
1          a     1        ff       22
1          a     2
2          b     1        gg       22
2          b     2
3          c     1        hh       22
3          c     2
I am trying to unpivot the columns into rows. Some sections will have null value.
SELECT RecordKey
,Name
,'Num_of_Sections' = ROW_NUMBER() OVER (PARTITION BY RecordKey ORDER BY ID)
,Product
,Code
FROM (
SELECT RecordKey, Name, Section1_Product, Section1_Code, Section2_Product, Section2_Code FROM Table
) M
UNPIVOT (
Product FOR ID IN (Section1_Product, Section2_Product)
) p
UNPIVOT (
Code FOR CO IN (Section1_Code, Section2_Code)
) c
If I execute with only one column (Product, comment out Code) then I will have 2 values in ID column (1,2). If I run the query with 2 columns then I get 4 values in ID column(1, 2, 3, 4).
Based on my assumptions and the data you provided, this can be achieved using CROSS APPLY and ROW_NUMBER:
declare @Record TABLE
([RecordKey] int,
[Name] varchar(1),
[Section1_Product] varchar(2),
[Section1_Code] int,
[Section2_Product] varchar(2),
[Section2_Code] int)
;
INSERT INTO @Record
([RecordKey], [Name], [Section1_Product], [Section1_Code],[Section2_Product],[Section2_Code])
VALUES
(1, 'a', 'ff', 22,NULL,NULL),
(2, 'b', 'gg', 22,NULL,NULL),
(3, 'c', 'hh', 33,NULL,NULL)
;
With cte as (
Select T.RecordKey,
T.Name,
T.val,
T.val1 from (
select RecordKey,Name,val,val1 from @Record
CROSS APPLY (VALUES
('Section1_Product',Section1_Product),
('Section2_Product',Section2_Product))cs(col,val)
CROSS APPLY (VALUES
('Section1_Code',Section1_Code),
('Section2_Code',Section2_Code))css(col1,val1)
WHERE val is NOT NULL)T
)
Select c.RecordKey,
c.Name,
c.RN,
CASE WHEN RN = 2 THEN NULL ELSE c.val END Product,
c.val1 Code
from (
Select RecordKey,
Name,
ROW_NUMBER()OVER(PARTITION BY val ORDER BY (SELECT NULL))RN,
val,
val1 from cte )C
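A possibly simpler way to avoid the row multiplication is to unpivot one section at a time, so that each section's product and code stay on the same row. In T-SQL this is commonly written as a single CROSS APPLY (VALUES ...) with one tuple per section. The sketch below is not the answer's method. It shows the same idea with UNION ALL on SQLite through Python's sqlite3 module, and its table and column names are hypothetical lower-case versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE record (
        record_key INTEGER, name TEXT,
        section1_product TEXT, section1_code INTEGER,
        section2_product TEXT, section2_code INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "a", "ff", 22, None, None),
     (2, "b", "gg", 22, None, None),
     (3, "c", "hh", 33, None, None)],
)

# One SELECT per section keeps product and code paired (no cross product).
unpivoted = conn.execute(
    """
    SELECT record_key, name, 1 AS section,
           section1_product AS product, section1_code AS code
    FROM record
    UNION ALL
    SELECT record_key, name, 2, section2_product, section2_code
    FROM record
    ORDER BY record_key, section
    """
).fetchall()

for row in unpivoted:
    print(row)
```

Each record yields exactly two rows, one per section, and an empty section shows NULL for both product and code.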