Group by absorb NULL unless it's the only value - sql

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019

You can use GROUP BY with a HAVING clause, like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1

Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
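As a quick sanity check of the NOT EXISTS filter above, here is a minimal sketch that runs it against the question's sample rows. It assumes Python's built-in sqlite3 module rather than SQL Server, so the temp table becomes a plain table named tempx1, and the helper name absorb_nulls is made up for illustration. DISTINCT is my addition, so that a Foo with several NULL-only rows still yields a single row.

```python
import sqlite3


def absorb_nulls(rows):
    """Return distinct (Foo, OtherKeyId) pairs, keeping NULL only when
    a Foo value has no non-NULL OtherKeyId at all."""
    conn = sqlite3.connect(":memory:")
    # Assumption: a plain SQLite table stands in for the #tempx1 temp table.
    conn.execute("CREATE TABLE tempx1 (Id INTEGER, Foo TEXT, OtherKeyId INTEGER)")
    conn.executemany("INSERT INTO tempx1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT DISTINCT t.Foo, t.OtherKeyId
        FROM tempx1 t
        WHERE t.OtherKeyId IS NOT NULL
           OR NOT EXISTS (SELECT 1
                          FROM tempx1 t2
                          WHERE t2.Foo = t.Foo
                            AND t2.OtherKeyId IS NOT NULL)
        ORDER BY t.Foo, t.OtherKeyId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    (1, "A", None),
    (2, "B", None),
    (3, "B", 1),
    (4, "C", None),
    (5, "C", 1),
    (6, "C", 2),
]
print(absorb_nulls(sample))
# [('A', None), ('B', 1), ('C', 1), ('C', 2)]
```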

My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution of counting the number of items. This results in two inner queries on the data set, each with a different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Related

How to get the aggregate results for missing values as zero

My DDL is like
create table if not exists sample_t
(
id bigserial NOT NULL constraint sample_t_id primary key,
test_value varchar(255),
test varchar(255) not null,
count bigint not null
);
Sample insert queries
INSERT INTO public.sample_t (id, test_value, test, count) VALUES (1, 'CC1', 'hi-1', 11);
INSERT INTO public.sample_t (id, test_value, test, count) VALUES (2, 'CC2', 'hi-1', 10);
INSERT INTO public.sample_t (id, test_value, test, count) VALUES (3, 'CC1', 'hi-2', 4);
My Query is
select test, sum(count) from sample_t where test_value= 'CC2' group by test;
The o/p is
test | sum
hi-1 | 10
However, I want to list down missing 'test' column values as 0. So the expected o/p should look like:
test | sum
hi-1 | 10
hi-2 | 0
Instead, use conditional aggregation:
select test, sum(case when test_value = 'CC2' then count else 0 end)
from sample_t
group by test;
Alternatively, if you have a table of all test values:
select t.test, coalesce(sum(count), 0)
from test t left join
sample_t s
on s.test = t.test and s.test_value = 'CC2'
group by t.test;
The problem here is that your WHERE clause can filter out a test group completely when none of its records has the matching test value. You may use a left join here to preserve every initial test value:
SELECT DISTINCT
s1.test,
COALESCE(s2.cnt, 0) AS cnt
FROM sample_t s1
LEFT JOIN
(
SELECT test, SUM(count) AS cnt
FROM sample_t
WHERE test_value = 'CC2'
GROUP BY test
) s2
ON s1.test = s2.test;
Or, you could use conditional aggregation:
SELECT
test, SUM(CASE WHEN test_value = 'CC2' THEN count ELSE 0 END) cnt
FROM sample_t
GROUP BY test;
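To see the difference between filtering in WHERE and conditional aggregation, here is a small sketch that runs both against the question's three sample rows. It assumes Python's built-in sqlite3 module instead of PostgreSQL, so bigserial becomes an INTEGER PRIMARY KEY, and the count column is double-quoted to avoid any clash with the COUNT function. The variable names filtered and conditional are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite types stand in for the PostgreSQL DDL from the question.
conn.execute(
    'CREATE TABLE sample_t ('
    ' id INTEGER PRIMARY KEY,'
    ' test_value TEXT,'
    ' test TEXT NOT NULL,'
    ' "count" INTEGER NOT NULL)'
)
conn.executemany(
    "INSERT INTO sample_t VALUES (?, ?, ?, ?)",
    [(1, "CC1", "hi-1", 11), (2, "CC2", "hi-1", 10), (3, "CC1", "hi-2", 4)],
)

# WHERE removes every 'hi-2' row before grouping, so that group disappears.
filtered = conn.execute(
    """
    SELECT test, SUM("count")
    FROM sample_t
    WHERE test_value = 'CC2'
    GROUP BY test
    ORDER BY test
    """
).fetchall()

# Conditional aggregation keeps every group and sums 0 for non-matching rows.
conditional = conn.execute(
    """
    SELECT test, SUM(CASE WHEN test_value = 'CC2' THEN "count" ELSE 0 END)
    FROM sample_t
    GROUP BY test
    ORDER BY test
    """
).fetchall()
conn.close()

print(filtered)     # [('hi-1', 10)]
print(conditional)  # [('hi-1', 10), ('hi-2', 0)]
```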

sql generate code based on three column values

I have three columns
suppose
row no column1 column2 column3
1 A B C
2 A B C
3 D E F
4 G H I
5 G H C
I want to generate code by combining these three column values
For Eg.
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
by checking combination of three columns
logic is that
if values of three columns are same then like first time it shows 'ABC001'
and 2nd time it shows 'ABC002'
You can try this:
I don't know what logic you want for the 00 padding, but you can add the zeros manually or let rn decide for you:
declare @mytable table (rowno int,col1 nvarchar(50),col2 nvarchar(50),col3 nvarchar(50)
)
insert into @mytable
values
(1,'A', 'B', 'C'),
(2,'A', 'B', 'C'),
(3,'D', 'E', 'F'),
(4,'G', 'H', 'I'),
(5,'G', 'H', 'C')
Select rowno,col1,col2,col3,
case when rn >= 10 and rn < 100 then concatcol+'0'+cast(rn as nvarchar(50))
when rn >= 100 then concatcol+cast(rn as nvarchar(50))
else concatcol+'00'+cast(rn as nvarchar(50)) end as ConcatCol from (
select rowno,col1,col2,col3
,Col1+col2+col3 as ConcatCol,ROW_NUMBER() over(partition by col1,col2,col3 order by rowno) as rn from @mytable
) x
order by rowno
The CASE expression pads the number: below 10 it writes ABC001, from 10 up it writes ABC010, and from 100 up it writes ABC100, and so on.
TSQL: CONCAT(column1,column2,column3,RIGHT(REPLICATE('0', 3) + LEFT(row_no, 3), 3))
You should combine your columns like below :
SELECT CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(ORDER BY
(
SELECT NULL
)))+') '+DATA AS Data
FROM
(
SELECT column1+column2+column3+'00'+CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(PARTITION BY column1,
column2,
column3 ORDER BY
(
SELECT NULL
))) DATA
FROM <table_name>
) T;
Result :
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
MySQL:
CONCAT(column1,column2,column3,LPAD(row_no, 3, '0'))
[You will need to enclose 'row no' in backticks if the field name contains a space instead of an underscore.]
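The idea shared by the answers above (number the rows within each column1/column2/column3 combination, then zero-pad that number) can be tried out with the sketch below. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later for window functions, and uses SQLite's printf for the padding instead of the CASE expression; the table name mytable and the helper generate_codes are illustrative only.

```python
import sqlite3


def generate_codes(rows):
    """Build codes such as 'ABC001' from (rowno, col1, col2, col3) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (rowno INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT col1 || col2 || col3 || printf('%03d', rn) AS code
        FROM (
            SELECT rowno, col1, col2, col3,
                   -- Sequence number within each combination, in rowno order.
                   ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                      ORDER BY rowno) AS rn
            FROM mytable
        ) AS x
        ORDER BY rowno
    """
    codes = [row[0] for row in conn.execute(query)]
    conn.close()
    return codes


rows = [
    (1, "A", "B", "C"),
    (2, "A", "B", "C"),
    (3, "D", "E", "F"),
    (4, "G", "H", "I"),
    (5, "G", "H", "C"),
]
print(generate_codes(rows))
# ['ABC001', 'ABC002', 'DEF001', 'GHI001', 'GHC001']
```

Ordering the window by rowno keeps the numbering deterministic, which ORDER BY (SELECT NULL) does not guarantee.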

SQL Querying on tuple values

I need to write a SQL query that selects values from a table based on several tuples of selection criteria. It could be done using a where clause like this:
where (a = 1 and b='a') or (a=5 and b='s')
Is the best way to select:
select a, pk from x where a in (1,5)
select b, pk from x where b in ('a','s')
and join the result of the two queries using the primary key?
Do you mean something (a self join) like this:
select x.a, x.pk
from x
join x x2 on x.pk=x2.pk
where x.a in (1,5)
and x2.b in ('a','s')
?
You can join on a table expression built from VALUES. You can add as many rows to VALUES as you want. It will work on MSSQL:
DECLARE @x TABLE ( a INT, b CHAR(1) )
INSERT INTO @x
VALUES ( 1, 'a' ),
( 1, 'b' ),
( 1, 'c' ),
( 2, 'd' ),
( 2, 'e' ),
( 5, 'f' ),
( 5, 's' )
SELECT x.*
FROM @x x
JOIN (
VALUES ( 1, 'a'),
( 5, 's')
) AS v( a, b ) ON x.a = v.a AND x.b = v.b
Output:
a b
1 a
5 s
Based on my understanding, you want to write SQL that filters on a combination of two columns. Here is a simple solution that will work in any database.
Create a new column, say "COLUMN_NEW", in the same table, or build a temp table or a view with the new column (plus the existing columns from the original table).
Insert the concatenated values of column a and column b into "COLUMN_NEW". Based on your example, the values to match in "COLUMN_NEW" will be "1a" and "5s".
The syntax for concatenation differs between databases; for example, concat(a,b) in SQL Server.
The SQL to select records from the table will then be select * from table where COLUMN_NEW in ('1a','5s');
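For comparison, here is a minimal sketch of matching on (a, b) pairs directly by joining against a VALUES list, in the spirit of the MSSQL answer above. It assumes Python's built-in sqlite3 module, where the VALUES list is wrapped in a CTE named v; the helper name match_pairs is hypothetical. Comparing the two columns separately also avoids the collisions plain concatenation can produce (for example, 1 with '1a' and 11 with 'a' both concatenate to '11a').

```python
import sqlite3


def match_pairs(rows, pairs):
    """Return the (a, b) rows of table x that equal one of the wanted pairs."""
    if not pairs:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO x VALUES (?, ?)", rows)
    # One "(?, ?)" placeholder group per wanted pair, bound as parameters.
    placeholders = ", ".join("(?, ?)" for _ in pairs)
    query = f"""
        WITH v(a, b) AS (VALUES {placeholders})
        SELECT x.a, x.b
        FROM x
        JOIN v ON x.a = v.a AND x.b = v.b
        ORDER BY x.a, x.b
    """
    params = [value for pair in pairs for value in pair]
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


table_rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (5, "f"), (5, "s")]
print(match_pairs(table_rows, [(1, "a"), (5, "s")]))
# [(1, 'a'), (5, 's')]
```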

What's the best way to get intersected data from one table?

Suppose I have the below table
CREATE TABLE [dbo].[TestData](
[ID] [bigint] NOT NULL,
[InstanceID] [int] NOT NULL,
[Field] [int] NULL,
[UserID] [bigint] NOT NULL
) ON [PRIMARY]
GO
INSERT [dbo].[TestData] ([ID], [InstanceID], [Field], [UserID])
VALUES (1, 1, NULL, 1000),(2, 1, NULL, 1002),(3, 1, NULL, 1000),
(4, 1, NULL, 1003),(5, 2, NULL, 1002), (6, 2, NULL, 1005),
(7, 2, NULL, 1006),(8, 2, NULL, 1007),(9, 3, NULL, 1002),
(10, 3, NULL, 1006),(11, 3, NULL, 1009),(12, 3, NULL, 1010),
(13, 1, NULL, 1006),(14, 2, NULL, 1002),(15, 3, NULL, 1003)
GO
I search for the best practice to write a query to get the full rows of intersected data between two instances using UserID
For example, the intersected UserIDs between InstanceID 1 and 2 are (1002, 1006). To get the results, I wrote the query in two different ways, as below:
Select * From TestData
Where UserID in
(
Select T1.UserID From TestData T1 Where InstanceID = 1
Intersect
Select T2.UserID From TestData T2 Where InstanceID = 2
)
and InstanceID in (1,2) Order By 1
Second
Select * From TestData
Where UserID in
(
Select Distinct T1.UserID
From TestData T1 join TestData T2 on T1.UserID = T2.UserID
Where T1.InstanceID = 1 and T2.InstanceID = 2
)
and InstanceID in (1,2) Order By 1
Is one of the above queries the best way to get the results?
Using EXISTS is better than using IN. When using the IN subquery, the entire resultset is processed. With EXISTS, it just searches as they are found to match. As far as your question, I think the INTERSECT implementation just simply does the join anyways so there shouldn't be a difference.
EDIT: a post here says that for IN vs EXISTS, the optimizer will treat them the same as well (as of 2008). So my guess, as well as what I just read, boils down to this: they will perform the same because the optimizer knows.
Here's an example of the query if you were to use EXISTS statements:
SELECT *
FROM TestData td
WHERE td.InstanceID IN (1, 2)
AND EXISTS
(SELECT 1
FROM TestData sub
WHERE td.UserID = sub.UserID
AND sub.InstanceID = 2)
AND EXISTS
(SELECT 1
FROM TestData sub
WHERE td.UserID = sub.UserID
AND sub.InstanceID = 1)
ORDER BY 1;
For the sample data provided, there was no noticeable performance difference between any of the three solutions. However, I agree with Scotch that using EXISTS statements will help performance over IN statements under specific scenarios.
The best thing you can do to improve performance is create the table with a PRIMARY KEY. Setting the ID field as a PRIMARY KEY will bolster performance by 50% since the highest cost of your query is sorting the data.
You can also do this with an aggregation and join:
select td.*
from TestData td join
(select userid
from TestData
group by userid
having sum(case when InstanceId = 1 then 1 else 0 end) > 0 and
sum(case when InstanceId = 2 then 1 else 0 end) > 0
) td2
on td.userid = td2.userid
The advantage to the aggregation is that the having clause makes it very flexible in terms of the conditions you can represent. Performance will be best if you have an index on userId, InstanceId.
The following script uses two Index Seek operations and one Distinct Sort operation.
SELECT ID, InstanceID, Field, UserID
FROM [dbo].[TestData] t
WHERE InstanceID IN(1, 2)
AND EXISTS (
SELECT 1
FROM [dbo].[TestData] t2
WHERE InstanceID IN(1, 2) AND t.UserID = t2.UserID
HAVING COUNT(DISTINCT t2.InstanceID) = 2
)
ORDER BY t.ID
OR
;WITH cte AS
(
SELECT ID, InstanceID, Field, UserId
,COUNT(*) OVER(PARTITION BY InstanceID, UserID) AS cntInstanceUser
FROM [dbo].[TestData] t
WHERE InstanceID IN(1, 2)
)
SELECT c.ID, c.InstanceID, c.Field, c.UserID
FROM cte c
WHERE EXISTS (
SELECT 1
FROM cte c2
WHERE c2.UserId = c.UserID
HAVING COUNT(*) != c.cntInstanceUser
)
ORDER BY c.ID
For improving performance use this index:
CREATE INDEX x ON [dbo].[TestData](InstanceID, UserID) INCLUDE(Id, Field)
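To check that the approaches above pick the expected rows, here is a small sketch that loads the question's sample data and keeps the rows whose UserID appears in both instances, using GROUP BY with HAVING COUNT(DISTINCT InstanceID) = 2. It assumes Python's built-in sqlite3 module rather than SQL Server, so it says nothing about execution plans or performance; the helper name intersected_ids is illustrative and expects two different instance IDs.

```python
import sqlite3

TEST_DATA = [
    (1, 1, None, 1000), (2, 1, None, 1002), (3, 1, None, 1000),
    (4, 1, None, 1003), (5, 2, None, 1002), (6, 2, None, 1005),
    (7, 2, None, 1006), (8, 2, None, 1007), (9, 3, None, 1002),
    (10, 3, None, 1006), (11, 3, None, 1009), (12, 3, None, 1010),
    (13, 1, None, 1006), (14, 2, None, 1002), (15, 3, None, 1003),
]


def intersected_ids(first, second):
    """Return IDs of rows in either instance whose UserID occurs in both."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TestData (ID INTEGER NOT NULL, InstanceID INTEGER NOT NULL,"
        " Field INTEGER, UserID INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO TestData VALUES (?, ?, ?, ?)", TEST_DATA)
    query = """
        SELECT ID
        FROM TestData
        WHERE InstanceID IN (?, ?)
          AND UserID IN (SELECT UserID
                         FROM TestData
                         WHERE InstanceID IN (?, ?)
                         GROUP BY UserID
                         -- A user qualifies only when seen in both instances.
                         HAVING COUNT(DISTINCT InstanceID) = 2)
        ORDER BY ID
    """
    ids = [row[0] for row in conn.execute(query, (first, second, first, second))]
    conn.close()
    return ids


print(intersected_ids(1, 2))
# [2, 5, 7, 13, 14]
```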

How to write a Sql statement without using union?

I have a SQL statement like the one below. How can I add a single row (code = 0, desc = 1) to the result of this statement without using the UNION keyword? Thanks.
select code, desc
from material
where material.ExpireDate ='2010/07/23'
You can always create a view for your table which itself uses UNION keyword
CREATE VIEW material_view AS SELECT code, desc, ExpireDate FROM material UNION SELECT '0', '1', NULL;
SELECT code, desc FROM material_view WHERE ExpireDate = '2010/07/23' OR code = '0';
WITH material AS
(
SELECT *
FROM
(VALUES (2, 'x', '2010/07/23'),
(3, 'y', '2009/01/01'),
(4, 'z', '2010/07/23')) vals (code, [desc], ExpireDate)
)
SELECT
COALESCE(m.code,x.code) AS code,
COALESCE(m.[desc],x.[desc]) AS [desc]
FROM material m
FULL OUTER JOIN (SELECT 0 AS code, '1' AS [desc] ) x ON 1=0
WHERE m.code IS NULL OR m.ExpireDate ='2010/07/23'
Gives
code desc
----------- ----
2 x
4 z
0 1
Since you don't want to use either a union or a view, I'd suggest adding a dummy row to the material table (with code = 0, desc = 1, and ExpireDate something that would never normally be selected - eg. 01 January 1900) - then use a query like the following:
select code, desc
from material
where material.ExpireDate ='2010/07/23' or
material.ExpireDate ='1900/01/01'
Normally, a Union would be my preferred option.