Getting Count Only of Distinct Value Combinations of multiple fields. - sql

Please consider the following:
IF OBJECT_ID ('tempdb..#Customer') IS NOT NULL
DROP TABLE #Customer;
CREATE TABLE #Customer
(
CustomerKey INT IDENTITY (1, 1) NOT NULL
,CustomerNum INT NOT NULL
,CustomerName VARCHAR (25) NOT NULL
,Planet VARCHAR (25) NOT NULL
)
GO
INSERT INTO #Customer (CustomerNum, CustomerName, Planet)
VALUES (1, 'Anakin Skywalker', 'Tatooine')
, (2, 'Yoda', 'Coruscant')
, (3, 'Obi-Wan Kenobi', 'Coruscant')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Bespin')
, (4, 'Luke Skywalker', 'Bespin')
, (4, 'Luke Skywalker', 'Endor')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Kashyyyk');
Notice that there are a total of 10 records. I know that I can get the list of distinct combinations of CustomerName and Planet with either of the following two queries.
SELECT DISTINCT CustomerName, Planet FROM #Customer;
SELECT CustomerName, Planet FROM #Customer
GROUP BY CustomerName, Planet;
However, what I'd like is a simple way to get just the count of those values, not the values themselves. I'd like a way that's quick to type, but also performant. I know I could load the values into a CTE, Temp Table, Table Variable, or Sub Query, and then count the records. Is there a better way to accomplish this?

This will work in 2005:
SELECT COUNT(*) AS cnt
FROM
( SELECT 1 AS d
FROM Customer
GROUP BY Customername, Planet
) AS t ;
Tested in SQL-Fiddle. An index on (CustomerName, Planet) would be used, see the query plan (for 2012 version):
The simplest way to think about it, "get all distinct values in a subquery, then count", yields an identical plan:
SELECT COUNT(*) AS cnt
FROM
 ( SELECT DISTINCT Customername, Planet
   FROM  Customer
 ) AS t ;
And also the one (thanks to @Aaron Bertrand) using the ranking function ROW_NUMBER() (not sure if it will be efficient in the 2005 version, too, but you can test):
SELECT COUNT(*) AS cnt
FROM
(SELECT rn = ROW_NUMBER()
OVER (PARTITION BY CustomerName, Planet
ORDER BY CustomerName)
FROM Customer) AS x
WHERE rn = 1 ;
There are also other ways to write this (one even without a subquery, thanks to @Mikael Erksson!) but they are not as efficient.

The subquery/CTE method is the "right" way to do it.
A quick (in terms of typing but not necessarily performance) and dirty way is:
select count(distinct customername+'###'+Planet)
from #Customer;
The '###' is to separate the values so you don't get accidental collisions.
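To make the collision point concrete, here is a minimal sketch that runs the two counting approaches side by side. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the table is a plain `Customer` table and string concatenation is written `||` instead of `+`; the `Pairs` table and the `scalar` helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerName TEXT NOT NULL, Planet TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [
        ("Anakin Skywalker", "Tatooine"),
        ("Yoda", "Coruscant"),
        ("Obi-Wan Kenobi", "Coruscant"),
        ("Luke Skywalker", "Tatooine"),
        ("Luke Skywalker", "Tatooine"),
        ("Luke Skywalker", "Bespin"),
        ("Luke Skywalker", "Bespin"),
        ("Luke Skywalker", "Endor"),
        ("Luke Skywalker", "Tatooine"),
        ("Luke Skywalker", "Kashyyyk"),
    ],
)


def scalar(sql):
    """Run a query that returns a single value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]


# Subquery approach: count the rows of the distinct set.
subquery_count = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT CustomerName, Planet FROM Customer) AS t"
)

# Concatenation approach, with a separator between the two values.
concat_count = scalar(
    "SELECT COUNT(DISTINCT CustomerName || '###' || Planet) FROM Customer"
)

# Why the separator matters: ('ab', 'c') and ('a', 'bc') are different pairs,
# but both concatenate to 'abc' when nothing is placed between them.
conn.execute("CREATE TABLE Pairs (a TEXT, b TEXT)")
conn.executemany("INSERT INTO Pairs VALUES (?, ?)", [("ab", "c"), ("a", "bc")])
count_without_separator = scalar("SELECT COUNT(DISTINCT a || b) FROM Pairs")
count_with_separator = scalar("SELECT COUNT(DISTINCT a || '###' || b) FROM Pairs")

print(subquery_count, concat_count, count_without_separator, count_with_separator)
```

Even with a separator, the concatenation trick can still collide if the data itself may contain the separator string, which is one more reason to prefer the subquery form.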

Related

Comparing a value of a row with the value of the previous row

I have a table in SQL Server that stores geology samples, and there is a rule that must be adhered to.
The rule is simple, a "DUP_2" sample must always come after a "DUP_1" sample (sometimes they are loaded inverted)
CREATE TABLE samples (
id INT
,name VARCHAR(5)
);
INSERT INTO samples VALUES (1, 'ASSAY');
INSERT INTO samples VALUES (2, 'DUP_1');
INSERT INTO samples VALUES (3, 'DUP_2');
INSERT INTO samples VALUES (4, 'ASSAY');
INSERT INTO samples VALUES (5, 'DUP_2');
INSERT INTO samples VALUES (6, 'DUP_1');
INSERT INTO samples VALUES (7, 'ASSAY');
id name
1 ASSAY
2 DUP_1
3 DUP_2
4 ASSAY
5 DUP_2
6 DUP_1
7 ASSAY
In this example I would like to show all rows where name equal to 'DUP_2' and predecessor row (using ID) name is different from 'DUP_1'.
In this case, it would be row 5 only.
I would appreciate very much if you help me.
You can use the LAG() window function, or you can use LEAD(); they are identical except for the direction in which they look along the ordering. That is, LAG(name) OVER ( ORDER BY id ) is the same as LEAD(name) OVER ( ORDER BY id DESC ).
WITH s1 ( id, name, prior_name ) AS (
SELECT id, name, LAG(name) OVER ( ORDER BY id ) AS prior_name
FROM samples
)
SELECT id, name
FROM s1
WHERE name = 'DUP_2'
AND COALESCE(prior_name, 'DUMMY') != 'DUP_1';
The reason for the COALESCE() at the end with the DUMMY value is that the first value won't have a LAG(); it will be NULL; and we want to return the DUP_2 record in this case since it doesn't follow a DUP_1 record.
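As a quick check of that reasoning, the sketch below runs the same query against the sample data, then adds a hypothetical row with id 0 so that a DUP_2 becomes the very first sample. It uses SQLite via Python's sqlite3 module rather than SQL Server, and assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [
        (1, "ASSAY"),
        (2, "DUP_1"),
        (3, "DUP_2"),
        (4, "ASSAY"),
        (5, "DUP_2"),
        (6, "DUP_1"),
        (7, "ASSAY"),
    ],
)

QUERY = """
WITH s1 AS (
    SELECT id, name, LAG(name) OVER (ORDER BY id) AS prior_name
    FROM samples
)
SELECT id, name
FROM s1
WHERE name = 'DUP_2'
  AND COALESCE(prior_name, 'DUMMY') != 'DUP_1'
ORDER BY id
"""

# Only row 5 is a DUP_2 whose predecessor is not a DUP_1.
flagged = conn.execute(QUERY).fetchall()

# Hypothetical extra row: a DUP_2 with no predecessor at all.
# LAG() yields NULL for it, and COALESCE turns that NULL into 'DUMMY',
# so the row is still reported.
conn.execute("INSERT INTO samples VALUES (0, 'DUP_2')")
flagged_with_leading_dup2 = conn.execute(QUERY).fetchall()

print(flagged)
print(flagged_with_leading_dup2)
```

Without the COALESCE, the comparison NULL != 'DUP_1' would evaluate to unknown and the id 0 row would be silently dropped.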
You can use lag():
select s.*
from (select s.*,
lag(name) over (order by id) as prev_name
from samples s
) s
where name = 'DUP_2' and (prev_name <> 'DUP_1' or prev_name is null)

SQL group three columns into one

I have a table with three columns:
[ID] [name] [link]
1 sample_name_1 sample_link_1
2 sample_name_2 sample_link_2
3 sample_name_3 sample_link_3
I need to somehow group them into one column, so the ideal result is this:
[one_column]
1
sample_name_1
sample_link_1
2
sample_name_2
sample_link_2
3
sample_name_3
sample_link_3
Does anyone have any suggestions on where to look and how to get it done in SQL Server?
You may try to use VALUES table value constructor with CROSS APPLY:
Table:
CREATE TABLE MyTable (
ID int,
name varchar(50),
link varchar(50)
)
INSERT INTO MyTable (ID, name, link)
VALUES
(1, 'sample_name_1', 'sample_link_1'),
(2, 'sample_name_2', 'sample_link_2'),
(3, 'sample_name_3', 'sample_link_3')
Statement:
SELECT v.one_column
FROM MyTable t
CROSS APPLY (VALUES
(1, CONVERT(varchar(50), ID)),
(2, CONVERT(varchar(50), name)),
(3, CONVERT(varchar(50), link))
) v (rn, one_column)
ORDER BY t.ID, v.rn
Result:
one_column
1
sample_name_1
sample_link_1
2
sample_name_2
sample_link_2
3
sample_name_3
sample_link_3
While this is something you should do in your presentation layer (i.e. your app or Website) you can do this in SQL:
select one_column
from
(
select cast(id as varchar(10)) as one_column, id as sortkey1, 1 as sortkey2 from mytable
union all
select name as one_column, id as sortkey1, 2 as sortkey2 from mytable
union all
select link as one_column, id as sortkey1, 3 as sortkey2 from mytable
) unioned
order by sortkey1, sortkey2;
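For anyone who wants to try the UNION ALL approach without a SQL Server instance, here is a small sketch of the same idea running on SQLite through Python's sqlite3 module. The only deliberate differences are the TEXT type in the cast and the `AS unioned` alias style; the two sort keys work exactly as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, name TEXT, link TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "sample_name_1", "sample_link_1"),
        (2, "sample_name_2", "sample_link_2"),
        (3, "sample_name_3", "sample_link_3"),
    ],
)

QUERY = """
SELECT one_column
FROM (
    -- sortkey1 keeps the three values of one source row together,
    -- sortkey2 fixes their order within that row: ID, name, link.
    SELECT CAST(ID AS TEXT) AS one_column, ID AS sortkey1, 1 AS sortkey2 FROM MyTable
    UNION ALL
    SELECT name, ID, 2 FROM MyTable
    UNION ALL
    SELECT link, ID, 3 FROM MyTable
) AS unioned
ORDER BY sortkey1, sortkey2
"""

one_column = [row[0] for row in conn.execute(QUERY)]
print(one_column)
```

The ID has to be cast to a string type because all branches of the UNION ALL feed the same output column.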

Alternative to NOT IN in SSMS

I have my table in this structure. I am trying to find all the unique IDs whose words do not appear in the list. How can I achieve this in MS SQL Server?
id word
1 hello
2 friends
2 world
3 cat
3 dog
2 country
1 phone
4 eyes
I have a list of words
**List**
phone
eyes
hair
body
Expected Output
I need all the unique IDs that have none of the words from the list. In this case they are:
2
3
1 and 4 are not in the output, as their words appear in the list.
I tried the below code
Select count(distinct ID)
from Table1
where word not in ('phone','eyes','hair','body')
I also tried NOT EXISTS, which did not work.
You can also use GROUP BY
SELECT id
FROM Table1
GROUP BY id
HAVING MAX(CASE WHEN word IN('phone', 'eyes', 'hair', 'body') THEN 1 ELSE 0 END) = 0
One way to do it is to use not exists, where the inner query is linked to the outer query by id and is filtered by the search words.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE (
id int,
word varchar(20)
)
INSERT INTO @T VALUES
(1, 'hello'),
(2, 'friends'),
(2, 'world'),
(3, 'cat'),
(3, 'dog'),
(2, 'country'),
(1, 'phone'),
(4, 'eyes')
The query:
SELECT DISTINCT id
FROM @T t0
WHERE NOT EXISTS
(
SELECT 1
FROM @T t1
WHERE word IN('phone', 'eyes', 'hair', 'body')
AND t0.Id = t1.Id
)
Result:
id
2
3
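It may help to see why filtering rows with NOT IN on the word is not enough, while the per-id checks are. The sketch below is a rough comparison on the question's data, using SQLite via Python's sqlite3 module in place of SQL Server; the table name `T` and the `ids` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [
        (1, "hello"), (2, "friends"), (2, "world"), (3, "cat"),
        (3, "dog"), (2, "country"), (1, "phone"), (4, "eyes"),
    ],
)


def ids(sql):
    """Return the first column of the result as a list (hypothetical helper)."""
    return [row[0] for row in conn.execute(sql)]


# Row-level filter: id 1 survives because its 'hello' row is not in the list,
# even though its 'phone' row is.
row_filter = ids("""
    SELECT DISTINCT id FROM T
    WHERE word NOT IN ('phone', 'eyes', 'hair', 'body')
    ORDER BY id
""")

# Id-level filter: an id is kept only if none of its rows is in the list.
not_exists = ids("""
    SELECT DISTINCT id FROM T AS t0
    WHERE NOT EXISTS (
        SELECT 1 FROM T AS t1
        WHERE t1.word IN ('phone', 'eyes', 'hair', 'body')
          AND t1.id = t0.id
    )
    ORDER BY id
""")

# Same result with GROUP BY and a conditional aggregate.
group_having = ids("""
    SELECT id FROM T
    GROUP BY id
    HAVING MAX(CASE WHEN word IN ('phone', 'eyes', 'hair', 'body')
                    THEN 1 ELSE 0 END) = 0
    ORDER BY id
""")

print(row_filter, not_exists, group_having)
```

The row-level version answers "which ids have at least one word outside the list", which is a different question from "which ids have no word inside the list".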
SELECT t.id FROM dbo.table AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS t2
INNER JOIN
(VALUES('phone'),('eyes'),('hair'),('body')) AS lw(word)
ON t2.word = lw.word
AND t2.id = t.id)
GROUP BY t.id;
You can try this as well: this is a dynamic table structure:
DECLARE @T AS TABLE (id int, word varchar(20))
INSERT INTO @T VALUES
(1, 'hello'),
(2, 'friends'),
(2, 'world'),
(3, 'cat'),
(3, 'dog'),
(2, 'country'),
(1, 'phone'),
(4, 'eyes')
DECLARE @tblNotUsed AS TABLE ( id int, word varchar(20))
DECLARE @tblNotUsedIds AS TABLE (id int)
INSERT INTO @tblNotUsed VALUES
(1, 'phone'),
(2, 'eyes'),
(3, 'hair'),
(4, 'body')
INSERT INTO @tblNotUsedIds (id)
SELECT t.id FROM @T AS t INNER JOIN @tblNotUsed AS n ON n.word = t.word
SELECT DISTINCT id FROM @T
WHERE id NOT IN (SELECT id FROM @tblNotUsedIds)
The nice thing about SQL is that there are often many ways to do things. One way is to place your list of known values into a #temp table and then run something like this.
Select * from dbo.maintable
EXCEPT
Select * from #tempExcludeValues
The results will give you all records that aren't in your predefined list. A second way is to do the join like Larnu has mentioned in the comment above. NOT IN is typically not the fastest way to do things on larger datasets. In my experience JOINs are usually the most efficient method of filtering data, often many times better than using an IN or NOT IN clause.

Excluding records within an aggregate function based on presence of value in another table

I'm writing a query that generates statistics based on postcodes, and I need to count the number of matching records that are within a range of postcodes, except when they exist in a secondary table. This is part of a larger query, and I need the count of records for each postcode range in columnar format rather than as separate rows. This minimal example demonstrates what I've attempted:
CREATE TABLE #People
(
Name nvarchar(10),
Postcode int
)
INSERT INTO #People VALUES ('Adam', 2000)
INSERT INTO #People VALUES ('John', 2001)
INSERT INTO #People VALUES ('Paul', 2001)
INSERT INTO #People VALUES ('Peter', 2099)
INSERT INTO #People VALUES ('Tom', 4000)
CREATE TABLE #PostcodesToIgnore
(
Postcode int
)
INSERT INTO #PostcodesToIgnore VALUES (2099)
SELECT SUM(CASE WHEN PostCode BETWEEN 2000 AND 2099 THEN 1 ELSE 0 END) FROM #People
SELECT SUM(CASE WHEN PostCode BETWEEN 2000 AND 2099
AND PostCode NOT IN (SELECT PostCode FROM #PostcodesToIgnore) THEN 1 ELSE 0 END)
FROM #People
The first query that counts all postcodes within the range works but the second one fails with the error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
While I could refactor the query to include all the criteria from the outer select into each subselect there are quite a few criteria in the real query so I was hoping there might be a more elegant way to go about it?
You could use a left join instead.
SELECT
SUM
(
CASE WHEN PostCode BETWEEN 2000 AND 2099
AND pcti.PostCode is null
THEN 1
ELSE 0
END
)
FROM #People p
left join #PostcodesToIgnore pcti on pcti.PostCode = p.PostCode
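Here is a minimal sketch of the left join idea on the question's data, with the unfiltered count kept as a second column to show the columnar layout. It runs on SQLite via Python's sqlite3 module, with ordinary tables named `People` and `PostcodesToIgnore` standing in for the temp tables, so the SQL Server error itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Postcode INTEGER)")
conn.execute("CREATE TABLE PostcodesToIgnore (Postcode INTEGER)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Adam", 2000), ("John", 2001), ("Paul", 2001), ("Peter", 2099), ("Tom", 4000)],
)
conn.execute("INSERT INTO PostcodesToIgnore VALUES (2099)")

QUERY = """
SELECT
    -- everyone in the range, ignored or not
    SUM(CASE WHEN p.Postcode BETWEEN 2000 AND 2099 THEN 1 ELSE 0 END),
    -- everyone in the range whose postcode found no match in the ignore list
    SUM(CASE WHEN p.Postcode BETWEEN 2000 AND 2099
              AND i.Postcode IS NULL THEN 1 ELSE 0 END)
FROM People AS p
LEFT JOIN PostcodesToIgnore AS i ON i.Postcode = p.Postcode
"""

in_range, in_range_not_ignored = conn.execute(QUERY).fetchone()
print(in_range, in_range_not_ignored)
```

One thing to watch: if the ignore table can contain the same postcode more than once, the join repeats the matching rows from People, which would inflate the first column. Keeping the ignore list unique avoids that.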
You could remove the SUM and push the query into a derived table or CTE.
The following works
SELECT SUM(PostCodeFlag)
FROM (SELECT CASE
WHEN PostCode BETWEEN 2000 AND 2099
AND PostCode NOT IN (SELECT PostCode
FROM #PostcodesToIgnore) THEN 1
ELSE 0
END AS PostCodeFlag
FROM #People) T
Something like this:
Use a CTE to pre-prepare your data, then do a simple grouped count.
Or you could have a look at OVER (https://msdn.microsoft.com/en-us/library/ms189461.aspx)
WITH myCTE AS
(
SELECT Name,Postcode FROM #People
WHERE Postcode NOT IN (SELECT Postcode FROM #PostcodesToIgnore)
)
SELECT Postcode, Count(Name)
FROM myCTE
GROUP BY Postcode
FROM #people WHERE postcode not in (...).
In fact, it looks like you just don't need any CASE at all and you can specify all of your predicates in the FROM.
Or am I missing something ?

SQL Showing Less information depending on date

I have this code. It returns a list of clients, but it lists too many, because it lists the same client several times with different dates. I only want to show the latest date and none of the others. I tried to GROUP BY Client_Code but it didn't work; it just threw up a "not an aggregate function" error or something similar (I can get the exact message if needed). What I have been asked to get is all of our clients with all the details listed in the 'as' parts, and they all pull through properly. If I take out:
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
It shows up okay, but I need only the last billed date to appear. Putting these lines in shows the client several times, listing all the different bill dates. I think that is because it is pulling across the different matters in the Matter_Master table. Essentially, I would like to show the client information only once, on the highest matter, with their last billed date.
Please let me know if this needs clarification; I'm trying to explain as best I can.
SELECT DISTINCT
A.DIWOR as 'ID',
B.Client_alpha_Name as 'Client Name',
A.ClientCODE as 'Client Code',
B.Client_address as 'Client Address',
D.COMM_NO AS 'Contact',
E.Contact_full_name as 'Possible Key Contact',
G.LOBSICDESC as 'LOBSIC Code',
H.EARNERNAME as 'Client Care Parnter',
A.CLIENTCODE + '/' + LTRIM(STR(A.LAST_MATTER_NUM)) as 'Last Matter Code',
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
FROM CLIENT_MASTER A
JOIN CLIENT_INFO B
ON A.CLIENTCODE=B.CLIENT_CODE
JOIN MATTER_MASTER C
ON A.DIWOR=C.CLIENTDIWOR
JOIN COMMINFO D
ON A.DIWOR=D.DIWOR
JOIN CONTACT E
ON A.CLIENTCODE=E.CLIENTCODE
JOIN VW_CONTACT F
ON E.NAME_DIWOR=F.NAME_DIWOR
JOIN LOBSIC_CODES G
ON A.LOBSICDIWOR=G.DIWOR
JOIN STAFF H
ON A.CLIENTCAREPARTNER=H.DIWOR
JOIN MATTER I
ON C.DIWOR=I.MATTER_DIWOR
WHERE F.COMPANY_FLAG='Y'
AND C.MATTER_MANAGER NOT IN ('78','466','2','104','408','73','51','561','504','101','13','534','16','461','531','144','57','365','83','107','502','514','451')
AND I.DATE_LAST_BILLED > 0
GROUP BY A.ClientCODE
ORDER BY A.DIWOR
Your problem is that you aren't using enough aggregate functions, which is probably why you're using both the DISTINCT clause and the GROUP BY clause (the recommendation is to use GROUP BY, and not DISTINCT).
So... remove DISTINCT, add the necessary (unique, more or less) list of columns to the GROUP BY clause, and wrap the rest in aggregate functions, constants, or subselects. In the specific case of wanting the largest date, wrap it in a MAX() function.
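A rough sketch of that recipe on a cut-down stand-in for the real query: the `Billing` table and its three columns are hypothetical, the dates are stored as ISO text so that MAX() orders them correctly, and the SQL runs on SQLite via Python's sqlite3 module rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Billing (ClientCode INTEGER, ClientAddress TEXT, DateLastBilled TEXT)"
)
conn.executemany(
    "INSERT INTO Billing VALUES (?, ?, ?)",
    [
        (1, "address1", "2011-01-01"),
        (1, "address1", "2011-01-02"),
        (1, "address1", "2011-01-04"),
        (2, "address2", "2011-01-07"),
        (2, "address2", "2011-01-10"),
    ],
)

# No DISTINCT: the columns that identify a client go in GROUP BY,
# and the date that varies per matter is collapsed with MAX().
QUERY = """
SELECT ClientCode, ClientAddress, MAX(DateLastBilled) AS LastBilled
FROM Billing
GROUP BY ClientCode, ClientAddress
ORDER BY ClientCode
"""

latest_per_client = conn.execute(QUERY).fetchall()
print(latest_per_client)
```

In the full query, every selected column that is not wrapped in an aggregate would need to appear in the GROUP BY list in the same way.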
If I understood right:
--=======================
-- sample data - simplified output of your query
--=======================
declare @t table
(
ClientCode int,
ClientAddress varchar(50),
DateLastBilled datetime
-- the rest of fields is skipped
)
insert into @t values (1, 'address1', '2011-01-01')
insert into @t values (1, 'address1', '2011-01-02')
insert into @t values (1, 'address1', '2011-01-03')
insert into @t values (1, 'address1', '2011-01-04')
insert into @t values (2, 'address2', '2011-01-07')
insert into @t values (2, 'address2', '2011-01-08')
insert into @t values (2, 'address2', '2011-01-09')
insert into @t values (2, 'address2', '2011-01-10')
--=======================
-- solution
--=======================
select distinct
ClientCode,
ClientAddress,
DateLastBilled
from
(
select
ClientCode,
ClientAddress,
DateLastBilled,
-- list of remaining fields
MaxDateLastBilled = max(DateLastBilled) over(partition by ClientCode)
from
(
-- here should be your query
select * from @t
) t
) t
) t
where MaxDateLastBilled = DateLastBilled