DB2 SQL Getting distinct value when grouping rows - sql

BUSINESSTABLE looks like this:
HOTEL_CHAIN     HOTEL_LOCATION    HOTEL_OWNER
_____________________________________________________
Marriott        Las Vegas         Nelson
Best Western    New York          Richards
Best Western    San Francisco     Smith
Marriott        New York          Nelson
Hilton          Boston            James
I'm trying to execute an SQL statement in a DB2 database that groups these entries by HOTEL_CHAIN. If the rows that are grouped together contain the same HOTEL_LOCATION or HOTEL_OWNER, that info should be preserved. Otherwise, a value of 'NULL' should be displayed. For example, both Marriott hotels have the same owner, Nelson, so I want to display that information in the new table. However, each Marriott hotel is in a different location, so I'd like to display 'NULL' in that column.
The resulting table (HOTELTABLE) should look like this:
HOTEL_CHAIN     HOTEL_LOCATION    HOTEL_OWNER
_____________________________________________________
Marriott        NULL              Nelson
Best Western    NULL              NULL
Hilton          Boston            James
I'm trying to use the following SQL statement to accomplish this:
INSERT INTO HOTELTABLE(HOTEL_CHAIN,HOTEL_LOCATION,HOTEL_OWNER)
SELECT
HOTEL_CHAIN,
CASE COUNT(DISTINCT(HOTEL_LOCATION)) WHEN 1 THEN HOTEL_LOCATION ELSE 'NULL' END,
CASE COUNT(DISTINCT(HOTEL_OWNER)) WHEN 1 THEN HOTEL_OWNER ELSE 'NULL' END,
FROM BUSINESSTABLE GROUP BY HOTEL_CHAIN
I get an SQL error SQLCODE-119 A COLUMN OR EXPRESSION IN A HAVING CLAUSE IS NOT VALID. It seems to be complaining about the 2nd HOTEL_LOCATION and the 2nd HOTEL_OWNER within my case statements. I also tried using DISTINCT(HOTEL_LOCATION) and that threw another error. Can someone please explain the correct way to code this? Thank you!

Don't use COUNT(DISTINCT). Use MIN() and MAX():
INSERT INTO HOTELTABLE(HOTEL_CHAIN,HOTEL_LOCATION,HOTEL_OWNER)
SELECT HOTEL_CHAIN,
(CASE WHEN MIN(HOTEL_LOCATION) = MAX(HOTEL_LOCATION)
THEN MIN(HOTEL_LOCATION) ELSE 'NULL'
END),
(CASE WHEN MIN(HOTEL_OWNER) = MAX(HOTEL_OWNER)
THEN MIN(HOTEL_OWNER) ELSE 'NULL'
END)
FROM BUSINESSTABLE
GROUP BY HOTEL_CHAIN;
Notes:
Why not COUNT(DISTINCT)? It is generally much more expensive than MIN() and MAX() because it needs to maintain internal lists of all values.
I don't approve of a string value called 'NULL'; it seems designed to foster confusion. Perhaps use an actual NULL value instead?
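As a minimal sketch of the MIN()/MAX() approach, the following runs the same comparison against the sample data. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for DB2, and it returns a real NULL rather than the string 'NULL' by omitting the ELSE branch; both choices are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# SQLite stands in for DB2 here; the MIN()/MAX() comparison is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BUSINESSTABLE (HOTEL_CHAIN TEXT, HOTEL_LOCATION TEXT, HOTEL_OWNER TEXT)"
)
conn.executemany(
    "INSERT INTO BUSINESSTABLE VALUES (?, ?, ?)",
    [
        ("Marriott", "Las Vegas", "Nelson"),
        ("Best Western", "New York", "Richards"),
        ("Best Western", "San Francisco", "Smith"),
        ("Marriott", "New York", "Nelson"),
        ("Hilton", "Boston", "James"),
    ],
)

# A CASE without an ELSE branch yields a real NULL when the group
# contains more than one distinct value.
hotel_rows = conn.execute(
    """
    SELECT HOTEL_CHAIN,
           CASE WHEN MIN(HOTEL_LOCATION) = MAX(HOTEL_LOCATION)
                THEN MIN(HOTEL_LOCATION) END,
           CASE WHEN MIN(HOTEL_OWNER) = MAX(HOTEL_OWNER)
                THEN MIN(HOTEL_OWNER) END
    FROM BUSINESSTABLE
    GROUP BY HOTEL_CHAIN
    ORDER BY HOTEL_CHAIN
    """
).fetchall()

for row in hotel_rows:
    print(row)
```

In Python the NULL columns come back as None, so the Marriott row keeps its owner but has no location.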

I agree with Gordon about the NULL.
Another method:
INSERT INTO HOTELTABLE(HOTEL_CHAIN,HOTEL_LOCATION,HOTEL_OWNER)
select distinct f1.HOTEL_CHAIN,
case when f2.HasDiffLocation is not null then 'NULL' else f1.HOTEL_LOCATION end as HOTEL_LOCATION,
case when f3.HasDiffOwner is not null then 'NULL' else f1.HOTEL_OWNER end as HOTEL_OWNER
from BUSINESSTABLE f1
left outer join lateral
(
select 1 HasDiffLocation from BUSINESSTABLE f2b
where f1.HOTEL_CHAIN=f2b.HOTEL_CHAIN and f1.HOTEL_LOCATION<>f2b.HOTEL_LOCATION
fetch first rows only
) f2 on 1=1
left outer join lateral
(
select 1 HasDiffOwner from BUSINESSTABLE f3b
where f1.HOTEL_CHAIN=f3b.HOTEL_CHAIN and f1.HOTEL_OWNER<>f3b.HOTEL_OWNER
fetch first rows only
) f3 on 1=1
or like this :
INSERT INTO HOTELTABLE(HOTEL_CHAIN,HOTEL_LOCATION,HOTEL_OWNER)
select distinct f1.HOTEL_CHAIN,
ifnull(f2.result, f1.HOTEL_LOCATION) as HOTEL_LOCATION,
ifnull(f3.result, f1.HOTEL_OWNER) as HOTEL_OWNER
from BUSINESSTABLE f1
left outer join lateral
(
select 'NULL' result from BUSINESSTABLE f2b
where f1.HOTEL_CHAIN=f2b.HOTEL_CHAIN and f1.HOTEL_LOCATION<>f2b.HOTEL_LOCATION
fetch first rows only
) f2 on 1=1
left outer join lateral
(
select 'NULL' result from BUSINESSTABLE f3b
where f1.HOTEL_CHAIN=f3b.HOTEL_CHAIN and f1.HOTEL_OWNER<>f3b.HOTEL_OWNER
fetch first rows only
) f3 on 1=1

Related

Return a NULL value if Date not in CTE

I have a query that counts the number of records imported for every day according to the current date. The only problem is that the count only returns when records have been imported and NULLS are ignored
I have created a CTE with one column in MSSQL that lists dates in a certain range e.g. 2019-01-01 - today.
The query that I've currently got is like this:
SELECT TableName, DateRecordImported, COUNT(*) AS ImportedRecords
FROM Table
WHERE DateRecordImported IN (SELECT * FROM DateRange_CTE)
GROUP BY DateRecordImported
I get the results fine for the dates that exist in the table for example:
TableName DateRecordImported ImportedRecords
______________________________________________
Example 2019-01-01 165
Example 2019-01-02 981
Example 2019-01-04 34
Example 2019-01-07 385
....
But I need a count of 0 returned if a date from the CTE is not in the table. Is there a better alternative for returning a 0 count, or does my method need altering slightly?
You can do LEFT JOIN :
SELECT C.Date, COUNT(t.DateRecordImported) AS ImportedRecords
FROM DateRange_CTE C LEFT JOIN
table t
ON t.DateRecordImported = C.Date -- This may differ use actual column name instead
GROUP BY C.Date; -- This may differ use actual column name instead
Move the position of the CTE from a subquery to the FROM:
SELECT MAX(T.TableName) AS TableName, -- aggregated because it is not in the GROUP BY
DT.{CTEDateColumn} AS DateRecordImported,
COUNT(T.{TableIDColumn}) AS ImportedRecords
FROM DateRange_CTE DT
LEFT JOIN [Table] T ON DT.{CTEDateColumn} = T.DateRecordImported
GROUP BY DT.{CTEDateColumn};
You'll need to replace the values in braces ({}).
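A minimal sketch of the LEFT JOIN approach, assuming SQLite in place of SQL Server and a hypothetical Imports table holding only the date column. The date range is built with a recursive CTE, and COUNT is applied to a column from the outer-joined table so that dates with no imports yield 0 rather than disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Imports" is a hypothetical stand-in for the asker's table.
conn.execute("CREATE TABLE Imports (DateRecordImported TEXT)")
conn.executemany(
    "INSERT INTO Imports VALUES (?)",
    [("2019-01-01",), ("2019-01-01",), ("2019-01-02",), ("2019-01-04",)],
)

daily_counts = conn.execute(
    """
    WITH RECURSIVE DateRange_CTE(ImportedDate) AS (
        SELECT '2019-01-01'
        UNION ALL
        SELECT date(ImportedDate, '+1 day')
        FROM DateRange_CTE
        WHERE ImportedDate < '2019-01-04'
    )
    -- Start from the date range so every date appears in the output;
    -- COUNT(column) skips the NULLs produced by unmatched dates.
    SELECT d.ImportedDate, COUNT(i.DateRecordImported) AS ImportedRecords
    FROM DateRange_CTE d
    LEFT JOIN Imports i ON i.DateRecordImported = d.ImportedDate
    GROUP BY d.ImportedDate
    ORDER BY d.ImportedDate
    """
).fetchall()

for row in daily_counts:
    print(row)
```

Using COUNT(*) here instead would report 1 for the empty date, because the outer join still produces one row for it.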
You can try this
SELECT TableName, DateRecordImported,
case when DateRecordImported is null
then '0'
else count(*) end AS ImportedRecords
FROM Table full join DateRange_CTE
on Table.DateRecordImported = DateRange_CTE.ImportedDate
group by DateRecordImported,ImportedDate
(ImportedDate is the name of the CTE's column.)

pick group by from a col where col2 is NULL

I have to write a report by doing some SQL in MS SQL server. The data I have is like this:
UserID,Country, CommNumber
00001, IN, 1001
00002, IN, NULL
00003, US, 1002
00004, US, 1003
00005, DE, NULL
00006, DE, NULL
00007, US, NULL
Now I want to pull up the list of countries where all CommNumbers are NULL. Even if one user has a CommNumber in that country, I don't want that country to be in the list. So looking at the above, only DE has both users with NULL on CommNumber. US and IN have at least one user where the CommNumber is not NULL.
Hope this question makes sense.
My attempt is:
SELECT
[COUNTRY]
,COUNT(*) AS 'COMMNUMBER_USERS'
FROM
<TABLENAME>
WHERE [COMMNUMBER] IS NULL
GROUP BY [C]
ORDER BY [COMMNUMBER_USERS]
The above is not giving me the correct results. I understand why: I don't have a way to tell it that I only want countries where all CommNumbers are NULL.
I would use NOT EXISTS :
SELECT t.*
FROM table t
WHERE NOT EXISTS (SELECT 1 FROM table t1 WHERE t1.Country = t.Country AND t1.CommNumber IS NOT NULL);
If you want only Country then you can do aggregation :
select country
from table t
group by country
having max(commnumber) is null;
You can simply use group by and having:
select country
from t
group by country
having max(commnumber) is null;
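The HAVING approach works because aggregate functions skip NULLs, so MAX() is NULL only when every value in the group is NULL. A small sketch, assuming SQLite and a hypothetical table name Users, that also shows the equivalent COUNT(column) = 0 form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Users" is a hypothetical table name; the rows are the question's sample data.
conn.execute("CREATE TABLE Users (UserID TEXT, Country TEXT, CommNumber INTEGER)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("00001", "IN", 1001),
        ("00002", "IN", None),
        ("00003", "US", 1002),
        ("00004", "US", 1003),
        ("00005", "DE", None),
        ("00006", "DE", None),
        ("00007", "US", None),
    ],
)

# MAX() ignores NULLs, so it is NULL only if the whole group is NULL.
via_max = [
    r[0]
    for r in conn.execute(
        "SELECT Country FROM Users GROUP BY Country "
        "HAVING MAX(CommNumber) IS NULL ORDER BY Country"
    )
]

# COUNT(column) counts only non-NULL values, giving the same filter.
via_count = [
    r[0]
    for r in conn.execute(
        "SELECT Country FROM Users GROUP BY Country "
        "HAVING COUNT(CommNumber) = 0 ORDER BY Country"
    )
]

print(via_max, via_count)
```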
You can try using correlated subquery
select * from tablename a where not exists
(select 1 from tablename b where a.country=b.country and b.commnumber is not null)
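You can use this one as well. Note that it is only reliable if CommNumber values are always positive, because a group whose values sum to 0 would also match:
SELECT Country
FROM Tablename
GROUP BY Country
HAVING sum(ISNULL(Commnumber, 0)) = 0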

SQL Server / Report Builder subquery returns more than one row error

I'm struggling to figure out why a subquery in report builder is returning more than one row.
(
SELECT
(
CASE
WHEN C.CourseCode IN ('50089079','50089080') THEN 'L2 Maths FS'
WHEN C.CourseCode IN ('50089067','50089109') THEN 'L1 Maths FS'
WHEN C.CourseCode IN ('50084987','50092959') THEN 'E3 Maths FS'
WHEN C.CourseCode IN ('50084975','50091967') THEN 'E2 Maths FS'
WHEN C.CourseCode IN ('50084963','50091724') THEN 'E1 Maths FS'
WHEN C.CourseCode IN ('60146084') THEN 'GCSE Maths'
Else 'NA'
END
)
FROM
Enrolment E
INNER JOIN
Course C ON C.CourseID = E.CourseID
WHERE
E.PMStudentID = vReports_Enrolment.PMStudentID
AND C.CourseCode IN ('50089079', '50089080', '50089067', '50089109', '50084987', '50092959', '50084975', '50091967', '50084963', '50091724', '60146084')
AND vReports_Enrolment.CompletionID = 1
)
This is the data for a specific learner where this error is popping up - I've highlighted where there would usually be 2 rows returned if not for the CompletionID being checked to see if it's '1':
CourseCode CompletionID
-------------------------
50044357 1
50044369 1
50089079 0
60146084 1
60187578 1
60148366 1
The expected behavior in this case is to return 'GCSE Maths' - am I doing something wrong?
In some cases you have two or more rows.
Using TOP 1 will only choose the first which is no guarantee that it's the one you want, especially if your data is not as clean as you think.
It is safer to use SELECT DISTINCT ... . That way, if all returned rows are the same, just duplicates, then you will get the correct answer. If you still get an error then you need to investigate the sub-query results.
Added TOP 1:
(
SELECT TOP 1
(
CASE
This ensures only one row is returned, which is the expected behavior.

SQL Server select column while inserting

This question is a part of an insert statement in which I am trying to select a value from another value that can be inserted into a column.
For example in my table OnlineServers, I have columns:
ID, ServerID, OnlineSince
In my second table ImportServers, I have columns with the data (The lines after NewYork and Paris are actually empty):
ImportServerName
NewYork
(empty)
London
Paris
(empty)
Tokyo
This question is related to SQL Server.
In my third table, which is a look-up table called ServerLookup, I have these columns with data:
ID, ServerName
0 Not specified
1 NewYork
2 London
3 Tokyo
4 Munich
5 Salzburg
Question: I want to have an sql statement which can select ID '0' from ServerLookup table if the value of the column ImportServerName is empty.
What I have so far is:
insert into OnlineServers (ServerID, OnlineSince)
select
(
select ID
from ServerLookup
where ServerLookup.ServerName = ImportServers.ServerName
or ServerLookup.ServerName = ''
),
GETDATE()
from ImportServers
The problem I am facing is if the server name is matched, it also returns an extra row with empty server name.
How can I fix this problem.
Thanks
PS: Forgive me if there is any typo in the code
INSERT INTO OnlineServers (ServerID, OnlineSince)
SELECT CASE s.ImportServerName
WHEN '' THEN 0
ELSE l.ID
END AS ServerID, GetDate()
FROM ImportServers s
LEFT JOIN ServerLookup l ON s.ImportServerName = l.ServerName;
This should do it. The LEFT JOIN returns every record from ImportServers, and the CASE returns 0 where ImportServerName is blank.
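A minimal sketch of the LEFT JOIN plus CASE idea, assuming SQLite in place of SQL Server (so datetime('now') replaces GETDATE()) and assuming the empty import rows hold an empty string rather than NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ServerLookup (ID INTEGER, ServerName TEXT)")
conn.execute("CREATE TABLE ImportServers (ImportServerName TEXT)")
conn.execute("CREATE TABLE OnlineServers (ServerID INTEGER, OnlineSince TEXT)")
conn.executemany(
    "INSERT INTO ServerLookup VALUES (?, ?)",
    [(0, "Not specified"), (1, "NewYork"), (2, "London"),
     (3, "Tokyo"), (4, "Munich"), (5, "Salzburg")],
)
# Assumption: the blank import rows are empty strings, not NULLs.
conn.executemany(
    "INSERT INTO ImportServers VALUES (?)",
    [("NewYork",), ("",), ("London",), ("Paris",), ("",), ("Tokyo",)],
)

# The LEFT JOIN keeps every import row; the CASE maps blank names to ID 0.
conn.execute(
    """
    INSERT INTO OnlineServers (ServerID, OnlineSince)
    SELECT CASE WHEN s.ImportServerName = '' THEN 0 ELSE l.ID END,
           datetime('now')
    FROM ImportServers s
    LEFT JOIN ServerLookup l ON s.ImportServerName = l.ServerName
    """
)

server_ids = [r[0] for r in conn.execute("SELECT ServerID FROM OnlineServers")]
print(server_ids)
```

With this sample data, Paris has no row in ServerLookup, so its ServerID comes out as NULL; whether that should also map to 0 is a separate decision the question does not settle.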
Maybe something like this:
SELECT FIRST(column_name) FROM table_name;
Limiting the return to the first match. Also, should the FROM not be inside the select brackets? (Note that FIRST() is an MS Access aggregate; in SQL Server the equivalent is TOP 1.)
select FIRST(ID)
from ServerLookup
where ServerLookup.ServerName = ImportServers.ServerName
or ServerLookup.ServerName = ''
from ImportServers)
If I understand your problem correctly, try something like this:
insert into OnlineServers (ServerID, OnlineSince)
select TOP(1) ID ,GETDATE()
from ServerLookup
inner join ImportServers
on ServerLookup.ServerName = ImportServers.ServerName
Where ServerLookup.ServerName = ''

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on Address+ZipCode and HomePhone, and possibly more than one of them would match on ServiceId as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
For SQL Server 2005 and up, try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when no schema was given, because I can't actually run this and be sure it works. However, to answer the question of how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
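A compact sketch of the same score-then-rank idea, under several assumptions: SQLite (3.25 or later, for window functions) stands in for SQL Server, the PK_ID columns and all sample rows are hypothetical, and the MAX(Score) subquery is folded into the ROW_NUMBER() ordering (Score DESC, then LastTransaction DESC) rather than kept as a separate step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "ServiceId TEXT, Address TEXT, ZipCode TEXT, EmailAddress TEXT, HomePhone TEXT"
conn.execute(f"CREATE TABLE Sales (PK_ID INTEGER, {cols})")
conn.execute(f"CREATE TABLE Customers (PK_ID INTEGER, {cols}, LastTransaction TEXT)")
# Hypothetical sample data: sale 1 has two 4-field matches, sale 2 has none.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "S1", "1 Main St", "12345", "john@example.com", "555-0100"),
    (2, "S7", "7 Oak Ave", "77777", "x@example.com", "555-0177"),
])
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (10, "S1", "1 Main St", "12345", "john@example.com", "555-0100", "2009-01-10"),
    (11, "S1", "1 Main St", "12345", "jane@example.com", "555-0100", "2009-03-01"),
    (12, "S1", "1 Main St", "12345", "john@example.com", "555-0100", "2009-02-15"),
    (13, "S9", "9 Elm St", "99999", "bob@example.com", "555-0199", "2009-04-01"),
])

best_matches = conn.execute(
    """
    WITH Scored AS (
        SELECT s.PK_ID AS S_PK, c.PK_ID AS C_PK, c.LastTransaction,
               CASE WHEN s.ServiceId = c.ServiceId THEN 1 ELSE 0 END
             + CASE WHEN s.Address = c.Address AND s.ZipCode = c.ZipCode THEN 1 ELSE 0 END
             + CASE WHEN s.EmailAddress = c.EmailAddress THEN 1 ELSE 0 END
             + CASE WHEN s.HomePhone = c.HomePhone THEN 1 ELSE 0 END AS Score
        FROM Sales s
        LEFT JOIN Customers c
               ON s.ServiceId = c.ServiceId
               OR (s.Address = c.Address AND s.ZipCode = c.ZipCode)
               OR s.EmailAddress = c.EmailAddress
               OR s.HomePhone = c.HomePhone
    ),
    Ranked AS (
        -- Highest score first; ties go to the most recent transaction.
        SELECT S_PK, C_PK, Score,
               ROW_NUMBER() OVER (
                   PARTITION BY S_PK
                   ORDER BY Score DESC, LastTransaction DESC
               ) AS RankValue
        FROM Scored
    )
    SELECT S_PK, C_PK, Score FROM Ranked WHERE RankValue = 1 ORDER BY S_PK
    """
).fetchall()

print(best_matches)
```

Sale 1 ties between customers 10 and 12 on four fields, and customer 12 wins on the later LastTransaction; sale 2 keeps its row with no customer attached.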
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions, and that would require either another subquery level or implementing it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call and find out if they are dups.
I would probably create a stored function for that (in Oracle) and order by the highest match:
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( c.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters. I guess it is probably slow, as this query will always scan the complete customer table.
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithm.
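For reference, a minimal Python sketch of Levenshtein distance using the standard two-row dynamic-programming formulation. The function name is hypothetical, and this is an illustration of the algorithm rather than a database-side implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep if equal)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smyth"))
```

A small distance relative to the string length suggests a near match, but the threshold is a judgment call for the data at hand.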