SQL Server SELECT first occurrence OR if no occurrence SELECT other criteria - sql

I am having an issue trying to form the proper SQL query for the job here. I have two tables, one is called CUSTOMER and the other is called CUSTOMER_CONTACT. To simplify this, I will only include the relevant column names.
CUSTOMER columns: ID, CUSTOMERNAME
CUSTOMER_CONTACT columns: ID, CUSTOMER_ID, CONTACT_VC, EMAIL
CUSTOMER_ID is the foreign key to link to the CUSTOMER table from CUSTOMER_CONTACT. CONTACT_VC is just the entry number for their contact information. There could be multiple CUSTOMER_CONTACT records for each customer, but they will have a unique CONTACT_VC.
EMAIL can be null/blank on some or all as well.
I need to select the first CUSTOMER_CONTACT entry where EMAIL is NOT NULL/blank but if none of the CUSTOMER_CONTACT entries have an email address, then select CUSTOMER_CONTACT WHERE CONTACT_VC = 1
Any suggestions on how to accomplish this?

The following approach uses ROW_NUMBER to retrieve a number based on your ordering logic within each CUSTOMER_ID group, then filters by the first record retrieved.
You may try the following:
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY CUSTOMER_ID
ORDER BY (CASE WHEN EMAIL IS NOT NULL THEN 0 ELSE 1 END),CONTACT_VC
) as rn
FROM
CUSTOMER_CONTACT
) t
WHERE rn=1
If you would like to join this to the customer table, you may use the above query as a subquery, e.g.:
SELECT
c.*,
contact.*
FROM
CUSTOMER c
INNER JOIN (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY CUSTOMER_ID
ORDER BY (CASE WHEN EMAIL IS NOT NULL THEN 0 ELSE 1 END),CONTACT_VC
) as rn
FROM
CUSTOMER_CONTACT
) contact ON c.ID = contact.CUSTOMER_ID and contact.rn=1
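The question also treats blank emails as missing, which the `EMAIL IS NOT NULL` test alone does not cover. The following is only a sketch of one way to handle that, run against SQLite through Python's `sqlite3` module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The sample rows are invented for illustration. Wrapping the column in `NULLIF(TRIM(EMAIL), '')` turns blank values into NULL so the same ordering covers both cases; on SQL Server versions before 2017, which lack `TRIM`, `LTRIM(RTRIM(EMAIL))` would be the assumed substitute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_CONTACT (ID INT, CUSTOMER_ID INT, CONTACT_VC INT, EMAIL TEXT);
INSERT INTO CUSTOMER_CONTACT VALUES
    (1, 1, 1, ''),                 -- blank email, should be skipped
    (2, 1, 2, 'alice@gmail.com'),
    (3, 2, 1, NULL),               -- customer 2 has no usable email at all
    (4, 2, 2, '   ');
""")

query = """
SELECT CUSTOMER_ID, CONTACT_VC, EMAIL
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CUSTOMER_ID
               -- rows with a usable email sort first, then lowest CONTACT_VC
               ORDER BY CASE WHEN NULLIF(TRIM(EMAIL), '') IS NOT NULL THEN 0 ELSE 1 END,
                        CONTACT_VC
           ) AS rn
    FROM CUSTOMER_CONTACT
) t
WHERE rn = 1
ORDER BY CUSTOMER_ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 1 gets the second contact (the first one is blank), while customer 2 falls back to `CONTACT_VC = 1`.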

Here is almost the same answer as ggordon, but I used a common table expression, and I think the ordering in the subquery portion should go by CONTACT_VC first, then by non-NULL email addresses. I created some very simple test data to run this:
DECLARE @CUSTOMER AS TABLE
(
    [ID] INT NOT NULL,
    [CUSTOMERNAME] VARCHAR(10) NOT NULL
);
INSERT INTO @CUSTOMER
(
    [ID],
    [CUSTOMERNAME]
)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Cathy');
DECLARE @CUSTOMER_CONTACT AS TABLE
(
    [ID] INT NOT NULL,
    [CUSTOMER_ID] INT NOT NULL,
    [CONTACT_VC] INT NOT NULL,
    [EMAIL] VARCHAR(40) NULL
);
INSERT INTO @CUSTOMER_CONTACT
(
    [ID],
    [CUSTOMER_ID],
    [CONTACT_VC],
    [EMAIL]
)
VALUES
(1, 1, 1, 'alice@email.com'),
(2, 1, 2, 'alice@gmail.com'),
(3, 2, 1, NULL),
(4, 2, 2, 'bob@work.com'),
(5, 3, 1, NULL),
(6, 3, 2, NULL),
(7, 3, 3, NULL);
;WITH [cc]
AS (SELECT [ID],
           [CUSTOMER_ID],
           [CONTACT_VC],
           [EMAIL],
           ROW_NUMBER() OVER (PARTITION BY [CUSTOMER_ID]
                              ORDER BY [CONTACT_VC],
                                       (CASE WHEN [EMAIL] IS NOT NULL THEN
                                                 0
                                             ELSE
                                                 1
                                        END
                                       )
                             ) AS [rn]
    FROM @CUSTOMER_CONTACT)
SELECT [c].[ID], [c].[CUSTOMERNAME], [cc].[ID], [cc].[CUSTOMER_ID], [cc].[CONTACT_VC], [cc].[EMAIL]
FROM @CUSTOMER AS [c]
    INNER JOIN [cc]
        ON [c].[ID] = [cc].[CUSTOMER_ID]
           AND [cc].[rn] = 1;
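Because CONTACT_VC is unique within a customer, whichever key comes first in the ORDER BY decides which row gets rn = 1. The sketch below runs both orderings over the same test data so the difference can be inspected. It uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption for runnability; SQLite 3.25+ is needed for window functions), and `TEMPLATE`, `HAS_EMAIL` and `first_contacts` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_CONTACT (ID INT, CUSTOMER_ID INT, CONTACT_VC INT, EMAIL TEXT);
INSERT INTO CUSTOMER_CONTACT VALUES
    (1, 1, 1, 'alice@email.com'), (2, 1, 2, 'alice@gmail.com'),
    (3, 2, 1, NULL),              (4, 2, 2, 'bob@work.com'),
    (5, 3, 1, NULL), (6, 3, 2, NULL), (7, 3, 3, NULL);
""")

# Same query shape for both variants; only the ORDER BY inside the window changes.
TEMPLATE = """
SELECT CUSTOMER_ID, CONTACT_VC, EMAIL
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY {order_by}) AS rn
    FROM CUSTOMER_CONTACT
) t
WHERE rn = 1
ORDER BY CUSTOMER_ID
"""
HAS_EMAIL = "CASE WHEN EMAIL IS NOT NULL THEN 0 ELSE 1 END"

def first_contacts(order_by):
    """Return the rn = 1 row per customer for the given window ordering."""
    return conn.execute(TEMPLATE.format(order_by=order_by)).fetchall()

email_first = first_contacts(HAS_EMAIL + ", CONTACT_VC")  # email presence, then CONTACT_VC
vc_first = first_contacts("CONTACT_VC, " + HAS_EMAIL)     # CONTACT_VC, then email presence
print(email_first)
print(vc_first)
```

With the email test first, Bob (customer 2) gets the contact that has an address; with CONTACT_VC first, every customer simply gets CONTACT_VC = 1, so the email test never breaks a tie.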

-- SQL Server has no LIMIT; TOP (1) over the combined result plays that role
select top (1) *
from (
select * from CUSTOMER_CONTACT where EMAIL IS NOT NULL
union all
select * from CUSTOMER_CONTACT where
CONTACT_VC = 1 and NOT EXISTS (select 1 FROM CUSTOMER_CONTACT where EMAIL IS NOT NULL)
) t
order by CONTACT_VC asc

Related

sql SERVER - distinct selection based on priority columns

Hello, I would like to find a solution to my problem in a single query if possible.
For the moment I fetch all the records, then go through the rows one by one to eliminate what I don't want.
I have 2 tables: the first table contains the links,
the second contains the preferred label for each URL.
The second table must be queried keeping only the row with the highest priority.
The priority rules are:
the current user, then
the user group, and finally
everyone.
If the hidden column is true, exclude any reference to the URL.
Here is the expected result.
Unfortunately, I don't see any other solution than to multiply the conditions across several SELECTs and UNIONs.
If you have an idea to solve my problem, thank you in advance for your help.
It appears as though you can rely on pref_id for the preference ordering, correct? If so, you could try:
SELECT *
FROM table2
INNER JOIN table1 ON table2.url_id = table1.url_id
QUALIFY ROW_NUMBER() OVER (
PARTITION BY table1.url
ORDER BY pref_id ASC
) = 1
This will partition by the url and then provide only the one with lowest pref_id.
I didn't test this SQL as I wasn't sure which RDBMS you're running on, but I used Rasgo to translate the SQL.
This query may be of interest for this tricky problem:
select so.*, table1.url from
(select distinct t.url_id,
(select pref_id from table2 s where s.url_id = t.url_id order by "user" is null, "group" is null limit 1) pref_id
from table2 t
where not exists(select 1 from table2 s where s.hide and s.url_id = t.url_id)
) ids
join table2 so on so.pref_id = ids.pref_id
join table1 ON table1.url_id = ids.url_id
order by so.url_id;
Here is my solution, but I think there is a better way to do it.
In the SELECT inside the condition, I built a column that assigns a level according to the priorities.
DECLARE @CUR_USER VARCHAR(10) = 'ROBERT'
DECLARE @CUR_GROUP VARCHAR(10) = 'DEV'
DECLARE @TABLE1 TABLE (
    URL_ID INT
    ,URLNAME VARCHAR(100)
);
DECLARE @TABLE2 TABLE (
    PREF_ID INT
    ,URL_ID INT
    ,FAVORITE_LABEL VARCHAR(100)
    ,USER_GROUP VARCHAR(10)
    ,USER_CODE VARCHAR(10)
    ,HIDE_URL DECIMAL(1, 0) DEFAULT 0
);
INSERT INTO @TABLE1
VALUES
(1, 'https://stackoverflow.com/')
,(2, 'https://www.microsoft.com/')
,(3, 'https://www.apple.com/')
,(4, 'https://www.wikipedia.org/')
;
INSERT INTO @TABLE2
VALUES
(1000, 1, 'find everything', NULL, 'ROBERT', 0)
,(1001, 1, 'a question ? find the answer here', 'DEV', NULL, 0)
,(1002, 1, 'StackOverFlow', NULL, NULL, 0)
,(1003, 2, 'Windows', 'DEV', NULL, 0)
,(1004, 2, 'Microsoft', NULL, NULL, 0)
,(1005, 3, 'Apple', NULL, NULL, 0)
,(1006, 4, 'Free encyclopedia', NULL, 'ROBERT', 1)
,(1007, 4, 'Wikipedia', NULL, NULL, 0)
,(1008, 1, 'StackOverFlow FOR MAT', 'MAT', NULL, 0)
,(1009, 2, 'Microsoft FOR MAT', 'MAT', NULL, 0)
,(1010, 3, 'Apple', 'MAT', NULL, 1)
,(1011, 4, 'Wikipedia FOR MAT', 'MAT', NULL, 0)
,(1012, 1, 'StackOverFlow', NULL, 'JEAN', 1)
,(1013, 2, 'Microsoft ', NULL, 'JEAN', 0)
,(1014, 3, 'Apple', NULL, 'JEAN', 0)
,(1015, 4, 'great encyclopedia', NULL, 'JEAN', 0)
;
SELECT t2.* ,t1.URLName
FROM @TABLE1 t1
INNER JOIN @TABLE2 t2 ON t1.URL_ID = t2.URL_ID
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT TOP (1) test.PREF_ID
            ,CASE
                -- if I do not comment this case: jean from the MAT group will not see apple
                -- WHEN Hide_Url = 1
                -- THEN 3
                WHEN USER_code IS NOT NULL
                    THEN 2
                WHEN USER_GROUP IS NOT NULL
                    THEN 1
                ELSE 0
                END AS ROW_LEVEL
        FROM @TABLE2 test
        WHERE (
                (
                    test.USER_GROUP IS NULL
                    AND test.USER_code IS NULL
                )
                OR (test.USER_GROUP = @CUR_GROUP)
                OR (test.USER_code = @CUR_USER)
            )
            AND t2.URL_ID = test.URL_ID
        ORDER BY ROW_LEVEL DESC
    ) test
    WHERE test.PREF_ID = t2.PREF_ID
        AND Hide_Url = 0
)
Simply use an ORDER BY clause that puts the preferred row first. You can use this in the window function ROW_NUMBER and work with this or use a lateral top(1) join with CROSS APPLY.
select *
from urls
cross apply
(
select top(1) *
from labels
where labels.url_id = urls.url_id
and ([Group] is not null or [user] is not null or hide is not null)
order by
case when [Group] is null then 2 else 1 end,
case when [user] is null then 2 else 1 end,
case when hide is null then 2 else 1 end
) top_labels
order by urls.url_id;
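The ROW_NUMBER variant of this idea can be sketched as follows. This is only an illustration, run on SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption for runnability; SQLite 3.25+ is needed for window functions). It reuses the column names and a subset of the sample rows from the asker's own script, and it assumes, as that script does, that a URL is dropped when its highest-priority row is hidden. `visible_labels` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE2 (PREF_ID INT, URL_ID INT, FAVORITE_LABEL TEXT,
                     USER_GROUP TEXT, USER_CODE TEXT, HIDE_URL INT);
INSERT INTO TABLE2 VALUES
    (1000, 1, 'find everything', NULL, 'ROBERT', 0),
    (1001, 1, 'a question ? find the answer here', 'DEV', NULL, 0),
    (1002, 1, 'StackOverFlow', NULL, NULL, 0),
    (1003, 2, 'Windows', 'DEV', NULL, 0),
    (1004, 2, 'Microsoft', NULL, NULL, 0),
    (1005, 3, 'Apple', NULL, NULL, 0),
    (1006, 4, 'Free encyclopedia', NULL, 'ROBERT', 1),
    (1007, 4, 'Wikipedia', NULL, NULL, 0),
    (1008, 1, 'StackOverFlow FOR MAT', 'MAT', NULL, 0);
""")

query = """
SELECT URL_ID, FAVORITE_LABEL
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY URL_ID
               -- priority: current user, then the user's group, then everyone
               ORDER BY CASE WHEN USER_CODE = :user THEN 0
                             WHEN USER_GROUP = :grp THEN 1
                             ELSE 2 END
           ) AS rn
    FROM TABLE2
    -- keep only rows that apply to this user, this group, or everyone
    WHERE USER_CODE = :user
       OR USER_GROUP = :grp
       OR (USER_CODE IS NULL AND USER_GROUP IS NULL)
) ranked
WHERE rn = 1 AND HIDE_URL = 0   -- a hidden winner removes the URL entirely
ORDER BY URL_ID
"""

def visible_labels(user, grp):
    """Return (URL_ID, label) for the best visible preference per URL."""
    return conn.execute(query, {"user": user, "grp": grp}).fetchall()

robert = visible_labels("ROBERT", "DEV")
print(robert)
```

For ROBERT in group DEV, URL 4 disappears because his own preference row hides it, even though a visible row for everyone exists.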

Looping through groups of records

SQL Server 2014: I have a table with a number of rows, for example 15; 5 have a groupid column of 736881 and 10 have a groupid column of 3084235. What I want to do is process each group of records in turn and load the results into a table.
I have written the code to do this, but I think I am setting the loop counter incorrectly, as I keep getting the records for groupid 736881 loaded twice.
I can't currently post the test data because it contains personal information, but if the mistake is not obvious I will try to create some dummy data.
SELECT @LoopCounter = min(rowfilter) , @maxrowfilter = max(rowfilter)
FROM peops6
WHILE ( @LoopCounter IS NOT NULL
        AND @LoopCounter <= @maxrowfilter)
begin
    declare @customer_dist as Table (
        [id] [int] NOT NULL,
        [First_Name] [varchar](50) NULL,
        [Last_Name] [varchar](50) NULL,
        [DoB] [date] NULL,
        [post_code] [varchar](50) NULL,
        [mobile] [varchar](50) NULL,
        [Email] [varchar](100) NULL );
    INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
    select id, first_name, last_name, dob, postcode, mobile_phone, email from peops6 where rowfilter = @LoopCounter
    insert into results
    SELECT result.* ,
        [dbo].GetPercentageOfTwoStringMatching(result.DoB, d.DoB) [DOB%match] ,
        [dbo].GetPercentageOfTwoStringMatching(result.post_code, d.post_code) [post_code%match] ,
        [dbo].GetPercentageOfTwoStringMatching(result.mobile, d.mobile) [mobile%match] ,
        [dbo].GetPercentageOfTwoStringMatching(result.Email, d.Email) [email%match]
    FROM ( SELECT ( SELECT MIN(id)
                    FROM @customer_dist AS sq
                    WHERE sq.First_Name = cd.First_Name
                        AND sq.Last_Name = cd.Last_Name
                        AND ( sq.DoB = cd.DoB
                            OR sq.mobile = cd.mobile
                            OR sq.Email = cd.Email
                            OR sq.post_code = cd.post_code )) nid ,
                *
            FROM @customer_dist AS cd ) AS result
    INNER JOIN @customer_dist d ON result.nid = d.id order by 1, 2 asc;
    SELECT @LoopCounter = min(rowfilter) FROM peops6
    WHERE rowfilter > @LoopCounter
end
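You need to empty your table variable (@customer_dist) at the end of each loop iteration. A table variable declared inside a loop keeps its rows between iterations, and TRUNCATE TABLE cannot be used on a table variable, so use DELETE:
....
-- Add this
DELETE FROM @customer_dist
SELECT @LoopCounter = min(rowfilter) FROM peops6
WHERE rowfilter > @LoopCounter
end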
See: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/42ef20dc-7ad8-44f7-b676-a4596fc0d593/declaring-a-table-variable-inside-a-loop-does-not-delete-the-previous-data?forum=transactsql
I am not sure you need a loop (or a SQL cursor) to fulfill this task.
Please check the following SQL statement, where I used multiple CTE expressions:
with customer_dist as (
select
rowfilter,
id, first_name, last_name, dob, postcode, mobile_phone, email
from peops6
), result as (
SELECT
(
SELECT
MIN(id)
FROM customer_dist AS sq
WHERE
sq.rowfilter = cd.rowfilter
AND sq.First_Name = cd.First_Name
AND sq.Last_Name = cd.Last_Name
AND (sq.DoB = cd.DoB OR sq.mobile_phone = cd.mobile_phone OR sq.Email = cd.Email OR sq.postcode = cd.postcode )
) nid,
*
FROM customer_dist AS cd
)
SELECT
result.* ,
[dbo].edit_distance(result.DoB, d.DoB) [DOB%match] ,
[dbo].edit_distance(result.postcode, d.postcode) [post_code%match] ,
[dbo].edit_distance(result.mobile_phone, d.mobile_phone) [mobile%match] ,
[dbo].edit_distance(result.Email, d.Email) [email%match]
FROM result
INNER JOIN customer_dist d
ON result.nid = d.id
order by 1, 2 asc;
Please note that I used my own fuzzy string matching function (edit_distance, a Levenshtein distance implementation) in this sample instead of your function.
You only need to add the INSERT statement just before the last SELECT statement.
Hope it is useful.

How find duplicates in a table with no primary key or ID field?

I've inherited a SQL Server database that has duplicate data in it. I need to find and remove the duplicate rows. But without an id field, I'm not sure how to find the rows.
Normally, I'd compare it with itself using a LEFT JOIN and check that all fields are the same except the ID field would be table1.id <> table2.id, but without that, I don't know how to find duplicates rows and not have it also match on itself.
TABLE:
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
SAMPLE DATA
1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"
In that sample, only rows 1 and 3 are duplicates.
How do I find duplicates?
Use having (and group by)
select
productId
, categoryId
, state
, dateDone
, count(*)
from your_table
group by productId ,categoryId ,state, dateDone
having count(*) >1
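To see what this returns for the sample data in the question, here is a small sketch run on SQLite through Python's `sqlite3` module (a stand-in for SQL Server, chosen only so the example is runnable). The table name `your_table` follows the answer; the `copies` alias is added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (productId INT, categoryId INT, state TEXT, dateDone TEXT);
INSERT INTO your_table VALUES
    (1, 3, 'started', '2016-06-15 04:23:12.000'),
    (2, 3, 'started', '2016-06-15 04:21:12.000'),
    (1, 3, 'started', '2016-06-15 04:23:12.000'),
    (1, 3, 'done',    '2016-06-15 04:23:12.000');
""")

# Group on every column; any group with more than one row is a duplicate set.
duplicates = conn.execute("""
    SELECT productId, categoryId, state, dateDone, COUNT(*) AS copies
    FROM your_table
    GROUP BY productId, categoryId, state, dateDone
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)
```

Only the combination that appears in rows 1 and 3 comes back, together with its count of 2.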
You can do this with windowing functions. For instance
create table #tmp
(
Id INT
)
insert into #tmp
VALUES (1), (1), (2); --so now we have duplicated rows
WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) AS [DuplicateCounter],
Id
FROM #tmp
)
DELETE FROM CTE
WHERE DuplicateCounter > 1 --duplicated rows have DuplicateCounter > 1
For some reason I thought you wanted to delete them; I guess I read that wrong. Just switch DELETE in my statement to SELECT and you get all of the duplicates but not the originals. Using DELETE removes all the duplicates and still leaves you one record, which I suspect is what you want.
IF OBJECT_ID('tempdb..#TT') IS NOT NULL
BEGIN
DROP TABLE #TT
END
CREATE TABLE #TT (
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
)
INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15 04:23:12.000')
,(2, 3, 'started', '2016-06-15 04:21:12.000')
,(1, 3, 'started', '2016-06-15 04:23:12.000')
,(1, 3, 'done', '2016-06-15 04:23:12.000')
SELECT *
FROM
#TT
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY productId, categoryId, state, dateDone ORDER BY productId) --note what you order by doesn't matter
FROM
#TT
)
--if you want to delete them just do this otherwise change DELETE TO SELECT
DELETE
FROM
cte
WHERE
RowNum > 1
SELECT *
FROM
#TT
If you want to, and are able to change the schema, you can always add an identity column after the fact; it will be populated for the existing records too:
ALTER TABLE #TT
ADD Id INTEGER IDENTITY(1,1) NOT NULL
You can try a CTE and then limit the actual selection from the CTE to rows where RN = 1. Here is the query:
;WITH ACTE
AS
(
SELECT ProductID, categoryID, State, DateDone,
RN = ROW_NUMBER() OVER(PARTITION BY ProductID, CategoryID, State, DateDone
ORDER BY ProductID, CategoryID, State, DateDone)
FROM [Table]
)
SELECT * FROM ACTE WHERE RN = 1

How to retrieve data with group by aggregate function in SQL Server 2012

I am new to SQL Server 2012. This is my table DDL & DML script.
CREATE TABLE [tbl_item_i18n]
(
[item_id] [int] NOT NULL,
[lang_id] [int] NOT NULL,
[item_text] [nvarchar](max) NULL
);
INSERT INTO [tbl_item_i18n] ([item_id],[lang_id],[item_text])
VALUES (1, 1, 'item1'), (1, 2, 'idem 1'),
(2, 1, 'item2'),
(3, 1, 'item3'), (3, 2, 'idem 3'),
(4, 1, 'item4'), (4, 2, 'idem 4');
My expected output is :
This is what I have tried :
select
lang_id,
case when lang_id = 2 AND itemI18N.item_text is not null then itemI18N.item_text
when lang_id = 1 then itemI18N.item_text
end as ite_texte
from
tbl_item_i18n itemI18N
group by
itemI18N.item_id, lang_id, itemI18N.item_text
But it does not give me expected result.
Purpose: I would like to retrieve data for lang_id = 2. If the record for lang_id = 2 does not exist, then retrieve data for lang_id = 1.
How do I retrieve data using aggregate function?
The LEFT JOIN brings back all the rows for language 1, with NULLs if there is no language 2 row.
I include the extra columns so you can understand the result.
SELECT
A.item_id,
CASE
WHEN B.lang_id IS NULL THEN A.item_text
ELSE B.item_text
END as item_name,
A.*,
B.*
FROM tbl_item_i18n A
LEFT JOIN tbl_item_i18n B
ON A.item_id = B.item_id
AND A.lang_id < B.lang_id
NOTE
This may need special consideration if there are more than 2 languages.
Another solution
SELECT *
FROM
(
SELECT item_id, lang_id, item_text as item_name,
ROW_NUMBER() over (partition by item_id order by lang_id desc) as RN
FROM tbl_item_i18n
) as t
WHERE RN = 1
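When more than two languages are possible, ordering by lang_id alone no longer expresses "preferred language, else fallback". One possible sketch, run on SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption for runnability; SQLite 3.25+ is needed for window functions), ranks the rows with an explicit CASE instead. The `:preferred` and `:fallback` parameters and the `texts_for` helper are hypothetical names introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_item_i18n (item_id INT NOT NULL, lang_id INT NOT NULL, item_text TEXT);
INSERT INTO tbl_item_i18n VALUES
    (1, 1, 'item1'), (1, 2, 'idem 1'),
    (2, 1, 'item2'),
    (3, 1, 'item3'), (3, 2, 'idem 3'),
    (4, 1, 'item4'), (4, 2, 'idem 4');
""")

query = """
SELECT item_id, lang_id, item_text
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY item_id
               -- preferred language first, then the fallback, then anything else
               ORDER BY CASE WHEN lang_id = :preferred THEN 0
                             WHEN lang_id = :fallback THEN 1
                             ELSE 2 END
           ) AS rn
    FROM tbl_item_i18n
) t
WHERE rn = 1
ORDER BY item_id
"""

def texts_for(preferred, fallback):
    """Return one (item_id, lang_id, item_text) row per item."""
    return conn.execute(query, {"preferred": preferred, "fallback": fallback}).fetchall()

result = texts_for(2, 1)
print(result)
```

Item 2 has no lang_id = 2 row, so it is the only item that falls back to lang_id = 1.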

how to insert multiple rows with check for duplicate rows in a short way

I am trying to insert multiple records (~250) in a table (say MyTable) and would like to insert a new row only if it does not exist already.
I am using SQL Server 2008 R2 and got help from other threads like SQL conditional insert if row doesn't already exist.
While I am able to achieve that with following stripped script, I would like to know if there is a better (short) way to do this as I
have to repeat this checking for every row inserted. Since we need to execute this script only once during DB deployment, I am not too much
worried about performance.
INSERT INTO MyTable([Description], [CreatedDate], [CreatedBy], [ModifiedDate], [ModifiedBy], [IsActive], [IsDeleted])
SELECT N'ababab', GETDATE(), 1, NULL, NULL, 1, 0
WHERE NOT EXISTS(SELECT * FROM MyTable WITH (ROWLOCK, HOLDLOCK, UPDLOCK)
WHERE
([InstanceId] IS NULL OR [InstanceId] = 1)
AND [ChannelPartnerId] IS NULL
AND [CreatedBy] = 1)
UNION ALL
SELECT N'xyz', GETDATE(), 1, NULL, NULL, 1, 0
WHERE NOT EXISTS(SELECT * FROM [dbo].[TemplateQualifierCategoryMyTest] WITH (ROWLOCK, HOLDLOCK, UPDLOCK)
WHERE
([InstanceId] IS NULL OR [InstanceId] = 1)
AND [ChannelPartnerId] IS NULL
AND [CreatedBy] = 1)
-- More SELECT statements go here
You could put your descriptions into a table variable (or a temporary table), then insert them all into MyTable with a SELECT that picks only the rows not yet present in the destination. This is done with the LEFT OUTER JOIN combined with the IS NULL test on MyTable.Description in the WHERE clause:
DECLARE @Descriptions TABLE ([Description] VARCHAR(200) NOT NULL )
INSERT INTO @Descriptions ( Description )VALUES ( 'ababab' )
INSERT INTO @Descriptions ( Description )VALUES ( 'xyz' )
INSERT INTO dbo.MyTable
( Description ,
CreatedDate ,
CreatedBy ,
ModifiedDate ,
ModifiedBy ,
IsActive ,
IsDeleted
)
SELECT d.Description, GETDATE(), 1, NULL, NULL, 1, 0
FROM @Descriptions d
LEFT OUTER JOIN dbo.MyTable mt ON d.Description = mt.Description
WHERE mt.Description IS NULL
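The anti-join pattern above can be tried out with the following sketch. It runs on SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption for runnability), uses a trimmed-down, hypothetical version of MyTable with only four columns, and replaces the table variable with a temporary table named `Descriptions`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Description TEXT NOT NULL, CreatedBy INT, IsActive INT, IsDeleted INT);
INSERT INTO MyTable VALUES ('ababab', 1, 1, 0);   -- this row already exists
CREATE TEMP TABLE Descriptions (Description TEXT NOT NULL);
INSERT INTO Descriptions VALUES ('ababab'), ('xyz');
""")

# Insert only the staged descriptions that have no match in MyTable.
insert_missing = """
INSERT INTO MyTable (Description, CreatedBy, IsActive, IsDeleted)
SELECT d.Description, 1, 1, 0
FROM Descriptions d
LEFT OUTER JOIN MyTable mt ON d.Description = mt.Description
WHERE mt.Description IS NULL
"""
conn.execute(insert_missing)
conn.execute(insert_missing)  # running it again finds nothing left to insert

descriptions = [row[0] for row in
                conn.execute("SELECT Description FROM MyTable ORDER BY Description")]
print(descriptions)
```

Running the statement twice leaves one row per description, which is the property wanted for a deployment script that may be re-run.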