Selecting rows based on row level uniqueness (combination of columns)

I hope somebody can help me solve the following problem.
I need to select unique rows based on a combination of 2 or 3 columns. It's basically a 3-level hierarchy table that I build up by referencing the PK as the ParentID in the hierarchy.
To set everything up, please run the following script:
-- ===================
-- Source table & data
-- ===================
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ExternalSource]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[ExternalSource](
[locname1] [varchar](max) NULL,
[locname2] [varchar](max) NULL,
[locname3] [varchar](max) NULL
) ON [PRIMARY]
END
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location1', N'Floor1', N'Room123')
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location2', N'Floor2', N'Room234')
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location3', N'Floor2', N'Room111')
-- ===================
-- Destination table
-- ===================
CREATE TABLE [dbo].[Location](
[LocationID] [int] IDENTITY(1,1) NOT NULL,
[CompanyID] [tinyint] NOT NULL,
[ParentID] [int] NULL,
[LocationCode] [nvarchar](20),
[LocationName] [nvarchar](60) NOT NULL,
[CanAssign] [bit] NOT NULL)
-- Level 1 records in the hierarchy
insert into Location
(
CompanyID,
ParentID,
LocationName,
CanAssign
)
select distinct 1, NULL, ES.locname1, 1
from dbo.ExternalSource ES
where ES.locname1 not in (select LocationName from Location) and ES.locname1 is not null
-- Level 2 records in the hierarchy
insert into Location
(
CompanyID,
ParentID,
LocationName,
CanAssign
)
select 1, max(Loc.LocationID), ES.locname2, 1
from ExternalSource ES
left join Location Loc on ES.locname1 = Loc.LocationName
where ES.locname2 not in (select LocationName from Location) and ES.locname2 is not null and ES.locname1 is not null
group by ES.locname2
order by ES.locname2
select * from ExternalSource
select * from Location
The first insert into Location is not a problem at all; all I want from the first insert is unique location names.
For the second insert, I need to be able to tell whether ExternalSource.locname2 and Location.LocationName are unique as a combination, if that makes sense...
If they are unique, then I need the location name at level 2 selected.
Here is an example:
Below is what you get when you do a select * from ExternalSource
locname1 locname2 locname3
Location1 Floor1 Room123
Location2 Floor2 Room234
Location3 Floor2 Room111
Given the above, there is only one Floor1 in locname2, so no issues there, but as you can see there are two Floor2 values in the locname2 column. I need a way to check whether the values of locname1 + locname2 are unique when combined. If they are, I should select them both.
This is the expected output of the select during the second insert:
1 1 Floor1 1
1 2 Floor2 1
1 3 Floor2 1
But let's say the contents of ExternalSource were to look like this:
locname1 locname2 locname3
Location1 Floor1 Room123
Location2 Floor2 Room234
Location2 Floor2 Room111
Note the repeated Location2 above: because there are two rows with the same value for locname1 + locname2, the combination isn't unique anymore, and the desired output would then have looked like this:
1 1 Floor1 1
1 3 Floor2 1

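So you want to group by two columns in ExternalSource...? LocationID lives in Location rather than ExternalSource, so join on locname1 to pick it up:
select MAX(Loc.LocationID), ES.locname1, ES.locname2, 1
from ExternalSource ES
join Location Loc on ES.locname1 = Loc.LocationName
group by ES.locname1, ES.locname2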
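Building on the grouping idea, a minimal sketch of the complete level 2 insert might look like the following. It assumes the level 1 rows are already in Location with ParentID NULL and unique names (as the first insert produces), and the aliases Parent and Child are only illustrative.

```sql
-- Level 2: one row per distinct (locname1, locname2) pair,
-- parented to the level 1 row whose name matches locname1.
INSERT INTO dbo.Location (CompanyID, ParentID, LocationName, CanAssign)
SELECT DISTINCT 1, Parent.LocationID, ES.locname2, 1
FROM dbo.ExternalSource ES
JOIN dbo.Location Parent
    ON Parent.LocationName = ES.locname1
   AND Parent.ParentID IS NULL              -- match level 1 rows only
WHERE ES.locname2 IS NOT NULL
  AND NOT EXISTS (SELECT 1                  -- skip pairs that were already inserted
                  FROM dbo.Location Child
                  WHERE Child.ParentID = Parent.LocationID
                    AND Child.LocationName = ES.locname2);
```

With the first sample data set this selects Floor1 under parent 1 and Floor2 under parents 2 and 3. The uniqueness check is made per parent, so the same floor name can exist under several locations.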

Related

T-SQL Select row only if not exist already

I have a table with two Ids, ResourceId and LanguageId.
I need to combine these two selects so that rows from the second are added only if their ResourceId is not already in the list.
SELECT * FROM Resources WHERE LanguageId = 1
SELECT * FROM Resources WHERE LanguageId = 0
JOIN
/*where ResourceId not present already*/
So far I came up with nothing except complicated partitions. Is there a better solution to this?
Not all ResourceIds have Language 0 entry
Not all ResourceIds have Language 1 entry
Some ResourceIds have both
CREATE TABLE [dbo].[Resources](
[Id] [bigint] NOT NULL,
[ResourceId] [bigint] NOT NULL,
[LanguageId] [int] NOT NULL,
[Text] [nvarchar](2000) NULL,
[Path] [varchar](2000) NULL,
CONSTRAINT [PK_Resourcces] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

You could use a union with exists logic:
SELECT * FROM Resources WHERE LanguageId = 1
UNION ALL
SELECT *
FROM Resources r1
WHERE
r1.LanguageId = 0 AND
NOT EXISTS (SELECT 1 FROM Resources r2
WHERE r2.LanguageId = 1 AND r2.ResourceId = r1.ResourceId);

You can number the rows per resourceid by languageid using the row_number() window function and then just select the "first" one.
SELECT id,
resourceid,
languageid,
text,
path
FROM (SELECT id,
resourceid,
languageid,
text,
path,
row_number() OVER (PARTITION BY resourceid
ORDER BY languageid DESC) rn
FROM resources
WHERE languageid IN (0,
1)) x
WHERE rn = 1;

I had started answering before Tim beat me to it, but I'm posting my answer anyway since you wrote, and I quote:
If somebody finds something faster and simpler, I would love to see it
CREATE DATABASE TEST
GO
USE TEST
CREATE TABLE Ressources
(
RessourceId INT,
LanguageId INT
);
INSERT INTO Ressources
VALUES
(1,1),
(1,0),
(1,2),
(1,3),
(2,1),
(2,0),
(2,2),
(3,1),
(4,1),
(5,0);
WITH CTE_L1 AS (SELECT * FROM Ressources WHERE LanguageId = 1)
SELECT * FROM CTE_L1
UNION ALL
SELECT * FROM Ressources
WHERE LanguageId = 0
AND RessourceId NOT IN(SELECT RessourceId FROM CTE_L1)
Results I got:
RessourceId LanguageId
----------- -----------
1 1
2 1
3 1
4 1
5 0
(Same result if I execute @Tim Biegeleisen's query)
See which one you like best.
--> Cost of my query: 0.010132
--> Cost of Tim's query: 0.0100952
(Based on the execution plan)

Create Count On Table That References Itself

I have a table that looks like below:
The table lists countries and regions (states, provinces, counties, etc) within those countries. I need to generate a count of all the regions within all countries. As you can see, each region has a ParentID which is the ID of the country in which you can find the region.
As an example, California is in USA, so its parent ID is 1 (which is the ID of USA).
So, the results from the simple table above should be:
USA: 2 and
Canada: 1
I have tried the following:
Select all rows with ID 1 (USA) into a table
Select all rows with ID 3 (Canada) into a table
Select all rows with ParentID 1 into the USA table
Select all rows with ParentID 3 into the Canada table
Do counts on both tables
The problem with the above approach is that if a new country is added, a count will not be automatically generated.
Any ideas on making this more dynamic?

You have to join the table with itself:
select t1.ID, t1.segment, count(distinct t2.ID)
from yourTable t1
join yourTable t2
on t1.ID = t2.parentID
where t1.parentID is null
group by t1.ID, t1.segment
The where clause ensures that only "top level" rows will be displayed.
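Note that an inner join leaves out countries that have no regions at all. If those should appear with a count of 0, a left join variant can be sketched as follows, assuming the same yourTable, ID, parentID and segment names used above (the RegionCount alias is only illustrative):

```sql
-- COUNT(r.ID) ignores the NULLs produced by the left join,
-- so a country without regions is reported with a count of 0.
SELECT c.ID, c.segment, COUNT(r.ID) AS RegionCount
FROM yourTable c
LEFT JOIN yourTable r
    ON r.parentID = c.ID
WHERE c.parentID IS NULL        -- top-level rows (countries) only
GROUP BY c.ID, c.segment;
```

Because the grouping is driven by the data, a newly added country shows up in the result without any change to the query.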

Perhaps it makes sense to reformat the data, in case there are other sorts of queries that you want to make in addition to a count of countries and regions.
CREATE TABLE #CountriesRegions
(
[ID] [int] NOT NULL,
parentid [int] NULL,
segment [nvarchar](50) NULL)
insert into #CountriesRegions values (1,null,'usa'), (2,1, 'california'), (3, null, 'canada'), (4, 3, 'quebec'), (5, 1, 'NY')
select * from #CountriesRegions
Create table #Country
([ID] [int] NOT NULL
,[country_name] [nvarchar](50) NOT NULL)
Insert into #Country select ID, segment AS country_name from #CountriesRegions where parentid IS NULL
select * from #Country
Create table #Region
([ID] [int] NOT NULL
,[country_id] [int] NOT NULL
,[region_name] [nvarchar](50) NOT NULL)
Insert into #Region select ID, parentid AS country_ID, segment AS region_name from #CountriesRegions where parentid IS NOT NULL
select * from #Region
Select COUNT(*) As 'Num of Countries' from #Country
Select COUNT(*) As 'Num of Regions' from #Region

CREATE TABLE CountriesRegions
(
[ID] [int] NOT NULL,
parentid [int] NULL,
segment [nvarchar](50) NULL)
insert into CountriesRegions values (1,null,'usa'), (2,1, 'california'), (3, null, 'canada'), (4, 3, 'quebec'), (5, 1, 'NY')
select a.id, a.segment, count(*) as [Region Count]
from CountriesRegions a
left join CountriesRegions b
on a.id=b.parentid
where b.id is not null
group by a.id, a.segment

Order by and custom sorting in Microsoft SQL Server

I have a table with numeric and string values. I need to apply the custom sorting described below:
CREATE TABLE [dbo].[TEST]
(
[Tag] [nvarchar](max) NULL,
[Category] [nvarchar](max) NULL,
[LE] [nvarchar](max) NULL,
[Description] [nvarchar](max) NULL,
[Row_Id] [int] NOT NULL,
CONSTRAINT [PK_testsirius_TEST_0_Row_Id]
PRIMARY KEY CLUSTERED ([Row_Id] ASC)
)
Insert into TEST values (1,'Area','EMR','A',199)
Insert into TEST values (2,'Area','EMR','B',200)
Insert into TEST values (3,'Area','EMR','C',201)
Insert into TEST values (201,'Area','EMR','1',399)
Insert into TEST values (202,'Area','EMR','2',400)
Insert into TEST values (203,'Area','EMR','3',401)
Expected output:
select *
from TEST
order by Description asc
Output:
1
2
3
A
B
C
Current output:
C
B
A
3
2
1
Requirement :
If the sort direction is [↑] then first sort all the numeric values from smallest to the largest, then sort all the time values from oldest to newest and then sort all the text values from A to Z
If the sort direction is [↓] then first sort all the text values from Z to A, then sort all the time values from newest to oldest and then sort all the numeric values from largest to the smallest
While sorting, always place the blank cells at the bottom.

SELECT *
FROM TEST
ORDER BY CASE WHEN Description NOT LIKE '%[^0-9]%' THEN 0 ELSE 1 END,
LEN(Description),
Description

You can use isNumeric().
select *
from TEST
order by CASE WHEN isNumeric(Description) = 1 THEN Cast([Description] as int) ELSE 2147483647 END
, Description
-- for a descending order you can use the maths idea that -1 * a number maintains magnitude but reverses the order ...
select *
from TEST
order by CASE WHEN isNumeric(Description) = 1 THEN Cast([Description] as int) * -1 ELSE -2147483648 END
, Description desc
-- with the extra test case for 12345
CREATE TABLE [dbo].[TEST]
(
[Tag] [nvarchar](max) NULL,
[Category] [nvarchar](max) NULL,
[LE] [nvarchar](max) NULL,
[Description] [nvarchar](max) NULL,
[Row_Id] [int] NOT NULL,
CONSTRAINT [PK_testsirius_TEST_0_Row_Id]
PRIMARY KEY CLUSTERED ([Row_Id] ASC)
)
Insert into TEST values (1,'Area','EMR','A',199)
Insert into TEST values (2,'Area','EMR','B',200)
Insert into TEST values (3,'Area','EMR','C',201)
Insert into TEST values (201,'Area','EMR','1',399)
Insert into TEST values (202,'Area','EMR','2',400)
Insert into TEST values (203,'Area','EMR','3',401)
Insert into TEST values (204,'Area','EMR','12345',402)
select *
from TEST
order by CASE WHEN isNumeric(Description) = 1 THEN Cast([Description] as int) ELSE 2147483647 END
, Description
-- example output
Tag Category LE Description Row_Id
201 Area EMR 1 399
202 Area EMR 2 400
203 Area EMR 3 401
204 Area EMR 12345 402
1 Area EMR A 199
2 Area EMR B 200
3 Area EMR C 201
-- descending order
select *
from TEST
order by CASE WHEN isNumeric(Description) = 1 THEN Cast([Description] as int) * -1 ELSE -2147483648 END
, Description desc
-- example output
Tag Category LE Description Row_Id
3 Area EMR C 201
2 Area EMR B 200
1 Area EMR A 199
204 Area EMR 12345 402
203 Area EMR 3 401
202 Area EMR 2 400
201 Area EMR 1 399
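One caveat: ISNUMERIC also returns 1 for strings such as '$' or '1e5', which then fail the cast to int. On SQL Server 2012 or later, TRY_CAST avoids that. The following is a rough sketch under that version assumption, using the TEST table from the question. It also pushes NULL or empty descriptions to the bottom, and it does not attempt the time values mentioned in the requirement.

```sql
-- Ascending: numbers first (by value), then text (A to Z), blanks last.
-- Assumes SQL Server 2012+ for TRY_CAST.
SELECT *
FROM TEST
ORDER BY
    CASE
        WHEN Description IS NULL
          OR LTRIM(RTRIM(Description)) = ''              THEN 2  -- blank cells at the bottom
        WHEN TRY_CAST(Description AS int) IS NOT NULL    THEN 0  -- values that convert to int
        ELSE 1                                                   -- everything else is text
    END,
    TRY_CAST(Description AS int),   -- numeric order inside the number group
    Description;                    -- alphabetical order inside the text group
```

The blank check comes first on purpose, because an empty string converts to 0 and would otherwise land in the number group. Digit strings too large for int fall into the text group in this sketch.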

Have you tried this:
select *
from TEST
order by case when Description not like '%[0-9]%' then 1 else 0 end, Description

Remove duplicate row and update next row to current row and continue

I need a select query.
Environment: SQL DBA, SQL Server 2005 or newer
Example:
In this sample table, if I select the top 20, no duplicate records should appear, and the next record should fill the gap so that there are still 20 records.
Example:
123456 should not repeat within the 20 records. If the 18th record is a duplicate, the 19th should take its place, the 20th should become the 19th, and the 21st should become the 20th.
The row order (ascending or descending) does not matter.
Lookup Table before
Id Name
123456 hello
123456 hello
123654 hi
123655 yes
LookUp Table after
Id Name
123456 hello
123654 hi
123655 yes
My table:
CREATE TABLE [dbo].[test](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ContestId] [int] NOT NULL,
[PrizeId] [int] NOT NULL,
[ContestParticipantId] [int] NOT NULL,
[SubsidiaryAnswer] [varchar](256) NOT NULL,
[SubsidiaryDifference] [bigint] NOT NULL,
[AttemptTime] [datetime] NOT NULL,
[ParticipantName] [varchar](250) NOT NULL,
[IsSubscribed] [bit] NOT NULL,
[IsNewlyRegistered] [bit] NOT NULL,
[IsWinner] [bit] NOT NULL,
[IsWinnerConfirmed] [bit] NOT NULL,
[IsWinnerExcluded] [bit] NOT NULL) ON [PRIMARY]
My question is: from this select, we actually need the first 20, but unique ones.
SELECT TOP 20 * FROM test order by SubsidiaryDifference
When we run the above query, we currently get some duplicates. When there is a duplicate, we need to take it only once and then take the next row instead.
Does anyone know how to solve this?
Thanks in advance :)

Reading your question, it appears you don't really want to delete the rows from the table; you just want to display the TOP 20 distinct rows. Try something like this:
;WITH LastPerContestParticipantId AS
(
SELECT
ContestParticipantId,
SubsidiaryDifference,
-- add whatever other columns you want to select here
ROW_NUMBER() OVER(PARTITION BY ContestParticipantId
ORDER BY SubsidiaryDifference) AS RowNum
FROM dbo.Test
)
SELECT TOP (20)
ContestParticipantId,
-- add whatever other columns you want to select here
SubsidiaryDifference
FROM
LastPerContestParticipantId
WHERE
RowNum = 1
ORDER BY
SubsidiaryDifference
This will show you the row with the lowest SubsidiaryDifference for each distinct ContestParticipantId, ordered by SubsidiaryDifference. Try it!
Update #2: I've created a quick sample - it uses the data from your original post - plus an additional SubID column so that I can order rows of the same ID by something...
When I run this with my CTE query, I do get only one entry for each ID - so what exactly is "not working" for you?
DECLARE @test TABLE (ID INT, EntryName VARCHAR(50), SubID INT)
INSERT INTO @test
VALUES(123456, 'hello', 1), (123456, 'hello', 2), (123654, 'hi', 1), (123655, 'yes', 3)
;WITH LastPerId AS
(
SELECT
ID, EntryName,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SubID DESC) AS 'RowNum'
FROM @test
)
SELECT TOP (3)
ID, EntryName
FROM
LastPerId
WHERE
RowNum = 1
Gives an output of:
ID EntryName
123456 hello
123654 hi
123655 yes
No duplicates.

rows that exclude each other in create table statement

I have to create a table, with either first name and last name of a person, or a name of an organization. There has to be exactly one of them. For example one row of the table is -
first_name last_name organization
---------- --------- ------------
John Smith null
or another row can be -
first_name last_name organization
---------- --------- --------------------
null null HappyStrawberry inc.
Is there a way to define this in SQL language? Or should I just define all three columns being able to get null values?

Your situation is a classical example of what some ER dialects call "entity subtyping".
You have an entity called "Person" (or "Party" or something of that ilk), and you have two distinct sub-entities called "NaturalPerson" and "LegalPerson", respectively.
The canonical way to model ER entity subtypes in a relational database is using three tables : one for the "Person" entity with all columns that are "common" for both NaturalPerson and LegalPerson (i.e. that exist for Persons, regardless of their type), and one per identified sub-entity holding all the columns that pertain to that sub-entity in particular.
You can read more on this in Fabian Pascal, "Practical Issues in Database Management".
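As a rough illustration of that three-table layout, here is a minimal sketch. All table, column and constraint names are hypothetical, and the person_type discriminator is one common way (not the only one) to stop a Person from being both subtypes at once.

```sql
-- Supertype: one row per person of either kind.
CREATE TABLE Person (
    person_id   int     NOT NULL,
    person_type char(1) NOT NULL,   -- 'N' = natural person, 'L' = legal person
    CONSTRAINT PK_Person PRIMARY KEY (person_id),
    CONSTRAINT UQ_Person_Type UNIQUE (person_id, person_type),
    CONSTRAINT CK_Person_Type CHECK (person_type IN ('N', 'L'))
);

-- Subtype for people: first and last name are both required.
CREATE TABLE NaturalPerson (
    person_id   int         NOT NULL,
    person_type char(1)     NOT NULL,
    first_name  varchar(50) NOT NULL,
    last_name   varchar(50) NOT NULL,
    CONSTRAINT PK_NaturalPerson PRIMARY KEY (person_id),
    CONSTRAINT CK_NaturalPerson_Type CHECK (person_type = 'N'),
    CONSTRAINT FK_NaturalPerson_Person FOREIGN KEY (person_id, person_type)
        REFERENCES Person (person_id, person_type)
);

-- Subtype for organizations: only the organization name is stored.
CREATE TABLE LegalPerson (
    person_id    int         NOT NULL,
    person_type  char(1)     NOT NULL,
    organization varchar(50) NOT NULL,
    CONSTRAINT PK_LegalPerson PRIMARY KEY (person_id),
    CONSTRAINT CK_LegalPerson_Type CHECK (person_type = 'L'),
    CONSTRAINT FK_LegalPerson_Person FOREIGN KEY (person_id, person_type)
        REFERENCES Person (person_id, person_type)
);
```

With this shape no column needs to be nullable. The sketch does not force every Person row to have a matching subtype row; that part would have to be handled by the application or a trigger.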

You could use a check constraint, like:
create table YourTable (
col1 varchar(50)
, col2 varchar(50)
, col3 varchar(50)
, constraint TheConstraint check ( 1 =
case when col1 is null then 1 else 0 end +
case when col2 is null then 1 else 0 end +
case when col3 is null then 1 else 0 end )
)
Another way is to add a type column (EAV method):
create table YourTable (
type varchar(20) check (type in ('FirstName', 'LastName', 'Organization'))
, value varchar(50))
insert YourTable values ('LastName', 'Obama')
insert YourTable values ('FirstName', 'Barack')
insert YourTable values ('Organization', 'White House')

You can do this using a constraint:
CREATE TABLE [dbo].[Contact](
[first_name] [varchar](50) NULL,
[last_name] [varchar](50) NULL,
[organization] [varchar](50) NULL,
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Contact] WITH CHECK ADD CONSTRAINT [CK_Contact] CHECK (([first_name] IS NOT NULL OR [last_name] IS NOT NULL OR [organization] IS NOT NULL))
GO
ALTER TABLE [dbo].[Contact] CHECK CONSTRAINT [CK_Contact]
GO
The CK_Contact constraint ensures that at least one value was entered.
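If the rule from the question has to be enforced exactly (either both name columns or the organization, never a mix), the check can be made stricter. A minimal sketch, assuming the Contact table above and a hypothetical constraint name:

```sql
-- Allow only two shapes of row:
--   a person       (first_name and last_name set, organization NULL)
--   an organization (organization set, first_name and last_name NULL)
ALTER TABLE [dbo].[Contact] WITH CHECK ADD CONSTRAINT [CK_Contact_PersonOrOrganization] CHECK (
    ([first_name] IS NOT NULL AND [last_name] IS NOT NULL AND [organization] IS NULL)
    OR
    ([first_name] IS NULL AND [last_name] IS NULL AND [organization] IS NOT NULL)
);
```

A row such as ('John', NULL, NULL) or ('John', 'Smith', 'HappyStrawberry inc.') would be rejected by this constraint, while both example rows from the question pass.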
