SQL Ignore duplicate primary keys - sql

Imagine you have a string of results from a SELECT statement:
ID (pk) Name Address
1 a b
1 c d
1 e f
2 a b
3 a d
2 a d
Is it possible to alter the SQL statement to get one record ONLY for the record with ID 1?
I have a SELECT statement that returns multiple rows which can share the same primary key. If, say, I have 5 records with the same primary key, I want to take only one of them.
SQL: http://pastebin.com/cFCBA2Uy
Screenshot: http://i.imgur.com/UlMBZhC.png
What I want is to show only one record for each file, e.g. File Number: 925, 890

You said it does not matter which row is returned when several rows share the same id; you just want one row per id.
The following query does that:
DECLARE @T table
(
    id int,
    name varchar(50),
    address varchar(50)
)
INSERT INTO @T VALUES
(1, 'a', 'b'),
(1, 'c', 'd'),
(1, 'e', 'f'),
(2, 'a', 'b'),
(3, 'a', 'd'),
(2, 'a', 'd');
-- Number the rows within each id, then keep only the first one.
WITH A AS
(
    SELECT
        t.id, t.name, t.address,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY (SELECT NULL)) AS RowNumber
    FROM
        @T t
)
SELECT
    A.id, A.name, A.address
FROM
    A
WHERE
    A.RowNumber = 1
But I think there should be a criterion for which row to keep. If you have one, express it as the ORDER BY inside the OVER clause.
EDIT:
Here you have the result:
+----+------+---------+
| id | name | address |
+----+------+---------+
| 1 | a | b |
| 2 | a | b |
| 3 | a | d |
+----+------+---------+
Disclaimer: the query I wrote is non-deterministic; different conditions (indexes, statistics, etc.) might lead to different results.
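To illustrate the point about putting a real criterion in the ORDER BY, here is a minimal sketch that runs the same ROW_NUMBER pattern against SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (window function support), and the tie-break rule "lowest name, then lowest address" is a made-up example, not something the question specified.

```python
import sqlite3

# SQLite is used here only as a portable stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "a", "b"), (1, "c", "d"), (1, "e", "f"),
     (2, "a", "b"), (3, "a", "d"), (2, "a", "d")],
)

# Hypothetical criterion: within each id keep the row with the lowest
# (name, address). Any ordering that is unique within an id makes the
# result repeatable.
rows = conn.execute("""
    WITH A AS (
        SELECT id, name, address,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name, address) AS RowNumber
        FROM T
    )
    SELECT id, name, address
    FROM A
    WHERE RowNumber = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

With this ordering the query keeps the same row for each id on every run, regardless of indexes or statistics.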

Related

SQL Server query for multiple conditions on the same column

Here's the schema and data that I am working with:
CREATE TABLE tbl (
name varchar(20) not null,
groups int NOT NULL
);
insert into tbl values('a', 35);
insert into tbl values('a', 36);
insert into tbl values('b', 35);
insert into tbl values('c', 36);
insert into tbl values('d', 37);
| name | groups|
|------|-------|
| a | 35 |
| a | 36 |
| b | 35 |
| c | 36 |
| d | 37 |
Now I need the names of only those rows whose groups value is greater than or equal to 35,
with one additional condition: a row with groups = 35 may be included only when a corresponding row with groups = 36 is also present for the same name.
| name | groups|
|------|-------|
| a | 35 |
| a | 36 |
The second condition is that it CAN include names that have groups greater than or equal to 36 without having a groups = 35 row:
| name | groups|
|------|-------|
| c | 36 |
| d | 37 |
The only case it should leave out is where a name has only groups = 35 present, without a corresponding groups = 36:
| name | groups|
|------|-------|
| b | 35 |
I have tried the following:
select name from tbl
where groups>=35
group by name
having count(distinct(groups))>=2
or groups>=36;
This is the error I am getting: Column 'tbl.groups' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
Try this:
DECLARE @tbl table ( [name] varchar(20) not null, groups int NOT NULL );
INSERT INTO @tbl VALUES
('a', 35), ('a', 36), ('b', 35), ('c', 36), ('d', 37);
DECLARE @group int = 35;
; WITH cte AS (
    SELECT
        [name]
        , COUNT ( DISTINCT groups ) AS distinct_group_count
    FROM @tbl
    WHERE
        groups >= @group
    GROUP BY
        [name]
)
SELECT t.* FROM @tbl AS t
INNER JOIN cte
    ON t.[name] = cte.[name]
WHERE
    cte.distinct_group_count > 1
    OR t.groups > @group;
RETURNS
+------+--------+
| name | groups |
+------+--------+
| a | 35 |
| a | 36 |
| c | 36 |
| d | 37 |
+------+--------+
Basically, this restricts the results to names that have more than one distinct groups value >= 35, or to any row with a groups value greater than 35. Several assumptions were made about your data, but I believe the logic still applies.
So, as far as I can tell, you just want to exclude names where groups = 35 appears by itself. My approach was to isolate the names that have only a groups = 35 row and then filter them out with NOT EXISTS. Is this the output you're after?
Also, complicated ORs in the WHERE clause will often make your query non-SARGable. It is usually better to use a UNION, or otherwise build the query so that each part can use an index (if one is available).
if object_id('tempdb..#tbl') is not null drop table #tbl;
CREATE TABLE #tbl (
name varchar(20) not null,
groups int NOT NULL
);
insert into #tbl values('a', 35), ('a', 36), ('b', 35), ('c', 36), ('d', 37);
select *
from #tbl tbl
WHERE NOT EXISTS
(
SELECT COUNT(groups), name
FROM #tbl t
WHERE EXISTS
(
SELECT name
FROM #tbl tb
WHERE groups = 35
and tb.name=t.name
)
AND t.name = tbl.name
GROUP BY name
HAVING COUNT(groups)=1
)
;
It looks like you need an exists() condition. Try:
select *
from tbl t
where t.groups >= 35
and (
t.groups > 35
or exists(select * from tbl t2 where t2.name = t.name and t2.groups = 36)
)
There are other ways to arrange the where clause to achieve the same effect. Having the t.groups >= 35 condition up front should give the query optimizer the ability to leverage an index on groups.
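As a quick check of the EXISTS condition above, the sketch below runs it against the question's sample data in SQLite via Python's sqlite3 module. SQLite is only a stand-in here (the question is about SQL Server), and the column name is double-quoted as a precaution because GROUPS is a keyword in recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "groups" is quoted defensively; GROUPS is a window-frame keyword in SQLite.
conn.execute('CREATE TABLE tbl (name TEXT NOT NULL, "groups" INTEGER NOT NULL)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("a", 35), ("a", 36), ("b", 35), ("c", 36), ("d", 37)],
)

# Keep every row above 35, and keep a 35 row only when the same name
# also has a 36 row.
rows = conn.execute("""
    SELECT t.name, t."groups"
    FROM tbl t
    WHERE t."groups" >= 35
      AND (
            t."groups" > 35
         OR EXISTS (SELECT 1 FROM tbl t2
                    WHERE t2.name = t.name AND t2."groups" = 36)
      )
    ORDER BY t.name, t."groups"
""").fetchall()

for row in rows:
    print(row)
```

The lone ('b', 35) row is the only one filtered out, which matches the output the question asks for.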
You can use a windowed count for this.
This avoids joining the table multiple times:
SELECT
name,
groups
FROM (
SELECT *,
Count36 = COUNT(CASE WHEN groups = 36 THEN 1 END) OVER (PARTITION BY name)
FROM tbl
WHERE groups >= 35
) tbl
WHERE groups >= 36 OR Count36 > 0;

SQLite query - filter name where each associated id is contained within a set of ids

I'm trying to work out a query that will find me all of the distinct Names whose LocationIDs are in a given set of ids. The catch is if any of the LocationIDs associated with a distinct Name are not in the set, then the Name should not be in the results.
Say I have the following table:
ID | LocationID | ... | Name
-----------------------------
1 | 1 | ... | A
2 | 1 | ... | B
3 | 2 | ... | B
I need a query similar to
SELECT DISTINCT Name FROM table WHERE LocationID IN (1, 2);
The problem with the above is that it only checks whether the LocationID is 1 OR 2, so it returns the following:
A
B
But what I need it to return is
B
Since B is the only Name where both of its LocationIDs are in the set (1, 2)
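You can write two subqueries:
one that gets the count of matching LocationIDs for each Name,
and one that gets the count of LocationIDs matching your condition overall.
Then join them on the count, which means a Name is returned only when it matches every LocationID in your condition.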
Schema (SQLite v3.17)
CREATE TABLE T(
ID int,
LocationID int,
Name varchar(5)
);
INSERT INTO T VALUES (1, 1,'A');
INSERT INTO T VALUES (2, 1,'B');
INSERT INTO T VALUES (3, 2,'B');
Query #1
SELECT t2.Name
FROM
(
SELECT COUNT(DISTINCT LocationID) cnt
FROM T
WHERE LocationID IN (1, 2)
) t1
JOIN
(
SELECT COUNT(DISTINCT LocationID) cnt,Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
) t2 on t1.cnt = t2.cnt;
| Name |
| ---- |
| B |
You can just use aggregation. Assuming no duplicates in your table:
SELECT Name
FROM table
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2;
If Name/LocationID pairs can be duplicated, use HAVING COUNT(DISTINCT LocationID) = 2.
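Here is a small runnable sketch of the aggregation approach, using Python's sqlite3 module. The rows for Name 'C' are hypothetical additions (not in the question) so that the difference between "covers every id in the set" and "has no id outside the set" becomes visible. The second query is my own stricter variant, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER, LocationID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "B"), (3, 2, "B"),
     # Hypothetical extra rows: C covers the set but also has LocationID 3.
     (4, 1, "C"), (5, 2, "C"), (6, 3, "C")],
)

wanted = (1, 2)
marks = ", ".join("?" for _ in wanted)  # one placeholder per id in the set

# Names that have every LocationID in the set.
covers_all = conn.execute(f"""
    SELECT Name
    FROM T
    WHERE LocationID IN ({marks})
    GROUP BY Name
    HAVING COUNT(DISTINCT LocationID) = ?
    ORDER BY Name
""", (*wanted, len(wanted))).fetchall()

# Stricter variant (an assumption about the intent): also reject any Name
# that has a LocationID outside the set.
only_in_set = conn.execute(f"""
    SELECT Name
    FROM T
    GROUP BY Name
    HAVING COUNT(DISTINCT CASE WHEN LocationID IN ({marks}) THEN LocationID END) = ?
       AND SUM(LocationID NOT IN ({marks})) = 0
    ORDER BY Name
""", (*wanted, len(wanted), *wanted)).fetchall()

print(covers_all)
print(only_in_set)
```

Which of the two readings is right depends on whether a Name with an extra, out-of-set LocationID should still qualify; the question's sample data does not distinguish them.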

Select rows with highest value in one column in SQL

In MySQL, I am trying to select one row for each "foreign_id". It must be the row with the highest value in the "time" column (which is of type DATETIME). Can you help me with what the SQL SELECT statement should look like? Thank you!
This would be really great! I have been trying for hours to find a solution :(
This is my table:
primary_id | foreign_id | name | time
----------------------------------------------------
1 | 3 | a | 2017-05-18 01:02:03
2 | 3 | b | 2017-05-19 01:02:03
3 | 3 | c | 2017-05-20 01:02:03
4 | 5 | d | 2017-07-18 01:02:03
5 | 5 | e | 2017-07-20 01:02:03
6 | 5 | f | 2017-07-18 01:02:03
And this is what the result should look like:
primary_id | foreign_id | name | time
----------------------------------------------------
3 | 3 | c | 2017-05-20 01:02:03
5 | 5 | e | 2017-07-20 01:02:03
I tried to order the intermediate result by time (descending) and then to select only the first row by using LIMIT 1. But like this I cannot get one row for each foreign_id.
Another try was to first order the intermediate result by time (descending) and then to GROUP BY foreign_id. But the GROUP BY statement seems to be executed before the ORDER BY statement (I received the rows with primary_id 1 and 4 as a result, not 3 and 5).
Try this
SELECT DISTINCT *
From my_table A
INNER JOIN (SELECT foreign_id, Max(time) AS time FROM my_table GROUP BY foreign_id) B
ON A.foreign_id = B.foreign_id AND A.time = B.time
I added one row to the sample data to cover a special case (two rows with the same foreign_id and the same maximum time):
CREATE TABLE Table1
(`primary_id` int, `foreign_id` int, `name` varchar(1), `time` datetime)
;
INSERT INTO Table1
(`primary_id`, `foreign_id`, `name`, `time`)
VALUES
(1, 3, 'a', '2017-05-18 01:02:03'),
(2, 3, 'b', '2017-05-19 01:02:03'),
(3, 3, 'c', '2017-05-20 01:02:03'),
(7, 3, 'H', '2017-05-20 01:02:03'),
(4, 5, 'd', '2017-07-18 01:02:03'),
(5, 5, 'e', '2017-07-20 01:02:03'),
(6, 5, 'f', '2017-07-18 01:02:03')
;
http://sqlfiddle.com/#!9/38947b/6
select d.primary_id, d.foreign_id, c.name, d.time
from table1 c inner join (
select max(b.primary_id) primary_id, a.foreign_id, a.time
from table1 b inner join
( select foreign_id, max(time) time
from table1
group by foreign_id) a
on a.foreign_id = b.foreign_id and a.time=b.time
group by a.foreign_id, a.time ) d
on c.primary_id=d.primary_id
In days gone by you would code this as a correlated subquery:
SELECT *
FROM Table1 o
WHERE primary_id = (
    SELECT MIN(m.primary_id) FROM Table1 m
    WHERE m.foreign_id = o.foreign_id
      AND m.time = (
        SELECT MAX(i.time) FROM Table1 i
        WHERE o.foreign_id = i.foreign_id
    )
)
The extra subquery handles the case of duplicate foreign_id & time values. If you were sure that time was unique for each foreign_id you could omit the middle subquery.
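For comparison, here is a sketch of the same "latest row per foreign_id" selection written with ROW_NUMBER. It is run against SQLite through Python's sqlite3 module purely so it can be executed anywhere; in MySQL the same window-function syntax needs version 8.0 or later. The tie-break on the lowest primary_id is an assumption, chosen to match the correlated subquery above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (primary_id INTEGER, foreign_id INTEGER, name TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, 3, "a", "2017-05-18 01:02:03"),
     (2, 3, "b", "2017-05-19 01:02:03"),
     (3, 3, "c", "2017-05-20 01:02:03"),
     (7, 3, "H", "2017-05-20 01:02:03"),  # tie with primary_id 3
     (4, 5, "d", "2017-07-18 01:02:03"),
     (5, 5, "e", "2017-07-20 01:02:03"),
     (6, 5, "f", "2017-07-18 01:02:03")],
)

# Rank rows inside each foreign_id by newest time first; ties on time are
# broken by the lowest primary_id (an assumed rule).
latest = conn.execute("""
    SELECT primary_id, foreign_id, name, time
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY foreign_id
                   ORDER BY time DESC, primary_id ASC
               ) AS rn
        FROM Table1 t
    )
    WHERE rn = 1
    ORDER BY foreign_id
""").fetchall()

for row in latest:
    print(row)
```

The timestamps are stored as ISO-formatted text here, which sorts correctly as a string; with a real DATETIME column the ORDER BY works the same way.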

Create a table with unique values from another table

I am using MS SQL Server Management Studio. I have this table:
+--------+----------+
| Num_ID | Alpha_ID |
+--------+----------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | B |
| 2 | C |
| 3 | A |
| 4 | C |
| 5 | A |
| 5 | B |
+--------+----------+
I want to create another table with two columns from this table, so that column_1 gives the unique values in Num_ID (i.e. 1, 2, 3, 4 and so on) and column_2 gives unique values from Alpha_ID (A, B, C and so on).
But if a letter has already occurred, it should not occur again. So the output will be something like this:
Col_1 Col_2
================
1 - A
----------------
2 - B
----------------
3 - NULL (as A has been chosen by 1, it cannot occur next to 3)
----------------
4 - C
----------------
5 - NULL (both 5 A and 5 B cannot be chosen as A and B were picked up by 1 and 2)
----------------
Hope that makes sense.
I would like to clarify that the IDs in the input table are not numerical as I have shown, but both Num_ID and Alpha_ID are complex strings. I have simplified them to 1,2,3,... and A, B, C .... for the purpose of this question
I don't think this can be done without a cursor.
I added a few more rows to your sample data to test how it works with other cases.
The logic is straightforward. First, get a list of all distinct values of Num_ID. Then loop through them, and on each iteration add one row to the destination table. To determine the Alpha_ID value to add, I use the EXCEPT operator: it takes all available Alpha_ID values for the current Num_ID from the source table and removes all values that have been used before.
It is possible to write that INSERT without the explicit variable @CurrAlphaID, but it looks a bit cleaner with the variable.
DECLARE @TSrc TABLE (Num_ID varchar(10), Alpha_ID varchar(10));
INSERT INTO @TSrc (Num_ID, Alpha_ID) VALUES
('1', 'A'),
('1', 'B'),
('1', 'C'),
('2', 'B'),
('2', 'C'),
('3', 'A'),
('3', 'C'),
('4', 'A'),
('4', 'C'),
('5', 'A'),
('5', 'B'),
('5', 'C'),
('6', 'D'),
('6', 'E');
DECLARE @TDst TABLE (Num_ID varchar(10), Alpha_ID varchar(10));
DECLARE @CurrNumID varchar(10);
DECLARE @CurrAlphaID varchar(10);
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
    SELECT DISTINCT Num_ID
    FROM @TSrc
    ORDER BY Num_ID;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrNumID;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
    SET @CurrAlphaID =
    (
        SELECT TOP(1) Diff.Alpha_ID
        FROM
        (
            SELECT Src.Alpha_ID
            FROM @TSrc AS Src
            WHERE Src.Num_ID = @CurrNumID
            EXCEPT
            SELECT Dst.Alpha_ID
            FROM @TDst AS Dst
        ) AS Diff
        ORDER BY Diff.Alpha_ID
    );
    INSERT INTO @TDst (Num_ID, Alpha_ID)
    VALUES (@CurrNumID, @CurrAlphaID);
    FETCH NEXT FROM @VarCursor INTO @CurrNumID;
    SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT * FROM @TDst;
Result
Num_ID Alpha_ID
1 A
2 B
3 C
4 NULL
5 NULL
6 D
An index on (Num_ID, Alpha_ID) on the source table would help. An index on (Alpha_ID) on the destination table would help as well.
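The loop in the cursor answer is a greedy assignment, and it may be easier to follow outside SQL. The sketch below re-states the same logic in plain Python over the answer's extended sample data; the variable names are mine, and it is only an illustration of the algorithm, not a replacement for the T-SQL.

```python
# Source rows as (Num_ID, Alpha_ID), same as the extended sample data above.
source_rows = [
    ("1", "A"), ("1", "B"), ("1", "C"),
    ("2", "B"), ("2", "C"),
    ("3", "A"), ("3", "C"),
    ("4", "A"), ("4", "C"),
    ("5", "A"), ("5", "B"), ("5", "C"),
    ("6", "D"), ("6", "E"),
]

# Collect the candidate Alpha_ID values for each Num_ID.
candidates = {}
for num_id, alpha_id in source_rows:
    candidates.setdefault(num_id, set()).add(alpha_id)

used = set()    # Alpha_ID values already handed out (the role of the destination table)
result = []     # (Num_ID, Alpha_ID or None) pairs, in Num_ID order

for num_id in sorted(candidates):
    # Set difference plays the role of EXCEPT; sorted()[0] plays the role
    # of TOP(1) ... ORDER BY Alpha_ID.
    free = sorted(candidates[num_id] - used)
    pick = free[0] if free else None
    result.append((num_id, pick))
    if pick is not None:
        used.add(pick)

for row in result:
    print(row)
```

Because each choice depends on all earlier choices, the result is order-dependent: processing the Num_ID values in a different order can hand out the letters differently.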
I think I've come up with something that doesn't need iteration (a cursor or a WHILE loop).
First, I created a table with rows.
create table #tmptest
(
Num_ID int
, Alpha_ID varchar(50)
)
insert into #tmptest (Num_ID, Alpha_ID) values
(1,'A'),
(1,'B'),
(1,'C'),
(2,'B'),
(2,'C'),
(3,'A'),
(4,'C'),
(5,'A'),
(5,'B')
-- this one, with a row column
SELECT
ROW_NUMBER() OVER (PARTITION BY Num_ID ORDER BY Num_ID ASC) as row
, *
INTO #tmp_withrow
FROM #tmptest
and these were the results
Lastly, I made an inner query (could possibly be a left join or better).
SELECT DISTINCT
Num_ID
, (
SELECT
TOP 1
Alpha_ID
FROM #tmp_withrow in1
WHERE
in1.Num_ID = t.Num_ID
AND in1.Alpha_ID NOT IN (
SELECT
Alpha_ID
FROM #tmp_withrow in2
WHERE
in2.Num_ID < in1.Num_ID
AND in2.row = 1
)
ORDER BY in1.Num_ID ASC
) AS [NonRepeatingAlpha]
from #tmptest t
and these were the results
Note: I created a flag (row) which lets you look at all the IDs lower than the one you're on (in2.Num_ID < in1.Num_ID), find out which letters were already used (in2.row = 1), and then avoid every letter that has already been used by another Num_ID:
WHERE in1.Num_ID = t.Num_ID
AND in1.Alpha_ID NOT IN (
SELECT
Alpha_ID
FROM #tmp_withrow in2
WHERE
in2.Num_ID < in1.Num_ID
AND in2.row = 1
)
I hope this helps. Thanks!

CONCAT(column) OVER(PARTITION BY...)? Group-concatenating rows without grouping the result itself

I need a way to make a concatenation of all rows (per group) in a kind of window function like how you can do COUNT(*) OVER(PARTITION BY...) and the aggregate count of all rows per group will repeat across each particular group. I need something similar but a string concatenation of all values per group repeated across each group.
Here is some example data and my desired result to better illustrate my problem:
grp | val
------------
1 | a
1 | b
1 | c
1 | d
2 | x
2 | y
2 | z
And here is what I need (the desired result):
grp | val | groupcnct
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
Here is the really tricky part of this problem:
My particular situation prevents me from being able to reference the same table twice (I'm actually doing this within a recursive CTE, so I can't do a self-join of the CTE or it will throw an error).
I'm fully aware that one can do something like:
SELECT a.*, b.groupcnct
FROM tbl a
CROSS APPLY (
SELECT STUFF((
SELECT '' + aa.val
FROM tbl aa
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '') AS groupcnct
) b
But as you can see, that is referencing tbl two times in the query.
I can only reference tbl once, hence why I'm wondering if windowing the group-concatenation is possible (I'm a bit new to TSQL since I come from a MySQL background, so not sure if something like that can be done).
Create Table:
CREATE TABLE tbl
(grp int, val varchar(1));
INSERT INTO tbl
(grp, val)
VALUES
(1, 'a'),
(1, 'b'),
(1, 'c'),
(1, 'd'),
(2, 'x'),
(2, 'y'),
(2, 'z');
In SQL Server 2017 and later you can use the STRING_AGG function:
SELECT STRING_AGG(T.val, ',') AS val
, T.grp
FROM #tbl AS T
GROUP BY T.grp
I tried a pure CTE approach (see "Which is the best way to form the string value using column from a Table with rows having same ID?"), thinking it would be faster.
But the benchmark says otherwise: it is better to use a subquery (or CROSS APPLY) over the results of XML PATH, as that is faster (same question: "Which is the best way to form the string value using column from a Table with rows having same ID?").
DECLARE @tbl TABLE
(
    grp INT
    ,val VARCHAR(1)
);
BEGIN
    INSERT INTO @tbl(grp, val)
    VALUES
    (1, 'a'),
    (1, 'b'),
    (1, 'c'),
    (1, 'd'),
    (2, 'x'),
    (2, 'y'),
    (2, 'z');
END;
----------- Your Required Query
SELECT ST2.grp,
    SUBSTRING(
        (
            SELECT ','+ST1.val AS [text()]
            FROM @tbl ST1
            WHERE ST1.grp = ST2.grp
            ORDER BY ST1.grp
            For XML PATH ('')
        ), 2, 1000
    ) groupcnct
FROM @tbl ST2
Is it possible for you to just put your STUFF expression in the SELECT list instead, or do you run into the same issue? (I replaced 'tbl' with 'TEMP.TEMP123'.)
Select
A.*
, [GROUPCNT] = STUFF((
SELECT '' + aa.val
FROM TEMP.TEMP123 AA
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '')
from TEMP.TEMP123 A
This worked for me -- wanted to see if this worked for you too.
I know this post is old, but in case someone is still wondering, you can create a scalar function that concatenates row values.
IF OBJECT_ID('dbo.fnConcatRowsPerGroup','FN') IS NOT NULL
    DROP FUNCTION dbo.fnConcatRowsPerGroup
GO
CREATE FUNCTION dbo.fnConcatRowsPerGroup
(@grp as int) RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @val AS VARCHAR(MAX)
    SELECT @val = COALESCE(@val,'')+val
    FROM tbl
    WHERE grp = @grp
    RETURN @val;
END
GO
select *, dbo.fnConcatRowsPerGroup(grp)
from tbl
Here is the result set I got from querying a sample table:
grp | val | (No column name)
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
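For readers who are not tied to T-SQL, the shape the question asks for (a concatenation repeated on every row of the group, with the table referenced once) can be sketched in SQLite, where group_concat may be used as a window function. This assumes SQLite 3.25 or later and is run here through Python's sqlite3 module; it does not show that the same syntax is available in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (grp INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
     (2, "x"), (2, "y"), (2, "z")],
)

# The frame spans the whole partition, so every row of a group sees the
# concatenation of all values in that group. tbl is referenced only once.
rows = conn.execute("""
    SELECT grp, val,
           group_concat(val, '') OVER (
               PARTITION BY grp
               ORDER BY val
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS groupcnct
    FROM tbl
    ORDER BY grp, val
""").fetchall()

for row in rows:
    print(row)
```

Each row keeps its own val while groupcnct carries all values of its group, which is the layout of the desired result table in the question.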