Alternative way of using DISTINCT in SQL Server 2008


I have a table with millions of rows of data in SQL Server 2008. I am looking for an alternative to DISTINCT. Please see the query below:
create table #temp (id int)
create table #temp2 (id int, name varchar(55), t_id int)
insert into #temp values (1)
insert into #temp2 values (1,'john',1)
insert into #temp2 values (2,'alex',1)
insert into #temp2 values (3,'alex',1)
select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
This query returns output like:
Id Name
1 john
1 alex
1 alex
The expected output is:
Id Name
1 john
1 alex
I know I can get the expected output with the DISTINCT keyword, but it hurts performance. Could you suggest alternative ways (other than GROUP BY) to handle this? Thanks!
Edit:
I have a custom concatenate function that lets me do:
select t.id, concetenate(t2.name)
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
and this returns 1 john,alex,alex. I am looking for a way to get rid of one of the alex values without changing the function and without using the DISTINCT keyword.

Use GROUP BY
select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
GROUP BY t.id, t2.name
Use a CTE and ROW_NUMBER and your custom "concetenate" function
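;WITH cte
AS(
-- Number the rows within each (id, name) pair so that duplicates get RN > 1
select t.id, t2.name, RN=ROW_NUMBER()OVER(PARTITION BY t.id, t2.name ORDER BY t2.id)
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
)
-- Keep only the first row of each pair before concatenating
SELECT C.id
, Name =concetenate(C.name)
FROM cte C WHERE C.RN = 1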

You can use GROUP BY as shown below. But why are you inserting duplicates into your table in the first place?
create table #temp (id int)
create table #temp2 (id int, name varchar(55), t_id int)
insert into #temp values (1)
insert into #temp2 values (1,'john',1)
insert into #temp2 values (2,'alex',1)
insert into #temp2 values (3,'alex',1)
select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
GROUP BY t.id, t2.name
Another solution is to create a unique constraint so that duplicate values cannot be inserted in the first place.
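A minimal sketch of the constraint idea, assuming the data lives in a permanent table. The table name dbo.person_name and the constraint name are hypothetical, and the assumption that (t_id, name) should be unique comes from the sample data, not from the question:

```sql
-- Hypothetical permanent table mirroring #temp2 from the question.
CREATE TABLE dbo.person_name (
    id   INT         NOT NULL PRIMARY KEY,
    name VARCHAR(55) NOT NULL,
    t_id INT         NOT NULL,
    -- Assumption: a name may appear only once per t_id.
    CONSTRAINT UQ_person_name_t_id_name UNIQUE (t_id, name)
);

INSERT INTO dbo.person_name (id, name, t_id) VALUES (1, 'john', 1);
INSERT INTO dbo.person_name (id, name, t_id) VALUES (2, 'alex', 1);

-- This third insert is rejected with a unique key violation,
-- so the join no longer produces the duplicate 'alex' row.
INSERT INTO dbo.person_name (id, name, t_id) VALUES (3, 'alex', 1);
```

This only helps if the duplicates are genuinely unwanted data; if two different people can legitimately share a name, the constraint would be wrong for the table.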

Related

Use multiple WITH tablename AS (...) statements SQL Server

I'm trying to create two temporary tables and join them with a permanent table. For example:
WITH temp1 AS (COUNT(*) AS count_sales, ID FROM table1 GROUP BY ID)
WITH temp2 AS (COUNT(*) AS count_somethingelse, ID FROM table2 GROUP BY ID)
SELECT *
FROM table3 JOIN table2 JOIN table1
ON table1.ID = table2.ID = table3.ID
but there seems to be an issue with having multiple WITH tablename AS (...) statements. I tried a semicolon.

Your query should look more like this:
WITH temp1 AS (
SELECT COUNT(*) AS count_sales, ID
FROM table1
GROUP BY ID
),
temp2 AS (
SELECT COUNT(*) AS count_somethingelse, ID
FROM table2
GROUP BY ID
)
SELECT *
FROM temp2 JOIN
temp1
ON temp1.ID = temp2.ID;
Your query has multiple errors. I would suggest you start by understanding why this version works -- or at least does something other than report on syntax errors. Then, go back and study SQL some more.
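The question also wanted the two CTEs joined to the permanent table3. A possible sketch of that, assuming table3 has an ID column as the question's join condition implies (the selected column list is illustrative):

```sql
-- table1, table2 and table3 are the tables named in the question.
WITH temp1 AS (
    SELECT ID, COUNT(*) AS count_sales
    FROM table1
    GROUP BY ID
),
temp2 AS (
    SELECT ID, COUNT(*) AS count_somethingelse
    FROM table2
    GROUP BY ID
)
SELECT t3.ID, temp1.count_sales, temp2.count_somethingelse
FROM table3 AS t3
JOIN temp1 ON temp1.ID = t3.ID   -- each join gets its own ON clause
JOIN temp2 ON temp2.ID = t3.ID;
```

Note the single WITH keyword followed by comma-separated CTE definitions, and that a chained comparison such as a = b = c is not valid in a join condition.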

I'm trying to create two temporary tables
Just for clarity... you used a CTE which is not quite the same thing as a temporary table. You've also tagged 'temp tables', so you want a temp table? You can store query results in a declared table variable or an actual temp table.
An example of declared table variables:
DECLARE @table1 TABLE(id int, count_sales int)
INSERT INTO @table1 (id, count_sales)
SELECT ID, COUNT(*)
FROM table1
GROUP BY ID
--or however you populate temp table1
DECLARE @table2 TABLE(id int, count_somethingelse int)
INSERT INTO @table2 (id, count_somethingelse)
SELECT ID, COUNT(*)
FROM table2
GROUP BY ID
--or however you populate temp table2
SELECT T3.id
--,T2.(some column)
--,T1.(some column)
--,etc...
FROM table3 T3 INNER JOIN @table2 T2 ON T3.id = T2.id
INNER JOIN @table1 T1 ON T3.id = T1.id
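The answer mentions an actual temp table as the other option. A rough sketch of that variant, using SELECT ... INTO to create the temp tables; the names #sales_counts and #other_counts are made up for illustration:

```sql
-- Create and fill each temp table in one step.
SELECT ID, COUNT(*) AS count_sales
INTO #sales_counts
FROM table1
GROUP BY ID;

SELECT ID, COUNT(*) AS count_somethingelse
INTO #other_counts
FROM table2
GROUP BY ID;

-- Join the temp tables to the permanent table.
SELECT T3.ID, T1.count_sales, T2.count_somethingelse
FROM table3 AS T3
INNER JOIN #other_counts AS T2 ON T3.ID = T2.ID
INNER JOIN #sales_counts AS T1 ON T3.ID = T1.ID;

-- Temp tables last for the session unless dropped explicitly.
DROP TABLE #sales_counts;
DROP TABLE #other_counts;
```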

Query for earliest datetime and corresponding number field

I'm attempting to update a table with a dollar amount based on the earliest datetime field from another table. For example:
Table 1
ID|INITIAL_ANNUAL_RATE_AMT|
1 | NULL (I want to update this to 25.02)
Table 2
ID|ANNUAL_RATE_AMT|STARTING_DATE|
1 |25.01 |1/1/2014
1 |25.02 |1/1/2013
I've got a query like this that retrieves the earliest date from table 2 and the corresponding object's ID:
select ID,
MIN(t2.STARTING_DATE) as EARLIEST_START_DATE
from t2
group by t2.ID
But how can I leverage this into an update statement that sets the INITIAL_ANNUAL_RATE_AMT in table 1 to the earliest corresponding value in table 2?
Something like this (which currently fails):
update t1
set t1.Initial_Annual_Rate__c = t3.ANNUAL_RATE_AMT
from t1, t2
left join
(select t2.ID
MIN(t2.STARTING_DATE) as EARLIEST_START_DATE
from t2
group by t2.DEAL_ID)
as t3 ON (t3.DEAL_ID = t1.DEAL_ID)

One way is to use a CTE
;WITH C AS(
SELECT t.ID, EARLIEST_START_DATE, ANNUAL_RATE_AMT FROM(
select ID,
MIN(t2.STARTING_DATE) as EARLIEST_START_DATE
from #Table2 AS t2
group by t2.ID) t
INNER JOIN #Table2 AS t2 ON t2.ID = t.ID AND t.EARLIEST_START_DATE = t2.STARTING_DATE
)
UPDATE t1
SET INITIAL_ANNUAL_RATE_AMT = C.ANNUAL_RATE_AMT
FROM #Table1 AS t1
INNER JOIN C ON C.ID = t1.ID

Another method, using a window function to get the first row in each ID partitioned set:
-- Setup test data
declare @table1 table (ID int, INITIAL_ANNUAL_RATE_AMT decimal(9,2))
declare @table2 table (ID int, ANNUAL_RATE_AMT decimal(9,2), STARTING_DATE date)
INSERT INTO @table1 (ID, INITIAL_ANNUAL_RATE_AMT)
SELECT 1, NULL
INSERT INTO @table2 (ID, ANNUAL_RATE_AMT, STARTING_DATE)
SELECT 1,25.01,'1/1/2014'
UNION SELECT 1,25.02,'1/1/2013'
-- Do the update
;with table2WithIDRowNumbers as (
select ID, ANNUAL_RATE_AMT, STARTING_DATE, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY STARTING_DATE) as rowNumber
FROM @table2
)
UPDATE t1
SET INITIAL_ANNUAL_RATE_AMT=t2.ANNUAL_RATE_AMT
FROM table2WithIDRowNumbers t2
INNER JOIN @table1 t1 ON t1.ID=t2.ID
where t2.rowNumber=1
-- Show the result
SELECT * from @table1

How to left join to first row in SQL Server

How to left join two tables, selecting from second table only the first row?
My question is a follow up of:
SQL Server: How to Join to first row
I used the query suggested in that thread.
CREATE TABLE table1(
id INT NOT NULL
);
INSERT INTO table1(id) VALUES (1);
INSERT INTO table1(id) VALUES (2);
INSERT INTO table1(id) VALUES (3);
GO
CREATE TABLE table2(
id INT NOT NULL
, category VARCHAR(1)
);
INSERT INTO table2(id,category) VALUES (1,'A');
INSERT INTO table2(id,category) VALUES (1,'B');
INSERT INTO table2(id,category) VALUES (1,'C');
INSERT INTO table2(id,category) VALUES (3,'X');
INSERT INTO table2(id,category) VALUES (3,'Y');
GO
------------------
SELECT
table1.*
,FirstMatch.category
FROM table1
CROSS APPLY (
SELECT TOP 1
table2.id
,table2.category
FROM table2
WHERE table1.id = table2.id
ORDER BY id
)
AS FirstMatch
However, with this query, I get inner join results. I want left join results: the desired output should include table1.id = 2 with a NULL category. How can I do that?

Use ROW_NUMBER and a left join:
with cte as(
select id,
category,
row_number() over(partition by id order by category) rn
from table2
)
select t.id, cte.category
from table1 t
left outer join cte
on t.id=cte.id and cte.rn=1
OUTPUT:
id category
1 A
2 (null)
3 X

select table1.id,
(SELECT TOP 1 category FROM table2 WHERE table2.id=table1.id ORDER BY category ASC) AS category
FROM table1

SELECT table1.id ,table2.category
FROM table1 Left join table2
on table1.id = table2.id
where table2.category = ( select top 1 category from table2 t where table1.id = t.id)
OR table2.category is NULL

Following the comment of t-clausen.dk this does the job:
change CROSS APPLY to OUTER APPLY
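Applied to the query from the question, that change could look like the sketch below. Ordering by category rather than id is an assumption added here so that "first row" is deterministic; the question's ORDER BY id does not distinguish between rows sharing the same id:

```sql
SELECT
    table1.id,
    FirstMatch.category
FROM table1
OUTER APPLY (
    -- At most one row per table1.id; no match yields NULL columns
    SELECT TOP 1 table2.category
    FROM table2
    WHERE table2.id = table1.id
    ORDER BY table2.category
) AS FirstMatch;
```

With the sample data from the question this returns (1, A), (2, NULL) and (3, X). OUTER APPLY keeps every row of table1, which is what makes it behave like a left join here.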

SQL Join query to show records if exists in master table or not

Table1
id name color
1,'a','red'
2,'a','blue'
3,'b','red'
4,'c','red'
5,'d','red'
6,'a','green'
declare @t1 table (id int, name varchar(10),color varchar(5))
insert into @t1 values(1,'a','red')
insert into @t1 values(2,'a','blue')
insert into @t1 values(3,'b','red')
insert into @t1 values(4,'c','red')
insert into @t1 values(5,'d','red')
insert into @t1 values(6,'a','green')
Table t2 (master table)
color
red
blue
green
declare @t2 table (color varchar(5))
insert into @t2 values ('red')
insert into @t2 values ('blue')
insert into @t2 values ('green')
The output will be
'a','red'
'a','blue'
'a','green'
We need to retrieve the names from table 1 that have every color listed in t2.

You can get the names in t1 that match all master colors using group by, having, and join:
select t1.name
from t1 join
t2
on t1.color = t2.color
group by t1.name
having count(distinct t1.color) = (select count(*) from t2);
This returns the names. If you want the detailed rows, then use this as a subquery or CTE and join t1 back to these results.
For example, as a CTE:
with n as (
select t1.name
from t1 join
t2
on t1.color = t2.color
group by t1.name
having count(distinct t1.color) = (select count(*) from t2)
)
select t1.*
from t1 join
n
on t1.name = n.name;
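To try the names query against the question's sample data, one option is to load the data into table variables. This is only a sketch: it assumes the sixth row (6, 'a', 'green') from the question's table listing is part of the data.

```sql
DECLARE @t1 TABLE (id INT, name VARCHAR(10), color VARCHAR(5));
DECLARE @t2 TABLE (color VARCHAR(5));

INSERT INTO @t1 VALUES
    (1,'a','red'), (2,'a','blue'), (3,'b','red'),
    (4,'c','red'), (5,'d','red'),  (6,'a','green');
INSERT INTO @t2 VALUES ('red'), ('blue'), ('green');

-- A name qualifies when its number of distinct matching colors
-- equals the number of rows in the master table.
SELECT t1.name
FROM @t1 AS t1
JOIN @t2 AS t2 ON t1.color = t2.color
GROUP BY t1.name
HAVING COUNT(DISTINCT t1.color) = (SELECT COUNT(*) FROM @t2);
```

Only 'a' has all three colors, so the query returns a single row. The comparison relies on t2 holding each color once; duplicate rows in the master table would inflate its count.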

INSERT SELECT TOP (n) ORDER BY column not included in the destination table

Is it possible to insert from a SELECT when the SELECT statement has more columns than the destination table?
Consider this scenario:
INSERT INTO table_1 --table_1 consist only of one column
SELECT TOP 10 col1, col2 --col_2 is only selected because is used in ORDER BY
FROM table_2
ORDER BY col2 DESC
The above statement results in an error. One way I could accomplish this is to use a sub-query like this:
INSERT INTO table_1
SELECT TOP 10 col1
FROM (
SELECT col1, col2
FROM table_2
ORDER BY col2 DESC
) AS t
But I'm wondering if there is a more straightforward way, for example using the equals operator as in an UPDATE statement.
UPDATE
I apologize for submitting an oversimplified example. I took it for granted that it would apply to my scenario without actually testing it.
This is the reproduced context of my query (tested on sqlfiddle, as I have no SQL Server installed on my home PC):
CREATE TABLE table_1 (id INT)
CREATE TABLE table_2 (id INT, col2 INT)
CREATE TABLE table_3 (id INT, col2 INT)
INSERT INTO table_2 VALUES (1,3),(2,2),(3,1)
INSERT INTO table_3 VALUES (1,3),(1,2),(3,1)
INSERT INTO table_1
SELECT TOP 1 t.id, t.Qty
FROM table_2
INNER JOIN
(
SELECT table_2.id, COUNT(table_3.id) AS Qty
FROM table_2
INNER JOIN table_3 on table_3.id = table_2.id
GROUP BY table_2.id
) AS t ON (t.id = table_2.id)
ORDER BY t.Qty
The original query is much more complex, so I would like to avoid another sub-query if possible.
This query fails with the error:
Column name or number of supplied values does not match table definition.: INSERT INTO table_1 SELECT TOP 1 table_1.id FROM table_1 INNER JOIN ( SELECT table_2.id, COUNT(table_3.id) AS Qty FROM table_2 INNER JOIN table_3 on table_3.id = table_2.id GROUP BY table_2.id ) AS t ON (t.id = table_1.id) ORDER BY t.Qty

You don't need to include the ORDER BY column in the select list.
CREATE TABLE table_1 (id INT)
GO
CREATE TABLE table_2 (col1 INT, col2 INT)
GO
INSERT INTO table_2 VALUES (1,3),(2,2),(3,1)
GO
INSERT INTO table_1 --table_1 consist only of one column
SELECT TOP 2 col1
FROM table_2
ORDER BY col2 DESC
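The same idea applied to the reproduced query from the question's update might look like this sketch. It uses the table definitions from the update (not the two-column table_2 defined just above), and the only changes are dropping t.Qty from the select list and naming the target column:

```sql
CREATE TABLE table_1 (id INT);
CREATE TABLE table_2 (id INT, col2 INT);
CREATE TABLE table_3 (id INT, col2 INT);
INSERT INTO table_2 VALUES (1,3),(2,2),(3,1);
INSERT INTO table_3 VALUES (1,3),(1,2),(3,1);

INSERT INTO table_1 (id)
SELECT TOP 1 t.id            -- Qty is not selected...
FROM table_2
INNER JOIN (
    SELECT table_2.id, COUNT(table_3.id) AS Qty
    FROM table_2
    INNER JOIN table_3 ON table_3.id = table_2.id
    GROUP BY table_2.id
) AS t ON t.id = table_2.id
ORDER BY t.Qty;              -- ...but can still drive the ordering
```

With this data the derived table holds (1, 2) and (3, 1), so the row with the smallest Qty is id 3 and that is the value inserted into table_1.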

Have you tried specifying the column that you are trying to insert into?
INSERT INTO table_1 (id)
SELECT TOP 1 table_1.id
FROM table_1
INNER JOIN
(
SELECT table_2.id, COUNT(table_3.id) AS Qty
FROM table_2
INNER JOIN table_3 on table_3.id = table_2.id
GROUP BY table_2.id
) AS t ON (t.id = table_1.id)
ORDER BY t.Qty
