Different Results from Grouping query - sql

I ran the following query in MS Access 2007 and got the expected results:
SELECT Col1
FROM tblA
GROUP BY Col1
HAVING ((Count(Col1))>1);
But after adding an additional column from the same table to the grouping, as below, it returns 0 records:
SELECT Col1, Col2
FROM tblA
GROUP BY Col1, Col2
HAVING ((Count(Col1))>1);
Col1 Col2
19570304 180243268
19570304 180243269
19570304 180243270
26984406 422233864
26984951 796883002
26985060 594201758
19700070 150814697
19700070 430871349
19700070 670755019
19700070 883583086
19700070 963146318
19990910 715835415
19990910 715835416
19990910 799844489
20123527 957714629
20123527 957714630
22000508 376790722
26981961 637378887
What could be the issue here?
Thanks

Try this way:
SELECT t.Col1, t.Col2
FROM tblA t
INNER JOIN (
    SELECT Col1
    FROM tblA
    GROUP BY Col1
    HAVING Count(Col1) > 1
) tbl ON tbl.Col1 = t.Col1

I believe there are no duplicate (Col1, Col2) pairs: every Col2 value in your data is distinct, so each (Col1, Col2) group has a count of exactly 1 and the HAVING clause filters everything out.
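The point about unique pairs is easy to verify; here is a sketch using SQLite through Python's sqlite3 module (an assumption on my part — the question is about Access, but the grouping behavior is the same), loaded with a few rows from the sample data:

```python
# Hypothetical reproduction with SQLite; table/column names mirror the question.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblA (Col1 INTEGER, Col2 INTEGER)")
con.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(19570304, 180243268), (19570304, 180243269), (19570304, 180243270)],
)

# Grouping by Col1 alone: 19570304 appears 3 times, so it is returned.
dupes_single = con.execute(
    "SELECT Col1 FROM tblA GROUP BY Col1 HAVING COUNT(Col1) > 1"
).fetchall()

# Grouping by the (Col1, Col2) pair: every pair is unique, so 0 rows.
dupes_pair = con.execute(
    "SELECT Col1, Col2 FROM tblA GROUP BY Col1, Col2 HAVING COUNT(Col1) > 1"
).fetchall()

print(dupes_single)  # [(19570304,)]
print(dupes_pair)    # []
```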

Related

NOT IN vs concatenate columns

Aren't both of the SQL statements below the same? Functionality-wise, shouldn't they do the same thing?
I was expecting the first SQL to return a result as well.
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, from a logic point of view, and in the way the database engine handles them, they are completely different.
WHERE 2, to start with, is not SARGable. This means that any indexes on your tables cannot be used, and the database engine has to scan the entire table. WHERE 1, however, is SARGable; if you had any indexes, they could be used to perform seeks, likely helping with performance.
From the point of view of logic, let's look at WHERE 2 first. It requires that the concatenated value of COL1 and COL2 not match any concatenated value of COL1 and COL2 in the other table, which means the two values being compared must come from the same row. So '123456' would match only a row where Col1 has the value '123' and Col2 the value '456'.
For WHERE 1, however, the value of Col1 needs to be not found in the other table, and Col2 needs to be not found as well, but the matches can come from different rows. This is where things differ: as '123' appears in Col1 of both tables (and is the only value there), the NOT IN isn't fulfilled and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using NOT EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
T.COL2, --Qualifying your columns is important!
T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
FROM #TEST_1 T1
WHERE T1.COL1 = T.COL1
AND T1.COL2 = T.COL2);
db<>fiddle
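The difference between the two clauses can also be reproduced outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module (an assumption for portability), with || standing in for T-SQL's + on VARCHAR and plain table names because SQLite has no #temp tables:

```python
# Demo of the two WHERE clauses from the question, on the question's data.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TEST   (COL1 TEXT, COL2 TEXT, COL3 TEXT);
    CREATE TABLE TEST_1 (COL1 TEXT, COL2 TEXT, COL3 TEXT);
    INSERT INTO TEST   VALUES ('123','321','ABC'), ('123','436','ABC');
    INSERT INTO TEST_1 VALUES ('123','532','ABC'), ('123','436','ABC');
""")

# WHERE 1: column-by-column NOT IN; '123' exists in TEST_1.COL1,
# so the first condition alone excludes every row.
where1 = con.execute("""
    SELECT * FROM TEST
    WHERE COL1 NOT IN (SELECT COL1 FROM TEST_1)
      AND COL2 NOT IN (SELECT COL2 FROM TEST_1)
""").fetchall()

# WHERE 2: pair-wise comparison via concatenation; only '123321'
# is missing from TEST_1, so exactly one row survives.
where2 = con.execute("""
    SELECT * FROM TEST
    WHERE COL1 || COL2 NOT IN (SELECT COL1 || COL2 FROM TEST_1)
""").fetchall()

print(len(where1))  # 0
print(len(where2))  # 1
```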
When you combine the strings with + in SQL Server and both operands are VARCHAR, the result is string concatenation, not numeric addition: '123' + '321' is '123321', not 444. Implicit conversion to a number would only happen if one of the operands were a numeric type.
In the first query you are comparing the columns independently:
Select all rows from #TEST whose Col1 value is not in #TEST_1 and whose Col2 value is not in #TEST_1.
The first condition alone cuts everything out, since '123' appears in COL1 of both tables.
The second query compares the concatenated pairs:
Select all rows from #TEST where COL1 + COL2 (that's '123321' for the first row and '123436' for the second) is not in #TEST_1.
In #TEST_1 the concatenated values are:
For the first row, COL1 + COL2 = '123532'
For the second row, COL1 + COL2 = '123436'
So only the row with the value '123321' is not in #TEST_1, and that's why you get 1 row as a result.
To sum up:
That's why you see only 1 row for the second query and no records for your first query. In the first query, the first condition alone cuts everything. In the second query, the engine compares whole (COL1, COL2) pairs row by row, and one pair has no match.

Returning any element that isn’t contained in two columns SQL

I have two array columns in a table and I want to create a third column that contains every element that appears in the first column but not in the second. For example, the first row of both columns looks like:
Col1: ['apple','banana','orange','pear']
Col2: ['apple','banana']
And it would return:
Col3: ['orange','pear']
Essentially the opposite of the array_intersect function. I have seen array_diff in PHP, so I am wondering if there is an equivalent function in SQL?
Explode col1 and use array_contains with a case statement, then assemble the array again using collect_set or collect_list.
Demo:
with your_data as (--Test data. Use your table instead of this
select stack(1,
array('apple','banana','orange','pear'),
array('apple','banana')
) as (col1, col2)
)
select col1, col2,
collect_set(case when array_contains(t.col2, e.col1_elem) then null else e.col1_elem end) as col3
from your_data t
lateral view explode(t.col1) e as col1_elem
group by col1, col2
Result:
col1 col2 col3
["apple","banana","orange","pear"] ["apple","banana"] ["orange","pear"]
If you have a primary key, then I think this will do what you want:
select t.pk, collect_set(case when c2.el is null then c1.el end)
from (t lateral view
explode(t.col1) c1 as el
) left join
(t t2 lateral view
explode(t2.col2) c2 as el
)
on t.pk = t2.pk and
c1.el = c2.el
group by t.pk;

Find percentage of increase between two values

I have a query that I am building that requires multiple flags. One of those flags is to find the percentage of increase between two values in the same row.
For example I have two values on my row:
Col1 26323 &
Col2 26397
Col2 has increased by 0.28% relative to Col1. How can I express this in my query?
This way:
select Col1, Col2, (Col2 *100.0/Col1)-100 from (
select Col1 = 26323 , Col2 =26397
)a
Result :
Col1 Col2 (No column name)
26323 26397 0.281122972305
SELECT
100.0 * (col2 - col1) / col1 AS pdif
FROM ptable
Hope this is what you are looking for.
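The formula can be sanity-checked with the question's numbers; (Col2 * 100.0 / Col1) - 100 is algebraically the same as (Col2 - Col1) / Col1 * 100:

```python
# Percentage increase from col1 to col2, two equivalent forms.
col1, col2 = 26323, 26397

pct_a = (col2 * 100.0 / col1) - 100   # form used in the first answer
pct_b = (col2 - col1) / col1 * 100    # "difference over the old value"

print(round(pct_a, 4))  # 0.2811
```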

Best possible way to get output from two tables with a big difference in cardinality

I have two tables
select col1 , col2 , col3, col4, ........, col20 from ftcm; --TABLE has 470708 ROWS
select val from cspm where product='MARK'; --TABLE has 1 ROW
I have to make col3 null if col2 = val.
I have thought of joining like this:
select
col1 , col2 , decode(col2,val,NULL,col3) col3 , col4, ........, col20
from ftcm a left outer join ( select val from cspm where product='MARK') b
on a.col2=b.val;
but it seems to take a long time.
Please advise if there is any other way to tune it.
I have not tested this query, but if you know that cspm returns only one value, then you can perhaps try the following query:
select col1, col2, decode(col2,(select val from cspm where product='MARK'),NULL,col3) col3, col4 ... col20 from ftcm
Since you are doing an outer join, the above might produce an equivalent output.
Another option which you can explore is to use a parallel hint
select /*+ parallel(em,4) */ col1, col2, decode(col2,(select val from cspm where product='MARK'),NULL,col3) col3, col4 ... col20 from ftcm em
However, consult with your DBA before using parallel hint at the specified degree (4)
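A minimal sketch of the scalar-subquery rewrite, using SQLite through Python's sqlite3 module (an assumption — the question is about Oracle, so CASE stands in for DECODE, and the rows are made-up sample data):

```python
# col3 is blanked out wherever col2 matches the single val from cspm.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ftcm (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE cspm (product TEXT, val TEXT);
    INSERT INTO ftcm VALUES (1, 'A', 'keep'), (2, 'B', 'blank');
    INSERT INTO cspm VALUES ('MARK', 'B');
""")

rows = con.execute("""
    SELECT col1,
           col2,
           CASE col2
                WHEN (SELECT val FROM cspm WHERE product = 'MARK')
                THEN NULL
                ELSE col3
           END AS col3
    FROM ftcm
    ORDER BY col1
""").fetchall()

print(rows)  # [(1, 'A', 'keep'), (2, 'B', None)]
```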

SQL Server - Return full record for Duplicated Field

Suppose I have a SQL Server table with many columns something along the lines of:
Col1: Col2: Col3: ... Coln:
Val1_1 Val1_2 Val1_3 Val1_n
Val2_1 Val2_2 Val2_3 Val2_n
Val3_1 Val3_2 Val3_3 Val3_n
Val3_1 Val4_2 Val4_3 Val4_n
Val3_1 Val5_2 Val5_3 Val5_n
In this case, Val3_1 is repeated in Col1 for the last 3 records, whereas the remaining values are not repeated.
How can I write a query to return the full set of columns where Col1's value is duplicated to get back:
Col1: Col2: Col3: ... Coln:
Val3_1 Val3_2 Val3_3 Val3_n
Val3_1 Val4_2 Val4_3 Val4_n
Val3_1 Val5_2 Val5_3 Val5_n
I tried using the GROUP BY clause, but I had to write out each column's name (which gets very frustrating). I was hoping for something along the lines of:
SELECT MyTable.* FROM MyTable WHERE count(MyTable.Col1) OVER() > 1
Obviously, this didn't work, but how could I do something along those lines?
Thanks!!!
You can use a subquery:
select *
from mytable t1
inner join
(
select count(col1) Total, col1
from mytable
group by col1
having count(col1) > 1
) t2
on t1.col1 = t2.col1
See SQL Fiddle with Demo
Or you can use count(*) over():
select *
from
(
select *,
count(*) over(partition by col1) tot
from mytable
) src
where tot > 1
See SQL Fiddle with Demo
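The count(*) over() approach can be tried out with SQLite as well (window functions require SQLite 3.25+, which ships with modern Python builds); a small sketch with stand-in data mirroring the question's table:

```python
# Return all columns of every row whose col1 value is duplicated.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mytable (col1 TEXT, col2 TEXT);
    INSERT INTO mytable VALUES
        ('Val1_1','Val1_2'), ('Val2_1','Val2_2'),
        ('Val3_1','Val3_2'), ('Val3_1','Val4_2'), ('Val3_1','Val5_2');
""")

rows = con.execute("""
    SELECT col1, col2
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY col1) AS tot
        FROM mytable
    ) src
    WHERE tot > 1
""").fetchall()

print(rows)  # the three 'Val3_1' rows
```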