Need help identifying dups in the table - sql

What I have:
data_source_1 table
data_source_2 table
data_sources_view view
About tables:
data_source_1:
has no dups:
db=# select count(*) from (select distinct * from data_source_1);
count
--------
543243
(1 row)
db=# select count(*) from (select * from data_source_1);
count
--------
543243
(1 row)
data_source_2:
has no dups:
db=# select count(*) from (select * from data_source_2);
count
-------
5304
(1 row)
db=# select count(*) from (select distinct * from data_source_2);
count
-------
5304
(1 row)
data_sources_view:
has dups:
db=# select count(*) from (select distinct * from data_sources_view);
count
--------
538714
(1 row)
db=# select count(*) from (select * from data_sources_view);
count
--------
548547
(1 row)
The view is simple as:
CREATE VIEW data_sources_view
AS SELECT *
FROM (
(
SELECT a, b, 'data_source_1' as source
FROM data_source_1
)
UNION ALL
(
SELECT a, b, 'data_source_2' as source
FROM data_source_2
)
);
What I want to know:
How is it possible to have dups in a view when the source tables don't have dups, and 'data_source_x' as source eliminates the possibility of overlapping data?
How to identify dups?
What I've tried:
db=# create table t1 as select * from data_sources_view;
SELECT
db=#
db=# create table t2 as select distinct * from data_sources_view;
SELECT
db=# create table t3 as select * from t1 minus select * from t2;
SELECT
db=# select 't1' as table_name, count(*) from t1 UNION ALL
db-# select 't2' as table_name, count(*) from t2 UNION ALL
db-# select 't3' as table_name, count(*) from t3;
table_name | count
------------+--------
t1 | 548547
t3 | 0
t2 | 538714
(3 rows)
Database:
Redshift (PostgreSQL)

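The reason is that your data sources have more than two columns, while the view selects only a and b. Rows that are distinct across all columns can still share the same (a, b) pair. If you do these counts: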
select count(*) from (select distinct a, b from data_source_1);
and
select count(*) from (select distinct a, b from data_source_2);
You should find that they are different from the count(*) you get on the same table.
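A minimal sketch of the effect, using Python's built-in sqlite3 module rather than Redshift. The third column `c` and the sample rows are assumptions made up for illustration; only the view definition follows the question. It shows rows that are unique across all columns colliding once the view projects just `a` and `b`, and one way to list the duplicates by grouping on every view column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: column c is an assumption, not from the question
    CREATE TABLE data_source_1 (a INTEGER, b INTEGER, c TEXT);
    INSERT INTO data_source_1 VALUES (1, 10, 'x'), (1, 10, 'y'), (2, 20, 'z');
    CREATE TABLE data_source_2 (a INTEGER, b INTEGER, c TEXT);
    INSERT INTO data_source_2 VALUES (1, 10, 'x');
    CREATE VIEW data_sources_view AS
        SELECT a, b, 'data_source_1' AS source FROM data_source_1
        UNION ALL
        SELECT a, b, 'data_source_2' AS source FROM data_source_2;
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Full rows in the table are all distinct ...
table_rows = scalar("SELECT COUNT(*) FROM data_source_1")
table_distinct = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM data_source_1)")

# ... but the view, which keeps only a and b, is not.
view_rows = scalar("SELECT COUNT(*) FROM data_sources_view")
view_distinct = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM data_sources_view)")

# List each duplicated view row together with its number of copies.
duplicates = conn.execute("""
    SELECT a, b, source, COUNT(*) AS copies
    FROM data_sources_view
    GROUP BY a, b, source
    HAVING COUNT(*) > 1
""").fetchall()

print(table_rows, table_distinct)
print(view_rows, view_distinct)
print(duplicates)
```

This also suggests why t3 in the question is empty: MINUS is a set operation, so it removes every copy of a row that also appears in t2 and cannot isolate the extra copies.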

UNION vs UNION ALL
UNION - Duplicate rows are removed from the combined result, so a row that already exists in the top query is suppressed when it appears again in the bottom query.
OUTPUT
FOO
UNION ALL - All rows from both queries are kept, so data that exists in both tables repeats (both records are shown).
OUTPUT
FOO
FOO
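As a rough illustration of the difference, here is a small sketch using Python's built-in sqlite3 module. The table names `top_t` and `bottom_t` and their contents are hypothetical. Note that UNION removes duplicates from the whole combined result, including repeats that come from a single branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: the top query returns FOO twice, the bottom once
    CREATE TABLE top_t (val TEXT);
    CREATE TABLE bottom_t (val TEXT);
    INSERT INTO top_t VALUES ('FOO'), ('FOO');
    INSERT INTO bottom_t VALUES ('FOO');
""")

# UNION de-duplicates the combined result set.
union_rows = conn.execute(
    "SELECT val FROM top_t UNION SELECT val FROM bottom_t"
).fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(
    "SELECT val FROM top_t UNION ALL SELECT val FROM bottom_t"
).fetchall()

print(union_rows)
print(union_all_rows)
```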

Related

Can we select count(query) from table

SELECT COUNT(ANOTHER SELECT QUERY) FROM DUAL.
Can we get the results this way or is there any other way?
An example could help:
SQL> create table tabTest as (select 1 x from dual);
Table created.
SQL> select count( select * from tabTest ) from dual;
select count( select * from tabTest ) from dual
*
ERROR at line 1:
ORA-00936: missing expression
SQL> select count(*) from (select * from tabTest);
COUNT(*)
----------
1
You can use a derived table (aka "sub-query"):
select count(*)
from (
.... your query here ...
);

Merge three tables in Select query by rule 3, 2, 1 records from each table

Merge three tables in a Select query by rule 3, 2, 1 records from each table as follows:
TableA: ID, FieldA, FieldB, FieldC,....
TableB: ID, FieldA, FieldB, FieldC,....
TableC: ID, FieldA, FieldB, FieldC,....
ID : auto number in each table
FieldA will be unique in all three tables.
I am looking for a Select query to merge three tables as follows:
TOP three records from TableA sorted by ID
TOP two records from TableB sorted by ID
TOP 1 record from TableC sorted by ID
Repeat this until select all records from all three tables.
If some table has fewer records or does not meet the criteria, ignore that and continue with others.
My attempt:
I did it entirely procedurally, with cursors and IF conditions inside a SQL Server stored procedure.
It works, but it is slow.
This requires a formula that takes row numbers from each table and transforms it into a series of integers that skips the desired values.
In the query below, I am adding some CTEs for the sake of shortening the formula. The real magic is in the UNION. Also, I am adding an additional field for your control. Feel free to get rid of it.
WITH A_Aux as (
SELECT 'A' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableA.*
FROM TableA
), B_Aux AS (
SELECT 'B' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableB.*
FROM TableB
), C_Aux AS (
SELECT 'C' As FromTable, ROW_NUMBER() OVER (Order BY ID) AS RowNum, TableC.*
FROM TableC
)
SELECT *
FROM (
SELECT RowNum+3*FLOOR((RowNum-1)/3) As ColumnForOrder, A_Aux.* FROM A_Aux
UNION ALL
SELECT 3+RowNum+4*FLOOR((RowNum-1)/2), B_Aux.* FROM B_Aux
UNION ALL
SELECT 6*RowNum, C_Aux.* FROM C_Aux
) T
ORDER BY ColumnForOrder
PS: note the pattern Offset + RowNum + (6-N) * Floor((RowNum-1)/N) to group N records together (it of course simplifies a lot for TableC).
PPS: I don't have a SQL server at hand to test it. Let me know if there is a syntax error.
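Since the query above was posted untested, here is a small sketch that exercises the same ordering formula with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The row counts (7, 5, 5) and the `FieldA` values are made-up sample data, and FLOOR is replaced by SQLite's integer division, which truncates in the same way for these positive values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: 7 rows in TableA, 5 in TableB, 5 in TableC
for name, count in (("TableA", 7), ("TableB", 5), ("TableC", 5)):
    conn.execute(f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY, FieldA TEXT)")
    conn.executemany(
        f"INSERT INTO {name} (FieldA) VALUES (?)",
        [(f"{name[-1]}{i}",) for i in range(1, count + 1)],
    )

query = """
WITH A_Aux AS (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, FieldA FROM TableA),
     B_Aux AS (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, FieldA FROM TableB),
     C_Aux AS (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, FieldA FROM TableC)
SELECT FieldA
FROM (
    -- Each block of 6 slots holds 3 rows of A, then 2 of B, then 1 of C
    SELECT RowNum + 3 * ((RowNum - 1) / 3) AS ColumnForOrder, FieldA FROM A_Aux
    UNION ALL
    SELECT 3 + RowNum + 4 * ((RowNum - 1) / 2), FieldA FROM B_Aux
    UNION ALL
    SELECT 6 * RowNum, FieldA FROM C_Aux
) T
ORDER BY ColumnForOrder
"""

merged = [row[0] for row in conn.execute(query)]
print(merged)
```

Tables that run out of rows simply leave gaps in `ColumnForOrder`, so the remaining rows of the other tables still come out in the right relative order.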
You may try this..
GO
select * into #temp1 from (select * from table1) as t1
select * into #temp2 from (select * from table2) as t2
select * into #temp3 from (select * from table3) as t3
select * into #final from (select col1, col2, col3 from #temp1 where 1=0) as tb
-- Loop until all three temp tables are empty, not only #temp1
while exists (select 1 from #temp1)
or exists (select 1 from #temp2)
or exists (select 1 from #temp3)
Begin
;with ct1 as (
select ROW_NUMBER() over (order by id) as Slno, * from #temp1
),ct2 as (
select ROW_NUMBER() over (order by id) as Slno, * from #temp2
),ct3 as (
select ROW_NUMBER() over (order by id) as Slno, * from #temp3
),cfinal as (
-- Take the first 3, 2 and 1 rows by id, using the row numbers computed above
select col1, col2, col3 from ct1 where Slno <= 3
union all
select col1, col2, col3 from ct2 where Slno <= 2
union all
select col1, col2, col3 from ct3 where Slno <= 1
)
insert into #final ( col1 , col2, col3 )
select col1, col2, col3 from cfinal
-- Remove the rows just copied; ORDER BY makes TOP deterministic
delete from #temp1 where id in (select top 3 ID from #temp1 order by ID)
delete from #temp2 where id in (select top 2 ID from #temp2 order by ID)
delete from #temp3 where id in (select top 1 ID from #temp3 order by ID)
End
End
Select * from #final
Drop table #temp1
Drop table #temp2
Drop table #temp3
GO
First create a temp table for each of the 3 tables. After each insert, delete the inserted records from the temp tables. This should give you the desired result, unless I have missed something.
Please check whether this works for you.
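There is not a lot of information to go on here, but I assume you can use UNION ALL to combine multiple statements. Each branch needs a FROM clause, and in SQL Server an ORDER BY inside a branch has to be wrapped in a derived table:
SELECT * FROM (SELECT TOP 3 * FROM TableA ORDER BY ID) a
UNION ALL
SELECT * FROM (SELECT TOP 2 * FROM TableB ORDER BY ID) b
UNION ALL
SELECT * FROM (SELECT TOP 1 * FROM TableC ORDER BY ID) c
Execute and see if this works. Note that it returns only the first batch from each table and does not repeat the pattern.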
/AF
From my understanding, I create three temp tables as ta, tb, tc.
select * into #ta from (
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
) a
select * into #tb from (
select 'B' b
union all
select 'B'
union all
select 'B'
union all
select 'B'
union all
select 'B'
) b
select * into #tc from (
select 'C' c
union all
select 'C'
union all
select 'C'
union all
select 'C'
union all
select 'C'
) c
If these tables match your tables, then the output looks like A,A,A,B,B,C,A,A,A,B,B,C,A,B,C,C,C
T-SQL
declare @TAC int = (select count (*) from #ta) -- Table A Count = 7
declare @TBC int = (select count (*) from #tb) -- Table B Count = 5
declare @TAR int = @TAC % 3 -- Table A Remainder = 1
declare @TBR int = @TBC % 2 -- Table B Remainder = 1
declare @TAQ int = (@TAC - @TAR) / 3 -- Table A Quotient = (7 - 1) / 3 = 2, which is passed to NTILE
-- So we split into two groups: (111), (222)
declare @TBQ int = (@TBC - @TBR) / 2 -- Table B Quotient = (5 - 1) / 2 = 2, which is passed to NTILE
-- So we split into two groups: (11), (22)
select * from (
select *, NTILE (@TAQ) over ( order by a) FirstOrder, 1 SecondOrder from (
select top (@TAC - @TAR) * from #ta order by a
) ta -- 6 rows are obtained out of 7.
union all
select *, @TAQ + 1, 1 from (
select top (@TAR) * from #ta order by a desc
) ta -- The remaining row is obtained. Order by desc is a must.
-- Here FirstOrder is the value following the last NTILE group.
union all
select *, NTILE (@TBQ) over ( order by b), 2 from (
select top (@TBC - @TBR) * from #tb order by b
) tb
union all
select *, @TBQ + 1, 2 from (
select top (@TBR) * from #tb order by b desc
) tb
union all
select *, ROW_NUMBER () over (order by c), 3 from #tc
) abc order by FirstOrder, SecondOrder
Let me explain the T-SQL:
Before that, FYR: NTILE and Row Number
Get the count.
Find the Quotient which will pass to NTILE function.
Order by the NTILE value and static.
Note:
I am using SQL Server 2017.
If T-SQL works fine, then you need to change the column in order by <yourcolumn>.

find difference using UNION and minus

I am a bit at my wits' end as to why the SQL below does not produce any rows. Clearly there is an id 1 that is not in b, and I expected that to be the output. I know I am missing some fundamentals on how union works. Maybe it is due to the fact that there is no output from the second minus?
Redshift:
WITH a as
(select 1 id union all select 2
)
,b as (select 2 id)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a
Oracle-
WITH a as
(select 1 id from dual union all select 2 from dual
)
,b as (select 2 id from dual)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a
There is an order of operations issue with the way you wrote your query. If you wrap the two sides of the union as subqueries, and select from them, then you get the result you expect:
select * from
(select * from a
minus
select * from b ) t1
union all
select * from
(select * from b
minus
select * from a ) t2
What appears to be happening is that first the following is run, leaving us with id=1:
select * from a
minus
select * from b
Then, this result is being unioned with a query on b:
(select * from a
minus
select * from b)
union all
select * from b
At this point, the result set again has both 1 and 2 in it. But now, we take a minus operation against table a:
(select * from a
minus
select * from b
union all
select * from b)
minus
select * from a
This results in an empty set, since (1,2) minus (1,2) leaves us with nothing.
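The left-to-right grouping described above can be reproduced with a small sketch using Python's built-in sqlite3 module. SQLite spells the operator EXCEPT instead of MINUS and also groups compound SELECTs from left to right, so it is only a stand-in for Redshift or Oracle here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same CTEs as in the question: a = {1, 2}, b = {2}
cte = "WITH a AS (SELECT 1 AS id UNION ALL SELECT 2), b AS (SELECT 2 AS id) "

# Without grouping, this is evaluated as ((a EXCEPT b) UNION ALL b) EXCEPT a.
flat = conn.execute(cte + """
    SELECT * FROM a
    EXCEPT
    SELECT * FROM b
    UNION ALL
    SELECT * FROM b
    EXCEPT
    SELECT * FROM a
""").fetchall()

# Wrapping each side in a derived table forces the intended grouping.
grouped = conn.execute(cte + """
    SELECT * FROM (SELECT * FROM a EXCEPT SELECT * FROM b) t1
    UNION ALL
    SELECT * FROM (SELECT * FROM b EXCEPT SELECT * FROM a) t2
""").fetchall()

print(flat)
print(grouped)
```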

Issue with insert after subquery - Oracle

I have a list of subqueries, something like below:
WITH QRY1 AS (.. SOME PL SQL STATEMNT),
QRY2 (.. SELECT X,Y,Z,QRY1.* ),
QRY3 (.. SOME SELECT * AGAIN USING QRY2)
and finally
SELECT * FROM QRY3;
is there a way that I can do
INSERT INTO table_name (a,b,c,d)
SELECT * FROM QRY3;
You just need to put the CTE as part of the select, after the insert itself, rather than before it:
INSERT INTO table_name (a,b,c,d)
WITH QRY1 AS (.. SOME PL SQL STATEMNT),
QRY2 (.. SELECT X,Y,Z,QRY1.* ),
QRY3 (.. SOME SELECT * AGAIN USING QRY2)
-- and finally
SELECT * FROM QRY3;
Quick demo:
create table table_name (a number, b number, c number, d number);
insert into table_name (a,b,c,d)
with qry1 as (select 4 as d from dual),
qry2 as (select 2 as b, 3 as c, qry1.* from qry1),
qry3 as (select 1 as a, qry2.* from qry2)
select * from qry3;
1 row inserted.
select * from table_name;
A B C D
---------- ---------- ---------- ----------
1 2 3 4

Get the number of rows in two SQL tables in a single query?

I've written two SQL statements:
SELECT Count(*) AS table1Count FROM table1 WHERE foo=1;
SELECT Count(*) AS table2Count FROM table2 WHERE bar=2;
Both statements return the data I want, but I would like to know how to return a single table with two cells: table1Count and table2Count from a single query.
How do I construct the query?
SELECT (SELECT Count(*) AS table1Count FROM table1 WHERE foo=1) AS table1Count,
(SELECT Count(*) AS table2Count FROM table2 WHERE bar=2) AS table2Count;
Gives something like:
table1count | table2count
-------------+-------------
4 | 6
(1 row)
With UNION ALL:
SELECT 'Table1' AS "Table", Count(*) As "Count" FROM table1 WHERE foo=1
UNION ALL
SELECT 'Table2' AS "Table", Count(*) As "Count" FROM table2 WHERE bar=2;
Will produce:
Table | Count
---------------
Table1 | 1
Table2 | 2