PL/SQL SELECT with multiple COUNT(DISTINCT xxx) - unexpected results

PL/SQL SELECT with multiple COUNT(DISTINCT xxx) - unexpected results - sql

I'm trying to put together a query for an Oracle 11g application and I've run into a problem.
I'll simplify the real scenario to make it easier to understand (and also to protect the client's data):
Table A is the base table. It has a known identifier in it that I pass in to the query.
For each entry in Table A there may be multiple entries in Table B. Table B contains a value that I am interested in.
For each entry in Table B there may also be multiple entries in Table C. Table C contains another value I'm interested in.
I also have an XML snippet containing a list of values that may or may not match up to the values of interest in table C.
The query does an outer join to the XML so that if there is a matching value it will return the value again, otherwise it is null.
What I want to do is get back the identifier I passed in, a count of the unique values in B and C, as well as a count of the unique (and non-null) values from the XML part of the join.
My current query is:
SELECT
a.ID
, COUNT(DISTINCT b.VAL) AS B_VAL
, COUNT(DISTINCT c.VAL) AS C_VAL
, COUNT(DISTINCT xml.VAL) AS XML_VAL
FROM a, b, c,
XMLTABLE('/field1/collection/value' passing my_xml_type
COLUMNS VAL VARCHAR2(50) PATH '.') xml
WHERE
a.ID = b.SOME_ID
AND b.OTHER_ID = c.OTHER_ID
AND c.VAL = xml.VAL (+)
Now if you forget about the counting and just return rows, an example result set might look something like this:
ID B_VAL C_VAL XML_VAL
---------------------------------------
X abc 123 123
X abc 456 null
X abc 789 789
X abc 789 789
DESIRED: Now when I want to do the distinct counts, I'd like it to return:
ID B_VAL C_VAL XML_VAL
---------------------------------------
X 1 3 2
ACTUAL: However, this is what I'm getting when I have them all as COUNT(DISTINCT ...):
ID B_VAL C_VAL XML_VAL
---------------------------------------
X 1 1 1
ALTERNATIVE: ...and if I take the DISTINCT out of the counts then I get:
ID B_VAL C_VAL XML_VAL
---------------------------------------
X 1 4 3
How come the DISTINCT seems to be operating only within a particular B_VAL, but taking it out causes it to operate across all the rows but not taking uniqueness into account?
Is there another way of doing this that doesn't involve having to replicate all the joins as a sub-query? Have I missed the point entirely?
(Please note, I'm not a DB developer at all, I've just been pulled in to help out, so sorry if this is an easy problem... I HAVE searched Google and browsed this site for answers before posting, though!)
Thanks.
I've found that if I take the XML table join out then the count distinct works OK across the B_VAL and C_VAL... So perhaps it's something weird with how Oracle handles XML table joins?

As Vincent's test case works in 10.2.0.3 and 11.2.0.2, and if you're at an earlier version of 11g, this could be bug 8816675: XMLexists query returns wrong results with a select DISTINCT. The example in the bug is referring to a problem with count(distinct). You aren't explicitly using XMLexists, but the bug may have a wider impact then the title suggests, or it may be used under the hood.
If this is the problem, and you can't patch up, you might be able to work around it by wrapping the non-count version, which still isn't pretty:
SELECT
A_ID
, COUNT(DISTINCT B_VAL) AS B_VAL
, COUNT(DISTINCT C_VAL) AS C_VAL
, COUNT(DISTINCT XML_VAL) AS XML_VAL
FROM (
SELECT a.ID as A_ID, b.VAL as B_VAL, c.VAL as C_VAL, xml.VAL as XML_VAL
FROM a, b, c
, XMLTABLE('/field1/collection/value' passing my_xml_type
COLUMNS VAL VARCHAR2(50) PATH '.') xml
WHERE a.ID = b.SOME_ID
AND b.OTHER_ID = c.OTHER_ID
AND c.VAL = xml.VAL (+)
)
GROUP BY A_ID;

I can't reproduce your finding with Oracle 10.2.0.3.
Here's my setup:
SQL> CREATE TABLE a AS SELECT 'X' ID FROM dual;
Table created
SQL> CREATE TABLE b AS SELECT 'abc' val, 'X' some_id, 1 other_id FROM dual;
Table created
SQL> CREATE TABLE c AS
2 SELECT 1 other_id, '123' val,
3 XMLTYPE('<field1>
4 <collection><value>123</value></collection>
5 </field1>') my_xml_type
6 FROM dual UNION ALL
7 SELECT 1 other_id, '456' val, NULL FROM dual UNION ALL
8 SELECT 1 other_id, '789' val,
9 XMLTYPE('<field1>
10 <collection><value>789</value></collection>
11 <collection><value>789</value></collection>
12 </field1>') my_xml_type
13 FROM dual;
Table created
the query returns the right result:
SQL> SELECT
2 a.ID
3 , COUNT(DISTINCT b.VAL) AS B_VAL
4 , COUNT(DISTINCT c.VAL) AS C_VAL
5 , COUNT(DISTINCT xml.VAL) AS XML_VAL
6 FROM a, b, c
7 , XMLTABLE('/field1/collection/value' passing my_xml_type
8 COLUMNS VAL VARCHAR2(50) PATH '.') xml
9 WHERE a.ID = b.SOME_ID
10 AND b.OTHER_ID = c.OTHER_ID
11 AND c.VAL = xml.VAL (+)
12 GROUP BY a.id;
ID B_VAL C_VAL XML_VAL
-- ---------- ---------- ----------
X 1 3 2
Can you run this test case?

Related

How to unescape % in LIKE clause

I have my search patterns stored in database in patterns table. For example my table column name_pattern contains string 'Basic%'. I'd like to create dynamic search where search patterns will be fetched from name_pattern column.
So my SQL query should look something like:
SELECT *
FROM products
WHERE product_name LIKE name_pattern <-- somehow joined from patterns table
Seems that Oracle escapes % in my string but I want to take it unescapped in order my query to work like:
SELECT *
FROM products
WHERE product_name LIKE 'Basic%'
I found that my problem is with stable set of rows:
CREATE TABLE patterns(code CHAR(1),name_pattern VARCHAR2(20));
INSERT INTO patterns(code,name_pattern) VALUES('B','Basic%');
INSERT INTO patterns(code,name_pattern) VALUES('T','%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),code CHAR(1));
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',NULL);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',NULL);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',NULL);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',NULL);
MERGE INTO products p USING
(
SELECT code,name_pattern FROM patterns
) s
ON (p.name LIKE s.name_pattern)
WHEN MATCHED THEN UPDATE SET p.code=s.code;
SELECT * FROM products;
If my search patterns were Basic% and Super% in patterns table then this MERGE will work, but if my search patterns are Basic% and %thing, the second product should be marked with both codes 'B' and 'T' and that causes the error:
ORA-30926: unable to get a stable set of rows in the source tables
So my problem is not in (un)escaping :-(, sorry

You don't have to (un)escape anything, I'd say.
SQL> with
2 patterns (name_pattern) as
3 (select 'Basic%' from dual union all
4 select '%foot%' from dual
5 ),
6 products (id, name) as
7 (select 1, 'Basic instinct' from dual union all
8 select 2, 'Visual Basic' from dual union all
9 select 3, 'Littlefoot' from dual union all
10 select 4, 'Happy feet' from dual
11 )
12 select b.id, b.name, a.name_pattern
13 from products b join patterns a on b.name like a.name_pattern;
ID NAME NAME_P
---------- -------------- ------
1 Basic instinct Basic%
3 Littlefoot %foot%
SQL>
Based on test case you provided: don't merge, update!
SQL> update products p set
2 p.found = 1
3 where exists (select null
4 from patterns o
5 where p.name like o.name_pattern
6 );
3 rows updated.
SQL> select * from products;
ID NAME FOUND
---------- -------------------- ----------
1 Basic instinct 1
2 Basic thing 1
3 Super thing 1
4 Hyper instinct 0
SQL>
After you changed your mind (again), it is still update. Though, you didn't explain which code you want to take when there's multiple match (for example, product 2 matches both "Basic%" and "%thing") so I took any of them, using the min function.
SQL> update products p set
2 p.code = (select min(o.code)
3 from patterns o
4 where p.name like o.name_pattern
5 );
4 rows updated.
SQL> select * from products;
ID NAME CODE
---------- -------------------- ----------
1 Basic instinct B
2 Basic thing B
3 Super thing T
4 Hyper instinct NULL
SQL>

Display missing rows (oracle, sql)

I have run below select:
select replace(replace(id,'[',''),']','') as ID from tableA where COL1= 'TEST';
It returns 15 rows.
example of id:
1abc
3def
9abc
..
..
..
14abc
Then I'm looking this ID into other table:
select col1, col3 from tableB where
id in (select replace(replace(id,'[',''),']','') from tableA where COL1= 'TEST');
It returns 12 rows.
1abc city1
2def city2
5abc city2
.. ..
12abc city3
How to display missing 3 rows?

I suggest that there aren't actually any missing rows, but rather that 3 of the 15 rows returned by the first query are actually duplicate id values.
To see how this might work, consider that the first query returned the following 5 id values (for the sake of simplicity):
1
1
1
2
2
There are in fact 5 id values, but only 2 are actually unique. Then, the following WHERE clause:
WHERE id IN (1, 1, 1, 2, 2)
is equivalent to just saying:
WHERE id IN (1, 2)
Another possibility to this might be that tableB just does not contain every id returned by the first query.
To find the missing rows, here is one way:
WITH cte AS (
SELECT REPLACE(REPLACE(id, '[', ''), ']', '') AS ID
FROM tableA
WHERE COL1= 'TEST'
)
SELECT a.ID
FROM cte a
LEFT JOIN tableB b
ON a.ID = b.ID
WHERE b.ID IS NULL;

Determine source on COALESCE fields

I have two tables table which are identical in structure but belong to different schemas (schemas A and B). All rows in question will always appear in the A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table LEFT JOIN B.table ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot) but I have tinkered around with EXISTS and a subquery but have had no luck so far.

As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source

The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- like #Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?

create a table of duplicated rows of another table using the select statement

I have a table with one column containing different integers.
For each integer in the table I would like to duplicate it as the number of digits -
For example:
12345 (5 digits):
1. 12345
2. 12345
3. 12345
4. 12345
5. 12345
I thought doing it using with recursion t (...) as () but I didn't manage, since I don't really understand how it works and what is happening "behind the scenes.
I don't want to use insert because I want it to be scalable and automatic for as many integers as needed in a table.
Any thoughts and an explanation would be great.

The easiest way is to join to a table with numbers from 1 to n in it.
SELECT n, x
FROM yourtable
JOIN
(
SELECT day_of_calendar AS n
FROM sys_calendar.CALENDAR
WHERE n BETWEEN 1 AND 12 -- maximum number of digits
) AS dt
ON n <= CHAR_LENGTH(TRIM(ABS(x)))
In my example I abused TD's builtin calendar, but that's not a good choice, as the optimizer doesn't know how many rows will be returned and as the plan must be a Product Join it might decide to do something stupid. So better use a number table...

Create a numbers table that will contain the integers from 1 to the maximum number of digits that the numbers in your table will have (I went with 6):
create table numbers(num int)
insert numbers
select 1 union select 2 union select 3 union select 4 union select 5 union select 6
You already have your table (but here's what I was using to test):
create table your_table(num int)
insert your_table
select 12345 union select 678
Here's the query to get your results:
select ROW_NUMBER() over(partition by b.num order by b.num) row_num, b.num, LEN(cast(b.num as char)) num_digits
into #temp
from your_table b
cross join numbers n
select t.num
from #temp t
where t.row_num <= t.num_digits

I found a nice way to perform this action. Here goes:
with recursive t (num,num_as_char,char_n)
as
(
select num
,cast (num as varchar (100)) as num_as_char
,substr (num_as_char,1,1)
from numbers
union all
select num
,substr (t.num_as_char,2) as num_as_char2
,substr (num_as_char2,1,1)
from t
where char_length (num_as_char2) > 0
)
select *
from t
order by num,char_length (num_as_char) desc

How do you find a missing number in a table field starting from a parameter and incrementing sequentially?

Let's say I have an sql server table:
NumberTaken CompanyName
2 Fred 3 Fred 4 Fred 6 Fred 7 Fred 8 Fred 11 Fred
I need an efficient way to pass in a parameter [StartingNumber] and to count from [StartingNumber] sequentially until I find a number that is missing.
For example notice that 1, 5, 9 and 10 are missing from the table.
If I supplied the parameter [StartingNumber] = 1, it would check to see if 1 exists, if it does it would check to see if 2 exists and so on and so forth so 1 would be returned here.
If [StartNumber] = 6 the function would return 9.
In c# pseudo code it would basically be:
int ctr = [StartingNumber]
while([SELECT NumberTaken FROM tblNumbers Where NumberTaken = ctr] != null)
ctr++;
return ctr;
The problem with that code is that is seems really inefficient if there are thousands of numbers in the table. Also, I can write it in c# code or in a stored procedure whichever is more efficient.
Thanks for the help

Fine, if this question isn't going to be closed, I may as well Copy and paste my answer from the other one:
I called my table Blank, and used the following:
declare #StartOffset int = 2
; With Missing as (
select #StartOffset as N where not exists(select * from Blank where ID = #StartOffset)
), Sequence as (
select #StartOffset as N from Blank where ID = #StartOffset
union all
select b.ID from Blank b inner join Sequence s on b.ID = s.N + 1
)
select COALESCE((select N from Missing),(select MAX(N)+1 from Sequence))
You basically have two cases - either your starting value is missing (so the Missing CTE will contain one row), or it's present, so you count forwards using a recursive CTE (Sequence), and take the max from that and add 1
Tables:
create table Blank (
ID int not null,
Name varchar(20) not null
)
insert into Blank(ID,Name)
select 2 ,'Fred' union all
select 3 ,'Fred' union all
select 4 ,'Fred' union all
select 6 ,'Fred' union all
select 7 ,'Fred' union all
select 8 ,'Fred' union all
select 11 ,'Fred'
go

I would create a temp table containing all numbers from StartingNumber to EndNumber and LEFT JOIN to it to receive the list of rows not contained in the temp table.

If NumberTaken is indexed you could do it with a join on the same table:
select T.NumberTaken -1 as MISSING_NUMBER
from myTable T
left outer join myTable T1
on T.NumberTaken= T1.NumberTaken+1
where T1.NumberTaken is null and t.NumberTaken >= STARTING_NUMBER
order by T.NumberTaken
EDIT
Edited to get 1 too

1> select 1+ID as ID from #b as b
where not exists (select 1 from #b where ID = 1+b.ID)
2> go
ID
-----------
5
9
12
Take max(1+ID) and/or add your starting value to the where clause, depending on what you actually want.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas