Possible Oracle-Bug with SubQueries and Group Functions - sql

Can anybody explain why the following query returns two rows and not just one?
SELECT *
FROM (SELECT 'ASDF' c1, MAX (SUM (1)) c2
FROM DUAL
GROUP BY dummy
UNION
SELECT 'JKLÖ' c1, 1 c2
FROM DUAL)
WHERE c1 != 'ASDF';
--another Version with the same wrong result:
SELECT *
FROM (SELECT 1 c1, MAX (SUM (1)) c2
FROM DUAL
GROUP BY dummy
UNION all
SELECT 2 c1, 1 c2
FROM DUAL)
WHERE c1 != 1;
Is it correct that Oracle delivers two rows? In my opinion, the row with c1 = 'ASDF' should not be in the result.
I have tested it on the following versions, always with the same result:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

No, this is not a bug. Aggregate functions are the reason you see this unexpected result. Here is how it works: the SUM() function, like the MAX() function, returns NULL (producing one row) if no rows are returned by the query. When your query is executed, the optimizer applies a predicate-pushing transformation and your original query becomes (I will not post the entire trace, only the transformed query):
SELECT "from$_subquery$_001"."C1" "C1",
"from$_subquery$_001"."C2" "C2"
FROM (
(SELECT 'ASDF' "C1",MAX(SUM(1)) "C2"
FROM "SYS"."DUAL" "DUAL"
WHERE 'ASDF'<>'ASDF' -- [1] predicate pushed into the view
GROUP BY "DUAL"."DUMMY" )
UNION
(SELECT 'JKLÖ' "C1",
1 "C2"
FROM "SYS"."DUAL" "DUAL"
WHERE 'JKLÖ'<>'ASDF')) "from$_subquery$_001"
[1] Because of predicate pushing, your first sub-query returns no rows, and when an aggregate function (except COUNT and a few others) such as MAX or SUM, or even both as in this case, is applied to an empty result set, NULL is returned in a single row. That row, plus the row returned by the second sub-query, produces the two-row result set you are looking at.
Here is a simple demonstration:
create table empty_table (c1 varchar2(1));
select 'aa' literal, nvl(max(c1), 'NULL') as res
from empty_table
LITERAL RES
------- ----
aa NULL
1 row selected.
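The nested aggregate is the other half of the picture: a plain GROUP BY over an empty row source produces no groups and therefore no rows, but wrapping the grouped aggregate in an outer aggregate collapses the result to a single row even when there are zero groups. A minimal sketch (the WHERE clause simulates the pushed predicate):
-- no rows: GROUP BY over an empty row source produces no groups
select sum(1) c2
from dual
where 1 = 0
group by dummy;
no rows selected.
-- one row: the outer MAX aggregates over zero groups and returns NULL
select max(sum(1)) c2
from dual
where 1 = 0
group by dummy;
C2
------
(null)
1 row selected.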

It definitely looks like a bug.
I don't really know how to read explain plans, but here it is. It seems to me the predicate has been pushed to only one of the UNION members and it has been transformed into "NULL IS NOT NULL" which is totally weird.
Note that the strings could be changed to 'a' and 'b' (so we don't use special characters), UNION and UNION ALL produce the same bug, and the bug seems to be triggered by the MAX(SUM(1)) in the first branch; simply replacing that with NULL or anything else that's "simple", or even with SUM(1) (without the MAX) causes the query to work correctly.
ADDED: Strangely, if I change MAX(SUM(1)) to either MAX(1) or SUM(1), or if I simply change it to the literal number 1, the query works correctly - but the Explain Plan still shows the same weird predicate, "NULL IS NOT NULL". So, it seems the problem is that the predicate is not pushed to both branches of the union, not the predicate transformation. (And even that doesn't explain why c2 appears as NULL in the extra row in the result set.) MORE ADDED (see Comments below) - as it turns out, the predicate IS pushed to both branches of the UNION, and this is exactly what causes the problem (as Nicholas explains in his answer).
Plan hash value: 1682090214
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 32 | 2 (0)| 00:00:01 |
| 1 | VIEW | | 2 | 32 | 2 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | SORT AGGREGATE | | 1 | 2 | | |
| 4 | HASH GROUP BY | | 1 | 2 | | |
|* 5 | FILTER | | | | | |
| 6 | TABLE ACCESS FULL| DUAL | 1 | 2 | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(NULL IS NOT NULL)

A much simpler example produces the same unexpected result:
SELECT 'ASDF' c1, MAX (SUM (1)) c2
FROM DUAL where 'ASDF' <> 'ASDF'
GROUP BY dummy
I must confess that I am totally confused. Why doesn't the filter eliminate the record, thereby eliminating any result set?

Related

listagg produces ORA-01489 if used as window function in conditional expression

My query returns many (thousands of) rows.
Column l has a certain value for a very small number of rows (up to 10).
For each such row I want to output the aggregated comma-separated values of a very short (up to 5 chars) varchar column v over all of these rows.
For rows not having the special value of l I want to simply output the v value for that row.
Synthesized example of the same problem: from the first 10000 integers, I want to output 1,2,3,4,5,6,7,8,9 for each single-digit number, and the number itself for each multiple-digit number. (Yes, a silly example, but the real case makes sense.)
with x (v,l) as (
select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select case l
when 1 then listagg(v,',') within group (order by v) over (partition by l)
else v
end
from x
order by 1;
The problem is, the listagg function fails with the error ORA-01489: result of string concatenation is too long.
I am aware of the 4000-char limit of the listagg function as well as the xmlagg-based workaround. What I don't get is that the limit is enough for the data I actually want to concatenate, even though it is not enough for all the data. In the example above, the partition of 9 single-digit numbers fits into 4000 chars; the partition of 9000 four-digit numbers does not. I expected the case expression would prevent the window from being evaluated for unrelated rows, but for some reason the db engine seems to evaluate the window for all rows. (Also note that the order by clause makes the query fail fast - without it, some rows are returned before the failure.)
Can you please explain the reasoning behind this behaviour? I suspect the window computation logically happens before the select clause, but I have no evidence. Reproduced on Oracle 11g, 18c and 19c (livesql).
Well, you are using SQL, which is not procedural, so you can't expect that some parts of the code path will not be executed just because their results are not used. (So filing a bug, as others suggested, will have no success.)
Anyway, you can use the often-used trick based on the fact that listagg ignores null values.
So this formulation works fine:
with x (v,l) as (
select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select nvl(listagg(case when l = 1 then v end,',') within group (order by v) over (partition by l),v) lst
from x
order by 1;
giving
LST
------------------
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
..
10
100
1000
10000
The explanation of the problem can be found in the execution plan (showing only the relevant part)
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 4 (50)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 35 | 4 (50)| 00:00:01 |
| 2 | WINDOW SORT | | 1 | 35 | 4 (50)| 00:00:01 |
| 3 | VIEW | | 1 | 35 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) CASE "L" WHEN 1 THEN LISTAGG("V",',') WITHIN GROUP ( ORDER BY
"V") OVER ( PARTITION BY "L") ELSE "V" END [4000]
2 - (#keys=2) "L"[NUMBER,22], "V"[VARCHAR2,40], LISTAGG("V",',') WITHIN
GROUP ( ORDER BY "V") OVER ( PARTITION BY "L")[4000]
3 - "V"[VARCHAR2,40], "L"[NUMBER,22]
4 - LEVEL[4]
So in line 2 of the plan the listagg is calculated (for all rows), only to be filtered in line 1.
It is odd that you do get an error about the 4000 character limit even though no result is longer than 4000 characters. Maybe you could file this as a bug to Oracle Support.
Another workaround is to make use of the ON OVERFLOW logic of the LISTAGG function if you are on Oracle 12.2 or higher. Using LISTAGG (v, ',' ON OVERFLOW TRUNCATE) in the query allows the query to be run without error and does not truncate any values (at least in the example).
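As a sketch of that workaround applied to the query from the question (assuming Oracle 12.2 or later, as noted above):
with x (v,l) as (
select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select case l
when 1 then listagg(v,',' on overflow truncate) within group (order by v) over (partition by l)
else v
end
from x
order by 1;
The partitions that would overflow are still evaluated, but the overflow clause turns the error into silent truncation, and those truncated results are discarded by the CASE anyway.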

Clarifications about some SQL Injection commands

I'm struggling with a CTF (Capture The Flag) web challenge on hackthebox. Not being an expert in penetration testing, I'm asking for your help to explain to me (with some comments) some commands used to reach the solution, especially the syntax and logic of the commands themselves. (A reference to the commands can be found here (click me), so you have the whole situation very clear.)
I ask you to be very detailed, even on things that may seem trivial.
Leaving aside the base64 encoding (that I understand) I need to understand these commands and their related parameters (syntax and logic of the commands):
1st: {"ID":"1"}
2nd: {"ID": "1' or 1-- -"}
3rd: {"ID": "-1' union select * from (select 1)table1 JOIN (SELECT 2)table2 on 1=1-- -"}
About the 3rd command, I saw the same command but with an alteration of the table names, like this:
{"ID": "-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1-- -"}
What is the difference? Is the name given to the tables in the query irrelevant?
If you need further clarification or I haven't made myself clear, just tell it and I'll try to help you. Thank you in advance.
The stages of hacking are: recon, scanning, gaining access, maintaining access, and clearing tracks. Basically it's just obtaining information, then doing something with that information. It seems that this SQL injection learning module is used to teach how to obtain information about the current system.
The basis of SQL injection is inserting SQL code/commands/syntax. It's usually done in the WHERE clause (because webapps often have a search feature, which basically retrieves user input and inserts it into the WHERE clause).
For example, the simplest vulnerability would be like this (assuming MySQL and PHP):
SELECT * FROM mytable WHERE mycolumn='$_GET[myparam]'
The payload is what you put inside the parameter (ex: myparam) to do SQL injection.
With such a query, you can inject the payload 1' OR 1=1 to test for a SQL injection vulnerability.
1st payload
The 1st payload is used to check whether there is an injection point (a parameter that can be injected) or not.
If you change the parameter and there is a change in the output, then it means there is an injection point.
Otherwise there is no injection point.
2nd payload
The 2nd payload is used to check whether the target app has a SQL injection vulnerability or not (does the app sanitize the user's input or not).
If the app shows all output, then it means the app has a SQL injection vulnerability. Explanation: the query sent to the RDBMS would become something like this:
Before injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='myparam'
After injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1-- -'
Please note that in MySQL, -- (minus-minus-space) is used to mark an inline comment. So the actual query would be: SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1
3rd payload
The 3rd payload is used to check how many columns the query SELECTs. To understand this you have to understand subqueries, joins, and unions (do a quick search, they are very basic concepts). The name of the table alias is not important (UT1 or UT2); it's just an identifier, chosen so that it is not identical to an existing table alias.
If the query succeeds (no error, the app displays output), then it means the app's query SELECTs 2 columns.
If the query fails, then it means it's not 2 columns; you can change the payload to check for 3 columns, 4 columns, etc.
Example for checking if SELECT statement have 3 columns:
-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1 JOIN (SELECT 3)UT3 on 1=1 -- -
Tip: when learning about SQL injection, it's far easier to just type (or copy-paste) the payload into your SQL console (use a virtual machine or sandbox if the query is considered dangerous).
Edit 1:
basic explanation of subquery and union
Subquery: It's basically putting a query inside another query. Subqueries may be inserted in the SELECT clause, the FROM clause, and the WHERE clause.
Example of subquery in FROM clause:
select * from (select 'hello','world','foo','bar')x;
Example of subquery in WHERE clause:
select * from tblsample t1 where t1.price>(select avg(t2.price) from tblsample t2);
Union: concatenating select output, example:
tbl1
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
+----+--------+-----------+------+
tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
select * from tbl1 union select * from tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
Edit 2:
further explanation on 3rd payload
In MySQL, you can make a 'literal table' by selecting a value. Here is an example:
MariaDB [(none)]> SELECT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1,2;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
+----------+-----------+
1 row in set (0.00 sec)
The purpose of making this 'literal table' is to check how many columns the SELECT statement that we inject into has. For example:
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
| 3 | 4 |
+----------+-----------+
2 rows in set (0.07 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol, 5 fifthcol;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
As shown above, when UNION is used on two SELECT statements with different numbers of columns, it throws an error. Therefore, you can determine how many columns a SELECT statement has by finding the variant that DOESN'T throw an error.
So, why don't we just use SELECT 1, 2 to generate a 'literal table' with 2 columns? That's because the application's firewall blocks the use of commas. Therefore we must go the roundabout way and make a 2-column 'literal table' with the JOIN query SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1
MariaDB [(none)]> SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.01 sec)
Additional note: MariaDB is a community-developed fork of MySQL (created after MySQL was acquired by Oracle). MariaDB maintains more or less the same syntax and commands as MySQL.

Best Way to Join One Column on Columns From Two Other Tables

I have a schema like the following in Oracle
Section:
+--------+----------+
| sec_ID | group_ID |
+--------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+----------+
Section_to_Item:
+--------+---------+
| sec_ID | item_ID |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+--------+---------+
Item:
+---------+------+
| item_ID | data |
+---------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+---------+------+
Item_Version:
+---------+----------+--------+
| item_ID | start_ID | end_ID |
+---------+----------+--------+
| 1 | 1 | |
| 2 | 1 | 3 |
| 3 | 2 | |
| 4 | 1 | 2 |
+---------+----------+--------+
Section_to_Item has FK into Section and Item on the *_ID columns.
Item_version is indexed on item_ID but has no FK to Item.item_ID (ran out of space in the snapshot group).
I have code that receives a list of version IDs and I want to get all items in sections in a given group that are valid for at least one of the versions passed in. If an item has no end_ID, it's valid for anything starting with start_ID. If it has an end_id, it's valid for anything up until (not including) end_ID.
What I currently have is:
SELECT Item.data
FROM Section, Section_to_Item, Item, Item_Version
WHERE Section.group_ID = 1
AND Section_to_Item.sec_ID = Section.sec_ID
AND Item.item_ID = Section_to_Item.item_ID
AND Item.item_ID = Item_Version.item_ID
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_versions.version)
)
Note that the UNION ALL statement is dynamically generated from the list of passed in versions.
This query currently does a cartesian join and is very slow.
For some reason, if I change the query to join
AND Item_Version.item_ID = Section_to_Item.item_ID
which is not a FK, the query does not do the cartesian join and is much faster.
A) Can anyone explain why this is?
B) Is this the right way to be joining this sequence of tables (I feel weird about joining Item.item_ID to two different tables)
C) Is this the right way to get versions between start_ID and end_ID?
Edit
Same query with inner join syntax:
SELECT Item.data
FROM Item
INNER JOIN Section_to_Item ON Section_to_Item.item_ID = Item.item_ID
INNER JOIN Section ON Section.sec_ID = Section_to_Item.sec_ID
INNER JOIN Item_Version ON Item_Version.item_ID = Item.item_ID
WHERE Section.group_ID = 1
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_versions.version)
)
Note that in this case the performance difference comes from joining on Item_Version first and then joining Section_to_Item on Item_Version.item_ID.
In terms of table size, Section_to_Item, Item, and Item_Version should be similar (1000s) while Section should be small.
Edit
I just found out that, apparently, the schema has no FKs. The FKs specified in the schema configuration files are ignored; they're just there for documentation. So there's no difference between joining on a FK column or not. That being said, by changing the joins into a cascade of SELECT INs, I'm able to avoid joining the entire Item table twice. I don't love the resulting query, and I don't really understand the difference, but the stats indicate it's much less work (the A-Rows returned from the innermost scan on Section drop from 656,000 to 488 - it used to be 656k starts each returning 1 row, now it's 488 starts each returning 1 row).
Edit
It turned out to be stale statistics - the two queries were equivalent the whole time but with the incomplete statistics, the DB happened to notice the correct plan only in the second instance. After updating statistics, both queries generated the same plan.
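For reference, a minimal sketch of refreshing the statistics on the tables from the question (assuming they live in the current schema):
BEGIN
  -- gather fresh optimizer statistics so the planner can cost the joins correctly
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SECTION');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SECTION_TO_ITEM');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ITEM');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ITEM_VERSION');
END;
/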
I'm not sure if this is the best idea but this seems to avoid the cartesian join:
select data
from Item
where item_ID in (
select item_ID
from Item_Version
where item_ID in (
select item_ID
from Section_to_Item
where sec_ID in (
select sec_ID
from Section
where group_ID = 1
)
)
and exists (
select 1
from (
select 2 as version
from dual
union all
select 3 as version
from dual
) versions
where versions.version >= start_ID
and (end_ID is null or versions.version < end_ID)
)
)

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script, and it runs quickly if I do NOT include "tt2.bulk_cnt" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------
A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
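A sketch of such an index (the index name is just illustrative):
CREATE INDEX mina_raw2_transtype_ix ON mina_raw2 (transactiontype);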
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => 'YOUR-SCHEMA-NAME',
tabname => 'MINA_RAW2');
END;
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary as the initial GROUP BY doesn't do any grouping - instead, it appears this is being done to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of the members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE)
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE
I managed to remove an unnecessary subquery. This syntax with DISTINCT inside COUNT works in PostgreSQL, where I've certainly used it, and Oracle supports it as well, though it may not give exactly the desired result.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use those WITH queries (common table expressions) instead of subqueries when you don't need to, you're kneecapping the query optimizer.

Finding the difference between two sets of data from the same table

My data looks like:
run | line | checksum | group
-----------------------------
1 | 3 | 123 | 1
1 | 7 | 123 | 1
1 | 4 | 123 | 2
1 | 5 | 124 | 2
2 | 3 | 123 | 1
2 | 7 | 123 | 1
2 | 4 | 124 | 2
2 | 4 | 124 | 2
and I need a query that returns me the new entries in run 2
run | line | checksum | group
-----------------------------
2 | 4 | 124 | 2
2 | 4 | 124 | 2
I tried several things, but I never got to a satisfying answer.
In this case I'm using H2, but of course I'm interested in a general explanation that would help me to wrap my head around the concept.
EDIT:
OK, it's my first post here so please forgive if I didn't state the question precisely enough.
Basically, given two run values (r1, r2, with r2 > r1), I want to determine which rows having run = r2 have a line, checksum or group combination that does not appear in any row where run = r1.
select * from yourtable
where run = 2 and checksum = (select max(checksum)
from yourtable)
Assuming your last run will have a higher run value than the others, the SQL below will help:
select * from table1 t1
where t1.run in
(select max(t2.run) from table1 t2)
Update:
Above SQL may not give you the right rows because your requirement is not so clear. But the overall idea is to fetch the rows based on the latest run parameters.
SELECT line, checksum, group
FROM TableX
WHERE run = 2
EXCEPT
SELECT line, checksum, group
FROM TableX
WHERE run = 1
or (with slightly different result):
SELECT *
FROM TableX x
WHERE run = 2
AND NOT EXISTS
( SELECT *
FROM TableX x2
WHERE run = 1
AND x2.line = x.line
AND x2.checksum = x.checksum
AND x2.group = x.group
)
A slightly different approach:
select min(run) run, line, checksum, group
from mytable
where run in (1,2)
group by line, checksum, group
having count(*)=1 and min(run)=2
Incidentally, I assume that the "group" column in your table isn't actually called group - this is a reserved word in SQL and would need to be enclosed in double quotes (or backticks or square brackets, depending on which RDBMS you are using).
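For illustration, if the column really were named group, the EXCEPT query above would need the identifier quoted (double quotes in H2 and Oracle; note that quoted identifiers are case-sensitive):
SELECT line, checksum, "group"
FROM TableX
WHERE run = 2
EXCEPT
SELECT line, checksum, "group"
FROM TableX
WHERE run = 1;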