Query minus problems? - sql

My query is not returning the expected number of records.
The first part returns 50.000 records.
The second part (below the minus) returns 30.000.
So I conclude that my minus should return 20.000 records.
That is not what happens: the minus returns 21.000 records.
-- edit --
The count returns more rows than expected; the returned records will be removed in a later step.
Any suggestions?
-- select count(*) from (
SELECT
loc.ITEM,
loc.loc
FROM ITEM_LOC loc
WHERE LOC NOT IN(101,104,107,115,116,117)
and loc.status = 'A'
and primary_supp in (select supplier from item_supplier where supp_discontinue_date >= sysdate)
-- );
minus
--; select count(*) from (
select distinct item, store from (
SELECT
siv.ITEM,
sto.store
FROM DC_CCN190_SID_VTB siv
JOIN DC_STORE_RANGING str ON siv.dpac = str.dpac
join store sto on sto.store_name_secondary = cast(str.loc as varchar2(150 byte))
where sto.store_close_date >= sysdate
union
SELECT
pim.ITEM,
sto.store
FROM dc_pim_export_vert PIM
JOIN DC_STORE_RANGING str ON PIM.dpac = str.dpac
join store sto on sto.store_name_secondary = cast(str.loc as varchar2(150 byte))
where PIM.artikel_type_lms = 'D1'
and sto.store_close_date >= sysdate
)
-------------------------------------------------------------------------------------
count returns 50.000
select count(*) from (
SELECT
loc.ITEM,
loc.loc
FROM ITEM_LOC loc
WHERE LOC NOT IN(101,104,107,115,116,117)
and loc.status = 'A'
and primary_supp in (select supplier from item_supplier where supp_discontinue_date >= sysdate));
count returns 30.000
select count(*) from (
select distinct item, store from (
SELECT
siv.ITEM,
sto.store
FROM DC_CCN190_SID_VTB siv
JOIN DC_STORE_RANGING str ON siv.dpac = str.dpac
join store sto on sto.store_name_secondary = cast(str.loc as varchar2(150 byte))
where sto.store_close_date >= sysdate
union
SELECT
pim.ITEM,
sto.store
FROM dc_pim_export_vert PIM
JOIN DC_STORE_RANGING str ON PIM.dpac = str.dpac
join store sto on sto.store_name_secondary = cast(str.loc as varchar2(150 byte))
where PIM.artikel_type_lms = 'D1'
and sto.store_close_date >= sysdate
);
So the minus should return 20.000 right?

A = 50000. B = 30000. A - B = 21000. What is B - A? I'm expecting 1000. That is, there are 1000 records in B that are not included in A, so only 29000 records of B are actually removed from A. In a simple case,
A returns 4 records
Jim
Bob
Mary
Samantha
B returns 3 records
Bob
Mary
Josephine
A - B returns 2 records:
Jim
Samantha
B - A returns 1 record:
Josephine
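The name lists above can be checked mechanically. The following is a minimal sketch in Python using the standard sqlite3 module; note that SQLite spells Oracle's MINUS as EXCEPT, and the table names a and b are made up for this illustration.

```python
import sqlite3

# In-memory database holding the two name lists from the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('Jim'), ('Bob'), ('Mary'), ('Samantha');
    INSERT INTO b VALUES ('Bob'), ('Mary'), ('Josephine');
""")


def names(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


# EXCEPT is SQLite's equivalent of Oracle's MINUS.
a_minus_b = names("SELECT name FROM a EXCEPT SELECT name FROM b ORDER BY name")
b_minus_a = names("SELECT name FROM b EXCEPT SELECT name FROM a ORDER BY name")
overlap = names("SELECT name FROM a INTERSECT SELECT name FROM b ORDER BY name")

print("A - B:", a_minus_b)
print("B - A:", b_minus_a)
print("in both:", overlap)
```

The size of A - B is the size of A minus the size of the overlap, not the size of A minus the size of B; the two only agree when every row of B also occurs in A.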

Run
select count(*) from ( ... )
on both queries and compare the result with
select count(*) from (select distinct ... )
If the two counts differ, the query returns duplicate rows, and MINUS treats duplicates specially. To quote one description: "Additionally, if there are two identical rows in table_A, and that same row exists in table_B, BOTH rows from table_A will be removed from the result set."
If there is no difference, only one explanation remains:
MINUS is a SQL set operation that selects elements from the first
table and then removes rows that are also returned by the second
SELECT statement.
Not all rows returned by the second query are present in the first one.
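The effect of duplicates on the counts can be seen in a small experiment. This is only a sketch in Python with the standard sqlite3 module, using SQLite's EXCEPT in place of MINUS; table_a and table_b and their contents are invented sample data, not the tables from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (item TEXT, loc INTEGER);
    CREATE TABLE table_b (item TEXT, loc INTEGER);
    -- ('X', 1) and ('Z', 3) each occur twice in table_a.
    INSERT INTO table_a VALUES ('X', 1), ('X', 1), ('Y', 2), ('Z', 3), ('Z', 3);
    INSERT INTO table_b VALUES ('X', 1);
""")

# Plain row count versus count of distinct rows.
total_rows = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT item, loc FROM table_a)"
).fetchone()[0]

# The set difference returns each surviving row once.
remaining = conn.execute(
    "SELECT item, loc FROM table_a EXCEPT SELECT item, loc FROM table_b ORDER BY item"
).fetchall()

print(total_rows, distinct_rows, remaining)
```

Here table_a holds 5 rows and table_b holds 1, yet the difference has 2 rows rather than 4, because the set operation works on distinct rows.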


Select random sample of N rows from Oracle SQL query result

I want to reduce the number of rows exported from a query result. I have had no luck adapting the accepted solution posted on this thread.
My query looks as follows:
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select b.person_id from credit_info b where b.credit_type = 'C' and a.person_id = b.person_id
)
;
This query returns way more rows than I need, so I was wondering if there's a way to use sample() to select a fixed number of rows (not a percentage) from however many rows result from this query.
You can sample your data by ordering randomly with DBMS_RANDOM.RANDOM and then fetching the first N rows:
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
From personal_info a
where exists ( select b.person_id from credit_info b where b.credit_type = 'C' and a.person_id = b.person_id )
Order by DBMS_RANDOM.RANDOM
Fetch first 250 Rows only
Edit: for Oracle 11g and earlier:
Select * from (
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
From personal_info a
where exists ( select b.person_id from credit_info b where b.credit_type = 'C' and a.person_id = b.person_id )
Order by DBMS_RANDOM.RANDOM
)
Where rownum <= 250
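The "order randomly, then keep the first N rows" idea is not specific to Oracle. As a rough illustration only, here is the same technique in Python with the standard sqlite3 module, where ORDER BY RANDOM() with LIMIT stands in for ORDER BY DBMS_RANDOM.RANDOM with a row limit; the table ages and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ages (age INTEGER)")
# Hypothetical data: 1000 rows with the values 0..999.
conn.executemany("INSERT INTO ages VALUES (?)", [(i,) for i in range(1000)])


def sample_rows(connection, n):
    """Return n rows chosen at random by shuffling the result and cutting it off."""
    cursor = connection.execute(
        "SELECT age FROM ages ORDER BY RANDOM() LIMIT ?", (n,)
    )
    return [row[0] for row in cursor]


sample = sample_rows(conn, 250)
print(len(sample))
```

The whole result set is sorted before the cut-off is applied, so this approach is simple but can be expensive on very large results.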
You can use fetch first to return a fixed number of rows. Just add:
fetch first 100 rows only
to the end of your query.
If you want these sampled in some fashion, you need to explain what type of sampling you want.
If you are using Oracle 12c, you can use the row limiting clause below:
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select b.person_id from credit_info b where b.credit_type = 'C' and a.person_id = b.person_id
)
FETCH NEXT 5 ROWS ONLY;
Instead of 5, you can use any number you want.

What should I do to print out the desired sql join result value?( SQL Join first match only)

I have two tables, and I want to join them on the chart_num value.
Table hospital_payment_data
id  chart_num  treatment_fees_difference  treatment_fees_check_division
1   9          200000                     test
2   9          -100000                    test
3   10         200000                     test
4   10         -100000                    test
Table advenced_payment
id  chart_num  advenced_amount
1   9          100000
2   10         100000
This is the result I want:
if_treatment_fees_check_division  sum_init_amount  test    COUNT
200000                            200000           400000  4
However, when I run the following query, I get the wrong result.
SELECT
SUM(t_join.treatment_fees_difference) if_treatment_fees_check_division,
SUM(t_join.advenced_amount) sum_init_amount,
SUM(t_join.treatment_fees_difference) + SUM(t_join.advenced_amount) test,
COUNT(*) "count"
FROM
(
SELECT t_a.treatment_fees_difference , IFNULL(t_b.advenced_amount,0 ) AS advenced_amount
FROM hospital_payment_data t_a
LEFT OUTER JOIN advenced_payment t_b on t_a.chart_num = t_b.chart_num
WHERE t_a.treatment_fees_check_division = 'test'
) t_join
bad result
How do I fix my query to get the results I want?
The problem appears to be that you are joining the tables, which gives you two rows in the results for each chart_num, and then summing the values with sum(t_join.advenced_amount) sum_init_amount, which gives double the value you want.
This is a quick and nasty fix:
SELECT
sum(t_join.treatment_fees_difference) if_treatment_fees_check_division,
MIN(t_join.advenced_amount) sum_init_amount ,
sum(t_join.treatment_fees_difference) + sum(t_join.advenced_amount) test,
COUNT(*) "count"
FROM
(
SELECT t_a.treatment_fees_difference , IFNULL(t_b.advenced_amount,0 ) AS advenced_amount
FROM hospital_payment_data t_a LEFT OUTER JOIN advenced_payment t_b on t_a.chart_num = t_b.chart_num
WHERE t_a.treatment_fees_check_division = 'test'
) t_join
But that will probably fail in a real world application.
Try it like this (untested):
SELECT
if_treatment_fees_check_division,
advenced_amount AS sum_init_amount ,
if_treatment_fees_check_division + advenced_amount AS test,
row_count AS "count"
FROM
(
SELECT
SUM(treatment_fees_difference) as if_treatment_fees_check_division,
count(*) as row_count,
chart_num
FROM hospital_payment_data
WHERE treatment_fees_check_division = 'test'
GROUP BY chart_num
) as TABLE1
LEFT OUTER JOIN advenced_payment AS TABLE2
ON TABLE1.chart_num = TABLE2.chart_num
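The grouped query above returns one row per chart_num. If a single total row is wanted, one possible approach is to aggregate each table on its own before combining them, so that no advance amount is counted twice. The sketch below is an assumption about what the asker needs, written in Python with the standard sqlite3 module (SQLite instead of MySQL, COALESCE instead of IFNULL); the aliases fees and adv are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hospital_payment_data (
        id INTEGER PRIMARY KEY,
        chart_num INTEGER,
        treatment_fees_difference INTEGER,
        treatment_fees_check_division TEXT
    );
    CREATE TABLE advenced_payment (
        id INTEGER PRIMARY KEY,
        chart_num INTEGER,
        advenced_amount INTEGER
    );
    INSERT INTO hospital_payment_data VALUES
        (1, 9, 200000, 'test'), (2, 9, -100000, 'test'),
        (3, 10, 200000, 'test'), (4, 10, -100000, 'test');
    INSERT INTO advenced_payment VALUES (1, 9, 100000), (2, 10, 100000);
""")

# Each side is summed separately, then the two one-row results are combined.
result = conn.execute("""
    SELECT fees.total_diff,
           COALESCE(adv.total_adv, 0),
           fees.total_diff + COALESCE(adv.total_adv, 0),
           fees.row_count
    FROM (
        SELECT SUM(treatment_fees_difference) AS total_diff,
               COUNT(*) AS row_count
        FROM hospital_payment_data
        WHERE treatment_fees_check_division = 'test'
    ) AS fees
    CROSS JOIN (
        SELECT SUM(advenced_amount) AS total_adv
        FROM advenced_payment
        WHERE chart_num IN (
            SELECT chart_num
            FROM hospital_payment_data
            WHERE treatment_fees_check_division = 'test'
        )
    ) AS adv
""").fetchone()

print(result)
```

With the sample data from the question this produces the wanted row of 200000, 200000, 400000 and 4.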

#1222 - The used SELECT statements have a different number of columns

Why am I getting "#1222 - The used SELECT statements have a different number of columns"?
I am trying to load wall posts from this user's friends and from the user himself.
SELECT u.id AS pid, b2.id AS id, b2.message AS message, b2.date AS date FROM
(
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT * FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
ORDER BY date DESC
LIMIT 0, 10
) AS b2
JOIN Users AS u
ON b2.pid = u.id
WHERE u.banned='0' AND u.email_activated='1'
ORDER BY date DESC
LIMIT 0, 10
The wall_posts table has the columns id, date, privacy, pid, uid, message.
The Friends table has the columns Fid, id, buddy_id, invite_up_date, status.
pid stands for profile id. I am not really sure what's going on.
The first statement in the UNION returns four columns:
SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
The second one returns six, because the * expands to include all the columns from WALL_POSTS:
SELECT b.id,
b.date,
b.privacy,
b.pid,
b.uid,
b.message
FROM wall_posts AS b
The UNION and UNION ALL operators require that:
The same number of columns exist in all the statements that make up the UNION'd query
The data types have to match at each position/column
Use:
FROM ((SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10)
UNION
(SELECT id,
pid,
message,
date
FROM wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10))
You're taking the UNION of a 4-column relation (id, pid, message, and date) with a 6-column relation (* = the 6 columns of wall_posts). SQL doesn't let you do that.
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT id, pid , message , date
FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
You were selecting 4 in the first query and 6 in the second, so match them up.
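The column-count rule can be reproduced outside MySQL as well. The following sketch uses Python's standard sqlite3 module; SQLite reports the same kind of mistake with its own error message rather than MySQL's #1222, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wall_posts "
    "(id INTEGER, date TEXT, privacy INTEGER, pid INTEGER, uid INTEGER, message TEXT)"
)
# Hypothetical sample row.
conn.execute("INSERT INTO wall_posts VALUES (1, '2010-01-01', 0, 1, 1, 'hello')")


def run(sql):
    """Return the rows of a query, or the error text if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Four columns on the left, six (via *) on the right: rejected.
mismatched = run(
    "SELECT id, pid, message, date FROM wall_posts "
    "UNION SELECT * FROM wall_posts"
)

# Four columns on both sides, in the same order: accepted.
matched = run(
    "SELECT id, pid, message, date FROM wall_posts "
    "UNION SELECT id, pid, message, date FROM wall_posts"
)

print(mismatched)
print(matched)
```

Listing the columns explicitly on both sides also protects the query against later changes to the table, which SELECT * does not.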
Besides the answer given by @omg-ponies, I just want to add that this error can also occur in variable assignment. In my case I used an INSERT, and a trigger was associated with that INSERT. I mistakenly assigned a number of fields to a different number of variables. The details of my case are below.
INSERT INTO tab1 (event, eventTypeID, fromDate, toDate, remarks)
-> SELECT event, eventTypeID,
-> fromDate, toDate, remarks FROM rrp group by trainingCode;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
So you see, I got this error by issuing an INSERT statement instead of a UNION statement. The differences in my case were:
I issued a bulk insert SQL statement
i.e. insert into tab1 (field, ...) as select field, ... from tab2
tab2 had an on insert trigger; this trigger basically declines duplicates
It turns out that I had an error in the trigger. I fetched records based on the new input data and assigned them to an incorrect number of variables.
DELIMITER ##
DROP TRIGGER trgInsertTrigger ##
CREATE TRIGGER trgInsertTrigger
BEFORE INSERT ON training
FOR EACH ROW
BEGIN
SET @recs = 0;
SET @trgID = 0;
SET @trgDescID = 0;
SET @trgDesc = '';
SET @district = '';
SET @msg = '';
-- 5 columns are selected but 6 variables are listed: this mismatch raises error 1222
SELECT COUNT(*), t.trainingID, td.trgDescID, td.trgDescName, t.trgDistrictID
INTO @recs, @trgID, @trgDescID, @proj, @trgDesc, @district
from training as t
left join trainingDistrict as tdist on t.trainingID = tdist.trainingID
left join trgDesc as td on t.trgDescID = td.trgDescID
WHERE
t.trgDescID = NEW.trgDescID
AND t.venue = NEW.venue
AND t.fromDate = NEW.fromDate
AND t.toDate = NEW.toDate
AND t.gender = NEW.gender
AND t.totalParticipants = NEW.totalParticipants
AND t.districtIDs = NEW.districtIDs;
IF @recs > 0 THEN
SET @msg = CONCAT('Error: Duplicate Training: previous ID ', CAST(@trgID AS CHAR CHARACTER SET utf8) COLLATE utf8_bin);
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = @msg;
END IF;
END ##
DELIMITER ;
As you can see, I am fetching 5 fields but assigning them to 6 variables. (Entirely my fault: I forgot to delete the variable after editing.)
You are using MySQL Union.
UNION is used to combine the result from multiple SELECT statements into a single result set.
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Reference: MySQL Union
Your first SELECT statement has 4 columns and the second has 6, since, as you said, wall_posts has 6 columns.
Both statements should have the same number of columns, in the same order.
Otherwise you get an error or wrong data.

How do you find a missing number in a table field starting from a parameter and incrementing sequentially?

Let's say I have a SQL Server table:
NumberTaken  CompanyName
2            Fred
3            Fred
4            Fred
6            Fred
7            Fred
8            Fred
11           Fred
I need an efficient way to pass in a parameter [StartingNumber] and to count from [StartingNumber] sequentially until I find a number that is missing.
For example, notice that 1, 5, 9 and 10 are missing from the table.
If I supplied the parameter [StartingNumber] = 1, it would check whether 1 exists; if it does, it would check whether 2 exists, and so on. Here 1 would be returned.
If [StartingNumber] = 6 the function would return 9.
In C# pseudocode it would basically be:
int ctr = [StartingNumber]
while([SELECT NumberTaken FROM tblNumbers Where NumberTaken = ctr] != null)
ctr++;
return ctr;
The problem with that code is that it seems really inefficient if there are thousands of numbers in the table. Also, I can write it in C# code or in a stored procedure, whichever is more efficient.
Thanks for the help
A solution using JOIN:
select min(r1.NumberTaken) + 1
from MyTable r1
left outer join MyTable r2 on r2.NumberTaken = r1.NumberTaken + 1
where r1.NumberTaken >= 1 --your starting number
and r2.NumberTaken is null
I called my table Blank, and used the following:
declare @StartOffset int = 2
; With Missing as (
select @StartOffset as N where not exists(select * from Blank where ID = @StartOffset)
), Sequence as (
select @StartOffset as N from Blank where ID = @StartOffset
union all
select b.ID from Blank b inner join Sequence s on b.ID = s.N + 1
)
select COALESCE((select N from Missing),(select MAX(N)+1 from Sequence))
You basically have two cases - either your starting value is missing (so the Missing CTE will contain one row), or it's present, so you count forwards using a recursive CTE (Sequence), and take the max from that and add 1
Edit from comment. Yes, create another CTE at the top that has your filter criteria, then use that in the rest of the query:
declare @StartOffset int = 2
; With BlankFilters as (
select ID from Blank where hasEntered <> 1
), Missing as (
select @StartOffset as N where not exists(select * from BlankFilters where ID = @StartOffset)
), Sequence as (
select @StartOffset as N from BlankFilters where ID = @StartOffset
union all
select b.ID from BlankFilters b inner join Sequence s on b.ID = s.N + 1
)
select COALESCE((select N from Missing),(select MAX(N)+1 from Sequence))
This may now return a row that does exist in the table, but has hasEntered=1.
Tables:
create table Blank (
ID int not null,
Name varchar(20) not null
)
insert into Blank(ID,Name)
select 2 ,'Fred' union all
select 3 ,'Fred' union all
select 4 ,'Fred' union all
select 6 ,'Fred' union all
select 7 ,'Fred' union all
select 8 ,'Fred' union all
select 11 ,'Fred'
go
Try the set-based approach; it should be faster:
select min(t1.NumberTaken)+1 as "min_missing" from t t1
where not exists (select 1 from t t2
where t2.NumberTaken = t1.NumberTaken+1)
and t1.NumberTaken >= @StartingNumber
This is Sybase syntax, so massage for SQL Server consumption if needed. Note that it assumes @StartingNumber itself is present in the table.
Create a temp table with all numbers from StartingValue to EndValue and LEFT OUTER JOIN to your data table.
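The set-based answers above can be combined with an explicit check for the case where the starting number itself is missing. The following is a sketch only, written in Python with the standard sqlite3 module rather than SQL Server; the table name numbers, the column number_taken and the helper first_missing are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number_taken INTEGER, company_name TEXT)")
conn.executemany(
    "INSERT INTO numbers VALUES (?, 'Fred')",
    [(n,) for n in (2, 3, 4, 6, 7, 8, 11)],
)


def first_missing(connection, start):
    """Return the first number >= start that is not in the table."""
    row = connection.execute(
        """
        SELECT CASE
            -- Case 1: the starting number itself is free.
            WHEN NOT EXISTS (SELECT 1 FROM numbers WHERE number_taken = :start)
                THEN :start
            -- Case 2: find the smallest taken number >= start that has
            -- no successor in the table, and return that successor.
            ELSE (
                SELECT MIN(t1.number_taken) + 1
                FROM numbers t1
                WHERE t1.number_taken >= :start
                  AND NOT EXISTS (
                      SELECT 1 FROM numbers t2
                      WHERE t2.number_taken = t1.number_taken + 1
                  )
            )
        END
        """,
        {"start": start},
    ).fetchone()
    return row[0]


print(first_missing(conn, 1), first_missing(conn, 6))
```

With the sample data, a start of 1 yields 1 and a start of 6 yields 9, matching the behaviour described in the question.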

T-sql problem with running sum

I am trying to write a T-SQL script which will find "open" records for one table.
The structure of the data is as follows:
Id (int PK)  Ts (datetime)  Art_id (int)  Amount (float)
1            '2009-01-01'   1             1
2            '2009-01-05'   1             -1
3            '2009-01-10'   1             1
4            '2009-01-11'   1             -1
5            '2009-01-13'   1             1
6            '2009-01-14'   1             1
7            '2009-01-15'   2             1
8            '2009-01-17'   2             -1
9            '2009-01-18'   2             1
For each article, I want to show only the records that come after the last point at which the running sum was zero, ordered by date. So I am trying to extract records 5 and 6 for Art_id=1 and record 9 for Art_id=2. I am using MSSQL 2005, and my table has around 30K records with 6000 distinct values of Art_id.
In this solution I simply want to find all the rows where there isn't a subsequent row for that Art_id where the running sum was 0. I am assuming we can use the ID as a better tiebreaker than TS, since two rows can come in with the same timestamp but they will get sequential identity values.
;WITH base AS
(
SELECT
ID, Art_id, TS, Amount,
RunningSum = Amount + COALESCE
(
(
SELECT SUM(Amount)
FROM dbo.[table name]
WHERE Art_id = f.Art_id
AND ID < f.ID
)
, 0
)
FROM dbo.[table name] AS f
)
SELECT ID, Art_id, TS, Amount
FROM base AS b1
WHERE NOT EXISTS
(
SELECT 1
FROM base AS b2
WHERE Art_id = b1.Art_id
AND ID >= b1.ID
AND RunningSum = 0
)
ORDER BY ID;
Complete working query:
SELECT
*
FROM TABLE_NAME E
JOIN
(SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID) D
ON
(D.ART_ID = E.ART_ID) AND
(E.TS >= D.MAX_TS)
First we calculate running sums for every row:
SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A
Then, for each article, we look for the last row whose running sum is 0:
SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID
You can find all rows where the running sum is zero with:
select cur.id, cur.art_id
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.id <= cur.id
group by cur.id, cur.art_id
having sum(prev.amount) = 0
Then you can query all rows that come after the rows with a zero running sum:
select a.*
from #articles a
left join (
select cur.id, cur.art_id, running = sum(prev.amount)
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.ts <= cur.ts
group by cur.id, cur.art_id
having sum(prev.amount) = 0
) later_zero_running on
a.art_id = later_zero_running.art_id
and a.id <= later_zero_running.id
where later_zero_running.id is null
The LEFT JOIN in combination with the WHERE says: there cannot be a row at or after this row where the running sum is zero.
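To check the "no later row with a zero running sum" idea against the sample data, here is a small sketch in Python with the standard sqlite3 module. It uses a correlated subquery for the running sum instead of window functions, breaks ties by id, and the table name entries is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, ts TEXT, art_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, "2009-01-01", 1, 1), (2, "2009-01-05", 1, -1),
        (3, "2009-01-10", 1, 1), (4, "2009-01-11", 1, -1),
        (5, "2009-01-13", 1, 1), (6, "2009-01-14", 1, 1),
        (7, "2009-01-15", 2, 1), (8, "2009-01-17", 2, -1),
        (9, "2009-01-18", 2, 1),
    ],
)

open_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH base AS (
            -- Running sum per article, including the current row.
            SELECT f.id, f.art_id,
                   (SELECT SUM(p.amount) FROM entries p
                    WHERE p.art_id = f.art_id AND p.id <= f.id) AS running
            FROM entries f
        )
        SELECT b1.id
        FROM base b1
        -- Keep a row only if no row at or after it (same article) closes to zero.
        WHERE NOT EXISTS (
            SELECT 1 FROM base b2
            WHERE b2.art_id = b1.art_id
              AND b2.id >= b1.id
              AND b2.running = 0
        )
        ORDER BY b1.id
        """
    )
]

print(open_ids)
```

For the sample rows this keeps records 5 and 6 for article 1 and record 9 for article 2. The correlated subquery is quadratic per article, which should be acceptable for a table of the size mentioned in the question.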