SQL statement with two not ins - sql

I have 2 tables: Documents and DocumentAttributes.
Document with relevant columns DocID, DelFlag
DocumentAttributes: DocID, aID, aValue
Now I want all DocIDs and aValues with following restriction:
SELECT
[o1].[docid]
, [o1].[aValue]
FROM [DocumentAttributes] [o1]
WHERE [o1].[aID] = 9
AND [o1].[DocID] >= 2356
AND [o1].[DocID] < 90000000
AND [o1].[DocID] NOT IN
(
SELECT
[o].[DocID]
FROM [DocumentAttributes] [o]
WHERE [o].[aID] = 2
)
AND [o1].[DocID] IN
(
SELECT
[d].[DocID]
FROM [DOCUMENTS] [d]
WHERE [d].[DELFLAG] != 2
);
So I want all IDs where Documents have no Attribute with AttributeID = 2 and which are not marked as Deleted.
The SQL statement above works, but it's too slow since I have about 1kk documents with each having about 10 Attributes at least.
The 3 selects themselves cost less than 1 second, so the "not in" is the problem I guess.
Does anyone have an idea how to make it faster?
Thanks in advance

Re-write the not in as left join:
select o1.docid, o1.aValue
from DocumentAttributes o1
left join DocumentAttributes o on o1.DocID = o.DocID and o.aID = 2
where o1.aID = 9 and o1.DocID >= 2356 and o1.DocID < 90000000
and o.DocID is null
and o1.DocID in (
select d.DocID
from DOCUMENTS d
where d.DELFLAG != 2)

Oracle supports the minus keyword. That means you can replace this sort of thing
where myField not in (
select someField
from etc
)
with this sort of thing.
where myField in (
select someField
from wherever
where they are available
minus
select someField
from the same tables
where I want to exclude them
)
I suggest trying both this and the left join method to see which performs better.

Subselects from 'not in' statements are executed for every row, so maybe thats one of your cause.

Related

using correlated subquery in the case statement

I’m trying to use a correlated subquery in my sql code and I can't wrap my head around what I'm doing wrong. A brief description about the code and what I'm trying to do:
The code consists of a big query (ALIASED AS A) which result set looks like a list of customer IDs, offer IDs and response status name ("SOLD","SELLING","IRRELEVANT","NO ANSWER" etc.) of each customer to each offer. The customers IDs and the responses in the result set are non-unique, since more than one offer can be made to each customer, and a customer can have different response for different offers.
The goal is to generate a list of distinct customer IDs and to mark each ID with 0 or 1 flag :
if the ID has AT LEAST ONE offer with status name is "SOLD" or "SELLING" the flag should be 1 otherwise 0. Since each customer has an array of different responses, what I'm trying to do is to check if "SOLD" or "SELLING" appears in this array for each customer ID, using correlated subquery in the case statement and aliasing the big underlying query named A with A1 this time:
select distinct
A.customer_ID,
case when 'SOLD' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID) OR
'SELLING' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID)
then 1 else 0 end as FLAG
FROM
(select …) A
What I get is a mistake alert saying there is no such object as A or A1.
Thanks in advance for the help!
You can use exists with cte :
with cte as (
<query here>
)
select c.*,
(case when exists (select 1
from cte c1
where c1.customer_ID = c.customer_ID and
c1.response in ('sold', 'selling')
)
then 1 else 0
end) as flag
from cte c;
You can also do aggregation :
select customer_id,
max(case when a.response in ('sold', 'selling') then 1 else 0 end) as flag
from < query here > a;
group by customer_id;
With statement as suggested by Yogesh is a good option. If you have any performance issues with "WITH" statement. you can create a volatile table and use columns from volatile table in your select statement .
create voltaile table as (select response from where response in ('SOLD','SELLING').
SELECT from customer table < and join voltaile table>.
The only disadvantge here is volatile tables cannot be accessed after you disconnect from session.

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve such customer who has placed orders on 2 consecutive days.
In above case Xyz and Abc customers should be returned by query as result.
There are many ways to do this. Use an EXISTS semi-join followed by DISTINCT or GROUP BY, should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELEST 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). Would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record actually, since no row type is given. Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would think of it doing it would be to join the table the date part with itselft on the next date and joining it with the Customer_name too.
This way you can ensure that the same customer_name done an order on 2 consecutive days.
For MySQL:
SELECT distinct *
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
The result you should also select it with the Distinct keyword to ensure the same customer is not displayed more than 1 time.
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

Find records in SQL Server 2008 R2

Please find below image for make understanding my issues. I have a table as shown below picture. I need to get only highlighted (yellow) records. What is the best method to find these records?
In SQL Server 2012+, you can use the lead() and lag() functions. However, this is not available in SQL Server 2008. Here is a method using outer apply:
select t.*
from t outer apply
(select top 1 tprev.*
from t tprev
on tprev.time < t.time
order by tprev.time desc
) tprev outer apply
(select top 1 tnext.*
from t tnext
on tnext.time > t.time
order by tnext.time asc
)
where (t.cardtype = 1 and tnext.cardtype = 2) or
(t.cardtype = 2 and tprev.cardtype = 1);
With your sample data, it would also be possible to use self joins on the id column. This seems unsafe, though, because there could be gaps in that columns values.
Havent tried this, but I think it should work. First, make a view of the table in your question, with the rownumber included as one column:
CREATE VIEW v AS
SELECT
ROW_NUMBER() OVER(ORDER BY id) AS rownum,
id,
time,
card,
card_type
FROM table
Then, you can get all the rows of type 1 followed by a row of type 2 like this:
SELECT
a.id,
-- And so on...
FROM v AS a
JOIN v AS b ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
And all the rows of type 2 preceded by a row of type 1 like this:
SELECT
b.id,
-- And so on...
FROM v AS b
JOIN v AS a ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
To get them both in the same set of results, you can just use UNION ALL. Technically, you don't need the view. You could use nested selects instead, but since you will need to query the table four times it might be nice to have it as a view.
Also, if the ID is continous (it goes 1, 2, 3 without any gaps), you don't need the rownum and can just use the ID instead.
here is a code you can run in sql server
select * from Table_name where id in (1,2,6,7,195,160,164,165)

Fastest way to check if values in one table does not exist in another based on multiple values

This is probably easy but i have been racking my brain. I have two tables (TEMP_PARTY and GTT_PARTY).
TEMP_PARTY AND GTT_PARTY have the following columns => system, case_num, party_id, party, role.
What I am trying to do is find all values from TEMP_PARTY that do not exist in GTT_PARTY based on a unique combination of System, case_num and party_id.
Hence, i want to pull all parties from TEMP_PARTY where I have no record in GTT_PARTY. A record is uniquely identified by data in the three columns (System, case_num and party_id).
Speed is a HUGE concern for me. Can someone please help.
Use NOT EXISTS like below
select * from TEMP_PARTY
where not exists
(
select 1 from GTT_PARTY
where System = some_val
and case_num = 3
and party_id = 7
)
Also, You can use MINUS operator (If I am not wrong Oracle has it and it's same as EXCEPT in SQL Server) like
select * from TEMP_PARTY
where System = some_val
and case_num = 3
and party_id = 7
MINUS
select * from GTT_PARTY
where System = some_val
and case_num = 3
and party_id = 7
One approach is an anti-join pattern:
SELECT t.system
, t.case_num
, t.party_id
, t.party
, t.role
FROM TEMP_PARTY t
LEFT
JOIN GTT_PARTY g
ON g.system = t.system
AND g.case_num = t.case_num
AND g.party_id = t.party_id
WHERE g.system IS NULL
This is an "outer" join, returning all rows from t, along with all matching rows from g, except we've added a predicate in the WHERE clause which effectively excludes all rows that had a match. So what we're left with is rows from t that don't have any matching row(s) in g.
This isn;t the only way to get the result. There are several other approaches, for example, you can return an equivalent result with a query using a NOT EXISTS and correlated subquery.
SELECT t.system
, t.case_num
, t.party_id
, t.party
, t.role
FROM TEMP_PARTY t
WHERE NOT EXISTS
( SELECT 1
FROM GTT_PARTY g
WHERE g.system = t.system
AND g.case_num = t.case_num
AND g.party_id = t.party_id
)
For best performance, we'd want to see an index (with leading columns of)
... ON GTT_PARTY (system,case_num,party_id)
EXPLAIN output will show the execution plan.
(I was thinking this was for MySQL; these same queries will work in Oracle as well.)

update existing column with results of select query using sql

I am trying to update a column called Number_Of_Marks in our Results table using the results we get from our SELECT statement. Our select statement is used to count the numbers of marks per module in our results table. The SELECT statement works and the output is correct, which is
ResultID ModuleID cnt
-------------------------
111 ART3452 2
114 ART3452 2
115 CSC3039 3
112 CSC3039 3
113 CSC3039 3
The table in use is:
Results: ResultID, ModuleID, Number_Of_Marks
We need the results of cnt to be updated into our Number_Of_Marks column. This is our code below...
DECLARE #cnt INT
SELECT #cnt
SELECT C.cnt
FROM Results S
INNER JOIN (SELECT ModuleID, count(ModuleID) as cnt
FROM Results
GROUP BY ModuleID) C ON S.ModuleID = C.ModuleID
UPDATE Results
SET [Number_Of_Marks] = (#cnt)
You can do this in SQL Server using the update/join syntax:
UPDATE s
SET [Number_Of_Marks] = c.cnt
FROM Results S INNER JOIN
(SELECT ModuleID, count(ModuleID) as cnt
FROM Results
GROUP BY ModuleID
) C
ON S.ModuleID = C.ModuleID;
I assume that you want the count from the subquery, not from the uninitialized variable.
EDIT:
In general, when you change the question it is better to ask another question. Sometimes, though, the changes are really small. The revised query looks something like:
UPDATE s
SET [Number_Of_Marks] = c.cnt,
Marks = avgmarks
FROM Results S INNER JOIN
(SELECT ModuleID, count(ModuleID) as cnt, avg(marks * 1.0) as avgmarks
FROM Results
GROUP BY ModuleID
) C
ON S.ModuleID = C.ModuleID;
Note that I multiplied the marks by 1.0. This is a quick-and-dirty way to convert an integer to a numeric value. SQL Server takes averages on integers and produces an integer. Usually you want some sort of decimal or floating value.