Eliminate records - sql

I am writing T-SQL to eliminate some data in a stored procedure.
The scenario is that there are four data points: ID, RecordNumber, OrderDate, RejectDate.
An ID can have multiple rows with the same or different order dates and reject dates.
I need to eliminate all the records apart from those with a reject date of 1/01/1900 (this is not an actual rejection; it is the value substituted for a null).
However, if an ID has no rejection with 1/01/1900, then I should eliminate all of its records apart from the one with the max reject date.
The record number is a row number that I generated using ROW_NUMBER() OVER (PARTITION BY ...). Please shed some light. The image shows the records for one particular ID, and I need to apply this rule to all the records in the table. The expected results are highlighted in yellow for the different IDs.

Is this what you want?
select t.*
from t
where t.reject_date = '1900-01-01' or
t.reject_date = (select max(t2.reject_date)
from t t2
where t2.id = t.id
);
For each id, this keeps the rows where the reject_date is 1900-01-01 or the reject date is the maximum reject date for that id.
EDIT:
This might be more appropriate:
select t.*
from t
where t.reject_date = (select top (1) t2.reject_date
from t t2
where t2.id = t.id
order by (case when t2.reject_date = '1900-01-01' then 1 else 2 end),
t2.reject_date desc
);
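To see the rule in action, here is a minimal sketch that runs the same kind of correlated subquery against an in-memory SQLite database from Python. The table `t`, its columns and the sample rows are made up for illustration, and SQLite's `limit 1` stands in for T-SQL's `top (1)`, so treat it as an approximation of the query above rather than the exact T-SQL.

```python
import sqlite3


def keep_rows(rows):
    """rows: (id, order_date, reject_date) tuples with ISO date strings."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, order_date text, reject_date text)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    kept = con.execute(
        """
        select t.id, t.order_date, t.reject_date
        from t
        where t.reject_date = (select t2.reject_date
                               from t t2
                               where t2.id = t.id
                               -- 1900-01-01 sorts first; otherwise the latest date wins
                               order by (case when t2.reject_date = '1900-01-01'
                                              then 1 else 2 end),
                                        t2.reject_date desc
                               limit 1)
        order by t.id, t.order_date
        """
    ).fetchall()
    con.close()
    return kept


# Hypothetical sample data: id 1 has a 1900-01-01 row, id 2 does not.
sample = [
    (1, "2020-01-05", "1900-01-01"),
    (1, "2020-01-06", "2020-02-01"),
    (2, "2020-03-01", "2020-03-10"),
    (2, "2020-03-02", "2020-04-15"),
]
result = keep_rows(sample)
```

For id 1 only the 1900-01-01 row survives; for id 2, which has no such row, only the row with the latest reject date survives.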

Seems you don't need row_number() for this
select id
, OrderDate
, RejectDate
, max(case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end) as rSum
from tableA
group by id, OrderDate, RejectDate

Related

SQL Help: SELECT CASE?

I'm looking for some general guidance on the best solution for a recurring SQL query. Basically, I want to create a view of a table which has a lot of nearly identical rows (except for one distinguishing column called [Status], which can be either 'Closed' or 'Draft').
I want to return distinct data for each [Port], if both 'Closed' and 'Draft' exist, then return only the 'Draft' row data, and if only 'Closed' exists, then return the 'Closed' row data.
Please refer to the attached files for a visual. Any assistance is greatly appreciated! I believe this solution will lend itself well to other practical cases/solutions for me in the future - thank you!
Original Table Data:
Example Output:
Try this,
select c.Port,c.DateAdded,max(Status) as Status
from myTable c
group by c.Port,c.DateAdded
Basically, group the table and take the highest status value ('Closed' or 'Draft').
If both exist, 'Draft' will be returned, because it sorts after 'Closed' alphabetically.
Use NOT EXISTS:
SELECT t1.*
FROM tablename t1
WHERE t1.Status = 'Draft'
OR NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.Port = t1.Port AND t2.Status = 'Draft'
)
Or with ROW_NUMBER() window function:
SELECT Port, DateAdded, Status
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Port ORDER BY CASE WHEN Status = 'Draft' THEN 1 ELSE 2 END) rn
FROM tablename
) t
WHERE rn = 1
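As a quick check of the NOT EXISTS idea, here is a small sketch using Python's built-in `sqlite3` module. The sample rows are invented, and the inner condition tests the inner alias (`t2.Status = 'Draft'`), which is what the filter needs in order to ask "does this Port have any Draft row at all?".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tablename (Port text, DateAdded text, Status text)")
# Hypothetical data: port A has both statuses, port B only has a Closed row.
con.executemany(
    "insert into tablename values (?, ?, ?)",
    [
        ("A", "2021-01-01", "Closed"),
        ("A", "2021-01-01", "Draft"),
        ("B", "2021-01-02", "Closed"),
    ],
)
preferred = con.execute(
    """
    select t1.Port, t1.DateAdded, t1.Status
    from tablename t1
    where t1.Status = 'Draft'
       or not exists (select 1
                      from tablename t2
                      where t2.Port = t1.Port
                        and t2.Status = 'Draft')
    order by t1.Port
    """
).fetchall()
con.close()
```

Port A comes back with its Draft row only, and port B falls back to its Closed row.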
A rewording of your requirement is to return just one row per Port, with Draft rows taking precedence over Closed rows.
You don't make clear whether the rows can have different dates, though. If one port has two Draft rows or two Closed rows, do you want the earlier-dated row or the later-dated row?
The code below presumes the dates can indeed be different, and that you prefer the later-dated row.
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY port ORDER BY status DESC, dateAdded DESC) AS seq_num
FROM
YourTable
)
SELECT
*
FROM
sorted
WHERE
seq_num = 1
If the dates are always identical, MAX(status) with GROUP BY port, dateAdded is easily sufficient.
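Here is a rough, runnable sketch of the ROW_NUMBER() approach with the date tie-break, using SQLite from Python (window functions need SQLite 3.25 or newer). The sample rows are hypothetical, and the ordering relies on 'Draft' sorting after 'Closed' as plain text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table YourTable (port text, dateAdded text, status text)")
# Hypothetical data: port A has two Draft rows, port B has two Closed rows.
con.executemany(
    "insert into YourTable values (?, ?, ?)",
    [
        ("A", "2021-01-01", "Closed"),
        ("A", "2021-01-03", "Draft"),
        ("A", "2021-01-05", "Draft"),
        ("B", "2021-01-02", "Closed"),
        ("B", "2021-01-04", "Closed"),
    ],
)
latest_preferred = con.execute(
    """
    with sorted as (
        select port, dateAdded, status,
               row_number() over (partition by port
                                  order by status desc, dateAdded desc) as seq_num
        from YourTable
    )
    select port, dateAdded, status
    from sorted
    where seq_num = 1
    order by port
    """
).fetchall()
con.close()
```

Each port yields exactly one row: the latest Draft row where one exists, otherwise the latest Closed row.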
I'd use a full outer join, and coalesce the results so that the "draft" row is preferred over the "closed" row:
SELECT COALESCE(d.Port, c.Port),
COALESCE(d.DateAdded, c.DateAdded),
COALESCE(d.Status, c.Status)
FROM (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Draft') d
FULL OUTER JOIN (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Closed') c ON d.Port = c.Port

Showing zeroes in sql count

I'm using Redshift and trying to count different things by day, but a day is not shown when its count in table 2 is zero. How can I make it show a count of zero?
SELECT TO_CHAR(date1,'dd') AS day,
COUNT(*) as Volume,sum(CASE WHEN status = 'ANSWERED' THEN 1 ELSE 0 END )as ANSWERED , t2.Volume AS TRANSFERS
FROM table1 t1
RIGHT JOIN (SELECT TO_CHAR(date2,'dd') AS day,
COUNT(*) as Volume
FROM table2
WHERE TO_CHAR(date2,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date2,'yyyy_MM')
FROM table2
WHERE date2 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
AND type = 'Active'
GROUP BY day) t2 ON TO_CHAR(date1,'dd') = day
WHERE TO_CHAR(date1,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date1,'yyyy_MM')
FROM table1
WHERE date1 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
GROUP BY 1,4
ORDER BY 1
Notice that you used a right join between the tables. This means that any row from the first table that doesn't have a matching day in the second table will not display.
If you're new to SQL joins, you can refer to this image that explains it.
If your first (or left table) contains all of the unique days that should show up in the result, just switch the "right" to a "left" join.
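A small sketch of the idea, run on SQLite through Python rather than on Redshift: each table is aggregated per day first, then the transfer counts are left-joined onto the per-day call counts and COALESCE turns the missing match into 0. The tables `calls` and `transfers` and their rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table calls (day text, status text);
    create table transfers (day text);
    -- Hypothetical data: day 02 has calls but no transfers.
    insert into calls values ('01', 'ANSWERED'), ('01', 'MISSED'), ('02', 'ANSWERED');
    insert into transfers values ('01'), ('01');
    """
)
per_day = con.execute(
    """
    select c.day, c.volume, c.answered, coalesce(t.volume, 0) as transfers
    from (select day,
                 count(*) as volume,
                 sum(case when status = 'ANSWERED' then 1 else 0 end) as answered
          from calls
          group by day) c
    left join (select day, count(*) as volume
               from transfers
               group by day) t
      on t.day = c.day
    order by c.day
    """
).fetchall()
con.close()
```

Day 02 still appears, with a transfer count of 0 instead of being dropped. Aggregating each side before joining is a design choice in this sketch; it keeps one row per day on both sides so the join cannot multiply the counts.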

Query to add quarter condition in SQL

I have a scenario in which, if the system date falls within days 1 to 5 of the current quarter, the calculation should not include current-quarter data, and if it is later than day 5, it has to include all the data.
I am trying to include this condition in the WHERE clause, but I am not able to achieve the result.
Could you please help me with this condition?
SELECT
Dense_Rank() over(order by AMOUNT desc)as RANK,
FISCAL,
AMOUNT
FROM
T1 INNER JOIN T2 ON 1=1
WHERE ( FISCAL<( CASE WHEN t2.SYSDATE BETWEEN t2.CURRENTQUARTER_START_DATE AND ADD_DAYS(tw.CURRENTQUARTER_START_DATE,4)
THEN CURRENT_QUARTER
END ) OR (NULL)
I am not sure how to include that condition.
I think this may be what you're after:
SELECT
Dense_Rank() over (order by AMOUNT desc) as RANK,
FISCAL,
AMOUNT
FROM
T1
WHERE
FISCAL <= (
SELECT
CASE
WHEN ADD_DAYS(SYSDATE, -5) >= CURRENTQUARTER_START_DATE
THEN SYSDATE /* or maybe CURRENTQUARTER_END_DATE ? */
ELSE ADD_DAYS(CURRENTQUARTER_START_DATE, -1)
END
FROM T2
)
While you can do this with a join, I think it makes sense to break it into logical pieces, where the end-date lookup is isolated in a subquery and the optimizer will understand that it should only see a single row/value.
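The cutoff logic in that CASE expression can be sanity-checked outside the database. Below is a minimal Python sketch of the same rule; it assumes calendar quarters starting in January, April, July and October (your fiscal quarters may differ), and the function names are made up for illustration.

```python
from datetime import date, timedelta


def quarter_start(today):
    """First day of the calendar quarter containing `today` (assumed quarter layout)."""
    first_month = 3 * ((today.month - 1) // 3) + 1
    return date(today.year, first_month, 1)


def fiscal_cutoff(today, grace_days=5):
    """Latest date whose data should be included.

    Mirrors the CASE expression: if today minus `grace_days` is still on or
    after the quarter start, include everything up to today; otherwise stop
    at the last day of the previous quarter.
    """
    start = quarter_start(today)
    if today - timedelta(days=grace_days) >= start:
        return today
    return start - timedelta(days=1)


# Day 5 of the quarter: current-quarter data is excluded.
cutoff_day5 = fiscal_cutoff(date(2024, 4, 5))
# Day 6 of the quarter: everything up to today is included.
cutoff_day6 = fiscal_cutoff(date(2024, 4, 6))
```

On day 5 the cutoff is the last day of the previous quarter, and from day 6 onward it is the current date, which matches the "days 1 to 5" rule in the question.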
Try:
CASE
WHEN t2.SYSDATE BETWEEN t2.CURRENTQUARTER_START_DATE
AND ADD_DAYS(t2.CURRENTQUARTER_START_DATE, 4)
THEN CURRENT_QUARTER
ELSE NULL
END

SQL Query select optimization

I'm using MS SQL Server 2005.
I have a query that needs to filter by date.
Let's say I have a table containing phone numbers and dates.
I need to provide a count of the phone numbers within a time frame (begin date and end date).
A phone number shouldn't be included in the count if it appeared before that time frame.
I'm doing something like this:
select (phoneNumber) from someTbl
where phoneNumber not in (select phoneNumber from someTbl where date<#startDate)
This doesn't look efficient at all to me (and it takes too long to perform, with some side effects that should probably be presented in a different question).
I have about 300K rows in someTbl that should be checked.
After this check I need to check one more thing.
I have a past database that contains another 30K phone numbers.
So I'm adding
and phoneNumber not in (select pastPhoneNumber from somePastTbl)
and that is the final nail in the coffin, or the straw that breaks the camel's back, or whatever phrase you use to describe a fatal state.
So I'm looking for a better way to perform these two actions.
UPDATE
I chose to go with Alexander's solution and ended up with this kind of query:
SELECT t.number
FROM tbl t
WHERE t.Date > #startDate
--this is a filter for different customers
AND t.userId in (
SELECT UserId
FROM Customer INNER JOIN UserToCustomer ON Customer.customerId = UserToCustomer.CustomerId
Where customerName = #customer
)
--this is the filter for past number
AND NOT EXISTS (
SELECT 1
FROM pastTbl t2
WHERE t2.Numbers = t.number
)
-- this is the filter for checking if the number appeared in the table before startdate
AND NOT EXISTS (
SELECT *
FROM tbl t3
WHERE t3.Date<#startDate and t.number=t3.number
)
Thanks Gilad
Since it's a NOT IN, just switch the less-than to a greater-than.
select phoneNumber from someTbl where date > #startDate
Next to filter out somePastTbl
select s1.phoneNumber from someTbl s1
LEFT JOIN somePastTbl s2 on s1.phoneNumber = s2.phonenumber
where s1.date > #startDate and s2.phonenumber IS NULL
UPDATE
As per the comment:
Dates within the month before the start date
SELECT COUNT(s1.phoneNumber) FROM someTbl s1
LEFT JOIN somePastTbl s2 on s1.phoneNumber = s2.phonenumber
where DATEADD(MONTH,-1,#startDate) < s1.date AND s1.date < #startDate and s2.phonenumber IS NULL
One more option
SELECT t.phoneNumber
FROM SomeTbl t
WHERE t.date > #startDate
AND NOT EXISTS (
SELECT 1
FROM SomePastTbl t2
WHERE t2.phoneNumber = t.phoneNumber
)
one simple index
CREATE NONCLUSTERED INDEX IX_SomeTbl_date_phoneNumber
ON SomeTbl
(
date ASC,
phoneNumber ASC
)
then
SELECT phoneNumber FROM SomeTbl WHERE date > #startDate
EXCEPT
SELECT phoneNumber FROM SomePastTbl;
You want phone numbers whose minimum start date is greater than your start date. This suggests aggregation at the phone number level before doing the count (or creating the list).
Here is one way, with the condition in the having clause:
select COUNT(*)
from (select t.phonenumber
from someTbl t left outer join
somePastTbl pt
on t.phonenumber = pt.phonenumber
where pt.phonenumber is null
group by t.phonenumber
having MIN(t.date) >= #startdate
) t
You can also write this using window functions (SQL Server 2005 or later). Here is a version using min():
select COUNT(distinct t.phonenumber)
from (select t.*, MIN(date) over (partition by phonenumber) as mindate
from someTbl t
) t left outer join
somePastTbl pt
on t.phonenumber = pt.phonenumber
where pt.phonenumber is null and mindate >= #startdate
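To make the "first appearance" idea concrete, here is a small sketch that runs the aggregation against an in-memory SQLite database from Python. It uses NOT EXISTS for the past-number filter and a bound parameter in place of the start-date variable; the column `pastPhoneNumber` follows the question, while the helper name and sample rows are invented for illustration.

```python
import sqlite3


def count_new_numbers(calls, past_numbers, start_date):
    """Count phone numbers whose earliest date is on or after start_date
    and which do not appear in the past-numbers table."""
    con = sqlite3.connect(":memory:")
    con.execute("create table someTbl (phoneNumber text, date text)")
    con.execute("create table somePastTbl (pastPhoneNumber text)")
    con.executemany("insert into someTbl values (?, ?)", calls)
    con.executemany(
        "insert into somePastTbl values (?)", [(n,) for n in past_numbers]
    )
    (count,) = con.execute(
        """
        select count(*)
        from (select t.phoneNumber
              from someTbl t
              where not exists (select 1
                                from somePastTbl p
                                where p.pastPhoneNumber = t.phoneNumber)
              group by t.phoneNumber
              having min(t.date) >= ?) first_seen
        """,
        (start_date,),
    ).fetchone()
    con.close()
    return count


# Hypothetical data with a start date of 2012-02-01.
calls = [
    ("111", "2012-01-10"),  # seen before the start date, so excluded
    ("111", "2012-03-01"),
    ("222", "2012-02-05"),  # first seen after the start date, counted once
    ("222", "2012-02-20"),
    ("333", "2012-02-07"),  # listed in the past table, so excluded
    ("444", "2012-02-01"),  # first seen exactly on the start date, counted
]
new_count = count_new_numbers(calls, ["333"], "2012-02-01")
```

Only numbers 222 and 444 qualify here, so the count is 2.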

A query calls two instances of the same tables joined to compare fields, gives mirrored results. How do I eliminate mirrored duplicates?

This is a simpler version of the query I have.
with Alias1 as
(select distinct ID, file_tag, status, creation_date from tables where creation_date >= sysdate and creation_date <= sysdate + 1),
Alias2 as
(select distinct ID, file_tag, status, creation_date from tables where creation_date >= sysdate and creation_date <= sysdate + 1)
select distinct Alias1.ID ID_1,
Alias2.ID ID_2,
Alias1.file_tag,
Alias1.creation_date in_dt1,
Alias2.creation_date in_dt2
from Alias1, Alias2
where Alias1.file_tag = Alias2.file_tag
and Alias1.ID != Alias2.ID
order by Alias1.creation_date desc
This is an example of the results. Both of these are the same, though their values are flipped.
ID_1 ID_2 File_Tag in_dt1 in_dt2
70 66 Apples 6/25/2012 3:06 6/25/2012 2:53:47 PM
66 70 Apples 6/25/2012 2:53 6/25/2012 3:06:18 PM
The goal of the query is to find more than one ID with a matching file tag and do stuff to the one submitted earlier in the day (the query runs daily and only needs duplicates from that given day). I am still relatively new to SQL/Oracle and wonder if there's a better way to approach this problem.
SELECT *
FROM (SELECT id, file_tag, creation_date in_dt
, row_number() OVER (PARTITION BY file_tag
ORDER BY creation_date) rn
, count(*) OVER (PARTITION BY file_tag) ct
FROM tables
WHERE creation_date >= TRUNC(SYSDATE)) tbls
WHERE rn = 1
AND ct > 1;
This should get you the first (earliest) row within each file_tag having at least 2 records today.
The inner select calculates the relative row numbers of each set of identical file_tag records by creation date. The outer select retrieves the first one in each partition.
This assumes from your goal statement that you want to do something with the earliest single row for each file_tag. The inner query only returns rows with a creation_date of sometime on the current day.
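For illustration, here is the same pattern run on SQLite from Python (window functions need SQLite 3.25 or newer). The table name `submissions` and the rows are hypothetical, and the "today only" date filter is left out to keep the sketch short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table submissions (id integer, file_tag text, creation_date text)"
)
# Hypothetical data: 'Apples' was submitted twice, 'Pears' only once.
con.executemany(
    "insert into submissions values (?, ?, ?)",
    [
        (66, "Apples", "2012-06-25 14:53:47"),
        (70, "Apples", "2012-06-25 15:06:18"),
        (71, "Pears", "2012-06-25 09:00:00"),
    ],
)
earliest_duplicates = con.execute(
    """
    select id, file_tag, in_dt
    from (select id, file_tag, creation_date as in_dt,
                 row_number() over (partition by file_tag
                                    order by creation_date) as rn,
                 count(*) over (partition by file_tag) as ct
          from submissions) tbls
    where rn = 1   -- earliest row in each file_tag group
      and ct > 1   -- only groups that actually have duplicates
    """
).fetchall()
con.close()
```

Only the earlier 'Apples' row (ID 66) is returned; 'Pears' is skipped because it has no duplicate.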
Here is an easy way, just by changing your comparison operator:
select distinct Alias1.ID ID_1, Alias2.ID ID_2, Alias1.file_tag,
Alias1.creation_date in_dt1, Alias2.creation_date in_dt2
from Alias1 join
Alias2
on Alias1.file_tag = Alias2.file_tag and
Alias1.ID < Alias2.ID
order by Alias1.creation_date desc
Replacing the not-equals with less-than orders the two IDs so the smaller one is always first. This will eliminate the mirrored duplicates. Note: I also fixed the join syntax.
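A tiny sketch of why this works, using SQLite from Python with made-up rows in a hypothetical `submissions` table: the not-equals self-join returns every pair twice, while the less-than join returns each pair once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table submissions (id integer, file_tag text)")
con.executemany(
    "insert into submissions values (?, ?)",
    [(66, "Apples"), (70, "Apples"), (71, "Pears")],
)

PAIR_QUERY = """
    select a.id, b.id, a.file_tag
    from submissions a
    join submissions b
      on a.file_tag = b.file_tag
     and a.id {op} b.id
    order by a.id
"""

# Not-equals matches (66, 70) and also its mirror image (70, 66).
mirrored = con.execute(PAIR_QUERY.format(op="<>")).fetchall()
# Less-than keeps only the ordering with the smaller ID first.
unique_pairs = con.execute(PAIR_QUERY.format(op="<")).fetchall()
con.close()
```

With `<>` the pair shows up as both (66, 70) and (70, 66); with `<` only (66, 70) remains.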