SQL Query - Design struggle

I am fairly new to SQL Server (2012) but I was assigned the project where I have to use it.
The database consists of one table (counted in millions of rows) which looks mainly like this:
Number (float) Date (datetime) Status (nvarchar(255))
999 2016-01-01 14:00:00.000 Error
999 2016-01-02 14:00:00.000 Error
999 2016-01-03 14:00:00.000 Ok
999 2016-01-04 14:00:00.000 Error
888 2016-01-01 14:00:00.000 Error
888 2016-01-02 14:00:00.000 Ok
888 2016-01-03 14:00:00.000 Error
888 2016-01-04 14:00:00.000 Error
777 2016-01-01 14:00:00.000 Error
777 2016-01-02 14:00:00.000 Error
I have to create a query which will show me only the phone numbers (one number per row so probably Group by number?) that meet the conditions:
Number reappears at least 3 times
The two most recent records (by date; the records are not stored in date order) must both have the status Error
For example, in the table above the only phone number that meets the criteria is 888, because for 999 the second-newest status is Ok and number 777 occurs only 2 times.
I will appreciate any kind of help!
Thanks in advance!

You can use row_number() and conditional aggregation:
select number
from (select t.*,
row_number() over (partition by number order by date desc) as seqnum
from t
) t
group by number
having count(*) >= 3 and
max(case when seqnum = 1 then status end) = 'Error' and
max(case when seqnum = 2 then status end) = 'Error';
Note: float is a really, really bad type to use for the "number" column. In particular, two numbers can look the same but differ in low-order bits. They will produce different rows in the group by.
You should probably use varchar() for telephone numbers. That gives you the most flexibility. If you need to store the number as a number, then decimal/numeric is a much, much better choice than float.
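To check the row_number() approach against the sample rows from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table name t comes from the answer; the column names num, dt, and status, and the integer key in place of float, are assumptions made for this sketch only.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Column names (num, dt, status) are illustrative, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (num INTEGER, dt TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (999, "2016-01-01 14:00:00", "Error"),
        (999, "2016-01-02 14:00:00", "Error"),
        (999, "2016-01-03 14:00:00", "Ok"),
        (999, "2016-01-04 14:00:00", "Error"),
        (888, "2016-01-01 14:00:00", "Error"),
        (888, "2016-01-02 14:00:00", "Ok"),
        (888, "2016-01-03 14:00:00", "Error"),
        (888, "2016-01-04 14:00:00", "Error"),
        (777, "2016-01-01 14:00:00", "Error"),
        (777, "2016-01-02 14:00:00", "Error"),
    ],
)

# seqnum = 1 is the newest row per number, seqnum = 2 the second newest.
QUERY = """
SELECT num
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY num ORDER BY dt DESC) AS seqnum
      FROM t) AS ranked
GROUP BY num
HAVING COUNT(*) >= 3
   AND MAX(CASE WHEN seqnum = 1 THEN status END) = 'Error'
   AND MAX(CASE WHEN seqnum = 2 THEN status END) = 'Error'
"""

matching = [row[0] for row in conn.execute(QUERY)]
print(matching)
```

On this data the script prints [888]: 999 is rejected because its second-newest status is Ok, and 777 because it has only two rows.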

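-- MyTable stands in for the real table name
select ABC.Number
from
(
    select Number, Status,
           ROW_NUMBER() OVER (partition by Number order by Date desc) as times
    from MyTable
    where Number in
    (
        select Number
        from MyTable
        group by Number
        having count(*) >= 3   -- "at least 3 times", so >= rather than >
    )
) as ABC
where ABC.times in (1, 2) and ABC.Status = 'Error'
group by ABC.Number
having count(*) = 2            -- both of the two newest rows must be errors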

with CTE as
(
select t1.*, row_number() over(partition by t1.Number order by t1.date desc) as r_ord
from MyTable t1
)
select C1.*
from CTE C1
inner join
(
select Number
from CTE
group by Number
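having max(r_ord) >= 3 -- at least three rows for the number
and sum(case when r_ord <= 2 and Status = 'Error' then 1 else 0 end) = 2 -- both of the two newest rows are errors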
) C2
on C1.Number = C2.Number
where C1.r_ord in (1,2)
and C1.Status = 'Error'

Related

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query setup (I have changed variables to more generic variables for the sake of posting this on the internet so the query may not make perfect sense) that picks the most recent date for a given account. So the query returns values with a reason_type of 1 with the most recent date. This query has effective_date set to is not null.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is to basically use that logic to only apply to reason_type
if there is an entry for it, otherwise have it default to reason_type
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return the reason_type 1s most recent entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
SELECT account, date, effective_date, value, reason_type,
ROW_NUMBER() over (partition by account order by date desc) rn
from mytable
WHERE value is not null
AND effective_date is not null
)
WHERE rn =1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.

SQL select specific group from table

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name when last (by trade_date) trade_status was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
I can get the desired result with a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE t1.seller_name NOT IN
(SELECT t2.seller_name FROM trades t2
WHERE t2.seller_name = t1.seller_name AND t2.trade_status = 'close'
ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
But it takes more than 1 minute to execute the above query (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.
I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
This should perform reasonably with an index on seller_name
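As a quick check of the FIRST_VALUE() approach, the sketch below loads the six sample trades into an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer for window functions) and runs essentially the same query. The index name trades_seller_idx is hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER, trade_date TEXT, trade_price INTEGER,"
    " trade_status TEXT, seller_name TEXT)"
)
# Hypothetical index name; the answer only suggests indexing seller_name.
conn.execute("CREATE INDEX trades_seller_idx ON trades (seller_name)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-01-02", 150, "open", "Alex"),
        (2, "2015-03-04", 500, "close", "John"),
        (3, "2015-04-02", 850, "close", "Otabek"),
        (4, "2015-05-02", 150, "close", "Alex"),
        (5, "2015-06-02", 100, "open", "Otabek"),
        (6, "2015-07-02", 200, "open", "John"),
    ],
)

# Every row is tagged with the status of its seller's newest trade,
# then sellers whose newest trade is 'close' are filtered out.
QUERY = """
SELECT seller_name, SUM(trade_price) AS sum_trade_price
FROM (SELECT trades.*,
             FIRST_VALUE(trade_status) OVER (
                 PARTITION BY seller_name ORDER BY trade_date DESC
             ) AS last_trade_status
      FROM trades) AS t
WHERE last_trade_status <> 'close'
GROUP BY seller_name
ORDER BY seller_name
"""

totals = conn.execute(QUERY).fetchall()
print(totals)
```

This prints [('John', 700), ('Otabek', 950)], matching the expected output in the question; Alex is dropped because his latest trade is 'close'.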
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL versions before 8.0 don't support it, but there you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
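The conditional-aggregation version is plain enough SQL to try in SQLite, so here is a small sketch using Python's sqlite3 module and the sample rows from the question. Dates are stored as ISO strings so that MAX() compares them correctly, and the derived Difference column is left out; both are simplifications for this sketch, not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, creation_date TEXT, completion_date TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (123, "2016-05-09", "2016-05-16"),
        (123, "2016-05-14", "2016-05-16"),
        (456, "2016-04-26", "2016-04-30"),
        (456, None, "2016-04-30"),  # missing creation_date disqualifies id 456
        (789, "2016-03-25", "2016-03-31"),
        (789, "2016-03-01", "2016-03-31"),
    ],
)

# The HAVING clause keeps only ids with no NULL creation_date;
# the join then picks the row carrying the newest creation_date.
QUERY = """
SELECT t.id, t.creation_date, t.completion_date
FROM yourtable AS t
JOIN (SELECT id, MAX(creation_date) AS max_creation_date
      FROM yourtable
      GROUP BY id
      HAVING COUNT(CASE WHEN creation_date IS NULL THEN 1 END) = 0
     ) AS t2
  ON t.id = t2.id AND t.creation_date = t2.max_creation_date
ORDER BY t.id
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
```

The result contains one row each for 123 (2016-05-14) and 789 (2016-03-25) and nothing for 456.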

How to rank partitions by date order when values which I am partitioning on can repeat?

I have a query which looks for the number of different values of a key field over a period of time and assigns a rank to the values in the order they occur.
So, for example I might have:
ID Date Value
1 2010-01-01 125.00
1 2010-02-01 125.00
1 2010-03-01 130.00
1 2010-04-01 131.00
1 2010-05-01 131.00
1 2010-06-01 131.00
1 2010-07-01 126.00
1 2010-08-01 140.00
I am using
ROW_NUMBER() over(partition by [ID] order by [Date]) as [row]
to rank the different values of the Value column in the date order they occur. So I would get something like
Value row
125.00 1
130.00 2
131.00 3
126.00 4
etc
The problem I am having is that sometimes a value might repeat, for example if the value on 1st August in the data above were 125.00. I want to treat this as a separate occurrence, but with the ranking function I am using at the moment it obviously gets aggregated into a partition with the other instances of 125.00 when calculating the row number.
What's the easiest way for me to overcome this problem please? Thanks in advance!
This should work:
WITH A
AS
(SELECT ID, [Date], [Value], ROW_NUMBER() over(partition by [ID] order by [Value], [Date]) as [row]
FROM YourTable)
SELECT A.[Date], A.[Value], B.min_row as row
FROM A JOIN (SELECT ID, [Value], MIN([row]) AS min_row
FROM A
GROUP BY ID, [Value]) AS B
ON A.ID = B.ID AND A.[Value] = B.[Value]
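If a repeated value has to start a new rank, as the question asks, one alternative to the query above is the technique often called gaps-and-islands: flag each row whose value differs from the previous row with LAG(), then take a running SUM() of the flags. This is a different method from the one in the answer. The sketch below runs it in SQLite through Python's sqlite3 module; the table name readings and column names dt and value are made up for the example, and the last row uses 125.00 to reproduce the repeat described in the question. LAG() is also available in SQL Server from version 2012 onwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names for the sample data.
conn.execute("CREATE TABLE readings (id INTEGER, dt TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", 125.0),
        (1, "2010-02-01", 125.0),
        (1, "2010-03-01", 130.0),
        (1, "2010-04-01", 131.0),
        (1, "2010-05-01", 131.0),
        (1, "2010-06-01", 131.0),
        (1, "2010-07-01", 126.0),
        (1, "2010-08-01", 125.0),  # repeat of an earlier value
    ],
)

# is_new_run is 1 whenever the value changes from the previous row
# (or there is no previous row); the running sum numbers each run.
QUERY = """
WITH flagged AS (
    SELECT id, dt, value,
           CASE WHEN value = LAG(value) OVER (PARTITION BY id ORDER BY dt)
                THEN 0 ELSE 1 END AS is_new_run
    FROM readings
)
SELECT dt, value,
       SUM(is_new_run) OVER (PARTITION BY id ORDER BY dt) AS run_rank
FROM flagged
ORDER BY id, dt
"""

ranks = [row[2] for row in conn.execute(QUERY)]
print(ranks)
```

The ranks come out as [1, 1, 2, 3, 3, 3, 4, 5], so the second appearance of 125.00 gets its own rank instead of being folded into rank 1.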

Select info from table where row has max date

My table looks something like this:
group date cash checks
1 1/1/2013 0 0
2 1/1/2013 0 800
1 1/3/2013 0 700
3 1/1/2013 0 600
1 1/2/2013 0 400
3 1/5/2013 0 200
-- Do not need cash just demonstrating that table has more information in it
I want to get each unique group where date is max and checks is greater than 0. So the return would look something like:
group date checks
2 1/1/2013 800
1 1/3/2013 700
3 1/5/2013 200
attempted code:
SELECT group,MAX(date),checks
FROM table
WHERE checks>0
GROUP BY group
ORDER BY group DESC
The problem with that, though, is that it gives me all the dates and checks rather than just the max date row.
I'm using MS SQL Server 2005.
-- group and table are reserved words, so they are bracketed here
SELECT [group], MAX([date]) as max_date
FROM [table]
WHERE checks>0
GROUP BY [group]
That works to get the max date. Join it back to your data to get the other columns:
Select t.[group], a.max_date, t.checks
from [table] t
inner join
(SELECT [group], MAX([date]) as max_date
FROM [table]
WHERE checks>0
GROUP BY [group]) a
on a.[group] = t.[group] and a.max_date = t.[date]
Inner join functions as the filter to get the max record only.
FYI, your column names are horrid; don't use reserved words (group, date, table) for column or table names.
You can use a window MAX() like this:
SELECT
*,
max_date = MAX([date]) OVER (PARTITION BY [group])
FROM [table]
to get max dates per group alongside other data:
group date cash checks max_date
----- -------- ---- ------ --------
1 1/1/2013 0 0 1/3/2013
2 1/1/2013 0 800 1/1/2013
1 1/3/2013 0 700 1/3/2013
3 1/1/2013 0 600 1/5/2013
1 1/2/2013 0 400 1/3/2013
3 1/5/2013 0 200 1/5/2013
Using the above output as a derived table, you can then get only rows where date matches max_date:
SELECT
[group],
[date],
checks
FROM (
SELECT
*,
max_date = MAX([date]) OVER (PARTITION BY [group])
FROM [table]
) AS s
WHERE [date] = max_date
;
to get the desired result.
Basically, this is similar to @Twelfth's suggestion but avoids a join and may thus be more efficient.
You can try the method at SQL Fiddle.
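For anyone who wants to run the window MAX() idea locally, here is a sketch in SQLite via Python's sqlite3 module (SQLite 3.25 or newer). The names deposits, grp, and dt are stand-ins for the reserved words table, group, and date, and the checks > 0 filter inside the derived table is an addition based on the requirement in the question; the query above does not include it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# deposits / grp / dt replace the reserved words table / group / date.
conn.execute(
    "CREATE TABLE deposits (grp INTEGER, dt TEXT, cash INTEGER, checks INTEGER)"
)
conn.executemany(
    "INSERT INTO deposits VALUES (?, ?, ?, ?)",
    [
        (1, "2013-01-01", 0, 0),
        (2, "2013-01-01", 0, 800),
        (1, "2013-01-03", 0, 700),
        (3, "2013-01-01", 0, 600),
        (1, "2013-01-02", 0, 400),
        (3, "2013-01-05", 0, 200),
    ],
)

# MAX(dt) OVER (PARTITION BY grp) attaches each group's newest date
# to every row; the outer WHERE keeps only the rows on that date.
QUERY = """
SELECT grp, dt, checks
FROM (SELECT d.*,
             MAX(dt) OVER (PARTITION BY grp) AS max_date
      FROM deposits AS d
      WHERE checks > 0) AS s
WHERE dt = max_date
ORDER BY grp
"""

latest_checks = conn.execute(QUERY).fetchall()
print(latest_checks)
```

The output is one row per group: group 1 on 2013-01-03 with 700, group 2 on 2013-01-01 with 800, and group 3 on 2013-01-05 with 200.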
Using IN can have a performance impact. Joining two subqueries does not have the same performance impact and can be done like this:
SELECT t1.*
FROM (SELECT msisdn
,callid
,Change_color
,play_file_name
,date_played
FROM insert_log
WHERE play_file_name NOT IN('Prompt1','Conclusion_Prompt_1','silent')) t1
JOIN (SELECT callid, MAX(date_played) AS date_played
FROM insert_log GROUP BY callid) t2
ON t1.callid = t2.callid AND t1.date_played = t2.date_played
ORDER BY t1.callid ASC
SELECT distinct
group,
max_date = MAX(date) OVER (PARTITION BY group), checks
FROM table
Should work.