Subsetting on dates for a SQL query

Using Snowflake, I am trying to select only customers that have no current subscription, eliminating every ID that has a current/active contract.
Each ID typically has multiple records, reflecting the contract/renewal history of that customer.
A customer is inactive only if none of its contracts extends beyond the current date. There are likely multiple past contracts that have lapsed, but the account is still active if any one contract end date goes beyond the current date.
Consider the following table:
Date_Start Date_End   Name   ID
---------- ---------- ------ ---
2015-07-03 2019-07-03 Piggly 001
2019-07-04 2025-07-04 Piggly 001
2013-10-01 2017-12-31 Doggy  031
2018-01-01 2018-06-30 Doggy  031
2020-01-01 2021-03-14 Catty  022
2021-03-15 2024-06-01 Catty  022
1999-06-01 2021-06-01 Horsey 052
2021-06-02 2022-01-01 Horsey 052
2022-01-02 2022-07-04 Horsey 052
The desired output is the non-active customers, i.e. those with no end date beyond Jan 5th 2023 (or the current/an arbitrary date):
Name   ID
------ ---
Doggy  031
Horsey 052
My first attempt was:
SELECT Name, ID
FROM table
WHERE Date_End < GETDATE()
but the obvious problem is that I'll also be selecting past contracts of customers who haven't expired/churned and who have a contract that goes beyond the current date.
How do I resolve this?

As there are many rows per name and ID, you should aggregate the data and then use a HAVING clause to select only those you are interested in.
SELECT name, id
FROM table
GROUP BY name, id
HAVING MAX(date_end) < GETDATE();
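One way to check the aggregate against the sample data from the question is an in-memory SQLite database driven from Python. This is only a sketch under assumptions: the table name contracts is made up for the example, and because SQLite has no GETDATE(), the cutoff date from the question (2023-01-05) is passed in as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "contracts" is a hypothetical table name; the columns mirror the question.
conn.execute(
    "CREATE TABLE contracts (date_start TEXT, date_end TEXT, name TEXT, id TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        ("2015-07-03", "2019-07-03", "Piggly", "001"),
        ("2019-07-04", "2025-07-04", "Piggly", "001"),
        ("2013-10-01", "2017-12-31", "Doggy", "031"),
        ("2018-01-01", "2018-06-30", "Doggy", "031"),
        ("2020-01-01", "2021-03-14", "Catty", "022"),
        ("2021-03-15", "2024-06-01", "Catty", "022"),
        ("1999-06-01", "2021-06-01", "Horsey", "052"),
        ("2021-06-02", "2022-01-01", "Horsey", "052"),
        ("2022-01-02", "2022-07-04", "Horsey", "052"),
    ],
)

# Fixed cutoff instead of GETDATE(); ISO date strings compare correctly as text.
cutoff = "2023-01-05"
inactive = conn.execute(
    """
    SELECT name, id
    FROM contracts
    GROUP BY name, id
    HAVING MAX(date_end) < ?
    ORDER BY name
    """,
    (cutoff,),
).fetchall()
print(inactive)
```

Only customers whose latest end date falls before the cutoff survive the HAVING filter, so one future-dated contract is enough to exclude an ID.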

You can work it out with an EXCEPT operator, if your DBMS supports it:
SELECT DISTINCT Name, ID FROM tab
EXCEPT
SELECT DISTINCT Name, ID FROM tab WHERE Date_end > <your_date>
This removes the active <Name, ID> pairs from the full set.
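The set-difference idea can be tried the same way. The sketch below uses SQLite from Python because SQLite also supports EXCEPT; the table name contracts and the fixed date standing in for <your_date> are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the sample data from the question.
conn.execute(
    "CREATE TABLE contracts (date_start TEXT, date_end TEXT, name TEXT, id TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        ("2015-07-03", "2019-07-03", "Piggly", "001"),
        ("2019-07-04", "2025-07-04", "Piggly", "001"),
        ("2013-10-01", "2017-12-31", "Doggy", "031"),
        ("2018-01-01", "2018-06-30", "Doggy", "031"),
        ("2020-01-01", "2021-03-14", "Catty", "022"),
        ("2021-03-15", "2024-06-01", "Catty", "022"),
        ("1999-06-01", "2021-06-01", "Horsey", "052"),
        ("2021-06-02", "2022-01-01", "Horsey", "052"),
        ("2022-01-02", "2022-07-04", "Horsey", "052"),
    ],
)

your_date = "2023-01-05"  # stands in for <your_date>
# All (name, id) pairs minus the pairs that still have a contract after the date.
churned = conn.execute(
    """
    SELECT DISTINCT name, id FROM contracts
    EXCEPT
    SELECT DISTINCT name, id FROM contracts WHERE date_end > ?
    ORDER BY name
    """,
    (your_date,),
).fetchall()
print(churned)
```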

Related

Using Distinct and MAX(date) in a large data

I have a table that stores the list of users who have accessed a product(with the accessed date).
I have written the below query to get the list of users who have accessed the product B between '2021-02-01' and '2021-02-26'.
SELECT DISTINCT UserName,Country,ADate,Product FROM Report WHERE UserName != '-' and Product='B' and (CAST(ADate AS DATE) BETWEEN #startdate AND #enddate)
then it gives the below result:
UserName Country ADate Product
-------- ------ -------- ---------
asson IN 2021-02-10 00:00:00.000 B
rajan US 2021-02-23 00:00:00.000 B
rajan US 2021-02-25 00:00:00.000 B
moody US 2021-02-14 00:00:00.000 B
rajon US 2021-02-01 00:00:00.000 B
lukman US 2021-02-10 00:00:00.000 B
Since the user rajan accessed the product on 2 different days, there are 2 entries for rajan even though I added DISTINCT. So I modified the query as below:
SELECT UserName,Country,max(ADate),Product FROM Report WHERE UserName != '-' and Product='B' and (CAST(ADate AS DATE) BETWEEN #startdate AND #enddate) group by UserName,Country,Product
This query gives me the required result. The problem I am facing now is that when I query a range of more than a month (say, data spanning 2 months), some data is missing from the output. I believe it is due to the MAX(ADate). Can anyone suggest a good way to get rid of this issue?

This will give you the latest access date of each user by month:
SELECT DISTINCT UserName,Country, month(ADate) as month, max(ADate) as LastADate,Product FROM Report WHERE UserName != '-' and Product='B' group by UserName, Country, month(ADate), Product
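To see the per-month grouping in action, here is a small sketch using SQLite from Python rather than the asker's database. The rows are the sample output from the question plus one extra March row for rajan that is purely hypothetical, added so that two months appear. Grouping on year and month together (strftime('%Y-%m', ...)) instead of the month number alone is my own assumption, so that the same month in different years is not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Report (UserName TEXT, Country TEXT, ADate TEXT, Product TEXT)"
)
conn.executemany(
    "INSERT INTO Report VALUES (?, ?, ?, ?)",
    [
        ("asson", "IN", "2021-02-10 00:00:00.000", "B"),
        ("rajan", "US", "2021-02-23 00:00:00.000", "B"),
        ("rajan", "US", "2021-02-25 00:00:00.000", "B"),
        ("moody", "US", "2021-02-14 00:00:00.000", "B"),
        ("rajon", "US", "2021-02-01 00:00:00.000", "B"),
        ("lukman", "US", "2021-02-10 00:00:00.000", "B"),
        # Hypothetical row, not in the question: a second month for rajan.
        ("rajan", "US", "2021-03-02 00:00:00.000", "B"),
    ],
)

# One row per user and calendar month, carrying the latest access in that month.
latest_per_month = conn.execute(
    """
    SELECT UserName, Country, strftime('%Y-%m', ADate) AS ym,
           MAX(ADate) AS LastADate, Product
    FROM Report
    WHERE UserName != '-' AND Product = 'B'
    GROUP BY UserName, Country, ym, Product
    ORDER BY UserName, ym
    """
).fetchall()

rajan_rows = [(r[2], r[3]) for r in latest_per_month if r[0] == "rajan"]
print(rajan_rows)
```

With MAX(ADate) grouped by user only, the February row for rajan would disappear once March data exists; grouping by month keeps one row per month instead.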

Can not understand the logic of this query

This query is trying to get s1ppmp (the product price) for each s1ilie (size), s1iref (reference) and s1ydat (the latest price date), because one product can have different prices on different dates, for example a Black Friday price versus the normal price on other days.
anmoisjour comes from the calendar table, but there is no connection between the calendar table and the main table msk110, so I don't understand the logic of this query.
SELECT s1isoc,
       s1ilie,
       s1iref,
       s1ydat,
       anmoisjour,
       s1ppmp
FROM msk110
INNER JOIN (SELECT s1isoc AS isoc,
                   s1ilie AS ilie,
                   s1iref AS iref,
                   MAX(s1ydat) AS ydat,
                   anmoisjour
            FROM calendrier,
                 msk110
            WHERE s1ydat <= anmoisjour
              AND anmoisjour BETWEEN 20100101 AND 20302131
            GROUP BY s1isoc,
                     s1ilie,
                     s1iref,
                     anmoisjour) a ON s1isoc = isoc
                                  AND s1ilie = ilie
                                  AND s1iref = iref
                                  AND s1ydat = ydat
WHERE s1isoc = 1
  AND anmoisjour BETWEEN 20100101 AND 20302131
ORDER BY anmoisjour,
         s1ydat;
s1isoc, s1ilie, s1iref, s1ydat and s1ppmp come from msk110, and anmoisjour belongs to the calendar table, which is a date table.

I believe the confusion comes from the way the calendar table is joined.
If anmoisjour is the day column of the calendar table and this table holds 1 row per day, the WHERE filter anmoisjour BETWEEN 20100101 AND 20302131 limits calendrier to one row per day over roughly 20 years (2010 to 2030).
The product prices table msk110 is not joined to the calendar table calendrier on equal dates but with an inequality (msk110.s1ydat <= calendrier.anmoisjour). This means that, for example, an msk110.s1ydat of 2015-01-01 joins against every row of the calendar table between 2015-01-01 and 2030-12-31.
The GROUP BY includes the calendar table's date (calendrier.anmoisjour), and MAX(s1ydat) picks the latest price date on or before that day. If a particular product and size has prices on different dates, let's say only on 2015-01-01, 2017-01-01 and 2020-01-01, then the result of the GROUP BY would be the following (ordered by calendar date, displaying NULL rows only to demonstrate):
MAX(s1ydat) anmoisjour
null 2010-01-01
null ...
null 2014-12-31
2015-01-01 2015-01-01
2015-01-01 2015-01-02
2015-01-01 ...
2015-01-01 2016-01-01
2015-01-01 ...
2017-01-01 2017-01-01
2017-01-01 2017-01-02
2017-01-01 ...
2017-01-01 2019-12-31
2020-01-01 2020-01-01
2020-01-01 2025-01-01
2020-01-01 ...
What your query shows is, for each calendar day over those 20 years, the row of the product table carrying the most recent price date on or before that day, restricted to s1isoc = 1 (I don't know what that column means).
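The effect of the inequality join is easier to see on a tiny data set. The sketch below runs a cut-down version of the query in SQLite from Python. Everything in it is an assumption for illustration: only the s1iref, s1ydat and s1ppmp columns are kept, the reference REF1 and the three prices are invented, and the calendar holds just six hand-picked days instead of one row per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendrier (anmoisjour INTEGER)")
conn.execute("CREATE TABLE msk110 (s1iref TEXT, s1ydat INTEGER, s1ppmp REAL)")

# Six hypothetical calendar days around the three price dates.
conn.executemany(
    "INSERT INTO calendrier VALUES (?)",
    [(d,) for d in (20141231, 20150101, 20160615, 20170101, 20191231, 20200101)],
)
# One hypothetical product with a price change on each of three dates.
conn.executemany(
    "INSERT INTO msk110 VALUES (?, ?, ?)",
    [("REF1", 20150101, 10.0), ("REF1", 20170101, 12.0), ("REF1", 20200101, 9.0)],
)

# For every calendar day, find the latest price date on or before that day,
# then join back to msk110 to fetch the price valid on that day.
as_of = conn.execute(
    """
    SELECT a.anmoisjour, p.s1ydat, p.s1ppmp
    FROM msk110 p
    INNER JOIN (SELECT s1iref AS iref, MAX(s1ydat) AS ydat, anmoisjour
                FROM calendrier, msk110
                WHERE s1ydat <= anmoisjour
                GROUP BY s1iref, anmoisjour) a
        ON p.s1iref = a.iref AND p.s1ydat = a.ydat
    ORDER BY a.anmoisjour
    """
).fetchall()
for row in as_of:
    print(row)
```

The day 20141231 produces no row at all, because no price date is on or before it; with the inner join there are no NULL rows in the real output.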

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot but couldn't find a solution yet. Let me explain my question with sample data and my desired output.
Sample data:
datetime customer
---------- --------
2018-10-21 09:00 Ryan
2018-10-21 10:00 Sarah
2018-10-21 20:00 Sarah
2018-10-22 09:00 Peter
2018-10-22 10:00 Andy
2018-10-23 09:00 Sarah
2018-10-23 10:00 Peter
2018-10-24 10:00 Andy
2018-10-24 20:00 Andy
My desired output is the distinct number of customers over the past three days relative to each day:
trunc(datetime) progressive count distinct customer
--------------- -----------------------------------
2018-10-21 2
2018-10-22 4
2018-10-23 4
2018-10-24 3
Explanation: for the 21st, because we have only Ryan and Sarah, the count is 2 (there are no records before the 21st). For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays 4. For the 24th, however, as we should only consider the past 3 days (as per business logic), we take only the 24th, 23rd and 22nd, so the distinct customers are Sarah, Andy and Peter and the count is 3.
I believe this is called a progressive count, moving count or rolling count, but I couldn't implement it in Oracle 11g SQL. Obviously it's easy with PL/SQL programming (stored procedure/function), but I would prefer to do it in a single SQL query if possible.

What you seem to want is:
select dte,
       count(distinct customer) over (order by dte rows between 2 preceding and current row)
from (select distinct trunc(datetime) as dte, customer
      from t
     ) t
group by dte;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
select dte,
       (select count(distinct t2.customer)
        from t t2
        where t2.datetime >= t.dte - 2
          and t2.datetime < t.dte + 1  -- upper bound keeps later days out of the window
       ) as running_3
from (select distinct trunc(datetime) as dte
      from t
     ) t;
This should have reasonable performance for a small number of dates. As the number of dates increases, the performance will degrade linearly.
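The correlated-subquery approach can be checked against the sample data with SQLite from Python. This is a sketch, not Oracle code: the table name visits and column name dt are assumptions, SQLite's date() function stands in for trunc(), and the window is bounded on both sides so that each day only sees itself and the two days before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "visits" and "dt" are hypothetical names; rows are the question's sample data.
conn.execute("CREATE TABLE visits (dt TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2018-10-21 09:00", "Ryan"),
        ("2018-10-21 10:00", "Sarah"),
        ("2018-10-21 20:00", "Sarah"),
        ("2018-10-22 09:00", "Peter"),
        ("2018-10-22 10:00", "Andy"),
        ("2018-10-23 09:00", "Sarah"),
        ("2018-10-23 10:00", "Peter"),
        ("2018-10-24 10:00", "Andy"),
        ("2018-10-24 20:00", "Andy"),
    ],
)

# For each distinct day, count distinct customers seen on that day
# or on either of the two preceding days.
rolling = conn.execute(
    """
    SELECT d.day,
           (SELECT COUNT(DISTINCT v.customer)
            FROM visits v
            WHERE date(v.dt) BETWEEN date(d.day, '-2 days') AND d.day
           ) AS running_3
    FROM (SELECT DISTINCT date(dt) AS day FROM visits) d
    ORDER BY d.day
    """
).fetchall()
print(rolling)
```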

How to use sub query in group by clause sql server 2005

My table data is as follows:
FinishDate SpecialistName jobstate
----------------------- --------------- ---------
2012-10-01 00:00:00.000 Josh FINISHED
2012-10-01 00:00:00.000 Josh FINISHED
2012-10-01 00:00:00.000 Sam FINISHED
2012-10-01 00:00:00.000 Robin FINISHED
2012-10-01 00:00:00.000 Robin FINISHED
2012-10-01 00:00:00.000 Joy FINISHED
2012-10-01 00:00:00.000 Joy INCOMMING
2012-10-02 00:00:00.000 Joy FINISHED
My query is as follows:
select Count(*) [Count] from employee
where convert(varchar,FinishDate,112)>='20121001'
and convert(varchar,FinishDate,112) <='20121001'
and JobState='FINISHED'
group by SpecialistName
If a particular specialist finishes multiple jobs on the same day, I want to count that specialist only once.
If Robin, Josh and Sam finish 10 jobs between them on the same day, then 3 should be shown for that day.
The output should then look like this:
FinishDate Count
----------------------- ------
2012-10-01 00:00:00.000 3
2012-10-02 00:00:00.000 5
2012-10-03 00:00:00.000 15
Please guide me on how to change my SQL to get the desired result. Thanks.

Try something along these lines. The syntax may not be perfect (I wrote it freehand):
Select
    TheDate
    , Count(*) [Count]
From
(
    select
        convert(varchar,FinishDate,112) TheDate
        , SpecialistName
    from employee
    where convert(varchar,FinishDate,112)>='20121001'
    and convert(varchar,FinishDate,112) <='20121001'
    and JobState='FINISHED'
    group by
        convert(varchar,FinishDate,112)
        , SpecialistName
) t1
Group By
    TheDate
It has to be two selects because the groupings that you want are different. If you did a single select grouping by FinishDate and SpecialistName what you would get would be a count of the distinct combinations of those two.
What you want is to get the distinct SpecialistNames that had at least one entry in a date. Distinct because you care that they had an entry, but not whether they had 1 or 3 or 17. This is done by the inner query.
Then you want to take these distinct SpecialistName with corresponding date and summarize them by FinishDate to get a count of specialists by date. This is done by the outer query.
Part of your comment mentions Distinct and you could in fact use Select Distinct instead of Group By in the inner query since we don’t need a count there. The outer query does require the Group By since you do need a count. My own bias is to use group by rather than distinct in case I need an aggregate function later, but that’s me. It would be perfectly OK to use Select Distinct if you prefer.
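The two-level grouping can be tried on the sample rows from the question with SQLite from Python. This is a sketch under assumptions: SQLite's date() replaces convert(varchar, FinishDate, 112), the date-range filter is left out so that both sample days show up, and a COUNT(DISTINCT ...) variant is included only to show that it gives the same numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (FinishDate TEXT, SpecialistName TEXT, JobState TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("2012-10-01 00:00:00.000", "Josh", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Josh", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Sam", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Robin", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Robin", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Joy", "FINISHED"),
        ("2012-10-01 00:00:00.000", "Joy", "INCOMMING"),
        ("2012-10-02 00:00:00.000", "Joy", "FINISHED"),
    ],
)

# Inner query: one row per (day, specialist). Outer query: specialists per day.
nested = conn.execute(
    """
    SELECT TheDate, COUNT(*) AS Cnt
    FROM (SELECT date(FinishDate) AS TheDate, SpecialistName
          FROM employee
          WHERE JobState = 'FINISHED'
          GROUP BY date(FinishDate), SpecialistName) t1
    GROUP BY TheDate
    ORDER BY TheDate
    """
).fetchall()

# Single-level alternative using COUNT(DISTINCT ...).
flat = conn.execute(
    """
    SELECT date(FinishDate) AS TheDate, COUNT(DISTINCT SpecialistName) AS Cnt
    FROM employee
    WHERE JobState = 'FINISHED'
    GROUP BY date(FinishDate)
    ORDER BY TheDate
    """
).fetchall()
print(nested, flat)
```

On the sample rows, four different specialists finished at least one job on 2012-10-01 and one on 2012-10-02, regardless of how many jobs each of them finished.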

GROUP BY number with aggregate being a date

In a table I have an account number and a corresponding date. From a query I want to get only distinct account numbers, each with a corresponding date. Can I do this in 1 query, or are multiple queries needed? If I use SELECT DISTINCT account, date; then I still get duplicate accounts, since it looks for unique combinations of account and date. If I use GROUP BY, how do I select only 1 date when there are multiple dates for 1 account?

You've got an account number and a date, and the same account number can appear several times if it has more than one date? Then you have a logical problem: which date should SQL select?
account number | date
---------------------------------------
1002 | 2013-01-01
1003 | 2013-03-12
1003 | 2013-03-13
1003 | 2013-03-16
1004 | 2013-06-11
You can use functions like "max" or "min". If you need a more complex logic, let us know:
Select account, max(date) FROM tablename GROUP BY account
So you will get unique account numbers with the latest date.
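As a quick check, the sample rows above can be loaded into SQLite from Python. The table name tablename comes from the answer; the column name txn_date is an assumption chosen for the sketch to avoid using date as a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "txn_date" is a hypothetical column name standing in for "date".
conn.execute("CREATE TABLE tablename (account INTEGER, txn_date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (1002, "2013-01-01"),
        (1003, "2013-03-12"),
        (1003, "2013-03-13"),
        (1003, "2013-03-16"),
        (1004, "2013-06-11"),
    ],
)

# One row per account, carrying that account's latest date.
latest = conn.execute(
    """
    SELECT account, MAX(txn_date)
    FROM tablename
    GROUP BY account
    ORDER BY account
    """
).fetchall()
print(latest)
```

Swapping MAX for MIN returns the earliest date per account instead.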
