GROUP BY number with aggregate being a date - sql

In a table I have an account number and a corresponding date. From a query I want to get only distinct account numbers and a corresponding date. Can I do this in one query, or do I need multiple queries? If I use: SELECT DISTINCT account, date; then I still get duplicate accounts, since it looks for unique combinations of account and date. If I use GROUP BY, how do I select only one date when there are multiple dates for one account?

So you have an account number and a date, and an account number can appear more than once when it has several dates? Then there is a logical problem: which date should SQL select?
account number | date
---------------------------------------
1002 | 2013-01-01
1003 | 2013-03-12
1003 | 2013-03-13
1003 | 2013-03-16
1004 | 2013-06-11
You can use aggregate functions like "max" or "min". If you need more complex logic, let us know:
Select account, max(date) FROM tablename GROUP BY account
So you will get unique account numbers with the latest date.
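As a quick check of the query above, here is a minimal sketch that runs it against the sample rows. It assumes an in-memory SQLite database driven from Python's sqlite3 module (not necessarily the DBMS in the question) and that the dates are stored as ISO-formatted text, so MAX compares them chronologically.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (account INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (1002, "2013-01-01"),
        (1003, "2013-03-12"),
        (1003, "2013-03-13"),
        (1003, "2013-03-16"),
        (1004, "2013-06-11"),
    ],
)

# One row per account, carrying the latest date for that account.
latest_per_account = conn.execute(
    "SELECT account, MAX(date) FROM tablename GROUP BY account ORDER BY account"
).fetchall()

for account, latest_date in latest_per_account:
    print(account, latest_date)
```

Swapping MAX for MIN would return the earliest date per account instead.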

Related

Subsetting on dates for a SQL query

Using Snowflake, I am attempting to subset on customers that have no current subscriptions, and eliminating all IDs for those which have current/active contracts.
Each ID will typically have multiple records associated with a contract/renewal history for a particular ID/customer.
A customer is inactive only if none of its contracts ends after the current date. There are likely multiple past contracts which have lapsed, but the account is still active if any one of its contract end dates goes beyond the current date.
Consider the following table:
Date_Start | Date_End | Name | ID
---------------------------------------
2015-07-03 | 2019-07-03 | Piggly | 001
2019-07-04 | 2025-07-04 | Piggly | 001
2013-10-01 | 2017-12-31 | Doggy | 031
2018-01-01 | 2018-06-30 | Doggy | 031
2020-01-01 | 2021-03-14 | Catty | 022
2021-03-15 | 2024-06-01 | Catty | 022
1999-06-01 | 2021-06-01 | Horsey | 052
2021-06-02 | 2022-01-01 | Horsey | 052
2022-01-02 | 2022-07-04 | Horsey | 052
The desired output is the non-active customers, i.e. those with no end date beyond Jan 5th 2023 (or the current date, or any arbitrary date):
Name | ID
---------------------------------------
Doggy | 031
Horsey | 052
My first attempt was:
SELECT Name, ID
FROM table
WHERE Date_End < GETDATE()
but the obvious problem is that I'll also be selecting past contracts of customers who haven't expired/churned and who have a contract that goes beyond the current date.
How do I resolve this?
As there are many rows per name and ID, you should aggregate the data and then use a HAVING clause to select only those you are interested in.
SELECT name, id
FROM table
GROUP BY name, id
HAVING MAX(date_end) < GETDATE();
You can work it out with an EXCEPT operator, if your DBMS supports it:
SELECT DISTINCT Name, ID FROM tab
EXCEPT
SELECT DISTINCT Name, ID FROM tab WHERE Date_end > <your_date>
This removes the active <Name, ID> pairs from the full set of pairs.
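To compare the two answers, here is a rough sketch that runs both against the sample table. It assumes SQLite through Python's sqlite3 module rather than Snowflake, so GETDATE() is replaced by a fixed cutoff parameter (2023-01-05, the date used in the question), the table is named tab, and the dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# IDs are stored as text so the leading zeros from the question survive.
conn.execute("CREATE TABLE tab (Date_Start TEXT, Date_End TEXT, Name TEXT, ID TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("2015-07-03", "2019-07-03", "Piggly", "001"),
        ("2019-07-04", "2025-07-04", "Piggly", "001"),
        ("2013-10-01", "2017-12-31", "Doggy", "031"),
        ("2018-01-01", "2018-06-30", "Doggy", "031"),
        ("2020-01-01", "2021-03-14", "Catty", "022"),
        ("2021-03-15", "2024-06-01", "Catty", "022"),
        ("1999-06-01", "2021-06-01", "Horsey", "052"),
        ("2021-06-02", "2022-01-01", "Horsey", "052"),
        ("2022-01-02", "2022-07-04", "Horsey", "052"),
    ],
)

cutoff = "2023-01-05"  # stands in for GETDATE() in the original queries

# Approach 1: aggregate per customer and keep those whose latest end date has passed.
inactive_having = conn.execute(
    "SELECT Name, ID FROM tab GROUP BY Name, ID "
    "HAVING MAX(Date_End) < ? ORDER BY Name",
    (cutoff,),
).fetchall()

# Approach 2: all customers minus those with any contract ending after the cutoff.
inactive_except = conn.execute(
    "SELECT DISTINCT Name, ID FROM tab "
    "EXCEPT "
    "SELECT DISTINCT Name, ID FROM tab WHERE Date_End > ? "
    "ORDER BY Name",
    (cutoff,),
).fetchall()

print(inactive_having)
print(inactive_except)
```

The two queries only differ for a contract that ends exactly on the cutoff date: the HAVING version uses a strict less-than, while the EXCEPT version removes only end dates strictly greater than the cutoff.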

Grouping the data having two different dates and take the latest date

Table:1
Date Customer Amount
12-Dec ABC 200
15-Dec ABC 300
Output:
I need to group the data by Customer and need to take the latest date for that unique record.
Date Customer Amount
15-Dec ABC 500
You seem to want aggregation:
select max(to_date(date, 'DD-MON-YYYY')), cust, sum(amount)
from table t
group by cust;
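The same aggregation can be tried out with a small sketch. It assumes SQLite via Python's sqlite3 module, a hypothetical table name sales, and ISO-formatted dates with an assumed year of 2023 (the question gives no year), so no to_date conversion is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical table name; the year 2023 is an assumption.
conn.execute("CREATE TABLE sales (date TEXT, cust TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2023-12-12", "ABC", 200), ("2023-12-15", "ABC", 300)],
)

# Latest date and total amount per customer.
per_customer = conn.execute(
    "SELECT MAX(date), cust, SUM(amount) FROM sales GROUP BY cust ORDER BY cust"
).fetchall()

print(per_customer)
```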

Query to find value in column dependent on a different column in table being the minimum date

I have a dataset that looks like this. I would like to pull a distinct id, the minimum date and value on the minimum date.
id date value
1 01/01/2020 0.5
1 02/01/2020 1
1 03/01/2020 2
2 01/01/2020 3
2 02/01/2020 4
2 03/01/2020 5
This code will pull the id and the minimum date
select Distinct(id), min(nav_date)
from table
group by id
How can I get the value on the minimum date so the output of my query looks like this?
id date value
1 01/01/2020 0.5
2 01/01/2020 3
Use distinct on:
select distinct on (id) t.*
from t
order by id, date;
This can take advantage of an index on (id, date) and is typically the fastest way to do this operation in Postgres.
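DISTINCT ON is specific to Postgres. For other databases, one portable alternative (a different technique from the answer above) is to join the table to a subquery holding each id's minimum date. The sketch below assumes SQLite via Python's sqlite3 module, a table named t, and ISO-formatted dates; the exact calendar dates are an assumption, since the question's format is ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 0.5),
        (1, "2020-02-01", 1),
        (1, "2020-03-01", 2),
        (2, "2020-01-01", 3),
        (2, "2020-02-01", 4),
        (2, "2020-03-01", 5),
    ],
)

# Find the minimum date per id, then join back to pick up the value on that date.
first_rows = conn.execute(
    """
    SELECT t.id, t.date, t.value
    FROM t
    JOIN (SELECT id, MIN(date) AS min_date FROM t GROUP BY id) m
      ON m.id = t.id AND m.min_date = t.date
    ORDER BY t.id
    """
).fetchall()

for row in first_rows:
    print(row)
```

If an id has several rows sharing its minimum date, this join returns all of them, whereas DISTINCT ON returns exactly one row per id.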

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data, the given structure is poor database design. Create two separate tables - one called users and one called appointments. The users table contains the user id, enrollment date and any other specific user information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables together and calculates the difference between the enrollment date and the maximum (most recent) appointment date.
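A possible shape for that query, using the two example tables above, is sketched here. It assumes SQLite via Python's sqlite3 module, which has no DATEDIFF, so the month difference is computed from the year and month parts. That counts month boundaries crossed, which is how SQL Server's DATEDIFF(MONTH, ...) behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER, Enrollment_Date TEXT)")
conn.execute("CREATE TABLE Appointments (ID INTEGER, Appointment_Date TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [(1, "2018-01-01"), (2, "2018-03-02"), (3, "2018-05-02")],
)
conn.executemany(
    "INSERT INTO Appointments VALUES (?, ?)",
    [(1, "2018-01-02"), (1, "2018-02-02"), (1, "2018-02-10"), (2, "2018-05-01")],
)

# Latest appointment per user, then months between enrollment and that date.
# LEFT JOIN keeps users without appointments; their month count is NULL.
months_since_enrollment = conn.execute(
    """
    SELECT u.ID,
           (CAST(strftime('%Y', l.last_appt) AS INTEGER)
            - CAST(strftime('%Y', u.Enrollment_Date) AS INTEGER)) * 12
           + (CAST(strftime('%m', l.last_appt) AS INTEGER)
              - CAST(strftime('%m', u.Enrollment_Date) AS INTEGER)) AS months
    FROM Users u
    LEFT JOIN (SELECT ID, MAX(Appointment_Date) AS last_appt
               FROM Appointments
               GROUP BY ID) l
      ON l.ID = u.ID
    ORDER BY u.ID
    """
).fetchall()

for user_id, months in months_since_enrollment:
    print(user_id, months)
```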
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the count of months from the enrolment date to the final appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
And if you want the count of months from the enrolment date to the earliest appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
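Since the question also asks how to transform the table, here is a minimal sketch of unpivoting the wide layout into one row per appointment. It assumes SQLite via Python's sqlite3 module, a cut-down TableA with only three appointment columns, a hypothetical target table named Appointments, and illustrative ISO dates loosely based on the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the wide table: three appointment columns instead of 150.
conn.execute(
    "CREATE TABLE TableA (ID INTEGER, Enrolment_Date TEXT, "
    "Appointment1_Date TEXT, Appointment2_Date TEXT, Appointment3_Date TEXT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?, ?)",
    [
        (112, "2015-01-01", "2015-02-01", "2018-03-01", "2018-08-01"),
        (113, "2018-06-01", "2018-07-01", None, None),
        (114, "2018-04-01", "2018-05-01", "2018-06-01", None),
    ],
)

# Hypothetical long-format target: one row per (ID, appointment date).
conn.execute("CREATE TABLE Appointments (ID INTEGER, Appointment_Date TEXT)")

# Copy each appointment column into the long table, skipping empty slots.
# The column names are generated here, never taken from user input.
appointment_columns = [f"Appointment{i}_Date" for i in range(1, 4)]
for column in appointment_columns:
    conn.execute(
        f"INSERT INTO Appointments SELECT ID, {column} FROM TableA "
        f"WHERE {column} IS NOT NULL"
    )

# With one row per appointment, the latest date is a plain MAX per ID.
summary = conn.execute(
    "SELECT ID, COUNT(*), MAX(Appointment_Date) FROM Appointments "
    "GROUP BY ID ORDER BY ID"
).fetchall()

for row in summary:
    print(row)
```

For the real table the range would cover all 150 columns, and the long table no longer cares how many appointments a person has.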

SQL Group By - counting records per month/year, error on insert - NOT A VALID MONTH

I have this example data:
Country | Members | Joined
USA | 250 | 1/1/2012
USA | 100 | 1/8/2012
Russia | 75 | 1/20/2012
USA | 150 | 2/10/2012
When I query this data I would like to aggregate all the records in a given month. The result of the query would look like:
Country | Members | Joined
USA | 350 | 1/2012
Russia | 75 | 1/2012
USA | 150 | 2/2012
As a select that is simple enough:
select country, count(*) as members , to_char(trunc(joined), 'MM-YYYY')
from table
group by country, to_char(trunc(joined), 'MM-YYYY')
That query will give me data in the format I want, however my issue is that when I go to insert that into a new pivot table I get an error because the to_char() in the select statement is being placed into a DATETIME column (error: ORA-01843 - not a valid month)
When I change the to_char() in the select to to_date() , it still doesn't work (same error, ORA-01843 - not a valid month):
select country, count(*) as members, to_date(trunc(joined), 'MM-YYYY')
from table
group by country, to_date(trunc(joined), 'MM-YYYY')
Any suggestions on how to modify this query in such a way that I can insert the result into a new table whose "JOINED" column is of type DATETIME?
thanks in advance for any tips/suggestions/comments!
You can do something like to_date('01/'||to_char(joined, 'MM/YYYY'), 'DD/MM/YYYY'), which turns the month string into a valid date first.
You just need to decide whether to use the first or last day of the month (last is more complicated).
Another option is to use the EXTRACT function:
select country, count(*) as members, EXTRACT(MONTH FROM joined) as mn, EXTRACT(YEAR FROM JOINED) as yr,MIN(JOINED) as dt
from table
group by country, EXTRACT(MONTH FROM joined), EXTRACT(YEAR FROM JOINED)
and then from that, you could just select the dt column and insert it
You should be using the trunc function to truncate the date to the first of the month. That eliminates the conversion of the date to a string and the need to convert the string back to a date.
select country,
count(*) as members ,
trunc(joined, 'MM')
from table
group by country,
trunc(joined, 'MM')
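The month-truncation idea can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module, where date(joined, 'start of month') plays the role of Oracle's trunc(joined, 'MM'), a hypothetical table name memberships, and ISO-formatted dates. It also uses SUM(members) rather than count(*), on the assumption that the totals in the question's expected output (350 for USA in January) are sums of the Members column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "memberships" is a hypothetical table name for the question's sample data.
conn.execute("CREATE TABLE memberships (country TEXT, members INTEGER, joined TEXT)")
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [
        ("USA", 250, "2012-01-01"),
        ("USA", 100, "2012-01-08"),
        ("Russia", 75, "2012-01-20"),
        ("USA", 150, "2012-02-10"),
    ],
)

# Group on the first day of each month; the result is still a date value,
# so it can be inserted into a date column without string conversion.
monthly = conn.execute(
    """
    SELECT country,
           SUM(members) AS members,
           date(joined, 'start of month') AS joined_month
    FROM memberships
    GROUP BY country, date(joined, 'start of month')
    ORDER BY joined_month, country
    """
).fetchall()

for row in monthly:
    print(row)
```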