I'm trying to get a count of the number of policies issued per month. This is close to returning the correct information:
SELECT count(policy_no), left(issue_date,6)
FROM table_a
WHERE indicator = 'fln'
GROUP BY left(issue_date,6)
The indicator narrows it down to the types of policies I want. The only problem I'm having is that there will be an entry with an identical policy number every year as the policy renews. I need to count only the lowest issue date for each policy, not every renewal. If a policy was issued in November of 2010, I want it counted that one time, not once each for November 2010, 2011, 2012, etc. The issue dates are in the format yyyymmdd; only year and month are relevant.
I'm sure this is an easy one for the more experienced among you; I haven't been able to piece it together from other questions on this forum. Any help would be appreciated!
Something like this will get what you want:
SELECT LEFT(FirstIssued, 6) AS YYYYMM, COUNT(DISTINCT Policy_No) AS NumPolicies
FROM
(
SELECT Policy_No, MIN(issue_date) AS FirstIssued
FROM table_a
WHERE indicator = 'fln'
GROUP BY Policy_No
) A
GROUP BY LEFT(FirstIssued,6)
The key is to first find the min date for each policy before aggregating the counts. Note that the only months that will appear are those with at least one policy, so if you would prefer to have 0s you need to add in a date generator, as in the sketch below.
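For example, a minimal sketch of that idea, assuming PostgreSQL's generate_series is available and that issue_date is stored as a yyyymmdd string (the date bounds below are placeholders):

-- One row per month, LEFT JOINed to the per-policy minimums, so months
-- with no new policies still show up with a count of 0.
SELECT TO_CHAR(m.month_start, 'YYYYMM') AS YYYYMM,
       COUNT(p.Policy_No) AS NumPolicies
FROM generate_series(DATE '2010-01-01', DATE '2012-12-01',
                     INTERVAL '1 month') AS m(month_start)
LEFT JOIN (
    SELECT Policy_No, MIN(issue_date) AS FirstIssued
    FROM table_a
    WHERE indicator = 'fln'
    GROUP BY Policy_No
) p ON LEFT(p.FirstIssued, 6) = TO_CHAR(m.month_start, 'YYYYMM')
GROUP BY m.month_start
ORDER BY m.month_start;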
I have already seen all the related posts, but none have been able to help me.
I have the following fields:
Where:
SOLD_AT is the date of each transaction
CUSTOMER_ID is a unique ID for each customer
COHORT is the date (Year-Month) of the first purchase of the user in that row
ORDER_MONTH is the date (Year-Month) of the purchase in that row
PERIOD_NUMBER is the date difference in months between COHORT and ORDER_MONTH
N_CUSTOMERS is the number of customers in each PERIOD_NUMBER in each COHORT
In case it's useful, I have the queries with which I obtained these fields, but I think including them would only add noise, since the definition of each variable above is more useful.
What I need to do, and am not able to, is add an additional field with the retention of each period number of each cohort (not a pivot table made by spreading out the period numbers of each cohort).
Specifically, I need the retention of each period number to be the number of users in that period divided by the number of users in the previous period, in this way:
To do this in Python, I simply do:
cohort_pivot = df_cohort.pivot_table(index='cohort',
                                     columns='period_number',
                                     values='n_customers')
cohort_size = cohort_pivot.iloc[:, 0]  # period 0, i.e. the cohort size
retention_matrix1 = cohort_pivot.divide(cohort_size, axis=0)
and I can then unpivot and take out the retention for each period of each cohort to create an additional column with this value.
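For reference, the shape of what I'm after expressed directly in SQL, as a sketch only (it assumes a dialect with window functions; df_cohort stands in for the model holding the fields defined above):

-- Same division as the pandas code: each period's n_customers divided by
-- the cohort's period-0 value. Swapping FIRST_VALUE for LAG(n_customers)
-- would divide by the previous period instead.
SELECT cohort,
       period_number,
       n_customers,
       n_customers * 1.0
           / FIRST_VALUE(n_customers) OVER (PARTITION BY cohort
                                            ORDER BY period_number)
           AS retention
FROM df_cohort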
One of the answers I tried, because it was the closest thing I saw, was the accepted answer in this post, but I am not able to know in advance the number of period_numbers or historical months I am going to have, since the code has to be dynamic for any company that is loaded. (For example, DBT, which is the tool I'm using, can create dynamic pivot tables instead of static ones that require knowing this information, but as I say, I need to create the field, not the pivot table.)
Any ideas will be more than welcome. Thank you very much!
I desperately need some help from your brains to solve one SQL problem I have now.
I have a very simple table made of two columns: Client # and Purchasing Date.
I want to add one more column showing how many days have passed since the previous Purchasing Date for each Client #. Below is my current query to create the starting table.
select client_id, purchasing_date
from sales.data
The result looks like this (apparently, I need more reputation to post images):
https://imgur.com/a/IP1ot
The highlighted column on the right is the column I want to create.
Basically, it shows the number of days elapsed since the previous purchasing date of each Client #. For the first purchase of each Client, it will just be 0.
I'm not sure if I have explained enough to help you guys produce solutions - if you have any questions, please let me know.
Thanks!
Use lag():
select client_id, purchasing_date,
       -- the third argument to lag() supplies a default for each client's
       -- first row, so the first purchase yields a day_diff of 0
       (purchasing_date -
        lag(purchasing_date, 1, purchasing_date)
            over (partition by client_id order by purchasing_date)
       ) as day_diff
from sales.data
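Note that subtracting two dates yields a number of days in dialects such as Postgres and Oracle. If you are on SQL Server, where dates cannot be subtracted directly, the equivalent (a sketch, assuming SQL Server 2012+ for the three-argument lag) is:

select client_id, purchasing_date,
       datediff(day,
                lag(purchasing_date, 1, purchasing_date)
                    over (partition by client_id order by purchasing_date),
                purchasing_date) as day_diff
from sales.data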
I'm currently working on a project in which I want to aggregate data (resolution = 15 minutes) to weekly values.
I have 4 weeks and the view should include a value for each week AND every station.
My dataset includes more than 50 stations.
What I have is this:
select name, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name
order by name
But it only displays the avg values across all weeks combined. What I need is avg values for each week and each station.
Thanks for your help!
The problem is that when you GROUP BY just name, the weeks are flattened into each name group, and you can only reach them through aggregate functions.
Your best option is to do a GROUP BY on both name and week so something like:
select name, week, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name, week
order by name, week
PS - It's not entirely clear whether you're suggesting that you need one set of results for stations and one for weeks, or whether you need a set of results for every week at every station (which this answer provides the solution for). If you require the former, then separate queries are the way to go.
I've got a table with purchase orders stored in it. Each row has a timestamp indicating when the order was placed. I'd like to be able to create a report indicating the number of purchases each day, month, or year. I figured I would do a simple SELECT COUNT(xxx) FROM tbl_orders GROUP BY tbl_orders.purchase_time and get the value, but it turns out I can't GROUP BY a timestamp column.
Is there another way to accomplish this? I'd ideally like a flexible solution so I could use whatever timeframe I needed (hourly, monthly, weekly, etc.) Thanks for any suggestions you can give!
This does the trick without the date_trunc function (easier to read).
-- 2014
select created_on::DATE from users group by created_on::DATE

-- updated September 2018 (thanks to #wegry)
select created_on::DATE as co from users group by co
What we're doing here is casting the original value to a DATE, rendering the time-of-day component of the value inconsequential.
Grouping by a timestamp column works fine for me here, keeping in mind that even a 1-microsecond difference will prevent two rows from being grouped together.
To group by larger time periods, group by an expression on the timestamp column that returns an appropriately truncated value. date_trunc can be useful here, as can to_char.
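For example, a sketch against the table described above, assuming PostgreSQL (swap 'month' for 'day', 'week', or 'hour' to change the granularity):

-- One row per month with the number of orders placed in it.
SELECT date_trunc('month', purchase_time) AS period,
       COUNT(*) AS num_orders
FROM tbl_orders
GROUP BY date_trunc('month', purchase_time)
ORDER BY period;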
Suppose I have a table which holds all the billing records. Now I want to see the sales trend for a given user over a given time duration, grouped into 3-day periods. What should the SQL query for this be?
Please help, otherwise I am gone...
I can only give a vague suggestion given the question, but you may want to have a derived column with a standardised date (as in the MS date format, just a number per day) on which you can then use integer division by 3, so that days fall into equal 3-day periods. You can then group and aggregate over this column to get the values for each 3-day period. Obviously, to display the date nicely you would have to multiply back and convert your column as well.
Again, I'm not sure of the specifics, but I think this general idea could be used to get a result (it may well not be the best way, so it would help to add more to the question).
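A sketch of that idea, assuming SQL Server syntax and a hypothetical billing(user_id, bill_date, amount) table:

-- Integer-dividing a day count by 3 assigns every date to a 3-day bucket
-- counted from an arbitrary anchor date ('2010-01-01' here is a placeholder).
SELECT DATEDIFF(day, '2010-01-01', bill_date) / 3 AS bucket_no,
       DATEADD(day, (DATEDIFF(day, '2010-01-01', bill_date) / 3) * 3,
               '2010-01-01') AS bucket_start,
       SUM(amount) AS total_sales
FROM billing
GROUP BY DATEDIFF(day, '2010-01-01', bill_date) / 3
ORDER BY bucket_no;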