I desperately need some help from your brains to solve one SQL problem I have now.
I have a very simple table made of two columns: Client # and Purchasing Date.
I want to add one more column showing how many days have passed since the previous Purchasing Date for each Client #. Below is my current query to create the starting table.
select client_id, purchasing_date
from sales.data
The result looks like this (apparently, I need more reputation to post images):
https://imgur.com/a/IP1ot
The highlighted column on the right is the column I want to create.
Basically, that shows the number of days elapsed since the previous purchasing date of each Client #. For the first purchase of each Client, it will be just 0.
I'm not sure if I have explained enough to help you guys produce solutions - if you have any questions, please let me know.
Thanks!
Use lag():
select client_id, purchasing_date,
       (purchasing_date -
        lag(purchasing_date, 1, purchasing_date) over (partition by client_id
                                                       order by purchasing_date
                                                      )
       ) as day_diff
from sales.data
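For anyone who wants to try the lag() approach locally, here is a minimal sketch using Python's built-in sqlite3 module. The table name sales_data and the sample rows are made up for illustration. SQLite stores dates as text, so this sketch takes the difference through julianday() instead of the plain date subtraction in the query above; your database may behave differently.

```python
import sqlite3

# In-memory database with a hypothetical stand-in for sales.data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client_id INTEGER, purchasing_date TEXT)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        (1, "2017-01-01"),
        (1, "2017-01-11"),
        (1, "2017-02-10"),
        (2, "2017-03-05"),
        (2, "2017-03-06"),
    ],
)

# lag(..., 1, purchasing_date) falls back to the current row's own date
# for the first purchase of each client, so that row gets a difference of 0.
query = """
SELECT client_id,
       purchasing_date,
       CAST(julianday(purchasing_date)
            - julianday(LAG(purchasing_date, 1, purchasing_date) OVER (
                  PARTITION BY client_id
                  ORDER BY purchasing_date)) AS INTEGER) AS day_diff
FROM sales_data
ORDER BY client_id, purchasing_date
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

The partition restarts the comparison for every client, which is why client 2's first row is 0 instead of being compared with client 1's last purchase.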
I have already seen all the related posts, but none have been able to help me.
I have the following fields, where:
SOLD_AT is the date of each transaction
CUSTOMER_ID is a unique ID for each customer
COHORT is the date (Year-Month) of the first purchase of the user in that row
ORDER_MONTH is the date (Year-Month) of the purchase in that row
PERIOD_NUMBER is the date difference in months between COHORT and ORDER_MONTH
N_CUSTOMERS is the number of customers in each PERIOD_NUMBER in each COHORT
In case it is useful, I have the queries with which I obtained these fields, but I think including them would only add noise, since the definition of each variable is more useful.
What I need to do and am not able to do is add an additional field for the retention of each period number of each cohort (not a pivot table by adding the period numbers of each cohort).
Specifically, I need the retention of each period number to be the division of the number of users of that period by the number of users of the previous period, in this way:
To do this in Python, I simply do:
cohort_pivot = df_cohort.pivot_table(index='cohort',
                                     columns='period_number',
                                     values='n_customers')
cohort_size = cohort_pivot.iloc[:, 0]
retention_matrix1 = cohort_pivot.divide(cohort_size, axis=0)
and I can then unpivot and take out the retention for each period of each cohort to create an additional column with this value.
One of the answers I tried, because it was the closest thing I found, was the accepted answer in this post. However, I cannot know in advance how many period numbers (historical months) I am going to have, since the code has to be dynamic for any company that is loaded. (For example, in DBT, which is the tool I'm using, you can create dynamic pivot tables instead of static ones that require this information up front, but as I said, I need to create the field, not the pivot table.)
Any ideas will be more than welcome. Thank you very much.
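One possible way to get the retention as a plain column, without pivoting and without knowing the number of periods in advance, is a window function partitioned by cohort. The sketch below is only an illustration under assumptions: the table name cohort_counts and the sample numbers are invented, and it runs on SQLite through Python's sqlite3 module. It computes both readings of "retention", since the description above divides by the previous period while the pandas snippet divides by the first period of the cohort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding one row per (cohort, period_number).
conn.execute(
    "CREATE TABLE cohort_counts (cohort TEXT, period_number INTEGER, n_customers INTEGER)"
)
conn.executemany(
    "INSERT INTO cohort_counts VALUES (?, ?, ?)",
    [
        ("2021-01", 0, 100),
        ("2021-01", 1, 40),
        ("2021-01", 2, 30),
        ("2021-02", 0, 50),
        ("2021-02", 1, 25),
    ],
)

query = """
SELECT cohort,
       period_number,
       n_customers,
       -- customers in this period divided by customers in the previous period
       1.0 * n_customers / LAG(n_customers) OVER (
           PARTITION BY cohort ORDER BY period_number) AS retention_vs_previous,
       -- customers in this period divided by the cohort's first period
       1.0 * n_customers / FIRST_VALUE(n_customers) OVER (
           PARTITION BY cohort ORDER BY period_number) AS retention_vs_first
FROM cohort_counts
ORDER BY cohort, period_number
"""
retention_rows = conn.execute(query).fetchall()
for row in retention_rows:
    print(row)
```

The first period of each cohort has no previous row, so retention_vs_previous is NULL there. Note also that LAG looks at the previous row, not the previous period number, so a cohort with a missing month would be compared against the last month that does have a row.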
I am quite new to SQL, have been learning for ~3 weeks, and have taken a liking to it. Hoping to polish up my skills before beginning to apply to Data Analyst roles.
I've been working with a dummy dvd-rental database and have found myself unable to solve a challenge given to me by a peer. The question was: "what is the most expensive rental for the 4th customer?"
In the picture, we can see that, based on the nth_customer column, Terrance Roush is the 4th ever customer (he's the 4th ever person to pay). The issue is that the nth_customer column is actually reporting the nth order, and it keeps counting upwards indefinitely. So the next time Terrance shows up, the nth_customer column will not show '4' (which is what I was hoping to achieve).
Would appreciate any feedback on how to solve this. Thank you in advance.
If "the fourth customer" means the customer who did the fourth rental, you can break the problem down into two - finding that fourth customer, and finding their most expensive rental. Something like this:
SELECT *
FROM payment
WHERE customer_id = (
    SELECT customer_id
    FROM payment
    ORDER BY payment_date
    LIMIT 1 OFFSET 3
)
ORDER BY amount DESC
LIMIT 1;
Here I'm finding the ID of the fourth customer in the subquery, using a LIMIT & OFFSET to get just the one record I want. Then in the outer query I'm simply ordering all of that customer's records and taking the one with the biggest amount.
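If "the fourth customer" instead means the fourth distinct person to ever pay, which is how the question describes nth_customer, the subquery could order customers by their first payment instead of ordering individual payments. This is a sketch under that assumption, run on SQLite through Python's sqlite3 module. The sample rows are invented and only the three columns used here are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (customer_id INTEGER, amount REAL, payment_date TEXT)"
)
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (10, 2.99, "2007-02-14 21:21:59"),
        (11, 4.99, "2007-02-14 21:23:39"),
        (10, 0.99, "2007-02-14 21:29:00"),  # repeat visit by the 1st customer
        (12, 6.99, "2007-02-14 21:41:12"),  # 4th payment, but 3rd customer
        (13, 1.99, "2007-02-14 21:44:52"),  # 4th distinct customer
        (13, 9.99, "2007-02-15 09:00:00"),
        (14, 3.99, "2007-02-15 10:00:00"),
    ],
)

# Group by customer and order by each customer's first payment, so a
# returning customer does not take up another position in the ranking.
query = """
SELECT customer_id, amount, payment_date
FROM payment
WHERE customer_id = (
    SELECT customer_id
    FROM payment
    GROUP BY customer_id
    ORDER BY MIN(payment_date)
    LIMIT 1 OFFSET 3
)
ORDER BY amount DESC
LIMIT 1
"""
most_expensive = conn.execute(query).fetchone()
print(most_expensive)
```

With this sample data the two readings differ: the fourth payment belongs to customer 12, while the fourth distinct customer is 13.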
I'm trying to get a count of a number of policies issued per month. This is close to returning the correct information:
SELECT count(policy_no), left(issue_date, 6)
FROM table_a
WHERE indicator = 'fln'
GROUP BY left(issue_date, 6)
The indicator narrows it down to the types of policies I want. The only problem is that there is an entry with an identical policy number every year, as the policy renews. I need to count only the lowest issue date for each policy, not every policy every time. If a policy was issued in November of 2010, I want it to count one time, not once each for November 2010, 2011, 2012, etc. The issue dates are in the format yyyymmdd. Only year and month are relevant.
I'm sure this is an easy one for the more experienced among you, but I haven't been able to piece it together from other questions on this forum. Any help would be appreciated!
Something like this will get what you want:
SELECT LEFT(FirstIssued, 6) AS YYMM, COUNT(DISTINCT Policy_No) AS NumPolicies
FROM
(
    SELECT Policy_No, MIN(issue_date) AS FirstIssued
    FROM table_a
    WHERE indicator = 'fln'
    GROUP BY Policy_No
) A
GROUP BY LEFT(FirstIssued, 6)
The key is to first find the min date for each policy, before aggregating the counts. Note that the only months you will have appear are those with at least one policy, so if you would prefer to have 0s you need to add in a date generator.
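To see the "min date first, then count" idea on a small data set, here is a rough sketch using Python's sqlite3 module. The rows are invented, and because SQLite has no LEFT() function the sketch uses substr(..., 1, 6) in its place, so treat the syntax as an adaptation, not the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (policy_no TEXT, issue_date TEXT, indicator TEXT)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        ("P1", "20101115", "fln"),  # first issue
        ("P1", "20111115", "fln"),  # renewal, should not be counted
        ("P1", "20121115", "fln"),  # renewal, should not be counted
        ("P2", "20101120", "fln"),
        ("P2", "20111120", "fln"),  # renewal
        ("P3", "20110105", "fln"),
        ("P4", "20101101", "xyz"),  # different indicator, filtered out
    ],
)

# Inner query: earliest issue date per policy.
# Outer query: count policies by the yyyymm prefix of that earliest date.
query = """
SELECT substr(first_issued, 1, 6) AS yyyymm,
       COUNT(*) AS num_policies
FROM (
    SELECT policy_no, MIN(issue_date) AS first_issued
    FROM table_a
    WHERE indicator = 'fln'
    GROUP BY policy_no
) AS first_issue
GROUP BY substr(first_issued, 1, 6)
ORDER BY yyyymm
"""
policy_counts = conn.execute(query).fetchall()
print(policy_counts)
```

Because the inner query already returns one row per policy, a plain COUNT(*) gives the same result here as COUNT(DISTINCT ...). MIN() on the text column works only because yyyymmdd strings sort in date order.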
I want to get the average number of days between some dates. For example, I have a table called Patient that has the id of the registration, the patient's id, the entry date and the final date:
(1,1,'07-04-2014','08-04-2014'),
(2,2,'07-04-2014','07-04-2014'),
(3,3,'08-04-2014','10-04-2014'),
(4,4,'09-04-2014','10-04-2014')
I want to get the average of days of the entry fields. I have tried a lot of things, but I only get random results. I tried with DATEDIFF, but it needs two arguments and I only need one.
You could get the average DURATION between a fixed date and the date field. But averaging a date doesn't really make sense.
SELECT AVG(DATEDIFF(DD,'19700101',dateField)) AS avgDays
You could say the "average" date would then be: DATEADD(DD,avgDays,'19700101')
But I'm not sure if that makes sense in the context of what you're trying to do.
Thanks for answering. Maybe I couldn't express what I wanted to do, but I found a solution and it was actually very simple:
select DetailPatient.Name,
       avg(day(Patient.FinalDate) - day(Patient.EntryDate)) as [Average]
from Patient, DetailPatient
where Patient.IdPatient = DetailPatient.IdPatiene
group by DetailPatient.Name
I know it looks very simple haha, but it is the first time I have used the avg function this way.
Thank you guys.
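One caveat for later readers: subtracting day-of-month values only works while the entry and final dates fall in the same month. The sketch below, using Python's sqlite3 module, compares that approach with a full date difference. It assumes the sample dates above are dd-mm-yyyy and rewrites them as ISO yyyy-mm-dd text so SQLite's julianday() can parse them. The fifth row, which crosses a month boundary, and the IdRegistration column name are additions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Patient (IdRegistration INTEGER, IdPatient INTEGER, "
    "EntryDate TEXT, FinalDate TEXT)"
)
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2014-04-07", "2014-04-08"),
        (2, 2, "2014-04-07", "2014-04-07"),
        (3, 3, "2014-04-08", "2014-04-10"),
        (4, 4, "2014-04-09", "2014-04-10"),
        (5, 5, "2014-04-30", "2014-05-02"),  # added row: stay spans two months
    ],
)

query = """
SELECT
    -- full date difference in days
    AVG(julianday(FinalDate) - julianday(EntryDate)) AS avg_days,
    -- day-of-month difference, as in the query above
    AVG(CAST(strftime('%d', FinalDate) AS INTEGER)
        - CAST(strftime('%d', EntryDate) AS INTEGER)) AS naive_avg
FROM Patient
"""
avg_days, naive_avg = conn.execute(query).fetchone()
print(avg_days, naive_avg)
```

The stay that crosses the month boundary contributes 2 days to the first average but 2 - 30 = -28 to the second. In SQL Server the equivalent of the first expression would presumably be DATEDIFF(DD, EntryDate, FinalDate), the same function used in the answer above.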
Suppose I have a table which holds all the billing records. Now I want to see the sales trend for a user-given time duration, grouped into 3-day periods. What should the SQL query for this be?
Please help, otherwise I am gone...
I can only give a vague suggestion based on the question as asked. You may want a derived column holding a standardised date (as in the MS date format, just a number per day), and then integer-divide that number by 3 (or, equivalently, subtract its modulus 3) so that every day in the same 3-day period gets the same value. You can then group and aggregate over this column to get the values for each 3-day period. To display the date nicely, you would have to multiply back by 3 and convert the column to a date again.
Again I'm not sure of the specifics, but I think this general idea could be achieved to get a result (may well not be the best way so it would help to add more to the question...)
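To make the general idea a bit more concrete, here is one way it could look, sketched with Python's sqlite3 module. Everything specific is an assumption: the billing table, its bill_date and amount columns, the sample rows, and the choice to count 3-day periods from the start of the user-given range. julianday() plays the role of the "number per day" column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical billing table; amounts are whole numbers to keep sums exact.
conn.execute("CREATE TABLE billing (bill_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO billing VALUES (?, ?)",
    [
        ("2024-01-01", 10),
        ("2024-01-02", 20),
        ("2024-01-03", 5),
        ("2024-01-04", 7),
        ("2024-01-06", 3),
        ("2024-01-07", 50),
        ("2024-01-15", 99),  # outside the requested range
    ],
)

# bucket = whole days since the range start, integer-divided by 3.
# Multiplying the bucket back by 3 recovers the first day of each period.
query = """
SELECT date(:start, '+' || (bucket * 3) || ' days') AS period_start,
       SUM(amount) AS total
FROM (
    SELECT CAST(julianday(bill_date) - julianday(:start) AS INTEGER) / 3 AS bucket,
           amount
    FROM billing
    WHERE bill_date >= :start AND bill_date < :end
)
GROUP BY bucket
ORDER BY bucket
"""
params = {"start": "2024-01-01", "end": "2024-01-10"}
trend = conn.execute(query, params).fetchall()
for period_start, total in trend:
    print(period_start, total)
```

Periods with no billing records simply do not appear in the result. If they need to show up as zeros, the buckets would have to be joined against a generated list of period start dates.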