Finding repeat rate using sql - sql

I have a table mentioned below:
If X customers had made the purchase in the month of Jan, how many of them made them in Feb too i.e Y. (Repeat Rate: Y/X*100)
customer_no month
---------------------
1 jan
2 jan
3 jan
4 jan
11 jan
1 feb
2 feb
3 feb
9 feb
10 feb
Output:
Repeat_Rate
60%

i would do it like:
SELECT CAST(COUNT(yourtable_feb.customer_no) as FLOAT)
/ CAST(COUNT(yourtable_jan.customer_no) AS FLOAT) AS Repeating_Rate
FROM yourtable yourtable_jan
LEFT JOIN yourtable yourtable_feb
ON yourtable_jan.customer_no = yourtable_feb.customer_no
AND yourtable_feb.mymonth = 'feb'
WHERE yourtable_jan.mymonth = 'jan'
here a rextester, if you'd like to retest my query:
http://rextester.com/ESO11614

Related

T-SQL - How to select two columns data of same table by grouping and using two different where clause

I have orders table with two columns sales_amt, Disc_Amt. I want to find the each month total sales amount having discounts and without discount
Table:
month
sales_amt
Disc_Amt
Jan
20
Jan
30
Feb
5
2
Feb
30
Feb
10
5
Mar
80
10
Mar
20
Result required:
month
Sales_Amount_Without_Discount
Sales_Amount_With_Discount
Jan
50
0
Feb
30
15
Mar
80
20
I tried with the below query
Select
month(invoice_date),
sales_amt as Sales_Amount_Without_Discount,
Sales_Amount_Without_Discount = (select
sales_amt
from
orders
where
disc_amt > '0'
)
from
orders
Group by
month(invoice_date)
Currently I am getting result as below
month
Sales_Amount_Without_Discount
Sales_Amount_With_Discount
Jan
50
110
Feb
30
110
Mar
80
110
Thank you
You can do this by putting a case expression within your SUM(), e.g.
SELECT MONTH(invoice_date),
Sales_Amount_Without_Discount = SUM(sales_amt),
Sales_Amount_With_Discount = SUM(CASE WHEN Disc_Amt > 0 THEN sales_amt ELSE 0 END)
FROM orders
GROUP BY
MONTH(invoice_date);

SQL query to Find highest value in table and sum the corresponding value

I would like to group Highest values in month column group by year and Sum the value column
value
Year
Month
4
2019
10
1
2019
11
5
2019
11
1
2019
11
1
2019
12
8
2019
12
1
2019
12
1
2020
1
10
2020
1
3
2021
1
2
2021
2
11
2021
2
1
2021
2
3
2021
2
2
2021
3
In above table I would like to extract highest value of month group by year
in year 2019 highest month is 12 so there are 3 rows and sum of value column will be 10
The output should be
value
Year
Month
10
2019
12
11
2020
1
2
2021
3
supposing that the table is called "example_table" you can use the following query:
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
select year, max(month) "month"
from example_table
group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year

big query SQL - repeatedly/recursively change a row's column in the select statement based on the values in previous row

I have table like below
customer
date
end date
1
jan 1 2021
jan 30 2021
1
jan 2 2021
jan 31 2021
1
jan 3 2021
feb 1 2021
1
jan 27 2021
feb 26 2021
1
feb 3 2021
mar 5 2021
2
jan 2 2021
jan 31 2021
2
jan 10 2021
feb 9 2021
2
feb 10 2021
mar 12 2021
Now, I wanted to update the value in the 'end date' column of a row based on the values in the previous row 'end date' and the current row 'date'.
Say if the date in current row < end date of the previous row, I wanted to update the end date of the current row = (end date of the previous row).
I Wanted to do this repeated for all the rows (grouped by customer).
I want the output as below. Just need it in the select statement instead of a updating/inserting in a table.
Note - in below as the second row(end date) is updated with the value in the first row (jan 30 2021), now the third row value (jan 3 2021) is evaluated against the updated value in the second row (which is jan 30 2021) but not with the second row value before update (jan 31 2021).
customer
date
end date
1
jan 1 2021
jan 30 2021
1
jan 2 2021
jan 30 2021 [updated because current date < previous end date]
1
jan 3 2021
jan 30 2021[updated because current date < previous end date]
1
jan 27 2021
jan 30 2021 [updated because current date < previous end date]
1
feb 3 2021
mar 5 2021
2
jan 2 2021
jan 31 2021
2
jan 10 2021
jan 31 2021[updated because current date < previous end date]
2
feb 10 2021
mar 12 2021
I think I should go this way. I use the datasource twice just to get the way its needed to perform the operation without updating or inserting into the table.
input table:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12
code:
with num_raw_data as (
SELECT row_number() over(partition by customer)as num, customer,date_init,date_end
FROM `project-id.data-set.table`
), analyzed_data as(
select r.num,
r.customer,
r.date_init,
r.date_end,
case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
from num_raw_data r
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num
output:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30
1|2021-01-27|2021-01-30
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31
2|2021-02-10|2021-03-12
Using column validation from analyzed_data to get to know where I should be looking for changes. I'm not sure if its fast (probably not) but it works for the scenario you bring in your question.

Adding set lists of future dates to rows in a SQL query

So I am doing a cohort analysis for customers, where a cohort is a group of people who started using the product in the same month. I then keep track of each cohort's total use for every subsequent month up till present time.
For example, the first "cohort month" is January 2012, then I have "use months" January 12, Feb 12, March 12, ..., March 17(current month). One column is "cohort month", and another is "use month". This process repeats for every subsequent cohort month. The table looks like:
Jan 12 | Jan 12
Jan 12 | Feb 12
...
Jan 12 | Mar 17
Feb 12 | Feb 12
Feb 12 | Mar 12
...
Feb 12 | Mar 17
...
Feb 17 | Feb 17
Feb 17 | Mar 17
Mar 17 | Mar 17
The problem arises because I want to do forecasting for one year out for both existing and future cohorts.
That means for the Jan 12 cohort, I want to do prediction for April 17 to Mar 18.
I also want to do predictions for the April 17 cohort (which doesn't exist yet) from April 17 to Mar 18. And so on till predictions for the Mar 18 cohort in Mar 18.
I can handle the predictions, don't worry about that.
My issue is that I cannot figure out how to add in this list of (April 17 .. Mar 17) in the "use month" column before every cohort switches.
I also need to add in cohorts April 17 to Mar 18, and have the applicable parts of this list of (April 17 ... Mar 17) for each of these future cohorts.
So I want the table to look like:
Jan 12 | Jan 12
Jan 12 | Feb 12
...
Jan 12 | Mar 17
Jan 12 | Apr 17
..
Jan 12 | Mar 18
Feb 12 | Feb 12
Feb 12 | Mar 12
...
Feb 12 | Mar 17
Feb 12 | Apr 17
...
Feb 12 | Mar 18
...
...
Feb 17 | Feb 17
Feb 17 | Mar 17
...
Feb 17 | Mar 18
Mar 17 | Mar 17
...
Mar 17 | Mar 18
I know the first solution to come to mind is to do a create a list of all dates Jan 12 to Mar 18, cross join it to itself, and then left outer join to the current table I have (where cohort / use months range from Jan 12 to Mar 17). However, this is not scalable.
Is there a way I can just iteratively add in this list of the months of the next year?
I am using HP Vertica, could use Presto or Hive if absolutely necessary
I think you should use the query here below to create a temporary table out of nothing, and join it with the rest of your query. You can't do anything in a procedural manner in SQL, I'm afraid. You won't be able to get away without a CROSS JOIN. But here, you limit the CROSS JOIN to the generation of the first-of-month pairs that you need.
Here goes:
WITH
-- create a list of integers from 0 to 100 using the TIMESERIES clause
i(i) AS (
SELECT dt::DATE - '2000-01-01'::DATE
FROM (
SELECT '2000-01-01'::DATE + 0
UNION ALL SELECT '2000-01-01'::DATE + 100
) d(d)
TIMESERIES dt AS '1 day' OVER(ORDER BY d::TIMESTAMP)
)
,
-- limits are Jan-2012 to the first of the current month plus one year
month_limits(month_limit) AS (
SELECT '2012-01-01'::DATE
UNION ALL SELECT ADD_MONTHS(TRUNC(CURRENT_DATE,'MONTH'),12)
)
-- create the list of possible months as a CROSS JOIN of the i table
-- containing the integers and the month_limits table, using ADD_MONTHS()
-- and the smallest and greatest month of the month limits
,month_list AS (
SELECT
ADD_MONTHS(MIN(month_limit),i) AS month_first
FROM month_limits CROSS JOIN i
GROUP BY i
HAVING ADD_MONTHS(MIN(month_limit),i) <= (
SELECT MAX(month_limit) FROM month_limits
)
)
-- finally, CROSS JOIN the obtained month list with itself with the
-- filters needed.
SELECT
cohort.month_first AS cohort_month
, use.month_first AS use_month
FROM month_list AS cohort
CROSS JOIN month_list AS use
WHERE use.month_first >= cohort.month_first
ORDER BY 1,2
;

Merging old table with new table with different structure

I am using SQL Server 2012. I have two tables which I need to 'merge'. The two tables are called tblOld and tblNew.
tblOld has data from say 2012 to 2013
tblNew has data from 2013 onwards and has a different structure
The dates do not overlap between the tables.
Simple example of the tables:
Old table
t_date region sub_region sales
------------------------------------------
1 Jan 2012 US QR 2
1 Jan 2012 US NT 3
1 Jan 2012 EU QR 5
2 Jan 2012 US QR 4
2 Jan 2012 US NT 6
2 Jan 2012 EU QR 10
...
31 Dec 2013 US QR 8
31 Dec 2013 US NT 9
31 Dec 2013 EU QR 15
New table
t_date region sales
-----------------------------
1 Jan 2014 US 20
1 Jan 2014 EU 50
2 Jan 2014 US 40
2 Jan 2014 EU 100
...
31 Dec 2014 US 80
31 Dec 2014 EU 150
Result I'm looking for:
t_date US QR US NT EU
-------------------------------------
1 Jan 2012 2 3 5
2 Jan 2012 4 6 10
...
31 Dec 2013 8 9 15
1 Jan 2014 20 50
2 Jan 2014 40 100
...
31 Dec 2014 80 150
So I'm trying to create a query which will give me the results above although I'm not sure how to do this or if it can be done?
SELECT t_date,
SUM(CASE WHEN region='US' AND (sub_region='QR' OR sub_region IS NULL) THEN sales ELSE 0 END) 'US QR',
SUM(CASE WHEN region='US' AND sub_region='NT' THEN sales ELSE 0 END) 'US NT',
SUM(CASE WHEN region='EU' THEN sales ELSE 0 END) 'EU'
FROM (
SELECT t_date
,region
,sub_region
,sales
FROM tblOLD
UNION ALL
SELECT t_date
,region
,NULL
,sales
FROM tblNEW
) t
GROUP BY t_date
You are looking for a UNION of the two tables:
SELECT t_date
,region
,sales
,sub_region
FROM tblOLD
UNION ALL
SELECT t_date
,region
,NULL
,sales
FROM tblNEW