Select same account numbers in a new table - sql

Using Teradata SQL Assistant, I want to be able to pull a table a year ahead but only the ones that would match the results in the query from the year before. Here's what I am trying to do. I pulled a table that contains information where the results in a specific column equals 0 for no. I want to pull information from 1 year ahead where the results in that column equals 1 but only include the account numbers that came when I pulled the results for the year before. Like only pull the customer account numbers for the year ahead that are the same from the year before.
Explanation: I pull the one table that has 0 in the column. From that, I want to see which of those accounts became a 1 in the table from a year ahead. The table has millions of accounts and I just have my settings for 10,000 of them so I want to see of those 10,000 in the first year that did not have the product, how many of them became 1 in the second year.
Can I do this? If so, how? I have been googling and I do not think I am explaining what I am trying to do correctly in my google query so I am coming up short with results.

Thanks for clarifying. That makes it a little simpler. I would put the second-year data in a subquery and filter the main table on the first year and quantity = 0. This gives you two quantity columns, one for the first year and one for the second. If you're only looking for this information for a single product_id, you will need to add that filter to both WHERE clauses.
SELECT TABLE_NAME.ACCOUNT_ID, TABLE_NAME.QUANTITY AS "2019" , YEAR_TWO.QUANTITY AS "2020"
FROM TABLE_NAME
LEFT JOIN
(
SELECT *
FROM TABLE_NAME
WHERE YEAR = 2020
) YEAR_TWO ON TABLE_NAME.ACCOUNT_ID = YEAR_TWO.ACCOUNT_ID
WHERE TABLE_NAME.YEAR = 2019
AND TABLE_NAME.QUANTITY = 0
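To see the shape of the result, here is a minimal sketch of the same self-join pattern run against an in-memory SQLite database from Python. SQLite stands in for Teradata here, and the table contents are made up for illustration; only the column names ACCOUNT_ID, YEAR and QUANTITY come from the answer above.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Teradata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (account_id INTEGER, year INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, 2019, 0), (1, 2020, 1),  # did not have the product, then got it
        (2, 2019, 0), (2, 2020, 0),  # still does not have it
        (3, 2019, 1), (3, 2020, 1),  # already had it in 2019, so it is excluded
        (4, 2019, 0),                # no 2020 row at all
    ],
)

# Keep only the 2019 accounts with quantity 0, then attach their 2020 value.
rows = conn.execute(
    """
    SELECT y1.account_id, y1.quantity AS qty_2019, y2.quantity AS qty_2020
    FROM table_name AS y1
    LEFT JOIN table_name AS y2
      ON y2.account_id = y1.account_id AND y2.year = 2020
    WHERE y1.year = 2019 AND y1.quantity = 0
    ORDER BY y1.account_id
    """
).fetchall()

for row in rows:
    print(row)
```

Account 4 still appears, with an empty second-year value, because the LEFT JOIN keeps first-year accounts that have no row in the second year.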
If you want just the percentage of accounts that are no longer 0 in the second year, you could try something like this (adding up all the 1s and dividing by the count of accounts that have a second-year row; the cast avoids integer division):
SELECT TABLE_NAME.YEAR, CAST(SUM(YEAR_TWO.QUANTITY) AS DECIMAL(18,4)) / COUNT(YEAR_TWO.QUANTITY) AS PERCENTAGE_NOT_ZERO
FROM TABLE_NAME
LEFT JOIN
(
SELECT *
FROM TABLE_NAME
WHERE YEAR = 2020
) YEAR_TWO ON TABLE_NAME.ACCOUNT_ID = YEAR_TWO.ACCOUNT_ID
WHERE TABLE_NAME.YEAR = 2019
AND TABLE_NAME.QUANTITY = 0
GROUP BY TABLE_NAME.YEAR
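One thing worth checking with a ratio like this is integer division: when both the sum and the count are integers, many databases truncate the result to 0. The sketch below shows the effect in SQLite (used as a stand-in, with made-up data); the exact behaviour and the right cast for your own system are assumptions to verify there.

```python
import sqlite3

# Hypothetical data: four accounts had quantity 0 in 2019, two of them have 1 in 2020.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (account_id INTEGER, year INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, 2019, 0), (1, 2020, 1),
        (2, 2019, 0), (2, 2020, 0),
        (3, 2019, 0), (3, 2020, 1),
        (4, 2019, 0), (4, 2020, 0),
        (5, 2019, 1), (5, 2020, 1),  # excluded: already 1 in 2019
    ],
)

query = """
SELECT
    SUM(y2.quantity) / COUNT(y2.quantity)       AS integer_ratio,
    1.0 * SUM(y2.quantity) / COUNT(y2.quantity) AS decimal_ratio
FROM table_name AS y1
LEFT JOIN table_name AS y2
  ON y2.account_id = y1.account_id AND y2.year = 2020
WHERE y1.year = 2019 AND y1.quantity = 0
"""
integer_ratio, decimal_ratio = conn.execute(query).fetchone()
print(integer_ratio, decimal_ratio)
```

Here the integer version comes back as 0 while the version forced to decimal arithmetic comes back as 0.5.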

Related

Cohort retention with SQL BigQuery

I am trying to create a retention table like the following using SQL in BigQuery, but with MONTHLY cohorts.
I have the following columns to use in my dataset. I am only using one table, and its name is 'curious-furnace-341507.TEST.Test_Dataset_-_Orders'.
order_date order_id customer_id
2020-01-02 12345 6789
I do not need the new user column, and the data goes through June 2020. I think the ideal layout is a cohort month column that lists the January-June cohorts, with 5 periods across.
I have tried so many different things and keep getting errors in BigQuery, so I think I am approaching it all wrong. The online queries I am trying to adapt seem to use dates rather than months, which is also causing some confusion, as I think I need to truncate my date column to months in the query?
Does anyone have a go-to query that will work in BigQuery for a retention table or can help me approach this? Thanks!
This may help you:
WITH cohorts AS (
SELECT
customer_id,
MIN(DATE(order_date)) AS cohort_date
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders`
GROUP BY 1)
SELECT
FORMAT_DATE("%Y-%m", c.cohort_date) AS cohort_mth,
t.customer_id AS cust_id,
DATE_DIFF(DATE(t.order_date), c.cohort_date, MONTH) AS order_period
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders` t
JOIN cohorts c ON t.customer_id = c.customer_id
WHERE c.cohort_date >= '2020-01-01'
AND DATE_DIFF(DATE(t.order_date), c.cohort_date, MONTH) <= 5
GROUP BY 1, 2, 3
I typically do pivots and % calcs in Excel/Sheets, so this will just give you the input data you need for that.
NOTE:
This will give you a count of unique customers who ordered in period X (ignores repeat orders in period).
This also has period 0 (ordered again in cohort_mth) which you may wish to keep/ exclude.
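If you would rather not pivot in a spreadsheet, the same idea can be sketched end to end in Python. This is only an illustration: it uses SQLite instead of BigQuery (so the month difference is computed with strftime rather than DATE_DIFF), the table is called orders, and the rows are invented.

```python
import sqlite3
from collections import defaultdict

# Hypothetical orders; SQLite stands in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_date TEXT, order_id INTEGER, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2020-01-02", 1, 100),
        ("2020-01-20", 2, 100),  # repeat order inside period 0
        ("2020-02-10", 3, 100),
        ("2020-01-15", 4, 200),
        ("2020-03-05", 5, 200),
        ("2020-02-01", 6, 300),
        ("2020-03-09", 7, 300),
    ],
)

query = """
WITH cohorts AS (
    SELECT customer_id, MIN(order_date) AS cohort_date
    FROM orders
    GROUP BY customer_id
)
SELECT
    strftime('%Y-%m', c.cohort_date) AS cohort_mth,
    (CAST(strftime('%Y', o.order_date) AS INTEGER)
       - CAST(strftime('%Y', c.cohort_date) AS INTEGER)) * 12
      + CAST(strftime('%m', o.order_date) AS INTEGER)
      - CAST(strftime('%m', c.cohort_date) AS INTEGER) AS order_period,
    COUNT(DISTINCT o.customer_id) AS customers
FROM orders AS o
JOIN cohorts AS c ON c.customer_id = o.customer_id
GROUP BY cohort_mth, order_period
ORDER BY cohort_mth, order_period
"""
rows = conn.execute(query).fetchall()

# Pivot: one entry per cohort month, mapping period -> share of the cohort.
counts = defaultdict(dict)
for cohort_mth, period, customers in rows:
    counts[cohort_mth][period] = customers
retention = {
    cohort: {period: n / periods[0] for period, n in periods.items()}
    for cohort, periods in counts.items()
}

for cohort, shares in retention.items():
    print(cohort, shares)
```

Period 0 is the cohort size, so every share is the number of distinct customers ordering in that period divided by the period 0 count.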

How to count employees that have been promoted?

I'm trying to figure out how to come up with a calculation or query to count the number of employees by grade promoted on each pay period.
*Count the number of records whose value in Grade has increased, by pay period.
Sample solution:
Year Payroll Period Count
2018 16 2
2019 6 1
2019 10 1
I've tried pivots and queries in Access, but I think this needs an inner join to identify the specific employees who got promoted. Thanks for the assistance.
Here is a formula in Excel that seems to work, but it needs to be moved to Access because of the number of records. I think an inner join would make this work. =AND(B2<>B3,C2=C3,D3>D2)
Based on EXCEL, you can derive your solution, assuming that your records are in sequence for columns Year, Payroll, Employee & Grade.
Add another column to determine if there is a grade increase for that particular Payroll Period.
For Excel cell reference's sake, "Year" is in cell A1, so the columns are Year (A), Payroll Period (B), Employee (C) and Grade (D).
Set the first cell of this new column to FALSE.
For the next cell in this new column, set it as such:
=AND(A3=A2,B3<>B2,C3=C2,D3>D2)
The above checks if there is a grade increase for that particular Payroll Period.
The formula checks, in sequence: 1. the year is the same (A3=A2), 2. the Payroll Period is different (B3<>B2), 3. the Employee is the same (C3=C2), and finally 4. the grade has increased (D3>D2).
Copy this formula down to the rest of your range.
Next, you can start to pivot.
Add your pivot table from your table/range with the following
Filter Grade Increase to true and also change the values aggregation of Employee from Sum to Count.
You will get the following:
I would rename Count of Employees to make it more meaningful.
One caveat of the above approach is that if the grade was increased in the first Payroll Period of a year, the increase won't be captured. To handle that case, remove the year check (A3=A2) from the formula.
Edit:
Doing a bit of research, perhaps you can do:
select t1.*, (t1.Grade > t2.Grade) as Grade_Increase
from YourTableName t1 left join YourTableName t2 on
t1.Employee = t2.Employee and
(((t1.Year - 2018)*26) + t1.Payroll_Period) =
(((t2.Year - 2018)*26) + t2.Payroll_Period + 1)
What the above does is essentially join the table to itself. The + 1 on the t2 side pairs each record with the same employee's record from the prior pay period, so the grades can be compared.
Records that are 'next in sequence' are combined into the same row, and a comparison is done.
This was not verified in Access.
Substitute 2018 with whatever your base year is. I'm using 2018 to calculate the sequence number of the records, assuming 26 payroll periods per year. Initially I thought of using common table expressions, rank and row_number, but Access doesn't seem to support these.
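As a rough check of the self-join idea, here is a sketch run in SQLite from Python rather than in Access. The table name pay, the sample rows, and the assumption of 26 payroll periods per year are all illustrative. It joins each record to the same employee's previous period and counts the rows where the grade went up.

```python
import sqlite3

# Hypothetical payroll rows; SQLite stands in for Access.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pay (year INTEGER, payroll_period INTEGER, employee TEXT, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO pay VALUES (?, ?, ?, ?)",
    [
        (2018, 15, "A", 1), (2018, 16, "A", 2),  # promoted in 2018 period 16
        (2018, 15, "B", 3), (2018, 16, "B", 4),  # promoted in 2018 period 16
        (2018, 26, "C", 1), (2019, 1, "C", 2),   # promoted across the year boundary
        (2018, 15, "D", 2), (2018, 16, "D", 2),  # no change
    ],
)

# Sequence number = (year - base year) * 26 + period; prev is one step behind cur.
query = """
SELECT cur.year, cur.payroll_period, COUNT(*) AS promoted
FROM pay AS cur
JOIN pay AS prev
  ON prev.employee = cur.employee
 AND (prev.year - 2018) * 26 + prev.payroll_period + 1
   = (cur.year - 2018) * 26 + cur.payroll_period
WHERE cur.grade > prev.grade
GROUP BY cur.year, cur.payroll_period
ORDER BY cur.year, cur.payroll_period
"""
promotions = conn.execute(query).fetchall()
print(promotions)
```

Because the comparison uses a running sequence number instead of the year, the promotion in the first period of 2019 is picked up as well.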

Count records in each month using Month(dateField) in SQL

I have a large table containing c. 650,000 records. They are individuals with email addresses, and one of the fields is 'dateOfApplication'. I have been asked for a breakdown of how many people signed up in each month.
I'd like the results to look something like
Month Year Total
1 2017 50763
2 2017 34725
I have made a target table in this format to put the results in. I've been able to use Month(dateOfApplication) to get the month component of the date using
SELECT DISTINCT
(SELECT COUNT(1) FROM [UG_Master]
WHERE MONTH([UG_Master].dateOfApplication) = '6') as Total
to return particular months, but I don't really know how to get one row for each month it finds.
You can use GROUP BY:
SELECT MONTH([UG_Master].dateOfApplication), COUNT(1)
FROM [UG_Master]
GROUP BY MONTH([UG_Master].dateOfApplication);
If you want the months broken down by year, include the year as well:
SELECT YEAR([UG_Master].dateOfApplication), MONTH([UG_Master].dateOfApplication), COUNT(1)
FROM [UG_Master]
GROUP BY YEAR([UG_Master].dateOfApplication), MONTH([UG_Master].dateOfApplication);
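The difference between the two queries is easy to see on a few rows. The sketch below uses SQLite from Python as a stand-in for SQL Server, so strftime replaces MONTH() and YEAR(), and the email addresses and dates are made up.

```python
import sqlite3

# Hypothetical applications spanning two years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UG_Master (email TEXT, dateOfApplication TEXT)")
conn.executemany(
    "INSERT INTO UG_Master VALUES (?, ?)",
    [
        ("a@example.com", "2017-01-05"),
        ("b@example.com", "2017-01-20"),
        ("c@example.com", "2017-02-11"),
        ("d@example.com", "2018-01-03"),
    ],
)

# Month only: January 2017 and January 2018 fall into the same row.
by_month = conn.execute(
    """
    SELECT CAST(strftime('%m', dateOfApplication) AS INTEGER) AS app_month,
           COUNT(*) AS total
    FROM UG_Master
    GROUP BY app_month
    ORDER BY app_month
    """
).fetchall()

# Month and year: one row per calendar month, in the Month / Year / Total layout.
by_year_month = conn.execute(
    """
    SELECT CAST(strftime('%m', dateOfApplication) AS INTEGER) AS app_month,
           CAST(strftime('%Y', dateOfApplication) AS INTEGER) AS app_year,
           COUNT(*) AS total
    FROM UG_Master
    GROUP BY app_year, app_month
    ORDER BY app_year, app_month
    """
).fetchall()

print(by_month)
print(by_year_month)
```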

GROUP BY with date range

I have a table with 4 columns, id, Stream which is text, Duration (int), and Timestamp (datetime). There is a row inserted for every time someone plays a specific audio stream on my website. Stream is the name, and Duration is the time in seconds that they are listening. I am currently using the following query to figure up total listen hours for each week in a year:
SELECT YEARWEEK(`Timestamp`), (SUM(`Duration`)/60/60) FROM logs_main
WHERE `Stream`="asdf" GROUP BY YEARWEEK(`Timestamp`);
This does what I expect... presenting a total of listen time for each week in the year that there is data.
However, I would like to build a query where I have a result row for weeks that there may not be any data. For example, if the 26th week of 2006 has no rows that fall within that week, then I would like the SUM result to be 0.
Is it possible to do this? Maybe via a JOIN over a date range somehow?
The tried and true old-school solution is to set up another table with a bunch of date ranges that you can outer join with for the grouping (as in, the other table would have all of the weeks in it with a begin / end date).
In this case, you could just get by with a table full of the values from YEARWEEK:
201100
201101
201102
201103
201104
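And here is a sketch of a SQL statement. The stream filter sits in the join condition rather than in WHERE so that weeks without data are kept, and COALESCE turns their empty sum into 0:
SELECT year_weeks.yearweek , (COALESCE(SUM(logs_main.`Duration`), 0)/60/60)
FROM year_weeks LEFT OUTER JOIN logs_main
ON year_weeks.yearweek = YEARWEEK(logs_main.`Timestamp`)
AND logs_main.`Stream`="asdf"
GROUP BY year_weeks.yearweek;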
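One detail that is easy to trip over with this pattern is where the stream filter goes: in WHERE it removes the weeks that have no matching rows, while in the join condition those weeks survive with a total of 0. A small sketch in SQLite (a stand-in for MySQL, with strftime('%Y%W', ...) as a rough substitute for YEARWEEK and invented rows) shows both variants.

```python
import sqlite3

# Hypothetical week table and log rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE year_weeks (yearweek TEXT)")
conn.execute(
    "CREATE TABLE logs_main (id INTEGER, Stream TEXT, Duration INTEGER, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO year_weeks VALUES (?)", [("201101",), ("201102",), ("201103",)]
)
conn.executemany(
    "INSERT INTO logs_main VALUES (?, ?, ?, ?)",
    [
        (1, "asdf", 3600, "2011-01-04 10:00:00"),
        (2, "asdf", 1800, "2011-01-05 11:00:00"),
        (3, "other", 7200, "2011-01-11 09:00:00"),  # different stream
        (4, "asdf", 7200, "2011-01-18 09:00:00"),
    ],
)

# Filter inside ON: every week is kept, empty weeks get 0 hours.
filter_in_on = conn.execute(
    """
    SELECT w.yearweek, COALESCE(SUM(l.Duration), 0) / 3600.0 AS hours
    FROM year_weeks AS w
    LEFT JOIN logs_main AS l
      ON strftime('%Y%W', l.Timestamp) = w.yearweek
     AND l.Stream = 'asdf'
    GROUP BY w.yearweek
    ORDER BY w.yearweek
    """
).fetchall()

# Filter inside WHERE: the week without 'asdf' rows disappears.
filter_in_where = conn.execute(
    """
    SELECT w.yearweek, COALESCE(SUM(l.Duration), 0) / 3600.0 AS hours
    FROM year_weeks AS w
    LEFT JOIN logs_main AS l
      ON strftime('%Y%W', l.Timestamp) = w.yearweek
    WHERE l.Stream = 'asdf'
    GROUP BY w.yearweek
    ORDER BY w.yearweek
    """
).fetchall()

print(filter_in_on)
print(filter_in_where)
```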
Here is a suggestion. It might not be exactly what you are looking for.
But say you had a simple table with one column [year_week] that contained the YEARWEEK values you want to report on (201101, 201102, 201103, ...).
You could then theoretically:
SELECT
A.year_week,
(SELECT COALESCE(SUM(`Duration`), 0)/60/60 FROM logs_main WHERE
`Stream` = 'asdf' AND YEARWEEK(`Timestamp`) = A.year_week)
FROM
tblYearWeeks A
This obviously needs some tweaking... I've done several similar queries in other projects and this works well enough depending on the situation.
If you're looking for a one-table/SQL-based solution, then that is definitely something I would be interested in as well!

Selecting records from the past three months

I have 2 tables from which I need to run a query to display the number of views a user had in the last 3 months from now.
So far I have come up with the following (all the field types are correct):
SELECT dbo_LU_USER.USERNAME
, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE (((DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=0
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=1
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=2))
GROUP BY dbo_LU_USER.USERNAME;
Basically, the code above displays a list of all records within the past 3 months; however, it starts from the 1st day of the month and ends on the current date, but I need it to start 3 months prior to today's date.
Also, to let you know, this is SQL View code in MS Access 2007.
Thanks in advance
Depending on how "strictly" you define your 3 months rule, you could make things a lot easier, and probably more efficient, by trying this:
SELECT dbo_LU_USER.USERNAME, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE [dbo_SDB_SESSION].[SESSIONSTART] BETWEEN DateAdd("d",-90,Now()) AND Now()
GROUP BY dbo_LU_USER.USERNAME;
(Please understand that my MS SQL is a bit rusty and I can't test this at the moment. The idea is to make the query scan all records whose date is between "TODAY-90 days" and "TODAY".)
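How strict the rule is does matter at the edges: 90 days and 3 calendar months are not always the same window. The sketch below compares the two in SQLite from Python, as a stand-in for Access, using a fixed "today" of 2024-06-15 and invented sessions so the result is repeatable. In Access the calendar-month version would presumably use DateAdd("m",-3,Now()), which is untested here.

```python
import sqlite3

# Hypothetical sessions; dates are ISO text so they compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (username TEXT, session_start TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("alice", "2024-03-16"),  # inside 3 months, outside 90 days
        ("alice", "2024-05-01"),
        ("bob", "2024-03-20"),
        ("bob", "2024-03-01"),    # outside both windows
    ],
)

TODAY = "2024-06-15"  # fixed reference date instead of the current date


def sessions_since(modifier):
    """Count sessions per user between TODAY shifted by `modifier` and TODAY."""
    return conn.execute(
        """
        SELECT username, COUNT(*) AS no_of_sessions
        FROM sessions
        WHERE session_start BETWEEN date(:today, :modifier) AND :today
        GROUP BY username
        ORDER BY username
        """,
        {"today": TODAY, "modifier": modifier},
    ).fetchall()


last_90_days = sessions_since("-90 days")     # window starts 2024-03-17
last_3_months = sessions_since("-3 months")   # window starts 2024-03-15

print(last_90_days)
print(last_3_months)
```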