I have an origin-destination table like this in Bigquery with weekday, date, UTC time/hour and count of trips:
Origin Destination Day Date Time Count
NY Station Downtown Mon 02.09.2019 15 12
NY Station Downtown Mon 02.09.2019 16 10
City libry Eastside Mon 02.09.2019 17 10
NY Station Downtown Tue 03.09.2019 15 8
NY Station Downtown Tue 03.09.2019 16 5
City libry Eastside Tue 03.09.2019 17 5
NY Station Downtown Wed 04.09.2019 15 8
NY Station Downtown Wed 04.09.2019 16 10
City libry Eastside Wed 04.09.2019 17 11
I wish to get the average Count:
- for each origin-destination pair (NY Station-Downtown and City libry-Eastside)
- averaged over Monday-Wednesday at each given time
The output should then be something like
Origin Destination Avg_Day Period Time Avg_Count
NY Station Downtown Mon-Wed Week1 (02.09.19-04.09.19) 15 9,33
NY Station Downtown Mon-Wed Week1 (02.09.19-04.09.19) 16 8,33
City libry Eastside Mon-Wed Week1 (02.09.19-04.09.19) 17 8,67
Ignore the Avg_Day and Period columns; they are only there to show which days and dates I want the average for. In other words, the aim is to get the average count for each origin-destination pair on a normal weekday (defined here as Mon-Wed) at certain hours of the day. For example, the average count at time 15 for the NY Station-Downtown pair is 9,33, which is the average of the counts at 15 o'clock on Monday, Tuesday and Wednesday (that is, the average of 12, 8 and 8).
I have tried variants of CASE and WHERE queries, but I am not even close to grasping the logic for this, so there is no point in posting any query. Possibly I have to create a temporary table as well. Can anyone help me? It is hugely appreciated.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
  Origin,
  Destination,
  'Mon-Wed' AS Avg_Day,
  FORMAT('Week%i (%s-%s)', week,
         FORMAT_DATE('%d.%m.%Y', min_date),
         FORMAT_DATE('%d.%m.%Y', max_date)) AS Period,
  Time,
  Avg_Count
FROM (
  SELECT
    Origin,
    Destination,
    EXTRACT(WEEK FROM PARSE_DATE('%d.%m.%Y', date)) AS week,
    -- parse the dd.mm.yyyy strings so MIN/MAX compare real dates, not text
    MIN(PARSE_DATE('%d.%m.%Y', date)) AS min_date,
    MAX(PARSE_DATE('%d.%m.%Y', date)) AS max_date,
    Time,
    ROUND(AVG(count), 2) AS Avg_Count
  FROM `project.dataset.table`
  WHERE day IN ('Mon', 'Tue', 'Wed')
  GROUP BY Origin, Destination, Time, week
)
Applied to the sample data from your question, the output is:
Row Origin Destination Avg_Day Period Time Avg_Count
1 NY Station Downtown Mon-Wed Week35 (02.09.2019-04.09.2019) 15 9.33
2 NY Station Downtown Mon-Wed Week35 (02.09.2019-04.09.2019) 16 8.33
3 City libry Eastside Mon-Wed Week35 (02.09.2019-04.09.2019) 17 8.67
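The core of the answer is the filter on Mon-Wed plus the grouping by origin, destination and hour. As a minimal sketch that can be run locally, the same aggregation is reproduced below with Python's built-in sqlite3 module on the sample rows from the question. The table name trips and the column names trip_date, hour and trip_count are assumptions made for this illustration, and the per-week grouping and Period label are left out because SQLite does not offer BigQuery's EXTRACT(WEEK ...).

```python
import sqlite3

# Sample rows from the question: origin, destination, day, date, hour, count
ROWS = [
    ("NY Station", "Downtown", "Mon", "02.09.2019", 15, 12),
    ("NY Station", "Downtown", "Mon", "02.09.2019", 16, 10),
    ("City libry", "Eastside", "Mon", "02.09.2019", 17, 10),
    ("NY Station", "Downtown", "Tue", "03.09.2019", 15, 8),
    ("NY Station", "Downtown", "Tue", "03.09.2019", 16, 5),
    ("City libry", "Eastside", "Tue", "03.09.2019", 17, 5),
    ("NY Station", "Downtown", "Wed", "04.09.2019", 15, 8),
    ("NY Station", "Downtown", "Wed", "04.09.2019", 16, 10),
    ("City libry", "Eastside", "Wed", "04.09.2019", 17, 11),
]


def weekday_averages(rows):
    """Average count per origin-destination pair and hour over Mon-Wed."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and column names, chosen for this sketch only
    con.execute(
        "CREATE TABLE trips (origin TEXT, destination TEXT, day TEXT, "
        "trip_date TEXT, hour INTEGER, trip_count INTEGER)"
    )
    con.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT origin, destination, hour, ROUND(AVG(trip_count), 2)
        FROM trips
        WHERE day IN ('Mon', 'Tue', 'Wed')
        GROUP BY origin, destination, hour
        ORDER BY hour
        """
    ).fetchall()
    con.close()
    return result


for row in weekday_averages(ROWS):
    print(row)
```

Each returned tuple holds the origin, destination, hour and rounded average, which should line up with the Avg_Count column shown above.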
I have a sample of a table as below:
Customer Ref
Bear Rate
Distance
Month
Revenue
ABA-IFNL-001
1000
01/01/2022
-135
ABA-IFNL-001
1000
01/02/2022
-135
ABA-IFNL-001
1000
01/03/2022
-135
ABA-IFNL-001
1000
01/04/2022
-135
ABA-IFNL-001
1000
01/05/2022
-135
ABA-IFNL-001
1000
01/06/2022
-135
I also have a sample of a calendar table as below:
Date        Year  Week  Quarter  WeekDay  Qtr Start   Qtr End     Week Day
04/11/2022  2022  45    4        Fri      30/09/2022  29/12/2022  1
05/11/2022  2022  45    4        Sat      30/09/2022  29/12/2022  2
06/11/2022  2022  45    4        Sun      30/09/2022  29/12/2022  3
07/11/2022  2022  45    4        Mon      30/09/2022  29/12/2022  4
08/11/2022  2022  45    4        Tue      30/09/2022  29/12/2022  5
09/11/2022  2022  45    4        Wed      30/09/2022  29/12/2022  6
10/11/2022  2022  45    4        Thu      30/09/2022  29/12/2022  7
11/11/2022  2022  46    4        Fri      30/09/2022  29/12/2022  1
12/11/2022  2022  46    4        Sat      30/09/2022  29/12/2022  2
13/11/2022  2022  46    4        Sun      30/09/2022  29/12/2022  3
14/11/2022  2022  46    4        Mon      30/09/2022  29/12/2022  4
15/11/2022  2022  46    4        Tue      30/09/2022  29/12/2022  5
16/11/2022  2022  46    4        Wed      30/09/2022  29/12/2022  6
17/11/2022  2022  46    4        Thu      30/09/2022  29/12/2022  7
How can I join/link the tables to report on revenue over weekly and quarterly periods using the calendar table? The output can be split into two tables if needed, e.g.:
Quarter Starting  Quarter  Revenue
31/12/2021        1        500
01/04/2022        2        400
01/07/2022        3        540
30/09/2022        4        540

Week Date Start  Week  Revenue
31/12/2021       41    33.75
07/01/2022       42    33.75
14/01/2022       43    33.75
21/01/2022       44    33.75
I am using Alteryx for this, but I wouldn't mind an explanation of the possible logic in SQL so I can apply it in the system.
Thanks
Before I get into the answer: you're going to have a data integrity issue. All the revenue data is aggregated at a monthly level, whereas your quarters start and end on some day within a month.
For example, Q4 starts September 30th (Friday) and ends December 29th (Thursday). You may have a day or two that bleeds from another month into a quarter, which might throw off the data a bit (especially if there's a large amount of revenue during those days).
Additionally, because your revenue is aggregated at a monthly level, a weekly calculation doesn't make much sense unless you have more granular data (weekly, or ideally daily), since you'll probably just be dividing monthly revenue by 4.
That being said, you'll want to use the Cross Tab tool in Alteryx to get the data how you want it. But before you do that, aggregate your data at a quarterly level.
You can do this with an if statement or some other data cleansing tool (sorry, it's been a while since I used Alteryx). Something like:
-- Sketch of a SQL expression, not a complete query
-- For determining the quarter from the calendar's Qtr Start / Qtr End
CASE WHEN month BETWEEN DATE '2022-09-30' AND DATE '2022-12-29' THEN 4 END AS quarter
where you can derive the logic from your calendar table. Then once you have the quarter, you can join in the Quarter Start date based on your quarter calculation.
Now you have a nice clean table that might look something like this:
Month       Revenue  Quarter  Quarter Start Date
01/01/2022  -135     4        30/09/2022
01/01/2022  -135     4        30/09/2022
Aggregate on your quarter to get a cleaner table
Quarter Start Date  Quarter  Revenue
30/09/2022          4        300
Then use cross tab, where you pivot on the Quarter start date.
For SQL, you'd be pivoting the data: essentially, taking the values from rows of data and converting them into columns. It will look a bit clumsy because the data is so customized, but here's a good question that goes over pivoting: Simple way to transpose columns and rows in SQL?
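To make the "assign a quarter, then aggregate" step concrete, here is a minimal sketch using Python's built-in sqlite3 module. It joins each monthly revenue row to the quarter whose start and end dates contain it, then sums per quarter. The table names revenue and quarters are assumptions for this illustration. The quarter start dates come from the question's desired output, while every quarter end date except Q4's (29/12/2022) is a guess. Dates are stored as ISO strings so that BETWEEN compares them correctly as text.

```python
import sqlite3


def revenue_by_quarter():
    """Sum monthly revenue per quarter via a range join on the calendar bounds."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Monthly revenue rows from the question (dates rewritten as ISO text)
        CREATE TABLE revenue (customer_ref TEXT, month TEXT, revenue INTEGER);
        INSERT INTO revenue VALUES
            ('ABA-IFNL-001', '2022-01-01', -135),
            ('ABA-IFNL-001', '2022-02-01', -135),
            ('ABA-IFNL-001', '2022-03-01', -135),
            ('ABA-IFNL-001', '2022-04-01', -135),
            ('ABA-IFNL-001', '2022-05-01', -135),
            ('ABA-IFNL-001', '2022-06-01', -135);

        -- One row per quarter; end dates other than Q4's are assumed
        CREATE TABLE quarters (quarter INTEGER, qtr_start TEXT, qtr_end TEXT);
        INSERT INTO quarters VALUES
            (1, '2021-12-31', '2022-03-31'),
            (2, '2022-04-01', '2022-06-30'),
            (3, '2022-07-01', '2022-09-29'),
            (4, '2022-09-30', '2022-12-29');
        """
    )
    rows = con.execute(
        """
        SELECT q.qtr_start, q.quarter, SUM(r.revenue)
        FROM revenue r
        JOIN quarters q ON r.month BETWEEN q.qtr_start AND q.qtr_end
        GROUP BY q.qtr_start, q.quarter
        ORDER BY q.qtr_start
        """
    ).fetchall()
    con.close()
    return rows


for row in revenue_by_quarter():
    print(row)
```

With a day-level calendar table like the one in the question, the quarters table could be derived by selecting the distinct Quarter, Qtr Start and Qtr End values. Pivoting the result into columns would be a separate step on top of this.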
Purpose of the report: Identify patients who did not have dental cleanings in the last 6 months
What would be the best approach to write a sql script?
Patients table
patient_id  patient_name
11          Jason Strong
22          Ryan Smith
33          Casey Hammer
Visits table
v_id  patient_id  reason_visit     date_of_visit
1     11          medical          01/01/2021
2     22          dental cleaning  11/10/2020
3     22          annual           01/01/2021
4     11          dental cleaning  5/10/2021
5     11          annual           5/1/2021
Expected
patient_id  patient_name
22          Ryan Smith
33          Casey Hammer
Casey is on the list because she is not in the visits table meaning she never received a cleaning from our office.
Ryan Smith is on the list because it is time for his cleaning.
I was also wondering: what if the patient has not had an appointment in the last 6 months but has a future appointment for a dental cleaning? I would want to exclude that patient.
In PostgreSQL:
select * from Patients p
where not exists (
  select 1 from Visits v
  where v.patient_id = p.patient_id
    and v.reason_visit = 'dental cleaning'
    -- a cleaning in the last 6 months, or one already booked for the future, excludes the patient
    and v.date_of_visit >= now() - interval '6 month'
)
In SQL Server, replace now() - interval '6 month' with dateadd(month, -6, getdate()).
In MySQL, use date_add(now(), interval -6 month).
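The rule can be checked against the sample data: a patient is due when no dental cleaning exists on or after the cutoff date six months back. The sketch below runs that NOT EXISTS query with Python's built-in sqlite3 module. Two things are assumptions made for the illustration: the reference date is passed in as a parameter instead of using the current time, and the visit dates are read as month/day/year and rewritten as ISO strings.

```python
import sqlite3


def patients_due_for_cleaning(today):
    """Return patients with no dental cleaning on or after (today - 6 months)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE patients (patient_id INTEGER, patient_name TEXT);
        INSERT INTO patients VALUES
            (11, 'Jason Strong'), (22, 'Ryan Smith'), (33, 'Casey Hammer');

        -- Dates assumed to be month/day/year in the question, stored as ISO text
        CREATE TABLE visits (v_id INTEGER, patient_id INTEGER,
                             reason_visit TEXT, date_of_visit TEXT);
        INSERT INTO visits VALUES
            (1, 11, 'medical',         '2021-01-01'),
            (2, 22, 'dental cleaning', '2020-11-10'),
            (3, 22, 'annual',          '2021-01-01'),
            (4, 11, 'dental cleaning', '2021-05-10'),
            (5, 11, 'annual',          '2021-05-01');
        """
    )
    rows = con.execute(
        """
        SELECT p.patient_id, p.patient_name
        FROM patients p
        WHERE NOT EXISTS (
            SELECT 1 FROM visits v
            WHERE v.patient_id = p.patient_id
              AND v.reason_visit = 'dental cleaning'
              AND v.date_of_visit >= date(?, '-6 months')
        )
        ORDER BY p.patient_id
        """,
        (today,),
    ).fetchall()
    con.close()
    return rows


# Hypothetical reference date; the cutoff then becomes 2020-12-01
print(patients_due_for_cleaning("2021-06-01"))
```

Because the comparison only has a lower bound, a cleaning already booked for a future date also keeps the patient off the list, which matches the exclusion asked for in the question.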
This is the sports table:
Country Players Hours Completed_Time
India Player1 7 24-05-2021 07:29
India Player2 7 24-05-2021 07:21
India Player3 7 24-05-2021 07:26
India Player4 7 24-05-2021 07:30
India Player5 2 25-05-2021 02:01
US Player1 8 25-05-2021 08:54
US Player2 8 25-05-2021 08:57
US Player3 8 25-05-2021 08:54
US Player4 8 25-05-2021 08:45
US Player5 05 26-05-2021 05:19
While plotting a chart, I am having trouble converting the Completed_Time datetime column into hours.
I have used the Oracle SQL query below:
select Country,Players,TO_NUMBER(TO_CHAR(Completed_Time,'HH')) as Hours
from sports
The hour alone is not enough to plot the graph.
Consider this case:
Country Players Hours Completed_Time
US Player4 **8** **25**-05-2021 **08**:45
US Player5 **05** **26**-05-2021 **05**:19
Player5 finished one day later than Player4, so in this case I want to add 24 to the hours value for Player5: the hours for Player4 should be 8 and the hours for Player5 should be 29.
If you don't have a start date, you can get the hour difference by comparing each row to the first Completed_Time in your data:
select s.*, 24 * (s.Completed_Time - min(s.Completed_Time) over ()) as hourdiff
from sports s
I guess you need the time difference from the day each country started playing. Try the query below.
-- Assumes Completed_Time is a DATE column
SELECT mq.Country,
       mq.Players,
       -- 24 hours per whole day since the country's first day, plus the hour of day
       24 * (TRUNC(mq.Completed_Time) - sq.start_day)
         + TO_NUMBER(TO_CHAR(mq.Completed_Time, 'HH24')) AS diff_hours
FROM sports mq
INNER JOIN
  (SELECT Country,
          TRUNC(MIN(Completed_Time)) AS start_day
   FROM sports
   GROUP BY Country) sq
ON sq.Country = mq.Country;
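The arithmetic behind this approach is: whole days since the country's first completion day, times 24, plus the hour of day. As a minimal sketch, the plain-Python version below applies that formula to the sample rows so the expected values (8 for US Player4, 29 for US Player5) can be checked. The function name and the dictionary result shape are choices made for this illustration.

```python
from datetime import datetime

# Sample rows from the question: country, player, completed time
ROWS = [
    ("India", "Player1", "24-05-2021 07:29"),
    ("India", "Player2", "24-05-2021 07:21"),
    ("India", "Player3", "24-05-2021 07:26"),
    ("India", "Player4", "24-05-2021 07:30"),
    ("India", "Player5", "25-05-2021 02:01"),
    ("US", "Player1", "25-05-2021 08:54"),
    ("US", "Player2", "25-05-2021 08:57"),
    ("US", "Player3", "25-05-2021 08:54"),
    ("US", "Player4", "25-05-2021 08:45"),
    ("US", "Player5", "26-05-2021 05:19"),
]


def hours_since_first_day(rows):
    """Map (country, player) to 24 * days since the country's first day + hour."""
    parsed = [(c, p, datetime.strptime(t, "%d-%m-%Y %H:%M")) for c, p, t in rows]

    # Earliest completion day per country
    first_day = {}
    for country, _, ts in parsed:
        day = ts.date()
        if country not in first_day or day < first_day[country]:
            first_day[country] = day

    return {
        (country, player): 24 * (ts.date() - first_day[country]).days + ts.hour
        for country, player, ts in parsed
    }


hours = hours_since_first_day(ROWS)
print(hours[("US", "Player4")], hours[("US", "Player5")])
```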
I have a table
JobID Date_of_Completion Region day
1 23/05/2016 South monday
2 23/05/2016 north monday
3 23/05/2016 north monday
4 23/05/2016 east monday
5 22/05/2016 South sunday
6 22/05/2016 north sunday
7 22/05/2016 south sunday
8 22/05/2016 east sunday
.
.
.
..
23 2/05/2016 north monday
24 2/05/2016 east monday
25 2/05/2016 South monday
26 2/05/2016 north monday
27 2/05/2016 south monday
28 2/05/2016 east monday
Desired output, for the last two months:
Day Region countofjobsonparticularday no of days
sunday south 34 8 (number of Sundays in the last 2 months)
sunday north 24 8 (number of Sundays in the last 2 months)
monday south 74 9 (number of Mondays in the last 2 months)
tuesday east 64 8 (number of Tuesdays in the last 2 months)
How can I write a query for this? Please help.
It seems that you need something like this:
select Day, Region, count(1), count(distinct date_of_completion)
from your_table
where date_of_completion between add_months(sysdate, -2) and sysdate
group by Day, Region
This will count the number of jobs and the number of DISTINCT days on which jobs were completed.
You should refine this based on your needs (for example, how you want to treat hours, minutes, and so on).
If - as I suspect - you mean the last column, no of days, is supposed to show the total number of Mondays, Tuesdays, etc. over the last two months (regardless of whether there were any jobs on some of the days), first create a (sub)query as below and then join to Aleksej's result on the Day column. Speaking of Day, it is an Oracle keyword; it is always best to avoid using Oracle keywords as table or column names. I use day_name below.
Result of query (can be used as subquery):
DAY_NAME CT
--------------- ----------
monday 9
thursday 8
sunday 9
saturday 9
tuesday 8
friday 9
wednesday 8
I didn't order the results (not needed if used for a join) and I used lowercase day names as the OP did. That is controlled by the format model (the middle argument to to_char in the query below; if capitalized names, like Monday, are desired, change that from 'day' to 'Day').
Query:
with x (day_name) as (
select to_char(sysdate - level + 1, 'day', 'nls_date_language = American')
from dual
connect by level <= sysdate - add_months(sysdate, -2) - 1
)
select day_name, count(*) as ct
from x
group by day_name;
Note 'nls_date_language = American': it is always better to make that explicit than to rely on default parameters. (Without this third argument, someone else running this with German or Chinese date language wouldn't get the expected result for joining with the other table.) Also, the definition of "last two months" is fuzzy; as written, the query covers the 60 days ending today (included), that is, March 25 through May 23, 2016. These bounds are controlled by the two expressions containing sysdate.
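The counts in the result table can be cross-checked outside the database. The sketch below counts how often each weekday falls in a date range using only Python's standard library, assuming the range 25 March to 23 May 2016 (both included), which reproduces the table above. The day names are spelled out in a fixed list so the result does not depend on the machine's locale, which is the same concern the nls_date_language argument addresses in Oracle.

```python
from collections import Counter
from datetime import date, timedelta

# Fixed English names, indexed by date.weekday() (Monday is 0)
DAY_NAMES = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def weekday_counts(first, last):
    """Count occurrences of each weekday from first to last, both included."""
    n_days = (last - first).days + 1
    return Counter(
        DAY_NAMES[(first + timedelta(days=i)).weekday()] for i in range(n_days)
    )


# Assumed range: the 60 days ending on 23 May 2016
counts = weekday_counts(date(2016, 3, 25), date(2016, 5, 23))
for name in DAY_NAMES:
    print(name, counts[name])
```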
Thanks @mathguy and @Aleksej.
I tried this query and it worked:
select to_char(dayofcompletion, 'DY') as day_name, count(1), count(distinct trunc(dayofcompletion)) as noofdays
from tablename
where trunc(dayofcompletion) >= trunc(sysdate - 60) and trunc(dayofcompletion) <= trunc(sysdate - 1)
group by to_char(dayofcompletion, 'DY')
I have been working on this practice problem for SQL class for the past 30 minutes. I am having a hard time including rows that have NULL values in it or a numerical value of 0. I will post the question and then the query that I have written:
"Write a query to display the tour name, outing date, and number of registered clients for each
outing of that tour on each date. Include only outings that were scheduled to occur after
October 27, 2013. Include tours with no outings and outings with no registered clients. Sort the result by the number of clients in descending order, and then by outing date in ascending order."
SELECT TOUR_NAME,OUT_DATE,Count(DISTINCT CLIENT_NUM) AS "Num Clients"
FROM TOUR RIGHT JOIN OUTING USING (TOUR_ID) JOIN REGISTER USING (OUT_ID)
WHERE To_Char(OUT_DATE,'YYYY-MM-DD') > '2013-10-27'
GROUP BY TOUR_NAME,OUT_DATE
ORDER BY "Num Clients" DESC,OUT_DATE;
I cannot figure out how to pull rows with empty cells. It currently only pulls complete rows.
EXPECTED RESULTS:
TOUR_NAME OUT_DATE Num Clients
Weekend Weekday 29-OCT-13 26
Downtown 28-OCT-13 25
Deluxe Day Away 28-OCT-13 23
Quick Break 30-OCT-13 19
Downtown 27-OCT-13 18
Downtown 30-OCT-13 18
Deluxe Day Away 31-OCT-13 12
Washington Heights 31-OCT-13 10
Weekend Weekday 13-NOV-13 0
Downtown 14-NOV-13 0
Beltway 15-NOV-13 0
Weekend Weekday 15-NOV-13 0
Quick Break 16-NOV-13 0
Power Shots 0
Perfect Endings 0
Primary Point 0
MY ACTUAL RESULTS:
TOUR_NAME OUT_DATE Num Clients
Weekend Weekday 29-OCT-13 26
Downtown 28-OCT-13 25
Deluxe Day Away 28-OCT-13 23
Quick Break 30-OCT-13 19
Downtown 27-OCT-13 18
Downtown 30-OCT-13 18
Deluxe Day Away 31-OCT-13 12
Washington Heights 31-OCT-13 10
It's not including any rows that have a null value or a zero count.
I appreciate any help. Thank you.