Group By Issue in Select Query in SQL - sql

I am stuck on a SQL query and would appreciate a little help.
The table structure is as follows:
Table #1: PROD_ORDER:
ID_PROD_ORDER(PK)
1001
1002
1003
Table #2: JOB
ID_JOB | ID_PROD_ORDER(FK)|ID_ASSET | DT_START | DT_END
1 | 1001 | 8 | 2016/11/22 05:45:50 | 2016/11/24 13:13:14
2 | 1001 | 8 | some date | some date
3 | 1002 | 9 | some date | some date
4 | 1002 | 9 | some date | some date
5 | 1003 | 8 | some date | some date
6 | 1001 | 8 | some date | some date
Table #3: Confirmation
ID_CONFIRMATION | ID_JOB | QT_CONF | QT_SCRAP
Table #4: DOWNTIME
ID_DOWNTIME | DT_START | DT_END | ID_ORG_SUB_ASSET
The requirement is to find, for each order:
Start date of the order (which will be Min of DT_START from JOB)
End date of the order (which will be Max of DT_END from JOB)
Sum of all of its jobs' QT_CONF
Sum of all of its jobs' QT_SCRAP
Downtime from the DOWNTIME table, where downtime = difference in seconds between DT_START and DT_END
ID_ASSET, Start Date and End Date will be passed in as parameters.
I have written this query:
SELECT
PO.ID_PROD_ORDER,
J.ID_ORG_ASSET,
SUM(C.QT_CONF) AS "QT_CONF",
SUM(C.QT_SCRAP) AS "QT_SCRAP",
MIN(J.DT_JOB_ST) AS "START_DATE",
MAX(J.DT_JOB_ED) AS "END_DATE",
(SELECT SUM(datediff(ss, D.DT_START, D.DT_END)) AS "DOWNTIMESECONDS"
FROM DOWNTIME D
INNER JOIN SUB_ASSET SA ON D.ID_SUB_ASSET = SA.ID_SUB_ASSET
WHERE SA.ID_ASSET = [Param.3] AND D.DT_START >= J.DT_JOB_ST
AND D.DT_END <= J.DT_JOB_ED)
FROM
PROD_ORDER PO
INNER JOIN
JOB J ON PO.ID_PROD_ORDER = J.ID_PROD_ORDER
AND J.DT_JOB_ST >= '[Param.1]'
AND J.DT_JOB_ED <= '[Param.2]'
LEFT OUTER JOIN
CONFIRMATION C ON C.ID_JOB = J.ID_JOB
WHERE
J.ID_ASSET = [Param.3]
GROUP BY
PO.ID_PROD_ORDER, J.ID_ASSET
The query throws an error:
JOB.DT_JOB_ST cannot be included in select list as it is not used in aggregation or GROUP BY
If I put JOB.DT_JOB_ST and JOB.DT_JOB_ED in the GROUP BY, it returns more than one row per order, but I need exactly one row per order.
How can I correct this?
Thanks!

I suspect the issue is the correlated subquery, which is evaluated for each row. Because it references J.DT_JOB_ST and J.DT_JOB_ED directly, those columns would have to be part of the GROUP BY.
The other option is to rewrite the query so that it does not need the correlated subquery; something like this should work for you:
SELECT PO.ID_PROD_ORDER,
J.ID_ORG_ASSET,
SUM(C.QT_CONF) AS [QT_CONF],
SUM(C.QT_SCRAP) AS [QT_SCRAP],
MIN(J.DT_JOB_ST) AS [START_DATE],
MAX(J.DT_JOB_ED) AS [END_DATE],
SUM(ISNULL(datediff(ss, D.DT_START, D.DT_END),0)) AS [DOWNTIMESECONDS]
FROM PROD_ORDER PO
INNER JOIN JOB J
ON PO.ID_PROD_ORDER = J.ID_PROD_ORDER
AND J.DT_JOB_ST >= '[Param.1]'
AND J.DT_JOB_ED <= '[Param.2]'
LEFT JOIN CONFIRMATION C
ON C.ID_JOB = J.ID_JOB
LEFT JOIN DOWNTIME D
INNER JOIN SUB_ASSET SA
ON D.ID_SUB_ASSET = SA.ID_SUB_ASSET
AND SA.ID_ASSET = [Param.3]
ON D.DT_START >= J.DT_JOB_ST
AND D.DT_END <= J.DT_JOB_ED
WHERE J.ID_ASSET = [Param.3]
GROUP BY PO.ID_PROD_ORDER, J.ID_ORG_ASSET
(If you prefer, a CTE could work, too.)
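The CTE idea mentioned above could look roughly like the sketch below. It is not the asker's exact schema: it runs against SQLite through Python's sqlite3 module, uses DT_START/DT_END as the job date columns, assumes DOWNTIME carries ID_ASSET directly (so the SUB_ASSET lookup is left out), omits the date-range parameters, and uses made-up sample rows. The point is only that once jobs are collapsed to one row per order, the downtime subquery correlates on already-aggregated columns, and joining DOWNTIME never multiplies the confirmation sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (id_job INTEGER PRIMARY KEY, id_prod_order INTEGER,
                  id_asset INTEGER, dt_start TEXT, dt_end TEXT);
CREATE TABLE confirmation (id_confirmation INTEGER PRIMARY KEY,
                           id_job INTEGER, qt_conf INTEGER, qt_scrap INTEGER);
-- Assumption: downtime stores id_asset directly (no SUB_ASSET table here).
CREATE TABLE downtime (id_downtime INTEGER PRIMARY KEY, id_asset INTEGER,
                       dt_start TEXT, dt_end TEXT);

-- Hypothetical sample rows, for illustration only.
INSERT INTO job VALUES
  (1, 1001, 8, '2016-11-22 05:00:00', '2016-11-22 09:00:00'),
  (2, 1001, 8, '2016-11-22 10:00:00', '2016-11-22 12:00:00'),
  (3, 1002, 9, '2016-11-23 08:00:00', '2016-11-23 09:00:00'),
  (4, 1003, 8, '2016-11-23 01:00:00', '2016-11-23 02:00:00');
INSERT INTO confirmation VALUES (1, 1, 10, 1), (2, 1, 5, 0), (3, 2, 7, 2);
INSERT INTO downtime VALUES
  (1, 8, '2016-11-22 06:00:00', '2016-11-22 06:10:00'),
  (2, 8, '2016-11-22 11:00:00', '2016-11-22 11:05:00');
""")

ORDER_SUMMARY_SQL = """
WITH order_totals AS (
    -- Step 1: collapse jobs and their confirmations to one row per order.
    SELECT j.id_prod_order,
           j.id_asset,
           MIN(j.dt_start)              AS start_date,
           MAX(j.dt_end)                AS end_date,
           COALESCE(SUM(c.qt_conf), 0)  AS qt_conf,
           COALESCE(SUM(c.qt_scrap), 0) AS qt_scrap
    FROM job j
    LEFT JOIN confirmation c ON c.id_job = j.id_job
    WHERE j.id_asset = :asset
    GROUP BY j.id_prod_order, j.id_asset
)
-- Step 2: the subquery only references aggregated columns of order_totals.
SELECT o.id_prod_order, o.id_asset, o.start_date, o.end_date,
       o.qt_conf, o.qt_scrap,
       (SELECT COALESCE(SUM(CAST(strftime('%s', d.dt_end) AS INTEGER)
                          - CAST(strftime('%s', d.dt_start) AS INTEGER)), 0)
        FROM downtime d
        WHERE d.id_asset = o.id_asset
          AND d.dt_start >= o.start_date
          AND d.dt_end   <= o.end_date) AS downtime_seconds
FROM order_totals o
ORDER BY o.id_prod_order
"""

rows = conn.execute(ORDER_SUMMARY_SQL, {"asset": 8}).fetchall()
for row in rows:
    print(row)
```

On SQL Server the same shape should carry over with datediff(ss, ...) in place of the strftime arithmetic, but I have only checked the SQLite version.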

The following query worked:
SELECT
PO.ID_PROD_ORDER,
J.ID_ASSET,
SUM(C.QT_CONF) AS "QT_CONF",
SUM(C.QT_SCRAP) AS "QT_SCRAP",
MIN(J.DT_JOB_ST) AS "START_DATE",
MAX(J.DT_JOB_ED) AS "END_DATE",
(SELECT SUM(datediff(ss, D.DT_START, D.DT_END)) AS "DOWNTIMESECONDS"
FROM DOWNTIME D
INNER JOIN SUB_ASSET SA ON D.ID_SUB_ASSET = SA.ID_SUB_ASSET
WHERE SA.ID_ASSET = [Param.3] AND D.DT_START >= MIN(J.DT_JOB_ST)
AND D.DT_END <= MAX(J.DT_JOB_ED))
FROM
PROD_ORDER PO
INNER JOIN
JOB J ON PO.ID_PROD_ORDER = J.ID_PROD_ORDER
AND J.DT_JOB_ST >= '[Param.1]'
AND J.DT_JOB_ED <= '[Param.2]'
LEFT OUTER JOIN
CONFIRMATION C ON C.ID_JOB = J.ID_JOB
WHERE
J.ID_ASSET = [Param.3]
GROUP BY
PO.ID_PROD_ORDER, J.ID_ASSET

Related

SELECT list expression references column integration_start_date which is neither grouped nor aggregated at

I'm facing an issue with the following query. It gives me this error: [SELECT list expression references column integration_start_date which is neither grouped nor aggregated at [34:63]]. In particular, it points to the first 'when' in the result table, which I don't know how to fix. This is on BigQuery, if that helps. As far as I can tell everything is written correctly, but I could be wrong. Any help would be appreciated.
with plan_data as (
select format_date("%Y-%m-%d",last_day(date(a.basis_date))) as invoice_date,
a.sponsor_id as sponsor_id,
b.company_name as sponsor_name,
REPLACE(SUBSTR(d.meta,STRPOS(d.meta,'merchant_id')+12,13),'"','') as merchant_id,
a.state as plan_state,
date(c.start_date) as plan_start_date,
a.employee_id as square_employee_id,
date(
(select min(date)
from glproductionview.stats_sponsors
where sponsor_id = a.sponsor_id and sponsor_payroll_provider_identifier = 'square' and date >= c.start_date) )
as integration_start_date,
count(distinct a.employee_id) as eligible_pts_count, --pts that are in active plan and have payroll activities (payroll deductions) in the reporting month
from glproductionview.payroll_activities as a
left join glproductionview.sponsors as b
on a.sponsor_id = b.id
left join glproductionview.dc_plans as c
on a.plan_id = c.id
left join glproductionview.payroll_connections as d
on a.sponsor_id = d.sponsor_id and d.provider_identifier = 'rocket' and a.company_id = d.payroll_id
where a.payroll_provider_identifier = 'rocket'
and format_date("%Y-%m",date(a.basis_date)) = '2021-07'
and a.amount_cents > 0
group by 1,2,3,4,5,6,7,8
order by 2 asc
)
select invoice_date,
sponsor_id,
sponsor_name,
eligible_pts_count,
case
when eligible_pts_count <= 5 and date_diff(current_date(),integration_start_date, month) <= 12 then 20
when eligible_pts_count <= 5 and date_diff(current_date(),integration_start_date, month) > 12 then 15
when eligible_pts_count > 5 and date_diff(current_date(),integration_start_date, month) <= 12 then count(distinct square_employee_id)*4
when eligible_pts_count > 5 and date_diff(current_date(),integration_start_date, month) > 12 then count(distinct square_employee_id)*3
else 0
end as fees
from plan_data
group by 1,2,3,4;

Fill in blank dates for rolling average - CTE in Snowflake

I have two tables – activity and purchase
Activity table:
user_id date videos_watched
1 2020-01-02 3
1 2020-01-04 5
1 2020-01-07 5
Purchase table:
user_id purchase_date
1 2020-01-01
2 2020-02-02
What I would like to do is get a 30-day rolling average, since purchase, of how many videos have been watched.
The base query is like this:
SELECT
DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_VIEWED)
FROM PURCHASE P
LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
GROUP BY 1;
However, the Activity table only has records for each day a video has been logged. I would like to fill in the blanks for days a video has not been viewed.
I have started to look into using a CTE like this:
WITH cte AS (
SELECT date('2020-01-01') as fdate
UNION ALL
SELECT CAST(DATEADD(day,1,fdate) as date)
FROM cte
WHERE fdate < date('2020-04-01')
) select * from cte
cross join purchases p
left outer join activity a
on p.user_id = a.user_id
and a.fdate = p.purchase_date
and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)
The end goal is to have something like this:
days_since_purchase videos_watched
1 3
2 0 --CTE coalesce inserted value
3 0
4 5
I have been trying for the last couple of hours to get it right, but I still can't get the hang of it.

If you want to fill in the gaps in the result set, then I think you should generate integers rather than dates:
WITH cte AS (
SELECT 1 as day_since_purchase
UNION ALL
SELECT 1 + day_since_purchase
FROM cte
WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase, COALESCE(avg_videos_viewed, 0)
FROM cte LEFT JOIN
(SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_VIEWED) as avg_videos_viewed
FROM purchases p JOIN
activity a
ON p.user_id = a.user_id AND
a.fdate = p.purchase_date AND
a.date >= p.purchase_date AND
a.date <= dateadd(day, 30, p.purchase_date)
GROUP BY 1
) pa
ON pa.day_since_purchase = cte.day_since_purchase;

You can use a recursive query to generate the 30 days following each purchase, then bring in the activity table:
with cte as (
select
purchase_date,
user_id,
0 days_since_purchase,
purchase_date dt
from purchases
union all
select
purchase_date,
user_id,
days_since_purchase + 1,
dateadd(day, days_since_purchase + 1, purchase_date)
from cte
where days_since_purchase < 30
)
select
c.days_since_purchase,
avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
on a.user_id = c.user_id
and a.fdate = c.purchase_date
and a.date = c.dt
group by c.days_since_purchase
Your question is unclear on whether the activity table has a column that stores the purchase date each row relates to. Your query uses a column fdate that does not appear in your sample data. I used that column in the query (without such a column, you might end up counting the same activity in different purchases).
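To make the gap-filling idea concrete, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module rather than Snowflake, so the date arithmetic is SQLite's date(..., '+N days') instead of dateadd. It loads the sample rows from the question, drops the fdate condition (the sample data has no such column), and uses a 7-day horizon, passed as the assumed :horizon parameter, only to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase (user_id INTEGER, purchase_date TEXT);
CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
-- Sample rows taken from the question.
INSERT INTO purchase VALUES (1, '2020-01-01'), (2, '2020-02-02');
INSERT INTO activity VALUES
  (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
""")

ROLLING_SQL = """
WITH RECURSIVE days AS (
    -- Anchor: day 0 for every purchase.
    SELECT user_id, purchase_date, 0 AS days_since_purchase
    FROM purchase
    UNION ALL
    -- Recursive step: one more day until the horizon is reached.
    SELECT user_id, purchase_date, days_since_purchase + 1
    FROM days
    WHERE days_since_purchase < :horizon
)
SELECT d.days_since_purchase,
       -- Days without an activity row count as 0 videos.
       AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
FROM days d
LEFT JOIN activity a
       ON a.user_id = d.user_id
      AND a.date = date(d.purchase_date,
                        '+' || d.days_since_purchase || ' days')
GROUP BY d.days_since_purchase
ORDER BY d.days_since_purchase
"""

rows = conn.execute(ROLLING_SQL, {"horizon": 7}).fetchall()
for day, avg_videos in rows:
    print(day, avg_videos)
```

User 2 has no activity at all, so every day for that user contributes a 0 to the average; that is why day 1 comes out as 1.5 rather than 3.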

Simplify complex Query

I need to simplify a complex old query so that I can filter it by a date range.
I have two tables, Tickets and TicketNotes.
I need
a column with the Tickets count of the day
a column with the Tickets count with a specific note of the day
the date
The old query
SELECT SUM(IFNULL(qtickets.count, 0)) j, SUM(IFNULL(mtickets.count, 0)) m FROM (
SELECT
COUNT(tickets.id) COUNT,
DATE(tickets.date) DATE
FROM
tickets
WHERE
tickets.status = 'Closed' AND tickets.did = 7
AND MONTH(tickets.date) = MONTH( CURRENT_DATE - INTERVAL 1 MONTH )
AND YEAR(tickets.date) = YEAR( CURRENT_DATE - INTERVAL 1 MONTH )
GROUP BY
DATE(tickets.date)
) AS mtickets LEFT JOIN (
SELECT
1 AS COUNT,
DATE(tickets.date) DATE
FROM
ticketnotes
INNER JOIN tickets ON tickets.id = ticketnotes.ticketid
WHERE
ticketnotes.message LIKE '%https://xxxxx.net/help/tickets/%'
AND tickets.status = 'Closed'
AND tickets.did = 7
AND MONTH(tickets.date) = MONTH( CURRENT_DATE - INTERVAL 1 MONTH )
AND YEAR(tickets.date) = YEAR( CURRENT_DATE - INTERVAL 1 MONTH )
GROUP BY
DATE(tickets.date)
) AS qtickets ON (mtickets.date = qtickets.date)
The goal is to get a result of
Date | M | Q
===================
2020-04-01 | 1 | 1
2020-04-02 | 2 | 1
2020-04-03 | 5 | 2
...
2020-04-30 | 3 | 0
Here M is the total number of closed tickets of the day for did = 7, and Q is the number of those closed tickets that have the note message.
I need the query to work with a single date filter, date BETWEEN '2020-04-01' AND '2020-04-30', and still return the correct three columns.
=======
UPDATE:
When I add AND DATE(tickets.date) BETWEEN DATE('2020-04-01') AND DATE('2020-04-30') to Gordon's answer, I get different results than from my original query.
QUERY:
SELECT
DATE(t.date),
COUNT(t.id) AS num_tickets,
(CASE WHEN COUNT(tn.ticketid) = 0 THEN 0 ELSE 1 END) AS num_with_message
FROM
tickets t
LEFT JOIN ticketnotes tn ON
tn.ticketid = t.id AND tn.message LIKE '%https://xxxxx.net/help/tickets/%'
WHERE
t.status = 'Closed' AND t.did = 7
AND DATE(t.date) BETWEEN DATE('2020-04-01') AND DATE('2020-04-30')
GROUP BY
DATE(t.date)
The num_tickets values come out wrong: they differ from the counts I get when I run the query without the JOIN.
Any suggestions?

You could try using a CASE expression instead of the WHERE condition, like this:
SELECT
DATE(tickets.date) DATE
, COUNT(tickets.id) M
, case when sum( ticketnotes.message LIKE '%https://xxxxx.net/help/tickets/%' ) <> 0 then 1 else null end Q
FROM
ticketnotes
INNER JOIN tickets ON tickets.id = ticketnotes.ticketid
WHERE tickets.status = 'Closed'
AND tickets.did = 7
AND MONTH(tickets.date) = MONTH( CURRENT_DATE - INTERVAL 1 MONTH )
AND YEAR(tickets.date) = YEAR( CURRENT_DATE - INTERVAL 1 MONTH )
GROUP BY DATE(tickets.date)

This answers the original version of the question.
What you are describing sounds like a group by with left join. However, it is not clear what exactly you are looking for. My best guess is:
select date(t.date), count(t.id) as num_tickets,
count(tn.ticketid) as num_with_message
from tickets t left join
ticketnotes tn
on tn.ticketid = t.id and
tn.message like '%https://xxxxx.net/help/tickets/%'
where t.status = 'Closed' and
t.did = 7
group by date(t.date)
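One possible explanation for the wrong num_tickets reported in the question's update is that a ticket with several matching notes appears once per note after the LEFT JOIN, so COUNT(t.id) counts it several times. The sketch below is a guess at that situation, not a confirmed diagnosis: it runs on SQLite through Python's sqlite3 module with made-up rows, the helper daily_counts is hypothetical, and it compares plain COUNT with COUNT(DISTINCT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (id INTEGER PRIMARY KEY, date TEXT,
                      status TEXT, did INTEGER);
CREATE TABLE ticketnotes (id INTEGER PRIMARY KEY, ticketid INTEGER,
                          message TEXT);
-- Hypothetical rows: ticket 1 has two matching notes, ticket 2 has none.
INSERT INTO tickets VALUES
  (1, '2020-04-01 09:00:00', 'Closed', 7),
  (2, '2020-04-01 10:00:00', 'Closed', 7),
  (3, '2020-04-02 11:00:00', 'Closed', 7),
  (4, '2020-04-02 12:00:00', 'Open',   7);
INSERT INTO ticketnotes VALUES
  (1, 1, 'see https://xxxxx.net/help/tickets/42'),
  (2, 1, 'again https://xxxxx.net/help/tickets/43'),
  (3, 2, 'unrelated note');
""")


def daily_counts(distinct):
    """Return (day, num_tickets, num_with_message) rows for April 2020."""
    d = "DISTINCT " if distinct else ""
    sql = f"""
    SELECT DATE(t.date) AS day,
           COUNT({d}t.id)        AS num_tickets,
           COUNT({d}tn.ticketid) AS num_with_message
    FROM tickets t
    LEFT JOIN ticketnotes tn
           ON tn.ticketid = t.id
          AND tn.message LIKE '%https://xxxxx.net/help/tickets/%'
    WHERE t.status = 'Closed' AND t.did = 7
      AND DATE(t.date) BETWEEN :first_day AND :last_day
    GROUP BY DATE(t.date)
    ORDER BY day
    """
    params = {"first_day": "2020-04-01", "last_day": "2020-04-30"}
    return conn.execute(sql, params).fetchall()


naive = daily_counts(distinct=False)  # ticket 1 is counted once per note
fixed = daily_counts(distinct=True)   # each ticket is counted once
print(naive)
print(fixed)
```

If your tickets never have more than one matching note, the two variants return the same numbers and the discrepancy must come from somewhere else.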

Having clause being ignored

In this query, I am attempting to get a count of patients for each practice under given conditions.
The issue is that I have to show patients who have had >=3 office visits in the past year.
Count(D.PID)
in the select list is ignoring
HAVING count(admitdatetime)>=3
Here is my query
select distinct D.PracticeAbbrevName, D.ProviderLastName, count(D.pid) AS Count
from PersonDetail AS D
left join Visit AS V on D.PID = V.PID
where D.A1C >=7.5 and V.admitdatetime >= (getdate()-365) and D.A1CDays <180 and D.Diabetes = 1
group by D.PracticeAbbrevName, D.ProviderLastName
having count(admitdatetime)>=3
order by PracticeAbbrevName
If I get rid of the count function for D.pid, and just display each PID individually, my having phrase works properly.
There is something about count and having that does not work properly together.

Revised answer:
SELECT DISTINCT
D.PracticeAbbrevName,
D.ProviderLastName,
COUNT(D.pid) AS PIDCount,
COUNT(admitdatetime) AS AdmitCount
FROM
PersonDetail AS D
LEFT JOIN Visit AS V
ON D.PID = V.PID
WHERE
D.A1C >= 7.5
AND V.admitdatetime >= ( GETDATE() - 365 )
AND D.A1CDays < 180
AND D.Diabetes = 1
GROUP BY
D.PracticeAbbrevName,
D.ProviderLastName
HAVING
COUNT(admitdatetime) >= 3
ORDER BY
PracticeAbbrevName

You're trying to do too much at once. Split the logic into two steps:
Query grouping by PID to filter out patients that don't meet your criteria.
Query grouping by practice to get a patient count.
Your query would look like this:
;with EligiblePatients as (
select d.pid,
d.PracticeAbbrevName,
d.ProviderLastName
from PersonDetail d
left join Visit v
on v.pid = d.pid
and v.admitdatetime >= (getdate()-365)
where d.A1C >= 7.5
and d.A1CDays < 180
and d.Diabetes = 1
group by d.pid,
d.PracticeAbbrevName,
d.ProviderLastName
having count(v.pid) >= 3
)
select PracticeAbbrevName,
ProviderLastName,
COUNT(*) as PatientCount
from EligiblePatients
group by PracticeAbbrevName,
ProviderLastName
order by PracticeAbbrevName
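To see why the two-step version behaves differently, here is a small sketch on SQLite via Python's sqlite3 module. The rows are made up, the A1C and Diabetes filters are left out to keep it short, and a fixed :cutoff parameter stands in for getdate()-365; all of these are assumptions for illustration. In the one-step query, HAVING is applied to the practice-level group, so it compares the practice's total visit count with 3 and COUNT(D.pid) counts joined rows, not patients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonDetail (PID INTEGER PRIMARY KEY,
                           PracticeAbbrevName TEXT, ProviderLastName TEXT);
CREATE TABLE Visit (PID INTEGER, admitdatetime TEXT);
-- Hypothetical rows: patients 1 and 3 have 3+ visits, patient 2 has one.
INSERT INTO PersonDetail VALUES
  (1, 'North', 'Lee'), (2, 'North', 'Lee'), (3, 'South', 'Kim');
INSERT INTO Visit VALUES
  (1, '2024-01-05'), (1, '2024-02-05'), (1, '2024-03-05'),
  (2, '2024-01-10'),
  (3, '2024-01-01'), (3, '2024-01-02'), (3, '2024-01-03'), (3, '2024-01-04');
""")
params = {"cutoff": "2024-01-01"}

# One step: groups are practices, so both counts are counts of visit rows.
ONE_STEP_SQL = """
SELECT d.PracticeAbbrevName, d.ProviderLastName, COUNT(d.PID)
FROM PersonDetail d
LEFT JOIN Visit v ON v.PID = d.PID
WHERE v.admitdatetime >= :cutoff
GROUP BY d.PracticeAbbrevName, d.ProviderLastName
HAVING COUNT(v.admitdatetime) >= 3
ORDER BY d.PracticeAbbrevName
"""

# Two steps: filter per patient first, then count patients per practice.
TWO_STEP_SQL = """
WITH EligiblePatients AS (
    SELECT d.PID, d.PracticeAbbrevName, d.ProviderLastName
    FROM PersonDetail d
    LEFT JOIN Visit v
           ON v.PID = d.PID
          AND v.admitdatetime >= :cutoff
    GROUP BY d.PID, d.PracticeAbbrevName, d.ProviderLastName
    HAVING COUNT(v.PID) >= 3
)
SELECT PracticeAbbrevName, ProviderLastName, COUNT(*) AS PatientCount
FROM EligiblePatients
GROUP BY PracticeAbbrevName, ProviderLastName
ORDER BY PracticeAbbrevName
"""

one_step = conn.execute(ONE_STEP_SQL, params).fetchall()
two_step = conn.execute(TWO_STEP_SQL, params).fetchall()
print(one_step)
print(two_step)
```

With these rows the one-step query reports 4 for each practice (the number of visit rows), while the two-step query reports 1 eligible patient per practice.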

How do I use calendar exceptions to generate accurate schedules using GTFS?

I'm having trouble figuring out the GTFS query to obtain the next 20 schedules for a given stop ID and a given direction.
I know the stop ID, the trip direction ID, the time (now) and the date (today)
I wrote
SELECT DISTINCT ST.departure_time FROM stop_times ST
JOIN trips T ON T._id = ST.trip_id
JOIN calendar C ON C._id = T.service_id
JOIN calendar_dates CD on CD.service_id = T.service_id
WHERE ST.stop_id = 3377699724118483
AND T.direction_id = 0
AND ST.departure_time >= "16:00:00"
AND
(
( C.start_date <= 20140607 AND C.end_date >= 20140607 AND C.saturday= 1 ) // regular service today
AND ( ( CD.date != 20140607 ) // no exception today
OR ( CD.date = 20140607 AND CD.exception_type = 1 ) // or ADDED exception today
)
)
ORDER BY ST.departure_time LIMIT 20
This results in no record being found.
If I remove the last part, dealing with the CD table (i.e. the removed or added exceptions), it works perfectly fine.
So I think I'm miswriting the check on the exceptions.
As written above with // comments, I want to check that
today is in a regular service (from checking the calendar table)
there is no removal exception for today (or in this case the trips corresponding to this service id are not included in the computation)
if there is added exception for today, the corresponding trips shall be included in the computation
Can you help me with that?

I'm fairly certain it's not possible to do what you're trying to do with only a single SELECT statement, due to the design of the calendar and calendar_dates tables.
What I do is use a second, inner query to build the set of active service IDs on the requested date, then join the outer query against this set to include only results relevant for that date. Try this:
SELECT DISTINCT ST.departure_time FROM stop_times ST
JOIN trips T ON T._id = ST.trip_id
JOIN (SELECT _id FROM calendar
WHERE start_date <= 20140607
AND end_date >= 20140607
AND saturday = 1
UNION
SELECT service_id FROM calendar_dates
WHERE date = 20140607
AND exception_type = 1
EXCEPT
SELECT service_id FROM calendar_dates
WHERE date = 20140607
AND exception_type = 2
) ASI ON ASI._id = T.service_id
WHERE ST.stop_id = 3377699724118483
AND T.direction_id = 0
AND ST.departure_time >= "16:00:00"
ORDER BY ST.departure_time
LIMIT 20
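The inner query is the part that does the real work, so here it is on its own as a runnable sketch (SQLite through Python's sqlite3 module, with made-up rows and a :d parameter standing in for the hard-coded date). In GTFS, exception_type 1 means service was added on that date and 2 means it was removed. SQLite evaluates compound selects left to right, so the statement reads as (regular UNION added) EXCEPT removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (_id INTEGER PRIMARY KEY, start_date INTEGER,
                       end_date INTEGER, saturday INTEGER);
CREATE TABLE calendar_dates (service_id INTEGER, date INTEGER,
                             exception_type INTEGER);
-- Hypothetical services:
--   1 runs on Saturdays and has an exception on a different date only
--   2 runs on Saturdays but is removed on 2014-06-07
--   3 does not run on Saturdays but is added on 2014-06-07
INSERT INTO calendar VALUES
  (1, 20140101, 20141231, 1),
  (2, 20140101, 20141231, 1),
  (3, 20140101, 20141231, 0);
INSERT INTO calendar_dates VALUES
  (2, 20140607, 2),
  (3, 20140607, 1),
  (1, 20140614, 2);
""")

ACTIVE_SERVICES_SQL = """
SELECT _id FROM calendar
WHERE start_date <= :d AND end_date >= :d AND saturday = 1
UNION
SELECT service_id FROM calendar_dates
WHERE date = :d AND exception_type = 1
EXCEPT
SELECT service_id FROM calendar_dates
WHERE date = :d AND exception_type = 2
ORDER BY 1
"""

active = [row[0] for row in conn.execute(ACTIVE_SERVICES_SQL, {"d": 20140607})]
print(active)
```

Note that the weekday column (saturday here) still has to match the requested date; the sketch hard-codes it the same way the query above does.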