Parametrizing dates in SQL IN clause - using cell magic in jupyter notebook - pandas

I am using an MSSQL database as the backend, and the following SQL cell in my notebook works fine:
%%sql
select * from (
select count(*) as CNT, COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date)) as dt from TABLE1
group by COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date))
)t
PIVOT
(
sum(CNT) for [dt] in ([2022-07-25],[2022-07-26])
) AS PivotTable
I am trying to parameterize the IN clause in the pivot.
I have tried a few things, but without much success:
import pandas as pd
from datetime import datetime
rng = pd.date_range(end = datetime.today(), periods = 5).strftime('%Y-%m-%d').tolist()
#rng = format(','.join('[{}]'.format(i) for i in rng))
#rng = pd.date_range(end = datetime.today().date(), periods = 5)
print (rng)
['2022-07-26', '2022-07-27', '2022-07-28', '2022-07-29', '2022-07-30']
select * from (
select count(*) as CNT, COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date)) as dt from TABLE1
group by COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date))
)t
PIVOT
(
sum(CNT) for [dt] in (:rng)
) AS PivotTable
* mssql+pymssql://---
(pymssql._pymssql.ProgrammingError) (102, b"Incorrect syntax near '('.DB-Lib error message 20018, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n")
[SQL: select * from (
select count(*) as CNT, COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date)) as dt from TABLE1
group by COL1,CONVERT(VARCHAR,CAST(CREATED_DATE AS date))
)t
PIVOT
(
sum(CNT) for [dt] in (%(rng)s)
) AS PivotTable]
[parameters: {'rng': ['2022-07-26', '2022-07-27', '2022-07-28', '2022-07-29', '2022-07-30']}]
(Background on this error at: https://sqlalche.me/e/14/f405)
Any ideas on how I can achieve this? I will try creating the entire query dynamically, but it would be much better if I could pass just the dates into the query.
Thanks for your time.
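One possible direction, sketched under assumptions: a bind parameter such as :rng carries values, while the PIVOT IN list consists of column names, so the list most likely has to be part of the SQL text itself. The sketch below builds only that bracketed list in Python and drops it into the otherwise fixed query. The names pivot_columns, build_pivot_query and QUERY_TEMPLATE are hypothetical helpers, not part of any library.

```python
from datetime import date, timedelta

# Fixed query text; only the {columns} placeholder varies between runs.
QUERY_TEMPLATE = """
select * from (
select count(*) as CNT, COL1, CONVERT(VARCHAR, CAST(CREATED_DATE AS date)) as dt from TABLE1
group by COL1, CONVERT(VARCHAR, CAST(CREATED_DATE AS date))
) t
PIVOT
(
sum(CNT) for [dt] in ({columns})
) AS PivotTable
"""


def pivot_columns(end, periods):
    """Return a bracketed column list for the `periods` days ending at `end`."""
    days = [end - timedelta(days=offset) for offset in range(periods - 1, -1, -1)]
    # isoformat() emits only digits and hyphens, so the values cannot
    # break out of the surrounding square brackets.
    return ",".join(f"[{day.isoformat()}]" for day in days)


def build_pivot_query(end, periods):
    """Insert the generated column list into the fixed query text."""
    return QUERY_TEMPLATE.format(columns=pivot_columns(end, periods))


if __name__ == "__main__":
    print(build_pivot_query(date.today(), 5))
```

The resulting string would then be handed to whatever runs the query (for example pandas.read_sql with a SQLAlchemy engine). Because the dates come from date objects rather than free text, only the column list is dynamic and the rest of the query stays fixed.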

Related

Assistance with PERCENTILE_CONT function and GROUP By error

All,
I am having problems with the below query. I am trying to get stat data from our database for the last 3 years but I keep getting the error message:
***Column 'OC_VDATA.DATA1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.***
I know it has something to do with the DATA1 column but I am not familiar enough using the PERCENTILE_CONT function to know what the solution is.
Anyone have any ideas?
WITH Q AS
(
SELECT stagingPLM.dbo.ITEM_CODES.ITEM_CODE,
AVG(OC_VDATA.DATA1) AS Mean,
STDEVP(OC_VDATA.DATA1) AS StandardDev,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OC_VDATA.DATA1)
OVER (PARTITION BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE) AS Median
FROM OC_VDATA INNER JOIN
OC_VDAT_AUX ON OC_VDATA.PARTNO = OC_VDAT_AUX.PARTNOAUX
AND OC_VDATA.DATETIME = OC_VDAT_AUX.DATETIMEAUX INNER JOIN
stagingPLM.dbo.ITEM_CODES ON LEFT(OC_VDATA.PARTNO, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
AND LEFT(OC_VDAT_AUX.PARTNOAUX, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
WHERE (OC_VDAT_AUX.UDL28 LIKE '%PLASTIC%')
AND (RIGHT(OC_VDATA.PARTNO, 6) = '036150')
AND (CAST(OC_VDAT_AUX.UDL40 AS DATETIME)
BETWEEN CONVERT(datetime, '2019-05-18 00:00:00', 102) AND CONVERT(datetime, '2022-05-18 00:00:00', 102))
GROUP BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE
)
SELECT * FROM Q
The error comes from WITHIN GROUP (ORDER BY OC_VDATA.DATA1): in SQL Server, PERCENTILE_CONT is a window (analytic) function rather than an aggregate, so the DATA1 it references is neither aggregated nor listed in the GROUP BY.
You are grouping by ITEM_CODE (for AVG and STDEVP), whereas the window function orders by OC_VDATA.DATA1.
It is better to calculate AVG, STDEVP and PERCENTILE_CONT all as window functions, instead of half through GROUP BY and half through a window function.
Using only the minimum columns required to reproduce the issue, you can rewrite the query as below to get the desired output.
SELECT DISTINCT item_codes.item_code,
Avg(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS Mean,
Stdevp(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS StandardDev,
Percentile_cont(0.5)
within GROUP (ORDER BY oc_vdata.data1) over (
PARTITION BY item_codes.item_code) AS Median
FROM oc_vdata
inner join item_codes
ON Left(oc_vdata.partno, 12) = item_codes.spec_no
Minimum steps to reproduce the error:
SELECT item_codes.item_code,
Avg(oc_vdata.data1) AS Mean,
Stdevp(oc_vdata.data1) AS StandardDev
FROM oc_vdata
INNER JOIN item_codes
ON LEFT(oc_vdata.partno, 12) = item_codes.spec_no
GROUP BY item_codes.item_code
ORDER BY oc_vdata.data1 -- This will cause the error
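As a rough cross-check of what the rewritten window-function query returns per item_code, the same three statistics can be computed in plain Python. This is only a sketch with made-up sample rows; stats_by_item is a hypothetical helper. statistics.pstdev is the population standard deviation (like STDEVP), and statistics.median averages the two middle values for an even count, which is what PERCENTILE_CONT(0.5) does.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Made-up (item_code, data1) pairs standing in for the joined rows.
SAMPLE_ROWS = [("A", 1.0), ("A", 2.0), ("A", 4.0), ("B", 10.0), ("B", 20.0)]


def stats_by_item(rows):
    """Return Mean, StandardDev and Median of data1 for each item_code."""
    grouped = defaultdict(list)
    for item_code, data1 in rows:
        grouped[item_code].append(data1)
    return {
        code: {
            "Mean": mean(values),
            "StandardDev": pstdev(values),  # population std dev, like STDEVP
            "Median": median(values),       # interpolated, like PERCENTILE_CONT(0.5)
        }
        for code, values in grouped.items()
    }


if __name__ == "__main__":
    for code, stats in stats_by_item(SAMPLE_ROWS).items():
        print(code, stats)
```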

SQL group by in Subquery

I'm trying to get monthly production using GROUP BY after converting the Unix time column into a regular timestamp. Can you please tell me how to use GROUP BY in the code below?
'''
With production(SystemId, dayof, monthof, yearof, powerwatts, productionwattshours) as
(
Select SystemId,
[dayof] = DAY(hrdtc),
[monthof] = MONTH(hrdtc),
[yearof] = YEAR(hrdtc),
powerwatts, productionwatthours
from (
Select * , dateadd(s, UnixTime, '19700101') as hrdtc from meterreading ) ds
)
Select * from production
where systemId = 2368252
'''
I think you're looking for this (technically you don't need a subquery but it allows you to avoid repeating the DATEADD() expression):
SELECT SystemId = 2368252,
[Month] = DATEFROMPARTS(YEAR(hrdtc), MONTH(hrdtc), 1),
powerwatts = SUM(powerwatts),
productionwatthours = SUM(productionwatthours)
FROM
(
SELECT powerwatts, productionwatthours,
DATEADD(SECOND, UnixTime, '19700101') as hrdtc
FROM dbo.enphasemeterreading
WHERE systemId = 2368252
) AS ds
GROUP BY DATEFROMPARTS(YEAR(hrdtc), MONTH(hrdtc), 1);
If you want to also avoid repeating the GROUP BY expression:
SELECT SystemId = 2368252,
[Month],
powerwatts = SUM(powerwatts),
productionwatthours = SUM(productionwatthours)
FROM
(
SELECT [Month] = DATEFROMPARTS(YEAR(hrdtc), MONTH(hrdtc), 1),
powerwatts, productionwatthours
FROM
(
SELECT powerwatts, productionwatthours,
DATEADD(SECOND, UnixTime, '19700101') as hrdtc
FROM dbo.enphasemeterreading
WHERE systemId = 2368252
) AS ds1
) AS ds2
GROUP BY [Month];
Personally I don't think that's any prettier or clearer. A couple of other tips:
Spell it out; shorthand is lazy and problematic
Always qualify tables and other objects with schema
Updated requirement (please state these up front): How would I join this query to another table?
SELECT * FROM dbo.SomeOtherTable AS sot
INNER JOIN
(
SELECT SystemId = 2368252,
[Month],
powerwatts = SUM(powerwatts),
productionwatthours = SUM(productionwatthours)
FROM
...
GROUP BY [Month]
) AS agg
ON sot.SystemId = agg.SystemId;
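The month-bucketing idea used in the queries above (convert the Unix timestamp, truncate it to the first day of its month, then sum per bucket) can be sketched in Python as follows. The sample readings and the helper names month_start and monthly_totals are hypothetical, and the timestamps are assumed to be seconds since 1970-01-01 UTC, matching the DATEADD(SECOND, UnixTime, '19700101') expression.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def month_start(unix_time):
    """Return the first day of the month containing the given Unix time (UTC)."""
    day = datetime.fromtimestamp(unix_time, tz=timezone.utc).date()
    return day.replace(day=1)


def monthly_totals(readings):
    """Sum powerwatts and productionwatthours per month.

    `readings` is an iterable of (unix_time, powerwatts, productionwatthours).
    """
    totals = defaultdict(lambda: [0, 0])
    for unix_time, powerwatts, productionwatthours in readings:
        bucket = totals[month_start(unix_time)]
        bucket[0] += powerwatts
        bucket[1] += productionwatthours
    return {month: tuple(sums) for month, sums in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        (1640995200, 5, 100),  # 2022-01-01 00:00:00 UTC
        (1641081600, 7, 200),  # 2022-01-02 00:00:00 UTC
        (1643673600, 1, 50),   # 2022-02-01 00:00:00 UTC
    ]
    print(monthly_totals(sample))
```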

How to compute cumulative product in SQL Server 2008?

I have the table below with 2 columns, DATE and FACTOR. I would like to compute a cumulative product, something like CUMFACTOR, in SQL Server 2008.
Can someone please suggest an alternative?
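Unfortunately, there is no PROD() aggregate or window function in SQL Server (or in most other SQL databases). But you can emulate it as follows (the ordered window SUM needs SQL Server 2012 or later, and LOG requires strictly positive factors):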
SELECT Date, Factor, exp(sum(log(Factor)) OVER (ORDER BY Date)) CumFactor
FROM MyTable
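The identity behind this emulation is that a product equals exp of the sum of logs. A small Python sketch, with hypothetical function names, compares the log-based running product against a direct one. It assumes every factor is strictly positive, since the logarithm of zero or a negative number is undefined.

```python
import math
import operator
from itertools import accumulate


def cumulative_product_via_logs(factors):
    """Running product computed as exp(running sum of logs), as in the SQL."""
    running_log_sums = accumulate(math.log(factor) for factor in factors)
    return [math.exp(log_sum) for log_sum in running_log_sums]


def cumulative_product_direct(factors):
    """Running product computed by plain multiplication, for comparison."""
    return list(accumulate(factors, operator.mul))


if __name__ == "__main__":
    factors = [10, 20, 30, 40]
    print(cumulative_product_direct(factors))
    print(cumulative_product_via_logs(factors))
```

The log-based version goes through floating point, so its results can differ from the exact products by a tiny rounding error; rounding rather than truncating is the safer choice if integers are needed.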
You can do it by:
SELECT A.ROW
, A.DATE
, A.RATE
, A.RATE * B.RATE AS [CUM RATE]
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY DATE) as ROW, DATE, RATE
FROM TABLE
) A
LEFT JOIN (
SELECT ROW_NUMBER() OVER(ORDER BY DATE) as ROW, DATE, RATE
FROM TABLE
) B
ON A.ROW + 1 = B.ROW
To calculate the cumulative product, as displayed in the CumFactor column in the original post, the following code does the job:
--first, load the sample data to a temp table
select *
into #t
from
(
values
('2/3/2000', 10),
('2/4/2000', 20),
('2/5/2000', 30),
('2/6/2000', 40)
) d ([Date], [Rate]);
--next, calculate cumulative product
select *, CumFactor = cast(exp(sum(log([Rate])) over (order by [Date])) as int) from #t;
Here is the result:

oracle: pivot on dynamic dates

I have this query:
select pvt1.*
from
(
select
TO_CHAR(DateAppointment, 'yyyy-mm-dd') as currentDay,
count(*) myCounter
[...]
from (
select
[...]
from myTable
) a
group by [...]
order by DateAppointment
) source1
PIVOT
(
max(myCounter)
--FOR currentDay IN ('2012-08-20', '2012-08-21', '2012-08-27', '2012-09-03')
FOR currentDay IN (
SELECT LISTAGG(datevalue, ', ')
WITHIN GROUP (ORDER BY datevalue)
FROM DATESLIST
)
) pvt1;
The subquery below just gets the list of my dates from another table (DATESLIST), but when I run the query above, Oracle returns an error.
SELECT LISTAGG(datevalue, ', ')
WITHIN GROUP (ORDER BY datevalue)
FROM DATESLIST
But when I use the following code instead, I get the correct results:
FOR currentDay IN ('2012-08-20', '2012-08-21', '2012-08-27', '2012-09-03')
Any ideas?
Thanks in advance.

Filling in missing dates DB2 SQL

My initial query looks like this:
select process_date, count(*) batchCount
from T1.log_comments
group by process_date
order by process_date asc;
I need to be able to do some quick analysis for weekends that are missing, but wanted to know if there was a quick way to fill in the missing dates not present in process_date.
I've seen the solution here but am curious if there's any magic hidden in db2 that could do this with only a minor modification to my original query.
Note: I originally framed this based on my exposure to SQL Server/Oracle without testing it; it has now been amended and tested on DB2.
WITH MaxDateQry(MaxDate) AS
(
SELECT MAX(process_date) FROM T1.log_comments
),
MinDateQry(MinDate) AS
(
SELECT MIN(process_date) FROM T1.log_comments
),
DatesData(ProcessDate) AS
(
SELECT MinDate from MinDateQry
UNION ALL
SELECT (ProcessDate + 1 DAY) FROM DatesData WHERE ProcessDate < (SELECT MaxDate FROM MaxDateQry)
)
SELECT a.ProcessDate, b.batchCount
FROM DatesData a LEFT JOIN
(
SELECT process_date, COUNT(*) batchCount
FROM T1.log_comments
GROUP BY process_date
) b
ON a.ProcessDate = b.process_date
ORDER BY a.ProcessDate ASC;
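The same fill-the-gaps logic (generate every date between the minimum and maximum, then left join the per-day counts) can be sketched in Python for quick analysis outside the database. The function name counts_with_gaps_filled and the sample dates are hypothetical; days with no rows get None, mirroring the NULL a LEFT JOIN would produce.

```python
from collections import Counter
from datetime import date, timedelta


def counts_with_gaps_filled(process_dates):
    """Return (day, count) for every day from the earliest to the latest date.

    Days that do not occur in `process_dates` are reported with a count of None.
    """
    counts = Counter(process_dates)
    if not counts:
        return []
    current, last = min(counts), max(counts)
    filled = []
    while current <= last:
        filled.append((current, counts.get(current)))
        current += timedelta(days=1)
    return filled


if __name__ == "__main__":
    sample = [date(2022, 7, 1), date(2022, 7, 1), date(2022, 7, 4)]
    for day, batch_count in counts_with_gaps_filled(sample):
        print(day, batch_count)
```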