Pandas: divide each group's 'Value' by the value at a given TIME with groupby

I want to divide each 'Value' in this dataset by the value at TIME == '1970-Q1', grouped by LOCATION.
This is how I'd implement the logic in SQL:
WITH first_year AS (
SELECT LOCATION, Value
FROM `table`
WHERE TIME = '1970-Q1'
)
SELECT t.LOCATION, t.TIME, ((t.Value / f.Value) * 100) normValue
FROM `table` t,
first_year f
WHERE t.LOCATION = f.LOCATION
ORDER BY LOCATION, TIME ASC
However, you can also assume that sorting the TIME column ascending within each group and taking the first value works; TIME is always a string like 'YYYY-QX'.
Expected result:

Try with transform
df['normal'] = df.Value / df['Value'].where(df.TIME.str[5:] == 'Q1').groupby(df['LOCATION']).transform('first')
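A minimal runnable sketch of the same idea, using made-up sample data (the LOCATION/TIME/Value names follow the question; the * 100 only mirrors the SQL's normValue and is not part of the answer above):

import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    'LOCATION': ['AUS', 'AUS', 'AUS', 'FRA', 'FRA', 'FRA'],
    'TIME':     ['1970-Q1', '1970-Q2', '1970-Q3', '1970-Q1', '1970-Q2', '1970-Q3'],
    'Value':    [50.0, 55.0, 60.0, 80.0, 84.0, 88.0],
})

# Keep only the base-quarter values, broadcast the first one per LOCATION, then divide.
base = df['Value'].where(df['TIME'] == '1970-Q1').groupby(df['LOCATION']).transform('first')
df['normValue'] = df['Value'] / base * 100
print(df)  # AUS rows become 100, 110, 120; FRA rows 100, 105, 110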

Related

Oracle SQL - Timestamp splits query result into 2 rows, need all in one row

I need a time-based query (with a random or current timestamp) that returns all results in one row. My current query is as follows:
WITH started AS
(
SELECT f.*, CURRENT_DATE + ROWNUM / 24
FROM
(
SELECT
d.route_name,
d.op_name,
d.route_step_name,
nvl(MAX(DECODE(d.complete_reason, NULL, d.op_STARTS)), 0) started_units,
round(nvl(MAX(DECODE(d.complete_reason, 'PASS', d.op_complete)), 0) / d.op_starts * 100, 2) yield
FROM
(
SELECT route_name,
op_name,
route_step_name,
complete_reason,
complete_quantity,
sum(start_quantity) OVER(PARTITION BY route_name, op_name, COMPLETE_REASON) op_starts,
sum(complete_quantity) OVER(PARTITION BY route_name, op_name, COMPLETE_REASON ) op_complete
FROM FTPC_LT_PRDACT.tracked_object_history
WHERE route_name = 'HEADER FINAL ASSEMBLY'
AND OP_NAME NOT LIKE '%DISPOSITION%'
and (tobj_type = 'Lot')
AND xfr_insert_pid IN
(
SELECT xfr_start_id
FROM FTPC_LT_PRDACT.xfr_interval_id
WHERE last_modified_time <= SYSDATE
AND OP_NAME NOT LIKE '%DISPOSITION%'
and complete_reason = 'PASS' OR complete_reason IS NULL
)
) d
GROUP BY d.route_name, d.op_name, d.route_step_name, complete_reason, d.op_starts
ORDER BY d.route_step_name
) f
),
queued AS
(
SELECT
ts.route_name,
ts.queue_name,
o.op_name,
sum (th.complete_quantity) queued_units
FROM
FTPC_LT_PRDACT.tracked_object_HISTORY th,
FTPC_LT_PRDACT.tracked_object_status ts,
FTPC_LT_PRDACT.route_arc a,
FTPC_LT_PRDACT.route_step r,
FTPC_LT_PRDACT.operation o,
FTPC_LT_PRDACT.lot l
WHERE r.op_key = o.op_key
and l.lot_key = th.tobj_key
AND a.to_node_key = r.route_step_key
AND a.from_node_key = ts.queue_key
and th.tobj_history_key = ts.tobj_history_key
AND a.main_path = 1
AND (ts.tobj_type = 'Lot')
AND O.OP_NAME NOT LIKE '%DISPOSITION%'
and th.route_name = 'HEADER FINAL ASSEMBLY'
GROUP BY ts.route_name, ts.queue_name, o.op_name
)
SELECT
started.route_name,
started.op_name,
started.route_step_name,
max(started.yield) started_yield,
max(started.started_units) started_units,
case when queued.queue_name is NULL then 'N/A' else queued.queue_name end QUEUE_NAME,
case when queued.queued_units is NULL then 0 else queued.queued_units end QUEUED_UNITS
FROM started
left JOIN queued ON started.op_name = queued.op_name
group by started.route_name, started.op_name, started.route_step_name, queued.queue_name, QUEUED_UNITS
order by started.route_step_name asc
;
Current Query (as expected) but missing timestamp:
I need to have a timestamp for each individual row for a different application to display the results. Any help would be greatly appreciated! When I try to add a timestamp my query is altered:
Query once timestamp is added:
Edit: I need to display the query in a visualization tool. That tool is time based and will skew the table results unless there is a datetime associated with each field. The date time value can be random, but cannot be the same for each result.
The query is to be displayed on a live dashboard; every time the application is refreshed, the query is expected to be re-run.
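As a hedged sketch (Oracle syntax, reusing the CURRENT_DATE + ROWNUM idea already present in the query above), one way to attach a distinct but otherwise meaningless timestamp to every output row is to wrap the whole statement and offset SYSDATE by the row number; the alias row_ts is made up:

SELECT q.*,
       SYSDATE + ROWNUM / 86400 AS row_ts  -- rows land one second apart
FROM (
      -- the full query above goes here, unchanged
     ) q;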

Select a value only when it differs from the previous value over time

I have this table:
I want to get values like this:
IP            Temp  Time
172.16.24.96  31.5  2021-07-24 11:17:46.000
172.16.24.96  31.4  2021-07-24 11:18:31.000
When it has the same value within an interval, just keep one row with the lowest Time.
If by "interval" you mean the same ip and Temp, then you can use the GROUP BY clause:
SELECT ip, Temp, MIN(Time)
FROM yourtable
GROUP BY ip, Temp;
EDIT
Larnu in the comment section has pointed out that the temperature might change and then change back. To cope with that issue, here's an adjusted query:
SELECT cur.ip, cur.Temp, cur.Time
FROM yourtable cur
LEFT JOIN yourtable prev
ON cur.ip = prev.ip AND
cur.Temp = prev.Temp AND
cur.Time > prev.Time
LEFT JOIN yourtable betw
ON (cur.ip <> betw.ip OR cur.Temp <> betw.Temp) AND
betw.Time BETWEEN prev.Time AND cur.Time
WHERE (prev.Temp IS NULL) OR
((NOT (prev.Temp IS NULL)) AND (NOT (betw.Temp IS NULL)))
GROUP BY cur.ip, cur.Temp, cur.Time
HAVING count(*) > 0
A simple method uses lag():
select t.*
from (select t.*,
lag(temp) over (partition by ip order by time) as prev_temp
from t
) t
where prev_temp is null or prev_temp <> temp;

MAX not working in SQL query

I want the latest record to be retrieved by the following query, but MAX is not working in the query below: all the rows are returned instead of only the latest one.
SELECT SV.SEGMENT1 TARGETED_INCENTIVE,
SIT.ANALYSIS_CRITERIA_ID,
SIT.OBJECT_VERSION_NUMBER OBJECT_VERSION_NUMBER,
ST.ID_FLEX_NUM,
SIT.DATE_FROM,
SIT.DATE_TO,
MAX (SIT.PERSON_ANALYSIS_ID)
FROM FND_ID_FLEX_STRUCTURES_TL STTL,
FND_ID_FLEX_STRUCTURES ST,
PER_PERSON_ANALYSES SIT,
PER_ANALYSIS_CRITERIA SV
WHERE 1 = 1
AND (STTL.ID_FLEX_STRUCTURE_NAME) LIKE
('%%Tare%')
AND STTL.LANGUAGE = USERENV ('LANG')
AND ST.ID_FLEX_CODE = STTL.ID_FLEX_CODE
AND ST.ID_FLEX_NUM = STTL.ID_FLEX_NUM
AND ST.ID_FLEX_NUM = SIT.ID_FLEX_NUM
AND ST.ID_FLEX_NUM = SV.ID_FLEX_NUM
AND TO_DATE (SIT.DATE_TO) IS NULL
AND SIT.ANALYSIS_CRITERIA_ID = SV.ANALYSIS_CRITERIA_ID
AND SIT.PERSON_ID = (SELECT PERSON_ID
FROM abc
WHERE ID = :AIN)
GROUP BY SV.SEGMENT1,
SIT.ANALYSIS_CRITERIA_ID,
STTL.ID_FLEX_STRUCTURE_NAME,
SIT.OBJECT_VERSION_NUMBER,
ST.ID_FLEX_NUM,
SIT.DATE_FROM,
SIT.DATE_TO;
Can anyone guide me?
I'm afraid that's not what MAX() does. MAX() is an aggregate function (though it can be used as a window [analytic] function), so when you get the MAX() of a particular column grouped by other columns, you will get distinct combinations of values for all those other columns.
I think you might want something like this:
SELECT targeted_incentive, analysis_criteria_id
, object_version_number, id_flex_num, date_from
, date_to, person_analysis_id
FROM (
SELECT sv.segment1 AS targeted_incentive
, sit.analysis_criteria_id
, sit.object_version_number
, st.id_flex_num
, sit.date_from
, sit.date_to
, sit.person_analysis_id
, RANK() OVER ( ORDER BY sit.person_analysis_id DESC ) rn
FROM fnd_id_flex_structures_tl sttl
, fnd_id_flex_structures st
, per_person_analyses sit
, per_analysis_criteria sv
WHERE sttl.id_flex_structure_name LIKE '%Tare%'
AND sttl.language = USERENV('LANG')
AND st.id_flex_code = sttl.id_flex_code
AND st.id_flex_num = sttl.id_flex_num
AND st.id_flex_num = sit.id_flex_num
AND st.id_flex_num = sv.id_flex_num
AND sit.date_to IS NULL
AND sit.analysis_criteria_id = sv.analysis_criteria_id
AND sit.person_id = ( SELECT person_id FROM abc
WHERE id = :AIN )
) WHERE rn = 1;
The RANK() window function will return the rank of each row ordered by the value of person_analysis_id in descending order. To get the maximum value, simply filter for rank = 1. Note that this will return more than one row in case of ties. If you want only one row, use ROW_NUMBER() in place of RANK().
Also note that I cleaned up the query a bit. You certainly don't need to use two % wildcards in a row in a LIKE, for example. You also definitely don't need the WHERE 1=1 condition.

How to find the average time difference between rows in a table?

I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be miles quicker for large tables, as it has no n-squared JOIN.
This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows minus 1.
Proof: the average gap between consecutive rows is the sum of the gaps between consecutive rows, divided by the number of consecutive pairs. But the sum of the gaps between consecutive rows is just the distance between the first row and the last row (assuming they are sorted by timestamp). And the number of consecutive pairs is the total number of rows minus 1.
Then we just condition the timestamps to be distinct.
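For example, with distinct timestamps at 10:00, 10:05 and 10:20, the consecutive gaps are 5 and 15 minutes, which average to 10 minutes; the shortcut gives the same result: (10:20 - 10:00) / (3 - 1) = 20 / 2 = 10 minutes.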
Are the IDs contiguous?
You could do something like:
SELECT
a.ID
, b.ID
, a.Timestamp
, b.Timestamp
, a.Timestamp - b.Timestamp as Difference
FROM
MyTable a
JOIN MyTable b
ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap that up in an AVG grouping...
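A hedged sketch of that wrapping step (MySQL syntax, reusing the placeholder MyTable and the contiguous-ID join from the snippet above, with TIMESTAMPDIFF so the gaps average cleanly in seconds):

SELECT AVG(diff.SecondsBetween) AS AvgSecondsBetween
FROM (
    SELECT TIMESTAMPDIFF(SECOND, b.Timestamp, a.Timestamp) AS SecondsBetween
    FROM MyTable a
    JOIN MyTable b
      ON a.ID = b.ID + 1              -- a is the row right after b
     AND a.Timestamp <> b.Timestamp
) diff;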
Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the IDs are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
select *
from table inbetween
where prev.datecol < inbetween.datecol
and inbetween.datecol < cur.datecol)
OLD POST but ....
The easiest way is to use the LAG function and TIMESTAMPDIFF:
SELECT
id,
TIMESTAMPDIFF(MINUTE, PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
SELECT
id,
TIMESTAMP,
LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
FROM TABLE_NAME
) t
Adapted for SQL Server from this discussion.
Essential columns used are:
cmis_load_date: A date/time stamp associated with each record.
extract_file: The full path to a file from which the record was loaded.
Comments:
There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber

Find closest numeric value in database

I need to find a select statement that will return either a record that matches my input exactly, or the closest match if an exact match is not found.
Here is my select statement so far.
SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY Area DESC
What I need to do is find the closest match to the 'Area' field, so if my input is 1.125 and the database contains 2, 1.5, 1 and .5 the query will return the record containing 1.
My SQL skills are very limited so any help would be appreciated.
Get the difference between Area and your input, take the absolute value so it is always positive, then order ascending and take the first one:
SELECT TOP 1 * FROM [myTable]
WHERE Name = 'Test' and Size = 2 and PType = 'p'
ORDER BY ABS( Area - @input )
Something horrible, along the lines of:
ORDER BY ABS( Area - 1.125 ) ASC LIMIT 1
Maybe?
If you have many rows that satisfy the equality predicates on Name, Size, and PType columns then you may want to include range predicates on the Area column in your query. If the Area column is indexed this could allow efficient index-based access.
The following query (written using Oracle syntax) uses one branch of a UNION ALL to find the record with minimal Area >= your target, while the other branch finds the record with maximal Area < your target. One of these two records will be the record that you are looking for. Then you can ORDER BY ABS(Area - ?input) to pick the winner out of those two candidates. Unfortunately the query is complex due to nested SELECTS that are needed to enforce the desired ROWNUM / ORDER BY precedence.
SELECT *
FROM
(SELECT * FROM
(SELECT * FROM
(SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p' AND Area >= ?target
ORDER BY Area)
WHERE ROWNUM < 2
UNION ALL
SELECT * FROM
(SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p' AND Area < ?target
ORDER BY Area DESC)
WHERE ROWNUM < 2)
ORDER BY ABS(Area - ?target))
WHERE rownum < 2
A good index for this query would be (Name, Size, PType, Area), in which case the expected query execution plan would be based on two index range scans that each returned a single row.
SELECT *
FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY ABS(Area - 1.125)
LIMIT 1
-- MarkusQ
How about ordering by the difference between your input and [Area], such as:
DECLARE @InputValue DECIMAL(7, 3)
SET @InputValue = 1.125
SELECT TOP 1 * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY ABS(@InputValue - Area)
Note that although ABS() is supported by pretty much everything, it's not technically standard (in SQL99 at least). If you must write ANSI standard SQL for some reason, you'd have to work around the problem with a CASE operator:
SELECT * FROM myTable
WHERE Name='Test' AND Size=2 AND PType='p'
ORDER BY CASE WHEN Area > 1.125 THEN Area - 1.125 ELSE 1.125 - Area END
If using MySQL:
SELECT * FROM myTable ... ORDER BY ABS(Area - SuppliedValue) LIMIT 1