SQL max multiple columns - sql

I am trying to display the maximum value of a specific metric, along with the timestamp at which that value was recorded. I have the command working properly, but if the metric is at its maximum for more than one time period, the query displays all of those timestamps. This gets cumbersome with multiple targets. Here is what I am using now:
select target_name, value, collection_timestamp
from (select target_name, value, collection_timestamp,
             max(value) over (partition by target_name) max_value
      from mgmt$metric_details
      where target_type = 'host' and metric_name = 'TotalDiskUsage'
        and column_label = 'Total Disk Utilized (%) (across all local filesystems)'
     )
where value = max_value;
I want to use the same kind of command (I am trying to avoid inner joins etc. because of limited bandwidth), but show only one max value/timestamp per target_name. Is there a way to work a group by or limit function into this without breaking it? I am somewhat unfamiliar with SQL, so this is all new territory.

Your query is so close. Instead of doing the max, do a row_number():
select target_name, value, collection_timestamp
from (select target_name, value, collection_timestamp,
             row_number() over (partition by target_name order by value desc) as seqnum
      from mgmt$metric_details
      where target_type = 'host' and metric_name = 'TotalDiskUsage'
        and column_label = 'Total Disk Utilized (%) (across all local filesystems)'
     )
where seqnum = 1
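This orders everything in the partition by value. You want the one largest value, so order by descending value and take the first in the sequence. If several rows tie for the maximum, row_number() still returns exactly one of them, chosen arbitrarily.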

Use the ROW_NUMBER() function instead of MAX(), with an appropriate ORDER BY in the window to resolve ties:
select target_name, value, collection_timestamp
from (select target_name, value, collection_timestamp,
             ROW_NUMBER() OVER (partition by target_name
                                ORDER BY value DESC,
                                         collection_timestamp DESC)
               AS rn
      from mgmt$metric_details
      where target_type = 'host' and metric_name = 'TotalDiskUsage'
        and column_label = 'Total Disk Utilized (%) (across all local filesystems)'
     )
where rn = 1 ;
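To see the tie-breaking behaviour on a small data set, here is a minimal sketch. It assumes SQLite 3.25+ (driven from Python's sqlite3 module) purely because it runs in memory, and it uses a hypothetical table metric_details with made-up rows in place of mgmt$metric_details. The window clause itself is the same as in the answers above.

```python
import sqlite3

# Hypothetical stand-in for mgmt$metric_details; table name and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric_details ("
    "target_name TEXT, value REAL, collection_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO metric_details VALUES (?, ?, ?)",
    [
        ("host_a", 91.0, "2013-01-01 00:00"),
        ("host_a", 97.5, "2013-01-02 00:00"),
        ("host_a", 97.5, "2013-01-03 00:00"),  # ties with the previous row on the maximum
        ("host_b", 40.0, "2013-01-01 00:00"),
        ("host_b", 55.0, "2013-01-02 00:00"),
    ],
)

query = """
SELECT target_name, value, collection_timestamp
FROM (
    SELECT target_name, value, collection_timestamp,
           ROW_NUMBER() OVER (
               PARTITION BY target_name
               ORDER BY value DESC, collection_timestamp DESC
           ) AS rn
    FROM metric_details
)
WHERE rn = 1
ORDER BY target_name
"""

# One row per target: the maximum value, and the latest timestamp among ties.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the second ORDER BY key, host_a resolves its tie to the most recent timestamp. Without it, either of the two tied rows could be returned.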

Related

Google Bigquery Memory error when using ROW_NUMBER() on large table - ways to replace long hash by short unique identifier

For a query in google BigQuery I want to replace a long hash by a shorter numeric unique identifier to save some memory afterwards, so I do:
SELECT
my_hash
, ROW_NUMBER() OVER (ORDER BY null) AS id_numeric
FROM hash_table_raw
GROUP BY my_hash
I don't even need an order in the id, but ROW_NUMBER() requires an ORDER BY.
When I try this on my dataset (> 1 billion rows) I get a memory error:
400 Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 126% of limit.
Top memory consumer(s):
sort operations used for analytic OVER() clauses: 99%
other/unattributed: 1%
Is there another way to replace a hash by a shorter identifier?
Thanks!
You don't actually need to populate the OVER clause for this.
For example, the following works:
select col, row_number() over() as row_num from (select 'A' as col)
So try that first.
Now, with the billion+ rows that you have: if the above fails, you can do something like this (given that order is not important to you), but you have to do it in parts:
SELECT
    my_hash
  , ROW_NUMBER() OVER () AS id_numeric
FROM hash_table_raw
WHERE MOD(my_hash, 5) = 0
GROUP BY my_hash
In each subsequent query, take max(id_numeric) from the previous run and add it as an offset:
SELECT
    my_hash
  , previous_max_id_numeric_val + ROW_NUMBER() OVER () AS id_numeric
FROM hash_table_raw
WHERE MOD(my_hash, 5) = 1
GROUP BY my_hash
Keep appending the outputs of these MOD queries (remainders 0 to 4) to a single new table.
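The chunk-and-offset idea can be sketched end to end as follows. This is only an illustration under stated assumptions: it runs on SQLite (3.25+) rather than BigQuery, it assumes my_hash is an integer so that a modulo can be taken directly, and the target table name hash_ids is hypothetical. Duplicate hashes are collapsed with DISTINCT before numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hash_table_raw (my_hash INTEGER)")
# Invented sample data, including duplicate hashes (10 and 13 appear twice).
conn.executemany(
    "INSERT INTO hash_table_raw VALUES (?)",
    [(h,) for h in [10, 11, 12, 13, 14, 15, 21, 10, 13]],
)
# Hypothetical output table that collects the results of every chunk.
conn.execute("CREATE TABLE hash_ids (my_hash INTEGER, id_numeric INTEGER)")

NUM_CHUNKS = 5
for chunk in range(NUM_CHUNKS):
    # Offset = highest id handed out so far (0 before the first chunk).
    offset = conn.execute(
        "SELECT COALESCE(MAX(id_numeric), 0) FROM hash_ids"
    ).fetchone()[0]
    # Number only the hashes in this chunk, shifted by the offset.
    conn.execute(
        """
        INSERT INTO hash_ids (my_hash, id_numeric)
        SELECT my_hash, ? + ROW_NUMBER() OVER ()
        FROM (SELECT DISTINCT my_hash
              FROM hash_table_raw
              WHERE my_hash % ? = ?)
        """,
        (offset, NUM_CHUNKS, chunk),
    )

ids = conn.execute("SELECT my_hash, id_numeric FROM hash_ids").fetchall()
print(sorted(ids, key=lambda pair: pair[1]))
```

Each chunk sorts and numbers only about a fifth of the distinct hashes, which is what keeps the memory needed for the window function down. The ids stay unique because each chunk starts numbering after the previous maximum.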

Count half of rest of a partition by from position

I'm trying to achieve the following results:
Right now, the group comes from
SUM(CASE WHEN seqnum <= (0.5 * seqnum_rev) THEN i.[P&L] END) OVER(PARTITION BY i.bracket_label ORDER BY i.event_id) AS [P&L 50%],
For each row, I need to count the rows from the current position to the end (seq_inv) and sum the P&L amounts for only half of that count, starting from the current position.
For example, when
seq = 2
seq_inv will be 13, and half of that is 6, so I need to sum the following 6 positions from seq = 2.
When seq = 4 there are 11 positions until the end (seq_inv = 11), so half is 5, and I want to sum 5 positions from seq = 4.
I hope this makes sense. I'm trying to come up with a rule that can adapt to my case, since the partition by is what gives me the numbers that need to be summed.
I also wondered whether something like a partition by top 50% could work, but I guess that doesn't exist.
I have the advantage that I've helped him before and have a little extra context.
That context is that this is just the later stage of a very long chain of common table expressions. That means self-joins and/or correlated sub-queries are unfortunately expensive.
Preferably, this should be answerable using window functions, as the data set is already available in the appropriate ordering and partitioning.
My reading is this...
The SUM(5:9) (meaning the sum of rows 5 to row 9, inclusive) is equal to SUM(5:end) - SUM(10:end)
That leads me to this...
WITH
  cumulative AS
(
  SELECT
    *,
    SUM([P&L]) OVER (PARTITION BY bracket_label ORDER BY event_id DESC) AS cumulative_p_and_l
  FROM
    data
)
SELECT
  *,
  cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/2, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_50_perc,
  cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/4, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_25_perc
FROM
  cumulative
NOTE: Using spaces, &, or % in column names is horrendous, don't do it ;)
EDIT: Corrected the ORDER BY in the cumulative sum.
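The identity behind this answer, SUM(i : i+k-1) = SUM(i : end) - SUM(i+k : end), can be checked with a few lines of plain arithmetic. The sketch below uses invented P&L values for a single partition and mirrors the two SQL steps in Python: a reverse running total, then a look-ahead of seq_inv/2 rows with a default of 0. It illustrates the arithmetic only and does not replace the query.

```python
from itertools import accumulate

# Invented P&L values for one bracket_label, already ordered by event_id.
p_and_l = [5, -2, 7, 1, -4, 3, 8, -6]
n = len(p_and_l)

# suffix[i] = sum of rows i..end, i.e. the running total taken in
# descending event_id order (the cumulative_p_and_l column).
suffix = list(accumulate(reversed(p_and_l)))[::-1]

results = []
for i in range(n):
    seq_inv = n - i      # rows from the current position to the end
    half = seq_inv // 2  # integer division, as seq_inv/2 behaves on integers
    # Equivalent of LEAD(cumulative_p_and_l, half, 0): the value `half`
    # rows ahead, or 0 when that row does not exist.
    lead = suffix[i + half] if i + half < n else 0
    results.append(suffix[i] - lead)

# Cross-check: sum the `half` rows starting at each position directly.
direct = [sum(p_and_l[i:i + (n - i) // 2]) for i in range(n)]
print(results)
print(direct)
```

Both lists come out identical, which is why a single cumulative sum plus LEAD is enough and no self-join is needed.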
I don't think that window functions can do what you want. You could use a correlated subquery instead, with the following logic:
select
    t.*,
    (
        select sum(t1.[P&L])
        from mytable t1
        where t1.seq - t.seq between 0 and t.seq_inv/2
    ) [P&L 50%]
from mytable t

How to join records by date range

I need to match scrap records in one table with records indicating the material that was running at the same time on a machine. I have a table with the scrap counts and a table with records showing whenever the material changed on a machine.
I have a working query of which I will include a simplified version below, but it is very slow when applied to a large data set. I would like to try one of Oracle's analytical functions to make it faster, but I can't figure out how. I tried FIRST_VALUE, and ROW_NUMBER in a few different forms, but I couldn't get them right. Looking for any suggestions.
Please let me know if you would like more details.
Following are simplified versions of the tables:
Scrap readings table (~41m rows)
Machine
ScrapReasonCode
ScrapQuantity
ReportTime
Material numbers (~3m rows)
Machine
MaterialNumber
MEASUREMENT_TIMESTAMP
SELECT Scrap.Machine,
       Scrap.MaterialNumber,
       Scrap.ScrapReasonCode,
       Scrap.ScrapQuantity,
       Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
  AND Materials.MEASUREMENT_TIMESTAMP =
      (SELECT MAX(M2.MEASUREMENT_TIMESTAMP)
       FROM Materials M2
       WHERE M2.Machine = Scrap.Machine
         AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
I think this is what you are trying to do. You can use the FIRST_VALUE window function.
SELECT DISTINCT
    s.Machine,
    s.MaterialNumber,
    s.ScrapReasonCode,
    s.ScrapQuantity,
    s.ReportTime,
    FIRST_VALUE(m.MEASUREMENT_TIMESTAMP) OVER (PARTITION BY s.Machine ORDER BY m.MEASUREMENT_TIMESTAMP DESC)
    -- or you can use the `MAX` window function too.
    -- MAX(m.MEASUREMENT_TIMESTAMP) OVER (PARTITION BY s.Machine)
FROM Scrap s
JOIN Materials m
  ON s.Machine = m.Machine AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
I may be misunderstanding your requirements but I believe the following query should work in terms of implementing using ROW_NUMBER:
SELECT q.*
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO,
           Scrap.MaterialNumber,
           Scrap.ScrapReasonCode,
           Scrap.ScrapQuantity,
           Scrap.ReportTime
    FROM Scrap, Materials
    WHERE Scrap.Machine = Materials.Machine
      AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime
) q
WHERE q.RNO = 1
Edit: if you need the measurement timestamp before (rather than on-or-before) the Scrap ReportTime, you could just change the <= sign to a < sign in the query above.
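Partitioning by Machine alone keeps one row per machine. If the goal is one material per scrap reading, the partition presumably needs to identify the reading itself. The sketch below shows that variant under several assumptions: it runs on SQLite 3.25+ instead of Oracle, all rows are invented, MaterialNumber is taken from Materials, and a reading is assumed to be identified by (Machine, ScrapReasonCode, ReportTime).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scrap (Machine TEXT, ScrapReasonCode TEXT,
                    ScrapQuantity INTEGER, ReportTime TEXT);
CREATE TABLE Materials (Machine TEXT, MaterialNumber TEXT,
                        MEASUREMENT_TIMESTAMP TEXT);
-- Invented sample rows: machine M1 changes material at 12:00.
INSERT INTO Materials VALUES
    ('M1', 'MAT-A', '2020-01-01 08:00'),
    ('M1', 'MAT-B', '2020-01-01 12:00'),
    ('M2', 'MAT-C', '2020-01-01 09:00');
INSERT INTO Scrap VALUES
    ('M1', 'R1', 5, '2020-01-01 10:00'),
    ('M1', 'R2', 3, '2020-01-01 13:00'),
    ('M2', 'R1', 7, '2020-01-01 09:30');
""")

query = """
SELECT Machine, MaterialNumber, ScrapReasonCode, ScrapQuantity, ReportTime
FROM (
    SELECT s.Machine, m.MaterialNumber, s.ScrapReasonCode,
           s.ScrapQuantity, s.ReportTime,
           -- Number the candidate material changes per scrap reading,
           -- most recent first.
           ROW_NUMBER() OVER (
               PARTITION BY s.Machine, s.ScrapReasonCode, s.ReportTime
               ORDER BY m.MEASUREMENT_TIMESTAMP DESC
           ) AS rno
    FROM Scrap s
    JOIN Materials m
      ON m.Machine = s.Machine
     AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
) q
WHERE rno = 1
ORDER BY Machine, ReportTime
"""

# Each scrap reading paired with the material that was running at ReportTime.
matched = conn.execute(query).fetchall()
for row in matched:
    print(row)
```

The join still pairs every reading with all earlier material changes on the same machine, so on tables of this size an index on (Machine, MEASUREMENT_TIMESTAMP) is likely to matter more than the choice between ROW_NUMBER and FIRST_VALUE.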

Split the results of a query in half

I'm trying to export rows from one database to Excel and I'm limited to 65000 rows at a shot. That tells me I'm working with an Access database but I'm not sure since this is a 3rd party application (MRI Netsource) with limited query ability. I've tried the options posted at this solution (Is there a way to split the results of a select query into two equal halfs?) but neither of them work -- in fact, they end up duplicating results rather than cutting them in half.
One possibly related issue is that this table does not have a unique ID field. Each record's unique ID can be dynamically formed by the concatenation of several text fields.
This produces 91934 results:
SELECT * from note
This produces 122731 results:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY notedate) AS rn FROM note
) T1
WHERE rn % 2 = 1
EDIT: Likewise, this produces 91934 results, half of them with a tile_nr value of 1, the other half with a value of 2:
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note
However this produces 122778 results, all of which have a tile_nr value of 1:
SELECT bldgid, leasid, notedate, ref1, ref2, tile_nr
FROM (
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note) x
WHERE x.tile_nr = 1
I know that I could just use a COUNT to get the exact number of records, run one query using TOP 65000 ORDER BY notedate, and then another that says TOP 26934 ORDER BY notedate DESC, for example, but as this dataset changes a lot I'd prefer some way to automate this to save time.

SQL ranking effective dates

There may be a very simple way to do this, but I can't quite think of it -- I have a dataset that returns a minimum job title and minimum effective date, then all effdts > than the min_effdt. In order to use this data in a charting program, I would like to rank each successive effdt if it exists, as in Min Role Effdt, then 2nd, 3rd, Max. Of course there could be anywhere from 2 to 20 jobs per person.
At first I considered trying a case statement, but I don't think that works when analyzing two columns at once. Is there a SQL statement that will allow ranking? Right now my data looks like
Employee Number | Min Base Role | Min Role Effdt | Base Role | Role Effdt
and comes from two tables, with the 2nd table brought in twice to get the Role / Effdt as Min, then All greater than Min.
I am using ORACLE. Code is below:
SELECT DISTINCT AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
AL4.HIRE_DATE,
AL4.DATE_OF_BIRTH,
AL4.AGE,
AL4.TERM_DATE,
AL4.ETHNIC_ORIGIN,
AL2.RECORDVALUE AS MIN_BASE_ROLE,
AL3.RECORDVALUE AS BASE_ROLE,
AL3.EFFECTIVE_START_DATE AS "ROLE EFFECTIVE DATE",
AL2.EFFECTIVE_START_DATE AS "MIN ROLE EFFDT"
FROM T1 AL2,
T2 AL3,
T3 AL4
WHERE AL4.PERSON_ID = AL2.PERSON_ID
AND AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL2.HISL_ID
AND AL4.EMPLOYEE_NUMBER = AL3.HISL_ID
AND AL2.RECORDTYPE = 'BASE_ROLE'
AND AL3.RECORDTYPE = 'BASE_ROLE'
AND AL2.EFFECTIVE_START_DATE = (SELECT MIN(A.EFFECTIVE_START_DATE) from T1 A where A.person_id = al2.person_id and a.recordtype = al2.recordtype)
AND AL3.EFFECTIVE_START_DATE > AL2.EFFECTIVE_START_DATE
AND (AL4.TERM_DATE >= '01-JAN-2012' or AL4.TERM_DATE is NULL)
order by AL4.EMPLOYEE_NUMBER
The function that you are looking for is row_number(). I think the expression you want is:
row_number() over (partition by AL4.EMPLOYEE_NUMBER
                   order by AL3.EFFECTIVE_START_DATE
                  ) as ranking
The function row_number() says "assign a sequential number to a group of rows". The partition by clause defines the group, where the numbering starts over again at 1. The order by clause specifies the ordering within the group.
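Similar functions rank() and dense_rank() might also be useful. They differ in how they handle duplicate values: row_number() gives tied rows distinct numbers, rank() gives them the same number and skips the numbers that follow, and dense_rank() gives them the same number without leaving gaps.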
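A small side-by-side comparison makes the difference between the three functions concrete. This is a sketch on SQLite 3.25+ with a hypothetical roles table and invented rows, where two roles share the same effective date. The row_number() window adds base_role as a second sort key only so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data; two roles start on the same date.
conn.execute(
    "CREATE TABLE roles ("
    "employee_number INTEGER, base_role TEXT, effective_start_date TEXT)"
)
conn.executemany(
    "INSERT INTO roles VALUES (?, ?, ?)",
    [
        (100, "Analyst", "2010-01-01"),
        (100, "Senior Analyst", "2011-06-01"),
        (100, "Team Lead", "2011-06-01"),
        (100, "Manager", "2013-03-01"),
    ],
)

query = """
SELECT base_role,
       ROW_NUMBER() OVER (PARTITION BY employee_number
                          ORDER BY effective_start_date, base_role) AS row_num,
       RANK()       OVER (PARTITION BY employee_number
                          ORDER BY effective_start_date) AS rnk,
       DENSE_RANK() OVER (PARTITION BY employee_number
                          ORDER BY effective_start_date) AS dense_rnk
FROM roles
ORDER BY effective_start_date, base_role
"""

ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

The two roles dated 2011-06-01 get row numbers 2 and 3 but share rank 2. The next role then gets rank 4 and dense rank 3.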