SELECT DISTINCT
MAX(pb.revision) OVER (PARTITION BY pb.batch) AS REVNO,
pb.lot, pb.batch
FROM
db1 ps
INNER JOIN
db2 pb ON ps.Title = pb.Lot
WHERE
pb.proposalsubmitted IS NOT NULL
AND PS.Contract LIKE 'CLS'
AND BATCH <> 6
GROUP BY
pb.lot, pb.batch, pb.revision
ORDER BY
pb.lot
Above is my code snippet. I'm trying to retrieve the max revision number per batch; there can be multiple batches per lot number. An added complication is that this database was set up in SharePoint at one time, so the batch column is actually a VARCHAR and not an integer. I'm not sure whether that is the issue, but every time I do a pull I end up with:
The actual batch numbers and corresponding revisions are here:
Thanks guys - it may be a fundamental issue on my part or a misunderstanding of OVER() PARTITION BY.
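For reference, a minimal sketch of one way this is often written, assuming SQL Server and that batch always holds numeric text (table and column names are taken from the snippet above; whether batch needs a CAST at all is an assumption):

SELECT
    pb.lot,
    pb.batch,
    MAX(pb.revision) AS REVNO          -- highest revision within each lot/batch
FROM db1 ps
INNER JOIN db2 pb
    ON ps.Title = pb.Lot
WHERE pb.proposalsubmitted IS NOT NULL
    AND ps.Contract LIKE 'CLS'
    AND CAST(pb.batch AS INT) <> 6     -- batch is stored as VARCHAR, so the cast is an assumption
GROUP BY pb.lot, pb.batch
ORDER BY pb.lot;

Grouping by revision (as in the snippet) keeps one row per revision, which is usually why DISTINCT plus the windowed MAX still returns multiple rows per batch; dropping revision from the GROUP BY and using a plain MAX avoids that. Note also that partitioning by batch alone would mix batches that share a number across different lots.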
I have a table containing two columns whose values are not unique. Those values are generated automatically and I have no way to do anything about it: I cannot edit the table or the database, nor create custom functions.
With that in mind, I've solved this problem in SQL Server, but my solution uses some functions that do not exist in MS Access.
The columns are Volume and ComponentID; here is my SQL Server code:
with rows as (
select row_number() over (order by volume) as rownum, volume
from test where componentid = 'S3')
select top 10
rowsMinusOne.volume, coalesce(rowsMinusOne.volume - rows.volume,0) as diff
from rows as rowsMinusOne
left outer join rows
on rows.rownum = rowsMinusOne.rownum - 1
Sample data:
58.29168
70.57396
85.67902
97.04888
107.7026
108.2022
108.3975
108.5777
109
109.8944
Expected results:
Volume      diff
58.29168    0
70.57396    12.28228
85.67902    15.10506
97.04888    11.36986
107.7026    10.65368
108.2022    0.4996719
108.3975    0.1952896
108.5777    0.1801834
109         0.4223404
109.8944    0.89431
I have solved the coalesce part by replacing it with Nz. I tried to use DCount to replace row_number (How to show the record number in a MS Access report table?) but I receive an error that it cannot find the function (I am reading the data from code; that is the only thing I can do).
I also tried this, but as that answer says, I need a column with a unique value, which I do not have and cannot create: Microsoft Access query to duplicate ROW_NUMBER
Consider:
SELECT TOP 10 Table1.ComponentID,
DCount("*","Table1","ComponentID = 'S3' AND Volume<" & [Volume])+1 AS Seq, Table1.Volume,
Nz(Table1.Volume -
(SELECT Top 1 Dup.Volume FROM Table1 AS Dup
WHERE Dup.ComponentID = Table1.ComponentID AND Dup.Volume<Table1.Volume
ORDER BY Volume DESC),0) AS Diff
FROM Table1
WHERE (((Table1.ComponentID)="S3"))
ORDER BY Table1.Volume;
This will likely perform very slowly with a large dataset.
Alternative solutions:
build a query that calculates the difference, use that query as the source for a report, and use a textbox's RunningSum property to calculate the sequence number
loop through a recordset in VBA and save the results to a 'temp' table
export to Excel
I'm new to Access and I'm trying to develop my own Inventory Management Database.
I'm trying to make a query that could display a running total of the Inventory on Hand as of a specific date. This is how my table looks:
It's sorted by ITEM_ID then TRANDATE in ascending order. I'd like to add a calculated field beside the NET field that shows a running total for the specific ITEM_ID as of a specific date. Negative numbers in the NET field represent a sale while positive ones represent a purchase. I tried using the DSum function, as it is widely recommended for creating a running sum field. My expression is this:
DSum([NET],"InvtyTransT", "[ITEM_ID]=" & [ITEM_ID] And "[TRANDATE]<=#" & [TRANDATE] & "#"). But it only shows the total of the NET field (6827) in each record like this:
What I need is like this:
(I used an IF function in Excel to compute this.)
Please help. I think I might have missed something in my expression. I've tried revising it several times and it always gives me the same wrong answer in every record.
Thanks in advance.
Try a correlated sub-query.
SELECT t.*,
       (SELECT SUM(t2.NET)
        FROM InvtyTransT AS t2
        WHERE t2.TRANDATE <= t.TRANDATE
          AND t2.ITEM_ID = t.ITEM_ID
       ) AS rSUM
FROM InvtyTransT AS t;
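As a side note on the original DSum attempt: the And sits outside the criteria string, so the two conditions never reach DSum as a single filter, and the expression argument is normally passed as a string as well. A hedged sketch of a corrected expression, assuming ITEM_ID is numeric (if it is text it would need quote delimiters) and TRANDATE is a Date/Time field:

DSum("[NET]", "InvtyTransT", "[ITEM_ID]=" & [ITEM_ID] & " AND [TRANDATE]<=#" & [TRANDATE] & "#")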
I have the following MySQL table:
measuredata:
- ID (bigint)
- timestamp
- entityid
- value (double)
The table contains more than 1 billion entries. I want to be able to visualize any time window; the window can range from "one day" to "many years". There are measurement values roughly every minute in the DB.
So the number of entries for a time window can vary widely, say from a few hundred to several thousand or millions.
Those values are meant to be visualized in a chart on a webpage.
If the chart is, let's say, 800px wide, it does not make sense to fetch thousands of rows from the database when the time window is large; I cannot show more than 800 values on that chart anyhow.
So, is there a way to reduce the result set directly on the DB side?
I know "average" and "sum" etc. as aggregate functions, but how can I, for example, aggregate 100k rows from a big time window down to, let's say, 800 final rows?
Just fetching those 100k rows and letting the chart do the magic is not the preferred option; transfer size is one reason why.
Isn't there something on the DB side I can use?
Something like avg() to shrink X rows to Y averaged rows?
Or some simple trick to just skip every Nth row to shrink X to Y?
update:
Although I'm using MySQL right now, I'm not tied to it. If PostgreSQL, for instance, provides a feature that could solve the issue, I'm willing to switch DBs.
update2:
I may have found a possible solution: https://mike.depalatis.net/blog/postgres-time-series-database.html
See section "Data aggregation".
The key is not to use a Unix timestamp but a date, "trunc" it, average the values, and group by the truncated date. That could work for me, but it would require reworking my table structure. Hmm... maybe there's more... still researching...
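For reference, a minimal sketch of that aggregation pattern in PostgreSQL, assuming the timestamp column were stored as (or converted to) a real timestamp; the table and column names follow the structure above, and the entity filter value is only illustrative:

SELECT date_trunc('day', to_timestamp("timestamp")) AS bucket,   -- one bucket per day
       avg(value) AS avg_value                                   -- averaged measurement per bucket
FROM measuredata
WHERE entityid = 38
GROUP BY bucket
ORDER BY bucket;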
update3:
Inspired by update 2, I came up with this query:
SELECT (`timestamp` - (`timestamp` % 86400)) AS aggtimestamp, `entity`, `value`
FROM `measuredata`
WHERE `entity` = 38
  AND `timestamp` > UNIX_TIMESTAMP('2019-01-25')
GROUP BY aggtimestamp
It works, but my DB/index/structure doesn't seem optimized for this: a query over the last year took ~75 sec (slow test machine) and finally returned only one value per day. This can be combined with avg(value), but that further increases the query time (~82 sec); a sketch of that combination is below. I will see whether this can be optimized further, but I now have an idea of how "downsampling" data works, especially aggregation in combination with GROUP BY.
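A minimal sketch of what that avg(value) combination might look like, mirroring the query above (still MySQL; the extra GROUP BY column is an assumption):

SELECT (`timestamp` - (`timestamp` % 86400)) AS aggtimestamp,   -- start of the day bucket
       `entity`,
       AVG(`value`) AS avg_value                                -- one averaged value per day
FROM `measuredata`
WHERE `entity` = 38
  AND `timestamp` > UNIX_TIMESTAMP('2019-01-25')
GROUP BY aggtimestamp, `entity`
ORDER BY aggtimestamp;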
There is probably no efficient way to do this. But, if you want, you can break the rows into equal sized groups and then fetch, say, the first row from each group. Here is one method:
select md.*
from (select md.*,
row_number() over (partition by tile order by timestamp) as seqnum
from (select md.*, ntile(800) over (order by timestamp) as tile
from measuredata md
where . . . -- your filtering conditions here
) md
) md
where seqnum = 1;
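If averaged buckets are preferred over picking the first row of each group (closer to the avg() idea in the question), a hedged variant of the same approach, assuming MySQL 8.0+ for the window functions:

select tile,
       min(`timestamp`) as bucket_start,   -- representative timestamp for the bucket
       avg(`value`)     as avg_value       -- average of all rows that fell into the bucket
from (select md.`timestamp`, md.`value`,
             ntile(800) over (order by md.`timestamp`) as tile
      from measuredata md
      where . . .   -- your filtering conditions here
     ) md
group by tile
order by bucket_start;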
I have been struggling with a question that seems simple, yet eludes me.
I am dealing with the public BigQuery table on Bitcoin and I would like to extract the first transaction of each block that was mined. In other words, I want to replace a nested field with its first row, as it appears in the table preview. There is no field that identifies it, only the order in which it was stored in the table.
I ran the following query:
#StandardSQL
SELECT timestamp,
block_id,
FIRST_VALUE(transactions) OVER (ORDER BY (SELECT 1))
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
But it processes 492 GB when run and throws the following error:
Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Sort operator used for OVER(ORDER BY) used too much memory..
It seems so simple that I must be missing something. Do you have an idea of how to handle such a task?
#standardSQL
SELECT * EXCEPT(transactions),
(SELECT transaction FROM UNNEST(transactions) transaction LIMIT 1) transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
Recommendation: while playing with a large table like this one, I would recommend creating a smaller version of it, so it incurs less cost for your dev/test. The query below can help with this; you can run it in the BigQuery UI with a destination table that you will then use for your dev work. Make sure you set Allow Large Results and unset Flatten Results so you preserve the original schema.
#legacySQL
SELECT *
FROM [bigquery-public-data:bitcoin_blockchain.blocks#1529518619028]
The value of 1529518619028 is taken from the query below (at the time of running); the reason I went four days back is that I know the number of rows in this table at that time was just 912, versus the current 528,858.
#legacySQL
SELECT INTEGER(DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -24*4, 'HOUR')/1000)
An alternative approach to Mikhail's: Just ask for the first row of an array with [OFFSET(0)]:
#StandardSQL
SELECT timestamp,
block_id,
transactions[OFFSET(0)] first_transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 10
That first row from the array still has some nested data that you might want to flatten to only its first rows too:
#standardSQL
SELECT timestamp
, block_id
, transactions[OFFSET(0)].transaction_id first_transaction_id
, transactions[OFFSET(0)].inputs[OFFSET(0)] first_transaction_first_input
, transactions[OFFSET(0)].outputs[OFFSET(0)] first_transaction_first_output
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 1000
I'm running SQL queries in Orion Report Writer for Solarwinds Netflow Traffic Analyzer and am trying to add up data usage for specific conversations coming from the same general sources, in this case Netflix. I've made some progress with my query.
SELECT TOP 10000 FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
SUM(NetflowConversationSummary.TotalBytes) AS Total_Bytes
FROM
    ((NetflowConversationSummary
        LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Source_FlowCorrelation
            ON (NetflowConversationSummary.SourceIPSort = FlowCorrelation_Source_FlowCorrelation.IPAddressSort))
        LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Dest_FlowCorrelation
            ON (NetflowConversationSummary.DestIPSort = FlowCorrelation_Dest_FlowCorrelation.IPAddressSort))
    INNER JOIN Nodes
        ON (NetflowConversationSummary.NodeID = Nodes.NodeID)
WHERE
( DateTime BETWEEN 41539 AND 41570 )
AND
(
(FlowCorrelation_Source_FlowCorrelation.FullHostname LIKE 'ipv4_1.lagg0%')
)
GROUP BY FlowCorrelation_Source_FlowCorrelation.FullHostname, FlowCorrelation_Dest_FlowCorrelation.FullHostname, Nodes.Caption, Nodes.NodeID, FlowCorrelation_Source_FlowCorrelation.IPAddress
So I've got an output that filters everything but Netflix sessions (Full_Hostname_A) and their total usage for each session (Sum_Of_Bytes_Transferred).
I want to add up Sum_Of_Bytes_Transferred to get the total usage for all Netflix sessions listed, which should go into Total_Bytes. I created the Total_Bytes column, but I don't know how to output a total to it.
For those who asked for clarification, here is the output from the above query:
I want the Total_Bytes column to be added up into one number.
I have no familiarity with the reporting tool you are using.
From reading your post, I'm thinking you want the first 2 columns of data that you've got, plus, at a later point in the report, a single figure: the sum of the Total_Bytes column you're already producing.
Your reporting tool probably has some means of totalling a column, but you may need to ask the reporting tool's support people how to do that.
Aside from this, if you can find a way of calling a separate query in a later section of the report, or if you can embed a new report inside your existing report after the detail section and use that to run a separate query, then you should be able to get the data you want with this:
SELECT Sum(Total_Bytes) as [Total Total Bytes]
FROM ( yourExistingQuery ) x
yourExistingQuery means the query you've already got, in full (it doesn't have to be put on one line); the parentheses are required, and so is the "x". (The latter provides a syntax-required name for the virtual table which your query defines.)
Hope this helps.
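One more option, if the underlying database supports window functions (SQL Server 2005+ does; this is a different technique than the wrapping query above, so treat it as a hedged sketch): compute the grand total as an extra column on every row with SUM() OVER (), and let the report show it once.

SELECT TOP 10000
    FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
    SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
    SUM(SUM(NetflowConversationSummary.TotalBytes)) OVER () AS Total_Bytes  -- grand total, repeated on every row
FROM ...   -- same joins, WHERE and GROUP BY as the original query above

Note that the windowed total is computed over all grouped rows, not just the TOP 10000 that end up in the output.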