sql max for column based on value in row - sql

I am trying to get the maximum value in a column based on another value in a row.
I have the following code :
UPDATE LeweringVsSkattingResultaat
SET maks = ((SELECT max(persentklaarkultivar2) FROM
LeweringVsSkattingResultaat)
group by kultivar2)
I get the following error: Incorrect syntax near the keyword 'group'.
I want the maximum value in column persentklaarkultivar2 for each value in kultivar2.
Any help would be much appreciated.
Regards

Your subquery would generate an error if you have more than one value for kultivar2, because the group by returns a row for each kultivar2.
Although you can use a correlated subquery to fix the problem (see end of answer), I like to do this with updatable CTEs and window functions:
with toupdate as (
select r.*,
max(persentklaarkultivar2) over (partition by kultivar2) as maxval
from LeweringVsSkattingResultaat r
)
update toupdate
set maks = maxval;
I should note that with window functions, you can readily calculate the maximum whenever you want, so it is not necessary to store it. Window functions are optimized so they can take advantage of an index on LeweringVsSkattingResultaat(kultivar2, persentklaarkultivar2).
This is probably a better approach. You won't have to figure out how to keep the maks value up-to-date when rows are inserted, updated, or deleted from the table.
The correlated subquery would look like:
UPDATE r
SET maks = (SELECT max(r2.persentklaarkultivar2)
FROM LeweringVsSkattingResultaat r2
WHERE r2.kultivar2 = r.kultivar2
)
FROM LeweringVsSkattingResultaat r;
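The two ideas above (store the per-group maximum with a correlated subquery, or compute it on demand with a window function) can be tried out in a minimal sketch. It uses Python's sqlite3 module rather than SQL Server, so the UPDATE is written without the `UPDATE r ... FROM` form, and the sample rows are invented for illustration. Window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LeweringVsSkattingResultaat ("
    "kultivar2 TEXT, persentklaarkultivar2 REAL, maks REAL)"
)
# Hypothetical sample data: two cultivars with two rows each.
conn.executemany(
    "INSERT INTO LeweringVsSkattingResultaat "
    "(kultivar2, persentklaarkultivar2) VALUES (?, ?)",
    [("A", 10.0), ("A", 30.0), ("B", 5.0), ("B", 7.0)],
)

# Correlated subquery: each row receives the maximum of its own kultivar2 group.
conn.execute(
    """
    UPDATE LeweringVsSkattingResultaat
    SET maks = (SELECT MAX(r2.persentklaarkultivar2)
                FROM LeweringVsSkattingResultaat AS r2
                WHERE r2.kultivar2 = LeweringVsSkattingResultaat.kultivar2)
    """
)
stored = conn.execute(
    "SELECT kultivar2, persentklaarkultivar2, maks "
    "FROM LeweringVsSkattingResultaat "
    "ORDER BY kultivar2, persentklaarkultivar2"
).fetchall()

# Window function: the same value computed at query time, nothing stored.
computed = conn.execute(
    """
    SELECT kultivar2, persentklaarkultivar2,
           MAX(persentklaarkultivar2) OVER (PARTITION BY kultivar2) AS maxval
    FROM LeweringVsSkattingResultaat
    ORDER BY kultivar2, persentklaarkultivar2
    """
).fetchall()

print(stored)
print(computed)
```

Both queries should yield the same per-row maximum, which is why storing maks is optional.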

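Remove one pair of parentheses:
UPDATE LeweringVsSkattingResultaat
SET maks = ( SELECT max(persentklaarkultivar2) FROM
LeweringVsSkattingResultaat
group by kultivar2)
Otherwise your group by is outside the inner select. Note that this subquery still returns one row per kultivar2, so it will fail if the table holds more than one distinct kultivar2 value.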

Related

Redshift Query for comparing current row to previous row - SQL query to Redshift Query

I have a table test with fields A (ID) and B (Flag). I need to add a new column C (Result) to this table, and its value will be derived from the B (Flag) field. If the flag is false, keep checking previous rows until we find one where the flag is true, then take the value of the A (ID) field from that row and populate it in the C (Result) column. So C will hold the last value of A for which B was true.
I have the query in SQL but when I try to use it in Redshift I get following errors.
1st Query Option:
WITH
cte1 AS (
SELECT A, SUM(B='T') OVER (ORDER BY A) group_no
FROM test
),
cte2 AS (
SELECT A, MIN(A) OVER (PARTITION BY group_no) previous_T
FROM cte1
)
UPDATE test
JOIN cte2 USING (A)
SET test.C = cte2.previous_T;
I am getting errors in SUM and MIN function.
2nd Query Option:
UPDATE test
JOIN (
SELECT A,
@tmp := CASE WHEN B='T' THEN A ELSE @tmp END C
FROM test
JOIN (SELECT @tmp:=0) init
ORDER BY A
) data USING (A)
SET test.C = data.C;
Getting error in temporary table.
I am new to SQL with no experience in Redshift, appreciate any help I get. Thanks!
I only have a few min but I think I can get you started. Let’s stick to query #1.
SUM(B='T') isn't going to work in Redshift. Look at the function DECODE(), as it will allow you to switch how a column is generated based on another column.
It looks like you want to do a rolling SUM() so you will need a frame clause (likely “rows unbounded preceding”).
It’s not clear why you want SUM() of ids. I’d think you would want MAX() as this will give you the highest preceding id. Advise if I’m missing something.
I’d think you would want something like:
DECODE(B='T', true, A, MAX(A) OVER (order by A rows unbounded preceding)) as prev_id
But I don’t know your data or the process you are trying to implement.
As for the UPDATE I suggest you look at the Redshift docs. This will look more like:
…
update test set c = cte.prev_id
from cte
where test.id = cte.id;
This assumes the id is unique and that I have half a clue what you are trying to do.
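As a minimal sketch of the "carry the last flagged id forward" idea, the query below uses a CASE expression inside MAX() so that only rows with B = 'T' contribute to the running maximum. This is a swapped-in variant of the DECODE() suggestion, not the answerer's exact expression. It runs against SQLite through Python's sqlite3 module (window functions assume SQLite 3.25 or newer), and the sample rows are invented, so treat it as an illustration rather than tested Redshift code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (A INTEGER PRIMARY KEY, B TEXT)")
# Hypothetical sample data: flags 'T'/'F' keyed by an increasing id.
conn.executemany(
    "INSERT INTO test (A, B) VALUES (?, ?)",
    [(1, "T"), (2, "F"), (3, "F"), (4, "T"), (5, "F")],
)

# CASE yields A only for flagged rows (NULL otherwise); MAX ignores NULLs,
# so the running maximum is the most recent flagged id at or before each row.
rows = conn.execute(
    """
    SELECT A, B,
           MAX(CASE WHEN B = 'T' THEN A END)
               OVER (ORDER BY A
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_id
    FROM test
    ORDER BY A
    """
).fetchall()

for row in rows:
    print(row)
```

This relies on A increasing with row order, so the largest flagged A seen so far is also the most recent one.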

Trying to update temporary table in sql by accumulating variable

I have a temp table that contains 4 variables that I need to do a calculation to and then accumulate that field from record to record, similar to a cumulative sum. I have setup a RANK in the #TEMP_TABLE_1. Here is my code:
UPDATE T1
SET T1.TOTAL_NET_BAL = ISNULL((SELECT T2.TOTAL_NET_BAL
FROM #TEMP_TABLE_1 AS T2
WHERE T1.RANK - 1 = T2.RANK),0) + (T1.MAX_STD_CAPACITY + T1.MAX_QT_CAPACITY) - (T1.STANDARD_PANELS + T1.QUICKTURN_PANELS)
FROM #TEMP_TABLE_1 AS T1
All this is doing is updating the current row in the table for TOTAL_NET_BAL. For some reason, it is not grabbing the amount when we are on the next row.
You're looking for a cumulative sum. You can see how to do a cumulative sum at this link and adapt that to fit your need.
So what I ended up doing was changing my code to another select statement into another TEMP table using the SUM function. Still don't know why my original SQL statement up top wouldn't work but I am able to move on. Thanks.
In SQL Server, you can use an updatable CTE. If I understand correctly, you want a running sum:
with toupdate as (
select t.*,
sum((t.MAX_STD_CAPACITY + t.MAX_QT_CAPACITY) - (t.STANDARD_PANELS + t.QUICKTURN_PANELS)) over (order by t.RANK) as running_total
from #TEMP_TABLE_1 t
)
update toupdate
set TOTAL_NET_BAL = running_total;
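A likely reason the self-referencing UPDATE in the question does not accumulate is that a single UPDATE statement reads values as they were before the statement ran, so each row only ever sees the old TOTAL_NET_BAL of its predecessor. The running-sum form avoids that. Here is a minimal sketch of the running total using Python's sqlite3 module (window functions assume SQLite 3.25 or newer). The table name, the column name rnk (standing in for RANK) and the sample figures are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table_1 ("
    "rnk INTEGER, max_std_capacity INTEGER, max_qt_capacity INTEGER, "
    "standard_panels INTEGER, quickturn_panels INTEGER)"
)
# Hypothetical rows: per-row net balance is 30, -20 and 50 respectively.
conn.executemany(
    "INSERT INTO temp_table_1 VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 50, 80, 40), (2, 100, 50, 120, 50), (3, 100, 50, 90, 10)],
)

# SUM(...) OVER (ORDER BY rnk) accumulates the per-row net balance in rank order.
running = conn.execute(
    """
    SELECT rnk,
           SUM((max_std_capacity + max_qt_capacity)
               - (standard_panels + quickturn_panels))
               OVER (ORDER BY rnk) AS running_total
    FROM temp_table_1
    ORDER BY rnk
    """
).fetchall()

print(running)
```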

Get count and result from SQL query in Go

I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end so the database does not really know how many records you will get before you actually read all of them. The only way to count (in general case) is to go through all the records in the resultset.
But practically, you can force it to do so with a subquery such as
select *, (select count(*) from table) from table
and just ignore the second column for every record other than the first. But it is very crude and I do not recommend doing so.
Not sure if this is what you are asking for, but in SQL Server you can call the @@ROWCOUNT function to return the row count of the previous SELECT statement that was executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT @@ROWCOUNT
If you want the row count included in your result set, you can use the OVER clause (MSDN):
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also perhaps just separate the two SQL statements with a ;. This would return a result set for each statement executed.
You would use count(*):
SELECT count(distinct last)
FROM (XYZTable)
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28' AND
id = 90 ;

Need a row count after SELECT statement: what's the optimal SQL approach?

I'm trying to select a column from a single table (no joins) and I need the count of the number of rows, ideally before I begin retrieving the rows. I have come to two approaches that provide the information I need.
Approach 1:
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
Then
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
Or Approach 2
SELECT my_table.my_col, ( SELECT COUNT ( my_table.my_col )
FROM my_table
WHERE my_table.foo = 'bar' ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
I am doing this because my SQL driver (SQL Native Client 9.0) does not allow me to use SQLRowCount on a SELECT statement but I need to know the number of rows in my result in order to allocate an array before assigning information to it. The use of a dynamically allocated container is, unfortunately, not an option in this area of my program.
I am concerned that the following scenario might occur:
SELECT for count occurs
Another instruction occurs, adding or removing a row
SELECT for data occurs and suddenly the array is the wrong size.
In the worst case, this will attempt to write data beyond the array's limits and crash my program.
Does Approach 2 prevent this issue?
Also, will one of the two approaches be faster? If so, which?
Finally, is there a better approach that I should consider (perhaps a way to instruct the driver to return the number of rows in a SELECT result using SQLRowCount?)
For those that asked, I am using Native C++ with the aforementioned SQL driver (provided by Microsoft.)
If you're using SQL Server, after your query you can select the @@ROWCOUNT function (or, if your result set might have more than 2 billion rows, use the ROWCOUNT_BIG() function). This returns the number of rows selected by the previous statement, or the number of rows affected by an insert/update/delete statement.
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
SELECT @@ROWCOUNT
Or, if you want the row count included in the result set, similar to Approach #2, you can use the OVER clause.
SELECT my_table.my_col,
count(*) OVER(PARTITION BY my_table.foo) AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
Using the OVER clause will have much better performance than using a subquery to get the row count. Using @@ROWCOUNT will have the best performance, because there won't be any query cost for the SELECT @@ROWCOUNT statement.
Update in response to comment: The example I gave returns the number of rows in the partition, defined in this case by "PARTITION BY my_table.foo". The value of the column in each row is the number of rows with the same value of my_table.foo. Since your example query had the clause "WHERE my_table.foo = 'bar'", all rows in the result set will have the same value of my_table.foo, and therefore the value in the column will be the same for all rows and equal (in this case) to the number of rows in the query result.
Here is a better/simpler example of how to include a column in each row that is the total # of rows in the resultset. Simply remove the optional Partition By clause.
SELECT my_table.my_col, count(*) OVER() AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
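To illustrate how a client could use the count(*) OVER() column to size a fixed buffer from the first row, here is a minimal sketch using Python's sqlite3 module instead of the SQL Native Client driver from the question (window functions assume SQLite 3.25 or newer). The table contents are invented, and the preallocated Python list merely stands in for the fixed-size array.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_col INTEGER, foo TEXT)")
# Hypothetical sample data: three of the four rows match foo = 'bar'.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "bar"), (2, "baz"), (3, "bar"), (4, "bar")],
)

cursor = conn.execute(
    """
    SELECT my_col, COUNT(*) OVER () AS row_count
    FROM my_table
    WHERE foo = 'bar'
    ORDER BY my_col
    """
)

values = None
for i, (my_col, row_count) in enumerate(cursor):
    if values is None:
        # Every row carries the total, so the first row is enough to allocate.
        values = [None] * row_count
    values[i] = my_col
if values is None:
    values = []  # the query returned no rows

print(values)
```

Because the count and the data come from the same statement, they cannot disagree with each other.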
There are only two ways to be 100% certain that the COUNT(*) and the actual query will give consistent results:
Combine the COUNT(*) with the query, as in your Approach 2. I recommend the form you show in your example, not the correlated subquery form shown in the comment from kogus.
Use two queries, as in your Approach 1, after starting a transaction in SNAPSHOT or SERIALIZABLE isolation level.
Using one of those isolation levels is important because any other isolation level allows new rows created by other clients to become visible in your current transaction. Read the MSDN documentation on SET TRANSACTION ISOLATION for more details.
Approach 2 will always return a count that matches your result set.
I suggest you link the sub-query to your outer query though, to guarantee that the condition on your count matches the condition on the dataset.
SELECT
mt.my_row,
(SELECT COUNT(mt2.my_row) FROM my_table mt2 WHERE mt2.foo = mt.foo) as cnt
FROM my_table mt
WHERE mt.foo = 'bar';
If you're concerned the number of rows that meet the condition may change in the few milliseconds since execution of the query and retrieval of results, you could/should execute the queries inside a transaction:
BEGIN TRAN bogus
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
ROLLBACK TRAN bogus
This would return the correct values, always.
Furthermore, if you're using SQL Server, you can use @@ROWCOUNT to get the number of rows affected by the last statement and redirect the output of the real query to a temp table or table variable, so you can return everything together without needing a transaction:
DECLARE @dummy INT
SELECT my_table.my_col
INTO #temp_table
FROM my_table
WHERE my_table.foo = 'bar'
SET @dummy = @@ROWCOUNT
SELECT @dummy, * FROM #temp_table
Here are some ideas:
Go with Approach #1 and resize the array to hold additional results, or use a type that automatically resizes as necessary (you don't mention what language you are using, so I can't be more specific).
You could execute both statements in Approach #1 within a transaction to guarantee the counts are the same both times if your database supports this.
I'm not sure what you are doing with the data but if it is possible to process the results without storing all of them first this might be the best method.
If you are really concerned that your row count will change between the select count and the select statement, why not select your rows into a temp table first? That way, you know you will be in sync.
Why don't you put your results into a vector? That way you don't have to know the size beforehand.
You might want to think about a better pattern for dealing with data of this type.
No self-respecting SQL driver will tell you how many rows your query will return before returning the rows, because the answer might change (unless you use a transaction, which creates problems of its own).
The number of rows won't change - google for ACID and SQL.
IF (@@ROWCOUNT > 0)
BEGIN
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
END
Just to add this because this is the top result in google for this question.
In sqlite I used this to get the rowcount.
WITH temptable AS
(SELECT one,two
FROM
(SELECT one, two
FROM table3
WHERE dimension=0
UNION ALL SELECT one, two
FROM table2
WHERE dimension=0
UNION ALL SELECT one, two
FROM table1
WHERE dimension=0)
ORDER BY date DESC)
SELECT *
FROM temptable
LEFT JOIN
(SELECT count(*)/7 AS cnt,
0 AS bonus
FROM temptable) counter
WHERE 0 = counter.bonus

Can I get the position of a record in a SQL result table?

If I do something like
SELECT * FROM mytable ORDER BY mycolumn ASC;
I get a result table in a specific order.
Is there a way in SQL to efficiently find out, given a PK, what position in that result table would contain the record with my PK?
You can count the number of records where the value that you are sorting on has a lower value than the record that you know the key value of:
select count(*)
from mytable
where mycolumn < (select mycolumn from mytable where key = 42)
On databases that support it, you could use ROW_NUMBER() for this purpose:
SELECT RowNr
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr,
mycolumn
FROM mytable
) sub
WHERE sub.mycolumn = 42
The example assumes you're looking for primary key 42 :)
The subquery is necessary because something like:
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr
FROM mytable
WHERE mycolumn = 42
will always return 1, because ROW_NUMBER() is evaluated after the WHERE clause has filtered the rows.
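The difference between numbering inside a subquery and numbering after the filter can be checked with a minimal sketch. It uses Python's sqlite3 module (window functions assume SQLite 3.25 or newer). The id column as primary key and the sample rows are assumptions made for illustration. The COUNT(*) approach from the earlier answer is included for comparison, with 1 added to turn "rows before" into a 1-based position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn INTEGER)")
# Hypothetical sample data; ordered by mycolumn the ids come out as 7, 13, 42, 99.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(42, 30), (7, 10), (13, 20), (99, 40)],
)

# Number all rows first, then pick out the row of interest.
position = conn.execute(
    """
    SELECT RowNr
    FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr
          FROM mytable) AS sub
    WHERE sub.id = ?
    """,
    (42,),
).fetchone()[0]

# Filtering first leaves a single row, so its row number is 1.
filtered_first = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY mycolumn) FROM mytable WHERE id = ?",
    (42,),
).fetchone()[0]

# Counting the rows that sort before the target gives the same position.
counted = conn.execute(
    "SELECT COUNT(*) + 1 FROM mytable "
    "WHERE mycolumn < (SELECT mycolumn FROM mytable WHERE id = ?)",
    (42,),
).fetchone()[0]

print(position, filtered_first, counted)
```

The two approaches agree only while mycolumn has no duplicate values. With ties, ROW_NUMBER() breaks them arbitrarily unless the ORDER BY includes a tiebreaker.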
SQL doesn't work that way. It's set-based, which means that "position in that result table" is meaningless to the database.
You can keep track of position when you map the ResultSet into a collection of objects or when you iterate over it.
Unfortunately you cannot get "the position of a row in a table".
The best you can get, using ORDER BY and a variant of the ROW_NUMBER construct (depends on the database engine in use), is the position of a row in the resultset of the query executed.
This position does not map back to any position in the table, though, unless the ORDER BY is on a set of clustered index columns, but even then that position might be invalidated the next second.
What I would like to know is what you intended to use this "position" for.
This answer applies to MySQL
==> lower than 8.0
SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS num,
myColumn.first,
myColumn.second
FROM
myTable
ORDER BY myColumn.first, myColumn.second
source: http://www.mysqltutorial.org/mysql-row_number/
==> greater than 8.0
Please see the MySQL ROW_NUMBER() function manual, as I did not test it. But it seems this function is preferred.
There's no way you can tell that without selecting an entire subset of records. If your PK is of integer type, you can
select count(*) from mytable
where id <= 10 -- Record with ID 10
order by mycolumn asc