Why can't I use SELECT ... FOR UPDATE with aggregate functions? - sql

I have an application where I find a Sum() of a database column for a set of records and later use that sum in a separate query, similar to the following (made up tables, but the idea is the same):
SELECT Sum(cost)
INTO v_cost_total
FROM materials
WHERE material_id >=0
AND material_id <= 10;
[a little bit of interim work]
SELECT material_id, cost/v_cost_total
INTO v_material_id_collection, v_pct_collection
FROM materials
WHERE material_id >=0
AND material_id <= 10
FOR UPDATE;
However, in theory someone could update the cost column on the materials table between the two queries, in which case the calculated percents will be off.
Ideally, I would just use a FOR UPDATE clause on the first query, but when I try that, I get an error:
ORA-01786: FOR UPDATE of this query expression is not allowed
Now, the work-around isn't the problem: just do an extra query to lock the rows before finding the Sum(). But that query would serve no purpose other than locking the rows. While this particular example is not time-consuming, the extra query could cause a performance hit in some situations, and it's not as clean, so I'd like to avoid it.
Does anyone know of a particular reason why this is not allowed? In my head, the FOR UPDATE clause should just lock the rows that match the WHERE clause - I don't see why it matters what we are doing with those rows.
EDIT: It looks like SELECT ... FOR UPDATE can be used with analytic functions, as suggested by David Aldridge below. Here's the test script I used to prove this works.
SET serveroutput ON;

CREATE TABLE materials (
  material_id NUMBER(10,0),
  cost        NUMBER(10,2)
);
ALTER TABLE materials ADD PRIMARY KEY (material_id);
INSERT INTO materials VALUES (1,10);
INSERT INTO materials VALUES (2,30);
INSERT INTO materials VALUES (3,90);

<<LOCAL>>
DECLARE
  l_material_id materials.material_id%TYPE;
  l_cost        materials.cost%TYPE;
  l_total_cost  materials.cost%TYPE;
  CURSOR test IS
    SELECT material_id,
           cost,
           Sum(cost) OVER () total_cost
    FROM materials
    WHERE material_id BETWEEN 1 AND 3
    FOR UPDATE OF cost;
BEGIN
  OPEN test;
  FETCH test INTO l_material_id, l_cost, l_total_cost;
  Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
  FETCH test INTO l_material_id, l_cost, l_total_cost;
  Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
  FETCH test INTO l_material_id, l_cost, l_total_cost;
  Dbms_Output.put_line(l_material_id||' '||l_cost||' '||l_total_cost);
  CLOSE test;
END LOCAL;
/
Which gives the output:
1 10 130
2 30 130
3 90 130

The syntax SELECT ... FOR UPDATE locks records in a table to prepare for an update. When you do an aggregation, the result set no longer refers to the original rows.
In other words, there are no records in the database to update. There is just a temporary result set.
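For illustration, here is a minimal sketch of the work-around the question already mentions (an extra query whose only job is to lock the rows), using the question's made-up table. In Oracle, all rows identified by a FOR UPDATE cursor are locked when the cursor is opened, and the locks are held until COMMIT or ROLLBACK, not until the cursor is closed:

DECLARE
  CURSOR c_lock IS
    SELECT material_id
    FROM materials
    WHERE material_id BETWEEN 0 AND 10
    FOR UPDATE;
  v_cost_total materials.cost%TYPE;
BEGIN
  OPEN c_lock;   -- acquires the row locks
  CLOSE c_lock;  -- the locks remain held by the transaction

  SELECT Sum(cost)
  INTO v_cost_total
  FROM materials
  WHERE material_id BETWEEN 0 AND 10;
  -- [interim work and the second query run under the same locks]
END;
/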

You might try something like:
<<LOCAL>>
declare
  material_id materials.material_id%Type;
  cost        materials.cost%Type;
  total_cost  materials.cost%Type;
begin
  select material_id,
         cost,
         sum(cost) over () total_cost
  into   local.material_id,
         local.cost,
         local.total_cost
  from   materials
  where  material_id between 1 and 3
  for update of cost;
  ...
end local;
The first row gives you the total cost, but it selects all the rows and in theory they could be locked.
I don't know if this is allowed, mind you -- be interesting to hear whether it is.

For example, suppose there is a product table with id, name and stock columns, as shown below.
product table:

| id | name   | stock |
| -- | ------ | ----- |
| 1  | Apple  | 3     |
| 2  | Orange | 5     |
| 3  | Lemon  | 8     |
Then, both of the queries below (PostgreSQL) can combine sum() with SELECT ... FOR UPDATE:
SELECT sum(stock) FROM (SELECT * FROM product FOR UPDATE) AS result;
WITH result AS (SELECT * FROM product FOR UPDATE) SELECT sum(stock) FROM result;
Output:
sum
-----
16
(1 row)
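If the goal is the question's two-step pattern, the locking read and the follow-up work simply sit in one transaction, since the row locks are held until it ends. A minimal sketch, still assuming PostgreSQL:

BEGIN;
WITH result AS (SELECT * FROM product FOR UPDATE)
SELECT sum(stock) FROM result;
-- [interim work; the locks on product's rows are still held]
COMMIT;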

For that, you can use the WITH command.
Example:
WITH result AS (
  -- your select
)
SELECT * FROM result GROUP BY material_id;
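A filled-in sketch of the same idea against the question's materials table (an assumption here: a database such as PostgreSQL that accepts FOR UPDATE inside a WITH subquery; Oracle only allows FOR UPDATE on the outermost query):

WITH locked AS (
  SELECT material_id, cost
  FROM materials
  WHERE material_id BETWEEN 0 AND 10
  FOR UPDATE
)
SELECT material_id,
       cost / sum(cost) OVER () AS pct_of_total
FROM locked;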

Is your problem "However, in theory someone could update the cost column on the materials table between the two queries, in which case the calculated percents will be off."?
In that case , probably you can simply use a inner query as:
SELECT material_id, cost / (SELECT Sum(cost)
                            FROM materials
                            WHERE material_id >= 0
                              AND material_id <= 10)
INTO v_material_id_collection, v_pct_collection
FROM materials
WHERE material_id >= 0
  AND material_id <= 10;
Why do you want to lock the table at all? Other applications might fail if they try to update it during that time, right?

Related

Count number of rows returned in a SQL statement

Are there any DB engines that allow you to run an EXPLAIN (or other function) that gives an approximate count of the values that would be returned before an aggregation is run (not rows scanned, but rows that would actually be returned)? For example, in the following query:
SELECT gender, COUNT(1) FROM sales JOIN (
SELECT id, person FROM sales2 WHERE country='US'
GROUP BY person_id
) USING (id)
WHERE sales.age > 20
GROUP BY gender
Let's say this query returns 3 rows after being aggregated, but would return 170M rows if unaggregated.
Are there any tools where you can run the query to get this '170M' number, or does this have to do with complexity theory (or something similar), where it's almost just as expensive to run the query (without the final aggregation/having/sort/limit/etc) just to get the count? In other words, rewriting to:
SELECT COUNT(1) FROM sales JOIN (
SELECT id, person FROM sales2 WHERE country='US'
GROUP BY person_id
) USING (id)
WHERE sales.age > 20
But having to execute the query nonetheless.
As an example of using the current (mysql) explain to show how 'off' it is to get what I'm looking for:
explain select * from movies where title>'a';
# rows=147900
select count(1) from _tracktitle where title>'a';
# 144647 --> OK, pretty close
explain select * from movies where title>'u';
# rows=147900
select * from movies where title>'u';
# 11816 --> Not close at all
Assuming you can use MS SQL Server, you could tap into the same data the Optimiser is using for cardinality estimation: DBCC SHOW_STATISTICS (table, index) WITH HISTOGRAM
Part of the data you get back is a per-column histogram, which is essentially the number of rows for each value range found in the table.
You probably want to query the data programmatically, one way to achieve this would be to insert it into a temp table:
CREATE TABLE #histogram (
  RANGE_HI_KEY datetime PRIMARY KEY,
  RANGE_ROWS INT,
  EQ_ROWS INT,
  DISTINCT_RANGE_ROWS INT,
  AVG_RANGE_ROWS FLOAT
)
INSERT INTO #histogram
EXEC ('DBCC SHOW_STATISTICS (Users, CreationDate) WITH HISTOGRAM')

SELECT 'Estimate', SUM(RANGE_ROWS + EQ_ROWS)
FROM #histogram
WHERE RANGE_HI_KEY BETWEEN '2010-08-30 08:28:45.070' AND '2010-09-20 22:15:33.603'
UNION ALL
SELECT 'Actual', COUNT(1)
FROM Users u
WHERE u.CreationDate BETWEEN '2010-08-30 08:28:45.070' AND '2010-09-20 22:15:33.603'
For example, here is what this same query returns when run against the Stack Overflow database:

|          | Rows  |
| -------- | ----- |
| Estimate | 98092 |
| Actual   | 11715 |

That seems like a lot, but keep in mind that the whole table has almost 15 million records.
A note on precision and other gotchas:

- The maximum number of histogram steps is capped at 200, which is not a lot, so you are not getting a guaranteed 10% margin of error; but then, neither does SQL Server.
- As you insert data into the table, histograms may get stale, so your results would get skewed even more. There are different ways to update this data; some are reasonably quick, while others effectively require a full table scan (see the sketch after this list).
- Not all columns will have statistics. You can either create them manually, or (I believe) they get created automatically if you run a search with the column as a predicate.
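For instance, here is a minimal sketch of creating and refreshing statistics by hand, borrowing the movies/title example from the question as a hypothetical SQL Server table:

-- create single-column statistics manually
CREATE STATISTICS st_title ON movies (title);
-- refresh them later; WITH FULLSCAN is the slow-but-accurate variant
UPDATE STATISTICS movies st_title WITH FULLSCAN;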
MS SQL Server offers "execution plans". Write two queries against the same table, one returning all records and the other just the count, and press Ctrl-L to see the estimated plans.
The metric to look at is the estimated number of rows that will be scanned when each query is run. In this case, that number is the same regardless of whether you select count(*) or *, which is exactly the point.

SQL How to update every nth row which meets requirement

I have a table in which I would like to update one column on every nth row that meets a requirement.
My table has many columns, but the key is Object_Id (in case this is useful for creating a temp table).
The column I'm trying to update is online_status. At the larger scales I work with, I usually have 10 rows with the same time that all contain %Online% in them, and in total around 2000 rows with Online (and about another 2000 with Offline). I just need to update every 2-4 of those 10 repeating rows.
So here is what I tried. This pulls a list of every 3rd record that matches the Online criteria; I just need a way to update those rows, but I can't get past this:
SELECT * FROM (SELECT *, row_number() over() rn FROM people
               WHERE online_status LIKE '%Online%') foo
WHERE online_status LIKE '%Online%' AND foo.rn % 3 = 0
What I also tried is below; however, this updated every single row, not just the ones I needed.
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE people.Object_id IN
(SELECT *
FROM
(SELECT people.Object_id, row_number() over() rn FROM people
WHERE online_status LIKE '%Online%') foo WHERE people LIKE '%Online%' AND foo.rn % 3 =0);
Is there a way to take the list from the SELECT above and simply update it? Or to run a few scripts: one that stores the matching object IDs in something like a temp table, and a second that updates the main table where the object ID matches the temp table?
Thank you for any help :)
Select nothing but Object_id in the subquery used by WHERE Object_id IN (..):
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE Object_id IN
( SELECT Object_id
FROM
( SELECT p.Object_id, row_number() over() rn
FROM people p
WHERE p.online_status LIKE '%Online%') foo
WHERE foo.rn % 3 = 0
);
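One caveat: row_number() over () without an ORDER BY gives no guaranteed ordering, so "every 3rd row" is nondeterministic. If ordering by Object_id is an acceptable definition (an assumption, since the question does not say), the window can be made deterministic:

UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE Object_id IN
( SELECT Object_id
  FROM
  ( SELECT p.Object_id,
           row_number() over (ORDER BY p.Object_id) rn  -- deterministic numbering
    FROM people p
    WHERE p.online_status LIKE '%Online%') foo
  WHERE foo.rn % 3 = 0
);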

Multiplying fields from separate columns which have the same ID in SQL?

I have two tables which are joined by an ID...
table 1
- Assessment ID
- Module ID
- Assessment Weighting
table 2
- ID
- AssessmentID
- ModuleID
- UserID
- MarkFrom100
An assessment can have many students taking the assessment.
For example, a module has two assessments, one worth 60% and the other worth 40%. In table 2, I want to take the weighting value from table 1 and multiply it against the mark from 100.
SELECT * FROM Assessment, ModuleAssessmentUser WHERE
INNER JOIN moduleassementuser.assessmentID on Assessment.assessmentID
MULTIPLY AssessmentWeighting BY MarkFrom100 AS finalmark
UserID = 1
I know this is way off, but I really don't know how else to go about it.
My SQL knowledge is limited, so any help is appreciated!
You may use a SUM function in your query, which will sum all the data of a certain group in a subquery, which will then allow you to multiply the sum by the weight.
sub query:
SELECT ModuleID, UserID, SUM(MarkFrom100) as Total
FROM Table_2
GROUP BY ModuleID, UserID
Then use this subquery as a table in a main query:
SELECT T1.Assessment_ID, T1.ModuleID, Q1.UserID, (Q1.Total * T1.Assessment_Weighting) as FinalMark
FROM (SELECT ModuleID, UserID, SUM(MarkFrom100) as Total
      FROM Table_2
      GROUP BY ModuleID, UserID) AS Q1
INNER JOIN Table_1 as T1 ON T1.ModuleID = Q1.ModuleID
-- WHERE T1.ModuleID = 2 -- a particular module ID
;
Note that the WHERE clause is commented out. If you want all the data, leave it out; if you want a particular module, use it ^^
NOTE:
I don't have your database, so it may need some tweaks, but the main idea is there.
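Reading the question again, each assessment has its own weighting, so a sketch without the pre-aggregation may be closer to the goal: join each mark to its assessment's weighting and sum the weighted marks per user and module. The column names here are assumptions taken from the question's table lists:

SELECT T2.UserID,
       T2.ModuleID,
       SUM(T2.MarkFrom100 * T1.AssessmentWeighting / 100.0) AS FinalMark
FROM Table_1 AS T1
INNER JOIN Table_2 AS T2 ON T2.AssessmentID = T1.AssessmentID
GROUP BY T2.UserID, T2.ModuleID;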

how can I calculate the sum of my top n records in crystal report?

I'm using Report tab -> Group Sort Expert -> Top N to get the top N records, but I'm getting the sum of the values for all records in the report footer.
I want only the sum of the values of the top N records.
For example, when I select the top 3 records, the report still gives the sum of all records.
The Group Sort Expert (and the Record Sort Expert too) intervenes in your final result after the total summary is calculated. It is unable to filter and remove rows, in the same way that an ORDER BY clause in SQL cannot affect a SELECT's count result (that is a job for the WHERE clause). As a result, your summary will always be computed over all rows of your detail section and, of course, over all your group sums.
If you have in mind a specific way to exclude specific rows so that the appropriate sum appears, you can use the Select Expert of Crystal Reports to remove rows.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and send to the report only the top 3 group sums (then you can get what you want with a simple total summary of these 3 records).
Something like that:
CREATE TABLE #TEMP
(
  DEP_NAME varchar(50),
  MINVAL int,
  RMAVAL int,
  NETVAL int
)

INSERT INTO #TEMP
SELECT TOP 3
  T.DEP_NAME, T.MINVAL, T.RMAVAL, T.NETVAL
FROM
  (SELECT DEP_NAME AS DEP_NAME,
          SUM(MINVAL) AS MINVAL,
          SUM(RMAVAL) AS RMAVAL,
          SUM(NETVAL) AS NETVAL
   FROM YOURTABLE
   GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC

SELECT * FROM #TEMP
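If the report still needs a single figure, a plain total over the three rows in the temp table is enough; for example, assuming MINVAL is the column being summed:

SELECT SUM(MINVAL) AS Top3Total FROM #TEMP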

Approach to a Bin Packing sql problem

I have a problem in sql where I need to generate a packing list from a list of transactions.
Data Model
The transactions are stored in a table that contains:
transaction id
item id
item quantity
Each transaction can have multiple items (and coincidentally multiple rows with the same transaction id). Each item then has a quantity from 1 to N.
Business Problem
The business requires that we create a packing list, where each line item in the packing list contains the count of each item in the box.
Each box can only contain 160 items (they all happen to be the same size/weight). Based on the total count of the order, we need to split items into different boxes (sometimes even splitting an individual item's collection across two boxes).
So the challenge is to take that data schema and come up with the result set that includes how many of each item belong in each box.
I am currently brute forcing this in some not so pretty ways and wondering if anyone has an elegant/simple solution that I've overlooked.
Example In/Out
We really need to isolate how many of each item end up in each box...for example:
Order 1:
- 100 of item A
- 100 of item B
- 140 of item C

This should result in three rows in the result set:
- Box 1: A (100), B (60)
- Box 2: B (40), C (120)
- Box 3: C (20)
Ideally the query would be smart enough to put all of C together, but at this point - we're not too concerned with that.
How about something like
SELECT SUM([Item Quantity]) AS totalItems
     , SUM([Item Quantity]) / 160 AS totalBoxes
     , SUM([Item Quantity]) % 160 AS amountInLastBox -- % is T-SQL's modulo; there is no MOD() in SQL Server
FROM [Transactions]
GROUP BY [Transaction Id]
Let me know what fields in the resultset you're looking for and I could come up with a better one
I was looking for something similar, and all I could achieve was expanding the rows to the number of item counts in a transaction and grouping them into bins. Not very elegant though. Moreover, because string aggregation is still very cumbersome in SQL Server (Oracle, I miss you!), I have to leave the last part out, i.e. putting the counts on one single row.
My solution is as follows:
Example transactions table:
INSERT INTO transactions
(trans_id, item, cnt) VALUES
('1','A','50'),
('2','A','140'),
('3','B','100'),
('4','C','80');
GO
Create a dummy sequence table containing numbers from 1 to 1000 (I assume that the maximum quantity allowed for an item in a single transaction is 1000):
CREATE TABLE numseq (n INT NOT NULL IDENTITY) ;
GO
INSERT numseq DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 1000 INSERT numseq DEFAULT VALUES ;
GO
Now we can generate a result table from the transactions table, in which each transaction and item exist "cnt" times in a subquery; we then assign bin numbers using division, and group by bin number:
SELECT bin_id, item, count(*) count_in_bin
INTO result
FROM (
  SELECT t.item, ((row_number() over (order by t.item, s.n) - 1) / 160) + 1 as bin_id
  FROM transactions t
  INNER JOIN numseq s
    ON t.cnt >= s.n -- join conditionally to repeat transaction rows "cnt" times
) a
GROUP BY bin_id, item
ORDER BY bin_id, item
GO
GO
Result is:
bin_id item count_in_bin
1 A 160
2 A 30
2 B 100
2 C 30
3 C 50
In Oracle, the last step would be as simple as this:
SELECT bin_id, WM_CONCAT(item || '(' || count_in_bin || ')') contents
FROM result
GROUP BY bin_id
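And since the "cumbersome in SQL Server" part has improved since, here is a hedged equivalent for SQL Server 2017 or later (an assumption; older versions need the FOR XML PATH workaround):

SELECT bin_id,
       STRING_AGG(CONCAT(item, '(', count_in_bin, ')'), ', ') AS contents
FROM result
GROUP BY bin_id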
This isn't the prettiest answer but I am using a similar method to keep track of stock items through an order process, and it is easy to understand, and may lead to you developing a better method than I have.
I would create a table called "PackedItem" or something similar. The columns would be:
packed_item_id (int) - Primary Key, Identity column
trans_id (int)
item_id (int)
box_number (int)
Each record in this table represents 1 physical unit you will ship.
Let's say someone adds a line to transaction 4 with 20 units of item 12; I would add 20 records to the PackedItem table, all with that transaction ID and item ID, and a NULL box number. If a line is updated, you need to add or remove records from the PackedItem table so that there is always a 1:1 correlation.
When the time comes to ship, you can simply
SELECT TOP 160 * FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL
and set the box_number on those records to the next available box number, until no records remain where the box_number is NULL. This is possible using one fairly complicated UPDATE statement inside a WHILE loop - which I don't have the time to construct fully.
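A rough, untested sketch of that loop, assuming the PackedItem columns above (ordering by item_id is an assumption that keeps units of the same item together):

DECLARE @box INT = 1;
WHILE EXISTS (SELECT 1 FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL)
BEGIN
    -- claim up to 160 still-unboxed units for the current box
    UPDATE p
    SET box_number = @box
    FROM PackedItem p
    WHERE p.packed_item_id IN (
        SELECT TOP 160 packed_item_id
        FROM PackedItem
        WHERE trans_id = 4 AND box_number IS NULL
        ORDER BY item_id
    );
    SET @box = @box + 1;
END;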
You can now easily get your desired packing list by querying this table as follows:
SELECT box_number, item_id, COUNT(*) AS Qty
FROM PackedItem
WHERE trans_id = 4
GROUP BY box_number, item_id
Advantages - easy to understand, fairly easy to implement.
Pitfalls - if the table gets out of sync with the lines on the transaction, the final result can be wrong; the table will accumulate many records and creates extra work for the server, and each ID field will need to be indexed to keep performance good.