Coset table calculation failed -- trying with bigger table limit

I tried with the following code snippet in GAP:
f:=FreeGroup("P","Q");
g:=f/ParseRelators(f, "P^2 = Q^3=1");
Size(g);
But it gets stuck in the following step:
gap> Size(g);
#I Coset table calculation failed -- trying with bigger table limit
#I Coset table calculation failed -- trying with bigger table limit
#I Coset table calculation failed -- trying with bigger table limit
#I Coset table calculation failed -- trying with bigger table limit
When I interrupt the command with Ctrl-C and then run the same command again in the break loop, the following result is returned:
brk> Size(g);
16384000
I am a little confused that such a simple set of relators leads to such a complex group.
Regards,
HZ

Related

Why does changing a literal in the WHERE clause to a variable cause the query to be 4 times slower?

I am inserting data from the table "Tags" in the "Recovery" database into another table "Tags" in the "R3" database.
Both databases live on the same SQL Server instance on my laptop.
I have built the insert query, and because the Recovery..Tags table holds around 180M records I decided to break it into smaller subsets (1 million records at a time).
Here is my query (let's call it Query A):
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between 13000001 and 14000000
It takes around 2 minutes, which is OK.
To make things a bit easier for me, I put the iID range in the WHERE clause into a variable, so my query looks like this (let's call it Query B):
declare @i int = 12
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between (1000000 * @i) + 1 and (@i+1)*1000000
But that causes the insert to become very slow (around 10 minutes).
So I tried Query A again, and it took around 2 minutes.
Then I tried Query B again and it took around 8 minutes!
I am attaching the execution plan for each one (at a site that shows an analysis of the query plan): Query A Plan and Query B Plan.
Any idea why this is happening, and how to fix it?
The big difference in time is due to the very different plans that are being created to join Tags and Reps.
Fundamentally, in version A the optimizer knows how much data is being extracted (a million rows) and it can design an efficient plan for that. However, because you are using a variable in B to define how much data is being read, it has to build a more generic plan - one that would work for 10 rows, a million rows, or a hundred million rows.
In the plans, the relevant sections are the ones joining Tags and Reps (the original answer shows plan screenshots for A and B at this point).
Note that in A it takes just over a minute to do the join; in B it takes 6 and a half minutes.
The key thing that appears to take the time is that it does a table scan of the Tags table which takes 5:44 to complete. The plan has this as a table scan, as the next time you run the query you may want many more than 1 million rows.
A secondary issue is that the amount of data it reads (or expects to read) from Reps is also way out of whack. In A it expected to read 2 million rows and read 1421; in B it basically read them all (even though technically it probably only needed the same 1421).
I think you have two main approaches to fix this:
Look at indexing, to remove the table scan on Tags - ensure the indexes match what is needed and allow the query to do a scan on that index (it appears that the index at the top of @MikePetri's answer is what you need, or similar). This way, instead of doing a table scan, it can do an index scan which can start 'in the middle' of the data set (a table scan must start at either the start or end of the data set).
Separate this into two processes (sketched below). The first process gets the relevant million rows from Tags and saves them in a temporary table. The second process uses the data in the temporary table to join to Reps (also try using option (recompile) in the second query, so that it checks the temporary table's size before creating the plan).
You can even put an index or two (and/or a primary key) on that temporary table to make it better for the next step.
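A rough sketch of that second approach (my illustration, not code from the answer; #TagsBatch is a hypothetical temp table name):
DECLARE @i int = 12;
-- Step 1: pull the relevant million rows from Tags into a temp table.
SELECT T.*
INTO #TagsBatch
FROM Recovery..tags AS T
WHERE T.iID BETWEEN (1000000 * @i) + 1 AND (@i + 1) * 1000000;
CREATE CLUSTERED INDEX CIX_TagsBatch_RepID ON #TagsBatch (RepID);
-- Step 2: join the small temp table to Reps; OPTION (RECOMPILE) lets the optimizer
-- see the temp table's actual row count before building the plan.
INSERT INTO R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM #TagsBatch AS T
INNER JOIN Recovery..Reps AS R ON T.RepID = R.RepID
OPTION (RECOMPILE);
DROP TABLE #TagsBatch;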
The reason the first query is so much faster is that it went parallel. This means the cardinality estimator knew enough about the data it had to handle, and the query was large enough to tip the threshold for parallel execution. The engine then passed chunks of data to different processors to handle individually, which report back and repartition the streams.
With the value as a variable, it effectively becomes a scalar function evaluation, and a query cannot go parallel with a scalar function, because the value has to be determined before the cardinality estimator can figure out what to do with it. Therefore, it runs in a single thread, and is slower.
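As an aside (my addition, not part of this answer): one common way to let the optimizer compile the statement for the actual value of @i is OPTION (RECOMPILE); a minimal sketch against Query B from the question:
DECLARE @i int = 12;
INSERT INTO R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T
INNER JOIN Recovery..Reps R ON T.RepID = R.RepID
WHERE T.iID BETWEEN (1000000 * @i) + 1 AND (@i + 1) * 1000000
OPTION (RECOMPILE); -- the statement is compiled for the current value of @i, as if it were a literal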
Some sort of looping mechanism might help. Create the included indexes to assist the engine in handling this request. You can probably find a better looping mechanism, since you are familiar with the identity ranges you care about, but this should get you in the right direction. Adjust for your needs.
With a loop like this, it commits the changes with each loop, so you aren't locking the table indefinitely.
USE Recovery;
GO
CREATE INDEX NCI_iID
ON Tags (iID)
INCLUDE (
DT
,RepID
,tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,value
,Deleted
,sKey
);
GO
CREATE INDEX NCI_RepID ON Reps (RepID) INCLUDE (RepType);
USE R3;
GO
CREATE INDEX NCI_iID ON Tags (iID);
GO
DECLARE @RowsToProcess BIGINT
,@StepIncrement INT = 1000000;
SELECT @RowsToProcess = (
SELECT COUNT(1)
FROM Recovery..tags AS T
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
);
WHILE @RowsToProcess > 0
BEGIN
INSERT INTO R3..Tags
(
iID
,DT
,RepID
,Tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,Value
,Deleted
,sKey
,RepType
)
SELECT TOP (@StepIncrement)
T.iID
,T.DT
,T.RepID
,T.Tag
,T.xmiID
,T.iBegin
,T.iEnd
,T.Confidence
,T.Polarity
,T.Uncertainty
,T.Conditional
,T.Generic
,T.HistoryOf
,T.CodingScheme
,T.Code
,T.CUI
,T.TUI
,T.PreferredText
,T.ValueBegin
,T.ValueEnd
,T.Value
,T.Deleted
,T.sKey
,R.RepType
FROM Recovery..tags AS T
INNER JOIN Recovery..Reps AS R ON T.RepID = R.RepID
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
ORDER BY
T.iID;
SET @RowsToProcess = @RowsToProcess - @StepIncrement;
END;

SAP HANA | BULK INSERT in SAP HANA

We have SAP HANA 1.0 SP11. We have a requirement to calculate current stock at store and material level on a daily basis. The number of rows expected is around 250 million.
Currently we use a procedure for this. The flow of the procedure is as follows:
begin
declare v_cnt bigint;
declare v_loop integer;
declare count bigint := 0; -- running offset into t_rst
t_rst = select * from <LOGIC of deriving current stock on tables MARD,MARC,MBEW>;
select count(*) into v_cnt from :t_rst;
v_loop := :v_cnt / 2500000;
FOR X in 0 .. :v_loop DO
INSERT INTO CRRENT_STOCK_TABLE
SELECT * FROM :t_rst LIMIT 2500000 OFFSET :count;
COMMIT;
count := :count + 2500000;
END FOR;
end;
The row count of the result set t_rst is around 250 million.
The total execution time of the procedure is around 2.5 hours. A few times the procedure has gone into a long-running state, resulting in an error. We run this procedure during non-peak business hours, so the load on the system is almost nothing.
Is there a way we can load data into the target table in parallel threads and reduce the loading time? Also, is there a way to bulk insert efficiently in HANA?
The query for t_rst fetches the first 1000 rows in 5 minutes.
As Lars mentioned, the total resource usage will not change much.
But if you have limited time (non-peak hours) and the system configuration can meet the requirements of parallel execution, maybe you can try using:
BEGIN PARALLEL EXECUTION
<stmt>
END;
Please refer to the reference documentation.
After you calculate the v_loop value, you know how many times you have to run the following INSERT command:
INSERT INTO CRRENT_STOCK_TABLE
SELECT * FROM :t_rst LIMIT 2500000 OFFSET :count;
I'm not sure how to convert the above code into a dynamic calculation for PARALLEL EXECUTION.
But you can assume, say, 10 parallel processes and run that many INSERT commands, modifying the OFFSET clause according to the calculated values.
The ones that exceed the row count will simply run for zero rows, which will not harm the overall process.
As a response to @LarsBr.: as he mentioned, there are limitations that will prevent parallel execution.
Restrictions and Limitations
The following restrictions apply:
Modification of tables with a foreign key or triggers are not allowed
Updating the same table in different statements is not allowed
Only concurrent reads on one table are allowed. Implicit SELECT and SELECT INTO scalar variable statements are supported.
Calling procedures containing dynamic SQL (for example, EXEC, EXECUTE IMMEDIATE) is not supported in parallel blocks
Mixing read-only procedure calls and read-write procedure calls in a parallel block is not allowed.
These limitations mean that inserting into the same table will not be possible from different parallel statements, and dynamic SQL cannot be used either.

T-SQL ways to avoid potentially updating the same row based on subquery results

I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource intensive, but it can take a while with all the parsing and SMTP overhead, etc.
To speed things up I can easily run multiple instances of the mailer process over multiple servers, but I worry that if two were to start at almost the same time they might still overlap a bit and send the same records.
Simplified for the question, my table looks something like this, with each record holding the data for the email:
queueItem
======================
queueItemID PK
...data...
processed bit
priority int
queuedStart datetime
rowLockName varchar
rowLockDate datetime
Batch 1 (Server 1)
starts at 12:00PM
lock/reserve the first 5000 rows (1-5000)
select the newly reserved rows
begin work
Batch 2 (Server 2)
starts at 12:15PM
lock/reserve the next 5000 rows (5001-10000)
select the newly reserved rows
begin work
To lock the rows I have been using the following:
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000
update queueItem
set rowLockName = @lockName,
rowLockDate = getdate()
where queueitemID in (
select top(@batchsize) queueitemID
from queueItem
where processed = 0
and rowLockName is null
and queuedStart <= getdate()
order by priority, queueitemID
)
If I'm not mistaken, the query starts by executing the SELECT subquery and then locks the rows in preparation for the update; this is fast, but not instantaneous.
My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not be completed yet, so Batch 2's SELECT would see the records as still available and could end up overwriting Batch 1's reservation (a sort of race condition?).
I have run some tests and so far haven't seen them overlap. Is this a valid concern that will come back to haunt me at the worst possible time?
Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
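One pattern that may be worth looking at (my sketch, not from the original post): fold the reservation into a single atomic UPDATE with UPDLOCK/READPAST hints, so a concurrent batch simply skips rows another batch is still locking, and use OUTPUT to return the rows that were actually claimed. Column and table names are taken from the schema above.
DECLARE @lockName varchar(36) = CONVERT(varchar(36), NEWID());
DECLARE @batchsize int = 5000;
WITH nextBatch AS (
    SELECT TOP (@batchsize) queueItemID, rowLockName, rowLockDate
    FROM queueItem WITH (UPDLOCK, READPAST, ROWLOCK) -- skip rows already locked by another batch
    WHERE processed = 0
      AND rowLockName IS NULL
      AND queuedStart <= GETDATE()
    ORDER BY priority, queueItemID
)
UPDATE nextBatch
SET rowLockName = @lockName,
    rowLockDate = GETDATE()
OUTPUT inserted.queueItemID; -- the rows this batch now owns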

Performance difference between select and select with order by

I have a table T1 with seven columns named C1, C2, C3, ..., C7.
All columns are of type varchar with lengths ranging from 10 to 500, but most are of size 500.
The table has approximately 24k rows. When I execute
SELECT * FROM T1;
it takes more than 1 minute to fetch all rows. But when I run the same query with an additional ORDER BY clause
SELECT * FROM T1 ORDER BY C1;
it takes less than 25 seconds to load the complete data.
Note: there are no indexes on the table.
Tool I am using to query is Embarcadero DBArtisan and Database is IBM DB2 9.7+.
Also, the server and client are several thousand kilometres apart.
I am a bit confused why it takes less time with the ORDER BY clause, and why the time difference is so large.
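(An added note, not from the original post.) One way to narrow this down is to separate server-side execution time from the time spent shipping 24k wide rows to a distant client, for example by asking for only a handful of rows in each variant:
-- If both of these come back quickly, the server-side work (including the sort)
-- is not the bottleneck, and the difference lies in how the rows are fetched,
-- transferred, and rendered by the client tool.
SELECT * FROM T1 FETCH FIRST 10 ROWS ONLY;
SELECT * FROM T1 ORDER BY C1 FETCH FIRST 10 ROWS ONLY;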

Make SELECT with LIMIT and OFFSET on big table fast

I have more than 10 million records in a table.
SELECT * FROM tbl ORDER BY datecol DESC
LIMIT 10
OFFSET 999990
Output of EXPLAIN ANALYZE on explain.depesz.com.
Executing the above query takes about 10 seconds. How can I make this faster?
Update
The execution time is reduced by half by using a subquery:
SELECT * FROM tbl where id in
(SELECT id FROM tbl ORDER BY datecol DESC LIMIT 10 OFFSET 999990)
Output of EXPLAIN ANALYZE on explain.depesz.com.
You need to create an index on the column used in ORDER BY. Ideally in the same sort order, but PostgreSQL can scan indexes backwards at almost the same speed.
CREATE INDEX tbl_datecol_idx ON tbl (datecol DESC);
More about indexes and CREATE INDEX in the current manual.
Test with EXPLAIN ANALYZE to get actual times in addition to the query plan.
Of course all the usual advice for performance optimization applies, too.
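(An aside, not part of this answer.) Even with the index, OFFSET 999990 still has to walk and discard roughly a million index entries. If the position can be remembered between calls, keyset pagination avoids that entirely; a sketch assuming tbl has a unique id column (as used in the question's subquery) and a matching index on (datecol DESC, id DESC):
SELECT *
FROM tbl
WHERE (datecol, id) < (:last_datecol, :last_id) -- placeholders: last row of the previous page
ORDER BY datecol DESC, id DESC
LIMIT 10;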
I was trying to do something similar myself with a very large table (>100m records) and found that using OFFSET/LIMIT was killing performance.
An OFFSET to the first 10m records took (with LIMIT 1) about 1.5 minutes to retrieve, with the time growing exponentially after that.
By record 50m I was up to 3 minutes per select - even using sub-queries.
I came across a post here which details useful alternatives.
I modified this slightly to suit my needs and came up with a method that gave me pretty quick results.
CREATE TEMPORARY TABLE just_index AS
SELECT ROW_NUMBER() OVER (ORDER BY [VALUE-You-need]), [VALUE-You-need]
FROM [your-table-name];
This was a once-off - it took about 4 minutes, but I then had all the values I wanted.
Next I created a function that would loop over the "offsets" I needed:
create or replace
function GetOffsets ()
returns void as $$
declare
-- For this part of the function I only wanted values after 90 million up to 120 million
counter bigint := 90000000;
maxRows bigInt := 120000000;
begin
drop table if exists OffsetValues;
create temp table OffsetValues
(
offset_myValue bigint
);
while counter <= maxRows loop
insert into OffsetValues(offset_myValue)
select [VALUE-You-need] from just_index
where row_number > counter
order by row_number -- take the first boundary value past the counter
limit 1;
-- here I'm looping every 500,000 records - this is my 'Offset'
counter := counter + 500000 ;
end loop ;
end ;$$ LANGUAGE plpgsql;
Then run the function:
select GetOffsets();
Again, a once-off amount of time (I went from ~3 minutes to get one of my offset values down to about 3 milliseconds).
Then select from the temp-table:
select * from OffsetValues;
This worked really well for me in terms of performance - I don't think I'll be using OFFSET going forward if I can help it.
Hope this improves performance for any of your larger tables.
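As a possible follow-up (my sketch, not spelled out in the answer above), each stored boundary value can then replace the big OFFSET, keeping the same placeholders:
-- Hypothetical usage: fetch the 500,000-row slice that starts at the 21st stored boundary.
SELECT *
FROM [your-table-name]
WHERE [VALUE-You-need] >= (
    SELECT offset_myValue
    FROM OffsetValues
    ORDER BY offset_myValue
    OFFSET 20 LIMIT 1
)
ORDER BY [VALUE-You-need]
LIMIT 500000;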