I have a range-partitioned table in my database. It is partitioned by a date column, transaction_date, with one partition per month.
Now my problem is:
When running a SQL statement to read data from the table,
select col1,col2 from mytable where ID=1
My table is very large, so it takes a long time for the SQL to finish.
However, there is another ETL job that inserts (appends) data into the table at the same time, and the insert operation cannot start until the read SQL finishes.
Any suggestions on how I can avoid this issue while reading data? Also, are there any official IBM documents regarding this problem?
** EDIT 1:
$ db2level
DB21085I This instance or install (instance name, where applicable:
"db2inst1") uses "64" bits and Db2 code release "SQL11011" with level
identifier "0202010F".
Informational tokens are "DB2 v11.1.1.1", "s1610100100",
"DYN1610100100AMD64", and Fix Pack "1".
Product is installed at "/opt/ibm/db2/v11.1".
$ db2set -all
[i] DB2COMM=TCPIP
[i] DB2AUTOSTART=TRUE
[i] DB2OPTIONS=+c
[g] DB2FCMCOMM=TCPIP4
[g] DB2SYSTEM=<server hostname>
[g] DB2INSTDEF=db2inst1
** EDIT 2:
For the SELECT and the load SQL statements, I am not specifying any isolation level.
For the ETL job: it is an IBM DataStage job, and the insert is a bulk-load append operation that adds data to a pre-existing range.
You may use the MON_LOCKWAITS administrative view to check what is happening during such a lock-wait situation. You may optionally format the lock name with the MON_FORMAT_LOCK_NAME table function to get more details as well.
SELECT
W.*
--, F.NAME, F.VALUE
FROM SYSIBMADM.MON_LOCKWAITS W
--, TABLE(MON_FORMAT_LOCK_NAME(W.LOCK_NAME)) F
--WHERE W.REQ_APPLICATION_HANDLE = XXX -- if you know the holder's handle to reduce the amount of data returned
ORDER BY W.LOCK_NAME
;
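For example, to also see the formatted lock details, you can enable the optional parts that are commented out above; a sketch along these lines (the handle filter stays optional):
SELECT
W.REQ_APPLICATION_HANDLE, W.LOCK_NAME, F.NAME, F.VALUE
FROM SYSIBMADM.MON_LOCKWAITS W,
TABLE(MON_FORMAT_LOCK_NAME(W.LOCK_NAME)) F
ORDER BY W.LOCK_NAME, F.NAME
;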
Related
I am supposed to do an incremental load and am using the structure below.
Do the statements execute in sequence, i.e., is TRUNCATE never executed before the first two statements, which fetch the data?
#newData = Extract ... (FROM FILE STREAM)
#existingData = SELECT * FROM dbo.TableA //this is ADLA table
#allData = SELECT * FROM #newData UNION ALL SELECT * FROM #existingData
TRUNCATE TABLE dbo.TableA;
INSERT INTO dbo.TableA SELECT * FROM #allData
To be very clear: U-SQL scripts are not executed statement by statement. Instead, the DDL/DML/OUTPUT statements are grouped in order, and the query expressions are just subtrees of the inserts and outputs. But first, during compilation, the data is bound to its names, so your SELECT from TableA will be bound to the data (kind of like a lightweight snapshot). Even if the truncate is executed before the select, you should still be able to read the data from TableA (note that permission changes may impact that).
Also, if your script fails during the execution phase, you should have an atomic execution. That means if your INSERT fails, the TRUNCATE should be undone at the end.
Having said that, why don't you use INSERT incrementally and use ALTER TABLE REBUILD periodically instead of doing the above pattern that reads the full table on every insertion?
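A minimal sketch of that incremental pattern, assuming a hypothetical source file and columns (U-SQL rowset variables are prefixed with @):
// Extract only the new rows (file path and columns are hypothetical;
// the rowset schema must match dbo.TableA).
@newData =
    EXTRACT col1 string,
            col2 int
    FROM "/input/new_rows.csv"
    USING Extractors.Csv();

// Append just the delta instead of rewriting the whole table.
INSERT INTO dbo.TableA
SELECT * FROM @newData;

// Run periodically (e.g. in a separate maintenance script) to compact
// the many small files created by incremental inserts.
ALTER TABLE dbo.TableA REBUILD;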
I have a table, stop_logs, in Hive. When I run an INSERT query for around 6,000 rows, it takes 300 seconds, whereas if I run just the SELECT query, it finishes in 6 seconds. Why is the insert taking this much time?
CREATE TABLE stop_logs (event STRING, loadId STRING)
STORED AS SEQUENCEFILE;
The following takes 300 seconds:
INSERT INTO TABLE stop_logs
SELECT
i.event, i.loadId
FROM
event_logs i
WHERE
i.stopId IS NOT NULL;
The following query takes 6 seconds:
SELECT
i.event, i.loadId
FROM
event_logs i
WHERE
i.stopId IS NOT NULL;
First you need to understand how Hive processes your query:
When you perform a "select * from <tablename>", Hive fetches the whole data from the file as a FetchTask rather than a MapReduce task; it just dumps the data as-is without doing anything to it. This is similar to "hadoop dfs -text <filename>". Because it doesn't run any MapReduce job, it runs faster.
When using "select a,b from <tablename>", Hive requires a MapReduce job, since it needs to extract the columns from each row by parsing the file it loads.
When using "insert into table stop_logs select a,b from event_logs", the SELECT runs first and triggers a MapReduce job, since it needs to extract the columns from each row by parsing the file it loads. Then, to insert into the other table (stop_logs), it launches another MapReduce task that takes the values selected for columns a and b and maps them to columns a and b, respectively, for insertion into the new rows.
Another reason for slowness: check whether "hive.typecheck.on.insert" is set to true. If it is, values are validated, converted, and normalized to conform to their column types (Hive 0.12.0 onward) while inserting into the table, which also makes the insert slower than the plain select.
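If that setting matters in your environment, you can check it and, where the trade-off of losing the type validation is acceptable, turn it off for the session before running the insert; a sketch:
-- Print the current value, then disable type checking for this session only.
SET hive.typecheck.on.insert;
SET hive.typecheck.on.insert=false;

INSERT INTO TABLE stop_logs
SELECT i.event, i.loadId
FROM event_logs i
WHERE i.stopId IS NOT NULL;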
We have a table in Db2 that we need to get into MS SQL Server (read-only on the SQL Server side), and I want it to be synced every 15 minutes (one way, from Db2 to SQL Server). Can you suggest the best approach?
Have a SQL Agent job execute an SSIS package every 15 minutes.
I know that MERGE is usually the right option to sync tables in SQL Server, but I was not sure whether it can also be used with linked servers. Anyway, after some research I got this task accomplished by using MERGE. MERGE will update, insert, and delete whatever is required, but it takes a bit more time to process the whole table every 15 minutes when the job runs. So you can create a #Temptable to hold only the transactions that happened since the last job run. You can use the datetime stamp in the source table to retrieve the transactions done since the last job run (15 minutes ago). If you don't have a datetime in the source table, you can use the audit table for that source table (if applicable).
(The JLT table has 3 columns: last_job_end, cur_job_start, and a job identity. JLT is the job log table we need to create on the linked server to get the last job end and current job start times. We need to update last_job_end at the end of the query in the job, and cur_job_start at the beginning of the job.)
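A possible shape for that job log table and its per-run updates (a sketch on the Db2 side; names and types are assumptions):
CREATE TABLE Job_log_table (
    c_job          VARCHAR(50),   -- job identity
    last_job_end   TIMESTAMP,
    cur_job_start  TIMESTAMP
);

-- At the beginning of each job run:
UPDATE Job_log_table
SET cur_job_start = CURRENT TIMESTAMP
WHERE c_job = 'some_job_id';

-- At the end of each job run, after a successful sync:
UPDATE Job_log_table
SET last_job_end = cur_job_start
WHERE c_job = 'some_job_id';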
SELECT *
INTO #TEMPtable
FROM OPENQUERY([DB2], 'Select * from source_table
where some_id_column in
(select some_id_column
from audit_table AT, Job_log_table JLT
where datetime > last_job_end
and datetime <= cur_job_start
and c_job = ''some_job_id'')')
If you don't have the audit table but you do have the datetime in the source table:
SELECT *
INTO #TEMPtable
FROM OPENQUERY([DB2], 'Select *
from source_table s, JOB_CYCLE_TABLE pr
where s.DATETIME <= pr.cur_job_start
and s.DATETIME > pr.last_job_end
and pr.c_job = ''some_job_id''')
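Then you can apply the delta to the local copy with MERGE, using the temp table as the source (a sketch; the target table and column names below are hypothetical):
MERGE dbo.Target_table AS T
USING #TEMPtable AS S
    ON T.some_id_column = S.some_id_column
WHEN MATCHED THEN
    UPDATE SET T.col1 = S.col1,
               T.col2 = S.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (some_id_column, col1, col2)
    VALUES (S.some_id_column, S.col1, S.col2);
-- Deletes are not visible in the delta alone; handle them from the audit table if you need them.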
When I run a script in PostgreSQL I usually do the following from psql:
my_database> \i my_script.sql
Where in my_script.sql I may have code like the following:
select a.run_uid, s.object_uid into temp_table from dt.table_run_group as a
inner join dt.table_segment as s on a.group_uid = s.object_uid;
In this particular case, I am only interested in creating temp_table with the results of the query.
Are these results on disk on the server? In memory? Is the table stored permanently?
Temporary tables are stored in RAM until the available memory is used up, at which time they spill onto disk. The relevant setting here is temp_buffers.
Either way, they live for the duration of a session and are dropped at the end automatically.
You can also drop them at the end of a transaction automatically (ON COMMIT DROP) or manually any time.
Temporary tables are only visible to the same user in the same session. Other sessions cannot access them, and also cannot conflict with them.
Always use CREATE TABLE tbl AS .... The alternative form SELECT ... INTO tbl is discouraged since it conflicts with the INTO clause in plpgsql.
Your query could look like:
CREATE TEMP TABLE tbl AS
SELECT a.run_uid, s.object_uid
FROM dt.table_run_group a
JOIN dt.table_segment s ON a.group_uid = s.object_uid;
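If you want the table dropped automatically at the end of the transaction, as mentioned above, the same query just needs ON COMMIT DROP:
BEGIN;
CREATE TEMP TABLE tbl ON COMMIT DROP AS
SELECT a.run_uid, s.object_uid
FROM dt.table_run_group a
JOIN dt.table_segment s ON a.group_uid = s.object_uid;
-- ... work with tbl here ...
COMMIT;  -- tbl is dropped automatically at this point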
SELECT INTO table ... is the same as CREATE TABLE table AS ..., which creates a normal, permanent table.
I need to insert records into a new table from an existing table. I used the following query to do this:
Insert into Newtable
Select * from Oldtable where date1 = #date
This query works most of the time, but in one scenario I get 10 million records to be inserted for the date1 value. In this case I'm getting the following error message:
Error : The transaction log for database "tempDB" is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
Should I break the query into parts and insert them sequentially, or is there a way to do this with the current query?
This is, perhaps, a distasteful suggestion, but you can try exporting the data to a file and then loading it with bulk insert, with the database recovery model set to SIMPLE or BULK_LOGGED.
More information is at http://msdn.microsoft.com/en-us/library/ms190422.aspx.
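A minimal sketch of that route, assuming a database named MyDB and an export file at C:\export\oldtable.dat (both hypothetical); remember to switch the recovery model back afterwards:
-- 1. Export the rows for the problematic date, e.g. with bcp from the command line:
--    bcp "SELECT * FROM MyDB.dbo.Oldtable WHERE date1 = '20150101'" queryout C:\export\oldtable.dat -n -T -S <server>

-- 2. Switch to a minimally logged recovery model.
ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED;

-- 3. Load the file; TABLOCK helps the load qualify for minimal logging.
BULK INSERT dbo.Newtable
FROM 'C:\export\oldtable.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);

-- 4. Switch back when done.
ALTER DATABASE MyDB SET RECOVERY FULL;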