Is there a size limit on appending ORC data files to Vora tables?

I created a Vora table in Vora 1.3 and tried to append data to it from ORC files that I got from the SAP BW archiving process (NLS on Hadoop). I had 20 files containing approximately 50 million records in total.
When I used the "files" setting in the APPEND statement as "/path/*", Vora returned this error message after approximately one hour:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5F: [Vora [eba156.extendtec.com.au:42681.1640438]] java.lang.RuntimeException: Wrong magic number in response, expected: 0x56320170, actual: 0x00000000. An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
Next I tried appending the data from each file with a separate APPEND statement. On the 15th append (of 20) I got the same error message.
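For reference, the per-file approach looked roughly like this (a sketch only -- the exact Vora 1.3 APPEND syntax and the SQL context object depend on the environment, and the paths below are placeholders):

orc_files = [
    "/archive/nls/part-0001.orc",
    "/archive/nls/part-0002.orc",
    # ... remaining ORC files
]

# Issue one APPEND per ORC file instead of a single wildcard APPEND;
# "sqlContext" stands in for however Vora SQL is issued in this Spark setup.
for path in orc_files:
    sqlContext.sql('APPEND TABLE F002_5F OPTIONS (files "{0}")'.format(path))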

The error indicates that the Vora engine on node eba156.extendtec.com.au is not available. I suspect it either crashed or ran into an out-of-memory situation.
You can check the log directory for a crash dump. If you find one, please open a customer message for further investigation.
If you do not find a crash dump, it is likely an out-of-memory situation. You should find confirmation either in the engine log file or in /var/log/messages (if the OOM killer ended the process). In that case, the available memory is not sufficient to load the data.
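As a small illustration (the exact OOM-killer message text and log location vary by Linux distribution, so the patterns below are only indicative), you could scan /var/log/messages like this:

import re

# Indicative patterns only; the wording differs between kernels and distributions.
oom_pattern = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)

with open("/var/log/messages", errors="replace") as log:
    for line in log:
        if oom_pattern.search(line):
            print(line.rstrip())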

Related

DataStage 11.5 CFF stage throwing exception: APT_BadAlloc: Heap allocation failed

Can you please help me solve this? I am using DataStage 11.5, and in the CFF stage of one of my jobs I am getting a heap allocation failed error, which aborts the job whenever a large CFF file arrives.
My job simply converts a CFF file into a text file.
Errors in job log show:
Message: main_program: Current heap size: 2,072,104,336 bytes in 4,525,666 blocks
Message: main_program: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed. [error_handling/exception.C:132]
According to https://www.ibm.com/support/pages/datastage-cff-stage-job-fails-message-aptbadalloc-heap-allocation-failed, the Complex Flat File (CFF) stage is a composite operator that inserts a Promote Sub-Record operator for every subrecord, and too many of these can exhaust the available heap. To diagnose the problem further, set the environment variable APT_DUMP_SCORE=True and check the score dump in the job log to see whether the job is creating too many Promote Sub-Record operators. To improve performance and reduce memory usage, the table definition should be optimized.
Resolving the problem
Here is what you can do to reduce the number of Promote Sub-Record operators:
1. Save the table definition from the CFF stage in the job.
2. Clear all the columns in the CFF stage.
3. Reload the table definition from the definition saved in step 1, checking the "Remove group columns" check box. This removes the additional group columns.
4. Check the layout; it should have the same record length as the original job. After reloading, the table structure will be flat (no more hierarchy).
After the above steps, the OSH script generated by the Complex Flat File stage will no longer contain Promote Sub-Record operators, performance will improve, and memory usage will be reduced to a minimum.

SSAS corrupt string store data file for one of the table columns error

I'm getting an error in SSAS when redeploying the project. The error is:
The JSON DDL request failed with the following error: Error happened while loading table data. Possible cause is: corrupt string store data file for one of the table columns.Error happened while loading table data.A duplicate value has been detected in the Unique Value store associated with the dictionary.Database consistency checks (DBCC) failed while checking the data segments.Error happened while loading table '', file '1245.H$Countries (437294994)$Country (437295007).POS_TO_ID.0.idf'.Database consistency checks (DBCC) failed while checking the data segments.Error happened while loading table '', file '1245.H$Countries (437294994)$City ....
I checked the Countries table, but there is no duplicate data.
Is there anybody who can help, please?
As the error implies, the model has some corrupted data (not to be confused with duplicated data).
Microsoft has some resolutions for these kinds of errors here: https://learn.microsoft.com/en-us/analysis-services/instances/database-consistency-checker-dbcc-for-analysis-services?view=asallproducts-allversions#common-resolutions-for-error-conditions
TL;DR:
Depending on the error, the recommended resolution is to either reprocess an object, delete and redeploy a solution, or restore the database.

BigQuery Scheduled Data Transfer throws "Incompatible table partitioning specification." error - but error message is truncated

I'm using the new BQ Data Transfer UI, and upon scheduling a Data Transfer, the transfer fails.
The error message in Run History isn't terribly helpful, as it appears to be truncated.
Incompatible table partitioning specification. Expects partitioning specification interval(type:hour), but input partitioning specification is ; JobID: xxxxxxxxxxxx
Note the part of the error that says "but input partitioning specification is" followed by nothing before the semicolon; the error seems to be truncated.
Some details about the run:
The run imports data nightly from a CSV file located in a GCS bucket. Once the file is successfully ingested, the process deletes it. The target table in BQ is a partitioned table using the default partition pseudo-column (_PARTITIONTIME).
What I have done so far:
Reran the scheduled Data Transfer -- which failed and threw the same error
Deleted the target table in BQ and recreated it with different partition specifications (day, hour, month) -- then reran the scheduled transfer -- it failed and threw the same error.
Imported the data manually (I downloaded the file from GCS and uploaded it from my local machine) using the BQ UI (Create Table, append to the specific table) -- worked perfectly.
Checked to see whether this was a known issue here on Stack Overflow and only found this (which is now closed) -- close, but not exactly the same issue. (BigQuery Data Transfer Service with BigQuery partitioned table)
What I'm holding off on doing, since it would take a bit more work:
Change the schema of the target BQ table to include a column specifically for partitioning
Include a system-generated timestamp in the original file inside of GCS and ensure the process recognizes this as the partitioning field.
Am I doing something wrong? Or is this a known issue?
Alright, I believe I have solved this. It looks like you need to include runtime parameters in your target table name if the destination table is partitioned.
https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters
Specifically this section called "Runtime Parameter Examples" here: https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters#loading_a_snapshot_of_all_data_into_an_ingestion-time_partitioned_table
They also advise that minutes cannot be specified in these parameters.
You will need to append the parameters to your destination table details as shown below:
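As an illustration only (the project, bucket, dataset, and table names below are placeholders, and I am assuming the google-cloud-bigquery-datatransfer Python client rather than the UI), a destination table name with a runtime parameter for an hourly ingestion-time partitioned table looks roughly like this:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="nightly_gcs_load",
    data_source_id="google_cloud_storage",
    params={
        "data_path_template": "gs://my-bucket/nightly/*.csv",
        # The {run_time|...} template resolves to the hourly partition decorator
        # at run time; per the docs, minutes cannot be used in the format.
        "destination_table_name_template": 'my_table${run_time|"%Y%m%d%H"}',
        "file_format": "CSV",
    },
    schedule="every 24 hours",
)

created = client.create_transfer_config(
    parent="projects/my-project",
    transfer_config=transfer_config,
)
print("Created transfer config:", created.name)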

Databricks error IllegalStateException: The transaction log has failed integrity checks

I have a table that I need to drop, delete the transaction log for, and recreate, but when I try to drop it I get the following error.
I have run a REPAIR TABLE statement on this table, which could be responsible for the error, but I am not sure.
IllegalStateException: The transaction log has failed integrity checks. We recommend you contact Databricks support for assistance. To disable this check, set spark.databricks.delta.state.corruptionIsFatal to false. Failed verification of:
Table size (bytes) - Expected: 0 Computed: 63233
Number of files - Expected: 0 Computed: 1
We think this may just be related to S3 eventual consistency. Please try waiting a few extra minutes after deleting the Delta directory before writing new data to it. Also, a normal MSCK REPAIR TABLE doesn't do anything for Delta, as Delta doesn't use the Hive Metastore to store the partitions. There is an FSCK REPAIR TABLE, but that is for removing file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system.
We don't recommend overwriting a Delta table in place, like you might with a normal Spark table. Delta is not like a normal table - it's a table, plus a transaction log, and many versions of your data (unless fully vacuumed). If you want to overwrite parts of the table, or even the whole table, you should use Delta's delete functionality. If you want to completely change the table, consider writing to an entirely new directory, such as /table/v2/... and separately deleting the other table.
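As a rough PySpark sketch of those two options (the paths, DataFrames, and delete predicate below are placeholders, not taken from the original question):

from delta.tables import DeltaTable

# Option 1: delete only the rows being replaced, then append the corrected data.
delta_table = DeltaTable.forPath(spark, "/table/v1")
delta_table.delete("event_date = '2021-01-01'")
new_rows_df.write.format("delta").mode("append").save("/table/v1")

# Option 2: write the full replacement to a new directory (e.g. /table/v2)
# and delete the old directory separately once nothing references it.
replacement_df.write.format("delta").mode("overwrite").save("/table/v2")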
To stop this check from failing the job, you can use the command below in a PySpark notebook:
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)

Error while loading AVRO files to BigQuery

I have successfully loaded a large number of Avro files (of the same schema, into the same table), stored on Google Cloud Storage, using the bq CLI utility.
However, for some of the Avro files I am getting a very cryptic error while loading into BigQuery; the error says:
The Apache Avro library failed to read data with the follwing error: EOF reached (error code: invalid)
With avro-tools I validated that the Avro file is not corrupted; report output:
java -jar avro-tools-1.8.1.jar repair -o report 2017-05-15-07-15-01_48a99.avro
Recovering file: 2017-05-15-07-15-01_48a99.avro
File Summary:
Number of blocks: 51 Number of corrupt blocks: 0
Number of records: 58598 Number of corrupt records: 0
I tried creating a brand new table with one of the failing files in case it was due to a schema mismatch, but that didn't help, as the error was exactly the same.
I need help figuring out what could be causing the error here.
There is no way to pinpoint the issue without more information, but I ran into this error message and filed a ticket here.
In my case, a number of files in a single load job were missing columns, which was causing the error.
Explanation from the ticket:
BigQuery uses the alphabetically last file from the directory as the avro schema to read the other Avro files. I suspect the issue is with schema incompatibility between the last file and the "problematic" file. Do you know if all the files have the exact same schema or differ? One thing you could try to help verify this is to copy the alphabetically last file of the directory and the "problematic" file to a different folder and try to load those two files in one BigQuery load job and see if the error reproduces.
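If it is easier than the bq CLI, a rough sketch of that two-file test with the google-cloud-bigquery Python client (the bucket, dataset, and file names below are placeholders) would be:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)

# Load the alphabetically last file and the "problematic" file in one job.
load_job = client.load_table_from_uri(
    [
        "gs://my-bucket/avro-test/zz_last_file.avro",
        "gs://my-bucket/avro-test/2017-05-15-07-15-01_48a99.avro",
    ],
    "my-project.my_dataset.avro_schema_test",
    job_config=job_config,
)
load_job.result()  # raises an exception if the schema incompatibility reproduces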