NullPointerException on loading data into Grakn - backup

I have created a backup of Grakn with the exporter tool like this:
./grakn server export 'old_test' backup.grakn
$x isa export,
has status "completed",
has progress (100.0%),
has count (105 / 105);
I then wanted to import this into a new keyspace with
./grakn server import 'new_test' backup.grakn
But I got this error below:
An error has occurred during boot-up. Please run 'grakn server status' or check the logs located under the 'logs' directory.
io.grpc.StatusRuntimeException: INTERNAL: java.lang.NullPointerException

You need to import your schema into the new keyspace first; this error occurs because the server cannot find a schema label in your dataset. The steps for migrating a schema are described in the docs: https://dev.grakn.ai/docs/management/migration-and-backup
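If it helps while following those docs, below is a minimal sketch of defining the schema in the new keyspace with the Grakn Python client before running the import. The client package, the port, and the schema file name are assumptions on my side; the CLI steps in the linked migration guide are the documented route.

from grakn.client import GraknClient

# Assumes grakn-client is installed and the server runs on the default gRPC port.
with open("schema.gql") as f:        # hypothetical schema file exported from 'old_test'
    schema = f.read()

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace="new_test") as session:
        with session.transaction().write() as tx:
            tx.query(schema)         # define the schema in the new keyspace
            tx.commit()

# Afterwards, ./grakn server import 'new_test' backup.grakn should find the schema labels.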

Related

Load from GCS to GBQ causes an internal BigQuery error

My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs into BigQuery, and only a few of them fail with the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written in Python and uses these libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
The load job is created by calling:
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
This method has a default parameter:
retry: retries.Retry = DEFAULT_RETRY,
so the job should automatically retry on such errors.
ID of a specific job that finished with the error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
After getting the error, the application recreates the job, but it doesn't help.
Subsequent failed job IDs:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
If the error continues to occur and you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing your problem. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages, you can refer to this document.
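Note that the retry argument of load_table_from_uri only retries the API call that creates the job; it does not re-run a job that subsequently finishes with an internal error, so the application still has to recreate the job itself. A minimal sketch of doing that with exponential back-off follows; the function name, attempt count, and delay values are illustrative assumptions, not SLA guidance.

import time

from google.api_core.exceptions import GoogleAPICallError
from google.cloud import bigquery


def load_with_backoff(client, source_uri, destination, job_config,
                      max_attempts=5, base_delay=30.0):
    """Recreate the load job with exponential back-off when it finishes with an error."""
    for attempt in range(1, max_attempts + 1):
        load_job = client.load_table_from_uri(
            source_uris=source_uri,
            destination=destination,
            job_config=job_config,
        )
        try:
            load_job.result()   # blocks until the job finishes; raises if the job failed
            return load_job
        except GoogleAPICallError as exc:
            print(f"load job {load_job.job_id} failed on attempt {attempt}: {exc}")
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage mirroring the question's call:
# client = bigquery.Client()
# load_with_backoff(client, source_uri, destination, job_config)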

Getting error while connecting ADLS to Notebook in AML

I am getting the error below while connecting to a dataset that was created and registered in an AML notebook and is based on ADLS. When I connect to this dataset in the designer, I am able to visualize it. Below is the code that I am using. Please let me know the solution if anyone has faced the same error.
Example 1: Import dataset to notebook
from azureml.core import Workspace, Dataset
subscription_id = 'abcd'
resource_group = 'RGB'
workspace_name = 'DSG'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='abc')
dataset.to_pandas_dataframe()
Error 1
ExecutionError: Could not execute the specified transform.
(Error in getting metadata for path /local/top.txt.
Operation: GETFILESTATUS failed with Unknown Error: The operation has timed out..
Last encountered exception thrown after 5 tries.
[The operation has timed out.,The operation has timed out.,The operation has timed out.,The operation has timed out.,The operation has timed out.]
[ServerRequestId:])|session_id=2d67
Example 2: Import data from datastore to notebook
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'abc'
workspace = Workspace.from_config()
datastore = Datastore.get(workspace, datastore_name)
datastore_paths = [(datastore, '/local/top.txt')]
df_ds = Dataset.Tabular.from_delimited_files(
    path=datastore_paths, validate=True,
    include_path=False, infer_column_types=True,
    set_column_types=None, separator='\t',
    header=True, partition_format=None
)
df = df_ds.to_pandas_dataframe()
Error 2
Cannot load any data from the specified path. Make sure the path is accessible.
Try removing the initial slash from your path, i.e. 'local/top.txt':
datastore_paths = [(datastore, 'local/top.txt')]
For your dataset abc, can you visualize/preview the data on ml.azure.com?
This might be because your data permissions are not set up correctly in ADLS. You need to give the service principal permission for the file/folder you are accessing.
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
Data Access Setting on a file in ADLS
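If the path fix alone doesn't help, a quick way to tell a path problem from a permissions problem is to point a FileDataset at the same slash-less path and see whether it can even enumerate the file. This is only a diagnostic sketch reusing the names from the question; whether the timeout reproduces this way is an assumption.

from azureml.core import Workspace, Datastore, Dataset

workspace = Workspace.from_config()
datastore = Datastore.get(workspace, 'abc')

# If this also times out, the issue is most likely ADLS access/ACLs,
# not the tabular parsing options.
file_ds = Dataset.File.from_files(path=[(datastore, 'local/top.txt')])
print(file_ds.to_path())   # lists the matched file(s) when path and permissions are correct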

Intermittent 500 internal server error in images after adding isolation_level to a flask-sqlalchemy app on apache mod_wsgi server

I am using Apache mod_wsgi with a Flask-SQLAlchemy, Marshmallow application, connecting to a remote MS SQL database using pyodbc. Recently I was asked to add the isolation level 'SNAPSHOT', and I did that using apply_driver_hacks:
class SQLiteAlchemy(SQLAlchemy):
    def apply_driver_hacks(self, app, info, options):
        options.update({
            'isolation_level': 'SNAPSHOT',
        })
        super(SQLiteAlchemy, self).apply_driver_hacks(app, info, options)
The project is built to access image blob data from an MS SQL server and display it on a webpage. Soon after adding the isolation level, I see an internal error generated for every few images; doing a Ctrl+F5 displays the image, but then there are other images not being displayed, and this is in the error log:
mod_wsgi (pid=10694): Exception occurred processing WSGI script
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Transaction failed in database 'testdb' because the statement was run under snapshot isolation but the transaction did not start in snapshot isolation. You cannot change the isolation level of the transaction to snapshot after the transaction has started unless the transaction was originally started under snapshot isolation level. (3951) (SQLExecDirectW)")
Edited to add code below:
How would I do that with Flask-SQLAlchemy when not using create_engine?
My app.py file:
app = Flask(__name__)
app.config.from_object('config.ProductionConfig')
db.init_app(app)
ma.init_app(app)
My model.py file:
class SQLiteAlchemy(SQLAlchemy):
    def apply_driver_hacks(self, app, info, options):
        options.update({
            'isolation_level': 'SNAPSHOT',
        })
        super(SQLiteAlchemy, self).apply_driver_hacks(app, info, options)
# To be initialized with the Flask app object in app.py.
db = SQLiteAlchemy()
ma = Marshmallow()
At Engine Level
If you were using the declarative implementation you would have access to the create_engine function (and the scoped session one).
But assuming you're using the Flask-SQLAlchemy implementation, this just calls sqlalchemy.create_engine under the hood (on this line).
It might take a hack for the latter, as there doesn't seem to be a way to pass engine-related options in; they are defined explicitly a few lines up at #558:
options = {'convert_unicode': True}
At Session Level
This looks like it could be slightly easier, because you can pass session options when you initialise SQLAlchemy: see this line. The create_scoped_session method expects a dictionary which can be passed to the __init__ method as session_options.
So when you initialise the library you could try something like:
db = SQLiteAlchemy(session_options={'isolation_level': 'SNAPSHOT'})
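Tying that back to the question's own layout, here is a minimal sketch of the wiring. The subclass and config names are kept from the question; whether a 'SNAPSHOT' isolation level is actually honoured when passed through session options depends on your SQLAlchemy version and driver, so treat this as something to verify.

# model.py
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow

class SQLiteAlchemy(SQLAlchemy):
    pass  # the apply_driver_hacks override may become unnecessary if session options work

db = SQLiteAlchemy(session_options={'isolation_level': 'SNAPSHOT'})
ma = Marshmallow()

# app.py
from flask import Flask
from model import db, ma

app = Flask(__name__)
app.config.from_object('config.ProductionConfig')
db.init_app(app)
ma.init_app(app)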

BigQuery loads manually but not through the Java SDK

I have a Dataflow pipeline, running locally. The objective is to read a JSON file using TextIO, make sessions, and load it into BigQuery. Given the structure, I have to create a temp directory in GCS and then load it into BigQuery from there. Previously I had a data schema error that prevented me from loading the data, see here. That issue is resolved.
So now, when I run the pipeline locally, it ends with dumping a temporary newline-delimited JSON file into GCS. The SDK then gives me the following:
Starting BigQuery load job beam_job_xxxx_00001-1: try 1/3
INFO [main] (BigQueryIO.java:2191) - BigQuery load job failed: beam_job_xxxx_00001-1
...
Exception in thread "main" com.google.cloud.dataflow.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:187)
at pedesys.Dataflow.main(Dataflow.java:148)
Caused by: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.load(BigQueryIO.java:2198)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.processElement(BigQueryIO.java:2146)
The errors are not very descriptive and the data is still not loaded into BigQuery. What is puzzling is that if I go to the BigQuery UI and manually load the same temporary file from GCS, the one dumped by the SDK's Dataflow pipeline, into the same table, it works beautifully.
The relevant code parts are as follows:
PipelineOptions options = PipelineOptionsFactory.create();
options.as(BigQueryOptions.class)
    .setTempLocation("gs://test/temp");
Pipeline p = Pipeline.create(options);
...
...
session_windowed_items.apply(ParDo.of(new FormatAsTableRowFn()))
    .apply(BigQueryIO.Write
        .named("loadJob")
        .to("myproject:db.table")
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    );
The SDK is swallowing the error/exception and not reporting it to users. It's most likely a schema problem. To get the actual error that is happening, you need to fetch the job details by either:
CLI - bq show -j beam_job_<xxxx>_00001-1
Browser/Web: use "try it" at the bottom of the page here.
#jkff has raised an issue here to improve the error reporting.
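If you prefer Python tooling over the bq CLI, a minimal sketch of fetching the same job details with the BigQuery Python client is below. The project ID is taken from the pipeline snippet; the job ID placeholder and the 'US' location are assumptions you would replace with the values from your own log.

from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# Look up the load job that the SDK created and print the real error details.
job = client.get_job("beam_job_xxxx_00001-1", location="US")
print(job.error_result)        # the primary error, e.g. a schema mismatch
for err in job.errors or []:   # all errors attached to the job
    print(err)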

SSIS CSV Import Error 0xC0202092 DTS_E_PRIMEOUTPUTFAILED

All of a sudden, a CSV file that is imported into a db/table every morning has been failing every time for the last few weeks. I do not support this process directly, so I don't know much about SSIS, but I would greatly appreciate some help, as I need this working and whoever supports this process has no idea what the issue is. I'm not sure if the error regarding the row has anything to do with the data in that row, because it looks fine to me. The CSV includes Active Directory information for every computer in AD and is exported from PowerShell to a server, where the CSV is imported into a table via SSIS. The process is entirely automated and nothing has changed.
[Source - Clean_Gold CSV [1]] Error: The column delimiter for column "LastLogontimestamp" was not found.
[Source - Clean_Gold CSV [1]] Error: An error occurred while processing file "H:\Computers\clean_gold.csv" on data row 40377.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Source - Clean_Gold CSV" (1) returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
I looked at row 40,378 and saw that someone had put a ", in the description of the computer object in Active Directory. That caused an issue with the delimiting.
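For anyone who needs to find such rows before the SSIS package runs, a rough sketch in Python is below; the file path comes from the error message, and the assumption is that a stray quote/comma makes the parsed field count differ from the header.

import csv

with open(r"H:\Computers\clean_gold.csv", newline="") as f:   # adjust encoding if needed
    reader = csv.reader(f)
    header = next(reader)
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            # Rows whose field count differs from the header are the suspects.
            print(line_no, row)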