BigQuery to Redshift data transfer using AWS SCT failing?

I am trying to migrate data from BigQuery to Redshift using this article. I followed along and successfully got to "Start the Local Data Migration Task". I had to set up an AWS profile to access "Data Migration View (Other)"; the profile was set up using the access key and secret access key of an admin user account in AWS.
However, upon starting the task I keep getting the following error:
class com.amazon.dmt.model.FileCredentials cannot be cast to class com.amazon.dmt.model.UserCredentials (com.amazon.dmt.model.FileCredentials and com.amazon.dmt.model.UserCredentials are in unnamed module of loader 'app')
I checked the AWS documentation and looked around, but this error is not listed anywhere. What am I missing? I also cannot understand why a cast from FileCredentials to UserCredentials is being attempted in the first place.
Has anyone faced a similar issue, or can someone point me in the right direction?
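For reference, the profile itself can be sanity-checked with a short boto3 snippet like this (a sketch only; the profile name is a placeholder, not from my setup):
# Sketch: verify that an AWS named profile resolves to static access-key credentials.
# "sct-profile" is a placeholder profile name.
import boto3

session = boto3.Session(profile_name="sct-profile")
creds = session.get_credentials()
if creds is None:
    print("Profile did not resolve to any credentials")
else:
    # get_frozen_credentials() returns a snapshot with access_key / secret_key / token
    frozen = creds.get_frozen_credentials()
    print("Access key id starts with:", frozen.access_key[:4] + "...")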

Based on my testing, I have determined that this is an issue in version 1.0.670 of SCT. A request has been submitted to correct it. In the meantime, to allow you to continue with your project, please revert to AWS SCT version 1.0.666 using this link: https://d211wdu1froga6.cloudfront.net/builds/1.0/666/Windows/aws-schema-conversion-tool-1.0.zip
You will have to uninstall SCT and the extractor agent, then reinstall and configure the previous versions as you did before.

Related

Intermittent HTTP error when loading files from ADLS Gen2 in Azure Databricks

I am getting an intermittent HTTP error when I try to load the contents of files in Azure Databricks from ADLS Gen2. The storage account has been mounted using a service principal associated with Databricks, which has been given Storage Blob Data Contributor access on the data lake storage account through RBAC. A sample load statement is:
df = spark.read.format("orc").load("dbfs:/mnt/{storageaccount}/{filesystem}/{filename}")
The error message I get is:
Py4JJavaError: An error occurred while calling o214.load. : java.io.IOException: GET https://{storageaccount}.dfs.core.windows.net/{filesystem}/{filename}?timeout=90 StatusCode=412 StatusDescription=The condition specified using HTTP conditional header(s) is not met.
ErrorCode=ConditionNotMet ErrorMessage=The condition specified using HTTP conditional header(s) is not met.
RequestId:51fbfff7-d01f-002b-49aa-4c89d5000000
Time:2019-08-06T22:55:14.5585584Z
The error does not occur for all the files in the filesystem; I can load most of them, and only some fail. Not sure what the issue is here.
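For context, the mount was created roughly along these lines (a sketch only; all angle-bracket values are placeholders):
# Sketch of an ADLS Gen2 mount via a service principal (OAuth client credentials).
# All angle-bracket values are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<filesystem>@<storageaccount>.dfs.core.windows.net/",
    mount_point="/mnt/<storageaccount>",
    extra_configs=configs,
)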
This has been resolved now. The underlying issue was due to a change at Microsoft's end. This is the RCA I got from Microsoft Support:
There was a storage configuration that was turned on incorrectly during the latest storage tenant upgrade. This type of error would only show up for namespace-enabled accounts on the latest upgraded tenant. The mitigation for this issue is to turn off the configuration on the specific tenant, and we have kicked off the super sonic configuration rollout for all the tenants. We have since added additional Storage upgrade validation for ADLS Gen 2 to help cover this type of scenario.
I had the same problem on one file today. Downloading the file, deleting it from storage and putting it back solved the problem.
Tried renaming the file -> didn't work.
Edit: we are now seeing it on more files, seemingly at random.
We worked around the problem by copying the entire folder to a new folder and renaming it back to the original name. Jobs run without problems again.
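A rough sketch of that workaround in a notebook cell (paths are placeholders):
# Sketch of the copy-and-rename workaround; all paths are placeholders.
src = "dbfs:/mnt/<storageaccount>/<filesystem>/<folder>"
tmp = "dbfs:/mnt/<storageaccount>/<filesystem>/<folder>_copy"

dbutils.fs.cp(src, tmp, recurse=True)   # copy the affected folder
dbutils.fs.rm(src, recurse=True)        # remove the original
dbutils.fs.mv(tmp, src, recurse=True)   # move the copy back to the original name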
Still the question remains, why did the files end up in this situation?
Same issue here. After some research, it seems it was probably an If-Match ETag condition failure in the HTTP GET request. Microsoft talks about how they return error 412 when this happens in this post: https://azure.microsoft.com/de-de/blog/managing-concurrency-in-microsoft-azure-storage-2/
Regardless, Databricks seem to have resolved the issue on their end now.

internal server error connecting to Aurora

In VS 2017 I created an AWS Serverless Application (.NET Core - C#). I have an RDS (Aurora) instance with data in it.
I added MySql.Data to the project using NuGet.
Created a new controller to get the data out of the DB.
Created a method and model to Get data.
Built the project and ran it locally in VS.
I was able to use Postman to Get data from the API. GREAT!
Right-clicked the project and selected Publish to AWS Lambda. Everything published successfully and I got the new URL.
However, when calling url/api/method, I get a 500 response. I tried another controller that just returns values with no DB queries, and that works. Any ideas?
The first thing you should do is check the CloudWatch logs for your function to find the source of the error (a 500 indicates an Internal Server Error, i.e. your code is throwing an exception). Add logging to your function as needed if you don't get anything useful there.
Access control is one likely candidate: is your database accessible from your Lambda function, and does the deployed function correctly receive the database credentials?
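As a starting point, recent log events for the function can be pulled with a short boto3 snippet like this (the log group name is a placeholder, based on the usual /aws/lambda/<function-name> convention):
# Sketch: fetch recent CloudWatch log events for a Lambda function.
# The log group name is a placeholder.
import boto3

logs = boto3.client("logs")
response = logs.filter_log_events(
    logGroupName="/aws/lambda/<function-name>",
    limit=50,
)
for event in response["events"]:
    print(event["timestamp"], event["message"])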

Connecting IntelliJ Idea Servers to GitLab.com: what info is actually needed?

I'm trying to configure IntelliJ IDEA 2017.1.2 in order to get the tasks from a private repository on GitLab.com.
To do that I have to create the corresponding entry in the Servers window.
Now, I don't have the faintest idea about how I should fill the Servers form in IDEA.
What URL do I have to use for Server URL?
What token should I use?
Any advice? Thanks in advance.
UPDATE: Based on the information mentioned in the issue IDEA-193736, the connectivity problem with the new GitLab Issues API (V4) should be fixed when the update 2018.2 is released.
The https://gitlab.com URL didn't work for me as the API URL was updated to V4 on GitLab. So, after some trial and error I was able to make it work by completing the following steps:
Create a Personal Access Token on GitLab (https://gitlab.com/profile/personal_access_tokens) with API and read_user access permissions
In IntelliJ (or Pycharm in my case), the Server URL should be https://gitlab.com/api/v4/issues? (with the question mark at the end)
The token is the Personal Access Token that was generated previously
Also, don't forget to increase the connection timeout to 15000 milliseconds under the Tasks section in the Settings (Settings => Tools => Tasks).
Task Server Screenshot
Hope it helps someone else.
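To rule out token problems before touching the IDE settings, the same v4 API can be hit directly; a minimal sketch with Python requests (the token value is a placeholder):
# Sketch: verify a GitLab personal access token against the v4 issues endpoint.
# The token value is a placeholder.
import requests

resp = requests.get(
    "https://gitlab.com/api/v4/issues",
    headers={"PRIVATE-TOKEN": "<personal-access-token>"},
)
print(resp.status_code)   # 200 means the token and URL are fine
print(resp.json()[:1])    # first issue, if any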
[EDIT] This answer was valid in 2017, when it was written. For an up-to-date answer, please see the other answers in the thread.
So, here's how to do it.
First of all, go to GitLab.
Log in with your credentials and generate a personal access token.
Then configure IntelliJ IDEA with the server URL and that token.
You can now check all your GitLab issues directly in IDEA.

Google Cloud Dataflow error: "The Application Default Credentials are not available"

We have a Google Cloud Dataflow job, which writes to Bigtable (via HBase API). Unfortunately, it fails due to:
java.io.IOException: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
    at com.google.bigtable.repackaged.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:74)
    at com.google.bigtable.repackaged.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:54)
    at com.google.bigtable.repackaged.com.google.cloud.config.CredentialFactory.getApplicationDefaultCredential(CredentialFactory.java:181)
    at com.google.bigtable.repackaged.com.google.cloud.config.CredentialFactory.getCredentials(CredentialFactory.java:100)
    at com.google.bigtable.repackaged.com.google.cloud.grpc.io.CredentialInterceptorCache.getCredentialsInterceptor(CredentialInterceptorCache.java:85)
    at com.google.bigtable.repackaged.com.google.cloud.grpc.BigtableSession.<init>(BigtableSession.java:257)
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:91)
    at com.google.cloud.bigtable.hbase1_0.BigtableConnection.<init>(BigtableConnection.java:33)
    at com.google.cloud.bigtable.dataflow.CloudBigtableConnectionPool$1.<init>(CloudBigtableConnectionPool.java:72)
    at com.google.cloud.bigtable.dataflow.CloudBigtableConnectionPool.createConnection(CloudBigtableConnectionPool.java:72)
    at com.google.cloud.bigtable.dataflow.CloudBigtableConnectionPool.getConnection(CloudBigtableConnectionPool.java:64)
    at com.google.cloud.bigtable.dataflow.CloudBigtableConnectionPool.getConnection(CloudBigtableConnectionPool.java:57)
    at com.google.cloud.bigtable.dataflow.AbstractCloudBigtableTableDoFn.getConnection(AbstractCloudBigtableTableDoFn.java:96)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableBufferedWriteFn.getBufferedMutator(CloudBigtableIO.java:836)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableBufferedWriteFn.processElement(CloudBigtableIO.java:861)
Which makes very little sense, because the job is already running on Cloud Dataflow service/VMs.
The Cloud Dataflow job id: 2016-05-13_11_11_57-8485496303848899541
We are using bigtable-hbase-dataflow version 0.3.0, and we want to use the HBase API.
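For reference, whether Application Default Credentials resolve in a given environment can be checked with a couple of lines like the following (this only tells you about the environment the snippet runs in, not about the Dataflow workers):
# Sketch: check whether Application Default Credentials resolve in this environment.
# Raises DefaultCredentialsError if no credentials are found.
import google.auth

credentials, project = google.auth.default()
print("ADC resolved for project:", project)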
I believe this is a known issue where GCE instances are very rarely unable to get the default credentials during startup.
We have been working on a fix that should be part of the next release (1.6.0), which should be coming soon. In the meantime we'd suggest re-submitting the job, which should work. If you run into problems consistently or want to discuss other workarounds (such as backporting the 1.6.0 fix), please reach out to us.
1.7.0 has been released, so this should be fixed now: https://cloud.google.com/dataflow/release-notes/release-notes-java

BigQuery | Error: Not Found: Project <project-id>

I'm getting the below error in the BigQuery Browser Tool.
Error: Not Found: Project
I have verified that the BigQuery API is turned on and billing is also enabled.
Please let me know the solution for this.
I have found that this error occurs when cookies are disabled in your browser. Let me know if this resolves the issue. You may also test via other methods (e.g. the API) to make sure that your account is actually enabled; see the BigQuery API documentation.
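A minimal sketch of such an API-side check using the Python client library (the project id is a placeholder):
# Sketch: confirm the project is visible to the BigQuery API by listing its datasets.
# "your-project-id" is a placeholder.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")
datasets = list(client.list_datasets())
print(f"Found {len(datasets)} dataset(s) in {client.project}")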
See BigQuery error in query operation: Project id not found.
It looks like there is an issue with projects that were created via App Engine. Let me know if that is the case for your project.