How to import or create a notebook in Synapse using the CLI - azure-synapse

How do I create notebooks in Synapse using GitHub via CLI commands?
I am unable to create the notebooks.
Please provide step-by-step instructions.

Here's an example:
az synapse notebook export --workspace-name syn-ggtest1 --name mynotebook --output-folder ./
az synapse notebook import --workspace-name syn-ggtest1 --name mynotebook --file @./mynotebook.ipynb
A few things to be aware of:
- The notebook will only be visible in Synapse Studio if you're in "live mode". If your workspace is connected to a Git repo, you need to import a notebook by adding the JSON file to the "notebook" folder in your repo.
- You have to use the .ipynb format (the format you get when you export via the CLI) to do the import/create, not the .json format (the format you get when you commit to Git).
- The notebook will be created at the root level in Synapse Studio (the "folder" property in the .ipynb is ignored).

How to import or create a notebook in Synapse using the CLI?
You can use Azure CLI commands to create and import notebooks.
Reference: az synapse notebook
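For example, to create (or import) a notebook from a local .ipynb file in one step, something like the following should work (a sketch reusing the workspace name from the question; the notebook name and Spark pool name are placeholders):
az synapse notebook create --workspace-name syn-ggtest1 --name mynotebook --file @./mynotebook.ipynb --spark-pool-name mysparkpool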

Related

How to run Google BigQuery locally outside GCP or Google Colab notebooks?

I am trying to run Google BigQuery in a Jupyter notebook on localhost on my PC, but it turns out it's not working, whereas it works fine in Google VMs on GCP and in Google Colab notebooks.
I've tried everything but nothing seems to work.
from google.cloud import bigquery
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-1035661e8528> in <module>
----> 1 from google.cloud import bigquery
ModuleNotFoundError: No module named 'google'
You can connect to BigQuery from an environment outside GCP.
You need to set up two things:
A BigQuery client library for your language of choice. Looking at the above code, it looks like you want to use Python. You can install the BigQuery Python client library by running:
pip install --upgrade google-cloud-bigquery
Authentication to BigQuery -
a. Get your GCP credentials by running the following command:
gcloud auth application-default login
This should create a credential JSON file at ~/.config/gcloud/application_default_credentials.json
b. You can set an environment variable pointing to the JSON creds file on the command line (use $HOME rather than ~, since ~ is not expanded inside double quotes):
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"
Or, you can set the above environment variable in your Python program by adding the following lines (os.path.expanduser resolves the ~ to an absolute path):
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser(
    '~/.config/gcloud/application_default_credentials.json')
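Once both are in place, a quick sanity check from the shell (a sketch; the exact version printed will vary) confirms the library is importable, which is what the ModuleNotFoundError above was complaining about:
python -c "from google.cloud import bigquery; print(bigquery.__version__)"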
Hope this helps.

How to read HDF5 file in Python/Pandas via SSH?

I'm accessing a remote machine via SSH (PuTTY). A dataset is stored in a directory on that machine, which I need to read with pandas in Python on my local computer. I am trying to use dataframe = pandas.read_hdf(path, key="data"), but I don't know which path to specify in my local Python code to point to the dataset on the remote machine, since it's not stored locally. As I mentioned, I am accessing the dataset using PuTTY.
What should the path look like?
I tried replacing C: with the host name followed by the path I use in PuTTY to access the file.
Thanks in advance.
I don't know what you mean precisely by read, but you can display the dataframe with the following:
SSH to your remote server
Navigate to the directory your dataframe is stored in:
cd /directory/of/dataframe
Launch a Python or IPython interpreter: python or ipython
Execute these Python commands:
>>> import pandas as pd
>>> dataframe = pd.read_hdf("hdf_file.h5", key="data")
# This should work because hdf_file.h5 is in the directory
# from which you launched the Python interpreter
Print your dataframe: print(dataframe)
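If you would rather read the file on your local machine instead, one option (a sketch; the user, host name, and remote path are placeholders) is to copy the file over first, e.g. with scp or PuTTY's pscp, and then point read_hdf at the local copy:
scp your_user@remote-host:/directory/of/dataframe/hdf_file.h5 .
python -c "import pandas as pd; print(pd.read_hdf('hdf_file.h5', key='data'))"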

nifi pyspark - "no module named boto3"

I'm trying to run a PySpark job I created that downloads and uploads data from S3 using the boto3 library. While the job runs fine in PyCharm, when I try to run it in NiFi using this template https://github.com/Teradata/kylo/blob/master/samples/templates/nifi-1.0/template-starter-pyspark.xml
the ExecutePySpark processor errors with "No module named boto3".
I made sure boto3 is installed in my active conda environment.
Any ideas? I'm sure I'm missing something obvious.
Here is a picture of the NiFi Spark processor.
Thanks,
tim
The Python environment that PySpark runs in is configured via the PYSPARK_PYTHON environment variable.
Go to the Spark installation directory
Go to conf
Edit spark-env.sh
Add this line, pointing at the python binary inside your conda environment:
export PYSPARK_PYTHON=PATH_TO_YOUR_CONDA_ENV/bin/python
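For example, spark-env.sh might end up containing a line like this (a sketch; the path is a placeholder for the python binary of whichever conda environment has boto3 installed, e.g. the one that works in PyCharm):
# $SPARK_HOME/conf/spark-env.sh
export PYSPARK_PYTHON=/opt/anaconda3/envs/my_pyspark_env/bin/python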

tensorboard logdir with s3 path

I see TensorFlow supports the AWS S3 file system (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/s3), but I am unable to use an S3 path with TensorBoard.
I tried the latest nightly (0.4.0rc3) but no luck. I also built locally and made sure Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: was set to YES, but I still don't see tensorboard --logdir=s3://bucket/path working at all.
Am I missing something here?
If you want to start TensorBoard with an AWS S3 log directory, do the following:
(1) Add these environment variables:
export AWS_ACCESS_KEY_ID=******
export AWS_SECRET_ACCESS_KEY=*******
export S3_ENDPOINT=******
export S3_VERIFY_SSL=0
export S3_USE_HTTPS=0
(2) Upgrade TensorFlow using pip:
pip install tensorflow==1.4.1
(3) You don't need to upgrade TensorBoard separately, because it is pulled in as a dependency in the previous step.
Then you can start TensorBoard with your command:
tensorboard --logdir=s3://bucket/path
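Putting the steps together, a full sketch might look like the following (the keys and bucket path are placeholders, and the S3_ENDPOINT value assumes a bucket in us-east-1; adjust it to your bucket's region):
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export S3_ENDPOINT=s3.us-east-1.amazonaws.com
export S3_VERIFY_SSL=0
export S3_USE_HTTPS=0
pip install tensorflow==1.4.1
tensorboard --logdir=s3://bucket/path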

Use BigQuery CLI to authenticate to two projects without bq init, like gsutil with a .boto file?

I am trying to get the bq CLI to work with multiple service accounts for different projects without having to re-authenticate using gcloud auth login or bq init.
An example of what I want to do, and am able to do using gsutil:
I have used gsutil with a .boto configuration file containing:
[Credentials]
gs_service_key_file = /path/to/key_file.json
[Boto]
https_validate_certificates = True
[GSUtil]
content_language = en
default_api_version = 2
default_project_id = my-project-id
[OAuth2]
on a GCE instance to run an arbitrary gsutil command as a service account. The service account does not need to be unique or globally defined on the GCE instance: as long as a service account is set up in my-project-id and a private key has been created, the private key file referenced in the .boto config will take care of authentication. For example, if I run
BOTO_CONFIG=/path/to/my/.boto_project_1
export BOTO_CONFIG
gsutil -m cp gs://mybucket/myobject .
I can copy from any project that I have a service account set up with, and for which I have the private key file defined in .boto_project_1. In this way, I can run a similar gsutil command for project_2 just by referencing the .boto_project_2 config file. No manual authentication needed.
The case with bq CLI
In the case of the BigQuery command-line tool, I want to reference a config file or pass a config option like a key file to run a bq load command, i.e. upload the same .csv file that is in GCS for various projects. I want to automate this without having to bq init each time.
I have read here that you can configure a .bigqueryrc file and pass in your credential and key files as options; however the answer is from 2012, references outdated bq credential files, and throws errors due to the openssl and pyopenssl installs that it mentions.
My question
Provide two example bq load commands with any necessary options/.bigqueryrc files to correctly load a .csv file from GCS into BigQuery for two distinct projects without needing to bq init/authenticate manually between the two commands. Assume the .csv file is already correctly in each project's GCS bucket.
Simply use gcloud auth activate-service-account and use the global --project flag.
https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account
https://cloud.google.com/sdk/gcloud/reference/
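For example, two load runs against two different projects might look like this (a sketch: the key file paths, project IDs, dataset/table names, and GCS URIs are placeholders, and --autodetect is used only to keep the example schema-free; note that bq spells its global project flag --project_id):
gcloud auth activate-service-account --key-file=/path/to/project1-key.json
bq --project_id=project-1 load --source_format=CSV --autodetect mydataset.mytable gs://project1-bucket/data.csv
gcloud auth activate-service-account --key-file=/path/to/project2-key.json
bq --project_id=project-2 load --source_format=CSV --autodetect mydataset.mytable gs://project2-bucket/data.csv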