I have three projects on GCP which play the role of three environments (dev, staging, prod). Each of them has a corresponding dataset on BigQuery, created as follows:
bq --location=${REGION} mk \
--dataset \
${DEVSHELL_PROJECT_ID}:mydataset
bq mk \
--table \
${DEVSHELL_PROJECT_ID}:mydataset.mytable \
schema.json
When executing that in the dev shell on GCP, I have my Dev project selected.
And, when I execute
bq ls
in the shell, I can see only this dataset there, which is expected.
After that, when I switch to another project and execute
bq ls
again, only one dataset is visible, and it is the one dedicated to the staging environment, for example. But when I open the Google BigQuery UI (using the staging project), I can see my Dev environment/project dataset.
I am wondering why that is, and whether it is normal and expected.
It is totally normal behavior. The Resources section contains a list of pinned projects. Expand a project to view datasets and tables that you have access to. You can manually pin/unpin your datasets in each project. A search box is available in the Resources section that allows you to search for resources by name or by label.
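A quick way to see the difference from the shell: bq ls with no flags lists only the datasets in the currently selected project, while the UI additionally shows any projects you have pinned. A minimal sketch, assuming placeholder project IDs:

bq ls                                    # datasets in the currently selected project
bq ls --project_id=my-staging-project    # datasets in another project you have access to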
Please refer to the official documentation. I hope it helps.
Related
I am looking for a CI/CD solution for Google BigQuery scripts.
The requirement is that I have a list of files with DDL scripts; I need to design a CI/CD solution that maintains versioning and deploys the scripts to Google BigQuery automatically or on a schedule.
Since you want to use version control to commit the schema, you can use the "CI for Data in BigQuery CLI utility" GitHub repository, which will help you orchestrate the process. For more information you can check this documentation. For implementing this, you can check this link.
Since you want CD as well, Cloud Build can be used with BigQuery, where you can use your own custom builders for your requirement. You can also configure notifications for both BigQuery and GitHub using Cloud Build.
As a product recommendation: for CI, use Cloud Source Repositories, and for CD, use Cloud Build.
There are multiple ways to deploy:
Option 1: here you specify an inline query in the Cloud Build steps. This does not pick up the latest version of your SQL; see option 2 for that.
Here, $PROJECT_ID and $_DATASET are dynamic substitution variables that you set at run time in Cloud Build; you can pass other values in the same way.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  id: 'create entry min day view'
  args:
    - query
    - --use_legacy_sql=false
    - "CREATE OR REPLACE TABLE $PROJECT_ID.$_DATASET.TABLENAME
       AS
       SELECT 1"
Option 2:
There is a post on this here.
In the last answer at the link above, you can use bash as the entrypoint and pass the bq arguments as args; a sketch follows below.
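As a minimal sketch (the file path sql/create_mytable.sql is just a hypothetical example), the bash entrypoint could simply feed the committed SQL file to bq, so the latest version in the repository is always the one that gets deployed:

# run the versioned DDL file from the repository
bq query --use_legacy_sql=false < sql/create_mytable.sql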
hope this helps.
How can I get the list of projects, repositories and teams created in Azure DevOps, as well as the list of administrators, contributors and pull request approvers?
I have seen the API mentioned in the Azure docs, but it provides the info in JSON format, which contains a lot of data and makes it really difficult to pull out the project and repository names.
How can I get that data into an Excel or Word document?
The best option looks like the REST APIs, which return JSON, but I am afraid there is no easy way to export that to Excel or Word.
For people who are searching for exactly which REST API endpoints to fetch them from, here are the details:
How to get git repositories?
You can follow these steps to see the count of repositories:
Login to Azure DevOps
Open one of the following links, substituting your organization and project name:
https://dev.azure.com/{ORGANIZATION_NAME}/{PROJECT_NAME}/_apis/git/repositories/
https://dev.azure.com/{ORGANIZATION_NAME}/{PROJECT_ID}/_apis/git/repositories/
Scroll to the end of the JSON response and you should see the count at the end of the page.
How to get projects?
Follow step 1 and then open the link below
https://dev.azure.com/{ORGANIZATION_NAME}/_apis/projects
How to get teams?
Follow step 1 and then open the link below
https://dev.azure.com/{ORGANIZATION_NAME}/_apis/teams
How to find the project Id?
In Chrome, open the developer tools, open Azure DevOps and navigate to your project; you should then see calls in the Network tab that include the project ID.
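If you prefer calling these endpoints from a script instead of the browser, a minimal sketch using curl with a personal access token (ORGANIZATION_NAME, PROJECT_NAME and MY_PAT are placeholders; jq is assumed to be installed) could look like this:

# count the repositories in a project
curl -s -u ":MY_PAT" \
  "https://dev.azure.com/ORGANIZATION_NAME/PROJECT_NAME/_apis/git/repositories?api-version=6.0" \
  | jq '.count'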
How to get the count of projects, repositories and teams created in Azure DevOps?
I am afraid there is no out-of-the-box method to get that data into an Excel or Word document at the moment. We cannot get the list of projects and repositories via queries and export it to an Excel or Word document.
To achieve this, you can follow the advice of Shayki Abramczyk and use the REST API.
After getting the response in JSON format, parse the JSON with PowerShell or other scripts, like:
PS Script Get All Team Projects Listed as HTML–TFS/VSTS
Besides, when we parse the JSON file, we can even export the data to a CSV file:
Ticket: How to export data to CSV in PowerShell?
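As an alternative to the PowerShell approach, assuming jq is available, a minimal sketch that flattens the project list into a CSV file (organization name and PAT are placeholders):

curl -s -u ":MY_PAT" "https://dev.azure.com/ORGANIZATION_NAME/_apis/projects?api-version=6.0" \
  | jq -r '.value[] | [.id, .name] | @csv' > projects.csv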
Hope this helps.
List all repos and count the lines, assuming you have access to them.
# get repo details
$repo_list = az devops project list --query 'value[].name' -o tsv | % { az repos list --project $_ -o tsv }
# count the output lines
$repo_count = $repo_list | wc -l
Note: this was run in PowerShell on a computer with Git Bash installed; wc is not normally a PowerShell command.
The best way to retrieve the list of projects and teams is using the OData feed. One can use Excel or Power BI to retrieve the data. Here is how to retrieve the list of teams and projects from Azure DevOps: https://learn.microsoft.com/en-us/azure/devops/report/powerbi/access-analytics-power-bi?view=azure-devops
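For reference, a minimal sketch of querying the Analytics OData feed directly from the shell (the endpoint shape and API version are assumptions based on the linked documentation; organization name and PAT are placeholders, and jq is assumed to be installed):

# list project names from the Analytics OData feed
curl -s -u ":MY_PAT" \
  'https://analytics.dev.azure.com/ORGANIZATION_NAME/_odata/v3.0-preview/Projects?$select=ProjectName' \
  | jq -r '.value[].ProjectName'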
I created several models with Django 1.8.
Now, to help other people understand them quickly, I would like to create an SQL schema from the models or from the migration files (even just from the initial migration file).
Does anyone know how to do this?
You can squash all migrations for a moment, or delete them temporarily and generate a new initial one, and then run this command:
https://docs.djangoproject.com/en/1.11/ref/django-admin/#django-admin-sqlmigrate
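For example, assuming your app is called myapp and its first migration is 0001_initial, this prints the SQL that the migration would run:

python manage.py sqlmigrate myapp 0001_initial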
If you just want to show the database structure to others, I would rather recommend using the graph_models command from django_extensions:
http://django-extensions.readthedocs.io/en/latest/graph_models.html
For example, typing
python manage.py graph_models -a -g -o models.png
creates a graph with the individual models as nodes and their relations as arcs (assuming you have Graphviz installed). You can also create a dot file and render it however you like, as shown below.
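For instance, to produce a dot file and render it yourself with Graphviz (the file names here are just examples):

python manage.py graph_models -a > models.dot
dot -Tsvg models.dot -o models.svg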
So I'm trying to set up a script that pipes data via an API into BigQuery.
It's all being done on the command line, and I've already successfully set up the framework behind it, specifically the schema.json file.
When I run the following, it successfully uploads:
bq load --source_format=NEWLINE_DELIMITED_JSON --max_bad_records=10 program_users gs://internal/program_user.json program_users_schema.json
As I said, this successfully pipes into BQ, which is great. But the problem is that this API only allows a maximum of 50 records at a time, while there are over 1,000.
EDIT: The initial call to retrieve the records looks like this:
$ curl -s https://api.programs.io/users/57f263fikgi33d8ea7ff4 -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyJ24iYzA0XzU0NDg3MjE5ZWJkZD=': -H 'Accept:application/json'
Would anyone have a solution to this, particularly one that can be done on the command line?
Is there an easy way to directly download all the data contained in a certain dataset on Google BigQuery? I'm currently downloading it "as CSV", making one query after another, but that doesn't allow me to get more than 15k rows, and the rows I need to download number over 5M.
Thank you
You can run BigQuery extraction jobs using the Web UI, the command line tool, or the BigQuery API. The data can be extracted to a Google Cloud Storage bucket.
For example, using the command line tool:
First install and auth using these instructions:
https://developers.google.com/bigquery/bq-command-line-tool-quickstart
Then make sure you have an available Google Cloud Storage bucket (see Google Cloud Console for this purpose).
Then, run the following command:
bq extract my_dataset.my_table gs://mybucket/myfilename.csv
More on extracting data via API here:
https://developers.google.com/bigquery/exporting-data-from-bigquery
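The extract command can also compress the output and, via a wildcard in the destination URI, split a large table across multiple files; for example (table and bucket names are placeholders):

bq extract --compression=GZIP my_dataset.my_table 'gs://mybucket/myfilename_*.csv.gz'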
Detailed step-by-step to download large query output
enable billing
You have to give your credit card number to Google to export the output, and you might have to pay.
But the free quota (1TB of processed data) should suffice for many hobby projects.
create a project
associate billing to a project
do your query
create a new dataset
click "Show options" and enable "Allow Large Results" if the output is very large
export the query result to a table in the dataset
create a bucket on Cloud Storage.
export the table to the created bucket on Cloud Storage.
make sure to click GZIP compression
use a name like <bucket>/prefix.gz.
If the output is very large, the file name must have an asterisk * and the output will be split into multiple files.
download the table from cloud storage to your computer.
It does not seem possible to download multiple files from the web interface if the large file got split up, but you could install gsutil and run:
gsutil -m cp -r 'gs://<bucket>/prefix_*' .
See also: Download files and folders from Google Storage bucket to a local folder
There is a gsutil in Ubuntu 16.04 but it is an unrelated package.
You must install and setup as documented at: https://cloud.google.com/storage/docs/gsutil
unzip locally:
for f in *.gz; do gunzip "$f"; done
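If you prefer the command line to the web UI, a rough equivalent of the query and export steps above (dataset, table and bucket names are placeholders; the query uses legacy SQL to match the "Allow Large Results" option):

bq query --allow_large_results --destination_table=my_dataset.my_result "SELECT * FROM [my_dataset.my_table]"
bq extract --compression=GZIP my_dataset.my_result 'gs://<bucket>/prefix_*.gz'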
Here is a sample project I needed this for which motivated this answer.
For Python, you can use the following code; it will download the data as a pandas DataFrame.
from google.cloud import bigquery

def read_from_bqtable(bq_projectname, bq_query):
    client = bigquery.Client(bq_projectname)
    bq_data = client.query(bq_query).to_dataframe()
    return bq_data  # return the dataframe

bigQueryTableData_df = read_from_bqtable('gcp-project-id', 'SELECT * FROM `gcp-project-id.dataset-name.table-name`')
Yes, the steps suggested by Michael Manoochehri are the correct and easy way to export data from Google BigQuery.
I have written a bash script so that you are not required to do these steps every time; just use my bash script.
Below is the GitHub URL:
https://github.com/rajnish4dba/GoogleBigQuery_Scripts
Scope:
1. Export data based on your BigQuery SQL.
2. Export data based on your table name.
3. Transfer your export file to an SFTP server.
Try it and let me know your feedback.
For help, use ExportDataFromBigQuery.sh -h