How to connect to a VPC project - google-bigquery

I am new to BigQuery and I'm trying to understand how VPC access works for BigQuery projects. I have a BigQuery project that imports data from several other BigQuery projects (not in a VPC, but in the same organisation). I also need to connect to a project that is inside a VPC network (also in the same organisation).
The only way that I can read this VPC project is to
Be a G Suite member
Connect to the organisation VPN
Open the Cloud Console through a proxy
I can only read the project and write queries if I'm in the VPC project itself.
I want to be able to read the VPC project and write queries against it from my own project.
I want to be able to schedule daily imports of aggregated data from the VPC project into my project.
Will this be possible if I add my project to a service perimeter and get access through a perimeter bridge? What sort of access do I need to set up in order to read and import VPC project data directly in my project?

This page lists the BigQuery limitations when using VPC Service Controls. Basically, if you want to use a service account to access a BigQuery instance protected by a service perimeter, you must access it from within the perimeter.
VPC Service Controls does not support copying BigQuery resources protected by a service perimeter to another organization. Access levels do not enable you to copy across organizations.
To copy protected BigQuery resources to another organization, download the dataset (for example, as a CSV file), and then upload that file to the other organization.
The BigQuery Data Transfer Service is supported only for the following services:
Campaign Manager
Google Ad Manager
Google Ads
Google Cloud Storage
Google Merchant Center
Google Play
YouTube
The BigQuery Classic Web UI is not supported. A BigQuery instance protected by a service perimeter cannot be accessed with the BigQuery Classic Web UI.
The third-party ODBC driver for BigQuery cannot currently be used with the restricted VIP.
BigQuery audit log records do not always include all resources that were used when a request is made, due to the service internally processing access to multiple resources.
When using a service account to access a BigQuery instance protected by a service perimeter, the BigQuery job must be run within a project inside the perimeter. By default, the BigQuery client libraries will run jobs within the service account or user's project, causing the query to be rejected by VPC Service Controls.
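The practical consequence of that last limitation is that the project a job runs in and the project that owns the data are two separate choices. The sketch below, a minimal illustration using only the Python standard library, builds a BigQuery REST jobs.insert request whose job is created in a project inside the perimeter while the query reads a table owned by the VPC project. The project, dataset and table names are hypothetical, and authentication and the actual HTTP call are deliberately left out.

```python
import json

BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_job_request(job_project: str, sql: str) -> tuple[str, str]:
    """Return (url, json_body) for a BigQuery jobs.insert query job.

    The job is created in `job_project`. Per the limitation quoted above,
    that project must be inside the service perimeter; the table named in
    `sql` can belong to a different project (here, the VPC project).
    """
    url = f"{BIGQUERY_API}/projects/{job_project}/jobs"
    body = {
        "jobReference": {"projectId": job_project},
        "configuration": {
            "query": {
                "query": sql,
                # Standard SQL, so `project.dataset.table` names can be used.
                "useLegacySql": False,
            }
        },
    }
    return url, json.dumps(body)


# Hypothetical names for illustration only.
url, body = build_query_job_request(
    "my-perimeter-project",
    "SELECT day, SUM(total) AS total "
    "FROM `vpc-project.reporting.daily_agg` GROUP BY day",
)
print(url)
```

With the google-cloud-bigquery Python client, the same choice is made by passing project= when constructing bigquery.Client; relying on the default project from the credentials is what can lead to the rejection described above.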

Related

How to deploy a Managed Private Endpoint via ARM template?

I have a Synapse Workspace that is deployed via ARM (via Ev2). I have manually created a Managed Private Endpoint to a Private Link Service, and I need to be able to deploy this connection along with the workspace. When I look at the JSON for the workspace (or at the output of az synapse workspace show), I don’t see the endpoint listed, so I am not sure where to start looking. I haven't found much information online either.

How can I disable user access to k8s cluster?

I have a question about granting access to a k8s cluster. For example, a new member joined our team. He created a CertificateSigningRequest and I approved it. Then I created a kubeconfig and gave it to him so he could access our cluster. If he leaves our team one day, how can I remove his access? I want to make sure he can no longer access our cluster with this kubeconfig.
IMHO you should use an external authentication provider. You can take a look at https://dexidp.io/docs/kubernetes/ which is an abstraction layer over other IDaaS providers like Azure, Google, GitHub and many more. For example, if your company uses Active Directory, you can control access to the cluster using group memberships, so withdrawing access becomes part of the company-wide leaver process.
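To illustrate why group-based access is easier to withdraw: with OIDC authentication, the Kubernetes API server can read group names from a token claim (configured with its --oidc-groups-claim flag), and RBAC bindings can then target those groups. The Python sketch below mimics only the group check. The group name k8s-devs is hypothetical, the tokens are unsigned demo tokens, and a real API server also verifies the signature, issuer, audience and expiry before trusting any claim.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT.

    Illustration only: never trust claims without verifying the signature.
    """
    payload_b64 = token.split(".")[1]
    padding = "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def is_allowed(token: str, allowed_group: str, groups_claim: str = "groups") -> bool:
    """Allow access only if the token's groups claim contains allowed_group."""
    claims = decode_jwt_payload(token)
    return allowed_group in claims.get(groups_claim, [])


def make_unsigned_token(claims: dict) -> str:
    """Build a fake header.payload.signature token for the demo below."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.sig"


# Once the user is removed from the group in the identity provider, newly
# issued tokens no longer carry it, so the check fails for those tokens.
before = make_unsigned_token({"sub": "alice", "groups": ["k8s-devs"]})
after = make_unsigned_token({"sub": "alice", "groups": []})
print(is_allowed(before, "k8s-devs"), is_allowed(after, "k8s-devs"))
```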

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with Docker when running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we run the Kubernetes instance that houses the containers) and a Key Vault in Azure, and to access both via the well-defined APIs that Microsoft provides for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not explained how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key store). I'm very new to Kubernetes and have no real idea how to accomplish this, and my searches so far have not turned up anything useful.
Is there an extension method that should be used for Persistent Volumes? Would a Persistent Volume just act like a shared file location, accessible with the PersistKeysToFileSystem API for Data Protection? Any resources you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the same exact functionality as Key Vault in Azure.
PersistentVolume:
Stored locally on a volume mounted on a server
Volume can be encrypted
Volume moves with the pod.
If the pod starts on a different server, the volume moves.
Accessing volume from other pods is not that easy.
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider)
Key Vault:
Stores keys in a centralized location managed by Azure
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit by going to an external service
I assume this is not a major problem in Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to the service.
Less maintenance time, since it's already maintained by Azure.
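On the question of whether a Persistent Volume would behave like a shared file location for PersistKeysToFileSystem: conceptually yes, but only if every replica can mount the same volume at the same time (in Kubernetes terms, a volume that supports the ReadWriteMany access mode), and as noted above, sharing a volume across pods is not always easy. The Python sketch below is a toy model of such a shared-directory key ring, not the ASP.NET Core Data Protection implementation. The directory name, file layout and key format are assumptions, and it has no key rotation, expiry or encryption at rest.

```python
import os
import secrets
import tempfile
from pathlib import Path


class FileKeyRing:
    """Toy key ring: one file per key in a directory shared by all replicas."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def keys(self) -> dict[str, bytes]:
        # Every instance reads the same files, so every instance sees the same keys.
        return {p.stem: p.read_bytes() for p in sorted(self.directory.glob("*.key"))}

    def get_or_create_key(self) -> tuple[str, bytes]:
        existing = self.keys()
        if existing:
            key_id = min(existing)  # deterministic choice, same on every replica
            return key_id, existing[key_id]
        key_id = secrets.token_hex(8)
        key = secrets.token_bytes(32)
        tmp = self.directory / f"{key_id}.tmp"
        tmp.write_bytes(key)
        # Write-then-rename so other replicas are unlikely to see a partial file.
        os.replace(tmp, self.directory / f"{key_id}.key")
        return key_id, key


# A temporary directory stands in for the hypothetical PersistentVolume mount path.
shared = Path(tempfile.mkdtemp()) / "dataprotection-keys"
pod_a = FileKeyRing(shared)
pod_b = FileKeyRing(shared)
id_a, key_a = pod_a.get_or_create_key()
id_b, key_b = pod_b.get_or_create_key()
print(id_a == id_b, key_a == key_b)
```

The second "pod" finds the key the first one wrote instead of generating its own, which is the property multiple instances need in order to read each other's protected data.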

Monitoring Amazon S3 logs with Splunk?

We have a large extended network of users that we track using badges. The total traffic is around 60 million impressions a month. We are currently considering switching from a fairly slow, database-based logging solution (custom-built in PHP, and messy) to a simple log-based alternative that relies on Amazon S3 logs and Splunk.
After using Splunk for some other analysis tasks, I really like it. But it's not clear how to set up a source like S3 with the system. It seems that remote sources require the Universal Forwarder to be installed, which is not an option there.
Any ideas on this?
Very late answer, but I was looking for the same thing and found a Splunk app that does what you want: http://apps.splunk.com/app/1137/. I haven't tried it yet, though.
I would suggest logging preprocessed JSON data to a DocumentDB database, for example using Azure queues or similar service bus messaging technologies that fit your scenario, in combination with Azure DocumentDB.
So I would keep your database-based approach and modify it to use a schemaless, easy-to-scale document database.
I use http://www.insight4storage.com/ from the AWS Marketplace to track my AWS S3 storage usage totals by prefix, bucket or storage class over time; it also shows me the storage used by previous versions, by prefix and per bucket. In addition to its UI and web service API, it has a setting to save the S3 data as Splunk-format logs, which might work for your use case.
You can use the Splunk Add-on for AWS.
This is my understanding of the setup:
Create a Splunk instance. Use the website version, or use the on-premise AMI of Splunk to create an EC2 instance where Splunk runs.
Install the Splunk Add-on for AWS on the EC2 instance.
Based on the input log type (e.g. CloudTrail logs, Config logs, generic logs), configure the add-on and supply parameters such as the AWS account ID or IAM role.
The add-on automatically polls the S3 source and fetches the latest logs at a specified interval (30 seconds by default).
For a generic use case (like ours), you can try configuring the Generic S3 input.
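The polling behaviour in the last steps (on each interval, fetch only the logs that appeared since the previous run) can be sketched with a checkpoint. This is a toy Python model, not the add-on's actual implementation: list_objects is a hypothetical callable standing in for an S3 listing, and its dicts only mirror the Key and LastModified fields that S3 object listings return.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class S3Poller:
    """Toy poller that returns only objects not returned by earlier polls."""

    # Hypothetical callable standing in for an S3 listing call.
    list_objects: Callable[[], Iterable[dict]]
    # Newest LastModified ingested so far.
    checkpoint: datetime = datetime.min.replace(tzinfo=timezone.utc)
    # Keys already ingested that share the checkpoint timestamp.
    seen_at_checkpoint: set = field(default_factory=set)

    def _is_new(self, obj: dict) -> bool:
        ts = obj["LastModified"]
        return ts > self.checkpoint or (
            ts == self.checkpoint and obj["Key"] not in self.seen_at_checkpoint
        )

    def poll(self) -> list[str]:
        new = sorted(
            (o for o in self.list_objects() if self._is_new(o)),
            key=lambda o: (o["LastModified"], o["Key"]),
        )
        # Advance the checkpoint past everything returned in this poll.
        for obj in new:
            if obj["LastModified"] > self.checkpoint:
                self.checkpoint = obj["LastModified"]
                self.seen_at_checkpoint = set()
            self.seen_at_checkpoint.add(obj["Key"])
        return [obj["Key"] for obj in new]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
bucket = [{"Key": "logs/a.log", "LastModified": t0}]
poller = S3Poller(lambda: list(bucket))
print(poller.poll())  # ['logs/a.log']
bucket.append({"Key": "logs/b.log", "LastModified": t0.replace(minute=1)})
print(poller.poll())  # ['logs/b.log']
print(poller.poll())  # []
```

A scheduler would call poll() once per interval and hand the returned keys to the indexer; tracking keys that share the checkpoint timestamp avoids both skipping and re-reading objects written in the same second.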

Web Service - Efficient way to transfer file to cloud platform storage

I have a service that requires some input files from the client. This service is run from a number of different cloud platforms (Amazon, Azure, etc.). Further, it requires that these input files be stored in their respective cloud platform's persistent storage. What is the best way to transfer those input files from the client to the service without requiring that the client knows about the specific cloud platform's details (e.g. secret keys)?
Given these requirements, the simple approach of the client transferring these files directly to the platform's storage (e.g. S3) isn't possible.
I could have the client call a service method and pass it the file blob to upload to the service, but that would require the service to receive the file and then do its own file transfer to the cloud platform's storage (in other words, do 2 file transfers for 1 file).
So, the question is, what is the best way to transfer a file to storage given that we don't know what cloud platform the service is running on?
In this setting, your proposed proxy solution seems to be the only option. Although the double transfer looks bad, in practice it is not that costly, because the cloud storage is commonly in the same datacenter as the compute nodes and is therefore very fast to access.
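A minimal sketch of that proxy approach, assuming the service picks its storage backend from its own configuration so the client never sees platform details or keys. BlobStore, InMemoryStore, LocalDirStore and handle_upload are hypothetical names; a real deployment would add one subclass per cloud that wraps that provider's SDK. The upload is copied in chunks with shutil.copyfileobj, so a backend that writes to a stream does not need to hold the whole file in memory.

```python
import io
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # copy uploads in 64 KiB chunks


class BlobStore(ABC):
    """Platform-neutral storage interface; one subclass per cloud (hypothetical)."""

    @abstractmethod
    def put(self, name: str, stream: BinaryIO) -> int:
        """Store the stream under `name` and return the number of bytes written."""


class InMemoryStore(BlobStore):
    """Demo backend that keeps blobs in a dict."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, name: str, stream: BinaryIO) -> int:
        buf = io.BytesIO()
        shutil.copyfileobj(stream, buf, CHUNK_SIZE)
        self.blobs[name] = buf.getvalue()
        return len(self.blobs[name])


class LocalDirStore(BlobStore):
    """Demo backend that streams to a directory, e.g. a mounted disk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, stream: BinaryIO) -> int:
        # No validation of `name` here; a real service must sanitize it.
        path = self.root / name
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out, CHUNK_SIZE)
        return path.stat().st_size


def handle_upload(body: BinaryIO, name: str, store: BlobStore) -> int:
    """Service-side handler: the client only sends bytes to the service;
    which cloud storage receives them is decided by `store`."""
    return store.put(name, body)


# The service would choose the backend from its own configuration.
disk = LocalDirStore(Path(tempfile.mkdtemp()))
written = handle_upload(io.BytesIO(b"input data"), "input.txt", disk)
print(written)
```

Because handle_upload only depends on the BlobStore interface, the same client code works whichever platform the service is deployed on.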