De-identify data in BigQuery with DLP - google-bigquery

I would like to de-identify PII data that is already in BigQuery using Google DLP, and store the result in another BigQuery table. Is that possible, and if so, how?

Currently the main recommendation is to use Dataflow.
https://github.com/GoogleCloudPlatform/dlp-dataflow-deidentification

The different methods for de-identifying sensitive data in DLP are available through the API. For example, we can use replaceConfig to replace
My email address is astacko@example.com.
with
My email address is [email-address].
by using an API request like this:
"deidentifyConfig":{
  "infoTypeTransformations":{
    "transformations":[
      {
        "infoTypes":[
          {
            "name":"EMAIL_ADDRESS"
          }
        ],
        "primitiveTransformation":{
          "replaceConfig":{
            "newValue":{
              "stringValue":"[email-address]"
            }
          }
        }
      }
    ]
  }
}
So, for your use case you would need to integrate the de-identification API into a flow that reads from BigQuery, performs the de-identifying transformations, and writes back to BigQuery.
"Cloud DLP in action" is a Google post that covers this and points to Dataflow for this use case. Please refer to the reference architecture to get an idea of how this can work; there you will find some example Java classes. You can modify them as needed so that the output is ingested into BigQuery.
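As a rough illustration of the middle step of such a flow, here is a minimal sketch that builds the JSON body for the DLP content:deidentify REST method from rows that were already read from BigQuery. It wraps the rows in the item.table shape and reuses the replaceConfig transformation shown earlier. The helper names, the column names, and the sample row are hypothetical, every cell is sent as a string for simplicity, and the BigQuery read/write, authentication, and batching are left out.

```python
import json


def rows_to_dlp_table(rows):
    """Convert a list of dict rows into the DLP `table` content item shape."""
    headers = list(rows[0].keys()) if rows else []
    return {
        "headers": [{"name": h} for h in headers],
        "rows": [
            # Assumption: every cell is sent as a string value.
            {"values": [{"stringValue": str(row[h])} for h in headers]}
            for row in rows
        ],
    }


def build_deidentify_request(rows, info_type="EMAIL_ADDRESS",
                             new_value="[email-address]"):
    """Build a request body for content:deidentify (to be POSTed as JSON)."""
    return {
        "inspectConfig": {"infoTypes": [{"name": info_type}]},
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": [{"name": info_type}],
                        "primitiveTransformation": {
                            "replaceConfig": {
                                "newValue": {"stringValue": new_value}
                            }
                        },
                    }
                ]
            }
        },
        "item": {"table": rows_to_dlp_table(rows)},
    }


# Hypothetical row, as it might come back from a BigQuery query.
sample_rows = [{"id": 1, "note": "My email address is someone@example.com."}]
request_body = build_deidentify_request(sample_rows)
print(json.dumps(request_body, indent=2))
```

The response to such a request should contain a table of the same shape with the matched values replaced, which can then be loaded into the destination BigQuery table.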

As a quick workaround, I would consider moving the tables with PII into a dataset with restricted access. Then, in a new dataset, create a view that does not include the sensitive columns. Give users query access to only the dataset with the view, and not the private dataset.
https://cloud.google.com/bigquery/docs/share-access-views
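A minimal sketch of the view part of this workaround, assuming the sensitive columns are known by name. It only builds the CREATE VIEW statement using BigQuery's SELECT * EXCEPT syntax; the project, dataset, table, and column names are hypothetical, and running the statement and granting access to the view's dataset are separate steps.

```python
def build_view_ddl(view, source_table, hidden_columns):
    """Return a CREATE VIEW statement exposing every column except the hidden ones."""
    if not hidden_columns:
        raise ValueError("hidden_columns must not be empty")
    excluded = ", ".join(f"`{column}`" for column in hidden_columns)
    return (
        f"CREATE VIEW `{view}` AS\n"
        f"SELECT * EXCEPT ({excluded})\n"
        f"FROM `{source_table}`"
    )


# Hypothetical names: a private dataset with PII and a shared dataset for the view.
ddl = build_view_ddl(
    "my-project.shared.customers_view",
    "my-project.private.customers",
    ["email", "phone"],
)
print(ddl)
```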

This feature is currently in preview (October 2022). Talk to your Google Cloud sales rep to see if it can be enabled for your project.

Related

Does accessing Firebase Database via the Firebase Console add to usage?

I was wondering whether accessing data directly via the Firebase console (for example, viewing or modifying an item in the Realtime Database or Firestore) adds to usage (data download, Firestore reads, etc.).
I googled it but didn't find an answer, so I'm asking here.
I also wanted to know if there is a way to give write access to a particular child to 3-4 specific emails (Google authentication). I understand we can allow writes by the user who created the data using the rules below. But in my case, I want a few other users to be able to write to the child as well, but not everyone (so I cannot use ".write": "auth != null").
{
  "rules": {
    "users": {
      "$uid": {
        ".write": "$uid === auth.uid"
      }
    }
  }
}
Thanks in advance.
Yes, it does charge you. The documentation says,
Firebase console data: Although this isn't usually a significant portion of Realtime Database costs, Firebase charges for data that you read and write from the Firebase console.
Similarly, you are charged for any reads and writes from the Firestore console. Also, you need to be the project owner (or have the editor/viewer role) to access the console. You'll be charged for the amount of data loaded or written. For example, if the console loads 500 documents, you'll be charged for 500 reads.
For the data retrieved from client side, "Firebase charges for the downloaded data. Typically, this makes up the bulk of your bandwidth costs, but it isn't the only factor in your bill." Make sure you check all the factors in the documentation above.
To allow specific users to read or write to your database, you can try using custom claims. You can refer to this answer for a detailed explanation on security rules with custom claims.
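A minimal sketch of what such rules could look like, assuming a custom claim named editor (a hypothetical name) has been set on the 3-4 allowed accounts from a trusted server, for example with the Admin SDK. The script only generates the rules JSON, extending the rule from the question so that either the owner or a user carrying the claim can write.

```python
import json


def build_rules(claim="editor"):
    """Build Realtime Database rules that let the owner or a claimed user write."""
    # Assumption: `claim` is a boolean custom claim set server-side.
    write_rule = (
        f"auth != null && ($uid === auth.uid || auth.token.{claim} === true)"
    )
    return {"rules": {"users": {"$uid": {".write": write_rule}}}}


rules = build_rules()
print(json.dumps(rules, indent=2))
```

Custom claims are carried in the user's ID token, so a user may need to refresh the token (or sign in again) before a newly set claim is visible to the rules.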

How does Google Tag Manager handle campaign data

This may be a really silly question, but I'm so fed up I have to ask.
If an external link directs visitors to my website with campaign data, i.e.
utm_source=splodge&utm_medium=foobar&utm_campaign=santaclause
Do I need to capture this and pass it to Google Analytics through Google Tag Manager?
Unless you deliberately override either the "location" field or the campaign-related fields in your Google Analytics tag configuration, GTM will simply pass the values through to GA.
So while (to be pedantic) GTM itself does not "handle data" at all, it also does not interfere with the way configured tracking tools like GA handle data. No special action is necessary.
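To see what is being passed through, note that the campaign values are ordinary query parameters on the landing-page URL. A minimal sketch that extracts them is shown here; the example domain and the function name are hypothetical, and this only illustrates the data involved, not what GTM or GA run internally.

```python
from urllib.parse import parse_qs, urlparse


def campaign_params(url):
    """Return the utm_* query parameters found in a landing-page URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value of each utm_* key.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


landing_url = (
    "https://www.example.com/"
    "?utm_source=splodge&utm_medium=foobar&utm_campaign=santaclause"
)
params = campaign_params(landing_url)
print(params)
```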

Pagemap data and Google Structured Data Validator tool

I'm trying to get pagemap data back when I call the Google Site Search API, but it is not currently present in the response.
The sample response here leads me to believe it is possible (there is a pagemap field):
https://developers.google.com/custom-search/json-api/v1/reference/cse/list?hl=en
Can anyone confirm that the structured data tool reads PageMap data? I've tried using it to verify that my pagemap data is correct, but it finds nothing.
https://developers.google.com/structured-data/testing-tool/
If it doesn't, does the Site Search API perhaps only return pagemap attributes at the paid level?
Answering the question below:
If it doesn't, does the Site search API only return pagemap attributes maybe at the paid level?
Yes. It gives you a pagemap object, which is itself subdivided into other data objects. For instance, book:isbn was provided as Schema.org data, whereas the og: properties were provided as Open Graph data.
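A minimal sketch of reading that object from an already-decoded JSON response, assuming each entry in items carries a link and, when available, a pagemap. The sample response is made up for illustration and the function name is hypothetical.

```python
def collect_pagemaps(response):
    """Map each result link to its pagemap (an empty dict when none was returned)."""
    return {
        item["link"]: item.get("pagemap", {})
        for item in response.get("items", [])
    }


# Hypothetical response fragment; real responses contain many more fields.
sample_response = {
    "items": [
        {
            "link": "https://example.com/a",
            "pagemap": {"metatags": [{"og:title": "Page A"}]},
        },
        {"link": "https://example.com/b"},
    ]
}

pagemaps = collect_pagemaps(sample_response)
for link, pagemap in pagemaps.items():
    print(link, sorted(pagemap.keys()))
```

Using item.get with a default keeps the loop from failing on results that have no pagemap at all, which is the situation described in the question.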

How can I obtain an API-key for my Fusion Tables

Yep, newbie question here, but it has been bothering me for some days now. I've been trying to read all the docs on the Google developer site, but I'm going in circles.
I've created a Fusion Table, set the access to 'public', and got an ID.
According to Google I need an API key to access the data through a REST call. Google suggests:
Go to the Google Developers Console.
Select a project, or create a new one.
In the sidebar on the left, expand APIs & auth. Next, click APIs. In the list of APIs, make sure the status is ON for the Fusion Tables API.
In the sidebar on the left, select Credentials.
I can do all that and I've got an API key, but how does this relate to the Fusion Table I've created? Can I use that API key for
This is really simple. The API key lets you run most SQL-style requests (SELECT, INSERT, UPDATE, DELETE) using GET, POST, and PUT requests. For GET you can use the browser, but the most effective way in your case is to use the cURL library for PHP or jaira for Java ... so you can send POST or PUT requests with a simple script.
So, what you can do with Fusion Tables is automate the process of manipulating data, with the option to share that data with someone else.
Edit: procedures changed since this post. Your mileage may vary
Head to the Google Developers Console
Create a project
Under Explore other services click "Enable APIs and get credentials like keys"
Search for Fusion Tables
Enable Fusion Tables API as a service under APIs & Auth --> APIs
You probably want the browser key. Grab the API key.
Happy Mapping...
API keys are not related to specific Fusion Tables; they are related to projects.
You may use the key to request data from any public and downloadable Fusion Table (not only your own tables). The key is basically used to identify your project (Google account).
So when you have problems requesting data from a public table, check whether the table is downloadable too (click the table name at the top left -> reuse access -> allow downloads).
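A minimal sketch of how a project-level key and a table ID could be combined in a single query URL. The endpoint path is an assumption based on the Fusion Tables v2 REST API, and TABLE_ID and API_KEY are placeholders; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Fusion Tables v2 query endpoint.
QUERY_ENDPOINT = "https://www.googleapis.com/fusiontables/v2/query"


def build_query_url(table_id, api_key, sql_template="SELECT * FROM {table}"):
    """Build a GET URL that runs a SQL-style query against one table."""
    sql = sql_template.format(table=table_id)
    return QUERY_ENDPOINT + "?" + urlencode({"sql": sql, "key": api_key})


url = build_query_url("TABLE_ID", "API_KEY")
print(url)
```

The key identifies the project and the table ID appears only inside the SQL, which is why one key can be used with any public, downloadable table.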

List of all companies on AngelList via API

https://angel.co/api/spec/startups
What would be the best approach for hitting every company that is listed on AngelList? My first guess would be to query every ID up to 250k, the number of companies on AngelList, using this endpoint: https://api.angel.co/1/startups/45435
There surely has to be a better way of doing this though.
Yes, it is possible via their API, and the endpoint you mentioned in your question is the correct one. I have written a PHP component to achieve this. You can use this exporter application to download the startup data for each country into a CSV file: AngelList Data Exporter
I hope this helps you.
Angel.co does not expose its API anymore, so you have to parse the website to get any data.
Also, a quick Google search will turn up a few websites that offer different datasets from the angel.co website.