How to give user specific access to BigQuery columns? - google-bigquery

I am currently trying to limit the columns that my view may return while still letting the user filter on the hidden columns, for example:
Table
{ f_name: string, l_name: string, ssn: string }
View
{ f_name: string, l_name: string}
but allowing queries like this: SELECT * FROM view WHERE ssn = '1234567890'
I am pretty sure that there is a better approach, but I am too deep in to see it :)

Below is a high-level idea for you.
This is for BigQuery Standard SQL
#standardSQL
WITH `yourTable` AS (
SELECT 'a' f_name, 'x' l_name, '1234567890' ssn UNION ALL
SELECT 'b', 'y', '2234567890' UNION ALL
SELECT 'c', 'z', '3234567890' UNION ALL
SELECT 'd', 'v', '4234567890' UNION ALL
SELECT 'r', 'w', '5234567890'
),
`yourView` AS (
SELECT f_name, l_name, FARM_FINGERPRINT(ssn) ssn
FROM `yourTable`
)
SELECT *
FROM `yourView`
WHERE ssn = FARM_FINGERPRINT('3234567890')
Below is an implementation outline:
1. Create the yourView view in a dataset separate from yourTable's dataset.
2. Authorize yourView as a reader of yourTable's dataset.
3. Now, any user who has access to the view will be able to run the query below.
4. Of course, make sure your users do not have access to yourTable's dataset.
#standardSQL
SELECT *
FROM `yourView`
WHERE ssn = FARM_FINGERPRINT('3234567890')
Even though an ssn column is still visible, it does not contain the real values.
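To make step 1 concrete, the view could be created with DDL along these lines (a minimal sketch; the dataset names source_data and secure_views are hypothetical):
#standardSQL
-- Hypothetical dataset names: the table lives in source_data,
-- the view in a separate dataset, secure_views.
CREATE VIEW `secure_views.yourView` AS
SELECT f_name, l_name, FARM_FINGERPRINT(ssn) AS ssn
FROM `source_data.yourTable`;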

There is no way to expose only a subset of columns while still allowing filters to reference all of them. What if the filter was ssn = ssn, for instance? That would always be true unless ssn was null. You can, however, set up different datasets with different permissions and create views that expose some subset of columns in them. There is a good tutorial on creating authorized views in the BigQuery documentation.
For example, you could have:
restricted_dataset: Only certain people in your team/organization can query or manage the tables and views in this dataset. Contains a table named all_info containing all data.
open_dataset: Anyone in your team/organization can query tables/views in this dataset. Contains a view named filtered_info defined e.g. as SELECT f_name, l_name FROM all_info;
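For instance, the view might be defined like this (a sketch; it still needs to be added as an authorized view of restricted_dataset, as in the tutorial):
#standardSQL
CREATE VIEW `open_dataset.filtered_info` AS
SELECT f_name, l_name
FROM `restricted_dataset.all_info`;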

Related

How to execute a select with a WHERE using a not-always-existing column

Simple example: I have some (nearly) identical tables with personal data (age, name, weight, ...)
Now I have a simple, but long SELECT to find missing data:
Select ID
from personal_data_a
where
born is null
or age < 1
or weight > 500
or (name = 'John' and surname = 'Doe')
Now the problem is:
I have some personal_data tables where the column "surname" does not exist, but I want to use the same SQL statement for all of them. So I have to check (inside the WHERE clause) that the last OR condition is only applied IF the column surname exists.
Can it be done in a simple way?
You should have all people in the same table.
If you can't do that for some reason, consider creating a view. Something like this:
CREATE OR REPLACE VIEW v_personal_data
AS
SELECT id,
born,
name,
surname,
age,
weight
FROM personal_data_a
UNION ALL
SELECT id,
born,
name,
NULL AS surname, --> this table doesn't contain surname
age,
weight
FROM personal_data_b;
and then
SELECT id
FROM v_personal_data
WHERE born IS NULL
OR age < 1
OR ( name = 'John'
AND ( surname = 'Doe'
OR surname IS NULL))
Can it be done in a simple way?
No, SQL statements work with static columns and the statements will raise an exception if you try to refer to a column that does not exist.
You will either:
need to have a different query for tables with the surname column and those without;
have to check in the data dictionary whether the table has the column or not and then use dynamic SQL to build your query (see the sketch after this list); or
build a VIEW over the tables which do not have that column, adding the missing column to the view (or add a generated surname column with a NULL value to the tables that are missing it), and use that instead.
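A minimal sketch of the dynamic SQL option, assuming the caller owns the tables (so they appear in user_tab_columns); SYS.ODCINUMBERLIST is just a convenient built-in Oracle collection type:
DECLARE
  v_has_surname NUMBER;
  v_sql         VARCHAR2(4000);
  v_ids         SYS.ODCINUMBERLIST;
BEGIN
  --Check the data dictionary for the optional column.
  SELECT COUNT(*)
  INTO v_has_surname
  FROM user_tab_columns
  WHERE table_name = 'PERSONAL_DATA_A'
  AND column_name = 'SURNAME';

  --Append the surname predicate only when the column exists
  --(a CASE with no ELSE yields NULL, which concatenates as an empty string).
  v_sql := 'SELECT id FROM personal_data_a'
        || ' WHERE born IS NULL OR age < 1 OR weight > 500'
        || ' OR (name = ''John'''
        || CASE WHEN v_has_surname = 1 THEN ' AND surname = ''Doe''' END
        || ')';

  EXECUTE IMMEDIATE v_sql BULK COLLECT INTO v_ids;
END;
/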
While dynamic predicates are usually best handled by the application or by custom PL/SQL objects that use dynamic SQL, you can solve this problem with a single SQL statement using DBMS_XMLGEN, XMLTABLE, and the data dictionary. The following code is not what I would call "simple", but it is simple in the sense that it does not require any schema changes.
--Get the ID column from a PERSONAL table.
--
--#4: Get the IDs from the XMLType.
select id
from
(
    --#3: Convert the XML to an XMLType.
    select xmltype(personal_xml) personal_xmltype
    from
    (
        --#2: Convert the SQL to XML.
        select dbms_xmlgen.getxml(v_sql) personal_xml
        from
        (
            --#1: Use data dictionary to create SQL statement that may or may not include
            --    the surname predicate.
            select max(replace(replace(
                q'[
                    Select ID
                    from #TABLE_NAME#
                    where
                        born is null
                        or age < 1
                        or weight > 500
                        or (name = 'John' #OPTIONAL_SURNAME_PREDICATE#)
                ]'
                , '#TABLE_NAME#', table_name)
                , '#OPTIONAL_SURNAME_PREDICATE#', case when column_name = 'SURNAME' then
                    'and surname = ''Doe''' else null end)) v_sql
            from all_tab_columns
            --Change this literal to the desired table.
            where table_name = 'PERSONAL_DATA_A'
        )
    )
    where personal_xml is not null
)
cross join xmltable
(
    '/ROWSET/ROW'
    passing personal_xmltype
    columns
        id number path 'ID'
);
See this db<>fiddle for a runnable example.

PGSQL Help, SELECT Row Binding

I want to ask about PostgreSQL; I couldn't find my answer on Google.
Okay, let's get started.
Suppose I have a table named 'name' with 20 rows, which has the columns 'first_name' and 'last_name'.
What SQL should I use to make the query return only one column containing all the values in 'first_name' and 'last_name', so that it returns 40 rows (20 from 'first_name' and another 20 from 'last_name')?
Thanks for your help,
You can do this using the UNION operator and aliasing the columns to the same name, as below:
SELECT
first_name AS names
FROM name
UNION
SELECT
last_name AS names
FROM name;
However, the UNION operator will remove duplicate names. To include everything, including duplicate names, use the UNION ALL operator as below:
SELECT
first_name AS names
FROM name
UNION ALL
SELECT
last_name AS names
FROM name;
Another way to transpose columns to rows:
SELECT UNNEST(ARRAY[first_name, last_name]) AS name
FROM name;
This approach will do a single sequential scan on the table, while the UNION ALL approach will do two scans. Which approach performs better is likely to depend on your data.
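If you also need to know which column each value came from, a hypothetical variant pairs a second UNNEST alongside the first (in PostgreSQL 10+, set-returning functions of equal length in the select list advance in lockstep):
SELECT UNNEST(ARRAY[first_name, last_name]) AS name,
       UNNEST(ARRAY['first', 'last']) AS source
FROM name;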

Keeping a nested structure in a result table while writing over nested data

I need to do some brief masking of data in our BigQuery table. I need the resulting table to have the same structure, but with personal information removed.
I'm doing something along the lines of:
select
customer,
"1234 Road" as tttt.address
...
from table
I can't delve into more detail, but I need to overwrite things such as customer name and phone number, while the structure remains the same.
You can use something like this:
#standardSQL
select
* EXCEPT(tttt),
(SELECT AS STRUCT tttt.* REPLACE("1234 Road" AS address)) AS tttt
from table;
As a concrete example:
#standardSQL
WITH T AS (
SELECT
1 AS x,
'foo' AS y,
STRUCT('kksdf' AS address, 1234 AS street) AS tttt
)
select
* EXCEPT(tttt),
(SELECT AS STRUCT tttt.* REPLACE("1234 Road" AS address)) AS tttt
from T;
You can read more about this syntax in the Query Syntax topic.
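If tttt also contains, say, a phone field (the question mentions phone numbers), REPLACE accepts several substitutions at once; a hypothetical sketch, assuming such a field exists in the struct:
#standardSQL
SELECT
* EXCEPT(tttt),
(SELECT AS STRUCT tttt.* REPLACE('1234 Road' AS address, '555-0100' AS phone)) AS tttt
FROM T;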
You can use the approach below to retain the ability to query the live original table with obfuscated/dummy data points:
#standardSQL
CREATE TEMP FUNCTION dummy_string(value STRING)
AS ((SELECT CONCAT(value, '_', CAST(CAST(100000* RAND() AS INT64) AS STRING))));
WITH yourTable AS (
SELECT 'customer1' AS customer, 'address1' AS address, 'phone1' AS phone UNION ALL
SELECT 'customer2' AS customer, 'address2' AS address, 'phone2' AS phone UNION ALL
SELECT 'customer3' AS customer, 'address3' AS address, 'phone3' AS phone UNION ALL
SELECT 'customer4' AS customer, 'address4' AS address, 'phone4' AS phone UNION ALL
SELECT 'customer5' AS customer, 'address5' AS address, 'phone5' AS phone
)
SELECT * REPLACE(
dummy_string('aaaa') AS address,
dummy_string('bbbb') AS phone
)
FROM yourTable
You can implement whatever obfuscation logic you want in the dummy_string() SQL UDF.
Even more: based on this query, you can make a view that resides in a separate dataset (different from the dataset where the original table is), so whoever gets access to this view (but not to the original table) will be able to explore the table, but with hidden/dummy data points of your choice.
Follow the steps below to make it happen:
1 - Create a view in a dataset different from the dataset where yourTable resides. This is important!
#standardSQL
SELECT * REPLACE(
CONCAT('aaaa', '_', cast(CAST(100000* RAND() as INT64) as string)) AS address,
CONCAT('bbbb', '_', cast(CAST(100000* RAND() as INT64) as string)) AS phone
)
FROM yourTable
As you can see, I am not using the SQL UDF here because UDFs are not supported in views (yet, I hope).
2 - Go to the Share Dataset menu of the dataset where the original table is and add the just-created view as an Authorized View.
3 - Go to the Share Dataset menu of the dataset where the view resides and add as a Viewer those users who you want to be able to play with the obfuscated original table.
The above setup makes users able to see and use the view, but they will not have access to the original table/data:
#standardSQL
SELECT *
FROM yourView

How do you map an object to an aggregated view in NHibernate?

Basically, I have an existing database I'm trying to map to with NHibernate.
Here's a simplified example:
CREATE TEMPORARY TABLE exmplTable (
id INT,
changeNumber INT,
name VARCHAR(255),
address VARCHAR(255)
)
which might contain the following records:
1 0 'John Doe' '123 Fake St'
1 1 'John Doe' '145 Another St' -- John moved
1 2 'John Doe' '42 Clark St' -- John moved again
I only care about the most recent info for a single id. If I were to map this manually, I'd make a view:
SELECT E.id, E.name, E.address
FROM exmplTable E
INNER JOIN
(
SELECT id, MAX(changeNumber) cn
FROM exmplTable
GROUP BY id
) E2
ON E.id = E2.id AND E.changeNumber = E2.cn
and then get a record by id this way:
SELECT * FROM viewname WHERE id = #id
SO THEN:
Without making a view in the database, and without having an interface to the DAL to retrieve a record by manually performing the aggregate query, is it possible to just have NHibernate map to this sort of a relationship?
Note that although I am using NHibernate, Hibernate XML mapping works the same AFAIK.
In case we need to map some complex SELECT statement (like the one mentioned above) and we:
1) do not want to create a view for that, and
2) can accept that the solution will be read-only (expected, I know),
we can use NHibernate's built-in (but not so deeply documented and presented) feature <subselect>. Check here for more details:
NHibernate Criteria: Subquery After 'From' Clause
A small snippet example:
<class name="MyEntity"... >
<subselect>
SELECT ...
FROM ...
</subselect>
...
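Applied to the question's schema, the mapping might look roughly like this (a hedged sketch: the entity name PersonSnapshot and its property names are hypothetical; mutable="false" marks the entity read-only):
<class name="PersonSnapshot" mutable="false">
<subselect>
SELECT E.id, E.name, E.address
FROM exmplTable E
INNER JOIN (
SELECT id, MAX(changeNumber) cn
FROM exmplTable
GROUP BY id
) E2 ON E.id = E2.id AND E.changeNumber = E2.cn
</subselect>
<id name="Id" column="id" />
<property name="Name" column="name" />
<property name="Address" column="address" />
</class>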

Select from multiple tables with different columns

Say I've got this SQL schema.
Table Job:
id, title, type, is_enabled
Table JobFileCopy:
job_id, from_path, to_path
Table JobFileDelete:
job_id, file_path
Table JobStartProcess:
job_id, file_path, arguments, working_directory
There are many other tables with varying numbers of columns, and they all have a foreign key job_id linked to id in table Job.
My questions:
Is this the right approach? I don't have a requirement to delete anything at any time; I will mostly select and insert.
Secondly, what is the best approach to get the list of jobs with relevant details from all the different tables in a single database hit? E.g. I would like to select the top 20 jobs with details; their details can be in any of the tables (depending on the type column in table Job), which I don't know until runtime.
SELECT (CASE WHEN type = 'type1'
             THEN (SELECT field FROM table1)
             ELSE (SELECT field FROM table2)
        END) AS a
FROM table;
Could it be a solution for you?
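Making that idea concrete against the schema in the question, the scalar subqueries would need to be correlated to each job row; a hypothetical sketch (the type values and the single detail column are illustrative only):
SELECT j.id,
       j.title,
       j.type,
       CASE j.type
            WHEN 'FileCopy'     THEN (SELECT fc.from_path FROM JobFileCopy fc WHERE fc.job_id = j.id)
            WHEN 'FileDelete'   THEN (SELECT fd.file_path FROM JobFileDelete fd WHERE fd.job_id = j.id)
            WHEN 'StartProcess' THEN (SELECT sp.file_path FROM JobStartProcess sp WHERE sp.job_id = j.id)
       END AS detail
FROM Job j
WHERE j.is_enabled = 1;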