How do I make a BigQuery dataset public using the command line tool or Python? - google-bigquery

I'm making an open data website powered by BigQuery. How do I make a BigQuery dataset public using the command line tool or Python?
Note: I tried to make every dataset in my project public but got an unexplained error. In the project permission settings in the web UI, under "Add members", I entered allAuthenticatedUsers with the Data Viewer role. The error was: "Error
Sorry, there’s a problem. If you entered information, check it and try again. Otherwise, the problem might clear up on its own, so check back later."
I wasn't able to find any command line examples for updating permissions, and I also can't find a JSON string to pass to https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update

To achieve this programmatically, you need to use a dataset patch request and use the specialGroup item with the value allAuthenticatedUsers, like so:
{
  "datasetReference": {
    "projectId": "<removed>",
    "datasetId": "<removed>"
  },
  "access": [
    ... // other access roles
    {
      "specialGroup": "allAuthenticatedUsers",
      "role": "READER"
    }
  ]
}
Note: You should use a read-modify-write cycle, as described here and here:
Note about arrays: Patch requests that contain arrays replace the existing array with the one you provide. You cannot modify, add, or delete items in an array in a piecemeal fashion.
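For the Python side of the question, here is a minimal sketch using the google-cloud-bigquery client library (the project and dataset IDs are placeholders); it follows the read-modify-write pattern above by appending to the existing access entries instead of replacing them. On the command line, the bq tool can do the equivalent: dump the dataset with bq show --format=prettyjson, edit the access array, and write it back with bq update --source.

from google.cloud import bigquery

# Minimal sketch: grant READER on a dataset to allAuthenticatedUsers.
# "your-project" and "your_dataset" are placeholders.
client = bigquery.Client(project="your-project")
dataset = client.get_dataset("your-project.your_dataset")  # read

entries = list(dataset.access_entries)  # modify: keep the existing entries
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="specialGroup",
        entity_id="allAuthenticatedUsers",
    )
)
dataset.access_entries = entries

client.update_dataset(dataset, ["access_entries"])  # write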

Related

Enable Impala Impersonation on Superset

Is there a way to make the logged-in user (on Superset) the one who runs the queries on Impala?
I tried to enable the "Impersonate the logged on user" option on Databases, but with no success: all the queries run on Impala as the superset user.
I'm trying to achieve the same! This will not completely answer the question, since it still does not fully work, but I want to share my research in order to help anyone else trying to use this tool outside very basic use cases.
I went deep into the code and found out that impersonation is not implemented for Impala, so you cannot achieve this from the UI. I found the PR https://github.com/apache/superset/pull/4699, which for whatever reason was never merged into the codebase, and tried to copy and paste its code into my Superset version (1.1.0), but it didn't work. Adding some logs, I can see that the configuration with the impersonation is updated, but the actual Impala query still runs as the user I used to start the process.
As you can imagine, I am a complete noob at this. However, I found out that the impersonation happens when you create a cursor, and there is a constructor parameter through which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL Lab part.
In sql_lab.py, you have to add the following lines in the execute_sql_statements method:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as follows:
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}

@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
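For context, what this configuration dict ultimately does is the same thing you would do when opening an impyla cursor by hand: cursor() accepts a configuration parameter that carries the impala.doas.user delegation setting. A minimal standalone sketch (host, port, and user name are placeholders):

from impala.dbapi import connect

# Standalone sketch: the configuration dict passed to cursor() carries
# the impala.doas.user setting used by the Superset code above.
conn = connect(host='impala-host.example.com', port=21050)
cursor = conn.cursor(configuration={'impala.doas.user': 'end_user'})
cursor.execute('SELECT 1')
print(cursor.fetchall())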
Finally, in models/core.py, you have to add the following bit in the get_sqla_engine method:
params = extra.get("engine_params", {})  # that was already there, just for you to find the right line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
    str(url), self.impersonate_user, effective_username)  # this is the line I added
...
params.update(self.get_encrypted_extra())  # already there
# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I already said, this only works for SQL Lab. The dashboards use an entirely different way of querying Impala that I have not yet figured out.
This means that queries for the dashboards are handled in a different way, and there is nothing like this:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the SQLAlchemy part and extend a new ImpalaEngine class that uses a custom cursor with the impersonation configuration. Or something like that; however, it is not as simple (if we want to call this simple) as the sql_lab part. So the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isn't it?
I hope this sheds some light for you and the others who have this issue. Let me know if you find another way to solve this, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, instead working directly with the impyla lib. A PR was opened with the code changes; you can apply the patch directly in the impyla source used by Superset. You have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this, and we do not know yet if it works with different accounts using the same Superset instance.

How to get more info than the generic "Failed to parse JSON: No active field found.; ParsedString returned false; Could not parse value" on a BigQuery load?

We're trying BigQuery for the first time, with data extracted from Mongo in JSON format. I kept getting this generic parse error upon loading the file. But then I tried a smaller subset of the file, 20 records, and it loaded fine. This tells me it's not the general structure of the file, which I had originally thought was the problem. Is there any way to get more info on the parse error, such as the string of the record that it's trying to parse when it hits this error?
I also tried using the max errors field, but that didn't help either.
This was via the website. I also tried it via the Google Cloud SDK command line ('bq load ...') and got the same error.
This error is most likely caused by some of the JSON records not complying with the table schema. It is not clear whether you used the schema autodetect feature or supplied a schema for the load, but here is one example where such an error could happen:
{ "a" : "1" }
{ "a" : { "b" : "2" } }
Here the field a is a plain string in the first record but a nested record in the second, so whichever type the schema expects, one of the two records fails to parse. If you only have a few of these and they really are invalid records, you can automatically ignore them by using the max_bad_records option for the load job. More details at: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json
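To actually see which records failed, the load job's error list is usually more informative than the single message surfaced in the UI. A minimal sketch with the google-cloud-bigquery Python client (bucket, project, dataset, and table names are placeholders):

from google.cloud import bigquery

# Sketch: load newline-delimited JSON and print per-record errors.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    max_bad_records=10,  # tolerate a few malformed records
)
load_job = client.load_table_from_uri(
    "gs://your-bucket/data.json",
    "your-project.your_dataset.your_table",
    job_config=job_config,
)
try:
    load_job.result()  # wait for completion
except Exception:
    pass
# The errors list (if any) often pinpoints the offending file and record.
for err in load_job.errors or []:
    print(err)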

Google Apps Script ScriptDb permissions issue

I am having an issue trying to query the ScriptDb of a resource file in Google Apps Script. I create a script file (file1) and add it as a resource to another script file (file2). I call file1 from file2 to return a handle to its ScriptDb. This works fine. I then try to query the ScriptDb, but a permissions error is returned.
Both files are owned by the same user and live in the same Google environment.
See code below:
file 1:
function getMyDb() {
  return ScriptDb.getMyDb;
}
file 2 (references file1):
function getDataFromFile1() {
  var db = file1.getMyDb(); // This works
  var result = db.query({..............}); // This results in a permissions error!
}
I am at a loss to understand why I can access file1 and get back a handle on the ScriptDb, but am then not able to query it, due to a permissions issue.
I have tried to force file1 to require re-authorization, but have not yet been successful. I tried adding a new function and running it, so any suggestions there would be gratefully received.
Thanks in advance
Chris
There appears to be an error in line 2 of file1. It says "return ScriptDb.getMyDb;" but it should say "return ScriptDb.getMyDb();"
If you leave out the (), then when you call file1 as a library, file1.getMyDb() will return a function, which you store in var db. The line var result = db.query({..............}) then results in an error, because there is no method "query" on a function.
Is that what's causing your error?
I have figured out what the problem was: a misunderstanding on my part regarding authorisation. I was thinking of it in terms of file permissions, when in fact the problem was that my code was not authorised to use the ScriptDb service. Since my code calls a different file and receives back a pointer to a ScriptDb database, it is not itself using the ScriptDb service; then, when it calls db.query(), it invokes the ScriptDb service, for which it is not authorised.
To resolve this, I just had to create a dummy function that makes a ScriptDb.getMyDb() call, which triggered authorisation for the service. The code then worked fine.
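For anyone hitting the same thing, the dummy function can be as small as this (a sketch; run it once from the script editor to trigger the authorisation prompt, then delete it):

function authorizeScriptDb() {
  // Touching the ScriptDb service directly forces the auth dialog.
  var db = ScriptDb.getMyDb();
  Logger.log(db.count({})); // trivial call to confirm access
}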
Thanks for the input though.
Chris

MsTest, DataSourceAttribute - how to get it working with a runtime generated file?

For some tests I need to run a data-driven test with a configuration that is generated (via reflection) in the ClassInitialize method. I tried out everything, but I just cannot get the data source properly set up.
The test takes a list of classes in a CSV file (one line per class) and then tests that the mappings to the database work out well (i.e. it tries to get one item from the database for every entity, which will throw an exception when the table structure does not match).
The test method is:
[DataSource(
    "Microsoft.VisualStudio.TestTools.DataSource.CSV",
    "|DataDirectory|\\EntityMappingsTests.Types.csv",
    "EntityMappingsTests.Types#csv",
    DataAccessMethod.Sequential)
]
[TestMethod()]
public void TestMappings() {
Obviously the file is EntityMappingsTests.Types.csv, and it should be in the DataDirectory.
Now, in the initialize method (marked with ClassInitialize) I put the file together and then try to write it.
WHERE should I write it to? WHERE IS the DataDirectory?
I tried:
File.WriteAllText(context.TestDeploymentDir + "\\EntityMappingsTests.Types.csv", types.ToString());
File.WriteAllText("EntityMappingsTests.Types.csv", types.ToString());
Both result in "the unit test adapter failed to connect to the data source or read the data". More exactly:
Error details: The Microsoft Jet database engine could not find the
object 'EntityMappingsTests.Types.csv'. Make sure the object exists
and that you spell its name and the path name correctly.
So where should I put that file?
I also tried just writing it to the current directory and taking out the DataDirectory part - same result. Sadly, there is limited debugging support here.
Please use the Process Monitor tool from technet.microsoft.com/en-us/sysinternals/bb896645. Put a filter on MSTest.exe or the associated qtagent32.exe and find out what locations it is trying to load from, and at what point in the test loading process. Then please provide an update with those details here.
After you add the CSV file to your VS project, you need to open its properties and set "Copy To Output Directory" to "Copy Always". DataDirectory defaults to the location of the compiled executable, which runs from the output directory, so the test will find the file there.
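For the runtime-generated case in the question, here is a sketch of a ClassInitialize method that writes the CSV next to the compiled test assembly (which, per the answer above, is where DataDirectory points by default; GenerateCsv is a hypothetical stand-in for the reflection code):

using System;
using System.IO;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[ClassInitialize]
public static void Init(TestContext context)
{
    // Write the generated CSV to the test assembly's directory,
    // which is what |DataDirectory| resolves to by default.
    string dir = AppDomain.CurrentDomain.BaseDirectory;
    string csv = GenerateCsv(); // hypothetical: builds the CSV via reflection
    File.WriteAllText(Path.Combine(dir, "EntityMappingsTests.Types.csv"), csv);
}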

Magento - Trouble with setting up Model Read Adapter

I was following along with Alan Storm's tutorial on Magento's Model and ORM basics, and I've run into a bit of a problem. When I get to the portion where I load from the model for the first time, I get the error "Fatal error: Call to a member function load() on a non-object...". I've already reset everything and tried again from scratch, but I still get the same problem. My code looks like this:
$params = $this->getRequest()->getParams();
$blogpost = Mage::getModel('weblog/blogpost');
var_dump($blogpost);
echo("Loading the blogpost with an ID of ".$params['id']);
$blogpost->load($params['id']);
As you can see, I dumped the contents of $blogpost, and it shows that it is just a boolean false. My guess is that there's either a problem with the connection to the database or, for some reason, the code for Mage::getModel() didn't get installed correctly.
EDIT - Adding Code
There are so many files that I just decided to pastebin them lol
app/code/local/Ahathaway/Weblog/controllers/IndexController.php
app/code/local/Ahathaway/Weblog/etc/config.xml
app/code/local/Ahathaway/Weblog/Model/Blogpost.php
app/etc/modules/Ahathaway_Weblog.xml
Your Model/Blogpost.php file should actually be Model/Mysql4/Blogpost.php, and you are missing the real Model/Blogpost.php.
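For reference, the missing base model would look something like this (a sketch following the tutorial's conventions, using the poster's Ahathaway_Weblog module name):

<?php
// Sketch of the missing app/code/local/Ahathaway/Weblog/Model/Blogpost.php
class Ahathaway_Weblog_Model_Blogpost extends Mage_Core_Model_Abstract
{
    protected function _construct()
    {
        // 'weblog/blogpost' must match the model group defined in config.xml
        $this->_init('weblog/blogpost');
    }
}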
My guess is that Mage cannot find your model class. Double-check the module/model name, and also verify that the model is in the correct place in the filesystem (it should be in app/code/local/Ahathaway/Weblog/Model/Blogpost.php).
You also need to check that your config.xml correctly defines your model classes. It would be best if you could paste your config.xml and your model class...
A quick glance reveals you're missing the model resource. Go back to the section around the following code example:
File: app/code/local/Alanstormdotcom/Weblog/Model/Mysql4/Blogpost.php
class Alanstormdotcom_Weblog_Model_Mysql4_Blogpost extends Mage_Core_Model_Mysql4_Abstract
{
    protected function _construct()
    {
        $this->_init('weblog/blogpost', 'blogpost_id');
    }
}