Regarding BigQuery datasetId and tableId - google-bigquery

How do I get the datasetId and tableId from BigQuery? I clicked the dropdown in the sidebar and copied the dataset and table info, but I got the error below. Is there any way I can query the datasetId and tableId? I am using the PHP client libraries to pull BigQuery data.
use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId = 'The BigQuery table ID';
// $maxResults = 10;

$maxResults = 10;
$startIndex = 0;

$options = [
    'maxResults' => $maxResults,
    'startIndex' => $startIndex
];

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);

$numRows = 0;
foreach ($table->rows($options) as $row) {
    print('---');
    foreach ($row as $column => $value) {
        printf('%s: %s' . PHP_EOL, $column, $value);
    }
    $numRows++;
}
I am getting this error.
Google\Cloud\Core\Exception\BadRequestException : {
    "error": {
        "code": 400,
        "message": "Invalid dataset ID \"mc-data-2:Turnflex\". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long.",
        "errors": [
            {
                "message": "Invalid dataset ID \"mc-data-2:Turnflex\". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long.",
                "domain": "global",
                "reason": "invalid"
            }
        ],
        "status": "INVALID_ARGUMENT"
    }
}

According to the PHP Google Cloud client library documentation, a proper BigQueryClient() initialization requires you to supply the $projectId variable. For any BigQuery data querying or discovery, you would typically either use the query() method to submit a BigQuery job with your custom query, or explicitly define dataset() and table() as the target BigQuery location.
Confirming @Priya Agarwal's presumption, I checked the code snippet from your example and it works as intended. I caught an error similar to the one you reported once I populated the $datasetId variable with the table ID joined on:
$datasetId = 'datasetId:tableId'
instead of:
$datasetId = 'datasetId'
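The error message itself spells out the rule the supplied value broke: a dataset ID may contain only letters, digits, underscores, and dashes, up to 1024 characters, so a colon-qualified value such as "mc-data-2:Turnflex" is rejected. As a minimal sketch of that rule (in plain JavaScript since it is library-independent; the helper name is made up for illustration):

```javascript
// Hypothetical helper mirroring the rule quoted in the 400 error:
// dataset IDs must be alphanumeric (plus underscores and dashes)
// and at most 1024 characters long.
function isValidDatasetId(id) {
  return /^[A-Za-z0-9_-]{1,1024}$/.test(id);
}

console.log(isValidDatasetId('Turnflex'));           // bare dataset ID passes
console.log(isValidDatasetId('mc-data-2:Turnflex')); // the colon makes it invalid
```

The colon syntax belongs to fully-qualified table references, not to the $datasetId argument, which expects the bare dataset ID only.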

Related

Google App Script Big Query - GoogleJsonResponseException: API call to bigquery.jobs.query failed with error: Query parameter 'X' not found

I have been struggling with this for a couple of days now and felt I should reach out. This might be very simple, but I am not from a programming background and I haven't found any resources to solve this so far.
Basically, I want to parameterize a SQL query that runs against BigQuery within Google Apps Script. It takes a variable from a Google Form the user has submitted, and I wanted to make sure the query isn't injectable by parameterizing it. However, I got the following error that I could not fix:
GoogleJsonResponseException: API call to bigquery.jobs.query failed with error: Query parameter 'account_name' not found at [1:90]
Here is how I run the query:
// Query
const sqlQuery = 'SELECT district FROM `table` WHERE account_name = @account_name AND ent_theatre=("X") LIMIT 1;';
const request = {
    query: sqlQuery,
    params: { account_name: queryvar },
    useLegacySql: false,
};
// Run Query
var queryResult = BigQuery.Jobs.query(request, projectID);
I created the query based on Google's documentation.
Your syntax for the request object is not correct. The right syntax for the BigQuery.Jobs.query request is as below:
const request = {
    query: sqlQuery,
    queryParameters: [
        {
            name: "account_name",
            parameterType: { type: "STRING" },
            parameterValue: { value: queryvar }
        }
    ],
    useLegacySql: false,
};
For more detail about the QueryRequest object, refer to this link.
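If you have several string parameters, the flat params map can be translated mechanically into the queryParameters shape shown above. The helper below is an illustrative sketch only (toQueryParameters is a made-up name), and it assumes every parameter is a STRING:

```javascript
// Illustrative sketch: turn { name: value } pairs into the
// queryParameters array shape, assuming every value is a STRING.
function toQueryParameters(params) {
  return Object.entries(params).map(([name, value]) => ({
    name: name,
    parameterType: { type: "STRING" },
    parameterValue: { value: String(value) }
  }));
}

// Example usage with a hypothetical form value:
const request = {
  query: 'SELECT district FROM `table` WHERE account_name = @account_name LIMIT 1;',
  queryParameters: toQueryParameters({ account_name: "Acme" }),
  useLegacySql: false,
};
```

Non-string parameters would need a different parameterType (e.g. INT64), so a real helper would take the type per parameter rather than hard-coding STRING.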

How to create Type RECORD of INTEGER in my terraform file for BigQuery

I am trying to write the Terraform schema for my BigQuery table, and I need a column of type RECORD that will be populated with INTEGER values.
The field in question has the format of brackets with one or multiple comma-separated integers inside: [1]
I tried writing like this:
resource "google_bigquery_table" "categories" {
  project    = "abcd-data-ods-${terraform.workspace}"
  dataset_id = google_bigquery_dataset.bq_dataset_op.dataset_id
  table_id   = "categories"

  schema = <<EOF
[
  {"type":"STRING","name":"a","mode":"NULLABLE"},
  {"type":"RECORD[INTEGER]","name":"b","mode":"NULLABLE"}
]
EOF
}
and like this:
resource "google_bigquery_table" "categories" {
  project    = "abcd-data-ods-${terraform.workspace}"
  dataset_id = google_bigquery_dataset.bq_dataset_op.dataset_id
  table_id   = "categories"

  schema = <<EOF
[
  {"type":"STRING","name":"a","mode":"NULLABLE"},
  {"type":"RECORD","name":"b","mode":"NULLABLE"}
]
EOF
}
But it didn't work, as I keep getting an error in my CI/CD on GitLab.
The error for the first attempt:
Error: googleapi: Error 400: Invalid value for type: RECORD[INTEGER] is not a valid value, invalid
The error for the second attempt:
Error: googleapi: Error 400: Field b is type RECORD but has no schema, invalid
I presume that the second implementation is the closest to the solution, given the error, but it is still missing something.
Does anyone have an idea about the right way to declare it?
Just as the second error states:
Error: googleapi: Error 400: Field b is type RECORD but has no schema, invalid
You must provide a schema for RECORD types (you can read more in the docs). For instance, a valid example could be:
resource "google_bigquery_table" "categories" {
  project    = "abcd-data-ods-${terraform.workspace}"
  dataset_id = google_bigquery_dataset.bq_dataset_op.dataset_id
  table_id   = "categories"

  schema = <<EOF
[
  {
    "type": "STRING",
    "name": "a",
    "mode": "NULLABLE"
  },
  {
    "type": "RECORD",
    "name": "b",
    "mode": "NULLABLE",
    "fields": [{
      "name": "c",
      "type": "INTEGER",
      "mode": "NULLABLE"
    }]
  }
]
EOF
}
Hope this helps.
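The 400 from the second attempt reflects a simple structural rule that can be checked client-side before ever calling the API: every RECORD field must carry a non-empty "fields" array. The sketch below (in plain JavaScript, with a made-up function name; the authoritative check happens server-side) mimics that rule against the second attempt's schema:

```javascript
// Sketch of the rule behind "Field b is type RECORD but has no schema":
// RECORD fields must carry a non-empty "fields" array describing their
// nested columns.
function recordFieldsMissingSchema(schema) {
  return schema
    .filter(f => f.type === "RECORD" && !(Array.isArray(f.fields) && f.fields.length > 0))
    .map(f => f.name);
}

const secondAttempt = [
  { type: "STRING", name: "a", mode: "NULLABLE" },
  { type: "RECORD", name: "b", mode: "NULLABLE" }
];
console.log(recordFieldsMissingSchema(secondAttempt)); // [ 'b' ]
```

Run against the corrected schema (with the nested "fields" array), the same check returns an empty list.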

Updating a value in an array in MongoDB from Java

I have a couple of documents in MongoDB as follows:
{
    "_id" : ObjectId("54901212f315dce7077204af"),
    "Date" : ISODate("2014-10-20T04:00:00.000Z"),
    "Type" : "Twitter",
    "Entities" : [
        {
            "ID" : 4,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 20,
                "Neutral" : 1
            }
        },
        {
            "ID" : 5,
            "Name" : "test5",
            "Sentiment" : {
                "Value" : 10,
                "Neutral" : 1
            }
        }
    ]
}
Now I want to update the document that has Entities.ID = 4 by setting Sentiment.Value to (Sentiment.Value + 4) / 2; for example, in the document above, after the update we would have 12.
I wrote the following code but I am stuck in the if statement as you can see:
DBCollection collectionG;
collectionG = db.getCollection("GraphDataCollection");
int entityID = 4;
String entityName = "test";
BasicDBObject queryingObject = new BasicDBObject();
queryingObject.put("Entities.ID", entityID);
DBCursor cursor = collectionG.find(queryingObject);
if (cursor.hasNext())
{
    BasicDBObject existingDocument = new BasicDBObject("Entities.ID", entityID);
    // not sure how to update the Sentiment.Value for entityID = 4
}
First I thought I should unwind the Entities array to get the sentiment value, but if I do that, how can I wind it back up and update the document in its current format, just with the new sentiment value?
I also found this link:
MongoDB - Update objects in a document's array (nested updating)
but I could not understand it since the query is not written in Java.
Can anyone explain how I can do this in Java?
You need to do this in a few steps:
1. Get the _id of every record that contains an Entity whose ID matches the query.
2. During the find, project only the entity sub-document that matched the query, so that we can process it to consume only its Sentiment.Value. Use the positional operator ($) for this purpose.
3. Instead of hitting the database every time to update each matched record, use the Bulk API to queue up the updates and execute them at the end.
Create the Bulk operation Writer:
BulkWriteOperation bulk = col.initializeUnorderedBulkOperation();
Find all the records which contain the value 4 in its Entities.ID field. When you match documents against this query, you would get the whole document returned. But we do not want the whole document, we would like to have only the document's _id, so that we can update the same document using it, and the Entity element in the document that has its value as 4. There may be n other Entity documents, but they do not matter. So to get only the Entity element that matches the query we use the positional operator $.
DBObject find = new BasicDBObject("Entities.ID",4);
DBObject project = new BasicDBObject("Entities.$",1);
DBCursor cursor = col.find(find, project);
What the above code would return is, for example, the document below (since our example assumes only a single input document). Notice that it contains only the one Entity element that matched our query.
{
    "_id" : ObjectId("54901212f315dce7077204af"),
    "Entities" : [
        {
            "ID" : 4,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 20,
                "Neutral" : 1
            }
        }
    ]
}
Iterate each record to queue up for update:
while(cursor.hasNext()){
BasicDBObject doc = (BasicDBObject)cursor.next();
int curVal = ((BasicDBObject)
((BasicDBObject)((BasicDBList)doc.get("Entities")).
get(0)).get("Sentiment")).getInt("Value");
int updatedValue = (curVal+4)/2;
DBObject query = new BasicDBObject("_id",doc.get("_id"))
.append("Entities.ID",4);
DBObject update = new BasicDBObject("$set",
new BasicDBObject("Entities.$.Sentiment.Value",
updatedValue));
bulk.find(query).update(update);
}
Finally Update:
bulk.execute();
You need to do a find() and an update(), not simply an update, because MongoDB currently does not allow you to reference a document field, retrieve its value, modify it, and write back a computed value in a single update query.
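The read-modify-write step inside the loop above can be isolated into a tiny sketch (in plain JavaScript rather than Java, purely to show the arithmetic and the positional $set pair; buildUpdate is a made-up name):

```javascript
// Sketch: given a projected document (only the matched Entity element,
// as returned by the find with the $ projection above), compute
// (Value + 4) / 2 and build the query/update pair for the positional $set.
function buildUpdate(doc, entityId) {
  const curVal = doc.Entities[0].Sentiment.Value;
  const updatedValue = (curVal + 4) / 2;
  return {
    query: { _id: doc._id, "Entities.ID": entityId },
    update: { $set: { "Entities.$.Sentiment.Value": updatedValue } }
  };
}

const projected = {
  _id: "54901212f315dce7077204af",
  Entities: [{ ID: 4, Name: "test1", Sentiment: { Value: 20, Neutral: 1 } }]
};
console.log(buildUpdate(projected, 4).update.$set["Entities.$.Sentiment.Value"]); // (20 + 4) / 2 = 12
```

The positional operator in "Entities.$.Sentiment.Value" refers back to the array element matched by "Entities.ID" in the query part, which is why the query must repeat that condition.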

RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
    new BulkInsertOptions { CheckForUpdates = true });
foreach (MyDocument myDoc in docCollection)
    _bulk.Store(myDoc);
_bulk.Dispose();
And right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number that is less than the count I see in Raven Studio.
By default, the query you are doing limits to a sane number of results, part of RavenDB's promise to be safe by default and not stream back millions of records.
In order to get the number of documents of a specific type in your database, you need a special map-reduce index whose job it is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it in code.
The source for that index is in this question but I'll copy it here:
// Index Name: Raven/DocumentCollections
// Map Query
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name, Count = 1 }
// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x => x.Count) }
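Conceptually, this index just tallies one Count per document under its entity name. A plain-JavaScript sketch of the same computation (illustrative only; RavenDB runs this server-side, and the sketch assumes the standard "@metadata" / "Raven-Entity-Name" keys):

```javascript
// Sketch of what the Raven/DocumentCollections index computes: a map step
// that emits { Name, Count: 1 } per document, plus a reduce step that
// sums the counts per Name.
function countByEntityName(docs) {
  const mapped = docs
    .map(d => d["@metadata"] && d["@metadata"]["Raven-Entity-Name"])
    .filter(name => name != null)
    .map(name => ({ Name: name, Count: 1 }));
  const totals = {};
  for (const { Name, Count } of mapped) {
    totals[Name] = (totals[Name] || 0) + Count;
  }
  return totals;
}

const totals = countByEntityName([
  { "@metadata": { "Raven-Entity-Name": "MyDocument" } },
  { "@metadata": { "Raven-Entity-Name": "MyDocument" } },
  { "@metadata": { "Raven-Entity-Name": "Course" } }
]);
console.log(totals); // { MyDocument: 2, Course: 1 }
```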
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
    public string Name { get; set; }
    public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
    .Where(x => x.Name == "MyDocument")
    .FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously so after bulk-inserting a bunch of documents, the index may be stale. You can force it to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName which exists for every database, by sidestepping normal queries and getting metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
    new Raven.Abstractions.Data.IndexQuery
    {
        Query = "Tag:MyDocument",
        PageSize = 0
    },
    includes: null,
    metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
    Results: [ ],
    Includes: [ ],
    IsStale: false,
    IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
    TotalResults: 3,
    SkippedResults: 0,
    IndexName: "Raven/DocumentsByEntityName",
    IndexEtag: "01000000-0000-0040-0000-00000000000B",
    ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
    Highlightings: { },
    NonAuthoritativeInformation: false,
    LastQueryTime: "2014-02-06T18:12:56.1990451Z",
    DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
Session.Query<Course>()
    .Statistics(out stats)
    // Rest of query

facets with ravendb

I am trying to work with the facet ability in RavenDB but am getting strange results.
I have documents like:
{
    "SearchableModel": "42LC2RR ",
    "ModelName": "42LC2RR",
    "ModelID": 490578,
    "Name": "LG 42 Television 42LC2RR",
    "Desctription": "fffff",
    "Image": "1/4/9/8/18278941c",
    "MinPrice": 9400.0,
    "MaxPrice": 9400.0,
    "StoreAmounts": 1,
    "AuctionAmounts": 0,
    "Popolarity": 3,
    "ViewScore": 0.0,
    "ReviewAmount": 2,
    "ReviewScore": 45,
    "Sog": "E-TV",
    "SogID": 1,
    "IsModel": true,
    "Manufacrurer": "LG",
    "ParamsList": [
        "1994267", "46570", "4134", "4132", "4118", "46566",
        "4110", "180676", "239517", "750771", "2658507", "2658498",
        "46627", "4136", "169941", "169846", "145620", "169940",
        "141416", "3190767", "3190768", "144720", "2300706", "4093",
        "4009", "1418470", "179766", "190025", "170557", "170189",
        "43768", "4138", "67976", "239516", "3190771", "141195"
    ]
}
Each entry in ParamsList represents a property of the product, and our application caches what each param represents.
When searching for a specific product, I would like to count all the returned attributes so that I can show the amount next to each item after the search.
After searching for "lg" in the televisions category, I want to get:
Param 4134, which represents LCD, with an amount of 65.
But unfortunately I am getting strange results: only some params are counted and some are not.
On some searches where I do get results back, I don't get any amounts.
I am using the latest stable version of RavenDB.
index :
from doc in docs
from param in doc.ParamsList
select new { Name = doc.Name, Description = doc.Description, SearchNotVisible = doc.SearchNotVisible, SogID = doc.SogID, Param = param }
facet :
DocumentStore documentStore = new DocumentStore { ConnectionStringName = "Server" };
documentStore.Initialize();
using (IDocumentSession session = documentStore.OpenSession())
{
    List<Facet> _facets = new List<Facet>
    {
        new Facet { Name = "Param" }
    };
    session.Store(new FacetSetup { Id = "facets/Params", Facets = _facets });
    session.SaveChanges();
}
usage example :
IDictionary<string, IEnumerable<FacetValue>> facets = session.Advanced.DatabaseCommands.GetFacets("FullIndexParams", new IndexQuery { Query = "Name:lg" }, "facets/Params");
I tried many variations without success.
Does anyone have an idea what I am doing wrong?
Thanks.
Use this index, it should resolve your problem:
from doc in docs
select new {Name=doc.Name,Description=doc.Description,SearchNotVisible = doc.SearchNotVisible,SogID=doc.SogID,Param = doc.ParamsList}
What analyzer did you set for the "Name" field? I see you search by Name "lg". By default, RavenDB uses the KeywordAnalyzer, which means you must search by the exact name. You should set another analyzer for the Name or Description field (StandardAnalyzer, for example).
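What the corrected index enables is essentially a per-param tally over the matching documents. A plain-JavaScript sketch of the facet counts being asked for (illustrative only, with a made-up helper name and sample data; RavenDB computes this server-side from the index):

```javascript
// Sketch: count how many matching documents carry each param, which is
// what the "Param" facet over the corrected index returns.
function facetParamCounts(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const param of doc.ParamsList) {
      counts[param] = (counts[param] || 0) + 1;
    }
  }
  return counts;
}

// Two hypothetical documents matching a "lg" search:
const matches = [
  { Name: "LG 42 Television 42LC2RR", ParamsList: ["4134", "4110"] },
  { Name: "LG 37 Television",         ParamsList: ["4134"] }
];
console.log(facetParamCounts(matches)["4134"]); // 2 documents share param 4134
```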