We have several .gs files in an Apps Script project that schedule BigQuery SQL queries writing into tables.
Everything was fine until a few days ago, when one table stopped updating correctly. The query history shows that this table hasn't been updated in quite a while. When we run the Apps Script function responsible for that table and check the BigQuery query history, it is actually running a different query, even though the script is valid and references different source and destination tables.
Our scripts mostly look like the below:
function table_load_1() {
var configuration = {
"query": {
"useQueryCache": false,
"destinationTable": {
"projectId": "project",
"datasetId": "schema",
"tableId": "destination_table"
},
"writeDisposition": "WRITE_TRUNCATE",
"createDisposition": "CREATE_IF_NEEDED",
"allowLargeResults": true,
"useLegacySql": false,
"query": "select * from `project.schema.source_table` "
}
};
var job = {
"configuration": configuration
};
var jobResult = BigQuery.Jobs.insert(job, "project");
Logger.log(jobResult);
}
Any idea why this would be happening?
I am using MongoDB, and my task is to build dashboard charts for the data, so I am using Apache Superset. Since Superset won't connect to MongoDB directly, I connected MongoDB to Apache Drill and then connected Apache Drill to Apache Superset. My collection is nested. How can I process this nested data so it can be used for dashboard charts? My data looks like this:
{
"_id": {
"$oid": "6229d3cfdbfc81a8777e4821"
},
"jobs": [
{
"job_ID": {
"$oid": "62289ded8079821eb24760e0"
},
"New": false,
"Expired": false
},
{
"job_ID": {
"$oid": "6228a252fb4554dd5c48202a"
},
"New": true,
"Expired": true
},
{
"job_ID": {
"$oid": "622af1c391b290d34701af9f"
},
"New": true,
"Expired": false
}
],
"email": "mani2090996#ail.com"
}
I am querying in Apache Drill as follows:
SELECT flat.fill FROM (SELECT FLATTEN(t.jobs) AS fill FROM mongo.recruitingdb.flatten.`Vendorjobs` t) flat WHERE flat.fill.New = flase;
And I am getting a parsing error:
org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: Encountered "." at line 1, column 123.
Superset doesn't handle nested data very well. Drill does, however, so you'll have to craft queries that produce flat columns that can be visualized.
Take a look here: https://drill.apache.org/docs/json-data-model/
and here: https://drill.apache.org/docs/querying-complex-data-introduction/.
UPDATE:
Try the query below. The FROM clause may not be exactly right, but you should get the idea from this.
Note that you can access maps in Drill in two ways:
tablename.mapname.field OR
mapname['field']
You can do this for any level of nesting.
SELECT mongoTable.jobs.job_ID.`$oid` AS job_ID,
mongoTable.jobs.`New` AS `new`,
mongoTable.jobs.`Expired` AS expired
FROM
(
SELECT flatten(t1.jobs) AS jobs
FROM mongo.recruitingdb.flatten.`Vendorjobs` AS t1
) AS mongoTable
-- Filter after flattening, with `New` quoted as in the select list
WHERE mongoTable.jobs.`New` = false
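To make the intent of the query concrete, here is a minimal Python sketch of what flattening the jobs array and then keeping only the rows where New is false does to the sample document from the question. It is an illustration only, not Drill code; the function name flatten_jobs is hypothetical, and the document is trimmed to the fields the query uses.

```python
# Sample document from the question, trimmed to the fields the query touches.
doc = {
    "_id": {"$oid": "6229d3cfdbfc81a8777e4821"},
    "jobs": [
        {"job_ID": {"$oid": "62289ded8079821eb24760e0"}, "New": False, "Expired": False},
        {"job_ID": {"$oid": "6228a252fb4554dd5c48202a"}, "New": True, "Expired": True},
        {"job_ID": {"$oid": "622af1c391b290d34701af9f"}, "New": True, "Expired": False},
    ],
}


def flatten_jobs(documents):
    """Yield one flat row per element of each document's `jobs` array (hypothetical helper)."""
    for d in documents:
        for job in d.get("jobs", []):
            yield {
                "job_ID": job["job_ID"]["$oid"],
                "new": job["New"],
                "expired": job["Expired"],
            }


# Equivalent of the New = false filter, applied to the flattened rows.
rows = [row for row in flatten_jobs([doc]) if row["new"] is False]
print(rows)
```

Each row is now a flat record with scalar columns, which is the shape Superset needs for charting.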
I have tried using BigQueryInsertJobOperator in Airflow to insert data into a table. It works completely fine for 1-2 rows.
My SQL query returns 100 million rows. The operator's task runs to completion with a successful status and even creates the table, but the table is empty.
task2 = BigQueryInsertJobOperator(
task_id="insert_query_job",
gcp_conn_id='xxxxxxxxx',
configuration={
"query": {
"query": my_sql_query,
"useLegacySql": False,
"destinationTable": {
"projectId": "xxxxxxxx",
"datasetId": 'yyyyyyyy',
"tableId": 'zzzzzzzzz'
},
"allowLargeResults":True
}
}
)
Given there is a Partitioned table in BigQuery, is it possible to make it non-partitioned?
AFAIK there is no feature/function to unpartition a partitioned table. However, there's nothing stopping you from doing a select * from partitioned_table and writing the results to a new (non-partitioned) table. With this approach you will, of course, pay the query cost of scanning the whole table.
Another way could be to export your table(s) to GCS and then load the exported file(s) back in. Loading doesn't cost anything, so you'd only pay for the brief amount of time the files are stored in GCS.
Is it possible to make a Partitioned table Non-partitioned in BigQuery?
It is possible!! Quick and Free ($0.00)
Step 1 - Create new non-partitioned table with exact same schema as your source table and no rows
#standardSQL
SELECT *
FROM `xxx.yyy.your_partitioned_table`
WHERE FALSE
Run the above with the destination table set to xxx.yyy.your_new_NON_PARTITIONED_table
Step 2 - Invoke Copy job with WRITE_APPEND as a writeDisposition
Your copy config will look like below
{
"configuration": {
"copy": {
"sourceTable": {
"projectId": "xxx",
"datasetId": "yyy",
"tableId": "your_partitioned_table"
},
"destinationTable": {
"projectId": "xxx",
"datasetId": "yyy",
"tableId": "your_new_NON_PARTITIONED_table"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND"
}
}
}
Mission accomplished!
Note: Both steps are free of charge!
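The two steps can also be written down as plain job configurations. The sketch below only builds the request bodies as Python dictionaries and does not call BigQuery. The helper names are hypothetical, and the project, dataset and table names are the placeholders used above.

```python
import json


def table_ref(project_id, dataset_id, table_id):
    """Build a BigQuery-style table reference dictionary."""
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}


def empty_clone_job(source, destination):
    """Step 1: a query job that returns no rows, so only the schema reaches the destination."""
    sql = "SELECT * FROM `{projectId}.{datasetId}.{tableId}` WHERE FALSE".format(**source)
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                "destinationTable": destination,
            }
        }
    }


def copy_job(source, destination):
    """Step 2: a copy job that appends the source rows to the new table."""
    return {
        "configuration": {
            "copy": {
                "sourceTable": source,
                "destinationTable": destination,
                "createDisposition": "CREATE_IF_NEEDED",
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }


src = table_ref("xxx", "yyy", "your_partitioned_table")
dst = table_ref("xxx", "yyy", "your_new_NON_PARTITIONED_table")
step1 = empty_clone_job(src, dst)
step2 = copy_job(src, dst)
print(json.dumps(step2, indent=2))
```

Keeping the two bodies as data makes it easy to submit them with whichever client you already use for job inserts.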
I've got a problem with a query that I can't figure out. I'm running this query in code:
var userList = (from user in this.documentSession.Query<User>()
where user.FederatedUserIds[authenticatedClient.ProviderName] == authenticatedClient.UserInformation.Id
select user).ToList();
In this case the provider name is facebook and the id is 100001103765630. FederatedUserIds is a Dictionary.
Which results in this query to the server:
http://localhost:8080/indexes/dynamic/Users?&query=FederatedUserIds.facebook%3A100001103765630&pageSize=128
This gives zero results, including when I run the same query from a web browser:
{
"Results": [],
"Includes": [],
"IsStale": false,
"IndexTimestamp": "2013-08-24T14:52:44.0511623Z",
"TotalResults": 0,
"SkippedResults": 0,
"IndexName": "Auto/Users/ByFederatedUserIds_facebook",
"IndexEtag": "01000000-0000-0064-0000-000000000001",
"ResultEtag": "2BD9AA1E-935A-FEDF-3636-FAB0F155ED9E",
"Highlightings": {},
"NonAuthoritativeInformation": false,
"LastQueryTime": "2013-08-24T15:00:30.1200358Z",
"DurationMilliseconds": 1
}
While I have a document that's like this in the database, so I expect 1 result instead of 0:
{
"DisplayName": "neographikal",
"RealName": "x",
"Email": "x",
"PictureUri": "x",
"Roles": [
"User"
],
"ProfileImages": [],
"FederatedUserIds": {
"google": "x",
"twitter": "x",
"windowslive": "x",
"linkedin": "x",
"facebook": "100001103765630"
}
}
The strange thing is, this has never bothered me before in this piece of code. Can somebody see where I'm doing this wrong?
I was going to say that you might have stale results, but I see that "IsStale": false.
The only other thing I see is that the query on the URL comes through as FederatedUserIds.facebook while the field name is going to be FederatedUserIds_facebook in the index. However, I tested this and it worked, so it appears that the . is translated to _ before the query executes. I'm not sure when this was added or if it's always been that way.
Note if you try to query with . in Raven Studio, it doesn't work there, but _ does.
What build version are you running? I tested on 2.5.2666 and it worked for me.
Somehow, this was related to the indexes. Although it was an auto-indexed query, deleting all the indexes on the server and restarting the application solved the problem. I don't think this should have happened, but reproducing it seemed very difficult. If I can reproduce it in the future, I'll try to investigate this further.
edit 01-09:
Found the problem and created a bug report: http://issues.hibernatingrhinos.com/issue/RavenDB-1334
Worked around it by creating a decent index:
public class UserByFederatedLoginIndex : AbstractIndexCreationTask<Core.Domain.User>
{
public UserByFederatedLoginIndex()
{
Map = users => from u in users
select new
{
u.DisplayName,
_ = u.FederatedUserIds.Select(x => CreateField("FederatedUserIds_"+x.Key, x.Value))
};
}
}
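As a rough illustration of what this map emits, the Python sketch below builds the index entry for the sample document from the question: one fixed field plus one dynamic field per dictionary key, named with the same FederatedUserIds_ prefix. It only mimics the field naming and is not RavenDB code; index_entry is a hypothetical name.

```python
def index_entry(user):
    """Return the fields the map would emit for one user document (illustrative only)."""
    entry = {"DisplayName": user["DisplayName"]}
    # One dynamic field per provider, mirroring CreateField("FederatedUserIds_" + key, value).
    for provider, user_id in user["FederatedUserIds"].items():
        entry["FederatedUserIds_" + provider] = user_id
    return entry


user = {
    "DisplayName": "neographikal",
    "FederatedUserIds": {
        "google": "x",
        "twitter": "x",
        "windowslive": "x",
        "linkedin": "x",
        "facebook": "100001103765630",
    },
}

entry = index_entry(user)
print(entry["FederatedUserIds_facebook"])
```

With an explicit field per provider, a query on FederatedUserIds_facebook has a concrete field to match against instead of relying on the auto index.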
I am currently setting up SQL Replication to replicate our RavenDB documents into SQL for reporting purposes. So far everything has been working great. However, I am now trying to replicate a document that contains an array of days of the week.
This is how the document looks in Raven:
{
"ClientId": "clients/385",
"Description": "Test",
"IsOneOff": false,
"RecursEveryWeeks": 1,
"StartDate": "2013-03-19T00:00:00.0000000",
"TaskStartTime": "12:00:00",
"TaskDuration": 120,
"TaskEndTime": "14:00:00",
"AdditionalResources": false,
"AdditionalVisitType": "TestType",
"BillableTo": "Private",
"RecurrenceEndDate": "2013-04-30T00:00:00.0000000",
"DaysOfWeek": [
"Monday",
"Tuesday",
"Wednesday",
"Friday",
"Saturday"
]
}
In SQL Replication I have done the following:
sqlReplicate("AdditionalVisit", "AdditionalVisitId", {
ClientId: this.ClientId,
Description: this.Description,
IsOneOff: this.IsOneOff,
RecursEveryWeeks: this.RecursEveryWeeks,
StartDate: this.StartDate,
TaskStartTime: this.TaskStartTime,
TaskDuration: this.TaskDuration,
TaskEndTime: this.TaskEndTime,
AdditionalResources: this.AdditionalResources,
AdditionalVisitType: this.AdditionalVisitType,
BillableTo: this.BillableTo,
RecurrenceEndDate: this.RecurrenceEndDate,
DaysOfWeek: this.DaysOfWeek
});
All of this works fine when I leave DaysOfWeek out of the SQL Replication script, but it causes the server to crash when I leave it in.
How should this be done in SQL Replication so everything in the array is saved to a DaysOfWeek column in SQL?
I've not tested this, but it's along the lines of what you want...just add this to the end of your current script.
for (var i=0; i<this.DaysOfWeek.length; i++) {
var day = this.DaysOfWeek[i];
sqlReplicate('AdditionalVisit_DaysOfWeek', 'AdditionalVisitId', {
AdditionalVisitId: documentId,
DayOfWeek: day,
});
}
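The shape of the data the loop above produces can be sketched in Python: the array becomes one child row per day, each carrying the parent document's id. This is only an illustration of the resulting rows, not replication script code; the function name and the document id used here are hypothetical.

```python
def days_of_week_rows(document_id, document):
    """Return one row per entry of DaysOfWeek, keyed by the parent document id."""
    return [
        {"AdditionalVisitId": document_id, "DayOfWeek": day}
        for day in document.get("DaysOfWeek", [])
    ]


visit = {
    "ClientId": "clients/385",
    "Description": "Test",
    "DaysOfWeek": ["Monday", "Tuesday", "Wednesday", "Friday", "Saturday"],
}

# "additionalvisits/1" is a made-up id for the example.
rows = days_of_week_rows("additionalvisits/1", visit)
for row in rows:
    print(row)
```

A separate child table like this keeps the parent row flat, and the days can be joined back on AdditionalVisitId for reporting.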
By the way, there is currently a bug in SQL Replication for RavenDB 2.1 where deletes won't be pushed through to SQL. It's supposed to be fixed in the 2.5 branch, but there are still some other issues that need to be worked on for it to become usable.