How to programmatically list available Google BigQuery locations? - google-bigquery

How to programmatically list available Google BigQuery locations? I need a result similar to what is in the table of this page: https://cloud.google.com/bigquery/docs/locations.

As @shollyman has mentioned:
The BigQuery API does not expose the equivalent of a list locations call at this time.
So, you should consider filing a feature request on the issue tracker.
Meanwhile, I wanted to add Option 3 to the two already proposed by @Tamir.
This is a somewhat naïve option with its pros and cons, but depending on your specific use case it can be useful and easily adapted to your application.
Step 1 - load the page's HTML (https://cloud.google.com/bigquery/docs/locations)
Step 2 - parse and extract the needed info
Obviously, this is super simple to implement in any client of your choice.
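For example, a rough Python sketch of those two steps (assuming the requests and beautifulsoup4 packages are available; the extraction is simplistic and will break if the page layout changes) could look like this:
import requests
from bs4 import BeautifulSoup

# Step 1 - load the page's HTML
html = requests.get("https://cloud.google.com/bigquery/docs/locations").text

# Step 2 - parse and extract the table rows with the location info
soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)  # e.g. ['Americas', 'us-east4', 'Northern Virginia']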
As I am a huge BigQuery fan, I went through a "proof of concept" using the BigQuery tool Magnus.
I've created a workflow with just two Tasks:
API Task - to load page's HTML into variable var_payload
and
BigQuery Task - to parse and extract wanted info out of html
The "whole" workflow is as simple as it looks in below screenshot
The query I used in BigQuery Task is
CREATE TEMP FUNCTION decode(x STRING) RETURNS STRING
LANGUAGE js AS """
return he.decode(x);
"""
OPTIONS (library="gs://my_bucket/he.js");
WITH t AS (
  SELECT html,
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(html, r'\n|<strong>|</strong>|<code>|</code>', ''),
      r'<table>(.*?)</table>'
    )[OFFSET(0)] x
  FROM (SELECT '''<var_payload>''' AS html)
)
SELECT pos,
  line[SAFE_OFFSET(0)] Area,
  line[SAFE_OFFSET(1)] Region_Name,
  decode(line[SAFE_OFFSET(2)]) Region_Description
FROM (
  SELECT pos, REGEXP_EXTRACT_ALL(line, '<td>(.*?)</td>') line
  FROM t, UNNEST(REGEXP_EXTRACT_ALL(x, r'<tr>(.*?)</tr>')) line WITH OFFSET pos
  WHERE pos > 0
)
As you can see, I used the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would ...
After the workflow is executed and those two steps are done, the result is in project.dataset.location_extraction, and we can query this table to make sure we've got what we expected.
Note: obviously, the parsing and extraction of the locations info is quite simplified and can surely be improved to be more flexible with respect to changes in the source page's layout.

Unfortunately, there is no API that provides the list of supported BigQuery locations.
I see two options which might be good for you:
Option 1
You can manually manage a list and expose this list to your client via an API or any other means your application supports (you will need to follow BigQuery product updates to keep this list current).
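For illustration, a minimal sketch of Option 1 in Python (the list below is a partial, hand-maintained subset, and the Flask endpoint is just one possible way to expose it):
from flask import Flask, jsonify

# Hand-maintained, partial list of BigQuery locations (illustrative only;
# it must be updated manually whenever Google adds or changes locations)
BIGQUERY_LOCATIONS = [
    {"region_name": "US", "description": "United States (multi-region)"},
    {"region_name": "EU", "description": "European Union (multi-region)"},
    {"region_name": "asia-northeast1", "description": "Tokyo"},
    {"region_name": "europe-west2", "description": "London"},
]

app = Flask(__name__)

@app.route("/bigquery/locations")
def list_bigquery_locations():
    # Expose the manually managed list to your clients
    return jsonify(BIGQUERY_LOCATIONS)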
Option 2
If your use case is to provide a list of the locations you are using to store your own data, you can call datasets.list, read the location of each dataset, and display/use it in your app
{
  "kind": "bigquery#dataset",
  "id": "id1",
  "datasetReference": {
    "datasetId": "datasetId",
    "projectId": "projectId"
  },
  "location": "US"
}
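With the Python client, Option 2 could look roughly like this sketch (it only reports the locations of your own datasets in the current project, not every location BigQuery supports):
from google.cloud import bigquery

client = bigquery.Client()

# Collect the distinct locations of the datasets in the current project
locations = set()
for item in client.list_datasets():
    dataset = client.get_dataset(item.reference)  # full metadata includes the location
    locations.add(dataset.location)

print(sorted(locations))  # e.g. ['EU', 'US']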

Related

Feature and FeatureView versioning

My team is interested in a feature store solution that enables rapid experimentation of features, probably using feature versioning. In the Feast Slack history, I found @Benjamin Tan's post that explains their Feast workflow, and they explain FeatureView versioning:
insights_v1 = FeatureView(
    features=[
        Feature(name="insight_type", dtype=ValueType.STRING)
    ]
)
insights_v2 = FeatureView(
    features=[
        Feature(name="customer_id", dtype=ValueType.STRING),
        Feature(name="insight_type", dtype=ValueType.STRING)
    ]
)
Is this the recommended best practice for FeatureView versioning? It looks like Features do not have a version field. Is there a recommended strategy for Feature versioning?
Creating a new column for each Feature version is one approach:
driver_rating_v1
driver_rating_v2
But that could get unwieldy if we want to experiment with dozens of permutations of the same Feature.
Featureform appears to have support for feature versions through the "variant" field, but their documentation is a bit unclear.
Adding additional clarity on Featureform: Variant is analogous to version. You'd supply a string which then becomes an immutable identifier for the version of the transformation, source, etc. Variant is one of the common metadata fields provided in the Featureform API.
Using the example of an ecommerce dataset & spark, here's an example of using the variant field to version a source (a parquet file in this case):
orders = spark.register_parquet_file(
    name="orders",
    variant="default",
    description="This is the core dataset. From each order you might find all other information.",
    file_path="path_to_file",
)
You can set the variant variable ahead of time:
VERSION="v1" # You can change this to rerun the definitions with with new variants
orders = spark.register_parquet_file(
    name="orders",
    variant=f"{VERSION}",
    description="This is the core dataset. From each order you might find all other information.",
    file_path="path_to_file",
)
And you can create versions or variants of the transformations -- here I'm taking a dataframe called total_paid_per_customer_per_day and aggregating it.
# Get average order value per day
@spark.df_transformation(inputs=[("total_paid_per_customer_per_day", "default")], variant="skeller88_20220110")
def average_daily_transaction(df):
    from pyspark.sql.functions import mean
    return df.groupBy("day_date").agg(mean("total_customer_order_paid").alias("average_order_value"))
There are some more details on the Featureform CLI here: https://docs.featureform.com/getting-started/interact-with-the-cli

How do you do pagination in GUN?

How do you do something like gun.get({startkey, endkey}) ?
Previously: https://github.com/amark/gun/issues/479
@qwe123wsx @sebastianmacias apologies for the delay! Originally posted at: https://github.com/amark/gun/issues/479
The wire spec has a protocol for this but it isn't implemented yet. It looks something like this:
gun.on('out', {get: {'#': {'>': 'a', '<': 'b'}}});
However this doesn't work yet. I would recommend instead:
(1) Pagination behavior is very different from one app to another and will be hard for us to create a "one-size-fits-all" solution, so it would be highly helpful if you could implement your own* pagination and make it available as a user-module, then we can learn from your experience (what worked, what didn't) and make the best solution part of core.
(2) Your app will probably work fine without pagination in the meanwhile, while it can be built (it is targeted for after 1.0), and then as your app becomes more popular, it should be fairly easy to add in without much refactor, once you need it and it is available.
... * How to build your own?
Lots of good articles on this, best one I've seen yet is from Neo4j on how to do it in a graph database (which applies to gun as well) https://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html .
Another rough idea is to model your data based on pagination or time. So rather than having ALL tweets go into the user's tweet table, the user's tweet table is a table of DAYS (or weeks), and then you put each tweet inside its week table. Now when you load the data, you can scan/skip based on the week very easily, while being super bandwidth efficient.
Rough pseudo code:
function onTweetSend(tweet){
  gun.get('user').get('alice').get('tweets').get(Date.uniqueYear() + Date.uniqueWeek()).set(tweet)
}
function paginateUserTweet(howMany, cb){
  var range = convertToArrayOfUniqueWeekNamesFromToday(howMany);
  var all = [];
  range.forEach(function(week){
    gun.get('user').get('alice').get('tweets').get(week).load(function(tweets){
      all.push(tweets);
      if(all.length < range.length){ return }
      all = flattenArray(all);
      cb(all);
    });
  });
}
Now we can use https://gun.eco/docs/RAD#lex
gun.get(...).get({'.': {'>': startkey, '<': endkey}, '%': 50000}).map().once(...)

Extract incident details from Service Now in Excel

I am trying to extract ticket details from ServiceNow. Is there a way to extract the details without ODBC? I have also tried the solution mentioned in [1]: https://community.servicenow.com/docs/DOC-3844, but I am receiving an error 9 - subscript out of range.
Is there a better way to extract the details efficiently? I tried asking this in the ServiceNow forum, but I thought I might get other opinions here.
It's been a while since this question was asked. Hopefully the following is still useful.
I am extracting change data (not incident), but the process should still be the same. You will need to gather the incident table and column information. Then there are a couple of ways to approach the problem.
1) If the data you are extracting has fixed parameters, such as a fixed period, fixed columns, group, etc., then you can create a report within ServiceNow and then use the REST/SOAP API to get the data in text/CSV format. You can use different Python modules to convert from CSV to xls or xlsx depending on your need; I used openpyxl, csv, xlrd, xlsxwriter, etc. (a minimal conversion sketch follows the linked example below).
See here for an example:
ServiceNow - How to use SOAP to download reports
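A minimal sketch of the CSV-to-xlsx conversion step mentioned above (file names are placeholders; assumes the openpyxl package is installed):
import csv
from openpyxl import Workbook

# Copy every row of the exported CSV report into a new .xlsx workbook
wb = Workbook()
ws = wb.active
with open("change_report.csv", newline="") as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save("change_report.xlsx")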
2) If the data has dynamic parameters, where you need to change columns, dates, filters, etc., you can still use the SOAP/REST API but form the query within Python scripts instead of having a static report. This way you can change it based on your requirements on the fly.
Here is an example query for the DB; you can reuse the example above and just swap the URL with the following.
table_name = 'u_change_table_name'  # SN table holding change/incident info
table_limit = 800
table_query = 'active=true&sysparm_display_value=true&planned_start_date=today'
date_query = 'chg_start_date>=javascript:gs.daysAgoStart(1)^active=true^chg_type=normal'
table_fields = 'chg_number,chg_start_date,chg_duration,chg_end_date'  # Actual column names from the DB, not from the SN report
url = (
    'https://yourcompany.service-now.com/api/now/table/' + table_name +
    '?sysparm_query=' + date_query + '&sysparm_fields=' +
    table_fields + '&sysparm_limit=' + str(table_limit)
)
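To actually pull the records with that URL, a rough sketch using the requests package might look like this (credentials are placeholders and error handling is simplified):
import requests

# Call the ServiceNow Table API with the query built above
response = requests.get(
    url,
    auth=("your_username", "your_password"),
    headers={"Accept": "application/json"},
)
response.raise_for_status()

for record in response.json().get("result", []):
    print(record)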

Should paging be zero indexed within an API?

When implementing a REST API with parameters for paging, should paging be zero-indexed or start at 1? The parameters would be Page and PageSize.
For me, it makes sense to start at 1, since we are talking about pages.
There's no standard for it. Just have a look around: there are hundreds of thousands of APIs using different approaches.
Most of the APIs I know use one of the following approaches for pagination:
offset and limit or
page and size
Both can be 0 or 1 indexed. Which is better? That's up to you.
Just pick the one that fits your needs and document it properly.
Additionally, you could provide some links in the response payload to make the navigation easier between the pages.
Consider, for example, you are reading data from page 2. So, provide a link for the previous page (page 1) and for the next page (page 3):
{
  "data": [
    ...
  ],
  "paging": {
    "previous": "http://api.example.com/foo?page=1&size=10",
    "next": "http://api.example.com/foo?page=3&size=10"
  }
}
And remember, always make an API you would love to use.
True, there's no standard for this.
I find that Microsoft-based products (old ones like DAO for Visual Basic 6, Visual C++ 6, and similar products) used to start their pagination from 1, but a lot of other tech stacks use 0. Gradually I find that more and more libraries are using 0 instead of 1.
Why is this? It's because, mathematically speaking, it's easier to map a pageIndex starting from 0 to a rowNumber in a DB or array. Suppose you have a dataset fetched from a table in a DB with 100 records. Now you want to send the second page (pageSize = 10, for example). With pageIndex starting from 0, you only need to write
startRowNumber = pageIndex * pageSize;
return dataSet[startRowNumber, startRowNumber + pageSize]
Because in most DBs and languages, arrays/lists are 0-indexed. And even if your REST API language uses a 1-indexed array, you would still have a problem when mapping a 1-indexed pageIndex to record IDs. For example: suppose you have a dataset indexed 1..100 (not 0..99), and you want to send the 11th to 20th records as the second page (here pageSize = 10 and pageIndex = 2, because in this case you start with 1). This means you need to use the formula
((pageIndex - 1) * pageSize) + 1 ; // to get the number 11.
You can see that it's easier to have 0-indexed paging for developers.
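As a hypothetical illustration of the two conventions in plain Python (the names are made up for this sketch):
def page_slice_zero_indexed(data, page_index, page_size):
    # page_index starts at 0: the offset is a simple product
    start = page_index * page_size
    return data[start:start + page_size]

def page_slice_one_indexed(data, page_index, page_size):
    # page_index starts at 1: shift back by one page before multiplying
    start = (page_index - 1) * page_size
    return data[start:start + page_size]

records = list(range(1, 101))                    # records 1..100
print(page_slice_zero_indexed(records, 1, 10))   # second page: 11..20
print(page_slice_one_indexed(records, 2, 10))    # same second page: 11..20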
1-indexed pagination makes more sense to human users, because we start with 1 when counting everything.

Endeca UrlENEQuery java API search

I'm currently trying to create an Endeca query using the Java API for a URLENEQuery. The current query is:
collection()/record[CONTACT_ID = "xxxxx" and SALES_OFFICE = "yyyy"]
I need it to be:
collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and
SALES_OFFICE = "yyyy"]
Currently this is being done with an ERecSearchList with CONTACT_ID and the string I'm trying to match in an ERecSearch object, but I'm having difficulty figuring out how to get the UrlENEQuery to generate the or in the correct fashion as I have above. Does anyone know how I can do this?
One of us is confused on multiple levels; let me try to explain why I am confused:
If Contact_ID and Sales_Office are different dimensions, where Contact_ID is a multi-or dimension, then you don't need to use EQL (the XPath-like language) to do anything. Just select the appropriate dimension values and your navigation state will reflect the query you are trying to build with XPath, i.e. CONTACT_IDs "ORed together" with SALES_OFFICE "ANDed".
If you do have to use EQL, then the only way to modify it (provided that you have to modify it from the returned results) is via string manipulation.
ERecSearchList gives you the ability to use the "Search Within" functionality, which works completely differently from EQL filtering, though you can achieve similar results by using tricks like searching only a specified field (which would be separate from the generic search interface). I am still not sure what the connection is between ERecSearchList and the EQL expression above.
Having expressed my confusion, I think what you need to do is use string manipulation to dynamically build the EQL expression and add it to the query.
A code example of what you are doing would be extremely helpful as well.
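As a rough, hypothetical illustration of that string manipulation (this only builds the EQL string; nothing here is an Endeca API call):
# Build the OR'ed CONTACT_ID clause and AND it with SALES_OFFICE
contact_ids = ["xxxxx", "zzzzz"]
sales_office = "yyyy"

contact_clause = " or ".join(f'CONTACT_ID = "{c}"' for c in contact_ids)
eql = f'collection()/record[({contact_clause}) and SALES_OFFICE = "{sales_office}"]'
print(eql)
# collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and SALES_OFFICE = "yyyy"]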