Is the Mapbox access token still required for Kepler.gl?

We plan to integrate Kepler.gl into our open source analytics platform, which will be used by many users. We have successfully integrated the library into our code and are able to visualize beautiful maps with our data. The library is awesome and we want to use it, but we aren't sure about the Mapbox integration. Our code works without setting any Mapbox access token. Is the token no longer required if one uses Kepler.gl with default settings? Are there any limitations in using it without a token, e.g. will it stop working if more than 100 users use it?
This is the Kepler.gl related code that we used:
map_1 = KeplerGl(show_docs=False)
map_1.add_data(data=gdf.copy(), name="state")
config = {}
if self.save_config:
    # Save map_1 config to a file
    # config_str = json.dumps(map_1.config)
    # if type(config) == str:
    #     config = config.encode("utf-8")
    with open("kepler_config.json", "w") as f:
        f.write(json.dumps(map_1.config))
if self.load_config:
    with open("kepler_config.json", "r") as f:
        config = json.loads(f.read())
    map_1.config = config
# map_1.add_data(data=data.copy(), name="haha")
html = map_1._repr_html_()
html = html.decode("utf-8")
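As a side note, once a saved config has been loaded back from disk, it can also be handed to the widget at construction time. This is only a sketch assuming the keplergl Jupyter widget API, whose constructor accepts a config dict; gdf is the same GeoDataFrame as above:

import json
from keplergl import KeplerGl

# Reload a previously saved config and pass it to the widget up front.
with open("kepler_config.json", "r") as f:
    saved_config = json.load(f)

map_2 = KeplerGl(config=saved_config)
map_2.add_data(data=gdf.copy(), name="state")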
Thanks for your help
Tobias

Related

Why does urllib3 fail to download a list of files if authentication is required and headers aren't re-created?

*NOTE: I'm posting this for two reasons:
1. It took me maybe 3-4 hours to figure out a solution (this is my first urllib3 project) and hopefully this will help others who run into the same thing.
2. I'm curious why urllib3 behaves as described below, as it is (to me anyway) very unintuitive.*
I'm using urllib3 to first load a list of files and then to download the files that are on the list. The server the files are on requires authentication.
The behavior I ran into is that if I don't re-make the headers before requesting each file through the PoolManager, only the first file downloads correctly; the contents of every subsequent file are an error message from the server saying that authentication failed.
However, if I add a line that regenerates the headers (see the commented line in the code snippet below), the download works as expected. Is this intended behavior, and if so, can anyone explain why the headers can't be re-used? All they contain is my username/password, which doesn't change.
import shutil
import tqdm
import urllib3

http = urllib3.PoolManager(num_pools=10, maxsize=10, block=True)
myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')

# Fetch the listing page and pull the hrefs out of it
files = http.request('GET', url, headers=myHeaders)
file_list = files.data.decode('utf-8')
file_list = file_list.split('<a href="')
file_list_a = [file.split('">')[0] for file in file_list if file.startswith('https://')]

for path in tqdm.tqdm(file_list_a, desc='Downloading'):
    output_fn = get_output_filename(path, output_dir)
    # ___---^^^Re-make headers^^^---___
    myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')
    with open(output_fn, 'wb') as out:
        r = http.request('GET', path, headers=myHeaders, preload_content=False)
        shutil.copyfileobj(r, out)
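For what it's worth, one way to avoid rebuilding the headers dict for every download is to attach the auth headers to the PoolManager itself, so any request that doesn't pass its own headers argument falls back to them. This is only a sketch of an alternative setup (reusing username, password, path and output_fn from the snippet above), not an explanation of the behavior described:

import shutil
import urllib3

# Default headers set on the PoolManager are sent with every request that
# doesn't supply its own headers= argument.
http = urllib3.PoolManager(
    num_pools=10,
    maxsize=10,
    block=True,
    headers=urllib3.util.make_headers(basic_auth=f'{username}:{password}'),
)

r = http.request('GET', path, preload_content=False)
with open(output_fn, 'wb') as out:
    shutil.copyfileobj(r, out)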
Thanks in advance,

Is there a way to automate this Python script in GCP?

I am a complete beginner in using GCP functions/products.
I have written the code below, which takes a list of cities from a local file, calls in weather data for each city in that list, and eventually uploads those weather values into a table in BigQuery. I don't need to change the code anymore, as it creates new tables when a new week begins. Now I would like to "deploy" it (I am not even sure if this is called deploying code) in the cloud so it runs there automatically. I tried using App Engine and Cloud Functions but faced issues in both places.
import requests, json, sqlite3, os, csv, datetime, re
from google.cloud import bigquery
#from google.cloud import storage

list_city = []
with open("list_of_cities.txt", "r") as pointer:
    for line in pointer:
        list_city.append(line.strip())

API_key = "PLACEHOLDER"
Base_URL = "http://api.weatherapi.com/v1/history.json?key="

yday = datetime.date.today() - datetime.timedelta(days=1)
Date = yday.strftime("%Y-%m-%d")

table_id = f"sonic-cat-315013.weather_data.Historical_Weather_{yday.isocalendar()[0]}_{yday.isocalendar()[1]}"

credentials_path = r"PATH_TO_JSON_FILE"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path
client = bigquery.Client()

try:
    schema = [
        bigquery.SchemaField("city", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Date", "Date", mode="REQUIRED"),
        bigquery.SchemaField("Hour", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Temperature", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Humidity", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Condition", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Chance_of_rain", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Precipitation_mm", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Cloud_coverage", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Visibility_km", "FLOAT", mode="REQUIRED")
    ]
    table = bigquery.Table(table_id, schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="Date",  # name of column to use for partitioning
    )
    table = client.create_table(table)  # Make an API request.
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
except:
    print("Table {}_{} already exists".format(yday.isocalendar()[0], yday.isocalendar()[1]))

def get_weather():
    try:
        x["location"]
    except:
        print(f"API could not call city {city_name}")
    global day, time, dailytemp, dailyhum, dailycond, chance_rain, Precipitation, Cloud_coverage, Visibility_km
    day = []
    time = []
    dailytemp = []
    dailyhum = []
    dailycond = []
    chance_rain = []
    Precipitation = []
    Cloud_coverage = []
    Visibility_km = []
    for i in range(24):
        dayval = re.search("^\S*\s", x["forecast"]["forecastday"][0]["hour"][i]["time"])
        timeval = re.search("\s(.*)", x["forecast"]["forecastday"][0]["hour"][i]["time"])
        day.append(dayval.group()[:-1])
        time.append(timeval.group()[1:])
        dailytemp.append(x["forecast"]["forecastday"][0]["hour"][i]["temp_c"])
        dailyhum.append(x["forecast"]["forecastday"][0]["hour"][i]["humidity"])
        dailycond.append(x["forecast"]["forecastday"][0]["hour"][i]["condition"]["text"])
        chance_rain.append(x["forecast"]["forecastday"][0]["hour"][i]["chance_of_rain"])
        Precipitation.append(x["forecast"]["forecastday"][0]["hour"][i]["precip_mm"])
        Cloud_coverage.append(x["forecast"]["forecastday"][0]["hour"][i]["cloud"])
        Visibility_km.append(x["forecast"]["forecastday"][0]["hour"][i]["vis_km"])
    for i in range(len(time)):
        time[i] = int(time[i][:2])

def main():
    i = 0
    while i < len(list_city):
        try:
            global city_name
            city_name = list_city[i]
            complete_URL = Base_URL + API_key + "&q=" + city_name + "&dt=" + Date
            response = requests.get(complete_URL, timeout=10)
            global x
            x = response.json()
            get_weather()
            table = client.get_table(table_id)
            varlist = []
            for j in range(24):
                variables = city_name, day[j], time[j], dailytemp[j], dailyhum[j], dailycond[j], chance_rain[j], Precipitation[j], Cloud_coverage[j], Visibility_km[j]
                varlist.append(variables)
            client.insert_rows(table, varlist)
            print(f"City {city_name}, ({i+1} out of {len(list_city)}) successfully inserted")
            i += 1
        except Exception as e:
            print(e)
            continue
In the code, there are direct references to two files that are located locally: one is the list of cities and the other is the JSON file containing the credentials to access my project in GCP. I believed that uploading these files to Cloud Storage and referencing them there wouldn't be an issue, but then I realised that I can't actually access my buckets in Cloud Storage without using the credentials file.
This leaves me unsure whether the entire process is possible at all: how do I authenticate from the cloud in the first place, if I need to reference the credentials file locally first? It seems like an endless circle, where I'd authenticate with the file in Cloud Storage, but I'd need authentication first to access that file.
I'd really appreciate some help here. I have no idea where to go from this, and I also don't have great knowledge of SE/CS; I only know Python, R and SQL.
For Cloud Functions, the deployed function will run with the project service account credentials by default, without needing a separate credentials file. Just make sure this service account is granted access to whatever resources it will be trying to access.
You can read more info about this approach here (along with options for using a different service account if you desire): https://cloud.google.com/functions/docs/securing/function-identity
This approach is very easy, and keeps you from having to deal with a credentials file at all on the server. Note that you should remove the os.environ line, as it's unneeded. The BigQuery client will use the default credentials as noted above.
If you want the code to run the same whether on your local machine or deployed to the cloud, simply set a "GOOGLE_APPLICATION_CREDENTIALS" environment variable permanently in the OS on your machine. This is similar to what you're doing in the code you posted; however, you're temporarily setting it every time using os.environ rather than permanently setting the environment variable on your machine. The os.environ call only sets that environment variable for that one process execution.
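To make that concrete, here is a minimal sketch of what a Cloud Functions entry point could look like when it relies on the runtime's default service account instead of a key file. The function name, the HTTP trigger and the return value are placeholders, not part of the original code:

# Sketch of an HTTP-triggered Cloud Function (Python runtime).
# bigquery.Client() picks up the function's default service account via
# Application Default Credentials, so no key file and no os.environ line
# are needed.
from google.cloud import bigquery

def load_weather(request):
    client = bigquery.Client()
    # ... create the weekly table and insert rows as in the script above ...
    return "ok"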
If for some reason you don't want to use the default service account approach outlined above, you can instead reference the credentials file directly when you instantiate the bigquery.Client():
https://cloud.google.com/bigquery/docs/authentication/service-account-file
You just need to package the credentials file with your code (i.e. in the same folder as your main.py file) and deploy it alongside, so it's in the execution environment. In that case it is referenceable/loadable from your script without needing any special permissions or credentials; just provide the relative path to the file (i.e. assuming it's in the same directory as your Python script, reference only the filename).
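A minimal sketch of that variant, assuming the key file is deployed next to the script and named credentials.json (the filename is a placeholder):

from google.cloud import bigquery

# Load credentials from a key file shipped alongside the code.
# "credentials.json" stands in for whatever the packaged file is called.
client = bigquery.Client.from_service_account_json("credentials.json")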
There are different flavors and options for deploying your application, and they will depend on your application's semantics and execution constraints.
It would be too hard to cover all of them here, and the official Google Cloud Platform documentation covers them in great detail:
Google Compute Engine
Google Kubernetes Engine
Google App Engine
Google Cloud Functions
Google Cloud Run
Based on my understanding of your application design, the most suitable ones would be:
Google App Engine
Google Cloud Functions
Google Cloud Run: check these criteria to see if your application is a good fit for this deployment style
I would suggest using Cloud Functions as your deployment option, in which case your application will default to using the project's App Engine service account to authenticate itself and perform allowed actions. Hence, you should only check that the default account PROJECT_ID@appspot.gserviceaccount.com has proper access to the needed APIs (BigQuery in your case) under the IAM configuration section.
With such a setup, you won't need to push your service account key to Cloud Storage (which I would recommend avoiding in any case), and you won't need to pull it either, as the runtime will handle authenticating the function for you.

Splunk dashboard and reports source backup for versioning

We are trying to take a backup of the Splunk dashboard and report source code for versioning. We are on an enterprise implementation where our REST calls are restricted. We can create and access dashboards and reports via the Splunk UI, but we would like to know if we can automatically back them up and store them in our versioning system.
Automated versioning will be quite a challenge without REST access. I assume you don't have CLI access or you wouldn't be asking.
There are apps available to do this for you. See https://splunkbase.splunk.com/app/4355/ and https://splunkbase.splunk.com/app/4182/ .
There's also a .conf presentation on the topic. See https://conf.splunk.com/files/2019/slides/FN1315.pdf
For the time being, I have written a Python script to read/intercept the browser's Inspect -> Performance -> Network responses for Splunk's reports URL, which lists the complete set of reports under my app along with their full details.
from time import sleep
import json
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs

driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="/Users/username/Downloads/chromedriver_92"
)

spl_reports_url = "https://prod.aws-cloud-splunk.com/en-US/app/sre_app/reports"
driver.get(spl_reports_url)
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]

# create directory to save all reports as .json files
from pathlib import Path
main_bkp_folder = 'splunk_prod_reports'
# Create a main directory in which all dashboards will be downloaded to
Path(f"./{main_bkp_folder}").mkdir(parents=True, exist_ok=True)

# Function to write json content to file
def write_json_to_file(filenamewithpath, json_source):
    with open(filenamewithpath, 'w') as jsonfileobj:
        json_string = json.dumps(json_source, indent=4)
        jsonfileobj.write(json_string)

def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )

counter = 0
# extract Network entries from each log event
for log in filter(log_filter, logs):
    #print(log)
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    # keep only the responses from the saved-searches endpoint
    if "searches" in resp_url:
        print(f"Caught {resp_url}")
        counter = counter + 1
        nw_resp_body = json.loads(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})['body'])
        for each_report in nw_resp_body["entry"]:
            report_name = each_report['name']
            print(f"Extracting report source for {report_name}")
            report_filename = f"./{main_bkp_folder}/{report_name.replace(' ','_')}.json"
            write_json_to_file(report_filename, each_report)
            print("Completed.")
print("All reports source code exported successfully.")
The above code is far from a production version, as I have yet to add error handling, logging, and modularization.
Also note that the above script drives the browser UI; in production the script will be run in a Docker image with ChromeOptions set to headless mode.
Instead of
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="/Users/username/Downloads/chromedriver_92"
)
Use:
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,2080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(
    desired_capabilities=capabilities, options=chrome_options, executable_path="/Users/username/Downloads/chromedriver_92"
)
Here you can customize as per your need.
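To close the loop on versioning, the exported .json files could then be committed to a git repository. This is just a sketch of one possible follow-up step (the repository location and commit message are placeholders, and it assumes git is already initialised in the output folder):

import subprocess

# Stage and commit the exported report sources; paths/message are placeholders.
repo_dir = "./splunk_prod_reports"
subprocess.run(["git", "-C", repo_dir, "add", "-A"], check=True)
subprocess.run(["git", "-C", repo_dir, "commit", "-m", "Automated Splunk report backup"], check=True)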

Ldap authentication with MoinMoin doesn't work

I'm trying to connect MoinMoin to my LDAP server, but it doesn't work. Am I doing the configuration properly?
I'm using MoinMoin from Ubuntu's repository.
Here is my farmconfig.py:
from farmconfig import FarmConfig

# now we subclass that config (inherit from it) and change what's different:
class Config(FarmConfig):

    # basic options (you normally need to change these)
    sitename = u'MyWiki'       # [Unicode]
    interwikiname = u'MyWiki'  # [Unicode]

    # name of entry page / front page [Unicode], choose one of those:
    # a) if most wiki content is in a single language
    #page_front_page = u"MyStartingPage"
    # b) if wiki content is maintained in many languages
    page_front_page = u"FrontPage"

    data_dir = '/usr/share/moin/data'
    data_underlay_dir = '/usr/share/moin/underlay'

from MoinMoin.auth.ldap_login import LDAPAuth

ldap_authenticator1 = LDAPAuth(
    server_uri='ldap://192.168.1.196',
    bind_dn='cn=admin,ou=People,dc=company,dc=com',
    bind_pw='secret',
    scope=2,
    referrals=0,
    search_filter='(uid=%(username)s)',
    givenname_attribute='givenName',
    surname_attribute='sn',
    aliasname_attribute='displayName',
    email_attribute='mailRoutingAddress',
    email_callback=None,
    coding='utf-8',
    timeout=10,
    start_tls=0,
    tls_cacertdir=None,
    tls_cacertfile=None,
    tls_certfile=None,
    tls_keyfile=None,
    tls_require_cert=0,
    bind_once=True,
    autocreate=True,
)

auth = [ldap_authenticator1, ]
cookie_lifetime = 1
This is an indentation issue: auth and cookie_lifetime must be within class Config, so just indent all that by 4 spaces.
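For illustration, a sketch of just the relevant tail of the config with that fix applied (the basic options and the remaining LDAPAuth arguments are unchanged and elided here):

from farmconfig import FarmConfig
from MoinMoin.auth.ldap_login import LDAPAuth

class Config(FarmConfig):
    # ... sitename, data_dir, etc. unchanged ...

    ldap_authenticator1 = LDAPAuth(
        server_uri='ldap://192.168.1.196',
        # ... remaining LDAPAuth options unchanged ...
    )

    # indented by 4 spaces so they become attributes of Config
    auth = [ldap_authenticator1, ]
    cookie_lifetime = 1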

Locally calculate dropbox hash of files

The Dropbox REST API's metadata function has a parameter named "hash": https://www.dropbox.com/developers/reference/api#metadata
Can I calculate this hash locally, without calling any remote REST API function?
I need to know this value to reduce upload bandwidth.
https://www.dropbox.com/developers/reference/content-hash explains how Dropbox computes their file hashes. A Python implementation of this is below:
import hashlib
import math
import os

DROPBOX_HASH_CHUNK_SIZE = 4 * 1024 * 1024

def compute_dropbox_hash(filename):
    file_size = os.stat(filename).st_size
    with open(filename, 'rb') as f:
        block_hashes = b''
        while True:
            chunk = f.read(DROPBOX_HASH_CHUNK_SIZE)
            if not chunk:
                break
            block_hashes += hashlib.sha256(chunk).digest()
        return hashlib.sha256(block_hashes).hexdigest()
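As a usage sketch, the local result can be compared against what Dropbox reports for the remote copy. This assumes the v2 Python SDK, where files_get_metadata() returns a FileMetadata carrying a content_hash attribute; the token and paths below are placeholders:

import dropbox

# Compare the locally computed hash with the one Dropbox stores for the file.
dbx = dropbox.Dropbox("YOUR_ACCESS_TOKEN")
local_hash = compute_dropbox_hash("hello.txt")
remote_md = dbx.files_get_metadata("/demo/hello.txt")
if getattr(remote_md, "content_hash", None) == local_hash:
    print("Unchanged, no need to upload")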
The "hash" parameter on the metadata call isn't actually the hash of the file, but a hash of the metadata. It's purpose is to save you having to re-download the metadata in your request if it hasn't changed by supplying it during the metadata request. It is not intended to be used as a file hash.
Unfortunately I don't see any way via the Dropbox API to get a hash of the file itself. I think your best bet for reducing your upload bandwidth would be to keep track of the hash's of your files locally and detect if they have changed when determining whether to upload them. Depending on your system you also likely want to keep track of the "rev" (revision) value returned on the metadata request so you can tell whether the version on Dropbox itself has changed.
This won't directly answer your question, but is meant more as a workaround: the Dropbox SDK ships a simple updown.py example that uses file size and modification time to check the currency of a file.
An abbreviated example taken from updown.py:
dbx = dropbox.Dropbox(api_token)
...
# returns a dictionary of name: FileMetaData
listing = list_folder(dbx, folder, subfolder)
# name is the name of the file
md = listing[name]
# fullname is the path of the local file
mtime = os.path.getmtime(fullname)
mtime_dt = datetime.datetime(*time.gmtime(mtime)[:6])
size = os.path.getsize(fullname)
if (isinstance(md, dropbox.files.FileMetadata) and mtime_dt == md.client_modified and size == md.size):
    print(name, 'is already synced [stats match]')
As far as I am concerned, no, you can't.
The only way is using the Dropbox API, which is explained here.
The rclone Go program from https://rclone.org has exactly what you want:
rclone hashsum dropbox localfile
rclone hashsum dropbox localdir
It can't take more than one path argument but I suspect that's something you can work with...
t0|todd@tlaptop/p8 ~/tmp|295$ echo "Hello, World!" > dropbox-hash-demo/hello.txt
t0|todd@tlaptop/p8 ~/tmp|296$ rclone copy dropbox-hash-demo/hello.txt dropbox-ttf:demo
t0|todd@tlaptop/p8 ~/tmp|297$ rclone hashsum dropbox dropbox-hash-demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt
t0|todd@tlaptop/p8 ~/tmp|298$ rclone hashsum dropbox dropbox-ttf:demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt