Splunk dashboard and reports source backup for versioning

We are trying to take a backup of Splunk dashboard and report source code for versioning. We are on an enterprise implementation where our REST calls are restricted. We can create and access dashboards and reports via the Splunk UI, but we would like to know if we can automatically back them up and store them in our version control system.

Automated versioning will be quite a challenge without REST access. I assume you don't have CLI access or you wouldn't be asking.
There are apps available to do this for you. See https://splunkbase.splunk.com/app/4355/ and https://splunkbase.splunk.com/app/4182/ .
There's also a .conf presentation on the topic. See https://conf.splunk.com/files/2019/slides/FN1315.pdf

For the time being, I have written a Python script that reads/intercepts the UI Inspect -> Performance -> Network responses for Splunk's reports URL, which lists the complete set of reports under my app along with their full details.
from time import sleep
import json
from pathlib import Path

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# Make Chrome log network requests via the performance log
capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}

driver = webdriver.Chrome(
    desired_capabilities=capabilities,
    executable_path="/Users/username/Downloads/chromedriver_92",
)

spl_reports_url = "https://prod.aws-cloud-splunk.com/en-US/app/sre_app/reports"
driver.get(spl_reports_url)
sleep(5)  # wait for the requests to take place

# Extract requests from the performance log
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]

# Create a main directory in which all reports will be saved as .json files
main_bkp_folder = 'splunk_prod_reports'
Path(f"./{main_bkp_folder}").mkdir(parents=True, exist_ok=True)

# Function to write JSON content to a file
def write_json_to_file(filenamewithpath, json_source):
    with open(filenamewithpath, 'w') as jsonfileobj:
        json_string = json.dumps(json_source, indent=4)
        jsonfileobj.write(json_string)

def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and is JSON
        and "json" in log_["params"]["response"]["mimeType"]
    )

counter = 0
# Extract the Network entries from each log event
for log in filter(log_filter, logs):
    # print(log)
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    # only process the saved-searches (reports) responses
    if "searches" in resp_url:
        print(f"Caught {resp_url}")
        counter = counter + 1
        nw_resp_body = json.loads(
            driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})['body']
        )
        for each_report in nw_resp_body["entry"]:
            report_name = each_report['name']
            print(f"Extracting report source for {report_name}")
            report_filename = f"./{main_bkp_folder}/{report_name.replace(' ', '_')}.json"
            write_json_to_file(report_filename, each_report)
        print("Completed.")

print("All reports source code exported successfully.")
The above code is far from a production version; error handling, logging, and modularization still need to be added.
Also note that the script above drives the browser UI; in production the script will run in a Docker image with ChromeOptions set to headless mode.
Instead of:
driver = webdriver.Chrome(
    desired_capabilities=capabilities,
    executable_path="/Users/username/Downloads/chromedriver_92",
)
use:
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,2080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

driver = webdriver.Chrome(
    desired_capabilities=capabilities,
    options=chrome_options,
    executable_path="/Users/username/Downloads/chromedriver_92",
)
You can customize these options as needed.
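Since the end goal is versioning, a natural follow-up is to commit the exported .json files after each run. The sketch below is not part of the original script and assumes the backup folder is already an initialized Git repository with a configured remote; the helper name is hypothetical.

import subprocess
from datetime import datetime

def commit_backup(repo_dir="splunk_prod_reports"):
    # Hypothetical helper: stage and commit the freshly exported report sources.
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # 'git commit' exits non-zero when nothing changed, so don't treat that as fatal
    subprocess.run(
        ["git", "commit", "-m", f"Splunk reports backup {timestamp}"],
        cwd=repo_dir,
        check=False,
    )
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)

commit_backup()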

Related

Is the Mapbox access token still required for Kepler.gl?

We plan to integrate Kepler.gl into our open-source analytics platform, which will be used by many users. We have successfully integrated the library into our code and are able to visualize beautiful maps with our data. The library is awesome and we want to use it, but we aren't sure about the Mapbox integration. Our code works without setting any Mapbox access token. Is the token no longer required if one uses Kepler.gl with default settings? Are there any limitations to using it without a token, e.g. will it stop working if more than 100 users use it?
This is the Kepler.gl related code that we used:
map_1 = KeplerGl(show_docs=False)
map_1.add_data(data=gdf.copy(), name="state")
config = {}
if self.save_config:
    # Save map_1 config to a file
    # config_str = json.dumps(map_1.config)
    # if type(config) == str:
    #     config = config.encode("utf-8")
    with open("kepler_config.json", "w") as f:
        f.write(json.dumps(map_1.config))
if self.load_config:
    with open("kepler_config.json", "r") as f:
        config = json.loads(f.read())
    map_1.config = config
# map_1.add_data(data=data.copy(), name="haha")
html = map_1._repr_html_()
html = html.decode("utf-8")
Thanks for your help
Tobias

Is there a way to automate this Python script in GCP?

I am a complete beginner in using GCP functions/products.
I have written the code below. It takes a list of cities from a local folder, calls in weather data for each city in that list, and eventually uploads those weather values into a table in BigQuery. I don't need to change the code anymore, as it creates new tables when a new week begins. Now I would like to "deploy" it (I am not even sure that is the right word) in the cloud so that it runs there automatically. I tried using App Engine and Cloud Functions but faced issues in both places.
import requests, json, sqlite3, os, csv, datetime, re
from google.cloud import bigquery
# from google.cloud import storage

list_city = []
with open("list_of_cities.txt", "r") as pointer:
    for line in pointer:
        list_city.append(line.strip())

API_key = "PLACEHOLDER"
Base_URL = "http://api.weatherapi.com/v1/history.json?key="

yday = datetime.date.today() - datetime.timedelta(days=1)
Date = yday.strftime("%Y-%m-%d")

table_id = f"sonic-cat-315013.weather_data.Historical_Weather_{yday.isocalendar()[0]}_{yday.isocalendar()[1]}"

credentials_path = r"PATH_TO_JSON_FILE"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path
client = bigquery.Client()

try:
    schema = [
        bigquery.SchemaField("city", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Date", "Date", mode="REQUIRED"),
        bigquery.SchemaField("Hour", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Temperature", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Humidity", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Condition", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Chance_of_rain", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Precipitation_mm", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Cloud_coverage", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Visibility_km", "FLOAT", mode="REQUIRED")
    ]

    table = bigquery.Table(table_id, schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="Date",  # name of column to use for partitioning
    )
    table = client.create_table(table)  # Make an API request.
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
except:
    print("Table {}_{} already exists".format(yday.isocalendar()[0], yday.isocalendar()[1]))

def get_weather():
    try:
        x["location"]
    except:
        print(f"API could not call city {city_name}")

    global day, time, dailytemp, dailyhum, dailycond, chance_rain, Precipitation, Cloud_coverage, Visibility_km
    day = []
    time = []
    dailytemp = []
    dailyhum = []
    dailycond = []
    chance_rain = []
    Precipitation = []
    Cloud_coverage = []
    Visibility_km = []

    for i in range(24):
        dayval = re.search("^\S*\s", x["forecast"]["forecastday"][0]["hour"][i]["time"])
        timeval = re.search("\s(.*)", x["forecast"]["forecastday"][0]["hour"][i]["time"])
        day.append(dayval.group()[:-1])
        time.append(timeval.group()[1:])
        dailytemp.append(x["forecast"]["forecastday"][0]["hour"][i]["temp_c"])
        dailyhum.append(x["forecast"]["forecastday"][0]["hour"][i]["humidity"])
        dailycond.append(x["forecast"]["forecastday"][0]["hour"][i]["condition"]["text"])
        chance_rain.append(x["forecast"]["forecastday"][0]["hour"][i]["chance_of_rain"])
        Precipitation.append(x["forecast"]["forecastday"][0]["hour"][i]["precip_mm"])
        Cloud_coverage.append(x["forecast"]["forecastday"][0]["hour"][i]["cloud"])
        Visibility_km.append(x["forecast"]["forecastday"][0]["hour"][i]["vis_km"])

    for i in range(len(time)):
        time[i] = int(time[i][:2])

def main():
    i = 0
    while i < len(list_city):
        try:
            global city_name
            city_name = list_city[i]
            complete_URL = Base_URL + API_key + "&q=" + city_name + "&dt=" + Date
            response = requests.get(complete_URL, timeout=10)
            global x
            x = response.json()
            get_weather()
            table = client.get_table(table_id)
            varlist = []
            for j in range(24):
                variables = city_name, day[j], time[j], dailytemp[j], dailyhum[j], dailycond[j], chance_rain[j], Precipitation[j], Cloud_coverage[j], Visibility_km[j]
                varlist.append(variables)
            client.insert_rows(table, varlist)
            print(f"City {city_name}, ({i+1} out of {len(list_city)}) successfully inserted")
            i += 1
        except Exception as e:
            print(e)
            continue
In the code, there are direct references to two files located locally: one is the list of cities and the other is the JSON file containing the credentials to access my project in GCP. I thought that uploading these files to Cloud Storage and referencing them there would not be an issue, but then I realised that I can't actually access my buckets in Cloud Storage without using the credentials file.
This leaves me unsure whether the entire process is possible at all: how do I authenticate from the cloud in the first place, if I need to reference the credentials file locally first? It seems like an endless circle, where I'd authenticate from the file in Cloud Storage, but I'd need authentication first to access that file.
I'd really appreciate some help here; I have no idea where to go from this, and I also don't have great knowledge in SE/CS. I only know Python, R and SQL.
For Cloud Functions, the deployed function will run with the project service account credentials by default, without needing a separate credentials file. Just make sure this service account is granted access to whatever resources it will be trying to access.
You can read more info about this approach here (along with options for using a different service account if you desire): https://cloud.google.com/functions/docs/securing/function-identity
This approach is very easy, and keeps you from having to deal with a credentials file at all on the server. Note that you should remove the os.environ line, as it's unneeded. The BigQuery client will use the default credentials as noted above.
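As a minimal sketch of what that looks like in the posted script (nothing else has to change):

from google.cloud import bigquery

# No GOOGLE_APPLICATION_CREDENTIALS handling is needed inside the function:
# the client picks up the runtime's default service account automatically.
client = bigquery.Client()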
If you want the code to run the same whether on your local machine or deployed to the cloud, simply set a "GOOGLE_APPLICATION_CREDENTIALS" environment variable permanently in the OS on your machine. This is similar to what you're doing in the code you posted; however, you're temporarily setting it every time using os.environ rather than permanently setting the environment variable on your machine. The os.environ call only sets that environment variable for that one process execution.
If for some reason you don't want to use the default service account approach outlined above, you can instead reference the credentials file directly when you instantiate the bigquery.Client():
https://cloud.google.com/bigquery/docs/authentication/service-account-file
You just need to package the credentials file with your code (i.e. in the same folder as your main.py file) and deploy it alongside so it's in the execution environment. In that case, it is referenceable/loadable from your script without needing any special permissions or credentials. Just provide the relative path to the file (assuming it's in the same directory as your Python script, reference only the filename).
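A minimal sketch of that alternative (the key filename is a placeholder; it just has to match the file you deploy next to main.py):

from google.cloud import bigquery

# Load credentials directly from a key file shipped alongside the code
client = bigquery.Client.from_service_account_json("service-account-key.json")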
There are different flavors and options for deploying your application, and the choice depends on your application's semantics and execution constraints.
It would be too hard to cover all of them here, but the official Google Cloud Platform documentation covers them all in great detail:
Google Compute Engine
Google Kubernetes Engine
Google App Engine
Google Cloud Functions
Google Cloud Run
Based on my understanding of your application design, the most suitable ones would be:
Google App Engine
Google Cloud Functions
Google Cloud Run: check these criteria to see if your application is a good fit for this deployment style
I would suggest using Cloud Functions as your deployment option, in which case your application will default to using the project's App Engine service account to authenticate itself and perform allowed actions. Hence, you should only check that the default account PROJECT_ID@appspot.gserviceaccount.com has proper access to the needed APIs (BigQuery in your case) under the IAM configuration section.
In such a setup, you won't need to push your service account key to Cloud Storage (which I would recommend avoiding in either case), and you won't need to pull it either, as the runtime will handle authenticating the function for you.
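As a rough sketch of how the posted script could be exposed as an HTTP-triggered Cloud Function (the entry-point name is made up, and it assumes main() and its globals live in the same main.py, with list_of_cities.txt deployed alongside it):

# main.py (Cloud Functions sketch)
def run_weather_ingest(request):
    # HTTP entry point; Cloud Scheduler or a manual call can trigger it.
    # The credentials_path / os.environ lines are removed: the function runs
    # under the project's default service account.
    main()  # the existing loop over list_city that inserts rows into BigQuery
    return "Weather data ingested", 200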

Scrapy with scrapy-redis distributed crawling

When I write a distributed crawler with Scrapy and scrapy-redis, only the request queue is stored in Redis; no deduplication fingerprint set is stored.
# Spider code
import scrapy
from datetime import datetime
from machinedigikey.items import Detail_Item
from scrapy_redis.spiders import RedisSpider

class DgkUpdateDetailSpider(RedisSpider):
    ...  # spider body not shown in the post

# Settings
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.PriorityQueue"
SCHEDULER_PERSIST = True
REDIS_URL = "redis://127.0.0.1:6379"
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 300
}
MONGO_URI = 'mongodb://localhost:27017'

# Startup method
scrapy crawl dgk_update_detail
There are no errors, but when I check Redis there is no deduplication fingerprint key, only dgk_update_detail:requests.
I also want to ask: if the crawler breaks while running and I do not clear the data in Redis, but delete all the crawler code, modify it and redeploy, will the crawler continue crawling from where it left off?
The method I use to stop the crawl is:
kill -s 9 <pid>
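For reference, a quick way to see which scrapy-redis keys actually exist is a short check like the one below (a sketch using redis-py; by default scrapy-redis names its keys <spider>:requests for the scheduler queue and <spider>:dupefilter for the fingerprint set):

import redis

r = redis.Redis.from_url("redis://127.0.0.1:6379")
# scrapy-redis defaults: '<spider>:requests' holds the scheduler queue,
# '<spider>:dupefilter' holds the set of request fingerprints
for key in r.keys("dgk_update_detail:*"):
    print(key, r.type(key))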

Cherrypy web server hangs forever -- Matplotlib error

I'm creating a web-based interface for a number of different command line executables, and am using CherryPy behind Apache (using mod_rewrite). I'm very new to this, and am having difficulty getting things configured properly. On my development machine everything works reasonably well, but when I installed the code on a second machine I can't get anything to work properly.
The basic workflow for the applications is: 1. upload a dataset, 2. process the data (using python with some calls to executables using subprocess.call), 3. display the results on the web page.
After uploading and processing one dataset, every time I attempt to process a second dataset the system stops responding. I'm not seeing any output in the terminal from the CherryPy process, or anything in the site log indicating that an error has occurred.
I'm starting cherrypy with the following conf file:
[global]
environment: 'production'
log.error_file: 'logs/site.log'
log.screen: True
tools.sessions.on: True
tools.session.storage_type: "file"
tools.session.storage_path: "sessions/"
tools.sessions.timeout: 60
tools.auth.on: True
tools.caching.on: False
server.socket_host: '0.0.0.0'
server.max_request_body_size: 0
server.socket_timeout: 60
server.thread_pool: 20
server.socket_queue_size: 10
engine.autoreload.on: True
My __init__.py file:
import cherrypy
import os
import string
from os.path import exists, join
from os import pathsep
from string import split
from mako.template import Template
from mako.lookup import TemplateLookup
from auth import AuthController, require, member_of, name_is
from twopoint import TwoPoint

current_dir = os.path.dirname(os.path.abspath(__file__))
lookup = TemplateLookup(directories=[current_dir + '/templates'])

def findInSubdirectory(filename, subdirectory=''):
    if subdirectory:
        path = subdirectory
    else:
        path = os.getcwd()
    for root, dirs, names in os.walk(path):
        if filename in names:
            return os.path.join(root, filename)
    return None

class Root:
    @cherrypy.expose
    @require()
    def index(self):
        tmpl = lookup.get_template("main.html")
        return tmpl.render(usr=WebUtils.getUserName(), source="")

if __name__ == '__main__':
    conf_path = os.path.dirname(os.path.abspath(__file__))
    conf_path = os.path.join(conf_path, "prod.conf")
    cherrypy.config.update(conf_path)
    cherrypy.config.update({'server.socket_host': '127.0.0.1',
                            'server.socket_port': 8080})

    def nocache():
        cherrypy.response.headers['Cache-Control'] = 'no-cache,no-store,must-revalidate'
        cherrypy.response.headers['Pragma'] = 'no-cache'
        cherrypy.response.headers['Expires'] = '0'

    cherrypy.tools.nocache = cherrypy.Tool('before_finalize', nocache)
    cherrypy.config.update({'tools.nocache.on': 'True'})

    cherrypy.tree.mount(Root(), '/')
    cherrypy.tree.mount(TwoPoint(), '/twopoint')
    cherrypy.engine.start()
    cherrypy.engine.block()
For one example where this occurs, I've got the following javascript function that calls my python code:
function compTwoPoint(dataset, orig){
    // call python code to generate images
    $.post("/twopoint/compTwoPoint/" + dataset,
        function(result){
            res = jQuery.parseJSON(result);
            if(res.success == true){
                showTwoPoint(res.path, orig);
            }
            else{
                alert(res.exception);
                $('#display_loading').html("");
            }
        });
}
This calls the python code:
def twopoint(in_matrix):
    """proprietary code, can't share"""

def twopoint_file(in_file_name, out_file_name):
    k = imread(in_file_name)
    figure()
    imshow(twopoint(k))
    colorbar()
    savefig(out_file_name, bbox_inches="tight")
    close()

class TwoPoint:
    @cherrypy.expose
    def compTwoPoint(self, dataset):
        try:
            fnames = WebUtils.dataFileNames(dataset)
            twopoint_file(fnames['filepath'],
                          os.path.join(fnames['savebase'], "twopt.png"))
            return encoder.iterencode({"success": True})
These functions work together to give the expected result. The problem is that after processing one input file, I am unable to process a second file. I don't seem to get a response from the server.
On the machine where things are working, I'm running Python 2.7.6 and CherryPy 3.2.3. On the second machine, I have Python 2.7.7 and CherryPy 3.3.0. While this may explain the difference in behavior, I'd like to find a way to make my code portable enough to overcome the difference in versions (going from older to newer).
I'm not sure what the problem is, or even what to search for. I would appreciate any guidance or help you can offer.
(edit: Digging a bit more, I discovered something is happening with Matplotlib. If I put print statements before and after the figure() command in twopoint_file, only the first one prints. Calling this function directly from a Python interpreter (removing CherryPy from the equation), I get the following error:
can't invoke "event" command: application has been destroyed while executing "event generate $w{{ThemeChanged}}"
procedure "ttk::ThemeChanged" line 6 invoked from within "ttk::ThemeChanged"
end edit)
I don't understand what this error means, and haven't had much luck searching.
Old question, but I ran into the same problem, which I fixed by changing the backend in Matplotlib:
import matplotlib
matplotlib.use("qt4agg")

Renaming an Amazon CloudWatch Alarm

I'm trying to organize a large number of CloudWatch alarms for maintainability, and the web console grays out the name field on an edit. Is there another method (preferably something scriptable) for updating the name of CloudWatch alarms? I would prefer a solution that does not require any programming beyond simple executable scripts.
Here's a script we use to do this for the time being:
import sys
import boto

def rename_alarm(alarm_name, new_alarm_name):
    conn = boto.connect_cloudwatch()

    def get_alarm():
        alarms = conn.describe_alarms(alarm_names=[alarm_name])
        if not alarms:
            raise Exception("Alarm '%s' not found" % alarm_name)
        return alarms[0]

    alarm = get_alarm()

    # work around boto comparison serialization issue
    # https://github.com/boto/boto/issues/1311
    alarm.comparison = alarm._cmp_map.get(alarm.comparison)

    alarm.name = new_alarm_name
    conn.update_alarm(alarm)
    # update actually creates a new alarm because the name has changed, so
    # we have to manually delete the old one
    get_alarm().delete()

if __name__ == '__main__':
    alarm_name, new_alarm_name = sys.argv[1:3]
    rename_alarm(alarm_name, new_alarm_name)
It assumes you're either on an ec2 instance with a role that allows this, or you've got a ~/.boto file with your credentials. It's easy enough to manually add yours.
Unfortunately it looks like this is not currently possible.
I looked around for the same solution, but it seems neither the console nor the CloudWatch API provides that feature.
Note: you can, however, copy the existing alarm with the same parameters and save it under a new name.
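A rough sketch of that copy approach with boto3 (the alarm names are placeholders; only the most common parameters are copied, so extend the kwargs for alarms that use extended statistics, metric math, etc.):

import boto3

cloudwatch = boto3.client("cloudwatch")

def copy_alarm(old_name, new_name, delete_old=False):
    # Sketch: recreate a metric alarm under a new name, optionally deleting the original
    alarm = cloudwatch.describe_alarms(AlarmNames=[old_name])["MetricAlarms"][0]
    cloudwatch.put_metric_alarm(
        AlarmName=new_name,
        MetricName=alarm["MetricName"],
        Namespace=alarm["Namespace"],
        Statistic=alarm["Statistic"],
        Period=alarm["Period"],
        EvaluationPeriods=alarm["EvaluationPeriods"],
        Threshold=alarm["Threshold"],
        ComparisonOperator=alarm["ComparisonOperator"],
        Dimensions=alarm.get("Dimensions", []),
        AlarmActions=alarm.get("AlarmActions", []),
    )
    if delete_old:
        cloudwatch.delete_alarms(AlarmNames=[old_name])

copy_alarm("old-alarm-name", "new-alarm-name", delete_old=True)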