Nested pymongo queries (mlab) - pymongo

I have some documents in an mLab MongoDB database; the format is:
{
"_id": {
"$oid": "58aeb1d074fece33edf2b356"
},
"sensordata": {
"operation": "chgstatus",
"user": {
"status": "0",
"uniqueid": "191b117fcf5c"
}
},
"created_date": {
"$date": "2017-02-23T15:26:29.840Z"
}
}
database name : mparking_sensor
collection name : sensor
I want to query from Python and extract only the status key-value pair and the created_date key-value pair.
My Python code is:
import sys
import pymongo
uri = 'mongodb://thorburn:tekush1!@ds157529.mlab.com:57529/mparking_sensor'
client = pymongo.MongoClient(uri)
db = client.get_default_database().sensor
print(db)
results = db.find()
for record in results:
    print(record["sensordata"], record['created_date'])
    print()
client.close()
This gives me everything under sensordata, as expected, but dot notation gives me an error. Can somebody help?

PyMongo represents BSON documents as Python dictionaries, and subdocuments as dictionaries within dictionaries. To access a value in a nested dictionary:
record["sensordata"]["user"]["status"]
So a complete print statement might be:
print("%s %s" % (record["sensordata"]["user"]["status"], record['created_date']))
That prints:
0 {'$date': '2017-02-23T15:26:29.840Z'}
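Dot notation is understood by MongoDB in query and projection strings, but not by Python dictionaries, which is why it fails on the returned record. As a minimal sketch, a small helper can walk a dotted path through nested dictionaries; the name get_path and the sample record are hypothetical and only illustrate the idea:

```python
def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'sensordata.user.status' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        # Stop early if the path leaves the dictionary structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Sample record shaped like the document in the question.
record = {
    "sensordata": {
        "operation": "chgstatus",
        "user": {"status": "0", "uniqueid": "191b117fcf5c"},
    },
    "created_date": "2017-02-23T15:26:29.840Z",
}

status = get_path(record, "sensordata.user.status")
missing = get_path(record, "sensordata.owner.status", default="n/a")
print(status, missing)
```

If only these two fields are needed, a projection such as db.find({}, {"sensordata.user.status": 1, "created_date": 1}) should also keep the server from returning the rest of each document.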

Related

Is there a way to use dynamic dataset name in bigquery

Problem Statement:
I am trying to use BigQueryOperator in Airflow. The aim is to run the same queries many times with a dynamically changing dataset name, i.e. the dataset name is passed as a parameter.
Example:
project.dataset1_layer1.tablename1, project.dataset2_layer1.tablename1
Expected:
I want to maintain a single copy of the SQL and pass in the dataset name as a parameter, which is then substituted for that particular dataset.
Error Messages:
I tried to pass the dynamic dataset name as part of query_params, but it failed with the error message below.
The query was parsed as:
INFO - Executing: [u'SELECT col1, col2 FROM project.@partner_layer1.tablename']
ERROR - BigQuery job failed. Final error was: {u'reason': u'invalidQuery', u'message': u'Query parameters cannot be used in place of table names at [1:37]', u'location': u'query'}. u'CREATE_IF_NEEDED', u'query': u'SELECT col1, col2 FROM project.@partner_layer1.tablename'}, u'jobType': u'QUERY'}}
Things I have tried so far:
The query template temp.sql is as below:
SELECT col1, col2 FROM `project.@partner_layer1.tablename`;
Airflow BigQueryOperator is used as below:
query_template_dict = {
    'partner_list': ['val1', 'val2', 'val3', 'val4'],
    'google_project': 'project_name',
    'queries': {
        'layer3': {
            'template': 'temp.sql',
            'output_dataset': '_layer3',
            'output_tbl': 'table_{}'.format(table_date),
            'output_tbl_schema': 'temp.txt'
        }
    },
    'applicable_tasks': {
        'val1': {'table_layer3': []},
        'val2': {'table_layer3': []},
        'val3': {'table_layer3': []},
        'val4': {'table_layer3': []}
    }
}
for partner in query_template_dict['partner_list']:
    # Loop over applicable report queries for a partner
    applicable_tasks = query_template_dict['applicable_tasks'][partner].keys()
    for task in applicable_tasks:
        destination_tbl = '{}.{}{}.{}'.format(query_template_dict['google_project'], partner,
                                              query_template_dict['queries'][task]['output_dataset'],
                                              query_template_dict['queries'][task]['output_tbl'])
        # Actual destination table structure
        # destination_tbl = 'project.partner_layer3.table_20200223'
        run_bq_cmd = BigQueryOperator(
            task_id=partner + '-' + task,
            sql=[query_template_dict['queries'][task]['template']],
            destination_dataset_table=destination_tbl,
            use_legacy_sql=False,
            write_disposition='WRITE_APPEND',
            create_disposition='CREATE_IF_NEEDED',
            allow_large_results=True,
            query_params=[
                {
                    "name": "partner",
                    "parameterType": {"type": "STRING"},
                    "parameterValue": {"value": partner}
                },
                {
                    "name": "batch_date",
                    "parameterType": {"type": "STRING"},
                    "parameterValue": {"value": batch_date}
                }
            ],
            dag=dag,
        )
Can anybody help me with this issue?
Is there a limitation in BigQuery to dynamically pass dataset names?
Replace the dataset name in Airflow, not in BigQuery.
As the error message says, query parameters cannot be used in place of table names, so do the substitution before the query is sent to BigQuery: use Python string replacement within Airflow.
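A minimal sketch of that string replacement, assuming the template uses a Python format placeholder ({partner}) instead of a query parameter. The names SQL_TEMPLATE and render_sql are hypothetical, and the name check is an added precaution rather than anything BigQuery or Airflow requires:

```python
import re

# Hypothetical template text: {partner} is a Python format placeholder,
# not a BigQuery query parameter.
SQL_TEMPLATE = "SELECT col1, col2 FROM `project.{partner}_layer1.tablename`"

# Only letters, digits and underscores are accepted, so an unexpected
# value cannot change the shape of the query.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")

def render_sql(template, partner):
    """Return the SQL text with the dataset prefix filled in."""
    if not SAFE_NAME.fullmatch(partner):
        raise ValueError("unsafe dataset prefix: %r" % partner)
    return template.format(partner=partner)

# One rendered query per partner, built before any operator is created.
queries = {p: render_sql(SQL_TEMPLATE, p) for p in ["val1", "val2"]}
print(queries["val1"])
```

The rendered string can then be passed as the sql argument of the operator, while genuine values such as batch_date stay in query_params.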

Returning unknown JSON in a query

Here is my scenario. I have data in a Cosmos DB and I want to return c.this, c.that etc as the indexer for Azure Cognitive Search. One field I want to return is JSON of an unknown structure. The one thing I do know about it is that it is flat. However it is my understanding that the return value for an indexer needs to be known. How, using SQL in a SELECT, would I return all JSON elements in the flat object? Here is an example value I would be querying:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": {
"Source": "flat",
"Element": "element",
"SomeOtherElement": "someOtherElement"
}
}
So I would want my select to be maybe something like:
SELECT
c.BusinessKey,
c.Source,
c.id,
-- SOMETHING HERE TO LIST OUT ALL ATTRIBUTES IN THE JSON AS FIELDS IN THE RESULT
And I would want the result to be:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someotherelement"}]
}
Currently we are calling ToString on the c.attributes, which is the JSON of unknown structure but it is adding all the escape characters. When we want to search the index, we have to add all those escape characters and it's getting really unruly.
Is there a way to do this using SQL?
Thanks for any help!
You could use a UDF (user-defined function) in Cosmos DB SQL.
UDF code:
function userDefinedFunction(object){
var returnArray = [];
for (var key in object) {
var map = {};
map[key] = object[key];
returnArray.push(map);
}
return returnArray;
}
SQL:
SELECT
c.BusinessKey,
c.Source,
c.id,
udf.test(c.attributes) as attributes
from c
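To check what the UDF produces without a Cosmos DB account, its logic can be mirrored in Python. This is only a sketch of the same transformation (one single-key object per attribute); the function name attributes_to_pairs is hypothetical:

```python
import json

def attributes_to_pairs(flat_object):
    """Turn a flat mapping into a list of single-key mappings, as the UDF does."""
    return [{key: value} for key, value in flat_object.items()]

# The attributes object from the sample document in the question.
attributes = {
    "Source": "flat",
    "Element": "element",
    "SomeOtherElement": "someOtherElement",
}

pairs = attributes_to_pairs(attributes)
print(json.dumps(pairs))
```

Because the result is a real JSON array rather than a string, no escape characters are introduced.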

How to export pandas data to elasticsearch?

It is possible to export pandas DataFrame data to Elasticsearch using elasticsearch-py. For example, here is some code:
https://www.analyticsvidhya.com/blog/2017/05/beginners-guide-to-data-exploration-using-elastic-search-and-kibana/
There are a lot of similar methods, such as to_excel, to_csv and to_sql.
Is there a to_elastic method? If not, where should I request it?
The following script works for localhost:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
INDEX="dataframe"
TYPE= "record"
def rec_to_actions(df):
    import json
    for record in df.to_dict(orient="records"):
        yield ('{ "index" : { "_index" : "%s", "_type" : "%s" }}'% (INDEX, TYPE))
        yield (json.dumps(record, default=int))
from elasticsearch import Elasticsearch
e = Elasticsearch() # no args, connect to localhost:9200
if not e.indices.exists(INDEX):
    raise RuntimeError('index does not exists, use `curl -X PUT "localhost:9200/%s"` and try again'%INDEX)
r = e.bulk(rec_to_actions(df)) # return a dict
print(not r["errors"])
Verify using curl -g 'http://localhost:9200/dataframe/_search?q=A:[29%20TO%2039]'
There are many little things that can be added to suit different needs, but the main part is there.
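As a variation on the script above, the records can also be turned into action dictionaries instead of hand-written JSON lines. The sketch below only builds the actions, so it runs without pandas or a cluster; the function name records_to_actions is hypothetical, and the sample records stand in for df.to_dict(orient="records"):

```python
import json

def records_to_actions(records, index):
    """Yield one action dict per record, with the target index and the document body."""
    for record in records:
        yield {"_index": index, "_source": record}

# Stand-in for df.to_dict(orient="records").
records = [{"A": 29, "B": 3}, {"A": 35, "B": 7}]

actions = list(records_to_actions(records, "dataframe"))
print(json.dumps(actions[0]))
```

With a reachable cluster, these actions are in the shape that elasticsearch.helpers.bulk(client, actions) accepts, which saves building the newline-delimited request body by hand.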
I'm not aware of any to_elastic method integrated in pandas. You can always raise an issue on the pandas GitHub repo or create a pull request.
However, there is espandas, which allows importing a pandas DataFrame into Elasticsearch. The following example from the README has been tested with Elasticsearch 6.2.1.
import pandas as pd
import numpy as np
from espandas import Espandas
df = (100 * pd.DataFrame(np.round(np.random.rand(100, 5), 2))).astype(int)
df.columns = ['A', 'B', 'C', 'D', 'E']
df['indexId'] = (df.index + 100).astype(str)
INDEX = 'foo_index'
TYPE = 'bar_type'
esp = Espandas()
esp.es_write(df, INDEX, TYPE)
Retrieving the mappings with GET foo_index/_mappings:
{
"foo_index": {
"mappings": {
"bar_type": {
"properties": {
"A": {
"type": "long"
},
"B": {
"type": "long"
},
"C": {
"type": "long"
},
"D": {
"type": "long"
},
"E": {
"type": "long"
},
"indexId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Maybe you can use es_pandas:
pip install es_pandas
pip install progressbar2
This package should work on Python 3 (>=3.4), and Elasticsearch should be version 5.x, 6.x or 7.x.
import time
import pandas as pd
from es_pandas import es_pandas
# Information about the es cluster
es_host = 'localhost:9200'
index = 'demo'
# create es_pandas instance
ep = es_pandas(es_host)
# Example data frame
df = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 128)],
                   'Num': [x for x in range(31)],
                   'Date': pd.date_range(start='2019/01/01', end='2019/01/31')})
# init template if you want
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)
# Example of writing data to es, using the template you created
ep.to_es(df, index, doc_type=doc_type)
# set use_index=True if you want to use DataFrame index as records' _id
ep.to_es(df, index, doc_type=doc_type, use_index=True)
Here is the documentation: https://pypi.org/project/es-pandas/
If es_pandas can't solve your problem, you could look at another solution: https://towardsdatascience.com/exporting-pandas-data-to-elasticsearch-724aa4dd8f62
You could use elasticsearch-py; if you would rather not use it, you may find an answer to your question here: index-a-pandas-dataframe-into-elasticsearch-without-elasticsearch-py

Filter an object array to modify json with circe

I am evaluating Circe and couldn't find out how to filter an array in order to transform a JSON document. I read the guide on its website and the API docs, but still have no clue. Help is much appreciated.
Sample data:
{
"Department" : "HR",
"Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age" : 30 }]
}
Task:
How can I use a filter on Employees to transform the JSON into another JSON, for example one containing only the employees older than 50?
In case you ask: for some reason I can't filter at the data source before the JSON is generated.
Thanks
One possible way of doing this (here keeping the employees older than 26) is:
import io.circe._
import io.circe.parser._
val data = """{"Department" : "HR","Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age":30}]}"""
// Keep only the array elements whose "age" field decodes to an Int above 26.
def ageFilter(j: Json): Json = j.withArray { x =>
  Json.fromValues(x.filter(_.hcursor.downField("age").as[Int].map(_ > 26).getOrElse(false)))
}
val y: Either[ParsingFailure, Json] = parse(data).map(_.hcursor.downField("Employees").withFocus(ageFilter).top.get)
println(s"$y")

Filter parameters to POST verify and place order request for Performance storage

I am trying to integrate BPM with SoftLayer using a Java REST client. From my initial analysis (as well as help from Stack Overflow), I found:
Step 1) We need to get the item price list (getItemPrices) to have all the IDs for the next request.
https://username:api_key@api.softlayer.com/rest/v3/SoftLayer_Product_Package/2/getItemPrices?objectMask=mask[id,item[keyName,description],pricingLocationGroup[locations[id, name, longName]]]
and then do the verify and place order POST calls using the respective APIs.
I am stuck on Step 1, as filtering here seems to be a bit tricky. I am getting a JSON response of over 20000 lines.
I want to show similar data (just like the SL Performance storage UI) on my custom BPM UI: one drop-down to select the type of storage, a second to show the location, a third to show the size and a fourth for the IOPS, where the user can select the items and place a request.
I found that SL uses something like this to populate the drop-downs:
https://control.softlayer.com/sales/productpackage/getregions?_dc=1456386930027&categoryCode=performance_storage_iscsi&packageId=222&page=1&start=0&limit=25
Can't we have an implementation that uses control.softlayer.com, just as SL does, instead of api.softlayer.com? In that case we could use similar logic to display data on the UI.
Thanks
Anupam
Here are the steps for performance storage using the API. For endurance storage the steps are similar; you just need to review the value of categoryCode and modify it if needed.
You can get the locations using this method:
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Package/getRegions
You just need to know the package of the storage, e.g.
GET https://api.softlayer.com/rest/v3.1/SoftLayer_Product_Package/222/getRegions
Then you can get the storage sizes. For that you can use the SoftLayer_Product_Package::getItems or SoftLayer_Product_Package::getItemPrices methods with a filter, e.g.
GET https://api.softlayer.com/rest/v3.1/SoftLayer_Product_Package/222/getItemPrices?objectFilter={"itemPrices": {"categories": {"categoryCode": {"operation": "performance_storage_space"}},"locationGroupId": { "operation": "is null"}}}
Note: We are filtering the data to get the prices whose category code is "performance_storage_space", and we want the standard price (locationGroupId = null).
Then you can get the IOPS using the same approach as above, but there is a dependency between the IOPS and the storage space, e.g.
GET https://api.softlayer.com/rest/v3.1/SoftLayer_Product_Package/222/getItemPrices?objectFilter={"itemPrices": { "attributes": { "value": { "operation": 20 } }, "categories": { "categoryCode": { "operation": "performance_storage_iops" } }, "locationGroupId": { "operation": "is null" } } }
Note: In the example we assume that the selected storage space was "20". The prices for IOPS have a record called attributes, which tells us the valid storage spaces for the IOPS. The other filters keep only the IOPS prices (categoryCode = performance_storage_iops) with the standard price (locationGroupId = null).
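The two getItemPrices requests above differ only in the category code and the optional storage-size attribute, so the filter and the URL can be built by one helper. This is a rough sketch using only the standard library; the function names item_price_filter and item_prices_url are hypothetical, and authentication is left out:

```python
import json
import urllib.parse

BASE_URL = "https://api.softlayer.com/rest/v3.1/SoftLayer_Product_Package"

def item_price_filter(category_code, storage_size=None):
    """Build an objectFilter for standard prices in one category."""
    prices = {
        "categories": {"categoryCode": {"operation": category_code}},
        # Standard prices are the ones without a location group.
        "locationGroupId": {"operation": "is null"},
    }
    if storage_size is not None:
        # IOPS prices list the storage sizes they are valid for in attributes.
        prices["attributes"] = {"value": {"operation": storage_size}}
    return {"itemPrices": prices}

def item_prices_url(package_id, object_filter):
    """Return the getItemPrices URL with the filter URL-encoded."""
    encoded = urllib.parse.quote(json.dumps(object_filter, separators=(",", ":")))
    return "%s/%d/getItemPrices?objectFilter=%s" % (BASE_URL, package_id, encoded)

space_filter = item_price_filter("performance_storage_space")
iops_filter = item_price_filter("performance_storage_iops", storage_size=20)
iops_url = item_prices_url(222, iops_filter)
print(iops_url)
```

Encoding the filter keeps the braces, quotes and spaces from being mangled by HTTP clients that do not escape query strings themselves.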
As for selecting the storage type, I do not think there is a method for it. The only way I see is to call the SoftLayer_Product_Package::getAllObjects method and filter the data to get the packages for endurance, performance and portable storage.
Just in case, here is an example that uses SoftLayer's Python client to place an order:
"""
Order a block storage (performance ISCSI).
Important manual pages:
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Order
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Order/verifyOrder
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Order/placeOrder
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Package
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Package/getItems
http://sldn.softlayer.com/reference/services/SoftLayer_Location
http://sldn.softlayer.com/reference/services/SoftLayer_Location/getDatacenters
http://sldn.softlayer.com/reference/services/SoftLayer_Network_Storage_Iscsi_OS_Type
http://sldn.softlayer.com/reference/services/SoftLayer_Network_Storage_Iscsi_OS_Type/getAllObjects
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Location
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Container_Product_Order_Network_Storage_Enterprise
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Product_Item_Price
http://sldn.softlayer.com/blog/cmporter/Location-based-Pricing-and-You
http://sldn.softlayer.com/blog/bpotter/Going-Further-SoftLayer-API-Python-Client-Part-3
http://sldn.softlayer.com/article/Object-Filters
http://sldn.softlayer.com/article/Python
http://sldn.softlayer.com/article/Object-Masks
License: http://sldn.softlayer.com/article/License
Author: SoftLayer Technologies, Inc. <sldn@softlayer.com>
"""
import SoftLayer
import json
# Values "AMS01", "AMS03", "CHE01", "DAL05", "DAL06", "FRA02", "HKG02", "LON02", etc.
location = "AMS01"
# Values "20", "40", "80", "100", etc.
storageSize = "40"
# Values between "100" and "6000" by intervals of 100.
iops = "100"
# Values "Hyper-V", "Linux", "VMWare", "Windows 2008+", "Windows GPT", "Windows 2003", "Xen"
os = "Linux"
PACKAGE_ID = 222
client = SoftLayer.Client()
productOrderService = client['SoftLayer_Product_Order']
packageService = client['SoftLayer_Product_Package']
locationService = client['SoftLayer_Location']
osService = client['SoftLayer_Network_Storage_Iscsi_OS_Type']
objectFilterDatacenter = {"name": {"operation": location.lower()}}
objectFilterStorageNfs = {"items": {"categories": {"categoryCode": {"operation": "performance_storage_iscsi"}}}}
objectFilterOsType = {"name": {"operation": os}}
try:
    # Getting the datacenter.
    datacenter = locationService.getDatacenters(filter=objectFilterDatacenter)
    # Getting the performance storage NFS prices.
    itemsStorageNfs = packageService.getItems(id=PACKAGE_ID, filter=objectFilterStorageNfs)
    # Getting the storage space prices
    objectFilter = {
        "itemPrices": {
            "item": {
                "capacity": {
                    "operation": storageSize
                }
            },
            "categories": {
                "categoryCode": {
                    "operation": "performance_storage_space"
                }
            },
            "locationGroupId": {
                "operation": "is null"
            }
        }
    }
    pricesStorageSpace = packageService.getItemPrices(id=PACKAGE_ID, filter=objectFilter)
    # If the prices list is empty that means that the storage space value is invalid.
    if len(pricesStorageSpace) == 0:
        raise ValueError('The storage space value: ' + storageSize + ' GB, is not valid.')
    # Getting the IOPS prices
    objectFilter = {
        "itemPrices": {
            "item": {
                "capacity": {
                    "operation": iops
                }
            },
            "attributes": {
                "value": {
                    "operation": storageSize
                }
            },
            "categories": {
                "categoryCode": {
                    "operation": "performance_storage_iops"
                }
            },
            "locationGroupId": {
                "operation": "is null"
            }
        }
    }
    pricesIops = packageService.getItemPrices(id=PACKAGE_ID, filter=objectFilter)
    # If the prices list is empty that means that the IOPS value is invalid for the configured storage space.
    if len(pricesIops) == 0:
        raise ValueError('The IOPS value: ' + iops + ', is not valid for the storage space: ' + storageSize + ' GB.')
    # Getting the OS.
    os = osService.getAllObjects(filter=objectFilterOsType)
    # Building the order template.
    orderData = {
        "complexType": "SoftLayer_Container_Product_Order_Network_PerformanceStorage_Iscsi",
        "packageId": PACKAGE_ID,
        "location": datacenter[0]['id'],
        "quantity": 1,
        "prices": [
            {
                "id": itemsStorageNfs[0]['prices'][0]['id']
            },
            {
                "id": pricesStorageSpace[0]['id']
            },
            {
                "id": pricesIops[0]['id']
            }
        ],
        "osFormatType": os[0]
    }
    # verifyOrder() will check your order for errors. Replace this with a call to
    # placeOrder() when you're ready to order. Both calls return a receipt object
    # that you can use for your records.
    response = productOrderService.verifyOrder(orderData)
    print(json.dumps(response, sort_keys=True, indent=2, separators=(',', ': ')))
except SoftLayer.SoftLayerAPIError as e:
    print("Unable to place the order. faultCode=%s, faultString=%s" % (e.faultCode, e.faultString))