UUID to NUUID in Python - pymongo

In my application, I get some values from MSSQL using PyMSSQL. Python interprets one of these values as a UUID. I assigned this value to a variable called id. When I do
print (type(id),id)
I get
<class 'uuid.UUID'> cc26ce03-b6cb-4a90-9d0b-395c313fc968
Everything is as expected so far. Now I need to make a query in MongoDB using this id, but the type of my field in MongoDB is ".NET UUID (Legacy)", which is NUUID. I don't get any result when I query with
client.db.collectionname.find_one({"_id" : id})
This is because I need to convert UUID to NUUID.
Note: I also tried
client.db.collectionname.find_one({"_id" : NUUID("cc26ce03-b6cb-4a90-9d0b-395c313fc968")})
But it didn't work. Any ideas?

Assuming you are using PyMongo 3.x:
from bson.binary import CSHARP_LEGACY
from bson.codec_options import CodecOptions
options = CodecOptions(uuid_representation=CSHARP_LEGACY)
coll = client.db.get_collection('collectionname', options)
coll.find_one({"_id": id})
If instead you are using PyMongo 2.x:
from bson.binary import CSHARP_LEGACY
coll = client.db.collectionname
coll.uuid_subtype = CSHARP_LEGACY
coll.find_one({"_id": id})
You have to tell PyMongo what format the UUID was originally stored in. There is also a JAVA_LEGACY representation.
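For completeness, here is a minimal end-to-end sketch of the 3.x approach; the connection string, database, and collection names are placeholders, and the UUID is the one from the question:
from uuid import UUID
from pymongo import MongoClient
from bson.binary import CSHARP_LEGACY
from bson.codec_options import CodecOptions

client = MongoClient('mongodb://localhost:27017')  # placeholder connection string
options = CodecOptions(uuid_representation=CSHARP_LEGACY)

# This collection handle encodes/decodes UUIDs the way the .NET driver's legacy GUID format does.
coll = client['db'].get_collection('collectionname', codec_options=options)

doc_id = UUID('cc26ce03-b6cb-4a90-9d0b-395c313fc968')
print(coll.find_one({'_id': doc_id}))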

Related

How to check if a value in a Python dictionary exists in MongoDB? If it exists, update the value in MongoDB, else insert it into MongoDB

I have a Python dictionary that has {objectID}, 'GUID', {'A': "A's score"}, {'B': "B's score"}. I have got 5 GUIDs in the dictionary.
Documents in the same format are already stored in MongoDB.
I want to check whether the GUIDs in the Python dict are present in the MongoDB collection. If a GUID exists, its document has to be updated; otherwise it should be inserted into MongoDB.
How can I do this using PyMongo?
Assuming that RESULT is a dict whose keys match your IDs and whose values are your scores, you can do the following:
ids = RESULT.keys()
for id in ids:
    collection.update({"ID": id}, {"$set": {id: RESULT[id]}}, upsert=True)
Passing upsert=True will insert the document if it doesn't already exist.
Here are the pymongo docs
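On PyMongo 3.x the same upsert is usually written with update_one; a minimal sketch, assuming RESULT and collection are as described above:
for guid, scores in RESULT.items():
    # upsert=True inserts a new document when no document with this GUID exists yet.
    collection.update_one({"ID": guid}, {"$set": {guid: scores}}, upsert=True)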

How to get StreamSets Record Fields Type inside Jython Evaluator

I have a StreamSets pipeline where I read from a remote SQL Server database using the JDBC component as an origin and put the data into a Hive and a Kudu data lake.
I'm facing some issues with Binary type columns, as there is no Binary type support in Impala, which I use to access both Hive and Kudu.
I decided to convert the Binary type columns (which flow through the pipeline as Byte_Array type) to String and insert them like that.
I tried to use a Field Type Converter element to convert all Byte_Array types to String, but it didn't work. So I used a Jython component to convert all arr.arr types to String. It worked fine until I got a null value in that field: the Jython type was then NoneType, so I was unable to detect the Byte_Array type, unable to convert it to String, and therefore couldn't insert it into Kudu.
Any help on how to get StreamSets record field types inside the Jython Evaluator? Or any suggested workaround for the problem I'm facing?
You need to use sdcFunctions.getFieldNull() to test whether the field is NULL_BYTE_ARRAY. For example:
import array

def convert(item):
    return ':-)'

def is_byte_array(record, k, v):
    # getFieldNull expects a field path, so we need to prepend the '/'
    return (sdcFunctions.getFieldNull(record, '/' + k) == NULL_BYTE_ARRAY
            or (type(v) == array.array and v.typecode == 'b'))

for record in records:
    try:
        record.value = {k: convert(v) if is_byte_array(record, k, v) else v
                        for k, v in record.value.items()}
        output.write(record)
    except Exception as e:
        error.write(record, str(e))
So here is my final solution:
You can use the logic below to detect any StreamSets type inside the Jython component by using the NULL_* constants:
NULL_BOOLEAN, NULL_CHAR, NULL_BYTE, NULL_SHORT, NULL_INTEGER, NULL_LONG,
NULL_FLOAT, NULL_DOUBLE, NULL_DATE, NULL_DATETIME, NULL_TIME, NULL_DECIMAL,
NULL_BYTE_ARRAY, NULL_STRING, NULL_LIST, NULL_MAP
The idea is to save the value of the field in a temp variable, set the field's value to None, and use sdcFunctions.getFieldNull to learn the StreamSets type by comparing its result to one of the NULL_* constants.
import binascii

def toByteArrayToHexString(value):
    if value is None:
        return NULL_STRING
    return '0x' + binascii.hexlify(value).upper()

for record in records:
    try:
        for colName, value in record.value.items():
            temp = record.value[colName]
            record.value[colName] = None
            if sdcFunctions.getFieldNull(record, '/' + colName) is NULL_BYTE_ARRAY:
                temp = toByteArrayToHexString(temp)
            record.value[colName] = temp
        output.write(record)
    except Exception as e:
        error.write(record, str(e))
Limitation:
The code above converts the Date type to Datetime type only when it has a value (when it's not NULL).

Querying data on json saved using ReJSON

I have saved a JSON document against a key using ReJSON, and now I would like to filter/query the data using ReJSON.
Please let me know how I can do it. Python preferred.
print("Abount to execute coomnad")
response=redisClient.execute_command('JSON.SET', 'object', '.', json.dumps(data))
print(response)
reply = json.loads(redisClient.execute_command('JSON.GET', 'object'))
print(reply)
Using the above code I was able to set data with ReJSON. Now let's suppose I want to filter the data.
My test JSON is:
data = {
    'foo': 'bar',
    'ans': 42
}
How can you filter, say, the JSON documents in which foo has the value bar?
Redis in general, and ReJSON specifically, do not provide search-by-value functionality. For that, you'll have to either index the values yourself (see https://redis.io/topics/indexes) or use RediSearch.
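A minimal sketch of the manual-indexing route, using a plain Redis set as a secondary index (the key names idx:foo:bar and object are only illustrative):
import json
import redis

redisClient = redis.Redis()
data = {'foo': 'bar', 'ans': 42}

# Store the document under its key, then index it by the value of 'foo'.
redisClient.execute_command('JSON.SET', 'object', '.', json.dumps(data))
redisClient.sadd('idx:foo:' + data['foo'], 'object')

# "Query": every document key whose foo is bar, loaded back from ReJSON.
for key in redisClient.smembers('idx:foo:bar'):
    print(json.loads(redisClient.execute_command('JSON.GET', key)))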

Python Error when querying a SQL query (not all arguments converted during string formatting)

I have the below Python code that tries to pull some data with a SQL query. However, I am getting an error:
TypeError: not all arguments converted during string formatting
Given below is the code I am using
import pandas as pd
import psycopg2
from psycopg2 import sql
import xlsxwriter

def func(input):
    db_details = conn.cursor()  # set DB cursor
    db_details.execute(sql.SQL("""select name from store where name = (%s)"""), (input))
    names = dwh_cursor.fetchall()
    df = pd.DataFrame(names, columns=[desc[0] for desc in dwh_cursor.description])
Could anyone guide me on where I am going wrong? Thanks.
If I recall correctly, you need to pass the SQL query a name enclosed in single quotes, so your query needs to be ...where name = '{}' """.format(variablename)
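A sketch of both ways that execute line is commonly written (variable names follow the question):
# Option 1: string formatting, as suggested above; the value ends up quoted inside the SQL text.
db_details.execute("""select name from store where name = '{}'""".format(input))

# Option 2: parameterized form. psycopg2 expects the parameters as a sequence, so a
# one-element tuple (input,) is needed; a bare (input) is just a string and raises
# the "not all arguments converted during string formatting" TypeError.
db_details.execute("""select name from store where name = %s""", (input,))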

Creating User Defined Function in Spark-SQL

I am new to Spark and Spark SQL, and I was trying to query some data using Spark SQL.
I need to fetch the month from a date which is given as a string.
I think it is not possible to query the month directly in Spark SQL, so I was thinking of writing a user defined function in Scala.
Is it possible to write a UDF in Spark SQL, and if so, can anybody suggest the best method of writing one?
You can do this, at least for filtering, if you're willing to use a language-integrated query.
For a data file dates.txt containing:
one,2014-06-01
two,2014-07-01
three,2014-08-01
four,2014-08-15
five,2014-09-15
You can pack as much Scala date magic in your UDF as you want but I'll keep it simple:
def myDateFilter(date: String) = date contains "-08-"
Set it all up as follows -- a lot of this is from the Programming guide.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
// case class for your records
case class Entry(name: String, when: String)
// read and parse the data
val entries = sc.textFile("dates.txt").map(_.split(",")).map(e => Entry(e(0),e(1)))
You can use the UDF as part of your WHERE clause:
val augustEntries = entries.where('when)(myDateFilter).select('name, 'when)
and see the results:
augustEntries.map(r => r(0)).collect().foreach(println)
Notice the version of the where method I've used, declared as follows in the doc:
def where[T1](arg1: Symbol)(udf: (T1) ⇒ Boolean): SchemaRDD
So, the UDF can only take one argument, but you can compose several .where() calls to filter on multiple columns.
Edit for Spark 1.2.0 (and really 1.1.0 too)
While it's not really documented, Spark now supports registering a UDF so it can be queried from SQL.
The above UDF could be registered using:
sqlContext.registerFunction("myDateFilter", myDateFilter)
and if the table was registered
sqlContext.registerRDDAsTable(entries, "entries")
it could be queried using
sqlContext.sql("SELECT * FROM entries WHERE myDateFilter(when)")
For more details see this example.
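For comparison, a minimal sketch of the same register-then-query pattern in PySpark 1.2+; it assumes sc is a live SparkContext and that an entries table has already been registered, as in the Scala snippet above:
from pyspark.sql import SQLContext
from pyspark.sql.types import BooleanType

sqlContext = SQLContext(sc)

# A plain Python function plays the role of the Scala UDF above.
def my_date_filter(date):
    return "-08-" in date

# Register it under a SQL-callable name, declaring the return type.
sqlContext.registerFunction("myDateFilter", my_date_filter, BooleanType())

# Backticks because `when` is also a keyword in Spark SQL.
august = sqlContext.sql("SELECT * FROM entries WHERE myDateFilter(`when`)")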
In Spark 2.0, you can do this:
// define the UDF
def convert2Years(date: String) = date.substring(7, 11)
// register to session
sparkSession.udf.register("convert2Years", convert2Years(_: String))
val moviesDf = getMoviesDf // create dataframe usual way
moviesDf.createOrReplaceTempView("movies") // 'movies' is used in sql below
val years = sparkSession.sql("select convert2Years(releaseDate) from movies")
In PySpark 1.5 and above, we can easily achieve this with a builtin function.
Following is an example:
raw_data = [
    ("2016-02-27 23:59:59", "Gold", 97450.56),
    ("2016-02-28 23:00:00", "Silver", 7894.23),
    ("2016-02-29 22:59:58", "Titanium", 234589.66)]
Time_Material_revenue_df = sqlContext.createDataFrame(raw_data, ["Sold_time", "Material", "Revenue"])
from pyspark.sql.functions import *
Day_Material_reveneu_df = Time_Material_revenue_df.select(to_date("Sold_time").alias("Sold_day"), "Material", "Revenue")
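Since the original question was about fetching the month, a short follow-up sketch using the built-in month() function (column names as in the example above):
from pyspark.sql.functions import month, to_date

Month_Material_revenue_df = Time_Material_revenue_df.select(
    month(to_date("Sold_time")).alias("Sold_month"), "Material", "Revenue")
Month_Material_revenue_df.show()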