Create, and insert into, an Aerospike ordered map from Python - aerospike

I see documentation for appending to a list in Aerospike, from Python, namely:
key = ('test', 'demo', 1)
rec = {'country': 'India', 'city': ['Pune', 'Delhi']}
client.put(key, rec)
client.list_append(key, 'city', 'Mumbai')
However I don't know how to add elements to a map in Aerospike, from Python, and I also don't know how to define said map as sorted.
Essentially I am trying to model a time series as follows:
ticker1: {intepochtime1: some_number, intepochtime2: some_other_number,...}
ticker2: {intepochtime1: some_number, intepochtime2: some_other_number,...}
........
where the tickers are the record keys (and therefore indexed), and the intepochtimes are JS-style integer epoch timestamps that, by being stored in ascending or descending order, are effectively indexed as well and easily range-queryable. How is this doable from Python?

Here is some sample code to get you started:
Also on github: https://github.com/pygupta/aerospike-discuss/tree/master/stkovrflo_Py_SortedMaps
import aerospike
from aerospike import predicates as p
def print_result(result):
    # query/scan callbacks receive a (key, metadata, record) tuple
    key, metadata, record = result
    print(record)
config = { 'hosts': [ ("localhost", 3000), ] }
client = aerospike.client(config).connect()
map_policy = {'map_order': aerospike.MAP_KEY_VALUE_ORDERED}
# Insert the records
key = ("test", "demo", 'km1')
client.map_set_policy(key, "mymap", map_policy)
client.map_put(key, "mymap", '0', 13)
client.map_put(key, "mymap", '1', 3)
client.map_put(key, "mymap", '2', 7)
client.map_put(key, "mymap", '3', 2)
client.map_put(key, "mymap", '4', 12)
client.map_put(key, "mymap", '5', 33)
client.map_put(key, "mymap", '6', 1)
client.map_put(key, "mymap", '7', 12)
client.map_put(key, "mymap", '8', 22)
# Query for sorted values
print("Sorted by values, 2 - 14")
ret_val = client.map_get_by_value_range(key, "mymap", 2, 14, aerospike.MAP_RETURN_VALUE)
print(ret_val)
# Get the first 3 indexes
print("Index 0 - 3")
ret_val2 = client.map_get_by_index_range(key, "mymap", 0, 3, aerospike.MAP_RETURN_VALUE)
print(ret_val2)
pgupta@ubuntu:~/discussRepo/aerospike-discuss/stkovrflo_Py_SortedMaps$ python sortedMapExample.py
Sorted by values, 2 - 14
[2, 3, 7, 12, 12, 13]
Index 0 - 3
[13, 3, 7]

Look at the Python documentation for Client. The server must be version 3.8.4+ for sorted maps.
Create a map policy: define one of the key-ordered or key-value-ordered policies (see http://www.aerospike.com/apidocs/python/client.html#map-policies for map_order).
Put the map-type bin, but define the map policy first: see map_set_policy(key, bin, map_policy) at http://www.aerospike.com/apidocs/python/client.html#id1, then call map_put().
Sorted maps are just regular maps with a map_order policy.
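Since the question is about range queries over integer epoch-time keys, here is a minimal sketch of that pattern. It assumes integer map keys and the map_get_by_key_range helper from the same client API; the ticker name, bin name, and timestamps are illustrative, not from the original post:
import aerospike

config = {'hosts': [("localhost", 3000)]}
client = aerospike.client(config).connect()

# One record per ticker; the map bin holds {epoch_time: number}
key = ("test", "demo", "ticker1")
client.map_set_policy(key, "series", {'map_order': aerospike.MAP_KEY_ORDERED})
client.map_put(key, "series", 1457990400, 101.5)
client.map_put(key, "series", 1458076800, 102.25)
client.map_put(key, "series", 1458163200, 99.75)

# Range query on the timestamp keys (begin inclusive, end exclusive)
pairs = client.map_get_by_key_range(key, "series", 1457990400, 1458163200,
                                    aerospike.MAP_RETURN_KEY_VALUE)
print(pairs)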

A Python 3 memory leak was fixed in client version 2.0.8.

Related

Pandas: how to read specific CSV columns when the file doesn't contain a header

import pandas as pd

usecols = [*range(1, 5), *range(7, 9), *range(11, 13)]
df = pd.read_csv('/content/drive/MyDrive/weather.csv', header=None, usecols=usecols, names=['d', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity'])
I'm trying this but always get
ValueError: Usecols do not match columns, columns expected but not found: [1, 2, 3, 4, 7, 8, 11, 12]
The problem you are seeing is due to a mismatch between the number of columns designated by usecols and the number of columns designated by names.
Usecols:
[1, 2, 3, 4, 7, 8, 11, 12] - 8 columns
names:
'd', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity' - 9 columns
Change the code so that the last range in usecols ends at 14 rather than 13; range(11, 14) then selects columns 11, 12 and 13, giving 9 columns in total to match names:
Code:
usecols = [*range(1, 5), *range(7, 9), *range(11, 14)]
df = pd.read_csv('/content/drive/MyDrive/weather.csv', header=None, usecols=usecols, names=['d', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity'])
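As a quick guard against this class of mismatch, you can assert that the two lists line up before calling read_csv; a minimal sketch using the same file path:
import pandas as pd

names = ['d', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity']
usecols = [*range(1, 5), *range(7, 9), *range(11, 14)]
# Fail early with both counts if the lists ever drift apart
assert len(usecols) == len(names), (len(usecols), len(names))
df = pd.read_csv('/content/drive/MyDrive/weather.csv', header=None, usecols=usecols, names=names)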

How to apply str.split() on a pandas column?

Using Simple Data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ids': [0, 1, 2], 'value': ['2 4 10 0 14', '5 91 19 20 0', '1 1 1 2 44']})
I need to convert the column to array, so I use:
df.iloc[:,-1] = df.iloc[:,-1].apply(lambda x: str(x).split())
X = df.iloc[:, 1:]
X = np.array(X.values)
but the problem is that the data ends up nested and I just need a (3, 5) matrix. How can I do this properly and fast for large data (avoiding loops)?
As said in the comments by @anky and @ScottBoston, you can use the string method split along with the expand parameter and finally convert to NumPy:
df.iloc[:, 1].str.split(expand=True).values
array([['2', '4', '10', '0', '14'],
['5', '91', '19', '20', '0'],
['1', '1', '1', '2', '44']], dtype=object)
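Note that the result above holds strings (dtype=object). If you need a numeric matrix, you can cast after the split; a sketch assuming every value is an integer:
X = df.iloc[:, 1].str.split(expand=True).astype(int).values
# array([[ 2,  4, 10,  0, 14],
#        [ 5, 91, 19, 20,  0],
#        [ 1,  1,  1,  2, 44]])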

Query to filter where value equals @param unless @param is "Other"

I have a filter dropdown list with the following options in it:
1, 2, 3, 4, 5, Other
When the user selects an option I will run a simple SQL query to filter the data by that value such as:
SELECT * FROM Product WHERE Code = @Code
The only problem is that when the "Other" option is selected I need to show everything that does not have a code of 1,2,3,4, or 5.
The data looks like the following:
Id: 1, Name: Product 1, Code: 1
Id: 2, Name: Product 2, Code: 2
Id: 3, Name: Product 3, Code: null
Id: 4, Name: Product 4, Code: 3
Id: 5, Name: Product 5, Code: 12
If the user selects "Other" I need to only display: "Product 3" and "Product 5".
A simple OR condition should accomplish that. Note the IS NULL check: Code NOT IN (1, 2, 3, 4, 5) evaluates to unknown for a NULL Code, so without it Product 3 would not be returned.
SELECT *
FROM Product
WHERE (Code = @Code)
   OR (@Code = 'Other' AND (Code NOT IN (1, 2, 3, 4, 5) OR Code IS NULL))
Is this what you want?
SELECT *
FROM Product
WHERE (Code = @Code AND @Code IN ('1', '2', '3', '4', '5')) OR
      ((Code NOT IN ('1', '2', '3', '4', '5') OR Code IS NULL) AND @Code = 'Other')
If Code is an integer, then the above may return a type conversion error. In that case, I would recommend:
WHERE Code = TRY_CONVERT(INT, @Code) OR
      ((Code NOT IN (1, 2, 3, 4, 5) OR Code IS NULL) AND @Code = 'Other')
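For completeness, here is a minimal sketch of how the parameterized version might be issued from Python with pyodbc; the connection string is a placeholder, not from the original post:
import pyodbc

sql = """
    SELECT *
    FROM Product
    WHERE Code = TRY_CONVERT(INT, ?)
       OR ((Code NOT IN (1, 2, 3, 4, 5) OR Code IS NULL) AND ? = 'Other')
"""

conn = pyodbc.connect("DSN=mydb")  # placeholder connection string
code = "Other"                     # value chosen in the dropdown
rows = conn.cursor().execute(sql, code, code).fetchall()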

Oracle SQL error: maximum number of expressions

Could you please help me with this issue? I am getting an error in Oracle SQL:
ORA-01795 maximum number of expressions in a list is 1000
I'm passing values like:
and test in (1, 2, 3, ..., 1000)
Try splitting your query into multiple IN clauses, like below:
SELECT *
FROM table_name
WHERE test IN (1, 2, 3, ..., 500)
   OR test IN (501, 502, ..., 1000);
You can try workarounds:
Split the single IN into several ones:
select ...
from ...
where test in (1, ..., 999) or
test in (1000, ..., 1999) or
...
test in (9000, ..., 9999)
Put values into a (temporary?) table, say TestTable:
select ...
from ...
where test in (select TestField
               from TestTable)
Edit: As I can see, the main difficulty is building such a query. Let's implement it in C#. We are given a collection of ids:
// Test case ids are in [1..43] range
IEnumerable<int> Ids = Enumerable.Range(1, 43);
// Test case: 7; in an actual Oracle query you would probably set it to 100 or 1000
int chunkSize = 7;
string fieldName = "test";
string filterText = string.Join(" or " + Environment.NewLine, Ids
    .Select((value, index) => new { value = value, index = index })
    .GroupBy(item => item.index / chunkSize)
    .Select(chunk =>
        $"{fieldName} in ({string.Join(", ", chunk.Select(item => item.value))})"));

if (!string.IsNullOrEmpty(filterText))
    filterText = $"and \r\n({filterText})";

string sql =
    $@"select MyField
       from MyTable
       where (1 = 1) {filterText}";
Test:
Console.Write(sql);
Outcome:
select MyField
from MyTable
where (1 = 1) and
(test in (1, 2, 3, 4, 5, 6, 7) or
test in (8, 9, 10, 11, 12, 13, 14) or
test in (15, 16, 17, 18, 19, 20, 21) or
test in (22, 23, 24, 25, 26, 27, 28) or
test in (29, 30, 31, 32, 33, 34, 35) or
test in (36, 37, 38, 39, 40, 41, 42) or
test in (43))
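The same chunking idea is straightforward in Python if you are building the query there instead; a minimal sketch using the same illustrative table and field names:
ids = list(range(1, 44))  # test case: ids in [1..43]
chunk_size = 7            # in a real Oracle query, the limit is 1000

# One "test in (...)" clause per chunk of ids
clauses = [
    f"test in ({', '.join(str(v) for v in ids[i:i + chunk_size])})"
    for i in range(0, len(ids), chunk_size)
]
filter_text = " or\n ".join(clauses)
sql = f"select MyField\nfrom MyTable\nwhere (1 = 1) and\n({filter_text})"
print(sql)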

Add another column (index) into the array

I have an array
a = array([ 0.74552751, 0.70868784, 0.7351144 , 0.71597612, 0.77608263,
0.71213591, 0.77297658, 0.75637376, 0.76636106, 0.76098067,
0.79142821, 0.71932262, 0.68984604, 0.77008623, 0.76334351,
0.76129872, 0.76717526, 0.78413129, 0.76483804, 0.75160062,
0.7532506 ], dtype=float32)
I want to store my array in (item, value) format and can't seem to get it right.
I'm trying to get this format:
a = [(0, 0.001497),
(1, 0.0061543),
..............
(46, 0.001436781),
(47, 0.00654533),
(48, 0.0027139),
(49, 0.00462962)],
Numpy arrays have a fixed data type that you must specify. It looks like a data type of int for your item and float for your value would work best. Something like:
import numpy as np
dtype = [("item", int), ("value", float)]
a = np.array([(0, 0.), (1, .1), (2, .2)], dtype=dtype)
The string part of the dtype is the name of each field. The names allow you to access the fields more easily like this:
print(a['value'])
# [ 0., 0.1, 0.2]
a['value'] = [7, 8, 9]
print(a)
# [(0, 7.0) (1, 8.0) (2, 9.0)]
If you need to copy another array into the array I describe above, you can do it just by using the field name:
new = np.empty(len(a), dtype)
new['item'] = [3, 4, 5]
new['value'] = a['value']
print(new)
# [(3, 7.0) (4, 8.0) (5, 9.0)]
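To get from the original flat array to the (item, value) layout the question asks for, fill the two fields separately; a minimal sketch, assuming a is the float32 array from the question and dtype is defined as above:
structured = np.empty(len(a), dtype=dtype)
structured['item'] = np.arange(len(a))  # 0, 1, 2, ... as the index column
structured['value'] = a                 # the original values
print(structured[:2])                   # first two (item, value) pairs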