Object is being embedded in wrong parent mongoid - ruby-on-rails-3

I'm currently trying to load a CSV into my MongoDB using Mongoid. It works to an extent, except that the embedded documents are all saved under the same parent floor document.
Essentially, there is an information document that contains info about a building. Under the information document I embed floors, and within floors I embed rooms, similar to the structure below:
Information
  Floors
    Rooms
      :name
      :thermostat_zone
My test CSV contains the following information:
floor room zone
1 101 A
1 102 A
1 102 B
1 104 C
1 105 D
1 106 E
1 107 F
1 108 G
4 109 G
2 201 H
2 202 I
2 204 J
2 207 J
2 209 K
2 208 L
2 210 M
3 214 N
3 215 O
3 216 P
3 225 Q
The only problem is that all of the rooms currently embed under floor one. I imagine there is something going on with object persistence, even though each floor prints out correctly whenever I print the floors from the import loop. I really hope someone can help me get past this problem! I come from a PHP/MySQL background, so this is new to me.
@building_id = params[:building_id]
owner = Owner.where('buildings._id' => Moped::BSON::ObjectId(@building_id)).first
@building = owner.buildings.find(@building_id)
@building.information.floors.destroy
CSV.foreach(params[:file].path, headers: true) do |row|
  floor_name = row['floor']
  room_name = row['room']
  zone_name = row['zone']
  floor = Floor.new
  floor = @building.information.floors.where('name' => floor_name).first
  if !floor || floor == nil
    floor = Floor.new
    floor.name = floor_name
    floor.information = @building.information
    floor.save
  end
  room = Room.new
  room.floor = floor
  room.name = room_name
  room.zone = zone_name
  room.save
end
Also, here is the code for both Room.rb and Information.rb in case it is needed:
class Room
  include Mongoid::Document
  embedded_in :floor, :inverse_of => :rooms
  field :name
  field :zone
end
Information.rb: (narrowed down to just embedding stuff to make it easier)
# Building information
class Information
  include Mongoid::Document
  embeds_many :floors, :cascade_callbacks => true
end
Thanks!

Replace this:
if !floor || floor == nil
  floor = Floor.new
with:
if !floor || floor == nil
  floor = @building.information.floors.create!()
  # OR
  floor = @building.information.floors.create!(name: floor_name)
The same goes for rooms. Replace this:
room = Room.new
with this:
room = floor.rooms.create!()
# OR
room = floor.rooms.create!(name: room_name, zone: zone_name)
This is because Mongoid doesn't set the parent of an embedded document when you assign it from the child side; child.parent_attr = parent doesn't work properly...

Related

SQL: dealing with every bit without running the query repeatedly

I have a column that uses bits to record the status of every mission. The index of a bit represents the mission number, while 1/0 indicates whether that mission was successful; the bits are logically independent even though they are stored together.
For instance, 1010 stored as a decimal means a user finished the 2nd and 4th missions successfully, and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate, for every mission, how many users passed it. E.g. for mission 1 it's 0+1+1+0+1 = 3, while for mission 2 it's 0+1+0+0+1 = 2.
I can use a formula FLOOR(status % POWER(10,n) / POWER(10,n-1)) to get the bit for mission n for every user, but this means I would need to run my query n times, and the status is now 64 bits long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
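With the data in this shape, the per-mission totals become a simple grouped sum (the SQL is shown as a comment). A quick sanity check of the idea in plain Python, using the hypothetical normalized rows from the table above:

# Normalized rows: (uid, mission, status)
rows = [
    ('a', 1, 0), ('a', 2, 0), ('a', 3, 1), ('a', 4, 1),
    ('b', 1, 1), ('b', 2, 1), ('b', 3, 1), ('b', 4, 1),
    ('c', 1, 1), ('c', 2, 0), ('c', 3, 0), ('c', 4, 1),
    ('d', 1, 0), ('d', 2, 0), ('d', 3, 1), ('d', 4, 0),
    ('e', 1, 1), ('e', 2, 1), ('e', 3, 0), ('e', 4, 0),
]

# Equivalent of: SELECT mission, SUM(status) FROM t GROUP BY mission
totals = {}
for uid, mission, status in rows:
    totals[mission] = totals.get(mission, 0) + status
print(totals)  # {1: 3, 2: 2, 3: 3, 4: 3}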
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
    $intbit = pow(2,$i);
    if( $input & $intbit ) {
        echo $missions[$i] . ' ';
    }
}
?>
Outputs '1 2 3 4'
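If you need the per-mission totals across all users (which is what the question asks for), the same bit test can simply be tallied in a loop. A minimal sketch in Python rather than PHP, assuming the statuses come back from the query as the bitwise integers shown above:

# Tally, for each mission, how many users passed it; bit i of a status
# represents mission i+1.
statuses = [12, 15, 9, 4, 3]   # one value per user, fetched by a query
num_missions = 4               # would be 64 in the real case

counts = [0] * num_missions
for status in statuses:
    for i in range(num_missions):
        if status & (1 << i):
            counts[i] += 1

print(counts)  # counts[i] = number of users who passed mission i+1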
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the bin() function to convert an integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.

Create a function to calculate an equation from a dataframe in pandas

I have a dataframe as shown below
Inspector_ID Sector Waste Fire Traffic
1 A 7 2 1
1 B 0 0 0
1 C 18 2 0
2 A 1 6 3
2 B 1 4 0
2 C 4 14 2
3 A 0 0 0
3 B 2 6 12
3 C 0 1 4
From the above dataframe I would like to calculate each inspector's expertise score in raising issues in a domain (Waste, Fire and Traffic).
For example, the score of inspector-1 for Waste is (((7/8)*2) + ((18/22)*3)/2)/2, where:
I1W = Inspector-1 similarity in waste.
Ai = No. of waste issues raised by inspector-1 in sector i
Ti = Total no. of waste issues in sector i
Ni = No. of inspectors who raised issues in sector i (an inspector counts as "not raised" only if all of their counts are zero)
TS1 = Total no. of sectors inspector-1 visited.
I1W = Sum((Ai/Ti)*Ni)/TS1
The expected output is the dataframe below:
Inspector_ID Waste Fire Traffic
1 I1W I1F I1T
2 I2W I2F I2T
3 I3W I3F I3T
TBF = To be filled
You could look into something along the lines of:
newData = []
inspector_ids = df['Inspector_ID'].unique().tolist()
for id in inspector_ids:
    current_data = df.loc[df['Inspector_ID'] == id]
    # With the data of the current inspector you get the desired values
    waste_val = 'I1W'
    fire_val = 'I1F'
    traffic_val = 'I1T'
    newData.append([id, waste_val, fire_val, traffic_val])
new_df = pd.DataFrame(newData, columns=['Inspector_ID', 'Waste', 'Fire', 'Traffic'])
Some ideas for getting the values you need:
# TS1 = Total no. of sectors the inspector visited.
# After the first loc that filters the inspector:
sectors_visited = len(current_data['Sector'].unique().tolist())
# Ai = No. of waste issues raised by inspector-1 in sector i
waste_issues_A = current_data.loc[current_data['Sector'] == 'A', 'Waste'].sum()
# Ti = Total no. of waste issues in sector i
# You can get the total number of waste issues per sector with
df.groupby('Sector')['Waste'].sum()
# Ni = No. of inspectors who raised issues in sector i
# (if all counts are zero, only then is it considered as not raised)
# I don't know if I understand this one correctly; I guess it's the number
# of inspectors that raised issues in a sector
inspectors_sector_A = len(df.loc[df['Sector'] == 'A']['Inspector_ID'].unique().tolist())
The above was written from memory, so take the code with a grain of salt (especially the Ni one).
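For completeness, here is one way the whole calculation could look, built on the ideas above. It follows my reading of the formula I1W = Sum((Ai/Ti)*Ni)/TS1, where Ni is the number of inspectors with at least one issue of any kind in the sector and TS1 is the number of sectors the inspector appears in; the worked example in the question divides slightly differently, so treat this as a sketch to adapt rather than a definitive implementation:

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Inspector_ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Sector':       ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Waste':        [7, 0, 18, 1, 1, 4, 0, 2, 0],
    'Fire':         [2, 0, 2, 6, 4, 14, 0, 6, 1],
    'Traffic':      [1, 0, 0, 3, 0, 2, 0, 12, 4],
})

domains = ['Waste', 'Fire', 'Traffic']

# Ti: total issues per sector and domain
sector_totals = df.groupby('Sector')[domains].sum()

# Ni: inspectors who raised at least one issue (of any kind) in the sector
active = df[domains].sum(axis=1) > 0
inspectors_per_sector = df[active].groupby('Sector')['Inspector_ID'].nunique()

rows = []
for inspector, grp in df.groupby('Inspector_ID'):
    visited = grp['Sector'].nunique()  # TS: sectors this inspector visited
    scores = {'Inspector_ID': inspector}
    for d in domains:
        total = 0.0
        for _, r in grp.iterrows():
            Ti = sector_totals.loc[r['Sector'], d]
            Ni = inspectors_per_sector.get(r['Sector'], 0)
            if Ti > 0:
                total += (r[d] / float(Ti)) * Ni
        scores[d] = total / visited
    rows.append(scores)

new_df = pd.DataFrame(rows, columns=['Inspector_ID'] + domains)
print(new_df)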

optaplanner results with even spread

Does anyone know, using OptaPlanner, what the rule would be to spread out the course scheduling evenly?
eg:
I am trying to input 15 courses, some of which need to be in a specific order, and then spread these 15 courses as evenly as possible over the input time period.
If this type of scenario is documented somewhere, please just forward the link; I really can't find any examples.
TIA
Phil
In the docs, look for fairness/load balancing.
The squared trick also works for spreads. For example, an optimal schedule:
exam A on 1-FEB
exam B on 4-FEB => 3 days between A and B => -3² = - 9
exam C on 7-FEB => 3 days between B and C => -3² = - 9
Total: - 18
A non-optimal schedule:
exam A on 1-FEB
exam B on 3-FEB => 2 days between A and B => -2² = - 4
exam C on 7-FEB => 4 days between B and C => -4² = - 16
Total: - 20 (so worse than -18, which is what we want)
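Purely to illustrate the arithmetic (this is plain Python, not an OptaPlanner score rule), the squared-gap penalty scores the even spread better:

def spread_penalty(days):
    # days: the day numbers of the scheduled exams, in order
    gaps = [b - a for a, b in zip(days, days[1:])]
    return -sum(g * g for g in gaps)

print(spread_penalty([1, 4, 7]))  # -18: the even spread
print(spread_penalty([1, 3, 7]))  # -20: worse, as desired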
To combine this weighting with other constraints, see the tennis example and read my blog post a couple of times.

How can I efficiently create unique relationships in Neo4j?

Following up on my question here, I would like to create a constraint on relationships. That is, I would like there to be multiple nodes that share the same "neighborhood" name, but each uniquely pointing to the particular city in which it resides.
As encouraged in user2194039's answer, I am using the following index:
CREATE INDEX ON :Neighborhood(name)
Also, I have the following constraint:
CREATE CONSTRAINT ON (c:City) ASSERT c.name IS UNIQUE;
The following code fails to create unique relationships, and takes an excessively long period of time:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
WITH line
WHERE line.Neighborhood IS NOT NULL
WITH line
MATCH (c:City { name : line.City})
MERGE (c)<-[:IN]-(n:Neighborhood {name : toInt(line.Neighborhood)});
Note that there is a uniqueness constraint on City, but NOT on Neighborhood (because there should be multiple ones).
Profile with Limit 10,000:
+--------------+------+--------+---------------------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+--------------+------+--------+---------------------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 9750 | 3360 | anon[307], b, neighborhood, line | MergePattern |
| SchemaIndex | 9750 | 19500 | b, line | line.City; :City(name) |
| ColumnFilter | 9750 | 0 | line | keep columns line |
| Filter | 9750 | 0 | anon[220], line | anon[220] |
| Extract | 10000 | 0 | anon[220], line | anon[220] |
| Slice | 10000 | 0 | line | { AUTOINT0} |
| LoadCSV | 10000 | 0 | line | |
+--------------+------+--------+---------------------------+------------------------------+
Total database accesses: 22860
Following Guilherme's recommendation below, I implemented the helper, yet it raises the error py2neo.error.Finished. I've searched the documentation and wasn't able to determine a workaround. It looks like there's an open SO post about this exception.
def run_batch_query(queries, timeout=None):
    if timeout:
        http.socket_timeout = timeout
    try:
        graph = Graph()
        authenticate("localhost:7474", "account", "password")
        tx = graph.cypher.begin()
        for query in queries:
            statement, params = query
            tx.append(statement, params)
        results = tx.process()
        tx.commit()
    except http.SocketError as err:
        raise err
    except error.Finished as err:
        raise err
    collection = []
    for result in results:
        records = []
        for record in result:
            records.append(record)
        collection.append(records)
    return collection
main:
queries = []
template = ["MERGE (city:City {Name:{city}})", "Merge (city)<-[:IN]-(n:Neighborhood {Name : {neighborhood}})"]
statement = '\n'.join(template)
batch = 5000
c = 1
start = time.time()
# city_neighborhood_map is a defaultdict that maps city -> set of neighborhoods
for city, neighborhoods in city_neighborhood_map.iteritems():
    for neighborhood in neighborhoods:
        params = dict(city=city, neighborhood=neighborhood)
        queries.append((statement, params))
        c += 1
        if c % batch == 0:
            print "running batch"
            print c
            s = time.time()*1000
            r = run_batch_query(queries, 10)
            e = time.time()*1000
            print("\t{0}, {1:.00f}ms".format(c, e-s))
            del queries[:]
print c
if queries:
    s = time.time()*1000
    r = run_batch_query(queries, 300)
    e = time.time()*1000
    print("\t{0} {1:.00f}ms".format(c, e-s))
end = time.time()
print("End. {0}s".format(end-start))
If you want to create unique relationships you have 2 options:
Prevent the path from being duplicated, using MERGE, just like @user2194039 suggested. I think this is the simplest and best approach you can take.
Turn your relationship into a node, and create a unique constraint on it. But it's hardly necessary for most cases.
If you're having trouble with speed, try using the transactional endpoint. I tried importing your data (random cities and neighbourhoods) through LOAD CSV in 2.2.1, and it was slow as well, though I am not sure why. If you send your queries with parameters to the transactional endpoint in batches of 1000-5000, you can monitor the process, and probably gain a performance boost.
I managed to import 1M rows in just under 11 minutes.
I used an INDEX for Neighbourhood(name) and a unique constraint for City(name).
Give it a try and see if it works for you.
Edit:
The transactional endpoint is a RESTful endpoint that allows you to execute transactions in batch. You can read about it here.
Basically, it allows you to stream a bunch of queries to the server at once.
I don't know what programming language/stack you're using, but in python, using a package like py2neo, it would be something like this:
with open("city.csv", "r") as fp:
reader = csv.reader(fp)
queries = []
template = ["MERGE (c :`City` {name: {city}})",
"MERGE (c)<-[:IN]-(n :`Neighborhood` {name: {neighborhood}})"]
statement = '\n'.join(template)
batch = 5000
c = 1
start = time.time()
for row in reader:
city, neighborhood = row
params = dict(city=city, neighborhood=neighborhood)
queries.append((statement, params))
if c % batch == 0:
s = time.time()*1000
r = neo4j.run_batch_query(queries, 10)
e = time.time()*1000
print("\t{0}, {1:.00f}ms".format(c, e-s))
del queries[:]
c += 1
if queries:
s = time.time()*1000
r = neo4j.run_batch_query(queries, 300)
e = time.time()*1000
print("\t{0} {1:.00f}ms".format(c, e-s))
end = time.time()
print("End. {0}s".format(end-start))
Helper functions:
def run_batch_query(queries, timeout=None):
    if timeout:
        http.socket_timeout = timeout
    try:
        graph = Graph(uri)  # "{protocol}://{host}:{port}/db/data/"
        tx = graph.cypher.begin()
        for query in queries:
            statement, params = query
            tx.append(statement, params)
        results = tx.process()
        tx.commit()
    except http.SocketError as err:
        raise err
    collection = []
    for result in results:
        records = []
        for record in result:
            records.append(record)
        collection.append(records)
    return collection
You can monitor how long each transaction takes, and you can tweak the number of queries per transaction, as well as the timeout.
To be sure we're on the same page, this is how I understand your model: Each city is unique and should have some number of neighborhoods pointing to it. The neighborhoods are unique within the context of a city, but not globally. So if you have a neighborhood 3 [IN] city Boston, you could also have a neighborhood 3 [IN] city Seattle, and both of those neighborhoods are represented by different nodes, even though they have the same name property. Is that correct?
Before importing, I would recommend adding an index to your neighborhood nodes. You can add the index without enforcing uniqueness. I have found that this greatly increases speeds on even small databases.
CREATE INDEX ON :Neighborhood(name)
And for the import:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
MERGE (c:City {name: line.City})
MERGE (c)<-[:IN]-(n:Neighborhood {name: toInt(line.Neighborhood)})
If you are importing a large amount of data, it may be best to use the USING PERIODIC COMMIT command to commit periodically while importing. This will reduce the memory used in the process, and if your server is memory-constrained, I could see it helping performance. In your case, with almost a million records, this is recommended by Neo4j. You can even adjust how often the commit happens by doing USING PERIODIC COMMIT 10000 or such. The docs say 1000 is the default. Just understand that this will break the import into several transactions.
Best of luck!

Power reserve calculation for site_types

I have posted before but the code was not shown properly. I am trying to write logic to calculate the power reserve at telecom sites such as microwave stations, substations, etc.
I have created a site_type scaffold with the following migration in the MySQL database:
id site_type
1 Substation
2 Generating Station
3 District Office
4 VHF_UHF Repeater
5 Capacitor Station
6 Office
7 Microwave Station
Here is the site model:
class Site < ActiveRecord::Base
  attr_accessible :lat, :long, :site_code, :site_name, :site_type, :site_type_id

  def is_microwave?
    self.site_type = 'Microwave Station'
  end

  def is_under_capacity?(capacity)
    (is_microwave? && capacity < 24) || (!is_microwave? && capacity < 8)
  end
end
The code in the view:
- total_capacity = @site.dc_power_inventories.sum {|d| d.dc_power_supply.battery.capacity.capacity } rescue 0
- total_amps = @site.equipment_inventories.sum {|e| e.equipment.amp.amp } rescue 0
- capacity_left = (total_capacity.to_f / total_amps).round(2) rescue 0
div.row-fluid
  h4
    span Site Name: #{@site.site_code}
    span.pull-right Capacity Reserve Left: #{capacity_left}
  / h4 = "Capacity Reserve Left: #{capacity_left} Hrs "
  - if @site.is_under_capacity?(capacity_left)
    div.alert
      strong Warning !!!
      span You must add or replace the DC power system.
The issue I am having is that I only get the warning when sites (microwave or others) are below 24 hrs of capacity. What I want is the warning for microwave sites when the capacity goes below 24 hrs, and for other site types when it goes below 8 hrs of reserve capacity.
Thanks.
Could be wrong, but I think your test for Microwave is wrong; I think you have missed out an = sign:
def is_microwave?
  self.site_type == 'Microwave Station'
end