Rails 3 - Store and access a large amount of static data

I have a Rails 3 app that uses Geocoder for a search feature that I'm building. I need to be able to access a table (or hash) of static latitude/longitude data for all US airports. I currently have the data, but I need to store it somehow so that it is accessible.
Possible Solution 1:
I know that I can store a "static" hash of key/value pairs in a Helper to be called on like this:
AIRPORTS = { "abc" => "30.33, -95.66", "efg" => "42.65, -96.72", ... }
But the data contains about 20K key/value pairs. Wouldn't this all get pulled into memory and slow things down OR break?
Possible Solution 2:
Is there a way to store 20K+ static key/value pairs in a database table? I would think that there is, but I have only worked with dynamic data thus far.
Any help or additional suggestions would be greatly appreciated! Thanks in advance.

Related

Is there a way to copy Zapier formatter lookup tables from one zap to another?

I need to add a lookup table that converts states to abbreviations to 5 zaps that are already configured. Right now, the only solution I see is to recreate the lookup table for each zap, which is very time consuming.
Does anyone have suggestions how to do this programmatically, maybe using the Javascript/code action?
David here, from the Zapier Platform team.
That's sort of possible! Unfortunately you can't copy them as lookup table steps, but the code they mimic is very straightforward.
In JavaScript:
const mapping = {
  // input : output
  TX: 'Texas',
  CO: 'Colorado',
  CA: 'California'
}

return {
  output: mapping[inputData.input] || 'fallback value'
}
You'll need an input field called input and you create your own mapping. That's basically all the lookup table is! It can be copy-pasted across zaps pretty easily. So, replace all your lookup table steps with Code steps and you're off to the races!
Often my solution for when I need access to the same information across multiple zaps is to place that information into storage using Zapier's storage client. It can be updated and accessed through the Code module; you can read more about Zapier's storage client here. The nice thing about this is that if you need to make changes to the lookup data, you only have to do it in one place.
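As a rough illustration of that approach, here is a sketch of a Python Code step reading a shared lookup table out of Storage by Zapier (the Code action supports Python as well as JavaScript). The exact StoreClient constructor, the state_abbreviations key, and the state input field are my assumptions, so verify them against the Storage by Zapier documentation:

import json

# Sketch only: StoreClient is available inside Zapier Code steps; the secret
# ties every zap to the same store. The constructor signature is an assumption
# from memory - check the Storage by Zapier docs before relying on it.
store = StoreClient('your secret here')

# 'state_abbreviations' is a hypothetical key holding a JSON object such as
# {"Texas": "TX", "Colorado": "CO"}, saved once by a separate "admin" zap.
raw = store.get('state_abbreviations')
mapping = json.loads(raw) if raw else {}

# 'state' is a hypothetical input field configured on this Code step.
output = {'abbreviation': mapping.get(input_data['state'], 'fallback value')}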

How to properly store a JSON object into a Table?

I am working on a scenario where I have invoices available in my Data Lake Store.
Invoice example (extremely simplified):
{
  "business_guid": "b4f16300-8e78-4358-b3d2-b29436eaeba8",
  "ingress_timestamp": 1523053808,
  "client": {
    "name": "Jake",
    "age": 55
  },
  "transactions": [
    {
      "name": "peanut",
      "amount": 100
    },
    {
      "name": "avocado",
      "amount": 2
    }
  ]
}
All invoices are stored in ADLS and can be queried. But it is my desire to provide access to the same data inside an ADL DB.
I am not an expert on unstructured data: I have an RDBMS background. Taking that into consideration, I can only think of 2 possible scenarios:
2/3 tables - invoice, client (could be removed) and transaction. In this scenario, I would have to create an invoice ID to be able to build relationships between those tables.
1 table - client info could be denormalized into the invoice data. But transactions could (maybe) be defined as a SQL.ARRAY<SQL.MAP<string, object>>.
I have mainly 3 questions:
What is the correct way of doing so? Solution 1 seems much better structured.
If I go with solution 1, how do I properly create an ID (probably GUID)? Is it acceptable to require ID creation when working with ADL?
Is there another solution I am missing here?
Thanks in advance!
This type of question is a bit like asking whether you prefer your sauce on the pasta or next to the pasta :). The answer is: it depends.
To answer your 3 questions more seriously:
#1 has the benefit of being normalized, which works well if you want to operate on the data separately (e.g., just clients, just invoices, just transactions), want the benefits of normalization, need the right indexing, and are not limited by row-size limits (e.g., your array of maps needs to fit into a row). So I would recommend that approach unless your transaction data is always small, you always access the data together, and you mainly search on the column data.
U-SQL per se has no understanding of the hierarchy of the JSON document. Thus, you would have to write an extractor that turns your JSON into rows in a way that either preserves the correlation of the parent to the child (normally done by stepwise downward navigation with CROSS APPLY), using the key value of the parent data item as the foreign key, or has the extractor generate the key (as an int or GUID).
There are some sample JSON extractors on the U-SQL GitHub site (start at http://usql.io) that can get you started with the JSON-to-rowset conversion. Note that you will probably want to optimize the extraction at some point to be JSON-reader based, so you can process larger documents without loading them fully into memory.

Pre-calculated JOIN queries as a map in Ignite

I am new to Ignite and am currently doing a POC.
I have a question regarding ways to store/load data in a map. It's a bit of a tricky and strange requirement.
Example:
I have Employee, Department, Project [Tables in database] + [Entity classes in application].
But I don't want to store each of these in a separate map in memory but rather I want to store pre-calculated join results in a designated map.
Dynamic Query : select employeeId,employeeName,departmentName,projectName,projectStart,projectEnd from Employee,Department,Project where $JOIN
I know beforehand, at least, what the key fields and value fields would be. From the above example, I can denote my "Map" as shown below:
Key : Set (employeeId,departmentId)
Value : List (employeeName,value),(departmentName,value),(projectName,value),(projectStart,value),(projectEnd,value)
So you can see that with every pair of (employeeId, departmentId) I would have multiple values associated with it. But the dilemma is that I don't have domain model/entity POJOs beforehand. Such dynamic views/maps can be added flexibly, so that we don't have to change the domain/entity model every time. We also don't want to redo the joins/calculations for thousands of such client requests on every call.
Is it possible to fire such join queries using MapLoader or by any other means?
I can think of a Map with (Key = Set, Value = List) as the data structure to store the final results. Any better alternative?
Could there be any performance issues while retrieving values from such a map based on keys?
Any memory optimizations I should take care of?
Thanks,
Dharam
You are not required to use SQL queries. It's fine to use Ignite as a simple caching mechanism for DB query results. Each time a query is executed, save the result in IgniteCache and then use this cached result if the same query is requested again (a small sketch follows the links below). You can also use expirations [1] and/or evictions [2] to make sure that you don't keep too much data in the cache and don't run out of memory.
[1] https://apacheignite.readme.io/docs/expiry-policies
[2] https://apacheignite.readme.io/docs/evictions
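To make that cache-aside pattern concrete, here is a minimal sketch using the pyignite thin client. The client choice, the cache name, and the run_join_query callable are illustrative assumptions on my part, not part of the original answer; the same idea applies with the Java API.

from pyignite import Client

def get_join_result(key, run_join_query):
    # Cache-aside lookup for a pre-computed join result.
    # `key` could be a string such as "employeeId:departmentId";
    # `run_join_query` is a hypothetical callable that runs the SQL join once.
    client = Client()
    client.connect('127.0.0.1', 10800)                  # default thin-client port
    cache = client.get_or_create_cache('join_results')  # arbitrary cache name

    result = cache.get(key)
    if result is None:                                   # miss: compute once, then cache
        result = run_join_query(key)
        cache.put(key, result)
    return result

Expiry and eviction would then be configured on the cache itself (see the links above), so stale join results age out instead of accumulating in memory.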

Datamodel design for an application using Redis

I am new to Redis and I am trying to figure out how it can be used.
So please let me know if this is the right way to build an application.
I am building an application which has only one data source. I am planning to run a job on a nightly basis to get the data into a file.
Now I have a front end application, that needs to render this data in different formats.
Example application use case
Download applications processed by a university on a nightly basis.
Display how many applications got approved or rejected.
Display number of applications by state.
Let user search for an application by application id.
Instead of using a relational database like Postgres/MySQL, I am thinking about using Redis. I am planning to store the data in the following ways.
Application id -> Application details
State -> List of application ids
Approved -> List of application ids (By date ?)
Declined -> List of application ids (By date ?)
Is this the correct way to store data in Redis?
Also, if someone queries for all applications in California for a certain date,
I will be able to pull the application ids in one call, but to get the details for each application, do I need to make another request?
Word of caution:
Instead of using a relational database like Postgres/MySQL, I am thinking about using Redis.
Why? Redis is an amazing database, but don't use the right hammer for the wrong nail. Use Redis if you need real-time performance at scale, but don't try to make it replace an RDBMS if that's what you need.
Answer:
Fetching data efficiently from Redis to answer your queries depends on how you'll be storing it. Therefore, to determine the "correct" data model, you first need to define your queries. The data model you proposed is just a description of the data - it doesn't really say how you're planning to store it in Redis. Without more details about the queries, I would store the data as follows (see the sketch after this list):
Store the application details in a Hash (e.g. app:<id>)
Store the application IDs per state in a Set (e.g. apps:<state>)
Store the approved/rejected applications in two Sorted Sets, the id being the member and the date being the score
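A minimal sketch of that layout using redis-py; the key names, fields, ids, and timestamps are examples of mine, not prescriptive:

import redis

r = redis.Redis()
app_id = "1001"                                    # example application id

# Application details in a Hash: app:<id>
r.hset(f"app:{app_id}", mapping={"state": "CA", "status": "approved"})

# Per-state index in a Set: apps:<state>
r.sadd("apps:CA", app_id)

# Approved applications in a Sorted Set, scored by date (epoch seconds)
r.zadd("apps:approved", {app_id: 1523053808})

# "All approved applications in California in a given date range":
approved = set(r.zrangebyscore("apps:approved", 1523000000, 1523086400))
ids = approved & r.smembers("apps:CA")
details = [r.hgetall(f"app:{i.decode()}") for i in ids]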
Also, if someone queries for all applications in California for a certain date, I will be able to pull the application ids in one call, but to get the details for each application, do I need to make another request?
Again, that depends on the data model but you can use Lua scripts to embed this logic and execute it in one call to the database.
First of all, you can use a Hash to store structured data. With Sorted Sets (ZSETs) and Sets you can create indexes for ordered or unordered access. (Depending on your requirements, of course. Make a list of how you want to access your data.)
It is possible to get all the data of an index as JSON in one go with a simple Redis script (example using an unordered Set):
local bulkToTable = function(bulk)
  local retTable = {};
  for index = 1, #bulk, 2 do
    local key = bulk[index];
    local value = bulk[index+1];
    retTable[key] = value;
  end
  return retTable;
end

local functionSet = redis.call("SMEMBERS", "app:functions")
local returnObj = {};
for index = 1, #functionSet, 1 do
  returnObj[index] = bulkToTable(redis.call("HGETALL", "app:function:" .. functionSet[index]));
  returnObj[index]["functionId"] = functionSet[index];
end

return cjson.encode(returnObj);
More information about Redis scripts can be found here: http://www.redisgreen.net/blog/intro-to-lua-for-redis-programmers/

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on Neo4j and PHP with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity (users, companies) which holds the nodes of that property. So the users index holds 130K nodes, and the rest are in companies.
With Cypher we are querying like this:
START u=node:users('id:*')
RETURN count(u)
And the result is:
Returned 1 row. Query took 4080 ms
The server has the default configuration with a few tweaks, but 4 seconds is too long for our needs. Consider that the database will grow by 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
  new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh,
this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and, for good measure, refresh it every night by running the query above?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or if you are using 2.0
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread, by the way; IMO one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" in general takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this :
v2.0
MATCH company:COMPANY
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)