Preference objects provide a way to store arbitrary data in Rally that can be combined with other Rally information.
For example, I can't calculate defect density and see a graph of it in Rally, because I don't have KLOC information in Rally. But if I write a script that periodically, say once per iteration, drops my current line count into a preference object with a well-known ID, I can do this easily.
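Something like this is what I have in mind, as a rough sketch; the endpoint path, the zsessionid header, and the Name/Value payload shape are my assumptions about Rally's WSAPI, so check the WSAPI docs for your instance:

```typescript
// Sketch: push the current KLOC figure into a Rally Preference object
// each iteration. Endpoint, header, and payload shape are assumptions
// based on Rally WSAPI conventions, not verified against the docs.
const RALLY_URL = "https://rally1.rallydev.com/slm/webservice/v2.0";
const API_KEY = process.env.RALLY_API_KEY ?? "";

async function storeKloc(kloc: number): Promise<void> {
  const payload = {
    Preference: {
      Name: "my-team.kloc", // the well-known ID the reporting app looks up
      Value: JSON.stringify({ kloc, recordedAt: new Date().toISOString() }),
    },
  };
  const res = await fetch(`${RALLY_URL}/preference/create`, {
    method: "POST",
    headers: { "Content-Type": "application/json", zsessionid: API_KEY },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Rally returned ${res.status}`);
}

storeKloc(123.4).catch(console.error);
```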
But should I? And if so, what are the limitations of preference objects in Rally? How much data can I safely store in them, and how many preference objects can the system reasonably handle? Is it hundreds, thousands, tens of thousands? Our instance already has thousands of these just from the standard apps that are installed, so it looks like the answer is at least thousands.
We currently do not place any restrictions on the use of preferences and, frankly, I don't think we know the limits of its use. For the load that you are suggesting, I suspect you will not exceed those limits.
On another front, I'd love to hear more about the analysis that you have in mind. Before coming to Rally, I did a bit of work using LOC to normalize metrics as well as to heuristically determine artifact dependency. Now at Rally, I have both the analytics features and the connector features within my domain of responsibility as a Product Owner, and I've been exploring ways to responsibly use LOC at Rally.
The Asset entity in Moqui has a field for an associated asset. But we have a use case where multiple assets need to be associated with one asset.
For example, a tool (a piece of manufacturing equipment) may be used only on specified machines (also manufacturing equipment). We are exploring the option of creating a join entity.
Are we deviating from the best practices of the framework?
Added to answer the comment from David E Jones
Business Requirement
There is a custom tool designed to manufacture a component.
This tool is technically compatible with a wide range of machines in operation.
The operating costs of the machines in question vary over a very wide band, so the tool should be used only on specific machines to keep the overall cost of the manufactured component within a specified band.
So, for a given tool, we intend to assign the allowed machine(s) and use only those machines for manufacturing.
As David remarked, it is difficult to design for business requirements without detail and context, and there is relatively little to go on here.
I would guess the tooling that might be set up on a particular machine could span a large range, related to the range of component specifications in the orders that come in.
The main process to be designed here, I would guess, is choosing the most economical machine to set up with the tooling for a particular order, and that will always vary depending on the other orders ongoing or scheduled and the machines those orders are assigned to.
Back to your query with the above in mind: if you are defining particular toolings or tools as assets, one approach might be to define the assetTypeEnumId as 'tooling' or similar, and to use the classEnumId across the asset types of machines and toolings to stipulate the maximum economic class of machine the tool should be used with, etc.
Alternatively, or in addition, it might be useful to look at the AssetStandardCost entity and at setting up some enums for assetStandardCostTypeEnumId.
It would seem to me on the surface that trying to directly associate multiple toolings with multiple machines (with a range of constraints on top) would quickly lead to a combinatorial explosion of possibilities.
All in all, my experience would be that if you look into the existing entities they will typically suggest a variety of approaches, and later on, when further requirements arise, you may be glad you used what was existing rather than trying to do something new.
Business requirements are difficult to design for without detail and context, but it sounds like what you really want to model is not at the Asset level but at the Product level. For asset-type products, the Product and related entities (like ProductAssoc) are used to define the characteristics of physical items, while Asset records represent the actual physical items.
One example of this is the maintenance side of things. The maintenance schedule is part of the Product definition (applicable to all assets for that product) and the maintenance history is part of the Asset side of things (applicable only to specific physical items).
This question has been asked many times, and I have read many users saying that it is not advisable to store images in a DB, in particular within Core Data. But they all seem to omit the reason why. Even the Apple documentation states this, everybody points in that direction, and every discussion ends the same way: "well, you can, but storing the path is better."
Apart from opinions, I would like to have a concrete example of why it is not a good solution.
Let me explain better: I have a strong background in building web applications. A concrete example I would give from that point of view is: do not store images in a DB, but rather the paths to them, because you can have them served by the web server, which can apply all of its caching facilities.
But in a desktop environment, and especially in an iOS application, what are the downsides of storing images in Core Data using SQLite, provided that:
There's a separate entity holding the images; it is not an attribute of the main entity.
There also seems to be a limit of 100 KB for images. Why? What happens with 110, 120, ... 200 KB, etc.?
thanks
There's nothing special about what Core Data normally does here. It's just using an SQLite database. You can put large blobs of data into it, but it just doesn't scale all that well. You can read more about it here: Internal Versus External BLOBs in SQLite.
That said, Core Data has support for external blobs, which in Core Data terminology is called storing the data in an external record file (iOS 5.0 and later). Again, there's nothing magic about it; it's just storing the large pieces of data in the file system, separately from the SQLite DB itself. The benefit is that Core Data manages all of this for you.
When you're in Xcode, there'll be a checkbox called Allows External Storage that you can check for Binary Data properties.
The file system, and the APIs surrounding it, is (just like a web server) optimized to serve files of any size and to apply caching where appropriate.
Core Data is optimized for handling an object graph made of tiny pieces of data, like integers and short strings.
Also, there are a number of other issues that tend to creep up on you, like needing to periodically vacuum the SQLite database Core Data uses; otherwise it can only grow, never shrink.
Leonardo,
With Lion/iOS 5, Core Data started handling file system storage of large BLOBs for you.
The choice is really determined by how many images you are going to have open. If you have many, then you should keep them in the DB. Why? Because you only have a modest number of file descriptors, one of which is used for each open image stored in the file system.
That said, there is still a reason to manage the files yourself. If your BLOBs are really big, say 2+ MB, you will want to map them into memory and not just read them in. (When the memory warnings come, this lets the OS automatically purge them from your resident memory. This is a very good thing.) Even so, you still have the limited number of file descriptors problem.
Andrew
I am currently working on a private project that is going to use Google's GTFS spec to get information about hundreds of public transit agencies: their routes, stations, times, and other related information. I will be getting my information from here and from the Google Code wiki page with similar info. There is a lot of data, and it's partitioned into multiple CSV-formatted text files. These can be huge, some ranging from 80 to 100 MB of data.
With the data I have, I want to translate it all into a nice solid database that I can build layers on top of to use for my project. I will be using GPS positioning to pinpoint a location and all surrounding stations/stops.
My goal is to access all the information for all these stops and stations with as few calls as possible, while keeping datasets small for queried results.
I am currently leaning towards MongoDB and CouchDB for their geospatial support, which can really optimize getting small datasets. But I also need to be sure to link all the stops on a route, because I will be propagating information along a transit route for that line. For this I have found that I could benefit from a graph DB like Neo4j or OrientDB, but from what I know, neither has geospatial support, nor am I 100% sure that a graph DB is what I need.
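To make the geospatial part concrete, this is the kind of lookup I have in mind (a sketch with the Node.js MongoDB driver; the collection layout for stops.txt rows is my own assumption):

```typescript
import { MongoClient } from "mongodb";

// Sketch: store GTFS stops as GeoJSON points and find the stops near a
// GPS fix. Database/collection names and document shape are assumptions.
async function stopsNear(lon: number, lat: number) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const stops = client.db("gtfs").collection("stops");

  // A 2dsphere index enables $near queries on GeoJSON points.
  await stops.createIndex({ location: "2dsphere" });

  // A stops.txt row would become something like:
  // { stopId: "S1", name: "Main St",
  //   location: { type: "Point", coordinates: [lon, lat] } }
  const nearby = await stops
    .find({
      location: {
        $near: {
          $geometry: { type: "Point", coordinates: [lon, lat] },
          $maxDistance: 500, // meters
        },
      },
    })
    .toArray();

  await client.close();
  return nearby;
}
```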
The perfect solution might not exist, but I come here asking for help on finding the best possible one for my situation. I know I will possibly have to work around the limitations of whatever I choose, but I want to at least have done my research and know that it's the best I can get at the moment.
It has also been suggested that I split the data into multiple DBs, but that could get very messy, because all the information is very tightly interconnected through IDs.
Any help would be appreciated.
Obviously a graph database fits your problem 100%. My advice here is to go for a geospatial module on top of Neo4j or OrientDB, although there are some other free and open-source implementations.
I think the best one right now, with all the geospatial features implemented, is the neo4j-spatial package. But as far as I know, you can also reproduce most of the geospatial functionality on your own if necessary.
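As a sketch of what linking stops along a route could look like with plain Cypher and the official JavaScript driver (the labels, properties, and relationship names are only illustrative, not anything neo4j-spatial prescribes):

```typescript
import neo4j from "neo4j-driver";

// Sketch: model consecutive stops on a route as NEXT relationships and
// walk the route downstream from a given stop.
const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "secret")
);

async function stopsDownstream(stopId: string) {
  const session = driver.session();
  try {
    const result = await session.run(
      `MATCH (s:Stop {stopId: $stopId})-[:NEXT*1..10]->(d:Stop)
       RETURN d.stopId AS stopId, d.name AS name`,
      { stopId }
    );
    return result.records.map((r) => ({
      stopId: r.get("stopId"),
      name: r.get("name"),
    }));
  } finally {
    await session.close();
  }
}
```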
BTW, talking about splitting: if the amount of data/queries will be high, I strongly recommend you share the load and think about the model in those terms. Surely you can do something there.
I've used Mongo's geospatial features and can offer some guidance if you need help with a C# or JavaScript implementation. I would recommend it as a starting point because it's super easy to use. I'm learning all about Neo4j right now, and I am working on a hybrid approach that takes advantage of both Mongo and Neo4j. You might want to cross-reference the documents in Mongo with the nodes in Neo4j using the Mongo object id.
For my hybrid implementation, I'm storing profiles and any other large static data in Mongo. In Neo4j, I'm storing relationships like friend and friend-of-friend. If I wanted to analyze which movies two friends are most likely to want to watch together (or really any other relationship I hadn't thought of initially), by keeping that object id reference I can simply add some code instructing each node to go out and grab a list of movies from the related profile.
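A minimal sketch of that cross-referencing idea (all the names here are mine; the point is simply keeping the Mongo ObjectId as a plain property on the graph node):

```typescript
import { MongoClient, ObjectId } from "mongodb";
import neo4j from "neo4j-driver";

// Sketch: heavy profile documents live in Mongo; only the relationship
// structure, plus each profile's Mongo id, lives in Neo4j.
async function profilesOfFriends(userMongoId: string) {
  const mongo = await MongoClient.connect("mongodb://localhost:27017");
  const driver = neo4j.driver(
    "bolt://localhost:7687",
    neo4j.auth.basic("neo4j", "secret")
  );
  const session = driver.session();
  try {
    // 1. Walk the graph, collecting the Mongo ids of the friends.
    const result = await session.run(
      `MATCH (:User {mongoId: $id})-[:FRIEND]->(f:User)
       RETURN f.mongoId AS mongoId`,
      { id: userMongoId }
    );
    const ids = result.records.map(
      (r) => new ObjectId(r.get("mongoId") as string)
    );

    // 2. Fetch the full profile documents from Mongo in one query.
    return mongo.db("app").collection("profiles")
      .find({ _id: { $in: ids } })
      .toArray();
  } finally {
    await session.close();
    await mongo.close();
  }
}
```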
Added 2011-02-12:
Just wanted to follow up on this "hybrid" idea as I created prototypes for and implemented a few more solutions recently where I ended up using more than one database. Martin Fowler refers to this as "Polyglot Persistence."
I'm finding that I am often using a combination of a relational database, document database and a graph database (in my case this is generally SQL Server, MongoDB and Neo4j). Since the question is related to data modeling as much as it is to geospatial, I thought I would touch on that here:
I've used Neo4j for site organization (similar to the idea of hypermedia in the REST model), modeling social data and building recommendations (often based on social data). As a result, I will generally model this part of the application before I begin programming.
I often end up using MongoDB for prototyping the rest of the application because it provides such a simple persistence mechanism. I like to start developing an application with the user interface, so this ends up working well.
When I start moving entities from Mongo to SQL Server, the context is usually important. For instance, if I have an application that allows users to build daily reports based on periodically collected data, it may make sense to run a procedure that builds those reports each night and stores daily report objects in Mongo, which may be combined into larger aggregate reports as needed (obviously this doesn't consider a few special cases, but they are not relevant to the point). On the other hand, if users need to pull on-demand reports limited to very specific time periods, it may make sense to keep everything in SQL Server and build those reports as needed.
That said, and this deserves more intense thought, here are some considerations that may be helpful:
I generally try to store entities in a relational database if I find that pulling an entity from the database (in other words, in the relational context, querying for the data required to generate an entity or a list of entities that fulfills the requested parameters) does not require significant processing (multiple joins, for instance).
Do you require ACID compliance? (Aside: if you have a graph problem, you can leverage Neo4j for this.) There are document databases with ACID compliance, but there's a reason Mongo is not: What does MongoDB not being ACID compliant really mean?
One use of Mongo I saw in the wild that I thought was worthy of mention - Hadoop was being used to compute massive hash tables that were then stored in Mongo. I believe a similar approach is used by TripAdvisor for user based customization in terms of targeting offers, advertising, etc..
NoSQL only exists because MySQL users assume that all databases share MySQL's performance problems once the database grows large and/or becomes complex.
I suggest that you use PostGIS. You can use the same database for the rest of your data needs as well.
http://postgis.refractions.net/
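For instance, a nearby-stops query might look like this (a sketch using node-postgres; the table and column names are assumptions):

```typescript
import { Client } from "pg";

// Sketch: find stops within 500 m of a GPS fix using PostGIS.
// Assumes a table like: stops(stop_id text, name text,
//                             geom geography(Point, 4326))
async function stopsNear(lon: number, lat: number) {
  const client = new Client({ connectionString: "postgres://localhost/gtfs" });
  await client.connect();
  const { rows } = await client.query(
    `SELECT stop_id, name
       FROM stops
      WHERE ST_DWithin(geom, ST_MakePoint($1, $2)::geography, 500)
      ORDER BY ST_Distance(geom, ST_MakePoint($1, $2)::geography)`,
    [lon, lat]
  );
  await client.end();
  return rows;
}
```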
We are in the process of deciding if we go for Omniture or Google Analytics.
Some information regarding GA seems outdated on the Net, and it is not easy to find the relevant answers to our questions.
In particular, I would appreciate some pointers regarding the following in Google Analytics:
is there a limit on the number of custom variables?
is there a limit on the types of variables that can be used?
and besides,
what is your experience of the delay between the moment the data is recorded on the GA side and the time it is made available in the GA account (2~10 hours?)
Thanks
There are 5 custom variable slots. Any given pageview/visit/visitor can only occupy up to 5. In theory, you could have thousands of different variables, but the slots are overriding; i.e., you can't store 'Is Logged In' in the same slot as 'Is Paid User' if you want to be able to track both on the same pageview, session, or user. But you could use the same slot for mutually exclusive variables that you know won't ever overlap (like 'banned user' and 'admin').
There's also a 6th possible variable value known as "User Defined Variable" (called by _setVar), which is the deprecated ancestor to Custom Variables, but for backwards compatibility reasons will likely always be around. It is a single slot, visitor level, that lets you define one key-value pair.
The 'type' is basically any key-value string pair, with a limitation that the combined length of any given custom variable's key and value cannot exceed 128 characters. You can set the scope of the custom variable to be at the page-level (pageview), session-level (visit), or user-level (visitor).
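For example, with the classic ga.js async tracker (the slot assignments and names here are illustrative):

```typescript
// Classic ga.js syntax: _setCustomVar(slot, name, value, scope),
// where scope is 1 = visitor, 2 = session, 3 = page.
declare const _gaq: Array<unknown[]>;

_gaq.push(["_setCustomVar", 1, "UserType", "Paid", 1]);  // visitor-level, slot 1
_gaq.push(["_setCustomVar", 2, "IsLoggedIn", "Yes", 2]); // session-level, slot 2
_gaq.push(["_trackPageview"]); // custom vars ride along with the next hit
```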
The length of time for data processing is inconsistent. Sometimes the most basic data from pageviews, transactions, and events appears within minutes, but then some of the accompanying data (source information, custom variable values, etc.) does not get processed for another few hours. Only on very rare occasions does it take longer than 24 hours for a full snapshot of a day to be available.
I would like to add that GA and SC are in no way comparable products when you are talking about measurement you want to base decisions on.
GA wins hands down on setup and configuration (and on cost, especially the lack of extra costs), but if you want to measure anything at the visitor level, need real-time figures, or want to have any support, choose something other than GA. Based on your question and the very informative answer provided, I think you did.
In Google Universal Analytics, you can set up to 20 "Custom Dimensions" and "Custom Metrics", see https://developers.google.com/analytics/devguides/collection/analyticsjs/custom-dims-mets
These enable you to do just about everything you currently do with custom variables. The only downside is that they are not displayed in any standard reports, but they are very powerful when used in custom reports.
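For reference, setting one with analytics.js looks like this (the dimension index has to be configured in the GA admin first; the values are illustrative):

```typescript
// Universal Analytics (analytics.js): custom dimensions are set on the
// tracker and sent with the next hit.
declare const ga: (...args: unknown[]) => void;

ga("set", "dimension5", "Paid"); // "dimension5" must exist in GA admin
ga("send", "pageview");          // the dimension is sent with this hit
```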
It may not be a pure programming question, but I'm looking for information about enCapsa. Do you know what it is? Have you ever used it? I'm reading some papers about it, but I can't really see how it works and what it can be used for in an IT company (and this is what I am supposed to find out).
Basically, enCapsa is a shared data storage system focused on providing a way to store any kind of data (even from heterogeneous data sources, such as differently designed DB tables) and then retrieve it through a sort of human-friendly query, much like on a search engine. They offer the possibility of uploading data from anywhere (it's CSV based) and later downloading it for use wherever you need it.
Usages are many; consider that it's a centralized DB accessible through the web, and they say it meets high security standards.
A useful way to employ this service is to store data there without the need to keep it synchronized across company computers.