help with tree-like structure


I've got some financial data to store and manipulate. Let's say I have 2 divisions, with offices in 2 cities, in 2 currencies, and 4 bank accounts. (It's actually more complex than that.) I want to show a list like this:
Electronics
    Chicago
        Dollars
            Account 2 -> transactions in acct2 in $ in chicago/electronics
        Euros
            Account 1 -> transactions in acct1 in E in chicago/electronics
            Account 3 -> etc.
            Account 4
    Brussels
        Dollars
            Account 1
        Euros
            Account 3
            Account 4
Dessert Toppings
    Chicago
        Dollars
            Account 1
            Account 4
        Euros
            Account 2
            Account 4
    Brussels
        Dollars
            Account 2
        Euros
            Account 3
            Account 4
So at each level except the top, the category can appear in multiple places. I've been reading around about the various methods, but none of the examples seem to address my particular use case, where nodes can appear in more than one place in the hierarchy. (Maybe there's a different name for this than "tree" or "hierarchy".)
I guess my hierarchy is actually something like Division > City > Currency with 'Electronics' and 'Euros' merely instances of each level, but I'm not quite sure how that helps or hurts.
A few notes: this is for a demo site, so the dataset won't be large; ease of set-up and maintenance is more important than query efficiency. (I'm actually considering just building a data object by hand, though I'd much rather do it the right way.) Also, FWIW, we're working in PHP with an MS Access back end, so any libraries out there that make this easy in that environment would be helpful. (I've found a couple of implementations of the nested set pattern already.)

Are you sure you want to use a hierarchical design for this? To me, the hierarchy seems more a consequence of the desired output format than something intrinsic to your data structure.
And what if you have to display the data in a different order, like City > Currency > Division? Wouldn't that be very cumbersome?
You could use a plain structure instead, with a table for Branches, one for Cities, one for Currencies, and then one Account table with Branch_ID, City_ID, and Currency_ID as foreign keys.
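A minimal sketch of that flat layout, using Python's built-in sqlite3 module as a stand-in for the Access back end (an assumption made only so the example runs anywhere). The table and column names and the `listing` helper are hypothetical, and the sample rows are a small subset of the list in the question. The display hierarchy is just a sort order, so switching to City > Currency > Division means changing the ORDER BY, not the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branches   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cities     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE currencies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per place an account appears (branch + city + currency).
CREATE TABLE accounts (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    branch_id   INTEGER NOT NULL REFERENCES branches(id),
    city_id     INTEGER NOT NULL REFERENCES cities(id),
    currency_id INTEGER NOT NULL REFERENCES currencies(id)
);
INSERT INTO branches   VALUES (1, 'Electronics'), (2, 'Dessert Toppings');
INSERT INTO cities     VALUES (1, 'Chicago'), (2, 'Brussels');
INSERT INTO currencies VALUES (1, 'Dollars'), (2, 'Euros');
INSERT INTO accounts (name, branch_id, city_id, currency_id) VALUES
    ('Account 2', 1, 1, 1),
    ('Account 1', 1, 1, 2),
    ('Account 1', 1, 2, 1),
    ('Account 2', 2, 1, 2);
""")

# Fixed whitelist of sortable levels; unknown keys raise KeyError.
LEVELS = {"branch": "b.name", "city": "c.name", "currency": "cu.name"}

def listing(order):
    """Return (level1, level2, level3, account) rows sorted by the given level order."""
    selected = ", ".join(LEVELS[key] for key in order)
    sql = f"""
        SELECT {selected}, a.name
        FROM accounts a
        JOIN branches b    ON b.id  = a.branch_id
        JOIN cities c      ON c.id  = a.city_id
        JOIN currencies cu ON cu.id = a.currency_id
        ORDER BY {selected}, a.name
    """
    return conn.execute(sql).fetchall()

rows_by_branch = listing(["branch", "city", "currency"])
rows_by_city = listing(["city", "currency", "branch"])
```

Turning either result into the indented list is then a presentation step: print a heading whenever the value in a level column changes from the previous row.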

I'm not sure what database platform you're using. But if you're using MS SQL Server, then you should check out recursive queries using common table expressions (CTEs). They're easy to use and are designed for exactly the type of situation you've illustrated (a bill of materials, for instance). Check out this website for more detail: http://www.mssqltips.com/tip.asp?tip=1520
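For illustration, here is a rough sketch of a recursive CTE walking a parent/child table. It runs on SQLite through Python's sqlite3 module rather than on SQL Server (SQLite spells it WITH RECURSIVE and concatenates with ||; T-SQL omits the RECURSIVE keyword and uses +). The `nodes` table and its rows are hypothetical. As far as I know, Access's own SQL dialect has no recursive CTEs, so with an Access back end the recursion would probably have to live in PHP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
-- 'Dollars' appears twice: each place in the tree is its own row.
INSERT INTO nodes VALUES
    (1, NULL, 'Electronics'),
    (2, 1,    'Chicago'),
    (3, 2,    'Dollars'),
    (4, 2,    'Euros'),
    (5, 1,    'Brussels'),
    (6, 5,    'Dollars');
""")

rows = conn.execute("""
WITH RECURSIVE tree(id, name, depth, path) AS (
    -- Anchor member: the root rows.
    SELECT id, name, 0, name FROM nodes WHERE parent_id IS NULL
    UNION ALL
    -- Recursive member: children of rows already in the result.
    SELECT n.id, n.name, t.depth + 1, t.path || ' > ' || n.name
    FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT depth, path FROM tree ORDER BY path
""").fetchall()

# Print the tree with two spaces of indentation per level.
for depth, path in rows:
    print("  " * depth + path.split(" > ")[-1])
```

Note that a category appearing in several places (such as Dollars) needs one row per place in this layout, which is exactly the duplication the question is worried about.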
Good luck!

Related

Generalizing work orders

Hello stackoverflowians,
I am working on designing tables for work orders.
The problem:
There are different work order models (from now on called WOMs).
The WOMs share some attributes (Num, Date, Description, ... etc)
The WOMs have details such as:
Sectors on which the work is done.
Some WOMs use storage tanks instead of sectors (products are prepared in storage tanks).
Products and their quantities (possibly with some extra info on the product), and which sector they are applied to.
Human resources who worked on the WO.
Materials used on the work order
... etc
What is needed
Design tables for work orders and, of course, their details.
They want to know how resources were spent.
Design queries to retrieve all kinds of information.
Constraints
Simple presentation for the end users.
Generalizing the work orders models.
What has been done
Designed all work orders and their details as a hierarchy, starting from the work order number as the root node.
WorkOrderTable (ID, ParentID, Type, Value)
Example of a work order (see "Transform hierarchical data into flat table"):
ID  ParentID  Type      Value
38  0         Num       327
39  38        Sector    21
40  38        Sector    22
43  40        Product   NS
44  40        Product   MS
50  40        Temp      RAS
48  44        Quantity  60
47  43        Quantity  25
41  39        Product   ARF
42  39        Product   BRF
49  39        Temp      RAS
51  39        Cible     Acarien A.
46  42        Quantity  30
52  42        Cible     Acarien B.
45  41        Quantity  20
The Question
Is what I am doing good, efficient, and easy to maintain and work with, or are there better ideas?
UPDATE I : More details
Products: about 50 active ones, fairly stable [products do change over time; need to keep track of versions]
Sectors: about 40 (fixed land areas)
People: normal HR table
How big is a typical WOM:
about 15 attributes (3 of them important and shared by all WOMs; the others less so)
about 5 or more shared details: Product, Sector, People, and other descriptive info such as the quantity of the product.
WOMs are fixed for now, but I am worried about them changing in the future (or new ones appearing).
Versioning isn't a requirement right now, but adding it would be a plus.
I am planning on using different tables for participants (sectors, products ...)
The meta-data / data conflict is what this design dilemma is about.
Consider that any WOM is defined by 3 parts:
The Work Order General Info (Num, Date, ...)
The Sectors [other WOMs use tank storage] in which the jobs are done.
The Resources to complete the job: products, people, machines ...
State of the design
Specific tables for participants: sectors, people, machines ...
Meta-data table (ID, meta-data, lvl). Example:
Sector, 1 (directly to WO)
Tank Storage, 1
Product, 2 (can be part of a sector job, not directly attached to the WO)
Work Order table (ID, parentID, metadataID, valueID); the valueID is taken from the participants tables.
Concerning XML, I have little to no information about how to store and manipulate it.

Without some numbers and more knowledge of your needs, no good advice is possible. Here are some questions that come to mind:
How many users?
How many products/locations/sectors/people...?
Is this changing data?
How many WOMs?
How big is one typical WOM?
Is it a plain tree hierarchy?
If not: Might there be alternative routes, circles, islands?
Are these WOMs fixed or do they change?
If changing: Do you need versioning?
It looks like you are trying to re-invent a professional ERP system. As Bostwick told you already, you should rather consider using an existing one...
Just some general hints:
Do not use the WOM-storage for (meta) data (only IDs / foreign key)
Try to draw a sharp border between working data and meta data
Use a dedicated table for each of your participants (sectors, products...)
A WOM might be better placed within XML
Read about (Finite) State Machines
Read about state pattern
Read about Business-Process-Modelling and Workflows
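The first three hints (IDs only in the work order storage, a dedicated table per participant) could look roughly like the sketch below. It uses Python's sqlite3 module; every table and column name is hypothetical, and the rows reuse the sample work order 327 from the question. Whether this generalizes to all WOMs depends on details not given in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dedicated participant tables (meta data).
CREATE TABLE sectors  (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, code  TEXT NOT NULL);
-- Work order header (working data).
CREATE TABLE work_orders (id INTEGER PRIMARY KEY, num INTEGER NOT NULL);
-- One row per product applied to a sector within a work order: IDs only.
CREATE TABLE work_order_products (
    work_order_id INTEGER NOT NULL REFERENCES work_orders(id),
    sector_id     INTEGER NOT NULL REFERENCES sectors(id),
    product_id    INTEGER NOT NULL REFERENCES products(id),
    quantity      REAL    NOT NULL
);
INSERT INTO sectors  VALUES (1, '21'), (2, '22');
INSERT INTO products VALUES (1, 'ARF'), (2, 'BRF'), (3, 'NS'), (4, 'MS');
INSERT INTO work_orders VALUES (1, 327);
INSERT INTO work_order_products VALUES
    (1, 1, 1, 20), (1, 1, 2, 30), (1, 2, 3, 25), (1, 2, 4, 60);
""")

# "How were resources spent?": total product quantity per sector for WO 327.
usage = conn.execute("""
SELECT s.label, SUM(wp.quantity)
FROM work_order_products wp
JOIN sectors s      ON s.id  = wp.sector_id
JOIN work_orders wo ON wo.id = wp.work_order_id
WHERE wo.num = 327
GROUP BY s.label
ORDER BY s.label
""").fetchall()
```

Compared with the generic (ID, ParentID, Type, Value) table, each question about resource usage becomes an ordinary join and GROUP BY, at the cost of a schema change whenever a new kind of detail appears.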

I think if you're looking for design advice, you should go to another Stack Exchange site, e.g. Code Review.
That being said, you're asking for advice on design but only giving abstract information. The scale of the system, the amount of CRUD expected, and several other factors need to be considered in a design. Without the details and targets, it's really hard to answer your question. There are trade-offs between different approaches. It may even be advisable to use a NoSQL solution.
I would also suggest not building your own ERP system; instead, look to buy an industry-specific one from a vendor and apply your customizations to it.
It's very expensive to write your own system. Keeping it updated, adding security, and many other features make it a worthwhile business decision to purchase from a software vendor.
If you're just looking to gain more experience by writing this, I would suggest browsing GitHub and the previously mentioned Stack Exchange site.

Mondrian Schema, separation of data and presentation

I've been attempting to build a Mondrian schema specifically to use with Pentaho 5.0 (I'm not sure the version matters much here.) One problem I seem to repeatedly come up against, is how to control the presentation of data vs the data itself. Let me offer an example to illustrate.
Imagine a cube such as: (D for Dimension, H for hierarchy, L for level)
D: time
    H: default
        L: year
        L: month
        L: day
D: currency
    H: default
        L: name
        L: code
If we think about the members of time.year, I'm sure we'd all agree they would be ..., 2008, 2009, 2010, 2011, 2012, 2013, .... So let's just move on to time.month. Here things get interesting. Do we represent time.month as numbers or words? Why not have both?
Mondrian provides a way to specify the name of the member as well as the "caption" of the member, which provides a different value for presentation than the member's name. Great! However if I provide a caption, then in Pentaho, you ONLY see the caption. Never the original member name. How can I let my user choose whichever is more appropriate?
The month level (as well as the day level, and any hierarchy with multiple levels) causes another source of confusion. If the months are represented as one of 12 values (numbers or words make no difference here), then the actual member values are time.[2012].[1], time.[2012].[2], ..., time.[2012].[12], time.[2013].[1], .... So for June (month 6), there are many members such as ..., time.[2009].[6], time.[2010].[6], time.[2011].[6], .... So if the list of members is presented and it only contains the month portion of the member name, then we see 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,.... You can't differentiate between equal months. "Just include the year column as well," you say. Yes, that makes sense, but there are other places where Pentaho doesn't provide the option to do that, such as in the filtering dialog. I had the idea of including the year in the caption of the member, so instead of just 6 or June, you would see 2012 June. Unfortunately, this is also less than ideal. If each level of the hierarchy is present (and suppose we followed this pattern for day as well), then each row looks like 2012 | 2012 June | 2012 June 13 | your_measure. This is, of course, silly. But it could arise easily when drilling into a report in Pentaho.
Our second dimension has similar problems. Imagine the data set of world currency types. There's the 3-letter ISO standard currency code, and an official currency name. These two values are 1:1 and fully dependent on each other. Each one is a unique key. There is no actual hierarchical relationship between the two. I see them simply as 2 different representations of the same piece of data. The biggest obstacle here is that if they are not in the same hierarchy then Pentaho freely allows you to place them on opposite axes. This makes ridiculous looking reports like:
    | United States Dollar | Canadian Dollar | Euro | ...
USD | 12345                | -               | -    |
CAD | -                    | 12345           | -    |
EUR | -                    | -               | 1234 |
...
The codes are excellent when you desire conciseness. However maybe you're dealing with a specific situation involving several uncommon currencies and you don't want to make the report reader have to look up the meaning of the more obscure codes. I explored the use of <Property> elements but Pentaho again lacks flexibility in that you MUST display the member column to also display a property value. If the name was a property of the code member, there is no way to display only the currency name in a report without also including the code, which is redundant.
Ultimately, I'm hoping there's some mechanism to control the presentation of data, or some technique in the schema design that results in a sensible, coherent experience for the end user doing analysis in Pentaho.

This is pretty common.
Regarding properties: it is annoying that you can't re-order them in Analyzer; they seem to be second-class citizens. And as they are still not supported or displayed in Saiku, it frequently means they can't be used anyway. However, your example is indeed an intended, correct use of a property (a good description, in fact!).
So there is one solution, but it's not super clean. You can define different hierarchies depending on the user preferences, then use role-based security to hide one or other of the hierarchies from the end user.
A variation on this theme that I've used is to have admin-level, senior-level and beginner-level access to the same cubes, where you see different levels and hierarchies depending on permissions.
I haven't had time to go into Mondrian 4 much, which may improve things here as everything is now simply an attribute, but I'm not 100% sure.
Finally, I would definitely raise this with support (it sounds like you have a support contract) and see what improvements can be made. Post the JIRA here and I'll definitely upvote it!

Creating a workable Redis store with several filters

I am working on a system to display information about real estate. It runs in angular with the data stored as a json file on the server, which is updated once a day.
I have filters on number of bedrooms, bathrooms, price and a free text field for the address. It's all very snappy, but the problem is the load time of the app. This is why I am looking at Redis. Trouble is, I just can't get my head round how to get data with several different filters running.
Let's say I have some data like this: (missing off lots of fields for simplicity)
id beds price
0 3 270000
1 2 130000
2 4 420000
etc...
I am thinking I could set up three sets, one to hold the whole dataset, one to create an index on bedrooms and another for price:
beds id
2 1
3 0
4 2
and the same for price:
price id
130000 1
270000 0
420000 2
Then I was thinking I could use SINTER to return the overlapping sets.
Let's say I'm looking for a house with more than 2 bedrooms that costs less than 300000.
From the bedrooms set I get IDs 0,2 for beds > 2.
From the prices set I get IDs 0,1 for price < 300000
So the common id is 0, which I would then lookup in the main dataset.
It all sounds good in theory, but being a Redis newbie, I have no clue how to go about achieving it!
Any advice would be gratefully received!

You're on the right track; sets + sorted sets is the right answer.
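To make the idea concrete without needing a running server, here is a plain-Python emulation of what the indexes would do. It is not Redis client code: each sorted list stands in for a Redis sorted set whose score is the field value and whose member is the listing id, and `ids_in_range` plays the role of ZRANGEBYSCORE with exclusive bounds (written `(2` and `(300000` in Redis). The function and variable names are hypothetical; the data is the sample from the question.

```python
import bisect

listings = {
    0: {"beds": 3, "price": 270000},
    1: {"beds": 2, "price": 130000},
    2: {"beds": 4, "price": 420000},
}

def build_index(field):
    """Return (score, id) pairs sorted by score, like members of a sorted set."""
    return sorted((row[field], id_) for id_, row in listings.items())

def ids_in_range(index, low, high):
    """Ids whose score satisfies low < score < high (both bounds exclusive)."""
    start = bisect.bisect_right(index, (low, float("inf")))
    end = bisect.bisect_left(index, (high, float("-inf")))
    return {id_ for _, id_ in index[start:end]}

beds_index = build_index("beds")
price_index = build_index("price")

# More than 2 bedrooms AND cheaper than 300000.
matches = (
    ids_in_range(beds_index, 2, float("inf"))
    & ids_in_range(price_index, float("-inf"), 300000)
)
results = [listings[id_] for id_ in sorted(matches)]
```

With a real Redis instance the two range lookups would each be one ZRANGEBYSCORE call, the intersection could be done in the client as here, and the full records could live in one hash per listing id.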
Two sources for all of the information that you could ever want:
Chapter 7 of my book, Redis in Action - http://bitly.com/redis-in-action
My Python/Redis object mapper - https://github.com/josiahcarlson/rom (it uses ideas directly from chapter 7 of my book to implement sql-like indices)
Both of those resources use Python as the programming language, though chapter 7 has been translated into Java: https://github.com/josiahcarlson/redis-in-action/ (go to the java path to see the code).
... That said, a normal relational database (especially one with built-in Geo handling like Postgres) should handle this data with ease. Have you considered a relational database?

TSQL grouping on fuzzy column

I would like to group all the merchant transactions from a single table and just get a count. The problem is that a merchant, let's say redbox, will appear as redbox plus the store number added to the end (redbox 4562, redbox*1234). I will also include the category for grouping purposes.
Category     Merchant
restaurant   bruger king 123 main st
restaurant   burger king 456 abc ave
restaurant   mc donalds * 45877d2d
restaurant   mc 'donalds *888544d
restaurant   subway 454545
travel       subway MTA
gas station  mc donalds gas
travel       nyc taxi
travel       nyc-taxi
The question: How can I group the merchants when they have addresses or store locations added on to them? All I need is a count for each merchant.

The short answer is there is no way to accurately do this, especially with just pure SQL.
You can find exact matches, and you can find wildcard matches using the LIKE operator or a (potentially huge) series of regular expressions, but you cannot find similar matches nor can you find potential misspellings of matches.
There are a few potential approaches I can think of to solve this problem, depending on what type of application you're building.
First, normalize the merchant data in your database. I'd recommend against storing the exact, unprocessed string such as Bruger King in your database. If you come across a merchant that doesn't match a known set of merchants, ask the user if it already matches something in your database. When data goes in, process it then and match it to an existing known merchant.
Store a similarity coefficient. You might have some luck using something like a Jaccard index to judge how similar two strings are. Perhaps after stripping out the numbers, this could work fairly well. At the very least, it could allow you to create a user interface that can attempt to guess what merchant it is. Also, some database engines have full-text indexing operators that can describe things like "similar to" or "sounds like". Those could potentially be worth investigating.
Remember merchant matches per user. If a user corrects bruger king 123 main st to Burger King, store that relation and remember it in the future without having to prompt the user. This data could also be used to help other users correct their data.
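As a rough illustration of the similarity-coefficient idea above, the sketch below strips store numbers and punctuation and then computes a Jaccard index over character bigrams. Using bigrams (rather than whole words) is an assumption made here so that small misspellings still overlap; the function names are hypothetical, and any threshold for "same merchant" would have to be tuned on real data.

```python
import re

def normalize(merchant):
    """Lowercase, drop tokens containing digits, drop punctuation, collapse spaces."""
    text = re.sub(r"\S*\d\S*", " ", merchant.lower())
    text = re.sub(r"[^a-z ]", " ", text)
    return " ".join(text.split())

def bigrams(text):
    """Set of adjacent character pairs in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def jaccard(a, b):
    """Jaccard index of the two names' bigram sets: |A & B| / |A | B|."""
    x, y = bigrams(normalize(a)), bigrams(normalize(b))
    return len(x & y) / len(x | y) if x | y else 1.0

same = jaccard("mc donalds * 45877d2d", "mc 'donalds *888544d")
typo = jaccard("bruger king", "burger king")
unrelated = jaccard("burger king", "subway 454545")
```

On the sample rows, the two mc donalds variants normalize to the same string, the bruger/burger pair scores somewhere in the middle, and burger king versus subway shares no bigrams at all. A middling score is where asking the user (or a human reviewer) still seems necessary.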
But what if there is no UI? Perhaps you're trying to do some automated data processing. I really see no way to handle this without some sort of human intervention, though some of the techniques described above could help automate this process. I'd also look at the source of your data. Perhaps there's a distinct merchant ID you can use as a key, or perhaps there exists somewhere a list of all known merchants (maybe credit card companies provide such an API?). If there are boatloads of data to process, another option would be to partially automate it using a service such as Amazon's Mechanical Turk.

You can use LIKE:
SELECT COUNT(*) AS [Count], 'BURGER KING' AS Merchant
FROM <tables>
WHERE Merchant LIKE '%king%'
UNION ALL
SELECT COUNT(*) AS [Count], 'JACK IN THE BOX' AS Merchant
FROM <tables>
WHERE Merchant LIKE 'jack in the box%'
You may have to move the wildcards around depending on how the records were spelled out.

It depends a bit on what database you use, but most have some kind of REGEXP_INSTR or other function you can use to find the first index of a pattern (in T-SQL, PATINDEX plays a similar role). You can then write something like this:
SELECT SubStr(merchant, 1, REGEXP_INSTR(merchant, '[0-9]') - 1), COUNT(*)
FROM Expenses
GROUP BY SubStr(merchant, 1, REGEXP_INSTR(merchant, '[0-9]') - 1)
This assumes that the merchant name doesn't contain a number and the store number does. Rows with no digit at all (where REGEXP_INSTR returns 0) need separate handling, and you still may need to strip out any special chars with a replace (like *, -, etc).
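The same "cut at the first digit" rule can be tried out quickly outside the database. The sketch below is plain Python with a hypothetical `merchant_key` helper; it also cuts at `*`, and the sample strings come from the question.

```python
import re
from collections import Counter

merchants = [
    "redbox 4562",
    "redbox*1234",
    "bruger king 123 main st",
    "burger king 456 abc ave",
    "subway 454545",
]

def merchant_key(name):
    """Keep the text before the first digit or '*', trimmed of spaces."""
    return re.split(r"[0-9*]", name, maxsplit=1)[0].strip()

# Count transactions per derived merchant key.
counts = Counter(merchant_key(m) for m in merchants)
```

Both redbox rows collapse into one key, but the misspelled "bruger king" still ends up in its own group, so this only handles the store-number half of the problem.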

Access Query - Compare Multiple User Selections Against Each Other

I'm running into a conceptual problem that I cannot seem to conquer in my mind.
Let's say I want a user to enter what they're currently wearing into a database via a form. Throwing 'T-Shirt' and 'Blue' in a new row is incredibly easy. However, let's say I want to compare one user against the others, and rank them in order from most similar to least.
This becomes a huge nightmare when you consider the amount of options available.
Undershirt
Overshirt
Jacket
Scarf/Necklaces
Headwear
Pants
Underwear
Leggings
Socks
Footwear
Accessories
As I see it, I could hard-code the 11 categories above and let a user make selections from drop-down boxes tailored to each category. Now, let's use an example of 'Undershirt' and 'Overshirt'. Depending on the person, a long-sleeved shirt could be used as either; they're still wearing one. If I make users put values in categories, User A might put it in one category and User B might put it in another. And because the categories are separate, the two entries wouldn't get compared.
Now, instead of hard-coding in categories (and thus making a limit of how much a user can enter), I could put each item into its own row and search by User ID. But let's say a person enters in shorts one day, and next throws in jeans and a shirt. How can I make sure that they're compared separately (e.g., dress compared to shorts, dress compared to jeans+shirt) and not (dress compared to shorts+jeans+shirt).
As to actually comparing, each item vs. each other could be performed via a 2d lookup table. (Row Dress vs. Column Jeans would net a zero, Row Dress vs. Column Dress would net a one)

The appropriate design for this would depend on the acceptable margin of error. If there is zero acceptable error, then you must present the users with the categories and they specify true/false yes/no for each one or select from a limited set of possible answers.
HANDS:
    gloves
    mittens
    brass knuckles
    [Caveat: user could be wearing brass knuckles inside the mittens. You have to take into account
    whether values are mutually exclusive or not. Barefoot <> no socks.
    Someone who is barefoot is not wearing socks, but someone not wearing socks may be wearing docksiders.]
FEET1:
    anklet socks
    sheer stockings
    fishnet stockings
    ragg wool hiking socks
    kneesocks
    gym socks
    no socks
FEET2:
    moccasins
    running shoes
    sandals
    wing-tips
    uggs
    spike heels
    ...
HEAD:
    sombrero
    beret
    baseball hat
    pirate's hat
    beanie
    knitted cap
NECK:
    scarf
    mock turtleneck aka dickie
Et cetera et cetera ad nauseam.
Or, if the margin of error is very generous, you could allow simple freeform text entry and match or partial-match on words. For slightly less error, you could set up a synonyms table and match on the synonyms of the supplied words.
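A small sketch of the synonyms-table variant, in plain Python rather than Access: free-text items are mapped to a canonical name, and users are ranked by the share of canonical items they have in common. The synonym entries, user names, and the overlap measure (shared items divided by all items either user entered) are all assumptions for illustration.

```python
# Hypothetical synonyms table: free-text entry -> canonical item name.
SYNONYMS = {
    "tee": "t-shirt",
    "tshirt": "t-shirt",
    "denims": "jeans",
    "sneakers": "running shoes",
}

def canonical(items):
    """Map free-text items to canonical names via the synonyms table."""
    cleaned = (item.strip().lower() for item in items)
    return {SYNONYMS.get(item, item) for item in cleaned}

def similarity(a, b):
    """Shared canonical items divided by all items either user entered."""
    x, y = canonical(a), canonical(b)
    return len(x & y) / len(x | y) if x | y else 0.0

def rank(target, others):
    """Names of the other users, ordered from most to least similar to the target."""
    return sorted(others, key=lambda name: similarity(target, others[name]), reverse=True)

others = {
    "user_b": ["Tee", "denims"],
    "user_c": ["sombrero", "sandals"],
}
ranking = rank(["T-Shirt", "jeans", "sneakers"], others)
```

Because items are compared as a set per outfit, this sidesteps the Undershirt/Overshirt category problem from the question, but it needs each outfit stored as its own group of rows (for example keyed by user and timestamp) so that separate days are not merged.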

As a general rule, get the database design right and worry about reporting later. If this is not just a thought exercise, you may like to say what you are actually comparing, because with the above, a person is quite likely to say "tuxedo" or "evening dress", and let the details be inferred, whereas in some other area, this may not be possible. Even so, it seems that you would need a minimum of three columns (fields) for each item:
Timestamp
Major category (jeans, trousers, skirt)
Item (Levi's, tweeds, mini)
If accuracy is particularly important, you will need a trained interviewer :)
I have just noticed underwear in that list, which is even more complicated, because what would qualify as full underwear for a lady of a certain age is by no means the same as that for a gentleman of ten years.
