Generalizing work orders - SQL

Hello stackoverflowians,
I am working on designing tables for work orders.
The problem:
There are different work order models (from now on called WOMs).
The WOMs share some attributes (Num, Date, Description, etc.).
The WOMs have details such as:
Sectors on which the work is done.
Some WOMs use storage tanks instead of sectors (products are prepared in storage tanks).
Products and their quantities (plus, in some cases, info on the product), and which sector each is applied to.
Human resources who worked on the WO.
Materials used on the work order.
... etc.
What is needed
Design tables for work orders and their details, of course.
They want to know how resources were spent.
Design queries to retrieve info in all its shapes.
Constraints
Simple presentation for the end users.
Generalizing the work order models.
What has been done
Designed all work orders and their details as a hierarchy starting from the work order num as the root node.
WorkOrderTable (ID, ParentID, Type, Value)
Example of a work order (see also: Transform hierarchical data into flat table):
ID  ParentID  Type      Value
38  0         Num       327
39  38        Sector    21
40  38        Sector    22
43  40        Product   NS
44  40        Product   MS
50  40        Temp      RAS
48  44        Quantity  60
47  43        Quantity  25
41  39        Product   ARF
42  39        Product   BRF
49  39        Temp      RAS
51  39        Cible     Acarien A.
46  42        Quantity  30
52  42        Cible     Acarien B.
45  41        Quantity  20
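As an aside, flattening an adjacency list like this back into rows is usually done with a recursive CTE. A minimal sketch (SQL Server syntax; table and column names from the example above):

WITH Tree AS (
    -- anchor: the work order root (ParentID = 0)
    SELECT ID, ParentID, Type, Value, 0 AS Lvl
    FROM WorkOrderTable
    WHERE ParentID = 0
    UNION ALL
    -- recursive step: walk down one level at a time
    SELECT w.ID, w.ParentID, w.Type, w.Value, t.Lvl + 1
    FROM WorkOrderTable AS w
    INNER JOIN Tree AS t ON w.ParentID = t.ID
)
SELECT ID, ParentID, Type, Value, Lvl
FROM Tree
ORDER BY Lvl, ID;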
The Question
Is what I am doing good/efficient and easy to maintain and work with, or are there other ideas?
UPDATE I : More details
Products aren't changing; there are about 50 active ones. [Products change over time; I need to keep track of versions.]
Sectors number about 40 (fixed land areas).
People: a normal HR table.
How big is a typical WOM:
About 15 attributes (3 of them important and shared by all WOMs; the others a little less so).
About 5 or more details, sharing: Product, Sector, People, and other describing info like the quantity of the product.
WOMs are fixed for now, but I am worried about them changing in the future (or new ones arising).
Versioning isn't a requirement right now, but adding it is a plus.
I am planning on using different tables for participants (sectors, products, ...).
The meta-data / data conflict is what this design dilemma is about.
Consider that any WOM is defined by 3 parts:
The Work Order General Info (Num, Date, ...).
The Sectors [other WOMs use tank storage] in which the jobs are done.
The Resources to complete the job: products, people, machines, ...
State of the design
Specific tables for participants: sectors, people, machines, ...
Meta-data table (ID, meta-data, lvl). Example:
Sector, 1 (attached directly to the WO)
Tank Storage, 1
Product, 2 (can be part of a sector job rather than attached directly to the WO)
Work Order table (ID, parentID, metadataID, valueID); the valueID is taken from the participant tables.
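A rough DDL sketch of that state of the design (all names are assumptions; note that ValueID cannot be a real foreign key, because its target table depends on MetaDataID):

CREATE TABLE Sector  (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Product (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Person  (ID INT PRIMARY KEY, Name VARCHAR(50));

CREATE TABLE MetaData (
    ID   INT PRIMARY KEY,
    Name VARCHAR(50),  -- 'Sector', 'Tank Storage', 'Product', ...
    Lvl  INT           -- 1 = attaches to the WO, 2 = attaches to a sector job
);

CREATE TABLE WorkOrder (
    ID         INT PRIMARY KEY,
    ParentID   INT REFERENCES WorkOrder(ID),
    MetaDataID INT REFERENCES MetaData(ID),
    ValueID    INT  -- resolved against the participant table named by MetaDataID
);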
Concerning XML, I have little to no information about how to store and manipulate it.

Without knowing any numbers or having further knowledge about your needs, no good advice is possible. Here are some questions that come to mind:
How many users?
How many products/locations/sectors/people...?
Is this changing data?
How many WOMs?
How big is one typical WOM?
Is it a plain tree hierarchy?
If not: might there be alternative routes, cycles, islands?
Are these WOMs fixed or do they change?
If changing: do you need versioning?
It looks like you are trying to re-invent a professional ERP system. As Bostwick told you already, you should rather think about using an existing one...
Just some general hints:
Do not use the WOM storage for (meta) data (only IDs / foreign keys)
Try to draw a sharp border between working data and meta data
Use a dedicated table for each of your participants (sectors, products...)
A WOM might be better placed within XML
Read about (Finite) State Machines
Read about state pattern
Read about Business-Process-Modelling and Workflows

I think if you're looking for design advice, you should go to another Stack group, e.g., Code Review Stack Exchange.
That being said, you're asking for advice on design but only giving abstract information. The scale of the system, the amount of CRUD expected, and several other factors need to be considered in a design. Without the details and targets it's really hard to answer your question. There are trade-offs with different approaches. It may even be advisable to use a NoSQL solution.
That being said, I would suggest not building your own ERP system, and instead looking to buy one from a vendor that is industry specific, and applying your customization to it.
It's very expensive to write your own system; keeping it updated, adding security, and a lot of other features make it a worthwhile business decision to purchase from a software vendor.
If you're just looking to gain more experience by writing this, I would suggest browsing GitHub and the previously mentioned Stack Exchange.

Related

Multi-path traveling salesman optimization

I am trying to solve variant of a multi-path traveling salesman with an incomplete graph.
EDIT: changed the description (twice) based on feedback from #daniel_junglas.
In more words:
Only 1 salesperson
The salesperson must visit every city exactly once.
The salesperson can travel by various modes of transport (e.g. train, car, boat). Every mode of transport changes the time it takes to travel between cities.
The salesperson can change the mode of transport between cities (at a cost), but not in a city. The change can be seen as yet another edge between the two cities with a particular weight associated.
Not every city can be reached by every mode of transport (e.g. only a boat can reach city D).
The graph is not complete, so not all cities are connected, but one or more Hamiltonian paths exist.
Based on the example:
4 cities (1-4), each node having a car parking lot (C), train station (T), and harbour for boats (B).
Starting and ending in 1C.
Every city has to be visited once.
No link from a train station to a harbour to the parking lot within a city; changing is only possible between cities (for instance 1C to 2T).
Every link has a weight associated with it, based on distance, speed of the transport mode, and the time penalty for changing transport mode.
Example paths:
1C -> 2T -> 3T -> 4T -> 1C
1C -> 2C -> 3T -> 4T -> 1C
I was planning to solve this with Concorde/CPLEX.
I have tried solving it with pyconcorde. For this, I encoded every parallel edge to the same node as a new node (A, A', A''), but I can't find a restriction to say that only one A-node should be visited. I see many asymmetric-TSP and multi-TSP solutions, but none fitting my requirements.
My questions: How should I approach this problem (tutorial link, embedding proposal, etc) to find the shortest route that visits all cities exactly once? Which tool would help me?
P.S. I am aware of various single-algorithm solutions, both probabilistic and exact. However, I am searching for a tool that combines various techniques, such as cplex, in the hope of better results for my specific data.
P.S.2. I am aware this question might be broad. I am open to any remarks in order to improve my question.
Gurobi has a good example of this with an interactive map and example code, see here. They are solving the same problem essentially that you describe, you just need to provide your own input locations and connections.
I am not sure you can create a graph that represents your model and on which you can solve a vanilla TSP: if you have multiple edges between nodes, then we already agreed that you can remove all but the cheapest one. If you instead duplicate nodes, then you have the problem that you no longer want to visit all nodes but only exactly one node from each set of duplicates. This is not addressed by TSP solvers.
However, you said you want to solve this with Concorde/CPLEX. How about dropping the Concorde part? You could take a MIP formulation for the TSP. That should be easy to extend to include your additional constraints. For example, you could go back to multiple edges and add conditions like "if you enter city A by car then you have to leave by car or pay an extra N". You can then feed this to a general MIP solver like CPLEX.
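For reference, the standard Miller-Tucker-Zemlin (MTZ) formulation of the asymmetric TSP is the usual starting point for such extensions (this is the textbook model, not something specific to your variant): with binary variables $x_{ij} = 1$ iff the tour travels from city $i$ to city $j$, and ordering variables $u_i$,

$$\begin{aligned}
\min \quad & \sum_{i \neq j} c_{ij} x_{ij} \\
\text{s.t.} \quad & \textstyle\sum_{j \neq i} x_{ij} = 1 \quad \forall i, \qquad \textstyle\sum_{i \neq j} x_{ij} = 1 \quad \forall j, \\
& u_i - u_j + n\, x_{ij} \le n - 1 \quad\; 2 \le i \neq j \le n, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}$$

Parallel edges per transport mode then become separate $x$ variables with their own costs, and a mode-change penalty becomes a linear constraint linking the edge chosen into a city with the edge chosen out of it.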
At a high level, Concorde is a very efficient implementation of column generation (branch and price and cut) for the TSP.
You can surely develop a very efficient implementation of column generation for your problem, where the routes would satisfy your constraints.
Marco Luebbecke has a very nice general-purpose tutorial. The TSP book is the reference work on the subject.

Can PROC SQL embedded in SAS macros dynamically merge two datasets, simulating residential treatment placement decisions for troubled youth?

Good afternoon and happy Friday, folks
I’m trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as “not recidivating” within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, LOS, etc.). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping, qualifications.
Let’s take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have “predicted success rates” of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older, so Johnny would not qualify for treatment there. Alternatively, for Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location B’s beds are filled; so, we must now look to location D, as it is now Johnny’s “best available” option at 75%.
The score so far: we are matching youth to available beds in locations for which they qualify and might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds differs across locations. The qualifications for entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs 14-20 year-olds).
In order to simulate what placement decisions might look like based on success rates, I went through the scenario described above for over 400 youth, by hand, in Excel. It took me about a week. I’d like to use PROC SQL embedded in a SAS macro to automate these placement scenarios, with the ultimate goals of a) obtaining the ability to bootstrap iterations in order to examine effect sizes across distributions, b) saving time, and c) preventing further brain damage from banging my head against desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity, nay, the privilege of using SQL in my typical role as a researcher, I believe that this time has now come to pass, and I’m excited about it! Honestly. I believe it has the capacity I’m looking for. Unfortunately, it is beating the devil out of me!
Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that.
I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics and location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently placed into treatment). Note that I will be “cleaning” the youth dataset prior to merging, such that rank-column cells will only be populated for programs for which a respective youth qualifies. This should take the “does the youth even qualify for the program” problem out of the equation.
However, it still leaves the issue of availability left to be contended with in the scenario.
The second dataset contains the treatment facility beds, with each row corresponding to an available bed in one of the treatment locations; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but a location will populate several cells.
Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, so the merge/join/switch/thing should take place
on youth.Rank1 = distinct TF.Location,
and if youth.Rank1 ≠ TF.Location then
merge on youth.Rank2 = TF.Location,
and if youth.Rank2 ≠ TF.Location then merge on
youth.Rank3 = TF.Location, etc.
Put plainly: “Merge on rank1 unless the rank1 location is no longer available, then merge on rank2, unless the rank2 location is no longer available, and on down the line, etc., until all options are exhausted and foster care (i.e., alternative services) is the only option.”
I’ve had no success getting this to work. I haven’t even been successful getting the UNION operator to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the WHERE clause:
proc sql;          /*Calls the SQL procedure*/
create table x as  /*Tells SAS to create a table called x*/
select             /*Specifies the column(s) to be selected*/
from               /*Specifies the table(s) (data sets) to be queried*/
where              /*Subsets the data based on a condition*/
group by           /*Classifies the data into groups based on the specified column(s)*/
order by           /*Sorts the resulting rows (observations) by the specified column(s)*/
;
quit;              /*Ends the PROC SQL procedure*/
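For comparison, here is the same template with every clause filled in, using made-up dataset and variable names (work.youth, rank1), just to show the shape of a complete query:

proc sql;
   create table x as
   select rank1 as location,      /* first-choice location per youth */
          count(*) as n_youth     /* how many youth rank it first    */
   from work.youth
   where rank1 is not null
   group by rank1
   order by n_youth desc;
quit;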
Frankly, I’m stuck and I could use some advice. The greenhorn in me is in way over his head.
I appreciate any help or guidance anyone might lend.
Cheers!
P
The process you describe (and to be honest I skipped to the end, so I might have missed something) does not lend itself to SQL, because each step could affect the results of the next one. However, you want to get the best results for the most kids. (I think a lot of that text was to convince us how important it is to help out.) You don't actually give us anything we can really use to help, since you don't give any details of your data model, your data, or expected results. There really is no way to answer this question. But I don't care -- I'm going to go forward with some suggestions, because it is a Friday and I've never done a stream-of-consciousness answer to a stream-of-consciousness question before. I will suggest you don't formulate your solution in SQL alone, but instead use a higher-level program and engage in a process like the one described below (see the PROC SQL sketch after the list) -- because this is a DB question, I've noted the steps where the DB might be involved.
Generate a list of kids (this can be in a table -- call it NEEDY-KID).
Have a list of locations to assign (this can also be a table, LOCATION).
Run your matching for best fit from kid to location -- at this point don't worry about assigning more than one kid to a location; there can be duplicates (put this in a table called KID2LOC using a query).
Check KID2LOC for locations assigned twice -- use some method to remove the duplicates so each location is only assigned once (remove from KID2LOC using a query).
Prune the LOCATION list to remove assigned locations (once again -- a query).
If kids exist without a location, go to step 3 with the new pruned location list.
Done.
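For illustration, steps 3-5 might look roughly like this in PROC SQL, with hypothetical tables NEEDY_KID (kid_id, rank1) and LOCATION (loc_id, beds_left); the loop back to step 3 would live in the surrounding macro:

proc sql;
   /* Step 3: tentative best fit -- every kid grabs their rank-1
      location if it still has an open bed; duplicates allowed */
   create table KID2LOC as
   select k.kid_id, k.rank1 as loc_id
   from NEEDY_KID as k
        inner join LOCATION as l
          on k.rank1 = l.loc_id
   where l.beds_left > 0;

   /* Step 4: keep one kid per location (lowest kid_id wins here;
      swap in whatever tie-break rule you prefer) */
   create table KID2LOC_UNIQUE as
   select loc_id, min(kid_id) as kid_id
   from KID2LOC
   group by loc_id;

   /* Step 5: prune the location list by decrementing filled beds */
   update LOCATION
   set beds_left = beds_left - 1
   where loc_id in (select loc_id from KID2LOC_UNIQUE);
quit;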

Creating a workable Redis store with several filters

I am working on a system to display information about real estate. It runs in Angular, with the data stored as a JSON file on the server, which is updated once a day.
I have filters on number of bedrooms, bathrooms, and price, and a free-text field for the address. It's all very snappy, but the problem is the load time of the app. This is why I am looking at Redis. Trouble is, I just can't get my head around how to get data with several different filters running.
Let's say I have some data like this: (missing off lots of fields for simplicity)
id  beds  price
0   3     270000
1   2     130000
2   4     420000
etc...
I am thinking I could set up three sets, one to hold the whole dataset, one to create an index on bedrooms and another for price:
beds  id
2     1
3     0
4     2
and the same for price:
price   id
130000  1
270000  0
420000  2
Then I was thinking I could use SINTER to return the overlapping sets.
Let's say I'm looking for a house with more than 2 bedrooms that is less than 300000.
From the bedrooms set I get IDs 0,2 for beds > 2.
From the prices set I get IDs 0,1 for price < 300000
So the common id is 0, which I would then lookup in the main dataset.
It all sounds good in theory, but being a Redis newbie, I have no clue how to go about achieving it!
Any advice would be gratefully received!
You're on the right track; sets + sorted sets is the right answer.
Two sources for all of the information that you could ever want:
Chapter 7 of my book, Redis in Action - http://bitly.com/redis-in-action
My Python/Redis object mapper - https://github.com/josiahcarlson/rom (it uses ideas directly from chapter 7 of my book to implement sql-like indices)
Both of those resources use Python as the programming language, though chapter 7 has been translated into Java: https://github.com/josiahcarlson/redis-in-action/ (go to the java path to see the code).
... That said, a normal relational database (especially one with built-in Geo handling like Postgres) should handle this data with ease. Have you considered a relational database?
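To illustrate that last point: in a relational database, the whole multi-filter lookup collapses into one query. A minimal sketch, assuming a table named listings with the columns from the example (the table and index names are assumptions):

-- an index on (beds, price) keeps this fast as the dataset grows
SELECT id, beds, price
FROM listings
WHERE beds > 2
  AND price < 300000;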

TSQL grouping on fuzzy column

I would like to group all the merchant transactions from a single table and just get a count. The problem is that the merchant, let's say redbox, will have "redbox" plus the store number added to the end (redbox 4562, redbox*1234). I will also include the category for grouping purposes.
Category     Merchant
restaurant   bruger king 123 main st
restaurant   burger king 456 abc ave
restaurant   mc donalds * 45877d2d
restaurant   mc 'donalds *888544d
restaurant   subway 454545
travel       subway MTA
gas station  mc donalds gas
travel       nyc taxi
travel       nyc-taxi
The question: how can I group the merchants when they have addresses or store locations added on to them? All I need is a count for each merchant.
The short answer is there is no way to accurately do this, especially with just pure SQL.
You can find exact matches, and you can find wildcard matches using the LIKE operator or a (potentially huge) series of regular expressions, but you cannot find similar matches nor can you find potential misspellings of matches.
There are a few potential approaches I can think of to solve this problem, depending on what type of application you're building.
First, normalize the merchant data in your database. I'd recommend against storing the exact, unprocessed string such as Bruger King in your database. If you come across a merchant that doesn't match a known set of merchants, ask the user if it already matches something in your database. When data goes in, process it then and match it to an existing known merchant.
Store a similarity coefficient. You might have some luck using something like a Jaccard index to judge how similar two strings are. Perhaps after stripping out the numbers, this could work fairly well. At the very least, it could allow you to create a user interface that can attempt to guess what merchant it is. Also, some database engines have full-text indexing operators that can describe things like similar to or sounds like. Those could potentially be worth investigating.
Remember merchant matches per user. If a user corrects bruger king 123 main st to Burger King, store that relation and remember it in the future without having to prompt the user. This data could also be used to help other users correct their data.
But what if there is no UI? Perhaps you're trying to do some automated data processing. I really see no way to handle this without some sort of human intervention, though some of the techniques described above could help automate the process. I'd also look at the source of your data. Perhaps there's a distinct merchant ID you can use as a key, or perhaps there exists somewhere a list of all known merchants (maybe credit card companies provide this as an API?). If there are boatloads of data to process, another option would be to partially automate it using a service such as Amazon's Mechanical Turk.
You can use LIKE
SELECT COUNT(*) AS [Count], 'Burger King' AS Merchant
FROM <tables>
WHERE Merchant LIKE '%king%'
UNION ALL
SELECT COUNT(*) AS [Count], 'Jack in the Box' AS Merchant
FROM <tables>
WHERE Merchant LIKE 'jack in the box%'
You may have to move the wildcards around depending on how the records were spelled out.
It depends a bit on which database you use, but most have some kind of REGEXP_INSTR or similar function you can use to find the first index of a pattern. You can then write something like this:
SELECT SUBSTR(merchant, 1, REGEXP_INSTR(merchant, '[0-9]') - 1), COUNT(*)
FROM Expenses
GROUP BY SUBSTR(merchant, 1, REGEXP_INSTR(merchant, '[0-9]') - 1)
This assumes that the merchant name doesn't contain a number and the store number does. However, you may still need to strip out any special chars with a REPLACE (like *, -, etc.).
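Since the question is tagged TSQL and SQL Server has no REGEXP_INSTR, here is a hedged equivalent using PATINDEX; the NULLIF/COALESCE pair guards against merchants that contain no digit at all (the Expenses table name is reused from above):

SELECT LEFT(Merchant,
            COALESCE(NULLIF(PATINDEX('%[0-9]%', Merchant), 0) - 1,
                     LEN(Merchant))) AS MerchantName,
       COUNT(*) AS TransactionCount
FROM Expenses
GROUP BY LEFT(Merchant,
              COALESCE(NULLIF(PATINDEX('%[0-9]%', Merchant), 0) - 1,
                       LEN(Merchant)));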

help with tree-like structure

I've got some financial data to store and manipulate. Let's say I have 2 divisions, with offices in 2 cities, in 2 currencies, and 4 bank accounts. (It's actually more complex than that.) I want to show a list like this:
Electronics
  Chicago
    Dollars
      Account 2 -> transactions in acct2 in $ in chicago/electronics
    Euros
      Account 1 -> transactions in acct1 in E in chicago/electronics
      Account 3 -> etc.
      Account 4
  Brussels
    Dollars
      Account 1
    Euros
      Account 3
      Account 4
Dessert Toppings
  Chicago
    Dollars
      Account 1
      Account 4
    Euros
      Account 2
      Account 4
  Brussels
    Dollars
      Account 2
    Euros
      Account 3
      Account 4
So at each level except the top, the category can appear in multiple places. I've been reading around about the various methods, but none of the examples seem to address my particular use case, where nodes can appear in more than one place in the hierarchy. (Maybe there's a different name for this than "tree" or "hierarchy".)
I guess my hierarchy is actually something like Division > City > Currency with 'Electronics' and 'Euros' merely instances of each level, but I'm not quite sure how that helps or hurts.
A few notes: this is for a demo site, so the dataset won't be large -- ease of set-up and maintenance is more important than query efficiency. (I'm actually considering just building a data object by hand, though I'd much rather do it the right way.) Also, FWIW, we're working in php with an ms access back-end, so any libraries out there that make this easy in that environment would be helpful. (I've found a couple of implementations of the nested set pattern already.)
Are you sure you want to use a hierarchical design for this? To me, the hierarchy seems more a consequence of the desired output format than something intrinsic to your data structure.
And what if you have to display the data in a different order, like City > Currency > Division? Wouldn't that be very cumbersome?
You could use a plain structure instead, with a table for Branches, one for Cities, and one for Currencies, and then one Account table with Branch_ID, City_ID, and Currency_ID as foreign keys.
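A minimal sketch of that plain structure (table names from the paragraph above; column types and the sample query are assumptions):

CREATE TABLE Branches   (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Cities     (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Currencies (ID INT PRIMARY KEY, Code CHAR(3));

CREATE TABLE Accounts (
    ID          INT PRIMARY KEY,
    Name        VARCHAR(50),
    Branch_ID   INT REFERENCES Branches(ID),
    City_ID     INT REFERENCES Cities(ID),
    Currency_ID INT REFERENCES Currencies(ID)
);

-- any display order (Division > City > Currency, or City > Currency
-- > Division) is then just a question of ORDER BY:
SELECT b.Name, ci.Name, cu.Code, a.Name
FROM Accounts   AS a
JOIN Branches   AS b  ON a.Branch_ID   = b.ID
JOIN Cities     AS ci ON a.City_ID     = ci.ID
JOIN Currencies AS cu ON a.Currency_ID = cu.ID
ORDER BY b.Name, ci.Name, cu.Code, a.Name;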
I'm not sure what database platform you're using. But if you're using MS SQL Server, then you should check out recursive queries using common table expressions (CTEs). They're easy to use and are designed for exactly the type of situation you've illustrated (a bill of materials, for instance). Check out this website for more detail: http://www.mssqltips.com/tip.asp?tip=1520
Good luck!