How to handle records when the user selects "All" records? - sql

I have a requirement where the user wants an "All" option in a few fields:
1. Sites: around 20 records (includes an "All" option).
2. Cost Centers: dependent on Sites, with around 540 records in total across all Sites. Each Site may have a different number of Cost Centers (includes an "All" option).
3. Employees: dependent on Cost Centers, with around 29,000 records in total. Each Cost Center may include a different number of Employees (includes an "All" option).
4. Processes: independent of all of the above, with around 20 records (includes an "All" option).
So Sites, Cost Centers, Employees, and Processes each have a dropdown with "All" alongside the other options.
How would I design the database table, considering the scenarios below?
The user selects the following:
Sites : Riyadh
Cost Centers : MA - Medical
Employees : All
Processes : Travel Request and Authorization
Or the user goes for "All" in Cost Centers:
Sites : Jeddah
Cost Centers : All
Employees : All
Processes : All
Likewise, there are a few other combinations. Also, how should the user see the inserted records so that he/she can easily navigate to a record and update or delete it? Right now I am thinking of inserting a single record for the "All" option. For example, the user selects:
Sites : Riyadh
Cost Centers : Nursing
Employees : All
Processes : All
This would insert just one row in the database table.
However, the user also has a requirement that if there are 200 Employees under the selected Cost Center and he wants to apply the record to only 70 of them, the single-row approach means he has to do more work.
How would the user edit the inserted records afterwards? And how should the view of all records be rendered so that editing a particular record is easy for the user?

Don't model "ALL" in your data, or you will have to deal with people mis-assigning an employee to a cost center named "ALL" under a site named "ALL". You don't want that!
Sites have cost centers, cost centers have employees, there are processes and (I assume) employees may be assigned to them, thus implying a table that links employees to processes. Only store REAL data.
Then be smart in your queries, so that if the user selects "ALL" for a given drop-down they get all matching records, and make sure inserted data meets proper referential integrity: a cost center must belong to a valid site, and an employee must belong to a cost center and may be linked to one or more processes.
But putting in "All" placeholder rows? You're opening yourself up to a world of hurt managing pseudo-relationships versus real relationships if you go down that route.
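As a sketch of what those "smart queries" might look like (the table and column names here are assumptions, not from the question), a NULL parameter can stand in for the user's "All" selection:

    -- Only real rows are stored; a NULL bind parameter means the user picked "All".
    SELECT e.employee_id, e.employee_name, cc.cost_center_name, s.site_name
    FROM employees e
    JOIN cost_centers cc ON cc.cost_center_id = e.cost_center_id
    JOIN sites s ON s.site_id = cc.site_id
    WHERE (:site_id IS NULL OR s.site_id = :site_id)                        -- "All" Sites
      AND (:cost_center_id IS NULL OR cc.cost_center_id = :cost_center_id)  -- "All" Cost Centers
      AND (:process_id IS NULL OR EXISTS (SELECT 1
                                          FROM employee_processes ep
                                          WHERE ep.employee_id = e.employee_id
                                            AND ep.process_id = :process_id)); -- "All" Processes

The same pattern covers every combination the user can pick, without storing any "All" placeholder rows.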

Actually you have two relationships between Sites and Cost Centers (I'm narrowing it down to only those two entities). Both are optional, and exactly one of them must be defined.
The first is the (unproblematic) zero-to-one relationship from Site to Cost Center, covering the case where the cost center is known and assigned for the Site.
The second relationship covers the case where no specific cost center is assigned and the cost must be "somehow allocated"; "ALL" may mean that each cost center receives (say) an equal share.
This split into two relationships makes the database design cleaner, but it does not address the main problem, which is querying the relation.
The problem manifests as an OR condition in the join predicates (chasing both paths), which can lead to sub-optimal performance.
So this is the touchstone of your design: collect the main queries and check how they perform on sample data.
One possible approach to attack performance problems would be to define materialized views that expand the "ALL" relationship to every Cost Center (as proposed by @Michael) and that can be refreshed whenever a new Cost Center is defined, so that you need not handle such changes manually.
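For illustration, such a materialized view might look like the following; the table names and the convention that a NULL cost_center_id means "ALL" are my assumptions:

    -- site_cost_allocation(site_id, cost_center_id): a NULL cost_center_id means "ALL".
    CREATE MATERIALIZED VIEW mv_site_cost_center AS
    SELECT a.site_id, a.cost_center_id
    FROM site_cost_allocation a
    WHERE a.cost_center_id IS NOT NULL
    UNION ALL
    SELECT a.site_id, cc.cost_center_id  -- expand "ALL" to every cost center of the site
    FROM site_cost_allocation a
    JOIN cost_centers cc ON cc.site_id = a.site_id
    WHERE a.cost_center_id IS NULL;

Queries can then join against the expanded view with a single equality predicate, and a refresh after a new Cost Center is defined keeps it current.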

Related

Capacity Management database

I am designing a database using Microsoft Access for capacity management. I have different types of equipment and need to track the power usage for every month. For instance, I have an HVAC system and a Chiller system. Within each system there are different pieces of equipment, like AHU_1, AHU_2, AHU_3, MAU_1, MAU_2, etc. in the HVAC system and CHWP_1, CHWP_2, CWP_1, CWP_2, etc. in the Chiller system.
I need to track the power usage for every month. For each system I have a separate table containing its respective equipment. What would be a suitable way to track the usage? I believe there are three options, as in the picture below:
1. Creating a main table, called Chiller_usage, which holds all the equipment and dates with usage values. The problem I see is that each piece of equipment is repeated for every date; the pro is that there are not many tables.
2. Creating a table per piece of equipment, each holding dates and usage. The problem is that I have around 60 to 70 pieces of equipment across 5 major systems, which would lead to a massive number of tables that would be very difficult to query and report on.
3. Creating a table per date, with equipment and usage values. This looks promising for now because I would have few tables initially, but as time goes on there would be 12 new tables each year, which is a lot in the long run.
What I'm thinking of is the first option, since it is easy to manage when making custom queries; I need to perform calculations for costing and usage analysis of each piece of equipment, with graphs, etc. But I believe that will be clumsy due to the repetition of equipment names across the varying dates. Are there any other viable options? Thank you.
Assuming you need to store monthly energy usage for each piece of equipment: normalize the tables. Neither the person entering the data nor the manager asking for reports needs to see the complexity of the underlying tables. The person entering the data sees a form for adding systems/equipment and a form for entering energy usage per piece of equipment per month. The manager sees a report with just the information he wants, such as energy costs per system per year. The normalized tables can be recombined into human-readable tables with queries. Access tries to make building the forms and reports as simple as clicking on the appropriate queries and then clicking on create form/report; in practice, some assembly is required. Behind the scenes, the forms put the correct data in the correct tables and the report shows only the data the manager wants. For instance, here is a normalized table structure based on the data you provided and some assumptions:
The tables are normalized and have the relationships shown in the original screenshots. Beneath them is a query that shows the total power each system uses between any two dates, such as for a yearly report, turning the raw per-month readings into per-system totals.
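Since the screenshots did not survive, here is a rough SQL equivalent of that structure and query; all names are assumptions based on the post:

    CREATE TABLE systems (
        system_id   INTEGER PRIMARY KEY,
        system_name TEXT                 -- e.g. 'HVAC', 'Chiller'
    );
    CREATE TABLE equipment (
        equipment_id   INTEGER PRIMARY KEY,
        system_id      INTEGER REFERENCES systems(system_id),
        equipment_name TEXT              -- e.g. 'AHU_1', 'CHWP_2'
    );
    CREATE TABLE usage_readings (
        reading_id    INTEGER PRIMARY KEY,
        equipment_id  INTEGER REFERENCES equipment(equipment_id),
        reading_month DATE,              -- first day of the month
        power_kwh     REAL
    );

    -- Total power per system between any two dates, e.g. for a yearly report.
    SELECT s.system_name, SUM(u.power_kwh) AS total_kwh
    FROM systems s
    JOIN equipment e ON e.system_id = s.system_id
    JOIN usage_readings u ON u.equipment_id = e.equipment_id
    WHERE u.reading_month BETWEEN DATE '2016-01-01' AND DATE '2016-12-01'
    GROUP BY s.system_name;

One row per equipment per month scales fine here: 70 pieces of equipment times 12 months is only 840 rows a year.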

Data warehousing Model approach

We are in the process of building a health data warehouse and have been having discussions over its basic structure. I need your suggestions on the pros and cons of the structures below. The DWH will be used for reporting and research purposes. It will be a near-real-time data warehouse with a latency of around 5-10 minutes.
The source database has one Encounter/Visit table. Everything is saved in this table; it's the central table which links everything. So if I need to trace a patient's journey in the production database, I just go to the Encounter/Visit table and see how many times a patient has come for treatment, been admitted, returned from emergency, been admitted from emergency, etc.
Model 1 ->
An Encounter/Visit table holding the common fields (like encounter_id, arrival_date, care_type, etc.), and then further tables built per encounter type with the encounter-specific fields:
Encounter_Emergency (emergency-specific fields such as emergency diagnosis, triage category, etc.)
Encounter_Inpatient
Encounter_outpatient
Model 2 ->
Separate tables as base tables, with a view created on top that brings all the encounter types together:
Encounter_Emergency (emergency-specific fields such as emergency diagnosis, triage category, etc.)
Encounter_Inpatient
Encounter_outpatient
Model 3 ->
An Encounter/Visit table having all the fields of the source database, with views created per encounter type exposing the encounter-specific fields:
view_Encounter_Emergency
view_Encounter_Inpatient
view_Encounter_outpatient
These views can be further combined with the emergency_diagnosis table to get the diagnoses, or with the emergency_alerts table to access the alerts, etc.
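To make Model 3 concrete, the per-type views might look like this; the encounter_type discriminator column is my assumption, not from the post:

    CREATE VIEW view_Encounter_Emergency AS
    SELECT encounter_id, arrival_date, care_type,
           triage_category              -- emergency-specific fields
    FROM Encounter
    WHERE encounter_type = 'EMERGENCY';

    -- Combined with the emergency_diagnosis table, as described above:
    SELECT v.*, d.diagnosis_code
    FROM view_Encounter_Emergency v
    JOIN emergency_diagnosis d ON d.encounter_id = v.encounter_id;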
A prime consideration would be how often there will be additions, deletions, or alterations to Encounter Types.
Model B (Model 2) will require extensive rework in advance of any such change just to make sure the data continues to be captured. Either of the other two models will continue to capture reclassified data, but will require rework to report on it.
As between A and C (Models 1 and 3), the question becomes traffic. Views are comparatively easy to spin up and down, but they'll be putting load on that big base table. That might be acceptable if the DW won't have tons of load on it. But if there will be extensive reporting (pro tip: there's always more extensive reporting than the business tells you there will be), it may be more advantageous to break the data out into stand-alone tables.
There is, of course, ETL overhead to maintaining all of those tables.
For speed of delivery, perhaps build Model C, but architect Model A in case consumption requires the more robust model. For the record, you could build views that don't have any kind of vw_ prefix, or any other identifier in their names that lets users know they're views. Then, later, you can replace them with tables of the same name, and legacy queries against the old views will continue to work. I've done the same thing in the opposite direction, sneaking in views to replace redundant tables.
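A sketch of that renaming trick, under the same assumed schema as above; consumers query Encounter_Emergency without knowing whether it is a view or a table:

    -- Day one: a view with no vw_ prefix.
    CREATE VIEW Encounter_Emergency AS
    SELECT encounter_id, arrival_date, care_type, triage_category
    FROM Encounter
    WHERE encounter_type = 'EMERGENCY';

    -- Later, if the load demands it: swap in a real table with the same name.
    DROP VIEW Encounter_Emergency;
    CREATE TABLE Encounter_Emergency AS
    SELECT encounter_id, arrival_date, care_type, triage_category
    FROM Encounter
    WHERE encounter_type = 'EMERGENCY';
    -- Legacy queries against Encounter_Emergency keep working unchanged.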

How to efficiently filter large amounts of records based on user permissions on specific records with specific criteria?

I'm working as a maintainer for a legacy Java-based cargo railway consignment note accounting system. There is a serious performance issue with retrieving a list of consignment notes to display on their website.
I cannot publish the entire query, but here are some statistics to give the general idea:
it has 17 left joins
it has a huge WHERE clause with 5 OR groups to determine whether a user is allowed to access a record through a specific relation to it (consignor, consignee, carrier, payer, supervisor) and to check the user's permission to access records related to a specific railway station
each OR group has, on average, two exists() checks with subqueries on data related to the record, plus the station permission check
when expanded to be human-readable, the query is about 200 lines long
Essentially, the availability of each record to the currently logged-in user depends on the following factors:
- the company of the user
- the company of the carrier, consignee, consignor, payer of each specific consignment note
- every consignment note has multiple route sections and every section has its own carrier and payer, thus requiring further access control conditions to make these records visible to the user
- every consignment note and every route section has origin and destination stations, and a user is allowed to see the record only if he has been given access to any of these stations (using a simple relation table).
There are about 2 million consignment note records in the database and the customer is complaining that it takes too long to load a page with 20 records.
Unfortunately it is not possible to optimize the final query before passing it to the RDBMS (Oracle 11g, to be specific) because the system has complex architecture and a home-brew ORM tool, and the final query is being assembled in at least three different places that are responsible for collection of fields to select, collection of joins, adding criteria selected in the UI and, finally, the reason for this question - the permission related filter.
I wouldn't say that the final query is very complex; on the contrary, it is simple in its nature but it's just huge.
I'm afraid caching solutions wouldn't be very effective in this case because the data changes very often and the cache would be overwritten every minute or so. Also, because of the individual permissions, each user would need their own cache, which would have to be maintained.
Besides the usual recommendations - dealing with indexes and optimizing each subquery as much as possible - are there any other well-known solutions for filtering large amounts of records based on complex permission rules?
Just my two cents, since I see no other answers around.
First of all, you would need to get the execution plan of the query. Without it, it's not easy to get an idea of what could be improved. It sounds like a nice challenge, if it weren't for your urgency.
Well, you say the query has 17 left joins. Does that mean there is a single main table in the query? If so, then that's the first section I would optimize. The key aspect is to reduce the TABLE ACCESS BY ROWID operations as much as possible on that table. The typical solution is to add well tailored indexes to narrow down the INDEX RANGE SCAN as much as possible on that table, therefore reducing the heap fetches.
Then, when navigating the rest of the [outer] tables (presumably using NESTED LOOPS) you can try materializing some of those conditions into simple 0/1 flags you could use, instead of the whole conditions.
Also, if you only need 20 rows, I would expect the query to be very fast, as long as it is properly pipelined. If in your case it's taking too long, then it may not be pipelined. Are you sorting/aggregating/windowing by some specific condition that prevents pipelining? That condition could be the most important factor to index if you just need 20 rows.
Finally, you could try avoiding heap fetches by using "covering indexes". That could really improve performance of your query, but I would leave it as a last resort, since they have their downsides.
Again, a good solution really requires taking a good look at the execution plan. If you're still game, post it, and I can take a look.
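To illustrate the flag idea (all names here are mine, not from the system): materialize the five OR groups into a user-to-note visibility table that is maintained whenever permissions or notes change, so the page query collapses into a join.

    -- One row per (user, note) pair the user may see; the composite
    -- primary key doubles as a covering index for the filter.
    CREATE TABLE note_visibility (
        user_id NUMBER NOT NULL,
        note_id NUMBER NOT NULL,
        PRIMARY KEY (user_id, note_id)
    );

    -- Oracle 11g pagination idiom: the 200-line OR chain becomes one join.
    SELECT * FROM (
        SELECT n.*
        FROM consignment_notes n
        JOIN note_visibility v ON v.note_id = n.note_id
                              AND v.user_id = :current_user_id
        ORDER BY n.created_date DESC
    ) WHERE ROWNUM <= 20;

The trade-off is the ETL-style maintenance of note_visibility, which only pays off if permission changes are much rarer than reads.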

How to build a proper DB schema to have "periodic snapshots" of a table for a selected day? [closed]

Problem to be solved:
I'm new to databases and I'm trying to find the best way to store changes in a table that is a daily snapshot of some statuses: e.g. a "hotel_room_rentals" table (with 20 columns, every one of which can change).
I'd like to be able to generate that table for a selected day (the data changes on production, so I have to store it somewhere else), or do some other transformations on it (e.g. the average number of days rented in a period).
My theoretical example - detailed:
Let's say that I'm creating a DB for a hotel.
In the production system I have a table that shows info for all 10,000 rooms in the hotel.
This is a daily snapshot - let's assume that the table is updated once per day.
Some attributes of a room change often: e.g. is_rented, customer_number, rate_usd.
Some attributes don't change too often: e.g. disabled_room, room_color, type_of_furniture.
Room_number obviously does not change (it is the primary key).
Now I want to find the best way to track changes in this table, the best way to create statistics on the basis of this table (e.g. the average number of days rented in a period), and a way to generate the table for a selected date (e.g. 2013-01-01).
MY IDEA:
Since I have no clue about databases, my idea is to copy the whole table every day, with one more column called "DB_dump_date" (holding the date). This is a pretty straightforward approach, but it will probably require a lot of space, since my 10k-room table will have to be copied 365 times a year.
OTHER SOLUTIONS:
On another website, I was recommended to create two tables:
a "Reservation" table with these columns: Startdate, Enddate, Room, Rate, Occupant_name;
then to transform this table into a FactReservations table: Date, Room, Is_occupied, Rate, Occupant_name.
I do not understand how this helps me... in fact, I assume I would have to make 20 intermediary tables and then 20 fact tables (since I have 20 columns in my table).
QUESTIONS:
What are the recommended ways to deal with such problems?
Is there any DB schema that is prepared to deal with it, without the user making magic ETLs? (e.g. a DB that can optimize the problem by itself)
What are the alternatives?
How would you, smart people, do this? (preferably in MS Access... or some freeware technology)
Edit: one more thing - everything in the table can change, not only room reservations, everything; and I want to be able to track the changes.
Stop - slow down - and take a breath.
Do not - repeat, do not - make copies of tables each day. That approach is way off base.
Your problem is a normalization problem. As you indicate, you have other suggestions on how to normalize - that is the direction you want to go.
Your goal will be to find a structure that accommodates the SQL statements that can answer your questions (and hopefully many more that you haven't thought up yet). This will be one static model where the tables do not change or get copied; the only thing that changes is the data inside the tables. (Ideally, to me, there will also be few to no updates, only inserts.)
You will certainly need a ROOM table and a CUSTOMER table, and then a relation between them, possibly RESERVATION.
These can then fill up, and you can get all the answers to the questions you posed without any copying or materialization or anything - just SQL.
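A minimal sketch of that static model (the columns are illustrative): the "table for a selected day" and the period statistics both become plain queries instead of copies.

    CREATE TABLE room (
        room_number       INTEGER PRIMARY KEY,
        disabled_room     BOOLEAN,
        room_color        TEXT,
        type_of_furniture TEXT
    );
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    CREATE TABLE reservation (
        reservation_id INTEGER PRIMARY KEY,
        room_number    INTEGER REFERENCES room(room_number),
        customer_id    INTEGER REFERENCES customer(customer_id),
        start_date     DATE,
        end_date       DATE,
        rate_usd       REAL
    );

    -- The "snapshot" for 2013-01-01: which rooms were rented that day, and by whom?
    SELECT r.room_number, c.customer_name, res.rate_usd
    FROM room r
    LEFT JOIN reservation res ON res.room_number = r.room_number
                             AND DATE '2013-01-01' BETWEEN res.start_date AND res.end_date
    LEFT JOIN customer c ON c.customer_id = res.customer_id;

    -- Average number of days rented per room in a period.
    SELECT room_number, AVG(end_date - start_date) AS avg_days_rented
    FROM reservation
    WHERE start_date >= DATE '2013-01-01' AND start_date < DATE '2014-01-01'
    GROUP BY room_number;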
You need to focus on the requirements and start there. So far, the requirements I see are:
- generate the table for a selected day
- average number of days rented in a period
If we consider two extremes of design, at the more complex end would be a datamart with slowly changing dimension (SCD) tables tracking changes to rooms, and at the simple end would be some kind of log table, along the lines of what you have already mentioned.
Reading between the lines, I don't really see any requirement for knowing the attributes of a room on a given day, but I do see a requirement for analysis of historical transactions.
So my suggestion is have a good hard think about your requirements before you start designing the database.
There is no magic design to cover this automatically. Dimensional design is a standard way of modelling business data to allow for easy analysis, but it might be over the top for your requirement.
Welcome to the world of databases! With that in mind, take almost everything that you know about Excel and throw it out the window. In Excel it's much more difficult to define relationships between two sheets of a workbook and report off of those two sheets, so the majority of the time it's easier to simply copy the same data down a single sheet; in Access, or any other relational database, defining relationships is trivially easy.
Typically what you’d want to do is create several normalized tables and define a relationship between them. Then, when querying the view, you can easily join between the tables to get the data that you need.
So, working off of the assumption that you're building this for simple reporting and not to create a property management system (if you are looking at that, I'd recommend that you look at some of the players in the industry, like Micros or Agilysys), and based on my experience working in the industry, I'd recommend the following table layout:
Reservations - holds the reservation information (guest name, arrival date, departure date, check-in date, check-out date, rate if you use a blended rate, etc.)
Rooms - holds information on your rack (number, wing code, max guests, # beds, smoking/non, view, type, etc.)
Room Status - only if you need to track whether a room is on reserve/hold/OOO/OTM (status type, date start, date end)
Room Status Types - types of room status holds and how they affect inventory (type, out-of-inventory flag)
Rates (if you don't use a blended rate) - one entry per reservation per night (guest, rate)
Personally, I’m a huge fan of using surrogate keys for the unique identifiers, because all too often I've been burned where something changes in the business process and a natural key that was previously unique all of a sudden can be duplicated. In that vein, each table would have a surrogate key and the joins would be as follows:
Reservations – Rooms (many to one)
Rooms – Room Status (one to many)
Room Status – Room Status Types (many to one)
Reservations – Rates (one to many)
If you define the relationships properly in Access (i.e. foreign key relationships in other DBMS), it should automatically use them to build your joins when creating your queries (called Views in just about every other DBMS) or reports.
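Translated into rough SQL (the column names are my guesses at the layout above), the surrogate keys and relationships would look like:

    CREATE TABLE room_status_types (
        room_status_type_id   INTEGER PRIMARY KEY,  -- surrogate key
        status_type           TEXT,                 -- reserve/hold/OOO/OTM
        out_of_inventory_flag BOOLEAN
    );
    CREATE TABLE rooms (
        room_id     INTEGER PRIMARY KEY,            -- surrogate key
        room_number TEXT,
        wing_code   TEXT,
        max_guests  INTEGER,
        num_beds    INTEGER,
        smoking     BOOLEAN,
        room_view   TEXT,
        room_type   TEXT
    );
    CREATE TABLE room_status (
        room_status_id      INTEGER PRIMARY KEY,
        room_id             INTEGER REFERENCES rooms(room_id),   -- one room, many statuses
        room_status_type_id INTEGER REFERENCES room_status_types(room_status_type_id),
        date_start          DATE,
        date_end            DATE
    );
    CREATE TABLE reservations (
        reservation_id INTEGER PRIMARY KEY,
        room_id        INTEGER REFERENCES rooms(room_id),        -- many reservations, one room
        guest_name     TEXT,
        arrival_date   DATE,
        departure_date DATE,
        checkin_date   DATE,
        checkout_date  DATE,
        blended_rate   NUMERIC                                   -- if you use a blended rate
    );
    CREATE TABLE rates (
        rate_id        INTEGER PRIMARY KEY,
        reservation_id INTEGER REFERENCES reservations(reservation_id), -- one reservation, many nightly rates
        rate_date      DATE,
        rate_usd       NUMERIC
    );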
For learning about databases I'd recommend that you review:
Wikipedia on join types
Wikipedia on slowly changing dimensions (you could use some of these techniques to record changes in room information over time)
Wikipedia on relational databases
Office documentation on Access
Kimball Group design tips (great for data warehouse/datamart design)
If you need to use your existing table then the following is not applicable, but if the data can be migrated to a new schema then this will readily address the challenge. TRE is an approach which uses the current-view paradigm for development but fully supports the time dimensions of data (system time = when the data goes into the DB, and valid time = the business time which applies to the data). Working in the current-view approach of TRE makes this sort of problem straightforward. Take a look at: http://youtu.be/V1EcsuJxUno
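For a sense of what the two time dimensions buy you, here is a hand-rolled bitemporal sketch; this only illustrates the idea, it is not TRE itself, and all names are assumptions:

    -- Each row version carries business (valid) time and database (system) time.
    CREATE TABLE room_state (
        room_number INTEGER,
        is_rented   BOOLEAN,
        rate_usd    NUMERIC,
        valid_from  DATE,       -- business time: when this state applied
        valid_to    DATE,
        system_from TIMESTAMP,  -- system time: when this version was recorded
        system_to   TIMESTAMP   -- open versions use a far-future sentinel
    );

    -- "Generate the table for 2013-01-01" as we currently know it:
    SELECT room_number, is_rented, rate_usd
    FROM room_state
    WHERE DATE '2013-01-01' >= valid_from
      AND DATE '2013-01-01' <  valid_to
      AND system_to = TIMESTAMP '9999-12-31 00:00:00';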

Quickly compute millions of values for a search

Let's say I have a database of millions of widgets with a price attribute. Widgets belong to suppliers, and I sell widgets to customers by first buying them from suppliers and then selling them to the customer. With this basic setup, if a customer asks me for every widget less than $50, it's trivial to list them.
However, I mark up the price of widgets from individual suppliers differently. So I may mark up widgets from Supplier A by 10%, and I may mark up widgets from Supplier B by a flat rate of $5. In a database, these markups would be stored in a join table with my ID, the supplier ID, a markup type (flat, percentage), and a markup rate. On top of this, suppliers may add their own markups when they sell to me (these markups would be in the same join table with the supplier's ID, my ID, and a markup type/rate).
So if I want to sell a $45 widget from Supplier A, it might get marked up by the supplier's 10% markup (to $49.50), and then my own $10 flat markup (to $59.50). This widget would not show up in the client's search for widgets costing less than $50. However, it's possible that an $80 widget could get marked down to $45 by the time it reaches the client, and should be returned in results. These markups are subject to change, and let's assume I'm one of hundreds of people in this system selling widgets to customers through suppliers, all with their own markup relationships in that markup table.
Is there any precedent for performing calculations like this quickly across millions of objects? I realize this is a huge, non-trivial problem, but I'm curious how one would start addressing a problem like this.
Add columns to your database and store the computed results, updating them when the related records change. You cannot calculate these values on the fly for millions of records.
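A sketch of that approach with assumed names: keep a final-price table keyed by widget and seller, and rebuild the affected rows whenever a markup row changes. The two-step markup (the supplier's first, then mine) matches the $45 -> $49.50 -> $59.50 example above.

    -- Assumed schema: widgets(widget_id, supplier_id, base_price) and
    -- markups(reseller_id, supplier_id, direction, markup_type, markup_rate),
    -- where direction is 'supplier_to_me' or 'me_to_customer'.
    CREATE TABLE widget_final_prices (
        widget_id   BIGINT,
        reseller_id BIGINT,
        final_price NUMERIC,
        PRIMARY KEY (widget_id, reseller_id)
    );

    INSERT INTO widget_final_prices (widget_id, reseller_id, final_price)
    SELECT w.widget_id, :me,
           CASE mine.markup_type  -- my markup is applied second
               WHEN 'flat' THEN
                   (CASE sup.markup_type
                        WHEN 'flat'       THEN w.base_price + sup.markup_rate
                        WHEN 'percentage' THEN w.base_price * (1 + sup.markup_rate / 100.0)
                    END) + mine.markup_rate
               WHEN 'percentage' THEN
                   (CASE sup.markup_type
                        WHEN 'flat'       THEN w.base_price + sup.markup_rate
                        WHEN 'percentage' THEN w.base_price * (1 + sup.markup_rate / 100.0)
                    END) * (1 + mine.markup_rate / 100.0)
           END
    FROM widgets w
    JOIN markups sup  ON sup.supplier_id = w.supplier_id
                     AND sup.reseller_id = :me AND sup.direction = 'supplier_to_me'
    JOIN markups mine ON mine.supplier_id = w.supplier_id
                     AND mine.reseller_id = :me AND mine.direction = 'me_to_customer';

The customer's "under $50" search then becomes a plain indexed predicate on final_price, and only the rows touched by a markup change need recomputing.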
Is there any precedent for performing calculations like this quickly across millions of objects?
Standard, seriously: data warehouses, risk projections, stuff like that - your problem is small. Precalculate all combinations, store them in a proper higher-level database server, finished.
It is not huge - seriously. It is only huge for a small server, but once you get a calculation grid going, it is quite trivial. Millions of objects? Calculate 100,000 objects per minute per machine, and 10 million objects are 100 machine-minutes. And you don't have THAT many changes.