How to avoid creating a date island in QlikView?

I'm a beginner developer and I have a database which has several different dates.
Created Date
Converted Date
Lost Date
Changed Date
etc.
The data needs to be shown in one application and be filterable on all of the dates. I am coding in QlikView and I could create a date island and use its native set analysis to filter the data, but that has a major impact on performance.
Anyone coding in QlikView come across a similar scenario?

Set analysis indeed has a major impact on performance. You are better off using the normal 'selection' functionality in QlikView.
For the answer below I am going to assume that you are familiar with the concept of Star Schema development. In short it means separating Dimensions (selection fields) from Fact fields (counter fields, summation fields, etc.) and connecting them via a link table.
There are two possible scenarios:
1. More than one date is related to the same fact.
For example, you have a 'sales transactions' table whose fact is the amount of money involved in the sale, and there is not only the 'sale date' but also the 'payment date', and you want to select on both. In this case you want several independent date selections, since you cannot be sure whether the user wants to select on Converted date, Created date, etc. You need to duplicate your 'date island' with different key names and connect it to your transactions table twice. Both date pools will no longer be islands and are more properly called 'Calendar dimensions'.
2. Different dates are related to different facts.
In this case you can use one 'Calendar dimension' to accommodate all date fields. Simply create one AutoNumber key in your calendar and call it %DateKey. Make this field the connection between your calendar table and your link table. Then, for every fact table that has a date you want to make selectable with the calendar, connect it to the link table using a key that includes the date in the AutoNumber hash.
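To make the structure concrete, here is a rough relational sketch of scenario 2. The table and field names are my own assumptions, and in QlikView you would of course build these tables in the load script with AutoNumber() rather than with DDL; %DateKey is written as DateKey only because of SQL identifier rules:

-- One shared calendar keyed by an AutoNumber-style surrogate of the date
create table Calendar
(
    DateKey      int primary key,   -- %DateKey in the QlikView model
    CalendarDate date not null,
    Year         int,
    Month        int,
    Week         int
);

-- The link table ties every fact key to a DateKey
create table LinkTable
(
    DateKey  int not null,          -- joins to Calendar
    FactKey  int not null,          -- joins to the fact tables
    FactType varchar(20)            -- e.g. 'Created', 'Converted', 'Lost'
);

-- Each fact table carries the same FactKey, generated from a hash that includes its date
create table ConvertedFacts
(
    FactKey int not null,
    Amount  decimal(12,2)
);

Selecting a date in the calendar then filters every fact table through the link table, using normal selections instead of set analysis.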

Having experienced this same issue, what I would recommend is creating what I call a Key Table: it keeps the relationships intact and you don't have to use set analysis as much. Just make sure you include a table with all possible dates as one of the child tables, with a %DateKey like littlegreen suggested.

Search one table (Encounters Table) for all records - Insert Totals into a Providers Table
The Encounters table lists every hospital encounter over time. Each Encounter record has a Provider (each Provider also appears once as a record in the Providers table - 229 providers at present).
The encounter table lists the date and the provider, so that one provider may have 15 to 20 encounter records for that date.
In the Providers table I have been trying to get counts in different fields (formula fields) that list the number of encounters for each provider in a time frame. Examples of the time frame are the current week, the past week, last month, two months ago, etc. So the Providers table would primarily have a key field of Provider, which relates to the Provider in the Encounters table.
Now the software used is Quickbase, which is a cloud-based database used by the hospital I work for.
I have been unable to use a one-to-many relationship to pull the Provider from the Encounters table into the Providers table. The column generated is blank. So when I try to pull encounter counts into formula number fields like current_week, last_week, last_month, etc., the counts just say 0.
The Quickbase help tells me that I need to populate the related Provider fields so the names appear, but I'm thinking I can't just use a file import (i.e. a CSV file) to fill in the Encounters table, since it has thousands of records and there are only 229 providers.
Queries can be done, and knowing that SQL handles this type of work, it just seems that may be the best way to handle this. My SQL isn't that good, but I think I could write a formula that handles this with two tables. I just am not certain how to fill the Encounter Count fields.
This is a bit of a hurdle, and perhaps because I'm using a proprietary software package (Quickbase) it may not be possible.
I do know that queries are used in Quickbase, so the logic of a SQL statement should work in the dialog box used to write formulas in QB.
Any help would be appreciated.
If you only need to see the # of Encounters per Provider within a time frame, I'd try to make it work with a Summary Field in Quickbase and apply a filter for a date range (I'm assuming there is an existing relationship between Providers and Encounters where Providers is the parent).
You can get some good tips on how to configure that field in the Quickbase Community https://community.quickbase.com/home
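For what it's worth, the summary field is effectively computing a grouped, date-filtered count. In SQL terms it would look something like the sketch below (the table and column names are assumptions, the date literals are placeholders, and Quickbase does not expose raw SQL, so this is only for comparison):

-- Encounters per provider for one time frame (here: a single example week)
select p.provider_id,
       p.provider_name,
       count(e.encounter_id) as encounters_this_week
from Providers p
     left join Encounters e
            on e.provider_id    = p.provider_id
           and e.encounter_date >= '2024-01-01'
           and e.encounter_date <  '2024-01-08'
group by p.provider_id, p.provider_name;

The left join means providers with no encounters in the range still show up with a count of 0, which is what the formula fields should display.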

Custom user defined database fields, what is the best solution?

To keep this as short as possible I'm going to use an example.
So let's say I have a simple database that has the following tables:
company - ( "idcompany", "name", "createdOn" )
user - ( "iduser", "idcompany", "name", "dob", "createdOn" )
event - ( "idevent", "idcompany", "name", "description", "date", "createdOn" )
Many users can be linked to a single company, as well as multiple events, and many events can be linked to a single company. All companies, users and events have the columns shown above in common. However, what if I wanted to give my customers the ability to add custom fields to both their users and their events, for any unique extra information they wish to store? These extra fields would be on a company-wide basis, not on a per-record basis (so a company adding a custom field to their users would add it to all of their users, not just one specific user). The custom fields also need to be searchable and reportable, ideally automatically with some sort of report wizard. Considering the database is expected to have lots of traffic as well as lots of custom fields, what is the best solution for this?
My current research and findings in possible solutions:
To have generic placeholder columns such as "custom1", "custom2" etc.
** This is not viable as there will eventually be too many custom columns and there will be too many NULL values stored in the database
To have three tables per current table, e.g. user, user-custom-field, and user-custom-field-value. The user table stays the same; the user-custom-field table contains the information about the new field, such as its name, data type, etc.; and the user-custom-field-value table contains the value for the custom field.
** This one is more of a contender, if it were not for its complexity and table-size implications. I think it will be impossible to avoid a user-custom-field table if I want to automatically report on these fields, as I will have to store the information on how to report on them there. However, in order to pull almost any data you would have to do a million joins on the user-custom-field-value table, and you are now storing column data as rows, which in a database expected to have a lot of traffic as well as a lot of custom fields would soon cause a problem.
Create a new user and event table for each new company that is added to the system, removing the company id from within those tables and instead using it in the table name (e.g. user56, 56 being the company id). Then allow the user to trigger DB commands that add the new custom columns to the tables, giving them the power to decide whether a column has a default value, auto-increments, etc.
** Every time I have seen this solution it has instantly been shut down by people saying it would be unmanageable, as you would eventually get thousands of tables. However, nobody really explains what they mean by unmanageable. Firstly, as far as my understanding goes, more tables is actually more efficient and produces faster search times, as the tables are much smaller. Secondly, yes, I understand that making any common table changes would be difficult, but all you would have to do is run a script that changes the tables for each company. Finally, I actually see benefits to this method, as it would separate company data, making it impossible for one company to accidentally access another's data via a potential bug, plus it would potentially give the ability to back up and restore company data individually. If someone could elaborate on why this is perceived as a bad idea, it would be appreciated.
Convert fully or partially to a NoSQL database.
** Honestly I have no experience with schemaless databases and don't really know how dynamic user-defined fields on a per-record basis would work (although I know it's possible). If someone could explain the implications of the switch, or the differences in queries and potential benefits, that would be appreciated.
Create a JSON column in each table that requires extra fields. Then add the extra fields into that JSON object.
** The issue I have with this solution is that it is nearly impossible to filter data via the custom columns. You would not be able to report on these columns and until you have received and processed them you don't really know what is in them.
Finally, if anyone has a solution not mentioned above, or any thoughts or disagreements on any of my notes, please tell me, as this is all I have been able to find or figure out for myself.
A typical solution is to have a JSON (or XML) column that contains the user-defined fields. This would be an additional column in each table.
This is the most flexible. It allows:
New fields to be created at any time.
No modification to the existing table to do so.
Supports any reasonable type of field, including types not readily available in SQL (e.g. arrays).
On the downside,
There is no validation of the fields.
Some databases support JSON but do not support indexes on them.
JSON is not "known" to the database for things like foreign key constraints and table definitions.
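As a rough illustration of what this looks like in practice (PostgreSQL syntax assumed; the column and key names below are invented):

-- One extra JSON column per table that needs user-defined fields
-- ("user" is quoted because it is a reserved word)
alter table "user" add column custom_fields jsonb not null default '{}'::jsonb;

-- Store whatever extra fields a company has defined
update "user"
set    custom_fields = '{"employee_number": "E-1042", "shirt_size": "M"}'
where  iduser = 1;

-- Where the database supports it, an index makes the JSON filterable
create index idx_user_custom_fields on "user" using gin (custom_fields);

select iduser, name
from   "user"
where  custom_fields ->> 'shirt_size' = 'M';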

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these custom fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, apart from the fact that he is only looking for solutions in SQL databases.
I simplified the problem a bit, since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc. on this data, so it is required that this data be stored in columns of the appropriate type.
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server's INFORMATION_SCHEMA views let you inspect table schemas at runtime, and ordinary DDL executed as dynamic SQL lets you create and modify them. This has full type checking, constraints, transactions, calculations, and everything you need already built in; don't reinvent it.
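A minimal sketch of that approach, assuming SQL Server and an invented column name coming from the admin UI:

-- Add a user-defined column via dynamic SQL (the name and type would come from the admin UI)
declare @col sysname = N'allergy_notes';
declare @sql nvarchar(max) =
    N'alter table dbo.exams add ' + quotename(@col) + N' nvarchar(200) null;';
exec sp_executesql @sql;

-- Later, discover which custom columns exist
select COLUMN_NAME, DATA_TYPE
from   INFORMATION_SCHEMA.COLUMNS
where  TABLE_SCHEMA = 'dbo' and TABLE_NAME = 'exams';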
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
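A sketch of the attribute table and a pivot view built over it with LEFT JOINs (the layout and the custom field names here are assumptions; in practice the view would be regenerated with dynamic SQL whenever the set of custom fields changes):

create table exam_attribute
(
    exam_id         int not null references exams(id),
    attribute_name  varchar(64) not null,
    attribute_value varchar(255),
    primary key (exam_id, attribute_name)
);

create view exam_with_custom as
select e.id,
       e.patient_id,
       e.temperature,
       bo.attribute_value as blood_oxygen,
       sm.attribute_value as smoker
from   exams e
       left join exam_attribute bo
              on bo.exam_id = e.id and bo.attribute_name = 'blood_oxygen'
       left join exam_attribute sm
              on sm.exam_id = e.id and sm.attribute_name = 'smoker';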
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar), with another column identifying the measurement type. My reasoning is that it will presumably come from the UI as a string, and casting it to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
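A rough sketch of that layout and of the cast at filter time (the names are invented, and the plain CAST assumes the values were validated on input, as the question says they are):

create table custom_measurement
(
    exam_id           int not null,
    measurement_name  varchar(64)  not null,
    measurement_type  varchar(16)  not null,   -- 'float', 'int', 'string', ...
    measurement_value varchar(255) not null    -- everything persisted as text
);

-- Filtering by a numeric measurement requires casting the string back
select exam_id
from   custom_measurement
where  measurement_name = 'temperature'
  and  measurement_type = 'float'
  and  cast(measurement_value as decimal(6,2)) > 38.0;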
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns: a 'table' string column and a 'column' string column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
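A sketch of that layout (the names are invented and Drupal's real field tables carry more bookkeeping columns, but the shape is the same):

-- One row per custom field, recording which table holds its values
create table field_schema
(
    field_name  varchar(64) primary key,
    field_table varchar(64) not null,
    field_type  varchar(16) not null
);

-- One narrow table per field: just the owning row's id and the value
create table field_data_blood_oxygen
(
    exam_id int primary key,
    value   decimal(5,2)
);

-- Reading a custom value is never more than one join away
select e.id, e.patient_id, f.value as blood_oxygen
from   exams e
       left join field_data_blood_oxygen f on f.exam_id = e.id;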
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of setup and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
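Sketched out with the fingerCount example (the column names and types are assumptions, and ALTER TABLE syntax varies slightly between databases):

create table custom_exam_schema
(
    name     varchar(64) primary key,
    type     varchar(16) not null,
    minValue decimal(12,2),
    maxValue decimal(12,2)
);

create table custom_exam_data
(
    exam_id int primary key references exams(id)
    -- custom columns are added here as they are defined
);

-- The admin defines a new custom field...
insert into custom_exam_schema (name, type, minValue, maxValue)
values ('fingerCount', 'number', 0, 10);

-- ...and the matching column is added to the data table
alter table custom_exam_data add fingerCount int;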
Let's say that your friend's database has to store data values from multiple sources, such as demographic values, diagnoses, interventions, physionomic values, physiologic exam values, hospitalisation values, etc.
He might also have to define choices. Let's say his database is missing race and the unit staff need the race of the patient (some diseases are more likely in certain races); they might want to use a drop-down with several choices.
I would propose using another table that would hold these choices, or you could just use a "Custom_field_choices" table, which at some point is exactly the same thing with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units (10 kg or 10 pounds)?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
The custom table would be my solution; modifying each table would end up being too risky.
I would store those custom fields in a table where each record holds one (dataType, dataValue, dataUnit) triple per row. So there would be a one-to-many relation from one sample to the data. You can also create a table to record all the kinds of custom types you would use. For example:
create table DataType
(
    id int primary key,
    name varchar(100) not null unique,
    description text,
    uri varchar(255)          -- can be used for an ONTOLOGY
);

create table DataRecord
(
    id int primary key,
    sample_id int not null,   -- reference to the sample
    dataType_id int not null, -- references DataType
    value varchar(100),       -- the value as a string
    unit varchar(50)          -- g, mg/ml, etc.; could also be a link to a table describing the units, just like DataType
);

How would you make a Temporal Many-to-Many Relationship in SQL?

How would you represent a temporal many-to-many relation in SQL? Under non-temporal circumstances one would use a junction table (aka link/bridge/map) to connect the two sides.
Is adding temporal tracking as simple as including ValidStart and ValidEnd columns on the junction table? If you have done this, what issues (if any) did you run into? Is there a better method for keeping track of changes over time in this kind of relation?
If it helps at all, in my case I'm specifically using SQL 2008 and the temporal data is not bitemporal as I'm only tracking valid time.
I am working on a project (for some years now) that uses both temporal data and temporal many-to-many relations. Each table has ValidFrom and ValidUntil columns (storing dates only).
First you have to define the semantics of the Valid* columns, i.e. whether ValidUntil is included or excluded from the validity range. You also need to specify whether NULL dates are valid and what their meaning is.
Next you need a couple of functions, such as dbo.Overlaps2() and dbo.Overlaps3() which receive 2 and 3 date ranges respectively, and return 1 if the date ranges overlap and 0 otherwise.
On top of that, I defined views for the many-to-many relationships with dbo.Overlaps3(...) = 1.
One further point is to have a set of functions which calculate the effective validity range based on dates in 2 or 3 related tables.
Recently I had to add functionality to allow a user to display all available data, or only currently valid data. I save this setting in a users table, associate the SPID to the user when opening a connection, and filter the records in another set of views.
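A minimal sketch of the junction table and an overlap test, assuming SQL Server, date-only validity, and an inclusive ValidFrom with an exclusive ValidUntil (the entity names are placeholders, not the original project's schema):

create table EmployeeProject
(
    EmployeeId int  not null,
    ProjectId  int  not null,
    ValidFrom  date not null,
    ValidUntil date not null,                 -- exclusive upper bound
    primary key (EmployeeId, ProjectId, ValidFrom)
);

-- Two half-open ranges [a1, a2) and [b1, b2) overlap when a1 < b2 and b1 < a2;
-- that is the test an Overlaps2-style function would wrap.
create view CurrentEmployeeProject as
select EmployeeId, ProjectId, ValidFrom, ValidUntil
from   EmployeeProject
where  ValidFrom  <= cast(getdate() as date)
  and  ValidUntil >  cast(getdate() as date);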

SQL - Table Design - DateCreated and DateUpdated columns

For my application there are several entity classes, User, Customer, Post, and so on
I'm about to design the database and I want to store the date when the entities were created and updated. This is where it gets tricky. Sure, one option is to add created_timestamp and update_timestamp columns to each of the entity tables, but isn't that redundant?
Another possibility could be to create a log table that stores this information, and it could be made to keep track of updates for any entity.
Any thoughts? I'm leaning towards implementing the latter.
The single-log-table-for-all-tables approach has two main problems that I can think of:
The design of the log table will (probably) constrain the design of all the other tables. Most likely the log table would have one column named TableName and then another column named PKValue (which would store the primary key value for the record you're logging). If some of your tables have compound primary keys (i.e. more than one column), then the design of your log table would have to account for this (probably by having columns like PKValue1, PKValue2 etc.).
If this is a web application of some sort, then the user identity that would be available from a trigger would be the application's account, instead of the ID of the web app user (which is most likely what you really want to store in your CreatedBy field). This would only help you distinguish between records created by your web app code and records created otherwise.
CreatedDate and ModifiedDate columns aren't redundant just because they're defined in each table. I would stick with that approach and put insert and update triggers on each table to populate those columns. If I also needed to record the end-user who made the change, I would skip the triggers and populate the timestamp and user fields from my application code.
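A sketch of that per-table setup in SQL Server syntax (the table and trigger names here are just examples):

create table Post
(
    PostId       int identity primary key,
    Title        nvarchar(200) not null,
    CreatedDate  datetime not null constraint DF_Post_Created  default (getdate()),
    ModifiedDate datetime not null constraint DF_Post_Modified default (getdate())
);
go

-- Keep ModifiedDate current on every update
create trigger trg_Post_Modified on Post after update
as
begin
    update p
    set    ModifiedDate = getdate()
    from   Post p
           join inserted i on i.PostId = p.PostId;
end;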
I do the latter, with a "log" or "events" table. In my experience, the "updated" timestamp becomes frustrating pretty quickly, because a lot of the time you find yourself in a fix where you want more than just the very latest update time.
How often will you need to include the created/updated timestamps in your presentation layer? If the answer is anything more than "once in a great great while", I think you would be better served by having those columns in each table.
On a project I worked on a couple of years ago, we implemented triggers which updated what we called an audit table (it stored basic information about the changes being made, one audit table per table). This included modified date (and last modified).
They were only applied to key tables (not joins or reference data tables).
This removed a lot of the normal frustration of having to account for LastCreated & LastModified fields, but introduced the annoyance of keeping the triggers up to date.
In the end the trigger/audit table design worked well and all we had to remember was to remove and reapply the triggers before ETL(!).
It's for a web based CMS I work on. The creation and last updated dates will be displayed on most pages and there will be lists for the last created (and updated) pages. The admin interface will also use this information.