Calculated Column Editor - sql

I am sorry for the lengthy question, but it has to be described precisely if it is to be answered.
I am building a schema-and-data application in SQL Server and .NET winforms.
Table ItemType holds the type of items, table ItemTypeColumn holds the columns for each type and finally, table ItemData holds all the data of the application.
An example of these would be:
<ul>
<li>ItemType: Customer, Customer Category, etc</li>
<li>ItemColumn: Customer Name, Description etc</li>
<li>ItemData: John Doe, International, etc</li>
</ul>
This leads to a very interesting chain of data retrieval. For instance:
<ul>
<li>Customer X is item ID 100</li>
<li>Category Y is item ID 60</li>
<li>To indicate that Customer X is of category Y (to point towards it), we need to find the row that combines ItemID=100 and ItemColumnID=[whatever the ID of the relevant ItemColumn is] and then update its Data field with the ID of Category Y (60).</li>
</ul>
I retrieve the data with a Select statement as follows:
SELECT * FROM (
    SELECT ItemData.ItemID,
           IC.ItemTypeID,
           MAX(CASE WHEN ItemData.ItemColumnID = 28
                    THEN ItemData.Data ELSE NULL END) AS "Name",
           MAX(CASE WHEN ItemData.ItemColumnID = 32
                    THEN ItemData.Data ELSE NULL END) AS "Code"
    FROM ItemData
    INNER JOIN (
        SELECT *
        FROM ItemColumns
        WHERE ItemTypeID = 7) AS IC
        ON ItemData.ItemColumnID = IC.ID
    GROUP BY ItemData.ItemID, IC.ItemTypeID) AS table1
INNER JOIN Item ON Item.ID = table1.ItemID
This works like a charm.
Now, I want to let my users create calculated columns, with the ability to select:
<ul>
<li>A column from this type item</li>
<li>A column from a parent type item</li>
<li>A value from a child type item </li>
<li>The specific data found in a specific row id</li>
</ul>
Here's an example:
<ul>
<li>Item Type "Country" has 1 field: Name</li>
<li>Item Type "City" has 3 fields: Name, Population, Country (pointing to parent country)</li>
</ul>
I'd like to let the user create new columns that hold calculated data (as opposed to data-entry columns such as "name" or "description").
I have managed to create an expression builder and a parser that actually works. Taking the above example into account, you can create a column Urban Population in type Country that brings the sum of column Population of all "children" Cities for each country. This, I accomplished with User Defined Functions, and virtual columns which I call in the select statement.
Here's the problem (finally): if I want to create a calculated column that references another calculated column, say Rural Population, which would show the Country Population (a numeric field) minus Urban Population (the sum of the Population field of the "child" cities), it will not work, because a column newly "created" in the select list cannot be referenced by its peers in the same select list.
I sense that I am on the wrong track, in general, concerning the calculated fields. Are there any best practices to follow? Is my approach wrong? Is there a workaround for the calculation-in-calculation error?
Thank you in advance - again, I am sorry for the length of this question.

The approach you are taking in your database design is the name/value pair (key/value pair) pattern. In fact, you go one step further and also store the aggregate data in the same name/value structure. This approach looks very flexible and is very tempting; however, an RDBMS is not architected for it.
It works well with small data sets but not with large ones, so designs of this kind mostly fail in production within a few months of release.
Please refer to the following link:
http://geekswithblogs.net/darrengosbell/archive/2006/03/12/KVPsInDatabaseDesign.aspx
It is generally recommended to design RDBMS tables with explicit column names and definitions.
In my experience, the number and types of aggregations needed are usually not very complex or varied, and most of them can be determined in advance. That is why I would discourage this approach.
In scenarios where the flexibility is absolutely needed, one can use external tools such as Tableau, R, or Python with pandas/IPython. I understand that users have to learn these tools before they can use them.
It is really commendable, Nassosk, that you achieved everything you describe in your original post; I would be interested in seeing your code :-).
It looks like you are designing a database over a database :-)
Thank you

From what I've read here and elsewhere, my question is like asking "how should I go about it if I want to jump off a skyscraper": most people will tell you not to jump, instead of giving you their 5 cents :)
In any case, and since I've put a lot of work into it up to now, I thought I might fail with dignity and go all the way, so here's the answer:
Since my select statement actually returns a virtual table (it transposes the data), it seemed totally plausible to add all the related tables to a DataSet, create the relations among them on the fly in the DataSet (yes, #user3851404, I am building a database over a database; it's quite rewarding though), and set the Expression property of each DataColumn that should display derived data to my formula.
It actually works as expected. I will not comment on performance because I haven't stress-tested it yet, but whatever the outcome regarding performance, it seems that this is the only workaround.
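For comparison, a common SQL-side workaround for the "calculation-in-calculation" limitation is to compute each level of calculated columns in its own derived table, so that the outer level can refer to the inner level's aliases. This is a different technique from the DataSet/Expression approach described above, not the author's method. The sketch below uses Python's `sqlite3` with plain `Country` and `City` tables and made-up sample rows purely for illustration; the real application would apply the same layering to its pivoted ItemData query.

```python
import sqlite3

# Hypothetical, simplified schema: ordinary tables instead of the ItemData layout.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Country (ID INTEGER PRIMARY KEY, Name TEXT, Population INTEGER);
    CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, Population INTEGER,
                       CountryID INTEGER REFERENCES Country(ID));
    INSERT INTO Country VALUES (1, 'Freedonia', 1000), (2, 'Sylvania', 500);
    INSERT INTO City VALUES (1, 'A', 300, 1), (2, 'B', 200, 1), (3, 'C', 100, 2);
""")

rows = db.execute("""
    -- Level 2: may reference UrbanPopulation because it is a real column
    -- of the derived table "level1", not a peer alias.
    SELECT Name, Population, UrbanPopulation,
           Population - UrbanPopulation AS RuralPopulation
    FROM (
        -- Level 1: calculated column built only from stored data.
        SELECT c.Name, c.Population,
               COALESCE(SUM(ci.Population), 0) AS UrbanPopulation
        FROM Country AS c
        LEFT JOIN City AS ci ON ci.CountryID = c.ID
        GROUP BY c.ID, c.Name, c.Population
    ) AS level1
    ORDER BY Name
""").fetchall()

for row in rows:
    print(row)
```

An expression builder could generate one such nesting level per dependency depth, ordering the calculated columns so that each one only references columns from an inner level.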

Related

How to create inheritance in SQLite

I have to create an SQLite DB that models a survey with some ordered content; this content can be a question, an image or a simple text field (just like Google Forms). Each content doesn't have anything to do with the other, except questions which can have a list of attached images to them.
What would be the best way to model this situation? I thought about creating a "Survey" table and a "Content" table that has only an integer ID, with that same ID then "duplicated" into each subtype table ("Question", "Image" or "TextField"). But then I think I would have to insert a row into Content and a row into the specific table (Question, Image or TextField) every time I add new content. I don't think that would be a big problem, but if there is a way to model this better, I would like some advice.
Your approach is an example of 'table per type' as defined in this answer.
Conceptually, you're saying "there are 3 kinds of content, and the one thing they share is their relationship with a survey, as captured in the content table". You might include an explicit type indicator in that table alongside the ID; this will make your code a little more explicit. You may also find you need to capture metadata such as "status" or "date_entered" that is common across subtypes.
By including a type indicator column, you make it easy to find out what the type of a content item is. So, if you want to show the summary of a question, you could do something like
select content_type, count(*)
from content
where question_id = ?
group by content_type
to show the number and type of responses.
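As a rough illustration of the table-per-type layout with a type indicator, here is a minimal sketch using Python's `sqlite3`. All table and column names (`survey`, `content`, `position`, `prompt`, `path`, `body`) and the helper `add_content` are assumptions made for this example, and the summary query filters by a hypothetical `survey_id` column rather than the `question_id` used above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Shared parent row: ordering, owning survey and the type indicator.
    CREATE TABLE content (
        id INTEGER PRIMARY KEY,
        survey_id INTEGER NOT NULL REFERENCES survey(id),
        position INTEGER NOT NULL,
        content_type TEXT NOT NULL
            CHECK (content_type IN ('question', 'image', 'text_field'))
    );
    -- One table per subtype, sharing the parent's ID.
    CREATE TABLE question (id INTEGER PRIMARY KEY REFERENCES content(id), prompt TEXT NOT NULL);
    CREATE TABLE image (id INTEGER PRIMARY KEY REFERENCES content(id), path TEXT NOT NULL);
    CREATE TABLE text_field (id INTEGER PRIMARY KEY REFERENCES content(id), body TEXT NOT NULL);
""")

# Whitelist mapping each subtype table to its single payload column.
SUBTYPE_COLUMNS = {"question": "prompt", "image": "path", "text_field": "body"}

def add_content(survey_id, position, content_type, value):
    """Insert the parent row and the subtype row in one transaction."""
    column = SUBTYPE_COLUMNS[content_type]  # KeyError for unknown types
    with db:
        cur = db.execute(
            "INSERT INTO content (survey_id, position, content_type) VALUES (?, ?, ?)",
            (survey_id, position, content_type))
        db.execute(
            f"INSERT INTO {content_type} (id, {column}) VALUES (?, ?)",
            (cur.lastrowid, value))
    return cur.lastrowid

db.execute("INSERT INTO survey (id, title) VALUES (1, 'Demo survey')")
add_content(1, 1, "text_field", "Welcome")
add_content(1, 2, "question", "How old are you?")
add_content(1, 3, "question", "Where do you live?")

summary = db.execute(
    "SELECT content_type, COUNT(*) FROM content WHERE survey_id = ? "
    "GROUP BY content_type ORDER BY content_type", (1,)).fetchall()
print(summary)
```

Wrapping the two inserts in a single helper keeps the "insert twice" cost the question worries about in one place.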

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,10
Argentina,Vegetable,2011,Yield,90
Argentina,Vegetable,2011,Production,9
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
There is also a very small country lookup table that lists every country the source data can contain.
I want the output data to always have a fixed number of columns (so that the reporting/visualization tool doesn't receive a varying number of columns with each day's source data ingestion, depending on how many distinct countries are present).
So I somehow have to join the source data with the country_lookup csv and populate all those columns with a default value of F. Every country column is binary, with T or F as the only possible values.
The original dataset above has to be converted into the one below:
Data (I've left the Amount field of the Derived Yield rows as an unevaluated expression rather than a computed number, so that you can match it against the formula):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Formula for populating the Amount field of Derived Yield rows:
Derived Amount = (sum of Harvested over all countries flagged T, grouped by Year and Commodity) divided by (sum of Production over all countries flagged T, grouped by Year and Commodity).
So the target is to build every combination of the countries in the source and, for each combination, compute the sums of the respective Harvested and Production values and divide one by the other. In the actual scenario a country can have more than one commodity, but that should not matter, because the amounts are summed per commodity and year.
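The formula above can be sketched in plain Python before porting it to Spark. This is only an illustration of the naive "all combinations" approach under stated assumptions: the function name `derived_rows` is made up, the lookup list is taken from the header of the expected output, only the 2010 rows are used, and every country is assumed to have both a Harvested and a Production row for each commodity and year.

```python
from itertools import combinations

# 2010 Harvested/Production rows from the source data above.
rows = [
    ("US", "Vegetable", 2010, "Harvested", 2.44),
    ("US", "Vegetable", 2010, "Production", 6.48),
    ("Argentina", "Vegetable", 2010, "Harvested", 15.2),
    ("Argentina", "Vegetable", 2010, "Production", 2.66),
    ("Bhutan", "Vegetable", 2010, "Harvested", 7),
    ("Bhutan", "Vegetable", 2010, "Production", 5),
]
# Country columns, in the order of the expected output header.
lookup = ["US", "Argentina", "Bhutan", "India", "Nepal", "Bangladesh"]

def derived_rows(rows, lookup):
    # (commodity, year) -> country -> {"Harvested": x, "Production": y}
    totals = {}
    for country, commodity, year, kind, amount in rows:
        if kind in ("Harvested", "Production"):
            totals.setdefault((commodity, year), {}).setdefault(country, {})[kind] = amount

    out = []
    for (commodity, year), by_country in totals.items():
        present = [c for c in lookup if c in by_country]
        # Every combination of two or more countries present in this group.
        for size in range(2, len(present) + 1):
            for combo in combinations(present, size):
                harvested = sum(by_country[c]["Harvested"] for c in combo)
                production = sum(by_country[c]["Production"] for c in combo)
                flags = ["T" if c in combo else "F" for c in lookup]
                # One output row per member country, as in the expected data.
                for country in combo:
                    out.append((country, commodity, year, "Derived Yield",
                                harvested / production, *flags))
    return out

result = derived_rows(rows, lookup)
for row in result:
    print(row)
```

Note that the number of combinations grows as 2^n in the number of countries present, which is why this approach is only practical for a small country list.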
Note: The users in the frontend can select any combination of countries. The sole reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can compute sums over the selected column filters but doesn't yet support calculations on those derived sums. Hence, the calculation for every combination of countries has to be pre-populated (a very naive approach) so that it is available in the report whatever countries the user selects.
Also if you've any better approach (than the above naive approach mentioned in note) to solve this problem, you are most welcome to guide me. I've also posted a question on the same problem without writing my expected approach for experts to show me the path on how we can solve this kind of a problem better than this naive approach. If you want to help solve it with some other technique, you're most welcome, here is the link to that question.
Any help shall be greatly acknowledged.

SQL: Not Like produces different results than what would be Like's opposite

So, I'm practicing for an exam (high school level), and although we have never been taught SQL, it is necessary to know a little when handling MS Access.
The task is to select the IDs of the areas whose names do not correspond to the name of a town they belong to.
In the solution was the following example:
SELECT name
FROM area
WHERE id not in (SELECT areaid
FROM area, town, conn
WHERE town.id = conn.townid
AND area.id = conn.areaid AND
area.name like "*"+town.name+"*");
It would be the same with INNER JOINs; I'm just stating that because Access makes the connection between tables that way.
It works perfectly (well, it was in the solution), but what I don't get is why we need the "not in" part and why we can't just use "not like" instead of "like" and write the query in one step.
I rewrote it that way (without the "not in" part) and it gave a totally different result. When I replaced "like" with "not like", the result wasn't the opposite of the original, just a bunch of mixed data. Why? How does that work? Please, could someone explain?
Edit (after best answer): This was more of a theoretical question about how SQL queries work; it did not need a concrete solution, but an explanation of the process. (Because of this, I feel the sql tag still belongs here.)
One thing that would create a difference is to consider this example
areaid areaname townname
1 AA AA
1 AA BB
So your first query would exclude both records from the outcome. Because the inner query would identify areaid =1 to be among those to be excluded. Therefore, both records will not show up in the output.
Using not like, however, would exclude the first record and return the second record, because the first record fails the not like condition (its area name matches the town name) while the second record satisfies it.
In other words, the first query would exclude any area (and corresponding records) that have at least one townname that is like an areaname. The second approach, would exclude only incidences where areaname is like townname but doesn't necessarily exclude all records for that area.
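The difference can be reproduced with the two-row example above. The sketch below uses Python's `sqlite3` rather than Access, so the wildcard is `%` and string concatenation is `||` instead of `*` and `+`; the table layout follows the question, and the sample IDs are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE area (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE town (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conn (areaid INTEGER, townid INTEGER);
    -- Area 1 ('AA') contains two towns: 'AA' and 'BB'.
    INSERT INTO area VALUES (1, 'AA');
    INSERT INTO town VALUES (1, 'AA'), (2, 'BB');
    INSERT INTO conn VALUES (1, 1), (1, 2);
""")

# Original shape: drop every area that has at least one matching town.
not_in = db.execute("""
    SELECT name FROM area
    WHERE id NOT IN (SELECT area.id
                     FROM area, town, conn
                     WHERE town.id = conn.townid
                       AND area.id = conn.areaid
                       AND area.name LIKE '%' || town.name || '%')
""").fetchall()

# One-step rewrite: keeps every (area, town) pair that does not match.
not_like = db.execute("""
    SELECT area.name, town.name
    FROM area, town, conn
    WHERE town.id = conn.townid
      AND area.id = conn.areaid
      AND area.name NOT LIKE '%' || town.name || '%'
""").fetchall()

print(not_in)    # area AA is excluded entirely
print(not_like)  # area AA still appears, paired with town BB
```

The first query works per area, while the second filters individual area/town pairs, so an area with one matching and one non-matching town survives the second query.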
The reason is because there can be more than one town in an area, right?
So if there is a town in an area that has a similar name, then that area will be found in the LIKE subquery.
If there is another town in the SAME AREA that does not have a similar name, then that area will ALSO be found in the NOT LIKE subquery.
So the same area can be returned whether you use LIKE or NOT LIKE, because of the one-to-many relationship to towns.
Make sense?
It depends on what the relationships between area, town and conn are. If you have many towns in an area, you will see the area duplicated in your row set. Your original query simply asks "Show me the areas that are not in the following list". Your one-step query asks a different question: "Show me the 'conns' between towns and areas where the area name is not like the town name":
SELECT area.name
FROM area, town, conn
WHERE area.id = conn.areaid
AND town.id = conn.townid
AND area.name NOT like "*"+town.name+"*";

Few questions about Grails' createCriteria

I read about createCriteria and am interested in how it works and how it can be used to provide values for a dropdown box.
Say I have a table in the database, the Resource table, which I have defined in a domain class called Resource.groovy. The Resource table has a total of 10 columns, 5 of which are:
Material Id
Material description
Resource
Resource Id
Product Code
Using createCriteria, I can write something like a query to return the items that I want:
def resList = Resource.createCriteria().list {
and {
eq('resource', resourceInstance)
ne('materialId', '-')
}
}
In the above, I want to get the rows where resource equals resourceInstance and materialId is not equal to '-'.
I want to use the data returned by createCriteria above in my form, where some of the columns should feed my select dropdown. Below is the code I used for the select dropdown.
<g:select id="resourceId" name="resourceId"
from="${resList}"
disabled="${actionName != 'show' ? false : true}" />
How do I make the dropdown show only the values taken from the Product Code column? I believe the list created by createCriteria returns all 10 columns for the rows that match the criteria, but I only want to use the Product Code values in my dropdown.
How do I customize the data if, in one of the select dropdowns in my form, I want to show the values as "Resource Id - Resource Description"? The values combine more than one column in a single select dropdown, and I don't know how to do that.
I read that HQL and GORM queries are better ways of fetching data from a table than createCriteria. Is this true?
Thanks
First of all refer to the document for using select in Grails. To answer all questions:
Yes, the list to select from in the dropdown can be customized. In this case it should be something like from="${resList*.productCode}"
Yes, this can be customized as well with something like
from="${resList.collect { \"${it.resourceId} - ${it.resourceDesc}\" } }"
It depends. If there are associations involved in a domain, then using Criteria will lead to eager fetches, which might not be required, whereas HQL gives you the flexibility to tailor the query as needed. With the latest versions of Grails those boundaries have been reduced a lot. Using DetachedCriteria, where queries and so on is recommended wherever possible. So it is a matter of mixing and matching according to the scenario under consideration.

How to design a database table structure for storing and retrieving search statistics?

I'm developing a website with a custom search function and I want to collect statistics on what the users search for.
It is not a full text search of the website content, but rather a search for companies with search modes like:
by company name
by area code
by provided services
...
How to design the database for storing statistics about the searches?
What information is most relevant and how should I query for them?
Well, it's dependent on how the different search modes work, but generally I would say that a table with 3 columns would work:
SearchType SearchValue Count
Whenever someone does a search, say they search for "Company Name: Initech", first query to see if there are any rows in the table with SearchType = "Company Name" (or whatever enum/id value you've given this search type) and SearchValue = "Initech". If there is already a row for this, UPDATE the row by incrementing the Count column. If there is not already a row for this search, insert a new one with a Count of 1.
By doing this, you'll have a fair amount of flexibility for querying it later. You can figure out what the most popular searches for each type are:
... WHERE SearchType = 'Some Search Type' ORDER BY Count DESC
You can figure out the most popular search types:
... GROUP BY SearchType ORDER BY SUM(Count) DESC
Etc.
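The update-or-insert logic and the two reporting queries described above could look roughly like this. It is a sketch using Python's `sqlite3`; the table name `SearchStats` and the helper `record_search` are assumptions for illustration, and `Count` is quoted in the SQL only to keep it clearly distinct from the `COUNT()` function.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE SearchStats (
        SearchType  TEXT NOT NULL,
        SearchValue TEXT NOT NULL,
        "Count"     INTEGER NOT NULL,
        PRIMARY KEY (SearchType, SearchValue)
    )
""")

def record_search(search_type, search_value):
    """Increment the counter for this search, inserting the row if missing."""
    with db:  # both statements run in one transaction
        cur = db.execute(
            'UPDATE SearchStats SET "Count" = "Count" + 1 '
            "WHERE SearchType = ? AND SearchValue = ?",
            (search_type, search_value))
        if cur.rowcount == 0:
            db.execute(
                'INSERT INTO SearchStats (SearchType, SearchValue, "Count") '
                "VALUES (?, ?, 1)",
                (search_type, search_value))

record_search("Company Name", "Initech")
record_search("Company Name", "Initech")
record_search("Company Name", "Globex")
record_search("Area Code", "555")

# Most popular searches within one search type.
top_values = db.execute(
    'SELECT SearchValue, "Count" FROM SearchStats '
    'WHERE SearchType = ? ORDER BY "Count" DESC',
    ("Company Name",)).fetchall()

# Most popular search types overall.
top_types = db.execute(
    'SELECT SearchType, SUM("Count") FROM SearchStats '
    'GROUP BY SearchType ORDER BY SUM("Count") DESC').fetchall()

print(top_values)
print(top_types)
```

On a busy site the update-then-insert pair should run inside a transaction (as here) or be replaced by the database's native upsert statement, so that two concurrent first-time searches do not both try to insert.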
This is a pretty general question but here's what I would do:
Option 1
If you want to strictly separate all three search types, then create a table for each. For company name, you could simply store the CompanyID (assuming your website is maintaining a list of companies) and a search count. For area code, store the area code and a search count. If the area code doesn't exist, insert it. Provided services is most dependent on your setup. The most general way would be to store key words and a search count, again inserting if not already there.
Optionally, you could store search date information as well. As an example, you'd have a table with Provided Services Keyword and a unique ID. You'd have another table with an FK to that ID and a SearchDate. That way you could make sense of the data over time while minimizing storage.
Option 2
Treat all searches the same. One table with a Keyword column and a count column, incorporating SearchDate if needed.
You may want to check this:
http://www.microsoft.com/sqlserver/2005/en/us/express-starter-schemas.aspx