Oracle SQL - Give each row in a result set a unique identifier depending on a value in a column

I have a result set, returned from a view, that lists items and the country they originated from. An example would be:
ID | Description | Country_Name
------------------------------------
1 | Item 1 | United Kingdom
2 | Item 2 | France
3 | Item 3 | United Kingdom
4 | Item 4 | France
5 | Item 5 | France
6 | Item 6 | Germany
I want to query this data, returning all columns (there are more columns than ID, Description and Country_Name; I've omitted them for brevity's sake), with an extra column added that holds a number unique to each distinct value of Country_Name:
ID | Description | Country_Name | Country_Relation
---------------------------------------------------------
1 | Item 1 | United Kingdom | 1
2 | Item 2 | France | 2
3 | Item 3 | United Kingdom | 1
4 | Item 4 | France | 2
5 | Item 5 | France | 2
6 | Item 6 | Germany | 3
The reason behind this is that we're using a Jasper report and need to show each item with an asterisk (or in this case a number) next to it, referring to some details about the country. So the report would look like this:
Desc. Country
Item 1 United Kingdom(1)
Item 2 France(2)
Item 3 United Kingdom(1)
Item 4 France(2)
Item 5 France(2)
Item 6 Germany(3)
And then further down the report would be a field stating:
1: Here are some details about the UK
2: Here are some details about France
3: Here are some details about Germany
I'm having difficulty generating a unique number to go alongside each country, starting at one each time the report is run, incrementing it when a new country is found and keeping track of where to assign it. I would hazard a guess at using temporary tables to do such a thing, but I feel that's overkill.
Question
Is this kind of thing possible in Oracle SQL or am I attempting to do something that is rather large and cumbersome?
Are there better ways of doing this inside of a Jasper report?
At the moment, I'm looking at just having the subtext underneath each individual item and repeating the same information several times, just to avoid this situation, rather than having them aggregated and having the subtext once. It's not clean, but it saves this rather odd hassle.

You are looking for dense_rank():
select t.*, dense_rank() over (order by country_name) as country_relation
from t;
I don't know if this can be done inside Jasper reports. However, it is easy enough to set up a view to handle this in Oracle.
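Note that ordering by country_name numbers the countries alphabetically (France 1, Germany 2, United Kingdom 3), which differs from the numbering in the question's example. If the numbers should follow the order in which each country first appears, one possible sketch is below. It assumes the view is called t, as in the query above, and that ID defines the order of appearance; first_id is a helper name introduced only for this illustration.

```sql
-- Sketch only: number countries in order of first appearance.
-- Assumes ID defines "first appearance"; first_id is a hypothetical helper column.
select x.id,
       x.description,
       x.country_name,
       dense_rank() over (order by x.first_id) as country_relation
from (
    -- Smallest ID per country marks where that country first shows up.
    select t.*, min(id) over (partition by country_name) as first_id
    from t
) x
order by x.id;
```

The inner query is needed because one analytic function cannot be nested directly inside another.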

Related

Best data structure for finding tags of nested locations

Somebody pointed out that my data structure architecture sucks.
The task
I have a locations table which stores the name of a location. Then I have a tags table which stores information about those locations. The locations have a hierarchy which I want to use to get all tags.
Example
Locations:
USA <- California <- San Francisco <- Mission St
Tags:
USA: English
California: Sunny
California: West coast
San Francisco: Sea side
Mission St: Cable car station
If somebody requests information about Mission St, I want to deliver all of its tags and those of its ancestors (["English", "Sunny", "West coast", "Sea side", "Cable car station"]). If I request all tags of California, the answer would be ["English", "Sunny", "West coast"].
I'm looking for the best read performance! I don't care about write performance. This data is not changed very often. And I don't care about table sizes either. If I need more or larger tables to solve this quicker so be it.
The tables
So currently I'm thinking about setting up these tables:
locations
id | name
---|--------------
1 | USA
2 | California
3 | San Francisco
4 | Mission St
tags
id | location_id | name
---|-------------|------------------
1 | 1 | English
2 | 2 | Sunny
3 | 2 | West coast
4 | 3 | Sea side
5 | 4 | Cable car station
ancestors
I added a position field to store the hierarchy.
| id | location_id | ancestor_id | position |
|----|-------------|-------------|----------|
| 1 | 2 | 1 | 1 |
| 2 | 3 | 2 | 1 |
| 3 | 3 | 1 | 2 |
| 4 | 4 | 3 | 1 |
| 5 | 4 | 2 | 2 |
| 6 | 4 | 1 | 3 |
Question
Is this a good solution to the problem or is there a better one? I want to select, as fast as possible, all tags of any given location including all the tags of its ancestors. I'm using a PostgreSQL database but I think this is a pure SQL architecture problem.
Your problem seems to consist of two challenges. The most interesting is "how do I store hierarchies in a relational database". There are lots of answers to that - the one you've proposed is the most common.
There's an alternative called "nested set" which is faster for reading (in your example, finding all locations within a particular hierarchy would be "between x and y").
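As a rough sketch of the nested set idea: each location stores a left and a right boundary, and an ancestor is any row whose interval encloses the node's interval. The lft and rgt columns are hypothetical additions to the locations table and are not part of the question's schema.

```sql
-- Sketch only: lft/rgt are hypothetical nested-set columns on locations,
-- e.g. USA (1, 8), California (2, 7), San Francisco (3, 6), Mission St (4, 5).
SELECT t.name
FROM locations node
JOIN locations anc
  ON anc.lft <= node.lft
 AND anc.rgt >= node.rgt          -- anc encloses node (or is node itself)
JOIN tags t
  ON t.location_id = anc.id
WHERE node.id = 4                 -- Mission St
ORDER BY anc.lft, t.id;           -- outermost ancestor first
```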
Postgres has dedicated support for hierarchies, such as recursive queries (WITH RECURSIVE) and the ltree extension; I'd assume these would also provide great performance.
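One built-in option in Postgres is a recursive query. The sketch below assumes a hypothetical parent_id column on locations (NULL for the root) instead of the ancestors table; it walks from the requested location up to the root and collects the tags on the way.

```sql
-- Sketch only: parent_id is a hypothetical column, not part of the question's schema.
WITH RECURSIVE path AS (
    SELECT id, parent_id, 0 AS depth
    FROM locations
    WHERE id = 4                          -- start at Mission St
    UNION ALL
    SELECT l.id, l.parent_id, p.depth + 1
    FROM locations l
    JOIN path p ON l.id = p.parent_id     -- step up to the parent
)
SELECT t.name
FROM path
JOIN tags t ON t.location_id = path.id
ORDER BY path.depth DESC, t.id;           -- root's tags first
```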
The second part of your question is "given a path in my hierarchy, retrieve all matching tags". The easiest option is to join to the tags table as you suggest.
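With the tables from the question, that join could look roughly like the following. The derived table named path is introduced only for this sketch; it adds the location itself at position 0 to its rows from ancestors.

```sql
-- Sketch only: all tags of location 4 (Mission St) and of its ancestors.
SELECT t.name
FROM tags t
JOIN (
    SELECT 4 AS location_id, 0 AS position   -- the location itself
    UNION ALL
    SELECT ancestor_id, position
    FROM ancestors
    WHERE location_id = 4                     -- its ancestors
) path ON path.location_id = t.location_id
ORDER BY path.position DESC, t.id;            -- farthest ancestor first
```

An index on ancestors (location_id) and one on tags (location_id) would be the obvious first things to try for read performance.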
The final aspect is "should you denormalize/precalculate". I usually recommend building and optimizing the "normalized" solution and only denormalize when you need to.
If you want to deliver all tags for a particular location, then I would recommend replicating the data and storing the tags in a tags array on a row for each location.
You say that the locations don't change very much. So, I would simply batch create the entire table, when any underlying data changes.
Modifying the data in situ is rather problematic. A single update could end up affecting a zillion different rows -- consider a tag change on USA. Recalculating the entire table is going to be more efficient.
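A batch rebuild along these lines might look like the sketch below in PostgreSQL. The table name location_all_tags is made up for the example; the other tables and columns are the ones from the question.

```sql
-- Sketch only: rebuild the denormalized read table from scratch.
-- location_all_tags is a hypothetical name.
DROP TABLE IF EXISTS location_all_tags;

CREATE TABLE location_all_tags AS
SELECT path.location_id,
       array_agg(t.name ORDER BY path.position DESC, t.id) AS tags
FROM (
    -- every location paired with itself ...
    SELECT id AS location_id, id AS source_id, 0 AS position
    FROM locations
    UNION ALL
    -- ... and with each of its ancestors
    SELECT location_id, ancestor_id, position
    FROM ancestors
) path
JOIN tags t ON t.location_id = path.source_id
GROUP BY path.location_id;

-- Reading is then a single-row lookup:
-- SELECT tags FROM location_all_tags WHERE location_id = 4;
```

Locations with no tags on themselves or on any ancestor get no row in this version; a LEFT JOIN would be needed if they should appear with an empty result.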
If you need to search on the tags as well as return them, then I would go for a more traditional structure of a table with two important columns, location and tag. Then you can have indexes on both (location) and (tag) to facilitate searching in either direction.
If write performance is not crucial, I would go for denormalization of the database. That means you use the above structure for your write operations and fill a table for your read operations with a trigger, or with an async job if you are afraid of triggers. Then the read performance is optimal, but you have to invest a bit more in the write logic.
Using the above structure for read operations is indeed not a smart solution, because you don't know how deep the tree can get.

Google Data Studio: Average Number of Sessions based on selected country values

Let's say that for the dimension country I have 4 values and for each of the 4 I have the respective number of Sessions. E.g.
+---------+----------+
| country | Sessions |
+---------+----------+
| Italy | 10 |
| France | 12 |
| Germany | 14 |
| Spain | 16 |
+---------+----------+
I want to compute and output in a scorecard the average number of Sessions, only for those specific countries. So, in the example, the output should be 13.
I tried with the following calculated field but it doesn't work:
Sessions * AVG(CASE
WHEN REGEXP_MATCH(country, '^Italy|France|Germany|Spain.*') THEN 1
ELSE 0 END)
Create a filter based on the country dimension using the Matching RegEx operator.
Then apply this filter to a scorecard with the metric Sessions. In the Data tab on the right-hand side, you should be able to click on a little pencil icon for the metric and choose average instead of sum as the aggregation method.
You may not have this option if you're using the GA connector. In this case, there should be an Average Session metric in the data source.
One way it can be achieved is by using a Filter Control and a Calculated Field:
1) Filter Control
Add the component with the Dimension set to Country and then add a Default Selection (a comma separated list of the required countries):
Italy, France, Germany, Spain
2) Calculated Field (Scorecard)
Sessions / COUNT_DISTINCT(Country)
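As a quick check against the example data, assuming the scorecard aggregates Sessions as a sum over the four selected countries:

```latex
\[
\frac{\mathrm{SUM}(\text{Sessions})}{\mathrm{COUNT\_DISTINCT}(\text{Country})}
  = \frac{10 + 12 + 14 + 16}{4}
  = \frac{52}{4}
  = 13
\]
```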

Table with arbitrary number of columns for rows

I want to store the amounts of the groceries that I have at home and be shown dishes that I can cook based on that.
I want to be as precise as possible; that's why I want to differentiate e.g. between frozen and fresh berries or low fat and normal milk. But I have problems modeling this. I have the following tables:
Products
id | name
---|-----------
1 | milk
2 | strawberry
Type
id | name
---|--------
1 | frozen
2 | low fat
3 | organic
4 | fresh
Amounts
id | Products.id | Type.id | amount
---|-------------|---------|-------
1 | 1 | 2 | 1l
2 | 1 | 3 | 0.5l
3 | 2 | 1 | 500g
4 | 2 | 4 | 250g
So far I've no problem, but how would I store a product that has two or more types (e.g. low fat, organic milk)?
Things I could do:
Create organic milk and low fat organic milk as separate products and drop the Type table
Remove the Type.id foreign key and put all types for a product as JSON or as CSV in a new types column of Amounts
Limit to n types per product and add n Type{1..n}.id columns to Amounts, setting a column to NULL if the product has fewer than n types
But before I do this, I would love to know if there are better solutions.
Distinguish between a "product", which is generic, and an "item" which is a specific instance of the product in the inventory.
Then "tag" the items with properties. This is essentially an entity-attribute-value relationship. Rows might be:
item | product id | tag
-----|------------|--------
A | 1 [milk] | organic
A | 1 | low-fat
B | 1 | organic
The idea is to separate the notion of the "product", which is generic, and "item" which is a specific instance that might have additional tags.
Then an "amounts" table:
item | Amount | Unit
-----|--------|------
A | 1 | liter
The tags could of course be in another table, to ensure consistency within and across products. For instance, you might ask: What can I make with the organic items (products) that I have on hand?
Product <-> Type is a many to many relationship. Every product can have many types and there can be many products that are of a certain type. To model this in a relational database, you need a so called mapping table. That table would have two columns: product_id and type_id. Then you insert a row for every relation between a product and a type.
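A mapping table of that kind could be sketched as follows. The table names products, types and product_types are chosen for this example only; the ids and names are taken from the question's sample data.

```sql
-- Sketch only: many-to-many mapping between products and types.
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE types (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE product_types (
    product_id INTEGER NOT NULL REFERENCES products (id),
    type_id    INTEGER NOT NULL REFERENCES types (id),
    PRIMARY KEY (product_id, type_id)   -- each pairing stored once
);

INSERT INTO products (id, name) VALUES (1, 'milk');
INSERT INTO types (id, name) VALUES (2, 'low fat'), (3, 'organic');

-- Milk that is both low fat and organic: one row per relation.
INSERT INTO product_types (product_id, type_id) VALUES (1, 2), (1, 3);

-- All types of a product.
SELECT p.name AS product, t.name AS type
FROM product_types pt
JOIN products p ON p.id = pt.product_id
JOIN types t ON t.id = pt.type_id
WHERE p.id = 1;
```

If the types should describe an individual stock entry rather than the product as a whole, the same pattern applies with the id of the Amounts row in place of product_id.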

Database Table: Quantity or redundancy

I'm building a database which has a lot of items for a bike shop. This bike shop has many of the same items, such as 100 wheels of size 4 and color 'red'. My question is:
Is it better to add a 'Quantity' field to the entity set and put all similar items in one entity (example 1), or is it better to have an entity for each item (example 2)?
Example 1:
id | color | size | quantity
1 | red | 4 | 100
Example 2:
id | color | size
1 | red | 4
2 | red | 4
3 | red | 4
etc.
The first (a quantity field), unless you have a reason to track individual items, for example by serial number. Even then you may go with example 1 and use a separate column.
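A minimal sketch of example 1 with a quantity column is shown below. The table name items is made up, and the size column is called item_size here because SIZE is a reserved word in some databases.

```sql
-- Sketch only: one row per kind of item, with a stock count.
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    color     VARCHAR(20) NOT NULL,
    item_size INTEGER NOT NULL,
    quantity  INTEGER NOT NULL CHECK (quantity >= 0),
    UNIQUE (color, item_size)            -- no duplicate rows for the same item
);

INSERT INTO items (id, color, item_size, quantity)
VALUES (1, 'red', 4, 100);

-- Selling one red size-4 wheel is a single update.
UPDATE items
SET quantity = quantity - 1
WHERE color = 'red'
  AND item_size = 4
  AND quantity > 0;
```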
Generally: get a copy of The Data Model Resource Book, Vol. 1. It has a ton of discussions about standard business data problems, among them an inventory system. You will learn a lot.

Extract data from one field into another in mysql

I have an old table which has a column like this
1 | McDonalds (Main Street)
2 | McDonalds (1st Ave)
3 | The Goose
4 | BurgerKing (Central Gardes)
...
I want to match the venues like ' %(%)' and then extract the content in the brackets to a second field
to result in
1 | McDonalds | Main Street
2 | McDonalds | 1st Ave
3 | The Goose | NULL
4 | BurgerKing| Central Gardes
...
How would one go about this?
MySQL provides string functions for finding characters and extracting substrings. You can also use control flow functions to handle the cases where the venue is not present.
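For instance, a sketch using only built-in functions might look like this. It assumes the table is called venues with columns id and name, and that the branch is always the last bracketed part at the end of the name; the aliases venue and branch are illustrative.

```sql
-- Sketch only: split "Name (Branch)" into two columns with built-in functions.
SELECT id,
       -- Text before the first "(" with the trailing space trimmed,
       -- or the whole name when there are no brackets.
       IF(name LIKE '%(%)',
          TRIM(SUBSTRING_INDEX(name, '(', 1)),
          name) AS venue,
       -- Text between the last "(" and the following ")", else NULL.
       IF(name LIKE '%(%)',
          SUBSTRING_INDEX(SUBSTRING_INDEX(name, '(', -1), ')', 1),
          NULL) AS branch
FROM venues;
```

The same expressions can be used in an UPDATE to fill a new column once the results of the SELECT look right.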
I installed these user defined functions
http://www.mysqludf.org/lib_mysqludf_preg/
Then I could select the "branches" via
SELECT `id`, `name`, preg_capture('/.*?\\((.*)\\)/',`name`,1) AS branch FROM `venues`