How would SQL tables store map legends?

I am not asking which map legend classification to choose. Assuming I have already picked one, how should I store it properly in SQL tables? I'm asking because I ended up needing to store more information than I expected, and I hope someone can verify my design.
Consider the three cases below:
1. Numeric, range
2. Numeric, single value
3. Alphabetic, single value
To store them properly, so that I can classify values in real time (i.e. do the coloring), I need to store the operators as well (less than [lt], greater than [gt], equal [eq], etc.).
So I end up with 2 tables:
LegendSetup
Column:
1. LegendKey (int)
2. Type (varchar)
3. Min (decimal)
4. Max (decimal)
LegendValueSetup
Column:
1. ValueKey (int) //AutoIncrement PK
2. LegendKey (int) //FK
3. RangeNumeric (decimal) //numeric
4. RangeAlpha (varchar) //alphabet
5. RangeOperator (varchar) //eq, lt, gt
6. RangeShow (varchar) //for display purpose
7. HexColor (varchar)
Is that how it normally works?

Since you only need ranges and equality checks, I suggest you don't store explicit operators but use ranges instead, like this:
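CREATE TABLE LegendValueSetup (
    ValueAlpha   varchar(50),    -- exact text value (case 3)
    ValueNumeric decimal(18,4),  -- exact numeric value (case 2)
    LowerNumeric decimal(18,4),  -- range lower bound (case 1)
    UpperNumeric decimal(18,4),  -- range upper bound (case 1)
    Color        varchar(20)
    -- other columns (keys etc.)
);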
For your cases 2 and 3, you would simply store the exact value in the ValueNumeric or ValueAlpha column, respectively.
For case 1, you have to decide whether LowerNumeric is an inclusive bound and UpperNumeric an exclusive one, or vice versa. Then you can store the legend like this:
LowerNumeric | UpperNumeric | Color
NULL | 100 | Black
100 | 200 | Red
200 | 400 | Orange
400 | 600 | Yellow
600 | NULL | Green
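If you now have a value and want to get its color, you only have to run the query below. Because a comparison with NULL is never true, the open-ended bounds need an explicit IS NULL check:
SELECT Color FROM LegendValueSetup
WHERE (LowerNumeric IS NULL OR LowerNumeric <= $value)
  AND (UpperNumeric IS NULL OR UpperNumeric > $value)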
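To sanity-check the range lookup, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you use. It loads the example legend above and assumes lower bounds are inclusive, upper bounds exclusive, and that NULL means the range is open on that side; `color_for` is a hypothetical helper name. Note the `IS NULL` checks in the WHERE clause: a plain comparison against NULL is never true, so without them the Black and Green rows could never match.

```python
import sqlite3

# In-memory stand-in for LegendValueSetup with the example legend.
# Assumption: LowerNumeric inclusive, UpperNumeric exclusive, NULL = unbounded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LegendValueSetup ("
    " LowerNumeric NUMERIC, UpperNumeric NUMERIC, Color TEXT)"
)
conn.executemany(
    "INSERT INTO LegendValueSetup VALUES (?, ?, ?)",
    [
        (None, 100, "Black"),
        (100, 200, "Red"),
        (200, 400, "Orange"),
        (400, 600, "Yellow"),
        (600, None, "Green"),
    ],
)

# NULL-aware range lookup: an open-ended bound always matches.
LOOKUP = """
SELECT Color FROM LegendValueSetup
WHERE (LowerNumeric IS NULL OR LowerNumeric <= :v)
  AND (UpperNumeric IS NULL OR UpperNumeric > :v)
"""


def color_for(value):
    """Return the legend color for a numeric value, or None if no range matches."""
    row = conn.execute(LOOKUP, {"v": value}).fetchone()
    return row[0] if row else None


for v in (50, 100, 250, 600, 1000):
    print(v, color_for(v))
```

With inclusive lower bounds, a boundary value such as 100 lands in the higher range (Red), so each value matches exactly one row.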
You can of course go further and split the table into three different ones, one for each legend type.

Related

Count the number of occurrences of a key in a JSON object - IMPALA/HIVE

I have a column in my Impala table of type map<string,string>. I have extracted it into two separate columns, key and value, as shown below.
The value column contains a JSON object. I need to find out whether a particular key occurs more than once in that value for a given id.
id | key | value
1 | brm_res | {'abc':'3rr','vbg':'r45','abc':'5rr'}
2 | brm_res | {'abc':'3rr','vbg':'r45','bgh':'5rr'}
3 | brm_res | {'abc':'3rr','vbg':'r45','tyu':'5rr'}
4 | brm_res | {'abc':'3rr','vbg':'r45','yuo':'5rr'}
As shown in the example above, for id = 1 and key brm_res, the key abc appears twice in the value column. How can I find such cases?
Please guide. Thanks in advance.
You can take the character length of the original string, remove all occurrences of the key, and calculate the difference in length; dividing that difference by the length of the key gives the number of occurrences. Something like this (not tested):
SELECT id,
       (char_length(value) - char_length(replace(value, "'abc':", ''))) DIV char_length("'abc':") AS abc_count
FROM your_table;  -- placeholder table name
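The length-difference trick is easy to check outside Impala. This Python sketch applies the same arithmetic to the sample values (with the stray quote before r45 removed); `count_occurrences` is a hypothetical name. Searching for `'abc':`, including the quotes and colon, rather than just `abc`, avoids counting a value or a longer key that merely contains those letters.

```python
def count_occurrences(text: str, needle: str) -> int:
    """Count non-overlapping occurrences of `needle` with the length-difference trick:
    (len(text) - len(text with needle removed)) // len(needle)."""
    if not needle:
        raise ValueError("needle must be non-empty")
    stripped = text.replace(needle, "")  # remove every occurrence
    return (len(text) - len(stripped)) // len(needle)


# Sample value for id = 1 from the question.
value = "{'abc':'3rr','vbg':'r45','abc':'5rr'}"
print(count_occurrences(value, "'abc':"))  # 2
```

A result greater than 1 flags an id whose value repeats the key.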

Changes to table not designed for SQL

I need to make some changes to an enormous CSV file based on a different file. I chose to do it in SQL, but after further consideration I am not sure how to proceed.
The first table is a list of contracts. Its columns represent the segments the contract belongs to and the products that can be linked to it (see the example below).
Here, contract no. 1234 belongs to segments X1 and Y2. Product number 1 is not linked to it, but product number 2 is. The product originally ends on 1 January 2030.
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |01/01/2030 |..
The second file lists combinations of segments together with an indication of how the "date" columns should be adjusted. The example below means: if prod_2 is linked to a contract that belongs to segments X1 and Y2, end prod_2 this year. I need to apply this result to table 1.
prod_no|segment_1|segment_2|result
prod_2 | X1 | Y2 | end the product on anniversary
So the result I need is:
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |30/11/2019 |..
In the original files I have around 600k rows and more than 300 columns (meaning around 100 different products) in table 1 and around 800 possible combinations of segments in table 2.
The algorithm I need to implement, in very general terms:
for x = 1 to 100
    IF product_x = YES THEN date_product_x = date + "search for result in table 2"
Is there a reasonable way to change the "date_product_x" columns based on the second table, or would it be better to find a different solution?
Thanks a lot!
I can only give you a general approach, because the information in your question is general (for example, why does "end the product on anniversary" translate to "30/11/2019"? It's not explained in the question, so I assume you're going to be able to handle that part of the logic).
You can approach this by using an UNPIVOT on Table 1 to get a structure like:
cont_n | segment1 | segment2 | product_number | product_date
You will UNPIVOT ... FOR date_product_1 through date_product_100. You'll either have to type out all 100 column names or use dynamic SQL to build the whole statement.
You'll do some string manipulation to grab the "x" portion of "date_product_x", and turn it into "prod_x", and then you can join to the second table on the two segment columns and the "prod_x" column, get the result column value, and do whatever rules you're doing to get the value you want for date_product_x.
Finally, you take that result, and PIVOT it back to the one-row-per-contract form, and JOIN it to your original table to UPDATE the date_product_x columns.
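The unpivot / join / pivot-back idea can be sketched in plain Python to make the data flow concrete. This is a hypothetical illustration, not the SQL itself: it assumes product columns are consistently named product_x and date_product_x (the question's header mixes prod_1 and product_2), that the date column holds day and month as DDMM, and that "end the product on anniversary" means that day and month in a given year, which is only one reading of the 30/11/2019 example. `unpivot`, `apply_rule` and `adjust` are made-up names.

```python
# Hypothetical sketch of the unpivot -> join -> pivot-back approach.
# Assumptions (not from the question): columns are named product_x /
# date_product_x, "date" is DDMM, and "end the product on anniversary"
# means that day/month in the given year.

N_PRODUCTS = 2  # about 100 in the real file


def unpivot(contract):
    """Yield one (cont_n, segment_1, segment_2, product_number, product_date) row per linked product."""
    for x in range(1, N_PRODUCTS + 1):
        if contract.get(f"product_{x}") == "YES":
            yield (contract["cont_n"], contract["segment_1"], contract["segment_2"],
                   f"prod_{x}", contract[f"date_product_{x}"])


def apply_rule(result, contract, year):
    """Hypothetical rule engine: only the single rule from the question is handled."""
    if result == "end the product on anniversary":
        ddmm = contract["date"]
        return f"{ddmm[:2]}/{ddmm[2:]}/{year}"
    raise ValueError(f"unknown rule: {result!r}")


def adjust(contracts, rules, year):
    """Update date_product_x in place for every linked product that has a matching rule."""
    by_id = {c["cont_n"]: c for c in contracts}
    updates = {}  # (cont_n, column) -> new date: the "pivot back" step
    long_rows = [row for c in contracts for row in unpivot(c)]
    for cont_n, seg1, seg2, prod, _old_date in long_rows:
        result = rules.get((prod, seg1, seg2))  # join on product + both segments
        if result is not None:
            column = "date_" + prod.replace("prod_", "product_")
            updates[(cont_n, column)] = apply_rule(result, by_id[cont_n], year)
    for (cont_n, column), new_date in updates.items():  # the final UPDATE
        by_id[cont_n][column] = new_date
    return contracts


contracts = [{"cont_n": 1234, "date": "3011", "segment_1": "X1", "segment_2": "Y2",
              "product_1": "", "date_product_1": "",
              "product_2": "YES", "date_product_2": "01/01/2030"}]
rules = {("prod_2", "X1", "Y2"): "end the product on anniversary"}
adjust(contracts, rules, 2019)
print(contracts[0]["date_product_2"])  # 30/11/2019
```

Collecting the updates in a dictionary before writing them back mirrors the answer's last step: the long-form results are gathered first and only then applied to the wide, one-row-per-contract table.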

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
|KEY | DATA |
---------------------
|V9864653 | 180288 |
|V9864653 | 22189 |
|V9864811 | 11464 |
|V9864811 | 12688 |
The trouble is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
|KEY | DATA |
---------------------
|V9864811 | 11464 |
|V9864653 | 180288 |
For some reason it returns the MIN value for V9864811 but not for V9864653. If I reverse it and use MAX instead of MIN, it returns the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric or a VARCHAR?
If it's a VARCHAR, MIN and MAX compare the values as strings, character by character, so '180288' is "smaller" than '22189' because '1' sorts before '2'. That is exactly what you're seeing. It's better to change the column to a numeric type if you can, for example an integer if the values are always whole numbers.
If you can't, you can cast the values to an integer in the query, but on a big table or with lots of transactions that will be slow and is not ideal. It's bad practice to do that when you could simply change the data type!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
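The effect is easy to reproduce. This sketch uses Python's sqlite3 module as a stand-in for DB2 and stores DATA as text (the situation the answer suspects); the table and values mirror the question's, and KEY is quoted only to be safe as an identifier. String comparison gives 180288 as the minimum for V9864653, while casting to an integer gives 22189.

```python
import sqlite3

# SQLite stand-in for dbo.MileageImport, with DATA stored as text.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MileageImport ("KEY" TEXT, DATA TEXT)')
conn.executemany(
    "INSERT INTO MileageImport VALUES (?, ?)",
    [("V9864653", "180288"), ("V9864653", "22189"),
     ("V9864811", "11464"), ("V9864811", "12688")],
)

GROUPED = 'FROM MileageImport GROUP BY "KEY" HAVING COUNT(*) > 1'

# String comparison: '180288' < '22189' because '1' < '2'.
text_min = dict(conn.execute(f'SELECT "KEY", MIN(DATA) {GROUPED}'))
# Numeric comparison after casting, as in the answer's query.
num_min = dict(conn.execute(f'SELECT "KEY", MIN(CAST(DATA AS INTEGER)) {GROUPED}'))

print(text_min["V9864653"], num_min["V9864653"])  # 180288 22189
```

For V9864811 both comparisons happen to agree ('11464' sorts before '12688' either way), which is why only one of the two keys looked wrong.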

SQL: Find the highest number when it's in nvarchar format containing special characters

I need to pull the record containing the highest value; specifically, I only need the value from that field. The problem is that the column is nvarchar and contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
-------------------
1 | .1.1.
2 | .10.1.1
3 | .5.1.7
4 | .4.1.
9 | .10.1.2
15 | .5.1.4
Because the values in column 2 are strings, they are sorted character by character (lexicographically) rather than numerically. So instead of returning the PK of the row containing ".10.1.2" as the highest value, I get the PK of the row containing ".5.1.7".
I attempted to write some functions to do this, but what I've written looks far more complicated than it should be. Does anyone have something simple, or are complicated functions the only way?
I want to make clear that I'm trying to grab the PK of the record that contains the highest Column 2 value.
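This query might return what you want for your sample data:
SELECT MAX(CAST(REPLACE(Column2, '.', '') AS INT)) FROM YourTable;  -- placeholder table name
Note that it returns the number with the dots removed (1012 for .10.1.2) rather than the PK, and that stripping the dots can mis-order values whose parts have different numbers of digits: .1.10.1 becomes 1101, which is larger than 211 from .2.1.1.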
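A more robust ordering splits the value on the dots and compares the parts as a tuple of integers. This minimal Python sketch does that on the sample rows and returns the PK, which is what the question asks for; `version_key` is a hypothetical name, and the comparison happens on the application side rather than in SQL. Doing the same in SQL means splitting the string into its numeric parts, which is why pure-SQL solutions tend to get longer.

```python
# Sample rows from the question: PK -> Column 2 value.
rows = {1: ".1.1.", 2: ".10.1.1", 3: ".5.1.7", 4: ".4.1.", 9: ".10.1.2", 15: ".5.1.4"}


def version_key(value: str) -> tuple:
    """Turn '.10.1.2' into (10, 1, 2); empty parts from leading/trailing dots are skipped."""
    return tuple(int(part) for part in value.split(".") if part)


# PK of the row whose value is highest when compared part by part.
best_pk = max(rows, key=lambda pk: version_key(rows[pk]))
print(best_pk, rows[best_pk])  # 9 .10.1.2
```

Tuples compare element by element, so (2, 1, 1) correctly beats (1, 10, 1), the case where stripping the dots goes wrong.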

Convert any string to an integer

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words is there a hashing function, supported by Oracle, that returns a numeric value and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1
id | sequence_number
-------------------
1 | 1
1 | 2
2 | 1
3 | 1
4 | 2

Table 2
id | sequence_number
-------------------
1 | 2QD44561
1 | 6HH00244
2 | 5DH08133
3 | 7RD03098
4 | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed. In Table 1 it is, I believe, created by a simple sequence, but in Table 2 it has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, constant for a given input (seemingly impossible). For example, I would like '2QD44561' to always return the same number, say 586. It doesn't matter much if two strings return the same number.
To simply convert it to a number, I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But as you can see, this isn't less than 1000 (it isn't even positive).
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something but I've not managed to get a small integer. Is there a way?
How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
This returns a value between 0 and 999, so at most three digits, and the same input always gives the same result. You could also pass the id as the seed (the third argument), I suppose, but I'm not sure that adds much with so few values, or that you'd want it anyway.
What you're describing is a hash function. There are lots of them; SHA-1 is very common, though its 160-bit output still has to be reduced (for example, modulo 1000) to fit in a number(3).
But just FYI, when you say "restrict the size of the integer", understand that you will be mapping an infinite set of strings onto a limited set of values. So while a given string will always map to the same value, it may not be the only string that maps to that value: collisions are unavoidable.
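Both answers boil down to hashing and then reducing the result to a fixed range. This Python sketch shows the idea with SHA-1 from the standard hashlib module followed by modulo 1000; it is not ORA_HASH and will give different numbers, so treat it purely as an illustration. `small_code` is a hypothetical name. Unlike Python's built-in hash(), which is randomized per process for strings, a hashlib digest is stable across runs.

```python
import hashlib


def small_code(s: str, buckets: int = 1000) -> int:
    """Deterministically map a string to 0..buckets-1 via SHA-1, then modulo.
    Illustration only: this is NOT Oracle's ORA_HASH."""
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets


samples = ["2QD44561", "6HH00244", "5DH08133", "7RD03098", "8BF02466"]
codes = {s: small_code(s) for s in samples}
print(codes)  # the same codes on every run, each in 0..999

# Pigeonhole: 1001 distinct strings cannot get 1001 distinct codes in 0..999.
distinct = {small_code(f"S{i:07d}") for i in range(1001)}
print(len(distinct) <= 1000)  # True
```

The last check is the collision point from the answer above made concrete: once there are more inputs than buckets, at least two inputs must share a code.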