SQL: Find highest number if it's in nvarchar format containing special characters - sql

I need to pull the record containing the highest value; specifically, I only need the value from that field. The problem is that the column is in nvarchar format and contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
---+--------------------
1  | .1.1.
2  | .10.1.1
3  | .5.1.7
4  | .4.1.
9  | .10.1.2
15 | .5.1.4
Basically, because the column is text, the items in Column 2 are sorted as strings rather than numerically. So instead of returning the PK for the row containing ".10.1.2" as the highest value, I get the PK for the row that contains ".5.1.7" instead.
I attempted to write some functions to do this, but what I've written looks way more complicated than it should be. Does anyone have something simple, or are complicated functions the only way?
I want to make clear that I'm trying to grab the PK of the record that contains the highest Column 2 value.

This query might return what you desire:
SELECT MAX(CAST(REPLACE(Column2, '.', '') AS INT)) FROM table
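Since the question actually asks for the PK rather than the value itself, a variation on the same idea orders by the converted value and takes the top row (a sketch for SQL Server; MyTable stands in for your real table name):
SELECT TOP 1 PK
FROM MyTable
ORDER BY CAST(REPLACE(Column2, '.', '') AS INT) DESC;
Note that stripping the dots concatenates the digits ('.10.1.2' becomes 1012, '.5.1.7' becomes 517). That happens to order the sample data correctly, but it is not a true segment-by-segment version comparison, so strings like '.1.23.' and '.12.3.' would collide.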

Related

Count the number of occurrences of a key in a JSON object - IMPALA/HIVE

I have a column in my Impala table which is a map<string,string>. I have extracted that column as key and value pairs in two different columns, as shown below.
The value column is a JSON object. I have to find out whether there are multiple occurrences of a particular key in that value column for a particular id.
id | key | value
1 | brm_res | {'abc':'3rr','vbg':'r45','abc':'5rr'}
2 | brm_res | {'abc':'3rr','vbg':'r45','bgh':'5rr'}
3 | brm_res | {'abc':'3rr','vbg':'r45','tyu':'5rr'}
4 | brm_res | {'abc':'3rr','vbg':'r45','yuo':'5rr'}
As shown in the example above, for a particular id (id=1) and key (brm_res), there are 2 entries for the (abc) key in the value column. How can I find this?
Please guide. Thanks in advance.
You can calculate the character length of the initial string, remove all occurrences of the key, calculate the difference in length, and divide by the length of the key; that will be the number of occurrences. Something like this (not tested):
(char_length(value) - char_length(replace(value, "'abc':", ''))) div char_length("'abc':");
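Wrapped in a full statement it might look like this (a sketch, also not tested; brm_table is an assumed table name, and replace() needs a reasonably recent Impala version):
SELECT id,
       (char_length(value) - char_length(replace(value, "'abc':", '')))
         div char_length("'abc':") AS abc_count
FROM brm_table
WHERE key = 'brm_res';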

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was answered somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
| KEY      | DATA   |
---------------------
| V9864653 | 180288 |
| V9864653 | 22189  |
| V9864811 | 11464  |
| V9864811 | 12688  |
What I am having troubles with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
| KEY      | DATA   |
---------------------
| V9864811 | 11464  |
| V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not for V9864653. If I invert that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report this data comes from changes from month to month, so different keys could end up duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric, or a VARCHAR?
If it's character data, MIN and MAX compare the values as strings, character by character, so '180288' sorts before '22189' because '1' < '2'. It's better to change the column to a numeric type if you can; an integer will do if you don't have fractions and it's just whole numbers.
If not, you could cast the values to integers, but if there are lots of transactions or it's a big table that will be slow and not ideal. It's bad practice to do that if you could just change the datatype!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
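Since the end goal is a DELETE, a sketch of that step (DB2 syntax, not tested) keeps the highest mileage per key and removes the lower duplicates:
DELETE FROM dbo.MileageImport m
WHERE CAST(m.DATA AS INT) <
      (SELECT MAX(CAST(d.DATA AS INT))   -- highest mileage for this key
       FROM dbo.MileageImport d
       WHERE d.KEY = m.KEY);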

Is condensing the number of columns in a database beneficial?

Say you want to record three numbers for every Movie record...let's say, :release_year, :box_office, and :budget.
Conventionally, using Rails, you would just add those three attributes to the Movie model and call @movie.release_year, @movie.box_office, and @movie.budget.
Would it save any database space or provide any other benefits to condense all three numbers into one umbrella column?
So when adding the three numbers, it would go something like:
def update
  ...
  @movie.umbrella = params[:movie_release_year] +
    "," + params[:movie_box_office] + "," + params[:movie_budget]
end
So the final @movie.umbrella value would be along the lines of "2015,617293,748273".
And then in the controller, to access the three values, it would be something like:
@umbrella_array = @movie.umbrella.strip.split(',').map(&:strip)
@release_year = @umbrella_array.first
@box_office = @umbrella_array.second
@budget = @umbrella_array.third
This way, it would be the same amount of data (actually a little more, with the extra commas) but stored only in one column. Would this be better in any way than three columns?
There is no benefit in squeezing such attributes into a single column. In fact, following that path will increase the complexity of your code and limit your capabilities.
Here are some of the possible issues you'll face:
You will not be able to add indexes to speed up lookups of records with a specific attribute value, or to sort and filter on it
You will not be able to query a specific attribute value
You will not be able to sort by a specific column value
The values will be stored and represented as Strings, rather than Integers
... and I can continue. There are no advantages, only disadvantages.
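To make the lookup point concrete, compare the two shapes of query (a PostgreSQL sketch; the movies table and column names are assumptions for illustration):
-- Separate columns: typed and indexable
SELECT * FROM movies WHERE release_year >= 2010;
-- Umbrella column: every row must be parsed, and no ordinary index helps
SELECT * FROM movies
WHERE CAST(split_part(umbrella, ',', 1) AS INTEGER) >= 2010;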
I agree with the answer above; as an example, try pg_column_size() to compare the storage sizes:
WITH test(data_txt, data_int, data_date) AS ( VALUES
  ('9999'::TEXT, 9999::INTEGER, '2015-01-01'::DATE),
  ('99999999'::TEXT, 99999999::INTEGER, '2015-02-02'::DATE),
  ('2015-02-02'::TEXT, 99999999::INTEGER, '2015-02-02'::DATE)
)
SELECT pg_column_size(data_txt)  AS txt_size,
       pg_column_size(data_int)  AS int_size,
       pg_column_size(data_date) AS date_size
FROM test;
The result is:
txt_size | int_size | date_size
----------+----------+-----------
5 | 4 | 4
9 | 4 | 4
11 | 4 | 4
(3 rows)

Splitting long column values to manageable size for presenting data neatly

Hi, I was wondering if there is a way to split long column values. In this case I am using SSRS to get the distinct values, with the count of product IDs against each category, into a matrix/pivot table. The problem is that the number of distinct categories makes it a nightmare to make the report look pretty, shall we say. Is there a dynamic way to split the columns into, say, groups of 10 to make the table look nicer and easier to read? I was thinking of using the IN operator with a list of values, but that means managing the data every time a new category gets added. Is there a dynamic way to present the data in the best way possible? There are 135 distinct category values.
Also, I am open to suggestions to make the report nicer if anyone has any thoughts. I am new to SSRS and trying to get to grips with it.
Are your column names coming back from the database under the SubCat field you note in the comments above? If so, I imagine your dataset looks something like this:
Subcat | Logno
---------+---------------
SubCatA | 34
SubCatB | 65
SubCatC | 120
SubCatD | 8
SubCatE | 19
You can edit this so that an index for each individual category is returned as well, using the ROW_NUMBER() function. Add the field
ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
to your query. This will result in the following:
Subcat | LogNo | ColID
-----------+--------------+----------
SubCatA | 34 | 1
SubCatB | 65 | 2
SubCatC | 120 | 3
SubCatD | 8 | 4
SubCatE | 19 | 5
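Put together, the dataset query might look something like this (a sketch; CategoryLog and ProductID are assumed names standing in for your real table and column):
SELECT SubCat,
       COUNT(ProductID) AS LogNo,
       ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
FROM CategoryLog
GROUP BY SubCat;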
Now that there is a numeric identifier for each column, you can perform some logic on it to arrange the columns nicely on the page.
This solution involves a Tablix nested inside a Matrix, nested inside another Matrix, as follows.
First create a Matrix (Matrix1) and set its datasource to your dataset. Set the Row Group properties to group on the following expression, where '4' is the number of columns you wish to display horizontally.
=CInt(Floor((Fields!ColID.Value - 1) / 4))
With four columns, ColID values 1-4 fall into group 0, values 5-8 into group 1, and so on.
Then in the data section of the Matrix (bottom right corner) insert a rectangle, and on this insert a new Matrix (Matrix 2). Remove the leftmost row. Set the column header to be the column name SubCat. This will automatically set the column grouping to be SubCat.
Next, in the data section of Matrix 2, add a new rectangle and add a Tablix to it. Remove the header row, and set it to be one column wide only. Set the data to be the information you wish to display, i.e. LogNo.
Finally, delete the leftmost and topmost rows/columns from Matrix 1 to make it look tidier (note: delete the column/row only, not the associated groups!).
Then when the report is run, the category columns should wrap into rows of four. Note in my example SubCat = ColName and LogNo = NumItems, and I have multiple values per SubCat.
Hopefully you find this helpful. If not, please ask for clarification.

Convert any string to an integer

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words is there a hashing function, supported by Oracle, that returns a numeric value and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1                Table 2
id | sequence_number   id | sequence_number
---+----------------   ---+----------------
1  | 1                 1  | 2QD44561
1  | 2                 1  | 6HH00244
2  | 1                 2  | 5DH08133
3  | 1                 3  | 7RD03098
4  | 2                 4  | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed; in Table 1 it is, I believe, created by a simple sequence but in Table 2 has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to be able to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, constant (seemingly impossible). This means that I would like '2QD44561' to always return 586. It does not matter much if two strings return the same number.
For simply converting to an integer, I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But as you can see, this isn't less than 1000.
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something but I've not managed to get a small integer. Is there a way?
How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
... will produce a maximum of 3 digits. You could also seed it with the id I suppose, but not sure that adds much with so few values, and I'm not sure you'd want that anyway.
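As a quick check of the idea (ora_hash(expr, max_bucket) returns a value between 0 and max_bucket, and the same input always hashes to the same bucket for a given seed):
SELECT ora_hash('2QD44561', 999) FROM dual;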
You are talking about using a hash function. There are lots of solutions out there; SHA-1 is very common.
But just FYI, when you say "restrict the size of the integer", understand that you will then be mapping an infinite set of strings or numbers onto a limited set of values. So while your strings will always map to the same value when they are the same, they will not be the only strings that map to that value.
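A sketch of that approach in Oracle (not tested; STANDARD_HASH is available in 12c and later, and taking a slice of the hex digest modulo 1000 caps the result at three digits):
SELECT MOD(TO_NUMBER(SUBSTR(RAWTOHEX(STANDARD_HASH('2QD44561', 'SHA1')), 1, 7), 'XXXXXXX'), 1000) AS small_hash
FROM dual;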