Count the number of occurrences of a key in a json object - IMPALA/HIVE - sql

I have a column in my Impala Table which is a map<string,string>. I have extracted that column as a key and value pair in 2 different cols as shown below.
The value column is a json object. I have to find out whether there are multiple occurrences of a particular key in that value column for a particular id.
id | key | value
1 | brm_res | {'abc':'3rr','vbg':'r45','abc':'5rr'}
2 | brm_res | {'abc':'3rr','vbg':'r45','bgh':'5rr'}
3 | brm_res | {'abc':'3rr','vbg':'r45','tyu':'5rr'}
4 | brm_res | {'abc':'3rr','vbg':'r45','yuo':'5rr'}
As shown in the example above, for a particular id (id=1) and key (brm_res), there are 2 entries for the 'abc' key in the value column. How can I find this?
Please guide. Thanks in advance.

You can calculate the character length of the initial string, remove all occurrences of the key, calculate the difference in length, and divide by the length of the key; the result is the number of occurrences. Something like this (not tested):
(char_length(value) - char_length(replace(value, "'abc':", ''))) div char_length("'abc':");
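Applied to the sample data above, a minimal sketch of the full query might look like the following (not tested; my_table is a hypothetical table name, and replace()/char_length() assume Impala - on Hive you may need regexp_replace()/length() instead):

-- count how many times the literal key 'abc': occurs in the value column,
-- then keep only the ids where it occurs more than once
SELECT id, abc_count
FROM (
  SELECT id,
         (char_length(`value`) - char_length(replace(`value`, "'abc':", ''))) div char_length("'abc':") AS abc_count
  FROM my_table                -- hypothetical table name
  WHERE `key` = 'brm_res'
) t
WHERE abc_count > 1;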

Related

How can I find the last member of an array in PostgreSQL?
For example, suppose you have an array of numbers, each of which is the id of a course. From the list of your chosen courses, [1,32,4,6], I want to know the last one, which is 6. How can I find it?
You can use the array function array_upper() to get the index of the last element.
Consider:
with t as (select '{1,32,4,6}'::int[] a)
select a[array_upper(a, 1)] last_element from t
Demo on DB Fiddle:
| last_element |
| -----------: |
| 6 |
You can use ARRAY_UPPER to get the upper bound of your array. You can then retrieve the value at that returned index.
SELECT yourcolumn[ARRAY_UPPER(yourcolumn,1)] FROM yourtable;
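If the array uses the default lower bound of 1, array_length() can be used the same way; a small sketch:

-- array_length(a, 1) is the length of the first dimension, which equals
-- the last index when the array starts at the default lower bound of 1
SELECT a[array_length(a, 1)] AS last_element
FROM (SELECT '{1,32,4,6}'::int[] AS a) t;   -- returns 6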

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
|KEY | DATA |
---------------------
|V9864653 | 180288 |
|V9864653 | 22189 |
|V9864811 | 11464 |
|V9864811 | 12688 |
What I am having troubles with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
|KEY | DATA |
---------------------
|V9864811 | 11464 |
|V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not V9864653. If I inverse that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric, or a VARCHAR?
If it is a VARCHAR, MIN and MAX compare the values as strings, which is why '180288' sorts before '22189'. It's better to change the column to a numeric type if you can - an integer, say, if you aren't dealing with fractions and the values are just round numbers.
If not, then you can cast the values to an integer, but if there are lots of transactions or it's a big table, that will be slow and not ideal. It's bad practice to cast on every query if you could just change the data type:
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
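Since the end goal is a DELETE that removes the lower of the duplicated mileage entries, one possible sketch (not tested; it assumes DB2 accepts the correlation name here and that every DATA value casts cleanly to an integer):

-- remove every row whose mileage is lower than the maximum for the same KEY,
-- leaving only the highest row for each duplicated KEY
DELETE FROM dbo.MileageImport m
WHERE CAST(m.DATA AS INT) <
      (SELECT MAX(CAST(d.DATA AS INT))
       FROM dbo.MileageImport d
       WHERE d.KEY = m.KEY);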

SQL: Find highest number if its in nvarchar format containing special characters

I need to pull the record containing the highest value, specifically I only need the value from that field. The problem is that the column is nvarchar format that contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
-------------------
1 | .1.1.
2 | .10.1.1
3 | .5.1.7
4 | .4.1.
9 | .10.1.2
15 | .5.1.4
Basically, because the values are compared as text, the items in column 2 sort as strings. So instead of returning the PK for the row containing ".10.1.2" as the highest value, I get the PK for the row that contains ".5.1.7" instead.
I attempted to write some functions to do this, but what I've written looks way more complicated than it should be. Does anyone have something simple, or are complicated functions the only way?
I want to make clear that I'm trying to grab the PK of the record that contains the highest Column 2 value.
This query might return what you desire
SELECT MAX(CAST(REPLACE(Column2, '.', '') as INT)) FROM table
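Since the question asks for the PK of that row rather than the value itself, a sketch assuming SQL Server (suggested by the nvarchar type) and the same dot-stripping idea as above (yourtable is a hypothetical table name):

-- order the rows numerically by the dot-stripped value and keep only the top one
SELECT TOP 1 PK
FROM yourtable                -- hypothetical table name
ORDER BY CAST(REPLACE(Column2, '.', '') AS INT) DESC;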

Convert any string to an integer

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words is there a hashing function, supported by Oracle, that returns a numeric value and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1                  Table 2
id | sequence_number     id | sequence_number
--------------------     --------------------
1  | 1                   1  | 2QD44561
1  | 2                   1  | 6HH00244
2  | 1                   2  | 5DH08133
3  | 1                   3  | 7RD03098
4  | 2                   4  | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed; in Table 1 it is, I believe, created by a simple sequence, but in Table 2 it has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to be able to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, is constant (seemingly impossible). This means that I would like '2QD44561' to always return 586. It does not matter much if two strings return the same number.
For simply converting to a number, I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But, as you can see, this isn't less than 1000.
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something but I've not managed to get a small integer. Is there a way?
How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
... will produce a maximum of 3 digits. You could also seed it with the id I suppose, but not sure that adds much with so few values, and I'm not sure you'd want that anyway.
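For example, run against the literal from the question (a minimal sketch; whatever bucket it returns, it is the same every time for the same input):

-- ora_hash(expr, max_bucket) returns a deterministic bucket in 0..max_bucket,
-- so this always yields the same value between 0 and 999 for '2QD44561'
select ora_hash('2QD44561', 999) from dual;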
You are talking about using a hash function. There are lots of solutions out there; SHA-1 is very common.
But just FYI, when you say "restrict the size of the integer", understand that you will then be mapping an infinite set of strings or numbers onto a limited set of values. So while your strings will always map to the same value when they are the same, they will not be the only strings that map to that value.
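One way this could look in Oracle (a sketch, assuming Oracle 12c or later for STANDARD_HASH; the substr/mod steps are just one way to squeeze the digest into a number below 1000):

-- hash the string with SHA-1, take the first 8 hex digits of the digest,
-- convert them to a number, and reduce the result to the range 0..999
select mod(to_number(substr(rawtohex(standard_hash('2QD44561', 'SHA1')), 1, 8),
                     'XXXXXXXX'),
           1000) as small_hash
from dual;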

Updating the first field with an existing Null value in a given row

I'm using sqlite3.
In a table "Playlist", I want to Set a new value to the first Null field in a given row (only the first one!).
So for example, here if I were editing the first row, I would want that value to replace the first Null (in column 'Song2') :
table = Playlist

id | plname | song1 | song2 | song3 | song4 ...   (<- column names)
--------------------------------------------------------------------
1  | Sounds | Alps  | Null  | Null  | Null  ...   (<- first row)
What statement could I use to find the first Null field in a given row, and set a new value to that field?
You can use a CASE statement for each column. Here is a SQL Fiddle as an example (note: you'd want to replace 'value' with a parameter) http://sqlfiddle.com/#!7/59931/6
Update: I missed the part where you want it to update the first null column. A little trickier, but this version should work: http://sqlfiddle.com/#!7/59931/9
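A sketch of that second idea (not the fiddle's exact code; :value is a placeholder parameter, and the column list comes from the example table). Each column is filled only when it is the first Null, i.e. when every earlier song column is already non-Null; because the CASE expressions all see the row's original values, only one column changes:

UPDATE Playlist
SET song1 = CASE WHEN song1 IS NULL THEN :value ELSE song1 END,
    song2 = CASE WHEN song1 IS NOT NULL AND song2 IS NULL THEN :value ELSE song2 END,
    song3 = CASE WHEN song1 IS NOT NULL AND song2 IS NOT NULL
                      AND song3 IS NULL THEN :value ELSE song3 END,
    song4 = CASE WHEN song1 IS NOT NULL AND song2 IS NOT NULL
                      AND song3 IS NOT NULL AND song4 IS NULL THEN :value ELSE song4 END
WHERE id = 1;   -- the given row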