Extract the highest key:value pair from a string in Standard SQL

Extract the highest key:value pair from a string in Standard SQL - sql

I have the following data type below, it is a type of key value pair such as 116=0.2875. Big Query has stored this as a string. What I am required to do is to extract the key i.e 116 from each row.
To make things more complicated if a row has more than one key value pair the iteration to be extracted is the one with the highest number on the right e.g {1=0.1,2=0.8} so the extracted number would be 2.
I am struggling to use SQL to perform this, Particularly as some rows have one value and some have multiple:
This is as close as I have managed to get where I can create a bit of code to extract the highest right hand value (which I don't need) but I just cant seem to create something to either get the whole key/value pair which would be fine and work for me or just the key which would be great.
column
,(SELECT MAX(CAST(Values AS NUMERIC)) FROM UNNEST(JSON_EXTRACT_ARRAY(REPLACE(REPLACE(REPLACE(column,"{","["),"}","]"),"=",","))) AS Values WHERE Values LIKE "%.%") AS Highest
from `table`
Here is some sample data:
1 {99=0.25}
2 {99=0.25}
3 {99=0.25}
4 {116=0.2875, 119=0.6, 87=0.5142857142857143}
5 {105=0.308724832214765}
6 {105=0.308724832214765}
7 {139=0.5712754555198284}
8 {127=0.5767967894928858}
9 {134=0.2530120481927711, 129=0.29696599825632086, 73=0.2662459427947186}
10 {80=0.21242613001118038}
Any help on this conundrum would be greatly appreciated!

Consider below approach
select column,
( select cast(split(kv, '=')[offset(0)] as int64)
from unnest(regexp_extract_all(column, r'(\d+=\d+.\d+)')) kv
order by cast(split(kv, '=')[offset(1)] as float64) desc
limit 1
) key
from your_table
if applied to sample data in your question - output is

Related

SQLite3 Order by highest/lowest numerical value

I am trying to do a query in SQLite3 to order a column by numerical value. Instead of getting the rows ordered by the numerical value of the column, the rows are ordered alphabetically by the first digit's numerical value.
For example in the query below 110 appears before 2 because the first digit (1) is less than two. However the entire number 110 is greater than 2 and I need that to appear after 2.
sqlite> SELECT digit,text FROM test ORDER BY digit;
1|one
110|One Hundred Ten
2|TWO
3|Three
sqlite>
Is there a way to make 110 appear after 2?

It seems like digit is a stored as a string, not as a number. You need to convert it to a number to get the proper ordering. A simple approach uses:
SELECT digit, text
FROM test
ORDER BY digit + 0

Generate random records from the table tblFruit based on the field Type

I will need your help to generate random records from the table tblFruit based on the field Type (without no duplication)
As per the above table.
There are 4 type of fruit number 1,2,3,4
I want to generate x records dynamically from the table tblFruit (e.g 7 records).
Let say I need to get 7 random record of fruit .
My result should contains fruit of the different types. However, we need to ensure that the result contains only 7 records.
i.e
2 records of type 1,
2 records of type 2,
2 records of type 3,
1 records of type 4
e.g
Note: If i want to generate 10 records (without no duplication),
then i will get 2 records of each type and the two remaining records randomly of any type.
Much grateful for your help.

I might suggest:
select top (7) f.*
from tblfruit f
order by row_number() over (partition by type order by newid());
This will actually produce a result with approximately the same number of rows of each type (well, off by 1), but that meets your needs.

Populate NULL Values based on Array Formula

New user, so apologies in advance for bad formatting.
Essentially what I'm trying to do is be able to populate the staff_hours column where it equals NULL with the one value that IS NOT NULL. As you can see from the screenshot, there will only be one person who staffs an open cl_hole_staffing_no and as a result will have a start_dt (with time) and end_dt (with time) along with staff_hours. 16 people were offered a shift, and the person in row 15 accepted it is what is going on here.
The ideal output would be the staff_hours column is populated with the amount of time of the one person who ended up taking the open job, so 24.00 in this example. How can I write a formula to do this? I was thinking something like an array function in Excel, but am not sure how to do that in SQL.

Your explanation is a bit confusing about what you are really trying to achieve. However I think that what you really want is just to populate the staff_hours column, which can be achieved with the following:
UPDATE
your_table_name
SET
staff_hours = 24
WHERE
staff_hours is NULL;
EDIT
I get it now. You want to operate with the two dates and extract the amount of hours between them. Since you are in sql-server you can actually define a Computed Column in which you can use the values from other columns to compute the value you want.
You will need to create your table again. (The example below contains only the necessary attributes for it to work)
CREATE TABLE your_table_name
( id INT IDENTITY (1,1) NOT NULL
, staff_start_dt DATETIME
, staff_end_dt DATETIME
, staff_hours AS DATEDIFF(hh, staff_start_dt , staff_end_dt)
);
Now every time you insert a record on the table with both staff_start_dt and staff_end_dt, the column staff_hours will automatically compute the number of hours between the two dates.

[pre]
Code (vb):
A B C
1 10 X X
2 11 A Y
3 12 Y Z
4 13 B
5 14 B
6 15 Z
[/pre]
Assuming that the rows in Col A is Named "datarange"
And your criteria is in C1:C3
The following formula will return an array {10,12,15}
=SMALL(COUNTIF(C1:C3,B1:B6)*datarange, ROW(INDEX(A:A,SUMPRODUCT(--(COUNTIF(C1:C3,B1:B6)=0))+1):INDEX(A:A,ROWS(datarange))))
COUNTIF(C1:C3,B1:B6)*datarange returns {10;0;12;0;0;15}
The segment ROW(INDEX(....):INDEX(...)) returns {4;5;6}, indicating the number of non-zero values.
The SMALL() function then returns the 4th smallest, 5th smallest and 6th smallest values.
One disadvantage with this approach is that you get a sorted sub-list. Perhaps that would work for you.

Average Distinct Values in a single column in Power Pivot

I have a column in PowerPivot that basically goes:
1
1
2
3
4
3
5
4
If I =AVERAGE([Column]), it's going to average all 8 values in the sample column. I just need the average of the distinct values (i.e., in the example above I want the average of (1,2,3,4,5).
Any thoughts on how to go about doing this? I tried a combination of =(DISTINCT(AVERAGE)) but it gives a formula error.
Thanks!!
Kevin

There must be a cleaner way of doing this but here is one method which uses a measure to get the sum of the values divided by the number of times it appears (to basically give the original value) then uses an iterative function to do it for each unique value.
Apologies for the uninspired measure names:
[m1] = SUM(table1[theValue]) / COUNTROWS(Table1)
[m2] = AVERAGEX(VALUES(Tables1[theValue]), [m1])
Assuming your table is caled table1 and the column is called theValue

What is the maximum number of key value pairs in a single hstore value?

I've got a postgres table (mapfeatures_20120813) with 2 columns (tags and pky) with around 1000 rows. Each row consists of a hstore and a primary key:
tags (hstore) pky
"aerialway"=>"cable_car"; 1
"aerialway"=>"chair_lift"; 2
"aerialway"=>"drag_lift"; 3
"aerialway"=>"gondola"; 4
"aerialway"=>"goods"; 5
"aerialway"=>"mixed_lift"; 6
"aerialway"=>"pylon"; 7
"aerialway"=>"station"; 8
"aeroway"=>"aerodrome"; 9
"aeroway"=>"apron"; 10
...
For some analysis I need to push all these single hstore key-value-pairs into one single hstore -row and I'm not sure how to solve this.
Therefore I first convert all the rows into a single-row text-field:
CREATE TABLE mf_text AS
SELECT array_to_string(array_agg(tags), ',')
FROM mapfeatures_20120813;
In a second step I create a hstore out of this text-field:
SELECT hstore(array_to_string)
FROM mf_text
But the problem is that only 97 of the more than 1000 key-value-pairs are written into the new hstore-field. Also I can't see any pattern in my result, it's totally mixed up:
"atv"=>"no", "hgv"=>"forestry", "lit"=>"no", "psv"=>"private"
, "area"=>"yes", "boat"=>"permissive"
Is there some sort of limitation to an hstore-field on how many key-value-pairs fit into a single hstore? The documentation doesn't say anything.

Aggregating a single hstore value should be as simple as:
SELECT string_agg(tags::text,',')::hstore
FROM mapfeatures_20120813;
As to your question in the title: there is practically no limit to the number of elements.
For your remark:
Also I can't see any pattern in my result, it's totally mixed up:
The manual has this to say:
The order of the pairs is not significant (and may not be reproduced
on output).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Extract the highest key:value pair from a string in Standard SQL - sql

Consider below approach select column, ( select cast(split(kv, '=')[offset(0)] as int64) from unnest(regexp_extract_all(column, r'(\d+=\d+.\d+)')) kv order by cast(split(kv, '=')[offset(1)] as float64) desc limit 1 ) key from your_table if applied to sample data in your question - output is

Related

SQLite3 Order by highest/lowest numerical value

Generate random records from the table tblFruit based on the field Type

Populate NULL Values based on Array Formula

Average Distinct Values in a single column in Power Pivot

What is the maximum number of key value pairs in a single hstore value?

Categories

Resources