I am using redshift
I have a table like this :
metric is a super type, built with the array() function within redshift
user  | metrics
------+------------------
red   | array(2021, 120)
red   | array(2020, 99)
blue  | array(2021, 151)
I would like to do :
select user, max(metrics) from table group by user
and get this :
user  | metrics
------+------------------
red   | array(2021, 120)
blue  | array(2021, 151)
Sadly, using this query I only get null values.
Do you know how to handle that?
Thanks
If you are familiar with Redshift Spectrum, the logic is very similar to unnesting an array field when you query an external schema.
In your case, the query is pretty simple:
SELECT t.user, max(metric)
FROM my_schema.my_table as t, t.metrics as metric
GROUP BY 1
If the array contains types other than numeric ones, you can simply cast the element to int or double, like:
max(metric::int)
This way, a pure string such as "hello world" becomes null, while a string like "33333" is converted to int.
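Putting the two snippets together, a minimal sketch (table and column names as above; note that user is a reserved word in Redshift, hence the quoting):

-- unnest the SUPER array, cast each element, and take the max per user
SELECT t."user", max(metric::int) AS max_metric
FROM my_schema.my_table AS t, t.metrics AS metric
GROUP BY 1;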
I have a string like below:
194736 BBLS FLUID 3800880 LBS 40/70 361060 LBS
I have used the following to fish out the '3800880', using a function that grabs the right-most word to the left of 'LBS'. It does a decent enough job of finding the first value to the left of 'LBS'. How do I go about getting the second 'LBS' value, the 361060, too? Either summing them up or keeping them separate is fine.
replace([dbo].[StripNonNumerics](dbo.getLastWord((SUBSTRING (fluid_description,0,PATINDEX('%LBS%',fluid_description))))),'.','') as LBSAmount,
Charindex, patindex, etc.: how can you key off multiple occurrences of the same keyword in your string?
Getting data out of unstructured strings is always going to be a problem; the easy solution is: don't store data like that.
However, to extract both values from the string and sum them, consider parsing the string as JSON with OpenJson. This allows you to treat each "word" as a row and therefore utilise window functions to extract the required data:
select Sum([value])
from (
    select '194736 BBLS FLUID 3800880 LBS 40/70 361060 LBS'
) d(MyBadData)
cross apply (
    -- split the string on spaces by turning it into a JSON array,
    -- then keep each token whose *next* token is 'LBS'
    select case when lead([value]) over(order by j.[key]) = 'LBS'
                then Try_Convert(int, [value])
           end [value]
    from OpenJson(Concat('["', replace(MyBadData, ' ', '","'), '"]')) j
) x;
Result: 4161940
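If you want the two LBS amounts as separate rows rather than their sum, the same technique works; a sketch, just dropping the Sum() and filtering out the non-matching tokens:

select [value]
from (
    select '194736 BBLS FLUID 3800880 LBS 40/70 361060 LBS'
) d(MyBadData)
cross apply (
    select case when lead([value]) over(order by j.[key]) = 'LBS'
                then Try_Convert(int, [value])
           end [value]
    from OpenJson(Concat('["', replace(MyBadData, ' ', '","'), '"]')) j
) x
where [value] is not null;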
I would like to convert this table
to something like this
The long string can be dynamic, so it's important to me that the solution isn't hard-coded for these specific values.
Please help, I'm using BigQuery.
You could start by using SPLIT(value[, delimiter]) to convert your long string into separate key-value pairs in an array.
This will be sensitive to commas appearing inside your values.
SPLIT(session_experiments, ',')
Then you could either flatten that array or access each element, and use some regular expressions to separate the key from the value.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
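In the meantime, a minimal sketch along those lines (my_table and session_experiments are assumed names; the pairs are assumed to look like 'key-value'):

-- one row per key-value pair, split into its two parts
SELECT
  REGEXP_EXTRACT(pair, r'^([^-]+)-') AS experiment,
  REGEXP_EXTRACT(pair, r'-(.*)$') AS value
FROM my_table,
UNNEST(SPLIT(session_experiments, ',')) AS pair;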
What you want is not possible as such; however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like that. You can use this sample query to see how it works:
with rawdata AS
(
    SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
    SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
    id,
    (select array_agg(struct(
                split(param, '-')[offset(0)] as experiment,
                split(param, '-')[offset(1)] as value))
     from unnest(split(experiments)) as param) as experiments
from rawdata
The output will look like this:

id | experiments.experiment | experiments.value
1  | test1                  | val1
   | test2                  | val2
   | test3                  | val3
1  | test1                  | val1
   | test3                  | val3
   | test5                  | val5
Once the data is in that shape, it's much more convenient to manipulate.
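For example, to pull out the value of one experiment later (a sketch, assuming the result above was stored in a table called experiments_table):

-- unnest the array of structs and filter on the experiment name
select id, e.value
from experiments_table, unnest(experiments) as e
where e.experiment = 'test3';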
I am trying out the NORMALIZE function with NFKC in BigQuery from the documentation, and I see that I can convert a string to a readable format. For example:
WITH EquivalentNames AS (
SELECT name
FROM UNNEST([
'Jane\u2004Doe',
'\u0026 Hello'
]) AS name
)
SELECT
NORMALIZE(name, NFKC) AS normalized_str
FROM EquivalentNames
GROUP BY 1;
The ampersand character shows up correctly. But I have a table with a STRING column containing unicode characters in its values, and I'm not able to use NORMALIZE to convert them to a readable format.
I've also tried some of the other solutions presented in
Decode Unicode's to Local language in Bigquery, but nothing is working yet.
Attached is an example of the data:
You posted a question about NORMALIZE, but didn't make your goals clear.
Here I'll answer the question about NORMALIZE, to point out that it probably doesn't do what you are expecting it to do. But at least it's acting as expected.
There are many ways to encode the same string in Unicode. NORMALIZE chooses one, while preserving the string.
See this query:
SELECT *, a=b ab, a=c ac, a=d ad, b=c bc, b=d bd, c=d cd
FROM (
SELECT NORMALIZE('hello ñá 😞', NFC) a
, NORMALIZE('hello ñá 😞', NFKC) b
, NORMALIZE('hello ñá 😞', NFD) c
, NORMALIZE('hello ñá 😞', NFKD) d
)
As you see, every time you get the same visible string; they just have different non-visible representations.
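To see those different underlying representations, you can inspect the code points (a small illustration using TO_CODE_POINTS):

SELECT TO_CODE_POINTS(NORMALIZE('ñ', NFC)) nfc  -- [241]: one precomposed code point
     , TO_CODE_POINTS(NORMALIZE('ñ', NFD)) nfd  -- [110, 771]: 'n' plus a combining tilde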
The \u2004 is a so-called "thick space", which is why you think it is not showing correctly: you just see a space.
But if you try some other code, for example \u2020, you will see that it actually shows up even without extra processing with the NORMALIZE function.
As below:
#standardSQL
WITH EquivalentNames AS (
SELECT name
FROM UNNEST([
'Jane\u2020Doe',
'\u0026 Hello'
]) AS name
)
SELECT
name, NORMALIZE(name, NFKC) AS normalized_str
FROM EquivalentNames
GROUP BY 1
with the result:

Row | name     | normalized_str
----+----------+---------------
1   | Jane†Doe | Jane†Doe
2   | & Hello  | & Hello
I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is the data in one cell, from a single column of similar data in my database.
The datatype of this column in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
SELECT cast(json_items to varchar[]) as json_item
FROM my_table
WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift supports the json data type per se; at least, I found nothing in the online manual.
But I found a few JSON functions in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
     , round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
    SELECT *, json_extract_array_element_text(json_items, pos) AS elem
    FROM (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
    CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted a poor man's solution: a dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum possible number of elements in your arrays. If you need this on a regular basis, create an actual numbers table.
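Such a numbers table could be as simple as this (a sketch; the name nums is made up, size it to your longest array):

CREATE TABLE nums (pos int);
INSERT INTO nums VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);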
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.
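For reference, the modern-Postgres version with json_array_elements() would look roughly like this (a sketch, assuming the column held valid JSON, i.e. : instead of =>):

-- unnest the json array, one row per element, then filter and average
SELECT round(avg((elem->>'RetailPrice')::numeric), 2) AS avg_retail_price
FROM tbl, json_array_elements(json_items::json) AS elem
WHERE elem->>'Role' = 'Abstract';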
I want to get data from a mysql table sorted by one of its varchar columns. So let's say I have a query like this:
SELECT name, model FROM vehicle ORDER BY model
The problem is that for 'model' values like 'S 43' and 'S 111' the order will be:
S 111
S 43
because, I suppose, ORDER BY uses alphabetic ordering rules, right? So how do I modify this query to get "numerical" order, in which 'S 43' comes before 'S 111'? Without changing or adding any data to this table.
Something like this:
SELECT name, model
FROM vehicle
ORDER BY CAST(TRIM(LEADING 'S ' FROM model) AS UNSIGNED)
Note that it's not good practice to sort by a function result: it produces a dynamic, unindexed result set, which can be very slow, especially on large datasets.
If the non-numeric portion has a constant length, you could
ORDER BY substring(model, <1 + length of non-numeric portion>) + 0
or, if its length varies, you could
ORDER BY substring(model, 1 + LOCATE(' ', model)) + 0
(the + 0 coerces the substring to a number, so the comparison is numeric rather than alphabetic).
You can take the numeric part only (string functions) and convert it to an int (cast functions):
mySQL cast functions
mySQL string functions
I didn't test it myself, but I suppose it should work.
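For example, combining the two steps (a sketch, assuming model always looks like '<letters> <number>'):

-- take everything after the first space and sort it numerically
SELECT name, model
FROM vehicle
ORDER BY CAST(SUBSTRING(model, 1 + LOCATE(' ', model)) AS UNSIGNED);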