I am trying to figure out how to extract a value from curly brackets in a column in PrestoSQL.
The field looks like this:
rates
{"B":750}
{"B":1600}
{"B":900}
I want to extract only the numeric value from each bracket.
Also, if I want to divide that by 10 and then by 20, would that be easy to add to the query?
The rates column is of type map(varchar, bigint).
Since the rates column is of type map(varchar, bigint), you can use Presto Map Functions and Operators on it. Examples:
SELECT rates['B'] FROM ... -- value under key "B"
SELECT map_values(rates) FROM ... -- all values in a map
See more in the Presto documentation.
If rates is stored as a string rather than a map, use something like this: the regexp_extract function pulls the number out of the string, and cast converts it from a string to a number, which you can then divide by 10 and so on.
select cast(regexp_extract(rates, '\d+') as double) / 10
from my_table
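To sanity-check the regex-and-divide idea outside the database, here is a minimal Python sketch. It assumes the value arrives as a string such as {"B":750}; the function name scaled_rate is made up for illustration and is not part of Presto.

```python
import re

def scaled_rate(raw: str) -> float:
    """Pull the first run of digits out of raw, then divide by 10 and by 20."""
    match = re.search(r"\d+", raw)  # same pattern as in the query above
    if match is None:
        raise ValueError(f"no number found in {raw!r}")
    return int(match.group()) / 10 / 20

# Sample rows copied from the question
rows = ['{"B":750}', '{"B":1600}', '{"B":900}']
results = [scaled_rate(r) for r in rows]
print(results)
```

In the query itself the two divisions would simply be chained after the cast in the same way.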
Related
I have 2 columns that look a little like this:
Column A | Column B | Column C
ABC | {"ABC":1.0,"DEF":24.0,"XYZ":10.50,} | 1.0
DEF | {"ABC":1.0,"DEF":24.0,"XYZ":10.50,} | 24.0
I need a SELECT statement to create Column C: the number in Column B that corresponds to the letters in Column A. I have got as far as finding the starting point of the number I want to extract, but because the numbers have different lengths I can't use a fixed length. I want to extract the characters from the calculated starting point (below) up to the next comma.
STRPOS(Column B, Column A) + 5 gives me the correct starting position for a SUBSTRING query; from here I am lost. Any help much appreciated.
NB: I am using Google BigQuery, which doesn't recognise CHARINDEX.
You can use a regular expression as well.
WITH sample_table AS (
SELECT 'ABC' ColumnA, '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' ColumnB UNION ALL
SELECT 'DEF', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' UNION ALL
SELECT 'XYZ', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}'
)
SELECT *,
REGEXP_EXTRACT(ColumnB, FORMAT('"%s":([0-9.]+)', ColumnA)) ColumnC
FROM sample_table;
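The same idea, building the pattern from the key on each row, can be tried quickly in Python. This is only a sketch of the logic, not BigQuery code: value_for_key is a hypothetical helper, and re.escape is an extra safety step that the SQL above does not perform.

```python
import re
from typing import Optional

def value_for_key(key: str, blob: str) -> Optional[str]:
    """Return the digits and dots that follow "key": in blob, or None if absent."""
    # Mirrors FORMAT('"%s":([0-9.]+)', ColumnA); escaping the key is an added assumption.
    pattern = '"%s":([0-9.]+)' % re.escape(key)
    match = re.search(pattern, blob)
    return match.group(1) if match else None

# Sample value copied from the question (note the trailing comma)
blob = '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}'
extracted = {key: value_for_key(key, blob) for key in ("ABC", "DEF", "XYZ")}
print(extracted)
```

Because the pattern only looks at the text right after the quoted key, the trailing comma that makes the string invalid JSON does not matter here.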
[Updated]
Regarding @Bihag Kashikar's suggestion: since ColumnB is invalid JSON, it will not be parsed properly within a JS UDF like the one below. If it were valid JSON, a JS UDF that looks up the JSON key could be an alternative to a regular expression, I think.
CREATE TEMP FUNCTION custom_json_extract(json STRING, key STRING)
RETURNS STRING
LANGUAGE js AS """
  // JSON.parse throws on invalid JSON (e.g. a trailing comma); return NULL in that case.
  try {
    obj = JSON.parse(json);
  }
  catch {
    return null;
  }
  return obj[key];
""";
SELECT custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}', 'ABC') invalid_json,
custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50}', 'ABC') valid_json;
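The difference between the two inputs can also be reproduced with a strict JSON parser outside BigQuery. A minimal Python sketch follows; it reuses the name custom_json_extract purely for readability and makes no claim about how BigQuery's JS engine behaves beyond what is stated above.

```python
import json
from typing import Any, Optional

def custom_json_extract(text: str, key: str) -> Optional[Any]:
    """Return the value stored under key, or None when text is not valid JSON."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        # A trailing comma before the closing brace is rejected by strict parsers.
        return None
    return obj.get(key)

invalid_json = custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}', "ABC")
valid_json = custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50}', "ABC")
print(invalid_json, valid_json)
```

The first call yields None because of the trailing comma; the second parses and returns the number.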
Take a look at this post too; it shows how to use a JS UDF together with SPLIT options:
Error when trying to have a variable pathsname: JSONPath must be a string literal or query parameter
I have a string column whose values usually look something like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string (the four letters after "source=", e.g. firm in the first row above), and then simply count the rows per source. Is there a way to achieve this directly in SQL? I am using Hadoop.
The data is a set of strings like those above. My expected result is a summary table with two columns: 1) each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring); ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source="; 2) the count of their occurrences across all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
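The extraction still has to be wrapped in a GROUP BY with a COUNT to get the summary table. As a quick way to check the pattern and the expected counts, here is a small Python sketch over the three sample URLs; source_of is a hypothetical helper name, and the pattern captures everything up to the next "&" rather than exactly four letters.

```python
import re
from collections import Counter
from typing import Optional

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

def source_of(url: str) -> Optional[str]:
    """Return the text between 'source=' and the next '&' (or end of string)."""
    match = re.search(r"source=([^&]+)", url)
    return match.group(1) if match else None

# Equivalent in spirit to: SELECT source, COUNT(*) ... GROUP BY source
counts = Counter(source_of(u) for u in urls)
print(counts)
```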
In MSSQL you can use CHARINDEX to find the position of the substring and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
Hive QL has an equivalent of CHARINDEX: the LOCATE function.
Do you know how to format a number in Hive with a thousands separator? For example:
data:146452664
output:146,452,664
I use this in Teradata, but I don't know how to achieve the same in Hive.
cast(cast(cast(number as integer) as format'ZZZ,ZZZ,ZZZ,ZZ9') as char(11))
Use the format_number() function.
select format_number(146452664,0)
The signature is format_number(X, D): the first argument, X, is the number and the second, D, is the number of decimal places to round to. If D is 0, the result has no decimal point or fractional part.
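For comparison, the same output shape can be produced in Python with the "," format specifier. This is only an illustration of the expected result; the Python function below merely borrows the name format_number and is not Hive's implementation.

```python
def format_number(x: float, d: int) -> str:
    """Format x with comma thousands separators, rounded to d decimal places."""
    return f"{x:,.{d}f}"

formatted = format_number(146452664, 0)
print(formatted)
print(format_number(1234.567, 2))
```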
I need a SQL query to get the value between two known strings in a text column.
The column name is d_info and the table name is Details.
The text is an XML fragment, but stored as a text value.
What I need is to get the value between the bookends <nettoeinkommen> and </nettoeinkommen> which is 718 in this example.
I also need the output to be saved in new column named income with data type float(8).
land>DE</land></wohnanschrift><taetigkeit>rentner</taetigkeit><dkbkundenstatus><bestandskunde>false</bestandskunde></dkbkundenstatus><haushaltsangaben><einnahmen><einkommen><nettoeinkommen>718</nettoeinkommen></einkommen><kindergeld>0</kindergeld><vermietungverpachtungnetto>0</vermietungverpachtungnetto><elterngeld>0</elterngeld><rentenunbefristet>0</rentenunbefristet><unselbststaendigetaetigkeit>740</unselbststaendigetaetigkeit><geringfuegigebeschaeftigung>0</geringfuegigebeschaeftigung></einnahmen><ausgaben><warmmiete>550</warmmiete><ratenimmobilienfinanzierung>0</ratenimmobilienfinanzierung>
I tried this code:
SELECT cast(SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)
, CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)) as float(8)) as income
from dbo.Details
But it returns the error "Error converting data type varchar to real."
When I remove the cast function, the script works but it returns <nettoeinkommen>718 instead of only 718.
Thanks.
The substring starts at the beginning of the opening tag, not at the end of it.
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'),
CHARINDEX('</nettoeinkommen>', d_info) - (CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'))
) as float(8)) as income
from dbo.Details
You might even define these in variables:
DECLARE @startTag varchar(50) = '<nettoeinkommen>';
DECLARE @endTag varchar(50) = '</nettoeinkommen>';
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX(@startTag, d_info) + len(@startTag),
CHARINDEX(@endTag, d_info) - (CHARINDEX(@startTag, d_info) + len(@startTag))
) as float(8)) as income
from dbo.Details
I think the code is much easier to understand with the variables.
You need to add the length of your opening tag to the start index and subtract it from the length argument of your SUBSTRING call:
SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)+16,
CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)-16)
It seems you are querying plain XML data; for this purpose SQL Server provides XQuery functionality:
SELECT CAST(r.d_info AS XML).value('(/haushaltsangaben/einnahmen/einkommen/nettoeinkommen)[1]', 'decimal(19,2)')
FROM
(
SELECT '<taetigkeit>rentner</taetigkeit>
<dkbkundenstatus>
<bestandskunde>false</bestandskunde>
</dkbkundenstatus>
<haushaltsangaben>
<einnahmen>
<einkommen>
<nettoeinkommen>718</nettoeinkommen>
</einkommen>
</einnahmen>
</haushaltsangaben>' AS d_info
) AS r
If you intend to query more information from your source, you will otherwise end up with a bunch of nested SUBSTRING and PATINDEX calls, or even your own user-defined functions. This should be more readable and maintainable.
Using XQuery: https://learn.microsoft.com/en-us/sql/t-sql/xml/query-method-xml-data-type
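To show the path-based lookup outside SQL Server, here is a rough Python sketch using the standard library's ElementTree. It assumes the fragment is well-formed once wrapped in a single root element; the wrapper name root is made up, because ElementTree, unlike CAST(... AS XML), does not accept several top-level elements.

```python
import xml.etree.ElementTree as ET

# Shortened fragment from the answer above; it has several top-level elements.
fragment = (
    "<taetigkeit>rentner</taetigkeit>"
    "<dkbkundenstatus><bestandskunde>false</bestandskunde></dkbkundenstatus>"
    "<haushaltsangaben><einnahmen><einkommen>"
    "<nettoeinkommen>718</nettoeinkommen>"
    "</einkommen></einnahmen></haushaltsangaben>"
)

# Wrap the fragment in an artificial root so the parser sees one document element.
root = ET.fromstring(f"<root>{fragment}</root>")
node = root.find("haushaltsangaben/einnahmen/einkommen/nettoeinkommen")
income = float(node.text) if node is not None else None
print(income)
```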
As for your initial issue: the SUBSTRING function in SQL returns the part of a string that starts at a given index and has a given length. For example, SELECT SUBSTRING('whatever',5,4) returns 'ever'.
CHARINDEX returns the index of the first match of a given search string within a string. For example, SELECT CHARINDEX('ever','whatever') returns 5, as 'ever' starts at the fifth position in 'whatever'.
In your case you need to add the length of '<nettoeinkommen>' to the starting CHARINDEX and subtract that same length from the length of the substring.
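The index arithmetic can be checked with a few lines of Python. This is a sketch only: between is a hypothetical helper, the sample string is a shortened piece of the fragment from the question, and Python positions are 0-based whereas CHARINDEX is 1-based, so only the offsets relative to each other carry over.

```python
from typing import Optional

def between(text: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between start_tag and end_tag, or None if either is missing."""
    start = text.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)           # skip past the opening tag itself
    end = text.find(end_tag, start)   # the closing tag marks where the value ends
    if end == -1:
        return None
    return text[start:end]

d_info = (
    "<einkommen><nettoeinkommen>718</nettoeinkommen></einkommen>"
    "<kindergeld>0</kindergeld>"
)
income = float(between(d_info, "<nettoeinkommen>", "</nettoeinkommen>"))
print(income)
```

Without the len(start_tag) step the slice would begin at the opening tag, which is the "<nettoeinkommen>718" result described in the question.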
Also consider using the decimal or numeric type instead of float if you need precise calculations: https://technet.microsoft.com/en-us/library/ms187912(v=sql.105).aspx
Is there a way to format a number with a comma as the thousands separator?
According to IBM documentation, this is the syntax:
DECIMAL(:newsalary, 9, 2, ',')
newsalary is the string (field)
9 is the precision
2 is the scale
, is the delimiter.
I tried:
SELECT DECIMAL ( T1.FIELD1 , 15 , 2 , "," ) AS TOTAL FROM TABLE T1
When trying it, I am getting the following error:
Message: [SQL0171] Argument 4 of function DECIMAL not valid.
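DECIMAL converts from a string type to a numeric type. The fourth argument is the decimal-point character used in the input string, not a thousands separator.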
Numeric types don't have separators; only character representations of numbers have separators.
What tool are you using: STRSQL, Run SQL Scripts, or something else? Once you convert the string to a number, the tool should add the language-appropriate separators when it displays the numeric data. For example, in STRSQL:
select decimal('12345.67', 12,2) as mynum
from sysibm.sysdummy1
Returns:
MYNUM
12,345.67
Using SQL to format strings is usually a bad idea. That should be left to whatever is consuming the data.
But if you really, really, really want to do it, you should create a user-defined function (UDF) that does it for you. Here's an article, "Make SQL Edit the Way You Want It To", that includes the source for an EDITDEC function written in ILE RPG, along with the SQL function definition you need to use it in an SQL statement.
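The split between the numeric value and its display form can be shown with a small Python sketch, assuming Python is what consumes the data; this is unrelated to the EDITDEC function from the article.

```python
from decimal import Decimal

# The numeric value itself carries no separators ...
amount = Decimal("12345.67")

# ... they only appear when the consumer renders it as text.
display = format(amount, ",.2f")
print(amount, display)
```

Here the separator is added at display time by the consuming code, and the stored value stays a plain number.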