regular expression on date extract in hive


Team,
I need help here.
I have a column with values like "Sum total to percent on 02/27/2019", and I need to extract only the date part wherever the column has a value like that. So I used the expression below:
case when split(col1,' ')[0]='Sum' then substr(col1,-10) else null end as col2
The problem is that some column values start the same way but contain no date, e.g. "Sum total and not necessary". With the code above I get " necessary" for those rows, which I do not need; they should be null instead.
My new column should contain only the date value, and null everywhere else. How can this be achieved? Kindly help. Thanks.

Use regexp_extract:
Demo:
Select regexp_extract(str,'\\d{2}/\\d{2}/\\d{4}',0) as dt
from
(-- your data
select 'Sum total to percent on 02/27/2019' as str
)s
Result:
02/27/2019
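The regular expression '\d{2}/\d{2}/\d{4}' means: two digits, slash, two digits, slash, four digits. The backslashes are doubled in the query because the pattern is written inside a Hive string literal.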
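To make the matching logic concrete, here is a minimal Python sketch of the same pattern. It is not Hive code: the function name extract_date is hypothetical, and it returns None for rows without a date because that is what the question asks for. In Hive itself it is worth checking what regexp_extract returns for non-matching rows in your version and, if needed, converting that result to NULL with a case expression.

```python
import re

# Same pattern as in the Hive answer: two digits, slash, two digits, slash, four digits.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def extract_date(text):
    """Return the first MM/DD/YYYY-shaped substring, or None when there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_date("Sum total to percent on 02/27/2019"))  # 02/27/2019
print(extract_date("Sum total and not necessary"))         # None
```

Unlike substr(col1,-10), the pattern does not depend on where the date sits in the string or on what the first word is.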

Related

How to remove a leading character from numeric string? SQL

I have a list of tournament results from PGA tour data and would like to remove the "T" from the beginning of the finish column strings where applicable, so that I can get an average number. The string lengths are variable and also contain "CUT" in some rows. Is there a way to remove the "T"?
I have used...
WHERE Finish not like "CUT"
to remove "CUT" values
and have used various functions with no success to remove the "T". Any help would be greatly appreciated! Thanks
Showing variable string lengths in Finish column
EDIT:
This is what I have so far, which works perfectly to aggregate averages and group by player in a single row as desired.
SELECT
DISTINCT(player),
ROUND(AVG(CAST(sg_putt as numeric)),2) as avg_sg_putt,
ROUND(AVG(CAST(sg_arg as numeric)),2) as avg_sg_arg,
ROUND(AVG(CAST(sg_app as numeric)),2) as avg_sg_app,
ROUND(AVG(CAST(sg_ott as numeric)),2) as avg_sg_ott,
ROUND(AVG(CAST(sg_t2g as numeric)),2) as avg_sg_t2g,
ROUND(AVG(CAST(sg_total as numeric)),2) as avg_sg_total,
SUM(made_cut) as cuts_made,
COUNT(DISTINCT(tournament_id)) as total_played,
FROM
`pga_stats_2015_2022.stats`
WHERE
season >= 2017 AND
sg_putt not like "NA" AND
sg_arg not like "NA" AND
sg_app not like "NA" AND
sg_ott not like "NA" AND
sg_t2g not like "NA" AND
sg_total not like "NA"
GROUP BY player
HAVING total_played > 50
ORDER BY(avg_sg_total) DESC

From the documentation here, it seems you want REPLACE:
REPLACE(original_value, from_value, to_value)
So for instance,
SELECT REPLACE(Finish,'T','') as Finish
FROM yourTable
WHERE Finish <> 'CUT'
EDIT:
Looking at your full query, I suspect you want to add:
ROUND(AVG(CAST(REPLACE(Finish,'T','') as numeric)),2) as avg_Finish
to your SELECT.
Then add:
AND Finish <> "CUT"
to your WHERE clause, after the existing conditions.

Perhaps ltrim does the trick
select ltrim(finish,'T') --might want to cast to int before calculating avg
from..
where..
Note that ltrim removes every leading occurrence of the character, so it strips the T from T6 and both Ts from TT6. Unlike REPLACE, it leaves any T that appears later in the string untouched.
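The clean-up both answers describe can be sketched in Python as follows. This only illustrates the logic, it is not BigQuery code; the helper name finish_to_int and the sample values are assumptions. str.lstrip("T") is used here because it behaves like ltrim(finish,'T'): it strips leading T characters only.

```python
def finish_to_int(finish):
    """Convert a finish such as 'T6' or '12' to an int; 'CUT' becomes None."""
    if finish == "CUT":
        return None
    # Strip every leading 'T' (tie marker), then parse the remaining digits.
    return int(finish.lstrip("T"))


# Hypothetical sample of the Finish column.
finishes = ["T6", "12", "CUT", "T3"]
positions = [p for p in map(finish_to_int, finishes) if p is not None]
average = sum(positions) / len(positions)
print(average)  # 7.0
```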

ORACLE sql Substr / Instr

I have a column within a table that has PO-RAILCAR. I need to split this column into two. I write the following query and it does exactly what I want. However, the results come back with the dash. How do I write it to return the values as they are without the dashes?
SELECT INVT_LEV3, SUBSTR(INVT_LEV3,1,INSTR(INVT_LEV3,'-')) AS PO,
SUBSTR(INVT_LEV3,INSTR(INVT_LEV3,'-')) AS Railcar
FROM C_MVT_H
WHERE INVT_LEV4 = 'G07K02129/G07K02133'
This is what I get: First column is the column I need to split. The second and third look perfect but I need the dash removed
Column 1: 110799P-FBOX50553 Column2: 110799P- Column3:-FBOX505536

The problem is occurring because INSTR is giving you the position of the '-' within the text. To fix this you can just add or subtract 1 from the position returned.
Your current query:
SELECT INVT_LEV3, SUBSTR(INVT_LEV3,1,INSTR(INVT_LEV3,'-')) AS PO, SUBSTR(INVT_LEV3,INSTR(INVT_LEV3,'-')) AS Railcar FROM C_MVT_H WHERE INVT_LEV4 = 'G07K02129/G07K02133'
Proposed new query:
SELECT INVT_LEV3, SUBSTR(INVT_LEV3,1,INSTR(INVT_LEV3,'-')-1) AS PO, SUBSTR(INVT_LEV3,INSTR(INVT_LEV3,'-')+1) AS Railcar FROM C_MVT_H WHERE INVT_LEV4 = 'G07K02129/G07K02133'
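The off-by-one reasoning may be easier to see outside SQL. The following Python sketch mirrors it with str.find and slicing; the function name split_po_railcar is hypothetical, and returning None for the second part when there is no dash is an assumption of this sketch, not necessarily what the Oracle query returns in that case.

```python
def split_po_railcar(value):
    """Split 'PO-RAILCAR' at the first dash, excluding the dash from both parts."""
    pos = value.find("-")  # 0-based index of the dash, -1 if absent (Oracle INSTR is 1-based)
    if pos == -1:
        return value, None  # assumption: no dash means no railcar part
    # Stop just before the dash for the PO, start just after it for the railcar.
    return value[:pos], value[pos + 1:]


print(split_po_railcar("110799P-FBOX50553"))  # ('110799P', 'FBOX50553')
```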

Need to find string using bigquery

We have a string column with the data below, and I want to count the nulls in it using BigQuery: that is, how many times an empty value ('') appears in front of the id column in the select statement stored in each row.
Don't use string position.
Expected output:
count of null ('')id =3
1st row, 2nd row and 5th row

Below is for BigQuery Standard SQL
#standardSQL
SELECT
FORMAT(
"count of null ('')id = %d. List of id is: %s",
COUNT(*),
STRING_AGG(CAST(ID AS STRING))
) AS output
FROM `project.dataset.table`
WHERE REGEXP_CONTAINS(String, r"(?i)''\s+(?:as|)\s+(?:id|\[id\])")
Applied to the sample data from your question, the output is:
Row output
1 count of null ('')id = 3. List of id is: 1,2,5

The idea is to normalize every string into a form you can query with like "%''asid%" or a regex:
First replace all spaces with ''.
Replace "[" and "]" with ''.
Make the use of " or ' consistent.
Then query with like.
For example:
select 1 from (select replace(replace(replace(replace('select "" as do, "" as [id] from table1',' ',''),'[',''),']',''),'"',"'") as tt)
where tt like ("%''asid%")
It's not a "smart" idea, but it's simple.
A better idea would be to store the query columns ('"" as id') in a repeated column and the table name in another column.
You then don't need to store 'select' and 'from'; this way you can query easily and also assemble a query from the data.

If I understand correctly, you want to count the number of appearances of '' in the string column.
If so, you can use regexp_extract_all():
select t.*,
(select count(*)
from unnest(regexp_extract_all(t.string, "''")) u
) as empty_string_count
from t;
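For comparison, the same count can be sketched in Python with re.findall, which plays the role of regexp_extract_all here. The sample string is hypothetical, since the question's data is not reproduced, and the function name count_empty_literals is an assumption.

```python
import re


def count_empty_literals(sql_text):
    """Count non-overlapping occurrences of two consecutive single quotes."""
    return len(re.findall("''", sql_text))


# Hypothetical row value; the real sample data is not shown in the question.
sample = "select '' as id, '' as name from t"
print(count_empty_literals(sample))  # 2
```

Note that this counts every '' in the string, not only those placed in front of the id column.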

SQL Server: How to select rows which contain value comprising of only one digit

I am trying to write a SQL query that only returns rows where a specific column (let's say 'amount' column) contains numbers comprising of only one digit, e.g. only '1's (1111111...) or only '2's (2222222...), etc.
In addition, 'amount' column contains numbers with decimal points as well and these kind of values should also be returned, e.g. 1111.11, 2222.22, etc

If you want to make the query generic, so that you don't have to specify each possible digit, you could change the WHERE clause to the following:
WHERE LEN(REPLACE(REPLACE(amount,LEFT(amount,1),''),'.','')) = 0
This always uses the first character as the comparison for the rest of the string: if nothing is left after removing that character and the decimal point, the value consists of a single repeated digit.
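A minimal Python sketch of the same check follows, assuming the amount is already available as text. The function name is hypothetical, and the extra guard requiring the first character to be a digit is an addition of this sketch, not part of the SQL above.

```python
def is_single_repeated_digit(amount):
    """True when amount consists of one repeated digit, optionally with decimal points."""
    text = str(amount)
    if not text or not text[0].isdigit():
        return False  # guard added in this sketch; the SQL version has no such check
    # Remove every occurrence of the first digit and the decimal point;
    # nothing should remain if only one digit is used.
    return text.replace(text[0], "").replace(".", "") == ""


for value in ["1111.11", "2222222", "1121", "1234.56"]:
    print(value, is_single_repeated_digit(value))
```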

If you are using SQL Server, then you can try this script:
SELECT *
FROM (
SELECT CAST(amount AS VARCHAR(30)) AS amount
FROM TableName
)t
WHERE LEN(REPLACE(REPLACE(amount,'1',''),'.','')) = 0 OR
LEN(REPLACE(REPLACE(amount,'2',''),'.','')) = 0

I tried it like this; replace 1111111 with your column name:
Select replace(Str(1111111, 12, 2),0,left(11111,1))

SQL Syntax to count number of digits in the whole number portion of a Decimal value

I am looking for SQL Syntax to count number of digits in the whole number portion of a decimal value.
Example : E001.0
For this, I expect 4
and E00.10
For this, I expect 3.
http://sqlfiddle.com/#!9/b19c7/2
My table has more than 100,000 records. I need to get only distinct count.

You can use a combination of substring and charindex.
SELECT distinct
case when charindex('.',value) > 0 then len(substring(value, 1, charindex('.',value)-1))
else len(value) end as lngth
from Numbers
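The charindex/substring logic above can be sketched in Python like this. The function name whole_part_length is hypothetical; the two sample values come from the question, and the third is an assumed value without a decimal point.

```python
def whole_part_length(value):
    """Length of the part before the first '.', or of the whole string if there is none."""
    dot = value.find(".")  # 0-based index of the dot, -1 if absent
    return len(value) if dot == -1 else dot


values = ["E001.0", "E00.10", "E001"]  # the last value is an assumed example
distinct_lengths = sorted({whole_part_length(v) for v in values})
print(distinct_lengths)  # [3, 4]
```

The set comprehension plays the role of SELECT distinct: each length is reported once, however many rows share it.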
