I am trying to get data from a table that has column names like year_2016, year_2017, year_2018, etc.
I am not sure how to get the data from this table.
The data looks like:
| count_of_accidents | year_2016 | year_2017 | year_2018 |
|--------------------|-----------|-----------|-----------|
| 15                 | 12        | 5         | 1         |
| 5                  | 10        | 6         | 18        |
I have tried the concat function, but it doesn't really work. This is what I tried:
select SUM( count_of_accidents * concat('year_',year(regexp_replace('2018_1_1','_','-'))))
from table_name;
The column name (year_2017, year_2018, etc.) will be passed as a parameter, so I am not really able to hardcode the column name like this:
select SUM( count_of_accidents * year_2018) from table_name;
Is there any way I can do this?
You can do it using regular expressions. Like this:
--create test table
create table test_col(year_2018 string, year_2019 string);
set hive.support.quoted.identifiers=none;
set hive.cli.print.header=true;
--test select using hard-coded pattern
select year_2018, `(year_)2019` from test_col;
OK
year_2018 year_2019
Time taken: 0.862 seconds
--test pattern parameter
set hivevar:year_param=2019;
select year_2018, `(year_)${year_param}` from test_col;
OK
year_2018 year_2019
Time taken: 0.945 seconds
--two parameters
set hivevar:year_param1=2018;
set hivevar:year_param2=2019;
select `(year_)${year_param1}`, `(year_)${year_param2}` from test_col t;
OK
year_2018 year_2019
Time taken: 0.159 seconds
--parameter contains full column_name and using more strict regexp pattern
set hivevar:year_param2=year_2019;
select `^${year_param2}$` from test_col t;
OK
year_2019
Time taken: 0.053 seconds
--select all columns using single pattern year_ and four digits
select `^year_[0-9]{4}$` from test_col t;
OK
year_2018 year_2019
The parameter should be calculated and passed to the Hive script; functions like concat() or regexp_replace() are not supported inside column names.
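For example, if the queries above are saved in a script file (year_report.hql is a hypothetical name), the parameter can be supplied from the command line like this:
hive --hivevar year_param=2019 -f year_report.hql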
Also, column aliasing does not work for columns extracted using regular expressions:
select t.number_of_incidents, `^${year_param}$` as year1 from test_t t;
throws exception:
FAILED: SemanticException [Error 10004]: Line 1:30 Invalid table alias
or column reference '^year_2018$': (possible column names are:
number_of_incidents, year_2016, year_2017, year_2018)
I found a workaround to alias a column using UNION ALL with an empty dataset; see this test:
create table test_t(number_of_incidents int, year_2016 int, year_2017 int, year_2018 int);
insert into table test_t values(15, 12, 5, 1); --insert test data
insert into table test_t values(5,10,6,18);
--parameter, can be passed from outside the script from command line
set hivevar:year_param=year_2018;
--enable regex columns and print column names
set hive.support.quoted.identifiers=none;
set hive.cli.print.header=true;
--Alias column using UNION ALL with empty dataset
select sum(number_of_incidents*year1) incidents_year1
from
(--UNION ALL with empty dataset to alias columns extracted
select 0 number_of_incidents, 0 year1 where false --returns no rows because of false condition
union all
select t.number_of_incidents, `^${year_param}$` from test_t t
)s;
Result:
OK
incidents_year1
105
Time taken: 38.003 seconds, Fetched: 1 row(s)
The first query in the UNION ALL does not affect the data because it returns no rows, but its column names become the column names of the whole UNION ALL dataset and can be used in the outer query. This trick works. If you find a better workaround for aliasing columns extracted using a regexp, please add your solution as well.
Update:
There is no need for regular expressions if you can pass the full column name as a parameter. Hive substitutes variables as-is (it does not evaluate them) before query execution. Use regexp columns only if you cannot pass the full column name for some reason and, as in the original query, some pattern concatenation is needed. See this test:
--parameter, can be passed from outside the script from command line
set hivevar:year_param=year_2018;
select sum(number_of_incidents*${year_param}) incidents_year1 from test_t t;
Result:
OK
incidents_year1
105
Time taken: 63.339 seconds, Fetched: 1 row(s)
Related
I'm trying to use the strtok function to break out a concatenated string.
This is what I have so far. My source tables sit in Teradata and I'm running the code via SAS.
proc sql;
connect to teradata as tera (server='XXXXXX' authdomain='XXXXX');
execute(
update DB.Table1
from
(
select id, string_key
from DB.Table2
where date_time >= current_date
) c
set
country = STRTOK (c.string_key,',',1),
Expense = STRTOK (c.string_key,',',2),
First_Name = STRTOK (c.string_key,',',3)
) by tera;
disconnect from tera;
quit;
An example of a value in string_key is :
UK,244,Jack,Mathews
For the above example, my code has no problem creating the required output, i.e.:
| Country | Expense | First name |
|---------|---------|------------|
| UK      | 244     | Jack       |
However, in instances where string_key has an empty value after a delimiter, the strtok function returns the next available value in the wrong column.
For example, when string_key is:
UK,244,,Mathews
then the output I get is:
| Country | Expense | First name |
|---------|---------|------------|
| UK      | 244     | Mathews    |
but what I want is for the First_Name column to be empty, since there is no value for it in string_key, i.e.:
| Country | Expense | First name |
|---------|---------|------------|
| UK      | 244     |            |
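In other words, strtok seems to treat consecutive delimiters as one, so the empty token is simply skipped; a minimal illustration (assuming Teradata):
SELECT STRTOK('UK,244,,Mathews', ',', 3);   -- returns 'Mathews' instead of NULL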
Could someone please help tweak my code so that the column is populated with a null value when the string has an empty token?
Many thanks!
STRTOK doesn't play nicely with consecutive delimiters. You can replace consecutive delimiters with <delimiter><space><delimiter>. In your case:
trim(strtok(oreplace(string_key,',,',', ,'),',',3))
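Plugged into the update from the question, the SET clause would look something like this (a sketch; the OREPLACE trick handles a single empty token between two delimiters):
set
  country    = trim(strtok(oreplace(c.string_key, ',,', ', ,'), ',', 1)),
  Expense    = trim(strtok(oreplace(c.string_key, ',,', ', ,'), ',', 2)),
  First_Name = trim(strtok(oreplace(c.string_key, ',,', ', ,'), ',', 3))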
Alternatively, you could use csvld:
select *
from table (csvld(<table>.string_key, ',', '')
     returns (a varchar(100), b varchar(100), c varchar(100), d varchar(100))) as t;
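Applied to the table from the question, that would look something like this (a sketch; the RETURNS column names are illustrative):
select country, expense, first_name, last_name
from table (csvld(DB.Table2.string_key, ',', '')
     returns (country varchar(100), expense varchar(100), first_name varchar(100), last_name varchar(100))) as t;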
A RegEx should work:
SELECT 'UK,244,,Mathews' AS string_key
,REGEXP_SUBSTR(string_key, '(,|^)\K([^,]*)(?=,|$)',1,1)
,REGEXP_SUBSTR(string_key, '(,|^)\K([^,]*)(?=,|$)',1,2)
,REGEXP_SUBSTR(string_key, '(,|^)\K([^,]*)(?=,|$)',1,3)
,REGEXP_SUBSTR(string_key, '(,|^)\K([^,]*)(?=,|$)',1,4)
;
See RegEx101
I have a table cust_attbr with a column attbr which has values like:
{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
........ ... ...
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}
So, I have to write a select statement that fetches only the VAT_NUMBER value, like:
0835546003122
0835546003122
.... ... ..
null
With the sample data you posted:
SQL> select * From test;
ID ATTBR
---------- ----------------------------------------------------------------------------------------------------------------
1 "{"SRCTAXAMT":"11300",????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
2 "{"SRCTAXAMT":"11300", ?????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
3 "{"SRCTAXAMT":"11300", ?????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}"
this might be one option:
SQL> select id,
2 regexp_substr(regexp_substr(attbr, 'VAT_NUMBER":"(\d+)?'), '\d+$') vat
3 from test;
ID VAT
---------- --------------------
1 0835546003122
2 0835546003122
3
SQL>
The inner regexp_substr returns VAT_NUMBER followed by an optional number, while the outer one extracts only the digits, anchored to the end of that substring.
If you're on 18c and the data is actual JSON (it currently is not, because of the double quotes around the curly braces and the stray "กรุงค10110" fragment; it is unclear whether that is just an artifact of your sample data), you could use the json_table function:
WITH t (json_val) AS
(
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}' FROM DUAL
)
SELECT jt.*
FROM t,
JSON_TABLE(json_val, '$'
  COLUMNS (vat_number VARCHAR2(50 CHAR) PATH '$."VAT_NUMBER"')) jt;
0835546003122
0835546003122
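Against the actual table, the same query would look like this (a sketch; it assumes attbr really holds valid JSON, which, as noted, the posted sample currently does not):
SELECT jt.vat_number
FROM cust_attbr c,
     JSON_TABLE(c.attbr, '$'
       COLUMNS (vat_number VARCHAR2(50 CHAR) PATH '$."VAT_NUMBER"')) jt;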
One option would be to convert those column values to valid JSON syntax and then extract the value of the VAT_NUMBER key, provided the DB version is 12c Release 1+. The issue here is that the strings contain unrecognized characters (probably a Far Eastern alphabet) and are not properly quoted, so we first remove the part up to the TAXAMT key, prefix an opening curly brace ('{'), and then extract the VAT_NUMBER key's value with the JSON_VALUE() function:
SELECT JSON_VALUE(
'{'||REGEXP_REPLACE(str,'(.*10110",)(.*)','\2'),
'$.VAT_NUMBER'
) AS VAT_NUMBER
FROM tab --> your original data source
Demo
Using Remove part of string in table as an example, I want to replace part of a string in my database column with a different string.
Ex:
The database has E:\websites\nas\globe.png, E:\websites\nas\apple.png, etc.
I want it to say \\nas\globe.png, \\nas\apple.png, etc.
The only part I want to replace is E:\websites\, not the rest of the string.
How do I do this?
So far I have:
SELECT file_name,
REPLACE(file_name,'E:\websites\','\\nas\')
FROM t_class;
I just referenced http://nntp-archive.sybase.com/nntp-archive/action/article/%3C348_1744DC78C1045E920059DE7F85256A8B.0037D71C85256A8B#webforums%3E
and used:
SELECT REPLACE('E:\websites\web\Class\Main_Image\','E:\websites\web\Class\Main_Image\','\\nas\class_s\Main_Image\') "Changes"
FROM DUAL;
but once again nothing changed.
In Oracle, you may need to double up on the backslashes:
SELECT file_name,
REPLACE(file_name,'E:\\websites\\', '\\\\nas\\')
FROM t_class;
For the fun of it, using regexp_replace:
SQL> with tbl(filename) as (
2 select 'E:\websites\nas\globe.png' from dual
3 union
4 select 'E:\websites\nas\apple.png' from dual
5 )
6 select filename, regexp_replace(filename, 'E:\\websites', '\\') edited
7 from tbl;
FILENAME EDITED
------------------------- --------------------
E:\websites\nas\apple.png \\nas\apple.png
E:\websites\nas\globe.png \\nas\globe.png
SQL>
I found a reference at how to replace string values of column values by SQL Stored Procedure and solved it by doing the following:
UPDATE t_class SET file_name =
REPLACE
(file_name, 'E:\websites\web\Class\Main_Image\No_Image_Available.png', '\\nas\class_s\Main_Image\No_Image_Available.png');
so the only difference is the UPDATE and the = sign.
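The same idea generalizes to every file under that folder by replacing only the common prefix instead of the full file name (a sketch, reusing the prefixes from the statement above):
UPDATE t_class SET file_name =
REPLACE
(file_name, 'E:\websites\web\Class\Main_Image\', '\\nas\class_s\Main_Image\')
WHERE file_name LIKE 'E:\websites\web\Class\Main_Image\%';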
Example:
Suppose I have the following values in column1 across three rows:
10,9,2,3
12,9,8,9
16,2,9,2
I need to get the records where the value in the 2nd position of column1 is 9.
The result I am expecting is as follows:
10,9,2,3
12,9,8,9
Thanks
Rajasekar R
Try Regex
select * from a where a1 ~ '^\d+[,][9][,]';
or
select * from a where a1 ~ '^\d+,9,';
Both work flawlessly
For Input
ABC
DEF
HIJ
1,9,2,3
5,9,4,6
1,2,3,9
2,3,3,9
5,99,4,6
10,9,2,3
12,9,8,9
16,2,9,2
162,9,2
Output
1,9,2,3
5,9,4,6
10,9,2,3
12,9,8,9
162,9,2
I am using Postgres, so I used ~; for other databases you can use the corresponding regex operator or function.
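A regex-free alternative in Postgres is split_part, which extracts the N-th comma-separated element (a sketch against the same table a and column a1):
select * from a where split_part(a1, ',', 2) = '9';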
You can try the example below (SQL Server) for your case:
DECLARE #tempTable TABLE(Value NVARCHAR(100))
INSERT INTO #tempTable VALUES('10,9,2,3');
INSERT INTO #tempTable VALUES('12,9,8,9');
INSERT INTO #tempTable VALUES('16,2,9,2');
SELECT *
FROM #tempTable
WHERE SUBSTRING(Value,CHARINDEX(',', Value)+1,1) = '9'
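If the second element can ever have more than one digit, a variant that compares the whole token instead of a single character (a sketch; it assumes each value contains at least two commas):
SELECT *
FROM @tempTable
WHERE SUBSTRING(Value,
                CHARINDEX(',', Value) + 1,
                CHARINDEX(',', Value, CHARINDEX(',', Value) + 1) - CHARINDEX(',', Value) - 1) = '9';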
I have a varchar field with the following data:
Interface
Gig1/0/1
Gig1/0/3
Gig1/0/5
Gig1/0/10
Gig1/1/11
I am trying to do a search (BETWEEN).
Select * from test1 where Interface Between "Gig1/0/1" and "Gig1/0/5"
Returns all the records except for Gig1/1/11
What about using the substring_index function?
select substring_index(fieldname, '/', -1) as v from tablename
where substring_index(fieldname, '/', -1) between 1 and 5;  -- the alias v cannot be referenced in WHERE, so the expression is repeated
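Applied to the table in the question (a sketch; MySQL assumed, since substring_index is a MySQL function, and the comparison is numeric because the BETWEEN bounds are integers):
SELECT Interface
FROM test1
WHERE SUBSTRING_INDEX(Interface, '/', -1) BETWEEN 1 AND 5;
This keeps Gig1/0/1, Gig1/0/3 and Gig1/0/5 and drops Gig1/0/10 and Gig1/1/11.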