In a Hive query, can you specify a condition like "where column1 is INT"?

I would like to query a Hive table for only those rows where column1 holds an integer value. Because of some data corruption, I get a lot of junk data without this check. I would like to filter it out with a "where column1 is INT" kind of condition, but I couldn't find anything like that in Hive. Could anyone suggest how to do it?

Without any example data, I would suggest something very basic like this:
1. Define column X as STRING.
2. Check that X = cast(cast(X as INT) as STRING).
You may have to add some tolerance for whitespace, zero-padding, etc., depending on how your "integers" are actually formatted.
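To see what the round-trip check keeps and drops, here is a minimal Python sketch. The function `hive_like_cast_int` is a hypothetical emulation that assumes the cast returns NULL (None) for non-numeric text and for values outside the 32-bit INT range; it is not Hive's actual parser.

```python
import re
from typing import Optional

# Assumed Hive INT bounds (32-bit signed).
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def hive_like_cast_int(s: Optional[str]) -> Optional[int]:
    """Hypothetical emulation of cast(s as INT): None (NULL) for NULL input,
    non-numeric text, or values outside the INT range."""
    if s is None or not re.fullmatch(r"[+-]?[0-9]+", s):
        return None
    n = int(s)
    return n if INT_MIN <= n <= INT_MAX else None


def is_clean_int(s: Optional[str]) -> bool:
    """Mirror of X = cast(cast(X as INT) as STRING).
    A NULL cast makes the SQL comparison NULL, which WHERE treats as false."""
    n = hive_like_cast_int(s)
    return n is not None and str(n) == s


rows = ["42", "-7", "007", "12abc", " 5", None, "99999999999"]
kept = [r for r in rows if is_clean_int(r)]
print(kept)  # ['42', '-7']
```

Note how "007" is rejected: the cast succeeds, but the string form comes back as "7", which is exactly the zero-padding tolerance issue mentioned above.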

Found a solution that works:
I could add a DOUBLE check like the one below: anything that is not a number makes the cast return NULL. Also, the valid values in this column will never exceed the DOUBLE range.
So something like this should work:
select * from table_example where cast(column1 as double) is not null

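A caveat worth checking with the DOUBLE approach: it accepts any numeric text, not only integers. The sketch below uses Python's `float()` as a stand-in for `cast(... as double)` (an assumption; Hive's accepted syntax may differ in edge cases) to show that decimals and exponent notation pass, and how an extra "is it integral?" test narrows the result.

```python
from typing import Optional


def cast_double(s: Optional[str]) -> Optional[float]:
    """Stand-in for cast(s as double): None (NULL) when the text is not numeric.
    Python's float() parser is an approximation, not Hive's."""
    if s is None:
        return None
    try:
        return float(s)
    except ValueError:
        return None


rows = ["42", "3.5", "1e3", "abc", None]
passing = [r for r in rows if cast_double(r) is not None]
print(passing)  # ['42', '3.5', '1e3']

# The DOUBLE check also lets decimals through; keep only integral values.
whole = [r for r in passing if cast_double(r).is_integer()]
print(whole)  # ['42', '1e3']
```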

Adding column to table based on whether another column = a specific string

I want to add a column called "Sweep" that contains bools based on whether the "Result" was a sweep or not. So I want the value in the "Sweep" column to be True if the "Result" is '4-0' or '0-4' and False if it isn't.
This is a part of the table:
I tried this:
ALTER TABLE "NBA_finals_1950-2018"
ADD "Sweep" BOOL;
UPDATE "NBA_finals_1950-2018"
SET "Sweep" = ("Result" = '4-0' OR "Result" = '0-4');
But for some reason, when I run this code...:
SELECT *
FROM "NBA_finals_1950-2018"
ORDER BY "Year";
...only one of the rows (the last one) has the value True, even though there are other rows where the result is a sweep ('4-0' or '0-4').
I don't know why this is happening but I guess there is something wrong with the UPDATE...SET code. Please help.
Thanks in advance.
NOTE: I am using PostgreSQL 13
This would occur if the strings are not really what they look like. This is often due to spaces at the beginning or end, or perhaps to the hyphens being a different character, or to other look-alike characters.
You just need to find the right pattern. Do so with a SELECT first. This will not return the problem rows:
select *
from "NBA_finals_1950-2018"
where "Result" in ('4-0', '0-4');
You can try:
where "Result" like '%0-4%' or
"Result" like '%4-0%'
But this should do what you want:
where "Result" like '%4%' and
"Result" like '%0%'
because the numbers are all single digits.
You can incorporate this into the update statement.
Note: double-quoted identifiers are a bad idea. I would recommend creating tables and columns with names that don't need to be quoted.
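The look-alike-character explanation can be reproduced with a small experiment. This sketch uses SQLite (via Python's `sqlite3`) as a stand-in for PostgreSQL 13, and the rows are hypothetical: one result has a trailing space and one uses an en dash (U+2013). It compares the exact-match UPDATE with the LIKE-based pattern from the answer and uses `length()` to expose the hidden trailing space.

```python
import sqlite3

# Hypothetical rows: one with a trailing space, one with an en dash (U+2013).
rows = [(1950, "4-2"), (1959, "4-0 "), (1971, "4\u20130"), (2018, "4-0")]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE finals ("Year" INTEGER, "Result" TEXT, "Sweep" INTEGER)')
con.executemany('INSERT INTO finals ("Year", "Result") VALUES (?, ?)', rows)


def sweep_years():
    """Years whose Sweep flag is true (SQLite stores the boolean as 1/0)."""
    return [y for (y,) in con.execute('SELECT "Year" FROM finals WHERE "Sweep" ORDER BY "Year"')]


# Exact comparison, as in the question: look-alike strings do not match.
con.execute("""UPDATE finals SET "Sweep" = ("Result" = '4-0' OR "Result" = '0-4')""")
exact = sweep_years()

# Pattern from the answer: contains a 4 and a 0 (valid because scores are single digits).
con.execute("""UPDATE finals SET "Sweep" = ("Result" LIKE '%4%' AND "Result" LIKE '%0%')""")
pattern = sweep_years()

# Checking lengths is a quick way to spot invisible characters.
lengths = dict(con.execute('SELECT "Year", length("Result") FROM finals'))

print(exact)    # [2018]
print(pattern)  # [1959, 1971, 2018]
```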

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and is included in a full-text index. Most of the values are numeric; only a few are not.
The problem is that the search doesn't behave as expected. For example, if a row contains the value '123456789' and I search for '567', that row is not returned. Only rows where the value is exactly '567' are returned.
What am I doing wrong?
SQL Server 2012.
Thanks.
Full-text search doesn't support leading wildcards.
In my setup, these two return the same rows:
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This returns zero rows:
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancier trigram-based approach.
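For readers unfamiliar with the "trigram approach", here is a toy in-memory sketch of the general idea. `TrigramIndex` is a hypothetical illustration, not a SQL Server feature: every value is split into overlapping 3-character substrings, a search intersects the rows that contain all of the needle's trigrams, and each candidate is then confirmed with a real substring test, because sharing trigrams does not guarantee a match.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(s: str) -> Set[str]:
    """All overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Toy trigram index for substring search (illustrative only)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # trigram -> row ids
        self.rows: Dict[int, str] = {}

    def add(self, row_id: int, value: str) -> None:
        self.rows[row_id] = value
        for g in trigrams(value):
            self.postings[g].add(row_id)

    def search(self, needle: str) -> List[int]:
        grams = trigrams(needle)
        if grams:
            # Rows containing every trigram of the needle.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            # Needle shorter than 3 characters: no trigrams, fall back to a full scan.
            candidates = set(self.rows)
        # Confirm each candidate with an actual substring check.
        return sorted(r for r in candidates if needle in self.rows[r])


idx = TrigramIndex()
for rid, val in enumerate(["123456789", "567", "5 6 7", "905670"]):
    idx.add(rid, val)
print(idx.search("567"))  # [0, 1, 3]
```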
The problem is probably that you are using the wrong tool: full-text queries perform linguistic searches, and it seems you want a simple LIKE condition.
If you want a solution tailored to your needs, post the DDL, the DML and the desired result.
You can do this:
....your_query.... LIKE '%567%' ;
This will return all the rows that contain 567 at the beginning, at the end, or anywhere in between.
Most likely (99%) you're missing the % before and after the string you search for in the LIKE clause.
e.g.:
SELECT * FROM t WHERE att LIKE '66'
is the same as using WHERE att = '66'
If you write:
SELECT * FROM t WHERE att LIKE '%66%'
it will return all the rows containing two consecutive sixes.
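The effect of where the % wildcards go can be checked quickly. This sketch uses SQLite through Python's `sqlite3` as a stand-in engine (an assumption; the question is about SQL Server, but LIKE wildcards behave the same way for these patterns), with hypothetical sample values. Only the pattern with % on both sides finds '567' in the middle of '123456789'. Keep in mind that a leading % prevents an index seek, so on 200 million rows such a query scans.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (att TEXT)")
values = ["123456789", "567", "5678", "66", "1667", "606"]  # hypothetical data
con.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])


def match(pattern):
    """Values matching a LIKE pattern, in insertion order."""
    return [v for (v,) in con.execute(
        "SELECT att FROM t WHERE att LIKE ? ORDER BY rowid", (pattern,))]


print(match("567"))    # ['567']  (no wildcard: same as equality)
print(match("567%"))   # ['567', '5678']  (prefix only)
print(match("%567%"))  # ['123456789', '567', '5678']  (anywhere)
print(match("%66%"))   # ['66', '1667']
```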

SQL Query - Greater Than with Text Data Type

I've searched around and couldn't find an answer anywhere.
I'm querying a database that has stored numbers as a VARCHAR2 data type. I'm trying to find numbers that are greater than 1450000 (where BI_SO_NBR > '1450000'), but this doesn't bring back the results I'm expecting.
I'm assuming it's because the value is stored as text and I don't know any way to get around it.
Is there some way to convert the field to a number in my query, or some other trick that would work? Hopefully this makes sense.
I'm fairly new to SQL.
Thanks in advance.
If the number is too long to be converted correctly to a number, and it is always an integer with no left padding of zeroes, then you can also do:
where length(BI_SO_NBR) > length('1450000') or
(length(BI_SO_NBR) = length('1450000') and
BI_SO_NBR > '1450000'
)
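Why the plain text comparison misbehaves, and why the length trick fixes it, can be shown in a few lines. This Python sketch assumes, as the answer does, unpadded non-negative integer strings; Python's string comparison of ASCII digits orders them the same way as a VARCHAR2 comparison of the same characters.

```python
def greater_as_text(a: str, b: str) -> bool:
    """Plain string comparison, like BI_SO_NBR > '1450000' on a text column."""
    return a > b


def greater_by_length(a: str, b: str) -> bool:
    """The length trick: a longer digit string is a bigger number;
    equal lengths can safely be compared as text.
    Valid only for unpadded, non-negative integers."""
    return len(a) > len(b) or (len(a) == len(b) and a > b)


values = ["999", "1450001", "14500000", "2", "1449999"]
print([v for v in values if greater_as_text(v, "1450000")])
# ['999', '1450001', '14500000', '2']   ('999' and '2' are wrong hits)
print([v for v in values if greater_by_length(v, "1450000")])
# ['1450001', '14500000']
```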
You can try this:
where to_number(BI_SO_NBR) > 1450000
This assumes you are using an Oracle database. Also see the TO_NUMBER function documentation.
EDIT:
You can try this (after the OP commented that it worked):
where COALESCE(TO_NUMBER(REGEXP_SUBSTR(BI_SO_NBR, '^\d+(\.\d+)?')), 0) > 1450000
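The edited expression takes the leading numeric part of each value, and falls back to 0 when there is none, so non-numeric rows are excluded instead of raising an error. A rough Python emulation (an assumption for illustration, not Oracle's implementation): `re.match` anchors at the start of the string just like the `^` in the pattern, and `Decimal` stands in for Oracle's NUMBER. Sample values are hypothetical.

```python
import re
from decimal import Decimal

# Same pattern as the answer; re.match anchors it at the start like '^'.
LEADING_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")


def leading_number_or_zero(s):
    """Emulates COALESCE(TO_NUMBER(REGEXP_SUBSTR(s, '^\\d+(\\.\\d+)?')), 0)."""
    m = LEADING_NUMBER.match(s) if s is not None else None
    return Decimal(m.group(0)) if m else Decimal(0)


values = ["1450001", "1500000 units", "N/A", "999.5", None]
print([v for v in values if leading_number_or_zero(v) > 1450000])
# ['1450001', '1500000 units']
```

Note that a value such as "ABC123" maps to 0, because the pattern only looks at the start of the string.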
If you are talking about Oracle, then:
where to_number(bi_so_nbr) > 1450000
However, there are 2 issues with this:
1. If any value in bi_so_nbr cannot be converted to a number, the query can fail with an error (ORA-01722: invalid number).
2. The query will not use an index on bi_so_nbr, if there is one. You could solve this by creating a function-based index, but changing the column's data type from VARCHAR2 to NUMBER would be a better solution.

SQL - Conditionally joining two columns in same table into one

I am working with a table that contains two versions of stored information. To simplify it, one column contains the old description of a file run, while another column contains the updated standard for displaying run files. It gets more complicated in that the old column can itself contain multiple standards. The table:
Old Column          New Column
Desc: LGX/101/rpt   null
null                Home
Print: LGX/234/rpt  null
null                Print
null                Page
I need to combine the two columns into one, but I also need to delete the "Print: " and "Desc: " string from the beginning of the old column values. Any suggestions? Let me know if/when I'm forgetting something you need to know!
(I am writing in Cache SQL, but I'd just like a general approach to my problem, I can figure out the specifics past that.)
EDIT: the condition is that if substr(oldcol,1,6) = 'Desc: ' then substr(oldcol,7)
else if substr(oldcol,1,7) = 'Print: ' then substr(oldcol,8), etc., so as to take out the "Desc: " and "Print: " prefixes and sanitize the data somewhat.
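The prefix-stripping condition can be sketched in Python to check the string lengths involved: "Desc: " is 6 characters and "Print: " is 7, so in SQL's 1-based `substr` the remainder starts at position 7 and 8 respectively. The `PREFIXES` tuple is taken from the sample data; any further prefixes hidden behind "etc." would have to be added.

```python
PREFIXES = ("Desc: ", "Print: ")  # from the sample data; extend as needed


def sanitize(old: str) -> str:
    """Strip the first known prefix, if any.
    len("Desc: ") == 6 and len("Print: ") == 7, which corresponds to
    substr(oldcol, 7) and substr(oldcol, 8) in 1-based SQL."""
    for p in PREFIXES:
        if old.startswith(p):
            return old[len(p):]
    return old


print([sanitize(s) for s in ["Desc: LGX/101/rpt", "Print: LGX/234/rpt", "Home"]])
# ['LGX/101/rpt', 'LGX/234/rpt', 'Home']
```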
EDIT2: I want to make the table look like this:
Col
LGX/101/rpt
Home
LGX/234/rpt
Print
Page
It's difficult to understand exactly what you are looking for. Does the above represent before/after, or both columns that need combining/merging?
My guess is that COALESCE might help you. It takes any number of arguments and returns the first one that is not NULL.
It looks like you want to take the value from new if old is NULL, and from old if new is NULL. To do that you can use a CASE expression in your SQL. I know CASE is supported by MySQL; I'm not sure whether it will help you here.
SELECT (CASE WHEN old_col IS NULL THEN new_col ELSE old_col END) as val FROM table_name
This will grab new_col if old_col is NULL, otherwise it will grab old_col.
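For this data the CASE expression and COALESCE(old_col, new_col) produce the same result. A small Python sketch of the two, with `None` standing in for SQL NULL and the rows copied from the question's sample; `coalesce` is a hypothetical helper written for the illustration.

```python
def coalesce(*args):
    """Return the first argument that is not None (SQL NULL), else None."""
    return next((a for a in args if a is not None), None)


rows = [("Desc: LGX/101/rpt", None), (None, "Home"),
        ("Print: LGX/234/rpt", None), (None, "Print"), (None, "Page")]

# CASE WHEN old_col IS NULL THEN new_col ELSE old_col END
case_version = [new if old is None else old for old, new in rows]
# COALESCE(old_col, new_col)
coalesce_version = [coalesce(old, new) for old, new in rows]

print(coalesce_version)
# ['Desc: LGX/101/rpt', 'Home', 'Print: LGX/234/rpt', 'Print', 'Page']
```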
You can remove the "Print:" and "Desc:" prefixes with a combination of the CHARINDEX and SUBSTRING functions. Here it is:
SELECT CASE WHEN CHARINDEX(':',COALESCE(OldCol,NewCol)) > 0 THEN
LTRIM(SUBSTRING(COALESCE(OldCol,NewCol),CHARINDEX(':',COALESCE(OldCol,NewCol))+1,8000))
ELSE
COALESCE(OldCol,NewCol)
END AS Newcolvalue
FROM [SchemaName].[TableName]
CHARINDEX gives the position of the character or string you are searching for.
So you get the position of ":" in the computed value (the COALESCE part) and pass it to SUBSTRING, adding 1 so that SUBSTRING returns the part after the ":". LTRIM then removes the space that follows the colon, leaving a string without "Desc: " and "Print: ".
Hope this helps.
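Here is a runnable approximation of the CHARINDEX/SUBSTRING idea, using SQLite through Python's `sqlite3` (an assumption: SQLite's `instr()`, `substr()` and `ltrim()` stand in for SQL Server's CHARINDEX, SUBSTRING and LTRIM, and for whatever Caché SQL provides). The table name `files` is hypothetical; the rows come from the question. Note the `ltrim()`: without it, each stripped value keeps the space that follows the colon.

```python
import sqlite3

rows = [("Desc: LGX/101/rpt", None), (None, "Home"),
        ("Print: LGX/234/rpt", None), (None, "Print"), (None, "Page")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (OldCol TEXT, NewCol TEXT)")  # hypothetical table
con.executemany("INSERT INTO files VALUES (?, ?)", rows)

query = """
SELECT CASE WHEN instr(COALESCE(OldCol, NewCol), ':') > 0
            THEN ltrim(substr(COALESCE(OldCol, NewCol),
                              instr(COALESCE(OldCol, NewCol), ':') + 1))
            ELSE COALESCE(OldCol, NewCol)
       END AS Col
FROM files
ORDER BY rowid
"""
combined = [c for (c,) in con.execute(query)]
print(combined)  # ['LGX/101/rpt', 'Home', 'LGX/234/rpt', 'Print', 'Page']
```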

Return rows where first character is non-alpha

I'm trying to retrieve all rows where TestNames starts with a non-alphabetic character in SQLite, but can't get it working. This is my current code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going by the first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (called LEFT() in some SQL dialects) will help isolate the first character of the string.
Edit:
Maybe substr(TestNames,1,1) in SQLite; I don't have an instance ready to test the syntax.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
This doesn't seem optimal, but functionally it will work. I'm unsure what functions SQLite has for matching a range of letters.
I used upper() so you don't need to list the lowercase letters in the NOT IN list (SQLite does provide upper()).
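A point the answers leave implicit: SQLite's LIKE has no bracket character classes, so `'[A-z]%'` is matched literally, which is why the original NOT LIKE query returns every row. SQLite's GLOB operator does support classes, including `[^...]` negation, and it is case-sensitive. The sketch below, run against SQLite via Python's `sqlite3` with hypothetical sample names, compares the original query, a GLOB version, and the upper()/NOT IN idea with the alphabet supplied as bound parameters.

```python
import sqlite3
import string

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TestTable (TestNames TEXT)")
names = ["Alice", "bob", "1st place", "_temp", "#tag", " space"]  # hypothetical
con.executemany("INSERT INTO TestTable VALUES (?)", [(n,) for n in names])


def fetch(sql, params=()):
    return [n for (n,) in con.execute(sql, params)]


# Original query: LIKE treats '[' literally, so NOT LIKE matches every row.
like_rows = fetch("SELECT TestNames FROM TestTable "
                  "WHERE TestNames NOT LIKE '[A-z]%' ORDER BY rowid")

# GLOB supports [^...]; both letter ranges are listed because GLOB is case-sensitive.
glob_rows = fetch("SELECT TestNames FROM TestTable "
                  "WHERE TestNames GLOB '[^A-Za-z]*' ORDER BY rowid")

# The NOT IN idea, with all 26 uppercase letters bound as parameters.
letters = list(string.ascii_uppercase)
placeholders = ",".join("?" * len(letters))
not_in_rows = fetch("SELECT TestNames FROM TestTable "
                    f"WHERE upper(substr(TestNames, 1, 1)) NOT IN ({placeholders}) "
                    "ORDER BY rowid", letters)

print(glob_rows)  # ['1st place', '_temp', '#tag', ' space']
```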
Try:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
This SHOULD be a little faster, because it:
A) first takes only the first character of the column, then scans it;
B) is still a full-table scan unless you index this column.
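One more detail about the `[A-z]` range used above: in code-point order it also spans six punctuation characters between 'Z' and 'a', so a range like this treats '_' as a "letter". The sketch checks this in Python and in SQLite's GLOB, whose ranges compare code points. Whether SQL Server's LIKE behaves the same way depends on the collation, so treat that part as an assumption; spelling out `[A-Za-z]` avoids the issue either way.

```python
import sqlite3

# Characters between 'A' (U+0041) and 'z' (U+007A) that are not letters.
between = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(between)  # ['[', '\\', ']', '^', '_', '`']

con = sqlite3.connect(":memory:")
# In SQLite's GLOB, '_' falls inside [A-z] but not inside [A-Za-z].
glob_results = con.execute("SELECT '_x' GLOB '[A-z]*', '_x' GLOB '[A-Za-z]*'").fetchone()
print(glob_results)  # (1, 0)
```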