Exclude the row which has numeric characters, only at the beginning of the row - sql

I have a table similar to the one below:
product
01 apple
02 orange
banana 10
I am trying to exclude only the rows that start with a number. If the number is not at the beginning, the row should not be excluded. The desired output looks like this:
product
banana 10
However, my current query excludes a row as soon as the digit appears anywhere in it:
SELECT *
FROM table
WHERE product NOT LIKE '%0%'
Could anyone please suggest how to tackle this? Much appreciated.

Something like this maybe:
SELECT *
FROM table
WHERE left(product, 1) NOT IN ('0','1','2','3','4','5','6','7','8','9')

A regex that matches lines that don't start with a number is
^[^0-9].*
A query in MySQL would look like
SELECT *
FROM table
WHERE product RLIKE '^[^0-9].*'

I would recommend regular expressions. In Redshift, this looks like:
where product ~ '^[^0-9]'
I might also suggest:
where left(product, 1) not between '0' and '9'
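To sanity-check the first-character test against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table name products and the helper name are assumptions made for illustration, and substr(product, 1, 1) is used in place of left(product, 1); the comparison itself is the same idea as the answers above.

```python
import sqlite3

def rows_not_starting_with_digit(products):
    """Return the products whose first character is not a digit 0-9."""
    conn = sqlite3.connect(":memory:")
    # "products" is a hypothetical table name used only for this sketch.
    conn.execute("CREATE TABLE products (product TEXT)")
    conn.executemany("INSERT INTO products VALUES (?)", [(p,) for p in products])
    rows = conn.execute(
        "SELECT product FROM products "
        "WHERE substr(product, 1, 1) NOT BETWEEN '0' AND '9'"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

result = rows_not_starting_with_digit(["01 apple", "02 orange", "banana 10"])
print(result)
```

Only the first character is inspected, so a trailing number such as the one in "banana 10" does not affect the result.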

Related

Select distinct instances substring from a column

I have a column containing substring variations of 'SN'. I just want to see the distinct variations of this plus 1 character on either side, not the whole field.
Field1
SYMBOL
1
safsdafsadfs aSN fsadfsadf
2
sadfsdafb_SN0 sdfsadfsadf
3
adsfsjSN
4
23 SN dfe
So I'd want to see; aSN, _SN0, jSN, SN .
The below returns each field.
SELECT DISTINCT SYMBOL
FROM table
WHERE SYMBOL LIKE '%SN%'
I checked SQL: selecting distinct substring from a field and some others but no luck. Any help appreciated. This is using Netezza DB.
UPDATED (I hadn't noticed that you are using Netezza):
Maybe the query you are looking for is (not tested):
SELECT DISTINCT substr(symbol, strpos(symbol, 'SN')-1 , 4)
The length is 4 because you want 'SN' plus one character on either side. BUT... this only looks at the first match, so you will have a problem if the Symbol field contains another "SN" earlier in the string, for example "SNafsdafsadfs aSN fsadfsadf". In that case you cannot know which one is the correct SN.
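If the values can be pulled out of the database, the "several SN in one field" problem goes away, because every occurrence can be collected rather than only the first. The following is a rough sketch in Python, not a Netezza solution; the function name sn_variants is made up for illustration, and the match is case-sensitive.

```python
import re

def sn_variants(values, token="SN"):
    """Collect each occurrence of token with up to one character on either side."""
    # ".?" takes the neighbouring character when there is one, so a match at the
    # start or end of the string is still reported (just shorter).
    pattern = re.compile(".?" + re.escape(token) + ".?")
    found = set()
    for value in values:
        found.update(pattern.findall(value))
    return found

symbols = [
    "safsdafsadfs aSN fsadfsadf",
    "sadfsdafb_SN0 sdfsadfsadf",
    "adsfsjSN",
    "23 SN dfe",
]
variants = sn_variants(symbols)
print(sorted(variants))
```

Note that the surrounding character may be a space, so the first and last rows yield "aSN " and " SN " with the spaces kept.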

select row based on what a substring in a column might contain

I'm looking to select the primary key of a row and I've only got a column that contains info (in a substring) that I need to select the row.
E.g. MyTable
ID | Label
------------
11 | 1593:#:#:RE: test
12 | 1239#:#:#some more random text
13 | 12415#:#:#some more random text about the weather
14 | 369#:#:#some more random text about the StackOverflow
The label column always has a delimiter of :#:#:
So I guess I'd need to split this column by the delimiter and grab the first part of the label (i.e. the number I'm looking for) to get the ID I want.
So, if I wanted the row with an ID of 14, I'd write:
Select ID from MyTable
where *something* = '369'
Any ideas on how to construct that *something*, or how best to go about this? :)
I'm completely stumped and haven't been able to find out how to do this.
Thanks,
How about:
WHERE label LIKE '369#%'?
No reason to get fancy.
That said, if you are going to do this search often, consider splitting that value out into its own column as part of your ETL process and indexing it.
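The pre-split step suggested above could look roughly like the sketch below. It assumes the number is always the leading run of digits in the label, which holds for the sample rows whichever delimiter variant follows it; the function names are hypothetical.

```python
import re

def leading_number(label):
    """Return the leading run of digits in a label, or None if there is none."""
    match = re.match(r"\d+", label)
    return match.group(0) if match else None

def find_ids(rows, wanted):
    """Return the IDs whose label starts with exactly the wanted number."""
    return [row_id for row_id, label in rows.items() if leading_number(label) == wanted]

my_table = {
    11: "1593:#:#:RE: test",
    12: "1239#:#:#some more random text",
    13: "12415#:#:#some more random text about the weather",
    14: "369#:#:#some more random text about the StackOverflow",
}
print(find_ids(my_table, "369"))
```

Comparing the whole extracted number, rather than a prefix, keeps a search for "369" from also matching a label that starts with "3690".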

SQL list only unique / distinct values

I have a table which contains geometry lines (ways). Some lines have a unique geometry (not repeating) and some share the same geometry (2, 3, 4 or more copies). I want to list only the unique ones. If there are, for example, 2 lines with the same geometry, I want to drop both of them. I tried DISTINCT, but it also shows the first result from each group of duplicated lines. I only want to see the unique ones.
I tried a window function but I get a similar result (I get a counter on the first line of each duplicated group). Sorry for the newbie question, but I'm learning :) Thanks!
Example:
way|
1 |
1 |
2 |
3 |
3 |
4 |
Result should be:
way|
2 |
4 |
That actually worked. Thanks a lot. I also have other tags in this table for every way (name, ref and a few others). When I add them to the query I lose the segregation.
select count(way), way, name
from planet_osm_line
group by way, name
having count(way) = 1;
Without "name" in the query I get all unique values listed, but I want to keep "name" for every line. With this example I still get all the lines in the table listed.
To expound on @Nithila's answer:
select count(way), way
from your_table
group by way
having count(way) = 1;
You first calculate the rows you want, and then look up the rest of the fields, so the aggregation doesn't cause you problems.
WITH singleRow as (
select count(way), way
from planet_osm_line
group by way
having count(way) = 1
)
SELECT P.*
FROM planet_osm_line P
JOIN singleRow S
ON P.way = S.way
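The same two-step idea can be tried on the sample data with Python's built-in sqlite3 module as a stand-in. This is only a sketch: the way column is a plain integer here rather than a geometry, and the name values are invented for the demonstration.

```python
import sqlite3

def unique_ways(rows):
    """Return (way, name) rows whose way value occurs exactly once."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for planet_osm_line: way is an integer, not a geometry.
    conn.execute("CREATE TABLE planet_osm_line (way INTEGER, name TEXT)")
    conn.executemany("INSERT INTO planet_osm_line VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH single_row AS (
            SELECT way FROM planet_osm_line GROUP BY way HAVING COUNT(*) = 1
        )
        SELECT p.way, p.name
        FROM planet_osm_line p
        JOIN single_row s ON p.way = s.way
        ORDER BY p.way
        """
    ).fetchall()
    conn.close()
    return result

sample = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "d"), (4, "e")]
unique = unique_ways(sample)
print(unique)
```

Because the grouping happens only on way inside the CTE, adding name (or any other tag) to the outer SELECT does not change which rows count as unique.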
You can group by way and keep only the groups where the count is 1. That gives you the non-duplicated data.
@voyteck
As I understand your question, you need only the non-duplicated values of the way column, and for each of those rows you also want to show the name.
If so, add the other columns to the select statement, but don't add them to the group by. Since each remaining group holds exactly one row, wrapping the extra column in an aggregate such as min() returns its single value.
select count(way), way, min(name) as name
from planet_osm_line
group by way
having count(way) = 1;

Comma delimited values sql

From my research online I have discovered two answers to this question which I am trying to stay away from.
I cannot modify the table or add a new table because the software is third party and needs the table to remain unmodified.
I am trying to stay away from using temporary tables or extra user defined functions.
Here is my issue.
There is a column in the database that is a list of comma-delimited numbers representing days of the week, i.e. (1,2,4,5,7).
I am trying to find a way to read that data and find out if there are any rows where that column represents days that are 3 consecutive days.
It should return anything with
1,2,3
2,3,4
3,4,5
5,6,7
1,,,,,6,7
1,2,,,,,7
But if the column has 1,2,3,4 it should not return twice. There are a lot of rows that have 2,3,4,5,6 and any solution I've come up with will return that 3 times.
Preferably, I would like to create a stored procedure to pass in a number and look for that number of consecutive days. So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
Is there another option other than using extra tables? If so, can you show me how to make this work? I am not new to SQL, but there are a lot of more advanced querying techniques I am not familiar with.
The following brute force method will work in all databases:
select (case when col like '%1%' and col like '%2%' and col like '%3%' then 1
when col like '%2%' and col like '%3%' and col like '%4%' then 1
when col like '%3%' and col like '%4%' and col like '%5%' then 1
when col like '%4%' and col like '%5%' and col like '%6%' then 1
when col like '%5%' and col like '%6%' and col like '%7%' then 1
when col like '%6%' and col like '%7%' and col like '%1%' then 1
when col like '%7%' and col like '%1%' and col like '%2%' then 1
else 0
end) as HasThreeConsecutiveDays
It returns a 0/1 flag if three days are consecutive.
So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
You won't be able to do that without dynamic sql, because you want to support wrapping from 7 back to 1. I could write a query that would do it for you in a single statement if you didn't care about wrapping from the end of the week back to the beginning, but with that requirement I don't see how to do it without building a dynamic sql string in the procedure, which I don't have time to play with right now (maybe someone else will take that idea and run with it).
With that option defeated for now, I can do this instead:
WHERE
( col like '1,2,3%'
OR col like '%2,3,4%'
OR col like '%3,4,5%'
OR col like '%4,5,6%'
OR col like '%5,6,7'
OR col like '1%6,7'
OR col like '1,2%7'
)
This should be better than checking individual numbers as shown in another answer, because there are fewer pattern matches to complete. However, it only works if we can guarantee the sort order. We also need to know in advance how the commas are spaced between numbers, but we can fix that issue if necessary by replacing all commas and/or spaces with an empty string (and adjusting the patterns accordingly).
One more thought here: I realized that I can support a day count argument, if you can manage sneaking an additional table into the db somewhere. The table would look something like this:
create Table DayPatterns (Days int, Pattern varchar(13) )
and the data in the table would look like this:
1 1%
1 %2%
1 %3%
...
2 1,2%
2 %2,3%
2 %3,4%
2 %4,5%
...
2 1%7
...
3 1,2,3%
3 %2,3,4%
...
3 1%6,7
3 1,2%7
...
7 1,2,3,4,5,6,7
Hopefully you get the idea on how to fill that out. With that table in hand, you can JOIN against the table with a query like this:
INNER JOIN DayPatterns p ON p.Days = @ConsecutiveDays AND col LIKE p.Pattern
The key to making that work (aside from needing to be able to create that table somewhere) is also doing a GROUP BY on the correct columns. Otherwise, you'll end up with the same problem you have right now, where matching multiple possible consecutive day patterns will duplicate your results.
Finally, of course you know that most any schema that includes csv data is broken, but since you can't seem to fix this, hopefully one of these ideas will help.
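For comparison, if the check can run outside the database, the wrap-around case and the variable run length are both simple to express. The sketch below is plain Python, not a stored procedure, and the function name has_consecutive_days is made up for illustration. It returns a single yes/no per value, so a row such as 2,3,4,5,6 is reported once.

```python
def has_consecutive_days(csv_days, run_length, week_length=7):
    """True if the comma-delimited day list contains run_length consecutive days.

    Days are numbered 1..week_length and the week wraps around, so 6,7,1 counts
    as three consecutive days. Empty entries such as "1,,,,,6,7" are ignored.
    """
    days = {int(token) for token in csv_days.split(",") if token.strip()}
    if not 1 <= run_length <= week_length:
        return False
    for start in range(1, week_length + 1):
        # Walk run_length days forward from start, wrapping past the week's end.
        run = (((start - 1 + offset) % week_length) + 1 for offset in range(run_length))
        if all(day in days for day in run):
            return True
    return False

print(has_consecutive_days("1,,,,,6,7", 3))
print(has_consecutive_days("2,3,4,5,6", 5))
```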

Count particular substring text within column

I have a Hive table, titled 'UK.Choices' with a column, titled 'Fruit', with each row as follows:
AppleBananaAppleOrangeOrangePears
BananaKiwiPlumAppleAppleOrange
KiwiKiwiOrangeGrapesAppleKiwi
etc.
etc.
There are 2.5M rows and the rows are much longer than the above.
I want to count the number of instances that the word 'Apple' appears.
For example above, it is:
Number of 'Apple'= 5
My sql so far is:
select 'Fruit' from UK.Choices
Then, in chunks of 300,000, I copy and paste into Excel, where I'm more proficient and able to do this using formulas. The problem is that it takes up to an hour and a half to generate each chunk of 300,000 rows.
Anyone know a quicker way to do this bypassing Excel? I can do simple things like counts using where clauses, but something like the above is a little beyond me right now. Please help.
Thank you.
I think I am 2 years too late. But since I was looking for the same answer and I finally managed to solve it, I thought it was a good idea to post it here.
Here is how I do it.
Solution 1:
+-----------------------------------+---------------------------+-------------+-------------+
| Fruits | Transform 1 | Transform 2 | Final Count |
+-----------------------------------+---------------------------+-------------+-------------+
| AppleBananaAppleOrangeOrangePears | #Banana#OrangeOrangePears | ## | 2 |
| BananaKiwiPlumAppleAppleOrange | BananaKiwiPlum##Orange | ## | 2 |
| KiwiKiwiOrangeGrapesAppleKiwi | KiwiKiwiOrangeGrapes#Kiwi | # | 1 |
+-----------------------------------+---------------------------+-------------+-------------+
Here is the code for it:
SELECT length(regexp_replace(regexp_replace(fruits, "Apple", "#"), "[A-Za-z]", "")) as number_of_apples
FROM fruits;
You may have numbers or other special characters in your fruits column, and you can modify the second regexp to account for them. Just remember that in Hive, to escape a character you may need to use \\ instead of a single \.
Solution 2:
SELECT size(split(fruits,"Apple"))-1 as number_of_apples
FROM fruits;
This first splits the string using "Apple" as a separator, which produces an array. The size function then returns the size of that array. Note that the size of the array is one more than the number of separators, hence the -1.
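Both counting tricks are easy to check on the three sample rows outside Hive. The sketch below mirrors them in plain Python; the helper names are invented for illustration, and the replace-based version assumes the marker character "#" never appears in the data.

```python
import re

def count_by_replace(text, word, marker="#"):
    """Solution 1: replace the word with a marker, drop letters, count what is left."""
    marked = text.replace(word, marker)
    return len(re.sub(r"[A-Za-z]", "", marked))

def count_by_split(text, word):
    """Solution 2: splitting on the word yields one more piece than occurrences."""
    return len(text.split(word)) - 1

rows = [
    "AppleBananaAppleOrangeOrangePears",
    "BananaKiwiPlumAppleAppleOrange",
    "KiwiKiwiOrangeGrapesAppleKiwi",
]
per_row = [count_by_split(row, "Apple") for row in rows]
total = sum(per_row)
print(per_row, total)
```

Both queries above return one count per row, so getting the overall figure the question asks for means summing those per-row counts, as the last lines of the sketch do.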
This is straight-forward if you have any delimiter ( eg: comma ) between the fruit names. The idea is to split the column into an array, and explode the array into multiple rows using the 'explode' function.
SELECT fruit, count(1) as count FROM
( SELECT
explode(split(Fruit, ',')) as fruit
FROM UK.Choices ) X
GROUP BY fruit
From your example, it looks like fruits are delimited by Capital letters. One idea is to split the column based on capital letters, assuming there are no fruits with same suffix.
SELECT fruit_suffix, count(1) as count FROM
( SELECT
explode(split(Fruit, '[A-Z]')) as fruit_suffix
FROM UK.Choices ) X
WHERE fruit_suffix <> ''
GROUP BY fruit_suffix
The downside is that the output will not include the first letter of each fruit:
pple - 5
range - 4
I think you want to run in one select, and use the Hive if UDF to sum for the different cases. Something like the following...
select sum( if( fruit like '%Apple%' , 1, 0 ) ) as apple_count,
sum( if( fruit like '%Orange%', 1, 0 ) ) as orange_count
from UK.Choices
where ID > start and ID < end;
instead of a join in the above query.
I have no experience with Hive, I'm afraid, so this may or may not work. But on SQL Server, Oracle, etc. I'd do something like this.
Assuming that you have an int PK called ID on the row, something along the lines of:
select AppleCount, OrangeCount, AppleCount - OrangeCount score
from
(
select count(*) as AppleCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Apple%'
) a,
(
select count(*) as OrangeCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Orange%'
) o
I'd leave the division by the total count to the end, when you have all the rows in the spreadsheet and can count them there.
However, I'd urgently ask my boss to let me change the Fruit field into a table with an FK to Choices and one fruit name per row. Unless this is something you can't do in Hive, the current design is something that makes kittens cry.
PS: I'd missed that you wanted the count of occurrences of Apple, which this won't do. I'm leaving my answer up because I reckon that my "However..." paragraph is actually a good answer. :(