Comma delimited values sql

From my research online I have discovered two answers to this question which I am trying to stay away from.
I cannot modify the table or add a new table because the software is third party and needs the table to remain unmodified.
I am trying to stay away from using temporary tables or extra user defined functions.
Here is my issue.
There is a column in the database that is a list of comma-delimited numbers representing days of the week, i.e. (1,2,4,5,7).
I am trying to find a way to read that data and find out if there are any rows where that column represents days that are 3 consecutive days.
It should return anything with
1,2,3
2,3,4
3,4,5
5,6,7
1,,,,,6,7
1,2,,,,,7
But if the column has 1,2,3,4 it should not return twice. There are a lot of rows that have 2,3,4,5,6 and any solution I've come up with will return that 3 times.
Preferably, I would like to create a stored procedure to pass in a number and look for that number of consecutive days. So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
Is there another option other than using extra tables? If so, can you show me how to make this work? I am not new to SQL, but there are a lot of more advanced querying techniques I am not familiar with.

The following brute force method will work in all databases:
select (case when col like '%1%' and col like '%2%' and col like '%3%' then 1
when col like '%2%' and col like '%3%' and col like '%4%' then 1
when col like '%3%' and col like '%4%' and col like '%5%' then 1
when col like '%4%' and col like '%5%' and col like '%6%' then 1
when col like '%5%' and col like '%6%' and col like '%7%' then 1
when col like '%6%' and col like '%7%' and col like '%1%' then 1
when col like '%7%' and col like '%1%' and col like '%2%' then 1
else 0
end) as HasThreeConsecutiveDays
It returns a 0/1 flag if three days are consecutive.
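For reference, the rule the CASE expression encodes (a run of days, wrapping from 7 back to 1) can be written for any run length if the list is parsed rather than pattern-matched. Here is a minimal Python sketch of that logic, run outside the database; the function name is hypothetical and it assumes days are the integers 1 to 7 with empty entries ignored:

```python
def has_consecutive_days(csv_days, n):
    """Return True if the comma-delimited day list contains n consecutive
    days, treating the week as circular (7 is followed by 1)."""
    # Parse the list, skipping empty entries such as in "1,,,,,6,7".
    days = {int(tok) for tok in csv_days.split(",") if tok.strip()}
    # Try every possible starting day and check the n days from there.
    return any(
        all((start - 1 + i) % 7 + 1 in days for i in range(n))
        for start in range(1, 8)
    )
```

Because it returns a single flag per value, a row such as 2,3,4,5,6 is reported once no matter how many 3-day runs it contains.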

So if 5 is passed in, it will look for anything that is marked for 5 consecutive days.
You won't be able to do that without dynamic sql, because you want to support wrapping from 7 back to 1. I could write a query that would do it for you in a single statement if you didn't care about wrapping from the end of the week back to the beginning, but with that requirement I don't see how to do it without building a dynamic sql string in the procedure, which I don't have time to play with right now (maybe someone else will take that idea and run with it).
With that option defeated for now, I can do this instead:
WHERE
( col like '1,2,3%'
OR col like '%2,3,4%'
OR col like '%3,4,5%'
OR col like '%4,5,6%'
OR col like '%5,6,7'
OR col like '1%6,7'
OR col like '1,2%7'
)
This should be better than checking individual numbers as shown in another answer, because there are fewer pattern matches to complete. However, it only works if we can guarantee the sort order. We also need to know in advance how the commas are spaced between numbers, but we can fix that issue if necessary by replacing all commas and/or spaces with an empty string (and adjusting the patterns accordingly).
One more thought here: I realized that I can support a day count argument, if you can manage sneaking an additional table into the db somewhere. The table would look something like this:
create Table DayPatterns (Days int, Pattern varchar(13) )
and the data in the table would look like this:
1 1%
1 %2%
1 %3%
...
2 1,2%
2 %2,3%
2 %3,4%
2 %4,5%
...
2 1%7
...
3 1,2,3%
3 %2,3,4%
...
3 1%6,7
3 1,2%7
...
7 1,2,3,4,5,6,7
Hopefully you get the idea on how to fill that out. With that table in hand, you can JOIN against the table with a query like this:
INNER JOIN DayPatterns p ON p.Days = @ConsecutiveDays AND col LIKE p.Pattern
The key to making that work (aside from needing to be able to create that table somewhere) is also doing a GROUP BY on the correct columns. Otherwise, you'll end up with the same problem you have right now, where matching multiple possible consecutive day patterns will duplicate your results.
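To make the idea concrete, here is a rough sketch that generates the pattern rows and runs the join, using Python's built-in sqlite3 module as a stand-in database. The Schedule table, its id and col columns, and both function names are assumptions for illustration only; the patterns assume a sorted list with no spaces, as discussed above.

```python
import sqlite3

def day_patterns(n):
    """LIKE patterns for n consecutive days in a sorted, space-free list."""
    if n == 7:
        return ["1,2,3,4,5,6,7"]
    patterns = []
    for start in range(1, 8):
        end = start + n - 1
        if end <= 7:
            # Run fits inside the week: anchor at the edges where possible.
            body = ",".join(str(d) for d in range(start, end + 1))
            patterns.append(("" if start == 1 else "%") + body
                            + ("" if end == 7 else "%"))
        else:
            # Run wraps past 7: the list starts with 1.. and ends with ..7.
            head = ",".join(str(d) for d in range(1, end - 7 + 1))
            tail = ",".join(str(d) for d in range(start, 8))
            patterns.append(head + "%" + tail)
    return patterns

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DayPatterns (Days INT, Pattern VARCHAR(13))")
conn.executemany(
    "INSERT INTO DayPatterns VALUES (?, ?)",
    [(n, p) for n in range(1, 8) for p in day_patterns(n)],
)
# Hypothetical stand-in for the third-party table.
conn.execute("CREATE TABLE Schedule (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany(
    "INSERT INTO Schedule (col) VALUES (?)",
    [("2,3,4,5,6",), ("1,3,5",), ("1,2,7",), ("1,2,4,5,7",)],
)

def rows_with_run(conn, n):
    """Rows containing n consecutive days, each returned once."""
    query = """
        SELECT s.col
        FROM Schedule s
        INNER JOIN DayPatterns p ON p.Days = ? AND s.col LIKE p.Pattern
        GROUP BY s.id, s.col
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(query, (n,))]
```

The GROUP BY on the row's key is what collapses the three matches for 2,3,4,5,6 into one result row.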
Finally, of course you know that most any schema that includes csv data is broken, but since you can't seem to fix this, hopefully one of these ideas will help.

Related

New column based on list of values SQL

I am new to SQL and working on a database that needs a binary indicator based on the presence of string values in a column. I'm trying to make a new table as follows:
Original:
Indicator
a, b, c
c, d, e
Desired:
Indicator | type
a, b, c | 1
c, d, e | 0
SQL code:
SELECT
ID,
Contract,
Indicator,
CASE
WHEN Indicator IN ('a', 'b')
THEN 1
ELSE 0
END as Type
INTO new_table
FROM old_table
The table I keep creating reports every type as 0.
I also have 200+ distinct indicators, so it will be really time-consuming to write each as:
CASE
WHEN Indicator = 'a' THEN '1'
WHEN Indicator = 'b' THEN '1'
Is there a more streamlined way to think about this?
Thanks!
I think the first step is to understand why your code doesn’t work right now.
If the values in your Indicator column are literally the strings you showed (a, b, c in one string and c, d, e in another), then your CASE expression is asking for an exact match of the full value of Indicator against the following list:
The letter a, or
The letter b
Essentially, you are asking “hey SQL, does ‘a, b, c’ match ‘a’? Or does ‘a, b, c’ match ‘b’?”
Obviously SQL’s answer is “these don’t match”, which is why you get all 0s.
You can try wildcard matching with the LIKE syntax.
CASE WHEN Indicator LIKE '%a%' OR Indicator LIKE '%b%' THEN 1 ELSE 0 END AS Type
Now, if the abc and cde strings aren’t REALLY what’s in your database then this approach may not work well for you.
Example, let’s say your real values are words that are all slapped together in a single string.
Let’s say that your strings are 3 words each.
Cat, Dog, Man
Catalog, Stick, Shoe
Hair, Hellcat, Belt
And let’s say that Cat is a value that should cause Type to be 1.
If you write: case when Indicator like '%cat%' then 1 else 0 end as Type - all 3 rows will get a 1 (assuming a case-insensitive comparison), because the wildcard will match Cat in Catalog and cat in Hellcat.
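The difference between matching a substring and matching a whole list entry is easy to see outside the database. A small Python sketch, with hypothetical function names and assuming comma-separated values compared case-insensitively:

```python
def substring_match(indicator, word):
    """Mimics LIKE '%word%': true if word appears anywhere in the string."""
    return word.lower() in indicator.lower()

def token_match(indicator, word):
    """True only if word is one of the comma-separated entries."""
    tokens = [t.strip().lower() for t in indicator.split(",")]
    return word.lower() in tokens

rows = ["Cat, Dog, Man", "Catalog, Stick, Shoe", "Hair, Hellcat, Belt"]
substring_flags = [substring_match(r, "Cat") for r in rows]
token_flags = [token_match(r, "Cat") for r in rows]
```

Only the first row passes the token comparison, which is the behavior the Type column actually needs.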
I think the bottom line is that unless your Indicator values really are 3 letters and your match criterion is a single letter, you may well be better off writing a 200-line CASE statement if you need this done any time soon.
A better approach is worth considering, depending on things like whether you are going to have 300 different combinations a week, a month, or a year from now.
If so, wouldn’t it be nice if you had a table with a total of 6 rows, like so?
Indicator | Indicator_Parsed
a,b,c | a
a,b,c | b
a,b,c | c
c,d,e | c
c,d,e | d
c,d,e | e
Then you could write the query as you have it, case when Indicator_Parsed in ('a', 'b') then 1 else 0 end as Type, as a piece of a more verbose solution.
If this approach seems useful to you, see the page “Turning a Comma Separated string into individual rows”, which shows how to parse those comma-separated values into additional rows.
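As a rough illustration of that parsed shape (not the method from the linked page), here is a Python sketch that expands each Indicator into one row per entry and then derives the flag per original value. The function names are made up for this example:

```python
def explode(indicators):
    """One (Indicator, Indicator_Parsed) pair per comma-separated entry."""
    return [
        (indicator, part.strip())
        for indicator in indicators
        for part in indicator.split(",")
    ]

def type_flags(parsed_rows, wanted):
    """Flag an Indicator with 1 if any of its parsed entries is wanted."""
    flags = {}
    for indicator, part in parsed_rows:
        flags[indicator] = max(flags.get(indicator, 0), int(part in wanted))
    return flags

parsed = explode(["a,b,c", "c,d,e"])
flags = type_flags(parsed, {"a", "b"})
```

With 200+ indicators, the wanted set can live in its own lookup table instead of being typed into a CASE expression.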
On MySQL or SQL Server you can do it as follows:
insert into table2
select Indicator,
CASE WHEN Indicator like '%a%' or Indicator like '%b%' THEN 1 ELSE 0 END As type
from table1;
You can use the REGEXP operator to check for presence of either a, b or both.
SELECT Indicator,
Indicator REGEXP '.*[ab].*'
FROM tab
If you need that into a table, you either create it from scratch
CREATE TABLE your_table AS
SELECT Indicator,
Indicator REGEXP '.*[ab].*'
FROM tab
or you insert values in it:
INSERT INTO your_table
SELECT Indicator,
Indicator REGEXP '.*[ab].*'
FROM tab

Exclude the row which has numeric characters, only at the beginning of the row

I have a similar table as below:
product
01 apple
02 orange
banana 10
I am trying to exclude only rows which start with a number. If the number is not in the beginning then it should not be excluded. The desired table output should be like this:
product
banana 10
However with my current query, it excludes everything as soon as there is a number in the row:
SELECT *
FROM table
WHERE product NOT LIKE '%0%'
Could anyone please suggest me on how to tackle this? Much appreciated.
Something like this maybe:
SELECT *
FROM table
WHERE left(product, 1) NOT IN ('0','1','2','3','4','5','6','7','8','9')
A regex to match lines that don't start with a number is
^[^0-9].*
An SQL query in MySQL would look like
SELECT *
FROM table
WHERE product RLIKE '^[^0-9].*'
I would recommend regular expressions. In Redshift, this looks like:
where product ~ '^[^0-9]'
I might also suggest:
where left(product, 1) not between '0' and '9'
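To check the pattern against the sample data without a database, here is a quick Python sketch using the same regular expression; the function name is hypothetical:

```python
import re

# Matches strings whose first character exists and is not a digit.
STARTS_WITH_NON_DIGIT = re.compile(r"^[^0-9]")

def keep_rows(products):
    """Keep only the products that do not start with a digit."""
    return [p for p in products if STARTS_WITH_NON_DIGIT.search(p)]

kept = keep_rows(["01 apple", "02 orange", "banana 10"])
```

One thing to be aware of: the regex requires a first character, so in this sketch an empty string is dropped as well. How the LEFT-based variants treat empty or NULL values depends on the database.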

How to use a variable as a column name in DolphinDB?

To simplify my question: start with the following table in DolphinDB:
t=table(1 2 3 as x, 4 5 6 as y)
I would like to select a column from the table, but I prefer to assign the column to choose in a separate statement. I tried the following:
colName= x
select colName from t
and
colName="x"
select colName from t
neither works. I am sure there is a way to do this in DolphinDB. Could someone point out where to look at in the manual? Thanks!
You can take a look at the part about metaprogramming in DolphinDB's manual:
https://www.dolphindb.com/help/Metaprogramming.html
For your question, try this:
colName= "x"
sql(select=sqlCol(colName), from=t).eval()

SAP HANA SQL - Concatenate multiple result rows for a single column into a single row

I am pulling data and when I pull in the text field my results for the "distinct ID" are sometimes being duplicated when there are multiple results for that ID. Is there a way to concatenate the results into a single column/row rather than having them duplicated?
It looks like there are ways in other SQL platforms but I have not been able to find something that works in HANA.
Example
Select
Distinct ID
From Table1
If I pull only Distinct ID I get the following:
ID
1
2
3
4
However when I pull the following:
Example
Select
Distinct ID,Text
From Table1
I get something like
ID | Text
1 | Dog
2 | Cat
2 | Dog
3 | Fish
4 | Bird
4 | Horse
I am trying to Concat the Text field when there is more than 1 row for each ID.
What I need the results to be (Having a "break" between results so that they are on separate lines would be even better but at least a "," would work):
ID | Text
1 | Dog
2 | Cat,Dog
3 | Fish
4 | Bird,Horse
I see Kiran has just referred to another valid answer in the comment, but in your example this would work.
SELECT ID, STRING_AGG(Text, ',')
FROM TABLE1
GROUP BY ID;
You can replace the ',' with other characters, maybe a '\n' for a line break
I would caution against the approach to concatenate rows in this way, unless you know your data well. There is no effective limit to the rows and length of the string that you will generate, but HANA will have a limit on string length, so consider that.
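If it helps to see what the aggregation does, here is a small Python sketch of the same grouping applied to the sample rows. It is only an illustration of the logic, not HANA code; the function name is made up, and the values are sorted within each group purely to make the output deterministic:

```python
from collections import defaultdict

def aggregate_text(rows, sep=","):
    """Join the Text values of each ID into one string, one entry per ID."""
    grouped = defaultdict(list)
    for row_id, text in rows:
        grouped[row_id].append(text)
    return {row_id: sep.join(sorted(texts))
            for row_id, texts in sorted(grouped.items())}

rows = [(1, "Dog"), (2, "Cat"), (2, "Dog"), (3, "Fish"), (4, "Bird"), (4, "Horse")]
result = aggregate_text(rows)
```

Passing a different sep value corresponds to changing the delimiter argument of STRING_AGG.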

Count particular substring text within column

I have a Hive table, titled 'UK.Choices' with a column, titled 'Fruit', with each row as follows:
AppleBananaAppleOrangeOrangePears
BananaKiwiPlumAppleAppleOrange
KiwiKiwiOrangeGrapesAppleKiwi
etc.
etc.
There are 2.5M rows and the rows are much longer than the above.
I want to count the number of instances that the word 'Apple' appears.
For example above, it is:
Number of 'Apple'= 5
My sql so far is:
select 'Fruit' from UK.Choices
Then in chunks of 300,000 I copy and paste into Excel, where I'm more proficient and able to do this using formulas. Problem is, it takes up to an hour and a half to generate each chunk of 300,000 rows.
Anyone know a quicker way to do this bypassing Excel? I can do simple things like counts using where clauses, but something like the above is a little beyond me right now. Please help.
Thank you.
I think I am 2 years too late. But since I was looking for the same answer and I finally managed to solve it, I thought it was a good idea to post it here.
Here is how I do it.
Solution 1:
+-----------------------------------+---------------------------+-------------+-------------+
| Fruits | Transform 1 | Transform 2 | Final Count |
+-----------------------------------+---------------------------+-------------+-------------+
| AppleBananaAppleOrangeOrangePears | #Banana#OrangeOrangePears | ## | 2 |
| BananaKiwiPlumAppleAppleOrange | BananaKiwiPlum##Orange | ## | 2 |
| KiwiKiwiOrangeGrapesAppleKiwi | KiwiKiwiOrangeGrapes#Kiwi | # | 1 |
+-----------------------------------+---------------------------+-------------+-------------+
Here is the code for it:
SELECT length(regexp_replace(regexp_replace(fruits, "Apple", "#"), "[A-Za-z]", "")) as number_of_apples
FROM fruits;
You may have numbers or other special characters in your fruits column and you can just modify the second regexp to incorporate that. Just remember that in hive to escape a character you may need to use \\ instead of just one \.
Solution 2:
SELECT size(split(fruits,"Apple"))-1 as number_of_apples
FROM fruits;
This first splits the string using "Apple" as a separator, which produces an array. The size function then returns the size of that array. Note that the size of the array is one more than the number of separators, hence the -1.
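Both solutions can be sanity-checked on the three sample rows with a short Python sketch of the same two ideas. The function names are hypothetical, and the first one assumes the column holds only letters, as in the example:

```python
import re

def count_by_replace(fruits, word="Apple"):
    """Solution 1: mark each occurrence with '#', strip letters, count."""
    marked = fruits.replace(word, "#")
    return len(re.sub(r"[A-Za-z]", "", marked))

def count_by_split(fruits, word="Apple"):
    """Solution 2: split on the word; pieces = occurrences + 1."""
    return len(fruits.split(word)) - 1

rows = [
    "AppleBananaAppleOrangeOrangePears",
    "BananaKiwiPlumAppleAppleOrange",
    "KiwiKiwiOrangeGrapesAppleKiwi",
]
replace_counts = [count_by_replace(r) for r in rows]
split_counts = [count_by_split(r) for r in rows]
```

Summing either list gives the total of 5 from the question; in Hive the equivalent is wrapping the per-row expression in sum().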
This is straight-forward if you have any delimiter ( eg: comma ) between the fruit names. The idea is to split the column into an array, and explode the array into multiple rows using the 'explode' function.
SELECT fruit, count(1) as count FROM
( SELECT
explode(split(Fruit, ',')) as fruit
FROM UK.Choices ) X
GROUP BY fruit
From your example, it looks like fruits are delimited by Capital letters. One idea is to split the column based on capital letters, assuming there are no fruits with same suffix.
SELECT fruit_suffix, count(1) as count FROM
( SELECT
explode(split(Fruit, '[A-Z]')) as fruit_suffix
FROM UK.Choices ) X
WHERE fruit_suffix <> ''
GROUP BY fruit_suffix
The downside is that the output will not have the first letter of the fruit:
pple - 5
range - 4
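One way around the missing first letter is to match whole capitalized words instead of splitting on the capitals. Here is a Python sketch of that variation; it is a different technique from the split above, the function name is hypothetical, and it assumes every fruit name is one capital letter followed by lowercase letters:

```python
import re
from collections import Counter

def fruit_counts(rows):
    """Count each capitalized word across all rows."""
    counts = Counter()
    for row in rows:
        # Each match is a capital letter plus the lowercase letters after it.
        counts.update(re.findall(r"[A-Z][a-z]*", row))
    return counts

counts = fruit_counts([
    "AppleBananaAppleOrangeOrangePears",
    "BananaKiwiPlumAppleAppleOrange",
    "KiwiKiwiOrangeGrapesAppleKiwi",
])
```

This keeps the full names, so the result has Apple and Orange rather than pple and range.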
I think you want to run in one select, and use the Hive if UDF to sum for the different cases. Something like the following...
select sum( if( fruit like '%Apple%' , 1, 0 ) ) as apple_count,
sum( if( fruit like '%Orange%', 1, 0 ) ) as orange_count
from UK.Choices
where ID > start and ID < end;
instead of a join in the above query.
No experience of Hive, I'm afraid, so this may or may not work. But on SQLServer, Oracle etc I'd do something like this:
Assuming that you have an int PK called ID on the row, something along the lines of:
select AppleCount, OrangeCount, AppleCount - OrangeCount score
from
(
select count(*) as AppleCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Apple%'
) a,
(
select count(*) as OrangeCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Orange%'
) o
I'd leave the division by the total count to the end, when you have all the rows in the spreadsheet and can count them there.
However, I'd urgently ask my boss to let me change the Fruit field to be a table with an FK to Choices and one fruit name per row. Unless this is something you can't do in Hive, this design is something that makes kittens cry.
PS I'd missed that you wanted the count of occurrences of Apple, which this won't do. I'm leaving my answer up, because I reckon that my However... para is actually a good answer. :(