Druid SQL: get substring issue

A table column holds comma-separated values, e.g.:
abc321,rd512,spwewr
I need to extract the item (substring) that starts with a user-defined pattern.
Example:
Input Pattern | Expected result
abc | abc321
r | rd512
spwe | spwewr
b | NULL
The following fails in Druid SQL:
SELECT SUBSTRING('abc321,rd512,spwewr', POSITION('r' IN 'abc321,rd512,spwewr'), 2)
This is a known bug,
"Substring operator converter does not handle non-constant literals correctly":
https://issues.apache.org/jira/browse/CALCITE-2226
I think the way to go is to use REGEXP_EXTRACT() or REGEXP_LIKE()
but I cannot figure out the specific syntax.

select regexp_extract('abc321,rd512,spwewr', 'rd[^,]+', 0)
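The pattern above hard-codes "rd". To build it from an arbitrary user prefix, the match has to be anchored to the start of an item, otherwise "b" would be found inside "abc321". Below is a minimal Python sketch of that regex logic, written only to check the pattern outside the database; the function name extract_item is hypothetical. In Druid the same idea would presumably be REGEXP_EXTRACT(col, '(^|,)(r[^,]*)', 2), but that call is an untested assumption.

```python
import re

def extract_item(csv_value, prefix):
    """Return the first comma-separated item that starts with prefix, or None."""
    # (?:^|,) anchors the match to the start of an item, so a prefix that only
    # occurs in the middle of an item (e.g. 'b' in 'abc321') does not match.
    pattern = r"(?:^|,)(" + re.escape(prefix) + r"[^,]*)"
    match = re.search(pattern, csv_value)
    return match.group(1) if match else None

if __name__ == "__main__":
    value = "abc321,rd512,spwewr"
    for prefix in ["abc", "r", "spwe", "b"]:
        print(prefix, extract_item(value, prefix))
```

re.escape keeps characters such as "." or "+" in the user input from being read as regex operators; a SQL version would need equivalent escaping.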

Related

SQL group by middle part of string

I have string column that looks usually approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (firm in the first line above), and then simply count them. Is there a way to achieve this directly in SQL? I am using Hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns: 1) each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring); ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source="; 2) the count of their occurrences across all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
In MSSQL you can use CHARINDEX to find the position of the marker and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent function, LOCATE, for CHARINDEX.
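For reference, here is a small Python sketch of the extract-then-count logic that both answers aim at, using the three sample URLs from the question. It captures everything between "source=" and the next "&", so it does not depend on the value being four letters long. The function name count_sources is hypothetical.

```python
import re
from collections import Counter

def count_sources(urls):
    """Count how often each value of the 'source' query parameter occurs."""
    counts = Counter()
    for url in urls:
        # Capture everything after 'source=' up to the next '&' (or end of string).
        match = re.search(r"[?&]source=([^&]+)", url)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
        "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
        "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
    ]
    for source, n in count_sources(sample).items():
        print(source, n)
```

In SQL the same shape is a regex extraction in the SELECT list combined with GROUP BY on that expression and COUNT(*).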

Substring and Charindex - issues with minus operator

I am using SQL Server 2012. In the column PROJECT_NAME I have line items, all with the same format, that look like this:
PROJECT_NAME
--------------
Caulk, Norman v BPI
Caulk, Norman v BWD
Carper, Robert v ECH
I am trying to extract the first name (second name in the text string) and am using this query:
select
substring(Project_name,(charindex(',',PROJECT_NAME,0)),((CHARINDEX(' v ',PROJECT_NAME)-
(charindex(',',PROJECT_NAME)))))
from RPT_PROJ_MAIN pm
When I run this query I get the following error:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I have isolated all of the different expressions and they all work fine on their own. If I replace the minus operator with a + the query runs fine, and I cannot figure out why.
That means that, although you say all rows have the same format, some of them don't.
Specifically, if a value has no " v <something>" at the end, CHARINDEX(' v ', PROJECT_NAME) returns 0, so the third parameter to SUBSTRING() becomes negative and you get exactly that error.
Here is SQLFiddle demo
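The arithmetic can be checked with a short Python sketch. It assumes only that CHARINDEX returns a 1-based position, or 0 when the search string is not found; the helper names charindex and substring_length are hypothetical, and the second input is an invented malformed row.

```python
def charindex(needle, haystack):
    """Mimic T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1

def substring_length(project_name):
    """Length argument the query passes to SUBSTRING."""
    return charindex(" v ", project_name) - charindex(",", project_name)

if __name__ == "__main__":
    print(substring_length("Caulk, Norman v BPI"))  # well-formed row
    print(substring_length("Caulk, Norman"))        # row without ' v '
```

The well-formed row gives a positive length (14 - 6 = 8), while the row without " v " gives 0 - 6 = -6, which SUBSTRING rejects. Replacing the minus with a plus always gives a non-negative length, which is why that variant runs without the error.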
I worked around this by declaring the value to subtract as a variable. See the example below.
Declare @Val numeric=2
PRINT LEFT('My Name is Bhumesh Shah',charindex('is','My Name is Bhumesh Shah'))
PRINT LEFT('My Name is Bhumesh Shah',charindex('is','My Name is Bhumesh Shah')-@Val)
Output:
My Name i
My Name

Is it possible to use a wildcard on the left side of a LIKE operator?

Is there a way to use at least _ on the left side, so that the following statement returns 1:
SELECT 1 FROM DUAL WHERE
'te_ephone' like 'tele_ho%'
I want Oracle to parse the left side the same way it parses the right one, so that _ matches any character. Is this possible, or is there a workaround to make this work?
To give some context, the final objective is for things like remoñoson to match remonos%.
The left-hand side is a column in which I replace some characters with _, and the starts-with query string gets the same replacement.
Based on your context, what you want can be achieved with linguistic sorting; the Oracle documentation on Linguistic Sort gives detailed information about searching and sorting linguistic strings.
Example 1 (case-insensitive or accent-insensitive comparisons):
SELECT word FROM test1
WHERE NLS_UPPER(word, 'NLS_SORT = XGERMAN') = 'GROSSE';
WORD
------------
GROSSE
Große
große
Example 2 (regular expression with the base letter operator [==]):
Oracle SQL syntax:
SQL> SELECT col FROM test WHERE REGEXP_LIKE(col,'r[[=e=]]sum[[=e=]]');
Expression: r[[=e=]]sum[[=e=]]
Matches:
resume
résumé
résume
resumé
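To illustrate the underlying idea of comparing base letters only, here is a minimal Python sketch. It is not the Oracle mechanism from the examples above: it swaps in Unicode decomposition (NFD) and drops combining marks, and the helper names base_letters and starts_with_ignoring_accents are hypothetical.

```python
import unicodedata

def base_letters(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def starts_with_ignoring_accents(value, prefix):
    """Accent- and case-insensitive version of `value LIKE prefix || '%'`."""
    return base_letters(value).lower().startswith(base_letters(prefix).lower())

if __name__ == "__main__":
    print(base_letters("résumé"))
    print(starts_with_ignoring_accents("remoñoson", "remonos"))
```

This handles only characters that decompose into a base letter plus a combining mark, such as ñ and é.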

Implement an IN Query using XQuery in MSSQLServer 2005

I'm trying to query an xml column using an IN expression. I have not found a native XQuery way of doing such a query so I have tried two work-arounds:
1. Implement the IN query as a concatenation of ORs like this:
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:variable("@Param1368320145") or
text() = sql:variable("@Param2043685301") or ...
2. Implement the IN query with the string function fn:contains(...) like this:
WHERE Data.exist('/Document/Field2[fn:contains(sql:variable("@Param1412022317"), .)]') = 1
where the given parameter is a (long) string with the values separated by "|".
The problem is that version 1 doesn't work for more than about 50 arguments: the server throws an out-of-memory exception. Version 2 works, but is very, very slow.
Does anyone have a third idea? To state the problem more completely: given a list of values of any native SQL type, select all rows whose xml column has one of the given values in a specific field of the XML.
Try inserting all your parameters into a table and querying with the sql:column clause:
SELECT Mytable.Column FROM MyTable
CROSS JOIN (SELECT '@Param1' T UNION ALL SELECT '@Param2') B
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:column("T")]') = 1

How do I sort a VARCHAR column in PostgreSQL that contains words and numbers?

I need to order a select query by a varchar column, using numerical and text order. The query will be run from a Java program, using JDBC over PostgreSQL.
If I use ORDER BY in the select clause I obtain:
1
11
2
abc
However, I need to obtain:
1
2
11
abc
The problem is that the column can also contain text.
This question is similar (but targeted for SQL Server):
How do I sort a VARCHAR column in SQL server that contains words and numbers?
However, the solution proposed did not work with PostgreSQL.
I had the same problem and the following code solves it:
SELECT ...
FROM table
order by
CASE WHEN column < 'A'
THEN lpad(column, size, '0')
ELSE column
END;
The size value is the declared length of the varchar column, e.g. 255 for character varying(255).
You can use a regular expression to do this kind of thing:
select THECOL from ...
order by
case
when substring(THECOL from '^\d+$') is null then 9999
else cast(THECOL as integer)
end,
THECOL
First you use a regular expression to detect whether the content of the column is a number or not. In this case I use '^\d+$', but you can modify it to suit the situation.
If the regexp doesn't match, return a big number (larger than any numeric value in the column) so this row will fall to the bottom of the order.
If the regexp matches, convert the string to number and then sort on that.
After this, sort regularly with the column.
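The two-level ordering described above can be sketched in Python as a sort key. This is an illustration of the idea, not PostgreSQL code. Instead of the 9999 sentinel it uses a leading flag (0 for numbers, 1 for text), so it also works when the column contains numbers above 9999; the function name natural_key is hypothetical.

```python
import re

def natural_key(value):
    """Sort key: pure-digit strings first by numeric value, then text alphabetically."""
    if re.fullmatch(r"[0-9]+", value):
        # Numbers: flag 0, compared by integer value, ties broken by the raw string.
        return (0, int(value), value)
    # Text: flag 1 puts it after all numbers; compared by the raw string.
    return (1, 0, value)

if __name__ == "__main__":
    print(sorted(["1", "11", "2", "abc"], key=natural_key))
```

For the sample data from the question this yields 1, 2, 11, abc.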
I'm not aware of any database that has a built-in "natural sort" like the one known from PHP. All I've found are various functions:
Natural order sort in Postgres
Comment in the PostgreSQL ORDER BY documentation