Find the most frequent value ignoring everything after '(' within it - sql

I am trying to find the most frequent string, ignoring everything after the first '(' in each value.
Here's how it should work. If I've got the strings:
England (88)
Iceland (100)
Iceland (77)
England (88)
Denmark (15)
Iceland (18)
It should return
Iceland
because it's the most frequent country here, even though, compared as exact strings, England (88) occurs the most often.
Unfortunately, my query returns
England(88)
SQLfiddle
I've been thinking of doing it in two steps:
truncate every country string
run the query I've already written.
But I failed on the first step.

SQL Fiddle is acting up, so I can't test, but I'd think you could use SUBSTR() and INSTR() to isolate the portion left of the first (:
SELECT SUBSTR(X,1,INSTR(X,'(')-1) AS HUS
FROM tt
GROUP BY SUBSTR(X,1,INSTR(X,'(')-1)
ORDER BY COUNT(*) DESC
LIMIT 1;
Edit: Tested on https://sqliteonline.com/ and it returns Iceland as expected: Fiddle.
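When a fiddle is unavailable, the same query can be sanity-checked with SQLite via Python's stdlib sqlite3 module; a minimal sketch using the table and column names from the question (note the result keeps the space before '(' unless you trim it):

```python
import sqlite3

# In-memory SQLite database with the sample data from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tt (X TEXT)")
con.executemany("INSERT INTO tt VALUES (?)", [
    ("England (88)",), ("Iceland (100)",), ("Iceland (77)",),
    ("England (88)",), ("Denmark (15)",), ("Iceland (18)",),
])

# Group on the text left of the first '(' and keep the biggest group.
row = con.execute("""
    SELECT SUBSTR(X, 1, INSTR(X, '(') - 1) AS HUS
    FROM tt
    GROUP BY SUBSTR(X, 1, INSTR(X, '(') - 1)
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()
print(row[0])  # "Iceland " -- with a trailing space, since only '(' is cut off
```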

This is a bunch of string manipulation, which is rather cumbersome in SQLite. Here is one approach:
select trim(substr(str, 1, instr(str, '(') - 1)) as country,
sum(cast(replace(substr(str, instr(str, '(') + 1), ')', '') as int))
from t
group by trim(substr(str, 1, instr(str, '(') - 1));

This would be bullet-proof, whether you have '(' in your text or not:
select rtrim(substr(mycolumn,1,instr(mycolumn || '(','(')-1))
from mytable
group by rtrim(substr(mycolumn,1,instr(mycolumn || '(','(')-1))
order by count(*) desc
limit 1
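The `mycolumn || '('` guard is easy to verify in SQLite from Python. A sketch using the answer's placeholder table and column names, with an added row ("Norway") that has no parenthesis at all:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (mycolumn TEXT)")
con.executemany("INSERT INTO mytable VALUES (?)", [
    ("Iceland (100)",), ("Iceland (77)",), ("Norway",), ("Iceland (18)",),
])

# Appending '(' to the searched string guarantees INSTR finds a match,
# so rows without '(' come through intact instead of becoming ''.
rows = con.execute("""
    SELECT RTRIM(SUBSTR(mycolumn, 1, INSTR(mycolumn || '(', '(') - 1)) AS country,
           COUNT(*) AS n
    FROM mytable
    GROUP BY country
    ORDER BY n DESC
""").fetchall()
print(rows)  # [('Iceland', 3), ('Norway', 1)]
```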

Please try the following solutions, based on replace, rtrim and (in the first one) replicate:
select rtrim(substr(replace(mycolumn,'(',replicate(' ',50)),1,50))
from mytable
group by rtrim(substr(replace(mycolumn,'(',replicate(' ',50)),1,50))
order by count(*) desc
limit 1
;
select rtrim(substr(replace(mycolumn,'(',' '),1,50))
from mytable
group by rtrim(substr(replace(mycolumn,'(',' '),1,50))
order by count(*) desc
limit 1
;

Related

BigQuery - concatenate ignoring NULL

I'm very new to SQL. I understand in MySQL there's the CONCAT_WS function, but BigQuery doesn't recognise this.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string, but some are NULL, and if one is NULL then the whole result will be NULL. Here's what I have so far:
CONCAT(m.track1, ", ", m.track2))) As Tracks,
I tried this but it returns NULL too:
CONCAT(m.track1, IFNULL(m.track2,CONCAT(", ", m.track2))) As Tracks,
Super grateful for any advice, thank you in advance.
Unfortunately, BigQuery doesn't support concat_ws(). So, one method is string_agg():
select t.*,
(select string_agg(track, ',')
from (select t.track1 as track union all select t.track2) x
) x
from t;
Actually a simpler method uses arrays:
select t.*,
array_to_string([track1, track2], ',')
Arrays with NULL values are not supported in result sets, but they can be used for intermediate results.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string
Assuming that these are the only fields in the table, you can use the approach below. It is generic enough to handle any number of columns and any column names without explicit enumeration:
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
Below is an oversimplified dummy example to try, to test the approach:
#standardSQL
with `project.dataset.table` as (
select 1 track1, 2 track2, 3 track3, 4 track4 union all
select 5, null, 7, 8
)
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
which outputs 1, 2, 3, 4 for the first row and 5, 7, 8 for the second.
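Outside BigQuery, the behaviour string_agg is emulating here is simply "join the non-NULL values". A Python sketch of the same logic (the function name concat_ws is borrowed from MySQL for illustration):

```python
def concat_ws(sep, *values):
    """Join values with sep, skipping None -- what MySQL's CONCAT_WS does,
    and what STRING_AGG achieves over the non-null rows in the answers above."""
    return sep.join(str(v) for v in values if v is not None)

print(concat_ws(", ", 1, 2, 3, 4))     # "1, 2, 3, 4"
print(concat_ws(", ", 5, None, 7, 8))  # "5, 7, 8"
```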

GROUPING multiple LIKE string

Data:
2015478 warning occurred at 20201403021545
2020179 error occurred at 20201303021545
2025480 timeout occurred at 20201203021545
2025481 timeout occurred at 20201103021545
2020482 error occurred at 20201473021545
2020157 timeout occurred at 20201403781545
2020154 warning occurred at 20201407851545
2027845 warning occurred at 20201403458745
In the above data, there are 3 kinds of strings I am interested in: warning, error and timeout.
Can we have a single query that will group by string and give the count of occurrences as below?
Output:
timeout 3
warning 3
error 2
I know I can write separate queries to find count individually. But interested in a single query
Thanks
You can use filtered aggregation for that:
select count(*) filter (where the_column like '%timeout%') as timeout_count,
count(*) filter (where the_column like '%error%') as error_count,
count(*) filter (where the_column like '%warning%') as warning_count
from the_table;
This returns the counts in three columns rather than three rows as you indicated.
If you do need this in separate rows, you can use regexp_replace() to cleanup the string, then group by that:
select regexp_replace(the_column, '(.*)(warning|error|timeout)(.*)', '\2') as what,
count(*)
from the_table
group by what;
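The regexp_replace call keeps only the second capture group (the keyword). The same idea translated to Python's re module, counting with collections.Counter over the question's sample lines:

```python
import re
from collections import Counter

lines = [
    "2015478 warning occurred at 20201403021545",
    "2020179 error occurred at 20201303021545",
    "2025480 timeout occurred at 20201203021545",
    "2025481 timeout occurred at 20201103021545",
    "2020482 error occurred at 20201473021545",
    "2020157 timeout occurred at 20201403781545",
    "2020154 warning occurred at 20201407851545",
    "2027845 warning occurred at 20201403458745",
]

# Replace each whole line with just the keyword (capture group 2),
# mirroring regexp_replace(col, '(.*)(warning|error|timeout)(.*)', '\2').
counts = Counter(re.sub(r"(.*)(warning|error|timeout)(.*)", r"\2", s) for s in lines)
print(counts)  # warning: 3, timeout: 3, error: 2
```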
Please use the query below, which avoids hard-coding the values by using POSITION:
select val, count(1) from
(select substring(column_name ,position(' ' in (column_name))+1,
length(column_name) - position(reverse(' ') in reverse(column_name)) -
position(' ' in (column_name))) as val from matching) qry
group by val; -- Provide the proper column name
If you want this on separate rows you can also use a lateral join:
select which, count(*)
from t cross join lateral
(values (case when col like '%error%' then 'error' end),
(case when col like '%warning%' then 'warning' end),
(case when col like '%timeout%' then 'timeout' end)
) v(which)
where which is not null
group by which;
On the other hand, if you simply want the second word -- but don't want to hardcode the values -- then you can use:
select split_part(col, ' ', 2) as which, count(*)
from t
group by which;
Here is a db<>fiddle.
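The split_part(col, ' ', 2) approach, keying on the second word without hard-coding keywords, maps directly to str.split in Python; a short sketch over a subset of the sample data:

```python
from collections import Counter

lines = [
    "2015478 warning occurred at 20201403021545",
    "2020179 error occurred at 20201303021545",
    "2025480 timeout occurred at 20201203021545",
]

# split_part(col, ' ', 2) == the second space-separated token.
counts = Counter(line.split(" ")[1] for line in lines)
print(counts)
```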

How to convert only first letter uppercase without using Initcap in Oracle?

Is there a way to convert the first letter to uppercase in Oracle SQL without using the INITCAP function?
My problem is that I must use the DISTINCT keyword in the SQL clause, and the INITCAP function doesn't work with it.
Here is my SQL example:
select distinct p.nr, initcap(p.firstname), initcap(p.lastname), ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by p.lastname, p.firstname;
I get this error message: ORA-01791: not a SELECTed expression
With SELECT DISTINCT, you can't ORDER BY columns that aren't selected. Use column aliases instead, as in:
select distinct p.nr, initcap(p.firstname) fname, initcap(p.lastname) lname, ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by lname, fname
This would do it, but I think you need to post your query, as there may be a better solution:
select upper(substr(<column>,1,1)) || substr(<column>,2,9999) from dual
To change string to String, you can use this:
SELECT
regexp_replace ('string', '[a-z]', upper (substr ('string', 1, 1)), 1, 1, 'i')
FROM dual;
This assumes that the first letter is the one you want to convert. If your input text starts with a number, such as 2 strings, then it won't change it to 2 Strings.
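The "uppercase only the first character, leave the rest alone" behaviour is a one-liner outside the database; a Python analogue of the upper(substr(...)) || substr(...) answer above (digits pass through unchanged, matching the caveat about inputs starting with a number):

```python
def initcap_first(s):
    # Uppercase only the first character, leaving the rest untouched.
    # Mirrors upper(substr(s,1,1)) || substr(s,2); works on empty strings too.
    return s[:1].upper() + s[1:]

print(initcap_first("string"))     # "String"
print(initcap_first("2 strings"))  # "2 strings" -- a digit has no uppercase form
```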
You can also use the column number instead of the name or alias:
select distinct p.nr, initcap(p.firstname), initcap(p.lastname), ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by 3, 2;
WITH inData AS
(
SELECT 'word1, wORD2, word3, woRD4, worD5, word6' str FROM dual
),
inRows as
(
SELECT 1 as tId, LEVEL as rId, trim(regexp_substr(str, '([A-Za-z0-9])+', 1, LEVEL)) as str
FROM inData
CONNECT BY instr(str, ',', 1, LEVEL - 1) > 0
)
SELECT tId, LISTAGG( upper(substr(str, 1, 1)) || substr(str, 2) , '') WITHIN GROUP (ORDER BY rId) AS camelCase
FROM inRows
GROUP BY tId;

Oracle 11g: LISTAGG ignores NULL values

I have some table TABLE1 with data:
+------------+
| COL1 |
+------------+
| FOO |
| BAR |
| (null) |
| EXP |
+------------+
( FIDDLE )
When executing:
SELECT listagg(col1, '#') within group(ORDER BY rownum)
FROM table1
I receive: FOO#BAR#EXP but I want to have: FOO#BAR##EXP
(LISTAGG ignores empty cells :/ )
Any idea how to achieve that without writing my own function?
select replace(listagg(NVL(col1, '#'), '#')
within group(order by rownum),'###','##') from table1
You can use NVL(col1, '#') here; you can pass any value instead of NULL.
Here is the demo.
select substr(listagg('#'||col1) within group (order by rownum),2)
from table1
Prepend the separator before each value (this yields the separator only for NULLs), then aggregate without separator, and strip the leading separator.
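The prepend-then-strip trick is easy to check outside Oracle; a Python sketch with the question's sample column, using None to stand in for NULL:

```python
values = ["FOO", "BAR", None, "EXP"]

# Prepend '#' to every value (NULL/None contributes just the separator),
# concatenate with no separator, then strip the leading '#'.
result = "".join("#" + (v if v is not None else "") for v in values)[1:]
print(result)  # "FOO#BAR##EXP"
```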
Try this way:
select replace(
listagg(coalesce(col1,'replace'), '#')
within group(order by rownum),
'replace','')
from table1
Sql Fiddle Demo
Please try:
select replace(listagg(nvl(col1, '#') , '#')
within group(order by rownum), '##', '#')
from table1
Another approach, using model clause.
SQL> select rtrim(res, '#') as col1
2 from ( select res
3 , rn
4 from table1
5 model
6 dimension by (rownum as rn)
7 measures(cast(null as varchar2(500)) as res, col1)
8 rules(
9 res[any] order by rn desc= col1[cv()] || '#' || res[cv() + 1]
10 )
11 )
12 where rn = 1
13 /
COL1
--------------------
FOO#BAR##EXP
SQLFiddle Demo
Try this
select
REPLACE(listagg(NVL(col1,' '), '#') within group(order by rownum),'# #','##')
from table1
SQL FIDDLE DEMO
IMHO it's worth mentioning that with LISTAGG you will likely hit the 4000-character limit on the result value and then have to seek an alternative. The alternative is XMLAGG, which, by contrast, respects NULL values. So consider whether placeholder-based tricks with LISTAGG are worth it.
(My case was the opposite: I needed to ignore NULLs, which perfectly fits LISTAGG, but then I reached the limit and was forced to learn to ignore values with XMLAGG. That is possible using the NVL2 function, but that's another story.)

How Can I Sort A 'Version Number' Column Generically Using a SQL Server Query

I wonder if the SQL geniuses amongst us could lend me a helping hand.
I have a column VersionNo in a table Versions that contains 'version number' values like
VersionNo
---------
1.2.3.1
1.10.3.1
1.4.7.2
etc.
I am looking to sort this, but unfortunately, when I do a standard order by, it is treated as a string, so the order comes out as
VersionNo
---------
1.10.3.1
1.2.3.1
1.4.7.2
Instead of the following, which is what I am after:
VersionNo
---------
1.2.3.1
1.4.7.2
1.10.3.1
So, what I need to do is to sort by the numbers in reverse order (e.g. in a.b.c.d, I need to sort by d, c, b, a to get the correct sort order).
But I am stuck as to how to achieve this in a GENERIC way. Sure, I can split the string up using the various sql functions (e.g. left, right, substring, len, charindex), but I can't guarantee that there will always be 4 parts to the version number. I may have a list like this:
VersionNo
---------
1.2.3.1
1.3
1.4.7.2
1.7.1
1.10.3.1
1.16.8.0.1
Does anyone have any suggestions? Your help would be much appreciated.
If you are using SQL Server 2008:
select VersionNo from Versions order by cast('/' + replace(VersionNo , '.', '/') + '/' as hierarchyid);
What is hierarchyid
Edit:
Solutions for 2000, 2005, 2008: Solutions to T-SQL Sorting Challenge here.
The challenge
Depending on the SQL engine; for MySQL it would be something like this:
SELECT versionNo FROM Versions
ORDER BY
SUBSTRING_INDEX(versionNo, '.', 1) + 0,
SUBSTRING_INDEX(SUBSTRING_INDEX(versionNo, '.', -3), '.', 1) + 0,
SUBSTRING_INDEX(SUBSTRING_INDEX(versionNo, '.', -2), '.', 1) + 0,
SUBSTRING_INDEX(versionNo, '.', -1) + 0;
For MySQL version 3.23.15 and above:
SELECT versionNo FROM Versions ORDER BY INET_ATON(versionNo);
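The INET_ATON trick works because it packs four dot-separated numbers into one integer, exactly as an IPv4 address is packed. A hedged Python equivalent of that key, only valid while every version has exactly four parts, each in the range 0-255:

```python
def version_key(v):
    # Pack a.b.c.d into a single integer, as MySQL's INET_ATON does
    # for IPv4 addresses. Fails (by design) on anything but four parts.
    a, b, c, d = (int(p) for p in v.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

versions = ["1.10.3.1", "1.2.3.1", "1.4.7.2"]
print(sorted(versions, key=version_key))  # ['1.2.3.1', '1.4.7.2', '1.10.3.1']
```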
Another way to do it:
Assuming you only ever have a, b, c, d, you may as well separate the data out into columns, ORDER BY a, b, c, d (all descending) and take the top 1 row.
If you need to scale beyond d to e, f, g..., just change [1],[2],[3],[4] to [1],[2],[3],[4],[5],[6],[7] and so on in the query.
Query :
see demo
create table t (versionnumber varchar(255))
insert into t values
('1.0.0.505')
,('1.0.0.506')
,('1.0.0.507')
,('1.0.0.508')
,('1.0.0.509')
,('1.0.1.2')
; with cte as
(
select
column1=row_number() over (order by (select NULL)) ,
column2=versionnumber
from t
)
select top 1
CONCAT([1],'.',[2],'.',[3],'.',[4])
from
(
select
t.column1,
split_values=SUBSTRING( t.column2, t1.N, ISNULL(NULLIF(CHARINDEX('.',t.column2,t1.N),0)-t1.N,8000)),
r= row_number() over( partition by column1 order by t1.N)
from cte t
join
(
select
t.column2,
1 as N
from cte t
UNION ALL
select
t.column2,
t1.N + 1 as N
from cte t
join
(
select
top 8000
row_number() over(order by (select NULL)) as N
from
sys.objects s1
cross join
sys.objects s2
) t1
on SUBSTRING(t.column2,t1.N,1) = '.'
) t1
on t1.column2=t.column2
)a
pivot
(
max(split_values) for r in ([1],[2],[3],[4])
)p
order by [1] desc,[2] desc,[3] desc,[4] desc
If you can, alter the schema so that the version has 4 columns instead of one. Then sorting is easy.
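If sorting can happen in application code instead of SQL, a generic tuple key handles any number of parts, including the mixed-length sample from the question:

```python
versions = ["1.10.3.1", "1.16.8.0.1", "1.2.3.1", "1.3", "1.4.7.2", "1.7.1"]

# Compare part by part as integers; a shorter version sorts before a
# longer one sharing the same prefix, because a shorter tuple compares less.
ordered = sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
print(ordered)  # ['1.2.3.1', '1.3', '1.4.7.2', '1.7.1', '1.10.3.1', '1.16.8.0.1']
```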