Optimization of text parsing - Oracle SQL

I am working on a SQL query that will count the appearances of certain words in "long text", or a huge text field that is a CLOB data type.
My dataset (which is massive, ~5M+ rows) looks something like this:
http://sqlfiddle.com/#!4/2c13d/1
I have a query, like this:
SELECT
TheTask AS Tasking,
SUM(CASE WHEN TRIM(UPPER(TheTaskText)) LIKE '%LONG%' THEN 1 ELSE 0 END) AS LongCount,
SUM(CASE WHEN TRIM(UPPER(TheTaskText)) LIKE '%TEXT%' THEN 1 ELSE 0 END) AS TextCount,
SUM(CASE WHEN TRIM(UPPER(TheTaskText)) LIKE '%ENGLISH%' THEN 1 ELSE 0 END) AS EnglishCount
FROM
example
GROUP BY
TheTask
However, it takes an extremely long time to run on the complete dataset (roughly 3 hours). I believe this is due to how LIKE is evaluated, but I am unsure how else to get the result I need. I have tried researching other articles on how to optimize LIKE; would REGEXP functions or something similar be quicker? I am looking to optimize this query, starting with the LIKE performance.

The CONTEXT index type (Oracle Text) is designed for indexing long text. A CONTEXT index is created on the column itself, not on an expression, and with the default BASIC_LEXER settings its matching is case-insensitive, so the TRIM(UPPER(...)) wrapper is not needed:
CREATE INDEX idx_TheTaskTxt ON example(TheTaskText) INDEXTYPE IS CTXSYS.CONTEXT;
Then gather statistics so the optimizer can take the index into account:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'EXAMPLE', cascade=>TRUE);
and query with CONTAINS. Note that CONTAINS matches whole words (tokens), whereas LIKE '%TEXT%' also matches substrings such as TEXTBOOK, so the counts can differ:
SELECT
TheTask AS Tasking,
SUM(CASE WHEN CONTAINS(TheTaskText, 'LONG', 1) > 0 THEN 1 ELSE 0 END) AS LongCount,
SUM(CASE WHEN CONTAINS(TheTaskText, 'TEXT', 1) > 0 THEN 1 ELSE 0 END) AS TextCount,
SUM(CASE WHEN CONTAINS(TheTaskText, 'ENGLISH', 1) > 0 THEN 1 ELSE 0 END) AS EnglishCount
FROM example
GROUP BY TheTask
HAVING
SUM(CASE WHEN CONTAINS(TheTaskText, 'LONG', 1) > 0 THEN 1 ELSE 0 END) *
SUM(CASE WHEN CONTAINS(TheTaskText, 'TEXT', 1) > 0 THEN 1 ELSE 0 END) *
SUM(CASE WHEN CONTAINS(TheTaskText, 'ENGLISH', 1) > 0 THEN 1 ELSE 0 END)
IN (0,1)
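
Before comparing counts from a LIKE query with counts from a word-based text index, it helps to see how the two kinds of matching differ. The sketch below is plain Python, not Oracle code: the sample rows and the regex tokenizer are assumptions made for illustration, and a real text index tokenizes according to its lexer settings.

```python
import re

def substring_hit(text: str, word: str) -> bool:
    """Mimic UPPER(col) LIKE '%WORD%': true if WORD occurs anywhere, even inside another word."""
    return word.upper() in text.upper()

def token_hit(text: str, word: str) -> bool:
    """Rough stand-in for a word-based search: true only if WORD is a whole token."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.upper())
    return word.upper() in tokens

# Hypothetical sample rows, not taken from the original dataset.
rows = [
    "This is long text in English",
    "A textbook example of longitude data",
    "Plain English only",
]

words = ("LONG", "TEXT", "ENGLISH")
like_counts = {w: sum(substring_hit(r, w) for r in rows) for w in words}
token_counts = {w: sum(token_hit(r, w) for r in rows) for w in words}

print(like_counts)   # substring matching also counts "textbook" and "longitude"
print(token_counts)  # token matching counts only the standalone words
```

If the substring behaviour is what you actually need, a word-based index will undercount, so it is worth checking a sample of rows with both approaches first.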

Related

Sql loop over unique elements of a column

I am trying to turn row data into columns. This is my data before the query.
This is my query.
SELECT
Date,
SUM(CASE WHEN (Terms="Art") THEN 1 ELSE 0 END) AS Art,
SUM(CASE WHEN (Terms="Action") THEN 1 ELSE 0 END) AS Action,
SUM(CASE WHEN (Terms="Board") THEN 1 ELSE 0 END) AS Board,
SUM(CASE WHEN (Terms="Puzzle") THEN 1 ELSE 0 END) AS Puzzle,
SUM(CASE WHEN (Terms="Adventure") THEN 1 ELSE 0 END) AS Adventure,
SUM(CASE WHEN (Terms="Others") THEN 1 ELSE 0 END) AS Others
FROM
__table__
GROUP BY
Date
After the query, this is my data.
That is fine and is the data I intended. The problem is in the query: as you can see, I have to write each term explicitly. I want to automate that with a loop. The theoretical solution is to write a subquery that gets all unique terms and then loop over them, but I don't know how.
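
One possible way to do this is to build the query in client code: fetch the distinct terms first, then generate one SUM(CASE ...) column per term. The sketch below uses Python with an in-memory SQLite database as a stand-in; the table name `games` and the sample rows are hypothetical, and your database may offer its own PIVOT or dynamic SQL feature instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data standing in for the table in the question.
conn.execute("CREATE TABLE games (Date TEXT, Terms TEXT)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("2020-01-01", "Art"), ("2020-01-01", "Action"), ("2020-01-01", "Art"),
     ("2020-01-02", "Puzzle"), ("2020-01-02", "Art")],
)

# Step 1: the subquery idea from the question, fetch every distinct term.
terms = [row[0] for row in conn.execute("SELECT DISTINCT Terms FROM games ORDER BY Terms")]

# Step 2: build one SUM(CASE ...) column per term.
# Term values are passed as bound parameters; the alias is wrapped in double
# quotes (with embedded quotes doubled) so an unusual term cannot break the SQL.
columns = ", ".join(
    'SUM(CASE WHEN Terms = ? THEN 1 ELSE 0 END) AS "{}"'.format(t.replace('"', '""'))
    for t in terms
)
sql = f"SELECT Date, {columns} FROM games GROUP BY Date ORDER BY Date"

cursor = conn.execute(sql, terms)
header = [d[0] for d in cursor.description]
pivot = cursor.fetchall()
print(header)
print(pivot)
```

The column list changes whenever a new term appears in the data, so whatever consumes the result has to read the header rather than assume fixed columns.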

Cleaning "SUM" Query

I have a bit of SQL code that looks similar to this:
select sum(case when latitude = '0' then 1 else 0 end) as count_zero,
sum(case when latitude is NULL then 1 else 0 end) as count_null,
sum((case when latitude = '0' then 1 else 0 end) +
(case when latitude is NULL then 1 else 0 end)
) as total_zero,
count(latitude) as count_not_nulls,
count(*) as total
from sites_database
Is there a "cleaner" way to write this same query? I have tried using the "sum" expression with the column aliases, something like:
Sum(count_zero + count_null) as total_null
But this doesn't seem to work, for some reason.
You could use COUNT instead of SUM:
SELECT
COUNT(CASE WHEN latitude = '0' THEN 1 END) As count_zero,
COUNT(CASE WHEN latitude IS NULL THEN 1 END) AS count_null,
COUNT(CASE WHEN COALESCE(latitude, '0') = '0' THEN 1 END) AS total_zero,
COUNT(latitude) As count_not_nulls,
COUNT(*) as total
FROM sites_database;
Using COUNT here saves a bit of coding, because we don't have to provide an explicit ELSE condition (the default ELSE is NULL, which just isn't counted at all). Also note that for the total_zero conditional sum, I used COALESCE to merge the two counts into just one.
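
To check that the COUNT form really gives the same numbers as the SUM form, here is a small sketch run against an in-memory SQLite database from Python. The sample latitudes are made up, and SQLite is only a stand-in for whatever database holds sites_database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites_database (latitude TEXT)")
# Made-up sample: two '0' values, two NULLs, two real latitudes.
conn.executemany(
    "INSERT INTO sites_database VALUES (?)",
    [("0",), (None,), ("51.5",), ("0",), (None,), ("48.9",)],
)

row = conn.execute("""
    SELECT
        SUM(CASE WHEN latitude = '0' THEN 1 ELSE 0 END)            AS sum_zero,
        COUNT(CASE WHEN latitude = '0' THEN 1 END)                 AS count_zero,
        COUNT(CASE WHEN latitude IS NULL THEN 1 END)               AS count_null,
        COUNT(CASE WHEN COALESCE(latitude, '0') = '0' THEN 1 END)  AS total_zero,
        COUNT(latitude)                                            AS count_not_nulls,
        COUNT(*)                                                   AS total
    FROM sites_database
""").fetchone()

# sum_zero and count_zero agree; total_zero equals count_zero + count_null.
print(row)
```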

Why doesn't the "having" clause work?

For the following query:
select count(distinct email_address)
from
(
select distinct email_address,
case when elq_activity_type='EmailSend' then 1 else 0 end 'Sends',
case when elq_activity_type='Bounceback' then 1 else 0 end 'Bounces',
case when elq_activity_type='EmailOpen' then 1 else 0 end 'Opens',
case when elq_activity_type='EmailClickthrough' then 1 else 0 end 'Clicks'
from elq_stg_activity
) a
having sum(sends-bounces)>0
The having clause doesn't seem to be doing anything. What am I doing wrong?
Need to get all unique emails that had an email delivered to them (send-bounce).
Thanks!
I think you want this:
select count(email_address)
from (select email_address,
sum(case when elq_activity_type = 'EmailSend' then 1 else 0 end) Sends,
sum(case when elq_activity_type = 'Bounceback' then 1 else 0 end) as Bounces,
sum(case when elq_activity_type = 'EmailOpen' then 1 else 0 end) as Opens,
sum(case when elq_activity_type = 'EmailClickthrough' then 1 else 0 end) as Clicks
from elq_stg_activity
group by email_address
) a
where sends > bounces;
There are numerous issues with your query. This is the only sensible interpretation I could think of.
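
The underlying point is that HAVING filters groups after aggregation, so the counts have to be computed per email_address with GROUP BY before they can be compared. A minimal sketch, assuming "delivered" means more sends than bounces (which is what sum(sends-bounces)>0 in the question suggests), using Python and an in-memory SQLite database with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elq_stg_activity (email_address TEXT, elq_activity_type TEXT)")
# Made-up activity rows for three addresses.
conn.executemany(
    "INSERT INTO elq_stg_activity VALUES (?, ?)",
    [
        ("a@example.com", "EmailSend"), ("a@example.com", "EmailOpen"),
        ("b@example.com", "EmailSend"), ("b@example.com", "Bounceback"),
        ("c@example.com", "EmailSend"), ("c@example.com", "EmailSend"),
        ("c@example.com", "Bounceback"),
    ],
)

# One group per address; HAVING compares the two per-group totals.
delivered = conn.execute("""
    SELECT email_address
    FROM elq_stg_activity
    GROUP BY email_address
    HAVING SUM(CASE WHEN elq_activity_type = 'EmailSend'  THEN 1 ELSE 0 END)
         > SUM(CASE WHEN elq_activity_type = 'Bounceback' THEN 1 ELSE 0 END)
    ORDER BY email_address
""").fetchall()

print(delivered)       # addresses with at least one send that did not bounce
print(len(delivered))  # the count the question is after
```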

SQL percentage with rows same table with different where condition

I want to do a query like:
select
count(*) where acción='a' / count(*) where acción='b' * 100
from
same_table
grouped by day
but I don't want to use a subquery. Is it possible with joins?
I'm not sure the syntax is correct, but you can use something like this:
SELECT day,
SUM(CASE WHEN "acción" = 'a' THEN 1 ELSE 0 END) AS SUM_A,
SUM(CASE WHEN "acción" = 'b' THEN 1 ELSE 0 END) AS SUM_B,
SUM(CASE WHEN "acción" = 'a' THEN 1 ELSE 0 END) / SUM(CASE WHEN "acción" = 'b' THEN 1 ELSE 0 END) * 100 AS result
FROM your_table
GROUP BY day
The idea is to sum the values you need with conditional expressions, instead of counting rows.
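
Two details are easy to trip over with this kind of ratio: integer division, and days with no 'b' rows at all. The sketch below is one way to handle both, run from Python against an in-memory SQLite database with made-up rows; multiplying by 100.0 first and wrapping the divisor in NULLIF are my additions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table (day TEXT, "acción" TEXT)')
# Made-up rows: day 1 has 3 a / 2 b, day 2 has only a, day 3 has 1 a / 4 b.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("2024-01-01", "a")] * 3 + [("2024-01-01", "b")] * 2
    + [("2024-01-02", "a")]
    + [("2024-01-03", "a")] + [("2024-01-03", "b")] * 4,
)

ratios = conn.execute("""
    SELECT day,
           -- 100.0 forces floating-point division; NULLIF turns a zero
           -- divisor into NULL so the result is NULL instead of an error.
           100.0 * SUM(CASE WHEN "acción" = 'a' THEN 1 ELSE 0 END)
                 / NULLIF(SUM(CASE WHEN "acción" = 'b' THEN 1 ELSE 0 END), 0) AS result
    FROM your_table
    GROUP BY day
    ORDER BY day
""").fetchall()

print(ratios)
```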

SQL Do not return column if value is zero

This query returns one row with columns Ready, Processing, Complete, Failed and Error with totals for each. Is there a way to rewrite this query so that columns that have a total of zero are not returned?
I'm using this to populate the mschart control and I don't want labels on the chart if there are 0 instances of that category.
SELECT
SUM(CASE WHEN Status = 'R' THEN 1 ELSE 0 END) AS Ready,
SUM(CASE WHEN Status = 'P' THEN 1 ELSE 0 END) AS Processing,
SUM(CASE WHEN Status = 'C' THEN 1 ELSE 0 END) AS Complete,
SUM(CASE WHEN Status = 'F' THEN 1 ELSE 0 END) AS Failed,
SUM(CASE WHEN Status = 'E' THEN 1 ELSE 0 END) AS Error
FROM MailDefinition
No, because the shape of the query (the fields it contains) has to be known. Only the data can change, and that is what you should be looking for. You can dynamically remove or hide labels based on 0 or null data in a column.
What I would do is take what you have, throw it into an unpivot, then remove all of the 0 records.
select
Type,
Sum
from
(
SELECT
SUM(CASE WHEN Status = 'R' THEN 1 ELSE 0 END) AS Ready,
SUM(CASE WHEN Status = 'P' THEN 1 ELSE 0 END) AS Processing,
SUM(CASE WHEN Status = 'C' THEN 1 ELSE 0 END) AS Complete,
SUM(CASE WHEN Status = 'F' THEN 1 ELSE 0 END) AS Failed,
SUM(CASE WHEN Status = 'E' THEN 1 ELSE 0 END) AS Error
FROM MailDefinition
) a
unpivot
(
Sum for Type in ([Ready],[Processing],[Complete],[Failed],[Error])
) u
where Sum>0
That does, of course, entail changing your chart some.
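
If one row per category is acceptable for the chart, a plain GROUP BY is another option (a different technique from the unpivot above): statuses that never occur simply produce no row. A minimal sketch using Python and an in-memory SQLite database; the sample rows and the `Total` alias are assumptions, while the status codes and labels come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MailDefinition (Status TEXT)")
# Made-up rows: no 'P' (Processing) or 'E' (Error) records at all.
conn.executemany(
    "INSERT INTO MailDefinition VALUES (?)",
    [("R",), ("R",), ("C",), ("F",), ("C",), ("C",)],
)

categories = conn.execute("""
    SELECT CASE Status
               WHEN 'R' THEN 'Ready'
               WHEN 'P' THEN 'Processing'
               WHEN 'C' THEN 'Complete'
               WHEN 'F' THEN 'Failed'
               WHEN 'E' THEN 'Error'
           END AS Type,
           COUNT(*) AS Total
    FROM MailDefinition
    GROUP BY Status
    ORDER BY Type
""").fetchall()

# Only categories that actually occur come back, so there is nothing to hide.
print(categories)
```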