Query with OR excluding results in SQL

I am performing this query:
SELECT SQL_CALC_FOUND_ROWS uniqueid, start
FROM sometable
LEFT JOIN log ON
sometable.uniqueid = log.anid AND (log.info='someinfo' or log.info='otherinfo')
WHERE `moreInfo` = 1 AND `type` = 'myType'
AND (`channel` LIKE 'ACHANNEL/%' or `channel` LIKE 'OTHERCHANNEL\/%')
AND `start` > '2019-01-22 00:00:00' AND `start` < '2019-01-22 23:59:59'
GROUP BY uniqueid
However this part does not seem to be working:
AND (`channel` LIKE 'ACHANNEL%' or `channel` LIKE 'OTHERCHANNEL%')
I want this to give me all channels that begin with ACHANNEL/ or with OTHERCHANNEL/; instead, it only gives me the results that begin with OTHERCHANNEL/.
The OR does not seem to be working.
Any help?

Your query should be doing what you want. If you are missing rows then perhaps the patterns don't really match.
I do have some suggestions though:
SELECT SQL_CALC_FOUND_ROWS t.uniqueid, MIN(t.start) as start
FROM sometable t LEFT JOIN
log l
ON t.uniqueid = l.anid AND l.info in ('someinfo', 'otherinfo')
WHERE t.`moreInfo` = 1 AND
t.`type` = 'myType' AND
(t.`channel` LIKE 'ACHANNEL/%' or t.`channel` LIKE 'OTHERCHANNEL/%') AND
t.`start` > '2019-01-22' AND t.`start` < '2019-01-23'
GROUP BY t.uniqueid;
Note that I have qualified all column names and assumed that all column references in the WHERE refer to sometable and not log. Otherwise, the LEFT JOIN would be turned into an INNER JOIN.
Notes:
All columns are qualified, meaning they use the table aliases. This is strongly, strongly recommended when a query has more than one table reference.
The date arithmetic has been simplified.
The multiple comparisons with OR have been simplified using IN.
The "bare" start column has been turned into an aggregation function.
The escape character \ has been removed. There is no need for any escape characters with forward slashes.
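For example, in MySQL both of these return 1, confirming that the forward slash matches with or without a backslash in the pattern (a standalone check, not tied to the tables above):
SELECT 'ACHANNEL/foo' LIKE 'ACHANNEL/%';  -- 1
SELECT 'ACHANNEL/foo' LIKE 'ACHANNEL\/%'; -- also 1: MySQL reads \/ as a plain /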

You did not escape your first condition:
AND (channel LIKE 'ACHANNEL\/%' or channel LIKE 'OTHERCHANNEL\/%')

Related

Convert Legacy to Standard SQL (Join Each and comma Like)

I'm struggling to convert this legacy SQL query to standard SQL. Particular things that need to be converted are FLATTEN, JOIN EACH, and this error: No matching signature for function REGEXP_REPLACE for argument types: ARRAY<STRING>, STRING, STRING. Supported signatures: REGEXP_REPLACE(STRING, STRING, STRING); REGEXP_REPLACE(BYTES, BYTES, BYTES). Can anyone please help?
Thanks!
SELECT a.name, b.name, COUNT(*) as count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*'," ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*'," ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 250
Here is the query converted to standard SQL:
SELECT a.name, b.b_name, COUNT(*) as count
FROM (
SELECT DISTINCT GKGRECORDID, REGEXP_REPLACE(name, r',.*'," ") name
FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Persons,';')) as name
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) a
JOIN (
SELECT DISTINCT GKGRECORDID, REGEXP_REPLACE(b_name, r',.*'," ") b_name
FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Persons,';')) as b_name
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.b_name
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 250
Re: FLATTEN, I would consult the documentation here: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#removing_repetition_with_flatten
Among other examples, the documentation notes:
"Standard SQL does not have a FLATTEN function as in legacy SQL, but you can achieve similar semantics using the JOIN (comma) operator."
Re: Join Each, this has been answered here: BigQuery - equivalent of GROUP EACH in standard SQL
Basically, it is not necessary at all in standard SQL.
Re: "LIKE that has comma separated parameters...", your syntax is fine for standard sql. it should not operate any differently than it did when you ran in in legacy sql. One of the big pluses of standard sql is that you can compare columns using functions in the WHERE statement with more flexibility than legacy SQL allowed (if necessary). For instance, if you wanted to split V2Persons before running a like comparison, you could do that right in the WHERE statement
UPDATE: Realizing I missed your last question about data type mismatches. In standard SQL you will probably want to cast everything explicitly when you run into these errors. It is more finicky than legacy SQL with regard to comparisons between different data types, but I find this to be more in line with other SQL databases.
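A hedged sketch of the explicit-cast pattern; the table and column names here are hypothetical, not from the query above:
SELECT event_id
FROM `my-project.my_dataset.events`
WHERE SAFE_CAST(event_ts_string AS INT64) > 20180901000000  -- SAFE_CAST returns NULL instead of erroring on bad input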

SELECT MAX keeps converting DATETIME to VARCHAR

Trying to write something that produces the MAX date for a record and then shows all records that have a MAX date less than today, but the SQL query keeps converting the DATETIME field to a VARCHAR, so I'm unable to use maxdate < GETDATE().
The code I'm trying to use is
'Max Day'=(SELECT MAX (del.original_stop) FROM del, line WHERE del.obj_id=T2.obj_id AND del.type=0 AND line.opt <> 1)
T2.obj_id is defined in a LEFT OUTER JOIN in my main FROM clause.
Any ideas?
*Updating with the full query:*
SELECT
T1.doc AS "Document",
T1.owner AS "Own",
T1.last_up AS "Last Updated (EST)",
"Dollars"=ISNULL((SELECT CAST(SUM(fund_amt) AS DECIMAL(16,2)) FROM c_fund T3 WHERE T2.obj_id=T3.obj_id),0),
"Max Day"=(SELECT MAX(del.orig_stop) FROM del, line WHERE del.obj_id=T2.obj_id AND del.del_type=0 AND line.opt <> 1)
FROM
dsk T1 LEFT OUTER JOIN doc_object T2 ON T1.obj_id=T2.obj_id
WHERE
T1.icon_id=4
AND
T1.last_up >= '2018/01/21' AND T1.last_up <= '2018/01/24'
ORDER BY
"Last Updated (EST)" desc
And I know about the commas in the FROM clause, will fix those shortly.
Use AS to assign aliases:
select (SELECT MAX(del.original_stop)
FROM del JOIN
line
ON del.obj_id=T2.obj_id AND del.type=0 AND line.opt <> 1
) as max_day,
. . .
Also, never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
If original_stop is a string, then my first advice is to fix the data. You should be storing dates and times in native formats, not as strings.
If for some reason you cannot fix the data model, then you can use convert(). You'll need to peruse the formats available for the function.
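A hedged sketch, assuming original_stop stores strings like '2018-01-21 10:30:00' (ODBC canonical, style 120); pick the style number that matches your actual string format:
SELECT MAX(CONVERT(datetime, del.original_stop, 120)) AS max_day  -- style 120 = yyyy-mm-dd hh:mi:ss
FROM del
WHERE del.del_type = 0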

Stream Analytics query gets "column name doesn't exist" error, but it does exist?

When I run my query in Management Studio it works fine, but in a Stream Analytics job it throws an error: Query compilation error: Invalid column name: 'afkorting'. Column with such name does not exist.
I downloaded the input tables to check if something went wrong with uploading, but that file does have that column name (and I double-checked capitalization, typos, etc.), so how can I fix this?
This is my query:
; WITH Check AS
(
SELECT afkorting, *
FROM Reizen RE
LEFT JOIN Gegevens AP
ON RE.ID = AP.code
)
SELECT *
FROM Check CH
JOIN Model VM
ON CH.afkorting = VM.Station
WHERE VM.h_station = VM.v_station
AND DATEPART(hour, CH.MsgReportDate) = VM.start_uur
AND (DATEPART(minute, CH.MsgReportDate) BETWEEN VM.start_minuut AND VM.eind_minuut)
AND DATEPART(weekday, CH.MsgReportDate) = VM.weekdag
Hope someone can help me!
*PROBLEM SOLVED: you need to list all column names explicitly, so not SELECT * but SELECT column1, column2, and use the table prefixes; in my case: AP.column1, RE.column2, etc.*
To summarize all the comments above on resolving the issue: I did some testing of the Stream Analytics query language elements WITH, SELECT & JOIN. Here is my result list for the issue.
Without a JOIN, using column names with the * symbol in the WITH scope executes fine on ASA.
With a JOIN, you must list every column name you want, without the * symbol; the reason seems to be to avoid ambiguity from column name conflicts (see the sketch below the quoted summary).
As the asker put it: "you need to list all column names, so not SELECT * but SELECT column1, column2, and use the table prefixes, for example, in my case: AP.column1, RE.column2, etc."
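A hedged sketch of that fix applied to the query above, keeping only the columns named in the question (other columns, and any ASA join requirements around reference inputs, are left aside):
; WITH Check AS
(
SELECT RE.ID, RE.MsgReportDate, AP.afkorting
FROM Reizen RE
LEFT JOIN Gegevens AP
ON RE.ID = AP.code
)
SELECT CH.afkorting, CH.MsgReportDate
FROM Check CH
JOIN Model VM
ON CH.afkorting = VM.Station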

Oracle SQL: Filtering rows with non-numeric characters

My question is very similar to this one: removing all the rows from a table with columns A and B, where some records include non-numeric characters (looking like '1234#5' or '1bbbb'). However, the solutions I've read don't seem to work for me. For example,
SELECT count(*) FROM tbl
--962060;
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[^0-9]') OR REGEXP_like(B,'[^0-9]') ) ;
--17
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[0-9]') and REGEXP_like(B,'[0-9]') )
;
--962060
From the 3rd query, I'd expect to see (962060-17)=962043. Why is it still 962060? An alternative query like this also gives the same answer:
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[[:digit:]]')and REGEXP_like(B,'[[:digit:]]') )
;
--962060
Of course, I could bypass the problem by doing query1 minus query2, but I'd like to learn how to do that using regular expressions.
If you use a regexp, you should take into account that any part of the string may be matched by the regexp. For your example, you should specify that the whole string must contain only numbers: ^ is the beginning of the string and $ is the end. You may also use \d for digits.
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^[0-9]+$') and REGEXP_like(B,'^[0-9]+$') )
or
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^\d+$') and REGEXP_like(B,'^\d+$') )
I know you specifically asked for a regex solution, but translate can solve this kind of question as well (and is usually faster, because regexes use more processing power):
select count(1)
from tbl
where translate(a, 'x0123456789', 'x') is null
and translate(b, 'x0123456789', 'x') is null;
What this does: translate the characters 0123456789 to null, and if the result is null, then the input must have been all digits. The 'x' is just there because the third argument to translate cannot be null.
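For instance, a quick check against DUAL:
SELECT translate('12345', 'x0123456789', 'x') AS all_digits, -- NULL: every digit is stripped
       translate('1bbbb', 'x0123456789', 'x') AS mixed       -- 'bbbb': the letters remain
FROM dual;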
Thought I should add this here, might be helpful to other readers.

how can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50))) it will attempt to compare a value like 'daffodil' to a number like 3245. Even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I try to compare a string to a string but blows up if I try to compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it will still fail. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain non-numeric characters, SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali comes along with its TRY_CONVERT function, the way around this is to wrap the usage of the column in a CASE expression that repeats the same logic as the filter.
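A hedged sketch of that CASE wrapping, applied to the join condition from the question (assuming vcert.IDNumber is an INT):
on subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and CASE WHEN subMeas.DataPointValue NOT LIKE '%[^0-9]%'
         THEN CAST(subMeas.DataPointValue AS INT)
    END = vcert.IDNumber -- the CASE yields NULL, and thus no match, for non-numeric values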
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
doesn't work well. Example:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT CASE WHEN @Int LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check if it is a real int value, you should use the ISNUMERIC function. Let's check this:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT ISNUMERIC(@Int) AS Is_Int
Result: 0
The result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try changing it to
ISNUMERIC(subMeas.DataPointValue) = 1
UPDATE
How to check whether the value is an integer?
First, here:
WHERE ISNUMERIC(str) = 1 AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(@Value VarChar(18))
Returns Bit
As
Begin
Return IsNull(
(Select Case When CharIndex('.', @Value) > 0
Then Case When Convert(int, ParseName(@Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(@Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE, for example:
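A hedged sketch of that idea; the source table name is hypothetical, and since the thread shows the optimizer may still reorder a bare CAST, this uses TRY_CONVERT (available from SQL Server 2012):
WITH NumericPoints AS (
    SELECT dp.DataPointValue
    FROM dbo.DataPoints dp                      -- hypothetical source table
    WHERE dp.DataPointValue NOT LIKE '%[^0-9]%' -- keep digit-only strings
      AND dp.DataPointValue <> ''
)
SELECT v.IDNumber
FROM NumericPoints np
JOIN dbo.vcert v
    ON TRY_CONVERT(INT, np.DataPointValue) = v.IDNumber -- NULL (no match) when not numeric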