Maintain WHERE Order In SQL Select - sql

Is it possible to maintain the order of the WHERE clause when doing a SELECT for specific records?
For instance, given the following SELECT statement:
SELECT [RecSeq] FROM [MyData] WHERE
[RecSeq]=3 OR [RecSeq]=2 OR [RecSeq]=1 OR [RecSeq]=21 OR [RecSeq]=20 OR
[RecSeq]=19 OR [RecSeq]=110 OR [RecSeq]=109 OR [RecSeq]=108 OR
[RecSeq]=53 OR [RecSeq]=52 OR [RecSeq]=51;
I'd like the results to come back as:
3
2
1
21
20
19
110
109
108
53
52
51
However, what I get back isn't in any particular order. Currently I have a loop that calls the SELECT statement for each record required. This could range anywhere from 1 to 700,000 times. Needless to say the performance isn't the best.
Any solutions or am I stuck in the loop?

You need the ORDER BY FIELD clause.
SELECT RecSeq From MyData WHERE RecSeq IN (3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
ORDER BY FIELD (RecSeq, 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51);
You don't say what database system you are using - I know this works in MySQL.

There is exactly one way to reliably enforce an ordering on the results of a SQL statement: use an ORDER BY clause. I don't know if it is standard SQL, but in Oracle you could do something like this:
select ... from ...
where recseq in (3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
order by decode(recseq, 3,1, 2,2, 1,3, 21,4, 20,5, 19,6, 110,7, 109,8, 108,9, 53,10, 52,11, 51,12, 13)
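If your database has neither DECODE nor FIELD, a CASE expression in the ORDER BY does the same job. Here is a minimal sketch using Python's sqlite3 module; the in-memory table and its contents are made up for illustration, and for very long lists the number of bound parameters may hit a driver or database limit.

```python
import sqlite3

# Hypothetical in-memory stand-in for the MyData table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (RecSeq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyData VALUES (?)", [(n,) for n in range(1, 201)])

wanted = [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]

# One placeholder per value for the IN list.
placeholders = ", ".join("?" for _ in wanted)
# "WHEN ? THEN 0 WHEN ? THEN 1 ..." maps each value to its position in the list.
case_arms = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(wanted)))

sql = (
    f"SELECT RecSeq FROM MyData WHERE RecSeq IN ({placeholders}) "
    f"ORDER BY CASE RecSeq {case_arms} END"
)

# The values are bound twice: once for IN, once for the CASE arms.
rows = [r[0] for r in conn.execute(sql, wanted + wanted)]
print(rows)
```

The rows come back in the order of the `wanted` list rather than in key order.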

A WHERE clause cannot specify your output order.
You will have to sort your results using an ORDER BY.
If you absolutely need this order, try a 'pseudo-column', i.e. a fake column added through a UNION (performance warning here).
select 0 as my_fake_column, blah_columns from table where recseq = 3
UNION
select 1, blah_columns from table where recseq = 2
UNION
select 2, blah_columns from table where recseq = 1
UNION
select 3, blah_columns from table where recseq = 21
order by my_fake_column
The above will deliver the results in your specific order 3,2,1,21.
As the other poster said, adding a column could be an option.

You can use a derived table for filtering and sorting, like this:
SELECT t.RecSeq
FROM MyData t
JOIN (
SELECT 3, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 21, 4 UNION ALL
SELECT 20, 5 UNION ALL
SELECT 19, 6
...
) f(RecSeq, SortKey)
ON t.RecSeq = f.RecSeq
ORDER BY f.SortKey
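With up to 700,000 values, writing the derived table inline gets unwieldy, so one variation is to load the values and their positions into a temporary table and join against that. This is only a sketch with Python's sqlite3 module; the temp table name `f` mirrors the alias above, and the sample data is invented.

```python
import sqlite3

# Hypothetical in-memory stand-in for the MyData table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (RecSeq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyData VALUES (?)", [(n,) for n in range(1, 201)])

wanted = [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]

# Temporary lookup table: one row per requested value, SortKey = position in the list.
conn.execute("CREATE TEMP TABLE f (RecSeq INTEGER, SortKey INTEGER)")
conn.executemany(
    "INSERT INTO f VALUES (?, ?)",
    [(value, position) for position, value in enumerate(wanted, start=1)],
)

# The join filters MyData to the requested rows; SortKey restores the requested order.
ordered = [
    r[0]
    for r in conn.execute(
        "SELECT t.RecSeq FROM MyData t JOIN f ON t.RecSeq = f.RecSeq ORDER BY f.SortKey"
    )
]
print(ordered)
```

This replaces the per-record loop with one bulk insert and one query.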

Yes, there is a way, although some might consider it a hack. Also, I want to point out that you can/should use the IN operator instead of the giant chain of OR conditions.
SELECT [RecSeq]
FROM [MyData]
WHERE [RecSeq] in (3,2,1,21,20,19,110,109,108,53,52,51)
ORDER BY DECODE (recseq, 3,1, 2,2, 1,3, 21,4,......)

You could try using a UNION. Something like:
SELECT [RecSeq], 1 FROM [MyData] WHERE [RecSeq]=3
UNION
SELECT [RecSeq], 2 FROM [MyData] WHERE [RecSeq]=2
UNION
SELECT [RecSeq], 3 FROM [MyData] WHERE [RecSeq]=1
*etc...*
ORDER BY 2

Related

How to efficiently match regular expressions in a large dataset of Google BigQuery

I have almost 750 regular expressions to match against the Google BigQuery GitHub Public Dataset. At first, I was writing the query as below:
SELECT * FROM
Sample_Table
WHERE
(REGEXP_CONTAINS(content, r"[a-z0-9A-Z]{40}") OR -- expression 1
REGEXP_CONTAINS(content, r"[0-9a-z]{32}") OR -- expression 2
..................
REGEXP_CONTAINS(content, r"[a-z0-9]{80}") OR -- expression n-1
REGEXP_CONTAINS(content, r"[a-z0-9A-Z\%]{35}")) -- expression n
But, in this case, I did not know which regular expression has been matched for a returned result row. Then, after this suggestion, I changed my query to below:
with patterns as (
SELECT 1 pattern_id, r"(?i)(?:abbysale)(?:.|[\n\r]){0,40}\b([a-z0-9A-Z]{40})\b" pattern
UNION ALL SELECT 2, r"(?i)(?:abstract)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b"
UNION ALL SELECT 3, r"(?i)(?:abuseipdb)(?:.|[\n\r]){0,40}\b([a-z0-9]{80})\b"
UNION ALL SELECT 4, r"(?i)(?:accuweather)(?:.|[\n\r]){0,40}([a-z0-9A-Z\%]{35})\b"
UNION ALL SELECT 5, r"\b(aio\_[a-zA-Z0-9]{28})\b"
UNION ALL SELECT 6, r"(?i)(?:adobe)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 7, r"(?i)(?:adzuna)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 8, r"(?i)(?:aeroworkflow)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9^!]{20})\b"
UNION ALL SELECT 9, r"(?i)(?:agora)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 10, r"(?i)(?:aha)(?:.|[\n\r]){0,40}\b([0-9a-f]{64})\b"
UNION ALL SELECT 11, r"(?i)(?:airbrake)(?:.|[\n\r]){0,40}\b([a-zA-Z-0-9]{32})\b"
UNION ALL SELECT 12, r"(?i)(?:airship)(?:.|[\n\r]){0,40}\b([0-9Aa-zA-Z]{91})\b"
UNION ALL SELECT 13, r"\b(key[a-zA-Z0-9_-]{14})\b"
UNION ALL SELECT 14, r"(?i)(?:airvisual)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 15, r"(?i)(?:alconost)(?:.|[\n\r]){0,40}\b([0-9Aa-z]{32})\b"
UNION ALL SELECT 16, r"(?i)(?:alegra)(?:.|[\n\r]){0,40}\b([a-z0-9-]{20})\b"
UNION ALL SELECT 17, r"(?i)(?:aletheiaapi)(?:.|[\n\r]){0,40}\b([A-Z0-9]{32})\b"
UNION ALL SELECT 18, r"(?i)(?:algolia)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{32})\b"
UNION ALL SELECT 19, r"\b([a-zA-Z0-9]{30})\b"
UNION ALL SELECT 20, r"(?i)(?:alienvault)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 21, r"(?i)(?:allsports)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 22, r"(?i)(?:amadeus)(?:.|[\n\r]){0,40}\b([0-9A-Za-z]{32})\b"
UNION ALL SELECT 23, r"(?i)(?:ambee)(?:.|[\n\r]){0,40}\b([0-9a-f]{64})\b"
UNION ALL SELECT 24, r"(?i)(?:amplitude)(?:.|[\n\r]){0,40}\b([0-9a-f]{32})\b"
UNION ALL SELECT 25, r"\b([0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})\b"
UNION ALL SELECT 26, r"(?i)(?:apacta)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 27, r"(?i)(?:api2cart)(?:.|[\n\r]){0,40}\b([0-9a-f]{32})\b"
UNION ALL SELECT 28, r"\b(sk_live_[a-z0-9A-Z-]{93})\b"
UNION ALL SELECT 29, r"(?i)(?:apideck)(?:.|[\n\r]){0,40}\b([a-z0-9A-Z]{40})\b"
UNION ALL SELECT 30, r"(?i)(?:apiflash)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 31, r"(?i)(?:apiflash)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9\S]{21,30})\b"
UNION ALL SELECT 32, r"(?i)(?:apifonica)(?:.|[\n\r]){0,40}\b([0-9a-z]{11}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})\b"
UNION ALL SELECT 33, r"\b(apify\_api\_[a-zA-Z-0-9]{36})\b"
UNION ALL SELECT 34, r"(?i)(?:apimatic)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{3,20}#[a-zA-Z0-9]{2,12}.[a-zA-Z0-9]{2,5})\b"
UNION ALL SELECT 35, r"(?i)(?:apimatic)(?:.|[\n\r]){0,40}\b([a-z0-9-\S]{8,32})\b"
UNION ALL SELECT 36, r"(?i)(?:apiscience)(?:.|[\n\r]){0,40}\b([a-bA-Z0-9\S]{22})\b"
UNION ALL SELECT 37, r"(?i)(?:apitemplate)(?:.|[\n\r]){0,40}\b([0-9a-zA-Z]{39})\b"
UNION ALL SELECT 38, r"(?i)(?:apollo)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{22})\b"
UNION ALL SELECT 39, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 40, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([a-z0-9-]{39})\b"
UNION ALL SELECT 41, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([0-9]{5})\b"
UNION ALL SELECT 42, r"(?i)(?:appfollow)(?:.|[\n\r]){0,40}\b([0-9A-Za-z]{20})\b"
UNION ALL SELECT 43, r"(?i)(?:appsynergy)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 44, r"(?i)(?:apptivo)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 45, r"\b([a-zA-Z0-9]{73})"
UNION ALL SELECT 46, r"\b([A-Za-z0-9](?:[A-Za-z0-9\-]{0,61}[A-Za-z0-9])\.jfrog\.io)"
UNION ALL SELECT 47, r"(?i)(?:artsy)(?:.|[\n\r]){0,40}\b([0-9a-zA-Z]{32})\b"
UNION ALL SELECT 48, r"(?i)(?:asana)(?:.|[\n\r]){0,40}\b([a-z\/:0-9]{51})\b"
UNION ALL SELECT 49, r"(?i)(?:asana)(?:.|[\n\r]){0,40}\b([0-9]{1,}\/[0-9]{16,}:[A-Za-z0-9]{32,})\b"
UNION ALL SELECT 50, r"(?i)(?:assemblyai)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b"
)
SELECT any_value(F.repo_name) AS repo_name, string_agg(DISTINCT ('' || pattern_id)) as matches
FROM bigquery-public-data.github_repos.contents AS C
INNER JOIN bigquery-public-data.github_repos.files AS F
ON C.id = F.id
,patterns p
WHERE NOT BINARY AND regexp_contains(C.content, p.pattern)
GROUP BY F.repo_name
Here, my goal is to get each repository name together with the distinct matched pattern IDs as a comma-separated string.
I want to run all my regular expressions, but because of Google BigQuery resource limits I ran the above query with only 50 of them. Unfortunately, I got a timeout error.
Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit.
I am not sure how I can optimize the query. Can anyone please help?
The (?:.|[\n\r])*? is a performance killer regex construct.
The only regex flavor that requires this form of matching any char is ElasticSearch; in all others, there are much more efficient ways.
Here, you can use something like
(?is)assemblyai.{0,43}\b([0-9a-z]{32})\b
Make sure your limiting (range/interval) quantifier can match long enough to get to the 32-char whole word (hence, I changed 40 to 43).
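To see what the `s` flag changes, here is a small sketch in Python. Note the assumption: Python's `re` module is a different engine from the RE2 library BigQuery uses, so this only illustrates the matching behaviour, not the performance difference. The sample text and key are invented.

```python
import re

# Original style: an alternation to let "any character" include newlines.
old_pattern = re.compile(r"(?i)(?:assemblyai)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b")
# Suggested style: the s flag makes "." match newlines too.
new_pattern = re.compile(r"(?is)assemblyai.{0,43}\b([0-9a-z]{32})\b")
# Same as the suggested pattern but without the s flag, for comparison.
no_dotall = re.compile(r"(?i)assemblyai.{0,43}\b([0-9a-z]{32})\b")

# Made-up sample: keyword and 32-character key separated by a line break.
key = "0123456789abcdef0123456789abcdef"
sample = f'ASSEMBLYAI_TOKEN =\n    "{key}"'

old_match = old_pattern.search(sample)
new_match = new_pattern.search(sample)
plain_match = no_dotall.search(sample)

print(old_match.group(1) == new_match.group(1))  # both capture the same key
print(plain_match)  # no match: "." stops at the newline without the s flag
```

Both the alternation and the `(?is)` form find the key across the line break; the version without `s` does not.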

How to filter multiple values using BigQuery

I have a BigQuery table with numbers and values. Each number has several values.
I want to get the list of numbers that have no negative values.
SELECT 1 as id, 10 as number, 0.2 as value
UNION ALL
SELECT 1, 10, 0.3
UNION ALL
SELECT 1, 10, 0.4
UNION ALL
SELECT 1, 10, 0.4
UNION ALL
SELECT 1, 11, 0.3
UNION ALL
SELECT 1, 11, -0.3
UNION ALL
SELECT 1, 11, 0.1
UNION ALL
SELECT 1, 12, 0.83
UNION ALL
SELECT 1, 12, 0.16
UNION ALL
SELECT 1, 12, 2.3
UNION ALL
SELECT 1, 12, 0.3
In this case I need to get numbers 10 and 12 only, because one value of number 11 is negative.
I know how to do it using an EXCEPT statement. Maybe there is a more efficient way, because the original table has more than 70 million rows.
Below is for BigQuery Standard SQL
#standardSQL
SELECT number
FROM `project.dataset.table`
GROUP BY number
HAVING COUNTIF(value < 0) = 0
Applied to the sample data from your question, the output is numbers 10 and 12.
You could use an aggregation approach here:
SELECT number
FROM yourTable
GROUP BY number
HAVING COUNT(CASE WHEN value < 0 THEN 1 END) = 0;
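The aggregation approach can be tried locally against the sample rows from the question. This sketch uses Python's sqlite3 module rather than BigQuery, so it uses the portable COUNT(CASE ...) form; the table name `yourTable` is the placeholder from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, number INTEGER, value REAL)")

# Sample rows copied from the question.
sample = [
    (1, 10, 0.2), (1, 10, 0.3), (1, 10, 0.4), (1, 10, 0.4),
    (1, 11, 0.3), (1, 11, -0.3), (1, 11, 0.1),
    (1, 12, 0.83), (1, 12, 0.16), (1, 12, 2.3), (1, 12, 0.3),
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", sample)

# COUNT ignores NULLs, so the CASE counts only the negative values in each group.
numbers = [
    r[0]
    for r in conn.execute(
        """
        SELECT number
        FROM yourTable
        GROUP BY number
        HAVING COUNT(CASE WHEN value < 0 THEN 1 END) = 0
        ORDER BY number
        """
    )
]
print(numbers)
```

Number 11 is dropped because one of its values is negative; the table is scanned once.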

unpivot sql table

I have a table with logid, skilllevel and logonskill columns, where the data looks like this:
logid, skilllevel1, skilllevel2, skilllevel3, logonskill1, logonskill2, logonskill3
101, 90, 40, 60, 1, 2, 3
102, 30, 20, 10, 4, 5, 6
I want to get it arranged like the following:
logid, skilllevel, logonskill, skillposition
101, 90, 1, 1
101, 40, 2, 2
102, 30, 4, 1
skilllevel1 corresponds to logonskill1, and so on.
skillposition is the numeric suffix of the logonskill column name (1 for logonskill1, and so on).
How can I achieve this?
My preferred method is a lateral join, using apply:
select v.*
from t cross apply
(values (logid, skilllevel1, logonskill1, 1),
(logid, skilllevel2, logonskill2, 2),
(logid, skilllevel3, logonskill3, 3)
) v(logid, skilllevel, logonskill, skillposition)
where skilllevel is not null or logonskill is not null;
Lateral joins are very powerful. This is just one of many things that you can do with apply.
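CROSS APPLY is not available everywhere. On databases without it, the same reshaping can be written as a UNION ALL with one branch per column pair. That is a different technique from the lateral join above, sketched here with Python's sqlite3 module and the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (logid, skilllevel1, skilllevel2, skilllevel3, "
    "logonskill1, logonskill2, logonskill3)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(101, 90, 40, 60, 1, 2, 3), (102, 30, 20, 10, 4, 5, 6)],
)

# One SELECT per (skilllevelN, logonskillN) pair; the literal is the position N.
unpivoted = conn.execute(
    """
    SELECT logid, skilllevel1 AS skilllevel, logonskill1 AS logonskill, 1 AS skillposition FROM t
    UNION ALL
    SELECT logid, skilllevel2, logonskill2, 2 FROM t
    UNION ALL
    SELECT logid, skilllevel3, logonskill3, 3 FROM t
    ORDER BY logid, skillposition
    """
).fetchall()

for row in unpivoted:
    print(row)
```

The trade-off is that the table is read once per branch, whereas the apply version reads it once.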

Why doesn't this SELECT statement have a FROM?

<CFIF ListLen(SESSION.WHSurveyStruct.reasonString, ";") gt 0>
<CFQUERY name="insertReasons" datasource="#REQUEST.dsn#">
INSERT INTO TWelcomeHome_Reason
(ReasonID, SubReasonID, SurveyID)
SELECT #sanitize(ListFirst(SESSION.WHSurveyStruct.reasonString, ";"))#, #sanitize(getLatestSurveyID.SurveyID)#
<CFLOOP list="#sanitize(ListRest(SESSION.WHSurveyStruct.reasonString, ';'))#" index="thisReason" delimiters=";">
UNION ALL
SELECT #sanitize(thisReason)#, #sanitize(getLatestSurveyID.SurveyID)#
</CFLOOP>
</CFQUERY>
</CFIF>
I'm trying to understand what this does. I'm confused with the loop, why don't the select statements have a FROM? Ok they are just scalars.
What about how there's one select statement on the outside of the loop and one on the inside? I sort of don't get the point on union all. And how come there are 3 columns being specified (ReasonID, SubReasonID, SurveyID) but in each select 2 values are given?
dumped:
struct
CACHED: false EXECUTIONTIME: 0 RECORDCOUNT: 8 SQL: INSERT INTO
TWelcomeHome_Reason (ReasonID, SubReasonID, SurveyID) SELECT 6,
18, 245
UNION ALL
SELECT 6, 21, 245
UNION ALL
SELECT 6, 24, 245
UNION ALL
SELECT 3, 5, 245
UNION ALL
SELECT 3, 6, 245
UNION ALL
SELECT 3, 8, 245
UNION ALL
SELECT 3, 11, 245
UNION ALL
SELECT 3, 7, 245
It looks like it is just SELECTing scalar values, not records from any table. So
INSERT INTO myTable
SELECT 'foo'
UNION ALL
SELECT 'bar'
will insert two records into myTable, foo and bar.
The short answer is that it's not selecting from a table. So there's no table to FROM from.
If you execute:
INSERT INTO TableSomething (ColumnA)
SELECT 'A'
UNION ALL
SELECT 'B'
It will insert A and B into ColumnA.
ColdFusion is creating the data to insert rather than pulling from a table.
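A rough equivalent of what the CFLOOP generates, sketched with Python's sqlite3 module. Judging by the dumped SQL, each list item seems to be a "ReasonID,SubReasonID" pair, which would explain why two expressions fill three columns; that reading, the sample string, and the survey id below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TWelcomeHome_Reason (ReasonID INTEGER, SubReasonID INTEGER, SurveyID INTEGER)"
)

# Hypothetical stand-ins for SESSION.WHSurveyStruct.reasonString and the latest survey id.
reason_string = "6,18;6,21;3,5"
survey_id = 245

# Each ";"-separated item is assumed to hold "ReasonID,SubReasonID".
pairs = [tuple(int(part) for part in item.split(",")) for item in reason_string.split(";")]

# One FROM-less SELECT of scalars per row, glued together with UNION ALL.
selects = " UNION ALL ".join("SELECT ?, ?, ?" for _ in pairs)
params = [v for reason_id, sub_id in pairs for v in (reason_id, sub_id, survey_id)]

conn.execute(
    f"INSERT INTO TWelcomeHome_Reason (ReasonID, SubReasonID, SurveyID) {selects}",
    params,
)

inserted = conn.execute(
    "SELECT ReasonID, SubReasonID, SurveyID FROM TWelcomeHome_Reason "
    "ORDER BY ReasonID DESC, SubReasonID"
).fetchall()
print(inserted)
```

The first SELECT sits outside the loop only so that every later one can be prefixed with UNION ALL.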

How to do equivalent of "limit distinct"?

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?
Input table:
client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj
desired output, where limit distinct=5 distinct values of client_id:
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
Platform this is intended for is MySQL.
You can use a subselect
select * from table where client_id in
(select distinct client_id from table order by client_id limit 5)
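Some MySQL versions reject LIMIT inside an IN subquery (error 1235, "LIMIT & IN/ALL/ANY/SOME subquery"). If you hit that, joining to the limited list as a derived table is a common alternative. A sketch with Python's sqlite3 module and the sample data from the question; the table name `clients` and the alias `first_ids` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER, employer_id INTEGER, other_value TEXT)")

# Input table copied from the question.
rows = [
    (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
    (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
    (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
]
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", rows)

# The derived table picks the first 5 distinct client_ids; the join pulls in all their rows.
limited = conn.execute(
    """
    SELECT c.client_id, c.employer_id, c.other_value
    FROM clients c
    JOIN (
        SELECT DISTINCT client_id FROM clients ORDER BY client_id LIMIT 5
    ) first_ids ON c.client_id = first_ids.client_id
    ORDER BY c.client_id, c.employer_id
    """
).fetchall()

for row in limited:
    print(row)
```

This returns eight rows covering client_id 1 through 5, matching the desired output.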
This is for SQL Server. If I remember correctly, MySQL uses a LIMIT keyword instead of TOP. That may make the query more efficient if you can get rid of the innermost subquery by using LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here as a second possible answer though.)
SELECT
client_id,
employer_id,
other_value
FROM
MyTable
WHERE
client_id IN
(
SELECT TOP 5
client_id
FROM
(
SELECT DISTINCT
client_id
FROM
MyTable
) SQ
ORDER BY
client_id
)
Of course, add in your own WHERE clause and ORDER BY clause in the subquery.
Another possibility (compare performance and see which works out better) is:
SELECT
client_id,
employer_id,
other_value
FROM
MyTable T1
WHERE
T1.client_id IN
(
SELECT
T2.client_id
FROM
MyTable T2
WHERE
(SELECT COUNT(DISTINCT T3.client_id) FROM MyTable T3 WHERE T3.client_id < T2.client_id) < 5
)
-- Using Common Table Expression in Microsoft SQL Server.
-- The LIMIT clause does not exist in MS SQL.
WITH CTE
AS
(SELECT DISTINCT([COLUMN_NAME])
FROM [TABLE_NAME])
SELECT TOP (5) [COLUMN_NAME]
FROM CTE;
This works for MS SQL if anyone is on that platform:
SET ROWCOUNT 10;
SELECT DISTINCT
column1, column2, column3,...
FROM
Table1
WHERE ...