Why doesn't this SELECT statement have a FROM? - sql

<CFIF ListLen(SESSION.WHSurveyStruct.reasonString, ";") gt 0>
<CFQUERY name="insertReasons" datasource="#REQUEST.dsn#">
INSERT INTO TWelcomeHome_Reason
(ReasonID, SubReasonID, SurveyID)
SELECT #sanitize(ListFirst(SESSION.WHSurveyStruct.reasonString, ";"))#, #sanitize(getLatestSurveyID.SurveyID)#
<CFLOOP list="#sanitize(ListRest(SESSION.WHSurveyStruct.reasonString, ';'))#" index="thisReason" delimiters=";">
UNION ALL
SELECT #sanitize(thisReason)#, #sanitize(getLatestSurveyID.SurveyID)#
</CFLOOP>
</CFQUERY>
</CFIF>
I'm trying to understand what this does. The loop confuses me: why don't the SELECT statements have a FROM? (OK, I see now that they are just selecting scalars.)
Why is there one SELECT statement outside the loop and another inside it? I also don't quite see the point of UNION ALL. And why are three columns specified (ReasonID, SubReasonID, SurveyID) when each SELECT supplies only two values?
Here is the dumped query result:
struct
CACHED: false
EXECUTIONTIME: 0
RECORDCOUNT: 8
SQL: INSERT INTO TWelcomeHome_Reason (ReasonID, SubReasonID, SurveyID)
SELECT 6, 18, 245
UNION ALL
SELECT 6, 21, 245
UNION ALL
SELECT 6, 24, 245
UNION ALL
SELECT 3, 5, 245
UNION ALL
SELECT 3, 6, 245
UNION ALL
SELECT 3, 8, 245
UNION ALL
SELECT 3, 11, 245
UNION ALL
SELECT 3, 7, 245

It looks like it is just SELECTing scalar values, not records from any table. So
INSERT INTO myTable
SELECT 'foo'
UNION ALL
SELECT 'bar'
will insert two records into myTable, foo and bar.

The short answer is that it's not selecting from a table, so there is no table to name in a FROM clause.
If you execute:
INSERT INTO TableSomething (ColumnA)
SELECT 'A'
UNION ALL
SELECT 'B'
It will insert A and B into ColumnA.
ColdFusion is creating the data to insert rather than pulling from a table.
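To make the loop in the question concrete, here is a minimal Python sketch (standard sqlite3 module, in-memory database) that assembles the same kind of statement. It assumes, judging only from the dumped SQL, that each semicolon-separated element of reasonString already holds a "ReasonID, SubReasonID" pair, which would explain why two placeholders produce three values. The name build_insert and the sample input are hypothetical, and production code should bind parameters rather than concatenate strings.

```python
import sqlite3


def build_insert(reason_string, survey_id):
    """Mimic the CFLOOP: one SELECT before the loop, then UNION ALL + SELECT per remaining item."""
    reasons = [item for item in reason_string.split(";") if item]
    first, rest = reasons[0], reasons[1:]
    sql = "INSERT INTO TWelcomeHome_Reason (ReasonID, SubReasonID, SurveyID)\n"
    # The first SELECT sits outside the loop so the statement never starts with UNION ALL.
    sql += f"SELECT {first}, {survey_id}\n"
    for this_reason in rest:
        sql += f"UNION ALL\nSELECT {this_reason}, {survey_id}\n"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TWelcomeHome_Reason "
    "(ReasonID INTEGER, SubReasonID INTEGER, SurveyID INTEGER)"
)

# Assumed input shape: each list element is "ReasonID, SubReasonID".
statement = build_insert("6, 18;6, 21;3, 5", 245)
conn.execute(statement)

rows = conn.execute(
    "SELECT ReasonID, SubReasonID, SurveyID FROM TWelcomeHome_Reason"
).fetchall()
print(statement)
print(rows)
```

Each SELECT contributes one row of literal values, and UNION ALL simply stacks those rows so a single INSERT can write all of them.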

Related

Shouldn't this statement end with an error?

Consider the SELECT statement below:
SELECT 1, 'A'
UNION ALL
SELECT 2, 'B'
UNION ALL
SELECT 3, 'C';
The result is obvious:
1 'A'
2 'B'
3 'C'
I tried to store it as a separate table:
CREATE TABLE tmp AS
SELECT 1, 'A'
UNION ALL
SELECT 2, 'B'
UNION ALL
SELECT 3, 'C';
The contents of the tmp table are surprising:
1 'A'
2 'A'
3 'A'
OK, that can be fixed by providing explicit field names:
CREATE TABLE tmp AS
SELECT 1 AS field1, 'A' AS field2
UNION ALL
SELECT 2, 'B'
UNION ALL
SELECT 3, 'C';
My question is whether the observed behavior is defined and valid. I feel that the statement without explicit field names should fail with an error instead of producing a surprising result.
I'm using SQLite.
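The aliased variant from the question can be checked with a short Python sketch using the standard sqlite3 module. It only exercises the form with explicit field names; the question does not say which SQLite version produced the surprising result, so the unaliased behavior is deliberately not asserted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same statement as in the question, with aliases on the first SELECT only.
conn.execute(
    """
    CREATE TABLE tmp AS
    SELECT 1 AS field1, 'A' AS field2
    UNION ALL
    SELECT 2, 'B'
    UNION ALL
    SELECT 3, 'C'
    """
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(tmp)")]
rows = conn.execute("SELECT field1, field2 FROM tmp ORDER BY field1").fetchall()
print(columns)
print(rows)
```

The column names of a compound SELECT come from its first SELECT, which is why aliasing only that one is enough.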

How to efficiently match regular expressions in a large dataset of Google BigQuery

I have almost 750 regular expressions to match against the Google BigQuery GitHub Public Dataset. At first, I was writing the query as below:
SELECT * FROM
Sample_Table
WHERE
(REGEXP_CONTAINS(content, r"[a-z0-9A-Z]{40}") OR -- expression 1
REGEXP_CONTAINS(content, r"[0-9a-z]{32}") OR -- expression 2
..................
REGEXP_CONTAINS(content, r"[a-z0-9]{80}") OR -- expression n-1
REGEXP_CONTAINS(content, r"[a-z0-9A-Z\%]{35}")) -- expression n
But in this case, I could not tell which regular expression had matched for a returned row. Then, following this suggestion, I changed my query to the one below:
with patterns as (
SELECT 1 pattern_id, r"(?i)(?:abbysale)(?:.|[\n\r]){0,40}\b([a-z0-9A-Z]{40})\b" pattern
UNION ALL SELECT 2, r"(?i)(?:abstract)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b"
UNION ALL SELECT 3, r"(?i)(?:abuseipdb)(?:.|[\n\r]){0,40}\b([a-z0-9]{80})\b"
UNION ALL SELECT 4, r"(?i)(?:accuweather)(?:.|[\n\r]){0,40}([a-z0-9A-Z\%]{35})\b"
UNION ALL SELECT 5, r"\b(aio\_[a-zA-Z0-9]{28})\b"
UNION ALL SELECT 6, r"(?i)(?:adobe)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 7, r"(?i)(?:adzuna)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 8, r"(?i)(?:aeroworkflow)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9^!]{20})\b"
UNION ALL SELECT 9, r"(?i)(?:agora)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 10, r"(?i)(?:aha)(?:.|[\n\r]){0,40}\b([0-9a-f]{64})\b"
UNION ALL SELECT 11, r"(?i)(?:airbrake)(?:.|[\n\r]){0,40}\b([a-zA-Z-0-9]{32})\b"
UNION ALL SELECT 12, r"(?i)(?:airship)(?:.|[\n\r]){0,40}\b([0-9Aa-zA-Z]{91})\b"
UNION ALL SELECT 13, r"\b(key[a-zA-Z0-9_-]{14})\b"
UNION ALL SELECT 14, r"(?i)(?:airvisual)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 15, r"(?i)(?:alconost)(?:.|[\n\r]){0,40}\b([0-9Aa-z]{32})\b"
UNION ALL SELECT 16, r"(?i)(?:alegra)(?:.|[\n\r]){0,40}\b([a-z0-9-]{20})\b"
UNION ALL SELECT 17, r"(?i)(?:aletheiaapi)(?:.|[\n\r]){0,40}\b([A-Z0-9]{32})\b"
UNION ALL SELECT 18, r"(?i)(?:algolia)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{32})\b"
UNION ALL SELECT 19, r"\b([a-zA-Z0-9]{30})\b"
UNION ALL SELECT 20, r"(?i)(?:alienvault)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 21, r"(?i)(?:allsports)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 22, r"(?i)(?:amadeus)(?:.|[\n\r]){0,40}\b([0-9A-Za-z]{32})\b"
UNION ALL SELECT 23, r"(?i)(?:ambee)(?:.|[\n\r]){0,40}\b([0-9a-f]{64})\b"
UNION ALL SELECT 24, r"(?i)(?:amplitude)(?:.|[\n\r]){0,40}\b([0-9a-f]{32})\b"
UNION ALL SELECT 25, r"\b([0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})\b"
UNION ALL SELECT 26, r"(?i)(?:apacta)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 27, r"(?i)(?:api2cart)(?:.|[\n\r]){0,40}\b([0-9a-f]{32})\b"
UNION ALL SELECT 28, r"\b(sk_live_[a-z0-9A-Z-]{93})\b"
UNION ALL SELECT 29, r"(?i)(?:apideck)(?:.|[\n\r]){0,40}\b([a-z0-9A-Z]{40})\b"
UNION ALL SELECT 30, r"(?i)(?:apiflash)(?:.|[\n\r]){0,40}\b([a-z0-9]{32})\b"
UNION ALL SELECT 31, r"(?i)(?:apiflash)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9\S]{21,30})\b"
UNION ALL SELECT 32, r"(?i)(?:apifonica)(?:.|[\n\r]){0,40}\b([0-9a-z]{11}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})\b"
UNION ALL SELECT 33, r"\b(apify\_api\_[a-zA-Z-0-9]{36})\b"
UNION ALL SELECT 34, r"(?i)(?:apimatic)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{3,20}#[a-zA-Z0-9]{2,12}.[a-zA-Z0-9]{2,5})\b"
UNION ALL SELECT 35, r"(?i)(?:apimatic)(?:.|[\n\r]){0,40}\b([a-z0-9-\S]{8,32})\b"
UNION ALL SELECT 36, r"(?i)(?:apiscience)(?:.|[\n\r]){0,40}\b([a-bA-Z0-9\S]{22})\b"
UNION ALL SELECT 37, r"(?i)(?:apitemplate)(?:.|[\n\r]){0,40}\b([0-9a-zA-Z]{39})\b"
UNION ALL SELECT 38, r"(?i)(?:apollo)(?:.|[\n\r]){0,40}\b([a-zA-Z0-9]{22})\b"
UNION ALL SELECT 39, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 40, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([a-z0-9-]{39})\b"
UNION ALL SELECT 41, r"(?i)(?:appcues)(?:.|[\n\r]){0,40}\b([0-9]{5})\b"
UNION ALL SELECT 42, r"(?i)(?:appfollow)(?:.|[\n\r]){0,40}\b([0-9A-Za-z]{20})\b"
UNION ALL SELECT 43, r"(?i)(?:appsynergy)(?:.|[\n\r]){0,40}\b([a-z0-9]{64})\b"
UNION ALL SELECT 44, r"(?i)(?:apptivo)(?:.|[\n\r]){0,40}\b([a-z0-9-]{36})\b"
UNION ALL SELECT 45, r"\b([a-zA-Z0-9]{73})"
UNION ALL SELECT 46, r"\b([A-Za-z0-9](?:[A-Za-z0-9\-]{0,61}[A-Za-z0-9])\.jfrog\.io)"
UNION ALL SELECT 47, r"(?i)(?:artsy)(?:.|[\n\r]){0,40}\b([0-9a-zA-Z]{32})\b"
UNION ALL SELECT 48, r"(?i)(?:asana)(?:.|[\n\r]){0,40}\b([a-z\/:0-9]{51})\b"
UNION ALL SELECT 49, r"(?i)(?:asana)(?:.|[\n\r]){0,40}\b([0-9]{1,}\/[0-9]{16,}:[A-Za-z0-9]{32,})\b"
UNION ALL SELECT 50, r"(?i)(?:assemblyai)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b"
)
SELECT any_value(F.repo_name) AS repo_name, string_agg(DISTINCT ('' || pattern_id)) as matches
FROM bigquery-public-data.github_repos.contents AS C
INNER JOIN bigquery-public-data.github_repos.files AS F
ON C.id = F.id
,patterns p
WHERE NOT BINARY AND regexp_contains(C.content, p.pattern)
GROUP BY F.repo_name
Here, my goal is to get each repository name together with the distinct matched regular expressions as a comma-separated string.
I want to run all of my regular expressions, but because of Google BigQuery resource limits I ran the above query with only 50 of them. Even so, I got a timeout error:
Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit.
I am not sure how I can optimize the query. Can anyone please help?
The (?:.|[\n\r]) alternation used to match any character is a performance-killing regex construct.
The only regex flavor that requires this form of matching any character is ElasticSearch; in all others, there are much more efficient ways.
Here, you can turn on the s flag (so that . also matches line breaks) and use something like
(?is)assemblyai.{0,43}\b([0-9a-z]{32})\b
Make sure your limiting (range/interval) quantifier can match far enough to reach the 32-character whole word (hence, I changed 40 to 43).
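As a rough check that the rewritten pattern still finds the same token, here is a small Python sketch. It uses Python's re module, which is a different engine from the RE2 library behind BigQuery's REGEXP_CONTAINS, so it illustrates only that the two patterns agree on a sample, not the performance difference. The sample text and the 32-character value are made up for illustration.

```python
import re

# Original style: alternation to let the "any character" part cross line breaks.
slow = re.compile(r"(?i)(?:assemblyai)(?:.|[\n\r]){0,40}\b([0-9a-z]{32})\b")

# Suggested style: the s flag makes "." match line breaks too.
fast = re.compile(r"(?is)assemblyai.{0,43}\b([0-9a-z]{32})\b")

# Hypothetical file content with the keyword and the token on different lines.
text = "assemblyai_token =\n'0123456789abcdef0123456789abcdef'"

slow_match = slow.search(text)
fast_match = fast.search(text)

print(slow_match.group(1))
print(fast_match.group(1))
```

Both patterns capture the same 32-character group here; the second one just avoids trying two alternatives at every character position.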

How to query data which is not unique up to a certain point?

Basically the current conditions of the query are
WHERE data_payload_uri BETWEEN
'/organization/team/folder/2021'
AND
'/organization/team/folder/2022'
And this gets all data for the year of 2021.
A sample of the data_payload_uri data looks like this:
/organization/team/folder/20210101/orig
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig
/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2
What I would like to do is query only the rows whose value, up to the last forward slash, is NOT unique.
In other words, I do NOT want the rows whose folder contains only a single orig:
/organization/team/folder/20210101/orig
/organization/team/folder/20210103/orig
/organization/team/folder/20210104/orig
but I DO want to query all the other rows
/organization/team/folder/20210105/orig
/organization/team/folder/20210105/orig_v1
/organization/team/folder/20210105/orig_v2
/organization/team/folder/20210102/orig
/organization/team/folder/20210102/orig_v1
What is the best way to do this? Please let me know if anything is unclear, and thank you for any help.
You can use the analytic COUNT function:
SELECT *
FROM (
SELECT t.*,
COUNT(DISTINCT data_payload_uri) OVER (
PARTITION BY SUBSTR(data_payload_uri, 1, INSTR(data_payload_uri, '/', -1))
) AS cnt
FROM table_name t
WHERE data_payload_uri >= '/organization/team/folder/2021'
AND data_payload_uri < '/organization/team/folder/2022'
)
WHERE cnt > 1
Which, for the sample data:
CREATE TABLE table_name (id, data_payload_uri) AS
SELECT 1, '/organization/team/folder/20210101/orig' FROM DUAL UNION ALL
SELECT 2, '/organization/team/folder/20210102/orig' FROM DUAL UNION ALL
SELECT 3, '/organization/team/folder/20210102/orig_v1' FROM DUAL UNION ALL
SELECT 4, '/organization/team/folder/20210103/orig' FROM DUAL UNION ALL
SELECT 5, '/organization/team/folder/20210104/orig' FROM DUAL UNION ALL
SELECT 6, '/organization/team/folder/20210105/orig' FROM DUAL UNION ALL
SELECT 7, '/organization/team/folder/20210105/orig_v1' FROM DUAL UNION ALL
SELECT 8, '/organization/team/folder/20210105/orig_v2' FROM DUAL;
Outputs:
ID | DATA_PAYLOAD_URI | CNT
2 | /organization/team/folder/20210102/orig | 2
3 | /organization/team/folder/20210102/orig_v1 | 2
6 | /organization/team/folder/20210105/orig | 3
7 | /organization/team/folder/20210105/orig_v1 | 3
8 | /organization/team/folder/20210105/orig_v2 | 3
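The logic of the query above (group by everything up to the last slash, keep groups with more than one distinct value) can be sketched in plain Python. This is an illustration of the idea on the sample data, not a replacement for the Oracle query; the function name rows_with_shared_prefix is hypothetical.

```python
from collections import defaultdict

uris = [
    "/organization/team/folder/20210101/orig",
    "/organization/team/folder/20210102/orig",
    "/organization/team/folder/20210102/orig_v1",
    "/organization/team/folder/20210103/orig",
    "/organization/team/folder/20210104/orig",
    "/organization/team/folder/20210105/orig",
    "/organization/team/folder/20210105/orig_v1",
    "/organization/team/folder/20210105/orig_v2",
]


def prefix_of(uri):
    """Return the part of the URI up to and including the last slash."""
    return uri[: uri.rfind("/") + 1]


def rows_with_shared_prefix(values):
    """Keep only values whose prefix is shared by more than one distinct value."""
    groups = defaultdict(set)
    for value in values:
        groups[prefix_of(value)].add(value)
    return [value for value in values if len(groups[prefix_of(value)]) > 1]


result = rows_with_shared_prefix(uris)
for uri in result:
    print(uri)
```

The prefix_of helper plays the role of SUBSTR(..., 1, INSTR(..., '/', -1)), and the set size plays the role of COUNT(DISTINCT ...) over the partition.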

Efficient way to insert values in an Oracle SQL table

I have two tables like
create table nodes_tbl as (
select 'a' as nodeid, 'some string' as dummy_string, 0 as subnetid from dual union all
select 'b', 'qwe', 0 from dual union all
select 'c', 'asd', 0 from dual union all
select 'd', 'zxc', 0 from dual union all
select 'e', 'rty', 0 from dual);
And
create table subnets as (
select 'a' as nodeid, 1 as subnetid from dual union all
select 'b', 2 from dual union all
select 'c', 2 from dual union all
select 'd', 3 from dual union all
select 'e', 4 from dual);
With several million records, a join is fast.
select n.NODEID, n.DUMMY_STRING, s.subnetid
from nodes_tbl n, subnets s where s.nodeid=n.nodeid
Writes are fast as well
create table test_tbl as select n.NODEID, s.subnetid
from nodes_tbl n, subnets s where s.nodeid=n.nodeid --10M records in 2s.
However, when I try to update the table and fill in the subnetid column, the query is very slow
UPDATE nodes_tbl n
SET subnetid = (SELECT subnetid
FROM subnets s
WHERE s.nodeid = n.nodeid)
WHERE EXISTS (
SELECT subnetid FROM subnets s
WHERE s.nodeid = n.nodeid) --8 minutes for 100K records
Why is the update so much slower than a create table from a select statement?
What is the most efficient way to do this update?
I know about create view option, but want to avoid it.
Try MERGE instead:
merge into nodes_tbl n
using (select s.subnetid, s.nodeid
from subnets s
) x
on (x.nodeid = n.nodeid)
when matched then update set
n.subnetid = x.subnetid;
Any improvement?
By the way, did you create an index on the NODEID column in both tables?
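The index question can be explored with a small Python sketch using the standard sqlite3 module. This is SQLite rather than Oracle, MERGE is not available there, and timings do not transfer, so the sketch only shows two things: the correlated update from the question produces the expected values, and with an index on nodeid each per-row lookup can use that index. The index name subnets_nodeid_idx is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes_tbl (nodeid TEXT, dummy_string TEXT, subnetid INTEGER);
    CREATE TABLE subnets (nodeid TEXT, subnetid INTEGER);
    INSERT INTO nodes_tbl VALUES
        ('a', 'some string', 0), ('b', 'qwe', 0), ('c', 'asd', 0),
        ('d', 'zxc', 0), ('e', 'rty', 0);
    INSERT INTO subnets VALUES ('a', 1), ('b', 2), ('c', 2), ('d', 3), ('e', 4);
    -- Index the lookup column so the subquery does not scan subnets per row.
    CREATE INDEX subnets_nodeid_idx ON subnets (nodeid);
    """
)

# Same shape as the correlated UPDATE in the question.
conn.execute(
    """
    UPDATE nodes_tbl
    SET subnetid = (SELECT s.subnetid FROM subnets s
                    WHERE s.nodeid = nodes_tbl.nodeid)
    WHERE EXISTS (SELECT 1 FROM subnets s
                  WHERE s.nodeid = nodes_tbl.nodeid)
    """
)

result = conn.execute(
    "SELECT nodeid, subnetid FROM nodes_tbl ORDER BY nodeid"
).fetchall()

# Ask SQLite how it would perform a single lookup by nodeid.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT subnetid FROM subnets WHERE nodeid = 'a'"
).fetchall()

print(result)
print(plan)
```

Without the index, the subquery has to scan subnets once for every row of nodes_tbl, which is one plausible reason a correlated update is far slower than a single join.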

Maintain WHERE Order In SQL Select

Is it possible to maintain the order of the WHERE clause when doing a SELECT for specific records?
For instance, given the following SELECT statement:
SELECT [RecSeq] FROM [MyData] WHERE
[RecSeq]=3 OR [RecSeq]=2 OR [RecSeq]=1 OR [RecSeq]=21 OR [RecSeq]=20 OR
[RecSeq]=19 OR [RecSeq]=110 OR [RecSeq]=109 OR [RecSeq]=108 OR
[RecSeq]=53 OR [RecSeq]=52 OR [RecSeq]=51;
I'd like the results to come back as:
3
2
1
21
20
19
110
109
108
53
52
51
However, what I get back isn't in any particular order. Currently I have a loop that calls the SELECT statement for each record required. This could range anywhere from 1 to 700,000 times. Needless to say the performance isn't the best.
Any solutions or am I stuck in the loop?
You need the ORDER BY FIELD clause.
SELECT RecSeq From MyData WHERE RecSeq IN (3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
ORDER BY FIELD (RecSeq, 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51);
You don't say what database system you are using - I know this works in MySQL.
There is exactly one way to reliably enforce an ordering of the results of a SQL statement: use an ORDER BY clause. I don't know if it is standard SQL, but in Oracle you could do something like this:
select ... from ...
where recseq in ( 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
order by decode(recseq, 3,1, 2,2, 1,3, 21,4, 20,5, 19,6, 110,7, 109,8, 108,9, 53,10, 52,11, 51,12, 13)
The WHERE clause cannot specify your output order.
You will have to sort your results using an ORDER BY.
If you absolutely need this order, try a 'pseudo-column', or fake column, with a UNION clause (performance warning here).
select 0 as my_fake_column, blah_columns from table where recseq = 3
UNION
select 1, blah_columns from table where recseq = 2
UNION
select 2, blah_columns from table where recseq = 1
UNION
select 3, blah_columns from table where recseq = 21
order by my_fake_column
The above will deliver the results in your specific order 3,2,1,21.
As the other poster said, adding a column could be an option.
You can use a derived table for filtering and sorting like this
SELECT t.RecSeq
FROM MyData t
JOIN (
SELECT 3, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 21, 4 UNION ALL
SELECT 20, 5 UNION ALL
SELECT 19, 6
...
) f(RecSeq, SortKey)
ON t.RecSeq = f.RecSeq
ORDER BY f.SortKey
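The derived-table approach can be tried end to end with a Python sketch using the standard sqlite3 module. SQLite does not accept the f(RecSeq, SortKey) alias form on a subquery, so this sketch swaps in a common table expression with a VALUES list, which plays the same role. The table contents are invented for the demonstration; for very long lists (the question mentions up to 700,000 values) a temporary table would likely be a better fit than bound parameters, since SQLite limits the number of parameters per statement.

```python
import sqlite3

wanted = [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (RecSeq INTEGER)")
# Hypothetical data: RecSeq values 1..200.
conn.executemany("INSERT INTO MyData VALUES (?)", [(n,) for n in range(1, 201)])

# One "(?, ?)" pair per wanted value: (RecSeq, SortKey).
placeholders = ", ".join("(?, ?)" for _ in wanted)
params = [
    value
    for pair in zip(wanted, range(1, len(wanted) + 1))
    for value in pair
]

query = f"""
    WITH f(RecSeq, SortKey) AS (VALUES {placeholders})
    SELECT t.RecSeq
    FROM MyData t
    JOIN f ON t.RecSeq = f.RecSeq
    ORDER BY f.SortKey
"""

ordered = [row[0] for row in conn.execute(query, params)]
print(ordered)
```

The join does the filtering and the SortKey column carries the caller's order, so one query replaces the per-record loop.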
Yes, there is a way, although some might consider it a hack. Also, I want to point out that you can (and should) use the IN operator instead of the giant chain of OR conditions.
SELECT [RecSeq]
FROM [MyData]
WHERE [RecSeq] in (3,2,1,21,20,19,110,109,108,53,52,51)
ORDER BY DECODE (recseq, 3,1, 2,2, 1,3, 21,4,......)
You could try using a UNION. Something like:
SELECT [RecSeq], 1 FROM [MyData] WHERE [RecSeq]=3
UNION
SELECT [RecSeq], 2 FROM [MyData] WHERE [RecSeq]=2
UNION
SELECT [RecSeq], 3 FROM [MyData] WHERE [RecSeq]=1
*etc...*
ORDER BY 2