Let's say I have the below SQL query:
with unused_cte as
(select 1 as one),
used_cte as
(select 2 as two)
select * from used_cte
I want to know your opinion on how I can automate catching unused CTEs (a CTE that is not referenced anywhere in the query, for example 'unused_cte' in this case), whether as a SQL assertion, a dbt check, or any other suggestion you may have.
If using dbt-core, you can add SQLFluff as your linter locally. Once it is set up, it will catch the cases where a CTE inside a data model is not being used.
See the example below. I have a model called stg_testing.sql that looks like the following:
with used_cte as (
select 1 as fake_id
),
unused_cte as (
select 2 as fake_id
),
final as (
select fake_id
from used_cte
)
select * from final
Since SQLFluff detects that the second CTE is not used, it will raise the following problem:
Query defines CTE (unused_cte) but does not use it. sqlfluff(L045) [Ln 5, Col 2]
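If you want to enforce this in CI as well, here is a minimal .sqlfluff configuration sketch that enables the check for a dbt project; the dialect and project path are assumptions, so adjust them to your setup:
[sqlfluff]
# assumption: set this to your warehouse's dialect
dialect = postgres
# render dbt models before linting
templater = dbt
# only flag unused CTEs (this rule was renamed ST03 / structure.unused_cte in SQLFluff 2.x)
rules = L045

[sqlfluff:templater:dbt]
# assumption: the dbt project lives at the repo root
project_dir = ./
Running sqlfluff lint models/ locally or in CI will then fail on any model that defines a CTE it never references.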
The unused CTE will not be run, as this simple test shows.
The first CTE contains a division by zero, which should produce an error.
But as the unused CTE isn't executed, the query selecting from the second, valid CTE runs fine.
Postgres
with cte_unused as (select 1/0 as val), cte_used as (select 'it works' as val)
select * from cte_used
| val |
| :------- |
| it works |
db<>fiddle here
MySQL
with cte_unused as (select 1/0 as val), cte_used as (select 'it works' as val)
select * from cte_used
| val |
| :------- |
| it works |
db<>fiddle here
I have a DB column that has a comma delimited list:
VALUES     ID
--------------------
1,11,32    A
11,12,28   B
1          C
32,12,1    D
When I run my SQL statement, I have tried IN, CONTAINS and LIKE in my WHERE clause with varying degrees of errors and success, but none returns exactly what I need.
What I need is a WHERE clause that finds all IDs whose list contains the value '1' as a whole item (NOT merely the digit 1 inside another number).
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the values (32,28,12). I would expect zero records.
Thanks in advance for your help!
I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES entry into something that looks like:
,11,12,28,
Then we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, every number in the CSV list is now guaranteed to have commas around it.
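The query above uses SQL Server's + string concatenation. If you're on a database that doesn't support +, such as Postgres, the same trick works with the standard || operator; a sketch, assuming the table and column names from the question:
SELECT *
FROM yourTable
WHERE ',' || "VALUES" || ',' LIKE '%,1,%';
(VALUES is a reserved word, hence the quoting.)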
If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
              from string_split(t.[values], ',') s
              where s.value = '1'
);
I echo exactly what jarlh and Tim say: the relational model is not the right place to store comma-delimited strings in a table.
Here is an approach that can likely use an index, if there is one on column x:
select distinct x
from t
cross apply string_split(t.x, ',')
where value = '1' /* the search value here can be parameterized */
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--SQL Server older versions (before STRING_SPLIT was added in SQL Server 2016)
with data
as (
    SELECT t.c.value('.', 'VARCHAR(1000)') as val -- one row per split-out list item
          ,y
          ,x
    FROM (
          -- turn '1,11,32' into '<t>1</t><t>11</t><t>32</t>' and cast it to XML
          SELECT x1 = CAST('<t>' +
                 REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
                ,y
                ,x
          FROM t
         ) a
    CROSS APPLY x1.nodes('/t') t(c) -- shred the XML: one row per <t> node
)
select x
from data
where val = '1' -- keep only rows whose list contains the item '1'
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895
I have the following table in a BigQuery database, where f0_ is the only column:
Row | f0_
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
3 | {"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}
4 | {"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}
5 | {"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}
6 | {"configuration":[{"param1":"value1"}]}
The f0_ column is a pure string.
Is there a way to write a select query, where the "param2" value is equal to [3.0, 45] array meaning it would only return rows 1 and 2? Preferably would be great to accomplish it without directly indexing the first element in the "configuration" array as the order might not be guaranteed.
Below is for BigQuery Standard SQL
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}' line UNION ALL
SELECT '{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"}]}'
)
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
with the result:
Row line
1 {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
Preferably would be great to accomplish it without directly indexing the first element in the "configuration" array as the order might not be guaranteed.
Note: this solution does not depend on position of "param2" in the configuration array
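If you prefer to stay with JSON functions rather than regular expressions, here is a position-independent sketch that walks the configuration array with JSON_EXTRACT_ARRAY; note that JSON_EXTRACT re-serializes the extracted value, so the number formatting in the comparison string (3.0 vs 3) may need adjusting for your data:
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(JSON_EXTRACT_ARRAY(line, '$.configuration')) elem
  WHERE JSON_EXTRACT(elem, '$.param2') = '[3.0,45]'
)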
You can use some of BQ's neat JSON functions as described here.
Based on that, you can locate param2 and check if its value matches what you're looking for. If you aren't sure of the configuration order, you can iterate through the array to find param2, but it's not particularly efficient. I recommend you try to find a way where param2 is always the second field in the array. I was able to get the correct results like so:
SELECT json_text AS correct_configurations
FROM UNNEST([
'{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}',
'{"configuration":[{"param1":"value1"}]}'
])
AS json_text
WHERE JSON_EXTRACT(json_text, '$.configuration[1].param2') LIKE "[3.0,45]";
Gives a result of:
Row | correct_configurations
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
I'm starting out in BigQuery, with some experience in psql.
The #legacySQL query I'm running successfully is:
SELECT
FIRST(SPLIT(ewTerms, '/')) AS place,
NTH(2, SPLIT(ewTerms, '/')) AS divisor
FROM (SELECT ewTerms FROM accounts.free)
The string values in the 'ewTerms' column from the table 'free' are single-digit fractions, such as "2/4", "3/5", etc. This #legacySQL query successfully creates two columns from 'ewTerms', reading:
Row place divisor
1 3 5
2 2 4
I now need to use this column creation in a WITH clause, so I have to switch to #standardSQL.
Can anyone tell me how I can replicate the FIRST() and NTH() functions using #standardSQL? I've tried:
WITH prep AS(
SELECT
SPLIT(ewTerms, '/') AS split
FROM (SELECT ewTerms FROM accounts.free)
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep
but this is wrong. Help anyone?
Your question is not clear about what is wrong. This query works for me:
#standardSQL
WITH Input AS (
SELECT '3/5' AS ewTerms UNION ALL
SELECT '2/4' AS ewTerms
), prep AS (
SELECT
SPLIT(ewTerms, '/') AS split
FROM Input
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
The output is:
+-------+---------+
| place | divisor |
+-------+---------+
| 2 | 4 |
| 3 | 5 |
+-------+---------+
Using your original table, your query would be:
#standardSQL
WITH prep AS (
SELECT
SPLIT(ewTerms, '/') AS split
FROM accounts.free
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
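If you don't need the intermediate CTE, you can also index the SPLIT result inline; a sketch against the same table:
#standardSQL
SELECT
  SPLIT(ewTerms, '/')[SAFE_ORDINAL(1)] AS place,
  SPLIT(ewTerms, '/')[SAFE_ORDINAL(2)] AS divisor
FROM accounts.free;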
I'm currently trying to efficiently retrieve the last descendant from a linked-list-like structure.
Essentially there's a table with a data series; using certain criteria I split it up to get a list like this:
current_id | next_id
for example
1 | 2
2 | 3
3 | 4
4 | NULL
42 | 43
43 | 45
45 | NULL
etc...
would result in lists like
1 -> 2 -> 3 -> 4
and
42 -> 43 -> 45
Now I want to get the first and the last id from each of those lists.
This is what I have right now:
WITH RECURSIVE contract(rstart_ts, rend_ts) AS ( -- recursive query to traverse the "linked list" of continuous timestamps
SELECT start_ts, end_ts FROM track_caps tc
UNION
SELECT c.rstart_ts, tc.end_ts
FROM contract c
INNER JOIN track_caps tc ON (tc.start_ts = c.rend_ts AND c.rend_ts IS NOT NULL AND tc.end_ts IS NOT NULL)
),
fcontract AS ( --final step, after traversing the "linked list", pick the largest timestamp found as the end_ts and the smallest as the start_ts
SELECT DISTINCT ON(start_ts, end_ts) min(rstart_ts) AS start_ts, rend_ts AS end_ts
FROM (
SELECT rstart_ts, max(rend_ts) AS rend_ts FROM contract
GROUP BY rstart_ts
) sq
GROUP BY end_ts
)
SELECT * FROM fcontract
ORDER BY start_ts
In this case I just used timestamps which work fine for the given data.
Basically I just use a recursive query that walks through all the nodes until it reaches the end, as suggested by many other posts on StackOverflow and other sites. The next query removes all the sub-steps and returns what I want, like in the first list example: 1 | 4
Just for illustration, the produced result set by the recursive query looks like this:
1 | 2
2 | 3
3 | 4
1 | 3
2 | 4
1 | 4
As nicely as it works, it's quite a memory hog, which is absolutely unsurprising when looking at the output of EXPLAIN ANALYZE.
For a dataset of roughly 42,600 rows, the recursive query produces a whopping 849,542,346 rows. Now it was actually supposed to process around 2,000,000 rows but with that solution right now it seems very unfeasible.
Did I just use recursive queries improperly? Is there a way to reduce the amount of data it produces (like removing the sub-steps)?
Or are there better single-query solutions to this problem?
The main problem is that your recursive query doesn't properly filter the root nodes, which is caused by the model you have. So the non-recursive part already selects the entire table, and then Postgres needs to recurse for each and every row of it.
To make that more efficient only select the root nodes in the non-recursive part of your query. This can be done using:
select t1.current_id, t1.next_id, t1.current_id as root_id
from track_caps t1
where not exists (select *
from track_caps t2
where t2.next_id = t1.current_id)
Now that is still not very efficient (compared to the "usual" where parent_id is null design), but at least it makes sure the recursion doesn't need to process more rows than necessary.
To find the root node of each tree, just select that as an extra column in the non-recursive part of the query and carry it over to each row in the recursive part.
So you wind up with something like this:
with recursive contract as (
select t1.current_id, t1.next_id, t1.current_id as root_id
from track_caps t1
where not exists (select *
from track_caps t2
where t2.next_id = t1.current_id)
union
select c.current_id, c.next_id, p.root_id
from track_caps c
join contract p on c.current_id = p.next_id
and c.next_id is not null
)
select *
from contract
order by current_id;
Online example: http://rextester.com/DOABC98823
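Since the question also asks for the first and last id of each list, you can reduce this to one row per chain. Note that the c.next_id is not null join condition keeps the terminal row (e.g. 4 | NULL) out of the recursion, so here is a sketch that drops that condition and then keeps only the rows without a successor:
with recursive contract as (
  select t1.current_id, t1.next_id, t1.current_id as root_id
  from track_caps t1
  where not exists (select *
                    from track_caps t2
                    where t2.next_id = t1.current_id)
  union
  select c.current_id, c.next_id, p.root_id
  from track_caps c
  join contract p on c.current_id = p.next_id
)
select root_id as first_id, current_id as last_id
from contract
where next_id is null -- only the terminal row of each chain survives
order by first_id;
For the sample data this returns (1, 4) and (42, 45).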
I'm querying my DB, in which there is only one table:
id | value
----------
1 | 1|2|4
2 | 11|23
3 | 1|4|3|11
4 | 2|4|11
5 | 5|6|11
6 | 12|15|16
7 | 3|1|4
8 | 5|2|1
The query was: SELECT * FROM table_name WHERE value LIKE '%1%'
I want to select only rows with the value 1, but I get rows with the value 11 too.
How can I tell the difference in SQL?
If you have to stick with this broken design, it's probably better to use Postgres' ability to parse a string into an array.
This is more robust than using a like condition:
select *
from the_table
where string_to_array(value,'|') @> array['1']
or maybe a bit easier to read
select *
from the_table
where '1' = any (string_to_array(value,'|'))
Using the contains operator @> you can also search for more than one value at a time:
select *
from the_table
where string_to_array(value,'|') @> array['1','2']
will return all rows where value contains 1 and 2
SQLFiddle example: http://sqlfiddle.com/#!15/8793d/2
I strongly recommend that you normalize your schema so that every column stores only atomic values.
Without that, you are forced to use some nasty tricks, e.g. with arrays:
select * from t
where '1' = any (string_to_array(value, '|'))
or, with pattern matching (this works because | is the alternation operator in SIMILAR TO patterns, so the stored value itself serves as the pattern, matching '1' but not '11'):
select * from t
where '1' similar to value
SQLFiddle
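For completeness, the normalized design both answers recommend would look something like this (table and column names are hypothetical):
create table t_value (
    id    integer not null, -- references the id of the original row
    value integer not null  -- one atomic value per row
);

-- rows 1 and 2 from the question become:
insert into t_value values (1, 1), (1, 2), (1, 4), (2, 11), (2, 23);

select id from t_value where value = 1; -- no string parsing needed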