How can I delete custom SQL function calls in awk?

I've got a SQL query which looks like this:
SELECT test1(func1(MYFIELD)),
test2(MAX(MYFIELD), LOWER("NOPE")),
test3(MAX(MYFIELD), 1234),
AVG(test1(test2(MYFIELD, func1(4)))),
func2(UPPER("stack"))
SUBSTR(MYFIELD, 2, 4),
test2(MIN(MYFIELD), SUBSTR(LOWER(UPPER("NOPE")), 1, 7)),
SUBSTR('func1(', 2, 4)
FROM MYTABLE;
Then I'm trying to remove all calls to these functions:
test1
test2
test3
func1
func2
But preserving AVG, MAX, UPPER, SUBSTR... and all other native functions.
So the desired output would be:
SELECT MYFIELD,
MAX(MYFIELD),
MAX(MYFIELD),
AVG(MYFIELD),
UPPER("stack")
SUBSTR(MYFIELD, 2, 4),
MIN(MYFIELD)
SUBSTR('func1(', 2, 4)
FROM MYTABLE;
I want to remove the LOWER("NOPE") on the second line because it is an argument of one of the functions to delete (in this case test2, which takes two parameters). That is, when a function is deleted, only its first argument should be kept, and its remaining parameters should be deleted with it.
I've tried to do it this way in awk:
{
print gensub(/(test1|test2|test3|func1|func2)\(/, "", "g", $0);
}
But the output doesn't take the matching right parentheses into account, and it doesn't delete the rest of the parameters of the custom functions either:
SELECT MYFIELD)),
MAX(MYFIELD), LOWER("NOPE")),
MAX(MYFIELD), 1234),
AVG(MYFIELD, 4)))),
UPPER("stack"))
SUBSTR(MYFIELD, 2, 4),
MIN(MYFIELD), SUBSTR(LOWER(UPPER("NOPE")), 1, 7)),
SUBSTR('', 2, 4)
FROM MYTABLE;
Any idea or clue on how to handle this situation?

You could just rename the functions to the built-in function COALESCE while keeping the brackets ( ) and the other parameters of the user functions.
It will produce the same result - not syntactically, but it will behave the same unless an argument evaluates to NULL (COALESCE returns its first non-NULL argument). It is much easier to achieve because you don't have to worry about matching brackets.
If file is your input file, then:
sed 's#\(test1\|test2\|test3\|func1\|func2\)(#COALESCE(#g' file
will produce:
SELECT COALESCE(COALESCE(MYFIELD)),
COALESCE(MAX(MYFIELD), LOWER("NOPE")),
COALESCE(MAX(MYFIELD), 1234),
AVG(COALESCE(COALESCE(MYFIELD, COALESCE(4)))),
COALESCE(UPPER("stack"))
SUBSTR(MYFIELD, 2, 4),
COALESCE(MIN(MYFIELD), SUBSTR(LOWER(UPPER("NOPE")), 1, 7)),
SUBSTR('COALESCE(', 2, 4)
FROM MYTABLE;
Note the last SUBSTR line: the substitution also rewrites function names inside string literals such as 'func1(', which changes the result of that SUBSTR.
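If you prefer to stay in awk, a minimal GNU awk (gawk) equivalent of the same renaming trick would be:
awk '{ print gensub(/(test1|test2|test3|func1|func2)\(/, "COALESCE(", "g") }' file
It carries the same string-literal caveat as the sed version.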

Related

Writing Spark-SQL queries involving Complex types

I am new to Spark-SQL and trying to come to terms with it. We use Spark-SQL primarily for data transformations/manipulations in ETL. Recently, I have stumbled upon a few Spark-SQL functions for manipulating Spark complex types (array, map, struct, etc.), such as array(), arrays_zip(), struct(), map_concat(), map_from_arrays(), etc. I searched online but unfortunately could not find enough examples/explanations of them to get a clear understanding.
Can anyone please provide some examples of their application, so that I can understand them well enough to apply them in our project tasks?
Note: I only have access to Spark-SQL but Not PySpark-SQL.
Thanks
Please go through the explanation below.
array
array(expr, ...) - Returns an array with the given elements.
Examples:
SELECT array(1, 2, 3);
[1,2,3]
arrays_zip
arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Examples:
SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4));
[{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4));
[{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]
struct
struct(col1, col2, col3, ...) - Creates a struct with the given field values.
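Examples (output as in the Spark SQL docs; when no aliases are given, the default field names col1, col2, ... are used):
SELECT struct(1, 2, 3);
{"col1":1,"col2":2,"col3":3}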
map
map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs.
Examples:
SELECT map(1.0, '2', 3.0, '4');
{1.0:"2",3.0:"4"}
map_concat
map_concat(map, ...) - Returns the union of all the given maps
Examples:
SELECT map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'));
{1:"a",2:"c",3:"d"}
map_from_arrays
map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. All elements in keys should not be null
Examples:
SELECT map_from_arrays(array(1.0, 3.0), array('2', '4'));
{1.0:"2",3.0:"4"}
For more built-in Spark functions, please go through the link below:
https://spark.apache.org/docs/latest/api/sql/

Can I parse string to SQL code

I am trying to create a materialized view that requires slightly different filters between prod, dev, and qa.
We have a variables table that stores random ids, and I'm trying to find a way to store something like this in my variables table:
prod_filter_values = "(D.DEFID = 123 AND D.ATTRID IN (2, 3, 4)) OR
(D.DEFID = 3112 AND D.ATTRID IN (3, 30, 34, 23, 4)) OR
(D.DEFID = 379 AND D.ATTRID IN (3, 5, 8)) OR
(D.DEFID = 3076 AND D.ATTRID = 5);"
Then I'd do something like select * from variables_table where EVAL(prod_filter_values)
Is it possible?
Yes, you can, as other answers have explained. However, a better way would be to make this data-driven: simply create tables in your various environments that hold the corresponding magic numbers, and join to them as required.
A second way is to have different views for the different environments with the numbers hard-coded there.
Anything that avoids building strings is going to be better for several reasons, including having code in one place, stable code, no security/injection problems, and no parse overhead.
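A minimal sketch of the data-driven approach, assuming a source table aliased D with DEFID and ATTRID columns as in the question (the filter table name and its values are illustrative):
-- One filter table per environment, holding that environment's magic numbers
CREATE TABLE env_filter (defid NUMBER, attrid NUMBER);
INSERT INTO env_filter VALUES (123, 2);
INSERT INTO env_filter VALUES (123, 3);
INSERT INTO env_filter VALUES (3076, 5);

-- The view then just joins instead of evaluating a stored string
SELECT D.*
FROM some_table D
JOIN env_filter F ON F.defid = D.DEFID AND F.attrid = D.ATTRID;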
Yes. Look up dynamic SQL:
https://docs.oracle.com/cloud/latest/db112/LNPLS/dynamic.htm#LNPLS01102
something like this:
EXECUTE IMMEDIATE 'select * from vars_table where ' || prod_filter_values;
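Note that inside PL/SQL, a SELECT run through EXECUTE IMMEDIATE needs an INTO or BULK COLLECT INTO target. A minimal sketch, with illustrative names, assuming the filter string has already been fetched into a variable:
DECLARE
  -- Filter text as it would come from the variables table
  v_where VARCHAR2(4000) := '(D.DEFID = 123 AND D.ATTRID IN (2, 3, 4))';
  TYPE t_rows IS TABLE OF some_table%ROWTYPE;
  v_rows  t_rows;
BEGIN
  EXECUTE IMMEDIATE 'SELECT * FROM some_table D WHERE ' || v_where
    BULK COLLECT INTO v_rows;
END;
/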

Extracting Values from Array in Redshift SQL

I have some arrays stored in Redshift table "transactions" in the following format:
id, total, breakdown
1, 100, [50,50]
2, 200, [150,50]
3, 125, [15, 110]
...
n, 10000, [100,900]
Since this format is useless to me, I need to do some processing on this to get the values out. I've tried using regex to extract it.
SELECT regexp_substr(breakdown, '\[([0-9]+),([0-9]+)\]')
FROM transactions
but I get an error returned that says
Unmatched ( or \(
Detail:
-----------------------------------------------
error: Unmatched ( or \(
code: 8002
context: T_regexp_init
query: 8946413
location: funcs_expr.cpp:130
process: query3_40 [pid=17533]
--------------------------------------------
Ideally I would like to get x and y as their own columns so I can do the appropriate math. I know I can do this fairly easily in Python or PHP or the like, but I'm interested in a pure SQL solution - partially because I'm using an online SQL editor (Mode Analytics) to plot it easily as a dashboard.
Thanks for your help!
If breakdown really is an array you can do this:
select id, total, breakdown[1] as x, breakdown[2] as y
from transactions;
If breakdown is not an array but e.g. a varchar column, you can cast it into an array if you replace the square brackets with curly braces:
select id, total,
(translate(breakdown, '[]', '{}')::integer[])[1] as x,
(translate(breakdown, '[]', '{}')::integer[])[2] as y
from transactions;
You can try this:
SELECT REPLACE(SPLIT_PART(breakdown, ',', 1), '[', '') AS x,
       REPLACE(SPLIT_PART(breakdown, ',', 2), ']', '') AS y
FROM transactions;
I tried this with a Redshift DB and it worked for me.
Detailed Explanation:
SPLIT_PART(breakdown,',',1) will give you [50.
SPLIT_PART(breakdown,',',2) will give you 50].
REPLACE(SPLIT_PART(breakdown,',',1),'[','') will replace the [ and will give just 50.
REPLACE(SPLIT_PART(breakdown,',',2),']','') will replace the ] and will give just 50.
I know it's an old post, but if someone needs a much easier way:
select json_extract_array_element_text('[100,101,102]', 2);
output: 102
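Applied to the question's table, that gives both columns in one pass (a sketch, assuming breakdown is stored as a varchar; array positions are zero-based):
SELECT id,
       total,
       json_extract_array_element_text(breakdown, 0)::int AS x,
       json_extract_array_element_text(breakdown, 1)::int AS y
FROM transactions;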

SQL server query on json string for stats

I have a SQL Server database that holds contest participations. In the Participation table, I have various fields and a special one called ParticipationDetails. It's a varchar(MAX). This field is used to throw in all contest-specific data in JSON format. Example rows:
Id,ParticipationDetails
1,"{'Phone evening': '6546546541', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1951'}"
2,"{'Phone evening': '6546546542', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1952'}"
3,"{'Phone evening': '6546546543', 'Store': 'StoreXYZ', 'Math': '2', 'Age': '01/01/1953'}"
4,"{'Phone evening': '6546546544', 'Store': 'StoreABC', 'Math': '3', 'Age': '01/01/1954'}"
I'm trying to get a query running that will yield this result:
Store, Count
StoreABC, 3
StoreXYZ, 1
I used to run this query:
SELECT TOP (20) ParticipationDetails, COUNT(*) Count FROM Participation GROUP BY ParticipationDetails ORDER BY Count DESC
This works as long as I want unique ParticipationDetails. How can I change this to "sub-query" into my JSON strings? I've gotten to this query, but I'm kind of stuck here:
SELECT 'StoreABC' Store, Count(*) Count FROM Participation WHERE ParticipationDetails LIKE '%StoreABC%'
This query gets me the results I want for a specific store, but I want the store value to be "anything that was put in there".
Thanks for the help!
First of all, I suggest avoiding any JSON handling in T-SQL, since it is not natively supported. If you have an application layer, let it manage this kind of formatted data (e.g. the .NET Framework and non-MS frameworks have JSON serializers available).
However, you can convert your JSON strings using the function described in this link.
You can also write your own query which works with strings. Something like the following one:
SELECT
    T.Store,
    COUNT(*) AS [Count]
FROM
(
    SELECT
        STUFF(
            STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, ''),
            CHARINDEX('"Math"',
                STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')) - 3,
            LEN(STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')),
            '') AS Store
    FROM Participation
) AS T
GROUP BY T.Store
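For what it's worth, on SQL Server 2016 or later the built-in JSON_VALUE function makes this much simpler (a sketch, assuming ParticipationDetails holds valid JSON with double-quoted keys rather than the single quotes shown in the sample rows):
SELECT JSON_VALUE(ParticipationDetails, '$.Store') AS Store,
       COUNT(*) AS [Count]
FROM Participation
GROUP BY JSON_VALUE(ParticipationDetails, '$.Store')
ORDER BY [Count] DESC;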

Access/jet equivalent of Oracle's decode

Is there an equivalent of Oracle's decode() in Access (or Jet, for that matter)?
The problem I am facing: I need to sort (order) a result set based on a status and a date, with all records having status = 2 sorted at the end.
In Oracle I'd go something like
select
...
from
...
where
..
order by
decode(status, 2, 0, 1),
date_column
The closest analogy is the SWITCH() function, e.g.
Oracle:
SELECT supplier_name,
decode(supplier_id, 10000, 'IBM',
10001, 'Microsoft',
10002, 'Hewlett Packard',
'Gateway') result
FROM suppliers;
Access Database Engine
SELECT supplier_name,
SWITCH(supplier_id = 10000, 'IBM',
supplier_id = 10001, 'Microsoft',
supplier_id = 10002, 'Hewlett Packard',
TRUE, 'Gateway') AS result
FROM suppliers;
Note that with the SWITCH() function you have to supply the full predicate each time, so you are not restricted to using just supplier_id. For the default value, use a predicate that is obviously TRUE to the human reader, e.g. 1 = 1, or indeed simply TRUE :)
Something that may not be obvious is that the logic in the SWITCH() function doesn't short-circuit, meaning that every expression in the function must be able to be evaluated without error. If you require logic that short-circuits, you will need to use nested IIF() functions.
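For the ordering problem in the original question, a single IIF() is enough (a sketch, assuming a table and column names as in the question):
SELECT *
FROM some_table
ORDER BY IIF(status = 2, 0, 1), date_column;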
You can try with IIF. See this stackoverflow question.
I think it might compare to switch or choose.
Switch(expr-1, value-1[, expr-2, value-2 … [, expr-n,value-n]])
-- http://office.microsoft.com/en-us/access/HA012289181033.aspx
Choose(index, choice-1[, choice-2, ... [, choice-n]])
-- http://msdn.microsoft.com/en-us/library/aa262690%28VS.60%29.aspx
You can use the SWITCH function:
LABEL: Switch(
[TABLE_NAME]![COL_NAME]='VAL1';'NEW_VAL1';
[TABLE_NAME]![COL_NAME]='VAL2';'NEW_VAL2';
)
Note semicolons and not commas (Access uses the system list separator, which is a semicolon rather than a comma in some locales).
The example above works in queries in MS Access 2010.