Select all Valid starting letters with Sequelize - sql

I have a list of countries which will be separated by starting letter; for example, when you click on 'A' it will make an API call to return all the countries beginning with 'A'.
However, there are some letters that don't have any countries in our system, and these may change as we update our data.
I want to have a query that will let me know which letters do not have any countries that begin with them, so that I can disable them.
I could do this by running a findOne query for every single letter of the alphabet, but that is neither neat nor performant. Is there a way to get the data from a single query?

I am able to get the desired result by using a substring function within a distinct function.
const result = await Countries.findAll({
  attributes: [
    [
      sequelize.fn(
        'DISTINCT',
        sequelize.fn('substring', sequelize.col('countryName'), 1, 1),
      ),
      'letter',
    ],
  ],
  group: [sequelize.fn('substring', sequelize.col('countryName'), 1, 1)],
  raw: true,
})
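For reference, on Postgres this should produce SQL roughly like the following (table and column names are taken from the model above):
SELECT DISTINCT(substring("countryName", 1, 1)) AS "letter"
FROM "Countries"
GROUP BY substring("countryName", 1, 1);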

How to use a function in select along with all the records in Sequelize?

Here is a Sequelize query which retrieves a transformed value based on a table column value.
courses.findAll({
  attributes: [[sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']]
});
The above Sequelize query will return the same result as the following SQL query:
select to_char(bs.session_date, 'Day') as days from courses bs;
Expected output:
I want the transformed value that is in attributes along with all the other columns of each record, like below. I know we can list all the column names in the attributes array, but that is tedious. Is there a shortcut similar to the asterisk in an SQL query?
select to_char(bs.session_date, 'Day') as days,* from courses bs;
I tried the Sequelize query below, but no luck.
courses.findAll({
  attributes: [[sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days'], '*']
});
The attributes option can be passed an object as well as an array of fields for finer tuning in situations like this. It's briefly addressed in the documentation.
courses.findAll({
  attributes: {
    include: [
      [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']
    ]
  }
});
By using include we're adding fields to the courses.* selection. Likewise, we can also pass an exclude parameter in the attributes object, which will remove fields from the courses.* selection.
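For illustration, assuming courses has (hypothetical) columns id, name and session_date, the include form above should generate SQL roughly like:
SELECT "id", "name", "session_date", to_char("session_date", 'Day') AS "days"
FROM "courses";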
There is one shortcut to achieve the asterisk kind of selection in Sequelize, which can be done as follows:
// To get all the column names in an array
let attributes = Object.keys(yourModel.rawAttributes);
courses.findAll({
  attributes: [
    ...attributes,
    [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']
  ]
});
This is a workaround; there may be a different option.

Using Athena to get terminatingrule from rulegrouplist in AWS WAF logs

I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  rulegrouplist
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
LIMIT 100;
My issue is cleanly identifying the "terminatingrule" - the reason the request was blocked. As an example, a result has
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
And
rulegrouplist = [
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
    "terminatingrule": {
      "rulematchdetails": "null",
      "action": "BLOCK",
      "ruleid": "NoUserAgent_HEADER"
    },
    "excludedrules": "null"
  }
]
The piece of data I would like separated into a column is rulegrouplist[terminatingrule].ruleid, which has a value of NoUserAgent_HEADER.
AWS provides useful information on querying nested Athena arrays, but I have been unable to get the result I want.
I have framed this as an AWS question but since Athena uses SQL queries, it's likely that anyone with good SQL skills could work this out.
It's not entirely clear to me exactly what you want, but I'm going to assume you are after the array element where terminatingrule is not "null" (I will also assume that if there are multiple you want the first).
The documentation you link to says that the type of the rulegrouplist column is array<string>. The reason why it is string and not a complex type is that there seem to be multiple different schemas for this column, one example being that the terminatingrule property is either the string "null" or a struct/object – something that can't be described using Athena's type system.
This is not a problem, however. When dealing with JSON there's a whole set of JSON functions that can be used. Here's one way to use json_extract combined with filter and element_at to remove array elements where the terminatingrule property is the string "null" and then pick the first of the remaining elements:
SELECT
  element_at(
    filter(
      rulegrouplist,
      rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
    ),
    1
  ) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
You say you want the "latest", which to me is ambiguous and could mean either the first or the last non-null element. The query above will return the first non-null element; if you want the last you can change the second argument to element_at to -1 (Athena's array indexing starts from 1, and -1 counts from the end).
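In other words, only the index argument changes; the same expression picking the last matching element would look like this:
element_at(
  filter(
    rulegrouplist,
    rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
  ),
  -1
) AS last_non_null_terminatingrule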
To return the individual ruleid element of the JSON:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  json_extract(element_at(filter(rulegrouplist, rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)), 1), '$.terminatingrule.ruleid') AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
I had the same issue but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked to in the original post.
Here is what worked for me, which is basically the same as Theo's solution, but without the json conversion:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  rulegrouplist,
  element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL), 1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
LIMIT 100;

postgres jsonb update key value in array

I have a table with a jsonb column with data from one row like
[
  {
    "a": [],
    "c_id": 624,
    "ps": [{"": 0, "pr": "73", "f": "M", "s": "M"}],
    "g_n": "K L Mish",
    "g_num": 1
  },
  {
    "a": [],
    "c_id": 719,
    "ps": [{"": 0, "pr": "65433", "f": "R", "s": "W"}],
    "g_n": "S H Star",
    "g_num": 2
  }
]
I want to update c_id from 719 to 720 wherever it occurs in the table.
How can I do it?
I am using Postgres 12.1
If it is only one single occurrence, you could do it using a Regular Expression:
Click: demo:db<>fiddle
UPDATE mytable
SET mydata = s.result::jsonb
FROM (
  SELECT
    regexp_replace(mydata::text, '(.*)("c_id"\s*:\s*)(719)(.*)', '\1\2720\4') AS result
  FROM
    mytable
) s;
RegExp Groups:
(.*) All characters before the relevant key
("c_id"\s*:\s*) The relevant key incl. possible spaces
(719) The relevant value to be replaced
(.*) Everything after the relevant point
With \1\2720\4 you put the first two groups together, followed by the new value (instead of group 3) and the fourth group.
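As a standalone illustration of how the groups and the replacement string behave (using a shortened, hypothetical JSON value):
SELECT regexp_replace(
  '[{"c_id": 624}, {"c_id": 719}]',
  '(.*)("c_id"\s*:\s*)(719)(.*)',
  '\1\2720\4'
);
-- returns: [{"c_id": 624}, {"c_id": 720}]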
Disclaimer:
I fully agree with @a_horse_with_no_name: you should think about storing the values in separate, normalized tables/columns. You would gain a lot of benefits (much better search and update handling, indexing, performance, ...). If you need this JSON output, just handle it as output: generate it when you need it, do not store it. Maybe a view could help a lot.

Get JSON_VALUE with Oracle SQL when multiple nodes share the same name

I have an issue where I have some JSON stored in my oracle database, and I need to extract values from it.
The problem is, there are some fields that are duplicated.
When I try this, it works as there is only one firstname key in the options array:
SELECT
JSON_VALUE('{"increment_id":"2500000043","item_id":"845768","options":[{"firstname":"Kevin"},{"lastname":"Test"}]}', '$.options.firstname') AS value
FROM DUAL;
Which returns 'Kevin'.
However, when there are two values for the firstname field:
SELECT JSON_VALUE('{"increment_id":"2500000043","item_id":"845768","options":[{"firstname":"Kevin"},{"firstname":"Okay"},{"lastname":"Test"}]}', '$.options.firstname') AS value
FROM DUAL;
It only returns NULL.
Is there any way to select the first occurrence of 'firstname' in this context?
JSON_VALUE returns one SQL value from the JSON data (or SQL NULL if the key does not exist).
If you have a collection of values (a JSON array) and you want one specific item of the array, you use array subscripts (square brackets) as in JavaScript, for example [2] to select the third item; [0] selects the first item.
To get the first array item in your example, you have to change the path expression from '$.options.firstname' to '$.options[0].firstname'.
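Applied to the query from the question, that path change looks like this:
SELECT
  JSON_VALUE('{"increment_id":"2500000043","item_id":"845768","options":[{"firstname":"Kevin"},{"firstname":"Okay"},{"lastname":"Test"}]}', '$.options[0].firstname') AS value
FROM DUAL;
which should return 'Kevin'.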
You can use this query:
SELECT JSON_VALUE('{
"increment_id": "2500000043",
"item_id": "845768",
"options": [
{
"firstname": "Kevin"
},
{
"firstname": "Okay"
},
{
"lastname": "Test"
}
]
}', '$.options[0].firstname') AS value
FROM DUAL;

bigquery joins on nested repeated

I am having trouble joining on a repeated nested field while still preserving the original row structure in BigQuery.
For my example I'll call the two tables being joined A and B.
Records in table A look something like:
{
  "url": "some url",
  "repeated_nested": [
    {"key": "some key", "property": "some property"}
  ]
}
and records in table B look something like:
{
  "key": "some key",
  "property2": "another property"
}
I am hoping to find a way to join this data together to generate a row that looks like:
{
  "url": "some url",
  "repeated_nested": [
    {
      "key": "some key",
      "property": "some property",
      "property2": "another property"
    }
  ]
}
The very first query I tried was:
SELECT
  url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM A AS lefttable
LEFT OUTER JOIN B AS righttable
ON lefttable.key = righttable.key
This doesn't work because BQ can't join on repeated nested fields. There is not a unique identifier for each row. If I were to do a FLATTEN on repeated_nested then I'm not sure how to get the original row put back together correctly.
The data is such that a url will always have the same repeated_nested field with it. Because of that, I was able to make a workaround using a UDF to sort of roll up this repeated nested object into a JSON string and then unroll it again:
SELECT url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM
JS(
(
SELECT basetable.url as url, repeated_nested
FROM A as basetable
LEFT JOIN (
SELECT url, CONCAT("[", GROUP_CONCAT_UNQUOTED(repeated_nested_json, ","), "]") as repeated_nested
FROM
(
SELECT
url,
CONCAT(
'{"key": "', repeated_nested.key, '",',
' "property": "', repeated_nested.property, '",',
' "property2": "', mapping_table.property2, '"',
'}'
)
) as repeated_nested_json
FROM (
SELECT
url, repeated_nested.key, repeated_nested.property
FROM A
GROUP BY url, repeated_nested.key, repeated_nested.property
) as urltable
LEFT OUTER JOIN [SDF.alchemy_to_ric]
AS mapping_table
ON urltable.repeated_nested.key=mapping_table.key
)
GROUP BY url
) as companytable
ON basetable.url = urltable.url
),
// input columns:
url, repeated_nested_json,
// output schema:
"[{'name': 'url', 'type': 'string'},
{'name': 'repeated_nested_json', 'type': 'RECORD', 'mode':'REPEATED', 'fields':
[ { 'name': 'key', 'type':'string' },
{ 'name': 'property', 'type':'string' },
{ 'name': 'property2', 'type':'string' }]
}]",
// UDF:
"function(row, emit) {
parsed_repeated_nested = [];
try {
if ( row.repeated_nested_json != null ) {
parsed_repeated_nested = JSON.parse(row.repeated_nested_json);
}
} catch (ex) { }
emit({
url: row.url,
repeated_nested: parsed_repeated_nested
});
}"
)
This solution works fine for small tables. But the real-life tables I'm working with have many more columns than in my example above. When there are other fields in addition to url and repeated_nested_json, they all have to be passed through the UDF. When I work with tables in the 50 GB range everything is fine, but when I apply the UDF and query to tables that are 500-1000 GB, I get an Internal Server Error from BQ.
In the end I just need all of the data in newline-delimited JSON format in GCS. As a last-ditch effort I tried concatenating all of the fields into a JSON string (so that I only had one column) in the hope that I could export it as CSV and have what I need. However, the export process escapes the double quotes and adds double quotes around the JSON string. According to the BQ docs on jobs (https://cloud.google.com/bigquery/docs/reference/v2/jobs) there is a property configuration.query.tableDefinitions.(key).csvOptions.quote that could help me, but I can't figure out how to make it work.
Does anybody have advice on how they have dealt with this sort of situation?
I have never had to do this, but you should be able to use flatten, then join, then use nest to get repeated fields again.
The docs state that BigQuery always flattens query results, but that appears to be false: you can choose to not have results flattened if you set a destination table. You should then be able to export that table as JSON to Storage.
See also this answer for how to get nest to work.
@AndrewBackes - we rolled out some fixes for UDF memory-related issues this week; there are some details on the root cause here https://stackoverflow.com/a/36275462/5265394 and here https://stackoverflow.com/a/35563562/5265394.
The UDF version of your query is now working for me; could you verify on your side?