Database independent grouped count distinct "days ago" in Rails 6

I want to find out in how many distinct 24-hour intervals a word occurred (the actual application is different and more complicated, but this is equivalent and needs no context).
I could do that with an SQL query like this in PostgreSQL:
SELECT
word,
COUNT(DISTINCT FLOOR(EXTRACT(EPOCH FROM AGE(created_at)) / 86400)) AS num_distinct_days_with_occurrence
FROM word_occurrences
GROUP BY word
I even managed to write Rails code that generates this query:
WordOccurrence.select(:word).group(:word).distinct.count('FLOOR(EXTRACT(EPOCH FROM AGE(created_at)) / 86400)')
However, the same code doesn't work in SQLite3.
With some time I could probably find a way to make it work in SQLite3 as well, but I can't figure out how to get both to work. I saw that the count API had more features in past Rails versions, but I think even that wouldn't be enough, given that my main problem is timestamp handling, not counting. Also, I couldn't figure out where these features went in Rails 6.
Is there a way to make this work for both adapters? If not, what is the best way to handle such snippets of database-dependent code and choose the right one?
Example data:
[
{ word: 'a', created_at: '2020-01-10 22:30' },
{ word: 'a', created_at: '2020-01-10 23:30' },
{ word: 'a', created_at: '2020-01-11 22:30' },
{ word: 'b', created_at: '2020-01-10 22:30' }
]
If I query this on 2020-04-08, I should get this result:
[
{ word: 'a', num_distinct_days_with_occurrence: 2 },
{ word: 'b', num_distinct_days_with_occurrence: 1 }
]
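A minimal sketch of one way to handle adapter-specific fragments, written in Python with the standard sqlite3 module rather than in Rails: keep one SQL fragment per adapter in a lookup table and pick it by adapter name. The names DAY_BUCKET_SQL and distinct_day_counts are made up for illustration. The SQLite fragment is an assumption that takes the reference time as a bound parameter; it is not a verified drop-in equivalent of the PostgreSQL expression. Only the SQLite branch is executed here, against the example data above.

```python
import sqlite3

# One SQL fragment per adapter (hypothetical lookup table).
# "postgresql" is the expression from the question; "sqlite" is an assumed
# counterpart that buckets by whole 24-hour intervals before :now.
DAY_BUCKET_SQL = {
    "postgresql": "FLOOR(EXTRACT(EPOCH FROM AGE(created_at)) / 86400)",
    "sqlite": (
        "(CAST(strftime('%s', :now) AS INTEGER)"
        " - CAST(strftime('%s', created_at) AS INTEGER)) / 86400"
    ),
}


def distinct_day_counts(conn, adapter, now):
    """Return {word: number of distinct 24-hour buckets} using the adapter's fragment."""
    bucket = DAY_BUCKET_SQL[adapter]
    query = (
        f"SELECT word, COUNT(DISTINCT {bucket}) "
        "FROM word_occurrences GROUP BY word ORDER BY word"
    )
    return dict(conn.execute(query, {"now": now}).fetchall())


# Load the example data from the question into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_occurrences (word TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO word_occurrences VALUES (?, ?)",
    [
        ("a", "2020-01-10 22:30"),
        ("a", "2020-01-10 23:30"),
        ("a", "2020-01-11 22:30"),
        ("b", "2020-01-10 22:30"),
    ],
)

result = distinct_day_counts(conn, "sqlite", "2020-04-08 00:00")
print(result)  # {'a': 2, 'b': 1}
```

In Rails, the lookup key could come from ActiveRecord::Base.connection.adapter_name; that wiring is not shown here.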

Related

How to use a function in select along with all the records in Sequelize?

Below is a Sequelize query that retrieves a transformed value based on a table column.
courses.findAll({
attributes: [ [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']]
});
The above Sequelize query returns the same result as the following SQL query.
select to_char(bs.session_date, 'Day') as days from courses bs;
Expected output:
I want the transformed value from attributes returned along with all the other columns, as below. I know we can list every column name in the attributes array, but that is tedious. Is there a shortcut similar to the asterisk in an SQL query?
select to_char(bs.session_date, 'Day') as days,* from courses bs;
I tried the Sequelize query below, but no luck.
courses.findAll({
attributes: [ [sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days'],'*']
});
The attributes option can be passed an object as well as an array of fields for finer tuning in situations like this. It's briefly addressed in the documentation.
courses.findAll({
attributes: {
include: [
[ sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days' ]
]
}
});
By using include we're adding fields to the courses.* selection. Likewise we can also include an exclude parameter in the attributes object which will remove fields from the courses.* selection.
There is also a shortcut to achieve an asterisk-style selection in Sequelize, as follows:
// Get all the column names of the model in an array
let attributes = Object.keys(courses.rawAttributes);
courses.findAll({
attributes: [...attributes ,
[sequelize.fn('to_char', sequelize.col('session_date'), 'Day'), 'days']]
});
This is a workaround; there may be a better option.

SQL: loop through array entry to find correct value

I have a table with simple & complex entries:
id | formats | ...
1 [array]
2 [array]
...
I can select the rows I want based on some other columns.
The [array] is a list of complex entries
formats:
[{
format: "blah1"
hash_key: "hash_key1"
},
{
format: "blah2"
hash_key: "hash_key2"
},{
format: "correct"
hash_key: "hash_key3"
},
...
]
I need to loop through the list of formats and, if format == "correct", select the hash_key.
So I will return all of my rows with:
id1, hashkey
id2, hashkey
...
I don't know how this can be done in SQL. This would be easy with a while loop in C++ or Python, but I need to do it in SQL here.
I need to do this in Spanner SQL, in case that matters. I can try any standard SQL answers.
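To pin down the intended result, here is the loop described above as a minimal Python sketch. It is not a Spanner SQL answer. The second row and its hash keys are made-up sample values, and correct_hash_key is a hypothetical helper name.

```python
# Sample rows shaped like the table in the question.
# Row 2 and its hash keys are invented for illustration.
rows = [
    {
        "id": 1,
        "formats": [
            {"format": "blah1", "hash_key": "hash_key1"},
            {"format": "blah2", "hash_key": "hash_key2"},
            {"format": "correct", "hash_key": "hash_key3"},
        ],
    },
    {
        "id": 2,
        "formats": [
            {"format": "blah1", "hash_key": "hash_key4"},
            {"format": "correct", "hash_key": "hash_key5"},
        ],
    },
]


def correct_hash_key(formats):
    """Return the hash_key of the first entry whose format is "correct", else None."""
    return next((f["hash_key"] for f in formats if f["format"] == "correct"), None)


# One (id, hash_key) pair per row, which is the output the question asks for.
result = [(row["id"], correct_hash_key(row["formats"])) for row in rows]
print(result)
```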

Using Athena to get terminatingrule from rulegrouplist in AWS WAF logs

I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;
My issue is cleanly identifying the "terminatingrule" - the reason the request was blocked. As an example, a result has
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
And
rulegrouplist = [
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
"terminatingrule": {
"rulematchdetails": "null",
"action": "BLOCK",
"ruleid": "NoUserAgent_HEADER"
},
"excludedrules":"null"
}
]
The piece of data I would like separated into a column is rulegrouplist[terminatingrule].ruleid which has a value of NoUserAgent_HEADER
AWS provide useful information on querying nested Athena arrays, but I have been unable to get the result I want.
I have framed this as an AWS question but since Athena uses SQL queries, it's likely that anyone with good SQL skills could work this out.
It's not entirely clear to me exactly what you want, but I'm going to assume you are after the array element where terminatingrule is not "null" (I will also assume that if there are multiple you want the first).
The documentation you link to says that the type of the rulegrouplist column is array<string>. The reason it is a string and not a complex type is that there seem to be multiple different schemas for this column, one example being that the terminatingrule property is either the string "null" or a struct/object, something that can't be described using Athena's type system.
This is not a problem, however. When dealing with JSON there's a whole set of JSON functions that can be used. Here's one way to use json_extract combined with filter and element_at to remove array elements where the terminatingrule property is the string "null" and then pick the first of the remaining elements:
SELECT
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
You say you want the "latest", which to me is ambiguous and could mean both first non-null and last non-null element. The query above will return the first non-null element, and if you want the last you can change the second argument to element_at to -1 (Athena's array indexing starts from 1, and -1 is counting from the end).
To return the individual ruleid element of the json:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
json_extract(
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
),
'$.terminatingrule.ruleid'
) AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
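For readers less familiar with the Athena functions, the same filter, pick and extract steps can be sketched in plain Python. This only mirrors the logic of the query on two of the sample elements from the question; terminating_rule_id is a hypothetical helper name, and the elements are modelled as JSON strings because the column type is array<string>.

```python
import json

# Two elements of rulegrouplist from the question, each stored as a JSON string.
rulegrouplist = [
    json.dumps({
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
        "terminatingrule": "null",
        "excludedrules": "null",
    }),
    json.dumps({
        "nonterminatingmatchingrules": [],
        "rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
        "terminatingrule": {
            "rulematchdetails": "null",
            "action": "BLOCK",
            "ruleid": "NoUserAgent_HEADER",
        },
        "excludedrules": "null",
    }),
]


def terminating_rule_id(groups, position=0):
    """Mimic filter + element_at + json_extract; position=-1 picks the last match."""
    parsed = [json.loads(group) for group in groups]
    # Keep only the groups whose terminatingrule is not the string "null".
    matches = [g for g in parsed if g["terminatingrule"] != "null"]
    if not matches:
        return None
    return matches[position]["terminatingrule"]["ruleid"]


rule_id = terminating_rule_id(rulegrouplist)
print(rule_id)  # NoUserAgent_HEADER
```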
I had the same issue but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked to in the original post.
Here is what worked for me, which is basically the same as Theo's solution, but without the json conversion:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist,
element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;

Select all Valid starting letters with Sequelize

I have a list of countries which will be separated by starting letter so for example when you click on 'A' it will make an API call to return all the countries beginning with 'A'.
However, there are some letters that don't have any countries in our system, and these may change as we update our data.
I want to have a query that will let me know which letters do not have any countries that begin with them, so that I can disable them.
I can do this by running a findOne query for every single letter in the alphabet, but that is neither neat nor performant. Is there a way to get the data from a single query?
I am able to get the desired result by using a substring function within a distinct function.
const result = await Countries.findAll({
attributes: [
[
sequelize.fn(
'DISTINCT',
sequelize.fn('substring', sequelize.col('countryName'), 1, 1),
),
'letter',
],
],
group: [sequelize.fn('substring', sequelize.col('countryName'), 1, 1)],
raw: true,
})
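The query above returns the letters that are in use; the letters to disable are the rest of the alphabet. A minimal sketch of that last step, using Python's sqlite3 module in place of Sequelize: the table name and the sample countries are assumptions for illustration, and substr/upper are the SQLite spellings of the functions.

```python
import sqlite3
import string

# In-memory stand-in for the countries table; the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (countryName TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?)",
    [("Austria",), ("Albania",), ("Brazil",), ("Chile",)],
)

# One query for all first letters that occur at least once.
rows = conn.execute(
    "SELECT DISTINCT upper(substr(countryName, 1, 1)) AS letter "
    "FROM countries ORDER BY letter"
).fetchall()
present = {row[0] for row in rows}

# Everything else in the alphabet has no country and can be disabled.
missing = [letter for letter in string.ascii_uppercase if letter not in present]
print(sorted(present), missing[:3])
```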

django order by date in datetime / extract date from datetime

I have a model with a datetime field and I want to show the most viewed entries for the day today.
I thought I might try something like dt_published__date to extract the date from the datetime field but obviously it didn't work.
popular = Entry.objects.filter(type='A', is_public=True).order_by('-dt_published__date', '-views', '-dt_written', 'headline')[0:5]
How can I do this?
AFAIK the __date syntax is not supported yet by Django. There is a ticket open for this.
If your database has a function to extract the date part, then you can do this:
popular = Entry.objects.filter(**conditions).extra(
select={'custom_dt': 'to_date(dt_published)'}
).order_by('-custom_dt')
In recent Django versions this works out of the box (tested on 3.2 with MySQL 5.7).
Dataset
[
{ "id": 82148, "paid_date": "2019-09-30 20:51:11"},
{ "id": 82315, "paid_date": "2019-09-30 00:00:00"},
]
Query
Payment.objects.filter(order_id=135342).order_by('paid_date__date', 'id').values_list('id', 'paid_date__date')
Results
`<QuerySet [(82148, datetime.date(2019, 9, 30)), (82315, datetime.date(2019, 9, 30))]>`
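To see why ordering by the date part matters for this dataset, here is a small plain-Python sketch without Django that compares ordering by the full datetime with ordering by the date only. It just mirrors the effect of paid_date__date on the two sample rows; the variable names are made up.

```python
from datetime import datetime

# The two rows from the dataset above.
payments = [
    {"id": 82315, "paid_date": datetime(2019, 9, 30, 0, 0, 0)},
    {"id": 82148, "paid_date": datetime(2019, 9, 30, 20, 51, 11)},
]

# Ordering by the full datetime: the midnight row comes first.
by_datetime = [
    p["id"] for p in sorted(payments, key=lambda p: (p["paid_date"], p["id"]))
]

# Ordering by the date part only: both rows tie on the date, so id decides.
by_date = [
    p["id"] for p in sorted(payments, key=lambda p: (p["paid_date"].date(), p["id"]))
]

print(by_datetime)  # [82315, 82148]
print(by_date)      # [82148, 82315]
```

The second ordering matches the QuerySet result shown above.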