How to get the part of a string before the last delimiter in AWS Athena

Suppose I have the following table in AWS Athena
+----------------+
| Thread |
+----------------+
| poll-23 |
| poll-34 |
| pool-thread-24 |
| spartan.error |
+----------------+
I need to extract the part of each string in the column that comes before the last delimiter (here the delimiter is '-').
Basically, I need a query that gives me this output:
+----------------+
| Thread |
+----------------+
| poll |
| poll |
| pool-thread |
| spartan.error |
+----------------+
I also need a GROUP BY query that can generate this:
+---------------+-------+
| Thread | Count |
+---------------+-------+
| poll | 2 |
| pool-thread | 1 |
| spartan.error | 1 |
+---------------+-------+
I tried various forms of MySQL queries using the LEFT(), RIGHT(), LOCATE(), and SUBSTRING_INDEX() functions, but it seems that Athena does not support all of them.

You could use regexp_replace() to remove the part of the string that follows the last '-':
select regexp_replace(thread, '-[^-]*$', ''), count(*)
from mytable
group by regexp_replace(thread, '-[^-]*$', '')
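To check the pattern without an Athena account, the query can be run against the sample rows in an in-memory SQLite database from Python. This is only a sketch: SQLite has no regexp_replace(), so the code registers a stand-in built on Python's re.sub, and Python's regex engine is assumed to treat this simple pattern the same way Athena does. The prefix alias is added for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: stand-in for Athena's regexp_replace(), backed by Python's re.sub.
conn.create_function(
    "regexp_replace", 3, lambda s, pattern, repl: re.sub(pattern, repl, s)
)

conn.execute("create table mytable (thread text)")
conn.executemany(
    "insert into mytable values (?)",
    [("poll-23",), ("poll-34",), ("pool-thread-24",), ("spartan.error",)],
)

# '-[^-]*$' matches the last '-' and everything after it; rows without '-'
# are left unchanged because the pattern does not match.
rows = conn.execute(
    """
    select regexp_replace(thread, '-[^-]*$', '') as prefix, count(*) as cnt
    from mytable
    group by regexp_replace(thread, '-[^-]*$', '')
    order by prefix
    """
).fetchall()

for prefix, cnt in rows:
    print(prefix, cnt)
```

The value spartan.error passes through untouched because it contains no '-', which matches the expected output in the question.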

Related

PostgreSQL query: subtract counts within one table

I have one table in PostgreSQL and cannot work out how to build a query.
The table contains the columns nr_serii and deleting_time. I am trying to count the rows for each nr_serii and subtract the rows that have a deleting_time.
My query:
select nr_serii , count(nr_serii ) as ilosc,count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
The part of table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
I need to subtract the ilosc_delete column from the ilosc column, for example:
nr_serii: 333333, ilosc: 3 - 1 = 2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think there is a very simple solution for this, but I am drawing a blank.
I see what you want now. You want to subtract the number where deleting_time is not null from the ones where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.
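The arithmetic can be checked against the sample rows with a small Python sketch using an in-memory SQLite database. Two assumptions: the empty deleting_time cells in the raw data are NULL, and the FILTER clause is written as an equivalent CASE expression so the query does not depend on the SQLite version. The diff alias is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (nr_serii text, deleting_time text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [
        ("666666", "2020-11-20 14:08:13"),
        ("456456", None),
        ("333333", None),
        ("333333", None),
        ("333333", None),
        ("333333", "2020-11-20 14:02:23"),
        ("111111", None),
        ("111111", None),
        ("111111", "2020-11-20 14:08:04"),
        ("111111", None),
    ],
)

# sum(case ...) counts the rows where deleting_time is NULL;
# count(deleting_time) counts the rows where it is not NULL.
rows = conn.execute(
    """
    select nr_serii,
           sum(case when deleting_time is null then 1 else 0 end)
             - count(deleting_time) as diff
    from mytable
    group by nr_serii
    order by nr_serii desc
    """
).fetchall()

for nr_serii, diff in rows:
    print(nr_serii, diff)
```

Grouping by nr_serii alone, rather than by nr_serii and deleting_time as in the question's query, is what collapses each serial number into a single row.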

SQL - Given sequence of data, how do I query the origin?

Let's assume we have the following data.
| UUID | SEENTIME | LAST_SEENTIME |
------------------------------------------------------
| UUID1 | 2020-11-10T05:00:00 | |
| UUID2 | 2020-11-10T05:01:00 | 2020-11-10T05:00:00 |
| UUID3 | 2020-11-10T05:03:00 | 2020-11-10T05:01:00 |
| UUID4 | 2020-11-10T05:04:00 | 2020-11-10T05:03:00 |
| UUID5 | 2020-11-10T05:07:00 | 2020-11-10T05:04:00 |
| UUID6 | 2020-11-10T05:08:00 | 2020-11-10T05:07:00 |
Each row is linked to the previous one via LAST_SEENTIME.
In that case, is there a way to use SQL to identify these connected rows as one event? I want to find the start and end so that I can calculate the duration of the event.
You can use a recursive CTE. The exact syntax varies by database, but something like this:
with recursive cte as (
    select uuid as orig_uuid, uuid, seentime
    from t
    where last_seentime is null
    union all
    select cte.orig_uuid, t.uuid, t.seentime
    from cte
    join t on cte.seentime = t.last_seentime
)
select orig_uuid,
       max(seentime) - min(seentime) -- or whatever your database uses
from cte
group by orig_uuid;
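As a runnable sketch of the recursive CTE, the sample rows can be loaded into an in-memory SQLite database from Python. The assumptions here are that timestamps are stored as ISO-8601 text, that the table is named t as in the answer, and that the duration is computed with SQLite's strftime('%s', ...), which is specific to SQLite and stands in for "whatever your database uses".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (uuid text, seentime text, last_seentime text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("UUID1", "2020-11-10T05:00:00", None),
        ("UUID2", "2020-11-10T05:01:00", "2020-11-10T05:00:00"),
        ("UUID3", "2020-11-10T05:03:00", "2020-11-10T05:01:00"),
        ("UUID4", "2020-11-10T05:04:00", "2020-11-10T05:03:00"),
        ("UUID5", "2020-11-10T05:07:00", "2020-11-10T05:04:00"),
        ("UUID6", "2020-11-10T05:08:00", "2020-11-10T05:07:00"),
    ],
)

rows = conn.execute(
    """
    with recursive cte as (
        -- anchor: rows with no predecessor start a chain
        select uuid as orig_uuid, uuid, seentime
        from t
        where last_seentime is null
        union all
        -- step: follow the link from the previous row's seentime
        select cte.orig_uuid, t.uuid, t.seentime
        from cte
        join t on cte.seentime = t.last_seentime
    )
    select orig_uuid,
           min(seentime) as started,
           max(seentime) as ended,
           cast(strftime('%s', max(seentime)) as integer)
             - cast(strftime('%s', min(seentime)) as integer) as duration_seconds
    from cte
    group by orig_uuid
    """
).fetchall()

for row in rows:
    print(row)
```

Every row in a chain carries the UUID of its first row as orig_uuid, so grouping by that column yields one result row per connected event.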

Query from secondary index on Aerospike

I'm considering Aerospike for one of our projects, so I created a 3-node cluster and loaded some data onto it.
Sample data
ns: imei
set: imei_data
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| imsi | fcheck | lcheck | msc | fcheck_epoch | lcheck_epoch |
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| "413010324064956" | "2017-03-01 14:30:26" | "2017-03-01 14:35:30" | "13d20b080011044917004100" | 1488358826 | 1488359130 |
| "413012628090023" | "2016-09-21 10:06:49" | "2017-09-16 13:54:40" | "13dc0b080011044917006100" | 1474432609 | 1505550280 |
| "413010130130320" | "2016-12-29 22:05:07" | "2017-10-09 16:17:10" | "13d20b080011044917003100" | 1483029307 | 1507546030 |
| "413011330114274" | "2016-09-06 01:48:06" | "2017-10-09 11:53:41" | "13d20b080011044917003100" | 1473106686 | 1507530221 |
| "413012629781993" | "2017-08-16 16:03:01" | "2017-09-13 18:10:48" | "13dc0b080011044917004100" | 1502879581 | 1505306448 |
Then I created a secondary index on lcheck_epoch using AQL since I want to query based on date.
create index idx_lcheck on imei.imei_data (lcheck_epoch) NUMERIC
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| ns | bin | indextype | set | state | indexname | path | type |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| "imei" | "lcheck_epoch" | "NONE" | "imei_data" | "RW" | "idx_lcheck" | "lcheck_epoch" | "NUMERIC" |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
When I execute
select imsi from imei.imei_data where idx_lcheck=1476165806
I'm getting
Error: (204) AEROSPIKE_ERR_INDEX
Please explain.
You're using the index name, not the bin name, in your query. Try this:
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch=1476165806
Or
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch BETWEEN 1490000000 AND 1510000000
Note that you can run much more complex queries using predicate filtering, which is available through several of the language clients (Java, C, C#, Go); see, for example, the PredExp class of the Java client.

HiveQl: extract based on a string

I have the following table:
ID | Keyword | Date
87NB | skill,love,hate,funny,very funny | 02/19/2004
27YV | funny,tiger,movie,king | 08/10/2014
92JK | sun,light,funny,baby | 06/27/2015
65TH | moon,cow,bird,car | 04/22/2017
From the above table, I want to obtain the IDs of all rows that have "funny" as a keyword. The result would be:
ID
87NB
27YV
92JK
You can use split() and then array_contains():
select ID from yourtable where array_contains(split(Keyword, ","), "funny");
Alternatively, find_in_set() works directly on the comma-separated string:
select ID
from t
where find_in_set('funny', Keyword) > 0;
+------+
| id |
+------+
| 87NB |
+------+
| 27YV |
+------+
| 92JK |
+------+
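Both queries match whole list elements rather than substrings. The logic behind array_contains(split(...)) can be modelled in a few lines of Python; this is only an illustration of the matching rule, not Hive code, and has_keyword is a hypothetical helper name.

```python
rows = [
    ("87NB", "skill,love,hate,funny,very funny"),
    ("27YV", "funny,tiger,movie,king"),
    ("92JK", "sun,light,funny,baby"),
    ("65TH", "moon,cow,bird,car"),
]


def has_keyword(keywords: str, wanted: str) -> bool:
    # Split on commas and test for an exact element match,
    # mirroring array_contains(split(Keyword, ','), wanted).
    return wanted in keywords.split(",")


matching_ids = [row_id for row_id, keywords in rows if has_keyword(keywords, "funny")]
print(matching_ids)

# An element such as "very funny" does not count as "funny" on its own.
print(has_keyword("very funny,tiger", "funny"))
```

Row 87NB qualifies because it contains the separate element funny, not because of the element very funny.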

Grouped string aggregation / LISTAGG for SQL Server

I'm sure this has been asked but I can't quite find the right search terms.
Given a schema like this:
CarMakes:
| CarMakeID | CarMake   |
-------------------------
| 1         | SuperCars |
| 2         | MehCars   |
CarModels:
| CarModelID | CarMakeID | CarModel      |
------------------------------------------
| 1          | 1         | Zoom          |
| 2          | 1         | Wow           |
| 3          | 1         | Awesome       |
| 4          | 2         | Mediocrity    |
| 5          | 2         | YoureSettling |
I want to produce a dataset like this:
| CarMakeID | CarMake | CarModels
---------------------------------------------
| 1 | SuperCars | Zoom, Wow, Awesome
| 2 | MehCars | Mediocrity, YoureSettling
What do I do in place of 'AGG' for strings in SQL Server in the following style query?
SELECT *,
(SELECT AGG(CarModel)
FROM CarModels model
WHERE model.CarMakeID = make.CarMakeID
GROUP BY make.CarMakeID) as CarMakes
FROM CarMakes make
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
It is an interesting problem in Transact SQL, for which there are a number of solutions and considerable debate. How do you go about producing a summary result in which a distinguishing column from each row in each particular category is listed in a 'aggregate' column? A simple, and intuitive way of displaying data is surprisingly difficult to achieve. Anith Sen gives a summary of different ways, and offers words of caution over the one you choose...
On SQL Server 2017 or later, or on Azure SQL Database, you can use STRING_AGG as below:
SELECT make.CarMakeId, make.CarMake,
CarModels = string_agg(model.CarModel, ', ')
FROM CarModels model
INNER JOIN CarMakes make
ON model.CarMakeId = make.CarMakeId
GROUP BY make.CarMakeId, make.CarMake
Output:
+-----------+-----------+---------------------------+
| CarMakeId | CarMake | CarModels |
+-----------+-----------+---------------------------+
| 1 | SuperCars | Zoom, Wow, Awesome |
| 2 | MehCars | Mediocrity, YoureSettling |
+-----------+-----------+---------------------------+
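The same join-and-aggregate shape can be tried locally from Python with an in-memory SQLite database. Note the substitution: SQLite does not have STRING_AGG, so this sketch uses SQLite's group_concat() as a stand-in, and the order of the concatenated models is not guaranteed there. On SQL Server, STRING_AGG accepts a WITHIN GROUP (ORDER BY ...) clause if a fixed order is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table CarMakes (CarMakeID integer, CarMake text)")
conn.execute(
    "create table CarModels (CarModelID integer, CarMakeID integer, CarModel text)"
)
conn.executemany(
    "insert into CarMakes values (?, ?)",
    [(1, "SuperCars"), (2, "MehCars")],
)
conn.executemany(
    "insert into CarModels values (?, ?, ?)",
    [
        (1, 1, "Zoom"),
        (2, 1, "Wow"),
        (3, 1, "Awesome"),
        (4, 2, "Mediocrity"),
        (5, 2, "YoureSettling"),
    ],
)

# group_concat(expr, separator) is SQLite's counterpart to STRING_AGG.
rows = conn.execute(
    """
    select make.CarMakeID, make.CarMake,
           group_concat(model.CarModel, ', ') as CarModels
    from CarModels model
    inner join CarMakes make
        on model.CarMakeID = make.CarMakeID
    group by make.CarMakeID, make.CarMake
    order by make.CarMakeID
    """
).fetchall()

for row in rows:
    print(row)
```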