Splunk query - Total or Count by field - splunk

I am working with event logs that contain many fields. I am trying to isolate one field, count how often each of its values occurs, and display that count as a new field in an existing table.
This is my log:
LOG_LEVEL="INFO" MESSAGE="Type_of_Call = Sample Call LOB = F DateTime_Stamp = 2022-10-10T21:10:53.900129 Policy_Number = 12-AB-1234-5 Requester_Id = A1231301 Last_Name = SAMPLE State = IL City = Chicago Zip 12345" APPLICATION_VERSION="appVersion_IS_UNDEFINED"
This is my splunk query:
| stats count, values(*) as * by Requester_Id
| table Type_of_Call LOB DateTime_Stamp Policy_Number Requester_Id Last_Name State City Zip
The issue with this query is that it groups all events with the same Requester_Id into one row and does not display the count at all.
What I want is to keep the rows unique and display the count for each Requester_Id in a new field.
For example: if there are 2 logs with the same Requester_Id value "abc", I would still display those two logs separately in the table, because other fields such as the date and time differ, but I would like to display the count for that Requester_Id as 2 in a new field in the same table.

Add the count field to the table command.
To get the total count at the end, use the addcoltotals command.
| table Type_of_Call LOB DateTime_Stamp Policy_Number Requester_Id Last_Name State City Zip count
| addcoltotals labelfield=Type_of_Call label="Total Events" count

Related

Splunk - Displaying addcoltotals into its own column

I have a report where I am working with event logs. I have created a table with fields that are extracted from the event logs.
This is my splunk query:
| stats count as Total_by_Requester values(*) as * by Requester_Id
| table Type_of_Call LOB DateTime_Stamp Policy_Number Requester_Id Last_Name State City Zip Total_by_Requester
| addcoltotals labelfield=Type_of_Call label="Grand Total" Total_by_Requester
Here I am taking the count per Requester_Id, displaying it in the Total_by_Requester field/column of the table, and then using the addcoltotals command to get the total of the Total_by_Requester field.
The issue with this query is that it displays the Grand Total right underneath the Type_of_Call column. I want to display the Grand Total in its own column after the Total_by_Requester column.
I have tried this query, which brings the Grand Total into its own column with the right value but gets rid of all the other columns:
| stats count as Total_by_Requester values(*) as * by Requester_Id
| stats sum(Total_by_Requester) as Grand_Total
| table Type_of_Call LOB DateTime_Stamp Policy_Number Requester_Id Last_Name State City Zip Total_by_Requester Grand_Total
The labelfield option to addcoltotals tells the command where to put the added label. If the specified field name already exists then the label will go in that field, but if the value of the labelfield option is new then a new column will be created.
However, to create an entirely separate Grand_Total field, use the appendpipe command. The command applies a set of commands to the existing result set without triggering a new search.
| table Type_of_Call LOB DateTime_Stamp Policy_Number Requester_Id Last_Name State City Zip Total_by_Requester
| appendpipe
[ stats sum(Total_by_Requester) as Grand_Total ]
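To make the effect of appendpipe easier to picture, here is a rough Python model of it (not Splunk code). It assumes the results are a list of dicts carrying the question's Total_by_Requester field; the second Requester_Id value is made up for illustration. The subpipeline's output is appended as one extra row that only has the Grand_Total field, and the existing rows are left alone.

```python
from typing import Any


def append_grand_total(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rough model of: appendpipe [ stats sum(Total_by_Requester) as Grand_Total ]"""
    grand_total = sum(row["Total_by_Requester"] for row in rows)
    # The subpipeline result is appended as a new row; existing rows are untouched.
    return rows + [{"Grand_Total": grand_total}]


rows = [
    {"Requester_Id": "A1231301", "Total_by_Requester": 2},
    {"Requester_Id": "B4560000", "Total_by_Requester": 3},  # hypothetical second requester
]
result = append_grand_total(rows)
```

In this model the last entry of result holds only Grand_Total, which is why the total ends up in its own column rather than under Type_of_Call.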

SQL - How to get a max value from a table and add it into another (sqlite3)

Just like the title says, how would I get the maximum value from one table and add it to a field in another table in the same database?
I currently have my main table "users":
username | password | Email | Highscore 1 | Highscore 2 | Highscore 3 |
I also have my other tables :
"user_scores1":
username | Score 1 |
"user_scores2":
username | Score 2 |
"user_scores3":
username | Score 3 |
The "user_scores" tables contain all the scores of all the users (for the 3 different game modes). Whenever a user finishes a game in a particular game mode, a new row with the score and the associated username is added to the score table for that game mode.
I want to filter out all the scores of one user (e.g. user1) and then get their highest score for each game mode (e.g. filtering out all the scores of user1 from the user_scores1 table).
With this, I want to get the highest score of that specific user from that specific table and add it to my main table "users" in the appropriate field (e.g., as in the previous example, filtering out all the scores of user1 from the user_scores1 table, then getting the highest score and storing it in highscore1 of "users" where the username is user1).
Is this what you want?
-- assumes the columns are named highscore1..3 and score; adjust to your real column names
update users
set highscore1 = (select max(score) from user_scores1 us where us.username = users.username),
highscore2 = (select max(score) from user_scores2 us where us.username = users.username),
highscore3 = (select max(score) from user_scores3 us where us.username = users.username);
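If you want to try the idea quickly, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The simplified schema (only highscore1 and a column called score) and the sample scores are assumptions for the demo, not your real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified, assumed schema: one game mode only
    CREATE TABLE users (username TEXT PRIMARY KEY, highscore1 INTEGER);
    CREATE TABLE user_scores1 (username TEXT, score INTEGER);
    INSERT INTO users (username) VALUES ('user1'), ('user2');
    INSERT INTO user_scores1 VALUES ('user1', 40), ('user1', 75), ('user2', 10);
""")

# Correlated subquery: for each row in users, take the max score of that same user.
conn.execute("""
    UPDATE users
    SET highscore1 = (SELECT MAX(score) FROM user_scores1 us
                      WHERE us.username = users.username)
""")

highscores = dict(conn.execute("SELECT username, highscore1 FROM users"))
```

A user with no rows in the score table would get NULL as highscore, since MAX over an empty set is NULL.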

columns manipulation in fast load

Hello, I am new to Teradata. I am loading a flat file into my TD DB using FastLoad.
My data set (a CSV file) has some issues: some rows contain proper data in the city column, but other rows contain NULL there. In the rows where city is NULL, the city value is stored in the next column (zip code), the zip code value in the column after that, and so on, so those rows end up with an extra column. An example is given below. How can I resolve this kind of issue in FastLoad? Can someone answer this with an SQL example?
City | Zipcode | country |
xyz | 12 | Esp |
abc | 11 | Ger |
Null | def (city's data) | 12 (zipcode's data) | Por (country's data)
What about a different approach? Instead of solving this in FastLoad, load your data into a temporary table like DATABASENAME.CITIES_TMP with a structure like the one below:
City | zip_code | country | column4
xyz | 12 | Esp |
NULL | abc | 12 | Por
In the next step, create the target table DATABASENAME.CITY with the structure:
City | zip_code | country |
As a final step, you need to run 2 INSERT queries:
-- Rows that loaded correctly (use "City IS NOT NULL" if the gap is a real NULL rather than the text 'NULL')
INSERT INTO DATABASENAME.CITY (City, zip_code, country)
SELECT City, zip_code, country FROM DATABASENAME.CITIES_TMP
WHERE City NOT LIKE 'NULL';
-- Shifted rows: every value sits one column to the right (use "City IS NULL" for a real NULL)
INSERT INTO DATABASENAME.CITY (City, zip_code, country)
SELECT zip_code, country, column4 FROM DATABASENAME.CITIES_TMP
WHERE City LIKE 'NULL';
Of course, this only works if all your data looks exactly like the sample you provided.
It is also only practical if you need to do this once in a while. If you need to load data several times a day, it will be a little cumbersome, and you should build some kind of ETL process instead, for example with a tool such as Talend.
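To check the staging-table idea on a small scale, here is a sketch that uses Python's sqlite3 module as a stand-in for Teradata. It assumes the missing city arrives as the literal text 'NULL', and the lowercase table names are just for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities_tmp (city TEXT, zip_code TEXT, country TEXT, column4 TEXT);
    CREATE TABLE city (city TEXT, zip_code TEXT, country TEXT);
    INSERT INTO cities_tmp VALUES
        ('xyz',  '12',  'Esp', NULL),
        ('abc',  '11',  'Ger', NULL),
        ('NULL', 'def', '12',  'Por');   -- shifted row from the question

    -- rows that are already aligned
    INSERT INTO city SELECT city, zip_code, country
                     FROM cities_tmp WHERE city <> 'NULL';
    -- shifted rows: read each value from one column further right
    INSERT INTO city SELECT zip_code, country, column4
                     FROM cities_tmp WHERE city = 'NULL';
""")

rows = conn.execute(
    "SELECT city, zip_code, country FROM city ORDER BY city"
).fetchall()
```

After both inserts, the shifted row ends up as def / 12 / Por in the target table, next to the rows that were already fine.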

How can I group flight legs together into routes for counting?

I have a person's flight history and want to find their most frequent route. Each flight is stored as a single row in a table; even for return trips, a->b is in one row and b->a is in another.
I need to identify where two legs equate to a route; for example:
This person has flown 7 times in total
New York to Paris 2 times (Flight key: JFKCDG)
Paris to New York 2 times (Flight Key: CDGJFK)
New York to London 3 times (Flight Key: JFKLHR)
Currently I don't know a way to group the first two above as a 'Route', and therefore any query I write considers JFKLHR to be the most frequent route (3 times between NY and London), even though I can see from the data that this person has flown between NY and Paris a total of 4 times.
Sample Table:
User ID¦Flight Key
-------------------
1 ¦JFKCDG
1 ¦JFKCDG
1 ¦CDGJFK
1 ¦CDGJFK
1 ¦JFKLHR
1 ¦JFKLHR
1 ¦JFKLHR
Expected Output
User ID¦Flight Key¦Count
------------------------
1 ¦JFKCDGJFK ¦4
Building on the clever idea in the answer by @fancyPants, you can use string functions to compare each leg of a route and patch together a full return trip.
I believe this query should work. The first part of the common table expression turns those flights that are round trips into three parts (src-dst-src) and the second part returns those that are one way (as src-dst).
with flights_cte as (
    select
        USERID,
        case when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end as flightkey,
        count(*) count
    from flights f
    where exists (
        select 1 from flights where right(f.flightkey,3) = left(flightKey,3)
    )
    group by
        userid,
        case
            when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end
    union all
    select userid, FlightKey, count(*)
    from flights f
    where not exists (
        select 1 from flights where right(f.flightkey,3) = left(flightKey,3)
    )
    group by UserID, FlightKey
)
select flights_cte.userid, flights_cte.flightkey, flights_cte.count
from flights_cte
join (select userid, max(count) _max_count from flights_cte group by userid) _max
    on flights_cte.UserID=_max.UserID and flights_cte.count = _max_count
A sample SQL Fiddle gives this output:
| USERID | FLIGHTKEY | COUNT |
|--------|-----------|-------|
| 1 | JFKCDGJFK | 4 |
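If it helps to see the grouping idea outside of SQL, here is a small Python sketch of a simpler variant: instead of building a src-dst-src key, it sorts the two airport codes so that both directions share one key. The function names are made up for illustration, and unlike the query above it does not distinguish one-way trips from round trips.

```python
from collections import Counter


def route_key(flight_key: str) -> str:
    """Direction-independent key: 'JFKCDG' and 'CDGJFK' both become 'CDGJFK'."""
    origin, destination = flight_key[:3], flight_key[3:]
    first, second = sorted((origin, destination))
    return first + second


def most_frequent_route(flight_keys: list[str]) -> tuple[str, int]:
    """Return (route key, number of legs flown on that route)."""
    counts = Counter(route_key(key) for key in flight_keys)
    return counts.most_common(1)[0]


# The sample table from the question, for user 1
legs = ["JFKCDG", "JFKCDG", "CDGJFK", "CDGJFK", "JFKLHR", "JFKLHR", "JFKLHR"]
best_route, best_count = most_frequent_route(legs)
```

With the sample data this picks the New York/Paris pair with 4 legs, ahead of New York/London with 3.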
I am assuming a route is not stored as a single row, otherwise you wouldn't be asking (although I would guess that the whole route is in some other table, maybe reservation-related).
I guess the first step is to group this data by person and by the flights that compose a 'route'. I have an article called T-SQL: Identify bad dates in a time series, where the time series logic can be modified to detect gaps between legs of over a day (a guess) to differentiate routes. The second step would be to convert legs into a route, i.e. JFK-CDG and CDG-JFK into the single value JFK-CDG-JFK.
Then it would be a single query, counting the above single-value routes, with an ORDER BY on that count.
Good luck.

PostgreSQL: Self-referencing, flattening join to table which contains tree of objects

I have a relatively large (as in >10^6 entries) table called "things" which represents locatable objects, e.g. countries, areas, cities, streets, etc. They are used as a tree of objects with a fixed depth, so the table structure looks like this:
id
name
type
continent_id
country_id
city_id
area_id
street_id
etc.
The association inside "things" is 1:n, i.e. a street or area always belongs to a defined city and country (not two or none); the column city_id for example contains the id of the "city" thing for all the objects which are inside that city. The "type" column contains the type of thing (Street, City, etc) as a string.
This table is referenced in another table "actions" as "thing_id". I am trying to generate a table of action location statistics showing the number of active and inactive actions a given location has. A simple JOIN like
SELECT count(nullif(actions.active, 1)) AS icount,
count(nullif(actions.active, 0)) AS acount,
things.name AS name, things.id AS thing_id, things.city_id AS city_id
FROM "actions"
LEFT JOIN things ON actions.thing_id = things.id
WHERE UPPER(substring(things.name, 1, 1)) = UPPER('A')
AND actions.datetime_at BETWEEN '2012-09-26 19:52:14' AND '2012-10-26 22:00:00'
GROUP BY things.name, things.id ORDER BY things.name
will give me a list of "things" (starting with 'A') which have actions associated with them and their active and inactive count like this:
icount | acount | name                     | thing_id | city_id
------------------------------------------------------------------
0      | 5      | Brooklyn, New York City  | 25       | 23
1      | 0      | Manhattan, New York City | 24       | 23
3      | 2      | New York City            | 23       | 23
Now I would like to:
1. only consider "city" things (that's easy: filter by type in "things"), and
2. in the active/inactive counts, use the sum of all actions happening in this city, regardless of whether the action is associated with the city itself or something inside the city (= having the same city_id).
With the same dataset as above, the new query should result in:
icount | acount | name                     | thing_id | city_id
------------------------------------------------------------------
4      | 7      | New York City            | 23       | 23
I do not need the thing_id in this table (since it would not be unique anyway), but since I do need the city's name (for display), it is probably just as easy to also output the ID, then I don't have to change as much in my code.
How would I have to modify the above query to achieve this? I'd like to avoid additional trips to the database, and advanced SQL features such as procedures, triggers, views and temporary tables, if possible.
I'm using Postgres 8.3 with Ruby 1.9.3 on Rails 3.0.14 (on Mac OS X 10.7.4).
Thank you! :)
You need to count actions for all things in the city in an independent subquery and then join to a limited set of things:
SELECT c.icount
      ,c.acount
      ,t.name
      ,t.id AS thing_id
      ,t.city_id
FROM  (
   SELECT t.city_id
         ,count(nullif(a.active, 1)) AS icount
         ,sum(a.active) AS acount
   FROM   things t
   LEFT   JOIN actions a ON a.thing_id = t.id
          -- datetime_at is a column of actions in the question, so it is filtered in the join
          AND a.datetime_at BETWEEN '2012-09-26 19:52:14'
                                AND '2012-10-26 22:00:00'
   WHERE  t.city_id = 23              -- to restrict results to one city
   GROUP  BY t.city_id
   ) c   -- counts per city
JOIN   things t USING (city_id)
WHERE  t.name ILIKE 'A%'
ORDER  BY t.name, t.id;
I also simplified a number of other things in your query and used table aliases to make it easier to read.
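As a rough check of the per-city aggregation, here is a sketch that runs the same idea against SQLite through Python's sqlite3 module (so not PostgreSQL 8.3). The sample rows are reconstructed from the counts in the question, the 'Area' type value is a guess, the date filter is left out, and the outer join matches the city row via t.id = c.city_id plus a type filter, which is an assumption on my part rather than part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT, type TEXT, city_id INTEGER);
    CREATE TABLE actions (thing_id INTEGER, active INTEGER);
    INSERT INTO things VALUES
        (23, 'New York City',            'City', 23),
        (24, 'Manhattan, New York City', 'Area', 23),
        (25, 'Brooklyn, New York City',  'Area', 23);
""")
# Brooklyn: 5 active; Manhattan: 1 inactive; the city itself: 3 inactive, 2 active.
actions = [(25, 1)] * 5 + [(24, 0)] + [(23, 0)] * 3 + [(23, 1)] * 2
conn.executemany("INSERT INTO actions VALUES (?, ?)", actions)

row = conn.execute("""
    SELECT c.icount, c.acount, t.name, t.id AS thing_id, t.city_id
    FROM (
        SELECT t.city_id,
               count(nullif(a.active, 1)) AS icount,  -- rows where active is 0
               sum(a.active)              AS acount   -- rows where active is 1
        FROM things t
        LEFT JOIN actions a ON a.thing_id = t.id
        GROUP BY t.city_id
    ) c
    JOIN things t ON t.id = c.city_id    -- only the city's own row
    WHERE t.type = 'City'
""").fetchone()
```

The counts for Brooklyn and Manhattan are rolled up into the single New York City row, which matches the result the question asks for.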