Splunk: create table with counts based on differing values in a field

I am trying to create a table which counts the items in my list with splunk.
E.g. I have a list of items, with one item having the following fields:
name
type
result (e.g. has only three values success, failure, N.A.)
I wish to create a table that groups the items by name, counts the number of items belonging to each name, and lists the type of each group. After that, I want additional columns that split the counts into separate columns based on the "result" field.
Here's a sample table format I wish to achieve:
name   | type       | success | failure | N.A. | Total count
Item A | fruits     | 5       | 0       | 1    | 6
Item B | vegetables | 0       | 2       | 3    | 5
Item C | sweets     | 1       | 3       | 2    | 6
Here's what I tried after looking up in the splunk command reference:
index="The index I am looking for"
| stats count, values(fields.type) as Type by fields.name
| table fields.name, Type, count
| rename fields.name as name, count as "Total Count"
| appendcols [search index="The index I am looking for" fields.result="success"
    | stats count, values(fields.type) as Type by fields.name
    | table fields.name, Type, count
    | rename fields.name as name, count as "success"]
| appendcols [search index="The index I am looking for" fields.result="failure"
    | stats count, values(fields.type) as Type by fields.name
    | table fields.name, Type, count
    | rename fields.name as name, count as "failure"]
| appendcols [search index="The index I am looking for" fields.result="N.A."
    | stats count, values(fields.type) as Type by fields.name
    | table fields.name, Type, count
    | rename fields.name as name, count as "N.A."]
I noticed that some columns, e.g. the one headed "failure", do not have their rows aligned with the other columns, so the counts in each row no longer add up to the total count.
E.g. referencing from the table presented earlier, with the search query I created, the table below is generated:
name   | type       | success | failure | N.A. | Total count
Item A | fruits     | 5       | 2       | 1    | 6
Item B | vegetables | 0       | 3       | 3    | 5
Item C | sweets     | 1       | 0       | 2    | 6
I'd appreciate advice on how to improve the search query, or pointers to better-suited commands.

As you've discovered, the appendcols command works correctly only under fairly limited circumstances. The results from each appendcols must match the main search (and any other appendcols) exactly in order and count, or they won't "line up".
One solution is to use the append command and then re-group the results using stats.
index=foo
| stats count, values(fields.type) as Type by fields.name
| fields fields.name, Type, count
| rename fields.name as name, count as "Total Count"
| append [search index=foo fields.result="success"
| stats count, values(fields.type) as Type by fields.name
| fields fields.name, Type, count
| rename fields.name as name, count as "success"]
| append [search index=foo fields.result="failure"
| stats count, values(fields.type) as Type by fields.name
| fields fields.name, Type, count
| rename fields.name as name, count as "failure"]
| append [search index=foo fields.result="N.A."
| stats count, values(fields.type) as Type by fields.name
| fields fields.name, Type, count
| rename fields.name as name, count as "N.A."]
| stats values(*) as * by name
| table name, Type, success, failure, "N.A.", "Total Count"
Another solution avoids using append at all.
index=foo
| rename fields.* as *
| stats count as "Total Count",
        count(eval(result=="success")) as success,
        count(eval(result=="failure")) as failure,
        count(eval(result=="N.A.")) as "N.A.",
        values(type) as Type by name
| table name, Type, success, failure, "N.A.", "Total Count"
The construct count(eval(<<expression>>)) counts the results where <<expression>> is true. Note that inside eval, equality is tested with == rather than =.
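Outside of Splunk, the same conditional-count pivot can be sketched in plain Python. This is only an illustration of the grouping that stats performs; the sample records are made up to match the fields in the question:

```python
from collections import defaultdict

# Hypothetical sample events mirroring the question's fields.
events = [
    {"name": "Item A", "type": "fruits", "result": "success"},
    {"name": "Item A", "type": "fruits", "result": "N.A."},
    {"name": "Item B", "type": "vegetables", "result": "failure"},
    {"name": "Item B", "type": "vegetables", "result": "failure"},
]

# One row per (name, type): a total count plus a count per result value,
# the same shape stats produces with count(eval(...)) clauses.
rows = defaultdict(lambda: {"Total Count": 0, "success": 0, "failure": 0, "N.A.": 0})
for e in events:
    row = rows[(e["name"], e["type"])]
    row["Total Count"] += 1
    row[e["result"]] += 1

for (name, typ), counts in sorted(rows.items()):
    print(name, typ, counts)
```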

Related

How to see what rows are missing between two select statements in SQLite?

I have a single table view that has a group column and a data column (among other columns). In a particular group, there should be n rows of the same set of text in the same order. However, I'm finding that in some groups, some rows are missing. I'd like to query the view so that I can see what rows are missing.
Concrete example:
+-------+-------+
| Group | Data  |
+-------+-------+
| 1     | row 1 |
| 1     | row 2 |
| 1     | row 3 |
| 2     | row 1 |
| 2     | row 3 |
+-------+-------+
Group 2 has "row 2" missing, and I'd like that output. Something like:
+-------+
| Data  |
+-------+
| row 2 |
+-------+
Is this possible?
Take the COUNT of the Data column grouped by Data, and keep the values whose count is less than the number of distinct groups.
You can achieve it with the query below (the column is named Grp here because GROUP is a reserved word):
SELECT Data, COUNT(*)
FROM tab
GROUP BY Data
HAVING COUNT(*) < (SELECT COUNT(DISTINCT Grp) FROM tab);
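As a quick sanity check, the same query can be run against an in-memory SQLite database (again with the column named grp, since GROUP is reserved):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tab (grp INTEGER, data TEXT);
    INSERT INTO tab VALUES
        (1, 'row 1'), (1, 'row 2'), (1, 'row 3'),
        (2, 'row 1'), (2, 'row 3');
""")

# Data values that appear in fewer groups than the number of distinct groups.
missing = con.execute("""
    SELECT data, COUNT(*)
    FROM tab
    GROUP BY data
    HAVING COUNT(*) < (SELECT COUNT(DISTINCT grp) FROM tab)
""").fetchall()

print(missing)  # 'row 2' appears in only one of the two groups
```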

How to get SHOW CREATE TABLE in BigQuery

I want to get CREATE TABLE script for an existing Table, similar to MySQL SHOW CREATE TABLE. How can I do it?
BigQuery now supports a DDL column in INFORMATION_SCHEMA.TABLES view, which gives you the CREATE TABLE (or VIEW) DDL statement.
Note that the ddl column is hidden if you do SELECT * FROM INFORMATION_SCHEMA.TABLES; you need to select it explicitly, for example:
SELECT
  table_name, ddl
FROM
  `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES
WHERE
  table_name = "population_by_zip_2010"
gives you
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table_name | ddl |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| population_by_zip_2010 | CREATE TABLE `bigquery-public-data.census_bureau_usa.population_by_zip_2010` |
| | ( |
| | geo_id STRING OPTIONS(description="Geo code"), |
| | zipcode STRING NOT NULL OPTIONS(description="Five digit ZIP Code Tabulation Area Census Code"), |
| | population INT64 OPTIONS(description="The total count of the population for this segment."), |
| | minimum_age INT64 OPTIONS(description="The minimum age in the age range. If null, this indicates the row as a total for male, female, or overall population."), |
| | maximum_age INT64 OPTIONS(description="The maximum age in the age range. If null, this indicates the row as having no maximum (such as 85 and over) or the row is a total of the male, female, or overall population."), |
| | gender STRING OPTIONS(description="male or female. If empty, the row is a total population summary.") |
| | ) |
| | OPTIONS( |
| | labels=[("freebqcovid", "")] |
| | ); |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Select rows from a filtered portion of Table A where a column matches a relationship with a column from the row in Table B that matches by ID

I want to get all rows in one table where a column satisfies a relationship with a column in the matching row (by ID) of another table.
Concretely, I have two tables, orders and product_info, that I'm accessing through Amazon Redshift.
Orders
| ID   | Date     | Amount | Region |
| ---- | -------- | ------ | ------ |
| 1    | 2019/4/1 | $120   | A      |
| 1    | 2019/4/4 | $100   | A      |
| 2    | 2019/4/2 | $50    | A      |
| 3    | 2019/4/6 | $70    | B      |
The partition keys of order are region and date.
Product Information
| ID | Release Date | Region |
| ---- | ------------ | ------ |
| 1 | 2019/4/2 | A |
| 2 | 2019/4/3 | A |
| 3 | 2019/4/5 | B |
The primary key of product information is id, and the partition key is region.
I want to get all rows from Orders in region A where the date of the row is greater than the release date value in product information for that ID.
So in this case it should return just one row,
| 1 | 2019/4/4 | $100 | A |
I tried doing
select *
from orders
INNER JOIN product_info ON orders.date > product_info.release_date
AND orders.id = product_info.id
AND orders.region = 'A'
AND product_info.region = 'A'
limit 10
The problem is that this query was absurdly slow (cancelled it after 10 minutes). The tables are extremely large, and I have a feeling it was scanning the entire table without restricting it to region first (in reality I have other filters in addition to region that I want to apply to the list of IDs before I do the inner join, but I've limited it to only region for the sake of simplifying the question).
How can I efficiently write this type of query?
The best way to make an SQL query faster is to exclude rows as soon as possible.
So, rather than putting conditions like orders.region = 'A' in the JOIN condition, move them to the WHERE clause. This can eliminate rows before they are joined.
Also, keep the JOIN condition as simple as possible so that the database can optimize the comparison.
Try something like this:
SELECT *
FROM orders
INNER JOIN product_info ON orders.id = product_info.id
WHERE orders.region = 'A'
AND product_info.region = 'A'
AND orders.date > product_info.release_date
Any further optimization would require consideration of the DISTKEY and SORTKEY on the Redshift tables. (Preferably a DISTKEY of id and a SORTKEY of date).
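Redshift-specific tuning (DISTKEY/SORTKEY) aside, the logic of the rewritten query can be sanity-checked on a small in-memory SQLite copy of the tables. ISO-format dates are used here so that string comparison orders correctly, unlike the 2019/4/1 style in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, date TEXT, amount TEXT, region TEXT);
    CREATE TABLE product_info (id INTEGER, release_date TEXT, region TEXT);
    INSERT INTO orders VALUES
        (1, '2019-04-01', '$120', 'A'),
        (1, '2019-04-04', '$100', 'A'),
        (2, '2019-04-02', '$50',  'A'),
        (3, '2019-04-06', '$70',  'B');
    INSERT INTO product_info VALUES
        (1, '2019-04-02', 'A'),
        (2, '2019-04-03', 'A'),
        (3, '2019-04-05', 'B');
""")

# Region filters in WHERE, simple equality join on id,
# date comparison against the matching product's release date.
rows = con.execute("""
    SELECT orders.*
    FROM orders
    INNER JOIN product_info ON orders.id = product_info.id
    WHERE orders.region = 'A'
      AND product_info.region = 'A'
      AND orders.date > product_info.release_date
""").fetchall()

print(rows)  # only the 2019-04-04 order for product 1 qualifies
```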

SQL / Oracle to Tableau - How to combine to sort based on two fields?

I have tables below as follows:
tbl_tasks
+---------+-------------+
| Task_ID | Assigned_ID |
+---------+-------------+
| 1 | 8 |
| 2 | 12 |
| 3 | 31 |
+---------+-------------+
tbl_resources
+---------+-----------+
| Task_ID | Source_ID |
+---------+-----------+
| 1 | 4 |
| 1 | 10 |
| 2 | 42 |
| 4 | 8 |
+---------+-----------+
A task is assigned to at least one person (denoted by the "assigned_ID") and then any number of people can be assigned as a source (denoted by "source_ID"). The ID numbers are all linked to names in another table. Though the ID numbers are named differently, they all return to the same table.
Would there be any way for me to combine the two tables based on ID so that I could search by someone's ID number? For example, with WHERE User_ID = 8, to see which tasks 8 is involved in, I would get back Task 1 and Task 4.
Right now, by joining all the tables together, I can easily filter on "Assigned" but not "Source" due to all the multiple entries in the table.
Use union all:
select distinct task_id
from ((select task_id, assigned_id as id
       from tbl_tasks
      ) union all
      (select task_id, source_id
       from tbl_resources
      )
     ) ti
where id = ?;
Note that this uses select distinct in case someone is assigned to the same task in both tables. If not, remove the distinct.
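The same union-all lookup can be verified on an in-memory SQLite copy of the two tables (a sketch only; table and column names follow the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl_tasks (task_id INTEGER, assigned_id INTEGER);
    CREATE TABLE tbl_resources (task_id INTEGER, source_id INTEGER);
    INSERT INTO tbl_tasks VALUES (1, 8), (2, 12), (3, 31);
    INSERT INTO tbl_resources VALUES (1, 4), (1, 10), (2, 42), (4, 8);
""")

# Tasks where user 8 appears as either assignee or source.
tasks = con.execute("""
    SELECT DISTINCT task_id
    FROM (SELECT task_id, assigned_id AS id FROM tbl_tasks
          UNION ALL
          SELECT task_id, source_id FROM tbl_resources) ti
    WHERE id = ?
    ORDER BY task_id
""", (8,)).fetchall()

print(tasks)  # tasks 1 (assigned) and 4 (source)
```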

Count expression in SSRS 2008

I have a situation where I have a column named NotificationLog.Status.
The status can be one of three values: Accepted, Pending, or Overdue. I need a count of notifications by status.
I created a calculated field with the following expression
=COUNT(IIF(Fields!NotificationStatus.Value="Accepted",1,Nothing))
When I tried to add this calculated field to the table and preview it, I got an error stating "aggregate, row number, running value, previous and lookup functions cannot be used in calculated field expressions".
What should I do now?
Try adding
=IIF(Fields!NotificationStatus.Value="Accepted",1,0)
as your calculated field. It returns 1 or 0 depending on whether the status is Accepted.
Then, wherever you want to use it, just SUM the calculated field to get a count.
=Sum(Fields!NewCalculatedField.Value)
Use this in a table / matrix etc. where your data is grouped.
The error seems straightforward enough; the reason you can't do this is that it wouldn't make sense without any grouping. Imagine the following dataset:
+----+----------+
| ID | Status   |
+----+----------+
| 1  | Accepted |
| 2  | Pending  |
| 3  | Accepted |
| 4  | Overdue  |
+----+----------+
If you were to add a third column with your expression, it would be the SQL Equivalent of
SELECT ID, Status, COUNT(CASE WHEN Status = 'Accepted' THEN 1 END)
FROM T
With no group by this is not valid syntax. You could add a report field with your count expression, but not a calculated field in your dataset. The dataset you are trying to make is as follows:
+----+----------+----------+
| ID | Status   | Accepted |
+----+----------+----------+
| 1  | Accepted | 2        |
| 2  | Pending  | 2        |
| 3  | Accepted | 2        |
| 4  | Overdue  | 2        |
+----+----------+----------+
It does not really make sense to repeat the value on every row, but you can do it in SQL using windowed functions:
SELECT ID,
       Status,
       Accepted = COUNT(CASE WHEN Status = 'Accepted' THEN 1 END) OVER ()
FROM T;
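SQLite (3.25+) also supports window functions, so the windowed count can be demonstrated with a small script (using AS for the column alias instead of the T-SQL Accepted = ... syntax above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER, status TEXT);
    INSERT INTO t VALUES
        (1, 'Accepted'), (2, 'Pending'), (3, 'Accepted'), (4, 'Overdue');
""")

# COUNT(CASE ...) OVER () repeats the overall conditional count on every row.
rows = con.execute("""
    SELECT id, status,
           COUNT(CASE WHEN status = 'Accepted' THEN 1 END) OVER () AS accepted
    FROM t
    ORDER BY id
""").fetchall()

print(rows)  # every row carries the same count of Accepted rows
```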