I'm trying to run an on-demand YARA scan in osqueryi using a 'pattern' constraint, but that column isn't there and I get the error below. Am I missing something about how to use pattern constraints?
select * from yara where pattern="/bin/%sh" and sig_group="sig_group_1";
Error: no such column: pattern
For reference, this is the osquery YARA documentation I followed:
https://osquery.readthedocs.io/en/stable/deployment/yara/
osquery> SELECT * FROM yara WHERE pattern="/bin/%sh" AND sigfile="/Users/wxs/sigs/baz.sig";
+-----------+---------+-------+-----------+-------------------------+----------+
| path | matches | count | sig_group | sigfile | pattern |
+-----------+---------+-------+-----------+-------------------------+----------+
| /bin/bash | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
| /bin/csh | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
| /bin/ksh | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
| /bin/sh | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
| /bin/tcsh | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
| /bin/zsh | | 0 | | /Users/wxs/sigs/baz.sig | /bin/%sh |
+-----------+---------+-------+-----------+-------------------------+----------+
osquery>
And the yara table schema doesn't have a 'pattern' column either:
https://osquery.io/schema/4.8.0/#yara
Those linked docs appear to be out of date. As you point out, there is no pattern column.
It looks like you should be able to use a pattern on path. From the examples in the source code:
select * from yara where path LIKE '/etc/%'
(I don't use yara, and can't easily confirm this)
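If that's right, the original query could be rewritten along these lines (an untested sketch; it assumes sig_group_1 is defined in your osquery YARA configuration):
-- Untested sketch: apply the LIKE pattern to the existing path column
-- instead of the non-existent pattern column.
SELECT path, matches, count, sig_group
FROM yara
WHERE path LIKE '/bin/%sh'
  AND sig_group = 'sig_group_1';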
Related
When I use the "show databases" command in the taos shell, I see there are a lot of database parameters, like keep, days, cache, blocks:
taos> show databases;
name | created_time | ntables | vgroups | replica | quorum | days | keep0,keep1,keep(D) | cache(MB) | blocks | minrows | maxrows | wallevel | fsync | comp | cachelast | precision | update | status |
====================================================================================================================================================================================================================================================================================
test | 2021-05-26 17:33:17.338 | 1 | 1 | 1 | 1 | 10 | 3650,3650,3650 | 16 | 6 | 100 | 4096 | 1 | 3000 | 2 | 0 | ms | 0 | ready |
Query OK, 1 row(s) in set (0.001774s)
To follow best practices with a TDengine database, how should I adjust these database parameters?
You can try "ALTER DATABASE db_name KEEP value;" to change the data retention period (in days).
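A minimal sketch of how this might look (TDengine 2.x syntax assumed; the values are illustrative, not tuning recommendations; parameters that cannot be altered have to be set when the database is created):
-- Sketch: set parameters at creation time...
CREATE DATABASE IF NOT EXISTS test KEEP 3650 DAYS 10 BLOCKS 6 CACHE 16;
-- ...and adjust the ones that can be changed later, e.g. the retention period:
ALTER DATABASE test KEEP 365;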
I am new to BigQuery. First, I would like to do the SQL equivalent of DESC using Google BigQuery.
I did:
DESC `paj.dw.MY_TABLE`;
But I get:
Statement not supported: DescribeStatement
There are mentions of INFORMATION_SCHEMA being available in beta, but I get:
Syntax error: Unexpected identifier "INFORMATION_SCHEMA"
How do you do it yourself?
Thank you.
In addition to INFORMATION_SCHEMA, you can also run the following from the console command line (Cloud Shell):
bq show --schema --format=prettyjson dataset.table
I prefer this for tables that have nested records.
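For example, against the public dataset used further down (the table name is taken from the INFORMATION_SCHEMA example below):
bq show --schema --format=prettyjson bigquery-public-data:austin_311.311_request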
You can do something like:
SELECT
* EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
FROM
paj.dw.INFORMATION_SCHEMA.COLUMNS
WHERE
table_name="MY_TABLE"
For other INFORMATION_SCHEMA views and examples, see this page.
Take the first table in a public dataset as an example:
SELECT column_name, is_nullable, data_type
FROM `bigquery-public-data.austin_311.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name="311_request"
You get:
+--------------------------+-------------+-----------+
| column_name | is_nullable | data_type |
+--------------------------+-------------+-----------+
| unique_key | YES | STRING |
| complaint_type | YES | STRING |
| complaint_description | YES | STRING |
| owning_department | YES | STRING |
| source | YES | STRING |
| status | YES | STRING |
| status_change_date | YES | TIMESTAMP |
| created_date | YES | TIMESTAMP |
| last_update_date | YES | TIMESTAMP |
| close_date | YES | TIMESTAMP |
| incident_address | YES | STRING |
| street_number | YES | STRING |
| street_name | YES | STRING |
| city | YES | STRING |
| incident_zip | YES | INT64 |
| county | YES | STRING |
| state_plane_x_coordinate | YES | STRING |
| state_plane_y_coordinate | YES | FLOAT64 |
| latitude | YES | FLOAT64 |
| longitude | YES | FLOAT64 |
| location | YES | STRING |
| council_district_code | YES | INT64 |
| map_page | YES | STRING |
| map_tile | YES | STRING |
+--------------------------+-------------+-----------+
In Hive, if I create an internal table using the LOCATION clause in the table creation statement (specifying a location other than Hive's default), will dropping that table delete the data from the specified location, just like it does when the data is in Hive's default location?
Yes, it will delete the data at that location even if it is not Hive's default location.
Let's assume we have a test_tmp table in the default database, stored in the /user/yashu/test5 directory.
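Such a managed table could have been created with something like this (a sketch; the columns, delimiter, and location are taken from the DESC FORMATTED output below):
-- Sketch reconstructed from the DESC FORMATTED output below:
-- a managed (internal) table with an explicit, non-default location.
CREATE TABLE test_tmp (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/yashu/test5';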
hive> desc formatted test_tmp;
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | |
| name | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | shu | NULL |
| CreateTime: | Fri Mar 23 03:42:15 EDT 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://nn1.com/user/yashu/test5 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | numFiles | 1 |
| | totalSize | 12 |
| | transient_lastDdlTime | 1521790935 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
The HDFS directory has one .txt file under test5:
bash$ hadoop fs -ls /user/yashu/test5/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 12 2018-03-23 03:42 /user/yashu/test5/test.txt
Hive table data
select * from test_tmp;
+--------------+----------------+--+
| test_tmp.id | test_tmp.name |
+--------------+----------------+--+
| 1 | bar |
| 2 | foo |
+--------------+----------------+--+
Once I drop the table in Hive, the test5 directory is also removed from HDFS:
hive> drop table test_tmp;
bash$ hadoop fs -ls /user/yashu/test5/
ls: `/user/yashu/test5/': No such file or directory
So when we drop an internal (managed) table in Hive, the directory (location) the table points to is deleted as well, even when it is not Hive's default location.
I would like to get daily statistics using TABLE_DATE_RANGE like this:
Select count(*), tableName
FROM
(TABLE_DATE_RANGE(appengine_logs.appengine_googleapis_com_request_log_,
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
group by tableName
Is there any way to get a table name when using TABLE_DATE_RANGE?
You need to query your dataset with a metadata query.
SELECT * FROM publicdata:samples.__TABLES__
WHERE MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')
This returns:
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+---+
| Row | project_id | dataset_id | table_id | creation_time | last_modified_time | row_count | size_bytes | type | |
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+---+
| 1 | publicdata | samples | github_nested | 1348782587310 | 1348782587310 | 2541639 | 1694950811 | 1 | |
| 2 | publicdata | samples | github_timeline | 1335915950690 | 1335915950690 | 6219749 | 3801936185 | 1 | |
| 3 | publicdata | samples | gsod | 1335916040125 | 1413937987846 | 114420316 | 17290009238 | 1 | |
| 4 | publicdata | samples | natality | 1335916045005 | 1413925598038 | 137826763 | 23562717384 | 1 | |
| 5 | publicdata | samples | shakespeare | 1335916045099 | 1413926827257 | 164656 | 6432064 | 1 | |
| 6 | publicdata | samples | trigrams | 1335916127449 | 1335916127449 | 68051509 | 277168458677 | 1 | |
| 7 | publicdata | samples | wikipedia | 1335916132870 | 1423520879902 | 313797035 | 38324173849 | 1 | |
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+---+
You can add WHERE clauses to restrict the results to certain tables, for example:
WHERE table_id CONTAINS "wiki"
or a regular expression, e.g. WHERE REGEXP_MATCH(table_id, r"^foo[\d]{3,5}")
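So, for the daily log tables in the question, a hedged sketch (legacy SQL, untested) that combines the prefix filter with a creation-time filter might look like this:
-- Sketch: per-table row counts for the request-log tables created in the
-- last 7 days; the dataset and prefix are taken from the question.
SELECT table_id, row_count
FROM appengine_logs.__TABLES__
WHERE table_id CONTAINS "appengine_googleapis_com_request_log_"
  AND MSEC_TO_TIMESTAMP(creation_time) > DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')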
What are segments in Lucene?
What are the benefits of segments?
The Lucene index is split into smaller chunks called segments. Each segment is its own index. Lucene searches all of them in sequence.
A new segment is created when a new writer is opened and when a writer commits or is closed.
The advantage of this system is that you never have to modify the files of a segment once it is created. When you add new documents to your index, they are added to the next segment. Previous segments are never modified.
Deleting a document is done by simply indicating in a file which document of a segment is deleted, but physically, the document always stays in the segment. Documents in Lucene aren't really updated. What happens is that the previous version of the document is marked as deleted in its original segment and the new version of the document is added to the current segment. This minimizes the chances of corrupting an index by constantly having to modify its content when there are changes. It also allows for easy backup and synchronization of the index across different machines.
However, at some point, Lucene may decide to merge some segments. This operation can also be triggered with an optimize.
A segment is very simply a section of the index. The idea is that you can add documents to the index that's currently being served by creating a new segment with only new documents in it. This way, you don't have to go to the expensive trouble of rebuilding your entire index frequently in order to add new documents to the index.
The segment benefits have already been covered by the other answers. I will include an ASCII diagram of a Lucene index.
Lucene Segment
A Lucene segment is part of an Index. Each segment is composed of several index files. If you look inside any of these files, you will see that it holds 1 or more Lucene documents.
+- Index 5 ------------------------------------------+
| |
| +- Segment _0 ---------------------------------+ |
| | | |
| | +- file 1 -------------------------------+ | |
| | | | | |
| | | +- L.Doc1-+ +- L.Doc2-+ +- L.Doc3-+ | | |
| | | | | | | | | | | |
| | | | field 1 | | field 1 | | field 1 | | | |
| | | | field 2 | | field 2 | | field 2 | | | |
| | | | field 3 | | field 3 | | field 3 | | | |
| | | | | | | | | | | |
| | | +---------+ +---------+ +---------+ | | |
| | | | | |
| | +----------------------------------------+ | |
| | | |
| | | |
| | +- file 2 -------------------------------+ | |
| | | | | |
| | | +- L.Doc4-+ +- L.Doc5-+ +- L.Doc6-+ | | |
| | | | | | | | | | | |
| | | | field 1 | | field 1 | | field 1 | | | |
| | | | field 2 | | field 2 | | field 2 | | | |
| | | | field 3 | | field 3 | | field 3 | | | |
| | | | | | | | | | | |
| | | +---------+ +---------+ +---------+ | | |
| | | | | |
| | +----------------------------------------+ | |
| | | |
| +----------------------------------------------+ |
| |
| +- Segment _1 (optional) ----------------------+ |
| | | |
| +----------------------------------------------+ |
+----------------------------------------------------+
Reference
Lucene in Action, Second Edition, Manning Publications, July 2010