Unexpected error while loading data - google-bigquery

I am getting an "Unexpected" error. I tried a few times, and I still could not load the data. Is there any other way to load data?
gs://log_data/r_mini_raw_20120510.txt.gzto567402616005:myv.may10c
Errors:
Unexpected. Please try again.
Job ID: job_4bde60f1c13743ddabd3be2de9d6b511
Start Time: 1:48pm, 12 May 2012
End Time: 1:51pm, 12 May 2012
Destination Table: 567402616005:myvserv.may10c
Source URI: gs://log_data/r_mini_raw_20120510.txt.gz
Delimiter: ^
Max Bad Records: 30000
Schema:
zoneid: STRING
creativeid: STRING
ip: STRING
update:
I am using the file that can be found here:
http://saraswaticlasses.net/bad.csv.zip
bq load -F '^' --max_bad_record=30000 mycompany.abc bad.csv id:STRING,ceid:STRING,ip:STRING,cb:STRING,country:STRING,telco_name:STRING,date_time:STRING,secondary:STRING,mn:STRING,sf:STRING,uuid:STRING,ua:STRING,brand:STRING,model:STRING,os:STRING,osversion:STRING,sh:STRING,sw:STRING,proxy:STRING,ah:STRING,callback:STRING
I am getting an error "BigQuery error in load operation: Unexpected. Please try again."
The same file works from Ubuntu while it does not work from CentOS 5.4 (Final)
Does the OS encoding need to be checked?

The file you uploaded has an unterminated quote. Can you delete that line and try again? I've filed an internal bigquery bug to be able to handle this case more gracefully.
$grep '"' bad.csv
3000^0^1.202.218.8^2f1f1491^CN^others^2012-05-02 20:35:00^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
When I run a load from my workstation (Ubuntu), I get a warning about the line in question. Note that if you were using a larger file, you would not see this warning, instead you'd just get a failure.
$bq show --format=prettyjson -j job_e1d8636e225a4d5f81becf84019e7484
...
"status": {
"errors": [
{
"location": "Line:29057 / Field:12",
"message": "Missing close double quote (\") character: field starts with: <Mozilla/>",
"reason": "invalid"
}
]

My suspicion is that you have rows or fields in your input data that exceed the 64 KB limit. Perhaps re-check the formatting of your data, check that it is gzipped properly, and if all else fails, try importing uncompressed data. (One possibility is that the entire compressed file is being interpreted as a single row/field that exceeds the aforementioned limit.)
To answer your original question, there are a few other ways to import data: you could upload directly from your local machine using the command-line tool or the web UI, or you could use the raw API. However, all of these mechanisms (including the Google Storage import that you used) funnel through the same CSV parser, so it's possible that they'll all fail in the same way.

Related

KIBANA - WAZUH pattern index

I have a project to install wazuh as FIM on linux, AIX and windows.
I managed to install Manager and all agents on all systems and I can see all three connected on the Kibana web as agents.
I created test file on the linux agent and I can find it also on web interface, so servers are connected.
Here is test file found in wazuh inventory tab
But, I am not recieving any logs if I modify this test file.
This is my settings in ossec.conf under syscheck on agent server>
<directories>/var/ossec/etc/test</directories>
<directories report_changes="yes" check_all="yes" realtime="yes">/var/ossec/etc/test</directories>
And now I ma also strugling to understand meanings of index patterns, index templates and fields.
I dont understand what they are and why we need to set it.
My settings on manager server - /usr/share/kibana/data/wazuh/config/wazuh.yml
alerts.sample.prefix: 'wazuh-alerts-*'
pattern: 'wazuh-alerts-*'
On the kibana web I also have this error when I am trying to check ,,events,, -the are no logs in the events.
Error: The field "timestamp" associated with this object no longer exists in the index pattern. Please use another field.
at FieldParamType.config.write.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:627309)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455052
at Array.forEach (<anonymous>)
at writeParams (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455018)
at AggConfig.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355081)
at AggConfig.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355960)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:190748
at Array.forEach (<anonymous>)
at agg_configs_AggConfigs.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:189329)
at http://MYIP:5601/42959/bundles/plugin/wazuh/4.2.5-4206-1/wazuh.chunk.6.js:55:1397640
Thank you.
About FIM:
here you can find the FIM documentation in case you don't have it:
https://documentation.wazuh.com/current/user-manual/capabilities/file-integrity/fim-configuration.html
https://documentation.wazuh.com/current/user-manual/reference/ossec-conf/syscheck.html.
The first requirement for this to work would be to ensure a FIM alert is triggered, could you check the alerts.json file on your manager? It is usually located under /var/ossec/logs/alerts/alerts.json In order to test this fully I would run "tail -f /var/ossec/logs/alerts/alerts.json" and make a change in yout directory , if no alerts is generated, then we will need to check the agent configuration.
About indexing:
Here you can find some documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html
https://www.elastic.co/guide/en/kibana/current/managing-index-patterns.html#scripted-fields
https://documentation.wazuh.com/current/user-manual/kibana-app/reference/elasticsearch.html
Regarding your error, The best way to solve this is to delete the index. To do this:
got to Kibana -> Stack management -> index patterns and there delete wazuh-alerts-*.
Then if you enter to Wazuh App the health check will create it again or you can follow this to create your index:
Go to kibana -> stack management -> index pattern and select Create index pattern.
Hope this information helps you.
Regards.
thank you for your answer.
I managed to step over this issue, but I hit another error.
When I check tail -f /var/ossec/logs/alerts/alerts.json I got never ending updating, thousands lines with errors like.
{"timestamp":"2022-01-31T12:40:08.458+0100","rule":{"level":5,"description":"Systemd: Service has entered a failed state, and likely has not started.","id":"40703","firedtimes":7420,"mail":false,"groups":["local","systemd"],"gpg13":["4.3"],"gdpr":["IV_35.7.d"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"X.X.X.X"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629208.66501653","full_log":"Jan 31 12:40:07 MYAGENTSERVERNAME systemd: Unit rbro-cbs-adapter-int.service entered failed state.","predecoder":{"program_name":"systemd","timestamp":"Jan 31 12:40:07","hostname":"MYAGENTSERVERNAME"},"decoder":{"name":"systemd"},"location":"/var/log/messages"}
But, I can also find alert if I change monitored file. (file> wazuhtest)
{"timestamp":"2022-01-31T12:45:59.874+0100","rule":{"level":7,"description":"Integrity checksum changed.","id":"550","mitre":{"id":["T1492"],"tactic":["Impact"],"technique":["Stored Data Manipulation"]},"firedtimes":1,"mail":false,"groups":["ossec","syscheck","syscheck_entry_modified","syscheck_file"],"pci_dss":["11.5"],"gpg13":["4.11"],"gdpr":["II_5.1.f"],"hipaa":["164.312.c.1","164.312.c.2"],"nist_800_53":["SI.7"],"tsc":["PI1.4","PI1.5","CC6.1","CC6.8","CC7.2","CC7.3"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629559.67086751","full_log":"File '/var/ossec/etc/wazuhtest' modified\nMode: realtime\nChanged attributes: size,mtime,inode,md5,sha1,sha256\nSize changed from '61' to '66'\nOld modification time was: '1643618571', now it is '1643629559'\nOld inode was: '786558', now it is '786559'\nOld md5sum was: '2dd5fe4d08e7c58dfdba76e55430ba57'\nNew md5sum is : 'd8b218e9ea8e2da8e8ade8498d06cba8'\nOld sha1sum was: 'ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3'\nNew sha1sum is : 'bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31'\nOld sha256sum was: '589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320'\nNew sha256sum is : '7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7'\n","syscheck":{"path":"/var/ossec/etc/wazuhtest","mode":"realtime","size_before":"61","size_after":"66","perm_after":"rw-r-----","uid_after":"0","gid_after":"0","md5_before":"2dd5fe4d08e7c58dfdba76e55430ba57","md5_after":"d8b218e9ea8e2da8e8ade8498d06cba8","sha1_before":"ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3","sha1_after":"bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31","sha256_before":"589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320","sha256_after":"7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7","uname_after":"root","gname_after":"root","mtime_before":"2022-01-31T09:42:51","mtime_after":"2022-01-31T12:45:59","inode_before":786558,"inode_after":786559,"diff":"1c1\n< dadadadadad\n---\n> dfsdfdadadadadad\n","changed_attributes":["size","mtime","inode","md5","sha1","sha256"],"event":"modified"},"decoder":{"name":"syscheck_integrity_changed"},"location":"syscheck"}
{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}
Also I can see this alert in messages logs on the manager server>
Jan 31 12:46:10 MYMANAGERSERVERNAME filebeat[186670]: 2022-01-31T12:46:10.379+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07610e0563729bf, ext:10888984451164, loc:(*time.Location)(0x55958e3622a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"dd9ff0c5-d5a9-4a0e-b1b3-0e9d7e8997ad","hostname":"MYMANAGERSERVERNAME","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"MYMANAGERSERVERNAME","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-alerts-4.x-"},"fileset":{"name":"alerts"},"host":{"name":"MYMANAGERSERVERNAME"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":127261462},"message":"{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"xlcppt36","ip":"10.74.96.34"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::706-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc00095ea90), Source:"/var/ossec/logs/alerts/alerts.json", Offset:127262058, Timestamp:time.Time{wall:0xc076063e1f1b1286, ext:133605185, loc:(*time.Location)(0x55958e3622a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x2c2, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-alerts-4.x-{2022.01.31||/d{yyyy.MM.dd|UTC}}>] must not contain the following characters [ , ", *, \, <, |, ,, >, /, ?]"}
Here is output form apps check.
curl "http://localhost:9200"
{
"version" : {
"number" : "7.14.2",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "6bc13727ce758c0e943c3c21653b3da82f627f75",
"build_date" : "2021-09-15T10:18:09.722761972Z",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
filebeat test output
elasticsearch: http://127.0.0.1:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 127.0.0.1
dial up... OK
TLS... WARN secure connection disabled
talk to server... OK
version: 7.14.2
So .. I can see alerts coming from Agent, but Its not reaching Kibana yet. On the kibana web I can see agent active and connected.

Uploading job fails on the same file that was uploaded successfully before

I'm running regular uploading job to upload csv into BigQuery. The job runs every hour. According to recent fail log, it says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original csv has a headline, so I assume 218638 should be the actual failed line, let me know if I'm wrong) but it seems all right. I checked according table in BigQuery, it has that line too, which means I actually successfully uploaded this line before.
Then why does it causes failure recently?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that bigquery processes large files in parallel in multiple workers. Each file worker starts at an offset from the beginning of the file. The offset that we include is the offset that the worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This will print out three lines... the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the next line as well.
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.

Hi , Google big query - bq fail load display file number how to get the file name

I'm running the following bq command
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace raw_data.order_20150131 gs://raw-data/order/order/2050131/* order.json
and
getting the following message when loading data into bq .
*************************************
Waiting on bqjob_r4ca10491_0000014ce70963aa_1 ... (412s) Current status: DONE
BigQuery error in load operation: Error processing job
'orders:bqjob_r4ca10491_0000014ce70963aa_1': Too few columns: expected
11 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ
Failure details:
- File: 844 / Line:1: Too few columns: expected 11 column(s) but got
1 column(s). For additional help: http://goo.gl/RWuPQ
**********************************
The message display only the file number .
checked the files content most of them are good .
gsutil ls and the cloud console on the other hand display file names .
how can I know which file is it according to the file number?
There seems to be some weird spacing introduced in the question, but if the desired path to ingest is "/order.json" - that won't work: You can only use "" at the end of the path when ingesting data to BigQuery.

H2 - Split file option in server mode

Using H2 database, is it possible to use the split file option while in (SSL) server mode and using encryption? If so, how can I do it?
I created a split database using this JDBC string:
jdbc:h2:split:28:/g:/db_split;CIPHER=AES
It is stated that a split database always needs the :split option afterwards, which seems true because I get errors about corrupted files when connecting with
jdbc:h2:ssl://g:/db_split;CIPHER=AES
General error: "java.lang.NumberFormatException: Zero length string" [50000-170] HY000/50000
But when I attach the appropriate option, another error follows:
jdbc:h2:split:ssl://g:/db_split;CIPHER=AES
IO Exception: "java.io.IOException: A sintaxe do nome do arquivo, do nome do diretório ou do rótulo do volume está incorreta"; "ssl://g:/db_split.h2.db" [90031-170] 90031/90031 (Error message localized in Portuguese - something like "The syntax for file name, folder name or volume label is incorrect")
Is there a way to make these options coexist? I am considering AUTO_SERVER, but it would be a lousy option.
For the server mode, use:
jdbc:h2:tcp://localhost/split:28:/g:/db_split;CIPHER=AES
When using SSL:
jdbc:h2:ssl://localhost/split:28:/g:/db_split;CIPHER=AES
For embedded mode, use:
jdbc:h2:split:28:/g:/db_split;CIPHER=AES

Possible BigQuery bug (on not enough rows returned)

Using the shakespeare public dataset, I tried to run the following (full code of the query, plus error):
` bq query "SELECT word FROM publicdata:samples.shakespeare WHERE word = 'huzzah' IGNORE CASE"
Waiting on bqjob_ref3f8f63522c642_0000014452358cb2_1 ... (0s) Current status: DONE
Bigquery service returned an invalid reply in query operation: Not enough rows returned by
server for job 'test-rich-app:bqjob_ref3f8f63522c642_0000014452358cb2_1'.
Please make sure you are using the latest version of the bq tool and try again. If this problem
persists, you may have encountered a bug in the bigquery client. Google engineers monitor and
answer questions on Stack Overflow, with the tag google-bigquery:
https://stackoverflow.com/questions/ask?tags=google-bigquery
Please include a brief description of the steps that led to this issue, as well as the
following information:
========================================
== Platform ==
CPython:2.7.2:Darwin-12.5.0-x86_64-i386-64bit
== bq version ==
v2.0.17
== Command line ==
['/Users/rich/google-cloud-sdk/platform/bigquery/bq.py', '--credential_file', '/Users/richmorrow/.config/gcloud/legacy_credentials/rich#quicloud.com/singlestore.json', 'query', "SELECT word FROM publicdata:samples.shakespeare WHERE word = 'huzzah' IGNORE CASE"]
== UTC timestamp ==
2014-02-21 02:10:46
== Error trace ==
File "/Users/rich/google-cloud-sdk/platform/bigquery/bq.py", line 783, in RunSafely
return_value = self.RunWithArgs(*args, **kwds)
File "/Users/rich/google-cloud-sdk/platform/bigquery/bq.py", line 1134, in RunWithArgs
max_rows=self.max_rows)
File "/Users/rich/google-cloud-sdk/platform/bigquery/bigquery_client.py", line 804, in ReadSchemaAndJobRows
return reader.ReadSchemaAndRows(start_row, max_rows)
File "/Users/rich/google-cloud-sdk/platform/bigquery/bigquery_client.py", line 2095, in ReadSchemaAndRows
'Not enough rows returned by server for %r' % (self,))
Unexpected exception in query operation: Not enough rows returned by server for job
'test-rich-app:bqjob_ref3f8f63522c642_0000014452358cb2_1'`
This issue happens with google cloud SDK and BigQuery CLI v2.0.15
Works fine with: BigQuery CLI v2.0.11
Sounds like you're using an old version of bq. The bq tool is now distributed as part of the google cloud SDK... you can install it here:
https://developers.google.com/cloud/sdk/
Essentially the issue was that when queries returned 0 rows, the old version of bq failed with the error you're seeing.