KIBANA - WAZUH pattern index - indexing

I have a project to install wazuh as FIM on linux, AIX and windows.
I managed to install Manager and all agents on all systems and I can see all three connected on the Kibana web as agents.
I created test file on the linux agent and I can find it also on web interface, so servers are connected.
Here is test file found in wazuh inventory tab
But, I am not recieving any logs if I modify this test file.
This is my settings in ossec.conf under syscheck on agent server>
<directories>/var/ossec/etc/test</directories>
<directories report_changes="yes" check_all="yes" realtime="yes">/var/ossec/etc/test</directories>
And now I ma also strugling to understand meanings of index patterns, index templates and fields.
I dont understand what they are and why we need to set it.
My settings on manager server - /usr/share/kibana/data/wazuh/config/wazuh.yml
alerts.sample.prefix: 'wazuh-alerts-*'
pattern: 'wazuh-alerts-*'
On the kibana web I also have this error when I am trying to check ,,events,, -the are no logs in the events.
Error: The field "timestamp" associated with this object no longer exists in the index pattern. Please use another field.
at FieldParamType.config.write.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:627309)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455052
at Array.forEach (<anonymous>)
at writeParams (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455018)
at AggConfig.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355081)
at AggConfig.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355960)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:190748
at Array.forEach (<anonymous>)
at agg_configs_AggConfigs.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:189329)
at http://MYIP:5601/42959/bundles/plugin/wazuh/4.2.5-4206-1/wazuh.chunk.6.js:55:1397640
Thank you.

About FIM:
here you can find the FIM documentation in case you don't have it:
https://documentation.wazuh.com/current/user-manual/capabilities/file-integrity/fim-configuration.html
https://documentation.wazuh.com/current/user-manual/reference/ossec-conf/syscheck.html.
The first requirement for this to work would be to ensure a FIM alert is triggered, could you check the alerts.json file on your manager? It is usually located under /var/ossec/logs/alerts/alerts.json In order to test this fully I would run "tail -f /var/ossec/logs/alerts/alerts.json" and make a change in yout directory , if no alerts is generated, then we will need to check the agent configuration.
About indexing:
Here you can find some documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html
https://www.elastic.co/guide/en/kibana/current/managing-index-patterns.html#scripted-fields
https://documentation.wazuh.com/current/user-manual/kibana-app/reference/elasticsearch.html
Regarding your error, The best way to solve this is to delete the index. To do this:
got to Kibana -> Stack management -> index patterns and there delete wazuh-alerts-*.
Then if you enter to Wazuh App the health check will create it again or you can follow this to create your index:
Go to kibana -> stack management -> index pattern and select Create index pattern.
Hope this information helps you.
Regards.

thank you for your answer.
I managed to step over this issue, but I hit another error.
When I check tail -f /var/ossec/logs/alerts/alerts.json I got never ending updating, thousands lines with errors like.
{"timestamp":"2022-01-31T12:40:08.458+0100","rule":{"level":5,"description":"Systemd: Service has entered a failed state, and likely has not started.","id":"40703","firedtimes":7420,"mail":false,"groups":["local","systemd"],"gpg13":["4.3"],"gdpr":["IV_35.7.d"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"X.X.X.X"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629208.66501653","full_log":"Jan 31 12:40:07 MYAGENTSERVERNAME systemd: Unit rbro-cbs-adapter-int.service entered failed state.","predecoder":{"program_name":"systemd","timestamp":"Jan 31 12:40:07","hostname":"MYAGENTSERVERNAME"},"decoder":{"name":"systemd"},"location":"/var/log/messages"}
But, I can also find alert if I change monitored file. (file> wazuhtest)
{"timestamp":"2022-01-31T12:45:59.874+0100","rule":{"level":7,"description":"Integrity checksum changed.","id":"550","mitre":{"id":["T1492"],"tactic":["Impact"],"technique":["Stored Data Manipulation"]},"firedtimes":1,"mail":false,"groups":["ossec","syscheck","syscheck_entry_modified","syscheck_file"],"pci_dss":["11.5"],"gpg13":["4.11"],"gdpr":["II_5.1.f"],"hipaa":["164.312.c.1","164.312.c.2"],"nist_800_53":["SI.7"],"tsc":["PI1.4","PI1.5","CC6.1","CC6.8","CC7.2","CC7.3"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629559.67086751","full_log":"File '/var/ossec/etc/wazuhtest' modified\nMode: realtime\nChanged attributes: size,mtime,inode,md5,sha1,sha256\nSize changed from '61' to '66'\nOld modification time was: '1643618571', now it is '1643629559'\nOld inode was: '786558', now it is '786559'\nOld md5sum was: '2dd5fe4d08e7c58dfdba76e55430ba57'\nNew md5sum is : 'd8b218e9ea8e2da8e8ade8498d06cba8'\nOld sha1sum was: 'ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3'\nNew sha1sum is : 'bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31'\nOld sha256sum was: '589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320'\nNew sha256sum is : '7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7'\n","syscheck":{"path":"/var/ossec/etc/wazuhtest","mode":"realtime","size_before":"61","size_after":"66","perm_after":"rw-r-----","uid_after":"0","gid_after":"0","md5_before":"2dd5fe4d08e7c58dfdba76e55430ba57","md5_after":"d8b218e9ea8e2da8e8ade8498d06cba8","sha1_before":"ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3","sha1_after":"bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31","sha256_before":"589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320","sha256_after":"7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7","uname_after":"root","gname_after":"root","mtime_before":"2022-01-31T09:42:51","mtime_after":"2022-01-31T12:45:59","inode_before":786558,"inode_after":786559,"diff":"1c1\n< dadadadadad\n---\n> dfsdfdadadadadad\n","changed_attributes":["size","mtime","inode","md5","sha1","sha256"],"event":"modified"},"decoder":{"name":"syscheck_integrity_changed"},"location":"syscheck"}
{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}
Also I can see this alert in messages logs on the manager server>
Jan 31 12:46:10 MYMANAGERSERVERNAME filebeat[186670]: 2022-01-31T12:46:10.379+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07610e0563729bf, ext:10888984451164, loc:(*time.Location)(0x55958e3622a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"dd9ff0c5-d5a9-4a0e-b1b3-0e9d7e8997ad","hostname":"MYMANAGERSERVERNAME","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"MYMANAGERSERVERNAME","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-alerts-4.x-"},"fileset":{"name":"alerts"},"host":{"name":"MYMANAGERSERVERNAME"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":127261462},"message":"{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"xlcppt36","ip":"10.74.96.34"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::706-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc00095ea90), Source:"/var/ossec/logs/alerts/alerts.json", Offset:127262058, Timestamp:time.Time{wall:0xc076063e1f1b1286, ext:133605185, loc:(*time.Location)(0x55958e3622a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x2c2, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-alerts-4.x-{2022.01.31||/d{yyyy.MM.dd|UTC}}>] must not contain the following characters [ , ", *, \, <, |, ,, >, /, ?]"}
Here is output form apps check.
curl "http://localhost:9200"
{
"version" : {
"number" : "7.14.2",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "6bc13727ce758c0e943c3c21653b3da82f627f75",
"build_date" : "2021-09-15T10:18:09.722761972Z",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
filebeat test output
elasticsearch: http://127.0.0.1:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 127.0.0.1
dial up... OK
TLS... WARN secure connection disabled
talk to server... OK
version: 7.14.2
So .. I can see alerts coming from Agent, but Its not reaching Kibana yet. On the kibana web I can see agent active and connected.

Related

How to get information on latest successful pod deployment in OpenShift 3.6

I am currently working on making a CICD script to deploy a complex environment into another environment. We have multiple technology involved and I currently want to optimize this script because it's taking too much time to fetch information on each environment.
In the OpenShift 3.6 section, I need to get the last successful deployment for each application for a specific project. I try to find a quick way to do so, but right now I only found this solution :
oc rollout history dc -n <Project_name>
This will give me the following output
deploymentconfigs "<Application_name>"
REVISION STATUS CAUSE
1 Complete config change
2 Complete config change
3 Failed manual change
4 Running config change
deploymentconfigs "<Application_name2>"
REVISION STATUS CAUSE
18 Complete config change
19 Complete config change
20 Complete manual change
21 Failed config change
....
I then take this output and parse each line to know which is the latest revision that have the status "Complete".
In the above example, I would get this list :
<Application_name> : 2
<Application_name2> : 20
Then for each application and each revision I do :
oc rollout history dc/<Application_name> -n <Project_name> --revision=<Latest_Revision>
In the above example the Latest_Revision for Application_name is 2 which is the latest complete revision not building and not failed.
This will give me the output with the information I need which is the version of the ear and the version of the configuration that was used in the creation of the image use for this successful deployment.
But since I have multiple application, this process can take up to 2 minutes per environment.
Would anybody have a better way of fetching the information I required?
Unless I am mistaken, it looks like there are no "one liner" with the possibility to get the information on the currently running and accessible application.
Thanks
Assuming that the currently active deployment is the latest successful one, you may try the following:
oc get dc -a --no-headers | awk '{print "oc rollout history dc "$1" --revision="$2}' | . /dev/stdin
It gets a list of deployments, feeds it to awk to extract the name $1 and revision $2, then compiles your command to extract the details, finally sends it to standard input to execute. It may be frowned upon for not using xargs or the like, but I found it easier for debugging (just drop the last part and see the commands printed out).
UPDATE:
On second thoughts, you might actually like this one better:
oc get dc -a -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.spec.template.spec.containers[0].env}{"\n\t"}{.spec.template.spec.containers[0].image}{"\n-------\n"}{end}'
The example output:
daily-checks
[map[name:SQL_QUERIES_DIR value:daily-checks/]]
docker-registry.default.svc:5000/ptrk-testing/daily-checks#sha256:b299434622b5f9e9958ae753b7211f1928318e57848e992bbf33a6e9ee0f6d94
-------
jboss-webserver31-tomcat
registry.access.redhat.com/jboss-webserver-3/webserver31-tomcat7-openshift#sha256:b5fac47d43939b82ce1e7ef864a7c2ee79db7920df5764b631f2783c4b73f044
-------
jtask
172.30.31.183:5000/ptrk-testing/app-txeq:build
-------
lifebicycle
docker-registry.default.svc:5000/ptrk-testing/lifebicycle#sha256:a93cfaf9efd9b806b0d4d3f0c087b369a9963ea05404c2c7445cc01f07344a35
You get the idea, with expressions like .spec.template.spec.containers[0].env you can reach for specific variables, labels, etc. Unfortunately the jsonpath output is not available with oc rollout history.
UPDATE 2:
You could also use post-deployment hooks to collect the data, if you can set up a listener for the hooks. Hopefully the information you need is inherited by the PODs. More info here: https://docs.openshift.com/container-platform/3.10/dev_guide/deployments/deployment_strategies.html#lifecycle-hooks

Switching the system does not work

I had the following situation: I'm in a live user mode debugging session and I wanted to show the win32k!_W32Process structure. Unfortunately, win32k is a kernel mode SYS file, so the symbols are not available in the user mode session.
I know that I can always load a DLL, EXE or SYS as a dump file and then inspect the symbols. Usually I would do that via File/Open Crash Dump.
This time, I wanted to show the participants of a debugging workshop that it's possible to debug multiple systems at the same time, so I opened the Win32K.sys via WinDbg's command prompt:
0:003> |
. 0 id: 10fc attach name: [...]\NetHeaps.exe
0:003> .opendump C:\Windows\winsxs\[...]\win32k.sys
Loading Dump File [C:\Windows\winsxs\[...]\win32k.sys]
Opened 'C:\Windows\winsxs\[...]\win32k.sys'
||0:0:003>
As we can now see, we have 2 systems and I'm currently on the live debugging system:
||0:0:003> ||
. 0 Live user mode: <Local>
1 Image file: C:\Windows\winsxs\[...]\win32k.sys
I thought I could switch to the other system now, but that does not work:
||0:0:003> ||1s
^ Illegal debuggee error in '||1s'
I would not have worried too much, but it can't find the symbols of win32k in this case:
||0:0:003> .reload
Reloading current modules
...........................
||0:0:003> dt win32k!_W32Process
Symbol win32k!_W32Process not found.
The problem is not in the || command, it's in the .opendump command.
The help says:
After you use the .opendump command, you must use the g (Go) command to finish loading the dump file.
Be aware that this will also run your live process. Therefore, freeze the threads first (~*f) and unfreeze later (~*u).
After that you can switch the system and display the type:
||1:1:004> ||
0 Live user mode: <Local>
. 1 Image file: C:\Windows\winsxs\[...]\win32k.sys
||1:1:004> dt _W32Process
win32k!_W32PROCESS
+0x000 Process : Ptr64 _EPROCESS
+0x008 RefCount : Uint4B
+0x00c W32PF_Flags : Uint4B
[...]

Registering a new Command Line Option in RYU App

I need to be able to read in a path file from my simple_switch.py application.I have added the following code to my simple_switch.py in python.
LOG = logging.getLogger(__name__)
CONF = cfg.CONF
CONF.register_cli_opts([
cfg.StrOpt('path-file', default='test.txt',
help='path-file')
])
I attempt to start the application as follows.
bin/ryu-manager --observe-links --path-file test.txt ryu/app/simple_switch.py
However I get the following error.
usage: ryu-manager [-h] [--app-lists APP_LISTS] [--ca-certs CA_CERTS]
[--config-dir DIR] [--config-file PATH]
[--ctl-cert CTL_CERT] [--ctl-privkey CTL_PRIVKEY]
[--default-log-level DEFAULT_LOG_LEVEL] [--explicit-drop]
[--install-lldp-flow] [--log-config-file LOG_CONFIG_FILE]
[--log-dir LOG_DIR] [--log-file LOG_FILE]
[--log-file-mode LOG_FILE_MODE]
[--neutron-admin-auth-url NEUTRON_ADMIN_AUTH_URL]
[--neutron-admin-password NEUTRON_ADMIN_PASSWORD]
[--neutron-admin-tenant-name NEUTRON_ADMIN_TENANT_NAME]
[--neutron-admin-username NEUTRON_ADMIN_USERNAME]
[--neutron-auth-strategy NEUTRON_AUTH_STRATEGY]
[--neutron-controller-addr NEUTRON_CONTROLLER_ADDR]
[--neutron-url NEUTRON_URL]
[--neutron-url-timeout NEUTRON_URL_TIMEOUT]
[--noexplicit-drop] [--noinstall-lldp-flow]
[--noobserve-links] [--nouse-stderr] [--nouse-syslog]
[--noverbose] [--observe-links]
[--ofp-listen-host OFP_LISTEN_HOST]
[--ofp-ssl-listen-port OFP_SSL_LISTEN_PORT]
[--ofp-tcp-listen-port OFP_TCP_LISTEN_PORT] [--use-stderr]
[--use-syslog] [--verbose] [--version]
[--wsapi-host WSAPI_HOST] [--wsapi-port WSAPI_PORT]
[--test-switch-dir TEST-SWITCH_DIR]
[--test-switch-target TEST-SWITCH_TARGET]
[--test-switch-tester TEST-SWITCH_TESTER]
[app [app ...]]
ryu-manager: error: unrecognized arguments: --path-file
It does look like I need to register a new command line option somewhere before I can use it.Can some-one point out to me how to do that? Also can someone explain how to access the file(text.txt) inside the program?
You're on the right track, however the CONF entry that you are creating actually needs to be loaded before your app is loaded, otherwise ryu-manager has no way of knowing it exists!
The file you are looking for is flags.py, under the ryu directory of the source tree (or under the root installation directory).
This is how the ryu/tests/switch/tester.py Ryu app defines it's own arguments, so you might use that as your reference:
CONF.register_cli_opts([
# tests/switch/tester
cfg.StrOpt('target', default='0000000000000001', help='target sw dp-id'),
cfg.StrOpt('tester', default='0000000000000002', help='tester sw dp-id'),
cfg.StrOpt('dir', default='ryu/tests/switch/of13',
help='test files directory')
], group='test-switch')
Following this format, the CONF.register_cli_opts takes an array of config types exactly as you have done it (see ryu/cfg.py for the different types available).
You'll notice that when you run the ryu-manager help, i.e.
ryu-manager --help
the list that comes up is sorted by application (e.g. the group of arguments under 'test-switch options'). For that reason, you will want to specify a group name for your set of commands.
Now let us say that you used the group name 'my-app' and have an argument named 'path-file' in that group, the command line argument will be --my-app-path-file (this can get a little long), while you can access it in your application like this:
from ryu import cfg
CONF = cfg.CONF
path_file = CONF['my-app']['path_file']
Note the use of dash versus the use of underscores.
Cheers!

opensplice dds Hello Word Example

I am posting here after asking the question at the openslice dds forum, and not receiving any reply.I am trying to use opensplice dds on a ubuntu machine. I am not sure if it serves as a proof of proper installation, but I have pasted my release.com file below. Now, I was able to run the ping pong example just fine. But when I ran the executable sac_helloworld_pub ( HelloWorld example in the C programming language), I got the following error
vishal#expmach:~/HDE/x86.linux2.6/examples/dcps/HelloWorld/c/standalone$ ./sac_helloworld_pub
Error in DDS_DomainParticipantFactory_create_participant: Creation failed: invalid handle
I did some searching, and it looks like I need to be running the ospl start command from the terminal. But when I do so, I get a No command ospl found message. Below is the release.comfile's contents
echo "<<< OpenSplice HDE Release V6.3.130716OSS For x86.linux2.6, Date 2013-07-30 >>>"
if [ "${SPLICE_ORB:=}" = "" ]
then
SPLICE_ORB=DDS_OpenFusion_1_6_1
export SPLICE_ORB
fi
if [ "${SPLICE_JDK:=}" = "" ]
then
SPLICE_JDK=jdk
export SPLICE_JDK
fi
OSPL_HOME="/home/vishal/HDE/x86.linux2.6"
OSPL_TARGET=x86.linux2.6
PATH=$OSPL_HOME/bin:$PATH
LD_LIBRARY_PATH=$OSPL_HOME/lib${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH
CPATH=$OSPL_HOME/include:$OSPL_HOME/include/sys:${CPATH:=}
OSPL_URI=file://$OSPL_HOME/etc/config/ospl.xml
OSPL_TMPL_PATH=$OSPL_HOME/etc/idlpp
. $OSPL_HOME/etc/java/defs.$SPLICE_JDK
export OSPL_HOME OSPL_TARGET PATH LD_LIBRARY_PATH CPATH OSPL_TMPL_PATH OSPL_URI
$#
release.com (END)
Sorry for the holidays-driven lack of 'reactivity' on the OpenSplice forum .. I've answered your question there though ..
Here's that same answer for completeness:
*For the 6.3 community-edition, the deployment-model changed from shared-memory (v5.x) to the so-called single-process standalone deployment mode where the middleware is simply linked (as libraries) with the application so you don't need to start any daemons first (as was the case for the federated 'shared-memory' mode that was the default in V5).
So its OK that you get the error when trying to call 'ospl' as thats not used anymore so isn't in the distribution.
Now to your issue, your release.com looks OK to me, but perhaps you didn't actually 'source' it in your environment i.e. calling it with a '.' in front of it:
promtp> . release.com
you can verify that by doing an 'echo $OSPL_HOME' in your shell and see if it actually shows the value of the env. variable as set by the release.com.
Hope that helps,
-Hans*

Unexpected error while loading data

I am getting an "Unexpected" error. I tried a few times, and I still could not load the data. Is there any other way to load data?
gs://log_data/r_mini_raw_20120510.txt.gzto567402616005:myv.may10c
Errors:
Unexpected. Please try again.
Job ID: job_4bde60f1c13743ddabd3be2de9d6b511
Start Time: 1:48pm, 12 May 2012
End Time: 1:51pm, 12 May 2012
Destination Table: 567402616005:myvserv.may10c
Source URI: gs://log_data/r_mini_raw_20120510.txt.gz
Delimiter: ^
Max Bad Records: 30000
Schema:
zoneid: STRING
creativeid: STRING
ip: STRING
update:
I am using the file that can be found here:
http://saraswaticlasses.net/bad.csv.zip
bq load -F '^' --max_bad_record=30000 mycompany.abc bad.csv id:STRING,ceid:STRING,ip:STRING,cb:STRING,country:STRING,telco_name:STRING,date_time:STRING,secondary:STRING,mn:STRING,sf:STRING,uuid:STRING,ua:STRING,brand:STRING,model:STRING,os:STRING,osversion:STRING,sh:STRING,sw:STRING,proxy:STRING,ah:STRING,callback:STRING
I am getting an error "BigQuery error in load operation: Unexpected. Please try again."
The same file works from Ubuntu while it does not work from CentOS 5.4 (Final)
Does the OS encoding need to be checked?
The file you uploaded has an unterminated quote. Can you delete that line and try again? I've filed an internal bigquery bug to be able to handle this case more gracefully.
$grep '"' bad.csv
3000^0^1.202.218.8^2f1f1491^CN^others^2012-05-02 20:35:00^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
When I run a load from my workstation (Ubuntu), I get a warning about the line in question. Note that if you were using a larger file, you would not see this warning, instead you'd just get a failure.
$bq show --format=prettyjson -j job_e1d8636e225a4d5f81becf84019e7484
...
"status": {
"errors": [
{
"location": "Line:29057 / Field:12",
"message": "Missing close double quote (\") character: field starts with: <Mozilla/>",
"reason": "invalid"
}
]
My suspicion is that you have rows or fields in your input data that exceed the 64 KB limit. Perhaps re-check the formatting of your data, check that it is gzipped properly, and if all else fails, try importing uncompressed data. (One possibility is that the entire compressed file is being interpreted as a single row/field that exceeds the aforementioned limit.)
To answer your original question, there are a few other ways to import data: you could upload directly from your local machine using the command-line tool or the web UI, or you could use the raw API. However, all of these mechanisms (including the Google Storage import that you used) funnel through the same CSV parser, so it's possible that they'll all fail in the same way.