Hive External Table Returning Zero Rows

I have a Snappy-compressed Parquet file in the local directory /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet.
When I create an external Hive table with the command below, the statement executes successfully, but select * from parquet_hive123456789 returns no rows.
CREATE EXTERNAL TABLE parquet_hive123456789 (
`ip` string,
`request` string,
`status` string,
`userid` string,
`bytes` string,
`agent` string,
`timestamp` timestamp
) STORED AS PARQUET
LOCATION '/home/hive/';
Using parquet-tools I am able to see the contents of the file:
parquet-tools show part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet
+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+
| ip | request | status | userid | bytes | agent | timestamp |
|-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------|
| 222.203.236.146 | GET /site/user_status.html HTTP/1.1 | 405 | 13 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.152.45.245 | GET /site/login.html HTTP/1.1 | 407 | 5 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.152.45.45 | GET /site/user_status.html HTTP/1.1 | 302 | 22 | 4096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.245.174.248 | GET /index.html HTTP/1.1 | 404 | 7 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.173.165.203 | GET /index.html HTTP/1.1 | 200 | 39 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.168.57.222 | GET /images/logo-small.png HTTP/1.1 | 404 | 2 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.152.45.245 | GET /images/track.png HTTP/1.1 | 405 | 5 | 278 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 | 407 | 39 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.245.174.248 | GET /images/track.png HTTP/1.1 | 302 | 7 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 | 200 | 39 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+
Can somebody please help?

LOCATION should be an HDFS directory, not a local one. A directory like '/home/hive' may exist in HDFS as well, but it is a bad idea to name a table location like this. It should be a table-specific name, because each table's data should live in its own location, separated from other tables. Usually a table directory looks like this: /user/hadoop/mytable - where mytable is the table name.
Put your file into an HDFS directory, for example like this (use your own HDFS path):
hdfs dfs -put /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet /user/hadoop/table_dir/
Check that the file exists in HDFS (use your HDFS path):
hdfs dfs -ls '/user/hadoop/table_dir/'
Then create the table (EXTERNAL or MANAGED does not matter in this context) with its location in HDFS: '/user/hadoop/table_dir/'.
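For example, a sketch of the corrected DDL, reusing the columns from the question and the example HDFS path above:
CREATE EXTERNAL TABLE parquet_hive123456789 (
`ip` string,
`request` string,
`status` string,
`userid` string,
`bytes` string,
`agent` string,
`timestamp` timestamp
) STORED AS PARQUET
LOCATION '/user/hadoop/table_dir/'; -- an HDFS path, not a local one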
Alternatively, you can create the table first and then load the local file into it using the LOAD DATA LOCAL INPATH command, as in the sketch below.
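A sketch of that alternative, using the original local path (LOCAL tells Hive to read from the client machine's filesystem rather than from HDFS):
LOAD DATA LOCAL INPATH '/home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet'
INTO TABLE parquet_hive123456789;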

Related

gfsh create command region-time-to-live not working as expected

When I create a region using the following command and run describe afterwards, it doesn't show me the region-time-to-live setting at all. Only when I use alter can I see entry-time-to-live and region-time-to-live set properly.
gfsh>create region --name=myRegion --type=REPLICATE --enable-statistics=true --entry-time-to-live-expiration=200 --region-time-to-live-expiration=2000
gfsh>describe region --name=myRegion
Type   | Name                       | Value
------ | -------------------------- | ---------------
Region | entry-time-to-live.timeout | 2000
       | data-policy                | REPLICATE
       | size                       | 0
       | statistics-enabled         | true
       | scope                      | distributed-ack
gfsh>alter region --name=myRegion --entry-time-to-live-expiration=200 --region-time-to-live-expiration=2000
gfsh>describe region --name=myRegion
Type   | Name                        | Value
------ | --------------------------- | ---------------
Region | entry-time-to-live.timeout  | 200
       | data-policy                 | REPLICATE
       | region-time-to-live.timeout | 2000
       | size                        | 0
       | statistics-enabled          | true
       | scope                       | distributed-ack
I believe this bug was already solved in the latest develop branch of Geode, specifically through GEODE-1897. Below is the output I see:
[gfsh ASCII-art banner] 1.5.0-SNAPSHOT
Monitor and Manage Apache Geode
gfsh>start locator --name=locator1
gfsh>start server --name=server1
gfsh>create region --name=myRegion --type=REPLICATE --enable-statistics=true --entry-time-to-live-expiration=200 --region-time-to-live-expiration=2000
Member | Status
------- | ---------------------------------------
server1 | Region "/myRegion" created on "server1"
gfsh>describe region --name=/myRegion
..........................................................
Name : myRegion
Data Policy : replicate
Hosting Members : server1
Non-Default Attributes Shared By Hosting Members
Type   | Name                        | Value
------ | --------------------------- | ---------------
Region | entry-time-to-live.timeout  | 200
       | data-policy                 | REPLICATE
       | region-time-to-live.timeout | 2000
       | size                        | 0
       | scope                       | distributed-ack
       | statistics-enabled          | true
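If you want to check whether the build you are running already includes the fix, the version command shows what you have (illustrative session; your output will differ):
gfsh>version
1.5.0-SNAPSHOT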
Hope this helps.
Cheers.

MariaDB v 10: console login failure

I have installed MariaDB server:
$ mysql --version
mysql Ver 15.1 Distrib 10.0.32-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
and have created a new user 'alex':
> SELECT User, Host FROM mysql.user;
+------------------+-----------+
| User | Host |
+------------------+-----------+
| root | 127.0.0.1 |
| root | ::1 |
| alex | localhost |
| debian-sys-maint | localhost |
| root | localhost |
| root | myhost |
+------------------+-----------+
I can connect to the server as 'alex'@'localhost' using the DBeaver client, but I cannot do the same from the console:
$ mysql -h localhost --user=alex --password=...
ERROR 1045 (28000): Access denied for user 'alex'@'localhost' (using password: YES)
I can connect as 'root' from the console but not as 'alex'. Permissions looked fine when I connected with DBeaver.
I can't reproduce the problem:
$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 44
Server version: 10.0.32-MariaDB mariadb.org binary distribution
MariaDB [(none)]> CREATE USER 'alex'@'localhost';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> SELECT `User`, `Host` FROM `mysql`.`user`;
+------------------+-----------+
| User | Host |
+------------------+-----------+
| root | 127.0.0.1 |
| root | ::1 |
| alex | localhost |
| debian-sys-maint | localhost |
| root | localhost |
| root | myhost |
+------------------+-----------+
6 rows in set (0.01 sec)
MariaDB [(none)]> exit
Bye
$ mysql -h localhost --user=alex --password
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 48
Server version: 10.0.32-MariaDB mariadb.org binary distribution
MariaDB [(none)]>
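If it still fails on your side, a diagnostic worth trying (my suggestion, not something from the transcript above) is to check which password hash and authentication plugin the account actually has, and to force a TCP connection to rule out a socket-versus-TCP difference between the console and DBeaver:
$ mysql -u root -p -e "SELECT User, Host, Password, plugin FROM mysql.user WHERE User='alex';"
$ mysql --protocol=TCP -h 127.0.0.1 -u alex -p
An empty plugin column means ordinary password authentication, and an empty Password column would explain the console rejecting a non-empty password.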

Please help me with RESTful http caching

I am developing a mobile application that loads data from our REST server, for example this list of products. Products are stored in a database table on the server with the following structure:
| id | updated_at | name | price |
|----|---------------------|-------|-------|
| 1 | 23.08.2015 06:00:00 | bread | 10 |
| 2 | 24.08.2015 12:00:00 | butter| 55 |
| 3 | 24.08.2015 12:00:00 | cheese| 180 |
| 4 | 24.08.2015 18:00:00 | sugar | 80 |
My goal is to understand the standard caching scheme. Currently my caching works this way:
1)
GET /api/v1/products HTTP/1.1
Accept: application/json
HTTP/1.1 200 OK
Last-Modified: 24.08.2015 18:00:00
Content-Type: application/json
[
{ "id" : "1", "name" : "bread", "price" : 10 },
{ "id" : "2", "name" : "butter", "price" : 55 },
{ "id" : "3", "name" : "cheese", "price" : 180 },
{ "id" : "4", "name" : "sugar", "price" : 80 }
]
After the server gave me this answer, I cache the data in the same kind of table in the local database, with one difference: I set updated_at of each element to the Last-Modified value that came from the server. So my local database looks as follows:
| id | updated_at | name | price |
|----|---------------------|-------|-------|
| 1 | 24.08.2015 18:00:00 | bread | 10 |
| 2 | 24.08.2015 18:00:00 | butter| 55 |
| 3 | 24.08.2015 18:00:00 | cheese| 180 |
| 4 | 24.08.2015 18:00:00 | sugar | 80 |
Then I send the following request:
2)
GET /api/v1/products HTTP/1.1
Accept: application/json
If-Modified-Since: 24.08.2015 18:00:00
HTTP/1.1 304 Not Modified
[Empty body]
In the If-Modified-Since request header I substitute the latest date from the updated_at field; since no server data has been updated or added, the server returns 304. Now suppose the server adds one entry and changes an existing one:
| id | updated_at | name | price |
|----|---------------------|-------|-------|
| 1 | 23.08.2015 06:00:00 | bread | 10 |
| 2 | 24.08.2015 12:00:00 | butter| 55 |
| 3 | 24.08.2015 12:00:00 | cheese| 180 |
| 4 | 26.08.2015 09:00:00 | sugar | 90 |
| 5 | 26.08.2015 08:00:00 | flour | 60 |
Again I send a request:
3)
GET /api/v1/products HTTP/1.1
Accept: application/json
If-Modified-Since: 24.08.2015 18:00:00
HTTP/1.1 200 OK
Last-Modified: 26.08.2015 09:00:00
Content-Type: application/json
[
{ "id" : "4", "name" : "sugar", "price" : 90 },
{ "id" : "5", "name" : "flour", "price" : 60 }
]
After receiving this response, I change the record with id == 4 in accordance with the answer and add the new entry. For both of these records I set updated_at to the Last-Modified value from the server's response. I was completely satisfied with my caching implementation until I introduced selections via GET request parameters. Let's say I cleared the local cache and sent the following query:
4)
GET /api/v1/products?min_price=60&max_price=100 HTTP/1.1
Accept: application/json
HTTP/1.1 200 OK
Last-Modified: 26.08.2015 09:00:00
Content-Type: application/json
[
{ "id" : "4", "name" : "sugar", "price" : 90 },
{ "id" : "5", "name" : "flour", "price" : 60 }
]
Here I pass min_price=60 and max_price=100 in the GET parameter string. The server selects among the existing elements according to my request and gives me the 2 matching ones; following the caching scheme, it substitutes the maximum updated_at among the selected elements into Last-Modified. My client application caches the returned data. The local database then looks as follows:
| id | updated_at | name | price |
|----|---------------------|-------|-------|
| 4 | 26.08.2015 09:00:00 | sugar | 90 |
| 5 | 26.08.2015 09:00:00 | flour | 60 |
Now, when I send a request for all products:
5)
GET /api/v1/products HTTP/1.1
Accept: application/json
If-Modified-Since: 26.08.2015 09:00:00
HTTP/1.1 304 Not Modified
[Empty body]
It turns out that in this situation I can never get the older products (those with a smaller updated_at), because I am already sending the maximum updated_at, so my caching scheme has broken down. How should standard HTTP caching be used for dynamic data?
If you're going to request individual resources, you can use the individual updated_at value. But if you want to request the whole collection using If-Modified-Since, you need to separately track the last time you requested the whole collection. Otherwise you're going to run into exactly the issue you're seeing, because individual records may be updated independently of the whole collection.
Pretend you were using ETags instead of the last modified date. You wouldn't send up the ETag for an individual record and expect to get back the correct collection results, would you?
Another reasonable option is to request the collection without an ETag, and then do individual GETs to retrieve the resources, sending up If-Modified-Since for each individual resource. That approach may or may not be too noisy for you.
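Two practical notes if you go the If-Modified-Since route: HTTP validators must be RFC 1123 dates (e.g. Wed, 26 Aug 2015 09:00:00 GMT), and the validator has to be stored per request URL rather than per record. A minimal shell sketch of per-URL tracking, with a hypothetical host and cache directory:
# Keep one validator file per distinct request URL (hypothetical paths).
URL='http://api.example.com/api/v1/products?min_price=60&max_price=100'
KEY=$(printf '%s' "$URL" | md5sum | cut -d' ' -f1)
DIR="$HOME/.cache/products"; mkdir -p "$DIR"
# Send If-Modified-Since only if we already hold a validator for this URL.
COND=()
[ -f "$DIR/$KEY.lm" ] && COND=(-H "If-Modified-Since: $(cat "$DIR/$KEY.lm")")
STATUS=$(curl -s -o "$DIR/$KEY.tmp" -D "$DIR/$KEY.hdr" -w '%{http_code}' "${COND[@]}" "$URL")
if [ "$STATUS" = "200" ]; then
  mv "$DIR/$KEY.tmp" "$DIR/$KEY.body"   # fresh body replaces the cached one
  sed -n 's/^[Ll]ast-[Mm]odified: *//p' "$DIR/$KEY.hdr" | tr -d '\r' > "$DIR/$KEY.lm"
else
  rm -f "$DIR/$KEY.tmp"                 # 304 Not Modified: keep the cached body
fi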

compute engine load balancer UDP/DNS responses dropped

I have been testing out GCE and its load-balancing capabilities, but have been seeing some unexpected results.
The trial configuration involves 2 instances acting as DNS resolvers in a target pool, plus a 3rd test instance. There is also an HTTP server running on the hosts. No health check scripts have been added.
DNS request to individual instance public IP (from ANY) - OK
HTTP request to individual instance public IP (from ANY) - OK
HTTP request to load balance IP (from ANY) - OK
DNS request to load balance IP (from an instance in the target pool) - OK
DNS request to load balance IP (from an instance in the same network - but not in the target pool) - NOK
DNS request to load balance IP (other) - NOK
I can see in the instance logs that the DNS requests arrive in all cases and are distributed evenly, though the replies don't seem to get back to the originator.
This behavior seems unexpected. I've played with session affinity with similar results, though the default behavior is the most desired option.
I have hit a wall. Are there any ideas to try?
Information on the setup:
$ gcutil listhttphealthchecks
+------+------+------+
| name | host | port |
+------+------+------+
$ gcutil listtargetpools
+----------+-------------+
| name | region |
+----------+-------------+
| dns-pool | us-central1 |
+----------+-------------+
$ gcutil listforwardingrules
+---------+-------------+-------------+
| name | region | ip |
+---------+-------------+-------------+
| dns-tcp | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
| dns-udp | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
| http | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
$ gcutil getforwardingrule dns-udp
+---------------+----------------------------------+
| name | dns-udp |
| description | |
| creation-time | 2013-12-28T12:28:05.816-08:00 |
| region | us-central1 |
| ip | 8.34.215.45 |
| protocol | UDP |
| port-range | 53-53 |
| target | us-central1/targetPools/dns-pool |
+---------------+----------------------------------+
$ gcutil gettargetpool dns-pool
+------------------+-------------------------------+
| name | dns-pool |
| description | |
| creation-time | 2013-12-28T11:48:08.896-08:00 |
| health-checks | |
| session-affinity | NONE |
| failover-ratio | |
| backup-pool | |
| instances | us-central1-a/instances/dns-1 |
| | us-central1-b/instances/dns-2 |
+------------------+-------------------------------+
[dns-1 ~]$ curl "http://metadata/computeMetadata/v1/instance/network-interfaces/?recursive=true" -H "X-Google-Metadata-Request: True"
[{"accessConfigs":[{"externalIp":"162.222.178.116","type":"ONE_TO_ONE_NAT"}],"forwardedIps":["8.34.215.45"],"ip":"10.240.157.97","network":"projects/763472520840/networks/default"}]
[dns-2 ~]$ curl "http://metadata/computeMetadata/v1/instance/network-interfaces/?recursive=true" -H "X-Google-Metadata-Request: True"
[{"accessConfigs":[{"externalIp":"8.34.215.162","type":"ONE_TO_ONE_NAT"}],"forwardedIps":["8.34.215.45"],"ip":"10.240.200.109","network":"projects/763472520840/networks/default"}]
$ gcutil getfirewall dns2
+---------------+------------------------------------+
| name | dns2 |
| description | Allow the incoming service traffic |
| creation-time | 2013-12-28T10:35:18.185-08:00 |
| network | default |
| source-ips | 0.0.0.0/0 |
| source-tags | |
| target-tags | |
| allowed | tcp: 53 |
| allowed | udp: 53 |
| allowed | tcp: 80 |
| allowed | tcp: 443 |
+---------------+------------------------------------+
The instances are CentOS and have their iptables firewalls disabled.
Reply from an instance in the target pool:
[dns-1 ~]$ nslookup test 8.34.215.45 | grep answer
Non-authoritative answer:
[dns-1 ~]$
Reply from the other instance in the target pool:
[dns-2 ~]$ nslookup test 8.34.215.45 | grep answer
Non-authoritative answer:
[dns-2 ~]$
No reply from the instance not in the target pool when querying the load-balanced IP. However, it gets a reply from all the other addresses:
[dns-3 ~]$ nslookup test 8.34.215.45 | grep answer
[dns-3 ~]$
[dns-3 ~]$ nslookup test 8.34.215.162 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 10.240.200.109 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 10.240.157.97 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 162.222.178.116 | grep answer
Non-authoritative answer:
-- Update --
Added a health check so that the instances wouldn't be marked as UNHEALTHY. However, I got the same result.
$ gcutil gettargetpoolhealth dns-pool
+-------------------------------+-------------+--------------+
| instance | ip | health-state |
+-------------------------------+-------------+--------------+
| us-central1-a/instances/dns-1 | 8.34.215.45 | HEALTHY |
+-------------------------------+-------------+--------------+
| us-central1-b/instances/dns-2 | 8.34.215.45 | HEALTHY |
+-------------------------------+-------------+--------------+
-- Update --
It looks like the DNS service is not responding from the same IP that the request came in on. This is surely the reason it doesn't appear to be responding.
0.000000 162.222.178.130 -> 8.34.215.45 DNS 82 Standard query 0x5323 A test.internal
2.081868 10.240.157.97 -> 162.222.178.130 DNS 98 Standard query response 0x5323 A 54.122.122.227
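Not an answer from the thread, but given that capture the usual remedy is to make the resolver own the load-balanced IP, so that replies are sourced from the address the query arrived on. A sketch assuming BIND (named) on the CentOS instances, with the addresses from the listings above:
# Assign the forwarded IP locally so the daemon can bind to it.
sudo ip addr add 8.34.215.45/32 dev lo
# Then add the LB IP to the listen-on list in /etc/named.conf, e.g.:
#   options {
#     listen-on port 53 { 127.0.0.1; 10.240.157.97; 8.34.215.45; };
#   };
sudo service named restart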

Is it possible to view RabbitMQ message contents directly from the command line?

Is it possible to view RabbitMQ message contents directly from the command line?
sudo rabbitmqctl list_queues lists the queues.
Is there any command like sudo rabbitmqctl list_queue_messages <queue_name>?
You should enable the management plugin.
rabbitmq-plugins enable rabbitmq_management
See here:
http://www.rabbitmq.com/plugins.html
And here for the specifics of management.
http://www.rabbitmq.com/management.html
Finally, once it is set up, you will need to follow the instructions below to install and use the rabbitmqadmin tool, which can be used to fully interact with the system.
http://www.rabbitmq.com/management-cli.html
For example:
rabbitmqadmin get queue=<QueueName> requeue=false
will give you the first message off the queue.
Here are the commands I use to get the contents of the queue:
RabbitMQ version 3.1.5 on Fedora Linux, using https://www.rabbitmq.com/management-cli.html
Here are my exchanges:
eric@dev ~ $ sudo python rabbitmqadmin list exchanges
+-------+--------------------+---------+-------------+---------+----------+
| vhost | name | type | auto_delete | durable | internal |
+-------+--------------------+---------+-------------+---------+----------+
| / | | direct | False | True | False |
| / | kowalski | topic | False | True | False |
+-------+--------------------+---------+-------------+---------+----------+
Here is my queue:
eric@dev ~ $ sudo python rabbitmqadmin list queues
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
| vhost | name | auto_delete | consumers | durable | exclusive_consumer_tag | idle_since | memory | messages | messages_ready | messages_unacknowledged | node | policy | status |
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
| / | myqueue | False | 0 | True | | 2014-09-10 13:32:18 | 13760 | 0 | 0 | 0 |rabbit@ip-11-1-52-125| | running |
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
Cram some items into myqueue:
curl -i -u guest:guest http://localhost:15672/api/exchanges/%2f/kowalski/publish -d '{"properties":{},"routing_key":"abcxyz","payload":"foobar","payload_encoding":"string"}'
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Wed, 10 Sep 2014 17:46:59 GMT
content-type: application/json
Content-Length: 15
Cache-Control: no-cache
{"routed":true}
See the messages in the queue:
eric@dev ~ $ sudo python rabbitmqadmin get queue=myqueue requeue=true count=10
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+
| routing_key | exchange | message_count | payload | payload_bytes | payload_encoding | properties | redelivered |
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+
| abcxyz | kowalski | 10 | foobar | 6 | string | | True |
| abcxyz | kowalski | 9 | {'testdata':'test'} | 19 | string | | True |
| abcxyz | kowalski | 8 | {'mykey':'myvalue'} | 19 | string | | True |
| abcxyz | kowalski | 7 | {'mykey':'myvalue'} | 19 | string | | True |
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+
I wrote rabbitmq-dump-queue, which allows dumping messages from a RabbitMQ queue to local files, requeuing the messages in their original order.
Example usage (to dump the first 50 messages of queue incoming_1):
rabbitmq-dump-queue -url="amqp://user:password@rabbitmq.example.com:5672/" -queue=incoming_1 -max-messages=50 -output-dir=/tmp
If you want multiple messages from a queue, say 10 messages, the command to use is:
rabbitmqadmin get queue=<QueueName> ackmode=ack_requeue_true count=10
This is how it looks in the management web interface, available at http://localhost:15672.
If you don't want the messages requeued, just change ackmode to ack_requeue_false.
You can use the RabbitMQ HTTP API to get the count of messages, or the messages themselves:
/api/queues/vhost/name/get
Get messages from a queue. (This is not an HTTP GET, as it will alter the state of the queue.) You should post a body looking like:
{"count":5,"requeue":true,"encoding":"auto","truncate":50000}
count controls the maximum number of messages to get. You may get fewer messages than this if the queue cannot immediately provide them.
requeue determines whether the messages will be removed from the queue. If requeue is true they will be requeued - but their redelivered flag will be set.
encoding must be either "auto" (in which case the payload will be returned as a string if it is valid UTF-8, and base64 encoded otherwise), or "base64" (in which case the payload will always be base64 encoded).
If truncate is present it will truncate the message payload if it is larger than the size given (in bytes).
truncate is optional; all other keys are mandatory.
Please note that the publish / get paths in the HTTP API are intended for injecting test messages, diagnostics etc - they do not implement reliable delivery and so should be treated as a sysadmin's tool rather than a general API for messaging.
http://hg.rabbitmq.com/rabbitmq-management/raw-file/rabbitmq_v3_1_3/priv/www/api/index.html
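For example, a sketch of that call with curl, assuming the default guest:guest credentials, the default vhost "/" (URL-encoded as %2f), and a queue named myqueue:
curl -i -u guest:guest -H 'content-type: application/json' \
    -X POST http://localhost:15672/api/queues/%2f/myqueue/get \
    -d '{"count":5,"requeue":true,"encoding":"auto","truncate":50000}'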
A bit late to this, but yes, RabbitMQ has a built-in tracer that allows you to see the incoming messages in a log. When enabled, you can just tail -f /var/tmp/rabbitmq-tracing/.log (on Mac) to watch the messages.
A detailed description is here: http://www.mikeobrien.net/blog/tracing-rabbitmq-messages
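For reference (not from the original answer): the files under /var/tmp/rabbitmq-tracing are produced by the rabbitmq_tracing plugin, which is enabled the same way as the management plugin and then configured from the management UI:
rabbitmq-plugins enable rabbitmq_tracing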