I have a problem with a PostgreSQL script running with docker-compose. I want to create only ONE table in PostgreSQL.
My script is:
DROP TABLE IF EXISTS person;
CREATE TABLE person
(
id INT NOT NULL,
name VARCHAR(255),
surname VARCHAR(255),
age INT,
email VARCHAR(255) UNIQUE
);
The log from docker-compose, with the error message, is:
postgres | Success. You can now start the database server using:
postgres |
postgres | pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres |
postgres | waiting for server to start....2022-07-11 08:39:50.970 UTC [48] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
postgres | 2022-07-11 08:39:50.978 UTC [48] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres | 2022-07-11 08:39:51.000 UTC [49] LOG: database system was shut down at 2022-07-11 08:39:50 UTC
postgres | 2022-07-11 08:39:51.008 UTC [48] LOG: database system is ready to accept connections
postgres | done
postgres | server started
postgres | CREATE DATABASE
postgres |
postgres |
postgres | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/create-table-person.sql
postgres | 2022-07-11 08:39:51.431 UTC [62] ERROR: syntax error at or near "CREATE" at character 30
postgres | 2022-07-11 08:39:51.431 UTC [62] STATEMENT: DROP TABLE IF EXISTS person
postgres | CREATE TABLE person
postgres | (
postgres | id integer NOT NULL PRIMARY KEY,
postgres | name VARCHAR(255),
postgres | surname VARCHAR(255),
postgres | age integer,
postgres | email VARCHAR(255)
postgres | );
postgres | psql:/docker-entrypoint-initdb.d/create-table-person.sql:9: ERROR: syntax error at or near "CREATE"
postgres | LINE 2: CREATE TABLE person
postgres | ^
postgres exited with code 3
and my Dockerfile for PostgreSQL is:
FROM postgres
ENV POSTGRES_DB testdatabase1
COPY create-table-person.sql /docker-entrypoint-initdb.d/
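For context, a minimal docker-compose.yml that builds this image might look like the following; the service name, build context, and password are assumptions rather than part of the original setup:
services:
  postgres:
    build: .                      # assumed: directory containing the Dockerfile above
    container_name: postgres
    environment:
      POSTGRES_PASSWORD: example  # placeholder; the official postgres image requires a password
    ports:
      - "5432:5432"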
I have the following query, which returns created_at timestamps. I would like to convert them to total hours from now. Is there an easy way to make that conversion and print the result in total hours?
MariaDB version 10.5.12-MariaDB-1:10.5.12+maria~focal-log
MariaDB [nova]> select hostname, uuid, instances.created_at, instances.deleted_at, json_extract(flavor, '$.cur.*."name"') AS FLAVOR from instances join instance_extra on instances.uuid = instance_extra.instance_uuid WHERE (vm_state='active' OR vm_state='stopped');
+----------+--------------------------------------+---------------------+------------+--------------+
| hostname | uuid | created_at | deleted_at | FLAVOR |
+----------+--------------------------------------+---------------------+------------+--------------+
| vm1 | ef6380b4-5455-48f8-9e4b-3d04199be3f5 | 2023-01-05 14:25:51 | NULL | ["tempest2"] |
+----------+--------------------------------------+---------------------+------------+--------------+
1 row in set (0.001 sec)
Try it like this:
SELECT hostname, UUID, instances.created_at,
TIMESTAMPDIFF(hour,instances.created_at, NOW()) AS HOURDIFF,
instances.deleted_at,
JSON_EXTRACT(flavor, '$.cur.*."name"') AS FLAVOR
FROM instances
JOIN instance_extra ON instances.uuid = instance_extra.instance_uuid
WHERE (vm_state='active' OR vm_state='stopped');
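As a quick sanity check, TIMESTAMPDIFF can be tried with literal timestamps (these values are made up, not taken from the instances table):
SELECT TIMESTAMPDIFF(HOUR, '2023-01-05 14:25:51', '2023-01-07 02:25:51') AS HOURDIFF;
-- returns 36, the number of full hours between the two timestamps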
I am unable to delete a row in a Postgres DB. This is what it shows:
candlepin=# delete from cp_upstream_consumer where uuid = 'd88b0079-a271-4ee7-a7fe-ee3a1a7d5';
Cancel request sent
ERROR: canceling statement due to user request
CONTEXT: while locking tuple (0,5) in relation "cp_owner"
SQL statement "SELECT 1 FROM ONLY "public"."cp_owner" x WHERE $1::pg_catalog.text OPERATOR(pg_catalog.=) "upstream_id"::pg_catalog.text FOR KEY SHARE OF x"
This had been hanging for several minutes. After I force-quit it, it says the row is related to the cp_owner table. But if we try to delete from cp_owner, the DB may crash. So is there any other way to delete the entry in the cp_upstream_consumer table? Since I am new to Postgres, I am unable to figure out the possible alternatives.
This is what I have in the cp_owner table:
candlepin=# select * from cp_owner;
-[ RECORD 1 ]----------------+---------------------------------
id                           | 021308a2752d917a01752d91b05d0001
created                      | 2020-10-16 00:12:03.997+05:30
updated                      | 2021-04-16 16:08:32.789+05:30
contentprefix                | /COT/$env
defaultservicelevel          |
displayname                  | COT
account                      | COT
parent_owner                 |
upstream_id                  | 021308a278d03bc50178da42c1a402bd
loglevel                     |
autobind_disabled            | f
content_access_mode          | entitlement
content_access_mode_list     | entitlement
last_refreshed               | 2021-04-16 16:08:32.781+05:30
autobind_hypervisor_disabled | f
EDIT
After killing a couple of Postgres processes I tried to re-run it, and this is the new error when I run the delete command:
candlepin=# delete from cp_upstream_consumer where uuid = 'd88b0079-a271-4ee7-a7fe-ee3a1a7d5';
ERROR: update or delete on table "cp_upstream_consumer" violates foreign key constraint "fk_upstream_consumer_id" on table "cp_owner"
DETAIL: Key (id)=(021308a278d03bc50178da42c1a402bd) is still referenced from table "cp_owner".
Thanks in advance
cp_owner has a foreign key to cp_upstream_consumer, and there is a long-running transaction that holds a lock on the referencing row in cp_owner.
Kill the database session that holds the lock and the DELETE will no longer hang; as your edit shows, it will then fail with a foreign key violation for as long as that cp_owner row still references the consumer you are deleting.
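A minimal sketch of how to find and then terminate the blocking session, assuming you have the privileges to do so (the relation filter is illustrative):
-- find backends holding locks on cp_owner
SELECT l.pid, a.state, a.xact_start, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'cp_owner'::regclass;
-- terminate the offending backend, using the pid reported above
SELECT pg_terminate_backend(<pid>);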
We have followed the steps below. First, we imported a table from MySQL to the HDFS location user/hive/warehouse/orders/; the table schema is:
mysql> describe orders;
+-------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| order_id | int(11) | YES | | NULL | |
| order_date | varchar(30) | YES | | NULL | |
| order_customer_id | int(11) | YES | | NULL | |
| order_items | varchar(30) | YES | | NULL | |
+-------------------+-------------+------+-----+---------+-------+
We then created an external table in Hive using the same data from step (1):
CREATE EXTERNAL TABLE orders
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');
Sqoop command:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=root \
--password=cloudera \
--table orders \
--target-dir /user/hive/warehouse/retail_stage.db/orders \
--as-avrodatafile \
--split-by order_id
Describing the orders table returns an error instead of the schema; we tried many combinations but failed:
hive> describe orders;
OK
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from deserializer
literal string from deserializer
Time taken: 1.15 seconds, Fetched: 7 row(s)
The same thing worked with --as-textfile, whereas it throws this error with --as-avrodatafile.
We referred to some Stack Overflow posts but were not able to resolve it. Any idea?
I think the reference to the Avro schema file in TBLPROPERTIES should be checked.
Does the following resolve it?
hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
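If that file is missing (the Sqoop compile directory is temporary and often gets cleaned up), one common approach is to copy the generated orders.avsc to a stable HDFS path and point the table at it; the target path below is an assumption:
hdfs dfs -put orders.avsc /user/cloudera/orders.avsc
ALTER TABLE orders SET TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/orders.avsc');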
I was able to create the exact scenario and select from the Hive table:
hive> CREATE EXTERNAL TABLE sqoop_test
> COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/user/cloudera/categories/'
> TBLPROPERTIES
> ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
> ;
OK
Time taken: 1.471 seconds
hive> select * from sqoop_test;
OK
1 2 Football
2 2 Soccer
3 2 Baseball & Softball
Although Impala is much faster than Hive, we used Hive because it supports complex (nested) data types such as arrays and maps.
I notice that Impala, as of CDH5.5, now supports complex data types. Since it's also possible to run Hive UDFs in Impala, we can probably do everything we want in Impala, but much, much faster. That's great news!
As I scan through the documentation, I see that Impala expects data to be stored in Parquet format. My data, in its raw form, happens to be a two-column CSV where the first column is an ID, and the second column is a pipe-delimited array of strings, e.g.:
123,ASDFG|SDFGH|DFGHJ|FGHJK
234,QWERT|WERTY|ERTYU
A Hive table was created:
CREATE TABLE `id_member_of`(
`id` INT,
`member_of` ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
The raw data was loaded into the Hive table:
LOAD DATA LOCAL INPATH 'raw_data.csv' INTO TABLE id_member_of;
A Parquet version of the table was created:
CREATE TABLE `id_member_of_parquet` (
`id` STRING,
`member_of` ARRAY<STRING>)
STORED AS PARQUET;
The data from the CSV-backed table was inserted into the Parquet table:
INSERT INTO id_member_of_parquet SELECT id, member_of FROM id_member_of;
And the Parquet table is now queryable in Hive:
hive> select * from id_member_of_parquet;
123 ["ASDFG","SDFGH","DFGHJ","FGHJK"]
234 ["QWERT","WERTY","ERTYU"]
Strangely, when I query the same Parquet-backed table in Impala, it doesn't return the array column:
[hadoop01:21000] > invalidate metadata;
[hadoop01:21000] > select * from id_member_of_parquet;
+-----+
| id |
+-----+
| 123 |
| 234 |
+-----+
Question: What happened to the array column? Can you see what I'm doing wrong?
It turned out to be really simple: we can access the array by adding it to the FROM clause with a dot, e.g.:
Query: select * from id_member_of_parquet, id_member_of_parquet.member_of
+-----+-------+
| id | item |
+-----+-------+
| 123 | ASDFG |
| 123 | SDFGH |
| 123 | DFGHJ |
| 123 | FGHJK |
| 234 | QWERT |
| 234 | WERTY |
| 234 | ERTYU |
+-----+-------+
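For readability, the same Impala query can be written with explicit table aliases and the ITEM pseudo-column spelled out (the alias names are my own):
SELECT t.id, m.item
FROM id_member_of_parquet t, t.member_of m;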
The following query is running very slowly for me:
SELECT r.comp,b.comp,n.comp
FROM paths AS p
INNER JOIN comps AS r ON (p.root=r.id)
INNER JOIN comps AS b ON (p.base=b.id)
INNER JOIN comps AS n ON (p.name=n.id);
Running EXPLAIN (BUFFERS,ANALYZE) gives the following result:
http://explain.depesz.com/s/iKG
Is it (re)building a hash for the comps table for each alias? Anything I can do to make this faster? Note: running two separate queries to join the data myself is faster.
Postgres version: 9.1.9
Machine: Ubuntu 12.04 | 4-core Xeon 2.5GHz | 8GB of RAM
archiving=> \d+ comps
Table "public.comps"
Column | Type | Modifiers | Storage | Description
--------+--------+----------------------------------------------------+----------+-------------
id | bigint | not null default nextval('comps_id_seq'::regclass) | plain |
comp | text | not null | extended |
Indexes:
"comps_pkey" PRIMARY KEY, btree (id)
"comps_comp_key" UNIQUE CONSTRAINT, btree (comp)
"comps_comp_idx" btree (comp)
"comps_id_idx" btree (id)
Has OIDs: no
PostgreSQL does not have any special optimizations for self-joins, so this is expected behavior.
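As the question notes, splitting the work into separate queries and combining the results in the application can be faster in practice; a rough sketch of that idea (the exact split is an assumption, not part of the original answer):
-- 1. pull the id triples
SELECT root, base, name FROM paths;
-- 2. pull the lookup table once and resolve root/base/name to comp values client-side
SELECT id, comp FROM comps;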