Insufficient Resources error while inserting into SQL table using Vertica

Insufficient Resources error while inserting into SQL table using Vertica - sql

I'm running a Python script to load data from a DataFrame into a SQL Table. However, the insert command is throwing this error:
(pyodbc.Error) ('HY000', '[HY000] ERROR 3587: Insufficient resources to execute plan on pool fastlane [Request exceeds session memory cap: 28357027KB > 20971520KB]\n (3587) (SQLExecDirectW)')
This is my code:
df.to_sql('TableName',engine,schema='trw',if_exists='append',index=False) #copying data from Dataframe df to a SQL Table

Can you do the following for me:
run this command - and share the output. MAXMEMORYSIZE, MEMORYSIZE and MAXQUERYMEMORYSIZE, plus PLANNEDCONCURRENCY give you an idea of the (memory) budget at the time when the query / copy command was planned.
gessnerm#gessnerm-HP-ZBook-15-G3:~/1/fam/fam-ostschweiz$ vsql -x -c \
"select * from resource_pools where name='fastlane'"
-[ RECORD 1 ]------------+------------------
pool_id | 45035996273841188
name | fastlane
is_internal | f
memorysize | 0%
maxmemorysize |
maxquerymemorysize |
executionparallelism | 16
priority | 0
runtimepriority | MEDIUM
runtimeprioritythreshold | 2
queuetimeout | 00:05
plannedconcurrency | 2
maxconcurrency |
runtimecap |
singleinitiator | f
cpuaffinityset |
cpuaffinitymode | ANY
cascadeto |
Then, you should dig, out of the QUERY_REQUESTS system table, the acutal SQL command that your python script triggered. It should be in the format of:
COPY <_the_target_table_>
FROM STDIN DELIMITER ',' ENCLOSED BY '"'
DIRECT REJECTED DATA '<_bad_file_name_>'
or similar.
Then: how big is the file / are the files you're trying to load in one go? If too big, then B.Muthamizhselvi is right - you'll need to portion the data volume you load.
Can you also run:
vsql -c "SELECT EXPORT_OBJECTS('','<schema>.<table>',FALSE)"
.. .and share the output? It could well be that you have too many projections for the memory to be enough, that you are sorting by too many columns.
Hope this helps for starters ...

Related

Why Query_parallelism affects the result of a join between two UUID columns

I'm running the following test on ignite 2.10.0
I have 2 tables created with a query_parallelism=1 and without affinity key.
When I join the 2 following tables I have the result as expected.
0: jdbc:ignite:thin://localhost:10800> SELECT "id" AS "_A_id", "source_id" AS "_A_source_id" FROM PUBLIC."source_ml_blue";
+--------------------------------------+--------------------------------------+
| _A_id | _A_source_id |
+--------------------------------------+--------------------------------------+
| 86c068cd-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 |
+--------------------------------------+--------------------------------------+
1 row selected (0.004 seconds)
0: jdbc:ignite:thin://localhost:10800> SELECT "id" AS "_B_id", "flx_src_ip_text" AS "_B_src_ip" FROM PUBLIC."source_nprobe_tcp_blue";
+--------------------------------------+-----------+
| _B_id | _B_src_ip |
+--------------------------------------+-----------+
| 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 1.1.1.1 |
+--------------------------------------+-----------+
1 row selected (0.003 seconds)
0: jdbc:ignite:thin://localhost:10800> SELECT _A."id" AS "_A_id", _A."source_id" AS "_A_source_id", _B."id" AS "_B_id", _B."flx_src_ip_text" AS "_B_src_ip" FROM PUBLIC."source_ml_blue" AS "_A" INNER JOIN PUBLIC."source_nprobe_tcp_blue" AS "_B" ON "_A"."source_id"="_B"."id";
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
| _A_id | _A_source_id | _B_id | _B_src_ip |
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
| 86c068cd-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 1.1.1.1 |
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
1 row selected (0.005 seconds)
If I delete and create the same tables with a query_parallelism = 8, I do not have a SQL error (the parallelism is equal on the 2 tables) BUT the result of the join is empty.
any idea why I get this behavior ?

You observe this behaviour because of optimisations for parallel query execution. Most likely your records landed to different partitions (handled by a different thread). If you increase the number of records in both tables you will see a subset of this join as a result.
The most elegant option here is to let "_A"."source_id" and "_B"."id" be affinity keys. Most likely ignite.jdbc.distributedJoins is going to affect performance for clustered installation. Affinity collocation will make items with matching "_A"."source_id" and "_B"."id" reside in the same partition to avoid cross-partitional interaction (for clustered environments it would lead to additional networks hops).

The problem comes from the SQL client : it has to be aware of the parallelism.
On DBeaver, I had to enable ignite.jdbc.distributedJoins in the connection properties to make the request works properly.

Can I display column headings when querying via gcloud dataproc jobs submit spark-sql?

I'm issuing a spark-sql job to dataproc that simply displays some data from a table:
gcloud dataproc jobs submit spark-sql --cluster mycluster --region europe-west1 -e "select * from mydb.mytable limit 10"
When the data is returned and outputted to stdout I don't see column headings, I only see the raw data, whitespace delimited. I'd really like the output to be formatted better, specifically I'd like to see column headings. I tried this:
gcloud dataproc jobs submit spark-sql --cluster mycluster --region europe-west1 -e "SET hive.cli.print.header=true;select * from mydb.mytable limit 10"
But it had no affect.
Is there a way to get spark-sql to display column headings on dataproc?
If there is a way to get the data displayed like so:
+----+-------+
| ID | Name |
+----+-------+
| 1 | Jim |
| 2 | Ann |
| 3 | Simon |
+----+-------+
then that would be even better.

I have been performing some tests with a Dataproc cluster, and it looks like it is not possible to retrieve query results with the column names using Spark SQL. However, this is more a Apache Spark SQL issue, rather than Dataproc, so I have added that tag to your question too, in order for it to receive a better attention.
If you get into the Spark SQL console in your Dataproc cluster (by SSHing in the master and typing spark-sql), you will see that the result for SELECT queries does not include the table name:
SELECT * FROM mytable;
18/04/17 10:31:51 INFO org.apache.hadoop.mapred.FileInputFormat: Total input files to process : 3
2 Ann
1 Jim
3 Simon
There's no change if using instead SELECT ID FROM mytable;. Therefore, the issue is not in the gcloud dataproc jobs sbmit spark-sql command, but instead in the fact that Spark SQL does not provide this type of data.
If you do not necessarily have to use Spark SQL, you can try using HIVE instead. HIVE does provide the type of information you want (including the column names plus a prettier formatting):
user#project:~$ gcloud dataproc jobs submit hive --cluster <CLUSTER_NAME> -e "SELECT * FROM mytable;"
Job [JOB_ID] submitted.
Waiting for job output...
+-------------+---------------+--+
| mytable.id | mytable.name |
+-------------+---------------+--+
| 2 | Ann |
| 1 | Jim |
| 3 | Simon |
+-------------+---------------+--+
Job [JOB_ID] finished successfully.

Apache Sqoop/Pig Consistent Data Representation/Processing

In our organization, we have been trying to use hadoop ecosystem based tools to implement ETLs lately. Although the ecosystem itself is quite big, we are using only a very limited set of tools at the moment. Our typical pipeline flow is as follows:
Source Database (1 or more) -> sqoop import -> pig scripts -> sqoop export -> Destination Database (1 or more)
Over a period of time, we have encountered multiple issues with the above approach to implementing ETLs. One problem we notice a fair bit is that the fields don't align properly when trying to read a field from HDFS using pig (where data on HDFS has been typically imported with sqoop) and pig script fails with an error. For instance, a string may end up in a numeric field type due to misalignment.
It appears that there are two approaches to this problem:
Remove the problem characters you know of in fields before processing with pig. This is the approach we have taken in the past. We do know that we have some bad data in our source databases - typically new lines and tabs in fields that shouldn't exist. (NOTE: we used to have tabs as field delimiters). So what we did is to use a DB view or a free-form query option sqoop that in turn uses a REPLACE function or its equivalent that's available in the source DB (typically mysql but less often postgres). This approach does work but it has the side effect of HDFS data not matching the source data. In addition, some of the other imported fields will no longer make sense - for instance, imagine that you have a MD5 or SHA1 hash on a field but the field has been modified to replace some characters, so we have to compute MD5 or SHA1 to be consistent instead of importing the one from the source DB. In addition, this approach involves trial and error to a certain extent. We wouldn't necessarily know which fields need to be modified ahead of time (and which characters to remove) so we might need more than one iteration to reach our end goal.
Use enclosure feature with sqoop in combination with escaping and combine this with a loader of an appropriate type in pig so that not only do the fields do line up properly but a given field (and its associated values) are represented the same way as the data moves through the pipeline.
I was trying to figure out a good way to accomplish #2 using different options available in sqoop and pig. Presented below is an outline of what I have tried so far in addition to the findings.
Below are the specific versions of software used for this experiment:
Sqoop: 1.4.3
Pig: 0.12.0
Hadoop: 2.0.0
Since our data sets are typically large (and would take several hours to process), I figured I'll come up with an extremely small data set that kind of mimics some of the data issues we have had. Towards this end, I put together a small table in mysql (which will be used as source DB):
mysql> desc example;
+-------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(1024) | YES | | NULL | |
| v1 | int(11) | YES | | NULL | |
| v2 | int(11) | YES | | NULL | |
| v3 | int(11) | YES | | NULL | |
+-------+---------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
After data has been added with INSERT statement, here are the contents of the example table:
mysql> select * from example;
+----+----------------------------------------------------------------------------+------+------+------+
| id | name | v1 | v2 | v3 |
+----+----------------------------------------------------------------------------+------+------+------+
| 1 | Some string, with a comma. | 1 | 2 | 3 |
| 2 | Another "string with quotes" | 4 | 5 | 6 |
| 3 | A string with
new line | 7 | 8 | 9 |
| 4 | A string with 3 new lines -
first new line
second new line
third new line | 10 | 11 | 12 |
| 5 | a string with "quote" and a
new line | 13 | 14 | 15 |
| 6 | clean record | 0 | 1 | 2 |
| 7 | single
newline | 0 | 1 | 2 |
| 8 | | 51 | 52 | 53 |
| 9 | NULL | 105 | NULL | 103 |
+----+----------------------------------------------------------------------------+------+------+------+
9 rows in set (0.00 sec)
We can readily see the new lines in name field. I didn't include tabs in this data set as I switched the delimiter from tab to comma, so there is one record with a comma. Since typical enclosing character is a double quote, there are some records with double quotes. Finally in the last two records (id = 8 and 9), I wanted to see how the empty string and nulls are represented in char type field and how null is represented in numeric type field.
I tried the following sqoop import on the above table:
sqoop import --connect jdbc:mysql://localhost/test --username user --password pass --table example --columns 'id, name, v1, v2, v3' --verbose --split-by id --target-dir example --fields-terminated-by , --escaped-by \\ --enclosed-by \" --num-mappers 1
Notice that blackslash has been used an escape character, double quote as an enclosure, and comma as field delimiter.
Here is how the data looks on HDFS:
$hadoop fs -cat example/part-m-00000
"1","Some string, with a comma.","1","2","3"
"2","Another \"string with quotes\"","4","5","6"
"3","A string with
new line","7","8","9"
"4","A string with 3 new lines -
first new line
second new line
third new line","10","11","12"
"5","a string with \"quote\" and a
new line","13","14","15"
"6","clean record","0","1","2"
"7","single
newline","0","1","2"
"8","","51","52","53"
"9","null","105","null","103"
I created a small pig script to read and parse the above data:
REGISTER '……./pig/contrib/piggybank/java/piggybank.jar';
data = LOAD 'example' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE') AS (id:int, name:chararray, v1:int, v2:int, v3:int);
dump data;
Notice the usage of CSVExcelStorage loader that's available in piggybank. Since we have newlines in the incoming data set, we enable MULTILINE option. The above script produces the following output:
(1,Some string, with a comma.,1,2,3)
(2,Another \string with quotes\",4,5,6)
(3,A string with
new line,7,8,9)
(4,A string with 3 new lines -
first new line
second new line
third new line,10,11,12)
(5,a string with \quote\" and a
new line,13,14,15)
(6,clean record,0,1,2)
(7,single
newline,0,1,2)
(8,",51,52,53)
(9,null,105,,103)
In records with id 2 and 5, a blackslash remains in the place of the very first double quote while for subsequent double quotes, both the slash as well as the quote remains. This is not exactly what I want. Noting that CSVExcelStorage, based on Excel 2007, uses double quotes to escape quote (i.e., consecutive double quotes are treated as single double quote), I made the escape character a double quote:
sqoop import --connect jdbc:mysql://localhost/test --username user --password pass --table example --columns 'name, v1, v2, v3' --verbose --split-by id --target-dir example --fields-terminated-by , --escaped-by '\"' --enclosed-by '\"' --num-mappers 1
Before executing the above command, I deleted the existing data:
$hadoop fs -rm -r example
After the sqoop import runs through, here is how data looks on HDFS now:
$hadoop fs -cat example/part-m-00000
"1","Some string, with a comma.","1","2","3"
"2","Another """"string with quotes""""","4","5","6"
"3","A string with
new line","7","8","9"
"4","A string with 3 new lines -
first new line
second new line
third new line","10","11","12"
"5","a string with """"quote"""" and a
new line","13","14","15"
"6","clean record","0","1","2"
"7","single
newline","0","1","2"
"8","","51","52","53"
"9","null","105","null","103"
I ran the same pig script once more on this data and it produces the following output:
(1,Some string, with a comma.,1,2,3)
(2,Another ""string with quotes"",4,5,6)
(3,A string with
new line,7,8,9)
(4,A string with 3 new lines -
first new line
second new line
third new line,10,11,12)
(5,a string with ""quote"" and a
new line,13,14,15)
(6,clean record,0,1,2)
(7,single
newline,0,1,2)
(8,",51,52,53)
(9,null,105,,103)
Noticing that any double quotes in the string are now doubled effectively, I can get rid of this by using REPLACE function in pig:
data2 = FOREACH data GENERATE id, REPLACE(name, '""', '"') as name, v1, v2, v3;
dump data2;
The above script produces the following output:
(1,Some string, with a comma.,1,2,3)
(2,Another "string with quotes",4,5,6)
(3,A string with
new line,7,8,9)
(4,A string with 3 new lines -
first new line
second new line
third new line,10,11,12)
(5,a string with "quote" and a
new line,13,14,15)
(6,clean record,0,1,2)
(7,single
newline,0,1,2)
(8,",51,52,53)
(9,null,105,,103)
The above looks much more like the output I want. One last item I need to ensure is that nulls and empty strings for chararray type and nulls for int type are accounted for.
Towards that end, I add one more section to the above pig script that generates null and empty strings for char type and null for int type:
data3 = FOREACH data2 GENERATE id, name, v1, v2, v3, null as name2:chararray, '' as name3:chararray, null as v4:int;
dump data3;
The output looks as follows:
(1,Some string, with a comma.,1,2,3,,,)
(2,Another "string with quotes",4,5,6,,,)
(3,A string with
new line,7,8,9,,,)
(4,A string with 3 new lines -
first new line
second new line
third new line,10,11,12,,,)
(5,a string with "quote" and a
new line,13,14,15,,,)
(6,clean record,0,1,2,,,)
(7,single
newline,0,1,2,,,)
(8,",51,52,53,,,)
(9,null,105,,103,,,)
I stored the same output in HDFS using the following pig script:
STORE data3 INTO 'example_output' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE');
Here is how data on HDFS looks like:
$hadoop fs -cat example_output/part-m-00000
1,"Some string, with a comma.",1,2,3,,,
2,"Another ""string with quotes""",4,5,6,,,
3,"A string with
new line",7,8,9,,,
4,"A string with 3 new lines -
first new line
second new line
third new line",10,11,12,,,
5,"a string with ""quote"" and a
new line",13,14,15,,,
6,clean record,0,1,2,,,
7,"single
newline",0,1,2,,,
8,"""",51,52,53,,,
9,null,105,,103,,,
For nulls and empty strings, the only two records of interest are the bottom two ones (id = 8 and 9). It’s clear that there is a difference between empty string and null from source using sqoop versus that which is generated from pig. I could account for null and empty strings in the name field above similar to how I have done for the double quote but it seems rather manual and more steps than needed.
Notice that although we have used "enclosed-by" option in sqoop import (as opposed to "optionally-enclosed-by" option), the output from PIG uses enclosure only when there is a need for it i.e., if a quote or a comma appears in the field, then enclosing is performed, otherwise not - in other words, this looks like the sqoop equivalent of "optionally-enclosed-by" option.
The final stage in the pipeline is sqoop export. I put together the following table:
mysql> desc example_output;
+-------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(1024) | YES | | NULL | |
| v1 | int(11) | YES | | NULL | |
| v2 | int(11) | YES | | NULL | |
| v3 | int(11) | YES | | NULL | |
| name2 | varchar(1024) | YES | | NULL | |
| name3 | varchar(1024) | YES | | NULL | |
| v4 | int(11) | YES | | NULL | |
+-------+---------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
Here is the sqoop export command I used:
sqoop export --connect jdbc:mysql://localhost/test --username user --password pass --table example_output --export-dir example_output --input-fields-terminated-by , --input-escaped-by '\"' --input-optionally-enclosed-by '\"' --num-mappers 1 --verbose
The export options are similar to import options except that the "enclosed-by" has been replaced by "optionally-enclosed-by" and an "input-" prefix has been added to some of the options (e.g: --input-fields-terminated-by) since sqoop export uses those while reading input from HDFS.
This fails with the following error in the logs:
2014-02-25 22:19:05,750 ERROR org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data: 'Some string, with a comma.,1,2,3,,,'
at example_output.__loadFromFields(example_output.java:396)
at example_output.parse(example_output.java:309)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:794)
at example_output.__loadFromFields(example_output.java:366)
... 12 more
2014-02-25 22:19:05,756 ERROR org.apache.sqoop.mapreduce.TextExportMapper: On input: 1,"Some string, with a comma.",1,2,3,,,
2014-02-25 22:19:05,757 ERROR org.apache.sqoop.mapreduce.TextExportMapper: On input file: hdfs://nameservice1/user/xyz/example_output/part-m-00000
2014-02-25 22:19:05,757 ERROR org.apache.sqoop.mapreduce.TextExportMapper: At position 0
In an effort to troubleshoot this problem, I created a HDFS location that has only one record (id = 6) from the input data set:
$ hadoop fs -cat example_output_single_record/part-m-00000
6,clean record,0,1,2,,,
Now the sqoop export command becomes:
sqoop export --connect jdbc:mysql://localhost/test --username user --password pass --table example_output --export-dir example_output_single_record --input-fields-terminated-by , --input-escaped-by '\"' --input-optionally-enclosed-by '\"' --num-mappers 1 --verbose
The above command runs through fine and produces the desired result of inserting the single record into the destination DB:
mysql> select * from example_output;
+------+--------------+------+------+------+-------+-------+------+
| id | name | v1 | v2 | v3 | name2 | name3 | v4 |
+------+--------------+------+------+------+-------+-------+------+
| 6 | clean record | 0 | 1 | 2 | | | NULL |
+------+--------------+------+------+------+-------+-------+------+
1 row in set (0.00 sec)
While null value has been preserved for the numeric field, both null and empty string mapped to empty string in the destination DB.
With the above as the background, here are the questions:
I think it would be easier if we can ensure that a given value for a given data type will be represented/processed exactly the same way regardless of whether it’s coming from sqoop or generated by pig. Has anyone figured out a way to ensure consistent representation/processing of a given data type while preserving the original field values? I have covered only two data types here (chararray and int) but I suppose some of the other data types also have potentially similar issues.
I have used "enclosed-by" option in sqoop import instead of "optionally-enclosed-by" so that every field value will be enclosed within double quotes. I just thought it would be a source of less confusion if every value in every field was enclosed instead of just those that need enclosing. What do others use and has one of these options worked better for your use case relative to the other? It looks like CSVExcelStorage doesn't support a notion of "enclosed-by" - are there any other storage functions that support this mechanism?
Any suggestions on how to get the sqoop export to work as intended on the full output of pig script (i.e., example_output on HDFS)?

Maybe you need to step back and choose a simpler solution. So you have newlines, tabs, commas, double quotes, nulls, foreign characters? and maybe even some garbage in your data, but how random is it really? Could you choose an obscure delimiter character and survive?
For example, to use 0x17 as the field delimiter
use delimiter with sqoop:
--fields-terminated-by \0x17
and with pig:
LOAD 'input.dat' USING PigStorage('\\0x17') as (x,y,z);
Or maybe there is some other obscure ascii value you could use:
http://en.wikipedia.org/wiki/ASCII

Q. While importing data with sqoop, I cannot determine an unambiguous delimiter for text files. What are my options in such a scenario?
A. When a chosen delimiter might occur in imported data, use qualifiers to avoid ambiguity. You can accomplish this by using the --escaped-by and --enclosed-by args of the sqoop import command. For example, the following command encloses fields in an imported file in double quotes:
sqoop import --fields-terminated-by , -- escaped-by \\ --enclosed-by '\"'
Big Data Analytics with HDInsight in "cough" 24 "cough" hours
https://books.google.com/books?id=FWvoCgAAQBAJ&pg=PT648&lpg=PT648&dq=sqoop+enclose+fields+double+quote&source=bl&ots=zkYTKphcZp&sig=LdB0BxQVQWrBbiNyA9g_roFA8Yk&hl=en&sa=X&ved=0ahUKEwiMzO-a4KrOA

Postgres DB Size Command

What is the command to find the size of all the databases?
I am able to find the size of a specific database by using following command:
select pg_database_size('databaseName');

You can enter the following psql meta-command to get some details about a specified database, including its size:
\l+ <database_name>
And to get sizes of all databases (that you can connect to):
\l+

You can get the names of all the databases that you can connect to from the "pg_datbase" system table. Just apply the function to the names, as below.
select t1.datname AS db_name,
pg_size_pretty(pg_database_size(t1.datname)) as db_size
from pg_database t1
order by pg_database_size(t1.datname) desc;
If you intend the output to be consumed by a machine instead of a human, you can cut the pg_size_pretty() function.

-- Database Size
SELECT pg_size_pretty(pg_database_size('Database Name'));
-- Table Size
SELECT pg_size_pretty(pg_relation_size('table_name'));

Based on the answer here by #Hendy Irawan
Show database sizes:
\l+
e.g.
=> \l+
berbatik_prd_commerce | berbatik_prd | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 19 MB | pg_default |
berbatik_stg_commerce | berbatik_stg | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 8633 kB | pg_default |
bursasajadah_prd | bursasajadah_prd | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 1122 MB | pg_default |
Show table sizes:
\d+
e.g.
=> \d+
public | tuneeca_prd | table | tomcat | 8192 bytes |
public | tuneeca_stg | table | tomcat | 1464 kB |
Only works in psql.

Yes, there is a command to find the size of a database in Postgres. It's the following:
SELECT pg_database.datname as "database_name", pg_size_pretty(pg_database_size(pg_database.datname)) AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;

SELECT pg_size_pretty(pg_database_size('name of database'));
Will give you the total size of a particular database however I don't think you can do all databases within a server.
However you could do this...
DO
$$
DECLARE
r RECORD;
db_size TEXT;
BEGIN
FOR r in
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
db_size:= (SELECT pg_size_pretty(pg_database_size(r.datname)));
RAISE NOTICE 'Database:% , Size:%', r.datname , db_size;
END LOOP;
END;
$$

From the PostgreSQL wiki.
NOTE: Databases to which the user cannot connect are sorted as if they were infinite size.
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS Size
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20
The page also has snippets for finding the size of your biggest relations and largest tables.

Start pgAdmin, connect to the server, click on the database name, and select the statistics tab. You will see the size of the database at the bottom of the list.
Then if you click on another database, it stays on the statistics tab so you can easily see many database sizes without much effort. If you open the table list, it shows all tables and their sizes.

You can use below query to find the size of all databases of PostgreSQL.
Reference is taken from this blog.
SELECT
datname AS DatabaseName
,pg_catalog.pg_get_userbyid(datdba) AS OwnerName
,CASE
WHEN pg_catalog.has_database_privilege(datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(datname))
ELSE 'No Access For You'
END AS DatabaseSize
FROM pg_catalog.pg_database
ORDER BY
CASE
WHEN pg_catalog.has_database_privilege(datname, 'CONNECT')
THEN pg_catalog.pg_database_size(datname)
ELSE NULL
END DESC;

du -k /var/lib/postgresql/ |sort -n |tail

Postgres concurrent copy without ID value?

I am performing concurrent copy commands but am not specifying a value for a serial ID field. As far as I know this is ok if I have just one copy command since Postgres will generate an ID.
But would this cause conflicts with more than 1 copy command running since the sequence is never updated by a copy command?

copy command update id serial automatically. so, it works fine without id conflicts.
I test performing concurrent copy commands in postgresql 9.24
I create table like below
create table tbl_test (id serial primary key, name varchar(16), age integer);
I also made 2 csv file having 1,000,000 data.
file1.csv
"1", 1
"2", 2
...
"1000000", 1000000
file2.csv
"n1", 1
"n2", 2
...
"n1000000", 1000000
when I try to copy simultaneously from file1, I get result like below
...
1000245 | n453649 | 453649
1000246 | 546595 | 546595
1000247 | n453650 | 453650
1000248 | 546596 | 546596
1000249 | n453651 | 453651
1000250 | 546597 | 546597
...
all data copied well.
postgres=# select count(*) from tbl_test;
count
---------
2000000
(1 row)

As long as the column has a sequence as a default (or is a SERIAL/BIGSERIAL datatype) and you are not referencing that directly in the COPY command you will not have ever have conflicts on that id.
Sequences are designed to be atomic, even within transactions, which also generates another common question "How do I get gapless sequences?"

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Insufficient Resources error while inserting into SQL table using Vertica - sql

Related

Why Query_parallelism affects the result of a join between two UUID columns

Can I display column headings when querying via gcloud dataproc jobs submit spark-sql?

Apache Sqoop/Pig Consistent Data Representation/Processing

Postgres DB Size Command

Postgres concurrent copy without ID value?

Categories

Resources