I have the following problem:
I have a bunch of tables in a Vertica database, say:
+------------+
| Tablenames |
+------------+
| a_1 |
| a_2 |
| a_34 |
| b_1 |
| b_4 |
+------------+
The tables are not exactly the same but have mostly similar entries. Now I want to run one query over all tables that start with a_ (a_1, a_2, a_34).
Is there a way to search through all the tables for the string a_ in their name, output some sort of list, and then either use a for loop or a join operation with the generated list?
Once I get the new table (let's call it temp_table) that holds all the table names that start with a_, I would like to run one query over all of them, something like this (Matlab syntax):
for ii=1:length(temp_table)
Data{ii}=SELECT * FROM temp_table(ii) WHERE paste_condition_here
end
So Data should be a new table that appends the new rows with each iteration.
@Nirjihar - there is no information_schema in Vertica (you need v_catalog); you are confusing it with MySQL.
select TABLE_NAME from v_catalog.TABLES where TABLE_NAME like 'a_%';
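This returns all tables whose names match the pattern 'a_%'. Note that _ is itself a single-character wildcard in LIKE, so this pattern also matches names such as ab1; escape the underscore if you need it to match literally.
Just as a complement:
Vertica does not have loops. For this you have to use a UDP (user-defined procedure), which can be written in the language of your choice (shell, Java, R, C++).
I will go ahead and post one model here for you: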
1 - Shell proc - to be created in the procedures folder
#!/bin/bash
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -t -o /tmp/query.sql -c"
SELECT
' select * from '
||TABLE_SCHEMA
||'.'
||TABLE_NAME
||';'
FROM
v_catalog.TABLES where TABLE_NAME like '%$1%'
"
/opt/vertica/bin/vsql -U $username -w $password -F $'|' -At -o /tmp/query_output.csv -f /tmp/query.sql
2 - change sh file privs
chmod 4750 query_table.sh
3 - make sure you have the .profile file populated accordingly
. /home/dbadmin/.profile
#!/bin/bash
username=dbadmin
password=secrectpasswd
export username
export password
Note: this avoids repeating the password in plain text across scripts and keeps it in a single place.
4 - Register the UDP with Vertica Catalog
. /home/dbadmin/.profile
admintools -t install_procedure -f /vertica/catalog//procedures/query_table.sh -d -p $password
5 - Create the UDP inside the database
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -c "CREATE PROCEDURE dba.query_table(table_name varchar) AS 'query_table.sh' LANGUAGE 'external' USER 'dbadmin';"
6 - execute the proc
select dba.query_table('your possible table name here');
7 - check results
a - you will get a file with the query
b - one file with the exported data(csv '|' delimited).
I have a similar post here:
http://www.aodba.com/create-vertica-schema-fly/
To get all the tables whose names start with a_:
select TABLE_NAME from INFORMATION_SCHEMA.TABLES where TABLE_NAME like 'a_%'
Then you can alias this and join the list or each table as you like.
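One way to act on that list is to let the shell turn it into a single UNION ALL statement. This is only a sketch: the function name build_union_query is hypothetical, and it assumes every matching table exposes the columns you select (SELECT * only works across tables whose column lists are compatible).

```shell
# Hypothetical helper: reads table names (one per line) on stdin and prints
# a single statement that applies the same WHERE condition to every table.
#   $1 = column list shared by all tables, $2 = WHERE condition
build_union_query() {
    sep=""
    while IFS= read -r table || [ -n "$table" ]; do
        if [ -z "$table" ]; then continue; fi   # skip blank lines
        printf '%sSELECT %s FROM %s WHERE %s' "$sep" "$1" "$table" "$2"
        # Every statement after the first is preceded by UNION ALL.
        sep='
UNION ALL
'
    done
    printf ';\n'
}
```

For example, the output of vsql -At -c "select TABLE_NAME from v_catalog.TABLES where TABLE_NAME like 'a_%'" could be piped into build_union_query '*' 'paste_condition_here', and the resulting statement run in a second vsql call.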
I'm trying to export a bunch of DB2 tables to CSV, with column names. I don't see any straightforward way to do this. I followed this to get the data I want, but I have to execute that over hundreds of tables. Is there a way to dynamically get all the columns and tables given N schema names?
I also tried this, which exports all tables in a schema to CSV, but it doesn't give me column names. So if someone could show me how to change this script to get column names in the CSVs, my work is done.
The server is running: Red Hat Linux Server.
Using files
The following db2 command generates the export script:
export to exp.sql of del modified by nochardel
select
x'0a'||'export to file_header of del modified by nochardel VALUES '''||columns||''''
||x'0a'||'export to file_data of del messages messages.msg select '||columns||' from '||tabname_full
||x'0a'||'! cat file_header file_data > '||tabname_full||'.csv'
from
(
select rtrim(c.tabschema)||'.'||c.tabname as tabname_full, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
It's better to place the command above in a file such as gen_exp.sql and run it to produce the export script:
db2 -tf gen_exp.sql
The export script exp.sql consists of 3 commands for each table:
* db2 export command to get a comma separated list of columns
* db2 export command to get table data
* concatenation command to collect both outputs above to a single file
You run this script as follows:
db2 -vf exp.sql -z exp.sql.log
Using pipe
gen_exp_sh.sql:
export to exp.sh of del modified by nochardel
select
x'0a'||'echo "'||columns||'" > '||filename
||x'0a'||'db2 "export to pipe_data of del messages messages.msg select '||columns||' from '||tabname_full||'" >/dev/null 2>&1 </dev/null &'
||x'0a'||'cat pipe_data >> '||filename
from
(
select
rtrim(c.tabschema)||'.'||c.tabname as tabname_full
, rtrim(c.tabschema)||'.'||c.tabname||'.csv' as filename
, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
Run it as follows:
db2 -tf gen_exp_sh.sql
The export shell script exp.sh consists of 3 commands for each table:
* echo command to write a comma separated list of columns to a file
* db2 export command to get table data to a pipe (started in the background)
* simple cat command to read from the pipe and add data to the same file with the columns list
Usage:
You must create the pipe first and source (dot space script notation - it's important) the export script afterwards:
mkfifo pipe_data
db2 connect to mydb ...
. ./exp.sh
rm -f pipe_data
Try this great tool: https://www.sql-workbench.eu/. It's universal, and you can transfer data between any types of database engines.
I have one big file containing data, for example :
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I have simplified here.
I want to process each line and run some sqlplus statements with it.
Let's say I have one table with two columns, containing this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the content value of both rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality I have a lot of data and complex requests to execute, so I have to use sqlplus, not tools like sqlloader.
So I process the input file with 5 parallel workers, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test) :
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
As a result, I get:
01740: missing double quote in identifier
If I put “verify” parameter to on in the sql script, I can see :
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with a quote or double quote, either on the bash side or on the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
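If the values may contain ;, one workaround is to split each line on the first semicolon only and double the single quotes yourself before building the statement. A minimal sketch, assuming the ID never contains ; and that the file holds raw (not yet doubled) apostrophes; the function name build_update is hypothetical:

```shell
# Hypothetical helper: turns one "id;content" line into an UPDATE statement.
# Only the first ';' is treated as the separator, so ';' may appear in content.
build_update() {
    line=$1
    id=${line%%;*}          # text before the first ';'
    content=${line#*;}      # everything after the first ';'
    # Double each single quote so the value is a valid SQL string literal.
    id=$(printf '%s' "$id" | sed "s/'/''/g")
    content=$(printf '%s' "$content" | sed "s/'/''/g")
    printf "update T_TABLE SET CONTENT = '%s' where ID = '%s';\n" "$content" "$id"
}
```

Called with the problematic line above, it produces a statement whose CONTENT keeps the semicolons inside the value.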
(Please correct me if I make mistakes.)
I have 2 tables:
ACTIVITIES : ID / NAME / CONTENT / DATE /ETC..
ARTICLES : ID / NAME /CONTENT /DATE /ETC..
I have created a script that deletes an image when it is not referenced in the database. The problem is:
I don't know how to check the content of both activities and articles in the same request; the request below only looks at articles, so it deletes my activities' images.
#!/bin/bash
db="intranet_carc_development"
user="benjamin"
echo "DELETING UNUSED FILES AND IMAGES..."
for f in public/uploads/files/*
do
if [[ -f "$f" ]]
then
f="$(basename "$f")"
psql $db $user -t -v "ON_ERROR_STOP=1" \
-c "select content from public.articles where content like '%$f%'" | grep . \
&& echo "exist" \
|| rm public/uploads/files/$f
fi
done
printf "DONE\n\n"
I tried something like:
select content from public.articles, public.activities where content like '%$f%'"
but I have this log error:
ERROR: column reference "content" is ambiguous
You can try something like
WITH artcontent AS (
SELECT content
FROM public.articles
),
actcontent AS (
SELECT content
FROM public.activities
),
merge AS (
SELECT * FROM artcontent
UNION ALL
SELECT * FROM actcontent
)
SELECT *
FROM merge
UNION ALL combines your two result sets, artcontent (from articles) and actcontent (from activities). You can then add your where content like '%$f%' condition to the final SELECT.
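To plug this into the cleanup loop, the same UNION ALL idea can be filtered per file name. The sketch below is only an illustration: the function name build_lookup_query is hypothetical, and it assumes file names contain no LIKE wildcards (% or _).

```shell
# Hypothetical helper: prints a query returning every content value, from
# either table, that mentions the file name given as $1.
build_lookup_query() {
    # Double single quotes so the name is safe inside a SQL string literal.
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT content FROM public.articles WHERE content LIKE '%%%s%%'\n" "$name"
    printf "UNION ALL\n"
    printf "SELECT content FROM public.activities WHERE content LIKE '%%%s%%';\n" "$name"
}
```

In the script, the -c argument would become "$(build_lookup_query "$f")", so a file is kept if either table references it.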
Hope it helps!
I'm trying to set up a monitoring script that takes all the databases we have, shows their tables, and does some arithmetic on them.
I have this command:
impala-shell -i impalad -q " show databases;" -B | while read a; do impala-shell -q "show tables in ${a}" -B -i impalad; done
That produces following output:
Query: show tables in database1
table1
table2
How should I format the output to display the database name ($a) with each table? I tried echoing it or using ||, but this only prints the database name after displaying all the tables. Or is there a way to pass the variable to awk?
Desired output would look like this:
database1.table1
database1.table2
It looks like the output of the show tables ... command will have a 1-line header, followed by the list of table names.
You could skip the first line by piping to tail -n +2,
and then use another while loop to echo the database name and table name pairs in the desired format:
impala-shell -i impalad -q " show databases;" -B | while read a; do
impala-shell -q "show tables in ${a}" -B -i impalad | tail -n +2 | while read table; do
echo $a.$table
done
done
You could also do
impala-shell -q ... | awk -v db="$a" 'NR > 1 {print db "." $0}'
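To try the awk variant in isolation, here is a small sketch that reads the output of one show tables call on stdin. The function name prefix_tables is hypothetical, and it assumes, as above, that the first line is a header to skip.

```shell
# Hypothetical helper: prefixes each table name read from stdin with the
# database name given as $1, skipping the first (header) line and blank lines.
prefix_tables() {
    awk -v db="$1" 'NR > 1 && NF { print db "." $0 }'
}
```

Inside the outer loop it would be used as impala-shell -q "show tables in ${a}" -B -i impalad | prefix_tables "$a".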
I am looking for a way to count the number of columns in a table in Hive.
I know the following code works in Microsoft SQL Server. Is there a Hive equivalent?
SELECT COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG = 'database_name'
AND TABLE_SCHEMA = 'schema_name'
AND TABLE_NAME = 'table_name'
Try this
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name]
Try this, it will show you the columns of your table:
DESCRIBE schemaName.tableName;
I do not know of a way to count the columns directly, however, I solved the problem for my needs indirectly via:
echo 'table1name:, '`hive -e 'describe schemaname.table1name;' | grep -v 'col_name' | wc -l` > num_columns.csv
echo 'table2name:, '`hive -e 'describe schemaname.table2name;' | grep -v 'col_name' | wc -l` >> num_columns.csv
...
(I needed the grep -v bit because I have headers on by default; without it you get one too many lines counted in the wc -l step.)
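A slightly more defensive version of the same idea lets awk do the counting. This is a sketch under assumptions: the function name count_columns is hypothetical, and it assumes the describe output has an optional col_name header, one column per line, and, for partitioned tables, a blank line or a line starting with # before the partition section.

```shell
# Hypothetical helper: counts column rows in `describe` output read from stdin.
count_columns() {
    awk '
        /^[[:space:]]*$/ || /^#/ { exit }   # stop before the partition section
        $1 == "col_name"         { next }   # skip the optional header row
        { n++ }
        END { print n + 0 }
    '
}
```

Usage would look like hive -e 'describe schemaname.table1name;' | count_columns.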
You have to check whether your Hive version includes HIVE-287, because for versions that don't, you'll need to use COUNT(1) in place of COUNT(*).
Just run a describe: it lists all the columns, and the number of rows fetched, shown at the bottom, is the number of columns.