In my Windows batch file, I execute an SQL file that contains a query deleting rows matching a condition, i.e. it removes unwanted rows of data. Assuming the SQL execution has no errors, is there any way to obtain the count of deleted rows?
The purpose of this is so that I can log whether any rows were deleted, e.g.
IF %DELETED_ROWS_COUNT% GTR 0 (
    echo Some invalid rows are DELETED.
)
This is how I execute the SQL file:
psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME% -q -f "%HOME%\bin\import.sql" 2>> "%importfilename%"
If you have any other approach that is more workable than this idea, it would also be highly appreciated.
One option is to get the deleted tuple count using a writable CTE, e.g.
with del as (delete from a returning id) select count(*) from del;
AFAIK psql doesn't offer any easy way to get the affected-row count other than parsing the command tag, e.g.
DELETE 8
that it prints to standard output.
Also, I suggest using psql -qAtX to run in quiet tuples-only mode with no headers/formatting, and ignore any psqlrc.
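On the batch side, a minimal hedged sketch (count_deleted.sql is a hypothetical file holding the writable CTE above with your real DELETE; adapt paths to your setup) that captures the single number psql prints and feeds it into the check from the question:
rem capture the count printed by the writable CTE (hypothetical sketch)
for /f %%c in ('psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME% -qAtX -f "%HOME%\bin\count_deleted.sql"') do set DELETED_ROWS_COUNT=%%c
IF %DELETED_ROWS_COUNT% GTR 0 (
    echo Some invalid rows are DELETED.
)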
I am new to Postgres and I am trying to learn from an online tutorial. One of the first things is to load the data, as follows:
Finally, run psql -U <username> -f clubdata.sql -d postgres -x -q to create the 'exercises' database, the Postgres 'pgexercises' user, the tables, and to load the data in. Note that you may find that the sort order of your results differs from those shown on the web site.
I am using pgAdmin 4 and opened the SQL shell. However, I wasn't able to load this database. First of all, how can I figure out what my current username is?
Secondly, I have never worked with the command line before and am quite unsure how to do this. Could someone break this down step by step?
You can run "psql -h" for more help. You never have a current username as such, you have to specify it but start with "-U postgres" and ask again if that doesn't work.
Your SQL file to load will need the folder path, or you can open a command prompt and change to the folder where your clubdata.sql file is. Your command line assumes there is already a database named postgres, which there probably is. Try again:
psql -U postgres -f clubdata.sql -d postgres -x -q
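A hedged step-by-step sketch (the folder path below is an example, not from the tutorial): open a terminal (cmd on Windows), change to the folder where clubdata.sql was downloaded, and run the command from there.
cd C:\Users\you\Downloads
psql -U postgres -f clubdata.sql -d postgres -x -q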
The command psql is for the command line client. You need to run this in a terminal.
I wrestled with this input myself, despite a little CLI experience with psql. It may help to remove the -q flag at the end to make the output non-quiet, so you can see what's going on.
Lastly, beware that the import creates a schema, so you need to read up on schemas. See this related question for a bit more background: https://dba.stackexchange.com/questions/264398/cant-find-any-tables-after-psql-dump-import-from-pgexercises-com
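For example, once the import has finished, a short hedged sketch inside psql (pgexercises is commonly said to put its tables in a schema named cd, and members is one of the tutorial's tables; check with \dn if yours differs):
-- list the schemas the import created
\dn
-- make the (assumed) cd schema the default for unqualified table names
set search_path to cd, public;
select count(*) from members;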
I'm working with Hive/Impala, and I often run into the need to query the results of a show partitions command to get a specific partition. Let's suppose I have a table tbl1 partitioned by the fields country and date. Then show partitions tbl1 would result in something like this:
country=c1/date=d1
country=c1/date=d3
country=c2/date=d2
I want to do something like select * from (show partitions tbl1) a where a.country='c1' and I want to do this in Hue or shell (hive and impala).
Is this possible?
I don't think what you are trying to do is possible inside Impala/Hive directly.
I can suggest an alternative way:
Use bash in combination with Impala/Hive.
Instead of entering the interactive mode in hive or impala-shell, use the command-line option to pass the query from the bash shell itself, so that the result comes back to the shell, and then use grep or other text-processing commands to process it.
It would look like
impala-shell -k -i <> --ssl --ca_cert <> -B -q "show partitions tbl1" | grep "country=c1"
Here you need to put the required values in place of <>.
This way you can use grep/sed or other tools to get the desired output.
Obviously it depends on your use case and what exactly you want, but I hope this helps.
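For Hive the same idea works from the shell (a small hedged sketch; -S just suppresses the log chatter):
hive -S -e "show partitions tbl1" | grep "country=c1"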
If someone ever finds this useful, this is what I ended up doing. Assuming you have spark-shell or spark2-shell, you can store the output of show partitions in a dataframe and then transform that dataframe. This is what I did (inside spark2-shell):
// turn each "country=c1/date=d1" partition string into a (country, date) pair
val df = spark.sql("show partitions tbl1").map(row => {
  val arrayValues = row.getString(0).split("/")
  (arrayValues.head.split("=")(1), arrayValues(1).split("=")(1))
}).toDF("country", "date")
This takes the list of partitions (a dataframe with a single string column), splits each partition string on /, and then for each piece splits on = and takes the value.
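From there, getting what the original question asked for is just a filter on that dataframe (a small sketch using the example values above):
df.where("country = 'c1'").show()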
I have a query whose result I want to store in a variable.
How can I do it?
I tried:
./hive -e "use telecom;insert overwrite local directory '/tmp/result' select
avg(a) from abc;"
./hive --hiveconf MY_VAR =`cat /tmp/result/000000_0`;
I am able to get the average value in MY_VAR, but it drops me into the Hive CLI, which is not what I want.
Also, is there a way to access Unix commands inside the Hive CLI?
Use Case: in MySQL the following is valid:
set @max_date := (select max(date) from some_table);
select * from some_other_table where date > @max_date;
This is super useful for scripts that need to repeatedly call this variable since you only need to execute the max date query once rather than every time the variable is called.
Hive does not currently support this. (Please correct me if I'm wrong! I have been trying to figure out how to do this all afternoon.)
My workaround is to store the required variable in a table that is small enough to map join onto the query in which it is used. Because the join is a map join (the tiny table is broadcast to the mappers) rather than a common shuffle join, it should not significantly hurt performance. For example:
drop table if exists var_table;
create table var_table as
select max(date) as max_date from some_table;
-- var_table has a single row, so joining it onto the main table is cheap
select some_other_table.*
from some_other_table
cross join var_table
where some_other_table.date > var_table.max_date;
The suggested solution by @visakh is not optimal because it stores the string 'select count(1) from table_name;' rather than the returned value, and so it will not be helpful in cases where you need to call a variable repeatedly during a script.
Storing Hive query output in a variable and using it in another query.
In the shell, create a variable with the desired value:
var=`hive -S -e "select max(datekey) from ....;"`
echo $var
Then use the variable's value in another Hive query:
hive -hiveconf MID_DATE=$var -f test.hql
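Inside test.hql the value is referenced with the hiveconf: prefix; a minimal hedged sketch (the table name my_table is made up for illustration, datekey comes from the query above):
-- test.hql: use the value passed in via -hiveconf MID_DATE=...
select * from my_table where datekey > ${hiveconf:MID_DATE};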
You can simply achieve this using a shell script.
Create a shell script, file: avg_op.sh
#!/bin/sh
# run the query and write the single result value to a file
hive -e 'use telecom;select avg(a) from abc;' > avg.txt
wait
# read the value back into a shell variable
value=`cat avg.txt`
# pass it to the next query as a hiveconf variable
# (the two set statements just echo it for verification)
hive --hiveconf avgval=$value -e "set avgval;set hiveconf:avgval;
use telecom;
select * from abc2 where avg_var=\${hiveconf:avgval};"
Execute the .sh file:
$ bash avg_op.sh
If you are trying to capture a number from a Hive or Impala query in Linux, you can achieve this by executing the query and extracting the number with a regex.
With Hive,
max=`beeline -u ${hiveConnectionUrl} -e "select max(col1) from schema_name.table_name;" | sed 's/[^0-9]*//g'`
The main part is extracting the number from the result. Also, if the output is too noisy, you can use the --silent=true flag to silence the execution, which reduces the log messages.
You can use BeeTamer for that. It allows you to store the result (or part of it) in a variable and use that variable later in your code.
BeeTamer is a macro language / macro processor that extends the functionality of the Apache Hive and Cloudera Impala engines.
select avg(a) from abc;
%capture MY_AVERAGE;
select * from abc2 where avg_var=#MY_AVERAGE#;
Here you save the average value from your query into the macro variable MY_AVERAGE and then reuse it in the second query.
Try this:
$ var=$(hive -e "select '12' ")
$ echo $var
12 -- output
I am trying to do a mysql dump of a few rows in my database. I can then use the dump to upload those few rows into another database. The code I have is working, but it dumps everything. How can I get mysqldump to only dump certain rows of a table?
Here is my code:
mysqldump --opt --user=username --password=password lmhprogram myResumes --where=date_pulled='2011-05-23' > test.sql
Just fix your --where option. It should be a valid SQL WHERE clause, like:
--where="date_pulled='2011-05-23'"
You have the column name outside of the quotes.
You need to quote the "where" clause.
Try
mysqldump --opt --user=username --password=password lmhprogram myResumes --where="date_pulled='2011-05-23'" > test.sql
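Loading the resulting dump into the other database is then just the mysql client reading the file (a hedged sketch; the target database name is a placeholder):
mysql --user=username --password=password other_database < test.sql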
Use this command to dump specific table rows, using a LIKE condition:
mysqldump -u root -p sel_db_server case_today --where="date_created LIKE '%2018%'" > few_rows_dump.sql
We have a database table that we pre-populate with data as part of our deployment procedure. Since one of the columns is binary (it's a binary serialized object) we use BCP to copy the data into the table.
So far this has worked very well, however, today we tried this technique on a Windows Server 2008 machine for the first time and noticed that not all of the columns were being updated. Out of the 31 rows that are normally inserted as part of this operation, only 2 rows actually had their binary columns populated correctly. The other 29 rows simply had null values for their binary column. This is the first situation where we've seen an issue like this and this is the same .dat file that we use for all of our deployments.
Has anyone else ever encountered this issue before or have any insight as to what the issue could be?
Thanks in advance,
Jeremy
My guess is that you're using -c or -w to dump as text, and it's choking on a particular combination of characters it doesn't like and subbing in a NULL. This can also happen in Native mode if there's no format file. Try the following and see if it helps. (Obviously, you'll need to add the server and login switches yourself.)
bcp MyDatabase.dbo.MyTable format nul -f MyTable.fmt -n
bcp MyDatabase.dbo.MyTable out MyTable.dat -f MyTable.fmt -k -E -b 1000 -h "TABLOCK"
This'll dump the table data as straight binary with a format file, NULLs, and identity values to make absolutely sure everything lines up. In addition, it'll use batches of 1000 to optimize the data dump. Then, to insert it back:
bcp MySecondData.dbo.MyTable in MyTable.dat -f MyTable.fmt -n -b 1000
...which will use the format file, data file, and set batching to increase the speed a little. If you need more speed than that, you'll want to look at BULK INSERT, FirstRow/LastRow, and loading in parallel, but that's a bit beyond the scope of this question. :)