How to replace IN clause with JOIN in Postgres?

I have the following query.
select *
from table table0_
where (table0_.col1, table0_.col2, table0_.col3) in (($1, $2, $3) , ($4, $5, $6) , ($7, $8, $9) , ($10, $11, $12) , ($13, $14, $15))
How can I replace the IN clause with a JOIN in Postgres, so it filters along the lines of the per-tuple equality version below?
select *
from table0_
where table0_.col1=$1
and table0_.col2=$2
and table0_.col3=$3
EDIT: I read somewhere that the IN operator does not make use of indexes. Also, this query takes more time as more parameters are passed.

I don't see why you would need to do that, because in practice there is little difference between them (Postgres can generally use indexes for row-value IN lists as well). If you still want a join, you can use a CTE to build a derived table from the values and join it to the base table:
with data as (
    select *
    from (
        values ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10, $11, $12), ($13, $14, $15)
    ) t (col1, col2, col3)
)
select table0_.*
from table0_
join data
    on table0_.col1 = data.col1
    and table0_.col2 = data.col2
    and table0_.col3 = data.col3
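For what it's worth, the same VALUES-join shape can be exercised with SQLite from Python as a stand-in for Postgres; the table contents here are made up purely for illustration:

```python
import sqlite3

# In-memory stand-in for the Postgres table; data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table0_ (col1 INT, col2 INT, col3 INT, payload TEXT)")
conn.executemany(
    "INSERT INTO table0_ VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "a"), (2, 2, 2, "b"), (9, 9, 9, "c")],
)

# Join against a derived table of tuples instead of a row-value IN list.
rows = conn.execute("""
    WITH data(col1, col2, col3) AS (
        VALUES (1, 1, 1), (2, 2, 2)
    )
    SELECT table0_.payload
    FROM table0_
    JOIN data
      ON table0_.col1 = data.col1
     AND table0_.col2 = data.col2
     AND table0_.col3 = data.col3
    ORDER BY table0_.col1
""").fetchall()
print(rows)  # → [('a',), ('b',)]
```

Either way, Postgres is free to plan the join or the IN list with the same index; the join form mainly becomes attractive when the list of tuples is large.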

How do I add a comma as my delimiter for the output of awk

> cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
This is my attempt:
> awk -F, 'BEGIN{FS=OFS=","} {print $1 $2}' file.csv
col1col2
col1col2
col1col2
col1
col1col2
>
What I want is this
col1,col2
col1,col2
col1,col2
col1,
col1,col2
Below is just for my ref:
> awk -F, '{print $0}' file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
> awk -F, '{print $1}' file.csv
col1
col1
col1
col1
col1
> awk -F, '{print $1 $2}' file.csv
col1col2
col1col2
col1col2
col1
col1col2
The statement print $1 $2 prints one output field, the concatenation of $1 and $2. Hence there will be no OFS output.
What you're looking for is print $1, $2 which prints two distinct fields, and will therefore have the OFS inserted between them. You can see the difference in the following transcript:
pax@styx:~> echo "1 2" | awk 'BEGIN{OFS="-xyzzy-"}{print $1 $2}'
12
pax@styx:~> echo "1 2" | awk 'BEGIN{OFS="-xyzzy-"}{print $1, $2}'
1-xyzzy-2
If you actually want that trailing "," : assigning NF=2 pads (or truncates) every record to exactly two fields, which forces awk to rebuild it with OFS:
mawk NF=2 FS=, OFS=, file.csv
col1,col2
col1,col2
col1,col2
col1,
col1,col2
...and without the trailing "," : records that already have fewer than two fields are printed unchanged:
gawk 'NF < 2 || NF = 2' FS=, OFS=, file.csv
col1,col2
col1,col2
col1,col2
col1
col1,col2
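The accepted fix can be sketched end to end with the sample input recreated from the question:

```shell
#!/bin/sh
# Recreate the sample file.csv from the question.
cat > file.csv <<'EOF'
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5
EOF

# Two distinct output fields, so OFS ("," here) is inserted between them;
# the single-field line gets an empty $2 and therefore a trailing comma.
awk 'BEGIN{FS=OFS=","} {print $1, $2}' file.csv
# Prints:
# col1,col2
# col1,col2
# col1,col2
# col1,
# col1,col2
```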

postgres: using function in where clause

The intention is to search across a few columns, sum the weighted similarity results, and return the top few rows. While the statement below runs as intended, it doesn't feel right having to repeat the same function calls over and over: once as output columns, again in the WHERE criteria, and again in ORDER BY. That is why this is incomplete; I want to combine all the similarity results into a grand total similarity, select the rows where it is above a certain level, and order by that same value. Do I really need to spell it out for each clause? Is there a better way to write this?
select parent.id, parent.name, child.id, child.name,
similarity(parent.name, $1) as parent_name_similarity,
similarity(parent.abbreviated_name, $1) as parent_abbr_similarity,
similarity(parent.common_name, $1) as parent_common_similarity,
similarity(parent.state, $1) as parent_state_similarity,
similarity(parent.city, $1) as parent_city_similarity ,
similarity(child.name, $1) as child_name_similarity,
(
similarity(parent.name, $1)
+ similarity(parent.common_name, $1)
+ CASE WHEN similarity(child.name, $1) IS NULL THEN 0 ELSE similarity(child.name, $1) END
) as weighted_total
from account_master parent
left outer join child_table child on child.parent = parent.id::text
WHERE (
similarity(parent.name, $1)
+ similarity(parent.common_name, $1)
+ CASE WHEN similarity(child.name, $1) IS NULL THEN 0 ELSE similarity(child.name, $1) END
) > .3
ORDER BY (
similarity(parent.name, $1)
+ similarity(parent.common_name, $1)
+ CASE WHEN similarity(child.name, $1) IS NULL THEN 0 ELSE similarity(child.name, $1) END
) DESC;
Update:
similarity() is a function (provided by the pg_trgm extension) that works like a fuzzy text search, e.g. finding the title of a book from a single word. It returns a score from 0 to 1 for how closely the result matches. So if a user searches by company name or city name, I want to return whatever I have that is similar. They could search by name or initials (think FedEx vs Federal Express), or just Denver. Since I'm not sure which field they really mean, I was going to add up the results across each field: name, common name, abbreviated name, city name, child thing name. So I have 6 individual column matches; I want to sum them into a single total, sort by that single total, and only return the top 5-10 rows where the total is above some threshold (~.3).
So I end up calling the similarity function for each column, then combining them into the single total column, combining them again for the WHERE clause, and again for the ORDER BY.
What I was hoping for was something like the query below, where the functions are needed once, for the output columns only, and an alias can be used for the sum, the WHERE, and the ORDER BY:
select parent.id, parent.name, child.id, child.name,
similarity(parent.name, $1) as parent_name_similarity,
similarity(parent.abbreviated_name, $1) as parent_abbr_similarity,
similarity(parent.common_name, $1) as parent_common_similarity,
similarity(parent.state, $1) as parent_state_similarity,
similarity(parent.city, $1) as parent_city_similarity ,
similarity(child.name, $1) as child_name_similarity,
(
parent_name_similarity
+ parent_abbr_similarity
+ CASE WHEN child_name_similarity IS NULL THEN 0 ELSE child_name_similarity END
) as weighted_total
from account_master parent
left outer join child_table child on child.parent = parent.id::text
WHERE weighted_total > .3
ORDER BY weighted_total DESC;
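Standard SQL does not allow a SELECT-list alias to be referenced in WHERE (Postgres accepts aliases in ORDER BY, but not in WHERE). The usual workaround is to compute each expression once in a subquery and let the outer query reuse the aliases. Below is a minimal sketch of that shape, run against SQLite from Python with a toy similarity() registered only so the query can execute; the real pg_trgm similarity() scores trigram overlap and behaves differently:

```python
import sqlite3

# Toy scoring function, made up for illustration: 1.0 on a substring hit,
# otherwise 0.0. NOT equivalent to pg_trgm's similarity().
def similarity(a, b):
    if a is None or b is None:
        return None
    return 1.0 if b.lower() in a.lower() else 0.0

conn = sqlite3.connect(":memory:")
conn.create_function("similarity", 2, similarity)
conn.execute("CREATE TABLE account_master (id INT, name TEXT, common_name TEXT)")
conn.executemany("INSERT INTO account_master VALUES (?, ?, ?)", [
    (1, "Federal Express", "FedEx"),
    (2, "Acme Widgets", "Acme"),
])

# Compute each score once in the inner query; the outer query can then
# reuse the alias in both WHERE and ORDER BY.
rows = conn.execute("""
    SELECT id, name, weighted_total
    FROM (
        SELECT id, name,
               similarity(name, ?) + similarity(common_name, ?) AS weighted_total
        FROM account_master
    ) scored
    WHERE weighted_total > 0.3
    ORDER BY weighted_total DESC
""", ("fedex", "fedex")).fetchall()
print(rows)  # → [(1, 'Federal Express', 1.0)]
```

In Postgres the same effect is often written with a CROSS JOIN LATERAL (SELECT ... AS weighted_total) s, which likewise keeps each expression in a single place.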

coalesce of a query in postgres

I have a query:
SELECT $ID_TABLE,
TO_CHAR($DATE_COLUMN,'YYYYMMDD') ,
'$UPPER_HOUR',
COUNT(1)
FROM $TABLE_NAME
WHERE DATE_TRUNC('day',$DATE_COLUMN) = cast('$TODAY' as date)
AND TO_CHAR($DATE_COLUMN,'HH24MI') BETWEEN '$LOWER_HOUR' AND '$UPPER_HOUR'
GROUP BY TO_CHAR($DATE_COLUMN,'YYYYMMDD');
But in some cases the query returns data, and in other cases it returns nothing.
When it returns nothing, I need to select other (default) values instead, something like:
SELECT(coalesce(
SELECT $ID_TABLE,
TO_CHAR($DATE_COLUMN,'YYYYMMDD') ,
'$UPPER_HOUR',
COUNT(1)
FROM $TABLE_NAME
WHERE DATE_TRUNC('day',$DATE_COLUMN) = cast('$TODAY' as date)
AND TO_CHAR($DATE_COLUMN,'HH24MI') BETWEEN '$LOWER_HOUR' AND '$UPPER_HOUR'
GROUP BY TO_CHAR($DATE_COLUMN,'YYYYMMDD')), select $ID_TABLE, $date, $UPPER_HOUR, 0);
Is it possible to do something like that?
This is for a process that inserts the result of that select.
If the select returns no rows, I need to insert the constant values with count(1) = 0.
The values with $ are constants.
Thanks ;)
It is possible to use coalesce() in this way only when the query returns a single value.
You can use a plpgsql block and the FOUND variable, like in this pseudocode:
do $$
begin
insert into my_table
<a select query>;
if not found then
insert into my_table
values(<some default values>);
end if;
end $$;
Perhaps this is what you mean:
WITH t as (
SELECT $ID_TABLE as col1, TO_CHAR($DATE_COLUMN, 'YYYYMMDD') as col2,
'$UPPER_HOUR' as col3,
COUNT(1) as col4
FROM $TABLE_NAME
WHERE DATE_TRUNC('day',$DATE_COLUMN) = cast('$TODAY' as date) AND
TO_CHAR($DATE_COLUMN, 'HH24MI') BETWEEN '$LOWER_HOUR' AND '$UPPER_HOUR'
GROUP BY TO_CHAR($DATE_COLUMN, 'YYYYMMDD')
)
SELECT *
FROM t
UNION ALL
SELECT v.*
FROM (VALUES (?, ?, ?, ?)) v(col1, col2, col3, col4)
WHERE NOT EXISTS (SELECT 1 FROM t);
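The "query result, else a default row" pattern in that last answer can be checked with SQLite from Python; the table and values below are illustrative stand-ins for the $-constants:

```python
import sqlite3

# Stand-in table for the question's $TABLE_NAME.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT)")
conn.executemany("INSERT INTO events VALUES (?)", [("20240101",), ("20240101",)])

# The CTE holds the aggregate; the UNION ALL branch supplies the default
# row, and its NOT EXISTS guard fires only when the CTE is empty.
sql = """
    WITH t AS (
        SELECT day, COUNT(*) AS n
        FROM events
        WHERE day = ?
        GROUP BY day
    )
    SELECT * FROM t
    UNION ALL
    SELECT ?, 0
    WHERE NOT EXISTS (SELECT 1 FROM t)
"""
print(conn.execute(sql, ("20240101", "20240101")).fetchall())  # → [('20240101', 2)]
print(conn.execute(sql, ("20240202", "20240202")).fetchall())  # → [('20240202', 0)]
```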

Postgres: Could not identify an ordering operator for type unknown

I have this query with prepared statement:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY $1
LIMIT $2
I send the value "name ASC" for $1 and 10 for $2.
For some reason I am getting this error:
could not identify an ordering operator for type unknown
If I hard-code name ASC instead of $1, like this:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY name ASC
LIMIT $1
It is working fine.
What am I doing wrong?
For one column you can use CASE WHEN to parametrize it:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY
CASE WHEN $1 = 'name' THEN name
WHEN $1 = 'col_name' THEN col_name
ELSE ...
END
LIMIT $2;
or:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY
CASE $1
WHEN 'name' THEN name
WHEN 'col_name' THEN col_name
ELSE column_name -- default sorting
END
LIMIT $2;
When using CASE you may need to cast the columns to a common data type, since all branches of a CASE expression must return compatible types; otherwise you can get implicit conversion errors.
EDIT:
SELECT sub.*
FROM (
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
) As sub
ORDER BY
CASE $1
WHEN 'name' THEN name
WHEN 'col_name' THEN col_name
ELSE column_name -- default sorting
END
LIMIT $2;
You can't pass column names (or ASC/DESC) as bind variables: a parameter is always treated as a value, never as an identifier, which is why $1 arrives as a constant of unknown type. Parametrizing identifiers requires building the query text dynamically.
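The ORDER BY CASE workaround can be demonstrated with SQLite from Python; the schema and rows here are made up for illustration:

```python
import sqlite3

# Stand-in table; only two sortable columns for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_services (name TEXT, city TEXT)")
conn.executemany("INSERT INTO service_services VALUES (?, ?)", [
    ("beta", "Austin"), ("alpha", "Zurich"),
])

# The parameter selects WHICH column to sort by; the column name itself
# is never interpolated into the SQL text.
sql = """
    SELECT name, city FROM service_services
    ORDER BY CASE ? WHEN 'name' THEN name WHEN 'city' THEN city END
    LIMIT ?
"""
print(conn.execute(sql, ("name", 10)).fetchall())
# → [('alpha', 'Zurich'), ('beta', 'Austin')]
print(conn.execute(sql, ("city", 10)).fetchall())
# → [('beta', 'Austin'), ('alpha', 'Zurich')]
```

Note the CASE expression sorts everything as one type, so mixing, say, a text and a numeric column in the branches needs explicit casts, as the answer above points out.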

Use different line separator in awk

I have a file as follows:
cat file
00:29:01|10.3.57.60|dbname1| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
^
00:01:00|10.3.56.101|dbname2|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"^
These are the output rows of a SQL result set, delimited by pipe (|). To mark the end of each record I appended a ^ to each row.
I want to get the output as:
1/00:29:01|10.3.57.60|parasol_ams| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
2/00:01:00|10.3.56.101|parasol_sprint_tep|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"
But when I am using:
cat file | awk -F '|' -v RS="^" '{ print FNR "/" $0 }'
I get:
1/00:29:01|10.3.57.60|parasol_ams| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
2/
00:01:00|10.3.56.101|parasol_sprint_tep|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"
3/
One way is to stay line-oriented: skip the separator lines and prepend a running counter to each line containing a "|" (i.e. the first line of every record):
awk '/^\^/{next}/\|/{sub("^",++c"/")}1' file
Alternatively, keep RS='^' but strip the newline left at the start of each record before printing the fields:
awk -v RS='^' -F '|' '{sub("^\n","")}{printf "%s/TIME:%s HOST:%s DB:%s SQL:%s",FNR,$1,$2,$3,$4}' file
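The first approach can be shown end to end on a simplified stand-in for the question's file (each ^ separator on its own line):

```shell
#!/bin/sh
# Simplified input: two ^-terminated records, the first spanning two lines.
cat > records.txt <<'EOF'
00:29:01|10.3.57.60|dbname1|SELECT
 re.id
^
00:01:00|10.3.56.101|dbname2|SELECT 1
^
EOF

# Skip the separator lines; prepend a running counter to each line
# containing "|" (the first line of every record).
awk '/^\^/{next} /\|/{sub("^", ++c "/")}1' records.txt
# Prints:
# 1/00:29:01|10.3.57.60|dbname1|SELECT
#  re.id
# 2/00:01:00|10.3.56.101|dbname2|SELECT 1
```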