Query table by indexes from integer array - sql

After I got excellent results converting data with "to_timestamp" and "to_number" from VB.NET, I am wondering whether PostgreSQL can query a table by an array of integer indexes sent from .NET?
Say, I have array filled with (1, 3, 5, 6, 9).
Is there any way that PostgreSQL can return the rows for those indexes to the "odbc.reader"?
That would be much faster than looping and querying 5 times like I do now.
Something like this:
SELECT myindexes, myname, myadress from mytable WHERE myindexes IS IN ARRAY
If this is possible, what would a simple query look like?

That's possible.
ANY
SELECT myindex, myname, myadress
FROM mytable
WHERE myindex = ANY ($my_array)
Example with integer-array:
...
WHERE myindex = ANY ('{1,3,5,6,9}'::int[])
Details about ANY in the manual.
IN
There is also the SQL IN() expression for the same purpose.
PostgreSQL in its current implementation transforms that to = ANY (array) internally prior to execution, so it's conceivably a bit slower.
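For illustration, the same filter written with IN, using the example values from above:
SELECT myindex, myname, myadress
FROM mytable
WHERE myindex IN (1, 3, 5, 6, 9)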
Examples for joining to a long list (as per comment):
JOIN to VALUES expression
WITH x(myindex) AS (
VALUES
(1),(3),(5),(6),(9)
)
SELECT myindex, myname, myadress
FROM mytable
JOIN x USING (myindex)
I am using a CTE in the example (which is optional; it could be a sub-query as well, see the sketch below). You need PostgreSQL 8.4 or later for that.
The manual about VALUES.
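As mentioned, the CTE is optional; written with a plain VALUES subquery instead, a sketch might look like:
SELECT myindex, myname, myadress
FROM mytable
JOIN (VALUES (1), (3), (5), (6), (9)) x(myindex) USING (myindex)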
JOIN to unnested array
Or you could unnest() an array and JOIN to it:
SELECT myindex, myname, myadress
FROM mytable
JOIN (SELECT unnest('{1,3,5,6,9}'::int[]) AS myindex) x USING (myindex)
Each of these methods is far superior in performance to running a separate query per value.

Related

How to select a column whose name is a value in another column in POSTGRESQL?

I know this isn't valid SQL, but I'd like to do something like:
SELECT items.{SELECT items.preferred_column}
To elaborate: to get what I'm after, I could write a long CASE WHEN statement:
SELECT
CASE WHEN items.preferred_column = 'column_a' THEN items.column_a
WHEN items.preferred_column = 'column_b' THEN items.column_b
WHEN items.preferred_column = 'column_c' THEN items.column_c
... and so on...
But that seems wrong. I would prefer to write a query that looks at the value of items.preferred_column and loads that column.
Is this possible?
My use case involves an Active Record (the ORM for Rails) query, which limits me. I'm not able to use "INTO" for example.
Doing this without creating a SQL function would be preferred, though if it's not possible without creating a SQL function, that would be good to know.
Thanks in advance for lending your expertise!
You can try transforming the table rows with row_to_json() and then using jsonb_each(); you can then join the resultant "key" field on the preferred_column:
WITH CTE AS (
SELECT
row_to_json(Z.*)::jsonb as rcr,
row_number() over(partition by null order by <whatever comparator clause>) as rn,
Z.*
FROM items Z)
SELECT b.value, a.*
FROM CTE a, jsonb_each(rcr) b, CTE c
WHERE c.rn=a.rn AND b.key = ( c.preferred_column )
Note that this essentially operates as a quasi-pivot, so you'll need to maintain an index (the row_number invocation) to self-join the table when extracting the appropriate key-value pairs from jsonb_each's set-return. Casting to jsonb will be helpful in that the binary form will alphabetize the key-value pairs by key order within the object itself.
If you need the resultant value as a text string instead of a json primitive, you can use
b.value #>> '{}'
rather than switching to jsonb_each_text(); that way, columns that are themselves json are preserved.
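For example, with the query above, the outer SELECT would then read (a.rcr spelled out for clarity; everything else stays the same):
SELECT b.value #>> '{}' AS preferred_value, a.*
FROM CTE a, jsonb_each(a.rcr) b, CTE c
WHERE c.rn = a.rn AND b.key = c.preferred_column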

Filtering several numbers

I want to get some data from an Oracle DB into Power Query (Excel).
I am managing this with an SQL statement.
There are 10 columns I need (out of 50 in total) and millions of rows. I need to filter some data / columns. The criteria are just some numbers like:
100258
100256
100333
100055
These are just SAP cost centers.
For now I just have a WHERE clause which includes 22 different numbers:
WHERE column1 = 100256
OR column1 = 100258
OR ...
Is there maybe a more elegant way?
Maybe something like an array?
best regards
Joshua
You may use WHERE IN e.g.
WHERE column1 IN(100256, 100258, ...)
If you expect to have more values than can be supported by IN (1000, I think) then consider creating a table to store the values, say table1 with a column val. Then you could use:
WHERE column1 IN (SELECT val FROM table1)
You might also consider joining to table1, depending on what your actual query is.
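For reference, the join variant might look like this sketch (your_table stands in for the actual table, and table1(val) is the lookup table suggested above):
SELECT t.*
FROM your_table t
JOIN table1 f ON f.val = t.column1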
Use a WITH clause:
WITH numbers AS (
SELECT 100256 AS num FROM dual UNION ALL
SELECT 100258 FROM dual
)
SELECT t.*
FROM table1 t
JOIN numbers n ON t.column1 = n.num

How to do calculations on json data in Postgres

I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has JSON stored in its 'data' field that looks like this:
[
  {
    "match_type": "exact",
    "search_query": "gm hubcaps",
    "conversions": 2,
    "cost": 1.24
  },
  {
    "match_type": "broad",
    "search_query": "gm auto parts",
    "conversions": 34,
    "cost": 21.33
  },
  {
    "match_type": "phrase",
    "search_query": "silverdo headlights",
    "conversions": 63,
    "cost": 244.05
  }
]
What I want to do is query off these data hashes and sum up the total number of conversions for a given report. I've looked through the PostgreSQL docs and it looks like you can only really do calculations on hashes, not arrays of hashes like this. Is what I'm trying to do possible in Postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?
I'm using PostgreSQL 9.4
EDIT
The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSV's with any columns they like. It's basically just a way to get around having arbitrarily many, user-created tables.
What I want to do is query off these data hashes and sum up the conversions
The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.
CREATE TEMP TABLE report_data (
-- match_type text -- commented out, because we only need ..
-- , search_query text -- .. conversions for this query
conversions int
-- , cost numeric
);
A temp table is one way to register a row type ad-hoc. More explanation in this related answer:
jsonb query with nested objects in an array
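If you'd rather not create a temp table per session, a permanent composite type registers the row type just as well; a minimal sketch covering only the key we need:
CREATE TYPE report_data AS (conversions int);  -- then null::report_data works exactly as below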
Assuming a table report with report_id as PK, for lack of more information.
SELECT r.report_id, sum(d.conversions) AS sum_conversions
FROM report r
LEFT JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
-- WHERE r.report_id = 12345 -- only for given report?
GROUP BY 1;
The LEFT JOIN ensures you get a result, even if data is NULL or empty or the JSON array is empty.
For a sum from a single row in the underlying table, this is faster:
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum(conversions) AS sum_conversions
FROM jsonb_populate_recordset(null::report_data, r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Alternative with jsonb_array_elements() (no need for a registered row type):
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum((value->>'conversions')::int) AS sum_conversions
FROM jsonb_array_elements(r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Normally you would implement this as a plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, as you added).
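For comparison, a normalized layout for this sample data might be sketched like this (column names taken from the JSON above; report_id assumed as before), and the sum becomes a trivial aggregate:
CREATE TABLE report_row (
  report_id    int
, match_type   text
, search_query text
, conversions  int
, cost         numeric
);

SELECT report_id, sum(conversions) AS sum_conversions
FROM   report_row
GROUP  BY 1;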
You could use unnest():
select sum(conv) from
(select (d::json ->> 'conversions')::int as conv from
(select unnest(data) as d from <your table>) all_data
) all_conv
Disclaimer: I don't have Pg 9.2 so I couldn't test it myself.
EDIT: this is assuming that the array you mentioned is a PostgreSQL array, i.e. that the data type of your data column is character varying[]. If you mean the data is a json array, you should be able to use json_array_elements() instead of unnest().

PostgreSQL case insensitive SELECT on array

I'm having problems finding the answer here, on google or in the docs ...
I need to do a case insensitive select against an array type.
So if:
value = {"Foo","bar","bAz"}
I need
SELECT value FROM table WHERE 'foo' = ANY(value)
to match.
I've tried lots of combinations of lower() with no success.
ILIKE instead of = seems to work but I've always been nervous about LIKE - is that the best way?
One alternative not mentioned is to install the citext extension that comes with PostgreSQL 8.4+ and use an array of citext:
regress=# CREATE EXTENSION citext;
regress=# SELECT 'foo' = ANY( '{"Foo","bar","bAz"}'::citext[] );
?column?
----------
t
(1 row)
If you want to be strictly correct about this and avoid extensions you have to do some pretty ugly subqueries because Pg doesn't have many rich array operations, in particular no functional mapping operations. Something like:
SELECT array_agg(lower(($1)[n])) FROM generate_subscripts($1,1) n;
... where $1 is the array parameter. In your case I think you can cheat a bit because you don't care about preserving the array's order, so you can do something like:
SELECT 'foo' IN (SELECT lower(x) FROM unnest('{"Foo","bar","bAz"}'::text[]) x);
This seems hackish to me but I think it should work
SELECT value FROM table WHERE 'foo' = ANY(lower(value::text)::text[])
ilike could have issues if your arrays can have _ or %
Note that what you are doing is converting the text array to a single text string, converting it to lower case, and then back to an array. This should be safe. If this is not sufficient you could use various combinations of string_to_array and array_to_string, but I think the standard textual representations should be safer.
Update building on subquery solution below, one option would be a simple function:
CREATE OR REPLACE FUNCTION lower(text[]) RETURNS text[] LANGUAGE SQL IMMUTABLE AS
$$
SELECT array_agg(lower(value)) FROM unnest($1) value;
$$;
Then you could do:
SELECT value FROM table WHERE 'foo' = ANY(lower(value));
This might actually be the best approach. You could also create GIN indexes on the output of the function if you want.
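A sketch of that, reusing the lower(text[]) function from above (tbl and value stand in for your table and column; the function is already declared IMMUTABLE, which an index expression requires). Note that 'foo' = ANY(...) will not use the GIN index, but the array containment operator @> will:
CREATE INDEX tbl_value_lower_gin_idx ON tbl USING gin (lower(value));

SELECT value FROM tbl WHERE lower(value) @> '{foo}'::text[];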
Another alternative would be with unnest()
WITH tbl AS (SELECT 1 AS id, '{"Foo","bar","bAz"}'::text[] AS value)
SELECT value
FROM (SELECT id, value, unnest(value) AS val FROM tbl) x
WHERE lower(val) = 'foo'
GROUP BY id, value;
I added an id column to get exactly identical results - i.e. duplicate value if there are duplicates in the base table. Depending on your circumstances, you can probably omit the id from the query to collapse duplicates in the results or if there are no dupes to begin with. Also demonstrating a syntax alternative:
SELECT value
FROM (SELECT value, lower(unnest(value)) AS val FROM tbl) x
WHERE val = 'foo'
GROUP BY value;
If array elements are unique within arrays in lower case, you don't even need the GROUP BY, since every value can only match once.
SELECT value
FROM (SELECT value, lower(unnest(value)) AS val FROM tbl) x
WHERE val = 'foo';
'foo' must be lower case, obviously.
Should be fast.
If you want this to be fast on a big table, I would create a functional GIN index, though.
My solution to exclude values uses a sub-select:
and groupname not ilike all (
select unnest(array[exceptionname||'%'])
from public.group_exceptions
where ...
and ...
)
A regular expression may do the job for most cases:
SELECT array_to_string('{"a","b","c"}'::text[],'|') ~* ANY('{"A","B","C"}');
I find creating a custom PostgreSQL function works best for me
CREATE OR REPLACE FUNCTION lower(text_array text[]) RETURNS text[] AS
$BODY$
SELECT (lower(text_array::text))::text[]
$BODY$
LANGUAGE SQL IMMUTABLE;

sql server-query optimization with many columns

We have a "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
Users search for other people on the website. The query is like:
WITH TempResult as (
select ROW_NUMBER() OVER(ORDER BY #sortColumn DESC) as RowNum, profile.id from Profile
where
(#a is null or a = #a) and
(#b is null or b = #b) and
...(over 60 column)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
(RowNum >= #FirstRow)
AND
(RowNum <= #LastRow)
SQL Server by default uses the clustered index to execute the query, but total execution time is over 300. We tested another solution, such as a multi-column index on all the columns in the WHERE clause, but total execution time was over 400.
Do you have any solution to make total execution time lower than 100?
We are using SQL Server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided. Assuming the average search actually only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search. No individual options.
Lucene (or something else) - Search outside of SQL; This is a fairly significant change though.
One other option is something I remember implementing in a system once: create a vertical table that includes all of the data you are searching on and build up a query against it. This is easiest to do with dynamic SQL, but it could be done using table-valued parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name) (unique to make the search work properly, index will make it perform well).
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
OR (AttributeName = 'state' AND AttributeValue = 'MI')
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long-winded) code might prove faster, assuming each individual column has its own separate index...
WITH
filter AS (
SELECT
[a].*
FROM
(SELECT * FROM Profile WHERE #a IS NULL OR a = #a) AS [a]
INNER JOIN
(SELECT id FROM Profile WHERE b = #b UNION ALL SELECT NULL WHERE #b IS NULL) AS [b]
ON ([a].id = [b].id) OR ([b].id IS NULL)
INNER JOIN
(SELECT id FROM Profile WHERE c = #c UNION ALL SELECT NULL WHERE #c IS NULL) AS [c]
ON ([a].id = [c].id) OR ([c].id IS NULL)
.
.
.
INNER JOIN
(SELECT id FROM Profile WHERE zz = #zz UNION ALL SELECT NULL WHERE #zz IS NULL) AS [zz]
ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
SELECT
ROW_NUMBER() OVER(ORDER BY #sortColumn DESC) as RowNum,
[filter].*
FROM
[filter]
)
SELECT
*
FROM
TempResult
WHERE
(RowNum >= #FirstRow)
AND (RowNum <= #LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
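For reference, that would just be one narrow nonclustered index per searchable column, along these lines (column names taken from the question's list):
CREATE NONCLUSTERED INDEX IX_Profile_city ON Profile (city);
CREATE NONCLUSTERED INDEX IX_Profile_state ON Profile (state);
CREATE NONCLUSTERED INDEX IX_Profile_degree ON Profile (degree);
-- ... and so on, one per searchable column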
You have several issues, IMHO. One is that you're going to end up with a table scan no matter what you do.
But I think your more crucial issue here is that you have an unnecessary join:
SELECT profile.* FROM TempResult
WHERE
(RowNum >= #FirstRow)
AND
(RowNum <= #LastRow)
This is a classic "SQL filter" query problem. I've found that the typical approach of "(#b is null or b = #b)" and its common derivatives all yield mediocre performance. The OR clause tends to be the cause.
Over the years I've done a lot of perf tuning and query optimisation. The approach I've found best is to generate dynamic SQL inside a stored proc; most times you also need to add "with Recompile" on the statement. The stored proc helps reduce the potential for SQL injection attacks, and the recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
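A minimal sketch of that pattern (the column and parameter names are only examples): only the filters that were actually supplied end up in the statement, and sp_executesql keeps it parameterised.
CREATE PROCEDURE dbo.SearchProfiles
    @city varchar(50) = NULL,
    @state varchar(50) = NULL   -- ... one optional parameter per searchable column
AS
BEGIN
    DECLARE @sql nvarchar(max) = N'SELECT Id FROM Profile WHERE 1 = 1';

    IF @city IS NOT NULL SET @sql += N' AND city = @city';
    IF @state IS NOT NULL SET @sql += N' AND state = @state';
    -- ... repeat per searchable column ...

    SET @sql += N' OPTION (RECOMPILE)';   -- the "with Recompile" mentioned above

    EXEC sp_executesql @sql,
         N'@city varchar(50), @state varchar(50)',
         @city = @city, @state = @state;
END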
I agree you should also look at the points mentioned above, like:
If you commonly only refer to a small subset of the columns, you could create non-clustered "covering" indexes.
Highly selective columns (i.e. those with many unique values) work best as the lead column in an index.
If many columns have a very small number of distinct values, consider using the BIT datatype, or create your own bitmasked BIGINT to represent many columns, i.e. a form of "enumerated datatype" (a tiny sketch follows this list). But be careful: any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each and can combine them into an equality or range query.
While it is often good to find row IDs with a small query and then join to get all the other columns you want to retrieve (as you are doing above), this approach can sometimes backfire: if the first part of the query does a clustered index scan, it is often faster to get the other columns in the select list and save the second table scan.
So it is always good to try it both ways and see what works best.
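As a tiny illustration of the bitmask idea above (the Flags column and the bit layout are hypothetical): pack the low-cardinality columns into one value and compare the whole value, so the predicate stays index-friendly.
-- hypothetical encoding: bit 0 = gender, bits 1-2 = profilestate, ...
SELECT Id
FROM Profile
WHERE Flags = @flags            -- sargable: equality on the combined value
-- WHERE (Flags & 5) = 5        -- bitwise op on the column is not sargable and forces a scan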
Remember to run SET STATISTICS IO ON and SET STATISTICS TIME ON before running your tests. Then you can see where the IO is, and it may help you with index selection for the most frequent combinations of parameters.
I hope this makes sense without long code samples (they are on my other machine).