I have a Postgres database with a table that has three columns. The data structure belongs to an external system, so I cannot modify it.
Every object is represented by three rows, identified by the column element_id (rows with the same value in this column represent the same object). For example:
key value element_id
-----------------------------------
status active 1
name exampleNameAAA 1
city exampleCityAAA 1
status inactive 2
name exampleNameBBB 2
city exampleCityBBB 2
status inactive 3
name exampleNameCCC 3
city exampleCityCCC 3
In the query, I want to pass a list of names, check whether the row with key 'status' for the same object has the value 'active', and return the names of only those objects whose status is 'active'.
So for this example, there are three objects in the database table. I want to pass two names to the query:
a)exampleNameAAA
b)exampleNameCCC
and the result should be:
exampleNameAAA (because I asked for two objects and only one of them has the value 'active' in its status row).
You can use an EXISTS query:
select e1.*
from element e1
where (e1.key, e1.value) in ( ('name', 'exampleNameAAA'), ('name', 'exampleNameCCC'))
and exists (select *
from element e2
where e2.element_id = e1.element_Id
and (e2.key, e2.value) = ('status', 'active'));
Online example: https://rextester.com/JOWED21150
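A minimal sketch of the EXISTS approach run against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained), and the helper name active_names is hypothetical. The row-value comparison is written as two plain conditions, which should behave the same way here.

```python
import sqlite3

# Sample rows from the question: (key, value, element_id).
ROWS = [
    ("status", "active", 1), ("name", "exampleNameAAA", 1), ("city", "exampleCityAAA", 1),
    ("status", "inactive", 2), ("name", "exampleNameBBB", 2), ("city", "exampleCityBBB", 2),
    ("status", "inactive", 3), ("name", "exampleNameCCC", 3), ("city", "exampleCityCCC", 3),
]


def active_names(names):
    """Return those of the given names whose element has status 'active'."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE element ("key" TEXT, "value" TEXT, element_id INTEGER)')
    conn.executemany("INSERT INTO element VALUES (?, ?, ?)", ROWS)
    # One "?" placeholder per requested name keeps the values out of the SQL text.
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT e1."value"
        FROM element e1
        WHERE e1."key" = 'name'
          AND e1."value" IN ({placeholders})
          AND EXISTS (SELECT 1
                      FROM element e2
                      WHERE e2.element_id = e1.element_id
                        AND e2."key" = 'status'
                        AND e2."value" = 'active')
        ORDER BY e1."value"
    """
    result = [row[0] for row in conn.execute(sql, list(names))]
    conn.close()
    return result


print(active_names(["exampleNameAAA", "exampleNameCCC"]))  # ['exampleNameAAA']
```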
One option uses aggregation:
SELECT
MAX(CASE WHEN "key" = 'name' THEN "value" END) AS name
FROM yourTable
GROUP BY element_id
HAVING
MAX(CASE WHEN "key" = 'name' THEN "value" END) IN
('exampleNameAAA', 'exampleNameCCC') AND
SUM(CASE WHEN "key" = 'status' AND "value" = 'active' THEN 1 ELSE 0 END) > 0;
name
exampleNameAAA
This is the pivot approach, where we isolate individual keys and values for each element_id group.
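To make the pivot step visible, here is a small sketch that runs only the grouping part, without the HAVING filter, so that each element_id collapses to one row. It assumes SQLite (via Python's sqlite3) instead of Postgres, and the function name pivot_elements is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (key, value, element_id).
ROWS = [
    ("status", "active", 1), ("name", "exampleNameAAA", 1), ("city", "exampleCityAAA", 1),
    ("status", "inactive", 2), ("name", "exampleNameBBB", 2), ("city", "exampleCityBBB", 2),
    ("status", "inactive", 3), ("name", "exampleNameCCC", 3), ("city", "exampleCityCCC", 3),
]


def pivot_elements():
    """Collapse the three key/value rows of each element into a single row."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE yourTable ("key" TEXT, "value" TEXT, element_id INTEGER)')
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", ROWS)
    sql = """
        SELECT element_id,
               MAX(CASE WHEN "key" = 'name'   THEN "value" END) AS name,
               MAX(CASE WHEN "key" = 'status' THEN "value" END) AS status,
               MAX(CASE WHEN "key" = 'city'   THEN "value" END) AS city
        FROM yourTable
        GROUP BY element_id
        ORDER BY element_id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in pivot_elements():
    print(row)
```

The HAVING clause in the answer then filters these per-element rows by name and status.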
I like expressing this as:
SELECT MAX(t.value) FILTER (WHERE t.key = 'name') AS name
FROM t
GROUP BY t.element_id
HAVING MAX(t.value) FILTER (WHERE t.key = 'name') IN ('exampleNameAAA', 'exampleNameCCC') AND
MAX(t.value) FILTER (WHERE t.key = 'status') = 'active';
All that said, the exists solution is probably more performant in this case. The advantage of the aggregation approach is that you can easily bring additional columns into the select, such as the city:
SELECT MAX(t.value) FILTER (WHERE t.key = 'name') AS name,
MAX(t.value) FILTER (WHERE t.key = 'city') as city
(Note: key is a bad name for a column because it is a SQL keyword.)
I am stuck on a specific data-flattening scenario and need help with it. I need the output as flattened data where the column values are not fixed. Because of this, I want to restrict the output to a fixed set of columns.
Given table 'test_table':
ID Name Property
-----------------------------------
1 C1 xxx
2 C2 xyz
2 C3 zz
The scenario is that the Name column can have any number of values for a given ID. I need to flatten the data so that there is one row per ID. Since the number of Name values varies by ID, I want to flatten it into a fixed set of 3 column pairs such as Co1, Co2, Co3. The output should look like:
ID Co1 Co1_Property Co2 Co2_Property Co3 Co3_Property
-----------------------------------
1 C1 xxx null null
2 C2 xyz C3 zz
I could not think of a solution using PIVOT or aggregation. Any help would be appreciated.
You can use arrays:
select id,
array_agg(name order by name)[safe_ordinal(1)] as name_1,
array_agg(property order by name)[safe_ordinal(1)] as property_1,
array_agg(name order by name)[safe_ordinal(2)] as name_2,
array_agg(property order by name)[safe_ordinal(2)] as property_2,
array_agg(name order by name)[safe_ordinal(3)] as name_3,
array_agg(property order by name)[safe_ordinal(3)] as property_3
from t
group by id;
All the current answers are verbose and repeat the same code fragments over and over; if you need to account for more columns, you have to copy, paste, and add more lines, which makes them even more verbose.
My preference is to avoid that style of coding and use something more generic, as in the example below:
select * from (
select *, row_number() over(partition by id) col
from `project.dataset.table`)
pivot (max(name) as name, max(property) as property for col in (1, 2, 3))
Applied to the sample data in your question, this returns one row per id with the names and properties spread across columns.
To change the number of output columns, simply modify the for col in (1, 2, 3) part of the query.
For example, if you wanted 5 columns, you would use for col in (1, 2, 3, 4, 5).
The standard practice is to use conditional aggregation. That is, to use CASE expressions to pick which row goes to which column, then MAX() to collapse multiple rows into individual rows...
SELECT
id,
MAX(CASE WHEN name = 'C1' THEN name END) AS co1,
MAX(CASE WHEN name = 'C1' THEN property END) AS co1_property,
MAX(CASE WHEN name = 'C2' THEN name END) AS co2,
MAX(CASE WHEN name = 'C2' THEN property END) AS co2_property,
MAX(CASE WHEN name = 'C3' THEN name END) AS co3,
MAX(CASE WHEN name = 'C3' THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id
Background info:
Not having an ELSE in the CASE expression implicitly means ELSE NULL
The intention is therefore for each column to receive NULL from every input row, except for the row being pivoted into that column
Aggregates, such as MAX() essentially skip NULL values
MAX( {NULL,NULL,'xxx',NULL,NULL} ) therefore equals 'xxx'
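The NULL-skipping behaviour described above can be checked with a tiny sketch. It assumes SQLite through Python's sqlite3 module, where None is stored as NULL; the helper max_ignoring_nulls and the table v are hypothetical names.

```python
import sqlite3


def max_ignoring_nulls(values):
    """Insert the values (None becomes NULL) and return MAX() over them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE v (x TEXT)")
    conn.executemany("INSERT INTO v VALUES (?)", [(x,) for x in values])
    (result,) = conn.execute("SELECT MAX(x) FROM v").fetchone()
    conn.close()
    return result


print(max_ignoring_nulls([None, None, "xxx", None, None]))  # xxx
print(max_ignoring_nulls([None, None]))                     # None
```

When every input is NULL the aggregate itself returns NULL, which is why columns with no matching row come out as NULL in the pivoted result.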
A similar approach "bunches" the values to the left (so that NULL values only ever appear to the right).
That approach first uses row_number() to give each row a value corresponding to the column you want to put that row into:
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
FROM
yourTable
)
SELECT
id,
MAX(CASE WHEN seq_num = 1 THEN name END) AS co1,
MAX(CASE WHEN seq_num = 1 THEN property END) AS co1_property,
MAX(CASE WHEN seq_num = 2 THEN name END) AS co2,
MAX(CASE WHEN seq_num = 2 THEN property END) AS co2_property,
MAX(CASE WHEN seq_num = 3 THEN name END) AS co3,
MAX(CASE WHEN seq_num = 3 THEN property END) AS co3_property
FROM
sorted
GROUP BY
id
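As a rough check of the row_number() variant, the sketch below runs it on the three sample rows of test_table. It assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 rather than the asker's actual engine, and the function name flatten_to_three is hypothetical. Note that the outer query reads from the sorted CTE, since that is where seq_num is defined.

```python
import sqlite3


def flatten_to_three():
    """Return one row per id with up to three (name, property) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT, property TEXT)")
    conn.executemany(
        "INSERT INTO test_table VALUES (?, ?, ?)",
        [(1, "C1", "xxx"), (2, "C2", "xyz"), (2, "C3", "zz")],
    )
    sql = """
        WITH sorted AS (
            SELECT id, name, property,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
            FROM test_table
        )
        SELECT id,
               MAX(CASE WHEN seq_num = 1 THEN name     END) AS co1,
               MAX(CASE WHEN seq_num = 1 THEN property END) AS co1_property,
               MAX(CASE WHEN seq_num = 2 THEN name     END) AS co2,
               MAX(CASE WHEN seq_num = 2 THEN property END) AS co2_property,
               MAX(CASE WHEN seq_num = 3 THEN name     END) AS co3,
               MAX(CASE WHEN seq_num = 3 THEN property END) AS co3_property
        FROM sorted
        GROUP BY id
        ORDER BY id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in flatten_to_three():
    print(row)
```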
I have a table with the structure specified below.
From the table, I just want to retrieve the product id that has Ram with value 12 and Color with value Blue. The expected result is 1.
I tried many queries, but none of them returned the expected result.
What would be the solution?
It is very difficult to manage a separate table for each feature, as we have an undefined set of features.
You can use conditional aggregation:
select productid
from t
group by productid
having max(case when feature = 'Ram' then value end) = '12' and
max(case when feature = 'Color' then value end) = 'Blue';
Use correlated subqueries with NOT EXISTS:
select distinct product_id from tablename a
where not exists
(select 1 from tablename b where a.product_id=b.product_id and feature='Ram' and value<>'12')
and not exists
(select 1 from tablename c where a.product_id=c.product_id and feature='Color' and value<>'Blue')
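The two answers above are not quite equivalent, which a small experiment can show. The rows below are hypothetical (the question's table was not preserved), SQLite via Python's sqlite3 stands in for the real database, and the names t, by_aggregation and by_not_exists are made up. Product 3 deliberately has no Ram row at all.

```python
import sqlite3

# Hypothetical rows: (product_id, feature, value). Product 3 has no 'Ram' row.
ROWS = [
    (1, "Ram", "12"), (1, "Color", "Blue"),
    (2, "Ram", "12"), (2, "Color", "Red"),
    (3, "Color", "Blue"),
]


def _run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (product_id INTEGER, feature TEXT, value TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


def by_aggregation():
    """Conditional aggregation: both features must be present and match."""
    return _run("""
        SELECT product_id
        FROM t
        GROUP BY product_id
        HAVING MAX(CASE WHEN feature = 'Ram'   THEN value END) = '12'
           AND MAX(CASE WHEN feature = 'Color' THEN value END) = 'Blue'
        ORDER BY product_id
    """)


def by_not_exists():
    """NOT EXISTS: only rejects products that have a conflicting row."""
    return _run("""
        SELECT DISTINCT a.product_id
        FROM t a
        WHERE NOT EXISTS (SELECT 1 FROM t b
                          WHERE b.product_id = a.product_id
                            AND b.feature = 'Ram' AND b.value <> '12')
          AND NOT EXISTS (SELECT 1 FROM t c
                          WHERE c.product_id = a.product_id
                            AND c.feature = 'Color' AND c.value <> 'Blue')
        ORDER BY a.product_id
    """)


print(by_aggregation())  # [1]
print(by_not_exists())   # [1, 3]
```

On this data the NOT EXISTS form also returns product 3, because a product with no Ram row has nothing to contradict the condition. Whether that matters depends on whether every product is guaranteed to carry both features.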
I have the following data in the table:
id Parameter Value
10 Location New York
10 Business SME
9 Location London
9 Business SME
8 Location New York
8 Business IT
I want a single row that satisfies the following:
Location = New York and Business = SME
The query below returns multiple rows because of the OR condition.
SELECT * from TABLEA WHERE
(Parameter='Location' AND DataValue = 'New York')
OR ( Parameter='Business' AND DataValue = 'SME')
Update:
Thanks, everyone, for your replies; EXISTS and INNER JOIN resolved my problem. But in my case, the columns to check will be chosen dynamically based on certain conditions.
i.e.
id Parameter Value Value2 Value3
10 Location New York L1
10 Business SME B1
9 Location London L2
9 Business SME B2
8 Location New York L3
8 Business IT B3
Is it possible to write a query that checks the columns dynamically?
You can use this.
SELECT * from TABLEA T WHERE
(Parameter='Location' AND DataValue = 'New York' )
AND EXISTS (
SELECT * FROM TABLEA T1 WHERE T1.id= T.id
AND ( T1.Parameter='Business' AND T1.DataValue = 'SME'))
You need to JOIN the table back to itself to get two rows into a single one. For example:
SELECT
Businesses.ID,
Locations.DataValue AS [Location],
Businesses.DataValue AS Business
FROM TABLEA AS Locations
JOIN TABLEA AS Businesses
ON Locations.ID = Businesses.ID
WHERE Locations.Parameter='Location' AND Locations.DataValue = 'New York'
AND Businesses.Parameter='Business' AND Businesses.DataValue = 'SME'
Other answers here may work, but the joins can add unnecessary cost to the query, or the query itself may be hard to generalise.
One approach that scans the data just once (no joins) and is reasonably general is to take your query, aggregate the results, and check that a sufficient number of your criteria were matched.
Individual attributes/parameters can then be picked out of the results using conditional aggregation.
SELECT
id,
MAX(CASE WHEN Parameter = 'Location' THEN DataValue END) AS Location,
MAX(CASE WHEN Parameter = 'Business' THEN DataValue END) AS Business
FROM
TABLEA
WHERE
(Parameter='Location' AND DataValue = 'New York')
OR (Parameter='Business' AND DataValue = 'SME')
GROUP BY
id
HAVING
COUNT(*) = 2
This kind of query is fairly common when querying an "Entity, Attribute, Value" table. In your case the "Entity" is the id column, the "Attribute" is the parameter column, and the "Value" is the DataValue column.
As you will no doubt find out, however, these are very poor for searching in the way you are doing. This is because many different entities may match some but not all of your conditions, all of which have to be checked, making it very slow.
They're very fast when the query has WHERE id = ??? and you then want to pick out the location or the business, it's just that they're very slow the way you are using them.
I recommend searching for use cases, optimisations and alternatives for "EAV" tables.
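Here is a short sketch of the single-scan query above applied to the six sample rows from the question. SQLite through Python's sqlite3 is assumed as a stand-in for the asker's server, and the function name new_york_sme is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, Parameter, DataValue).
ROWS = [
    (10, "Location", "New York"), (10, "Business", "SME"),
    (9, "Location", "London"), (9, "Business", "SME"),
    (8, "Location", "New York"), (8, "Business", "IT"),
]


def new_york_sme():
    """Return (id, Location, Business) for ids matching both criteria."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLEA (id INTEGER, Parameter TEXT, DataValue TEXT)")
    conn.executemany("INSERT INTO TABLEA VALUES (?, ?, ?)", ROWS)
    sql = """
        SELECT id,
               MAX(CASE WHEN Parameter = 'Location' THEN DataValue END) AS Location,
               MAX(CASE WHEN Parameter = 'Business' THEN DataValue END) AS Business
        FROM TABLEA
        WHERE (Parameter = 'Location' AND DataValue = 'New York')
           OR (Parameter = 'Business' AND DataValue = 'SME')
        GROUP BY id
        HAVING COUNT(*) = 2   -- both criteria matched for this id
        ORDER BY id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(new_york_sme())  # [(10, 'New York', 'SME')]
```

Ids 8 and 9 each match only one of the two criteria, so their groups have a count of 1 and are dropped by the HAVING clause.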
You'd typically aggregate over a key/value table when you want to get an entity that matches multiple attributes:
select id
from keyvalue
group by id
having count(case when parameter = 'Location' AND datavalue = 'New York' then 1 end) > 0
and count(case when parameter = 'Business' AND datavalue = 'SME' then 1 end) > 0;
or
select id
from keyvalue
where (parameter = 'Location' AND datavalue = 'New York')
or (parameter = 'Business' AND datavalue = 'SME')
group by id
having count(distinct parameter) = 2;
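Since the question asks about criteria that change at run time, one possible way to generalise the count(distinct parameter) form is to build the OR list from a dictionary and compare against its size. This is only a sketch under assumptions: SQLite via Python's sqlite3, a hypothetical helper matching_ids, and criteria limited to equality on the Parameter/DataValue pair.

```python
import sqlite3

# Sample rows from the question: (id, Parameter, DataValue).
ROWS = [
    (10, "Location", "New York"), (10, "Business", "SME"),
    (9, "Location", "London"), (9, "Business", "SME"),
    (8, "Location", "New York"), (8, "Business", "IT"),
]


def matching_ids(criteria):
    """Return ids that satisfy every {parameter: value} pair in criteria."""
    if not criteria:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLEA (id INTEGER, Parameter TEXT, DataValue TEXT)")
    conn.executemany("INSERT INTO TABLEA VALUES (?, ?, ?)", ROWS)
    # One "(Parameter = ? AND DataValue = ?)" term per criterion, joined by OR.
    where = " OR ".join("(Parameter = ? AND DataValue = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = f"""
        SELECT id
        FROM TABLEA
        WHERE {where}
        GROUP BY id
        HAVING COUNT(DISTINCT Parameter) = ?
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(sql, params + [len(criteria)])]
    conn.close()
    return ids


print(matching_ids({"Location": "New York", "Business": "SME"}))  # [10]
print(matching_ids({"Business": "SME"}))                          # [9, 10]
```

Only the number of placeholders is interpolated into the SQL text; the parameter names and values themselves are passed as bound parameters.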
You can use TOP(1) expression.
SELECT TOP(1) ... columns ... FROM my_table WHERE ... filter condition ...
Use TOP keyword.
SELECT TOP 1 * from TABLEA WHERE
(Parameter='Location' AND DataValue = 'New York' )
OR ( Parameter='Business' AND DataValue = 'SME')
Change OR for AND:
SELECT * from TABLEA WHERE
(Parameter='Location' AND DataValue = 'New York' )
AND ( Parameter='Business' AND DataValue = 'SME')
Maybe the following:
SELECT Top 1 * from TABLEA
WHERE (Parameter='Location' AND DataValue = 'New York')
AND ( Parameter='Business' AND DataValue = 'SME')
I have a table with columns uuid and type. I want all the uuids 'xxxxx' such that no row has uuid = 'xxxxx' AND type = 'buy'.
This ends up the same as if you took all uuid's in the table, and then removed all uuid's that match SELECT uuid FROM table WHERE type = 'buy'.
I approach these problems using aggregation and a having clause:
select a_uuid
from table t
group by a_uuid
having sum(case when type = 'Purchase' then 1 else 0 end) = 0;
EDIT:
If you have a table with one row per a_uuid, then the fastest is likely to be:
select a_uuid
from adtbs a
where not exists (select 1 from table t where t.a_uuid = a.a_uuid and t.type = 'Purchase');
For this query, you want an index on table(a_uuid, type).
select a_uuid
from t
group by a_uuid
having not bool_or(type = 'Purchase')
http://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE
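The aggregation and NOT EXISTS forms above can be compared side by side on made-up data. The sketch assumes SQLite via Python's sqlite3 (which has no bool_or, so that variant is left out), a hypothetical table named events, and invented rows; only the a_uuid and type columns and the 'Purchase' value come from the answers.

```python
import sqlite3

# Hypothetical rows: (a_uuid, type). Only u1 has a 'Purchase' row.
ROWS = [
    ("u1", "Purchase"), ("u1", "View"),
    ("u2", "View"),
    ("u3", "View"), ("u3", "Click"),
]


def _run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (a_uuid TEXT, type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", ROWS)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


def without_purchase_group_by():
    """Keep groups in which the count of 'Purchase' rows is zero."""
    return _run("""
        SELECT a_uuid
        FROM events
        GROUP BY a_uuid
        HAVING SUM(CASE WHEN type = 'Purchase' THEN 1 ELSE 0 END) = 0
        ORDER BY a_uuid
    """)


def without_purchase_not_exists():
    """Keep uuids for which no 'Purchase' row exists."""
    return _run("""
        SELECT DISTINCT a.a_uuid
        FROM events a
        WHERE NOT EXISTS (SELECT 1 FROM events t
                          WHERE t.a_uuid = a.a_uuid AND t.type = 'Purchase')
        ORDER BY a.a_uuid
    """)


print(without_purchase_group_by())    # ['u2', 'u3']
print(without_purchase_not_exists())  # ['u2', 'u3']
```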
I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains the exact input values for the query?
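For reference, the three-step procedure described in the question can be sketched on the client side as follows. This is an illustration only, assuming SQLite via Python's sqlite3, a table named GroupValues holding the sample rows, and a hypothetical function groups_matching; the set differences are done in Python rather than with EXCEPT.

```python
import sqlite3

# Sample rows from the question: (GroupId, Value).
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
]


def groups_matching(wanted):
    """Return the group ids whose values are exactly the wanted set."""
    wanted = set(wanted)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GroupValues (GroupId INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", ROWS)
    # Step 1: collect the distinct group ids.
    group_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT GroupId FROM GroupValues ORDER BY GroupId"
        )
    ]
    matches = []
    for group_id in group_ids:
        stored = {
            row[0]
            for row in conn.execute(
                "SELECT Value FROM GroupValues WHERE GroupId = ?", (group_id,)
            )
        }
        # Steps 2 and 3: both set differences must be empty.
        if not (wanted - stored) and not (stored - wanted):
            matches.append(group_id)
    conn.close()
    return matches


print(groups_matching(["a", "b", "c"]))       # [1]
print(groups_matching(["b", "a"]))            # [2]
print(groups_matching(["a", "b", "c", "e"]))  # []
```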
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible, for various exclusions and inclusion conditions on the values you are looking for.
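A sketch of the same HAVING pattern with the conditions generated from a Python list, so it can be tried with different value sets. It assumes SQLite via Python's sqlite3 and a hypothetical helper exact_groups; one inclusion condition is emitted per value, followed by the single exclusion condition.

```python
import sqlite3

# Sample rows from the question: (groupid, value).
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
]


def exact_groups(values):
    """Return group ids containing every given value and nothing else."""
    values = list(values)
    if not values:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
    conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", ROWS)
    # One "value is present" condition per requested value.
    includes = " AND ".join(
        "SUM(CASE WHEN value = ? THEN 1 ELSE 0 END) > 0" for _ in values
    )
    marks = ", ".join("?" for _ in values)
    sql = f"""
        SELECT groupid
        FROM GroupValues
        GROUP BY groupid
        HAVING {includes}
           AND SUM(CASE WHEN value NOT IN ({marks}) THEN 1 ELSE 0 END) = 0
        ORDER BY groupid
    """
    # The values are bound twice: once for the inclusions, once for NOT IN.
    result = [row[0] for row in conn.execute(sql, values + values)]
    conn.close()
    return result


print(exact_groups(["a", "b", "c"]))  # [1]
```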
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
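The list-driven version can be exercised in the same way. The sketch below assumes SQLite via Python's sqlite3, builds thelist from bound parameters with a VALUES clause instead of UNION ALL, and uses a hypothetical function name, exact_groups_from_list. It also assumes the input list contains no duplicates, since the query compares against count(*) of the list.

```python
import sqlite3

# Sample rows from the question: (groupid, value).
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
]


def exact_groups_from_list(values):
    """Return group ids whose distinct values equal the given list."""
    values = list(values)
    if not values:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
    conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", ROWS)
    list_rows = ", ".join("(?)" for _ in values)
    sql = f"""
        WITH thelist(value) AS (VALUES {list_rows})
        SELECT gv.groupid
        FROM GroupValues gv
        LEFT OUTER JOIN thelist ON gv.value = thelist.value
        GROUP BY gv.groupid
        HAVING COUNT(DISTINCT gv.value) = (SELECT COUNT(*) FROM thelist)
           AND COUNT(DISTINCT CASE WHEN gv.value = thelist.value
                                   THEN gv.value END)
               = COUNT(DISTINCT gv.value)
        ORDER BY gv.groupid
    """
    result = [row[0] for row in conn.execute(sql, values)]
    conn.close()
    return result


print(exact_groups_from_list(["a", "b", "c"]))  # [1]
```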
EDIT:
The query failed to compile because a table alias was missing; it has been updated with the correct alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))
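The double NOT EXISTS query above can be tried outside SQL Server with a small sketch. It assumes SQLite via Python's sqlite3, ordinary tables named GroupValues and FindValues in place of the temp tables, and a hypothetical wrapper function groups_by_division; the query logic itself is unchanged.

```python
import sqlite3

# Same sample data as in the answer: (GroupID, Val).
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "b"), (3, "c"), (3, "d"),
]


def groups_by_division(find_values):
    """Return group ids whose values are exactly the values searched for."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GroupValues (GroupID INTEGER, Val TEXT)")
    conn.execute("CREATE TABLE FindValues (Val TEXT)")
    conn.executemany("INSERT INTO GroupValues VALUES (?, ?)", ROWS)
    conn.executemany("INSERT INTO FindValues VALUES (?)", [(v,) for v in find_values])
    sql = """
        SELECT gv.GroupID
        FROM (SELECT DISTINCT GroupID FROM GroupValues) gv
        -- No searched value is missing from the group...
        WHERE NOT EXISTS (SELECT 1 FROM FindValues fv2
                          WHERE NOT EXISTS (SELECT 1 FROM GroupValues gv2
                                            WHERE gv2.GroupID = gv.GroupID
                                              AND gv2.Val = fv2.Val))
        -- ...and the group holds no value outside the searched set.
          AND NOT EXISTS (SELECT 1 FROM GroupValues gv3
                          WHERE gv3.GroupID = gv.GroupID
                            AND NOT EXISTS (SELECT 1 FROM FindValues fv3
                                            WHERE fv3.Val = gv3.Val))
        ORDER BY gv.GroupID
    """
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


print(groups_by_division(["a", "b", "c"]))  # [1]
```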