I have two tables: DOCUMENT and METADATA. DOCUMENT stores an ID and some information we're not interested in; METADATA stores "tags" for those documents. A tag is composed of a key and a value.
So for one document, there is only one entry in the DOCUMENT table, but possibly many in the METADATA table.
Now I need to pass a set of key/value pairs and retrieve from the METADATA table only the documents that match ALL of them. That means inspecting several rows "at the same time", and I don't really know how to do it.
Quick example:
META_KEY | META_VALUE | META_DOCUMENT_ID
----------------------------------------
Firstname| Chris | 1
Lastname | Doe | 1
Firstname| Chris | 2
Lastname | Moe | 2
So if I query with the following tags : "Firstname"="Chris", "Lastname"="Doe", I want 1 as result. If I only specify "Firstname"="Chris" I want both 1 and 2 as results.
Thanks a lot for any help !
EDIT :
How about something where I count the number of tags that have to match ?
Like this :
select meta_document_id, count(*)
from metadata
where (meta_key = 'Firstname' and meta_value = 'Chris')
or (meta_key = 'Lastname' and meta_value = 'Doe')
group by meta_document_id
With the count(*) I can easily find out if all the input key/value pairs have matched. How would that run performance-wise ?
Well, you are using a database model known as "key-value" or "entity-attribute-value" (EAV).
This is usually not the best choice; you can read more about it in these questions:
Key/Value pairs in a database table
Key value pairs in relational database
You need a separate query for each of the two cases: the first matches a single tag, the second joins the table to itself to match two tags:
SELECT distinct META_DOCUMENT_ID
FROM METADATA
WHERE meta_key = 'Firstname' and meta_value = 'Chris'
SELECT distinct m1.META_DOCUMENT_ID
FROM METADATA m1
JOIN METADATA m2
ON m1.META_DOCUMENT_ID = m2.META_DOCUMENT_ID
WHERE m1.meta_key = 'Firstname' and m1.meta_value = 'Chris'
AND m2.meta_key = 'Lastname' and m2.meta_value = 'Doe'
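The self-join generalizes to N key/value pairs by adding one join per pair. Here is a minimal sketch of that idea, assuming Python's sqlite3 as a stand-in for the real database; the helper names `build_self_join_query` and `match_all` are made up for illustration.

```python
import sqlite3

def build_self_join_query(n_pairs):
    """Build a query that joins metadata to itself once per key/value pair."""
    if n_pairs < 1:
        raise ValueError("need at least one key/value pair")
    # m1 is the base alias; m2..mN are joined on the document id.
    joins = [
        f"JOIN metadata m{i} ON m{i}.meta_document_id = m1.meta_document_id"
        for i in range(2, n_pairs + 1)
    ]
    filters = [
        f"m{i}.meta_key = ? AND m{i}.meta_value = ?"
        for i in range(1, n_pairs + 1)
    ]
    return " ".join(
        ["SELECT DISTINCT m1.meta_document_id FROM metadata m1"]
        + joins
        + ["WHERE " + " AND ".join(filters), "ORDER BY m1.meta_document_id"]
    )

def match_all(conn, pairs):
    """Return the ids of documents that match every (key, value) pair."""
    params = [x for pair in pairs for x in pair]
    sql = build_self_join_query(len(pairs))
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (meta_key TEXT, meta_value TEXT, meta_document_id INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [("Firstname", "Chris", 1), ("Lastname", "Doe", 1),
     ("Firstname", "Chris", 2), ("Lastname", "Moe", 2)],
)

print(match_all(conn, [("Firstname", "Chris"), ("Lastname", "Doe")]))  # [1]
print(match_all(conn, [("Firstname", "Chris")]))                       # [1, 2]
```

The values are passed as bound parameters; only the number of joins changes with the number of pairs.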
EDIT:
I suppose I'll have to join N times the table for N key/value pairs ?
This could be done without a join, for example as below (assuming that each document id has at most one value per meta_key, so that COUNT(*) is not inflated by duplicates):
SELECT META_DOCUMENT_ID
FROM METADATA
WHERE (meta_key, meta_value) IN
( ('Firstname' ,'Chris'), ('Lastname', 'Doe' ) )
GROUP BY META_DOCUMENT_ID
HAVING COUNT(*) = 2 /* 2 means that we are looking for 2 meta keys */
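To try the counting approach with an arbitrary number of tags, something like the following might do. It is only a sketch: it uses Python's sqlite3 instead of Oracle, the function name `find_documents` is hypothetical, and it counts distinct keys rather than rows so that duplicated rows do not inflate the count.

```python
import sqlite3

def find_documents(conn, tags):
    """Return the ids of documents that carry every key/value pair in `tags`."""
    pairs = sorted(tags.items())
    if not pairs:
        return []
    # One "(meta_key = ? AND meta_value = ?)" clause per requested pair.
    where = " OR ".join("(meta_key = ? AND meta_value = ?)" for _ in pairs)
    params = [x for pair in pairs for x in pair]
    sql = (
        "SELECT meta_document_id FROM metadata "
        f"WHERE {where} "
        "GROUP BY meta_document_id "
        # Keys in `tags` are unique, so a full match has exactly len(pairs) keys.
        "HAVING COUNT(DISTINCT meta_key) = ? "
        "ORDER BY meta_document_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(pairs)])]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (meta_key TEXT, meta_value TEXT, meta_document_id INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [("Firstname", "Chris", 1), ("Lastname", "Doe", 1),
     ("Firstname", "Chris", 2), ("Lastname", "Moe", 2)],
)

print(find_documents(conn, {"Firstname": "Chris", "Lastname": "Doe"}))  # [1]
print(find_documents(conn, {"Firstname": "Chris"}))                     # [1, 2]
```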
How is that going to run performance-wise ?
Terribly. See an explanation from links above about this model.
In many cases this query must do a full table scan (especially when more than a few attributes/keys are being searched for), count the matching rows for each id, and then pick the ids whose count is 2.
In a normalized model this is a simple query that can use indexes to quickly pick only the few rows with firstname = 'Chris':
SELECT *
FROM table
WHERE firstname = 'Chris' and lastname = 'Doe'
Oracle Setup:
CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
KEY VARCHAR2(50),
VALUE VARCHAR2(50)
);
/
CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/
CREATE TABLE meta_data ( meta_key, meta_value, meta_document_id ) AS
SELECT 'Firstname', 'Chris', 1 FROM DUAL UNION ALL
SELECT 'Lastname', 'Doe', 1 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-2368', 1 FROM DUAL UNION ALL
SELECT 'Firstname', 'Chris', 2 FROM DUAL UNION ALL
SELECT 'Lastname', 'Moe', 2 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-0001', 2 FROM DUAL;
Query:
SELECT meta_document_id
FROM (
SELECT meta_document_id,
CAST(
COLLECT(
KEY_VALUE_PAIR( meta_key, meta_value )
) AS KEY_VALUE_TABLE
) AS key_values
FROM meta_data
GROUP BY meta_document_id
)
WHERE KEY_VALUE_TABLE(
-- Your values here:
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' )
)
SUBMULTISET OF key_values;
Output:
META_DOCUMENT_ID
------------------
1
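For readers without Oracle collection types at hand, the containment test can be imitated in application code. This is a rough sketch, assuming Python's sqlite3 and plain Python sets; sets ignore duplicate pairs, so it only approximates SUBMULTISET OF, and `documents_with` is a made-up name.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_data (meta_key TEXT, meta_value TEXT, meta_document_id INTEGER)"
)
conn.executemany(
    "INSERT INTO meta_data VALUES (?, ?, ?)",
    [("Firstname", "Chris", 1), ("Lastname", "Doe", 1), ("Phonenumber", "555-2368", 1),
     ("Firstname", "Chris", 2), ("Lastname", "Moe", 2), ("Phonenumber", "555-0001", 2)],
)

# Collect each document's pairs into one set, similar in spirit to COLLECT ... GROUP BY.
tags_by_doc = defaultdict(set)
for doc_id, key, value in conn.execute(
    "SELECT meta_document_id, meta_key, meta_value FROM meta_data"
):
    tags_by_doc[doc_id].add((key, value))

def documents_with(wanted):
    """Return the ids of documents whose tag set contains every wanted pair."""
    wanted = set(wanted)
    return sorted(doc for doc, tags in tags_by_doc.items() if wanted <= tags)

print(documents_with([("Firstname", "Chris"), ("Lastname", "Doe")]))  # [1]
print(documents_with([("Firstname", "Chris")]))                       # [1, 2]
```

This pulls every row into memory, so it is only meant to show the logic, not to replace the query.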
Update - Reimplementing the meta data table using a nested table:
Oracle Setup:
CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
META_KEY VARCHAR2(50),
META_VALUE VARCHAR2(50)
);
/
CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/
CREATE TABLE meta_data (
meta_document_id INT,
key_values KEY_VALUE_TABLE
) NESTED TABLE key_values STORE AS meta_data_key_values;
CREATE UNIQUE INDEX META_DATA_KEY_VALUES_IDX ON META_DATA_KEY_VALUES (
NESTED_TABLE_ID,
META_KEY,
META_VALUE
);
/
-- Insert everything in one go:
INSERT INTO META_DATA VALUES(
1,
KEY_VALUE_TABLE(
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' ),
KEY_VALUE_PAIR( 'Phonenumber', '555-2368' )
)
);
-- Insert everything in bits:
INSERT INTO meta_data VALUES ( 2, KEY_VALUE_TABLE() );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Firstname', 'Chris' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Lastname', 'Moe' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Phonenumber', '555-0001' );
--Select all the key-value pairs:
SELECT META_DOCUMENT_ID,
META_KEY,
META_VALUE
FROM META_DATA md,
TABLE( md.KEY_VALUES );
Query:
The changes above let you simplify the query a lot:
SELECT META_DOCUMENT_ID
FROM meta_data
WHERE KEY_VALUE_TABLE(
-- Your values here:
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' )
)
SUBMULTISET OF key_values;
If you know in advance all the possible TAGS, an approach could be with some PIVOT:
with METADATA (META_KEY, META_VALUE, META_DOCUMENT_ID) as
(
select 'Firstname', 'Chris',1 from dual union all
select 'Lastname', 'Doe',1 from dual union all
select 'Firstname', 'Chris',2 from dual union all
select 'Lastname', 'Moe',2 from dual
)
select *
from metadata
PIVOT ( max (META_VALUE ) FOR (META_KEY) IN ('Firstname' AS Firstname, 'Lastname' AS Lastname))
where Firstname = 'Chris' /* and Lastname ='Doe' ...*/
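On databases without a PIVOT clause, the same reshaping can be written as conditional aggregation (MAX over a CASE per known tag). A small sketch, assuming Python's sqlite3 as the test bed; the `search` helper is hypothetical and handles only the two tags shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (meta_key TEXT, meta_value TEXT, meta_document_id INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [("Firstname", "Chris", 1), ("Lastname", "Doe", 1),
     ("Firstname", "Chris", 2), ("Lastname", "Moe", 2)],
)

# One output column per known tag; MAX picks the single non-NULL value per document.
PIVOTED = """
    SELECT meta_document_id,
           MAX(CASE WHEN meta_key = 'Firstname' THEN meta_value END) AS firstname,
           MAX(CASE WHEN meta_key = 'Lastname'  THEN meta_value END) AS lastname
    FROM metadata
    GROUP BY meta_document_id
"""

def search(firstname, lastname):
    """Return the ids of documents with the given first and last name."""
    sql = (
        f"SELECT meta_document_id FROM ({PIVOTED}) "
        "WHERE firstname = ? AND lastname = ? ORDER BY meta_document_id"
    )
    return [row[0] for row in conn.execute(sql, (firstname, lastname))]

print(search("Chris", "Doe"))  # [1]
```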
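An app is being developed where a user picks which data he wants to see in a report. Given the following data:
ReportDataValues
| ID | TableName | ColumnName |
| -- | --------- | ---------- |
| 1 | customer | first_name |
| 2 | address | zip_code |
Customer
| ID | first_name | last_name | address_id |
| -- | ---------- | --------- | ---------- |
| 1 | joe | powell | 1 |
| 2 | andy | smith | 2 |
Address
| ID | street | zip_code |
| -- | ------------ | -------- |
| 1 | main ave. | 48521 |
| 2 | central str. | 56851 |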
Is it possible, using generic SQL mechanisms (PIVOT, UNPIVOT or some other way), to select data from only the table.column pairs stored as rows in the ReportDataValues table? The query must be compatible with both SQL Server and Oracle and must not use dynamic execution of generated statements (like EXEC(query) or EXECUTE IMMEDIATE (query)). The result would look like this:
| Col1 | Col2 |
| ---- | ----- |
| joe | 48521 |
| andy | 56851 |
Later SQL statement will be used in a SAP Crystal Reports reporting engine.
In Oracle, join the customer and address tables to every row of reportdatavalues and then use a CASE expression to correlate the expected value with the table columns and pivot:
SELECT col1, col2
FROM (
SELECT c.id,
r.id AS value_id,
CASE
WHEN r.tablename = 'customer' AND r.columnname = 'id'
THEN TO_CHAR(c.id)
WHEN r.tablename = 'customer' AND r.columnname = 'first_name'
THEN c.first_name
WHEN r.tablename = 'customer' AND r.columnname = 'last_name'
THEN c.last_name
WHEN r.tablename = 'address' AND r.columnname = 'street'
THEN a.street
WHEN r.tablename = 'address' AND r.columnname = 'zip_code'
THEN TO_CHAR(a.zip_code)
END AS value
FROM customer c
INNER JOIN address a
ON a.id = c.address_id
CROSS JOIN ReportDataValues r
)
PIVOT (
MAX(value) FOR value_id IN (1 AS col1, 2 AS col2)
)
Which, for the sample data:
CREATE TABLE ReportDataValues (ID, TableName, ColumnName) AS
SELECT 1, 'customer', 'first_name' FROM DUAL UNION ALL
SELECT 2, 'address', 'zip_code' FROM DUAL;
CREATE TABLE Customer (ID, first_name, last_name, address_id) AS
SELECT 1, 'joe', 'powell', 1 FROM DUAL UNION ALL
SELECT 2, 'andy', 'smith', 2 FROM DUAL;
CREATE TABLE Address (ID, street, zip_code) AS
SELECT 1, 'main ave.', 48521 FROM DUAL UNION ALL
SELECT 2, 'central str.', 56851 FROM DUAL;
Outputs:
| COL1 | COL2 |
| ---- | ----- |
| joe | 48521 |
| andy | 56851 |
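The same CASE-based lookup can be combined with conditional aggregation instead of PIVOT, which avoids the Oracle-specific clause. Below is a sketch of that variant, run on SQLite through Python's sqlite3 purely so it can be tried locally; CAST replaces TO_CHAR, and only the columns present in the sample data are mapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ReportDataValues (id INTEGER, tablename TEXT, columnname TEXT);
    CREATE TABLE Customer (id INTEGER, first_name TEXT, last_name TEXT, address_id INTEGER);
    CREATE TABLE Address (id INTEGER, street TEXT, zip_code INTEGER);
    INSERT INTO ReportDataValues VALUES (1, 'customer', 'first_name'), (2, 'address', 'zip_code');
    INSERT INTO Customer VALUES (1, 'joe', 'powell', 1), (2, 'andy', 'smith', 2);
    INSERT INTO Address VALUES (1, 'main ave.', 48521), (2, 'central str.', 56851);
""")

rows = conn.execute("""
    SELECT MAX(CASE WHEN value_id = 1 THEN value END) AS col1,
           MAX(CASE WHEN value_id = 2 THEN value END) AS col2
    FROM (
        -- Every customer row is paired with every requested table.column row.
        SELECT c.id AS customer_id,
               r.id AS value_id,
               CASE
                   WHEN r.tablename = 'customer' AND r.columnname = 'first_name'
                       THEN c.first_name
                   WHEN r.tablename = 'customer' AND r.columnname = 'last_name'
                       THEN c.last_name
                   WHEN r.tablename = 'address' AND r.columnname = 'street'
                       THEN a.street
                   WHEN r.tablename = 'address' AND r.columnname = 'zip_code'
                       THEN CAST(a.zip_code AS TEXT)
               END AS value
        FROM Customer c
        INNER JOIN Address a ON a.id = c.address_id
        CROSS JOIN ReportDataValues r
    )
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(rows)  # [('joe', '48521'), ('andy', '56851')]
```

As in the PIVOT version, the number of output columns is fixed in the query text; only their content is driven by ReportDataValues.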
I have two tables I want to create a UNION of, but they do not have the same schema.
Table 1 looks like this:
Table 1
Table 2 looks like this:
Table 2
How can I select all the data from Table 2 and add the fields details.employment.locations city, regionCode, and Formatted? I just want to fill all of them with null values, using BigQuery SQL.
Thank you,
This should work. I have created CTEs for table1 and table2 as per your schema. Since you already have the tables in place, you can start your query from the UNIONED CTE.
WITH TABLE1 AS
(
SELECT
'WEBSITE 1' AS WEBSITE,
STRUCT (
STRUCT ('NAME 1 ') AS NAME,
STRUCT ('AGE 1') AS AGE,
'GENDER' AS GENDER,
STRUCT(
'EMPLOYEMENT NAME 1' AS NAME,
FALSE AS _CURRENT,
'EMPLOYEMENT TITLE 1' AS TITLE,
STRUCT(
'EMPLOYEMENT LOCATION 1' AS CITY,
'EMPLOYEMENT REGION_CODE 1' AS REGION_CODE,
'EMPLOYEMENT FORMATTED 1' AS FORMATTED
) AS LOCATION
) AS EMPLOYEMENT
) AS DETAILS
)
, TABLE2 AS
(
SELECT
'WEBSITE 1' AS WEBSITE,
STRUCT (
STRUCT ('NAME 2') AS NAME,
STRUCT ('AGE 2') AS AGE,
'GENDER' AS GENDER,
STRUCT(
'EMPLOYEMENT NAME 2' AS NAME,
FALSE AS _CURRENT,
'EMPLOYEMENT TITLE 2' AS TITLE
) AS EMPLOYEMENT
) AS DETAILS
)
,UNIONED AS
(
SELECT WEBSITE,
DETAILS.NAME AS NAME,
DETAILS.AGE AS AGE,
DETAILS.GENDER AS GENDER,
DETAILS.EMPLOYEMENT.NAME AS EMPLOYEMENT_NAME,
DETAILS.EMPLOYEMENT._CURRENT AS EMPLOYEMENT_CURRENT,
DETAILS.EMPLOYEMENT.TITLE AS EMPLOYEMENT_TITLE,
DETAILS.EMPLOYEMENT.LOCATION.CITY AS EMPLOYEMENT_LOCATION_CITY,
DETAILS.EMPLOYEMENT.LOCATION.REGION_CODE AS EMPLOYEMENT_LOCATION_REGION_CODE,
DETAILS.EMPLOYEMENT.LOCATION.FORMATTED AS EMPLOYEMENT_LOCATION_FORMATTED,
FROM TABLE1
UNION ALL
SELECT WEBSITE,
DETAILS.NAME,
DETAILS.AGE,
DETAILS.GENDER,
DETAILS.EMPLOYEMENT.NAME,
DETAILS.EMPLOYEMENT._CURRENT,
DETAILS.EMPLOYEMENT.TITLE,
NULL ,
NULL,
NULL
FROM TABLE2
)
SELECT
WEBSITE,
STRUCT (
U.NAME AS NAME, -- already a STRUCT; wrapping it again would nest it twice
U.AGE AS AGE,
U.GENDER,
STRUCT (
U.EMPLOYEMENT_NAME AS NAME,
U.EMPLOYEMENT_CURRENT AS _CURRENT,
U.EMPLOYEMENT_TITLE AS TITLE,
STRUCT (
U.EMPLOYEMENT_LOCATION_CITY AS CITY,
U.EMPLOYEMENT_LOCATION_REGION_CODE AS REGION_CODE,
U.EMPLOYEMENT_LOCATION_FORMATTED AS FORMATTED
) AS LOCATION
) AS EMPLOYEMENT -- spelled as in the CTEs above
) AS DETAILS
FROM UNIONED AS U
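The core of the approach (flatten, pad the missing columns with NULL in a UNION ALL, then rebuild the nesting) can be tried without BigQuery. The following is just an illustration on SQLite via Python's sqlite3: the two flat tables and their columns are simplified, hypothetical stand-ins for the real nested schema, and the nesting is rebuilt as Python dicts rather than STRUCTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- table1 has the location columns, table2 does not.
    CREATE TABLE table1 (website TEXT, name TEXT, city TEXT, region_code TEXT);
    CREATE TABLE table2 (website TEXT, name TEXT);
    INSERT INTO table1 VALUES ('WEBSITE 1', 'NAME 1', 'CITY 1', 'REGION 1');
    INSERT INTO table2 VALUES ('WEBSITE 1', 'NAME 2');
""")

# Both branches must expose the same number of columns, so table2 is padded with NULLs.
rows = conn.execute("""
    SELECT website, name, city, region_code FROM table1
    UNION ALL
    SELECT website, name, NULL, NULL FROM table2
    ORDER BY name
""").fetchall()

# Rebuild the nested shape from the flat, unioned rows.
records = [
    {"website": w, "details": {"name": n, "location": {"city": c, "region_code": r}}}
    for w, n, c, r in rows
]

print(records[1]["details"]["location"])  # {'city': None, 'region_code': None}
```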
Any idea why COALESCE is not returning the string 'emptyfield' at all?
SELECT
userid,
string_agg(COALESCE(configvalue, 'emptyfield'), '|')
FROM
oc_preferences
WHERE
configkey IN(
'email',
'quota',
'lastLogin',
'displayName'
)
GROUP BY
userid;
I'm getting a mixed field order and I'm also missing the NULL values:
480f0c81-8090aa8f|1 GB|John Smith|john.smith#e-mail.com|1551376267
9094f888-aa4ef8ef|peter.calo#domain.com|1 GB|Peter Calo
34555345-76867888|Mary Aston|2 GB
But I expected something like:
480f0c81-8090aa8f|1 GB|John Smith|john.smith#e-mail.com|1551376267
9094f888-aa4ef8ef|1 GB|Peter Calo|peter.calo#domain.com|emptyfield
34555345-76867888|2 GB|Mary Aston|emptyfield |emptyfield
the table looks like this
SELECT *
FROM oc_preferences
ORDER by userid
FETCH first 5 rows only
userid|appid|configkey|configvalue
480f0c81-8090aa8f|avatar|generated|true
480f0c81-8090aa8f|files|quota|1 GB
480f0c81-8090aa8f|settings|email|john.smith#e-mail.com
480f0c81-8090aa8f|user_ldap|displayName|John Smith
480f0c81-8090aa8f|user_ldap|homePath|
live example: https://www.db-fiddle.com/f/2rNofuyfKv3j3YLyRctE7U/1
Applying a custom order before aggregating may help with the mixed field order. I assume that every user has all the configkeys in this table, even if their values are empty.
See fiddle below:
Schema (PostgreSQL v11)
CREATE TABLE oc_preferences (
userid VARCHAR(17),
appid VARCHAR(9),
configkey VARCHAR(11),
configvalue VARCHAR(21)
);
INSERT INTO oc_preferences
(userid, appid, configkey, configvalue)
VALUES
('480f0c81-8090aa8f', 'avatar', 'generated', 'true'),
('480f0c81-8090aa8f', 'files', 'quota', '1 GB'),
('480f0c81-8090aa8f', 'settings', 'email', 'john.smith#e-mail.com'),
('480f0c81-8090aa8f', 'user_ldap', 'displayName', 'John Smith'),
('480f0c81-8090aa8f', 'user_ldap', 'homePath', '');
Query #1
WITH oc_preferences_sorted AS (
SELECT
userid,configvalue,
CASE
WHEN configkey='quota' THEN 1
WHEN configkey='displayName' THEN 2
WHEN configkey='email' THEN 3
WHEN configkey='lastLogin' THEN 4
END as custom_order
FROM
oc_preferences
WHERE
configkey IN(
'email',
'quota',
'lastLogin',
'displayName'
)
ORDER BY 3
)
SELECT
userid,
string_agg(COALESCE(configvalue, 'emptyfield'), '|')
FROM
oc_preferences_sorted
GROUP BY
userid;
| userid | string_agg |
| ----------------- | ------------------------------------- |
| 480f0c81-8090aa8f | 1 GB|John Smith|john.smith#e-mail.com |
COALESCE will not return the string 'emptyfield' when no record with that configkey exists, because there is then no row to aggregate, as the example above shows.
The following query first generates every user/configkey combination and then continues with your approach to build the concatenated field:
WITH user_detaults AS (
SELECT
userid,
configkey
FROM
(SELECT DISTINCT userid from oc_preferences) users
INNER JOIN
(
SELECT 'quota' as configkey UNION ALL
SELECT 'displayName' as configkey UNION ALL
SELECT 'email' as configkey UNION ALL
SELECT 'lastLogin' as configkey
) keys ON 1=1
),
oc_preferences_sorted AS (
SELECT
u.userid,op.configvalue,
CASE
WHEN u.configkey='quota' THEN 1
WHEN u.configkey='displayName' THEN 2
WHEN u.configkey='email' THEN 3
WHEN u.configkey='lastLogin' THEN 4
END as custom_order
FROM
oc_preferences op
RIGHT JOIN
user_detaults u on u.userid = op.userid AND
u.configkey = op.configkey
ORDER BY 3
)
SELECT
userid,
string_agg(COALESCE(configvalue, 'emptyfield'), '|')
FROM
oc_preferences_sorted
GROUP BY
userid;
| userid | string_agg |
| ----------------- | ------------------------------------------------ |
| 480f0c81-8090aa8f | 1 GB|John Smith|john.smith#e-mail.com|emptyfield |
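Recommended edit: an ORDER BY inside a CTE does not guarantee the order in which string_agg sees the rows, so put the ordering inside the aggregate itself (and use CROSS JOIN instead of INNER JOIN ... ON 1=1):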
WITH user_detaults AS (
SELECT
userid,
configkey
FROM
(SELECT DISTINCT userid from oc_preferences) users
CROSS JOIN
(
SELECT 'quota' as configkey UNION ALL
SELECT 'displayName' as configkey UNION ALL
SELECT 'email' as configkey UNION ALL
SELECT 'lastLogin' as configkey
) keys
),
oc_preferences_sorted AS (
SELECT
u.userid,op.configvalue,
CASE
WHEN u.configkey='quota' THEN 1
WHEN u.configkey='displayName' THEN 2
WHEN u.configkey='email' THEN 3
WHEN u.configkey='lastLogin' THEN 4
END as custom_order
FROM
oc_preferences op
RIGHT JOIN
user_detaults u on u.userid = op.userid AND
u.configkey = op.configkey
)
SELECT
userid,
string_agg(COALESCE(configvalue, 'emptyfield') , '|' ORDER BY custom_order )
FROM
oc_preferences_sorted
GROUP BY
userid;
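To see the "generate every user/key combination, then left join" idea in isolation, here is a small sketch on SQLite through Python's sqlite3. It is not the PostgreSQL query above: the ordered concatenation is done in Python instead of string_agg, the `pos` column is an assumed helper for the field order, and the second user is sample data borrowed from the question's expected output.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oc_preferences (userid TEXT, appid TEXT, configkey TEXT, configvalue TEXT)"
)
conn.executemany("INSERT INTO oc_preferences VALUES (?, ?, ?, ?)", [
    ("480f0c81-8090aa8f", "avatar", "generated", "true"),
    ("480f0c81-8090aa8f", "files", "quota", "1 GB"),
    ("480f0c81-8090aa8f", "settings", "email", "john.smith#e-mail.com"),
    ("480f0c81-8090aa8f", "user_ldap", "displayName", "John Smith"),
    ("34555345-76867888", "files", "quota", "2 GB"),
    ("34555345-76867888", "user_ldap", "displayName", "Mary Aston"),
])

# Every user is paired with every wanted key; missing values become 'emptyfield'.
rows = conn.execute("""
    SELECT u.userid, COALESCE(op.configvalue, 'emptyfield')
    FROM (SELECT DISTINCT userid FROM oc_preferences) AS u
    CROSS JOIN (
        SELECT 'quota' AS configkey, 1 AS pos UNION ALL
        SELECT 'displayName', 2 UNION ALL
        SELECT 'email', 3 UNION ALL
        SELECT 'lastLogin', 4
    ) AS k
    LEFT JOIN oc_preferences AS op
           ON op.userid = u.userid AND op.configkey = k.configkey
    ORDER BY u.userid, k.pos
""").fetchall()

# Rows arrive sorted by user and field position, so groupby can join them in order.
lines = {
    userid: "|".join(value for _, value in group)
    for userid, group in groupby(rows, key=lambda row: row[0])
}

print(lines["480f0c81-8090aa8f"])  # 1 GB|John Smith|john.smith#e-mail.com|emptyfield
```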
Trying to bring multiple columns into rows. The intended result is
Here's sample data with what I tried. I am open to unpivot as well if that's faster overall. The full data has 15 AttributeID, AttributeData columns.
DROP TABLE Attribute;
CREATE TABLE Attribute
(
Producttitle varchar(200),
AttributeID_1 varchar(50),
AttributeData_1 varchar(50),
AttributeID_2 varchar(50),
AttributeData_2 varchar(50),
AttributeID_3 varchar(50),
AttributeData_3 varchar(50)
);
INSERT INTO Attribute
VALUES ('title1', '3145', 'Specific', '30', 'Yes', '40', 'Pink');
INSERT INTO Attribute
VALUES ('title2', '17', 'Stainless', '19', 'smoke', '19', 'Something');
SELECT
Producttitle,
[AttributeID],
[AttributeData]
FROM
Attribute
CROSS APPLY
(SELECT 'Indicator1', [AttributeID_1] UNION ALL
SELECT 'Indicator2', [AttributeID_2] UNION ALL
SELECT 'Indicator3', [AttributeID_3]) c (indicatorname, [AttributeID])
CROSS APPLY
(SELECT 'Indicator1', [AttributeData_1] UNION ALL
SELECT 'Indicator2', [AttributeData_2] UNION ALL
SELECT 'Indicator3', [AttributeData_3]) d (indicatorname, [AttributeData]);
You can use cross apply to unpivot your dataset. It is much simpler with values():
select a.Producttitle, x.*
from attribute a
cross apply (values
(a.attributeId_1, a.attributeData_1),
(a.attributeId_2, a.attributeData_2),
(a.attributeId_3, a.attributeData_3)
) as x(attributeId, attributeData)
Note that this works because the two groups of columns have consistent data types - otherwise additional casting would be required.
GMB's solution is really cool, but basic unions will also work (use UNION ALL so that identical rows are not collapsed and no duplicate-removal step is needed):
SELECT Producttitle, AttributeID_1 AttributeID, AttributeData_1 AttributeData
from attribute
union all
SELECT Producttitle, AttributeID_2 AttributeID, AttributeData_2 AttributeData
from attribute
union all
SELECT Producttitle, AttributeID_3 AttributeID, AttributeData_3 AttributeData
from attribute
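With 15 column pairs, writing the branches by hand gets tedious, so the statement could be generated. A sketch under stated assumptions: Python's sqlite3 stands in for SQL Server, `build_unpivot_query` is a made-up helper, and the extra `pair_no` column is added only to make the output order deterministic.

```python
import sqlite3

def build_unpivot_query(n_pairs):
    """Build one SELECT per AttributeID_n/AttributeData_n pair, glued with UNION ALL."""
    selects = [
        f"SELECT Producttitle, AttributeID_{i} AS AttributeID, "
        f"AttributeData_{i} AS AttributeData, {i} AS pair_no FROM Attribute"
        for i in range(1, n_pairs + 1)
    ]
    return " UNION ALL ".join(selects) + " ORDER BY Producttitle, pair_no"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attribute (
        Producttitle TEXT,
        AttributeID_1 TEXT, AttributeData_1 TEXT,
        AttributeID_2 TEXT, AttributeData_2 TEXT,
        AttributeID_3 TEXT, AttributeData_3 TEXT
    )
""")
conn.executemany("INSERT INTO Attribute VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("title1", "3145", "Specific", "30", "Yes", "40", "Pink"),
    ("title2", "17", "Stainless", "19", "smoke", "19", "Something"),
])

rows = conn.execute(build_unpivot_query(3)).fetchall()
for row in rows:
    print(row)
```

Each source row yields one output row per column pair, so the two sample rows become six.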
So I Have a table called Value that's associated with different 'Fields'. Note that some of these fields have similar 'names' but they are named differently. Ultimately I want these 'similar names' to be pivoted/grouped as the same field name in the result set
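| VALUE_ID | VALUE_TX | FIELD_NAME | Version_ID |
| -------- | -------- | ------------- | ---------- |
| 1 | Yes | Adult | 1 |
| 2 | 18 | Age | 1 |
| 3 | Black | Eye Color | 1 |
| 4 | Yes | Is_Adult | 2 |
| 5 | 25 | Years_old | 2 |
| 6 | Brown | Color_of_Eyes | 2 |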
I have a table called Submitted that looks like the following:
| Version_ID | Version_Name |
| ---------- | ------------ |
| 1 | TEST_RUN |
| 2 | REAL_RUN |
I need a result set that Looks like this:
| Submitted_Name | Adult? | Age | Eye_Color |
| -------------- | ------ | --- | --------- |
| TEST_RUN | Yes | 18 | Black |
| REAL_RUN | Yes | 25 | Brown |
I've tried the following:
SELECT * FROM (
select value_Tx, field_name, version_id
from VALUE
)
PIVOT (max (value_tx) for field_name in (('Adult', 'Is_Adult') as 'Adult?', ('Age', 'Years_old') as 'Age', ('Eye Color', 'Color_of_Eyes') as 'Eye_Color')
);
What am I doing wrong? Please let me know if I need to add any additional details / data.
Thanks in advance!
The error message that I am getting is the following:
ORA-00907: missing right parenthesis
I would change the field names in the subquery:
SELECT *
FROM (select value_Tx,
(case when field_name in ('Adult', 'Is_Adult') then 'Adult?'
when field_name in ('Age', 'Years_old') then 'Age'
when field_name in ('Eye Color', 'Color_of_Eyes') then 'Eye_Color'
else field_name
end) as field_name, version_id
from VALUE
)
PIVOT (max(value_tx) for field_name in ('Adult?', 'Age', 'Eye_Color'));
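The "normalize the field names first, then pivot" idea can be checked on a small scale. This sketch assumes Python's sqlite3 rather than Oracle, so the PIVOT clause is swapped for conditional aggregation, and the join to Submitted is added to produce the requested Submitted_Name column; table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "value" (value_id INTEGER, value_tx TEXT, field_name TEXT, version_id INTEGER);
    CREATE TABLE submitted (version_id INTEGER, version_name TEXT);
    INSERT INTO "value" VALUES
        (1, 'Yes', 'Adult', 1), (2, '18', 'Age', 1), (3, 'Black', 'Eye Color', 1),
        (4, 'Yes', 'Is_Adult', 2), (5, '25', 'Years_old', 2), (6, 'Brown', 'Color_of_Eyes', 2);
    INSERT INTO submitted VALUES (1, 'TEST_RUN'), (2, 'REAL_RUN');
""")

rows = conn.execute("""
    SELECT s.version_name,
           MAX(CASE WHEN f.field = 'Adult'     THEN f.value_tx END) AS adult,
           MAX(CASE WHEN f.field = 'Age'       THEN f.value_tx END) AS age,
           MAX(CASE WHEN f.field = 'Eye_Color' THEN f.value_tx END) AS eye_color
    FROM (
        -- Map the differently named fields onto one canonical name each.
        SELECT version_id, value_tx,
               CASE WHEN field_name IN ('Adult', 'Is_Adult') THEN 'Adult'
                    WHEN field_name IN ('Age', 'Years_old') THEN 'Age'
                    WHEN field_name IN ('Eye Color', 'Color_of_Eyes') THEN 'Eye_Color'
                    ELSE field_name
               END AS field
        FROM "value"
    ) AS f
    JOIN submitted AS s ON s.version_id = f.version_id
    GROUP BY s.version_id, s.version_name
    ORDER BY s.version_id
""").fetchall()

print(rows)  # [('TEST_RUN', 'Yes', '18', 'Black'), ('REAL_RUN', 'Yes', '25', 'Brown')]
```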
You can use double quotes for the column aliases within the pivot clause, and I think the decode function suits this question well. Consider the following query:
with value( value_id, value_tx, field_name, version_id ) as
(
select 1 ,'Yes' ,'Adult' ,1 from dual union all
select 2 ,'18' ,'Age' ,1 from dual union all
select 3 ,'Black','Eye_Color' ,1 from dual union all
select 4 ,'Yes' ,'Is_Adult' ,2 from dual union all
select 5 ,'25' ,'Years_old' ,2 from dual union all
select 6 ,'Brown','Color_of_Eyes',2 from dual
), Submitted( version_id, version_name ) as
(
select 1 ,'TEST_RUN' from dual union all
select 2 ,'REAL_RUN' from dual
)
select * from
(
select s.version_name as "Submitted_Name", v.value_Tx,
decode(v.field_name,'Adult','Is_Adult','Age','Years_old','Eye_Color',
'Color_of_Eyes',v.field_name) field_name
from value v
join Submitted s
on s.version_id = v.version_id
group by decode(v.field_name,'Adult','Is_Adult','Age','Years_old','Eye_Color',
'Color_of_Eyes',v.field_name),
v.value_Tx, s.Version_Name
)
pivot(
max(value_tx) for field_name in ( 'Is_Adult' as "Adult?", 'Years_old' as "Age",
'Color_of_Eyes' as "Eye_Color" )
);
Submitted_Name Adult? Age Eye_Color
REAL_RUN Yes 25 Brown
TEST_RUN Yes 18 Black
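I think it is better to keep the solution as short as possible. For example, modular arithmetic gives an even shorter query, though it only works if the value_id values follow the same repeating cycle of three fields in every version, as they do in the sample data: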
select *
from
(
select s.version_name as "Submitted_Name", v.value_Tx, mod(v.value_id,3) as value_id
from value v
join Submitted s
on s.version_id = v.version_id
group by v.value_Tx, s.version_name, mod(v.value_id,3)
)
pivot(
max(value_tx) for value_id in ( 1 as "Adult?", 2 as "Age", 0 as "Eye_Color" )
)