Enter data for missing category in snowflake - sql

I have a table like
For each keyword, there are 2 devices - mobile and desktop. If entry for only one device is found, then it should automatically create the entry for other device keeping the data in rest of the columns same. I am currently doing a full outer join which is working fine for the case where one device category is missing but generating duplicates where both devices are present. For example,
my current query is giving the result as
select a.keyword, b.device, a.rating
from kw a full outer join kw b
on a.keyword=b.keyword and a.rating=b.rating
How do I get the result as

The first step will be to identify records that don't have a paired record. There's a couple of ways to do this, but the easiest is probably just a quick GROUP BY/HAVING:
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
You can those join those results back into the original table to generate the new records that are needed:
SELECT sk.keyword,
CASE WHEN kw.device = 'mobile' THEN 'desktop' ELSE 'mobile' END as device,
kw.rating
FROM
(
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
)sk
INNER JOIN kw ON kw.keyword = sk.keyword
Then you can UNION back in the original table to bring your new records and existing records into a single result set:
SELECT sk.keyword,
CASE WHEN kw.device = 'mobile' THEN 'desktop' ELSE 'mobile' END as device,
kw.rating
FROM
(
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
)sk
INNER JOIN kw ON kw.keyword = sk.keyword
UNION ALL
SELECT * FROM kw;
As another option that will scale if you add in more 'devices' is to cross join all the potential device/keyword combinations and then left join to your original table:
SELECT
fe.keyword,
fe.device,
CASE WHEN kw.rating IS NULL THEN max(rating) OVER (PARTITION BY fe.keyword) ELSE kw.rating END AS rating
FROM
(
SELECT DISTINCT kw.keyword, kw2.device
FROM kw, kw kw2
) fe
LEFT OUTER JOIN kw ON kw.keyword = fe.keyword
AND kw.device = fe.device;

Related

JOIN AND CASE MORE AN TABLE

I have 2 tables; the first one ORG contains the following columns:
ORG_REF, ARB_REF, NAME, LEVEL, START_DATE
and the second one WORK contains these columns:
ARB_REF, WORK_STREET - WORK_NUM, WORK_ZIP
I want to do the following: write a select query that search in work and see if the WORK_STREET, WORK_ZIP are duplicate together, then you should look at WORK_NUM. If it is the same then output value ' ok ', but if WORK_NUM is not the same, output 'not ok'
I wrote this SQL query:
select
A.ARB_REF, A.WORK_STREET, A.WORK_NUM, A.WORK_ZIP
case when B.B = 1 then 'OK' else 'not ok' end
from
work A
join
(select
WORK_STREET, WORK_ZIP count(distinct , A.WORK_NUM) B
from
WORK
group by
WORK_STREET, WORK_ZIP) B on B.WORK_STREET = A.WORK_STREET
and B.WORK_ZIP = A.WORK_ZIP
Now I want to join the table ORG with this result I want to check if every address belong to org if it belong I should create a new column result and set it to yes in it (RESULT) AND show the "name" column otherwise set no in 'RESULT'.
Can anyone help me please?
While you can accomplish your result by adding a left outer join to the query you've already started, it might be easiest to just use count() over....
with org_data as (
-- do the inner join before the left join later
select * from org1 o1 inner join org2 o2 on o2.orgid = o1.orgid
)
select
*,
count(*) over (partition by WORK_STREET, WORKZIP) as cnt,
case when o.ARB_REF is not null then 'Yes' else 'No' end as result
from
WORK w left outer join org_data o on o.ARB_REF = w.ARB_REF

Multiple left joins with aggregation on same table causes huge performance hit in SAP HANA

I am joining two tables on HANA and, to get some statistics, I am LEFT joining the items table 3 times to get a total count, number of entries processed and number of errors, as shown below.
This is a dev system and the items table has only 1500 items. But the query below runs for 17 seconds.
When I remove any of the three aggregation terms (but leave the corresponding JOIN in place), the query executes almost immediately.
I have also tried adding indexes on the fields used in the specific JOINs, but that makes no difference.
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.header = rk.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.header = rk.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
I think you can re-write you query to improve the performance:
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct (CASE WHEN rp.processed = 'X' then rp.guid else null end) ),
count( distinct (CASE WHEN rp.result_status = 'E' then rp.guid else null end))
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
I'm not entirely sure if the count distinct case construct will work on hana but you may try.
My apologies, but I forgot that I had posted this question here. I had posted the same question at answers.sap.com after not getting any joy here: https://answers.sap.com/questions/172096/multiple-left-joins-with-aggregation-on-same-table.html
I eventually came up with the solution, which was a bit of a "doh!" moment:
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.guid = rp.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.guid = rp.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
The subsequent left joins needed only to be joined to the first join on the same table, as the first join contained a superset of all the records anyway.

Display Y/N column if record found in detail table

I'm trying to create a query so that I can have a column show Y/N if a particular item was ordered for a group of orders. The item I'm looking for would be OLI.id = '538'.
So my results would be:
Order#, Customer#, FreightPaid
12345, 00112233, Y
12346, 00112233, N
I cannot figure out if I need to use a subquery or the where exists function ?
Here's my current query:
SELECT distinct
OrderID,
Accountuid as Customerno
FROM [SMILEWEB_live].[dbo].[OrderLog] OL
inner join Orderlog_item OLI on OLI.orderlogkey = OL.[key]
inner join Account A on A.uid = OL.Accountuid
where A.GroupId = 'X9955'
and OL.CreateDate >= GETDATE() - 60
I would suggest an exists clause instead of a join:
select ol.OrderID, ol.Accountuid as Customerno,
(case when exists (select 1
from Orderlog_item OLI join
Account A
on A.uid = OL.Accountuid
where OLI.orderlogkey = OL.[key] and A.GroupId = 'X9955'
)
then 1 else 0
end) as flag
from [SMILEWEB_live].[dbo].[OrderLog] OL
where OL.CreateDate >= GETDATE() - 60;
This prevents a couple of problems. First, duplicate rows which are caused when there are multiple matching rows (and select distinct add unnecessary overhead). Second, missing rows, which happen when you use inner join instead of an outer join.

SQL: Want to alter the conditions on a join depending on values in table

I have a table called Member_Id which has a column in it called Member_ID_Type. The select statement below returns the value of another column, id_value from the same table. The join on the tables in the select statement is on the universal id column. There may be several entries in that table with this same universal id.
I want to adjust the select statement so that it will return the id_values for entries that have member_id_type equal to '7'. However if this is null then I want to return records that have member_id_type equal to '1'
So previously I had a condition on the join (commented out below) but that just returned records that had member_id_type equal to '7' and otherwise returned null.
I think I may have to use a case statement here but I'm not 100% sure how to use it in this scenario
SELECT TOP 1 cm.Contact_Relation_Gid,
mc.Universal_ID,
mi.ID_Value,
cm.First_Name,
cm.Last_Name,
cm.Middle_Name,
cm.Name_Suffix,
cm.Email_Address,
cm.Disability_Type_PKID,
cm.Race_Type_PKID,
cm.Citizenship_Type_PKID,
cm.Marital_Status_Type_PKID,
cm.Actual_SSN,
cm.Birth_Date,
cm.Gender,
mc.Person_Code,
mc.Relationship_Code,
mc.Member_Coverage_PKID,
sc.Subscriber_Coverage_PKID,
FROM Contact_Member cm (NOLOCK)
INNER JOIN Member_Coverage mc (NOLOCK)
ON cm.contact_relation_gid = mc.contact_relation_gid
AND mc.Record_Status = 'A'
INNER JOIN Subscriber_Coverage sc (NOLOCK)
ON mc.Subscriber_Coverage_PKID = sc.Subscriber_Coverage_PKID
AND mc.Record_Status = 'A'
LEFT outer JOIN Member_ID mi ON mi.Universal_ID = cm.Contact_Gid
--AND mi.Member_ID_Type_PKID='7'
WHERE cm.Contact_Relation_Gid = #Contact_Relation_Gid
AND cm.Record_Status = 'A'
Join them both, and use one if the other is not present:
select bt.name
, coalesce(eav1.value, eav2.value) as Value1OrValue2
from BaseTable bt
left join EavTable eav1
on eav1.id = bt.id
and eav1.type = 1
left join EavTable eav2
on eav2.id = bt.id
and eav2.type = 2
This query assumes that there is never more than one record with the same ID and Type.

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...