Want to use multiple aggregate functions with Snowflake PIVOT - SQL

CREATE TABLE person (id INT, name STRING, date DATE, class INT, address STRING);
INSERT INTO person VALUES
(100, 'John', DATE '2021-01-30', 1, 'Street 1'),
(200, 'Mary', DATE '2021-01-20', 1, 'Street 2'),
(300, 'Mike', DATE '2021-01-21', 3, 'Street 3'),
(100, 'John', DATE '2021-05-15', 4, 'Street 4');
SELECT * FROM person
PIVOT (
SUM(age) AS a, MAX(date) AS c
FOR name IN ('John' AS john, 'Mike' AS mike)
);
The code above is Databricks SQL. How do I implement the same logic in Snowflake?

Below is the syntax for PIVOT in Snowflake:
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
In Snowflake, the AS alias goes after the closing parenthesis of the PIVOT clause, not inside it.
Check this example for your reference:
select *
from monthly_sales
pivot(sum(amount) for month in ('JAN', 'FEB', 'MAR', 'APR'))
as p
order by empid;
See the official Snowflake PIVOT documentation and its examples for a better understanding.

First, there is no "AGE" column in your table DDL.
Second, I do not think you can pivot on multiple aggregate functions in Snowflake: each pivot value ("JOHN", "MIKE") becomes a single output column holding one aggregated value, so two aggregates can't fit into it. (Databricks handles this by generating one column per value/aggregate pair, such as john_a and john_c.)
With one aggregate function removed, your example looks something like this in Snowflake:
SELECT *
FROM
person
PIVOT (
MAX(date) FOR name IN ('John', 'Mike')
)
as p (id, class, address, john, mike)
;
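As a rough cross-check of which columns this PIVOT groups by, here is the same query spelled out with MAX(CASE ...) and an explicit GROUP BY, run against SQLite through Python's sqlite3 module. This is an approximation, not Snowflake itself: SQLite has no PIVOT, and dates are assumed to be stored as ISO strings so that MAX compares them correctly.

```python
import sqlite3

# SQLite stands in for Snowflake; it has no PIVOT, so the pivot is
# written out as MAX(CASE ...) plus an explicit GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INT, name TEXT, date TEXT, class INT, address TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (100, "John", "2021-01-30", 1, "Street 1"),
        (200, "Mary", "2021-01-20", 1, "Street 2"),
        (300, "Mike", "2021-01-21", 3, "Street 3"),
        (100, "John", "2021-05-15", 4, "Street 4"),
    ],
)

# Columns not named in the aggregate or the FOR clause (id, class, address)
# are the implicit grouping columns of the PIVOT.
rows = conn.execute(
    """
    SELECT id, class, address,
           MAX(CASE WHEN name = 'John' THEN date END) AS john,
           MAX(CASE WHEN name = 'Mike' THEN date END) AS mike
    FROM person
    GROUP BY id, class, address
    ORDER BY id, class
    """
).fetchall()
for row in rows:
    print(row)
```

This is why the column list after "as p" names id, class and address before john and mike.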

Snowflake does not support multiple aggregate expressions in PIVOT.
And, as others noted, your AGE column is missing, and you also have no ORDER BY clause, which makes rolling your own SQL harder.
SELECT
SUM(IFF(name='John',age,null)) AS john_sum_age,
MAX(IFF(name='John',date,null)) AS john_max_date,
SUM(IFF(name='Mike',age,null)) AS mike_sum_age,
MAX(IFF(name='Mike',date,null)) AS mike_max_date
FROM person;
If your example had an ORDER BY, those columns would become the GROUP BY clause, in this form:
SELECT
<grouping_columns>,
SUM(IFF(name='John',age,null)) AS john_sum_age,
MAX(IFF(name='John',date,null)) AS john_max_date,
SUM(IFF(name='Mike',age,null)) AS mike_sum_age,
MAX(IFF(name='Mike',date,null)) AS mike_max_date
FROM person
GROUP BY <grouping_columns>;
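A runnable sketch of this conditional-aggregation form, using SQLite via Python's sqlite3 as a stand-in for Snowflake. Assumptions: an age column is added with invented values (the question's DDL has none), the grouping column is just id, and portable CASE WHEN is used in place of Snowflake's IFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INT, name TEXT, age INT, date TEXT, class INT, address TEXT)"
)
# Hypothetical rows: the age values are made up for illustration.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "John", 30, "2021-01-30", 1, "Street 1"),
        (200, "Mary", 20, "2021-01-20", 1, "Street 2"),
        (300, "Mike", 21, "2021-01-21", 3, "Street 3"),
        (100, "John", 35, "2021-05-15", 4, "Street 4"),
    ],
)

# One CASE per (pivot value, aggregate) pair: this is what a
# multi-aggregate PIVOT expands to.
rows = conn.execute(
    """
    SELECT
        id,
        SUM(CASE WHEN name = 'John' THEN age END)  AS john_sum_age,
        MAX(CASE WHEN name = 'John' THEN date END) AS john_max_date,
        SUM(CASE WHEN name = 'Mike' THEN age END)  AS mike_sum_age,
        MAX(CASE WHEN name = 'Mike' THEN date END) AS mike_max_date
    FROM person
    GROUP BY id
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```

Each pivot value contributes two columns per group, roughly the shape Databricks produces from its multi-aggregate PIVOT.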


UNNEST returns no rows for empty array

I am using UNNEST to flatten more than one array in an Athena query. When the arrays have elements, it returns the correct result, but when the second array is empty it returns no records. Can someone please tell me how to UNNEST more than one array in a single query?
The following query returns an empty row:
WITH example AS (
SELECT devop, devs
FROM
UNNEST(ARRAY['Sharon', 'John', 'Bob', 'Sally']) AS t(devop),
UNNEST(ARRAY[]) AS t(devs)
)
select array_join(array_agg(distinct example.devop),';'),array_join(array_agg(distinct example.devs),';') from example
The following query returns the correct result.
WITH example AS (
SELECT devop, devs
FROM
UNNEST(ARRAY['Sharon', 'John', 'Bob', 'Sally']) AS t(devop),
UNNEST(ARRAY['a','b']) AS t(devs)
)
select array_join(array_agg(distinct example.devop),';'),array_join(array_agg(distinct example.devs),';') from example
When the second array is empty, I want the following result:
_col0 _col1
----------------------------------------------
Sally;John;Bob;Sharon
Use a left join:
WITH example AS (
SELECT devop, devs
FROM UNNEST(ARRAY['Sharon', 'John', 'Bob', 'Sally']) AS t(devop) LEFT JOIN
UNNEST(ARRAY[]) AS t(devs)
ON 1=1
)
. . .
Since LEFT JOIN UNNEST doesn't work in Athena (Presto), you can cross join against a NULL value using IF and CARDINALITY, like this:
WITH example AS (
SELECT devop, devs
FROM UNNEST(ARRAY['Sharon', 'John', 'Bob', 'Sally']) AS t(devop) CROSS JOIN
UNNEST(IF(CARDINALITY(ARRAY[])=0, ARRAY[NULL], ARRAY[])) AS t(devs)
)
This way, if the ARRAY[]/column is empty (CARDINALITY returns the size of the array), ARRAY[NULL] is used instead and the row is not skipped.
(tested on Presto 0.217 / Athena engine version 2)
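To see why the padding trick works, here is a small Python model of the row logic. It is an illustration, not Presto: plain lists stand in for arrays, None stands in for NULL, and array_join is modelled as skipping NULL elements. A cross join with an empty side yields no rows at all, while padding the empty side with a single NULL keeps one row per element of the other array. Note that a real array_agg(DISTINCT ...) does not guarantee element order.

```python
from itertools import product

devops = ["Sharon", "John", "Bob", "Sally"]
devs = []  # the empty second array

def pad_if_empty(arr):
    # Models IF(CARDINALITY(arr) = 0, ARRAY[NULL], arr):
    # an empty list becomes [None], anything else is returned unchanged.
    return arr if len(arr) > 0 else [None]

def join_distinct(values, sep=";"):
    # Rough model of array_join(array_agg(DISTINCT v), sep):
    # de-duplicates (keeping first-seen order) and skips NULLs.
    out = []
    for v in values:
        if v is not None and v not in out:
            out.append(v)
    return sep.join(out)

# Plain cross join: an empty side produces zero rows.
plain = list(product(devops, devs))

# Padded cross join: one row per devop, with devs = None.
padded = list(product(devops, pad_if_empty(devs)))

col0 = join_distinct(d for d, _ in padded)
col1 = join_distinct(v for _, v in padded)
print(plain)       # []
print(col0)        # Sharon;John;Bob;Sally
print(repr(col1))  # ''
```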
I don't think that you want a cross join here. Instead, you could phrase this as:
select
array_join(array_distinct(array['Sharon', 'John', 'Bob', 'Sally']), ';') devops,
array_join(array_distinct(array[]), ';') devs
Try array_union with an array containing NULL.
Below query will give you the desired result. Tested on Athena engine version 1.
WITH example AS (
SELECT devop, devs
FROM
UNNEST(ARRAY['Sharon', 'John', 'Bob', 'Sally']) AS t(devop),
UNNEST(array_union(ARRAY[], ARRAY[null])) AS t(devs)
)
select array_join(array_agg(distinct example.devop),';'),array_join(array_agg(distinct example.devs),';') from example

How do I write a SQL SELECT statement using regular expressions that will display the two words in LEFT in reverse order?

For example, I have created a table:
CREATE TABLE CAR
(
LEFT VARCHAR(50),
RIGHT VARCHAR(50)
)
Then I inserted some values into table CAR:
INSERT INTO CAR (LEFT, RIGHT)
VALUES ('super car', 'car super')
Now I want to write a SELECT statement using regexp_replace (which I'm really unfamiliar with) to display the two words in the LEFT column in reverse order, in two separate output columns. I would appreciate any suggestions! Thank you!
The output should look like this:
column1 column2
-------------------
car super
CREATE TABLE car (
left VARCHAR(50),
right VARCHAR(50)
);
insert into CAR(LEFT,RIGHT)
values('super car', 'car super');
select regexp_replace(left,'(.*?)([[:space:]])') AS COLUMN1,regexp_replace(left, '([[:space:]].*)') AS COLUMN2 from CAR;
You can use the regexp_substr function to split your string into words:
select regexp_substr('super car', '[^[:space:]]+', 1, 2) AS column1, regexp_substr('super car', '[^[:space:]]+', 1, 1) AS column2 from dual
Or, using the regexp_replace function:
select regexp_replace('super car','(.*?)([[:space:]])'), regexp_replace('super car', '([[:space:]].*)') from dual
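These patterns can be sanity-checked outside the database with Python's re module, used here as a stand-in for Oracle's regex engine. Assumptions: \s replaces the POSIX class [[:space:]] (Python's re does not support POSIX bracket classes), and re.sub, like regexp_replace with its default occurrence of 0, replaces every match. The second function is a token-based variant that pulls out runs of non-whitespace characters, similar in spirit to extracting the n-th match with regexp_substr.

```python
import re

def reverse_with_replace(s):
    # Like regexp_replace(left, '(.*?)([[:space:]])'): delete everything up to
    # and including the first whitespace, leaving the second word.
    column1 = re.sub(r"(.*?)(\s)", "", s)
    # Like regexp_replace(left, '([[:space:]].*)'): delete from the first
    # whitespace to the end, leaving the first word.
    column2 = re.sub(r"(\s.*)", "", s)
    return column1, column2

def reverse_with_tokens(s):
    # Token-based variant: find every run of non-whitespace characters,
    # then return the second word and the first word.
    words = re.findall(r"\S+", s)
    return words[1], words[0]

print(reverse_with_replace("super car"))  # ('car', 'super')
print(reverse_with_tokens("super car"))   # ('car', 'super')
```

Both approaches assume exactly two words; with three or more, the replace version keeps only the last word in column1.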

SQL query - find where a value (where there could be multiple) does not exist

I need some help identifying records that do not have a specific value associated with them.
Need:
Each distinct customer record can have multiple methods of contact, for example:
Cheryl Hubert has the following contact records:
Code value: 1
Description: home phone
CustomerData: 123-456-7890
Code value: 2
Description: work phone
CustomerData: 000-123-4567
Code value: 3
Description: email
CustomerData: chubert#xxx.xxx
Customers may have none of these, or some of these.
I need to write a query to find all customer records that DO NOT have an email address (code value 3). I've seen queries with NOT EXISTS, but I'm not sure that would be the right way. Keep in mind that the same field (CustomerData) is used for all contact data.
The code value/description provides what is within the CustomerData field.
Any help appreciated.
Let's say the contact info is in a table contactRecords, which looks something like this:
customerId int,
codeValue int,
description varchar,
customerData varchar
To get all of the customers who do not have an email record (where codeValue = 3), try something like this:
select distinct customerId
from contactRecords
where customerId not in (
select distinct customerId
from contactRecords
where codeValue = 3)
The inner query finds all customers who have an email record. The outer query finds all but those customers.
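One caveat with NOT IN: if the subquery can return a NULL customerId, the comparison against that NULL is unknown for every row, so the query returns nothing. Adding "where customerId is not null" to the subquery, or using NOT EXISTS, avoids this. A small sketch using SQLite via Python's sqlite3 (the table follows the contactRecords layout assumed above; the row with a NULL customerId is hypothetical) shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contactRecords (customerId INT, codeValue INT, description TEXT, customerData TEXT)"
)
conn.executemany(
    "INSERT INTO contactRecords VALUES (?, ?, ?, ?)",
    [
        (1, 1, "home phone", "123-456-7890"),
        (1, 3, "email", "chubert#xxx.xxx"),
        (2, 1, "home phone", "012-345-6789"),
        (None, 3, "email", "orphan#xxx.xxx"),  # hypothetical row with a NULL id
    ],
)

# NOT IN: the NULL returned by the subquery makes every comparison unknown.
not_in = conn.execute("""
    SELECT DISTINCT customerId FROM contactRecords
    WHERE customerId NOT IN (SELECT customerId FROM contactRecords WHERE codeValue = 3)
""").fetchall()

# NOT EXISTS: only checks for a matching email row, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT DISTINCT c.customerId FROM contactRecords c
    WHERE c.customerId IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM contactRecords e
                      WHERE e.customerId = c.customerId AND e.codeValue = 3)
""").fetchall()
print(not_in)      # []
print(not_exists)  # [(2,)]
```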
Since you posted almost no data, I'll guess your structure. Assuming you have clients in one table and contacts in another with the client id: when you want to find rows in one table that have no match in another, select from your clients, LEFT JOIN to your contacts, and add a WHERE clause checking that a contact column IS NULL. If you specifically want code value 3, put that condition directly in the join clause (ON), not in the WHERE clause.
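A minimal sketch of that LEFT JOIN anti-join, run on SQLite via Python's sqlite3. The table and column names (customers, contact_method, customer_id, code_value) and the sample rows are assumptions for illustration. It also shows why the code_value condition belongs in ON: moving it to WHERE throws away the NULL-extended rows, so the query can never return anyone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INT, name TEXT)")
conn.execute("CREATE TABLE contact_method (customer_id INT, code_value INT, customer_data TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Cheryl Hubert"), (2, "No Email"), (3, "No Contacts")])
conn.executemany("INSERT INTO contact_method VALUES (?, ?, ?)",
                 [(1, 1, "123-456-7890"), (1, 3, "chubert#xxx.xxx"), (2, 1, "012-345-6789")])

# Filter on code_value inside ON, then keep customers with no matching row.
anti_join = conn.execute("""
    SELECT c.id
    FROM customers c
    LEFT JOIN contact_method m
      ON m.customer_id = c.id AND m.code_value = 3
    WHERE m.customer_id IS NULL
    ORDER BY c.id
""").fetchall()

# Same filter in WHERE: unmatched rows have m.code_value = NULL,
# so they fail the test and nothing survives.
filter_in_where = conn.execute("""
    SELECT c.id
    FROM customers c
    LEFT JOIN contact_method m ON m.customer_id = c.id
    WHERE m.code_value = 3 AND m.customer_id IS NULL
""").fetchall()
print(anti_join)        # [(2,), (3,)]
print(filter_in_where)  # []
```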
Try this query:
select *
from customers c
where not exists(select 1 from contact_method
where customer_id = c.id
and description = 'email');
I assumed this schema:
create table customers(id int, name varchar(20));
insert into customers values (1, 'Cheryl Hubert');
create table contact_method (id int, customer_id int, code_value int, description varchar(20), customer_data varchar(20));
insert into contact_method values (1, 1, 1, 'home phone', '123-456-7890');
insert into contact_method values (2, 1, 2, 'work phone', '000-123-4567');
insert into contact_method values (3, 1, 3, 'email', 'chubert#xxx.xxx');
You can use the GROUP BY and HAVING clauses to check:
Oracle Setup:
CREATE TABLE contact_details ( code_value, customerid, description, customerdata ) AS
SELECT 1, 1, 'home phone', '123-456-7890' FROM DUAL UNION ALL
SELECT 2, 1, 'work phone', '000-123-4567' FROM DUAL UNION ALL
SELECT 3, 1, 'email', 'chubert#xxx.xxx' FROM DUAL UNION ALL
SELECT 4, 2, 'home phone', '012-345-6789' FROM DUAL;
Query:
SELECT customerid
FROM contact_details
GROUP BY customerid
HAVING COUNT( CASE description WHEN 'email' THEN 1 END ) = 0
Output:
| CUSTOMERID |
|------------|
| 2 |
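One difference worth noting between this GROUP BY/HAVING query and the NOT EXISTS answer above: grouping over contact_details can only return customers that have at least one contact row, whereas NOT EXISTS driven from a customers table also returns customers with no contact rows at all. A sketch on SQLite via Python's sqlite3, reusing the sample rows above plus a hypothetical customers table in which customer 3 has no contacts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_details (code_value INT, customerid INT, description TEXT, customerdata TEXT)"
)
conn.executemany(
    "INSERT INTO contact_details VALUES (?, ?, ?, ?)",
    [
        (1, 1, "home phone", "123-456-7890"),
        (2, 1, "work phone", "000-123-4567"),
        (3, 1, "email", "chubert#xxx.xxx"),
        (4, 2, "home phone", "012-345-6789"),
    ],
)
# Hypothetical customers table: customer 3 has no contact rows at all.
conn.execute("CREATE TABLE customers (id INT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])

having = conn.execute("""
    SELECT customerid FROM contact_details
    GROUP BY customerid
    HAVING COUNT(CASE description WHEN 'email' THEN 1 END) = 0
""").fetchall()

not_exists = conn.execute("""
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM contact_details d
                      WHERE d.customerid = c.id AND d.description = 'email')
    ORDER BY c.id
""").fetchall()
print(having)      # [(2,)]
print(not_exists)  # [(2,), (3,)]
```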

Is there a way to round an Oracle crosstab PIVOT?

I want to pivot data using the AVG() function, but I want to round the results to prevent repeating decimals from displaying.
When I try something like this: PIVOT( ROUND( AVG(column_name), 2) FOR ...)
I get an error: ORA-56902: expect aggregate function inside pivot operation
Here is a very simple example of "number of students registered in a course":
CREATE TABLE TBL_EXAMPLE
(
enrolled NUMBER,
course VARCHAR2(50 CHAR)
);
INSERT INTO TBL_EXAMPLE (enrolled, course) VALUES (1, 'math');
INSERT INTO TBL_EXAMPLE (enrolled, course) VALUES (2, 'math');
INSERT INTO TBL_EXAMPLE (enrolled, course) VALUES (2, 'math');
INSERT INTO TBL_EXAMPLE (enrolled, course) VALUES (1, 'english');
INSERT INTO TBL_EXAMPLE (enrolled, course) VALUES (4, 'english');
SELECT *
FROM TBL_EXAMPLE
PIVOT ( AVG(enrolled) FOR course IN ('math', 'english') );
'math' 'english'
---------------|-------------
1.6666666666...| 2.5
What I want is:
SELECT *
FROM TBL_EXAMPLE
PIVOT ( ROUND(AVG(enrolled), 2) FOR course IN ('math', 'english') );
'math' 'english'
---------------|-------------
1.67 | 2.50
In the real-world application, the SQL is dynamically generated based on user input on a report, and due to the complexity of the real-world scenario I can't just rewrite the query like this:
SELECT ROUND("'math'", 2) as "'math'", ROUND("'english'", 2) as "'english'"
FROM TBL_EXAMPLE
PIVOT ( AVG(enrolled) FOR course IN ('math', 'english') );
So, my question is, is there any workaround I can use to bypass ORA-56902 in this scenario, or any other way to 'trick' Oracle into NOT returning up to 38 digits of decimal precision when numbers don't divide evenly via the AVG() calculation in a PIVOT clause?
Maybe I'm missing something, but why not perform the AVG() in a subquery with a ROUND and then apply your PIVOT:
select *
from
(
select round(avg(enrolled), 2) enrolled, course
from tbl_example
group by course
) d
PIVOT
(
max(enrolled)
FOR course IN ('math', 'english')
);
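The same round-first, pivot-second idea can be tried in SQLite via Python's sqlite3. SQLite has no PIVOT, so this sketch assumes MAX(CASE ...) is an acceptable stand-in for the outer PIVOT; because each course has exactly one pre-aggregated row, MAX simply picks that row's rounded value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_example (enrolled INT, course TEXT)")
conn.executemany(
    "INSERT INTO tbl_example VALUES (?, ?)",
    [(1, "math"), (2, "math"), (2, "math"), (1, "english"), (4, "english")],
)

# Inner query: average and round per course, as in the answer's subquery.
# Outer query: MAX(CASE ...) spreads the courses into columns.
row = conn.execute("""
    SELECT MAX(CASE WHEN course = 'math' THEN enrolled END)    AS math,
           MAX(CASE WHEN course = 'english' THEN enrolled END) AS english
    FROM (
        SELECT ROUND(AVG(enrolled), 2) AS enrolled, course
        FROM tbl_example
        GROUP BY course
    ) d
""").fetchone()
print(row)
```

Rounding happens before the pivot, so the pivot's aggregate only ever sees values that are already rounded.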

SQL Showing Less information depending on date

I have this code. It returns a list of clients, but it lists too many: the same client appears several times, just with different dates. I only want to show the latest date and none of the others. I tried GROUP BY Client_Code, but it didn't work; it threw an error along the lines of "not an aggregate function" (I can get the exact message if needed). What I have been asked for is all of our clients, with all the details listed in the 'as' part, and they all pull through properly. If I take out:
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
It shows up okay, but I need the last billed date to appear. Putting these lines in shows each client several times, listing all the different bill dates. I think that is because it is pulling across the different matters in the Matter_Master table. Essentially, I would like to show the client information only for the highest matter, with its last billed date.
Please let me know if this needs clarification; I'm trying to explain as best I can.
SELECT DISTINCT
A.DIWOR as 'ID',
B.Client_alpha_Name as 'Client Name',
A.ClientCODE as 'Client Code',
B.Client_address as 'Client Address',
D.COMM_NO AS 'Contact',
E.Contact_full_name as 'Possible Key Contact',
G.LOBSICDESC as 'LOBSIC Code',
H.EARNERNAME as 'Client Care Partner',
A.CLIENTCODE + '/' + LTRIM(STR(A.LAST_MATTER_NUM)) as 'Last Matter Code',
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
FROM CLIENT_MASTER A
JOIN CLIENT_INFO B
ON A.CLIENTCODE=B.CLIENT_CODE
JOIN MATTER_MASTER C
ON A.DIWOR=C.CLIENTDIWOR
JOIN COMMINFO D
ON A.DIWOR=D.DIWOR
JOIN CONTACT E
ON A.CLIENTCODE=E.CLIENTCODE
JOIN VW_CONTACT F
ON E.NAME_DIWOR=F.NAME_DIWOR
JOIN LOBSIC_CODES G
ON A.LOBSICDIWOR=G.DIWOR
JOIN STAFF H
ON A.CLIENTCAREPARTNER=H.DIWOR
JOIN MATTER I
ON C.DIWOR=I.MATTER_DIWOR
WHERE F.COMPANY_FLAG='Y'
AND C.MATTER_MANAGER NOT IN ('78','466','2','104','408','73','51','561','504','101','13','534','16','461','531','144','57','365','83','107','502','514','451')
AND I.DATE_LAST_BILLED > 0
GROUP BY A.ClientCODE
ORDER BY A.DIWOR
Your problem is that you aren't using enough aggregate functions, which is probably why you're using both DISTINCT and GROUP BY (the recommendation is to use GROUP BY, not DISTINCT).
So... remove DISTINCT, add the necessary (unique, more or less) list of columns to the GROUP BY clause, and wrap the rest in aggregate functions, constants, or subselects. In the specific case of wanting the largest date, wrap it in a MAX() function.
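A minimal sketch of that advice on SQLite via Python's sqlite3: group by the columns that identify a client and wrap the date in MAX(). The table t and its rows are hypothetical stand-ins for the output of the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ClientCode INT, ClientAddress TEXT, DateLastBilled TEXT)")
# Hypothetical rows: four bill dates per client, stored as ISO date strings.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "address1", f"2011-01-{d:02d}") for d in range(1, 5)]
    + [(2, "address2", f"2011-01-{d:02d}") for d in range(7, 11)],
)

# No DISTINCT: the identifying columns go in GROUP BY, the date in MAX().
rows = conn.execute("""
    SELECT ClientCode, ClientAddress, MAX(DateLastBilled) AS LastBilled
    FROM t
    GROUP BY ClientCode, ClientAddress
    ORDER BY ClientCode
""").fetchall()
print(rows)  # [(1, 'address1', '2011-01-04'), (2, 'address2', '2011-01-10')]
```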
If I understood correctly:
--=======================
-- sample data - simplified output of your query
--=======================
declare @t table
(
ClientCode int,
ClientAddress varchar(50),
DateLastBilled datetime
-- the remaining fields are omitted
)
insert into @t values (1, 'address1', '2011-01-01')
insert into @t values (1, 'address1', '2011-01-02')
insert into @t values (1, 'address1', '2011-01-03')
insert into @t values (1, 'address1', '2011-01-04')
insert into @t values (2, 'address2', '2011-01-07')
insert into @t values (2, 'address2', '2011-01-08')
insert into @t values (2, 'address2', '2011-01-09')
insert into @t values (2, 'address2', '2011-01-10')
--=======================
-- solution
--=======================
select distinct
ClientCode,
ClientAddress,
DateLastBilled
from
(
select
ClientCode,
ClientAddress,
DateLastBilled,
-- list of remaining fields
MaxDateLastBilled = max(DateLastBilled) over(partition by ClientCode)
from
(
-- here should be your query
select * from @t
) t
) t
where MaxDateLastBilled = DateLastBilled
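The advantage of the window-function version over a plain GROUP BY is that per-row columns which differ between a client's rows (such as the matter) can be kept without aggregating them. A sketch on SQLite via Python's sqlite3, assuming SQLite 3.25 or later for window functions; the MatterCode column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ClientCode INT, ClientAddress TEXT, MatterCode TEXT, DateLastBilled TEXT)"
)
# Hypothetical rows: two matters per client, billed on different dates.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "address1", "1/1", "2011-01-01"),
        (1, "address1", "1/2", "2011-01-04"),
        (2, "address2", "2/1", "2011-01-10"),
        (2, "address2", "2/2", "2011-01-07"),
    ],
)

# MAX(...) OVER (PARTITION BY ...) tags every row with its client's latest
# date; the outer filter keeps only the row(s) that carry that date.
rows = conn.execute("""
    SELECT ClientCode, ClientAddress, MatterCode, DateLastBilled
    FROM (
        SELECT t.*, MAX(DateLastBilled) OVER (PARTITION BY ClientCode) AS MaxDateLastBilled
        FROM t
    ) x
    WHERE DateLastBilled = MaxDateLastBilled
    ORDER BY ClientCode
""").fetchall()
print(rows)  # [(1, 'address1', '1/2', '2011-01-04'), (2, 'address2', '2/1', '2011-01-10')]
```

If two rows for a client share the latest date, both are kept, which is why the answer wraps the result in SELECT DISTINCT.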