SQL: showing less information depending on date

I have this code. It returns a list of clients, but it lists too many, because it lists the same client several times with different dates. I only want to show the latest date and none of the others. I tried GROUP BY Client_Code but it didn't work; it just threw up a "not an aggregate function" error or something similar (I can get the exact message if needed). What I have been asked to get is all of our clients, with all the details listed in the 'as' part, and they all pull through properly. If I take out:
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
it shows up okay, but I need only the last billed date to appear. Putting these lines in shows the client several times, once for each of the different bill dates. I think that is because it is pulling across the different matters in the MATTER_MASTER table. Essentially, I would like to show the client information only once, on the highest matter, with their last billed date.
Please let me know if this needs clarification; I'm trying to explain as best I can.
SELECT DISTINCT
A.DIWOR as 'ID',
B.Client_alpha_Name as 'Client Name',
A.ClientCODE as 'Client Code',
B.Client_address as 'Client Address',
D.COMM_NO AS 'Contact',
E.Contact_full_name as 'Possible Key Contact',
G.LOBSICDESC as 'LOBSIC Code',
H.EARNERNAME as 'Client Care Parnter',
A.CLIENTCODE + '/' + LTRIM(STR(A.LAST_MATTER_NUM)) as 'Last Matter Code',
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
FROM CLIENT_MASTER A
JOIN CLIENT_INFO B
ON A.CLIENTCODE=B.CLIENT_CODE
JOIN MATTER_MASTER C
ON A.DIWOR=C.CLIENTDIWOR
JOIN COMMINFO D
ON A.DIWOR=D.DIWOR
JOIN CONTACT E
ON A.CLIENTCODE=E.CLIENTCODE
JOIN VW_CONTACT F
ON E.NAME_DIWOR=F.NAME_DIWOR
JOIN LOBSIC_CODES G
ON A.LOBSICDIWOR=G.DIWOR
JOIN STAFF H
ON A.CLIENTCAREPARTNER=H.DIWOR
JOIN MATTER I
ON C.DIWOR=I.MATTER_DIWOR
WHERE F.COMPANY_FLAG='Y'
AND C.MATTER_MANAGER NOT IN ('78','466','2','104','408','73','51','561','504','101','13','534','16','461','531','144','57','365','83','107','502','514','451')
AND I.DATE_LAST_BILLED > 0
GROUP BY A.ClientCODE
ORDER BY A.DIWOR

Your problem is that you aren't using enough aggregate functions. Which is probably why you're using both the DISTINCT clause and the GROUP BY clause (the recommendation is to use GROUP BY, and not DISTINCT).
So... remove DISTINCT, add the necessary (unique, more or less) list of columns to the GROUP BY clause, and wrap the rest in aggregate functions, constants, or subselects. In the specific case of wanting the largest date, wrap it in a MAX() function.
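As a minimal sketch of that advice, the snippet below runs the pattern against an in-memory SQLite database through Python's sqlite3 module. The `billing` table, its columns, and its rows are hypothetical stand-ins for the joined result in the question, not the real schema; in SQL Server the GROUP BY and MAX() parts would be written the same way.

```python
import sqlite3

# Hypothetical, simplified stand-in for the joined result in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing (client_code TEXT, client_name TEXT, date_last_billed TEXT);
    INSERT INTO billing VALUES
        ('C1', 'Acme',   '2011-01-01'),
        ('C1', 'Acme',   '2011-01-04'),
        ('C2', 'Globex', '2011-01-07'),
        ('C2', 'Globex', '2011-01-10');
""")

# No DISTINCT: every non-aggregated column goes in GROUP BY,
# and the date is wrapped in MAX() to keep only the latest one.
latest = conn.execute("""
    SELECT client_code, client_name, MAX(date_last_billed) AS last_billed
    FROM billing
    GROUP BY client_code, client_name
    ORDER BY client_code
""").fetchall()

print(latest)
```

Each client comes back once, with its most recent billed date.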

If I understood right:
--=======================
-- sample data - simplified output of your query
--=======================
declare @t table
(
    ClientCode int,
    ClientAddress varchar(50),
    DateLastBilled datetime
    -- the rest of the fields are skipped
)
insert into @t values (1, 'address1', '2011-01-01')
insert into @t values (1, 'address1', '2011-01-02')
insert into @t values (1, 'address1', '2011-01-03')
insert into @t values (1, 'address1', '2011-01-04')
insert into @t values (2, 'address2', '2011-01-07')
insert into @t values (2, 'address2', '2011-01-08')
insert into @t values (2, 'address2', '2011-01-09')
insert into @t values (2, 'address2', '2011-01-10')
--=======================
-- solution
--=======================
select distinct
    ClientCode,
    ClientAddress,
    DateLastBilled
from
(
    select
        ClientCode,
        ClientAddress,
        DateLastBilled,
        -- list of remaining fields
        MaxDateLastBilled = max(DateLastBilled) over(partition by ClientCode)
    from
    (
        -- here should be your query
        select * from @t
    ) t
) t
where MaxDateLastBilled = DateLastBilled

Related

Want to use multiple aggregate function with snowflake pivot columns function

CREATE TABLE person (id INT, name STRING, date date, class INT, address STRING);
INSERT INTO person VALUES
(100, 'John', DATE '2021-01-30', 1, 'Street 1'),
(200, 'Mary', DATE '2021-01-20', 1, 'Street 2'),
(300, 'Mike', DATE '2021-01-21', 3, 'Street 3'),
(100, 'John', DATE '2021-05-15', 4, 'Street 4');
SELECT * FROM person
PIVOT (
SUM(age) AS a, MAX(date) AS c
FOR name IN ('John' AS john, 'Mike' AS mike)
);
The code above is Databricks SQL. How do I implement the same logic in Snowflake?
Below is the syntax for PIVOT in Snowflake:
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
In case of Snowflake, your AS keyword will be outside the PIVOT function.
Check this example for your reference:
select *
from monthly_sales
pivot(sum(amount) for month in ('JAN', 'FEB', 'MAR', 'APR'))
as p
order by empid;
Visit this official document and check the given examples for better understanding.
Firstly, there is no "AGE" column as I can see from your table DDL.
Secondly, I do not think you can pivot on multiple aggregation functions, as the value will be put under the mentioned columns "JOHN" and "MIKE" for their corresponding aggregated values, it can't fit into two separate values. I don't know how your DataBricks example would work.
Your example will look something like below in Snowflake, after removing one aggregation function:
SELECT *
FROM
person
PIVOT (
MAX(date) FOR name IN ('John', 'Mike')
)
as p (id, class, address, john, mike)
;
Snowflake does not support multiple aggregate expressions in the PIVOT clause.
As others have noted, your AGE column is missing, and you also do not have an ORDER BY clause, which makes rolling your own SQL harder.
SELECT
SUM(IFF(name='John',age,null)) AS john_sum_age,
MAX(IFF(name='John',date,null)) AS john_max_date,
SUM(IFF(name='Mike',age,null)) AS mike_age,
MAX(IFF(name='Mike',date,null)) AS mike_max_date
FROM person
If you had the ORDER BY in your example, it would become the GROUP BY clause in this form:
SELECT
<grouping_columns>,
SUM(IFF(name='John',age,null)) AS john_sum_age,
MAX(IFF(name='John',date,null)) AS john_max_date,
SUM(IFF(name='Mike',age,null)) AS mike_age,
MAX(IFF(name='Mike',date,null)) AS mike_max_date
FROM person
GROUP BY <grouping_columns>
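To make the conditional-aggregation idea concrete, here is a small runnable sketch using SQLite through Python's sqlite3 module rather than Snowflake. It uses CASE WHEN in place of IFF, and it assumes an `age` column that the original table does not have; the ages and the ISO-formatted dates are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'age' is a hypothetical column; the question's DDL does not define it.
    CREATE TABLE person (id INT, name TEXT, age INT, date TEXT, class INT);
    INSERT INTO person VALUES
        (100, 'John', 30, '2021-01-30', 1),
        (200, 'Mary', 25, '2021-01-20', 1),
        (300, 'Mike', 40, '2021-01-21', 3),
        (100, 'John', 30, '2021-05-15', 4);
""")

# One aggregate per (pivot value, measure) pair; rows that do not match the
# CASE condition contribute NULL, which SUM and MAX ignore.
row = conn.execute("""
    SELECT
        SUM(CASE WHEN name = 'John' THEN age  END) AS john_sum_age,
        MAX(CASE WHEN name = 'John' THEN date END) AS john_max_date,
        SUM(CASE WHEN name = 'Mike' THEN age  END) AS mike_sum_age,
        MAX(CASE WHEN name = 'Mike' THEN date END) AS mike_max_date
    FROM person
""").fetchone()

print(row)
```

Because each output column is its own aggregate expression, any number of aggregates per pivot value can sit side by side, which is what a single-aggregate PIVOT cannot express.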

avoiding group by for column used in datediff?

As the database is currently constructed, I can only use a date field of a certain table in a DATEDIFF function, and that table is also part of a COUNT aggregation (not the date field itself, but the entity where that date field is not null). The GROUP BY at the end messes up the counting, since that one entry is counted on its own, as its own group.
In some detail:
Our lead recruiter wants a report that shows the number of applications and conducted interviews per opening. So far no problem. Additionally, he would like to see the total duration per opening from making it public to signing a new employee, and of course only if the opening has already been filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. Table 4 holds the starting point, and in table 2 one applicant (or none) per opening has a date field filled with the time they returned a signed contract, at which point the opening counts as filled. When I use that field in a DATEDIFF, I'm forced to also put that column in the GROUP BY clause, and that results in 2 rows per opening: one row has all the numbers as wanted, and the second row always holds that one person who has an entry in that date field.
So far I haven't thought of a way of avoiding that problem, except for explaining to my colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 as NameOfProject,
table1.col2 as Company,
table1.col3 as OpeningType,
table1.col4 as ReasonForOpening,
count (table2.col2) as NumberOfApplications,
sum (case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) as NumberOfApplicantsWhoWithdraw,
sum (case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) as NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows '1' if the opening is filled
DATEDIFF(day, table4.colValidFrom, table2.colContractReceived) as DaysToCompletion
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
table2.colContractReceived
-- and all other columns except the ones in aggregate (sum and count) functions go in the GROUP BY section
ORDER BY table1.NameOfProject
Here is a short rebuild of what it looks like. First a row where the opening is not filled and all aggregations come out in one row as wanted. The next project/opening shows up double, because the field used in the datediff is grouped independently...
project   company  no_of_applications  no_of_phoneinterview  no_of_personalinterview  ...  time_to_fill_in_days  filled?
2018_312  comp a   27                  4                     2                             null                  0
2018_313  comp b   54                  7                     4                             null                  0
2018_313  comp b   1                   1                     1                             42                    1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)
If I've understood your requirement properly, I believe the issue you are having is that you need to show the date between the starting point and the time at which an applicant responded to an opening, however this must only show a single row based on whether or not the position was filled (if the position was filled, then show that row, if not then show that row).
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong; however, the principle should still provide what you are looking for.
I've essentially wrapped your query in a subquery, ranked the rows by the ContractsReceived column in descending order, and partitioned by the project. Then in the outer query I filter for the first row of this ranking.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
create table #table1 (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
create table #table2 (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
create table #table3 (
TypeInterview nvarchar(25),
REF_NR int
);
create table #table4 (
PRJ_REFNR int,
StartingPoint datetime
);
insert into #table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a' ,'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into #table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into #table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into #table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK()OVER(Partition by NameOfProject, Company order by ContractsReceived desc) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived >0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
#table2 table2 left join #table3 table3 on table2.REF_NR = table3.REF_NR
join #table1 table1 on table2.PROJEKT = table1.KBEZ
left join #table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno=1
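A different approach that might avoid the extra row altogether, and which is not what the answer above does, is to aggregate the contract date itself: wrapping it in MAX() means it no longer has to appear in the GROUP BY. The sketch below tries this in SQLite through Python's sqlite3 module. The `openings` and `applications` tables, their columns, and their rows are hypothetical simplifications of the four tables in the question, and `julianday()` stands in for SQL Server's DATEDIFF, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: one row per opening, one per application.
    CREATE TABLE openings (project TEXT, valid_from TEXT);
    CREATE TABLE applications (id INT, project TEXT, contract_received TEXT);
    INSERT INTO openings VALUES ('2018_312', '2018-02-25'), ('2018_313', '2018-03-04');
    INSERT INTO applications VALUES
        (1, '2018_312', NULL),
        (2, '2018_312', NULL),
        (3, '2018_313', NULL),
        (4, '2018_313', NULL),
        (5, '2018_313', '2018-04-15');
""")

# MAX() ignores NULLs, so it picks the single filled contract date per opening
# (or NULL if nobody signed). The date is aggregated, not grouped on.
report = conn.execute("""
    SELECT o.project,
           COUNT(a.id) AS applications,
           CAST(julianday(MAX(a.contract_received)) - julianday(o.valid_from)
                AS INTEGER) AS days_to_fill
    FROM openings o
    LEFT JOIN applications a ON a.project = o.project
    GROUP BY o.project, o.valid_from
    ORDER BY o.project
""").fetchall()

print(report)
```

In SQL Server the same idea would read `DATEDIFF(day, table4.colValidFrom, MAX(table2.colContractReceived))`, with `colValidFrom` kept in the GROUP BY; whether that fits the real schema depends on there being at most one publication row per opening.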

Return zeros in empty query result

I have this query:
SELECT
--DGC.Name
[GameChannel],
SUM([AdjustedGames] ) *100. / SUM(SUM([AdjustedGames])) OVER() [Percentage]
FROM #GameChannelResults1 GC
--LEFT OUTER JOIN [WarehouseMgmt].[DimGameChannel] DGC ON DGC.Name = GC.[GameChannel]
GROUP BY [GameChannel]--DGC.Name
But when there is no match, it returns an empty result (nothing). I want every value from [WarehouseMgmt].[DimGameChannel].Name to appear as GameChannel, with 0 for the percentage if there is no match or the result is empty. How can I do that?
Maybe this, JOIN to your dimension, like you already had commented.
SELECT
DGC.Name,
ISNULL(SUM([AdjustedGames] ) *100. / SUM(SUM([AdjustedGames])) OVER(),0) [Percentage]
FROM [WarehouseMgmt].[DimGameChannel] DGC
LEFT JOIN #GameChannelResults1 GC ON DGC.Name = GC.[GameChannel]
GROUP BY DGC.Name
It always helps if you add sample data to your question, and provide the expected output. SO has a handy guide on this topic, you might want to read.
This sample has three channels. Two of those channels have results, one does not.
Sample
-- Table vars are easy to share.
DECLARE @Channel TABLE
(
    ChannelName VARCHAR(10)
)
;
DECLARE @GameResults TABLE
(
    ChannelName VARCHAR(10),
    AdjustedGames INT
)
;
-- There are three sample channels...
INSERT INTO @Channel
(
    ChannelName
)
VALUES
    ('Channel A'),
    ('Channel B'),
    ('Channel C')
;
-- ... but only two contain results.
INSERT INTO @GameResults
(
    ChannelName,
    AdjustedGames
)
VALUES
    ('Channel A', 1),
    ('Channel A', 2),
    ('Channel B', 1),
    ('Channel B', 2)
;
Query
This query uses an outer join to return every record from the @Channel table and any matching records from the @GameResults table. When there are no matching records, NULL is returned by default. I've used ISNULL to replace these with a 0.
-- Outer join example.
SELECT
    c.ChannelName,
    ISNULL(SUM(AdjustedGames), 0) AS AdjustedGames
FROM
    @Channel AS c
        LEFT OUTER JOIN @GameResults AS gr ON gr.ChannelName = c.ChannelName
GROUP BY
c.ChannelName
;
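Putting the outer join together with the percentage calculation from the question, a sketch along these lines returns every channel with 0 where there are no results. It runs on SQLite through Python's sqlite3 module, so COALESCE replaces ISNULL, and a cross-joined total replaces the `SUM(SUM(...)) OVER()` window; the table names are hypothetical and the rows mirror the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channel (channel_name TEXT);
    CREATE TABLE game_results (channel_name TEXT, adjusted_games INT);
    INSERT INTO channel VALUES ('Channel A'), ('Channel B'), ('Channel C');
    INSERT INTO game_results VALUES
        ('Channel A', 1), ('Channel A', 2),
        ('Channel B', 1), ('Channel B', 2);
""")

# Start from the dimension table so every channel appears; channels without
# results produce NULL, which COALESCE turns into 0.
percentages = conn.execute("""
    SELECT c.channel_name,
           COALESCE(SUM(gr.adjusted_games) * 100.0 / t.total, 0) AS percentage
    FROM channel c
    LEFT JOIN game_results gr ON gr.channel_name = c.channel_name
    CROSS JOIN (SELECT SUM(adjusted_games) AS total FROM game_results) t
    GROUP BY c.channel_name, t.total
    ORDER BY c.channel_name
""").fetchall()

print(percentages)
```

If `game_results` is completely empty, the total is NULL, every division yields NULL, and all channels still come back with 0.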
Something like this ?
Use the NULLIF function.
It works like a CASE expression: if the value equals the blank string, it returns NULL.
NULLIF(Value,'')
Then you can use ISNULL to replace the NULL values with 0.
Use the two together to get the logic you want.
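A tiny sketch of that combination, run on SQLite through Python's sqlite3 module: COALESCE is used here as the portable counterpart of SQL Server's ISNULL, and the `readings` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one real value, one blank string, one NULL.
    CREATE TABLE readings (label TEXT, raw_value TEXT);
    INSERT INTO readings VALUES ('a', '5'), ('b', ''), ('c', NULL);
""")

# NULLIF turns '' into NULL; COALESCE then turns any NULL into 0.
rows = conn.execute("""
    SELECT label, COALESCE(NULLIF(raw_value, ''), 0) AS cleaned
    FROM readings
    ORDER BY label
""").fetchall()

print(rows)
```

Both the blank and the NULL row end up as 0, while the real value passes through unchanged.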

Merging two rows on to one in different columns

Firstly sorry if this has already been answered somewhere else, I have been unable to find an answer though after days of searching.
Is there a way to merge two rows into one row using different columns.
You will see from the image below, the row is identical, other than the date and location what I am looking for is to have the details below on one row. Where the date column is displayed twice with different column names for example 'Date sent to X location' and 'Date sent to Y location'. The location would not need to be displayed if we put the correct dates in the correct columns, as they would see what the location was from the column name.
So far I use this query, and I am unsure on how to adjust it to do what I need?
select
l.lot_number,
trunc(l.start_tran_date) AS "Date sent to location",
l.location_id_2 AS "Location"
FROM t_tran_log l
WHERE
(l.location_id_2 = 'SENTTOMAP' OR l.location_id_2 = 'WAITINGFORCOLLECTION')
;
This is what I would like the above result to look like:
This would be my approach:
1- Create a test table
create table MESSYLOG
(
lotn varchar(20),
datesent date,
location varchar(20)
);
insert into messylog values ('abc', '06-JUN-16', 'waiting');
insert into messylog values ('abc', '07-JUN-16', 'sent');
insert into messylog values ('def', '08-JUN-16', 'waiting');
insert into messylog values ('def', '10-JUN-16', 'sent');
--select * from MESSYLOG
2- Write 2 subqueries
select t1.lotn, t2.DateWait, t1.DateSentmap
from
(
select e.lotn, e.datesent as DateSentmap
from messylog e
where e.location = 'sent'
) t1
JOIN
(
select m.lotn, m.datesent as DateWait
from messylog m
where m.location = 'waiting'
)t2
on t1.lotn = t2.lotn
3-Resultset
LOTN DATEWAIT DATESENTMAP
abc 06-JUN-16 07-JUN-16
def 08-JUN-16 10-JUN-16
If there is only a single date per location for a given lot, you could try something like this:
SELECT lot_number
,MAX(CASE WHEN location_id_2 = 'WAITINGFORCOLLECTION' THEN start_tran_date ELSE NULL END) AS "Date waiting for collection"
,MAX(CASE WHEN location_id_2 = 'SENTTOMAP' THEN start_tran_date ELSE NULL END) AS "Date sent to map"
FROM t_tran_log
GROUP BY lot_number
The aggregate function (MAX) will skip the NULL column values leaving the single value for the location.
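Here is a runnable sketch of the MAX(CASE ...) approach, using SQLite through Python's sqlite3 module instead of Oracle. It reuses the shape of the `messylog` test table from the other answer, with ISO-formatted dates as an assumption so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messylog (lotn TEXT, datesent TEXT, location TEXT);
    INSERT INTO messylog VALUES
        ('abc', '2016-06-06', 'waiting'),
        ('abc', '2016-06-07', 'sent'),
        ('def', '2016-06-08', 'waiting'),
        ('def', '2016-06-10', 'sent');
""")

# Each CASE yields the date only for its own location and NULL otherwise;
# MAX skips the NULLs, leaving one date per column per lot.
merged = conn.execute("""
    SELECT lotn,
           MAX(CASE WHEN location = 'waiting' THEN datesent END) AS date_waiting,
           MAX(CASE WHEN location = 'sent'    THEN datesent END) AS date_sent
    FROM messylog
    GROUP BY lotn
    ORDER BY lotn
""").fetchall()

print(merged)
```

Unlike the two-subquery join, this form reads the table once and still returns a lot that has only one of the two locations, with NULL in the missing column.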

Getting Count Only of Distinct Value Combinations of multiple fields.

Please consider the following:
IF OBJECT_ID ('tempdb..#Customer') IS NOT NULL
DROP TABLE #Customer;
CREATE TABLE #Customer
(
CustomerKey INT IDENTITY (1, 1) NOT NULL
,CustomerNum INT NOT NULL
,CustomerName VARCHAR (25) NOT NULL
,Planet VARCHAR (25) NOT NULL
)
GO
INSERT INTO #Customer (CustomerNum, CustomerName, Planet)
VALUES (1, 'Anakin Skywalker', 'Tatooine')
, (2, 'Yoda', 'Coruscant')
, (3, 'Obi-Wan Kenobi', 'Coruscant')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Bespin')
, (4, 'Luke Skywalker', 'Bespin')
, (4, 'Luke Skywalker', 'Endor')
, (4, 'Luke Skywalker', 'Tatooine')
, (4, 'Luke Skywalker', 'Kashyyyk');
Notice that there are a total of 10 records. I know that I can get the list of distinct combinations of CustomerName and Planet with either of the following two queries.
SELECT DISTINCT CustomerName, Planet FROM #Customer;
SELECT CustomerName, Planet FROM #Customer
GROUP BY CustomerName, Planet;
However, what I'd like is a simple way to get just the count of those values, not the values themselves. I'd like a way that's quick to type, but also performant. I know I could load the values into a CTE, Temp Table, Table Variable, or Sub Query, and then count the records. Is there a better way to accomplish this?
This will work in 2005:
SELECT COUNT(*) AS cnt
FROM
( SELECT 1 AS d
FROM Customer
GROUP BY Customername, Planet
) AS t ;
Tested in SQL-Fiddle. An index on (CustomerName, Planet) would be used, see the query plan (for 2012 version):
The simplest to think of, "get all distinct values in a subquery, then count", yields the identical plan:
SELECT COUNT(*) AS cnt
FROM
 ( SELECT DISTINCT Customername, Planet
   FROM  Customer
 ) AS t ;
And also the one (thanks to @Aaron Bertrand) using the ranking function ROW_NUMBER() (not sure if it will be efficient in the 2005 version, too, but you can test):
SELECT COUNT(*) AS cnt
FROM
(SELECT rn = ROW_NUMBER()
OVER (PARTITION BY CustomerName, Planet
ORDER BY CustomerName)
FROM Customer) AS x
WHERE rn = 1 ;
There are also other ways to write this (one even without a subquery, thanks to @Mikael Eriksson!) but they are not as efficient.
The subquery/CTE method is the "right" way to do it.
A quick (in terms of typing but not necessarily performance) and dirty way is:
select count(distinct customername+'###'+Planet)
from #Customer;
The '###' is to separate the values so you don't get accidental collisions.
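To see the kind of collision the separator guards against, the sketch below compares plain concatenation, separated concatenation, and the subquery count. It runs on SQLite through Python's sqlite3 module, so `||` replaces SQL Server's `+`; the `pairs` table and its values are hypothetical and chosen so that two different pairs concatenate to the same string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: ('ab','c') and ('a','bc') both concatenate to 'abc'.
    CREATE TABLE pairs (first_col TEXT, second_col TEXT);
    INSERT INTO pairs VALUES ('ab', 'c'), ('a', 'bc'), ('a', 'bc');
""")

# Plain concatenation merges the two different pairs into one value.
naive = conn.execute(
    "SELECT COUNT(DISTINCT first_col || second_col) FROM pairs"
).fetchone()[0]

# A separator that never occurs in the data keeps them apart.
separated = conn.execute(
    "SELECT COUNT(DISTINCT first_col || '###' || second_col) FROM pairs"
).fetchone()[0]

# The subquery form needs no separator at all.
grouped = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT 1 FROM pairs GROUP BY first_col, second_col
    ) AS t
""").fetchone()[0]

print(naive, separated, grouped)
```

The separator trick only stays correct as long as the separator string cannot appear inside the data, which is one reason the subquery form is the safer default.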