SQL Group by Client Location

Sample of Data I am trying to manipulate
Order | OrderDate | ClientName| ClientAddress | City | State| Zip |
-------|-----------|-----------|---------------|--------|------|-------|
C0101 | 1/5/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0102 | 2/6/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0103 | 1/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0104 | 3/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0105 | 5/7/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0106 | 1/8/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0107 | 7/9/2015 | Client XYZ| 51 Testing Rd | Austin | TX | 73301 |
I have a database set up in MS SQL Server with all client orders for the past two years. Some clients have only one location; others have multiple locations. I would like to write a script that shows the number of orders a customer placed at each location over the total number of weeks in which there was at least one order.
Based on the results of this script, I would like to be able to deduce every customer location's summary of unique orders (placed at various times). For example:
Client ABC has placed 45 orders over 35 total weeks at location A
Client ABC has placed 35 orders over 15 total weeks at location B
Client ABC has placed 15 orders over 15 total weeks at location C
I would like see this information for each unique location for each client. I am not sure how to aggregate the data in such a way. Here is where I am at with my script:
SELECT t1.ClientName, (SELECT DISTINCT t2.ClientAddress), COUNT(DISTINCT t2.Orders) AS TotalOrders,
       DATEPART(week, t1.OrderDate) AS Week
FROM database t1
INNER JOIN database t2 ON t1.Orders = t2.Orders
GROUP BY DATEPART(week, t1.OrderDate), t1.ClientAddress, t2.ClientAddress
HAVING COUNT(DISTINCT t2.SalesOrder) > 1
ORDER BY TotalOrders DESC
The results that I get show the unique orders by location by week, but I'm not sure how to count the number of weeks the way I need; I have tried writing subqueries but keep running into issues. I realize that this script shows the number of orders by location for each individual week; instead, I would like to count the total number of weeks within the time frame in which there was at least one order.
The results structure is as follows:
| ClientName| ClientAddress | TotalOrders | Week |
|-----------|---------------|--------------|------|
|Client ABC |101 Park Drive | 30 | 21 |
|Client ABC |101 Park Drive | 29 | 13 |
|Client ABC |101 Park Drive | 28 | 10 |
|Client XYZ |1 Binary Road | 27 | 19 |
|Client XYZ |1 Binary Road | 25 | 7 |
|Client XYZ |51 Testing Rd | 22 | 9 |
Any and all help would be greatly appreciated; thank you in advance.

Isn't this what you want?
SELECT t1.ClientName, t1.ClientAddress, COUNT(DISTINCT t1.Orders) AS TotalOrders,
       COUNT(DISTINCT DATEPART(week, t1.OrderDate)) AS Weeks
FROM database t1
GROUP BY t1.ClientName, t1.ClientAddress
HAVING COUNT(DISTINCT t1.Orders) > 1
ORDER BY TotalOrders DESC
I don't really follow why you're doing a self-join; it seems unnecessary to me, so I dropped it and changed only what was needed to get your result.
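To see the shape of the result, here is the same aggregation sketched against an in-memory SQLite database. This is a demo under assumptions: `strftime('%W', ...)` stands in for SQL Server's `DATEPART(week, ...)`, and the table/column names (`orders`, `OrderNo`) are invented for the sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders
               (OrderNo TEXT, OrderDate TEXT, ClientName TEXT, ClientAddress TEXT)""")
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    ("C0101", "2015-01-05", "Client ABC", "101 Park Drive"),
    ("C0102", "2015-02-06", "Client ABC", "101 Park Drive"),
    ("C0103", "2015-01-07", "Client ABC", "354 Foo Pkwy"),
    ("C0104", "2015-03-07", "Client ABC", "354 Foo Pkwy"),
])
# One row per client/location: distinct orders, and distinct calendar weeks
# in which at least one order was placed.
rows = con.execute("""
    SELECT ClientName, ClientAddress,
           COUNT(DISTINCT OrderNo)                   AS TotalOrders,
           COUNT(DISTINCT strftime('%W', OrderDate)) AS Weeks
    FROM orders
    GROUP BY ClientName, ClientAddress
    ORDER BY TotalOrders DESC
""").fetchall()
for r in rows:
    print(r)
```

Each client/address pair comes back with its distinct order count and the count of distinct weeks in which it ordered, which is exactly the "orders over weeks" summary asked for.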

Related

compare two columns in PostgreSQL show only highest value

This is my table
I'm trying to find which urban area has the highest girls-to-boys ratio.
Thank you in advance for your help.
| urban | allgirls | allboys |
| :---- | :------: | :-----: |
| Ran | 100 | 120 |
| Ran | 110 | 105 |
| dhanr | 80 | 73 |
| dhanr | 140 | 80 |
| mohan | 180 | 73 |
| mohan | 25 | 26 |
This is the query I used, but I did not get the expected results
SELECT urban, Max(allboys) as high_girls,Max(allgirls) as high_boys
from table_urban group by urban
Expected results
| urban | allgirls | allboys |
| :---- | :------: | :-----: |
| dhanr | 220 | 153 |
First of all, your expected result doesn't seem correct: the girls-to-boys ratio is highest in "mohan", not in "dhanr" (if what you are really looking for is the highest ratio and not the highest number of girls).
You need to group and SUM first, then compute the ratio (divide one by the other) and take the top row.
select foo.urban as urban, foo.girls * 1.0 / foo.boys as ratio  -- * 1.0 avoids integer division
from (
    SELECT urban, SUM(allboys) as boys, SUM(allgirls) as girls
    FROM table_urban
    GROUP BY urban
) as foo
order by ratio desc
limit 1
SELECT urban, SUM(allboys) boys, SUM(allgirls) girls
FROM table_urban
GROUP BY urban
ORDER BY boys * 1.0 / girls -- or inverted, "girls * 1.0 / boys"; * 1.0 avoids integer division
LIMIT 1

Is there a way to create a "pivot group" of columns with t-sql?

This seems like a common need but, unfortunately, I can't find a solution.
Assume you have a query that outputs the following content:
| TimeFrame | User | Metric1 | Metric2 |
+------------+------------+---------+---------+
| TODAY | John Doe | 10 | 20 |
| MONTHTODAY | John Doe | 100 | 200 |
| TODAY | Jack Frost | 15 | 25 |
| MONTHTODAY | Jack Frost | 150 | 250 |
What I need as output after a pivot is data that looks like this:
| User | TODAY_Metric1 | TODAY_Metric2 | MONTHTODAY_Metric1 | MONTHTODAY_Metric2 |
+------------+---------------+---------------+--------------------+--------------------+
| John Doe | 10 | 20 |100 | 200 |
| Jack Frost | 15 | 25 |150 | 250 |
Note that I'm pivoting on TimeFrame; the Metric1 and Metric2 columns remain columns but are repeated per time-frame value.
Can this be done with standard PIVOT syntax, or will I need to write a more complex query to pull this data together into a result set specific to my needs?
You can do this with conditional aggregation:
select [user],
       sum(case when timeframe = 'TODAY'      then Metric1 end) TODAY_Metric1,
       sum(case when timeframe = 'TODAY'      then Metric2 end) TODAY_Metric2,
       sum(case when timeframe = 'MONTHTODAY' then Metric1 end) MONTHTODAY_Metric1,
       sum(case when timeframe = 'MONTHTODAY' then Metric2 end) MONTHTODAY_Metric2
from mytable
group by [user]
(USER is a reserved word in T-SQL, hence the brackets; the metric column names are matched to your sample data.)
I tend to prefer the conditional aggregation technique over vendor-specific implementations, because:
I find it simpler to understand and maintain
it is cross-RDBMS (so you can easily port it to another database if needed)
it usually performs as well as, or even better than, the vendor implementation (which often relies on it under the hood)
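A runnable sketch of the conditional-aggregation pivot, using an in-memory SQLite database. One assumption for the demo: the User column is renamed `username` purely to sidestep the reserved word; the data mirrors the sample.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (timeframe TEXT, username TEXT, Metric1 INT, Metric2 INT)")
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    ("TODAY",      "John Doe",   10,  20),
    ("MONTHTODAY", "John Doe",  100, 200),
    ("TODAY",      "Jack Frost", 15,  25),
    ("MONTHTODAY", "Jack Frost", 150, 250),
])
# Each SUM(CASE ...) picks out one (timeframe, metric) combination,
# collapsing the two rows per user into one wide row.
rows = con.execute("""
    SELECT username,
           SUM(CASE WHEN timeframe = 'TODAY'      THEN Metric1 END) AS TODAY_Metric1,
           SUM(CASE WHEN timeframe = 'TODAY'      THEN Metric2 END) AS TODAY_Metric2,
           SUM(CASE WHEN timeframe = 'MONTHTODAY' THEN Metric1 END) AS MONTHTODAY_Metric1,
           SUM(CASE WHEN timeframe = 'MONTHTODAY' THEN Metric2 END) AS MONTHTODAY_Metric2
    FROM mytable
    GROUP BY username
""").fetchall()
for r in rows:
    print(r)
```

The output has one row per user with the four pivoted columns, matching the requested layout.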

Query'd top 15 faults, need the accumulated downtime from another column

I'm currently trying to query up a list of the top 15 occurring faults on a PLC in the warehouse. I've gotten that part down:
Select top 15 fault_number, fault_message, count(*) FaultCount
from Faults_Stator
where T_stamp> dateadd(hour, -18, getdate())
Group by Fault_number, Fault_Message
Order by Faultcount desc
However, I now need to find the accumulated downtime of the faults in that top-15 list, using the information in another column, "Fault_duration". How would I go about doing this? Thanks in advance; you've all helped me so much already.
+--------------+---------------------------------------------+------------+
| Fault Number | Fault Message | FaultCount |
+--------------+---------------------------------------------+------------+
| 122 | ST10: Part A&B Failed | 23 |
| 4 | ST16: Part on Table B | 18 |
| 5 | ST7: No Spring Present on Part A | 15 |
| 6 | ST7: No Spring Present on Part B | 12 |
| 8 | ST3: No Pin Present B | 8 |
| 1 | ST5: No A Housing | 5 |
| 71 | ST4: Shuttle Right Not Loaded | 4 |
| 144 | ST15: Vertical Cylinder did not Retract | 3 |
| 98 | ST8: Plate Loader Can not Retract | 3 |
| 72 | ST4: Shuttle Left Not Loaded | 2 |
| 94 | ST8: Spring Gripper Cylinder did not Extend | 2 |
| 60 | ST8: Plate Loader Can not Retract | 1 |
| 83 | ST6: No A Spring Present | 1 |
| 2 | ST5: No B Housing | 1 |
| 51 | ST4: Vertical Cylinder did not Extend | 1 |
+--------------+---------------------------------------------+------------+
I know I wouldn't be using the same query, but I'm at a loss at how to do this next step.
Fault duration is a column that records how long each fault lasted, in ms. I'm trying to have those durations accumulated next to the corresponding fault, so the first offender would have its 23 individual fault occurrences summed next to it in another column.
You should be able to add a SUM aggregate:
Select top 15 fault_number, fault_message, count(*) FaultCount, SUM(Fault_duration) as FaultDuration
from Faults_Stator
where T_stamp > dateadd(hour, -18, getdate())
Group by Fault_number, Fault_Message
Order by FaultCount desc
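As a quick check of the idea, here is the same count-plus-sum aggregation against an in-memory SQLite database. Assumptions for the demo: `LIMIT` stands in for `TOP`, the T_stamp filter is dropped, and the duration values are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Faults_Stator (fault_number INT, fault_message TEXT, fault_duration INT)")
con.executemany("INSERT INTO Faults_Stator VALUES (?, ?, ?)", [
    (122, "ST10: Part A&B Failed", 500),   # durations in ms (made up)
    (122, "ST10: Part A&B Failed", 300),
    (4,   "ST16: Part on Table B", 250),
])
# COUNT(*) gives the occurrence count; SUM(fault_duration) accumulates
# the total downtime for each fault in the same grouped row.
rows = con.execute("""
    SELECT fault_number, fault_message,
           COUNT(*)            AS FaultCount,
           SUM(fault_duration) AS FaultDuration
    FROM Faults_Stator
    GROUP BY fault_number, fault_message
    ORDER BY FaultCount DESC
    LIMIT 15
""").fetchall()
for r in rows:
    print(r)
```

Because the SUM is grouped by the same keys as the count, each top-15 row carries its own accumulated downtime.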

T-SQL aggregate over contiguous dates more efficiently

I have a need to aggregate a sum over contiguous dates. I've seen solutions to similar problems that return the start and end dates, but they don't aggregate the data between those ranges. It's further complicated by the extremely large amount of data involved, to the point that a simple self-join takes an impractical amount of time (especially since the start and end date fields are unindexed).
I have a solution involving cursors, but I've generally been led to believe that cursors can always be replaced with joins that execute faster. So far, though, every join-based query anywhere close to giving me the data I need takes at least an hour, while my cursor solution takes about 10 seconds. So I'm asking whether there is a more efficient answer.
And the data includes both buy and sell transactions and each row of aggregated contiguous dates returned also needs to list the transaction ID of the last sell that occurred before the first buy of the contiguous set of buy transactions.
An example of the data:
+------------------+------------+------------+------------------+--------------------+
| TRANSACTION_TYPE | TRANS_ID | StartDate | EndDate | Amount |
+------------------+------------+------------+------------------+--------------------+
| sell             | 100        | 2/16/16    | 2/18/16          | $100.00            |
| sell | 101 | 3/1/16 | 6/6/16 | $121.00 |
| buy | 102 | 6/10/16 | 6/12/16 | $22.00 |
| buy | 103 | 6/12/16 | 6/14/16 | $0.35 |
| buy | 104 | 6/29/16 | 7/2/16 | $5.00 |
| sell | 105 | 7/3/16 | 7/6/16 | $115.00 |
| buy | 106 | 7/8/16 | 7/9/16 | $200.00 |
| sell | 107 | 7/10/16 | 7/13/16 | $4.35 |
| sell | 108 | 7/17/16 | 7/20/16 | $0.50 |
| buy | 109 | 7/25/16 | 7/29/16 | $33.00 |
| buy | 110 | 7/29/16 | 8/1/16 | $75.00 |
| buy | 111 | 8/1/16 | 8/3/16 | $0.33 |
| sell | 112 | 9/1/16 | 9/2/16 | $99.00 |
+------------------+------------+------------+------------------+--------------------+
Should have results like the following:
+-----------+------------+------------+---------+
| Last_Sell | StartDate  | EndDate    | Amount  |
+-----------+------------+------------+---------+
| 101       | 6/10/16    | 6/14/16    | $22.35  |
| 101       | 6/29/16    | 7/2/16     | $5.00   |
| 105       | 7/8/16     | 7/9/16     | $200.00 |
| 108       | 7/25/16    | 8/3/16     | $108.33 |
+-----------+------------+------------+---------+
Right now I use queries to split the data into buys and sells, and just walk through the buy data, aggregating as I go, inserting into the return table every time I find a break in the dates, and I step through the sell table until I reach the last sell before the start date of the set of buys.
Walking linearly through cursors gives me O(n) time. Even though cursors are orders of magnitude less efficient per row, the work is still linear, while I suspect the joins I would need would cost at least O(n log n). With the ridiculous amount of data I'm working with, the per-row inefficiency of cursors is swamped as soon as anything goes beyond linear time.
If I assume that the transaction id increases along with the dates, then you can get the last sell's id using a cumulative max. The adjacency can then be found with similar logic, but using lag() first:
with cte as (
    select t.*,
           max(case when transaction_type = 'sell' then trans_id end) over
               (order by trans_id) as last_sell,
           lag(enddate) over (partition by transaction_type order by trans_id) as prev_enddate
    from t
)
select last_sell, min(startdate) as startdate, max(enddate) as enddate,
       sum(amount) as amount
from (select cte.*,
             sum(case when startdate <= dateadd(day, 1, prev_enddate) then 0 else 1 end)
                 over (partition by last_sell order by trans_id) as grp
      from cte
      where transaction_type = 'buy'
     ) x
group by last_sell, grp;
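Here is the same gaps-and-islands idea run against the sample data in an in-memory SQLite database (SQLite 3.25+ for window functions). One assumption: with this data, consecutive buys share a boundary date, so the break test compares startdate to the previous enddate directly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t (transaction_type TEXT, trans_id INT,
                               startdate TEXT, enddate TEXT, amount REAL)""")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    ("sell", 100, "2016-02-16", "2016-02-18", 100.00),
    ("sell", 101, "2016-03-01", "2016-06-06", 121.00),
    ("buy",  102, "2016-06-10", "2016-06-12",  22.00),
    ("buy",  103, "2016-06-12", "2016-06-14",   0.35),
    ("buy",  104, "2016-06-29", "2016-07-02",   5.00),
    ("sell", 105, "2016-07-03", "2016-07-06", 115.00),
    ("buy",  106, "2016-07-08", "2016-07-09", 200.00),
    ("sell", 107, "2016-07-10", "2016-07-13",   4.35),
    ("sell", 108, "2016-07-17", "2016-07-20",   0.50),
    ("buy",  109, "2016-07-25", "2016-07-29",  33.00),
    ("buy",  110, "2016-07-29", "2016-08-01",  75.00),
    ("buy",  111, "2016-08-01", "2016-08-03",   0.33),
    ("sell", 112, "2016-09-01", "2016-09-02",  99.00),
])
rows = con.execute("""
    WITH cte AS (
        -- running max of sell ids = id of the last sell before each row
        SELECT t.*,
               MAX(CASE WHEN transaction_type = 'sell' THEN trans_id END)
                   OVER (ORDER BY trans_id)                          AS last_sell,
               LAG(enddate) OVER (PARTITION BY transaction_type
                                  ORDER BY trans_id)                 AS prev_enddate
        FROM t
    )
    SELECT last_sell, MIN(startdate), MAX(enddate), ROUND(SUM(amount), 2)
    FROM (SELECT cte.*,
                 -- grp increments at every break in contiguity
                 SUM(CASE WHEN startdate = prev_enddate THEN 0 ELSE 1 END)
                     OVER (PARTITION BY last_sell ORDER BY trans_id) AS grp
          FROM cte
          WHERE transaction_type = 'buy') x
    GROUP BY last_sell, grp
    ORDER BY 2
""").fetchall()
for r in rows:
    print(r)
```

The four output rows match the expected results: the contiguous buy runs are collapsed, each tagged with the id of the last preceding sell.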

SQL JOIN to omit other columns after first result

Here is the result I need, simplified:
select name, phonenumber
from contacttmp
left outer join phonetmp on (contacttmp.id = phonetmp.contact_id);
name | phonenumber
-------+--------------
bob | 111-222-3333
bob | 111-222-4444
bob | 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
The query, however, displays the name on every row; I'm trying to omit the name after the first result:
name | phonenumber
-------+--------------
bob | 111-222-3333
| 111-222-4444
| 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
Here's how I made the example tables and the data:
create table contacttmp (id serial, name text);
create table phonetmp (phoneNumber text, contact_id integer);
select * from contacttmp;
id | name
----+-------
1 | bob
2 | frank
3 | joe
select * from phonetmp ;
phonenumber | contact_id
--------------+------------
111-222-3333 | 1
111-222-4444 | 1
111-222-5555 | 1
111-222-6666 | 2
111-222-7777 | 3
Old part of question
I'm working on a contacts program in PHP and a requirement is to display the results but omit the other fields after the first record is displayed if there are multiple results of that same record.
From the postgres tutorial join examples I'm doing something like this with a left outer join:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
I can't figure out how to, or if it is possible to, alter the above query to not display the other fields after the first result.
For example, if we add the clause "WHERE location = '(-194,53)'" we don't want the second (and third if there is one) results to display the columns other than location, so the query (plus something extra) and the result would look like this:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-194,53)
Is this possible with some kind of JOIN or exclusion or other query? Or do I have to remove these fields in PHP after getting all the results (would rather not do).
To avoid confusion, I'm required to achieve a result set like:
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-19,5)
| | | | | | (-94,3)
Philadelphia | 55 | 60 | 0.1 | 1995-12-12 | Philadelphia | (-1,1)
| | | | | | (-77,55)
| | | | | | (-3,33)
Where any additional results for the same record (city) with different locations would only display the different location.
You can do this type of logic in SQL, but it is not recommended. The result set of a SQL query is a table: tables represent unordered sets, and all values in a column generally mean the same thing.
So a result set that depends on values from the "preceding" row is not a proper way to use SQL. Although you can get this result in Postgres, I do not recommend it; usually this type of formatting is done on the application side.
If you want to avoid repeating the same information, you can use a window function that tells you the position of that row in the group (a PARTITION for this purpose, not a group in the GROUP BY sense), then hide the text for the columns you don't want to repeat if that position in the group is greater than 1.
WITH joined_results AS (
SELECT
w.city, c.location, w.temp_lo, w.temp_hi, w.prcp, w.date,
ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos
FROM weather w
LEFT OUTER JOIN cities c ON (w.city = c.name)
ORDER BY w.city, c.location
)
SELECT
  CASE WHEN pos > 1 THEN '' ELSE city END AS city,
  CASE WHEN pos > 1 THEN '' ELSE location::text END AS location, -- cast: '' is not a valid point
  temp_lo, temp_hi, prcp, date
FROM joined_results;
This should give you this:
city | location | temp_lo | temp_hi | prcp | date
---------------+-----------+---------+---------+------+------------
Hayward | | 37 | 54 | | 1994-11-29
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27
| | 43 | 57 | 0 | 1994-11-29
To understand what ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos does, it's probably worth looking at what you get with SELECT * FROM joined_results:
city | location | temp_lo | temp_hi | prcp | date | pos
---------------+-----------+---------+---------+------+------------+-----
Hayward | | 37 | 54 | | 1994-11-29 | 1
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27 | 1
San Francisco | (-194,53) | 43 | 57 | 0 | 1994-11-29 | 2
After that, just replace what you don't want with white space using CASE WHEN pos > 1 THEN '' ELSE ... END.
(This being said, it's something I'd generally prefer to do in the presentation layer rather than in the query.)
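For reference, the same ROW_NUMBER()-and-blank technique applied to the contacttmp/phonetmp tables from the question, sketched here against an in-memory SQLite database:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contacttmp (id INTEGER, name TEXT)")
con.execute("CREATE TABLE phonetmp (phoneNumber TEXT, contact_id INTEGER)")
con.executemany("INSERT INTO contacttmp VALUES (?, ?)",
                [(1, "bob"), (2, "frank"), (3, "joe")])
con.executemany("INSERT INTO phonetmp VALUES (?, ?)", [
    ("111-222-3333", 1), ("111-222-4444", 1), ("111-222-5555", 1),
    ("111-222-6666", 2), ("111-222-7777", 3),
])
rows = con.execute("""
    WITH joined AS (
        -- pos numbers each contact's phone rows 1, 2, 3, ...
        SELECT c.id, c.name, p.phoneNumber,
               ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY p.phoneNumber) AS pos
        FROM contacttmp c
        LEFT JOIN phonetmp p ON c.id = p.contact_id
    )
    -- blank the name on every row after the first one per contact
    SELECT CASE WHEN pos > 1 THEN '' ELSE name END AS name, phoneNumber
    FROM joined
    ORDER BY id, pos
""").fetchall()
for r in rows:
    print(r)
```

Note the outer ORDER BY uses id and pos rather than the (now partially blanked) name, so the rows stay grouped under their first occurrence.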
Consider the slightly modified test case in the fiddle below.
Simple case
For the simple case dealing with a single column from each column, comparing to the previous row with the window function lag() does the job:
SELECT CASE WHEN lag(c.contact) OVER (ORDER BY c.contact, p.phone_nr)
= c.contact THEN NULL ELSE c.contact END
, p.phone_nr
FROM contact c
LEFT JOIN phone p USING (contact_id);
You could repeat that for n columns, but that's tedious.
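The lag() comparison translates almost verbatim to other engines; here is a sketch against an in-memory SQLite database. The contact/phone schema is the fiddle's, reconstructed here as an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contact (contact_id INTEGER, contact TEXT)")
con.execute("CREATE TABLE phone (contact_id INTEGER, phone_nr TEXT)")
con.executemany("INSERT INTO contact VALUES (?, ?)", [(1, "bob"), (2, "frank")])
con.executemany("INSERT INTO phone VALUES (?, ?)", [
    (1, "111-222-3333"), (1, "111-222-4444"), (2, "111-222-6666"),
])
# lag() peeks at the previous row in the same order the rows are emitted;
# when the contact repeats, the name is replaced with NULL.
rows = con.execute("""
    SELECT CASE WHEN lag(c.contact) OVER (ORDER BY c.contact, p.phone_nr) = c.contact
                THEN NULL ELSE c.contact END AS contact,
           p.phone_nr
    FROM contact c
    LEFT JOIN phone p USING (contact_id)
    ORDER BY c.contact, p.phone_nr
""").fetchall()
for r in rows:
    print(r)
```

The window's ORDER BY matches the outer ORDER BY, which is what makes "previous row" well defined here.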
For many columns
SELECT c.*, p.phone_nr
FROM (
SELECT *
, row_number() OVER (PARTITION BY contact_id ORDER BY phone_nr) AS rn
FROM phone
) p
LEFT JOIN contact c ON c.contact_id = p.contact_id AND p.rn = 1;
Something like a "reverse LEFT JOIN". This assumes referential integrity (no missing rows in contact). Also, contacts without any entries in phone are not in the result; that would be easy to add if needed.
SQL Fiddle.
As an aside, your query in the first example exhibits a rookie mistake.
SELECT * FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
One does not combine a LEFT JOIN with a WHERE clause on the right table. It doesn't make sense. Details:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
Except to test for existence ...
Select rows which are not present in other table