Best solution for SQL without looping

I'm relatively new to SQL, and am trying to find the best way to attack this problem.
I am trying to take data from 2 tables and start merging them together to perform analysis on it, but I don't know the best way to go about this without looping or many nested subqueries.
What I've done so far:
I have 2 tables. Table1 has user information and Table2 has information on orders (prices and dates, as well as the user).
What I need to do:
I want to have a single row for each user that has a summary of information about all of their orders. I'm looking to find the sum of prices of all orders by each user, the max price paid by that user, and the number of orders. I'm not sure how to best manipulate my data in SQL.
Currently, my code looks as follows:
Select alias1.*, Table2.order_id, Table2.price, Table2.order_date
From (Select * from Table1 where country='United States') as alias1
LEFT JOIN Table2
on alias1.user_id = Table2.user_id
This filters the users by country, and then joins them with the orders, creating one row per order that includes the user information. I don't know if this is a helpful step, but this is part of my first attempt playing around with the data. I was thinking of looping over this, but I know that is against the spirit of SQL.
Edit: Here is an example of what I have and what I want:
Table 1(user info):
user_id user_country
1 United States
2 United Kingdom
(etc)
Table 2(order info):
order_id price user_id
100 5.00 1
101 3.50 2
102 2.50 1
103 1.00 1
104 8.00 2
What I would like as output:
user_id user_country total_price max_price number_of_orders
1 United States 8.50 5.00 3
2 United Kingdom 11.50 8.00 2

Here's one way to do this:
SELECT alias1.user_id,
MAX(alias1.user_country) As user_country,
SUM(Table2.price) As UsersTotalPrice,
MAX(Table2.price) As UsersHighestPrice,
COUNT(Table2.order_id) As NumberOfOrders
FROM Table1 As alias1
LEFT JOIN Table2 ON alias1.user_id = Table2.user_id
WHERE alias1.user_country = 'United States'
GROUP BY alias1.user_id
If you can give us the actual table definitions, then we can show you some actual working queries.
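For reference, the join-then-aggregate pattern can be run end-to-end. Here is a minimal sketch using Python's sqlite3 purely as a test harness; the table and column names mirror the question's sample data:

```python
import sqlite3

# Minimal schema mirroring the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (user_id INTEGER, user_country TEXT);
CREATE TABLE Table2 (order_id INTEGER, price REAL, user_id INTEGER);
INSERT INTO Table1 VALUES (1, 'United States'), (2, 'United Kingdom');
INSERT INTO Table2 VALUES
  (100, 5.00, 1), (101, 3.50, 2), (102, 2.50, 1),
  (103, 1.00, 1), (104, 8.00, 2);
""")

# One row per user: SUM, MAX and COUNT over that user's orders.
rows = conn.execute("""
SELECT t1.user_id,
       t1.user_country,
       SUM(t2.price)      AS total_price,
       MAX(t2.price)      AS max_price,
       COUNT(t2.order_id) AS number_of_orders
FROM Table1 AS t1
LEFT JOIN Table2 AS t2 ON t1.user_id = t2.user_id
GROUP BY t1.user_id, t1.user_country
ORDER BY t1.user_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'United States', 8.5, 5.0, 3)
# (2, 'United Kingdom', 11.5, 8.0, 2)
```

This reproduces the desired output table exactly: one row per user with the total, the max, and the order count.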

Something like this? Aggregate the rows in Table2 and then join to Table1 for the detail info you want?
SELECT Table1.*,agg.thesum FROM
(SELECT UserID, SUM(aggregatedata) as thesum FROM Table2 GROUP BY UserID) agg
INNER JOIN Table1 on table1.userid = agg.userid

This should work
select table1.*, t2.total_price, t2.max_price, t2.order_count
from table1
join (select user_id, sum(price) as total_price, max(price) as max_price, count(order_id) as order_count from table2 group by user_id) as t2
on table1.user_id = t2.user_id
where table1.country = 'United States'

EDIT: (removed: "don't use explicit join" — this was wrong; I meant:)
Try the following syntax, for a better understanding of what goes on (note that user and order are reserved words in many engines, so you may need to quote them or pick other aliases):
1st step:
select
user.user_id, -- < you must tell the DB which table's user_id you mean
user_country,
price,
price
from -- now just the two tables:
Table1 as user, --table1 is a bad name, we use 'user'
Table2 as order
where user.user_id = order.user_id
so you will get something like:
user_id user_country price price
1 alabama 5 5
3 nebraska 1 1
2 alabama 7 7
1 alabama 7 7
2 alabama 3 3
and so on ..
the next step is to add another condition, where user_country='alabama', so 'nebraska' drops out:
user_id user_country price price
1 alabama 5 5
2 alabama 7 7
1 alabama 7 7
2 alabama 3 3
now you are ready to "aggregate": just select the MAX and SUM of price, but you have to tell the SQL engine which columns stay 'fixed' = group by
select
user.user_id, user_country, MAX(price), SUM(price)
from
Table1 as user,
Table2 as order
where user.user_id = order.user_id
and user_country='alabama'
group by user_id, user_country
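The stepwise query above can be tried out directly. A sketch with sqlite3 and made-up data in the spirit of the answer's example (aliases shortened to u/o, since user and order are reserved words in most engines):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (user_id INTEGER, user_country TEXT);
CREATE TABLE Table2 (user_id INTEGER, price REAL);
INSERT INTO Table1 VALUES (1, 'alabama'), (2, 'alabama'), (3, 'nebraska');
INSERT INTO Table2 VALUES (1, 5), (3, 1), (2, 7), (1, 7), (2, 3);
""")

# Implicit (comma) join, filtered to one country, then aggregated.
rows = conn.execute("""
SELECT u.user_id, u.user_country, MAX(o.price), SUM(o.price)
FROM Table1 AS u, Table2 AS o
WHERE u.user_id = o.user_id
  AND u.user_country = 'alabama'
GROUP BY u.user_id, u.user_country
ORDER BY u.user_id
""").fetchall()
# user 1: prices 5 and 7 -> MAX 7.0, SUM 12.0
# user 2: prices 7 and 3 -> MAX 7.0, SUM 10.0
```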

Related

Postgres, groupBy and count for table and relations at the same time

I have a table called 'users' that has the following structure:
id (PK)  campaign_id  createdAt
1        123          2022-07-14T10:30:01.967Z
2        1234         2022-07-14T10:30:01.967Z
3        123          2022-07-14T10:30:01.967Z
4        123          2022-07-14T10:30:01.967Z
At the same time I have a table that tracks clicks per user:
id (PK)  user_id (FK)  createdAt
1        1             2022-07-14T10:30:01.967Z
2        2             2022-07-14T10:30:01.967Z
3        2             2022-07-14T10:30:01.967Z
4        2             2022-07-14T10:30:01.967Z
Both of these tables run to millions of records... I need the most efficient query to group the data per campaign_id.
The result I am looking for would look like this:
campaign_id  total_users  total_clicks
123          3            1
1234         1            3
I unfortunately have no idea how to achieve this while minding performance, and most importantly I need to use WHERE or HAVING to limit the query to a certain time range by createdAt.
Note, PostgreSQL is not my forte, nor is SQL. But I'm learning, and spent some time on your question. Have a go with an INNER JOIN of two separate SELECT statements:
SELECT * FROM
(
SELECT campaign_id, COUNT (t1."id(PK)") total_users FROM t1 GROUP BY campaign_id
) tbl1
INNER JOIN
(
SELECT campaign_id, COUNT (t2."user_id(FK)") total_clicks FROM t2 INNER JOIN t1 ON t1."id(PK)" = t2."user_id(FK)" GROUP BY campaign_id
) tbl2
USING(campaign_id)
See an online fiddle. I believe this is now also ready for a WHERE clause in both SELECT statements to filter by "createdAt". I'm pretty sure someone else will come up with something better.
Good luck.
Hope this will help you.
select u.campaign_id,
count(distinct u.id) users_count,
count(c.user_id) clicks_count
from
users u left join clicks c on u.id=c.user_id
group by 1;
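To sanity-check the LEFT JOIN answer against the sample data, here is a small sketch using Python's sqlite3 as a harness (timestamps shortened; a createdAt range filter could be added in a WHERE clause before the GROUP BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, campaign_id INTEGER, createdAt TEXT);
CREATE TABLE clicks (id INTEGER PRIMARY KEY, user_id INTEGER, createdAt TEXT);
INSERT INTO users VALUES
  (1, 123,  '2022-07-14'), (2, 1234, '2022-07-14'),
  (3, 123,  '2022-07-14'), (4, 123,  '2022-07-14');
INSERT INTO clicks VALUES
  (1, 1, '2022-07-14'), (2, 2, '2022-07-14'),
  (3, 2, '2022-07-14'), (4, 2, '2022-07-14');
""")

# LEFT JOIN keeps users with no clicks; COUNT(c.user_id) ignores NULLs,
# so one pass yields both the user count and the click count per campaign.
rows = conn.execute("""
SELECT u.campaign_id,
       COUNT(DISTINCT u.id) AS total_users,
       COUNT(c.user_id)     AS total_clicks
FROM users u
LEFT JOIN clicks c ON u.id = c.user_id
GROUP BY u.campaign_id
ORDER BY u.campaign_id
""").fetchall()
# campaign 123 -> 3 users, 1 click; campaign 1234 -> 1 user, 3 clicks
```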

Optimal SQL to perform multiple aggregate functions with different group by fields

To simplify a complex query I am working on, I feel like solving this is key.
I have the following table
id  city     Item
1   chicago  1
2   chicago  2
3   chicago  1
4   cedar    2
5   cedar    1
6   cedar    2
7   detroit  1
I am trying to find the ratio of number of rows grouped by city and item to the number of rows grouped by just the items for each and every unique city-item pair.
So I would like something like this
City     Item  groupCityItemCount  groupItemCount  Ratio
chicago  1     2                   4               2/4
chicago  2     1                   3               1/3
cedar    1     1                   4               1/4
cedar    2     2                   3               2/3
detroit  1     1                   4               1/4
This is my current solution, but it's too slow.
Select city, item, (count(*) / (select count(*) from records t2 where t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item
I also tried replacing the WHERE with GROUP BY ... HAVING, but that is also slow.
Select city, item, (count(*) / (select count(*) from records t2 group by item having t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item
(Note: I have removed column3 and column4 from the solution for smaller code)
(Edit: Typo as pointed out by xQbert and
MatBailie)
Is it slow because it's evaluating each row separately with the subquery in the select statement? It may be operating as a correlated subquery.
If that's the case it might be faster if you get the values out of a join and go from there -
Select city, t1.item, (COUNT(t1.item) / MAX(t2.it_count)) AS pen_ratio
from records t1
JOIN (SELECT item, count(item) AS it_count
FROM records
group by item) t2
ON t2.item = t1.item
GROUP BY city, t1.item
Updated some errors and included the fiddle based off the starting point from xQbert. I had to CAST as float in the fiddle, but you may not need to CAST and use the above query in yours depending on datatypes.
I believe this follows the intent of your original query.
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d77a715175159304b9192a16ad903347
You can approach it in two parts.
First, aggregate to the level you're interested in, as normal.
Then, use analytical functions to work out subtotals across your partitions (item, in your case).
WITH
aggregate AS
(
SELECT
city,
item,
COUNT(*) AS row_count
FROM
records
GROUP BY
city,
item
)
SELECT
city,
item,
row_count AS groupCityItemCount,
SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM
aggregate
Fiddle borrowed from xQbert
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=730146262267412522f6e27796151f43
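The CTE-plus-window-function approach can be verified on the question's data. A sketch with sqlite3 (window functions need SQLite >= 3.25), leaving the final Ratio as a simple division of the two count columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER, city TEXT, item INTEGER);
INSERT INTO records VALUES
  (1,'chicago',1), (2,'chicago',2), (3,'chicago',1),
  (4,'cedar',2),   (5,'cedar',1),   (6,'cedar',2),
  (7,'detroit',1);
""")

# Aggregate to (city, item) first, then use a window SUM partitioned
# by item to get the per-item subtotal without a second scan.
rows = conn.execute("""
WITH aggregate AS (
  SELECT city, item, COUNT(*) AS row_count
  FROM records
  GROUP BY city, item
)
SELECT city, item,
       row_count AS groupCityItemCount,
       SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM aggregate
ORDER BY city, item
""").fetchall()
# The Ratio column is then just groupCityItemCount / groupItemCount.
```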

Union & Pivot multiple tables with different values for similar records

I have multiple tables with the same customer records, but each table has its own cost currency.
Table 1:
User Country COST_USD
1 USA 10
2 USA 5
3 USA 3
Table 2:
User Country COST_EUR
1 USA 12
2 USA 7
3 USA 5
Table 3:
User Country COST_YEN
1 USA 100
2 USA 50
3 USA 30
What I am looking for is to Union those tables and then pivot the currencies to individual columns (or pivot then union) as follows:
User Country COST_USD COST_EUR COST_YEN
1 USA 10 12 100
2 USA 5 7 50
3 USA 3 5 30
I have tried UNION ALL and then PIVOT, but that didn't work since I have different currency columns.
Is this what you were thinking of-
select user, country, MAX(COST_USD) COST_USD,MAX(COST_EUR) COST_EUR,MAX(COST_YEN ) COST_YEN
from
(
select user, country, COST_USD, NULL AS COST_EUR, NULL AS COST_YEN from table1
union all
select user, country, NULL AS COST_USD, COST_EUR, NULL AS COST_YEN from table2
union all
select user, country, NULL AS COST_USD, NULL AS COST_EUR, COST_YEN from table3
)T
group by user, country
but if you have many currency columns then you ought to maybe unpivot, UNION ALL, and then pivot back.
select user, country, currency, amount
from
(select user, country, cost_curr from tableN)U
unpivot
(amount for currency in (COST_EUR, COST_USD,COST_YEN, cost_curr))UPVT
The above is done for each table, and all the resulting unpivots are UNION ALLed and then pivoted back.
As you can see, this is quite tedious.
Or change your design if possible.
As I mentioned in the comments, this is just a JOIN. Based on your sample data, an INNER JOIN:
SELECT T1.[User], --USER is a reserved keyword, and should not be used for object names
T1.Country,
T1.COST_USD,
T2.COST_EUR,
T3.COST_YEN
FROM dbo.Table1 T1
JOIN dbo.Table2 T2 ON T1.[User] = T2.[User] --USER is a reserved keyword, and should not be used for object names
AND T1.Country = T2.Country
JOIN dbo.Table3 T3 ON T1.[User] = T3.[User] --USER is a reserved keyword, and should not be used for object names
AND T1.Country = T3.Country;
Of course, I doubt all your table have a value for a specific user, so you likely want a FULL OUTER JOIN:
SELECT COALESCE(T1.[User],T2.[User],T3.[User]) AS [User], --USER is a reserved keyword, and should not be used for object names
COALESCE(T1.Country,T2.Country,T3.Country) AS Country,
T1.COST_USD,
T2.COST_EUR,
T3.COST_YEN
FROM dbo.Table1 T1
FULL OUTER JOIN dbo.Table2 T2 ON T1.[User] = T2.[User] --USER is a reserved keyword, and should not be used for object names
AND T1.Country = T2.Country
FULL OUTER JOIN dbo.Table3 T3 ON T3.[User] IN (T1.[User],T2.[User])--USER is a reserved keyword, and should not be used for object names
AND T3.Country IN (T1.Country,T2.Country);
But, like I mentioned, the real solution is fix your design, and have a single table that looks something like this:
CREATE TABLE dbo.YourTable (UserID int,
Country nvarchar(50),
Currency char(3),
Value decimal(12,0));
Then your data would look like this:
INSERT INTO dbo.YourTable
VALUES(1,'USA','USD',10),
(1,'USA','EUR',12),
(1,'USA','YEN',100);
And finally you would get your results with conditional aggregation:
SELECT UserID,
Country,
MAX(CASE Currency WHEN 'USD' THEN Value END) AS COST_USD,
MAX(CASE Currency WHEN 'EUR' THEN Value END) AS COST_EUR,
MAX(CASE Currency WHEN 'YEN' THEN Value END) AS COST_YEN
FROM dbo.YourTable
GROUP BY UserID,
Country;
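The recommended single-table design plus conditional aggregation is easy to check. A sketch with sqlite3 (the T-SQL types and dbo. prefix adapted; users 2's rows added here to mirror the sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (UserID INTEGER, Country TEXT,
                        Currency TEXT, Value NUMERIC);
INSERT INTO YourTable VALUES
  (1, 'USA', 'USD', 10), (1, 'USA', 'EUR', 12), (1, 'USA', 'YEN', 100),
  (2, 'USA', 'USD', 5),  (2, 'USA', 'EUR', 7),  (2, 'USA', 'YEN', 50);
""")

# Conditional aggregation: each CASE picks out one currency's value,
# and MAX collapses the group to a single pivoted row per user.
rows = conn.execute("""
SELECT UserID, Country,
       MAX(CASE Currency WHEN 'USD' THEN Value END) AS COST_USD,
       MAX(CASE Currency WHEN 'EUR' THEN Value END) AS COST_EUR,
       MAX(CASE Currency WHEN 'YEN' THEN Value END) AS COST_YEN
FROM YourTable
GROUP BY UserID, Country
ORDER BY UserID
""").fetchall()
# (1, 'USA', 10, 12, 100) and (2, 'USA', 5, 7, 50)
```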

Using self join to find duplicates in SQL

I know that there are other questions like this. However, my question is about why the query that I am using is not returning the optimal results. Below is the query. To give context, I have a single table that has 113 columns/fields. However, only 4 really matter: acct, year, qtr, cnty (county). This table is a list of employers by establishment. An employer can appear more than once, the same person owning 12 Starbucks being the best example. What I am looking for is a query that will show when acct values have different cnty values. The below query works without error, but it shows far too much: it also returns rows where the acct value is the same and the cnty value is the same as well. Any thoughts on why this query shows too much?
select distinct t1.acct, t1.year, t1.qtr, t1.cnty
from dbo.table t1 join dbo.table t2 on t1.acct=t2.acct
where (t1.cnty <> t2.cnty)
order by t1.acct, t1.year, t1.qtr, t1.cnty
Intended result
acct year qtr cnty
1234567777 2007 4 7
1234567777 2008 1 9
1234567890 2006 4 31
1234567890 2007 1 3
2345678901 2006 4 7
2345678901 2007 2 1
Is this what you want?
select distinct t.acct, t.year, t.qtr, t.cnty
from (select t.*, min(cnty) over (partition by acct) as min_cnty,
max(cnty) over (partition by acct) as max_cnty
from dbo.table t
) t
where min_cnty <> max_cnty;
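One way to sanity-check this idea: if the goal is accounts whose cnty differs across any period, as in the intended result, the window should be partitioned by acct alone. A sketch with sqlite3 and a made-up table name (emp):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (acct TEXT, year INTEGER, qtr INTEGER, cnty INTEGER);
INSERT INTO emp VALUES
  ('1234567777', 2007, 4, 7), ('1234567777', 2008, 1, 9),
  ('9999999999', 2007, 4, 5), ('9999999999', 2008, 1, 5);
""")

# Per account, compare the smallest and largest cnty ever recorded;
# accounts where they differ changed county at some point.
rows = conn.execute("""
SELECT DISTINCT t.acct, t.year, t.qtr, t.cnty
FROM (SELECT e.*,
             MIN(cnty) OVER (PARTITION BY acct) AS min_cnty,
             MAX(cnty) OVER (PARTITION BY acct) AS max_cnty
      FROM emp e) t
WHERE min_cnty <> max_cnty
ORDER BY t.acct, t.year, t.qtr
""").fetchall()
# only acct 1234567777 appears: its cnty changed between quarters
```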

SQL Grouping a Count Select With Aggregate Total

I've been working on this for far too many hours now and hit the wall. Hoping an SQL guru can help shed some light.
SELECT
CATEGORY.CategoryID, CATEGORY.Category_Name, CATEGORY_SUB.CategoryID AS Expr1,
CATEGORY_SUB.SubCategory_Name, COUNT(SELL_1.Item_SubCategory) AS Count,
(SELECT COUNT(Item_Category) AS Expr10
FROM SELL WHERE (UserName = 'me')
GROUP BY Item_Category) AS Expr20
FROM SELL AS SELL_1 LEFT OUTER JOIN
CATEGORY ON
SELL_1.Item_Category = CATEGORY.Category_Name
LEFT OUTER JOIN CATEGORY_SUB ON
CATEGORY.CategoryID = CATEGORY_SUB.CategoryID AND SELL_1.Item_SubCategory = CATEGORY_SUB.SubCategory_Name WHERE (SELL_1.Seller_UserName = 'me') AND (SELL_1.Item_Removed IS NULL) AND (SELL_1.Item_Pause IS NULL) AND (SELL_1.Item_Expires > GETDATE())
GROUP BY CATEGORY.Category_Name, CATEGORY_SUB.SubCategory_Name, CATEGORY.CategoryID, CATEGORY_SUB.CategoryID
ORDER BY Count DESC
In short, the table returned should show the following columns, where Expr20 is a running aggregate of the counts within each CategoryName. For example:
CategoryID CategoryName Expr1 SubCategory_Name Count Expr20
1 CatA 200 SubCatA1 1 1
1 CatA 201 SubCatA2 2 3
1 CatA 202 SubCatA3 4 7
2 CatB 301 SubCatB1 1 1
2 CatB 302 SubCatB2 4 5
3 CatC 401 SubCatC1 3 3
3 CatC 402 SubCatC2 2 5
3 CatC 403 SubCatC3 4 9
And So on.
My problem is no matter what I do I cannot seem to get Expr20 to work.
It seems the problem is that MS SQL wants an alias after the (SELECT COUNT(Item_Category) subquery, and then it throws an error because the subquery returns more than one row.
I'm running MS SQL 2005. Grateful for any help
Really struggled with this and in the end used a solution that is maybe more elegant but potentially more server-intensive... I'm not sure, as I'm no SQL expert... but wanted to post my solution.
SELECT T2.CategoryID, T1.Counted AS Expr20, etc...
FROM
(
SELECT Item_Category, COUNT(Item_Category) AS Counted
FROM SELL WHERE (UserName = 'me')
GROUP BY Item_Category
) T1
JOIN
(
SELECT CATEGORY.CategoryID, CATEGORY.Category_Name, CATEGORY_SUB.CategoryID AS Expr1, CATEGORY_SUB.SubCategory_Name, COUNT(SELL.Item_SubCategory) AS Count...etc as shown in the question
) T2
ON T1.Item_Category = T2.Category_Name
ORDER BY T1.Counted DESC
Worked a treat, and I got the table and results I needed, grouping the category names with the correct sum total per line.
So the trick was to wrap a select around the 2 selects rather than trying to join them directly, as that just doesn't seem possible.
Hope this helps someone and saves them the 13 hours of hair pulling I went through last night.
It is a bit hard to see what data you are starting with. But, assuming you have all columns except Expr20, you can use outer apply or a correlated subquery:
select t.*, t2.Expr20
from sell t outer apply
(select sum(count) as Expr20
from sell t2
where t2.CategoryId = t.CategoryId and
t2.expr1 <= t.expr1
) t2;
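SQLite has no OUTER APPLY, but the same correlated running-total idea can be sketched with a scalar subquery. The table and column names below are invented to mirror the answer's shape, pre-aggregated to one count per (CategoryID, Expr1) as in the question's example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sub_counts (CategoryID INTEGER, Expr1 INTEGER, Cnt INTEGER);
INSERT INTO sub_counts VALUES
  (1, 200, 1), (1, 201, 2), (1, 202, 4),
  (2, 301, 1), (2, 302, 4);
""")

# For each row, sum the counts of all rows in the same category with
# Expr1 <= this row's Expr1: a running total, like Expr20 above.
rows = conn.execute("""
SELECT t.CategoryID, t.Expr1, t.Cnt,
       (SELECT SUM(t2.Cnt)
        FROM sub_counts t2
        WHERE t2.CategoryID = t.CategoryID
          AND t2.Expr1 <= t.Expr1) AS Expr20
FROM sub_counts t
ORDER BY t.CategoryID, t.Expr1
""").fetchall()
# Expr20 runs 1, 3, 7 within category 1 and 1, 5 within category 2,
# matching the question's expected Expr20 column.
```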