Access SQL: Combining SELECT and USER INPUT

I'm creating a form in Access where users can select a commodity out of a list of possible commodities, and then a query calculates the average price of the selected commodity.
The input field for the user is a list (List147). Let's say the user selects Copper, then I want the average to be returned for Copper. The prices of all commodities are in a table called CommMaterial. The snip below shows what the table looks like.
I'm fairly new at SQL and am not sure how to code this. It appears as if the SELECT statement needs to be dynamic, but I don't know how to do this. I envision something like this:
SELECT AVG(CommMaterial.[Forms]![NameForm]![List147])
FROM CommMaterial;

To avoid dynamic SQL and VBA, you could use some SQL to get your table into a slightly more query-able format:
SELECT DateComm, 'Copper' as Metal, Copper as Price FROM CommMaterial
UNION ALL
SELECT DateComm, 'Nickel' as Metal, Nickel as Price FROM CommMaterial
UNION ALL
SELECT DateComm, 'Aluminum' as Metal, Aluminum as Price FROM CommMaterial;
Which will give you a result set with three columns:
DateComm | Metal | Price
You could save that as a query qry_CommMaterial and then your SQL would be:
SELECT Avg(Price) FROM qry_CommMaterial WHERE metal = [Forms]![NameForm]![List147];
You could also force it all into one big statement:
SELECT Avg(Price)
FROM (
SELECT DateComm, 'Copper' as Metal, Copper as Price FROM CommMaterial
UNION ALL
SELECT DateComm, 'Nickel' as Metal, Nickel as Price FROM CommMaterial
UNION ALL
SELECT DateComm, 'Aluminum' as Metal, Aluminum as Price FROM CommMaterial) as subUnion
WHERE metal = [Forms]![NameForm]![List147];
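The same unpivot-then-filter idea can be checked outside Access. Here is a sketch in Python with SQLite standing in for the Access table; the table and column names come from the question, but the sample prices are invented, and a bound parameter plays the role of `[Forms]![NameForm]![List147]`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CommMaterial (DateComm TEXT, Copper REAL, Nickel REAL, Aluminum REAL);
INSERT INTO CommMaterial VALUES
  ('2023-01-01', 8000, 25000, 2300),
  ('2023-02-01', 9000, 27000, 2500);
""")

# Unpivot the wide table into (DateComm, Metal, Price) rows, then average
# only the rows for the metal the user picked.
query = """
SELECT AVG(Price) FROM (
  SELECT DateComm, 'Copper'   AS Metal, Copper   AS Price FROM CommMaterial
  UNION ALL
  SELECT DateComm, 'Nickel'   AS Metal, Nickel   AS Price FROM CommMaterial
  UNION ALL
  SELECT DateComm, 'Aluminum' AS Metal, Aluminum AS Price FROM CommMaterial
) AS subUnion
WHERE Metal = ?
"""
avg_price = conn.execute(query, ("Copper",)).fetchone()[0]
print(avg_price)  # 8500.0
```

In Access the form reference is resolved for you; everywhere else, pass the user's choice as a parameter rather than splicing it into the SQL string.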

Related

Grouping columns to find most popular product for each country

I'm a SQL beginner, practicing through various sources. I have a table called marketing_data containing product sales information, country, and other variables. I'm trying to get an output showing the most popular product per country, based on sales, but I don't know where to begin with my syntax.
This is how the data looks in the table
I've previously run this code to see total sales for each product per country:
SELECT Country,
SUM(Liquids) AS TotalLiquids,
SUM(Veg) AS TotalVeg,
SUM(NonVeg) AS TotalNonVeg,
SUM(Fish) AS TotalFish,
SUM(Chocolates) AS TotalChocolates,
SUM(Commodities) AS TotalCommodities
FROM marketing_data
GROUP BY COUNTRY;
This gave me a useful table, but I'd simply like to see which product has the highest sales for each country. The output I'm trying to get would hopefully look something like this:
Country | Most popular product
--------|---------------------
Sp      | Liquids
IND     | NonVeg
In most DBMSs, you can use the query you've shown as a subquery, and then use GREATEST with CASE WHEN to produce the expected outcome. So you can do the following:
SELECT country,
CASE GREATEST(Liquids, Veg, NonVeg, Fish, Chocolates, Commodities)
WHEN Liquids THEN 'Liquids'
WHEN Veg THEN 'Veg'
WHEN NonVeg THEN 'NonVeg'
WHEN Fish THEN 'Fish'
WHEN Chocolates THEN 'Chocolates'
ELSE 'Commodities'
END AS MostPopularProduct
FROM
(SELECT Country,
SUM(Liquids) AS Liquids,
SUM(Veg) AS Veg,
SUM(NonVeg) AS NonVeg,
SUM(Fish) AS Fish,
SUM(Chocolates) AS Chocolates,
SUM(Commodities) AS Commodities
FROM marketing_data
GROUP BY COUNTRY) sub;
But watch out! If the values you want to sum can be NULL, you need to replace NULL with another value (very likely you want zero): SUM ignores NULLs row by row, but a group whose values are all NULL sums to NULL, and a single NULL argument makes GREATEST return NULL in most DBMSs.
That's a typical use case for COALESCE:
SELECT country,
CASE GREATEST(Liquids, Veg, NonVeg, Fish, Chocolates, Commodities)
WHEN Liquids THEN 'Liquids'
WHEN Veg THEN 'Veg'
WHEN NonVeg THEN 'NonVeg'
WHEN Fish THEN 'Fish'
WHEN Chocolates THEN 'Chocolates'
ELSE 'Commodities'
END AS MostPopularProduct
FROM
(SELECT Country,
SUM(COALESCE(Liquids,0)) AS Liquids,
SUM(COALESCE(Veg,0)) AS Veg,
SUM(COALESCE(NonVeg,0)) AS NonVeg,
SUM(COALESCE(Fish,0)) AS Fish,
SUM(COALESCE(Chocolates,0)) AS Chocolates,
SUM(COALESCE(Commodities,0)) AS Commodities
FROM marketing_data
GROUP BY COUNTRY) sub;
This query works on Oracle, MySQL, MariaDB, PostgreSQL, and SQL Server 2022 (which introduced GREATEST). If your DBMS is none of these, you can still use this concept, but you will likely have to replace GREATEST with a similar function that it provides.
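One DBMS that lacks GREATEST is SQLite, but its multi-argument scalar max() behaves the same way (including returning NULL when any argument is NULL), so the pattern above can be tried out like this. The sample numbers are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marketing_data (Country TEXT, Liquids INT, Veg INT, NonVeg INT,
                             Fish INT, Chocolates INT, Commodities INT);
INSERT INTO marketing_data VALUES
  ('SP', 100, 20, 30, 10, 5, 15),
  ('SP', 150, 40, 60, 20, 5, 25),
  ('IND', 10, 50, 200, 30, 5, 15);
""")

# SQLite's scalar max(a, b, ...) plays the role of GREATEST here.
rows = conn.execute("""
SELECT Country,
       CASE max(Liquids, Veg, NonVeg, Fish, Chocolates, Commodities)
         WHEN Liquids    THEN 'Liquids'
         WHEN Veg        THEN 'Veg'
         WHEN NonVeg     THEN 'NonVeg'
         WHEN Fish       THEN 'Fish'
         WHEN Chocolates THEN 'Chocolates'
         ELSE 'Commodities'
       END AS MostPopularProduct
FROM (SELECT Country,
             SUM(COALESCE(Liquids,0))     AS Liquids,
             SUM(COALESCE(Veg,0))         AS Veg,
             SUM(COALESCE(NonVeg,0))      AS NonVeg,
             SUM(COALESCE(Fish,0))        AS Fish,
             SUM(COALESCE(Chocolates,0))  AS Chocolates,
             SUM(COALESCE(Commodities,0)) AS Commodities
      FROM marketing_data
      GROUP BY Country) sub
ORDER BY Country
""").fetchall()
print(rows)  # [('IND', 'NonVeg'), ('SP', 'Liquids')]
```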

match tables with intermediate mapping table (fuzzy joins with similar strings)

I'm using BigQuery.
I have two simple tables with "bad" data quality from our systems. One represents revenue and the other production rows for bus journeys.
I need to match every journey to a revenue transaction but I only have a set of fields and no key and I don't really know how to do this matching.
This is a sample of the data:
Revenue
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, London, Manchester, Qwerty
Journeys
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, Kings Cross, Piccadilly Gardens, Qwer
2020, 123123, Kings Cross, Victoria Station, Qwert
2020, 123123, London, Manchester, Qwerty
Every station has a maximum of 9 alternative names and these are stored in a "station" table.
Stations
Station Name, Station Name 2, Station Name 3,...
London, Kings Cross, Euston,...
Manchester, Piccadilly Gardens, Victoria Station,...
I would like to try matching or joining the tables first with the original fields. This will generate some matches, but many journeys will be left unmatched. For the unmatched revenue rows, I would then like to change the product name (shorten it to two letters, which could produce many matches from the production table) and then the station names, first station_origin and then station_destination. When using a shorter product name I could get many matches, but in that case I want the row from the production table with the most common product.
Something like this:
1. Do a direct match. That is, I can use the fields as they are in the tables.
2. Do a match where the revenue.product is changed by shortening it to two letters. substr(product,0,2)
3. Change rev.station_origin to the first alternative, Station Name 2, and then try a join. The product and the other station are not changed.
4. Change rev.station_origin to the first alternative, Station Name 2, and then try a join. The product is shortened with substr(product,0,2) as above, but rev.station_destination is not changed.
5. Change rev.station_destination to the first alternative, Station Name 2, and then try a join. The product and the other station are not changed.
I was told that maybe I should create an intermediate table with all combinations of stations and products and let a rank column decide the order. The station names in the stations table are in order of importance, so "station name" is more important than "station name 2" and so on.
I started to do a query with a subquery per rank and do a UNION ALL but there are so many combinations that there must be another way to do this.
Don't know if this makes any sense but I would appreciate any help or ideas to do this in a better way.
Cheers,
Cris
To implement a complex joining strategy with approximate matching, it might make more sense to define the strategy within JavaScript - and call the function from a BigQuery SQL query.
For example, the following query does the following steps:
Take the top 200 male names in the US.
Find if one of the top 200 female names matches.
If not, look for the most similar female name within the options.
Note that the logic to choose the closest option is encapsulated within the JS UDF fhoffa.x.fuzzy_extract_one(). See https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83 to learn more about this.
WITH data AS (
SELECT name, gender, SUM(number) c
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY 1,2
), top_men AS (
SELECT * FROM data WHERE gender='M'
ORDER BY c DESC LIMIT 200
), top_women AS (
SELECT * FROM data WHERE gender='F'
ORDER BY c DESC LIMIT 200
)
SELECT name male_name,
COALESCE(
(SELECT name FROM top_women WHERE name=a.name)
, fhoffa.x.fuzzy_extract_one(name, ARRAY(SELECT name FROM top_women))
) female_version
FROM top_men a
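The exact-then-fuzzy fallback that the query builds with COALESCE and fuzzy_extract_one can be approximated in plain Python with the standard library's difflib; this is a sketch with invented station names, not the BigQuery UDF itself, and the cutoff of 0.4 is an arbitrary choice:

```python
import difflib

def fuzzy_match_one(value, options):
    """Return an exact match if there is one, else the closest option (or None)."""
    if value in options:
        return value
    close = difflib.get_close_matches(value, options, n=1, cutoff=0.4)
    return close[0] if close else None

canonical = ["London", "Manchester", "Birmingham"]
print(fuzzy_match_one("London", canonical))     # 'London'     (exact match)
print(fuzzy_match_one("Manchestr", canonical))  # 'Manchester' (closest option)
```

The same shape applies to the bus-journey problem: try the exact station name first, and only fall back to the closest alternative when nothing matches.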

BigQuery: grouping by similar strings for a large dataset

I have a table of invoice data with over 100k unique invoices and several thousand unique company names associated with them.
I'm trying to group these company names into more general groups to understand how many invoices they're responsible for, how often they receive them, etc.
Currently, I'm using the following code to identify unique company names:
SELECT DISTINCT(company_name)
FROM invoice_data
ORDER BY company_name
The problem is that this only gives me exact matches, when it's obvious that many string values in company_name are similar. For example: McDonalds Paddington, McDonlads Oxford Square, McDonalds Peckham, etc.
How can I make my GROUP BY statement more general?
Sometimes the issue isn't as simple as the example listed above; occasionally there is simply an extra space or a PTY/LTD suffix which throws off a GROUP BY match.
EDIT
To give an example of what I'm looking for, I'd want to take the following:
company_name
----------------------
Jim's Pizza Paddington
Jim's Pizza Oxford
McDonald's Peckham
McDonald's Victoria
----------------------
and group by company name rather than exclusively by an exact string match.
Have you tried using the Soundex function?
SELECT
  SOUNDEX(name) AS code,
  MAX(name) AS sample_name,
  COUNT(name) AS records
FROM ((
    SELECT "Jim's Pizza Paddington" AS name)
  UNION ALL (
    SELECT "Jim's Pizza Oxford" AS name)
  UNION ALL (
    SELECT "McDonald's Peckham" AS name)
  UNION ALL (
    SELECT "McDonald's Victoria" AS name))
GROUP BY 1
ORDER BY records DESC
You can then use the soundex code to create groupings, with a split or other string function to pull out the part of the string that names the group, or use a window function to pull back one occurrence as the representative name. It's not perfect, but it means you do not need to pull the data into other tools with advanced language recognition.
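To see why Soundex groups these rows, it helps to run the algorithm by hand. Python's standard library has no Soundex, but the classic algorithm is small enough to sketch; this hand-rolled version follows the usual rules (first letter kept, consonants coded 1-6, H/W not breaking a repeat run) and may differ from BigQuery's SOUNDEX in edge cases:

```python
def soundex(name: str) -> str:
    """Classic Soundex: first letter plus up to three digits, zero-padded."""
    codes = {}
    for group, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in group:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits, prev = [], codes.get(first, "")
    for ch in letters[1:]:
        d = codes.get(ch, "")
        if d and d != prev:          # skip repeats of the same code
            digits.append(d)
        if ch not in "HW":           # H and W do not break a repeat run
            prev = d
    return (first + "".join(digits))[:4].ljust(4, "0")

names = ["Jim's Pizza Paddington", "Jim's Pizza Oxford",
         "McDonald's Peckham", "McDonald's Victoria"]
for n in names:
    print(n, "->", soundex(n))
# The two Jim's rows share one code and the two McDonald's rows share another,
# because only the first four significant letters contribute.
```

That last point is also the main caveat: long strings collapse to four characters, so very different companies can collide once their first few consonants agree.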

Query to order data while maintaining grouping?

I have a request which I can accomplish in code, but I'm wondering if it is possible to do in SQL alone. I have a products table that has a Category column and a Price column. What I want is all of the products grouped together by Category, with the categories ordered by their cheapest item and the products within each category ordered cheapest to most expensive. So for example:
Category | Price
--------------|---------------------
Basin | 500
Basin | 700
Basin | 750
Accessories | 550
Accessories | 700
Accessories | 1000
Bath | 700
As you can see the cheapest item is a basin for 500, then an Accessory for 550 then a bath for 700. So I need the categories of products to be sorted by their cheapest item, and then each category itself in turn to be sorted cheapest to most expensive.
I have tried partitioning and grouping sets (which I know nothing about) but still had no luck, so I eventually resorted to my strength (C#), but I would prefer to do it straight in SQL if possible. One last side note: this query is hit quite often, so performance is key; if possible I would like to avoid temp tables, cursors, etc.
I think using MIN() with a window (OVER) makes it clearest what the intent is:
declare @t table (Category varchar(19) not null, Price int not null)
insert into @t (Category,Price) values
('Basin',500),
('Basin',700),
('Basin',750),
('Accessories',550),
('Accessories',700),
('Accessories',1000),
('Bath',700)
;With FindLowest as (
select *,
MIN(Price) OVER (PARTITION BY Category) as Lowest
from
@t
)
select * from FindLowest
order by Lowest,Category,Price
If two categories share the same lowest price, this will still keep the two categories separate and sort them alphabetically.
Select...
Order by category, price desc
SELECT p.category,p.price
FROM products p,(select category,min(price) mn from products group by category order by mn) tab1
WHERE p.category=tab1.category
GROUP BY p.category,p.price,tab1.mn
order by tab1.mn,p.category;
Is this what you want?
I think you do not need a GROUP BY clause in your query. If I understood your goal correctly, you can order by the minimum price per category, computed in a correlated subquery inside the ORDER BY clause. That keeps each category's rows together, i.e. not Basin - 500; Accessories - 550, but everything for Basin first. After that, you sort by Price within each category.
SELECT *
FROM products p
ORDER BY
(SELECT MIN(Price) FROM products p2 WHERE p2.Category=p.Category
),
Price;
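The window-function answer above can be checked quickly in SQLite (3.25+ supports OVER), using the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (Category TEXT NOT NULL, Price INT NOT NULL);
INSERT INTO products VALUES
  ('Basin', 500), ('Basin', 700), ('Basin', 750),
  ('Accessories', 550), ('Accessories', 700), ('Accessories', 1000),
  ('Bath', 700);
""")

# Sort categories by their cheapest item, then items by price within each.
rows = conn.execute("""
WITH FindLowest AS (
  SELECT Category, Price,
         MIN(Price) OVER (PARTITION BY Category) AS Lowest
  FROM products
)
SELECT Category, Price FROM FindLowest
ORDER BY Lowest, Category, Price
""").fetchall()
print(rows)
# [('Basin', 500), ('Basin', 700), ('Basin', 750),
#  ('Accessories', 550), ('Accessories', 700), ('Accessories', 1000),
#  ('Bath', 700)]
```

Basin leads because its cheapest item (500) beats Accessories' (550) and Bath's (700), exactly the ordering the question asked for.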

SQL Select: picking the right distinct record based on another field

How does one filter a list of records to remove those that have some identical fields, based on selecting the one with the minimum value in another field? Note that it's not sufficient to just get the minimum value... I need to have other fields from the same record.
I have a table of "products", and I am trying to add the ability to apply a coupon code. Because of how the invoices are generated, selling a product at a different cost is considered a different product. In the database you might see this:
Product ID, Product Cost, Product Name, Coupon Code
1, 20, Product1, null
2, 10, Product1, COUPON1
3, 40, Product2, null
I have a query that selects a list of all products available now (based on other criteria; I'm simplifying this a lot). The problem is that, for the above case, my query returns:
1 - Product1 for $20
2 - Product1 for $10
3 - Product2 for $40
This gets shown to the customer (assuming they've entered the coupon code), and it's obviously bad form to show a customer the same product for two prices. What I want is:
2 - Product1 for $10
3 - Product2 for $40
i.e., showing the lowest-costing version of each product.
I need a solution that will work for MySQL, but the preferred solution would be standard SQL.
Try this:
SELECT T2.*
FROM
(
SELECT `Product Name` AS name, MIN(`Product Cost`) AS cost
FROM products
GROUP BY `Product Name`
) T1
JOIN products T2
ON T1.name = T2.`Product Name`
AND T1.cost = T2.`Product Cost`
To get the output exactly as you described as a string replace the first line with:
SELECT CONCAT(`Product ID`, ' - ', T1.name, ' for $', T1.cost)
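Although the question asks for MySQL, the groupwise-minimum join runs essentially unchanged in SQLite, which is handy for testing; the only change below is swapping MySQL's backtick quoting for standard double quotes (data taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products ("Product ID" INT, "Product Cost" INT,
                       "Product Name" TEXT, "Coupon Code" TEXT);
INSERT INTO products VALUES
  (1, 20, 'Product1', NULL),
  (2, 10, 'Product1', 'COUPON1'),
  (3, 40, 'Product2', NULL);
""")

# Join each product row back to its per-name minimum cost, keeping only
# the cheapest version of every product.
rows = conn.execute("""
SELECT T2."Product ID", T2."Product Name", T2."Product Cost"
FROM (
  SELECT "Product Name" AS name, MIN("Product Cost") AS cost
  FROM products
  GROUP BY "Product Name"
) T1
JOIN products T2
  ON T1.name = T2."Product Name" AND T1.cost = T2."Product Cost"
ORDER BY T2."Product ID"
""").fetchall()
print(rows)  # [(2, 'Product1', 10), (3, 'Product2', 40)]
```

One caveat worth knowing: if two rows of the same product share the minimum cost, this join returns both of them, so a tie-breaker (e.g. lowest Product ID) may be needed in practice.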