grouping with right function sql

grouping with right function sql - sql

I have a query on grouping which I need to do a quick fix on. I am at present grouping column A and counting the value in column B.
select
Column A,
Count ([Column B])
from table1
Group by Column A
The issue is that column A has some entries which are not standard for example.
ABC 100
ABC~ 3
BCA 120
BCA* 4
I need to blast the data to fix long term, but there are 3m rows, so not a quick job, as I need to create a mapping file to deal with the problem.
I currently get returned duplicate entries which is right in theory, but in practice I would like to group the ABC, by either trimming the column to only 3 characters or doing a right. However I have tried it in the select statement and it just removes the ~ or * entry and sums the standard ABC or BCA.

Have you tried something ???
select LEFT([Column A], 3),
Count ([Column B])
from table1
Group by LEFT([Column A], 3)

Related

How to take average of two columns row by row in SQL?

I have a table match which looks like this (please see attached image). I wanted to retrieve a dataset that had a column of average values for home_goal and away_goal using this code
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
AVG(m.home_goal + m.away_goal) AS avg_goal
FROM match AS m;
However, I got this error
column "m.country_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: m.country_id,
My question is: why was GROUP BY clause required? Why couldn't SQL know how to take average of two columns row by row?
Thank you.

try this:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal)/2 AS avg_goal
FROM match AS m;
You have been asked for the group_by as avg() much like sum() work on multiple values of one column where you classify all columns that are not a columns wise operation in the group by
You are looking to average two distinct columns - it is a row-wise operations instead of column-wise

how to take average of two columns row by row?
You don't use AVG() for this; it is an aggregate function, that operates over a set of rows. Here, it seems like you just want a simple math computation:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) / 2.0 AS avg_goal
FROM match AS m;
Note the decimal denominator (2.0): this avoids integer division in databases that implement it.

Avg in the context of the function mentioned above is calculating the average of the values of the columns and not the average of the two values in the same row. It is an aggregate function and that’s why the group by clause is required.
In order to take the average of two columns in the same row you need to divide by 2.

Let's consider the following table:
CREATE TABLE Numbers([x] int, [y] int, [category] nvarchar(10));
INSERT INTO Numbers ([x], [y], [category])
VALUES
(1, 11, 'odd'),
(2, 22, 'even'),
(3, 33, 'odd'),
(4, 44, 'even');
Here is an example of using two aggregate functions - AVG and SUM - with GROUP BY:
SELECT
Category,
AVG(x) as avg_x,
AVG(x+y) as avg_xy,
SUM(x) as sum_x,
SUM(x+y) as sum_xy
FROM Numbers
GROUP BY Category
The result has two rows:
Category avg_x avg_xy sum_x sum_xy
even 3 36 6 72
odd 2 24 4 48
Please note that Category is available in the SELECT part because the results are GROUP BY'ed by it. If a GROUP BY is not specified then the result would be 1 row and Category is not available (which value should be displayed if we have sums and averages for multiple rows with different caetories?).
What you want is to compute a new column and for this you don't use aggregate functions:
SELECT
(x+y)/2 as avg_xy,
(x+y) as sum_xy
FROM Numbers
This returns all rows:
avg_xy sum_xy
6 12
12 24
18 36
24 48
If your columns are integers don't forget to handle rounding, if needed. For example (CAST(x AS DECIMAL)+y)/2 as avg_xy,

The simple arithmetic calculation:
(m.home_goal + m.away_goal) / 2.0
is not exactly equivalent to AVG(), because NULL values mess it up. Databases that support lateral joins provide a pretty easy (and efficient) way to actually use AVG() within a row.
The safe version looks like:
(coalesce(m.home_goal, 0) + coalesce(m.away_goal, 0)) /
nullif( (case when m.home_goal is not null then 1 else 0 end +
case when m.away_goal is not null then 1 else 0 end
), 0
)
Some databases have syntax extensions that allow the expression to be simplified.

Getting rows for values greater by 3 numbers or letters

Am trying to come up with a query where I can return back values where the the distance between the letters could be one or more than one for the chosen letter.
For example:
I have two columns which have letters in Column A and in Column B. I want to return back with rows when column B distance is more than Column A by one or more letters.

It's not clear to me, when you say "greater" if you mean that the distance between any two letters is 2 or 3 (Column B can be alphabetically before or after Column A, by a distance of 2 or 3).. Or if Column B has to be alphabetically after Column A, by a distance of 2 or 3
Because I'm not certain what you're talking about, I present two options. Read the "if" rule and choose the one that applies to your situation, then use the query under it:
If columnA is D and columnB can be any of: A B F G
SELECT * FROM table WHERE ABS(ASCII(columna) - ASCII(columnb)) IN (2,3)
If columnA is D and columnB can be any of: F G
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) IN (2,3)
Edit1: Per your later comment, you are now saying that the distance is not just 2 or 3 letters (the first line of your question states "2 or 3") but any number of letters distance equal to or greater than 2:
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) >= 2
Overall the technique isn't much different to the above queries and there are many ways to specify what you want:
SELECT * FROM table
WHERE
ASCII(columnb) - ASCII(columna)
BETWEEN <some_number_here> AND <other_number_here>
Ultimately the most important thing is to note the use of ASCII function, which gives us the ascii char code of the first letter in a string:
ASCII('ABCD') => 65
And we can use maths on this to work out if a letter distance from 'A' is more than 1 etc..
Probably also worth noting that ASCII() works on single byte ascii characters. If your data is multibyte (Unicode), you might need to use ORD() instead:
Edit2: Your latest edit to the question revises the limit to "B greater than A by one or more" which is equivalent to >= 1 ..
The question seems not to have a clear spec, please treat the answer as a guide for the general technique:
--for an open ended distance, ascii chars
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) >= <some_distance>
--for an open ended distance, unicode
SELECT * FROM table
WHERE ORD(columnb) - ORD(columna) >= <some_distance>
--for a definite range of distances (replace … appropriately)
SELECT * FROM table
WHERE ... BETWEEN <some_distance> AND <some_other_distance>

this will work indeed:
select * from table_name where ascii(col_1)+2=ascii(col_2);

You can use something like this if you need it to be exactly 2 or 3 letters greater
select Column A, ColumnB from table name where ASCII(ColumnB) - ASCII(ColumnA) in (2,3)
If you want all those rows where the the difference is equal more than 2, then use this
select Column A, ColumnB from table name where ASCII(ColumnB) - ASCII(ColumnA) >=2

this is where you can make ascii in action..
select * from SampleTable where (ASCII(sampleTable.ColumnB) - ASCII(ColumnA)) >= 2;

How to show blank instead of column value for all duplicated columns of a SQL query?

There is a similar question which answer this for a known number of columns and only a single selection column. But the problem here is that
I have no knowledge of columns (count, type) of a specified SQL query and also I want to blank for all columns not a single column.
For example lets say I have following query.
Select * from View1
Result :
Column(1) Column(2) Column(..) Column(N)
1 A Sales 1500
2 C Sales 2500
3 C Sales 2500
4 A Development 2500
Expected result :
Column(1) Column(2) Column(..) Column(N)
1 A Sales 1500
2 C 2500
3
4 A Development
Pseudo SQL Query :
EXEC proc_blank_query_result 'Select * from View1'

If you're in SQL Server 2012 or newer, you can do this with lag, something like this:
select
nullif(column1, lag(column1) over (order by yourorderbyclause)) as column1,
nullif(column2, lag(column2) over (order by yourorderbyclause)) as column2,
...
from
View1
To make it dynamic, well then you have to parse a lot of metadata from the query. Using sp_describe_first_result_set might be a good idea, or use select into a temp. table and parse the columns of it.

Average Distinct Values in a single column in Power Pivot

I have a column in PowerPivot that basically goes:
1
1
2
3
4
3
5
4
If I =AVERAGE([Column]), it's going to average all 8 values in the sample column. I just need the average of the distinct values (i.e., in the example above I want the average of (1,2,3,4,5).
Any thoughts on how to go about doing this? I tried a combination of =(DISTINCT(AVERAGE)) but it gives a formula error.
Thanks!!
Kevin

There must be a cleaner way of doing this but here is one method which uses a measure to get the sum of the values divided by the number of times it appears (to basically give the original value) then uses an iterative function to do it for each unique value.
Apologies for the uninspired measure names:
[m1] = SUM(table1[theValue]) / COUNTROWS(Table1)
[m2] = AVERAGEX(VALUES(Tables1[theValue]), [m1])
Assuming your table is caled table1 and the column is called theValue

SQL Server SQL Select: How do I select rows where sum of a column is within a specified multiple?

I have a process that needs to select rows from a Table (queued items) each row has a quantity column and I need to select rows where the quantities add to a specific multiple. The mulitple is the order of between around 4, 8, 10 (but could in theory be any multiple. (odd or even)
Any suggestions on how to select rows where the sum of a field is of a specified multiple?

My first thought would be to use some kind of MOD function which I believe in SQL server is the % sign. So the criteria would be something like this
WHERE MyField % 4 = 0 OR MyField % 8 = 0
It might not be that fast so another way might be to make a temp table containing say 100 values of the X times table (where X is the multiple you are looking for) and join on that

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

grouping with right function sql - sql

Have you tried something ??? select LEFT([Column A], 3), Count ([Column B]) from table1 Group by LEFT([Column A], 3)

Related

How to take average of two columns row by row in SQL?

Getting rows for values greater by 3 numbers or letters

How to show blank instead of column value for all duplicated columns of a SQL query?

Average Distinct Values in a single column in Power Pivot

SQL Server SQL Select: How do I select rows where sum of a column is within a specified multiple?

Categories

Resources