create and calculate z score column in SQL (mariadb) - sql

As the title says, looking for a simple way to calculate the z score for each row in a column.
I tried the following, and don't know why it doesn't work (returns error ER_INVALID_GROUP_FUNC_USE)
ALTER TABLE tbl
ADD z_score INT;
UPDATE tbl
SET z_score = (values - avg(values))/ stddev(values)
WHERE values IS NOT NULL;
I've seen ways of doing it online, but they feel more complicated than is necessary.

In order to calculate the z-score, you need the average and standard deviation. You don't specify the database you are using, but one method that should work in (almost) all databases is a subquery:
UPDATE tbl
SET z_score = (SELECT (tbl.values - avg(tbl2.values)) / stddev(tbl2.values)
FROM tbl tbl2
)
WHERE values IS NOT NULL;
Most databases also support some sort of JOIN operation in the UPDATE, but the syntax depends on the database.
EDIT:
MariaDB does not support the above Standard SQL syntax. Instead, you can use a CROSS JOIN:
UPDATE tbl CROSS JOIN
(SELECT AVG(values) as avg_values, STDDEV(values) as stddev_values
FROM tbl
) t2
SET tbl.z_score = (tbl.values - avg_values) / NULLIF(stddev_values, 0)
WHERE values IS NOT NULL AND stddev_values <> 0;
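The key idea is to compute the aggregates first and only then combine them with each row. Here is a minimal sketch using Python's standard `sqlite3` module, under several assumptions:

- SQLite has no built-in STDDEV, so a population standard deviation aggregate is registered from Python. MariaDB's STDDEV is also the population version.
- The column is renamed to the hypothetical `val`, because VALUES is a keyword.
- z_score is REAL rather than INT, since an INT column would lose the fractional part.
- The aggregates go in scalar subqueries (which SQLite accepts) instead of MariaDB's CROSS JOIN.

```python
import math
import sqlite3


class PopStdDev:
    """Population standard deviation, registered as an SQLite aggregate."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def step(self, value):
        if value is not None:  # skip NULLs, as SQL aggregates do
            self.n += 1
            self.total += value
            self.total_sq += value * value

    def finalize(self):
        if self.n == 0:
            return None
        mean = self.total / self.n
        # max() guards against a tiny negative variance from rounding
        return math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))


def add_z_scores(conn):
    conn.create_aggregate("stddev", 1, PopStdDev)
    conn.execute("ALTER TABLE tbl ADD COLUMN z_score REAL")  # REAL, not INT
    # Each aggregate lives in its own scalar subquery, so the SET expression
    # only combines a per-row value with two already-computed constants.
    # NULLIF leaves z_score NULL when all values are equal (stddev = 0).
    conn.execute(
        """
        UPDATE tbl
        SET z_score = (val - (SELECT AVG(val) FROM tbl))
                      / NULLIF((SELECT stddev(val) FROM tbl), 0)
        WHERE val IS NOT NULL
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, val REAL)")
conn.executemany(
    "INSERT INTO tbl (val) VALUES (?)",
    [(v,) for v in [2, 4, 4, 4, 5, 5, 7, 9, None]],
)
add_z_scores(conn)
print(conn.execute("SELECT val, z_score FROM tbl ORDER BY id").fetchall())
```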

Related

SELECT query to return a row from a table with all values set to Null

I need a query that returns a row in which every field is empty. Gordon Linoff gave me the clue for this here:
SQL Empty query results
which is:
select t.*
from (select 1 as val
) v left outer join
table t
on 1 = 0;
This query works perfectly on PostgreSQL, but executing it in Microsoft Access gives an error saying that the expression 1 = 0 is not allowed. How can it be fixed to work in Microsoft Access?
Regards,
If the table has a numeric primary key column whose values are all positive, the following query will work in Access. (With an ID of 0, the negated value would still be 0 and match a real row.) The primary key field is [ID].
SELECT t2.*
FROM
myTable AS t2
RIGHT JOIN
(
SELECT TOP 1 (ID * -1) AS badID
FROM myTable AS t1
) AS rowStubs
ON t2.ID = rowStubs.badID
This was tested with Access 2010.
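You can try the stub-row trick outside Access with Python's `sqlite3` module. This is a sketch under a few assumptions:

- `myTable` and its columns are hypothetical.
- `TOP 1` becomes SQLite's `LIMIT 1`.
- The RIGHT JOIN is written as the equivalent LEFT JOIN with the operands swapped, because older SQLite versions lack RIGHT JOIN.

The query returns no rows at all when the table is empty, because the stub subquery then has nothing to return.

```python
import sqlite3


def blank_row(conn):
    # rowStubs yields one row whose badID (the negated ID) cannot equal any
    # positive ID, so the outer join fills every t2 column with NULL.
    return conn.execute(
        """
        SELECT t2.*
        FROM (SELECT ID * -1 AS badID FROM myTable LIMIT 1) AS rowStubs
        LEFT JOIN myTable AS t2 ON t2.ID = rowStubs.badID
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, "Jan"), (2, "Feb")])
print(blank_row(conn))
```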
I am offering this answer here, even though you didn't think it worked in my edit to your original question. What is the problem?
select t.*
from (select max(col) as maxval from table as t
) as v left join
table as t
on v.maxval < t.col;
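This variant avoids the constant condition by comparing against MAX(col). It also works on an empty table, because an aggregate query without GROUP BY always returns exactly one row. The sketch below runs it in SQLite via Python's `sqlite3`. `tbl` is a hypothetical name standing in for the placeholder `table`.

```python
import sqlite3


def blank_row_via_max(conn):
    # v always has exactly one row (MAX over an empty table is NULL), and no
    # row of t can satisfy col > MAX(col), so all of t's columns are NULL.
    return conn.execute(
        """
        SELECT t.*
        FROM (SELECT MAX(col) AS maxval FROM tbl) AS v
        LEFT JOIN tbl AS t ON v.maxval < t.col
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col INTEGER, label TEXT)")
print(blank_row_via_max(conn))  # empty table
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, "x"), (3, "y")])
print(blank_row_via_max(conn))  # populated table
```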
You can use the following query, but it would still need a little "manual coding".
EDITS:
Actually, you do not need the SWITCH function. Modified query below.
I removed the reference to the Description column from one line. You still need to use a Text column name (such as Description) in the last line of the query.
For example, the following query would work for the Months table:
select Months.*
from Months
RIGHT OUTER JOIN
(select top 1 "" as DummyColumn from Months) Blank_Data
ON Months.Description = Blank_Data.DummyColumn; --hardcoded Description column

SQL IN() operator with condition inside

I've got a table with a few numbers in it (it may also be empty): #states table (value int)
And I need to SELECT from another table with a WHERE clause on a specific column.
This column's values must match one of the #states numbers, or, if #states is empty, all values must be accepted (as if there were no WHERE condition on this column).
So I tried something like this:
select *
from dbo.tbl_docs docs
where
docs.doc_state in(iif(exists(select 1 from #states), (select value from #states), docs.doc_state))
Unfortunately, iif() can't return the result set of a subquery. I tried different variations with iif() and CASE, but none of them worked. How can I write this condition?
select *
from dbo.tbl_docs docs
where
(
(select count(*) from #states) > 0
AND
docs.doc_state in(select value from #states)
)
OR
(
(select count(*) from #states)=0
AND 1=1
)
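You can exercise both branches of the count-based condition with Python's `sqlite3` module. This is a sketch under a few assumptions:

- SQLite has no `#` temp-table prefix, so `#states` becomes a temporary table named `states`.
- `tbl_docs` gets a hypothetical `doc_id` key.
- The redundant `AND 1=1` is dropped.

```python
import sqlite3

COUNT_QUERY = """
SELECT docs.doc_id
FROM tbl_docs AS docs
WHERE ((SELECT COUNT(*) FROM states) > 0
       AND docs.doc_state IN (SELECT value FROM states))
   OR (SELECT COUNT(*) FROM states) = 0
ORDER BY docs.doc_id
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_docs (doc_id INTEGER PRIMARY KEY, doc_state INTEGER)")
    conn.executemany("INSERT INTO tbl_docs VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    conn.execute("CREATE TEMP TABLE states (value INTEGER)")
    return conn


def doc_ids(conn, query):
    return [row[0] for row in conn.execute(query)]


conn = make_db()
print(doc_ids(conn, COUNT_QUERY))  # states empty: no filtering
conn.executemany("INSERT INTO states VALUES (?)", [(10,), (30,)])
print(doc_ids(conn, COUNT_QUERY))  # only documents in states 10 and 30
```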
Wouldn't a left join do?
declare @statesCount int;
select @statesCount = count(1) from #states;
select
docs.*
from dbo.tbl_docs docs
left join #states s on docs.doc_state = s.value
where s.value is not null or @statesCount = 0;
In general, whenever your query contains sub-queries, you should stop for five minutes, and think hard about whether you really need a sub-query at all.
And if your server supports it, in many cases it is better to preprocess the input parameters first, or to use constructs such as MS SQL's WITH (common table expressions).
select *
from dbo.tbl_docs docs
where exists (select 1 from #states where value = doc_state)
or not exists (select 1 from #states)
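Here is the same check for the EXISTS / NOT EXISTS form, again as a SQLite sketch. `#states` is renamed to a temporary table `states`, and `tbl_docs` has a hypothetical `doc_id` key. EXISTS only asks whether a match is present, so duplicate values in `states` do not repeat a document, as they would with the LEFT JOIN variant.

```python
import sqlite3

EXISTS_QUERY = """
SELECT docs.doc_id
FROM tbl_docs AS docs
WHERE EXISTS (SELECT 1 FROM states WHERE value = docs.doc_state)
   OR NOT EXISTS (SELECT 1 FROM states)
ORDER BY docs.doc_id
"""


def make_db(states):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_docs (doc_id INTEGER PRIMARY KEY, doc_state INTEGER)")
    conn.executemany("INSERT INTO tbl_docs VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    conn.execute("CREATE TEMP TABLE states (value INTEGER)")
    conn.executemany("INSERT INTO states VALUES (?)", [(s,) for s in states])
    return conn


def doc_ids(conn):
    return [row[0] for row in conn.execute(EXISTS_QUERY)]


print(doc_ids(make_db([])))            # no states: every document
print(doc_ids(make_db([10, 10, 30])))  # duplicate states do not repeat docs
```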

SQL Update with select MIN from specific group in one table

I'm really new to SQL. I searched for an answer that matched my requirements and that I could also understand, but obviously I failed. So here goes:
I am making a program that keeps data for a marathon tournament. So I have a table StageResults with columns: StageNo, ParticipantNumber, ParticipantGroup, Time (the time for the distance in whole seconds, hence an int) and Points
An example would look like:
- 1|01|M21|500|X
- 1|22|M21|550|X
- 1|45|M21|530|X
- 1|47|F09|600|X
- 1|09|F09|630|X
- 2|01|M21|515|X
- 2|45|M21|520|X
So I want the fastest member of each group in each stage to get 1000 points. In the back of my head I feel that I could write a single query for this, but I have tried for several hours.
Best that I have right now is this:
SELECT c1.ParticipantNumber, c1.ParticipantGroup, c1.Time
FROM StageResults AS c1
LEFT JOIN StageResults AS c2
ON c1.StageNo = c2.StageNo
AND c1.ParticipantGroup = c2.ParticipantGroup
AND c1.Time < c2.Time;
I used this in an INSERT statement. There were no syntax errors, but I got an error saying that I am trying to insert a duplicate primary key. I think this can be solved by adding a GROUP BY clause, but I haven't really tested that.
I would ultimately like to set 1000 points for the fastest participant in each run (stage) and each group, and it should happen automatically. Then, based on the fastest participant, I want to calculate points for all the others. (But that's for later; if I figure out how to add these 1000 points, I think I'll manage.)
So I have to put this logic inside an UPDATE statement, and I haven't been able to. I'm just lost.
Any advice is welcome. Maybe I'm approaching this in completely the wrong direction.
Any help will be much appreciated.
The query that identifies the rows might look like this:
select t.*
from StageResults t
where not exists (select 1
from StageResults t2
where t2.ParticipantGroup = t.ParticipantGroup and
t2.StageNo = t.StageNo and
t2.time < t.time
);
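Running the NOT EXISTS query against the sample rows from the question shows that it picks one row per stage and group. This sketch uses Python's `sqlite3`. The table name StageResults comes from the question. Two details are assumptions: ParticipantNumber is stored as TEXT (to keep the leading zeros), and the primary key is (StageNo, ParticipantNumber). If two participants tie for the fastest time, both rows qualify.

```python
import sqlite3

SAMPLE = [
    (1, "01", "M21", 500), (1, "22", "M21", 550), (1, "45", "M21", 530),
    (1, "47", "F09", 600), (1, "09", "F09", 630),
    (2, "01", "M21", 515), (2, "45", "M21", 520),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StageResults (StageNo INTEGER, ParticipantNumber TEXT, "
        "ParticipantGroup TEXT, Time INTEGER, Points INTEGER, "
        "PRIMARY KEY (StageNo, ParticipantNumber))"
    )
    conn.executemany(
        "INSERT INTO StageResults (StageNo, ParticipantNumber, ParticipantGroup, Time) "
        "VALUES (?, ?, ?, ?)",
        SAMPLE,
    )
    return conn


def stage_winners(conn):
    # A row qualifies when no other row in the same stage and group is faster.
    return conn.execute(
        """
        SELECT t.StageNo, t.ParticipantGroup, t.ParticipantNumber
        FROM StageResults AS t
        WHERE NOT EXISTS (SELECT 1 FROM StageResults AS t2
                          WHERE t2.ParticipantGroup = t.ParticipantGroup
                            AND t2.StageNo = t.StageNo
                            AND t2.Time < t.Time)
        ORDER BY t.StageNo, t.ParticipantGroup
        """
    ).fetchall()


print(stage_winners(make_db()))
```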
The question is then how you turn this into an update. For MySQL, you would do:
update StageResults sr join
(select t.*
from StageResults t
where not exists (select 1
from StageResults t2
where t2.ParticipantGroup = t.ParticipantGroup and
t2.StageNo = t.StageNo and
t2.time < t.time
)
) toupdate
on toupdate.ParticipantNumber = sr.ParticipantNumber and
toupdate.StageNo = sr.StageNo
set sr.points = sr.points + 1000;
The syntax for SQL Server would be a bit different, but your question is tagged MySQL.
EDIT:
For SQL Server:
with toupdate as (select t.*
from StageResults t
where not exists (select 1
from StageResults t2
where t2.ParticipantGroup = t.ParticipantGroup and
t2.StageNo = t.StageNo and
t2.time < t.time
)
)
update toupdate
set points = points + 1000;
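SQLite supports neither MySQL's UPDATE ... JOIN syntax nor updating through a CTE the way SQL Server does. A portable sketch therefore moves the NOT EXISTS test straight into the UPDATE's WHERE clause. The assumptions are the same as before: TEXT ParticipantNumber and a composite key. COALESCE is an addition so that a NULL Points value becomes 1000 instead of staying NULL.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StageResults (StageNo INTEGER, ParticipantNumber TEXT, "
        "ParticipantGroup TEXT, Time INTEGER, Points INTEGER, "
        "PRIMARY KEY (StageNo, ParticipantNumber))"
    )
    conn.executemany(
        "INSERT INTO StageResults (StageNo, ParticipantNumber, ParticipantGroup, Time) "
        "VALUES (?, ?, ?, ?)",
        [(1, "01", "M21", 500), (1, "22", "M21", 550), (1, "45", "M21", 530),
         (1, "47", "F09", 600), (1, "09", "F09", 630),
         (2, "01", "M21", 515), (2, "45", "M21", 520)],
    )
    return conn


def award_stage_winners(conn):
    # Inside the subquery, the unaliased StageResults refers to the row being
    # updated; t2 ranges over competitors in the same stage and group.
    conn.execute(
        """
        UPDATE StageResults
        SET Points = COALESCE(Points, 0) + 1000
        WHERE NOT EXISTS (SELECT 1 FROM StageResults AS t2
                          WHERE t2.ParticipantGroup = StageResults.ParticipantGroup
                            AND t2.StageNo = StageResults.StageNo
                            AND t2.Time < StageResults.Time)
        """
    )


conn = make_db()
award_stage_winners(conn)
print(conn.execute(
    "SELECT StageNo, ParticipantNumber, Points FROM StageResults "
    "ORDER BY StageNo, ParticipantNumber"
).fetchall())
```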

Return a rowset and set a variable in an "IN" clause in SQL Server

I want to use the SQL Server IN operator and also set a variable to a column value. Is this possible?
My code is like this:
DECLARE @SubkindId as tinyint;
SELECT NAME FROM SampleTable001 WHERE
Id in (SELECT Id, @SubkindId = Subkind FROM SampleTable002)
ORDER BY Name;
My issue is: I want to set the @SubkindId variable in the inner select statement.
Can It Be Done?
In SQL Server you can't SELECT a result set and SET variables in the same statement (though you can in MySQL). Sorry. But there may be another way to get what you want. Unfortunately, what you want is not completely clear.
Assuming you want to do a SELECT and at the same time return another value into a variable, you have to handle the fact that your query can return multiple rows. In that case, which one would you want to return into @SubkindId?
Or perhaps I have misunderstood, and rather than pulling the column value into the variable, you want to pull only the rows where Subkind matches the value already in the variable (though you didn't show assigning a value to it first, so this seems less likely).
Please confirm which is the case and answer the above questions, and I can help you more.
In the meantime, I'll try to give you answers for both scenarios.
First, let me mention that I recommend against using the IN() syntax with a subquery returning a list of IDs. It is poor practice in my opinion because it usually demonstrates that the person doesn't really know how to JOIN properly, and as soon as the query gets a little complicated, not only that person but even the best professional SQL Server query writer can get lost (... WHERE x IN (SELECT ... WHERE y IN (SELECT ... WHERE z NOT IN (...))) which soon leads to a serious case of what!?!?!?!. Just use JOINs, and if required, semi-joins (introduced with an EXISTS clause).
Query and Return a Value
If what you really wanted was to get access to the values that the SELECT statement found while doing its join, it might look something like this:
DECLARE @KindsAndSubkinds TABLE (
Name varchar(100),
SubkindId tinyint
);
INSERT @KindsAndSubkinds
SELECT
T1.Name,
T2.Subkind
FROM
dbo.SampleTable001 T1
INNER JOIN dbo.SampleTable002 T2
ON T1.Id = T2.Id;
SELECT DISTINCT Name
FROM @KindsAndSubkinds
ORDER BY Name;
-- Now you can do something with the SubkindId values in the @KindsAndSubkinds table variable.
Just Query
If you really were just trying to query rather than return a value, this is what I would recommend:
DECLARE @SubkindId as tinyint;
SET @SubkindId = 5;
SELECT
T1.Name
FROM
dbo.SampleTable001 T1
INNER JOIN dbo.SampleTable002 T2
ON T1.Id = T2.Id
WHERE
T2.Subkind = @SubkindId
ORDER BY
T1.Name;
If SampleTable002 can contain multiple rows for the same Id and you don't want those duplicates in the result set, use a semi-join instead:
SELECT
T1.Name
FROM
dbo.SampleTable001 T1
WHERE
EXISTS (
-- This semi-join requires at least one row to exist
-- but doesn't increase the row count
SELECT *
FROM dbo.SampleTable002 T2
WHERE
T1.Id = T2.Id
AND T2.Subkind = @SubkindId
)
ORDER BY
T1.Name;
I hope this helps.
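This small SQLite sketch (via Python's `sqlite3`) shows why the semi-join matters. In it, Id 1 has two matching rows in SampleTable002. The sample rows are hypothetical, and the named parameter `:subkind_id` stands in for the T-SQL variable @SubkindId.

```python
import sqlite3

JOIN_QUERY = """
SELECT T1.Name FROM SampleTable001 AS T1
INNER JOIN SampleTable002 AS T2 ON T1.Id = T2.Id
WHERE T2.Subkind = :subkind_id
ORDER BY T1.Name
"""

EXISTS_QUERY = """
SELECT T1.Name FROM SampleTable001 AS T1
WHERE EXISTS (SELECT * FROM SampleTable002 AS T2
              WHERE T1.Id = T2.Id AND T2.Subkind = :subkind_id)
ORDER BY T1.Name
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SampleTable001 (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE SampleTable002 (Id INTEGER, Subkind INTEGER)")
    conn.executemany("INSERT INTO SampleTable001 VALUES (?, ?)",
                     [(1, "alpha"), (2, "beta"), (3, "gamma")])
    # Id 1 appears twice with the same Subkind
    conn.executemany("INSERT INTO SampleTable002 VALUES (?, ?)",
                     [(1, 5), (1, 5), (2, 5), (3, 7)])
    return conn


def names(conn, query, subkind_id):
    return [row[0] for row in conn.execute(query, {"subkind_id": subkind_id})]


conn = make_db()
print(names(conn, JOIN_QUERY, 5))    # the join repeats alpha
print(names(conn, EXISTS_QUERY, 5))  # the semi-join does not
```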
Do it like this:
DECLARE @SubkindId as tinyint
SELECT [NAME]
FROM SampleTable001
WHERE Id in (SELECT Id
from SampleTable002
WHERE Subkind=@SubkindId)
order by [Name]
or by using JOIN
DECLARE @SubkindId as tinyint
SELECT [NAME]
FROM SampleTable001 a
INNER JOIN SampleTable002 b
ON a.id = b.id
WHERE b.Subkind=@SubkindId
order by [Name]

Sql query that numerates the returned result

How can I write one SQL query that selects a column from a table but returns two columns, where the additional column contains a new row index running from 1 to n? It must not use functions that do this (like row_number()).
Any ideas?
Edit: it must be a one-select query
You can do this on any database:
SELECT (SELECT COUNT (1) FROM field_company fc2
WHERE fc2.field_company_id <= fc.field_company_id) AS row_num,
fc.field_company_name
FROM field_company fc
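Here is a quick check of the correlated COUNT approach with Python's `sqlite3` module, using hypothetical sample rows. It assumes field_company_id is unique and non-NULL, because duplicate ids would share a number. The ORDER BY is an addition: without it, the rows may come back in any order. Each row counts the rows before it, so the work grows roughly quadratically with table size.

```python
import sqlite3


def numbered_companies(conn):
    # For each row, count the rows whose id is <= its own id; with unique ids
    # that count is the row's 1-based position.
    return conn.execute(
        """
        SELECT (SELECT COUNT(1) FROM field_company AS fc2
                WHERE fc2.field_company_id <= fc.field_company_id) AS row_num,
               fc.field_company_name
        FROM field_company AS fc
        ORDER BY row_num
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE field_company (field_company_id INTEGER PRIMARY KEY, "
    "field_company_name TEXT)"
)
conn.executemany("INSERT INTO field_company VALUES (?, ?)",
                 [(35, "Initech"), (10, "Acme"), (20, "Globex")])
print(numbered_companies(conn))
```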
SET NOCOUNT ON
DECLARE @item_table TABLE
(
row_num INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, --THE IDENTITY STATEMENT IS IMPORTANT!
field_company_name VARCHAR(255)
)
INSERT INTO @item_table
SELECT field_company_name FROM field_company
SELECT * FROM @item_table
If you are using Oracle or another database that supports sequence objects, create a new sequence for this purpose. Next, create a view and run this:
insert into the view select column_name, sequence.nextval from table
In MySQL you can:
SELECT row_num, Column1
FROM (SELECT @row := @row + 1 AS row_num, Column1
FROM table1, (SELECT @row := 0) AS init)
AS derived1
I figured out a hackish way to do this that I'm a bit ashamed of. On Postgres 8.1:
SELECT generate_series, (SELECT username FROM users LIMIT 1 OFFSET generate_series) FROM generate_series(0,(SELECT count(*) - 1 FROM users));
I believe this technique will work even if your source table does not have unique ids or identifiers.
On SQL Server 2005 and higher, you can use OVER to accomplish this:
SELECT rank() over (order by company_id) as rownum
, company_name
FROM company
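You can try the window-function version in SQLite 3.25 or later, which most current Python builds link against. rank() repeats a number when company_id has duplicates and then skips ahead, so it only yields 1 to n for unique ids. The sketch uses hypothetical rows that include one tie to show this.

```python
import sqlite3


def ranked_companies(conn):
    # rank() numbers rows by company_id; rows with equal ids share a rank.
    return conn.execute(
        """
        SELECT rank() OVER (ORDER BY company_id) AS rownum, company_name
        FROM company
        ORDER BY rownum
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (company_id INTEGER, company_name TEXT)")
conn.executemany("INSERT INTO company VALUES (?, ?)",
                 [(7, "Acme"), (3, "Globex"), (9, "Initech"), (7, "Hooli")])
print(ranked_companies(conn))
```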