Can you explain the meaning of a minus in a SQL select statement?

I am working with SQL and found the snippet below. What do the minus symbols (-) inside the SELECT statement mean? I assume it is some kind of trick, but I can't find information online about how it is used. Any insight would be welcome.
I am referring to:
SELECT - sum(st.sales) AS sales
- sum(st.orders) AS orders
- sum(st.aov) AS aov
It seems to be related to ledger tables. If you know of any documentation, blog post or PDF on this, please share the link.
The full SQL looks like this:
INSERT INTO sales_test
WITH source_query AS --find the existing values in the ledger table and invert them
(
SELECT
st.og_date
, st.merchant
, st.store_name
, st.country
, st.kam
, st.class
, st.origin
, - sum(st.sales) AS sales
, - sum(st.orders) AS orders
, - sum(st.aov) AS aov
, et.source_file_name
, et.source_file_timestamp
FROM
sales_test st
INNER JOIN
ext_sales_test et
ON
city_hash(et.og_date, et.merchant, et.store_name, et.country, et.kam, et.class, et.origin) = city_hash(st.og_date, st.merchant, st.store_name, st.country, st.kam, st.class, st.origin)
AND st.og_date = et.og_date
AND st.merchant = et.merchant
GROUP BY
st.og_date
, st.merchant
, st.store_name
, st.country
, st.kam
, st.class
, st.origin
, et.source_file_name
, et.source_file_timestamp
)
, union_query AS --if we union the incoming data with the inverted existing data, we get the difference that needs to be ledgered
(
SELECT *
FROM
source_query
UNION ALL
SELECT *
FROM
ext_sales_test
)

The minus here is the unary negation operator: it flips the sign of a numeric value (a positive value becomes negative, and a negative value becomes positive). In your case the SUM is computed first and the result is then negated.
As an example:
USE tempdb;
GO
DECLARE @Num1 INT;
SET @Num1 = 5;
SELECT @Num1 AS VariableValue, -@Num1 AS NegativeValue;
GO
Result set:
VariableValue NegativeValue
------------- -------------
5 -5
(1 row(s) affected)
Further info here
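To see why the negated sums are useful in a ledger-style table, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module purely so that it runs anywhere; the tables ledger and incoming and their values are hypothetical stand-ins for sales_test and ext_sales_test, not the asker's real schema.

```python
import sqlite3

# In-memory database; table names and values are hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger   (merchant TEXT, sales INTEGER);
    CREATE TABLE incoming (merchant TEXT, sales INTEGER);
    -- the ledger already holds a total of 100 for shop_a
    INSERT INTO ledger   VALUES ('shop_a', 70), ('shop_a', 30);
    -- the incoming file says the total should be 120
    INSERT INTO incoming VALUES ('shop_a', 120);
""")

delta_sql = """
    SELECT merchant, SUM(sales) AS delta
    FROM (
        -- unary minus: invert the total that is already ledgered
        SELECT merchant, -SUM(sales) AS sales
        FROM ledger
        GROUP BY merchant
        UNION ALL
        -- add the incoming value on top of the inverted total
        SELECT merchant, sales
        FROM incoming
    )
    GROUP BY merchant
"""
rows = conn.execute(delta_sql).fetchall()
print(rows)  # [('shop_a', 20)]

# Appending the difference brings the ledger total up to the incoming value.
conn.executemany("INSERT INTO ledger VALUES (?, ?)", rows)
total = conn.execute("SELECT SUM(sales) FROM ledger").fetchone()[0]
print(total)  # 120
```

The inverted total (-100) plus the incoming value (120) leaves only the difference (20), so the ledger is corrected by appending a row rather than updating existing ones.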

Related

How to debug the error of invalid identifier in the following PL/SQL block?

For the PL/SQL code below, I am getting this error report:
ORA-00904: "SRC"."PART_ID_CONSOLIDATED": invalid identifier
ORA-06512: at line 37
I tried debugging it by printing the values of I.item_no and I.PART_ID_CONSOLIDATED. The values print correctly, but the block still fails with the invalid identifier error. I am unable to debug this, please guide.
DECLARE
BEGIN
FOR T IN
(
SELECT * FROM TDTEMP.ITEM_MTRL_SPLIT_TMP
where regexp_like(MATERIAL_NAME, '[[:digit:]],[[:digit:]] % +[[:alpha:]]*')
)
LOOP
FOR I IN
(
WITH parsed as(
SELECT /*+ parallel(t,8) materialize */
T.item_no,T.item_type,T.bu_code_sup,T.bu_type_sup,T.FROM_PACK_DATE,T.PART_ID,T.MATERIAL_NAME,T.PART_ID_CONSOLIDATED,T.reporting_name,
regexp_substr(REGEXP_REPLACE(replace(replace(T.MATERIAL_NAME,'% ','% '),', ',','), '(\d+),(\d+)', '\1.\2'),'[^,]+',1,ROWNUM)
AS split_value
FROM dual
CONNECT BY level <= regexp_count(REGEXP_REPLACE(replace(T.MATERIAL_NAME,'% /','%/'), '(\d+),(\d+)', '\1.\2'),'[^,]+')
)
,in_pairs as(
select /*+ parallel(k,8) materialize */
item_no,item_type,bu_code_sup,bu_type_sup,FROM_PACK_DATE,PART_ID,material_name,PART_ID_CONSOLIDATED,reporting_name
,regexp_substr(split_value, '[0-9]+[.]*[0-9]+') as percentage
,trim(substr(split_value, instr(split_value, '%') + 1)) as component
from parsed k where split_value LIKE '%\%%' ESCAPE '\'
)
select /*+ parallel(it,8) */
distinct item_no,item_type,bu_code_sup,bu_type_sup,FROM_PACK_DATE,PART_ID,material_name,percentage,component,PART_ID_CONSOLIDATED,reporting_name
from in_pairs it
)
LOOP
merge into TDTEMP.ITEM_MTRL_SPLIT_TMP targ
using (
SELECT I.item_no,I.item_type,I.bu_code_sup,I.bu_type_sup,I.FROM_PACK_DATE,I.PART_ID,I.material_name,I.percentage,I.component,I.PART_ID_CONSOLIDATED,
I.reporting_name FROM DUAL
)src
on ( targ.item_no = src.item_no
and targ.item_type = src.item_type
and targ.bu_code_sup = src.bu_code_sup
and targ.bu_type_sup = src.bu_type_sup
and targ.part_id = src.part_id
and targ.from_pack_date = src.from_pack_date
and targ.component = src.component
and targ.percentage = src.percentage
and targ.material_name = src.material_name
and targ.PART_ID_CONSOLIDATED = src.PART_ID_CONSOLIDATED
)
when not matched then
insert (item_no ,
item_type ,
bu_code_sup,
bu_type_sup ,
from_pack_date ,
part_id ,
part_id_consolidated,
material_name ,
percentage ,
component ,
reporting_name,
plastic,
ii_date )
values( src.item_no ,
src.item_type ,
src.bu_code_sup ,
src.bu_type_sup ,
src.from_pack_date,
src.part_id ,
src.part_id_consolidated ,
src.material_name ,
src.percentage ,
src.component ,
src.reporting_name,
'N',
sysdate
)
when matched then
update set targ.percentage = src.percentage ,
targ.component = src.component ;
END LOOP ;
END LOOP ;
END ;

The short answer is that because you are selecting from dual, which only has a single dummy column, you need to give aliases to all of the values you are selecting in your using clause:
using (
SELECT I.item_no AS item_no, I.item_type AS item_type, I.bu_code_sup AS bu_code_sup,
I.bu_type_sup AS bu_type_sup, I.FROM_PACK_DATE AS from_pack_date,
I.PART_ID AS part_id, I.material_name AS material_name, I.percentage AS percentage,
I.component AS component, I.PART_ID_CONSOLIDATED AS part_id_consolidated,
I.reporting_name AS reporting_name
FROM DUAL
) src
db<>fiddle with a very simplified example. To some extent that also addresses the "how to debug" part of your question: break your failing block of code down into smaller and simpler parts to make it easier to see what's happening. And if you still can't work it out, you're much closer to a minimal reproducible example you can post without so much noise for others to wade through.
The long answer is to avoid the loops and do a single merge, as MTO suggested in a comment.

How to split a column into two columns based on the value in the another column

I have the data below in a Microsoft SQL Server table.
I would like to get output like the one below.
I have tried two queries, but neither of them helped.
The first query gives me NULL values.
Query
SELECT
[id]
, [sav]
, [cat]
, [tech]
, [asset]
, CASE
WHEN [objname] = 'FieldName'
THEN [stringvalue]
END AS [fieldname]
, CASE
WHEN [objname] = 'FieldValue'
THEN [stringvalue]
END AS [fieldvalue]
FROM [test].[dbo].[sample];
Output
The second query gives me 0 as the field value, because I have hard-coded it.
Query
SELECT
ROW_NUMBER() OVER(ORDER BY [fieldname]) AS 'id'
, [sav]
, [cat]
, [tech]
, [asset]
, [fieldname]
, 0 AS [fieldvalue]
FROM [test].[dbo].[sample] PIVOT(MAX([stringvalue]) FOR [objname] IN(
[fieldname])) [p]
WHERE [fieldname] IS NOT NULL;
Output
How can I achieve this?

You have a very arcane data structure. SQL tables are inherently unordered. From what I can tell, the field value is in the "next" row based on the id.
If so, you can use lead():
select . . .,
stringvalue as fieldname, next_string_value as fieldvalue
from (select t.*, lead(t.stringvalue) over (order by id) as next_string_value
from t
) t
where t.objname = 'FieldName';
If you are really using SQL Server 2008, where lead() is not available (it was added in SQL Server 2012), you can use a self-join instead. This does assume that the ids have no gaps in them.
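The self-join variant could look roughly like the sketch below. It runs on SQLite through Python's sqlite3 module only to keep it self-contained, and the sample rows are invented because the original screenshots are not available; the join itself is plain SQL and should carry over to SQL Server with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical sample data: each FieldName row is followed by its FieldValue row
    CREATE TABLE sample (id INTEGER PRIMARY KEY, objname TEXT, stringvalue TEXT);
    INSERT INTO sample VALUES
        (1, 'FieldName',  'color'),
        (2, 'FieldValue', 'red'),
        (3, 'FieldName',  'size'),
        (4, 'FieldValue', 'large');
""")

pairs = conn.execute("""
    SELECT n.stringvalue AS fieldname,
           v.stringvalue AS fieldvalue
    FROM sample AS n
    JOIN sample AS v
      ON v.id = n.id + 1             -- the value row directly follows the name row
     AND v.objname = 'FieldValue'
    WHERE n.objname = 'FieldName'
    ORDER BY n.id
""").fetchall()
print(pairs)  # [('color', 'red'), ('size', 'large')]
```

The `v.id = n.id + 1` condition is exactly where the "no gaps in the ids" assumption comes in: a missing id between a name row and its value row would silently drop that pair.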

Why I need Group by in this simple query?

UPDATE :
-----
the error might be in sum(si.amt_pd) from item table (as there is no relation) :
select SUM(si.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where
is there a work around?
----------
I am trying to run this query. The query just fetches the amount of a month based on some tables. It is just a part of a big query.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
but I am getting the following error message.
Msg 8120, Level 16, State 1, Line 1
Column 'HMIS_REPORTING.HMIS_RPT_ME.dbo.Sales.Sales_Contract_Nbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I just can't understand why my query should have a group by for sales_contract_nbr and even if I put in the group by clause it tells me that inner query si.Product_item_id and SI.sales_item_dt should also be contained in group by clause.
Please help me out.
Thanks in advance

This is a very subtle problem. However, I think the subquery should be:
select SUM(i.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
That is, the alias should be i not si.
What is happening is that the sum in the subquery is on a value in the outer query. So, the SQL compiler assumes an aggregation query. As soon as the first column is found that is not an aggregation, it complains with the message that you have.
By the way, you should use proper join syntax, so your FROM clause looks like:
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S join
[HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
on SI.Sales_Id = S.Sales_Id

As @Gordon Linoff says, this is almost certainly because the query optimizer is treating this like a SUM operation, normalizing away the subquery for "jan2011".
If the amt_pd column is present in the ITEM table, Gordon's solution is the right one.
If not, you have to add the group by statement, as below.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
GROUP BY s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt)
, MONTH(s.Sale_Dt)
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd

speed up SQL Query

I have a query which takes a very long time to execute on anything older than roughly the past hour's worth of data. It is going to back a view which will be used for data mining, so the expectation is that it can search back through weeks or months of data and return in a reasonable amount of time (even a couple of minutes is fine). I ran it for a date range of 10/3/2011 12:00pm to 10/3/2011 1:00pm and it took 44 minutes!
The problem is with the two LEFT OUTER JOINs in the bottom. When I take those out, it can run in about 10 seconds. However, those are the bread and butter of this query.
This is all coming from one table. The ONLY thing this query returns differently than the original table is the column xweb_range. xweb_range is a calculated field column (range) which will only use the values from [LO,LC,RO,RC]_Avg where their corresponding [LO,LC,RO,RC]_Sensor_Alarm = 0 (do not include in range calculation if sensor alarm = 1)
WITH Alarm (sub_id,
LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm) AS (
SELECT sub_id, LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm
FROM dbo.some_table
where sub_id <> '0'
)
, AddRowNumbers AS (
SELECT rowNumber = ROW_NUMBER() OVER (ORDER BY LO_Avg)
, sub_id
, LO_Avg, LO_Sensor_Alarm
, LC_Avg, LC_Sensor_Alarm
, RO_Avg, RO_Sensor_Alarm
, RC_Avg, RC_Sensor_Alarm
FROM Alarm
)
, UnPivotColumns AS (
SELECT rowNumber, value = LO_Avg FROM AddRowNumbers WHERE LO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, LC_Avg FROM AddRowNumbers WHERE LC_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RO_Avg FROM AddRowNumbers WHERE RO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RC_Avg FROM AddRowNumbers WHERE RC_Sensor_Alarm = 0
)
SELECT rowNumber.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
, COALESCE(range1.range, range2.range) AS xweb_range
FROM AddRowNumbers rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = MAX(value) - MIN(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) > 1) range1 ON range1.rowNumber = rowNumber.rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = AVG(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) = 1) range2 ON range2.rowNumber = rowNumber.rowNumber
INNER JOIN dbo.some_table cds
ON rowNumber.sub_id = cds.sub_id

It's difficult to understand exactly what your query is trying to do without knowing the domain. However, it seems to me like your query is simply trying to find, for each row in dbo.some_table where sub_id is not 0, the range of the following columns in the record (or, if only one matches, that single value):
LO_AVG when LO_SENSOR_ALARM=0
LC_AVG when LC_SENSOR_ALARM=0
RO_AVG when RO_SENSOR_ALARM=0
RC_AVG when RC_SENSOR_ALARM=0
You constructed this query by assigning each row a sequential row number, unpivoting the _AVG columns along with their row number, computing the range aggregate grouped by row number, and then joining back to the original records by row number. CTEs don't materialize results (nor are they indexed, as discussed in the comments). So each reference to AddRowNumbers is expensive, because ROW_NUMBER() OVER (ORDER BY LO_Avg) is a sort.
Instead of cutting this table up just to join it back together by row number, why not do something like:
SELECT cds.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
--if the COUNT is 0, xweb_range will be null (since MAX will be null), if it's 1, then use MAX, else use MAX - MIN (as per your example)
, (CASE WHEN stats.[Count] < 2 THEN stats.[MAX] ELSE stats.[MAX] - stats.[MIN] END) xweb_range
FROM dbo.some_table cds
--cross join on the following table derived from values in cds - it will always contain 1 record per row of cds
CROSS APPLY
(
SELECT COUNT(*), MIN(Value), MAX(Value)
FROM
(
--construct a table using the column values from cds we wish to aggregate
VALUES (LO_AVG, LO_SENSOR_ALARM),
(LC_AVG, LC_SENSOR_ALARM),
(RO_AVG, RO_SENSOR_ALARM),
(RC_AVG, RC_SENSOR_ALARM)
) x (Value, Sensor_Alarm) --give a name to the columns for _AVG and _ALARM
WHERE Sensor_Alarm = 0 --filter our constructed table where _ALARM=0
) stats([Count], [Min], [Max]) --give our derived table and its columns some names
WHERE cds.sub_id <> '0' --this is a filter carried over from the first CTE in your example

Bubbling Up Columns in Sql

Pardon the convoluted example, but I believe there is something fundamental about sql I am missing and I'm not sure what it is. I have this crazy query...
SELECT *
FROM (
SELECT *
FROM (
SELECT @t1 := @t1 +1 AS leaderboard_entry_youngness_rank, 1 - @t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days
FROM leaderboard_entry) AS inner_temp
NATURAL JOIN leaderboard
NATURAL JOIN user
WHERE (
leaderboard_load_key = 'sk-en-adjectives-1'
OR leaderboard_load_key = '-sk-en-adjectives-1'
)
AND leaderboard_quiz_mode = '0'
ORDER BY leaderboard_entry_age_in_some_units ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 100
) AS outer_temp
ORDER BY leaderboard_entry_elapsed_time_ms ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 50
I added the second nested SELECT statement because the user_name in the user table was not being returned in the outermost query. But now the leaderboard_entry_youngness_based_on_expiry field, which is being generated based on a row index ratio, is not working correctly.
If I remove the second nested SELECT statement, the leaderboard_entry_youngness_based_on_expiry works as expected, but the user_name column is not returned.
How can I satisfy both? Why is this happening?
Thanks!
This stems from the following question:
Add a numbered list column to a returned MySQL query

Your inner SELECT statement does not include user.user_name, which is why user_name is not returned. Remove the outer query and go back to the earlier version, but add user.user_name, like this:
....
SELECT @t1 := @t1 +1 AS leaderboard_entry_youngness_rank, 1 - @t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days, user.user_name
....

Try putting an ORDER BY in the innermost query. Since there currently is no ORDER BY clause there, the rows are numbered in no defined order, so it's wrong to say that it "is not working correctly".
Also, take away the outer SELECT * FROM... and check whether there are duplicate user_name columns.
BTW, since you are not using the row index columns in your query, why not just put this logic in the application itself? It will be more reliable that way.
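As a rough illustration of computing the row index in the application instead, here is a small Python sketch. The rows are hypothetical and assumed to arrive already sorted youngest first; only the column names are taken from the question.

```python
# Hypothetical result rows, assumed to be sorted youngest first by the query.
rows = [
    {"user_name": "ann", "leaderboard_entry_elapsed_time_ms": 5200},
    {"user_name": "bob", "leaderboard_entry_elapsed_time_ms": 4100},
    {"user_name": "cy",  "leaderboard_entry_elapsed_time_ms": 6300},
]

# Number the rows in the application rather than with a MySQL user variable.
for rank, row in enumerate(rows, start=1):
    row["leaderboard_entry_youngness_rank"] = rank
    row["leaderboard_entry_youngness_based_on_expiry"] = 1 - rank / 100

# Re-sort by elapsed time and keep the best 50, as the outer query did.
top = sorted(rows, key=lambda r: r["leaderboard_entry_elapsed_time_ms"])[:50]
print([r["user_name"] for r in top])  # ['bob', 'ann', 'cy']
```

Because the numbering happens after the rows have been fetched in a known order, it no longer depends on how the database evaluates the variable assignment relative to joins and sorting.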
