Need to identify same characters between two columns' value in spark sql - sql

I have two columns: one is code and the other is category.
code    category
321     3210001
432     4320001
5314    5314001
6310    7480001
In the rows above, the code value is usually the exact prefix of the category value.
I need to write a Spark SQL query that reports how many rows are categorized and how many are non-categorized. A row is categorized if its code matches the prefix of its category value; otherwise it is non-categorized.
In the table above, the 4th row is non-categorized and the rest are categorized.
I have tried using like, substring, len, and case, but I have not been able to get this result.
Could someone please help me out with this?
thanks,
Raja

This assumes both columns are strings; if they are not, cast them with cast(column as string) in the same query. instr(category, code) returns the 1-based position of code within category, so a result of 1 means category starts with code:
select case when instr(category, code) = 1
            then 'Categorized'
            else 'Non Categorized' end as status
     , count(*) as counts
from df
group by case when instr(category, code) = 1
              then 'Categorized'
              else 'Non Categorized' end
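To check the query against the four sample rows from the question, here is a minimal sketch that runs the same CASE/instr logic through Python's built-in sqlite3 module. SQLite only stands in for Spark here because its instr(string, substring) is also 1-based; the table name df comes from the answer, and the alias status is an assumption.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark table `df` (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (code TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [("321", "3210001"), ("432", "4320001"),
     ("5314", "5314001"), ("6310", "7480001")],
)

# instr(category, code) = 1 holds only when category starts with code.
query = """
SELECT CASE WHEN instr(category, code) = 1
            THEN 'Categorized'
            ELSE 'Non Categorized' END AS status,
       count(*) AS counts
FROM df
GROUP BY CASE WHEN instr(category, code) = 1
              THEN 'Categorized'
              ELSE 'Non Categorized' END
ORDER BY status
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # {'Categorized': 3, 'Non Categorized': 1}
```

Comparing the position to 1 rather than testing for > 0 matters: a code that appears in the middle of the category value would otherwise be counted as a match.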

Related

ORA-00909: invalid number of arguments when using NVL

I have a column that returns a specific amount based on the condition below.
I tried this query to get that value, but I am getting multiple rows when it should be a single row.
select distinct
(case
when aila.line_type_lookup_code = 'ITEM' and aila.tax_classification_code = 'VAT12 SERVICES' then to_char(aila.assessable_value) else '0'
end) as taxable_lines
from
ap_invoice_lines_all aila
where
aila.invoice_id = '31004'
Then I tried this one to replace the null values:
select distinct
(case
when aila.line_type_lookup_code = 'ITEM' and aila.tax_classification_code = 'VAT12 SERVICES' then nvl(to_char(aila.assessable_value,0)) end) as taxable_lines
from
ap_invoice_lines_all aila
where
aila.invoice_id = '31004'
For example, I have a table named ap_invoice_lines_all with the following columns:
line_type_lookup_code string
tax_classification_code string
assessable_value double
The output I expect from the query above is
taxable lines
1300
but what I actually get is
taxable lines
0
1300
How do I remove the 0 from the result?
Thanks!
First off, you are using to_char wrong: its closing parenthesis is in the wrong place. This function should only have a single argument in this context. As written, to_char receives two arguments and nvl receives only one, and nvl requires exactly two, which is what raises ORA-00909:
nvl(to_char(aila.assessable_value,0))
should be
nvl(to_char(aila.assessable_value),0)
I would also recommend against using a case expression in such a simple query. Case expressions are great tools, but they typically make a query harder to read, and in this particular instance you don't really need IF..THEN..ELSE logic (which is the reason to use a case expression).
Secondly, are you sure there is only one record where aila.invoice_id = '31004'? The query returns two rows only if the table actually contains two rows for which that clause is true. In this case it finds one row where assessable_value is null and one where it isn't.
In any case, to remove the zero (or in this case a null value) from the result set, you can simply do this:
select distinct aila.assessable_value as taxable_lines
from ap_invoice_lines_all aila
where aila.invoice_id = '31004'
and aila.line_type_lookup_code = 'ITEM' -- Originally a condition in the case statement
and aila.tax_classification_code = 'VAT12 SERVICES' -- Originally a condition in the case statement
and aila.assessable_value is not null; -- Removes any null values from the result
Or, if you want null values to be replaced by a zero:
select distinct nvl(aila.assessable_value, 0) as taxable_lines
from ap_invoice_lines_all aila
where aila.invoice_id = '31004'
and aila.line_type_lookup_code = 'ITEM' -- Originally a condition in the case statement
and aila.tax_classification_code = 'VAT12 SERVICES' -- Originally a condition in the case statement
Note that the distinct clause causes all rows where assessable_value is null to be collapsed into a single row if multiple such rows are possible. Remove the distinct clause if you want every row where assessable_value is null to show in the result as 0.
Lastly, you might want to think about adding a primary key (if the table doesn't have one) or a unique index on invoice_id, if there should never be duplicate ids. It is simply good practice to do so for ID columns that are used often in queries and should be unique.
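To illustrate why moving the conditions into the WHERE clause gets rid of the extra row, here is a small sketch using Python's sqlite3 module in place of Oracle. The two sample rows are hypothetical (the real table contents are not shown in the question), and the sketch leaves out to_char and nvl since it only compares where the filtering happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ap_invoice_lines_all ("
    "invoice_id TEXT, line_type_lookup_code TEXT, "
    "tax_classification_code TEXT, assessable_value REAL)"
)
# Hypothetical data: one matching ITEM line and one non-matching TAX line.
conn.executemany(
    "INSERT INTO ap_invoice_lines_all VALUES (?, ?, ?, ?)",
    [("31004", "ITEM", "VAT12 SERVICES", 1300.0),
     ("31004", "TAX", "VAT12 SERVICES", None)],
)

# CASE in the SELECT list: every line of the invoice yields a value,
# so the non-matching line contributes a 0 row.
case_rows = conn.execute("""
    SELECT DISTINCT CASE WHEN line_type_lookup_code = 'ITEM'
                          AND tax_classification_code = 'VAT12 SERVICES'
                         THEN assessable_value ELSE 0 END AS taxable_lines
    FROM ap_invoice_lines_all
    WHERE invoice_id = '31004'
    ORDER BY taxable_lines
""").fetchall()

# Conditions in the WHERE clause: non-matching lines are dropped entirely.
where_rows = conn.execute("""
    SELECT DISTINCT assessable_value AS taxable_lines
    FROM ap_invoice_lines_all
    WHERE invoice_id = '31004'
      AND line_type_lookup_code = 'ITEM'
      AND tax_classification_code = 'VAT12 SERVICES'
      AND assessable_value IS NOT NULL
""").fetchall()

print(case_rows)   # two rows: 0 and 1300.0
print(where_rows)  # one row: 1300.0
```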

How to Sum two columns meeting certain conditions

What I want to do is merge the two highlighted pieces of code, so that the SUM formula applies only to the items matching the LIKE criteria (currently under WHERE), while I can still pull GameDescriptions that do not match the LIKE criteria. Hope that makes sense...
I think you just need to replace that part of the WHERE statement with a case statement in the SUM, like this:
SELECT
SUM(CASE WHEN GameDescription LIKE '5R25L%' THEN NetRevenue
ELSE 0 END) / COUNT(DISTINCT AccountingDate)
AS 'ES Created TheoWPU'
FROM Prime.dbo.PivotData
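As a rough check of this pattern, the sketch below runs the same conditional SUM through Python's sqlite3 module instead of SQL Server. The three sample rows and the aliases theo_wpu and rows_seen are made up for illustration; only the column names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PivotData ("
    "GameDescription TEXT, NetRevenue REAL, AccountingDate TEXT)"
)
# Hypothetical rows: two match the LIKE pattern, one does not.
conn.executemany(
    "INSERT INTO PivotData VALUES (?, ?, ?)",
    [("5R25L Classic", 100.0, "2020-01-01"),
     ("5R25L Deluxe", 50.0, "2020-01-02"),
     ("Other Game", 999.0, "2020-01-02")],
)

# The CASE keeps non-matching rows in the result set (rows_seen = 3)
# but makes them contribute 0 to the sum.
row = conn.execute("""
    SELECT SUM(CASE WHEN GameDescription LIKE '5R25L%'
                    THEN NetRevenue ELSE 0 END)
           / COUNT(DISTINCT AccountingDate) AS theo_wpu,
           COUNT(*) AS rows_seen
    FROM PivotData
""").fetchone()
print(row)  # (75.0, 3)
```

Because the filter lives inside the SUM rather than in WHERE, other columns in the same SELECT can still aggregate over all rows.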

How to extract multiple substring keywords in SQL Server and show results in multiple columns?

I have a table called Usage and there is a column called TEXT.
This TEXT column holds a string value that looks something like this:
"TIME EXPENSE ACCRUALS COST DC WITH RATES XX INTEGRATION TIME OD TRAVEL..."
I would like to write a SQL query that searches this column for selected keywords such as TIME, TIME OD, or COST. If a keyword is found, the query should return a check or an X to show that the keyword is present, and nothing if it is not.
For example, if I ran a substring search for my keywords, my results would look like this:
I hope this helps identify what I'm looking for. Any help would be appreciated.
How about:
select
    section,
    name,
    case when charindex('TIME', text) > 0 then 'X' else '' end as Time,
    case when charindex('EXPENSE', text) > 0 then 'X' else '' end as Expense
    -- ... all other columns here
from usage;
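Here is a small sketch of the same flag-per-keyword idea, run through Python's sqlite3 module. SQLite has no charindex, so the sketch swaps in instr(text, keyword), which takes its arguments in the opposite order; the three sample rows and the aliases time_flag and expense_flag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (section TEXT, name TEXT, text TEXT)")
# Hypothetical rows; the third shows the substring caveat.
conn.executemany(
    "INSERT INTO usage VALUES (?, ?, ?)",
    [("A", "first", "TIME EXPENSE ACCRUALS COST"),
     ("B", "second", "COST DC WITH RATES"),
     ("C", "third", "OVERTIME ONLY")],
)

# instr(text, 'TIME') > 0 plays the role of charindex('TIME', text) > 0.
rows = conn.execute("""
    SELECT name,
           CASE WHEN instr(text, 'TIME') > 0 THEN 'X' ELSE '' END AS time_flag,
           CASE WHEN instr(text, 'EXPENSE') > 0 THEN 'X' ELSE '' END AS expense_flag
    FROM usage
    ORDER BY section
""").fetchall()
for r in rows:
    print(r)
```

Note that this is a plain substring test, so the hypothetical 'OVERTIME ONLY' row is flagged for TIME as well, and a search for TIME cannot be told apart from TIME OD without a more specific pattern.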

Literal Does Not Match 01861

select case
when sale not in TO_date(sale,'MMYY') then 'N'
else sale
end test
from daily sales
I would like to see if the data in the column meets the following criteria using the above select. I am receiving the following error:
LITERAL DOES NOT MATCH FORMAT STRING
small data set
0109
0106
0409
column is a varchar
to_date('0109', 'mmyy') is correct; it returns midnight on 01-01-2009 (how it is displayed depends on your date format). So the problem is likely in the column: you may have one or more values that are not, in fact, four digits. You need to do some troubleshooting.
Something I would try:
select max(length(sale)) as max_len, min(length(sale)) as min_len from table_name
(daily sales cannot be a table name as written: an unquoted table name cannot contain spaces)
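If the length check turns up nothing, another way to troubleshoot is to pull the values out and test each one against the expected shape. The sketch below is only an approximation under my own assumptions: it flags anything that is not exactly a two-digit month (01 to 12) followed by two digits, which may be stricter or looser than Oracle's own parsing. The function name invalid_mmyy and the extra sample values are made up.

```python
import re

# Two-digit month 01..12 followed by a two-digit year, nothing else.
MMYY = re.compile(r"(0[1-9]|1[0-2])\d{2}")

def invalid_mmyy(values):
    """Return the values that do not look like MMYY (None counts as invalid)."""
    return [v for v in values if v is None or not MMYY.fullmatch(v)]

# First three values are from the question; the rest are hypothetical bad data.
sample = ["0109", "0106", "0409", "109", "1309", "01/09", " 0109"]
bad = invalid_mmyy(sample)
print(bad)  # ['109', '1309', '01/09', ' 0109']
```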

SSRS CountRows of a specific Field containing a specific Value

I am building an SSRS report. I have a DataSet that contains several Fields, one of which is Doc_Type. Doc_Type can contain several values, one of which is 'Shipment'.
In my report, I want to count the number of Doc_Types that are equal to 'Shipment'. This is what I am using, and it is not working:
=CountRows(Fields!Doc_Type.Value = "Shipments")
The error I get is: "The Value expression for the textrun 'Doc_Type.Paragraphs[0].TextRuns[0]' has a scope parameter that is not valid for an aggregate function. The scope parameter must be set to a string constant that is equal to either the name of a containing group, the name of a containing data region, or the name of a dataset."
You need to use an IIF to evaluate your fields and SUM the results. If your expression is in a table, you could use this:
=SUM(IIF(Fields!Doc_Type.Value = "Shipments", 1, 0))
There are many ways to achieve this.
Method 1
You can set up your expression something like this
=SUM(IIf(Fields!Doc_Type.Value = "Shipments", 1, 0), "YourDataSetName")
Remember that SSRS is case sensitive, so spell your dataset name and field names exactly as they are defined.
Method 2
I prefer handling it in SQL because I want to keep the business logic out of the RDLs.
You can use window functions to get the shipment count.
Also notice that in the COUNT version I haven't added an ELSE 0 branch, because that would give wrong results. COUNT doesn't care what value is inside it; it just counts all non-null values, so ELSE 0 would count every row. ELSE NULL (or no ELSE at all) works.
SELECT Column1, Column2, Column3,
    SUM(CASE WHEN Doc_Type = 'Shipments' THEN 1 ELSE 0 END) OVER() AS ShipmentCount_UsingSum,
    COUNT(CASE WHEN Doc_Type = 'Shipments' THEN 1 END) OVER() AS ShipmentCount_UsingCount
FROM myTable
JOIN....
WHERE .....
Now you can use this field in the report.
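To see the SUM/COUNT difference mentioned above in action, here is a minimal sketch using Python's sqlite3 module. It uses plain aggregates rather than OVER() so it stays short, and the four sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Doc_Type TEXT)")
# Hypothetical data: two shipments out of four rows.
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("Shipments",), ("Invoice",), ("Shipments",), ("Receipt",)],
)

row = conn.execute("""
    SELECT SUM(CASE WHEN Doc_Type = 'Shipments' THEN 1 ELSE 0 END),   -- adds 1s and 0s
           COUNT(CASE WHEN Doc_Type = 'Shipments' THEN 1 END),        -- counts non-null only
           COUNT(CASE WHEN Doc_Type = 'Shipments' THEN 1 ELSE 0 END)  -- counts every row
    FROM myTable
""").fetchone()
print(row)  # (2, 2, 4)
```

The third column shows the pitfall: with ELSE 0 every row produces a non-null value, so COUNT returns the total row count instead of the shipment count.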