SQL Output Rows as columns

I have a table that stores the failures from testing items, similar to:
Item|Test|FailureValue
1 |1a |"ZZZZZZ"
1 |1b | 123456
2 |1a |"MMMMMM"
2 |1c | 111111
1 |1d |"AAAAAA"
Is there a way in SQL to essentially pivot these and have the failure values output to individual columns? I know that I can already use STUFF to achieve what I want for the Test field, but I would like the results as individual columns if possible.
I'm hoping to achieve something like:
Item|Tests |FailureValue1|FailureValue2|FailureValue3|Failure......
1 |1a,1b |"ZZZZZZ" |123456 |NULL |NULL ......
2 |1a,1b |"MMMMMM" |111111 |"AAAAAA" |NULL ......
Kind regards
Matt
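One common way to get this shape is to number the failures within each item and then spread them with conditional aggregation. The following is only a sketch under assumptions: the table name Failures is hypothetical, the failures are ordered by Test (the question does not say which order is wanted), the number of output columns is fixed at three, and SQLite (through Python's sqlite3 module) stands in for SQL Server. ROW_NUMBER() and CASE are written the same way in T-SQL.

```python
import sqlite3

# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Failures (Item INTEGER, Test TEXT, FailureValue TEXT)")
conn.executemany(
    "INSERT INTO Failures VALUES (?, ?, ?)",
    [
        (1, "1a", "ZZZZZZ"),
        (1, "1b", "123456"),
        (2, "1a", "MMMMMM"),
        (2, "1c", "111111"),
        (1, "1d", "AAAAAA"),
    ],
)

# Step 1: number each item's failures (ordering by Test is an assumption).
# Step 2: pick the n-th failure into the n-th column; missing ones stay NULL.
pivot_sql = """
WITH numbered AS (
    SELECT Item, FailureValue,
           ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Test) AS rn
    FROM Failures
)
SELECT Item,
       MAX(CASE WHEN rn = 1 THEN FailureValue END) AS FailureValue1,
       MAX(CASE WHEN rn = 2 THEN FailureValue END) AS FailureValue2,
       MAX(CASE WHEN rn = 3 THEN FailureValue END) AS FailureValue3
FROM numbered
GROUP BY Item
ORDER BY Item
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Because the column list is written out by hand, an unknown number of failures per item would need the statement to be generated dynamically.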

Related

How to merge two variables in a Dataset using SAS

I have a dataset from an imported file.
Now there are two variables that need to be merged into one variable because the data is identical.
arr and arr_nbr should be merged into arr_nbr.
How can I get that done?
Original:
|name |db |arr |arr_nbr|
+-----+--------+----+-------+
|john |10121960|0456| |
|jane |04071988| |8543 |
|mia |01121955|9583| |
|liam |23091973| |7844 |
Desired output:
|name |db |arr_nbr|
+-----+--------+-------+
|john |10121960|0456 |
|jane |04071988|8543 |
|mia |01121955|9583 |
|liam |23091973|7844 |
Given that there are leading 0's in your desired output, I assume they are all character variables. In that case, use the COALESCEC function. It returns the first non-null or nonmissing value.
data want;
set have;
arr_nbr = coalescec(arr, arr_nbr);
drop arr;
run;
name db arr_nbr
john 10121960 0456
jane 04071988 8543
mia 01121955 9583
liam 23091973 7844
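For readers more at home in SQL than in SAS, the same first-non-missing logic can be sketched as follows. This is an illustration under assumptions, not a translation of the SAS step: SQLite (via Python's sqlite3) is used as a stand-in, the missing character values are stored as empty strings, and NULLIF turns those blanks into NULL so that COALESCE skips them, since a blank string is not NULL in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (name TEXT, db TEXT, arr TEXT, arr_nbr TEXT)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?)",
    [
        ("john", "10121960", "0456", ""),
        ("jane", "04071988", "", "8543"),
        ("mia", "01121955", "9583", ""),
        ("liam", "23091973", "", "7844"),
    ],
)

# Blank strings are mapped to NULL first; COALESCE then returns the first
# non-NULL value, which mirrors picking the first non-missing variable.
want = conn.execute(
    """
    SELECT name, db,
           COALESCE(NULLIF(arr, ''), NULLIF(arr_nbr, '')) AS arr_nbr
    FROM have
    ORDER BY rowid
    """
).fetchall()
for row in want:
    print(row)
```

Keeping the columns as text preserves the leading zero in 0456.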

Using LIKE operator for multiple words in PySpark

I have a DataFrame df in PySpark, like the one shown below -
+-----+--------------------+-------+
| ID| customers|country|
+-----+--------------------+-------+
|56 |xyz Limited |U.K. |
|66 |ABC Limited |U.K. |
|16 |Sons & Sons |U.K. |
|51 |TÜV GmbH |Germany|
|23 |Mueller GmbH |Germany|
|97 |Schneider AG |Germany|
|69 |Sahm UG |Austria|
+-----+--------------------+-------+
I would like to keep only those rows where ID starts with either 5 or 6. So, I want my final dataframe to look like this -
+-----+--------------------+-------+
| ID| customers|country|
+-----+--------------------+-------+
|56 |xyz Limited |U.K. |
|66 |ABC Limited |U.K. |
|51 |TÜV GmbH |Germany|
|69 |Sahm UG |Austria|
+-----+--------------------+-------+
This can be achieved in many ways and that is not the problem. But I am interested in learning how it can be done using the LIKE statement.
Had I only been interested in those rows where ID starts with 5, it could have been done easily like this -
df=df.where("ID like ('5%')")
My question: how can I add a second condition such as "ID like ('6%')" with the OR (|) boolean operator inside the where clause? I want to do something like the line shown below, but this code gives an error. So, in a nutshell, how can I combine multiple boolean conditions using LIKE and .where here -
df=df.where("(ID like ('5%')) | (ID like ('6%'))")
This works for me
from pyspark.sql import functions as F
df.where(F.col("ID").like('5%') | F.col("ID").like('6%'))
You can try
df = df.where('ID like "5%" or ID like "6%"')
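The difference between the two working forms is where the operator lives: | combines Column objects in Python, while inside a SQL expression string the conditions are joined with the keyword OR. The sketch below checks the string form of the predicate against the sample data. It uses SQLite (via Python's sqlite3) only as a stand-in so that it runs without a Spark installation; the table name df_rows is hypothetical and ID is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df_rows (ID TEXT, customers TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO df_rows VALUES (?, ?, ?)",
    [
        ("56", "xyz Limited", "U.K."),
        ("66", "ABC Limited", "U.K."),
        ("16", "Sons & Sons", "U.K."),
        ("51", "TÜV GmbH", "Germany"),
        ("23", "Mueller GmbH", "Germany"),
        ("97", "Schneider AG", "Germany"),
        ("69", "Sahm UG", "Austria"),
    ],
)

# Two LIKE conditions joined with the SQL keyword OR (not the | symbol).
kept_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM df_rows WHERE ID LIKE '5%' OR ID LIKE '6%' ORDER BY rowid"
    )
]
print(kept_ids)
```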
In PySpark, the Spark SQL syntax:
where column_n like 'xyz%' OR column_n like 'abc%'
might not work.
Use:
where column_n RLIKE '^(xyz|abc)'
Explanation: It keeps all values that start with either abc or xyz. The parentheses are needed: without them, '^xyz|abc' would also match abc anywhere in the value.
This works perfectly fine.
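Grouping matters in a regular-expression alternation: without parentheses, the ^ anchor applies only to the first alternative. The short check below uses Python's re module as a stand-in for RLIKE (both treat ^ and | the same way for these patterns); the sample values are made up for illustration.

```python
import re

# Hypothetical sample values; "my abc shop" contains abc but does not start with it.
values = ["xyz Limited", "abc Limited", "my abc shop", "Sons & Sons"]

# Ungrouped: "starts with xyz" OR "contains abc anywhere".
ungrouped = [v for v in values if re.search(r"^xyz|abc", v)]

# Grouped: starts with xyz or starts with abc.
grouped = [v for v in values if re.search(r"^(xyz|abc)", v)]

print(ungrouped)
print(grouped)
```

For the ID filter in the question, the equivalent grouped pattern would be '^(5|6)' or, more compactly, '^[56]'.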
For me this worked:
from pyspark.sql.functions import col
df.filter((col("ID").like("5%")) | (col("ID").like("6%")))

Appropriate idea or SQL to obtain the results set

Fig 1
TxnId | TxnTypeId |BranchId |TxnNumber |LocalAmount |ItemName
--------|-----------|---------|----------|-------------|---------
1777486 | 101 |1099 |1804908 |65.20000000 |A
1777486 | 101 |1099 |1804908 |324.50000000 |B
1777486 | 101 |1099 |1804908 |97.20000000 |C
1777486 | 101 |1099 |1804908 |310.00000000 |D
1777486 | 101 |1099 |1804908 |48.90000000 |E
Fig 2
TxnId |TxnTypeId |BankId |Number |Check |Bank |Cash |Wallet
--------|-----------|-------|--------|-------|------|------|------
1777486 |101 |1099 |1804908 | 48.9 | 310 |389.7 |97.2
Fig 3 (Expected Output)
TxnId |BankId |ItemName |Amount |Wallet |Bank |Check |Cash
--------|-------|-----------|-------|-------|-------|-------|-------
1777486 |1099 |A |65.2 |0 |0 |0 |65.2
1777486 |1099 |B |324.5 |0 |0 |0 |324.5
1777486 |1099 |C |97.2 |97.2 |0 |0 |0
1777486 |1099 |D |48.9 |0 |0 |48.9 |0
1777486 |1099 |E |310 |0 |310 |0 |0
I have two different result sets, obtained from different queries:
Fig 1 and Fig 2.
The result I want is shown in Fig 3.
Currently I do not have a flag that identifies the payment mode used for each item in a transaction. I only have the flag for the complete transaction.
Fig 4
IndividualTxnPaymentDetailId| IndividualTxnId |PaymentAmount |PaymentMode
---------------------------:|:-----------------:|:-------------:|:--------------
2106163 | 1777486 |389.70000000 | Cash
2106164 | 1777486 |97.20000000 | Wallet
2106165 | 1777486 |310.00000000 | Bank
2106166 | 1777486 |48.90000000 | Check
This means that if two or more items are purchased using one payment mode, I have no proper way of identifying the payment made for each item.
Items A and B were purchased using Cash as the payment mode, with the amounts 65.2 and 324.5. The total Cash paid is 389.7.
Item C was purchased using Wallet as the payment mode, with the amount 97.2. The total Wallet amount is 97.2.
Fig 5
TxnId |LocalAmount |ItemName
--------|--------------:|:------------
1777486 |65.20000000 | A
1777486 |324.50000000 | B
1777486 |97.20000000 | C
1777486 |310.00000000 | D
1777486 |48.90000000 | E
Queries with which I generated the results in Fig 4 and Fig 5:
select IndividualTxnPaymentDetailId, IndividualTxnId, PaymentAmount, cc.choicecode as PaymentMode
from dbo.IndividualTxnPaymentDetail it
inner join configchoice cc on cc.configchoiceid= it.configpaymentmodeid
where IndividualTxnId = 1777486
select IndividualTxnId as TxnId, LocalAmount, CurrencyName from dbo.IndividualTxnFCYDetail where IndividualTxnId = 1777486
This is the expression I wrote to identify the amount paid through Bank. I wanted to do the same for all the payment modes, but I could not get the amounts right.
CASE
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount < 0 THEN 0
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount > txn.LocalAmount THEN txn.LocalAmount
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount > tpm.Bank THEN tpm.Bank
ELSE tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount
END AS Bank,
Can you help me with an idea or some SQL to get the result set shown in Fig 3?
Updated Question - Updated Response
I read your updated question and I'm afraid the problem still stands. Neither of those queries is summing the data; they are just pulling the same already-summed numbers. You would need either to get at the numbers before the aggregation happens, or to have some column in your IndividualTxnPaymentDetail table that ties each row to its counterpart rows in the other table (presumably through a cross table, as in Row 1 : ItemName A, Row 1 : ItemName B, Row 2 : ItemName C, etc.).
If these are simply impossible, then perhaps you're approaching this the wrong way or, to put it better, perhaps you are being asked to do something that doesn't make sense, and provably so. If there is no direct relationship between these activities in the data, there's not much you can be expected to do. What's more, it may indicate that your organization doesn't 'think' about them that way.
These two tables seem to be payments and liabilities. Perhaps consider an approach where each payment goes toward whatever the oldest outstanding balance is, and payments are matched to the items in Fig 4 that way. Add a column to the details table to store the payment applied toward that item. Rather than a simple Paid/Unpaid Boolean, I would store the amount of payment that has been applied toward each item, or the amount still owed on each item; that way you can handle partially applied payments. As payments come in, apply them. You would likely want a similar column in the payments table too, to record how much of each payment has been applied; that way you can handle over-payments and know the status of things such as pending receipts when payments aren't applied immediately.
I hope this helps.
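The payment-application idea above can be sketched in a few lines. This is only an illustration under assumptions: the function name apply_payments is hypothetical, items are taken in the order listed (treated as oldest first), and payments are applied in the order they arrive. It establishes a convention for splitting payments across items; it does not recover which payment mode was really used for which item.

```python
from decimal import Decimal


def apply_payments(items, payments):
    """Apply each payment to the oldest item that still has a balance.

    items:    list of (item_name, amount_owed) pairs, oldest first
    payments: list of (payment_mode, amount) pairs, in arrival order
    Returns (allocations, unapplied): allocations is a list of
    (item_name, payment_mode, amount_applied); unapplied maps a payment
    mode to any amount left over after every item is fully paid.
    """
    balances = [[name, Decimal(owed)] for name, owed in items]
    allocations = []
    unapplied = {}
    idx = 0  # index of the oldest item that is not yet fully paid
    for mode, amount in payments:
        left = Decimal(amount)
        while left > 0 and idx < len(balances):
            name, owed = balances[idx]
            applied = min(left, owed)
            if applied > 0:
                allocations.append((name, mode, applied))
            balances[idx][1] = owed - applied
            left -= applied
            if balances[idx][1] == 0:
                idx += 1  # item settled, move on to the next one
        if left > 0:
            unapplied[mode] = unapplied.get(mode, Decimal(0)) + left
    return allocations, unapplied


# Amounts taken from Fig 4 (payments) and Fig 5 (items) in the question.
items = [("A", "65.2"), ("B", "324.5"), ("C", "97.2"), ("D", "310"), ("E", "48.9")]
payments = [("Cash", "389.7"), ("Wallet", "97.2"), ("Bank", "310"), ("Check", "48.9")]
allocations, unapplied = apply_payments(items, payments)
for line in allocations:
    print(line)
```

With a different ordering of the items or payments, the same totals would be split differently, which is why the order has to be an agreed business rule.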
Fundamental Flaw
Your question is looking to take aggregated data (in your example, the Fig 2 Cash total of 389.7) and tease out what numbers were totaled to get the sum. You can do it here since 3 of the 4 numbers in Fig 2 are unique, one-to-one matches with numbers in Fig 1 - meaning the remaining ones have to belong to each other. But imagine 100s of numbers, many or most of them sums (i.e. not one-to-one matches like most of these). Or imagine an example as simple as yours except the numbers aren't so unique (e.g. Fig 1 = (10, 10, 10, 10, 20) and Fig 2 = (10, 20, 20, 10) - it is not possible to say which ones are which) and there only needs to be two possible combinations that could be responsible for a particular sum for the results to become ambiguous.
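The ambiguity can be checked by brute force. The sketch below is illustrative only: the function name matching_assignments and the mode names W, X, Y, Z are hypothetical. It tries every way of assigning each item to one payment mode and keeps the assignments whose per-mode sums equal the given totals. The first data set uses the amounts from the question; the second uses the (10, 10, 10, 10, 20) versus (10, 20, 20, 10) example.

```python
from decimal import Decimal
from itertools import product


def matching_assignments(items, totals):
    """Return every item -> mode assignment whose per-mode sums equal totals.

    items:  dict of item name -> Decimal amount
    totals: dict of payment mode -> Decimal total
    Exhaustive search, so only suitable for a handful of items.
    """
    names = list(items)
    modes = list(totals)
    found = []
    for choice in product(modes, repeat=len(names)):
        sums = {mode: Decimal(0) for mode in modes}
        for name, mode in zip(names, choice):
            sums[mode] += items[name]
        if sums == totals:
            found.append(dict(zip(names, choice)))
    return found


# Data from the question: exactly one assignment reproduces the totals.
question_items = {
    "A": Decimal("65.2"), "B": Decimal("324.5"), "C": Decimal("97.2"),
    "D": Decimal("310"), "E": Decimal("48.9"),
}
question_totals = {
    "Cash": Decimal("389.7"), "Wallet": Decimal("97.2"),
    "Bank": Decimal("310"), "Check": Decimal("48.9"),
}
unique = matching_assignments(question_items, question_totals)

# Less distinctive numbers: many different assignments fit equally well.
plain_items = {f"item{i}": Decimal(v) for i, v in enumerate([10, 10, 10, 10, 20], 1)}
plain_totals = {"W": Decimal(10), "X": Decimal(20), "Y": Decimal(20), "Z": Decimal(10)}
ambiguous = matching_assignments(plain_items, plain_totals)

print(len(unique), len(ambiguous))
```

When more than one assignment comes back, no query can decide between them without extra information in the data.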
The weakness is in Fig 2. Do you have any control over that data source? Can you grab the numbers upstream, before they are totaled?
Sorry for the negative conclusion but...
I hope this helps.
The Continuing Saga
Comment: [A version of this] report has already been made ...[but] I cannot contact the person who actually wrote that thing.
Perhaps he was also asked to do something that didn't make sense but did it anyway. The math simply doesn't work. He may have written something that finds as many one-to-one matches as it can and then sort of rolls the dice on the rest of it. He may have done something like the following:
Find and eliminate all the one-to-one matches.
Take any total and subtract any item amount from it to see if it matches any remaining item amount(s); if so, arbitrarily pick one and eliminate all three numbers.
Repeat this until all combinations have been tested.
But you are still potentially left with unmatched numbers, so you next need to test for sums of three numbers by:
Arbitrarily subtract any two item amounts from any of the remaining totals.
and so on and so on, followed by testing for sums of four items and so on.
I think part of what you're looking for is buried in here:
http://www.itprotoday.com/software-development/algorithms-still-matter
it calls it 'order fulfillment' where you go through transactions, combining them until you reach a given total
I think the solution will be in multiple parts, including cursors etc.
I'm not convinced you would be able to understand or implement any solution posted. Also, I maintain that there are cases where there are ambiguous solutions.
Lastly I see you have asked 16 questions and not marked a single one as answered.

SQL - Exclude four-digit-number

I have an SQL Server with a simple table on it.
ID | Code
---+--------
1 | 1234
2 | TEST
3 | 12556
4 | TEST1
5 | 5678
6 | WART
I want to exclude all four-digit numbers. In my case that would be 1234 and 5678.
I know I can use ISNUMERIC() to check if the code is numeric.
I also know I can use:
SELECT * FROM Codes WHERE code NOT LIKE '____';
to check if my value has four characters, but I don't see how to combine them.
Any suggestions?
Thanks in advance!
Just use not like:
where code not like '[0-9][0-9][0-9][0-9]'
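The sketch below contrasts the two patterns on the sample data. It is an illustration under assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, and because SQLite's LIKE has no [0-9] character classes, the digit pattern is written with SQLite's GLOB operator instead; on SQL Server the bracket form works directly in LIKE as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Codes (ID INTEGER, Code TEXT)")
conn.executemany(
    "INSERT INTO Codes VALUES (?, ?)",
    [(1, "1234"), (2, "TEST"), (3, "12556"), (4, "TEST1"), (5, "5678"), (6, "WART")],
)

# '____' matches ANY four characters, so TEST and WART are dropped as well.
not_four_chars = [
    code
    for (code,) in conn.execute(
        "SELECT Code FROM Codes WHERE Code NOT LIKE '____' ORDER BY ID"
    )
]

# Four digit classes match only values made of exactly four digits.
not_four_digits = [
    code
    for (code,) in conn.execute(
        "SELECT Code FROM Codes WHERE Code NOT GLOB '[0-9][0-9][0-9][0-9]' ORDER BY ID"
    )
]

print(not_four_chars)
print(not_four_digits)
```

Since the pattern itself requires every character to be a digit, no separate ISNUMERIC() check is needed.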

How should I design my table

I need to create a table for operating on data, which has been provided to me like this:
col1 col2 col3
1 < 3 50%
2 < 5 50%
3 < 10 50%
1 5>RC >=3 25%
2 10>RC >=5 25%
3 20>RC >=10 25%
1 >=5 0%
2 >=10 0%
3 >=20 0%
A user of the system would be passing a number, which is present in col2 and col1 above. Let's say that the user passed 7 for col2 and 1 for col1. Business requirement is that I should return the user the following row
1 >=5 0%
Roughly speaking, it means that I checked the value in col2, and noticed that it is >=5, which my input data fits.
I was thinking of splitting col2 across two columns - one for storing the number and the other for operator. Something like this:
col1 col2 col3 col4
1 3 50% <
2 5 50% <
3 10 50% <
1 5>RC >=3 25%
2 10>RC >=5 25%
3 20>RC >=10 25%
1 5 0% >=
2 10 0% >=
3 20 0% >=
This way, I will be able to write queries against the data in the first and last three rows (though I have not run any queries yet, I just did a dry run). What I have not been able to figure out so far is how to handle the data in rows 4, 5 and 6. You can ignore the RC part in those rows; I can certainly do away with it, as I am only concerned with the numeric range in my queries.
I tried splitting the data for rows 4,5,6 in 2 rows each, something like:
1 3 25% >=
1 5 25% <
2 5 25% >=
2 10 25% <
3 10 25% >=
3 20 25% <
But I see an immediate issue here when it comes to retrieving the data. Let's say that the user passed col2 = 7 AND col1 = 1. I should get only one row, that is, row number 7 in the first table in my question, but I also get an additional row (the 1st row in the last table, where I was splitting the data for the BETWEEN conditions).
Can anyone suggest a better approach for storing this data so that my requirement can be met?
SQLFiddle demo: http://www.sqlfiddle.com/#!4/d2d90/7
I suggest that you just split col2 into two columns, a lower and a higher bound, replacing a non-existent bound with, for example, NULL. It will look something like this:
+----+-------+-------+----+
|col1|col2_lb|col2_hb|col3|
+----+-------+-------+----+
|1 |NULL |3 |50% |
+----+-------+-------+----+
|2 |NULL |5 |50% |
+----+-------+-------+----+
|3 |NULL |10 |50% |
+----+-------+-------+----+
|1 |3 |5 |25% |
+----+-------+-------+----+
|... |... |... |... |
+----+-------+-------+----+
|1 |5 |NULL |0% |
+----+-------+-------+----+
Using this structure, you'll be able to find the needed row with a simple query:
SELECT *
FROM T_TABLE t
WHERE t.col1 = :VAL1
AND NVL(t.col2_lb,:VAL2) <= :VAL2
AND NVL(t.col2_hb,:VAL2+1) > :VAL2
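To see the lookup pick exactly one row per input, here is a small runnable sketch. It rests on assumptions: SQLite (via Python's sqlite3) stands in for Oracle, COALESCE replaces NVL, the helper name lookup is hypothetical, and the rows hidden behind "..." in the table above are filled in from the original data in the question, with the lower bound inclusive and the upper bound exclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_TABLE (col1 INTEGER, col2_lb INTEGER, col2_hb INTEGER, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO T_TABLE VALUES (?, ?, ?, ?)",
    [
        (1, None, 3, "50%"), (2, None, 5, "50%"), (3, None, 10, "50%"),
        (1, 3, 5, "25%"), (2, 5, 10, "25%"), (3, 10, 20, "25%"),
        (1, 5, None, "0%"), (2, 10, None, "0%"), (3, 20, None, "0%"),
    ],
)

# A NULL bound is replaced by a value that always passes the comparison,
# so open-ended ranges need no special case.
LOOKUP_SQL = """
SELECT col1, col2_lb, col2_hb, col3
FROM T_TABLE
WHERE col1 = :val1
  AND COALESCE(col2_lb, :val2) <= :val2
  AND COALESCE(col2_hb, :val2 + 1) > :val2
"""


def lookup(val1, val2):
    """Return the rows whose range [col2_lb, col2_hb) contains val2."""
    return conn.execute(LOOKUP_SQL, {"val1": val1, "val2": val2}).fetchall()


print(lookup(1, 7))  # the example from the question: col1 = 1, col2 = 7
print(lookup(1, 3))
print(lookup(1, 2))
```

Because every range is half-open, a boundary value such as 3 or 5 falls into exactly one row, which avoids the duplicate match described in the question.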