SQL Query to separate data into two fields - sql

I have data in one column that I want to separate into two columns. The data is separated by a comma if present. This field can have no data, only one set of data or two sets of data saperated by the comma. Currently I pull the data and save as a comma delimited file then use an FoxPro to load the data into a table then process the data as needed then I re-insert the data back into a different SQL table for my use. I would like to drop the FoxPro portion and have the SQL query saperate the data for me. Below is a sample of what the data looks like.
Store Amount Discount
1 5.95
1 5.95 PO^-479^2
1 5.95 PO^-479^2
2 5.95
2 5.95 PO^-479^2
2 5.95 +CA8A09^-240^4,CORDRC^-239^7
3 5.95
3 5.95 +CA8A09^-240^4,CORDRC^-239^7
3 5.95 +CA8A09^-240^4,CORDRC^-239^7
In the data above I want to sum the data in the amount field to get a gross amount. Then pull out the specific discount amount which is located between the carat characters and sum it to get the total discount amount. Then add the two together and get the total net amount. The query I want to write will separate the discount field as needed, see store 2 line 3 for two discounts being applied, then pull out the value between carat characters.

For SQL Server:
You can use ChardIndex(',',fieldname) in a sql statement to find the location of the comma and then Substring to parse out the first and second field.

For Oracle you can use a case statement like this in your select clause. Use one for each of the two discounts:
CASE WHEN LENGTH(foo.discount) > 0 AND INSTR(foo.discount,',') > 0 THEN
SUBSTR(foo.discount,1,INSTR(foo.discount,',',1,1)) ELSE foo.discount END AS discount_column_1

I finally figured out exactly how to separate the fields as I need them. Below is the code that breaks the discount field into two. I can now separate the fields as needed and insert the data separated into a temp table then use a similar set of code to pull out the exact amount enclosed by the carat characters. Thanks for the help in the two answers above. I used a combination of both to get exactly what I needed.
CASE LEN(X.DISCOUNT)-LEN(REPLACE(X.DISCOUNT,',',''))
WHEN 1 THEN SUBSTRING(X.DISCOUNT,1,CHARINDEX(',',X.DISCOUNT)-1)
ELSE X.DISCOUNT
END 'FIRST_DISCOUNT',
CASE LEN(X.DISCOUNT)-LEN(REPLACE(X.DISCOUNT,',',''))
WHEN 1 THEN SUBSTRING(X.DISCOUNT,CHARINDEX(',',X.DISCOUNT)+1,LEN(X.DISCOUNT)-CHARINDEX(',',X.DISCOUNT)+1)
ELSE ''
END 'SECOND_DISCOUNT'

This alternative solution uses LEFT and RIGHT functions for split the column.
select Store, Amount,
Discount1 = CASE
WHEN CHARINDEX(',',Discount) > 1 THEN LEFT(Discount, CHARINDEX(',',Discount)-1 )
ELSE Discount END,
Discount2 = CASE
WHEN CHARINDEX(',',Discount) > 1 THEN RIGHT(Discount, LEN(Discount) - CHARINDEX(',',Discount)-1 )
END
from #Temp

Related

Updating columns based on a combine rows value on the same table

Please assist if possible, I have used Stuff to combine rows into a single row based on other columns. However I want to turn each of the unique items into it's own column with a number showing if it exists, e.g. 1 or 0 and then doing the same for all subsequent rows?
I have been able to create the columns but I can't get them to update per whats in the one column.
But I want it to be dynamic so matter how many different names appear in categories it creates a new column and adds 1 or 0 if it appears or not
How about something like this for SQL Server?
strSQL = "SELECT Category, CASE WHEN Category IS NOT NULL THEN 1 ELSE 0 END AS IsCategoryExist FROM MyTable"
Sample data (the 2nd column shows as 1 if the first column is non-blank):
Cars, 1
[Blank], 0
Airplanes, 1
Radios, 1

How to combine a row of cells in VBA if certain column values are the same

I have a database where all of the input from the user (through a userform) gets stored. In the database, each column is a different category for the type of data (ex. date, shift, quantity, etc) and the data from the userform input gets put into its corresponding category. For some of the data, all the data is the same except for the quantity. I was wondering how I could combine these rows into one and add the quantities to each other for the whole database (ex. combining the first and third data entries). I have tried playing around with a couple different loops but can't seem to figure anything out.
Period Date Line Shift Type Quantity
4 x 2 4/3/18 A 3 14 18
4 x 2 4/3/18 A 3 13 12
4 x 2 4/3/18 A 3 14 15
Thank you!
If you're looking to modify the underlying database, you might be able to query the data into the format you want by including all the other columns in a GROUP BY statement, save the result to another table, then replace the original table with the properly formatted one.
If you have the data in Excel and you just want to view it with the duplicate rows summed, a Pivot Table would be a good choice. You can select all the other columns as rows for the Pivot Table and sum of Quantity as the values.

Get percentage of NULL for all columns in Hive

I would like to get the percentage of NULL values in a table in Hive. Is there an easy way to do this without having to enumerate all column names in the query? In this case there are about 50k rows and 20 columns. Thanks in advance!
Something like:
SELECT count(each_column) / count(*) FROM TABLE_1
WHERE each_column = NULL;
If you do this using code, you need to list the columns. Here is one way:
select avg(case when col1 is null then 1.0 else 0.0 end) as col1_null_p,
avg(case when col2 is null then 1.0 else 0.0 end) as col2_null_p,
. . .
from t;
If you take the list of columns in the table, you can readily construct the query in a spreadsheet.
The approach you need depends on the situation that you have:
For 20 fixed columns: Just type your query
For 200 fixed columns: Copy the column names to your favorite tool (excel) and build the query there
For n columns that may not be fixed: Write a script to generate your code
I once wrote a python script. I now don't have it at hand but it is quite easy to create with the following logic:
Query the first 1 (or 0?) rows of the table, to get all the headers.
Build the desired queries to generate column based statistics (like percentage of null values) and union the result
Then executed the query.
Of course it can be expanded to run for different tables, and statistics, but do realize that this may not scale well.
In my case I think I had to cut the query building in batches of 20 columns each time which would then be concatenated afterwards, because running it on 400 columns just generated a too complex query.

sql: select rows with multiple lines of data

In table A -> Column X there is some data which has numbers, alphabets and special characters. Most of the records has single line of data but some of them has 2 or 3 lines of data.
1 this is a sample description of data 01/11/2017 # 123'~
Records with two lines of data
1 this is a sample description
2 of data 22/11/2017 #~ 12##'
I need to do a select query to get the records which has 2 lines of data in Column X of table A.
I use TOAD and the above mentioned sample data is from the Grid popup editor
thanks
You could select those rows that contain a new line (do not know your sample data, either chr(10) or chr(13)):
select *
from tableA
where instr(columnX, chr(10)) > 0;
The solution is taken from this SO answer, please do not forget to upvote the linked solution if it helped you.

Multicriteria Insert/Update

I'm trying to create a query that will insert new records to a table or update already existing records, but I'm getting stuck on the filtering and grouping for the criteria I want.
I have two tables: tbl_PartInfo, and dbo_CUST_BOOK_LINE.
I'm want to select from dbo_CUST_BOOK_LINE based upon the combination of CUST_ORDER_ID, CUST_ORDER_LINE_NO, and REVISION_ID. Each customer order can have multiple lines, and each line can have multiple revision. I'm trying to select the unique combinations of each order and it's connected lines, but take the connected information for the row with the highest value in the revision column.
I want to insert/update from dbo_CUST_BOOK_LINE the following columns:
CUST_ORDER_ID
PART_ID
USER_ORDER_QTY
UNIT_PRICE
I want to insert/update them into tbl_PartInfo as the following columns respectively:
JobID
DrawingNumber
Quantity
UnitPrice
So if I have the following rows in dbo_CUST_BOOK_LINE (PART_ID omitted for example)
CUST_ORDER_ID CUST_ORDER_LINE_NO REVISION_ID USER_ORDER_QTY UNIT_PRICE
SCabc 1 1 0 100
SCabc 1 2 4 150
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 1 0 0
SCxyz 1 2 3 50
It would return
CUST_ORDER_ID CUST_ORDER_LINE_NO (REVISION_ID) USER_ORDER_QTY UNIT_PRICE
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 2 3 50
but with PART_ID included and without REVISION_ID
So far, my code is just for the inset portion as I was trying to get the correct records selected, but I keep getting duplicates of CUST_ORDER_ID and CUST_ORDER_LINE_NO.
INSERT INTO tbl_PartInfo ( JobID, DrawingNumber, Quantity, UnitPrice, ProductFamily, ProductCategory )
SELECT dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO, Max(dbo_CUST_BOOK_LINE.REVISION_ID) AS MaxOfREVISION_ID
FROM dbo_CUST_BOOK_LINE, tbl_PartInfo
GROUP BY dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO;
This has been far more complicated that anything I've done so far, so any help would be greatly appreciated. Sorry about the long column names, I didn't get to choose them.
I did some research and think I found a way to make it work, but I'm still testing it. Right now I'm using three queries, but it should be easily simplified into two when complete.
The first is an append query that takes the two columns I want to get distinct combo's from and selects them and using "group by," while also selecting max of the revision column. It appends them to another table that I'm using called tbl_TempDrop. This table is only being used right now to reduce the number of results before the next part.
The second is an update query that updates tbl_TempDrop to include all the other columns I wanted by setting the criteria equal to the three selected columns from the first query. This took an EXTREMELY long time to complete when I had 700,000 records to work with, hence the use of the tbl_TempDrop.
The third query is a basic append query that appends the rows of tbl_TempDrop to the end destination, tbl_PartInfo.
All that's left is to run all three in a row.
I didn't want to include the full details of any tables or queries yet until I ensure that it works as desired, and because some of the names are vague since I will be using this method for multiple query searches.
This website helped me a little to make sure I had the basic idea down. http://www.techonthenet.com/access/queries/max_query2_2007.php
Let me know if you see any flaws with the ideology!