Add two columns of a row in Apache Pig

Find the top 5 countries by the sum of bars and stripes in a flag.
The input is a comma-separated flag data file (mapreduce/flagdata.txt).
I tried the code below (code 1):
grunt> A =load 'mapreduce/flagdata.txt' using PigStorage(',') as (name: chararray, landmass: int, zon: int, area: int, population: int, language: int, religion: int, bars: int, stripes: int, colours: int, red: int, green: int, blue: int, gold: int, white: int, black: int, orange: int, mainhue: chararray, circles: int, crosses: int, saltires: int, quarters: int, sunstairs: int, crescent: int, triangle: int, icon: int, animate: int, text: int, topleft:chararray, botleft: chararray);
grunt> cnt = foreach A generate A.$0, (A.$7+A.$8); -- same output even when using column names such as A.name, A.bars
grunt> ord = order cnt by $1 desc;
grunt> lm = limit ord 5;
grunt> dump lm;
Actual output of code1:
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Scalar has more than one row in the output. 1st : (Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green), 2nd :(Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red)
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
Code2:
grunt> cnt = foreach A generate A::$0, (A::$7+A::$8) as total;
<line 6, column 28> Unexpected character '$'
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 6, column 28> Unexpected character '$'
grunt> cnt = foreach A generate A::name, (A::bars+A::stripes) as total;
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 6, column 25> Invalid field projection. Projected field [A::name] does not exist in schema: name:chararray,landmass:int,zon:int,area:int,population:int,language:int,religion:int,bars:int,stripes:int,colours:int,red:int,green:int,blue:int,gold:int,white:int,black:int,orange:int,mainhue:chararray,circles:int,crosses:int,saltires:int,quarters:int,sunstairs:int,crescent:int,triangle:int,icon:int,animate:int,text:int,topleft:chararray,botleft:chararray.
Expected output:
I need to display the names of the 5 countries with the largest sum of bars + stripes (the separate total column is just for reference).
While modifying the code above I get different outputs and sometimes errors such as "Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast." Please help me obtain the sum of two columns.

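If the datatype of bars and stripes is int, just use '+'. SUM operates on a bag, that is, on one column across many rows, not on two fields of the same row. Inside foreach A, refer to fields by name or position alone (name, $7): A.$0 is read as a scalar projection of the whole relation A, which is only valid when A has exactly one row, hence the "Scalar has more than one row" error. Also, there is no need to group if the country list is unique.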
cnt = foreach A generate name,(bars + stripes) as total;
ord = order cnt by $1 desc;
lm = limit ord 5;
dump lm;
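For readers less familiar with Pig, the same row-wise logic can be sketched in Python: add the two fields of each row, sort by the total in descending order, and keep the first n rows. This is only an illustration of what the Pig script computes. The function name top_countries is hypothetical, the Afghanistan and Albania values come from the rows shown in the error message, and the remaining rows are made up for the example.

```python
def top_countries(rows, n=5):
    """Return the n (name, bars + stripes) pairs with the largest totals."""
    # Row-wise addition: the equivalent of (bars + stripes) as total in Pig.
    totals = [(name, bars + stripes) for name, bars, stripes in rows]
    # Equivalent of "order cnt by $1 desc" followed by "limit ord n".
    return sorted(totals, key=lambda pair: pair[1], reverse=True)[:n]


# (name, bars, stripes)
sample_rows = [
    ("Afghanistan", 0, 3),  # from the error message in the question
    ("Albania", 0, 0),      # from the error message in the question
    ("Country-X", 2, 5),    # hypothetical row
    ("Country-Y", 3, 0),    # hypothetical row
]

for name, total in top_countries(sample_rows, n=3):
    print(name, total)
```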

Related

ERROR: extra data after last expected column, how to fix

I tried to create a table to import a CSV file, as you can see below, but this error came up. What do I do to fix it?
CREATE TABLE public.test_dataset (
id VARCHAR(255),
race_ethnicity VARCHAR(255),
sex CHAR(1),
date_of_svc DATE,
icd10_category VARCHAR(3),
billed_amt DOUBLE PRECISION,
allowed_amt DOUBLE PRECISION,
claim_status VARCHAR(255),
cpt INT
);
COPY public.test_dataset FROM 'C:\Users\marca\Downloads\Test dataset - SQL - Sep. 2021 - HCDA.csv' WITH CSV HEADER;
ERROR: extra data after last expected column
CONTEXT: COPY test_dataset, line 2: "A100B9111,african american,F,01/01/2020,F10, $350.00 , $250.00 ,Paid,99281,"
SQL state: 22P04

Conversion failed when converting the nvarchar value to data type int sql server

I have a table with a checkbox next to each row, and multiple selection is possible. After the user makes a selection, the ids of the selected rows arrive in an array. I convert the array to a string, send it to the stored procedure, and run the following statement:
Example value for @ResultsIds: 65, 66, 67, 68, 125
@ResultsIds nvarchar(250)
UPDATE MyTable SET [IsVerified] = 1 WHERE Id IN (@ResultsIds)
And I got this error:
Conversion failed when converting the nvarchar value '65, 66, 67, 68, 125' to data type int.
The [Id] column is of type int.
I tried the CAST and CONVERT functions, but they didn't work.
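SQL Server doesn't split a delimited string automatically: IN (@ResultsIds) compares Id against the one string value, so the whole string has to be converted to int, which fails. Assuming you're on SQL Server 2016 or later (compatibility level 130 or higher), you can use STRING_SPLIT: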
declare @ResultsIds nvarchar(250) = '65,66,67,68,125'
UPDATE MyTable
SET [IsVerified] = 1
WHERE Id IN (
    select [value]
    from string_split(@ResultsIds, ',')
)
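When the list is built in application code anyway, one alternative is to split the string in the caller and bind each id as its own parameter, so the database never sees a delimited string. The sketch below illustrates that idea in Python, with the standard-library sqlite3 module standing in for SQL Server. The function name verify_rows is hypothetical; the table and column names are taken from the question.

```python
import sqlite3


def verify_rows(conn, ids_csv):
    """Mark the rows whose ids appear in a comma-separated string as verified."""
    # int() tolerates surrounding whitespace, so "65, 66" parses cleanly.
    ids = [int(part) for part in ids_csv.split(",") if part.strip()]
    if not ids:
        return 0
    # One "?" placeholder per id; the values themselves are bound, not concatenated.
    placeholders = ",".join("?" for _ in ids)
    cursor = conn.execute(
        f"UPDATE MyTable SET IsVerified = 1 WHERE Id IN ({placeholders})",
        ids,
    )
    conn.commit()
    return cursor.rowcount
```

Calling verify_rows(conn, "65, 66, 67, 68, 125") issues a single UPDATE with five bound integer parameters.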
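declare @ResultsIds nvarchar(250) = '65,66,67,68,125'
-- IN (@ResultsIds) compares against the whole list as one string, so test membership with LIKE on a comma-wrapped list instead.
UPDATE MyTable SET [IsVerified] = 1 WHERE ',' + @ResultsIds + ',' LIKE '%,' + cast(Id as varchar(20)) + ',%'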
I solved the problem with a foreach loop. I split the string on commas, converted each number into an element of an int array, and then updated the rows one by one by calling the stored procedure inside the loop.
public void Verify(DB db, string rows)
{
    int[] nums = Array.ConvertAll(rows.Split(','), int.Parse);
    foreach (int value in nums)
    {
        DbCommand cmd = db.GetStoredProcCommand("VerifyProcess");
        db.AddInParameter(cmd, "@ResultId", DbType.Int32, value);
        db.ExecuteNonQuery(cmd);
    }
}

Split a column into multiple columns in a SQL query using pipe delimiter

I have the data below in one column of a table and want to split it into separate columns.
| is the separator in this scenario. The text before each : is the column header and the text after it is the value.
Column
-----------------------------------------------------------------------------
ID: 30000300 | Name: India | Use: New Use
ID: 30000400 | Name: Aus | New ID: 15625616 | Address 1: NEW Rd
ID: 30000400 | Name: USA | City: VIA ARAMAC | New ID: 123
ID: 30000500 | Name: Russia | New ID: 15624951 | Address 2: 2131 BEAUDESERT
Output should be:
ID        Name    Use      New ID    City        Address 1  Address 2        New City
--------------------------------------------------------------------------------------
30000300  India   New Use
30000400  Aus              15625616              NEW Rd
30000400  USA              15625616  VIA ARAMAC                              GALILEE
30000500  Russia           15624951                         2131 BEAUDESERT
You have several rows that contain key value pairs inside an nvarchar column, but you want a table that has a header based on the keys and then rows containing just the values, sans keys. There is first the issue of an input like Key1: Value1 | Key2: Value2. Should this be returned as
Key1 Key2
Value1 NULL
NULL Value2
or is this not a possible scenario? Either way, there is the issue of generating a table with dynamic column names.
The problem with your question is that this is not a scenario that would normally be solved in SQL. You should fetch the data in your programming language of choice, then use regular expressions or split methods to get what you need.
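As a rough illustration of the application-side approach, here is a minimal Python sketch that splits each line on |, splits each piece on the first :, and builds a header from the keys in the order they first appear. The function names parse_line and to_table are hypothetical, and missing values are represented as None, which is an assumption about how gaps should be shown.

```python
def parse_line(line):
    """Turn 'ID: 1 | Name: India' into {'ID': '1', 'Name': 'India'}."""
    pairs = {}
    for part in line.split("|"):
        # partition splits on the first ':' only, so values may contain colons.
        key, _, value = part.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


def to_table(lines):
    """Return (header, rows) where each row is aligned to the header."""
    records = [parse_line(line) for line in lines]
    header = []
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)
    rows = [[record.get(key) for key in header] for record in records]
    return header, rows


lines = [
    "ID: 30000300 | Name: India | Use: New Use",
    "ID: 30000400 | Name: Aus | New ID: 15625616 | Address 1: NEW Rd",
]
header, rows = to_table(lines)
print(header)
for row in rows:
    print(row)
```

Because the header is derived from the data, new keys simply become new columns, which is the part that is awkward to express in a static SQL query.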
If you insist on doing it in SQL, the solution is to turn the original input lines into another string containing a dynamic query, which you then run with sp_executesql (https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-executesql-transact-sql), but I do NOT recommend it.
Here is a partial answer that you can use to return the n-th entry in a delimited string:
DECLARE @DelimitedString VARCHAR(8000);
DECLARE @Delimiter VARCHAR(100);
DECLARE @indexToReturn INT;
DECLARE @tblArray TABLE
(
    ElementID INT IDENTITY(1, 1), -- Array index
    Element VARCHAR(1000) -- Array element contents
);
-- Local Variable Declarations
-- ---------------------------
DECLARE @Index SMALLINT,
        @Start SMALLINT,
        @DelSize SMALLINT;
SET @DelSize = LEN(@Delimiter + 'x') - 1;
-- Loop through source string and add elements to destination table array
-- ----------------------------------------------------------------------
WHILE LEN(@DelimitedString) > 0
BEGIN
    SET @Index = CHARINDEX(@Delimiter, @DelimitedString);
    IF @Index = 0
    BEGIN
        INSERT INTO @tblArray
        (
            Element
        )
        VALUES
        (LTRIM(RTRIM(@DelimitedString)));
        BREAK;
    END;
    ELSE
    BEGIN
        INSERT INTO @tblArray
        (
            Element
        )
        VALUES
        (LTRIM(RTRIM(SUBSTRING(@DelimitedString, 1, @Index - 1))));
        SET @Start = @Index + @DelSize;
        SET @DelimitedString = SUBSTRING(@DelimitedString, @Start, LEN(@DelimitedString) - @Start + 1);
    END;
END;
DECLARE @val VARCHAR(1000);
SELECT @val = Element
FROM @tblArray AS ta
WHERE ta.ElementID = @indexToReturn;
SELECT @val;

Conversion failed when converting the varchar to data type int

I have this query, and I need to save Pubname, ISBNname, copiesname and createdname as integers:
DECLARE @a Table(bkID INT, bkname varchar(100), bkpub INT, bkISBN INT, bkcopies INT, bkcreatedby INT)
INSERT INTO @a (bkID, bkname, bkpub, bkISBN, bkcopies, bkcreatedby)
SELECT A.nameID,
A.bkname,
CAST((B.Pubname) AS INT),
CAST((C.ISBNname) AS int),
CAST((D.copiesname) AS int),
CAST((F.createdname) AS int)
FROM #bkname AS A
LEFT JOIN #bkPub AS B
ON (A.nameID = B.PubID)
LEFT JOIN #bkISBN AS C
ON (A.nameID = C.ISBNID)
LEFT JOIN #bkcopies AS D
ON (A.nameID = D.copiesID)
LEFT JOIN #bkcraeted AS F
ON (A.nameID = F.craetedID)
It returns this error:
Msg 245, Level 16, State 1, Procedure LR_InsertBookArray, Line 46
[Batch Start Line 2] Conversion failed when converting the varchar
value ''3' to data type int.
It seems highly unlikely that columns called name are actually integers in disguise. My guess is that you simply want the ids from the reference table, but your question doesn't have enough information to know if that is true.
In the meantime, you can use TRY_CAST() (available in SQL Server 2012 and later):
TRY_CAST(B.Pubname AS int),
TRY_CAST(C.ISBNname AS int),
TRY_CAST(D.copiesname AS int),
TRY_CAST(F.createdname AS int)
This will return NULL if the values cannot be converted -- but it avoids the error.
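The "return NULL instead of raising an error" behaviour can be pictured with a small Python analogue. This is only a sketch of the idea, not a reimplementation of TRY_CAST's exact conversion rules; the function name try_cast_int is hypothetical, and None plays the role of NULL.

```python
def try_cast_int(value):
    """Return value converted to int, or None when it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers strings such as "'3".
        return None


# The value from the error message has a stray leading quote.
for raw in ["3", "'3", None, " 42 "]:
    print(repr(raw), "->", try_cast_int(raw))
```

Rows that come back as None are the ones worth inspecting in the source tables, since they show which values are not clean integers.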

Recursive SQL query that finds path

I am trying to solve this problem. I am given a database table such as:
create table file(id int, parentid int, name varchar(1024), size int, type char(1));
I have to write a single (recursive) database query to list the FULL PATH of all files (assume type is either "F" or "D", for file or directory). The query should give output similar to the Unix command "find . -type f".
I wrote something like the query below, but I am not sure it is what the question asks for, since I am inexperienced with Unix. Any help would be greatly appreciated. Thanks.
WITH RECURSIVE search_path (path_ids, length, is_visited) AS
(
SELECT
ARRAY[node_id, destination_node_id],
link_length,
node_id = destination_node_id
FROM
node_links_view
UNION ALL
SELECT
path_ids || d.destination_node_id,
f.length + d.link_length,
d.destination_node_id = ANY(f.path_ids)
FROM
node_links_view d,
search_path f
WHERE
f.path_ids[array_length(path_ids, 1)] = d.node_id
AND NOT f.is_visited
)
SELECT * FROM search_path
WHERE path_ids[1] = 1 AND path_ids[array_length(path_ids, 1)] = 6
ORDER BY length;
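For comparison, a recursive query over the file table itself might look like the sketch below, written in Python with the standard-library sqlite3 module so it can be run as-is. It rests on assumptions the question does not state: root entries have parentid NULL, path components are joined with '/', and the function name list_file_paths is hypothetical. The recursive part starts from the roots and appends each child's name to its parent's path; the final SELECT keeps only rows of type 'F', which is what "find . -type f" prints.

```python
import sqlite3

# Anchor: root entries (assumed to have parentid NULL).
# Recursive step: join each child to its parent's accumulated path.
FILE_PATHS_QUERY = """
WITH RECURSIVE tree(id, type, path) AS (
    SELECT id, type, name
    FROM file
    WHERE parentid IS NULL
    UNION ALL
    SELECT f.id, f.type, tree.path || '/' || f.name
    FROM file AS f
    JOIN tree ON f.parentid = tree.id
)
SELECT path FROM tree WHERE type = 'F' ORDER BY path;
"""


def list_file_paths(conn):
    """Return the full path of every file (type 'F') in the file table."""
    return [row[0] for row in conn.execute(FILE_PATHS_QUERY)]
```

With a root directory named '.', a subdirectory src containing main.c, and a top-level README, the function returns ./README and ./src/main.c.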