PIG - Converting elements of bags into fields - apache-pig

I have a Pig query with the following output (one row):
(6,{(6,76,35,1565),(6,76,76,920),(6,35,76,906),(6,177,35,822),(6,268,35,720),(6,35,177,701),(6,35,268,694),(6,35,35,656),(6,85,85,611),(6,35,90,559)})
I would like to turn each element of the bag into its own field, like this:
(6,(6,76,35,1565),(6,76,76,920),(6,35,76,906),(6,177,35,822),(6,268,35,720),(6,35,177,701),(6,35,268,694),(6,35,35,656),(6,85,85,611),(6,35,90,559))
so that I can give every field its own name: x1, x2, x3, ...
I tried flattening but that made one row for each element of the bag:
6,(6,76,35,1565)
6,(6,76,76,920)
6, (6,35,76,906)
And I want all the elements to remain in one single row.
Any ideas?

Try the following links; they may be helpful:
1. How to convert fields to rows in Pig?
2. write array from pig

You will have to use BagToTuple. Assuming you have a relation A with 2 fields, refer to them positionally inside the FOREACH:
B = FOREACH A GENERATE $0, FLATTEN(BagToTuple($1));
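Note that the result may not look exactly like the nested form in the question: BagToTuple un-nests every tuple in the bag into one tuple, and FLATTEN then promotes that tuple's fields to top-level columns, so the row comes out fully flat. Here is a rough Python model of that behaviour (this is not Pig code; the helper name bag_to_tuple and the shortened sample row are made up for illustration):

```python
def bag_to_tuple(bag):
    """Model of BagToTuple: un-nest every tuple in the bag into one flat tuple."""
    return tuple(field for tup in bag for field in tup)


# Shortened version of the row from the question: (key, bag of tuples).
row = (6, [(6, 76, 35, 1565), (6, 76, 76, 920), (6, 35, 76, 906)])

# Model of GENERATE $0, FLATTEN(BagToTuple($1)): the key followed by every
# field of the un-nested tuple, all in a single row.
flat_row = (row[0],) + bag_to_tuple(row[1])
print(flat_row)
```

Each value ends up as its own column, so names such as x1, x2, x3 would apply to individual numbers rather than to whole tuples.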

Related

Python selenium get table values into List of Lists

I'm just trying to get the data from this table:
https://www.listcorp.com/asx/sectors/materials
and put all the values (the TEXT) into a list of lists.
I've tried many different methods (XPath, class name, By.tagName):
------------
rws = driver.find_elements_by_xpath("//table/tbody/tr/td")
---------------
table = driver.find_element_by_class_name("v-datatable v-table theme--light")
--------------
findElements(By.tagName("table"))
--------------
# to identify the table rows
l = driver.find_elements_by_xpath("//*[@class= 'v-datatable.v-table.theme--light']/tbody/tr")
# get the row count with len()
print(len(l))
# THIS RETURNS '1', which can't be right because there are hundreds of rows
Nothing seems to get the values in an easy-to-understand manner.
(EDIT: SOLVED)
Before applying the solution below, first call time.sleep(10). This gives the page time to load so that the table can actually be retrieved. Then append all the cells to a new list. You will need multiple lists (one per row) to hold all the rows.
So basically you can use find_elements_by_tag_name, with this code:
# Note: Selenium 4.3+ removed these helpers; use driver.find_elements(By.TAG_NAME, "tr") there.
row = driver.find_elements_by_tag_name("tr")
data = driver.find_elements_by_tag_name("td")
print('Rows --> {}'.format(len(row)))
print('Data --> {}'.format(len(data)))
for value in row:
    print(value.text)
Make sure you wait properly for the data to be populated first.

Presto - Return 1 element of a row of an array

I have a table with a nested JSON array in one column (columnname). Each element is made up of 5 parts (col1, col2, col3, col4, col5), and there are a number of "rows". col5 is the row number. I am trying to extract col3 for row 1.
A colleague of mine suggested I use element_at(columnname, 1), which returns the whole json string for that row of data, but I want to extract one part of that data. I cannot find how to extract one part of that json string from what I have.
Is there a way to extract col3?
Found it: element_at(columnname, 1).col1 returns col1 of the first row; use .col3 in the same way to get col3.

Using to_datetime several columns names

I am working with several CSVs in which the first N columns hold general information and the next M columns (M is large) each correspond to a date.
This is the dataframe picture
I need to convert only the names of columns N+1 through N+M-1 to datetime format.
I tried the following. In this case N+1 = 5, whatever M is, and I assumed I could use -1 to leave the last column name unaffected.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
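That approach is not feasible, because a pandas Index is immutable and cannot be assigned to in place. Please try it this way:
import pandas as pd

def convert(x):
    # Return the name parsed as a datetime, or unchanged if it cannot be parsed.
    try:
        return pd.to_datetime(x)
    except Exception:
        return x

# Build the full list of new names and assign it in one go.
ContDiarios.columns = [convert(c) for c in ContDiarios.columns]
Or you can also use the df.rename method, e.g. ContDiarios.rename(columns=convert), to convert them.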
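Another option, closer to the original attempt, is to convert only the slice of names you care about and then assign the whole list back. This avoids relying on to_datetime failing for the non-date names. A minimal sketch, assuming a made-up frame with two information columns and a trailing column to leave alone (df, n_info and the column names are hypothetical):

```python
import pandas as pd

# Hypothetical frame: two info columns, two date-named columns, one final column.
df = pd.DataFrame(
    [[1, "a", 10, 20, 30]],
    columns=["id", "name", "2020-01-01", "2020-01-02", "total"],
)

n_info = 2  # number of leading information columns (an assumption for this example)

# An Index is immutable, so copy the names into a plain list first.
new_columns = list(df.columns)

# Replace only the date-like slice; the first n_info names and the last stay as-is.
new_columns[n_info:-1] = pd.to_datetime(df.columns[n_info:-1])

# Assigning a complete list of names is allowed.
df.columns = new_columns
print(df.columns.tolist())
```

The column index ends up holding a mix of strings and Timestamp objects, which pandas stores as an object-dtype Index.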

Dividing values from 2 different datasets

I am trying to divide two fields from two different datasets. The expression also uses a Lookup. For some reason the lookup part of the expression is evaluated, but the division is not. Any ideas?
=IIF(Fields!PACKSHORT_DESC.Value = "EA",(LOOKUP(TRIM(Fields!PRODUCT_CODE.value), TRIM(Fields!item.value),Fields!tcost.value,"Cost")/Fields!NO_OF_EACHES.Value),(LOOKUP(TRIM(Fields!PRODUCT_CODE.value), TRIM(Fields!item.value),Fields!tcost.value,"Cost")))
First, output the two numbers you are trying to divide on their own to check that they are coming through correctly. Then assign them names and divide those instead.

How to access columns by their names and not by their positions?

I have just tried my first SQLite SELECT statement and got a result (an iterator over tuples). In other words, every row is represented by a tuple, and I can access the values in its cells like this: r[7] or r[3] (the value from column 7 or column 3). But I would like to access columns by name rather than by position. For example, I would like to get the value in the column user_name. How can I do that?
I found the answer to my question:
cursor.execute("PRAGMA table_info(tablename)")
# Each returned row describes one column: (cid, name, type, notnull, dflt_value, pk)
print(cursor.fetchall())
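If the goal is to read values by column name straight from each result row, the standard-library sqlite3 module also offers sqlite3.Row as a row factory. A minimal sketch, using an in-memory database and a made-up users table purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name

# Hypothetical table and data, just to have something to query.
conn.execute("CREATE TABLE users (id INTEGER, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))

row = conn.execute("SELECT id, user_name FROM users").fetchone()

name_by_key = row["user_name"]  # access by column name
name_by_pos = row[1]            # positional access still works
column_names = row.keys()       # list of column names in the result

print(name_by_key, name_by_pos, column_names)
conn.close()
```

Setting row_factory on the connection affects every cursor created from it afterwards, so existing positional code such as r[7] should keep working.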