Can applying a Boolean mask data frame to another data frame with Boolean values be faster than searching a column for a particular value? - pandas

Which is faster?
Applying a 100x100 pandas data frame of Boolean values as a mask to another data frame of the same shape,
or...
Searching a 100-row column for a particular cell containing a string that is, say, 1,000 characters long and includes numbers and symbols?
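A quick sketch of the two operations being compared (the frame contents and variable names here are made up to match the description; timing both on your actual data, e.g. with %timeit, is the only reliable way to settle which is faster):

import numpy as np
import pandas as pd

# 100x100 frame of values and a same-shaped Boolean mask
values = pd.DataFrame(np.random.rand(100, 100))
mask = pd.DataFrame(np.random.rand(100, 100) > 0.5)
masked = values[mask]  # Boolean masking: vectorized, NaN where the mask is False

# 100-row column of 1000-character strings, one of which is the target
needle = "target-" + "x" * 993
column = pd.Series(["x" * 1000] * 99 + [needle])
hits = column == needle  # equality search over the column

Both are vectorized in pandas, but the string comparison also pays per-character comparison costs on long strings, so the answer depends on the data and is best measured.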

Related

Updating data frame specific column values in a row using loop

I have a data frame with 2 columns, imagename and its class. Now I want to add the corresponding features extracted from each image. This feature is a list of 10 entries (LBP). Below is the data frame.
How can I update these A->J columns?
I have tried using dataframe.loc with the image name as input, but it raises an error:
Dataset.loc('ISIC_0028714')=Feature_vector
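A minimal sketch of a working assignment, assuming imagename is set as the index, the feature columns are named A through J, and Feature_vector is a list of 10 numbers (all assumptions, since the actual frame isn't shown). The key fixes are that .loc takes square brackets rather than parentheses, and the assignment needs the target columns on the left-hand side:

import pandas as pd

# Toy stand-in for the real Dataset: imagename as index, plus feature columns A..J.
Dataset = pd.DataFrame({"imagename": ["ISIC_0028714"], "class": ["class1"]}).set_index("imagename")
for col in "ABCDEFGHIJ":
    Dataset[col] = pd.NA

Feature_vector = list(range(10))  # stand-in for the 10 LBP features

# Square brackets, the row label, and the target columns:
Dataset.loc["ISIC_0028714", list("ABCDEFGHIJ")] = Feature_vector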

how to save large decimal data from dataframe to csv file

My data frame consists of 13,977 rows and 1,280 columns, and each cell holds a decimal number with at least 11 digits. When I convert the data frame to a CSV file, the whole decimal number is not stored; it is saved only up to some of the decimal places.
For example:
0.1336052119731903 is one of the numbers in the data frame, but in the CSV file it is stored as 0.133605211.
Pandas by default has a limit on column width. If a value is longer than the limit, the data stored in the column gets truncated.
You can bypass the setting like this:
pd.set_option("display.max_colwidth", 10000)
Cheers!
Change the type of column to a string before exporting the data frame to a CSV file.
df['col'] = df['col'].map(str)
df.to_csv("filename.csv")
If you are opening the CSV file in Excel, you may have to custom format the cell for all the digits to be displayed.
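If the truncation is happening on the pandas side rather than in Excel's display, another option (a sketch, not taken from the answers above) is to pass float_format to to_csv so the precision is written explicitly:

import pandas as pd

df = pd.DataFrame({"col": [0.1336052119731903]})
# Write 16 decimal places instead of relying on the default float formatting.
df.to_csv("filename.csv", float_format="%.16f")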

Why do SQL varchars (256) not get populated to my flat file in SSIS package? [duplicate]

This question already has an answer here:
Failing to read String value from an excel column
(1 answer)
Closed 3 years ago.
I have an SSIS package which sources from an Excel file, performs a lookup in SQL, and then writes the fields from the lookup to a flat file. For some reason, any of the fields in the SQL table that are of data type varchar(256) are not getting written. They are coming in as nulls. My other fields, including varchar(255), are coming across fine. I have tried flat file and Excel as the destination with no luck.
I've tried converting the varchar with a data conversion to both 256 and to a Unicode string, with no luck.
Even when I preview a simple query in the source component (ex: select lastname from xyz), the preview shows the lastname as null. It doesn't show other fields that have different data types as nulls.
This is usually the case when the Excel driver only reads the first 8 rows of data and misinterprets the correct data type because of the limited data it checks. Here are some of the known issues from the Microsoft site: Reference
Issues with importing
Empty rows
When you specify a worksheet or a named range as the source, the driver reads the contiguous block of cells starting with the first non-empty cell in the upper-left corner of the worksheet or range. As a result, your data doesn't have to start in row 1, but you can't have empty rows in the source data. For example, you can't have an empty row between the column headers and the data rows, or a title followed by empty rows at the top of the worksheet.
If there are empty rows above your data, you can't query the data as a worksheet. In Excel, you have to select your range of data and assign a name to the range, and then query the named range instead of the worksheet.
Missing values
The Excel driver reads a certain number of rows (by default, eight rows) in the specified source to guess at the data type of each column. When a column appears to contain mixed data types, especially numeric data mixed with text data, the driver decides in favor of the majority data type, and returns null values for cells that contain data of the other type. (In a tie, the numeric type wins.) Most cell formatting options in the Excel worksheet do not seem to affect this data type determination.
You can modify this behavior of the Excel driver by specifying Import Mode to import all values as text. To specify Import Mode, add IMEX=1 to the value of Extended Properties in the connection string of the Excel connection manager in the Properties window.
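For example, the connection string of an ACE-based Excel connection manager with Import Mode enabled might look like this (the provider version and file path are placeholders):

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\files\source.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";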
Truncated text
When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated.
To import data from a memo column without truncation, you have two options:
Make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters
Increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the following registry key:
Redistributable components version and its registry key:
Excel 2016: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel
Excel 2010: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
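As a sketch, the sampled row count can be raised from an elevated command prompt (using the Excel 2016 key above; TypeGuessRows is a DWORD, and a value of 0 is documented to make the driver sample all rows, up to 16,384):

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel" /v TypeGuessRows /t REG_DWORD /d 0 /f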

Query in Google Spreadsheet

I made a simple query function like this: =QUERY(range,"select *",1) in my Google Spreadsheet, but the results do not show any letters; it shows only fields that contain numbers.
Study the QUERY documentation:
Syntax
QUERY(data, query, [headers])
data - The range of cells to perform the query on.
Each column of data can only hold boolean, numeric (including date/time types) or string values.
In case of mixed data types in a single column, the majority data type determines the data type of the column for query purposes. Minority data types are considered null values.
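For example, with names in column A and scores in column B under a header row (hypothetical ranges and column meanings):

=QUERY(A1:B10, "select A, B where B > 50", 1)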
Just format your range as plain text using the Format > Number > Plain text option.

How to query the presence of an element inside a Spark Dataframe Column that contains a set?

I have a Spark dataframe where one column has the type Set<text>.
This column contains a set of strings, for example ["eenie","meenie","mo"].
How do I filter the contents of the whole dataframe so that
I only get those rows that (for example) contain the value eenie in the set?
I'm looking for something similar to
dataframe.where($"list".contains("eenie"))
The example shown above is only valid when the content of column list is a string, not a Set. What alternatives are there to fit my circumstances?
Edit: My question is not a duplicate. The user in that question has a set of values and wants to know which ones are located inside a specific column. I have a column that contains a set, and I want to know if a specific value is part of the set. My approach is the opposite of that.
Try:
import org.apache.spark.sql.functions.array_contains
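// array_contains(column, value) builds a Boolean column that is true where the array holds the value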
dataframe.where(array_contains($"list", "eenie"))
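Note: array_contains operates on array-typed columns (a set column is typically surfaced as an array in Spark SQL) and returns a Boolean column, which is why it can be passed straight to where/filter.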