The mechanism of auto-inserting rows in a pandas DataFrame when selecting rows by index - pandas

I noticed that rows get inserted automatically when selecting rows by index. To illustrate, I use the following code:
I have two questions (maybe they are the same):
Is there any documentation about this mechanism? (I have looked but cannot find it in the very long official documentation.)
How can I avoid the auto inserting? For example, I want the last line of code to return only the 'a' row.
Thank you very much in advance!

I have not seen any documentation. It looks like an unintended artifact. I can think of some clever things to do with it but I wouldn't trust it.
Workaround:
df1.loc[pd.Index([1, 'a']).intersection(df1.index), :]
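To make the workaround concrete, here is a minimal sketch. The original code from the question is not shown, so the frame `df1`, its `value` column and the `wanted` labels below are made-up assumptions; only the `pd.Index(...).intersection(df1.index)` pattern comes from the answer.

```python
import pandas as pd

# Hypothetical frame: the question's own code is not shown in the post.
df1 = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Labels we would like to select; "z" does not exist in df1.index.
wanted = ["z", "a"]

# Keep only the labels that are actually present, then select with .loc.
present = pd.Index(wanted).intersection(df1.index)
result = df1.loc[present, :]

print(result)
```

Because the missing label is filtered out before `.loc` sees it, no extra row can be created for it. As far as I know, recent pandas versions raise a `KeyError` for missing labels in a list instead of inserting rows, so the same filter also avoids that error.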

SQL Best way to store "Check All That Apply" in survey

I have a survey that asks about availability via check boxes, like so:
I am available (please check all that apply):
[] Early Mornings
[] Mid Mornings
[] Early Afternoons
[] Mid Afternoons
[] Evenings
[] Late Evenings
[] Overnight
I need to translate that into a SQL database. My question is: what is the best way to store this data in one column? I was thinking of a 7-digit bit string like 0010001 (indicating the candidate is only available during Early Afternoons and Overnight). Is there a better way? Thanks for any opinions!
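For reference, the 7-digit bit string idea can be sketched like this. The function names `encode` and `decode` are made up for illustration; the option order is the order of the check boxes in the survey, with the leftmost digit standing for Early Mornings.

```python
# Option order follows the check boxes in the survey, left to right.
OPTIONS = [
    "Early Mornings",
    "Mid Mornings",
    "Early Afternoons",
    "Mid Afternoons",
    "Evenings",
    "Late Evenings",
    "Overnight",
]

def encode(selected):
    """Turn a set of option labels into a 7-character string of 0s and 1s."""
    return "".join("1" if option in selected else "0" for option in OPTIONS)

def decode(bits):
    """Turn a 7-character bit string back into the list of selected labels."""
    return [option for option, bit in zip(OPTIONS, bits) if bit == "1"]

bits = encode({"Early Afternoons", "Overnight"})
labels = decode(bits)
print(bits, labels)
```

Note that the stored value is only meaningful together with the option order, which is the readability concern raised in the answers.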
Use a separate table for the options and a "join table" linking options to candidates. The other solutions/suggestions will impede data integrity and performance in a relational database. If you've got another kind of DB it might be different, but don't do anything other than the relational table if you're using SQL.
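A minimal sketch of the options table plus join table, using Python's built-in `sqlite3` so it runs anywhere. The table and column names (`candidate`, `time_slot`, `candidate_time_slot`) and the sample candidate are assumptions for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per entity plus a join table between them.
conn.executescript("""
CREATE TABLE candidate (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE time_slot (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE candidate_time_slot (
    candidate_id INTEGER NOT NULL REFERENCES candidate(id),
    time_slot_id INTEGER NOT NULL REFERENCES time_slot(id),
    PRIMARY KEY (candidate_id, time_slot_id)
);
""")

slots = ["Early Mornings", "Mid Mornings", "Early Afternoons",
         "Mid Afternoons", "Evenings", "Late Evenings", "Overnight"]
conn.executemany("INSERT INTO time_slot (label) VALUES (?)", [(s,) for s in slots])
conn.execute("INSERT INTO candidate (id, name) VALUES (1, 'Example Candidate')")

# Candidate 1 ticked two boxes: one join-table row per ticked box.
conn.execute("""
INSERT INTO candidate_time_slot (candidate_id, time_slot_id)
SELECT 1, id FROM time_slot WHERE label IN ('Early Afternoons', 'Overnight')
""")

# Who is available overnight?
rows = conn.execute("""
SELECT c.name
FROM candidate AS c
JOIN candidate_time_slot AS cts ON cts.candidate_id = c.id
JOIN time_slot AS t ON t.id = cts.time_slot_id
WHERE t.label = 'Overnight'
""").fetchall()
available = [row[0] for row in rows]
print(available)
```

With this layout, adding an eighth option is one new row in `time_slot` rather than a schema or encoding change.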
Pipe delimited flags.
Make the column a fairly wide text column, then store:
'Early Mornings|Evenings|Overnight'
if those 3 choices were selected.
(Note: I do agree with the other answer that it is likely better to use separate columns, but this is how I'd do it if there were a good reason to want just 1 column)
Is there any particular reason the results need to be stored in one column? If so, your solution is probably the best way. EDIT: If you are going to be querying this data, your solution is the best way; otherwise follow the other answer and use "|" to separate the strings in one long varchar field, though anyone looking at that data is going to have no clue what it means unless they've taken the time to memorize each question in order.
If it doesn't need to be all in one column I'd recommend just creating a column for each question with a bit value similar to what you already want to do.

Add and delete List items

The university example explains how to add and delete items of a map:
(departments composeLens at("Physics")).set(Some(physics))(uni)
(departments composeLens at("History")).set(None)(uni)
This does not work with Lists, though:
(lecturers composeOptional index(2)).set(Lecturer("New", "Lecturer", 50))(dep)
(lecturers composeOptional index(0)).set(None)(dep)
Adding does nothing, and deleting fails with a compilation error.
Edit: By now, I use quicklens, which is able to modify sequences.
Since there is no explicit question in the OP, I will try answering a couple of possible questions:
Why do the first pair of lines work, and the second pair does not?
The answer is given in "What is the difference between at and index? When should I use one or the other?" halfway across the page:
In other words, index can update any existing values while at can also insert and delete.
How can I add/remove items to/from a List?
Just below the text quoted above:
Since index is weaker than at, we can implement an instance of Index on more data structure than At. For instance, List or Vector only have an instance of Index because there is no way to insert an element at an arbitrary index of a sequence.
So it may not be possible...
I have no Monocle here to test a few things, though.
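The difference between at and index quoted above can be illustrated with a rough Python analogy. This is not Monocle's API; `at_set` and `index_set` are made-up helper names, and the sample values are placeholders.

```python
from typing import Dict, List, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")

def at_set(mapping: Dict[K, V], key: K, value: Optional[V]) -> Dict[K, V]:
    """Analogue of at(key).set(...): a value inserts or overwrites, None deletes."""
    updated = dict(mapping)
    if value is None:
        updated.pop(key, None)
    else:
        updated[key] = value
    return updated

def index_set(items: List[V], i: int, value: V) -> List[V]:
    """Analogue of index(i).set(v): replaces an existing element, otherwise a no-op."""
    if 0 <= i < len(items):
        return items[:i] + [value] + items[i + 1:]
    return list(items)

departments = {"History": "history-dept"}
with_physics = at_set(departments, "Physics", "physics-dept")  # inserts a new key
without_history = at_set(with_physics, "History", None)        # deletes a key

lecturers = ["first", "second"]
unchanged = index_set(lecturers, 2, "new")   # index 2 does not exist: nothing happens
replaced = index_set(lecturers, 0, "new")    # index 0 exists: element is replaced
```

This mirrors the behaviour in the question: setting at a non-existent list index silently does nothing, and there is no "delete" because the index-style setter takes a plain value rather than an optional one.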

.Net Parsing Fixed Width Data... From a Concatenated, Single, Fixed-Width Column

I was bored and looking at old code that runs like molasses on a cold day. I found that a group of tables in our accounting system, each with 500,000 records of ~20 data points, uses a single column of concatenated, fixed-width values instead of separate columns. (Fixing the tables isn't an option.) An old .NET ETL project grabs all records, does a bunch of substrings on each record to set an object's corresponding attributes, then sends the object to merge with production data via a stored proc.
The way it is working is fine. It works. And, to be perfectly honest, I doubt I'll be given the go-ahead to fix it even if I come up with a better solution, but I was curious to see if anyone knew of a better way of doing this, because it's not entirely unlikely that I'll face a situation like this in the future.
I was thinking that if there was a way to use the TextFieldParser to parse a static string instead of a file/stream that might be a valid idea. Or, instead, I could write the entire table to a text file and then use the TextFieldParser to send data to the SProc. http://www.dotnetperls.com/textfieldparser does show that TextFieldParser is quite a bit faster than split, which I would assume is tantamount to the string manipulation our project is currently doing with substring. So there may be something to that idea.
Or perhaps the whole, old project should be dumped for a shiny new SSIS project. Would it also have to write the records to a flat file before importing into SQL? Or can it import directly from the table?
Thank you in advance!
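For comparison, the substring-per-field parsing described above can be sketched with the field offsets computed once up front. This is written in Python only for illustration; the layout (field names and widths) and the sample record are hypothetical, and whether the same idea is any faster in the .NET project would need measuring.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, width in characters).
LAYOUT: List[Tuple[str, int]] = [("account", 6), ("date", 8), ("amount", 10)]

def build_slices(layout: List[Tuple[str, int]]) -> List[Tuple[str, slice]]:
    """Compute each field's start/end offsets once, instead of per record."""
    slices = []
    start = 0
    for name, width in layout:
        slices.append((name, slice(start, start + width)))
        start += width
    return slices

SLICES = build_slices(LAYOUT)

def parse_record(record: str) -> Dict[str, str]:
    """Cut one fixed-width string into named, whitespace-trimmed fields."""
    return {name: record[field].strip() for name, field in SLICES}

# Build a sample record that matches the hypothetical layout (6 + 8 + 10 chars).
record = "A00012" + "20240131" + "125.50".rjust(10)
parsed = parse_record(record)
print(parsed)
```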

Excel to Sql Server table data loading issue

I am working on data import from Excel to Sql Table using SSIS.
My problem is that some string values are replaced by NULL values (because the first 8 records contain only numeric values). I have already tried appending IMEX=1 to the connection string, but the problem persists, and I don't want to tamper with the registry as recommended in a few articles.
Can you suggest a resolution for the case where a column contains a string value after the first 8 records in Excel, so that the original data still ends up in the DB?
I am looking for a good workaround, knowing that this seems to be a standard issue.
See my answer to a similar question about how to fix the metadata of an excel source after the fact.
https://stackoverflow.com/a/13459855/236348
Also try this:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.
from this page: http://www.connectionstrings.com/excel
In Windows 7 this key is at:
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
TypeGuessRows has a range of 0 for all or 1-16 for number of rows to scan. Set as appropriate for your application.
I found a temporary workaround to the problem. Posting it here in case somebody finds it useful someday...
Solution:
1) In the connection string, use IMEX=1.
2) Set First Row as Header = FALSE.
3) Now use the Data Flow task to import the data from Excel to SQL, but only after eliminating the first row (which is the header row) using a Conditional Split.
This solution ensures that even if there is no alphanumeric value within the first 8 data rows, the header, being alphanumeric, makes the JET/ACE connector detect the data type as string (DT_STR). This solves the issue of NULLs being inserted in between.
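Steps 1 and 2 above come down to two settings in the connection string's Extended Properties. A small sketch that assembles such a string is below; the helper name `excel_connection_string` and the file path are assumptions, and the Jet 4.0 / "Excel 8.0" provider values apply to .xls files, so they would differ for the ACE provider.

```python
def excel_connection_string(path: str, header: bool = False, imex: int = 1) -> str:
    """Build a Jet OLE DB connection string for an .xls workbook.

    header=False maps to HDR=NO (first row treated as data, step 2);
    imex=1 asks the driver to read mixed-type columns as text (step 1).
    """
    hdr = "YES" if header else "NO"
    extended = f"Excel 8.0;HDR={hdr};IMEX={imex}"
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={path};"
        f'Extended Properties="{extended}";'
    )

# Hypothetical path, for illustration only.
conn_str = excel_connection_string(r"C:\data\input.xls")
print(conn_str)
```

With HDR=NO the header text is read as the first data row, which is why step 3 has to drop that row again in the data flow.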
For details on the issue with Excel as a source and a possible solution using IMEX=1, please refer to:
URL 1: http://microsoft-ssis.blogspot.in/2011/06/mixed-data-types-in-excel-column.html
URL 2: http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/1b9020ec-616c-42e2-99c0-18f1258ff5db
Thanks,
Justin Samuel.

How to get multi row data of one column to one row of one Column

I need to combine the values from multiple rows of one column into a single row.
For example, from this format:
ID Interest
Sports
Cooking
Movie
Reading
to this format:
ID Interest
Sports,Cooking
Movie,Reading
I wonder whether we can do that in MS Access SQL. If anybody knows how, please help me with it.
Take a look at Allen Browne's approach: Concatenate values from related records
As for the normalization argument, I'm not suggesting you store concatenated values. But if you want to join them together for display purposes (like a report or form), I don't think you're violating the rules of normalization.
This is called de-normalizing data. It may be acceptable for final reporting. Apparently some experts believe it's good for something, as seen here.
(Mind you, kevchadder's question is right on.)
Have you looked into the SQL Pivot operation?
Take a look at this link:
http://technet.microsoft.com/en-us/library/ms177410.aspx
Just noticed you're using access. Take a look at this article:
http://www.blueclaw-db.com/accessquerysql/pivot_query.htm
This is not something you should do in SQL, and it's most likely not possible at all.
Merging the rows in your application code shouldn't be too hard.
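As a rough sketch of doing the merge in application code: the ID values below are made up, since the sample data in the question lost them, and `rows` stands in for whatever the query returns.

```python
from collections import defaultdict

# Hypothetical (ID, Interest) rows; the IDs are assumed for illustration.
rows = [(1, "Sports"), (1, "Cooking"), (2, "Movie"), (2, "Reading")]

# Collect the interests per ID, preserving the order in which they arrive.
grouped = defaultdict(list)
for record_id, interest in rows:
    grouped[record_id].append(interest)

# Join each ID's interests into one comma-separated string.
merged = {record_id: ",".join(interests) for record_id, interests in grouped.items()}
print(merged)
```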