I have columns in which coordinates are stored as text. Each set of coordinates sits in a single table cell. I have more than 1000 cells, and each contains more than 100 coordinates.
For example:
23.453411011874813 41.74245395132344, 23.453972640029299 41.74214208390741, 23.453977029220994 41.741827739090233, 23.454523642352295 41.741515869012523, 23.441100249526403 41.741203996333724, 23.441661846243466 41.740892121053918,
23.456223434003668 41.74058024317317, 23.441661846243466 41.740892121053918
When a coordinate repeats, I need to delete its last occurrence (the second 23.441661846243466 41.740892121053918 in the example) and the coordinate located between the two occurrences (23.456223434003668 41.74058024317317 in the example).
Please tell me how this can be done?
Thanks a lot!
OLAP functions will be your friend.
- ROW_NUMBER() will identify the 2nd, 3rd, ... occurrences
- with COUNT() as an OLAP function you can identify the duplicated ones
- with CASE and a MAX() over ROWS ... PRECEDING you can tag the rows between the 1st and 2nd occurrence
Two crucial questions you have to answer before a concrete solution is possible:
- By which criteria are your rows ordered? (I guess by a timestamp column that is not shown.)
- What happens if a coordinate occurs three or more times? Delete everything between the 1st and the last occurrence, only between the 1st and 2nd, or always between odd and even occurrences?
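Outside the database, the same cleanup can be sketched in Python. This is only an illustration under assumptions, not the OLAP approach itself: it assumes the coordinates in a cell are comma-separated and already in the intended order, and it answers the second question above by deleting everything between the first and the last occurrence. The function name `drop_repeats` is made up for this example.

```python
def drop_repeats(cell: str) -> str:
    """Remove each repeated coordinate and everything between it and its first occurrence."""
    # Split the cell text into individual "lon lat" entries, ignoring empty pieces.
    coords = [c.strip() for c in cell.split(",") if c.strip()]
    kept = []
    for coord in coords:
        if coord in kept:
            # Repeat found: cut back to just after the first occurrence,
            # which drops the coordinates in between and skips the repeat itself.
            del kept[kept.index(coord) + 1:]
        else:
            kept.append(coord)
    return ", ".join(kept)


# Hypothetical short cell: "2 2" repeats, so "3 3" and the second "2 2" are removed.
cleaned = drop_repeats("1 1, 2 2, 3 3, 2 2, 4 4")
print(cleaned)
```

With more than 100 coordinates per cell the list lookup is still fast enough; a dictionary of first positions would be the next step if it ever became a bottleneck.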
Related
Total beginner here. If my question is irrelevant, apologies in advance, I'll remove it. My question: using pandas, I want to calculate an evolution ratio that compares one week's data with the rolling mean of the previous 4 weeks.
df['rolling_mean_fourweeks'] = df.rolling(4).mean().round(decimals=1)
From here I want to create a new column for the evolution ratio, comparing each week's data with the rolling mean in the previous week's row.
What is the best way to go here? (I don't have big data.) I have tried .shift() without success, as I am not familiar with it. I should get NaN for week 3 (the fourth week) and ~47% for the fifth week.
Any suggestion for retrieving the value at row with step -1?
Thanks and have a good day!
Your idea of using shift can work perfectly well. The shift(x) function simply shifts a series (a full column in your case) by x steps.
A simple way to check whether rolling_mean_fourweeks is a good predictor is to shift Column1 and then check how it differs from rolling_mean_fourweeks:
df['column1_shifted'] = df['Column1'].shift(-1)
df['rolling_accuracy'] = ((df['column1_shifted']-df['rolling_mean_fourweeks'])
/df['rolling_mean_fourweeks'])
resulting in:
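A variant that follows the question's wording more literally shifts the rolling mean down by one row instead of shifting Column1 up, so the ratio lands on the row of the week being evaluated. This is a minimal sketch with made-up weekly values; the column names `previous_rolling_mean` and `evolution_ratio` are hypothetical.

```python
import pandas as pd

# Hypothetical weekly values; "Column1" follows the naming used in the answer.
df = pd.DataFrame({"Column1": [10, 20, 30, 40, 50, 60]})

# Mean of the current week and the three before it (NaN until four weeks exist).
df["rolling_mean_fourweeks"] = df["Column1"].rolling(4).mean().round(1)

# shift(1) moves every value down one row, so each week sees last week's rolling mean.
df["previous_rolling_mean"] = df["rolling_mean_fourweeks"].shift(1)

# Relative change of the week against the previous week's four-week mean.
df["evolution_ratio"] = (
    (df["Column1"] - df["previous_rolling_mean"]) / df["previous_rolling_mean"]
)
print(df)
```

With this layout the first four rows (weeks 0 to 3) are NaN and the first real ratio appears in the fifth week, which matches the shape the question expects.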
I have a pandas.DataFrame which I would like to represent as a string (not in Jupyter, not in IPython) with limited width (for later terminal output), without line wrapping (one value per output line) and with ellipses for excess columns in the middle. This is similar to what pandas does when printing to a terminal. Is there a function for that? DataFrame.to_string only lets me wrap excess lines (with line_width), but I don't see a way to insert the ellipsis automatically.
If I understand you correctly, you could just do:
print(str(df))
But if you would like to specify n rows and n columns, pd.DataFrame.to_string has arguments for that:
print(df.to_string(max_rows=10, max_cols=10))
This would only display 10 columns (5 columns and ellipsis then another 5 columns), and 10 rows (5 rows and ellipsis then another 5 rows).
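As a small check of that behaviour, the sketch below builds a hypothetical 20 x 20 frame (the column names `c0` to `c19` are invented for the example) and captures the truncated text in a variable instead of printing it directly.

```python
import pandas as pd

# Hypothetical 20 x 20 frame with columns c0 .. c19.
df = pd.DataFrame(
    [[row * 20 + col for col in range(20)] for row in range(20)],
    columns=[f"c{col}" for col in range(20)],
)

# Truncate to at most 10 rows and 10 columns; the middle is replaced by ellipses.
text = df.to_string(max_rows=10, max_cols=10)
header = text.splitlines()[0].split()
print(text)
```

Because the result is an ordinary string, it can be logged or written to a file; note that `max_cols` limits the number of columns rather than the character width.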
Right now, I see there are quick ways to get things like Sum/Avg/Max/Etc. for two or more rows or columns when building a table in GoodData.
I am building a little table that shows last week and the week prior, and I'm trying to show the delta between them.
So if the first column is 100 and the second is 50, I want '-50'.
If the first column is 25 and the second is 100, I want '75'.
Is there an easy way to do this?
Assuming the first column contains the result of metric #1 and the second column contains the result of metric #2, you can simply create a metric #3 defined as (metric #1 - metric #2), or vice versa.
I have just tried my first sqlite select statement and got a result (an iterator over tuples). In other words, every row is represented by a tuple, and I can access the values in the cells of a row like this: r[7] or r[3] (the value from column 7 or column 3). But I would like to access columns by their names rather than their positions. Say I would like to know the value in the column user_name. How can I do that?
I found the answer to my question here:
# Each returned row describes one column; its second field is the column name.
cursor.execute("PRAGMA table_info(tablename)")
print(cursor.fetchall())
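A different technique from the PRAGMA lookup above, and only a sketch: Python's standard `sqlite3` module can return rows that are addressable by column name when the connection's `row_factory` is set to `sqlite3.Row`. The table `users` and its contents are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical table, just for illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name
conn.execute("CREATE TABLE users (id INTEGER, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

row = conn.execute("SELECT * FROM users").fetchone()
name_by_key = row["user_name"]  # access by name
name_by_index = row[1]          # positional access still works

# The PRAGMA approach: the second field of each result row is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(users)")]
print(name_by_key, columns)
conn.close()
```

`sqlite3.Row` keeps tuple-style indexing, so existing code that uses r[7] or r[3] keeps working after the switch.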
I am trying to figure out a calculation I can perform in C# to determine the rows per column. Let's say I know I am going to have 3 columns and my record count is 46. I know that I can mod the results to get a remainder, but I would like something more efficient than what I have tried. So I know I will have 16 rows per column with a remainder of 14 for the last column, but what is the best way to loop through the results and keep counts?
Integer division will give you the number of complete rows (46 / 3 = 15). You then check the modulus to see if you have any leftover (46 Mod 3 = 1; yep, you have one column to fill in a final extra row).
To loop through, just take the modulus of the current record index (zero-based) with your column count. That modulus is the (zero-based) column index. If it equals 0, you start a new row.
But from your question, it sounds like you already got this far. So am I misunderstanding the question?
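The arithmetic above can be sketched as follows, in Python rather than C# and assuming records are filled row by row as the answer describes. The names `COLUMNS`, `RECORDS` and `grid` are made up for the example.

```python
COLUMNS = 3
RECORDS = 46

# Integer division and modulus in one step: 15 complete rows, 1 record left over.
full_rows, leftover = divmod(RECORDS, COLUMNS)

# One extra, partially filled row is needed whenever something is left over.
total_rows = full_rows + (1 if leftover else 0)

grid = [[] for _ in range(total_rows)]
for index in range(RECORDS):
    # Quotient is the zero-based row, remainder is the zero-based column;
    # a column of 0 means this record starts a new row.
    row, column = divmod(index, COLUMNS)
    grid[row].append(index)

print(full_rows, leftover, total_rows)
```

In C# the same two values come from the `/` operator on integers and the `%` operator.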