How to remove unwanted values in data when reading csv file

How to remove unwanted values in data when reading csv file - pandas

Reading Pina_Indian_Diabities.csv some of the values are strings, something like this
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
like in row 738, there re such values in other rows and columns as well.
How can I drop them?

Related

dataframe fill in next row with previous row value

AMA1=close1
AMA(t)=AMA(t-1)+alpha(t)*(close(t)-AMA(t-1))
id date close alpha AMA
1 2016-01-04 343 1 343
2 2016-01-05 335 2 327 (343+2*(335-343))
3 2016-01-06 337 3 357 (327+3*(337-327))
4 2016-01-07 338 -1 376 (357-1*(338-357))
I want to calculate column AMA of a dataframe, the rule likes this:
if id==1, AMA(1)=close(1)
if id>1, AMA(t)=AMA(t-1)+alpha(t)*(close(t)-AMA(t-1))
I used for loop, but it's very slow when i'm dealing with large data.
Does anyone know how to do this without for loop?Many thanks!!
for i in range(len(df)):
if i==1:
AMA(1)=close(1)
if i>1:
AMA(t)=AMA(t-1)+alpha(t)*(close(t)-AMA(t-1))

How to find the row associated with the min/max of a column?

So basically I have some simple SQL code that looks like the following;
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS maxColumn5
,MAX([Column6]) AS minColumn6
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
[tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
What I am trying to do is also find the column either 'Column1', 'Column2', or 'Column3' that corresponds to the MIN([Column6]) and then the column that corresponds to MAX([Column8]).
The output should be exactly the same except there will be an extra 2 column at the end specifying which one the min and max are associated with.

I think there is a simple problem in your question, as Col1,Col2,Col3 that correspond to the max or min, are displayed directly, in other words you have them as you are grouping by Col1,Col2,Col3 & Col4.
As you did not provide some data, I will set some random data to prove my point.
Lets create a memory table similar to yours with 9 columns and fill it with random data for col6-8 with 10 rows for example, you can use the below:-
Declare #data Table(
Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=5
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
we can see the data with the below
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 2 3 4 669 203 278 364 577
1 2 3 4 389 316 290 548 661
1 2 3 4 835 555 942 985 604
1 2 3 4 477 743 580 305 414
1 2 3 4 431 296 471 150 352
1 2 3 4 346 220 573 941 633
1 2 3 4 392 450 652 978 883
1 2 3 4 235 479 751 136 978
1 2 3 4 906 183 141 915 783
1 2 3 4 329 342 682 977 870
5 6 7 8 218 740 41 299 816
5 6 7 8 800 630 674 888 799
5 6 7 8 27 307 446 743 345
5 6 7 8 501 928 824 592 691
5 6 7 8 439 624 260 757 547
5 6 7 8 287 610 287 708 652
5 6 7 8 441 711 433 642 343
5 6 7 8 751 928 237 53 535
5 6 7 8 594 768 708 173 33
5 6 7 8 352 703 943 867 661
now lets see the result of your grouping that you provided without any change
Col1 Col2 Col3 Col4 minCol5 maxCol6 maxCol8 sumCol7 sumCol8 sumCol9
1 2 3 4 235 743 985 5360 6299 6755
5 6 7 8 27 928 888 4853 5722 5422
so if we go back to your question, what is the value of Col1,Col2,Col3 for the maxCol6, well for each maxCol6 you have the values of Col1,Col2,Col3 & even Col4.
so what are the values for Col1,Col2,Col3 for maxCol16 that is 928, well they are 5,6 & 7.
ok, now lets say you want the record key that have that maxCol6, that is easy too, we would add an identity col as ID as below:-
Declare #data Table(
ID int identity(1,1), Column1 int,Column2 int,Column3 int,Column4 int,Column5 int,Column6 int,Column7 int,Column8 int,Column9 int
)
declare #index int=10
while(#index>0)
begin
insert into #data values(1,2,3,4,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
insert into #data values(5,6,7,8,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000,RAND()*1000)
set #index=#index-1
end
select * from #data order BY [Column1],[Column2],[Column3],[Column4]
;with agg as (
SELECT
[Column1]
,[Column2]
,[Column3]
,[Column4]
,MIN([Column5]) AS minColumn5
,MAX([Column6]) AS maxColumn6
,MAX([Column8]) AS maxColumn8
,SUM([Column7]) AS sumColumn7
,SUM([Column8]) AS sumColumn8
,SUM([Column9]) AS sumColumn9
FROM
#data [tableName]
GROUP BY
[Column1]
,[Column2]
,[Column3]
,[Column4]
)
--select * from agg order BY [Column1],[Column2],[Column3],[Column4]
select agg.*,maxCol6.ID [MaxCol6Seq],maxCol8.ID [MaxCol8Seq] from agg
inner join #data maxCol6
on agg.Column1=maxCol6.Column1
and agg.Column2=maxCol6.Column2
and agg.Column3=maxCol6.Column3
and agg.Column4=maxCol6.Column4
and agg.maxColumn6=maxCol6.Column6
inner join #data maxCol8
on agg.Column1=maxCol8.Column1
and agg.Column2=maxCol8.Column2
and agg.Column3=maxCol8.Column3
and agg.Column4=maxCol8.Column4
and agg.maxColumn8=maxCol8.Column8
As this is a new run for this set of data , below:-
ID Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9
1 1 2 3 4 201 848 993 50 304
3 1 2 3 4 497 207 644 399 104
5 1 2 3 4 445 321 822 151 185
7 1 2 3 4 611 402 620 61 543
9 1 2 3 4 460 409 182 915 211
11 1 2 3 4 886 804 180 213 282
13 1 2 3 4 614 709 932 806 162
15 1 2 3 4 795 752 110 474 463
17 1 2 3 4 737 545 77 648 727
19 1 2 3 4 788 862 266 464 851
20 5 6 7 8 218 561 943 572 54
18 5 6 7 8 741 621 610 214 536
16 5 6 7 8 579 248 374 693 761
14 5 6 7 8 866 415 198 528 657
12 5 6 7 8 905 947 500 50 387
10 5 6 7 8 492 860 948 299 220
8 5 6 7 8 861 328 727 40 327
6 5 6 7 8 435 534 707 769 777
4 5 6 7 8 587 68 45 184 614
2 5 6 7 8 189 24 289 121 772
The result is as below:-
C1 C2 C3 C4 minC5 maxC6 maxC8 sumC7 sumC8 sumC9 MaxCol6Seq MaxCol8Seq
1 2 3 4 201 862 915 4826 4181 3832 19 9
5 6 7 8 189 947 769 5341 3470 5105 12 6
Hope this helps.

If you just want a flag on each row specifying whether the value is the overall maximum or minimum, you can use window functions and CASE:
SELECT [Column1], [Column2], [Column3], [Column4],
MAX([Column5]) AS maxColumn5,
MIN([Column6]) AS minColumn6,
SUM([Column7]) AS sumColumn7,
SUM([Column8]) AS sumColumn8,
SUM([Column9]) AS sumColumn9,
(CASE WHEN MIN([Column6]) = MIN(MIN([Column6])) OVER () THEN 1 ELSE 0 END) as is_min_column6,
(CASE WHEN MAX([Column7]) = MAX(MAX([Column7])) OVER () THEN 1 ELSE 0 END) as is_max_column7
FROM [tableName]
GROUP BY [Column1], [Column2], [Column3], [Column4]

Parsing Json Data from select query in SQL Server

I have a situation where I have a table that has a single varchar(max) column called dbo.JsonData. It has x number of rows with x number of properties.
How can I create a query that will allow me to turn the result set from a select * query into a row/column result set?
Here is what I have tried:
SELECT *
FROM JSONDATA
FOR JSON Path
But it returns a single row of the json data all in a single column:
JSON_F52E2B61-18A1-11d1-B105-00805F49916B
[{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":17,\"Defrost_Cycles\":0,\"Freeze_Cycles\":2304,\"Float_Switch_Raw_ADC\":1328,\"Bin_status\":2304,\"Line_Voltage\":0,\"ADC_Evaporator_Temperature\":0,\"Mem_Sw\":1280,\"Freeze_Timer\":2560,\"Defrost_Timer\":593,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":1259,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":17.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":11264.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-63.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Bin Full\",\"Compressor_Runtime\":134215680}"},{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":200,\"Defrost_Cycles\":559,\"Freeze_Cycles\":510,\"Float_Switch_Raw_ADC\":106,\"Bin_status\":0,\"Line_Voltage\":119,\"ADC_Evaporator_Temperature\":123,\"Mem_Sw\":113,\"Freeze_Timer\":0,\"Defrost_Timer\":66,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":54,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":199.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":10984.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-60.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Defrost\",\"Compressor_Runtime\":11304}"},{"Json_Data":"{\"Seri...
What am I missing?
I can't specify the columns explicitly because the json strings aren't always the same.
This what I expect:
Serial_Number Gateway_Uptime Defrost_Cycles Freeze_Cycles Float_Switch_Raw_ADC Bin_status Line_Voltage ADC_Evaporator_Temperature Mem_Sw Freeze_Timer Defrost_Timer Water_Flow_Switch ADC_Mid_Temperature ADC_Water_Temperature Ambient_Temperature Mid_Temperature Water_Temperature Evaporator_Temperature Ambient_Temperature_Off_Board Ambient_Temperature_On_Board ADC_Ambient_Temperature Control_State Compressor_Runtime temp_sensor temp_pcb gw_uptime winc_fw gw_fw_version gw_fw_version_git gw_sn heap_free gw_sig_csq gw_sig_quality wifi_sig_strength wifi_resets LastModifiedDateUTC Defrost_Cycle_time Freeze_Cycle_time
12345 251402 540 494 106 0 98 158 113 221 184 0 0 0 1259 1259 1259 33 0 0 0 Freeze 10833 0 78 251402 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -61 0 2018-03-20 11:15:28.000 0 0
12345 251702 540 494 106 0 98 178 113 517 184 0 0 0 1259 1259 1259 22 0 0 0 Freeze 10838 0 78 251702 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -62 0 2018-03-20 11:15:42.000 0 0
...
Thank you,
Ron

Getting more data while converting data int to float and doing division and Multiplying with int?

I have three columns as shown in below tableA
Student Day Shifts
129 11 4
91 9 6
166 19 8
164 26 12
146 11 6
147 16 8
201 8 3
164 4 2
186 8 6
165 7 4
171 10 4
104 5 4
1834 134 67
I am writing a tvf to calculate Value of Points generated for Students as below
ALTER function Statagic(
#StartDate date
)
RETURNS TABLE
AS
RETURN
(
with src as
( select
Division=case when Shifts=0 then 0 else cast(Day as float)/cast(Shifts as float) end,*
from TableA
)
,tgt as
(select *,Points=Student*Division from src
)
select * from tgt)
When i execute above tvf(select * from Statagic('3/16/2014'))
My output is below
129 11 4 2.75 354.75
91 9 6 1.5 136.5
166 19 8 2.375 394.25
164 26 12 2.16666666666667 355.333333333333
146 11 6 1.83333333333333 267.666666666667
147 16 8 2 294
201 8 3 2.66666666666667 536
164 4 2 2 328
186 8 6 1.33333333333333 248
165 7 4 1.75 288.75
171 10 4 2.5 427.5
104 5 4 1.25 130
1834 134 67 2 3668
Note :
If you see the last row for three columns in the table is the total of rest column.So when you see the last row in the Output of TVF for last two columns when i am adding i am not getting same data i am getting more.
Guys please help me i am struggling to fix this bug i tried in all ways but i am unable to fix it.
select 354.75+136.5+394.25+355.333333333333+267.666666666667+294+536+328+248+288.75+427.5+130=3760.750000000000
3668 is not euql to 3760.75(I am getting more 100 value)

SQL Query: How to pull counts of two coulmns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With below query I'm getting following results whereas it is expected to have the provider counts from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_Id 12 and 2 instead of 3 for Buss_Id 11, etc).
SELECT BP.Buss_ID,
COUNT(BP.NPI) PROVIDER_COUNT,
COUNT(PP.PRAC_NO)PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
ON PP.NOI=BP.NPI
group by BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0

If I understood it correctly, you might want to add a DISTINCT clause to the columns.
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to remove unwanted values in data when reading csv file - pandas

Related

dataframe fill in next row with previous row value

How to find the row associated with the min/max of a column?

Parsing Json Data from select query in SQL Server

Getting more data while converting data int to float and doing division and Multiplying with int?

SQL Query: How to pull counts of two coulmns from respective tables

Categories

Resources