Intermittent performance issues with Apache Ignite 2.7.5

We are facing intermittent performance issues with Ignite: response times become very high, and we see the warnings below in our logs. We have 10 indexed columns, and I don't see any issue with the indexes, as all the columns in the WHERE clause are indexed. Joins happen on fields with affinity colocation, which means that joins run only on the data within a particular node and not across nodes.
[21:48:30,765][WARNING][jvm-pause-detector-worker][IgniteKernal%PincodeGrid] Possible too long JVM pause: 4939 milliseconds.
[21:48:30,783][WARNING][query-#120%PincodeGrid%][IgniteH2Indexing] Query execution is too long [time=5052 ms, sql='SELECT
Please let me know if you can provide any help on this. 
Apache Ignite version: 2.7.5
Ignite persistence is enabled (true)
2-node cluster in partitioned mode
RAM: 150 GB per node
JVM Xms and Xmx: 20G
Number of records: 160 million
JVM options:
/usr/java/jdk1.8.0_144/bin/java -XX:+AggressiveOpts -server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/etappdata/ignite/logs/PROD/etail-prod-ignite76-163/logs -XX:+ExitOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/etappdata/ignite/logs/PROD/etail-prod-ignite76-163/gc.log -XX:+PrintAdaptiveSizePolicy -XX:+UseTLAB -verbose:gc -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true -Djava.net.preferIPv6Stack=false -Djava.net.preferIPv6Addresses=false -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8996 -Dcom.sun.management.jmxremote.rmi.port=8996 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=etail-prod-ignite76-163 -XX:MaxDirectMemorySize=4g -javaagent:/tmp/apminsight-javaagent-prod/apminsight-javaagent.jar -Dfile.encoding=UTF-8 -XX:+UseG1GC -DIGNITE_QUIET=false -DIGNITE_SUCCESS_FILE=/ignite/apache-ignite-2.7.5-bin/work/ignite_success_7d9ec20d-9728-475a-aa80-4355eb8eaf02 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DIGNITE_HOME=/ignite/apache-ignite-2.7.5-bin -DIGNITE_PROG_NAME=./bin/ignite.sh -cp /ignite/apache-ignite-2.7.5-bin/libs/:/ignite/apache-ignite-2.7.5-bin/libs/ignite-indexing/:/ignite/apache-ignite-2.7.5-bin/libs/ignite-spring/:/ignite/apache-ignite-2.7.5-bin/libs/licenses/ org.apache.ignite.startup.cmdline.CommandLineStartup config/config-cache.xml
Edit 1 - Added additional details
[04:03:48,251][INFO][main][IgniteKernal%PincodeGrid] IgniteConfiguration [igniteInstanceName=PincodeGrid, pubPoolSize=30, svcPoolSize=30, callbackPoolSize=30, stripedPoolSize=30, sysPoolSize=30, mgmtPoolSize=4, igfsPoolSize=30, dataStreamerPoolSize=30, utilityCachePoolSize=30, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=30, igniteHome=/ignite/apache-ignite-2.7.5-bin, igniteWorkDir=/ignite/apache-ignite-2.7.5-bin/work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer#13221655, nodeId=6aee7bb4-2804-4396-a9ec-65abdc9483e3, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy#34123d65, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=65536, msgQueueLimit=2048, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=true, connectionsPerNode=10, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=10000, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=15, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch#59474f18[Count = 1], stopping=false], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi#65fb9ffc, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi#3590fc5b, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi#397fbdb, clientMode=false, rebalanceThreadPoolSize=16, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=80000, sysWorkerBlockedTimeout=30000, clientFailureDetectionTimeout=120000, metricsLogFreq=6000000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=30, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, 
pageSize=4096, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize=128849018880, initSize=112742891520, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=4294967296], dataRegions=null, storagePath=/ignite/persistence, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=8, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=1073741824, walPath=/wal/pincode, walArchivePath=/wal/pincode/archive, metricsEnabled=false, walMode=BACKGROUND, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000,
walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory#2d3379b4, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=30, idleTimeout=0, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], commFailureRslvr=null]
Edit 2 - GC logs
2020-12-01T22:49:31.729+0530: 15.630: [GC pause (Metadata GC Threshold) (young) (initial-mark) 15.630: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 38.43 ms, remaining time: 161.57 ms, target pause time: 200.00 ms]
15.630: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 24 regions, survivors: 2 regions, predicted young region time: 356.58 ms]
15.630: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 24 regions, survivors: 2 regions, old: 0 regions, predicted pause time: 395.01 ms, target pause time: 200.00 ms]
15.657: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: concurrent cycle is about to start], 0.0274990 secs]
[Parallel Time: 15.8 ms, GC Workers: 21]
[GC Worker Start (ms): Min: 15630.2, Avg: 15630.5, Max: 15630.8, Diff: 0.7]
[Ext Root Scanning (ms): Min: 1.6, Avg: 3.4, Max: 11.4, Diff: 9.8, Sum: 71.8]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 1.2, Max: 12.6, Diff: 12.6, Sum: 24.2]
[Object Copy (ms): Min: 0.0, Avg: 9.5, Max: 12.0, Diff: 11.9, Sum: 199.9]
[Termination (ms): Min: 0.0, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 14.8]
[Termination Attempts: Min: 1, Avg: 2.2, Max: 4, Diff: 3, Sum: 47]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
[GC Worker Total (ms): Min: 14.5, Avg: 14.8, Max: 15.2, Diff: 0.7, Sum: 311.8]
[GC Worker End (ms): Min: 15645.3, Avg: 15645.3, Max: 15645.4, Diff: 0.1]
[Code Root Fixup: 0.6 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.5 ms]
[Other: 10.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.3 ms]
[Ref Enq: 0.5 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 192.0M(1008.0M)->0.0B(984.0M) Survivors: 16.0M->40.0M Heap: 198.0M(20.0G)->33.7M(20.0G)]
[Times: user=0.31 sys=0.00, real=0.03 secs]
2020-12-01T22:49:31.757+0530: 15.657: [GC concurrent-root-region-scan-start]
2020-12-01T22:49:31.764+0530: 15.664: [GC concurrent-root-region-scan-end, 0.0067826 secs]
2020-12-01T22:49:31.764+0530: 15.664: [GC concurrent-mark-start]
2020-12-01T22:49:31.765+0530: 15.666: [GC concurrent-mark-end, 0.0015043 secs]
2020-12-01T22:49:31.766+0530: 15.666: [GC remark 2020-12-01T22:49:31.766+0530: 15.666: [Finalize Marking, 0.0010641 secs] 2020-12-01T22:49:31.767+0530: 15.667: [GC ref-proc, 0.0100232 secs] 2020-12-01T22:49:31.777+0530: 15.677: [Unloading, 0.0072592 secs], 0.0191010 secs]
[Times: user=0.20 sys=0.00, real=0.02 secs]
2020-12-01T22:49:31.785+0530: 15.685: [GC cleanup 37M->37M(20G), 0.0085803 secs]
[Times: user=0.04 sys=0.00, real=0.01 secs]
2020-12-01T22:53:45.090+0530: 268.990: [GC pause (G1 Evacuation Pause) (young) 268.990: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 30.72 ms, remaining time: 169.28 ms, target pause time: 200.00 ms]
268.990: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 123 regions, survivors: 5 regions, predicted young region time: 1342.47 ms]
268.990: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 123 regions, survivors: 5 regions, old: 0 regions, predicted pause time: 1373.18 ms, target pause time: 200.00 ms]
269.040: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: candidate old regions not available]
, 0.0494933 secs]
[Parallel Time: 31.8 ms, GC Workers: 21]
[GC Worker Start (ms): Min: 268991.8, Avg: 268992.1, Max: 268992.5, Diff: 0.7]
[Ext Root Scanning (ms): Min: 0.9, Avg: 1.9, Max: 5.7, Diff: 4.7, Sum: 39.6]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 1.0, Max: 7.0, Diff: 7.0, Sum: 20.2]
[Object Copy (ms): Min: 21.7, Avg: 28.1, Max: 29.1, Diff: 7.4, Sum: 591.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 21]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.8]

If you have a 20G heap you may expect an eventual full GC taking 4 seconds, so I can believe that. Why do you need so much heap with Apache Ignite? How much heap is used during the normal course of operations? You may also take a heap dump and search for a memory leak.
Note that Apache Ignite does not store data on heap by default, so heap usage can't be explained by the amount of data alone.
I ran your GC log through gceasy.io and it found several GCs spanning around 2 seconds. I'm not sure that explains your observed 4s pause, but you can evidently expect 2s pauses from GC, which is in the same ballpark.
So you need to figure out why your JVM becomes slow sometimes. Maybe it's IO, virtualization pauses, etc. Also, if your heap never grows beyond 2G, maybe you should run with something like -Xmx4G?

Related

How to Stratify pandas DataFrame based on two columns?

I have the following pandas DataFrame:
import pandas as pd

account_num = [
1726905620833, 1727875510892, 1727925550921, 1727925575731, 1727345507414,
1713565531401, 1725735509119, 1727925546516, 1727925523656, 1727875509665,
1727875504742, 1727345504314, 1725475539855, 1791725523833, 1727925583805,
1727925544791, 1727925518810, 1727925606986, 1727925618602, 1727605517337,
1727605517354, 1727925583101, 1727925583201, 1727925583335, 1727025517810,
1727935718602]
total_due = [
1662.87, 3233.73, 3992.05, 10469.28, 799.01, 2292.98, 297.07, 5699.06, 1309.82,
1109.67, 4830.57, 3170.12, 45329.73, 46.71, 11981.58, 3246.31, 3214.25, 2056.82,
1611.73, 5386.16, 2622.02, 5011.02, 6222.10, 16340.90, 1239.23, 1198.98]
net_returned = [
0.0, 0.0, 0.0, 2762.64, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12008.27,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2762.69, 0.0, 0.0, 0.0, 9254.66, 0.0, 0.0]
total_fees = [
0.0, 0.0, 0.0, 607.78, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2161.49, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 536.51, 0.0, 0.0, 0.0, 1712.11, 0.0, 0.0]
year = [2021, 2022, 2022, 2021, 2021, 2020, 2020, 2022, 2019, 2019, 2020, 2022, 2019,
2018, 2018, 2022, 2021, 2022, 2022, 2020, 2019, 2019, 2022, 2019, 2021, 2022]
flipped = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
proba = [
0.960085, 0.022535, 0.013746, 0.025833, 0.076159, 0.788912, 0.052489, 0.035279,
0.019701, 0.552127, 0.063949, 0.061279, 0.024398, 0.902681, 0.009441, 0.015342,
0.006832, 0.032988, 0.031879, 0.026412, 0.025159, 0.023195, 0.022104, 0.021285,
0.026480, 0.025837]
d = {
"account_num" : account_num,
"total_due" : total_due,
"net_returned" : net_returned,
"total_fees" : total_fees,
"year" : year,
"flipped" : flipped,
"proba" : proba
}
df = pd.DataFrame(data=d)
I want to sample the DataFrame by the "year" column according to a specific ratio for each year, which I have successfully done with the following code:
df_fractions = pd.DataFrame({"2018": [0.5], "2019": [0.5], "2020": [1.0], "2021": [0.8],
"2022": [0.7]})
df.year = df.year.astype(str)
grouped = df.groupby("year")
df_training = grouped.apply(lambda x: x.sample(frac=df_fractions[x.name].iloc[0]))  # .iloc[0] extracts the scalar fraction
df_training = df_training.reset_index(drop=True)
However, when I invoke sample(), I also want to ensure the samples from each year are stratified according to the number of flipped accounts in that year; that is, I want to stratify the per-year samples based on the flipped column. With this small, toy DataFrame the per-year ratios of flipped after sampling stay pretty close to the original proportions, but this does not hold for a really large DataFrame with close to 300K accounts.
So that's really my question to all you Python experts: is there a better way to solve this problem than the solution I came up with?
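One possible approach (a minimal sketch, not from the original thread) is to group on both columns at once, so each (year, flipped) stratum is sampled at its year's fraction and the flipped ratio within every year is preserved by construction:
# Sample each (year, flipped) stratum at that year's fraction.
df_training = (
    df.groupby(["year", "flipped"], group_keys=False)
      .apply(lambda g: g.sample(frac=df_fractions[g.name[0]].iloc[0]))
      .reset_index(drop=True)
)
Here g.name is the (year, flipped) tuple identifying the group, so g.name[0] looks up the year's fraction just as before.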

How to create a pandas dataframe from csv where one column contains nested dictionary?

I have a CSV file and in one column there is a nested dictionary with the values of classification report, in a format like this one:
{'A': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 60},
'B': {'precision': 0.42, 'recall': 0.09, 'f1-score': 0.14, 'support': 150},
'micro avg': {'precision': 0.31, 'recall': 0.31, 'f1-score': 0.31, 'support': 1710},
'macro avg': {'precision': 0.13, 'recall': 0.08, 'f1-score': 0.071, 'support': 1710},
'weighted avg': {'precision': 0.29, 'recall': 0.31, 'f1-score': 0.26, 'support': 1710}}
I would like to get each key and first-level value combined as a column name in a DataFrame. So, is it possible to get the following result?
A_precision A_recall ... weighted_avg_precision weighted_avg_recall weighted_avg_f1-score weighted_avg_support
0.0 0.0 0.29 0.31 0.26 1710
Thank you
You can use pd.json_normalize on that dictionary:
import pandas as pd

dct = {
"A": {"precision": 0.0, "recall": 0.0, "f1-score": 0.0, "support": 60},
"B": {"precision": 0.42, "recall": 0.09, "f1-score": 0.14, "support": 150},
"micro avg": {
"precision": 0.31,
"recall": 0.31,
"f1-score": 0.31,
"support": 1710,
},
"macro avg": {
"precision": 0.13,
"recall": 0.08,
"f1-score": 0.071,
"support": 1710,
},
"weighted avg": {
"precision": 0.29,
"recall": 0.31,
"f1-score": 0.26,
"support": 1710,
},
}
df = pd.json_normalize(dct, sep="_")
print(df)
Prints:
A_precision A_recall A_f1-score A_support B_precision B_recall B_f1-score B_support micro avg_precision micro avg_recall micro avg_f1-score micro avg_support macro avg_precision macro avg_recall macro avg_f1-score macro avg_support weighted avg_precision weighted avg_recall weighted avg_f1-score weighted avg_support
0 0.0 0.0 0.0 60 0.42 0.09 0.14 150 0.31 0.31 0.31 1710 0.13 0.08 0.071 1710 0.29 0.31 0.26 1710
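Since the dictionaries sit in a CSV column, they come back from pd.read_csv as strings and need to be parsed first. A minimal sketch, assuming a hypothetical file report.csv whose report column holds the dict literals:
import ast

import pandas as pd

raw = pd.read_csv("report.csv")                      # hypothetical file and column names
parsed = raw["report"].apply(ast.literal_eval)       # parse each string back into a dict
flat = pd.json_normalize(parsed.tolist(), sep="_")   # one flattened row per CSV row
The flattened frame can then be joined back onto the remaining CSV columns if needed.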

Split column values to several in pandas dataframe

I am trying to do sentiment analysis on tweets using SentimentIntensityAnalyzer() from nltk.sentiment.vader:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
listy = []
for index, row in data.iterrows():
ss = sid.polarity_scores(row["Tweets"])
listy.append(ss)
se = pd.Series(listy)
data['polarity'] = se.values
display(data.head(100))
This is the resulting DataFrame:
Tweets polarity
0 RT @spectatorindex: Facebook controls:\n\n- Wh... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
1 RT @YAATeamWest: Today we're at #BradfordUniSU... {'neg': 0.0, 'neu': 0.902, 'pos': 0.098, 'comp...
2 #SachinTendulkar launches India’s first Multip... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
3 How To Create a 360 Render (And How to Improv... {'neg': 0.0, 'neu': 0.722, 'pos': 0.278, 'comp...
4 The Most Disturbing Virtual Reality You Will E... {'neg': 0.174, 'neu': 0.826, 'pos': 0.0, 'comp...
5 VR Training for Troops 🎮\n\n... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
6 RT @DefenceHQ: The #BritishArmy has awarded a ... {'neg': 0.0, 'neu': 0.847, 'pos': 0.153, 'comp...
7 RT @UofGHumanities: #UofGCSPE Humanities Lectu... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
8 RT @OyezServices: Ever wanted a tour of Machu ... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
9 RT @ProjectDastaan: We are an Oxford Universit... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
10 RT @Paula_Piccard: Virtual reality will change... {'neg': 0.0, 'neu': 0.878, 'pos': 0.122, 'comp...
In order to do statistical analysis on the 'neg', 'pos', 'neu' and 'compound' entries in the polarity column, I wanted to split the data into four different columns. To achieve this I used:
list_pos= []
list_neg = []
list_comp = []
list_neu = []
for index, row in data.iterrows():
list_pos.append(row['polarity']['pos'])
list_neg.append(row['polarity']['neg'])
list_comp.append(row['polarity']['compound'])
list_neu.append(row['polarity']['neu'])
se_pos = pd.Series(list_pos)
se_neg = pd.Series(list_neg)
se_comp = pd.Series(list_comp)
se_neu = pd.Series(list_neu)
data['positive'] = se_pos.values
data['negative'] = se_neg.values
data['compound'] = se_comp.values
data['neutral'] = se_neu.values
The resulting DataFrame:
Tweets polarity positive negative compound neutral
0 RT @spectatorindex: Facebook controls:\n\n- Wh... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... 0.000 0.000 0.0000 1.000
1 RT @YAATeamWest: Today we're at #BradfordUniSU... {'neg': 0.0, 'neu': 0.902, 'pos': 0.098, 'comp... 0.098 0.000 0.3612 0.902
2 #SachinTendulkar launches India’s first Multip... {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... 0.000 0.000 0.0000 1.000
Is there a more concise way of achieving a similar DataFrame? Using a lambda function perhaps? Thanks for the help!
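One more concise route (a sketch, not from the original post): expand the dict column in a single step and join the scores back onto the frame:
# Build one column per score, aligned on the original index.
scores = pd.DataFrame(data["polarity"].tolist(), index=data.index)
scores = scores.rename(columns={"neg": "negative", "neu": "neutral", "pos": "positive"})
data = data.join(scores)
data["polarity"].apply(pd.Series) achieves the same expansion in a lambda-style call, though it is noticeably slower on large frames.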

Drop in FPS while Reading From Camera

I have two cameras: one is a Microsoft camera and the other is a Logitech.
For both cameras I have used the pipeline below.
gst-launch-1.0 -v v4l2src device=/dev/video1 ! videoconvert ! video/x-raw,format=I420,width=640,height=480 ! fpsdisplaysink
For Microsoft:
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 678, dropped: 10, current: 30.10, average: 29.71
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 678, dropped: 10, current: 30.10, average: 29.71
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 678, dropped: 10, current: 30.10, average: 29.71
But when I moved my hand very close to the camera, or covered the camera with my hand, the results were:
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 2554, dropped: 44, current: 7.52, average: 28.93
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstTextOverlay:fps-display-text-overlay: text = rendered: 2558, dropped: 44, current: 7.51, average: 28.81
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 2558, dropped: 44, current: 7.51, average: 28.81
There is a huge drop in frame rate.
What is the problem in this scenario, and how do I resolve it?
For Logitech:
I used the same pipeline, but the results are as follows:
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 0, dropped: 79, fps: 0.00, drop rate: 24.07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 0, dropped: 79, fps: 0.00, drop rate: 24.07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 0, dropped: 79, fps: 0.00, drop rate: 24.07
I am totally confused. What is the problem in these two scenarios?
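As a side note, it can help to measure the delivered frame rate outside GStreamer to see whether the pipeline or the camera itself is the bottleneck. A quick sketch (using OpenCV, which is an assumption; any frame-grabbing loop works):
import time

import cv2

cap = cv2.VideoCapture(1)            # /dev/video1
frames, t0 = 0, time.time()
while time.time() - t0 < 10.0:       # sample for ten seconds
    ok, _ = cap.read()
    if ok:
        frames += 1
cap.release()
print(f"average fps: {frames / 10.0:.2f}")
If this loop also drops to ~7 fps when the lens is covered, the camera's auto-exposure is lengthening the exposure time in low light, and the drop is not a pipeline problem.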

How to round times in Xcode

I am struggling for days trying to solve this puzzle.
I have this code that calculates time IN & OUT as decimal hours: (6 min = 0.1 hr)~(60 min = 1.0 hr)
NSUInteger unitFlag = NSCalendarUnitHour | NSCalendarUnitMinute;
NSDateComponents *components = [calendar components:unitFlag
                                           fromDate:self.outT
                                             toDate:self.inT
                                            options:0];
NSInteger hours = [components hour];
NSInteger minutes = [components minute];
if (minutes < 0) (minutes -= 60 * -1) && (hours -= 1);
if (hours < 0 && minutes < 0) (hours += 24) && (minutes -= 60 * -1);
if (hours < 0 && minutes > 0) (hours += 24) && (minutes = minutes);
if (hours < 0 && minutes == 00) (hours += 24) && (minutes = minutes);
if (minutes > 0) (minutes = (minutes / 6));
self.blockDecimalLabel.text = [NSString stringWithFormat:@"%d.%d", (int)hours, (int)minutes];
The green lines show what the code currently does; what I am looking for is to round the minutes like the blue lines: 1 or 2 minutes round down to the previous decimal hour, and 3, 4 or 5 minutes round up to the next one.
What I am trying to achieve is this: currently, if the result is 11 minutes the code returns 0.1, and only after 12 minutes does it return 0.2. Instead, if the result is 8 the code should return 0.1, but if it is 9 it should round up to the next decimal, 0.2, and so on. The objective is not to lose up to 5 minutes in each multiple of 6, as in the current worst case; doing this, the maximum lost would be around 3 minutes.
Any input is more than welcome :)
Cheers
Your goals seem incoherent to me. However, I tried this:
let beh = NSDecimalNumberHandler(
roundingMode: .RoundPlain, scale: 1, raiseOnExactness: false,
raiseOnOverflow: false, raiseOnUnderflow: false, raiseOnDivideByZero: false
)
for t in 0...60 {
let div = Double(t)/60.0
let deci = NSDecimalNumber(double: div)
let deci2 = deci.decimalNumberByRoundingAccordingToBehavior(beh)
let result = deci2.doubleValue
println("min: \(t) deci: \(result)")
}
The output seems pretty much what you are asking for:
min: 0 deci: 0.0
min: 1 deci: 0.0
min: 2 deci: 0.0
min: 3 deci: 0.1
min: 4 deci: 0.1
min: 5 deci: 0.1
min: 6 deci: 0.1
min: 7 deci: 0.1
min: 8 deci: 0.1
min: 9 deci: 0.2
min: 10 deci: 0.2
min: 11 deci: 0.2
min: 12 deci: 0.2
min: 13 deci: 0.2
min: 14 deci: 0.2
min: 15 deci: 0.3
min: 16 deci: 0.3
min: 17 deci: 0.3
min: 18 deci: 0.3
min: 19 deci: 0.3
min: 20 deci: 0.3
min: 21 deci: 0.4
min: 22 deci: 0.4
min: 23 deci: 0.4
min: 24 deci: 0.4
min: 25 deci: 0.4
min: 26 deci: 0.4
min: 27 deci: 0.5
min: 28 deci: 0.5
min: 29 deci: 0.5
min: 30 deci: 0.5
min: 31 deci: 0.5
min: 32 deci: 0.5
min: 33 deci: 0.6
min: 34 deci: 0.6
min: 35 deci: 0.6
min: 36 deci: 0.6
min: 37 deci: 0.6
min: 38 deci: 0.6
min: 39 deci: 0.7
min: 40 deci: 0.7
min: 41 deci: 0.7
min: 42 deci: 0.7
min: 43 deci: 0.7
min: 44 deci: 0.7
min: 45 deci: 0.8
min: 46 deci: 0.8
min: 47 deci: 0.8
min: 48 deci: 0.8
min: 49 deci: 0.8
min: 50 deci: 0.8
min: 51 deci: 0.9
min: 52 deci: 0.9
min: 53 deci: 0.9
min: 54 deci: 0.9
min: 55 deci: 0.9
min: 56 deci: 0.9
min: 57 deci: 1.0
min: 58 deci: 1.0
min: 59 deci: 1.0
min: 60 deci: 1.0
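For what it's worth, the behavior above is plain half-up rounding of minutes/60 to one decimal place. A quick cross-check of the same table (a sketch in Python, not part of the original answer):
from decimal import Decimal, ROUND_HALF_UP

for t in range(61):
    # Round t/60 hours to one decimal, halves rounding up, as NSDecimalNumber does here.
    deci = (Decimal(t) / Decimal(60)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    print(f"min: {t} deci: {deci}")
This prints the same min/deci pairs, e.g. 8 minutes -> 0.1 and 9 minutes -> 0.2, matching the blue-line behavior the question asks for.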