How to visualize connected status reports per 5 minutes - azure-iot-hub

IoT Hub sends connected and disconnected events to a Log Analytics workspace.
How can we create a chart, using a Log Analytics Kusto query, that shows the availability of a device during the day with 5-minute sampling?
Source data looks like this:
OperationName    | DeviceId | TimeGenerated
deviceConnect    | device1  | 2022-09-22T09:43:20
deviceDisconnect | device1  | 2022-09-22T09:53:20
deviceDisconnect | device2  | 2022-09-22T09:55:20
deviceConnect    | device3  | 2022-09-22T10:00:20
deviceConnect    | device4  | 2022-09-22T10:43:20
...
The resulting data set should look like this:
Interval            | DeviceId | Status
... (earlier intervals, assuming all devices start disconnected)
2022-09-22T09:40:00 | device1  | Disconnected
2022-09-22T09:40:00 | device2  | Disconnected
2022-09-22T09:40:00 | device3  | Disconnected
2022-09-22T09:40:00 | device4  | Disconnected
2022-09-22T09:45:00 | device1  | Connected
2022-09-22T09:45:00 | device2  | Disconnected
2022-09-22T09:45:00 | device3  | Disconnected
2022-09-22T09:45:00 | device4  | Disconnected
2022-09-22T09:50:00 | device1  | Connected
2022-09-22T09:50:00 | device2  | Connected
2022-09-22T09:50:00 | device3  | Disconnected
2022-09-22T09:50:00 | device4  | Disconnected
2022-09-22T09:55:00 | device1  | Disconnected
2022-09-22T09:55:00 | device2  | Connected
2022-09-22T09:55:00 | device3  | Disconnected
2022-09-22T09:55:00 | device4  | Disconnected
2022-09-22T10:00:00 | device1  | Disconnected
2022-09-22T10:00:00 | device2  | Disconnected
2022-09-22T10:00:00 | device3  | Connected
2022-09-22T10:00:00 | device4  | Connected
...
The Status column can then be projected to an Availability of 100% or 0%.

Sparse display.
Each device is displayed at a different offset on the y axis.
10, 20, 30, etc. stand for disconnected; 15, 25, 35, etc. stand for connected.
let t = datatable(OperationName:string, DeviceId:string, TimeGenerated:datetime)
[
"deviceConnect" ,"device1" ,datetime("2022-09-22T09:43:20")
,"deviceDisconnect" ,"device1" ,datetime("2022-09-22T09:53:20")
,"deviceDisconnect" ,"device2" ,datetime("2022-09-22T09:55:20")
,"deviceConnect" ,"device3" ,datetime("2022-09-22T10:00:20")
,"deviceConnect" ,"device4" ,datetime("2022-09-22T10:43:20")
];
// Solution starts here
let resolution = 5m;
let y_axis_devices_distance = 10;
let y_axis_device_states_distance = 5;
let TimeGenerated_start = toscalar(t | summarize bin(min(TimeGenerated), resolution));
let TimeGenerated_end = toscalar(t | summarize max(TimeGenerated)) + resolution;
let data_points = toint((bin(TimeGenerated_end, resolution) - bin(TimeGenerated_start, resolution)) / resolution) + 1;
let fictive_connects =
t
| summarize arg_min(TimeGenerated, OperationName) by DeviceId
| where OperationName == "deviceDisconnect"
| extend TimeGenerated = TimeGenerated_start, OperationName = "deviceConnect";
let devices =
t
| distinct DeviceId
| serialize
| extend rn = row_number();
union t, fictive_connects
| make-series Availability = y_axis_device_states_distance * sum(case(OperationName == "deviceConnect", 1, -1)) on TimeGenerated from TimeGenerated_start to TimeGenerated_end step resolution by DeviceId
| lookup devices on DeviceId
| extend Availability = array_concat(pack_array(Availability[0] + rn*y_axis_devices_distance), array_slice(Availability, 1, data_points - 1))
| render timechart with (accumulate=true)
Fiddle

Unified display of all devices (number of connected devices)
let t = datatable(OperationName:string, DeviceId:string, TimeGenerated:datetime)
[
"deviceConnect" ,"device1" ,datetime("2022-09-22T09:43:20")
,"deviceDisconnect" ,"device1" ,datetime("2022-09-22T09:53:20")
,"deviceDisconnect" ,"device2" ,datetime("2022-09-22T09:55:20")
,"deviceConnect" ,"device3" ,datetime("2022-09-22T10:00:20")
,"deviceConnect" ,"device4" ,datetime("2022-09-22T10:43:20")
];
// Solution starts here
let resolution = 5m;
let TimeGenerated_start = toscalar(t | summarize bin(min(TimeGenerated), resolution));
let TimeGenerated_end = toscalar(t | summarize max(TimeGenerated)) + resolution;
let fictive_connects = t
| summarize arg_min(TimeGenerated, OperationName) by DeviceId
| where OperationName == "deviceDisconnect"
| extend TimeGenerated = TimeGenerated_start, OperationName = "deviceConnect";
union t, fictive_connects
| make-series Availability = sum(case(OperationName == "deviceConnect", 1, -1)) on TimeGenerated from TimeGenerated_start to TimeGenerated_end step resolution
| render timechart with (accumulate=true)
Fiddle

Basic solution.
Each device state is 0 or 1.
The graphs of the devices overlap.
let t = datatable(OperationName:string, DeviceId:string, TimeGenerated:datetime)
[
"deviceConnect" ,"device1" ,datetime("2022-09-22T09:43:20")
,"deviceDisconnect" ,"device1" ,datetime("2022-09-22T09:53:20")
,"deviceDisconnect" ,"device2" ,datetime("2022-09-22T09:55:20")
,"deviceConnect" ,"device3" ,datetime("2022-09-22T10:00:20")
,"deviceConnect" ,"device4" ,datetime("2022-09-22T10:43:20")
];
// Solution starts here
let resolution = 5m;
let TimeGenerated_start = toscalar(t | summarize bin(min(TimeGenerated), resolution));
let TimeGenerated_end = toscalar(t | summarize max(TimeGenerated)) + resolution;
let fictive_connects = t
| summarize arg_min(TimeGenerated, OperationName) by DeviceId
| where OperationName == "deviceDisconnect"
| extend TimeGenerated = TimeGenerated_start, OperationName = "deviceConnect";
union t, fictive_connects
| make-series Availability = sum(case(OperationName == "deviceConnect", 1, -1)) on TimeGenerated from TimeGenerated_start to TimeGenerated_end step resolution by DeviceId
| render timechart with (accumulate=true)
Fiddle

Related

Telegram User Adder

Hi, recently I made a Telegram scraper that scrapes users from Telegram groups.
Now I am trying to make a user adder for it.
#!/bin/env python3
from telethon.sync import TelegramClient
from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty, InputPeerChannel, InputPeerUser
from telethon.errors.rpcerrorlist import PeerFloodError, UserPrivacyRestrictedError, FloodWaitError
from telethon.tl.functions.channels import InviteToChannelRequest
import configparser
import os, sys
import csv
import traceback
import time
import random
re="\033[1;31m"
gr="\033[1;32m"
cy="\033[1;36m"
def banner():
    print(f"""
_____ __ ____ ____ ____ ___ ____ _____ __ ____ ____ ____ ___ ____
.----------------. .----------------. .----------------. .----------------. .----------------.
| .--------------. || .--------------. || .--------------. || .--------------. || .--------------. |
| | __ | || | ________ | || | ________ | || | _________ | || | _______ | |
| | / \ | || | |_ ___ `. | || | |_ ___ `. | || | |_ ___ | | || | |_ __ \ | |
| | / /\ \ | || | | | `. \ | || | | | `. \ | || | | |_ \_| | || | | |__) | | |
| | / ____ \ | || | | | | | | || | | | | | | || | | _| _ | || | | __ / | |
| | _/ / \ \_ | || | _| |___.' / | || | _| |___.' / | || | _| |___/ | | || | _| | \ \_ | |
| ||____| |____|| || | |________.' | || | |________.' | || | |_________| | || | |____| |___| | |
| | | || | | || | | || | | || | | |
| '--------------' || '--------------' || '--------------' || '--------------' || '--------------' |
'----------------' '----------------' '----------------' '----------------' '----------------'
_____ __ ____ ____ ____ ___ ____ _____ __ ____ ____ ____ ___ ____
version : 2.0
""")
cpass = configparser.RawConfigParser()
cpass.read('config.data')
try:
    api_id = cpass['cred']['id']
    api_hash = cpass['cred']['hash']
    phone = cpass['cred']['phone']
    client = TelegramClient(phone, api_id, api_hash)
except KeyError:
    os.system('clear')
    banner()
    print(re+"[!] run python3 setup.py first !!\n")
    sys.exit(1)
client.connect()
if not client.is_user_authorized():
    client.send_code_request(phone)
    os.system('clear')
    banner()
    client.sign_in(phone, input(gr+'[+] Enter the code: '+re))
os.system('clear')
banner()
input_file = sys.argv[1]
users = []
with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    next(rows, None)
    for row in rows:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
chats = []
last_date = None
chunk_size = 200
groups=[]
result = client(GetDialogsRequest(
    offset_date=last_date,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=chunk_size,
    hash=0
))
chats.extend(result.chats)
for chat in chats:
    try:
        if chat.megagroup == False:
            groups.append(chat)
    except:
        continue
i = 0
for group in groups:
    print(gr+'['+cy+str(i)+gr+']'+cy+' - '+group.title)
    i += 1
print(gr+'[+] Choose a group to add members')
g_index = input(gr+"[+] Enter a Number : "+re)
target_group=groups[int(g_index)]
target_group_entity = InputPeerChannel(target_group.id,target_group.access_hash)
print(gr+"[1] add member by user ID\n[2] add member by username ")
mode = int(input(gr+"Input : "+re))
n = 0
for user in users:
    n += 1
    if n % 50 == 0:
        time.sleep(1)
    try:
        print("Adding {}".format(user['id']))
        if mode == 1:
            if user['username'] == "":
                continue
            user_to_add = client.get_input_entity(user['username'])
        elif mode == 2:
            user_to_add = InputPeerUser(user['id'], user['access_hash'])
        else:
            sys.exit(re+"[!] Invalid Mode Selected. Please Try Again.")
        client(InviteToChannelRequest(target_group_entity, [user_to_add]))
        print(gr+"[+] Waiting for 2-10 Seconds...")
        time.sleep(random.randrange(2, 10))
    except FloodWaitError:
        print(re+"[!] Getting Flood Error from telegram. \n[!] Script is stopping now. \n[!] Please try again after some time.")
    except UserPrivacyRestrictedError:
        print(re+"[!] The user's privacy settings do not allow you to do this. Skipping.")
    except:
        traceback.print_exc()
        print(re+"[!] Unexpected Error")
        continue
It works, but only partly: I can hardly add 1-10 users at a time, and it shows errors during some of the adding process.
I have tried most things. The error says it needs to wait a long time, but the timer doesn't seem to have any effect even when I add one. Any suggestions or help?
Adding 1456428294
[!] Getting FloodWaitError from telegram.
[!] Script is stopping now.
[!] Please try again after some time.
FloodWaitError (420) means the same request was repeated too many times. You must wait the number of seconds exposed by the error's .seconds attribute. For example:
import time
from telethon import errors
try:
    messages = await client.get_messages(chat)
    print(messages[0].text)
except errors.FloodWaitError as e:
    print('Have to sleep', e.seconds, 'seconds')
    time.sleep(e.seconds)
Read the documentation:
https://docs.telethon.dev/en/latest/concepts/errors.html

data frame parsing column scala

I have a problem with parsing a DataFrame:
val result = df_app_clickstream.withColumn(
"attributes",
explode(expr(raw"transform(attributes, x -> str_to_map(regexp_replace(x, '{\\}',''), ' '))"))
).select(
col("userId"),
col("attributes").getField("campaign_id").alias("app_campaign_id"),
col("attributes").getField("channel_id").alias("app_channel_id")
)
result.show()
I have input like this:
-------------------------------------------------------------------------------
| userId | attributes |
-------------------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e |{'campaign_id':082,'channel_id':'Chnl'}|
-------------------------------------------------------------------------------
and need to get output like this:
--------------------------------------------------------------------
| userId | campaign_id | channel_id|
--------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e | 082 | Facebook |
--------------------------------------------------------------------
but I get an error.
You can try the solution below:
import org.apache.spark.sql.functions._
import spark.implicits._ // needed for .toDF on a local Seq (already in scope in spark-shell)
val data = Seq(("f6e8252f-b5cc-48a4-b348-29d89ee4fa9e", """{'campaign_id':082, 'channel_id':'Chnl'}""")).toDF("user_id", "attributes")
val out_df = data.withColumn("splitted_col", split(regexp_replace(col("attributes"),"'|\\}|\\{", ""), ","))
.withColumn("campaign_id", split(element_at(col("splitted_col"), 1), ":")(1))
.withColumn("channel_id", split(element_at(col("splitted_col"), 2), ":")(1))
out_df.show(truncate = false)
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|user_id |attributes |splitted_col |campaign_id|channel_id|
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|f6e8252f-b5cc-48a4-b348-29d89ee4fa9e|{'campaign_id':082, 'channel_id':'Chnl'}|[campaign_id:082, channel_id:Chnl]|082 |Chnl |
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+

Work with data differencing (Deltas) in Spark using Dataframes

I have one Parquet file in HDFS as the initial load of my data. All subsequent Parquet files contain only the datasets that changed each day relative to the initial load (in chronological order). These are my deltas.
I want to read all, or only some, of the Parquet files to get the latest state of the data as of a specific date. Deltas can contain new records, too.
Example:
Initial Data (Folder: /path/spezific_data/20180101):
ID| Name | Street |
1 | "Tom" |"Street 1"|
2 | "Peter"|"Street 2"|
Delta 1 (Folder: /path/spezific_data/20180102):
ID| Name | Street |
1 | "Tom" |"Street 21"|
Delta 2 (Folder: /path/spezific_data/20180103):
ID| Name | Street |
2 | "Peter" |"Street 44"|
3 | "Hans" | "Street 12"|
Delta 3 (Folder: /path/spezific_data/20180105):
ID| Name | Street |
2 | "Hans" |"Street 55"|
It is possible that a day's deltas are only loaded a day later (look at Delta 2 and Delta 3).
So the folder /path/spezific_data/20180104 does not exist, and we never want to load that date.
Now I want to handle different load cases.
Only the initial data:
That is an easy load of a directory.
initial = spark.read.parquet("hdfs:/path/spezific_data/20180101/")
Up to a specific date (20180103):
initial_df = spark.read.parquet("hdfs:/path/spezific_data/20180101/")
delta_df = spark.read.parquet("hdfs:/path/spezific_data/20180102/")
Now I have to merge these datasets ("update" them - I know Spark RDDs or DataFrames cannot do an update), then load the next one and merge it too. Currently I solve this with these lines of code (but inside a for loop):
new_df = delta_df.union(initial_df).dropDuplicates("ID")
delta_df = spark.read.parquet("hdfs:/mypath/20180103/")
new_df = delta_df.union(new_df).dropDuplicates("ID")
But I think that is not a good way to do this.
Load all data in the folder "/path/spezific_data":
I do this like the case above, with a for loop up to the latest date.
Questions:
Can I do it like this?
Are there better ways?
Can I load everything into one DataFrame and merge there?
Currently the load takes very long (one hour).
Update 1:
I tried to do something like this. If I run this code, it goes through all dates up to my end date (I can see this from my println(date)). After that, I get a java.lang.StackOverflowError.
Where is the error?
import org.apache.spark.sql.functions.col
import util.control.Breaks._
var sourcePath = "hdfs:sourceparth/"
var destinationPath = "hdfs:destiantionpath/result"
var initial_date = "20170427"
var start_year = 2017
var end_year = 2019
var end_month = 10
var end_day = 31
var m: String = _
var d: String = _
var date: String = _
var delta_df: org.apache.spark.sql.DataFrame = _
var doubleRows_df: org.apache.spark.sql.DataFrame = _
// final DF, initial load
var final_df = spark.read.parquet(sourcePath + initial_date + "*")
breakable {
  for (year <- 2017 to end_year; month <- 1 to 12; day <- 1 to 31) {
    // create the date string
    m = month.toString()
    d = day.toString()
    if (month < 10)
      m = "0" + m
    if (day < 10)
      d = "0" + d
    date = year.toString() + m + d
    try {
      // one delta
      delta_df = spark.read.parquet(sourcePath + date + "*")
      // delete duplicate rows (I want to ignore them)
      doubleRows_df = delta_df.groupBy("key").count().where("count > 1").select("key")
      delta_df = delta_df.join(doubleRows_df, Seq("key"), "leftanti")
      // delete all (old) rows in final_df that are in delta_df
      final_df = final_df.join(delta_df, Seq("key"), "leftanti")
      // add all new rows from the delta
      final_df = final_df.union(delta_df)
      println(date)
    } catch {
      case e: org.apache.spark.sql.AnalysisException => {}
    }
    if (day == end_day && month == end_month && year == end_year)
      break
  }
}
final_df.write.mode("overwrite").parquet(destinationPath)
The full stacktrace:
19/11/26 11:19:04 WARN util.Utils: Suppressing exception in finally: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:271)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:271)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:181)
at com.esotericsoftware.kryo.io.Output.close(Output.java:191)
at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:223)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$1.apply$mcV$sp(TorrentBroadcast.scala:278)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1346)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:277)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1488)
at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1006)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:930)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:874)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1677)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:271)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:271)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:181)
at com.esotericsoftware.kryo.io.Output.require(Output.java:160)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:246)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:232)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:54)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:43)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:209)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$2.apply(TorrentBroadcast.scala:276)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$2.apply(TorrentBroadcast.scala:276)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:277)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1488)
at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1006)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:930)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:874)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1677)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
distinct or dropDuplicates is not an option, since you can't control which of the values will be taken. It may very well happen that the new value is not kept while the old value is preserved.
You need to do a join over the ID - see the types of joins here. A joined row then contains either only the old values, only the new values, or both. When only the old or only the new is present, you take the one that is there; when both are present, you take only the new one.
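As a minimal sketch of that join-based merge (my own illustration, not code from the question, assuming the key column is named ID and using the toy rows from the question):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}
val spark = SparkSession.builder.appName("delta-merge").master("local[*]").getOrCreate()
import spark.implicits._
// Toy data mirroring the question: the initial load and the first delta.
val initialDf = Seq((1, "Tom", "Street 1"), (2, "Peter", "Street 2")).toDF("ID", "Name", "Street")
val deltaDf = Seq((1, "Tom", "Street 21")).toDF("ID", "Name", "Street")
// A full outer join over the key keeps rows that exist only in the old set,
// only in the delta, or in both; coalesce prefers the delta's value when both exist.
def applyDelta(oldDf: org.apache.spark.sql.DataFrame, delta: org.apache.spark.sql.DataFrame) =
  oldDf.alias("old")
    .join(delta.alias("new"), Seq("ID"), "full_outer")
    .select(
      col("ID"),
      coalesce(col("new.Name"), col("old.Name")).alias("Name"),
      coalesce(col("new.Street"), col("old.Street")).alias("Street")
    )
// Several deltas, in chronological order, can then be folded onto the initial load.
val merged = Seq(deltaDf).foldLeft(initialDf)(applyDelta)
merged.show()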
Here is an example, from here, of how to add multiple deltas at once.
Question: What are the best-selling and the second best-selling products in every category?
val dataset = Seq(
("Thin", "cell phone", 6000),
("Normal", "tablet", 1500),
("Mini", "tablet", 5500),
("Ultra thin", "cell phone", 5000),
("Very thin", "cell phone", 6000),
("Big", "tablet", 2500),
("Bendable", "cell phone", 3000),
("Foldable", "cell phone", 3000),
("Pro", "tablet", 4500),
("Pro2", "tablet", 6500))
.toDF("product", "category", "revenue")
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.dense_rank
val overCategory = Window.partitionBy('category).orderBy('revenue.desc)
val ranked = dataset.withColumn("rank", dense_rank().over(overCategory))
scala> ranked.show
+----------+----------+-------+----+
| product| category|revenue|rank|
+----------+----------+-------+----+
| Pro2| tablet| 6500| 1|
| Mini| tablet| 5500| 2|
| Pro| tablet| 4500| 3|
| Big| tablet| 2500| 4|
| Normal| tablet| 1500| 5|
| Thin|cell phone| 6000| 1|
| Very thin|cell phone| 6000| 1|
|Ultra thin|cell phone| 5000| 2|
| Bendable|cell phone| 3000| 3|
| Foldable|cell phone| 3000| 3|
+----------+----------+-------+----+
scala> ranked.where('rank <= 2).show
+----------+----------+-------+----+
| product| category|revenue|rank|
+----------+----------+-------+----+
| Pro2| tablet| 6500| 1|
| Mini| tablet| 5500| 2|
| Thin|cell phone| 6000| 1|
| Very thin|cell phone| 6000| 1|
|Ultra thin|cell phone| 5000| 2|
+----------+----------+-------+----+
UPDATE 1:
First of all, consider using date utilities instead of manually iterating over year/month/day numbers to get the date:
Date dt = new Date();
LocalDateTime.ofInstant(dt.toInstant(), ZoneId.systemDefault()).plusDays(1);
See this for more details.
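For example, here is a small sketch of my own (assuming the same yyyyMMdd folder naming as in the question) that enumerates the folder dates between two bounds with java.time instead of nested year/month/day loops:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
val start = LocalDate.parse("20170427", fmt)
val end = LocalDate.parse("20191031", fmt)
// Every calendar date from start to end, formatted like the folder names.
val dates: Seq[String] =
  Iterator.iterate(start)(_.plusDays(1))
    .takeWhile(!_.isAfter(end))
    .map(_.format(fmt))
    .toSeq
Dates whose folders do not exist can still be skipped the way the question already does (catching the AnalysisException), or filtered out up front.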
Second - please post the full stack trace, not just the StackOverflowError.

SQL select latitude and longitude point within bounding box

I am using Google Maps and I would like to query my SQL database to find all points within the bounding box.
In Google Maps, I am using this JavaScript to get the bounding rectangle:
var bounds = map.getBounds();
var ne = bounds.getNorthEast(), sw = bounds.getSouthWest();
var args = {
    NW: { lat: ne.lat(), lng: sw.lng() },
    NE: { lat: ne.lat(), lng: ne.lng() },
    SE: { lat: sw.lat(), lng: ne.lng() },
    SW: { lat: sw.lat(), lng: sw.lng() },
}; // NW = North-West, NE = North-East, SE = South-East, SW = South-West
I am, then, using LINQ to select all places from my SQL database:
//nw = North-West, ne = North-East, se = South-East, sw = South-West
double minLat = Math.Min(nw.Lat, Math.Min(ne.Lat, Math.Min(se.Lat, sw.Lat)));
double maxLat = Math.Max(nw.Lat, Math.Max(ne.Lat, Math.Max(se.Lat, sw.Lat)));
double minLng = Math.Min(nw.Lng, Math.Min(ne.Lng, Math.Min(se.Lng, sw.Lng)));
double maxLng = Math.Max(nw.Lng, Math.Max(ne.Lng, Math.Max(se.Lng, sw.Lng)));
return (from rec in tblPlaces.AsNoTracking()
where (rec.Lat >= minLat) && (rec.Lat <= maxLat) && (rec.Lng >= minLng) && (rec.Lng <= maxLng)
select rec).ToList<tblPlace>();
It works well at a fairly close zoom (Google zoom <= 15). But when zooming out to country size (i.e. you can see the whole country), it doesn't find the points in my database.
While debugging, I found the longitude number is way smaller than any point in my database. How is that possible? I zoomed out to see the whole country.
Is the way I select the latitude and longitude wrong?
I am not an expert in geography but this looks simple. Let us start with longitude. The bounding box could be on one side or across the antimeridian:
-180 0 +180
| |
| +-----+ |
| -10 | x | +10 |
| +-----+ |
| |
| +-----+
| +170 | x | -170
| +-----+
| |
A given longitude exists inside the bounding box if:
lng1 <= lng2 AND (lng1 <= lng AND lng <= lng2) /* both edges on same side */
OR
lng1 > lng2 AND (lng1 <= lng OR lng <= lng2) /* edges on opposite sides */
A given latitude exists inside the bounding box if:
lat1 >= lat2 AND (lat1 >= lat AND lat >= lat2) /* both edges on same side */
OR
lat1 < lat2 AND (lat1 >= lat OR lat >= lat2) /* edges on opposite sides */
If latitudes do not wrap around, e.g. in the Google Maps API, then the opposite-sides test is not required for latitude.
Some tests on db<>fiddle and a jsFiddle showing how LatLngBounds works.
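The test itself is language-agnostic; here is a small sketch of my own (in Scala, not from the answer) just to make the two longitude cases concrete:
// lngWest/lngEast are the box's western/eastern edges (lng1/lng2 above),
// latNorth/latSouth its northern/southern edges (lat1/lat2).
def inBoundingBox(lat: Double, lng: Double,
                  latNorth: Double, latSouth: Double,
                  lngWest: Double, lngEast: Double): Boolean = {
  val lngInside =
    if (lngWest <= lngEast) lngWest <= lng && lng <= lngEast // both edges on the same side
    else lngWest <= lng || lng <= lngEast                    // box crosses the antimeridian
  val latInside = latNorth >= lat && lat >= latSouth         // latitude does not wrap around
  lngInside && latInside
}
// A box spanning +170..-170 contains longitudes 175 and -175, but not 0.
assert(inBoundingBox(10.0, 175.0, 40.0, -40.0, 170.0, -170.0))
assert(!inBoundingBox(10.0, 0.0, 40.0, -40.0, 170.0, -170.0))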
I finally found the answer after looking at a tutorial on latitude and longitude.
In summary: latitude is between -90 and 90, and longitude is between -180 and 180.
//Latitude
90 ---------------------- 90
0 ---------------------- 0
-90 ---------------------- -90
//Longitude
-180 0 180
| | |
| | |
| | |
| | |
-180 0 180
Now, the bounding box can wrap around, so the left can be greater than the right, or the top greater than the bottom, in the bounding box (rectangle). Based on How to search (predefined) locations (Latitude/Longitude) within a Rectangular, using Sql? (see the last answer), the solution is simply to UNION ALL the combinations, depending on where the bounding box lies.
If you can find a better and more efficient solution, I'll award the bounty to you :)
Cheers

st7789 TFT Driver parallel read

I want to read the ID of my ST7789 driver through the parallel interface.
I'm using a 16-bit parallel interface (DB1-DB16).
I am running the Read ID3 command, so I write the hexadecimal code 0xDC to the controller.
Then I perform the read; the expected value is 0x52.
When reading, I drive the D/CX, WRX and RDX signals.
But I never read the correct values from the parallel pins.
The code looks like this.
/////////////////////////////////////////////////////////////////////////////
GPIO_InitTypeDef GPIO_InitStructure;
set_rs;   /* RS high */
set_nw;   /* Wr high */
clr_nrd;  /* Rd low */
// DB1-DB8 <-> PC1-PC8 and DB10-DB12 <-> PC10-PC12
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3 |
    GPIO_Pin_4 | GPIO_Pin_5 | GPIO_Pin_6 | GPIO_Pin_7 | GPIO_Pin_8 |
    GPIO_Pin_10 | GPIO_Pin_11 | GPIO_Pin_12;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_IN;
GPIO_Init(GPIOC, &GPIO_InitStructure);
// DB13 <-> PD2
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_IN;
GPIO_Init(GPIOD, &GPIO_InitStructure);
// DB14 <-> PB9, DB15 <-> PB10, DB16 <-> PB1, DB17 <-> PB2
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_9 | GPIO_Pin_10;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_IN;
GPIO_Init(GPIOB, &GPIO_InitStructure);
// We read twice to be sure.
PORTREAD();  // 16-bit parallel read.
PORTREAD();
// DB1-DB8 <-> PC1-PC8 and DB10-DB12 <-> PC10-PC12
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3 |
    GPIO_Pin_4 | GPIO_Pin_5 | GPIO_Pin_6 | GPIO_Pin_7 | GPIO_Pin_8 |
    GPIO_Pin_10 | GPIO_Pin_11 | GPIO_Pin_12;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
GPIO_Init(GPIOC, &GPIO_InitStructure);
// DB13 <-> PD2
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
GPIO_Init(GPIOD, &GPIO_InitStructure);
// DB14 <-> PB9, DB15 <-> PB10, DB16 <-> PB1, DB17 <-> PB2
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_9 | GPIO_Pin_10;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
GPIO_Init(GPIOB, &GPIO_InitStructure);
set_nrd;  /* Rd high */
/////////////////////////////////////////////////////////////////////////////
First, the D/CX, WRX and RDX signals are set. Next, the parallel pins are switched from output to input for reading, and then the port is read.
Am I driving the RD signal wrong (high and low in the wrong sequence)?
What should the D/CX, WRX and RDX signal sequence be?