How to use Bareos (Bacula) Storage Daemon with multiple disks?

I have a Bareos Storage Daemon (bareos-sd) with three 2 TB HDDs. I want them to be seen as one storage, with Bareos automatically switching to the next disk when the previous one is full.
Currently each disk is configured as a separate Device with its own Media Type, and I have three Storage resources pointing to the corresponding disks. In the Job's Pool I set Storage to a comma-separated list of my three Storages. But now my first disk is full and Bareos does not use the next disk.

You have to specify another Device resource. Taken from http://www.bacula.org/5.0.x-manuals/en/main/main/Storage_Daemon_Configuratio.html:
If you want to write into more than one directory (i.e. to spread the load to different disk drives), you will need to define two Device resources, each containing an Archive Device with a different directory.
So just create another Device resource, so that you have two Device resources, like this:
Device {
  Name = FifoStorage
  Media Type = Fifo
  Device Type = Fifo
  Archive Device = /folder1
  LabelMedia = yes
  Random Access = no
  AutomaticMount = no
  RemovableMedia = no
  MaximumOpenWait = 60
  AlwaysOpen = no
}
Device {
  Name = FifoStorage2
  Media Type = Fifo
  Device Type = Fifo
  Archive Device = /folder2
  LabelMedia = yes
  Random Access = no
  AutomaticMount = no
  RemovableMedia = no
  MaximumOpenWait = 60
  AlwaysOpen = no
}


How to send a packet to all switches using ryu controller?

I need to measure link delay with a Ryu controller. I want the controller to send a packet-out message to every switch and calculate the time between sending the packet-out message and receiving the corresponding packet-in message. I am a beginner in Ryu and don't know how to send a packet with a specific EtherType, such as 0x8fc, to all switches. I have obtained the MAC addresses of all switches and built a packet. How can I send a packet with a specific EtherType to all switches? Also, what is the dp parameter for each switch?
def send_packet(self, dp, port, pkt):
    ofproto = dp.ofproto
    parser = dp.ofproto_parser
    pkt.serialize()
    data = pkt.data
    action = [parser.OFPActionOutput(port=port)]
    out = parser.OFPPacketOut(
        datapath=dp, buffer_id=ofproto.OFP_NO_BUFFER,
        in_port=ofproto.OFPP_CONTROLLER,
        actions=action, data=data)
    dp.send_msg(out)

def create_packet(self):
    i = l = 0
    for l in range(1, len(self.adjacency) + 1):
        # print("\n")
        for i in self.adjacency[l]:
            ethertype = 0x8fc
            dst = self.macaddr(i)
            src = self.macaddr(l)
            e = ethernet.ethernet(dst, src, ethertype)
            p = packet.Packet()
            p.add_protocol(e)
            p.add_protocol(time.time())
            p.serialize()
            port = self.adjacency[l][i]
            send_packet(self, dp ??????, port, p)
DP is the abbreviation for DataPathID, which is a kind of unique ID for an OpenFlow switch in your network.
According to the OpenFlow specification:
“The datapath_id field uniquely identifies a datapath. The lower 48
bits are intended for the switch MAC address, while the top 16 bits
are up to the implementer. An example use of the top 16 bits would be
a VLAN ID to distinguish multiple virtual switch instances on a single
physical switch.”
If you're using Mininet and, for example, you run a linear topology:
mn --controller remote --topo linear,3
Your topology will be:
s1 -- s2 -- s3
|     |     |
h1    h2    h3
DataPathID's will be:
s1: 0x0000000000000001
s2: 0x0000000000000002
s3: 0x0000000000000003
Note that in other testbeds these numbers may be different, but they're always 16-digit hex.
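To get a dp (datapath) object for every switch, a common pattern is to record datapaths as they connect and key them by DPID. Here is a minimal sketch based on Ryu's standard state-change handler; the self.datapaths dictionary and the DelayProbe class name are illustrative assumptions, not part of the asker's code.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import set_ev_cls, MAIN_DISPATCHER, DEAD_DISPATCHER

class DelayProbe(app_manager.RyuApp):
    def __init__(self, *args, **kwargs):
        super(DelayProbe, self).__init__(*args, **kwargs)
        self.datapaths = {}  # dpid -> datapath object

    @set_ev_cls(ofp_event.EventOFPStateChange, [MAIN_DISPATCHER, DEAD_DISPATCHER])
    def _state_change_handler(self, ev):
        dp = ev.datapath
        if ev.state == MAIN_DISPATCHER:
            # Switch connected: remember its datapath, keyed by DPID.
            self.datapaths[dp.id] = dp
        elif ev.state == DEAD_DISPATCHER:
            # Switch disconnected: forget it.
            self.datapaths.pop(dp.id, None)

With such a map in place, create_packet can look up the datapath for switch l with self.datapaths[l] (assuming the DPIDs match the indices used in self.adjacency) and pass it as the dp argument of send_packet.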

How do I make an if command that changes what the script does for a function activated by 2 different tools trying to update a different leader stat

Basically, I am trying to make two tools activate the same function, except one tool makes the function update one leaderstat while the other tool makes the function update a different leaderstat.
local remote = game.ReplicatedStorage.Give
remote.OnServerEvent:Connect(function(Player)
    local plr = Player
    if Activated by Starterpack.Child.Cloud then
        plr.leaderstats.JumpBoost.Value = plr.leaderstats.JumpBoost.Value + 10
    or if Activated by Starterpack.Child.Speed then
        plr.leaderstats.Speed.Value = plr.Leaderstats.Speed.Value + 10
    end
end)
I expected it to allow one tool to activate the same function as the other tool but change a different leader stat
RemoteEvent.FireServer lets you pass any number of arguments when you invoke it. Have your tools each supply a different identifier, and then you can key off the identifier in RemoteEvent.OnServerEvent.
LocalScript inside Tool 1 - Cloud
local remoteGive = game.ReplicatedStorage.Give
local tool = script.Parent

tool.Equipped:Connect(function()
    remoteGive:FireServer("Cloud")
end)
LocalScript inside Tool 2 - Speed
local remote = game.ReplicatedStorage.Give
local tool = script.Parent

tool.Equipped:Connect(function()
    remote:FireServer("Speed")
end)
Server Script
local remote = game.ReplicatedStorage.Give

remote.OnServerEvent:Connect(function(Player, toolId)
    if toolId == "Cloud" then
        Player.leaderstats.JumpBoost.Value = Player.leaderstats.JumpBoost.Value + 10
    elseif toolId == "Speed" then
        Player.leaderstats.Speed.Value = Player.leaderstats.Speed.Value + 10
    end
end)

MBED OS 5.9 LoRA set up in SF7

Do you know how to set the spreading factor to 12 with the Mbed OS LoRaWAN protocol APIs when connecting to a LoRaWAN network using OTAA?
I'm trying to make my LoRa node use spreading factor SF12, because the default is SF7. I know that in the PHY layer we can change the radio configuration. There are several examples of switching between the different sub-GHz frequency bands, but I can't find one that changes the LoRa modulation SF between 7 and 12 with a bandwidth of 125 kHz.
I'm using an SX1276 radio with the EU 868 MHz configuration.
In the source code you can find the different SF7-SF12 configurations, but there is no clear way to set them. These configs are the #define definitions (DR_0, DR_1, etc.).
In the configuration file, in the PHY part, you find an example like this:
"phy": {
"help": "LoRa PHY region. 0 = EU868 (default), 1 = AS923,
2 = AU915, 3 = CN470, 4 = CN779, 5 = EU433,
6 = IN865, 7 = KR920, 8 = US915, 9 = US915_HYBRID",
"value": "0"
},
But there are no examples or descriptions for the spreading factor.
I would like to change it via source code rather than the configuration file.
EDIT 1:
After Jon's answer, I added the following lines, but they still do not force the join to use SF12.
retcode = lorawan.disable_adaptive_datarate();
retcode = lorawan.set_datarate(0); // DR_0
Call:
lorawan.set_datarate(0); // SF12, 125 kHz
Make sure to:
Disable ADR.
Either use ABP, or call the function above in the JOIN_SUCCESS event handler. This is because the join procedure always starts at SF7 and then keeps the data rate at which the join succeeded.

Spark: Data processing using Spark for a large number of files says SocketException: Read timed out

I am running Spark in standalone mode on 2 machines with these configs:
500 GB disk, 4 cores, 7.5 GB RAM
250 GB disk, 8 cores, 15 GB RAM
I have created a master and a slave on the 8-core machine, giving 7 cores to the worker, and another slave on the 4-core machine with 3 worker cores. The UI shows 13.7 GB and 6.5 GB usable RAM for the 8-core and 4-core machines respectively.
On this setup I have to process an aggregate of user ratings over a period of 15 days. I am trying to do this using PySpark.
The data is stored in hourly files inside day-wise directories in an S3 bucket; each file is around 100 MB, e.g.
s3://some_bucket/2015-04/2015-04-09/data_files_hour1
I am reading the files like this
a = sc.textFile(files, 15).coalesce(7*sc.defaultParallelism) #to restrict partitions
where files is a string of this form 's3://some_bucket/2015-04/2015-04-09/*,s3://some_bucket/2015-04/2015-04-09/*'
Then I do a series of maps and filters and persist the result
a.persist(StorageLevel.MEMORY_ONLY_SER)
Then I need to do a reduceByKey to get an aggregate score over the span of days.
b = a.reduceByKey(lambda x, y: x+y).map(aggregate)
b.persist(StorageLevel.MEMORY_ONLY_SER)
Then I need to make a redis call for the actual terms for the items the user has rated, so I call mapPartitions like this
final_scores = b.mapPartitions(get_tags)
The get_tags function creates a Redis connection on each invocation, queries Redis, and yields (user, item, rating) tuples.
(The Redis hash is stored on the 4-core machine.)
I have tweaked the settings for SparkConf to be at
conf = (SparkConf().setAppName(APP_NAME).setMaster(master)
        .set("spark.executor.memory", "5g")
        .set("spark.akka.timeout", "10000")
        .set("spark.akka.frameSize", "1000")
        .set("spark.task.cpus", "5")
        .set("spark.cores.max", "10")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer.max.mb", "10")
        .set("spark.shuffle.consolidateFiles", "True")
        .set("spark.files.fetchTimeout", "500")
        .set("spark.task.maxFailures", "5"))
I run the job with a driver memory of 2g in client mode, since cluster mode doesn't seem to be supported here.
The above process takes a long time for 2 days' worth of data (around 2.5 hours) and completely gives up on 14 days' worth.
What needs to be improved here?
Is this infrastructure insufficient in terms of RAM and cores? (This is an offline job and can take hours, but it has to finish in 5 hours or so.)
Should I increase/decrease the number of partitions?
Redis could be slowing the system down, but the number of keys is just too huge to fetch them in a single call.
I am not sure where the task is failing, in reading the files or in reducing.
Should I drop Python in favour of Scala's better Spark APIs, and would that help with efficiency as well?
This is the exception trace
Lost task 4.1 in stage 0.0 (TID 11, <node>): java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:200)
at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:103)
at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)
at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:227)
at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
at org.apache.http.util.EntityUtils.consume(EntityUtils.java:88)
at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.releaseConnection(HttpMethodReleaseInputStream.java:102)
at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.close(HttpMethodReleaseInputStream.java:194)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:152)
at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:89)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:126)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:93)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:92)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:405)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)
I could really use some help, thanks in advance.
Here is what my main code looks like:
def main(sc):
    f = get_files()
    a = (sc.textFile(f, 15)
         .coalesce(7 * sc.defaultParallelism)
         .map(lambda line: line.split(","))
         .filter(lambda line: len(line) > 0)
         .map(lambda line: (line[18], line[2], line[13], line[15]))
         .map(scoring)
         .map(lambda line: ((line[0], line[1]), line[2]))
         .persist(StorageLevel.MEMORY_ONLY_SER))
    b = a.reduceByKey(lambda x, y: x + y).map(aggregate)
    b.persist(StorageLevel.MEMORY_ONLY_SER)
    c = b.mapPartitions(get_tags)
    c.saveAsTextFile("f")
    a.unpersist()
    b.unpersist()
The get_tags function is
def get_tags(partition):
    rh = redis.Redis(host=settings['REDIS_HOST'], port=settings['REDIS_PORT'], db=0)
    for element in partition:
        user = element[0]
        song = element[1]
        rating = element[2]
        tags = rh.hget(settings['REDIS_HASH'], song)
        if tags:
            tags = json.loads(tags)
        else:
            tags = scrape(song, rh)
        if tags:
            for tag in tags:
                yield (user, tag, rating)
The get_files function is:
def get_files():
    paths = get_path_from_dates(DAYS)
    base_path = 's3n://acc_key:sec_key#bucket/'
    files = list()
    for path in paths:
        fle = base_path + path + '/file_format.*'
        files.append(fle)
    return ','.join(files)
The get_path_from_dates(DAYS) is
def get_path_from_dates(last):
    days = list()
    t = 0
    while t <= last:
        d = today - timedelta(days=t)
        path = d.strftime('%Y-%m') + '/' + d.strftime('%Y-%m-%d')
        days.append(path)
        t += 1
    return days
As a small optimization, I have created two separate jobs: one to read from S3 and compute the additive sum, and a second to read the transformations from Redis. The first job has a high number of partitions since there are around 2,300 files to read. The second one has a much smaller number of partitions to avoid Redis connection latency, and there is only one file to read, which is on the EC2 cluster itself. This is only a partial solution; I am still looking for suggestions to improve it...
I had a similar use case: doing coalesce on an RDD with 300,000+ partitions. The difference is that I was using s3a (SocketTimeoutException from S3AFileSystem.waitAysncCopy). The issue was finally resolved by setting a larger fs.s3a.connection.timeout (in Hadoop's core-site.xml). Hopefully this gives you a clue.
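If editing core-site.xml is not convenient, the same Hadoop property can usually be passed through Spark's own configuration via the spark.hadoop.* prefix, which Spark forwards to the Hadoop Configuration. This is a minimal sketch under that assumption; the app name and timeout value are purely illustrative, and the property only takes effect when the files are read through the s3a:// scheme (the trace above shows the older s3n NativeS3FileSystem).

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("ratings-aggregation")
        # Forwarded to Hadoop's Configuration; only honoured by the s3a:// filesystem.
        # 200000 ms is an illustrative value, not a recommendation.
        .set("spark.hadoop.fs.s3a.connection.timeout", "200000"))
sc = SparkContext(conf=conf)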

S3 path error with Flume HDFS Sink

I have a Flume consolidator which writes every entry to an S3 bucket on AWS.
The problem is with the directory path.
The events are supposed to be written to /flume/events/%y-%m-%d/%H%M, but they end up in //flume/events/%y-%m-%d/%H%M.
It seems that Flume is prepending one more "/" at the beginning.
Any ideas about this issue? Is it a problem with my path configuration?
master.sources = source1
master.sinks = sink1
master.channels = channel1
master.sources.source1.type = netcat
# master.sources.source1.type = avro
master.sources.source1.bind = 0.0.0.0
master.sources.source1.port = 4555
master.sources.source1.interceptors = inter1
master.sources.source1.interceptors.inter1.type = timestamp
master.sinks.sink1.type = hdfs
master.sinks.sink1.hdfs.path = s3://KEY:SECRET#BUCKET/flume/events/%y-%m-%d/%H%M
master.sinks.sink1.hdfs.filePrefix = event
master.sinks.sink1.hdfs.round = true
master.sinks.sink1.hdfs.roundValue = 5
master.sinks.sink1.hdfs.roundUnit = minute
master.channels.channel1.type = memory
master.channels.channel1.capacity = 1000
master.channels.channel1.transactionCapacity = 100
master.sources.source1.channels = channel1
master.sinks.sink1.channel = channel1
The Flume NG HDFS sink doesn't implement anything special for S3 support. Hadoop has some built-in support for S3, but I don't know of anyone actively working on it. From what I have heard, it is somewhat out of date and may have some durability issues under failure.
That said, I know of people using it because it's "good enough".
Are you saying that "//xyz" (with multiple adjacent slashes) is a valid path name on S3? As you probably know, most Unixes collapse adjacent slashes.