The DNS standard allows specifying more than one question per query (I mean inside a single DNS packet). I'm writing a Snort plugin for DNS analysis and I need to test whether it behaves properly when a DNS query contains multiple questions.
DNS packet structure looks like this:
  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|          <ACTUAL QUESTIONS GO HERE>           |
|                                               |
|                      ...                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
So if QDCOUNT is greater than 1, there can be multiple DNS questions in a single query.
How can I perform such a query using Linux tools? dig domain1.example domain2.example just creates two separate queries with one question each, and host and nslookup seem to allow querying only one name at a time.
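If there is no ready-made tool, I suppose I could hand-craft the packet myself. A rough, untested sketch with Python's standard library (field layout as in the diagram above; the resolver address 127.0.0.1 is just a placeholder):

import socket
import struct

def encode_qname(name):
    # "domain1.example" -> length-prefixed labels terminated by a zero byte
    return b"".join(bytes([len(p)]) + p.encode("ascii")
                    for p in name.rstrip(".").split(".")) + b"\x00"

def build_query(names, qtype=1, qclass=1):   # qtype 1 = A, qclass 1 = IN
    # Header: ID, flags (only RD set), QDCOUNT = len(names), AN/NS/ARCOUNT = 0
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, len(names), 0, 0, 0)
    body = b"".join(encode_qname(n) + struct.pack(">HH", qtype, qclass)
                    for n in names)
    return header + body

packet = build_query(["domain1.example", "domain2.example"])  # QDCOUNT = 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2)
sock.sendto(packet, ("127.0.0.1", 53))       # placeholder resolver address
try:
    print(sock.recvfrom(512))
except socket.timeout:
    print("no reply - many servers ignore or reject QDCOUNT > 1")

Still, I was hoping a standard tool could do this directly.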
See this question for the full details: Requesting A and AAAA records in single DNS query
In short: no, in practice nobody today puts multiple questions in a single query. The behaviour was never clearly defined and it raises a lot of questions (for example, there is only a single return code, so what do you return for two questions if one fails and the other succeeds?).
It would have been useful for people to do A and AAAA queries at the same time (instead of the deprecated ANY) but it basically does not exist today.
You can retrieve all the records from a zone using a single AXFR request, and then parse out the ones you want.
dig @127.0.0.1 domain.com. AXFR
or
nslookup -query=AXFR domain.com 127.0.0.1
Typically AXFR requests are refused except for secondary (slave) servers, so you will need to whitelist the IPs that are allowed to make this request. (In BIND this is done with the allow-transfer option.)
This won't work for the OP's use case of testing a Snort plugin that checks QDCOUNT, but it does kind of solve the problem of getting multiple answers out of a single DNS request.
source: serverfault: How to request/acquire all records from a DNS?
I have an application that receives messages from a database via the write-ahead logs, and every row looks something like this:
| id | prospect_id | school_id | something | something else |
|----|-------------|------------|-----------|----------------|
| 1 | 5 | 10 | who | cares |
| 2 | 5 | 11 | what | this |
| 3 | 6 | 10 | is | blah |
Eventually, I will need to query the database for the mapping between prospect_id and school name. The query results are in the tens of thousands. The schools table has a name column that I can query in a simple join. However, I want to store this information somewhere on my server that would be easily accessible by my application. I need this information:
stored locally so I can access it quickly
capable of being updated once a day asynchronously
independent of the application so that when it's deployed or restarted, it doesn't need to be queried again.
What can be done? What are some options?
EDIT
Is pickle a good idea? https://datascience.blog.wzb.eu/2016/08/12/a-tip-for-the-impatient-simple-caching-with-python-pickle-and-decorators/
What are the limitations of pickle? The results of the SQL query might be in the tens of thousands of rows.
The drawback of using pickle is that it is a Python-specific protocol. If you intend for other programming languages to read this file, then the tooling might not exist to read it and you would be better off storing it in something like a JSON or XML file. If you will only be reading it with Python then pickle is fine.
Here are a few options you have:
Load the data from SQL into a global value when the application starts up (the SQL data can be stored locally, it doesn't have to be on an external system).
Use pickle to serialize/deserialize the data to and from a file when needed (see the sketch below).
Load the data into Redis, an in-memory caching system.
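For the pickle option, a minimal sketch might look like the following (the file name and the daily refresh job are assumptions, not part of your setup):

import pickle
from pathlib import Path

CACHE_FILE = Path("prospect_school_map.pickle")   # hypothetical local cache file

def save_mapping(mapping):
    # mapping: dict of prospect_id -> school name, built from the SQL join
    with CACHE_FILE.open("wb") as f:
        pickle.dump(mapping, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_mapping():
    with CACHE_FILE.open("rb") as f:
        return pickle.load(f)

# Refresh once a day (e.g. from a scheduled job); read on demand in the app.
save_mapping({5: "Springfield High", 6: "Shelbyville Elementary"})
print(load_mapping()[5])

A dictionary with tens of thousands of entries is well within what pickle handles comfortably; Redis mainly adds the ability to share the cache across processes or hosts.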
I am completely desperate about a performance difference and I have absolutely no clue WHY it exists.
Overview
VMware Workstation v11 on my local computer. I gave the VM just 2 cores and 4GB memory.
Hyper-V Server 2012 R2 with two 6-core Xeons (older ones) and 64GB memory. Only this one VM is running, with all hardware assigned to it.
Referring to a CPU benchmark I ran in each VM, the VM within Hyper-V should be about 5x faster than my local one.
I stripped my code down to just this one operation, which I put in a WHILE loop to simulate parallel queries - normally this is done by a webserver.
Code
DECLARE @cnt INT = 1

WHILE @cnt <= 1000
BEGIN
    BEGIN TRANSACTION Trans
        UPDATE [Test].[dbo].[NumberTable]
        SET Number = Number + 1
        OUTPUT deleted.*
    COMMIT TRANSACTION Trans
    SET @cnt = @cnt + 1;
END
When I execute this in SSMS it needs:
VMware Workstation: 43s
Hyper-V Server: 59s
...which is about 2x slower although the system is at least 4x faster.
Some facts
the DB is the same - backed up and restored
the table has just 1 row and 13 fields
the table has 3 indexes, none of them involves "Number"
the logged-in user is 'sa'
OS is identical
SQL Server version is identical (same iso)
installed SQL Server features are the same
to be sure Hyper-V is not the bottleneck, I also installed VMware ESXi v6 on another, even less powerful server - the result is nearly identical to the Hyper-V machine
settings in SSMS should be identical - checked it twice
execution plan is identical on each machine - just execution time is different
the more loops I choose, the bigger the relative time difference gets
ADDED: when I comment out the OUTPUT line to suppress returning each row (and its values), my VMware Workstation does it in under 1s while the Hyper-V machine needs 5s. When I increase the loop count to 2000, my VMware Workstation again stays under 1s, whereas the Hyper-V version needs 10s!
When running the full code from a local webserver the difference is about 0.8s versus about 9s! ...no, I have not forgotten the '0.'!!
Can you give me a hint what the hell is going on, or what else I can check?
EDIT
I tested the code above without the OUTPUT-line and with 10,000 passes. The client statistics on both systems look identical, except the time statistics:
VMware Workstation:
+-----------------------------+------+------+-----------+
| Time statistics             |  (1) |  (2) |       (3) |
+-----------------------------+------+------+-----------+
| Client processing time      | 2328 | 1084 | 1706.0000 |
| Total execution time        | 2343 | 1098 | 1720.5000 |
| Wait time on server replies |   15 |   14 |   14.5000 |
+-----------------------------+------+------+-----------+
Hyper-V:
+-----------------------------+-------+------+------------+
| Time statistics             |   (1) |  (2) |        (3) |
+-----------------------------+-------+------+------------+
| Client processing time      | 55500 | 1250 | 28375.0000 |
| Total execution time        | 55718 | 1328 | 28523.0000 |
| Wait time on server replies |   218 |   78 |   148.0000 |
+-----------------------------+-------+------+------------+
(1) : 10,000 passes without OUTPUT
(2) : 1,000 passes with OUTPUT
(3) : mean
EDIT (for HLGEM)
I compared both execution plans and indeed there are two differences:
fast system:
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="24" CompileTime="0" CompileCPU="0" CompileMemory="176">
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="104842" EstimatedPagesCached="26210" EstimatedAvailableDegreeOfParallelism="2" />
slow system:
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="24" CompileTime="1" CompileCPU="1" CompileMemory="176">
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="524272" EstimatedPagesCached="655341" EstimatedAvailableDegreeOfParallelism="10" />
Did you check the hardware fully?
It looks as if the OUTPUT operator spends some time displaying the data to you.
https://msdn.microsoft.com/en-us/library/ms177564%28v=sql.120%29.aspx
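To separate client-side work (SSMS rendering the grid and collecting client statistics) from the time spent on the server, you could also time the same batch from a small script against both machines. A rough sketch with Python and pyodbc; the driver name, host names and credentials are placeholders, and the OUTPUT line is left out as in your edit:

import time
import pyodbc

BATCH = """
SET NOCOUNT ON;
DECLARE @cnt INT = 1;
WHILE @cnt <= 1000
BEGIN
    BEGIN TRANSACTION;
    UPDATE [Test].[dbo].[NumberTable] SET Number = Number + 1;
    COMMIT TRANSACTION;
    SET @cnt = @cnt + 1;
END
"""

for server in ("vmware-host", "hyperv-host"):            # placeholder host names
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=" + server +
        ";DATABASE=Test;UID=sa;PWD=...", autocommit=True)
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(BATCH)
    print(server, round(time.perf_counter() - start, 3), "s")
    conn.close()

If the numbers from the script are close to each other, the difference you see is mostly in the client, not in SQL Server.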
Time differences depend on many things. A local server may be faster because you are not sending data through a full network pipeline. Other work happening concurrently on each server may affect speed.
Typically in dev there is little or no other workload and things can be faster than on Prod, where there are thousands of users trying to do things at the same time. This is why load testing is important if you have a large system.
You don't mention indexing but that too can be different on different servers (even when it is supposed to be the same!). So at least check that.
Look at the execution plans to see if you can find the difference. Outdated statistics can also result in a less-than-optimal execution plan.
Does one of the servers run applications other than the database? That could be limiting the amount of memory the server has available for the database to use.
Honestly, this is a huge topic and there are many, many things you should be checking. If you are doing this kind of analysis, I would suggest you buy a performance tuning book and read through it to figure out what things can affect this. This is not something that can easily be answered by a question on the Internet; you need to get some in-depth knowledge.
Query speed has little to do with CPU/memory speed, especially queries that update data.
Query speed is mainly limited by disk I/O speed, which is at least 1000 times slower than CPU/RAM speed. Making queries faster is ultimately about avoiding unnecessary disk I/O, but your query must read and write every row.
The VM box (probably) uses a virtual drive that is mapped to a file on disk and there is probably some effort required to keep the two aligned, possibly even asynchronously, while other processes are running and contending with the drive.
Maybe your workstation has less contention or a simpler virtual file system etc.
We are thinking of introducing an AMQP-based approach for our microservice infrastructure (choreography). We have several services, let's say the customer-service, user-service, article-service etc. We are planning to introduce RabbitMQ as our central messaging system.
I'm looking for best practices for the design of the system regarding topics/queues etc. One option would be to create a message queue for every single event which can occur in our system, for example:
user-service.user.deleted
user-service.user.updated
user-service.user.created
...
I think it is not the right approach to create hundreds of message queues, is it?
I would like to use Spring and these nice annotations, so for example:
@RabbitListener(queues = "user-service.user.deleted")
public void handleEvent(UserDeletedEvent event) { ...
Isn't it better to just have something like "user-service-notifications" as one queue and then send all notifications to that queue? I would still like to register listeners just to a subset of all events, so how to solve that?
My second question: if I want to listen on a queue which has not been created before, I will get an exception in RabbitMQ. I know I can "declare" a queue with the AmqpAdmin, but should I do this for every one of my hundreds of queues in every single microservice, since it can always happen that a queue has not been created yet?
I generally find it is best to have exchanges grouped by object type / exchange type combinations.
in your example of user events, you could do a number of different things depending on what your system needs.
in one scenario, it might make sense to have an exchange per event as you've listed. you could create the following exchanges:
| exchange     | type   |
|--------------|--------|
| user.deleted | fanout |
| user.created | fanout |
| user.updated | fanout |
this would fit the "pub/sub" pattern of broadcasting events to any listeners, with no concern for what is listening.
with this setup, any queue that you bind to any of these exchanges will receive all messages that are published to the exchange. this is great for pub/sub and some other scenarios, but it might not be what you want all the time since you won't be able to filter messages for specific consumers without creating a new exchange, queue and binding.
in another scenario, you might find that there are too many exchanges being created because there are too many events. you may also want to combine the exchange for user events and user commands. this could be done with a direct or topic exchange:
| exchange | type  |
|----------|-------|
| user     | topic |
With a setup like this, you can use routing keys to publish specific messages to specific queues. For example, you could publish user.event.created as a routing key and have it route with a specific queue for a specific consumer.
| exchange | type  | routing key        | queue              |
|----------|-------|--------------------|--------------------|
| user     | topic | user.event.created | user-created-queue |
| user     | topic | user.event.updated | user-updated-queue |
| user     | topic | user.event.deleted | user-deleted-queue |
| user     | topic | user.cmd.create    | user-create-queue  |
With this scenario, you end up with a single exchange and routing keys are used to distribute the message to the appropriate queue. notice that i also included a "create command" routing key and queue here. this illustrates how you could combine patterns, though.
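for illustration, here's roughly what that topic-exchange setup looks like in python with pika (exchange, queue and routing-key names come from the table above; the localhost broker is an assumption - spring amqp can declare the same exchange/queue/binding objects for you):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# one topic exchange for the "user" object type
channel.exchange_declare(exchange="user", exchange_type="topic", durable=True)

# one queue per consumer intent, bound by routing key
bindings = [
    ("user-created-queue", "user.event.created"),
    ("user-updated-queue", "user.event.updated"),
    ("user-deleted-queue", "user.event.deleted"),
    ("user-create-queue",  "user.cmd.create"),
]
for queue, routing_key in bindings:
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange="user", routing_key=routing_key)

# a publisher only needs the exchange and a routing key
channel.basic_publish(exchange="user", routing_key="user.event.created",
                      body=b'{"userId": 42}')
connection.close()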
I would still like to register listeners just to a subset of all events, so how to solve that?
by using a fanout exchange, you would create queues and bindings for the specific events you want to listen to. each consumer would create its own queue and binding.
by using a topic exchange, you could set up routing keys to send specific messages to the queue you want, including all events with a binding like user.event.#.
if you need specific messages to go to specific consumers, you do this through the routing and bindings.
ultimately, there is no right or wrong answer for which exchange type and configuration to use without knowing the specifics of each system's needs. you could use any exchange type for just about any purpose. there are tradeoffs with each one, and that's why each application will need to be examined closely to understand which one is correct.
as for declaring your queues. each message consumer should declare the queues and bindings it needs before trying to attach to it. this can be done when the application instance starts up, or you can wait until the queue is needed. again, this depends on what your application needs.
i know the answer i'm providing is rather vague and full of options, rather than real answers. there are not specific solid answers, though. it's all fuzzy logic, specific scenarios and looking at the system needs.
FWIW, I've written a small eBook that covers these topics from a rather unique perspective of telling stories. it addresses many of the questions you have, though sometimes indirectly.
Derick's advice is fine, except for how he names his queues. Queues should not merely mimic the name of the routing key. Routing keys are elements of the message, and the queues shouldn't care about that. That's what bindings are for.
Queue names should be named after what the consumer attached to the queue will do - what is the intent of the operation behind this queue? Say you want to send an email to the user when their account is created (when a message with routing key user.event.created is sent using Derick's answer above). You would create a queue named sendNewUserEmail (or something along those lines, in a style that you find appropriate). This means it's easy to review and know exactly what that queue does.
Why is this important? Well, now you have another routing key, user.cmd.create. Let's say this event is sent when another user creates an account for someone else (for example, members of a team). You still want to send an email to that user as well, so you create the binding to send those messages to the sendNewUserEmail queue.
If the queue was named after the binding, it can cause confusion, especially if routing keys change. Keep queue names decoupled and self descriptive.
Before answering the "one exchange, or many?" question. I actually want to ask another question: do we really even need a custom exchange for this case?
Different types of object events map naturally to different types of published messages, but that is not always really necessary. What if we abstract all 3 types of events as a “write” event, whose sub-types are “created”, “updated” and “deleted”?
| object | event | sub-type |
|--------|-------|----------|
| user   | write | created  |
| user   | write | updated  |
| user   | write | deleted  |
Solution 1
The simplest solution to support this is to design just a “user.write” queue and publish all user write event messages to this queue directly via the global default exchange. When publishing to a queue directly, the biggest limitation is that it assumes only one app subscribes to this type of message. Multiple instances of one app subscribing to this queue are also fine.
| queue      | app  |
|------------|------|
| user.write | app1 |
Solution 2
The simplest solution does not work when a second app (with different processing logic) wants to subscribe to the messages published to the queue. When there are multiple apps subscribing, we at least need one “fanout” type exchange with bindings to multiple queues, so that messages are published to the exchange and the exchange duplicates the messages to each of the queues. Each queue represents the processing job of a different app.
| queue           | subscriber |
|-----------------|------------|
| user.write.app1 | app1       |
| user.write.app2 | app2       |

| exchange   | type   | binding_queue   |
|------------|--------|-----------------|
| user.write | fanout | user.write.app1 |
| user.write | fanout | user.write.app2 |
This second solution works fine if each subscriber cares about and wants to handle all the sub-types of “user.write” events, or at least if exposing all these sub-type events to each subscriber is not a problem. For instance, that is the case if the subscriber app simply keeps a transaction log; or, although a subscriber handles only user.created, it is ok to let it know when user.updated or user.deleted happens. It becomes less elegant when some subscribers are external to your organization and you only want to notify them about some specific sub-type events. For instance, if app2 only wants to handle user.created, it should not have any knowledge of user.updated or user.deleted at all.
Solution 3
To solve the issue above, we have to extract the “user.created” concept from “user.write”. The “topic” exchange type can help. When publishing the messages, let's use user.created/user.updated/user.deleted as routing keys, so that we can set the binding key of the “user.write.app1” queue to “user.*” and the binding key of the “user.created.app2” queue to “user.created”.
| queue             | subscriber |
|-------------------|------------|
| user.write.app1   | app1       |
| user.created.app2 | app2       |

| exchange   | type  | binding_queue     | binding_key  |
|------------|-------|-------------------|--------------|
| user.write | topic | user.write.app1   | user.*       |
| user.write | topic | user.created.app2 | user.created |
Solution 4
The “topic” exchange type is more flexible in case there will potentially be more event sub-types. But if you clearly know the exact set of events, you could also use the “direct” exchange type instead for better performance.
| queue             | subscriber |
|-------------------|------------|
| user.write.app1   | app1       |
| user.created.app2 | app2       |

| exchange   | type   | binding_queue     | binding_key  |
|------------|--------|-------------------|--------------|
| user.write | direct | user.write.app1   | user.created |
| user.write | direct | user.write.app1   | user.updated |
| user.write | direct | user.write.app1   | user.deleted |
| user.write | direct | user.created.app2 | user.created |
Coming back to the “one exchange, or many?” question: so far, all the solutions use only one exchange. That works fine, nothing wrong with it. Then, when might we need multiple exchanges? There is a slight performance drop if a “topic” exchange has too many bindings, and if that really becomes an issue you could of course use more “direct” exchanges to reduce the number of “topic” exchange bindings. But here I want to focus more on the functional limitations of “one exchange” solutions.
Solution 5
One case where we might naturally consider multiple exchanges is for different groups or dimensions of events. For instance, besides the created, updated and deleted events mentioned above, we might have another group of events: login and logout - events describing “user behaviour” rather than “data writes”. Because different groups of events might need completely different routing strategies and routing-key/queue naming conventions, it is only natural to have a separate user.behavior exchange.
| queue              | subscriber |
|--------------------|------------|
| user.write.app1    | app1       |
| user.created.app2  | app2       |
| user.behavior.app3 | app3       |

| exchange      | type  | binding_queue      | binding_key  |
|---------------|-------|--------------------|--------------|
| user.write    | topic | user.write.app1    | user.*       |
| user.write    | topic | user.created.app2  | user.created |
| user.behavior | topic | user.behavior.app3 | user.*       |
Other Solutions
There are other cases when we might need multiple exchanges for one object type. For instance, if you want to set different permissions on exchanges (e.g. only selected events of one object type are allowed to be published to one exchange from external apps, while the other exchange accepts any events from internal apps). For another instance, if you want to use different exchanges suffixed with a version number to support different versions of routing strategies for the same group of events. As yet another instance, you might want to define some “internal exchanges” for exchange-to-exchange bindings, which can manage routing rules in a layered way.
In summary, still, “the final solution depends on your system needs”, but with all the solution examples above and the background considerations, I hope it at least gets you thinking in the right directions.
I also created a blog post, putting together this problem background, the solutions and other related considerations.
Let me first start by stating that in the last two weeks I have received ENORMOUS help from just about all of you (ok ok not all... but I think perhaps two dozen people commented, and almost all of these comments were helpful). This is really amazing and I think it shows that the stackoverflow team really did something GREAT altogether. So thanks to all!
Now as some of you know, I am working at a campus right now and I have to use a windows machine. (I am the only one who has to use windows here... :( )
Now I managed to set up (ok, the IT department did that for me) and populate a Postgres database (this I did on my own) with about 400 MB of data. Which is perhaps not much for the heavy Postgres users among you, but I was more used to SQLite databases for personal use, which rarely ever exceeded 2 MB.
Anyway, sorry for being so chatty - the queries against that database work nicely. I use Ruby to run the queries, actually.
The entries in the Postgres database are interconnected, insofar as they are like "pointers" - each entry has one field that points to another entry.
Example:
entry 3667 points to entry 35785 which points to entry 15566. So it is quite simple.
The main entry is 1, so every chain of these queries ends at 1. So, from any other number, we can reach 1 in the end as the last result.
I am using Ruby to make individual queries against the database until the last result returned is 1. This can take up to 10 individual queries. I do this by logging into psql with my password and credentials, and then performing the SQL query via -c. This is probably not ideal: it takes a little time to do these logins and queries, and ideally I would log in only once, perform ALL queries inside Postgres, and then exit with a result (all these entries as the result).
Now here comes my question:
- Is there a way to make conditional queries all inside of Postgres?
I know how to do it in a shell script and in Ruby, but I do not know whether this is available in PostgreSQL at all.
I would need to make the query, in literal english, like so:
"Please give me all the entries that point to the parent entry, until the last found entry is eventually 1, then return all of these entries."
I already "solved" it by using Ruby to make several queries until 1 is eventually returned, but this strikes me as fairly inelegant and probably inefficient.
Any information is very much appreciated - thanks!
Edit (argh, I fail at pasting...):
Example dataset, the table would be like this:
 id | parent
----+--------
  1 |      1
  2 | 131567
  6 | 335928
  7 |      6
  9 |      1
 10 | 135621
 11 |      9
I hope that works, I tried to narrow it down solely on example.
For instance, id 11 points to id 9, and id 9 points to id 1.
It would be great if one could use SQL to return:
11 -> 9 -> 1
Unless you give some example table definitions, what you're asking for vaguely reminds me of a tree structure, which could be traversed with recursive queries: http://www.postgresql.org/docs/8.4/static/queries-with.html .
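With the example table from the edit (I'll call it tree here, with columns id and parent - adjust the names to your schema), a recursive CTE can walk the whole chain in a single round trip. Here is a sketch via psycopg2; the same SQL works just as well from Ruby's pg gem or a single psql -c call:

import psycopg2

SQL = """
WITH RECURSIVE chain AS (
    SELECT id, parent FROM tree WHERE id = %s
    UNION ALL
    SELECT t.id, t.parent
    FROM tree t
    JOIN chain c ON t.id = c.parent
    WHERE c.id <> 1              -- stop once the root (id 1) has been reached
)
SELECT id FROM chain;
"""

conn = psycopg2.connect("dbname=mydb user=me")   # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(SQL, (11,))
    print([row[0] for row in cur.fetchall()])    # -> [11, 9, 1]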
I'm looking at writing a Django app to help document fairly small IT environments. I'm getting stuck at how best to model the data as the number of attributes per device can vary, even between devices of the same type. For example, a SAN will have 1 or more arrays, and 1 or more volumes. The arrays will then have an attribute of Name, RAID Level, Size, Number of disks, and the volumes will have attributes of Size and Name. Different SANs will have a different number of arrays and volumes.
Same goes for servers, each server could have a different number of disks/partitions, all of which will have attributes of Size, Used space, etc, and this will vary between servers.
Another device type may be a switch, which won't have arrays or volumes, but will have a number of network ports, some of which may be gigabit, others 10/100, others 10Gigabit, etc.
Further, I would like the ability to add device types in the future without changing the model. A new device type may be a phone system, which will have its own unique attributes which may vary between different phone systems.
I've looked into EAV database designs but it seems to get very complicated very quickly, and I'm unclear on whether it's the best way to go about this. I was thinking something along the lines of the model as shown in the picture.
http://i.stack.imgur.com/ZMnNl.jpg
A bonus would be the ability to create 'snapshots' of environments at a particular time, making it possible to view changes to the environment over time. Adding a date column to the attributes table may be a way to solve this.
For the record, this app won't need to scale very much (at most 1000 devices), so massive scalability isn't a big concern.
Since your attributes are per model instance and are different for each instance, I would suggest going with a completely free schema:
from django.db.models import Model, CharField, ForeignKey, CASCADE

class ITEntity(Model):
    name = CharField(max_length=255)

class ITAttribute(Model):
    name = CharField(max_length=255)
    value = CharField(max_length=255)
    entity = ForeignKey(ITEntity, related_name="attrs", on_delete=CASCADE)
This is a very simple model and you can do the rest, like templates (i.e. switch template, router template, etc.), in your app code - it's much more straightforward than using a complicated model like EAV (I do like EAV, but this does not seem like the use case for it).
Adding history is also simple - just add a timestamp to ITAttribute. When changing an attribute, create a new one instead. Then, when fetching an attribute, pick the one with the latest timestamp. That way you can always have a point-in-time view of your environment.
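A rough sketch of that history idea, assuming a created timestamp field (the helper names are just suggestions):

from django.db.models import Model, CharField, DateTimeField, ForeignKey, CASCADE

class ITAttribute(Model):
    name = CharField(max_length=255)
    value = CharField(max_length=255)
    entity = ForeignKey("ITEntity", related_name="attrs", on_delete=CASCADE)
    created = DateTimeField(auto_now_add=True)   # rows are only ever added, never updated

def current_value(entity, attr_name):
    # the newest row wins; older rows remain as history
    row = entity.attrs.filter(name=attr_name).order_by("-created").first()
    return row.value if row else None

def value_at(entity, attr_name, when):
    # point-in-time view: the newest row created on or before `when`
    row = (entity.attrs.filter(name=attr_name, created__lte=when)
           .order_by("-created").first())
    return row.value if row else None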
If you are more comfortable with something along the lines of the image you posted, below is a slightly modified version (sorry I can't upload an image, don't have enough rep).
+-------------+
| Device Type |
|-------------|
| type        |
+-------------+
       ^
       |
+---------------+     +--------------------+     +-----------+
| Device        |----<| DeviceAttributeMap |>----| Attribute |
|---------------|     |--------------------|     |-----------|
| name          |     | Device             |     | name      |
| DeviceType    |     | Attribute          |     +-----------+
| parent_device |     | value              |
| Site          |     +--------------------+
+---------------+
       |
       v
+-------------+
| Site        |
|-------------|
| location    |
+-------------+
I added a linker table DeviceAttributeMap so you can have more control over an Attribute catalog, allowing queries for devices with the same Attribute but differing values. I also added a field in the Device model named parent_device, intended as a self-referential foreign key to capture the relationship between a device and its parent device. You'll likely want to make this field optional. To make the foreign key parent_device optional in Django, set the field's null and blank attributes to True.
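For example, a minimal sketch of that optional self-referential field (the field and related names are just suggestions):

from django.db.models import Model, CharField, ForeignKey, SET_NULL

class Device(Model):
    name = CharField(max_length=255)
    # DeviceType, Site and the attribute mapping are omitted here
    parent_device = ForeignKey("self", null=True, blank=True,
                               related_name="child_devices", on_delete=SET_NULL)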
You could try a document based NoSQL database, like MongoDB. Each document can represent a device with as many different fields as you like.
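A minimal sketch with pymongo (the database, collection and field names are arbitrary):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["it_docs"]   # placeholder URI

# each device is a document; fields can differ freely between documents
db.devices.insert_one({
    "type": "SAN",
    "name": "san01",
    "arrays": [{"name": "array1", "raid_level": 5, "size_gb": 2048, "disks": 8}],
    "volumes": [{"name": "vol1", "size_gb": 512}],
})
db.devices.insert_one({"type": "switch", "name": "sw01",
                       "ports": [{"speed": "1G"}, {"speed": "10G"}]})

print(db.devices.find_one({"name": "san01"})["arrays"][0]["raid_level"])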