BigQuery resource used every 12h without configuring it - google-bigquery

I need some help understanding what happened in our cloud project to have a BigQuery resource running every 12 hours without us configuring it. It also seems fairly intensive, because we were charged, on average, one dollar per day for the past month.
After checking in Logs Explorer, I saw several log entries regarding the BigQuery resource.
In those logs I saw the email of one of our software engineers. Since I removed him from our Firebase project, there have been no more requests.
However, that person did not set up or configure anything related to BigQuery, so we are a bit lost here, which is why we are asking for help to investigate and understand what is going on.
Hope you will be able to help. Let me know if you need more information.
Thanks in advance
NB: I have not tried adding the software engineer's email back yet. I wanted to see how things go for the rest of the month.

The most likely causes I've observed for this in the wild:
A scheduled query was set up by a user.
A Data Studio dashboard was set up and configured to periodically refresh data in the background.
Someone set up a workflow that queries BigQuery, such as Cloud Composer or a Cloud Function.
It's also possible it's just something like a script running in a crontab on someone's machine. The audit log should have relevant details, such as request origin, for cases where it's just something running as part of a bespoke process.
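To narrow this down, it can help to pull the BigQuery entries out of the Cloud Audit Logs and look at which principal and caller IP issued the jobs. Below is a minimal sketch using the google-cloud-logging Python client; the project ID is a placeholder and the exact payload field names depend on the audit-log schema, so treat it as a starting point rather than a finished tool.

```python
from google.cloud import logging

# Placeholder project ID -- replace with your own.
client = logging.Client(project="my-project-id")

# Completed BigQuery jobs as recorded in the Cloud Audit Logs.
LOG_FILTER = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="jobservice.jobcompleted"'
)

for entry in client.list_entries(
    filter_=LOG_FILTER, order_by=logging.DESCENDING, page_size=50
):
    # Audit entries carry a protoPayload; guard in case the shape differs.
    payload = entry.payload if isinstance(entry.payload, dict) else {}
    auth = payload.get("authenticationInfo", {})
    meta = payload.get("requestMetadata", {})
    # Who ran the job and from where (field names assume the BigQuery
    # audit-log schema; adjust if your entries look different).
    print(entry.timestamp, auth.get("principalEmail"), meta.get("callerIp"))
```

If the caller details point at a managed Google service rather than someone's machine, that tends to suggest one of the managed causes above (scheduled query, dashboard refresh, Cloud Composer) rather than a local crontab.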

Related

Is there any way to track log in/log out timing on Onepanel?

I've installed Onepanel on my EKS cluster and I want to run the CVAT tool there. I want to keep track of user log-in/log-out activity and timings. Is that even possible?
As far as I know, Onepanel isn't supported anymore, and it ships an outdated version of CVAT. CVAT itself has analytics functionality: https://opencv.github.io/cvat/v2.2.0/docs/manual/advanced/analytics/. It can show working time and intervals of activity.

Build pipeline for a large project

I should start by saying that this is my first time really using Azure DevOps and setting up pipelines, so I apologize if I don't understand things right away and seem a little slow.
I have a large Kentico CMS project (a .NET C# website project) that I'm trying to set up a build pipeline for, but unfortunately, because it is so big, the 30-minute timeout always cancels the build process, and I'm not sure what to do to speed it up.
Below are my available pools to choose from. I don't think we have any self-hosted pools at the moment.
This is all for my job. I unfortunately don't have full access to our Azure DevOps or our Azure Portal, but there are some settings and configurations that I think I should be able to change. If there are settings or adjustments that I don't have access to, I can pass that information along to our IT and Platform Services department.
This is what my build report looks like.
And these are the error messages that I'm getting.
##[Error 1]
The agent has received a shutdown signal. This can happen when the agent service is stopped, or a manually started agent is canceled.
##[Error 2]
The job exceeded the maximum allowed time of 00:30:00 and was stopped. Please visit for more information.
Please let me know what other information I should provide.
It looks like the solution is more a matter of pricing options.
Please have a look here:
Free tier: 240 minutes (shared with Build); 30-minute maximum single job duration.
Paid tier: $40 per agent; 360-minute maximum single job duration.
Refer here for the detailed pricing.
I ended up creating a self-hosted agent, and that got things working. Unfortunately, the size of the repo still makes the build and release very long, but I guess that will have to do for now.

How to proceed with query automation using Import.io

I've successfully created a query with the Extractor tool in Import.io. It does exactly what I want it to do; however, I now need to run it once or twice a day. Is the purpose of the Import.io API to let me build logic such as data storage and scheduled tasks (running queries multiple times a day) in my own application, or are there ways to schedule queries and make use of long-term storage of my results entirely within the Import.io service?
I'm happy to create a Laravel or Rails app to make requests to the API and store the information elsewhere, but if I'm reinventing the wheel by doing so and they already provide the means to address this, then that is a real time saver.
Thanks for using the new forum! Yes, we have moved this over to Stack Overflow to maximise the community atmosphere.
At the moment, Import does not have the ability to schedule crawls. However, this is something we are going to roll out in the near future.
For the moment, you can set up a cron job to run at whatever times you specify.
Another solution, if you are using the free version, is to use a CI tool like Travis or Jenkins to schedule your API scripts.
You can query the extractors live, so you don't need to run them manually every time. This will consume one of the requests from your limit.
The endpoint you can use is:
https://extraction.import.io/query/extractor/extractor_id?_apikey=apikey&url=url
Unfortunately, the script will not be a very simple one, since most websites return very different response structures to import.io, and as you may already know, the premium version of the tool now provides scheduling capabilities.
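For reference, a minimal Python sketch of calling that live-query endpoint is below. The extractor ID, API key, and target URL are placeholders, and the shape of the JSON response depends on your extractor, so treat it only as a starting point; a cron job or CI schedule, as suggested above, can then invoke it once or twice a day.

```python
import requests

# Placeholders -- substitute your own extractor ID, API key, and target URL.
EXTRACTOR_ID = "your-extractor-id"
API_KEY = "your-api-key"
TARGET_URL = "https://example.com/page-to-extract"

# Live query against the extraction endpoint; counts against your request limit.
response = requests.get(
    f"https://extraction.import.io/query/extractor/{EXTRACTOR_ID}",
    params={"_apikey": API_KEY, "url": TARGET_URL},
    timeout=60,
)
response.raise_for_status()

# Inspect the response before relying on specific fields -- the structure
# varies from extractor to extractor.
print(response.json())
```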

How to identify process that generates data transfer out in EC2?

I am hosting a small web-based application with Apache Web Server on EC2. On my monthly bill I usually see ~40 GB of data transfer out, which costs about $5 or so a month.
Although this is not big money, I am curious how this data transfer out was generated. I am sure that at midnight no one is actually visiting the web application, and yet there is still data transfer out at ~50 MB per hour (as I can see from the detailed report from Amazon).
Is there any way to figure out which process actually generates that data-transfer-out activity (even at midnight when no one uses the web application)?
thanks!
J.
Have you looked at Boundary? Maybe they can help. They can monitor data going into and out of your EC2 instance (networking). You can see details like which ports the packets are coming from and where they are going.
You have to install an agent on your machine and sign up for a trial.
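If you'd rather start with something lighter than a third-party monitoring agent, a small on-host check like the sketch below (using the psutil Python package, run on the instance itself) can at least show which processes hold outbound connections and to where. It does not report per-process byte counts, so it only narrows down the suspects.

```python
import psutil

# Show which local processes hold established connections and to where.
# Run on the instance, ideally as root so all PIDs are visible.
for conn in psutil.net_connections(kind="inet"):
    if conn.status != psutil.CONN_ESTABLISHED or not conn.raddr:
        continue
    try:
        name = psutil.Process(conn.pid).name() if conn.pid else "unknown"
    except psutil.NoSuchProcess:
        # The process may have exited between listing and lookup.
        continue
    print(f"{name} (pid {conn.pid}) -> {conn.raddr.ip}:{conn.raddr.port}")
```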

Deploying on EC2

This question is for anyone who has actually used Amazon EC2. I'm looking into what it would take to deploy a server there.
It looks like I can start in VirtualBox, set up my server, and then export the image using the provided ec2-tools.
What gets tricky is that if I actually want to make configuration changes to my running server, they will not be persistent.
I have some PHP code that I need to be able to deploy (and redeploy) to the system, so I was thinking that EBS would be a good choice there.
I have a massive amount of data that I need stored, but it just so happens that latency is not an issue, so I was thinking something like s3fs might work.
So my question is... What would you do? What does your configuration look like? What have been particular challenges that perhaps you didn't see coming?
We have deployed a large-scale commercial app in the AWS environment.
There are three basic approaches to keeping your changes under control once the server is running, all of which we use in different situations:
Keep the changes in source control. Have a script that is part of your original image that can pull down the latest and greatest. You can pull down PHP code, Apache settings, whatever you need. If you need to restart your instance from your AMI (Amazon Machine Image), just run your script to get the latest code and configuration, and you're good to go.
Use EBS (Elastic Block Storage). EBS is like a big external hard drive that you can attach to your instance. Even if your instance goes away, EBS survives. If you later need two (or more) identical instances, you can give each one of them access to what you save in EBS. See https://stackoverflow.com/a/3630707/141172
Burn a new AMI after each change. There's a tool to create a new AMI from a running instance. If EBS is like having an external hard drive, creating a new AMI is like having a DVD-R. You can save the current state of your machine to it. Next time you have to start a new instance, base it on that new AMI. Good to go.
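As an illustration of that third approach, here is a minimal sketch of creating an AMI from a running instance using the boto3 Python SDK. The answer above refers to Amazon's own tooling rather than any particular SDK, and the region, instance ID, and name below are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Create an AMI from a running instance (instance ID and name are placeholders).
response = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="my-app-snapshot",
    Description="Snapshot of the configured web server",
    NoReboot=True,  # avoid rebooting; set False for a more consistent filesystem
)
print("New AMI:", response["ImageId"])
```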
I recommend storing your PHP code in a repository such as SVN, and writing a script that checks the latest code out of the repository and redeploys it when you want to upgrade. You could also have this script run on instance startup so that you get the latest code whenever you spin up a new instance; saves on having to create a new AMI every time.
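A bare-bones version of such a deploy script might look like the sketch below. The repository URL, deploy directory, and Apache reload command are all assumptions that would need adapting to your setup; the same script can be run at instance startup to pull the latest code.

```python
import os
import subprocess

# Placeholder values -- adjust to your repository and server layout.
REPO_URL = "https://svn.example.com/myapp/trunk"
DEPLOY_DIR = "/var/www/myapp"

def run(cmd):
    """Run a command and fail loudly if it returns a non-zero exit code."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Fresh checkout on first boot, update on subsequent deploys.
if os.path.isdir(os.path.join(DEPLOY_DIR, ".svn")):
    run(["svn", "update", DEPLOY_DIR])
else:
    run(["svn", "checkout", REPO_URL, DEPLOY_DIR])

# Gracefully reload Apache in case its configuration changed too
# (the exact command varies by distribution).
run(["sudo", "apachectl", "graceful"])
```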
The main challenge that I didn't see coming with EC2 is instance startup time - especially with Windows. Linux instances take 5 to 10 minutes to launch, but I've seen Windows instances take up to 40 minutes; this can be an issue if you want to do dynamic load balancing and start up new instances when your load increases.
I'd suggest the best bet is to simply try it. The charges to run a small instance are not high, and data transfer rates are very low; I have moved quite a few GB and my data fees are still less than a dollar(!) in my first month. You will likely end up paying mostly for system time rather than data, I suspect.
I haven't deployed yet, but I have run up an instance, migrated it from Ubuntu 8.04 to 8.10, tried different port security settings, seen what sort of access attempts unknown people have made (mostly looking for phpadmin), run some testing against it, and generally experimented with the configuration and restarting of the components I'm deploying. It has been a good prelude to my eventual deployment. I won't be starting with a big DB, so I will initially stick with the standard EC2 instance space.
The only negative thing I have heard is that some spammers have made some of the IP ranges subject to spam-blocking, but I have not yet confirmed that.
I would suggest taking your VirtualBox approach only after you are more familiar with the EC2 infrastructure. I suggest you go to EC2, open an account, and follow Amazon's EC2 getting-started guide. The guide will give you enough of an overview of everything (EBS, IPs, connections, and so on) to get you started. We are currently using EC2 in production, and the way we started was as I am describing here.
I hope you become a cloud expert soon.
Per timbo's concern, I was able to grab an IP that, so far, hasn't shown up on any spam lists. You will have a few hiccups, since many blacklists effectively work as whitelists and will have every IP on their list until they are notified that a legitimate mail server is running on that IP. It's really easy to get removed: most of them have automated removal request forms, and every one that doesn't has been very cooperative about removing me from their lists. Just be professional, ask whether they can give a time and reason for the block, and ask what steps you should take to remove your IP. None of the services I emailed asked me to jump through any hoops; within two or three business days they all informed me my IP had been removed.
Still, if you plan on running a mail server, I would recommend reserving IPs now. They cost 1 cent for every hour they are not bound to an instance, so it works out to about $7 a month. I went ahead and reserved an extra one, as I plan on starting up another instance soon.
I have deployed some simple stuff to EC2 Win2k3 instances. Here's my advice:
Find a tutorial. Sign up for the service. Just spend an afternoon setting up your first server. It's pretty darned easy, though there will be obstacles to overcome. It's not too tough.
When I was fooling with EC2 I think I spent like $2.00 setting up a server and playing with it for a while.
Some of your data will be persistent, but you can connect S3 to EC2 as well.
Just go for it!
With regard to the concerns about blacklisting of mail servers, you can also use Amazon's Simple Email Service (SES), which obviates the need to run a mail server on the EC2 instances.
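For what it's worth, sending through SES from application code is a small API call. Here is a minimal sketch using the boto3 Python SDK; the region and addresses are placeholders, and while the account is in the SES sandbox both addresses would need to be verified first.

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")  # region is an assumption

# Addresses are placeholders -- replace with verified identities.
ses.send_email(
    Source="no-reply@example.com",
    Destination={"ToAddresses": ["user@example.com"]},
    Message={
        "Subject": {"Data": "Hello from SES"},
        "Body": {"Text": {"Data": "Sent without running a mail server on EC2."}},
    },
)
```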
I had trouble with this as well, but posted a note here in their forums - https://forums.aws.amazon.com/thread.jspa?threadID=80158&tstart=0