AWS QuickSight: monthly summary in percentages from different CSV files - sql

1. I have a Lambda function that runs monthly: it runs an Athena query and exports the results as a CSV file to my S3 bucket.
2. I now have a QuickSight dashboard that uses this CSV file in a dataset and visualizes all the rows from the report.
Everything works as expected up to this point.
3. Every month I get a new CSV file in my S3 bucket, and I want to add a "Visual Type" to my main dashboard that shows the difference in % from the previous CSV file (previous month).
For example:
My dashboard focuses on the collection of missing updates.
In May I see I have 50 missing updates.
In June I get a CSV file with 25 missing updates.
Now I want the dashboard to reflect, with a "Visual Type", that this month we reduced the number of missing updates by 50%.
And in July I get a file with 20 missing updates, so I want to see that we reduced it by 60% compared to May.
Any idea how I can do this?

I'm not sure I quite understand where you're standing, but I'll assume that you have an S3 manifest that points to an S3 directory, rather than a different manifest (and dataset) per file.
If that's your case, you could try to tackle that comparison by creating a calculated field that uses the periodOverPeriodPercentDifference function.
Hope this helps!
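As an illustration, a minimal sketch of such a calculated field, assuming the dataset has a date field named report_date and a numeric field named missing_updates (both names are placeholders for whatever your CSV actually contains):

```
periodOverPeriodPercentDifference(
  sum({missing_updates}),
  {report_date},
  MONTH,
  1
)
```

This compares each month's total against the previous month's. Note that comparing every month back to a fixed baseline (May, in your example) would need a different calculation, e.g. dividing each month's total by a fixed-period total.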

Related

How to append new records only to a BigQuery table?

I get reports from a 3rd-party API on a daily basis and am going to store the data in a BigQuery table. Each report includes data for the last 90 days, so each new report has new records for the new day but loses the records for day 91. My task is to keep data in BigQuery for a period > 90 days.
I tried to set up a BigQuery data transfer from Cloud Storage with the "Write preference" option "Mirror", and it seems that it just overwrites my old data with the new. If I change it to "Append", it will add the data from the new report to the old data, with duplicates.
Are there any ideas how I can append only the new records to my table using BigQuery functionality? I can't believe that it's impossible.
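One common way to express "append only unseen rows" in BigQuery is to load each report into a staging table and then MERGE it into the history table. A minimal sketch, with placeholder table names and an assumed record_id/report_date key (adjust to your schema):

```sql
-- Insert only rows whose key is not already in the history table;
-- rows already present are left untouched, so re-delivered days don't duplicate.
MERGE `mydataset.history` AS h
USING `mydataset.staging_report` AS s
  ON h.record_id = s.record_id
 AND h.report_date = s.report_date
WHEN NOT MATCHED THEN
  INSERT (record_id, report_date, payload)
  VALUES (s.record_id, s.report_date, s.payload)
```

The transfer itself can keep "Mirror" semantics on the staging table; the MERGE, run as a scheduled query afterwards, takes care of the deduplication.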

Appsflyer Data Locker: override data in BigQuery

I have an issue with setting up the Appsflyer Cost ETL with Google BigQuery. We get parquet files each day.
The issue is the following: each day you get a file with 10 dates.
The problem is that each day the file contains 6 dates that should overwrite yesterday's data. The task is how to set up a data transfer or scheduled query that overrides the data for each date present in the newer file, so that the data for a long period ends up in one table.
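One way to get this per-date overwrite (a sketch only; the table and column names are placeholders) is a scheduled MERGE that replaces rows for dates already present and inserts the rest:

```sql
-- For rows that already exist, take the newer file's values;
-- for dates seen for the first time, insert them.
MERGE `mydataset.costs` AS t
USING `mydataset.daily_file` AS s
  ON t.event_date = s.event_date
 AND t.campaign_id = s.campaign_id
WHEN MATCHED THEN
  UPDATE SET t.cost = s.cost, t.clicks = s.clicks
WHEN NOT MATCHED THEN
  INSERT (event_date, campaign_id, cost, clicks)
  VALUES (s.event_date, s.campaign_id, s.cost, s.clicks)
```

Load each day's parquet file into the daily_file staging table first (e.g. with a transfer set to "Mirror"), then run the MERGE as a scheduled query.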

How to store and serve coupons with Google tools and JavaScript

I'll get a list of coupons by mail. They need to be stored somewhere somehow (BigQuery?) where I can request a code and send it to the user. A user should only be able to get 1 unique code that was not used beforehand.
I need the ability to get a code and record that it was used, so the next request gets the next code...
I know it is a completely vague question, but I'm not sure how to implement this. Does anyone have any ideas?
Thanks in advance.
There can be multiple solutions for the same requirement; one of them is given below:
Step 1. Get the coupons into a file (CSV, JSON, etc.) as per your preference/requirement.
Step 2. Load the source file to GCS (storage).
Step 3. Write a Dataflow job that reads the data from the GCS file and loads it into a BigQuery table (tentative name: New_data).
Step 4. Create a Dataflow job that reads the BigQuery table New_data, compares it with History_data to identify new coupons, and writes the result to a file on GCS or to a BigQuery table (a minimal SQL sketch of this comparison follows these steps).
Step 5. Schedule the entire process with an orchestrator / Cloud Scheduler / cron job.
Step 6. Once you have the data, you can send it to consumers through any communication channel.
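The step 4 comparison can also be done directly in BigQuery SQL rather than in Dataflow. A minimal sketch, assuming both tables have a coupon_code column (all names here are placeholders):

```sql
-- New coupons: codes present in the freshly loaded table
-- but not yet known to the history table.
SELECT n.coupon_code
FROM `mydataset.New_data` AS n
LEFT JOIN `mydataset.History_data` AS h
  ON n.coupon_code = h.coupon_code
WHERE h.coupon_code IS NULL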

How do we get/extract log data from Splunk

I have Splunk holding 3 months of log details, which get refreshed after that (we can see no history beyond that point). My requirement is: I need to store those log details in another folder in Splunk, one that holds all the log info with history, by dumping it. I'm not sure how to extract data from Splunk. Can we use any Java code, or any API, to extract the log data from Splunk and store it elsewhere?
I'm new to Splunk.
You need to investigate the following:
index retention (and, for SmartStore, its settings)
storage availability
If you have an index set for 500GB or 1 year, but you store 50GB per day, you'll rotate at 10 days.
If you have an index set for 500GB or 1 year, but only have 400GB of available storage, it will rotate sooner.
In addition to the answer by @warren, look into the coldToFrozenDir and coldToFrozenScript settings in indexes.conf. These settings govern where and how data is archived rather than deleted. The data is not exported, however; it is stored in Splunk's proprietary format.
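For illustration, a minimal indexes.conf stanza showing how these settings fit together (the index name, sizes, and archive path are placeholder values; check each setting against your Splunk version's documentation):

```
[my_index]
# Freeze (archive or delete) events older than ~1 year
frozenTimePeriodInSecs = 31536000
# Also freeze when the index exceeds ~500GB, whichever limit is hit first
maxTotalDataSizeMB = 512000
# Instead of deleting frozen buckets, copy them here (still in Splunk's bucket format)
coldToFrozenDir = /opt/splunk/archive/my_index
```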

Apache Solr: same values in different CSV files

We have identified Apache Solr as a possible solution to our problem. Please bear with me, I'm new to Apache Solr. We are planning to upload several large CSV files and use Solr's REST-like API to get the results back in XML/JSON.
The problem I am thinking of is, e.g., you have two files, currency.csv and country.csv, and they both have 'GBP' as an entry. So if you upload both files into Solr and query for the value 'GBP', then from which file's entries will it be returned?
What I would ideally like is a query that would only return 'GBP' entries that were uploaded from currency.csv and not from country.csv.
Hope someone can help or point me in the right direction, as we may have files with similar data, and yet we need to be sure to retrieve the right values from the right CSV file.
Thanks in advance.
GM
UPDATE
Is it better to have multiple cores? i.e. one core per file?
You can add an additional field, data_type, which would indicate the type (like country or currency) for each record.
You can then use that field to filter the results by type, or display it to indicate which type a record belongs to.
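For example, once each record carries a data_type value, a filter query restricts matches to one source file (a sketch; the core name, field names, and port are assumptions):

```
# Return 'GBP' matches only from records loaded from currency.csv
http://localhost:8983/solr/mycore/select?q=value:GBP&fq=data_type:currency
```

The fq (filter query) parameter constrains the result set without affecting relevance scoring, which is the usual way to slice a single index by record type.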