Some of my tables are partitioned, but when I expanded the partitions node I saw that they have very cryptic names. I know you guys are going to implement catalog views in the future, but maybe it would be easy to show the partition key label instead of the cryptic GUID (the partition name) in an upcoming refresh?
This is a good suggestion. Can you please file it via http://aka.ms/adlfeedback?
I am also forwarding this post to the tools team.
I'm currently doing a benchmark to see if Google Cloud Datastore could suit our needs but I've got a problem with how indexes are handled.
I know that I will never have to filter on anything except the key field, and thus I would like to be able to disable the built-in indexing of all the other fields. I just want to use it as a key/value store.
I'm currently looking at potentially multiple TB of indexes if I cannot disable them (~50 fields, billions of rows) and that would kill our budget.
Is there any way to remove these indexes? It seems the index.yaml file this link talks about only covers composite indexes.
Thanks for your help!
Found it! You can explicitly tell Datastore not to index a field by marking it as an excluded property, like this:
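For example, with the Python client library (google-cloud-datastore), a minimal sketch could look like the following; the kind name and property names are placeholders, not anything from the original post:

```python
from google.cloud import datastore

client = datastore.Client()

# "Row" and the property names below are placeholders for your own kind/fields.
key = client.key("Row", "example-id")
entity = datastore.Entity(
    key=key,
    exclude_from_indexes=("payload", "description"),  # these fields will NOT be indexed
)
entity.update({
    "payload": "large blob of data we only read back by key",
    "description": "free-text field we never filter on",
})
client.put(entity)
```

Only the fields listed in exclude_from_indexes skip the built-in single-property indexes; everything else is still indexed by default.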
I have searched the Datastore GitHub issues for this same question; it dates back to about 2015 and the last inquiry was in 2019, but there has been no response. You can ask there whether there has been any update.
I have also searched the Public Issue Tracker (PIT) of Google Cloud Platform for an existing Feature Request (FR) or issue related to this, but did not find any.
I think the best way to proceed is to file an FR with the proper components. That way the Engineering team will have visibility into it. The PIT uses the number of "stars" (people who have indicated interest in an issue) to prioritize work on the platform. Given that there is no FR open, you should open a new one.
All my datasets, tables, and ALL items inside BQ are in the EU. When I try to set up a 15-minute scheduled query from a view to a table, I get an error about my location, which is incorrect, because both the source and the destination are in the EU...
Does anyone know why?
There is a known transient issue matching your situation; the GCP support team needs more time to troubleshoot it. There may be a problem in the UI. I would ask you to try the following steps:
First, try the same operation in Chrome's incognito mode.
Another possible workaround is to follow this official guide using an approach other than the UI (the CLI or the client libraries, for instance).
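As one non-UI sketch (not from the original answer; the project, dataset, and table names are placeholders), you could run the view-to-table query with the Python client and pin the job location to EU explicitly:

```python
from google.cloud import bigquery

# Placeholder project/dataset/table names; replace with your own.
client = bigquery.Client(project="my-project", location="EU")

job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my-project.my_dataset.my_table"),
    write_disposition="WRITE_TRUNCATE",  # overwrite the table on each run
)

# Setting location="EU" explicitly avoids relying on the UI's location detection.
job = client.query(
    "SELECT * FROM `my-project.my_dataset.my_view`",
    job_config=job_config,
    location="EU",
)
job.result()  # wait for the job to finish
```

If this runs cleanly outside the UI, the location error is likely on the console side rather than in your data.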
I hope it helps.
I'm looking for a workaround for the following issue. I hope someone can help.
I'm unable to backfill data in the ga_sessions_ table in BigQuery through product linking in GA; for example, the partition ga_sessions_20180517 is missing.
This specific view has already been linked before, and the Google documentation says that the historical load is only done once per view, hence the issue (https://support.google.com/analytics/answer/3416092?hl=en).
Is there any way to work around it?
Kind regards,
Martijn
You can use the Google Analytics Reporting API to get the data for that view. This method has a lot of restrictions (the data is sometimes sampled, and only 7 dimensions can be exported in one call), but at least you will be able to fetch your data in a partitioned manner.
Documentation here.
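As a rough illustration (not part of the original answer; the credentials file, view ID, and dimensions are placeholders), one day's worth of data can be pulled with the Reporting API v4 Python client, one request per day, which mirrors the daily ga_sessions_YYYYMMDD partitions:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials file and view ID; replace with your own.
SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
analytics = build("analyticsreporting", "v4", credentials=creds)

# One request per day keeps the export partitioned like ga_sessions_20180517.
response = analytics.reports().batchGet(
    body={
        "reportRequests": [{
            "viewId": "123456789",
            "dateRanges": [{"startDate": "2018-05-17", "endDate": "2018-05-17"}],
            "metrics": [{"expression": "ga:sessions"}],
            "dimensions": [{"name": "ga:date"}, {"name": "ga:sourceMedium"}],
        }]
    }
).execute()

for report in response.get("reports", []):
    for row in report.get("data", {}).get("rows", []):
        print(row["dimensions"], row["metrics"][0]["values"])
```

The 7-dimension limit per call mentioned above still applies, so wider exports need several requests joined afterwards.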
If you need a lot of dimensions/metrics in hit-level format, scitylana.com has a service that can provide this data historically.
If you have a clientId set in a custom dimension, the data quality is near perfect.
It also works without a clientId set.
You can get all history as available through the API.
You can get 100+ dimensions/metrics in one batch into BQ.
Taking the public documentation as a reference (https://wirecloud.conwet.etsiinf.upm.es/slides/1.2_Integration%20with%20other%20GEs.html#slide16), I wonder whether at this point there is any progress on connecting WireCloud and Cosmos in order to retrieve historical data and visualise it in mashup setups.
If not, could you give me any direction so I can try implementing something around this?
Note: I have already checked some of the available documentation, and it looks to me like my desired feature could be tackled by a simple Python implementation that retrieves HDFS files and converts them into the appropriate NGSI format. Is that right?
Nevertheless, I believe it would be a dirty mechanism. What would be the recommended way?
I honestly hope I'm not cheating by answering my own question and marking it as correct, but I would like to leave a record of a solution for those folks who might be experiencing the same troubles as me.
I have developed a quick-and-dirty mechanism to retrieve HDFS files and convert them into NGSI format, so we can retrieve historical data like we do with Orion widgets.
https://github.com/netzahdzc/cloudCos
Please note that this is very much a work in progress, so there are some hardcoded values that I hope to eventually fix.
Official Cosmos-WireCloud integration is currently not available, although there are third-party widgets using Cosmos out there.
In my opinion, the best option for accessing the HDFS filesystem is using WebHDFS (you will need to add a FIWARE token to the request for authentication).
It should also be possible to connect to Hive (see this ticket for more info).
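As a rough sketch of the WebHDFS approach (the host, port, HDFS path, and token below are placeholders, and the exact header may vary with your Cosmos deployment), reading a file with a FIWARE OAuth2 token could look something like this:

```python
import requests

# Placeholder values; adjust host, port, HDFS path, and token for your Cosmos instance.
COSMOS_HOST = "cosmos.example.org"
HDFS_PATH = "/user/myuser/mydata/part-00000"
FIWARE_TOKEN = "my-oauth2-token"

url = f"http://{COSMOS_HOST}:14000/webhdfs/v1{HDFS_PATH}"
headers = {"X-Auth-Token": FIWARE_TOKEN}  # FIWARE token used for authentication

# op=OPEN streams the file content through the WebHDFS/HttpFS REST API.
response = requests.get(url, params={"op": "OPEN"}, headers=headers)
response.raise_for_status()
print(response.text)
```

From there, the remaining work is just reshaping the retrieved records into NGSI entities before handing them to the widgets.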
Has anyone had luck with removing a large number of issues from a JIRA database directly, instead of using the frontend? Deleting 60,000 issues with the bulk tools is not really feasible.
The last time I tried it, JIRA went nuts because of its own way of handling indexes.
How about doing a backup to XML, editing the XML, and reimporting?
We got gutsy, truncated the jiraissues table, and then used the rebuild-index feature on the frontend. It looks like it's working!
This is old, but I see that this question was just edited recently, so to chime in:
Writing directly to the JIRA database is problematic. The reindex feature suggested in the Oct 14 08 answer just rebuilds the Lucene index, so it is unlikely to clean up everything that needs to be cleaned up from the database on a modern JIRA instance. Off the top of my head, this will probably leave data lying around in the following tables, among others:
custom field data (customfieldvalue table)
issue links (issuelink table)
versions and components (nodeassociation table, which contains other stuff too, so be careful!)
remote issue links or wiki mentions (remotelink table)
If one has already done such a manual delete on production, it's always a good idea to run the database integrity checker (YOURJIRAURL/secure/admin/IntegrityChecker!default.jspa) to make sure that nothing got seriously broken.
Fast forwarding to 2014, the best solution is to write a quick shell script that uses the REST API to delete all of the required issues. (The JIRA CLI plugin is usually a good option for automating certain types of tasks too, but as far as I can tell, it does not currently support the deletion of issues, so the REST API is your best bet.)
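As a minimal sketch of that idea (using Python rather than a shell script; the base URL, credentials, and JQL are placeholders, not anything from the original answer), the REST API's search and issue-delete endpoints can be driven like this:

```python
import requests

# Placeholder JIRA URL, credentials, and JQL; replace with your own.
JIRA_URL = "https://jira.example.com"
AUTH = ("admin", "password")
JQL = "project = OLDPROJ"

# Fetch a page of matching issue keys via the search endpoint...
search = requests.get(
    f"{JIRA_URL}/rest/api/2/search",
    params={"jql": JQL, "fields": "key", "maxResults": 100},
    auth=AUTH,
)
search.raise_for_status()

# ...then delete each one through the issue endpoint (HTTP 204 means success).
for issue in search.json()["issues"]:
    key = issue["key"]
    resp = requests.delete(f"{JIRA_URL}/rest/api/2/issue/{key}", auth=AUTH)
    print(key, resp.status_code)
```

Re-run the search-and-delete loop until it returns no issues; going through the REST API keeps the Lucene index and the related tables listed above consistent, unlike direct database writes.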