SQL/BigQuery on github samples

SQL/BigQuery on github samples - sql

I'm using the google bigquery tool, I'm trying to select ALL sample github repositories that have a pom.xml file and within the content of the file, have an artifact id ex-ex e.g
<artifactId>ex-ex</artifactId>
For this I have broken it down into 2 steps:
1) Find all pom.xml files
SELECT sample_repo_name FROM 'bigquery-public-data.github_repos.sample_contents' WHERE sample_path LIKE 'pom.xml'
2) Select the repositories which contain ex-ex artifact (in the content table)
AND content LIKE '%ex-ex'
The 2nd part of the query does not work (no results found) and is likely due to some syntax error somewhere. Full query below:
SELECT sample_repo_name FROM 'bigquery-public-data.github_repos.sample_contents' WHERE sample_path LIKE 'pom.xml' AND content LIKE '%ex-ex' LIMIT 1000
Would really appreciate help with this, thanks!

Have you tried '%ex-ex%'? Without the second %, you are only searching for records whose last 5 characters are 'ex-ex'. Adding content to the select in your first query and spot checking a few results, the content field appears to be XML (pom.xml, duh) and seem to end with </project>, and thus will probably never match with '%ex-ex'.

Related

osquery - How can I retrieve a file origin using osquery?

I'm using osquery on Windows and I need help: I want to retrieve the file origin of a specific file. For example I download a file from http://example.com and I'm looking for a query on osquery that show me the info that I download that specific file from http://example.com (or something like this). I thought that to derive this information I can compare the timestamps between the table file and the table routes but there isn't the column timestamp in routes. How can I do that?

I don't see a table for this on windows, although the information is available on the system through ADS(see this answer). I would open an issue for this on the osquery repo, it would be a valuable table to have.
You can use the extended_attributes table. For example:
osquery> select path, key, value, base64 from extended_attributes where path ='/Users/victor/Downloads/osqueryi.zip';
path = /Users/victor/Downloads/osqueryi.zip
key = com.apple.lastuseddate#PS
value = eynzWgAAAAAbZEQgAAAAAA==
base64 = 1
path = /Users/victor/Downloads/osqueryi.zip
key = where_from
value = https://files.slack.com/files-pri/T04QVKUQG-FALAL3WP2/download/osqueryi.zip
base64 = 0
osquery>

+1 on what #groob mentioned, this'd be a nice table to have and I think we've wanted it for some time. I thought we already had an issue cut for this, but I went ahead and made a new one as simple searches wasn't turning anything up. Thanks for the question :)
https://github.com/facebook/osquery/issues/5250

--exclude-lang = "xml" is not working using cloc

I have used --exclude-lang = "xml" and --exclude-lang = "designer.cs" to exclude designer and xml files from count of lines changes in two different folders but it's not working.

Simple case of mis-capitalization. An easy way to troubleshoot this issue is to run cloc .
In your console, you'll see a table report printed like this:
You posted that you used: --exclude-lang = "xml". Instead you need to match the language description exactly. In the consoles output, you can see xml is all caps. If you update your flag too --exclude-lang="XML" you'll see that XML scanning is ignored.

Several languages can be excluded for instance by:
--exclude-lang="SVG,CSV,XML,Markdown,TeX,reStrcuturedText"

Alfresco FTS - why first digit of folder's name should be escaped?

I have a question regarding the alfresco FTS/lucene search. It is known that in the search query some special characters have to be escaped, like space (by _x0020_).
But it turned out that if folder's name first chatacter is a digit, it should also be escaped. It can be easily tested in Node Browser by creating a folder, like 123456 and navigate to the parent folder in node browser (in my case I have following folder structure: */2017/123456/):
Primary Path: /app:company_home/st:sites/<some-folders>/cm:_x0032_017/cm:_x0031_23456
^this is 2 ^ and this is 1
If I don't ecape first character of the folder I have 500 error returned.
Why is that, I tried to find something relevant in Alfresco documentation, but didn't manage to.
Alfresco v.4.2.0

Lucene search uses ISO 9075 codification (SQL) like similar frameworks, so we need to encode the path elements. It would be nice if the API hides this requirement like the browser url but you could use ISO9075Encode to do the job.

Querying for shared nodes in JCR (ModeShape)

I have a JCR content repository implemented in ModeShape (4.0.0.Final). The structure of the repository is quite simple and looks like this:
/ (root)
Content/
Item 1
Item 2
Item 3
...
Tags/
Foo/
Bar/
.../
The content is initially created and stored under /Content as [nt:unstructured] nodes with [mix:shareable] mixin. When a content item is tagged, the tag node is first created under /Tags if it's not already there, and the content node is shared/cloned to the tag node using Workspace.clone(...) as described in the JCR 2.0 spec, section 14.1, Creation of Shared Nodes.
(I don't find this particularly elegant and I did just read this answer, about creating a tag based search system in JCR, so I realize this might not be the best/fastest/most scaleable solution. But I "inherited" this solution from developers before me, so I hope I don't have to rewrite it all...)
Anyway, the sharing itself seems to work (I can verify that the nodes are there using the ModeShape Content Explorer web app or programatically by session.getRootNode().getNode("Tags/Foo").getNodes()). But I am not able to find any shared nodes using a query!
My initial try (using JCR_SQL2 syntax) was:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDECENDANTNODE(content, '/Tags/Foo') gives same result
ORDER BY NAME(content)
The result set was to my surprise empty.
I also tried searching in [mix:shareable] like this:
SELECT * FROM [mix:shareable] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDECENDANTNODE(content, '/Tags/Foo') gives same result
ORDER BY NAME(content)
This also returned an empty result set.
I can see from the query:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Content/%' // ISDECENDANTNODE(content, '/Content') works just as well
ORDER BY NAME(content)
...that the query otherwise works, and returns the expected result (all content). It just doesn't work when searching for the shared nodes.
How do I correctly search for shared nodes in JCR using ModeShape?
Update: I upgraded to 4.1.0.Final to see if that helped, but it had no effect on the described behaviour.

Cross-posted from the ModeShape forum:
Shared nodes are really just a single node that appears in multiple places within a workspace, so it's not exactly clear what it semantically means to get multiple query results for that one shareable node. Per Section 14.16 of the JSR-283 (JCR 2.0) specification implementations are free to include shareable nodes in query results at just one or at multiple/all of those locations.
ModeShape 2.x and 3.x always returned in query results only a single location of the shared nodes, as this was the behavior of the reference implementation and this was the feedback we got from users. When we were working on Modeshape 4.0, we tried to make it possible to return multiple results, but we ran into problems with the TCK and uncertainty about what this new expected behavior would be. Therefore, we backed off our goals and implemented query to return only one of the shared locations, as we did with 2.x and 3.x.
I may be wrong, but I'm not exactly sure if any JCR implementation returns multiple rows for a single shared node, but I may be wrong.

Whats wrong with Neo4j 2.0 Query?

I am trying to understand why the data is not showing up in my query. I was wondering if there is any way to troubleshoot whats going on.
Here is the current issue:
I have populated some data from existing test database to check the performance with a relation like this : (e:Event)-[:FOR_USER]->(u:User) when I get all the users and look at the property, I can see the data, but when I query the users using same data it says 0 records found.
Below image shows the 2 query:
Can some one please help me understand how to debug such issue in neo4j
EDIT
Issue is that the Browser is somehow truncating the multiple spaces in the result. Like in this case "User-May<space>1 2013 1:18AM" was displayed on both webadmin and new browser, but in reality it should have been "User-May<space><space>1 2013<space><space>1:18AM"
So no matter what I do I can't query the value as looks like duplicate space is truncated somewhere.
Tabular data as Micheal suggested is as below
{"id":"75307","labels":["User"],"properties":{"Name":"User-May 1 2013 1:18AM"}}
and what we are seeing is User-May 1 2013 1:18AM
Regards
Kiran

Use the following Cypher syntax in the browser:
MATCH (user:User { Name: "User-May 1 2013 1:18AM" })
RETURN user.Name as Name
As far as the rendering of multiple spaces being trimmed, that is a browser specific functionality. See screenshot below for example:
The text itself is preserved as it is returned from the Neo4j server. As you can see when I analyze the HTML element of the browser using Firebug, the redundant spaces are indeed there.
So again, this doesn't seem to be a bug with Neo4j, it's how the browser you are using renders the text. The browser expects redundant spaces to be encoded as like so: "Testing testing" which is HTML encoded as Testing testing

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL/BigQuery on github samples - sql

Related

osquery - How can I retrieve a file origin using osquery?

--exclude-lang = "xml" is not working using cloc

Alfresco FTS - why first digit of folder's name should be escaped?

Querying for shared nodes in JCR (ModeShape)

Whats wrong with Neo4j 2.0 Query?

Categories

Resources