My goal is to get the length of text, and perform other functions on text, that is nested in a Google BigQuery field. The data in question comes from the BigQuery public patents dataset. Right now I'm using the BigQuery console to fetch the data, but eventually I will use an API.
I just want the length of the text, rather than fetching the whole field locally to analyze it, or truncating the field at a certain length to make the download feasible.
This query runs, but returns NULL for every field except application_number. If I add WHERE ... IS NOT NULL for each field, I get the same response: all NULL fields.
SELECT
-- Get the application number
application_number,
-- Get the length of the claims text
LENGTH(claims_localized[SAFE_OFFSET(0)].text) as claims_length,
-- Get the length of the description text
LENGTH(description_localized[SAFE_OFFSET(0)].text) as description_length,
-- Get claims truncated at the first double line break
SPLIT(claims_localized_html[SAFE_OFFSET(0)].text, "\n\n")[SAFE_OFFSET(0)] as first_claim_text,
-- Get the number of claims tags in claims html
ARRAY_LENGTH(SPLIT(claims_localized_html[SAFE_OFFSET(0)].text, "<claim>")) as claims_num,
-- Get the number of image tags in claims html
ARRAY_LENGTH(SPLIT(claims_localized_html[SAFE_OFFSET(0)].text, "<figref>")) as drawings_num
-- Specify database
FROM `patents-public-data.patents.publications_201909`
-- Specify not NULL claims text
WHERE claims_localized[SAFE_OFFSET(0)].text IS NOT NULL
LIMIT 1000
What am I doing wrong here to collect the data from these fields?
Here is what I get for results in the BQ console. The fields are always NULL, even when I specify that the results not be NULL.
TT: I don't have a way to add an image to a comment, so I'll add it here. Take a look: nothing in your query is changed, and I still see the results perfectly okay. Try unchecking "Use cached results" in your query settings and run it again; it may be a stale cached result from something you ran before.
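One small note on the counting approach in the original query: splitting a string on a delimiter yields one more part than the number of delimiter occurrences, so ARRAY_LENGTH(SPLIT(...)) overcounts the tags by one. A quick Python illustration of the same logic (the sample HTML string is made up, not real patent data):

```python
# Counting "<claim>" tags by splitting, as the SQL query does.
# Splitting on a delimiter yields one more part than the number
# of times the delimiter occurs.
html = "<claim>first</claim><claim>second</claim>"

parts = html.split("<claim>")   # ['', 'first</claim>', 'second</claim>']
naive_count = len(parts)        # 3, one more than the 2 actual tags
claims_num = len(parts) - 1     # 2, the real tag count

print(naive_count, claims_num)
```

So in the SQL, `ARRAY_LENGTH(SPLIT(...)) - 1` would give the actual number of `<claim>` (or `<figref>`) tags.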
I have a query, let's call it query1, that looks at sales; it has storenumber=1 in it. I have query2 that looks at returns, also with storenumber=1.
I want to add these queries to my store's dashboard. In the dashboard, I want a box (text, dropdown, anything) where I can enter a new store number and run both queries.
If I enter, say, 2 in that box, how does that 2 get substituted into the queries so that storenumber=1 becomes storenumber=2?
I thought it would be some token-like variable, but I'm not sure how to set that up so the entered number is populated wherever storenumber= appears in the queries.
Any help will be appreciated
Thank You
It is a token-like variable. When you create your input (dropdown, text, etc.), it is given a name. The value of that input is referenced simply by wrapping the token name in dollar signs.
... | where storenumber = $store$ | ...
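In a Simple XML dashboard, the input declares the token and the search references it. A minimal sketch (the index, field, and token names here are illustrative):

```xml
<form>
  <fieldset>
    <!-- The token attribute names the variable -->
    <input type="text" token="store">
      <label>Store number</label>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <search>
          <!-- $store$ is replaced with the entered value at run time -->
          <query>index=sales storenumber=$store$ | stats sum(amount)</query>
        </search>
      </table>
    </panel>
  </row>
</form>
```

Both of your queries can reference the same `$store$` token, so one input drives both panels.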
I'm creating a sales leaderboard in Holistics, and the column "user_id" is a multi-data column.
Here's a snapshot of the column "user_id":
I need to show the "name" part of the user. I tried using CONVERT and even JSON_VALUE, but neither is recognized by Holistics.
I used CAST, but the user_id is still in numerical form.
Here's my code:
And here's the data output:
Can you help me figure out how to show the actual name of the salesperson?
I'm a newbie here and this is my first post; that's why all my screenshots are in link form.
To select a particular field from a JSON data (and JSON is what you have in user_id column), try this combination:
SELECT
JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.id')) as id,
JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.name')) as user_name
FROM public.deals
This should return the user's id and name from your JSON column.
Whatever software you use, it probably expects the data in row-column format, so you just need to adjust the SQL query so that it returns properly formatted data. And since you have JSON in a user_id column (which seems weird, but never mind), a combination of JSON_EXTRACT, JSON_UNQUOTE, and perhaps CAST should do the trick.
But bear in mind, that running DISTINCT on a big table using those methods could be slow.
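If you ever need the same extraction outside SQL, it is plain JSON parsing. A minimal Python sketch (the sample value is illustrative, not taken from the actual table):

```python
import json

# A sample value as it might appear in the user_id column
# (made up for illustration).
cell = '{"id": 42, "name": "Jane Doe"}'

record = json.loads(cell)       # parse the JSON text into a dict
user_id = record["id"]          # the numeric id
user_name = record["name"]      # the display name you want to show

print(user_id, user_name)
```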
I am trying to get information about files in a folder using https://apis.live.net/v5.0/folderid/files?
This particular folder of mine has around 5,200 files, so I get a read timeout when I make the above request. Is there a restriction on the number of files I can request?
Note: I can successfully retrieve the file information from the folder if I restrict the file count to 500, e.g. https://apis.live.net/v5.0/folderid/files?limit=500
In general it's good to page queries that could potentially return a large number of results. Try using the limit query parameter in combination with the offset query parameter to read batches of children at a time and see if that works better for you.
I'll quote in the relevant information from the documentation for ease of reference:
Specify the first item to get by setting the offset parameter in the preceding code to the index of the first item that you want to get. For example, to get two items starting with the third item, use FOLDER_ID/files?limit=2&offset=3.
Note In the JavaScript Object Notation (JSON)-formatted object that's returned, you can look in the paging object for the previous and next structures, if they apply, to get the offset and limit parameter values of the previous and next entries, if they exist.
You may also want to consider switching to the new API, which has its own paging model (using the next links).
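The limit/offset loop described above can be sketched like this; `fetch_page` is a stand-in for the real HTTP call to `.../folderid/files?limit=...&offset=...`, simulated here with a local list:

```python
def fetch_page(all_files, limit, offset):
    """Stand-in for the real API call; slices a local list the
    same way limit/offset slice the folder listing."""
    return all_files[offset:offset + limit]

def fetch_all(all_files, limit=500):
    """Page through the folder, `limit` files per request."""
    results, offset = [], 0
    while True:
        page = fetch_page(all_files, limit, offset)
        if not page:          # an empty page means no files remain
            break
        results.extend(page)
        offset += limit
    return results

files = ["file%d" % i for i in range(5200)]
fetched = fetch_all(files, limit=500)
print(len(fetched))  # all 5200 files, retrieved 500 at a time
```

With the real API you would instead follow the previous/next entries in the returned paging object, but the control flow is the same.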
I want to use DataTables to show data to a user.
I read the documentation about "Server-side processing", but
I don't know PHP, so I can't figure out what happens.
How does the client-side code send the data to the server-side script?
And how does the server-side script know how many records to return?
Please refer to the Server-side processing chapter in the DataTables manual. The length parameter determines how many records are requested, and the start parameter determines the first record index (zero-based).
Below is an excerpt from the manual:
start
Paging first record indicator. This is the start point in the current
data set (0 index based - i.e. 0 is the first record).
length
Number of records that the table can display in the current draw. It
is expected that the number of records returned will be equal to this
number, unless the server has fewer records to return. Note that this
can be -1 to indicate that all records should be returned (although
that negates any benefits of server-side processing!)
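On the server side, those two parameters boil down to slicing the record set. A minimal Python sketch of the handler logic (a real endpoint would also echo the draw parameter and handle ordering and filtering):

```python
def page_records(records, start, length):
    """Return the rows DataTables asked for.

    start  -- zero-based index of the first record
    length -- number of records to return; -1 means all remaining
    """
    if length == -1:                      # client asked for everything
        return records[start:]
    return records[start:start + length]

rows = list(range(100))
print(page_records(rows, 0, 10))    # first page of 10 rows
print(page_records(rows, 30, 10))   # fourth page
```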
I'm trying to find the best way to query both news feed and wall using a single request.
First attempt:
Query me/home and me/feed in batch request.
Problem: querying me/home gives bad results due to Graph API bugs (showing blocked items and, conversely, not showing some items that should appear), so I decided to switch to FQL, which seems to handle it much better.
Second attempt:
Use single batch request to query:
(1) me/feed directly.
(2) fql.query for stream table with filter_key set to 'others'.
Problem: Needs to also query for user names because the stream table contains only ids.
Third attempt:
Use batch request to query:
(1) me/feed directly
(2) fql.multiquery for stream table with filter_key set to 'others' and the names table with "WHERE id IN (SELECT actor_id FROM #stream)".
Problem: it fails with "Error: batch parameter must be a JSON array", although it is a JSON array.
Fourth Attempt:
Use fql.multiquery to get news feed stream, wall stream and names.
Problem: I have no idea how to get a view similar to me/feed using FQL. The best I could get is a list of all my own posts, but it doesn't show photos the user is tagged in (so I guess more things are missing).
Appreciate any hints.
Because FQL does not do SQL-style joins, getting information from multiple tables in one query is currently impossible. Instead:
Use FQL on the stream table to get the list of posts you want to display, and be sure to grab the source_id. The source_id can be a user id, page id, event id, or group id (there may be more object types too; I don't remember off the top of my head). You may also want to cache the actor_id, target_id, and viewer_id similarly.
Cache the source_ids in a dictionary-style data cache with source_id as the primary key.
Loop through the cache for ids you don't have information on.
Try grabbing the information from the user table by id, then the page table, then the event table, then the group table, until you find what that ID belongs to. Store the information in your cache.
For display, merge the stream table items with the source_id information.
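The steps above can be sketched in Python; `lookup_object` stands in for the queries against the user/page/event/group tables, and the data is illustrative:

```python
def lookup_object(object_id, tables):
    """Try each table in turn until the id is found, mirroring
    the user -> page -> event -> group fallback."""
    for table in tables:
        if object_id in table:
            return table[object_id]
    return None

# Illustrative stand-ins for the FQL lookup tables
user_table = {1: {"name": "Alice", "type": "user"}}
page_table = {2: {"name": "Some Page", "type": "page"}}

# Posts as they might come back from the stream table
stream = [
    {"post_id": "a", "source_id": 1},
    {"post_id": "b", "source_id": 2},
]

# Cache source info keyed by source_id (the PK), resolving
# only ids we have not seen yet
cache = {}
for post in stream:
    sid = post["source_id"]
    if sid not in cache:
        cache[sid] = lookup_object(sid, [user_table, page_table])

# For display, merge each stream item with its cached source info
merged = [{**post, "source": cache[post["source_id"]]} for post in stream]
print(merged)
```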