MongoDB: read records while maintaining last read record information - sql

My problem is as follows: on a daily basis, records are appended to the Mongo database. There is no specific column for a timestamp or an id (the user has designed the collection that way). While retrieving records from the database, is there any meta information (such as the BSON document _id) that would help retrieve records for a certain time range (11-12-2013 to 12-12-2013)?

A timestamp can be extracted from the ObjectId; see the Mongo docs.
More details here

Related

What method does GA4 use for streaming data to BigQuery? In SQL terms, is it just insert, or update too?

Sorry, I'm new to this. I read a few sources, including some Google documentation guides, but still don't quite understand:
Every time GA4 streams data into BigQuery, does the BigQuery table get a new row, or can it update an existing one?
For example, if I want data for a particular client_id, should I expect only one row, or will I get a bunch of rows with different data to look through?
Every record is an insert; if you look at the table columns, each streamed record is loaded with an epoch timestamp as event_timestamp.
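As a minimal sketch of what that looks like on the BigQuery side (assuming the standard GA4 export schema; the project and dataset names here are hypothetical, and on web streams the client ID surfaces as user_pseudo_id):
SELECT
  event_timestamp,   -- microseconds since epoch, one value per streamed event
  event_name,
  user_pseudo_id
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'
  AND user_pseudo_id = '123456789.987654321'
ORDER BY event_timestamp
-- Expect many rows per user_pseudo_id: one per event, never an in-place update.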

Extracting Data from a Multi-Data Column in SQL

I'm creating a sales leaderboard in Holistics and the column "user_id" is a multi-data column.
Here's a snapshot of the column "user_id":
I need to show the "name" part of the user. I tried using CONVERT and even JSON_VALUE, but neither is recognized by Holistics.
I used CAST, but user_id still comes back in numerical form.
Here's my code:
And here's the data output:
Can you help me figure out how to show the actual name of the salesperson?
I'm a newbie here and it's my first post, which is why all my screenshots are provided as links.
To select a particular field from JSON data (and JSON is what you have in the user_id column), try this combination:
SELECT
JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.id')) AS id,
JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.name')) AS user_name
FROM public.deals
This should return the user's id and name from your JSON column.
Whatever software you use, it probably expects the data in a row-column format, so you just need to adjust the SQL query so that it returns properly formatted data. And since you have JSON in a user_id column (which seems odd, but never mind), a combination of JSON_EXTRACT, JSON_UNQUOTE and perhaps CAST should do the trick.
But bear in mind that running DISTINCT over a big table using those functions could be slow.
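If you need the id back as a number rather than a string, a minimal variation of the same idea (assuming a MySQL-compatible database, as in the snippet above):
SELECT
CAST(JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.id')) AS UNSIGNED) AS id,
JSON_UNQUOTE(JSON_EXTRACT(user_id, '$.name')) AS user_name
FROM public.deals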

Multi-user Saving Query in SQL

I am working on VS 2010 C# and SQL Server 2014.
My entry program generates a new transaction number on saving, and that number is stored in tables. I have a header and detail concept in my tables, meaning that the header table stores this data:
Party Name/Code
Transaction No
Date
and the detail table stores these pieces of information:
Item Code/Name
Unit
Rate
Quantity
In the header table, I have an auto-incremented integer column, SrNo.
On saving, I first save to the header table, then get the max SrNo from the header table, and then store that value in the detail table.
This works fine for a single user/machine, but when multiple users/machines save at the same time, the detail rows end up stored under the wrong SrNo.
How can I store the entry correctly when multiple users save from the same program at the same time?
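A minimal T-SQL sketch of a race-free version of that save, assuming SrNo is an IDENTITY column and using hypothetical table and column names; the point is to take the SrNo generated by this session's own insert instead of re-reading MAX(SrNo), which two sessions can read identically at the same time:
BEGIN TRANSACTION;

INSERT INTO HeaderTable (PartyCode, TransactionNo, EntryDate)
VALUES (@PartyCode, @TransactionNo, @EntryDate);

-- SCOPE_IDENTITY() returns the identity value created by this session and scope,
-- so concurrent saves no longer pick up each other's SrNo.
DECLARE @SrNo INT = SCOPE_IDENTITY();

INSERT INTO DetailTable (SrNo, ItemCode, Unit, Rate, Quantity)
VALUES (@SrNo, @ItemCode, @Unit, @Rate, @Quantity);

COMMIT TRANSACTION;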

BigQuery and Tableau

I connected Tableau to BigQuery and was working on the dashboards. The issue here is that BigQuery charges for the data a query scans every time.
My table is 200 GB of data. When someone opens the dashboard in Tableau, the query runs over the whole table. Using any filter on the dashboard runs it again over the whole table.
On 200 GB of data, if someone applies 5 filters across different analyses, BigQuery bills roughly 200 GB * 5 = 1 TB. For one day of testing the analysis we were charged for nearly 30 TB of analysis, but the table behind it is only 200 GB. Is there any way I can stop Tableau from running over the full data in BigQuery every time something changes?
An extract in Tableau is indeed one valid strategy, but only when you are using a custom query. If you access the table directly it won't work, as that will download 200 GB to your machine.
Other options to limit the amount of data are:
Not selecting any columns that you don't need. Do this by hiding unused fields in Tableau; it will then not include those fields in the query it sends to BigQuery. Otherwise it's a SELECT * and you pay for the full 200 GB even if you don't use those fields.
Another option that we use a lot is partitioning our tables, for instance a partition per day of data if you have a date field. Using the TABLE_DATE_RANGE and TABLE_QUERY functions you can then smartly limit the number of partitions, and hence rows, that Tableau will query. I usually hide the complexity of these table wildcard functions away in a view and then use the view in Tableau, as sketched below. Another option is to use a parameter in Tableau to control the TABLE_DATE_RANGE.
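A legacy-SQL sketch of that kind of view, with hypothetical project, dataset, table-prefix and column names (TABLE_DATE_RANGE takes the table prefix plus a start and end timestamp):
SELECT
  event_date,
  user_id,
  revenue   -- hypothetical columns; selecting only what you need keeps the scan small
FROM TABLE_DATE_RANGE(
  [my-project:my_dataset.events_],
  DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
  CURRENT_TIMESTAMP()
)
-- Save this as a view and point Tableau at the view: only the last 7 daily tables get queried.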
1) Right now I am learning BQ + Tableau too, and I found that using an "Extract" is a must for BQ in Tableau. With this option you can also save time building the dashboard. So my current pipeline is: build query > add it to Tableau > make dashboard > upload dashboard to Tableau Online > schedule updates for the extract.
2) You can send a custom quota request to Google and set up limits per project / per user.
3) If each of your queries touches 200 GB every time, consider optimizing those queries (don't use SELECT *, use only the dates you need, etc.).
The best approach I found was to partition the table in BQ on a date (day) field that has no time component. BQ allows you to partition a table by a day-level field. The important thing here is that even though the field is a day/date with no time component, it should be a TIMESTAMP datatype in the BQ table, i.e. you will end up with a column in BQ with data looking like this:
2018-01-01 00:00:00.000 UTC
The reason the field needs to be a TIMESTAMP datatype (even though there is no time in the data) is that when you create a viz in Tableau, it generates SQL to run against BQ, and for the partition field to be used by that generated SQL it needs to be a TIMESTAMP datatype.
In Tableau, you should always filter on your partition field, and BQ will then only scan the rows within the range of the filter.
I tried partitioning on a DATE datatype, looked up the logs in GCP, and saw that the entire table was being scanned. Changing to TIMESTAMP fixed this.
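A standard-SQL sketch of that setup, with hypothetical dataset, table and column names; the day-only value is stored as a TIMESTAMP and the table is partitioned on it:
CREATE TABLE mydataset.sales_partitioned
PARTITION BY DATE(sale_ts) AS
SELECT
  TIMESTAMP(sale_day) AS sale_ts,   -- day-only DATE stored as TIMESTAMP, e.g. 2018-01-01 00:00:00.000 UTC
  order_id,
  amount
FROM mydataset.sales;
-- In Tableau, filter on sale_ts; the SQL Tableau generates then prunes to just the matching partitions.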
The thing about Tableau and BigQuery is that Tableau calculates the filter values using your query (live query). What I have seen in my project's logging is that it creates the filter queries from your own query:
select `Custom SQL Query`.filtered_column from ( your_actual_datasource_query ) as `Custom SQL Query` group by `Custom SQL Query`.filtered_column
Instead, try to create the Tableau data source with incremental extracts, and also try to have your data date-partitioned (BigQuery only supports date partitioning) so that you can limit the data scanned.

Best way to load XML data into a new SQL table

I have user info in a SQL table with 3 columns. One of the columns is of the XML datatype and holds the user information in XML format. The number of fields in the XML data can vary from user to user. For instance, user 1 can have 25 fields and user 2 can have 100 fields; that can change again to 50 for user 3. The fields for each user change. I need to be able to pull all the fields (columns) for each user and write them to a SQL table XYZ.
After writing user A's record into SQL table XYZ, user B may have more fields (columns) than A; here I need to ADD those fields (columns) to table XYZ, with their values left NULL for user A.
Is there an efficient way of achieving this using T-SQL or SSIS?
I think your problem is not the data loading mechanism but the data ingestion strategy.
Two strategies I can think of right now:
I would suggest you define an XSD for your XML covering the worst-case scenario (hoping it is definable) and then design your DB table around it. As long as the user info conforms to the XSD, you should be fine with your inserts.
You create a table like: Userid | ColumnName | ColumnValue
and enter the data row-wise; that gives you a lot of flexibility to work around the scenario. You can then always write queries to extract the data in whatever format you want, as sketched below.
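A minimal T-SQL sketch of that row-wise load, with hypothetical table and column names, assuming each user's XML looks roughly like <user><SomeField>value</SomeField>...</user>:
INSERT INTO dbo.UserAttributes (UserId, ColumnName, ColumnValue)
SELECT
    u.UserId,
    f.node.value('local-name(.)', 'nvarchar(128)') AS ColumnName,   -- element name becomes ColumnName
    f.node.value('.', 'nvarchar(max)') AS ColumnValue               -- element text becomes ColumnValue
FROM dbo.Users AS u
CROSS APPLY u.UserXml.nodes('/user/*') AS f(node);   -- one row per element, however many fields each user has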