How to simulate a warehouse without using RackPick/RackStore or Dynamic Systems? - process

I would like to simulate a warehouse as part of a process, similar to using a RackStore block but without linking it to a PalletRack, so that the items in my input can be agents. I made a quick illustration for my question; maybe it will help to understand.
[Illustration: simulation of a warehouse as a process]
Regards!

Related

BigQuery Testing, Debugging, and Design Patterns

We use BigQuery as the main data warehouse in our company.
We have become very proficient with SQL and write multi-page queries with valid syntax to analyze our data.
The main problem we are struggling with is terrible logic mistakes in our queries. For example, a > that should have been a >=, or a join that treats NULL values the wrong way.
The effect is that we are getting wrong data out of BigQuery.
The logic within our data structure is so complicated ("what again was the definition of Customer Type ABC?") that it's terribly difficult to actually pull out anything usable. We estimate that up to 50% of the analytics we pull out of BigQuery are plain wrong.
Of course this is a problem that significantly hurts our bottom line and leads to wrong business decisions. It has gotten so bad that we are craving a normalized database structure that could at least be comprehended more easily.
My hope is that maybe we are just missing certain design patterns to properly use BigQuery. However, I find zero guidance about this online. The SQL we are using is so complex that I'm starting to think that although the syntax is correct, SQL was not made for this. What we are doing feels like fitting a complex program into a single function, which in turn becomes untestable and a nightmare to work with.
I would appreciate any input and guidance.
I can empathize here. I don't think your issue is unique, and there isn't one best practice. I can tell you what we have done to help with these same issues.
We are a small team of analysts, and only have a couple TB of data to crunch daily so your mileage will vary with these tips depending on your situation.
We use DBT - https://www.getdbt.com/. It has a free command-line version, or you can pay for DBT Cloud if you aren't confident with command-line tools. It will help you go from pages-long SQL queries to smaller digestible chunks that are easier to maintain.
It helps with three main use cases for us:
Database normalization/summarization: you can easily write queries, make them dependent on each other, and schedule them to run at a certain time, while the tool handles the more complex data engineering tasks for you, such as running things in the right order and making sure no circular references exist. This part of the tool helped us migrate away from pages-long SQL queries to smaller digestible chunks that are useful in multiple applications.
Documentation: there is a documentation site built in, so you can easily document a column and write out the definition of 'customer'.
Testing: we write loads of tests. We have a 100%-accepted answer for certain metrics, and any time we need to reference such a metric in other queries, or transform data to slice it by other dimensions, we write a test to make sure the new transformation matches back to the accepted answer (a sketch follows below).
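As a rough illustration of that testing idea (not DBT-specific): a minimal Python sketch using the google-cloud-bigquery client, with hypothetical dataset, table, and column names, that fails whenever a new rollup drifts from the accepted metric.

```python
# Minimal metric-reconciliation test. Assumes a hypothetical
# `accepted.revenue_daily` table holding the 100%-accepted metric and
# a new `marts.revenue_by_region` model that should sum back to it.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project/credentials

QUERY = """
SELECT
  a.day,
  a.revenue AS accepted_revenue,
  SUM(IFNULL(m.revenue, 0)) AS transformed_revenue
FROM `accepted.revenue_daily` AS a
LEFT JOIN `marts.revenue_by_region` AS m USING (day)
GROUP BY a.day, a.revenue
HAVING ABS(accepted_revenue - transformed_revenue) > 0.01
"""

rows = list(client.query(QUERY).result())
# A passing test returns zero rows; each returned row is a day where
# the new transformation no longer matches the accepted answer. The
# IFNULL makes days missing from the new model fail loudly instead of
# vanishing -- exactly the NULL-in-join trap described above.
assert not rows, f"{len(rows)} day(s) diverge from the accepted metric"
```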
We have explored DBT; unfortunately we didn't have the bandwidth to support it at the company level. As an alternative we use Airflow to build and maintain datasets in BigQuery, using the BigQuery operators to interface with BQ through Airflow. This helps us in the following ways:
Ability to build custom operators that can help with organizational level bells and whistles (integration with internal systems, data lifecycle management, lineage management etc.)
Ability to break down complex pieces of SQL into smaller manageable blocks that can be reused
Ability to incorporate testing in the process. You can build testing into your pipeline DAG, or build out separate DAGs of tests that monitor your datasets and send out reports.
Ability to replay and recreate datasets
Ability to easily manage schema changes
I am sure there are other use cases where Airflow helps, but these are some of the things that come to mind.
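For illustration, here is a stripped-down sketch of such a pipeline (DAG, dataset, and table names are hypothetical; assumes the apache-airflow-providers-google package and Airflow 2.4+ for the schedule argument):

```python
# Hypothetical two-step pipeline: build a staging table, then an
# aggregate on top of it. Airflow handles the ordering; each block
# of SQL stays small and reusable.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="bq_daily_models",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    build_staging = BigQueryInsertJobOperator(
        task_id="build_staging_orders",
        configuration={
            "query": {
                "query": "CREATE OR REPLACE TABLE staging.orders AS "
                         "SELECT * FROM raw.orders "
                         "WHERE status != 'cancelled'",
                "useLegacySql": False,
            }
        },
    )

    build_aggregate = BigQueryInsertJobOperator(
        task_id="build_orders_by_region",
        configuration={
            "query": {
                "query": "CREATE OR REPLACE TABLE marts.orders_by_region AS "
                         "SELECT region, SUM(amount) AS revenue "
                         "FROM staging.orders GROUP BY region",
                "useLegacySql": False,
            }
        },
    )

    # Explicit dependency: staging is rebuilt before the aggregate,
    # and Airflow rejects circular references by construction.
    build_staging >> build_aggregate
```

Each SQL block stays small enough to review on its own, which is the main point of breaking the pages-long queries apart.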

How to move a data model automatically?

Can I somehow pull a data model from someone's platform to mine, so I don't need to create the model manually? Is there a way to do it automatically? Thanks
For some cloud data warehouses, GoodData supports a "model scan", I believe, where you can connect to the database and generate a GD LDM based on it. I believe a similar mechanism also exists in CN. BUT - I would not recommend doing it for production use, especially on top of some random database. It might still require some manual tuning to produce a reasonable LDM, and from our experience in GD, the LDM should be based on the reporting requirements. For a quick and dirty demo, though, it might work out of the box.

Alter data source

Power BI can connect to various data sources and run SELECT queries.
Is it also possible to run other queries (INSERT INTO, UPDATE, ...)?
Right now I need it for a PostgreSQL database, but I could use it for others in the future.
No, you can't directly run INSERT/UPDATE queries from Power BI; that isn't the idea of the tool. If you find you need it, there is probably a major flaw in your design, or you are not using the right tool for the job. But there are a few ways to work around this (again, I'm not saying that you SHOULD do it). Usually it is done in combination with a custom-written Power App, embedded in your report in a Power Apps visual. The idea is that the app writes to the database and then refreshes your report (if needed).
You can start here, and I recommend looking at this in-depth session: Writing back data to PowerBI from your reports.
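To make the division of labour concrete: Power BI only reads, and whatever the embedded app does on a button click is an ordinary write against the database. The sketch below shows only that database side for PostgreSQL, using psycopg2 with a hypothetical comments table and connection details; it stands in for whatever the companion app would actually trigger.

```python
# Illustration of the write-back side only (table name and connection
# details are hypothetical). Power BI never runs this; an external app
# or service does, and the report is refreshed afterwards.
import psycopg2

conn = psycopg2.connect(
    host="db.example.com", dbname="sales", user="writer", password="..."
)

with conn, conn.cursor() as cur:
    # Parameterized INSERT -- the kind of statement Power BI itself
    # will not issue against the source.
    cur.execute(
        "INSERT INTO report_comments (report_id, author, body) "
        "VALUES (%s, %s, %s)",
        (42, "analyst@example.com", "Q3 spike explained by promo"),
    )
# `with conn` commits on success; after the write, trigger a dataset
# refresh (for example via the Power BI REST API) if the report must
# reflect the change.
```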
To be very straightforward, the answer is no. Power BI is a data analysis platform. There are probably some advanced ways to do it, but manipulating data from a report, or from any BI tool, is neither logical nor a good idea. You can find answers on various blogs where the same question has been asked. For more details, you can check the links below:
help link 1
help link 2

Is there a way to let non-technical individuals utilize BigQuery reports?

I want to have an access portal for non-tech-savvy individuals in which they can build reports of their own without needing to know SQL whatsoever.
It would be best if I could create custom fields myself, and then just let the users in the portal pick and choose whichever they like, with a custom date range.
I've explored the options Google Data Studio offers, but it looks to me like it mostly puts an emphasis on data visualization.
In addition, my attempts to make custom queries with it were not successful, since the platform is rigid in terms of deciding which field is a metric and which is a dimension (and it does so inaccurately). This makes it hard to query reports as you normally would using BigQuery, which doesn't have these somewhat arbitrary limitations.
Perhaps I've misunderstood something about the platform due to my limited experience with it, but it looks like Data Studio isn't going to fit the bill for me.
EDIT: In addition, the platform should have a way of exporting said reports as CSV files, a feature that Data Studio doesn't have as far as I know.
It would be great to receive suggestions for a different platform which would better fit my needs, or even suggestions on how to make better use of Data Studio.
Have you looked at using a tool like Redash (https://redash.io)? Assuming your GA360 data is in BigQuery, you can connect Redash to BQ, then author queries and visualize.
You can also use the Google Cloud SDK to connect to BQ and run custom queries that generate new tables in BQ based on the GA360 session data, then use Redash, or any tool, to report and visualize.
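For that second part, a minimal sketch with the Python BigQuery client (project, dataset, and column names are hypothetical): materialize a flattened, report-friendly table that Redash, or any tool, can then query without touching the raw schema.

```python
# Sketch: build a simplified summary table from raw session data so
# non-technical users only ever see friendly fields. Names are
# hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination="my-project.reporting.daily_sessions",
    write_disposition="WRITE_TRUNCATE",  # rebuild the table each run
)

sql = """
SELECT
  DATE(session_start) AS day,
  traffic_source,
  COUNT(*) AS sessions,
  SUM(pageviews) AS pageviews
FROM `my-project.analytics.sessions`
GROUP BY day, traffic_source
"""

client.query(sql, job_config=job_config).result()  # waits for completion
```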

Database approach for dynamic form data collection that is suitable for good reporting and searching

I am working on a project which involves collecting dynamic form data. These forms are user-defined (think SurveyMonkey), and thus a fixed schema cannot be defined for them. Data in the form of questions/answers would be retrieved for these forms and then stored in the database. Reporting/searching on these answers (filtering and aggregation) is of utmost importance. There are two feasible approaches.
Use a SQL database and store each field's data as a separate row. Reporting/searching is then done via SQL. My apprehension is that it would result in complicated joins for reporting.
Use a NoSQL database like MongoDB. This seems to be a perfect fit for storing the dynamic data since it is schema-less. However, I am not sure how good its reporting capabilities are.
It seems easier for the target users to learn SQL than to define map/reduce queries. How easy would it be to build a UI for reporting/searching over MongoDB?
Simple things like: the list of users who gave a particular set of answers, or how many such users there were over a period of time.
Thanks,
Pulkit
It's already been mentioned in the comments, but I'll reiterate that for reporting you should look at Mongo's map/reduce functionality and the aggregation framework.
Having done map/reduce in both Couch and Mongo I can say that they are very similar. It's definitely a barrier to entry for a developer that isn't familiar with it, but once you get a few working examples, it's not too bad.
Consider that Mongo can output a map/reduce job to a collection, which I've found to be really useful. This means you can schedule the jobs and run them periodically and output to a place that you can then report on. It's not that hard to create a framework that lets developers write simple Javascript map and reduce functions and then plug them in to be run on a schedule.
The aggregation framework is much easier to understand for a developer coming from SQL. Still a learning curve, but not as bad as map/reduce. It is much more well suited to ad-hoc reporting queries and there is nothing comparable in Couch.
You could maybe make a reporting UI that maps to the aggregation framework, but I wouldn't try to do something similar for map/reduce queries.
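To give a feel for what the aggregation-framework version of the earlier example ("how many users gave a particular answer, over time") can look like, here is a minimal pymongo sketch with a hypothetical responses collection and document shape:

```python
# Hypothetical document shape:
#   {"form_id": ..., "user_id": ..., "submitted_at": datetime,
#    "answers": [{"question": "q1", "value": "yes"}, ...]}
from datetime import datetime
from pymongo import MongoClient

coll = MongoClient()["forms"]["responses"]

pipeline = [
    # Filter: responses in a date range containing a particular answer.
    {"$match": {
        "submitted_at": {"$gte": datetime(2024, 1, 1),
                         "$lt": datetime(2024, 7, 1)},
        "answers": {"$elemMatch": {"question": "q1", "value": "yes"}},
    }},
    # Group: distinct users per month.
    {"$group": {
        "_id": {"year": {"$year": "$submitted_at"},
                "month": {"$month": "$submitted_at"}},
        "users": {"$addToSet": "$user_id"},
    }},
    {"$project": {"user_count": {"$size": "$users"}}},
    {"$sort": {"_id": 1}},
]

for row in coll.aggregate(pipeline):
    print(row)
# Appending an {"$out": "report_q1_yes"} stage instead would
# materialize the result into a collection, much like the map/reduce
# output-to-collection pattern mentioned above.
```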