Well, I have got the Stack Exchange data.
LINK: https://data.stackexchange.com/stackoverflow/query/new
There are several tables.
Q: In the PostHistory table, what does PostHistoryTypeId mean?
Q: Is there a brief introduction to these tables?
There is also a readme.txt file shipped with all the different dumps of the Stack Exchange sites. In this file, all columns are described in detail.
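If you just want to poke at the values yourself, a quick aggregate in the Data Explorer also works; a minimal sketch (PostHistory and PostHistoryTypeId are the standard SEDE names):

    -- Count how often each history type occurs, most frequent first
    SELECT TOP 10 PostHistoryTypeId, COUNT(*) AS Occurrences
    FROM PostHistory
    GROUP BY PostHistoryTypeId
    ORDER BY COUNT(*) DESC;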
I am looking for a way to visualize the stats of a table in Snowflake.
The laborious route is to pull a meaningful sample of the data with Python and profile it with pandas, but it is somewhat inefficient and unsafe to pull the data out of Snowflake.
Snowflake's new interface shows these stats graphically, and I would like to know if there is a way to obtain this data with a query or by consulting metadata.
I need something like pandas-profiling, but without an external server. Maybe Snowflake stores metadata/statistics about its columns, numeric and categorical.
https://github.com/pandas-profiling/pandas-profiling
Thank you for your advice.
You can find a lot of meta information in the INFORMATION_SCHEMA.
All the views and table functions in the Snowflake INFORMATION_SCHEMA are documented here: https://docs.snowflake.com/en/sql-reference/info-schema.html
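As a minimal sketch of both ideas, assuming a hypothetical table MY_DB.MY_SCHEMA.MY_TABLE with a numeric column AMOUNT: the first query reads column metadata from the INFORMATION_SCHEMA, the second computes pandas-profiling-style statistics with plain aggregates.

    -- Column metadata kept by Snowflake itself
    SELECT column_name, data_type, is_nullable
    FROM MY_DB.INFORMATION_SCHEMA.COLUMNS
    WHERE table_schema = 'MY_SCHEMA'
      AND table_name = 'MY_TABLE'
    ORDER BY ordinal_position;

    -- Basic profile of one numeric column, computed on the fly
    SELECT COUNT(*)                 AS row_count,
           COUNT(DISTINCT amount)   AS distinct_values,
           COUNT_IF(amount IS NULL) AS null_count,
           MIN(amount)              AS min_value,
           MAX(amount)              AS max_value,
           AVG(amount)              AS mean_value,
           MEDIAN(amount)           AS median_value
    FROM MY_DB.MY_SCHEMA.MY_TABLE;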
Not sure if you're talking about viewing the INFORMATION_SCHEMA as mentioned, but if you need documentation on this whole new interface, it's called Snowsight.
You can learn more here:
https://docs.snowflake.com/en/user-guide/ui-snowsight.html
Cheers!
The highlight in your screenshot isn't statistics about the data in the table, but merely about the query result (which looks like the output of a DESCRIBE TABLE query). For example, if you look at the type column, it simply tells you that this table has 6 VARCHAR columns, 2 timestamps, and 1 number.
What you're looking for is something that is provided by most BI tools or data catalogs. I suggest you take a look at those instead.
You could also use an independent tool, like Soda, which is open source.
Having read almost all topics related to dynamic where clauses, I still can't find a way through.
Here is my source table:
Source Table
And the result I want is:
Results
In fact, I want to return all values satisfying the Test value condition, but I don't know how to implement it dynamically (I have a table with 700K rows).
Thanks a lot for your help.
EDIT:
Following your answers, I will detail the approach a bit more.
Unfortunately, as I'm a new user, I'm not allowed to post pictures directly in the post.
I'm basically performing segregation-of-duties controls over the SAP system.
Basically, I want to test whether some of a customer's access rights are conflicting, based on SAP extractions checked against a knowledge template stating potential conflicts.
Here are simplified examples of the SAP extract and of the potential-conflict template. First, the template:
Field   Value
ACTVT   AZ0220
KOART   K
BUKRS   *
EKGRP   03
And this is a fake example of the customer's raw data (the SAP extract):
FIELD   RANGE_START   RANGE_END
ACTVT   AZ01*         AZ99*
KOART   A             L
BUKRS   011           099
EKGRP   2             10
I thought a way of doing this would be to use a dynamic WHERE clause.
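Something along these lines is what I have in mind; the table and column names (conflict_template, customer_access) are made up for the illustration, and SAP's '*' wildcards would still need separate handling:

    -- For each template field, check whether the template value falls
    -- inside the customer's authorization range for that same field.
    SELECT t.field, t.value
    FROM conflict_template AS t
    JOIN customer_access AS a
      ON a.field = t.field
    WHERE t.value BETWEEN a.range_start AND a.range_end;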
Thanks a lot for your help
I only found answers about how to import CSV files into the database, for example as a blob or as a 1:1 representation of the table you are importing into.
What I need is a little different: my team and I are tracking everything we do in a database. A lot of these tasks produce logfiles, benchmark results, etc., which are stored in CSV format. The number of columns is far from consistent, and the data can be completely different from file to file: it could be a log from Fraps with frametimes in it, a log of CPU temperatures over a period of time, or something completely different.
Long story short, I came up with an idea, but, being far from an SQL pro, I am not sure if it makes sense or if there is a more elegant solution.
Does this make sense to you:
We also need to deal with a lot of data, so please also give me your opinion on whether this is feasible with around 200 files per day, each of which can easily have a couple of thousand rows.
The purpose of all this is to let us generate reports from the stored data and perform analysis on it, e.g. view it in a graph on a webpage or do calculations with it.
I'm limited to MS SQL in this case, because that's what the current (quite complex) database runs on, and I'm just adding a new schema with this functionality to it.
Currently we just archive the files on a RAID and store a link to them in the database, so everyone who wants to do magic with the data needs to download every file they need and then use R or Excel to create a visualization.
Have you considered a column of the XML data type for the file data, as an alternative to the ColumnId -> Data structure? SQL Server provides a special dedicated XML index (over the entire XML structure), so your data can be fully indexed no matter what CSV columns you have. You will have far fewer records in the database to handle (an entire CSV file becomes a single XML field value), and there are good XML query options to search by values and attributes of the XML type.
For that you will need to translate CSV to XML, but you would have to parse it either way...
Not that your plan won't work, I am just giving an idea :)
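A minimal sketch of what I mean, with made-up names (LogFiles, FileData) and each CSV row encoded as a <row> element:

    -- One record per CSV file; the whole file lives in one XML value.
    CREATE TABLE LogFiles (
        FileId   INT IDENTITY PRIMARY KEY,
        FileName NVARCHAR(260) NOT NULL,
        FileData XML NOT NULL
    );

    -- The primary XML index covers the entire XML structure.
    CREATE PRIMARY XML INDEX IX_LogFiles_FileData ON LogFiles (FileData);

    -- Example: pull every value stored under the CSV column named 'fps'.
    SELECT f.FileId,
           r.c.value('.', 'NVARCHAR(100)') AS fps_value
    FROM LogFiles AS f
    CROSS APPLY f.FileData.nodes('/rows/row[@col="fps"]') AS r(c);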
=========================================================
Update with some online info:
An article from Simple Talk: The XML Methods in SQL Server
Microsoft documentation for nodes() with various use-case samples: nodes() Method (xml Data Type)
Microsoft documentation for value() with various use-case samples: value() Method (xml Data Type)
I'm working with Pentaho Data Integration (Kettle) and I have a question.
I have two input files file1.txt and file2.txt with the same header:
file1.txt
NAME;AGE
alberto;22
angela;22
madelaine;23
file2.txt
NAME;AGE
carlos;56
fernando;30
ana;16
and I want to merge both files into one, files_together.txt
NAME;AGE
alberto;22
angela;22
madelaine;23
carlos;56
fernando;30
ana;16
I've tried everything (I think) and I don't know how to do it. I've been searching on Google, YouTube... with no luck.
Thank you very much.
Answer: just wire the output of each file you want to merge into the input of the final output step.
I personally found the "Append streams" step to be more useful, as it keeps the streams together. By pointing two inputs into one output, they run in parallel, so the results will be interlaced, depending on various factors. Using "Append streams" will give you the rows from file1, then the rows from file2, in the output.
You must use a "Select values" step; the names of the fields must be the same in both streams.
I was trying something similar with .csv files. I tried doing what you suggested, but it didn't work for me. Many other blogs said "it would be better to use Excel scripting than Pentaho Data Integration (Kettle) for this", which is not true.
You can use the "Append streams" step, which is under the Flow category of transformation steps. It takes two inputs and merges them, giving you the expected merged file. You can also chain this step to merge a larger number of files with each other.
I need to get data from multiple rows into one column value.
For example, from data in this format:
ID  Interest
1   Sports
1   Cooking
2   Movie
2   Reading
to this format:
ID  Interest
1   Sports,Cooking
2   Movie,Reading
I wonder whether we can do that in MS Access SQL. If anybody knows, please help me with it.
Take a look at Allen Browne's approach: Concatenate values from related records
As for the normalization argument, I'm not suggesting you store concatenated values. But if you want to join them together for display purposes (like a report or form), I don't think you're violating the rules of normalization.
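As a rough illustration of how his approach is used: assuming his ConcatRelated() function has been copied into a standard module, a table named tblInterests, and a numeric ID (the names are just for the example), the query would look like this. Access SQL has no comment syntax, so drop the -- line before pasting.

    -- One row per ID, with all related Interest values joined by commas
    SELECT DISTINCT ID,
           ConcatRelated("Interest", "tblInterests", "ID = " & [ID]) AS Interests
    FROM tblInterests;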
This is called de-normalizing data. It may be acceptable for final reporting. Apparently some experts believe it's good for something, as seen here.
(Mind you, kevchadder's question is right on.)
Have you looked into the SQL Pivot operation?
Take a look at this link:
http://technet.microsoft.com/en-us/library/ms177410.aspx
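For reference, a query following that pattern looks roughly like this (T-SQL, using the hypothetical tblInterests table from the question). Note that PIVOT spreads values across columns and flags them; it does not concatenate them:

    -- 1 under each interest an ID has, 0 otherwise
    SELECT ID, [Sports], [Cooking], [Movie], [Reading]
    FROM (SELECT ID, Interest FROM tblInterests) AS src
    PIVOT (COUNT(Interest)
           FOR Interest IN ([Sports], [Cooking], [Movie], [Reading])) AS pvt;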
Just noticed you're using Access. Take a look at this article:
http://www.blueclaw-db.com/accessquerysql/pivot_query.htm
This is not something you should do in SQL, and it's most likely not possible at all.
Merging the rows in your application code shouldn't be too hard.