Export data with phpMyAdmin, ordered on row level - sql

I'm working with a huge database with more than 800 tables and over 50,000 rows in total. All these tables have different structures, with the exception of a timestamp field which is present in all tables.
My challenge: export all data but be able to use the timestamp field in a meaningful way.
For statistical purposes I want to create an overview of all the entries in this database in which I can work with the timestamp field. The problem with a "normal" export is that the data is ordered by table, then ID. This means that the timestamp fields end up in different columns (I'm using Excel here), and I can't effectively sort the entries on this field.
TL;DR version: Is it possible to export all data from a database managed with phpMyAdmin, ordered by a field that is present in all tables, while all the other fields are table-specific?

It seems to me that what you want to do is first get the information in the format you would like and then export it. First though, you need to figure out exactly what you are trying to accomplish. You might rather create the SQL to do the statistical (counting, summing, averaging, etc.) work and then just use Excel for the final product. Views and alternate indexes provide logical ways of looking at the data.
As I understand what you are attempting, you need to recreate your database with the timestamp field as the major key for each table. Without physically rewriting the database, I don't think you can use phpMyAdmin's export to produce the format you want.
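One way to "first get the information in the format you would like," as suggested above, is a UNION ALL query that pulls the shared timestamp from every table into one time-ordered result set, which phpMyAdmin can then export. A minimal sketch, with purely illustrative table and column names:
-- One row per entry, tagged with its source table, sorted on the shared timestamp
SELECT 'orders'    AS source_table, created_at AS entry_ts FROM orders
UNION ALL
SELECT 'customers' AS source_table, created_at AS entry_ts FROM customers
UNION ALL
SELECT 'payments'  AS source_table, created_at AS entry_ts FROM payments
-- ... repeat for the remaining tables ...
ORDER BY entry_ts;
With 800 tables you would generate the list of SELECTs from information_schema.columns rather than type it by hand.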

Related

I need to export masked data from a Microsoft SQL Server database

I have a develop/test/production environment. I need to export the data of several tables so that certain columns contain masked data.
For example, First Name, Last Name, or SSN are scrubbed or randomized so that there's no PII. This is over an existing schema that more or less matches 1:1.
I believe there are maybe 10 or 12 columns that contain sensitive data. I don't want to damage or do anything to the existing data. I would prefer to generate a bunch of INSERT scripts that accomplish this.
What's the easiest way to do this?
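One hedged sketch of the INSERT-script route (the table, columns, and mask values below are purely hypothetical): generate the statements from a SELECT that swaps the sensitive columns for scrubbed values, so the source data is never touched.
-- Emit INSERT statements for a masked copy; Person/FirstName/LastName/SSN are placeholders
SELECT 'INSERT INTO Person (Id, FirstName, LastName, SSN) VALUES ('
       + CAST(Id AS varchar(20)) + ', '
       + '''First' + CAST(Id AS varchar(20)) + ''', '
       + '''Last'  + CAST(Id AS varchar(20)) + ''', '
       + '''000-00-0000'');'
FROM Person;
Running the query with results-to-text (or via sqlcmd) and saving the output gives you the script.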

How to tag and store log descriptions in a SQL database

I have logs being captured that contain both a log message and a log "tag".
These "tags" describe the logging events as key-value pairs, for example:
CPU_USAGE: 52.3
USER_LOGGED_IN: "steve15"
NUMBER_OF_RETURNED_RESULTS: 125
I want to store these key-value pairs in a table. Each parameter can be either a string, float, or integer so I can't put all keys in one column and all values in a 2nd column. What is a good SQL table design to store this kind of data? How is the "tagging" of logs typically done? My ultimate goal is to be able to funnel this information into a dashboard so I can monitor resource usage and bottlenecks on charts.
A constraint is that I would like to be able to add new keys without needing to modify my database schema, so having each parameter as a separate column isn't good.
Various solutions I have thought of are:
Storing each value as a string and adding another column in my "tag" table that dictates the actual type
Use a JSON column
Just altering my table to add a new column every time I think of a new tag to log :(
In SQL, key-value pairs are often stored just using strings. You can use a view or application code to convert back to a more appropriate data type.
I notice that you have left out date/times -- those are a little trickier, because you really want canonical formats such as YYYYMMDD. Or perhaps Unix epoch times (number of seconds since 1970-01-01).
You can extend this by having a separate value column for each type you want to store, along with a type column.
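A sketch of that layout (table and column names are just examples):
-- One row per tag; the value lives in whichever column matches value_type
CREATE TABLE log_tags (
    log_id      BIGINT       NOT NULL,
    tag_key     VARCHAR(64)  NOT NULL,
    value_type  VARCHAR(10)  NOT NULL,  -- 'string', 'float' or 'int'
    value_text  VARCHAR(255),
    value_float DOUBLE PRECISION,
    value_int   BIGINT
);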
However, a key-value pair may not be the best approach for this type of data. A common solution is to do the following:
Determine if any columns are "common" enough that you really care about them. This might commonly be a date/time or user or multiple such columns.
Store these columns in separate columns, parsing them when you insert the data (or perhaps using triggers).
Store the rest (or all) as JSON within the row as "extra details".
Or, in Postgres, you can just store the whole thing as JSONB and have indexes on the JSONB column. That gives you both performance and simplicity on inserting the data.
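In Postgres, for example, the hybrid design might look like this (all names are illustrative):
-- Common, frequently-queried fields get real columns; everything else goes in JSONB
CREATE TABLE log_events (
    id         BIGSERIAL PRIMARY KEY,
    logged_at  TIMESTAMPTZ NOT NULL,
    username   TEXT,
    details    JSONB NOT NULL DEFAULT '{}'::jsonb
);

-- GIN index so tag lookups inside the JSONB stay fast
CREATE INDEX log_events_details_idx ON log_events USING GIN (details);

-- Example insert and a tag lookup for the dashboard
INSERT INTO log_events (logged_at, username, details)
VALUES (now(), 'steve15', '{"CPU_USAGE": 52.3, "NUMBER_OF_RETURNED_RESULTS": 125}');

SELECT logged_at, details->>'CPU_USAGE' AS cpu_usage
FROM   log_events
WHERE  details ? 'CPU_USAGE';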

Copy tables from query in BigQuery

I am attempting to fix the schema of a BigQuery table in which the type of a field is wrong (but contains no data). I would like to copy the data from the old schema to the new one using the UI (select * except(bad_column) from ...).
The problem is that:
if I select into a table, BigQuery removes the REQUIRED mode from the columns and therefore rejects the insert.
Exporting via JSON loses information on dates.
Is there a better solution than creating a new table with all columns being nullable/repeated or manually transforming all of the data?
Update (2018-06-20): BigQuery now supports required fields on query output in standard SQL, and has done so since mid-2017.
Specifically, if you append your query results to a table with a schema that has required fields, that schema will be preserved, and BigQuery will check as the results are written that they contain no null values. If you want to write your results to a brand-new table, you can create an empty table with the desired schema and append to that table.
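For example (dataset, table, and column names here are hypothetical), in standard SQL you could create the corrected table with its REQUIRED fields as NOT NULL columns and append the old rows into it:
-- New table with the corrected schema; NOT NULL corresponds to REQUIRED
CREATE TABLE mydataset.events_fixed (
    event_id   INT64     NOT NULL,
    event_time TIMESTAMP NOT NULL,
    bad_column STRING              -- now with the intended type, NULLABLE
);

-- Append the existing rows, dropping the broken column from the old table
INSERT INTO mydataset.events_fixed (event_id, event_time)
SELECT * EXCEPT(bad_column)
FROM mydataset.events;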
Outdated:
You have several options:
Change your field types to nullable. Standard SQL returns only nullable fields, and this is intended behavior, so going forward it may be less useful to mark fields as required.
You can use legacy SQL, which will preserve required fields. You can't use except, but you can explicitly select all other fields.
You can export and re-import with the desired schema.
You mention that export via JSON loses date information. Can you clarify? If you're referring to the partition date, then unfortunately I think any of the above solutions will collapse all data into today's partition, unless you explicitly insert into a named partition using the table$yyyymmdd syntax. (Which will work, but may require lots of operations if you have data spread across many dates.)
BigQuery now supports table clones. A table clone is a lightweight, writeable copy of another table.
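For instance (names are placeholders):
-- Create a lightweight, writeable copy of the source table
CREATE TABLE mydataset.events_clone
CLONE mydataset.events;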

multiple different record types within same file must be converted to sql table

One for the SQL data definition gurus:
I have a mainframe file that has about 35-100 different record types within it. Depending upon the type of record, the layout and each column are redefined into whatever that record needs. Any column on any different record could have a different length or type. I don't really want to split this thing up into 35-100 different tables and relate them together. I did find out that Postgres has %ROWTYPE with cursor- or table-based records, but in all the examples the data looked the same. How can I set up a table that would handle this, and what SQL queries would be needed to return the data? It doesn't have to be Postgres, but that was the only thing I could find that looked similar to my problem.
I would just make a table with all TEXT datatype fields at first. TEXT is variable-length, so it only takes up the space it needs and performs very well. From there, you may find it quicker to move the data into better-formed tables if certain data is better served by a more specific data type.
It's easier to do it in this order, because bulk insert with COPY is very picky... so with TEXT you just worry about the number of columns and get it in there.
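A minimal sketch of that approach (the file path, column names, and later cast targets are just placeholders):
-- Landing table: every field is TEXT, one column per field in the widest layout
CREATE TABLE mainframe_raw (
    record_type TEXT,
    field_01    TEXT,
    field_02    TEXT,
    field_03    TEXT
    -- ... as many columns as the widest record layout needs
);

-- Bulk load the delimited export; only the column count has to match
COPY mainframe_raw FROM '/path/to/export.csv' WITH (FORMAT csv);

-- Later, peel one record type off into a properly typed table
CREATE TABLE payments AS
SELECT field_01::date    AS payment_date,
       field_02::numeric AS amount
FROM   mainframe_raw
WHERE  record_type = 'PAY';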
EDIT: I'm referring to Postgres with this answer. Not sure if you wanted another DB specific answer.

How to compare rows in source and destination tables dynamically in SQL Server

We receive a data feed from our customers, and we get roughly the same schema each time, though it can change on the customer end since they are using a 3rd-party application. When we receive the data files, we import the data into a staging database with a table for each data file (students, attendance, etc.).
We then want to compare that data to the data we already have in the database for that customer and see what has changed (either a column has changed or a whole row was possibly deleted) since the previous run. We then want to write the updated values or deleted rows to an audit table so we can go back and see what data changed from the previous data import. We don't want to update the data itself; we only want to record what's different between the two datasets. We will then delete all the data from the customer database and import the data exactly as is from the new data files, without changing it (this directive has been handed down and cannot change).
The big problem is that I need to do this dynamically, since I don't know exactly what schema I'm going to get from our customers; they can make customizations to their tables. I need to be able to dynamically determine what tables there are in the destination, and their structure, and then look at the source and compare the values to see what has changed in the data.
Additional info:
There are no ID columns in the source, though there are several columns that together can be used as a surrogate key to identify a distinct row.
I'd like to be able to do this generically for each table without having to hard-code values in, though I might have to do that for the surrogate keys for each table in a separate reference table.
I can use either SSIS, SPs, triggers, etc., whichever would make more sense. I've looked at all, including tablediff, and none seem to have everything I need or the logic starts to get extremely complex once I get into them.
Of course any specific examples anyone has of something like this they have already done would be greatly appreciated.
Let me know if there's any other information that would be helpful.
Thanks
I've worked on a similar problem and used a series of metadata tables to dynamically compare datasets. These metadata tables described which datasets need to be staged and which combination of columns (and their data types) serves as the business key for each table.
This way you can dynamically construct a SQL query (e.g., with an SSIS script component) that performs a full outer join to find the differences between the two.
You can join your own metadata with SQL Server's metadata (using sys.* or INFORMATION_SCHEMA.*) to detect whether the columns still exist in the source and the data types are as you anticipated.
Redirect unmatched meta data to an error flow for evaluation.
This way of working is very risky, but it can be done if you maintain your metadata well.
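As a rough illustration (the meta schema, table, and column names are hypothetical), the metadata table and the check against SQL Server's catalog could look something like this:
-- Metadata describing the business key of each staged table
CREATE TABLE meta.TableKeys (
    TableName    sysname       NOT NULL,
    KeyColumn    sysname       NOT NULL,
    ExpectedType nvarchar(128) NOT NULL
);

-- Detect key columns that are missing from the source or whose type drifted
SELECT m.TableName, m.KeyColumn, m.ExpectedType, c.DATA_TYPE AS ActualType
FROM   meta.TableKeys AS m
LEFT JOIN INFORMATION_SCHEMA.COLUMNS AS c
       ON  c.TABLE_NAME  = m.TableName
       AND c.COLUMN_NAME = m.KeyColumn
WHERE  c.COLUMN_NAME IS NULL             -- column no longer exists
   OR  c.DATA_TYPE   <> m.ExpectedType;  -- data type changed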
If you want to compare two tables to see what is different, the keyword is EXCEPT:
select col1,col2,... from table1
except
select col1,col2,... from table2
This gives you everything in table1 that is not in table2.
select col1,col2,... from table2
except
select col1,col2,... from table1
This gives you everything in table2 that is not in table1.
Assuming you have some kind of useful, durable primary key on the two tables: any key that shows up in both result sets is a change, anything only in the first set is an insert, and anything only in the second set is a delete.
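For a single table with a known key, a sketch of that classification (staging.Students, dbo.Students, and StudentNumber are hypothetical names) could be:
-- Classify differences by joining the two EXCEPT result sets on the key
WITH new_side AS (
    SELECT * FROM staging.Students
    EXCEPT
    SELECT * FROM dbo.Students
),
old_side AS (
    SELECT * FROM dbo.Students
    EXCEPT
    SELECT * FROM staging.Students
)
SELECT COALESCE(n.StudentNumber, o.StudentNumber) AS StudentNumber,
       CASE
           WHEN n.StudentNumber IS NOT NULL
            AND o.StudentNumber IS NOT NULL THEN 'update'
           WHEN n.StudentNumber IS NOT NULL THEN 'insert'
           ELSE 'delete'
       END AS ChangeType
FROM new_side AS n
FULL OUTER JOIN old_side AS o
       ON o.StudentNumber = n.StudentNumber;
The dynamic version would build this statement from the metadata described in the other answer, substituting the table name and key columns per table.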