Best way to store this data in a SQL table

I am trying to figure out the best way to store this particular kind of data in a static database table.
On the front end the user is presented with a simple table (say 5x5 by default), but they can add or delete rows/columns at will.
The idea is that the first column holds labels identical to the column headers, and any change to one is reflected in the other, so the grid (not really a graph, but I think it's a fair comparison) always remains square:
      col1  col2  col3  col4  col5
row1    1     5     6     8     9
row2    2     3     4     5     3
row3    0     0     0     0     0
row4    0     0     0     0     0
row5    0     0     0     0     0
All values can be different; the row and column names will always be the same.
(i.e. start with 5x5; if the user removes row5 or col5 it becomes 4x4)
Because of this dynamic number of columns, I am unsure how to represent it as a database table.
Each cell in the "map" will just contain an int, and I am having a hard time finding an elegant way to model this in my database for use with LINQ to SQL.
Any ideas?
EDIT: I could simply wrap everything from the front end in a JSON object and write a bunch of code to handle it, but I'm not sure that would be easiest. Mainly looking for some good input for now.

Here's a start, using a Graph table with separate tables for columns, rows and cells. It doesn't enforce the "squareness" of the graph, but you could enforce that in your application logic.
Graph
- GraphID
GraphColumn
- GraphID
- ColumnID
- ColumnName
GraphRow
- GraphID
- RowID
- RowName
GraphCell
- ColumnID
- RowID
- CellValue
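For reference, here's a minimal sketch of that schema as DDL. The column types, keys and foreign keys are my assumptions; the answer above only lists the column names:

CREATE TABLE Graph (
    GraphID INT PRIMARY KEY
);

CREATE TABLE GraphColumn (
    ColumnID   INT PRIMARY KEY,
    GraphID    INT NOT NULL REFERENCES Graph (GraphID),
    ColumnName VARCHAR(50) NOT NULL
);

CREATE TABLE GraphRow (
    RowID   INT PRIMARY KEY,
    GraphID INT NOT NULL REFERENCES Graph (GraphID),
    RowName VARCHAR(50) NOT NULL
);

CREATE TABLE GraphCell (
    ColumnID  INT NOT NULL REFERENCES GraphColumn (ColumnID),
    RowID     INT NOT NULL REFERENCES GraphRow (RowID),
    CellValue INT NOT NULL,
    PRIMARY KEY (ColumnID, RowID)  -- one value per row/column intersection
);

With this shape, removing row5 or col5 is just deleting one GraphRow or GraphColumn record (plus its GraphCell records), which is what makes the row and column counts dynamic.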

Related

Storing a large array into a table with 10,000 columns in SQLite

I want to be able to store some 100x100 matrices in a table within my database (covariance matrices). A good first step for me would be to flatten each matrix and store the matrix structure (among other things) in a parent table.
However, that would require a table with about 10,000 columns. Writing out so many field names would make my SQL code extraordinarily large, and I wouldn't know where to start if I wanted to query for a matrix.
Is there a neat way to define such a table in SQL? Is there a neat way to set or get a particular matrix (or set of matrices) from my database using such a table? Is there a better way?
I am using Sqlite for my databases.
Any table with a large number of same-typed columns can be rotated.
For example if you have a table A like this:
row  col1  col2  col3  ...
1    1     2     3
2    11    12    13
you can simply rotate it into a table with three columns:
row  col  value
1    1    1
1    2    2
1    3    3
2    1    11
2    2    12
2    3    13
So instead of writing a big SQL statement like
select col1, col2, col3, ... from A where row = 2
you write SQL like
select value from A where row = 2 order by col
The result set that was originally horizontal is now vertical: it is rotated and easy to handle.
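As a concrete sketch for the covariance matrices (all table and column names here are mine, not from the question), the rotated layout in SQLite would look like:

-- One record per matrix, one record per cell.
CREATE TABLE matrix (
    matrix_id INTEGER PRIMARY KEY,
    n         INTEGER NOT NULL  -- dimension, e.g. 100
);

CREATE TABLE matrix_cell (
    matrix_id INTEGER NOT NULL REFERENCES matrix (matrix_id),
    row_idx   INTEGER NOT NULL,
    col_idx   INTEGER NOT NULL,
    value     REAL    NOT NULL,
    PRIMARY KEY (matrix_id, row_idx, col_idx)
);

-- Fetch one whole matrix in row-major order:
SELECT row_idx, col_idx, value
FROM matrix_cell
WHERE matrix_id = 1
ORDER BY row_idx, col_idx;

Since a covariance matrix is symmetric, you could also store only the cells with row_idx <= col_idx and roughly halve the storage.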

Datastage Column Mapping logic Based on column Value (Or any SQL function to do the same)

I'm trying to find a solution in DataStage (or in SQL), without having to use a bunch of if/else conditions, where I can map the value of one column based on the value of another column.
Example source file:

ID  Header1  Value1  Header2  Value2
1   Length   10      Height   15
2   Weight   200     Length   20
Target output:

ID  Length  Height  Weight
1   10      15
2   20              200
I can do this using the INDEX/MATCH functions in Excel. I was wondering if DataStage or Snowflake can look at these fields similarly and automatically route each value into the column named by its header.
I think the best solution in DataStage would be a Pivot stage followed by a Transformer stage to strip out the hard-coded source column names.
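Since Snowflake was mentioned, here is a plain-SQL sketch of the same idea using conditional aggregation; the table name source_file and its columns are placeholders matching the example above. It first unpivots the header/value pairs into rows, then routes each value into the column named by its header:

SELECT id,
       MAX(CASE WHEN header = 'Length' THEN value END) AS length,
       MAX(CASE WHEN header = 'Height' THEN value END) AS height,
       MAX(CASE WHEN header = 'Weight' THEN value END) AS weight
FROM (
    SELECT id, header1 AS header, value1 AS value FROM source_file
    UNION ALL
    SELECT id, header2, value2 FROM source_file
) AS unpivoted
GROUP BY id;

The same pattern extends to more header/value pairs by adding branches to the UNION ALL, and Snowflake's PIVOT clause could replace the CASE expressions if the header list is fixed.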

How to (1) condense into one row after a certain number of rows; (2) assign field names

Using Pentaho PDI 8.3.
After REST calls with quite complex data structures, I was able to extract the data with a row for each data element in the REST result. E.g.:
DataCenterClusterAbstract
1
UK1
Datacenter (auto generated)
Company
29
0
39
15
DATAUPDATEJOB
2016-04-09T21:34:31.18
DataCenterClusterAbstract
2
UK1_Murex
Datacenter (auto generated)
Company
0
0
0
0
DATAUPDATEJOB
2016-04-09T21:34:31.18
DataCenterClusterAbstract
3
UK1_UNIX
Notice that there are 8 data elements spread out into separate rows. I would like to condense these 8 data elements into one row per iteration in Pentaho. Is this possible? And can I assign field names?
Use a Row flattener step to condense the 8 repeating data elements into one row:
(1) Add a Row flattener.
(2) Assign field names for the incoming rows: if you have 10 data attributes in rows, specify a field name for each one.
(3) In the Table output step, use space as the separator.
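If you ever need the same condensation in plain SQL rather than PDI, a sketch using window numbering follows. Everything here is assumed: a raw_rows table holding one value per row, a load_seq column preserving arrival order, and four repeating fields per record (extend the CASE list to however many fields actually repeat in your data):

WITH numbered AS (
    SELECT value,
           ROW_NUMBER() OVER (ORDER BY load_seq) - 1 AS rn
    FROM raw_rows
)
SELECT MAX(CASE WHEN rn % 4 = 0 THEN value END) AS record_type,
       MAX(CASE WHEN rn % 4 = 1 THEN value END) AS cluster_id,
       MAX(CASE WHEN rn % 4 = 2 THEN value END) AS cluster_name,
       MAX(CASE WHEN rn % 4 = 3 THEN value END) AS updated_at
FROM numbered
GROUP BY rn / 4        -- integer division buckets every 4 consecutive rows
ORDER BY MIN(rn);

This relies on the rows arriving in a stable, known order, which is exactly what the Row flattener assumes too.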

SQL: How to sort overlapping groups efficiently

I'm trying to create groups in a database with 10,000+ rows.
I need this to be fast and efficient, so I'm using a binary (0/1) column for each group.
One, Two, Four, Five and Six are in Group1.
But 'Two' might also be in Group2, because of errors I cannot avoid since my dataset comes from a web scrape. I try to sort everything uniquely, but it's basically impossible to avoid errors if I want to stay efficient and fast.
ID  Title  Group1  Group2  Group3  Ungrouped
1   One    1       0       0       0
2   Two    1       1       0       0
3   Three  0       1       1       0
4   Four   1       0       1       0
5   Five   1       0       0       0
6   Six    1       1       1       0
7   Seven  0       0       0       1
My idea for a solution:
(1) Assign groups (ones) until everything is grouped one or more times.
(2) Make a query for everything that has more than one group assigned (IDs 2, 3, 4, 6).
(3) Manually decide which 1's to remove, until each row has only one group assigned.
(It's actually a good idea to do the 3rd part manually, because it requires content analysis of the documents.)
My question:
How do I query for everything with more than one group? Does it have something to do with constraints and unique values, or is there a simpler, more obvious way that I'm not seeing?
If your group flags are stored as integers, you can just do:
select c.*
from clusters c
where (group1 + group2 + group3) > 1;
I don't know what a "binary variable" would be in SQLite. Some databases do support binary flags, and there you would need to convert the values to integers for the where clause.
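To support the manual step (3), the same sum also makes a handy review list, e.g. (the table name is assumed; the Group1..Group3 columns are from the question):

select id, title,
       (group1 + group2 + group3) as groups_assigned
from clusters
where (group1 + group2 + group3) > 1
order by groups_assigned desc, id;

Sorting by groups_assigned puts the most ambiguous rows (like 'Six', which sits in all three groups) at the top of the pile.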

How to proceed with my Spark / Scala project

I am new to Spark and Scala. I am working on a Scala project where I will access data from SQL Server.
There is a table in SQL Server that has info about clothes. itemCode is the primary key, and there are several attributes with Boolean 0/1 values (Designer, Exclusive, Handloom) plus several other columns holding product attributes, etc.
Code Designer Exclusive Handloom
A 1 0 1
B 1 0 0
C 0 0 1
D 0 1 0
E 0 1 0
F 1 0 1
G 0 1 0
H 0 0 0
I 1 1 1
J 1 1 1
K 0 0 1
L 0 1 0
M 0 1 0
N 1 1 0
O 0 1 1
P 1 1 0
and the list continues.
I have to select a collection of 32 items out of 320 items that has AT LEAST:
8 Designer, 8 Exclusive, 8 Handloom, 8 WeddingStyle, 8 PartyStyle, 8 Silk, 8 Georgette
I had solved the problem with the MS Excel Solver (it uses a gradient descent algorithm) by adding an extra column and using the SUMPRODUCT function between the added column and the required columns. The problem was solved there, but it took around 1 minute 30 seconds.
The problem can also be solved by writing an SQL query with 32 joins (that many!). For example, if I want to select 6 items out of the 16 above with at least 4 Designer, 4 Exclusive and 4 Handloom items, the query would look like the one in my post: MYSQL - Select rows fulfilling many count conditions.
In production I have to fetch 32 rows this way, so my question is how I should proceed further with the project.
I am working in the Scala IDE for Eclipse and have added Spark MLlib there. I have fetched the data via JDBC, stored it in a DataFrame, and then created a temporary table:
dataFrame.registerTempTable("Data")
There is an optimizer class in MLlib's optimization package that uses gradient descent (like Excel Solver does) to solve problems, but that is for machine learning and takes training data as input.
I am not able to figure out how to proceed with my project. Can I use MLlib, or should I use a simpler SQL approach with Spark SQL? I need serious help.
I'd recommend using https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes rather than MLlib.
I solved this problem through linear programming. I have now used the lpsolver library for Java in my Scala project. It gives almost the same result as the Excel Solver.
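For the record, the integer program that an LP solver (and the Excel Solver run) is solving here looks like this, where x_i is a 0/1 decision variable per item and a_ij is the 0/1 value of attribute j for item i, read straight from the table:

\begin{aligned}
&\text{find } x_1, \dots, x_{320} \in \{0, 1\} \\
&\text{s.t. } \sum_{i=1}^{320} x_i = 32 \\
&\phantom{\text{s.t. }} \sum_{i=1}^{320} a_{ij}\, x_i \ge 8 \quad \text{for } j \in \{\text{Designer}, \text{Exclusive}, \text{Handloom}, \text{WeddingStyle}, \text{PartyStyle}, \text{Silk}, \text{Georgette}\}
\end{aligned}

There is no objective function, it's a pure feasibility problem, which is why a generic LP/ILP solver handles it directly.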