Import of CSV-columns with hyphens (-) fail #38

Open
gatepoet opened this issue Jan 12, 2023 · 6 comments
Labels: bug (Something isn't working), Upload

Comments

@gatepoet
Contributor

We store the column mappings in the DB as shown below:
image

This causes the import of data from columns with hyphens in their names, e.g. my-column, to fail.

@gatepoet gatepoet added the bug Something isn't working label Jan 12, 2023
@AnneAbelseth
Collaborator

AnneAbelseth commented Sep 14, 2023

Not sure which one of you to assign this to. Please remove yourself if you think you're the wrong person :)

(I don't think Kristoffer fixed it, but if so: Just close the issue?)

@AnneAbelseth AnneAbelseth added this to the Most urgent bugs fixed milestone Sep 14, 2023
@jhf
Collaborator

jhf commented Sep 14, 2023

The file src/nscreg.Data/Entities/DataSource.cs contains

public class DataSource  {
  ...
  public string VariablesMapping { get; set; }
  ...
}

And the string is parsed with

public (string source, string target)[] VariablesMappingArray =>
            VariablesMapping?.Split(',').Select(vm =>
            {
                var pair = vm.Split('-');
                return (pair[0], pair[1]);
            }).ToArray()
            ?? Array.Empty<(string, string)>();

So there is no support for variable names with '-'.
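
To make the failure concrete, here is a minimal sketch that lifts the same split logic into a standalone helper. The class name `MappingDemo` and the sample mapping `my-column-MyColumn` are made up for illustration and are not taken from the repository.

```csharp
using System;
using System.Linq;

public static class MappingDemo
{
    // Same logic as VariablesMappingArray above, but taking the stored
    // string as a parameter so it can be exercised in isolation.
    public static (string source, string target)[] Parse(string variablesMapping) =>
        variablesMapping?.Split(',').Select(vm =>
        {
            // Split('-') cuts at EVERY hyphen, not only the separator,
            // so "my-column-MyColumn" becomes ["my", "column", "MyColumn"].
            var pair = vm.Split('-');
            return (pair[0], pair[1]);
        }).ToArray()
        ?? Array.Empty<(string, string)>();
}
```

With this helper, `MappingDemo.Parse("statId-StatId")` yields the pair `("statId", "StatId")`, while `MappingDemo.Parse("my-column-MyColumn")` yields `("my", "column")`: the source name is truncated and the real target is silently dropped.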

I think a better design is to store this as a complex type, such as json, that handles escaping of strings,
rather than as a simple string.
I do note that Postgres supports json columns for such complex use cases.
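
As a rough sketch of what the JSON approach could look like, the snippet below serializes the mapping with `System.Text.Json` so that hyphens and commas in names need no special handling. The `VariableMapping` class and the `JsonMappingSketch` helper are hypothetical names, not existing code in the project, and the sketch says nothing about how the column would be migrated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical DTO for one source-to-target pair.
public class VariableMapping
{
    public string Source { get; set; }
    public string Target { get; set; }
}

public static class JsonMappingSketch
{
    // Produces a JSON array of {"Source": ..., "Target": ...} objects;
    // the serializer takes care of escaping special characters.
    public static string Serialize(IEnumerable<VariableMapping> mappings) =>
        JsonSerializer.Serialize(mappings.ToList());

    // Reads the JSON array back into the tuple shape used by VariablesMappingArray.
    public static (string source, string target)[] Deserialize(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
            return Array.Empty<(string, string)>();

        var list = JsonSerializer.Deserialize<List<VariableMapping>>(json)
                   ?? new List<VariableMapping>();
        return list.Select(m => (m.Source, m.Target)).ToArray();
    }
}
```

A mapping such as `my-column` to `Name` then round-trips unchanged, because the separator is no longer part of the data.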

@sirarsalih

@AnneAbelseth As a short-term solution to this bug, we recommend that hyphens (-) are not used in the column names. The long-term solution is to rewrite the data to use a complex type instead of a string.

@sirarsalih

sirarsalih commented Sep 27, 2023

I have implemented a short-term solution to the problem: I added a label informing the user not to use hyphens (-) in the column values when uploading the XML/CSV document. The implementation will be code reviewed; I'll let you know once it has been merged to the main branch.

@jhf
Collaborator

jhf commented Sep 27, 2023

Here is the relevant table excerpt for this problem:

statbus_development=> select * from "DataSourceUploads";
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------
Id                    | 1
Name                  | Statbus Legal Unit Batch
Description           | 
UserId                | 55e24c8c-eb4b-4d63-9b2f-11616f34661a
Priority              | 1
AllowedOperations     | 1
AttributesToCheck     | statId,name,statIddate,statusDate,status,sectorCode
OriginalCsvAttributes | statId,name,statIddate,statusDate,status,sectorCode
StatUnitType          | 2
Restrictions          | 
VariablesMapping      | statId-StatId,name-Name,statIddate-StatIdDate,statusDate-StatusDate,status-UnitStatusId.Code,sectorCode-InstSectorCode.Code
CsvDelimiter          |         
CsvSkipCount          | 0
DataSourceUploadType  | 1
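
In the row above, every target (StatId, Name, UnitStatusId.Code, and so on) is free of hyphens. Assuming that holds for all targets, which this thread does not confirm, the existing string format could tolerate hyphenated source names by splitting each entry at its last hyphen only. The sketch below illustrates that idea; `LastHyphenParser` is a hypothetical name, and commas inside column names would still break the format.

```csharp
using System;
using System.Linq;

public static class LastHyphenParser
{
    // Splits each "source-target" entry at the LAST hyphen, on the
    // assumption that target names never contain a hyphen.
    public static (string source, string target)[] Parse(string variablesMapping)
    {
        if (string.IsNullOrEmpty(variablesMapping))
            return Array.Empty<(string, string)>();

        return variablesMapping.Split(',').Select(vm =>
        {
            var i = vm.LastIndexOf('-');
            if (i < 0)
                throw new FormatException($"Mapping entry '{vm}' has no '-' separator.");
            return (vm.Substring(0, i), vm.Substring(i + 1));
        }).ToArray();
    }
}
```

Existing rows parse exactly as before, and an entry such as `my-column-MyColumn` resolves to source `my-column` and target `MyColumn`. This keeps the stored format unchanged but does not remove the underlying fragility that a structured column type would.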

@sirarsalih

Added the short-term solution (label) to the main branch (merged).
