Typical report data looks like this:
A simple approach I wanted to follow was to use a space as the delimiter, but the data is not well structured.
Read the first line of the file and split it into columns wherever there is more than one consecutive whitespace character. While doing that, record where each column starts, which tells you how wide each column is.
After that, you can simply go through the remaining data rows and extract each value by slicing the line at the column positions you recorded.
(And please don't post images of text on Stack Overflow; actual text is better.)
EDIT:
Python implementation:
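import re

import pandas as pd

file = "path/to/file.txt"

with open(file, "r") as f:
    # Header: columns are separated by two or more spaces, so a single
    # space inside a name (e.g. "TEL NO") stays part of that name.
    header = f.readline().rstrip("\n")
    matches = list(re.finditer(r"\S+(?: \S+)*", header))
    columns = [m.group() for m in matches]
    # Start offset of each column; the last column runs to the end of the line.
    starts = [m.start() for m in matches]
    bounds = list(zip(starts, starts[1:] + [None]))

    # Skip the separator line (e.g. "------") under the header.
    f.readline()

    rows = []
    for line in f:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Slice each value at its column's character range and trim padding.
        rows.append([line[start:end].strip() for start, end in bounds])

df = pd.DataFrame(data=rows, columns=columns)
print(df)
# Save next to the source file with an .xlsx extension
# (requires an Excel writer such as openpyxl).
df.to_excel(file.rsplit(".", 1)[0] + ".xlsx")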
You are correct that exporting from text to CSV is not a practical start; however, it would be good for import. So here is your source text, 100% well structured, to be saved as plain text.
And here is the import into Excel:
You can use Google Lens to get your data out of this picture, then copy and paste it into an Excel file. That's the easiest way.
Or first convert it into a PDF and then use Google Lens: go to File, scroll to the Print option, and in the print settings there is an option called Microsoft Print to PDF. Select it and press Print; it will ask you for a location. Give it one, then use the resulting PDF.
I have a requirement to read a file generated by another application. The file has 201 numeric column names: -10.0, -9.9, -9.8, -9.7, ..., 0, ..., +9.7, +9.8, +9.9, +10.0, so there are 201 columns in total. I read many files through Flink, but those have string column names, and I create a model object whose attributes match the column names in the file, as below:
DataSet<Person> csvInput = env.readCsvFile("file:///path/to/my/textfile")
        .pojoType(Person.class, "name", "age", "zipcode");
The above code reads the file and populates Person objects with the values in the file.
My new requirement is a challenge because the file's column names are numeric, and in Java I cannot create a variable whose name is a decimal number such as -10.0.
For example, private String -10.0 is not allowed in Java.
I am looking for a solution; could anyone please help me out here?
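Since no field can be named -10.0, a POJO with one attribute per column cannot work here. One workaround (not Flink-specific, and not tested against Flink's CSV reader) is to treat the header as data: read it once, then store each row in a map keyed by the numeric column label; in Java that would be something like a Map or an array indexed by column position. The sketch below shows the idea in Python, the language of the earlier answer in this thread. read_numeric_columns is a hypothetical helper, and it assumes a comma-separated file whose cells are all numbers.

```python
import csv
import io


def read_numeric_columns(text):
    """Parse CSV text whose header labels are numbers like -10.0 ... +10.0.

    Each data row becomes a dict keyed by the float value of its header
    label, instead of one named attribute per column.
    """
    reader = csv.reader(io.StringIO(text))
    # float() accepts an explicit sign, so "+9.9" and "-10.0" both parse.
    keys = [float(label) for label in next(reader)]
    return [
        dict(zip(keys, (float(cell) for cell in row)))
        for row in reader
        if row  # skip blank lines
    ]


# Hypothetical file content: the 201 labels -10.0, -9.9, ..., +10.0
# followed by one data row holding 0, 1, ..., 200.
labels = [f"{i / 10:+.1f}" for i in range(-100, 101)]
data_row = ",".join(str(i) for i in range(201))
rows = read_numeric_columns(",".join(labels) + "\n" + data_row + "\n")
print(len(rows[0]), rows[0][-10.0], rows[0][0.0], rows[0][10.0])
# prints 201 0.0 100.0 200.0
```

Converting the labels to floats also normalizes spellings such as "10" and "+10.0" to the same key, so lookups do not depend on how the other application formats its header.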
I want to parse some .mif files with C# or Java to get the coordinate information. But there is little documentation about this kind of file, and I'm a newcomer... Maybe I can parse them simply using an InputStream, but I want to know how others do this :P
Also, if you know the meaning of each line, please tell me :-D
With great pleasure.
The header of the .mif file looks like this:
Version 300
Charset "WindowsSimpChinese"
Delimiter ","
CoordSys Earth Projection 1, 104
Columns 9
guid Char(64)
sr_id Char(64)
mp_id Char(64)
name Char(128)
catalog Char(9)
floor Char(8)
tag Char(254)
elevation Char(6)
rank Char(2)
Data
And the data looks like this:
Region 1
5
116.312476 40.027556
116.312302 40.027541
116.312292 40.027609
116.312468 40.027623
116.312476 40.027556
Pen (1,2,0)
Brush (1,0,16777215)
Center 116.312384 40.027582
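A minimal sketch in Python (the language used elsewhere in this thread, rather than C# or Java) that pulls the coordinates out of the data section shown above. It assumes only the layout visible in the sample: everything up to the Data line is header (the attribute values for the declared columns normally live in the companion .mid file), "Region 1" is followed by one polygon, each polygon starts with its vertex count (5) and then one "x y" pair per line, and the Pen, Brush and Center lines are skipped. Other object types are ignored, and parse_mif_regions is a hypothetical name.

```python
def parse_mif_regions(text):
    """Extract polygon coordinates from the Region objects of MIF text.

    Returns a list of regions; each region is a list of polygons and each
    polygon a list of (x, y) float tuples.
    """
    lines = [line.strip() for line in text.splitlines()]
    # Everything up to and including the "Data" line is the header.
    lowered = [line.lower() for line in lines]
    if "data" not in lowered:
        raise ValueError("no Data section found")
    i = lowered.index("data") + 1

    regions = []
    while i < len(lines):
        parts = lines[i].split()
        i += 1
        if not parts or parts[0].lower() != "region":
            continue  # blank lines, Pen/Brush/Center, other object types
        polygons = []
        for _ in range(int(parts[1])):  # "Region N": N polygons follow
            count = int(lines[i])  # vertex count of this polygon
            i += 1
            polygon = []
            for _ in range(count):
                x, y = lines[i].split()[:2]
                polygon.append((float(x), float(y)))
                i += 1
            polygons.append(polygon)
        regions.append(polygons)
    return regions


sample = """Version 300
Delimiter ","
Columns 1
guid Char(64)
Data

Region 1
5
116.312476 40.027556
116.312302 40.027541
116.312292 40.027609
116.312468 40.027623
116.312476 40.027556
Pen (1,2,0)
Brush (1,0,16777215)
Center 116.312384 40.027582
"""
regions = parse_mif_regions(sample)
print(regions[0][0][0])  # prints (116.312476, 40.027556)
```

Note that the first and last vertex of the sample polygon are identical, which is consistent with a closed ring; the sketch keeps both rather than deduplicating them.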
I'm quite a beginner not only with OptaPlanner but also with Java, which I started learning two weeks ago. I am considering starting by modifying one of the examples that ship with the source code to build a scheduler for my current project.
Target
The XML input file of the “Curriculum Course” example of RedHat OptaPlanner
Question
Is there any easy way to modify the XML input files or add customized ones?
Example data
Original XML: optaplanner-examples/data/curriculumcourse/unsolved/comp01.xml
<CourseSchedule id="1">
  ...
  <curriculumList id="27">
    <Curriculum id="28">
      <code>q000</code>
    </Curriculum>
    ...
  <courseList id="42">
    <Course id="43">
      <curriculumList id="44">
        <Curriculum reference="28"/>
        <Curriculum reference="30"/>
      </curriculumList>
    </Course>
    ...
  </courseList>
  ...
</CourseSchedule>
The problem is that I have to renumber all the IDs every time I change the length of any list.
In the “Curriculum Course” example, I can find the corresponding .ctt files, which are easy to change with a common text editor. The following is the input data I want to create, expressed in the .ctt format:
.ctt modified from: optaplanner-examples/data/curriculumcourse/import/comp01.ctt
Name: Test01
Courses: 3
Rooms: 3
Days: 5
Periods_per_day: 13
Curricula: 3
Constraints: 0
COURSES:
c0001 t000 5 99 1
c0002 t001 5 99 1
c0003 t002 5 99 1
ROOMS:
P01 999
P02 999
P03 999
CURRICULA:
q001 1 c0001
q002 1 c0002
q003 1 c0003
UNAVAILABILITY_CONSTRAINTS:
END.
I wonder whether some tool could convert this text data into XML files specific to the “Curriculum Course” example.
If you click the "Import..." button in the application, you can load the .ctt file and solve it. You don't need to convert .ctt to .xml, so there should be no problem placing your own xxx.ctt file under data/curriculumcourse/import and letting the app import it.
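Because the app imports .ctt directly, the renumbering problem goes away, and for long lists the .ctt text itself can be generated by a script. A minimal sketch in Python, assuming the ITC2007 curriculum-based timetabling layout that comp01.ctt follows (course line: course id, teacher, number of lectures, minimum working days, number of students; room line: room id, capacity; curriculum line: id, number of courses, course ids). I believe the bundled files separate sections with a blank line, so the sketch does too; check the output against comp01.ctt before relying on it. write_ctt is a hypothetical helper and does not support unavailability constraints.

```python
def write_ctt(name, days, periods_per_day, courses, rooms, curricula):
    """Build .ctt text in the layout of the bundled comp01.ctt (assumed).

    courses:   list of (course_id, teacher, lectures, min_working_days, students)
    rooms:     list of (room_id, capacity)
    curricula: dict mapping curriculum id -> list of course ids
    """
    out = [
        f"Name: {name}",
        f"Courses: {len(courses)}",  # counts are derived, never hand-edited
        f"Rooms: {len(rooms)}",
        f"Days: {days}",
        f"Periods_per_day: {periods_per_day}",
        f"Curricula: {len(curricula)}",
        "Constraints: 0",
        "",
        "COURSES:",
    ]
    out += [" ".join(str(field) for field in course) for course in courses]
    out += ["", "ROOMS:"]
    out += [f"{room} {capacity}" for room, capacity in rooms]
    out += ["", "CURRICULA:"]
    out += [f"{cid} {len(ids)} {' '.join(ids)}" for cid, ids in curricula.items()]
    out += ["", "UNAVAILABILITY_CONSTRAINTS:", "", "END."]
    return "\n".join(out) + "\n"


# Reproduces the Test01 data from the question.
text = write_ctt(
    "Test01",
    days=5,
    periods_per_day=13,
    courses=[(f"c{n:04d}", f"t{n - 1:03d}", 5, 99, 1) for n in range(1, 4)],
    rooms=[(f"P{n:02d}", 999) for n in range(1, 4)],
    curricula={f"q{n:03d}": [f"c{n:04d}"] for n in range(1, 4)},
)
print(text)
```

Saving the printed text as, say, test01.ctt under data/curriculumcourse/import would then let the app import it as described above.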
MAJOR ACC NO,MINOR ACC NO,STD CODE,TEL NO,DIST CODE
7452145,723456, 01,4213036,AAA
7254287,7863265, 01,2121920,AAA
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FOOCTYP,FOOCDES,FOOCAMT,FSTD,FTELNO,FNORECON,FXFRACCN,FLANGIND,CUR
12345,71234,7643234,AAA,001,DX,WLR Promotion - Insitu /Pre-Cabled PSTN Connection,-37.87,,,0,,E,EUR
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FORDNO,FREF,FCHGDES,FCHGAMT,CUR,FORENFRM,FORENTO
3242241,72349489,2345352,AAA,001,30234843P ,1,NEW CONNECTION - PRECABLED CHARGE,37.87,EUR,2123422,201201234
12123471,7618412389,76333232,AAA,001,3123443P ,2,BROKEN PERIOD RENTAL,5.40,EUR,201234523,20123601
I have a CSV file like the one above and I want to extract certain columns from it. For example, I want to extract the first column of the first paragraph. I'm fairly new to Java; I am able to read the file, but I want to extract certain columns from different paragraphs. Any help will be appreciated.
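One way to approach it is to split the file into paragraphs first and then look columns up by header name. A minimal sketch in Python (to match the other code in this thread, not Java), assuming, based only on the sample above, that every header row starts with a non-numeric field and every data row starts with a number; blank lines between paragraphs are skipped. read_sections and column are hypothetical helpers.

```python
import csv
import io


def read_sections(text):
    """Split CSV text made of several "paragraphs" into (header, rows) pairs.

    Assumption: a row whose first field is not purely numeric starts a new
    paragraph (it is that paragraph's header). Blank lines are skipped.
    """
    sections = []
    for row in csv.reader(io.StringIO(text)):
        if not any(field.strip() for field in row):
            continue  # blank line
        if not row[0].strip().isdigit():
            sections.append((row, []))  # header row of a new paragraph
        elif sections:
            sections[-1][1].append(row)  # data row of the current paragraph
    return sections


def column(section, name):
    """Return the stripped values of column `name` in one (header, rows) pair."""
    header, rows = section
    index = header.index(name)
    return [row[index].strip() for row in rows if index < len(row)]


sample = """MAJOR ACC NO,MINOR ACC NO,STD CODE,TEL NO,DIST CODE
7452145,723456, 01,4213036,AAA
7254287,7863265, 01,2121920,AAA
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FOOCTYP,FOOCDES,FOOCAMT,FSTD,FTELNO,FNORECON,FXFRACCN,FLANGIND,CUR
12345,71234,7643234,AAA,001,DX,WLR Promotion - Insitu /Pre-Cabled PSTN Connection,-37.87,,,0,,E,EUR
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FORDNO,FREF,FCHGDES,FCHGAMT,CUR,FORENFRM,FORENTO
3242241,72349489,2345352,AAA,001,30234843P ,1,NEW CONNECTION - PRECABLED CHARGE,37.87,EUR,2123422,201201234
12123471,7618412389,76333232,AAA,001,3123443P ,2,BROKEN PERIOD RENTAL,5.40,EUR,201234523,20123601
"""
sections = read_sections(sample)
# First column of the first paragraph:
print(column(sections[0], "MAJOR ACC NO"))  # prints ['7452145', '7254287']
```

Looking columns up by name rather than by position keeps the code working when different paragraphs put the same field at different indexes, as FRUNDTE-style headers here do.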