Removing shiftIn and shiftout characters from utf8 file using java - java

I am receiving a file in Shift_JIS encoding. It contains Japanese text with shift-in (SI) and shift-out (SO) characters at the beginning and end of each multi-byte string.
I need to convert this file to UTF-8 and remove the SI and SO characters from the result. What is the best way to do this? Should I remove them before or after the UTF-8 conversion, and how do I remove them? Thanks in advance.
My Java code is below:
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String inFilePath = "src\\encoding\\input\\dfd02.PGP_dec";
String filePath = "src\\encoding\\output\\";
String utf8FileNm = "utf8-out.txt";
String charsetName = "x-SJIS_0213";
InputStream in;
try {
in = new FileInputStream(inFilePath);
Reader reader = new InputStreamReader(in, charsetName);
StringBuilder sb = new StringBuilder();
int read;
while ((read = reader.read()) != -1){
sb.append((char)read);
}
reader.close();
String string = sb.toString();
OutputStream out = new FileOutputStream(filePath + charsetName + "-" + utf8FileNm);
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write(string);
writer.close();
System.out.println("Finished writing the input file in UTF-8 format");
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
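One possible approach, as a rough sketch rather than a definitive answer: decode the file first, then strip the two control characters from the decoded text before writing it out as UTF-8. This assumes the decoder passes SO (byte 0x0E) and SI (byte 0x0F) through as the control characters U+000E and U+000F, which is worth checking against a sample of the real file. The class and method names below are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ShiftCharStripper {
    private static final char SO = 0x0E; // shift out
    private static final char SI = 0x0F; // shift in

    // Removes SO and SI from text that has already been decoded.
    static String stripShiftChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != SO && c != SI) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Decodes the input with the given charset, strips SO/SI, writes UTF-8.
    // Assumption: the file is small enough to hold in memory.
    static void convert(Path in, Path out, Charset sourceCharset) throws IOException {
        String decoded = new String(Files.readAllBytes(in), sourceCharset);
        Files.write(out, stripShiftChars(decoded).getBytes(StandardCharsets.UTF_8));
    }
}
```

Stripping after decoding avoids touching raw bytes, where 0x0E or 0x0F could in principle be part of something else; whether that matters depends on the actual encoding of the file.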

Related

JAVA write on specific line using BufferedWriter [duplicate]

How do I replace a line of text found within a text file?
I have a string such as:
Do the dishes0
And I want to update it with:
Do the dishes1
(and vice versa)
How do I accomplish this?
ActionListener al = new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
JCheckBox checkbox = (JCheckBox) e.getSource();
if (checkbox.isSelected()) {
System.out.println("Selected");
String s = checkbox.getText();
replaceSelected(s, "1");
} else {
System.out.println("Deselected");
String s = checkbox.getText();
replaceSelected(s, "0");
}
}
};
public static void replaceSelected(String replaceWith, String type) {
}
By the way, I want to replace ONLY the line that was read. NOT the entire file.
At the bottom, I have a general solution to replace lines in a file. But first, here is the answer to the specific question at hand. Helper function:
public static void replaceSelected(String replaceWith, String type) {
try {
// input the file content to the StringBuffer "input"
BufferedReader file = new BufferedReader(new FileReader("notes.txt"));
StringBuffer inputBuffer = new StringBuffer();
String line;
while ((line = file.readLine()) != null) {
inputBuffer.append(line);
inputBuffer.append('\n');
}
file.close();
String inputStr = inputBuffer.toString();
System.out.println(inputStr); // display the original file for debugging
// logic to replace lines in the string (could use regex here to be generic)
if (type.equals("0")) {
inputStr = inputStr.replace(replaceWith + "1", replaceWith + "0");
} else if (type.equals("1")) {
inputStr = inputStr.replace(replaceWith + "0", replaceWith + "1");
}
// display the new file for debugging
System.out.println("----------------------------------\n" + inputStr);
// write the new string with the replaced line OVER the same file
FileOutputStream fileOut = new FileOutputStream("notes.txt");
fileOut.write(inputStr.getBytes());
fileOut.close();
} catch (Exception e) {
System.out.println("Problem reading file.");
}
}
Then call it:
public static void main(String[] args) {
replaceSelected("Do the dishes", "1");
}
Original Text File Content:
Do the dishes0
Feed the dog0
Cleaned my room1
Output:
Do the dishes0
Feed the dog0
Cleaned my room1
----------------------------------
Do the dishes1
Feed the dog0
Cleaned my room1
New text file content:
Do the dishes1
Feed the dog0
Cleaned my room1
And as a note, if the text file was:
Do the dishes1
Feed the dog0
Cleaned my room1
and you called replaceSelected("Do the dishes", "1");,
the file would simply be left unchanged.
Since this question is pretty specific, I'll add a more general solution here for future readers (based on the title).
// read file one line at a time
// replace line as you read the file and store updated lines in StringBuffer
// overwrite the file with the new lines
public static void replaceLines() {
try {
// input the (modified) file content to the StringBuffer "input"
BufferedReader file = new BufferedReader(new FileReader("notes.txt"));
StringBuffer inputBuffer = new StringBuffer();
String line;
while ((line = file.readLine()) != null) {
line = ... // replace the line here
inputBuffer.append(line);
inputBuffer.append('\n');
}
file.close();
// write the new string with the replaced line OVER the same file
FileOutputStream fileOut = new FileOutputStream("notes.txt");
fileOut.write(inputBuffer.toString().getBytes());
fileOut.close();
} catch (Exception e) {
System.out.println("Problem reading file.");
}
}
Since Java 7 this is very easy and intuitive to do.
List<String> fileContent = new ArrayList<>(Files.readAllLines(FILE_PATH, StandardCharsets.UTF_8));
for (int i = 0; i < fileContent.size(); i++) {
if (fileContent.get(i).equals("old line")) {
fileContent.set(i, "new line");
break;
}
}
Files.write(FILE_PATH, fileContent, StandardCharsets.UTF_8);
Basically you read the whole file to a List, edit the list and finally write the list back to file.
FILE_PATH represents the Path of the file.
If the replacement has a different length:
1. Read the file until you find the string you want to replace.
2. Read into memory everything after the text you want to replace.
3. Truncate the file at the start of the part you want to replace.
4. Write the replacement.
5. Write the rest of the file from step 2.
If the replacement has the same length:
1. Read the file until you find the string you want to replace.
2. Set the file position to the start of the part you want to replace.
3. Write the replacement, overwriting that part of the file.
This is the best you can do within the constraints of your question. However, the example in the question replaces a string with one of the same length, so the second way should work.
Also be aware: Java strings are Unicode text, while text files are bytes in some encoding. If the encoding is UTF-8 and your text is not plain 7-bit ASCII, you have to compare the lengths of the encoded byte arrays, not the lengths of the Java strings.
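The two procedures above can be sketched with RandomAccessFile. This is only an illustration under some assumptions: the file is UTF-8, it is small enough to read into memory for the search, and only the first occurrence is replaced. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class InPlaceReplace {
    // Returns the byte offset of the first occurrence of needle, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Replaces the first occurrence of oldText; returns false if it is absent.
    static boolean replaceFirst(String fileName, String oldText, String newText) throws IOException {
        // Lengths are compared on the encoded bytes, not on the Java strings.
        byte[] oldBytes = oldText.getBytes(StandardCharsets.UTF_8);
        byte[] newBytes = newText.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw")) {
            byte[] content = new byte[(int) raf.length()];
            raf.readFully(content);
            int pos = indexOf(content, oldBytes);
            if (pos < 0) {
                return false;
            }
            if (oldBytes.length == newBytes.length) {
                // Same length: seek and overwrite only the changed bytes.
                raf.seek(pos);
                raf.write(newBytes);
            } else {
                // Different length: truncate, write the replacement, then the saved tail.
                int tailStart = pos + oldBytes.length;
                raf.setLength(pos);
                raf.seek(pos);
                raf.write(newBytes);
                raf.write(content, tailStart, content.length - tailStart);
            }
            return true;
        }
    }
}
```

A call such as InPlaceReplace.replaceFirst("notes.txt", "Do the dishes0", "Do the dishes1") takes the same-length branch and writes only 14 bytes back to the file.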
I was going to answer another question, but after I'd written the code it was marked as a duplicate of this one, so I am posting my solution here.
Keep in mind that you have to rewrite the whole text file. First I read the entire file and store it in a string. Then I store each line as an element of a string array, so line one is at array index 0. I then edit the element corresponding to the line you wish to change. Once this is done, I concatenate all the strings in the array into a single string and write it to the file, overwriting the old content. Don't worry about losing the old content, since it is written back along with the edit. Below is the code I used.
public class App {
public static void main(String[] args) {
String file = "file.txt";
String newLineContent = "Hello my name is bob";
int lineToBeEdited = 3;
ChangeLineInFile changeFile = new ChangeLineInFile();
changeFile.changeALineInATextFile(file, newLineContent, lineToBeEdited);
}
}
And the class.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
public class ChangeLineInFile {
public void changeALineInATextFile(String fileName, String newLine, int lineNumber) {
String content = new String();
String editedContent = new String();
content = readFile(fileName);
editedContent = editLineInContent(content, newLine, lineNumber);
writeToFile(fileName, editedContent);
}
private static int numberOfLinesInFile(String content) {
int numberOfLines = 0;
int index = 0;
int lastIndex = 0;
lastIndex = content.length() - 1;
while (true) {
if (content.charAt(index) == '\n') {
numberOfLines++;
}
if (index == lastIndex) {
numberOfLines = numberOfLines + 1;
break;
}
index++;
}
return numberOfLines;
}
private static String[] turnFileIntoArrayOfStrings(String content, int lines) {
String[] array = new String[lines];
int index = 0;
int tempInt = 0;
int startIndext = 0;
int lastIndex = content.length() - 1;
while (true) {
if (content.charAt(index) == '\n') {
tempInt++;
String temp2 = new String();
for (int i = 0; i < index - startIndext; i++) {
temp2 += content.charAt(startIndext + i);
}
startIndext = index;
array[tempInt - 1] = temp2;
}
if (index == lastIndex) {
tempInt++;
String temp2 = new String();
for (int i = 0; i < index - startIndext + 1; i++) {
temp2 += content.charAt(startIndext + i);
}
array[tempInt - 1] = temp2;
break;
}
index++;
}
return array;
}
private static String editLineInContent(String content, String newLine, int line) {
int lineNumber = 0;
lineNumber = numberOfLinesInFile(content);
String[] lines = new String[lineNumber];
lines = turnFileIntoArrayOfStrings(content, lineNumber);
if (line != 1) {
lines[line - 1] = "\n" + newLine;
} else {
lines[line - 1] = newLine;
}
content = new String();
for (int i = 0; i < lineNumber; i++) {
content += lines[i];
}
return content;
}
private static void writeToFile(String file, String content) {
try (Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "utf-8"))) {
writer.write(content);
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private static String readFile(String filename) {
String content = null;
File file = new File(filename);
FileReader reader = null;
try {
reader = new FileReader(file);
char[] chars = new char[(int) file.length()];
reader.read(chars);
content = new String(chars);
reader.close();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
return content;
}
}
Sharing the experience with Java Util Stream
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public static void replaceLine(String filePath, String originalLineText, String newLineText) {
Path path = Paths.get(filePath);
// Get all the lines
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
// Do the line replace
List<String> list = stream.map(line -> line.equals(originalLineText) ? newLineText : line)
.collect(Collectors.toList());
// Write the content back
Files.write(path, list, StandardCharsets.UTF_8);
} catch (IOException e) {
LOG.error("IOException for : " + path, e);
e.printStackTrace();
}
}
Usage
replaceLine("test.txt", "Do the dishes0", "Do the dishes1");
//Read the file data
BufferedReader file = new BufferedReader(new FileReader(filepath));
StringBuffer inputBuffer = new StringBuffer();
String line;
while ((line = file.readLine()) != null) {
inputBuffer.append(line);
inputBuffer.append('\n');
}
file.close();
String inputStr = inputBuffer.toString();
// logic to replace lines in the string (could use regex here to be generic)
inputStr = inputStr.replace(str, " ");
//'str' is the string need to update in this case it is updating with nothing
// write the new string with the replaced line OVER the same file
FileOutputStream fileOut = new FileOutputStream(filer);
fileOut.write(inputStr.getBytes());
fileOut.close();
You would need to pick a file with JFileChooser and then read through its lines using a Scanner and the hasNext() function:
http://docs.oracle.com/javase/7/docs/api/javax/swing/JFileChooser.html
Once you do that, you can save each line into a variable and manipulate its contents.
Here is how I replace strings :)
The first argument is the file name, the second is the target string, and the third is the string that replaces the target.
public class ReplaceString{
public static void main(String[] args)throws Exception {
if(args.length<3)System.exit(0);
String targetStr = args[1];
String altStr = args[2];
java.io.File file = new java.io.File(args[0]);
java.util.Scanner scanner = new java.util.Scanner(file);
StringBuilder buffer = new StringBuilder();
while(scanner.hasNext()){
buffer.append(scanner.nextLine().replaceAll(targetStr, altStr));
if(scanner.hasNext())buffer.append("\n");
}
scanner.close();
java.io.PrintWriter printer = new java.io.PrintWriter(file);
printer.print(buffer);
printer.close();
}
}

Java java.io.IOException: Not in GZIP format

I searched for an example of how to compress a string in Java.
I have a function to compress then uncompress. The compress seems to work fine:
public static String encStage1(String str)
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("String length : " + str.length());
ByteArrayOutputStream out = new ByteArrayOutputStream();
String outStr = null;
try
{
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes());
gzip.close();
outStr = out.toString(format2);
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
But the reverse is complaining about the string not being in GZIP format, even when I pass the return from encStage1 straight back into the decStage3:
public static String decStage3(String str)
{
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("Input String length : " + str.length());
String outStr = "";
try
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
String line;
while ((line = bf.readLine()) != null)
{
outStr += line;
}
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
I get this error when I call with a string return from encStage1:
public String encIDData(String idData)
{
String tst = "A simple test string";
System.out.println("Enc 0: " + tst);
String stg1 = encStage1(tst);
System.out.println("Enc 1: " + toHex(stg1));
String dec1 = decStage3(stg1);
System.out.println("unzip: " + toHex(dec1));
}
Output/Error:
Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
A small error is:
gzip.write(str.getBytes());
takes the default platform encoding, which on Windows will never be ISO-8859-1. Better:
gzip.write(str.getBytes(format1));
You could consider using "Cp1252" (Windows Latin-1, for some European languages) instead of "ISO-8859-1" (Latin-1). It adds characters such as curly quotes.
The major error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer), which internally is always kept in Unicode. An arbitrary byte sequence is not necessarily valid UTF-8, so decoding it as UTF-8 corrupts it. You might get away with converting the bytes using a single-byte encoding (ISO-8859-1, for instance).
The best way would be
gzip.write(str.getBytes(StandardCharsets.UTF_8));
That way you have full Unicode, and every script can be combined.
Then uncompress to a ByteArrayOutputStream and use new String(baos.toByteArray(), StandardCharsets.UTF_8).
Using a BufferedReader on an InputStreamReader with UTF-8 is okay too, but readLine throws away the newline characters, so they have to be added back:
outStr += line + "\r\n"; // Or so.
Clean answer:
public static byte[] encStage1(String str) throws IOException
{
try (ByteArrayOutputStream out = new ByteArrayOutputStream())
{
try (GZIPOutputStream gzip = new GZIPOutputStream(out))
{
gzip.write(str.getBytes(StandardCharsets.UTF_8));
}
return out.toByteArray();
//return out.toString(StandardCharsets.ISO_8859_1);
// Some single byte encoding
}
}
public static String decStage3(byte[] str) throws IOException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
{
int b;
while ((b = gis.read()) != -1) {
baos.write((byte) b);
}
}
return new String(baos.toByteArray(), StandardCharsets.UTF_8);
}
Using toString/getBytes to encode and decode binary data is the wrong approach. Try something like Base64 encoding for this purpose (java.util.Base64 in JDK 1.8).
As proof, try this simple test, which fails because the byte 0xff is not valid UTF-8 and gets replaced during decoding:
import org.testng.annotations.Test;
import java.io.ByteArrayOutputStream;
import static org.testng.Assert.assertEquals;
public class SimpleTest {
@Test
public void test() throws Exception {
final String CS = "utf-8";
byte[] b0 = {(byte) 0xff};
ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(b0);
out.close();
byte[] b1 = out.toString(CS).getBytes(CS);
assertEquals(b0, b1);
}
}
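If the compressed data really has to travel as a String, the Base64 suggestion above could look roughly like the following. It is a minimal sketch assuming Java 8 or later and UTF-8 text; the class and method names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBase64 {
    // Compresses the text with GZIP and returns the bytes as a Base64 string.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverses compressToBase64: Base64-decode, gunzip, decode as UTF-8.
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Base64 makes the string about a third larger than the compressed bytes, so for short inputs the result can be longer than the original text.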

New Line Character ("\n") Doesn't Work While Reading Files (Android) java

String mystring="Hello"+"\n"+ "World" ;
writeToFile(mystring);
String newstring = readFromFile();
mytextview.setText(newstring);
My text view just shows "HelloWorld" without the newline.
I can't understand why it doesn't recognize "\n".
These are my writeToFile and readFromFile functions:
private void writeToFile(String data) {
try {
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(openFileOutput("myfilename", Context.MODE_PRIVATE));
outputStreamWriter.write(data);
outputStreamWriter.close();
}
catch (IOException e) {
// Log.e(TAG, "File write failed: " + e.toString());
}
}
//////////////////////////////////////////////////
private String readFromFile() {
String ret = "";
try {
InputStream inputStream = openFileInput("myfilename");
if ( inputStream != null ) {
InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
String receiveString = "";
StringBuilder stringBuilder = new StringBuilder();
while ( (receiveString = bufferedReader.readLine()) != null ) {
stringBuilder.append(receiveString);
}
inputStream.close();
ret = stringBuilder.toString();
}
}
catch (FileNotFoundException e) {
//Log.e(TAG, "File not found: " + e.toString());
} catch (IOException e) {
// Log.e(TAG, "Can not read file: " + e.toString());
}
return ret;
}
What I am trying to do is save a string to the phone's internal storage and read the same string back.
You are using a BufferedReader; check the documentation for readLine().
It states:
Returns the next line of text available from this reader. A line is represented by zero or more characters followed by '\n', '\r', "\r\n" or the end of the reader. The string does not include the newline sequence.
You could manually add it back in your while loop, or use another readXYZ method.
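Adding the newline back in the loop might look like this. It is a small sketch using a plain Reader so that it runs outside Android; the class and method names are hypothetical, and in the question's code the same loop body would go inside readFromFile(). Note that it normalizes every line break to "\n" and does not restore a trailing newline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class LineJoiner {
    // Reads all lines and joins them with '\n', since readLine() strips line breaks.
    static String readKeepingNewlines(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n'); // put back the separator that readLine() removed
                }
                sb.append(line);
                first = false;
            }
        }
        return sb.toString();
    }
}
```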
I found the solution, as suggested by Su-Au Hwang, this way:
I manually added the newline by replacing the literal two-character sequence \n with a real newline character:
String[] strLines = new String[lines.size()];
for (int i = 0; i < lines.size(); i++) {
lines.set(i, lines.get(i).replace("\\n","\n"));
strLines[i] = lines.get(i);
}
return strLines;

Displaying character when read into file - encoding

I have a problem with the encoding of text when I display it in my text area.
The problem occurs with characters like: é à è
The text area shows ? ? ? instead.
Here is the part of the code that reads my file:
private void importerActionPerformed(java.awt.event.ActionEvent evt) {
jTabbedPane1.setSelectedIndex(0);
try {
JFileChooser explorer = new JFileChooser(chemin);
int answer = explorer.showOpenDialog(this);
if (answer == JFileChooser.APPROVE_OPTION) {
chemin = explorer.getCurrentDirectory().getAbsolutePath();
String name = explorer.getSelectedFile().getCanonicalPath();
System.out.println("name : "+name);
texte.setText("");
File file = new File(name);
try {
DataInputStream in = new DataInputStream(new FileInputStream(file));
String result = in.readUTF();
texte.setText(result);
in.close();
System.out.println("Erreur la");
} catch (IOException e) {
DataInputStream in = new DataInputStream(new FileInputStream(file));
String result = null;
result = "";
byte[] buff = new byte[2048];
int read = in.read(buff, 0, 2048);
while (read >= 0) {
String substr = new String(buff, 0, read);
result += substr;
read = in.read(buff, 0, 2048);
}
// System.out.println(result);
Charset charset = Charset.forName("UTF-8");
result = charset.decode(charset.encode(result)).toString();
texte.setText(result);
in.close();
//System.out.println("Erreur la2");
}
}
} catch (Exception err) {
JOptionPane.showMessageDialog(this, "Erreur lors du chargement du fichier", "Error", JOptionPane.WARNING_MESSAGE);
}
}
My textarea is : texte.setText(result);
Do you have any idea?
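If your file encoding is UTF-8, read it with an explicit charset, like this (FileReader alone would use the platform default charset on older Java versions):
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));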
String line;
while ((line = br.readLine()) != null) {
result += line;
}
br.close();
texte.setText(result);
The Charset.encode method expects a Unicode string. In fact, all strings in Java are Unicode (UTF-16) internally.
Do this
String substr = new String(buff, 0, read, "UTF-8");
And remove all the charset.encode/decode code.
Your line String substr = new String(buff, 0, read); should have been
String substr = new String(buff, 0, read,"UTF-8");
Constructor String(byte[] bytes, int offset, int length) constructs a new String by decoding the specified subarray of bytes using the platform's default charset.
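One caveat with decoding each 2048-byte chunk on its own is that a multi-byte UTF-8 character can straddle a chunk boundary and come out garbled. A simple way around that, sketched here under the assumption that the file fits in memory and really is UTF-8, is to read all the bytes and decode them once. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8FileReader {
    // Reads the whole file and decodes it as UTF-8 in a single step,
    // so no character can be split across buffer boundaries.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

In the question's handler this would be used as texte.setText(Utf8FileReader.readUtf8(file.toPath()));.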

Android: decompress string that was compressed with PHP gzcompress()

How can I decompress a String that was compressed by PHP's gzcompress() function?
Any full examples?
Thanks
I have now tried it like this:
public static String unzipString(String zippedText) throws Exception
{
ByteArrayInputStream bais = new ByteArrayInputStream(zippedText.getBytes("UTF-8"));
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis);
BufferedReader in = new BufferedReader(reader);
String unzipped = "";
while ((unzipped = in.readLine()) != null)
unzipped+=unzipped;
return unzipped;
}
But it doesn't work when I try to unzip a string compressed with PHP's gzcompress().
PHP's gzcompress uses the zlib format, not GZIP:
public static String unzipString(String zippedText) {
String unzipped = null;
try {
byte[] zbytes = zippedText.getBytes("ISO-8859-1");
// Add extra byte to array when Inflater is set to true
byte[] input = new byte[zbytes.length + 1];
System.arraycopy(zbytes, 0, input, 0, zbytes.length);
input[zbytes.length] = 0;
ByteArrayInputStream bin = new ByteArrayInputStream(input);
InflaterInputStream in = new InflaterInputStream(bin);
ByteArrayOutputStream bout = new ByteArrayOutputStream(512);
int b;
while ((b = in.read()) != -1) {
bout.write(b); }
bout.close();
unzipped = bout.toString();
}
catch (IOException io) { printIoError(io); }
return unzipped;
}
private static void printIoError(IOException io)
{
System.out.println("IO Exception: " + io.getMessage());
}
Try a GZIPInputStream. See this example and this SO question.
See
http://developer.android.com/reference/java/util/zip/InflaterInputStream.html
since gzcompress() produces zlib-wrapped DEFLATE data, which is a different container from gzip.
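Working on bytes rather than on a String avoids the charset round trip altogether. The following is a minimal sketch, assuming the compressed payload arrives as a byte[] and the original text was UTF-8. Since PHP is not available here, zlibCompress stands in for gzcompress() by producing the same zlib container with DeflaterOutputStream; all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class ZlibText {
    // Stand-in for PHP's gzcompress(): zlib-wrapped DEFLATE output.
    static byte[] zlibCompress(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Inflates zlib data and decodes the result as UTF-8 (an assumption).
    static String zlibDecompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

If the data comes over HTTP, reading the response body as raw bytes (or Base64-encoding it on the PHP side) keeps it intact until it reaches zlibDecompress.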
