I recently received an example export file generated by the Roche LightCycler 480 instrument. It uses a proprietary XML format, for which I haven't found a specification yet.
I would like to extract some information from files of this type. Although most of it can be easily parsed and interpreted, the file contains a number of (unpadded) base64-encoded fields of binary/serialized data representing arrays of integer and/or floating point numbers. A link to the example file can be found in this gist.
I have included a fragment of it at the end of this post. The AcquisitionTable contains a total of 19 such encoded item entries, which likely represent arrays of integer (SampleNo) and floating point (Fluor1) values.
How the decoded bytes are to be translated to integer or floating point values is still unclear to me. After base64 decoding, each of the items starts with the following 6-byte sequence (hex):
42 41 52 5A 00 00 ... // ['B','A','R','Z','\0','\0', ...]
Note that while it is my expectation that each 'item' contains the same amount of numbers (or "rows" in this table), I am observing a different number of decoded bytes for similar items: 5654 for Fluor1 and 5530 for Fluor2.
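The decoding step described above can be sketched as follows. This is only a minimal sketch: `decode_unpadded` is a hypothetical helper name, and the sample string is just the first 16 characters of the SampleNo item from the fragment at the end of this post.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    """Decode base64 text whose trailing '=' padding was stripped."""
    # Restore the padding: the encoded length must be a multiple of 4.
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)

# First 16 characters of the SampleNo item (enough to see the header).
sample_no_prefix = "QkFSWgAABHgCAER0"
raw = decode_unpadded(sample_no_prefix)
print(raw[:6].hex(" "))  # the 6 header bytes, as hex
print(len(raw))          # decoded length, useful for comparing items
```

Running the same helper over each full item gives the decoded byte counts that can then be compared between items such as Fluor1 and Fluor2.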
Additionally for those arrays which I suspect contain (sequential) integers, a pattern can be observed:
SampleNo : ... 1F F5 1F 07 2F 19 2F 2B 2F 3D 2F 4F 2F 61 2F 00 73 2F 85 2F 97 2F A9 2F BB 2F CD 2F DF 2F F1 2F 00 03 3F 15 3F 27 ...
Cycles : ... 1F FF 1F 11 2F 23 2F 35 2F 47 2F 59 2F 6B 2F 00 7D 2F 8F 2F A1 2F B3 2F C5 2F D7 2F E9 2F FB 2F 00 0D 3F 1F 3F 31 ...
Gain : ... 1F EE 1F 00 2F 12 2F 24 2F 36 2F 00 48 2F 5A 2F 6C 2F 7E 2F 90 2F A2 2F B4 2F C6 2F 00 D8 2F EA 2F FC 2F 0E 3F 20 3F 32 ...
It looks like pairs of bytes, where the second byte increases by 0x12 (18) each time, with an occasional group of 3 bytes that has 0x00 as its second byte. That group appears when the low nibble of the last byte is 3, D or 8 for the three examples respectively.
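The constant step can be checked mechanically on the SampleNo excerpt shown above. The sketch below makes one simplifying assumption: every 0x00 in this particular excerpt is one of the interleaved extra bytes, so it can be dropped before pairing. That would not be safe on arbitrary data.

```python
# Excerpt of the decoded SampleNo bytes, copied from the dump above.
excerpt = bytes.fromhex(
    "1F F5 1F 07 2F 19 2F 2B 2F 3D 2F 4F 2F 61 2F 00 73 2F 85 2F 97 2F "
    "A9 2F BB 2F CD 2F DF 2F F1 2F 00 03 3F 15 3F 27"
)

# Drop the interleaved 0x00 bytes, then read the rest as byte pairs.
pairs = bytes(b for b in excerpt if b != 0)
second_bytes = pairs[1::2]

# Difference between consecutive second bytes, wrapping at 256.
steps = [(b - a) % 256 for a, b in zip(second_bytes, second_bytes[1:])]
print([hex(s) for s in steps])
```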
I was wondering if the type of encoding/serialization format would be obvious to anyone (or, even better, if someone has a specification of this file format).
I believe the software used to create these files is currently Java based, but has a history as a Windows/MFC/C++ product.
<obj name="AcquisitionTable" class="AcquisitionTable" version="1">
<prop name="Count">2400</prop>
<prop name="ChannelCount">6</prop>
<list name="Columns" count="19">
<item name="SampleNo">QkFSWgAABHgCAER0Cu3xAe3wAuv//f8PDyEPADMPRQ9XD2kPew+ND58PsQ8Aww/VD+cP+Q8LHx0fLx9BHwBTH2Ufdx+JH5sfrR+/H9EfAOMf9R8HLxkvKy89L08vYS8Acy+FL5cvqS+7L80v3y/xLwADPxU/Jz85P0s/XT9vP4E/AJM/pT+3P8k/2z/tP/8/EU8AI081T0dPWU9rT31Pj0+hTwCzT8VP10/pT/tPDV8fXzFfAENfVV9nX3lfi1+dX69fwV8A01/lX/dfCW8bby1vP29RbwBjb3Vvh2+Zb6tvvW/Pb+FvAPNvBX8Xfyl/O39Nf19/cX8Ag3+Vf6d/uX/Lf91/738BjwATjyWPN49Jj1uPbY9/j5GPAKOPtY/Hj9mP64/9jw+fIZ8AM59Fn1efaZ97n42fn5+xnwDDn9Wf55/5nwuvHa8vr0GvAFOvZa93r4mvm6+tr7+v0a8A46/1rwe/Gb8rvz2/T79hvwBzv4W/l7+pv7u/zb/fv/G/AAPPFc8nzznPS89dz2/Pgc8Ak8+lz7fPyc/bz+3P/88R3wAj3zXfR99Z32vffd+P36HfALPfxd/X3+nf+98N7x/vMe8AQ+9V72fvee+L753vr+/B7wDT7+Xv9+8J/xv/Lf8//1H/AGP/df+H/5n/q/+9/8//4f8A8/8FDxcPKQ87D00PXw9xDwCDD5UPpw+5D8sP3Q/vDwEfABMfJR83H0kfWx9tH38fkR8Aox+1H8cf2R/rH/0fDy8hLwAzL0UvVy9pL3svjS+fL7EvAMMv1S/nL/kvCz8dPy8/QT8AUz9lP3c/iT+bP60/vz/RPwDjP/U/B08ZTytPPU9PT2FPAHNPhU+XT6lPu0/NT99P8U8AA18VXydfOV9LX11fb1+BXwCTX6Vft1/JX9tf7V//XxFvACNvNW9Hb1lva299b49voW8As2/Fb9dv6W/7bw1/H38xfwBDf1V/Z395f4t/nX+vf8F/ANN/5X/3fwmPG48tjz+PUY8AY491j4ePmY+rj72Pz4/hjwDzjwWfF58pnzufTZ9fn3GfAIOflZ+nn7mfy5/dn++fAa8AE68lrzevSa9br22vf6+RrwCjr7Wvx6/Zr+uv/a8PvyG/ADO/Rb9Xv2m/e7+Nv5+/sb8Aw7/Vv+e/+b8Lzx3PL89BzwBTz2XPd8+Jz5vPrc+/z9HPAOPP9c8H3xnfK98930/fYd8Ac9+F35ffqd+7383f39/x3wAD7xXvJ+8570vvXe9v74HvAJPvpe+378nv2+/t7//vEf8AI/81/0f/Wf9r/33/j/+h/wCz/8X/1//p//v/DQ8fDzEPAEMPVQ9nD3kPiw+dD68PwQ8A0w/lD/cPCR8bHy0fPx9RHwBjH3Ufhx+ZH6sfvR/PH+EfAPMfBS8XLykvOy9NL18vcS8Agy+VL6cvuS/LL90v7y8BPwATPyU/Nz9JP1s/bT9/P5E/AKM/tT/HP9k/6z/9Pw9PIU8AM09FT1dPaU97T41Pn0+xTwDDT9VP50/5TwtfHV8vX0FfAFNc</item>
<item name="ProgramNo">QkFSWgAABHMCAERvANz///8RDyMPNQ9HD1kPaw8AfQ+PD6EPsw/FD9cP6Q/7DwANHx8fMR9DH1UfZx95H4sfAJ0frx/BH9Mf5R/3HwkvGy8ALS8/L1EvYy91L4cvmS+rLwC9L88v4S/zLwU/Fz8pPzs/AE0/Xz9xP4M/lT+nP7k/yz8A3T/vPwFPE08lTzdPSU9bTwBtT39PkU+jT7VPx0/ZT+tPAP1PD18hXzNfRV9XX2lfe18AjV+fX7Ffw1/VX+df+V8LbwAdby9vQW9Tb2Vvd2+Jb5tvAK1vv2/Rb+Nv9W8Hfxl/K38APX9Pf2F/c3+Ff5d/qX+7fwDNf99/8X8DjxWPJ485j0uPAF2Pb4+Bj5OPpY+3j8mP248A7Y//jxGfI581n0efWZ9rnwB9n4+foZ+zn8Wf15/pn/ufAA2vH68xr0OvVa9nr3mvi68Ana+vr8Gv06/lr/evCb8bvwAtvz+/Ub9jv3W/h7+Zv6u/AL2/z7/hv/O/Bc8XzynPO88ATc9fz3HPg8+Vz6fPuc/LzwDdz+/PAd8T3yXfN99J31vfAG3ff9+R36Pftd/H39nf698A/d8P7yHvM+9F71fvae977wCN75/vse/D79Xv5+/57wv/AB3/L/9B/1P/Zf93/4n/m/8Arf+//9H/4//1/wcPGQ8rDwA9D08PYQ9zD4UPlw+pD7sPAM0P3w/xDwMfFR8nHzkfSx8AXR9vH4Efkx+lH7cfyR/bHwDtH/8fES8jLzUvRy9ZL2svAH0vjy+hL7MvxS/XL+kv+y8ADT8fPzE/Qz9VP2c/eT+LPwCdP68/wT/TP+U/9z8JTxtPAC1PP09RT2NPdU+HT5lPq08AvU/PT+FP808FXxdfKV87XwBNX19fcV+DX5Vfp1+5X8tfAN1f718BbxNvJW83b0lvW28AbW9/b5Fvo2+1b8dv2W/rbwD9bw9/IX8zf0V/V39pf3t/AI1/n3+xf8N/1X/nf/l/C48AHY8vj0GPU49lj3ePiY+bjwCtj7+P0Y/jj/WPB58ZnyufAD2fT59hn3OfhZ+Xn6mfu58AzZ/fn/GfA68VryevOa9LrwBdr2+vga+Tr6Wvt6/Jr9uvAO2v/68RvyO/Nb9Hv1m/a78Afb+Pv6G/s7/Fv9e/6b/7vwANzx/PMc9Dz1XPZ895z4vPAJ3Pr8/Bz9PP5c/3zwnfG98ALd8/31HfY99134ffmd+r3wC938/f4d/z3wXvF+8p7zvvAE3vX+9x74Pvle+n77nvy+8A3e/v7wH/E/8l/zf/Sf9b/wBt/3//kf+j/7X/x//Z/+v/AP3/Dw8hDzMPRQ9XD2kPew8AjQ+fD7EPww/VD+cP+Q8LHwAdHy8fQR9TH2Ufdx+JH5sfAK0fvx/RH+Mf9R8HLxkvKy8APS9PL2Evcy+FL5cvqS+7LwDNL98v8S8DPxU/Jz85P0s/AF0/bz+BP5M/pT+3P8k/2z8A7T//PxFPI081T0dPWU9rTwB9T49PoU+zT8VP10/pT/tPAA1fH18xX0NfVV9nUwA</item>
... snipped
<item name="Fluor1">QkFSWgAAFg0CAFYJ+xwg7vGsP1qIWb738CFAHegc//CsnT/u9cqGyQ8A/PcbfVgeAas/qoOpJwDu/P9SgVE/ACFAHmuwHUcArR0GwX9WAWYUD2l9bgFcD9l7hgFmdA9Jep4BLA8pd7YBZqQPmXXOAYwP0XTmAWa8D0Fz/gHsD7FxFhFmBB9Zby4RHB8BbUYRZjQfqWpeEUwfGWl2EWZkH8FmjhHUD2lkphFmfB8RYr4RlB+5X9YRZqwf8V7uEdwfmVwGIWb0H0FaHiEML+lXNiFmxB+RVU4hPC9xUmYhZiQv4VB+IVQviU6WIWZsL2lLriGcL0lIxiFmtC+5Rt4hzC+ZQ/YhMoQvQQ4y/C8hPiYy/f4zkTw+MRQ/cTlWMUQ/M+E3bjFcP4k1hjF0PzMxM54xLD/ZMLYxpD8zSS/OMYw/KSzmMbw/M5kq/jHsP0EoFkHUPzPpJS5BHE+RI0ZBBE8zASJeQUxPqR92QWRPMxkejkF8TzEapkGUT5p3Ehk0TxEX1kFED/GZE+5BrE+ZEQZRxE9BmQ8eUfRPsQ02USRfWZkLTlE8X3EHZlEMXxmZBX5RbF+JA5ZRhF/BuQKuUZxfof+gx1AgxsVP/hDfUMxXrgb6KJz3UMxfmfiYD2D8X7Fz9LAnYBRvIfMgP2HmTU/wAFdgLG9x7nCcb2Bcb6ntqIdgRG+Jc+qIn2B0bzHoMLdgzqRvoeagz2CMb0nkOUjnYNRv8eHw/2Dsb+eZ35gXcLxvCd4InC9wHH/p2uhHcDR/kXPYkF9wTH9x1XB3cJkgRQZ2JtPgj3Bkf8Gb0MCncCBA7vW+Vs05oL9wlH8RzBDXcMR/5ynIKO9wBH+ZxpicB4Dcf3nDeB+A9H8hg8EgN4EtbzeE7vXu9TlzvThngDyP4brgf4DOJI+JuIiXgGyP+bY5+K+AhI+htKDHgLSP50mySN+AnI/xr/Cc94Dkj9Gs0A+QVI95c6p4J5Csf+mo6D+QzvyPyaXIV5BEn3GjOXBvkCyf4aHgh5DMj+fBnsCfkHSfMZ0wnLeQjJ9JmUjPkBSfuXOXuOeQXJ9hlWD/kM7snwmTCBegpJ95kTl4L6C8nyGPIEehFQ7nAYwAX6BMr6mJqJx3oByvGYgYj6Bkr8FzhcCnoJSvaYNov6DOrK9JgEjXoNSf8H3M7qF8r2B8BrHErwh6zB6xDL+wdzaxJL8gdsxOshUOyHNmsTSvcHHMfrFUvxhvlrFsv8BszK6xnL+gacax3K8QaMzesYS/8GT2seS/mGLMDsH8v0BgJsG0v+hdzD7BFM/IWlbB9K9wWMxuwUTPqFeGwXTPiFTMnsGMzzBStsEsz2hRzM7BpM9ITubBvM+4TMz+wVzPYEoW0QTfCEjMLtEc33hGRtE031hDzF7R1M8AQXbRZN+oPsyO0UzfUDym0XzfwDrMvtGs32g41tHE30g1zO7RlN8oMgbh3N/QL8we4QzveC024STvICtkTuJl3yhm4fTfcCZ+4WZU7xgkluGE74giruFmnO8wIMbh7M8QHd7hZmzvuBr24eTvYBgO8WbM79AWJvEU/7ATPvFmtO+QEFbxRP9wDW7xZlz/4AuG8XT/wAie8cj874b/oNQKzvG8/zAHzObx1P/YBP7x7P+AAlwWAQQPoPufLwAfHQ+sLw8Bs/VfXwAfrX6w+viB8EwPAOz/6/+Z63wHdrbqb6cAZA+gc+KfvwCsD4Dff9cAznwPQNk/7wDcD5DUOY8HEPQP4M/fHxDED+cwyy83EAwfgMZ/nE8QPB9gw19nEJQPsPO+r38QVB8Auv+5O/+5hB9QtU+vEJwf5xCvD8cQtB/QqM/s3xAkGJAa7xCqP7CTpa/3EMwfAMZS/B/gc53fJyAULzCZLz8gzmwf8JLvVyAsL9CPOc9vIFwvkImPhyB0L+fghN+fIIwvMIAvmLcgJBdGVQ99ziGkL1+ZeOYh1C8fcv4irX7fmWsWMewvn2UuMQQ/
f5liRjEcP89dXjFMPx+ZWXYyZa7fUo4xfD8vmU6mMZQ/70e+MTQ/P5lD1jGsPx9A7jN2bW+ZOwZB9D+/Nh5BDE/vmS42QSRPPypOQTxPj5klZkFUT28ifkHEPy+ZHJZBbE/vFa5BhE8/mRHGQZxP/wreQcxPv5kE9kHkHw8ADlG0T1/r+14nUB6tfh/1Hsw/UW1P8G5XUCxfL+o5Lm9QXF+f6J6HUERYzu8Uz+DOn1B0Xz/fOT63UERf/9j+z1CkX+dP1E7nUNRfn8+eHP9Q7F/vyu4XYLxX3mXnz8fOL2C8X6/ErpxHYDRvb75uX2BMb99zvN53YBxvD7UOj2DOBG9fsF6nYJRvH6o5Hr9hZa5Pok7XYIxf5y+fLu9grG/vmO6cB3DEb6+Srh9wDH9vc4xuN3Akf9+K3k9wzjx/n4SeZ3B8b15+zH5x9G+ueZZxhH/+dMyucZx/3nHGcbR/DmrM3nFUf85j9nHkf65gzA6BzH/+WyaBFI9OV8w+gfx/LlRWgSyPzkrMboFcj65HhoF0j45EzJ6BjI9OPraBRI8OOMzOgaSPXjPmgWx/ri7M/oG8j24oFpHUj94mzC6RHJ8OH0aRNJ9+HcxekQSfXhp2kUyfrhWYjpF8mO8U/hCmkXyfvpkKvpGUn34E1pGsn165Ae6R3J+u/K0HoB3WBZ/2bR+gHTWf7Q0cN6AMr37rfU+gPKjvFOc+5T1noDyvjuCNnH+gJK/e292XoISvvnPYva+gbK9+0n3HoHKcr15eMsyvHskd96HOrQ5uxG0PsLSvvr85vSew/K9+uX0/sBS/5z6zPVewRL+Oro2cb7Bcv76mvYewLL8Oc6INn7B0v16dXbewzoy/jpWNz7Ckv96QOd3nsNS/noqd/7Dsv0dehF0XwLy/X5J+HM8zbXlGwTTPLXNewUzPM31udsFkzz1ojsF8z5MdZabCNc9gvsEEz00xXdbBrMjvFA1X7sGszzPNUAbR9M89Tx7RxM8z/Ug20QzfTURO0TzfmZ1l0lTf7Tp+0dzPzZk3ltFs3x0zrtKtzy7MxtG0370p3tEk3w0lzPbRhN/NHg7hzN+tG8wm4eTfbRU+4RTvnQ3MVuFE7+0IbuFc760CXIbhdO9t/Gyf4BzN33P3vLfgpO8N8wzP4By86O8UPes85+C8724TztTvTeBMF/Ds753bOZwv8AT/XdVcR/Ac/9fN08xf8Bw1Do3NcYx38GT47xTdyNyP8M5M/y3ELKfwZP99vzl8v/CU/826zNfwrP/njbSM7/DE/22xbJwHAPT/LassHwAMD11zo1w3ADT/HZ0cTwHOrQ7dltxnAFQPvZM5vH8AbA+dkJyXAIQP512KXK8AnA89hzycxwA8D/2A/N8AzA8smXn2Adz/fHQOEbQPPBluJhEUH4xpPhEsF74lM0xjVhEsH3xbbhFcHzNcWIYRdB+sU54RjB8z3Eu2EaQfLEfOEbwfk+xA5hHUHzz+Enz/jJk3FiHsH2w0LiEcL0yZMUYhNC98KV4hTC9cYSZ2IWQvviPkD/wcpiE2fC9MGL4hHEB+VcYWmRXWIZQvzAvuIawvrJkIBjHcL2wCHjH0Lyxr/Cs3MBtlL/RbTzA4PD9OP4BT/Or7fzBsP+dM5kuXMIQ/LOMrnK8wnD/s3OvHMLQ/PHPYO98wzD+M04v3MM7kP0zNSw9A/D+cyHWbJ0AbxS58xXs/QJwUSM+UrL2rV0BET/xzuPtvQFxPLLErh0CdG1V+DK4Ln0B0T8xzp8u3QBRPjKGLz0DOpE9snmvnQIxPvJk5u/9AvE8MlQsXUARf51yQWy9QHF88jTuYR1AsSJ8lhvtfUDRfvDOAu3dQZF8LfI5R1E8zW3emUXxfG3G+Uu1eM2ts1lHEXytm7lHcXzN7YQZh9F87Wx5hDG8zi1Y2YSRv21FOYSxPM5tLZmE8b8tDfmFsbzM7QpZhhG/7O65hnG8zSzfGYbRvKzTeYZRfM+st9mHMb1ssDnHkbzOL
JCZx/G/bHz5xFH8zKxtWcexP6xRucSx/MzsQhnF0f4sLnnGMf3MrArZxTF8L/wrPcM0adX/3Oudw1H+r9Tmq/3Dsf/vw+heABI/HK+kqL4AciMd15gqcR4AcjzveOl+ATI8bc9sad4HFnrvRuo+AmGSPjo7Hdc0Kv4Csj8tzxsrXgMSPG8Ia74HmxV+9ageQ3I+7uLqcH5AMn3uyejeQ9I8766w6T5AaTV7botqcZ5Akn7ufun+QPJ9bc5Zal5CEn6uRqq+Qzpyf+4z6x5Bsn0uIOUrfkLSfm4Oa95DknzPqfg6h/J/KeyahFK8z+nM+oSyvSm9WocyfM5pqbqFErwpphqFcr6M6YZ6hdK+vAlykr7qZVM6hjK9aS+ahvK8aqUX+oi2vQRaxGu2+upk7LrEEv3o1RrHUrzqZL16xVJ9qJ3axZL+6mSKOsTS/Ch6msUy/qpkUvrGUvxoT1rIlntqZDO6xfL+6CQbB3L96uQMewcS/Ov05N8AZxqWv+IlPwDzIL5RK8jlJZ8BUzyrvKX/APM/Xeup5l8AZxS7K5TnJr8CEz4rficfAtM/Mxs/IpNe598Dkz5rUOZkP0PzPWs5ZJ9AU3+eKxok/0Czf2sHZ5FfQbM8KBsJE31q1WZyH0FzfGq8Zn9FVz6c5SbfQdN8qpCnP0Lzf5+qd6efQ1N+ql6mc/9Ck3xqWGRfhTV5Kc45JL+Ds3yqLKUfgrjTvCogJX+AZ3f7KM4HJd+BM7xl9juIlnjNpeKbhfO9Jdb7hrO8zCW/W4cTv6Wvu4dzvMzlnBvH074liHvGU7zPZXTbxHO+5Wk7xDP8z6VJm8VT/OU5+8Wz/M2lGlvGE/7lBrvGc/zPpOcbxJP+pM97xnM8ziTD28cz/KScOATz/M+kgJgG0/zkcPgH8/zPZElYB5P8pDm4BRA8zWQaGASwPOQOeAYwP12n7aLcAGCWeSfg5SM8AvA8J8gjnANQIOM/l5wROXVntWBcQ7A/nqeioLxDUD/nj+JxHEBwfKdwoXxA0Hwnj2Qh3EAQYH0RZ1FicjxBkH6nPqKcQBB/5c8r4vxBMH0nGSNcQzpQfmcGY7xDEH8m5OcgHIKwf+bH4HyAML+fZrtg3INwfuau4nE8gPC95pXhnIPQfWXOiWH8gbC+JmoiXIM5UL0mUSK8gnC8JjjkIxyC0L1mJWN8gzC9nGYMY9yElnrh5DjFmJC94cyYx/C/Ibj4xZiw/GGpWMhVOSGJuMWZEPyhfhjF0P3hanjFmXD/IVbYxjD8YUc4yZiWeSEnmMdQ/CEP+MWbsP8g8FkEUP4g2LkFmHE9IMEZBBE8IKl5BZkxPWCV2QaQ/iB2OQWY0T0gXpkF8T5gSvkFmrE84CdZBZE8YBu5CrjUe2P/XB1AXlU/+NUcfUBeNP/Z3N1AkX8dY81dPUDxY75So7jmnZ1A8X4jrh39QDFjmZzXlR5dQVF943Xecr1CcXzjXN8dQtF/4c9D331DMX7jKt/dQzgxfCMYHD2BsX+jCOecnYBRvGLsXP2FtX3O5h1dg5F+4sbdvYM5cb5iul4dg/F94qzl3n2BEb+ip57dgpG/nOKU3z2C8b2idZ5znYIxvuJi3/2B0bwhzlAcXcNRvyI3HL3DOHH84jDdHcDR/aISZZ19wTH8nfnZxZH9XmXaOcXx/N3OmcZR/15lpvnEEf7dm1nGEX3eZYO5x7G83WgaBrH+HmVUegcR/R082gfR/B5lJToEMjzdBZoFUj6fMfYI8j0c2loFsj7c0zK6BnI/nLMaBJI+nJszegbSPZyD2geSPJxrMDpH8j1cSJpEUnxcMZD6SrX8FVpHcfycBbpGuRJ9X+VaHkBZtj/Rxpp+QjJjvlPfv9reQzoyf1+zWz5CknyfouSbnkLyfl+aW/5AWzg0uV+BWF6Dsn6fbOaYvoByv99b2R6DUn+e30LZfoASvd8p2nHegTK9Xx1aPoDSvN3PENqegfK9nvGa/oM6Ur9e61tegZK8ntjkm
76Csr3exdgew3K+Hx6zGH7DErx61rh2Hc6aGT7Akv/ek9mewzlS/J50mf7A8v5ebOZaXsGy/x5PGr7Ccv2cXjxbHsVW/i/bfsM7Mv7eFtvewhL92f8wOweS/xnomwfy/hnTMPsEUz7ZsVsEsz5ZpzG7BRM/GYYbBXM+mXsyewXTPZli2wYzP1lbMzsGkz7ZT5sG8z+ZLKP7B1M/XEkgE32Yt0hzfZC7f+MQyXtFM33Y0dtGGZN8WK47Sfd+P32EiRpkjvtHsz7Yh1tGs35YxHu7RxNh/ZDYVBuH0rzMWEh7iDS5GCjbhJO9zJgdO4Qzvxv3FZ+CdFQ0uFvkVf+E1HgbD6wWX4Gzvlu+YxOGlHMfgtO/23PXf4ITof2TnltOV9+Dk73bQdZwP8PzvNso1J/AU/2ZzwmU/8MzvJrwlV/DORP9WtFVv8ITvNrE5NYfwdP+2pLWf8Iz/nJ71rh02mDXP8Lz/pnOWpefwpP9GjUX/8M7U/5aIlRcABA9WgrlVLwAcBxrIGT8AqWc/pX1GAez/9XheAcYcDyVxdgFkCH9ktXLMjgFkD1VppgGUDzVmzL4BTA8VY9YCNR4FbszuAawP5WoGEXwPhWHMHhH0D0VbNhEkH3VTzE4RDB+lS2YRPB9lRcx+EWwf1UOWEYQftUAMrhHcD5U9xhG0H8YfYCMzxTX2EZwfNTQOIfwfQ0UpJiEULyYvABQfViFm5B9VHm4hXC+FFoYhZkQv1RGeIYwvBQq2IWakL+UGziF0LzUC5iGuLP/V+NT/IBR9D/UxtBcwBDj/Be3kLzAcP+c16TRHMDQ/9eL0Ol8wFCVetdy0dzAEP+fV39SPMGQ/5dTkcKcwlD+mNc7tFc0U1zGYxT/XNGY+w7QHQPQ/5VO75B9A3D+loPG1JE/nRaxET0A8TyWpJJxnQAxPVaFUf0BUTzVznjSXQGxPFZsUr0HGTQ+X9MdAtEh/ZEWTOUTfQIRPBY0E90DkT+dViFQPUPxPNYU0nCdQfD+FgIQ/UBRftGF4VlFEX1ZVdm00bIZRZlxf9GWeUbRPlFy2UWaMX8RUzlF0XxRQ5lJmZd5ESP5R7F8kRRZhZgRvVD0uYbxfxDtGYWY0bzQ6XmFMb4Q1dmFmZG9kMo5hfG+UKqZhZpRv5CW+YaxvFB7WYmZdniQT7mEcb1QLBnHm9G+kBh5xxG9E/UM6N3ATZd4E9wNPcDx/16Tto2dwE8Vv7BOcf3HN7tTl05dwVH9Ew+RDr3CEf65/cKN03Dlz33Ccf8TXw/dwzHjO7+TUzNMPgOR/RMs5QyeAFI9UwFM/gCyPzD6FZm64g2+AXI/UsznTh4B0j7Sws5+ARI/nBKwDt4Ckj5StkxzPgLyPhJ+D54CMj+aP5rBUlJMXkASfVI5TnC+Q1I/0hPNHkDSfs2F+XpFMn16dPgUTb46SZs3uM3KmkWSfY2q+kWaUn7Nl1pHEn+Nd7pFm3J8TVgaize6jVx6hZgyvQ042oSSvI0tOoWasn8NBZqE8rzNAfqFmbK9DNZahhK+TMK6hZpyv4yvGoVSvMyfeoQC0pwA</item>
... snipped
<item name="Gain6">QkFSWgAACOQCAEjgBu3z8D/u/wAPEg8kDzYPAEgPWg9sD34PkA+iD7QPxg8A2A/qD/wPDh8gHzIfRB9WHwBoH3ofjB+eH7Afwh/UH+YfAPgfCi8cLy4vQC9SL2Qvdi8AiC+aL6wvvi/QL+Iv9C8GPwAYPyo/PD9OP2A/cj+EP5Y/AKg/uj/MP94/8D8CTxRPJk8AOE9KT1xPbk+AT5JPpE+2TwDIT9pP7E/+TxBfIl80X0ZfAFhfal98X45foF+yX8Rf1l8A6F/6XwxvHm8wb0JvVG9mbwB4b4pvnG+ub8Bv0m/kb/ZvAAh/Gn8sfz5/UH9if3R/hn8AmH+qf7x/zn/gf/J/BI8WjwAojzqPTI9ej3CPgo+Uj6aPALiPyo/cj+6PAJ8SnySfNp8ASJ9an2yffp+Qn6KftJ/GnwDYn+qf/J8OryCvMq9Er1avAGiveq+Mr56vsK/Cr9Sv5q8A+K8Kvxy/Lr9Av1K/ZL92vwCIv5q/rL++v9C/4r/0vwbPABjPKs88z07PYM9yz4TPls8AqM+6z8zP3s/wzwLfFN8m3wA430rfXN9u34Dfkt+k37bfAMjf2t/s3/7fEO8i7zTvRu8AWO9q73zvju+g77LvxO/W7wDo7/rvDP8e/zD/Qv9U/2b/AHj/iv+c/67/wP/S/+T/9v8ACA8aDywPPg9QD2IPdA+GDwCYD6oPvA/OD+AP8g8EHxYfACgfOh9MH14fcB+CH5Qfph8AuB/KH9wf7h8ALxIvJC82LwBIL1ovbC9+L5Avoi+0L8YvANgv6i/8Lw4/ID8yP0Q/Vj8AaD96P4w/nj+wP8I/1D/mPwD4PwpPHE8uT0BPUk9kT3ZPAIhPmk+sT75P0E/iT/RPBl8AGF8qXzxfTl9gX3JfhF+WXwCoX7pfzF/eX/BfAm8UbyZvADhvSm9cb25vgG+Sb6Rvtm8AyG/ab+xv/m8QfyJ/NH9GfwBYf2p/fH+Of6B/sn/Ef9Z/AOh/+n8Mjx6PMI9Cj1SPZo8AeI+Kj5yPro/Aj9KP5I/2jwAInxqfLJ8+n1CfYp90n4afAJifqp+8n86f4J/ynwSvFq8AKK86r0yvXq9wr4KvlK+mrwC4r8qv3K/urwC/Er8kvza/AEi/Wr9sv36/kL+iv7S/xr8A2L/qv/y/Ds8gzzLPRM9WzwBoz3rPjM+ez7DPws/Uz+bPAPjPCt8c3y7fQN9S32Tfdt8AiN+a36zfvt/Q3+Lf9N8G7wAY7yrvPO9O72Dvcu+E75bvAKjvuu/M797v8O8C/xT/Jv8AOP9K/1z/bv+A/5L/pP+2/wDI/9r/7P/+/xAPIg80D0YPAFgPag98D44PoA+yD8QP1g8A6A/6DwwfHh8wH0IfVB9mHwB4H4ofnB+uH8Af0h/kH/YfAAgvGi8sLz4vUC9iL3Qvhi8AmC+qL7wvzi/gL/IvBD8WPwAoPzo/TD9eP3A/gj+UP6Y/ALg/yj/cP+4/AE8STyRPNk8ASE9aT2xPfk+QT6JPtE/GTwDYT+pP/E8OXyBfMl9EX1ZfAGhfel+MX55fsF/CX9Rf5l8A+F8KbxxvLm9Ab1JvZG92bwCIb5pvrG++b9Bv4m/0bwZ/ABh/Kn88f05/YH9yf4R/ln8AqH+6f8x/3n/wfwKPFI8mjwA4j0qPXI9uj4CPko+kj7aPAMiP2o/sj/6PEJ8inzSfRp8AWJ9qn3yfjp+gn7KfxJ/WnwDon/qfDK8erzCvQq9Ur2avAHiviq+cr66vwK/Sr+Sv9q8ACL8avyy/Pr9Qv2K/dL+GvwCYv6q/vL/Ov+C/8r8EzxbPACjPOs9Mz17PcM+Cz5TPps8AuM/Kz9zP7s8A3xLfJN823wBI31rfbN9+35Dfot+038bfANjf6t/83w7vIO8y70TvVu8AaO9674zvnu+w78Lv1O/m7wD47wr/HP8u/0D/Uv9k/3b/AIj/mv+s/77/0P/i//T/Bg8AGA8qDzwPTg9gD3IPh
A+WDwCoD7oPzA/eD/APAh8UHyYfADgfSh9cH24fgB+SH6Qfth8AyB/aH+wf/h8QLyIvNC9GLwBYL2ovfC+OL6Avsi/EL9YvAOgv+i8MPx4/MD9CP1Q/Zj8AeD+KP5w/rj/AP9I/5D/2PwAITxpPLE8+T1BPYk90T4ZPAJhPqk+8T85P4E/yTwRfFl8AKF86X0xfXl9wX4JflF+mXwC4X8pf3F/uXwBvEm8kbzZvAEhvWm9sb35vkG+ib7Rvxm8A2G/qb/xvDn8gfzJ/RH9WfwBof3p/jH+ef7B/wn/Uf+Z/APh/Co8cjy6PQI9Sj2SPdo8AiI+aj6yPvo/Qj+KP9I8GnwAYnyqfPJ9On2Cfcp+En5afAKifup/Mn96f8J8CrxSvJq8AOK9Kr1yvbq+Ar5KvpK+2rwDIr9qv7K/+rxC/Ir80v0a/AFi/ar98v46/oL+yv8S/1r8A6L/6vwzPHs8wz0LPVM9mzwB4z4rPnM+uz8DP0s/kz/bPAAjfGt8s3z7fUN9i33Tfht8AmN+q37zfzt/g3/LfBO8W7wAo7zrvTO9e73Dvgu+U76bvALjvyu/c7+7vAP8S/yT/Nv8ASP9a/2z/fv+Q/6L/tP/G/wDY/+r//P8ODyAPMg9ED1YPAGgPeg+MD54PsA/CD9QP5g8A+A8KHxwfLh9AH1IfZB92HwCIH5ofrB++H9Af4h/0HwYvABgvKi88L04vYC9yL4Qvli8AqC+6L8wv3i/wLwI/FD8mPwA4P0o/XD9uP4A/kj+kP7Y/AMg/2j/sP/4/EE8iTzRPRk8AWE9qT3xPjk+gT7JPxE/WTwDoT/pPDF8eXzBfQl9UX2ZfAHhfil+cX65fwF/SX+Rf9l8ACG8abyxvPm9Qb2JvdG+GbwCYb6pvvG/Ob+Bv8m8EfxZ/ACh/On9Mf15/cH+Cf5R/pn8AuH/Kf9x/7n8AjxKPJI82jwBIj1qPbI9+j5CPoo+0j8aPANiP6o/8jw6fIJ8yn0SfVp8AaJ96n4yfnp+wn8Kf1J/mnwD4nwqvHK8ur0CvUq9kr3avAIivmq+sr76v0K/ioQ</item>
</list>
</obj>
This is what I have so far.
The document you're using is not from an actual PCR run, as inferred from the readable data. It is a color compensation run (short overview that seems to match the file) (full updated manual, page 250, not as fitting). Specifically, it seems to be a color compensation run for the "FAM/Pulsar 650" dye.
The output type, as you point out, is this "AcquisitionTable" with 2400 "counts" which must be different, I believe, from output you would normally get from a PCR run. I'm sure you've found these already, but a few public examples of PCR templates (not completed runs) are here, here, here and here.
According to the LCRunProgram in your file, the protocol here was:
hold 95°C for 0" at a speed of 20°C/s
hold 40°C for 30", 20°C/s
hold 95°C for 0" at 0.1°C/s, acquisition mode "2".
So, we're expecting that the acquisition timeframe lasted an estimated (95°C-40°C) / 0.1°C/s = 550 seconds, approximately; during which time, there should have been a fixed number of acquisition events per second.
EDIT 0 - this is what I had done at the beginning, so I'm not deleting it, but I got more interesting information later (see below).
I took a look at the data with a simple Python script (I'm a Python guy), to search for patterns. The script holds your data's initial strings in a dictionary called values, which would be too long to post here, so I put it in a gist, just as you had to do.
#!/usr/bin/env python3
import base64
from collections import defaultdict
from values import values

def splitme(name, sep):
    # '==' restores the base64 padding that the file omits
    splitted = base64.b64decode(values[name]+'==').split(sep)
    print("{:<12} [{}; {}] separated in {} chunks: {}".format(
        name,
        len(values[name]), len(base64.b64decode(values[name]+'==')),
        len(splitted),
        [len(i) for i in splitted]))
    return splitted

if __name__ == '__main__':
    # maps each chunk to the (item name, chunk index) places where it occurs
    allchunks = defaultdict(list)
    separator = b'\r'
    print("separating by:", separator)
    for key in values:
        data = splitme(key, sep=separator)
        for i, item in enumerate(data):
            allchunks[item].append((key, i))
    print("Common chunks:")
    for location in [value for item, value in allchunks.items() if len(value)>1]:
        print(location)
Let's get the obvious out of the way and say that ProgramNo and CycleNo hold the same data, and all the Gain items are identical. So I'll only post one of each.
Now trying the script with the separator b'\r' (just to try for one) cuts a few of them in chunks of 272 (271+separator) bytes. The others aren't tidy.
separating by: b'\r'
SampleNo [1536; 1152] separated in 5 chunks: [174, 271, 271, 271, 161]
ProgramNo [1531; 1148] separated in 6 chunks: [47, 271, 271, 271, 271, 12]
SegmentNo [1531; 1148] separated in 5 chunks: [169, 271, 271, 271, 162]
Separating by b'\t' gives similar results:
separating by: b'\t'
SampleNo [1536; 1152] separated in 5 chunks: [204, 271, 271, 271, 131]
ProgramNo [1531; 1148] separated in 5 chunks: [76, 271, 271, 271, 255]
SegmentNo [1531; 1148] separated in 5 chunks: [199, 271, 271, 271, 132]
And separating by b'\n' splits the gains this time, in a similar way:
separating by: b'\n'
Gain1 [3046; 2284] separated in 10 chunks: [81, 271, 271, 271, 271, 271, 271, 271, 271, 26]
So I am not at all implying that these "separators" are of any importance; I'm thinking that they are rare characters that appear to cut the data in 272-byte chunks, and this value, 272 bytes, might be important in understanding how this data is stored.
The "BARZ" at the beginning of each string seems like a "foo-bar" thing; probably set as a check at the start of the header.
Another interesting thing is that the gains data separates into 8 equal-sized chunks (plus two other smaller blocks). If this data is from a 96-well plate, I would start exploring whether this might be a header and then 8 chunks (rows), each splittable into 12 items (columns), so that 8*12=96 replicates the layout of a 96-well plate.
Also, if this "272 bytes per line" hypothesis is true, then the data in ProgramNo, SampleNo etc that do split into 272-bytes chunks might be explained if the plate wasn't full, and some wells had samples (with a few complete lines) while others were empty. I'm not sure if this would make sense for a color compensation plate.
Time, Temperature, Error and Fluors do not separate into chunks and you are correct in thinking they are a set of continuous values; not necessarily floats though. Fluorescence can be captured as "units" which might be positive ints (I don't have a LightCycler so I don't know if it's the case or not).
And this is where I am so far. I'm not sure I'll have time to go further. In case I don't reply back, good luck with your endeavour.
EDIT 1:
So regarding the SampleNo data, it seems to be structured in this way:
1) a header, which might or might not be separated by 0x00 like:
* the BARZ header, then 2 times 0x00 (total 6 bytes)
* three bytes, then 0x00 (total 4 bytes)
* 17 bytes, then 0x00 (total 18 bytes)
2) a series of data, each of them comprised of 16 bytes and terminated by 0x00 (so 17 bytes each).
This means that SampleNo holds a header, plus 66 sets of 17 bytes.
EDIT 2:
Splitting everything by 0x00 with this awful piece of code:
def splitme(name):
    data = base64.b64decode(values[name]+'==')
    hit = 0
    index = 0
    countit = 0
    splits = []
    # record the position of every 0x00 byte (capped at 500 iterations)
    while hit >= 0 and countit < 500:
        countit += 1
        hit = data[index+1:].find(0)
        index += hit+1
        if hit >= 0:
            splits.append(index)
    lastindex = -1
    splitted = []
    if splits:
        # cut the data at each recorded position, dropping the 0x00 itself
        for index in splits:
            splitted.append(data[lastindex+1:index])
            lastindex = index
    else:
        splitted = [data]
    return splitted
Yields:
separating by: 0x0
SampleNo [1536; 1152] separated in 70 chunks: [4, 0, 3, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
ProgramNo [1531; 1148] separated in 71 chunks: [4, 0, 3, 2, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 12]
SegmentNo [1531; 1148] separated in 69 chunks: [4, 0, 3, 18, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
CycleNo [1531; 1148] separated in 71 chunks: [4, 0, 3, 2, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 12]
Time [11944; 8958] separated in 63 chunks: [4, 0, 3, 45, 14, 42, 76, 46, 172, 110, 109, 15, 81, 90, 111, 108, 78, 46, 175, 141, 88, 209, 74, 117, 156, 170, 59, 107, 78, 103, 125, 171, 103, 170, 191, 333, 154, 187, 11, 257, 149, 208, 173, 156, 153, 412, 72, 55, 207, 131, 131, 274, 284, 238, 19, 241, 247, 13, 74, 558, 763, 8, 0]
Temperature [6731; 5048] separated in 14 chunks: [4, 0, 3, 394, 186, 543, 177, 173, 530, 534, 371, 714, 373, 1032]
Error [398; 298] separated in 21 chunks: [4, 0, 3, 2, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 12]
Fluor1 [7539; 5654] separated in 38 chunks: [4, 0, 3, 31, 13, 7, 7, 426, 331, 218, 187, 11, 10, 13, 7, 6, 7, 48, 45, 217, 840, 6, 7, 14, 7, 6, 7, 7, 6, 1178, 8, 6, 1147, 7, 6, 141, 630, 2]
...
Gain1 [3046; 2284] separated in 145 chunks: [4, 0, 3, 9, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 8, 7, 16, 16, 16, 16]
...
So SampleNo, ProgramNo, SegmentNo, Error and the Gains all split in blocks of 17 bytes (16 bytes + 0x00).
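For reference, the same 0x00 split can be written more compactly with `bytes.split`. This is a sketch on synthetic bytes, not on the real items; `chunk_lengths` is a hypothetical helper name. Unlike the loop above it also returns the piece after the last 0x00, so chunk counts can differ by one.

```python
from collections import Counter

def chunk_lengths(data: bytes) -> list:
    """Lengths of the pieces obtained by splitting on 0x00 bytes."""
    return [len(chunk) for chunk in data.split(b"\x00")]

# Synthetic stand-in for a decoded item: a header followed by two
# 16-byte blocks separated by 0x00 (not real instrument data).
demo = (b"BARZ\x00\x00\x04\x78\x02\x00"
        + b"\x0f\x21" * 8 + b"\x00" + b"\x0f\x33" * 8)

lengths = chunk_lengths(demo)
print(lengths)
# The most common chunk length is a quick hint at the block size.
print(Counter(lengths).most_common(1))
```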
EDIT 3:
The first fifteen 17-byte chunks of ProgramNo (and the copy CycleNo) and Error are identical.
Just to clarify, the "chunks" I describe are what you describe as a series of number pairs, one of which increases by 0x12. The 0x00 that you mention is the separator between the chunks.
EDIT 4:
About Gain data, the link between my initial "272 bytes" blocks and the (16+0x00)-byte blocks, is that there's a repeating pattern of 16 blocks, 15 of them are "16+0x00" blocks and one last block has a 0x00 in the middle. So 17 bytes(=16+0x00) * 16 blocks = 272 bytes total for this repeat.
The whole string is built as follows: the "header" part, then 8 such repeats of 16 blocks of 17 bytes, and then four 17-byte blocks at the end. So I was right about the 8 blocks, but apparently wrong when drawing the parallel with an 8x12-well PCR plate. Here it's more like 8*16 (+4).
About Fluor etc. data, I don't have an answer but I'd try to strip the header and see if any (integer or float) compression algorithm can work on it... Compressed data would explain why you have different lengths for these fields.
This is what I found so far. (Some of it overlaps with what you already found)
The data is encoded in Base64, where the padding (=) is missing, so you will need to add that.
The first bytes identify the kind of data. The file I am looking at has DARZ/LARZ/FORM/Empty.
DARZ = Double[]
LARZ = Time? Haven't decoded this
FORM = Double[][] (has 96 DARZ fields), this is the only field where byte 6 is 0x01
Empty = Just a bunch of 0-1-2
For the first three types the first four bytes thus identify the type.
byte 1-4 = TypeID
byte 5-8 = The size of the element (BigEndian)
byte 9-12 = Checksum?
byte 13 - 13+length = the actual data.
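The layout above can be read with `struct`. This is a sketch under stated assumptions: `ItemHeader` and `parse_header` are hypothetical names, the third field is kept as raw bytes because its meaning is uncertain, and applying the layout to the BARZ items from the question is itself an assumption (the sample bytes are the first 12 decoded bytes of the question's SampleNo item).

```python
import struct
from typing import NamedTuple

class ItemHeader(NamedTuple):
    type_id: bytes   # bytes 1-4, e.g. b"DARZ"
    size: int        # bytes 5-8, big-endian unsigned integer
    unknown: bytes   # bytes 9-12, possibly a checksum

def parse_header(data: bytes) -> ItemHeader:
    # ">": big-endian, "4s": 4 raw bytes, "I": unsigned 32-bit integer
    type_id, size, unknown = struct.unpack(">4sI4s", data[:12])
    return ItemHeader(type_id, size, unknown)

# First 12 decoded bytes of the SampleNo item from the question.
header = parse_header(bytes.fromhex("42 41 52 5A 00 00 04 78 02 00 44 74"))
print(header)
```

For that item the size field reads 1144, which equals the 1152 decoded bytes reported earlier for SampleNo minus the first 8 bytes. That is consistent with the layout for this item, but it is not a confirmation of the format.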
In my case, I needed to extract Fluoresence0 items that have the DARZ header.
Header DARZ (5 bytes including null terminator)
Null bytes (2 bytes)
Block size (1 byte)
Null bytes (2 bytes)
Array size (1 byte)
Array of Doubles / Float64 (8 bytes each one)
End mark (1 byte)
Using the HxD editor and its data inspector, it is possible to validate the values.
With that information, it is easy to parse the data using Python and hachoir:
from hachoir.field import FieldSet, CString, Bytes, Float64

class CycleFloat64(FieldSet):
    def createFields(self):
        # "DARZ" plus its null terminator
        yield CString(self, "DARZ Header")
        # null bytes, block size, null bytes, array size
        yield Bytes(self, "6 bytes", 0x6)
        yield Float64(self, "Value 1")
        yield Float64(self, "Value 2")
        yield Float64(self, "Value 3")
        # end mark
        yield Bytes(self, "1 byte", 0x1)
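A dependency-free alternative is sketched below with the standard `struct` module. It follows the field list above (11 header bytes, then doubles, then a 1-byte end mark). The byte order is not stated anywhere in this thread, so `byteorder` is an assumption to be checked against known values; `parse_darz` is a hypothetical name, and the demo bytes are synthetic, not instrument data.

```python
import struct

# "DARZ\0" (5) + 2 null + block size (1) + 2 null + array size (1)
HEADER_LEN = 11

def parse_darz(data: bytes, byteorder: str = "<") -> list:
    """Return the Float64 values between the header and the end mark."""
    if not data.startswith(b"DARZ\x00"):
        raise ValueError("not a DARZ item")
    body = data[HEADER_LEN:-1]   # strip the header and the 1-byte end mark
    count = len(body) // 8       # 8 bytes per double
    return list(struct.unpack(f"{byteorder}{count}d", body[:count * 8]))

# Synthetic item built with the same layout (assumed little-endian).
demo = (b"DARZ\x00" + b"\x00\x00" + b"\x18" + b"\x00\x00" + b"\x03"
        + struct.pack("<3d", 1.0, 2.5, -3.0) + b"\x00")
print(parse_darz(demo))
```

The number of values is derived from the remaining length rather than from the "array size" byte, because it is not clear from the description whether that byte counts elements or bytes.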
I have a problem with interpreting a file. The file is built as follows:
"name"-#-"date"-#-"author"-#-"signature"
The signature is a byte array. When I read the file back in, I convert it to a String and split it:
myFileInpuStream.read(fileContent);
String[] data = new String(fileContent).split("-#-");
If I look at the variable fileContent, I see that the bytes are all good.
But when I try to get the signature byte array:
byte[] signature= data[3].getBytes();
Sometimes I get wrong values of 63. I tried a few solutions with:
new String(fileContent, "UTF-8")
But no luck. Can someone help?
The signature does not have a fixed length, so I cannot hard-code it...
Some extra info:
Original signature:
[48, 45, 2, 21, 0, -123, -3, -5, -115, 84, -86, 26, -124, -112,
75, -10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8,
48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81,
51, 69]
filecontent(var after reading):
... 48, 45, 2, 21, 0, -123, -3, -5, -115, 84, -86, 26, -124, -112,
75, -10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8,
48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81,
51, 69]
signature (after split and getBytes()):
[48, 45, 2, 21, 0, -123, -3, -5, 63, 84, -86, 26, -124, 63, 75,
-10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8, 48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81, 51, 69]
You can't access data[4] because your array holds only 4 Strings. So you can access data from index 0 to 3.
data[0] = name
data[1] = date
data[2] = author
data[3] = signature
The solution :
byte[] signature = data[3].getBytes();
Edit: I think I finally understand what you are doing.
You have four parts: name, date, author, signature. The name and author are strings, the date is a date and the signature is a hashed or encrypted array of bytes. You want to store them as text in a file, separated by -#-. To do this, you first need to convert each to a valid string. Name and author are already strings. Converting a date to string is easy. Converting an array of bytes to string is not easy.
You can use base64 encoding to convert a byte array to a string. Use javax.xml.bind.DatatypeConverter printBase64Binary() for encoding and javax.xml.bind.DatatypeConverter parseBase64Binary() for decoding.
For example, if you have a name denBelg, date 2013-03-19, author Virtlink and this signature:
30 2D 02 15 00 85 FD FB 8D 54 AA 1A 84 90 4B F6 FF C8 28 0D D2 06 78 C8 64 02 14
42 A4 F8 30 A8 65 39 38 14 7D E0 CF 85 49 60 4C AE 51 33 45
Then, after base64 encoding of the signature and concatenation, the resulting string would be:
denBelg-#-20130319-#-Virtlink-#-MC0CFQCF/fuNVKoahJBL9v/IKA3SBnjIZAIUQqT4MKhlOTgUfeDPhUlgTK5RM0U=
Later, when you split the string on -#- you can decode the base64 signature part and get back an array of bytes.
Note that if the name or author can contain -#-, the split will break. For example, if I set the name to den-#-Belg then your code would fail.
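The same round trip is sketched below in Python rather than Java, purely as an illustration of the idea; the Java calls to use are the ones named above. The record layout and the -#- separator come from the question, while `write_record` and `read_record` are hypothetical helper names.

```python
import base64

SEP = "-#-"

def write_record(name: str, date: str, author: str, signature: bytes) -> str:
    # Base64 turns arbitrary bytes into plain ASCII text.
    sig_text = base64.b64encode(signature).decode("ascii")
    return SEP.join([name, date, author, sig_text])

def read_record(line: str):
    name, date, author, sig_text = line.split(SEP)
    return name, date, author, base64.b64decode(sig_text)

# The signature from the example above.
signature = bytes.fromhex(
    "30 2D 02 15 00 85 FD FB 8D 54 AA 1A 84 90 4B F6 FF C8 28 0D D2 06 78 C8 64 02 14 "
    "42 A4 F8 30 A8 65 39 38 14 7D E0 CF 85 49 60 4C AE 51 33 45"
)
line = write_record("denBelg", "20130319", "Virtlink", signature)
print(line)
assert read_record(line)[3] == signature  # the bytes survive unchanged
```

The standard base64 alphabet contains neither - nor #, so the encoded signature itself can never contain the separator.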
Original post:
Java's String.getBytes() uses the platform default encoding for the string. The encoding is the way string characters are mapped to byte values. So, depending on the platform, the resulting bytes may be different.
Fix the encoding to UTF-8 and read it with the same encoding, and your problems will go away.
byte[] signature = data[3].getBytes("UTF-8");
String sigdata = new String(signature, "UTF-8");
0-???����T�?��K���(
�?x�d??B��0�e98?}�υI`L�Q3E
Your example represents some garbled mess of characters (is it encrypted or something?), but the bytes you highlighted show the problem:
You start with a byte value of -115. The minus indicates it is a byte value above 0x7F, whose character representation highly depends on the encoding used. Let's assume extended US-ASCII, then your byte represents (according to this table) the character ì (with an accent). Now when you decode it the decoder (depending on the encoding you use) might not understand the byte value 0x8D and instead represents it with a question mark ?. Note that the question mark is US-ASCII character 63, and that's where your 63 came from.
So make sure you use your encodings consistently and don't rely on the system's default.
Also, never use string encoding to decode byte arrays that do not represent strings (e.g. hashes or other cryptographic content).
According to your comment you are trying to read encrypted data (which are bytes) and converting them to a string using a decoder? That will never work in any way you expect it to. After you've encrypted something you have an array of bytes which you should store as-is. When you read them back, you have to put the bytes through a decrypter to regain the unencrypted bytes. Only if those decrypted bytes represent a string, then you can use an encoding to decode the string.
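The appearance of 63 can be reproduced outside Java. The sketch below uses Python only as an illustration and rests on an assumption: that the platform default charset behaved like windows-1252, in which the byte values 0x8D and 0x90 have no character assigned. The input is the first 14 bytes of the signature from the question.

```python
# First 14 signature bytes, written as Java's signed values.
java_bytes = [48, 45, 2, 21, 0, -123, -3, -5, -115, 84, -86, 26, -124, -112]
signature = bytes(b & 0xFF for b in java_bytes)

# Assumption: a windows-1252-like default charset. Bytes with no
# assigned character become U+FFFD when decoded...
text = signature.decode("cp1252", errors="replace")
# ...and U+FFFD has no cp1252 encoding, so it comes back as '?' (63).
round_trip = text.encode("cp1252", errors="replace")

changed = [i for i, (a, b) in enumerate(zip(signature, round_trip)) if a != b]
print(changed, [round_trip[i] for i in changed])
```

Under that assumption the two bytes that change are exactly the ones that turned into 63 in the question (-115 and -112), while every other byte survives the round trip.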
You're making extra work for yourself by converting these bytes into Strings by hand. Why aren't you doing it using the classes intended for this?
// get the file /logs/access.log
Path path = Paths.get("/logs", "access.log");
// open it, decoding UTF-8
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
// read a line of text, properly decoded
String line = reader.readLine();
Or, if you're in Java 6:
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("/logs/access.log"), "UTF-8"));
String line = reader.readLine();
Links:
Files.newBufferedReader
InputStreamReader
Sounds like an encoding issue to me.
First you need to know what encoding your file is using, and use that when reading the file.
Secondly, you say your signature is a byte array, but Java strings are always Unicode. If you want a different encoding (I'm guessing you want ASCII), you need to call getBytes("US-ASCII").
Of course, if your input was ASCII, it would be strange that this could cause encoding issues.