What should the first few values be, as int32, after the text in your example file? Note that the number of bytes from the end of the [...] In the header I found: [...] The beginning of the data part should be the three numbers [...] How did you find out about the number of bytes between [...]?
None of these numbers can be found anywhere in your attached file, encoded as int32 or uint32, in little-endian or big-endian. I don't see how you could ever get these numbers from the file. Note that if you were editing the file with a text editor to crop the text, then your editor may well have changed the actual binary values. It is not safe to edit a binary file in a text editor. I'll repeat my request for actual documentation of the format.
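For anyone wanting to repeat that kind of check, here is a rough sketch of how one might search a file for a given 32-bit value in both byte orders. The demo file, its contents, and the value 70000 are made up for illustration and have nothing to do with the poster's file; for non-negative values the int32 and uint32 byte patterns are identical, so one search covers both.

```matlab
% Demo file (hypothetical): 11 bytes of text, then three little-endian int32 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8('header text'), 'uint8');
fwrite(fid, int32([1234 -5 70000]), 'int32', 0, 'ieee-le');
fclose(fid);

% Read the whole file back as a row of raw bytes.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(fname);

% Byte patterns of the value being searched for, in both byte orders.
value = int32(70000);
native = typecast(value, 'uint8');      % bytes in the host's own order
[~, ~, endian] = computer;              % 'L' or 'B'
if endian == 'L'
    le = native;
else
    le = fliplr(native);
end
be = fliplr(le);

% strfind on char copies gives the 1-based byte positions of each pattern.
posLE = strfind(char(bytes), char(le));
posBE = strfind(char(bytes), char(be));
fprintf('little-endian hits: %s\n', mat2str(posLE));
fprintf('big-endian hits:    %s\n', mat2str(posBE));
```

An empty result for both patterns means the value is simply not in the file in either encoding.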
It'd be a lot easier to understand how the data is encoded. Not an even number. By the way: [...] Unfortunately, I cannot provide any documentation about the data file; I don't have any.
I agree that working on such a file with a text editor is questionable. On the other hand, this way I could import the data, plot it, and check that it is correct. The 3 numbers I provided above are part of the data set, which I checked like this. Thanks a lot for your efforts, and for the idea that the operation with the text editor could also have caused an error.
I will use the bit counting operations you provided above to investigate how the file changes its length during the whole operation. I don't have time for this today, but I'll do it in the next few days.
Thanks a lot for your effort, I'll post it once I know more!
Accepted Answer. Edited: dpb on 25 Mar
As noted in the comment, it's a bizarre way to have done it, but the following seems to work. These aren't quite the same values as the OP says; probably he's looking at a different file than the one he posted.
NB: The position after the while loop will depend on the data content -- fgetl won't terminate until it finds a two-byte sequence that qualifies as a line terminator, and where that is depends on the actual data values. So, back up by the number of bytes in the last read plus the two terminator bytes, offset by the two indicator characters, and add one. This depends on the Windows line-ending convention, which the file appears to follow. It would be more robust to do the read on a character-by-character basis or, as I think G?
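A minimal sketch of the character-by-character idea, under loud assumptions: the marker 'DATA:' and the demo header are hypothetical stand-ins, not the real format, and this is not the code from the accepted answer. Reading one byte at a time means no line-terminator search is involved, so binary values that happen to look like CR or LF (10 and 13 below) cannot move the file position.

```matlab
% Hypothetical layout: ASCII header, the marker 'DATA:', then int32 values.
marker = 'DATA:';
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(sprintf('name=demo\r\nn=3\r\n%s', marker)), 'uint8');
fwrite(fid, int32([10 13 7]), 'int32', 0, 'ieee-le');   % 10 and 13 mimic LF and CR
fclose(fid);

fid = fopen(fname, 'r', 'ieee-le');
tail = blanks(numel(marker));            % sliding window of the last few characters
while ~strcmp(tail, marker)
    c = fread(fid, 1, 'uint8=>char');    % one byte at a time
    if isempty(c)
        fclose(fid);
        error('Marker not found before end of file.');
    end
    tail = [tail(2:end) c];
end
dataStart = ftell(fid);                  % zero-based offset of the first binary byte
values = fread(fid, Inf, 'int32=>int32')';
fclose(fid);
delete(fname);
```

With this approach there is nothing to back up afterwards: ftell already points at the first byte after the marker.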
Great, this works! You're right, apparently I sent the three numbers from the scan file previous to the one I attached. Sorry for that!
Edited: Guillaume on 25 Mar
Right, now that we've resolved that the published int32 were incorrect.
Here's how I would parse the file. Note that the above should be a lot faster than reading the file line by line. Things to take into account: it is assumed that the order of the fields in the [Dataset] portion of the text is fixed; otherwise a more complex parsing of the text is required. To properly parse the file, you really need to parse the [Dataset] portions of the code and decode the binary according to the encodings above.
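Since the original code did not survive in this thread, here is only a rough sketch of the whole-file idea, not the actual parser. The field name Points, the marker 'DATA:', and the demo contents are hypothetical; only the '[Dataset]' section name is taken from the discussion. The file is read once as bytes, the text part is parsed with regexp, and the binary part is converted with typecast.

```matlab
% Hypothetical file: a '[Dataset]' text block with a Points field, the marker
% 'DATA:', then Points little-endian int32 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(sprintf('[Dataset]\r\nPoints=3\r\nDATA:')), 'uint8');
fwrite(fid, int32([100 -200 300]), 'int32', 0, 'ieee-le');
fclose(fid);

% One read for the whole file.
fid = fopen(fname, 'r');
raw = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(fname);

% Split into text header and binary payload at the first marker.
k = strfind(char(raw), 'DATA:');
hdr = char(raw(1:k(1)-1));                        % text part only
tok = regexp(hdr, 'Points=(\d+)', 'tokens', 'once');
npts = str2double(tok{1});
first = k(1) + numel('DATA:');                    % 1-based index of first binary byte
payload = raw(first : first + 4*npts - 1);

% typecast uses the host byte order; swap on a big-endian machine.
values = typecast(payload, 'int32');
[~, ~, endian] = computer;
if endian == 'B'
    values = swapbytes(values);
end
```

If the field order in the text part is not fixed, the single regexp would have to be replaced by a proper key/value parse of each section.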
Kevin Claytor on 9 Nov
Import data seems to work pretty well but doesn't directly get you the headers:
Edited: dpb on 13 Nov
"The better way"
I hadn't noted before the symptom of repeated delimiters with dlmread; agreed, that's a pit[proverbial]a[ppendage].
IMO, it's unfortunate TMW has chosen to deprecate the use of textread in favor of textscan; it has the advantage of [...] The above equivalent in textscan would be [...] It does seem as though the multiple delimiters option would be a worthwhile enhancement for them; as noted, I hadn't actually noted that behavior previously, as I tend to use the textread route for the above reasons.
There are things it can't do that textscan can (being able to be called on the same file multiple times being a major one), but instead of deprecating it, it should be brought up to the level of textscan, IMO. Alternatively, there's the option I've asked for since it was introduced: an optional ability in textscan to return the double array directly and to understand a file name as well as a file handle.
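To make the textscan side concrete, a small sketch (not the code originally posted here) of getting a plain double array while collapsing repeated delimiters. The sample text and the ';' delimiter are invented; 'MultipleDelimsAsOne' and 'CollectOutput' are existing textscan options.

```matlab
% Sample text with repeated ';' delimiters (hypothetical data).
txt = sprintf('1;;2;3\n4;5;;;6\n');

% Treat runs of ';' as one delimiter and gather the three numeric
% columns into a single matrix instead of three separate cells.
C = textscan(txt, '%f %f %f', 'Delimiter', ';', ...
             'MultipleDelimsAsOne', true, 'CollectOutput', true);
data = C{1};     % still wrapped in a cell, hence the extra indexing step
```

The remaining complaint stands: the result always comes back inside a cell array, so the C{1} step cannot be skipped.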
Actually, on reading the source for dlmread I observed something I hadn't noticed before, and I don't think it's documented, at least not well -- if one submits an empty string for the formatting string, then textscan will do something else internally and, for a regular numeric array, work out the number of fields per input record and reflect that in the output.
That is a super result that should be shouted from the rooftops by TMW but seems to be a closely held secret. The delimiter is inferred from the formatting.
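A short sketch of the empty-format behaviour described above, assuming it works in your release as it does inside dlmread. The demo file and its contents are invented; the delimiter is given explicitly here to keep the example unambiguous, and 'HeaderLines' skips the text row.

```matlab
% Demo file (hypothetical): one header row, then a regular numeric block.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c,d\n1,2,3,4\n5,6,7,8\n');
fclose(fid);

% Empty format string: textscan works out the number of numeric columns itself.
fid = fopen(fname, 'r');
C = textscan(fid, '', 'Delimiter', ',', 'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
delete(fname);

data = C{1};     % numeric matrix with one column per field in the record
disp(size(data));
```

Note that no column count or '%f' repetition appears anywhere in the call; that is the part that seems worth advertising.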
The result is [...] "When a delimiter is inferred from the formatting of the file, [...]" I'd forgotten this detail; the behavior is documented. The problem is, there's no way with the interface as designed to specify the header rows and not the delimiter. The preprocessing section looks like the following: [...] If one were to use a [] placeholder for the delimiter but also provide the R,C offsets, nargin still returns the place counter in the list, so it would be pretty easy to also test for the second argument being empty, as well as the first case of only one argument, and have it do the search.
Then only if the delimiter were explicitly specified would the multiple vs single come into play. Would take a little more effort to handle that case as well, but certainly doable and probably should have been.