dms3 / go-unixfs
Commit 1f309b72, authored 10 years ago by Jeromy
implement basic rabin fingerprint file splitting
Parent: fbd611f4
Showing 2 changed files with 102 additions and 0 deletions:
importer/split_test.go (+50, -0)
importer/splitting.go (+52, -0)
importer/split_test.go (new file, 0 → 100644)
package importer

import (
	"testing"
	"crypto/rand"
	"bytes"
)

func TestDataSplitting(t *testing.T) {
	buf := make([]byte, 16*1024*1024)
	rand.Read(buf)

	split := Rabin(buf)

	if len(split) == 1 {
		t.Fatal("No split occurred!")
	}

	min := 2 << 15
	max := 0

	mxcount := 0

	n := 0
	for _, b := range split {
		if !bytes.Equal(b, buf[n:n+len(b)]) {
			t.Fatal("Split lost data!")
		}
		n += len(b)

		if len(b) < min {
			min = len(b)
		}

		if len(b) > max {
			max = len(b)
		}

		if len(b) == 16384 {
			mxcount++
		}
	}

	if n != len(buf) {
		t.Fatal("missing some bytes!")
	}
	t.Log(len(split))
	t.Log(min, max, mxcount)
}
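The test above only logs the smallest and largest chunk sizes it sees; it does not fail if they fall outside the limits set in splitting.go. A minimal sketch of a helper that turns those observations into checks follows. The name `verifyChunks` and its parameters are hypothetical and not part of this commit; the limits (2048 and 16384) would be passed in by the caller.

```go
package main

import (
	"bytes"
	"fmt"
)

// verifyChunks is a hypothetical helper (not in the commit). It checks that
// chunks, concatenated in order, reproduce data exactly, that no chunk is
// larger than maxSize, and that every chunk except the last is at least
// minSize bytes long.
func verifyChunks(data []byte, chunks [][]byte, minSize, maxSize int) error {
	offset := 0
	for i, c := range chunks {
		end := offset + len(c)
		// Guard the slice bounds before comparing, so a bad split
		// produces an error instead of a panic.
		if end > len(data) || !bytes.Equal(c, data[offset:end]) {
			return fmt.Errorf("chunk %d does not match input at offset %d", i, offset)
		}
		offset = end

		if len(c) > maxSize {
			return fmt.Errorf("chunk %d is %d bytes, above max %d", i, len(c), maxSize)
		}
		// The final chunk holds whatever is left over, so it may be short.
		if i < len(chunks)-1 && len(c) < minSize {
			return fmt.Errorf("chunk %d is %d bytes, below min %d", i, len(c), minSize)
		}
	}
	if offset != len(data) {
		return fmt.Errorf("chunks cover %d of %d bytes", offset, len(data))
	}
	return nil
}
```

In a test this could be called as `verifyChunks(buf, Rabin(buf), 2048, 16384)` and any returned error passed to `t.Fatal`.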
importer/splitting.go (new file, 0 → 100644)
package importer

type BlockSplitter func([]byte) [][]byte

// TODO: this should take a reader, not a byte array. what if we're splitting a 3TB file?
func Rabin(b []byte) [][]byte {
	var out [][]byte
	windowsize := uint64(48)
	chunk_max := 1024 * 16
	min_blk_size := 2048
	blk_beg_i := 0
	prime := uint64(61)

	var poly uint64
	var curchecksum uint64

	// Smaller than a window? Get outa here!
	if len(b) <= int(windowsize) {
		return [][]byte{b}
	}

	i := 0
	for n := i; i < n+int(windowsize); i++ {
		cur := uint64(b[i])
		curchecksum = (curchecksum * prime) + cur
		poly = (poly * prime) + cur
	}

	for ; i < len(b); i++ {
		cur := uint64(b[i])
		curchecksum = (curchecksum * prime) + cur
		poly = (poly * prime) + cur
		curchecksum -= (uint64(b[i-1]) * prime)

		if i-blk_beg_i >= chunk_max {
			// push block
			out = append(out, b[blk_beg_i:i])
			blk_beg_i = i
		}

		// first 13 bits of polynomial are 0
		if poly%8192 == 0 && i-blk_beg_i >= min_blk_size {
			// push block
			out = append(out, b[blk_beg_i:i])
			blk_beg_i = i
		}
	}
	if i > blk_beg_i {
		out = append(out, b[blk_beg_i:])
	}
	return out
}
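Two things stand out in the function above: the TODO asks for a reader-based splitter, and `poly` accumulates over every byte seen so far rather than over the last `windowsize` bytes, so a cut point depends on the whole prefix. A minimal sketch of one way to address both is below. It assumes a plain polynomial rolling hash with wrapping `uint64` arithmetic, which is not a true Rabin fingerprint over GF(2). `SplitReader`, `SplitBytes`, and the constant names are hypothetical and not part of this commit; the numeric values are copied from the code above.

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

const (
	windowSize = 48               // bytes covered by the rolling hash
	minChunk   = 2048             // never cut before this many bytes
	maxChunk   = 16 * 1024        // always cut at this many bytes
	prime      = uint64(61)       // hash multiplier, as in the commit
	cutMask    = uint64(8192 - 1) // cut when the low 13 bits are zero
)

// SplitReader is a hypothetical streaming variant of Rabin. It reads r
// incrementally and calls emit once per chunk, so the whole input never
// has to be held in memory.
func SplitReader(r io.Reader, emit func(chunk []byte)) error {
	// primePow = prime^(windowSize-1): the weight of the oldest byte
	// in the window. Overflow wraps, which is intended.
	primePow := uint64(1)
	for i := 0; i < windowSize-1; i++ {
		primePow *= prime
	}

	br := bufio.NewReader(r)
	var window [windowSize]byte // ring buffer of the last windowSize bytes
	var hash uint64
	n := 0 // total bytes consumed, used as the ring-buffer index
	chunk := make([]byte, 0, maxChunk)

	for {
		c, err := br.ReadByte()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}

		// Slide the window: drop the oldest byte, then add the new one.
		old := window[n%windowSize]
		window[n%windowSize] = c
		n++
		hash = (hash-uint64(old)*primePow)*prime + uint64(c)

		chunk = append(chunk, c)
		atBoundary := hash&cutMask == 0 && len(chunk) >= minChunk
		if atBoundary || len(chunk) >= maxChunk {
			// Copy before emitting because chunk's storage is reused.
			emit(append([]byte(nil), chunk...))
			chunk = chunk[:0]
		}
	}
	if len(chunk) > 0 {
		emit(append([]byte(nil), chunk...))
	}
	return nil
}

// SplitBytes is a hypothetical convenience wrapper for in-memory data.
func SplitBytes(b []byte) [][]byte {
	var out [][]byte
	// bytes.Reader only ever reports io.EOF, so the error is ignored here.
	_ = SplitReader(bytes.NewReader(b), func(c []byte) { out = append(out, c) })
	return out
}
```

Because the hash here covers only the last 48 bytes, a content-triggered cut is decided by local content alone. The trade-off in this sketch is one byte-at-a-time read through `bufio.Reader` and one copy per emitted chunk.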